Edits to Update-by-Query doc #1

223 changes: 223 additions & 0 deletions docs/java-api/docs/update-by-query.asciidoc
[[docs-update-by-query]]
== Update By Query API

experimental[The update-by-query API is new and should still be considered experimental. The API may change in ways that are not backwards compatible]

The simplest usage of `updateByQuery` updates each
document in an index without changing the source. This usage enables
<<picking-up-a-new-property,picking up a new property>> or another online
mapping change.

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("source_index").abortOnVersionConflict(false);

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

A call to the `updateByQuery` API starts by taking a snapshot of the index. It then
indexes each document it finds using `internal` versioning.

NOTE: Version conflicts happen when a document changes between the time of the
snapshot and the time the index request processes.

When the versions match, `updateByQuery` updates the document
and increments the version number.

All update and query failures cause `updateByQuery` to abort. These failures are
available from the `BulkIndexByScrollResponse#getIndexingFailures` method. Any
successful updates remain and are not rolled back. While the first failure
causes the abort, the response contains all of the failures generated by the
failed bulk request.

To prevent version conflicts from causing `updateByQuery` to abort, set
`abortOnVersionConflict(false)`. The first example does this because its only
goal is to pick up an online mapping change. A version conflict there means that
the conflicting document was updated between the start of the `updateByQuery`
and the time when it attempted to update the document. This is fine because
that update will already have picked up the online mapping change.
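The conflict handling described above can be sketched in plain Java without a cluster. This is a toy model, not the Elasticsearch implementation: the class `VersionConflictSketch`, its `Result` type, and the `run` method are hypothetical names introduced for illustration, and a real request processes documents in bulk batches rather than one at a time.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of snapshot-then-update with internal versioning.
final class VersionConflictSketch {

    /** Counters for the outcome of one simulated run. */
    static final class Result {
        int updated;
        int versionConflicts;
        boolean aborted;
    }

    /**
     * Replays a snapshot (id -> version seen at snapshot time) against the live
     * versions. A document is updated, and its version incremented, only when
     * the live version still equals the snapshot version.
     */
    static Result run(Map<String, Long> snapshot, Map<String, Long> live,
                      boolean abortOnVersionConflict) {
        Result result = new Result();
        for (Map.Entry<String, Long> doc : snapshot.entrySet()) {
            long seen = doc.getValue();
            long current = live.get(doc.getKey());
            if (seen == current) {
                live.put(doc.getKey(), current + 1);
                result.updated++;
            } else {
                result.versionConflicts++;
                if (abortOnVersionConflict) {
                    result.aborted = true;
                    break; // updates applied so far are not rolled back
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> snapshot = new LinkedHashMap<>();
        snapshot.put("a", 1L);
        snapshot.put("b", 1L);
        snapshot.put("c", 1L);

        Map<String, Long> live = new LinkedHashMap<>(snapshot);
        live.put("b", 2L); // simulated concurrent write after the snapshot

        Result lenient = run(snapshot, live, false);
        System.out.println(lenient.updated + " updated, "
            + lenient.versionConflicts + " version conflicts, aborted=" + lenient.aborted);
    }
}
```

With the abort flag set to `false`, the conflicting document is counted and skipped while the remaining documents are still processed. With the flag set to `true`, the loop stops at the first conflict and leaves earlier updates in place.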

The `UpdateByQueryRequestBuilder` API supports filtering the updated documents,
limiting the total number of documents to update, and updating documents
with a script:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("source_index")
    .filter(termQuery("level", "awesome"))
    .size(1000)
    .script(new Script("ctx._source.awesome = 'absolutely'", ScriptType.INLINE, "painless", emptyMap()));

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

`UpdateByQueryRequestBuilder` also enables direct access to the query used
to select the documents. You can use this access to change the default scroll size or
otherwise modify the request for matching documents.

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("source_index")
    .source().setSize(500);

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

You can also combine `size` with sorting to limit the documents updated:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("source_index").size(100)
    .source().addSort("cat", SortOrder.DESC);

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

In addition to changing the `_source` field for the document, you can use a script
to change the `update` action, similar to the Update API:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("source_index")
    .script(new Script(
        "if (ctx._source.awesome == 'absolutely') {"
        + " ctx.op='noop'"
        + "} else if (ctx._source.awesome == 'lame') {"
        + " ctx.op='delete'"
        + "} else {"
        + "ctx._source.awesome = 'absolutely'}", ScriptType.INLINE, "painless", emptyMap()));

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

As in the <<docs-update,Update API>>, you can set the value of `ctx.op` to change the
operation that executes:

`noop`::

Set `ctx.op = "noop"` if your script doesn't make any
changes. The `updateByQuery` operation then omits that document from the updates.
This behavior increments the `noop` counter in the
<<docs-update-by-query-response-body, response body>>.

`delete`::

Set `ctx.op = "delete"` if your script deletes the document. The deletion
increments the `deleted` counter in the
<<docs-update-by-query-response-body, response body>>.

Setting `ctx.op` to any other value generates an error. Setting any
other field in `ctx` generates an error.
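The effect of `ctx.op` on the response counters can be illustrated with a small standalone Java model. This is only a sketch: `CtxOpSketch`, `Counters`, `decideOp`, and `apply` are hypothetical names, the default operation string `"update"` is an assumption, and the real counters live in the response object rather than in a class like this one. `decideOp` mirrors the script logic from the earlier example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of how ctx.op values map onto response counters.
final class CtxOpSketch {

    /** Simplified stand-ins for the counters reported in the response body. */
    static final class Counters {
        int updated;
        int noops;
        int deleted;
    }

    /** Plays the role of the script: inspects the source and picks an operation. */
    static String decideOp(Map<String, Object> source) {
        Object awesome = source.get("awesome");
        if ("absolutely".equals(awesome)) {
            return "noop";   // nothing to change
        }
        if ("lame".equals(awesome)) {
            return "delete"; // remove the document
        }
        source.put("awesome", "absolutely");
        return "update";     // assumed name for the default operation
    }

    /** Applies the chosen operation; any unknown value is rejected. */
    static void apply(String op, Counters counters) {
        switch (op) {
            case "update":
                counters.updated++;
                break;
            case "noop":
                counters.noops++;
                break;
            case "delete":
                counters.deleted++;
                break;
            default:
                throw new IllegalArgumentException("Unsupported ctx.op [" + op + "]");
        }
    }

    public static void main(String[] args) {
        Counters counters = new Counters();
        for (String value : new String[] {"absolutely", "lame", "meh"}) {
            Map<String, Object> source = new HashMap<>();
            source.put("awesome", value);
            apply(decideOp(source), counters);
        }
        System.out.println("updated=" + counters.updated
            + " noops=" + counters.noops + " deleted=" + counters.deleted);
    }
}
```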

By design, this API only enables source modification for documents and cannot move documents.

You can also perform these operations on multiple indices and types at once, similar to the search API:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source("foo", "bar").source().setTypes("a", "b");

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

If you provide a `routing` value then the process copies the routing value to the scroll query,
limiting the process to the shards that match that routing value:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.source().setRouting("cat");

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

`updateByQuery` can also use the <<ingest>> feature by
specifying a `pipeline` like this:

[source,java]
--------------------------------------------------
UpdateByQueryRequestBuilder updateByQuery = UpdateByQueryAction.INSTANCE.newRequestBuilder(client);

updateByQuery.setPipeline("hurray");

BulkIndexByScrollResponse response = updateByQuery.get();
--------------------------------------------------

[float]
[[docs-update-by-query-task-api]]
=== Works with the Task API

You can fetch the status of all running update-by-query requests with the
<<tasks,Task API>>:

[source,java]
--------------------------------------------------
ListTasksResponse tasksList = client.admin().cluster().prepareListTasks()
    .setActions(UpdateByQueryAction.NAME).setDetailed(true).get();

for (TaskInfo info: tasksList.getTasks()) {
    TaskId taskId = info.getTaskId();
    BulkByScrollTask.Status status = (BulkByScrollTask.Status) info.getStatus();
    // do stuff
}

--------------------------------------------------

With the `TaskId` shown above you can look up the task directly:

// provide API Example
[source,java]
--------------------------------------------------
GetTaskResponse get = client.admin().cluster().prepareGetTask(taskId).get();
--------------------------------------------------

[float]
[[docs-update-by-query-cancel-task-api]]
=== Works with the Cancel Task API

Any Update By Query can be canceled using the <<tasks,Task Cancel API>>:

[source,java]
--------------------------------------------------
// Cancel all update-by-query requests
client.admin().cluster().prepareCancelTasks().setActions(UpdateByQueryAction.NAME).get().getTasks();
// Cancel a specific update-by-query request
client.admin().cluster().prepareCancelTasks().setTaskId(taskId).get().getTasks();
--------------------------------------------------

Use the `list tasks` API to find the value of `taskId`.

Canceling a request is typically very fast but can take up to a few seconds.
The task status API continues to list the task until the cancellation is complete.

[float]
[[docs-update-by-query-rethrottle]]
=== Rethrottling

Use the `_rethrottle` API to change the value of `requests_per_second` on a running update:

[source,java]
--------------------------------------------------
RethrottleAction.INSTANCE.newRequestBuilder(client).setTaskId(taskId).setRequestsPerSecond(2.0f).get();
--------------------------------------------------

Use the `list tasks` API to find the value of `taskId`.

As with the `updateByQuery` API, the value of `requests_per_second`
can be any positive float value to set the level of the throttle, or `Float.POSITIVE_INFINITY` to disable throttling.
A new rethrottling value that speeds up the query takes
effect immediately. A rethrottling value that slows the query takes effect
after the current batch completes, in order to prevent scroll timeouts.
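To make the effect of `requests_per_second` concrete, the following standalone sketch computes how long a throttled request might pause between batches. It assumes the throttle works by padding each batch up to a target duration of `batchSize / requests_per_second` seconds. That formula, the class `ThrottleSketch`, and the method `waitMillis` are illustrative assumptions and are not taken from the Java API.

```java
// Hypothetical model of per-batch throttling. The padding formula is an assumption.
final class ThrottleSketch {

    /**
     * Returns how long to wait after a batch so that the overall rate does not
     * exceed requestsPerSecond. Float.POSITIVE_INFINITY disables throttling.
     */
    static long waitMillis(int batchSize, float requestsPerSecond, long batchTookMillis) {
        if (Float.isNaN(requestsPerSecond) || requestsPerSecond <= 0) {
            throw new IllegalArgumentException("requests_per_second must be positive");
        }
        if (Float.isInfinite(requestsPerSecond)) {
            return 0L; // throttling disabled
        }
        // Time the batch should have taken at the requested rate.
        long targetMillis = (long) (1000.0 * batchSize / requestsPerSecond);
        // Never wait a negative amount if the batch was already slow enough.
        return Math.max(0L, targetMillis - batchTookMillis);
    }

    public static void main(String[] args) {
        System.out.println(waitMillis(1000, 500f, 500L));
        System.out.println(waitMillis(1000, Float.POSITIVE_INFINITY, 500L));
    }
}
```

Under this model, lowering the rate lengthens the pause that follows each batch, which is consistent with a slower rethrottle value taking effect only once the current batch has finished.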