Create action to migrate the contents of one index to a new index #20024
The standard way to change an index's mapping is to create a new index with the
new mapping, `_reindex` the documents into the new index, flip the alias from
the old index to the new index, and then remove the old index. Traditionally
this sort of thing has been left as an exercise for those implementing an
application against Elasticsearch but I think now is the time to implement this
in Elasticsearch because:
- [...] `.tasks` index for storing the results of long running tasks. While we
  were fairly careful in designing its mappings, I'm under no illusion that we
  got it right the first try. That just isn't the way software works. We're
  going to want to run this on `.tasks` one day.
- [...] handling upgrades to the format of the data is a concern for Logstash's
  engineers.
In all of these cases the indexes are implementation details of their
application so we'd like to automatically upgrade them on startup rather than
provide upgrade scripts. That means that the application will want to migrate
its data every time it starts up so a user only has to get involved if the data
migration fails.
3 of the 5 applications that will need to do this migration live inside
Elasticsearch (Watcher and Security are a plugin, `.tasks` is in core
Elasticsearch). So it looks like the right place to implement this is in core
Elasticsearch. The other advantage of implementing it there is that it can be
used by the widest range of users.
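For reference, the manual procedure described at the top of this description can be sketched as a short list of requests. This is only an illustration of what applications do by hand today: the helper name `manual_migration_requests` and its parameters are made up here, while the endpoints are the existing create index, `_reindex`, `_refresh`, `_aliases`, and delete index APIs.

```python
import json


def manual_migration_requests(source, dest, alias, dest_body):
    """Return the (method, path, body) requests for the manual procedure.

    `dest_body` is the create-index body (settings and mappings) for `dest`.
    Nothing is sent over the network; the caller decides how to issue them.
    """
    return [
        # 1. Create the new index with the new mapping.
        ("PUT", f"/{dest}", dest_body),
        # 2. Copy every document from the old index into the new one.
        ("POST", "/_reindex",
         {"source": {"index": source}, "dest": {"index": dest}}),
        # 3. Refresh so the copied documents are visible to search.
        ("POST", f"/{dest}/_refresh", None),
        # 4. Flip the alias in one atomic _aliases call.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": source, "alias": alias}},
            {"add": {"index": dest, "alias": alias}},
        ]}),
        # 5. Remove the old index.
        ("DELETE", f"/{source}", None),
    ]


if __name__ == "__main__":
    body = {"settings": {"number_of_shards": 1}}
    for method, path, payload in manual_migration_requests(
            "index_1", "index_2", "index", body):
        print(method, path, json.dumps(payload) if payload else "")
```

Every application has to get the ordering and the failure handling of these five steps right on its own, which is the duplication this PR is trying to remove.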
This PR intends to build an action into core Elasticsearch that:
- [...] `200 OK` when the index is in the desired state already.
- [...] important in "masterless" systems like Logstash so they can invoke this
  API on startup and not have to worry about one node "winning". They all get
  the same response.
- [...] responds with that information rather than some cryptic failure message.
- [...] index steps.
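The "same response for every caller" behavior in the list above can be illustrated with a small in-memory sketch. It assumes completion is detected by the alias already pointing at the destination index, which is how this description says `_migrate` knows the process is complete. The function names and the `run_migration` callback are hypothetical, and the sketch is sequential, so it says nothing about how concurrent requests would be coordinated.

```python
def migration_complete(aliases, dest, required_aliases):
    """True when every required alias already points at `dest`.

    `aliases` maps an alias name to the set of indexes it points at.
    """
    if not required_aliases:
        # Unlike a normal create index call, the aliases section is required.
        raise ValueError("the aliases section is required")
    return all(dest in aliases.get(name, set()) for name in required_aliases)


def handle_migrate(aliases, source, dest, required_aliases, run_migration):
    """Hypothetical handler: migrate once, answer 200 on every call."""
    if migration_complete(aliases, dest, required_aliases):
        return 200  # Already in the desired state, nothing to do.
    run_migration(source, dest)  # Stand-in for create + reindex + refresh.
    for name in required_aliases:
        aliases[name] = {dest}  # Flip the alias last, marking completion.
    return 200


if __name__ == "__main__":
    cluster_aliases = {"index": {"index_1"}}
    calls = []
    for _ in range(3):  # e.g. three application nodes starting up in turn
        status = handle_migrate(cluster_aliases, "index_1", "index_2",
                                ["index"], lambda s, d: calls.append((s, d)))
    print(status, len(calls))
```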
It exposes it with an HTTP request that looks like:
In this example `index_1` is the source index and `index_2` is the destination
index. Unlike a normal create index command the `aliases` section is required.
This is how `_migrate` knows that the process is complete and it is a good
practice anyway. The alias is added to the destination index after all the docs
in the source index are migrated to the destination index and the destination
index has been `_refresh`ed so they are visible.

Like `_reindex`, `_delete_by_query`, and `_update_by_query`, these requests
are "big" in that they do many things and we expect them to take a long time if
they operate on a large number of documents. This can't be helped so we want to
make sure that this request integrates well with the task management API. That
means that it should be `"cancellable": true` and its status should be super
expressive, returning the phase of the operation currently being performed and,
if that phase is reindex, the details of the reindex's status.
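To make the status requirement concrete, here is one possible shape for what the task management API could report. Everything about the layout is an assumption: the phase names, the class names, and the nesting of the reindex counters under a `reindex` key are invented for illustration and are not taken from this PR's code. The counters themselves (`total`, `created`, `updated`, `deleted`, `batches`) mirror fields that reindex already reports.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical phase names, in the order the migration would go through them.
PHASES = ("create_index", "reindex", "refresh", "add_alias", "remove_source")


@dataclass
class ReindexStatus:
    total: int
    created: int
    updated: int
    deleted: int
    batches: int


@dataclass
class MigrateTaskStatus:
    phase: str
    reindex: Optional[ReindexStatus] = None

    def to_task_entry(self):
        """Render the part of a task listing entry that this paragraph covers."""
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase}")
        status = {"phase": self.phase}
        # Reindex details are only reported while the reindex phase runs.
        if self.phase == "reindex" and self.reindex is not None:
            status["reindex"] = asdict(self.reindex)
        return {"cancellable": True, "status": status}


if __name__ == "__main__":
    progress = ReindexStatus(total=1000, created=400, updated=0,
                             deleted=0, batches=4)
    print(MigrateTaskStatus("reindex", progress).to_task_entry())
```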
We try to limit the number of "big" operations in core Elasticsearch because
every one of them feels like a new trap we are setting for unsuspecting users.
We will need to warn users that this can take some time and put some load on
the cluster. For the users all the way at the top of the document we don't
expect this to be a problem though. A Security index with a million documents
is huge but not a ton of work for reindex. We just have to make very very
sure that it is obvious to users that doing this against an index with a
hundred million documents is going to take a long time.