Resolving failed Kibana upgrade migrations
rudolf committed Oct 19, 2020
1 parent 2e37bd0 commit 4310b58
Showing 2 changed files with 66 additions and 28 deletions.
6 changes: 3 additions & 3 deletions docs/setup/upgrade.asciidoc
@@ -24,9 +24,9 @@ Before you upgrade {kib}:
To roll back to an earlier version, you **must** have a backup of your data.
* If you are using custom plugins, check that a compatible version is
available.
* Shut down all {kib} nodes. Running more than one {kib} version against the
same Elasticsearch index is unsupported. If you upgrade while older {kib} nodes are
running, the upgrade can fail.
* Shut down all {kib} instances. Running more than one {kib} version against
the same Elasticsearch index is unsupported. Upgrading while older {kib}
instances are running can cause data loss or upgrade failures.

To identify the changes you need to make to upgrade, and to enable you to
perform an Elasticsearch rolling upgrade with no downtime, you must upgrade to
88 changes: 63 additions & 25 deletions docs/setup/upgrade/upgrade-migrations.asciidoc
@@ -1,54 +1,92 @@
[[upgrade-migrations]]
=== Migrate saved objects
=== Upgrade migrations

Every time {kib} is upgraded it checks to see if all saved objects, such as dashboards, visualizations, and index patterns, are compatible with the new version. If any objects need to be updated, then the automatic saved object migration process is kicked off.
Every time {kib} is upgraded it checks to see if all saved objects, such as dashboards, visualizations, and index patterns, are compatible with the new version. If any saved objects need to be updated, then the automatic saved object migration process is kicked off.

NOTE: 6.7 includes an https://www.elastic.co/guide/en/kibana/6.7/upgrade-assistant.html[Upgrade Assistant]
to help you prepare for your upgrade to 7.0. To access the assistant, go to *Management > 7.0 Upgrade Assistant*.

WARNING: The following instructions assume {kib} is using the default index names. If the `kibana.index` or `xpack.tasks.index` configuration settings were changed, these instructions will have to be adapted accordingly.

[float]
[[upgrade-migrations-process]]
==== How the process works

Saved objects are stored in an index named `.kibana_N`, where `N` is a number that increments over time as {kib} is upgraded. The index alias `.kibana` points to the latest up-to-date index for a given install.
==== Background

NOTE: Prior to 6.5.0, saved objects were stored directly in an index named `.kibana`, so the first time you upgrade to {kib} version 6.5+, {kib} will migrate into `.kibana_1` and set `.kibana` up as an index alias.
Saved objects are stored in two indices, `.kibana_N` and `.kibana_task_manager_N`, where `N` is a number that increments every time {kib} runs an upgrade migration. The index aliases `.kibana` and `.kibana_task_manager` point to the most up-to-date index.

While {kib} is starting up and before serving any HTTP traffic, it checks to see if any internal mapping changes or data transformations for existing saved objects are required.

When changes are necessary, a new incremental `.kibana_N` index is created with updated mappings, then the saved objects are loaded in batches from the existing index, transformed to whatever extent necessary, and added to this new index.
When changes are necessary, a new migration is started. To ensure that only one {kib} instance performs the migration, each instance will attempt to obtain a migration lock by creating a new `.kibana_N+1` index. The instance that succeeds in creating the index will then read batches of documents from the existing index, migrate them, and write them to the new index. Once the objects are migrated, the lock is released by pointing the `.kibana` index alias to the new upgraded `.kibana_N+1` index.

Instances that failed to acquire a lock will log `Another Kibana instance appears to be migrating the index. Waiting for that migration to complete`. The instance will then wait until `.kibana` points to an upgraded index before starting up and serving HTTP traffic.
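
The lock behaviour described above can be illustrated with a minimal sketch. This is not {kib}'s implementation: the in-memory set stands in for the indices that exist in {es}, and the function names `next_index_name` and `try_acquire_lock` are hypothetical.

```python
import re


def next_index_name(current_index: str) -> str:
    """Return the `<prefix>_N+1` name for an index named `<prefix>_N`."""
    match = re.fullmatch(r"(.+)_(\d+)", current_index)
    if match is None:
        raise ValueError(f"not a versioned index name: {current_index!r}")
    return f"{match.group(1)}_{int(match.group(2)) + 1}"


def try_acquire_lock(existing_indices: set, current_index: str) -> bool:
    """Simulate 'create the next index' as a lock.

    Only the first caller that creates the `N+1` index succeeds. Later
    callers see that the index already exists and must wait instead.
    """
    target = next_index_name(current_index)
    if target in existing_indices:
        return False  # another instance already holds the lock
    existing_indices.add(target)
    return True
```

With `existing_indices = {".kibana_4"}`, the first call to `try_acquire_lock(existing_indices, ".kibana_4")` succeeds and every later call fails until the leftover `.kibana_5` entry is removed, which mirrors why an abandoned `N+1` index blocks further migration attempts.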

Once the objects are migrated, the `.kibana` index alias is updated to point to the new index, and {kib} finishes starting up and serving HTTP traffic.
NOTE: Prior to 6.5.0, saved objects were stored directly in an index named `.kibana`. After upgrading to version 6.5+, {kib} will migrate this index into `.kibana_N` and set `.kibana` up as an index alias. +
Prior to 7.4.0, task manager tasks were stored directly in an index named `.kibana_task_manager`. After upgrading to version 7.4+, {kib} will migrate this index into `.kibana_task_manager_N` and set `.kibana_task_manager` up as an index alias.

[float]
[[upgrade-migrations-old-indices]]
==== Handling old `.kibana` indices
[[upgrade-migrations-errors]]
==== Resolving migration failures

If {kib} terminates unexpectedly while migrating a saved object index, manual intervention is required before {kib} will attempt to perform the migration again.

After migrations have run, there will be multiple {kib} indices in {es}: (`.kibana_1`, `.kibana_2`, etc). {kib} only uses the index that the `.kibana` alias points to. The other {kib} indices can be safely deleted, but are left around as a matter of historical record, and to facilitate rolling {kib} back to a previous version.
As mentioned above, {kib} will create a migration lock for each index that requires a migration by creating a new `.kibana_N+1` index. For example, if the `.kibana_task_manager` alias is pointing to `.kibana_task_manager_5`, then the first {kib} instance that succeeds in creating `.kibana_task_manager_6` will obtain the lock to start migrations.

However, if the instance that obtained the lock fails to migrate the index, all other {kib} instances will be blocked from performing this migration. This includes the instance that originally obtained the lock: it will be blocked from retrying the migration even when restarted.

To remove the lock and allow a new migration attempt, restore the backup snapshot:

1. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
2. Create a backup snapshot of all `.kibana*` indices.
3. Delete all saved object indices with `DELETE /.kibana*`.
4. Restore the `.kibana*` indices and their aliases using the backup snapshot taken from before the upgrade was initiated. See {es} {ref}/modules-snapshots.html[snapshots].
5. Start up all {kib} instances.
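
As a rough illustration of the restore in step 4, the sketch below builds a request for the {es} snapshot restore API. The repository and snapshot names are placeholders to be replaced with your own, and the helper name `build_restore_request` is hypothetical. It only constructs the request and does not send it.

```python
def build_restore_request(repository: str, snapshot: str):
    """Build (method, path, body) for restoring the `.kibana*` indices.

    `repository` and `snapshot` are placeholders for the names used when
    the pre-upgrade backup was taken.
    """
    path = f"/_snapshot/{repository}/{snapshot}/_restore"
    body = {
        # Restore only the saved object indices.
        "indices": ".kibana*",
        # Also restore the `.kibana` and `.kibana_task_manager` aliases.
        "include_aliases": True,
    }
    return "POST", path, body
```

For example, `build_restore_request("my_repo", "pre_upgrade")` yields a `POST` to `/_snapshot/my_repo/pre_upgrade/_restore`. Both names here are invented for the example.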

If no backup snapshots are available, any migration locks can be manually removed as a last resort:

1. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
2. Identify any migration locks by comparing the output of `GET /_cat/aliases` and `GET /_cat/indices`. For example, if `.kibana` is pointing to `.kibana_4` and there is a `.kibana_5` index, the `.kibana_5` index will act like a migration lock blocking further attempts. Be sure to check both the `.kibana` and `.kibana_task_manager` aliases and their indices.
3. Remove any migration locks, for example with `DELETE /.kibana_5`.
4. Start up all {kib} instances.
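
The comparison in step 2 can be sketched as a small helper. It assumes you have already parsed the output of `GET /_cat/aliases` into a mapping of alias name to target index and the output of `GET /_cat/indices` into a list of index names. The function name `find_migration_locks` is hypothetical.

```python
import re


def find_migration_locks(alias_targets: dict, indices: list) -> list:
    """Return indices that act as migration locks.

    An index is reported when an alias points to `<alias>_N` and an
    `<alias>_N+1` index also exists.
    """
    existing = set(indices)
    locks = []
    for alias, target in sorted(alias_targets.items()):
        match = re.fullmatch(r"(.+)_(\d+)", target)
        if match is None or match.group(1) != alias:
            continue  # alias does not point at an `<alias>_N` index
        candidate = f"{alias}_{int(match.group(2)) + 1}"
        if candidate in existing:
            locks.append(candidate)
    return locks
```

Given aliases `{".kibana": ".kibana_4", ".kibana_task_manager": ".kibana_task_manager_2"}` and indices `.kibana_3`, `.kibana_4`, `.kibana_5`, and `.kibana_task_manager_2`, the helper returns `[".kibana_5"]`. Review the result by hand before deleting anything.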

[float]
[[upgrade-migrations-errors]]
==== Handling errors during saved object migrations
[[upgrade-migrations-rolling-back]]
==== Rolling back to a previous version of {kib}

To roll back after a failed upgrade migration, the saved object indices might also have to be rolled back to be compatible with the previous {kib} version.

NOTE: {kib} does not run a migration for every saved object index on every upgrade. A {kib} version upgrade can run no migrations, migrate only the `.kibana` index, migrate only the `.kibana_task_manager` index, or migrate both.

Roll back by restoring the saved object indices from a backup snapshot:

1. Shut down all {kib} instances to be 100% sure that there are no instances currently performing a migration.
2. Create a backup snapshot of all `.kibana*` indices.
3. Delete all saved object indices with `DELETE /.kibana*`.
4. Restore the `.kibana*` indices and their aliases using the backup snapshot taken from before the upgrade was initiated. See {es} {ref}/modules-snapshots.html[snapshots].
5. Start up all {kib} instances on the older version you wish to roll back to.

If {kib} terminates unexpectedly while migrating a saved object index, some additional work may be required in order to get {kib} to re-attempt the migration.
(Not recommended) If a backup snapshot is not available, {kib}'s indices can be manually rolled back as a last resort:

For example, if the `.kibana` alias is pointing to `.kibana_4`, and there is a `.kibana_5` index in {es}, the `.kibana_5` index will need to be deleted. {kib} will never attempt to overwrite an existing index.
1. Shut down all {kib} instances to be 100% sure that there are no {kib} instances currently performing a migration.
2. Create a backup snapshot of the `.kibana*` indices.
3. Use the logs from the upgraded instances to identify which indices {kib} attempted to upgrade. The server logs will contain an entry like `[savedobjects-service] Creating index .kibana_4.` and/or `[savedobjects-service] Creating index .kibana_task_manager_2.` If no indices were created after upgrading {kib}, no further action is required to perform a rollback; skip ahead to step (5). If you're running multiple {kib} instances, be sure to inspect all instances' logs.
4. Delete each of the indices identified in step (3), for example with `DELETE /.kibana_task_manager_2`.
5. Inspect the output of `GET /_cat/aliases`. If the `.kibana` and/or `.kibana_task_manager` alias is missing, it will have to be created manually. Find the latest index from the output of `GET /_cat/indices` and create the missing alias to point to the latest index. For example, if the `.kibana` alias is missing and the latest index is `.kibana_3`, create a new alias with `POST /.kibana_3/_aliases/.kibana`.
6. Start up {kib} on the older version you wish to roll back to.
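
Step 5 can be sketched as follows, assuming the alias names and index names have already been collected from `GET /_cat/aliases` and `GET /_cat/indices`. The helper name `missing_alias_requests` is hypothetical, and it only prints the requests that would need to be run by hand.

```python
import re


def missing_alias_requests(existing_aliases, indices,
                           expected=(".kibana", ".kibana_task_manager")):
    """Return the alias-creation requests for any missing expected alias."""
    requests = []
    for alias in expected:
        if alias in existing_aliases:
            continue  # alias is present, nothing to recreate
        pattern = re.compile(re.escape(alias) + r"_(\d+)")
        versions = []
        for name in indices:
            match = pattern.fullmatch(name)
            if match:
                versions.append(int(match.group(1)))
        if versions:
            # Compare numerically so that `_10` ranks above `_9`.
            latest = f"{alias}_{max(versions)}"
            requests.append(f"POST /{latest}/_aliases/{alias}")
    return requests


if __name__ == "__main__":
    for request in missing_alias_requests(
        {".kibana_task_manager"},
        [".kibana_1", ".kibana_3", ".kibana_task_manager_1"],
    ):
        print(request)
```

The version suffix is compared as an integer rather than as text, because a plain string sort would rank `.kibana_9` above `.kibana_10`.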

WARNING: Any changes made after a successful upgrade migration will be lost when rolling back to a previous version.

[float]
[[upgrade-migrations-multiple-instances]]
==== Support for multiple {kib} instances

If you're running multiple {kib} instances for a single index behind a load balancer, it's important that you stop all instances before upgrading, so you do not have multiple different versions of {kib} trying to perform saved object migrations.
WARNING: Kibana does not support rolling upgrades. If you're running multiple {kib} instances, all instances should be stopped before upgrading.

The first instance that triggers saved object migrations will run the entire process. Any other instances started up while a migration is running will log a message and then wait until saved object migration has completed before they start serving HTTP traffic.
Different versions of {kib} running against the same {es} index, such as during a rolling upgrade, can cause data loss. This is because acknowledged writes from the older instances could be written into the _old_ index while the migration is in progress. To prevent this from happening, ensure that all old {kib} instances are shut down before starting up instances on a newer version.

[float]
[[upgrade-migrations-rolling-back]]
==== Rolling back to a previous version of {kib}
The first instance that triggers saved object migrations will run the entire process. Any other instances started up while a migration is running will log a message and then wait until saved object migrations have completed before they start serving HTTP traffic.

When rolling {kib} back to a previous version, point the `.kibana` alias to
the appropriate {kib} index. When you have the previous version running again,
delete the more recent `.kibana_N` index or indices so that future upgrades are
based on the current {kib} index. You must restart {kib} to re-trigger the migration.
[float]
[[upgrade-migrations-old-indices]]
==== Handling old `.kibana_N` indices

WARNING: Rolling back to a previous {kib} version can result in saved object data loss if you had successfully upgraded and made changes to saved objects before rolling back.
After migrations have completed, there will be multiple {kib} indices in {es}: (`.kibana_1`, `.kibana_2`, etc). {kib} only uses the index that the `.kibana` alias points to. The other {kib} indices can be safely deleted, but are left around as a matter of historical record, and to facilitate rolling {kib} back to a previous version.
