GeoIP Ingest plugin should not break with feature migration #85756

williamrandolph · 2022-04-07T21:27:31Z

Bug

It is possible for a system feature migration to break the geoip-ingest plugin on a cluster that has been upgraded from 7.17 to 8.x. The system features migration will reindex the .geoip_databases index into a new index called .geoip_databases-reindexed-for-8, then delete the original .geoip_databases index and replace it with an alias. The plugin is written to expect .geoip_databases to be a concrete index, so it fails to reload its database, and geoip ingest processors stop working.

We need to fix the code so that running a system feature migration doesn't create this problem.
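
A quick way to tell whether a cluster is in the state described above is to ask whether an alias named .geoip_databases exists. The sketch below assumes the standard get-alias endpoint (GET /_alias/<name>), which answers 404 when no alias by that name exists. The helper name geoip_target_kind and the ES_URL variable are hypothetical, and access to system indices may be restricted or produce warnings depending on the version and the user's privileges.

```shell
#!/bin/sh
# Hypothetical helper (not part of Elasticsearch): map the HTTP status of
# GET /_alias/.geoip_databases to a short label.
geoip_target_kind() {
  case "$1" in
    200) echo "alias" ;;          # an alias with this name exists (post-migration state)
    404) echo "not-an-alias" ;;   # no such alias: concrete index, or nothing by that name
    *)   echo "unknown" ;;        # e.g. authentication failure or unreachable node
  esac
}

# Only contact a cluster when ES_URL is set, e.g. ES_URL=localhost:9200 (assumed).
if [ -n "${ES_URL:-}" ]; then
  status=$(curl -s -o /dev/null -w '%{http_code}' "$ES_URL/_alias/.geoip_databases")
  geoip_target_kind "$status"
fi
```

If this prints "alias" on an 8.x cluster that was upgraded from 7.17, the feature migration has most likely replaced the concrete index as described above.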

Reproducing

Steps to reproduce:

On a 7.17 cluster, create a pipeline with a geoip processor.
Upgrade the cluster to 8.1.
Run a system features migration and wait for it to complete.
Wait for the geoip-ingest plugin to reload its databases, or restart the cluster to make the change take effect.
Ingest a document using the pipeline.

Here are some curl commands I used on my local, single-node cluster:


# ON 7.17 - create geoip processor

curl -s -XPUT -H'content-type:application/json' \
   'localhost:9200/_ingest/pipeline/geoip' \
   -d '{"description":"test","processors":[{"geoip":{"field":"ip"}}]}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/pre_migration?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/pre_migration'

# ON 7.17 - create snapshot

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup' \
    -d '{"type":"fs","settings":{"location":"/Users/wbrafford/work/es-builds/snapshots"}}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup/pre_migration' \
    -d '{"indices":"-*","feature_states":["geoip"],"include_global_state":false}'

# ON 8.1 - run feature migration

curl -s -XPOST 'localhost:9200/_migration/system_features'

# ON 8.1 - restart and test doc

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/post_migration?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/post_migration'

# restore snapshot

curl -s -XPOST -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup/pre_migration/_restore' \
    -d '{"indices":"-*","feature_states":["geoip"],"include_global_state":false}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/post_restore?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/post_restore'
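
For the "wait for it to complete" step, the migration status can be polled rather than guessed. This is a minimal sketch, assuming that GET /_migration/system_features reports a migration_status field whose value is IN_PROGRESS while a migration is running. The helper name migration_in_progress and the ES_URL variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON body of GET /_migration/system_features
# on stdin and succeeds (exit 0) if any migration_status is IN_PROGRESS.
migration_in_progress() {
  grep -q '"migration_status"[[:space:]]*:[[:space:]]*"IN_PROGRESS"'
}

# Only contact a cluster when ES_URL is set, e.g. ES_URL=localhost:9200 (assumed).
# Note: an empty response (node unreachable) also ends the loop.
if [ -n "${ES_URL:-}" ]; then
  while curl -s "$ES_URL/_migration/system_features" | migration_in_progress; do
    sleep 5
  done
  echo "no feature reports IN_PROGRESS"
fi
```

The pattern match is deliberately crude so that it needs nothing beyond grep; a JSON-aware tool would be a more robust choice where one is available.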

Workaround

The only fix I know of is to restore a geoip feature state from a snapshot taken before the system feature migration:

POST /_snapshot/<repo_name>/<snapshot_name>/_restore
{
  "indices": "-*",
  "feature_states": ["geoip"],
  "include_global_state": false
}

It doesn't really matter how old the snapshot is, because once the plugin is restored to a good state, it can update the geoip index.

Open questions

What approach should we take to fix this?

The .geoip_databases index doesn't contain any user data. We could just "migrate" it by deleting it and recreating it. This kind of fix would be the responsibility of the core-infra team.
We could make the geoip ingest plugin robust for the case where .geoip_databases is an alias, not a concrete index. This doesn't seem like something the plugin needs intrinsically, but it might be easier to do than changing the migration code.

We should also look at giving users a more convenient way to reset the state of the geoip-ingest plugin. #70426 would have been really useful to have for this bug.

cc @dakrone, @gwbrown, @joegallo


elasticmachine · 2022-04-07T21:27:33Z

Pinging @elastic/es-data-management (Team:Data Management)

elasticmachine · 2022-04-07T21:27:33Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

gwbrown · 2022-04-07T21:37:59Z

> We could make the geoip ingest plugin robust for the case where .geoip_databases is an alias, not a concrete index. This doesn't seem like something the plugin needs intrinsically, but it might be easier to do than changing the migration code.

This would be my preference, unless there's a reason I'm unaware of why this is difficult. Having to always have a concrete index at that name makes life very difficult given the lack of ability to rename indices, to the degree that we have considered requiring system indices to be behind aliases (and I still think that would be a good idea, even if we haven't done so yet).

dakrone · 2022-04-07T22:13:15Z

+1 to making the geoip downloader handle the case where the index is an alias without any problems.

pa-jberanek · 2022-04-29T08:17:02Z

I opened a case with Elastic Cloud support about a different impact of this issue. Our cluster ended up with a ".geoip_databases-reindexed-for-8" index, and this breaks both "Index Management" and the "Indices" view under Stack Monitoring in Kibana.

Kibana shows errors like:

Error loading indices
Indices [.geoip_databases-reindexed-for-8] use and access is reserved for system operations

I tried creating a special role and user which could delete restricted indices, so I could delete the index - but even this special user wasn't able to delete the problematic index.

williamrandolph added the >bug, :Core/Infra/Core, :Data Management/Ingest Node, v8.3.0, and v8.2.1 labels on Apr 7, 2022

elasticmachine added the Team:Data Management and Team:Core/Infra labels on Apr 7, 2022

dakrone assigned joegallo Apr 11, 2022

joegallo mentioned this issue Apr 20, 2022

Handle .geoip_databases being an alias or a concrete index #85792

Merged

joegallo closed this as completed in #85792 May 3, 2022
