Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GeoIP Ingest plugin should be not break with feature migration #85756

Closed
williamrandolph opened this issue Apr 7, 2022 · 5 comments · Fixed by #85792
Closed

GeoIP Ingest plugin should be not break with feature migration #85756

williamrandolph opened this issue Apr 7, 2022 · 5 comments · Fixed by #85792
Assignees
Labels
>bug :Core/Infra/Core Core issues without another label :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team v8.2.1 v8.3.0

Comments

@williamrandolph
Copy link
Contributor

Bug

It is possible for a system feature migration to break the geoip-ingest plugin on a cluster that has been upgraded from 7.17 to 8.x. The system features migration will reindex the .geoip_databases index into a new index called .geoip_databases-reindexed-for-8, then delete the original .geoip_databases index and replace it with an alias. The plugin is written to expect .geoip_databases to be a concrete index, so it fails to reload its database, and geoip ingest processors stop working.

We need to fix the code so that running a system feature migration doesn't create this problem.

Reproducing

Steps to reproduce:

  1. On a 7.17 cluster, create a pipeline with a geoip processor.
  2. Upgrade the cluster to 8.1.
  3. Run a system features migration and wait for it to complete.
  4. Wait for the geoip-ingest plugin to reload its databases, or restart the cluster to make changes take effect
  5. Ingest a document using the pipeline.

Here are some curl commands I used on my local, single-node cluster:

Expand for details
# ON 7.17 - create geoip processor

curl -s -XPUT -H'content-type:application/json' \
   'localhost:9200/_ingest/pipeline/geoip' \
   -d '{"description":"test","processors":[{"geoip":{"field":"ip"}}]}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/pre_migration?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/pre_migration'

# ON 7.17 - create snapshot

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup' \
    -d '{"type":"fs","settings":{"location":"/Users/wbrafford/work/es-builds/snapshots"}}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup/pre_migration' \
    -d '{"indices":"-*","feature_states":["geoip"],"include_global_state":false}'

# ON 8.1 - run feature migration

curl -s -XPOST 'localhost:9200/_migration/system_features'

# ON 8.1 - restart and test doc

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/post_migration?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/post_migration'

# restore snapshot

curl -s -XPOST -H'content-type:application/json' \
    'localhost:9200/_snapshot/fs_backup/pre_migration/_restore' \
    -d '{"indices":"-*","feature_states":["geoip"],"include_global_state":false}'

curl -s -XPUT -H'content-type:application/json' \
    'localhost:9200/my-index-00001/_doc/post_restore?pipeline=geoip' \
    -d '{"ip":"8.8.8.8"}'

curl -s -XGET 'localhost:9200/my-index-00001/_doc/post_restore'

Workaround

The only fix I know of is to restore a geoip feature state from a snapshot taken before the system feature migration:

POST /_snapshot/<repo_name>/<snapshot_name>/_restore
{
  "indices": "-*",
  "feature_states": ["geoip"],
  "include_global_state": false
}

It doesn't really matter how old the snapshot is, because once the plugin is restored to a good state, it can update the geoip index.

Open questions

What approach should we take to fix this?

  1. The .geoip_databases index doesn't contain any user data. We could just "migrate" it by deleting it and recreating it. This kind of fix would be the responsibility of the core-infra team.
  2. We could make the geoip ingest plugin robust for the case where .geoip_databases is an alias, not a concrete index. This doesn't seem like something the plugin needs intrinsically, but it might be easier to do than changing the migration code.

We should also look at giving users a more convenient way to reset the state of the geoip-ingest plugin. #70426 would have been really useful to have for this bug.

cc @dakrone, @gwbrown, @joegallo

@williamrandolph williamrandolph added >bug :Core/Infra/Core Core issues without another label :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v8.3.0 v8.2.1 labels Apr 7, 2022
@elasticmachine elasticmachine added Team:Data Management Meta label for data/management team Team:Core/Infra Meta label for core/infra team labels Apr 7, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@gwbrown
Copy link
Contributor

gwbrown commented Apr 7, 2022

We could make the geoip ingest plugin robust for the case where .geoip_databases is an alias, not a concrete index. This doesn't seem like something the plugin needs intrinsically, but it might be easier to do than changing the migration code.

This would be my preference, unless there's a reason I'm unaware of why this is difficult. Having to always have a concrete index at that name makes life very difficult given the lack of ability to rename indices, to the degree that we have considered requiring system indices to be behind aliases (and I still think that would be a good idea, even if we haven't done so yet).

@dakrone
Copy link
Member

dakrone commented Apr 7, 2022

+1 to making the geoip downloader handle the case where the index is an alias without any problems.

@pa-jberanek
Copy link

pa-jberanek commented Apr 29, 2022

I opened a case with Elastic Cloud support about a different impact of this issue. Our cluster ended up with a ".geoip_databases-reindexed-for-8" index, and this breaks both "Index Management" and the "Indices" view under Stack Monitoring in Kibana.

Kibana shows errors like:

Error loading indices
Indices [.geoip_databases-reindexed-for-8] use and access is reserved for system operations

I tried creating a special role and user which could delete restricted indices, so I could delete the index - but even this special user wasn't able to delete the problematic index.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Core/Infra/Core Core issues without another label :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP Team:Core/Infra Meta label for core/infra team Team:Data Management Meta label for data/management team v8.2.1 v8.3.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants