Duplicate documents when using ILM rollover #56273

Closed
CSharpBender opened this issue May 6, 2020 · 2 comments
Labels
:Data Management/ILM+SLM Index and Snapshot lifecycle management Team:Data Management Meta label for data/management team

Comments


CSharpBender commented May 6, 2020

Elasticsearch version (bin/elasticsearch --version):
Elasticsearch 7.6.1
Plugins installed: []
Nest 7.6.1
JVM version (java -version):
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 13.0.2+8, mixed mode, sharing)
OS version (uname -a if on a Unix-like system):
Linux c87e626860d1 4.19.76-linuxkit #1 SMP Thu Oct 17 19:31:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
I have a large index that holds historical data, and only documents from the last month are ever updated, so index rollover seemed like a perfect fit for this use case.
The problem is that I need to be able to update and delete records from the last month, but once a rollover happens I end up with duplicate documents that share the same ID.
It seems that rollover only works for data that is never updated, such as application logs.

Steps to reproduce:
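Although my scenario involves editing entries from the last month, this reproduces whenever a rollover happens and an existing document is then updated, so it doesn't matter whether the window is one month or one day.
Set up an ILM policy that rolls over at a maximum of 2 documents, then create the index template and the first index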

PUT _ilm/policy/mytest_rollover_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {           
            "max_docs": 2
          }
        }
      }
    }
  }
}

PUT _template/mytest_template
{
  "index_patterns": ["mytest-*"],                 
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0,
    "index.lifecycle.name": "mytest_rollover_policy",      
    "index.lifecycle.rollover_alias": "mytest"    
  }
}

PUT mytest-000001
{
  "aliases": {
    "mytest": {
      "is_write_index": true
    }
  },
  "mappings": {
    "properties": {
      "name":   { "type": "text"  }     
    }
  }
}

Insert documents and wait for rollover to happen

PUT /mytest/_doc/1
{
 "name" : "Name1"
}

PUT /mytest/_doc/2
{
 "name" : "Name2"
}

PUT /mytest/_doc/3
{
 "name" : "Name3"
}

Make a document update

PUT /mytest/_doc/1
{
 "name" : "Name1111"
}

Query for the documents and notice that the document with ID=1 is duplicated

GET /mytest/_search
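
To make the duplication easier to see without a cluster, here is a minimal pure-Python sketch of the alias behaviour described in this report. It is a toy model, not Elasticsearch code: the `Alias` class and its methods are hypothetical names, and it assumes the rollover to `mytest-000002` happened after all three documents were indexed. Writes through the alias go only to the current write index, while searches fan out to every backing index.

```python
from dataclasses import dataclass, field


@dataclass
class Alias:
    """Toy model of a rollover alias: several backing indices, one write index."""
    indices: dict = field(default_factory=dict)  # index name -> {doc_id: source}
    write_index: str = ""

    def add_write_index(self, name):
        # Models index creation / rollover: the new index becomes the
        # only write target; older indices stay searchable.
        self.indices[name] = {}
        self.write_index = name

    def put(self, doc_id, source):
        # An index request sent to the alias lands in the write index only.
        # A copy with the same ID in an older index is left untouched.
        self.indices[self.write_index][doc_id] = source

    def search(self):
        # A search on the alias covers every backing index.
        return [
            {"_index": name, "_id": doc_id, "_source": source}
            for name, docs in self.indices.items()
            for doc_id, source in docs.items()
        ]


alias = Alias()
alias.add_write_index("mytest-000001")
alias.put("1", {"name": "Name1"})
alias.put("2", {"name": "Name2"})
alias.put("3", {"name": "Name3"})

alias.add_write_index("mytest-000002")  # assumed point at which ILM rolled over
alias.put("1", {"name": "Name1111"})    # the "update" from the steps above

hits = alias.search()
for hit in hits:
    print(hit["_index"], hit["_id"], hit["_source"]["name"])
```

In this model the ID `1` shows up once per index, because the second `put` creates a new document in `mytest-000002` instead of replacing the one in `mytest-000001`.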

I understand that only one index is writable and that the original document cannot be updated through the alias, but because I'm querying through the alias I expected to get only the latest document, the one stored in the active write index.
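
One possible client-side workaround, sketched here under stated assumptions and not taken from this thread, is to collapse the search hits so that only the copy from the newest index survives for each ID. The helper name `latest_per_id` is hypothetical, the sample hits are hand-written stand-ins for `response["hits"]["hits"]`, and the comparison assumes the backing indices use the zero-padded rollover suffix (`-000001`, `-000002`, ...) so that plain string ordering matches creation order.

```python
def latest_per_id(hits):
    """Keep one hit per _id, preferring the hit from the newest index.

    Assumes index names end in a zero-padded rollover counter, so that
    comparing the names as strings orders them from oldest to newest.
    """
    newest = {}
    for hit in hits:
        current = newest.get(hit["_id"])
        if current is None or hit["_index"] > current["_index"]:
            newest[hit["_id"]] = hit
    return list(newest.values())


# Hand-written stand-in for response["hits"]["hits"]; not real cluster output.
sample_hits = [
    {"_index": "mytest-000001", "_id": "1", "_source": {"name": "Name1"}},
    {"_index": "mytest-000001", "_id": "2", "_source": {"name": "Name2"}},
    {"_index": "mytest-000001", "_id": "3", "_source": {"name": "Name3"}},
    {"_index": "mytest-000002", "_id": "1", "_source": {"name": "Name1111"}},
]

deduped = latest_per_id(sample_hits)
for hit in deduped:
    print(hit["_id"], hit["_index"], hit["_source"]["name"])
```

This only de-duplicates the hits that were actually fetched, so it does not fix totals, aggregations, or duplicates split across result pages. The stale copy still exists in the older index until it is deleted or updated there directly.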

@nik9000 nik9000 added the :Data Management/ILM+SLM Index and Snapshot lifecycle management label May 6, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)

@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label May 6, 2020
@dakrone
Member

dakrone commented May 6, 2020

Closing this as a duplicate of #44794

@dakrone dakrone closed this as completed May 6, 2020