Apply index create block when all the nodes of the cluster has breached high disk watermark #4456

Closed
RS146BIJAY opened this issue Sep 8, 2022 · 3 comments
Labels
discuss, distributed framework, enhancement, Indexing & Search

Comments

@RS146BIJAY
Contributor

RS146BIJAY commented Sep 8, 2022

Describe the bug

OpenSearch stops allocating any shards to nodes that have breached the high disk watermark. If all the nodes in the cluster breach the high disk watermark, no new shards can be allocated anywhere in the cluster. If we then try to create a new index on this cluster, it will be a red index, since none of its primary or replica shards can be allocated.
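This is a minimal sketch of why the new index goes red. It rests on several simplifying assumptions. The index has a single primary shard plus `replicas` replicas. Disk usage is given as a fraction per node. A node may receive a shard only while it is below a high watermark, assumed here to be 0.90 (OpenSearch's documented default is 90%). `WatermarkAllocationSketch` and its methods are hypothetical and are not OpenSearch's allocation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of allocating a new single-shard index; not OpenSearch code.
final class WatermarkAllocationSketch {
    // Assumed high watermark, as a fraction of disk used.
    static final double HIGH_WATERMARK = 0.90;

    enum Health { GREEN, YELLOW, RED }

    // A node may receive a new shard only while its disk usage is below the high watermark.
    static boolean canAllocate(double diskUsedFraction) {
        return diskUsedFraction < HIGH_WATERMARK;
    }

    // Health of a freshly created index with one primary and `replicas` replicas.
    static Health newIndexHealth(Map<String, Double> diskUsageByNode, int replicas) {
        List<String> eligible = new ArrayList<>();
        for (Map.Entry<String, Double> e : diskUsageByNode.entrySet()) {
            if (canAllocate(e.getValue())) {
                eligible.add(e.getKey());
            }
        }
        if (eligible.isEmpty()) {
            return Health.RED; // the primary cannot be assigned to any node
        }
        // Each replica must sit on a different node from the primary and from other copies.
        return eligible.size() - 1 >= replicas ? Health.GREEN : Health.YELLOW;
    }
}
```

With every node at or above 0.90, `newIndexHealth` returns `RED` regardless of the replica count. This is the situation the proposal below aims to prevent.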

Proposal

To prevent a red cluster, we propose applying an index create block on the cluster whenever all of its nodes have breached the high disk watermark, so that no red indices can be created. We will modify the DiskThresholdMonitor, which monitors disk watermark thresholds on a cluster and takes appropriate action. It will gain an extra check on whether the high disk watermark is breached on all the nodes in the cluster. If it is, the monitor will apply an index create block on the entire cluster.
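The extra check can be sketched as a pure predicate over per-node disk usage. This is a hypothetical helper, not the monitor's real code. It assumes usage is reported as a fraction of disk used and a high watermark of 0.90. It reads the trigger as the high watermark, in line with the issue title. The empty-map guard matters because `Stream.allMatch` returns true for an empty stream.

```java
import java.util.Map;

// Hypothetical helper illustrating the proposed DiskThresholdMonitor check; not OpenSearch code.
final class CreateIndexBlockCheck {
    // Assumed high watermark, as a fraction of disk used.
    static final double HIGH_WATERMARK = 0.90;

    // True only when the cluster has nodes and every one of them is at or above the
    // high watermark, i.e. no node can accept a new shard and an index create block is due.
    static boolean allNodesAboveHighWatermark(Map<String, Double> diskUsageByNode) {
        // allMatch on an empty stream is vacuously true, so guard against an empty node map.
        return !diskUsageByNode.isEmpty()
                && diskUsageByNode.values().stream().allMatch(used -> used >= HIGH_WATERMARK);
    }
}
```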

Considerations:

  1. Once at least one node of the cluster is back below the high disk watermark, the applied block should be removed automatically.
  2. If an index create block is already applied to the cluster, will this change cause any conflict?
  3. Do we need to handle the case where a node drops back below the high disk watermark during rerouting?
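Considerations 1 and 2 could be handled together by having the monitor remember whether it applied the block itself. The following is a hypothetical sketch. The block is modelled as a string in a set. The manager receives the result of the "all nodes above the high watermark" check. It releases the block as soon as that check is false, which is an assumption that the same threshold governs both applying and releasing. A block that was already present, for example one set by an operator, is never removed by the monitor. The class and block name are illustrative, not OpenSearch APIs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of considerations 1 and 2; names are illustrative, not OpenSearch APIs.
final class CreateIndexBlockManager {
    // Illustrative name for the cluster-wide index create block.
    static final String CREATE_INDEX_BLOCK = "create_index";

    // Stand-in for the cluster's block set, which an operator may also modify.
    final Set<String> clusterBlocks = new HashSet<>();
    // True only if this monitor added the block, so it never lifts an operator's block.
    private boolean appliedByMonitor = false;

    // Called after each disk-usage refresh with the result of the all-nodes check.
    void onDiskCheck(boolean allNodesAboveHighWatermark) {
        if (allNodesAboveHighWatermark) {
            // Set.add returns false if the block is already present (e.g. set by an operator),
            // in which case the monitor does not take ownership of it.
            if (clusterBlocks.add(CREATE_INDEX_BLOCK)) {
                appliedByMonitor = true;
            }
        } else if (appliedByMonitor) {
            // At least one node can accept shards again: lift only the block we applied.
            clusterBlocks.remove(CREATE_INDEX_BLOCK);
            appliedByMonitor = false;
        }
    }
}
```

In this sketch, the decision is recomputed on every refresh. A node that drops below the watermark in the middle of a reroute (consideration 3) would therefore be picked up at the next evaluation rather than needing special handling.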
RS146BIJAY added the bug and untriaged labels on Sep 8, 2022
Bukhtawar added the enhancement label and removed the untriaged and bug labels on Sep 8, 2022
dreamer-89 added the discuss, Indexing & Search, and distributed framework labels on Sep 13, 2022
@Gaganjuneja
Contributor

@RS146BIJAY Definitely a good guard rail to protect the system. Just out of curiosity, could we evaluate the index creation upfront (as a dry run) to check whether it would turn the cluster red, and respond with the reason? The reason could be anything: low disk space, low capacity, the number of shards exceeding the node-level limit, etc.
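A dry run of this kind could return a list of reasons instead of a plain yes/no. This is a hypothetical sketch and not OpenSearch's validation logic. `NodeStats` and the 0.90 high watermark are assumptions. The cluster-wide shard limit, computed as `maxShardsPerNode` times the number of nodes, is loosely modelled on a per-node shard limit such as OpenSearch's `cluster.max_shards_per_node` setting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical dry-run pre-check; reasons and limits are illustrative, not OpenSearch's.
final class CreateIndexDryRun {
    // Assumed high watermark, as a fraction of disk used.
    static final double HIGH_WATERMARK = 0.90;

    // Minimal per-node view used by the pre-check.
    static final class NodeStats {
        final double diskUsedFraction;
        final int shardCount;

        NodeStats(double diskUsedFraction, int shardCount) {
            this.diskUsedFraction = diskUsedFraction;
            this.shardCount = shardCount;
        }
    }

    // Returns every reason the new index (with `newShards` shards in total) would fail to
    // allocate; an empty list means the pre-check passed at the moment it ran.
    static List<String> check(Map<String, NodeStats> nodes, int newShards, int maxShardsPerNode) {
        List<String> reasons = new ArrayList<>();
        boolean anyNodeHasDisk = nodes.values().stream()
                .anyMatch(n -> n.diskUsedFraction < HIGH_WATERMARK);
        if (!anyNodeHasDisk) {
            reasons.add("all nodes are above the high disk watermark");
        }
        long currentShards = nodes.values().stream().mapToLong(n -> n.shardCount).sum();
        long shardLimit = (long) maxShardsPerNode * nodes.size();
        if (currentShards + newShards > shardLimit) {
            reasons.add("shard count would exceed the per-node shard limit");
        }
        return reasons;
    }
}
```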

@Bukhtawar
Collaborator

The problem with doing a dry run is that we would usually take an optimistic locking approach, assuming that nothing else has changed between the dry run and the actual call. However, it is quite possible that the pre-check validations pass but the actual call still fails, because multiple concurrent requests have changed the state of the system in between.

We also need to understand how shards get assigned. When the leader assigns shards to nodes for them to start, it picks nodes based on an allocation algorithm. It's quite possible that disks fill up between the assignment and the actual shard initialisation, so it's tricky to get this right. Let me know if you have thoughts here.
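The race described in this comment can be made concrete with a versioned snapshot. In this hypothetical sketch, the cluster is reduced to a count of free shard slots. `dryRun` validates against the current snapshot. `commit` applies the change with a compare-and-set, so it succeeds only if nothing changed since that snapshot. Two concurrent requests can both pass the dry run while only one commit succeeds. This illustrates why a pre-check alone cannot guarantee the outcome.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a dry run followed by an optimistic (compare-and-set) commit.
final class OptimisticCreateSketch {
    // Immutable cluster snapshot: a version plus the number of free shard slots.
    static final class State {
        final long version;
        final int freeSlots;

        State(long version, int freeSlots) {
            this.version = version;
            this.freeSlots = freeSlots;
        }
    }

    private final AtomicReference<State> state;

    OptimisticCreateSketch(int freeSlots) {
        state = new AtomicReference<>(new State(0, freeSlots));
    }

    // Dry run: validate against the current snapshot and return it, or null if it doesn't fit.
    State dryRun(int shards) {
        State snapshot = state.get();
        return snapshot.freeSlots >= shards ? snapshot : null;
    }

    // Commit succeeds only if the state is still exactly the snapshot the dry run saw.
    boolean commit(State seen, int shards) {
        if (seen == null || seen.freeSlots < shards) {
            return false;
        }
        State next = new State(seen.version + 1, seen.freeSlots - shards);
        return state.compareAndSet(seen, next);
    }
}
```

For example, with 3 free slots, two requests for 2 shards each both get a snapshot from `dryRun`. The first `commit` succeeds, and the second fails because the state it validated is stale.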

@Gaganjuneja
Contributor

Gaganjuneja commented Oct 3, 2022

Thanks @Bukhtawar for clarifying; that makes sense. Optimistic locking would definitely be a very costly operation here. We could consider something like resource allocation instead. Each request has certain system requirements in terms of CPU, disk, memory, etc. We could try to reserve these resources for the request. If the reservation succeeds, the actual request is processed; otherwise, any resources already reserved are released. This would also help in managing the overall cluster resources. I would like to hear your thoughts on this.
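The reservation idea can be sketched as an all-or-nothing reserve over several resource types. This hypothetical `ResourceReservations` class is not an existing OpenSearch component. It assumes each resource has a fixed capacity in arbitrary units. It uses a compare-and-set loop per resource so that concurrent callers cannot oversubscribe. It rolls back partial reservations when any requested resource does not fit.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical all-or-nothing resource reservation; not an OpenSearch component.
final class ResourceReservations {
    enum Resource { CPU, DISK, MEMORY }

    private final Map<Resource, Long> capacity = new EnumMap<>(Resource.class);
    private final Map<Resource, AtomicLong> reserved = new EnumMap<>(Resource.class);

    ResourceReservations(Map<Resource, Long> capacity) {
        this.capacity.putAll(capacity);
        for (Resource r : Resource.values()) {
            reserved.put(r, new AtomicLong());
        }
    }

    // Reserve `amount` of one resource if it fits; retries if another thread raced us.
    private boolean tryReserveOne(Resource r, long amount) {
        AtomicLong slot = reserved.get(r);
        long cap = capacity.getOrDefault(r, 0L);
        while (true) {
            long current = slot.get();
            if (current + amount > cap) {
                return false;
            }
            if (slot.compareAndSet(current, current + amount)) {
                return true;
            }
        }
    }

    // Reserve every requested resource, or roll back the ones already taken and fail.
    boolean tryReserve(Map<Resource, Long> request) {
        List<Resource> taken = new ArrayList<>();
        for (Map.Entry<Resource, Long> e : request.entrySet()) {
            if (!tryReserveOne(e.getKey(), e.getValue())) {
                for (Resource r : taken) {
                    reserved.get(r).addAndGet(-request.get(r));
                }
                return false;
            }
            taken.add(e.getKey());
        }
        return true;
    }

    // Return a previously successful reservation once the request finishes or fails.
    void release(Map<Resource, Long> request) {
        request.forEach((r, amount) -> reserved.get(r).addAndGet(-amount));
    }

    long reserved(Resource r) {
        return reserved.get(r).get();
    }
}
```

How to size the request for an index creation, for example the expected on-disk size of its shards, is left open here, as it is in the discussion.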

RS146BIJAY changed the title from "Apply index create block when all the nodes of the cluster has breached low disk watermark" to "Apply index create block when all the nodes of the cluster has breached high disk watermark" on Oct 26, 2022

4 participants