
Add shard allocation explain API to explain why shards are (or aren't) UNASSIGNED #14593

Closed
dakrone opened this issue Nov 6, 2015 · 12 comments
Labels: discuss, :Distributed/Allocation, >feature, high hanging fruit


@dakrone
Member

dakrone commented Nov 6, 2015

Idea

Relates to a comment on #8606 and supersedes #14405.

We currently have the /_cluster/reroute API which, with the explain and
dry_run parameters, allows a user to manually specify an allocation command and
get back an explanation for why that shard can or cannot undergo the requested
allocation. This is useful; however, it requires the user to know which node a
shard should be on, and to construct an allocation command for the shard.

I would like to build an API to answer one of the most often asked questions:
"Why is my shard UNASSIGNED?"

Instead of requiring the user to specify a target node and construct an allocation command, I envision an API that looks like:

GET /_cluster/allocation/explain
{
  "index": "myindex"
  "shard": 0,
  "primary": false
}

Which is basically asking "explain the allocation for a replica of shard 0 for
the 'myindex' index".

Here's an idea of how I'd like the response to look:

{
    "shard": {
        "index": "myindex"
        "shard": 0,
        "primary": false
    },
    "cluster_info": {
        "nodes": {
            "nodeuuid1": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            },
            "nodeuuid2": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            },
            "nodeuuid3": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            }
        },
        "shard_sizes": {
            "[myindex][0][P]": 1228718,
            "[myindex][0][R]": 1231289,
            "[myindex][1][P]": 1248718,
            "[myindex][1][R]": 1298718,
        }
    },
    "nodes": {
        "nodeuuid1": {
            "final_decision": "NO"
            "decisions": [
                {
                    "decider" : "same_shard",
                    "decision" : "NO",
                    "explanation" : "shard cannot be allocated on same node [JZU4UIPFQtWn34FyAH6VoQ] it already exists on"
                },
                {
                    "decider" : "snapshot_in_progress",
                    "decision" : "NO",
                    "explanation" : "a snapshot is in progress"
                }
            ],
            "weight": 1.9
        },
        "nodeuuid2": {
            "final_decision": "NO",
            "decisions": [
                {
                    "decider" : "node_version",
                    "decision" : "NO",
                    "explanation" : "target node version [1.4.0] is older than source node version [1.7.3]"
                }
            ],
            "weight": 1.3
        },
        "nodeuuid3": {
            "final_decision": "YES",
            "decisions": [],
            "weight": 0.9
        }
    }
}

Breaking down the parts:

shard

The same information that was passed into the request, echoed back so the
response identifies exactly which shard it describes.

cluster_info

This roughly relates to #14405;
however, I realized this information is not node-specific, so putting it in the nodes
stats API doesn't make sense. Instead, it's master-only information used for
allocation. Since it is gathered and used for allocation decisions, it makes sense to
expose it here, where it influences the final decision.

nodes

This is a map where each key is the node UUID (it should probably include the node
name as well to be helpful). It has these sub-keys:

final_decision

A simple "YES" or "NO" for whether the shard can currently be allocated on this
node.

decisions

A list of all the "NO" decisions preventing allocation on this node. I could see
a flag being added to include all the "YES" decisions, but I think it should
default to showing "NO" only, to keep the output from becoming too verbose.

weight

I'd like to change the ShardsAllocator interface to add the ability to "weigh"
a shard for a node, using whatever criteria it normally balances with. For
example, with the BalancedShardsAllocator, the weight would be the existing
calculation based on the number of shards of the index on the node as well as
the node's total shard count.

I could see this being useful for answering the question "Okay, if all the
decisions were 'YES', where would this shard likely end up?".

It might be trickier to implement, but it could be added on at a later time.
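
To make the idea more concrete, here is a rough, self-contained sketch (in Java, but not the actual ShardsAllocator interface; all names and values are illustrative) of what weighing a shard for a node could look like. It mirrors the shape of the BalancedShardsAllocator balance factors: nodes holding fewer shards than the cluster average get a lower, more attractive weight.

```java
class ShardWeightSketch {

    // Illustrative stand-ins for the balance factors behind
    // cluster.routing.allocation.balance.shard / .index
    private static final float SHARD_BALANCE = 0.45f;
    private static final float INDEX_BALANCE = 0.55f;

    static float weight(int shardsOnNode, float avgShardsPerNode,
                        int indexShardsOnNode, float avgIndexShardsPerNode) {
        float sum = SHARD_BALANCE + INDEX_BALANCE;
        float theta0 = SHARD_BALANCE / sum;   // weight of the node's total shard count
        float theta1 = INDEX_BALANCE / sum;   // weight of the per-index shard count
        return theta0 * (shardsOnNode - avgShardsPerNode)
             + theta1 * (indexShardsOnNode - avgIndexShardsPerNode);
    }

    public static void main(String[] args) {
        // A node below both averages is "lighter", so the shard would likely
        // end up there if every decider said YES.
        System.out.println(weight(10, 12.0f, 1, 2.0f)); // ~ -1.45 (attractive)
        System.out.println(weight(14, 12.0f, 3, 2.0f)); // ~  1.45 (less attractive)
    }
}
```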

Conclusion

I think this would go a long way towards helping users understand from a
cluster-level (instead of a single node-level) perspective why their shard can
or cannot be allocated.

Thoughts and feedback welcome! This is potentially a non-trivial amount of work
so I wanted to see what people thought before spending time implementing it.

@seang-es

seang-es commented Nov 6, 2015

This would be an extremely helpful feature for us. We get tons of cases where the customer has unassigned shards and does not know why, and until we get logs from them, we have no answer. A quick answer would help the user self-diagnose. If it could be linked with _cat/shards to give us a single coherent display showing what's unassigned and why, it would be incredible.

@ppf2
Member

ppf2 commented Nov 6, 2015

Finally, great to see this API being discussed and planned! Linking #9471 as well.

+1 on a single API to get all the information/causes on why a shard cannot be assigned. It doesn't look like this one will cover the corruption scenario, so the user will still have to use the other API (#11545)? Having a single API will not only be helpful for admins but will also facilitate Marvel's future reporting of unassigned shards and their reasons.

@GlenRSmith
Contributor

👍 How about having the request body accept properties that allow interrogation based on shard state?

@dakrone
Member Author

dakrone commented Nov 6, 2015

How about the request body have properties that allow interrogation based on shard states?

I'm not sure what you mean by this, can you elaborate?

@GlenRSmith
Contributor

I'm not sure what you mean by this, can you elaborate?

Skip the step of finding the unallocated shards and asking about each of them individually, and just

GET /_cluster/allocation/explain
{
  "index": "myindex"
  "shard": {
    "state": "unallocated"
  }
}

Not a literal proposal of form, but gets my intent across, I think.

@clintongormley

I like the idea a lot, but I'd reorganise things a bit, especially so it works better in the 500-node case. I think the disk usage stats should also be available in the nodes stats API. In this API, I'd perhaps include the (relevant?) disk usage stats in the disk usage decider, rather than in a separate section (i.e. put them where they are relevant).

Perhaps default to a non-verbose output which doesn't list every YES decision, but only the first NO decision for each node.

Btw, what is this supposed to represent?

    "shard_sizes": {
        "[myindex][0][P]": 1228718,
        "[myindex][0][R]": 1231289,
        "[myindex][1][P]": 1248718,
        "[myindex][1][R]": 1298718,
    }

@dakrone
Member Author

dakrone commented Nov 9, 2015

I think the disk usage stats should also be available in the nodes stats API.

I think this is doable, but is more complex because the stats are only available to the master node, so they're much harder to retrieve.

Perhaps default to a non-verbose output which doesn't list every YES decision, but only the first NO decision for each node.

This is similar to what I proposed: show all "NO" decisions (showing only the first one doesn't help, since it could be a red herring for people to fix) and elide all the "YES" decisions by default.

Btw, what is this supposed to represent? ... shard_sizes ...

The ClusterInfoService gathers shard sizes every 30 seconds for use by the disk decider. They aren't node-specific, so they don't really belong in the nodes stats, and I figured they might be useful here.

@dakrone
Member Author

dakrone commented Nov 9, 2015

I'd perhaps include the disk (relevant?) usage stats in the disk usage decider

Hmm... we could add a public ToXContent supplementaryInfo() method to the AllocationDecider class that would allow certain deciders to return any additional information. Interesting idea...
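
As a very rough sketch of that idea: the real AllocationDecider API differs and the proposal above would return a ToXContent fragment, but to keep this self-contained the hypothetical decider below just returns a plain map of extra fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape of the supplementaryInfo() idea: most deciders report
// nothing extra, while a decider that holds relevant state (e.g. disk usage)
// can surface it so the explain API renders it next to its decisions.
abstract class AllocationDeciderSketch {

    Map<String, Object> supplementaryInfo() {
        return Map.of(); // nothing extra by default
    }
}

class DiskUsageDeciderSketch extends AllocationDeciderSketch {
    @Override
    Map<String, Object> supplementaryInfo() {
        Map<String, Object> info = new LinkedHashMap<>();
        // Example values a disk-based decider might have based a NO decision on.
        info.put("free_bytes", 1000L);
        info.put("used_bytes", 400L);
        info.put("free_disk_percentage", "71.4%");
        return info;
    }
}
```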

@jordansissel
Contributor

I would also love this feature. +1 - Here's my story:

One scenario that is hard to debug (which I ran into today) is if I configure the ES cluster.routing.allocation.awareness.attributes setting to a nonsense attribute name (meaning, one that doesn't exist and is not set on any nodes), I will end up with new indices that are not able to allocate any shards. The explanation if I use _reroute?dry_run&explain today is slightly confusing, and as I understand it, the reroute API is a very advanced feature, so exposing newbies to it would be not so nice.
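
For reference, a minimal way to reproduce that scenario looks something like this (the setting name is the real awareness setting; "rack_zone_typo" stands in for an attribute that no node actually defines):

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_zone_typo"
  }
}
```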

Having a friendlier (less advanced, whatever) way of asking why a shard is not assigned/allocated would be ever so lovely. <3 <3

@pickypg
Member

pickypg commented Feb 12, 2016

Came across this while searching for this kind of feature. Definitely want! +1

@bleskes
Contributor

bleskes commented Mar 1, 2016

I looked at this again and I like it. I agree that it runs the danger of being too verbose. Here are some minor suggestions:

  1. If we can't put the disk free stats in the node stats, I would output them last. Same goes for shard sizes. This assumes that a NO decision will report the values it based its decision on (which might make these sections completely irrelevant).
  2. Put the "yes" nodes first. We should be able to see at a glance whether a shard can be assigned.
  3. Try to rename the nodes section; I find it confusing. My only alternative is "possible_allocations", though that's longish.

Last, why do we need to specify a primary flag in the request?

@dakrone
Member Author

dakrone commented Mar 1, 2016

Last - why do we need to specify a primary flag in the request?

Allocation deciders can have different rules for primary versus replica shards. Additionally, when retrieving the shard from the cluster state, I plan to include the unassigned_info in the response as well, so we need to differentiate between the two.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 28, 2016
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.

It looks like this:

```
GET /_cluster/allocation/explain?pretty
{
  "index": "only-foo",
  "shard": 0,
  "primary": false
}
```

Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".

The output when a shard is unassigned looks like this:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : false
  },
  "assigned" : false,
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2016-03-22T20:04:23.620Z"
  },
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 0.06666675,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "NO",
      "weight" : -1.3833332,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 2.3166666,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

And when the shard *is* assigned, the output looks like:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : true
  },
  "assigned" : true,
  "assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 1.4499999,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "CURRENTLY_ASSIGNED",
      "weight" : 0.0,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 3.6999998,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
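
For example, the same request as above with the flag added would be:

```
GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": "only-foo",
  "shard": 0,
  "primary": false
}
```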

Resolves elastic#14593
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@clintongormley added the :Distributed/Distributed and :Distributed/Allocation labels and removed the :Cluster and :Distributed/Distributed labels on Feb 13, 2018