
Add shard allocation explain API to explain why shards are (or aren't) UNASSIGNED #14593

Closed
dakrone opened this issue Nov 6, 2015 · 12 comments
Labels: discuss, :Distributed/Allocation, >feature, high hanging fruit


@dakrone
Member

dakrone commented Nov 6, 2015

Idea

Relates to a comment on #8606 and supersedes #14405.

We currently have the /_cluster/reroute API which, with the explain and
dry_run parameters, allows a user to manually specify an allocation command and
get back an explanation for why that shard can or cannot undergo the requested
allocation. This is useful; however, it requires the user to know which node a
shard should be on, and to construct an allocation command for the shard.

I would like to build an API to answer one of the most often asked questions:
"Why is my shard UNASSIGNED?"

Instead of requiring the user to specify a target node and construct an allocation command, I envision an API that looks like:

GET /_cluster/allocation/explain
{
  "index": "myindex"
  "shard": 0,
  "primary": false
}

Which is basically asking "explain the allocation for a replica of shard 0 for
the 'myindex' index".

Here's an idea of how I'd like the response to look:

{
    "shard": {
        "index": "myindex"
        "shard": 0,
        "primary": false
    },
    "cluster_info": {
        "nodes": {
            "nodeuuid1": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            },
            "nodeuuid2": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            },
            "nodeuuid3": {
                "lowest_usage": {
                    "path": "/var/data1",
                    "free_bytes": 1000,
                    "used_bytes": 400,
                    "total_bytes": 1400,
                    "free_disk_percentage": "71.3%"
                    "used_disk_percentage": "28.6%"
                },
                "highest_usage": {
                    "path": "/var/data2",
                    "free_bytes": 1200,
                    "used_bytes": 600,
                    "total_bytes": 1800,
                    "free_disk_percentage": "66.6%"
                    "used_disk_percentage": "33.3%"
                }
            }
        },
        "shard_sizes": {
            "[myindex][0][P]": 1228718,
            "[myindex][0][R]": 1231289,
            "[myindex][1][P]": 1248718,
            "[myindex][1][R]": 1298718,
        }
    },
    "nodes": {
        "nodeuuid1": {
            "final_decision": "NO"
            "decisions": [
                {
                    "decider" : "same_shard",
                    "decision" : "NO",
                    "explanation" : "shard cannot be allocated on same node [JZU4UIPFQtWn34FyAH6VoQ] it already exists on"
                },
                {
                    "decider" : "snapshot_in_progress",
                    "decision" : "NO",
                    "explanation" : "a snapshot is in progress"
                }
            ],
            "weight": 1.9
        },
        "nodeuuid2": {
            "final_decision": "NO",
            "decisions": [
                {
                    "decider" : "node_version",
                    "decision" : "NO",
                    "explanation" : "target node version [1.4.0] is older than source node version [1.7.3]"
                }
            ],
            "weight": 1.3
        },
        "nodeuuid3": {
            "final_decision": "YES",
            "decisions": [],
            "weight": 0.9
        }
    }
}

Breaking down the parts:

shard

The same information that was passed into the request, echoed back so the
response identifies exactly which shard it describes.

cluster_info

This roughly relates to #14405;
however, I realized this information is not node-specific, so putting it in the nodes
stats API doesn't make sense. Instead, it's master-only information used for
allocation. Since it is gathered and used for allocation decisions, it makes sense to
expose it here, where it influences the final decision.

nodes

This is a map where each key is the node UUID (it should probably include the node
name as well to be helpful). It has these sub-keys:

final_decision

A simple "YES" or "NO" for whether the shard can currently be allocated on this
node.

decisions

A list of all the "NO" decisions preventing allocation on this node. I could see
a flag being added to include all the "YES" decisions, but I think it should
default to showing "NO" only, to keep the output from becoming too verbose.

weight

I'd like to change the ShardsAllocator interface to add the ability to "weigh"
a shard for a node, using whatever criteria it normally balances with. For
example, with the BalancedShardsAllocator, the weight would be the existing
calculation based on the number of shards of the index on the node as well as
the node's total shard count.

I could see this being useful for answering the question "Okay, if all the
decisions were 'YES', where would this shard likely end up?".

It might be trickier to implement, but it could be added on at a later time.
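
To make the idea more concrete, here is a rough, self-contained sketch (in Java, but not the actual ShardsAllocator interface; all names and values are illustrative) of what weighing a shard for a node could look like. It mirrors the shape of the BalancedShardsAllocator balance factors: nodes holding fewer shards than the cluster average get a lower, more attractive weight.

```java
class ShardWeightSketch {

    // Illustrative stand-ins for the balance factors behind
    // cluster.routing.allocation.balance.shard / .index
    private static final float SHARD_BALANCE = 0.45f;
    private static final float INDEX_BALANCE = 0.55f;

    static float weight(int shardsOnNode, float avgShardsPerNode,
                        int indexShardsOnNode, float avgIndexShardsPerNode) {
        float sum = SHARD_BALANCE + INDEX_BALANCE;
        float theta0 = SHARD_BALANCE / sum;   // weight of the node's total shard count
        float theta1 = INDEX_BALANCE / sum;   // weight of the per-index shard count
        return theta0 * (shardsOnNode - avgShardsPerNode)
             + theta1 * (indexShardsOnNode - avgIndexShardsPerNode);
    }

    public static void main(String[] args) {
        // A node below both averages is "lighter", so the shard would likely
        // end up there if every decider said YES.
        System.out.println(weight(10, 12.0f, 1, 2.0f)); // ~ -1.45 (attractive)
        System.out.println(weight(14, 12.0f, 3, 2.0f)); // ~  1.45 (less attractive)
    }
}
```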

Conclusion

I think this would go a long way towards helping users understand from a
cluster-level (instead of a single node-level) perspective why their shard can
or cannot be allocated.

Thoughts and feedback welcome! This is potentially a non-trivial amount of work
so I wanted to see what people thought before spending time implementing it.

@seang-es

seang-es commented Nov 6, 2015

This would be an extremely helpful feature for us. We get tons of cases where the customer has unassigned shards and does not know why, and until we get logs from them, we have no answer. A quick answer would help the user self-diagnose. If it could be linked with _cat/shards to give us a single coherent display showing what's unassigned and why, it would be incredible.

@ppf2
Member

ppf2 commented Nov 6, 2015

Finally, great to see this API being discussed and planned! Linking #9471 as well.

+1 on a single API to get all the information/causes on why a shard cannot be assigned. It doesn't look like this one will cover the corruption scenario, so the user will still have to use the other API (#11545)? Having a single API will not only be helpful for admins but will also facilitate Marvel's future reporting of unassigned shards and their reasons.

@GlenRSmith
Contributor

👍 How about having the request body accept properties that allow interrogation based on shard state?

@dakrone
Member Author

dakrone commented Nov 6, 2015

How about the request body have properties that allow interrogation based on shard states?

I'm not sure what you mean by this, can you elaborate?

@GlenRSmith
Contributor

I'm not sure what you mean by this, can you elaborate?

Skip the step of finding the unallocated shards and asking about each of them individually, and just

GET /_cluster/allocation/explain
{
  "index": "myindex"
  "shard": {
    "state": "unallocated"
  }
}

Not a literal proposal of form, but gets my intent across, I think.

@clintongormley

I like the idea a lot, but I'd reorganise things a bit, especially so it works better in the 500-node case. I think the disk usage stats should also be available in the nodes stats API. In this API, I'd perhaps include the (relevant?) disk usage stats in the disk usage decider, rather than in a separate section (i.e. put them where they are relevant).

Perhaps default to a non-verbose output which doesn't list every YES decision, but only the first NO decision for each node.

Btw, what is this supposed to represent?

    "shard_sizes": {
        "[myindex][0][P]": 1228718,
        "[myindex][0][R]": 1231289,
        "[myindex][1][P]": 1248718,
        "[myindex][1][R]": 1298718,
    }

@dakrone
Member Author

dakrone commented Nov 9, 2015

I think the disk usage stats should also be available in the nodes stats API.

I think this is doable, but is more complex because the stats are only available to the master node, so they're much harder to retrieve.

Perhaps default to a non-verbose output which doesn't list every YES decision, but only the first NO decision for each node.

This is similar to what I proposed: show all "NO" decisions (showing only the first one doesn't help, since it could be a red herring for people to fix) and elide all the "YES" decisions by default.

Btw, what is this supposed to represent? ... shard_sizes ...

The ClusterInfoService gathers shard sizes every 30 seconds for use by the disk decider. They aren't node-specific, so they don't really belong in the nodes stats, and I figured they might be useful here.

@dakrone
Member Author

dakrone commented Nov 9, 2015

I'd perhaps include the disk (relevant?) usage stats in the disk usage decider

Hmm... we could add a public ToXContent supplementaryInfo() method to the AllocationDecider class that would allow certain deciders to return any additional information. Interesting idea...
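
As a very rough sketch of that idea: the real AllocationDecider API differs and the proposal above would return a ToXContent fragment, but to keep this self-contained the hypothetical decider below just returns a plain map of extra fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical shape of the supplementaryInfo() idea: most deciders report
// nothing extra, while a decider that holds relevant state (e.g. disk usage)
// can surface it so the explain API renders it next to its decisions.
abstract class AllocationDeciderSketch {

    Map<String, Object> supplementaryInfo() {
        return Map.of(); // nothing extra by default
    }
}

class DiskUsageDeciderSketch extends AllocationDeciderSketch {
    @Override
    Map<String, Object> supplementaryInfo() {
        Map<String, Object> info = new LinkedHashMap<>();
        // Example values a disk-based decider might have based a NO decision on.
        info.put("free_bytes", 1000L);
        info.put("used_bytes", 400L);
        info.put("free_disk_percentage", "71.4%");
        return info;
    }
}
```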

@jordansissel
Contributor

I would also love this feature. +1 - Here's my story:

One scenario that is hard to debug (which I ran into today) is if I configure the ES cluster.routing.allocation.awareness.attributes setting to a nonsense attribute name (meaning, one that doesn't exist and is not set on any nodes), I will end up with new indices that are not able to allocate any shards. The explanation if I use _reroute?dry_run&explain today is slightly confusing, and as I understand it, the reroute API is a very advanced feature, so exposing newbies to it would be not so nice.
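
For reference, a minimal way to reproduce that scenario looks something like this (the setting name is the real awareness setting; "rack_zone_typo" stands in for an attribute that no node actually defines):

```
PUT /_cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack_zone_typo"
  }
}
```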

Having a friendlier (less advanced, whatever) way of asking why a shard is not assigned/allocated would be ever so lovely. <3 <3

@pickypg
Member

pickypg commented Feb 12, 2016

Came across this while searching for this kind of feature. Definitely want! +1

@bleskes
Contributor

bleskes commented Mar 1, 2016

I looked at this again and I like it. I agree that it runs the danger of being too verbose. Here are some minor suggestions:

  1. If we can't put the disk free stats in the node stats, I would output them last. Same goes for shard sizes. This assumes that a NO decision will report the values it based its decision on (which might make these sections completely irrelevant).
  2. Put the "yes" nodes first. We should be able to see at a glance whether a shard can be assigned.
  3. Try to rename the nodes section; I find it confusing. My only alternative is "possible_allocations", though that's longish.

Last, why do we need to specify a primary flag in the request?

@dakrone
Member Author

dakrone commented Mar 1, 2016

Last - why do we need to specify a primary flag in the request?

Allocation deciders can have different rules for primary versus replica shards. Additionally, when retrieving the shard from the cluster state, I plan to include the unassigned_info in the response as well, so we need to differentiate between the two.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Mar 28, 2016
This adds a new `/_cluster/allocation/explain` API that explains why a
shard can or cannot be allocated to nodes in the cluster. Additionally,
it will show where the master *desires* to put the shard, according to
the `ShardsAllocator`.

It looks like this:

```
GET /_cluster/allocation/explain?pretty
{
  "index": "only-foo",
  "shard": 0,
  "primary": false
}
```

Though, you can optionally send an empty body, which means "explain the
allocation for the first unassigned shard you find".

The output when a shard is unassigned looks like this:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : false
  },
  "assigned" : false,
  "unassigned_info" : {
    "reason" : "INDEX_CREATED",
    "at" : "2016-03-22T20:04:23.620Z"
  },
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 0.06666675,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "NO",
      "weight" : -1.3833332,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 2.3166666,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

And when the shard *is* assigned, the output looks like:

```
{
  "shard" : {
    "index" : "only-foo",
    "index_uuid" : "KnW0-zELRs6PK84l0r38ZA",
    "id" : 0,
    "primary" : true
  },
  "assigned" : true,
  "assigned_node_id" : "Qc6VL8c5RWaw1qXZ0Rg57g",
  "nodes" : {
    "V-Spi0AyRZ6ZvKbaI3691w" : {
      "node_name" : "Susan Storm",
      "node_attributes" : {
        "bar" : "baz"
      },
      "final_decision" : "NO",
      "weight" : 1.4499999,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    },
    "Qc6VL8c5RWaw1qXZ0Rg57g" : {
      "node_name" : "Slipstream",
      "node_attributes" : {
        "bar" : "baz",
        "foo" : "bar"
      },
      "final_decision" : "CURRENTLY_ASSIGNED",
      "weight" : 0.0,
      "decisions" : [ {
        "decider" : "same_shard",
        "decision" : "NO",
        "explanation" : "the shard cannot be allocated on the same node id [Qc6VL8c5RWaw1qXZ0Rg57g] on which it already exists"
      } ]
    },
    "PzdyMZGXQdGhqTJHF_hGgA" : {
      "node_name" : "The Symbiote",
      "node_attributes" : { },
      "final_decision" : "NO",
      "weight" : 3.6999998,
      "decisions" : [ {
        "decider" : "filter",
        "decision" : "NO",
        "explanation" : "node does not match index include filters [foo:\"bar\"]"
      } ]
    }
  }
}
```

Only "NO" decisions are returned by default, but all decisions can be
shown by specifying the `?include_yes_decisions=true` parameter in the
request.
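
For example, the same request as above with the flag added would be:

```
GET /_cluster/allocation/explain?include_yes_decisions=true
{
  "index": "only-foo",
  "shard": 0,
  "primary": false
}
```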

Resolves elastic#14593
@lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@clintongormley added the :Distributed/Distributed and :Distributed/Allocation labels and removed the :Cluster and :Distributed/Distributed labels on Feb 13, 2018