Visibility problem when cluster.routing.allocation.awareness.attributes is misconfigured #16195

jordansissel · 2016-01-22T22:55:00Z

Scenario: If a cluster is misconfigured such that all nodes have cluster.routing.allocation.awareness.attributes=some_missing_attribute and zero nodes actually set this attribute, then shard allocation will fail and new indexes will have unallocated shards.
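The failure mode can be illustrated with a simplified model of the awareness check. This is only a sketch, not Elasticsearch's actual decider code: the function name and the second node ("othernode") are hypothetical, and it models just the one rule reported later in this thread, that a node lacking a configured awareness attribute is rejected.

```python
def awareness_decision(node_attrs, awareness_attrs):
    """Return ("YES", None) or ("NO", reason) for a single node.

    Simplified, hypothetical model: a node is rejected as soon as it
    lacks one of the configured awareness attributes.
    """
    for attr in awareness_attrs:
        if attr not in node_attrs:
            return ("NO", f"node does not contain the awareness attribute [{attr}]")
    return ("YES", None)


# No node defines "foobar", so every node answers NO and the shard
# has nowhere to go.
nodes = {"happynode": {}, "othernode": {"rack": "r1"}}
decisions = {
    name: awareness_decision(attrs, ["foobar"]) for name, attrs in nodes.items()
}
```

Under this model, the shard stays unassigned whenever every entry in `decisions` is a NO.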

The symptoms are:

One or more indices are red

The challenge is that this scenario is very difficult to debug. @dakrone kindly pointed me at a nifty (advanced!) trick for asking Elasticsearch why it made its allocation decision, but the resulting explanation is unhelpful (details below).

To reproduce this:

Run Elasticsearch with the default configuration file and the following additional settings:

cluster.routing.allocation.awareness.attributes=foobar

Create a new index:

% curl -XPUT localhost:9200/example -d '{}'
{"acknowledged":true}

Check the status:

% curl -s localhost:9200/_cat/indices|grep example
red open example               5 1
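For scripting this check, the `_cat/indices` text can be filtered without grep. The helper below is a hypothetical sketch that assumes the first whitespace-separated column is the health and the third is the index name, as in the output above; it works on text that has already been fetched and makes no HTTP call itself.

```python
def red_indices(cat_indices_output):
    """Return the names of indices whose health column reads "red"."""
    names = []
    for line in cat_indices_output.splitlines():
        fields = line.split()
        # Assumed column order: health, status, index name, ...
        if len(fields) >= 3 and fields[0] == "red":
            names.append(fields[2])
    return names
```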

Troubleshooting: Checking the Elasticsearch logs (at the default level), I don't see any information hinting at allocation issues.
Debugging: Try a dry run allocation via _cluster/reroute:

% curl -s 'localhost:9200/_cluster/reroute?dry_run&explain&pretty' -d '{ "commands": [ { "allocate": { "index": "example", "shard": 0, "node": "happynode" } } ] }' | jq '.explanations'
[
  {
    "command": "allocate",
    "parameters": {
      "index": "example",
      "shard": 0,
      "node": "happynode",
      "allow_primary": false
    },
    "decisions": [
      {
        "decider": "allocate_allocation_command",
        "decision": "NO",
        "explanation": "trying to allocate a primary shard [example][0], which is disabled"
      }
    ]
  }
]

Overall, I believe Elasticsearch is acting correctly (it has nowhere to route shards because of our configuration!). However, my concern is the lack of visibility into this issue for users:

The debugging trickery via _cluster/reroute uses phrasing that I interpret to mean that allocation is disabled.
It's unclear how, without the reroute trick, to ask Elasticsearch for hints on troubleshooting the misconfiguration.

For a user, the correction would be to have at least one data node with node.foobar: whatever (more generally, an awareness attribute must exist on at least one data node). Elasticsearch could also provide clearer logging and/or response hints that tell users something along the lines of "Could not allocate shard on any nodes because no nodes match the criteria: has attribute foobar".
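The kind of hint requested here could be derived as in the following sketch. It is an illustration of the proposal, not existing Elasticsearch behavior: both function names are hypothetical, and the node attributes are assumed to be available as a plain mapping of node name to attribute dictionary.

```python
def missing_awareness_attributes(data_nodes, awareness_attrs):
    """List configured awareness attributes that no data node defines."""
    return [
        attr
        for attr in awareness_attrs
        if not any(attr in attrs for attrs in data_nodes.values())
    ]


def allocation_hint(data_nodes, awareness_attrs):
    """Return a human-readable hint, or None if every attribute exists somewhere."""
    missing = missing_awareness_attributes(data_nodes, awareness_attrs)
    if not missing:
        return None
    return (
        "Could not allocate shard on any nodes because no nodes match "
        "the criteria: has attribute " + ", ".join(missing)
    )
```

With the reproduction above, `allocation_hint({"happynode": {}}, ["foobar"])` produces the suggested message, and it returns `None` once any data node sets `node.foobar`.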


clintongormley · 2016-01-26T10:25:16Z

Related to #14593 and #12412

clintongormley · 2016-01-26T10:26:23Z

The shard allocation explain API (#14593) would be a big win here, but I agree that the logging and failure messages can be improved.

DaveCTurner · 2018-02-17T15:39:32Z

I think this is fixed by #14593. In 6.2.1, GET /_cluster/allocation/explain yields the following:

{
  "index": "testidx",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2018-02-17T15:35:54.033Z",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "Xf5BTYTnRdinK8hulkQ5yg",
      "node_name": "Xf5BTYT",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "awareness",
          "decision": "NO",
          "explanation": "node does not contain the awareness attribute [foobar]; required attributes cluster setting [cluster.routing.allocation.awareness.attributes=foobar]"
        }
      ]
    }
  ]
}

This seems sufficient.
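For anyone scripting against this output, the blocking deciders can be pulled out of the response with a small helper. This is a sketch that assumes the response has already been parsed into a dictionary shaped like the JSON above; the function name is hypothetical, and the sample is a trimmed copy of that response.

```python
def blocking_deciders(explain_response):
    """Map node_name -> [(decider, explanation), ...] for every NO decision."""
    result = {}
    for node in explain_response.get("node_allocation_decisions", []):
        blockers = [
            (d["decider"], d["explanation"])
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if blockers:
            result[node["node_name"]] = blockers
    return result


# Trimmed copy of the allocation explain response shown above.
sample = {
    "index": "testidx",
    "shard": 0,
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "Xf5BTYT",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "awareness",
                    "decision": "NO",
                    "explanation": "node does not contain the awareness attribute [foobar]; required attributes cluster setting [cluster.routing.allocation.awareness.attributes=foobar]",
                }
            ],
        }
    ],
}
blockers = blocking_deciders(sample)
```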

clintongormley added >enhancement help wanted adoptme :Allocation labels Jan 26, 2016

lcawl added :Distributed/Distributed and removed :Allocation labels Feb 13, 2018

clintongormley added :Distributed/Allocation and removed :Distributed/Distributed labels Feb 14, 2018

DaveCTurner closed this as completed Feb 17, 2018
