Visibility problem when cluster.routing.allocation.awareness.attributes is misconfigured #16195

Closed
jordansissel opened this issue Jan 22, 2016 · 3 comments
Labels
:Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) >enhancement help wanted adoptme

Comments

@jordansissel
Contributor

Scenario: If a cluster is misconfigured such that all nodes have cluster.routing.allocation.awareness.attributes=some_missing_attribute and zero nodes actually set this attribute, then shard allocation will fail and new indexes will have unallocated shards.

The symptoms are:

  • One or more indices are red

The challenge is that this scenario is very difficult to debug. @dakrone was kind and pointed me at a nifty (advanced!) trick to ask Elasticsearch why its allocation decision was made, and the decision is unhelpful (details below).

To reproduce this:

  1. Run Elasticsearch with the default configuration file and the following additional setting:
  • cluster.routing.allocation.awareness.attributes=foobar
  2. Create a new index:
% curl -XPUT localhost:9200/example -d '{}'
{"acknowledged":true}
  3. Check the status:
% curl -s localhost:9200/_cat/indices|grep example
red open example               5 1
  4. Troubleshooting: Check the Elasticsearch logs (at the default level). I don't see any information hinting at allocation issues.

  5. Debugging: Try a dry run allocation via _cluster/reroute:

% curl -s 'localhost:9200/_cluster/reroute?dry_run&explain&pretty' -d '{ "commands": [ { "allocate": { "index": "example", "shard": 0, "node": "happynode" } } ] }' | jq '.explanations'
[
  {
    "command": "allocate",
    "parameters": {
      "index": "example",
      "shard": 0,
      "node": "happynode",
      "allow_primary": false
    },
    "decisions": [
      {
        "decider": "allocate_allocation_command",
        "decision": "NO",
        "explanation": "trying to allocate a primary shard [example][0], which is disabled"
      }
    ]
  }
]

Overall, I believe Elasticsearch is acting correctly (it has nowhere to route shards because of our configuration!). However, my concern is the lack of visibility into this issue for users:

  1. The debugging trickery via _cluster/reroute uses phrasing that I interpret to mean that allocation is disabled.
  2. It's unclear how, without the reroute trick, to ask Elasticsearch for hints on troubleshooting the misconfiguration.

For a user, the correction would be to have at least one data node with node.foobar: whatever (more generally, an awareness attribute must exist on at least one data node). Clearer logging and/or response hinting would also help, telling users something along the lines of "Could not allocate shard on any nodes because no nodes match the criteria: has attribute foobar"
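
As a rough illustration of the check being asked for, here is a minimal Python sketch. It is not Elasticsearch code: the function name `missing_awareness_attributes` and the input shapes (a list of configured awareness attributes and a mapping of node name to that node's custom attributes) are assumptions made for the example, and it assumes you have already collected the node attributes yourself.

```python
from typing import Dict, List


def missing_awareness_attributes(
    awareness_attributes: List[str],
    node_attributes: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return the configured awareness attributes that no node defines."""
    return [
        attr
        for attr in awareness_attributes
        if not any(attr in attrs for attrs in node_attributes.values())
    ]


# Hypothetical data matching the reproduction above: awareness is
# configured for "foobar", but the only node defines no such attribute.
configured = ["foobar"]
nodes = {"happynode": {}}

for attr in missing_awareness_attributes(configured, nodes):
    print(
        "Could not allocate shard on any nodes because no nodes "
        f"match the criteria: has attribute {attr}"
    )
```

With the data above, the loop reports `foobar`; once any node carries a `foobar` attribute, the function returns an empty list.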

@clintongormley

Related to #14593 and #12412

@clintongormley

The shard allocation explain API (#14593) would be a big win here, but I agree that the logging and failure messages can be improved.

@lcawl lcawl added :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. and removed :Allocation labels Feb 13, 2018
@clintongormley clintongormley added :Distributed/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) and removed :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. labels Feb 14, 2018
@DaveCTurner
Contributor

I think this is fixed by #14593. In 6.2.1, GET /_cluster/allocation/explain yields the following:

{
  "index": "testidx",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2018-02-17T15:35:54.033Z",
    "last_allocation_status": "no"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is not permitted to any of the nodes",
  "node_allocation_decisions": [
    {
      "node_id": "Xf5BTYTnRdinK8hulkQ5yg",
      "node_name": "Xf5BTYT",
      "transport_address": "127.0.0.1:9300",
      "node_decision": "no",
      "weight_ranking": 1,
      "deciders": [
        {
          "decider": "awareness",
          "decision": "NO",
          "explanation": "node does not contain the awareness attribute [foobar]; required attributes cluster setting [cluster.routing.allocation.awareness.attributes=foobar]"
        }
      ]
    }
  ]
}

This seems sufficient.
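
For anyone who wants to pull just the blocking reasons out of a response like the one above, a minimal Python sketch follows. It relies only on the field names visible in that output (`node_allocation_decisions`, `deciders`, `decision`, `explanation`, `node_name`); the helper name `blocking_explanations` is made up for the example, and the sample dictionary is a trimmed copy of the response, not fresh output.

```python
from typing import Any, Dict, List


def blocking_explanations(explain_response: Dict[str, Any]) -> List[str]:
    """Collect one line per decider that answered NO, across all nodes."""
    lines = []
    for node in explain_response.get("node_allocation_decisions", []):
        for decider in node.get("deciders", []):
            if decider.get("decision") == "NO":
                lines.append(
                    f'{node.get("node_name")}: '
                    f'[{decider.get("decider")}] {decider.get("explanation")}'
                )
    return lines


# Trimmed copy of the allocation explain response quoted above.
sample = {
    "index": "testidx",
    "shard": 0,
    "can_allocate": "no",
    "node_allocation_decisions": [
        {
            "node_name": "Xf5BTYT",
            "node_decision": "no",
            "deciders": [
                {
                    "decider": "awareness",
                    "decision": "NO",
                    "explanation": "node does not contain the awareness attribute [foobar]; required attributes cluster setting [cluster.routing.allocation.awareness.attributes=foobar]",
                }
            ],
        }
    ],
}

for line in blocking_explanations(sample):
    print(line)
```

In practice the dictionary would come from parsing the JSON body returned by GET /_cluster/allocation/explain.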
