Expose estimated disk usage and watermark information in nodes stats API #8686

Closed
dakrone opened this issue Nov 27, 2014 · 12 comments
Labels: :Distributed/Distributed, >enhancement

Comments

@dakrone
Member

dakrone commented Nov 27, 2014

We currently log whether a node is over the high and low watermark once every 30 seconds. To be more efficient and reduce the amount of logs generated, we should log it only once (using a latch) when the watermark is exceeded, and once when the disk goes back under the watermark.

We should also expose whether a node is above the watermarks in the nodes stats.
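A minimal sketch of the log-once idea described above, in Python for brevity and not taken from Elasticsearch's Java code. The class name `WatermarkLatch`, the `check` method, and the 85% threshold used in the usage note are illustrative assumptions.

```python
import logging

logger = logging.getLogger("disk_watermark")


class WatermarkLatch:
    """Logs only when disk usage crosses a watermark, not on every check."""

    def __init__(self, watermark_percent):
        self.watermark_percent = watermark_percent
        # Latch state: True while usage is above the watermark.
        self.exceeded = False

    def check(self, used_percent):
        """Record one periodic reading; return True if a message was logged."""
        now_exceeded = used_percent > self.watermark_percent
        if now_exceeded == self.exceeded:
            # No transition since the last reading, so stay quiet.
            return False
        self.exceeded = now_exceeded
        if now_exceeded:
            logger.warning(
                "disk usage %.1f%% exceeded watermark %.1f%%",
                used_percent, self.watermark_percent,
            )
        else:
            logger.info(
                "disk usage %.1f%% is back under watermark %.1f%%",
                used_percent, self.watermark_percent,
            )
        return True
```

With `latch = WatermarkLatch(85.0)`, calling `latch.check(...)` every 30 seconds would log once when usage first rises above 85% and once when it drops back, however many readings fall in between.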

@synhershko
Contributor

+1 for exposing this in node stats, way better than nagging in the logs

@kimchy
Member

kimchy commented Nov 27, 2014

Brainstorming here, but maybe we should have an allocation status API, where each decider can return an explanation plus structured flags (allow_all, allow_primary, allow_replica) on a cluster view, node view, or index view (e.g. when the concurrent recoveries limit is breached, ...). If it's provided with a specific shard, then maybe we can give details on that specific shard?

@dakrone
Member Author

dakrone commented Nov 27, 2014

@kimchy +1 on an allocation status API. I think we should separate the two (do both, but separately): as a first step, the disk usage percentage and whether each watermark has been passed should be exposed via the nodes stats API as part of the FsStats; then we can add the allocation status API as an additional step.

@kimchy
Member

kimchy commented Nov 27, 2014

@dakrone ++

@clintongormley

@dakrone you planning on returning to this one at some stage?

dakrone changed the title from "Log disk thresholds once and expose allocation blocking in nodes stats API" to "Expose estimated disk usage and watermark information in nodes stats API" on Dec 5, 2016
@dakrone
Member Author

dakrone commented Dec 5, 2016

@clintongormley yes, and I've updated the title to reflect the actual work.

dakrone added a commit to dakrone/elasticsearch that referenced this issue Jan 7, 2017
This exposes the least and most used disk usage estimates within the "fs" nodes
stats output:

```json
GET /_nodes/stats/fs?pretty&human
{
  "nodes" : {
    "34fPVU0uQ_-wWitDzDXX_g" : {
      "fs" : {
        "timestamp" : 1481238723550,
        "total" : {
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "free" : "140.6gb",
          "free_in_bytes" : 151068725248,
          "available" : "120.5gb",
          "available_in_bytes" : 129438912512
        },
        "least_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "most_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "data" : [{...}],
        "io_stats" : {...}
      }
    }
  }
}
```

Resolves elastic#8686
lcawl added the :Distributed/Distributed label and removed the :Allocation label on Feb 13, 2018
@godber

godber commented Jan 8, 2020

Hi all, sorry to comment on this blast from the past, but I've recently found myself needing this type of functionality. Was the PR accidentally closed instead of accepted or something? It looks like everyone was in agreement, a PR was made, and then it was closed.

@DaveCTurner
Contributor

I'm not sure why it's marked as Closed in the GitHub UI since it was indeed merged (see 4eb32e9) and still works to this day (version 7.5.0):

$ curl -s 'http://localhost:9200/_nodes/stats/fs' | jq .
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "TXGs3dDwS1WmAQnicIxqkg": {
      "timestamp": 1578564322370,
      "name": "node-0",
      "transport_address": "127.0.0.1:9300",
      "host": "127.0.0.1",
      "ip": "127.0.0.1:9300",
      "roles": [
        "ingest",
        "master",
        "data",
        "ml"
      ],
      "attributes": {
        "ml.machine_memory": "17179869184",
        "xpack.installed": "true",
        "ml.max_open_jobs": "20"
      },
      "fs": {
        "timestamp": 1578564322370,
        "total": {
          "total_in_bytes": 499963170816,
          "free_in_bytes": 270946213888,
          "available_in_bytes": 265933774848
        },
        "least_usage_estimate": {
          "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
          "total_in_bytes": 499963170816,
          "available_in_bytes": 265933742080,
          "used_disk_percent": 46.80933364632356
        },
        "most_usage_estimate": {
          "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
          "total_in_bytes": 499963170816,
          "available_in_bytes": 265933742080,
          "used_disk_percent": 46.80933364632356
        },
        "data": [
          {
            "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
            "mount": "/ (/dev/disk1s1)",
            "type": "apfs",
            "total_in_bytes": 499963170816,
            "free_in_bytes": 270946213888,
            "available_in_bytes": 265933774848
          }
        ]
      }
    }
  }
}

This information is also available in the allocation explain API if you pass the ?include_disk_info parameter.
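A minimal sketch of how such an allocation explain request could be built, using only the Python standard library. The helper name `build_allocation_explain_request`, the base URL, and the index name `my-index` are illustrative assumptions; the response body is not shown because none appears in this thread.

```python
import json
import urllib.parse
import urllib.request


def build_allocation_explain_request(base_url, index, shard, primary):
    """Build (but do not send) an allocation explain request with disk info."""
    # include_disk_info asks the API to add disk usage details to its answer.
    query = urllib.parse.urlencode({"include_disk_info": "true"})
    url = f"{base_url}/_cluster/allocation/explain?{query}"
    # The body names the shard to explain; all three values are caller-supplied.
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


request = build_allocation_explain_request(
    "http://localhost:9200", "my-index", 0, True
)
```

Passing `request` to `urllib.request.urlopen` would send it, which needs a reachable cluster and an index that actually exists.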

@dakrone
Member Author

dakrone commented Jan 9, 2020

> I'm not sure why it's marked as Closed in the GitHub UI since it was indeed merged (see 4eb32e9) and still works to this day (version 7.5.0)

GitHub only marks squashed commits as "Merged" if you used the merge button; otherwise they are marked as closed. I probably merged it manually (or hey, maybe the merge button wasn't even around back then).

@godber

godber commented Jan 9, 2020

Ah, my bad, thanks for taking the time to respond. We don't see this in our 6.8.1 clusters, so I was confused and thought it didn't make it in. My issue must be something else.

@DaveCTurner
Contributor

@godber this came up again internally and we found that these stats are indeed sometimes (often) missing or stale, so that might explain your issue. The fix would have been rather convoluted, and the correct stats are available from the allocation explain API, and/or can be computed from the stats API response, so we've reverted this change in #59755.
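A rough sketch of computing these figures from a nodes stats response, assuming the `fs.data` entries carry `total_in_bytes` and `available_in_bytes` as in the 7.5.0 output earlier in this thread. The function names are hypothetical, and the result reflects the bytes the filesystem reports, which may differ slightly from the removed estimate fields (the earlier output shows 265933742080 versus 265933774848 available bytes).

```python
def disk_usage_by_path(node_stats):
    """Return (path, used_percent) pairs for each data path of one node."""
    usages = []
    for entry in node_stats["fs"]["data"]:
        total = entry["total_in_bytes"]
        available = entry["available_in_bytes"]
        if total <= 0:
            # Skip paths without usable size information.
            continue
        used_percent = 100.0 * (total - available) / total
        usages.append((entry["path"], used_percent))
    return usages


def least_and_most_used(node_stats):
    """Return the least-used and most-used data paths, or (None, None)."""
    usages = disk_usage_by_path(node_stats)
    if not usages:
        return None, None
    least = min(usages, key=lambda usage: usage[1])
    most = max(usages, key=lambda usage: usage[1])
    return least, most


# One node's entry, trimmed from the 7.5.0 response shown earlier in the thread.
node = {
    "fs": {
        "data": [
            {
                "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
                "total_in_bytes": 499963170816,
                "available_in_bytes": 265933774848,
            }
        ]
    }
}
least, most = least_and_most_used(node)
```

With a single data path the least-used and most-used entries are the same; on a node with several data paths they would differ.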

@godber

godber commented Jul 21, 2020

@DaveCTurner thanks for remembering me and pinging me here!
