Expose estimated disk usage and watermark information in nodes stats API #8686

dakrone · 2014-11-27T15:52:49Z

We currently log whether a node is over the high or low watermark every 30 seconds. To be more efficient and reduce the volume of logs generated, we should log only once (using a latch) when a watermark is exceeded, and once more when disk usage goes back under it.
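The latch idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code Elasticsearch actually uses: `WatermarkLatch` is a hypothetical class, "exceeded" is assumed to mean strictly above a percentage threshold, and 85% is used because it is the default low watermark.

```python
import logging
from typing import Optional

logger = logging.getLogger("disk.watermark")


class WatermarkLatch:
    """Hypothetical sketch: log a watermark only when its state flips.

    The latch remembers whether the node was above the threshold at the
    previous check, so repeated checks (e.g. every 30 seconds) stay silent
    until disk usage crosses the threshold in either direction.
    """

    def __init__(self, name: str, threshold_percent: float) -> None:
        self.name = name
        self.threshold_percent = threshold_percent
        self.exceeded = False  # latched state from the previous check

    def check(self, node: str, used_percent: float) -> Optional[str]:
        """Return the message that was logged, or None if nothing changed."""
        now_exceeded = used_percent > self.threshold_percent  # assumed: strictly above
        if now_exceeded == self.exceeded:
            return None  # no transition, so no log line
        self.exceeded = now_exceeded
        if now_exceeded:
            msg = (f"{self.name} watermark [{self.threshold_percent}%] exceeded "
                   f"on [{node}], {used_percent:.1f}% used")
            logger.warning(msg)
        else:
            msg = (f"[{node}] is back under the {self.name} watermark "
                   f"[{self.threshold_percent}%], {used_percent:.1f}% used")
            logger.info(msg)
        return msg


# Five periodic checks: only the upward crossing (86.0) and the
# downward crossing (84.0) produce a message.
low = WatermarkLatch("low", 85.0)
events = [low.check("node-0", pct) for pct in (80.0, 86.0, 87.5, 84.0, 83.0)]
```

One latch per watermark (low and high) keeps the two transitions independent, so a node sitting between them logs nothing further.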

We should also expose whether a node is above the watermarks in the nodes stats.

synhershko · 2014-11-27T16:10:00Z

+1 for exposing this in node stats, way better than nagging in the logs

kimchy · 2014-11-27T16:18:01Z

brainstorming here, but maybe we should have an allocation status API, where each decider can return an explanation + structured flags (allow_all, allow_primary, allow_replica) on a cluster view, node view, or index view (e.g. when the concurrent recoveries limit is breached, ...). If it's provided with a specific shard, then maybe we can give details on that specific shard?
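To make the brainstormed shape concrete, here is a hypothetical Python model of such a response. Nothing here is an existing Elasticsearch API: `DeciderStatus`, `summarize`, the `target` field, and the sample values are all illustrative assumptions. The sketch also assumes that a single "no" from any decider blocks allocation, and it derives `allow_all` from the two finer flags so they cannot disagree.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeciderStatus:
    """Hypothetical per-decider entry in an allocation status response."""
    decider: str        # illustrative decider name, e.g. "disk_threshold"
    scope: str          # "cluster", "node", or "index"
    target: str         # which cluster/node/index the flags apply to
    explanation: str
    allow_primary: bool
    allow_replica: bool

    @property
    def allow_all(self) -> bool:
        # Derived rather than stored, so it always agrees with the other flags.
        return self.allow_primary and self.allow_replica


def summarize(statuses: List[DeciderStatus]) -> Dict[str, object]:
    """Combine decider entries for one view: any single "no" wins."""
    allow_primary = all(s.allow_primary for s in statuses)
    allow_replica = all(s.allow_replica for s in statuses)
    return {
        "allow_all": allow_primary and allow_replica,
        "allow_primary": allow_primary,
        "allow_replica": allow_replica,
        # Only deciders that block something contribute an explanation.
        "explanations": [f"[{s.decider}] {s.explanation}"
                         for s in statuses if not s.allow_all],
    }


# Made-up node view: recoveries are fine, but disk usage blocks replicas.
statuses = [
    DeciderStatus("throttling", "node", "node-0",
                  "concurrent recoveries below the limit", True, True),
    DeciderStatus("disk_threshold", "node", "node-0",
                  "node is above the low disk watermark", True, False),
]
status = summarize(statuses)
```

Passing a specific shard, as suggested above, could then narrow the flags to whichever of `allow_primary` or `allow_replica` applies to that shard.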

dakrone · 2014-11-27T16:20:12Z

@kimchy +1 on an allocation status API. I think we should do both, but separately: the disk usage percentage and watermark passed/not-passed should be exposed via the nodes stats API as part of the FsStats as a first step, and then we can add the allocation status API as an additional step.
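A rough Python sketch of what a per-path FsStats entry with a usage percentage and watermark flags could carry follows. `DiskUsageEstimate`, its flag names, and the sample path are hypothetical; the percentage counts everything that is not *available* as used, and 85%/90% are the default low/high watermarks. Only percentage watermarks are handled here, although Elasticsearch also accepts absolute byte values for them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskUsageEstimate:
    """Hypothetical FsStats-style entry for one data path."""
    path: str
    total_in_bytes: int
    available_in_bytes: int

    def __post_init__(self) -> None:
        if self.total_in_bytes <= 0:
            raise ValueError("total_in_bytes must be positive")

    @property
    def used_disk_percent(self) -> float:
        # Anything not available to the process counts as used.
        used = self.total_in_bytes - self.available_in_bytes
        return 100.0 * used / self.total_in_bytes

    def watermarks(self, low: float = 85.0, high: float = 90.0) -> dict:
        """Passed/not-passed flags for percentage-based watermarks."""
        pct = self.used_disk_percent
        return {"low_watermark_exceeded": pct > low,
                "high_watermark_exceeded": pct > high}


# 880 of 1000 bytes used: past the low watermark, not the high one.
estimate = DiskUsageEstimate("/var/data/nodes/0", 1000, 120)
flags = estimate.watermarks()
```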

kimchy · 2014-11-27T16:21:17Z

@dakrone ++

clintongormley · 2016-11-26T13:48:38Z

@dakrone you planning on returning to this one at some stage?

dakrone · 2016-12-05T20:51:06Z

@clintongormley yes, I've also updated the title to reflect the actual work.

This exposes the least and most used disk usage estimates within the "fs" nodes stats output:

```json
GET /_nodes/stats/fs?pretty&human
{
  "nodes" : {
    "34fPVU0uQ_-wWitDzDXX_g" : {
      "fs" : {
        "timestamp" : 1481238723550,
        "total" : {
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "free" : "140.6gb",
          "free_in_bytes" : 151068725248,
          "available" : "120.5gb",
          "available_in_bytes" : 129438912512
        },
        "least_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "most_usage_estimate" : {
          "path" : "/home/hinmanm/es/elasticsearch/distribution/build/cluster/run node0/elasticsearch-6.0.0-alpha1-SNAPSHOT/data/nodes/0",
          "total" : "396.1gb",
          "total_in_bytes" : 425343254528,
          "available" : "120.5gb",
          "available_in_bytes" : 129438633984,
          "used_disk_percent" : 69.56842912023208
        },
        "data" : [{...}],
        "io_stats" : {...}
      }
    }
  }
}
```

Resolves elastic#8686
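Reading these estimates from a client could look roughly like the following Python sketch using only the standard library. `fetch_fs_stats` and `usage_estimates` are hypothetical helpers, not part of any Elasticsearch client; the code assumes a node reachable at `http://localhost:9200` running a version that reports `least_usage_estimate` and `most_usage_estimate`, and it simply skips nodes where either object is absent.

```python
import json
import urllib.request
from typing import Dict, Tuple


def fetch_fs_stats(base_url: str = "http://localhost:9200") -> dict:
    """GET /_nodes/stats/fs and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/fs") as resp:
        return json.load(resp)


def usage_estimates(stats: dict) -> Dict[str, Tuple[float, float]]:
    """Map node id -> (least, most) estimate used_disk_percent values."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        fs = node.get("fs", {})
        least = fs.get("least_usage_estimate")
        most = fs.get("most_usage_estimate")
        if least is None or most is None:
            continue  # estimates not reported for this node
        result[node_id] = (least["used_disk_percent"],
                           most["used_disk_percent"])
    return result
```

Against a running node this would be called as `usage_estimates(fetch_fs_stats())`. The helper reads `used_disk_percent` rather than the human-readable strings, so it works with or without `?human`.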

godber · 2020-01-08T23:37:21Z

Hi all, sorry to comment on this blast from the past, but I've recently found myself needing this type of functionality. Was the PR accidentally closed instead of accepted or something? It looks like everyone was in agreement, a PR was made, and then it was closed.

DaveCTurner · 2020-01-09T10:10:33Z

I'm not sure why it's marked as Closed in the GitHub UI, since it was indeed merged (see 4eb32e9) and still works to this day (version 7.5.0):

```
$ curl -s 'http://localhost:9200/_nodes/stats/fs' | jq .
{
  "_nodes": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "cluster_name": "elasticsearch",
  "nodes": {
    "TXGs3dDwS1WmAQnicIxqkg": {
      "timestamp": 1578564322370,
      "name": "node-0",
      "transport_address": "127.0.0.1:9300",
      "host": "127.0.0.1",
      "ip": "127.0.0.1:9300",
      "roles": [
        "ingest",
        "master",
        "data",
        "ml"
      ],
      "attributes": {
        "ml.machine_memory": "17179869184",
        "xpack.installed": "true",
        "ml.max_open_jobs": "20"
      },
      "fs": {
        "timestamp": 1578564322370,
        "total": {
          "total_in_bytes": 499963170816,
          "free_in_bytes": 270946213888,
          "available_in_bytes": 265933774848
        },
        "least_usage_estimate": {
          "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
          "total_in_bytes": 499963170816,
          "available_in_bytes": 265933742080,
          "used_disk_percent": 46.80933364632356
        },
        "most_usage_estimate": {
          "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
          "total_in_bytes": 499963170816,
          "available_in_bytes": 265933742080,
          "used_disk_percent": 46.80933364632356
        },
        "data": [
          {
            "path": "/Users/davidturner/issues/8686/elasticsearch-7.5.0/data-0/nodes/0",
            "mount": "/ (/dev/disk1s1)",
            "type": "apfs",
            "total_in_bytes": 499963170816,
            "free_in_bytes": 270946213888,
            "available_in_bytes": 265933774848
          }
        ]
      }
    }
  }
}
```

This information is also available in the allocation explain API if you pass the `?include_disk_info` parameter.
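A hedged Python sketch of pulling the same numbers out of the allocation explain API follows. The helper names are hypothetical, and the response layout used below (`cluster_info.nodes.<node id>` holding `least_available` and `most_available` objects, each with a `used_disk_percent` field) is an assumption that should be checked against the version you run. The request names a specific shard in its body because an empty request only explains an unassigned shard.

```python
import json
import urllib.request
from typing import Dict, Optional


def explain_with_disk_info(base_url: str, index: str, shard: int,
                           primary: bool) -> dict:
    """Call /_cluster/allocation/explain?include_disk_info=true for one shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": primary})
    req = urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain?include_disk_info=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def used_percent_by_node(explain: dict) -> Dict[str, Dict[str, Optional[float]]]:
    """Node id -> used_disk_percent of its least/most available paths.

    Assumed layout: cluster_info.nodes.<id>.{least,most}_available.
    Missing entries come back as None instead of raising.
    """
    out = {}
    nodes = explain.get("cluster_info", {}).get("nodes", {})
    for node_id, info in nodes.items():
        out[node_id] = {
            "least_available": info.get("least_available", {}).get("used_disk_percent"),
            "most_available": info.get("most_available", {}).get("used_disk_percent"),
        }
    return out
```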

dakrone · 2020-01-09T16:16:31Z

> I'm not sure why it's marked as Closed in the GitHub UI, since it was indeed merged (see 4eb32e9) and still works to this day (version 7.5.0)

GitHub only marks squashed commits as "Merged" if you used the merge button; otherwise they are marked as closed. I probably merged it manually (or hey, maybe the merge button wasn't even around back then).

godber · 2020-01-09T18:03:25Z

Ah, my bad, thanks for taking the time to respond. We don't see this in our 6.8.1 clusters, so I was confused and thought it didn't make it in. My issue must be something else.

DaveCTurner · 2020-07-20T15:34:47Z

@godber this came up again internally and we found that these stats are indeed sometimes (often) missing or stale, so that might explain your issue. The fix would have been rather convoluted, and the correct stats are available from the allocation explain API, and/or can be computed from the stats API response, so we've reverted this change in #59755.
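One way to compute comparable figures from the plain stats response is to work directly from each node's `fs.data` entries, as in this Python sketch. It is an illustration, not the approach the Elasticsearch team described: `used_percent` and `min_max_used_by_node` are hypothetical helpers, "used" is assumed to mean `total_in_bytes - available_in_bytes`, and the result reports the lowest and highest used percentage per node without claiming to match either estimate's exact semantics.

```python
from typing import Dict, Tuple


def used_percent(entry: dict) -> float:
    """Used percentage of one fs.data[] entry; only available bytes count as free."""
    total = entry["total_in_bytes"]
    available = entry["available_in_bytes"]
    return 100.0 * (total - available) / total


def min_max_used_by_node(stats: dict) -> Dict[str, Tuple[float, float]]:
    """Node id -> (lowest, highest) used percentage across its data paths."""
    result = {}
    for node_id, node in stats.get("nodes", {}).items():
        # Skip paths that report no size, which would divide by zero.
        paths = [e for e in node.get("fs", {}).get("data", [])
                 if e.get("total_in_bytes", 0) > 0]
        if not paths:
            continue
        pcts = [used_percent(e) for e in paths]
        result[node_id] = (min(pcts), max(pcts))
    return result
```

With the single data path from the 7.5.0 output earlier in this thread, both values come out at roughly 46.809%, matching `used_disk_percent` to within the few kilobytes that changed between the two samples.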

godber · 2020-07-21T01:15:33Z

@DaveCTurner thanks for remembering me and pinging me here!

dakrone mentioned this issue Nov 27, 2014

Disk free space threshold - at least a Warning message in the log file #8367

Closed

bleskes mentioned this issue Nov 30, 2014

Index can get stuck closed if any of its shards has half or less of its lucene indexes running #3354

Closed

dakrone mentioned this issue Jan 21, 2015

Relax restrictions on filesystem size reporting in DiskUsage #9283

Merged

dakrone mentioned this issue Nov 6, 2015

Add shard allocation explain API to explain why shards are (or aren't) UNASSIGNED #14593

Closed

clintongormley added >enhancement :Allocation labels Nov 21, 2015

clintongormley assigned dakrone Nov 21, 2015

dakrone changed the title ~~Log disk thresholds once and expose allocation blocking in nodes stats API~~ Expose estimated disk usage and watermark information in nodes stats API Dec 5, 2016

dakrone mentioned this issue Dec 9, 2016

Expose disk usage estimates in nodes stats #22081

Closed

dakrone closed this as completed in 4eb32e9 Jan 19, 2017

lcawl added :Distributed/Distributed and removed :Allocation labels Feb 13, 2018
