container_fs metrics {device} has no label or info-metric to associate it with a volumeattachment or persistentvolumeclaim #3588

ringerc commented Sep 4, 2024

Problem

There appears to be no label or info-metric to associate cadvisor's container_fs_* metrics with a PersistentVolume attachment or PersistentVolumeClaim, or with the mount-point of the fs within the container. There is only a device label, for the device-node path within the OS.

This makes it seemingly impossible to determine which meaningful volume a container's I/O is associated with; for example, if a database container has two PVs mounted, one for the main DB and one for WAL, and it also has an ephemeral volume for tempfiles and sorts, there seems to be no way to tell which container_fs_writes_bytes_total series corresponds to which volume.
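For example, with invented device paths and values, the three filesystems above all end up as series that differ only in the device label:

  container_fs_writes_bytes_total{container="postgres", device="/dev/dm-0", pod="db-0", ...} 1.2e+09
  container_fs_writes_bytes_total{container="postgres", device="/dev/dm-1", pod="db-0", ...} 4.7e+08
  container_fs_writes_bytes_total{container="postgres", device="/dev/dm-2", pod="db-0", ...} 9.3e+07

Nothing in those label sets says which one is the main DB volume, the WAL volume or the ephemeral volume.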

Proposed feature

It would be enormously helpful if cadvisor added a label, or exposed an info metric, associating the (device,container) label pairs from the container_fs_* metrics with the k8s VolumeAttachment name or PersistentVolumeClaim.

It'd also be great to have an info-metric exposing the volume mount path within the container for each (container,device) pair. This can't be done as an extra label on the container_fs_* metrics because one device node can be mounted multiple times within one container (bind mounts, subvolume mounts, btrfs submounts, etc.). It would make it possible to see in monitoring the container path a volume is mounted on. It would also then be possible to associate the persistent volume by exposing the volumeMount paths for a Pod in kube-state-metrics.

Exposing the filesystem UUID would also be helpful.
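As a rough sketch of the proposal (the metric names and non-standard labels here are hypothetical, not existing cadvisor metrics):

  # one series per (container, device), carrying the attachment/claim association
  container_fs_volume_info{container="postgres", device="/dev/dm-0", volumeattachment="csi-0123abcd", persistentvolumeclaim="pg-data", fs_uuid="...", pod="db-0", ...} 1

  # one series per (container, device, mountpoint); a device mounted twice yields two series
  container_fs_mount_info{container="postgres", device="/dev/dm-0", mountpoint="/postgres/data", pod="db-0", ...} 1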

Alternatives considered

kube-state-metrics cannot provide this because it has no insight into the device node a container volumeMount path is associated with; there's nothing usable in PersistentVolume's or PersistentVolumeClaim's .spec or .status. cadvisor doesn't appear to expose the CSI info or PVC UID that could be used to associate these. There is a VolumeAttachment CR whose .status.attachmentMetadata can carry a devicePath (for some CSIs), which k-s-m exposes as kube_volumeattachment_status_attachment_metadata, but this only seems to be provided by the AWS EKS CSI, and the volume paths differ from those seen within the container, e.g. an in-container /dev/dm-0 is exposed as /dev/xvdaa in the attachment metadata. Thus this is not usable for volume associations.

node-exporter recently gained filesystem_mount_info (prometheus/node_exporter#2970), which maps device to mountpoint, but it isn't container-scoped and doesn't expose the volume attachment, so it's not usable for associating the device with a PV. Its older node_filesystem_avail_bytes{device,mountpoint} similarly exposes host-path mount points under /run/containerd/io.containerd.grpc.v1.cri/ and has no info that could be used to associate with a PV, PV attachment or volumeMount. (Due to mount-scoping rules it cannot see some mounts anyway.)

kubelet metrics don't appear to expose the needed info, and there's nothing apparent in the main k8s metrics docs either. Querying kubectl get --raw "/api/v1/nodes/NODENAME/proxy/metrics" and kubectl get --raw "/api/v1/nodes/NODENAME/proxy/metrics/resource" didn't reveal anything promising.

Kubelet /stats/summary is (a) deprecated and (b) exposes the volume's name as listed in Pod.spec.volumes and any pvcRef, but not the device node or mount path, so it cannot be used to associate metrics. It doesn't have I/O stats, so it's not an alternative data source either.

So I didn't find any way to associate the cadvisor metrics with the volume attachment, PVC or PV, whether via fs UUID, PV UID, PVC UID, data exposed by the kube apiserver, other existing metrics API servers, etc.

Benefits

If a volumeattachment label were available directly or via an info-metric, this could be joined on kube_volumeattachment_spec_source_persistentvolume from kube-state-metrics to find the kube_persistentvolumeclaim_info, kube_persistentvolume_info, kube_persistentvolumeclaim_labels, etc.
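For illustration only: assuming the hypothetical container_fs_volume_info metric sketched above existed, the join might look roughly like this (untested, and the exact kube-state-metrics label names may need adjusting):

  rate(container_fs_writes_bytes_total{metrics_path="/metrics/cadvisor"}[5m])
    * on (pod, container, device) group_left (volumeattachment)
      container_fs_volume_info
    * on (volumeattachment) group_left (volumename)
      kube_volumeattachment_spec_source_persistentvolume
    * on (volumename) group_left (persistentvolumeclaim)
      kube_persistentvolumeclaim_info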

If a mapping of volumeMount paths to volumes and devices were available, I/O could be associated with a specific container path in reporting and dashboards, e.g. "100MiB/s on /postgres/data, 200 MiB/s on /postgres/pg_wal, 500MiB/s on /postgres/ephemeral_store_tablespace".
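Likewise, with the hypothetical container_fs_mount_info metric from the sketch above, a dashboard could break write throughput down by container path (this assumes each device is mounted at a single path in the container; multiple mounts of one device would need extra care):

  sum by (pod, mountpoint) (
    rate(container_fs_writes_bytes_total[5m])
      * on (pod, container, device) group_left (mountpoint)
        container_fs_mount_info
  )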

Details

cadvisor exposes some useful container-level filesystem I/O metrics:

  • container_fs_reads_bytes_total
  • container_fs_reads_total
  • container_fs_writes_bytes_total
  • container_fs_writes_total

which are exposed with labels including device (the device-node path the filesystem is mounted from) and name (the container ID without the containerd:// prefix), e.g.

container_fs_reads_bytes_total{container="...", device="/dev/dm-0", job="kubelet", metrics_path="/metrics/cadvisor", name="...", pod="...", ...}

There is nothing here, or in any of the other cadvisor metrics I found, that would allow this to be associated with a persistent volume claim. kube-state-metrics cannot expose this information because it does not have access to the device-node paths from which volumes are mounted within containers. See kubernetes/kube-state-metrics#1701

Looking at the cadvisor source:

  • there's FsInfo in MachineInfo which knows the device-node path, but not any volume attachment or persistent volume info.
  • there's PerDiskStats in DiskIoStats in ContainerStats, but nothing there associates with a volume attachment or a mount path. There's FsStats in the same file, which again is only keyed by Device and filesystem type; it doesn't have path or attachment info.
  • in metrics/prometheus.go the metrics with a "device" label don't appear to have anything else to associate with an attachment or claim; I didn't find likely keywords like "vol", "attach", "mount" or "path" anywhere.
  • There's a GetFsInfoByFsUUID function, and fs/fs.go uses https://pkg.go.dev/github.com/moby/sys/mountinfo#Info so it has access to the mount information, but the FS UUID isn't exposed as a label, and even if it were there doesn't seem to be anything elsewhere to join the fs UUID on for a volume attachment etc.

There's container_blkio_device_usage_total with major, minor and operation labels, but that doesn't provide any association; the rest only have device as a label.

It looks like cadvisor could expose an info-metric mapping device to mount points using Mountpoint from https://pkg.go.dev/github.com/moby/sys/mountinfo#Info, and expose the filesystem UUID too. This doesn't provide a way to associate with a PV or PVC directly, but might be usable indirectly via pod metadata from kube-state-metrics etc., since volume and volumeMount on a Pod are exposed in the API.

Ideally kubelet could expose this mapping instead, perhaps via https://kubernetes.io/docs/reference/instrumentation/cri-pod-container-metrics/. There's no sign it does so, though.


Related: #1702
