Kubelet - Cadvisor Exposes Misleading IO metrics In Cgroup V2 #102285
Closed
Labels
kind/bug: Categorizes issue or PR as related to a bug.
lifecycle/stale: Denotes an issue or PR has remained open with no activity and has become stale.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
What happened:
We recently upgraded our Kubernetes nodes from the cgroup v1 to the cgroup v2 hierarchy and noticed that a few IO-related metrics from the kubelet (/metrics/cadvisor) report erroneous values. For example, container_fs_writes_bytes_total and container_fs_writes_total are always zero for every cgroup on the node, even though one pod is writing heavily to a PVC on that node, as the scrape below illustrates.
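For reference, the zero values can be observed by scraping the kubelet's cadvisor endpoint through the API server proxy (`<node-name>` below is a placeholder for the affected node):

```shell
# Scrape the cadvisor metrics for the affected node and filter the
# write counters; on the cgroup v2 node every sample comes back as 0.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics/cadvisor" \
  | grep -E 'container_fs_writes(_bytes)?_total'
```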
What you expected to happen:
We expected these metrics to report the correct values, as recorded in the io.stat file under the container's cgroup v2 hierarchy.
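For context, the raw per-device counters in io.stat look like the following; the path, device numbers, and values here are illustrative:

```shell
# Read the per-device IO counters for a container's cgroup (v2 only);
# the path is an example and depends on the node's cgroup layout.
cat /sys/fs/cgroup/<pod-cgroup-path>/io.stat
# 8:0 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0
```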
How to reproduce it (as minimally and precisely as possible):
Provision a Kubernetes node with the environment below and enable cgroup v2 during boot (see the sketch that follows).
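A typical way to switch a systemd-based distribution such as Debian 10 to the unified cgroup v2 hierarchy (the exact flags on a given node may differ) is:

```shell
# Add the unified-hierarchy flag to the kernel command line (typical
# approach on systemd-based distros; your bootloader config may differ):
sed -i 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1 /' /etc/default/grub
update-grub
reboot
# After the reboot, confirm the v2 hierarchy is mounted:
stat -fc %T /sys/fs/cgroup   # prints "cgroup2fs" on cgroup v2
```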
Anything else we need to know?:
We ran the same kubelet version (v1.19.8) on another node using the cgroup v1 hierarchy with exactly the same workload and saw the metrics above reporting values as expected, but they remain zero on the cgroup v2 node. We are using kubelet v1.19.8, which depends on cadvisor v0.37.4. To debug, I tried increasing the kubelet log verbosity with --v=5, but there were no surprises there. I also ran the standalone cadvisor (v0.37.4) binary on the cgroup v2 node, and it exhibits the same issue: its /metrics endpoint reports zero values for the metrics above.
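To confirm that the kernel-level counters themselves are fine, the busy pod's io.stat can be read directly; the paths below assume the systemd cgroup driver and will vary by QoS class:

```shell
# List io.stat files under the kubepods hierarchy (systemd driver):
find /sys/fs/cgroup/kubepods.slice -name io.stat | head
# Read one container's counters; wbytes/wios keep growing while the pod
# writes, even though container_fs_writes_* from cadvisor stays at zero.
cat /sys/fs/cgroup/kubepods.slice/<pod-slice>/<container-scope>/io.stat
```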
Environment:
Kubernetes version (kubectl version): 1.19.8
OS (cat /etc/os-release): Debian GNU/Linux 10 (buster)
Kernel (uname -a): 5.10.0-0.bpo.3-cloud-amd64 #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11) x86_64 GNU/Linux