Zero value Kubelet PSI metrics emitted even if underlying OS doesn't enable it #136333

@ngopalak-redhat

What happened?

In Kubernetes 1.34, the KubeletPSI feature gate is set to true by default. When Kubernetes runs on an OS that does not have PSI enabled, PSI metrics should not be generated, yet the kubelet still emits them with zero values. Here are the per-node counts of each metric observed during a test:

for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  echo "=== Node: $node ==="
  for metric in cpu_waiting cpu_stalled memory_waiting memory_stalled io_waiting io_stalled; do
    echo -n "container_pressure_${metric}_seconds_total: "
    kubectl get --raw "/api/v1/nodes/$node/proxy/metrics/cadvisor" | grep "container_pressure_${metric}_seconds_total" | wc -l
  done
done
=== Node: ip-10-0-11-217.us-east-2.compute.internal ===
container_pressure_cpu_waiting_seconds_total:      267
container_pressure_cpu_stalled_seconds_total:      267
container_pressure_memory_waiting_seconds_total:      267
container_pressure_memory_stalled_seconds_total:      267
container_pressure_io_waiting_seconds_total:      267
container_pressure_io_stalled_seconds_total:      267

I suspect this part of the code:

https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cadvisor/cadvisor_linux.go#L105C53-L105C63

	if utilfeature.DefaultFeatureGate.Enabled(features.KubeletPSI) {
		includedMetrics[cadvisormetrics.PressureMetrics] = struct{}{}
	} 

and cAdvisor's Prometheus collector does this: https://github.com/google/cadvisor/blob/master/metrics/prometheus.go#L1842

	if includedMetrics.Has(container.PressureMetrics) {
		c.containerMetrics = append(c.containerMetrics, []containerMetric{
			{
				name:      "container_pressure_cpu_stalled_seconds_total", 

cc: @haircommander @bitoku

What did you expect to happen?

No PSI metrics should be emitted when the underlying OS does not have PSI enabled.

How can we reproduce it (as minimally and precisely as possible)?

Run Kubernetes 1.34 or later on a node whose OS does not have PSI enabled, and use Grafana to monitor the PSI metrics.

Anything else we need to know?

This is a nice-to-have. It has no real impact, since the added cardinality is negligible. However, the zero values confuse end users: Grafana charts give the visual impression that PSI is enabled.

I can propose a fix for this.

Kubernetes version

1.34

Cloud provider

k8s on GCP instances

OS version

Linux: 5.14.0-570.78.1.el9_6.x86_64

Metadata

Labels

kind/bug: Categorizes issue or PR as related to a bug.
priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Projects

Status

Done
