[PodLevelResources] Propagate Pod level hugepage cgroup to containers by KevinTMtz · Pull Request #131089 · kubernetes/kubernetes

KevinTMtz · 2025-03-27T18:24:12Z

What type of PR is this?

What this PR does / why we need it:

Follow-up to [PodLevelResources] Pod Level Hugepage Resources.

This PR propagates Pod level hugepage cgroup to containers with the following changes:

- Pod level hugepage cgroup when unset in container
- Unit test propagate pod level hugepages to containers

Additionally adds:

- Validation logic for pod level hugepages
- Unit test pod level hugepage default and validation logic
- E2E tests for container hugepage resources immutability

Which issue(s) this PR fixes:

Fixes #132543

Special notes for your reviewer:

Does this PR introduce a user-facing change?

- Changes the underlying logic to propagate the pod-level hugepage cgroup to containers when they do not specify hugepage resources.
- Adds validation that enforces the aggregated container hugepage limits to be less than or equal to the pod-level limits. This was already enforced indirectly through the requests defaulted from the specified limits, but that check did not make the rule explicit for both hugepage requests and limits.
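The validation rule in the second note can be illustrated with a minimal sketch. It is not the code added by this PR: the `HugepageLimits` type and the `overCommitted` helper are hypothetical, quantities are plain byte counts instead of `resource.Quantity`, and skipping resources that have no pod-level limit is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// HugepageLimits maps a hugepage resource name (for example "hugepages-2Mi")
// to a limit in bytes. It is a simplified, hypothetical stand-in for
// v1.ResourceList.
type HugepageLimits map[string]int64

// overCommitted returns the sorted names of hugepage resources whose
// aggregated container limits exceed the pod-level limit. Resources with no
// pod-level limit are skipped (an assumption of this sketch).
func overCommitted(podLimits HugepageLimits, containers []HugepageLimits) []string {
	// Sum the limits of every container, per hugepage size.
	total := HugepageLimits{}
	for _, c := range containers {
		for name, limit := range c {
			total[name] += limit
		}
	}

	var bad []string
	for name, sum := range total {
		if podLimit, ok := podLimits[name]; ok && sum > podLimit {
			bad = append(bad, name)
		}
	}
	sort.Strings(bad)
	return bad
}

func main() {
	const mi = int64(1) << 20
	pod := HugepageLimits{"hugepages-2Mi": 6 * mi}

	fits := []HugepageLimits{{"hugepages-2Mi": 2 * mi}, {"hugepages-2Mi": 4 * mi}}
	tooBig := []HugepageLimits{{"hugepages-2Mi": 4 * mi}, {"hugepages-2Mi": 4 * mi}}

	fmt.Println(overCommitted(pod, fits))   // prints []
	fmt.Println(overCommitted(pod, tooBig)) // prints [hugepages-2Mi]
}
```

In the second call the two containers together ask for 8Mi of 2Mi hugepages against a 6Mi pod-level limit, so the spec would be rejected under this rule.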

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [Other doc]: https://docs.google.com/document/d/1JaqE2eRmFAPlRayv8vsAWE4SmQCVXQLr9rFPhEaPlvQ/edit?usp=sharing

k8s-ci-robot · 2025-03-27T18:24:22Z

Hi @KevinTMtz. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.


KevinTMtz · 2025-03-27T18:32:07Z

/assign @ndixita

liggitt · 2025-07-17T23:35:20Z

Validation changes and API doc changes look good. Only outstanding comment is the request for testing of pod / pod-spec that incorporates the defaulting logic at #131089 (comment)

ndixita · 2025-07-22T03:30:09Z

/retest

ndixita · 2025-07-22T04:21:03Z

+			}
+
+			// Step 2: Deployment defaulting and validation
+			podForPodValidation = makePod(tc.podResources, tc.containerResources)


nit: use a new, separate variable here that is a copy of podForPodValidation, just to avoid confusion

KevinTMtz · 2025-07-22

I added numbering to the variables, thank you.

liggitt · 2025-07-22T13:17:19Z

+		}
+
+		// We do not default pod-level hugepage limits if there is a hugepage request.
+		if _, exists := pod.Spec.Resources.Requests[key]; exists {


If a pod is created / defaulted in a 1.34 server with this change, what will happen when that pod is read from / updated via a 1.33 API server? The 1.33 server will default that field, right? That means a read-from-1.33 / update-via-1.34 will appear as a mutation of the pod-level limits and be rejected by 1.34 validation?

KevinTMtz · 2025-07-22

Yes, it would not pass validation; this change tightens validation. However, it rejects only one specific case, which would not make sense for the user anyway, because all the other cases are already rejected without the change in defaulting logic.

Cases that are already rejected:

- The pod-level hugepage request is not equal to the aggregated container hugepage limits.
  - Current behavior: the defaulting logic sets the pod-level limit from the aggregated container hugepage limits. Because the defaulted pod-level hugepage limit differs from the specified pod-level hugepage request, the spec is invalid.
  - With the new change: the spec is still rejected, but because a hugepage request was specified without a hugepage limit, not because the request and limit are unequal.

The single case where the pod is currently accepted but would be rejected with the change:

- The pod-level hugepage request is equal to the aggregated container hugepage limits.
  - Current behavior: the defaulting logic sets the pod-level limit from the aggregated container hugepage limits. Because the defaulted pod-level hugepage limit equals the specified pod-level request, the spec is valid.
  - With the new change: the spec does not pass validation, because the pod-level hugepage limit is not defaulted and validation rejects a hugepage request that has no corresponding hugepage limit.
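The two cases above can be traced with a small model. This is a sketch under stated assumptions, not the code in pkg/apis/core/v1/defaults.go: `PodSpec`, `Resources`, `defaultPodLimits`, `validate`, and `check` are hypothetical names, quantities are plain byte counts, and the `skipIfRequestSet` flag stands in for the behavior before (false) and after (true) the change.

```go
package main

import (
	"errors"
	"fmt"
)

// Resources maps a hugepage resource name to a quantity in bytes.
// Hypothetical stand-in for v1.ResourceList.
type Resources map[string]int64

// PodSpec is a reduced, hypothetical model of the fields discussed here.
type PodSpec struct {
	PodRequests     Resources
	PodLimits       Resources
	ContainerLimits []Resources
}

var (
	errMissingLimit = errors.New("hugepage request set without a limit")
	errNotEqual     = errors.New("hugepage request and limit must be equal")
)

// defaultPodLimits fills unset pod-level limits from the aggregated container
// limits. When skipIfRequestSet is true, a resource that has an explicit
// pod-level request is left undefaulted.
func defaultPodLimits(spec *PodSpec, skipIfRequestSet bool) {
	if spec.PodLimits == nil {
		spec.PodLimits = Resources{}
	}
	total := Resources{}
	for _, c := range spec.ContainerLimits {
		for name, qty := range c {
			total[name] += qty
		}
	}
	for name, qty := range total {
		if _, set := spec.PodLimits[name]; set {
			continue
		}
		if _, requested := spec.PodRequests[name]; requested && skipIfRequestSet {
			continue
		}
		spec.PodLimits[name] = qty
	}
}

// validate requires every pod-level hugepage request to have an equal limit.
func validate(spec PodSpec) error {
	for name, req := range spec.PodRequests {
		limit, ok := spec.PodLimits[name]
		if !ok {
			return errMissingLimit
		}
		if limit != req {
			return errNotEqual
		}
	}
	return nil
}

// check builds a pod with only a pod-level request and one container limit,
// then runs defaulting followed by validation.
func check(podRequest, containerLimit int64, skipIfRequestSet bool) error {
	spec := PodSpec{
		PodRequests:     Resources{"hugepages-2Mi": podRequest},
		ContainerLimits: []Resources{{"hugepages-2Mi": containerLimit}},
	}
	defaultPodLimits(&spec, skipIfRequestSet)
	return validate(spec)
}

func main() {
	const mi = int64(1) << 20
	fmt.Println(check(4*mi, 2*mi, false)) // prints hugepage request and limit must be equal
	fmt.Println(check(4*mi, 2*mi, true))  // prints hugepage request set without a limit
	fmt.Println(check(2*mi, 2*mi, false)) // prints <nil>
	fmt.Println(check(2*mi, 2*mi, true))  // prints hugepage request set without a limit
}
```

Only the third call is accepted, which matches the single case described above: a pod-level request equal to the aggregated container limits passes with the old defaulting and is rejected once the limit is no longer defaulted.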

liggitt · 2025-07-22

OK, I need to think through that explanation.

Since the updates to this PR added defaulting changes, I am pulling it back into the API review queue to reason through those and make sure we're still in good shape.

KevinTMtz · 2025-07-24T17:11:21Z

/retest

liggitt · 2025-07-24T21:03:38Z

+		}
+
+		// We do not default pod-level hugepage limits if there is a hugepage request.
+		if _, exists := pod.Spec.Resources.Requests[key]; exists {


Ok, this looks good, make sure we have the following test scenarios covered:

- PodSpec in a Deployment with container hugepages limit and no request
  - I think this doesn't get defaulted (e.g. in a Deployment), and passes validation by skipping the limit && request equality check on podspec
  - the eventual pod would default container hugepages request, and still pass validation
  - make sure pod-level resources also avoids defaulting hugepages, and validation is happy with this scenario both for podspec and pod
- Pod with a pod-level hugepages request and no limit errors as expected
- Pod with a pod-level hugepages request and limit works as expected

liggitt · 2025-07-24T21:08:21Z

Defaulting change looks good, this is on a field that was alpha-gated in 1.33. The change makes it clearer that someone explicitly setting pod-level hugepages requests must set the corresponding pod-level limit themselves.

Marked as API approved, will tag once the test coverage is verified and the clarification at #131089 (comment) is updated

The aggregated container hugepage limits cannot be greater than the pod-level limits. This was already enforced through the requests defaulted from the specified limits, but it was not made explicit for both hugepage requests and limits.

Pod level hugepage resources are not propagated to the containers, only pod level cgroup values are propagated to the containers when they do not specify hugepage resources.
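The distinction between propagating resources and propagating cgroup values can be sketched as follows. This is an illustration only, not the kubelet code changed by this PR: `effectiveHugepageLimits` is a hypothetical helper, limits are plain byte counts, and applying the fallback per hugepage size is an assumption of the sketch.

```go
package main

import "fmt"

// effectiveHugepageLimits computes the hugepage limits to apply to a
// container's cgroup. A limit set on the container wins; otherwise the
// pod-level limit for that hugepage size is used. The inputs are not
// modified, so the container's declared resources stay as written.
func effectiveHugepageLimits(podLimits, containerLimits map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(podLimits)+len(containerLimits))
	for name, limit := range podLimits {
		out[name] = limit
	}
	for name, limit := range containerLimits {
		out[name] = limit // an explicit container value overrides the pod-level one
	}
	return out
}

func main() {
	const mi = int64(1) << 20
	podLimits := map[string]int64{"hugepages-2Mi": 4 * mi}

	// Container without hugepage resources: the pod-level value is used.
	unset := effectiveHugepageLimits(podLimits, nil)
	fmt.Println(unset["hugepages-2Mi"] == 4*mi) // prints true

	// Container with its own limit: the container value is kept.
	explicit := effectiveHugepageLimits(podLimits, map[string]int64{"hugepages-2Mi": 2 * mi})
	fmt.Println(explicit["hugepages-2Mi"] == 2*mi) // prints true
}
```

Returning a new map rather than writing into the container's resources mirrors the statement above: only the cgroup value changes, and the container spec is left untouched.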

KevinTMtz · 2025-07-24T21:49:55Z

> Ok, this looks good, make sure we have the following test scenarios covered:
>
> PodSpec in a Deployment with container hugepages limit and no request
>
> - I think this doesn't get defaulted (e.g. in a Deployment), and passes validation by skipping the limit && request equality check on podspec
> - the eventual pod would default container hugepages request, and still pass validation
> - make sure pod-level resources also avoids defaulting hugepages, and validation is happy with this scenario both for podspec and pod

Currently, no E2E tests focus on the PodSpec; the existing E2E tests focus on pods. Would it be possible to add those in a follow-up PR? Deployments undergo the same resource validation as pods, so we would be testing the same cases, though only the ones that are valid without the defaulting.

As for unit tests, the newly added pkg/api/testing/validate_pod_level_defaults_test.go covers those cases.

> Pod with a pod-level hugepages request and no limit errors as expected

Unit and E2E tests added in #130577, together with the new tests added in this PR, cover this functionality.

> Pod with a pod-level hugepages request and limit works as expected

Unit and E2E tests added in #130577, together with the new tests added in this PR, cover this functionality.

liggitt · 2025-07-24T21:50:52Z

/lgtm
/approve

k8s-ci-robot · 2025-07-24T21:50:58Z

LGTM label has been added.

Git tree hash: 6922a6382899bc3073cb388b5738aeef21eb5eec

k8s-ci-robot · 2025-07-24T21:51:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: KevinTMtz, liggitt, tallclair

Needs approval from an approver in each of these files:

~~api/OWNERS~~ [liggitt]
~~pkg/api/OWNERS~~ [liggitt]
~~pkg/apis/OWNERS~~ [liggitt]
~~pkg/generated/openapi/OWNERS~~ [liggitt]
~~pkg/kubelet/OWNERS~~ [liggitt,tallclair]
~~staging/src/k8s.io/api/OWNERS~~ [liggitt]
~~test/e2e_node/OWNERS~~ [liggitt,tallclair]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

KevinTMtz · 2025-07-24T22:01:07Z

/retest

KevinTMtz · 2025-07-24T22:44:07Z

/retest

KevinTMtz · 2025-07-24T23:48:52Z

/retest

pacoxu · 2025-07-25T02:00:24Z

/test pull-kubernetes-e2e-gce

k8s-ci-robot · 2025-07-25T02:06:13Z

@KevinTMtz: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce | `1bc995c` | link | unknown | `/test pull-kubernetes-e2e-gce` |
| pull-kubernetes-integration | `1bc995c` | link | unknown | `/test pull-kubernetes-integration` |


KevinTMtz · 2025-07-25T02:06:56Z

/retest

k8s-ci-robot added the needs-triage, needs-ok-to-test, and needs-priority labels Mar 27, 2025

k8s-ci-robot requested review from bart0sh and rphillips March 27, 2025 18:24

k8s-ci-robot added the area/kubelet and sig/node labels and removed the do-not-merge/needs-sig label Mar 27, 2025

github-project-automation Bot added this to SIG Node: code and documentation PRs Mar 27, 2025

github-project-automation Bot moved this to Triage in SIG Node: code and documentation PRs Mar 27, 2025

k8s-ci-robot assigned ndixita Mar 27, 2025

bart0sh moved this from Triage to Work in progress in SIG Node: code and documentation PRs Apr 7, 2025

ndixita moved this to Needs Triage in SIG Node: Pod Level Resources Jun 23, 2025

ndixita added this to SIG Node: Pod Level Resources Jun 23, 2025

ndixita removed the status in SIG Node: Pod Level Resources Jun 23, 2025

KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from b48a3ed to 3ea8028 Compare June 24, 2025 21:31

KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 3ea8028 to ec198a6 Compare June 25, 2025 17:50

k8s-ci-robot added the sig/apps label Jun 25, 2025

KevinTMtz force-pushed the pod-level-hugepage-cgroups branch from 352b28d to cc23ad7 Compare July 17, 2025 19:02

liggitt moved this from Changes requested to API review completed, 1.34 in API Reviews Jul 17, 2025

ndixita reviewed Jul 22, 2025

liggitt reviewed Jul 22, 2025

Comment thread pkg/apis/core/v1/defaults.go

liggitt reviewed Jul 22, 2025

KevinTMtz added 2 commits July 24, 2025 17:13

Pod level hugepage cgroup when unset in container

52b4574

Unit test propagate pod level hugepages to containers

8e3f93c

liggitt reviewed Jul 24, 2025

KevinTMtz added 4 commits July 24, 2025 21:29

Unit test pod level hugepage Default and Validation logic

9f5b09e

E2E tests for container hugepage resources immutability

f925e55

Pod level hugepage resources are not propagated to the containers, only pod level cgroup values are propagated to the containers when they do not specify hugepage resources.

Generated files

1bc995c

jenshu mentioned this pull request Jul 30, 2025

Pod level resources kubernetes/enhancements#2837

Open

23 tasks

rashansmith mentioned this pull request Aug 17, 2025

Update release notes draft to version v1.34.0-rc.0 kubernetes/sig-release#2835

Merged

KevinTMtz mentioned this pull request Dec 1, 2025

REQUEST: New membership for KevinTMtz kubernetes/org#6013

Closed

11 tasks

KevinTMtz mentioned this pull request Mar 11, 2026

[PodLevelResources] apiserver: fix pod-level resource limits defaulting on update #136676

Open
