[PodLevelResources] Propagate Pod level hugepage cgroup to containers#131089
k8s-ci-robot merged 6 commits into kubernetes:master
Conversation
Hi @KevinTMtz. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with the verification command; once the patch is verified, the new status will be reflected on the PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
|
/assign @ndixita |
|
Validation changes and API doc changes look good. The only outstanding comment is the request for testing of a pod / pod spec that exercises the defaulting logic at #131089 (comment)
|
/retest |
Reviewed code context:

```go
// Step 2: Deployment defaulting and validation
podForPodValidation = makePod(tc.podResources, tc.containerResources)
```
nit: use a different, new variable here which is a copy of podForPodValidation, just to avoid confusion
I added numbering to the variables, thank you.
Reviewed code context:

```go
// We do not default pod-level hugepage limits if there is a hugepage request.
if _, exists := pod.Spec.Resources.Requests[key]; exists {
```
If a pod is created / defaulted in a 1.34 server with this change, what will happen when that pod is read from / updated via a 1.33 API server? The 1.33 server will default that field, right? That means a read-from-1.33 / update-via-1.34 will appear as a mutation of the pod-level limits and be rejected by 1.34 validation?
Yes, it would not pass validation; this change tightens validation. However, it rejects only one specific case that would not make sense for the user anyway, because all the other cases are already rejected without the change in defaulting logic.

Cases that are already rejected:

- The pod-level hugepage request is not equal to the aggregated container hugepage limits. The defaulting logic would set the pod-level limit from the aggregated container hugepage limits, but because the defaulted pod-level hugepage limit differs from the specified pod-level hugepage request, the spec would be invalid.
  - With the new change: it is still rejected, but because the hugepage limit is not specified alongside the hugepage request, not because the request and limit are unequal.

The single case where the pod is currently accepted but would be rejected with the change:

- The pod-level hugepage request is equal to the aggregated container hugepage limits. The defaulting logic would set the pod-level limit from the aggregated container hugepage limits, and because the defaulted pod-level limit equals the specified pod-level request, the spec would be valid.
  - With the new change: this case no longer passes validation, because the pod-level hugepage limit is not defaulted, and validation rejects the spec for not specifying a hugepage limit when a hugepage request is specified.
ok, I need to think through that explanation
since the updates to this PR added defaulting changes, pulling back into the api review queue to try to reason through those and make sure we're still in good shape
Ok, this looks good, make sure we have the following test scenarios covered:

- PodSpec in a Deployment with a container hugepages limit and no request
  - I think this doesn't get defaulted (e.g. in a Deployment), and passes validation by skipping the `limit && request` equality check on the PodSpec; the eventual pod would default the container hugepages request, and still pass validation
  - make sure pod-level resources also avoid defaulting hugepages, and validation is happy with this scenario both for the PodSpec and the pod
- Pod with a pod-level hugepages request and no limit errors as expected
- Pod with a pod-level hugepages request and limit works as expected
|
/retest |
|
Defaulting change looks good, this is on a field that was alpha-gated in 1.33. The change makes it clearer that someone explicitly setting pod-level hugepages requests must set the corresponding pod-level limit themselves. Marked as API approved, will tag once the test coverage is verified and the clarification at #131089 (comment) is updated |
The aggregated container hugepage limits cannot be greater than the pod-level limits. This was already enforced via the requests defaulted from the specified limits; however, it was not made clear that this applies to both hugepage requests and limits.
Pod-level hugepage resources are not propagated to the containers; only the pod-level cgroup values are propagated to containers that do not specify hugepage resources.
Currently there are no E2E tests that focus on the PodSpec; the current E2E tests focus on pods. Would it be possible to add those in a follow-up PR? Deployments undergo the same resource validation as pods, so we would actually be testing the same cases, though only the ones that are valid without the defaulting. Regarding unit tests, the newly added
Unit and E2E tests added in #130577 together with the new tests added in this PR cover this functionality.
|
/lgtm |
|
LGTM label has been added. Git tree hash: 6922a6382899bc3073cb388b5738aeef21eb5eec
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: KevinTMtz, liggitt, tallclair. The full list of commands accepted by this bot can be found here. The pull request process is described here.
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
/test pull-kubernetes-e2e-gce |
|
@KevinTMtz: The following tests failed. Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.
|
/retest |
What type of PR is this?
What this PR does / why we need it:
Follow-up of [PodLevelResources] Pod Level Hugepage Resources.
This PR propagates Pod level hugepage cgroup to containers with the following changes:
Additionally adds:
Which issue(s) this PR fixes:
Fixes #132543
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: