[PodLevelResources] handle pod-level resource manager alignment #133279

Merged
k8s-ci-robot merged 6 commits into kubernetes:master from ffromani:pod-level-resource-managers on Jul 30, 2025

@ffromani
Contributor

@ffromani ffromani commented Jul 29, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR makes the CPU and memory managers skip, with a loud log message, pods that are candidates for exclusive resource allocation but that also specify pod-level resources.
The CPU and memory managers don't support these pods, so we must either fail admission or make the pods ineligible for exclusive resource allocation.
Following the conversation in the KEP, this PR implements the latter approach (see this thread: kubernetes/enhancements#5362 (comment)).
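
The fallback described above can be pictured as a small eligibility check. The following is a minimal sketch, not the kubelet's actual code: the `Pod` and `Container` types, the `wantsExclusiveCPUs` and `eligibleForExclusiveAllocation` helpers, and the log wording are all hypothetical stand-ins. It assumes a pod is a candidate for exclusive CPUs when it is in the Guaranteed QoS class and a container requests a whole number of CPUs.

```go
package main

import (
	"fmt"
	"log"
)

// Container is a hypothetical, simplified stand-in for a container spec.
type Container struct {
	Name     string
	CPUMilli int64 // requested CPU in millicores (requests == limits assumed)
}

// Pod is a hypothetical, simplified stand-in for a pod spec.
type Pod struct {
	Name                 string
	Guaranteed           bool // true when the pod is in the Guaranteed QoS class
	HasPodLevelResources bool // true when resources are set at pod level
	Containers           []Container
}

// wantsExclusiveCPUs reports whether a Guaranteed pod has at least one
// container asking for a whole number of CPUs.
func wantsExclusiveCPUs(p Pod) bool {
	if !p.Guaranteed {
		return false
	}
	for _, c := range p.Containers {
		if c.CPUMilli > 0 && c.CPUMilli%1000 == 0 {
			return true
		}
	}
	return false
}

// eligibleForExclusiveAllocation forces candidate pods that also use
// pod-level resources to be ineligible, logging loudly instead of
// failing admission.
func eligibleForExclusiveAllocation(p Pod) bool {
	if !wantsExclusiveCPUs(p) {
		return false
	}
	if p.HasPodLevelResources {
		log.Printf("pod %q uses pod-level resources; skipping exclusive resource allocation", p.Name)
		return false
	}
	return true
}

func main() {
	plain := Pod{Name: "plain", Guaranteed: true, Containers: []Container{{Name: "c", CPUMilli: 2000}}}
	podLevel := plain
	podLevel.Name = "pod-level"
	podLevel.HasPodLevelResources = true

	fmt.Println(eligibleForExclusiveAllocation(plain))    // true
	fmt.Println(eligibleForExclusiveAllocation(podLevel)) // false
}
```

The pod is still admitted in both cases; only the exclusive allocation is skipped, which is the "latter approach" named above.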

This PR also reuses parts of #132634 (with proper authorship attribution):
- Unit tests verifying that the CPU and Memory managers generate no hints and perform no alignment
- E2E tests verifying that the CPU and Memory managers generate no hints and perform no alignment

Which issue(s) this PR is related to:

Fixes: #132445

Special notes for your reviewer:

I'm reverting one of @KevinTMtz's commits to preserve the initial work as closely as possible, so that my PR is purely additive to that work.

Does this PR introduce a user-facing change?

- The CPU and Memory managers no longer perform any alignment or generate Topology manager hints for pods whose spec uses pod-level resources.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: [kubernetes/enhancements#2837](https://github.com/kubernetes/enhancements/issues/2837)

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress, release-note, size/XL, kind/feature, cncf-cla: yes, do-not-merge/needs-sig, and needs-triage labels Jul 29, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the needs-priority, area/kubelet, area/test, sig/node, and sig/testing labels Jul 29, 2025
@ffromani
Contributor Author

/sig node

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig label Jul 29, 2025
@k8s-ci-robot k8s-ci-robot requested review from klueska and mrunalp July 29, 2025 14:37
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ffromani

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Jul 29, 2025
@ffromani
Contributor Author

/test pull-kubernetes-node-kubelet-serial-podlevelresources

@ffromani
Contributor Author

TL;DR: right now I don't recall which other kube component is involved in the e2e_node flow besides the kubelet. I was thinking of the apiserver, which would make sense, but a quick ps on my test box is inconclusive.
If I hardcode the FG to true in kube_features.go, all the kube components get the right value, which explains why the test passes. So we need to figure out where we forgot to enable the FG, and that should be it (plus minor polishing in the commits).

ffromani added 2 commits July 29, 2025 20:19
When pod-level resources are detected, the cpu and memory managers
cannot engage because the feature is not yet compatible,
one of the main reasons being the managers only work at container level.

So, the managers have to detect if pod-level resources are in use,
and turn themselves into no-ops, skipping resource allocation,
should that be the case.

We add an intentionally loud log to inform the user, because
pods with pod-level resources landing on a node which cannot
actuate the desired spec are likely to be undesirable.

Signed-off-by: Francesco Romani <[email protected]>
Signed-off-by: Francesco Romani <[email protected]>
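
The no-op behaviour this commit message describes might look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this PR: `Pod`, `manager`, `Allocate`, and the log text are hypothetical, and CPUs are modelled as plain integer IDs. The point is that the manager returns success without touching its state, so the pod still runs but gets no exclusive resources.

```go
package main

import (
	"fmt"
	"log"
)

// Pod is a hypothetical, simplified stand-in for a pod spec.
type Pod struct {
	Name                 string
	HasPodLevelResources bool
}

// manager is a hypothetical resource manager handing out exclusive CPUs.
type manager struct {
	free        []int            // CPU IDs still available
	assignments map[string][]int // pod name -> exclusively assigned CPU IDs
}

// Allocate assigns n exclusive CPUs to the pod, or turns into a no-op
// (with a loud log) when the pod uses pod-level resources.
func (m *manager) Allocate(p Pod, n int) error {
	if p.HasPodLevelResources {
		log.Printf("WARNING: pod %q uses pod-level resources; exclusive allocation skipped", p.Name)
		return nil // no-op, not an admission failure
	}
	if n > len(m.free) {
		return fmt.Errorf("not enough free CPUs: want %d, have %d", n, len(m.free))
	}
	// Copy so the assignment does not alias the free list's backing array.
	m.assignments[p.Name] = append([]int(nil), m.free[:n]...)
	m.free = m.free[n:]
	return nil
}

func main() {
	m := &manager{free: []int{0, 1, 2, 3}, assignments: map[string][]int{}}

	_ = m.Allocate(Pod{Name: "pinned"}, 2)
	_ = m.Allocate(Pod{Name: "pod-level", HasPodLevelResources: true}, 2)

	fmt.Println(m.assignments) // map[pinned:[0 1]]
	fmt.Println(len(m.free))   // 2
}
```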
@ffromani ffromani force-pushed the pod-level-resource-managers branch from a1926bd to a3a767b Compare July 29, 2025 18:20
@ffromani
Contributor Author

Removed the WIP marker from the core commit that fixes the logic. IMO it is ready and as good as I can make it. What remains is fixing the e2e tests by, as discussed, correctly and fully enabling the FGs. @ndixita I have to leave now; feel free to pick my commit into your PR and move on from there.

@ndixita
Contributor

ndixita commented Jul 29, 2025

/test pull-kubernetes-node-kubelet-serial-podlevelresources

@ndixita
Contributor

ndixita commented Jul 29, 2025

/test pull-kubernetes-node-kubelet-serial-cpu-manager
/test pull-kubernetes-node-kubelet-serial-hugepages
/test pull-kubernetes-node-kubelet-serial-memory-manager
/test pull-kubernetes-node-kubelet-serial-topology-manager

@ndixita
Contributor

ndixita commented Jul 29, 2025

/test pull-kubernetes-e2e-gce

@ndixita
Contributor

ndixita commented Jul 29, 2025

/sig node

@tallclair
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label Jul 29, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 5269177f645210309af97c6304dc7cad08142c9a

@tallclair
Member

/retitle [PodLevelResources] handle pod-level resource manager alignment

@k8s-ci-robot k8s-ci-robot changed the title from "WIP: [PodLevelResources] handle pod-level resource manager alignment" to "[PodLevelResources] handle pod-level resource manager alignment" Jul 29, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Jul 29, 2025
@ndixita ndixita moved this to Needs Review in SIG Node: Pod Level Resources Jul 29, 2025
@ndixita
Contributor

ndixita commented Jul 29, 2025

> removed the WIP from the core commit fixing the logic. IMO is ready and as good as I can make it. The pending thing is fixing the e2e tests as discussed activating correctly and fully the FGs. @ndixita I have to leave now, feel free to pick my commit in your PR and move on from there.

@ffromani I have shared the details in the Slack chat. The tests were failing because the job configuration did not have PodLevelResources enabled. Everything else looks good. Thank you.

@ndixita
Contributor

ndixita commented Jul 29, 2025

/lgtm

@tallclair
Member

/milestone v1.34

Exception: https://groups.google.com/g/kubernetes-sig-release/c/WLVtwIEgiuQ/m/D5UqP_fQAAAJ

@k8s-ci-robot k8s-ci-robot added this to the v1.34 milestone Jul 29, 2025
@k8s-ci-robot k8s-ci-robot merged commit 91731d0 into kubernetes:master Jul 30, 2025
19 checks passed
@github-project-automation github-project-automation Bot moved this from Triage to Done in SIG Node CI/Test Board Jul 30, 2025
@github-project-automation github-project-automation Bot moved this from Needs Review to Done in SIG Node: Pod Level Resources Jul 30, 2025
@ffromani ffromani deleted the pod-level-resource-managers branch July 31, 2025 12:50
Labels

approved, area/kubelet, area/test, cncf-cla: yes, kind/feature, lgtm, needs-priority, needs-triage, release-note, sig/node, sig/testing, size/XL

Development

Successfully merging this pull request may close these issues.

Event for unsupported pod-level resource policies for alignment managers

5 participants