
DRA extended resource quota #134210

Merged
k8s-ci-robot merged 1 commit into kubernetes:master from yliaog:admit_quota on Nov 6, 2025

Contributor

@yliaog yliaog commented Sep 22, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adjusts the DRA extended resource quota to include device usage from regular resource claims.

Which issue(s) this PR is related to:

#133671

Special notes for your reviewer:

Does this PR introduce a user-facing change?

ResourceQuota now counts device class requests within a ResourceClaim object as consuming two additional quotas when the DRAExtendedResource feature is enabled:
- `requests.deviceclass.resource.k8s.io/<deviceclass>`, with a quantity equal to the worst-case count of devices requested
- `requests.<extended resource name>`, for requests whose device class maps to an extended resource
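
The accounting rule in the release note can be illustrated with a minimal Go sketch. This is not the evaluator code from this PR: `deviceRequest`, `claimUsage`, and the example class and resource names are hypothetical, and the sketch assumes the extended resource quota is charged the same worst-case count as the device class quota, which the release note does not spell out.

```go
package main

import "fmt"

// deviceRequest is a hypothetical, simplified stand-in for one device request
// in a ResourceClaim; it is not the real resource.k8s.io API type.
type deviceRequest struct {
	deviceClass    string // name of the requested DeviceClass
	worstCaseCount int64  // most devices this request could end up consuming
}

// claimUsage returns the quota keys and quantities a claim would be charged
// under the two rules in the release note. class2ResourceName maps a device
// class name to its extended resource name, if it has one.
func claimUsage(requests []deviceRequest, class2ResourceName map[string]string) map[string]int64 {
	usage := map[string]int64{}
	for _, r := range requests {
		// Rule 1: every request is charged against its device class.
		usage["requests.deviceclass.resource.k8s.io/"+r.deviceClass] += r.worstCaseCount
		// Rule 2: classes that map to an extended resource are also charged
		// against that resource (same quantity: an assumption of this sketch).
		if name, ok := class2ResourceName[r.deviceClass]; ok {
			usage["requests."+name] += r.worstCaseCount
		}
	}
	return usage
}

func main() {
	// Hypothetical names used only for illustration.
	mapping := map[string]string{"gpu-class": "example.com/gpu"}
	requests := []deviceRequest{
		{deviceClass: "gpu-class", worstCaseCount: 2},
		{deviceClass: "nic-class", worstCaseCount: 1},
	}
	for key, qty := range claimUsage(requests, mapping) {
		fmt.Printf("%s = %d\n", key, qty)
	}
}
```

In this sketch the `gpu-class` request is charged twice (once per quota key), while `nic-class`, which has no extended resource mapping, is charged only against its device class key.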

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added the release-note-none, size/XL, kind/feature, cncf-cla: yes, do-not-merge/needs-sig, needs-triage, needs-priority, area/apiserver, area/test, kind/api-change, sig/api-machinery, sig/apps, and sig/node labels and removed the do-not-merge/needs-sig label Sep 22, 2025
@k8s-ci-robot k8s-ci-robot added the sig/scheduling label Sep 22, 2025
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Apps Sep 22, 2025
@k8s-ci-robot k8s-ci-robot added the sig/testing and wg/device-management labels Sep 22, 2025
@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@k8s-ci-robot k8s-ci-robot added the size/XL label and removed the size/XXL label Nov 3, 2025
@yliaog
Contributor Author

yliaog commented Nov 4, 2025

@liggitt this PR is decoupled from #134882: it does not rely on the specifics of how device requests or <container, resource, request> mappings are created, so it can be reviewed separately. Please take a look.

Comment thread pkg/quota/v1/evaluator/core/registry.go Outdated
Comment thread pkg/quota/v1/evaluator/core/resource_claims.go Outdated
@yliaog
Contributor Author

yliaog commented Nov 5, 2025

/retest

Comment thread pkg/quota/v1/evaluator/core/resource_claims.go Outdated
Comment thread pkg/quota/v1/evaluator/core/resource_claims.go Outdated
@liggitt
Member

liggitt commented Nov 5, 2025

lgtm, go ahead and squash for merge

Contributor

@pohly pohly left a comment

Some nits on the DRA part. Non-blocking, can also be a future cleanup PR.

func (c *ExtendedResourceCache) updateClassMapping(deviceClass *resourceapi.DeviceClass) {
	if deviceClass == nil {
		return
	}
Contributor

Why the nil check?

As far as I can tell it's always called with a valid instance.

Might be worth removing also for the other methods, I missed that earlier.

Contributor Author

removed the nil check in this and other methods

// mapping maps extended resource name to device class name
mapping map[v1.ResourceName]string
// classMapping maps device class name to extended resource name
classMapping map[string]string
Contributor

Hmm, now the naming of these "mappings" is a bit unfortunate.

How about this?

mapping -> resourceName2class
classMapping -> class2ResourceName

The log message also seems a bit ambiguous.

Contributor Author

renamed to resourceName2class and class2ResourceName
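
For readers following the rename, here is a minimal sketch of a cache that keeps the two lookups in sync. It is not the `ExtendedResourceCache` from this PR: `deviceClass` is a hypothetical stand-in for `resourceapi.DeviceClass`, plain strings replace `v1.ResourceName` to keep the example self-contained, and conflict handling when two classes claim the same extended resource name is deliberately left out.

```go
package main

import (
	"fmt"
	"sync"
)

// deviceClass is a hypothetical stand-in for resourceapi.DeviceClass,
// reduced to the two fields this sketch needs.
type deviceClass struct {
	name                 string
	extendedResourceName string // empty when the class maps to no extended resource
}

// extendedResourceCache keeps both lookup directions under one lock.
type extendedResourceCache struct {
	mu sync.RWMutex
	// resourceName2class maps extended resource name to device class name.
	resourceName2class map[string]string
	// class2ResourceName maps device class name to extended resource name.
	class2ResourceName map[string]string
}

func newCache() *extendedResourceCache {
	return &extendedResourceCache{
		resourceName2class: map[string]string{},
		class2ResourceName: map[string]string{},
	}
}

// dropLocked removes both entries for a class. The caller must hold c.mu.
func (c *extendedResourceCache) dropLocked(className string) {
	if old, ok := c.class2ResourceName[className]; ok {
		// Only delete the reverse entry if it still points at this class.
		if c.resourceName2class[old] == className {
			delete(c.resourceName2class, old)
		}
		delete(c.class2ResourceName, className)
	}
}

// update records the current mapping for dc, replacing any stale one.
// Callers always pass a valid instance, so there is no nil check.
func (c *extendedResourceCache) update(dc *deviceClass) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.dropLocked(dc.name)
	if dc.extendedResourceName == "" {
		return
	}
	c.resourceName2class[dc.extendedResourceName] = dc.name
	c.class2ResourceName[dc.name] = dc.extendedResourceName
}

// remove forgets both directions of the mapping for dc.
func (c *extendedResourceCache) remove(dc *deviceClass) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.dropLocked(dc.name)
}

func main() {
	c := newCache()
	c.update(&deviceClass{name: "gpu-class", extendedResourceName: "example.com/gpu"})
	fmt.Println(c.class2ResourceName["gpu-class"], c.resourceName2class["example.com/gpu"])
	c.remove(&deviceClass{name: "gpu-class"})
	fmt.Println(len(c.class2ResourceName), len(c.resourceName2class))
}
```

Naming each map after its direction makes call sites self-describing, which is the point of the suggested rename.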

func (c *ExtendedResourceCache) removeClassMapping(deviceClass *resourceapi.DeviceClass) {
	if deviceClass == nil {
		return
	}
Contributor

Same here.

Contributor Author

fixed

Comment thread test/e2e/dra/dra.go Outdated
pod := b.Pod()
res := v1.ResourceList{}

// b.ExtendedResourceName(0) is added to the deivce class with name: b.ClassName()+"0"
Contributor

Suggested change
- // b.ExtendedResourceName(0) is added to the deivce class with name: b.ClassName()+"0"
+ // b.ExtendedResourceName(0) is added to the device class with name: b.ClassName()+"0"

Contributor Author

fixed

@yliaog
Contributor Author

yliaog commented Nov 5, 2025

@liggitt squashed the commits, PTAL

@yliaog
Contributor Author

yliaog commented Nov 6, 2025

/retest

@k8s-ci-robot k8s-ci-robot added the release-note label Nov 6, 2025
@liggitt
Member

liggitt commented Nov 6, 2025

/lgtm
/approve
/sig scheduling api-machinery node

@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 416626f4f143ad14d618f284299a1445545e2cbf

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, yliaog

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pacoxu
Member

pacoxu commented Nov 6, 2025

/retest-required

@yliaog
Contributor Author

yliaog commented Nov 6, 2025

/retest

5 similar comments

@pohly
Contributor

pohly commented Nov 6, 2025

/lgtm
/retest

Known flakes.

@pohly
Contributor

pohly commented Nov 6, 2025

> squashed the commits, PTAL

Please avoid rebases during squashing. You can use `git rebase -i --keep-base` instead.

@k8s-ci-robot
Contributor

@yliaog: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-kubernetes-e2e-gce | 870062d | link | unknown | `/test pull-kubernetes-e2e-gce` |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@pacoxu
Member

pacoxu commented Nov 6, 2025

/test pull-kubernetes-e2e-gce
https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/134210/pull-kubernetes-e2e-gce/1986325424531050496
Failing test: Kubernetes e2e suite: [It] [sig-node] FileKeyRef [FeatureGate:EnvFiles] [Beta] should allow ephemeralContainer to consume fileKeyRef.
@HirazawaUi could you check whether this is a new flake after #134414?

Comment thread test/e2e/dra/dra.go
b := drautils.NewBuilder(f, driver)
b.UseExtendedResourceName = true

ginkgo.It("must run a pod with extended resource with resource quota", func(ctx context.Context) {
Contributor

@pohly pohly Nov 6, 2025

This is flaking: #135177


Labels

approved, area/apiserver, area/test, cncf-cla: yes, kind/api-change, kind/feature, lgtm, needs-priority, release-note, sig/api-machinery, sig/apps, sig/node, sig/scheduling, sig/testing, size/XL, triage/accepted, wg/device-management
