DRA: ReservedFor Workloads #5194
Closed
Labels
sig/node: Categorizes an issue or PR as relevant to SIG Node.
sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
stage/alpha: Denotes an issue tracking an enhancement targeted for Alpha status.
wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.
Enhancement Description
Currently, when the scheduler allocates a ResourceClaim for a given Pod, it adds that Pod to the ResourceClaimStatus.ReservedFor list. A claim shared among multiple pods has one entry per pod in this list. The ResourceClaimController uses the list to decide when to deallocate the claim: it does so once the list is empty.
The list is limited to 256 entries, so at most 256 pods can share a claim. However, some workloads are much larger and may share a ResourceClaim across many more pods, even thousands. Simply raising the limit into the thousands is not a good long-term solution.
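The per-pod bookkeeping described above can be sketched roughly as follows. This is a simplified model, not the Kubernetes implementation: the `Claim` and `ConsumerRef` types, the `Reserve` and `Release` methods, and the `maxReservedFor` constant are hypothetical names chosen for illustration, with only the 256-entry limit and the "deallocate when empty" rule taken from the description.

```go
package main

import (
	"errors"
	"fmt"
)

// maxReservedFor mirrors the 256-entry limit mentioned in the description.
// The constant name is an assumption made for this sketch.
const maxReservedFor = 256

// ConsumerRef is a simplified, hypothetical stand-in for one ReservedFor entry.
type ConsumerRef struct {
	Resource string // e.g. "pods"
	Name     string
	UID      string
}

// Claim is a trimmed-down, hypothetical model of a ResourceClaim's status.
type Claim struct {
	Allocated   bool
	ReservedFor []ConsumerRef
}

// ErrReservedForFull is returned when the list has reached its limit.
var ErrReservedForFull = errors.New("reservedFor list is full")

// Reserve adds a consumer to the list, one entry per pod.
func (c *Claim) Reserve(ref ConsumerRef) error {
	for _, r := range c.ReservedFor {
		if r.UID == ref.UID {
			return nil // already reserved, nothing to do
		}
	}
	if len(c.ReservedFor) >= maxReservedFor {
		return ErrReservedForFull
	}
	c.ReservedFor = append(c.ReservedFor, ref)
	return nil
}

// Release removes a consumer and marks the claim as deallocated
// once no consumers remain.
func (c *Claim) Release(uid string) {
	kept := c.ReservedFor[:0]
	for _, r := range c.ReservedFor {
		if r.UID != uid {
			kept = append(kept, r)
		}
	}
	c.ReservedFor = kept
	if len(c.ReservedFor) == 0 {
		c.Allocated = false
	}
}

func main() {
	claim := &Claim{Allocated: true}
	// Try to reserve the claim for 300 pods; the limit stops this early.
	for i := 0; i < 300; i++ {
		ref := ConsumerRef{
			Resource: "pods",
			Name:     fmt.Sprintf("pod-%d", i),
			UID:      fmt.Sprintf("uid-%d", i),
		}
		if err := claim.Reserve(ref); err != nil {
			fmt.Printf("pod-%d rejected: %v\n", i, err)
			break
		}
	}
	fmt.Println("reserved consumers:", len(claim.ReservedFor))
}
```

In this model, any workload with more pods than the limit cannot have all of its pods reserved against a single shared claim, which is the scaling problem the issue describes.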
Instead, this proposal allows a claim to be reserved for a workload. For example, rather than listing individual pods, the list could name the Job, ReplicaSet, or StatefulSet whose pods share the ResourceClaim. This avoids race conditions as pods come and go, and does not require listing every pod.
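One way to picture a workload-level reservation is the sketch below. It is an illustration only: the issue does not specify the API shape or how pods are matched to a reserved workload, so the `Kind` field, the `OwnerUID` field, and the `Covers` method are all assumptions. Here a pod counts as covered if it is listed directly or if its owning workload is listed.

```go
package main

import "fmt"

// ConsumerRef is a hypothetical ReservedFor entry that may name either
// a single Pod or a whole workload (Job, ReplicaSet, StatefulSet).
type ConsumerRef struct {
	Kind string
	Name string
	UID  string
}

// Pod is a minimal model of a pod; OwnerUID is the UID of the workload
// that controls it (empty if the pod has no owner). Assumed for this sketch.
type Pod struct {
	Name     string
	UID      string
	OwnerUID string
}

// Claim holds only the reservation list for this sketch.
type Claim struct {
	ReservedFor []ConsumerRef
}

// Covers reports whether the pod may use the claim: either the pod itself
// is listed, or the workload that owns it is listed.
func (c *Claim) Covers(p Pod) bool {
	for _, ref := range c.ReservedFor {
		if ref.Kind == "Pod" {
			if ref.UID == p.UID {
				return true
			}
			continue
		}
		// Workload entry: match on the pod's owner.
		if p.OwnerUID != "" && ref.UID == p.OwnerUID {
			return true
		}
	}
	return false
}

func main() {
	// A single entry for the Job replaces thousands of per-pod entries.
	claim := Claim{ReservedFor: []ConsumerRef{
		{Kind: "Job", Name: "training", UID: "job-1"},
	}}

	covered := 0
	for i := 0; i < 5000; i++ {
		p := Pod{
			Name:     fmt.Sprintf("training-%d", i),
			UID:      fmt.Sprintf("pod-%d", i),
			OwnerUID: "job-1",
		}
		if claim.Covers(p) {
			covered++
		}
	}
	fmt.Println("entries:", len(claim.ReservedFor), "pods covered:", covered)
}
```

With this shape the list length no longer grows with the number of pods, and the list does not need updating as individual pods are created or deleted. How deallocation would be triggered for a workload entry is left open here, since the description does not say.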
cc @pohly @klueska @thebinaryone1
(k/enhancements) update PR(s):
(k/k) update PR(s):
(k/website) update PR(s):
Please keep this description up to date. This will help the Enhancement Team to track the evolution of the enhancement efficiently.