Skip to content

Add workload aware preemption#137606

Merged
k8s-ci-robot merged 3 commits intokubernetes:masterfrom
Argh4k:workload-preemption-v1alpha2
Mar 23, 2026
Merged

Add workload aware preemption#137606
k8s-ci-robot merged 3 commits intokubernetes:masterfrom
Argh4k:workload-preemption-v1alpha2

Conversation

@Argh4k
Copy link
Copy Markdown
Contributor

@Argh4k Argh4k commented Mar 10, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR implements workload aware preemption from KEP-5710.

Which issue(s) this PR is related to:

KEP-5710

Special notes for your reviewer:

This PR builds on top of two PRs: #136577 and #137562. Changes done in this PR starts with the Extend executor to work with pod groups commit

I am still actively extending test coverage for it.

Does this PR introduce a user-facing change?

When WorkloadAwarePreemption Feature Gate is enabled, and the Pod Group scheduling fails to find a place for the Pod Group, instead of running default preemption for each pod from the pod group, the workload aware preemption will be run for the whole group.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

[KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/5710-workload-aware-preemption)

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Mar 10, 2026
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 10, 2026
@k8s-ci-robot k8s-ci-robot added area/test sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Mar 10, 2026
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Mar 10, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Scheduling Mar 10, 2026
@Argh4k
Copy link
Copy Markdown
Contributor Author

Argh4k commented Mar 10, 2026

/area workload-aware

Copy link
Copy Markdown
Member

@tosi3k tosi3k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial pass - I didn't go through the last 4 commits since I need to switch context to another review.

Comment thread staging/src/k8s.io/kube-scheduler/framework/interface.go Outdated
Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Comment thread pkg/scheduler/framework/preemption/executor.go Outdated
Comment thread pkg/scheduler/testing/wrappers.go Outdated
Comment thread pkg/scheduler/framework/preemption/executor.go Outdated
// ExecutorPreemptor is an interface that represents a preemptor.
// It's mainly used to abstrac away fields needed for logging
// purposes during preemption calls done by Executor.
type ExecutorPreemptor interface {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we extend the already existing Preemptor interface defined in types.go instead?

Feels as if we define gazillion additional receiver methods here that could be specified in the Preemptor interface in the first place.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that Preemptor has more data in it, like CycleState for pods, which we do not need. This would force us to pass unnecessary data. Overall I would prefer to have this interface as an unimported internal only helper for unifying pod/pod group path, but Preempt method is also replaced in integration tests that are defined in other package.

Comment thread staging/src/k8s.io/kube-scheduler/framework/interface.go Outdated
Comment thread pkg/scheduler/framework/preemption/executor.go Outdated
Comment thread pkg/scheduler/framework/preemption/executor.go Outdated
@tosi3k tosi3k moved this from Needs Triage to 1st Reviewer In Progress in SIG Scheduling Mar 11, 2026
@Argh4k Argh4k force-pushed the workload-preemption-v1alpha2 branch from ddd7efb to 2f3e7bb Compare March 11, 2026 11:51
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Mar 11, 2026
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption.go
Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Copy link
Copy Markdown
Member

@dom4ha dom4ha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM without tests yet

Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
}
}

victimsToPreempt = append(victimsToPreempt, nonViolatingVictims...)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is no longer there, so resolving

Copy link
Copy Markdown
Member

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM on the implementation side, left some small ones only. As discussed at sig meeting, I leave the testing part for other reviewers.

Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread pkg/scheduler/backend/cache/snapshot_test.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go Outdated
Comment thread pkg/scheduler/backend/cache/snapshot.go
Comment thread pkg/scheduler/framework/preemption/preemption.go Outdated
Comment thread pkg/scheduler/framework/preemption/preemption.go
Comment thread pkg/scheduler/framework/cycle_state.go
Comment thread pkg/scheduler/framework/runtime/framework.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup_test.go Outdated
Comment thread pkg/scheduler/scheduler.go
Comment thread pkg/scheduler/framework/preemption/types.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption.go Outdated
Comment thread staging/src/k8s.io/kube-scheduler/framework/interface.go Outdated
Copy link
Copy Markdown
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this LGTM modulo the last comment about externalizing the plugin interface.

Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Copy link
Copy Markdown
Member

@macsko macsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the scheduler's core code and integration tests. Preemption code is still waiting my review

Comment thread pkg/scheduler/backend/cache/snapshot_test.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread pkg/scheduler/schedule_one_podgroup.go Outdated
Comment thread test/integration/scheduler/podgroup/podgroup_test.go
Comment thread test/integration/scheduler/preemption/podgrouppreemption_test.go Outdated
Comment thread test/integration/scheduler/preemption/podgrouppreemption_test.go Outdated
Comment thread test/integration/scheduler/preemption/podgrouppreemption_test.go
Comment thread test/integration/scheduler/podgroup/podgroup_test.go
Copy link
Copy Markdown
Member

@macsko macsko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished the review of the entire PR. Mostly sent smaller comments or nits, so you can consider applying some of them in a follow-up PR if that's easier for you

Comment thread test/integration/scheduler/podgroup/podgroup_test.go
Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Comment thread pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go Outdated
Comment thread pkg/scheduler/framework/preemption/preemption.go Outdated
Comment thread pkg/scheduler/framework/preemption/preemption.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go Outdated
Comment thread pkg/scheduler/framework/preemption/podgrouppreemption_test.go
}
}

func TestFilterVictimsWithPDBViolation(t *testing.T) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to verify in the tests whether filterVictimsWithPDBViolation works similar to the filterPodsWithPDBViolation?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that filterPodsWithPDBViolation is in defaultpreemption package.

I think that for 1.37 we will unify this path as we need to handle pod group victims in pod by pod scenario, so the tests will also be unified.

Comment thread pkg/scheduler/schedule_one_podgroup.go
Comment thread pkg/scheduler/schedule_one.go Outdated
Comment thread pkg/scheduler/schedule_one.go Outdated
Comment thread pkg/scheduler/schedule_one.go Outdated
@tosi3k
Copy link
Copy Markdown
Member

tosi3k commented Mar 23, 2026

/test pull-kubernetes-e2e-kind

@macsko
Copy link
Copy Markdown
Member

macsko commented Mar 23, 2026

PR looks good, let's merge it!
/lgtm
/approve

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: da5473bf6663f26367d982e8cc089c9ccd78d101

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Argh4k, macsko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tosi3k
Copy link
Copy Markdown
Member

tosi3k commented Mar 23, 2026

/woof

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@tosi3k: dog image

Details

In response to this:

/woof

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@macsko
Copy link
Copy Markdown
Member

macsko commented Mar 23, 2026

/milestone v1.36
Based on the exception approval: kubernetes/enhancements#5710 (comment)

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

k8s-ci-robot commented Mar 23, 2026

@Argh4k: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-gce-csi-serial b16287a link false /test pull-kubernetes-e2e-gce-csi-serial
pull-kubernetes-e2e-storage-kind-vac b16287a link false /test pull-kubernetes-e2e-storage-kind-vac
pull-kubernetes-e2e-storage-kind-disruptive b16287a link false /test pull-kubernetes-e2e-storage-kind-disruptive
pull-kubernetes-e2e-gce-storage-slow b16287a link false /test pull-kubernetes-e2e-gce-storage-slow
pull-kubernetes-e2e-gce-storage-snapshot b16287a link false /test pull-kubernetes-e2e-gce-storage-snapshot
pull-kubernetes-node-e2e-containerd-1-7-dra 5dc6427 link false /test pull-kubernetes-node-e2e-containerd-1-7-dra
pull-kubernetes-node-e2e-containerd-2-0-dra-alpha-beta-features 5dc6427 link false /test pull-kubernetes-node-e2e-containerd-2-0-dra-alpha-beta-features
pull-kubernetes-node-e2e-containerd-2-0-dra 5dc6427 link false /test pull-kubernetes-node-e2e-containerd-2-0-dra
pull-kubernetes-kind-dra 5dc6427 link false /test pull-kubernetes-kind-dra
pull-kubernetes-kind-dra-n-1 5dc6427 link false /test pull-kubernetes-kind-dra-n-1
pull-kubernetes-dra-integration 5dc6427 link false /test pull-kubernetes-dra-integration
pull-kubernetes-kind-dra-all 5dc6427 link false /test pull-kubernetes-kind-dra-all
pull-kubernetes-kind-dra-n-3 5dc6427 link false /test pull-kubernetes-kind-dra-n-3
pull-kubernetes-kind-dra-n-2 5dc6427 link false /test pull-kubernetes-kind-dra-n-2

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@macsko
Copy link
Copy Markdown
Member

macsko commented Mar 23, 2026

/lgtm

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: e3fd7d5f3be12551724d8e931a2d2eaaebf18fb9

@tosi3k
Copy link
Copy Markdown
Member

tosi3k commented Mar 23, 2026

/test pull-kubernetes-e2e-gce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test area/workload-aware Categorizes an issue or PR as relevant to Workload-aware and Topology-aware scheduling subprojects. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. wg/device-management Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: ✅ Done
Archived in project
Archived in project

Development

Successfully merging this pull request may close these issues.