KEP-4671: Add Workload Aware Scheduling blog post for v1.36 #54667

Merged
k8s-ci-robot merged 10 commits into kubernetes:main from macsko:was_blog_136 on Apr 28, 2026

macsko (Member) commented Feb 25, 2026

Description

This PR adds a blog post covering the Workload Aware Scheduling initiative update for v1.36.

Issue

KEP: kubernetes/enhancements#4671

k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels on Feb 25, 2026
k8s-ci-robot requested a review from lmktfy on February 25, 2026 14:06
k8s-ci-robot added the area/blog (Issues or PRs related to the Kubernetes Blog subproject) label on Feb 25, 2026
k8s-ci-robot added the language/en (Issues or PRs related to English language) and size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) labels on Feb 25, 2026
macsko (Member Author) commented Feb 25, 2026

/sig scheduling
/area workload-aware

k8s-ci-robot added the sig/scheduling (Categorizes an issue or PR as relevant to SIG Scheduling.) and area/workload-aware (Categorizes an issue or PR as relevant to Workload-aware and Topology-aware scheduling subprojects.) labels on Feb 25, 2026
github-project-automation Bot moved this to Needs Triage in SIG Scheduling on Feb 25, 2026
netlify Bot commented Feb 25, 2026

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 3214748
🔍 Latest deploy log https://app.netlify.com/projects/kubernetes-io-main-staging/deploys/69e9cef4aac46400088b7c4a
😎 Deploy Preview https://deploy-preview-54667--kubernetes-io-main-staging.netlify.app

k8s-ci-robot added the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) label and removed the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) label on Apr 2, 2026
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md
k8s-ci-robot added the size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) label and removed the size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) label on Apr 3, 2026
macsko changed the title from "KEP-4671: Workload Aware Scheduling blog post" to "KEP-4671: Add Workload Aware Scheduling blog post for v1.36" on Apr 3, 2026
lmktfy (Member) left a comment

Looks good as a draft, but it's clearly not ready to merge as-is.

To power this, the `kube-scheduler` features a new *PodGroup scheduling cycle* that enables atomic workload processing
and paves the way for future enhancements. We are also rolling out the first iterations of *topology-aware scheduling*
and *workload-aware preemption* to advance the scheduling capabilities for these workloads. Additionally,
we have added *ResourceClaim support for workloads* to unlock *Dynamic Resource Allocation
lmktfy (Member) commented Apr 7, 2026

Does "we" mean:
Maciej Skoczeń (Google),
Antoni Zawodny (Google),
Matt Matejczyk (Google),
Bartosz Rejman (Google),
Jon Huhn (Microsoft),
Maciej Wyrzuc (Google),
TBD

?

If not: reword for clarity.

Member Author:

Done

completely replacing the previous `v1alpha1` API version.

In v1.35, Pod groups and their runtime states were embedded within the Workload resource.
In the new model, the Workload object is rarely updated (as a template object),
Member:

What part of a Workload is immutable? Can I still set labels?

minCount: 4
```

Next, workload controllers (such as the Job controller) stamp out runtime PodGroup instances based on those templates.
Member:

Does this mean that if I have a CronJob I could end up having a new Workload created every n minutes? (If so, might be worth calling out the limitation).

Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated

## Integration with the Job controller

TBD
Member:

This part doesn't look ready to merge.

Contributor:

IMO we should just remove this section.

The Job controller will stay alpha in 1.37 anyway for future API changes, so I think it's okay to skip this update for 1.36.

WDYT @helayoty?

Reply:

I think adding a few sentences here would be beneficial, unless the job integration is already documented elsewhere. Since this is our only real controller integration so far, it's likely the easiest way for users to try out WAS.

Contributor:

There is #54489.

Okay, I think it's fine to add then.

Member Author:

Added this section, PTAL

Member:

Section has been updated and I believe it's ready now. PTAL.

Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
graz-dev (Contributor) left a comment

/lgtm

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 20, 2026
k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: 1fef3186689baa9f4ce0cf19ff598763a5e46557

k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 21, 2026
k8s-ci-robot requested a review from graz-dev on April 21, 2026 10:17
helayoty moved this from Needs Triage to In Progress in SIG Scheduling on Apr 21, 2026
helayoty moved this from Backlog to In Progress in Workload-aware & Topology-aware Workstream on Apr 21, 2026
helayoty (Member) left a comment

PTAL at the Job integration section suggestion.

Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md
AI/ML and batch workloads introduce unique scheduling challenges that go beyond simple Pod-by-Pod scheduling.
In Kubernetes v1.35, we introduced the first tranche of *workload-aware scheduling* improvements,
featuring the foundational Workload API alongside basic *gang scheduling* support built on a Pod-based framework,
and an *opportunistic batching* feature to efficiently process identical Pods.
Member:

Did we manage to integrate opportunistic batching with the gang-scheduling plugin?

Member Author:

Opportunistic batching is also enabled by default during gang scheduling, as long as the member Pods are identical.

`scheduling.k8s.io/v1alpha2` {{< glossary_tooltip text="API group" term_id="api-group" >}},
completely replacing the previous `v1alpha1` API version.

In v1.35, Pod groups and their runtime states were embedded within the Workload resource.
Member:

I'd state in brief why this was not a good approach and why we decoupled PodGroup.

Member Author:

This paragraph describes that:

This separation also improves performance and scalability as the PodGroup API allows per-replica sharding of status updates.

lmktfy (Member) left a comment

/lgtm

with nits / quibbles.

Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
Comment thread content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md Outdated
* Reaching out via [Slack (#workload-aware-scheduling)](https://kubernetes.slack.com/archives/C0AHLJ0EAEL).
* Joining the [SIG Scheduling](https://docs.google.com/document/d/13mwye7nvrmV11q9_Eg77z-1w3X7Q1GTbslpml4J7F3A/edit)
or [Workload API integration](https://docs.google.com/document/d/1XSPdK4L3zkAFhAZ3hBQJr2k7JX9CpGD7NeQfujM1PT4/edit) meetings.
* Filing a new [issue](https://github.com/kubernetes/enhancements/issues) in the Kubernetes repository.
Member:

Let's not encourage people to do this; we don't want people to file KEPs without following the recommended process.

Member Author:

Good catch, I meant to link the k/k repo.

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 22, 2026
k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: f9005633b62f4e12bcff8cb95fa4447d0fb64239

k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 23, 2026
k8s-ci-robot requested a review from lmktfy on April 23, 2026 07:49
graz-dev (Contributor) commented:

/lgtm
/approve

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged.) label on Apr 28, 2026
k8s-ci-robot (Contributor) commented:

LGTM label has been added.

Git tree hash: c049dc812027db32939b1b4afd746aad73c5b8ad

k8s-ci-robot (Contributor) commented:

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: graz-dev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the approved (Indicates a PR has been approved by an approver from all required OWNERS files.) label on Apr 28, 2026
k8s-ci-robot merged commit d95e318 into kubernetes:main on Apr 28, 2026
6 checks passed
github-project-automation Bot moved this from In Progress to Done in SIG Scheduling on Apr 28, 2026

Labels

* approved: Indicates a PR has been approved by an approver from all required OWNERS files.
* area/blog: Issues or PRs related to the Kubernetes Blog subproject
* area/workload-aware: Categorizes an issue or PR as relevant to Workload-aware and Topology-aware scheduling subprojects.
* cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
* language/en: Issues or PRs related to English language
* lgtm: "Looks good to me", indicates that a PR is ready to be merged.
* sig/scheduling: Categorizes an issue or PR as relevant to SIG Scheduling.
* size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: Done

10 participants