Skip to content

[KEP-5440]: MutablePodResourcesForSuspendedJobs #132441

Merged
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
kannon92:poc-job-relax-pod-template
Nov 5, 2025
Merged

[KEP-5440]: MutablePodResourcesForSuspendedJobs #132441
k8s-ci-robot merged 1 commit intokubernetes:masterfrom
kannon92:poc-job-relax-pod-template

Conversation

@kannon92
Copy link
Copy Markdown
Contributor

@kannon92 kannon92 commented Jun 22, 2025

  • Add feature gate to control mutable pod resources for suspended jobs
  • Implement validatePodResourceUpdatesOnly function to allow only container resource updates
  • Update strategy validation to check feature gate and job state (suspended + not started)
  • Add comprehensive unit tests for feature gate functionality
  • Ensure backward compatibility when feature gate is disabled

This enables users to update container resources on suspended jobs that haven't started yet, while maintaining immutability for all other pod template fields.

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR is related to:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

KEP-5440: Allow for resizing of resources while job is suspended.  This feature is alpha.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jun 22, 2025
@k8s-ci-robot k8s-ci-robot requested review from deads2k and soltysh June 22, 2025 01:36
@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 22, 2025
@github-project-automation github-project-automation Bot moved this to Needs Triage in SIG Apps Jun 22, 2025
Copy link
Copy Markdown
Member

@lmktfy lmktfy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this validation could go into a highly recommended ValidatingAdmissionPolicy.

For example: we could say that shrinking limits and requests for a template is always OK, but restrict resizes that enlarge them.

💭 What would the authz look like for template changes?

@kannon92
Copy link
Copy Markdown
Contributor Author

Some of this validation could go into a highly recommended ValidatingAdmissionPolicy.

For example: we could say that shrinking limits and requests for a template is always OK, but restrict resizes that enlarge them.

💭 What would the authz look like for template changes?

Maybe. But right now, we are forbidding any resizing. So we could relax it in-tree and then leave it to people to implement their own VAP?

But that could lead to nobody implement this and then its free reign. So I'm not sure what is best here.

@kannon92 kannon92 force-pushed the poc-job-relax-pod-template branch from ffc331b to add236c Compare June 23, 2025 21:27
Comment thread pkg/features/kube_features.go Outdated
@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 26, 2025
@kannon92 kannon92 force-pushed the poc-job-relax-pod-template branch from add236c to f49bc5f Compare June 28, 2025 15:50
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 28, 2025
@kannon92 kannon92 force-pushed the poc-job-relax-pod-template branch 4 times, most recently from 60c0bfa to 5c1f00c Compare July 3, 2025 20:51
@kannon92
Copy link
Copy Markdown
Contributor Author

kannon92 commented Jul 5, 2025

/retest

Comment thread pkg/registry/batch/job/strategy_test.go
Comment thread test/integration/job/job_test.go
@kannon92 kannon92 force-pushed the poc-job-relax-pod-template branch from c1d22c4 to 0e439db Compare November 3, 2025 18:40
@kannon92
Copy link
Copy Markdown
Contributor Author

kannon92 commented Nov 3, 2025

#132441 (comment)

/hold

Gotta address this.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 3, 2025
Copy link
Copy Markdown
Contributor

@deads2k deads2k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use some clarifications.

Comment thread pkg/apis/batch/validation/validation.go Outdated
Comment thread pkg/registry/batch/job/strategy.go Outdated
Comment thread pkg/registry/batch/job/strategy.go Outdated

if utilfeature.DefaultFeatureGate.Enabled(features.MutablePodResourcesForSuspendedJobs) {
mutablePodTemplate := suspended && batchvalidation.IsConditionTrue(oldJob.Status.Conditions, batch.JobSuspended) && oldJob.Status.Active == 0
opts.AllowMutableSchedulingDirectives = mutablePodTemplate
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is less obvious to me. Why does this existing option become true some times? Actually it's more than that. This change takes us from AllowMutableSchedulingDirectives = suspended && notStarted to ignoring whether there is a start time, so a job that has started, but doesn't have an active pod can change. Why did this validation change as part of the pod resources change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this comes from #134980 (comment).

cc @soltysh @mimowo

The KEP update (done today) was kubernetes/enhancements#5678.

Copy link
Copy Markdown
Contributor

@mimowo mimowo Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this existing option become true some times?

Yeah, currently the precondition logic is inside the strategy.go. We could also set the option based on the future gate and then check all preconditions inside the validation.go. I think the only explanation is to put the code next to the pre-existing FG which follows the pattern, ie. AllowMutableSchedulingDirectives. +1 to move it to validation.go - from my PoV either in this PR or a small follow up.

Why did this validation change as part of the pod resources change?

The motivation is to align the handling of AllowMutableSchedulingDirectives. Over time it turned out that the check for status.startTime is not good, because:

  1. it does not allow to change PodTemplate while the job is suspended, which is essential for Kueue, and thus we do this trick of clearing starttime ourselves. It allows to change only if never started which is too restrictive.
  2. It does not really guarantee that all Pods are deleted (have deletionTimestamp), and thus in Kueue we need to defer the stop until status.Active=0, we do it here.
  3. There is not much of conceptual difference between scheduling directives, like nodeSelectors, and Pod resources - both influence scheduling decisions, and so it seems reasonable to align the condition for mutability.

I think it is reasonable to align the mutability condition in this KEP, otherwise it would require the overhead of yet another KEP just to align.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is reasonable to align the mutability condition in this KEP, otherwise it would require the overhead of yet another KEP just to align.

I don't see a need for another KEP, but another featuregate with a different graduation lifecycle makes sense. We can update our job controller to manipulate startTime in this way, but we have no practical way of doing that for other controllers and certainly not in skewing cases. This means that we cannot simply change the validation under an unrelated featuregate and promote that featuregate in a normal cycle.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so can we just have another feature gate under this KEP like AdjustMutableSchedulingDirectives?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our controller might make a change (weird featuregate to wire it to btw), but other job controllers may not. Certainly not on a version skew. I can see the justification for expanding to allow validation for oldCondition || newCondition, but this tightening of validation needs a separate gate and a different cycle for deciding when to force it.

I can appreciate that it is a gap we'd like to close, but suddenly requiring new validation to use unrelated features is untenable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened: #135104 PTAL.

allErrs := field.ErrorList{}

// Create a deep copy of the old pod spec to modify
oldPodCopy := oldPod.DeepCopy()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking remark, but this might be heavy, and we already take the entire template above: https://github.com/kubernetes/kubernetes/pull/132441/files#diff-67fdcc10f97b3bbec1288558a1575efd01474516066db5332a141592ee5dd4e4R642

So we could optimize by reusing the same copy if we do early:

if opts.AllowMutableSchedulingDirectives || opts.AllowMutablePodResources {
   oldTemplate = oldSpec.Template.DeepCopy() // +k8s:verify-mutation:reason=clone
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is non-blocking.

oldTemplate.Labels = template.Labels // +k8s:verify-mutation:reason=clone
oldTemplate.Spec.SchedulingGates = template.Spec.SchedulingGates // +k8s:verify-mutation:reason=clone
}
allErrs = append(allErrs, apivalidation.ValidateImmutableField(template, oldTemplate, fldPath.Child("template"))...)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could rely still on this line if we above just follow the tested pattern of mungling the oldTemplate. It would also save the unnecessary DeepCopy, posted here: https://github.com/kubernetes/kubernetes/pull/132441/files#r2489218558

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is non-blocking.

opts.AllowMutableSchedulingDirectives = suspended && notStarted

if utilfeature.DefaultFeatureGate.Enabled(features.MutablePodResourcesForSuspendedJobs) {
opts.AllowMutablePodResources = batchvalidation.IsConditionTrue(oldJob.Status.Conditions, batch.JobSuspended) && oldJob.Status.Active == 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
opts.AllowMutablePodResources = batchvalidation.IsConditionTrue(oldJob.Status.Conditions, batch.JobSuspended) && oldJob.Status.Active == 0
opts.AllowMutablePodResources = suspended && batchvalidation.IsConditionTrue(oldJob.Status.Conditions, batch.JobSuspended) && oldJob.Status.Active == 0

I think it better captures the intention, and also prevents some race conditions when someone could mutate the Job while suspend=false, but the JobSuspended is still there from the previous suspend.

Copy link
Copy Markdown
Contributor

@mimowo mimowo Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see you moved this condition here: https://github.com/kubernetes/kubernetes/pull/132441/files#diff-67fdcc10f97b3bbec1288558a1575efd01474516066db5332a141592ee5dd4e4

I think splitting these checks makes it hard to follow. Shouldn't we move also batchvalidation.IsConditionTrue(oldJob.Status.Conditions, batch.JobSuspended) && oldJob.Status.Active == 0 to validation.go?

Also, splitting the checks allows adding tests in validation_test.go which would be detached from reality (bypassing the checks).

I think we could have opts.AllowMutablePodResources = utilfeature.DefaultFeatureGate.Enabled(features.MutablePodResourcesForSuspendedJobs)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a thread with david on this.

Validation checks spec and status as two separate function calls.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the code I added validation I don't have access to anything on the status field.

Same for updates to status.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the commit in because I do find this harder to follow.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure is highlighting edges in the "spec cannot stop status updates". We haven't had a peer "status cannot stop spec updates", so it's an interesting dynamic. I'm glad to have the spec related validation all sitting in validation at least.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

Comment thread pkg/apis/batch/validation/validation.go Outdated
@deads2k
Copy link
Copy Markdown
Contributor

deads2k commented Nov 4, 2025

please squash to something more reasonable before merge

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 4, 2025
@kannon92 kannon92 force-pushed the poc-job-relax-pod-template branch from cdf30cf to 37c2054 Compare November 4, 2025 19:11
@kannon92 kannon92 requested review from deads2k and mimowo November 4, 2025 19:13
@kannon92
Copy link
Copy Markdown
Contributor Author

kannon92 commented Nov 4, 2025

/hold cancel

The hold was put on some changes that moved into a separate PR. This is ready for final review.

Copy link
Copy Markdown
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

/hold
in case @mimowo wants to have his comment addressed

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: e965cf172fe3b68ba106bedfc4548cbab1289d28

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, kannon92, soltysh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

{Type: field.ErrorTypeInvalid, Field: "spec.template"},
},
},
"feature gate enabled - unsuspended job resource updates rejected": {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test case which shows that the status.Active==0 checks is used, so

"feature gate enabled - job suspended, but with remaining active Pods  resource updates rejected": {
			enableFeatureGate: true,
			job: &batch.Job{
				ObjectMeta: metav1.ObjectMeta{
					Name:            "myjob",
					Namespace:       metav1.NamespaceDefault,
					ResourceVersion: "0",
				},
				Spec: batch.JobSpec{
					Selector:       validSelector,
					Template:       validPodTemplateSpec,
					ManualSelector: ptr.To(true),
					Parallelism:    ptr.To[int32](1),
					Suspend:        ptr.To(true ),
				},
				Status: batch.JobStatus{
					Active: 1,
					Conditions: []batch.JobCondition{
						{
							Type:   batch.JobSuspended,
							Status: api.ConditionStatus(metav1.ConditionFalse),
						},
					},
				},
			},
			update: func(job *batch.Job) {
				job.Spec.Template.Spec.Containers[0].Resources.Requests = api.ResourceList{
					api.ResourceCPU: resource.MustParse("200m"),
				}
			},
			wantErrs: field.ErrorList{
				{Type: field.ErrorTypeInvalid, Field: "spec.template"},
			},
		},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and

"feature gate enabled - job suspended, with the suspended condition, but pods remain active - reject": {
			enableFeatureGate: true,
			job: &batch.Job{
				ObjectMeta: metav1.ObjectMeta{
					Name:            "myjob",
					Namespace:       metav1.NamespaceDefault,
					ResourceVersion: "0",
				},
				Spec: batch.JobSpec{
					Selector:       validSelector,
					Template:       validPodTemplateSpec,
					ManualSelector: ptr.To(true),
					Parallelism:    ptr.To[int32](1),
					Suspend:        ptr.To(true ),
				},
				Status: batch.JobStatus{
					Active: 1,
					Conditions: []batch.JobCondition{
						{
							Type:   batch.JobSuspended,
							Status: api.ConditionStatus(metav1.ConditionTrue),
						},
					},
				},
			},
			update: func(job *batch.Job) {
				job.Spec.Template.Spec.Containers[0].Resources.Requests = api.ResourceList{
					api.ResourceCPU: resource.MustParse("200m"),
				}
			},
			wantErrs: field.ErrorList{
				{Type: field.ErrorTypeInvalid, Field: "spec.template"},
			},
		},

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

- Add feature gate to control mutable pod resources for suspended jobs
- Implement validatePodResourceUpdatesOnly function to allow only container resource updates
- Allow resource updates for suspended jobs regardless of whether they have started
- Add comprehensive unit and integration tests for all scenarios including started-then-suspended
- Ensure backward compatibility when feature gate is disabled

This enables users to update container resources on suspended jobs, including jobs that
have previously started and been suspended, while maintaining immutability for all other
pod template fields.
@mimowo
Copy link
Copy Markdown
Contributor

mimowo commented Nov 5, 2025

/hold
in case @mimowo wants to have his comment addressed

/unhold
my main comment about tests is addressed: #132441 (comment). The comment about reusing the mechanism for mutability testing #132441 (comment) and #132441 (comment) are non-blocking.

Thank you 👍
/lgtm

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

LGTM label has been added.

DetailsGit tree hash: c67bb1e9198cd912765c9418d436a48cb7ace432

@soltysh
Copy link
Copy Markdown
Contributor

soltysh commented Nov 5, 2025

Known flake, tracked in #134998

/test pull-kubernetes-node-e2e-containerd

@mimowo
Copy link
Copy Markdown
Contributor

mimowo commented Nov 5, 2025

/test pull-kubernetes-node-e2e-containerd
flake

@kannon92
Copy link
Copy Markdown
Contributor Author

kannon92 commented Nov 5, 2025

/retest

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

@kannon92: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-e2e-kind 7db5311 link unknown /test pull-kubernetes-e2e-kind
pull-kubernetes-node-e2e-containerd 7db5311 link unknown /test pull-kubernetes-node-e2e-containerd
pull-kubernetes-e2e-gce 7db5311 link unknown /test pull-kubernetes-e2e-gce

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@mimowo
Copy link
Copy Markdown
Contributor

mimowo commented Nov 5, 2025

/test pull-kubernetes-node-e2e-containerd
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce
flakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

8 participants