Skip to content

fix: coordination rolling update not working for LWS workloads#151

Merged
Syspretor merged 2 commits intosgl-project:mainfrom
LikiosSedo:fix/issue-150-coordination-rolling-update
Jan 30, 2026
Merged

fix: coordination rolling update not working for LWS workloads#151
Syspretor merged 2 commits intosgl-project:mainfrom
LikiosSedo:fix/issue-150-coordination-rolling-update

Conversation

@LikiosSedo
Copy link
Copy Markdown
Contributor

Ⅰ. Motivation

Coordination rolling update does not work for LeaderWorkerSet workloads when only
coordinationRequirements.strategy.rollingUpdate is configured (without role-level rolloutStrategy).

Ⅱ. Modifications

Two bugs in pkg/reconciler/lws_reconciler.go:

  1. Condition only checks role-level config: Changed to check both role.RolloutStrategy and
    rollingUpdateStrategy parameter
  2. lwsSpecEqual skips RolloutStrategy comparison: Added RolloutStrategy comparison to detect partition
    changes

Ⅲ. Does this pull request fix one issue?

fixes #150

Ⅳ. List the added test cases

Manual and Local e2e testing verified:

  • LWS partition correctly set from coordination config
  • Partition values update during rolling update progress

Ⅴ. Describe how to verify it

  1. Create RBG with LWS workloads and coordination.strategy.rollingUpdate.maxSkew
  2. Trigger rolling update by modifying pod template
  3. Observe LWS spec.rolloutStrategy.rollingUpdateConfiguration.partition changes from non-zero to 0

@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @LikiosSedo, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where coordination rolling updates failed for LeaderWorkerSet (LWS) workloads when only the coordination-level rolling update strategy was defined. The changes ensure that the system correctly processes both role-level and coordination-level rolling update configurations, preventing missed updates and ensuring proper partition management during rollouts.

Highlights

  • Rolling Update Condition Logic: The logic for determining if a rolling update configuration should be applied has been updated to correctly consider both role-level RolloutStrategy and coordination-level rollingUpdateStrategy parameters, ensuring proper application of MaxSurge, MaxUnavailable, and Partition.
  • RolloutStrategy Comparison: The lwsSpecEqual function now includes a deep equality comparison for RolloutStrategy to ensure that changes in this configuration, particularly partition changes, are properly detected and trigger necessary updates.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a bug where coordination-level rolling update strategies were not being applied to LWS workloads. The changes correctly expand the condition for applying rolling update configurations and add a check for RolloutStrategy changes to trigger reconciliation. My review identified a minor omission where MaxSurge from the coordination strategy is not handled, leading to incomplete override logic. I've included a suggestion to fix this for consistency.

Comment thread pkg/reconciler/lws_reconciler.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes coordination-based rolling updates for LeaderWorkerSet (LWS) workloads so that coordination-level rolling update settings are correctly applied and detected, addressing issue #150 where partition values were never propagated to LWS workloads.

Changes:

  • Adjusts the LWS rolling update configuration to honor either role-level RolloutStrategy.RollingUpdate or coordination-level rollingUpdateStrategy, including proper precedence for MaxUnavailable and Partition.
  • Updates lwsSpecEqual to include RolloutStrategy in the semantic equality check so that changes to rollout/partition settings trigger a patch instead of being treated as no-op.
Comments suppressed due to low confidence (1)

pkg/reconciler/lws_reconciler.go:329

  • The new handling of rolling update configuration and RolloutStrategy comparison for LWS (including coordination-level rollingUpdateStrategy and lwsSpecEqual comparing RolloutStrategy) is not covered by unit tests in lws_reconciler_test.go, even though similar rollout/coordination behavior is tested for other reconcilers (e.g., sts_reconciler_test.go and instanceset_reconciler_test.go). To keep behavior stable and prevent regressions in the coordination rolling update logic, please add focused tests for: (1) applying coordination-only rolling update (no role-level RolloutStrategy) to LWS, including partition/maxUnavailable selection precedence between coordination and role-level config, and (2) semanticallyEqualLeaderWorkerSet / lwsSpecEqual detecting changes in RolloutStrategy (e.g., partition changes) so that patches are actually applied.
	// RollingUpdate
	hasRoleRollingUpdate := role.RolloutStrategy != nil && role.RolloutStrategy.RollingUpdate != nil
	hasCoordinationRollingUpdate := rollingUpdateStrategy != nil
	if hasRoleRollingUpdate || hasCoordinationRollingUpdate {
		rollingUpdateConfiguration := lwsapplyv1.RollingUpdateConfiguration()
		if hasRoleRollingUpdate {
			if role.RolloutStrategy.RollingUpdate.MaxSurge != nil {
				rollingUpdateConfiguration = rollingUpdateConfiguration.WithMaxSurge(*role.RolloutStrategy.RollingUpdate.MaxSurge)
			}
			if role.RolloutStrategy.RollingUpdate.MaxUnavailable != nil {
				rollingUpdateConfiguration = rollingUpdateConfiguration.WithMaxUnavailable(*role.RolloutStrategy.RollingUpdate.MaxUnavailable)
			}
		}
		if rollingUpdateStrategy != nil && rollingUpdateStrategy.MaxUnavailable != nil {
			rollingUpdateConfiguration =
				rollingUpdateConfiguration.WithMaxUnavailable(*rollingUpdateStrategy.MaxUnavailable)
		}
		var partitionIntOrStr *intstr.IntOrString
		if rollingUpdateStrategy != nil && rollingUpdateStrategy.Partition != nil {
			partitionIntOrStr = rollingUpdateStrategy.Partition
		} else if hasRoleRollingUpdate && role.RolloutStrategy.RollingUpdate.Partition != nil {
			partitionIntOrStr = role.RolloutStrategy.RollingUpdate.Partition
		}

		if partitionIntOrStr != nil {
			partition, err := intstr.GetScaledValueFromIntOrPercent(partitionIntOrStr, int(*role.Replicas), true)
			if err != nil {
				return nil, err
			}
			rollingUpdateConfiguration = rollingUpdateConfiguration.WithPartition(int32(partition))
		}

		lwsSpecConfig = lwsSpecConfig.WithRolloutStrategy(
			lwsapplyv1.RolloutStrategy().WithRollingUpdateConfiguration(rollingUpdateConfiguration),
		)
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@RongGu
Copy link
Copy Markdown
Collaborator

RongGu commented Jan 26, 2026

@LikiosSedo Please check the ut and e2e test issue.

@LikiosSedo LikiosSedo force-pushed the fix/issue-150-coordination-rolling-update branch from 58e07da to f253170 Compare January 26, 2026 12:23
@LikiosSedo
Copy link
Copy Markdown
Contributor Author

@LikiosSedo Please check the ut and e2e test issue.

Done

Copy link
Copy Markdown
Collaborator

@Syspretor Syspretor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@Syspretor Syspretor merged commit 662f682 into sgl-project:main Jan 30, 2026
9 of 11 checks passed
LikiosSedo added a commit to LikiosSedo/rbg that referenced this pull request Jan 30, 2026
…roject#151)

* fix: coordination rolling update not working for LWS workloads

* test: add unit tests for LWS coordination rolling update

---------

Co-authored-by: 刘森栋 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Coordination rolling update: partition not applied to LWS workloads

4 participants