Skip to content

feature(rbg): support to use InstanceSet as role workload#110

Merged
cheyang merged 3 commits intosgl-project:mainfrom
Syspretor:feature/instanceset-integration
Nov 28, 2025
Merged

feature(rbg): support to use InstanceSet as role workload#110
cheyang merged 3 commits intosgl-project:mainfrom
Syspretor:feature/instanceset-integration

Conversation

@Syspretor
Copy link
Copy Markdown
Collaborator

@Syspretor Syspretor commented Nov 21, 2025

Ⅰ. Motivation

This PR primarily introduces InstanceSet as a configurable workload option for Roles. For details on InstanceSet, see #50 :

  1. For Roles configured with InstanceSet as the workload, two patterns are supported: role.Template and role.LeaderWorkerSet. With a single workload type, it supports both single-node and multi-node (cross-machine) deployments, reducing version dependencies on external workloads.
  2. The Instance implementation is enhanced compared with StatefulSet and LWS, and natively supports in-place upgrades. Updates to the image and metadata are performed in place; the Pod is not recreated—instead, containers are restarted in place.
  3. When using Instance as the Role workload, it remains compatible with all existing orthogonal capabilities such as Coordination Update, GangSchedule, and ScalingAdapter.

Ⅱ. Modifications

  1. Add an InstanceSet implementation to the WorkloadReconciler interface.
  2. Add new API fields in role.rolloutStrategy.
// RollingUpdate defines the parameters to be used for RollingUpdateStrategyType.
type RollingUpdate struct {
	// New API Field
    // Type indicates the type of the InstanceSetUpdateStrategy.
	// Default is InPlaceIfPossible.
	Type UpdateStrategyType `json:"type,omitempty"`

	Partition *int32 `json:"partition,omitempty"`

	MaxUnavailable intstr.IntOrString `json:"maxUnavailable,omitempty"`
        
	MaxSurge intstr.IntOrString `json:"maxSurge,omitempty"`

	// New API Field
	// Paused indicates that the InstanceSet is paused.
	// Default value is false
	Paused bool `json:"paused,omitempty"`

	// New API Field
	// InPlaceUpdateStrategy contains strategies for in-place update.
	InPlaceUpdateStrategy *InPlaceUpdateStrategy `json:"inPlaceUpdateStrategy,omitempty"`
}

// InPlaceUpdateStrategy defines the strategies for in-place update.
type InPlaceUpdateStrategy struct {
	// GracePeriodSeconds is the timespan between set Pod status to not-ready and update images in Pod spec
	// when in-place update a Pod.
	GracePeriodSeconds int32 `json:"gracePeriodSeconds,omitempty"`
}
  1. Implement InstanceSetReconciler with support for three patterns: role.template, role.leaderWorkerSet, and role.components.
  2. For role.leaderWorkerSet, ensure compatibility with existing LeaderWorkerSet environment variables to enable seamless switching/migration.
  3. Extract the Service reconciliation logic into a shared ServiceReconciler responsible for reconciling Services associated with Roles.
  4. Make the InstanceSetReconciler implementation compatible with the Coordination Update logic.

Ⅲ. Does this pull request fix one issue?

Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅴ. Describe how to verify it

  • Create Rbg with a role defined instanceSet workload
    • Declare role.template
apiVersion: workloads.x-k8s.io/v1alpha1
kind: RoleBasedGroup
metadata:
  name: demo
spec:
  roles:
    - name: engine
      labels:
        instance.rolebasedgroup.workloads.x-k8s.io/pattern: Deployment
      replicas: 10
      rolloutStrategy:
        type: RollingUpdate
        rollingUpdate:
          type: InPlaceIfPossible
      workload:
        apiVersion: workloads.x-k8s.io/v1alpha1
        kind: InstanceSet
      template:
        spec:
          containers:
            - name: main
              image: nginx:latest

The following pods will be create by such yaml:

demo-engine-2s2np   1/1     Running   0          3s
demo-engine-5k8m9   1/1     Running   0          3s
demo-engine-74mk9   1/1     Running   0          3s
demo-engine-7vd96   1/1     Running   0          3s
demo-engine-96vhm   1/1     Running   0          3s
demo-engine-bf654   1/1     Running   0          3s
demo-engine-mjcl5   1/1     Running   0          3s
demo-engine-qzbt8   1/1     Running   0          3s
demo-engine-tz5sl   1/1     Running   0          3s
demo-engine-xlznc   1/1     Running   0          3s

A stateful mode of InstanceSet will be supported in v0.6.0, With stateful mode, pods created by instanceSet:

demo-engine-0   1/1     Running   0          3s
demo-engine-1   1/1     Running   0          3s
demo-engine-2   1/1     Running   0          3s
demo-engine-3   1/1     Running   0          3s
demo-engine-4   1/1     Running   0          3s
demo-engine-5   1/1     Running   0          3s
demo-engine-6   1/1     Running   0          3s
demo-engine-7   1/1     Running   0          3s
demo-engine-8   1/1     Running   0          3s
demo-engine-9   1/1     Running   0          3s
  • Declare role.leaderWorkerSet
apiVersion: workloads.x-k8s.io/v1alpha1
kind: RoleBasedGroup
metadata:
 name: demo
spec:
 roles:
   - name: engine
     labels:
       instance.rolebasedgroup.workloads.x-k8s.io/pattern: Deployment
     replicas: 3
     rolloutStrategy:
       type: RollingUpdate
       rollingUpdate:
         type: InPlaceIfPossible
     workload:
       apiVersion: workloads.x-k8s.io/v1alpha1
       kind: InstanceSet
     template:
       spec:
         containers:
           - name: main
             image: nginx:latest
     leaderWorkerSet:
       patchLeaderTemplate:
         metadata:
           labels:
             role: leader
       patchWorkerTemplate:
         metadata:
           labels:
             role: worker
       size: 3

The following pods will be create by such yaml:

demo-engine-485s9-0   1/1     Running   0          3s
demo-engine-485s9-1   1/1     Running   0          3s
demo-engine-485s9-2   1/1     Running   0          3s
demo-engine-6bpk5-0   1/1     Running   0          3s
demo-engine-6bpk5-1   1/1     Running   0          3s
demo-engine-6bpk5-2   1/1     Running   0          3s
demo-engine-c7pp6-0   1/1     Running   0          3s
demo-engine-c7pp6-1   1/1     Running   0          3s
demo-engine-c7pp6-2   1/1     Running   0          3s

The following environment variables are still available in any Pod:

# kubectl exec demo-engine-485s9-worker-1 -- env | grep LWS
LWS_WORKER_INDEX=2
LWS_GROUP_SIZE=3
LWS_LEADER_ADDRESS=demo-engine-485s9-leader-0.s-demo-engine.default
  • Declare role.components
apiVersion: workloads.x-k8s.io/v1alpha1
kind: RoleBasedGroup
metadata:
  name: demo
spec:
  roles:
    - name: engine
      labels:
        instance.rolebasedgroup.workloads.x-k8s.io/pattern: Deployment
      replicas: 3
      rolloutStrategy:
        type: RollingUpdate
        rollingUpdate:
          type: InPlaceIfPossible
      workload:
        apiVersion: workloads.x-k8s.io/v1alpha1
        kind: InstanceSet
      components:
        - name: component-1
          size: 1
          template:
            spec:
              containers:
                - name: main
                  image: nginx:latest
        - name: component-2
          size: 2
          template:
            spec:
              containers:
                - name: main
                  image: nginx:latest
        - name: component-3
          size: 3
          template:
            spec:
              containers:
                - name: main
                  image: nginx:latest

The following pods will be create by such yaml:

demo-engine-7ncsp-component-1-0   1/1     Running   0          6m5s
demo-engine-7ncsp-component-2-0   1/1     Running   0          6m5s
demo-engine-7ncsp-component-2-1   1/1     Running   0          6m5s
demo-engine-7ncsp-component-3-0   1/1     Running   0          6m5s
demo-engine-7ncsp-component-3-1   1/1     Running   0          6m5s
demo-engine-7ncsp-component-3-2   1/1     Running   0          6m5s
demo-engine-8hrkk-component-1-0   1/1     Running   0          10s
demo-engine-8hrkk-component-2-0   1/1     Running   0          10s
demo-engine-8hrkk-component-2-1   1/1     Running   0          10s
demo-engine-8hrkk-component-3-0   1/1     Running   0          10s
demo-engine-8hrkk-component-3-1   1/1     Running   0          10s
demo-engine-8hrkk-component-3-2   1/1     Running   0          10s
demo-engine-cthzt-component-1-0   1/1     Running   0          6m5s
demo-engine-cthzt-component-2-0   1/1     Running   0          6m5s
demo-engine-cthzt-component-2-1   1/1     Running   0          6m5s
demo-engine-cthzt-component-3-0   1/1     Running   0          6m5s
demo-engine-cthzt-component-3-1   1/1     Running   0          6m5s
demo-engine-cthzt-component-3-2   1/1     Running   0          6m5s

VI. Special notes for reviews

Checklist

  • Format your code make fmt.
  • Add unit tests or integration tests.
  • Update the documentation related to the change.

@Syspretor Syspretor requested a review from cheyang November 21, 2025 10:12
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread api/workloads/v1alpha1/rolebasedgroup_types.go Outdated
@cheyang
Copy link
Copy Markdown
Collaborator

cheyang commented Nov 21, 2025

/gemini review

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This work-in-progress PR adds support for using InstanceSet as a role workload type in the RoleBasedGroup (RBG) system. The change enables users to define roles with InstanceSet workloads, extending beyond the existing support for StatefulSet, Deployment, and LeaderWorkerSet.

Key changes:

  • Introduces InstanceSet workload type support with a new InstanceSetReconciler
  • Refactors headless service reconciliation into a shared ServiceReconciler
  • Adds support for Components field in RoleSpec for multi-component instance definitions
  • Implements LeaderWorkerSet-style topology support for InstanceSet workloads

Reviewed changes

Copilot reviewed 26 out of 30 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
pkg/reconciler/instanceset_reconciler.go New reconciler implementing InstanceSet workload lifecycle management
pkg/reconciler/instanceset_reconciler_test.go Comprehensive test suite for InstanceSetReconciler functionality
pkg/reconciler/svc_reconciler.go Extracted and generalized headless service reconciliation logic
pkg/reconciler/svc_reconciler_test.go Tests for the new ServiceReconciler
pkg/reconciler/workload_reconciler.go Added InstanceSet case to workload factory and equality checking
pkg/reconciler/sts_reconciler.go Refactored to use shared ServiceReconciler
pkg/reconciler/lws_reconciler.go Updated injector naming from "env" to "common_env"
pkg/reconciler/pod_reconciler.go Added support for "lws_env" injector alongside "common_env"
pkg/discovery/injector.go New InjectLeaderWorkerSetEnv method for LWS-specific environment variables
pkg/discovery/env_builder.go Added InstanceSet-specific environment variables (INSTANCE_NAME, COMPONENT_NAME, etc.)
pkg/reconciler/instance/core/instance_core.go Added LWS worker index label initialization for InstanceSet pods
pkg/reconciler/instanceset/sync/update.go Fixed type reference from deprecated to unified UpdateStrategyType
pkg/reconciler/instanceset/core/implement.go Fixed type reference from deprecated to unified UpdateStrategyType
api/workloads/v1alpha1/constant.go Added InstanceSetWorkloadType, UpdateStrategyType, and LWS-related constants
api/workloads/v1alpha1/rolebasedgroup_types.go Added Components type and fields to RollingUpdate for InstanceSet support
api/workloads/v1alpha1/instanceset_types.go Unified InstanceSetUpdateStrategyType with UpdateStrategyType
api/workloads/v1alpha1/zz_generated.deepcopy.go Generated deepcopy methods for new Components type
pkg/utils/constant.go Added InstanceSetCrdName constant
pkg/utils/crds.go Added GetInstanceSetGVK helper function
internal/controller/workloads/rolebasedgroup_controller.go Added InstanceSet to owned resources and dynamic CRD watching
test/wrappers/role_wrapper.go Added InstanceSet case to workload type builder
client-go/applyconfiguration/workloads/v1alpha1/* Generated apply configuration types for Components and updated RollingUpdate

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/reconciler/workload_reconciler.go
Comment thread pkg/reconciler/instanceset_reconciler.go Outdated
Comment thread pkg/reconciler/instanceset_reconciler_test.go Outdated
Comment thread api/workloads/v1alpha1/rolebasedgroup_types.go Outdated
Comment thread test/wrappers/role_wrapper.go Outdated
Comment thread pkg/discovery/injector.go Outdated
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for InstanceSet as a workload type for RoleBasedGroup. The changes are extensive, including new API types, a new InstanceSetReconciler, and refactoring of service reconciliation logic. The implementation looks mostly correct, but I've found a few issues.

Specifically:

  • The WorkloadEqual function has a bug for InstanceSet that will cause reconciliation loops.
  • There are some minor style issues in comments and string formatting.

Additionally, it seems the removal of semanticallyEqualStatefulSet from sts_reconciler.go will cause a compilation error in workload_reconciler.go as it's still being called. Please ensure this is resolved.

I've added specific comments for the issues I found in the changed files.

Comment thread pkg/reconciler/workload_reconciler.go Outdated
Comment thread api/workloads/v1alpha1/rolebasedgroup_types.go Outdated
Comment thread pkg/reconciler/instanceset_reconciler.go Outdated
Comment thread api/workloads/v1alpha1/constant.go Outdated
@Syspretor Syspretor marked this pull request as ready for review November 26, 2025 02:54
@Syspretor Syspretor changed the title [WIP] feature(rbg): support to use InstanceSet as role workload feature(rbg): support to use InstanceSet as role workload Nov 26, 2025
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch 2 times, most recently from fa5b020 to e741df0 Compare November 26, 2025 06:47
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch 3 times, most recently from a2317f5 to c2aaf59 Compare November 27, 2025 11:58
@Syspretor
Copy link
Copy Markdown
Collaborator Author

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces support for InstanceSet as a workload for Roles, which is a significant and valuable feature. The changes include new API fields, a new InstanceSetReconciler, and refactoring of the service reconciliation logic. The implementation looks solid, and the inclusion of examples and tests is appreciated. My review focuses on improving API consistency, code clarity, and fixing a potential bug in label assignments to ensure consistency across different workload patterns.

Comment thread pkg/reconciler/instanceset_reconciler.go
Comment thread pkg/reconciler/instanceset_reconciler.go Outdated
Comment thread api/workloads/v1alpha1/constant.go Outdated
Comment thread pkg/reconciler/instance/utils/instance_utils.go Outdated
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch 3 times, most recently from 44b884b to d015a4c Compare November 28, 2025 01:42
Comment thread pkg/reconciler/common.go Outdated
Comment thread pkg/reconciler/common.go Outdated
Comment thread pkg/reconciler/common.go
Comment thread pkg/reconciler/common.go
Comment thread pkg/reconciler/common.go
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch from d015a4c to 3b687be Compare November 28, 2025 02:18
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread examples/instanceset/instanceset-with-components.yaml
Comment thread pkg/reconciler/instanceset_reconciler.go Outdated
Comment thread pkg/reconciler/instanceset_reconciler.go Outdated
Comment thread pkg/reconciler/common.go Outdated
Comment thread examples/instanceset/instanceset-with-deployment-pattern.yaml
Comment thread examples/instanceset/instanceset-with-lws-pattern.yaml
Comment thread api/workloads/v1alpha1/constant.go Outdated
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch from 3b687be to d8aa03b Compare November 28, 2025 10:06
@Syspretor Syspretor force-pushed the feature/instanceset-integration branch from d8aa03b to 5685846 Compare November 28, 2025 10:17
Copy link
Copy Markdown
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@cheyang cheyang merged commit bc8aa6a into sgl-project:main Nov 28, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants