[KEP-30]: Introduce InstanceSet Workload Support in RoleBasedGroup for Improved LLM Orchestration #26
Merged
cheyang merged 3 commits into sgl-project:main on Nov 7, 2025
Force-pushed from bfc0ef6 to bddba94.
cheyang reviewed on Sep 17, 2025
#### Option 2: Expose in a Way Compatible with LWS
Keep the RBG API compatible with the existing **LeaderWorkerSet (LWS)** structure and semantics,
Collaborator:
I prefer Option 2. I think it can support both the single-template mode (previously backed by a StatefulSet) and the LWS template mode (previously backed by LWS), while still letting end users opt in to InstanceSet's capabilities behind the same fields.
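To make the Option 2 idea concrete, the following Python sketch shows how one set of user-facing fields could be reconciled by different workloads: a single Pod template maps to a StatefulSet and an LWS-style template maps to a LeaderWorkerSet (the pre-InstanceSet behaviour described in the comment above), unless the user opts in to InstanceSet. The `RoleSpec` type and its fields, including the `use_instance_set` switch, are hypothetical stand-ins and not the actual RBG API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleSpec:
    # Hypothetical, simplified role spec; field names are illustrative only.
    name: str
    template: Optional[dict] = None                # single Pod template
    leader_worker_template: Optional[dict] = None  # LWS-style leader/worker templates
    use_instance_set: bool = False                 # hypothetical opt-in switch


def backing_workload(role: RoleSpec) -> str:
    """Return the workload kind that would reconcile this role (sketch)."""
    # Exactly one template style must be set.
    if (role.template is None) == (role.leader_worker_template is None):
        raise ValueError(
            f"role {role.name!r}: set exactly one of template / leader_worker_template"
        )
    if role.use_instance_set:
        # Same user-facing fields, different engine underneath.
        return "InstanceSet"
    # Legacy mapping: single template -> StatefulSet, LWS template -> LeaderWorkerSet.
    return "StatefulSet" if role.template is not None else "LeaderWorkerSet"


print(backing_workload(RoleSpec("prefill", template={})))
print(backing_workload(RoleSpec("decode", leader_worker_template={}, use_instance_set=True)))
```

Keeping the selection in a single function is one way existing manifests could keep their old behaviour while InstanceSet remains strictly opt-in.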
cheyang reviewed on Sep 17, 2025
components:
  - name: leader
    size: 1
    serviceName: deepseek-r1-master
Collaborator:
Is this also a headless Service? Will it be created automatically by the InstanceSet controller?
Contributor:
- In the design, `serviceName` is the name of the headless Service.
- In my opinion, since headless Services are Kubernetes resources belonging to an LLM application, they are better created by the platform.
Contributor (Author):
agree
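If the platform creates the headless Service, the controller has to derive it from each component's `serviceName`. The Python sketch below builds such a manifest as a plain dict for the `leader` component from the snippet under review. The helper name `headless_service_for`, the `example.io/...` label keys, the `deepseek-r1` group name, and the choice of `publishNotReadyAddresses: True` are illustrative assumptions, not the InstanceSet controller's actual behaviour; `clusterIP: "None"` is the standard Kubernetes way to make a Service headless.

```python
import json
from typing import Dict


def headless_service_for(component: Dict, group_name: str) -> Dict:
    """Build a headless Service manifest for one component (hypothetical helper)."""
    selector = {
        # Hypothetical label keys; a real controller may use different ones.
        "example.io/group-name": group_name,
        "example.io/component-name": component["name"],
    }
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": component["serviceName"], "labels": dict(selector)},
        "spec": {
            # clusterIP "None" makes the Service headless: DNS resolves to the
            # individual Pod addresses instead of a single virtual IP.
            "clusterIP": "None",
            "selector": selector,
            # Assumption: publish DNS records before Pods are Ready so that
            # leader and workers can discover each other during startup.
            "publishNotReadyAddresses": True,
        },
    }


leader = {"name": "leader", "size": 1, "serviceName": "deepseek-r1-master"}
svc = headless_service_for(leader, group_name="deepseek-r1")  # hypothetical group name
print(json.dumps(svc, indent=2))
```

Because kubectl accepts JSON manifests, the printed output could be applied with `kubectl apply -f -` for a quick manual check.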
Signed-off-by: veophi <[email protected]>
Force-pushed from bddba94 to 537373a.
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Ⅰ. Motivation
The current RoleBasedGroup (RBG) relies on multiple external workloads such as Deployment, StatefulSet, and LeaderWorkerSet (LWS).
This dependency limits RBG's extensibility and increases the complexity for users who are not deeply familiar with the Kubernetes workload ecosystem.
To better serve large language model (LLM) inference and training scenarios and to reduce users' cognitive overhead, this PR proposes introducing InstanceSet as a first-class workload type in RBG.
InstanceSet enables:
- `Instance` as a minimal orchestration unit, with richer lifecycle control.

Ⅱ. Modifications
Introduce InstanceSet KEP docs:
Ⅲ. Does this pull request fix one issue?
fixes #3 #21
Ⅳ. List the added test cases
TBD: will add unit and integration tests for:
(Backward-compatibility tests for switching existing RBG workloads to InstanceSet are not included, as that is out of scope.)
Ⅴ. Describe how to verify it
Ⅵ. Special notes for reviews
Checklist
- Run `make fmt`.