rbgcli: adds multi-node LLM inference serving support #265
Merged
Syspretor merged 5 commits into sgl-project:main on Apr 20, 2026
Conversation
Force-pushed from 9b1a131 to 55fe737
Contributor
Pull request overview
This PR extends the kubectl-rbg llm CLI to support multi-node (Leader/Worker) LLM inference serving for vLLM and SGLang, including engine refactors, new model config fields, and improved port-forward behavior for leader-only API serving.
Changes:
- Refactors engine plugins to generate full `Pattern` specs (Standalone vs LeaderWorker) and adds distributed args + shared-memory (`/dev/shm`) support.
- Updates the model config schema to use Kubernetes-native `ResourceList`/`EnvVar` plus new `distributed.size` and `shmSize` fields; adjusts tests accordingly.
- Adds a readiness + API probing flow to `llm run` and fixes `llm chat` to port-forward to the leader pod for the LeaderWorker pattern.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `internal/controller/workloads/rolebasedgroup_controller_test.go` | Formatting-only adjustments in label expectation maps. |
| `cmd/cli/plugin/engine/interface.go` | Replaces `GenerateTemplate` with `GeneratePattern` and introduces `GenerateOptions` (distributed size, shm size, resources/env/args). |
| `cmd/cli/plugin/engine/vllm.go` | Implements `GeneratePattern` with LeaderWorker support, a worker strategic-merge patch for full args, and shm volume support. |
| `cmd/cli/plugin/engine/vllm_test.go` | Adds coverage for distributed pattern generation and validates worker patch args via JSON parsing. |
| `cmd/cli/plugin/engine/sglang.go` | Implements `GeneratePattern` with distributed args and shm volume support (no worker patch needed). |
| `cmd/cli/plugin/engine/sglang_test.go` | Updates tests for pattern output and adds distributed pattern assertions. |
| `cmd/cli/cmd/llm/run/model_config.go` | Updates the YAML schema: resources/env become Kubernetes types; adds `distributed` + `shmSize`. |
| `cmd/cli/cmd/llm/run/models.yaml` | Updates built-in model definitions for the new resource format and adds distributed modes. |
| `cmd/cli/cmd/llm/run/model_config_test.go` | Adjusts tests to match new `ResourceList` parsing and expectations. |
| `cmd/cli/cmd/llm/run.go` | Refactors the run flow (mode resolution, pattern generation, storage mount); adds wait-for-ready and API probing via port-forward. |
| `cmd/cli/cmd/llm/run_test.go` | Updates tests to exercise `generateRBG` and metadata annotation behavior. |
| `cmd/cli/cmd/llm/chat/portforward.go` | Exports a safer port-forward session abstraction (`StartPortForward`, `IsAlive`, `Stop`) with single `Wait()` ownership. |
| `cmd/cli/cmd/llm/chat/chat.go` | Ensures `llm chat` targets the leader pod first when using the LeaderWorker pattern. |
| `cmd/cli/cmd/llm/pull.go`, `cmd/cli/cmd/llm/pull_test.go` | Increases the pull job active deadline default from 2h to 24h and updates tests. |
cheyang reviewed Apr 14, 2026
added 4 commits on April 16, 2026 at 18:42
…Pattern; any component for standalonePattern
Summary
Adds multi-node LLM inference serving support to the `kubectl-rbg llm run` command, enabling deployment of distributed vLLM/SGLang inference clusters.

Key Changes
1. Multi-node Serving Support
   - New `--distributed-nodes` flag to deploy multi-node distributed inference services.
2. Port-forward Improvements
   - The `llm chat` command correctly port-forwards to the leader pod for the LeaderWorker pattern.
3. Shared Memory Configuration
   - New `--shm-size` flag to configure shared memory size for large-model inference workloads.
4. Code Refactoring
   - `generateArgs` function for container startup argument construction.
5. Service Readiness Check
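Based on the schema changes described above, a built-in model entry in `models.yaml` might look roughly like the following. The model name and values are placeholders; only the `distributed.size` and `shmSize` field names and the Kubernetes-native `ResourceList`/`EnvVar` types come from the PR description:

```yaml
# Hypothetical entry sketching the new schema fields.
- name: example-model          # placeholder name
  resources:                   # Kubernetes-native ResourceList
    nvidia.com/gpu: "8"
  env:                         # Kubernetes-native EnvVar entries
    - name: EXAMPLE_LOG_LEVEL  # placeholder variable
      value: info
  distributed:
    size: 2                    # >1 selects the LeaderWorker pattern
  shmSize: 16Gi                # backs the /dev/shm volume
```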
Files Changed
- `cmd/cli/cmd/llm/run.go`
- `cmd/cli/plugin/engine/vllm.go`
- `cmd/cli/plugin/engine/sglang.go`
- `cmd/cli/cmd/llm/chat/chat.go`
- `cmd/cli/cmd/llm/chat/portforward.go`
- `cmd/cli/cmd/llm/pull.go`