rbgcli: adds multi-node LLM inference serving support #265
Merged
Syspretor merged 5 commits into sgl-project:main on Apr 20, 2026
Conversation
Force-pushed from 9b1a131 to 55fe737
Contributor
Pull request overview
This PR extends the kubectl-rbg llm CLI to support multi-node (Leader/Worker) LLM inference serving for vLLM and SGLang, including engine refactors, new model config fields, and improved port-forward behavior for leader-only API serving.
Changes:
- Refactors engine plugins to generate full `Pattern` specs (Standalone vs LeaderWorker) and adds distributed args + shared-memory (`/dev/shm`) support.
- Updates the model config schema to use Kubernetes-native `ResourceList`/`EnvVar` plus new `distributed.size` and `shmSize` fields; adjusts tests accordingly.
- Adds a readiness + API probing flow to `llm run` and fixes `llm chat` to port-forward to the leader pod for the LeaderWorker pattern.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `internal/controller/workloads/rolebasedgroup_controller_test.go` | Formatting-only adjustments in label expectation maps. |
| `cmd/cli/plugin/engine/interface.go` | Replaces `GenerateTemplate` with `GeneratePattern` and introduces `GenerateOptions` (distributed size, shm size, resources/env/args). |
| `cmd/cli/plugin/engine/vllm.go` | Implements `GeneratePattern` with LeaderWorker support, a worker strategic-merge patch for full args, and shm volume support. |
| `cmd/cli/plugin/engine/vllm_test.go` | Adds coverage for distributed pattern generation and validates worker patch args via JSON parsing. |
| `cmd/cli/plugin/engine/sglang.go` | Implements `GeneratePattern` with distributed args and shm volume support (no worker patch needed). |
| `cmd/cli/plugin/engine/sglang_test.go` | Updates tests for pattern output and adds distributed pattern assertions. |
| `cmd/cli/cmd/llm/run/model_config.go` | Updates the YAML schema: resources/env become Kubernetes types; adds `distributed` + `shmSize`. |
| `cmd/cli/cmd/llm/run/models.yaml` | Updates built-in model definitions for the new resource format and adds distributed modes. |
| `cmd/cli/cmd/llm/run/model_config_test.go` | Adjusts tests to match new `ResourceList` parsing and expectations. |
| `cmd/cli/cmd/llm/run.go` | Refactors the run flow (mode resolution, pattern generation, storage mount); adds wait-for-ready and API probing via port-forward. |
| `cmd/cli/cmd/llm/run_test.go` | Updates tests to exercise `generateRBG` and metadata annotation behavior. |
| `cmd/cli/cmd/llm/chat/portforward.go` | Exports a safer port-forward session abstraction (`StartPortForward`, `IsAlive`, `Stop`) with single `Wait()` ownership. |
| `cmd/cli/cmd/llm/chat/chat.go` | Ensures `llm chat` targets the leader pod first when using the LeaderWorker pattern. |
| `cmd/cli/cmd/llm/pull.go`, `cmd/cli/cmd/llm/pull_test.go` | Increases the pull job active deadline default from 2h to 24h and updates tests. |
cheyang reviewed Apr 14, 2026
added 4 commits on April 16, 2026 at 18:42
…Pattern; any component for standalonePattern
Summary
Adds multi-node LLM inference serving support to the `kubectl-rbg llm run` command, enabling deployment of distributed vLLM/SGLang inference clusters.

Key Changes
1. Multi-node Serving Support
   - New `--distributed-nodes` flag to deploy multi-node distributed inference services.
2. Port-forward Improvements
   - The `llm chat` command correctly port-forwards to the leader pod for the LeaderWorker pattern.
3. Shared Memory Configuration
   - New `--shm-size` flag to configure shared memory size for large-model inference workloads.
4. Code Refactoring
   - `generateArgs` function for container startup argument construction.
5. Service Readiness Check
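Based on the schema changes described above, a built-in model entry in `models.yaml` might look roughly like the following. The model name and values are placeholders; only the `distributed.size` and `shmSize` field names and the Kubernetes-native `ResourceList`/`EnvVar` types come from the PR description:

```yaml
# Hypothetical entry sketching the new schema fields.
- name: example-model          # placeholder name
  resources:                   # Kubernetes-native ResourceList
    nvidia.com/gpu: "8"
  env:                         # Kubernetes-native EnvVar entries
    - name: EXAMPLE_LOG_LEVEL  # placeholder variable
      value: info
  distributed:
    size: 2                    # >1 selects the LeaderWorker pattern
  shmSize: 16Gi                # backs the /dev/shm volume
```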
Files Changed
- `cmd/cli/cmd/llm/run.go`
- `cmd/cli/plugin/engine/vllm.go`
- `cmd/cli/plugin/engine/sglang.go`
- `cmd/cli/cmd/llm/chat/chat.go`
- `cmd/cli/cmd/llm/chat/portforward.go`
- `cmd/cli/cmd/llm/pull.go`