rbgcli: adds multi-node LLM inference serving support #265

Merged: Syspretor merged 5 commits into sgl-project:main from diw-zw:multi-node on Apr 20, 2026

diw-zw (Collaborator) commented on Apr 10, 2026

Summary

Adds multi-node LLM inference serving support to the kubectl-rbg llm run command, enabling deployment of distributed vLLM and SGLang inference clusters.

Key Changes

1. Multi-node Serving Support

  • Add a --distributed-nodes flag to deploy multi-node distributed inference services
  • Generate the correct startup parameters for the Leader and Worker roles
  • The Worker role uses a Strategic Merge Patch to inject its complete args list, which prevents parameter-override issues
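
As an illustration of per-role startup parameters, here is a minimal sketch for an SGLang-style launch. The function name and its parameters are hypothetical and are not taken from this PR; the flag names follow SGLang's server options as commonly documented, and how the PR actually resolves the node rank and leader address is not shown here.

```go
package main

import (
	"fmt"
	"strconv"
)

// distributedArgs builds SGLang-style launch arguments for a single node.
// All parameters are illustrative assumptions, not the PR's real signature.
func distributedArgs(model string, nodes, rank int, leaderAddr string) []string {
	args := []string{"--model-path", model}
	if nodes <= 1 {
		return args // standalone: no distributed flags are added
	}
	return append(args,
		"--nnodes", strconv.Itoa(nodes),
		"--node-rank", strconv.Itoa(rank),
		"--dist-init-addr", leaderAddr,
	)
}

func main() {
	// Rank 0 is treated as the leader in this sketch; the address is a placeholder.
	leader := distributedArgs("/models/demo", 2, 0, "leader-0:5000")
	worker := distributedArgs("/models/demo", 2, 1, "leader-0:5000")
	fmt.Println(leader)
	fmt.Println(worker)
}
```

In this sketch the two roles differ only in the rank value, which is why a single argument builder can serve both.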

2. Port-forward Improvements

  • The llm chat command now port-forwards to the Leader Pod when the deployment uses LeaderWorkerPattern
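
A rough sketch of leader-first target selection, assuming each pod carries some role marker. The pod type, its Role field, and the "leader" value are hypothetical stand-ins; the labels the PR actually inspects are not shown on this page.

```go
package main

import "fmt"

// pod is a minimal stand-in for a Kubernetes Pod.
// Role is a hypothetical field used only for this illustration.
type pod struct {
	Name string
	Role string
}

// pickForwardTarget prefers a leader pod and falls back to the first pod.
// It returns false when the list is empty.
func pickForwardTarget(pods []pod) (string, bool) {
	for _, p := range pods {
		if p.Role == "leader" {
			return p.Name, true
		}
	}
	if len(pods) > 0 {
		return pods[0].Name, true
	}
	return "", false
}

func main() {
	pods := []pod{{Name: "demo-worker-1", Role: "worker"}, {Name: "demo-leader-0", Role: "leader"}}
	name, ok := pickForwardTarget(pods)
	fmt.Println(name, ok)
}
```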

3. Shared Memory Configuration

  • Add --shm-size flag to configure shared memory size for large model inference workloads
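
One common way to give a container more shared memory on Kubernetes is a memory-backed emptyDir mounted at /dev/shm. The sketch below assumes that approach; the struct types are simplified local stand-ins for the Kubernetes API types, and the volume name "shm" is a hypothetical choice, so this may differ from what the PR generates.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the Kubernetes Volume and VolumeMount types.
type emptyDir struct {
	Medium    string `json:"medium"`
	SizeLimit string `json:"sizeLimit,omitempty"`
}

type volume struct {
	Name     string   `json:"name"`
	EmptyDir emptyDir `json:"emptyDir"`
}

type volumeMount struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
}

// shmVolume returns a memory-backed volume of the given size and its /dev/shm mount.
func shmVolume(size string) (volume, volumeMount) {
	v := volume{Name: "shm", EmptyDir: emptyDir{Medium: "Memory", SizeLimit: size}}
	m := volumeMount{Name: "shm", MountPath: "/dev/shm"}
	return v, m
}

func main() {
	v, m := shmVolume("16Gi")
	out, err := json.MarshalIndent(map[string]interface{}{"volume": v, "volumeMount": m}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```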

4. Code Refactoring

  • Unify generateArgs function for container startup argument construction
  • Improve engine interface design for better maintainability

5. Service Readiness Check

  • Check that the LLM service is up before the llm run command returns
  • Ensure the inference endpoint is ready before the user interacts with it

Files Changed

| File | Description |
| --- | --- |
| cmd/cli/cmd/llm/run.go | Core multi-node deployment logic |
| cmd/cli/plugin/engine/vllm.go | vLLM multi-node parameter generation |
| cmd/cli/plugin/engine/sglang.go | SGLang multi-node parameter generation |
| cmd/cli/cmd/llm/chat/chat.go | Port-forward logic fix |
| cmd/cli/cmd/llm/chat/portforward.go | Service readiness check |
| cmd/cli/cmd/llm/pull.go | Minor fix |
| Test files | Add unit tests for multi-node scenarios |

diw-zw requested a review from cheyang on April 10, 2026 02:18
diw-zw force-pushed the multi-node branch 2 times, most recently from 9b1a131 to 55fe737, on April 10, 2026 10:23
cheyang requested a review from Copilot on April 14, 2026 06:47
Copilot AI (Contributor) left a comment

Pull request overview

This PR extends the kubectl-rbg llm CLI to support multi-node (Leader/Worker) LLM inference serving for vLLM and SGLang, including engine refactors, new model config fields, and improved port-forward behavior for leader-only API serving.

Changes:

  • Refactors engine plugins to generate full Pattern specs (Standalone vs LeaderWorker) and adds distributed args + shared-memory (/dev/shm) support.
  • Updates model config schema to use Kubernetes-native ResourceList/EnvVar plus new distributed.size and shmSize fields; adjusts tests accordingly.
  • Adds readiness + API probing flow to llm run and fixes llm chat to port-forward to the leader pod for LeaderWorkerPattern.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| internal/controller/workloads/rolebasedgroup_controller_test.go | Formatting-only adjustments in label expectation maps. |
| cmd/cli/plugin/engine/interface.go | Replaces GenerateTemplate with GeneratePattern and introduces GenerateOptions (distributed size, shm size, resources/env/args). |
| cmd/cli/plugin/engine/vllm.go | Implements GeneratePattern with LeaderWorker support, worker strategic-merge patch for full args, shm volume support. |
| cmd/cli/plugin/engine/vllm_test.go | Adds coverage for distributed pattern generation and validates worker patch args via JSON parsing. |
| cmd/cli/plugin/engine/sglang.go | Implements GeneratePattern with distributed args and shm volume support (no worker patch needed). |
| cmd/cli/plugin/engine/sglang_test.go | Updates tests for pattern output and adds distributed pattern assertions. |
| cmd/cli/cmd/llm/run/model_config.go | Updates YAML schema: resources/env become K8s types; adds distributed + shmSize. |
| cmd/cli/cmd/llm/run/models.yaml | Updates builtin model definitions for new resource format and adds distributed modes. |
| cmd/cli/cmd/llm/run/model_config_test.go | Adjusts tests to match new ResourceList parsing and expectations. |
| cmd/cli/cmd/llm/run.go | Refactors run flow (mode resolution, pattern generation, storage mount), adds wait-for-ready and API probing via port-forward. |
| cmd/cli/cmd/llm/run_test.go | Updates tests to exercise generateRBG and metadata annotation behavior. |
| cmd/cli/cmd/llm/chat/portforward.go | Exports a safer port-forward session abstraction (StartPortForward, IsAlive, Stop) with single Wait() ownership. |
| cmd/cli/cmd/llm/chat/chat.go | Ensures llm chat targets the leader pod first when using LeaderWorkerPattern. |
| cmd/cli/cmd/llm/pull.go / cmd/cli/cmd/llm/pull_test.go | Increases pull job active deadline default from 2h to 24h and updates tests. |
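
The "single Wait() ownership" idea mentioned for portforward.go can be sketched generically: exactly one goroutine waits on the long-running forwarder, and every other caller observes its result through a closed channel. The session type and start function below are hypothetical; only the method names IsAlive and Stop echo the summary above, and the real implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// session wraps a long-running task whose completion is awaited in one place only.
type session struct {
	stopCh   chan struct{}
	done     chan struct{}
	stopOnce sync.Once
	err      error // written by the owner goroutine before done is closed
}

// start launches run in a single goroutine that owns the blocking wait.
func start(run func(stop <-chan struct{}) error) *session {
	s := &session{stopCh: make(chan struct{}), done: make(chan struct{})}
	go func() {
		s.err = run(s.stopCh)
		close(s.done)
	}()
	return s
}

// IsAlive reports whether the task is still running, without blocking.
func (s *session) IsAlive() bool {
	select {
	case <-s.done:
		return false
	default:
		return true
	}
}

// Stop signals the task to exit, waits for it, and returns its result.
// It is safe to call more than once.
func (s *session) Stop() error {
	s.stopOnce.Do(func() { close(s.stopCh) })
	<-s.done
	return s.err
}

func main() {
	// The function below stands in for a port-forward that runs until stopped.
	s := start(func(stop <-chan struct{}) error {
		<-stop
		return errors.New("forwarder stopped")
	})
	fmt.Println(s.IsAlive())
	fmt.Println(s.Stop())
	fmt.Println(s.IsAlive())
}
```

Keeping the wait in one goroutine avoids the double-wait errors that can occur when both a health check and a shutdown path try to reap the same process.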


Four review comment threads on cmd/cli/cmd/llm/run.go (all now outdated)
Syspretor (Collaborator) left a comment

/lgtm

Syspretor merged commit f8b8951 into sgl-project:main on Apr 20, 2026
8 of 9 checks passed
diw-zw deleted the multi-node branch on April 30, 2026 08:44