chore: add examples for mooncake integration with v1alpha2 api by Syspretor · Pull Request #272 · sgl-project/rbg

Syspretor · 2026-04-13T12:39:36Z

Ⅰ. Motivation

Ⅱ. Modifications

Ⅲ. Does this pull request fix one issue?

fixes #XXXX

Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅴ. Describe how to verify it

VI. Special notes for reviews

Checklist

Format your code with make fmt.
Add unit tests or integration tests.
Update the documentation related to the change.

gemini-code-assist

Code Review

This pull request introduces two new example configurations for LLM inference with SGLang and Mooncake KV Cache on Kubernetes, both built on the RoleBasedGroup v1alpha2 API: one for aggregated deployment and another for Prefill-Decode (PD) disaggregated deployment. Feedback highlights a critical omission in the PD-disaggregated example: the decode role lacks the Mooncake connection settings and transfer backend flags it needs to participate in the caching mechanism. In addition, the aggregated example's file header mentions a rolloutStrategy that is not actually implemented in the resource specification.

Copilot

Pull request overview

Adds v1alpha2 RoleBasedGroup example manifests demonstrating SGLang + Mooncake KV cache deployments, intended to replace/modernize deprecated v1alpha1 Mooncake examples.

Changes:

Added a v1alpha2 aggregated inference example using Mooncake (master/store/worker).
Added a v1alpha2 PD-disaggregated inference example using Mooncake (master/store/prefill/decode/router).
Used roleTemplates + standalonePattern.templateRef.patch to reduce duplication across roles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
examples/inference/ecosystem/mooncake/agg-with-mooncake.yaml	New aggregated Mooncake-backed SGLang example (master/store/worker + Service).
examples/inference/ecosystem/mooncake/pd-disagg-with-mooncake.yaml	New PD-disaggregated Mooncake-backed SGLang example (master/store/prefill/decode/router + Service).

Signed-off-by: 玖宇 <[email protected]>

Copilot

Pull request overview

This PR adds new v1alpha2 example manifests demonstrating Mooncake integrations for inference (SGLang / vLLM PD-disaggregation) and for running Mooncake as a reusable KV-cache service.

Changes:

Add PD-disaggregated inference examples using Mooncake transfer backend (SGLang + vLLM).
Add v1alpha2 Mooncake KV-cache examples (standalone service, aggregated inference, PD-disaggregated inference with cache reuse).
Standardize these examples on RoleBasedGroup v1alpha2 with standalonePattern and roleTemplates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

File	Description
examples/inference/ecosystem/mooncake/mooncake-transfer-engine/vllm-pd-disgg-with-mooncake-te.yaml	vLLM PD-disaggregation example using Mooncake transfer backend + router Service
examples/inference/ecosystem/mooncake/mooncake-transfer-engine/sgl-pd-disgg-with-mooncake-te.yaml	SGLang PD-disaggregation example using Mooncake transfer backend + router Service
examples/inference/ecosystem/mooncake/mooncake-store/standalone-mooncake-store.yaml	Standalone Mooncake master/store deployment to be reused by other RBGs
examples/inference/ecosystem/mooncake/mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml	PD-disaggregated SGLang example embedding Mooncake master/store for KV-cache reuse
examples/inference/ecosystem/mooncake/mooncake-store/agg-kvcache-reuse-with-mooncake.yaml	Aggregated SGLang example embedding Mooncake master/store for KV-cache reuse

+#   2. In your inference RBG, configure the following environment variables:
+#      - MOONCAKE_MASTER: "s-<rbg-name>-mooncake-master:50051"
+#        Example: "s-mooncake-service-mooncake-master:50051"
+#      - MOONCAKE_TE_META_DATA_SERVER: "http://s-<rbg-name>-mooncake-master:8080/metadata"
+#        Example: "http://s-mooncake-service-mooncake-master:8080/metadata"
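As a rough illustration of step 2 in the comment above, the two variables could be set on an inference container as ordinary Kubernetes env entries. This is only a sketch: it assumes the standalone Mooncake RBG is named mooncake-service (the name used in the comment's own example), and it does not show where the list sits inside a RoleBasedGroup role template.

```yaml
# Container-level env entries (standard Kubernetes container spec fields).
# Assumption: the standalone Mooncake RBG is named "mooncake-service",
# matching the example values quoted in the manifest comment.
env:
  - name: MOONCAKE_MASTER
    value: "s-mooncake-service-mooncake-master:50051"
  - name: MOONCAKE_TE_META_DATA_SERVER
    value: "http://s-mooncake-service-mooncake-master:8080/metadata"
```

For a differently named RBG, the mooncake-service part of both hostnames would change according to the s-<rbg-name>-mooncake-master pattern.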


+# vLLM runtime with Mooncake as the KV transfer backend.
+#
+# Architecture:
+#   - router: vLLM Router (NIXL-based) for request routing between prefill and decode


+#
+# Environment variables:
+#   - VLLM_MOONCAKE_BOOTSTRAP_PORT: Port for Mooncake bootstrap server (default: 8998)
+#     Required only for prefiller instances
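For illustration, the bootstrap port described above could be set explicitly on a prefill container as follows. This is a sketch under the assumption that the default quoted in the comment is kept; the surrounding container spec is omitted.

```yaml
# Env entry for a prefill (prefiller) container only; per the manifest
# comment, other instances do not require this variable.
env:
  - name: VLLM_MOONCAKE_BOOTSTRAP_PORT
    value: "8998"  # default value stated in the manifest comment
```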


+                    - --rpc_port
+                    - "50051"
+                    - --http_metadata_server_host
+                    - $(POD_IP)
+                    - --http_metadata_server_port


stmatengss · 2026-04-16T15:27:20Z

LGTM

gemini-code-assist Bot reviewed Apr 13, 2026

stmatengss reviewed Apr 13, 2026

Comment thread examples/inference/ecosystem/mooncake/mooncake-store/agg-kvcache-reuse-with-mooncake.yaml

cheyang requested a review from Copilot April 13, 2026 13:19

Copilot started reviewing on behalf of cheyang April 13, 2026 13:19

Copilot AI reviewed Apr 13, 2026

ykwd mentioned this pull request Apr 14, 2026

[Usage]: How to use mooncake-client kvcache-ai/Mooncake#1882

Closed

1 task

chore: add examples for mooncake integration with v1alpha2 api

033f932

Signed-off-by: 玖宇 <[email protected]>

Syspretor force-pushed the chore/add-examples-for-mc branch from 951ab61 to 033f932 Compare April 14, 2026 08:47

cheyang requested review from Copilot and stmatengss April 14, 2026 13:33

cheyang approved these changes Apr 14, 2026

cheyang merged commit ac8823f into sgl-project:main Apr 14, 2026
11 checks passed

Copilot started reviewing on behalf of cheyang April 14, 2026 13:33

Copilot AI reviewed Apr 14, 2026

cheyang merged 1 commit into sgl-project:main from Syspretor:chore/add-examples-for-mc

4 participants
