chore: add examples for mooncake integration with v1alpha2 api #272

Merged
cheyang merged 1 commit into sgl-project:main from Syspretor:chore/add-examples-for-mc
Apr 14, 2026

@Syspretor
Collaborator

Ⅰ. Motivation

Ⅱ. Modifications

Ⅲ. Does this pull request fix one issue?

fixes #XXXX

Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅴ. Describe how to verify it

Ⅵ. Special notes for reviews

Checklist

  • Format your code with make fmt.
  • Add unit tests or integration tests.
  • Update the documentation related to the change.

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces two example configurations for LLM inference with SGLang and the Mooncake KV cache on Kubernetes, both written against the RoleBasedGroup v1alpha2 API: one for aggregated deployment and one for Prefill-Decode (PD) disaggregated deployment. The feedback highlights a critical omission in the PD-disaggregated example: the decode role lacks the Mooncake connection settings and transfer backend flags it needs to participate in the caching mechanism. In addition, the aggregated example's file header mentions a rolloutStrategy that the resource specification does not actually implement.

Contributor

Copilot AI left a comment

Pull request overview

Adds v1alpha2 RoleBasedGroup example manifests demonstrating SGLang + Mooncake KV cache deployments, intended to replace/modernize deprecated v1alpha1 Mooncake examples.

Changes:

  • Added a v1alpha2 aggregated inference example using Mooncake (master/store/worker).
  • Added a v1alpha2 PD-disaggregated inference example using Mooncake (master/store/prefill/decode/router).
  • Used roleTemplates + standalonePattern.templateRef.patch to reduce duplication across roles.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| examples/inference/ecosystem/mooncake/agg-with-mooncake.yaml | New aggregated Mooncake-backed SGLang example (master/store/worker + Service). |
| examples/inference/ecosystem/mooncake/pd-disagg-with-mooncake.yaml | New PD-disaggregated Mooncake-backed SGLang example (master/store/prefill/decode/router + Service). |

Comment thread examples/inference/ecosystem/mooncake/agg-with-mooncake.yaml Outdated
Comment thread examples/inference/ecosystem/mooncake/agg-with-mooncake.yaml Outdated
Comment thread examples/inference/ecosystem/mooncake/pd-disagg-with-mooncake.yaml Outdated
Comment thread examples/inference/ecosystem/mooncake/pd-disagg-with-mooncake.yaml Outdated
@Syspretor force-pushed the chore/add-examples-for-mc branch from 951ab61 to 033f932 on April 14, 2026 08:47
@cheyang requested review from Copilot and stmatengss on April 14, 2026 13:33
@cheyang merged commit ac8823f into sgl-project:main on Apr 14, 2026
11 checks passed
Contributor

Copilot AI left a comment

Pull request overview

This PR adds new v1alpha2 example manifests demonstrating Mooncake integrations for inference (SGLang / vLLM PD-disaggregation) and for running Mooncake as a reusable KV-cache service.

Changes:

  • Add PD-disaggregated inference examples using Mooncake transfer backend (SGLang + vLLM).
  • Add v1alpha2 Mooncake KV-cache examples (standalone service, aggregated inference, PD-disaggregated inference with cache reuse).
  • Standardize these examples on RoleBasedGroup v1alpha2 with standalonePattern and roleTemplates.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| examples/inference/ecosystem/mooncake/mooncake-transfer-engine/vllm-pd-disgg-with-mooncake-te.yaml | vLLM PD-disaggregation example using Mooncake transfer backend + router Service |
| examples/inference/ecosystem/mooncake/mooncake-transfer-engine/sgl-pd-disgg-with-mooncake-te.yaml | SGLang PD-disaggregation example using Mooncake transfer backend + router Service |
| examples/inference/ecosystem/mooncake/mooncake-store/standalone-mooncake-store.yaml | Standalone Mooncake master/store deployment to be reused by other RBGs |
| examples/inference/ecosystem/mooncake/mooncake-store/pd-disagg-kvcache-reuse-with-mooncake.yaml | PD-disaggregated SGLang example embedding Mooncake master/store for KV-cache reuse |
| examples/inference/ecosystem/mooncake/mooncake-store/agg-kvcache-reuse-with-mooncake.yaml | Aggregated SGLang example embedding Mooncake master/store for KV-cache reuse |

Comment on lines +25 to +29
# 2. In your inference RBG, configure the following environment variables:
# - MOONCAKE_MASTER: "s-<rbg-name>-mooncake-master:50051"
# Example: "s-mooncake-service-mooncake-master:50051"
# - MOONCAKE_TE_META_DATA_SERVER: "http://s-<rbg-name>-mooncake-master:8080/metadata"
# Example: "http://s-mooncake-service-mooncake-master:8080/metadata"
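The naming convention in the comment above can be illustrated with a minimal container env fragment. This is only a sketch: it assumes the standalone Mooncake RBG is named mooncake-service (the name used in the comment's own examples) and that the variables go into the standard Kubernetes container env list. Where that list sits inside a RoleBasedGroup role template is not shown here.

```yaml
# Sketch only: container-level env entries for an inference role.
# Assumes the standalone Mooncake RBG is named "mooncake-service",
# so the master Service resolves as s-<rbg-name>-mooncake-master.
env:
  - name: MOONCAKE_MASTER
    value: "s-mooncake-service-mooncake-master:50051"
  - name: MOONCAKE_TE_META_DATA_SERVER
    value: "http://s-mooncake-service-mooncake-master:8080/metadata"
```

If the RBG has a different name, both values change together, since they point at the same master Service on different ports.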
# vLLM runtime with Mooncake as the KV transfer backend.
#
# Architecture:
# - router: vLLM Router (NIXL-based) for request routing between prefill and decode
#
# Environment variables:
# - VLLM_MOONCAKE_BOOTSTRAP_PORT: Port for Mooncake bootstrap server (default: 8998)
# Required only for prefiller instances
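As a hedged illustration of the environment variable described in the comment above, the fragment below sets it explicitly in a prefill container's env list. The value simply repeats the stated default. Placement within the role template is an assumption and is not taken from the example file itself.

```yaml
# Sketch only: env entry for the prefill (prefiller) container.
# Decode instances do not need this variable, per the comment above.
env:
  - name: VLLM_MOONCAKE_BOOTSTRAP_PORT
    value: "8998"  # the default stated in the file header
```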
Comment on lines +83 to +87
- --rpc_port
- "50051"
- --http_metadata_server_host
- $(POD_IP)
- --http_metadata_server_port
@stmatengss
Collaborator

LGTM

4 participants