Support Mooncake migration backend for PD disaggregation by Risc-lt · Pull Request #3620 · InternLM/lmdeploy

Risc-lt · 2025-06-07T15:34:47Z

PD-Disaggregated KVCache Transfer Pipeline with Mooncake

This PR introduces a new implementation of the Prefill-Decode (PD) disaggregated KVCache transfer pipeline in LMDeploy. It uses the native Mooncake transfer engine as an optional migration backend alongside dlslime. The goal is to enable disaggregated prefill/decode workloads across nodes for large-scale LLM inference, inspired by lmdeploy-distserve. #3304 (comment)

Architecture Overview

Interfaces

The Mooncake migration backend implementation exposes the following interfaces:

p2p_initialize: Notify the Prefill and Decode engines to initialize a migration backend instance of the Mooncake transfer engine.
register_memory_region: Register a memory region for the connection.
endpoint_info: Return the local memory pool and endpoint configuration info.
p2p_connect: Receive endpoint information from the node on the other side of the connection.
p2p_migrate: Set up the connection between the prefill and decode nodes and transfer the kvcache synchronously in read mode.
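The five operations above can be pictured as a small abstract interface. The following is only a sketch: the class names, argument lists, and return types are assumptions made for illustration and are not copied from lmdeploy or Mooncake. The toy implementation copies bytes between Python buffers instead of using RDMA, and it assumes that "read mode" means the receiving side pulls data from its peer.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class MigrationBackendSketch(ABC):
    """Hypothetical interface mirroring the five operations listed above."""

    @abstractmethod
    def p2p_initialize(self) -> None: ...

    @abstractmethod
    def register_memory_region(self, name: str, buffer: bytearray) -> None: ...

    @abstractmethod
    def endpoint_info(self) -> Dict[str, Any]: ...

    @abstractmethod
    def p2p_connect(self, remote_info: Dict[str, Any]) -> None: ...

    @abstractmethod
    def p2p_migrate(self, remote_id: str, name: str, offset: int, length: int) -> int: ...


class InMemoryBackend(MigrationBackendSketch):
    """Toy stand-in: plain Python buffers play the role of registered memory."""

    def __init__(self, engine_id: str) -> None:
        self.engine_id = engine_id
        self.initialized = False
        self.regions: Dict[str, bytearray] = {}
        self.peers: Dict[str, Dict[str, Any]] = {}

    def p2p_initialize(self) -> None:
        self.initialized = True

    def register_memory_region(self, name: str, buffer: bytearray) -> None:
        self.regions[name] = buffer

    def endpoint_info(self) -> Dict[str, Any]:
        # A real backend would return addresses and keys, not the buffers themselves.
        return {"engine_id": self.engine_id, "regions": self.regions}

    def p2p_connect(self, remote_info: Dict[str, Any]) -> None:
        self.peers[remote_info["engine_id"]] = remote_info

    def p2p_migrate(self, remote_id: str, name: str, offset: int, length: int) -> int:
        # Assumed "read mode": the caller pulls bytes from the remote region.
        source = self.peers[remote_id]["regions"][name]
        self.regions[name][offset:offset + length] = source[offset:offset + length]
        return length
```

In this toy setup a decode-side instance would call p2p_connect with the prefill side's endpoint_info() and then p2p_migrate to pull a byte range into its own buffer.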

Control Plane

The proxy server first uses a FastAPI POST request to send the endpoint info, which notifies the prefill and decode servers to send their local endpoint info to each other over a TCP socket. Once the p2p connection is established, the Mooncake migration backend starts transferring the kvcache over the RDMA link.
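The endpoint exchange over a TCP socket can be illustrated with a short sketch. The wire format here (a 4-byte length prefix followed by JSON) and all function names are assumptions chosen for the example; the PR does not describe the actual message layout.

```python
import json
import socket
import struct
from typing import Any, Dict


def send_endpoint_info(sock: socket.socket, info: Dict[str, Any]) -> None:
    """Send one length-prefixed JSON message (assumed format)."""
    payload = json.dumps(info).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes or raise if the peer closes early."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = sock.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_endpoint_info(sock: socket.socket) -> Dict[str, Any]:
    """Receive one length-prefixed JSON message (assumed format)."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def exchange_endpoint_info(sock: socket.socket, local_info: Dict[str, Any]) -> Dict[str, Any]:
    """Send the local endpoint info, then wait for the peer's."""
    send_endpoint_info(sock, local_info)
    return recv_endpoint_info(sock)
```

Both sides send before they receive, which is safe for small messages like these because the payload fits in the socket buffer.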

Workflow

Current Status

Functionally validated on A10, with eRDMA providing RoCEv2 support.
All basic PD workflows (initialize $\Rightarrow$ connect $\Rightarrow$ prefill $\Rightarrow$ migrate $\Rightarrow$ decode) work as they did with the previous dlslime version.
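When checking logs from a validation run, the stage order above can be asserted mechanically. This is a hypothetical helper, not part of the PR: the stage names come from the workflow line, while the function and the idea of a per-request event trace are assumptions.

```python
from typing import Iterable, Optional

# Stage names are taken from the workflow description; the checker is illustrative.
STAGES = ("initialize", "connect", "prefill", "migrate", "decode")


def first_out_of_order(events: Iterable[str]) -> Optional[int]:
    """Return the index of the first event that breaks the expected order.

    A trace that is a prefix of STAGES is accepted and yields None.
    """
    expected = 0
    for index, event in enumerate(events):
        if expected >= len(STAGES) or event != STAGES[expected]:
            return index
        expected += 1
    return None
```

For example, a trace that jumps from initialize straight to prefill is flagged at index 1, because connect was skipped.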

Next Steps

Check migration addresses to validate the quality of output tokens.
Improve kvcache transfer efficiency to surpass the dlslime version.
Remove unnecessary testing logs.

How to Build

pip install mooncake-transfer-engine
pip install -v -e .

How to Run

Start Proxy

lmdeploy serve proxy --server-name <proxy-ip-address> --server-port 8000 --routing-strategy "min_expected_latency" --serving-strategy DistServe --log-level INFO

Start Prefill Engine

CUDA_VISIBLE_DEVICES=0 lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --server-name <server-ip-address> --server-port 23333 --role Prefill --proxy-url <proxy-ip-address:port> --backend pytorch --migration-backend Mooncake

Start Decode Engine

CUDA_VISIBLE_DEVICES=1 lmdeploy serve api_server Qwen/Qwen2.5-7B-Instruct --server-name <server-ip-address> --server-port 23334 --role Decode --proxy-url <proxy-ip-address:port> --backend pytorch --migration-backend Mooncake

Client Side

curl -X POST "<proxy-ip-address:port>/v1/completions" -H "Content-Type: application/json" -d '{"model":"Qwen/Qwen2.5-7B-Instruct","temperature":0,"prompt":"Shanghai is a city that ","max_tokens":16,"stream":false}'
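The same request can be issued from Python with only the standard library. This is a sketch that mirrors the curl call above; the helper names are made up for the example, and the proxy URL must be replaced with the real proxy address and port.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(proxy_url: str, prompt: str,
                             model: str = "Qwen/Qwen2.5-7B-Instruct",
                             max_tokens: int = 16) -> urllib.request.Request:
    """Build the POST request for /v1/completions with the body used in the curl call."""
    body: Dict[str, Any] = {
        "model": model,
        "temperature": 0,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": False,
    }
    return urllib.request.Request(
        url=proxy_url.rstrip("/") + "/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs a running proxy)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request is separate from sending it, so the payload can be inspected without a running proxy.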

stmatengss · 2025-06-08T08:45:59Z

@Risc-lt You can use tools like ruff or yapf to automatically fix code formatting and linting issues.

lvhan028 · 2025-06-08T12:04:37Z

The linting issue can be resolved by the following:

pip install pre-commit
cd lmdeploy # the root dir of lmdeploy repo
pre-commit run --all-files

Make sure that the Python version is 3.10.

lvhan028 · 2025-06-11T10:23:27Z

+        """Initialize p2p connection for this specific link."""
+        # TODO: Support more types of metadata_server
+        # e.g. "etcd://192.168.0.137:2379"
+        metadata_server = 'P2PHANDSHAKE'


What is metadata_server used for?

There are two modes: (1) 'P2PHANDSHAKE' (a magic string): no metadata server for maintaining connection information, which is intended for small-scale PD disaggregation, and (2) support for etcd/redis/http_server as the centralized server for larger-scale PD disaggregation.
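The two modes can be told apart from the metadata_server string alone. The sketch below is illustrative: the 'P2PHANDSHAKE' value and the etcd:// form come from the thread, while the redis and http scheme names, the function name, and the returned labels are assumptions.

```python
from urllib.parse import urlparse

P2P_HANDSHAKE = "P2PHANDSHAKE"
# "etcd" appears in the code comment above; "redis" and "http" are assumed scheme names.
CENTRALIZED_SCHEMES = {"etcd", "redis", "http"}


def metadata_mode(metadata_server: str) -> str:
    """Classify a metadata_server value as 'p2p' or 'centralized'."""
    if metadata_server == P2P_HANDSHAKE:
        return "p2p"
    scheme = urlparse(metadata_server).scheme
    if scheme in CENTRALIZED_SCHEMES:
        return "centralized"
    raise ValueError(f"unrecognized metadata_server value: {metadata_server!r}")
```

Raising on unknown values keeps a typo in the configuration from silently selecting the wrong mode.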

lvhan028 · 2025-06-11T10:32:25Z

+        try:
+            from mooncake.engine import TransferEngine
+        except ImportError as e:
+            raise ImportError('Please install mooncake by following the instructions at '


When passing --migration-backend Mooncake, it's better to raise an import error immediately if Mooncake is not installed during API server launch.
Can we put it in the constructor of MooncakeBackend?
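One way to fail fast is to check for the module when the backend object is constructed, before any request arrives. This is a minimal sketch of that idea, not the code merged in this PR: the class name and helper are hypothetical, and the install hint reuses the pip command from the build section.

```python
import importlib.util


def require_module(module_name: str, install_hint: str) -> None:
    """Raise ImportError immediately if `module_name` cannot be found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # find_spec raises if a parent package is missing or the name is invalid.
        spec = None
    if spec is None:
        raise ImportError(f"{module_name} is not installed. {install_hint}")


class EagerBackendSketch:
    """Hypothetical backend whose constructor validates its dependency."""

    def __init__(self, required_module: str = "mooncake.engine") -> None:
        require_module(
            required_module,
            install_hint="Try: pip install mooncake-transfer-engine",
        )
        self.required_module = required_module
```

With this arrangement a missing dependency surfaces when the API server builds the backend rather than on the first migration.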

Risc-lt · 2025-06-14T14:52:18Z

The problems above have been solved. NVLink support will be covered in the next PR. cc @stmatengss @lvhan028 @JimyMa

* feat: init mooncake migration backend
* fix: mooncake endpoint and connect
* fix: add cli option for Mooncake
* feat: modify certralized mode into p2p and add endpoint building
* fix: transfer failure and rm rdma info and
* fix: run pre-commit to lint
* fix: remove unnecessary print log
* chore: lint code
* feat: add async migration mode
* chore: modify log printing and class overide
* chore: rm info in annotations
* fix: migration address key

Risc-lt added 5 commits May 23, 2025 15:54

feat: init mooncake migration backend

d2bb2c8

fix: mooncake endpoint and connect

aff4ab1

fix: add cli option for Mooncake

11e96c6

feat: modify certralized mode into p2p and add endpoint building

7280ad4

fix: transfer failure and rm rdma info and

dba69b0

Risc-lt and others added 5 commits June 8, 2025 22:08

Merge branch 'InternLM:main' into feat/mooncake

a9bfc44

fix: run pre-commit to lint

59a0f3f

fix: remove unnecessary print log

fcbef15

chore: lint code

5ee3b0b

feat: add async migration mode

e57a7cc

lvhan028 added the enhancement New feature or request label Jun 9, 2025

lvhan028 requested review from RunningLeon, grimoire and lvhan028 June 9, 2025 07:05

stmatengss reviewed Jun 10, 2025

Comment thread lmdeploy/pytorch/disagg/backend/mooncake.py Outdated

lvhan028 reviewed Jun 11, 2025

Comment thread lmdeploy/pytorch/disagg/backend/mooncake.py Outdated

lvhan028 reviewed Jun 11, 2025

Comment thread lmdeploy/pytorch/disagg/backend/mooncake.py

lvhan028 reviewed Jun 11, 2025

JimyMa reviewed Jun 12, 2025

Comment thread lmdeploy/pytorch/disagg/backend/mooncake.py Outdated

Risc-lt added 2 commits June 14, 2025 22:21

chore: modify log printing and class overide

88a3321

Merge branch 'main' into feat/mooncake

5dd5c61

stmatengss reviewed Jun 17, 2025

Comment thread lmdeploy/pytorch/disagg/backend/mooncake.py Outdated

Risc-lt added 2 commits June 19, 2025 10:31

chore: rm info in annotations

b5a434c

fix: migration address key

0de36e9

lvhan028 approved these changes Jun 20, 2025

lvhan028 merged commit 6cc314a into InternLM:main Jun 20, 2025
5 checks passed

Risc-lt mentioned this pull request Jul 4, 2025

docs: add support for LMDeploy kvcache-ai/Mooncake#592

Merged

lvhan028 merged 14 commits into InternLM:main from Risc-lt:feat/mooncake

4 participants
