Support Mooncake migration backend for PD disaggregation #3620
@Risc-lt You can use tools like ruff or yapf to automatically fix code formatting and linting issues.
The linting issue can be resolved as follows: make sure that the Python version is 3.10.
"""Initialize p2p connection for this specific link."""
# TODO: Support more types of metadata_server
# e.g. "etcd://192.168.0.137:2379"
metadata_server = 'P2PHANDSHAKE'
What is metadata_server used for?
There are two modes: (1) 'P2PHANDSHAKE' (a magic string): no metadata server for maintaining connection information, which is intended for small-scale PD disaggregation, and (2) support for etcd/redis/http_server as the centralized server for larger-scale PD disaggregation.
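A minimal sketch of how the two modes described above could be told apart. The helper name `parse_metadata_server`, its return shape, and the exact scheme names are assumptions made for illustration; they are not taken from this PR or from Mooncake.

```python
from urllib.parse import urlparse

# Magic string quoted in the diff above.
P2P_HANDSHAKE = 'P2PHANDSHAKE'
# Scheme names are assumed from the reply above (etcd / redis / http server).
CENTRALIZED_SCHEMES = {'etcd', 'redis', 'http'}


def parse_metadata_server(value: str) -> dict:
    """Classify a metadata_server string (hypothetical helper)."""
    if value == P2P_HANDSHAKE:
        # No metadata server: peers exchange connection info directly.
        return {'mode': 'p2p', 'scheme': None, 'address': None}
    parsed = urlparse(value)
    if parsed.scheme not in CENTRALIZED_SCHEMES or not parsed.netloc:
        raise ValueError(f'unsupported metadata_server: {value!r}')
    # A centralized server keeps connection info for larger deployments.
    return {
        'mode': 'centralized',
        'scheme': parsed.scheme,
        'address': parsed.netloc,
    }
```

For example, `parse_metadata_server('etcd://192.168.0.137:2379')` would be classified as centralized mode with address `192.168.0.137:2379`.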
try:
    from mooncake.engine import TransferEngine
except ImportError as e:
    raise ImportError('Please install mooncake by following the instructions at '
When passing --migration-backend Mooncake, it's better to raise an import error immediately if Mooncake is not installed during API server launch.
Can we put it in the constructor of MooncakeBackend?
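One way the suggestion above could look, as a rough sketch: the import check runs in the constructor, so a missing package surfaces as soon as the backend object is created at server launch. `MooncakeBackendSketch` and its `engine_module` parameter are hypothetical stand-ins, not the class in this PR.

```python
import importlib


class MooncakeBackendSketch:
    """Hypothetical stand-in for MooncakeBackend; not the PR's class."""

    def __init__(self, engine_module: str = 'mooncake.engine'):
        # Import eagerly so a missing package fails at construction time
        # rather than on the first migration request.
        try:
            module = importlib.import_module(engine_module)
        except ImportError as e:
            raise ImportError(
                f'{engine_module!r} is required for the Mooncake migration '
                'backend but could not be imported') from e
        # TransferEngine is the class imported in the diff above; it is
        # looked up lazily here so the sketch also runs without Mooncake.
        self.engine_cls = getattr(module, 'TransferEngine', None)
```

The `engine_module` parameter exists only so the failure path can be exercised without Mooncake installed.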
The problems above have been solved. NVLink support will be covered in the next PR. cc @stmatengss @lvhan028 @JimyMa
* feat: init mooncake migration backend
* fix: mooncake endpoint and connect
* fix: add cli option for Mooncake
* feat: modify certralized mode into p2p and add endpoint building
* fix: transfer failure and rm rdma info and
* fix: run pre-commit to lint
* fix: remove unnecessary print log
* chore: lint code
* feat: add async migration mode
* chore: modify log printing and class overide
* chore: rm info in annotations
* fix: migration address key
PD-Disaggregated KVCache Transfer Pipeline with Mooncake
This PR introduces a new implementation of the Prefill-Decode disaggregated KVCache transfer pipeline for LMDeploy, using the native Mooncake transfer engine components as an alternative to dlslime. The goal is to enable disaggregated prefill/decode workloads across nodes for large-scale LLM inference, inspired by lmdeploy-distserve. #3304 (comment)
Architecture Overview
Interfaces
The Mooncake migration backend implementation exposes the interfaces below:
p2p_initialize: Notify the Prefill and Decode engines to initialize a migration backend instance of the Mooncake transfer engine.
register_memory_region: Register a memory region for the connection.
endpoint_info: Return the local memory pool and endpoint configuration info.
p2p_connect: Receive endpoint information from the other side of the connecting nodes.
p2p_migrate: Set up the connection between prefill and decode nodes and transfer the KVCache synchronously in read mode.
Control Plane
The proxy server first sends a FastAPI POST request carrying the endpoint info, which notifies the prefill and decode servers to send their local endpoint info to each other over a TCP socket. After the p2p connection is established, the Mooncake migration backend starts transferring the KVCache over the RDMA link.
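The endpoint exchange described above can be illustrated with a small in-memory sketch. `FakeBackend` borrows the method names from the interface list, but every signature, the JSON line framing, and the `handshake` helper are assumptions for illustration. The proxy's POST request is not modeled, a local socket pair stands in for the TCP link, and the RDMA transfer (`p2p_migrate`) is left out.

```python
import json
import socket


class FakeBackend:
    """In-memory stand-in for a migration backend (signatures assumed)."""

    def __init__(self, name):
        self.name = name
        self.regions = []
        self.peer = None

    def p2p_initialize(self):
        # Reset state before a new link is set up.
        self.regions = []
        self.peer = None

    def register_memory_region(self, addr, length):
        self.regions.append({'addr': addr, 'length': length})

    def endpoint_info(self):
        return {'name': self.name, 'regions': self.regions}

    def p2p_connect(self, remote_info):
        # Store what the other side advertised.
        self.peer = remote_info


def send_info(sock, info):
    # One JSON object per line keeps framing trivial for this sketch.
    sock.sendall(json.dumps(info).encode() + b'\n')


def recv_info(sock):
    buf = b''
    while not buf.endswith(b'\n'):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError('peer closed before sending endpoint info')
        buf += chunk
    return json.loads(buf)


def handshake(prefill, decode):
    """Exchange endpoint info in both directions over a local socket pair."""
    a, b = socket.socketpair()
    with a, b:
        send_info(a, prefill.endpoint_info())
        send_info(b, decode.endpoint_info())
        decode.p2p_connect(recv_info(b))
        prefill.p2p_connect(recv_info(a))
```

After `handshake(prefill, decode)`, each side holds the other's advertised memory regions, which is the point at which a real backend could begin the transfer.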
Workflow
Current Status
Next Steps
How to Build
pip install mooncake-transfer-engine
pip install -v -e .
How to Run
Start Proxy
Start Prefill Engine
Start Decode Engine
Client Side