feat(engine-runtime): support sgl-router pd disaggregation with engine runtime#82
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull Request Overview
This PR implements worker registration functionality for SGLang topology clients. The changes add support for worker health checks and automatic registration/unregistration with an SGLang router.
- Added worker endpoint discovery logic using environment variables and Kubernetes pod information
- Implemented health check, registration, and unregistration methods in the SGLang topology client
- Integrated the topology client with the main application with signal handlers for graceful shutdown
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| python/patio/topo/factory.py | Updated create_topo_client to accept worker_info parameter |
| python/patio/topo/client/sgl_topo_client.py | Implemented health check, registration, and unregistration logic with endpoint discovery functions |
| python/patio/topo/client/base_topo_client.py | Added abstract wait_engine_ready method to the base client interface |
| python/patio/requirements.txt | Added new dependencies: httpx, requests, pytest, pyyaml |
| python/patio/app.py | Integrated topology client with signal handlers and command-line argument support |
| python/patio/Dockerfile | Removed unnecessary vim and curl installation commands |
Comments suppressed due to low confidence (5)
python/patio/app.py:136
- Variable topo_client is not used.
topo_client = run_topo_client(args.instance_info)
python/patio/app.py:138
- Variable topo_client is not used.
topo_client = None
python/patio/topo/client/sgl_topo_client.py:3
- Import of 'json' is not used.
import json
python/patio/topo/client/sgl_topo_client.py:8
- Import of 'ArgumentError' is not used.
from argparse import ArgumentError
python/patio/topo/client/sgl_topo_client.py:9
- Import of 'InvalidURL' is not used.
from http.client import InvalidURL
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool: | ||
| """ | ||
| worker_dict example: |
There was a problem hiding this comment.
The documentation refers to 'worker_dict' but the parameter is named 'worker_info'. Update the documentation to match the parameter name.
| worker_dict example: | |
| worker_info example: |
| # self.worker_endpoint = worker_info_payload["url"] if "url" in worker_info_payload else None | ||
| # if self.worker_endpoint is None: | ||
| # raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.") |
There was a problem hiding this comment.
This comment appears to contain commented-out code.
| # self.worker_endpoint = worker_info_payload["url"] if "url" in worker_info_payload else None | |
| # if self.worker_endpoint is None: | |
| # raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.") |
| topo_client.unregister() | ||
| sys.exit(0) | ||
|
|
||
| def run_topo_client(worker_instance_info: str) -> GroupTopoClient: |
There was a problem hiding this comment.
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
There was a problem hiding this comment.
Pull Request Overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
python/patio/topo/client/sgl_topo_client.py:117
- This comment appears to contain commented-out code.
# if self.worker_endpoint is None:
# raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.")
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| traceback.print_exc() | ||
| return False | ||
|
|
||
| def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool: |
There was a problem hiding this comment.
The 'url' parameter is not used in the method implementation. Instead, the method constructs its own URL from self.worker_endpoint (line 110) and ignores the passed 'url' parameter. Either remove this parameter from the method signature or use it as intended.
| def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool: | |
| def register(self, worker_info: dict, file_path: Optional[str] = None) -> bool: |
| prometheus-client>=0.11.0 | ||
| aiohttp>=3.8.0 No newline at end of file | ||
| aiohttp>=3.8.0 | ||
| httpx>=0.23.0 |
There was a problem hiding this comment.
The httpx library is added as a dependency but is not imported or used anywhere in the codebase. Consider removing it if it's not needed, as only requests is used for HTTP calls.
| httpx>=0.23.0 |
| topo_client.unregister() | ||
| sys.exit(0) | ||
|
|
||
| def run_topo_client(worker_instance_info: str) -> GroupTopoClient: |
There was a problem hiding this comment.
Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.
Ⅰ. Motivation
Ⅱ. Modifications
Ⅲ. Does this pull request fix one issue?
NONE
Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.
Ⅴ. Describe how to verify it
VI. Special notes for reviews
Checklist
make fmt.