Skip to content

feat(engine-runtime): support sgl-router pd disaggregation with engine runtime#82

Merged
cheyang merged 8 commits intosgl-project:mainfrom
TrafalgarZZZ:feat/sgl_router_topo
Nov 4, 2025
Merged

feat(engine-runtime): support sgl-router pd disaggregation with engine runtime#82
cheyang merged 8 commits intosgl-project:mainfrom
TrafalgarZZZ:feat/sgl_router_topo

Conversation

@TrafalgarZZZ
Copy link
Copy Markdown
Collaborator

Ⅰ. Motivation

Ⅱ. Modifications

Ⅲ. Does this pull request fix one issue?

NONE

Ⅳ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Ⅴ. Describe how to verify it

VI. Special notes for reviews

Checklist

  • Format your code make fmt.
  • Add unit tests or integration tests.
  • Update the documentation related to the change.

@codecov-commenter
Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@RongGu RongGu requested a review from Copilot November 3, 2025 10:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements worker registration functionality for SGLang topology clients. The changes add support for worker health checks and automatic registration/unregistration with an SGLang router.

  • Added worker endpoint discovery logic using environment variables and Kubernetes pod information
  • Implemented health check, registration, and unregistration methods in the SGLang topology client
  • Integrated the topology client with the main application with signal handlers for graceful shutdown

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
python/patio/topo/factory.py Updated create_topo_client to accept worker_info parameter
python/patio/topo/client/sgl_topo_client.py Implemented health check, registration, and unregistration logic with endpoint discovery functions
python/patio/topo/client/base_topo_client.py Added abstract wait_engine_ready method to the base client interface
python/patio/requirements.txt Added new dependencies: httpx, requests, pytest, pyyaml
python/patio/app.py Integrated topology client with signal handlers and command-line argument support
python/patio/Dockerfile Removed unnecessary vim and curl installation commands
Comments suppressed due to low confidence (5)

python/patio/app.py:136

  • Variable topo_client is not used.
        topo_client = run_topo_client(args.instance_info)

python/patio/app.py:138

  • Variable topo_client is not used.
        topo_client = None

python/patio/topo/client/sgl_topo_client.py:3

  • Import of 'json' is not used.
import json

python/patio/topo/client/sgl_topo_client.py:8

  • Import of 'ArgumentError' is not used.
from argparse import ArgumentError

python/patio/topo/client/sgl_topo_client.py:9

  • Import of 'InvalidURL' is not used.
from http.client import InvalidURL

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/topo/client/sgl_topo_client.py

def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool:
"""
worker_dict example:
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation refers to 'worker_dict' but the parameter is named 'worker_info'. Update the documentation to match the parameter name.

Suggested change
worker_dict example:
worker_info example:

Copilot uses AI. Check for mistakes.
Comment thread python/patio/app.py
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/app.py Outdated
Comment on lines +116 to +118
# self.worker_endpoint = worker_info_payload["url"] if "url" in worker_info_payload else None
# if self.worker_endpoint is None:
# raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.")
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# self.worker_endpoint = worker_info_payload["url"] if "url" in worker_info_payload else None
# if self.worker_endpoint is None:
# raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.")

Copilot uses AI. Check for mistakes.
Comment thread python/patio/app.py
topo_client.unregister()
sys.exit(0)

def run_topo_client(worker_instance_info: str) -> GroupTopoClient:
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.

Copilot uses AI. Check for mistakes.
@RongGu RongGu requested a review from Copilot November 3, 2025 11:30
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Comments suppressed due to low confidence (1)

python/patio/topo/client/sgl_topo_client.py:117

  • This comment appears to contain commented-out code.
        # if self.worker_endpoint is None:
        #     raise Exception(f"failed to find worker's info, please ensure 'url' is set in worker_dict.")

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

traceback.print_exc()
return False

def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool:
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 'url' parameter is not used in the method implementation. Instead, the method constructs its own URL from self.worker_endpoint (line 110) and ignores the passed 'url' parameter. Either remove this parameter from the method signature or use it as intended.

Suggested change
def register(self, url: str, worker_info: dict, file_path: Optional[str] = None) -> bool:
def register(self, worker_info: dict, file_path: Optional[str] = None) -> bool:

Copilot uses AI. Check for mistakes.
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/topo/client/sgl_topo_client.py Outdated
Comment thread python/patio/app.py Outdated
Comment thread python/patio/app.py Outdated
prometheus-client>=0.11.0
aiohttp>=3.8.0 No newline at end of file
aiohttp>=3.8.0
httpx>=0.23.0
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The httpx library is added as a dependency but is not imported or used anywhere in the codebase. Consider removing it if it's not needed, as only requests is used for HTTP calls.

Suggested change
httpx>=0.23.0

Copilot uses AI. Check for mistakes.
Comment thread python/patio/app.py
topo_client.unregister()
sys.exit(0)

def run_topo_client(worker_instance_info: str) -> GroupTopoClient:
Copy link

Copilot AI Nov 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mixing implicit and explicit returns may indicate an error, as implicit returns always return None.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@cheyang cheyang merged commit 788c2f7 into sgl-project:main Nov 4, 2025
3 checks passed
@cheyang cheyang added this to the v0.5.0 milestone Nov 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants