[None][fix] Handle unset attention_dp_relax in ADP routers by peihu-nv · Pull Request #14276 · NVIDIA/TensorRT-LLM

peihu-nv · 2026-05-18T21:26:51Z

Summary by CodeRabbit

Bug Fixes
- Fixed request scheduling to correctly treat unset parameters as permitting relaxed routing mode, ensuring proper request distribution under capacity constraints.
Tests
- Added unit tests verifying request routing behavior with different parameter configurations.

Description

Fix an ADP router crash when a request has SchedulingParams, but attention_dp_relax is unset and causing NoneType comparison. Hit this issue when doing perfwork for DS V4 using inferenceX benchmarking script but both main and DS V4 branch will have this issue.

SchedulingParams.attention_dp_relax is typed as Optional[bool] and defaults to None.

The ADP routers were using this field directly as the sort key:

return scheduling_params.attention_dp_relax

This was previously mostly hidden for OpenAI serving because requests often had no py_scheduling_params, so the router took the safe path:

  if scheduling_params is None:
      return True

After a31d650 / #11173, OpenAI chat and harmony paths create SchedulingParams(agent_hierarchy=...) and pass it into generation. That means scheduling_params is present, but attention_dp_relax remains None. With multiple such requests, Python sorting can compare None values and fail with:
TypeError: '<' not supported between instances of 'NoneType' and 'NoneType'

Fix:
Treat only an explicit False as strict and keep None aligned with the existing missing-scheduling-params behavior. Apply the same logic to both DefaultADPRouter and KVCacheAwareADPRouter, with regression coverage for both paths.

Test Coverage

Added regression coverage for both ADP router implementations where one request has attention_dp_relax=None and another has attention_dp_relax=False.

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

OpenAI requests can carry a SchedulingParams object while leaving attention_dp_relax unset. The ADP routers used that field directly as the sort key, which makes mixed None and False requests fail when Python tries to compare None with bool values. Treat only an explicit False as strict and keep None aligned with the existing missing-scheduling-params behavior. Apply the same logic to both DefaultADPRouter and KVCacheAwareADPRouter, with regression coverage for both paths. Signed-off-by: peihengh <[email protected]>

peihu-nv · 2026-05-18T21:28:36Z

/bot run

coderabbitai · 2026-05-18T21:29:43Z

📝 Walkthrough

Walkthrough

The PR normalizes the attention_dp_relax scheduling parameter to boolean across two router scheduler classes. Both DefaultADPRouter and KVCacheAwareADPRouter now treat None and any non-False value as "relaxed" for request ordering. New unit tests verify this behavior under capacity constraints.

Changes

Attention DP Relax Boolean Normalization

Layer / File(s)	Summary
Relax-value boolean normalization in router schedulers `tensorrt_llm/_torch/pyexecutor/scheduler/adp_router.py`	`DefaultADPRouter` and `KVCacheAwareADPRouter` both update their `get_relax_value` methods to return `attention_dp_relax is not False` instead of the raw value, normalizing the relax flag to boolean.
Test coverage for None relax-value behavior `tests/unittest/_torch/executor/test_adp_router.py`	New test cases verify that `attention_dp_relax=None` is treated as relaxed (non-strict) when scheduling under capacity constraints, for both `DefaultADPRouter` and `KVCacheAwareADPRouter`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely identifies the main change: handling unset (None) attention_dp_relax in ADP routers, which is the core fix in this PR.
Description check	✅ Passed	The PR description is comprehensive and complete. It explains the issue, root cause, solution, and test coverage. All required template sections are filled out and the checklist is confirmed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/unittest/_torch/executor/test_adp_router.py (1)
206-221: QA list updates are unnecessary for this PR.

This change is unit-scope regression coverage plus router logic; no integration/perf QA test-list entry updates are needed.

As per coding guidelines, “If the PR only touches unittest/ or narrow unit scope, say explicitly whether QA list updates are unnecessary or optional.”

Also applies to: 945-956
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unittest/_torch/executor/test_adp_router.py` around lines 206 - 221,
Add an explicit note to the PR description and commit message stating that QA
list updates are unnecessary because this change only touches unit tests (e.g.,
the test_none_attention_dp_relax_is_relaxed test) and router logic in
DefaultADPRouter/RankState/_make_request_item/route_requests; also apply the
same explicit note to other similar test-only changes in this PR so reviewers
know QA list edits are not required.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/unittest/_torch/executor/test_adp_router.py`:
- Around line 206-221: Add an explicit note to the PR description and commit
message stating that QA list updates are unnecessary because this change only
touches unit tests (e.g., the test_none_attention_dp_relax_is_relaxed test) and
router logic in DefaultADPRouter/RankState/_make_request_item/route_requests;
also apply the same explicit note to other similar test-only changes in this PR
so reviewers know QA list edits are not required.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7634e2e2-905e-4070-ad26-e1a91c0b4aa7

📥 Commits

Reviewing files that changed from the base of the PR and between 355ba94 and 4fca6c1.

📒 Files selected for processing (2)

tensorrt_llm/_torch/pyexecutor/scheduler/adp_router.py
tests/unittest/_torch/executor/test_adp_router.py

tensorrt-cicd · 2026-05-18T21:34:05Z

PR_Github #48982 [ run ] triggered by Bot. Commit: 4fca6c1 Link to invocation

tensorrt-cicd · 2026-05-19T09:48:11Z

PR_Github #48982 [ run ] completed with state SUCCESS. Commit: 4fca6c1
/LLM/main/L0_MergeRequest_PR pipeline #38727 completed with status: 'SUCCESS'

CI Report

Link to invocation

Make the relaxed ADP routing default explicit on SchedulingParams instead of encoding it in router sort-key logic. This keeps requests that only carry agent hierarchy scheduling params aligned with the existing missing-scheduling-params behavior. Revert the routers to direct attention_dp_relax field reads and update the regression tests to cover the explicit SchedulingParams default for both DefaultADPRouter and KVCacheAwareADPRouter. Signed-off-by: peihengh <[email protected]>

peihu-nv · 2026-05-19T17:37:28Z

/bot run

tensorrt-cicd · 2026-05-19T17:43:19Z

PR_Github #49242 [ run ] triggered by Bot. Commit: 27a14c4 Link to invocation

tensorrt-cicd · 2026-05-19T21:20:25Z

PR_Github #49242 [ run ] completed with state SUCCESS. Commit: 27a14c4
/LLM/main/L0_MergeRequest_PR pipeline #38913 completed with status: 'SUCCESS'

CI Report

Link to invocation

… ADP routers (#14431) Signed-off-by: peihengh <[email protected]> Signed-off-by: Lance Liao <[email protected]> Co-authored-by: peihengh <[email protected]>

) Signed-off-by: peihengh <[email protected]>

peihu-nv requested a review from a team as a code owner May 18, 2026 21:26

peihu-nv requested a review from joyang-nv May 18, 2026 21:26

github-actions Bot assigned peihu-nv May 18, 2026

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

pcastonguay requested review from Tabrizian and mikeiovine May 19, 2026 16:26

mikeiovine approved these changes May 19, 2026

View reviewed changes

Comment thread tensorrt_llm/_torch/pyexecutor/scheduler/adp_router.py Outdated

Tabrizian approved these changes May 19, 2026

View reviewed changes

peihu-nv merged commit 3de344d into NVIDIA:main May 20, 2026
7 checks passed

This was referenced May 20, 2026

[None][fix] Handle None attention_dp_relax in ADP router sort key #14343

Closed

[None][feat] Enable agent serving evaluation via trace-replay on Scaffolding #14397

Open

lancelly mentioned this pull request May 22, 2026

[None][fix] Cherry-pick PR #14276: Handle unset attention_dp_relax in ADP routers #14431

Merged

xxi-nv pushed a commit to xxi-nv/TensorRT-LLM that referenced this pull request May 22, 2026

[None][fix] Handle unset attention_dp_relax in ADP routers (NVIDIA#14276

44911a7

) Signed-off-by: peihengh <[email protected]>

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026

[None][fix] Handle unset attention_dp_relax in ADP routers (NVIDIA#14276

50da2ee

) Signed-off-by: peihengh <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[None][fix] Handle unset attention_dp_relax in ADP routers#14276

[None][fix] Handle unset attention_dp_relax in ADP routers#14276
peihu-nv merged 2 commits into
NVIDIA:mainfrom
peihu-nv:fix/adp-relax-none

peihu-nv commented May 18, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

peihu-nv commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

Uh oh!

peihu-nv commented May 19, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

peihu-nv commented May 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

peihu-nv commented May 18, 2026

Uh oh!

coderabbitai Bot commented May 18, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

tensorrt-cicd commented May 18, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

Uh oh!

peihu-nv commented May 19, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

tensorrt-cicd commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

peihu-nv commented May 18, 2026 •

edited by coderabbitai Bot

Loading