[https://nvbugs/5991576][fix] fix dummy request crash with PP + ADP + disagg + block reuse by Tabrizian · Pull Request #12403 · NVIDIA/TensorRT-LLM

Tabrizian · 2026-03-20T07:24:06Z

Summary by CodeRabbit

Bug Fixes
- Fixed pipeline-parallel termination coordination to properly skip the termination handler for dummy requests, preventing unnecessary handler invocations during cleanup.
Tests
- Added new integration test configuration and corresponding test function for validating distributed inference scenarios with advanced parallelism configurations and speculative decoding.

Description

When using pipeline parallelism (PP) on the context server in disaggregated serving with attention DP (ADP) and KV cache block reuse enabled, the system crashes with:

RuntimeError: Assertion failed: emplaceDone (kvCacheManager.cpp:2497)

Root cause: The attention DP dummy request (hardcoded request_id=0) is routed through DisaggPPTerminationHandler, which delays sequence removal via a ring protocol across PP ranks. Since the dummy ID is reused every iteration by _pad_attention_dp_dummy_request, the delayed removal causes a collision in the KV cache manager's mSequences map when the dummy is re-added.

Fix: Bypass DisaggPPTerminationHandler for dummy requests since they don't participate in disagg KV cache transfers and must be terminated immediately.

Test Coverage

New integration test: test_disaggregated_deepseek_v3_lite_fp8_ctxtp2ep2pp2_gentp4_one_mtp_block_reuse
- CTX server: TP=2, PP=2, EP=2, ADP, block reuse enabled, MTP
- GEN server: TP=4, EP=4, ADP, block reuse enabled, MTP
- Model: DeepSeek-V3-Lite/fp8
- Requires 8 GPUs (Hopper+)
Verified on EOS (8x H100):
- Without fix: Test fails with emplaceDone assertion (timeout after 5 min)
- With fix: Test passes in ~3 min

PR Checklist

Please check this after reviewing the above items as appropriate for this PR.

Tabrizian · 2026-03-20T07:27:59Z

/bot run --disable-fail-fast

coderabbitai · 2026-03-20T07:28:27Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 34df4c98-7ab4-4ee6-854d-2aea236485aa

📥 Commits

Reviewing files that changed from the base of the PR and between 4197e19 and 2d80cc5.

📒 Files selected for processing (3)

tensorrt_llm/_torch/pyexecutor/py_executor.py
tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2ep2pp2_gentp4_deepseek_v3_lite_one_mtp_block_reuse.yaml
tests/integration/defs/disaggregated/test_disaggregated.py

📝 Walkthrough

Walkthrough

The PR refines disagreggated pipeline parallelism termination logic by conditionally restricting the PP termination handler to non-dummy requests, while adding a new integration test configuration for DeepSeek-V3-Lite with specific parallelism and speculative decoding settings, along with the corresponding test function.

Changes

Cohort / File(s)	Summary
Disaggregated PP Termination Logic `tensorrt_llm/_torch/pyexecutor/py_executor.py`	Modified `_terminate_request` to invoke the PP termination handler only for non-dummy requests, allowing dummy requests to bypass PP-termination coordination.
Integration Test Configuration & Test `tests/integration/defs/disaggregated/test_configs/disagg_config_ctxtp2ep2pp2_gentp4_deepseek_v3_lite_one_mtp_block_reuse.yaml`, `tests/integration/defs/disaggregated/test_disaggregated.py`	Added new YAML configuration for distributed DeepSeek-V3-Lite inference with tensor/pipeline/MoE parallelism and MTP speculative decoding, plus corresponding pytest test function with model symlink creation and environment setup.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately reflects the core fix: preventing dummy request crashes in disaggregated serving with PP+ADP+block reuse.
Description check	✅ Passed	The description provides clear root cause analysis, solution, test coverage with verification results, and follows the required template structure with all necessary sections completed.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

CodeRabbit can enforce grammar and style rules using `languagetool`.

Configure the reviews.tools.languagetool setting to enable/disable rules and categories. Refer to the LanguageTool Community to learn more.

tensorrt-cicd · 2026-03-20T07:34:29Z

PR_Github #39708 [ run ] triggered by Bot. Commit: c16b98b Link to invocation

… disagg + block reuse The attention DP dummy request (hardcoded request_id=0) was being routed through DisaggPPTerminationHandler, which delays sequence removal via a ring protocol across PP ranks. Since the dummy ID is reused every iteration, the delayed removal caused a collision in the KV cache manager's mSequences map, triggering an emplaceDone assertion failure. Fix: bypass DisaggPPTerminationHandler for dummy requests since they don't participate in disagg KV cache transfers. Add regression test with CTX: TP=2, PP=2, EP=2, ADP + block reuse and GEN: TP=4, EP=4, ADP using DeepSeek-V3-Lite with MTP. Signed-off-by: Iman Tabrizian <[email protected]>

Tabrizian · 2026-03-20T07:43:49Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-20T07:50:08Z

PR_Github #39711 [ run ] triggered by Bot. Commit: 497b837 Link to invocation

tensorrt-cicd · 2026-03-20T15:06:40Z

PR_Github #39711 [ run ] completed with state SUCCESS. Commit: 497b837
/LLM/main/L0_MergeRequest_PR pipeline #30907 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Tabrizian · 2026-03-20T15:36:53Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-03-20T15:43:26Z

PR_Github #39746 [ run ] triggered by Bot. Commit: 497b837 Link to invocation

tensorrt-cicd · 2026-03-20T16:53:21Z

PR_Github #39746 [ run ] completed with state SUCCESS. Commit: 497b837
/LLM/main/L0_MergeRequest_PR pipeline #30941 completed with status: 'SUCCESS'

CI Report

Link to invocation

xinhe-nv · 2026-03-21T00:54:03Z

  - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_on]
  - disaggregated/test_disaggregated.py::test_disaggregated_ctxtp2pp2_gentp2pp2[TinyLlama-1.1B-Chat-v1.0]
  - disaggregated/test_disaggregated.py::test_disaggregated_ctxpp4_genpp4[TinyLlama-1.1B-Chat-v1.0]
+  - disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_ctxtp2ep2pp2_gentp4_one_mtp_block_reuse[DeepSeek-V3-Lite-fp8]


please add this case into qa test list, too

… disagg + block reuse (NVIDIA#12403)

Tabrizian requested review from a team as code owners March 20, 2026 07:24

Tabrizian requested a review from HuiGao-NV March 20, 2026 07:24

github-actions Bot assigned Tabrizian Mar 20, 2026

Tabrizian force-pushed the fix/nvbug-5991576-dummy-request-crash branch from 2d80cc5 to 4de6de9 Compare March 20, 2026 07:27

Tabrizian requested review from pcastonguay and reasonsolo March 20, 2026 07:28

Tabrizian force-pushed the fix/nvbug-5991576-dummy-request-crash branch from 4de6de9 to c16b98b Compare March 20, 2026 07:31

Tabrizian force-pushed the fix/nvbug-5991576-dummy-request-crash branch from c16b98b to 497b837 Compare March 20, 2026 07:43

pcastonguay approved these changes Mar 20, 2026

View reviewed changes

Tabrizian requested a review from tburt-nv March 20, 2026 19:30

tburt-nv approved these changes Mar 21, 2026

View reviewed changes

Tabrizian merged commit f31b45b into NVIDIA:main Mar 21, 2026
5 checks passed

xinhe-nv reviewed Mar 21, 2026

View reviewed changes

tanmayv25 mentioned this pull request Mar 30, 2026

chore: Advance to Tensorrt-LLM 1.3.0rc9 ai-dynamo/dynamo#7635

Merged

longcheng-nv pushed a commit to longcheng-nv/TensorRT-LLM that referenced this pull request Mar 31, 2026

[https://nvbugs/5991576][fix] fix dummy request crash with PP + ADP +…

8da7ae8

… disagg + block reuse (NVIDIA#12403)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[https://nvbugs/5991576][fix] fix dummy request crash with PP + ADP + disagg + block reuse#12403

[https://nvbugs/5991576][fix] fix dummy request crash with PP + ADP + disagg + block reuse#12403
Tabrizian merged 1 commit into
NVIDIA:mainfrom
Tabrizian:fix/nvbug-5991576-dummy-request-crash

Tabrizian commented Mar 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

coderabbitai Bot commented Mar 20, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Uh oh!

xinhe-nv Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

Tabrizian commented Mar 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

coderabbitai Bot commented Mar 20, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Tabrizian commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

tensorrt-cicd commented Mar 20, 2026

Uh oh!

Uh oh!

xinhe-nv Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Tabrizian commented Mar 20, 2026 •

edited by coderabbitai Bot

Loading