[https://nvbugs/5991576][fix] fix dummy request crash with PP + ADP + disagg + block reuse#12403
Conversation
2d80cc5 to
4de6de9
Compare
|
/bot run --disable-fail-fast |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThe PR refines disagreggated pipeline parallelism termination logic by conditionally restricting the PP termination handler to non-dummy requests, while adding a new integration test configuration for DeepSeek-V3-Lite with specific parallelism and speculative decoding settings, along with the corresponding test function. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can enforce grammar and style rules using `languagetool`.Configure the |
4de6de9 to
c16b98b
Compare
|
PR_Github #39708 [ run ] triggered by Bot. Commit: |
… disagg + block reuse The attention DP dummy request (hardcoded request_id=0) was being routed through DisaggPPTerminationHandler, which delays sequence removal via a ring protocol across PP ranks. Since the dummy ID is reused every iteration, the delayed removal caused a collision in the KV cache manager's mSequences map, triggering an emplaceDone assertion failure. Fix: bypass DisaggPPTerminationHandler for dummy requests since they don't participate in disagg KV cache transfers. Add regression test with CTX: TP=2, PP=2, EP=2, ADP + block reuse and GEN: TP=4, EP=4, ADP using DeepSeek-V3-Lite with MTP. Signed-off-by: Iman Tabrizian <[email protected]>
|
/bot run --disable-fail-fast |
c16b98b to
497b837
Compare
|
PR_Github #39711 [ run ] triggered by Bot. Commit: |
|
PR_Github #39711 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #39746 [ run ] triggered by Bot. Commit: |
|
PR_Github #39746 [ run ] completed with state |
| - accuracy/test_llm_api_pytorch.py::TestQwen3NextInstruct::test_bf16_4gpu[tp4ep4_cudagraph_overlap_adp_on] | ||
| - disaggregated/test_disaggregated.py::test_disaggregated_ctxtp2pp2_gentp2pp2[TinyLlama-1.1B-Chat-v1.0] | ||
| - disaggregated/test_disaggregated.py::test_disaggregated_ctxpp4_genpp4[TinyLlama-1.1B-Chat-v1.0] | ||
| - disaggregated/test_disaggregated.py::test_disaggregated_deepseek_v3_lite_fp8_ctxtp2ep2pp2_gentp4_one_mtp_block_reuse[DeepSeek-V3-Lite-fp8] |
There was a problem hiding this comment.
please add this case into qa test list, too
Summary by CodeRabbit
Bug Fixes
Tests
Description
When using pipeline parallelism (PP) on the context server in disaggregated serving with attention DP (ADP) and KV cache block reuse enabled, the system crashes with:
Root cause: The attention DP dummy request (hardcoded
request_id=0) is routed throughDisaggPPTerminationHandler, which delays sequence removal via a ring protocol across PP ranks. Since the dummy ID is reused every iteration by_pad_attention_dp_dummy_request, the delayed removal causes a collision in the KV cache manager'smSequencesmap when the dummy is re-added.Fix: Bypass
DisaggPPTerminationHandlerfor dummy requests since they don't participate in disagg KV cache transfers and must be terminated immediately.Test Coverage
test_disaggregated_deepseek_v3_lite_fp8_ctxtp2ep2pp2_gentp4_one_mtp_block_reuseemplaceDoneassertion (timeout after 5 min)PR Checklist