[None][chore] Include layer_idx in MoE backend fallback warnings #13409
/bot run
📝 Walkthrough
The changes propagate layer context through MoE backend selection by adding an optional […]

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
🧹 Nitpick comments (1)
tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)
25-27: Consider adding a small regression test for warning prefix behavior.
A targeted test for `get_moe_cls(...)` fallback paths would lock in the `[layer_idx=N]` formatting when provided and unchanged text when omitted.
Also applies to: 32-32, 44-45, 53-54, 62-63, 77-78
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py`:
- Around line 25-27: Add a small regression test that calls get_moe_cls(...)
through the fallback paths and asserts the emitted warning text includes the
layer prefix "[layer_idx=N]" when a layer_idx is passed and that the warning
text is unchanged (no prefix) when layer_idx is omitted; use pytest.warns or
warnings.catch_warnings to capture the warning message and check the exact
string formatting for both cases to lock in the intended behavior of
get_moe_cls.
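A regression test along the lines suggested above might look like the following. This is a minimal sketch, not the project's test: `get_moe_cls_stub`, its message text, and the logger name are hypothetical stand-ins, because the real `get_moe_cls` signature and warning wording are not shown in this conversation. The PR summary describes the warnings as `logger.warning` calls, so the sketch captures log records with `unittest`'s `assertLogs` rather than `pytest.warns`, which only intercepts `warnings.warn`. It also assumes a standard-library `logging` logger, which may not match the project's own logger.

```python
import logging
import unittest
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("moe_fallback_test_sketch")


def get_moe_cls_stub(layer_idx: Optional[int] = None) -> str:
    """Stand-in for one fallback path; NOT the real get_moe_cls."""
    # Prefix only when the layer index is known; 0 is a valid index.
    prefix = f"[layer_idx={layer_idx}] " if layer_idx is not None else ""
    logger.warning("%sfalling back to CutlassFusedMoE", prefix)
    return "CutlassFusedMoE"


class FallbackWarningPrefixTest(unittest.TestCase):

    def _captured(self, **kwargs) -> str:
        # assertLogs fails the test if no WARNING record is emitted.
        with self.assertLogs(logger, level="WARNING") as captured:
            get_moe_cls_stub(**kwargs)
        return captured.records[0].getMessage()

    def test_prefix_present_when_layer_idx_given(self):
        self.assertEqual(self._captured(layer_idx=7),
                         "[layer_idx=7] falling back to CutlassFusedMoE")

    def test_message_unchanged_when_layer_idx_omitted(self):
        self.assertEqual(self._captured(),
                         "falling back to CutlassFusedMoE")
```

Asserting on the exact string in both cases is what locks in the formatting; a substring check would let an accidental change to the unprefixed text slip through.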
ℹ️ Review info
📒 Files selected for processing (3)
- tensorrt_llm/_torch/models/modeling_qwen3_moe.py
- tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
- tensorrt_llm/_torch/modules/fused_moe/create_moe.py
PR_Github #45335 [ run ] triggered by Bot. Commit:
PR_Github #45335 [ run ] completed with state
c5243a4 to 13e1228
/bot run --disable-fail-fast
PR_Github #45640 [ run ] triggered by Bot. Commit:
PR_Github #45640 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #45916 [ run ] triggered by Bot. Commit:
PR_Github #45916 [ run ] completed with state
d72670c to 2b53dc4
/bot run --disable-fail-fast
PR_Github #48813 [ run ] triggered by Bot. Commit:
PR_Github #48813 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #49063 [ run ] triggered by Bot. Commit:
PR_Github #49063 [ run ] completed with state
2b53dc4 to a0c005e
/bot run --disable-fail-fast
PR_Github #49808 [ run ] triggered by Bot. Commit:
PR_Github #49808 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #50121 [ run ] triggered by Bot. Commit:
PR_Github #50121 [ run ] completed with state
When the requested MoE backend (CuteDSL, DenseGEMM, TRTLLMGen) cannot be used for a given layer's quant_config and falls back to CutlassFusedMoE, the warning did not say which layer caused it. This made it impossible to tell whether the fallback was intentional (e.g. MTP layer excluded from quantization) or a real regression when the same message repeats once per rank/layer.

Thread layer_idx through get_moe_cls and prefix the four fallback warnings with [layer_idx=N] when known.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Zhenhuan Chen <[email protected]>
a0c005e to 43df86e
/bot run --disable-fail-fast
PR_Github #50173 [ run ] triggered by Bot. Commit:
PR_Github #50173 [ run ] completed with state
/bot run --disable-fail-fast
PR_Github #50281 [ run ] triggered by Bot. Commit:
PR_Github #50281 [ run ] completed with state
…DIA#13409) Signed-off-by: Zhenhuan Chen <[email protected]>
Summary
- […] `quant_config` and falls back to `CutlassFusedMoE`, the existing warning does not say which layer triggered it.
- […] `layer_idx` through `get_moe_cls` and prefixes the four fallback `logger.warning` calls with `[layer_idx=N]` when known. No behavior change.

Test plan
- […] `[layer_idx=N]`.
- […] `[layer_idx=N]` prefix.

🤖 Generated with Claude Code
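To illustrate the change summarized above, the prefixing could be sketched roughly as follows. This is an assumption-laden illustration rather than the code in the PR: `layer_prefix`, `warn_fallback`, the message wording, and the logger name are all hypothetical. Only the `[layer_idx=N]` format and the "prefix only when known" rule come from the description.

```python
import logging
from typing import Optional

# Hypothetical logger name; the project may use its own logger object.
logger = logging.getLogger("moe_fallback_sketch")


def layer_prefix(layer_idx: Optional[int]) -> str:
    """Return '[layer_idx=N] ' when the layer index is known, else ''."""
    # Compare against None explicitly so that layer 0 still gets a prefix.
    return f"[layer_idx={layer_idx}] " if layer_idx is not None else ""


def warn_fallback(requested: str, layer_idx: Optional[int] = None) -> str:
    """Hypothetical helper: build, log, and return a fallback warning."""
    # The wording below is illustrative, not the project's actual message.
    message = (f"{layer_prefix(layer_idx)}{requested} cannot be used for "
               "this quant_config; falling back to CutlassFusedMoE.")
    logger.warning(message)
    return message
```

Defaulting `layer_idx` to `None` keeps existing call sites working and leaves their warning text unchanged, which is consistent with the "No behavior change" note in the summary.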