[None][chore] Include layer_idx in MoE backend fallback warnings by dc3671 · Pull Request #13409 · NVIDIA/TensorRT-LLM

dc3671 · 2026-04-24T05:41:03Z

Summary

When the requested MoE backend (CuteDSL, DenseGEMM, TRTLLMGen) cannot be used for a given layer's quant_config and falls back to CutlassFusedMoE, the existing warning does not say which layer triggered it.
In mixed-quant checkpoints (e.g. DeepSeek V3.2 FP4 where the MTP layer is excluded from quantization), this produces identical-looking warnings repeated once per rank, making it impossible to tell whether the fallback is an expected per-layer decision or a real regression.
This PR threads layer_idx through get_moe_cls and prefixes the four fallback logger.warning calls with [layer_idx=N] when the layer index is known. No behavior change.
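
As a rough illustration of the change described above, the sketch below shows one way an optional layer index could become a warning prefix. It is not the actual get_moe_cls code: `layer_prefix`, `select_backend_sketch`, the `supported` flag, and the message wording are all hypothetical, and Python's standard `logging` module stands in for the project's logger.

```python
import logging
from typing import Optional

# Stand-in logger; the real project uses its own logger object (assumption).
logger = logging.getLogger("moe_backend_sketch")


def layer_prefix(layer_idx: Optional[int]) -> str:
    """Return '[layer_idx=N] ' when the index is known, otherwise ''."""
    # Compare against None explicitly so that layer 0 still gets a prefix.
    return f"[layer_idx={layer_idx}] " if layer_idx is not None else ""


def select_backend_sketch(requested: str,
                          supported: bool,
                          layer_idx: Optional[int] = None) -> str:
    """Hypothetical selection: keep the requested backend or fall back."""
    if supported:
        return requested
    # Illustrative message text only; the real warning wording may differ.
    logger.warning(
        "%s%s cannot be used for this quant_config; "
        "falling back to CutlassFusedMoE.",
        layer_prefix(layer_idx), requested)
    return "CutlassFusedMoE"
```

Because `layer_idx` defaults to `None`, callers that do not pass it would keep emitting the unprefixed message, which matches the "when known" wording of the summary.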

Test plan

Run any MoE model where the CuteDSL/DenseGEMM/TRTLLMGen backend is requested but one or more layers fall back (e.g. DS-V3.2 FP4 with MTP + CUTEDSL backend) and confirm the warnings now include [layer_idx=N].
Run a pure homogeneous-quant model and confirm output is unchanged apart from the [layer_idx=N] prefix.
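
One way to check the first step of the test plan is to group fallback warnings by layer after a run. The helper below is a minimal sketch under assumptions: `count_fallbacks_by_layer` is a hypothetical name, and it assumes each fallback warning mentions CutlassFusedMoE and carries the [layer_idx=N] prefix when the index is known. The real log line format may differ.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the prefix format described in this PR: [layer_idx=N]
LAYER_TAG = re.compile(r"\[layer_idx=(\d+)\]")


def count_fallbacks_by_layer(log_lines: Iterable[str]) -> Counter:
    """Count fallback warnings per layer index; None means 'no prefix'."""
    counts = Counter()
    for line in log_lines:
        # Assumption: every fallback warning names the fallback class.
        if "CutlassFusedMoE" not in line:
            continue
        match = LAYER_TAG.search(line)
        key = int(match.group(1)) if match else None
        counts[key] += 1
    return counts
```

If every rank reports the same single layer index, the fallback is probably the expected per-layer decision. Many different indices, or warnings with no prefix, would be worth a closer look.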

🤖 Generated with Claude Code

Summary by CodeRabbit

Improvements
- Enhanced MoE (Mixture of Experts) backend initialization with per-layer context for more precise backend class selection.
- Improved diagnostic and warning messages with layer-specific identifiers during backend resolution, providing clearer feedback in fallback scenarios.

dc3671 · 2026-04-24T05:42:36Z

/bot run

coderabbitai · 2026-04-24T05:44:45Z

📝 Walkthrough

The changes propagate layer context through MoE backend selection by adding an optional layer_idx parameter to the get_moe_cls() function. Call sites in model and configurable MoE components now pass this parameter, enabling layer-aware log messaging during backend selection and fallback operations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MoE Backend Selection: `tensorrt_llm/_torch/modules/fused_moe/create_moe.py` | Added optional `layer_idx` parameter to `get_moe_cls()` function; uses it to prefix backend fallback/warning log messages with layer identifier. |
| MoE Initialization Call Sites: `tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py`, `tensorrt_llm/_torch/models/modeling_qwen3_moe.py` | Updated calls to `get_moe_cls()` to pass `layer_idx` argument from existing context, enabling per-layer backend resolution. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ❓ Inconclusive | The PR description clearly explains the problem (repeated indistinguishable warnings in mixed-quant models), the solution (threading layer_idx through get_moe_cls), and the test plan. However, the required template sections (Test Coverage and PR Checklist) are missing. | Add the missing 'Test Coverage' section with explicit test cases, and complete the 'PR Checklist' by checking relevant items and confirming compliance with coding guidelines and test requirements. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding layer_idx context to MoE backend fallback warnings, which matches the primary modification across all three files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

🧹 Nitpick comments (1)

tensorrt_llm/_torch/modules/fused_moe/create_moe.py (1)
25-27: Consider adding a small regression test for warning prefix behavior.

A targeted test for get_moe_cls(...) fallback paths would lock in the [layer_idx=N] formatting when provided and unchanged text when omitted.

Also applies to: 32-32, 44-45, 53-54, 62-63, 77-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/modules/fused_moe/create_moe.py` around lines 25 - 27,
Add a small regression test that calls get_moe_cls(...) through the fallback
paths and asserts the emitted warning text includes the layer prefix
"[layer_idx=N]" when a layer_idx is passed and that the warning text is
unchanged (no prefix) when layer_idx is omitted; use pytest.warns or
warnings.catch_warnings to capture the warning message and check the exact
string formatting for both cases to lock in the intended behavior of
get_moe_cls.
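
The regression test suggested above might look roughly like the sketch below. It makes several assumptions: `fallback_stub` is a hypothetical stand-in for a get_moe_cls fallback path, the message text is invented, and the warning is emitted through Python's standard `logging` module. Note that `pytest.warns` and `warnings.catch_warnings` capture warnings raised via the `warnings` module, not messages sent through a logger, so this sketch attaches a small capturing handler instead.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("moe_fallback_test_sketch")


class ListHandler(logging.Handler):
    """Collect formatted warning messages in memory."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def fallback_stub(layer_idx: Optional[int] = None) -> str:
    """Hypothetical stand-in for one fallback path of get_moe_cls."""
    prefix = f"[layer_idx={layer_idx}] " if layer_idx is not None else ""
    logger.warning("%sCuteDSL unsupported; falling back to CutlassFusedMoE.",
                   prefix)
    return "CutlassFusedMoE"


def captured_warnings(layer_idx: Optional[int] = None) -> List[str]:
    """Run the stub and return the warning messages it logged."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        fallback_stub(layer_idx)
    finally:
        logger.removeHandler(handler)
    return handler.messages


def test_prefix_present_when_layer_idx_given() -> None:
    assert captured_warnings(5) == [
        "[layer_idx=5] CuteDSL unsupported; falling back to CutlassFusedMoE."
    ]


def test_no_prefix_when_layer_idx_omitted() -> None:
    assert captured_warnings() == [
        "CuteDSL unsupported; falling back to CutlassFusedMoE."
    ]
```

A real test would call get_moe_cls with a configuration that triggers each fallback path and would capture output from whatever logger the project actually uses.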

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cc287ab8-5a51-4e52-b2ee-886ad21cfcf2

📥 Commits

Reviewing files that changed from the base of the PR and between 8caa274 and c5243a4.

📒 Files selected for processing (3)

tensorrt_llm/_torch/models/modeling_qwen3_moe.py
tensorrt_llm/_torch/modules/fused_moe/configurable_moe.py
tensorrt_llm/_torch/modules/fused_moe/create_moe.py

tensorrt-cicd · 2026-04-24T05:50:08Z

PR_Github #45335 [ run ] triggered by Bot. Commit: c5243a4 Link to invocation

tensorrt-cicd · 2026-04-24T11:40:06Z

PR_Github #45335 [ run ] completed with state SUCCESS. Commit: c5243a4
/LLM/main/L0_MergeRequest_PR pipeline #35584 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

dc3671 · 2026-04-27T03:33:39Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-27T03:39:37Z

PR_Github #45640 [ run ] triggered by Bot. Commit: 13e1228 Link to invocation

tensorrt-cicd · 2026-04-27T14:14:29Z

PR_Github #45640 [ run ] completed with state FAILURE. Commit: 13e1228
/LLM/main/L0_MergeRequest_PR pipeline #35853 completed with status: 'FAILURE'

dc3671 · 2026-04-28T09:41:55Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-04-28T09:47:46Z

PR_Github #45916 [ run ] triggered by Bot. Commit: 13e1228 Link to invocation

tensorrt-cicd · 2026-04-29T09:48:32Z

PR_Github #45916 [ run ] completed with state ABORTED. Commit: 13e1228

Link to invocation

dc3671 · 2026-05-18T02:51:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-18T02:57:33Z

PR_Github #48813 [ run ] triggered by Bot. Commit: 2b53dc4 Link to invocation

tensorrt-cicd · 2026-05-18T17:26:58Z

PR_Github #48813 [ run ] completed with state SUCCESS. Commit: 2b53dc4
/LLM/main/L0_MergeRequest_PR pipeline #38575 completed with status: 'FAILURE'

dc3671 · 2026-05-19T03:12:36Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-19T03:18:21Z

PR_Github #49063 [ run ] triggered by Bot. Commit: 2b53dc4 Link to invocation

tensorrt-cicd · 2026-05-19T11:43:41Z

PR_Github #49063 [ run ] completed with state SUCCESS. Commit: 2b53dc4
/LLM/main/L0_MergeRequest_PR pipeline #38793 completed with status: 'FAILURE'

dc3671 · 2026-05-22T01:36:41Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-22T01:46:55Z

PR_Github #49808 [ run ] triggered by Bot. Commit: a0c005e Link to invocation

tensorrt-cicd · 2026-05-22T08:35:23Z

PR_Github #49808 [ run ] completed with state SUCCESS. Commit: a0c005e
/LLM/main/L0_MergeRequest_PR pipeline #39395 completed with status: 'FAILURE'

dc3671 · 2026-05-25T01:37:50Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-25T01:44:01Z

PR_Github #50121 [ run ] triggered by Bot. Commit: a0c005e Link to invocation

tensorrt-cicd · 2026-05-25T05:32:04Z

PR_Github #50121 [ run ] completed with state SUCCESS. Commit: a0c005e
/LLM/main/L0_MergeRequest_PR pipeline #39674 completed with status: 'FAILURE'

When the requested MoE backend (CuteDSL, DenseGEMM, TRTLLMGen) cannot be used for a given layer's quant_config and falls back to CutlassFusedMoE, the warning did not say which layer caused it. This made it impossible to tell whether the fallback was intentional (e.g. MTP layer excluded from quantization) or a real regression when the same message repeats once per rank/layer.

Thread layer_idx through get_moe_cls and prefix the four fallback warnings with [layer_idx=N] when known.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Zhenhuan Chen <[email protected]>

dc3671 · 2026-05-25T07:37:28Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-25T07:43:11Z

PR_Github #50173 [ run ] triggered by Bot. Commit: 43df86e Link to invocation

tensorrt-cicd · 2026-05-25T15:17:03Z

PR_Github #50173 [ run ] completed with state SUCCESS. Commit: 43df86e
/LLM/main/L0_MergeRequest_PR pipeline #39717 completed with status: 'FAILURE'

dc3671 · 2026-05-26T05:20:11Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-26T05:25:41Z

PR_Github #50281 [ run ] triggered by Bot. Commit: 43df86e Link to invocation

tensorrt-cicd · 2026-05-26T06:18:27Z

PR_Github #50281 [ run ] completed with state SUCCESS. Commit: 43df86e
/LLM/main/L0_MergeRequest_PR pipeline #39811 completed with status: 'SUCCESS'

…DIA#13409) Signed-off-by: Zhenhuan Chen <[email protected]>

dc3671 requested review from a team as code owners April 24, 2026 05:41

dc3671 requested review from 2ez4bz, dongjiyingdjy and yuxianq April 24, 2026 05:41

github-actions Bot assigned dc3671 Apr 24, 2026

dc3671 requested a review from xxi-nv April 24, 2026 05:42

coderabbitai Bot reviewed Apr 24, 2026

yuxianq approved these changes Apr 24, 2026

dc3671 force-pushed the zhenhuanc/moe-fallback-warning-layer-idx branch from c5243a4 to 13e1228 Compare April 27, 2026 03:33

dc3671 force-pushed the zhenhuanc/moe-fallback-warning-layer-idx branch 2 times, most recently from d72670c to 2b53dc4 Compare May 18, 2026 02:50

xxi-nv approved these changes May 18, 2026

dc3671 force-pushed the zhenhuanc/moe-fallback-warning-layer-idx branch from 2b53dc4 to a0c005e Compare May 22, 2026 01:36

dc3671 force-pushed the zhenhuanc/moe-fallback-warning-layer-idx branch from a0c005e to 43df86e Compare May 25, 2026 06:10

litaotju approved these changes May 26, 2026

byshiue approved these changes May 26, 2026

dc3671 merged commit 8f052f4 into NVIDIA:main May 26, 2026
7 checks passed

bmarimuthu-nv pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request May 28, 2026

[None][chore] Include layer_idx in MoE backend fallback warnings (NVI…

0afe64e

…DIA#13409) Signed-off-by: Zhenhuan Chen <[email protected]>
