Skip to content

[None][fix] Fix LoRA support for Qwen3 models#12785

Merged
achartier merged 1 commit into
NVIDIA:mainfrom
achartier:qwen3-lora
Apr 8, 2026
Merged

[None][fix] Fix LoRA support for Qwen3 models#12785
achartier merged 1 commit into
NVIDIA:mainfrom
achartier:qwen3-lora

Conversation

@achartier
Copy link
Copy Markdown
Collaborator

@achartier achartier commented Apr 6, 2026

Summary by CodeRabbit

  • Tests

    • Added comprehensive end-to-end test coverage for Qwen3 LoRA functionality across both dense and Mixture-of-Experts model configurations, validating adapter creation, inference execution, and output correctness.
  • Improvements

    • Enhanced internal parameter propagation through model decoder layers for improved extensibility.

Description

  • Qwen3 Dense: pass layer_idx to gatedMLP constructor call
  • Qwen3 MoE: ad **kwargs to decoder layer call
  • Add regression tests for both Qwen3 models

Test Coverage

New tests: tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Aurelien Chartier <[email protected]>
@achartier achartier requested review from a team as code owners April 6, 2026 22:35
@achartier
Copy link
Copy Markdown
Collaborator Author

/bot run --disable-fail-fast

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

📝 Walkthrough

Walkthrough

Updates to Qwen3 model initialization to pass layer index information to GatedMLP components and forward additional keyword arguments through decoder layers. Adds comprehensive LoRA sanity test coverage for both dense and MoE Qwen3 configurations, including adapter creation, inference execution with and without LoRA, and output validation.

Changes

Cohort / File(s) Summary
Qwen3 Model Parameter Passing
tensorrt_llm/_torch/models/modeling_qwen3.py, tensorrt_llm/_torch/models/modeling_qwen3_moe.py
Qwen3DecoderLayer now passes layer_idx to GatedMLP constructor. Qwen3MoEModel.forward forwards **kwargs to decoder layer calls to enable propagation of additional runtime options.
LoRA Sanity Testing
tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py
New test module with helper functions for LoRA adapter creation and validation. Implements end-to-end sanity tests for Qwen3 (0.6B) dense and Qwen3-30B-A3B MoE configurations, comparing inference outputs with and without LoRA adapters applied.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: fixing LoRA support for Qwen3 models, which aligns with the changeset's focus on LoRA-related fixes and tests.
Description check ✅ Passed The PR description provides clear bullet points about the changes (layer_idx parameter, kwargs forwarding, test additions) and identifies the new test file, satisfying core requirements.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py (1)

128-149: Minor: Inner loop break doesn't exit outer loop.

When any_differ is set in the inner logprobs comparison loop (line 146), the code breaks out of the inner loop but continues iterating through remaining (lora_out, base_out) pairs unnecessarily. This is functionally correct but slightly inefficient.

♻️ Optional: Use labeled loop pattern or helper
 def _assert_lora_changes_output(out_lora, out_base):
     """Assert that LoRA produces at least one different output (tokens or logprobs)."""
     any_differ = False
     for lora_out, base_out in zip(out_lora, out_base, strict=True):
+        if any_differ:
+            break
         lora_ids = lora_out.outputs[0].token_ids
         base_ids = base_out.outputs[0].token_ids
         if lora_ids != base_ids:
             any_differ = True
             break

         # Even if tokens match, logprobs should differ
         lp_lora = lora_out.outputs[0].logprobs
         lp_base = base_out.outputs[0].logprobs
         if lp_lora and lp_base:
             for lp_w, lp_wo in zip(lp_lora, lp_base, strict=True):
                 val_w = next(iter(lp_w.values())).logprob
                 val_wo = next(iter(lp_wo.values())).logprob
                 if abs(val_w - val_wo) > 1e-6:
                     any_differ = True
                     break
+            if any_differ:
+                break

     assert any_differ, "LoRA outputs identical to base model (same tokens AND same logprobs)"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py` around
lines 128 - 149, The helper _assert_lora_changes_output currently sets
any_differ inside the inner logprobs loop then breaks only that inner loop,
causing the outer for-loop over (lora_out, base_out) to continue unnecessarily;
to fix, make the function return immediately when a difference is detected (or
set any_differ and break the outer loop explicitly) so we stop processing
further pairs as soon as a differing token or logprob is found—update the inner
loop where val_w vs val_wo triggers any_differ to either call return or break
out of both loops (e.g., by returning from _assert_lora_changes_output) to
short-circuit the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py`:
- Around line 128-149: The helper _assert_lora_changes_output currently sets
any_differ inside the inner logprobs loop then breaks only that inner loop,
causing the outer for-loop over (lora_out, base_out) to continue unnecessarily;
to fix, make the function return immediately when a difference is detected (or
set any_differ and break the outer loop explicitly) so we stop processing
further pairs as soon as a differing token or logprob is found—update the inner
loop where val_w vs val_wo triggers any_differ to either call return or break
out of both loops (e.g., by returning from _assert_lora_changes_output) to
short-circuit the check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94e6104d-c733-44c0-8865-a8ce78fa615c

📥 Commits

Reviewing files that changed from the base of the PR and between 2b80f8d and e92e1eb.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/models/modeling_qwen3.py
  • tensorrt_llm/_torch/models/modeling_qwen3_moe.py
  • tests/unittest/_torch/modules/tests_lora_modules/test_qwen3_sanity.py

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #42005 [ run ] triggered by Bot. Commit: e92e1eb Link to invocation

@achartier
Copy link
Copy Markdown
Collaborator Author

@CodeRabbit review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #42005 [ run ] completed with state SUCCESS. Commit: e92e1eb
/LLM/main/L0_MergeRequest_PR pipeline #32852 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

CI Report

Link to invocation

@achartier achartier requested a review from byshiue April 7, 2026 15:20
@achartier achartier merged commit ae86b91 into NVIDIA:main Apr 8, 2026
8 checks passed
@achartier achartier deleted the qwen3-lora branch April 8, 2026 04:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants