[recipe] Add pretrain configs with mock dataset for Qwen3-VL and Qwen3.5-VL #3100
Conversation
…3.5-VL

Add pretrain recipe configs using MockVLMConversationProvider for VLM pre-training with synthetic data. This restores previously deleted pretrain configs and extends coverage to Qwen3.5-VL.

- Qwen3-VL: 8B, 30B-A3B, 235B-A22B
- Qwen3.5-VL: 9B, 35B-A3B, 122B-A10B, 397B-A17B

The configs use a shared `_qwen3_vl_common` helper with a `Qwen3VLCommonKwargs` TypedDict for type-safe overrides (see the sketch after these commit notes). Existing perf scripts that import `qwen3_vl_30b_a3b_pretrain_config` / `qwen3_vl_235b_a22b_pretrain_config` continue to work without changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Rename all VLM pretrain recipe functions from `*_pretrain_config` to `*_pretrain_mock_config` to clearly indicate they use mock datasets. Updated imports in `__init__.py`, perf scripts, and examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
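For context, here is a minimal sketch of the shared-helper pattern these commits describe: a `Qwen3VLCommonKwargs` TypedDict plus per-model factories that pin `hf_path` and forward type-safe overrides. Only `_qwen3_vl_common`, `Qwen3VLCommonKwargs`, and the factory-naming scheme come from the PR; the field names, defaults, checkpoint id, and dict return type below are illustrative assumptions, not the repo's actual signatures.

```python
# Sketch only: field names, defaults, and the dict return type are assumptions.
# typing.Unpack is available from Python 3.11; on older versions import it
# from typing_extensions.
from typing import TypedDict, Unpack


class Qwen3VLCommonKwargs(TypedDict, total=False):
    """Type-safe overrides shared by all Qwen3-VL pretrain factories (illustrative subset)."""

    mock: bool  # use synthetic MockVLMConversationProvider-style data
    seq_length: int
    tensor_model_parallel_size: int


def _qwen3_vl_common(hf_path: str, **kwargs: Unpack[Qwen3VLCommonKwargs]) -> dict:
    """Build one pretrain config; per-model factories pin hf_path and forward overrides."""
    return {
        "hf_path": hf_path,
        "mock": kwargs.get("mock", True),
        "seq_length": kwargs.get("seq_length", 4096),
        "tensor_model_parallel_size": kwargs.get("tensor_model_parallel_size", 1),
    }


def qwen3_vl_8b_pretrain_mock_config(**kwargs: Unpack[Qwen3VLCommonKwargs]) -> dict:
    """Per-model factory: only the checkpoint id differs between variants."""
    # The checkpoint id is illustrative, not necessarily the one used in the PR.
    return _qwen3_vl_common(hf_path="Qwen/Qwen3-VL-8B-Instruct", **kwargs)
```

Because every variant funnels through one helper, an override rejected by the TypedDict is caught by static type checkers across all seven model factories at once.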
📝 Walkthrough

The changes extend the Qwen3-VL and Qwen3.5-VL recipe modules to support pretraining configurations. A shared configuration helper and TypedDict are introduced in the Qwen3-VL module, with model-specific pretrain factory functions added across both modules and exported via the package's `__init__.py`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py`:
- Around lines 196-270: The hf_path values in qwen35_vl_9b_pretrain_config, qwen35_vl_35b_a3b_pretrain_config, qwen35_vl_122b_a10b_pretrain_config, and qwen35_vl_397b_a17b_pretrain_config point to Qwen3.5 text-only models, while the file/function names imply Qwen3-VL vision-language models. Either update each "hf_path" to the correct Qwen3-VL model identifier (e.g., replace "Qwen/Qwen3.5-9B" etc. with the corresponding "Qwen/Qwen3-VL-..." variant) before calling _qwen3_vl_common, or rename the file/functions (and docstrings) to reflect that they configure Qwen3.5 text models, so naming and hf_path stay consistent. A sketch of the first option follows below.
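A minimal sketch of the reviewer's first option, swapping the text-only checkpoint for a vision-language one before calling `_qwen3_vl_common`. The "Qwen/Qwen3-VL-..." identifier is kept as the comment's own placeholder; the real model id would need to be confirmed on the Hugging Face hub.

```python
# Sketch of the suggested fix: point the factory at a vision-language
# checkpoint. "Qwen/Qwen3-VL-..." is a placeholder, not a real model id.
def qwen35_vl_9b_pretrain_config(**kwargs):
    return _qwen3_vl_common(
        hf_path="Qwen/Qwen3-VL-...",  # was "Qwen/Qwen3.5-9B" (text-only)
        **kwargs,
    )
```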
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1237aef6-95fc-456b-b1e8-a82c22dd5eeb
📒 Files selected for processing (3)
- src/megatron/bridge/recipes/qwen_vl/__init__.py
- src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py
- src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py
/ok to test 920e9db
Signed-off-by: Chen Cui <[email protected]>
1082b5b
Signed-off-by: Chen Cui <[email protected]>
/ok to test 863644b
…3.5-VL (#3100)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: NeMo Bot <[email protected]>
Summary
- Add pretrain recipe configs using `MockVLMConversationProvider` for synthetic VLM pre-training data
- Shared `_qwen3_vl_common` helper with `Qwen3VLCommonKwargs` TypedDict for type-safe overrides; existing perf scripts (`qwen3_vl_pretrain.py`) continue to work without changes

Test plan
- `ruff check` and `ruff format --check` pass on all modified files
- Imports resolve from `megatron.bridge.recipes.qwen_vl` and top-level `megatron.bridge.recipes`
- Overrides (`mock=True`, `precision_config=...`, `comm_overlap_config=...`, `moe_flex_dispatcher_backend=...`) verified working; a usage sketch follows below
- `qwen3_vl_8b_pretrain_config`: 3 training iters + eval completed on 4×H100 (TP=4, mock data)
- `qwen35_vl_9b_pretrain_config`: 3 training iters + eval completed on 4×H100 (TP=4, mock data)
- Known `dt_bias` (differential attention) issue in Megatron Core: unrelated to this PR, also affects Qwen3.5 SFT configs

🤖 Generated with Claude Code
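A rough usage sketch for the overrides exercised above, using the post-rename factory name from the second commit. Only `mock` is passed: the value objects for the other overrides (`precision_config`, `comm_overlap_config`, `moe_flex_dispatcher_backend`) are not shown on this PR page, so they are omitted rather than invented.

```python
from megatron.bridge.recipes.qwen_vl import qwen3_vl_8b_pretrain_mock_config

# mock=True requests synthetic MockVLMConversationProvider data, matching the
# test-plan runs. The remaining overrides listed above take config objects
# whose types are not shown here, so this sketch leaves them at their defaults.
cfg = qwen3_vl_8b_pretrain_mock_config(mock=True)
```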