
[recipe] Add pretrain configs with mock dataset for Qwen3-VL and Qwen3.5-VL#3100

Merged
yaoyu-33 merged 4 commits into main from chcui/add-qwen-vl-pretrain-configs on Apr 2, 2026

Conversation

@cuichenx (Contributor) commented Apr 1, 2026

Summary

  • Restore pretrain recipe configs for Qwen3-VL (8B, 30B-A3B, 235B-A22B) and add new ones for Qwen3.5-VL (9B, 35B-A3B, 122B-A10B, 397B-A17B)
  • All configs use MockVLMConversationProvider for synthetic VLM pre-training data
  • Shared _qwen3_vl_common helper with Qwen3VLCommonKwargs TypedDict for type-safe overrides — existing perf scripts (qwen3_vl_pretrain.py) continue to work without changes
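The shared-helper pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual recipe code: the TypedDict fields, default values, and hf_path string are all placeholder assumptions; only the Qwen3VLCommonKwargs and _qwen3_vl_common names come from the PR.

```python
from typing import TypedDict


class Qwen3VLCommonKwargs(TypedDict, total=False):
    """Type-safe override surface (illustrative fields, not the real ones)."""

    hf_path: str
    mock: bool
    tensor_parallel_size: int


def _qwen3_vl_common(**kwargs) -> dict:
    # Shared pretrain defaults; caller-supplied overrides win.
    defaults = {"mock": True, "tensor_parallel_size": 1}
    return {**defaults, **kwargs}


def qwen3_vl_8b_pretrain_config(**user_kwargs) -> dict:
    # Merge model-specific settings with user overrides, then delegate
    # to the shared helper (the hf_path value here is a placeholder).
    merged = {"hf_path": "Qwen/Qwen3-VL-8B-Instruct", **user_kwargs}
    return _qwen3_vl_common(**merged)
```

Because model-specific settings and user overrides are merged before delegating, existing callers that pass only the old keyword arguments keep working unchanged.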

Test plan

  • ruff check and ruff format --check pass on all modified files
  • All new configs importable from megatron.bridge.recipes.qwen_vl and top-level megatron.bridge.recipes
  • Perf script call pattern (mock=True, precision_config=..., comm_overlap_config=..., moe_flex_dispatcher_backend=...) verified working
  • qwen3_vl_8b_pretrain_config — 3 training iters + eval completed on 4×H100 (TP=4, mock data)
  • qwen35_vl_9b_pretrain_config — 3 training iters + eval completed on 4×H100 (TP=4, mock data)
    • Note: checkpoint save hits a pre-existing dt_bias (differential attention) issue in Megatron Core — unrelated to this PR, also affects Qwen3.5 SFT configs
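The perf-script call pattern verified above can be sketched like this; the factory below is a stand-in stub (the real function returns a full training config), and the argument values are placeholders rather than the actual config objects.

```python
# Stand-in stub for qwen3_vl_30b_a3b_pretrain_config; the real recipe
# function builds a full training config, this just echoes its kwargs.
def qwen3_vl_30b_a3b_pretrain_config(**kwargs) -> dict:
    return {"model": "qwen3_vl_30b_a3b", **kwargs}


# Call shape used by the perf scripts (values here are placeholders).
cfg = qwen3_vl_30b_a3b_pretrain_config(
    mock=True,                               # synthetic VLM data, no dataset on disk
    precision_config="bf16-mixed",           # placeholder precision setting
    comm_overlap_config=None,                # placeholder comm-overlap setting
    moe_flex_dispatcher_backend="alltoall",  # placeholder dispatcher backend
)
```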

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added pretraining configuration support for Qwen3-VL models (8B, 30B, 235B variants).
    • Added pretraining configuration support for Qwen3.5-VL models (9B, 35B, 122B, 397B variants).

…3.5-VL

Add pretrain recipe configs using MockVLMConversationProvider for VLM
pre-training with synthetic data. This restores previously deleted
pretrain configs and extends coverage to Qwen3.5-VL.

Qwen3-VL: 8B, 30B-A3B, 235B-A22B
Qwen3.5-VL: 9B, 35B-A3B, 122B-A10B, 397B-A17B

The configs use a shared _qwen3_vl_common helper with a Qwen3VLCommonKwargs
TypedDict for type-safe overrides. Existing perf scripts that import
qwen3_vl_30b_a3b_pretrain_config / qwen3_vl_235b_a22b_pretrain_config
continue to work without changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
@copy-pr-bot (Bot) commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Rename all VLM pretrain recipe functions from *_pretrain_config to
*_pretrain_mock_config to clearly indicate they use mock datasets.
Update imports in __init__.py, perf scripts, and examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
@coderabbitai (Bot) commented Apr 1, 2026

📝 Walkthrough

The changes extend Qwen3-VL and Qwen3.5-VL recipe modules to support pretraining configurations. A shared configuration helper and TypedDict are introduced in the Qwen3-VL module, with model-specific pretrain factory functions added across both modules and exported via the package's __init__.py.

Changes

  • Qwen3-VL Pretrain Foundation (src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py): Introduced the Qwen3VLCommonKwargs TypedDict and a _qwen3_vl_common() helper providing shared configuration logic for pretraining. Added three pretrain factory functions (qwen3_vl_8b_pretrain_config, qwen3_vl_30b_a3b_pretrain_config, qwen3_vl_235b_a22b_pretrain_config) that merge user kwargs with model-specific settings and delegate to the common helper. Updated the module docstring to reflect pretrain support.
  • Qwen3.5-VL Pretrain Functions (src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py): Added four pretrain factory functions (qwen35_vl_9b_pretrain_config, qwen35_vl_35b_a3b_pretrain_config, qwen35_vl_122b_a10b_pretrain_config, qwen35_vl_397b_a17b_pretrain_config) that leverage the Qwen3-VL common configuration helper. Updated the module docstring to include pretrain in the described scope.
  • Package Exports (src/megatron/bridge/recipes/qwen_vl/__init__.py): Added imports and exports of all seven new pretrain configuration functions to the module's __all__ list, making them publicly accessible.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly and specifically describes the main change: adding pretrain configs with mock dataset support for Qwen3-VL and Qwen3.5-VL models.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.
  • Test Results For Major Changes: ✅ Passed. PR includes comprehensive test results documenting linting validation, import verification, functional testing, and successful training execution on 4×H100 hardware with documented known issues.


@coderabbitai (Bot) left a comment

Actionable comments posted: 1


Inline comments:
In `@src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py`:
- Around line 196-270: The hf_path values in qwen35_vl_9b_pretrain_config,
qwen35_vl_35b_a3b_pretrain_config, qwen35_vl_122b_a10b_pretrain_config, and
qwen35_vl_397b_a17b_pretrain_config point to Qwen3.5 text-only models while the
file/function names imply Qwen3-VL vision-language models; either update each
"hf_path" to the correct Qwen3-VL model identifiers (e.g., replace
"Qwen/Qwen3.5-9B" etc. with the corresponding "Qwen/Qwen3-VL-..." variant)
before calling _qwen3_vl_common, or rename the file/functions (and docstrings)
to reflect they configure Qwen3.5 text models so naming and hf_path are
consistent.
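The mismatch the reviewer flags can be illustrated with a hedged before/after sketch; the functions and model identifiers below are hypothetical stand-ins, not the repository code.

```python
# Before (flagged): a VL-named function pointing at a text-only checkpoint.
def qwen35_vl_9b_pretrain_config_before(**kwargs) -> dict:
    kwargs.setdefault("hf_path", "Qwen/Qwen3.5-9B")  # text-only model id
    return kwargs


# After (one possible fix): point at a matching VL checkpoint so the
# function name and hf_path agree (identifier is hypothetical).
def qwen35_vl_9b_pretrain_config_after(**kwargs) -> dict:
    kwargs.setdefault("hf_path", "Qwen/Qwen3-VL-9B")
    return kwargs
```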

ℹ️ Review info
⚙️ Run configuration: path .coderabbit.yaml | Review profile: CHILL | Plan: Pro
Run ID: 1237aef6-95fc-456b-b1e8-a82c22dd5eeb

📥 Commits

Reviewing files that changed from the base of the PR and between 8bcc229 and 696fd7f.

📒 Files selected for processing (3)
  • src/megatron/bridge/recipes/qwen_vl/__init__.py
  • src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py
  • src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py

Comment thread on src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py (outdated)
@tomlifu tomlifu self-requested a review April 1, 2026 22:43
@yaoyu-33 added the feature (New capabilities, enhancements, or enablement work), area:recipe (Training recipes and launch configs), and needs-review (PR is ready for code review and waiting on a reviewer) labels on Apr 1, 2026
tomlifu previously approved these changes Apr 1, 2026

@tomlifu (Contributor) left a comment:

LGTM.

yaoyu-33
yaoyu-33 previously approved these changes Apr 1, 2026
dingqingy-nv
dingqingy-nv previously approved these changes Apr 1, 2026
@cuichenx (Contributor, Author) commented Apr 1, 2026

/ok to test 920e9db

@cuichenx added the ready-to-merge label (PR is approved, current, and only waiting for CI to pass before merge) and removed the needs-review label (PR is ready for code review and waiting on a reviewer) on Apr 1, 2026
@cuichenx added the r0.4.0 label (Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.) on Apr 1, 2026
Signed-off-by: Chen Cui <[email protected]>
@cuichenx cuichenx dismissed stale reviews from dingqingy-nv, yaoyu-33, and tomlifu via 1082b5b April 2, 2026 05:07
Signed-off-by: Chen Cui <[email protected]>
@cuichenx (Contributor, Author) commented Apr 2, 2026

/ok to test 863644b

@yaoyu-33 yaoyu-33 merged commit 257c12f into main Apr 2, 2026
65 of 70 checks passed
@yaoyu-33 yaoyu-33 deleted the chcui/add-qwen-vl-pretrain-configs branch April 2, 2026 20:18
svcnvidia-nemo-ci pushed a commit that referenced this pull request Apr 2, 2026
…3.5-VL (#3100)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: NeMo Bot <[email protected]>

Labels

  • area:recipe: Training recipes and launch configs
  • feature: New capabilities, enhancements, or enablement work
  • r0.4.0: Auto-cherrypick to release branch. Apply before merge; cherrypick happens after merge.
  • ready-to-merge: PR is approved, current, and only waiting for CI to pass before merge


4 participants