[recipe] Add pretrain configs with mock dataset for Qwen3-VL and Qwen3.5-VL #3100
Conversation
…3.5-VL

Add pretrain recipe configs using MockVLMConversationProvider for VLM pre-training with synthetic data. This restores previously deleted pretrain configs and extends coverage to Qwen3.5-VL.

- Qwen3-VL: 8B, 30B-A3B, 235B-A22B
- Qwen3.5-VL: 9B, 35B-A3B, 122B-A10B, 397B-A17B

The configs use a shared `_qwen3_vl_common` helper with a `Qwen3VLCommonKwargs` TypedDict for type-safe overrides (see the sketch after these commit notes). Existing perf scripts that import `qwen3_vl_30b_a3b_pretrain_config` / `qwen3_vl_235b_a22b_pretrain_config` continue to work without changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
Rename all VLM pretrain recipe functions from `*_pretrain_config` to `*_pretrain_mock_config` to clearly indicate they use mock datasets. Updated imports in `__init__.py`, perf scripts, and examples.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: Chen Cui <[email protected]>
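For context, here is a minimal sketch of the shared-helper pattern these commits describe: a `Qwen3VLCommonKwargs` TypedDict plus per-model factories that pin `hf_path` and forward type-safe overrides. Only `_qwen3_vl_common`, `Qwen3VLCommonKwargs`, and the factory-naming scheme come from the PR; the field names, defaults, checkpoint id, and dict return type below are illustrative assumptions, not the repo's actual signatures.

```python
# Sketch only: field names, defaults, and the dict return type are assumptions.
# typing.Unpack is available from Python 3.11; on older versions import it
# from typing_extensions.
from typing import TypedDict, Unpack


class Qwen3VLCommonKwargs(TypedDict, total=False):
    """Type-safe overrides shared by all Qwen3-VL pretrain factories (illustrative subset)."""

    mock: bool  # use synthetic MockVLMConversationProvider-style data
    seq_length: int
    tensor_model_parallel_size: int


def _qwen3_vl_common(hf_path: str, **kwargs: Unpack[Qwen3VLCommonKwargs]) -> dict:
    """Build one pretrain config; per-model factories pin hf_path and forward overrides."""
    return {
        "hf_path": hf_path,
        "mock": kwargs.get("mock", True),
        "seq_length": kwargs.get("seq_length", 4096),
        "tensor_model_parallel_size": kwargs.get("tensor_model_parallel_size", 1),
    }


def qwen3_vl_8b_pretrain_mock_config(**kwargs: Unpack[Qwen3VLCommonKwargs]) -> dict:
    """Per-model factory: only the checkpoint id differs between variants."""
    # The checkpoint id is illustrative, not necessarily the one used in the PR.
    return _qwen3_vl_common(hf_path="Qwen/Qwen3-VL-8B-Instruct", **kwargs)
```

Because every variant funnels through one helper, an override rejected by the TypedDict is caught by static type checkers across all seven model factories at once.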
📝 Walkthrough

The changes extend the Qwen3-VL and Qwen3.5-VL recipe modules to support pretraining configurations. A shared configuration helper and TypedDict are introduced in the Qwen3-VL module, with model-specific pretrain factory functions added across both modules and exported via the package's `__init__.py`.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py`:
- Around lines 196-270: The hf_path values in qwen35_vl_9b_pretrain_config, qwen35_vl_35b_a3b_pretrain_config, qwen35_vl_122b_a10b_pretrain_config, and qwen35_vl_397b_a17b_pretrain_config point to Qwen3.5 text-only models, while the file/function names imply Qwen3-VL vision-language models. Either update each "hf_path" to the correct Qwen3-VL model identifier (e.g., replace "Qwen/Qwen3.5-9B" etc. with the corresponding "Qwen/Qwen3-VL-..." variant) before calling _qwen3_vl_common, or rename the file/functions (and docstrings) to reflect that they configure Qwen3.5 text models, so naming and hf_path stay consistent. A sketch of the first option follows below.
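A minimal sketch of the reviewer's first option, swapping the text-only checkpoint for a vision-language one before calling `_qwen3_vl_common`. The "Qwen/Qwen3-VL-..." identifier is kept as the comment's own placeholder; the real model id would need to be confirmed on the Hugging Face hub.

```python
# Sketch of the suggested fix: point the factory at a vision-language
# checkpoint. "Qwen/Qwen3-VL-..." is a placeholder, not a real model id.
def qwen35_vl_9b_pretrain_config(**kwargs):
    return _qwen3_vl_common(
        hf_path="Qwen/Qwen3-VL-...",  # was "Qwen/Qwen3.5-9B" (text-only)
        **kwargs,
    )
```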
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1237aef6-95fc-456b-b1e8-a82c22dd5eeb
📒 Files selected for processing (3)
- src/megatron/bridge/recipes/qwen_vl/__init__.py
- src/megatron/bridge/recipes/qwen_vl/qwen35_vl.py
- src/megatron/bridge/recipes/qwen_vl/qwen3_vl.py
/ok to test 920e9db
Signed-off-by: Chen Cui <[email protected]>
1082b5b
Signed-off-by: Chen Cui <[email protected]>
/ok to test 863644b
…3.5-VL (#3100)

Signed-off-by: Chen Cui <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: NeMo Bot <[email protected]>
Summary
- Add pretrain recipe configs using `MockVLMConversationProvider` for synthetic VLM pre-training data
- Shared `_qwen3_vl_common` helper with `Qwen3VLCommonKwargs` TypedDict for type-safe overrides; existing perf scripts (`qwen3_vl_pretrain.py`) continue to work without changes

Test plan
- `ruff check` and `ruff format --check` pass on all modified files
- Imports resolve from `megatron.bridge.recipes.qwen_vl` and top-level `megatron.bridge.recipes`
- Overrides (`mock=True`, `precision_config=...`, `comm_overlap_config=...`, `moe_flex_dispatcher_backend=...`) verified working; a usage sketch follows below
- `qwen3_vl_8b_pretrain_config`: 3 training iters + eval completed on 4×H100 (TP=4, mock data)
- `qwen35_vl_9b_pretrain_config`: 3 training iters + eval completed on 4×H100 (TP=4, mock data)
- Known `dt_bias` (differential attention) issue in Megatron Core: unrelated to this PR, also affects Qwen3.5 SFT configs

🤖 Generated with Claude Code
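A rough usage sketch for the overrides exercised above, using the post-rename factory name from the second commit. Only `mock` is passed: the value objects for the other overrides (`precision_config`, `comm_overlap_config`, `moe_flex_dispatcher_backend`) are not shown on this PR page, so they are omitted rather than invented.

```python
from megatron.bridge.recipes.qwen_vl import qwen3_vl_8b_pretrain_mock_config

# mock=True requests synthetic MockVLMConversationProvider data, matching the
# test-plan runs. The remaining overrides listed above take config objects
# whose types are not shown here, so this sketch leaves them at their defaults.
cfg = qwen3_vl_8b_pretrain_mock_config(mock=True)
```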