
[recipe, test] feat: add unit tests for qwen3_vl and qwen35_vl recipes #3648

Closed

lonexreb wants to merge 1 commit into NVIDIA-NeMo:main from lonexreb:training/qwen-vl-recipe-tests-3177

Conversation


lonexreb (Contributor) commented on May 4, 2026

Summary

Two recipe modules under src/megatron/bridge/recipes/qwen_vl/ had no unit-test coverage:

| Module | LoC | Recipes | Prior coverage |
| --- | --- | --- | --- |
| qwen3_vl.py | 1144 | 12 (3 pretrain, 3 SFT, 3 PEFT, energon variant, etc.) | L0 launch scripts only |
| qwen35_vl.py | 622 | 22 (4 pretrain, 5 dense + 3 MoE SFT, 5 dense + 3 MoE PEFT, FSDP variant) | L0 launch scripts only |

L0 functional scripts are slow and GPU-bound — they don't catch the kind of regression that issue #3177 was filed about ("Kimi K2 had a recipe but no recipe test, so we missed an MCore API breaking change").

This PR adds 44 parametrized unit tests modelled after test_glm_45v_recipes.py and test_kimi_k2.py. Each recipe is built with a stubbed AutoBridge (no HF Hub I/O) and a no-op apply_flex_dispatcher_backend (no torch.cuda dependency), then asserted against the documented contract.
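For context, a minimal sketch of the resulting test shape. The builder name `qwen3_vl_8b_sft_config` and the config field names are assumptions drawn from the tables below, not verified exports:

```python
# Sketch only: builder name and config fields are assumed, not verified.
from megatron.bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_sft_config


def test_sft_8b_default_parallelism():
    # Runs against the stubbed AutoBridge, so no HF Hub I/O happens here.
    cfg = qwen3_vl_8b_sft_config()
    assert cfg.model.tensor_model_parallel_size == 2  # TP=2 per the table below
    assert cfg.model.pipeline_model_parallel_size == 1  # PP=1
    assert cfg.peft is None  # plain SFT attaches no PEFT config
```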

Refs #3177. Tests-only — no production code changes.

What's covered

tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py — 16 tests

| Test | Coverage |
| --- | --- |
| `test_pretrain_recipe_builds[…]` (×3) | Each pretrain recipe builds; tokenizer, parallelism, freeze defaults, mock VLM dataset |
| `test_sft_recipe_builds[…]` (×3) | Each SFT recipe builds; freeze flags off, no PEFT, seq_length harmonized |
| `test_peft_recipe_builds_with_default_lora[…]` (×3) | Default lora scheme attaches a LoRA config |
| `test_peft_recipe_with_dora[…]` (×3) | `peft_scheme="dora"` attaches a DoRA instance |
| `test_pretrain_30b_a3b_uses_recommended_moe_parallelism` | TP=4, PP=2, EP=4, SP on |
| `test_pretrain_235b_a22b_uses_recommended_parallelism` | TP=4, PP=16, EP=8, CP=2, SP on |
| `test_pretrain_overrides_apply` | User overrides flow through `_qwen3_vl_common` (TP, PP, SP, seq_length) |
| `test_pretrain_non_mock_dataset_raises` | `mock=False` raises (intentionally not yet supported) |
| `test_sft_8b_default_parallelism` | TP=2, PP=1 |
| `test_sft_30b_a3b_uses_ep_for_moe` | EP=8, alltoall dispatcher |
| `test_sft_uses_hf_dataset_provider` | SFT inherits HFDatasetConversationProvider |
| `test_peft_energon_config_builds_with_mocked_processor` | Energon variant builds when AutoTokenizer + Qwen3VLProcessor are mocked; dataset is an EnergonProvider |
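The ×3 rows above share one parametrization pattern; a hedged sketch, assuming hypothetical builder names for the three model sizes:

```python
import pytest

import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl


@pytest.mark.parametrize(
    "builder_name",
    [
        # Hypothetical exported names; the real list lives in qwen3_vl.py.
        "qwen3_vl_8b_peft_config",
        "qwen3_vl_30b_a3b_peft_config",
        "qwen3_vl_235b_a22b_peft_config",
    ],
)
def test_peft_recipe_with_dora(builder_name):
    cfg = getattr(qwen3_vl, builder_name)(peft_scheme="dora")
    # The contract: passing peft_scheme="dora" attaches a DoRA instance.
    assert type(cfg.peft).__name__ == "DoRA"
```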

tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py — 28 tests

| Test | Coverage |
| --- | --- |
| `test_pretrain_recipe_builds[…]` (×4) | Each of 9B / 35B-A3B / 122B-A10B / 397B-A17B pretrain recipes builds; mock VLM dataset, freeze defaults |
| `test_sft_recipe_builds[…]` (×8) | Every SFT recipe (5 dense + 3 MoE) builds; freeze flags off, no PEFT, MTP defaults wired (mtp_num_layers=1, mtp_loss_scaling_factor=0.1) |
| `test_dense_sft_does_not_set_ep[…]` (×5) | Dense SFT recipes never invoke `_qwen35_vl_apply_moe`; the EP attr stays unset on the provider |
| `test_moe_sft_enables_ep_and_alltoall[…]` (×3) | MoE SFT recipes set EP > 1, sequence parallel, alltoall dispatcher, grouped GEMM |
| `test_large_moe_sft_enables_full_recompute[…]` (×2) | 122B-A10B and 397B-A17B SFT enable `recompute_granularity="full"`, uniform, 1 layer |
| `test_35b_a3b_fsdp_sft_specifics` | FSDP override wiring (use_megatron_fsdp, fsdp_double_buffer, nccl_ub=False, overlap_grad_reduce, overlap_param_gather, single optimizer instance, SP off because TP=1, EP=2 still set) |
| `test_peft_recipe_builds_with_default_lora[…]` (×8) | Default lora scheme attaches a LoRA config across all dense + MoE PEFT recipes |
| `test_peft_recipe_with_dora[…]` (×8) | `peft_scheme="dora"` attaches DoRA across all PEFT recipes |
| `test_moe_peft_enables_ep[…]` (×3) | MoE PEFT recipes wire EP and alltoall just like their SFT siblings |
| `test_sft_27b_uses_pp_for_dense_27b` | 27B dense SFT spans 2 nodes via TP=4, PP=4 |
| `test_sft_122b_a10b_specifics` | TP=2, PP=6, EP=8, full recompute |
| `test_sft_397b_a17b_specifics` | TP=2, PP=4, EP=32, full recompute |
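The MoE rows reduce to one shared assertion helper; a sketch assuming Megatron-Core-style field names on the model provider (`expert_model_parallel_size`, `moe_token_dispatcher_type`, `moe_grouped_gemm`), which this PR does not confirm:

```python
def assert_moe_sft_wiring(cfg, expected_ep: int) -> None:
    """Shared checks for the MoE SFT rows above (field names assumed)."""
    model = cfg.model
    assert model.expert_model_parallel_size == expected_ep  # EP > 1
    assert model.sequence_parallel is True  # SP enabled alongside EP
    assert model.moe_token_dispatcher_type == "alltoall"
    assert model.moe_grouped_gemm is True
```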

Implementation notes

  • AutoBridge stub: _FakeAutoBridge.from_hf_pretrained() returns a permissive provider that absorbs arbitrary setattr calls. Only the attrs the recipes read (vocab_size, apply_rope_fusion) get explicit defaults. A sketch follows this list.
  • No num_moe_experts on the fake provider: even without the patch on apply_flex_dispatcher_backend, the function would short-circuit on its own first guard before touching torch.cuda.get_device_properties(0). The fixture additionally patches the helper to a no-op so the tests stay independent of that internal early-return.
  • Two-module patch: qwen35_vl pretrain delegates into _qwen3_vl_common, which references the qwen3_vl module's AutoBridge symbol. The qwen35_vl fixture patches both modules so the delegation path stays HF-free regardless of which symbol the recipe touches.
  • Energon variant: qwen3_vl_8b_peft_energon_config calls AutoTokenizer.from_pretrained and Qwen3VLProcessor.from_pretrained directly inside _make_energon_dataset. The dedicated test mocks both at the recipe-module level and asserts the resulting dataset is an EnergonProvider.
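A condensed sketch of the fixture described in the first three notes. Module paths follow src/megatron/bridge/recipes/qwen_vl/; the attribute defaults and exact patched symbols are assumptions:

```python
import pytest


class _FakeProvider:
    # Only the attrs the recipes read get explicit defaults; a plain Python
    # object already absorbs arbitrary setattr calls from the recipes.
    vocab_size = 152_064  # hypothetical value
    apply_rope_fusion = False
    # Deliberately no num_moe_experts: apply_flex_dispatcher_backend's first
    # guard would then short-circuit before touching torch.cuda.


class _FakeAutoBridge:
    @staticmethod
    def from_hf_pretrained(*args, **kwargs):
        return _FakeProvider()  # permissive provider, no HF Hub I/O


@pytest.fixture(autouse=True)
def _stub_bridges(monkeypatch):
    import megatron.bridge.recipes.qwen_vl.qwen35_vl as qwen35_vl
    import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl

    # qwen35_vl pretrain delegates into _qwen3_vl_common, which resolves
    # AutoBridge from the qwen3_vl module, so both symbols get patched.
    monkeypatch.setattr(qwen3_vl, "AutoBridge", _FakeAutoBridge)
    monkeypatch.setattr(qwen35_vl, "AutoBridge", _FakeAutoBridge)
    # No-op the flex-dispatcher helper so tests never touch torch.cuda.
    monkeypatch.setattr(qwen3_vl, "apply_flex_dispatcher_backend", lambda *a, **k: None)
```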

Test plan

  • `python3 -m ast` parse of the new files (reproduced in the sketch after this list)
  • `ruff check` clean
  • `ruff format --check` clean
  • CI: L0 unit tests pick up the new modules automatically (no workflow file changes needed). Both modules live under tests/unit_tests/recipes/qwen_vl/, which is already covered by cicd-unit-tests-core.
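The first check can be reproduced locally with a few lines of Python (paths taken from this PR):

```python
import ast
import pathlib

# Equivalent to the `python3 -m ast` pass over the two new test modules.
for path in sorted(pathlib.Path("tests/unit_tests/recipes/qwen_vl").glob("test_qwen3*_vl.py")):
    ast.parse(path.read_text(), filename=str(path))  # raises SyntaxError on failure
    print(f"parsed OK: {path}")
```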

Risk

Zero — tests only. No production code changes.

Notes for reviewers

  • I considered combining the qwen3_vl and qwen35_vl modules into a single test file but kept them split because (a) qwen35_vl needs to patch two modules' AutoBridge (the inheritance through _qwen3_vl_common) and the autouse fixture is cleanest when scoped to one test class per module, and (b) the recipe families have different attribute-touching surfaces (qwen35_vl sets MTP, qwen3_vl uses the flex-dispatcher helper), so split files keep each test focused.
  • The energon variant test is the only one that needs to mock AutoTokenizer / Qwen3VLProcessor at the recipe-module level; a sketch follows this list. All other recipes go through AutoBridge only.
  • The recipe family also includes qwen25_vl.py — that one already follows the "flat" pattern this issue asks for, but is not yet covered by unit tests either. I'd happily land that in a follow-up if the maintainers want — it's the same pattern.
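For reference, a sketch of that recipe-module-level mocking. The builder name qwen3_vl_8b_peft_energon_config comes from the notes above; the `cfg.dataset` attribute is an assumption:

```python
from unittest import mock

import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl


def test_peft_energon_config_builds_with_mocked_processor():
    with (
        mock.patch.object(qwen3_vl, "AutoTokenizer") as tokenizer_cls,
        mock.patch.object(qwen3_vl, "Qwen3VLProcessor") as processor_cls,
    ):
        tokenizer_cls.from_pretrained.return_value = mock.MagicMock()
        processor_cls.from_pretrained.return_value = mock.MagicMock()
        cfg = qwen3_vl.qwen3_vl_8b_peft_energon_config()
    # The energon variant must wire an EnergonProvider dataset.
    assert type(cfg.dataset).__name__ == "EnergonProvider"
```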

Commit message

Both recipe modules under `src/megatron/bridge/recipes/qwen_vl/`
(qwen3_vl.py — 1144 LoC, 12 exported recipes; qwen35_vl.py — 622 LoC, 22
exported recipes) had no unit-test coverage. Their only safety net was
the L0 functional launch scripts, which are slow, GPU-bound, and don't
catch regressions in recipe construction logic.

Add parametrized unit tests modelled after the existing
`test_glm_45v_recipes.py` and `test_kimi_k2.py`. Each recipe function
is exercised with a stubbed AutoBridge (no HF Hub I/O) and a no-op
`apply_flex_dispatcher_backend` (no torch.cuda dependency), then
asserted against the documented contract: ConfigContainer shape,
parallelism, freeze flags, MoE wiring, recompute, FSDP overrides, PEFT
branching.

`tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py` (16 tests):
  - Parametrized build sanity for all 3 pretrain, 3 SFT, 3 PEFT recipes.
  - PEFT `dora` scheme attaches a DoRA config across all 3 PEFT recipes.
  - Spot checks for 30B-A3B and 235B-A22B parallelism layouts.
  - User overrides flow through `_qwen3_vl_common`.
  - `mock=False` raises (intentionally not yet supported).
  - SFT 8B uses TP=2/PP=1; SFT 30B-A3B sets EP=8 with alltoall.
  - SFT inherits HFDatasetConversationProvider from `_sft_common_vlm`.
  - Energon variant builds when AutoTokenizer + Qwen3VLProcessor are
    mocked; the resulting dataset is an EnergonProvider.

`tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py` (28 tests):
  - Parametrized build sanity for all 4 pretrain, 8 SFT, 8 PEFT recipes
    (mix of dense + MoE).
  - Pretrain delegates into `_qwen3_vl_common`; both modules' AutoBridge
    symbols are patched so the delegation path stays HF-free.
  - SFT recipes harmonize seq_length between model and dataset and
    attach MTP defaults (mtp_num_layers=1, mtp_loss_scaling_factor=0.1).
  - Dense SFT recipes leave EP unset; MoE SFT recipes set EP and
    enable sequence parallel + alltoall + grouped GEMM.
  - The 122B-A10B and 397B-A17B SFT recipes enable full recompute.
  - The Megatron-FSDP 35B-A3B variant overrides DDP (use_megatron_fsdp,
    fsdp_double_buffer, nccl_ub=False, overlap_*, single optimizer
    instance) and disables SP because TP=1.
  - Spot checks for the largest recipes (27B dense, 122B-A10B, 397B-A17B).

No production code changes — tests only.

Refs issue NVIDIA-NeMo#3177.

Signed-off-by: lonexreb <[email protected]>

copy-pr-bot (Bot) commented on May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


lonexreb (Contributor, Author) commented on May 4, 2026

Closing this. I missed the existing recipe tests under tests/unit_tests/recipes/qwen_vl/.

The qwen3_vl and qwen35_vl recipes are already covered by the existing parametrized tests in those files. My PR was a duplicate of work that's been on main since #3100. Apologies for the noise — I should have checked tests/unit_tests/recipes/qwen_vl/ before writing new tests, not just the top-level tests/unit_tests/recipes/ directory.

The other PRs in the same sweep are not duplicates and remain open.
