
[recipe, test] feat: add unit tests for qwen3_vl and qwen35_vl recipes #3648

Closed

lonexreb wants to merge 1 commit into NVIDIA-NeMo:main from lonexreb:training/qwen-vl-recipe-tests-3177

Conversation


lonexreb (Contributor) commented on May 4, 2026

Summary

Two recipe modules under src/megatron/bridge/recipes/qwen_vl/ had no unit-test coverage:

| Module | LoC | Recipes | Prior coverage |
| --- | --- | --- | --- |
| qwen3_vl.py | 1144 | 12 (3 pretrain, 3 SFT, 3 PEFT, energon variant, etc.) | L0 launch scripts only |
| qwen35_vl.py | 622 | 22 (4 pretrain, 5 dense + 3 MoE SFT, 5 dense + 3 MoE PEFT, FSDP variant) | L0 launch scripts only |

L0 functional scripts are slow and GPU-bound — they don't catch the kind of regression that issue #3177 was filed about ("Kimi K2 had a recipe but no recipe test, so we missed an MCore API breaking change").

This PR adds 44 parametrized unit tests modelled after test_glm_45v_recipes.py and test_kimi_k2.py. Each recipe is built with a stubbed AutoBridge (no HF Hub I/O) and a no-op apply_flex_dispatcher_backend (no torch.cuda dependency), then asserted against the documented contract.
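For context, a minimal sketch of the resulting test shape. The builder name `qwen3_vl_8b_sft_config` and the config field names are assumptions drawn from the tables below, not verified exports:

```python
# Sketch only: builder name and config fields are assumed, not verified.
from megatron.bridge.recipes.qwen_vl.qwen3_vl import qwen3_vl_8b_sft_config


def test_sft_8b_default_parallelism():
    # Runs against the stubbed AutoBridge, so no HF Hub I/O happens here.
    cfg = qwen3_vl_8b_sft_config()
    assert cfg.model.tensor_model_parallel_size == 2  # TP=2 per the table below
    assert cfg.model.pipeline_model_parallel_size == 1  # PP=1
    assert cfg.peft is None  # plain SFT attaches no PEFT config
```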

Refs #3177. Tests-only — no production code changes.

What's covered

tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py — 16 tests

| Test | Coverage |
| --- | --- |
| `test_pretrain_recipe_builds[…]` (×3) | Each pretrain recipe builds; tokenizer, parallelism, freeze defaults, mock VLM dataset |
| `test_sft_recipe_builds[…]` (×3) | Each SFT recipe builds; freeze flags off, no PEFT, seq_length harmonized |
| `test_peft_recipe_builds_with_default_lora[…]` (×3) | Default lora scheme attaches a LoRA config |
| `test_peft_recipe_with_dora[…]` (×3) | `peft_scheme="dora"` attaches a DoRA instance |
| `test_pretrain_30b_a3b_uses_recommended_moe_parallelism` | TP=4, PP=2, EP=4, SP on |
| `test_pretrain_235b_a22b_uses_recommended_parallelism` | TP=4, PP=16, EP=8, CP=2, SP on |
| `test_pretrain_overrides_apply` | User overrides flow through `_qwen3_vl_common` (TP, PP, SP, seq_length) |
| `test_pretrain_non_mock_dataset_raises` | `mock=False` raises (intentionally not yet supported) |
| `test_sft_8b_default_parallelism` | TP=2, PP=1 |
| `test_sft_30b_a3b_uses_ep_for_moe` | EP=8, alltoall dispatcher |
| `test_sft_uses_hf_dataset_provider` | SFT inherits HFDatasetConversationProvider |
| `test_peft_energon_config_builds_with_mocked_processor` | Energon variant builds when AutoTokenizer + Qwen3VLProcessor are mocked; dataset is an EnergonProvider |
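The ×3 rows above share one parametrization pattern; a hedged sketch, assuming hypothetical builder names for the three model sizes:

```python
import pytest

import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl


@pytest.mark.parametrize(
    "builder_name",
    [
        # Hypothetical exported names; the real list lives in qwen3_vl.py.
        "qwen3_vl_8b_peft_config",
        "qwen3_vl_30b_a3b_peft_config",
        "qwen3_vl_235b_a22b_peft_config",
    ],
)
def test_peft_recipe_with_dora(builder_name):
    cfg = getattr(qwen3_vl, builder_name)(peft_scheme="dora")
    # The contract: passing peft_scheme="dora" attaches a DoRA instance.
    assert type(cfg.peft).__name__ == "DoRA"
```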

tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py — 28 tests

| Test | Coverage |
| --- | --- |
| `test_pretrain_recipe_builds[…]` (×4) | Each of 9B / 35B-A3B / 122B-A10B / 397B-A17B pretrain recipes builds; mock VLM dataset, freeze defaults |
| `test_sft_recipe_builds[…]` (×8) | Every SFT recipe (5 dense + 3 MoE) builds; freeze flags off, no PEFT, MTP defaults wired (mtp_num_layers=1, mtp_loss_scaling_factor=0.1) |
| `test_dense_sft_does_not_set_ep[…]` (×5) | Dense SFT recipes never invoke `_qwen35_vl_apply_moe`; the EP attr stays unset on the provider |
| `test_moe_sft_enables_ep_and_alltoall[…]` (×3) | MoE SFT recipes set EP > 1, sequence parallel, alltoall dispatcher, grouped GEMM |
| `test_large_moe_sft_enables_full_recompute[…]` (×2) | 122B-A10B and 397B-A17B SFT enable `recompute_granularity="full"`, uniform, 1 layer |
| `test_35b_a3b_fsdp_sft_specifics` | FSDP override wiring (use_megatron_fsdp, fsdp_double_buffer, nccl_ub=False, overlap_grad_reduce, overlap_param_gather, single optimizer instance, SP off because TP=1, EP=2 still set) |
| `test_peft_recipe_builds_with_default_lora[…]` (×8) | Default lora scheme attaches a LoRA config across all dense + MoE PEFT recipes |
| `test_peft_recipe_with_dora[…]` (×8) | `peft_scheme="dora"` attaches DoRA across all PEFT recipes |
| `test_moe_peft_enables_ep[…]` (×3) | MoE PEFT recipes wire EP and alltoall just like their SFT siblings |
| `test_sft_27b_uses_pp_for_dense_27b` | 27B dense SFT spans 2 nodes via TP=4, PP=4 |
| `test_sft_122b_a10b_specifics` | TP=2, PP=6, EP=8, full recompute |
| `test_sft_397b_a17b_specifics` | TP=2, PP=4, EP=32, full recompute |
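The MoE rows reduce to one shared assertion helper; a sketch assuming Megatron-Core-style field names on the model provider (`expert_model_parallel_size`, `moe_token_dispatcher_type`, `moe_grouped_gemm`), which this PR does not confirm:

```python
def assert_moe_sft_wiring(cfg, expected_ep: int) -> None:
    """Shared checks for the MoE SFT rows above (field names assumed)."""
    model = cfg.model
    assert model.expert_model_parallel_size == expected_ep  # EP > 1
    assert model.sequence_parallel is True  # SP enabled alongside EP
    assert model.moe_token_dispatcher_type == "alltoall"
    assert model.moe_grouped_gemm is True
```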

Implementation notes

  • AutoBridge stub: _FakeAutoBridge.from_hf_pretrained() returns a permissive provider that absorbs arbitrary setattr calls. Only the attrs the recipes read (vocab_size, apply_rope_fusion) get explicit defaults. A sketch follows this list.
  • No num_moe_experts on the fake provider: even without the patch on apply_flex_dispatcher_backend, the function would short-circuit on its own first guard before touching torch.cuda.get_device_properties(0). The fixture additionally patches the helper to a no-op so the tests stay independent of that internal early-return.
  • Two-module patch: qwen35_vl pretrain delegates into _qwen3_vl_common, which references the qwen3_vl module's AutoBridge symbol. The qwen35_vl fixture patches both modules so the delegation path stays HF-free regardless of which symbol the recipe touches.
  • Energon variant: qwen3_vl_8b_peft_energon_config calls AutoTokenizer.from_pretrained and Qwen3VLProcessor.from_pretrained directly inside _make_energon_dataset. The dedicated test mocks both at the recipe-module level and asserts the resulting dataset is an EnergonProvider.
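A condensed sketch of the fixture described in the first three notes. Module paths follow src/megatron/bridge/recipes/qwen_vl/; the attribute defaults and exact patched symbols are assumptions:

```python
import pytest


class _FakeProvider:
    # Only the attrs the recipes read get explicit defaults; a plain Python
    # object already absorbs arbitrary setattr calls from the recipes.
    vocab_size = 152_064  # hypothetical value
    apply_rope_fusion = False
    # Deliberately no num_moe_experts: apply_flex_dispatcher_backend's first
    # guard would then short-circuit before touching torch.cuda.


class _FakeAutoBridge:
    @staticmethod
    def from_hf_pretrained(*args, **kwargs):
        return _FakeProvider()  # permissive provider, no HF Hub I/O


@pytest.fixture(autouse=True)
def _stub_bridges(monkeypatch):
    import megatron.bridge.recipes.qwen_vl.qwen35_vl as qwen35_vl
    import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl

    # qwen35_vl pretrain delegates into _qwen3_vl_common, which resolves
    # AutoBridge from the qwen3_vl module, so both symbols get patched.
    monkeypatch.setattr(qwen3_vl, "AutoBridge", _FakeAutoBridge)
    monkeypatch.setattr(qwen35_vl, "AutoBridge", _FakeAutoBridge)
    # No-op the flex-dispatcher helper so tests never touch torch.cuda.
    monkeypatch.setattr(qwen3_vl, "apply_flex_dispatcher_backend", lambda *a, **k: None)
```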

Test plan

  • `python3 -m ast` parse of the new files (reproduced in the sketch after this list)
  • `ruff check` clean
  • `ruff format --check` clean
  • CI: L0 unit tests pick up the new modules automatically (no workflow file changes needed). Both modules live under tests/unit_tests/recipes/qwen_vl/, which is already covered by cicd-unit-tests-core.
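The first check can be reproduced locally with a few lines of Python (paths taken from this PR):

```python
import ast
import pathlib

# Equivalent to the `python3 -m ast` pass over the two new test modules.
for path in sorted(pathlib.Path("tests/unit_tests/recipes/qwen_vl").glob("test_qwen3*_vl.py")):
    ast.parse(path.read_text(), filename=str(path))  # raises SyntaxError on failure
    print(f"parsed OK: {path}")
```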

Risk

Zero — tests only. No production code changes.

Notes for reviewers

  • I considered combining the qwen3_vl and qwen35_vl modules into a single test file but kept them split because (a) qwen35_vl needs to patch two modules' AutoBridge (the inheritance through _qwen3_vl_common) and the autouse fixture is cleanest when scoped to one test class per module, and (b) the recipe families have different attribute-touching surfaces (qwen35_vl sets MTP, qwen3_vl uses the flex-dispatcher helper), so split files keep each test focused.
  • The energon variant test is the only one that needs to mock AutoTokenizer / Qwen3VLProcessor at the recipe-module level; a sketch follows this list. All other recipes go through AutoBridge only.
  • The recipe family also includes qwen25_vl.py — that one already follows the "flat" pattern this issue asks for, but is not yet covered by unit tests either. I'd happily land that in a follow-up if the maintainers want — it's the same pattern.
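For reference, a sketch of that recipe-module-level mocking. The builder name qwen3_vl_8b_peft_energon_config comes from the notes above; the `cfg.dataset` attribute is an assumption:

```python
from unittest import mock

import megatron.bridge.recipes.qwen_vl.qwen3_vl as qwen3_vl


def test_peft_energon_config_builds_with_mocked_processor():
    with (
        mock.patch.object(qwen3_vl, "AutoTokenizer") as tokenizer_cls,
        mock.patch.object(qwen3_vl, "Qwen3VLProcessor") as processor_cls,
    ):
        tokenizer_cls.from_pretrained.return_value = mock.MagicMock()
        processor_cls.from_pretrained.return_value = mock.MagicMock()
        cfg = qwen3_vl.qwen3_vl_8b_peft_energon_config()
    # The energon variant must wire an EnergonProvider dataset.
    assert type(cfg.dataset).__name__ == "EnergonProvider"
```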

Commit message

Both recipe modules under `src/megatron/bridge/recipes/qwen_vl/`
(qwen3_vl.py — 1144 LoC, 12 exported recipes; qwen35_vl.py — 622 LoC, 22
exported recipes) had no unit-test coverage. Their only safety net was
the L0 functional launch scripts, which are slow, GPU-bound, and don't
catch regressions in recipe construction logic.

Add parametrized unit tests modelled after the existing
`test_glm_45v_recipes.py` and `test_kimi_k2.py`. Each recipe function
is exercised with a stubbed AutoBridge (no HF Hub I/O) and a no-op
`apply_flex_dispatcher_backend` (no torch.cuda dependency), then
asserted against the documented contract: ConfigContainer shape,
parallelism, freeze flags, MoE wiring, recompute, FSDP overrides, PEFT
branching.

`tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py` (16 tests):
  - Parametrized build sanity for all 3 pretrain, 3 SFT, 3 PEFT recipes.
  - PEFT `dora` scheme attaches a DoRA config across all 3 PEFT recipes.
  - Spot checks for 30B-A3B and 235B-A22B parallelism layouts.
  - User overrides flow through `_qwen3_vl_common`.
  - `mock=False` raises (intentionally not yet supported).
  - SFT 8B uses TP=2/PP=1; SFT 30B-A3B sets EP=8 with alltoall.
  - SFT inherits HFDatasetConversationProvider from `_sft_common_vlm`.
  - Energon variant builds when AutoTokenizer + Qwen3VLProcessor are
    mocked; the resulting dataset is an EnergonProvider.

`tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py` (28 tests):
  - Parametrized build sanity for all 4 pretrain, 8 SFT, 8 PEFT recipes
    (mix of dense + MoE).
  - Pretrain delegates into `_qwen3_vl_common`; both modules' AutoBridge
    symbols are patched so the delegation path stays HF-free.
  - SFT recipes harmonize seq_length between model and dataset and
    attach MTP defaults (mtp_num_layers=1, mtp_loss_scaling_factor=0.1).
  - Dense SFT recipes leave EP unset; MoE SFT recipes set EP and
    enable sequence parallel + alltoall + grouped GEMM.
  - The 122B-A10B and 397B-A17B SFT recipes enable full recompute.
  - The Megatron-FSDP 35B-A3B variant overrides DDP (use_megatron_fsdp,
    fsdp_double_buffer, nccl_ub=False, overlap_*, single optimizer
    instance) and disables SP because TP=1.
  - Spot checks for the largest recipes (27B dense, 122B-A10B, 397B-A17B).

No production code changes — tests only.

Refs issue NVIDIA-NeMo#3177.

Signed-off-by: lonexreb <[email protected]>

copy-pr-bot (Bot) commented on May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


lonexreb (Contributor, Author) commented on May 4, 2026

Closing this. I missed the existing recipe tests under tests/unit_tests/recipes/qwen_vl/.

The qwen3_vl and qwen35_vl recipes are already covered by the existing parametrized tests in those files. My PR was a duplicate of work that's been on main since #3100. Apologies for the noise — I should have checked tests/unit_tests/recipes/qwen_vl/ before writing new tests, not just the top-level tests/unit_tests/recipes/ directory.

The other PRs in the same sweep are not duplicates and remain open.
