[recipe, test] feat: add unit tests for qwen3_vl and qwen35_vl recipes #3648
Closed
lonexreb wants to merge 1 commit into NVIDIA-NeMo:main from
Conversation
Both recipe modules under `src/megatron/bridge/recipes/qwen_vl/`
(qwen3_vl.py — 1144 LoC, 12 exported recipes; qwen35_vl.py — 622 LoC, 22
exported recipes) had no unit-test coverage. Their only safety net was
the L0 functional launch scripts, which are slow, GPU-bound, and don't
catch regressions in recipe construction logic.
Add parametrized unit tests modelled after the existing
`test_glm_45v_recipes.py` and `test_kimi_k2.py`. Each recipe function
is exercised with a stubbed AutoBridge (no HF Hub I/O) and a no-op
`apply_flex_dispatcher_backend` (no torch.cuda dependency), then
asserted against the documented contract: ConfigContainer shape,
parallelism, freeze flags, MoE wiring, recompute, FSDP overrides, PEFT
branching.
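The stubbing pattern described above can be sketched roughly as follows. Everything here is an illustrative stand-in, not the merged test code: the real `AutoBridge`, `ConfigContainer`, and recipe functions live in `megatron.bridge`, and the attribute names are taken from this PR's description.

```python
# Sketch of "stub AutoBridge, then assert the documented contract".
# All classes below are stand-ins for illustration only.
from dataclasses import dataclass
from unittest import mock


@dataclass
class ConfigContainer:  # stand-in for the real ConfigContainer
    model: object
    tensor_model_parallel_size: int = 1
    pipeline_model_parallel_size: int = 1


class AutoBridge:  # stand-in: the real one performs HF Hub I/O
    @staticmethod
    def from_hf_pretrained(name_or_path):
        raise RuntimeError("network access inside a unit test!")


def pretrain_recipe(tp=2, pp=1):  # stand-in recipe under test
    provider = AutoBridge.from_hf_pretrained("Qwen/Qwen3-VL-8B")
    return ConfigContainer(
        model=provider,
        tensor_model_parallel_size=tp,
        pipeline_model_parallel_size=pp,
    )


class _FakeProvider:
    # Only the attrs the recipe reads get explicit defaults.
    vocab_size = 151_936
    apply_rope_fusion = False


# Patch the symbol the recipe actually resolves, then assert the contract.
with mock.patch.object(
    AutoBridge, "from_hf_pretrained", return_value=_FakeProvider()
):
    cfg = pretrain_recipe(tp=2, pp=1)

assert isinstance(cfg, ConfigContainer)
assert cfg.tensor_model_parallel_size == 2
assert cfg.model.vocab_size == 151_936
```

The key design point is patching at the module/class attribute the recipe resolves at call time, so no test ever touches the network or a GPU.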
`tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py` (16 tests):
- Parametrized build sanity for all 3 pretrain, 3 SFT, 3 PEFT recipes.
- PEFT `dora` scheme attaches a DoRA config across all 3 PEFT recipes.
- Spot checks for 30B-A3B and 235B-A22B parallelism layouts.
- User overrides flow through `_qwen3_vl_common`.
- `mock=False` raises (intentionally not yet supported).
- SFT 8B uses TP=2/PP=1; SFT 30B-A3B sets EP=8 with alltoall.
- SFT inherits HFDatasetConversationProvider from `_sft_common_vlm`.
- Energon variant builds when AutoTokenizer + Qwen3VLProcessor are
mocked; the resulting dataset is an EnergonProvider.
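The "build sanity" tests above presumably use the usual pytest parametrization shape. A minimal sketch with made-up stand-in recipe functions (the TP/EP values mirror numbers quoted in this PR, not verified defaults):

```python
import pytest


# Illustrative stand-ins; the real functions are imported from
# megatron.bridge.recipes.qwen_vl.qwen3_vl.
def qwen3_vl_8b_pretrain_config():
    return {"tp": 2, "pp": 1, "ep": None, "seq_length": 4096}


def qwen3_vl_30b_a3b_pretrain_config():
    return {"tp": 1, "pp": 1, "ep": 8, "seq_length": 4096}


@pytest.mark.parametrize(
    "recipe",
    [qwen3_vl_8b_pretrain_config, qwen3_vl_30b_a3b_pretrain_config],
)
def test_pretrain_recipe_builds(recipe):
    cfg = recipe()
    # Build sanity: the recipe returns a config with coherent parallelism.
    assert cfg["tp"] >= 1 and cfg["pp"] >= 1
    assert cfg["seq_length"] > 0
```

Parametrizing over the recipe callables keeps one test body per contract while pytest reports each recipe as its own test id (hence the `[…] (×3)` counts above).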
`tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py` (28 tests):
- Parametrized build sanity for all 4 pretrain, 8 SFT, 8 PEFT recipes
(mix of dense + MoE).
- Pretrain delegates into `_qwen3_vl_common`; both modules' AutoBridge
symbols are patched so the delegation path stays HF-free.
- SFT recipes harmonize seq_length between model and dataset and
attach MTP defaults (mtp_num_layers=1, mtp_loss_scaling_factor=0.1).
- Dense SFT recipes leave EP unset; MoE SFT recipes set EP and
enable sequence parallel + alltoall + grouped GEMM.
- The 122B-A10B and 397B-A17B SFT recipes enable full recompute.
- The Megatron-FSDP 35B-A3B variant overrides DDP (use_megatron_fsdp,
fsdp_double_buffer, nccl_ub=False, overlap_*, single optimizer
instance) and disables SP because TP=1.
- Spot checks for the largest recipes (27B dense, 122B-A10B, 397B-A17B).
No production code changes — tests only.
Refs issue NVIDIA-NeMo#3177.
Signed-off-by: lonexreb <[email protected]>
Contributor
Author
Closing this — I missed the existing recipe tests under
The qwen3_vl and qwen35_vl recipes are already covered by the existing parametrized tests in those files. My PR was a duplicate of work that's been on
The other PRs in the same sweep are not duplicates and remain open:
Summary
Two recipe modules under `src/megatron/bridge/recipes/qwen_vl/` had no unit-test coverage: `qwen3_vl.py` and `qwen35_vl.py`. The L0 functional scripts are slow and GPU-bound — they don't catch the kind of regression that issue #3177 was filed about ("Kimi K2 had a recipe but no recipe test, so we missed an MCore API breaking change").
This PR adds 44 parametrized unit tests modelled after `test_glm_45v_recipes.py` and `test_kimi_k2.py`. Each recipe is built with a stubbed `AutoBridge` (no HF Hub I/O) and a no-op `apply_flex_dispatcher_backend` (no `torch.cuda` dependency), then asserted against the documented contract.

Refs #3177. Tests-only — no production code changes.
What's covered
- `tests/unit_tests/recipes/qwen_vl/test_qwen3_vl.py` — 16 tests
  - `test_pretrain_recipe_builds[…]` (×3)
  - `test_sft_recipe_builds[…]` (×3)
  - `test_peft_recipe_builds_with_default_lora[…]` (×3) — `lora` scheme attaches a LoRA config
  - `test_peft_recipe_with_dora[…]` (×3) — `peft_scheme="dora"` attaches a `DoRA` instance
  - `test_pretrain_30b_a3b_uses_recommended_moe_parallelism`
  - `test_pretrain_235b_a22b_uses_recommended_parallelism`
  - `test_pretrain_overrides_apply` — user overrides flow through `_qwen3_vl_common` (TP, PP, SP, seq_length)
  - `test_pretrain_non_mock_dataset_raises` — `mock=False` raises (intentionally not yet supported)
  - `test_sft_8b_default_parallelism`
  - `test_sft_30b_a3b_uses_ep_for_moe`
  - `test_sft_uses_hf_dataset_provider` — dataset is an `HFDatasetConversationProvider`
  - `test_peft_energon_config_builds_with_mocked_processor` — `AutoTokenizer` + `Qwen3VLProcessor` are mocked; dataset is an `EnergonProvider`
- `tests/unit_tests/recipes/qwen_vl/test_qwen35_vl.py` — 28 tests
  - `test_pretrain_recipe_builds[…]` (×4)
  - `test_sft_recipe_builds[…]` (×8) — MTP defaults attach (`mtp_num_layers=1`, `mtp_loss_scaling_factor=0.1`)
  - `test_dense_sft_does_not_set_ep[…]` (×5) — dense recipes skip `_qwen35_vl_apply_moe`; the EP attr stays unset on the provider
  - `test_moe_sft_enables_ep_and_alltoall[…]` (×3)
  - `test_large_moe_sft_enables_full_recompute[…]` (×2) — `recompute_granularity="full"`, uniform, 1 layer
  - `test_35b_a3b_fsdp_sft_specifics` — (`use_megatron_fsdp`, `fsdp_double_buffer`, `nccl_ub=False`, `overlap_grad_reduce`, `overlap_param_gather`, single optimizer instance, SP off because TP=1, EP=2 still set)
  - `test_peft_recipe_builds_with_default_lora[…]` (×8) — `lora` scheme attaches a LoRA config across all dense + MoE PEFT recipes
  - `test_peft_recipe_with_dora[…]` (×8) — `peft_scheme="dora"` attaches `DoRA` across all PEFT recipes
  - `test_moe_peft_enables_ep[…]` (×3)
  - `test_sft_27b_uses_pp_for_dense_27b`
  - `test_sft_122b_a10b_specifics`
  - `test_sft_397b_a17b_specifics`

Implementation notes
- `_FakeAutoBridge.from_hf_pretrained()` returns a permissive provider that absorbs arbitrary `setattr` calls. Only the attrs the recipes read (`vocab_size`, `apply_rope_fusion`) get explicit defaults.
- `num_moe_experts` is absent on the fake provider — that means even without the patch on `apply_flex_dispatcher_backend`, the function would short-circuit on its own first guard before touching `torch.cuda.get_device_properties(0)`. The fixture additionally patches the helper to a no-op so the test stays independent of that internal early-return.
- Pretrain delegates into `_qwen3_vl_common`, which references the qwen3_vl module's `AutoBridge` symbol. The qwen35_vl fixture patches both modules so the delegation path stays HF-free regardless of which symbol the recipe touches.
- `qwen3_vl_8b_peft_energon_config` calls `AutoTokenizer.from_pretrained` and `Qwen3VLProcessor.from_pretrained` directly inside `_make_energon_dataset`. The dedicated test mocks both at the recipe-module level and asserts the resulting dataset is an `EnergonProvider`.

Test plan
- New files parse with `python3 -m ast`.
- `ruff check` clean.
- `ruff format --check` clean.
- The new files live under `tests/unit_tests/recipes/qwen_vl/`, which is already covered by `cicd-unit-tests-core`.
Zero — tests only. No production code changes.
Notes for reviewers
- Two test files rather than one: (a) the modules patch different `AutoBridge` symbols (the inheritance through `_qwen3_vl_common`), and the autouse fixture is cleanest when scoped to one test class per module; (b) the recipe families have different attribute-touching surfaces (qwen35_vl sets MTP, qwen3_vl uses the flex-dispatcher helper), so split files keep each test focused.
- Only the Energon recipe test mocks `AutoTokenizer`/`Qwen3VLProcessor` at the recipe-module level. All other recipes go through `AutoBridge` only.
- This PR does not touch `qwen25_vl.py` — that one already follows the "flat" pattern this issue asks for, but is not yet covered by unit tests either. I'd happily land that in a follow-up if the maintainers want — it's the same pattern.