
[recipe, test] feat: add unit tests for qwen2_audio and kimi_k25_vl recipes #3646

Merged
cuichenx merged 1 commit into NVIDIA-NeMo:main from lonexreb:training/recipe-tests-3177 on May 5, 2026

Conversation

lonexreb (Contributor) commented May 4, 2026

Summary

Two recipes under src/megatron/bridge/recipes/ have no unit-test coverage. This PR adds tests for both:

  • qwen2_audio.qwen2_audio_7b_finetune_config — 18 tests
  • kimi_vl.kimi_k25_vl_sft_config (and _get_kimi_k25_vl_pipeline_layout) — 16 tests

Both modules monkeypatch AutoBridge to bypass HuggingFace Hub I/O and assert the recipe's documented contract end-to-end. Pattern mirrors the existing test_kimi_k2.py and test_glm_45v_recipes.py.

Refs #3177.
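For readers unfamiliar with the monkeypatch pattern mentioned above, here is a minimal sketch of what it looks like. The stub names (_FakeBridge, _FakeProvider, _patch_auto_bridge), the exact patch target, and the attribute paths are illustrative assumptions, not the PR's actual code:

```python
import megatron.bridge.recipes.qwen2_audio as qwen2_audio_recipe  # path assumed
from megatron.bridge.recipes.qwen2_audio import qwen2_audio_7b_finetune_config


class _FakeProvider:
    """Stub exposing only the attributes the recipe is expected to touch."""

    def __init__(self):
        self.seq_length = None
        self.tensor_model_parallel_size = None
        self.pipeline_model_parallel_size = None


class _FakeBridge:
    """Stands in for AutoBridge so no HuggingFace Hub I/O ever happens."""

    @classmethod
    def from_hf_pretrained(cls, *args, **kwargs):
        return cls()

    def to_megatron_provider(self, *args, **kwargs):
        return _FakeProvider()


def _patch_auto_bridge(monkeypatch):
    # Patch AutoBridge at the recipe module's import site.
    monkeypatch.setattr(qwen2_audio_recipe, "AutoBridge", _FakeBridge)


def test_default_parallelism(monkeypatch):
    _patch_auto_bridge(monkeypatch)
    cfg = qwen2_audio_7b_finetune_config()
    assert cfg.model.tensor_model_parallel_size == 1
    assert cfg.model.pipeline_model_parallel_size == 1
```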

Why this matters

The original issue notes that Kimi K2 had a recipe but no recipe test, which let an MCore-side API breaking change land undetected. Kimi K2 itself has since gained tests/unit_tests/recipes/kimi/test_kimi_k2.py, but a sweep of src/megatron/bridge/recipes/ vs tests/unit_tests/recipes/ still surfaces several recipe modules with no recipe-test counterpart. This PR closes the two easiest gaps; both test modules are entirely missing today.

What's covered

test_qwen2_audio_recipes.py

  • ConfigContainer shape, default training cadence (train_iters=2000, GBS=32, MBS=1), validation settings.
  • Default parallelism (TP=1, PP=1), seq_length propagation into both model_cfg and the HFDatasetConversationProvider.
  • Dataset wiring: make_default_audio_dataset maker, default train / dev split selection, processor path.
  • Default freeze flags (language model, audio model, audio projection) and override propagation.
  • Full-SFT vs PEFT lr selection (the entry point's branch): peft=None and peft="none" → lr=5e-6; peft="lora" → lr=1e-4; peft="dora" attaches a DoRA instance; an unknown PEFT scheme raises a clear ValueError (see the sketch after this list).
  • User-supplied finetune_lr overrides the default.
  • Parallelism / seq_length overrides apply.
  • DDP defaults (no overlap, fp32 grad reduce, dist optimizer on, optim+grads+params sharding), checkpoint format (torch_dist, parallel save), RNG seed, val/test maker overrides.
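A hedged sketch of the lr-selection cases, reusing the _patch_auto_bridge helper from the sketch above; the peft kwarg name and the cfg.optimizer.lr path follow the contract described in this list, while the unknown-scheme string is made up:

```python
import pytest


@pytest.mark.parametrize(
    "peft, expected_lr",
    [
        (None, 5e-6),    # full SFT
        ("none", 5e-6),  # string form normalized to full SFT
        ("lora", 1e-4),  # PEFT path
    ],
)
def test_lr_selection(monkeypatch, peft, expected_lr):
    _patch_auto_bridge(monkeypatch)
    cfg = qwen2_audio_7b_finetune_config(peft=peft)
    assert cfg.optimizer.lr == expected_lr


def test_unknown_peft_scheme_raises(monkeypatch):
    _patch_auto_bridge(monkeypatch)
    with pytest.raises(ValueError):
        qwen2_audio_7b_finetune_config(peft="not-a-scheme")
```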

test_kimi_k25_vl.py

  • _get_kimi_k25_vl_pipeline_layout helper:
    • PP=1, VP=1 returns None.
    • PP=16, VP=1 (the SFT default) matches the documented stage breakdown.
    • All 7 known (PP, VP) combinations return non-empty list-of-lists.
    • vp_size=None is normalized to 1.
    • Unknown combinations raise ValueError.
    • Each call returns a fresh copy (mutation-safe).
  • kimi_k25_vl_sft_config:
    • ConfigContainer shape, default training cadence, validation cadence.
    • Default parallelism (TP=2, PP=16, EP=32, SP on, expert TP=1).
    • Full activation recompute (uniform, 1 layer), pipeline split flags, layout matches helper output.
    • Muon-compatible DDP wiring (use_distributed_optimizer=False, overlap_param_gather=False, no_shard strategy).
    • HF processor path, NullTokenizer with vocab_size pulled from the provider.
    • bf16 mixed precision, fp32 optimizer precision, dist_muon selected.
    • MoE wiring (alltoall + deepep flex backend, fused permute, grouped GEMM, shared-expert overlap on).
    • TE backend with CUDA graphs disabled by default, kernel selections (cross_entropy_fusion_impl="te").
    • Comm overlap defaults (tp_comm_overlap=False, delay_wgrad_compute=False).
    • Checkpoint cadence and async-save default.
    • Rope-fusion gating contract: when model.apply_rope_fusion=False the experimental dist flag stays False; when it is True the recipe flips cfg.dist.enable_megatron_core_experimental=True. (Both this check and the fresh-copy check above are sketched below.)
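Two of those checks lend themselves to a short sketch: the mutation-safety of the layout helper and the rope-fusion gate. The pp_size kwarg name, the module path, and the stub classes are assumptions; vp_size and the cfg.dist.enable_megatron_core_experimental path come straight from the list above:

```python
import megatron.bridge.recipes.kimi_vl as kimi_vl_recipe  # path assumed
from megatron.bridge.recipes.kimi_vl import (
    _get_kimi_k25_vl_pipeline_layout,
    kimi_k25_vl_sft_config,
)


def test_layout_is_a_fresh_copy():
    # Mutating one returned layout must not leak into later calls.
    first = _get_kimi_k25_vl_pipeline_layout(pp_size=16, vp_size=1)
    first[0].append("mutated")
    second = _get_kimi_k25_vl_pipeline_layout(pp_size=16, vp_size=1)
    assert "mutated" not in second[0]


class _RopeFusionOnProvider:
    """Stub provider with rope fusion flipped on."""

    def __init__(self):
        self.apply_rope_fusion = True
        self.vocab_size = 163840  # illustrative value


class _RopeFusionOnBridge:
    @classmethod
    def from_hf_pretrained(cls, *args, **kwargs):
        return cls()

    def to_megatron_provider(self, *args, **kwargs):
        return _RopeFusionOnProvider()


def test_rope_fusion_on_flips_experimental_flag(monkeypatch):
    monkeypatch.setattr(kimi_vl_recipe, "AutoBridge", _RopeFusionOnBridge)
    cfg = kimi_k25_vl_sft_config()
    assert cfg.dist.enable_megatron_core_experimental is True
```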

Test plan

  • python3 -m ast parse of new files
  • ruff check clean
  • ruff format --check clean
  • CI: L0 unit tests pick up the new modules automatically (no workflow file changes needed).

Risk

Zero — tests only. No production code changes.

Notes for reviewers

  • Both stub providers expose only the attribute surface the recipe touches, so adding a new attribute read or write in either recipe will surface as an AttributeError here, matching the existing test_kimi_k2.py style (see the sketch after this list).
  • qwen2_audio reuses the existing HFDatasetConversationProvider dataclass, which is import-safe (no HF I/O at construction).
  • The kimi_k25_vl test's rope-fusion-on case overrides the bridge inside the test to flip apply_rope_fusion=True, exercising the recipe's if cfg.model.apply_rope_fusion: branch.
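A minimal sketch of that stub style, under the assumption that __slots__ is used so that unexpected attribute writes fail as loudly as unexpected reads (the attribute set and values shown are illustrative):

```python
class _StubKimiProvider:
    """Exposes only the attribute surface the recipe touches.

    With __slots__, an unexpected attribute write in the recipe raises
    AttributeError, just as an unexpected read would on a plain object.
    """

    __slots__ = ("vocab_size", "seq_length", "apply_rope_fusion")

    def __init__(self):
        self.vocab_size = 163840        # illustrative value
        self.seq_length = None          # the recipe is expected to set this
        self.apply_rope_fusion = False  # flipped in the rope-fusion-on test
```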

Commit message (title truncated in the GitHub view):

…ecipes

Two recipes under `src/megatron/bridge/recipes/` had no unit-test coverage:

  - `qwen2_audio.qwen2_audio_7b_finetune_config`
  - `kimi_vl.kimi_k25_vl_sft_config` (and `_get_kimi_k25_vl_pipeline_layout`)

Following the established pattern in `test_kimi_k2.py` and the GLM-4.5V
recipe tests, both new test modules monkeypatch `AutoBridge` to bypass
HuggingFace Hub I/O, then assert the recipe's documented contract end
to end:

`test_qwen2_audio_recipes.py` (18 tests):
  - Basic ConfigContainer shape
  - Default parallelism (TP=1, PP=1, no CP/SP)
  - Default training cadence and validation settings
  - seq_length propagation into both `model_cfg` and the dataset provider
  - Default freeze flags (language model, audio model, audio projection)
  - NullTokenizer wiring
  - HFDatasetConversationProvider with the audio maker and
    train/dev split selection
  - Full-SFT (peft=None) path picks lr=5e-6
  - peft="none" string is normalized to full SFT
  - peft="lora" path picks lr=1e-4
  - peft="dora" attaches a DoRA instance
  - Unknown PEFT scheme raises a clear ValueError
  - User-supplied `finetune_lr` overrides the default
  - Freeze-flag overrides propagate to the provider
  - Parallelism overrides apply
  - seq_length override propagates to dataset provider
  - DDP / checkpoint / RNG defaults
  - val/test split overrides flow into the dataset provider

`test_kimi_k25_vl.py` (16 tests):
  - Pipeline-layout helper: PP=1/VP=1 returns None, PP=16/VP=1 matches
    the documented stage breakdown, all 7 known PP/VP combinations
    return non-empty list-of-lists, vp_size=None is normalized to 1,
    unknown combinations raise ValueError, layout is a fresh copy on
    each call
  - SFT recipe: ConfigContainer shape, training cadence, parallelism
    (TP=2, PP=16, EP=32, SP on), full activation recompute (uniform, 1
    layer), pipeline split flags off, layout matches helper output,
    Muon-compatible DDP wiring (no dist optimizer, no param overlap),
    HF processor path on dataset, NullTokenizer with vocab_size pulled
    from the provider, bf16 mixed precision, fp32 optimizer precision,
    `dist_muon` selected, MoE wiring (alltoall + deepep, fused permute,
    grouped GEMM), TE backend with CUDA graphs off, kernel selections,
    comm overlap defaults, checkpoint cadence, rope-fusion-off keeps
    the experimental dist flag at its default `False`, rope-fusion-on
    flips the experimental dist flag to `True`

No production code changes — tests only.

Refs issue NVIDIA-NeMo#3177.

Signed-off-by: lonexreb <[email protected]>
copy-pr-bot (bot) commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cuichenx (Contributor) commented May 5, 2026

/ok to test fda93c7

@cuichenx added the ready-to-merge label (PR is approved, current, and only waiting for CI to pass before merge) on May 5, 2026
@cuichenx linked an issue on May 5, 2026 that may be closed by this pull request
@cuichenx merged commit af9ea4b into NVIDIA-NeMo:main on May 5, 2026
60 of 62 checks passed

Labels

community-request, ready-to-merge (PR is approved, current, and only waiting for CI to pass before merge)


Development

Successfully merging this pull request may close these issues.

[feature] Recipe tests missing for some recipes

3 participants