
[recipe, test] feat: add unit tests for qwen2_audio and kimi_k25_vl recipes #3646

Merged
cuichenx merged 1 commit into NVIDIA-NeMo:main from lonexreb:training/recipe-tests-3177 on May 5, 2026

Conversation

lonexreb (Contributor) commented May 4, 2026

Summary

Two recipes under src/megatron/bridge/recipes/ have no unit-test coverage. This PR adds tests for both:

  • qwen2_audio.qwen2_audio_7b_finetune_config — 18 tests
  • kimi_vl.kimi_k25_vl_sft_config (and _get_kimi_k25_vl_pipeline_layout) — 16 tests

Both modules monkeypatch AutoBridge to bypass HuggingFace Hub I/O and assert the recipe's documented contract end-to-end. Pattern mirrors the existing test_kimi_k2.py and test_glm_45v_recipes.py.

Refs #3177.
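For readers unfamiliar with the monkeypatch pattern mentioned above, here is a minimal sketch of what it looks like. The stub names (_FakeBridge, _FakeProvider, _patch_auto_bridge), the exact patch target, and the attribute paths are illustrative assumptions, not the PR's actual code:

```python
import megatron.bridge.recipes.qwen2_audio as qwen2_audio_recipe  # path assumed
from megatron.bridge.recipes.qwen2_audio import qwen2_audio_7b_finetune_config


class _FakeProvider:
    """Stub exposing only the attributes the recipe is expected to touch."""

    def __init__(self):
        self.seq_length = None
        self.tensor_model_parallel_size = None
        self.pipeline_model_parallel_size = None


class _FakeBridge:
    """Stands in for AutoBridge so no HuggingFace Hub I/O ever happens."""

    @classmethod
    def from_hf_pretrained(cls, *args, **kwargs):
        return cls()

    def to_megatron_provider(self, *args, **kwargs):
        return _FakeProvider()


def _patch_auto_bridge(monkeypatch):
    # Patch AutoBridge at the recipe module's import site.
    monkeypatch.setattr(qwen2_audio_recipe, "AutoBridge", _FakeBridge)


def test_default_parallelism(monkeypatch):
    _patch_auto_bridge(monkeypatch)
    cfg = qwen2_audio_7b_finetune_config()
    assert cfg.model.tensor_model_parallel_size == 1
    assert cfg.model.pipeline_model_parallel_size == 1
```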

Why this matters

The original issue notes that Kimi K2 had a recipe but no recipe test, which let an MCore-side API breaking change land undetected. Kimi K2 itself has since gained tests/unit_tests/recipes/kimi/test_kimi_k2.py, but a sweep of src/megatron/bridge/recipes/ vs tests/unit_tests/recipes/ still surfaces several recipe modules with no recipe-test counterpart. This PR closes the two easiest gaps; both test modules are entirely missing today.

What's covered

test_qwen2_audio_recipes.py

  • ConfigContainer shape, default training cadence (train_iters=2000, GBS=32, MBS=1), validation settings.
  • Default parallelism (TP=1, PP=1), seq_length propagation into both model_cfg and the HFDatasetConversationProvider.
  • Dataset wiring: make_default_audio_dataset maker, default train / dev split selection, processor path.
  • Default freeze flags (language model, audio model, audio projection) and override propagation.
  • Full-SFT vs PEFT lr selection (the entry point's branch): peft=None and peft="none" → lr=5e-6; peft="lora" → lr=1e-4; peft="dora" attaches a DoRA instance; an unknown PEFT scheme raises a clear ValueError (see the sketch after this list).
  • User-supplied finetune_lr overrides the default.
  • Parallelism / seq_length overrides apply.
  • DDP defaults (no overlap, fp32 grad reduce, dist optimizer on, optim+grads+params sharding), checkpoint format (torch_dist, parallel save), RNG seed, val/test maker overrides.
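A hedged sketch of the lr-selection cases, reusing the _patch_auto_bridge helper from the sketch above; the peft kwarg name and the cfg.optimizer.lr path follow the contract described in this list, while the unknown-scheme string is made up:

```python
import pytest


@pytest.mark.parametrize(
    "peft, expected_lr",
    [
        (None, 5e-6),    # full SFT
        ("none", 5e-6),  # string form normalized to full SFT
        ("lora", 1e-4),  # PEFT path
    ],
)
def test_lr_selection(monkeypatch, peft, expected_lr):
    _patch_auto_bridge(monkeypatch)
    cfg = qwen2_audio_7b_finetune_config(peft=peft)
    assert cfg.optimizer.lr == expected_lr


def test_unknown_peft_scheme_raises(monkeypatch):
    _patch_auto_bridge(monkeypatch)
    with pytest.raises(ValueError):
        qwen2_audio_7b_finetune_config(peft="not-a-scheme")
```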

test_kimi_k25_vl.py

  • _get_kimi_k25_vl_pipeline_layout helper:
    • PP=1, VP=1 returns None.
    • PP=16, VP=1 (the SFT default) matches the documented stage breakdown.
    • All 7 known (PP, VP) combinations return non-empty list-of-lists.
    • vp_size=None is normalized to 1.
    • Unknown combinations raise ValueError.
    • Each call returns a fresh copy (mutation-safe).
  • kimi_k25_vl_sft_config:
    • ConfigContainer shape, default training cadence, validation cadence.
    • Default parallelism (TP=2, PP=16, EP=32, SP on, expert TP=1).
    • Full activation recompute (uniform, 1 layer), pipeline split flags, layout matches helper output.
    • Muon-compatible DDP wiring (use_distributed_optimizer=False, overlap_param_gather=False, no_shard strategy).
    • HF processor path, NullTokenizer with vocab_size pulled from the provider.
    • bf16 mixed precision, fp32 optimizer precision, dist_muon selected.
    • MoE wiring (alltoall + deepep flex backend, fused permute, grouped GEMM, shared-expert overlap on).
    • TE backend with CUDA graphs disabled by default, kernel selections (cross_entropy_fusion_impl="te").
    • Comm overlap defaults (tp_comm_overlap=False, delay_wgrad_compute=False).
    • Checkpoint cadence and async-save default.
    • Rope-fusion gating contract: when model.apply_rope_fusion=False the experimental dist flag stays False; when it is True the recipe flips cfg.dist.enable_megatron_core_experimental=True. (Both this check and the fresh-copy check above are sketched below.)
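Two of those checks lend themselves to a short sketch: the mutation-safety of the layout helper and the rope-fusion gate. The pp_size kwarg name, the module path, and the stub classes are assumptions; vp_size and the cfg.dist.enable_megatron_core_experimental path come straight from the list above:

```python
import megatron.bridge.recipes.kimi_vl as kimi_vl_recipe  # path assumed
from megatron.bridge.recipes.kimi_vl import (
    _get_kimi_k25_vl_pipeline_layout,
    kimi_k25_vl_sft_config,
)


def test_layout_is_a_fresh_copy():
    # Mutating one returned layout must not leak into later calls.
    first = _get_kimi_k25_vl_pipeline_layout(pp_size=16, vp_size=1)
    first[0].append("mutated")
    second = _get_kimi_k25_vl_pipeline_layout(pp_size=16, vp_size=1)
    assert "mutated" not in second[0]


class _RopeFusionOnProvider:
    """Stub provider with rope fusion flipped on."""

    def __init__(self):
        self.apply_rope_fusion = True
        self.vocab_size = 163840  # illustrative value


class _RopeFusionOnBridge:
    @classmethod
    def from_hf_pretrained(cls, *args, **kwargs):
        return cls()

    def to_megatron_provider(self, *args, **kwargs):
        return _RopeFusionOnProvider()


def test_rope_fusion_on_flips_experimental_flag(monkeypatch):
    monkeypatch.setattr(kimi_vl_recipe, "AutoBridge", _RopeFusionOnBridge)
    cfg = kimi_k25_vl_sft_config()
    assert cfg.dist.enable_megatron_core_experimental is True
```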

Test plan

  • python3 -m ast parse of new files
  • ruff check clean
  • ruff format --check clean
  • CI: L0 unit tests pick up the new modules automatically (no workflow file changes needed).

Risk

Zero — tests only. No production code changes.

Notes for reviewers

  • Both stub providers expose only the attribute surface the recipe touches, so adding a new attribute read or write in either recipe will surface as an AttributeError here, matching the existing test_kimi_k2.py style (see the sketch after this list).
  • qwen2_audio reuses the existing HFDatasetConversationProvider dataclass, which is import-safe (no HF I/O at construction).
  • The kimi_k25_vl test's rope-fusion-on case overrides the bridge inside the test to flip apply_rope_fusion=True, exercising the recipe's if cfg.model.apply_rope_fusion: branch.
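A minimal sketch of that stub style, under the assumption that __slots__ is used so that unexpected attribute writes fail as loudly as unexpected reads (the attribute set and values shown are illustrative):

```python
class _StubKimiProvider:
    """Exposes only the attribute surface the recipe touches.

    With __slots__, an unexpected attribute write in the recipe raises
    AttributeError, just as an unexpected read would on a plain object.
    """

    __slots__ = ("vocab_size", "seq_length", "apply_rope_fusion")

    def __init__(self):
        self.vocab_size = 163840        # illustrative value
        self.seq_length = None          # the recipe is expected to set this
        self.apply_rope_fusion = False  # flipped in the rope-fusion-on test
```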

Commit message (title truncated in the GitHub view):

…ecipes

Two recipes under `src/megatron/bridge/recipes/` had no unit-test coverage:

  - `qwen2_audio.qwen2_audio_7b_finetune_config`
  - `kimi_vl.kimi_k25_vl_sft_config` (and `_get_kimi_k25_vl_pipeline_layout`)

Following the established pattern in `test_kimi_k2.py` and the GLM-4.5V
recipe tests, both new test modules monkeypatch `AutoBridge` to bypass
HuggingFace Hub I/O, then assert the recipe's documented contract end
to end:

`test_qwen2_audio_recipes.py` (18 tests):
  - Basic ConfigContainer shape
  - Default parallelism (TP=1, PP=1, no CP/SP)
  - Default training cadence and validation settings
  - seq_length propagation into both `model_cfg` and the dataset provider
  - Default freeze flags (language model, audio model, audio projection)
  - NullTokenizer wiring
  - HFDatasetConversationProvider with the audio maker and
    train/dev split selection
  - Full-SFT (peft=None) path picks lr=5e-6
  - peft="none" string is normalized to full SFT
  - peft="lora" path picks lr=1e-4
  - peft="dora" attaches a DoRA instance
  - Unknown PEFT scheme raises a clear ValueError
  - User-supplied `finetune_lr` overrides the default
  - Freeze-flag overrides propagate to the provider
  - Parallelism overrides apply
  - seq_length override propagates to dataset provider
  - DDP / checkpoint / RNG defaults
  - val/test split overrides flow into the dataset provider

`test_kimi_k25_vl.py` (16 tests):
  - Pipeline-layout helper: PP=1/VP=1 returns None, PP=16/VP=1 matches
    the documented stage breakdown, all 7 known PP/VP combinations
    return non-empty list-of-lists, vp_size=None is normalized to 1,
    unknown combinations raise ValueError, layout is a fresh copy on
    each call
  - SFT recipe: ConfigContainer shape, training cadence, parallelism
    (TP=2, PP=16, EP=32, SP on), full activation recompute (uniform, 1
    layer), pipeline split flags off, layout matches helper output,
    Muon-compatible DDP wiring (no dist optimizer, no param overlap),
    HF processor path on dataset, NullTokenizer with vocab_size pulled
    from the provider, bf16 mixed precision, fp32 optimizer precision,
    `dist_muon` selected, MoE wiring (alltoall + deepep, fused permute,
    grouped GEMM), TE backend with CUDA graphs off, kernel selections,
    comm overlap defaults, checkpoint cadence, rope-fusion-off keeps
    the experimental dist flag at its default `False`, rope-fusion-on
    flips the experimental dist flag to `True`

No production code changes — tests only.

Refs issue NVIDIA-NeMo#3177.

Signed-off-by: lonexreb <[email protected]>
copy-pr-bot (bot) commented May 4, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cuichenx (Contributor) commented May 5, 2026

/ok to test fda93c7

@cuichenx added the ready-to-merge label (PR is approved, current, and only waiting for CI to pass before merge) on May 5, 2026
@cuichenx linked an issue on May 5, 2026 that may be closed by this pull request
@cuichenx merged commit af9ea4b into NVIDIA-NeMo:main on May 5, 2026
60 of 62 checks passed

Labels

community-request, ready-to-merge (PR is approved, current, and only waiting for CI to pass before merge)


Development

Successfully merging this pull request may close these issues.

[feature] Recipe tests missing for some recipes

3 participants