Skip to content

Add regression tests for basic_collation_fn and get_assistant_mask#90

Open
lonexreb wants to merge 1 commit intoNVlabs:mainfrom
lonexreb:test/regression-tests-collation-and-mask
Open

Add regression tests for basic_collation_fn and get_assistant_mask#90
lonexreb wants to merge 1 commit intoNVlabs:mainfrom
lonexreb:test/regression-tests-collation-and-mask

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 4, 2026

Why

Two small helpers in the package have public-API contracts that recently had docs/fix PRs land against them:

Without tests, a future edit can quietly regress either contract. This PR follows the same pattern as the to_device tests in #80 — add focused, dependency-light pytest cases that lock in the documented behavior.

What

src/alpamayo_r1/processor/test_qwen_processor.py — 5 tests covering basic_collation_fn:

  • Default arg behaves as "no extra unstackable keys".
  • unstackable_keys=None is accepted and equivalent to omitting it.
  • Keys named in unstackable_keys come back as a plain list.
  • Repeated calls don't accumulate state via the default arg.
  • Non-tensor values are always returned as lists (never stacked).

src/alpamayo_r1/utils/test_get_label_mask.py — 4 tests covering get_assistant_mask:

  • torch.Tensor input → torch.Tensor (bool) output.
  • list[int] input → list[bool] output (the dual return shape the annotation now reflects).
  • The mask covers exactly the assistant content + trailing EOS, with the user turn untouched (matches the body's start+3 / end+1 logic).
  • Tensor and list input paths produce identical booleans.

Tests use a tiny stub tokenizer that only implements convert_tokens_to_ids, so they run on CI without HF auth or model downloads.

Run

pytest src/alpamayo_r1/processor/test_qwen_processor.py src/alpamayo_r1/utils/test_get_label_mask.py -v

Notes

Pins down the public-API contracts of two helpers that recently had
small fix/docs PRs land against them, so future edits don't quietly
regress the same surface.

src/alpamayo_r1/processor/test_qwen_processor.py covers
basic_collation_fn:
- Default arg behaves as "no extra unstackable keys" (no hidden mutable
  state across calls).
- unstackable_keys=None is accepted and equivalent to omitting it.
- Keys named in unstackable_keys are returned as a plain list.
- Repeated calls don't accumulate state through the default.
- Non-tensor values are always returned as lists (never stacked).

src/alpamayo_r1/utils/test_get_label_mask.py covers get_assistant_mask:
- torch.Tensor input -> torch.Tensor (bool) output.
- list[int] input -> list[bool] output (the dual return shape that the
  annotation now reflects).
- The mask covers exactly the assistant content + trailing EOS, with
  the user turn untouched (matches the body's start+3 / end+1 logic).
- Tensor and list input paths produce the same booleans.

Tests use a tiny stub tokenizer to avoid downloading real model
weights, so they run on CI without HF auth.

Signed-off-by: lonexreb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant