Add regression tests for compute_minade and summarize_metric keys #91

Open
lonexreb wants to merge 1 commit into NVlabs:main from lonexreb:test/regression-tests-metrics-keys

Conversation

Contributor

lonexreb commented May 4, 2026

Why

PR #86 corrected the docstrings of compute_minade, compute_grouped_corner_distance, and summarize_metric to match the keys the functions actually return. Those docstrings now act as a public contract, but no test pins that contract down. A future edit can quietly change the key names (or the gating on _std) and nothing fails.

This PR adds focused, dependency-light pytest cases that lock in the contract documented in #86.

What

src/alpamayo_r1/metrics/test_distance_metrics.py — 9 tests, no GPU / no HF auth required:

compute_ade:

  • Returns the documented [B, N, K] shape.
  • timestep_horizon actually truncates the time axis (head-only mean ≠ full-trajectory mean).
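
The truncation check above can be illustrated without calling compute_ade at all. The following is a minimal sketch on a hypothetical tensor of per-timestep errors. The `[B, N, K, T]` layout, the axis sizes, and the name `per_step_error` are assumptions made for illustration and are not taken from the repository.

```python
import torch

torch.manual_seed(0)

# Assumed axis sizes; T is the (assumed trailing) time axis.
B, N, K, T = 2, 3, 4, 10

# Hypothetical per-timestep displacement errors, shape [B, N, K, T].
per_step_error = torch.rand(B, N, K, T)
# Let the error grow with time so that truncating the horizon visibly
# changes the mean.
per_step_error = per_step_error + torch.arange(T, dtype=torch.float32)

horizon = 4
full_mean = per_step_error.mean(dim=-1)                 # [B, N, K]
head_mean = per_step_error[..., :horizon].mean(dim=-1)  # [B, N, K]

# Both reductions keep the [B, N, K] shape, but the values differ.
assert full_mean.shape == (B, N, K)
assert head_mean.shape == (B, N, K)
assert not torch.allclose(head_mean, full_mean)
```

If a horizon argument were silently ignored, the two means would be identical, which is the failure this kind of assertion is meant to catch.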

compute_minade:

  • Returns min_ade plus one min_ade/by_t={H:.1f} per valid horizon (precision matches f"{t * time_step:.1f}").
  • Returns matching _std entries when N > 1.
  • Skips horizons that exceed T (the valid_horizons = [t for t in timestep_horizons if t <= T] filter).
  • disable_summary=True suppresses _std.

summarize_metric:

  • Adds <key>_std when N > 1 and disable_summary=False.
  • Does not add _std when N == 1.
  • Does not add _std when disable_summary=True, even with N > 1.
  • Raises ValueError on inputs that are not [B, N].
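
The gating rules above can be sketched with a stand-in function. `summarize_metric_sketch` is not the repository's summarize_metric. Its signature and the choice of a mean over the N axis are assumptions, and only the `_std` gating and the ValueError on non-`[B, N]` input come from the description.

```python
import torch


def summarize_metric_sketch(metric, key, disable_summary=False):
    """Stand-in that mirrors the described contract, not the real function."""
    if metric.ndim != 2:
        raise ValueError(f"expected a [B, N] tensor, got shape {tuple(metric.shape)}")
    # Assumed reduction: mean over the N axis.
    out = {key: metric.mean(dim=1)}
    # _std is added only when there is more than one sample per row
    # and the summary is enabled.
    if metric.shape[1] > 1 and not disable_summary:
        out[f"{key}_std"] = metric.std(dim=1)
    return out


torch.manual_seed(0)
assert set(summarize_metric_sketch(torch.rand(4, 3), "m")) == {"m", "m_std"}
assert set(summarize_metric_sketch(torch.rand(4, 1), "m")) == {"m"}
assert set(summarize_metric_sketch(torch.rand(4, 3), "m", disable_summary=True)) == {"m"}
```

One plausible reason for the N > 1 gate is that `torch.std` with its default correction returns NaN for a single element, so a `_std` entry would carry no information when N == 1.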

Run

pytest src/alpamayo_r1/metrics/test_distance_metrics.py -v

Pure torch, runs on CPU in seconds. Uses torch.manual_seed(0) for determinism.
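
As a small illustration of the determinism point, reseeding before generating inputs yields identical tensors on CPU. `make_inputs` and the tensor shape are hypothetical and are not taken from the test file.

```python
import torch


def make_inputs(seed=0):
    # Reseed the global generator so each call draws the same values.
    torch.manual_seed(seed)
    return torch.rand(2, 3)


a = make_inputs()
b = make_inputs()
assert torch.equal(a, b)
```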

Pins down the return-dict contract that PR NVlabs#86 spelled out in the
docstrings: compute_minade emits a `min_ade` headline key plus one
per-horizon `min_ade/by_t={H:.1f}` key per valid horizon, and
summarize_metric layers `_std` on top only when N > 1 and summarization
is enabled.

src/alpamayo_r1/metrics/test_distance_metrics.py:

compute_ade:
- Returns the documented [B, N, K] shape.
- Honors timestep_horizon by truncating the time axis (head-only mean
  differs from full-trajectory mean).

compute_minade:
- Emits all documented keys (`min_ade` + per-horizon + `_std` variants
  when N > 1).
- Skips horizons that exceed T.
- `disable_summary=True` suppresses `_std` entries.

summarize_metric:
- Adds `<key>_std` when N > 1.
- Does NOT add `_std` when N == 1 or disable_summary=True.
- Raises ValueError on inputs that are not [B, N].

Pure torch, no GPU/HF dependency. Runs on CPU in seconds.

Signed-off-by: lonexreb <[email protected]>