Add regression tests for compute_minade and summarize_metric keys #91
Open
lonexreb wants to merge 1 commit into NVlabs:main
Pins down the return-dict contract that PR NVlabs#86 spelled out in the docstrings: compute_minade emits a `min_ade` headline key plus one per-horizon `min_ade/by_t={H:.1f}` key per valid horizon, and summarize_metric layers `_std` on top only when N > 1 and summarization is enabled.

src/alpamayo_r1/metrics/test_distance_metrics.py:

compute_ade:
- Returns the documented [B, N, K] shape.
- Honors timestep_horizon by truncating the time axis (head-only mean differs from full-trajectory mean).

compute_minade:
- Emits all documented keys (`min_ade` + per-horizon + `_std` variants when N > 1).
- Skips horizons that exceed T.
- `disable_summary=True` suppresses `_std` entries.

summarize_metric:
- Adds `<key>_std` when N > 1.
- Does NOT add `_std` when N == 1 or disable_summary=True.
- Raises ValueError on inputs that are not [B, N].

Pure torch, no GPU/HF dependency. Runs on CPU in seconds.

Signed-off-by: lonexreb <[email protected]>
Why
PR #86 corrected the docstrings of `compute_minade`, `compute_grouped_corner_distance`, and `summarize_metric` to match the keys the functions actually return. Those docstrings now act as a public contract, but nothing pins that contract down. A future edit can quietly change the key names (or the gating on `_std`) and nothing fails.

This PR adds focused, dependency-light pytest cases that lock in the contract documented in #86.
What
`src/alpamayo_r1/metrics/test_distance_metrics.py`: 9 tests, no GPU / no HF auth required.

`compute_ade`:
- Returns the documented `[B, N, K]` shape.
- `timestep_horizon` actually truncates the time axis (head-only mean ≠ full-trajectory mean).

`compute_minade`:
- Emits `min_ade` plus one `min_ade/by_t={H:.1f}` per valid horizon (precision matches `f"{t * time_step:.1f}"`).
- Adds `_std` entries when `N > 1`.
- Skips horizons that exceed `T` (the `valid_horizons = [t for t in timestep_horizons if t <= T]` filter).
- `disable_summary=True` suppresses `_std`.

`summarize_metric`:
- Adds `<key>_std` when `N > 1` and `disable_summary=False`.
- No `_std` when `N == 1`.
- No `_std` when `disable_summary=True`, even with `N > 1`.
- Raises `ValueError` on inputs that are not `[B, N]`.

Run

Pure torch, runs on CPU in seconds. Uses `torch.manual_seed(0)` for determinism.
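To make the key contract listed under What concrete, here is a minimal pure-Python sketch of the key set a test could compare against. `expected_minade_keys` is a hypothetical helper, not part of `alpamayo_r1`, and the `time_step=0.1` default is an assumed value for illustration. The sketch also assumes that `_std` is appended to every emitted key, headline and per-horizon alike; the horizon filter and the `:.1f` formatting are taken from the description above.

```python
def expected_minade_keys(timestep_horizons, T, n_samples,
                         time_step=0.1, disable_summary=False):
    """Hypothetical helper: the key set the tests expect from compute_minade.

    Mirrors the contract described in this PR, not the library's code.
    time_step=0.1 is an assumed value used only for illustration.
    """
    # Horizons longer than the trajectory are skipped (filter quoted in the PR).
    valid_horizons = [t for t in timestep_horizons if t <= T]

    # Headline key plus one per-horizon key, formatted to one decimal place.
    keys = ["min_ade"]
    keys += [f"min_ade/by_t={t * time_step:.1f}" for t in valid_horizons]

    # Assumption: `_std` variants exist for every key, and only when there is
    # more than one sample and summarization is enabled.
    if n_samples > 1 and not disable_summary:
        keys += [f"{k}_std" for k in keys]
    return set(keys)


if __name__ == "__main__":
    # Horizon 50 exceeds T=40, so no `by_t=5.0` key is expected.
    for key in sorted(expected_minade_keys([10, 30, 50], T=40, n_samples=4)):
        print(key)
```

A test in this style would then assert `set(result.keys()) == expected_minade_keys(...)`, so a renamed key or a change to the `_std` gating fails loudly instead of passing unnoticed.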