
Add trajectory smoothness metrics for evaluation#102

Open
lonexreb wants to merge 1 commit into NVlabs:main from lonexreb:feat/add-trajectory-smoothness-metrics

Conversation

@lonexreb
Contributor

@lonexreb lonexreb commented May 4, 2026

Why

ADE / FDE answer "is the trajectory accurate?" Smoothness metrics answer "is the trajectory drivable?" — and right now there is no easy way to ask the second question during evaluation.

The kinematic derivations already exist in finetune/rl/rewards/comfort_reward.py, but they're wrapped in a within-bound boolean used as an RL reward. A researcher who wants to compare two checkpoints on "did this model produce smoother trajectories?" has no metric for that. Several open issues mention model-output quality (e.g. #10 reports occasional erratic predictions) — smoothness numbers help triage that exact class of failure.

What

New module src/alpamayo_r1/metrics/smoothness_metrics.py exposing:

compute_smoothness_metrics(
    pred_xyz: Tensor,          # [B, N, K, T, 3]
    pred_rot: Tensor,          # [B, N, K, T, 3, 3]
    disable_summary: bool = False,
    planning_freq_hz: float = 10.0,
) -> dict[str, Tensor]

Returns per-batch tensors of shape [B] with both the RMS (typical signal level) and the absolute peak (worst-case) of:

  • smoothness/jerk_lon (longitudinal jerk, m/s³)
  • smoothness/accel_lon (longitudinal acceleration, m/s²)
  • smoothness/accel_lat (lateral acceleration, m/s²)
  • smoothness/yaw_rate (rad/s)
  • smoothness/yaw_accel (rad/s²)

Each signal yields a <key>_rms and a <key>_max entry. _std variants are added by summarize_metric when N > 1, exactly mirroring compute_minade / compute_minfde, so dashboards already grouping on *_std pick them up automatically.
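As a rough sketch of that reduction (how the N sample and K mode axes are collapsed to a per-batch value is an assumption here, not something the PR text pins down):

```python
import numpy as np


def summarize(signal: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Reduce a per-timestep signal [B, N, K, T] to per-batch _rms / _max.

    Averaging over N (samples) and K (modes) is illustrative only; the
    PR's summarize_metric may reduce those axes differently.
    """
    rms = np.sqrt((signal ** 2).mean(axis=-1))   # [B, N, K]
    peak = np.abs(signal).max(axis=-1)           # [B, N, K]
    return rms.mean(axis=(1, 2)), peak.mean(axis=(1, 2))  # each [B]
```

A signal of [3, -4, 0, 0] then reports rms 2.5 (typical level) but peak 4.0 (worst case), which is exactly why both are worth logging.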

Also exposes gather_dynamics(pred_xyz, pred_rot, planning_freq_hz) for callers that want the raw per-timestep signals.
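The raw-signal path can be illustrated with a self-contained finite-difference sketch (a simplification of what gather_dynamics presumably computes; the real module operates on [B, N, K, T, 3] tensors and uses pred_rot to split longitudinal from lateral components):

```python
import numpy as np


def longitudinal_signals(xyz: np.ndarray, planning_freq_hz: float = 10.0):
    """Finite-difference speed / accel / jerk along a single [T, 3] path.

    Simplified: uses path speed as the longitudinal component instead of
    projecting velocity onto the heading from the rotation matrices.
    """
    dt = 1.0 / planning_freq_hz
    vel = np.diff(xyz, axis=0) / dt        # [T-1, 3] velocity vectors
    speed = np.linalg.norm(vel, axis=-1)   # [T-1]   longitudinal speed
    accel = np.diff(speed) / dt            # [T-2]
    jerk = np.diff(accel) / dt             # [T-3]
    return speed, accel, jerk
```

A straight line sampled at 0.5 m per 0.1 s tick then reproduces the first test invariant below: speed 5 m/s, zero acceleration, zero jerk.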

Design notes

  • Pure additive change. comfort_reward.py is untouched. The small _diff / _diff_yaw helpers are deliberately duplicated so the metrics layer doesn't take a dependency on the finetune.rl package. The module docstring calls out the intentional duplication and the reason.
  • Yaw wraparound is handled so a heading transition across ±π does not report the spurious ~2π·freq_hz spike a naive diff would yield.
  • Both peak and RMS are reported because they answer different questions: peak captures worst-case discomfort, RMS captures sustained effort; we have seen models that are good on one and bad on the other.
  • Shape validation up front (pred_xyz must be [B, N, K, T, 3], pred_rot must be [B, N, K, T, 3, 3] with matching leading dims), raising ValueError rather than failing deep in a tensor op.
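The wraparound handling in the second note can be sketched as follows (a hypothetical helper; the PR's _diff_yaw may differ in detail):

```python
import numpy as np


def yaw_rate(yaw: np.ndarray, planning_freq_hz: float = 10.0) -> np.ndarray:
    """Finite-difference yaw rate with each step wrapped into [-pi, pi)."""
    d = np.diff(yaw)
    d = (d + np.pi) % (2.0 * np.pi) - np.pi  # wrap the step, not the heading
    return d * planning_freq_hz
```

A heading sequence stepping 0.06 rad per 10 Hz tick across +π then reports a steady 0.6 rad/s, instead of the ~2π·10 ≈ 62 rad/s spike a naive diff would produce at the wrap.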

Tests

src/alpamayo_r1/metrics/test_smoothness_metrics.py — 9 pytest cases, no GPU / no HF needed. Verified locally:

PASS: constant velocity gives v_lon=5, zero everywhere else
PASS: constant accel ax=2 (interior mean = 2.000000)
PASS: yaw wraparound (max |yaw_rate| = 0.400000, NOT ~62)
PASS: emits 10 expected keys, shape [B]
PASS: _std variants when N>1
PASS: disable_summary drops _std
PASS: constant-velocity reports near-zero on all signals
PASS: rejects xyz wrong shape
PASS: rejects rot wrong shape

Test fixtures use float64 to keep finite-difference noise at machine precision (twice-differentiating a 0.1 s step in float32 accumulates ~1e-4 jerk noise that would force loose tolerances and make the contract less crisp).
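The precision argument is easy to reproduce (a standalone numpy demo, not code from the PR): triple-differencing a quadratic position trace at 10 Hz should give exactly zero jerk, and float64 keeps the residual near machine precision while float32 does not.

```python
import numpy as np


def jerk_noise(dtype) -> float:
    """Max |jerk| from triple-differencing an analytically jerk-free path."""
    dt = np.asarray(0.1, dtype=dtype)
    t = np.arange(50, dtype=dtype) * dt
    x = t ** 2                              # constant accel -> zero jerk
    return float(np.abs(np.diff(x, n=3) / dt ** 3).max())
```

On a typical run jerk_noise(np.float64) sits many orders of magnitude below jerk_noise(np.float32), which is why the fixtures pin float64 rather than loosening the tolerances.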

Migration

None. New module, new file, new tests. Existing call sites untouched.

Signed-off-by: lonexreb <[email protected]>
