Add Final Displacement Error (FDE) and minFDE metrics #101

Open
lonexreb wants to merge 1 commit into NVlabs:main from lonexreb:feat/add-fde-and-minfde-metrics

lonexreb commented May 4, 2026

Why

alpamayo_r1.metrics.distance_metrics already implements ADE / minADE / corner_distance, but the standard companion metric for trajectory prediction — Final Displacement Error — is missing.

ADE measures average L2 error along the trajectory; FDE measures L2 error at the last timestep. Reporting one without the other distorts the picture in two opposite ways:

  • A model that drifts early but recovers looks worse under ADE than under FDE.
  • A model that nails the early steps but veers off at the end looks better under ADE than under FDE.
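
To make the two cases concrete, here is a small pure-Python sketch on made-up 2D trajectories. The helper names `ade` and `fde` and all the numbers are illustrative only; they are not the alpamayo_r1 functions, which operate on batched torch tensors.

```python
import math

def ade(pred, gt):
    """Average L2 distance over all timesteps."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)

def fde(pred, gt):
    """L2 distance at the last timestep only."""
    return math.dist(pred[-1], gt[-1])

# Ground truth: a straight line along x, one point per timestep.
gt = [(float(t), 0.0) for t in range(1, 5)]

# Drifts laterally early, then recovers by the final step.
drift_then_recover = [(1.0, 2.0), (2.0, 2.0), (3.0, 1.0), (4.0, 0.0)]

# Tracks the early steps, then veers off at the end.
veer_at_end = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 3.0)]

print(ade(drift_then_recover, gt), fde(drift_then_recover, gt))  # 1.25 0.0
print(ade(veer_at_end, gt), fde(veer_at_end, gt))                # 0.75 3.0
```

In this toy example the first model has the larger ADE but a zero FDE, while the second has the smaller ADE but the larger FDE, so ranking by either metric alone would give opposite conclusions.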

Every major trajectory-prediction benchmark (Waymo Open Motion, nuScenes, Argoverse) reports both. Adding FDE here closes a real evaluation gap and aligns Alpamayo's metric outputs with the literature.

What

Two new functions in src/alpamayo_r1/metrics/distance_metrics.py that mirror compute_ade / compute_minade exactly — same shapes, same kwargs, same error semantics.

compute_fde(pred_xyz, gt_xyz, timestep_horizon=None, only_xy=True) -> torch.Tensor

Returns [B, N, K]. timestep_horizon=H evaluates FDE at the H-th timestep (1-indexed) instead of the final one — useful for reporting FDE@3s. Raises ValueError on timestep_horizon > T (mirroring compute_ade).
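
The horizon and `only_xy` semantics described above can be sketched for a single trajectory in plain Python. This is a hypothetical stand-in, not the PR's implementation: `fde_single` is an invented name, it takes lists of (x, y, z) tuples rather than batched tensors, and it does not handle horizons below 1 because the description does not say how those are treated.

```python
import math

def fde_single(pred, gt, timestep_horizon=None, only_xy=True):
    """FDE for one predicted trajectory against one ground-truth trajectory.

    pred, gt: sequences of (x, y, z) points of equal length T.
    """
    T = len(gt)
    if timestep_horizon is None:
        timestep_horizon = T  # default: evaluate at the final timestep
    if timestep_horizon > T:
        raise ValueError(f"timestep_horizon={timestep_horizon} exceeds T={T}")
    idx = timestep_horizon - 1  # 1-indexed horizon -> 0-indexed position
    dims = 2 if only_xy else 3  # drop z when only_xy is set
    return math.dist(pred[idx][:dims], gt[idx][:dims])

gt = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
pred = [(1.0, 0.0, 0.0), (2.0, 3.0, 4.0), (3.0, 1.0, 0.0)]

print(fde_single(pred, gt))                                     # 1.0
print(fde_single(pred, gt, timestep_horizon=2))                 # 3.0
print(fde_single(pred, gt, timestep_horizon=2, only_xy=False))  # 5.0
```

The last two calls show why the 3D value can never be smaller than the XY projection: adding the z component can only increase the norm.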

compute_minfde(pred_xyz, gt_xyz, disable_summary=False, timestep_horizons=[5, 10, 30, 50], only_xy=True, time_step=0.1) -> dict[str, Tensor]

Returns per-batch tensors of shape [B]:

  • min_fde: K-min FDE at the final timestep, averaged over the N groups.
  • min_fde/by_t={H:.1f}: K-min FDE at the timestep int(H / time_step), for each valid H in timestep_horizons. Out-of-range horizons are silently dropped (mirrors compute_minade).
  • <key>_std: stdev across the N groups, added by summarize_metric when N > 1 and disable_summary=False.
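
The reduction order behind `min_fde` and its `_std` companion can be sketched for one batch element as follows. This is an illustration under assumptions: `min_fde_summary` is a hypothetical helper, the input is a nested list of N groups with K FDE values each, and the sample standard deviation is used because the description does not say which estimator summarize_metric applies.

```python
from statistics import mean, stdev

def min_fde_summary(fde_nk, disable_summary=False):
    """fde_nk: N groups, each a list of K per-sample FDE values."""
    per_group_min = [min(group) for group in fde_nk]  # K-min within each group
    out = {"min_fde": mean(per_group_min)}            # average over the N groups
    if not disable_summary and len(per_group_min) > 1:
        # Assumption: sample (n - 1) standard deviation across the N groups.
        out["min_fde_std"] = stdev(per_group_min)
    return out

fde_nk = [[2.0, 1.0, 3.0], [4.0, 5.0, 3.0]]  # N = 2 groups, K = 3 samples
print(min_fde_summary(fde_nk)["min_fde"])                 # 2.0
print(min_fde_summary(fde_nk, disable_summary=True))      # {'min_fde': 2.0}
```

Taking the minimum over K before averaging over N means each group is scored by its best sample, and the `_std` entry then describes how much that best-sample score varies between groups.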

Key naming is parallel to compute_minade so dashboards / loggers that already group on min_ade* automatically pick up min_fde*.

Tests

src/alpamayo_r1/metrics/test_fde.py — 11 pytest cases, no GPU / no HF auth:

  • compute_fde shape, non-negativity, manual final-step L2 equivalence.
  • only_xy=False uses the 3D vector and returns a value >= the XY-projection FDE.
  • timestep_horizon=H selects index H-1.
  • timestep_horizon > T raises ValueError.
  • FDE is independent of intermediate timesteps (perturbing them by +100 leaves FDE unchanged).
  • FDE differs from ADE for non-trivial trajectories (sanity).
  • compute_minfde emits all documented keys + _std variants when N > 1.
  • Out-of-range horizons in compute_minfde are silently skipped.
  • disable_summary=True drops _std.
  • min_fde equals compute_fde(...).min(K).mean(N).

Local verification

Validated against torch (no GPU/HF needed) by exec'ing the new functions in isolation — all 6 core invariants pass. Output:

PASS: compute_fde shape (torch.Size([2, 3, 4])) and non-negative
PASS: compute_fde matches manual L2 at final step
PASS: compute_fde ignores intermediate timesteps
PASS: compute_minfde emits all documented keys + _std variants
PASS: compute_minfde silently skips out-of-range horizons
PASS: compute_fde rejects horizon > T

Migration

None. This is purely additive — existing call sites that import compute_ade / compute_minade are untouched.

ADE / minADE measure average L2 error along the trajectory, but the
companion metric for trajectory prediction -- Final Displacement Error
-- was missing. FDE is the L2 distance between predicted and
ground-truth positions at the *last* timestep, and minFDE is its K-min.
Reported without FDE, ADE makes models that drift early but recover
look worse than their endpoint accuracy warrants, and makes models that
nail the early steps but veer off at the end look better.

This PR adds two functions to src/alpamayo_r1/metrics/distance_metrics.py
that mirror the existing compute_ade / compute_minade APIs exactly:

- compute_fde(pred_xyz, gt_xyz, timestep_horizon=None, only_xy=True)
  -> [B, N, K] tensor of FDE per sample. timestep_horizon=H reports
  FDE at the H-th timestep (1-indexed) instead of the final one,
  enabling FDE@3s and similar single-horizon reports.

- compute_minfde(pred_xyz, gt_xyz, disable_summary=False,
  timestep_horizons=[5, 10, 30, 50], only_xy=True, time_step=0.1)
  -> dict[str, Tensor] with `min_fde` plus `min_fde/by_t={H:.1f}` per
  valid horizon, plus `_std` variants from summarize_metric when N > 1.
  Mirrors compute_minade key naming exactly.

compute_fde raises ValueError on out-of-range horizons, while
compute_minfde silently skips them. Both match the existing ADE
behavior, so callers can swap with no surprises.

Also adds src/alpamayo_r1/metrics/test_fde.py with 11 pytest cases
covering: shape, non-negativity, manual L2 equivalence, only_xy=False
3D path, horizon selection at index t-1, ValueError on horizon > T,
independence from intermediate timesteps, key emission and _std gating
in compute_minfde, and that minFDE actually takes the K-min.

Verified locally against torch (without GPU/HF) by exec'ing the new
functions in isolation -- all 6 functional invariants pass.

Signed-off-by: lonexreb <[email protected]>