Official implementation of "Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion" (ICRA 2026).
MIMIC is a goal-free, long-context sidewalk navigation policy: it takes a short history of RGB frames and predicts a local trajectory for autonomous sidewalk driving.
The pretrained MIMIC policy is released in the `UCLA-VAIL/Navigation-Model-Zoo-Public` model zoo, exported to ONNX (`mimic.onnx`) behind a unified inference interface.
| Property | Value |
|---|---|
| Goal mode | goal-free |
| Context | 16 RGB frames |
| Input resolution | 288 × 512 (fixed — frames must already be this size; not resized internally) |
| Normalization | none — pixel values in [0, 1] |
| Input rate | 5 Hz |
| Output | 15 waypoints (x, y, yaw) at non-uniform timestamps 0.2 s–5.0 s; the wrapper keeps the first 13 (~4 s) |
| Frame | standard: x = forward, y = left, meters |
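The table, together with the `(1, 16, 3, 288, 512)` observation shape used in the quick start, fixes the input layout. The sketch below keeps a rolling history of camera frames and packs it into that layout. It assumes frames arrive as `uint8` RGB arrays already at 288 × 512 and that `push` is called at roughly 5 Hz; `FrameHistory` is a hypothetical helper, not part of the model zoo.

```python
from collections import deque

import numpy as np

CONTEXT = 16               # frames of history, per the table
HEIGHT, WIDTH = 288, 512   # fixed input size; the model does not resize internally


class FrameHistory:
    """Hypothetical rolling buffer of the last CONTEXT RGB frames, oldest first."""

    def __init__(self, context: int = CONTEXT):
        self.frames = deque(maxlen=context)  # oldest frame drops out automatically

    def push(self, frame_uint8: np.ndarray) -> None:
        """Append one (H, W, 3) uint8 RGB frame; intended to be called at ~5 Hz."""
        if frame_uint8.shape != (HEIGHT, WIDTH, 3):
            raise ValueError(
                f"expected {(HEIGHT, WIDTH, 3)}, got {frame_uint8.shape}; resize before pushing"
            )
        self.frames.append(frame_uint8)

    def ready(self) -> bool:
        return len(self.frames) == self.frames.maxlen

    def to_obs(self) -> np.ndarray:
        """Return (1, CONTEXT, 3, H, W) float32 in [0, 1], ordered [t-15, ..., t]."""
        if not self.ready():
            raise RuntimeError("need a full frame history before inference")
        stack = np.stack(list(self.frames), axis=0)       # (T, H, W, 3) uint8
        stack = stack.transpose(0, 3, 1, 2)               # (T, 3, H, W), channels first
        return (stack.astype(np.float32) / 255.0)[None]   # scale to [0, 1], add batch dim
```

Dividing by 255 assumes 8-bit input; no mean/std normalization is applied, matching the "none" entry in the table.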
```bash
pip install -U "huggingface_hub[cli]"
hf download UCLA-VAIL/Navigation-Model-Zoo-Public --include "MIMIC/*" --local-dir ./nav_model_zoo
pip install onnxruntime-gpu numpy torch pyyaml  # use onnxruntime instead for CPU-only
```
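After installing, a quick check can confirm which ONNX Runtime build is active and what the exported graph expects. This is a sketch: `pick_device` and `describe_model` are hypothetical helpers, and the path `./nav_model_zoo/MIMIC/mimic.onnx` is an assumption based on the `--include` and `--local-dir` flags above.

```python
import importlib.util
import os


def pick_device() -> str:
    """Return "cuda" if the installed ONNX Runtime build ships the CUDA provider, else "cpu"."""
    if importlib.util.find_spec("onnxruntime") is None:
        raise RuntimeError("onnxruntime is not installed")
    import onnxruntime as ort
    # Listing CUDAExecutionProvider means the build supports CUDA; it does not
    # guarantee that a GPU and matching CUDA libraries are present at runtime.
    return "cuda" if "CUDAExecutionProvider" in ort.get_available_providers() else "cpu"


def describe_model(path: str = "./nav_model_zoo/MIMIC/mimic.onnx") -> None:  # assumed location
    """Print the name, shape, and element type of every graph input and output."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    import onnxruntime as ort
    sess = ort.InferenceSession(path, providers=["CPUExecutionProvider"])
    for kind, args in (("input", sess.get_inputs()), ("output", sess.get_outputs())):
        for arg in args:
            print(kind, arg.name, arg.shape, arg.type)
```

The result of `pick_device()` can be passed as `MIMICNavigator(device=...)` instead of hard-coding it, and the shapes printed by `describe_model()` can be compared against the table above.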
⚠️ `urbansim` is required at import time: `MIMIC/inference.py` runs `from urbansim.custom.pp import PurePursuitController` at module load, so the `urbansim` package must be importable. The import is only used to construct a helper that the shipped inference path never calls, so if you don't have `urbansim`, comment out that import to run pure ONNX inference.
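Instead of commenting the import out, one option is to guard it so the module loads with or without `urbansim`. This is a suggested local edit to the top of `MIMIC/inference.py`, not part of the released file, and it assumes nothing else at import time depends on `PurePursuitController`.

```python
# Suggested replacement for the unconditional import at the top of MIMIC/inference.py.
try:
    from urbansim.custom.pp import PurePursuitController
except ImportError:  # also catches ModuleNotFoundError when urbansim is absent
    # Only a helper that the shipped inference path never calls uses this class,
    # so ONNX inference still works without it.
    PurePursuitController = None
```

If that helper is ever invoked and instantiates the class, it fails with a `TypeError` on `None`, which makes the missing dependency easy to spot.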
```python
import numpy as np
from MIMIC.inference import MIMICNavigator  # run from ./nav_model_zoo

nav = MIMICNavigator(device="cuda")  # device="cpu" if no GPU

# obs: the robot's last 16 RGB frames at 288×512, (1, 16, 3, 288, 512) float32 in [0, 1]
obs = np.random.rand(1, 16, 3, 288, 512).astype(np.float32)

# MIMIC is goal-free: no goal argument
traj, scores = nav.inference_trajectory(obs)  # (1, 1, 13, 2) meters
vw, best = nav.inference_vw(obs)              # vw: (1, 2) = [v, ω]; best: (1, 13, 2)

nav.reset()  # clear PD smoothing between episodes
```

`inference_trajectory` returns local waypoints in meters; `inference_vw` turns the trajectory into a `[linear_v, angular_ω]` command via a built-in PD controller (tune limits with `max_v` / `max_w` at construction). Feed frames in temporal order `[t-15, …, t]`.
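The waypoint-to-command conversion happens inside `inference_vw`, and its gains and lookahead are not documented here. As an illustration only, the sketch below shows one common way a PD heading tracker can turn a `(13, 2)` local trajectory (x forward, y left, meters) into `[v, ω]`. The class name, gains, lookahead index, and lookahead time are all hypothetical; this is not the shipped controller.

```python
import numpy as np


class SketchPDFollower:
    """Hypothetical PD heading tracker; NOT the controller inside MIMICNavigator."""

    def __init__(self, max_v=1.0, max_w=1.0, kp=1.5, kd=0.2,
                 lookahead_idx=4, lookahead_time_s=1.0, dt=0.2):
        self.max_v, self.max_w = max_v, max_w       # limits, analogous to max_v / max_w
        self.kp, self.kd = kp, kd                   # hypothetical gains
        self.lookahead_idx = lookahead_idx          # waypoint to steer toward (assumption)
        self.lookahead_time_s = lookahead_time_s    # assumed time of that waypoint
        self.dt = dt                                # control period: 0.2 s at 5 Hz
        self.prev_err = 0.0

    def reset(self):
        """Clear the derivative state between episodes."""
        self.prev_err = 0.0

    def __call__(self, waypoints):
        """waypoints: (N, 2) array of (x forward, y left) in meters. Returns [[v, w]]."""
        wp = np.asarray(waypoints, dtype=np.float64)
        tx, ty = wp[min(self.lookahead_idx, len(wp) - 1)]
        err = float(np.arctan2(ty, tx))             # heading error; positive = target to the left
        d_err = (err - self.prev_err) / self.dt     # finite-difference derivative
        self.prev_err = err
        w = float(np.clip(self.kp * err + self.kd * d_err, -self.max_w, self.max_w))
        # Speed follows how far ahead the policy wants to go, reduced on sharp turns.
        dist = float(np.hypot(tx, ty))
        v = min(dist / self.lookahead_time_s, self.max_v) * max(np.cos(err), 0.0)
        return np.array([[v, w]])
```

For example, `SketchPDFollower()(traj[0, 0])` applies it to the trajectory from `inference_trajectory`. It can help when reasoning about how `max_v` / `max_w` clip the built-in controller's output, but the shipped gains may behave differently.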
If you find MIMIC helpful for your research, please cite:
```bibtex
@article{he2026learning,
  title={Learning Sidewalk Autopilot from Multi-Scale Imitation with Corrective Behavior Expansion},
  author={He, Honglin and Ma, Yukai and Squicciarini, Brad and Wu, Wayne and Zhou, Bolei},
  journal={arXiv preprint arXiv:2603.22527},
  year={2026}
}
```