This repository provides training and evaluation scripts for SARM on both the LeRobot dataset and raw robot trajectories.
We recommend using uv for dependency management.
git clone https://github.com/xdofai/opensarmpip install uvuv syncsource .venv/bin/activatepython train.py --config-name sarmpython eval.py --config-name sarmpython eval.py --config-name sarm --mode raw_data- Replace
sarmwithrewindin the commands above if you want to run the baseline method. - All configs are stored under the
config/directory.
We use a modified LeRobotDataset structure:
A typical LeRobotDataset looks like this from its root path:
.
├── data
│ ├── chunk-000
│ │ ├── episode_000000.parquet
│ │ ├── episode_000001.parquet
│ │ ├── episode_000002.parquet
│ │ └── ...
│ ├── chunk-001
│ │ ├── episode_001000.parquet
│ │ ├── episode_001001.parquet
│ │ ├── episode_001002.parquet
│ │ └── ...
│ └── ...
├── meta
│ ├── episodes.jsonl
│ ├── info.json
│ ├── stats.json
│ └── tasks.jsonl
└── videos
├── chunk-000
│ ├── left_camera-images-rgb
│ │ ├── episode_000000.mp4
│ │ ├── episode_000001.mp4
│ │ ├── episode_000002.mp4
│ │ └── ...
| ├── right_camera-images-rgb
│ │ ├── episode_000000.mp4
│ │ ├── episode_000001.mp4
│ │ ├── episode_000002.mp4
│ │ └── ...
| ├── top_camera-images-rgb
│ │ ├── episode_000000.mp4
│ │ ├── episode_000001.mp4
│ │ ├── episode_000002.mp4
│ │ └── ...
├── chunk-001
└── ...Each episode_XXXXXX.parquet stores a time-series trajectory.
Each row corresponds to one timestep and contains the following columns:
| Column | Type | Shape | Dtype | Description |
|---|---|---|---|---|
state |
np.ndarray |
(state_dim,) | float64 | Robot state vector at time t. |
actions |
np.ndarray |
(act_dim,) | float64 | Action vector applied at time t. |
reward |
np.ndarray |
(1,) | float32 | Scalar progress at time t. |
timestamp |
np.ndarray |
(1,) | float64 | Timestamp (seconds). |
frame_index |
np.ndarray |
(1,) | int64 | Frame id in the video stream. |
episode_index |
np.ndarray |
(1,) | int64 | Episode id (redundant but convenient for joins). |
index |
np.ndarray |
(1,) | int64 | Global step index (or row index). |
task_index |
np.ndarray |
(1,) | int64 | Task id (maps to meta/tasks.jsonl). |
Note: Despite its name, reward stores the absolute task progress (value) rather than a reinforcement learning reward.
- The repository structure is adapted from diffusion_policy.
- The dataset format follows the LeRobotDataset specification.
- The ReWiND model included in this repository is our reproduction from the paper; please refer to the official ReWiND repository for the original implementation.
If you find our paper or code is useful, please consider citing:
@article{chen2025sarm,
title={SARM: Stage-Aware Reward Modeling for Long Horizon Robot Manipulation},
author={Chen, Qianzhong and Yu, Justin and Schwager, Mac and Abbeel, Pieter and Shentu, Yide and Wu, Philipp},
journal={arXiv preprint arXiv:2509.25358},
year={2025}
}
