This repository contains implementations of the algorithms presented in Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning. For videos and key insights, please check out our website.
- Installation
- Camera Calibration
- Recording Demonstrations
- Post-processing Demonstrations
- Training
- Testing
- Acknowledgments
- Citation
This repository is structured as follows:
├── mt_pi/ # Main package directory
│ ├── dataset/ # Data processing and visualization utilities
│ │ ├── process_robot_dataset.py # Robot demo processing
│ │ ├── process_hand_dataset.py # Human demo processing
│ ├── envs/ # Robot environment interfaces
│ ├── models/ # Model architectures and training utilities
│ │ ├── diffusion_policy.py # Main diffusion policy model
│ │ ├── keypoint_map_predictor.py # Keypoint prediction model
│ ├── scripts/ # Training and evaluation scripts
│ │ ├── train_mtpi.py # Main training script
│ │ ├── eval_mtpi.py # Evaluation script
│ │ ├── record_robot_demos.py # Robot demo recording
│ │ ├── record_human_demos.py # Human demo recording
│ │ └── ... # Additional scripts
│ ├── utils/ # Utility functions and helpers
│ └── third_party/ # Third-party dependencies
├── install_deps.sh # Dependency installation script
└── set_env.sh # Environment setup script
Please run
source install_deps.sh
to set up all dependencies needed for this project.
Note that this script only needs to be run once. For subsequent uses, run source set_env.sh once per shell before running any scripts from this repo.
Note
Our real-robot setup follows that of SPHINX. Specifically, see this Google Doc for a walkthrough of collecting data with the point cloud interface, and this link for setting up the interface between the robot and the workstation.
Well-calibrated cameras are crucial for accurate triangulation used by MT-π. We provide a script for camera calibration with a Franka Panda.
Important
You may have to adjust the self.home variable in envs/minrobot/controller.py depending on the version of your robot.
We will use a yellow-colored cube for calibration. You may change the color as you wish, but in general, try picking a color with maximum contrast against your background.
- Run python scripts/calibrate.py
- Follow the instructions on the terminal to place the cube as close to the gripper root as possible, then press Enter to close.
- After initial calibration, the associated error should be on the order of 1e-3 (see the sketch after this list for how such an alignment error can be computed). If not, it's possible that a) one of the cameras detected an object in the background instead of the cube itself, or b) one or both of the cameras are flipped.
- Continue following the instructions on the terminal to finish calibration and alignment of the two point clouds.
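For intuition only, the alignment error reported above amounts to the residual of a rigid point-cloud registration. The sketch below is not the repo's implementation and uses synthetic stand-in data; it just shows how such a residual can be computed with a standard Kabsch/SVD fit between cube positions seen by a camera and the corresponding gripper positions reported by the robot.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (Kabsch/SVD) mapping src -> dst, both (N, 3)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic stand-ins: cube centers detected in one camera's frame and the
# matching gripper positions from the robot (names are illustrative only).
rng = np.random.default_rng(0)
cube_cam = rng.random((20, 3))
gripper_pos = cube_cam + np.array([0.1, -0.2, 0.3]) + 1e-3 * rng.standard_normal((20, 3))

R, t = rigid_align(cube_cam, gripper_pos)
residual = np.linalg.norm((cube_cam @ R.T + t) - gripper_pos, axis=1).mean()
print(f"mean alignment error: {residual:.1e}")  # a well-calibrated setup lands around 1e-3
```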
To record robot demonstrations, run
python scripts/record_robot_demos.py --data_folder data/robot_<task_name>
For robot data collection, we use a combination of waypoint interpolation via a UI and fine-grained control via a spacemouse, as shown in SPHINX. For information on how to collect data using this interface, please refer to this Google Doc.
Tip
The right button on the spacemouse is currently mapped to "End Demo". You can choose to use that, or the button on the interface at the end of each trajectory.
To record human demonstrations, run
python scripts/record_human_demos.py --data_folder data/hand_<task_name>
The cameras will first warm up, then a prompt will appear on the terminal for you to press Enter to start recording. To speed up human demonstration collection, we have added two prompts at the end of each trajectory:
- The first asks if you want to save the demo
- The second asks if you would like to save the gif visualization
By default (triggered when only the Enter key is pressed), the first is True and the second is False (we save the human demonstration but skip the gif-generation process). In other words, if you are happy with your trajectory, pressing Enter twice will start the next recording.
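The prompt flow is roughly equivalent to the sketch below (illustrative only; the actual prompt strings and variable names in record_human_demos.py may differ):

```python
# Illustrative sketch of the end-of-trajectory prompts; pressing Enter alone
# keeps the defaults: save the demo, skip the gif.
save_demo = input("Save this demo? [Y/n] ").strip().lower() not in ("n", "no")
save_gif = input("Save gif visualization? [y/N] ").strip().lower() in ("y", "yes")

if save_demo:
    print("saving demo ...")    # placeholder for writing the trajectory to --data_folder
if save_gif:
    print("rendering gif ...")  # placeholder for the (slower) gif-generation step
```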
Tip
Processing human demonstrations, as described below, is a multi-step process. Since the post-processing step passes the demonstrations through multiple pre-trained models (HaMeR, SAM2), we can improve efficiency by saving chunks of the demonstration data to sequential directories, e.g. hand_<task_name>_1, hand_<task_name>_2, etc.
To process the robot demonstrations, simply run
python dataset/process_robot_dataset.py --demo_dir data/robot_<task_name>
This will create a corresponding directory called data/robot_<task_name>_tracks which contains both train and validation directories.
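To sanity-check the processed output before training, you can open the resulting dataset.zarr directly with zarr. The exact keys stored inside depend on the repo version, so the sketch below simply lists whatever is there:

```python
import zarr

# Quick sanity check on a processed dataset; replace <task_name> with your task.
root = zarr.open("data/robot_<task_name>_tracks/train/dataset.zarr", mode="r")
for name, arr in root.arrays():
    print(name, arr.shape, arr.dtype)
for name, group in root.groups():
    print("group:", name, list(group.array_keys()))
```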
Human demonstrations are processed in a multi-step fashion:
- Extraction of human hand poses
- Extraction of bounding boxes around objects of interest
- Extraction of grasp information
First, to extract the hand pose using HaMeR, run
python dataset/process_human_datasets.py --demo_dirs data/hand_<task_name>
Like the robot demonstrations, this will create a corresponding directory called data/hand_<task_name>_tracks.
Next, we need to produce a segmentation of the object in order to extract the grasp pose. Begin by extracting the first frame of each episode via
python dataset/extract_first_frames.py --data_dirs data/hand_<task_name>_tracks
In most cases, running Grounding-DINO is sufficient to provide a good enough bounding box for SAM2. To get those, run
python dataset/select_points.py --npz_path data/hand_<task_name>_tracks/train/first_frames.npz --use_dino --object_name carrot
Otherwise, you can manually select points on the manipulated object using the same script without the --use_dino flag. In the cv2 window that pops up, press u to undo a click and n to go to the next image.
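For reference, the manual selection loop behaves roughly like the cv2 sketch below (illustrative only; select_points.py implements its own version of this interaction):

```python
import cv2
import numpy as np

points = []

def on_click(event, x, y, flags, param):
    # Record a point on left mouse click.
    if event == cv2.EVENT_LBUTTONDOWN:
        points.append((x, y))

image = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a first frame
cv2.namedWindow("select")
cv2.setMouseCallback("select", on_click)
while True:
    vis = image.copy()
    for p in points:
        cv2.circle(vis, p, 4, (0, 255, 255), -1)
    cv2.imshow("select", vis)
    key = cv2.waitKey(30) & 0xFF
    if key == ord("u") and points:  # undo the last click
        points.pop()
    elif key == ord("n"):           # done with this image
        break
cv2.destroyAllWindows()
```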
At the end of the point selection process, you should end up with a new file at data/hand_<task_name>_tracks/train/first_frames_points.npz. Finally, we extract the grasp information using
python dataset/postprocess_with_sam.py -d data/hand_<task_name>_tracks
We provide additional scripts to help visualize the results of this post-processing. It is recommended to run these before training at least once to check that calibration is done properly for robot demonstrations and that tracks/grasp information is correct for human demonstrations.
python dataset/image_track_visualizer.py --data_paths [data/robot_<task_name>_tracks/val/dataset.zarr, data/hand_<task_name>_tracks/val/dataset.zarr]
The keypoint predictor is a small MLP that maps from noise to human keypoints to bridge the representation gap between robot and human end-effector poses. Train this first via
python scripts/train_keypoint_predictor.py --data_path [hand_<task_name>_tracks]
Then, point the keypoint_map_from variable in models/diffusion_policy.py to the final keypoint predictor checkpoint.
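The updated variable would end up looking something like the line below (the path shown is hypothetical; use the checkpoint produced by your own training run):

```python
# In models/diffusion_policy.py (hypothetical checkpoint path, for illustration only)
keypoint_map_from = "experiment_logs/MMDD/<timestamp>_<uuid>/checkpoints/last.ckpt"
```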
To train the main policy, run
python scripts/train_mtpi.py --data_path [<insert_data_paths>]
To test a trained policy on the real robot, first make sure that the log directory containing config.yaml, dataset_stats/, and checkpoints/ is on the workstation. Note that if this policy is trained with the keypoint predictor, that checkpoint will also need to be copied to the same relative path as during training (or updated in config.yaml).
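Concretely, the directory copied to the workstation should look roughly like the layout below (the MMDD/<timestamp>_<uuid> naming mirrors the experiment_logs/ paths referenced later in this section):

```
experiment_logs/
└── MMDD/
    └── <timestamp>_<uuid>/
        ├── config.yaml
        ├── dataset_stats/
        └── checkpoints/
```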
Then run
python scripts/eval_mtpi.py --log_dir experiment_logs/<path_to_parent_directory_of_ckpt>
Tip
At any point during rollout, you can press Enter to end the trajectory and reset the arm back to its starting state. By default, all rollouts are recorded. You can turn this off via record=False. All recordings are saved under a real_rollouts/ subdirectory where the checkpoint resides (e.g. experiment_logs/MMDD/<timestamp>_<uuid>/real_rollouts).
To visualize the predicted tracks for on-policy rollouts, run
python scripts/visualize_online_rollouts.py -d experiment_logs/MMDD/<timestamp>_<uuid>/real_rollouts
directly on the workstation.
We instantiate MT-π using Diffusion Policy, and thus several files take inspiration from their implementation. Calibration and data collection infrastructure is inspired by SPHINX. Real robot experiments are done on top of Monometis. We also use a number of third-party repositories during the data-processing step, namely HaMeR, SAM2, and Depth-Anything-V2. Last but not least, we thank Zi-ang Cao for his wrappers and utility functions interfacing with HaMeR.
If you found this repository useful in your research, please consider citing our paper.
@article{ren2025motion,
title={Motion tracks: A unified representation for human-robot transfer in few-shot imitation learning},
author={Ren, Juntao and Sundaresan, Priya and Sadigh, Dorsa and Choudhury, Sanjiban and Bohg, Jeannette},
journal={arXiv preprint arXiv:2501.06994},
year={2025}
}

For any questions regarding the paper or issues with the codebase, please feel free to contact juntao [dot] ren [at] stanford [dot] edu.