Official repository for the project TraceGen: World Modeling in 3D Trace-Space Enables Learning from Cross-Embodiment Videos.
Project Website: tracegen.github.io
arXiv: 2511.21690
- Training/testing labels for the five datasets (Libero, Robomimic, Droid, Epickitchen, Bridge), along with the checkpoints trained on each and their metrics, are now available. See the Hugging Face collection for all assets: https://huggingface.co/collections/furonghuang-lab/tracegen
- The official leaderboard is hosted at: https://huggingface.co/furonghuang-lab/TraceGenBenchmark
For the data generation pipeline TraceForge, which prepares cross-embodiment 3D trace datasets for TraceGen training, please refer to the TraceForge GitHub Repository.
TraceForge is a scalable data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces. It consists of three stages (a minimal code sketch follows the list):
- Camera motion compensation: Estimating camera pose and depth, and applying world-to-camera alignment
- Speed retargeting: Normalizing motion speeds across different embodiments
- 3D point tracking: Using predicted camera poses and depth to reconstruct scene-level 3D trajectories for both robot and object motion
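To make the camera motion compensation and 3D lifting steps concrete, here is a minimal NumPy sketch that back-projects tracked 2D points using estimated depth and moves them into a shared world frame. The function and argument names are illustrative only, not the TraceForge API:

```python
import numpy as np


def lift_tracks_to_world(tracks_2d, depths, intrinsics, cam_to_world):
    """Lift per-frame 2D tracks into world-frame 3D traces (illustrative sketch).

    tracks_2d:    (T, N, 2) pixel coordinates of N tracked points over T frames
    depths:       (T, N) estimated depth sampled at each tracked point
    intrinsics:   (3, 3) camera intrinsic matrix
    cam_to_world: (T, 4, 4) estimated camera-to-world pose per frame
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    num_frames, num_points = tracks_2d.shape[0], tracks_2d.shape[1]
    traces_world = np.zeros((num_frames, num_points, 3))
    for t in range(num_frames):
        # Back-project pixels into camera-frame 3D points using depth.
        z = depths[t]
        x = (tracks_2d[t, :, 0] - cx) * z / fx
        y = (tracks_2d[t, :, 1] - cy) * z / fy
        pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1)  # (N, 4) homogeneous
        # Transform into a shared world frame to compensate for camera motion.
        traces_world[t] = (cam_to_world[t] @ pts_cam.T).T[:, :3]
    return traces_world
```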
We provide two ways to install the TraceGen conda environment. Both were tested with PyTorch 2.4.1 and CUDA 12.4.
Option 1: Create and install the environment from `environment.yml`:

```bash
conda env create -f environment.yml
conda activate trace_gen
```

Option 2: Build the environment step by step.

- Create a conda environment:

```bash
conda create -n trace_gen python=3.10
conda activate trace_gen
```

- Install PyTorch (we tested 2.4.1):

```bash
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
```

- Install all dependencies:

```bash
pip install -r requirements.txt
```

After installing the environment:

- Set up your local configuration by creating a local config file with your dataset paths:

```bash
cp cfg/train.local.yaml.example cfg/train.local.yaml
# Edit cfg/train.local.yaml with your dataset directories and checkpoint paths
```

- Start training: see the Training section below for detailed examples.
Datasets prepared through TraceForge should be organized as follows:
```
data/
├── episode_01/
│   ├── images/
│   │   ├── episode_01_0.png
│   │   ├── episode_01_5.png
│   │   └── ...
│   ├── samples/
│   │   ├── episode_01_0.npz      # Contains 'keypoints' array [N, 2]
│   │   ├── episode_01_5.npz
│   │   └── ...
│   ├── depth/                    # (optional)
│   │   ├── episode_01_0_raw.npz
│   │   ├── episode_01_5_raw.npz
│   │   └── ...
│   └── task_descriptions.json
├── episode_02/
└── ...
```
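For reference, here is a minimal sketch of reading one episode in this layout; the actual loaders live in `dataio/` and may differ in details such as frame subsampling:

```python
import json
from pathlib import Path

import numpy as np
from PIL import Image


def load_episode(episode_dir):
    """Illustrative reader for one TraceForge-style episode directory."""
    episode_dir = Path(episode_dir)
    # Sort frames by their integer index (episode_01_0, episode_01_5, ...).
    image_paths = sorted(
        episode_dir.glob("images/*.png"),
        key=lambda p: int(p.stem.split("_")[-1]),
    )
    frames, keypoints = [], []
    for img_path in image_paths:
        frames.append(np.asarray(Image.open(img_path)))
        # Matching per-frame keypoints live under samples/ with the same stem.
        sample = np.load(episode_dir / "samples" / f"{img_path.stem}.npz")
        keypoints.append(sample["keypoints"])  # (N, 2) per frame
    tasks = json.loads((episode_dir / "task_descriptions.json").read_text())
    return frames, keypoints, tasks
```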
Training with Small Datasets
When training with small datasets, frequent visualization and checkpoint saving can be inefficient. We recommend the following configuration adjustments:
```yaml
save_every: 20
num_log_steps_per_epoch: 0  # Disable intra-epoch logging
eval_every: 20
visualize_every: 20
val_split: 0.1              # Or larger, to avoid zero validation samples
```

Training with Large Datasets
For large datasets, to ensure adequate logging frequency and avoid sparse checkpoints:
```yaml
save_every: 1
num_log_steps_per_epoch: 10  # Or higher for more frequent intra-epoch logging
eval_every: 1
visualize_every: 1
val_split: 0.01              # Small enough that validation does not take too long
```

Single-GPU
```bash
export CUDA_VISIBLE_DEVICES=0
python train.py \
--config cfg/train.yaml \
--override \
train.batch_size=6 \
train.lr_decoder=1.5e-4 \
model.decoder.num_layers=12 \
model.decoder.num_attention_heads=16 \
model.decoder.latent_dim=768 \
data.num_workers=4 \
hardware.mixed_precision=true \
logging.use_wandb=true \
logging.log_every=2000
```

Multi-GPU
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
train.py \
--config cfg/train.yaml \
--override \
train.batch_size=8 \
train.lr_decoder=1.5e-4 \
model.decoder.num_layers=6 \
model.decoder.num_attention_heads=12 \
model.decoder.latent_dim=768 \
data.num_workers=4 \
hardware.mixed_precision=true \
logging.use_wandb=true \
logging.log_every=2000
```

Multi-GPU, resuming from a pretrained checkpoint
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
train.py \
--config cfg/train.yaml \
--override \
train.batch_size=8 \
train.lr_decoder=1.5e-4 \
model.decoder.num_layers=6 \
model.decoder.num_attention_heads=12 \
model.decoder.latent_dim=768 \
data.num_workers=4 \
hardware.mixed_precision=true \
logging.use_wandb=true \
logging.log_every=2000 \
--resume {path_to_pretrained_checkpoint}
```

Note: Replace `{path_to_pretrained_checkpoint}` with the path to your downloaded TraceGen checkpoint. A TraceGen model pretrained on TraceForge-123k is available at https://huggingface.co/JayLee131/TraceGen.
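If you prefer to fetch the checkpoint programmatically, one option is `huggingface_hub` (this is an assumption about your tooling, not a requirement of the repo); the exact checkpoint filename inside the Hub repo depends on the release:

```python
# Optional helper: download the pretrained TraceGen weights from the Hugging Face Hub.
# Requires `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

# Downloads the JayLee131/TraceGen repo into the local HF cache and returns its path;
# point --resume at the checkpoint file inside this directory.
ckpt_dir = snapshot_download(repo_id="JayLee131/TraceGen")
print(ckpt_dir)
```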
If you enable Weights & Biases logging (`logging.use_wandb=true`), you can monitor:
- Training and validation losses
- Generated trajectory visualizations
- Predicted trajectory MSE
Testing on the TraceGen benchmarks has been released!
This dataset defines the official evaluation protocol for the TraceGen benchmark. Models are evaluated on 5 environments with the following metrics:
- Mean Squared Error (MSE)
- Mean Absolute Error (MAE)
- Endpoint MSE
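For intuition, the three metrics can be understood as sketched below over a predicted trace and its ground truth; this is an illustrative definition, and the official benchmark script (`test_benchmark.py`) is the reference for exact normalization and averaging:

```python
import numpy as np


def trace_metrics(pred, gt):
    """Illustrative metric definitions for a predicted trace vs. its ground truth.

    pred, gt: (T, N, 3) arrays of N tracked points over T timesteps.
    """
    err = pred - gt
    mse = float(np.mean(err ** 2))             # Mean Squared Error over all points and steps
    mae = float(np.mean(np.abs(err)))          # Mean Absolute Error
    endpoint_mse = float(np.mean((pred[-1] - gt[-1]) ** 2))  # Squared error at the final timestep
    return {"MSE": mse, "MAE": mae, "Endpoint MSE": endpoint_mse}
```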
The official leaderboard is hosted at: https://huggingface.co/furonghuang-lab/TraceGenBenchmark
| Environment | Metric | TraceGen (×1e−2) |
|---|---|---|
| EpicKitchen | MSE | 0.445 |
| | MAE | 2.721 |
| | Endpoint MSE | 0.791 |
| Droid | MSE | 0.206 |
| | MAE | 1.289 |
| | Endpoint MSE | 0.285 |
| Bridge | MSE | 0.653 |
| | MAE | 2.419 |
| | Endpoint MSE | 0.607 |
| Libero | MSE | 0.276 |
| | MAE | 1.442 |
| | Endpoint MSE | 0.385 |
| Robomimic | MSE | 0.138 |
| | MAE | 1.416 |
| | Endpoint MSE | 0.151 |
Multi-GPU
```bash
export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
test_benchmark.py \
--config cfg/train.yaml \
--override \
train.batch_size=8 \
train.lr_decoder=1.5e-4 \
model.decoder.num_layers=6 \
model.decoder.num_attention_heads=12 \
model.decoder.latent_dim=768 \
data.num_workers=4 \
hardware.mixed_precision=true \
logging.use_wandb=true \
logging.log_every=2000 \
--resume {path_to_pretrained_checkpoint}
```
Single-GPU
```bash
export CUDA_VISIBLE_DEVICES=0
python test_benchmark.py \
--config cfg/train.yaml \
--override \
train.batch_size=8 \
train.lr_decoder=1.5e-4 \
model.decoder.num_layers=6 \
model.decoder.num_attention_heads=12 \
model.decoder.latent_dim=768 \
data.num_workers=4 \
hardware.mixed_precision=true \
logging.use_wandb=true \
logging.log_every=2000 \
--resume {path_to_pretrained_checkpoint}
```
High-level overview of the repository structure:
```
Trace_gen/
├── cfg/               # Configuration files
├── dataio/            # Data loading and preprocessing
├── models/            # Model architectures
├── losses/            # Loss functions
├── trainer/           # Training loop implementation
├── utils/             # Utility functions
├── train.py           # Main training script
├── test_example.py    # Example testing script
├── test_helpers.py    # Testing utilities
├── environment.yml    # Conda environment file
├── requirements.txt   # Python dependencies
└── README.md          # This file
```
If you find this work useful, please consider citing our paper:
```bibtex
@article{lee2025tracegen,
  title={TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos},
  author={Lee, Seungjae and Jung, Yoonkyo and Chun, Inkook and Lee, Yao-Chih and Cai, Zikui and Huang, Hongjia and Talreja, Aayush and Dao, Tan Dat and Liang, Yongyuan and Huang, Jia-Bin and Huang, Furong},
  journal={arXiv preprint arXiv:2511.21690},
  year={2025}
}
```

Our code modifies and builds upon:
- CogVideoX from HuggingFace Diffusers for the 3D trace generation model.
- Prismatic VLMs for insights on multimodal encoder design.

