
TraceGen: World Modeling in 3D Trace-Space Enables Learning from Cross-Embodiment Videos

Official repository for the project TraceGen: World Modeling in 3D Trace-Space Enables Learning from Cross-Embodiment Videos.

Project Website: tracegen.github.io
arXiv: 2511.21690

🚀 Benchmark and Dataset are Released!

TraceGen Overview

TraceForge: How to Generate Your Own Dataset

TraceForge Overview

For the data generation pipeline TraceForge, which prepares a cross-embodiment 3D trace dataset for TraceGen training, please refer to the TraceForge GitHub Repository.

TraceForge is a scalable data pipeline that transforms heterogeneous human and robot videos into consistent 3D traces.

  • Camera motion compensation: Estimating camera pose and depth, and applying world-to-camera alignment
  • Speed retargeting: Normalizing motion speeds across different embodiments
  • 3D point tracking: Using predicted camera poses and depth to reconstruct scene-level 3D trajectories for both robot and object motion
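
To make the third step concrete, here is a minimal sketch of back-projecting 2D point tracks into world-frame 3D trajectories from per-frame depth and camera poses. The shapes and function name are assumptions for illustration, not the actual TraceForge implementation:

import numpy as np

def backproject_tracks(tracks_2d, depths, K, cam_to_world):
    # tracks_2d:    (T, N, 2) pixel coordinates of tracked points per frame
    # depths:       (T, N)    depth of each tracked point per frame
    # K:            (3, 3)    camera intrinsics
    # cam_to_world: (T, 4, 4) per-frame camera-to-world poses
    T, N, _ = tracks_2d.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    traces_3d = np.empty((T, N, 3))
    for t in range(T):
        # Pinhole back-projection: pixel + depth -> camera-frame 3D point
        x = (tracks_2d[t, :, 0] - cx) / fx * depths[t]
        y = (tracks_2d[t, :, 1] - cy) / fy * depths[t]
        pts_cam = np.stack([x, y, depths[t]], axis=-1)
        # Applying the camera-to-world pose removes camera motion from the traces
        pts_h = np.concatenate([pts_cam, np.ones((N, 1))], axis=-1)
        traces_3d[t] = (cam_to_world[t] @ pts_h.T).T[:, :3]
    return traces_3d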

Installation for TraceGen

We provide two ways to set up the TraceGen conda environment. Both are tested with PyTorch 2.4.1 and CUDA 12.4.

Option 1: Using Conda Environment (Recommended)

  1. Create and activate the environment from environment.yml:
conda env create -f environment.yml
conda activate trace_gen

Option 2: Manual Installation

  1. Create a conda environment:
conda create -n trace_gen python=3.10
conda activate trace_gen
  2. Install PyTorch (tested with 2.4.1):
conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.4 -c pytorch -c nvidia
  3. Install all dependencies:
pip install -r requirements.txt
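
After either option, you can optionally verify that the install matches the tested versions:

import torch

print(torch.__version__)           # expected: 2.4.1
print(torch.version.cuda)          # expected: 12.4
print(torch.cuda.is_available())   # expected: True on a CUDA-capable machine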

Quick Start

  1. Set up your local configuration: Create a local config file with your dataset paths:
cp cfg/train.local.yaml.example cfg/train.local.yaml
# Edit cfg/train.local.yaml with your dataset directories and checkpoint paths
  2. Start training: See the Training section below for detailed examples.

Prepare Dataset for TraceGen

Datasets prepared through TraceForge should be organized as follows:

data/
├── episode_01/
│   ├── images/
│   │   ├── episode_01_0.png
│   │   ├── episode_01_5.png
│   │   └── ...
│   ├── samples/
│   │   ├── episode_01_0.npz  # Contains 'keypoints' array [N, 2]
│   │   ├── episode_01_5.npz
│   │   └── ...
│   ├── depth/ # (optional)
│   │   ├── episode_01_0_raw.npz  # Raw depth for the corresponding frame
│   │   ├── episode_01_5_raw.npz
│   │   └── ...
│   └── task_descriptions.json
├── episode_02/
└── ...

Training TraceGen

TraceGen Overview

Important Configuration Guidelines

Training with Small Datasets

When training with small datasets, frequent visualization and checkpoint saving can be inefficient. We recommend the following configuration adjustments:

save_every: 20
num_log_steps_per_epoch: 0  # Disable intra-epoch logging
eval_every: 20
visualize_every: 20
val_split: 0.1  # Or larger to avoid zero validation samples
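
A quick back-of-the-envelope check of why val_split matters here, assuming it is the fraction of episodes held out for validation (the exact rounding in the code may differ):

num_episodes = 40
for val_split in (0.01, 0.1):
    # Truncation can leave zero validation episodes on small datasets
    print(val_split, int(num_episodes * val_split))  # -> 0.01 0, then 0.1 4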

Training with Large Datasets

For large datasets, use the following settings to ensure adequate logging frequency and avoid sparse checkpoints:

save_every: 1
num_log_steps_per_epoch: 10  # Or higher for more frequent intra-epoch logging
eval_every: 1
visualize_every: 1
val_split: 0.01 # Small enough that validation does not take too long

Option 1: Single GPU Training from Scratch

export CUDA_VISIBLE_DEVICES=0
python train.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=6 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=12 \
  model.decoder.num_attention_heads=16 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000
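
The key=value pairs after --override use dotted paths into the YAML config. A minimal sketch of how such dotted-key overrides are typically applied to a nested config dict (illustrative only; not necessarily the exact logic in train.py):

def apply_override(cfg, dotted_key, value):
    # Walk to the parent node, creating intermediate dicts as needed
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value

cfg = {"train": {"batch_size": 4}, "model": {"decoder": {}}}
apply_override(cfg, "train.batch_size", 6)
apply_override(cfg, "model.decoder.latent_dim", 768)
print(cfg)  # {'train': {'batch_size': 6}, 'model': {'decoder': {'latent_dim': 768}}}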

Option 2: Multi-GPU Training from Scratch

export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
  train.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000

Option 3: Fine-tune TraceGen with Multi-GPU

export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
  train.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000 \
  --resume {path_to_pretrained_checkpoint}

Note: Replace {path_to_pretrained_checkpoint} with the path to your downloaded TraceGen checkpoint. (A TraceGen model pretrained on TraceForge-123k is available at https://huggingface.co/JayLee131/TraceGen.)
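
If you prefer to fetch the checkpoint programmatically, something like the following works with huggingface_hub (the filename is a placeholder; check the model repo for the actual checkpoint file name):

from huggingface_hub import hf_hub_download

# "checkpoint.pt" is a placeholder; use the file name listed in the Hugging Face repo
ckpt_path = hf_hub_download(repo_id="JayLee131/TraceGen", filename="checkpoint.pt")
print(ckpt_path)  # pass this path to --resume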

Monitoring Training

If you enable Weights & Biases logging (logging.use_wandb=true), you can monitor:

  • Training and validation losses
  • Generated trajectory visualizations
  • Predicted trajectory MSE

Testing

🚀 Testing on the TraceGen benchmark is released!

Evaluation Protocol

The benchmark dataset defines the official evaluation protocol for TraceGen. Models are evaluated on 5 environments with the following metrics:

  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)
  • Endpoint MSE
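
For intuition, the three metrics can be computed from predicted and ground-truth traces as sketched below (array shapes are assumptions; official numbers come from the released benchmark evaluation code):

import numpy as np

def trace_metrics(pred, gt):
    # pred, gt: (T, N, 3) predicted and ground-truth 3D traces
    err = pred - gt
    mse = np.mean(err ** 2)               # averaged over all timesteps and points
    mae = np.mean(np.abs(err))
    endpoint_mse = np.mean(err[-1] ** 2)  # squared error at the final timestep only
    return mse, mae, endpoint_mse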

The official leaderboard is hosted at: 👉 https://huggingface.co/furonghuang-lab/TraceGenBenchmark

Environment   Metric        TraceGen (×1e−2)
EpicKitchen   MSE           0.445
              MAE           2.721
              Endpoint MSE  0.791
Droid         MSE           0.206
              MAE           1.289
              Endpoint MSE  0.285
Bridge        MSE           0.653
              MAE           2.419
              Endpoint MSE  0.607
Libero        MSE           0.276
              MAE           1.442
              Endpoint MSE  0.385
Robomimic     MSE           0.138
              MAE           1.416
              Endpoint MSE  0.151

Test on the TraceGen Benchmark

Multi-GPU

export CUDA_VISIBLE_DEVICES=0,1,2,3
torchrun --standalone --nproc_per_node=4 \
  test_benchmark.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000 \
  --resume {path_to_pretrained_checkpoint}

Single-GPU

export CUDA_VISIBLE_DEVICES=0
python test_benchmark.py \
  --config cfg/train.yaml \
  --override \
  train.batch_size=8 \
  train.lr_decoder=1.5e-4 \
  model.decoder.num_layers=6 \
  model.decoder.num_attention_heads=12 \
  model.decoder.latent_dim=768 \
  data.num_workers=4 \
  hardware.mixed_precision=true \
  logging.use_wandb=true \
  logging.log_every=2000 \
  --resume {path_to_pretrained_checkpoint}

Repository Structure

High-level overview of the repository structure:

Trace_gen/
├── cfg/                          # Configuration files
├── dataio/                       # Data loading and preprocessing
├── models/                       # Model architectures
├── losses/                       # Loss functions
├── trainer/                      # Training loop implementation
├── utils/                        # Utility functions
├── train.py                      # Main training script
├── test_example.py               # Example testing script
├── test_helpers.py               # Testing utilities
├── environment.yml               # Conda environment file
├── requirements.txt              # Python dependencies
└── README.md                     # This file

📖 Citation

If you find this work useful, please consider citing our paper:

@article{lee2025tracegen,
  title={TraceGen: World Modeling in 3D Trace Space Enables Learning from Cross-Embodiment Videos},
  author={Lee, Seungjae and Jung, Yoonkyo and Chun, Inkook and Lee, Yao-Chih and Cai, Zikui and Huang, Hongjia and Talreja, Aayush and Dao, Tan Dat and Liang, Yongyuan and Huang, Jia-Bin and Huang, Furong},
  journal={arXiv preprint arXiv:2511.21690},
  year={2025}
}

Acknowledgments

Our code modifies and builds upon:

  • CogVideoX from HuggingFace Diffusers for the 3D trace generation model.
  • Prismatic VLMs for insights on multimodal encoder design.
