Transform robot demonstrations into rich 3D point cloud datasets and fine-tune state-of-the-art vision-language-action models
Installation • Dataset Pipeline • Model Training • Evaluation
SpatialPi is a comprehensive toolkit that bridges the gap between raw robot demonstrations and powerful 3D-enhanced learning.
git clone --recurse-submodules [email protected]:offjangir/SpatialPi.git
cd SpatialPi
# If already cloned:
git submodule update --init --recursive
Create the pcgen conda environment for point cloud generation:
conda create -n pcgen python=3.11 -y
conda activate pcgen
# Install stable-virtual-camera
pip install -e .
# Install VGGT
cd ../vggt
pip install -e .
# Install dependencies
pip install tensorflow tensorflow_datasets
pip install git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5
cd ../../  # Return to SpatialPi root
Set up the training environment using uv:
cd openpi
# Install dependencies
GIT_LFS_SKIP_SMUDGE=1 uv sync
GIT_LFS_SKIP_SMUDGE=1 uv pip install -e .
💡 Tip: For Docker-based installation, see openpi/docs/docker.md
The SpatialPi pipeline converts RLDS-format robot demonstrations into LeRobot datasets enriched with 3D point clouds.
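For orientation, RLDS datasets are standard tensorflow_datasets builders, so you can inspect an episode directly once the download below has finished. A minimal sketch (the sub-dataset directory name and version are assumptions; adjust to whatever the download actually produces):

```python
import tensorflow_datasets as tfds

# Load one of the prepared RLDS sub-datasets from the local download.
# The directory name below is an assumption; list ./data/modified_libero_rlds
# to see what is actually there.
builder = tfds.builder_from_directory(
    "./data/modified_libero_rlds/libero_spatial_no_noops/1.0.0"
)
ds = builder.as_dataset(split="train")

# RLDS layout: each element is an episode, and each episode carries a nested
# dataset of steps with observations, actions, and language instructions.
for episode in ds.take(1):
    for step in episode["steps"].take(2):
        print(list(step["observation"].keys()))  # e.g. image, wrist image, state
        print(step["action"].numpy().shape)      # per-step action vector
```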
Download the LIBERO RLDS dataset:
huggingface-cli download openvla/modified_libero_rlds \
--repo-type dataset \
--local-dir ./data/modified_libero_rlds
Create dataset structure and allocate point cloud filenames:
conda activate pcgen
export PYTHONPATH=$(pwd)
python src/conver_data.py \
--data_dir ./data/modified_libero_rlds \
--stage 1 \
--output_dir ./lerobot_pc
What happens:
- ✅ Enumerates all RLDS episodes
- ✅ Creates LeRobot dataset structure
- ✅ Pre-allocates point cloud filenames
- ✅ Generates metadata for Stage 2
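After Stage 1 finishes, a quick way to confirm that the structure was written is to walk the output directory (a generic check, not part of the pipeline):

```python
from collections import Counter
from pathlib import Path

# Count files by extension under the Stage 1 output directory.
root = Path("./lerobot_pc")
counts = Counter(p.suffix or "<no ext>" for p in root.rglob("*") if p.is_file())
for suffix, n in sorted(counts.items()):
    print(f"{suffix}: {n} files")
```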
Generate point clouds using multi-GPU processing:
python src/conver_data.py \
--data_dir ./data/modified_libero_rlds \
--stage 2 \
--output_dir ./lerobot_pc \
--num_gpus 4 \
--workers_per_gpu 1 \
--resume True
Key Options:
| Option | Description | Default |
|---|---|---|
| `--num_gpus` | Number of GPUs to use | 2 |
| `--workers_per_gpu` | Workers per GPU | 1 |
| `--resume` | Resume from already-processed episodes | True |
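The options above control how episodes are split across GPU workers. A hypothetical sketch of this kind of sharding (the real implementation lives in src/conver_data.py and may differ):

```python
import multiprocessing as mp
import os

# Hypothetical illustration of --num_gpus / --workers_per_gpu style sharding;
# not the actual SpatialPi code.
def process_shard(gpu_id: int, episode_ids: list[int]) -> None:
    # Pin this worker to one GPU before any CUDA context is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for ep in episode_ids:
        # ... run depth estimation / point cloud generation for episode `ep` ...
        pass

def run(num_episodes: int, num_gpus: int = 2) -> None:
    # Round-robin episodes across GPUs so shards are roughly balanced.
    shards = [list(range(g, num_episodes, num_gpus)) for g in range(num_gpus)]
    procs = [mp.Process(target=process_shard, args=(g, shard))
             for g, shard in enumerate(shards)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    run(num_episodes=100, num_gpus=4)
```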
Fix HuggingFace metadata for LeRobot compatibility:
python src/fix_data.py ./lerobot_pc
This ensures all nested features use `_type="Sequence"` instead of `"List"` for seamless LeRobot integration.
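A rough sketch of the kind of rewrite this implies (the actual script and the metadata file location are assumptions):

```python
import json
from pathlib import Path

def fix_feature_types(node):
    """Recursively rewrite _type="List" entries to "Sequence" in a features dict."""
    if isinstance(node, dict):
        if node.get("_type") == "List":
            node["_type"] = "Sequence"
        for value in node.values():
            fix_feature_types(value)
    elif isinstance(node, list):
        for value in node:
            fix_feature_types(value)

# Assumed location of the HuggingFace features metadata inside the dataset dir.
meta_path = Path("./lerobot_pc/meta/info.json")
meta = json.loads(meta_path.read_text())
fix_feature_types(meta)
meta_path.write_text(json.dumps(meta, indent=2))
```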
Train state-of-the-art vision-language-action models on your point cloud datasets.
cd openpi
uv run scripts/compute_norm_stats.py --config-name pi0_libero
This generates the normalization statistics required for training.
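Conceptually, these statistics are per-dimension means and standard deviations used to normalize states and actions at train time and to unnormalize predicted actions at inference. A minimal illustration (openpi's exact scheme and file format are defined in the repo; this is not its API):

```python
import numpy as np

# Stand-in for all action vectors gathered from the dataset.
actions = np.random.randn(10_000, 7)
mean, std = actions.mean(axis=0), actions.std(axis=0)

def normalize(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    return (x - mean) / (std + eps)

def unnormalize(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    return x * (std + eps) + mean
```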
XLA_PYTHON_CLIENT_MEM_FRACTION=0.9 \
uv run scripts/train.py pi0_libero \
--exp-name spatialpi_pi0_libero \
--overwrite
⚠️ Note: Customize the data path in `openpi/src/openpi/training/config.py` if you store the dataset in a different location.
Run evaluation with a two-terminal setup: one for the simulation environment, one for the policy server.
cd openpi
# Create evaluation environment
uv venv --python 3.9 examples/libero/.venv
source examples/libero/.venv/bin/activate
# Install dependencies
uv pip sync \
examples/libero/requirements.txt \
third_party/libero/requirements.txt \
--extra-index-url https://download.pytorch.org/whl/cu113 \
--index-strategy=unsafe-best-match
uv pip install -e packages/openpi-client
uv pip install -e third_party/libero
# Install Python 3.9-compatible VGGT
git clone [email protected]:sorceressyidi/vggt.git
cd vggt && uv pip install -e . && cd ..
# Set environment
export PYTHONPATH=$PYTHONPATH:$(pwd)/third_party/libero
# Run simulation
python examples/libero/main_vggt.py
# If EGL errors occur:
MUJOCO_GL=glx python examples/libero/main_vggt.py
# In a second terminal, start the policy server:
cd openpi
uv run scripts/serve_policy.py --env LIBERO
The server streams actions via WebSocket to the simulation environment.
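For reference, the simulation side talks to the server through the openpi-client package. A minimal client-side sketch (assumes the server above is running on localhost:8000; class and module names follow openpi-client, but check your checkout if the import fails):

```python
from openpi_client import websocket_client_policy

# Connect to the policy server started by serve_policy.py.
policy = websocket_client_policy.WebsocketClientPolicy(host="localhost", port=8000)

observation = {
    # ... camera images, proprioceptive state, and the task prompt, formatted the
    # same way examples/libero/main_vggt.py prepares them ...
}
result = policy.infer(observation)
action_chunk = result["actions"]  # chunk of future actions returned by the policy
```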
We fine-tuned the π₀ base model on the LIBERO dataset for 5K steps and evaluated on the LIBERO benchmark.
| Model | Libero Spatial | Libero Object | Libero Goal | Libero 10 | Average |
|---|---|---|---|---|---|
| π₀_pc @ 5K | 81.8% | 93.4% | 73.8% | 55.2% | 76.5% |
| π₀ @ 5K | 76.4% | 92.2% | 70.8% | 54.2% | 73.4% |
| π₀_pc @ 30K | 98.2% | 97.4% | 94.0% | 89.6% | 94.8% |
| π₀_libero | 96.8% | 98.0% | 93.4% | 81.0% | 92.3% |
| π₀_base | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
We gratefully acknowledge the following projects and teams:
- Physical Intelligence for the amazing OpenPI framework and π-models
- Meta AI for VGGT depth estimation
- LIBERO Team for the comprehensive manipulation benchmark
- HuggingFace for LeRobot and dataset infrastructure
Built with ❤️ for the robotics community
⭐ Star us on GitHub • Documentation • Report Issues