# ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory

Official implementation of the paper **ManipDreamer3D** ([arXiv:2509.05314](https://arxiv.org/abs/2509.05314)).
ManipDreamer3D is a 3D-aware video generation framework for synthesizing plausible robotic manipulation videos. It integrates occupancy-aware 3D trajectory planning with visual generative modeling, achieving both visual realism and physical feasibility.
## 🔹 Key Highlights
- 🧠 Occupancy-based 3D path planning ensures physical plausibility
- 🎥 Generates consistent multi-frame robotic motion videos
- 📈 Achieves a 67.9% success rate in SimplerEnv, on par with CogACT (67.5%)
## 🔧 Installation

Tested on Ubuntu 22.04 / CUDA 11.8 / Python 3.10.

> ⚠️ **Important:** Install `sam2` first, then the other dependencies.

### Setup Environment

```bash
conda create -n md3d python=3.10 -y
conda activate md3d
pip install uv -i https://pypi.tuna.tsinghua.edu.cn/simple
export UV_INDEX_URL="https://pypi.tuna.tsinghua.edu.cn/simple/"
uv pip install git+https://github.com/facebookresearch/sam2.git
pip install torch==2.1.2+cu118 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
CUDA_HOME=/usr/local/cuda-11.8 uv pip install -e . --no-build-isolation
CUDA_HOME=/usr/local/cuda-11.8 uv pip install -r requirements.txt
```
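As a quick sanity check (a suggested verification, not part of the original setup), confirm that the CUDA 11.8 build of PyTorch can see your GPU before continuing:

```python
import torch

# Expect 2.1.2+cu118, True, and 11.8 respectively.
print(torch.__version__)
print(torch.cuda.is_available())
print(torch.version.cuda)
```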
### Install NKSR (for occupancy reconstruction)

```bash
git clone https://github.com/nv-tlabs/NKSR.git && cd NKSR
CUDA_HOME=/usr/local/cuda-11.8 uv pip install --no-build-isolation package/
cd ..
```
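A minimal smoke test, adapted from the NKSR README, can confirm the CUDA extension built correctly. The random point cloud and normals below are placeholders, so the resulting mesh is meaningless; the goal is only that the call completes without errors:

```python
import torch
import nksr

device = torch.device("cuda:0")

# Placeholder (N, 3) point cloud with per-point normals.
input_xyz = torch.rand(10000, 3, device=device)
input_normal = torch.rand(10000, 3, device=device)

reconstructor = nksr.Reconstructor(device)
field = reconstructor.reconstruct(input_xyz, input_normal)
mesh = field.extract_dual_mesh(mise_iter=2)
print(mesh.v.shape, mesh.f.shape)  # vertex and face tensors
```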
### Install SimplerEnv (for policy evaluation)

```bash
git clone https://github.com/simpler-env/SimplerEnv --recurse-submodules
cd SimplerEnv
uv pip install -r requirements_full_install.txt
cd ManiSkill2_real2sim && uv pip install .
cd .. && uv pip install .
pip install --upgrade "jax[cuda11_pip]==0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
cd ..
```
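Optionally (another suggested check, not from the original instructions), verify that the CUDA-enabled JAX wheels installed correctly:

```python
import jax

# Expect 0.4.20 and a list of GPU devices rather than CPU only.
print(jax.__version__)
print(jax.devices())
```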
### Fix Vulkan dependency (if needed)

```bash
conda install conda-forge::libvulkan-loader -y
```

## ⚙️ Environment Variables
Create an `env.sh` file and run `source env.sh` to set the environment variables:

```bash
## env.sh
export USER_ROOT="/path/to/your/workspace"
conda activate md3d
# (optional) set proxy
export PATH=/usr/local/cuda-11.8/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64:$LD_LIBRARY_PATH
# API key for OpenAI or DeepSeek
export api_key="your_openai_api_key"
# SimplerEnv data path
export MS2_REAL2SIM_ASSET_DIR=../SimplerEnv/ManiSkill2_real2sim/data
# PyTorch memory tuning
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:64"
```
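A small optional helper (hypothetical, not part of the repo) to confirm the variables are visible in the current shell after sourcing:

```python
import os

# Print each variable set by env.sh, or <unset> if sourcing was skipped.
for var in ("USER_ROOT", "api_key", "MS2_REAL2SIM_ASSET_DIR",
            "PYTORCH_CUDA_ALLOC_CONF"):
    print(f"{var}={os.environ.get(var, '<unset>')}")
```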
## 📦 Download Pretrained Models

```bash
mkdir -p manipdreamer3d/weights
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth \
  -O manipdreamer3d/weights/groundingdino_swint_ogc.pth
```
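To make sure the download completed and is not truncated, the checkpoint can be deserialized on CPU (a suggested check, not part of the original instructions):

```python
import torch

# A corrupt or partial download will fail to load here.
ckpt = torch.load("manipdreamer3d/weights/groundingdino_swint_ogc.pth",
                  map_location="cpu")
print(sorted(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```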
## 🧩 Training

```bash
bash scripts/train/train_md3d.sh
```

## 🎯 Citation
If you find this work useful, please consider citing:
```bibtex
@article{li2025manipdreamer3d,
  title={ManipDreamer3D: Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory},
  author={Li, Ying and Wei, Xiaobao and Chi, Xiaowei and Li, Yuming and Zhao, Zhongyu and Wang, Hao and Ma, Ningning and Lu, Ming and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2509.05314},
  year={2025}
}
```