Paper | Project Page | Demo | Poster
DIMO: Diverse 3D Motion Generation for Arbitrary Objects
Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis
University of Pennsylvania
ICCV 2025 (Highlight)
- [2026-01-04] Code and data are pre-released!
- [2025-07-24] DIMO is selected as Highlight Paper!
- [2025-06-26] DIMO is accepted by ICCV 2025! 🎉 We will release code in this repo.
We use Python 3.10 with PyTorch 2.1.1 and CUDA 11.8. The environment and packages can be installed as follows:
git clone --recursive https://github.com/Friedrich-M/DIMO.git && cd DIMO
conda create -y -n dimo -c nvidia/label/cuda-11.8.0 -c defaults cuda-toolkit=11.8 cuda-compiler=11.8 cudnn=8 python=3.10
conda activate dimo
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt --no-build-isolation
pip install --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
pip install submodules/diff-gauss submodules/diff-gaussian-rasterization submodules/KNN_CUDA submodules/simple-knn --no-build-isolation
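If the builds succeed, a quick import check should run without errors. This is a minimal sketch; the import names for the bundled submodules are assumptions based on their folder names and the upstream 3DGS packages, so adjust them if an import fails:

```python
# Sanity check for the environment (run inside the activated `dimo` env).
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())  # expect 2.1.1, 11.8, True

import pytorch3d                                             # wheel for py310 / cu118 / pyt211
from fused_ssim import fused_ssim                            # rahul-goel/fused-ssim
from diff_gaussian_rasterization import GaussianRasterizer   # 3DGS rasterizer submodule
from simple_knn._C import distCUDA2                          # simple-knn CUDA extension
```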
Motion Priors Distillation

Intuition: distill rich motion priors from video models as diverse motion capture.

- We use text-conditioned monocular video models ([CogVideoX], [Wan2.2], [HunyuanVideo], etc.) to distill rich motion priors. Detailed instructions will be added soon; a minimal generation sketch follows.
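As a concrete example of the kind of video model involved, here is a minimal text-to-video call with CogVideoX via diffusers. The prompt and sampling parameters are illustrative only; the exact prompting and conditioning DIMO uses for distillation are not shown here.

```python
import torch
from diffusers import CogVideoXPipeline
from diffusers.utils import export_to_video

# Load the text-to-video CogVideoX pipeline (bf16 keeps memory manageable).
pipe = CogVideoXPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Illustrative prompt; DIMO's actual distillation prompts may differ.
frames = pipe(
    prompt="A man is walking forward, full body visible, static camera",
    num_frames=49,
    num_inference_steps=50,
    guidance_scale=6.0,
).frames[0]
export_to_video(frames, "walking.mp4", fps=8)
```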
Geometry Priors Distillation
You can skip this step and download our processed example data (51 Trump motions) from Google Drive:
mkdir data && cd data && gdown 1b0_2t_KKhOyKlJsYncUcQm6URecAS6M6 && tar -zxvf data_trump_n51_step20.tar.gz && cd ..

Intuition: jointly model diverse 3D motions in a shared latent space. To train DIMO, simply run:
sh run_train_latent.sh

- NOTE: You can modify the hyperparameters in `run_train_latent.sh` as needed. Check `configs/train_config.yaml` to view all configurable parameters and default settings.
- NOTE: Set the `vae_latent` flag to `True` to enforce a Gaussian distribution on the motion latent code, which also enables the KL divergence loss during training. A minimal sketch of this standard VAE treatment follows.
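For reference, this corresponds to sampling the latent with the reparameterization trick and regularizing it toward a unit Gaussian with a KL term. Variable names below are illustrative; the actual loss weighting lives in `configs/train_config.yaml`.

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # Sample z ~ N(mu, sigma^2) differentiably via z = mu + sigma * eps.
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def kl_loss(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    # KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims, averaged over batch.
    return -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
```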
You can also skip training and download our pre-trained model from Google Drive for testing:
mkdir ckpts && cd ckpts && gdown 1-a9JxXvoGRV_qy5ontRShc4mgDgVkrsd && tar -zxvf ckpt_trump_n51_step20.tar.gz && cd ..

Once trained, you can perform 4D rendering and visualize keypoint trajectories by running:

sh run_test_motion.sh

- NOTE: You can choose which motions to render by modifying the `render_videos` list in `run_test_motion.sh`; uncomment the corresponding lines to render all motions (this may take some time).
The rendered keypoint trajectories will look like this (Trump walking):
key_point_3d_trajectory.mp4
The 4D rendering results should look like this (reference, fixed view, orbit views):
4d_rendering.mp4
- NOTE: Since the video models we used for motion prior distillation were imperfect at the time, the generated videos may contain artifacts. We will update the code and models with more advanced video models such as Veo3 and SV4D2.0 in the future.
If you have any questions, please feel free to open an issue or email at [email protected].
With the learned motion latent space, we provide scripts to test the following applications. Simply add the corresponding flags in `run_test_motion.sh` and run it. We also provide some visualization results below; more instructions will be added soon.
- Latent Space Motion Interpolation
Add `test_interpolation=True` in `run_test_motion.sh` (a conceptual sketch follows the video below)
interpolation.mp4
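Conceptually, interpolation blends two learned latent codes and decodes each blend. In this minimal sketch, `decode_fn`, `z_a`, and `z_b` are hypothetical stand-ins for the model's decoder and two motion codes, and plain linear blending is assumed:

```python
import torch

@torch.no_grad()
def interpolate_motion(z_a: torch.Tensor, z_b: torch.Tensor, decode_fn, num_steps: int = 10):
    # Blend two motion latent codes and decode each intermediate code to a motion.
    motions = []
    for t in torch.linspace(0.0, 1.0, num_steps):
        z_t = (1.0 - t) * z_a + t * z_b   # linear interpolation in latent space
        motions.append(decode_fn(z_t))    # decode_fn: latent code -> 3D motion
    return motions
```

With the `vae_latent` Gaussian prior, spherical interpolation (slerp) is a common alternative that keeps intermediate codes on-distribution.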
- Language-Guided Motion Generation
Add `test_language=True` in `run_test_motion.sh` (a conceptual sketch follows the video below)
language.mp4
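One simple way language guidance can work over a learned latent space is retrieval: embed the query with a text encoder and pick the closest motion latent. Below is a hedged sketch using CLIP from `transformers`; the pairing of `text_feats` with `motion_latents`, and retrieval itself, are assumptions for illustration, and the script's actual mechanism may differ.

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def retrieve_latent(prompt: str, text_feats: torch.Tensor, motion_latents: torch.Tensor):
    # text_feats: (N, D) unit-norm CLIP embeddings of the training motion captions.
    # motion_latents: (N, Z) learned motion codes aligned row-wise with text_feats.
    inputs = tokenizer([prompt], padding=True, return_tensors="pt")
    q = model.get_text_features(**inputs)
    q = q / q.norm(dim=-1, keepdim=True)   # normalize for cosine similarity
    sims = text_feats @ q.squeeze(0)       # (N,) cosine similarities
    return motion_latents[sims.argmax()]   # latent of the best-matching caption
```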
- Test Motion Reconstruction
Add `test_motion=True` in `run_test_motion.sh` (a conceptual sketch follows the video below)
test_motion.mp4
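To gauge reconstruction quality numerically, one can compare decoded keypoint trajectories against reference ones with a simple per-point error. This is a minimal sketch; the `(T, K, 3)` trajectory layout is an assumption:

```python
import torch

def trajectory_error(pred: torch.Tensor, ref: torch.Tensor) -> torch.Tensor:
    # pred, ref: (T, K, 3) keypoint trajectories over T frames and K keypoints.
    # Mean Euclidean distance per keypoint per frame.
    return (pred - ref).norm(dim=-1).mean()
```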
Our code is built on top of DreamGaussian, CogVideoX, and SV4D. Many thanks to the authors for sharing their code. We also greatly appreciate the help from Yiming Xie.
If you find this paper useful for your research, please consider citing:
@inproceedings{mou2025dimo,
title={DIMO: Diverse 3D Motion Generation for Arbitrary Objects},
author={Mou, Linzhan and Lei, Jiahui and Wang, Chen and Liu, Lingjie and Daniilidis, Kostas},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={14357--14368},
year={2025}
}
