
DIMO: Diverse 3D Motion Generation for Arbitrary Objects

Linzhan Mou, Jiahui Lei, Chen Wang, Lingjie Liu, Kostas Daniilidis
University of Pennsylvania
ICCV 2025 (Highlight)

DIMO teaser figure

📜 News

  • [2026-01-04] Code and data are pre-released!
  • [2025-07-24] DIMO is selected as Highlight Paper!
  • [2025-06-26] DIMO is accepted by ICCV 2025! 🎉 We will release code in this repo.

⚙️ Installation

We use Python 3.10 with PyTorch 2.1.1 and CUDA 11.8. The environment and packages can be installed as follows:

git clone --recursive https://github.com/Friedrich-M/DIMO.git && cd DIMO
conda create -y -n dimo -c nvidia/label/cuda-11.8.0 -c defaults cuda-toolkit=11.8 cuda-compiler=11.8 cudnn=8 python=3.10
conda activate dimo
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt --no-build-isolation

pip install --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py310_cu118_pyt211/download.html
pip install git+https://github.com/rahul-goel/fused-ssim/ --no-build-isolation
pip install submodules/diff-gauss submodules/diff-gaussian-rasterization submodules/KNN_CUDA submodules/simple-knn --no-build-isolation

📂 Data Preparation

Intuition: distill rich motion priors from video models as a source of diverse motion capture.

  • Motion Priors Distillation

    • We use text-conditioned monocular video models ([CogVideoX], [Wan2.2], [HunyuanVideo], etc.) to distill rich motion priors. We will add detailed instructions soon.
  • Geometry Priors Distillation

    • We use multi-view video models ([SV3D], [SV4D]) to obtain geometry priors by generating novel views.

You can skip this step and download our processed example data (51 Trump motions) from Google Drive:

mkdir data && cd data && gdown 1b0_2t_KKhOyKlJsYncUcQm6URecAS6M6 && tar -zxvf data_trump_n51_step20.tar.gz && cd ..

🚀 Training

Intuition: jointly model diverse 3D motions in a shared latent space. To train DIMO, simply run:

sh run_train_latent.sh
  • NOTE: You can modify the hyperparameters in run_train_latent.sh as needed. Check configs/train_config.yaml to see all configurable parameters and their default settings.
  • NOTE: Set the vae_latent flag to True to enforce a Gaussian distribution on the motion latent codes, which also enables the KL-divergence loss during training.
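With vae_latent=True, the added regularizer is presumably the standard closed-form KL between a diagonal Gaussian posterior N(μ, σ²) and the unit Gaussian prior, as in a VAE. A minimal sketch in plain Python (the function name and summed reduction are illustrative, not the repo's exact implementation):

```python
import math

def kl_to_unit_gaussian(mu, log_var):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dims.

    KL = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    """
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )

# A code matching the prior (mu=0, log_var=0) incurs no penalty;
# shifting the mean away from zero is penalized quadratically.
print(kl_to_unit_gaussian([1.0, 0.0], [0.0, 0.0]))  # 0.5
```

Minimizing this term pulls every motion latent toward the unit Gaussian, which is what makes sampling and interpolating in the latent space well-behaved.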

✨ Testing

You can also skip training and download our pre-trained model from Google Drive for testing:

mkdir ckpts && cd ckpts && gdown 1-a9JxXvoGRV_qy5ontRShc4mgDgVkrsd && tar -zxvf ckpt_trump_n51_step20.tar.gz && cd ..

Once trained, you can perform 4D rendering and visualize key point trajectories by running:

sh run_test_motion.sh
  • NOTE: You can specify rendering motions by modifying the render_videos list in run_test_motion.sh; uncomment the corresponding lines to render all motions (it may take some time).

The rendered key point trajectories will look like this (Trump is walking):

key_point_3d_trajectory.mp4

The 4D rendering results should look like this (reference, fixed view, orbit views):

4d_rendering.mp4
  • NOTE: Since the video models we used for motion prior distillation were imperfect at the time, the generated videos may contain artifacts. We will update the code and models with more advanced video models such as Veo3 and SV4D2.0 in the future.

If you have any questions, please feel free to open an issue or email at [email protected].

🚦 Applications

With the learned motion latent space, we provide scripts to test the following applications. Simply add the corresponding flags in run_test_motion.sh and run it. We also provide some visualization results below. More instructions will be added soon.

  • Latent Space Motion Interpolation

Add test_interpolation=True in run_test_motion.sh

interpolation.mp4
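Interpolating in the learned latent space amounts to blending two motion codes, typically linearly or along the sphere. A hedged sketch in plain Python (flat lists stand in for the repo's tensors; `lerp`/`slerp` are our illustrative names, not the repo's API):

```python
import math

def lerp(a, b, t):
    """Linear blend between two latent codes (t in [0, 1])."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]

def slerp(a, b, t, eps=1e-8):
    """Spherical blend: follows the great-circle arc between a and b,
    which preserves the norm of unit-length codes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    omega = math.acos(max(-1.0, min(1.0, dot / (norm + eps))))
    if abs(math.sin(omega)) < eps:  # nearly parallel codes: fall back to lerp
        return lerp(a, b, t)
    sa = math.sin((1 - t) * omega) / math.sin(omega)
    sb = math.sin(t * omega) / math.sin(omega)
    return [sa * x + sb * y for x, y in zip(a, b)]

# Midpoint between two orthogonal unit codes stays on the unit sphere.
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Decoding the blended code at a sweep of t values yields the smooth motion transitions shown in the video above.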
  • Language-Guided Motion Generation

Add test_language=True in run_test_motion.sh

language.mp4
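Language guidance presumably maps a text prompt into the motion latent space. As a purely illustrative stand-in (not the repo's method), one can retrieve the nearest training latent by cosine similarity over precomputed text embeddings; every name and value below is hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_latent(query_emb, bank):
    """Return the motion name whose text embedding best matches the query.

    `bank` maps motion names to (text_embedding, latent_code) pairs.
    """
    return max(bank, key=lambda name: cosine(query_emb, bank[name][0]))

# Toy two-motion bank with made-up 2-D embeddings, for illustration only.
bank = {
    "walking": ([1.0, 0.1], [0.3, -0.2]),
    "waving":  ([0.1, 1.0], [0.8, 0.5]),
}
print(retrieve_latent([0.9, 0.2], bank))  # walking
```

The retrieved latent code would then be decoded into a 3D motion, the same way any sampled code is.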
  • Test Motion Reconstruction

Add test_motion=True in run_test_motion.sh

test_motion.mp4
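Reconstruction quality for a motion can be summarized by the mean per-keypoint L2 error between reconstructed and reference trajectories. A stdlib-only sketch (the metric choice here is ours for illustration, not necessarily what the repo reports):

```python
import math

def mean_trajectory_error(pred, ref):
    """Mean Euclidean distance between matched 3D keypoints over all frames.

    pred, ref: lists of frames; each frame is a list of (x, y, z) keypoints.
    """
    total, count = 0.0, 0
    for frame_p, frame_r in zip(pred, ref):
        for (px, py, pz), (rx, ry, rz) in zip(frame_p, frame_r):
            total += math.sqrt((px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2)
            count += 1
    return total / count

# Identical trajectories give zero error.
ref = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
print(mean_trajectory_error(ref, ref))  # 0.0
```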

🌸 Acknowledgement

Our code is built on top of DreamGaussian, CogVideoX, SV4D. Many thanks to the authors for sharing their code. We also greatly appreciate the help from Yiming Xie.

📝 Citation

If you find this paper useful for your research, please consider citing:

@inproceedings{mou2025dimo,
  title={DIMO: Diverse 3D Motion Generation for Arbitrary Objects},
  author={Mou, Linzhan and Lei, Jiahui and Wang, Chen and Liu, Lingjie and Daniilidis, Kostas},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={14357--14368},
  year={2025}
}
