Paper | Video | Project Page
The official implementation of the paper "DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions".
If you use this code in your research, please cite:
@inproceedings{christen2024diffh2o,
title = {DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions},
author = {Christen, Sammy and Hampali, Shreyas and Sener, Fadime and Remelli, Edoardo and Hodan, Tomas and Sauer, Eric and Ma, Shugao and Tekin, Bugra},
booktitle = {SIGGRAPH Asia 2024 Conference Papers},
year = {2024}
}
7/July/25 - First release.
This code was tested on Ubuntu 22.04 LTS and requires:
- Python 3.8
- conda3 or miniconda3
- CUDA-capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg

For Windows, use this instead.
DiffH2O shares a large part of its base dependencies with GMD. However, you might find it easier to install our dependencies from scratch due to some key version differences.
Setup conda env:
conda config --append channels conda-forge
conda env create -f environment_diffh2o.yml
conda activate diffh2o
pip install -r requirements.txt
conda remove --force ffmpeg
pip install git+https://github.com/openai/CLIP.git
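As a quick sanity check that the environment is usable, you can verify that PyTorch sees the GPU and that CLIP imports cleanly (a minimal sketch; it only assumes the packages installed above):

```bash
# Prints the PyTorch version and True if a CUDA-capable GPU is visible.
python -c "import torch, clip; print(torch.__version__, torch.cuda.is_available())"
```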
Download the data using the following script:

bash prepare/download_representations.sh

Download the pretrained models using the following script:
bash prepare/download_pretrained_models.sh

Download the object and subject models from GRAB and place them within the assets folder as follows:
diffh2o
├── assets
│   ├── contact_meshes
│   ├── female
│   └── male
└── ...
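To confirm the GRAB assets ended up in the expected place, here is a minimal check based only on the layout above:

```bash
# Each of these directories should exist and contain the corresponding GRAB files.
ls assets/contact_meshes assets/female assets/male
```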
We use aitviewer for visualization. To install aitviewer, the following folder structure is recommended:
parent_folder
├── diffh2o      # This repository
│   └── ...
├── aitviewer    # Visualization tool
│   └── ...
└── data         # Data folder
    └── smplx
        └── mano
            ├── MANO_LEFT.pkl
            └── MANO_RIGHT.pkl
If the data is stored elsewhere, change the smplx path in aitviewer's aitvconfig.yaml.
To install aitviewer, carry out the following steps from the parent_folder:
git clone [email protected]:eth-ait/aitviewer.git
cd aitviewer
pip install -e .

Next, you need to download the MANO models from the official page and correctly place them according to the folder structure above.
See aitviewer instructions for more information.
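Assuming the folder layout above, you can quickly check that the MANO models are where aitviewer will look for them (run from parent_folder):

```bash
# Both MANO pickle files should be listed without errors.
ls data/smplx/mano/MANO_LEFT.pkl data/smplx/mano/MANO_RIGHT.pkl
```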
We provide textual annotations for the GRAB dataset, which can be found in grab_annotations.csv. There are some minor changes to the GRAB naming (our "pick_all" corresponds to "lift" in GRAB).
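To get a feel for the annotations, you can inspect the file directly (adjust the path to wherever grab_annotations.csv lives in your checkout):

```bash
# Print the header and the first few annotation rows.
head -n 5 grab_annotations.csv
```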
Run our full basic model (single stage without guidance) on the simple annotations with:
python -m sample.generate --model_path ./save/diffh2o_full/model000200000.pt --num_samples 16

Run our full basic model (single stage without guidance) on the detailed annotations with:
python -m sample.generate --model_path ./save/diffh2o_full_detailed/model000200000.pt --num_samples 16 --text_detailed

To run the two-stage model with guidance on the simple annotations, run:
python -m sample.generate_2stage --model_path ./save/diffh2o_full/model000200000.pt --num_samples 16 --guidance

To run the two-stage model with guidance on the detailed annotations, run:
python -m sample.generate_2stage --model_path ./save/diffh2o_full_detailed/model000200000.pt --num_samples 16 --guidance --text_detailed

You may also define:
- `--device` GPU id.
- `--seed` to sample different prompts.
- `--num_samples` the number of samples to generate.
- `--physics_metrics` flag to evaluate physics-based metrics.
- `--eval_entire_set` flag to evaluate the entire set of test prompts.
- `--text_detailed` flag to use the detailed annotations for evaluation.
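For example, an illustrative (not prescribed) combination of these flags that runs the two-stage model with guidance over the entire test set and reports physics-based metrics:

```bash
# Seed and device id are placeholder values; adjust them to your setup.
python -m sample.generate_2stage --model_path ./save/diffh2o_full/model000200000.pt \
    --guidance --eval_entire_set --physics_metrics --seed 10 --device 0
```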
For the visualizations in the paper, we utilize aitviewer.
To visualize generated outputs of our model, the following command can be used:
python visualize/visualize_sequences.py --file_path save/diffh2o_full/samples_000400000/ --is_pca

You may also define:
- `--is_pca` true if the PCA representation is used for MANO.
- `--pre_grasp` true for only visualizing the grasping phase.
- `--kf_vis` true to visualize grasp reference frames (if available).
- `--vis_gt` to visualize ground truth sequences from data.
- `--save_video` to save the videos to local storage instead of interactive visualization.
- `--num_reps` the number of repetitions generated for each prompt.
- `--range_min` the index of the first sequence to be visualized.
- `--range_max` the index of the last sequence to be visualized.
- `--resolution` the resolution of the visualization; either 'high', 'medium', or 'low'.
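For example, an illustrative combination of these flags that saves videos of the grasping phase for the first few sequences instead of opening the interactive viewer:

```bash
# The range values are placeholders; adjust them to the sequences you want to render.
python visualize/visualize_sequences.py --file_path save/diffh2o_full/samples_000400000/ \
    --is_pca --pre_grasp --save_video --range_min 0 --range_max 4 --resolution medium
```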
DiffH2O is trained on the GRAB dataset.
To train the grasp model, run the following command:
python -m train.train_grasp

We can train two variants of our full interaction model (used with inpainting of the grasping phase).
To train on simple text descriptions, run the following command:
python -m train.train_diffh2o

To train on our detailed text descriptions, run the following command:
python -m train.train_diffh2o_detailed

The model used for comparison to IMoS, which only models the interaction phase (without inpainting), can be trained via:
python -m train.train_interaction

The training options for the different configs can be found in ./configs/card.py.
Additionally, the following can be added to all training commands:
- Use `--device` to define the GPU id.
- Add `--train_platform_type {ClearmlPlatform, TensorboardPlatform}` to track results with either ClearML or TensorBoard.
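For example, to train the full model on GPU 0 while logging to TensorBoard (an illustrative combination of the flags above):

```bash
python -m train.train_diffh2o --device 0 --train_platform_type TensorboardPlatform
```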
We build our codebase upon previous works. We want to thank the following researchers for their contributions and code:
GMD, GRAB, MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.
This code is distributed under a CC-BY-NC LICENSE.
Note that our code depends on other libraries, including CLIP, MANO, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.
