DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions

The official implementation of the paper "DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions". DiffH2O is a diffusion-based framework for synthesizing dexterous hand-object interactions: it generates realistic hand-object motion from natural language, generalizes to unseen objects at test time, and enables fine-grained control over the motion through detailed textual descriptions.

(Teaser figure)

Bibtex

If you use this code in your research, please cite:

@inproceedings{christen2024diffh2o,
  title     = {DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions},
  author    = {Christen, Sammy and Hampali, Shreyas and Sener, Fadime and Remelli, Edoardo and Hodan, Tomas and Sauer, Eric and Ma, Shugao and Tekin, Bugra},
  booktitle = {SIGGRAPH Asia 2024 Conference Papers},
  year      = {2024}
}

News

📢 7 July 2025 - First release.

Getting started

This code was tested on Ubuntu 22.04 LTS and requires:

  • Python 3.8
  • Anaconda or Miniconda
  • A CUDA-capable GPU (one is enough)

1. Setup environment

Install ffmpeg (if not already installed):

sudo apt update
sudo apt install ffmpeg

On Windows, follow the official ffmpeg installation instructions for Windows instead.

2. Install dependencies

DiffH2O shares a large part of its base dependencies with GMD. However, due to some key version differences, you might find it easier to install our dependencies from scratch.

Set up the conda environment:

conda config --append channels conda-forge
conda env create -f environment_diffh2o.yml
conda activate diffh2o
pip install -r requirements.txt
conda remove --force ffmpeg
pip install git+https://github.com/openai/CLIP.git
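
As a quick sanity check (a minimal sketch, assuming the environment above installs PyTorch with CUDA support), you can confirm that the GPU is visible:

# assumption: PyTorch is installed by the environment/requirements files above
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"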

3. Download data

Download the data using the following script:

bash prepare/download_representations.sh

4. Download pretrained models

Download the pretrained models using the following script:

bash prepare/download_pretrained_models.sh

5. Download GRAB models

Download the object and subject models from GRAB and place them within the assets folder as follows:

diffh2o
   ├── assets
   │     ├── contact_meshes
   │     ├── female
   │     └── male
   └── ...
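
As a rough sketch, assuming the GRAB download was extracted to ~/GRAB and follows GRAB's usual tools/ layout (the source paths below are illustrative and may differ for your download), the models can be copied into place with:

# source paths are illustrative; adjust to your GRAB download location
mkdir -p assets
cp -r ~/GRAB/tools/object_meshes/contact_meshes assets/contact_meshes
cp -r ~/GRAB/tools/subject_meshes/female assets/female
cp -r ~/GRAB/tools/subject_meshes/male assets/male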

6. Install aitviewer

We use aitviewer for visualization. To install aitviewer, the following folder structure is recommended:

parent_folder
  ├── diffh2o                # This repository
  │     └── ...
  ├── aitviewer              # Visualization tool
  │     └── ...
  └── data                   # Data folder
        └── smplx
              └── mano
                   ├── MANO_LEFT.pkl
                   └── MANO_RIGHT.pkl

If the data is stored elsewhere, change the smplx path in aitviewer's aitvconfig.yaml.

To install aitviewer, carry out the following steps from the parent_folder:

git clone git@github.com:eth-ait/aitviewer.git
cd aitviewer
pip install -e .

Next, you need to download the MANO models from the official page and correctly place them according to the folder structure above.
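
As a sketch, assuming the MANO archive was extracted to ~/mano_v1_2 (the source path and archive layout are assumptions; adjust to your download), the models can be placed from parent_folder with:

# source path is an assumption; adjust to where the MANO archive was extracted
mkdir -p data/smplx/mano
cp ~/mano_v1_2/models/MANO_LEFT.pkl data/smplx/mano/
cp ~/mano_v1_2/models/MANO_RIGHT.pkl data/smplx/mano/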

See aitviewer instructions for more information.

GRAB Dataset Annotations

We provide textual annotations for the GRAB dataset in grab_annotations.csv. Note that there are some minor changes to the GRAB action naming (e.g., our "pick_all" corresponds to "lift" in GRAB).
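
For a quick look at the annotation format (a minimal sketch; adjust the path to wherever grab_annotations.csv lives in your checkout):

# path is an assumption; point this at the annotation file in your checkout
head -n 5 grab_annotations.csv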

Motion Synthesis

Generate from test set prompts

Run our basic full model (single-stage, without guidance) on the simple annotations with:

python -m sample.generate --model_path ./save/diffh2o_full/model000200000.pt --num_samples 16 

Run our basic full model (single-stage, without guidance) on the detailed annotations with:

python -m sample.generate --model_path ./save/diffh2o_full_detailed/model000200000.pt --num_samples 16  --text_detailed

To run the two-stage model with guidance on the simple annotations, run:

python -m sample.generate_2stage --model_path ./save/diffh2o_full/model000200000.pt --num_samples 16 --guidance 

To run the two-stage model with guidance on the detailed annotations, run:

python -m sample.generate_2stage --model_path ./save/diffh2o_full_detailed/model000200000.pt --num_samples 16 --guidance --text_detailed

You may also define the following (see the example after this list):

  • --device to define the GPU id.
  • --seed to sample different prompts.
  • --num_samples to set the number of samples to generate.
  • --physics_metrics to evaluate physics-based metrics.
  • --eval_entire_set to evaluate the entire set of test prompts.
  • --text_detailed to use the detailed annotations for evaluation.
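
For example, a sketch of the single-stage model with a custom seed, a smaller number of samples, and physics-based metrics enabled (the seed and sample count are arbitrary):

# the seed and sample count are arbitrary examples
python -m sample.generate --model_path ./save/diffh2o_full/model000200000.pt --num_samples 8 --seed 42 --physics_metrics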

Visualization

For the visualizations in the paper, we utilize aitviewer.

To visualize generated outputs of our model, the following command can be used:

python visualize/visualize_sequences.py --file_path save/diffh2o_full/samples_000400000/ --is_pca

You may also define the following (see the example after this list):

  • --is_pca if the PCA representation is used for MANO.
  • --pre_grasp to visualize only the grasping phase.
  • --kf_vis to visualize grasp reference frames (if available).
  • --vis_gt to visualize ground-truth sequences from the data.
  • --save_video to save the videos to local storage instead of opening the interactive visualization.
  • --num_reps to set the number of repetitions generated for each prompt.
  • --range_min to set the index of the first sequence to be visualized.
  • --range_max to set the index of the last sequence to be visualized.
  • --resolution to set the resolution of the visualization: either 'high', 'medium', or 'low'.
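
For example, a sketch that saves videos for the first few sequences at medium resolution instead of opening the interactive viewer (the sequence range is arbitrary):

# the range values are arbitrary examples
python visualize/visualize_sequences.py --file_path save/diffh2o_full/samples_000400000/ --is_pca --save_video --range_min 0 --range_max 3 --resolution medium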

Training DiffH2O

DiffH2O is trained on the GRAB dataset.

Grasp Model

To train the grasp model, run the following command:

python -m train.train_grasp

Full Model

We can train two variants of our full interaction model (used with inpainting of the grasping phase).

To train on simple text descriptions, run the following command:

python -m train.train_diffh2o

To train on our detailed text descriptions, run the following command:

python -m train.train_diffh2o_detailed

Interaction-Only Model

The model used for comparison with IMoS, which only models the interaction phase (without inpainting), can be trained via:

python -m train.train_interaction

The training options for the different configs can be found in ./configs/card.py.

Additionally, the following can be added to all training commands (see the example after this list):

  • Use --device to define GPU id.
  • Add --train_platform_type {ClearmlPlatform, TensorboardPlatform} to track results with either ClearML or Tensorboard.
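
For example, a sketch of training the full model on a specific GPU with TensorBoard tracking (the device id is arbitrary):

# the device id is an arbitrary example
python -m train.train_diffh2o --device 0 --train_platform_type TensorboardPlatform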

Acknowledgments

We build our codebase upon previous works. We want to thank the following researchers for their contributions and code:

GMD, GRAB, MDM, guided-diffusion, MotionCLIP, text-to-motion, actor, joints2smpl, MoDi.

License

This code is distributed under a CC-BY-NC license.

Note that our code depends on other libraries, including CLIP, MANO, SMPL-X, and PyTorch3D, and uses several datasets; each of these has its own license that must also be followed.
