DICE

Introduction

Official code of the paper DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image.

Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.

Inference

Environment Preparation

Create Conda environment:

conda create -n dice python=3.9
conda activate dice

Install required packages:

pip install -r requirements.txt

Install manopth:

git clone https://github.com/hassony2/manopth.git && cd manopth && git checkout 4f1dcad && pip install -e . && cd ..

Install pytorch3d:

git clone https://github.com/facebookresearch/pytorch3d.git && cd pytorch3d && git checkout tags/v0.7.2 && pip install -e . && cd ..

Install apex by following the instructions in METRO (microsoft/MeshTransformer).
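As a quick sanity check after the steps above, the following sketch reports which of the separately installed packages the active interpreter cannot find. It assumes that manopth, pytorch3d, and apex are the import names of those packages. The helper missing_modules is illustrative and not part of this repository.

```python
import importlib.util

# Assumed top-level import names of the packages installed in the steps above.
REQUIRED_MODULES = ["manopth", "pytorch3d", "apex"]


def missing_modules(names):
    """Return the module names that cannot be located by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules can be found.")
```

Run it inside the activated dice environment. A module that is found can still fail at import time (for example, on a CUDA mismatch), so this only catches packages that are absent.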

Dependency Files

Run sh download_models.sh in the root folder to download the pretrained HRNet-W64 checkpoint.
Create the folder src/common/utils/human_model_files and download the relevant files according to this instruction.
Download head_mesh_transforms.pt and hand_mesh_transforms.pt from here and save them to the root folder.
Download head_ref_vs.pt, rh_ref_vs.pt, and stiffness_final.npy from here and place them in src/modeling/data/.
Download basicModel_neutral_lbs_10_207_0_v1.0.0.pkl from SMPLify, and place it in src/modeling/data.
Download MANO_RIGHT.pkl from MANO, and place it in src/modeling/data.
Download model.bin from here and place it in checkpoints.
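Because the files above come from several sources, it can help to confirm that everything is in place before running inference. The sketch below checks only the paths named in the list. The HRNet-W64 checkpoint is left out because its location is decided by download_models.sh and is not stated here. The helper missing_paths is hypothetical and not part of this repository.

```python
from pathlib import Path

# Paths relative to the repository root, copied from the list above.
EXPECTED_PATHS = [
    "src/common/utils/human_model_files",
    "head_mesh_transforms.pt",
    "hand_mesh_transforms.pt",
    "src/modeling/data/head_ref_vs.pt",
    "src/modeling/data/rh_ref_vs.pt",
    "src/modeling/data/stiffness_final.npy",
    "src/modeling/data/basicModel_neutral_lbs_10_207_0_v1.0.0.pkl",
    "src/modeling/data/MANO_RIGHT.pkl",
    "checkpoints/model.bin",
]


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_paths(".")
    for p in missing:
        print("missing:", p)
    if not missing:
        print("All listed dependency files are present.")
```

The check only tests for existence. It does not verify that the contents of src/common/utils/human_model_files are complete.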

Usage

Run sh infer.sh to perform inference on the sample images in assets/images. Visualizations and output meshes are saved to output/example_inference.

For best results, crop input images so that the head and face are near the center before running inference.
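One way to follow the cropping advice is to compute a crop window centered on a region of interest and clamp it to the image bounds. The sketch below is an illustration under assumptions: the function centered_square_crop, the margin parameter, and the choice of a square window are not taken from this repository, and the input box is assumed to come from manual annotation or any detector of your choice.

```python
def centered_square_crop(img_w, img_h, box, margin=1.5):
    """Return (left, top, right, bottom) of a square crop centered on `box`.

    box is (x0, y0, x1, y1) in pixels around the region of interest.
    margin scales the longer side of the box to keep some context around it.
    """
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    # The crop side cannot exceed either image dimension.
    side = int(min(max(x1 - x0, y1 - y0) * margin, img_w, img_h))
    left = int(round(cx - side / 2.0))
    top = int(round(cy - side / 2.0))
    # Shift the window so that it stays inside the image.
    left = max(0, min(left, img_w - side))
    top = max(0, min(top, img_h - side))
    return left, top, left + side, top + side


if __name__ == "__main__":
    # 640x480 image, 40x40 region of interest, 2x margin.
    print(centered_square_crop(640, 480, (300, 200, 340, 240), margin=2.0))
```

The returned tuple uses the (left, upper, right, lower) order that Pillow's Image.crop accepts, so it can be passed to that method directly if Pillow is installed.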

Citation

@inproceedings{wudice,
  title={DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image},
  author={Wu, Qingxuan and Dou, Zhiyang and Xu, Sirui and Shimada, Soshi and Wang, Chen and Yu, Zhengming and Liu, Yuan and Lin, Cheng and Cao, Zeyu and Komura, Taku and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

Acknowledgements

Our implementation and experiments build on the open-source GitHub repositories listed below. We thank all the authors who made their code public, which greatly accelerated our project.

HRNet/HRNet-Image-Classification

huggingface/transformers

microsoft/MeshTransformer

MPI-IS/mesh

facebookresearch/pytorch3d
