Official code for the paper "DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image".
[Project Page] [Paper] [Video]
Abstract: Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
Create a Conda environment:

```bash
conda create -n dice python=3.9
conda activate dice
```
Install required packages:

```bash
pip install -r requirements.txt
```
Install manopth:

```bash
git clone https://github.com/hassony2/manopth.git && cd manopth && git checkout 4f1dcad && pip install -e . && cd ..
```
Install pytorch3d:

```bash
git clone https://github.com/facebookresearch/pytorch3d.git && cd ./pytorch3d && git checkout tags/v0.7.2 && pip install -e . && cd ..
```
Install apex following the instructions in METRO.
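For reference, a minimal sketch of the from-source apex build that METRO's setup describes; the exact flags depend on your CUDA and PyTorch versions, so treat this as an assumption and defer to METRO's installation guide:

```bash
# Build NVIDIA apex from source with CUDA extensions (per METRO's setup;
# newer apex revisions may require a different pip-based build).
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
cd ..
```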
- Run `sh download_models.sh` in the root folder to download the pretrained HRNet-W64 checkpoint.
- Create the folder `src/common/utils/human_model_files` and download the relevant files according to this instruction.
- Download `head_mesh_transforms.pt` and `hand_mesh_transforms.pt` here and save them to the root folder.
- Download `head_ref_vs.pt`, `rh_ref_vs.pt`, and `stiffness_final.npy` here and place them in `src/modeling/data/`.
- Download `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` from SMPLify and place it in `src/modeling/data`.
- Download `MANO_RIGHT.pkl` from MANO and place it in `src/modeling/data`.
- Download `model.bin` from here and place it in `checkpoints`.

A sketch of where the downloaded files should end up follows this list.
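As a quick sanity check, here is a hedged sketch (assuming every file was downloaded into the repository root) of moving everything into the paths listed above:

```bash
# Create the target folders and move the downloaded files into place
# (paths taken from the list above; adjust if you downloaded elsewhere).
mkdir -p checkpoints src/modeling/data src/common/utils/human_model_files
mv model.bin checkpoints/
mv head_ref_vs.pt rh_ref_vs.pt stiffness_final.npy src/modeling/data/
mv basicModel_neutral_lbs_10_207_0_v1.0.0.pkl MANO_RIGHT.pkl src/modeling/data/
# head_mesh_transforms.pt and hand_mesh_transforms.pt stay in the root folder.
```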
To run inference on the sample images in `assets/images`, use our script: `sh infer.sh`. Visualizations and output meshes are saved to `output/example_inference`.
For best results, crop input images so that the head and face are near the center before running inference.
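One way to do this cropping, as a hypothetical example (assumes ImageMagick is installed; the file names are illustrative and not part of this repo):

```bash
# Center-crop an image to 80% of its width and height so the head and face
# region sits near the middle, then write it into the inference input folder.
convert my_photo.jpg -gravity Center -crop 80%x80%+0+0 +repage assets/images/my_photo_cropped.jpg
```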
```bibtex
@inproceedings{wudice,
  title={DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image},
  author={Wu, Qingxuan and Dou, Zhiyang and Xu, Sirui and Shimada, Soshi and Wang, Chen and Yu, Zhengming and Liu, Yuan and Lin, Cheng and Cao, Zeyu and Komura, Taku and others},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}
```
Our implementation and experiments are built on top of open-source GitHub repositories. We thank all the authors who made their code public, which tremendously accelerated our project's progress.
