Official PyTorch implementation for the paper:
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations, CVPR 2025.
Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan
Paper | Project Page | Code
Comparison of single-role models (Speaker-Only and Listener-Only) with DualTalk. Unlike single-role models, which lack key interaction elements, DualTalk supports transitions between speaking and listening roles, multi-round conversations, and natural interaction.
Environment requirements:
- Linux
- Python 3.6+
- PyTorch 1.12.1
- CUDA 11.3
- ffmpeg
- MPI-IS/mesh
- pytorch3d
Clone the repo:
git clone https://github.com/ZiqiaoPeng/DualTalk.git
cd DualTalk
Create the conda environment:
conda create -n dualtalk python=3.8.8
conda activate dualtalk
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
python install_pytorch3d.py
Before installation, you need to create an account on the FLAME website and have your login and password ready; the installation script will ask you to provide them. Then run the install.sh script to download the FLAME data and install the environment.
bash install.sh
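After installation, a minimal sanity check (not part of the repo) can confirm that the CUDA build of PyTorch and pytorch3d import correctly:
# sanity_check.py -- hypothetical helper, not shipped with the repo
import torch
import pytorch3d

print("torch:", torch.__version__)            # expected: 1.12.1+cu113
print("CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)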
Download the pretrained model:
# If you are in China, you can set up a mirror.
# export HF_ENDPOINT=https://hf-mirror.com
pip install huggingface-hub
huggingface-cli download ZiqiaoPeng/DualTalk --local-dir model
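Equivalently, if you prefer the Python API over the CLI, the same download can be done with huggingface_hub (a sketch; the CLI command above is the intended route):
# download_model.py -- equivalent to the CLI command above
from huggingface_hub import snapshot_download

snapshot_download(repo_id="ZiqiaoPeng/DualTalk", local_dir="model")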
Given the audio and blendshape data, run:
python demo.py --audio1_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker2.wav --audio2_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.wav --bs2_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.npz
The results will be saved to the result_DualTalk folder.
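If you want to see what the conditioning .npz file contains before running the demo, a quick inspection sketch (it simply lists whatever arrays are stored; no key names are assumed):
# inspect_npz.py -- list the arrays stored in a FLAME/blendshape .npz file
import numpy as np

data = np.load("./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.npz")
for key in data.files:
    print(key, data[key].shape, data[key].dtype)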
Download the dataset from DualTalk_Dataset, unzip it, and place it in the data folder. The data folder format is as follows:
data/
├── train/
│   ├── xxx_speaker1.wav   # Speaker 1 audio files
│   ├── xxx_speaker1.npz   # Speaker 1 FLAME parameters
│   ├── xxx_speaker2.wav   # Speaker 2 audio files
│   ├── xxx_speaker2.npz   # Speaker 2 FLAME parameters
│   └── ...
├── test/
│   ├── xxx_speaker1.wav
│   ├── xxx_speaker1.npz
│   ├── xxx_speaker2.wav
│   ├── xxx_speaker2.npz
│   └── ...
└── ood/                   # Out-of-distribution test data
    ├── xxx_speaker1.wav
    ├── xxx_speaker1.npz
    ├── xxx_speaker2.wav
    ├── xxx_speaker2.npz
    └── ...
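For reference, a hypothetical sketch (not part of the training code) of how each conversation's speaker1/speaker2 files can be paired by their shared filename prefix:
# pair_data.py -- hypothetical helper: group each conversation's four files
from pathlib import Path

def pair_conversations(split_dir):
    """Map conversation id -> the two speakers' audio and FLAME files."""
    split_dir = Path(split_dir)
    pairs = {}
    for npz in sorted(split_dir.glob("*_speaker1.npz")):
        conv_id = npz.name[: -len("_speaker1.npz")]
        pairs[conv_id] = {
            "audio1": split_dir / f"{conv_id}_speaker1.wav",
            "flame1": split_dir / f"{conv_id}_speaker1.npz",
            "audio2": split_dir / f"{conv_id}_speaker2.wav",
            "flame2": split_dir / f"{conv_id}_speaker2.npz",
        }
    return pairs

print(len(pair_conversations("data/train")), "training conversations found")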
- To train the model, run:
  python main.py
  You can find the trained models in the save_DualTalk folder.
- To test the model, run:
  python test.py
  The results will be saved to the result_DualTalk folder.
- To visualize the results, run:
  cd render
  python render_dualtalk_output.py
  You can find the outputs in the result_DualTalk folder.
- To stitch the two speakers' videos, run (an illustrative ffmpeg sketch follows after this list):
  cd render
  python two_person_video_stitching.py
- To evaluate the model, run:
  cd metric
  python metric.py
  You can find the metric results in the metric folder.
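As referenced in the stitching step above, one way to combine two rendered videos side by side is a plain ffmpeg hstack call. The sketch below is only an illustration of the idea; the file names are placeholders and it may not match what two_person_video_stitching.py actually does:
# stitch_sketch.py -- illustrative only; file names are placeholders
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "speaker1_render.mp4",   # rendered video of speaker 1
    "-i", "speaker2_render.mp4",   # rendered video of speaker 2
    "-filter_complex", "[0:v][1:v]hstack=inputs=2[v]",
    "-map", "[v]",                 # audio handling omitted for brevity
    "two_person.mp4",
], check=True)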
If you find this code useful, please consider citing:
@inproceedings{peng2025dualtalk,
  title={DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations},
author={Peng, Ziqiao and Fan, Yanbo and Wu, Haoyu and Wang, Xuan and Liu, Hongyan and He, Jun and Fan, Zhaoxin},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={21055--21064},
year={2025}
}