
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations [CVPR 2025]

Official PyTorch implementation for the paper:

DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations, CVPR 2025.

Ziqiao Peng, Yanbo Fan, Haoyu Wu, Xuan Wang, Hongyan Liu, Jun He, Zhaoxin Fan

Paper | Project Page | Code

Comparison of single-role models (Speaker-Only and Listener-Only) with DualTalk. Unlike single-role models, which lack key interaction elements, DualTalk supports transitions between speaking and listening roles, multi-round conversations, and natural interaction.

Environment

  • Linux
  • Python 3.8+ (the conda environment below uses 3.8.8)
  • PyTorch 1.12.1
  • CUDA 11.3
  • ffmpeg
  • MPI-IS/mesh
  • pytorch3d

Clone the repo:

git clone https://github.com/ZiqiaoPeng/DualTalk.git
cd DualTalk

Create conda environment:

conda create -n dualtalk python=3.8.8
conda activate dualtalk
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
python install_pytorch3d.py

Before installation, create an account on the FLAME website and have your login and password ready; the installation script will ask for them. Then run the install.sh script to download the FLAME data and set up the environment.

bash install.sh
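
After installation, you can sanity-check the environment with a short Python snippet. This is a minimal sketch based on the dependency list above; it only verifies that the key packages import and report the expected versions:

# Sanity-check the key dependencies listed above.
import torch

print("PyTorch:", torch.__version__)        # expect 1.12.1+cu113
print("CUDA available:", torch.cuda.is_available())
print("CUDA build:", torch.version.cuda)    # expect 11.3

import pytorch3d
print("PyTorch3D:", pytorch3d.__version__)

from psbody.mesh import Mesh                # provided by MPI-IS/mesh
print("MPI-IS mesh import OK")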

Demo

Download the pretrained model.

# If you are in China, you can set up a mirror.
# export HF_ENDPOINT=https://hf-mirror.com
pip install huggingface-hub
huggingface-cli download ZiqiaoPeng/DualTalk --local-dir model
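
The same checkpoint can also be fetched from Python via huggingface_hub, which is equivalent to the CLI call above:

# Download the pretrained DualTalk checkpoint into ./model
from huggingface_hub import snapshot_download

snapshot_download(repo_id="ZiqiaoPeng/DualTalk", local_dir="model")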

Given the audio and blendshape data, run:

python demo.py --audio1_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker2.wav --audio2_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.wav --bs2_path ./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.npz

The results will be saved to the result_DualTalk folder.
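
To substitute your own inputs, you can first inspect a FLAME-parameter .npz archive with NumPy. This is a minimal sketch; the key names inside the archive depend on how the parameters were exported:

import numpy as np

# List the arrays stored in one of the demo .npz files.
data = np.load("./demo/xkHwlcDSOjc_sub_video_109_000_speaker1.npz")
for key in data.files:
    print(key, data[key].shape)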

Dataset

Download the dataset from DualTalk_Dataset, unzip it, and place it in the data folder. The data folder format is as follows:

data/
├── train/
│   ├── xxx_speaker1.wav          # Speaker 1 audio files
│   ├── xxx_speaker1.npz          # Speaker 1 FLAME parameters
│   ├── xxx_speaker2.wav          # Speaker 2 audio files
│   ├── xxx_speaker2.npz          # Speaker 2 FLAME parameters
│   └── ...
├── test/
│   ├── xxx_speaker1.wav
│   ├── xxx_speaker1.npz
│   ├── xxx_speaker2.wav
│   ├── xxx_speaker2.npz
│   └── ...
└── ood/                          # Out-of-distribution test data
    ├── xxx_speaker1.wav
    ├── xxx_speaker1.npz
    ├── xxx_speaker2.wav
    ├── xxx_speaker2.npz
    └── ...
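
Each conversation is stored as four files sharing a common prefix. Below is a minimal sketch of how the pairs can be enumerated; the actual data loader in this repo may organize things differently:

import os
from glob import glob

# Group the files of each conversation in data/train by shared prefix.
for wav1 in sorted(glob("data/train/*_speaker1.wav")):
    prefix = wav1[: -len("_speaker1.wav")]
    sample = {
        "audio1": prefix + "_speaker1.wav",
        "flame1": prefix + "_speaker1.npz",
        "audio2": prefix + "_speaker2.wav",
        "flame2": prefix + "_speaker2.npz",
    }
    assert all(os.path.exists(p) for p in sample.values()), prefix
    print(prefix)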

Training and Testing

Training

  • To train the model, run:

     python main.py
    

    You can find the trained models in the save_DualTalk folder.

Testing

  • To test the model, run:

    python test.py
    

    The results will be saved to the result_DualTalk folder.

Visualization

  • To visualize the results, run:

     cd render
     python render_dualtalk_output.py
    

    You can find the outputs in the result_DualTalk folder.

  • To stitch the two speakers' videos, run:

     cd render
     python two_person_video_stitching.py
    
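
If the script composes the two speakers side by side (as its name suggests), ffmpeg's hstack filter achieves a similar result. A minimal sketch; the input file names are placeholders for the rendered clips in result_DualTalk:

import subprocess

# Place the two rendered speaker videos side by side with ffmpeg's
# hstack filter; stream selection is left to ffmpeg's defaults.
# The file names below are placeholders.
subprocess.run([
    "ffmpeg", "-i", "speaker1.mp4", "-i", "speaker2.mp4",
    "-filter_complex", "hstack=inputs=2",
    "-c:v", "libx264", "stitched.mp4",
], check=True)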

Evaluation

  • To evaluate the model, run:

     cd metric
     python metric.py
    

    You can find the metric results in the metric folder.
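
For reference, a widely used error measure for 3D talking heads is the mean per-vertex L2 distance between predicted and ground-truth meshes. A minimal sketch follows; the metrics actually computed by metric.py may differ:

import numpy as np

def mean_vertex_error(pred, gt):
    # pred, gt: (frames, num_vertices, 3) vertex sequences.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example with random stand-in data (FLAME meshes have 5023 vertices).
pred = np.random.rand(100, 5023, 3)
gt = np.random.rand(100, 5023, 3)
print(mean_vertex_error(pred, gt))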

Citation

If you find this code useful, please consider citing:

@inproceedings{peng2025dualtalk,
  title={DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations},
  author={Peng, Ziqiao and Fan, Yanbo and Wu, Haoyu and Wang, Xuan and Liu, Hongyan and He, Jun and Fan, Zhaoxin},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={21055--21064},
  year={2025}
}
