Fa-Ting Hong1,2, Zhan Xu2, Haiyang Liu2, Qinjie Lin3, Luchuan Song2, Zhixin Shu2, Yang Zhou2, Duygu Ceylan2, Dan Xu1
1HKUST, 2Adobe Research, 3Northwestern University
FVHuman is able to generate free-viewpoint human videos from multiple images.
For video demonstrations, qualitative comparisons, and detailed experimental analysis, please check out our project page.
Diffusion-based human animation aims to animate a human character based on a source human image as well as driving signals such as a sequence of poses. Leveraging the generative capacity of diffusion models, existing approaches can generate high-fidelity poses, but they struggle with significant viewpoint changes, especially in zoom-in/zoom-out scenarios where the camera-character distance varies. This limits applications such as cinematic shot type planning or camera control.
We propose a pose-correlated reference selection diffusion network that supports substantial viewpoint variations in human animation. Our key idea is to enable the network to utilize multiple reference images as input, since significant viewpoint changes often lead to missing appearance details on the human body. To mitigate the resulting computational cost, we first introduce a novel pose correlation module that computes similarities between non-aligned target and source poses, and then propose an adaptive reference selection strategy that utilizes the attention map to identify key regions for animation generation.
- Free-viewpoint Human Animation: Generate human videos with substantial viewpoint changes
- Pose-correlated Reference Selection: Intelligent selection of relevant reference regions
- Multi-reference Input: Utilizes multiple reference images for comprehensive appearance modeling
- Adaptive Selection Strategy: Attention-based identification of key regions for animation
- Large Viewpoint Variations: Supports zoom-in/zoom-out scenarios and camera control
```bash
# Clone the repository
git clone https://github.com/harlanhong/FVHuman.git
cd FVHuman

# Create conda environment
conda create -n fvhuman python=3.8
conda activate fvhuman

# Install dependencies
pip install -r requirements.txt
```
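After installation, a quick sanity check can confirm that PyTorch sees your GPU. This is an optional sketch that assumes PyTorch is installed via `requirements.txt` (the README does not list its contents):

```python
# Optional sanity check after installation.
# Assumes PyTorch is pulled in by requirements.txt (not shown above).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```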
Training consists of two stages. You can specify the GPU device using `CUDA_VISIBLE_DEVICES`:

Stage 1:

```bash
CUDA_VISIBLE_DEVICES=1 accelerate launch train_s1.py --config config_s1.yaml --exp_name stage1
```

Stage 2:

```bash
CUDA_VISIBLE_DEVICES=1 accelerate launch train_s2.py --config config_s2.yaml --exp_name stage2
```

Run inference with trained models:
```bash
CUDA_VISIBLE_DEVICES=5 python inference_video_full.py \
    --config configs/inference/inference_ted.yaml \
    --checkpoint_path state2_ted_full/net-40000.pth \
    --save_name user_test/rst.mp4
```

Make sure you have the proper configuration files:

- `config_s1.yaml` - Stage 1 training configuration
- `config_s2.yaml` - Stage 2 training configuration
- `configs/inference/inference_ted.yaml` - Inference configuration
Download the pre-trained model checkpoints from: Checkpoint Download Link
The trained model checkpoint should be placed at:

- `state2_ted_full/net-40000.pth` - Stage 2 trained model
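Before running inference, you may want to verify the checkpoint is in place. The sketch below only checks the path and loads the file with standard PyTorch; the internal key layout of the checkpoint is repo-specific and not documented here:

```python
# Minimal sketch: verify the Stage 2 checkpoint exists and can be loaded.
# The internal structure of the .pth file is repo-specific; we only inspect it.
import os
import torch

ckpt_path = "state2_ted_full/net-40000.pth"
assert os.path.isfile(ckpt_path), f"Checkpoint not found at {ckpt_path}"

state = torch.load(ckpt_path, map_location="cpu")
if isinstance(state, dict):
    print(f"Loaded checkpoint with {len(state)} top-level entries")
else:
    print(f"Loaded checkpoint of type {type(state).__name__}")
```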
We introduce the Multi-Shot TED (MSTed) dataset, designed to capture significant variations in viewpoints and camera distances:
- 1,084 unique identities
- 15,260 video clips
- ~30 hours of total content
- Diverse viewpoints and camera distances
- Professional quality TED talk videos
Dataset Download: Link
```
data/
├── msted/
│   ├── videos/
│   │   ├── identity_001/
│   │   │   ├── clip_001.mp4
│   │   │   └── ...
│   │   └── ...
│   ├── poses/
│   │   ├── identity_001/
│   │   │   ├── clip_001_poses.json
│   │   │   └── ...
│   │   └── ...
│   └── metadata.json
```
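As a reading aid, here is a minimal sketch of how this layout could be traversed to pair each clip with its pose file. The schema inside `*_poses.json` and `metadata.json` is not documented here, so the sketch only loads and reports them:

```python
# Sketch: walk the MSTed layout shown above and pair clips with pose files.
# The contents of the JSON files are repo-specific and only inspected here.
import json
from pathlib import Path

root = Path("data/msted")

metadata = json.loads((root / "metadata.json").read_text())
print("metadata entries:", len(metadata))

for video_path in sorted(root.glob("videos/*/*.mp4")):
    identity = video_path.parent.name  # e.g. identity_001
    clip_id = video_path.stem          # e.g. clip_001
    pose_path = root / "poses" / identity / f"{clip_id}_poses.json"
    if not pose_path.is_file():
        print(f"[warn] missing poses for {identity}/{clip_id}")
        continue
    poses = json.loads(pose_path.read_text())
    print(f"{identity}/{clip_id}: {len(poses)} pose entries")
```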
Illustration of our framework. We feed a reference set into a reference UNet to extract reference features. To filter out redundant information in the reference feature set, we propose a pose correlation guider that produces a correlation map indicating the spatially informative regions of each reference. We then adopt a reference selection strategy that picks the informative tokens from the reference feature set according to the correlation map and passes them on to the following modules. A minimal sketch of this selection step is given below the component list.
Our framework consists of:
- Reference UNet: Extracts reference features from multiple input images
- Pose Correlation Module: Computes similarities between target and source poses
- Adaptive Reference Selection: Selects informative tokens based on correlation maps
- Animation Generation: Synthesizes final human animation
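To make the selection idea above concrete, here is a minimal, illustrative PyTorch sketch: a correlation map between target-pose and reference-pose features scores the reference tokens, and only the top-scoring tokens are kept for the subsequent attention layers. The tensor shapes, scoring rule, and keep ratio are assumptions for illustration, not the released implementation:

```python
# Illustrative sketch (not the released code): score reference tokens by
# target/reference pose correlation, then keep only the most informative ones.
import torch
import torch.nn.functional as F


def select_reference_tokens(ref_feats, ref_pose_feats, tgt_pose_feats, keep_ratio=0.25):
    """
    ref_feats:      (B, R, N, C) appearance tokens from the reference UNet
    ref_pose_feats: (B, R, N, C) pose features aligned with the reference tokens
    tgt_pose_feats: (B, M, C)    pose features of the target frame
    Returns kept tokens of shape (B, K, C), with K = keep_ratio * R * N.
    """
    B, R, N, C = ref_feats.shape
    ref_feats = ref_feats.reshape(B, R * N, C)
    ref_pose_feats = ref_pose_feats.reshape(B, R * N, C)

    # Correlation map between target pose and (non-aligned) reference poses.
    corr = torch.einsum("bmc,bnc->bmn", tgt_pose_feats, ref_pose_feats) / C ** 0.5
    corr = F.softmax(corr, dim=-1)                  # (B, M, R*N)

    # Score each reference token by the strongest attention it receives.
    token_scores = corr.max(dim=1).values           # (B, R*N)

    k = max(1, int(keep_ratio * R * N))
    top_idx = token_scores.topk(k, dim=-1).indices  # (B, K)
    kept = torch.gather(ref_feats, 1, top_idx.unsqueeze(-1).expand(-1, -1, C))
    return kept  # fed to the following attention modules


# Toy usage with random tensors:
B, R, N, M, C = 1, 4, 64, 64, 32
kept = select_reference_tokens(
    torch.randn(B, R, N, C), torch.randn(B, R, N, C), torch.randn(B, M, C)
)
print(kept.shape)  # torch.Size([1, 64, 32]) with keep_ratio=0.25
```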
Our method achieves superior performance compared to SOTA methods under large viewpoint changes:
- Qualitative Results: High-fidelity human animation with diverse viewpoints
- Quantitative Evaluation: Improved metrics on viewpoint variation scenarios
- User Studies: Preferred by users for realistic viewpoint transitions
For detailed experimental results and visual comparisons, please refer to our project page and the full paper.
If you find this work useful for your research, please cite:
```bibtex
@article{hong2024fvhuman,
  author  = {Hong, Fa-Ting and Xu, Zhan and Liu, Haiyang and Lin, Qinjie and Song, Luchuan and Shu, Zhixin and Zhou, Yang and Ceylan, Duygu and Xu, Dan},
  title   = {Free-viewpoint Human Animation with Pose-correlated Reference Selection},
  journal = {CVPR},
  year    = {2025},
}
```

- Project Page: https://harlanhong.github.io/publications/fvhuman/index.html
- Paper: arXiv
- CVPR 2025: Conference Page
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
We thank the creators of TED talks for providing diverse and high-quality video content that made the MSTed dataset possible. We also acknowledge the support from HKUST and Adobe Research.
For questions and collaborations, please contact:
- Fa-Ting Hong: [email protected]

