# RAM-Avatar: Real-Time Photo-Realistic Avatar from Monocular Videos with Full-Body Control (CVPR 2024)
Xiang Deng1, Zerong Zheng2, Yuxiang Zhang1, Jingxiang Sun1, Chao Xu2, Xiaodong Yang3, Lizhen Wang1, Yebin Liu1

1Tsinghua University 2NNKosmos Technology 3Li Auto
📄 Paper | 🎥 Video Demo
We present RAM-Avatar, a novel approach for learning real-time, photo-realistic full-body avatars from monocular videos. Our method enables full-body control with high-fidelity rendering of facial expressions, hand gestures, and body textures — all while maintaining real-time performance.
This paper advances the practicality of human avatar learning by introducing RAM-Avatar, a framework that learns real-time, photo-realistic avatars from monocular videos with full-body controllability. To model fine-grained variations in facial expressions and hand gestures, we employ dedicated statistical templates. For the body, a sparsely computed dual attention mechanism enhances texture fidelity on the torso and limbs. On top of this, a lightweight yet powerful StyleUNet architecture, coupled with a temporal-aware discriminator, enables efficient and realistic rendering at real-time speeds. To ensure robust animation under out-of-distribution poses, we propose a Motion Distribution Alignment (MDA) module that reduces the domain shift between training and inference. Extensive experiments validate the superiority of our method in both qualitative and quantitative evaluations, and we further demonstrate its practical potential with a real-time live avatar system. Code and models will be released for research purposes.
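The dual attention mechanism is only summarized above. As a rough illustration of the general idea, here is a minimal PyTorch sketch of one common dual-attention design (channel attention followed by spatial attention, in the spirit of CBAM). It is not the exact module used in RAM-Avatar; the class name, reduction ratio, and kernel size are illustrative choices of ours.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Generic dual-attention sketch (channel + spatial attention).

    NOTE: illustrative only -- not the module from the RAM-Avatar paper.
    """
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight feature channels.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: compress channels, re-weight spatial locations.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_gate(x)  # (B, C, H, W), channel re-weighting
        pooled = torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values],
            dim=1,
        )  # (B, 2, H, W): avg- and max-pooled channel descriptors
        return x * self.spatial_gate(pooled)

feat = torch.randn(1, 64, 128, 128)
out = DualAttention(64)(feat)  # same shape as the input
```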
- Python 3.9.17
- PyTorch 2.0.0+cu118
- TorchVision 0.15.1+cu118
- setuptools 68.0.0
- scikit-image 0.22.0
- numpy 1.25.2

💡 We recommend using a Conda environment:
```bash
conda create -n ramavatar python=3.9
conda activate ramavatar
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install scikit-image numpy setuptools
```
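Optionally, you can sanity-check the installation from a Python shell (a quick check of ours, not part of the official instructions):

```python
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 2.0.0+cu118 and 0.15.1+cu118
print(torch.cuda.is_available())                   # should print True for GPU training
```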
To train RAM-Avatar, prepare your dataset as follows:
- Estimate SMPL-X parameters using ProxyCapV2.
- Fit FaceVerse parameters for facial dynamics.
- Render SMPL and facial maps using PyTorch3D.
- Organize the data directory structure:
```
dataset/train/
├── keypoints_mmpose_hand/
│   ├── 00000001.json
│   ├── 00000002.json
│   └── ...
├── smpl_map/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── smpl_map_001/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── track2/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── 00000001.png   # Original frames
├── 00000002.png
└── ...
```
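Before launching training, it can help to verify that every original frame has a matching file in each sub-directory. The snippet below is a small sketch of ours based only on the layout shown above (the folder names come from the tree; the script is not part of the released code):

```python
from pathlib import Path

root = Path("dataset/train")
# Frame ids come from the original frames at the dataset root, e.g. "00000001".
frames = sorted(p.stem for p in root.glob("*.png"))

expected = {
    "keypoints_mmpose_hand": ".json",
    "smpl_map": ".png",
    "smpl_map_001": ".png",
    "track2": ".png",
}

for subdir, ext in expected.items():
    present = {p.stem for p in (root / subdir).glob(f"*{ext}")}
    missing = [f for f in frames if f not in present]
    if missing:
        print(f"{subdir}: missing {len(missing)} files, e.g. {missing[:3]}")
    else:
        print(f"{subdir}: OK ({len(present)} files)")
```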
You can download our pre-trained models and sample datasets here:
- 🔗 Pretrained Checkpoints (Password: i8gt)
- 🔗 Sample Dataset (Password: e4wp)
⚠️ Note: These links are hosted on Baidu Netdisk. International users may need a download accelerator.
Single GPU:

```bash
CUDA_VISIBLE_DEVICES=0 python main_train.py --from_json configs/train.json --name train --nump 0
```

Multi-GPU (4 GPUs):

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --from_json configs/train.json --name train --nump 4
```

Testing:

```bash
CUDA_VISIBLE_DEVICES=0 python main_test.py --from_json configs/test.json --name train --nump 0
```

This work is built upon excellent open-source projects, and we thank the authors for their contributions.
If you find our work useful in your research, please cite:
```bibtex
@inproceedings{deng2024ram,
  title     = {RAM-Avatar: Real-Time Photo-Realistic Avatar from Monocular Videos with Full-Body Control},
  author    = {Deng, Xiang and Zheng, Zerong and Zhang, Yuxiang and Sun, Jingxiang and Xu, Chao and Yang, Xiaodong and Wang, Lizhen and Liu, Yebin},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {1996--2007},
  year      = {2024}
}
```