
RAM-Avatar: Real-Time Photo-Realistic Full-Body Avatar from Monocular Video

(CVPR 2024)

Xiang Deng¹, Zerong Zheng², Yuxiang Zhang¹, Jingxiang Sun¹, Chao Xu², XiaoDong Yang³, Lizhen Wang¹, Yebin Liu¹

¹Tsinghua University  ²NNKosmos Technology  ³Li Auto


📄 Paper  |  🎥 Video Demo


🌟 Overview

We present RAM-Avatar, a novel approach for learning real-time, photo-realistic full-body avatars from monocular videos. Our method enables full-body control with high-fidelity rendering of facial expressions, hand gestures, and body textures — all while maintaining real-time performance.

Sample Results

📚 Abstract

This paper advances the practicality of human avatar learning by introducing RAM-Avatar, a framework that learns real-time, photo-realistic avatars from monocular videos with full-body controllability. To model fine-grained variations in facial expressions and hand gestures, we employ dedicated statistical templates. For the body, a sparsely computed dual attention mechanism enhances texture fidelity on torso and limbs. Built upon this, a lightweight yet powerful StyleUNet architecture, coupled with a temporal-aware discriminator, enables efficient and realistic rendering at real-time speeds. To ensure robust animation under out-of-distribution poses, we propose a Motion Distribution Alignment (MDA) module that reduces domain shift between training and inference. Extensive experiments validate the superiority of our method in both qualitative and quantitative evaluations. We further demonstrate its practical potential via a real-time live avatar system. Code and models will be released for research purposes.

Method Pipeline

⚙️ Requirements

- Python 3.9.17
- PyTorch 2.0.0+cu118
- TorchVision 0.15.1+cu118
- setuptools 68.0.0
- scikit-image 0.22.0
- numpy 1.25.2

💡 We recommend using a Conda environment:

conda create -n ramavatar python=3.9
conda activate ramavatar
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.0+cu118 --index-url https://download.pytorch.org/whl/cu118
pip install scikit-image==0.22.0 numpy==1.25.2 setuptools==68.0.0
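To confirm that the active `ramavatar` environment matches the pinned versions in the Requirements list, a small check like the one below can be run. It is a minimal sketch and not part of the repository: the script and its `check_versions` helper are hypothetical, and it assumes the PyTorch and TorchVision wheels report the `+cu118` local version tag exactly as listed.

```python
"""Hypothetical helper: compare installed package versions with the README pins."""
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Requirements list; the CUDA wheels carry a "+cu118" tag.
EXPECTED = {
    "torch": "2.0.0+cu118",
    "torchvision": "0.15.1+cu118",
    "setuptools": "68.0.0",
    "scikit-image": "0.22.0",
    "numpy": "1.25.2",
}


def check_versions(expected=EXPECTED, lookup=version):
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    # The README pins Python 3.9.17; only the minor version is checked here.
    if sys.version_info[:2] != (3, 9):
        print(f"warning: Python {sys.version.split()[0]} (README uses 3.9.17)")
    for pkg, (want, have) in check_versions().items():
        print(f"{pkg}: expected {want}, found {have or 'not installed'}")
```

Save it under any name (for example `check_env.py`) and run `python check_env.py`. It prints nothing when every pin matches and the interpreter is Python 3.9.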

🗂️ Dataset Preparation

To train RAM-Avatar, prepare your dataset as follows:

  1. Estimate SMPL-X parameters using ProxyCapV2.
  2. Fit FaceVerse parameters for facial dynamics.
  3. Render SMPL and facial maps using PyTorch3D.
  4. Organize the data directory structure:
dataset/train/
├── keypoints_mmpose_hand/
│   ├── 00000001.json
│   ├── 00000002.json
│   └── ...
├── smpl_map/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── smpl_map_001/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── track2/
│   ├── 00000001.png
│   ├── 00000002.png
│   └── ...
├── 00000001.png          # Original frames
├── 00000002.png
└── ...
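Before training, it can help to confirm that every original frame has its per-frame companion files. The sketch below is hypothetical and not taken from the repository: it assumes each top-level `<frame>.png` needs a same-named `.json` in `keypoints_mmpose_hand/` and a same-named `.png` in `smpl_map/`, `smpl_map_001/` and `track2/`, mirroring the tree shown. The actual data loader's requirements may differ.

```python
"""Hypothetical checker for the dataset/train/ layout described in this README."""
import sys
from pathlib import Path

# Assumed companion folder -> file extension, mirroring the directory tree.
COMPANIONS = {
    "keypoints_mmpose_hand": ".json",
    "smpl_map": ".png",
    "smpl_map_001": ".png",
    "track2": ".png",
}


def find_missing(root):
    """Return (frame_stems, missing), where missing maps a stem to absent files."""
    root = Path(root)
    # Only top-level PNGs are original frames; glob("*.png") does not recurse.
    stems = sorted(p.stem for p in root.glob("*.png"))
    missing = {}
    for stem in stems:
        absent = [
            f"{folder}/{stem}{ext}"
            for folder, ext in COMPANIONS.items()
            if not (root / folder / f"{stem}{ext}").is_file()
        ]
        if absent:
            missing[stem] = absent
    return stems, missing


if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else "dataset/train")
    if not root.is_dir():
        print(f"{root} is not a directory")
    else:
        stems, missing = find_missing(root)
        print(f"{len(stems)} frames, {len(missing)} with missing companions")
        for stem, files in missing.items():
            print(stem, "->", ", ".join(files))
```

For example, `python check_dataset.py dataset/train` (the script name is arbitrary) lists every frame whose keypoint JSON or rendered maps are absent.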

📦 Pretrained Checkpoints & Datasets

You can download our pre-trained models and sample datasets here:

⚠️ Note: These links are hosted on Baidu Netdisk. International users may need a download accelerator.


🏃 Training

Single GPU:

CUDA_VISIBLE_DEVICES=0 python main_train.py --from_json configs/train.json --name train --nump 0

Multi-GPU (4 GPUs):

CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --from_json configs/train.json --name train --nump 4

🧪 Testing

CUDA_VISIBLE_DEVICES=0 python main_test.py --from_json configs/test.json --name train --nump 0

🙏 Acknowledgements

This work is built upon the following excellent open-source projects. We thank the authors for their contributions:


📎 Citation

If you find our work useful in your research, please cite:

@inproceedings{deng2024ram,
  title     = {RAM-Avatar: Real-Time Photo-Realistic Avatar from Monocular Videos with Full-Body Control},
  author    = {Deng, Xiang and Zheng, Zerong and Zhang, Yuxiang and Sun, Jingxiang and Xu, Chao and Yang, Xiaodong and Wang, Lizhen and Liu, Yebin},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {1996--2007},
  year      = {2024}
}

© 2025 RAM-Avatar Authors. This project is for academic purposes only.
