Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer

Ruizhi Shao*, Youxin Pang*, Zerong Zheng, Jingxiang Sun, Yebin Liu.


This repository contains the official implementation of "Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer".

Teaser Image

News

  • [2024/10/21] Human4DiT processing code is available!
  • [2024/10/08] Human4DiT dataset is available!

TODO

  • Human4DiT dataset
  • Human4DiT dataset preprocessing code
  • Human4DiT model and inference code
  • Human4DiT training code

1. Human4DiT Dataset

Dataset Structure

The Human4DiT dataset consists of 10K monocular human videos collected from the internet, 5K 3D human scans captured with a dense DSLR rig, and 100 4D human characters.

For the monocular human videos, we provide download URLs and the corresponding SMPL sequences. For the human scans, we provide 3D models (OBJ files) and the estimated SMPL models. For the 4D human characters, we provide FBX model files.

Agreement

  1. The Human4DiT dataset (the "Dataset") is available for non-commercial research purposes only. Any other use, in particular any use for commercial purposes, is prohibited. This includes, without limitation, incorporation in a commercial product, use in a commercial service, as training data for a commercial product, for commercial ergonomic analysis (e.g. product design, architectural design, etc.), or production of other artifacts for commercial purposes including, for example, web services, movies, television programs, mobile applications, or video games. The dataset may not be used for pornographic purposes or to generate pornographic material whether commercial or not. The Dataset may not be reproduced, modified and/or made available in any form to any third party without Tsinghua University’s prior written permission.

  2. You agree not to reproduce, modify, duplicate, copy, sell, trade, resell or exploit any portion of the images or any portion of derived data in any form to any third party without Tsinghua University’s prior written permission.

  3. You agree not to further copy, publish or distribute any portion of the Dataset, except that copies may be made for internal use at a single site within the same organization.

  4. Tsinghua University reserves the right to terminate your access to the Dataset at any time.

Download Instructions

The dataset is encrypted to prevent unauthorized access.

Please fill out the request form to obtain the download links for the Human4DiT dataset.

By requesting the links, you acknowledge that you have read the agreement, understand it, and agree to be bound by its terms. If you do not agree with these terms and conditions, you must not download and/or use the Dataset.

2. Video Dataset Processing

The code for processing the video dataset is in the data_scripts folder.

Download Videos from Internet

First, download the CSV file of Human4DiT-Video. Then install yt-dlp, a video download tool. You can download all videos with the following command:

python download.py --csv-file HUMAN4DIT_VIDEO_CSV \
    --yt-dlp-path YT_DLP_PATH \
    --output-dir OUTPUT_PATH \
    --cookies COOKIES_PATH \
    --download-nums DOWNLOAD_NUMS

To obtain cookies, sign in to www.bilibili.com in your browser and export the Bilibili cookie. You can still download videos without cookies, but the resolution will be limited to 720p.
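
For orientation, here is a minimal sketch of what one download roughly looks like under the hood. The CSV column names ("url", "id") and the file paths are hypothetical placeholders; the actual columns and logic are defined in download.py.

# Minimal sketch of a single yt-dlp download; "url"/"id" CSV columns
# and all paths are placeholders, not the real download.py interface.
import csv
import subprocess

YT_DLP = "yt-dlp"                 # or the binary passed via --yt-dlp-path
COOKIES = "bilibili_cookies.txt"  # Netscape-format cookie file

with open("human4dit_video.csv", newline="") as f:
    for row in csv.DictReader(f):
        subprocess.run([
            YT_DLP,
            "--cookies", COOKIES,           # omit to fall back to <=720p
            "-o", f"videos/{row['id']}.%(ext)s",
            row["url"],
        ], check=True)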

Cut Videos

We use ffmpeg with NVENC (CUDA) to cut videos, which is much more efficient than CPU encoding. If you only have a CPU, open video_cut.py and change hevc_nvenc on line 124 to a CPU-only encoder. Once you have finished downloading the videos, run the following command to cut them (a sketch of the underlying ffmpeg call follows the command).

python video_cut.py --csv-file HUMAN4DIT_VIDEO_CSV \
    --num-devices YOUR_GPU_NUMS \
    --process-nums MULTIPROCESSING_NUMS \
    --input-video-dir VIDEO_DOWNLOAD_DIR \
    --output-video-dir CUT_VIDEO_FOLDER
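
For reference, each cut boils down to a single ffmpeg re-encode. The sketch below assumes hypothetical timestamps and paths; the real clip boundaries come from the CSV and the real logic lives in video_cut.py.

# Minimal sketch of one GPU-accelerated cut; timestamps and paths are
# placeholders. Swap hevc_nvenc for a CPU encoder such as libx264 if needed.
import subprocess

def cut_clip(src, dst, start, end, use_gpu=True):
    codec = "hevc_nvenc" if use_gpu else "libx264"  # CPU fallback encoder
    subprocess.run([
        "ffmpeg", "-y",
        "-ss", start, "-to", end,   # clip boundaries from the CSV
        "-i", src,
        "-c:v", codec,
        dst,
    ], check=True)

cut_clip("videos/example.mp4", "cuts/clip_0001.mp4", "00:01:05", "00:01:20")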

Render SMPL Normals from SMPL Sequences

To render SMPL normals, first install SMPLX and download the SMPL model file basicModel_neutral_lbs_10_207_0_v1.0.0.pkl. You also need to install PyTorch3D, which renders SMPL normal maps on the GPU. Finally, download all SMPL sequence files into SMPL_FOLDER and use the following command to render SMPL normal videos.

python render_smpl.py --num-devices GPU_NUMS \
    --process-nums MULTIPROCESSING_NUMS \
    --video-dir CUT_VIDEO_FOLDER \
    --smpl-dir SMPL_FOLDER \
    --output-dir RENDER_RESULTS_PATH \
    --smpl-model-path PATH_TO_basicModel_neutral_lbs_10_207_0_v1.0.0.pkl
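
To illustrate the rendering step, here is a minimal sketch of producing one SMPL normal map with smplx and PyTorch3D. It is not the repository's render_smpl.py: the model directory, camera placement, and the neutral pose are placeholder assumptions.

# Minimal sketch (not render_smpl.py): rasterize an SMPL mesh and
# interpolate per-vertex normals into a normal map.
import torch
import smplx
from pytorch3d.structures import Meshes
from pytorch3d.renderer import (
    FoVPerspectiveCameras, RasterizationSettings, MeshRasterizer,
)
from pytorch3d.ops import interpolate_face_attributes

device = torch.device("cuda")
# Directory containing the SMPL .pkl model files (placeholder path).
body = smplx.create("PATH_TO_SMPL_MODELS", model_type="smpl",
                    gender="neutral").to(device)
out = body()                                    # neutral pose for illustration
verts = out.vertices                            # (1, 6890, 3)
faces = torch.as_tensor(body.faces.astype("int64"), device=device)[None]
mesh = Meshes(verts=verts, faces=faces)

# Camera pulled back along +z so the body is in view (placeholder pose).
cameras = FoVPerspectiveCameras(device=device,
                                T=torch.tensor([[0.0, 0.0, 2.5]], device=device))
rasterizer = MeshRasterizer(cameras=cameras,
                            raster_settings=RasterizationSettings(image_size=512))
fragments = rasterizer(mesh)

# Gather per-face vertex normals and interpolate them at rasterized pixels.
face_normals = mesh.verts_normals_packed()[mesh.faces_packed()]   # (F, 3, 3)
normal_map = interpolate_face_attributes(
    fragments.pix_to_face, fragments.bary_coords, face_normals)   # (1, H, W, 1, 3)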

3. 3D Dataset Processing

The code for processing the 3D dataset is in the data_scripts folder. We provide render_thuman.py and render_thuman_smpl.py to render free-view videos of the 3D human scans.

Render Free-view RGB Videos

We use pyrender to render a free-view RGB video for each 3D human scan. Install pyrender first, then run the following command to generate these videos.

python render_thuman.py --obj-dir 3D_OBJ_FOLDER  \
    --output-dir OUTPUT_RGB_RENDER_FOLDER
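
As a rough illustration, the sketch below orbits a pyrender camera around one scan. It is not render_thuman.py itself: the orbit radius, resolution, lighting, and the assumption that the OBJ is a single mesh centered near the origin are all placeholders.

# Minimal sketch of free-view rendering with trimesh + pyrender;
# camera path and scene parameters are illustrative placeholders.
import numpy as np
import trimesh
import pyrender

tm = trimesh.load("3D_OBJ_FOLDER/scan_0001.obj", force="mesh")  # single mesh assumed
scene = pyrender.Scene()
scene.add(pyrender.Mesh.from_trimesh(tm))
scene.add(pyrender.DirectionalLight(intensity=3.0))
renderer = pyrender.OffscreenRenderer(512, 512)

for yaw in np.linspace(0, 2 * np.pi, 60, endpoint=False):
    # Orbit around the vertical axis; the camera's local -z points at the origin.
    pose = np.eye(4)
    pose[:3, :3] = trimesh.transformations.rotation_matrix(yaw, [0, 1, 0])[:3, :3]
    pose[:3, 3] = pose[:3, :3] @ np.array([0.0, 0.0, 2.5])  # radius 2.5
    cam_node = scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3), pose=pose)
    color, _ = renderer.render(scene)   # one RGB frame of the free-view video
    scene.remove_node(cam_node)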

Render Free-view Videos of SMPL Normal Maps

We use the same PyTorch3D renderer for SMPL normal map rendering. With the dependencies above installed, run the following command to get videos of SMPL normal maps.

python render_thuman_smpl.py --obj-dir 3D_OBJ_FOLDER \
    --camera-dir OUTPUT_RGB_RENDER_FOLDER \
    --output-dir OUTPUT_NORMAL_RENDER_FOLDER

4. Inference

ref_img is the input reference image. To prepare the conditions, including normal maps and DWPose maps, please refer to ./opensora/datasets/datasets_image.py for details. You should also change the paths in ./configs/opensora/train/4dtrans.py to your own paths. You can download the pre-trained checkpoint here.

CUDA_VISIBLE_DEVICES=1 \
    python inference_long.py configs/opensora/train/4dtrans.py \
    --ckpt-path ./checkpoints/humandit/ema.pt \
    --ref_img ./test0614/1.jpg                    

Citation

@article{shao2024human4dit,
  title={Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer},
  author={Shao, Ruizhi and Pang, Youxin and Zheng, Zerong and Sun, Jingxiang and Liu, Yebin},
  journal={ACM Transactions on Graphics (TOG)},
  volume={43},
  number={6},
  articleno={},
  year={2024},
  publisher={ACM New York, NY, USA}
}
