PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos

SIGGRAPH ASIA 2025 conference track

PAD3R reconstructs dynamic 3D objects from a single casual monocular video, coupling object deformation with camera motion.

Overview

The pipeline runs in three stages:

  1. Static 3D — Zero123 SDS + SuGaR mesh-bound Gaussian Splatting on the canonical keyframe
  2. PoseNet — DINOv2-based camera pose estimator, trained on rendered views of the Stage 1 model
  3. Dynamic 4D — Deformation graph optimization guided by CoTracker3 2D correspondences

Setup

The code is tested with PyTorch 2.0.1 built for CUDA 11.8 (torch 2.0.1+cu118).

git clone git@github.com:pad3r/pad3r-public.git --recursive
cd pad3r-public
conda create -n pad3r python=3.10 && conda activate pad3r
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
bash scripts/setup.sh
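
After installation, a quick check can confirm that the installed build matches the tested configuration. The sketch below is not part of the repository: `parse_torch_version` and `matches_tested_build` are hypothetical helpers that split a version string such as `2.0.1+cu118` into its release number and CUDA tag and compare them against the values named above.

```python
TESTED_RELEASE = "2.0.1"    # release named in the Setup section
TESTED_CUDA_TAG = "cu118"   # CUDA tag named in the Setup section


def parse_torch_version(version: str) -> tuple[str, str]:
    """Split '2.0.1+cu118' into ('2.0.1', 'cu118'); the tag is '' if absent."""
    release, _, local = version.partition("+")
    return release, local


def matches_tested_build(version: str) -> bool:
    """True when both the release and the CUDA tag equal the tested build."""
    return parse_torch_version(version) == (TESTED_RELEASE, TESTED_CUDA_TAG)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        ok = matches_tested_build(torch.__version__)
        status = "matches" if ok else "differs from"
        print(f"torch {torch.__version__} {status} the tested build")
```

A different build may still work; the check only reports whether the environment matches the one the code was tested on.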

Download Zero123 weights:

mkdir -p load/zero123
wget -O load/zero123/stable_zero123.ckpt \
    https://huggingface.co/stabilityai/stable-zero123/resolve/main/stable_zero123.ckpt

CoTracker3 and DINOv2 weights download automatically on first use.
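
Since the Zero123 checkpoint is the one file that has to be fetched by hand, it may be worth confirming it is in place before starting a run. The following is a minimal sketch, assuming the checkpoint path from the `wget` command above; `checkpoint_ready` is a hypothetical helper, not a function provided by the repository.

```python
from pathlib import Path

# Relative path taken from the wget command in the Setup section.
ZERO123_CKPT = Path("load/zero123/stable_zero123.ckpt")


def checkpoint_ready(root: Path = Path("."), min_bytes: int = 1) -> bool:
    """Return True if the checkpoint exists under root and has at least min_bytes bytes."""
    ckpt = root / ZERO123_CKPT
    return ckpt.is_file() and ckpt.stat().st_size >= min_bytes


if __name__ == "__main__":
    if checkpoint_ready():
        print(f"found {ZERO123_CKPT}")
    else:
        print(f"missing or empty: {ZERO123_CKPT}")
```

Raising `min_bytes` gives a rough guard against an interrupted download, although it does not verify the file contents.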

Data Preparation

PAD3R expects square RGBA PNGs with a transparent background. The preprocessing script below accepts either a video or an image folder as input; skip this step if your data is already in that format:

# --text_prompt and --mask_dir are optional
python preprocess/prepare_frames.py \
    --input <video_or_image_dir> \
    --out_dir database/<seqname> \
    --text_prompt "<object_name>" \
    --mask_dir <mask_dir>

Output: database/<seqname>/000000.png, 000001.png, ...
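
If frames were prepared by other means, the format requirements can be checked without extra dependencies by reading each PNG header. This is a minimal sketch, not part of the repository: `read_png_header` and `check_frames` are hypothetical helpers, and the check assumes that frames are numbered consecutively from `000000.png` with six-digit zero padding, as the output listing suggests. Only PNG colour type 6 (truecolour with alpha) is accepted as RGBA.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
RGBA_COLOUR_TYPE = 6  # PNG colour type for truecolour with alpha


def read_png_header(path: Path) -> tuple[int, int, int]:
    """Return (width, height, colour_type) from a PNG file's IHDR chunk."""
    with open(path, "rb") as f:
        # 8-byte signature, 4-byte chunk length, 4-byte chunk type,
        # then the first 10 bytes of IHDR data.
        head = f.read(26)
    if len(head) < 26 or head[:8] != PNG_SIGNATURE or head[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG")
    width, height, _bit_depth, colour_type = struct.unpack(">IIBB", head[16:26])
    return width, height, colour_type


def check_frames(frame_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means the frames look fine."""
    frames = sorted(frame_dir.glob("*.png"))
    if not frames:
        return [f"no PNG files in {frame_dir}"]
    problems = []
    for index, frame in enumerate(frames):
        expected = f"{index:06d}.png"  # assumed naming scheme
        if frame.name != expected:
            problems.append(f"expected {expected}, found {frame.name}")
        width, height, colour_type = read_png_header(frame)
        if width != height:
            problems.append(f"{frame.name}: not square ({width}x{height})")
        if colour_type != RGBA_COLOUR_TYPE:
            problems.append(f"{frame.name}: not RGBA (colour type {colour_type})")
    return problems
```

For example, `check_frames(Path("database/cows"))` would return an empty list for a correctly prepared sequence. The header check confirms that an alpha channel is present, not that the background is actually transparent.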

Training

Edit the variables at the top of scripts/run.sh and run:

seqname="cows"  
VIDEO_DIR="database/cows"  
KEYFRAME=0              # index of the canonical frame  
SKIP_PREPROCESS=false   # set true if frames are already prepared  

If SKIP_PREPROCESS=false, also set:

INPUT="<path_to_video_or_images>"  
TEXT_PROMPT="cows"  
MASK_DIR="<path_to_masks>"  # optional  

bash scripts/run.sh

The script runs all stages end-to-end. Preprocessing runs automatically unless SKIP_PREPROCESS=true.
Final output: outputs/pad3r//
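
Before launching a long run, it can help to confirm that KEYFRAME points at a frame that exists. The sketch below is an illustration only, not part of the repository: `read_shell_vars` and `keyframe_in_range` are hypothetical helpers. It reads simple `NAME=value` assignments such as those at the top of scripts/run.sh, and it assumes KEYFRAME is a zero-based index into the prepared frames. It does not evaluate shell expansions, so values that reference other variables are returned verbatim.

```python
import shlex


def read_shell_vars(script_text: str) -> dict[str, str]:
    """Collect simple NAME=value assignments, skipping comments and other lines."""
    variables = {}
    for line in script_text.splitlines():
        try:
            # comments=True drops trailing '# ...' text; quotes are removed.
            tokens = shlex.split(line, comments=True)
        except ValueError:
            continue  # for example, a quote left open across lines
        if len(tokens) == 1 and "=" in tokens[0]:
            name, value = tokens[0].split("=", 1)
            if name.isidentifier():
                variables[name] = value
    return variables


def keyframe_in_range(variables: dict[str, str], num_frames: int) -> bool:
    """True if KEYFRAME is an integer index valid for num_frames frames."""
    try:
        keyframe = int(variables["KEYFRAME"])
    except (KeyError, ValueError):
        return False
    return 0 <= keyframe < num_frames
```

A typical use would read scripts/run.sh, count the PNG files in the directory named by VIDEO_DIR, and pass both to `keyframe_in_range`.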

Acknowledgements

This project is built upon DreamMesh4D, threestudio and Lab4D, and also benefits from SuGaR, CoTracker, and DINOv2. We thank all the authors for their great work and for making their code publicly available.

Citation

@article{pad3r,
    author    = {Liao, Ting-Hsuan and Liu, Haowen and Xu, Yiran and Ge, Songwei and Yang, Gengshan and Huang, Jia-Bin},
    title     = {PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos},
    journal   = {SIGGRAPH ASIA},
    year      = {2025},
} 
