📷 Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!

chengzhag/UCPE

📷 UCPE

Unified Camera Positional Encoding for Controlled Video Generation

Cheng Zhang¹,² · Boying Li¹ · Meng Wei¹ · Yan-Pei Cao³ · Camilo Cruz Gambardella¹,² · Dinh Phung¹ · Jianfei Cai¹
¹Monash University ²Building 4.0 CRC ³VAST

Watch the video

*Our UCPE introduces a geometry-consistent alternative to Plücker rays as one of its core contributions, enabling better generalization in Transformers. We hope to inspire future research on camera-aware architectures.

🚀 TLDR

🔥 Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!

Camera lenses             Orientation control

📷 UCPE integrates Relative Ray Encoding, which generalizes significantly better than Plücker rays across diverse camera motions, intrinsics and lens distortions, with Absolute Orientation Encoding for controllable pitch and roll. Together they form a unified camera representation for Transformers and enable state-of-the-art camera-controlled video generation with just 0.5% extra parameters (35.5M on top of the 7.3B parameters of the base model).

UCPE
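For readers unfamiliar with the Plücker baseline mentioned above, here is a minimal sketch of how a per-pixel Plücker ray is commonly computed for an ideal pinhole camera. It is not UCPE's Relative Ray Encoding and is not taken from this repository: the function name `plucker_ray`, the argument names, and the (direction, moment) ordering are assumptions made for illustration, and lens distortion is ignored entirely.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def matvec(R, v):
    """Multiply a 3x3 matrix (tuple of rows) by a 3-vector."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))


def plucker_ray(u, v, fx, fy, cx, cy, R_c2w, origin):
    """Return (direction, moment) for pixel (u, v).

    Assumes an ideal pinhole camera (no distortion), a camera-to-world
    rotation R_c2w, and the camera center `origin` in world coordinates.
    All names here are hypothetical, not part of the UCPE code base.
    """
    # Back-project the pixel to a direction in the camera frame.
    d_cam = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Rotate into the world frame and normalize to unit length.
    d = matvec(R_c2w, d_cam)
    norm = math.sqrt(sum(x * x for x in d))
    d = tuple(x / norm for x in d)
    # The moment o x d is what makes the ray depend on camera position.
    m = cross(origin, d)
    return d, m


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Principal-point pixel of a camera shifted one unit along world x.
d, m = plucker_ray(320, 240, 500.0, 500.0, 320.0, 240.0,
                   IDENTITY, (1.0, 0.0, 0.0))
print("direction:", d)
print("moment:   ", m)
```

Because the moment is built from the absolute camera center in a chosen world frame, the six values change whenever that frame changes, which is one reason a relative, geometry-consistent formulation can be attractive.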

🔔 Coming Soon

  • 📁 PanShot Dataset And Curation Code (controllable camera data synthesized from PanFlow)
  • 🎯 Full Training and Evaluation Code for UCPE

🛠️ Installation

conda create -n UCPE python=3.11 -y
conda activate UCPE
conda install -c conda-forge "ffmpeg<8" libiconv libgl -y
pip install -r requirements.txt
pip install --no-build-isolation --no-cache-dir flash-attn==2.8.0.post2
pip install -e .

cd thirdparty/equilib
pip install -e .
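After installation, a quick sanity check can confirm that the main pieces are visible from the active environment. The sketch below is not part of the repository. It assumes that flash-attn is importable as `flash_attn`, that the third-party package is importable as `equilib`, and that `torch` is pulled in by the requirements; adjust the module list if your setup differs.

```python
import importlib.util
import shutil
import sys


def check_environment(modules=("torch", "flash_attn", "equilib")):
    """Return a dict mapping each check to True/False.

    The default module names are assumptions about how the installed
    packages are imported; they are not read from requirements.txt.
    """
    report = {
        # The install commands create a Python 3.11 environment.
        "python 3.11": sys.version_info[:2] == (3, 11),
        # ffmpeg is installed through conda-forge in the steps above.
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
    }
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[f"import {name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{'ok     ' if ok else 'MISSING'}  {item}")
```

Run it inside the activated `UCPE` environment; any line marked MISSING points to the install step worth repeating.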

⚡ Quick Demo

Download our fine-tuned weights from OneDrive and put them in the logs/ folder. Then run:

bash scripts/demo.sh

The generated videos will be saved in logs/6wodf04s/demo. Examples are shown below:

  • demo/lens.json: Our Relative Ray Encoding generalizes to, and enables control over, a wide range of camera intrinsics and lens distortions.

Lens control

  • demo/pose.json: The geometry-consistent design of Relative Ray Encoding also allows strong generalization to, and control over, diverse camera motions.

Pose control

  • demo/teaser.json: Our Absolute Orientation Encoding further eliminates the pitch and roll ambiguity of previous T2V methods, enabling precise control over the initial camera orientation.

Orientation control

💡 Acknowledgements

Our paper could not have been completed without the amazing open-source projects Wan2.1, AC3D, ReCamMaster, CameraCtrl, prope, vllm, stella_vslam...

Also check out our Pan-Series works PanFlow, PanFusion and PanSplat towards 3D scene generation with panoramic images!
