Cheng Zhang1,2 · Boying Li1 · Meng Wei1 · Yan-Pei Cao3 · Camilo Cruz Gambardella1,2 · Dinh Phung1 · Jianfei Cai1
1 Monash University · 2 Building 4.0 CRC · 3 VAST
Paper | Project Page | Video
*As one of its core contributions, UCPE introduces a geometry-consistent alternative to Plücker rays that enables better generalization in Transformers. We hope it inspires future research on camera-aware architectures.
🔥 Camera-controlled text-to-video generation, now with intrinsics, distortion and orientation control!
📷 UCPE integrates Relative Ray Encoding, which generalizes significantly better than Plücker rays across diverse camera motion, intrinsics and lens distortions, with Absolute Orientation Encoding for controllable pitch and roll. The result is a unified camera representation for Transformers and state-of-the-art camera-controlled video generation with just 0.5% extra parameters (35.5M on top of the 7.3B parameters of the base model).
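As a quick sanity check of the overhead figure, the ratio of the two parameter counts quoted above can be computed directly. This is only arithmetic on the numbers stated in this README, not a measurement of the released checkpoint.

```shell
# Parameter overhead of UCPE relative to the base model,
# using only the figures quoted in this README.
extra_params=35500000      # 35.5M added parameters
base_params=7300000000     # 7.3B base-model parameters

# awk does the floating-point division; LC_ALL=C keeps "." as the decimal separator.
overhead=$(LC_ALL=C awk -v e="$extra_params" -v b="$base_params" \
  'BEGIN { printf "%.2f", e / b * 100 }')

echo "UCPE overhead: ${overhead}% of the base model"
```

The result is just under half a percent, which the text rounds to 0.5%.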
- 📁 PanShot dataset and curation code (controllable camera data synthesized from PanFlow)
- 🎯 Full training and evaluation code for UCPE
conda create -n UCPE python=3.11 -y
conda activate UCPE
conda install -c conda-forge "ffmpeg<8" libiconv libgl -y
pip install -r requirements.txt
pip install --no-build-isolation --no-cache-dir flash-attn==2.8.0.post2
pip install -e .
cd thirdparty/equilib
pip install -e .
Download our finetuned weights from OneDrive and put them in the logs/ folder. Then run:
bash scripts/demo.sh
The generated videos will be saved in logs/6wodf04s/demo. Examples are shown below:
demo/lens.json: Our Relative Ray Encoding not only generalizes to but also enables controllability over a wide range of camera intrinsics and lens distortions.
demo/pose.json: The geometry-consistent design of Relative Ray Encoding further allows strong generalization and controllability over diverse camera motions.
demo/teaser.json: Our Absolute Orientation Encoding further eliminates the pitch and roll ambiguity of previous T2V methods, enabling precise control over the initial camera orientation.
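If the demo fails or produces no output, a small path check like the sketch below may help narrow down the cause. The helper `check_paths` is hypothetical and not part of the repository; it assumes the commands are run from the repository root and that the demo configs mentioned above live at `demo/*.json` relative to that root.

```shell
# Hypothetical helper (not part of the repository): report which of the
# given paths exist. The return status is the number of missing paths.
check_paths() {
  local missing=0
  for p in "$@"; do
    if [ -e "$p" ]; then
      echo "ok       $p"
    else
      echo "missing  $p"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Inputs named in this README (assumed relative to the repository root).
check_paths logs scripts/demo.sh demo/lens.json demo/pose.json demo/teaser.json \
  || echo "Some demo inputs are missing; check the setup steps above."

# Output folder named in this README; it should exist once the demo has run.
check_paths logs/6wodf04s/demo \
  || echo "No demo output found yet."
```

The check only tests for existence, so it does not verify that the downloaded weights are complete or that the generated videos are valid.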
This work would not have been possible without the amazing open-source projects Wan2.1, AC3D, ReCamMaster, CameraCtrl, prope, vllm, stella_vslam...
Also check out our Pan-Series works, PanFlow, PanFusion and PanSplat, which work towards 3D scene generation with panoramic images!





