This is the official implementation of the paper:
ChatCam: Empowering Camera Control through Conversational AI
Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang
NeurIPS 2024
Project Page | Paper
conda create --name chatcam -y python=3.8
conda activate chatcam
pip install torch==1.13.1 torchvision functorch --extra-index-url https://download.pytorch.org/whl/cu117
pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
pip install ftfy regex tqdm pillow clip
pip install nerfstudio
ns-train -h

We quantize camera trajectories into sequences of tokens and adopt a GPT-based architecture to generate the tokens autoregressively. By learning trajectories and language jointly, CineGPT is capable of text-conditioned trajectory generation.
Code and model weights for CineGPT are coming soon...
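In the meantime, the idea of turning a trajectory into tokens can be illustrated with a minimal sketch. It is not the CineGPT tokenizer: it assumes each camera pose is a short vector of values normalized to [-1, 1] and uses plain uniform binning. The names `N_BINS`, `LOW`, `HIGH`, and all function names are hypothetical.

```python
from typing import List, Sequence

N_BINS = 256            # hypothetical number of bins per pose value
LOW, HIGH = -1.0, 1.0   # hypothetical value range after normalization


def quantize(value: float, n_bins: int = N_BINS,
             low: float = LOW, high: float = HIGH) -> int:
    """Map a scalar to the index of the uniform bin that contains it."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # value == high falls into the last bin


def dequantize(idx: int, n_bins: int = N_BINS,
               low: float = LOW, high: float = HIGH) -> float:
    """Return the center of bin `idx`."""
    width = (high - low) / n_bins
    return low + (idx + 0.5) * width


def trajectory_to_tokens(trajectory: Sequence[Sequence[float]]) -> List[int]:
    """Flatten a trajectory (a list of pose vectors) into one token sequence."""
    return [quantize(v) for pose in trajectory for v in pose]


def tokens_to_trajectory(tokens: Sequence[int], pose_dim: int) -> List[List[float]]:
    """Invert trajectory_to_tokens up to quantization error."""
    values = [dequantize(t) for t in tokens]
    return [values[i:i + pose_dim] for i in range(0, len(values), pose_dim)]


# Toy trajectory with two poses of three values each.
trajectory = [[0.0, 0.5, -1.0], [1.0, -0.25, 0.1]]
tokens = trajectory_to_tokens(trajectory)
reconstructed = tokens_to_trajectory(tokens, pose_dim=3)
```

A sequence model can then be trained to predict such token sequences one token at a time. The reconstruction error of this scheme is bounded by half a bin width, so the bin count trades vocabulary size against trajectory precision.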
Given a prompt describing the image rendered from an anchor point, the anchor selector chooses the best matching input image. An anchor refinement procedure further fine-tunes the anchor position.
Code and model weights for Anchor Determinator are coming soon...
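Until then, the selection step can be pictured with a minimal sketch. It assumes the prompt and every input image have already been embedded into a shared vector space (for example by a vision-language model such as CLIP, which appears in the dependency list above) and that "best matching" means highest cosine similarity. Both are assumptions rather than a description of the released method, the function names are hypothetical, and the embedding step and the anchor refinement are omitted.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_anchor(text_embedding: Sequence[float],
                  image_embeddings: Sequence[Sequence[float]]) -> int:
    """Return the index of the image embedding closest to the prompt embedding."""
    if not image_embeddings:
        raise ValueError("at least one image embedding is required")
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 2-D embeddings standing in for real model outputs.
prompt_embedding = [1.0, 0.0]
image_embeddings = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
best_index = select_anchor(prompt_embedding, image_embeddings)
```

The camera pose of the selected input image would then serve as the initial anchor that the refinement procedure adjusts.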
Through this prompt, we provide the LLM with detailed instructions and guidelines for tool usage to achieve the target. We include a template and examples for the LLM's responses. Check out the user-agent conversation example.
If you find ChatCam useful in your research, please consider citing:
@article{liu2024chatcam,
title={ChatCam: Empowering Camera Control through Conversational AI},
author={Liu, Xinhang and Tai, Yu-Wing and Tang, Chi-Keung},
journal={arXiv preprint arXiv:2409.17331},
year={2024}
}

