[Project Page] [Paper] [Video]
Authors: Shivansh Patel1, Shraddhaa Mohan1, Hanlin Mai1, Unnat Jain2*, Svetlana Lazebnik1*, Yunzhu Li3* (* denotes equal advising)
1 University of Illinois Urbana-Champaign • 2 UC Irvine • 3 Columbia University
RIGVid (Robot Imitation from Generated Videos) extracts a 6‑DoF pose rollout of a moving object purely from a generated video. Starting from a single RGB‑D observation, RIGVid:
- Predicts monocular depth for every video frame using RollingDepth.
- Extracts 6‑DoF poses via FoundationPose: register on the first frame, then track through time.
- Visualizes the scene point cloud and object trajectory in an interactive Plotly viewer.
This demo code reproduces the pipeline end‑to‑end in the camera frame.
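As a concrete illustration of one piece of this pipeline (backprojecting a depth map with the camera intrinsics into the point cloud that the viewer displays), here is a minimal, self-contained sketch. The function name, array shapes, and pinhole-intrinsics layout are illustrative assumptions, not the actual API of `demo.py`:

```python
import numpy as np

def backproject_depth(depth, intrinsics):
    """Backproject an (H, W) metric depth map to an (N, 3) point cloud in the camera frame.

    `intrinsics` is assumed to be a 3x3 pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Example with a synthetic depth map and made-up intrinsics:
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
cloud = backproject_depth(np.full((480, 640), 1.5), K)
print(cloud.shape)  # (307200, 3)
```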
We provide an installation script that clones and installs both RollingDepth and FoundationPose in one environment:
```bash
git clone https://github.com/shivanshpatel35/rigvid.git
cd rigvid
./install.sh  # creates and configures the 'rigvid' conda env
```

Then activate:

```bash
conda activate rigvid
```

The installation will likely take some time; please be patient.
All required assets (generated video, RGB-D, masks, intrinsics) are in the media/ folder. Simply run:
```bash
python demo.py
```

If you wish to generate your own video, you have two options:
- Use the KlingAI API. Code for querying is provided in `video_gen_query.py`. You'll need to obtain a KlingAI API key and configure it via environment variables (a quick check that the keys are visible to Python is sketched after this list):

  ```bash
  export KLING_API_ACCESS_KEY="YOUR_KLINGAI_KEY"
  export KLING_API_SECRET_KEY="YOUR_KLINGAI_SECRET_KEY"
  ```

  However, note that the cheapest API package currently costs $1400, as detailed here.
- Use the KlingAI web interface. Navigate to the site, use the Kling V1.6 model in Professional Mode, and set a high creativity and relevance factor. Once the video is generated, download it manually.
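If the API route fails with authentication errors, a quick sanity check is to confirm both keys are actually visible to the Python process. This snippet is just an illustration, not part of `video_gen_query.py`:

```python
import os

# Fail fast if the KlingAI credentials exported above are not visible to the process.
for var in ("KLING_API_ACCESS_KEY", "KLING_API_SECRET_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set; export it before running video_gen_query.py")
print("KlingAI credentials found.")
```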
Outputs are saved under `outputs/`:

- Depth maps, numpy arrays, and visualization videos
- 6‑DoF pose matrices in `outputs/fp_outputs/ob_in_cam/`
- Interactive HTML: `outputs/trajectory_visualization.html`
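To inspect the pose rollout programmatically, the matrices can be loaded back with numpy. This is a minimal sketch that assumes the usual FoundationPose convention of one 4x4 object-in-camera matrix per frame saved as a `.txt` file; check the actual filenames in `outputs/fp_outputs/ob_in_cam/` for your run:

```python
import glob
import numpy as np

# Load per-frame 4x4 object-in-camera poses written by the demo.
pose_files = sorted(glob.glob("outputs/fp_outputs/ob_in_cam/*.txt"))
poses = np.stack([np.loadtxt(f) for f in pose_files])  # (T, 4, 4)

# Object translation over time, in the camera frame (meters, assuming metric depth).
translations = poses[:, :3, 3]
print(f"{len(poses)} frames; object moved "
      f"{np.linalg.norm(translations[-1] - translations[0]):.3f} m from start to end")
```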
The following modifications need to be made for real-world deployment:
- Transform to world frame using your camera extrinsics.
- Smoothing is crucial to avoid jerky motions.
- Retarget poses to your robot’s end‑effector: `T_world_to_eef = T_world_to_object_smooth @ T_object_to_eef`. A sketch covering all three steps follows this list.
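A minimal sketch of all three steps, assuming the camera-frame poses are the 4x4 text files under `outputs/fp_outputs/ob_in_cam/` and that SciPy is available in the environment. The identity matrices are placeholders for your own calibrated extrinsics and grasp offset, and the moving-average / quaternion-mean smoothing is one reasonable choice, not necessarily the method used in the paper:

```python
import glob
import numpy as np
from scipy.spatial.transform import Rotation  # assumes SciPy is installed in the env

def smooth_poses(poses, window=5):
    """Smooth a list of 4x4 poses: moving-average translation, windowed quaternion mean for rotation."""
    trans = np.array([p[:3, 3] for p in poses])
    quats = Rotation.from_matrix(np.array([p[:3, :3] for p in poses])).as_quat()
    smoothed = []
    for t in range(len(poses)):
        lo, hi = max(0, t - window // 2), min(len(poses), t + window // 2 + 1)
        q = quats[lo:hi].copy()
        q[(q @ q[0]) < 0] *= -1                 # put all quaternions in the same hemisphere
        q_mean = q.mean(axis=0)
        q_mean /= np.linalg.norm(q_mean)
        P = np.eye(4)
        P[:3, :3] = Rotation.from_quat(q_mean).as_matrix()
        P[:3, 3] = trans[lo:hi].mean(axis=0)
        smoothed.append(P)
    return smoothed

# Load the camera-frame object poses produced by the demo.
object_in_cam = [np.loadtxt(f) for f in sorted(glob.glob("outputs/fp_outputs/ob_in_cam/*.txt"))]

# 1. Camera frame -> world frame (replace the identity with your calibrated extrinsics).
T_world_to_cam = np.eye(4)
poses_world = [T_world_to_cam @ p for p in object_in_cam]

# 2. Smooth to avoid jerky motions.
poses_world_smooth = smooth_poses(poses_world)

# 3. Retarget each smoothed object pose to an end-effector target (replace with your grasp offset).
T_object_to_eef = np.eye(4)
eef_targets = [p @ T_object_to_eef for p in poses_world_smooth]
```

Any other smoothing (low-pass filter, spline fit) works just as well here; the important part is that both translation and rotation are smoothed before retargeting.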
We build upon RollingDepth and FoundationPose.

