[Project Page] [Paper] [Video]
Authors: Shivansh Patel1, Shraddhaa Mohan1, Hanlin Mai1, Unnat Jain2*, Svetlana Lazebnik1*, Yunzhu Li3* (* denotes equal advising)
1 University of Illinois Urbana-Champaign • 2 UC Irvine • 3 Columbia University
RIGVid (Robot Imitation from Generated Videos) extracts a 6‑DoF pose rollout of a moving object purely from a generated video. Starting from a single RGB‑D observation, RIGVid:
- Predicts monocular depth for every video frame using RollingDepth.
- Extracts 6‑DoF poses via FoundationPose: register on the first frame, then track through time.
- Visualizes the scene point cloud and object trajectory in an interactive Plotly viewer.
This demo code reproduces the pipeline end‑to‑end in the camera frame.
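As a concrete illustration of one piece of this pipeline (backprojecting a depth map with the camera intrinsics into the point cloud that the viewer displays), here is a minimal, self-contained sketch. The function name, array shapes, and pinhole-intrinsics layout are illustrative assumptions, not the actual API of `demo.py`:

```python
import numpy as np

def backproject_depth(depth, intrinsics):
    """Backproject an (H, W) metric depth map to an (N, 3) point cloud in the camera frame.

    `intrinsics` is assumed to be a 3x3 pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]].
    """
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Example with a synthetic depth map and made-up intrinsics:
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
cloud = backproject_depth(np.full((480, 640), 1.5), K)
print(cloud.shape)  # (307200, 3)
```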
We provide an installation script that clones and installs both RollingDepth and FoundationPose in one environment:
```bash
git clone https://github.com/shivanshpatel35/rigvid.git
cd rigvid
./install.sh  # creates and configures the 'rigvid' conda env
```

Then activate:

```bash
conda activate rigvid
```

The installation will likely take some time; please be patient.
All required assets (generated video, RGB-D, masks, intrinsics) are in the media/ folder. Simply run:
```bash
python demo.py
```

If you wish to generate your own video, you have two options:
- Use the KlingAI API. Code for querying is provided in `video_gen_query.py`. You'll need to obtain a KlingAI API key and configure it via environment variables (a quick check that the keys are visible to Python is sketched after this list):

  ```bash
  export KLING_API_ACCESS_KEY="YOUR_KLINGAI_KEY"
  export KLING_API_SECRET_KEY="YOUR_KLINGAI_SECRET_KEY"
  ```

  However, note that the cheapest API package currently costs $1400, as detailed here.
- Use the KlingAI web interface. Navigate to the site, use the Kling V1.6 model in Professional Mode, and set a high creativity and relevance factor. Once the video is generated, download it manually.
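If the API route fails with authentication errors, a quick sanity check is to confirm both keys are actually visible to the Python process. This snippet is just an illustration, not part of `video_gen_query.py`:

```python
import os

# Fail fast if the KlingAI credentials exported above are not visible to the process.
for var in ("KLING_API_ACCESS_KEY", "KLING_API_SECRET_KEY"):
    if not os.environ.get(var):
        raise RuntimeError(f"{var} is not set; export it before running video_gen_query.py")
print("KlingAI credentials found.")
```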
Outputs are saved under `outputs/`:

- Depth maps, numpy arrays, and visualization videos
- 6‑DoF pose matrices in `outputs/fp_outputs/ob_in_cam/`
- Interactive HTML: `outputs/trajectory_visualization.html`
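To inspect the pose rollout programmatically, the matrices can be loaded back with numpy. This is a minimal sketch that assumes the usual FoundationPose convention of one 4x4 object-in-camera matrix per frame saved as a `.txt` file; check the actual filenames in `outputs/fp_outputs/ob_in_cam/` for your run:

```python
import glob
import numpy as np

# Load per-frame 4x4 object-in-camera poses written by the demo.
pose_files = sorted(glob.glob("outputs/fp_outputs/ob_in_cam/*.txt"))
poses = np.stack([np.loadtxt(f) for f in pose_files])  # (T, 4, 4)

# Object translation over time, in the camera frame (meters, assuming metric depth).
translations = poses[:, :3, 3]
print(f"{len(poses)} frames; object moved "
      f"{np.linalg.norm(translations[-1] - translations[0]):.3f} m from start to end")
```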
The following modifications need to be made for real-world deployment:
- Transform to world frame using your camera extrinsics.
- Smoothing is crucial to avoid jerky motions.
- Retarget poses to your robot’s end‑effector: `T_world_to_eef = T_world_to_object_smooth @ T_object_to_eef`. A sketch covering all three steps follows this list.
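A minimal sketch of all three steps, assuming the camera-frame poses are the 4x4 text files under `outputs/fp_outputs/ob_in_cam/` and that SciPy is available in the environment. The identity matrices are placeholders for your own calibrated extrinsics and grasp offset, and the moving-average / quaternion-mean smoothing is one reasonable choice, not necessarily the method used in the paper:

```python
import glob
import numpy as np
from scipy.spatial.transform import Rotation  # assumes SciPy is installed in the env

def smooth_poses(poses, window=5):
    """Smooth a list of 4x4 poses: moving-average translation, windowed quaternion mean for rotation."""
    trans = np.array([p[:3, 3] for p in poses])
    quats = Rotation.from_matrix(np.array([p[:3, :3] for p in poses])).as_quat()
    smoothed = []
    for t in range(len(poses)):
        lo, hi = max(0, t - window // 2), min(len(poses), t + window // 2 + 1)
        q = quats[lo:hi].copy()
        q[(q @ q[0]) < 0] *= -1                 # put all quaternions in the same hemisphere
        q_mean = q.mean(axis=0)
        q_mean /= np.linalg.norm(q_mean)
        P = np.eye(4)
        P[:3, :3] = Rotation.from_quat(q_mean).as_matrix()
        P[:3, 3] = trans[lo:hi].mean(axis=0)
        smoothed.append(P)
    return smoothed

# Load the camera-frame object poses produced by the demo.
object_in_cam = [np.loadtxt(f) for f in sorted(glob.glob("outputs/fp_outputs/ob_in_cam/*.txt"))]

# 1. Camera frame -> world frame (replace the identity with your calibrated extrinsics).
T_world_to_cam = np.eye(4)
poses_world = [T_world_to_cam @ p for p in object_in_cam]

# 2. Smooth to avoid jerky motions.
poses_world_smooth = smooth_poses(poses_world)

# 3. Retarget each smoothed object pose to an end-effector target (replace with your grasp offset).
T_object_to_eef = np.eye(4)
eef_targets = [p @ T_object_to_eef for p in poses_world_smooth]
```

Any other smoothing (low-pass filter, spline fit) works just as well here; the important part is that both translation and rotation are smoothed before retargeting.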
We build upon RollingDepth and FoundationPose.

