FVP is a novel 3D point cloud representation learning pipeline for robotic manipulation. Unlike prior work on contrastive learning and masked signal modeling, FVP trains 3D visual representations by using the preceding frame's point cloud together with a diffusion model to predict the current frame's point cloud.
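To make the "preceding frame to current frame" idea concrete, here is a minimal NumPy sketch of how a trajectory of point clouds could be split into (previous, current) pairs. The helper name `make_frame_pairs` and the toy shapes are assumptions for illustration only; this is not the data loader used in this repo, and the diffusion model itself is not shown.

```python
import numpy as np

def make_frame_pairs(point_clouds: np.ndarray):
    """Split a (T, Np, C) trajectory into (previous, current) frame pairs."""
    if point_clouds.ndim != 3 or point_clouds.shape[0] < 2:
        raise ValueError("expected an array of shape (T, Np, C) with T >= 2")
    previous = point_clouds[:-1]  # frames 0 .. T-2
    current = point_clouds[1:]    # frames 1 .. T-1, one step ahead of `previous`
    return previous, current

# Toy trajectory (hypothetical sizes): 5 frames, 1024 points, [x, y, z, r, g, b].
trajectory = np.random.rand(5, 1024, 6).astype(np.float32)
previous, current = make_frame_pairs(trajectory)
print(previous.shape, current.shape)  # (4, 1024, 6) (4, 1024, 6)
```

Each `previous[i]` is the frame immediately before `current[i]`, so a predictive model would see the former and be trained to produce the latter.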
This is a PyTorch implementation of the paper FVP: 4D Visual Pre-training for Robot Learning:
@article{cheng2025fvp,
author = {Chengkai Hou and Yanjie Ze and Yankai Fu and Zeyu Gao and Yue Yu and Songbo Hu and Shanghang Zhang and Huazhe Xu},
title = {FVP: 4D Visual Pre-training for Robot Learning},
journal = {ICCV},
year = {2025},
}
❗ This repo contains configs and experiments for both simulation and real-world datasets.
Please see the DP3 installation instructions.
In addition to the PyTorch environment, please install:
conda install pyyaml
pip install ema-pytorch tensorboard
You can generate a dataset of simulated data following the DP3 instructions, for example:
cd your_path/3D-Diffusion-Policy-master
bash scripts/gen_demonstration_adroit.sh hammer
We collect the real-world dataset as a dictionary, which follows the same format as the simulator dataset:
- "point_cloud": Array of shape (T, Np, 6), where Np is the number of points and 6 denotes [x, y, z, r, g, b]. Note: we strongly suggest cropping out the table/background and keeping only the useful points in your observation, which proved effective in our real-world experiments.
- "image": Array of shape (T, H, W, 3)
- "depth": Array of shape (T, H, W)
- "agent_pos": Array of shape (T, Nd), where Nd is the action dimension of the robot agent, e.g. 22 for our dexhand tasks (6D position of the end effector + 16D joint positions)
- "action": Array of shape (T, Nd). We use relative end-effector position control for the robot arm and relative joint-angle position control for the dex hand.
You can follow this example to collect a real-world dataset.
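As a rough illustration of the format above, the sketch below crops a raw point cloud to a workspace box, resamples it to a fixed number of points, and checks the shapes of one episode dictionary. The helper names (`crop_and_resample`, `check_episode`), the box bounds, the array dtypes, and the random resampling are all assumptions made for this example; they are not taken from this repo or from DP3, so adapt them to your own camera setup.

```python
import numpy as np

def crop_and_resample(points, lower, upper, num_points, rng):
    """Keep points inside an axis-aligned box, then sample a fixed number of them."""
    xyz = points[:, :3]
    inside = np.all((xyz >= lower) & (xyz <= upper), axis=1)
    kept = points[inside]
    if len(kept) == 0:
        raise ValueError("crop box contains no points")
    # Sample with replacement only when the crop leaves fewer than num_points.
    idx = rng.choice(len(kept), size=num_points, replace=len(kept) < num_points)
    return kept[idx]

def check_episode(episode):
    """Validate the shapes of one episode dictionary and return (T, Np, Nd)."""
    T, Np, C = episode["point_cloud"].shape
    if C != 6:
        raise ValueError("point_cloud must have 6 channels: x, y, z, r, g, b")
    if episode["image"].shape[0] != T or episode["image"].shape[-1] != 3:
        raise ValueError("image must have shape (T, H, W, 3)")
    if episode["depth"].shape != episode["image"].shape[:3]:
        raise ValueError("depth must have shape (T, H, W)")
    Nd = episode["agent_pos"].shape[1]
    if episode["agent_pos"].shape != (T, Nd) or episode["action"].shape != (T, Nd):
        raise ValueError("agent_pos and action must both have shape (T, Nd)")
    return T, Np, Nd

# Dummy data with hypothetical sizes; replace with your recorded frames.
rng = np.random.default_rng(0)
T, H, W, Nd = 4, 8, 8, 22
raw = rng.uniform(-1.0, 1.0, size=(T, 5000, 6)).astype(np.float32)
lower, upper = np.array([-0.5, -0.5, 0.0]), np.array([0.5, 0.5, 1.0])
episode = {
    "point_cloud": np.stack([crop_and_resample(f, lower, upper, 1024, rng) for f in raw]),
    "image": np.zeros((T, H, W, 3), dtype=np.uint8),
    "depth": np.zeros((T, H, W), dtype=np.float32),
    "agent_pos": np.zeros((T, Nd), dtype=np.float32),
    "action": np.zeros((T, Nd), dtype=np.float32),
}
print(check_episode(episode))  # (4, 1024, 22)
```

Cropping leaves a different number of points in each frame, so some resampling step is needed to get back to a fixed Np; uniform random sampling is used here only because it is short, and another sampling scheme can be swapped in.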
In the config dp3.yaml, change the dataset path to point to your own dataset. Then you can pre-train with FVP:
python train_gpu.py --config config/dp3.yaml
To train the policy, simply load the weights pre-trained with FVP and then run the standard DP3 command:
bash scripts/train_policy.sh dp3 adroit_hammer 0112 0 0