PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies
PEEK enhances the zero-shot generalization of any RGB-input manipulation policy by showing it where to focus and what to do: a VLM predicts paths and masking points, which are drawn onto the policy's input images in a closed loop.
This repo includes:
- PEEK VLM inference and VLM data-labeling code
- ACT+PEEK pre-trained models, inference (on WidowX), and training code
- Pi-0+PEEK pre-trained models, inference (on WidowX), and training code
- 3D-DA simulation code
```bash
git clone --recursive https://github.com/peek-robot/peek.git  # downloads all submodules

# or, if you already cloned the repo without --recursive:
git submodule update --init --recursive
```
- Follow the instructions in peek_vlm to install the VLM.
- For ACT baseline training/inference, follow the instructions in lerobot to install LeRobot.
- For Pi-0, follow the instructions in openpi to install OpenPI.
- For data labeling, follow the instructions in point_tracking to install the point tracking code.
- For 3D-DA simulation, follow the instructions in 3dda to install the 3dda code.
PEEK works by:
- VLM Fine-tuning: Fine-tune a pre-trained VLM on automatically labeled robotics data to predict paths and masking points
- Policy Enhancement: Use the VLM to guide any RGB-input policy during both training and inference
- Zero-shot Generalization: Enable policies to generalize to new tasks, objects, and environments
- 🎯 Path Prediction: VLM predicts where the robot should move
- 🎭 Masking Points: VLM identifies relevant areas to focus on
- 🔄 Closed-loop Guidance: Real-time VLM predictions during policy execution (see the sketch after this list)
- 🧩 Policy Agnostic: Works with any RGB-input manipulation policy
- 🌍 Zero-shot Generalization: Tested on 535 real-world evaluations across 17 task variations
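To make the closed loop concrete, below is a minimal sketch of what "drawing" the VLM's predictions onto a policy's input image can look like, using OpenCV. The overlay choices (a polyline for the path, masking out everything far from the predicted points) and all function names here are illustrative assumptions, not PEEK's actual rendering code:

```python
import cv2
import numpy as np

def apply_peek_overlay(image: np.ndarray, path: np.ndarray, mask_points: np.ndarray,
                       keep_radius: int = 40) -> np.ndarray:
    """Draw a predicted 2D path and mask out regions far from the predicted points.

    image:       (H, W, 3) uint8 frame from the robot camera.
    path:        (N, 2) pixel waypoints predicted by the VLM.
    mask_points: (M, 2) pixel points marking task-relevant regions.
    """
    # Keep only the image content near the VLM's masking points.
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for x, y in mask_points:
        cv2.circle(mask, (int(x), int(y)), keep_radius, 255, thickness=-1)
    out = cv2.bitwise_and(image, image, mask=mask)
    # Draw the predicted end-effector path on top.
    cv2.polylines(out, [path.astype(np.int32)], isClosed=False, color=(0, 255, 0), thickness=2)
    return out

# Closed loop: refresh the VLM's predictions and re-draw at each control step.
# obs = camera.read(); labels = vlm.predict(obs, instruction)          # hypothetical calls
# action = policy(apply_peek_overlay(obs, labels["path"], labels["mask_points"]))
```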
```
peek/
├── lerobot/         # LeRobot (for ACT)
├── openpi/          # OpenPI (for Pi-0)
├── 3dda/            # 3D-DA simulation code
├── point_tracking/  # data labeling code
└── peek_vlm/        # PEEK-VLM wrapper
```
PEEK significantly improves policy performance across various scenarios:
- Semantic Generalization: Handles novel objects and instructions
- Visual Clutter: Robust performance in cluttered environments
We use the point tracking code to label data. To download the raw OXE or BRIDGE_v2 datasets used for PEEK VLM training, or to try the data annotation pipeline on your own dataset, see the point_tracking README for instructions and examples.
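For intuition about what the labeling produces (a simplified stand-in, not the actual point_tracking pipeline; the array layout and function name are hypothetical), a future-path label for a frame can be derived by subsampling the tracked 2D gripper positions over the rest of the episode:

```python
import numpy as np

def path_label(gripper_xy: np.ndarray, t: int, num_points: int = 8) -> np.ndarray:
    """Subsample the remaining tracked gripper positions into a fixed-length path.

    gripper_xy: (T, 2) per-frame gripper pixel coordinates, e.g. from a point
                tracker (hypothetical input format).
    t:          current frame index.
    Returns a (num_points, 2) array of waypoints from frame t to the episode end.
    """
    future = gripper_xy[t:]
    # Evenly spaced indices over the remaining trajectory, keeping both endpoints.
    idx = np.linspace(0, len(future) - 1, num_points).round().astype(int)
    return future[idx]

# Example on a synthetic 50-frame straight-line trajectory.
traj = np.stack([np.linspace(10, 200, 50), np.linspace(220, 40, 50)], axis=1)
print(path_label(traj, t=5))
```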
We provide VLM model checkpoints and the VQA dataset used for the experiments in the paper. To train your own VLM, follow the official VILA SFT instructions.
We provide an example of using PEEK's VLM to label data in the peek_vlm folder.
We provide inference examples (Gradio, server/client), as well as an example of batched VLM labeling on BRIDGE data, with instructions in the README.
Alternatively, if you just need the PEEK VLM path/mask labels for BRIDGE_v2, download here.
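For the server/client mode mentioned above, a client can be as simple as the sketch below. It assumes the VLM is served over HTTP and returns path/mask points as JSON; the endpoint, payload fields, and response schema are hypothetical, so see the peek_vlm README for the actual interface:

```python
import base64
import requests

def query_vlm(server_url: str, image_path: str, instruction: str) -> dict:
    """Send one image + instruction to a PEEK VLM server (hypothetical schema)."""
    with open(image_path, "rb") as f:
        img_b64 = base64.b64encode(f.read()).decode("utf-8")
    payload = {"image": img_b64, "instruction": instruction}
    resp = requests.post(f"{server_url}/predict", json=payload, timeout=60)
    resp.raise_for_status()
    # Expected (hypothetical) response:
    # {"path": [[x, y], ...], "mask_points": [[x, y], ...]}
    return resp.json()

labels = query_vlm("http://localhost:8000", "frame.png", "put the carrot on the plate")
print(labels["path"], labels["mask_points"])
```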
We provide an example dataset conversion script for PEEK-VLM-generated labels on BRIDGE_v2, demonstrating how to convert VLM-labeled data into a LeRobot dataset for upload and training with ACT, Pi-0, or any policy/codebase that supports LeRobot datasets. See: openpi/examples/bridge/convert_bridge_data_to_lerobot.py.
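For orientation, the core of such a conversion typically looks like the sketch below. This is a simplified stand-in, not the actual convert_bridge_data_to_lerobot.py: the feature spec and the episode iterator are assumptions, and LeRobotDataset method signatures vary across LeRobot versions:

```python
import numpy as np
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

# Feature spec is an assumption; match it to your observation/action shapes.
features = {
    "observation.images.base": {"dtype": "video", "shape": (256, 256, 3), "names": ["h", "w", "c"]},
    "observation.state": {"dtype": "float32", "shape": (7,), "names": None},
    "action": {"dtype": "float32", "shape": (7,), "names": None},
}
dataset = LeRobotDataset.create(repo_id="your-user/bridge_peek", fps=5, features=features)

for episode in load_vlm_labeled_episodes():  # hypothetical iterator over VLM-labeled BRIDGE episodes
    for frame in episode["frames"]:
        dataset.add_frame({
            "observation.images.base": frame["image_with_peek_overlay"],  # path + mask drawn in
            "observation.state": frame["state"].astype(np.float32),
            "action": frame["action"].astype(np.float32),
        })
    # Depending on your LeRobot version, the task string goes here or in add_frame.
    dataset.save_episode(task=episode["instruction"])
```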
We include policy checkpoints for ACT+PEEK and Pi-0+PEEK trained on BRIDGE_v2 (along with standard ACT and Pi-0 checkpoints). We also include training/inference code for the experiments in the paper:
- LeRobot repo for ACT training/inference.
- OpenPI repo for Pi-0 training/inference.
- 3D-DA repo for 3D-DA sim training (for sim2real).
```bibtex
@article{zhang2025peek,
  title={PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies},
  author={Jesse Zhang and Marius Memmel and Kevin Kim and Dieter Fox and Jesse Thomason and Fabio Ramos and Erdem Bıyık and Abhishek Gupta and Anqi Li},
  journal={arXiv preprint arXiv:2509.18282},
  year={2025},
}
```
- Built on top of VILA-1.5
