PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies

PEEK Teaser

arXiv Website HF Dataset HF Model

PEEK enhances the zero-shot generalization of any RGB-input manipulation policy by showing the policy where to focus and what to do. This guidance comes from a VLM that predicts paths and masking points, which are drawn onto the policy's input images in a closed loop.
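
As a rough illustration of what "drawn onto the policy's input images" can look like, the sketch below renders a predicted path as a polyline and keeps only the image region around the masking points. The helper name, point conventions, and rectangular mask are assumptions for illustration; the actual rendering used by PEEK lives in the submodules.

# Minimal sketch (not the exact PEEK rendering): draw a VLM-predicted path and
# keep only the image region spanned by the masking points. The function name,
# (x, y) pixel conventions, and rectangular mask are illustrative assumptions.
import numpy as np
import cv2

def overlay_peek_guidance(image, path_points, mask_points, pad=20):
    """image: HxWx3 uint8; path_points / mask_points: (N, 2) arrays of (x, y) pixels."""
    out = image.copy()

    # Mask: zero out everything outside a padded box around the masking points.
    if len(mask_points) > 0:
        xs, ys = mask_points[:, 0], mask_points[:, 1]
        x0, x1 = int(max(xs.min() - pad, 0)), int(min(xs.max() + pad, image.shape[1]))
        y0, y1 = int(max(ys.min() - pad, 0)), int(min(ys.max() + pad, image.shape[0]))
        masked = np.zeros_like(out)
        masked[y0:y1, x0:x1] = out[y0:y1, x0:x1]
        out = masked

    # Path: draw the predicted path as a polyline on top of the masked image.
    if len(path_points) > 1:
        pts = np.asarray(path_points, dtype=np.int32).reshape(-1, 1, 2)
        cv2.polylines(out, [pts], isClosed=False, color=(255, 0, 0), thickness=2)
    return out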

🚀 Quick Start

We include PEEK VLM inference, VLM data-labeling code, pre-trained ACT+PEEK and Pi-0+PEEK models with training code and WidowX inference code, and 3D-DA simulation code.

Installation

git clone --recursive https://github.com/peek-robot/peek.git # to download all submodules

# or if you already downloaded the repo without --recursive:
git submodule update --init --recursive
  • Follow the instructions in peek_vlm to install the VLM.
  • For ACT baseline training/inference, follow the instructions in lerobot to install LeRobot.
  • For Pi-0, follow the instructions in openpi to install OpenPI.
  • For data labeling, follow the instructions in point_tracking to install the point tracking code.
  • For 3D-DA simulation, follow the instructions in 3dda to install the 3dda code.

Basic Usage

📖 Overview

PEEK works by:

  1. VLM Fine-tuning: Fine-tune a pre-trained VLM on automatically labeled robotics data to predict paths and masking points
  2. Policy Enhancement: Use the VLM to guide any RGB-input policy during both training and inference (see the sketch after this list)
  3. Zero-shot Generalization: Enable policies to generalize to new tasks, objects, and environments
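
Step 2 conceptually amounts to wrapping an off-the-shelf policy in a loop that periodically re-queries the VLM and edits the observation before each policy call. The sketch below is a hedged illustration; the env/policy/vlm interfaces, the overlay helper, and the re-query interval are placeholders rather than the repository's actual API.

# Hedged closed-loop sketch: re-query the VLM every few steps, overlay its
# path/mask predictions on the camera image, then call the policy as usual.
# `env`, `policy`, `vlm`, and the overlay helper are placeholder interfaces.
import numpy as np

REQUERY_EVERY = 5  # assumed re-query interval, in control steps

def overlay_peek_guidance(image, path_pts, mask_pts):
    # Placeholder: see the drawing sketch in the introduction above.
    return image

def run_episode(env, policy, vlm, instruction, max_steps=200):
    obs = env.reset()
    path_pts, mask_pts = np.empty((0, 2)), np.empty((0, 2))
    for t in range(max_steps):
        if t % REQUERY_EVERY == 0:
            # VLM predicts 2D path points and masking points for the current image.
            path_pts, mask_pts = vlm.predict(obs["image"], instruction)
        guided_image = overlay_peek_guidance(obs["image"], path_pts, mask_pts)
        action = policy.act({**obs, "image": guided_image})
        obs, done = env.step(action)
        if done:
            break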

Key Features

  • 🎯 Path Prediction: VLM predicts where the robot should move
  • 🎭 Masking Points: VLM identifies relevant areas to focus on
  • 🔄 Closed-loop Guidance: Real-time VLM predictions during policy execution
  • 🧩 Policy Agnostic: Works with any RGB-input manipulation policy
  • 🌍 Zero-shot Generalization: Tested on 535 real-world evaluations across 17 task variations

🏗️ Repository Structure

peek/
├── lerobot/                    # LeRobot (for ACT)
├── openpi/                    # OpenPI (for Pi-0)
├── 3dda/                      # 3D-DA simulation code
├── point_tracking/            # Data labeling code
└── peek_vlm/                  # PEEK-VLM wrapper

📊 Results

PEEK significantly improves policy performance across various scenarios:

  • Semantic Generalization: Handles novel objects and instructions
  • Visual Clutter: Robust performance in cluttered environments

Data Annotation / Data Labeling

We use the point tracking code to label the data. To download the raw OXE or BRIDGE_v2 datasets used for PEEK VLM training, or to try the data annotation pipeline on your own dataset, see the point_tracking README for instructions and examples.
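
As a hedged illustration of the kind of label this pipeline produces: given a 2D track of the end-effector from a point tracker, the future track positions at each frame can serve as that frame's path label. The track_gripper_point interface and label format below are assumptions; the actual pipeline is in point_tracking.

# Hedged sketch of deriving per-frame path labels from a 2D end-effector track.
# `track_gripper_point` is a hypothetical point-tracking interface; the real
# labeling pipeline lives in the point_tracking submodule.
import numpy as np

def path_labels_from_track(frames, track_gripper_point, horizon=16, stride=4):
    """frames: list of HxWx3 images; returns one (K, 2) pixel-space path per frame."""
    track = track_gripper_point(frames)  # assumed shape: (T, 2) pixel positions
    labels = []
    for t in range(len(frames)):
        # Future end-effector positions, subsampled, become the path for frame t.
        future = track[t : t + horizon : stride]
        labels.append(np.asarray(future, dtype=np.float32))
    return labels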

VLM Training

We provide VLM checkpoints and the VQA dataset for the experiments in the paper. If you want to train your own VLM, please follow the official VILA SFT instructions.

Using PEEK's VLM to label data

We provide an example of using PEEK's VLM to label data in the peek_vlm folder.

We provide inference examples (Gradio, server/client), as well as an example of batched VLM labeling in the bridge example, with instructions in the README.
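
For the server/client route, a client can be as simple as posting an image and instruction and reading back the predicted points. The URL, route, and JSON fields below are assumptions for illustration; the actual interface is defined in peek_vlm.

# Hedged client sketch for a PEEK VLM server. The URL, route, and response
# fields ("path", "mask_points") are assumptions; see peek_vlm for the real
# server/client interface.
import base64
import requests

def query_peek_vlm(image_path, instruction, url="http://localhost:8000/predict"):
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    resp = requests.post(url, json={"image": image_b64, "instruction": instruction}, timeout=60)
    resp.raise_for_status()
    out = resp.json()
    return out["path"], out["mask_points"]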

Alternatively, if you just need the PEEK VLM path/mask labels for BRIDGE_v2, download here.

Making a dataset with PEEK VLM labels

We provide an example dataset conversion script for PEEK-VLM-generated labels on BRIDGE_v2, demonstrating how to convert VLM-labeled data into LeRobot format for upload and training with ACT, Pi-0, or any policy/codebase that supports LeRobot datasets. See: openpi/examples/bridge/convert_bridge_data_to_lerobot.py.
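
Roughly, the conversion loops over episodes, overlays the stored VLM labels on each frame, and writes frame/action pairs into a LeRobot dataset. The sketch below assumes a lerobot version exposing LeRobotDataset.create / add_frame / save_episode (signatures vary between releases) and uses placeholder loaders; the script above is the authoritative version.

# Hedged conversion sketch. LeRobotDataset.create / add_frame / save_episode
# exist in recent lerobot releases but their signatures vary; the convert
# script referenced above is authoritative. `episodes` and `overlay_fn` are
# placeholders for the labeled BRIDGE_v2 loader and the PEEK overlay.
from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

def convert_to_lerobot(episodes, overlay_fn, repo_id="your-user/bridge_peek", fps=5):
    features = {
        "observation.image": {"dtype": "video", "shape": (256, 256, 3), "names": ["height", "width", "channel"]},
        "action": {"dtype": "float32", "shape": (7,), "names": None},
    }
    dataset = LeRobotDataset.create(repo_id=repo_id, fps=fps, features=features)
    for ep in episodes:
        for frame in ep["frames"]:
            guided = overlay_fn(frame["image"], frame["path"], frame["mask_points"])
            dataset.add_frame({"observation.image": guided, "action": frame["action"]})
        dataset.save_episode(task=ep["instruction"])
    return dataset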

Policy Training and Evaluation/Inference

We include policy checkpoints for ACT+PEEK and Pi-0+PEEK trained on BRIDGE_v2 (along with standard ACT and Pi-0 checkpoints). We also include training/inference code for the experiments in the paper:

  • LeRobot repo for ACT training/inference.
  • OpenPI repo for Pi-0 training/inference.
  • 3D-DA repo for 3D-DA sim training (for sim2real).

📄 Citation

@inproceedings{zhang2025peek,
    title={PEEK: Guiding and Minimal Image Representations for Zero-Shot Generalization of Robot Manipulation Policies}, 
    author={Jesse Zhang and Marius Memmel and Kevin Kim and Dieter Fox and Jesse Thomason and Fabio Ramos and Erdem Bıyık and Abhishek Gupta and Anqi Li},
    booktitle={arXiv:2509.18282},
    year={2025},
}

🙏 Acknowledgments

🔗 Links
