robomonkey-vla/RoboMonkey

Scaling Test-Time Sampling and Verification for Vision-Language-Action Models

🛠️ Setup

Clone this repository:

git clone --recurse-submodules https://github.com/robomonkey-vla/RoboMonkey.git

Use the provided script to set up all dependencies:

bash scripts/setup.sh

This setup has been tested on 2×RTX 4090 GPUs using this Docker image.

✅ Action Verifier

Spin up the action verifier server:

conda activate monkey-verifier
cd monkey-verifier/src
python infer_server.py

⚡ VLA Serving Engine

Launch OpenVLA using our optimized SGLang-based engine:

conda activate sglang-vla
cd sglang-vla
CUDA_VISIBLE_DEVICES=1 python openvla_server.py --seed 1

🤖 SIMPLER Environment

Running RoboMonkey

Activate the environment and run the evaluation script as follows:

conda activate simpler_env
export PRISMATIC_DATA_ROOT=. && export PYTHONPATH=.
cd openvla-mini

xvfb-run --auto-servernum -s "-screen 0 640x480x24" \
python experiments/robot/simpler/run_simpler_eval.py \
  --task_suite_name simpler_put_eggplant_in_basket \
  --initial_samples 9 \
  --augmented_samples 32

  • initial_samples: Number of actions sampled from the base policy.
  • augmented_samples: Number of actions generated via Gaussian perturbation.
  • task_suite_name: One of simpler_put_eggplant_in_basket, simpler_stack_cube, simpler_spoon_on_towel, simpler_carrot_on_plate.
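
The interaction between the two sample counts can be illustrated with a minimal sketch. It assumes that the initial actions are summarized by a per-dimension Gaussian, that the augmented actions are drawn from that Gaussian, and that a verifier score picks the final action. The functions `propose_actions` and `select_action` and the toy scoring function are hypothetical and are not taken from the repository; the real pipeline may treat some action dimensions (such as the gripper) differently.

```python
import random
import statistics


def propose_actions(initial_actions, num_augmented, rng):
    """Fit a per-dimension Gaussian to the initial samples and draw new actions."""
    dims = list(zip(*initial_actions))  # transpose: one tuple per action dimension
    means = [statistics.fmean(d) for d in dims]
    # Floor the spread so a single initial sample still yields a valid Gaussian.
    stds = [max(statistics.pstdev(d), 1e-6) for d in dims]
    return [
        [rng.gauss(m, s) for m, s in zip(means, stds)]
        for _ in range(num_augmented)
    ]


def select_action(candidates, score_fn):
    """Return the candidate with the highest verifier score."""
    return max(candidates, key=score_fn)


rng = random.Random(0)
# Stand-in for 9 actions sampled from the base policy (7 dimensions each).
initial = [[rng.gauss(0.0, 0.05) for _ in range(7)] for _ in range(9)]
candidates = propose_actions(initial, num_augmented=32, rng=rng)
# Hypothetical verifier: prefers small-magnitude actions. The real verifier is a
# learned model served by infer_server.py.
best = select_action(candidates, score_fn=lambda a: -sum(x * x for x in a))
print(len(candidates), len(best))
```

With both counts set to 1, this sketch degenerates to a single action drawn almost exactly at the one initial sample, so the scoring step has nothing to choose between.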

Baseline without Verifier

To disable the verifier and evaluate the base policy alone, set both sample counts to 1:

--initial_samples 1 --augmented_samples 1
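
To compare RoboMonkey against the baseline across all task suites, the evaluation command can be assembled programmatically. The sketch below only builds and prints the commands; it does not launch them. The per-suite sample counts are copied from the Evaluation Results table, and the mapping from the table's task names to suite names is an assumption. The helper `build_eval_command` is hypothetical.

```python
import shlex

# (initial_samples, augmented_samples) per task suite, as reported in the
# Evaluation Results table. The task-name-to-suite mapping is assumed.
ROBOMONKEY_SETTINGS = {
    "simpler_put_eggplant_in_basket": (9, 32),
    "simpler_carrot_on_plate": (5, 16),
    "simpler_spoon_on_towel": (5, 32),
    "simpler_stack_cube": (9, 32),
}


def build_eval_command(task_suite, initial_samples, augmented_samples):
    """Return the evaluation command as an argument list (run from openvla-mini)."""
    return [
        "xvfb-run", "--auto-servernum", "-s", "-screen 0 640x480x24",
        "python", "experiments/robot/simpler/run_simpler_eval.py",
        "--task_suite_name", task_suite,
        "--initial_samples", str(initial_samples),
        "--augmented_samples", str(augmented_samples),
    ]


for suite, (n_init, n_aug) in ROBOMONKEY_SETTINGS.items():
    # RoboMonkey run, then the baseline run with the verifier disabled.
    print(shlex.join(build_eval_command(suite, n_init, n_aug)))
    print(shlex.join(build_eval_command(suite, 1, 1)))
```

Each printed line can be pasted into a shell after activating `simpler_env` and exporting the environment variables shown above.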

📊 Evaluation Results

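| Task | Initial Samples | Augmented Samples | Seed 1 | Seed 2 | Seed 3 | Average | Baseline | Success Rate ↑ |
|---|---|---|---|---|---|---|---|---|
| Eggplant in Basket | 9 | 32 | 76% | 66% | 78% | 73% | 54% | +19% |
| Carrot on Plate | 5 | 16 | 24% | 24% | 26% | 25% | 20% | +5% |
| Spoon on Towel | 5 | 32 | 46% | 46% | 50% | 47% | 45% | +2% |
| Stack Cube | 9 | 32 | 46% | 40% | 48% | 45% | 35% | +10% |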
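
The Average and improvement columns follow directly from the per-seed numbers: the average is the mean of the three seeds rounded to the nearest percent, and the improvement is the difference from the baseline in percentage points. A short sketch that reproduces the arithmetic (the `summarize` helper is illustrative only):

```python
# Per-seed success rates (%) and baseline (%) copied from the results table.
RESULTS = {
    "Eggplant in Basket": ([76, 66, 78], 54),
    "Carrot on Plate": ([24, 24, 26], 20),
    "Spoon on Towel": ([46, 46, 50], 45),
    "Stack Cube": ([46, 40, 48], 35),
}


def summarize(seed_rates, baseline):
    """Return (rounded mean over seeds, gain over baseline in percentage points)."""
    average = round(sum(seed_rates) / len(seed_rates))
    return average, average - baseline


for task, (seeds, baseline) in RESULTS.items():
    average, gain = summarize(seeds, baseline)
    print(f"{task}: average {average}%, baseline {baseline}%, gain +{gain} points")
```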

Logs are saved under: openvla-mini/experiments/log/

📚 Acknowledgements

We thank the authors of OpenVLA, SGLang, SimplerEnv, LLaVA-RLHF, and OpenVLA-mini for their contributions to the open-source community. Our implementation builds upon these projects. For comprehensive details and the latest updates, please consult the official documentation and repositories of the respective projects.

If you find this project helpful, please consider citing:

@article{kwok25robomonkey,
  title={RoboMonkey: Scaling Test-Time Sampling and Verification for Vision-Language-Action Models},
  author={Jacky Kwok and Christopher Agia and Rohan Sinha and Matt Foutter and Shulu Li and Ion Stoica and Azalia Mirhoseini and Marco Pavone},
  journal={arXiv preprint arXiv:2506.17811},
  year={2025},
}

🔎 Troubleshooting

If you encounter the following error:

No Vulkan extensions found for window surface creation (hint: set VK_ICD_FILENAMES to `locate icd.json`)

You can resolve this by running the script that installs Vulkan dependencies and sets up the correct ICD configuration:

bash scripts/vulkan.sh
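
Before or after running the script, it can help to check which Vulkan ICD manifests are actually present on the machine. The sketch below is a diagnostic aid only and is not what `scripts/vulkan.sh` does; it assumes a Linux system and looks in the Vulkan loader's usual manifest directories. The helper `find_icd_manifests` is hypothetical.

```python
import glob
import os

# Common Vulkan loader manifest directories on Linux (assumed; adjust as needed).
DEFAULT_ICD_DIRS = ["/usr/share/vulkan/icd.d", "/etc/vulkan/icd.d"]


def find_icd_manifests(search_dirs):
    """Return all *.json ICD manifests found in the given directories."""
    manifests = []
    for directory in search_dirs:
        manifests.extend(sorted(glob.glob(os.path.join(directory, "*.json"))))
    return manifests


found = find_icd_manifests(DEFAULT_ICD_DIRS)
if found:
    # VK_ICD_FILENAMES takes a list of manifest paths joined by the OS path separator.
    print("export VK_ICD_FILENAMES=" + os.pathsep.join(found))
else:
    print("No ICD manifests found; try running: bash scripts/vulkan.sh")
```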
