This is a PyTorch implementation of the paper: VLANeXt: Recipes for Building Strong VLA Models, and also a unified, easy-to-use codebase that standardizes training and evaluation while exposing the key components of the VLA design space. It is intentionally lightweight and minimally encapsulated, enabling researchers to reproduce results, probe alternative design choices, and build new VLA variants on a shared, transparent foundation. We also release a curated and continuously updated list of VLA research (Awesome VLA) to help better understand the development of VLAs.
We will keep updating this repo with new features. You are welcome to join us by:
- Building your own VLAs on the VLANeXt codebase. We will keep your models alongside the VLANeXt and RT-2 baselines in our repo for others to use.
- Adding your VLMs, diffusion algorithms, or other general strategies to the VLANeXt codebase, to enrich the VLANeXt design space and test your strategies in the robotics domain.
Let's build the future of VLAs together! If you have any questions, feel free to contact me at [email protected].
# Basic setup
conda create -n VLANeXt python=3.10
conda activate VLANeXt
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
conda install -c conda-forge ffmpeg

LIBERO
cd third_party
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO && pip install .

LIBERO-plus (separate env needed)
cd third_party
git clone https://github.com/sylvestf/LIBERO-plus.git
cd LIBERO-plus && pip install .
# Dependencies
apt install libexpat1 libfontconfig1-dev libpython3-stdlib libmagickwand-dev
pip install -r extra_requirements.txt
conda env config vars set LIBERO_CONFIG_PATH=~/.libero_plus

We also need to download the assets; see LIBERO-plus.
The Droid dataset is used for robotics pretraining (our real-world experiments), and the LIBERO dataset is used for benchmark evaluation. The default training setting corresponds to our final VLANeXt framework.
We provide a tutorial-style guide to configuring the 12 design spaces from our paper.
👉 Please refer to DESIGN_SPACE.md for detailed configuration instructions.
For more details, please refer to the Droid Dataset.
Download:
gsutil -m rsync -r gs://gresearch/robotics/droid/1.0.1 droid/1.0.1/

Run Training:
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m scripts.train --config config/droid_train_config.yaml
# Multi-GPU (Set distributed=true in config)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=29505 -m scripts.train --config config/droid_train_config.yaml

For more details, please refer to OpenVLA, which modifies the original LIBERO dataset for training VLAs.
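A note on the multi-GPU command: under torchrun, each of the `nproc_per_node` processes typically draws its own per-GPU batch, so the effective global batch size scales with the process count. A minimal sketch of this arithmetic (the per-GPU batch size of 8 is a made-up illustration, not this repo's default):

```python
# Effective global batch size under data-parallel training with torchrun:
# each process draws its own per-GPU batch, so the global batch grows
# linearly with the number of processes.
def global_batch_size(per_gpu_batch: int, nproc_per_node: int, nnodes: int = 1) -> int:
    return per_gpu_batch * nproc_per_node * nnodes

# With --nproc_per_node=4 on a single node and a hypothetical per-GPU batch of 8:
print(global_batch_size(8, 4))  # 32
```

Keep this in mind when changing `--nproc_per_node`: to hold the global batch fixed, adjust the per-GPU batch (and usually the learning rate) accordingly.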
Download:
hf download openvla/modified_libero_rlds --repo-type dataset --local-dir LIBERO_modified

Run Training:
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m scripts.train --config config/libero_train_config.yaml
# Multi-GPU (Set distributed=true in config)
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29506 -m scripts.train --config config/libero_train_config.yaml

We have released VLANeXt checkpoints for the four LIBERO and LIBERO-plus suites on Hugging Face. These checkpoints achieve slightly better performance than the results reported in the paper, since the paper reports averaged results.
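Because the paper reports averaged results, a fair comparison against it means averaging your per-suite success rates first rather than comparing suite by suite. A minimal sketch; the numbers below are placeholders, not actual results:

```python
from statistics import mean

# Hypothetical per-suite success rates (placeholders, NOT reported results)
success = {"spatial": 0.90, "object": 0.85, "goal": 0.80, "long": 0.75}

# The paper-style number is the mean over the four suites
avg = mean(success.values())
print(f"average success rate: {avg:.3f}")  # average success rate: 0.825
```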
For more details, please refer to the official repository of LIBERO.
unset PYTHONPATH
export PYTHONPATH=$PYTHONPATH:~/proj/VLANeXt/third_party/LIBERO
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python -m scripts.libero_bench_eval

For more details, please refer to the official repository of LIBERO-plus.
unset PYTHONPATH
export PYTHONPATH=$PYTHONPATH:~/proj/VLANeXt/third_party/LIBERO-plus
CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python -m scripts.libero_plus_bench_eval

We have also released a checkpoint trained on the Droid dataset, which can be fine-tuned for real-world experiments with a Franka robotic arm.
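The `PYTHONPATH` exports in the evaluation commands above work because entries in `PYTHONPATH` are prepended to the interpreter's `sys.path`, which is how the eval scripts locate the LIBERO packages cloned into `third_party/`. A quick sanity check, using a made-up directory name rather than a real checkout:

```python
import os
import subprocess
import sys

# Launch a child interpreter with PYTHONPATH pointing at a hypothetical
# directory; entries in PYTHONPATH show up in the child's sys.path, making
# packages in that directory importable by `python -m ...`.
env = dict(os.environ, PYTHONPATH="/tmp/fake_libero")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.path)"],
    env=env, capture_output=True, text=True,
).stdout
print("/tmp/fake_libero" in out)  # True
```

If an eval script imports the wrong LIBERO variant, check `sys.path` ordering first; the `unset PYTHONPATH` step above exists precisely to avoid stale entries from a previous run.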
Model Size and Speed
Set CHECKPOINT_PATH and INPUT_MODALITY in scripts/size_speed_eval.py.
CUDA_VISIBLE_DEVICES=0 python -m scripts.size_speed_eval

If you run into issues, check COMMON_ISSUES.md for known problems and solutions.
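The measurement idea behind a size/speed script can be sketched with a generic latency harness; the dummy policy below stands in for a real model forward pass and is not the repo's actual code:

```python
import time

def measure_latency(fn, warmup: int = 3, iters: int = 20) -> float:
    """Average wall-clock latency of fn() in milliseconds, after warmup calls."""
    for _ in range(warmup):      # warmup amortizes one-time costs (caches, JIT)
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1e3

# Dummy "policy" standing in for a model forward pass
dummy_policy = lambda: sum(i * i for i in range(10_000))
print(f"{measure_latency(dummy_policy):.3f} ms/call")
```

Two caveats when timing real models: report model size as `sum(p.numel() for p in model.parameters())` in PyTorch, and call `torch.cuda.synchronize()` before reading the clock on GPU, since CUDA kernels run asynchronously and unsynchronized timings are meaningless.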
If you find VLANeXt useful for your research or applications, please cite our paper using the following BibTeX:
@article{wu2026vlanext,
  title={VLANeXt: Recipes for Building Strong VLA Models},
  author={Xiao-Ming Wu and Bin Fan and Kang Liao and Jian-Jian Jiang and Runze Yang and Yihang Luo and Zhonghua Wu and Wei-Shi Zheng and Chen Change Loy},
  journal={arXiv preprint arXiv:2602.18532},
  year={2026},
}

This project is licensed under the NTU S-Lab License 1.0.


