# VLANeXt: Recipes for Building Strong VLA Models


*(Figure: roadmap of VLANeXt)*

This is a PyTorch implementation of the paper VLANeXt: Recipes for Building Strong VLA Models, and a unified, easy-to-use codebase that standardizes training and evaluation while exposing the key components of the VLA design space. It is intentionally lightweight and minimally encapsulated, enabling researchers to reproduce results, probe alternative design choices, and build new VLA variants on a shared, transparent foundation. We also release a curated and continuously updated list of VLA research (Awesome VLA) to help readers follow the development of VLAs.

*(Figure: codebase overview)*

We will keep updating this repo with new features. You are welcome to join us by:

  1. Building your own VLAs on the VLANeXt codebase. We will keep your models alongside VLANeXt and the RT-2 baseline in our repo for others to use.
  2. Adding your VLMs, diffusion algorithms, or other general strategies to the VLANeXt codebase, enriching the VLANeXt design space and testing your strategies in the robotics domain.

Let's build the future of VLAs together! If you have any questions, feel free to contact me at [email protected].

## 🛠️ Environment Setup

### Basic Installation

```shell
# Basic setup
conda create -n VLANeXt python=3.10
conda activate VLANeXt
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
conda install -c conda-forge ffmpeg
```
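
As a quick post-install sanity check, the sketch below (illustrative, not part of the repo) verifies that the key packages are discoverable without importing them, so a broken CUDA setup cannot crash the check itself:

```python
# Minimal post-install sanity check (illustrative, not part of VLANeXt).
# importlib.util.find_spec locates a module without importing it.
import importlib.util

def has_module(name: str) -> bool:
    """Return True if `name` is installed and discoverable."""
    return importlib.util.find_spec(name) is not None

for pkg in ("torch", "torchvision", "flash_attn"):
    print(f"{pkg}: {'ok' if has_module(pkg) else 'MISSING'}")
```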

### Benchmark Installation

#### LIBERO

```shell
cd third_party
git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git
cd LIBERO && pip install .
```

#### LIBERO-plus (separate environment needed)

```shell
cd third_party
git clone https://github.com/sylvestf/LIBERO-plus.git
cd LIBERO-plus && pip install .
# Dependencies
apt install libexpat1 libfontconfig1-dev libpython3-stdlib libmagickwand-dev
pip install -r extra_requirements.txt
conda env config vars set LIBERO_CONFIG_PATH=~/.libero_plus
```

You also need to download the assets; see LIBERO-plus for instructions.

## 🚀 Training

The Droid dataset is used for robotics pretraining (our real-world experiments), and the LIBERO dataset is used for benchmark evaluation. The default training settings correspond to our final VLANeXt framework.

*(Figure: framework of VLANeXt)*

## 🧪 Design Space Exploration

We provide a tutorial-style guide to configuring the 12 design spaces from our paper.

👉 Please refer to DESIGN_SPACE.md for detailed configuration instructions.

### Droid Dataset

For more details, please refer to the Droid Dataset.

Download:

```shell
gsutil -m rsync -r gs://gresearch/robotics/droid/1.0.1 droid/1.0.1/
```
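
After the sync finishes, it can be useful to confirm the shards actually landed. A generic sketch (not part of the repo; the directory name is taken from the command above, and the helper name is illustrative):

```python
# Illustrative helper (not part of VLANeXt): recursively count files under
# the download directory to confirm the gsutil sync completed.
import os

def count_files(root: str, suffix: str = "") -> int:
    """Count files under `root` whose names end with `suffix` ('' matches all)."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for f in filenames if f.endswith(suffix))
    return total

# Example (path from the download command above):
# print(count_files("droid/1.0.1"))
```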

Run Training:

```shell
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m scripts.train --config config/droid_train_config.yaml

# Multi-GPU (set distributed=true in the config)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=29505 -m scripts.train --config config/droid_train_config.yaml
```
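
Under `torchrun`, each worker discovers its role from environment variables set by the launcher. A minimal sketch of that handshake (the function name and defaults are illustrative, not the repo's actual code):

```python
# Sketch of how a training script can pick up the torchrun environment,
# falling back to single-process defaults when launched with plain `python`.
import os

def get_dist_info() -> tuple:
    """Return (rank, world_size, local_rank); defaults suit single-GPU runs."""
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return rank, world_size, local_rank

rank, world_size, local_rank = get_dist_info()
print(f"rank {rank}/{world_size}, local_rank {local_rank}")
```

With `--nproc_per_node=4`, torchrun spawns four such processes with `RANK` 0–3 and `WORLD_SIZE=4`; `torch.distributed.init_process_group` reads the same variables.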

### LIBERO Dataset

For more details, please refer to OpenVLA, which modifies the original LIBERO dataset for training VLAs.

Download:

```shell
hf download openvla/modified_libero_rlds --repo-type dataset --local-dir LIBERO_modified
```

Run Training:

```shell
# Single GPU
CUDA_VISIBLE_DEVICES=0 python -m scripts.train --config config/libero_train_config.yaml

# Multi-GPU (set distributed=true in the config)
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29506 -m scripts.train --config config/libero_train_config.yaml
```

## 📊 Evaluation

We have released VLANeXt checkpoints for the four LIBERO and LIBERO-plus suites on Hugging Face. These checkpoints perform slightly better than the results reported in the paper, since the paper reports averaged results.

### LIBERO

For more details, please refer to the official repository of LIBERO.

```shell
unset PYTHONPATH
export PYTHONPATH=$PYTHONPATH:~/proj/VLANeXt/third_party/LIBERO

CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python -m scripts.libero_bench_eval
```

### LIBERO-plus

For more details, please refer to the official repository of LIBERO-plus.

```shell
unset PYTHONPATH
export PYTHONPATH=$PYTHONPATH:~/proj/VLANeXt/third_party/LIBERO-plus

CUDA_VISIBLE_DEVICES=0 MUJOCO_EGL_DEVICE_ID=0 python -m scripts.libero_plus_bench_eval
```

### Real-World Attempts

We have also released a checkpoint trained on the Droid dataset, which can be fine-tuned for your own real-world experiments with a Franka robotic arm.

## ⚡ Analysis

### Model Size and Speed

Set `CHECKPOINT_PATH` and `INPUT_MODALITY` in `scripts/size_speed_eval.py`.

```shell
CUDA_VISIBLE_DEVICES=0 python -m scripts.size_speed_eval
```
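
The core of a speed benchmark like this is a warm-up phase followed by averaged timing. A generic sketch (not the repo's actual implementation; names and a stand-in workload are illustrative):

```python
# Illustrative latency benchmark: warm up first (to exclude one-time costs
# such as CUDA kernel compilation), then average wall-clock time over runs.
import time

def average_latency(fn, n_warmup: int = 3, n_runs: int = 10) -> float:
    """Return the mean wall-clock time of fn() in seconds."""
    for _ in range(n_warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Example with a stand-in workload:
print(f"{average_latency(lambda: sum(range(10_000))):.2e} s/call")
```

For GPU inference, the timed callable would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before the clock is read, since CUDA kernel launches are asynchronous.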

## ❗ Common Issues

If you run into issues, check COMMON_ISSUES.md for known problems and solutions.

## 📚 Citation

If you find VLANeXt useful for your research or applications, please cite our paper using the following BibTeX:

```bibtex
@article{wu2026vlanext,
    title={VLANeXt: Recipes for Building Strong VLA Models},
    author={Xiao-Ming Wu and Bin Fan and Kang Liao and Jian-Jian Jiang and Runze Yang and Yihang Luo and Zhonghua Wu and Wei-Shi Zheng and Chen Change Loy},
    journal={arXiv preprint arXiv:2602.18532},
    year={2026},
}
```

## 🗞️ License

This project is licensed under the NTU S-Lab License 1.0.
