
SimVLA: A Simple VLA Baseline for Robotic Manipulation

Paper · Website · Model & Data (Hugging Face)

A simple and efficient Vision-Language-Action (VLA) model for robot manipulation tasks.


Installation

conda create -n simvla python=3.10 -y
conda activate simvla

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install "transformers>=4.57.0"
pip install peft accelerate fastapi tensorboard uvicorn json_numpy safetensors scipy einops timm mmengine pyarrow h5py mediapy num2words av wandb websockets msgpack_numpy
pip install flash-attn==2.5.6 --no-build-isolation
pip install tensorflow tensorflow-datasets

Important: Use transformers>=4.57.0.
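A quick way to confirm the environment meets the pinned requirement before training (a minimal sketch; the version-comparison helper below is illustrative and not part of the repo):

```python
# Sanity-check installed package versions against the minimums pinned above.
from importlib.metadata import version, PackageNotFoundError

def meets_minimum(installed: str, required: str) -> bool:
    """Numeric comparison of dotted version strings, e.g. '4.58.1' >= '4.57.0'."""
    as_tuple = lambda v: tuple(int(p) for p in v.split(".")[:3] if p.isdigit())
    return as_tuple(installed) >= as_tuple(required)

for pkg, minimum in [("transformers", "4.57.0")]:
    try:
        status = "OK" if meets_minimum(version(pkg), minimum) else "TOO OLD"
        print(f"{pkg} {version(pkg)}: {status}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```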

Training (LIBERO Dataset)

1. Prepare LIBERO Dataset

Download the LIBERO dataset and place it in ./datasets/metas/.

2. Create Training Metadata

python create_libero_meta.py \
    --data_dir ./datasets/metas \
    --subsets libero_10 libero_goal libero_object libero_spatial \
    --output ./datasets/metas/libero_train.json
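Conceptually, this step walks each subset directory and writes a JSON index of demo files. The sketch below assumes an HDF5-file-per-task layout and an illustrative entry schema; create_libero_meta.py defines the actual one.

```python
import json
from pathlib import Path

def build_meta(data_dir: str, subsets: list[str]) -> list[dict]:
    """Index every .hdf5 demo file under each subset directory.

    The directory layout and entry fields here are assumptions for
    illustration; the real schema comes from create_libero_meta.py.
    """
    entries = []
    for subset in subsets:
        for path in sorted(Path(data_dir, subset).glob("*.hdf5")):
            entries.append({"subset": subset, "file": str(path)})
    return entries

# Usage (paths mirror step 2 above):
# meta = build_meta("./datasets/metas",
#                   ["libero_10", "libero_goal", "libero_object", "libero_spatial"])
# Path("./datasets/metas/libero_train.json").write_text(json.dumps(meta, indent=2))
```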

3. Compute Normalization Statistics

python compute_libero_norm_stats.py \
    --data_dir ./datasets/metas \
    --subsets libero_10 libero_goal libero_object libero_spatial \
    --output ./norm_stats/libero_norm.json
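What such a script typically computes: per-dimension statistics over all actions in the training set, written to the JSON path given by --output. Whether SimVLA normalizes with mean/std or with clipped quantiles (as some VLA pipelines do) is an assumption here; compute_libero_norm_stats.py is authoritative.

```python
import numpy as np

def action_norm_stats(actions: np.ndarray) -> dict:
    """Per-dimension statistics over an (N, action_dim) array of raw actions."""
    return {
        "mean": actions.mean(axis=0).tolist(),
        "std": actions.std(axis=0).tolist(),
        "min": actions.min(axis=0).tolist(),
        "max": actions.max(axis=0).tolist(),
        # q01/q99 support outlier-robust scaling; their use here is an assumption.
        "q01": np.quantile(actions, 0.01, axis=0).tolist(),
        "q99": np.quantile(actions, 0.99, axis=0).tolist(),
    }

# Example on synthetic 7-DoF actions (xyz delta, rotation delta, gripper):
rng = np.random.default_rng(0)
stats = action_norm_stats(rng.normal(size=(1000, 7)))
```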

4. Start Training

Small Model Configuration:

bash train_smolvlm_small.sh

Large Model Configuration:

bash train_smolvlm_large.sh

5. Evaluation

The evaluation scripts live under evaluation/libero:

cd evaluation/libero

6. Results

(Results figure: LIBERO benchmark results.)

Model Architecture

  • Vision-Language Backbone: SmolVLM-500M-Instruct (576 hidden dim)
  • Action Transformer: Configurable depth and width
    • Small: 768 hidden, 12 layers, 12 heads
    • Large: 1024 hidden, 24 layers, 16 heads
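The two presets above can be captured in a small config object (the class and preset names below are illustrative, not the repo's actual config API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionTransformerConfig:
    hidden_dim: int
    num_layers: int
    num_heads: int

    def __post_init__(self):
        # Multi-head attention requires the hidden size to split evenly across heads.
        assert self.hidden_dim % self.num_heads == 0

# Presets mirroring the numbers above; the names are placeholders.
SMALL = ActionTransformerConfig(hidden_dim=768, num_layers=12, num_heads=12)
LARGE = ActionTransformerConfig(hidden_dim=1024, num_layers=24, num_heads=16)
```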

Reference

If you find our code useful, please consider citing our work:

@article{luo2026simvla,
  title={SimVLA: A Simple VLA Baseline for Robotic Manipulation},
  author={Luo, Yuankai and Chen, Woping and Liang, Tong and Wang, Baiqiao and Li, Zhenguo},
  journal={arXiv preprint arXiv:2602.18224},
  year={2026}
}
