We present Self-Correcting VLA (SC-VLA), a novel framework designed to enhance physical grounding through intrinsic self-improvement. The model is equipped with Sparse World Imagination (SPI) to forecast task progress and future trajectory trends, and Online Action Refinement (OAR) to dynamically optimize policies via residual adjustments and reshaped rewards. SC-VLA achieves superior performance on ManiSkill and real-world ARX5 benchmarks, surpassing baselines in both success rate and execution throughput.
- (🔥 New) (2026.2.26) We have released the code and datasets of SC-VLA!
- (🔥 New) (2026.2.25) Our paper is released on arXiv.
Here we provide a conda environment setup for the project.
```shell
# clone the repository
git clone https://github.com/Kisaragi0/SC-VLA.git
cd SC-VLA

# create and activate the conda environment
conda create -n scvla python=3.10
conda activate scvla

# install dependencies
pip install --upgrade setuptools
pip install -r requirements.txt
```
```shell
# Install ffmpeg (required only for torchcodec on the real robot)
conda install -c conda-forge ffmpeg==7.1.1
```

FlashAttention is required for efficient attention computation. The version must be compatible with your CUDA and PyTorch installation.

```shell
pip install --no-build-isolation flash-attn==2.7.1.post4
```

Hardware Note:
We have validated the project on NVIDIA L40 (CUDA 12.4) and RTX 5090 (CUDA 12.8) GPUs.
Please make sure to install compatible versions of PyTorch, xFormers, and FlashAttention according to your CUDA version.
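As a quick sanity check before building FlashAttention, you can print the installed PyTorch build and the CUDA toolkit it was compiled against. This is a minimal sketch, not part of the official setup; it degrades gracefully if PyTorch is not yet installed:

```python
import importlib.util


def describe_torch() -> str:
    """Return a short string describing the installed PyTorch/CUDA build,
    or a notice if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    # torch.version.cuda is the CUDA toolkit version the wheel was built
    # against; FlashAttention wheels must match it.
    return f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}"


print(describe_torch())
```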
Download the pretrained GR00T N1.5 weights from Hugging Face and save them in `SC-VLA/GR00T-N1.5-3B/`.
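Alternatively, the weights can be fetched programmatically with `huggingface_hub`. This is a sketch: the repo id `nvidia/GR00T-N1.5-3B` is an assumption, so verify the exact id on Hugging Face before running it. The import is done lazily inside the function so the snippet loads even without `huggingface_hub` installed:

```python
def fetch_groot_weights(local_dir: str = "SC-VLA/GR00T-N1.5-3B") -> str:
    """Download the GR00T N1.5 checkpoint into local_dir and return its path.

    NOTE: the repo id below is an assumption -- confirm it on Hugging Face.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="nvidia/GR00T-N1.5-3B",  # assumed repo id
        local_dir=local_dir,
    )
```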
All simulation-based datasets and experiments in this project are conducted in the ManiSkill environment. Set up ManiSkill following the official installation guide; we recommend installing it in a separate conda environment (e.g., `maniskill`).
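After installing ManiSkill, a short smoke test can confirm the simulator works. This is an illustrative sketch: the task id `PickCube-v1` is an example from the ManiSkill documentation, not necessarily a task used by SC-VLA, and the imports are lazy so the snippet loads without ManiSkill present:

```python
def maniskill_smoke_test(env_id: str = "PickCube-v1", steps: int = 10) -> None:
    """Create a ManiSkill env, take a few random steps, and close it."""
    import gymnasium as gym
    import mani_skill.envs  # noqa: F401  (importing registers ManiSkill tasks)

    env = gym.make(env_id)
    obs, _ = env.reset(seed=0)
    for _ in range(steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
```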
We provide real-world robot datasets collected on the ARX-5 platform via Hugging Face. You can download the dataset using the Hugging Face CLI:
```shell
huggingface-cli download Kisaragi0/arx5_real_world_datasets \
    --repo-type dataset \
    --local-dir arx5_real_world_datasets
```

| Component | Conda Environment | Description |
|---|---|---|
| SC-VLA Server | scvla | Model loading, policy inference, and training |
| ManiSkill Client | maniskill | Simulation, interaction, and evaluation |
Modify the variables in the script before running the following command:

```shell
python scripts/scvla_train.py
```
```shell
# In the action head implementation: gr00t/model/action_head/flow_matching_action_head.py
# Make sure dataset and STATS_PATH are set consistently across the two files.
# Ensure that the host and port settings are consistent between the
# policy service and the ManiSkill environment.

# Start the policy inference service (SC-VLA environment)
python scripts/inference_service_policy.py

# Start training (ManiSkill environment)
python sac_residual/sac_maniskill_train.py
```

The evaluation is deployed in a client–server architecture, where the policy model runs as a service and the ManiSkill environment interacts with it as a client. To evaluate the model on ManiSkill, follow the steps below.
Before execution, modify the required variables in the script as needed.
```shell
# Start the policy inference service
python scripts/inference_service_policy.py
```

In a separate terminal, start the ManiSkill client to connect to the policy service:

```shell
# Start the ManiSkill client
python scripts/eval_for_maniskill_v5_client.py
```

The policy service and ManiSkill client must use the same host and port to establish a successful connection.
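Before launching the client, you can check that the policy service is reachable on the configured host and port. This is a minimal sketch; the host/port values shown are placeholders, not the project's actual defaults:

```python
import socket


def is_service_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with placeholder values -- adjust to match your service settings.
print(is_service_up("127.0.0.1", 5555))
```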
We deploy SC-VLA on the ARX5 real robot platform. The environment setup and data collection follow our existing ARX5 pipeline; please refer to the ARX5 repository for details.
The real-robot deployment script for SC-VLA is provided below:

```shell
# Start the ARX5 client
python scripts/deploy_for_arx5_scvla.py
```

Our work builds upon the following projects; thanks for their great open-source work!
If you find this project useful, please consider citing our work:
```bibtex
@article{SC-VLA,
  title={Self-Correcting VLA: Online Action Refinement via Sparse World Imagination},
  author={Chenyv Liu and Wentao Tan and Lei Zhu and Fengling Li and Jingjing Li and Guoli Yang and Heng Tao Shen},
  journal={arXiv preprint arXiv:2602.21633},
  year={2026},
}
```
