We present Self-Correcting VLA (SC-VLA), a novel framework designed to enhance physical grounding through intrinsic self-improvement. The model is equipped with Sparse World Imagination (SPI) to forecast task progress and future trajectory trends, and Online Action Refinement (OAR) to dynamically optimize policies via residual adjustments and reshaped rewards. SC-VLA achieves superior performance on ManiSkill and real-world ARX5 benchmarks, surpassing baselines in both success rate and execution throughput.
- (🔥 New) (2026.2.26) We have released the code and datasets of SC-VLA!
- (🔥 New) (2026.2.25) Our paper is released on arXiv.
Here we provide a conda environment setup for the project.
```shell
# clone the repository
git clone https://github.com/Kisaragi0/SC-VLA.git
cd SC-VLA

# create and activate the conda environment
conda create -n scvla python=3.10
conda activate scvla

# install dependencies
pip install --upgrade setuptools
pip install -r requirements.txt
```
```shell
# Install ffmpeg (required only for torchcodec on the real robot)
conda install -c conda-forge ffmpeg==7.1.1
```

FlashAttention is required for efficient attention computation. The version must be compatible with your CUDA and PyTorch installation.

```shell
pip install --no-build-isolation flash-attn==2.7.1.post4
```

Hardware Note:
We have validated the project on NVIDIA L40 (CUDA 12.4) and RTX 5090 (CUDA 12.8) GPUs.
Please make sure to install compatible versions of PyTorch, xFormers, and FlashAttention according to your CUDA version.
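As a quick sanity check before building FlashAttention, you can print the installed PyTorch build and the CUDA toolkit it was compiled against. This is a minimal sketch, not part of the official setup; it degrades gracefully if PyTorch is not yet installed:

```python
import importlib.util


def describe_torch() -> str:
    """Return a short string describing the installed PyTorch/CUDA build,
    or a notice if PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    # torch.version.cuda is the CUDA toolkit version the wheel was built
    # against; FlashAttention wheels must match it.
    return f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}"


print(describe_torch())
```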
Download the pretrained GR00T N1.5 weights from Hugging Face and save them in `SC-VLA/GR00T-N1.5-3B/`.
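Alternatively, the weights can be fetched programmatically with `huggingface_hub`. This is a sketch: the repo id `nvidia/GR00T-N1.5-3B` is an assumption, so verify the exact id on Hugging Face before running it. The import is done lazily inside the function so the snippet loads even without `huggingface_hub` installed:

```python
def fetch_groot_weights(local_dir: str = "SC-VLA/GR00T-N1.5-3B") -> str:
    """Download the GR00T N1.5 checkpoint into local_dir and return its path.

    NOTE: the repo id below is an assumption -- confirm it on Hugging Face.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="nvidia/GR00T-N1.5-3B",  # assumed repo id
        local_dir=local_dir,
    )
```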
All simulation-based datasets and experiments in this project are conducted in the ManiSkill environment. Set up ManiSkill following the official installation guide; we recommend installing it in a separate conda environment (e.g., `maniskill`).
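After installing ManiSkill, a short smoke test can confirm the simulator works. This is an illustrative sketch: the task id `PickCube-v1` is an example from the ManiSkill documentation, not necessarily a task used by SC-VLA, and the imports are lazy so the snippet loads without ManiSkill present:

```python
def maniskill_smoke_test(env_id: str = "PickCube-v1", steps: int = 10) -> None:
    """Create a ManiSkill env, take a few random steps, and close it."""
    import gymnasium as gym
    import mani_skill.envs  # noqa: F401  (importing registers ManiSkill tasks)

    env = gym.make(env_id)
    obs, _ = env.reset(seed=0)
    for _ in range(steps):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, _ = env.reset()
    env.close()
```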
We provide real-world robot datasets collected on the ARX-5 platform via Hugging Face. You can download the dataset using the Hugging Face CLI:
```shell
huggingface-cli download Kisaragi0/arx5_real_world_datasets \
    --repo-type dataset \
    --local-dir arx5_real_world_datasets
```

| Component | Conda Environment | Description |
|---|---|---|
| SC-VLA Server | scvla | Model loading, policy inference, and training |
| ManiSkill Client | maniskill | Simulation, interaction, and evaluation |
Modify the variables in the script before running the following command:

```shell
python scripts/scvla_train.py
```
```shell
# In the action head implementation: gr00t/model/action_head/flow_matching_action_head.py
# Make sure dataset and STATS_PATH are set consistently across the two files.
# Ensure that the host and port settings are consistent between the
# policy service and the ManiSkill environment.

# Start the policy inference service (SC-VLA environment)
python scripts/inference_service_policy.py

# Start training (ManiSkill environment)
python sac_residual/sac_maniskill_train.py
```

The evaluation is deployed in a client–server architecture, where the policy model runs as a service and the ManiSkill environment interacts with it as a client. To evaluate the model on ManiSkill, follow the steps below.
Before execution, modify the required variables in the script as needed.
```shell
# Start the policy inference service
python scripts/inference_service_policy.py
```

In a separate terminal, start the ManiSkill client to connect to the policy service:

```shell
# Start the ManiSkill client
python scripts/eval_for_maniskill_v5_client.py
```

The policy service and ManiSkill client must use the same host and port to establish a successful connection.
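Before launching the client, you can check that the policy service is reachable on the configured host and port. This is a minimal sketch; the host/port values shown are placeholders, not the project's actual defaults:

```python
import socket


def is_service_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example with placeholder values -- adjust to match your service settings.
print(is_service_up("127.0.0.1", 5555))
```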
We deploy SC-VLA on the ARX5 real robot platform. The environment setup and data collection follow our existing ARX5 pipeline; please refer to the ARX5 repository for details.
The real-robot deployment script for SC-VLA is provided below:

```shell
# Start the ARX5 client
python scripts/deploy_for_arx5_scvla.py
```

Our work builds upon the following projects; thanks for their great open-source work!
If you find this project useful, please consider citing our work:
```bibtex
@article{SC-VLA,
  title={Self-Correcting VLA: Online Action Refinement via Sparse World Imagination},
  author={Chenyv Liu and Wentao Tan and Lei Zhu and Fengling Li and Jingjing Li and Guoli Yang and Heng Tao Shen},
  journal={arXiv preprint arXiv:2602.21633},
  year={2026},
}
```
