Vision-Language-Action (VLA) models have demonstrated strong generalization capabilities in robotic manipulation tasks. However, their inference process typically relies on large-scale vision and language models, resulting in high computational overhead that makes real-time deployment challenging.
In practice, we observe that action generation exhibits significant temporal redundancy: many continuous motion phases have stable trends that can be quickly predicted based on historical information, while complex vision-language reasoning is only needed at critical decision points (e.g., grasping or placing).
Based on this observation, we propose SP-VLA, an inference-acceleration framework for VLA models that improves efficiency through two complementary mechanisms, Model Scheduling and Token Pruning:
- Model Scheduling: Dynamically selects inference paths based on action types, using a lightweight action generator during simple motion phases and invoking the full VLA model at critical decision stages.
- Token Pruning: Filters key visual tokens using spatial and semantic information, reducing input size while maintaining spatial understanding capabilities.
By combining these two mechanisms, SP-VLA significantly reduces inference overhead while ensuring model decision-making capabilities, making VLA models more suitable for practical robotic system deployment.
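The scheduling idea above can be sketched as a simple dispatch loop. This is a minimal illustrative sketch, not the repository's actual control loop: `vla_model`, `is_critical_phase`, and `fit_next_action` are hypothetical stand-ins for the full VLA policy, the critical-phase detector, and the lightweight action generator.

```python
def control_step(obs, action_history, vla_model, is_critical_phase, fit_next_action):
    """One control step of an SP-VLA-style scheduler (illustrative sketch).

    `vla_model`, `is_critical_phase`, and `fit_next_action` are hypothetical
    stand-ins for the full VLA policy, the critical-phase detector, and the
    lightweight regression-based action generator.
    """
    if is_critical_phase(obs, action_history):
        # Critical decision point (e.g. grasping/placing): run full VLA inference
        action = vla_model(obs)
    else:
        # Stable motion phase: extrapolate cheaply from recent actions
        action = fit_next_action(action_history)
    action_history.append(action)
    return action
```

In this sketch the expensive model is invoked only when the phase detector fires; every other step costs one call to the lightweight generator.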
- Attention-based Pruning: Leverages attention scores from the SigLIP vision encoder to identify important visual tokens
- Edge Detection Enhancement: Combines Canny edge detection to ensure retention of tokens containing critical edge information
- Dynamic Pruning Rate: Adaptively adjusts pruning intensity based on robot motion velocity
  - Retains more tokens during high-speed motion to preserve accuracy
  - Applies more aggressive pruning during low-speed motion to improve speed
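The token-selection logic described above can be sketched as follows. This is a simplified illustration, not the repository's implementation: the keep-ratio bounds (`r_min`, `r_max`, `v_max`) are hypothetical hyperparameters, and `edge_mask` stands in for a per-patch mask derived from Canny edge detection.

```python
import numpy as np

def dynamic_keep_ratio(velocity, v_max=1.0, r_min=0.3, r_max=0.9):
    # Faster motion -> keep more tokens; slower motion -> prune more aggressively.
    # v_max / r_min / r_max are illustrative values, not from the paper.
    v = min(abs(velocity), v_max) / v_max
    return r_min + (r_max - r_min) * v

def prune_tokens(attn_scores, edge_mask, velocity):
    """Select visual tokens by attention score, always keeping edge tokens.

    attn_scores: (N,) per-token attention importance (e.g. from SigLIP)
    edge_mask:   (N,) boolean, True where the patch contains edge pixels
    Returns sorted indices of retained tokens.
    """
    n = attn_scores.shape[0]
    k = max(1, int(round(dynamic_keep_ratio(velocity) * n)))
    top_idx = np.argsort(attn_scores)[::-1][:k]           # top-k by attention
    keep = np.union1d(top_idx, np.nonzero(edge_mask)[0])  # union with edge tokens
    return np.sort(keep)
```

Taking the union with the edge mask guarantees that edge-bearing patches survive even when the velocity-dependent budget is tight.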
- Motion Pattern Detection: Identifies scenarios where the robot performs smooth planar movements
- Linear Regression Prediction: Uses ridge regression to fit historical actions and predict the next action
- Conditional Skipping: Skips model inference only when all of the following conditions are met:
  - Vertical motion is sufficiently small relative to horizontal motion
  - Absolute vertical displacement is below a threshold
  - The action history buffer contains enough previously generated actions
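The prediction and skip checks above can be sketched as follows. This is an illustrative sketch under stated assumptions: the ridge penalty `alpha` and the thresholds `min_history`, `z_ratio`, and `z_abs` are hypothetical values, and action dimensions 0-2 are assumed to be (x, y, z) end-effector deltas.

```python
import numpy as np

def fit_next_action(history, alpha=0.01):
    """Predict the next action by ridge-regressing actions against time.

    history: (T, D) array of recent actions; alpha is an illustrative
    ridge penalty, not a value from the paper.
    """
    T = history.shape[0]
    t = np.arange(T, dtype=float)
    X = np.stack([t, np.ones(T)], axis=1)  # linear trend + bias
    # Closed-form ridge solution: (X^T X + alpha I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ history)
    return np.array([float(T), 1.0]) @ W   # extrapolate one step, to t = T

def can_skip(history, min_history=5, z_ratio=0.1, z_abs=0.01):
    """Skip VLA inference only for smooth, near-planar motion.

    min_history / z_ratio / z_abs are hypothetical thresholds; dims 0..2
    of each action are assumed to be (x, y, z) deltas.
    """
    if len(history) < min_history:
        return False                      # not enough actions buffered
    deltas = np.diff(history[:, :3], axis=0)
    horiz = np.linalg.norm(deltas[:, :2], axis=1).mean()
    vert = np.abs(deltas[:, 2]).mean()
    # Vertical motion small both relative to horizontal motion and in absolute terms
    return vert < z_ratio * max(horiz, 1e-8) and vert < z_abs
```

A small ridge penalty keeps the fit stable when recent actions are nearly constant, while the extrapolation to `t = T` yields the next action in one closed-form solve.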
On the LIBERO benchmark, SP-VLA achieves 1.5× inference acceleration with no loss in task success rate.
In the SimplerEnv environment, SP-VLA not only achieves 2.4× acceleration but also improves task performance by 6%.
These tasks cover various robotic manipulation scenarios including spatial reasoning, long-horizon operations, and complex object interactions.
Real robot experiments conducted on the Franka Panda manipulator demonstrate:
- 2.5× end-to-end inference acceleration
- Task success rate decreases by only 1%
```bash
# Clone this repository
git clone https://github.com/ChildTang/SP-VLA.git
cd SP-VLA

# Follow OpenVLA's installation steps to set up the project
# See: https://github.com/openvla/openvla
```

```bash
# Evaluate on LIBERO Spatial tasks
cd experiments/robot/libero
python run_libero.py \
  --pretrained_checkpoint /path/to/checkpoint \
  --task_suite_name libero_spatial \
  --cuda_device 0
```

```
openvla_rule/
├── experiments/
│   └── robot/
│       ├── libero/
│       │   └── run_libero.py         # LIBERO evaluation script
│       ├── openvla_utils.py          # OpenVLA utility functions
│       └── robot_utils.py            # General robot utilities
├── prismatic/
│   ├── extern/hf/
│   │   └── modeling_prismatic.py     # Core model implementation (with optimizations)
```
Token pruning is implemented in `PrismaticVisionBackbone.forward()` in `prismatic/extern/hf/modeling_prismatic.py`:

```python
# 1. Extract attention scores
siglip_attn = compute_attention_scores(siglip_q, siglip_k)

# 2. Calculate the dynamic threshold
threshold = calculate_dynamic_threshold(z_trans, cfg)

# 3. Select important tokens
important_idx = select_tokens_by_attention(siglip_attn, threshold)

# 4. Combine with edge detection
edge_idx = detect_edge_tokens(raw_image)
important_idx = union(important_idx, edge_idx)

# 5. Apply pruning
patches = patches[:, important_idx]
```

Step skipping is implemented in `OpenVLAForActionPrediction.predict_action()`:
```python
# 1. Check the motion pattern
if is_planar_movement(recent_actions):
    # Use linear regression prediction
    action = fit_next_action(action_history)
else:
    # Run full VLA inference
    action = vla.generate(...)
```

If you use this code, please cite our paper:
```bibtex
@article{li2025sp,
  title={SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration},
  author={Li, Ye and Meng, Yuan and Sun, Zewen and Ji, Kangye and Tang, Chen and Fan, Jiajun and Ma, Xinzhu and Xia, Shutao and Wang, Zhi and Zhu, Wenwu},
  journal={arXiv preprint arXiv:2506.12723},
  year={2025}
}
```

This project is built upon OpenVLA.
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This implementation focuses on inference acceleration. For training-related functionality, please refer to the original OpenVLA repository.