
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration



📖 Introduction

Vision-Language-Action (VLA) models have demonstrated strong generalization capabilities in robotic manipulation tasks. However, their inference process typically relies on large-scale vision and language models, resulting in high computational overhead that makes real-time deployment challenging.

In practice, we observe that action generation exhibits significant temporal redundancy: many continuous motion phases have stable trends that can be quickly predicted based on historical information, while complex vision-language reasoning is only needed at critical decision points (e.g., grasping or placing).

Based on this observation, we propose SP-VLA, a framework that accelerates VLA model inference through two complementary mechanisms, Model Scheduling and Token Pruning:

  • Model Scheduling: Dynamically selects inference paths based on action types, using a lightweight action generator during simple motion phases and invoking the full VLA model at critical decision stages.
  • Token Pruning: Filters key visual tokens using spatial and semantic information, reducing input size while maintaining spatial understanding capabilities.

By combining these two mechanisms, SP-VLA significantly reduces inference overhead while ensuring model decision-making capabilities, making VLA models more suitable for practical robotic system deployment.
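The scheduling idea can be sketched as a simple dispatch loop. This is a minimal illustration, not the repository's API: `run_episode`, `is_critical_step`, and `predict_from_history` are hypothetical names, and the criticality test and predictor are toy stand-ins.

```python
from collections import deque

def is_critical_step(history):
    # Toy criterion: a large jump between the two most recent actions
    # signals a decision point that needs full VLA reasoning.
    a_prev, a_last = list(history)[-2], list(history)[-1]
    return sum(abs(x - y) for x, y in zip(a_last, a_prev)) > 0.5

def predict_from_history(history):
    # Toy lightweight generator: linear extrapolation of the last two actions.
    a_prev, a_last = list(history)[-2], list(history)[-1]
    return [2 * x - y for x, y in zip(a_last, a_prev)]

def run_episode(vla_infer, steps, history_len=4):
    """Dispatch loop: cheap predictor on simple phases, full VLA otherwise."""
    history = deque(maxlen=history_len)
    actions = []
    for t in range(steps):
        if len(history) == history.maxlen and not is_critical_step(history):
            action = predict_from_history(history)  # lightweight path
        else:
            action = vla_infer(t)                   # full VLA inference
        history.append(action)
        actions.append(action)
    return actions
```

The loop falls back to full inference until the history buffer fills, mirroring the framework's requirement that the lightweight generator only fires once enough past actions are available.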

SP-VLA Framework

🚀 Key Features

1. Adaptive Token Pruning

  • Attention-based Pruning: Leverages attention scores from the SigLIP vision encoder to identify important visual tokens
  • Edge Detection Enhancement: Combines Canny edge detection to ensure retention of tokens containing critical edge information
  • Dynamic Pruning Rate: Adaptively adjusts pruning intensity based on robot motion velocity
    • Retains more tokens during high-speed motion to ensure accuracy
    • Applies more aggressive pruning during low-speed motion to improve speed
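The velocity-adaptive rule can be sketched as a mapping from motion speed to a token keep ratio. The thresholds and ratios below are illustrative placeholders, not the values used in the paper.

```python
def adaptive_keep_ratio(speed, v_low=0.05, v_high=0.5,
                        ratio_min=0.3, ratio_max=0.9):
    """Map end-effector speed to the fraction of visual tokens kept.

    Fast motion -> keep more tokens (accuracy); slow motion -> prune
    more aggressively (speed). All thresholds here are assumptions.
    """
    if speed <= v_low:
        return ratio_min
    if speed >= v_high:
        return ratio_max
    # Linear interpolation between the slow and fast regimes.
    frac = (speed - v_low) / (v_high - v_low)
    return ratio_min + frac * (ratio_max - ratio_min)
```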

2. Step Skipping

  • Motion Pattern Detection: Identifies scenarios where the robot performs smooth planar movements
  • Linear Regression Prediction: Uses ridge regression to fit historical actions and predict the next action
  • Conditional Skipping: Only skips model inference when the following conditions are met:
    • Vertical motion is sufficiently small relative to horizontal motion
    • Absolute vertical displacement is below threshold
    • Action history buffer contains sufficient generated actions
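The three skip conditions above can be combined into a single predicate. This is a sketch: the function name, argument names, and thresholds are assumptions, not the repository's actual values.

```python
def can_skip_inference(action_history, dz, dxy,
                       min_history=4, ratio_thresh=0.1, abs_z_thresh=0.01):
    """Return True only when all three skip conditions hold.

    dz  : recent vertical displacement
    dxy : recent horizontal displacement magnitude
    Thresholds are illustrative placeholders.
    """
    # Condition 3: enough generated actions in the history buffer.
    enough_history = len(action_history) >= min_history
    # Condition 1: vertical motion small relative to horizontal motion.
    relative_ok = dxy > 0 and abs(dz) / dxy < ratio_thresh
    # Condition 2: absolute vertical displacement below threshold.
    absolute_ok = abs(dz) < abs_z_thresh
    return enough_history and relative_ok and absolute_ok
```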

📊 Experimental Results

Simulation Experiments

On the LIBERO benchmark, SP-VLA achieves 1.5× inference acceleration while keeping task success rates unchanged.
In the SimplerEnv environment, SP-VLA not only achieves 2.4× acceleration but also improves task performance by 6%.

LIBERO Results SimplerEnv Results

Visualization Examples

These tasks cover various robotic manipulation scenarios including spatial reasoning, long-horizon operations, and complex object interactions.

Task Visualization

Real Robot Experiments

Real robot experiments conducted on the Franka Panda manipulator demonstrate:

  • 2.5× end-to-end inference acceleration
  • Task success rate decreases by only 1%

Real Robot Experiment 1 Real Robot Experiment 2

🎯 Quick Start

1. Installation

```bash
# Clone this repository
git clone https://github.com/ChildTang/SP-VLA.git
cd SP-VLA

# Follow OpenVLA's installation steps to set up the project
# See: https://github.com/openvla/openvla
```

2. Run Evaluation

```bash
# Evaluate on LIBERO Spatial tasks
cd experiments/robot/libero
python run_libero.py \
    --pretrained_checkpoint /path/to/checkpoint \
    --task_suite_name libero_spatial \
    --cuda_device 0
```

📁 Project Structure

```
openvla_rule/
├── experiments/
│   └── robot/
│       ├── libero/
│       │   └── run_libero.py          # LIBERO evaluation script
│       ├── openvla_utils.py           # OpenVLA utility functions
│       └── robot_utils.py             # General robot utilities
├── prismatic/
│   ├── extern/hf/
│   │   └── modeling_prismatic.py      # Core model implementation (with optimizations)
```

🔧 Core Implementation

Token Pruning Implementation

Token pruning is implemented in PrismaticVisionBackbone.forward() in prismatic/extern/hf/modeling_prismatic.py:

```python
# 1. Extract attention scores
siglip_attn = compute_attention_scores(siglip_q, siglip_k)

# 2. Dynamic threshold calculation
threshold = calculate_dynamic_threshold(z_trans, cfg)

# 3. Select important tokens
important_idx = select_tokens_by_attention(siglip_attn, threshold)

# 4. Combine with edge detection
edge_idx = detect_edge_tokens(raw_image)
important_idx = union(important_idx, edge_idx)

# 5. Apply pruning
patches = patches[:, important_idx]
```
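The same five steps can be made concrete with NumPy. This is a self-contained sketch, not the repository's code: the real implementation operates on SigLIP attention inside `modeling_prismatic.py`, and `edge_mask` here stands in for Canny edge-detection output.

```python
import numpy as np

def prune_tokens(patches, q, k, edge_mask, keep_ratio=0.5):
    """Keep the top-scoring tokens by attention, plus all edge tokens.

    patches   : (N, D) visual token embeddings
    q, k      : (N, D) query/key projections used for scoring
    edge_mask : (N,) boolean, True where a patch contains edges
    """
    n, d = q.shape
    # 1. Attention scores: mean attention each token receives (softmax rows).
    attn = (q @ k.T) / np.sqrt(d)
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    scores = attn.mean(axis=0)
    # 2. Select the top keep_ratio fraction of tokens by score.
    n_keep = max(1, int(n * keep_ratio))
    top_idx = np.argsort(scores)[-n_keep:]
    # 3. Union with edge-token indices, keeping original token order.
    keep = np.union1d(top_idx, np.flatnonzero(edge_mask))
    return patches[keep], keep
```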

Step Skipping Implementation

Step skipping is implemented in OpenVLAForActionPrediction.predict_action():

```python
# 1. Check motion pattern
if is_planar_movement(recent_actions):
    # Use linear regression prediction
    action = fit_next_action(action_history)
else:
    # Run full VLA inference
    action = vla.generate(...)
```
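The ridge-regression predictor can be sketched in closed form with NumPy. This is illustrative only: the window size, regularization strength, and the choice of timestep index as the sole regressor are assumptions.

```python
import numpy as np

def fit_next_action(action_history, alpha=1e-3):
    """Predict the next action by ridge-regressing each action dimension
    on the timestep index over a sliding history window.

    action_history : (T, D) array-like of recent actions
    alpha          : ridge regularization strength (assumed value)
    """
    hist = np.asarray(action_history, dtype=float)
    t = np.arange(len(hist), dtype=float)
    # Design matrix with a bias term: columns [t, 1].
    X = np.stack([t, np.ones_like(t)], axis=1)
    # Closed-form ridge solution: (X^T X + alpha I)^-1 X^T y
    w = np.linalg.solve(X.T @ X + alpha * np.eye(2), X.T @ hist)
    # Evaluate the fitted line at the next timestep.
    return np.array([len(hist), 1.0]) @ w
```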

📝 Citation

If you use this code, please cite our paper:

@article{li2025sp,
  title={SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration},
  author={Li, Ye and Meng, Yuan and Sun, Zewen and Ji, Kangye and Tang, Chen and Fan, Jiajun and Ma, Xinzhu and Xia, Shutao and Wang, Zhi and Zhu, Wenwu},
  journal={arXiv preprint arXiv:2506.12723},
  year={2025}
}

🙏 Acknowledgments

This project is built upon the following excellent works:

  • OpenVLA - Original OpenVLA implementation
  • LIBERO - Robotic learning benchmark

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Note: This implementation focuses on inference acceleration. For training-related functionality, please refer to the original OpenVLA repository.
