OpenHelix-Team/VLA-2
VLA²: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation


📄 Paper & Resources

📣 News

  • 10.27.25: Initial upload.
  • 11.03.25: Updated the Deployment section.

πŸ“ Project Structure

```
VLA-2/
├── experiments/                  # Main experimental code
│   ├── robot/                    # Core VLA-2 implementation
│   │   ├── openvla_utils.py      # OpenVLA utility functions
│   │   ├── robot_utils.py        # Robot interaction utilities
│   │   └── libero_run/           # Main scripts for the LIBERO environment
│   │       ├── main_agent_clean.py        # 🎯 Main execution script; client that requests the vision_planner_service
│   │       ├── vision_planner_service.py  # Vision & planning service
│   │       ├── qwenvl.py                  # Verification module wrapper
│   │       ├── libero_utils.py            # LIBERO environment utilities
│   │       ├── regenerate_libero_dataset.py  # Dataset regeneration
│   │       ├── mps_start.sh               # Multi-process service start
│   │       └── mps_stop.sh                # Multi-process service stop
│   └── val_zsh/                  # Validation shell scripts
│       ├── 0.sh, 10.sh           # Test scenarios 0 and 10
│       ├── goal.sh, goal_new.sh  # Goal-based evaluations
│       ├── objects.sh            # Object manipulation tests
│       ├── orange.sh             # Specific object tests
│       └── spatial.sh            # Spatial reasoning tests
├── script/                       # Tool and utility scripts
│   ├── __init__.py               # Package initialization
│   ├── auto_DL.py                # Automatic searching utilities
│   ├── color.json                # Color configuration
│   ├── Judge_simple.py           # Simple judgment module
│   ├── mmgdino.py                # MM-GroundingDINO integration (vision and language understanding)
│   ├── mmgdino_simple.py         # Simplified MM-GroundingDINO
│   ├── qwenvl_meg.py             # QwenVL model enhancement
│   ├── SAM2_1.py                 # Segment Anything Model 2.1
│   ├── SAPdivision.py            # SAP (Sub-Action Planning) division
│   ├── segvideo.py               # Video segmentation
│   ├── segvideo_simple.py        # Simplified video segmentation
│   ├── Wholebody.py              # Media helper function
│   └── test_images/              # Test images and configurations
│       ├── info.json             # Image metadata
│       ├── replacetest.py        # Replacement testing
│       ├── smoke_results.json    # Smoke test results
│       └── test.py               # Test runner
├── prismatic/                    # OpenVLA codebase (original)
└── vla-scripts/                  # Model testing
    ├── deploy.py                 # Model deployment script
    ├── finetune.py               # Fine-tuning script
    ├── train.py                  # Training script
    └── extern/                   # External conversion utilities
        ├── convert_openvla_weights_to_hf.py  # Weight conversion
        ├── test_openvla.py                   # OpenVLA testing
        └── verify_openvla.py                 # OpenVLA verification
```

🔧 Core Components

🎯 Main Execution (libero_run/)

  • main_agent_clean.py: Main execution script containing all tool-module calls and the agent logic
  • vision_planner_service.py: Service server for the planner, vision, and language modules. Because of library version conflicts, the execution and verification modules run in a separate process and communicate with the main process over a socket. See the paper for module naming and details.
  • qwenvl.py: Wrapper function for the verification module
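The socket split described above can be sketched as follows. This toy exchange (JSON lines over TCP, with invented function and message names) only illustrates the client/server pattern, not the repo's actual protocol:

```python
import json
import socket
import threading

def serve_once(host="127.0.0.1"):
    """Toy stand-in for vision_planner_service.py: answer one JSON request."""
    srv = socket.create_server((host, 0))  # bind an ephemeral port
    port = srv.getsockname()[1]

    def handle():
        conn, _ = srv.accept()
        with conn:
            request = json.loads(conn.makefile("r").readline())
            # Invented reply shape: a list of sub-actions for the target object.
            reply = {"plan": [f"locate {request['object']}",
                              f"grasp {request['object']}"]}
            conn.sendall((json.dumps(reply) + "\n").encode())
        srv.close()

    threading.Thread(target=handle, daemon=True).start()
    return port

def request_plan(port, obj):
    """Client side, roughly how the main agent process would query the service."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall((json.dumps({"object": obj}) + "\n").encode())
        return json.loads(sock.makefile("r").readline())
```

Running both sides in separate processes (rather than a thread) is what lets each keep its own incompatible library versions.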

πŸ› οΈ Tool Scripts (script/)

  • Computer Vision: SAM2_1.py, segvideo.py, mmgdino.py - Advanced vision processing
  • Language Models: qwenvl_meg.py, Judge_simple.py - Language understanding and judgment
  • Planning: SAPdivision.py - Sub-action planning and task decomposition
  • Utilities: auto_DL.py, Wholebody.py - Automation and analysis tools
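As a toy illustration of the sub-action planning idea behind SAPdivision.py (the function name and the sub-action vocabulary here are invented, not the repo's actual logic):

```python
def divide_task(target, destination):
    """Expand a pick-and-place goal into an ordered list of
    primitive sub-actions (hypothetical decomposition)."""
    return [
        f"locate {target}",
        f"grasp {target}",
        f"move to {destination}",
        f"release {target}",
    ]
```

Each sub-action would then be executed and verified in turn by the agent loop.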

πŸ—οΈ Architecture (prismatic/)

The remaining code in the experiments folder is based on the OpenVLA codebase:

  • Backbone Models: Support for various LLM and vision architectures
  • VLA Integration: Specialized vision-language-action model implementations
  • Training Infrastructure: Distributed training with DDP/FSDP support
  • Data Processing: RLDS dataset integration and preprocessing

📊 Evaluation Scripts (val_zsh/)

  • Comprehensive test scenarios covering different aspects of robot manipulation
  • Goal-oriented tasks, object manipulation, and spatial reasoning evaluations

🚀 Installation & Deployment

Overview

This project uses a dual conda environment setup to avoid library version conflicts, particularly around transformers. Use OpenVLA's recommended configuration for the main (client) environment and our specified requirements for the server environment.
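Because the two environments hold conflicting dependencies, each process must be started inside its own environment. A minimal sketch using `conda run` (the environment names follow the yml files below; in practice the repo's mps_start.sh handles the service startup):

```python
def conda_command(env_name, script, *args):
    """Build a `conda run` invocation that executes `script` inside the
    named conda environment, so each process gets its own dependencies."""
    return ["conda", "run", "-n", env_name, "python", script, *args]

# Start the server first, then the client, e.g. with subprocess.Popen:
#   subprocess.Popen(conda_command("server", "vision_planner_service.py"))
#   subprocess.Popen(conda_command("client", "main_agent_clean.py"))
```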

Prerequisites

  • Anaconda/Miniconda: Latest version
  • Git: For repository cloning
  • NVIDIA Driver: 550.54.14+
  • CUDA: Compatible with PyTorch 2.2/2.3

Installation Steps

Step 1: Client Environment Setup

```shell
# Create and activate client environment
conda env create -f client.yml
conda activate client

# Install video segmentation library
git clone https://github.com/hkchengrex/Cutie
cd Cutie && pip install -e .
cd ..

# Install robot learning benchmark
git clone https://github.com/zhangjiaxuan-Xuan/LIBERO_ZERO
# Optional: cd LIBERO_ZERO && pip install -e .
# Recommended: import LIBERO_ZERO by absolute path

# Install OpenVLA dependencies
pip install dlimp@git+https://github.com/moojink/dlimp_openvla
pip install thinplate@git+https://github.com/cheind/py-thin-plate-spline

# Optional: install Flash Attention for performance
pip install flash-attn==2.5.5
```
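The recommended "import LIBERO_ZERO by absolute path" step can be sketched as follows; the clone location is a hypothetical example, so point it at your own checkout:

```python
import sys
from pathlib import Path

# Hypothetical clone location -- edit to match where you cloned LIBERO_ZERO.
LIBERO_ROOT = Path.home() / "LIBERO_ZERO"

def add_to_path(root):
    """Prepend the checkout to sys.path so its packages resolve on import."""
    root = str(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return sys.path[0]

# Call add_to_path(LIBERO_ROOT) before importing anything from LIBERO_ZERO.
```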

Step 2: Server Environment Setup

```shell
# Create and activate server environment
conda env create -f server.yml
conda activate server

# Install bulk image downloader
pip install git+https://github.com/ostrolucky/Bulk-Bing-Image-downloader

# Install latest transformers (includes tokenizers)
pip install git+https://github.com/huggingface/transformers.git

# Optional: install Flash Attention for performance
pip install flash-attn==2.6.1
```

Step 3: Model Configuration

  1. Download the required model weights to local storage
  2. Update the model paths in the files under experiments/ and script/ as needed
  3. Use the validation scripts in the val_zsh/ folder for initial testing
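A small pre-flight check can catch path mistakes before a run. The dictionary entries below are hypothetical examples, not paths the repo actually expects:

```python
from pathlib import Path

# Hypothetical weight locations -- edit to match your local storage.
MODEL_PATHS = {
    "openvla": "/data/models/openvla-7b",
    "qwenvl": "/data/models/qwen-vl",
}

def missing_models(paths):
    """Return the names whose configured directory does not exist yet."""
    return [name for name, path in paths.items() if not Path(path).is_dir()]

# Run before launching: an empty list means every path is in place.
```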

Quick Start

Enter the val_zsh directory and run a test script, e.g.:

```shell
cd val_zsh
zsh 0.sh
```

Citation

If you find this project useful in your research, please consider citing:

```bibtex
@misc{zhaozhang2025vla2,
  title={VLA²: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation},
  author={Han Zhao and Jiaxuan Zhang and Wenxuan Song and Pengxiang Ding and Donglin Wang},
  eprint={2510.14902},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  year={2025}
}
```

πŸŽ–οΈ References

🔧 TODO

  • Updating; new features coming soon.
