DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving

📜 [arXiv] 🤗 [Model Weights]

Yingyan Li*, Shuyao Shang*, Weisong Liu*, Bing Zhan*, Haochen Wang*, Yuqi Wang, Yuntao Chen, Xiaoman Wang, Yasong An, Chufeng Tang, Lu Hou, Lue Fan†, Zhaoxiang Zhang†

This paper presents DriveVLA-W0, a training paradigm that uses world modeling to predict future images. This prediction task provides dense, self-supervised signals that compel the model to learn the underlying dynamics of the driving environment, addressing the "supervision deficit" in VLA models and amplifying the data scaling law.

DriveVLA-W0

Due to company policy, only the reviewed part of our codebase is available. Please contact us if you have any questions.

📋 Project Structure

DriveVLA-W0/
├── assets/                    # Project assets (images, docs, etc.)
├── configs/                   # Model configuration files and normalization stats
│   ├── fast/                       # Fast action tokenizer configs
│   ├── normalizer_navsim_test/     # NAVSIM test set normalization config
│   ├── normalizer_navsim_trainval/ # NAVSIM train/val normalization config
│   └── normalizer_nuplan/          # NuPlan dataset normalization config
├── data/                      # Data pipelines and config
│   ├── navsim/               # NAVSIM-related data
│   └── others/               # Other datasets
├── inference/                 # Inference scripts
│   ├── navsim/               # NAVSIM PDMS evaluation
│   ├── qwen/                 # Qwen model inference
│   └── vla/                  # Emu model inference
├── models/                    # Model definitions
│   ├── policy_head/          # Policy head implementations
│   └── tokenizer/            # Tokenizer implementations
├── scripts/                   # Training and deployment scripts
├── tools/                     # Utility scripts
│   ├── action_tokenizer/     # Action tokenizer tools
│   └── pickle_gen/           # Data preprocessing & pickle generation
├── utils/                     # Utility code
│   └── datasets.py           # Dataset definitions
└── requirements.txt          # Python dependencies

🚀 Quick Start

5-Minute Example

  1. Download Pretrained Models
pip install huggingface_hub
export HF_ENDPOINT=https://hf-mirror.com
mkdir pretrained_models
bash scripts/misc/download_emu3_pretrain.sh
  2. Set Up Environment
conda create -n drivevla python=3.10
conda activate drivevla
pip install -r requirements.txt
  3. Download Model Weights: download Emu3_Flow_Matching_Action_Expert_PDMS_87.2 and navsim_emu_vla_256_144_test_pre_1s.pkl from Hugging Face (for example, with the huggingface-cli sketch below).
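One way to fetch them is with the huggingface-cli bundled with huggingface_hub; the repository ID below is a placeholder, so substitute the one linked under [Model Weights] above.

# Placeholder repo ID -- replace <org>/<repo> with the DriveVLA-W0 weights repository on Hugging Face
huggingface-cli download <org>/<repo> --local-dir pretrained_models/DriveVLA-W0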

  4. Run Inference

# Run inference using pretrained model (update paths as needed)
bash inference/vla/infer_navsim_flow_matching_PDMS_87.2.sh

📊 Data Preparation

NAVSIM Dataset

DriveVLA-W0 uses the NAVSIM (v1.1) dataset for training and evaluation. Steps required:

  1. Obtain NAVSIM Dataset

    • Visit the official NAVSIM repo
    • Download the train and test data splits
    • The data includes sensor information, scenario metadata, and labels
  2. Data Preprocessing

    # Generate NAVSIM pickle files
    python tools/pickle_gen/pickle_generation_navsim_pre_1s.py
    
    # Extract VQ indices with the Emu3 tokenizer
    bash scripts/tokenizer/extract_vq_emu3_navsim.sh
  3. Data Format

    • Preprocessed data is saved in data/navsim/processed_data/
    • Contains scenario files, metadata, and extracted features

Dataset Size

  • Training: ~100,000 driving frames
  • Validation: ~10,000 frames
  • Test: NAVSIM test set

💻 Hardware Requirements

Training Resource Consumption

Training takes roughly 16 hours on 8x L20 GPUs (40GB memory each).

Install

CUDA Installation

If your system does not already have CUDA 12.4 or newer, install it first (the example below installs CUDA 12.8.1):

# Download CUDA 12.8.1 (recommended version)
wget https://developer.download.nvidia.com/compute/cuda/12.8.1/local_installers/cuda_12.8.1_570.124.06_linux.run

# Install CUDA toolkit
bash cuda_12.8.1_570.124.06_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-12.8

# Add to your ~/.bashrc or shell profile
export CUDA_HOME=/usr/local/cuda-12.8
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
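A quick sanity check that the toolkit is on your PATH and the driver is recent enough:

# Verify the CUDA toolkit and driver are visible
nvcc --version    # should report release 12.x
nvidia-smi        # should list your GPUs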

Conda Environment Setup

# Create Conda environment
conda create -n drivevla python=3.10
conda activate drivevla

# Install PyTorch (CUDA 12.4)
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# Install core dependencies
pip install -r requirements.txt
pip install "transformers[torch]"

# Install training-related dependencies
pip install deepspeed          # Distributed training
pip install scipy              # Scientific computing
pip install tensorboard==2.14.0  # Visualization
pip install wandb              # Experiment tracking
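After installation, a one-liner can confirm that PyTorch was built against the expected CUDA version and can see the GPUs:

# Check the PyTorch build and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"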

Testing

First, download the model checkpoints from Hugging Face.

Then, run the following testing script to produce the output actions (as JSON files):

bash inference/vla/infer_navsim_flow_matching_PDMS_87.2.sh

Finally, run the script below to compute the PDMS metrics from the generated JSONs (this requires the same conda environment and a local NAVSIM devkit installation):

bash inference/vla/eval_navsim_metric_from_json.sh
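If the NAVSIM devkit is not installed yet, a minimal sketch along the following lines should work; the environment variable names below are assumptions based on the NAVSIM devkit documentation, so check its README for the exact data-layout requirements.

# Minimal sketch: install the NAVSIM devkit into the same conda environment
git clone https://github.com/autonomousvision/navsim.git
pip install -e ./navsim

# Point the devkit at your data (variable names as assumed from the NAVSIM docs; adjust paths)
export NAVSIM_DEVKIT_ROOT=$PWD/navsim
export OPENSCENE_DATA_ROOT=/path/to/openscene
export NUPLAN_MAPS_ROOT=/path/to/nuplan/maps
export NAVSIM_EXP_ROOT=/path/to/exp_output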

βš™οΈ Configuration Overview

Configuration Files

The project uses JSON-formatted configuration files located in configs/:

configs/
├── moe_fast_video.json          # MoE model fast inference config
├── moe_fast_video_pretrain.json # MoE model pretraining config
├── normalizer_navsim_test/      # NAVSIM test set normalization parameters
├── normalizer_navsim_trainval/  # NAVSIM train+val normalization parameters
└── normalizer_nuplan/           # NuPlan normalization parameters

Normalization Statistics

Normalization parameters are automatically computed from the training datasets:

  • normalizer_navsim_trainval/ – computed on the NAVSIM training set
  • normalizer_navsim_test/ – computed on the NAVSIM test set
  • normalizer_nuplan/ – computed on the NuPlan dataset

πŸ† NAVSIM v1/v2 Benchmark SOTA

Here is a comparison with state-of-the-art methods on the NAVSIM test set, as presented in the paper. Our model, DriveVLA-W0, establishes a new state-of-the-art.

| Method | Reference | Sensors | NC ↑ | DAC ↑ | TTC ↑ | C. ↑ | EP ↑ | PDMS ↑ |
|---|---|---|---|---|---|---|---|---|
| Human | – | – | 100.0 | 100.0 | 100.0 | 99.9 | 87.5 | 94.8 |
| *BEV-based Methods* | | | | | | | | |
| LAW | ICLR'25 | 1x Cam | 96.4 | 95.4 | 88.7 | 99.9 | 81.7 | 84.6 |
| Hydra-MDP | arXiv'24 | 3x Cam + L | 98.3 | 96.0 | 94.6 | 100.0 | 78.7 | 86.5 |
| DiffusionDrive | CVPR'25 | 3x Cam + L | 98.2 | 96.2 | 94.7 | 100.0 | 82.2 | 88.1 |
| WoTE | ICCV'25 | 3x Cam + L | 98.5 | 96.8 | 94.4 | 99.9 | 81.9 | 88.3 |
| *VLA-based Methods* | | | | | | | | |
| AutoVLA | NeurIPS'25 | 3x Cam | 98.4 | 95.6 | 98.0 | 99.9 | 81.9 | 89.1 |
| ReCogDrive | arXiv'25 | 3x Cam | 98.2 | 97.8 | 95.2 | 99.8 | 83.5 | 89.6 |
| DriveVLA-W0* | Ours | 1x Cam | 98.7 | 99.1 | 95.3 | 99.3 | 83.3 | 90.2 |
| AutoVLA† | NeurIPS'25 | 3x Cam | 99.1 | 97.1 | 97.1 | 100.0 | 87.6 | 92.1 |
| DriveVLA-W0† | Ours | 1x Cam | 99.3 | 97.4 | 97.0 | 99.9 | 88.3 | 93.0 |

⭐ Star

If you find our work useful for your research, please consider giving this repository a star ⭐.

📜 Citation

If you find this work useful for your research, please consider citing our paper:

@article{li2025drivevla,
  title={DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving},
  author={Li, Yingyan and Shang, Shuyao and Liu, Weisong and Zhan, Bing and Wang, Haochen and Wang, Yuqi and Chen, Yuntao and Wang, Xiaoman and An, Yasong and Tang, Chufeng and others},
  journal={arXiv preprint arXiv:2510.12796},
  year={2025}
}

Acknowledgements

We would like to acknowledge the following related works:

LAW (ICLR 2025): Using latent world models for self-supervised feature learning in end-to-end autonomous driving.

WoTE (ICCV 2025): Using BEV world models for online trajectory evaluation in end-to-end autonomous driving.

UniVLA: World modeling in the broader field of robotics.
