DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving

Yang Zhou*, Hao Shao*, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander (* denotes equal contribution)

[Pipeline overview figure]

Updates

  • [03/2026] Evaluation code released.
  • [01/2026] DrivingGen is accepted to ICLR 2026.
  • [01/2026] We release our paper on arXiv and our dataset on Hugging Face.

Overview

DrivingGen is the first comprehensive benchmark for generative driving world models. It combines a diverse evaluation dataset curated from both driving datasets and internet-scale video sources, spanning varied weather, time of day, geographic regions, and complex maneuvers. DrivingGen evaluates models from both a visual perspective (the realism and overall quality of generated videos) and a robotics perspective (the physical plausibility, consistency, and accuracy of generated trajectories).

Setup Instructions

1. Clone the Repository

git clone https://github.com/youngzhou1999/DrivingGen.git
cd DrivingGen

2. Environment Setup

conda create -n drivinggen python=3.10
conda activate drivinggen

3. Install Dependencies

We recommend using the provided environment.yml for a full environment setup:

conda env create -f environment.yml
conda activate drivinggen

Alternatively, if you prefer installing into an existing environment:

pip install -r requirements.txt

Then install third-party packages:

cd third_parties/UniDepth && pip install -e . && cd ../..
cd third_parties/yolov10 && pip install -e . && cd ../..
cd third_parties/samurai && pip install -e . && cd ../..

4. Download Dataset

Download the DrivingGen dataset from Hugging Face. First, update your Hugging Face token in drivinggen/down_dataset.py, then run:

bash scripts/0-down_data.sh

The dataset will be downloaded to ./data/.

Video Generation

This section guides you through generating videos for evaluation using your own world generation model. We provide an example using Wan2.2-14B (Image-to-Video).

1. Run Inference

Configure and run video generation:

bash scripts/1-example_infer_model.sh

Key parameters in the script:

video_path=data/ego_condition.json   # Input metadata
out_dir=cache/infer_results          # Output directory
split=ego_condition                  # Data split (ego_condition / open_domain)
model=wan2.2-14b                     # Model name
exp_id=default_prompt                # Experiment ID

The generated videos (101 frames at 10 fps, 576x1024 resolution) will be saved as both MP4 videos and individual PNG frames.
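The downstream trajectory-extraction steps assume every clip matches this format, so it can be worth validating generated clips before proceeding. A minimal sanity check against the stated spec (illustrative only, not part of the repo):

```python
# Illustrative sanity check (not part of the repo) against the stated output spec:
# 101 frames at 10 fps, 576x1024 (height x width).
def check_clip(num_frames: int, fps: float, height: int, width: int) -> float:
    """Return the clip duration in seconds if the clip matches the spec."""
    if num_frames != 101 or fps != 10:
        raise ValueError("expected 101 frames at 10 fps")
    if (height, width) != (576, 1024):
        raise ValueError("expected 576x1024 resolution")
    return num_frames / fps  # 10.1 s per clip
```
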

2. Extract Ego Trajectory

Extract ego vehicle trajectory from generated videos using UniDepthV2 and Visual SLAM:

bash scripts/2-get_ego_traj.sh

3. Extract Agent Trajectories

Extract agent trajectories using YOLOv10 detection and depth estimation:

bash scripts/3-get_agent_traj.sh

Evaluation

DrivingGen evaluates generated videos using comprehensive video metrics and trajectory metrics.

Video Metrics

Evaluates visual quality and temporal coherence of generated videos:

  • Distribution: FVD (Fréchet Video Distance)
  • Objective Quality: IEEE P2020 automotive imaging metrics (sharpness, exposure, contrast, color, noise, artifacts, texture, temporal)
  • Subjective Quality: CLIP-IQA+ based assessment
  • Scene Consistency: DINOv3 feature-based consistency
  • Agent Consistency: agent appearance consistency and missing detection
  • Perceptual: LPIPS, SSIM
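As one concrete example of these metrics, SSIM compares luminance, contrast, and structure statistics between frames. The sketch below is a simplified single-window NumPy version (real SSIM, as used in practice, averages over local sliding windows; this global variant is illustrative only):

```python
import numpy as np

# Simplified global SSIM between two grayscale frames with values in [0, 1].
# Illustrative only: standard SSIM computes these statistics per local window.
def ssim_global(x: np.ndarray, y: np.ndarray,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> float:
    mx, my = x.mean(), y.mean()          # luminance terms
    vx, vy = x.var(), y.var()            # contrast terms
    cov = ((x - mx) * (y - my)).mean()   # structure term
    return float(((2 * mx * my + c1) * (2 * cov + c2))
                 / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```

Identical frames score 1.0; the score drops toward 0 as the frames diverge.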

Run video evaluation:

bash scripts/4-get_video_metrics.sh

Trajectory Metrics

Evaluates the physical plausibility and accuracy of generated trajectories:

  • Distribution: FTD (Fréchet Trajectory Distance) via Motion Transformer encoder
  • Alignment: ADE, FDE, Success Rate, Hausdorff Distance, DTW
  • Quality: Comfort Score (jerk, acceleration, yaw rate), Curvature RMS, Speed Score
  • Consistency: Velocity Consistency, Acceleration Consistency
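ADE averages the per-timestep displacement between the generated and ground-truth trajectory, while FDE measures only the final-point displacement. A minimal NumPy sketch (the (T, 2) x/y array layout is an assumption, not necessarily the repo's format):

```python
import numpy as np

# Illustrative ADE/FDE (not the repo's exact implementation) between a generated
# and a ground-truth trajectory, each an (T, 2) array of x/y positions.
def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    dists = np.linalg.norm(pred - gt, axis=-1)    # per-timestep displacement
    return float(dists.mean()), float(dists[-1])  # ADE, FDE
```
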

Run trajectory evaluation:

bash scripts/5-get_traj_metrics.sh

Results will be saved to cache/eval_logs/.
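Among the quality metrics, the Comfort Score penalizes high jerk, the third time-derivative of position. An illustrative finite-difference estimate in NumPy (not the repo's exact scoring), assuming 2D positions sampled at the 10 fps used for generation:

```python
import numpy as np

# Illustrative jerk estimate (not the repo's exact Comfort Score): third-order
# finite difference of (T, 2) x/y positions sampled at a fixed frame rate.
def mean_abs_jerk(positions: np.ndarray, fps: float = 10.0) -> float:
    dt = 1.0 / fps
    jerk = np.diff(positions, n=3, axis=0) / dt ** 3  # (T-3, 2) jerk vectors
    return float(np.linalg.norm(jerk, axis=-1).mean())
```

A constant-velocity trajectory has zero jerk, so it is maximally comfortable under this measure.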

Benchmarked Models

DrivingGen benchmarks 14 state-of-the-art models across three categories:

  • General Video World Models: Gen-3, Kling, CogVideoX, Wan, HunyuanVideo, LTX-Video, SkyReels
  • Physical World Models: Cosmos-Predict1, Cosmos-Predict2
  • Driving-Specific World Models: Vista, DrivingDojo, GEM, VaViM, UniFuture

Citation

If you find our research useful, please cite us as:

@misc{zhou2026drivinggencomprehensivebenchmarkgenerative,
      title={DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving},
      author={Yang Zhou and Hao Shao and Letian Wang and Zhuofan Zong and Hongsheng Li and Steven L. Waslander},
      year={2026},
      eprint={2601.01528},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.01528},
}

License

All code in this repository is released under the Apache License 2.0.
