Yang Zhou*, Hao Shao*, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander ("*" denotes equal contribution)
- [03/2026] Evaluation code released.
- [01/2026] DrivingGen is accepted to ICLR 2026.
- [01/2026] We release our paper on arXiv and our dataset on Hugging Face.
DrivingGen is the first comprehensive benchmark for generative driving world models. It provides a diverse evaluation dataset curated from both existing driving datasets and internet-scale video sources, spanning varied weather conditions, times of day, geographic regions, and complex maneuvers. DrivingGen evaluates models from both a visual perspective (the realism and overall quality of generated videos) and a robotics perspective (the physical plausibility, consistency, and accuracy of generated trajectories).
```bash
git clone https://github.com/youngzhou1999/DrivingGen.git
cd DrivingGen
conda create -n drivinggen python=3.10
conda activate drivinggen
```

We recommend using the provided environment.yml for a full environment setup:

```bash
conda env create -f environment.yml
conda activate drivinggen
```

Alternatively, if you prefer installing into an existing environment:

```bash
pip install -r requirements.txt
```

Then install the third-party packages:

```bash
cd third_parties/UniDepth && pip install -e . && cd ../..
cd third_parties/yolov10 && pip install -e . && cd ../..
cd third_parties/samurai && pip install -e . && cd ../..
```

Download the DrivingGen dataset from Hugging Face. First, update your Hugging Face token in drivinggen/down_dataset.py, then run:

```bash
bash scripts/0-down_data.sh
```

The dataset will be downloaded to ./data/.
This section guides you through generating videos for evaluation using your own world generation model. We provide an example using Wan2.2-14B (Image-to-Video).
Configure and run video generation:
```bash
bash scripts/1-example_infer_model.sh
```

Key parameters in the script:

```bash
video_path=data/ego_condition.json   # Input metadata
out_dir=cache/infer_results          # Output directory
split=ego_condition                  # Data split (ego_condition / open_domain)
model=wan2.2-14b                     # Model name
exp_id=default_prompt                # Experiment ID
```

The generated videos (101 frames at 10 fps, 576x1024 resolution) will be saved as both MP4 videos and individual PNG frames.
Extract the ego vehicle trajectory from generated videos using UniDepthV2 and visual SLAM:

```bash
bash scripts/2-get_ego_traj.sh
```

Extract agent trajectories using YOLOv10 detection and depth estimation:

```bash
bash scripts/3-get_agent_traj.sh
```

DrivingGen evaluates generated videos using comprehensive video metrics and trajectory metrics.
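The agent-trajectory step above combines 2D detections with per-pixel depth. As a rough illustration (not the repository's actual code), a detection at pixel (u, v) with estimated metric depth d can be lifted into the camera frame with the pinhole model; the intrinsics below are made-up values:

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift a pixel (u, v) with metric depth into a 3D camera-frame
    point using a pinhole intrinsics matrix K."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

# Illustrative intrinsics (fx, fy, cx, cy are made-up values).
K = np.array([[1000.0,    0.0, 512.0],
              [   0.0, 1000.0, 288.0],
              [   0.0,    0.0,   1.0]])

# A detection at the principal point, 20 m away, back-projects
# straight down the optical axis.
p = backproject(512.0, 288.0, 20.0, K)
print(p)  # → [ 0.  0. 20.]
```

Repeating this per frame and chaining the points through the estimated ego poses is one way such per-agent tracks can be assembled.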
Evaluates visual quality and temporal coherence of generated videos:
| Category | Metrics |
|---|---|
| Distribution | FVD (Frechet Video Distance) |
| Objective Quality | IEEE P2020 automotive imaging metrics (sharpness, exposure, contrast, color, noise, artifacts, texture, temporal) |
| Subjective Quality | CLIP-IQA+ based assessment |
| Scene Consistency | DINOv3 feature-based consistency |
| Agent Consistency | Agent appearance consistency and missing detection |
| Perceptual | LPIPS, SSIM |
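For intuition on the Distribution row: FVD fits a Gaussian to feature embeddings of real and of generated videos and measures the Fréchet distance between the two Gaussians. A minimal numpy sketch of that distance, using random vectors as stand-ins for the video features a real implementation would extract:

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2*sqrtm(Ca @ Cb))."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    ca = np.cov(feats_a, rowvar=False)
    cb = np.cov(feats_b, rowvar=False)
    # For PSD Ca, Cb the eigenvalues of Ca @ Cb are real and
    # non-negative, so Tr(sqrtm(Ca @ Cb)) = sum of their square roots.
    eigvals = np.linalg.eigvals(ca @ cb)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(ca) + np.trace(cb) - 2.0 * covmean_trace

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
print(frechet_distance(x, x))  # identical feature sets → ~0
```

The benchmark's FVD uses learned video features rather than raw vectors, but the distance computation follows this same form.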
Run video evaluation:
```bash
bash scripts/4-get_video_metrics.sh
```

Evaluates the physical plausibility and accuracy of generated trajectories:
| Category | Metrics |
|---|---|
| Distribution | FTD (Frechet Trajectory Distance) via Motion Transformer encoder |
| Alignment | ADE, FDE, Success Rate, Hausdorff Distance, DTW |
| Quality | Comfort Score (jerk, acceleration, yaw rate), Curvature RMS, Speed Score |
| Consistency | Velocity Consistency, Acceleration Consistency |
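As a minimal sketch of two of the metrics above, assuming trajectories are (T, 2) arrays of x/y positions sampled at a fixed timestep (the benchmark's own implementations may differ):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average / Final Displacement Error between (T, 2) trajectories."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return dists.mean(), dists[-1]

def jerk(traj, dt):
    """Per-step jerk magnitude (third finite difference of position),
    the kind of quantity a comfort score penalizes."""
    return np.linalg.norm(np.diff(traj, n=3, axis=0) / dt**3, axis=1)

# Straight-line ground truth vs. a prediction offset by 1 m laterally.
t = np.arange(10, dtype=float)
gt = np.stack([t, np.zeros_like(t)], axis=1)
pred = gt + np.array([0.0, 1.0])

ade, fde = ade_fde(pred, gt)
print(ade, fde)  # → 1.0 1.0

# Constant-velocity motion has zero jerk.
print(jerk(gt, dt=0.1).max())  # → 0.0
```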
Run trajectory evaluation:
```bash
bash scripts/5-get_traj_metrics.sh
```

Results will be saved to cache/eval_logs/.
DrivingGen benchmarks 14 state-of-the-art models across three categories:
| Category | Models |
|---|---|
| General Video World Models | Gen-3, Kling, CogVideoX, Wan, HunyuanVideo, LTX-Video, SkyReels |
| Physical World Models | Cosmos-Predict1, Cosmos-Predict2 |
| Driving-Specific World Models | Vista, DrivingDojo, GEM, VaViM, UniFuture |
If you find our research useful, please cite us as:
```bibtex
@misc{zhou2026drivinggencomprehensivebenchmarkgenerative,
      title={DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving},
      author={Yang Zhou and Hao Shao and Letian Wang and Zhuofan Zong and Hongsheng Li and Steven L. Waslander},
      year={2026},
      eprint={2601.01528},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.01528},
}
```

All code within this repository is under the Apache License 2.0.
