Yang Zhou*, Hao Shao*, Letian Wang, Zhuofan Zong, Hongsheng Li, Steven L. Waslander ("*" denotes equal contribution)
- [03/2026] Evaluation code released.
- [01/2026] DrivingGen is accepted to ICLR 2026.
- [01/2026] We release our paper on arXiv and our dataset on Hugging Face.
DrivingGen is the first comprehensive benchmark for generative driving world models. It provides a diverse evaluation dataset curated from both existing driving datasets and internet-scale video sources, spanning varied weather conditions, times of day, geographic regions, and complex maneuvers. DrivingGen evaluates models from both a visual perspective (the realism and overall quality of generated videos) and a robotics perspective (the physical plausibility, consistency, and accuracy of generated trajectories).
```bash
git clone https://github.com/youngzhou1999/DrivingGen.git
cd DrivingGen
conda create -n drivinggen python=3.10
conda activate drivinggen
```

We recommend using the provided environment.yml for a full environment setup:

```bash
conda env create -f environment.yml
conda activate drivinggen
```

Alternatively, if you prefer installing into an existing environment:

```bash
pip install -r requirements.txt
```

Then install the third-party packages:

```bash
cd third_parties/UniDepth && pip install -e . && cd ../..
cd third_parties/yolov10 && pip install -e . && cd ../..
cd third_parties/samurai && pip install -e . && cd ../..
```

Download the DrivingGen dataset from Hugging Face. First, update your Hugging Face token in drivinggen/down_dataset.py, then run:

```bash
bash scripts/0-down_data.sh
```

The dataset will be downloaded to ./data/.
This section guides you through generating videos for evaluation using your own world generation model. We provide an example using Wan2.2-14B (Image-to-Video).
Configure and run video generation:
```bash
bash scripts/1-example_infer_model.sh
```

Key parameters in the script:

```bash
video_path=data/ego_condition.json   # Input metadata
out_dir=cache/infer_results          # Output directory
split=ego_condition                  # Data split (ego_condition / open_domain)
model=wan2.2-14b                     # Model name
exp_id=default_prompt                # Experiment ID
```

The generated videos (101 frames at 10 fps, 576x1024 resolution) will be saved as both MP4 videos and individual PNG frames.
Extract the ego vehicle trajectory from generated videos using UniDepthV2 and visual SLAM:

```bash
bash scripts/2-get_ego_traj.sh
```

Extract agent trajectories using YOLOv10 detection and depth estimation:

```bash
bash scripts/3-get_agent_traj.sh
```

DrivingGen evaluates generated videos using comprehensive video metrics and trajectory metrics.
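The agent-trajectory step above combines 2D detections with per-pixel depth. As a rough illustration (not the repository's actual code), a detection at pixel (u, v) with estimated metric depth d can be lifted into the camera frame with the pinhole model; the intrinsics below are made-up values:

```python
import numpy as np

def backproject(u, v, depth, K):
    """Lift a pixel (u, v) with metric depth into a 3D camera-frame
    point using a pinhole intrinsics matrix K."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return depth * ray

# Illustrative intrinsics (fx, fy, cx, cy are made-up values).
K = np.array([[1000.0,    0.0, 512.0],
              [   0.0, 1000.0, 288.0],
              [   0.0,    0.0,   1.0]])

# A detection at the principal point, 20 m away, back-projects
# straight down the optical axis.
p = backproject(512.0, 288.0, 20.0, K)
print(p)  # → [ 0.  0. 20.]
```

Repeating this per frame and chaining the points through the estimated ego poses is one way such per-agent tracks can be assembled.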
Evaluates visual quality and temporal coherence of generated videos:
| Category | Metrics |
|---|---|
| Distribution | FVD (Frechet Video Distance) |
| Objective Quality | IEEE P2020 automotive imaging metrics (sharpness, exposure, contrast, color, noise, artifacts, texture, temporal) |
| Subjective Quality | CLIP-IQA+ based assessment |
| Scene Consistency | DINOv3 feature-based consistency |
| Agent Consistency | Agent appearance consistency and missing detection |
| Perceptual | LPIPS, SSIM |
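For intuition on the Distribution row: FVD fits a Gaussian to feature embeddings of real and of generated videos and measures the Fréchet distance between the two Gaussians. A minimal numpy sketch of that distance, using random vectors as stand-ins for the video features a real implementation would extract:

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(Ca + Cb - 2*sqrtm(Ca @ Cb))."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    ca = np.cov(feats_a, rowvar=False)
    cb = np.cov(feats_b, rowvar=False)
    # For PSD Ca, Cb the eigenvalues of Ca @ Cb are real and
    # non-negative, so Tr(sqrtm(Ca @ Cb)) = sum of their square roots.
    eigvals = np.linalg.eigvals(ca @ cb)
    covmean_trace = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return diff @ diff + np.trace(ca) + np.trace(cb) - 2.0 * covmean_trace

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))
print(frechet_distance(x, x))  # identical feature sets → ~0
```

The benchmark's FVD uses learned video features rather than raw vectors, but the distance computation follows this same form.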
Run video evaluation:
```bash
bash scripts/4-get_video_metrics.sh
```

Evaluates the physical plausibility and accuracy of generated trajectories:
| Category | Metrics |
|---|---|
| Distribution | FTD (Frechet Trajectory Distance) via Motion Transformer encoder |
| Alignment | ADE, FDE, Success Rate, Hausdorff Distance, DTW |
| Quality | Comfort Score (jerk, acceleration, yaw rate), Curvature RMS, Speed Score |
| Consistency | Velocity Consistency, Acceleration Consistency |
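As a minimal sketch of two of the metrics above, assuming trajectories are (T, 2) arrays of x/y positions sampled at a fixed timestep (the benchmark's own implementations may differ):

```python
import numpy as np

def ade_fde(pred, gt):
    """Average / Final Displacement Error between (T, 2) trajectories."""
    dists = np.linalg.norm(pred - gt, axis=1)
    return dists.mean(), dists[-1]

def jerk(traj, dt):
    """Per-step jerk magnitude (third finite difference of position),
    the kind of quantity a comfort score penalizes."""
    return np.linalg.norm(np.diff(traj, n=3, axis=0) / dt**3, axis=1)

# Straight-line ground truth vs. a prediction offset by 1 m laterally.
t = np.arange(10, dtype=float)
gt = np.stack([t, np.zeros_like(t)], axis=1)
pred = gt + np.array([0.0, 1.0])

ade, fde = ade_fde(pred, gt)
print(ade, fde)  # → 1.0 1.0

# Constant-velocity motion has zero jerk.
print(jerk(gt, dt=0.1).max())  # → 0.0
```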
Run trajectory evaluation:
```bash
bash scripts/5-get_traj_metrics.sh
```

Results will be saved to cache/eval_logs/.
DrivingGen benchmarks 14 state-of-the-art models across three categories:
| Category | Models |
|---|---|
| General Video World Models | Gen-3, Kling, CogVideoX, Wan, HunyuanVideo, LTX-Video, SkyReels |
| Physical World Models | Cosmos-Predict1, Cosmos-Predict2 |
| Driving-Specific World Models | Vista, DrivingDojo, GEM, VaViM, UniFuture |
If you find our research useful, please cite us as:
```bibtex
@misc{zhou2026drivinggencomprehensivebenchmarkgenerative,
      title={DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving},
      author={Yang Zhou and Hao Shao and Letian Wang and Zhuofan Zong and Hongsheng Li and Steven L. Waslander},
      year={2026},
      eprint={2601.01528},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.01528},
}
```

All code within this repository is under the Apache License 2.0.
