Awesome-GE2EAD

This is the official repository for "Survey of General End-to-End Autonomous Driving: A Unified Perspective".

This project aims to provide a unified roadmap for the field by:

🗂️ Literature Taxonomy: Classifying methods into Conventional (e.g., UniAD), VLM-centric (e.g., DriveLM), and Hybrid (e.g., Senna) approaches.
💾 Dataset Curation: Collecting both Standard and Vision-Language datasets relevant to end-to-end autonomous driving.
📈 Trend Analysis: Outlining main research branches and emerging trends based on our survey.

Citation

If you find this project useful in your research, please consider citing:

@article{yang2025survey,
  title={Survey of General End-to-End Autonomous Driving: A Unified Perspective},
  author={Yang, Yixiang and Han, Chuanrong and Mao, Runhao and others},
  journal={TechRxiv},
  year={2025},
  month={December},
  doi={10.36227/techrxiv.176523315.56439138/v1},
  url={https://doi.org/10.36227/techrxiv.176523315.56439138/v1}
}

📌 Milestones

🚀 2025-12-24: We reorganized the paper list into a new tabular format.
🚀 2025-12-10: The paper “Survey of General End-to-End Autonomous Driving: A Unified Perspective” was released, and this repository was made publicly available.

Mindmap, Top Methods

[Figure: GE2EAD Mindmap]

[Figure: Top Methods]

Papers

Conventional End-to-End Methods

2025

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
FutureX _{FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model}	2025	`World Model` · `Latent CoT`	—	—
Spatial Retrieval AD _{Spatial Retrieval Augmented Autonomous Driving}	2025	`Retrieval` · `Geo Images`		Project
UniMM-V2X _{UniMM-V2X: MoE-Enhanced Multi-Level Fusion for End-to-End Cooperative Autonomous Driving}	2025	`MoE` · `Multi-Agent`		—
UniLION _{UniLION: Towards Unified Autonomous Driving Model with Linear Group RNNs}	2025	`Linear RNN`		—
DiffusionDriveV2 _{DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving}	2025	`Diffusion` · `RL`		—
SimScale _{SimScale: Learning to Drive via Real-World Simulation at Scale}	2025	`Simulation` · `Data Gen`		—
LAP _{LAP: Fast Latent Diffusion Planner with Fine-Grained Feature Distillation for Autonomous Driving}	2025	`Latent Diffusion` · `Planning`		—
GuideFlow _{GuideFlow: Constraint-Guided Flow Matching for Planning in End-to-End Autonomous Driving}	2025	`Generative` · `Flow Matching`		—
DiffRefiner _{DiffRefiner: Coarse to Fine Trajectory Planning via Diffusion Refinement with Semantic Interaction for End to End Autonomous Driving}	2025	`Diffusion` · `Refinement`		—
ResAD _{ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving}	2025	`Trajectory Modeling`		—
SeerDrive _{Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution}	NeurIPS 2025	`World Model` · `Planning`		—
DriveDPO _{DriveDPO: Policy Learning via Safety DPO For End-to-End Autonomous Driving}	2025	`DPO` · `Safety`	—	—
AnchDrive _{AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving}	2025	`Diffusion` · `Anchors`	—	—
AdaThinkDrive _{AdaThinkDrive: Adaptive Thinking via Reinforcement Learning for Autonomous Driving}	2025	`RL` · `CoT`	—	—
VeteranAD _{Perception in Plan: Coupled Perception and Planning for End-to-End Autonomous Driving}	2025	`Perception-Planning`		—
EvaDrive _{Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving}	2025	`RL` · `Adversarial`	—	—
ReconDreamer-RL _{Enhancing Reinforcement Learning via Diffusion-based Scene Reconstruction}	2025	`RL` · `World Model`		—
GMF-Drive _{Gated Mamba Fusion with Spatial-Aware BEV Representation for End-to-End Autonomous Driving}	2025	`Mamba` · `Fusion`	—	—
DistillDrive _{End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model}	2025	`Distillation`		—
GEMINUS _{Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving}	2025	`MoE` · `Adaptive`		—
DiVER _{Breaking Imitation Bottlenecks: Reinforced Diffusion Powers Diverse Trajectory Generation}	2025	`RL` · `Diffusion`	—	—
World4Drive _{End-to-End Autonomous Driving via Intention-aware Physical Latent World Model}	ICCV 2025	`World Model`		—
FocalAD _{Local Motion Planning for End-to-End Autonomous Driving}	2025	`Motion Planning`	—	—
GaussianFusion _{Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving}	2025	`Gaussian Splatting` · `Fusion`		—
CogAD _{Cognitive-Hierarchy Guided End-to-End Autonomous Driving}	2025	`Cognitive` · `Hierarchy`	—	—
DiffE2E _{Rethinking End-to-End Driving with a Hybrid Action Diffusion and Supervised Policy}	2025	`Diffusion` · `Hybrid`	—	Project
TransDiffuser _{End-to-end Trajectory Generation with Decorrelated Multi-modal Representation for Autonomous Driving}	2025	`Diffusion` · `Multimodal`	—	—
MomAD _{Don’t Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving}	CVPR 2025	`Planning` · `Momentum`		—
Consistency _{Predictive Planner for Autonomous Driving with Consistency Models}	2025	`Consistency` · `Planning`	—	—
ARTEMIS _{Autoregressive End-to-End Trajectory Planning with Mixture of Experts for Autonomous Driving}	2025	`MoE` · `Autoregressive`	—	—
TTOG _{Two Tasks, One Goal: Uniting Motion and Planning for Excellent End To End Autonomous Driving Performance}	2025	`Multi-task`	—	—
DiffusionDrive _{Truncated Diffusion Model for End-to-End Autonomous Driving}	CVPR 2025	`Diffusion`		—
WoTE _{End-to-End Driving with Online Trajectory Evaluation via BEV World Model}	2025	`World Model` · `BEV`		—
DMAD _{Divide and Merge: Motion and Semantic Learning in End-to-End Autonomous Driving}	2025	`Multi-task`		—
Centaur _{Robust End-to-End Autonomous Driving with Test-Time Training}	2025	`Test-Time Training`	—	—
Drive in Corridors _{Enhancing the Safety of End-to-end Autonomous Driving via Corridor Learning and Planning}	2025	`Safety` · `Planning`	—	—
BridgeAD _{Bridging Past and Future: End-to-End Autonomous Driving with Historical Prediction and Planning}	CVPR 2025	`Prediction` · `Planning`		—
Hydra-MDP++ _{Advancing End-to-End Driving via Expert-Guided Hydra-Distillation}	2025	`Distillation` · `Multi-head`		—
DiffAD _{A Unified Diffusion Modeling Approach for Autonomous Driving}	2025	`Diffusion`	—	—
GoalFlow _{Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving}	CVPR 2025	`Flow Matching`		—
HiP-AD _{Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder}	ICCV 2025	`Attention` · `Planning`		—
LAW _{Enhancing End-to-End Autonomous Driving with Latent World Model}	ICLR 2025	`World Model`		—
DriveTransformer _{Unified Transformer for Scalable End-to-End Autonomous Driving}	ICLR 2025	`Transformer`		—
UncAD _{Towards Safe End-to-end Autonomous Driving via Online Map Uncertainty}	ICRA 2025	`Uncertainty` · `Map`		—
RAD _{Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning}	2025	`RL` · `3DGS`	—	Project
OAD _{Trajectory Offset Learning: A Framework for Enhanced End-to-End Autonomous Driving}	2025	`Trajectory` · `Offset`		—

2024

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
GaussianAD _{Gaussian-Centric End-to-End Autonomous Driving}	2024	`Gaussian Splatting` · `Perception`		—
MA2T _{Module-wise Adaptive Adversarial Training for End-to-end Autonomous Driving}	2024	`Adversarial` · `Robustness`	—	—
Hint-AD _{Holistically Aligned Interpretability in End-to-End Autonomous Driving}	CoRL 2024	`Interpretability` · `Alignment`		Project
DRAMA _{An Efficient End-to-end Motion Planner for Autonomous Driving with Mamba}	CVPR 2025	`Mamba` · `Motion Planning`		Project
PPAD _{Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving}	ECCV 2024	`Prediction` · `Planning`		—
BEV-Planner _{Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?}	CVPR 2024	`BEV` · `Evaluation`		—
EfficientFuser _{Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving}	2024	`Efficient` · `Fusion`	—	—
UAD _{End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation}	2024	`Unsupervised`	—	—
Hydra-MDP _{End-to-end Multimodal Planning with Multi-target Hydra-Distillation}	2024	`Distillation` · `Multimodal`		—
DualAD _{Disentangling the Dynamic and Static World for End-to-End Driving}	CVPR 2025	`Dual-Stream` · `Dynamic`		—
SparseDrive _{End-to-End Autonomous Driving via Sparse Scene Representation}	2024	`Sparse` · `Scene Rep`		—
GAD _{GAD-Generative Learning for HD Map-Free Autonomous Driving}	2024	`Generative` · `Map-Free`		—
SparseAD _{Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving}	2024	`Sparse` · `Query`	—	—
GenAD _{Generative End-to-End Autonomous Driving}	ECCV 2024	`Generative` · `Prediction`		—
GraphAD _{Interaction Scene Graph for End-to-end Autonomous Driving}	2024	`Graph` · `Interaction`		—
ActiveAD _{Planning-Oriented Active Learning for End-to-End Autonomous Driving}	2024	`Active Learning`	—	—
VADv2 _{End-to-End Vectorized Autonomous Driving via Probabilistic Planning}	2024	`Vectorized` · `Probabilistic`		—

2023

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
DriveAdapter _{Breaking the Coupling Barrier of Perception and Planning in End-to-End Autonomous Driving}	ICCV 2023	`Adapter` · `Decoupling`		—
VAD _{Vectorized Scene Representation for Efficient Autonomous Driving}	ICCV 2023	`Vectorized` · `Efficient`		—
ThinkTwice _{Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving}	CVPR 2023	`Decoder` · `Refinement`		—
ReasonNet _{End-to-End Driving with Temporal and Global Reasoning}	CVPR 2023	`Reasoning` · `Temporal`		—
SuperDriverAI _{Towards Design and Implementation for End-to-End Learning-based Autonomous Driving}	2023	`Attention` · `DNN`	—	—
UniAD _{Planning-oriented Autonomous Driving}	CVPR 2023	`Multi-task` · `Unified`		—
E2E Dense _{End-to-End Learning of Behavioural Inputs for Autonomous Driving in Dense Traffic}	IROS 2023	`Optimization` · `Dense Traffic`	—	—
CRCHFL _{Communication Resources Constrained Hierarchical Federated Learning for End-to-End Autonomous Driving}	IROS 2023	`Federated Learning`	—	—
PPGeo _{Policy pre-training for autonomous driving via self-supervised geometric modeling}	ICLR 2023	`Self-Supervised` · `Geometric`		—

Before 2023

🧠 Method	🗓️ Year / Venue	🏷️ Tags	📄 Paper	💻 GitHub	🌐 Project
MMFN _{Multi-Modal-Fusion-Net for End-to-End Driving}	IROS 2022	`Fusion` · `Multi-Modal`			—
KEMP _{Keyframe-Based Hierarchical End-to-End Deep Model for Long-Term Trajectory Prediction}	ICRA 2022	`Keyframe` · `Hierarchical`		—	—
TCP _{Trajectory-guided Control Prediction for End-to-end Autonomous Driving: A Simple yet Strong Baseline}	NeurIPS 2022	`Trajectory` · `Control`			—
ST-P3 _{End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning}	ECCV 2022	`Spatial-Temporal` · `Interpretable`			—
MP3 _{A Unified Model to Map, Perceive, Predict and Plan}	CVPR 2021	`Mapless` · `Prediction`	Paper	—	—
Multitask _{Multi-task Learning with Attention for End-to-end Autonomous Driving}	CVPR 2021	`Multi-task` · `Attention`			—
Transfuser _{Multi-Modal Fusion Transformer for End-to-End Autonomous Driving}	CVPR 2021	`Transformer` · `Fusion`	Paper		—
NEAT _{Neural Attention Fields for End-to-End Autonomous Driving}	ICCV 2021	`Attention Fields` · `BEV`	Paper		—
Fast-LiDARNet _{Efficient and Robust LiDAR-Based End-to-End Navigation}	ICRA 2021	`LiDAR` · `Efficient`		—	—
IVMP _{Learning Interpretable End-to-End Vision-Based Motion Planning for Autonomous Driving with Optical Flow Distillation}	ICRA 2021	`Interpretable` · `Optical Flow`		—	Project
P3 _{Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations}	ECCV 2020	`Semantic` · `Interpretability`		—	—
DARB _{Exploring data aggregation in policy learning for vision-based urban autonomous driving}	CVPR 2020	`Data Aggregation` · `Policy`	Paper		—
Roach _{End-to-End Urban Driving by Imitating a Reinforcement Learning Coach}	ICCV 2021	`RL` · `Imitation`			—
LBC _{Learning by cheating}	CoRL 2019	`Knowledge Distillation`			—
CIL _{End-to-End driving via conditional imitation learning}	CoRL 2018	`Imitation Learning`			—
Drive in A Day _{Learning to drive in a day}	2018	`RL`			—
CNN E2E _{End to End Learning for Self-Driving Cars}	2016	`CNN` · `Imitation`			—
ALVINN _{An autonomous land vehicle in a neural network}	NeurIPS 1988	`Neural Network`	Paper	—	—

VLM-Centric End-to-End Methods

2025

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
DrivePI _{DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning}	2025	`4D Spatial` · `Occupancy`		—
WAM-Diff _{WAM-Diff: A Masked Diffusion VLA Framework with MoE and Online Reinforcement Learning for Autonomous Driving}	2025	`Masked Diffusion` · `MoE` · `Online RL`		—
SpaceDrive _{SpaceDrive: Infusing Spatial Awareness into VLM-based Autonomous Driving}	2025	`Spatial Encoding`		Project
OpenREAD _{OpenREAD: Reinforced Open-Ended Reasoning for End-to-End Autonomous Driving with LLM-as-Critic}	2025	`RFT/RL` · `LLM-as-Critic`
CoT4AD _{CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning}	2025	`VLA` · `CoT`	—	—
MPA _{Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving}	NeurIPS 2025	`Model-Based` · `Sim`	—	Project
AD-R1 _{AD-R1: Closed-Loop Reinforcement Learning with Impartial World Models}	2025	`RL` · `World Model`	—	—
Alpamayo-R1 _{Alpamayo-R1: Bridging Reasoning and Action Prediction for Generalizable Autonomous Driving}	2025	`VLA` · `Reasoning`		—
DriveVLA-W0 _{DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving}	2025	`VLA` · `World Model`		—
MTRDrive _{MTRDrive: Memory-Tool Synergistic Reasoning for Robust Autonomous Driving}	2025	`VLM` · `Memory`	—	—
ReflectDrive _{Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving}	2025	`Diffusion` · `VLA`	—	—
IRL-VLA _{IRL-VLA: Training an Vision-Language-Action Policy via Reward World Model}	2025	`IRL` · `VLA`		—
Prune2Drive _{Prune2Drive: A Plug-and-Play Framework for Accelerating Vision-Language Models}	2025	`VLM` · `Pruning`	—	—
FastDriveVLA _{FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning}	2025	`VLA` · `Pruning`	—	—
MCAM _{Multimodal Causal Analysis Model for Ego-Vehicle-Level Driving Video Understanding}	2025	`Causal` · `Multimodal`		—
AutoDrive-R² _{Incentivizing Reasoning and Self-Reflection Capacity for VLA Model in Autonomous Driving}	2025	`VLA` · `Reflection`	—	—
DriveAgent-R1 _{Advancing VLM-based Autonomous Driving with Hybrid Thinking and Active Perception}	2025	`VLM` · `Active`	—	—
NavigScene _{Bridging Local Perception and Global Navigation for Beyond-Visual-Range Autonomous Driving}	2025	`Navigation` · `Perception`	—	—
ADRD _{LLM-Driven Autonomous Driving Based on Rule-Based Decision Systems}	2025	`LLM` · `Rule-Based`	—	—
AutoVLA _{A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning}	2025	`VLA` · `RL`		Project
Poutine _{Vision-Language-Trajectory Pre-Training and Reinforcement Learning Post-Training}	2025	`VLT` · `RL`	—	—
ReCogDrive _{A Reinforced Cognitive Framework for End-to-End Autonomous Driving}	2025	`VLM` · `Diffusion`		Project
AD-EE _{Early Exiting for Fast and Reliable Vision-Language Models in Autonomous Driving}	2025	`VLM` · `Efficient`	—	—
FastDrive _{Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving}	2025	`VLM` · `Structured`	—	—
HMVLM _{Multistage Reasoning-Enhanced Vision-Language Model for Long-Tailed Driving Scenarios}	2025	`VLM` · `Long-Tail`	—	—
S4-Driver _{Scalable Self-Supervised Driving Multimodal Large Language Model}	CVPR 2025	`Self-Supervised` · `MLLM`	—	—
DiffVLA _{Vision-Language Guided Diffusion Planning for Autonomous Driving}	2025	`Diffusion` · `VLM`	—	—
X-Driver _{Explainable Autonomous Driving with Vision-Language Models}	2025	`MLLM` · `CoT`	—	—
DriveGPT4-V2 _{Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving}	CVPR 2025	`LLM` · `Closed-Loop`	—	—
DriveMind _{A Dual-VLM based Reinforcement Learning Framework for Autonomous Driving}	2025	`Dual-VLM` · `RL`	—	—
ReasonPlan _{Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving}	2025	`MLLM` · `Reasoning`		—
FutureSightDrive _{Thinking Visually with Spatio-Temporal CoT for Autonomous Driving}	2025	`CoT` · `Spatio-Temporal`		—
PADriver _{Towards Personalized Autonomous Driving}	2025	`MLLM` · `Personalized`	—	—
LDM _{Unlock the Power of Unlabeled Data in Language Driving Model}	ICRA 2025	`Self-Supervised` · `Distillation`	—	—
DriveMoE _{Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving}	2025	`MoE` · `VLA`		Project
DriveMonkey _{Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving}	2025	`LVLM` · `Interactive`		—
AgentThink _{A Unified Framework for Tool-Augmented Chain-of-Thought Reasoning in Vision-Language Models}	2025	`CoT` · `Tools`	—	—
DSDrive _{Distilling Large Language Model for Lightweight End-to-End Autonomous Driving}	2025	`Distillation` · `Lightweight`	—	—
LightEMMA _{Lightweight End-to-end Multimodal Autonomous Driving}	2025	`Lightweight` · `Multimodal`		—
THCAD _{Towards Human-Centric Autonomous Driving: A Fast-Slow Architecture Integrating LLM Guidance with RL}	2025	`LLM` · `RL` · `Fast-Slow`	—	—
DriveSOTIF _{Advancing Perception SOTIF Through Multimodal Large Language Models}	2025	`SOTIF` · `MLLM`	—	—
Actor-Reasoner _{Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework}	2025	`LLM` · `Interaction`		—
MPDrive _{Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving}	CVPR 2025	`Prompt` · `Spatial`	—	—
V3LMA _{Visual 3D-enhanced Language Model for Autonomous Driving}	2025	`3D` · `LVLM`	—	—
OpenDriveVLA _{Towards End-to-end Autonomous Driving with Large Vision Language Action Model}	2025	`VLA` · `Open-Source`		Project
SimLingo _{Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment}	CVPR 2025	`VLA` · `Closed-Loop`		Project
SafeAuto _{Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models}	ICLR 2025	`Safety` · `Multimodal`		—
NuGrounding _{A Multi-View 3D Visual Grounding Framework in Autonomous Driving}	2025	`Grounding` · `3D`	—	—
CoT-Drive _{Efficient Motion Forecasting for Autonomous Driving with LLMs and Chain-of-Thought Prompting}	2025	`CoT` · `Forecasting`	—	—
CoLMDriver _{LLM-based Negotiation Benefits Cooperative Autonomous Driving}	2025	`Cooperative` · `LLM`		—
AlphaDrive _{Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning}	2025	`RL` · `Reasoning`		—
TrackingMeetsLMM _{Tracking Meets Large Multimodal Models for Driving Scenario Understanding}	2025	`Tracking` · `LMM`		—
BEVDriver _{Leveraging BEV Maps in LLMs for Robust Closed-Loop Driving}	2025	`BEV` · `LLM`	—	—
DynRsl-VLM _{Enhancing Autonomous Driving Perception with Dynamic Resolution Vision-Language Models}	2025	`Dynamic Res` · `VLM`	—	—
Sce2DriveX _{A Generalized MLLM Framework for Scene-to-Drive Learning}	2025	`MLLM` · `Scene`	—	—
VLM-Assisted-CL _{VLM-Assisted Continual learning for Visual Question Answering in Self-Driving}	2025	`Continual Learning`	—	—
LeapVAD _{A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking}	2025	`Cognitive` · `Dual-Process`		Project

2024

🧠 Method	🗓️ Year / Venue	🏷️ Tags	📄 Paper	💻 GitHub	🌐 Project
VLM-RL _{A Unified Vision Language Model and Reinforcement Learning Framework for Safe Autonomous Driving}	2024	`RL` · `VLM`			Project
GPVL _{Generative Planning with 3D-vision Language Pre-training for End-to-End Autonomous Driving}	AAAI 2025	`Generative` · `3D-VL`			—
CALMM-Drive _{Confidence-Aware Autonomous Driving with Large Multimodal Model}	2024	`CoT` · `Confidence`		—	—
WiseAD _{Knowledge Augmented End-to-End Autonomous Driving with Vision-Language Model}	2024	`VLM` · `Reasoning`			—
OpenEMMA _{Open-Source Multimodal Model for End-to-End Autonomous Driving}	WACV 2025	`Open-Source` · `Multimodal`			—
FeD _{Feedback-Guided Autonomous Driving}	CVPR 2024	`Feedback` · `LLM`	Paper	—	Project
LeapAD _{Continuously learning, adapting, and improving: A dual-process approach to autonomous driving}	NeurIPS 2024	`Dual-Process` · `Continual`			Project
DriveMM _{All-in-One Large Multimodal Model for Autonomous Driving}	2024	`Multimodal` · `Generalization`			Project
Exp-Planning _{Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving}	ECCV 2024	`Explainability` · `Planning`		—	—
LaVida Drive _{Vision-Text Interaction VLM for Autonomous Driving with Token Selection, Recovery and Enhancement}	2024	`VQA` · `Interaction`		—	—
EMMA _{End-to-End Multimodal Model for Autonomous Driving}	2024	`End-to-End` · `Multimodal`		—	—
DriVLMe _{Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences}	IROS 2024	`Embodied` · `Social`			Project
OccLLaMA _{An Occupancy-Language-Action Generative World Model for Autonomous Driving}	2024	`World Model` · `Occupancy`		—	—
MiniDrive _{More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens}	2024	`Efficient` · `MoE`		—	—
RDA-Driver _{Making Large Language Models Better Planners with Reasoning-Decision Alignment}	ECCV 2024	`Reasoning` · `Alignment`		—	—
EC-Drive _{Edge-Cloud Collaborative Motion Planning for Autonomous Driving with Large Language Models}	ICCT 2024	`Edge-Cloud` · `Collaborative`		—	Project
V2X-VLM _{End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models}	2024	`V2X` · `Cooperative`			Project
Cube-LLM _{Language-Image Models with 3D Understanding}	2024	`3D` · `Language-Image`		—	Project
VLM-MPC _{Vision Language Foundation Model (VLM)-Guided Model Predictive Controller (MPC)}	2024	`MPC` · `Control`		—	—
SimpleLLM4AD _{An End-to-End Vision-Language Model with Graph Visual Question Answering}	2024	`Graph VQA` · `Pipeline`		—	—
AsyncDriver _{Asynchronous Large Language Model Enhanced Planner for Autonomous Driving}	ECCV 2024	`Asynchronous` · `Closed-Loop`			—
AD-H _{Autonomous Driving with Hierarchical Agents}	ICLR 2025	`Hierarchical` · `Agents`	Paper	—	—
CarLLaVA _{Vision language models for camera-only closed-loop driving}	2024	`Camera-only` · `Closed-Loop`		—	Project
PlanAgent _{A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning}	2024	`Agent` · `Closed-Loop`		—	—
Atlas _{Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?}	2024	`3D-Tokenized` · `LLM`		—	—
TRR Agent _{Interpretable Decision-Making for Autonomous Vehicles with Retrieval-Augmented Reasoning via LLM}	2024	`RAG` · `Rule-Based`		—	—
OmniDrive _{A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning}	CVPR 2025	`Counterfactual` · `3D`			—
Co-driver _{VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding}	2024	`Assistant` · `Human-like`		—	—
AgentsCoDriver _{Large Language Model Empowered Collaborative Driving with Lifelong Learning}	2024	`Collaborative` · `Lifelong`		—	—
EM-VLM4AD _{Multi-Frame, Lightweight & Efficient Vision-Language Models for Question Answering}	CVPR 2024	`Efficient` · `VQA`			—
LeGo-Drive _{Language-enhanced Goal-oriented Closed-Loop End-to-End Autonomous Driving}	IROS 2024	`Goal-oriented` · `Closed-Loop`			Project
Hybrid Reasoning _{Hybrid Reasoning Based on Large Language Models for Autonomous Car Driving}	ICCMA 2024	`Reasoning` · `Math`		—	—
VLAAD _{Vision and Language Assistant for Autonomous Driving}	WACV 2024	`Assistant` · `Explainability`	Paper	—	—
ELM _{Embodied Understanding of Driving Scenarios}	ECCV 2024	`Embodied` · `Scene Understanding`		—	—
RAG-Driver _{Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning}	RSS 2024	`RAG` · `In-Context`			Project
BEV-TSR _{Text-Scene Retrieval in BEV Space for Autonomous Driving}	AAAI 2025	`Retrieval` · `BEV`		—	—
LLaDA _{Driving Everywhere with Large Language Model Policy Adaptation}	CVPR 2024	`Adaptation` · `Traffic Rules`			Project

2023

🧠 Method	🗓️ Year / Venue	🏷️ Tags	📄 Paper	💻 GitHub	🌐 Project
LingoQA _{Visual Question Answering for Autonomous Driving}	ECCV 2024	`VQA` · `LLM`			—
LaMPilot _{An Open Benchmark Dataset for Autonomous Driving with Language Model Programs}	CVPR 2024	`Benchmark` · `LLM`			—
LLM-ASSIST _{Enhancing Closed-Loop Planning with Language-Based Reasoning}	2023	`Planning` · `Reasoning`		—	Project
DriveLM _{Driving with Graph Visual Question Answering}	ECCV 2024	`Graph VQA` · `Reasoning`			—
DriveMLM _{Aligning Multi-Modal Large Language Models with Behavioral Planning States}	2023	`MLLM` · `Planning`			—
LiDAR-LLM _{Exploring the Potential of Large Language Models for 3D LiDAR Understanding}	2023	`LiDAR` · `LLM`		—	Project
Talk2BEV _{Language-enhanced Bird's-eye View Maps for Autonomous Driving}	2023	`BEV` · `LVLM`			Project
Talk2Drive _{Personalized Autonomous Driving with Large Language Models: Field Experiments}	2023	`Personalized` · `LLM`		—	Project
LMDrive _{Closed-Loop End-to-End Driving with Large Language Models}	CVPR 2024	`Closed-Loop` · `LLM`			—
Reason2Drive _{Towards Interpretable and Chain-based Reasoning for Autonomous Driving}	ECCV 2024	`Reasoning` · `Interpretability`			—
CAVG _{GPT-4 Enhanced Multimodal Grounding for Autonomous Driving}	2023	`Grounding` · `GPT-4`			—
Dolphins _{Multimodal Language Model for Driving}	ECCV 2024	`Multimodal` · `VLM`			Project
Agent-Driver _{A Language Agent for Autonomous Driving}	COLM 2024	`Agent` · `Memory`			Project
LLM-Safety _{Empowering Autonomous Driving with Large Language Models: A Safety Perspective}	ICLR 2024	`Safety` · `MPC`			—
Co-Pilot _{ChatGPT as Your Vehicle Co-Pilot: An Initial Attempt}	2023	`Co-Pilot` · `LLM`	Paper	—	—
RRR _{Receive, Reason, and React: Drive as You Say with Large Language Models}	ITSM 2024	`Tools` · `LLM`		—	—
LanguageMPC _{Large Language Models as Decision Makers for Autonomous Driving}	2023	`MPC` · `CoT`		—	—
Driving with LLMs _{Fusing Object-Level Vector Modality for Explainable Autonomous Driving}	2023	`Object-Level` · `Explainable`			—
DriveGPT4 _{Interpretable End-to-end Autonomous Driving via Large Language Model}	RA-L 2024	`Interpretable` · `LLM`	Paper	—	Project
GPT-Driver _{Learning to Drive with GPT}	NeurIPS 2023	`Planner` · `GPT`			Project
DiLu _{A Knowledge-Driven Approach to Autonomous Driving with Large Language Models}	ICLR 2024	`Knowledge-Driven` · `Reflection`			Project
Drive as You Speak _{Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles}	2023	`Interaction` · `LLM`		—	—
HiLM-D _{Enhancing MLLMs with Multi-Scale High-Resolution Details for Autonomous Driving}	IJCV 2025	`High-Res` · `MLLM`		—	—
SurrealDriver _{Designing LLM-powered Generative Driver Agent Framework based on Human Data}	2023	`Generative` · `Agent`		—	—
Drive Like a Human _{Rethinking Autonomous Driving with Large Language Models}	2023	`Reasoning` · `Reflection`			—
ADAPT _{Action-aware Driving Caption Transformer}	ICRA 2023	`Captioning` · `Transformer`			—

Hybrid End-to-End Methods

2025

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
MindDrive _{MindDrive: An All-in-One Framework Bridging World Models and Vision-Language Model for End-to-End Autonomous Driving}	2025	`World Model` · `VLM Evaluator`		Project
AdaDrive _{AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving}	ICCV 2025	`Slow-Fast` · `LLM`		—
ReAL-AD _{Towards Human-Like Reasoning in End-to-End Autonomous Driving}	2025	`Reasoning` · `VLM`	—	Project
VLAD _{A VLM-Augmented Autonomous Driving Framework with Hierarchical Planning and Interpretable Decision Process}	ITSC 2025	`VLM` · `Hierarchical`	—	—
LeAD _{The LLM Enhanced Planning System Converged with End-to-end Autonomous Driving}	2025	`LLM` · `E2E`	—	—
NetRoller _{Interfacing General and Specialized Models for End-to-End Autonomous Driving}	2025	`Adapter` · `VLM`		—
SOLVE _{Synergy of Language-Vision and End-to-End Networks for Autonomous Driving}	CVPR 2025	`VLM` · `Fusion`	—	—
VERDI _{VLM-Embedded Reasoning for Autonomous Driving}	2025	`VLM` · `Reasoning`	—	—
ALN-P3 _{Unified Language Alignment for Perception, Prediction, and Planning in Autonomous Driving}	2025	`Alignment` · `Language`	—	—
VLM-E2E _{Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion}	2025	`VLM` · `Attention`	—	—
DIMA _{Distilling Multi-modal Large Language Models for Autonomous Driving}	CVPR 2025	`Distillation` · `MLLM`	—	—

2024

🧠 Method	🗓️ Year / Venue	🏷️ Tags	💻 GitHub	🌐 Project
VLM-AD _{End-to-End Autonomous Driving through Vision-Language Model Supervision}	2024	`Supervision` · `VLM`	—	—
FASIONAD _{FAst and Slow FusION Thinking Systems for Human-Like Autonomous Driving}	2024	`Fast-Slow` · `Fusion`	—	—
Senna _{Bridging Large Vision-Language Models and End-to-End Autonomous Driving}	2024	`VLM` · `Robustness`		—
Hint-AD _{Holistically Aligned Interpretability in End-to-End Autonomous Driving}	CoRL 2024	`Interpretability` · `Alignment`		Project
DriveVLM _{The Convergence of Autonomous Driving and Large Vision-Language Models}	CoRL 2024	`Hybrid` · `VLM`	—	Project
DME-Driver _{Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving}	AAAI 2025	`Logic` · `Perception`	—	—
VLP _{Vision Language Planning for Autonomous Driving}	CVPR 2024	`Planning` · `Reasoning`	—	—

Datasets

Standard Datasets

📦 Dataset	🗓️ Year / Venue	🏷️ Tags	📄 Paper	💾 Dataset / Code
KITTI _{The KITTI Vision Benchmark Suite}	CVPR 2012	`3D Detection` · `Tracking`		Dataset
nuScenes _{A Multimodal Dataset for Autonomous Driving}	CVPR 2020	`Multimodal` · `LiDAR` · `Radar`		Dataset
Waymo _{Waymo Open Dataset: Scalability in Perception}	CVPR 2020	`Perception` · `LiDAR`	Paper	Dataset
Argoverse _{3D Tracking and Forecasting with Rich Maps}	CVPR 2019	`Tracking` · `Forecasting` · `Maps`		Dataset
Lyft _{One Thousand and One Hours: Self-driving Motion Prediction Dataset}	2020	`Motion Prediction`		Dataset
ONCE _{One Million Scenes for Autonomous Driving}	NeurIPS 2021	`Unsupervised` · `3D Detection`		Dataset
Mapillary Vistas _{Semantic Understanding of Street Scenes}	ICCV 2017	`Semantic Segmentation`	Paper	Dataset
BDD100K _{A Diverse Driving Dataset for Heterogeneous Multitask Learning}	CVPR 2020	`Multitask` · `Video`
ApolloScape _{The ApolloScape Open Dataset for Autonomous Driving}	CVPR 2018	`Segmentation` · `LiDAR`		Dataset

Vision-Language Datasets

2025

📦 Dataset	🗓️ Year / Venue	🏷️ Tags	💾 Dataset / Code	🌐 Project
nuScenesR²-6K _{Incentivizing Reasoning and Self-Reflection Capacity for VLA Model}	2025	`CoT` · `Reasoning`	—	—
Bench2ADVLM _{A Closed-Loop Benchmark for Vision-language Models}	2025	`Benchmark` · `Closed-Loop`	—	—
VLADBench _{Fine-Grained Evaluation of Large Vision-Language Models}	2025	`Evaluation` · `Reasoning`	Dataset	Project
NuInteract _{Extending Large Vision-Language Model for Diverse Interactive Tasks}	2025	`Interaction` · `VLM`	Dataset	Project
Drive-R1 _{Bridging Reasoning and Planning in VLMs with RL}	2025	`RL` · `Reasoning`	—	—
DriveAction _{A Benchmark for Exploring Human-like Driving Decisions in VLA Models}	2025	`Action-Driven` · `VLA`	Dataset	Project
STSBench _{A Spatio-temporal Scenario Benchmark for MLLMs}	2025	`Spatio-Temporal` · `3D`	Dataset	Project
HiLM-D _{(DRAMA-ROLISP) Enhancing MLLMs with Multi-Scale High-Resolution Details}	IJCV 2025	`Risk` · `High-Res`	Dataset	Project
S4-Driver _{WOMD-Planning-ADE Benchmark: Scalable Self-Supervised Driving MLLM}	CVPR 2025	`Self-Supervised` · `Planning`	—	—
ImpromptuVLA _{Open Weights and Open Data for Driving Vision-Language-Action Models}	2025	`Open Data` · `VLA`	Dataset	Project
DriveBench _{Are VLMs Ready for Autonomous Driving? An Empirical Study}	ICCV 2025	`Reliability` · `Evaluation`	Dataset	Project
SimLingo _{Vision-Only Closed-Loop Autonomous Driving with Language-Action Alignment}	CVPR 2025	`Alignment` · `Closed-Loop`	Dataset	Project
WOMD-Reasoning _{A Large-Scale Dataset for Interaction Reasoning in Driving}	ICML 2025	`Interaction` · `Reasoning`	Dataset	Project
OmniDrive _{LLM-Agent for Autonomous Driving with 3D Perception}	CVPR 2025	`3D Perception` · `Agent`	Dataset	Project
CODA-LM _{Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases}	WACV 2025	`Corner Cases` · `Evaluation`	Dataset	Project
CoVLA _{Comprehensive Vision-Language-Action Dataset}	WACV 2025	`VLA` · `Video`	Dataset	Project
nuPrompt _{Language Prompt for Autonomous Driving}	AAAI 2025	`Prompt` · `3D`	Dataset	Project
Robusto-1 _{Comparing Humans and VLMs on real out-of-distribution AD VQA}	2025	`OOD` · `VQA`	Dataset	—
DrivingVQA _{RIV-CoT: Retrieval-Based Interleaved Visual Chain-of-Thought}	2025	`VQA` · `CoT`	Dataset	Project
DriveLMM-o1 _{A Step-by-Step Reasoning Dataset and Large Multimodal Model}	2025	`Reasoning` · `MLLM`	Dataset	Project

2024

📦 Dataset	🗓️ Year / Venue	🏷️ Tags	📄 Paper	💾 Dataset / Code	🌐 Project
DriveLM _{Driving with Graph Visual Question Answering}	ECCV 2024	`Graph VQA` · `Graph`		Dataset	Project
LMDrive _{Closed-Loop End-to-End Driving with Large Language Models}	2024	`Closed-Loop` · `Language`		Dataset	Project
DriveCoT _{Integrating Chain-of-Thought Reasoning with End-to-End Driving}	2024	`CoT` · `Reasoning`		Dataset	Project
NuScenes-QA _{A Multi-Modal Visual Question Answering Benchmark}	AAAI 2024	`VQA` · `Benchmark`		Dataset	Project
NuScenes-MQA _{Integrated Evaluation of Captions and QA using Markup Annotations}	WACV 2024	`Captioning` · `QA`		Dataset	Project
Talk2BEV _{Language-enhanced Bird’s-eye View Maps}	ICRA 2024	`BEV` · `Maps`		Dataset	Project
DriveGPT4 _{Interpretable End-to-end Autonomous Driving via LLM}	RA-L 2024	`Interpretable` · `Instruction`		Dataset	Project
ContextVLM _{Zero-Shot and Few-Shot Context Understanding}	ITSC 2024	`Context` · `Few-Shot`		Dataset	Project
LingoQA _{Visual Question Answering for Autonomous Driving}	ECCV 2024	`VQA` · `Freeform`		Dataset	Project
Rank2Tell _{A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning}	WACV 2024	`Ranking` · `Reasoning`		Dataset	—
MAPLM _{A Real-World Large-Scale Vision-Language Dataset for Map and Traffic Scene}	CVPR 2024	`Map` · `Traffic`	Paper	Dataset	Project
NuInstruct _{Holistic Autonomous Driving Understanding by BEV Injected Multi-Modal Large Models}	CVPR 2024	`Instruction` · `BEV`		Dataset	Project
DriveVLM _{SUP-AD Dataset: The Convergence of Autonomous Driving and VLMs}	CoRL 2024	`Scene Understanding` · `Planning`		—	Project
SURDS _{Benchmarking Spatial Understanding and Reasoning in Driving Scenarios}	2024	`Spatial` · `Reasoning`		Dataset	Project

2023

📦 Dataset	🗓️ Year / Venue	🏷️ Tags	💾 Dataset / Code	🌐 Project
DriveMLM _{Aligning Multi-Modal Large Language Models with Behavioral Planning States}	2023	`Planning` · `Explanation`		—
Reason2Drive _{Towards Interpretable and Chain-based Reasoning for Autonomous Driving}	2023	`Reasoning` · `Chain-based`	Dataset	Project
Refer-KITTI _{Referring Multi-Object Tracking}	CVPR 2023	`Tracking` · `Referring`	Dataset	Project
DRAMA _{Joint Risk Localization and Captioning in Driving}	WACV 2023	`Risk` · `Captioning`	Dataset	Project

Before 2023

📦 Dataset	🗓️ Year / Venue	🏷️ Tags	💾 Dataset / Code	🌐 Project
SUTD-TrafficQA _{A Question Answering Benchmark and an Efficient Network for Video Reasoning}	CVPR 2021	`Video QA` · `Reasoning`	Dataset	Project
BDD-OIA _{Explainable Object-induced Action Decision for Autonomous Vehicles}	CVPR 2020	`Explainable` · `Decision`	Dataset	Project
HAD _{Grounding Human-to-Vehicle Advice for Self-driving Vehicles}	CVPR 2019	`Advice` · `Grounding`	Dataset	—
BDD-X _{Textual Explanations for Self-Driving Vehicles}	ECCV 2018	`Explanation` · `Captioning`	Dataset	Project
Talk2Car _{Taking Control of Your Self-Driving Car}	EMNLP 2019	`Commands` · `Referral`		Project

(back to top)

License

The GE2EAD resources are released under the Apache 2.0 license.

(back to top)
