hotdogcheesewhite/MindDriver

MindDriver: Introducing Progressive Multimodal Reasoning for Autonomous Driving

🎉🎉 CVPR 2026 🎉🎉

Lingjun Zhang1*, Yujian Yuan1,2*, Changjie Wu1†, Xinyuan Chang1, Xin Cai3, Shuang Zeng1,4, Linzhe Shi1, Sijin Wang1, Hang Zhang1, Mu Xu1

1Amap, Alibaba Group, 2The Hong Kong University of Science and Technology, 3The Chinese University of Hong Kong, 4Xi'an Jiaotong University

(*) Equal contribution. (†) Project leader.

Comparison of different reasoning methods. Text reasoning struggles with space misalignment, while image reasoning suffers from guideless image prediction. Our proposed progressive multimodal reasoning performs aligned, smooth reasoning.

MindDriver: the proposed multimodal reasoning framework, which enables a VLM to imitate human-like progressive thinking for autonomous driving. MindDriver reasons in three progressive stages: semantic understanding, semantic-to-physical space imagination, and physical-space trajectory planning.

πŸ—“οΈ TODO

  • Release MindDriver reasoning code
  • Release whole MindDriver code
  • Release checkpoints

πŸ™ Acknowledgement

Our work is primarily based on the following codebases: FSDrive, LLaMA-Factory, MoVQGAN, GPT-Driver, and Agent-Driver. We are sincerely grateful for their work.
