Lingjun Zhang1*, Yujian Yuan1,2*, Changjie Wu1β, Xinyuan Chang1, Xin Cai3, Shuang Zeng1,4, Linzhe Shi1, Sijin Wang1, Hang Zhang1, Mu Xu1
1Amap, Alibaba Group, 2The Hong Kong University of Science and Technology, 3The Chinese University of Hong Kong, 4Xi'an Jiaotong University
(*) Equal contribution. (β) Project leader.
Comparison of different reasoning methods. Text-based reasoning struggles with spatial misalignment, while image-based reasoning suffers from unguided image prediction. Our proposed progressive multimodal reasoning performs aligned, smooth reasoning across both spaces.
MindDriver: the proposed multimodal reasoning framework that enables a VLM to imitate human-like progressive thinking for autonomous driving. MindDriver proceeds through three stages: semantic understanding, semantic-to-physical-space imagination, and physical-space trajectory planning.
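The three-stage pipeline above can be sketched as a simple composition of steps. This is a minimal illustration only: all function names, the `ReasoningState` container, and the placeholder return values are assumptions for exposition and do not reflect the released implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of MindDriver's progressive reasoning pipeline.
# Names and data structures are illustrative assumptions, not the actual API.

@dataclass
class ReasoningState:
    scene_summary: str    # stage 1 output: textual semantic understanding
    imagined_scene: list  # stage 2 output: imagined physical-space representation
    trajectory: list      # stage 3 output: planned (x, y) waypoints

def semantic_understanding(camera_obs: str) -> str:
    # Stage 1: the VLM summarizes the scene and driving context in text.
    return f"summary of: {camera_obs}"

def physical_imagination(scene_summary: str) -> list:
    # Stage 2: the textual summary conditions prediction of a future
    # physical-space representation, bridging semantic and physical spaces.
    return [f"imagined frame conditioned on '{scene_summary}'"]

def trajectory_planning(imagined_scene: list) -> list:
    # Stage 3: waypoints are planned in the imagined physical space
    # (dummy straight-ahead waypoints here).
    return [(0.0, 0.0), (1.0, 0.2), (2.0, 0.5)]

def progressive_reasoning(camera_obs: str) -> ReasoningState:
    # Each stage consumes the previous stage's output, so reasoning moves
    # smoothly from semantic space into physical space.
    summary = semantic_understanding(camera_obs)
    imagined = physical_imagination(summary)
    waypoints = trajectory_planning(imagined)
    return ReasoningState(summary, imagined, waypoints)

state = progressive_reasoning("front-camera frame")
print(state.trajectory)
```

The key design point the sketch conveys is that each stage is grounded in the previous one, rather than jumping directly from pixels to trajectories.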
- Release MindDriver reasoning code
- Release full MindDriver code
- Release checkpoints
Our work is primarily built on the following codebases: FSDrive, LLaMA-Factory, MoVQGAN, GPT-Driver, and Agent-Driver. We are sincerely grateful for their work.