Lingjun Zhang1*, Yujian Yuan1,2*, Changjie Wu1β, Xinyuan Chang1, Xin Cai3, Shuang Zeng1,4, Linzhe Shi1, Sijin Wang1, Hang Zhang1, Mu Xu1
1Amap, Alibaba Group, 2The Hong Kong University of Science and Technology, 3The Chinese University of Hong Kong, 4Xi'an Jiaotong University
(*) Equal contribution. (β) Project leader.
Comparison of different reasoning methods. Text-based reasoning struggles with spatial misalignment, while image-based reasoning suffers from unguided image prediction. Our proposed progressive multimodal reasoning performs aligned, smooth reasoning across both spaces.
MindDriver: the proposed multimodal reasoning framework that enables a VLM to imitate human-like progressive thinking for autonomous driving. MindDriver proceeds through three stages: semantic understanding, semantic-to-physical-space imagination, and physical-space trajectory planning.
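The three-stage pipeline above can be sketched as a simple composition of steps. This is a minimal illustration only: all function names, the `ReasoningState` container, and the placeholder return values are assumptions for exposition and do not reflect the released implementation.

```python
from dataclasses import dataclass

# Hypothetical sketch of MindDriver's progressive reasoning pipeline.
# Names and data structures are illustrative assumptions, not the actual API.

@dataclass
class ReasoningState:
    scene_summary: str    # stage 1 output: textual semantic understanding
    imagined_scene: list  # stage 2 output: imagined physical-space representation
    trajectory: list      # stage 3 output: planned (x, y) waypoints

def semantic_understanding(camera_obs: str) -> str:
    # Stage 1: the VLM summarizes the scene and driving context in text.
    return f"summary of: {camera_obs}"

def physical_imagination(scene_summary: str) -> list:
    # Stage 2: the textual summary conditions prediction of a future
    # physical-space representation, bridging semantic and physical spaces.
    return [f"imagined frame conditioned on '{scene_summary}'"]

def trajectory_planning(imagined_scene: list) -> list:
    # Stage 3: waypoints are planned in the imagined physical space
    # (dummy straight-ahead waypoints here).
    return [(0.0, 0.0), (1.0, 0.2), (2.0, 0.5)]

def progressive_reasoning(camera_obs: str) -> ReasoningState:
    # Each stage consumes the previous stage's output, so reasoning moves
    # smoothly from semantic space into physical space.
    summary = semantic_understanding(camera_obs)
    imagined = physical_imagination(summary)
    waypoints = trajectory_planning(imagined)
    return ReasoningState(summary, imagined, waypoints)

state = progressive_reasoning("front-camera frame")
print(state.trajectory)
```

The key design point the sketch conveys is that each stage is grounded in the previous one, rather than jumping directly from pixels to trajectories.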
- Release MindDriver reasoning code
- Release full MindDriver code
- Release checkpoints
Our work is primarily built on the following codebases: FSDrive, LLaMA-Factory, MoVQGAN, GPT-Driver, and Agent-Driver. We are sincerely grateful for their work.