Bohan Li - 李博涵

Email: bohan.li77_at_gmail.com

[GitHub] [Google Scholar]

Biography

I'm a Ph.D. student at Shanghai Jiao Tong University (SJTU) and Eastern Institute of Technology (EIT), Ningbo, advised by Prof. Xin Jin, Prof. Wenjun Zeng, Prof. Chao Ma, and Prof. Xiaokang Yang. I'm also a Visiting Scholar in BME and ECE at the National University of Singapore (NUS), where I work with Prof. Yueming Jin, and I collaborate with Prof. Hao Zhao at Tsinghua University. I received my Master's degree from South China University of Technology (SCUT) and my Bachelor's degree from Northeastern University (NEU). I have also spent time at BAAI, Changcheng, Lixiang, MEGVII, IDEA, Tencent AILab, NetEase AILab, PhiGent, and ZTE.

Research

My research interests span 3D computer vision and world modeling, including autonomous vehicles and robotics, 3D scene comprehension and generation, 3D structured information processing, representation disentanglement, and AI for Science.

News

We have one paper accepted to IEEE TPAMI: good news on the last day of 2025.

We have two papers accepted to IEEE TPAMI.

We have a paper accepted to NeurIPS 2025.

We have a paper accepted to CoRL 2025 (Oral).

We have two papers accepted to ICCV 2025.

We have a paper accepted to CVPR 2025.

We have a paper accepted to NeurIPS 2024.

We have two papers accepted to ECCV 2024.

We have a paper accepted to IJCAI 2024 (Oral).

We have a paper accepted to AAAI 2024.

We have a paper accepted to ICCV 2023.

I worked as a Computer Vision Algorithm Engineer at Tencent AI Lab.

Selected Publications (*Equal Contribution, †Corresponding)


	OmniNWM: Omniscient Driving Navigation World Models Bohan Li, Zhuang Ma, Dalong Du*, Baorui Peng, Zhujin Liang, Zhenqiang Liu, Chao Ma, Yueming Jin, Hao Zhao, Wenjun Zeng, Xin Jin†. Arxiv [ paper ] [ project page ] [ code ]
	Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction Bohan Li, Jiajun Deng, Yasheng Sun, Xiaofeng Wang, Xin Jin†, Wenjun Zeng. IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) [ paper ] [ project page ]
	OccScene: Semantic Occupancy-based Cross-task Mutual Learning for 3D Scene Generation Bohan Li, Xin Jin†, Jianan Wang, Yukai Shi, Yasheng Sun, Xiaofeng Wang, Zhuang Ma, Baao Xie, Chao Ma, Xiaokang Yang, Wenjun Zeng. IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) [ paper ]
	UniScene: Unified Occupancy-centric Driving Scene Generation Bohan Li, Jiazhe Guo, Hongsi Liu, Yingshuang Zou, Yikang Ding, Xiwu Chen, Hu Zhu, Feiyang Tan, Chi Zhang, Tiancai Wang, Shuchang Zhou, Li Zhang, Xiaojuan Qi, Hao Zhao, Mu Yang, Wenjun Zeng, Xin Jin†. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025) [ paper ] [ project page ] [ code ]
	NaviNeRF++: Towards Interpretable 3D Reconstruction via Unsupervised Disentangled Representation Learning Baao Xie, Zequn Zhang, Huanting Guo, Qiuyu Chen, Hu Zhu, Bohan Li, Wenjun Zeng, Xin Jin†. IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI) [ paper ]
	Scaling Up Occupancy-centric Driving Scene Generation: Dataset and Method Bohan Li, Xin Jin†, Hu Zhu, Hongsi Liu, Ruikai Li, Jiazhe Guo, Kaiwen Cai, Chao Ma, Yueming Jin, Hao Zhao, Xiaokang Yang, Wenjun Zeng. Arxiv [ paper ] [ project page ] [ code ]
	Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond Zheng Zhu†, Xiaofeng Wang, Wangbo Zhao, Chen Min, Bohan Li, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang. Arxiv [ paper ] [ project page ]
	ORV: 4D Occupancy-centric Robot Video Generation Xiuyu Yang, Bohan Li, Shaocong Xu, Nan Wang, Chongjie Ye, Zhaoxi Chen, Minghan Qin, Yikang Ding, Xin Jin, Hang Zhao, Hao Zhao†. Arxiv [ paper ] [ project page ] [ Dataset ] [ code ]
	Challenger: Affordable Adversarial Driving Video Generation Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao†. Conference on Robot Learning (CoRL 2025 SAFE-ROL Workshop Oral) [ paper ] [ project page ] [ Dataset ] [ code ]
	Hybrid-grained Feature Aggregation with Coarse-to-fine Language Guidance for Self-supervised Monocular Depth Estimation Wenyao Zhang, Hongsi Liu, Bohan Li, Jiawei He, Zekun Qi, Yunnan Wang, Shengyang Zhao, Xinqiang Yu, Wenjun Zeng, Xin Jin†. International Conference on Computer Vision (ICCV 2025) [ paper ] [ code ]
	Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting Nan Wang, Yuantao Chen, Lixing Xiao, Weiqing Xiao, Bohan Li, Zhaoxi Chen, Chongjie Ye, Shaocong Xu, Saining Zhang, Ziyang Yan, Pierre Merriaux, Lei Lei, Tianfan Xue, Hao Zhao†. Neural Information Processing Systems (NeurIPS 2025) [ paper ] [ project page ] [ code ]
	One View, Many Worlds: Single-Image to 3D Object Meets Generative Domain Randomization for One-Shot 6D Pose Estimation Zheng Geng, Nan Wang, Shaocong Xu, Chongjie Ye, Bohan Li, Zhaoxi Chen, Sida Peng, Hao Zhao†. Conference on Robot Learning (CoRL 2025 Oral) [ paper ] [ project page ] [ Huggingface ] [ code ]
	DiST-4D: Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation Jiazhe Guo, Yikang Ding, Xiwu Chen, Shuo Chen, Bohan Li, Yingshuang Zou, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Zhiheng Li, Hao Zhao†. International Conference on Computer Vision (ICCV 2025) [ paper ] [ page ] [ code ]
	MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction Yingshuang Zou, Yikang Ding, Chuanrui Zhang, Jiazhe Guo, Bohan Li, Xiaoyang Lyu, Feiyang Tan, Xiaojuan Qi, Haoqian Wang†. British Machine Vision Conference (BMVC 2025) [ paper ] [ page ] [ code ]

	TAPTRv2: Attention-based Position Update Improves Tracking Any Point Hongyang Li, Feng Li, Hao Zhang, Tianhe Ren, Shilong Liu, Bohan Li, Zhaoyang Zeng, Lei Zhang†. Neural Information Processing Systems (NeurIPS 2024) [ paper ]

	Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion Bohan Li, Jiajun Deng, Wenyao Zhang, Liang, Dalong Du, Xin Jin†, Wenjun Zeng. European Conference on Computer Vision (ECCV 2024) [ paper ] [ code ]

	Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback Xin Jin†, Bohan Li, Baao Xie, Wenyao Zhang, Jinming Liu, Ziqiang Li, Tao Yang, Wenjun Zeng. European Conference on Computer Vision (ECCV 2024) [ paper ]

	Bridging Stereo Geometry and BEV Representation with Reliable Mutual Interaction for Semantic Scene Completion Bohan Li, Yasheng Sun, Zhujin Liang, Dalong Du, Zhuanghui Zhang, Xiaofeng Wang, Yunnan Wang, Xin Jin†, Wenjun Zeng. International Joint Conference on Artificial Intelligence (IJCAI 2024 Oral) [ paper ] [ code ]

	One at a Time: Progressive Multi-step Volumetric Probability Learning for Reliable 3D Scene Perception Bohan Li, Yasheng Sun, Jingxin Dong, Zheng Zhu, Jinming Liu, Xin Jin†, Wenjun Zeng. AAAI Conference on Artificial Intelligence (AAAI 2024) [ paper ]

	NaviNeRF: NeRF-based 3D Representation Disentanglement by Latent Semantic Navigation Baao Xie, Bohan Li, Zequn Zhang, Junting Dong, Xin Jin†, Jingyu Yang, Wenjun Zeng. International Conference on Computer Vision (ICCV 2023) [ paper ] [ code ]

	Robust Scale-Aware Stereo Matching Network James Okae, Bohan Li, Juan Du†, Yueming Hu. IEEE Transactions on Artificial Intelligence (TAI) [ paper ]

Experiences

	BAAI Project Leader & Research Intern Topic: Large-scale 4D Reconstruction and Generation, Controllable World Modeling
	Changcheng Project Leader & Research Intern Topic: Efficient World Modeling
	Lixiang Project Leader & Research Intern Topic: Scalable Multi-modal Driving Scene Generation
	MEGVII Project Leader & Research Intern Topic: UniScene: Unified Occupancy-centric Driving Scene Generation
	Tencent AILab Research Engineer Topic: Large-scale 3D City Scene Generation and Reconstruction
	NetEase AILab Research Intern Topic: Robust 3D Perception and Reconstruction
	NavInfo Research Intern Topic: Multi-modal Driving Scene Generation
	IDEA Research Intern Topic: Occupancy-based Scene Generation, Long-term Consistent Perception
	PhiGent Robotics Engineer & Research Intern Topic: Semantic Scene Completion, Depth Estimation, World Models
	ZTE Research Intern Topic: Monocular Depth Estimation, Real-time Stereo Matching

Certificate

2026 CSIG First Doctoral Student Forum (CSIG第一届博士生论坛) [ Live ]

2025 PhD National Scholarship (博士国家奖学金)

2024 Ningbo Association for Science and Technology Major Scientific Achievement Award (宁波市科协重大科技成果奖)

2024 Huatai Securities Doctoral Science and Technology Scholarship (华泰证券科技奖学金)

IJCAI 2024 Oral Presentation

Academic Services

Conference Reviewer:

CVPR 2025-, ICCV 2025-, ECCV 2024-, NeurIPS 2024-, ICLR 2026-, AAAI 2025-, IJCAI 2025-

Journal Reviewer:

IEEE T-PAMI, IEEE T-IP, IEEE T-MM, IEEE T-CSVT, IEEE T-AI, IEEE RA-L