Greetings! 😆
Welcome to my website!
I am a third-year PhD student in the Master-Doctor combined program at Sun Yat-sen University, supervised by Guang Tan and Chao Gou.
Since June 2024, I have had the honor of being a visiting PhD student at MMLab, CUHK, under the supervision of Prof. Tianfan Xue.
My research interests center on 3D-consistent video generation/world modeling, 3D generation/reconstruction, and 4D generation. I am open to collaboration and welcome further discussion if you are interested in my research.
🔥 News 🔥
∙ [2025.12] 🌟🌟 Our new work, ReCamDriving, is released! Check it out here.
∙ [2025.10] 🌟🌟 Our new work, DynamicTree, is released! Check it out here.
∙ [2025.10] 🎉🎉 One paper is accepted to PR 2025
∙ [2025.07] 🎉🎉 One paper is accepted to ICCV 2025
∙ [2024.12] 🎉🎉 One paper is accepted to AAAI 2025
∙ [2024.11] 🎉🎉 One paper is accepted to ESWA 2025
∙ [2023.11] 🎉🎉 One paper is accepted to IJCV 2024
📑 Selected Publications

ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaijie Shen, Guang Tan
arXiv Preprint, 2025
We introduce ReCamDriving, a vision-only framework for camera-controlled video generation, and ParaDrive, a large-scale dataset comprising 110K parallel-trajectory video pairs.

FullPart: Generating each 3D Part at Full Resolution
Lihe Ding*, Shaocong Dong*, Yaokun Li, Chenjian Gao, Xiao Chen, Rui Han, Yihao Kuang, Hong Zhang, Bo Huang, Zhanpeng Huang, Zibin Wang, Dan Xu†, Tianfan Xue†
arXiv Preprint, 2025
FullPart generates each 3D part at full resolution. We also present PartVerse-XL, the largest human-annotated 3D part dataset.

DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum
Yaokun Li, Lihe Ding, Xiao Chen, Guang Tan, Tianfan Xue
arXiv Preprint, 2025
We propose DynamicTree, the first framework that can generate long-term, interactive animation of 3D Gaussian Splatting trees.

From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong*, Lihe Ding*, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue†, Dan Xu†
ICCV 2025
CoPart generates 3D parts from contextual part latents and supports various applications, such as articulation modeling.

Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm
Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Yaokun Li, Xiujun Shu, Yuanhao Feng, Bo Wang, Yimian Dai, Xiangyu Yue
arXiv Preprint, 2025
We propose FDEP, a foundation-driven efficient paradigm for single-frame infrared small target detection, alongside HSE, a holistic evaluation metric for fair model comparison.

Exploiting Continuous Motion Clues for Vision-Based Occupancy Prediction
Haoran Xu, Peixi Peng, Xinyi Zhang, Guang Tan, Yaokun Li, Shuaixian Wang, Luntong Li
AAAI 2025
We propose CMOP, a continuous motion-aware occupancy prediction framework that leverages historical propagation and dynamic tracking modules to address object occlusions in real-world scenarios.

ID-NeRF: Indirect Diffusion-Guided Neural Radiance Fields for Generalizable View Synthesis
Yaokun Li, Shuaixian Wang, Guang Tan
ESWA 2025
We propose ID-NeRF, a generalizable novel view synthesis framework that addresses sub-optimal reprojected features by indirectly distilling pre-trained diffusion priors into an imaginative latent space for feature refinement.

Learning hierarchical uncertainty from hybrid representations for neural active reconstruction
Shuaixian Wang, Yaokun Li, Chenhui Guo, Guang Tan
PR 2025
We propose a neural active reconstruction system that leverages hierarchical uncertainty across hybrid implicit representations to optimize next-best-view planning and high-fidelity 3D reconstruction.

Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose
Yaokun Li, Guang Tan, Chao Gou
IJCV 2024
We propose CIT, a cascaded iterative transformer that explicitly exploits task dependencies for facial analysis, along with MERL-RAV-FLOP, the first dataset providing joint annotations for landmarks, occlusion, and pose.
🏆 Awards
∙ (2019) China National Scholarship
∙ (2020) Polytechnic Youth Top Ten Students
∙ (2022) Honorable mention in HACKPKU 2022
∙ (2023) Third Prize in the 2023 “Huawei Cup” National Graduate Student Mathematical Modeling Competition
📝 Academic Service
Reviewer:
∙ Conference Reviewer: CVPR, ECCV, AAAI, …
∙ Journal Reviewer: IJCV, TCSVT, PR, …
📖 Teaching
∙ Teaching Assistant: IERG4190-IEMS5707 Multimedia Coding and Processing, CUHK, 2024R2
∙ Teaching Assistant: ISE3111 Pattern Recognition & Machine Learning, SYSU, 2022 Fall
😻 My Hobbies
🏃♂️ 🏀 🏋 🎧 📷 …
