Greetings! 😆

Welcome to my website!

I am a third-year PhD student in the Master-Doctor combined program at Sun Yat-sen University, supervised by Guang Tan and Chao Gou.

Since June 2024, I have had the honor of being a visiting PhD student at MMLab, CUHK, under the supervision of Prof. Tianfan Xue.

My research interests center on 3D-consistent video generation and world modeling, 3D generation and reconstruction, and 4D generation. I am open to collaboration and welcome discussion if you are interested in my research.

🔥 News 🔥

∙ [2025.12] 🌟🌟 Our new work, ReCamDriving, is released! Check it out here.
∙ [2025.10] 🌟🌟 Our new work, DynamicTree, is released! Check it out here.
∙ [2025.10] 🎉🎉 One paper is accepted to PR 2025
∙ [2025.07] 🎉🎉 One paper is accepted to ICCV 2025
∙ [2024.12] 🎉🎉 One paper is accepted to AAAI 2025
∙ [2024.11] 🎉🎉 One paper is accepted to ESWA 2025
∙ [2023.11] 🎉🎉 One paper is accepted to IJCV 2024

📑 Selected Publications

ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang Tan
arXiv Preprint, 2025

We introduce ReCamDriving, a vision-only framework for camera-controlled video generation, and ParaDrive, a large-scale dataset comprising 110K parallel-trajectory video pairs.

Project | Paper | Code

FullPart: Generating each 3D Part at Full Resolution

Lihe Ding*, Shaocong Dong*, Yaokun Li, Chenjian Gao, Xiao Chen, Rui Han, Yihao Kuang, Hong Zhang, Bo Huang, Zhanpeng Huang, Zibin Wang, Dan Xu†, Tianfan Xue†
arXiv Preprint, 2025

FullPart generates each 3D part at full resolution. We also present PartVerse-XL, the largest human-annotated 3D part dataset.

Project | Paper | Code

DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum

Yaokun Li, Lihe Ding, Xiao Chen, Guang Tan, Tianfan Xue
arXiv Preprint, 2025

We propose DynamicTree, the first framework that can generate long-term, interactive animation of 3D Gaussian Splatting trees.

Project | Paper | Code | Data

From One to More: Contextual Part Latents for 3D Generation

Shaocong Dong*, Lihe Ding*, Xiao Chen, Yaokun Li, Yuxin Wang, Yucheng Wang, Qi Wang, Jaehyeok Kim, Chenjian Gao, Zhanpeng Huang, Zibin Wang, Tianfan Xue†, Dan Xu†
ICCV 2025

CoPart generates 3D parts from contextual part latents and supports various applications, such as articulation modeling.

Project | Paper | Code | Data

Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm

Chuang Yu, Jinmiao Zhao, Yunpeng Liu, Yaokun Li, Xiujun Shu, Yuanhao Feng, Bo Wang, Yimian Dai, Xiangyu Yue
arXiv Preprint, 2025

We propose FDEP, a foundation-driven efficient paradigm for single-frame infrared small target detection, alongside HSE, a holistic evaluation metric for fair model comparison.

Paper | Code

Exploiting Continuous Motion Clues for Vision-Based Occupancy Prediction

Haoran Xu, Peixi Peng, Xinyi Zhang, Guang Tan, Yaokun Li, Shuaixian Wang, Luntong Li
AAAI 2025

We propose CMOP, a continuous motion-aware occupancy prediction framework that leverages historical propagation and dynamic tracking modules to address object occlusions in real-world scenarios.

Paper

ID-NeRF: Indirect Diffusion-Guided Neural Radiance Fields for Generalizable View Synthesis

Yaokun Li, Shuaixian Wang, Guang Tan
ESWA 2025

We propose ID-NeRF, a generalizable novel view synthesis framework that addresses sub-optimal reprojected features by indirectly distilling pre-trained diffusion priors into an imaginative latent space for feature refinement.

Paper

Learning hierarchical uncertainty from hybrid representations for neural active reconstruction

Shuaixian Wang, Yaokun Li, Chenhui Guo, Guang Tan
PR 2025

We propose a neural active reconstruction system that leverages hierarchical uncertainty across hybrid implicit representations to optimize next-best-view planning and high-fidelity 3D reconstruction.

Paper

Cascaded Iterative Transformer for Jointly Predicting Facial Landmark, Occlusion Probability and Head Pose

Yaokun Li, Guang Tan, Chao Gou
IJCV 2024

We propose CIT, a cascaded iterative transformer that explicitly exploits task dependencies for facial analysis, along with MERL-RAV-FLOP, the first dataset providing joint annotations for landmarks, occlusion, and pose.

Paper | Code

🏆 Awards

∙ (2019) China National Scholarship
∙ (2020) Polytechnic Youth Top Ten Students
∙ (2022) Honorable mention in HACKPKU 2022
∙ (2023) Third Prize of 2023 “Huawei Cup” National Graduate Student Mathematical Modeling Competition

📝 Academic Service

Reviewer:
∙ Conference Reviewer: CVPR, ECCV, AAAI, …
∙ Journal Reviewer: IJCV, TCSVT, PR, …

📖 Teaching

∙ Teaching Assistant: IERG4190-IEMS5707 Multimedia Coding and Processing, CUHK, 2024R2
∙ Teaching Assistant: ISE3111 Pattern Recognition & Machine Learning, SYSU, 2022 Fall

😻 My Hobbies

🏃‍♂️ 🏀 🏋 🎧 📷 …

Yaokun Li 李垚坤
