The CVNext lab focuses on advancing general-purpose embodied intelligence, building upon foundations in long video
understanding and reasoning in dynamic, complex scenes. The core objective is to develop open, adaptive embodied agents
that tightly integrate environment perception, interactive reasoning, and personalized adaptation and decision-making.
Ultimately, the research aims to establish both theoretical frameworks and practical systems for general and
domain-specific embodied agents, contributing to scalable, transferable, and real-world embodied AI. Our main research
directions include:
- Interactive 3D Scene Reconstruction and Generation
- Unified World-Reasoning-Action Modeling for Embodied Agents
- Personalized Adaptation with Active Perception
Professor
Gaoang Wang [Web]
Assistant Professor
Office: C417, ZJUI Building
Email: [email protected]
Research Interests:
- Visual Perception
- Transfer Learning
- Spatial Intelligence
- Embodied Intelligence
News:
[Nov. 2025] One paper was accepted by IJCV, 2025.
[Nov. 2025] Three papers were accepted by AAAI 2026, including one oral paper.
[Oct. 2025] We received an Outstanding Paper Award at the ICCV KnowledgeMR Workshop.
[Aug. 2025] One paper was accepted by TPAMI, 2025.
[Jul. 2025] One paper was accepted by ECAI 2025.
[Jul. 2025] One paper was accepted by ICCV Findings Workshop, 2025.
[Jun. 2025] One paper was accepted by TIP, 2025.
[Jun. 2025] One paper was accepted by ICCV 2025.
[May 2025] One paper was accepted by Information Fusion, 2025.
[May 2025] One paper was accepted by ICML 2025.
[Apr. 2025] One paper was accepted by CVPR Workshop on Urban Scene Modeling, 2025.
[Mar. 2025] One paper was accepted by TVCG, 2025.
[Feb. 2025] One paper was accepted by TCSVT, 2025.
[Feb. 2025] One paper was accepted by CVPR 2025.
[Jan. 2025] One paper was accepted by MIA, 2025.
[Jan. 2025] One paper was accepted by TMM, 2025.
[Dec. 2024] Two papers were accepted by ICASSP 2025.
[Dec. 2024] One paper was accepted by AAAI 2025.
[Sep. 2024] One paper was accepted by NeurIPS 2024.
[Jul. 2024] One paper was accepted by MICCAI Workshop on Deep Generative Models, 2024.
[Jun. 2024] Two papers were accepted by ACM MM 2024.
[Jun. 2024] One paper was accepted by ECCV 2024.
[Jun. 2024] One paper was accepted by PRCV 2024.
[Apr. 2024] One paper was accepted by TMM, 2024.
[Mar. 2024] "Long-term Video Question Answering Competition (LOVEU@CVPR'24 Track 1)" was released. More details can
be found here.
[Mar. 2024] One paper was accepted by ICLR Workshop on LLM Agents, 2024.
[Feb. 2024] Three papers were accepted by CVPR 2024.
[Dec. 2023] Two papers were accepted by ICASSP 2024.
[Dec. 2023] Two papers were accepted by AAAI 2024.
[Dec. 2023] One paper was accepted by Neurocomputing, 2023.
[Sep. 2023] One paper was accepted by IJCV, 2023.
[Sep. 2023] One paper was accepted by TMM, 2023.
[Aug. 2023] One paper was accepted by PRCV 2023.
[Jul. 2023] Three papers were accepted by ICCV 2023.
[Jun. 2023] One paper was accepted by MICCAI 2023.
[May 2023] One paper was accepted by Findings of ACL 2023.
[Apr. 2023] Two papers were accepted by IJCAI 2023.
[Apr. 2023] One paper was accepted by CVPR workshop, Computer Vision for Fashion, Art, and Design, 2023.
[Mar. 2023] One paper was accepted by ICME 2023.
[Mar. 2023] One paper was accepted by ICASSP 2023.
[Feb. 2023] One paper was accepted by CVPR 2023.
[Feb. 2023] One paper was accepted by TAI, 2023.
[Nov. 2022] One paper was accepted by TMI, 2022.
[Jul. 2022] One paper was accepted by ECCV 2022.
[Apr. 2022] One paper was accepted by CVPR workshop, the 2nd Workshop on Sketch-Oriented Deep Learning, 2022.
[Mar. 2022] One paper was accepted by ICME 2022.
[Jan. 2022] One paper was accepted by TMM, 2022.
[Aug. 2021] One paper was accepted by CVIU, 2021.
[Jul. 2021] One paper was accepted by ICCV 2021.
[Apr. 2021] One paper was accepted by CVPR workshop, the Workshop on Autonomous Driving, 2021.
[Jan. 2021] ROD2021 Challenge @ICMR 2021 was released.
Ph.D. Students
Multi-modality Learning
Video Understanding
Vision and Language
3D Vision
Generative Models
Anomaly Detection
Zhonghan Zhao
Embodied AI
Reinforcement Learning
In-context Learning
Chenlu Zhan
(Main Advisor: Hongwei Wang)
Medical Vision Language
Medical Multimodality
Visual-Language Pretraining
Wendi Hu
Multi-object Tracking
Kewei Wei
Multi-modality Learning
Tielong Cai
Generative Models
Embodied AI
Master Students
Dongping Li
Multi-modality Learning
Active Perception
Unified Model
Junsheng Huang
3D Vision
Multi-modality Learning
Tianci Tang
Embodied AI
Diffusion Models
Yizhi Li
Multi-modality Learning
Computer Vision
Xuexiang Wen
Multi-modality Learning
Jiawu Zhang
Multi-modal Large Models for Logistics
Bocheng Hu
Motion Generation
Vision–Language Models (VLMs)
Vision–Language–Action Models (VLAs)
Jie Cao
Multi-modality Learning
Haonan Zhou
3D Scene Generation
Xiaohan Chen
Multi-modality Learning
Large Language Models (LLMs)
Alumni
Shengyu Hao
Multi-object Tracking
Representation Learning
Domain Adaptation
Xiaoyue Li
(Main Advisor: Mark Butala)
Image Generation
Image Reconstruction
Medical Image Inverse Problems
Shidong Cao
Generative Models
Multi-modality Learning
Graph Machine Learning
Yichen Ouyang [Web]
Generative Models
3D Vision
Multi-modality Learning
Meiqi Sun
Animal Action Recognition
Animal Pose Estimation
Xuechen Guo
Computer Vision
Multi-modality Learning
Jianshu Guo
Diffusion Models
Vision and Language