The CVNext lab focuses on advancing general-purpose embodied intelligence, building upon foundations in long video understanding and reasoning in dynamic, complex scenes. The core objective is to develop open, adaptive embodied agents that tightly integrate environment perception, interactive reasoning, and personalized adaptation and decision-making. Ultimately, the research aims to establish both theoretical frameworks and practical systems for general and domain-specific embodied agents, contributing to scalable, transferable, and real-world embodied AI. Our main research directions include:
  • Interactive 3D Scene Reconstruction and Generation
  • Unified World-Reasoning-Action Modeling for Embodied Agents
  • Personalized Adaptation with Active Perception

Professor


Gaoang Wang [Web]

Assistant Professor

Office: C417, ZJUI Building
Email: [email protected]

Research Interests:

  • Visual Perception
  • Transfer Learning
  • Spatial Intelligence
  • Embodied Intelligence


News:

  • [Nov. 2025] One paper was accepted by IJCV, 2025.
  • [Nov. 2025] Three papers were accepted by AAAI 2026, including one oral paper.
  • [Oct. 2025] We received an Outstanding Paper Award at the ICCV KnowledgeMR workshop.
  • [Aug. 2025] One paper was accepted by TPAMI, 2025.
  • [Jul. 2025] One paper was accepted by ECAI 2025.
  • [Jul. 2025] One paper was accepted by ICCV Findings Workshop, 2025.
  • [Jun. 2025] One paper was accepted by TIP, 2025.
  • [Jun. 2025] One paper was accepted by ICCV 2025.
  • [May 2025] One paper was accepted by Information Fusion, 2025.
  • [May 2025] One paper was accepted by ICML 2025.
  • [Apr. 2025] One paper was accepted by CVPR Workshop on Urban Scene Modeling, 2025.
  • [Mar. 2025] One paper was accepted by TVCG, 2025.
  • [Feb. 2025] One paper was accepted by TCSVT, 2025.
  • [Feb. 2025] One paper was accepted by CVPR 2025.
  • [Jan. 2025] One paper was accepted by MIA, 2025.
  • [Jan. 2025] One paper was accepted by TMM, 2025.
  • [Dec. 2024] Two papers were accepted by ICASSP 2025.
  • [Dec. 2024] One paper was accepted by AAAI 2025.
  • [Sep. 2024] One paper was accepted by NeurIPS 2024.
  • [Jul. 2024] One paper was accepted by MICCAI Workshop on Deep Generative Models, 2024.
  • [Jun. 2024] Two papers were accepted by ACM MM 2024.
  • [Jun. 2024] One paper was accepted by ECCV 2024.
  • [Jun. 2024] One paper was accepted by PRCV 2024.
  • [Apr. 2024] One paper was accepted by TMM, 2024.
  • [Mar. 2024] "Long-term Video Question Answering Competition (LOVEU@CVPR'24 Track 1)" was released. More details can be found here.
  • [Mar. 2024] One paper was accepted by ICLR Workshop on LLM Agents, 2024.
  • [Feb. 2024] Three papers were accepted by CVPR 2024.
  • [Dec. 2023] Two papers were accepted by ICASSP 2024.
  • [Dec. 2023] Two papers were accepted by AAAI 2024.
  • [Dec. 2023] One paper was accepted by Neurocomputing, 2023.
  • [Sep. 2023] One paper was accepted by IJCV, 2023.
  • [Sep. 2023] One paper was accepted by TMM, 2023.
  • [Aug. 2023] One paper was accepted by PRCV 2023.
  • [Jul. 2023] Three papers were accepted by ICCV 2023.
  • [Jun. 2023] One paper was accepted by MICCAI 2023.
  • [May 2023] One paper was accepted by Findings of ACL 2023.
  • [Apr. 2023] Two papers were accepted by IJCAI 2023.
  • [Apr. 2023] One paper was accepted by the CVPR Workshop on Computer Vision for Fashion, Art, and Design, 2023.
  • [Mar. 2023] One paper was accepted by ICME 2023.
  • [Mar. 2023] One paper was accepted by ICASSP 2023.
  • [Feb. 2023] One paper was accepted by CVPR 2023.
  • [Feb. 2023] One paper was accepted by TAI, 2023.
  • [Nov. 2022] One paper was accepted by TMI, 2022.
  • [Jul. 2022] One paper was accepted by ECCV 2022.
  • [Apr. 2022] One paper was accepted by the 2nd CVPR Workshop on Sketch-Oriented Deep Learning, 2022.
  • [Mar. 2022] One paper was accepted by ICME 2022.
  • [Jan. 2022] One paper was accepted by TMM, 2022.
  • [Aug. 2021] One paper was accepted by CVIU, 2021.
  • [Jul. 2021] One paper was accepted by ICCV 2021.
  • [Apr. 2021] One paper was accepted by the CVPR Workshop on Autonomous Driving, 2021.
  • [Jan. 2021] ROD2021 Challenge @ICMR 2021 was released.



    Ph.D. Students


    [email protected]
    Multi-modality Learning
    Video Understanding
    Vision and Language
    Wenhao Hu [Web]
    [email protected]
    3D Vision
    Generative Models
    Anomaly Detection
    Zhonghan Zhao
    [email protected]
    Embodied AI
    Reinforcement Learning
    In-context Learning

    Chenlu Zhan
    (Main Advisor: Hongwei Wang)
    [email protected]
    Medical Vision Language
    Medical Multimodality
    Visual-Language Pretraining
    Wendi Hu
    [email protected]
    Multi-object Tracking
    Kewei Wei
    [email protected]
    Multi-modality Learning

    Tielong Cai
    [email protected]
    Generative Models
    Embodied AI



    Master Students


    Enxin Song [Web]
    [email protected]
    Video Understanding
    Image Generation
    Xuan Wang
    [email protected]
    Multi-modality Learning
    Embodied AI
    Fang Liang
    3D Vision
    Image Reconstruction

    Dongping Li
    [email protected]
    Multi-modality Learning
    Active Perception
    Unified Model
    Junsheng Huang
    [email protected]
    3D Vision
    Multi-modality Learning
    Tianci Tang
    [email protected]
    Embodied AI
    Diffusion Model

    Yizhi Li
    [email protected]
    Multi-modality Learning
    Computer Vision
    Xuexiang Wen
    [email protected]
    Multi-modality Learning
    Jiawu Zhang
    [email protected]
    Multi-modal Logistics Large Models

    Bocheng Hu
    [email protected]
    Motion Generation
    Vision–Language Models (VLMs)
    Vision–Language–Action Models (VLAs)
    Jie Cao
    [email protected]
    Multi-modality Learning
    Haonan Zhou
    [email protected]
    3D Scene Generation

    Xiaohan Chen
    [email protected]
    Multi-modality Learning
    Large Language Models (LLMs)



    Alumni


    Shengyu Hao
    [email protected]
    Multi-object Tracking
    Representation Learning
    Domain Adaptation
    Xiaoyue Li
    (Main Advisor: Mark Butala)
    [email protected]
    Image Generation
    Image Reconstruction
    Medical Image Inverse Problems

    Shidong Cao
    [email protected]
    Generative Models
    Multi-modality Learning
    Graph Machine Learning
    Yichen Ouyang [Web]
    [email protected]
    Generative Models
    3D Vision
    Multi-modality Learning
    Meiqi Sun
    [email protected]
    Animal Action Recognition
    Animal Pose Estimation

    Xuechen Guo
    [email protected]
    Computer Vision
    Multi-modality Learning
    Jianshu Guo
    [email protected]
    Diffusion Model
    Vision Language
    Chang Su
    [email protected]
    Smart City

    Yichen Xu
    Wenhao Chai (Alumni) [Web]
    [email protected]
    Multi-modality Representation
    Unified Perception Model
    Embodied Intelligence
    Jie Deng
    [email protected]
    3D Scene Generation