About Me

Hi, I am a third-year Ph.D. student at the Australian Institute for Machine Learning (AIML), the University of Adelaide, supervised by A/Prof. Qi Wu and Dr. Yicong Hong. I am a member of the V3A Lab. Prior to my Ph.D. studies, I received my Master’s degree from the Australian National University, where I was supervised by Prof. Stephen Gould. In 2025, I interned at Adobe Research, where I collaborated closely with Dr. Yicong Hong, Dr. Chongjian Ge, Dr. Hao Tan, and Dr. Tianyu Wang. I am currently a Research Intern at Qwen.

I build explainable and embodied AI systems for autonomous agents that can perceive, reason, and navigate the physical world. My work focuses on integrating perception, reasoning, and long-horizon memory into a unified embodied intelligence framework that enables continuous learning and interpretable decision-making. I develop such systems by leveraging large multimodal reasoning models as decision policies, together with generative world models that support interaction, learning, and long-term evolution.

Some topics that I currently focus on include:

  • World Modeling and Simulation with Scalable Image/Video Generation: SAR, LightMover
  • Large Embodied Reasoning Models with Context Management and Agentic RL: NavGPT, NavGPT-2
  • Embodied Navigation Foundation Models with Sim2Real Transferability: NaVid, NavFoM

News

  • 2026.01.05   I joined Qwen as a Research Intern working on VLA and VL post-training. Super excited to learn and work with the team!
  • 2025.06.25   SAME is accepted to ICCV 2025. Thanks to all collaborators.
  • 2025.04.14   I joined Adobe Research as a Research Intern working on text-to-video generation. Excited to work with the team in San Jose!
  • 2025.01.27   One paper is accepted to ICRA 2025. Congratulations to Zerui!
  • 2024.07.11   We are thrilled to see that @GoogleDeepMind shares the same perspective as our previous work NavGPT on instruction-following navigation agents and builds fascinating robots based on Gemini 1.5 Pro! [Details]
  • 2024.07.01   NavGPT-2 is accepted to ECCV 2024! Thanks to all collaborators.
  • 2024.05.14   NaVid is accepted to RSS 2024! Congratulations to Jiazhao, Kunyu and Rongtao!
  • 2023.12.09   Two papers are accepted to AAAI 2024. Congratulations and thanks to all collaborators.

Research

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu

International Conference on Computer Vision (ICCV), 2025


NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

European Conference on Computer Vision (ECCV), 2024


NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

AAAI Conference on Artificial Intelligence (AAAI), 2024


WebVLN: Vision-and-Language Navigation on Websites

Qi Chen, Dileepa Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu

AAAI Conference on Artificial Intelligence (AAAI), 2024


NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

Proceedings of Robotics: Science and Systems (RSS), 2024

Experience

Professional

Qwen, Alibaba Group

Research Intern

Jan 2026 - Present | Beijing

Adobe Research

Research Intern

Aug 2025 - Nov 2025 | San Jose, CA

Adobe Research

Research Intern

Apr 2025 - Jul 2025 | San Jose, CA

Teaching

Teaching Assistant
COMP8536 - Deep Learning
Australian National University | 2022

Master's Supervisor
COMP7205 - Individual Research Project on Embodied MLLM Agents Evaluation
University of Adelaide | 2025

Services

Conference Reviewer

Computer Vision
CVPR '24, '25, '26; ICCV '25
Machine Learning
NeurIPS '25; ICLR '25, '26
Natural Language Processing
ACL '25, '26; NAACL '25; EMNLP '24, '25
Artificial Intelligence
AAAI '25; MM '24
Robotics
ICRA '25; IROS '25

Journal Reviewer

TPAMI, TCSVT, RAL