ABot-N0

A Unified VLA Foundation Model for Versatile Embodied Navigation

Paper | Project Page

AMAP CV Lab


🔥 News

📖 Introduction

▶️ intro.mp4

ABot-N0 is a unified Vision-Language-Action (VLA) foundation model that achieves a "Grand Unification" across 5 core embodied navigation tasks:

Task | Description
🎯 Point-Goal | Reach precise metric coordinates for robust locomotion and obstacle avoidance
🔍 Object-Goal | Search for and navigate to a specific object category in unseen environments
📝 Instruction-Following | Execute complex natural language navigation instructions
📍 POI-Goal | Navigate to specific Points of Interest and their physical entrances
🚶 Person-Following | Real-time tracking and following of dynamic human targets

ABot-N0 adopts a hierarchical "Brain-Action" architecture:

  • Universal Multi-Modal Encoder: unifies heterogeneous inputs (RGB, visual history, goals) into a shared latent space
  • Cognitive Brain: a pre-trained LLM (Qwen3-4B) for deep semantic understanding and spatial reasoning
  • Action Expert: a Flow Matching-based trajectory generator for precise, continuous control
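
The dataflow through these three components can be sketched end to end. Everything below is illustrative only: the latent dimension, the mean-pooled "plan" vector, and all module internals are placeholders standing in for the real networks (the code is not yet released); only the component names follow this README.

```python
import numpy as np

LATENT_DIM = 16  # assumed toy latent size, not the model's actual width

def multimodal_encoder(rgb, history, goal):
    # Unify heterogeneous inputs into one shared latent token sequence.
    tokens = [rgb.reshape(-1, LATENT_DIM),
              history.reshape(-1, LATENT_DIM),
              goal.reshape(-1, LATENT_DIM)]
    return np.concatenate(tokens, axis=0)

def cognitive_brain(latents):
    # Stand-in for the pre-trained LLM: reduce tokens to one plan vector.
    return latents.mean(axis=0)

def action_expert(plan):
    # Stand-in for the Flow Matching generator: 5 waypoints of (x, y, yaw).
    return np.tile(plan[:3], (5, 1))

rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, LATENT_DIM))      # current RGB tokens
history = rng.standard_normal((2, LATENT_DIM))  # visual history tokens
goal = rng.standard_normal((1, LATENT_DIM))     # goal embedding

waypoints = action_expert(cognitive_brain(multimodal_encoder(rgb, history, goal)))
print(waypoints.shape)  # (5, 3)
```

The point of the sketch is the contract between stages: the encoder normalizes everything to one token space, the brain compresses it into a plan, and the action expert turns the plan into a short waypoint horizon.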

📊 Key Highlights

Unified Tasks | 5 core navigation paradigms in a single model
SOTA Benchmarks | New state of the art on 7 authoritative benchmarks
Data Scale | 16.9M expert trajectories + 5.0M reasoning samples
3D Scenes | 7,802 high-fidelity scenes covering 10.3 km²
Real-World Deployment | Deployed on a Unitree Go2 with an NVIDIA Jetson Orin NX, achieving 2 Hz VLA inference

πŸ—οΈ Architecture

ABot-N0 follows a hierarchical "Brain-Action" design comprising three pillars:

  1. Universal Multi-Modal Encoder: Supports flexible vision inputs (panoramic / front-view), heterogeneous goal definitions (text-based semantic goals & point-based geometric goals), and reasoning task encoding.

  2. Cognitive Brain: Built upon a pre-trained LLM, it supports dual-mode operation β€” a Reasoning Head for high-level semantic understanding and an Action Head for motion planning.

  3. Action Expert: Employs Flow Matching to generate multi-modal trajectory distributions (5 waypoints with position + yaw), enabling precise continuous control.
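
Flow Matching inference of the kind described for the Action Expert can be sketched as Euler integration of a velocity field from noise toward a trajectory. This is a minimal sketch under stated assumptions: the closed-form `velocity_field` below is a toy stand-in for the learned, brain-conditioned network, and the step count is arbitrary; only the output shape (5 waypoints of position + yaw) follows the README.

```python
import numpy as np

N_WAYPOINTS = 5  # 5 waypoints, per the Action Expert description
DIM = 3          # (x, y, yaw) per waypoint
N_STEPS = 10     # Euler integration steps (assumed)

def velocity_field(traj, t, target):
    # Toy stand-in for the learned velocity network: flows the current
    # noisy trajectory toward a conditioning "target" trajectory.
    return target - traj

def sample_trajectory(target, rng):
    # Start from Gaussian noise and integrate the field from t=0 to t=1.
    traj = rng.standard_normal((N_WAYPOINTS, DIM))
    dt = 1.0 / N_STEPS
    for i in range(N_STEPS):
        traj = traj + dt * velocity_field(traj, i * dt, target)
    return traj

rng = np.random.default_rng(0)
target = np.zeros((N_WAYPOINTS, DIM))  # placeholder conditioning signal
waypoints = sample_trajectory(target, rng)
print(waypoints.shape)  # (5, 3)
```

Because the generator integrates a learned field rather than regressing a single answer, it can represent multi-modal trajectory distributions: different noise draws flow to different valid paths.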

📦 Data Engine

ABot-N0 Data Engine Video

▶️ Click to watch the Data Engine video

The ABot-N0 Data Engine is, to our knowledge, the largest embodied navigation data pipeline, integrating three synergistic layers:

  • High-Fidelity 3D Scene Ecosystem: 7,802 scenes (indoor: homes, offices, malls, stations; outdoor: intersections, parks, virtual city) covering 10.3 km²
  • Universal Trajectories Dataset: ~16.9M expert trajectories across 5 navigation paradigms
  • Cognitive Reasoning Dataset: ~5.0M reasoning samples grounding decision-making in spatial-social logic

πŸ‹οΈ Training Recipe

ABot-N0 is trained via a three-stage curriculum:

  1. Phase 1 (Cognitive Warm-up): Fine-tune the LLM backbone on reasoning tasks to learn "what to see" and "how to reason"
  2. Phase 2 (Unified Sensorimotor SFT): Joint multi-task training with dual-head optimization (AR reasoning + Flow Matching actions)
  3. Phase 3 (SAFE-GRPO): Post-training value alignment via socially aware reinforcement learning to enforce social compliance
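
The Phase-2 dual-head objective pairs an autoregressive loss with a Flow Matching regression. The sketch below is a hypothetical rendering of that combination: the shapes, the unit loss weight, and the linear-interpolation flow target `v* = action - noise` are assumptions for illustration, not the released training code.

```python
import numpy as np

def ar_loss(logits, targets):
    # Token-level cross-entropy for the Reasoning Head (AR branch).
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def flow_matching_loss(pred_velocity, noise, action):
    # Regress the predicted velocity onto the interpolation target.
    target_velocity = action - noise
    return ((pred_velocity - target_velocity) ** 2).mean()

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 32))       # 8 tokens, toy 32-way vocab
targets = rng.integers(0, 32, size=8)
noise = rng.standard_normal((5, 3))         # noised action sample
action = rng.standard_normal((5, 3))        # expert 5-waypoint action
pred_v = action - noise                     # a perfect prediction: zero loss

total = ar_loss(logits, targets) + 1.0 * flow_matching_loss(pred_v, noise, action)
print(round(flow_matching_loss(pred_v, noise, action), 6))  # 0.0
```

Joint optimization of both heads is what lets a single backbone serve semantic reasoning and continuous control at once.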

🤖 Agentic Navigation System

Beyond the foundation model, we propose an Agentic Navigation System for real-world deployment:

  • Agentic Planner: VLM-powered intent decomposition with chain-of-thought (CoT) reasoning and closed-loop self-reflection
  • Topo-Memory (Map-as-Memory): hierarchical topological memory for cross-scale spatial knowledge (Block → Road → Function → Object/POI layers)
  • Neural Controller: high-speed reactive control (>10 Hz) bridging strategic waypoints and real-time execution
  • Hardware: Unitree Go2 quadrupedal robot + NVIDIA Jetson Orin NX (157 TOPS)
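
A "Map-as-Memory" hierarchy of this kind can be sketched as layered graphs with parent links between scales. Only the layer names (Block → Road → Function → Object/POI) come from this README; the node/edge structure, the `TopoMemory` class, and its query API are assumptions invented for illustration.

```python
from collections import defaultdict

# Coarse-to-fine layer names, as listed above.
LAYERS = ["block", "road", "function", "object_poi"]

class TopoMemory:
    def __init__(self):
        # One adjacency map of named nodes per layer.
        self.graphs = {layer: defaultdict(set) for layer in LAYERS}
        self.parent = {}  # child node -> parent node in the coarser layer

    def add_node(self, layer, name, parent=None):
        self.graphs[layer].setdefault(name, set())
        if parent is not None:
            self.parent[name] = parent

    def connect(self, layer, a, b):
        # Undirected intra-layer edge (e.g. adjacent roads).
        self.graphs[layer][a].add(b)
        self.graphs[layer][b].add(a)

    def coarse_path(self, name):
        # Walk up the hierarchy from a fine-grained node to its block.
        path = [name]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path

mem = TopoMemory()
mem.add_node("block", "campus")
mem.add_node("road", "main_street", parent="campus")
mem.add_node("function", "cafe", parent="main_street")
mem.add_node("object_poi", "cafe_entrance", parent="cafe")
print(mem.coarse_path("cafe_entrance"))
# ['cafe_entrance', 'cafe', 'main_street', 'campus']
```

The coarse-to-fine links are what make long-horizon missions tractable: the planner can reason at the block or road scale and only drop to POI-level nodes near the goal.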

🎬 Video Demos

🌍 Long-Horizon Agentic Missions

▶️ Indoor Long-Range Mission
▶️ Outdoor Long-Range Mission

🤝 Real-World Applications

▶️ Guide Dog Assistance
▶️ Interactive Companion

🎯 Single-Task Capabilities

▶️ Point-Goal Navigation
▶️ Object-Goal Navigation
▶️ Instruction-Following
▶️ POI-Goal Navigation
▶️ Person-Following

📈 Benchmark Results

ABot-N0 achieves new SOTA on 7 benchmarks:

  • CityWalker (Point-Goal, Open-Loop)
  • SocNav (Point-Goal, Closed-Loop)
  • VLN-CE R2R (Instruction-Following)
  • VLN-CE RxR (Instruction-Following)
  • HM3D-OVON (Object-Goal)
  • BridgeNav (POI-Goal)
  • EVT-Bench (Person-Following)

📅 Release Plan

We are committed to progressively open-sourcing resources to support the research community:

Phase | Content | Status
Phase 1 | Technical Report | ✅ Released
Phase 2 | Data | 🔜 Coming Soon
Phase 3 | Code | 🔜 Coming Soon

⚠️ Note on Data Release: Due to privacy and security concerns associated with certain data, we will conduct thorough data cleaning and de-identification before releasing a compliant version for community research use. We prioritize data compliance over release speed; thank you for your patience and understanding.

📄 Citation

If you find this work useful, please consider citing:

@misc{chu2026abotn0technicalreportvla,
      title={ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation}, 
      author={Zedong Chu and Shichao Xie and Xiaolong Wu and Yanfen Shen and Minghua Luo and Zhengbo Wang and Fei Liu and Xiaoxu Leng and Junjun Hu and Mingyang Yin and Jia Lu and Yingnan Guo and Kai Yang and Jiawei Han and Xu Chen and Yanqing Zhu and Yuxiang Zhao and Xin Liu and Yirong Yang and Ye He and Jiahang Wang and Yang Cai and Tianlin Zhang and Li Gao and Liu Liu and Mingchao Sun and Fan Jiang and Chiyu Wang and Zhicheng Liu and Hongyu Pan and Honglin Han and Zhining Gu and Kuan Yang and Jianfang Zhang and Di Jing and Zihao Guan and Wei Guo and Guoqing Liu and Di Yang and Xiangpo Yang and Menglin Yang and Hongguang Xing and Weiguo Li and Mu Xu},
      year={2026},
      eprint={2602.11598},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.11598}, 
}

📜 License

This project is released under the Apache 2.0 License.

πŸ™ Acknowledgments

This work is developed by AMAP CV Lab. See the Technical Report for a full list of contributors.
