🚀 SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Official implementation of the IROS 2025 paper:
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi*, Zerui Li*, Wenqi Lyu, Jiatong Xia, Feras Dayoub,
Yanyuan Qiao§, Qi Wu†
Vision-and-Language Navigation (VLN) in continuous environments requires agents to interpret natural language instructions while navigating unconstrained 3D spaces. Existing VLN-CE frameworks rely on a two-stage approach: a waypoint predictor to generate waypoints and a navigator to execute movements. However, current waypoint predictors struggle with spatial awareness, while navigators lack historical reasoning and backtracking capabilities, limiting adaptability.
We propose a zero-shot VLN-CE framework integrating an enhanced waypoint predictor with a Multi-modal Large Language Model (MLLM)-based navigator. Our predictor employs a stronger vision encoder, masked cross-attention fusion, and an occupancy-aware loss for better waypoint quality. The navigator incorporates history-aware reasoning and adaptive path planning with backtracking, improving robustness. Experiments on R2R-CE and MP3D benchmarks show our method achieves state-of-the-art (SOTA) performance in zero-shot settings, demonstrating competitive results compared to fully supervised methods.
🤖 We deploy our method on a TurtleBot 4 equipped with an OAK-D Pro camera, demonstrating its adaptability through real-world validation.
Figure: SmartWay architecture, combining enhanced waypoint prediction and backtracking.
# conda install
conda create -n smartway python==3.8.20
conda activate smartway
# pytorch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
# habitat-sim
wget https://anaconda.org/aihabitat/habitat-sim/0.1.7/download/linux-64/habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
conda install habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
# habitat-lab
git clone --branch v0.1.7 [email protected]:facebookresearch/habitat-lab.git
cd habitat-lab
python setup.py develop --all # install habitat and habitat_baselines
cd ..
# Adapt requirements.txt from Discrete-Continuous-VLN repo
python -m pip install -r requirements.txt
pip install webdataset
pip install openai tenacity timm fairscale

This project builds upon Discrete-Continuous-VLN. Please refer to their repository to set up the required conda environment and dependencies. SmartWay runs under Python 3.8.20.
ℹ️ Note: All experiments are conducted using Habitat v0.1.7.
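Version drift between torch, habitat-sim, and Python is a common source of setup failures. The helper below is a hypothetical sketch (not part of this repo) that compares the version strings reported by `pip list` against the pins used in the commands above:

```python
# Hypothetical pre-flight helper (not part of the SmartWay repo):
# compare installed package versions against the pins in this README.
PINNED = {
    "torch": "2.1.1",
    "torchvision": "0.16.1",
    "torchaudio": "2.1.1",
    "habitat-sim": "0.1.7",
}

def version_mismatches(installed):
    """Return {package: (installed, expected)} for every pinned package
    whose installed version does not start with the expected version."""
    bad = {}
    for pkg, want in PINNED.items():
        have = installed.get(pkg)
        if have is None or not have.startswith(want):
            bad[pkg] = (have, want)
    return bad

# Example with the versions a correct CUDA 12.1 setup would report:
print(version_mismatches({
    "torch": "2.1.1+cu121",
    "torchvision": "0.16.1+cu121",
    "torchaudio": "2.1.1+cu121",
    "habitat-sim": "0.1.7",
}))  # → {}
```

A local-build suffix such as `+cu121` is accepted because only the version prefix is compared.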
This project relies on R2R VLN-CE data and Matterport3D scene assets. Please follow the steps below to prepare the dataset correctly.
Download the preprocessed R2R VLN-CE dataset from the link below:
🔗 datasets 👉 Google Drive Link
After downloading, place the dataset in:
data/

Download the Matterport3D dataset from the official website:
You may need to apply for dataset access first.
Place the extracted scene data under:
data/scene_datasets/mp3d/

Each scene directory should follow the structure below.
After completing the above steps, your directory layout should look like this:
data/
├── datasets/
│   ├── R2R_VLNCE_v1-2_preprocessed/
│   └── R2R_VLNCE_v1-2_preprocessed_BERTidx/
├── pretrained_models/
│   └── ddppo-models/
└── scene_datasets/
    └── mp3d/
        └── {scene_id}/
            ├── {scene_id}.glb
            ├── {scene_id}_semantic.ply
            ├── {scene_id}.house
            └── {scene_id}.navmesh
📌 Note:
- {scene_id} refers to a Matterport3D scene identifier (e.g., 17DRP5sb8fy).
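An incomplete scene directory usually only surfaces as an error deep inside Habitat at load time. The hypothetical snippet below (not part of the repo) checks each scene directory under `data/scene_datasets/mp3d/` for the four files listed above:

```python
# Hypothetical sanity check (not part of the SmartWay repo): verify each
# MP3D scene directory contains the four files the layout above requires.
import pathlib

REQUIRED = ("{sid}.glb", "{sid}_semantic.ply", "{sid}.house", "{sid}.navmesh")

def missing_scene_files(mp3d_root):
    """Yield (scene_id, filename) for every required file that is absent."""
    root = pathlib.Path(mp3d_root)
    for scene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        sid = scene_dir.name
        for tmpl in REQUIRED:
            fname = tmpl.format(sid=sid)
            if not (scene_dir / fname).exists():
                yield sid, fname

# Demo against a throwaway layout: one complete scene, one incomplete.
import tempfile
tmp = pathlib.Path(tempfile.mkdtemp())
for sid, files in {"sceneA": REQUIRED, "sceneB": REQUIRED[:2]}.items():
    d = tmp / sid
    d.mkdir()
    for tmpl in files:
        (d / tmpl.format(sid=sid)).touch()
print(list(missing_scene_files(tmp)))
# → [('sceneB', 'sceneB.house'), ('sceneB', 'sceneB.navmesh')]
```

In practice you would call `missing_scene_files("data/scene_datasets/mp3d")` from the repo root and expect an empty result.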
For more details on dataset preparation and environment setup, please refer to:
🔗 Open-Nav Repository 👉 https://github.com/YanyuanQiao/Open-Nav
🔗 DC-VLN Repository 👉 https://github.com/YicongHong/Discrete-Continuous-VLN
Download and place the following pretrained models:
- Waypoint Predictor → Save to waypoint_predictor/checkpoints/
- Depth Encoder (ResNet-50, gibson-2plus) → Save to data/pretrained_models/ddppo-models/
- RAM+ (ram_plus_swin_large_14m.pth) → Save to the root repo directory /
- Recognize Anything (please install this dependency before using the navigation module) → Save to the root repo directory /
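Before launching an evaluation, the checkpoint paths from the list above can be pre-flight checked with a small hypothetical helper (not part of the repo; only the paths named above are assumed):

```python
# Hypothetical pre-flight check (not part of the SmartWay repo): confirm
# the pretrained-model paths listed above exist under the repo root.
import os

REQUIRED_MODELS = [
    "waypoint_predictor/checkpoints",       # waypoint predictor weights
    "data/pretrained_models/ddppo-models",  # ResNet-50 depth encoder
    "ram_plus_swin_large_14m.pth",          # RAM+ checkpoint
]

def missing_models(repo_root="."):
    """Return the listed paths that do not exist under repo_root."""
    return [p for p in REQUIRED_MODELS
            if not os.path.exists(os.path.join(repo_root, p))]
```

Calling `missing_models()` from the repo root should return an empty list once all downloads are in place.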
To run:

bash eval.sh

We thank Discrete-Continuous-VLN (from which this repo is adapted), Dinov2, OpenNav, Recognize-Anything, VLN-CE, MapGPT, and Waypoint Predictor for their inspiring work.
