
🚀 SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation

arXiv IROS 2025

Official Implementation of IROS 2025 Paper

SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi*, Zerui Li*, Wenqi Lyu, Jiatong Xia, Feras Dayoub,
Yanyuan Qiao§, Qi Wu

📄 Paper · 🌐 Project Page


🧠 Abstract

Vision-and-Language Navigation (VLN) in continuous environments requires agents to interpret natural language instructions while navigating unconstrained 3D spaces. Existing VLN-CE frameworks rely on a two-stage approach: a waypoint predictor to generate waypoints and a navigator to execute movements. However, current waypoint predictors struggle with spatial awareness, while navigators lack historical reasoning and backtracking capabilities, limiting adaptability.

We propose a zero-shot VLN-CE framework integrating an enhanced waypoint predictor with a Multi-modal Large Language Model (MLLM)-based navigator. Our predictor employs a stronger vision encoder, masked cross-attention fusion, and an occupancy-aware loss for better waypoint quality. The navigator incorporates history-aware reasoning and adaptive path planning with backtracking, improving robustness. Experiments on R2R-CE and MP3D benchmarks show our method achieves state-of-the-art (SOTA) performance in zero-shot settings, demonstrating competitive results compared to fully supervised methods.

🤖 We deploy our method on a TurtleBot 4 equipped with an OAK-D Pro camera, demonstrating its adaptability through real-world validation.
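To make the two-stage design concrete, here is a toy sketch of a predictor/navigator loop that backtracks out of dead ends. All names (`navigate`, `score_fn`, the toy graph) are illustrative assumptions, not the repo's actual API:

```python
# Toy sketch of the two-stage loop: a waypoint predictor proposes candidate
# nodes, a navigator scores them against the instruction, and the agent
# backtracks one node when it hits a dead end. Illustrative only.

def navigate(score_fn, graph, start, goal, max_steps=10):
    """score_fn stands in for the MLLM navigator's instruction-grounded scoring."""
    path = [start]
    visited = {start}
    for _ in range(max_steps):
        cur = path[-1]
        if cur == goal:                 # reached the goal: stop
            break
        # "waypoint predictor": unvisited neighbours of the current node
        candidates = [n for n in graph[cur] if n not in visited]
        if not candidates:              # dead end: backtrack one node
            path.pop()
            if not path:
                break
            continue
        nxt = max(candidates, key=score_fn)   # navigator picks the best waypoint
        visited.add(nxt)
        path.append(nxt)
    return path

# 'b' scores highest but is a dead end, forcing a backtrack through 'a'.
graph = {"a": ["b", "c"], "b": [], "c": ["goal"], "goal": []}
scores = {"b": 2.0, "c": 1.0, "goal": 3.0}
print(navigate(scores.get, graph, "a", "goal"))  # → ['a', 'c', 'goal']
```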


📦 Overview

SmartWay pipeline overview

Figure: SmartWay architecture, combining enhanced waypoint prediction and backtracking.


📦 Environment Setup

# conda install
conda create -n smartway python==3.8.20
conda activate smartway

# pytorch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121

# habitat-sim
wget https://anaconda.org/aihabitat/habitat-sim/0.1.7/download/linux-64/habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
conda install habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2

# habitat-lab
git clone --branch v0.1.7 [email protected]:facebookresearch/habitat-lab.git
cd habitat-lab
python setup.py develop --all # install habitat and habitat_baselines
cd ..

# Adapt requirements.txt from Discrete-Continuous-VLN repo
python -m pip install -r requirements.txt
pip install webdataset
pip install openai tenacity timm fairscale

This project builds upon Discrete-Continuous-VLN. Please refer to their repository to set up the required conda environment and dependencies. SmartWay runs under Python 3.8.20.

ℹ️ Note: All experiments are conducted using Habitat v0.1.7.
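As an optional convenience (not part of the repo), a small helper can confirm the installed packages match the pins above, ignoring CUDA build suffixes such as `+cu121`:

```python
# Optional sanity check (not part of the repo): compare installed package
# versions against the pins above, ignoring local build suffixes.

def version_tuple(v):
    """'2.1.1+cu121' -> (2, 1, 1); drops local build suffixes like '+cu121'."""
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

def matches_pin(actual, pinned):
    return version_tuple(actual) == version_tuple(pinned)

print(matches_pin("2.1.1+cu121", "2.1.1"))  # → True

# Usage inside the smartway env (requires torch installed):
#   import torch; assert matches_pin(torch.__version__, "2.1.1")
```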

📁 Dataset Preparation

This project relies on R2R VLN-CE data and Matterport3D scene assets. Please follow the steps below to prepare the dataset correctly.


1️⃣ Download R2R VLN-CE Dataset

Download the preprocessed R2R VLN-CE dataset from the link below:

🔗 datasets 👉 Google Drive Link

After downloading, place the dataset in:

data/datasets/

2️⃣ Download Matterport3D Scene Data

Download the Matterport3D dataset from the official website:

🔗 https://niessner.github.io/Matterport/

You may need to apply for dataset access first.

Place the extracted scene data under:

data/scene_datasets/mp3d/

Each scene directory should follow the structure below.


3️⃣ Expected Directory Structure

After completing the above steps, your directory layout should look like this:

data/
├── datasets/
│   ├── R2R_VLNCE_v1-2_preprocessed/
│   └── R2R_VLNCE_v1-2_preprocessed_BERTidx/
├── pretrained_models/
│   └── ddppo-models/
└── scene_datasets/
    └── mp3d/
        └── {scene_id}/
            ├── {scene_id}.glb
            ├── {scene_id}_semantic.ply
            ├── {scene_id}.house
            └── {scene_id}.navmesh
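A quick way to catch incomplete scene downloads is to check each scene directory against the layout above. This helper is an illustrative assumption (not shipped with the repo); the scene id `XYZ` in the demo is made up:

```python
# Optional helper (not part of the repo): report which required files are
# missing from data/scene_datasets/mp3d/{scene_id}/.
import os
import tempfile

REQUIRED = ("{sid}.glb", "{sid}_semantic.ply", "{sid}.house", "{sid}.navmesh")

def missing_scene_files(data_root, scene_id):
    """Return the required files absent from <data_root>/scene_datasets/mp3d/<scene_id>/."""
    scene_dir = os.path.join(data_root, "scene_datasets", "mp3d", scene_id)
    names = [t.format(sid=scene_id) for t in REQUIRED]
    return [n for n in names if not os.path.isfile(os.path.join(scene_dir, n))]

# Demo against a throwaway tree with one file deliberately missing.
with tempfile.TemporaryDirectory() as root:
    d = os.path.join(root, "scene_datasets", "mp3d", "XYZ")
    os.makedirs(d)
    for n in ("XYZ.glb", "XYZ.house", "XYZ.navmesh"):
        open(os.path.join(d, n), "w").close()
    print(missing_scene_files(root, "XYZ"))  # → ['XYZ_semantic.ply']
```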


📥 Pretrained Checkpoints

Download and place the following pretrained models:

🧩 Third party

  • Recognize Anything — Please install this dependency before using the navigation module.

    → Save the checkpoint to the repository root directory /

🚀 Run the repo

To run:

bash eval.sh

🙏 Acknowledgements

We thank Discrete-Continuous-VLN (from which this repository is adapted), DINOv2, OpenNav, Recognize-Anything, VLN-CE, MapGPT, and the Waypoint Predictor for their inspiring work.
