🚀 SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Official implementation of the IROS 2025 paper:
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi*, Zerui Li*, Wenqi Lyu, Jiatong Xia, Feras Dayoub,
Yanyuan Qiao§, Qi Wu†
Vision-and-Language Navigation (VLN) in continuous environments requires agents to interpret natural language instructions while navigating unconstrained 3D spaces. Existing VLN-CE frameworks rely on a two-stage approach: a waypoint predictor to generate waypoints and a navigator to execute movements. However, current waypoint predictors struggle with spatial awareness, while navigators lack historical reasoning and backtracking capabilities, limiting adaptability.
We propose a zero-shot VLN-CE framework integrating an enhanced waypoint predictor with a Multi-modal Large Language Model (MLLM)-based navigator. Our predictor employs a stronger vision encoder, masked cross-attention fusion, and an occupancy-aware loss for better waypoint quality. The navigator incorporates history-aware reasoning and adaptive path planning with backtracking, improving robustness. Experiments on R2R-CE and MP3D benchmarks show our method achieves state-of-the-art (SOTA) performance in zero-shot settings, demonstrating competitive results compared to fully supervised methods.
🤖 We deploy our method on a TurtleBot 4 equipped with an OAK-D Pro camera, demonstrating its adaptability through real-world validation.
Figure: SmartWay architecture, combining enhanced waypoint prediction and backtracking.
# conda install
conda create -n smartway python==3.8.20
conda activate smartway
# pytorch
pip install torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 --index-url https://download.pytorch.org/whl/cu121
# habitat-sim
wget https://anaconda.org/aihabitat/habitat-sim/0.1.7/download/linux-64/habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
conda install habitat-sim-0.1.7-py3.8_headless_linux_856d4b08c1a2632626bf0d205bf46471a99502b7.tar.bz2
# habitat-lab
git clone --branch v0.1.7 [email protected]:facebookresearch/habitat-lab.git
cd habitat-lab
python setup.py develop --all # install habitat and habitat_baselines
cd ..
# Adapt requirements.txt from Discrete-Continuous-VLN repo
python -m pip install -r requirements.txt
pip install webdataset
pip install openai tenacity timm fairscale

This project builds upon Discrete-Continuous-VLN. Please refer to their repository to set up the required conda environment and dependencies. SmartWay runs under Python 3.8.20.
ℹ️ Note: All experiments are conducted using Habitat v0.1.7.
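Version drift between torch, habitat-sim, and Python is a common source of setup failures. The helper below is a hypothetical sketch (not part of this repo) that compares the version strings reported by `pip list` against the pins used in the commands above:

```python
# Hypothetical pre-flight helper (not part of the SmartWay repo):
# compare installed package versions against the pins in this README.
PINNED = {
    "torch": "2.1.1",
    "torchvision": "0.16.1",
    "torchaudio": "2.1.1",
    "habitat-sim": "0.1.7",
}

def version_mismatches(installed):
    """Return {package: (installed, expected)} for every pinned package
    whose installed version does not start with the expected version."""
    bad = {}
    for pkg, want in PINNED.items():
        have = installed.get(pkg)
        if have is None or not have.startswith(want):
            bad[pkg] = (have, want)
    return bad

# Example with the versions a correct CUDA 12.1 setup would report:
print(version_mismatches({
    "torch": "2.1.1+cu121",
    "torchvision": "0.16.1+cu121",
    "torchaudio": "2.1.1+cu121",
    "habitat-sim": "0.1.7",
}))  # → {}
```

A local-build suffix such as `+cu121` is accepted because only the version prefix is compared.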
This project relies on R2R VLN-CE data and Matterport3D scene assets. Please follow the steps below to prepare the dataset correctly.
Download the preprocessed R2R VLN-CE dataset from the link below:
🔗 datasets 👉 Google Drive Link
After downloading, place the dataset in:
data/

Download the Matterport3D dataset from the official website:
You may need to apply for dataset access first.
Place the extracted scene data under:
data/scene_datasets/mp3d/

Each scene directory should follow the structure below.
After completing the above steps, your directory layout should look like this:
data/
├── datasets/
│   ├── R2R_VLNCE_v1-2_preprocessed/
│   └── R2R_VLNCE_v1-2_preprocessed_BERTidx/
├── pretrained_models/
│   └── ddppo-models/
└── scene_datasets/
    └── mp3d/
        └── {scene_id}/
            ├── {scene_id}.glb
            ├── {scene_id}_semantic.ply
            ├── {scene_id}.house
            └── {scene_id}.navmesh
📌 Note:
- {scene_id} refers to a Matterport3D scene identifier (e.g., 17DRP5sb8fy).
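An incomplete scene directory usually only surfaces as an error deep inside Habitat at load time. The hypothetical snippet below (not part of the repo) checks each scene directory under `data/scene_datasets/mp3d/` for the four files listed above:

```python
# Hypothetical sanity check (not part of the SmartWay repo): verify each
# MP3D scene directory contains the four files the layout above requires.
import pathlib

REQUIRED = ("{sid}.glb", "{sid}_semantic.ply", "{sid}.house", "{sid}.navmesh")

def missing_scene_files(mp3d_root):
    """Yield (scene_id, filename) for every required file that is absent."""
    root = pathlib.Path(mp3d_root)
    for scene_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        sid = scene_dir.name
        for tmpl in REQUIRED:
            fname = tmpl.format(sid=sid)
            if not (scene_dir / fname).exists():
                yield sid, fname

# Demo against a throwaway layout: one complete scene, one incomplete.
import tempfile
tmp = pathlib.Path(tempfile.mkdtemp())
for sid, files in {"sceneA": REQUIRED, "sceneB": REQUIRED[:2]}.items():
    d = tmp / sid
    d.mkdir()
    for tmpl in files:
        (d / tmpl.format(sid=sid)).touch()
print(list(missing_scene_files(tmp)))
# → [('sceneB', 'sceneB.house'), ('sceneB', 'sceneB.navmesh')]
```

In practice you would call `missing_scene_files("data/scene_datasets/mp3d")` from the repo root and expect an empty result.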
For more details on dataset preparation and environment setup, please refer to:
🔗 Open-Nav Repository 👉 https://github.com/YanyuanQiao/Open-Nav
🔗 DC-VLN Repository 👉 https://github.com/YicongHong/Discrete-Continuous-VLN
Download and place the following pretrained models:
- Waypoint Predictor → Save to waypoint_predictor/checkpoints/
- Depth Encoder (ResNet-50, gibson-2plus) → Save to data/pretrained_models/ddppo-models/
- RAM+ (ram_plus_swin_large_14m.pth) → Save to the root repo directory /
- Recognize Anything (please install this dependency before using the navigation module) → Save to the root repo directory /
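Before launching an evaluation, the checkpoint paths from the list above can be pre-flight checked with a small hypothetical helper (not part of the repo; only the paths named above are assumed):

```python
# Hypothetical pre-flight check (not part of the SmartWay repo): confirm
# the pretrained-model paths listed above exist under the repo root.
import os

REQUIRED_MODELS = [
    "waypoint_predictor/checkpoints",       # waypoint predictor weights
    "data/pretrained_models/ddppo-models",  # ResNet-50 depth encoder
    "ram_plus_swin_large_14m.pth",          # RAM+ checkpoint
]

def missing_models(repo_root="."):
    """Return the listed paths that do not exist under repo_root."""
    return [p for p in REQUIRED_MODELS
            if not os.path.exists(os.path.join(repo_root, p))]
```

Calling `missing_models()` from the repo root should return an empty list once all downloads are in place.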
To run:

bash eval.sh

We thank Discrete-Continuous-VLN (from which this repo is adapted), Dinov2, OpenNav, Recognize-Anything, VLN-CE, MapGPT, and Waypoint Predictor for their inspiring work.
