We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and the minimal structural priors about the environment available to guide navigation. To confront these challenges, we propose the Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria of decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it via superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12% and 13% in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
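The constraint-aware sub-instruction switching described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea behind CSM, not the actual CA-Nav code: the class names, the `completed` predicate, and the toy constraints are all our own.

```python
# Hypothetical sketch of CSM-style constraint-aware switching.
# A sub-instruction is "completed" when all of its constraints hold,
# at which point the manager advances to the next sub-instruction.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubInstruction:
    text: str
    # each constraint maps a name to a predicate over the current observation
    constraints: Dict[str, Callable[[dict], bool]]

    def completed(self, obs: dict) -> bool:
        return all(check(obs) for check in self.constraints.values())


class ConstraintAwareManager:
    """Tracks progress through decomposed sub-instructions (CSM-style)."""

    def __init__(self, sub_instructions: List[SubInstruction]):
        self.subs = sub_instructions
        self.idx = 0

    def current(self) -> SubInstruction:
        return self.subs[self.idx]

    def step(self, obs: dict) -> SubInstruction:
        # switch forward only while the active sub-instruction's constraints hold
        while self.idx < len(self.subs) - 1 and self.subs[self.idx].completed(obs):
            self.idx += 1
        return self.subs[self.idx]


# toy usage with two made-up sub-instructions
subs = [
    SubInstruction("exit the bedroom",
                   {"at_door": lambda o: o.get("near_door", False)}),
    SubInstruction("walk to the sofa",
                   {"sees_sofa": lambda o: "sofa" in o.get("objects", [])}),
]
mgr = ConstraintAwareManager(subs)
```

In the real system, the constraints come from an LLM's decomposition of the instruction and are checked against perception outputs; here they are simple dictionary lookups purely for illustration.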
- Create a virtual environment. We develop this project with Python 3.8:

```bash
conda create -n CA-Nav python==3.8
conda activate CA-Nav
```
- Install `habitat-sim` v0.1.7 for a machine with multiple GPUs or without an attached display (i.e. a cluster):

```bash
git clone https://github.com/facebookresearch/habitat-sim.git
cd habitat-sim
git checkout tags/v0.1.7
pip install -r requirements.txt
python setup.py install --headless
```
- Install `habitat-lab` v0.1.7:

```bash
git clone https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
git checkout tags/v0.1.7
cd habitat_baselines/rl
vi requirements.txt  # delete the line `tensorflow==1.13.1`
cd ../../  # return to the habitat-lab directory
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
python setup.py develop --all  # installs habitat and habitat_baselines; if it fails, try again, as most failures are due to network problems
```
If you encounter problems and fail to install Habitat, please follow the official Habitat installation guide to install `habitat-lab` and `habitat-sim`. We use version `v0.1.7` in our experiments, the same as VLN-CE; please refer to the VLN-CE page for more details.
- Install Grounded-SAM and refine its `phrases2classes` function:

```bash
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
git checkout -q 57535c5a79791cb76e36fdb64975271354f10251
pip install -q -e .
pip install 'git+https://github.com/facebookresearch/segment-anything.git'
```
ATTENTION: We found that optimizing the phrase-to-class mapping logic in Grounded-SAM using minimum edit distance leads to more stable prediction outputs.
Install `nltk` and open `<YOUR PATH>/GroundingDINO/groundingdino/util/inference.py`:

```bash
pip install nltk
```
Find and comment out the original `phrases2classes` function at line 235, then add the refined version:
```python
# @staticmethod
# def phrases2classes(phrases: List[str], classes: List[str]) -> np.ndarray:
#     class_ids = []
#     for phrase in phrases:
#         try:
#             class_ids.append(classes.index(phrase))
#         except ValueError:
#             class_ids.append(None)
#     return np.array(class_ids)

from nltk.metrics import edit_distance

@staticmethod
def phrases2classes(phrases: List[str], classes: List[str]) -> np.ndarray:
    class_ids = []
    for phrase in phrases:
        if phrase in classes:
            class_ids.append(classes.index(phrase))
        else:
            # fall back to the class with minimum edit distance
            distances = np.array([edit_distance(phrase, class_name) for class_name in classes])
            idx = np.argmin(distances)
            class_ids.append(idx)
    return np.array(class_ids)
```
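To see why the fallback helps: GroundingDINO sometimes returns phrases that differ slightly from the prompt classes (plural forms, small misspellings), so an exact `classes.index` lookup fails and the original code emits `None`. The sketch below is a self-contained illustration of the same idea, using a stdlib Levenshtein implementation in place of `nltk.metrics.edit_distance` and plain lists instead of NumPy arrays.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    # classic dynamic-programming Levenshtein distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]


def map_phrases(phrases: List[str], classes: List[str]) -> List[int]:
    """Map each predicted phrase to the index of the closest class name."""
    ids = []
    for phrase in phrases:
        if phrase in classes:
            ids.append(classes.index(phrase))
        else:
            # nearest class by edit distance instead of returning None
            ids.append(min(range(len(classes)),
                           key=lambda k: edit_distance(phrase, classes[k])))
    return ids


classes = ["sofa", "chair", "table"]
print(map_phrases(["sofa", "chairs", "tabel"], classes))  # → [0, 1, 2]
```

With exact matching, `"chairs"` and `"tabel"` would both map to `None`; with the edit-distance fallback they resolve to `chair` and `table`.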
- Install other requirements:

```bash
git clone https://github.com/Chenkehan21/CA-Nav-code.git
cd CA-Nav-code
pip install -r requirements.txt
pip install -r requirements2.txt
```
R2R-CE

- Instructions: Download the `R2R_VLNCE_v1-3_preprocessed` instructions from VLN-CE.
- Scenes: Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (`download_mp.py`) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

```bash
# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/
```

Extract such that it has the form `scene_datasets/mp3d/{scene}/{scene}.glb`. There should be 90 scenes. Place the `scene_datasets` folder in `data/`.
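A quick way to verify the extraction is to count the scene folders that contain their `.glb` file. The helper below is our own sanity-check sketch, not part of the release; the default path assumes you placed the data as described above.

```python
# Hypothetical sanity check: count MP3D scenes laid out as
# data/scene_datasets/mp3d/{scene}/{scene}.glb (expects 90).
from pathlib import Path


def check_mp3d(root: str = "data/scene_datasets/mp3d") -> int:
    root_path = Path(root)
    scene_dirs = [d for d in root_path.iterdir() if d.is_dir()] if root_path.is_dir() else []
    ready = [d.name for d in scene_dirs if (d / f"{d.name}.glb").is_file()]
    missing = sorted(set(d.name for d in scene_dirs) - set(ready))
    if missing:
        print(f"scenes missing a .glb: {missing}")
    print(f"{len(ready)} / 90 scenes ready")
    return len(ready)
```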
- CA-Nav LLM replies / BLIP2-ITM / BLIP2-VQA / Grounded-SAM: download them from CA-Nav-Google-Drive.

Overall, the data is organized as follows:
```
CA-Nav-code
├── data
│   ├── blip2
│   ├── datasets
│   ├── LLM_REPLYS_VAL_UNSEEN
│   ├── R2R_VLNCE_v1-3_preprocessed
│   ├── grounded_sam
│   ├── logs
│   ├── scene_datasets
│   └── vqa
└── ...
```
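Before launching, it may help to confirm the `data/` layout matches the tree above. This checker is our own suggestion (the folder list mirrors the tree; it is not shipped with the repository):

```python
# Hypothetical check that the expected data/ subfolders exist.
from pathlib import Path

EXPECTED = ["blip2", "datasets", "LLM_REPLYS_VAL_UNSEEN",
            "R2R_VLNCE_v1-3_preprocessed", "grounded_sam",
            "logs", "scene_datasets", "vqa"]


def missing_data_dirs(root: str = "data") -> list:
    """Return the expected subfolders that are absent under `root`."""
    base = Path(root)
    return [name for name in EXPECTED if not (base / name).is_dir()]
```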
Run the R2R-CE experiments:

```bash
cd CA-Nav-code
sh run_r2r/main.sh
```

Our implementation is partially inspired by SemExp and ETPNav. Thanks for their great work!
If you find this repository useful, please consider citing our paper:

```bibtex
@String(TPAMI = {IEEE Trans. Pattern Anal. Mach. Intell.})

@article{chen2025canav,
  title={Constraint-aware zero-shot vision-language navigation in continuous environments},
  author={Chen, Kehan and An, Dong and Huang, Yan and Xu, Rongtao and Su, Yifei and Ling, Yonggen and Reid, Ian and Wang, Liang},
  journal=TPAMI,
  year={2025},
  volume={47},
  number={11},
  pages={10441--10456}
}
```
