Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments (TPAMI 2025)

We address the task of Vision-Language Navigation in Continuous Environments (VLN-CE) under the zero-shot setting. Zero-shot VLN-CE is particularly challenging due to the absence of expert demonstrations for training and minimal structural priors about the environment to guide navigation. To confront these challenges, we propose a Constraint-Aware Navigator (CA-Nav), which reframes zero-shot VLN-CE as a sequential, constraint-aware sub-instruction completion process. CA-Nav continuously translates sub-instructions into navigation plans using two core modules: the Constraint-Aware Sub-instruction Manager (CSM) and the Constraint-Aware Value Mapper (CVM). CSM defines the completion criteria for decomposed sub-instructions as constraints and tracks navigation progress by switching sub-instructions in a constraint-aware manner. CVM, guided by CSM's constraints, generates a value map on the fly and refines it using superpixel clustering to improve navigation stability. CA-Nav achieves state-of-the-art performance on two VLN-CE benchmarks, surpassing the previous best method by 12% and 13% in Success Rate on the validation unseen splits of R2R-CE and RxR-CE, respectively. Moreover, CA-Nav demonstrates its effectiveness in real-world robot deployments across various indoor scenes and instructions.
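The constraint-aware switching performed by CSM can be sketched as a simple loop: the agent follows the current sub-instruction until its completion constraint is satisfied, then advances to the next one. The sketch below is illustrative only; the names (`SubInstruction`, `run_episode`, the dict-based agent state) are hypothetical and not the repository's actual API.

```python
# Illustrative sketch of constraint-aware sub-instruction switching.
# All names here are hypothetical, not CA-Nav's real interfaces.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class SubInstruction:
    text: str
    constraint: Callable[[dict], bool]  # completion criterion over the agent state

def run_episode(sub_instructions: List[SubInstruction], states: List[dict]) -> List[str]:
    """Follow each sub-instruction until its constraint holds, then switch."""
    log, idx = [], 0
    for state in states:
        if idx >= len(sub_instructions):
            break  # all sub-instructions completed
        current = sub_instructions[idx]
        log.append(current.text)
        if current.constraint(state):  # constraint satisfied -> advance
            idx += 1
    return log
```

For example, a "leave the bedroom, then reach the sofa" instruction would keep the first sub-instruction active until the agent's observed room changes, then hand control to the second.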

Setup

Installation

  1. Create a virtual environment. We develop this project with Python 3.8:

    conda create -n CA-Nav python=3.8
    conda activate CA-Nav
  2. Install habitat-sim-v0.1.7 for a machine with multiple GPUs or without an attached display (i.e. a cluster):

    git clone https://github.com/facebookresearch/habitat-sim.git
    cd habitat-sim
    git checkout tags/v0.1.7
    pip install -r requirements.txt
    python setup.py install --headless
  3. Install habitat-lab-v0.1.7:

    git clone https://github.com/facebookresearch/habitat-lab.git
    cd habitat-lab
    git checkout tags/v0.1.7
    cd habitat_baselines/rl
    vi requirements.txt # delete tensorflow==1.13.1
    cd ../../ # (return to the habitat-lab directory)
    
    pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
    
    pip install -r requirements.txt
    python setup.py develop --all # install habitat and habitat_baselines; if the installation fails, try again — it is usually caused by network problems

    If you encounter problems installing habitat, please follow the official Habitat installation guide to install habitat-lab and habitat-sim. We use version v0.1.7 in our experiments, the same as VLN-CE; please refer to the VLN-CE page for more details.

  4. Install Grounded-SAM and refine its phrases2classes function.

    git clone https://github.com/IDEA-Research/GroundingDINO.git
    cd GroundingDINO
    git checkout -q 57535c5a79791cb76e36fdb64975271354f10251
    pip install -q -e .
    pip install 'git+https://github.com/facebookresearch/segment-anything.git'

    ATTENTION: We found that optimizing the phrase-to-class mapping logic in Grounded-SAM using minimum edit distance leads to more stable prediction outputs.

    pip install nltk
    vi <YOUR PATH>/GroundingDINO/groundingdino/util/inference.py

    Find and comment out the original phrases2classes function (around Line 235 of inference.py), then add the refined version:

    # @staticmethod
    # def phrases2classes(phrases: List[str], classes: List[str]) -> np.ndarray:
    #     class_ids = []
    #     for phrase in phrases:
    #         try:
    #             class_ids.append(classes.index(phrase))
    #         except ValueError:
    #             class_ids.append(None)
    #     return np.array(class_ids)
    
    from nltk.metrics import edit_distance
    @staticmethod
    def phrases2classes(phrases: List[str], classes: List[str]) -> np.ndarray:
        class_ids = []
        for phrase in phrases:
            if phrase in classes:
                class_ids.append(classes.index(phrase))
            else:
            distances = np.array([edit_distance(phrase, class_name) for class_name in classes])
                idx = np.argmin(distances)
                class_ids.append(idx)
        return np.array(class_ids)
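    To see why the edit-distance fallback stabilizes the mapping, here is a standalone sketch of the same logic. It uses a pure-Python Levenshtein distance instead of nltk's edit_distance so it runs without extra dependencies; the function and variable names are illustrative, not the repository's.

```python
# Standalone sketch of the edit-distance fallback in the refined
# phrases2classes. Pure-Python Levenshtein replaces nltk.metrics.edit_distance.
from typing import List

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def phrases_to_classes(phrases: List[str], classes: List[str]) -> List[int]:
    """Map each detected phrase to the index of the closest class name."""
    ids = []
    for phrase in phrases:
        if phrase in classes:
            ids.append(classes.index(phrase))
        else:
            # Fallback: nearest class by edit distance instead of None.
            ids.append(min(range(len(classes)),
                           key=lambda k: levenshtein(phrase, classes[k])))
    return ids
```

    With the original implementation, a slightly misspelled phrase from the detector (e.g. "tabel") would map to None; with the fallback it maps to the nearest class ("table").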
  5. Install other requirements

    git clone https://github.com/Chenkehan21/CA-Nav-code.git
    cd CA-Nav-code
    pip install -r requirements.txt
    pip install -r requirements2.txt

Datasets

  1. R2R-CE

    • Instructions: Download the R2R_VLNCE_v1-3_preprocessed instructions from VLN-CE.

    • Scenes: Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

    # requires running with python 2.7
    python download_mp.py --task habitat -o data/scene_datasets/mp3d/

    Extract such that it has the form scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes. Place the scene_datasets folder in data/.
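    A quick way to confirm the extraction is to count scene folders that contain a matching .glb file; the result should be 90. The snippet below is an illustrative check, assuming the data/scene_datasets/mp3d layout described above.

```python
# Sanity check (illustrative): count MP3D scene folders of the form
# <root>/{scene}/{scene}.glb; for the full dataset this should print 90.
from pathlib import Path

def count_mp3d_scenes(root: str) -> int:
    root_path = Path(root)
    return sum(1 for d in root_path.iterdir()
               if d.is_dir() and (d / f"{d.name}.glb").exists())

if __name__ == "__main__":
    print(count_mp3d_scenes("data/scene_datasets/mp3d"))
```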

  2. CA-Nav LLM Replys / BLIP2-ITM / BLIP2-VQA / Grounded-SAM

    Download from CA-Nav-Google-Drive

Overall, the data is organized as follows:

CA-Nav-code
├── data
│   ├── blip2
│   ├── datasets
│   │   ├── LLM_REPLYS_VAL_UNSEEN
│   │   └── R2R_VLNCE_v1-3_preprocessed
│   ├── grounded_sam
│   ├── logs
│   ├── scene_datasets
│   └── vqa
└── ...

Running

cd CA-Nav-code
sh run_r2r/main.sh

Contact Information

Acknowledgements

Our implementation is partially inspired by SemExp and ETPNav. Thanks for their great work!

Citation

If you find this repository useful, please consider citing our paper:

@String(TPAMI = {IEEE Trans. Pattern Anal. Mach. Intell.})

@article{chen2025canav,
  title={Constraint-aware zero-shot vision-language navigation in continuous environments},
  author={Chen, Kehan and An, Dong and Huang, Yan and Xu, Rongtao and Su, Yifei and Ling, Yonggen and Reid, Ian and Wang, Liang},
  journal=TPAMI,
  year={2025},
  volume={47},
  number={11},
  pages={10441--10456}
}
