PyTorch implementation of GOMAA-Geo: GOal Modality Agnostic Active Geo-localization (NeurIPS 2024)
Anindya Sarkar*, Srikumar Sastry*, Aleksis Pirinen, Chongjie Zhang, Nathan Jacobs, Yevgeniy Vorobeychik (*Corresponding Author)
This repository is the official implementation of the NeurIPS 2024 paper GOMAA-Geo, a goal modality agnostic active geo-localization agent that can geo-localize a goal location -- specified as an aerial patch, ground-level image, or textual description -- by navigating partially observed aerial imagery.
- Update Gradio demo
- Release models to 🤗 HuggingFace
- Release PyTorch ckpt files for all models
You can use the following commands to install the necessary dependencies to run the code:

```shell
conda create --name gomaa_geo
conda activate gomaa_geo
conda install python==3.11
pip install -r requirements.txt
```

To run the code with the Masa data, download the zip file at the following link: https://www.kaggle.com/datasets/balraj98/massachusetts-buildings-dataset

To run the code with the xBD data, download the zip file at the following link: https://xview2.org/ (Note that, in order to download the dataset, you must first log in with a valid email ID.)

To run the code with our MM-GAG data, download the zip file at the following Hugging Face link: https://huggingface.co/datasets/Gomaa-Geo/MM-GAG
The uncompressed folder, named `data`, should be placed at the root directory of this repository.
This folder includes processed data for the following active geo-localization problems:
- Masa dataset in `masa_data`
- xBD dataset in `xBD_data`
This folder also includes other intermediate results to recreate our analyses and figures.
Before setting up data or running experiments, set all parameters of interest in the file `config.py`. This includes grid size, model configuration, training configuration, etc.
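As an illustration of the kind of settings `config.py` centralizes, here is a minimal sketch; the attribute names and values below are assumptions for illustration, not the repository's actual fields:

```python
# Hypothetical sketch of a config.py layout; the real attribute
# names and defaults in the repository may differ.
from types import SimpleNamespace

cfg = SimpleNamespace(
    data=SimpleNamespace(grid_size=5),      # e.g. a 5x5 patch grid per image
    model=SimpleNamespace(embed_dim=512),   # CLIP-style embedding width
    train=SimpleNamespace(
        lr=1e-4,
        num_epochs=50,
        llm_checkpoint="gomaa_geo/checkpoint/llm.pt",  # read later at inference time
    ),
)

print(cfg.data.grid_size, cfg.model.embed_dim)
```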
Extract all data of interest into the folder `gomaa_geo/data`.
Then, create patches (grids) for each image in a dataset using the following script:
```shell
python -m gomaa_geo.data_utils.get_patches
```

Then get CLIP-MMFE embeddings for each patch:
```shell
python -m gomaa_geo.data_utils.get_sat_embeddings_sat2cap
```

Using the same script and the function `get_ground_embeds`, one can create embeddings for ground-level images.
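The patch-creation and embedding steps above can be sketched as follows. This is an illustrative sketch only: `make_patches`, `embed_patches`, the 5×5 grid, and the random-projection "encoder" are assumptions standing in for the repository's scripts, which use the CLIP-MMFE encoder:

```python
import numpy as np

def make_patches(image: np.ndarray, grid_size: int) -> np.ndarray:
    """Split an H x W x C aerial image into a grid_size x grid_size grid of patches."""
    h, w = image.shape[:2]
    ph, pw = h // grid_size, w // grid_size
    patches = [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid_size)
        for c in range(grid_size)
    ]
    return np.stack(patches)  # shape: (grid_size**2, ph, pw, C)

def embed_patches(patches: np.ndarray, dim: int = 512) -> np.ndarray:
    """Toy stand-in for the CLIP-MMFE encoder: one dim-length vector per patch."""
    flat = patches.reshape(len(patches), -1).astype(np.float32)
    rng = np.random.default_rng(0)  # fixed random projection, not a real encoder
    proj = rng.standard_normal((flat.shape[1], dim)).astype(np.float32)
    return flat @ proj

image = np.zeros((250, 250, 3), dtype=np.uint8)  # dummy aerial image
patches = make_patches(image, grid_size=5)
embeds = embed_patches(patches)
print(patches.shape, embeds.shape)  # (25, 50, 50, 3) (25, 512)
```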
To create text embeddings, run the following script:
```shell
python -m gomaa_geo.data_utils.get_text_embeddings
```

To run our pre-training procedure with GOMAA-Geo, use the following command:
```shell
python -m gomaa_geo.pretrain
```

Again, all parameters of interest must be specified in the `config.py` file.
The weights of the trained LLM network at each iteration will be saved locally in `gomaa_geo/checkpoint/`.
To run training of the pre-trained model, use the following command:
```shell
python -m gomaa_geo.train
```

To run inference, run the following command:
```shell
python -m gomaa_geo.validate
```

Set the path to the pre-trained LLM in the variable `cfg.train.llm_checkpoint`.
To visualize exploration behaviour of the trained model, run the following script:
```shell
python -m gomaa_geo.viz_path --idx=77 --start=0 --end=24
```

where `idx` is the image ID, `start` is the starting position, and `end` is the goal position.
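The `--end=24` example suggests that positions index the patch grid in row-major order (position 24 being the last cell of a 5×5 grid). Under that assumption, which is mine and not stated by the repository, mapping a position to grid coordinates looks like:

```python
# Assumption: positions index a row-major grid, so on a 5x5 grid
# position 0 is top-left and position 24 is bottom-right.
# The actual convention used by viz_path may differ.
def pos_to_cell(pos: int, grid_size: int = 5) -> tuple[int, int]:
    """Map a linear position to (row, col) on the patch grid."""
    return divmod(pos, grid_size)

print(pos_to_cell(0))   # (0, 0): start at the top-left patch
print(pos_to_cell(24))  # (4, 4): goal at the bottom-right patch
```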
Download GOMAA-Geo models from the links below: Coming Soon ...
```bibtex
@article{sarkar2024gomaa,
  title={GOMAA-Geo: GOal Modality Agnostic Active Geo-localization},
  author={Sarkar, Anindya and Sastry, Srikumar and Pirinen, Aleksis and Zhang, Chongjie and Jacobs, Nathan and Vorobeychik, Yevgeniy},
  journal={arXiv preprint arXiv:2406.01917},
  year={2024}
}
```

Check out our lab website for other interesting works on geospatial understanding and mapping:

