Official implementation of the paper "Memory-Centric Embodied Question Answering".
In this paper, we propose a memory-centric EQA framework named MemoryEQA. Unlike planner-centric EQA models, where the memory module cannot fully interact with other modules, MemoryEQA flexibly feeds memory information into all modules, thereby enhancing efficiency and accuracy on complex tasks, such as those involving multiple targets across different regions. Specifically, we establish a multi-modal hierarchical memory mechanism, divided into global memory that stores language-enhanced scene maps and local memory that retains historical observations and state information. When performing EQA tasks, a multi-modal large language model is leveraged to convert memory information into the input formats required by the different modules. To evaluate EQA models' memory capabilities, we constructed the MT-HM3D dataset based on HM3D, comprising 1,587 question-answer pairs involving multiple targets across various regions, which requires agents to maintain memory of target information acquired during exploration.
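The hierarchical memory can be pictured as two stores that every module can read from, serialized into text by the MLLM before injection. The sketch below is purely illustrative; the class and field names are our own shorthand, not identifiers from the released code.

# Illustrative sketch of the two-level memory (names are hypothetical, not from the released code).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class GlobalMemoryEntry:
    """Language-enhanced scene-map node: a region plus its text description."""
    region_id: str
    caption: str                              # language description of the region
    position: Tuple[float, float, float]      # location in the scene map

@dataclass
class LocalMemoryEntry:
    """Historical observation and agent state at one exploration step."""
    step: int
    observation_caption: str                  # caption of the RGB frame at this step
    agent_pose: Tuple[float, float, float]    # agent position/heading

@dataclass
class HierarchicalMemory:
    global_memory: List[GlobalMemoryEntry] = field(default_factory=list)
    local_memory: List[LocalMemoryEntry] = field(default_factory=list)

    def as_prompt(self, question: str) -> str:
        """Serialize memory into text so an MLLM can inject it into another module's input."""
        regions = "\n".join(f"- {e.region_id}: {e.caption}" for e in self.global_memory)
        history = "\n".join(f"step {e.step}: {e.observation_caption}" for e in self.local_memory[-5:])
        return f"Known regions:\n{regions}\n\nRecent observations:\n{history}\n\nQuestion: {question}"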
Set up the conda environment (Linux, Python 3.9):
conda env create -f environment.yml
conda activate memory-eqa
pip install -e .
Install the latest version of Habitat-Sim (headless with no Bullet physics) with:
conda install habitat-sim headless -c conda-forge -c aihabitat
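As an optional sanity check, verify that the headless build imports cleanly:
python -c "import habitat_sim; print('habitat-sim OK')"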
Install flash-attention2:
pip install flash-attn --no-build-isolation
Install faiss-gpu:
conda install -c conda-forge faiss-gpu
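faiss-gpu provides similarity search over memory embeddings. The snippet below is a minimal, self-contained sketch of that retrieval pattern; the embedding dimension and the random data are placeholders rather than the project's actual configuration.

# Minimal faiss-gpu retrieval sketch (dimension and data are placeholders).
import numpy as np
import faiss

dim = 768                                            # embedding dimension (placeholder)
memory_embeddings = np.random.rand(100, dim).astype("float32")

index = faiss.IndexFlatL2(dim)                       # exact L2 index, built on CPU
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)    # move the index to GPU 0
gpu_index.add(memory_embeddings)

query = np.random.rand(1, dim).astype("float32")
distances, ids = gpu_index.search(query, k=5)        # top-5 nearest memory entries
print(ids[0])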
Install transformers (needed for Qwen2-VL) and qwen-vl-utils:
pip install git+https://github.com/huggingface/transformers
pip install qwen-vl-utils
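To confirm that transformers and qwen-vl-utils work together, a standard Qwen2-VL inference check looks like the following; the model id and image path are examples, and the repo's own scripts handle this configuration themselves.

# Standard Qwen2-VL inference check (model id and image path are examples).
import torch
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "demo.jpg"},          # any local test image
        {"type": "text", "text": "Describe this image."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])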
Install AutoGPTQ:
git clone https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install -vvv --no-build-isolation -e .
- Huggingface: link
- Baidu Cloud: coming soon
- Google Drive: coming soon
Download MT-HM3D; the file structure is as follows:
MemoryEQA
└─ data
└─ MT-HM3D
sh scripts/run_memory_eqa.sh
Complete EQA tasks in real-world scenarios using the Unitree Go2.
You first need to install go2_dashboard on the Unitree Go2.
cd go2_dashboard
python app.py
python server_wrapper/go2_flask.py # launch go2 interface
python server_wrapper/vlm_flask.py # launch qwen2vl server
sh scripts/run_vlm_real.sh
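The two Flask processes expose HTTP endpoints that the real-robot script communicates with. The snippet below only sketches that client pattern; the ports and route names are hypothetical placeholders, so check go2_flask.py and vlm_flask.py for the actual interfaces.

# Hypothetical client sketch: ports and routes are placeholders,
# see go2_flask.py / vlm_flask.py for the real endpoints.
import requests

GO2_URL = "http://localhost:5000"   # go2 interface server (placeholder port)
VLM_URL = "http://localhost:5001"   # qwen2vl server (placeholder port)

# Fetch the latest onboard observation from the robot.
obs = requests.get(f"{GO2_URL}/observation").json()

# Ask the VLM server a question grounded in that observation.
answer = requests.post(f"{VLM_URL}/query",
                       json={"image": obs["image"], "question": "Is the door open?"}).json()
print(answer)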
We conduct experiments with multiple methods across MT-HM3D, HM-EQA, and OpenEQA, testing various foundation models.
On MT-HM3D, MemoryEQA attains a success rate of 54.5%, outperforming the baseline by 18.9% (Exp. 3), highlighting the critical role of hierarchical memory in multi-target tasks. The results demonstrate that MemoryEQA exhibits superior performance in multi-modal reasoning tasks, particularly in complex scene understanding and knowledge integration.

If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝 (´▽`ʃ♡ƪ)
@article{gao2025multi,
  title={Memory-Centric Embodied Question Answering},
  author={Zhai, Mingliang and Gao, Zhi and Wu, Yuwei and Jia, Yunde},
  journal={arXiv preprint arXiv:2505.13948},
  year={2025}
}
Our project is built upon Explore-EQA, leveraging its robust codebase and the exceptional language capabilities of the base models.

