This repository is the official implementation for our EMNLP 2025 paper: MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval. Our paper tackles the DocQA task by addressing a limitation of prior methods that rely only on semantic relevance for retrieval. By incorporating logical relevance, our VLM-powered retrieval engine performs multi-hop reasoning over a page graph to identify key pages.
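To make the idea concrete, here is a minimal, illustrative sketch of logic-aware multi-hop retrieval over a page graph. The graph layout, the scoring functions (semantic_score, vlm_logic_score), and the hop count are hypothetical placeholders for illustration only, not the repository's actual API; please refer to VLMRetriever for the real implementation.

```python
# Illustrative sketch only: seed with semantically similar pages, then expand
# over the page graph and let a VLM judge the logical relevance of neighbors
# that pure semantic matching would miss.
import heapq
from typing import Callable

def retrieve_pages(
    query: str,
    graph: dict[int, list[int]],                    # page id -> neighboring page ids
    semantic_score: Callable[[str, int], float],    # embedding similarity (placeholder)
    vlm_logic_score: Callable[[str, int], float],   # VLM-judged logical relevance (placeholder)
    top_k: int = 5,
    hops: int = 2,
) -> list[int]:
    # Seed with the semantically closest pages.
    seeds = heapq.nlargest(top_k, graph, key=lambda p: semantic_score(query, p))
    scores = {p: semantic_score(query, p) for p in seeds}

    # Expand the frontier hop by hop, scoring newly reached neighbors with the VLM.
    frontier = set(seeds)
    for _ in range(hops):
        nxt = set()
        for page in frontier:
            for nb in graph.get(page, []):
                if nb not in scores:
                    scores[nb] = vlm_logic_score(query, nb)
                    nxt.add(nb)
        frontier = nxt

    # Return the highest-scoring pages across all hops.
    return heapq.nlargest(top_k, scores, key=scores.get)
```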
Please consider citing or giving a 🌟 if our repository is helpful to your work!
@inproceedings{wu2025molorag,
title={MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval},
author={Xixi Wu and Yanchao Tan and Nan Hou and Ruiyang Zhang and Hong Cheng},
year={2025},
booktitle={The 2025 Conference on Empirical Methods in Natural Language Processing},
url={https://arxiv.org/abs/2509.07666},
}
🎉 [2025-08-24] Our paper is accepted to EMNLP 2025. The camera-ready paper and fully reviewed code will be released soon!
The full datasets are available on HuggingFace:
huggingface-cli download --repo-type dataset xxwu/MoLoRAG --local-dir ./dataset/
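If you prefer a Python entry point over the CLI, the same download can be done with the huggingface_hub library; this is a sketch equivalent to the command above, not a required step.

```python
# Download the full MoLoRAG dataset into ./dataset/ via the huggingface_hub API,
# equivalent to the huggingface-cli command above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xxwu/MoLoRAG",
    repo_type="dataset",
    local_dir="./dataset/",
)
```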
The full package versions can be found in env/main.txt and env/qwenvl.txt, respectively. Please refer to these files for detailed package versions.
For Qwen2.5-VL-series models:
transformers==4.50.0.dev0
xformers==0.0.29.post3
torch==2.6.0
qwen-vl-utils==0.0.8

For the remaining LVLMs, the VLM retriever, and the LLM baselines:
transformers==4.47.1
torch==2.5.1
colpali_engine==0.3.8
colbert-ai==0.2.21
langchain==0.3.19
langchain-community==0.3.18
langchain-core==0.3.37
langchain-text-splitters==0.3.6
PyMuPDF==1.25.3
pypdf==5.3.0
pypdfium2==4.30.1
pdf2image==1.17.0

We release our fine-tuned VLM retriever, MoLoRAG-3B, based on Qwen2.5-VL-3B, on HuggingFace:
huggingface-cli download xxwu/MoLoRAG-QwenVL-3B

The training data used to fine-tune this retriever for logic-aware retrieval is also available on HuggingFace. The data generation pipeline is available at VLMRetriever/data_collection.py.
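Below is a minimal sketch of querying the released retriever with transformers and qwen-vl-utils, assuming the checkpoint loads through the standard Qwen2.5-VL path; the prompt shown is a hypothetical placeholder, so please follow the scripts in VLMRetriever for the exact prompting and scoring usage.

```python
# Sketch: load the fine-tuned retriever as a standard Qwen2.5-VL checkpoint and
# ask it to judge whether a rendered page image is relevant to a query.
# The prompt is a placeholder; see VLMRetriever for the actual logic.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "xxwu/MoLoRAG-QwenVL-3B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("xxwu/MoLoRAG-QwenVL-3B")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "page_1.png"},  # a rendered document page
        {"type": "text", "text": "Is this page relevant to the question: ... ?"},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=16)
print(processor.batch_decode(output[:, inputs.input_ids.shape[1]:],
                             skip_special_tokens=True)[0])
```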
Before running the code, please check whether you need to fill in API keys or prepare the models/data.
Code and commands are available in the LLMBaseline directory.
Step 0 - Prepare the retrieved contents following the commands in VLMRetriever
Step 1 - Make predictions following the commands in example_run.sh
Step 2 - Evaluate the predictions following the commands in example_run_eval.sh
- Provide tailored MDocAgent code
- Provide detailed scripts or running tutorials
If you have any questions about usage or reproducibility, or would like to discuss the work, please feel free to open an issue on GitHub or contact the authors via email at [email protected].
We thank the authors of the open-source datasets MMLongBench, LongDocURL, and UDA-Benchmark. We also appreciate the official implementations of M3DocRAG and MDocAgent.