MoLoRAG

This repository is the official implementation of our EMNLP 2025 paper: MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval. The paper tackles the DocQA task by addressing a limitation of prior methods, which rely only on semantic relevance for retrieval. By also incorporating logical relevance, our VLM-powered retrieval engine performs multi-hop reasoning over a page graph to identify key pages.

Please consider citing our paper or giving this repository a 🌟 if it is helpful to your work!

@inproceedings{wu2025molorag,
   title={MoLoRAG: Bootstrapping Document Understanding via Multi-modal Logic-aware Retrieval},
   author={Xixi Wu and Yanchao Tan and Nan Hou and Ruiyang Zhang and Hong Cheng},
   year={2025},
   booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
   url={https://arxiv.org/abs/2509.07666},
}

🎙️ News

🎉 [2025-08-24] Our paper is accepted to EMNLP 2025. The camera-ready paper and fully reviewed code will be released soon!


📋 Table of Contents

  • 📚 Dataset
  • 🔧 Environment
  • 🤗 Model
  • 🚀 Run
  • ✏️ TODO
  • 📮 Contact
  • 🙏 Acknowledgements

📚 Dataset

The full datasets are available on HuggingFace:

huggingface-cli download --repo-type dataset xxwu/MoLoRAG --local-dir ./dataset/
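Alternatively, the same download can be scripted from Python via huggingface_hub (a minimal sketch; the target directory is your choice):

from huggingface_hub import snapshot_download

# Download the full MoLoRAG dataset repository into ./dataset/
snapshot_download(repo_id="xxwu/MoLoRAG", repo_type="dataset", local_dir="./dataset")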

🔧 Environment

Two separate environments are used. The full package versions can be found in env/main.txt and env/qwenvl.txt.

For Qwen2.5-VL-series models:

transformers==4.50.0.dev0
xformers==0.0.29.post3
torch==2.6.0
qwen-vl-utils==0.0.8
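A minimal installation sketch for this environment (note that 4.50.0.dev0 is a development build, so transformers must be installed from source; see env/qwenvl.txt for the full version list):

pip install torch==2.6.0 xformers==0.0.29.post3 qwen-vl-utils==0.0.8
# 4.50.0.dev0 is a development version, so install transformers from the GitHub source
pip install git+https://github.com/huggingface/transformers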

For the remaining LVLMs, the VLM retriever, and the LLM baselines:

transformers==4.47.1
torch==2.5.1
colpali_engine==0.3.8
colbert-ai==0.2.21
langchain==0.3.19
langchain-community==0.3.18
langchain-core==0.3.37
langchain-text-splitters==0.3.6
PyMuPDF==1.25.3
pypdf==5.3.0
pypdfium2==4.30.1
pdf2image==1.17.0

🤗 Model

We release our fine-tuned VLM retriever, MoLoRAG-3B, based on Qwen2.5-VL-3B, on HuggingFace:

huggingface-cli download xxwu/MoLoRAG-QwenVL-3B
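Once downloaded, the checkpoint can be loaded like any Qwen2.5-VL model (a minimal sketch, assuming the released weights follow the standard Qwen2.5-VL checkpoint format; inference details may differ from the repository's own scripts):

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

# Load the fine-tuned logic-aware retriever as a standard Qwen2.5-VL checkpoint
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "xxwu/MoLoRAG-QwenVL-3B", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("xxwu/MoLoRAG-QwenVL-3B")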

The training data used to fine-tune this retriever for logic-aware retrieval is also available on HuggingFace. The data generation pipeline is available at VLMRetriever/data_collection.py.

🚀 Run

Before running the code, please check whether you need to fill in API keys or prepare the models/data.

LLM Baselines

Code and commands are available in the LLMBaseline directory.

LVLM Baselines

Step 0 - Prepare the retrieved contents following the commands in VLMRetriever.

Step 1 - Make predictions following the commands in example_run.sh.

Step 2 - Evaluate the inference results following the commands in example_run_eval.sh.
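In condensed form, the pipeline looks as follows (a sketch; both scripts may need paths, model names, or API keys adjusted before running):

# Step 0: prepare the retrieved contents (see VLMRetriever for the exact commands)
# Step 1: run LVLM inference
bash example_run.sh
# Step 2: evaluate the predictions
bash example_run_eval.sh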

✏️ TODO

  • Provide tailored MDocAgent code
  • Provide detailed scripts and running tutorials

📮 Contact

If you have any questions about usage or reproducibility, or would like to discuss the work, please feel free to open an issue on GitHub or contact the authors via email at [email protected]

🙏 Acknowledgements

We thank the creators of the open-source datasets MMLongBench, LongDocURL, and UDA-Benchmark. We also appreciate the official implementations of M3DocRAG and MDocAgent.

No packages published