This repository contains the code, experimental framework, and evaluation scripts to reproduce the results of the paper *Evaluating Large Language Models for Cross-Lingual Retrieval*.
- `retrieval/`: First-stage retrieval (e.g., BM25)
- `reranking/`: LLM-based second-stage reranking (listwise & pairwise)
In our work, we evaluate on CLEF2003 as a corpus of high-resource European languages and on CIRAL as a corpus of low-resource African languages.
To download CLEF2003, you first need to install `clef-dataloaders`. Follow the setup instructions in that repository, and run `pip install -e .` inside the directory where you extracted `clef-dataloaders`.
You can then use `python download_data.py` to download both CIRAL and CLEF2003.
The script preprocesses the data so that the format is compatible with all evaluation code inside this repository. By default, the script downloads both datasets, but you can restrict it to a single dataset by passing the argument `--dataset <clef/ciral>`.
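For example, to fetch only CLEF2003, run `python download_data.py --dataset clef`.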
You can also manually download the CIRAL queries/qrels from here. For queries, we used the `-test-a.tsv` files. For the qrels, we used the `-test-a-pools.tsv` files. The CIRAL corpus files can be downloaded from here.
Before running any scripts, you need to configure the `.env` file in the project root with the correct local paths.
All provided scripts automatically load `.env` from the repository root.
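As a rough illustration, the snippet below shows how a script can pick up these paths, assuming the common `python-dotenv` package; the keys `DATA_DIR` and `RUNS_DIR` are placeholders, not necessarily the variables this repository actually uses (check the scripts or the `.env` template for the real keys).

```python
# Minimal sketch of resolving local paths from .env, assuming python-dotenv.
# DATA_DIR / RUNS_DIR are placeholder keys, not necessarily the ones this repo uses.
import os

from dotenv import load_dotenv

load_dotenv()  # loads key=value pairs from the nearest .env file into the environment

data_dir = os.environ["DATA_DIR"]  # e.g., where download_data.py stored CLEF2003/CIRAL
runs_dir = os.environ["RUNS_DIR"]  # e.g., where retrieval/reranking runs are written
print(f"data: {data_dir}, runs: {runs_dir}")
```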
Refer to /retrieval/README.md for generating initial candidates using BM25 or bi-encoder models.
You can use either listwise or pairwise reranking. Refer to /reranking/README.md for details.
Run the provided script `build_score_table.py` to evaluate the runs and build the final retrieval or reranking results table.
```bash
python build_score_table.py \
    --stage <retrieval|reranking> \
    --dataset <clef|ciral> \
    --approach <listwise|pairwise>
```
- `--stage`: Stage of experiments, either `retrieval` or `reranking`
- `--dataset`: Dataset to evaluate, either `clef` or `ciral`
- `--approach`: Reranking method, either `listwise` or `pairwise`. Only required when `--stage=reranking`
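For example, to build the results table for listwise reranking on CIRAL: `python build_score_table.py --stage reranking --dataset ciral --approach listwise`.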
After evaluation, a significance test is run automatically; reranking results that differ significantly from the retrieval baseline (paired t-test, p < 0.05) are marked with `*`.
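As a rough sketch of what that check amounts to, the snippet below runs a paired t-test over per-query metric scores with `scipy.stats.ttest_rel`; the data and helper function are illustrative and not the exact code used by `build_score_table.py`.

```python
# Illustrative paired t-test over per-query scores (e.g., nDCG); not the exact
# evaluation code of build_score_table.py.
from scipy.stats import ttest_rel


def significantly_different(baseline_scores, reranked_scores, alpha=0.05):
    """Return True if the reranking run differs significantly from the baseline."""
    _, p_value = ttest_rel(reranked_scores, baseline_scores)
    return p_value < alpha


# Toy per-query scores for the same queries, in the same order.
baseline = [0.42, 0.31, 0.55, 0.20, 0.47]
reranked = [0.51, 0.38, 0.60, 0.33, 0.50]
print(significantly_different(baseline, reranked))  # True -> mark with '*'
```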
If you find this paper useful, please cite:
```bibtex
@inproceedings{zuo-etal-2025-evaluating,
    title = "Evaluating Large Language Models for Cross-Lingual Retrieval",
    author = "Zuo, Longfei and Hong, Pingjun and Kraus, Oliver and Plank, Barbara and Litschko, Robert",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.612/",
    pages = "11415--11429",
}
```