# RARe: Retrieval Augmented Retrieval with In-Context Examples

Code for the paper [*RARe: Retrieval Augmented Retrieval with In-Context Examples*](https://arxiv.org/abs/2410.20088).
We present an approach that finetunes models with semantically similar in-context examples to boost retrieval performance.
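For intuition, the sketch below shows one way a query could be augmented with retrieved in-context (query, passage) examples before it is encoded. The `augment_query` helper, its delimiters, and the example pairs are illustrative assumptions made for this README; the exact prompt template used by our training scripts and configs may differ.

```python
# Illustrative sketch only: the concrete prompt template used by RARe may differ.
from typing import List, Tuple


def augment_query(query: str, examples: List[Tuple[str, str]]) -> str:
    """Prepend retrieved (example query, relevant passage) pairs to the target query.

    `examples` are semantically similar training pairs selected for `query`
    (e.g., with BM25). The delimiter strings below are hypothetical.
    """
    parts = [
        f"Example query: {ex_query}\nExample passage: {ex_passage}"
        for ex_query, ex_passage in examples
    ]
    parts.append(f"Query: {query}")
    return "\n\n".join(parts)


# Two made-up in-context examples for demonstration.
icl_examples = [
    ("who wrote the iliad", "The Iliad is an ancient Greek epic attributed to Homer."),
    ("author of the odyssey", "The Odyssey is traditionally attributed to Homer."),
]
print(augment_query("who is credited with ancient greek epic poetry", icl_examples))
```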
## Setup

```bash
sh setup.sh
```

## Downloading Training Data

Download the supervised training data released by the Echo Embeddings repository (used for the experiment that trains from a retriever checkpoint):

```bash
cd data
wget -O echo-data.tar "https://drive.usercontent.google.com/download?id=1YqgaJIzmBIH37XBxpRPCVzV_CLh6aOI4&export=download"
tar -xvf echo-data.tar
rm echo-data.tar
cd ..
```

## Preprocessing the RAR-b Benchmark for Evaluation

```bash
cd misc_code
python process_rarb.py
cd ..
```

## Training

Configuration files are provided in `llm2vec/train_configs/supervised`. We mainly use `E5-Instruct.json` and `MetaLlama3-Supervised.json`.
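If you need to tweak hyperparameters, you can load and edit one of these JSON configs programmatically. The snippet below is only a generic sketch; the `output_dir` key is a hypothetical example, so check the actual field names in the config you are using.

```python
# Generic sketch for inspecting/overriding a training config.
# "output_dir" is a hypothetical key; verify field names in the real config.
import json

config_path = "llm2vec/train_configs/supervised/E5-Instruct.json"
with open(config_path) as f:
    config = json.load(f)

print(sorted(config.keys()))  # see which hyperparameters the config exposes

config["output_dir"] = "output/e5-instruct-rare"  # hypothetical override
with open("llm2vec/train_configs/supervised/E5-Instruct-custom.json", "w") as f:
    json.dump(config, f, indent=2)
```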
To train E5-Mistral-7B-Instruct with 5 in-context examples retrieved with BM25 (LLM2Vec codebase), run:

```bash
cd LLM2Vec
sh run.sh
```
To train Llama-3.1-8B-Instruct with 5 in-context examples retrieved with BM25 (Tevatron codebase), run:

```bash
cd tevatron
sh run.sh
```
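Both run scripts above use 5 in-context examples retrieved with BM25. As a rough illustration of BM25-based example selection (not the exact pipeline in the run scripts), here is a sketch using the `rank_bm25` package with a made-up example pool:

```python
# Illustrative sketch: pick 5 in-context examples for a query with BM25.
# Requires `pip install rank_bm25`; the training scripts may use a different
# BM25 implementation and a real training-query pool.
from rank_bm25 import BM25Okapi

example_pool = [  # hypothetical pool of training queries
    "what is the capital of france",
    "who wrote pride and prejudice",
    "largest city in france by population",
    "when was the eiffel tower built",
    "author of sense and sensibility",
    "how tall is the eiffel tower",
]

bm25 = BM25Okapi([q.split() for q in example_pool])

query = "which city is the capital of france"
top_5 = bm25.get_top_n(query.split(), example_pool, n=5)
print(top_5)  # the 5 lexically closest queries, used as in-context examples
```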
## Evaluation

The `run_eval.sh` script in the `misc_code/` folder provides an example of running evaluation with 5 in-context examples:

```bash
cd misc_code
sh run_eval.sh
```

You may need to modify `e5_models.py`, `llm2vec_models.py`, and `repllama_models.py` in `mteb/models` to include the paths to newly trained models; examples are provided in each of these files.
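For reference, a generic MTEB evaluation run looks roughly like the following. This is only a sketch assuming the standard `mteb` API and a SentenceTransformer-style encoder; it is not the exact pipeline in `run_eval.sh`, and the model path and task name are placeholders.

```python
# Generic sketch of evaluating an encoder on an MTEB retrieval task.
# The model path and task name are placeholders, not values from this repo.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("path/to/your/finetuned-model")  # placeholder
evaluation = MTEB(tasks=["ArguAna"])  # any retrieval task(s) of interest
evaluation.run(model, output_folder="results/finetuned-model")
```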
## Citation

If our work was helpful in your research, please cite us as follows:

```bibtex
@misc{tejaswi2024rare,
      title={RARe: Retrieval Augmented Retrieval with In-Context Examples},
      author={Atula Tejaswi and Yoonsang Lee and Sujay Sanghavi and Eunsol Choi},
      year={2024},
      eprint={2410.20088},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.20088},
}
```