atutej/RARe

RARe - Retrieval Augmented Retrieval With In-Context Examples

1. Overview

Code for the paper: RARe - Retrieval Augmented Retrieval With In-Context Examples.

We present an approach that finetunes models with semantically similar in-context examples to boost retrieval performance.
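To make the idea concrete, the following minimal sketch shows one way a query could be augmented with in-context (query, document) examples before it is embedded. The template (`Instruct:`, `Query:`, `Document:` labels) and the function name `build_augmented_query` are assumptions made for illustration only; they are not taken from this repository, whose actual prompt format may differ.

```python
from typing import List, Tuple


def build_augmented_query(
    instruction: str,
    examples: List[Tuple[str, str]],
    query: str,
) -> str:
    """Prepend in-context (query, document) pairs to the target query.

    NOTE: the template below is a hypothetical format for illustration,
    not the format used by the RARe code.
    """
    parts = [f"Instruct: {instruction}"]
    # Each in-context example is a query paired with a relevant document.
    for ex_query, ex_doc in examples:
        parts.append(f"Query: {ex_query}\nDocument: {ex_doc}")
    # The target query comes last, without a document.
    parts.append(f"Query: {query}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    prompt = build_augmented_query(
        "Given a question, retrieve passages that answer it.",
        [("what causes rain", "Rain forms when water vapor condenses ...")],
        "why does rain fall",
    )
    print(prompt)
```

The resulting string would then be passed to the retriever in place of the bare query, both during fine-tuning and at inference time.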

2. Setup

2.2 Installation

sh setup.sh

2.3 Preprocessing

Download the supervised training data from the Echo Embeddings repository. This data is used for the training-from-retriever-checkpoint experiments (Section 3.1.1).

cd data
# -O saves the download under the file name that the tar command below expects
wget -O echo-data.tar "https://drive.usercontent.google.com/download?id=1YqgaJIzmBIH37XBxpRPCVzV_CLh6aOI4&export=download"
tar -xvf echo-data.tar
rm echo-data.tar
cd ../

Preprocess the RAR-b benchmark for evaluation:

cd misc_code
python process_rarb.py
cd ../

3. Running Experiments

3.1 Training

3.1.1 Training from Retriever Checkpoint

Configuration files are provided in llm2vec/train_configs/supervised. We mainly use E5-Instruct.json and MetaLlama3-Supervised.json.

cd LLM2Vec
sh run.sh

This trains E5-Mistral-7B-Instruct with 5 in-context examples retrieved with BM25.
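As a rough illustration of how BM25 can pick in-context examples, the sketch below scores a pool of training queries against a target query and returns the indices of the top k. It is a small, self-contained Okapi BM25 written for this README and is not the retrieval code used by the training scripts; the class name `BM25`, the whitespace tokenizer, and the parameter values `k1=1.5` and `b=0.75` are all assumptions.

```python
import math
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplistic whitespace tokenizer (an assumption for this sketch).
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in corpus]
        self.doc_len = [len(d) for d in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        self.tf = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        self.idf = {
            t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()
        }

    def score(self, query: str, i: int) -> float:
        s = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            # Length-normalized term-frequency saturation.
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def top_k(self, query: str, k: int = 5) -> List[int]:
        # Indices of the k highest-scoring corpus entries, best first.
        ranked = sorted(
            range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True
        )
        return ranked[:k]


if __name__ == "__main__":
    train_queries = [
        "what causes rain",
        "how do vaccines work",
        "why does it rain in spring",
        "best pasta recipe",
    ]
    bm25 = BM25(train_queries)
    print(bm25.top_k("why does rain fall", k=2))
```

In practice, the retrieved training queries would be paired with their relevant documents and prepended to the target query as in-context examples.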

3.1.2 Training from LLM (Decoder-Only) Checkpoint

cd tevatron
sh run.sh

This trains Llama-3.1-8B-Instruct with 5 in-context examples retrieved with BM25.

3.2 Evaluation

The run_eval.sh script in the misc_code/ folder provides an example of running evaluation with 5 in-context examples.

cd misc_code
sh run_eval.sh

You may need to modify e5_models.py, llm2vec_models.py, and repllama_models.py in mteb/models to include the paths to your newly trained models. Each of these files contains examples.

Some of the code was forked from the following repositories

Cite

If our work was helpful in your research, please kindly cite us as follows:

@misc{tejaswi2024rare,
      title={RARe: Retrieval Augmented Retrieval with In-Context Examples}, 
      author={Atula Tejaswi and Yoonsang Lee and Sujay Sanghavi and Eunsol Choi},
      year={2024},
      eprint={2410.20088},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.20088}, 
}
