# RARe: Retrieval Augmented Retrieval with In-Context Examples

Code for the paper [*RARe: Retrieval Augmented Retrieval with In-Context Examples*](https://arxiv.org/abs/2410.20088).
We present an approach that finetunes models with semantically similar in-context examples to boost retrieval performance.
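For intuition, the sketch below shows one way a query could be augmented with retrieved in-context (query, passage) examples before it is encoded. The `augment_query` helper, its delimiters, and the example pairs are illustrative assumptions made for this README; the exact prompt template used by our training scripts and configs may differ.

```python
# Illustrative sketch only: the concrete prompt template used by RARe may differ.
from typing import List, Tuple


def augment_query(query: str, examples: List[Tuple[str, str]]) -> str:
    """Prepend retrieved (example query, relevant passage) pairs to the target query.

    `examples` are semantically similar training pairs selected for `query`
    (e.g., with BM25). The delimiter strings below are hypothetical.
    """
    parts = [
        f"Example query: {ex_query}\nExample passage: {ex_passage}"
        for ex_query, ex_passage in examples
    ]
    parts.append(f"Query: {query}")
    return "\n\n".join(parts)


# Two made-up in-context examples for demonstration.
icl_examples = [
    ("who wrote the iliad", "The Iliad is an ancient Greek epic attributed to Homer."),
    ("author of the odyssey", "The Odyssey is traditionally attributed to Homer."),
]
print(augment_query("who is credited with ancient greek epic poetry", icl_examples))
```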
## Setup

```bash
sh setup.sh
```

## Downloading Training Data

Download the supervised training data released by the Echo Embeddings repository (used for the experiment that trains from a retriever checkpoint):

```bash
cd data
wget -O echo-data.tar "https://drive.usercontent.google.com/download?id=1YqgaJIzmBIH37XBxpRPCVzV_CLh6aOI4&export=download"
tar -xvf echo-data.tar
rm echo-data.tar
cd ..
```

## Preprocessing the RAR-b Benchmark for Evaluation

```bash
cd misc_code
python process_rarb.py
cd ..
```

## Training

Configuration files are provided in `llm2vec/train_configs/supervised`. We mainly use `E5-Instruct.json` and `MetaLlama3-Supervised.json`.
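If you need to tweak hyperparameters, you can load and edit one of these JSON configs programmatically. The snippet below is only a generic sketch; the `output_dir` key is a hypothetical example, so check the actual field names in the config you are using.

```python
# Generic sketch for inspecting/overriding a training config.
# "output_dir" is a hypothetical key; verify field names in the real config.
import json

config_path = "llm2vec/train_configs/supervised/E5-Instruct.json"
with open(config_path) as f:
    config = json.load(f)

print(sorted(config.keys()))  # see which hyperparameters the config exposes

config["output_dir"] = "output/e5-instruct-rare"  # hypothetical override
with open("llm2vec/train_configs/supervised/E5-Instruct-custom.json", "w") as f:
    json.dump(config, f, indent=2)
```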
To train E5-Mistral-7B-Instruct with 5 in-context examples retrieved with BM25 (LLM2Vec codebase), run:

```bash
cd LLM2Vec
sh run.sh
```
To train Llama-3.1-8B-Instruct with 5 in-context examples retrieved with BM25 (Tevatron codebase), run:

```bash
cd tevatron
sh run.sh
```
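Both run scripts above use 5 in-context examples retrieved with BM25. As a rough illustration of BM25-based example selection (not the exact pipeline in the run scripts), here is a sketch using the `rank_bm25` package with a made-up example pool:

```python
# Illustrative sketch: pick 5 in-context examples for a query with BM25.
# Requires `pip install rank_bm25`; the training scripts may use a different
# BM25 implementation and a real training-query pool.
from rank_bm25 import BM25Okapi

example_pool = [  # hypothetical pool of training queries
    "what is the capital of france",
    "who wrote pride and prejudice",
    "largest city in france by population",
    "when was the eiffel tower built",
    "author of sense and sensibility",
    "how tall is the eiffel tower",
]

bm25 = BM25Okapi([q.split() for q in example_pool])

query = "which city is the capital of france"
top_5 = bm25.get_top_n(query.split(), example_pool, n=5)
print(top_5)  # the 5 lexically closest queries, used as in-context examples
```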
## Evaluation

The `run_eval.sh` script in the `misc_code/` folder provides an example of running evaluation with 5 in-context examples:

```bash
cd misc_code
sh run_eval.sh
```

You may need to modify `e5_models.py`, `llm2vec_models.py`, and `repllama_models.py` in `mteb/models` to include the paths to newly trained models; examples are provided in each of these files.
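For reference, a generic MTEB evaluation run looks roughly like the following. This is only a sketch assuming the standard `mteb` API and a SentenceTransformer-style encoder; it is not the exact pipeline in `run_eval.sh`, and the model path and task name are placeholders.

```python
# Generic sketch of evaluating an encoder on an MTEB retrieval task.
# The model path and task name are placeholders, not values from this repo.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("path/to/your/finetuned-model")  # placeholder
evaluation = MTEB(tasks=["ArguAna"])  # any retrieval task(s) of interest
evaluation.run(model, output_folder="results/finetuned-model")
```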
## Citation

If our work was helpful in your research, please cite us as follows:

```bibtex
@misc{tejaswi2024rare,
      title={RARe: Retrieval Augmented Retrieval with In-Context Examples},
      author={Atula Tejaswi and Yoonsang Lee and Sujay Sanghavi and Eunsol Choi},
      year={2024},
      eprint={2410.20088},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.20088},
}
```