This repository contains the implementation of IF-Embed.
Create a conda environment and install the required packages:

    # Create a conda environment named 'if_embed' with Python 3.10
    conda create -n if_embed python=3.10 -y
    # Activate the environment
    conda activate if_embed
    # Install required Python packages
    pip install -r requirements.txt
    # Install flash-attn
    python -m pip install flash_attn

To configure training, edit the key settings in update_args.py, where you can define a list of sequential training jobs:

    experiments = [
        {"model_type": "basic", "model": "Qwen/Qwen2.5-1.5B", "pooling": "last", "share_encoder": True, "num_train_epochs": 2, "contrast_mode": "qk", "data_reverse": False, "padding_side": "left", "train_file": "aarontrinh02/ms_marco_synthetic_data"},
    ]

Refer to run.py for the full set of hyperparameters.
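
As a sketch of how several sequential jobs might be listed, the snippet below builds one entry per `contrast_mode` for the `basic` model type, reusing the keys from the example entry. The `base` dictionary and the `make_experiments` helper are hypothetical names introduced for illustration and are not part of the repository; whether update_args.py accepts a list built this way is an assumption.

```python
import copy

# Shared settings, copied from the example entry in update_args.py.
base = {
    "model_type": "basic",
    "model": "Qwen/Qwen2.5-1.5B",
    "pooling": "last",
    "share_encoder": True,
    "num_train_epochs": 2,
    "data_reverse": False,
    "padding_side": "left",
    "train_file": "aarontrinh02/ms_marco_synthetic_data",
}


def make_experiments(base_config, contrast_modes):
    """Return one job config per contrast mode (hypothetical helper)."""
    jobs = []
    for mode in contrast_modes:
        job = copy.deepcopy(base_config)  # keep the shared settings untouched
        job["contrast_mode"] = mode
        jobs.append(job)
    return jobs


# One job per contrast mode listed for the 'basic' model type.
experiments = make_experiments(base, ["qk", "kq", "only_neg"])

for job in experiments:
    print(job["model_type"], job["contrast_mode"])
```

Building the list from a shared base keeps the entries consistent, so each job differs only in the setting under comparison.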
Run the list of sequential training jobs with a single command:

    python update_args.py

For evaluation, one-line commands are also provided for both Bright and MAIR:

    ### For Bright
    python bright_update_args.py
    ### For MAIR
    python mair_update_args.py

| model_type | contrast_mode | Corresponding Loss |
|---|---|---|
| basic | qk | |
| basic | kq | |
| basic | only_neg | |
| map | no_trick | |
| map | qk_with_neg | |
| map | kq_with_neg | |
| map | no_trick_with_neg | |
| map_add | no_trick | |
| map_add | qk_with_neg | |
| map_add | kq_with_neg | |
| map_add | no_trick_with_neg | |
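
The `model_type` and `contrast_mode` pairs in the table can be checked before launching a job. The following is a minimal sketch: `LISTED_COMBINATIONS` and `check_experiment` are hypothetical names that do not exist in the repository, and the set of pairs is transcribed from the table, which may not be exhaustive.

```python
# (model_type -> contrast_mode) pairs transcribed from the table above.
# Assumption: the repository may accept combinations not listed here.
LISTED_COMBINATIONS = {
    "basic": {"qk", "kq", "only_neg"},
    "map": {"no_trick", "qk_with_neg", "kq_with_neg", "no_trick_with_neg"},
    "map_add": {"no_trick", "qk_with_neg", "kq_with_neg", "no_trick_with_neg"},
}


def check_experiment(experiment):
    """Raise ValueError if the pair is not listed in the table (hypothetical helper)."""
    model_type = experiment["model_type"]
    contrast_mode = experiment["contrast_mode"]
    listed_modes = LISTED_COMBINATIONS.get(model_type)
    if listed_modes is None:
        raise ValueError(f"model_type {model_type!r} is not listed in the table")
    if contrast_mode not in listed_modes:
        raise ValueError(
            f"contrast_mode {contrast_mode!r} is not listed for model_type {model_type!r}"
        )


# A pair that appears in the table passes silently.
check_experiment({"model_type": "basic", "contrast_mode": "qk"})

# A pair that does not appear in the table is reported.
try:
    check_experiment({"model_type": "basic", "contrast_mode": "no_trick"})
except ValueError as err:
    print(err)
```

Running a check like this over the `experiments` list before training would catch a mistyped mode early, rather than partway through a sequence of jobs.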