This repository contains the code for the paper titled "RePO: Understanding Preference Learning Through ReLU-Based Optimization".
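To give a flavor of the idea behind the method, below is a minimal sketch of a ReLU-based preference loss. It assumes SimPO-style length-normalized average log-probabilities and a target margin `gamma`; both the function and the hyperparameter are illustrative, not the paper's exact objective — see the paper for the precise formulation.

```python
import torch
import torch.nn.functional as F

def relu_preference_loss(chosen_logps: torch.Tensor,
                         rejected_logps: torch.Tensor,
                         gamma: float = 1.0) -> torch.Tensor:
    """Illustrative sketch of a ReLU-based preference loss.

    `chosen_logps` / `rejected_logps` are assumed to be length-normalized
    average log-probabilities of the preferred and dispreferred responses
    (SimPO-style); `gamma` is a hypothetical target margin.
    """
    margin = chosen_logps - rejected_logps
    # A ReLU hinge in place of the logistic loss used by DPO/SimPO: pairs
    # already separated by at least gamma contribute zero loss and gradient.
    return F.relu(gamma - margin).mean()
```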
Our codebase is built upon the alignment-handbook repo. The following steps will guide you through the installation process.
First, create a Python virtual environment using e.g. Conda:
```shell
conda create -n handbook python=3.10 && conda activate handbook
```

Next, install PyTorch `v2.2.2`. Since this is hardware-dependent, we direct you to the [PyTorch Installation Page](https://pytorch.org/get-started/locally/).
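After installing, you can confirm the version and that CUDA is visible with a quick check (v2.2.2 is the version this README assumes):

```python
import torch

print(torch.__version__)          # expect 2.2.2
print(torch.cuda.is_available())  # should be True on a GPU machine
```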
You can then install the remaining package dependencies of alignment-handbook as follows:
```shell
git clone https://github.com/huggingface/alignment-handbook.git
cd ./alignment-handbook/
python -m pip install .
```

You will also need Flash Attention 2 installed, which can be done by running:
```shell
python -m pip install flash-attn --no-build-isolation
```

We provide four training config files for the four training setups reported in our paper. The training configs are set up for 4xH100 GPUs; you may need to adjust `num_processes` (in the accelerate config) and `per_device_train_batch_size` (in the training config) to match your compute environment.
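Before launching any of the runs below, you can quickly confirm that the flash-attn build imports correctly (a minimal check; `flash_attn` is the package's import name):

```python
# Verify the Flash Attention 2 installation compiled and imports cleanly.
import flash_attn

print(flash_attn.__version__)
```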
- Mistral-Instruct:
```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_repo.py training_configs/mistral-7b-instruct-repo.yaml
```
- Llama3-Instruct:
```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_repo.py training_configs/llama-3-8b-instruct-repo.yaml
```
- Llama3-Instruct v0.2:
```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_repo.py training_configs/llama-3-8b-instruct-repo-v2.yaml
```
- Gemma2-Instruct:
```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_repo.py training_configs/gemma-2-9b-it-repo.yaml
```

This project is built upon [SimPO](https://github.com/princeton-nlp/SimPO).
