TL;DR: We propose a preference fine-tuning algorithm that is simpler yet more effective than DPO and SimPO, requiring neither hyperparameters nor a reference model. SimPER consistently and significantly outperforms DPO and the more recent SimPO across various settings.
Our SimPER has been used as the training algorithm in the recent work EXAONE Deep: Reasoning Enhanced Language Models from LG AI Research. Their open EXAONE series models (HuggingFace) achieve performance on math, science, and coding tasks that is better than or competitive with SOTA LLMs such as QwQ 32B and DeepSeek R1 671B!
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes also uses SimPER for preference optimization; it demonstrates superior performance compared to open-weight models in its class and remains competitive even against frontier-class models. See their models at (HuggingFace).
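As described in the paper, SimPER drops both the reference model and the tunable hyperparameters of DPO/SimPO and instead directly optimizes the perplexity of the chosen and rejected responses under the policy. The snippet below is a minimal PyTorch sketch of that idea with illustrative tensor names; it is not the exact implementation, so please refer to `scripts/run_simper.py` and the paper for the precise loss.

```python
import torch

def simper_style_loss(chosen_logps: torch.Tensor,
                      rejected_logps: torch.Tensor,
                      chosen_lengths: torch.Tensor,
                      rejected_lengths: torch.Tensor) -> torch.Tensor:
    """Sketch of a SimPER-style objective: favor low perplexity on the chosen
    response and high perplexity on the rejected one, with no reference model
    and no tunable hyperparameter. Inputs are summed token log-probabilities
    under the policy and the corresponding response lengths (token counts)."""
    # Length-normalized log-likelihood of each response (negative log-perplexity).
    chosen_avg_logp = chosen_logps / chosen_lengths
    rejected_avg_logp = rejected_logps / rejected_lengths
    # Inverse perplexity lies in (0, 1]; higher means the policy prefers the response.
    chosen_inv_ppl = torch.exp(chosen_avg_logp)
    rejected_inv_ppl = torch.exp(rejected_avg_logp)
    # Maximize the gap between chosen and rejected inverse perplexities.
    return -(chosen_inv_ppl - rejected_inv_ppl).mean()
```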
First, install PyTorch 2.1.2 from the PyTorch Installation Page.
Create a Python virtual environment using e.g. Conda:
```shell
conda create -n SimPER python=3.10 && conda activate SimPER
```

We provide an environment file listing the Python package versions used in our experiments for reproducibility.
You will also need Flash Attention 2 installed, which can be done by running:
```shell
python -m pip install flash-attn --no-build-isolation
```
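As an optional sanity check (a small sketch, assuming the environment created above), you can confirm that PyTorch and Flash Attention 2 import correctly before launching training:

```python
# Quick environment check: both imports should succeed on a CUDA machine.
import torch
import flash_attn

print(torch.__version__)          # expected 2.1.2 per the instructions above
print(torch.cuda.is_available())  # the training configs assume CUDA GPUs
print(flash_attn.__version__)     # Flash Attention 2 should report a 2.x version
```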
We provide four training config files for the four training setups reported in our paper. The training configs are set for 4x A100 GPUs.

- Mistral-Base:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-base-simper.yaml
```

- Mistral-Instruct:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-instruct-simper.yaml
```

- Llama3-Base:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-base-simper.yaml
```

- Llama3-Instruct:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-instruct-simper.yaml
```

We follow the official implementations for evaluation on AlpacaEval 2, MT-Bench, and the Open LLM Leaderboard (v1 & v2), as follows:
- AlpacaEval 2: Please refer to the AlpacaEval repo for evaluation.
- MT-Bench: Please refer to the FastChat repo for evaluation.
- Open LLM Leaderboard (v1): Please refer to the Language Model Evaluation Harness and the Open LLM Leaderboard v1 for evaluation.
- Open LLM Leaderboard (v2): Please refer to the Language Model Evaluation Harness and the Open LLM Leaderboard v2 for evaluation.
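Before running the full benchmark suites, it can be helpful to spot-check a trained checkpoint with a quick generation. The snippet below is only an illustrative sketch; the checkpoint path is a placeholder for whatever output directory your training config writes to.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at the output directory of your SimPER run.
model_path = "outputs/llama-3-8b-instruct-simper"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Briefly explain how SimPER differs from DPO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```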
We build our project based on
If you find our repo to be useful, please cite our paper:
```bibtex
@inproceedings{SimPER2025,
  title={SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters},
  author={Xiao, Teng and Yuan, Yige and Chen, Zhengyu and Li, Mingxiao and Liang, Shangsong and Ren, Zhaochun and Honavar, Vasant G.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```