
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters (ICLR 2025)

TL;DR: We propose a preference fine-tuning algorithm that is simpler yet more effective than DPO and SimPO, requiring neither hyperparameters nor a reference model. SimPER consistently and significantly outperforms DPO and the recent SimPO across various settings.
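
For intuition, here is a minimal PyTorch-style sketch of the kind of reference-free, length-normalized objective the TL;DR describes: push the inverse perplexity (exponentiated average token log-likelihood) of chosen responses up and that of rejected responses down, with no beta, margin, or reference model. The function and tensor names are illustrative assumptions; the exact SimPER loss is defined in the paper and implemented in scripts/run_simper.py.

import torch

def simper_style_loss(chosen_logps, chosen_lens, rejected_logps, rejected_lens):
    # chosen_logps / rejected_logps: summed token log-probabilities of each
    # response under the policy (shape: [batch]); *_lens: response token counts.
    avg_chosen = chosen_logps / chosen_lens        # average per-token log-likelihood
    avg_rejected = rejected_logps / rejected_lens
    # Inverse perplexity of each response, a value in (0, 1].
    inv_ppl_chosen = torch.exp(avg_chosen)
    inv_ppl_rejected = torch.exp(avg_rejected)
    # Increase inverse perplexity of chosen responses, decrease it for rejected
    # ones; note there is no tunable hyperparameter and no reference model.
    return (-inv_ppl_chosen + inv_ppl_rejected).mean()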

News 🎉

Our SimPER has been used as the training algorithm in the recent work EXAONE Deep: Reasoning Enhanced Language Models from LG AI Research. Their released EXAONE series models (HuggingFace) achieve competitive or better performance on math, science, and coding tasks compared to the SOTA LLMs QwQ 32B and DeepSeek R1 671B!

EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes also uses SimPER for preference optimization; it demonstrates superior performance compared to open-weight models in its class and remains competitive even against frontier-class models. See their models on (HuggingFace).

Environment

First, install PyTorch 2.1.2 from the PyTorch Installation Page.
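
For example, on a machine with CUDA 12.1 the install typically looks like the command below; the exact wheel index depends on your CUDA version, so follow the official page for your setup:

python -m pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121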

Next, create a Python virtual environment, e.g. with Conda:

conda create -n SimPER python=3.10 && conda activate SimPER

We provide an environment file listing the Python package versions used in our experiments for reproducibility.
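
If the file is a standard pip requirements file at the repository root (the filename below is an assumption), it can be installed with:

python -m pip install -r requirements.txt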

You will also need Flash Attention 2 installed, which can be done by running:

python -m pip install flash-attn --no-build-isolation

Training Scripts

We provide four training config files for the four training setups reported in our paper. The training configs are set for 4xA100 GPUs; an illustrative sketch of a config is shown after the list below.

  • Mistral-Base:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-base-simper.yaml
  • Mistral-Instruct:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-instruct-simper.yaml
  • Llama3-Base:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-base-simper.yaml
  • Llama3-Instruct:
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-instruct-simper.yaml
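
For orientation, a training config in this recipe style typically names the base model, the preference dataset, and the optimization settings. The sketch below is illustrative only; all field names and values are assumptions, so refer to the YAML files under training_configs/ for the exact keys used in this repo.

# Illustrative sketch only; see training_configs/*.yaml for the real keys.
model_name_or_path: mistralai/Mistral-7B-v0.1        # example base model
dataset_mixer:
  HuggingFaceH4/ultrafeedback_binarized: 1.0         # example preference dataset
torch_dtype: bfloat16
learning_rate: 6.0e-7
num_train_epochs: 1
per_device_train_batch_size: 2
gradient_accumulation_steps: 16
output_dir: outputs/mistral-7b-base-simper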

Evaluation

We follow the official implementations for evaluation on AlpacaEval 2, MT-Bench, and the Open LLM Leaderboard (v1 & v2).
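
For illustration, an AlpacaEval 2 run with the official alpaca_eval package typically looks like the command below. The output path is a placeholder and the annotator config is the AlpacaEval 2 default; consult the official evaluation repositories for the exact settings used in the paper.

alpaca_eval --model_outputs results/simper_model_outputs.json --annotators_config weighted_alpaca_eval_gpt4_turbo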

Acknowledgement

We build our project based on

Reference

If you find our repo to be useful, please cite our paper:

@inproceedings{SimPER2025,
  title={SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters},
  author={Xiao, Teng and Yuan, Yige and Chen, Zhengyu and Li, Mingxiao and Liang, Shangsong and Ren, Zhaochun and Honavar, Vasant G.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
