TL;DR: We propose a preference fine-tuning algorithm that is simpler yet more effective than DPO and SimPO, requiring neither hyperparameters nor a reference model. SimPER consistently and significantly outperforms DPO and the more recent SimPO across various settings.
Our SimPER has been used as the training algorithm in the recent work EXAONE Deep: Reasoning Enhanced Language Models from LG AI Research. Their open EXAONE series models (HuggingFace) achieve performance on math, science, and coding tasks that is better than or competitive with SOTA LLMs such as QwQ 32B and DeepSeek R1 671B!
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes also uses SimPER for preference optimization; it demonstrates superior performance compared to open-weight models in its class and remains competitive even against frontier-class models. See their models at (HuggingFace).
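As described in the paper, SimPER drops both the reference model and the tunable hyperparameters of DPO/SimPO and instead directly optimizes the perplexity of the chosen and rejected responses under the policy. The snippet below is a minimal PyTorch sketch of that idea with illustrative tensor names; it is not the exact implementation, so please refer to `scripts/run_simper.py` and the paper for the precise loss.

```python
import torch

def simper_style_loss(chosen_logps: torch.Tensor,
                      rejected_logps: torch.Tensor,
                      chosen_lengths: torch.Tensor,
                      rejected_lengths: torch.Tensor) -> torch.Tensor:
    """Sketch of a SimPER-style objective: favor low perplexity on the chosen
    response and high perplexity on the rejected one, with no reference model
    and no tunable hyperparameter. Inputs are summed token log-probabilities
    under the policy and the corresponding response lengths (token counts)."""
    # Length-normalized log-likelihood of each response (negative log-perplexity).
    chosen_avg_logp = chosen_logps / chosen_lengths
    rejected_avg_logp = rejected_logps / rejected_lengths
    # Inverse perplexity lies in (0, 1]; higher means the policy prefers the response.
    chosen_inv_ppl = torch.exp(chosen_avg_logp)
    rejected_inv_ppl = torch.exp(rejected_avg_logp)
    # Maximize the gap between chosen and rejected inverse perplexities.
    return -(chosen_inv_ppl - rejected_inv_ppl).mean()
```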
First, install PyTorch 2.1.2 from the PyTorch Installation Page.
Create a Python virtual environment using e.g. Conda:
```shell
conda create -n SimPER python=3.10 && conda activate SimPER
```

We provide an environment file listing the Python package versions used in our experiments for reproducibility.
You will also need Flash Attention 2 installed, which can be done by running:
```shell
python -m pip install flash-attn --no-build-isolation
```
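As an optional sanity check (a small sketch, assuming the environment created above), you can confirm that PyTorch and Flash Attention 2 import correctly before launching training:

```python
# Quick environment check: both imports should succeed on a CUDA machine.
import torch
import flash_attn

print(torch.__version__)          # expected 2.1.2 per the instructions above
print(torch.cuda.is_available())  # the training configs assume CUDA GPUs
print(flash_attn.__version__)     # Flash Attention 2 should report a 2.x version
```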
We provide four training config files for the four training setups reported in our paper. The training configs are set for 4x A100 GPUs.

- Mistral-Base:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-base-simper.yaml
```

- Mistral-Instruct:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/mistral-7b-instruct-simper.yaml
```

- Llama3-Base:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-base-simper.yaml
```

- Llama3-Instruct:

```shell
ACCELERATE_LOG_LEVEL=info accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/run_simper.py training_configs/llama-3-8b-instruct-simper.yaml
```

We follow the official implementations for evaluation on AlpacaEval 2, MT-Bench, and the Open LLM Leaderboard (v1 & v2), as follows:
- AlpacaEval 2: Please refer to the AlpacaEval repo for evaluation.
- MT-Bench: Please refer to the FastChat repo for evaluation.
- Open LLM Leaderboard (v1): Please refer to the Language Model Evaluation Harness and the Open LLM Leaderboard v1 for evaluation.
- Open LLM Leaderboard (v2): Please refer to the Language Model Evaluation Harness and the Open LLM Leaderboard v2 for evaluation.
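Before running the full benchmark suites, it can be helpful to spot-check a trained checkpoint with a quick generation. The snippet below is only an illustrative sketch; the checkpoint path is a placeholder for whatever output directory your training config writes to.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: point this at the output directory of your SimPER run.
model_path = "outputs/llama-3-8b-instruct-simper"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Briefly explain how SimPER differs from DPO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```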
We build our project based on
If you find our repo to be useful, please cite our paper:
```bibtex
@inproceedings{SimPER2025,
  title={SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters},
  author={Xiao, Teng and Yuan, Yige and Chen, Zhengyu and Li, Mingxiao and Liang, Shangsong and Ren, Zhaochun and Honavar, Vasant G.},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}
```