Official repo for Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization (ACL 2025 Main Conference)
conda create -n rm_eval python=3.10 -y
conda activate rm_eval
pip install -r requirements.txt

To evaluate results, MARIO EVAL needs to be installed:
git clone https://github.com/MARIO-Math-Reasoning/MARIO_EVAL.git
cd MARIO_EVAL
cd latex2sympy && pip install . && cd ..
pip install -e .

bash scripts/run_classifier_rm.sh
bash scripts/run_prm.sh

The underlying codebase for evaluating reward models comes from RewardBench.
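The setup and run commands above reference `requirements.txt` and the two files under `scripts/` by relative path, so they are sensitive to the current directory (the MARIO EVAL steps end inside `MARIO_EVAL`). The following is a minimal preflight sketch, assuming those paths are relative to the repository root. `check_prereqs` is a hypothetical helper written for illustration and is not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the repo): verify that the files
# referenced by the commands above exist under the given directory.
check_prereqs() {
    local root="${1:-.}"   # directory to check; defaults to the current one
    local missing=0
    local path
    for path in requirements.txt scripts/run_classifier_rm.sh scripts/run_prm.sh; do
        if [ ! -f "$root/$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

One possible use is `check_prereqs . && bash scripts/run_classifier_rm.sh`, which runs the script only when the expected files are found from the current directory.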
@article{kim2025rethinking,
title={Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization},
author={Kim, Sunghwan and Kang, Dongjin and Kwon, Taeyoon and Chae, Hyungjoo and Lee, Dongha and Yeo, Jinyoung},
journal={arXiv preprint arXiv:2505.12763},
year={2025}
}