RMB is a comprehensive RM benchmark that covers over 49 real-world scenarios and includes both pairwise and Best-of-N (BoN) evaluations to better reflect the effectiveness of RMs in guiding alignment optimization.
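As a rough illustration of the two protocols (a minimal sketch, not the repo's evaluation code; `score` stands in for any reward model), pairwise evaluation checks whether the RM prefers the chosen response, while BoN checks whether the RM ranks the known-best response above all alternatives:

```python
# Minimal sketch of the two evaluation protocols.
# `score(prompt, response) -> float` is a stand-in for any reward model;
# this is illustrative only, not the repo's actual evaluation code.

def pairwise_accuracy(pairs, score):
    """pairs: list of (prompt, chosen, rejected) triples."""
    hits = sum(score(p, c) > score(p, r) for p, c, r in pairs)
    return hits / len(pairs)

def bon_accuracy(items, score):
    """items: list of (prompt, best_response, other_responses) triples."""
    hits = sum(
        score(p, best) > max(score(p, o) for o in others)
        for p, best, others in items
    )
    return hits / len(items)
```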
- [23/01/2025] Our paper has been accepted by ICLR 2025❤️🔥
- [11/10/2024] Our report is on arXiv!
- Clone this repository and navigate to the RMB-Reward-Model-Benchmark folder
git clone https://github.com/Zhou-Zoey/RMB-Reward-Model-Benchmark.git
cd RMB-Reward-Model-Benchmark
- Install Package
conda create -n RMB python=3.10 -y
conda activate RMB
pip install --upgrade pip
pip install -r requirements.txt
- Install additional packages for some Reward Models
pip install flash-attn --no-build-isolation
Evaluate a Reward Model on our dataset
cd eval/scripts/shell
bash run_rm.sh
run_rm.sh:
# your models root dir
# If you want to download models from Hugging Face directly, there is no need to fill in model_path
model_path=''
models=(
'ArmoRM-Llama3-8B-v0.1'
# 'Eurus-RM-7b'
# 'stablelm-2-12b-chat'
# 'Starling-RM-34B'
# 'internlm2-7b-reward'
# 'internlm2-20b-reward'
# 'tulu-v2.5-13b-preference-mix-rm'
)
# your RMB_dataset path
dataset_path=''
datasets=(
'Pairwise_set/Helpfulness/Brainstorming/Idea Development.json'
)
# you can also run on a whole data folder
# datasets=(
# 'Pairwise_set/Helpfulness/Brainstorming'
# 'Pairwise_set/Helpfulness/Chat'
# )
# your results path
result_path='../RMB-Reward-Model-Benchmark/eval/results'
for dataset in "${datasets[@]}"; do
    for model in "${models[@]}"; do
        echo "$dataset"
        echo "$model"
        # With --single_data=True, --dataset must point directly at the JSON file
        # (the entries in `datasets` above already end in .json).
        python RMB-Reward-Model-Benchmark/eval/scripts/my_run_rm.py \
            --model_dir="${model_path}" \
            --model="${model_path}/${model}" \
            --dataset_dir="${dataset_path}/${dataset}" \
            --single_data=True \
            --dataset="${dataset_path}/${dataset}" \
            --results="${result_path}/${model}_result.json"
    done
done
# > "log/eval".log 2>&1 &coming soon
The datasets we used to benchmark the reward models have been uploaded to the /RMB_dataset directory. You may have different opinions on some of the annotations, which is normal for preference data.
Note: some texts may be offensive in nature.
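If you want to build your own evaluation on top of the data, it is safest to inspect a file's schema directly rather than assume field names. A quick peek at one of the Pairwise_set files (the structure assumptions are noted in the comments):

```python
import json

# Peek at one Pairwise_set file shipped in RMB_dataset to check its schema.
path = "RMB_dataset/Pairwise_set/Helpfulness/Brainstorming/Idea Development.json"
with open(path, encoding="utf-8") as f:
    records = json.load(f)  # assumed to be a single JSON list; adjust if JSONL

print(len(records), "records")
print(sorted(records[0].keys()))  # inspect the field names rather than guessing
```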