BiasFreeBench is an empirical benchmark that comprehensively compares eight mainstream bias mitigation techniques (covering four prompting-based and four training-based methods) on two test scenarios (multi-choice QA and open-ended multi-turn QA) by reorganizing existing datasets into a unified query-response setting. We hope that this benchmark can serve as a unified testbed for bias mitigation methods.
conda create -n biasfree python=3.12 -y && conda activate biasfree
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics]" --no-build-isolation
cd ..
pip install vllm==0.8.5 deepspeed==0.15.4 pandas wandb

- Example scripts for Llama-3.1-8B-Instruct, covering all debiasing methods explored in BiasFreeBench, are in BBQ/scripts/llama.sh and FairMT-Bench/scripts/llama.sh.
- When using a reasoning LLM, set --max_output_len, --temperature, --top_p, --top_k, and --min_p to the values officially recommended for the corresponding model. For example, for Qwen3-8B we set --max_output_len 32768 --temperature 0.6 --top_p 0.95 --top_k 20 --min_p 0, following the sampling parameters suggested in the Qwen3-8B model card (see the sketch below).
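For illustration, a hedged sketch of evaluation runs. The two bash scripts are the ones shipped with the repo; the python entry point and its --model_name_or_path flag are hypothetical placeholders, and only the sampling values for Qwen3-8B come from the model card as noted above.

# Run the provided example scripts for Llama-3.1-8B-Instruct
bash BBQ/scripts/llama.sh
bash FairMT-Bench/scripts/llama.sh

# Hypothetical sketch for a reasoning LLM (Qwen3-8B): the entry point and
# --model_name_or_path are illustrative; only the sampling flags below follow
# the model card recommendations.
python run_inference.py \
    --model_name_or_path Qwen/Qwen3-8B \
    --max_output_len 32768 \
    --temperature 0.6 \
    --top_p 0.95 \
    --top_k 20 \
    --min_p 0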
Following LLaMA-Factory's data and config format, we provide the training data and scripts. Copy them into the LLaMA-Factory directory:
cp Training/data/* LLaMA-Factory/data
cp -r Training/examples/debias LLaMA-Factory/examples
cp -r Training/scripts LLaMA-Factory/scripts
cd LLaMA-Factory

Configurations are in examples/debias. Example scripts for Llama-3.1-8B-Instruct are in scripts/llama.sh.
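As a sketch, a single configuration can also be launched directly with LLaMA-Factory's CLI. The config name below is one of the files in examples/debias (the full-SFT config also used for the task vector method); for multi-GPU full fine-tuning, the LLaMA-Factory documentation suggests prefixing the command with FORCE_TORCHRUN=1.

# Launch one debiasing training run from its config in examples/debias
# (pick the config matching your method and model)
llamafactory-cli train examples/debias/debias_full_sft_llama_tv.yaml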
For the task vector method:

- Run full SFT with LLaMA-Factory using the example config examples/debias/debias_full_sft_llama_tv.yaml.
- Clone the task vectors repository and copy our files into it:

cd Task_Vector
git clone https://github.com/mlfoundations/task_vectors.git
cp * task_vectors/src
cd task_vectors/src

- Modify the model paths in test.sh and then run bash test.sh (a hedged sketch follows).
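For a non-interactive edit, a hypothetical sketch is below. The variable names and the checkpoint path are assumptions; inspect test.sh for the variables it actually defines before editing.

# Hypothetical: point test.sh at the base instruction-tuned model and the
# full-SFT debiased checkpoint. Variable names and paths are illustrative.
sed -i 's|^PRETRAINED_MODEL=.*|PRETRAINED_MODEL=meta-llama/Llama-3.1-8B-Instruct|' test.sh
sed -i 's|^FINETUNED_MODEL=.*|FINETUNED_MODEL=/path/to/debias_full_sft_checkpoint|' test.sh
bash test.sh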
Follow Safe RLHF with its two training stages:
- Value models (reward model & cost model): --model_name_or_path is a local path or Hugging Face name of an instruction-tuned model, such as meta-llama/Llama-3.1-8B-Instruct or Qwen/Qwen2.5-7B-Instruct.
- Safe-RLHF: --actor_model_name_or_path is a local path or Hugging Face name of an instruction-tuned model.
An example of commands to run the whole pipeline with Llama-3.1-8B-Instruct is as follows:
git clone https://github.com/PKU-Alignment/safe-rlhf.git
cd safe-rlhf
conda env create --file conda-recipe.yaml
conda activate safe-rlhf
bash scripts/reward-model.sh --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --output_dir output/rm
bash scripts/cost-model.sh --model_name_or_path meta-llama/Llama-3.1-8B-Instruct --output_dir output/cm
bash scripts/ppo-lag.sh \
--actor_model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
--reward_model_name_or_path output/rm \
--cost_model_name_or_path output/cm \
--output_dir output/ppo-lag

If you find BiasFreeBench useful, please cite:

@article{biasfreebench25,
title={BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses},
author={Xin Xu and Xunzhi He and Churan Zhi and Ruizhe Chen and Julian McAuley and Zexue He},
year={2025},
url={https://arxiv.org/pdf/2510.00232}
}

- Thanks for the code from LLaMA-Factory and Safe RLHF.
- Thanks for the data from BBQ and FairMT-Bench.


