This repository contains code to replicate the experiments performed in the paper "Uncertainty Drives Social Bias in Quantized Large Language Models" by Stanley Hua, Sanae Lotfi and Irene Chen.
We perform a large-scale study of social bias in quantized large language models. On 13 curated datasets, we evaluate 5 quantization methods (RTN/AWQ/GPTQ/SmoothQuant) on 10 open-source models (LLaMA/Qwen/Mistral) ranging from 0.5B to 14B parameters. We find that uncertain responses are the most likely to change after quantization, that social groups experience these changes asymmetrically, and that widespread response flipping can occur even when dataset-aggregate metrics do not change. Unsurprisingly, we find that 8-bit quantization leads to smaller bias changes than 4-bit quantization, and that quantization disrupts prior rankings on bias. On the other hand, we find no evidence that larger (14B) models are more robust to this phenomenon than smaller (0.5B) models.
We hope our work challenges the research community to think carefully about deploying quantized LLMs and to consider the varied impacts these subtle choices have on different members of society. Furthermore, we hope that, by example, our work can inspire higher standards and greater rigor in benchmarking efforts that measure social bias in LLMs.
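
To make the point about response flipping concrete, here is a small illustrative sketch. The data is made up, and the function names (`biased_rate`, `flip_rate`) are hypothetical helpers rather than the metrics used in the paper or in this repository. It shows how half of the responses can flip while the aggregate rate stays identical.

```python
def biased_rate(responses):
    """Fraction of responses marked as biased (1 = biased, 0 = unbiased)."""
    return sum(responses) / len(responses)


def flip_rate(before, after):
    """Fraction of questions whose response changed between two models."""
    if len(before) != len(after):
        raise ValueError("both response lists must cover the same questions")
    return sum(b != a for b, a in zip(before, after)) / len(before)


# Made-up responses to the same 8 questions, before and after quantization
full_precision = [1, 0, 1, 0, 1, 0, 0, 0]
quantized = [0, 1, 1, 0, 0, 1, 0, 0]

print(biased_rate(full_precision))  # aggregate metric before quantization
print(biased_rate(quantized))       # aggregate metric after quantization
print(flip_rate(full_precision, quantized))  # per-question change
```

In this toy case both aggregate rates are equal even though four of the eight responses changed, which is why per-response comparisons can reveal changes that dataset-level metrics hide.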
| Style | Capability | Dataset | Questions |
|---|---|---|---|
| Closed | 1 | CEB-Recognition | 1,600 |
| Closed | 1 | CEB-Jigsaw | 1,500 |
| Closed | 2 | CEB-Adult | 1,000 |
| Closed | 2 | CEB-Credit | 1,000 |
| Closed | 3 | BiasLens-Choices | 10,917 |
| Closed | 3 | SocialStigmaQA | 10,360 |
| Closed | 3 | BBQ | 29,238 |
| Closed | 3 | IAT | 13,858 |
| Closed | 3 | StereoSet-Intersentence | 2,123 |
| Open | 3 | BiasLens-GenWhy | 10,972 |
| Open | 3 | CEB-Continuation | 800 |
| Open | 3 | CEB-Conversation | 800 |
| Open | 3 | FMT10K-IM | 1,655 |
| | | Total | 85,823 |

In closed-ended datasets, a response is selected from a fixed set of options. We score each choice by the geometric mean of its token probabilities and use these scores to select a response. In open-ended datasets, a text response is generated with greedy decoding and later evaluated asynchronously using LLaMA Guard 8B.
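
As a rough illustration of the closed-ended scoring described above, the sketch below computes the geometric mean token probability of each choice from per-token log-probabilities. It assumes the choice with the highest score is selected, and the names `geometric_mean_prob` and `select_choice` are hypothetical; this is not the repository's actual implementation.

```python
import math


def geometric_mean_prob(token_logprobs):
    """Geometric mean of token probabilities, i.e. exp(mean of log-probs)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def select_choice(choice_logprobs):
    """Return (index of the highest-scoring choice, per-choice scores).

    Assumption: the response is the choice with the highest score.
    """
    scores = [geometric_mean_prob(lp) for lp in choice_logprobs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Made-up log-probabilities for three answer options of different lengths
choice_logprobs = [
    [math.log(0.5), math.log(0.5)],
    [math.log(0.9), math.log(0.1)],
    [math.log(0.8), math.log(0.8), math.log(0.8)],
]

best, scores = select_choice(choice_logprobs)
print(best, scores)
```

Averaging in log space normalizes for length, so a longer option is not penalized simply for containing more tokens, which a raw product of probabilities would do.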
- Package Installation via Pixi
# Get repository
git clone https://github.com/stan-hua/PostTrainingBiasBenchmark
cd PostTrainingBiasBenchmark
# Install pixi (a faster package manager alternative to conda)
curl -fsSL https://pixi.sh/install.sh | sh
# Install dependencies
# NOTE: -e specifies the environment
# NOTE: The following environments are available
# `vllm`: for performing inference with vLLM
# `analysis`: for performing analysis and generating plots
# `quantizer`: for quantizing models locally
# `simpo`: for performing SimPO experiment
pixi shell -e vllm

- (Optional) Registering your OpenAI key
NOTE: Most of our code is designed to run models locally. One exception is the use of OpenAI models to extract social groups from datasets.
echo 'export OPENAI_KEY="[ENTER HERE]"' >> ~/.bashrc
source ~/.bashrc

- Generate LLM responses
# Activate environment
pixi shell -e vllm
# Option 1. In shell
MODEL_NICKNAME="llama3.1-8b-instruct" # shorthand defined in config.py / MODEL_INFO
python -m scripts.benchmark generate ${MODEL_NICKNAME};
# Option 2. In a SLURM batch job
# NOTE: Modify sbatch script to run specified models
sbatch slurm/generate_responses.sh

- Use LLaMA-Guard to evaluate safety of open-ended responses
# Option 1. In shell
MODEL_NICKNAME="llama3.1-8b-instruct" # shorthand defined in config.py / MODEL_INFO
python -m scripts.benchmark bias_evaluate ${MODEL_NICKNAME};
# Option 2. In a SLURM batch job
# NOTE: Modify sbatch script to evaluate specified models
sbatch slurm/evaluate_responses.sh

- Reproduce paper figures and tables
# Option 1. In a SLURM batch job
sbatch slurm/create_paper_figures.sh

To add a new model, please update MODEL_INFO in config.py.
Example: "Meta-Llama-3.1-8B-Instruct-GPTQ-4bit"
1. In `MODEL_INFO['model_group']`, append "llama3.1-8b-instruct"
2. In `MODEL_INFO['model_path_to_name']`, provide mapping of HuggingFace / local path to a model shorthand.
NOTE: It should follow the standard: `[original_model]-[q_method]-[bit_configuration]`.
e.g., {"Meta-Llama-3.1-8B-Instruct-GPTQ-4bit": "llama3.1-8b-instruct-gptq-w4a16"},

Special thanks to the authors of the CEB Benchmark, whose code base served as the starting point for this repository.
If you find our work useful, please consider citing our paper!
@article{YourName,
title={Your Title},
author={Your team},
journal={Location},
year={Year}
}

To guide contributors, we provide one-line explanations of the important folders in the repository.
./
├── data/ # Data directory
│ ├── closed_datasets/ # Closed-ended datasets
│ ├── open_datasets/ # Open-ended datasets
│ └── save_data/ # Saved artifacts from inference
│ ├── llm_generations/ # Contains responses generated by each model
│ ├── analysis/ # Contains analysis related data
│ └── models/ # Contains local models
├── scripts/ # Contains scripts to run
├── slurm/ # Contains scripts for running on SLURM server
├── src/
│ ├── bin/ # Contains command-line script for renaming models
│ └── utils/ # Contains code for LLM inference and evaluation
└── config.py # Contains global constants