Demo code for "Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models".
LLMs are revolutionizing society fast! 🚀 Ever wondered if your LLM assistant could be biased? Could it affect your important decisions, job prospects, legal matters, healthcare, or even your kid’s future education? 😱 Need a flexible framework to measure this risk?
Check out our new paper: the Prejudice-Volatility Framework (PVF)! 📑 Unlike previous methods, we measure LLMs' discrimination risk by considering both the models' persistent bias and their preference changes across contexts. Intuitively, unlike rolling a biased die, an LLM's bias shifts with its environment (the conditioning prompts)!
Our findings? 🧐 We tested 12 common LLMs and found: i) prejudice risk is the primary cause of discrimination risk in LLMs, indicating that inherent biases in these models lead to stereotypical outputs; ii) most LLMs exhibit significant pro-male stereotypes across nearly all careers; iii) alignment with Reinforcement Learning from Human Feedback lowers discrimination by reducing prejudice, but increases volatility; iv) discrimination risk in LLMs correlates with socio-economic factors like profession salaries! 📊
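To make the prejudice/volatility intuition concrete, here is a minimal sketch of the decomposition idea: persistent bias is the average preference across contexts, while volatility is how much that preference fluctuates with the conditioning prompt. The numbers and variable names below are purely illustrative, not the paper's actual estimator or data.

```python
import statistics

# Hypothetical bias scores for one career word: e.g. the log-probability
# gap favoring "he" over "she", measured under several context prompts.
# (Illustrative values only; PVF's real estimator is defined in the paper.)
bias_by_context = [0.42, 0.35, 0.51, 0.28, 0.44]

# Prejudice-style term: the persistent (average) bias across contexts.
prejudice = statistics.mean(bias_by_context)

# Volatility-style term: how much the bias changes with the prompt.
volatility = statistics.pstdev(bias_by_context)

print(f"prejudice  = {prejudice:.3f}")   # consistently pro-male here
print(f"volatility = {volatility:.3f}")  # sensitivity to context
```

A model can look harmless on average yet still be risky if its preferences swing widely across prompts, which is why PVF tracks both components rather than a single aggregate bias score.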
Create environment:

```shell
conda create -n pvf python=3.8.5
conda activate pvf
```

Install pytorch and python packages:

```shell
conda install -n pvf pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
cd Prejudice-Volatility-Framework
pip install -r requirements.txt
python -m spacy download en_core_web_lg
```

- Evaluate MaskedLM bert:

```shell
bash scripts/example_bert.sh
```

- Evaluate CausalLM gpt2:

```shell
bash scripts/example_gpt.sh
```

- Collect context templates:

```shell
bash scripts/collect_context_templates.sh
```

- Compute probabilities:

```shell
bash scripts/compute_probability.sh
```

- Compute risks:

```shell
bash scripts/compute_risk.sh
```

- Plot: refer to plot.ipynb.
