Prejudice-Volatility-Framework

Demo code for "Prejudice and Volatility: A Statistical Framework for Measuring Social Discrimination in Large Language Models".

LLMs are transforming society fast! 🚀 Ever wondered whether your LLM assistant could be biased? Could that bias affect your important decisions, job prospects, legal matters, healthcare, or even your kid's future education? 😱 Need a flexible framework to measure this risk?

Check out our new paper on the Prejudice-Volatility Framework (PVF)! 📑 Unlike previous methods, PVF measures an LLM's discrimination risk by considering both the model's persistent bias and how its preferences change across contexts. Intuitively, unlike a biased die, whose bias is fixed, an LLM's bias changes with its environment (the prompts it is conditioned on)!

Our findings? 🧐 We tested 12 common LLMs and found that: i) prejudice risk is the primary cause of discrimination risk in LLMs, indicating that inherent biases in these models lead to stereotypical outputs; ii) most LLMs exhibit significant pro-male stereotypes across nearly all careers; iii) alignment with Reinforcement Learning from Human Feedback (RLHF) lowers discrimination risk by reducing prejudice, but increases volatility; iv) discrimination risk in LLMs correlates with socio-economic factors such as profession salaries! 📊
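
To make the prejudice/volatility split concrete, here is a minimal Python sketch. It is not taken from this repository's code: the criterion `parity_gap`, the function `decompose_risk`, and the probabilities are all hypothetical. The decomposition it uses (overall risk as the average per-context risk, prejudice risk as the risk of the context-averaged preference, volatility risk as the remainder) is an assumption about how such a split can be defined; see the paper for the actual definitions.

```python
from statistics import mean


def parity_gap(p_male):
    """Hypothetical convex criterion: 0 at parity (p = 0.5), 1 at p = 0 or p = 1."""
    return (2.0 * p_male - 1.0) ** 2


def decompose_risk(p_male_by_context):
    """Split the risk for one occupation into prejudice and volatility parts.

    p_male_by_context: assumed probabilities that the model picks "male"
    for the same occupation under different context templates.
    """
    # Overall risk: average of the criterion over all contexts.
    overall = mean(parity_gap(p) for p in p_male_by_context)
    # Prejudice risk: criterion applied to the context-averaged preference.
    prejudice = parity_gap(mean(p_male_by_context))
    # Volatility risk: what remains, caused by variation across contexts.
    volatility = overall - prejudice
    return overall, prejudice, volatility


# Made-up probabilities for one occupation under four context templates.
probs = [0.9, 0.7, 0.8, 0.6]
overall, prejudice, volatility = decompose_risk(probs)
print(f"overall={overall:.3f} prejudice={prejudice:.3f} volatility={volatility:.3f}")
# prints: overall=0.300 prejudice=0.250 volatility=0.050
```

Because the assumed criterion is convex, the volatility term is never negative (Jensen's inequality). A model that gives the same probability in every context has zero volatility, which is the biased-die case.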

Replication

Environment Setup

Create environment:

conda create -n pvf python=3.8.5
conda activate pvf

Install PyTorch and the Python packages:

conda install -n pvf pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch
cd Prejudice-Volatility-Framework
pip install -r requirements.txt
python -m spacy download en_core_web_lg
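
After the steps above, a quick sanity check can confirm that the environment is usable before running any experiment. The sketch below is not part of the repository: `check_environment` is a hypothetical helper, and it only tests whether the packages can be located, not whether their versions match the ones pinned above.

```python
import sys
from importlib.util import find_spec


def check_environment(packages=("torch", "spacy", "en_core_web_lg")):
    """Report the Python version and whether each package can be found."""
    report = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `pvf` environment. Any package reported as `False` points to an install step that did not complete.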

Experiments

Toy Examples Explaining How PVF Works

Evaluate MaskedLM bert:

bash scripts/example_bert.sh

Evaluate CausalLM gpt2:

bash scripts/example_gpt.sh

Collect Context Templates:

bash scripts/collect_context_templates.sh

Replicate Results in the Paper

Compute probabilities:

bash scripts/compute_probability.sh

Compute risks:

bash scripts/compute_risk.sh

Plot the results: see plot.ipynb.
