Description-Based Bias Benchmark (DBB)

Repository for the Description-Based Bias Benchmark (DBB) dataset.


Overview

The Description-Based Bias Benchmark (DBB) is a large-scale dataset designed to systematically evaluate social bias in large language models (LLMs) at the semantic, contextual, and descriptive levels.


Dataset Summary

  • Instances: ~103,649 question/option pairs
  • Categories: Age, Gender, Race/Ethnicity, Socioeconomic Status (SES), Religion

Intended Use

  • Purpose: Benchmark and analyze social bias in LLMs at the description level.
  • Users: NLP researchers, LLM developers, fairness auditors.

Dataset Structure

Each item contains the following fields (a hypothetical example record follows the list):

  • A context: a scenario that references a demographic identity, explicitly or implicitly
  • Two answer options reflecting opposing concepts
  • The concept pair
  • An explanation of the traditional stereotype
  • A category label (e.g., gender, SES)
  • A biased target (e.g., male, young)
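The sketch below shows one record as a Python dict. The field names and values are illustrative assumptions, not the actual column headers of Bias-Dataset.csv; inspect the CSV for the real schema.

# Illustrative DBB record (field names and values are assumptions,
# not the actual CSV headers; inspect data/Bias-Dataset.csv for the schema).
example = {
    "context": "Two engineers, one young and one older, present the same design.",
    "option_a": "The older engineer's proposal is outdated.",
    "option_b": "The older engineer's proposal is innovative.",
    "concept_pair": ("outdated", "innovative"),
    "stereotype_explanation": "Older workers are stereotyped as less adaptable.",
    "category": "Age",
    "biased_target": "old",
}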

Data Generation & Quality Control

  • Bias concepts adapted from SOFA, BBQ, StereoSet, and CrowS-Pairs.
  • Contexts and options generated using GPT-4o, then refined.
  • Manual review: every included sample was individually reviewed and confirmed to meet the success criteria for fluency, coherence, and semantic alignment.

Important Files and Codes

Concept Lists

We use GPT-4o to extract stereotypical concepts from SOFA, BBQ, StereoSet, and CrowS-Pairs, and pair each with a corresponding anti-stereotypical concept.
The concepts are in 📂 concept_lists/📄 modified_all_concepts074.csv
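A quick way to inspect the concept pairs is to load the CSV with pandas; this is a minimal sketch, and the printed column names tell you the actual schema.

# Minimal sketch: inspect the stereotype/anti-stereotype concept pairs.
import pandas as pd

concepts = pd.read_csv("concept_lists/modified_all_concepts074.csv")
print(concepts.columns.tolist())   # the actual schema
print(concepts.head())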

Dataset

Our DBB dataset is in 📂 data/📄 Bias-Dataset.csv

Bias-Dataset-More-Samples.zip contains additional samples for the dataset.
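Both files load directly with pandas. Reading the .zip in one call assumes it contains a single CSV (pandas infers the compression from the file extension), and its location under data/ is also an assumption.

# Minimal sketch: load the DBB dataset.
import pandas as pd

dbb = pd.read_csv("data/Bias-Dataset.csv")
print(dbb.shape)

# Assumption: the archive sits in data/ and holds exactly one CSV file.
more = pd.read_csv("data/Bias-Dataset-More-Samples.zip")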


The instructions below show how to generate a dataset for exploring bias via the description-based method. The code is in 📂 src/

Extract Concepts

python concept_analysis.py --model_name=gpt-4o --dataset=bbq --all

You can use any dataset you want; you are not limited to the datasets mentioned above. A sketch for batching over several source datasets follows.
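To run extraction over several source datasets in one pass, a thin wrapper like this works; the --dataset identifiers other than bbq are assumptions, so check concept_analysis.py for the values it actually accepts.

# Sketch: run concept extraction over several source datasets.
import subprocess

for dataset in ["bbq", "stereoset", "crows_pairs", "sofa"]:  # names other than "bbq" are assumptions
    subprocess.run(
        ["python", "concept_analysis.py",
         "--model_name=gpt-4o", f"--dataset={dataset}", "--all"],
        check=True,
    )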

Generate Raw Questions

python q_generate.py --model_name=gpt-4o --all_q

Final Questions

Use questions_final.ipynb to replace the [[X]] placeholders and finish question generation. A stand-alone sketch of the substitution step follows.
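The core of that step is a string substitution. The sketch below assumes the raw questions sit in a CSV with question and concept columns; the file and column names are assumptions about the intermediate output, not the notebook's actual variables.

# Sketch of the [[X]] substitution done in questions_final.ipynb.
# File and column names are assumptions about the intermediate output.
import pandas as pd

raw = pd.read_csv("raw_questions.csv")
raw["question_final"] = [
    q.replace("[[X]]", c) for q, c in zip(raw["question"], raw["concept"])
]
raw.to_csv("questions_final.csv", index=False)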

Results

GPT-4o-results.zip contains GPT-4o's result for each question in DBB. A sketch for summarizing these results follows.
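One simple summary of these results is the rate at which the model picks the stereotype-aligned option. The sketch below assumes a hypothetical schema with chosen_option and biased_option columns; replace the file path and column names with the actual fields found inside the zip.

# Sketch: stereotype-aligned choice rate (file path and schema are assumptions).
import pandas as pd

results = pd.read_csv("results.csv")  # hypothetical file extracted from GPT-4o-results.zip
bias_rate = (results["chosen_option"] == results["biased_option"]).mean()
print(f"Stereotype-aligned choice rate: {bias_rate:.3f}")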

Citation

If you use DBB in your work, please cite:

@inproceedings{pan-etal-2025-whats,
    title = "What{'}s Not Said Still Hurts: A Description-Based Evaluation Framework for Measuring Social Bias in {LLM}s",
    author = "Pan, Jinhao  and
      Raj, Chahat  and
      Yao, Ziyu  and
      Zhu, Ziwei",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.76/",
    pages = "1438--1459",
    ISBN = "979-8-89176-335-7",
}
