Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models

Sreejato Chatterjee¹, Linh Tran¹, Quoc Duy Nguyen¹, Roni Kirson¹, Drue Hamlin², Harvest Aquino¹, Hanjia Lyu¹, Jiebo Luo¹, Timothy Dye³

¹ University of Rochester

² Rochester Institute of Technology

³ University of Rochester School of Medicine & Dentistry

Accepted for publication in IEEE Big Data 2025: 11th Special Session on Intelligent Data Mining

Introduction

Traditional efforts to measure historical structural oppression struggle with cross-national validity because each country has its own locally specific history of exclusion, colonization, and social status. These efforts have often relied on structured indices that privilege material resources while overlooking lived, identity-based exclusion.

We introduce a novel framework for oppression measurement that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured self-identified ethnicity utterances from a multilingual COVID-19 global study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimations of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs.

Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts.

Example Usage

This pipeline uses LangChain to assign an oppression score (1-5) and an explanation to each free-text identity-country pair. It is designed for use in Google Colab and is configured by editing variables inside the script.

python ethnicity_assignment_pipeline.py
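The score-and-explanation output described above can be illustrated with a small parsing sketch. The reply format used here ("Score:" followed by "Explanation:") and the names `Assessment` and `parse_reply` are assumptions made for illustration; the actual script may structure model replies differently.

```python
import re
from typing import NamedTuple


class Assessment(NamedTuple):
    score: int
    explanation: str


# Hypothetical reply format (an assumption, not taken from the script):
#   Score: <integer 1-5>
#   Explanation: <free text>
_PATTERN = re.compile(
    r"Score:\s*(\d+)\s*Explanation:\s*(.+)", re.DOTALL | re.IGNORECASE
)


def parse_reply(reply: str) -> Assessment:
    """Extract a 1-5 score and its explanation from a model reply."""
    match = _PATTERN.search(reply)
    if match is None:
        raise ValueError("reply does not match the assumed format")
    score = int(match.group(1))
    if not 1 <= score <= 5:
        raise ValueError(f"score {score} is outside the 1-5 range")
    return Assessment(score, match.group(2).strip())
```

Rejecting out-of-range or malformed replies early keeps invalid scores out of the saved results, at the cost of needing a retry or manual review step for those rows.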

Before running, update the following lines inside the script:

# Choose LLM provider: "gemini" or "openai"
model_choice = "gemini"

# Choose prompt type: "vanilla", "cot", or "rule-guided"
prompt_mode = "rule-guided"

# Path to input Excel file (each sheet must contain columns: 'identity', 'country')
excel_path = "/content/drive/My Drive/Dye Lab/unmatched_identities.xlsx"
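Because the configuration is edited by hand, a quick check before running can catch typos. The following is a minimal sketch, not part of the original script: the helper names `check_config`, `missing_columns`, and `read_sheet_headers` are hypothetical, and the allowed values are taken from the comments above.

```python
ALLOWED_MODELS = {"gemini", "openai"}
ALLOWED_PROMPTS = {"vanilla", "cot", "rule-guided"}
REQUIRED_COLUMNS = {"identity", "country"}


def check_config(model_choice: str, prompt_mode: str) -> None:
    """Raise ValueError if either setting is not one of the documented values."""
    if model_choice not in ALLOWED_MODELS:
        raise ValueError(f"model_choice must be one of {sorted(ALLOWED_MODELS)}")
    if prompt_mode not in ALLOWED_PROMPTS:
        raise ValueError(f"prompt_mode must be one of {sorted(ALLOWED_PROMPTS)}")


def missing_columns(sheet_headers: dict) -> dict:
    """Map each sheet name to the sorted required columns it lacks."""
    problems = {}
    for sheet, headers in sheet_headers.items():
        missing = REQUIRED_COLUMNS - set(headers)
        if missing:
            problems[sheet] = sorted(missing)
    return problems


def read_sheet_headers(excel_path: str) -> dict:
    """Read only the header row of every sheet (requires pandas and openpyxl)."""
    import pandas as pd  # imported lazily so the checks above work without pandas

    sheets = pd.read_excel(excel_path, sheet_name=None, nrows=0)
    return {name: [str(col) for col in frame.columns] for name, frame in sheets.items()}
```

A possible use is `check_config(model_choice, prompt_mode)` followed by `missing_columns(read_sheet_headers(excel_path))`, which returns an empty dict when every sheet has both required columns.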

Note: In Google Colab, you will also need to:

Mount Google Drive:

from google.colab import drive
drive.mount('/content/drive')

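Add your API keys securely through the Colab Secrets panel (the key icon in the left sidebar), using the secret names GPT-Key and GeminiKey. The google.colab.userdata module reads secrets with get; it does not provide a set function:

from google.colab import userdata
openai_key = userdata.get("GPT-Key")
gemini_key = userdata.get("GeminiKey")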

The output will be saved as:

gemini_rule-guided.csv  # (or similar, depending on config)
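The output can be sanity-checked after a run. This sketch rests on two assumptions that the README does not confirm: that the file name follows the pattern `<model_choice>_<prompt_mode>.csv` (inferred from the example name above), and that the scores are stored in a column called `score`. Both helper names are hypothetical.

```python
import csv
import io


def expected_output_name(model_choice: str, prompt_mode: str) -> str:
    """Assumed naming pattern, inferred from 'gemini_rule-guided.csv'."""
    return f"{model_choice}_{prompt_mode}.csv"


def out_of_range_rows(csv_file, score_column: str = "score") -> list:
    """Return 1-based data-row numbers whose score is not an integer from 1 to 5."""
    bad = []
    for row_number, row in enumerate(csv.DictReader(csv_file), start=1):
        try:
            score = int(row[score_column])
        except (KeyError, TypeError, ValueError):
            # Missing column, empty cell, or non-numeric text.
            bad.append(row_number)
            continue
        if not 1 <= score <= 5:
            bad.append(row_number)
    return bad


# Usage with an in-memory file of placeholder rows:
sample = io.StringIO("identity,country,score\nA,X,3\nB,Y,9\nC,Z,n/a\n")
flagged = out_of_range_rows(sample)
```

For a real file, open it with `open(expected_output_name(model_choice, prompt_mode), newline="")` and pass the handle to `out_of_range_rows`, adjusting `score_column` to whatever the script actually writes.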

Reproducibility

Figure 2

python fig_2.py

Figures 3 & 4

python divergences.py

python fig_3_4.py

Figure 5

python fig_5.py

Table 1

python tab_1.py
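To regenerate every figure and table in one pass, the commands above can be chained. This is a minimal sketch that runs the scripts in the order they are listed; the `script_dir` parameter and the function names are assumptions, so point `script_dir` at wherever the scripts live in your checkout.

```python
import subprocess
import sys
from pathlib import Path

# Script names as listed above, in the same order.
SCRIPTS = ["fig_2.py", "divergences.py", "fig_3_4.py", "fig_5.py", "tab_1.py"]


def build_commands(script_dir: str = ".") -> list:
    """Build one command per script, using the current Python interpreter."""
    return [[sys.executable, str(Path(script_dir) / name)] for name in SCRIPTS]


def run_all(script_dir: str = ".") -> None:
    """Run each script in turn and stop at the first failure."""
    for command in build_commands(script_dir):
        subprocess.run(command, check=True)
```

Calling `run_all()` from the directory containing the scripts is equivalent to typing the five commands by hand; `check=True` raises `subprocess.CalledProcessError` as soon as one script exits with an error.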

Citation

@inproceedings{chatterjee2025oppression,
  title = {Assessing Historical Structural Oppression Worldwide via Rule-Guided Prompting of Large Language Models},
  author = {Chatterjee, Sreejato and Tran, Linh and Nguyen, Quoc Duy and Kirson, Roni and Hamlin, Drue and Aquino, Harvest and Lyu, Hanjia and Luo, Jiebo and Dye, Timothy},
  booktitle = {Proceedings of the 2025 IEEE International Conference on Big Data (IEEE Big Data)},
  year = {2025},
  publisher = {IEEE},
  address = {Macau, CN},
  url = {https://arxiv.org/abs/2509.15216},
  note = {11th Special Session on Intelligent Data Mining},
  organization = {IEEE}
}
