Authors: Pranav Mahajan, Ihor Kendiukhov, Syed Hussain, Lydia Nottingham
📊 HuggingFace Dataset | 📄 Paper
This repository contains the analysis scripts for reproducing the results from our study on how elicitation protocols affect stated vs. revealed (SvR) preference correlation in language models.
The complete dataset with model responses and analysis outputs is available on HuggingFace at the link above.
## Key Findings

- Allowing neutrality/abstention during stated preference elicitation substantially improves Spearman's rank correlation (ρ) with forced-choice revealed preferences
- Additionally allowing abstention in revealed preferences drives ρ to near zero, because neutrality rates become very high
- System-prompt steering using stated preferences does not reliably improve SvR correlation
- SvR correlation is highly protocol-dependent
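The findings above are stated in terms of Spearman's rank correlation between stated and revealed value rankings. As an illustration only (not a repo script; the value names and ELO numbers below are invented), ρ can be computed from two per-value rating dictionaries like this:

```python
# Illustrative sketch: Spearman's rank correlation between two hypothetical
# per-value ELO rankings. The repo's real analysis lives under scripts/analysis/.
def spearman_rho(xs, ys):
    """Spearman's rho for tie-free lists:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ratings from the two elicitation protocols.
stated   = {"honesty": 1520, "privacy": 1480, "fairness": 1440, "autonomy": 1400}
revealed = {"honesty": 1490, "privacy": 1510, "fairness": 1430, "autonomy": 1410}
values = sorted(stated)  # align both rating vectors on the same value keys
rho = spearman_rho([stated[v] for v in values], [revealed[v] for v in values])
print(f"Spearman rho = {rho:.2f}")  # 0.80 for these invented numbers
```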
## Repository Structure

```
├── scripts/
│   ├── generation/       # Scripts to run model evaluations
│   ├── processing/       # Scripts to process raw model responses
│   ├── analysis/         # Scripts for statistical analysis
│   └── visualization/    # Scripts for creating figures
├── configs/              # Model configuration files
├── figures/              # Generated visualization outputs
├── results/              # Analysis results and CSVs
└── img/                  # Static images for README
```
## Setup

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Set up API keys for the model providers you want to evaluate:

   - OpenAI: `OPENAI_API_KEY`
   - Anthropic: `ANTHROPIC_API_KEY`
   - Together AI: `TOGETHER_API_KEY`
   - X.AI: `XAI_API_KEY`
   - OpenRouter: `OPENROUTER_API_KEY`
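As a convenience (not part of the repo), a small Python check can confirm which provider keys are exported in your shell before launching a run:

```python
import os

# Provider name -> environment variable expected by the generation scripts.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
    "xai": "XAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

available = [name for name, var in PROVIDER_KEYS.items() if os.environ.get(var)]
print("Providers with keys set:", ", ".join(available) or "none")
```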
## Usage

### Running Evaluations

Revealed preferences (forced protocol):

```bash
python scripts/generation/run_revealed_preferences_forced_protocol.py --api_provider openai --model gpt-4o --api_key $OPENAI_API_KEY
```

Stated preferences (expanded protocol):

```bash
python scripts/generation/run_stated_preferences_expanded_protocol.py --api_provider anthropic --model claude-sonnet-4.5 --api_key $ANTHROPIC_API_KEY
```

Batch evaluation:

```bash
./scripts/run_models.sh
```

### Processing Responses

Process revealed preference responses:

```bash
python scripts/processing/process_generations_revealed_expanded_protocol.py --model gpt-4o
```

Calculate ELO ratings:

```bash
python scripts/processing/calculate_elo_rating_revealed.py --model gpt-4o
python scripts/processing/calculate_elo_rating_stated.py --model gpt-4o
```

### Analysis

Compute SvR divergence:

```bash
python scripts/analysis/calculate_stated_revealed_divergence.py
```

Analyze neutrality rates:

```bash
python scripts/analysis/analyze_neutrality_revealed_expanded_protocol.py
python scripts/analysis/analyze_neutrality_stated_expanded_protocol.py
```

Analyze scaling trends:

```bash
python scripts/analysis/analyze_svr_scaling_trends.py
```

### Visualization

Visualize ELO ratings:

```bash
python scripts/visualization/visualize_elo_rating.py --model gpt-4o
```

Create the 3-panel scaling trends figure:

```bash
python scripts/visualization/create_svr_gap_scaling_trend_visualization_3_panels.py
```

### Generation Scripts

| Script | Description |
|---|---|
| `run_revealed_preferences_forced_protocol.py` | Forced binary choice revealed preferences |
| `run_revealed_preferences_expanded_protocol.py` | Expanded protocol (with neutrality) revealed preferences |
| `run_stated_preferences_forced_protocol.py` | Forced binary choice stated preferences |
| `run_stated_preferences_expanded_protocol.py` | Expanded protocol stated preferences |
| `run_revealed_preferences_steered.py` | Steering experiments |
### Processing Scripts

| Script | Description |
|---|---|
| `process_generations_revealed_expanded_protocol.py` | Process revealed preference responses |
| `process_generations_stated_expanded_protocol.py` | Process stated preference responses |
| `calculate_elo_rating_revealed.py` | Calculate ELO ratings from revealed preferences |
| `calculate_elo_rating_stated.py` | Calculate ELO ratings from stated preferences |
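For intuition, the ELO-rating scripts turn pairwise preference judgments into per-value ratings. Below is a minimal sketch of a standard Elo update; the K-factor of 32 and starting rating of 1500 are conventional assumptions, not necessarily the repo's settings:

```python
def elo_update(r_winner, r_loser, k=32.0):
    """One standard Elo update after the 'winner' value was preferred.

    expected = 1 / (1 + 10 ** ((r_loser - r_winner) / 400)) is the winner's
    expected score; both ratings shift by k * (1 - expected).
    """
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# Hypothetical pairwise judgments: (preferred value, rejected value).
judgments = [("honesty", "privacy"), ("honesty", "autonomy"), ("privacy", "autonomy")]
ratings = {v: 1500.0 for v in ("honesty", "privacy", "autonomy")}
for winner, loser in judgments:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])
```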
### Analysis Scripts

| Script | Description |
|---|---|
| `calculate_stated_revealed_divergence.py` | Calculate SvR divergence metrics |
| `analyze_neutrality_revealed_expanded_protocol.py` | Analyze neutrality rates in revealed preferences |
| `analyze_neutrality_stated_expanded_protocol.py` | Analyze neutrality rates in stated preferences |
| `analyze_svr_scaling_trends.py` | Analyze SvR correlation vs. model capability |
| `calculate_steering_improvement.py` | Analyze steering intervention effectiveness |
| `get_svr_scaling_trends_corr.py` | Statistical analysis of scaling trends |
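The neutrality analyses count how often a model abstains rather than picking a side; under the expanded revealed protocol, high abstention is what drives ρ toward zero. A minimal sketch (the `"neutral"` label and list-of-choices format are assumptions; the scripts' actual response schema may differ):

```python
def neutrality_rate(choices):
    """Fraction of pairwise trials where the model abstained."""
    return sum(c == "neutral" for c in choices) / len(choices)

# Hypothetical parsed choices from an expanded-protocol run.
choices = ["a", "neutral", "b", "neutral", "neutral", "a"]
print(f"neutrality rate = {neutrality_rate(choices):.2f}")  # 0.50
```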
### Visualization Scripts

| Script | Description |
|---|---|
| `visualize_elo_rating.py` | Visualize ELO ratings per model |
| `visualize_elo_rating_stated.py` | Visualize stated preference ELO ratings |
| `visualize_svr_correlation_by_model.py` | Visualize SvR correlation by model |
| `create_svr_gap_scaling_trend_visualization_3_panels.py` | Create the 3-panel scaling trends figure |
## Dataset

The complete dataset with all model responses and pre-computed analyses is available on HuggingFace at the link above.
The dataset includes:
- Model responses from 24 state-of-the-art language models
- Both forced and expanded protocol results
- Stated and revealed preferences for all models
- Pre-computed ELO ratings and statistical analyses
- Visualization outputs
## Citation

If you use this code or dataset, please cite:

```bibtex
@misc{mahajan2025mindthegap,
      title={Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models},
      author={Pranav Mahajan and Ihor Kendiukhov and Syed Hussain and Lydia Nottingham},
      year={2025},
}
```

And the original AIRiskDilemmas dataset:

```bibtex
@misc{chiu2025aitellliessave,
      title={Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas},
      author={Yu Ying Chiu and Zhilin Wang and Sharan Maiya and Yejin Choi and Kyle Fish and Sydney Levine and Evan Hubinger},
      year={2025},
      eprint={2505.14633},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14633},
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.