SPAR-SvR/Mind-the-Gap

Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models

Authors: Pranav Mahajan, Ihor Kendiukhov, Syed Hussain, Lydia Nottingham

📊 HuggingFace Dataset | 📄 Paper

This repository contains the analysis scripts for reproducing the results from our study on how elicitation protocols affect stated vs. revealed (SvR) preference correlation in language models.

The complete dataset, including model responses and analysis outputs, is available on HuggingFace (linked above and in the Dataset section below).

Key Findings

  • Allowing neutrality/abstention during stated preference elicitation substantially improves Spearman's rank correlation (ρ) with forced-choice revealed preferences
  • Also allowing abstention in revealed preferences drives ρ to near zero because of high neutrality rates
  • System prompt steering using stated preferences does not reliably improve SvR correlation
  • SvR correlation is highly protocol-dependent
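
The findings above are stated in terms of Spearman's ρ between stated and revealed preference rankings. As a minimal sketch of that statistic (not the repository's implementation, which may rely on a library routine instead), the following computes ρ as the Pearson correlation of average ranks. The function names and the rating numbers are illustrative assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ranking is constant
    return cov / (var_x * var_y) ** 0.5


# Made-up ratings for five values, listed in the same order in both lists.
stated_ratings = [1200, 1100, 1000, 900, 800]
revealed_ratings = [1150, 1180, 950, 990, 700]
rho = spearman_rho(stated_ratings, revealed_ratings)
```

A value of ρ near 1 means the two protocols order the values similarly; a value near 0 means the orderings are essentially unrelated.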

Repository Structure

├── scripts/
│   ├── generation/          # Scripts to run model evaluations
│   ├── processing/          # Scripts to process raw model responses
│   ├── analysis/            # Scripts for statistical analysis
│   └── visualization/       # Scripts for creating figures
├── configs/                 # Model configuration files
├── figures/                 # Generated visualization outputs
├── results/                 # Analysis results and CSVs
└── img/                     # Static images for README

Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Set up API keys for the model providers you want to evaluate:
    • OpenAI: OPENAI_API_KEY
    • Anthropic: ANTHROPIC_API_KEY
    • Together AI: TOGETHER_API_KEY
    • X.AI: XAI_API_KEY
    • OpenRouter: OPENROUTER_API_KEY
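
Before launching a run, it can help to confirm that the relevant keys are actually set. The sketch below is a hypothetical helper, not part of the repository: the environment variable names come from the list above, while the lowercase provider labels and the function name are assumptions made for illustration.

```python
import os

# Environment variable per provider, as listed in the setup steps.
# The dictionary keys are illustrative labels and may not match the
# values the scripts accept for --api_provider.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
    "xai": "XAI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def missing_api_keys(providers, env=None):
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [
        PROVIDER_ENV_VARS[p]
        for p in providers
        if not env.get(PROVIDER_ENV_VARS[p])
    ]
```

Calling `missing_api_keys(["openai", "anthropic"])` returns an empty list when both keys are set, and otherwise names the variables that still need to be exported.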

Usage

Running Model Evaluations

Revealed Preferences (Forced Protocol):

python scripts/generation/run_revealed_preferences_forced_protocol.py --api_provider openai --model gpt-4o --api_key $OPENAI_API_KEY

Stated Preferences (Expanded Protocol):

python scripts/generation/run_stated_preferences_expanded_protocol.py --api_provider anthropic --model claude-sonnet-4.5 --api_key $ANTHROPIC_API_KEY

Batch Evaluation:

./scripts/run_models.sh
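
The contents of run_models.sh are not reproduced here. As a rough sketch of what a batch run could look like from Python, the helper below assembles the command line for the forced-protocol revealed preference script using only the flags shown above. The helper name, the model list, and the choice to read keys from environment variables are assumptions.

```python
import os

# Script path and flags are taken from the usage examples above.
SCRIPT = "scripts/generation/run_revealed_preferences_forced_protocol.py"

# (provider, model, environment variable holding the key); illustrative list.
MODELS = [
    ("openai", "gpt-4o", "OPENAI_API_KEY"),
    ("anthropic", "claude-sonnet-4.5", "ANTHROPIC_API_KEY"),
]


def build_command(provider, model, key_var, env=None):
    """Build the argv list for one model, failing early if its key is missing."""
    env = os.environ if env is None else env
    api_key = env.get(key_var)
    if not api_key:
        raise ValueError(f"{key_var} is not set")
    return [
        "python", SCRIPT,
        "--api_provider", provider,
        "--model", model,
        "--api_key", api_key,
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in a loop over `MODELS`. Avoid printing the full command, since it contains the API key.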

Processing Results

Process Revealed Preference Responses:

python scripts/processing/process_generations_revealed_expanded_protocol.py --model gpt-4o

Calculate Elo Ratings:

python scripts/processing/calculate_elo_rating_revealed.py --model gpt-4o
python scripts/processing/calculate_elo_rating_stated.py --model gpt-4o
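
For readers unfamiliar with Elo ratings, the sketch below shows the standard sequential Elo update applied to pairwise comparisons between values. It is a generic illustration, not the logic of the scripts above: the K-factor, the initial rating, and the treatment of a neutral response as a draw (score 0.5) are all assumptions.

```python
def expected_score(r_a, r_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_ratings(comparisons, k=32.0, initial=1000.0):
    """Update ratings sequentially over (a, b, score_a) triples.

    score_a is 1.0 if a was chosen, 0.0 if b was chosen, and 0.5 for a
    neutral response (an assumption; the scripts may handle neutrality
    differently).
    """
    ratings = {}
    for a, b, score_a in comparisons:
        r_a = ratings.setdefault(a, initial)
        r_b = ratings.setdefault(b, initial)
        e_a = expected_score(r_a, r_b)
        # The two updates are equal and opposite, so total rating is conserved.
        ratings[a] = r_a + k * (score_a - e_a)
        ratings[b] = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return ratings
```

Under this scheme, a neutral response between two equally rated values leaves both ratings unchanged, which gives some intuition for why high neutrality rates can flatten a ranking.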

Analysis

Compute SvR Divergence:

python scripts/analysis/calculate_stated_revealed_divergence.py

Analyze Neutrality Rates:

python scripts/analysis/analyze_neutrality_revealed_expanded_protocol.py
python scripts/analysis/analyze_neutrality_stated_expanded_protocol.py
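
A neutrality rate is the fraction of responses in which the model abstains rather than choosing a side. The sketch below illustrates that calculation on a plain list of labels; the label strings and the function name are hypothetical and do not reflect the format of the processed response files.

```python
from collections import Counter


def neutrality_rate(labels, neutral_label="neutral"):
    """Fraction of responses equal to neutral_label."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return counts[neutral_label] / len(labels)


# Hypothetical parsed responses for one model under the expanded protocol.
responses = ["A", "neutral", "B", "neutral"]
rate = neutrality_rate(responses)
```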

Analyze Scaling Trends:

python scripts/analysis/analyze_svr_scaling_trends.py

Visualization

Visualize Elo Ratings:

python scripts/visualization/visualize_elo_rating.py --model gpt-4o

Create 3-Panel Scaling Trends Figure:

python scripts/visualization/create_svr_gap_scaling_trend_visualization_3_panels.py

Scripts Reference

Generation (scripts/generation/)

| Script | Description |
|---|---|
| run_revealed_preferences_forced_protocol.py | Forced binary choice revealed preferences |
| run_revealed_preferences_expanded_protocol.py | Expanded protocol (with neutrality) revealed preferences |
| run_stated_preferences_forced_protocol.py | Forced binary choice stated preferences |
| run_stated_preferences_expanded_protocol.py | Expanded protocol stated preferences |
| run_revealed_preferences_steered.py | Steering experiments |

Processing (scripts/processing/)

| Script | Description |
|---|---|
| process_generations_revealed_expanded_protocol.py | Process revealed preference responses |
| process_generations_stated_expanded_protocol.py | Process stated preference responses |
| calculate_elo_rating_revealed.py | Calculate Elo ratings from revealed preferences |
| calculate_elo_rating_stated.py | Calculate Elo ratings from stated preferences |

Analysis (scripts/analysis/)

| Script | Description |
|---|---|
| calculate_stated_revealed_divergence.py | Calculate SvR divergence metrics |
| analyze_neutrality_revealed_expanded_protocol.py | Analyze neutrality rates in revealed preferences |
| analyze_neutrality_stated_expanded_protocol.py | Analyze neutrality rates in stated preferences |
| analyze_svr_scaling_trends.py | Analyze SvR correlation vs model capabilities |
| calculate_steering_improvement.py | Analyze steering intervention effectiveness |
| get_svr_scaling_trends_corr.py | Statistical analysis of scaling trends |

Visualization (scripts/visualization/)

| Script | Description |
|---|---|
| visualize_elo_rating.py | Visualize Elo ratings per model |
| visualize_elo_rating_stated.py | Visualize stated preference Elo ratings |
| visualize_svr_correlation_by_model.py | Visualize SvR correlation by model |
| create_svr_gap_scaling_trend_visualization_3_panels.py | Create 3-panel scaling trends figure |

Dataset

The complete dataset with all model responses and pre-computed analyses is available on HuggingFace:

https://huggingface.co/datasets/LydiaNottingham/MindTheGap

The dataset includes:

  • Model responses from 24 state-of-the-art language models
  • Both forced and expanded protocol results
  • Stated and revealed preferences for all models
  • Pre-computed Elo ratings and statistical analyses
  • Visualization outputs

Citation

If you use this code or dataset, please cite:

@misc{mahajan2025mindthegap,
    title={Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models},
    author={Pranav Mahajan and Ihor Kendiukhov and Syed Hussain and Lydia Nottingham},
    year={2025},
}

And the original AIRiskDilemmas dataset:

@misc{chiu2025aitellliessave,
    title={Will AI Tell Lies to Save Sick Children? Litmus-Testing AI Values Prioritization with AIRiskDilemmas},
    author={Yu Ying Chiu and Zhilin Wang and Sharan Maiya and Yejin Choi and Kyle Fish and Sydney Levine and Evan Hubinger},
    year={2025},
    eprint={2505.14633},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2505.14633},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.
