CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics
This repository contains code for prompt generation, image generation, and evaluation experiments for CulturalFrames.
The increasing ubiquity of text-to-image (T2I) models as tools for visual content generation raises concerns about their ability to accurately represent diverse cultural contexts. In this work, we present the first study to systematically quantify the alignment of T2I models and evaluation metrics with both explicit and implicit cultural expectations. To this end, we introduce CulturalFrames, a novel benchmark designed for rigorous human evaluation of cultural representation in visual generations. Spanning 10 countries and 5 socio-cultural domains, CulturalFrames comprises 983 prompts, 3,637 corresponding images generated by 4 state-of-the-art T2I models, and over 10k detailed human annotations.
Code and implementation details will be available soon. Stay tuned for updates!
If you use CulturalFrames in your research, please cite our paper:
@misc{nayak2025culturalframes,
title={CulturalFrames: Assessing Cultural Expectation Alignment in Text-to-Image Models and Evaluation Metrics},
author={Shravan Nayak and Mehar Bhatia and Xiaofeng Zhang and Verena Rieser and Lisa Anne Hendricks and Sjoerd van Steenkiste and Yash Goyal and Karolina Stańczak and Aishwarya Agrawal},
year={2025},
eprint={2506.08835},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2506.08835}
}