This repository contains the dataset and analysis code for studying variation in narrative perceptions of social media texts.
Clone the repository:
```bash
git clone git@github.com:joel-mire/story-perceptions.git
cd story-perceptions
```
Create a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`
```
Install dependencies:
```bash
python -m pip install -r requirements.txt
```
Run `preprocess.ipynb`, which:
- downloads the StorySeeker dataset;
- rehydrates the texts from StorySeeker's source dataset (TLDR);
- queries GPT-series models and Llama 3 for story labels (unless the results are already cached);
- preprocesses the crowd annotation data, our meta-annotations of the crowd's rationales, and the StorySeeker texts and labels.
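"Rehydrating" here means restoring the post texts for StorySeeker's labeled items by looking them up in the TLDR source data. The following is a minimal sketch of that idea as an inner join on a shared ID, not the notebook's actual code: the column names `id`, `label`, and `text` and the inline rows are hypothetical stand-ins, and this sketch simply drops items whose text cannot be found (the notebook may handle that case differently).

```python
import csv
import io

# Hypothetical stand-ins for the two sources; the real files and column
# names may differ ("id", "label", and "text" are assumptions).
STORYSEEKER_CSV = """id,label
a1,1
b2,0
c3,1
"""

TLDR_CSV = """id,text
a1,I finally finished my first marathon last weekend...
b2,What is the best way to learn Rust?
z9,An unrelated post that StorySeeker does not include.
"""


def rehydrate(labels_csv: str, source_csv: str) -> list[dict[str, str]]:
    """Attach source texts to label rows that share an id (an inner join)."""
    # Index the source texts by id for constant-time lookup.
    texts = {row["id"]: row["text"] for row in csv.DictReader(io.StringIO(source_csv))}
    rehydrated = []
    for row in csv.DictReader(io.StringIO(labels_csv)):
        text = texts.get(row["id"])
        if text is None:
            # No matching source text: skip the row rather than guess.
            continue
        rehydrated.append({**row, "text": text})
    return rehydrated


rows = rehydrate(STORYSEEKER_CSV, TLDR_CSV)
for row in rows:
    print(row["id"], row["label"], row["text"][:30])
```

Indexing the source texts in a dictionary keeps the join linear in the size of both inputs, which matters when the source corpus is much larger than the labeled subset.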
Run `analysis.ipynb`, which uses StoryPerceptions to explore three research questions:
- What are crowd workers’ descriptive perceptions of storytelling in social media texts?
- How do narrative perceptions differ among crowd workers?
- How do narrative perceptions differ across prescriptive labels from researchers, descriptive annotations from crowd workers, and predictions from LLM-based classifiers?
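As a rough illustration of the second question, one simple way to quantify how much crowd workers disagree about a single text is the share of annotators who agree with that text's majority label. This sketch is not the analysis in `analysis.ipynb` or the paper; the `(text_id, worker_id, is_story)` tuple layout and the values are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical crowd annotations as (text_id, worker_id, is_story).
# These values are invented; they are not taken from pc.csv.
ANNOTATIONS = [
    ("t1", "w1", True), ("t1", "w2", True), ("t1", "w3", True),
    ("t2", "w1", True), ("t2", "w2", False), ("t2", "w4", False),
]


def majority_agreement(annotations):
    """Return {text_id: (majority_label, share_of_workers_agreeing)}."""
    by_text = defaultdict(list)
    for text_id, _worker_id, label in annotations:
        by_text[text_id].append(label)
    result = {}
    for text_id, labels in by_text.items():
        # On a tie, Counter.most_common returns the label seen first.
        label, count = Counter(labels).most_common(1)[0]
        result[text_id] = (label, count / len(labels))
    return result


for text_id, (label, share) in sorted(majority_agreement(ANNOTATIONS).items()):
    print(f"{text_id}: majority={label} agreement={share:.2f}")
```

Texts with agreement near 1.0 are perceived consistently, while lower values flag texts on which workers' narrative perceptions diverge.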
| Filename | Description |
|---|---|
| `codes.csv` | Taxonomy codes derived through qualitative analysis of crowd workers' free-text responses to our annotation task; see the paper for details. |
| `pc.csv` | Short for "prolific_coded"; contains the crowd annotations and our meta-annotations; used for most of the intra-crowd analysis. |
| `pcru_ann1.csv` | The first author's meta-annotations for the full set of the crowd's free-text annotations. |
| `pcru_ann2.csv` | The second author's meta-annotations for a small subset of the crowd's free-text annotations. |
| `sp.csv`\* | An expanded version of the StorySeeker dataset that adds descriptive labels from crowd workers and descriptive predictions from LLM-based classifiers to the pre-existing prescriptive labels from researchers. |
| `sse.csv`\* | The StorySeeker dataset, rehydrated with texts downloaded from the TLDR dataset. |
| `tldr_ss_subset.csv`\* | The subset of the TLDR dataset corresponding to the StorySeeker texts. |

\* Files generated by `preprocess.ipynb`.
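Because `sp.csv` places researcher labels, crowd labels, and LLM predictions side by side, the third research question can be probed with pairwise agreement statistics. This is a minimal sketch assuming hypothetical columns `researcher`, `crowd`, and `llm` that hold binary story labels (check the generated file's header for the real names); it reports raw percent agreement and Cohen's kappa, which corrects for chance agreement, and is not necessarily the measure the paper reports.

```python
import csv
import io
from itertools import combinations

# Hypothetical excerpt of sp.csv: column names and values are made up.
SP_CSV = """id,researcher,crowd,llm
a1,1,1,1
b2,0,1,1
c3,1,1,0
d4,0,0,0
"""

SOURCES = ["researcher", "crowd", "llm"]


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Chance-corrected agreement between two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected if each source labeled independently at its own base rates.
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        # Both sources used one identical label everywhere; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_agreement(
    rows: list[dict[str, str]], sources: list[str]
) -> dict[tuple[str, str], tuple[float, float]]:
    """Map each pair of label sources to (percent agreement, Cohen's kappa)."""
    results = {}
    for left, right in combinations(sources, 2):
        a = [row[left] for row in rows]
        b = [row[right] for row in rows]
        agreement = sum(x == y for x, y in zip(a, b)) / len(rows)
        results[(left, right)] = (agreement, cohen_kappa(a, b))
    return results


rows = list(csv.DictReader(io.StringIO(SP_CSV)))
for (left, right), (agreement, kappa) in pairwise_agreement(rows, SOURCES).items():
    print(f"{left} vs {right}: agreement={agreement:.2f}, kappa={kappa:.2f}")
```

In this toy data, researcher and LLM labels agree on half the items, which is exactly what their base rates predict by chance, so their kappa is 0 even though raw agreement is 0.50.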
Please open an issue or contact Joel Mire with any questions.
```bibtex
@inproceedings{mire-etal-2024-empirical,
    title = "The Empirical Variability of Narrative Perceptions of Social Media Texts",
    author = "Mire, Joel and
      Antoniak, Maria and
      Ash, Elliott and
      Piper, Andrew and
      Sap, Maarten",
    editor = "Al-Onaizan, Yaser and
      Bansal, Mohit and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1113",
    pages = "19940--19968"
}
```