StoryPerceptions

This repository contains the dataset and analysis code for studying variation in narrative perceptions of social media texts.

Paper: https://aclanthology.org/2024.emnlp-main.1113/

Setup

Clone the repository:

git clone git@github.com:joel-mire/story-perceptions.git
cd story-perceptions

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`

Install dependencies:

python -m pip install -r requirements.txt
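
To confirm that the virtual environment is active before installing dependencies or running the notebooks, a quick check along these lines may help. This is a minimal sketch; the function name `in_virtualenv` is hypothetical and not part of this repository.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints `False` after running the activate command, the shell is likely still using the system interpreter.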

Preprocess Data

Run preprocess.ipynb, which:

downloads the StorySeeker dataset
rehydrates the texts from StorySeeker's source dataset (TLDR)
queries GPT-series models and Llama3 for story labels (if results aren't already cached)
preprocesses the crowd annotation data, our meta-annotations for crowd rationales, and the StorySeeker texts and labels
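
The "if results aren't already cached" behavior in the model-query step can be pictured with a simple load-or-compute pattern. The sketch below is illustrative only: the helper `cached_labels`, the JSON cache format, and the `query_fn` callback are assumptions, and the notebook's actual caching logic may differ.

```python
import json
from pathlib import Path
from typing import Callable


def cached_labels(cache_path: Path, query_fn: Callable[[], dict]) -> dict:
    """Return labels from cache_path if it exists; otherwise query and cache.

    query_fn is a hypothetical stand-in for whatever function calls the
    model API and returns a JSON-serializable dict of labels.
    """
    if cache_path.exists():
        # Cache hit: skip the (slow, possibly paid) model query entirely.
        with cache_path.open("r", encoding="utf-8") as f:
            return json.load(f)

    # Cache miss: run the query, then persist the result for next time.
    labels = query_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w", encoding="utf-8") as f:
        json.dump(labels, f)
    return labels
```

With a pattern like this, rerunning the notebook only triggers model queries for results that have not been saved yet, which keeps repeated runs cheap and reproducible.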

Run Analysis

Run analysis.ipynb, which uses StoryPerceptions to explore 3 research questions:

What are crowd workers’ descriptive perceptions of storytelling in social media texts?
How do narrative perceptions differ among crowd workers?
How do narrative perceptions differ across prescriptive labels from researchers, descriptive annotations from crowd workers, and predictions from LLM-based classifiers?

Data Files Overview

Filename	Description
codes.csv	taxonomy codes, derived through qualitative analysis of crowd workers' free-text responses for our annotation task. See paper for details.
pc.csv	short for 'prolific_coded'; includes crowd annotations and our meta-annotations; used for most of the intra-crowd analysis
pcru_ann1.csv	the first author's meta-annotations for the full set of the crowd's free-text annotations
pcru_ann2.csv	the second author's meta-annotations for a small subset of the crowd's free-text annotations
sp.csv*	an expanded version of the StorySeeker dataset that includes both descriptive labels from crowd workers and descriptive predictions from LLM-based classifiers, in addition to the pre-existing prescriptive labels from researchers
sse.csv*	the StorySeeker dataset, rehydrated with texts downloaded from the TLDR dataset.
tldr_ss_subset.csv*	the subset of the TLDR dataset corresponding to the StorySeeker texts

* files generated by preprocess.ipynb
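
To get oriented in any of the files listed above, it can help to print the column names and a few rows before opening the notebooks. The following is a minimal sketch using only the standard library; the helper name `preview_csv` is hypothetical, and the example path `data/pc.csv` is an assumption about where the file lives.

```python
import csv
from pathlib import Path


def preview_csv(path, n_rows=3):
    """Return (column_names, first n_rows rows as dicts) for a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = []
        for i, row in enumerate(reader):
            if i >= n_rows:
                break
            rows.append(row)
        # fieldnames is None for an empty file, so fall back to an empty list.
        columns = reader.fieldnames or []
    return list(columns), rows


if __name__ == "__main__":
    csv_path = Path("data/pc.csv")  # assumed location; adjust as needed
    if csv_path.exists():
        columns, rows = preview_csv(csv_path)
        print(columns)
        for row in rows:
            print(row)
```

The starred files will only be present after preprocess.ipynb has been run.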

Questions?

Please open an issue or contact Joel Mire with any questions.

Citation

@inproceedings{mire-etal-2024-empirical,
    title = "The Empirical Variability of Narrative Perceptions of Social Media Texts",
    author = "Mire, Joel  and
      Antoniak, Maria  and
      Ash, Elliott  and
      Piper, Andrew  and
      Sap, Maarten",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1113",
    pages = "19940--19968"
}
