Cultural Codes

This repository contains source code for the paper Modeling Cross-Cultural Pragmatic Inference with Codenames Duet by Omar Shaikh*, Caleb Ziems*, Will Held, Aryan J. Pariani, Fred Morstatter, and Diyi Yang. Feel free to reach out to Omar Shaikh with any questions!

[Read the Paper] | [Download the Data] | [Demo (coming soon!)]

Why Cross-Cultural Inference and Codenames?

Sociocultural variation holds significant influence over how we communicate with each other. Our referents depend heavily on sociocultural priors. For example, depending on who you ask, football might refer to American football or soccer.

Rigorously modeling how socioculture affects pragmatic inference on all axes is understandably challenging. The board game Codenames offers a more restricted setting of turn-based word reference between two players. In each round, the clue giver provides a single-word clue; then the guesser must interpret this clue to select the intended words on the game board. Ideal inferences come from the players' common ground, the set of beliefs they share (Clark, 1996). In practice, however, a player's behavior can be idiosyncratic: each player has knowledge and experience that shape how they interpret clues and make guesses.

Our experiments show that accounting for background characteristics significantly improves model performance for tasks related to both clue giving and guessing, indicating that sociocultural priors play a vital role in gameplay decisions.

What's 'in the box?'

Our dataset consists of 794 games with 7,703 turns, distributed across 153 unique players. Alongside gameplay, we collect information about players' personalities, values, and demographics. We deconstruct games into 6 tasks, shown below.

We also open-source our data collection code, including a modified Codenames Duet frontend and backend, in the duet folder.

How do I run the baseline models?

1. Dataset Preparation

The datasets are under the data folder. Each gameplay task above has its own folder, named after the task, and each dataset split has its own .csv file. Each file contains the columns described below.

base_text encodes properties of the game state, such as the words remaining on the board, avoid words, and green words. Each of the remaining input fields below contains base_text plus its own additional variables.
leaning_only encodes variables about political leaning.
event_only encodes demographic information collected from the UI (age, country of origin, native English speaker).
demo_only encodes several more demographic variables (see paper for details).
personality_only encodes players' results on the Big Five personality traits.
all_text encodes all variables from the above fields.
output encodes the output from the specific gameplay task.
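To make the column layout concrete, here is a minimal sketch that reads one split's .csv file and pairs a chosen input field with output, using only Python's standard csv module. It assumes each file has a header row whose names match the fields listed above; the function name load_split and any example paths are hypothetical, not part of the repository.

```python
import csv
from pathlib import Path

# Input variants described above; each one already includes base_text.
INPUT_FIELDS = (
    "base_text",
    "leaning_only",
    "event_only",
    "demo_only",
    "personality_only",
    "all_text",
)


def load_split(path, input_field="base_text"):
    """Return (input, output) pairs from one split's .csv file.

    Hypothetical helper: assumes a header row naming the columns above.
    """
    if input_field not in INPUT_FIELDS:
        raise ValueError(f"unknown input field: {input_field!r}")
    with Path(path).open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the expected columns are not in the header.
        missing = {input_field, "output"} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"missing columns: {sorted(missing)}")
        return [(row[input_field], row["output"]) for row in reader]
```

Swapping input_field between base_text and, say, all_text while keeping the same output column mirrors the comparison between gameplay-only inputs and inputs that add sociocultural variables.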

2. Baseline Models

Running train_all_best.sh will reproduce the best-performing models on the validation set. To isolate the best model, run select_best_model.py. Models will also be uploaded to the Hugging Face Hub.
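The selection step amounts to keeping, for each task, the run with the highest validation score. The sketch below illustrates that idea only; it is not the contents of select_best_model.py. It assumes per-run validation scores have already been collected into records with hypothetical keys task, run, and val_score, where higher is better.

```python
def select_best(results):
    """Pick the highest-scoring run per task.

    `results` is a hypothetical list of dicts with keys
    "task", "run", and "val_score" (higher is better).
    """
    best = {}
    for record in results:
        current = best.get(record["task"])
        # Strict ">" keeps the first run seen when scores tie.
        if current is None or record["val_score"] > current["val_score"]:
            best[record["task"]] = record
    return {task: record["run"] for task, record in best.items()}
```

For example, two runs on one task scoring 0.5 and 0.7 would resolve to the 0.7 run; a real script would also need to know which metric each task reports.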

How do I cite this work?

BibTeX:

@article{shaikh2023modeling,
  title={Modeling Cross-Cultural Pragmatic Inference with Codenames Duet}, 
  author={Omar Shaikh and Caleb Ziems and William Held and Aryan J. Pariani and Fred Morstatter and Diyi Yang},
  year={2023},
  eprint={2306.02475},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

