Project Page: http://conlangcrafter.github.io
Paper: ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline
We introduce a fully automated system for constructing languages (conlangs) using large language models. Our multi-stage pipeline creates coherent, diverse artificial languages with their own phonology, grammar, lexicon, and translation capabilities.
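As a rough illustration of that sequential structure, here is a minimal sketch in which each stage conditions on the artifacts of the previous ones (the function names are hypothetical, not the actual API of src/pipeline_steps.py):

def llm(prompt: str) -> str:
    """Placeholder for a real model call (see src/llm_client.py)."""
    return f"<generated from: {prompt[:40]}...>"

def run_pipeline(sentence: str) -> dict:
    # Each stage's output is fed into the next to keep the language coherent.
    phonology = llm("Design a phoneme inventory and phonotactic rules.")
    grammar = llm(f"Design a grammar consistent with: {phonology}")
    lexicon = llm(f"Build a lexicon obeying: {phonology} and {grammar}")
    translation = llm(f"Translate {sentence!r} using: {grammar} and {lexicon}")
    return {"phonology": phonology, "grammar": grammar,
            "lexicon": lexicon, "translation": translation}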
You can pass any valid model string for the provider you choose:
- Google Gemini (e.g., gemini-2.5-pro, gemini-1.5-flash)
- OpenAI (e.g., o4-mini, gpt-4o, gpt-5, gpt-4.1-mini)
- DeepSeek via Together (e.g., deepseek-ai/DeepSeek-R1)
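A minimal sketch of how a model string could be routed to the right provider client (the prefix rules below are illustrative assumptions; see src/llm_client.py for the actual dispatch):

def pick_provider(model: str) -> str:
    # Hypothetical prefix-based routing, for illustration only.
    if model.startswith("gemini"):
        return "google"
    if model.startswith(("gpt", "o1", "o3", "o4")):
        return "openai"
    if model.startswith("deepseek-ai/"):
        return "together"
    raise ValueError(f"No known provider for model: {model}")

assert pick_provider("gemini-2.5-pro") == "google"
assert pick_provider("o4-mini") == "openai"
assert pick_provider("deepseek-ai/DeepSeek-R1") == "together"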
- Install dependencies:
  pip install -r requirements.txt
- Set up API keys:
  cp .env.example .env  # Edit .env and add your API keys
- Generate a language:
  python src/run_pipeline.py --model gemini-2.5-pro
Or with OpenAI models (choose the model you prefer):
# Reasoning model (o-series)
python src/run_pipeline.py --model o4-mini --reasoning-effort medium

# GPT-family examples
python src/run_pipeline.py --model gpt-4o
python src/run_pipeline.py --model gpt-5
python src/run_pipeline.py --model gpt-4.1-mini
Notes for OpenAI:
- o-series (o1/o3/o4) reasoning models ignore temperature/top_p; use --reasoning-effort.
- GPT-family models (e.g., gpt-4o) respect temperature/top_p.
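For illustration, a minimal sketch of how request parameters might branch on model family (assumed logic; the repo's actual handling lives in src/llm_client.py):

def build_request_kwargs(model: str, reasoning_effort: str = "medium",
                         temperature: float = 0.8, top_p: float = 0.95) -> dict:
    # o-series reasoning models take reasoning_effort; GPT-family models
    # take sampling parameters instead. Defaults here are illustrative.
    if model.startswith(("o1", "o3", "o4")):
        return {"model": model, "reasoning_effort": reasoning_effort}
    return {"model": model, "temperature": temperature, "top_p": top_p}

print(build_request_kwargs("o4-mini"))  # {'model': 'o4-mini', 'reasoning_effort': 'medium'}
print(build_request_kwargs("gpt-4o"))   # {'model': 'gpt-4o', 'temperature': 0.8, 'top_p': 0.95}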
src/ # Core source code
├── run_pipeline.py # Main pipeline script
├── llm_client.py # LLM API clients
├── pipeline_steps.py # Language generation steps
└── utils.py # Utility functions
prompts/ # Prompt templates
├── phonology/ # Phonology generation prompts
├── grammar/ # Grammar generation prompts
├── lexicon/ # Lexicon building prompts
└── translation/ # Translation prompts
output/ # Generated languages (created automatically)
The system supports various parameters for customizing language generation:
python src/run_pipeline.py \
--model gemini-2.5-pro \
--steps phonology,grammar,lexicon,translation \
--custom-constraints "Use only 3 vowels" \
--translation-sentence "Hello, world!"

- reasoning-effort: Applies to OpenAI o-series reasoning models only (o1, o3, o4, including o4-mini). Ignored by GPT-family models like gpt-4o and gpt-5.
- thinking-budget: Applies to Google Gemini models that support thinking output. Supported: gemini-2.5-pro. Not supported/ignored: gemini-1.5-flash and OpenAI models in this project.
- DeepSeek: DeepSeek-R1 automatically emits a <think> reasoning section; thinking-budget isn't used here. Use temperature/top_p as usual.
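Because R1 interleaves reasoning with the answer, downstream steps need to strip it. A minimal sketch, assuming the reasoning arrives wrapped in <think>...</think> tags as R1 emits by default:

import re

def strip_think(text: str) -> str:
    """Drop DeepSeek-R1's <think>...</think> block, keeping only the answer."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = "<think>Plural is marked with -ka, so...</think>nara-ka tomi"
print(strip_think(raw))  # -> "nara-ka tomi"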
You'll need API keys for the language models:
- Google Gemini: Get from Google AI Studio → set GOOGLE_API_KEY
- OpenAI: Get from OpenAI API Keys → set OPENAI_API_KEY
- DeepSeek (via Together): Get from Together AI → set TOGETHER_API_KEY
Add these to your .env file (copy from .env.example).
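A minimal sketch of picking the keys up at runtime (assuming python-dotenv, a common choice; check requirements.txt for what the project actually uses):

import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the working directory into the environment

# Warn early if a provider's key is missing rather than failing mid-pipeline.
for key in ("GOOGLE_API_KEY", "OPENAI_API_KEY", "TOGETHER_API_KEY"):
    if not os.getenv(key):
        print(f"Warning: {key} is not set; that provider will be unavailable.")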
Enable an optional QA loop that critiques and amends intermediate artifacts (phonology, grammar, lexicon, translation).
Scoring scale used by prompts:
- 10: Completely consistent / excellent
- 9: Consistent, only clarity or minor style ambiguities
- 8: Very minor issues (default acceptance threshold)
- 7: Some moderate issues – needs revision
- 6 or below: Significant inconsistencies or errors
Run with QA enabled (global threshold & custom self-refine cycles):
python src/run_pipeline.py --model gemini-2.5-pro \
--qa-enabled \
--self-refine-steps 4 \
--qa-threshold 8

Flags:
- --qa-enabled: activate QA self-refine loop.
- --self-refine-steps: number of critic→amend cycles (default 3).
- --qa-threshold: global acceptance threshold (1–10 scale) overriding per-step thresholds when set.
- --qa-threshold-<step>: per-step acceptance threshold (default 8.0), used only if --qa-threshold is not supplied.
- --continue-qa: append new QA iterations onto the existing <step>_qa.json.
Each QA cycle:
- Critic prompt returns JSON: overall_score (1–10) + issues list.
- If score < threshold and cycles remain, amend prompt applies corrections.
- Loop stops early if threshold met; otherwise after self-refine budget exhausted.
- Iterations are logged in <step>_qa.json (before/after snapshots + iteration metadata).
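Putting that together, a minimal sketch of the critic→amend loop (the critic and amend bodies are placeholders; only the overall_score/issues JSON shape comes from the prompts described above):

def critic(artifact: str) -> dict:
    """Placeholder: a real run sends the critic prompt and parses its JSON reply."""
    return {"overall_score": 7, "issues": ["lexicon entry violates phonotactics"]}

def amend(artifact: str, issues: list) -> str:
    """Placeholder: a real run sends the amend prompt with the issues list."""
    return artifact + " (revised)"

def qa_loop(artifact: str, threshold: float = 8.0, cycles: int = 3) -> str:
    for _ in range(cycles):
        report = critic(artifact)
        if report["overall_score"] >= threshold:
            break  # threshold met: accept early
        artifact = amend(artifact, report["issues"])  # apply corrections
    return artifact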
Note: A score of 9 typically indicates only clarity/ambiguity issues, while 8 allows very minor contradictions. Adjust thresholds if you want stricter acceptance.
If you use ConlangCrafter in your research, please cite:
@article{conlangcrafter2025,
title={ConlangCrafter: Constructing Languages with a Multi-Hop LLM Pipeline},
author={Morris Alper and Moran Yanuka and Raja Giryes and Ga{\v{s}}per Begu{\v{s}}},
year={2025},
eprint={2508.06094},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2508.06094}
}

This project is licensed under the MIT License - see the LICENSE file for details.