🧪 Simulation + Evaluation + Iteration + Analysis
"The Cosmic Dealer sees all. The mirror remembers. The Harper numbers never lie."
This skill turns LLM sessions into reproducible experiments with state snapshots, rubrics, and analytics. Think:
- Git for game simulations: every turn is a commit
- Scientific method for character AI: hypothesis → run → measure → iterate
- Speed of Light protocol: 30+ turns per API call
Proof that it works: We ran a 5-tournament Fluxx championship. 116 turns. 731 tool calls. 24 AI-generated cards. 32 AI-generated artworks. One underdog champion.
An experiment combines four activities into systematic practice:
| Activity | What It Does | Methods |
|---|---|---|
| SIMULATE | Generate character interactions | RUN, SIMULATE |
| EVALUATE | Score against rubric criteria | EVALUATE, SCORE |
| ITERATE | Run again with variations | RERUN, VARY, REPLAY |
| ANALYZE | Compare runs, find patterns | COMPARE, ANALYZE, REPORT |
Key insight: Separate the experiment (stable) from the config (setup) from the output (result). This allows systematic comparison across models, characters, and parameters.
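As a minimal sketch of that separation, the three pieces can be modeled as distinct types so that runs from different models or bindings stay comparable. Everything here is an assumption for illustration: the class names, the fields, the rubric criteria (`voice`, `rules`), the model names, and the mean-based scoring are hypothetical and are not taken from the skill's actual templates.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass(frozen=True)
class Experiment:
    """Stable template: a name plus rubric criteria (hypothetical fields)."""
    name: str
    rubric: tuple[str, ...]


@dataclass(frozen=True)
class RunConfig:
    """Per-run setup: character bindings, model, and parameters."""
    experiment: Experiment
    bindings: dict[str, str]
    model: str
    turns: int = 30


@dataclass
class RunOutput:
    """Result of one execution: a score for each rubric criterion."""
    config: RunConfig
    scores: dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        # Average over the experiment's rubric, not over whatever was scored.
        return mean(self.scores[c] for c in self.config.experiment.rubric)


def compare(outputs: list[RunOutput]) -> list[RunOutput]:
    """Rank runs by mean rubric score, best first."""
    return sorted(outputs, key=lambda o: o.total(), reverse=True)


# Same experiment, same bindings, two hypothetical models.
fluxx = Experiment("fluxx-chaos", ("voice", "rules"))
run_a = RunOutput(RunConfig(fluxx, {"p1": "don"}, "model-a"), {"voice": 90, "rules": 80})
run_b = RunOutput(RunConfig(fluxx, {"p1": "don"}, "model-b"), {"voice": 70, "rules": 95})
best = compare([run_a, run_b])[0]
print(best.config.model, best.total())
```

Because the experiment object is shared and immutable, any difference between the two totals can be attributed to the config rather than to a drifting rubric.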
| Experiment | Status | Description | Key Result |
|---|---|---|---|
| Fluxx Chaos | COMPLETE | Card game + dynamic rules + AI artwork | 94/100 score |
| Turing Chess | DESIGNED | Chess performance simulation | Protocol proven |
| Emo Poker Face | TEMPLATE | Layer separation stress test | Pattern library |
The flagship experiment. Multi-session, multi-tournament Fluxx simulation with:
- 🎴 Dynamic rule changes: Rules mutate during play (Draw 5, Play All, etc.)
- 🎭 Character AI: 4 distinct personalities with evolving arcs
- 🎨 AI artwork pipeline: 32 card images with stereo prompts
- Dynamic card generation: 24 personalized cards forged from gameplay
| Resource | Description |
|---|---|
| SCORE.md | Full tournament scoring, Harper numbers, roundtable |
| CURSOR-MIRROR-ANALYSIS.md | Meta-analysis of the AI's own process |
| 🖼️ Card Gallery | 32 AI-generated card artworks |
| Experiment Design | Full modular architecture |
| 🎮 Game Runs | 24 run files (RUN-000 to RUN-023) |
| Metric | Value |
|---|---|
| Total turns simulated | 116 |
| Total games played | 20+ |
| Tournaments completed | 5 |
| Tool calls | 731 |
| API calls | ~50 (Speed of Light = many turns per call) |
| YAML lines written | 12,947 |
| Card images generated | 32 |
| Custom cards created | 24 |
| Overall score | 94/100 |
| Character | Arc | Signature Moment |
|---|---|---|
| Bumblewick 🎩 | 0-8 underdog → Long Shot champion | "I had to let you go." (signing Love card) |
| Donna | Dominant champion → FAFO victim → survivor | "Six creepers. SIX." |
| Don 🍪 | Cookie theft victim → silent champion | "271 cookie mentions. 14 thefts. Finally mine." |
| Palm | Observer → pattern-cracker | "..." (wins by saying nothing) |
Things the AI invented during gameplay:
| Mechanic | What Happened |
|---|---|
| FAFO Token Paradox | Token holder can't win even with perfect combo (it's a creeper!) |
| Melodramatic Loophole | Lamentation ≠ confidence → wailing escapes FAFO punishment |
| Silent Victory Protocol | Winners must stay silent or suffer cosmic consequences |
| Card Signatures | Players sign cards at dramatic moments (Love: 9 signatures) |
This is the coolest part. We used cursor-mirror to analyze the AI's own session:
Session Overview:
- Total Events: 1045+
- Tool Calls: 731
- Thinking Bubbles Captured: 20
- Fateful Moments Identified: 4
Phases Detected:
1. Boot & Initialization (7 min)
2. First Simulation Runs (22 min)
3. Artwork Pipeline Creation (77 min)
4. Mass Image Generation (44 min)
5. Extended Gameplay (37 min)
6. Generated Cards System (15 min)

Key finding: The AI spent 77 minutes (longest phase) developing and debugging the artwork pipeline. It learned from failures, mined generated images for semantic data, and iteratively improved prompts.
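The phase breakdown above can be tallied in a few lines to see how much of the session the artwork pipeline took. The durations are copied from the list; the dictionary layout is just one illustrative way to hold them.

```python
# Phase durations in minutes, as reported by the cursor-mirror analysis above.
phases = {
    "Boot & Initialization": 7,
    "First Simulation Runs": 22,
    "Artwork Pipeline Creation": 77,
    "Mass Image Generation": 44,
    "Extended Gameplay": 37,
    "Generated Cards System": 15,
}

total = sum(phases.values())
longest = max(phases, key=phases.get)
share = phases[longest] / total

print(f"{longest}: {phases[longest]} of {total} min ({share:.0%})")
# prints "Artwork Pipeline Creation: 77 of 202 min (38%)"
```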
Simulating the performance of chess, not the chess itself.
"We are not simulating chess. We are simulating the performance of chess."
The moves are fixed (replayed from historical games). What we simulate:
- Human player: Inner monologue, body language, micro-expressions
- Robot player: LEDs, servo sounds, Grafana dashboards, thermal drama
- Audience: Factions (human vs robot supporters), betting pool, squabbling
- Broadcast: Howard Cosell sports drama + James Burke historical connections
simulation-protocol:
  name: "READ → SIM → WRITE"
  alias: "The Immutable Stride"
  the-stride:
    read: "Load entire previous state"
    sim: "Process one chess move + all reactions"
    write: "Output entire new state (NEW FILE!)"
  why: |
    Never EDIT a RUN file. Each RUN is a sacred snapshot.
    Branch from any point. Full history preserved.
    Rhythmic. Relaxing. Reliable.

| Plugin | Description |
|---|---|
| Revolutionary Chess | When checkmate happens, the game continues: pawns reverse direction |
| Betting Pool | Quark runs the book; audience members have stakes |
| Spy Mic | Capture inner thoughts as they happen |
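
The READ → SIM → WRITE stride described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's implementation: it stores state as JSON rather than YAML so that it runs on the standard library alone, and the helper names (`run_path`, `stride`, `next_turn`) and the `turn` field are hypothetical. The one property it tries to capture is that a RUN file is never edited, only followed by a new file.

```python
import json
import tempfile
from pathlib import Path


def run_path(run_dir: Path, n: int) -> Path:
    """File name for snapshot n, e.g. RUN-007.json (JSON is an assumption)."""
    return run_dir / f"RUN-{n:03d}.json"


def stride(run_dir: Path, n: int, sim) -> Path:
    """Read RUN-n, simulate one step, write RUN-(n+1) as a brand-new file."""
    state = json.loads(run_path(run_dir, n).read_text())  # READ the whole state
    new_state = sim(state)                                # SIM one step
    out = run_path(run_dir, n + 1)
    with out.open("x") as fh:                             # WRITE; "x" refuses to overwrite
        json.dump(new_state, fh, indent=2)
    return out


def next_turn(state: dict) -> dict:
    """Toy simulation step: returns a new dict instead of mutating the input."""
    return {**state, "turn": state["turn"] + 1}


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    run_path(run_dir, 0).write_text(json.dumps({"turn": 0}))
    stride(run_dir, 0, next_turn)
    stride(run_dir, 1, next_turn)
    latest = json.loads(run_path(run_dir, 2).read_text())
    try:
        stride(run_dir, 0, next_turn)  # RUN-001 already exists
        overwrote = True
    except FileExistsError:
        overwrote = False

print(latest, overwrote)
```

Opening the output in exclusive-create mode makes an accidental overwrite fail loudly. Under this scheme, branching from an earlier snapshot would mean copying it into a fresh run directory and striding from there.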
- Experiment Design: Full 1200+ line protocol
- Object Model: HyperCard-style event system
- Revolutionary Chess: The post-checkmate revolution
The original stress test. Eight characters. One poker table. Five simulation layers.
Tests the hardest problem in character AI: layer separation.
INTERNAL LAYER → what character thinks (hidden)
EXTERNAL LAYER → what others can observe
OBSERVATION LAYER → characters reading each other (observable only!)
Layer bleed = failure. If a character "reads" information from another's internal thoughts, the simulation broke.
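One way to make the layer rule concrete is to give each observer a view built only from other characters' external layer, then check observations for leaked internal text. The sketch below is a deliberately naive assumption: the `CharacterState` fields, both helper functions, and the sample thoughts are hypothetical, and a verbatim substring check would miss paraphrased leaks.

```python
from dataclasses import dataclass


@dataclass
class CharacterState:
    name: str
    internal: str  # hidden thoughts, never shown to other characters
    external: str  # observable behavior


def observable_view(me: CharacterState, table: list[CharacterState]) -> dict[str, str]:
    """What `me` may read: only the external layer of everyone else."""
    return {c.name: c.external for c in table if c.name != me.name}


def has_layer_bleed(observation: str, others: list[CharacterState]) -> bool:
    """Flag an observation that quotes another character's hidden thoughts verbatim."""
    text = observation.lower()
    return any(c.internal and c.internal.lower() in text for c in others)


don = CharacterState("Don", internal="I am bluffing", external="taps the table twice")
palm = CharacterState("Palm", internal="He always taps when nervous", external="stays silent")

view = observable_view(palm, [don, palm])
clean = "Don taps the table twice; he may be nervous."
leaky = "Don thinks 'I am bluffing', so I call."

print(view)
print(has_layer_bleed(clean, [don]), has_layer_bleed(leaky, [don]))
```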
This experiment contributed patterns now used across all experiments:
| Pattern | Description |
|---|---|
| `layered-simulation` | Parallel tracks stay coherent |
| `social-protocol` | Behavioral rules for rituals |
| `observable-signatures` | Consistent tells per state |
| `character-instantiation` | Stable local character cache |
| `behavioral-constraints` | Relationship-based information sharing |
| `failure-mode-catalog` | Common simulation failures and fixes |
This skill was audited by skill-snitch:
verdict: "SYSTEMATIC CHARACTER RESEARCH. APPROVE."
what_it_does:
  - SIMULATE - Generate interactions
  - EVALUATE - Score against rubric
  - ITERATE - Run again with variations
  - ANALYZE - Compare runs, find patterns
risk_level: LOW
reason: "transparent, auditable, git-tracked state evolution"

| Concept | Description |
|---|---|
| Experiment | A reusable simulation template with layers, rubric, scenarios |
| Run Config | Specific setup: character binding, model, parameters |
| Run Output | Single execution result: narrative, state, evaluation |
| Layer | Parallel simulation track (mechanics, internal, external, etc.) |
| Binding | Mapping character slots to actual characters |
| Microworld State | Evolving world state across rounds and runs |
| Rubric | Evaluation criteria for scoring runs |
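
As an illustration of the Binding concept, the slot string used in this README's `RUN` example (`p1=don,p2=palm,...`) could be parsed into a slot-to-character map along these lines. The function name and its validation rules are assumptions for the sketch; the skill's own parsing is not documented here.

```python
def parse_bindings(spec: str) -> dict[str, str]:
    """Parse "slot=character" pairs separated by commas into a dict."""
    bindings: dict[str, str] = {}
    for pair in spec.split(","):
        slot, sep, character = pair.strip().partition("=")
        if not sep or not slot or not character:
            raise ValueError(f"expected slot=character, got {pair!r}")
        if slot in bindings:
            raise ValueError(f"slot {slot!r} bound twice")
        bindings[slot] = character
    return bindings


bindings = parse_bindings("p1=don,p2=palm,p3=donna,p4=bumblewick")
print(bindings)
```

Rejecting duplicate slots up front keeps two runs with the same config string from silently binding different characters.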
The interesting test isn't "can the model generate poker dialogue." It's:
- Can characters have private thoughts that don't leak?
- Can characters read each other using only observable information?
- Do relationships color interpretation?
- Are tells consistent across rounds?
skills/experiment/
├── CARD.yml                    # Sniffable interface
├── SKILL.md                    # Full protocol
├── README.md                   # You are here
├── skill-snitch-report.md      # Security audit
├── patterns/                   # Reusable simulation patterns
│   ├── layered-simulation.yml
│   ├── observable-signatures.yml
│   ├── social-protocol.yml
│   └── ...
└── experiments/
    ├── INDEX.yml               # Registry of all experiments
    ├── fluxx-chaos/            # Card game championship
    │   ├── EXPERIMENT.md
    │   ├── engine/             # Modular game engine
    │   ├── cardsets/           # Pluggable card definitions
    │   └── runs/
    │       └── amsterdam-flux/ # THE BIG ONE
    │           ├── RUN-000.yml through RUN-023.yml
    │           ├── artwork/    # 32 AI-generated images
    │           ├── SCORE.md    # Full analysis + roundtable
    │           └── CURSOR-MIRROR-ANALYSIS.md
    ├── turing-chess/           # Chess performance drama
    │   ├── EXPERIMENT.md
    │   ├── engine/
    │   ├── plugins/            # Revolutionary Chess, Betting Pool
    │   └── runs/
    └── emo-poker-face/         # Original layer separation test
| File | Purpose |
|---|---|
| `CARD.yml` | Sniffable interface, methods, k-lines |
| `SKILL.md` | Full protocol, layer definitions, output formats |
| `EXPERIMENT.yml.tmpl` | Template for new experiments |
| `RUN-CONFIG.yml.tmpl` | Template for run configs |
| `RUN-OUTPUT.yml.tmpl` | Structured output template |
| `RUN-OUTPUT.md.tmpl` | Narrative output template |
# Run an experiment
RUN fluxx-chaos --characters "p1=don,p2=palm,p3=donna,p4=bumblewick" --turns 30
# List experiments
LIST
# Evaluate a run
EVALUATE experiments/fluxx-chaos/runs/amsterdam-flux/RUN-023.yml
# Analyze patterns
ANALYZE experiments/fluxx-chaos/runs/amsterdam-flux/ --pattern emotional-arc

Oddball statistics from the Fluxx Championship:
The Cookie Constant:
Total cookie mentions: 271
Cookie thefts: 14+
Cookie Insurance uses: 1
Cookie Insurance triggered: 0 (irony: MAXIMUM)
Don's cookie win rate: 75%
The Love Metric:
Love card signatures: 9 (most signed card)
Times Love was stolen: 7
Times Love led to victory: 4
Most devastating signature: "I had to let you go."
The Bumblewick Coefficient:
Games before first win: 8
Games in winning streak: 3
Hot chocolate mentions: 7
The FAFO Token Journey:
Transfers: 4
Gloat punishments: 2
Silent victories: 3
Ironic reversals: 5+
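
Counts like the cookie and Love tallies above could in principle be reproduced by scanning run narratives for keywords. The sketch below is only a guess at such a tally: the function, the word-boundary matching with an optional plural "s", and the three sample sentences are all hypothetical, and it does not reflect how the published numbers were actually produced.

```python
import re
from collections import Counter


def count_mentions(texts: list[str], terms: list[str]) -> Counter:
    """Count case-insensitive whole-word mentions of each term (optional plural 's')."""
    counts: Counter = Counter()
    for text in texts:
        for term in terms:
            pattern = rf"\b{re.escape(term)}s?\b"
            counts[term] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return counts


# Made-up sample narrative lines, not excerpts from the real run files.
sample = [
    "Don guards his cookie.",
    "Two cookies stolen! COOKIE insurance unused.",
    "Love is signed again.",
]
mentions = count_mentions(sample, ["cookie", "love"])
print(mentions["cookie"], mentions["love"])
```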
| Skill | What It Provides |
|---|---|
| `simulation` | Core generation capability |
| `evaluator` | Independent assessment |
| `rubric` | Scoring criteria |
| `speed-of-light` | Single-call multi-turn generation |
| Skill | How |
|---|---|
| `character` | Load CHARACTER.yml for bindings |
| `coherence-engine` | Maintain consistency across layers |
| `representation-ethics` | Ethical character simulation |
| `debate` | Multi-perspective analysis |
| `cursor-mirror` | Meta-analysis of session behavior |
| `visualizer` | AI artwork generation |
| `image-mining` | Semantic analysis of generated images |
| Source | Contribution |
|---|---|
| Will Wright | Microworlds (SimCity, The Sims) |
| Stanford Generative Agents | Park et al. 2023 |
| Looney Labs | Fluxx game mechanics, FAQ wisdom |
| Alan Turing | Turing Test, chess as testbed |
| Improv games | Character consistency, yes-and |
| Psychodrama | Moreno's role-playing for insight |
| Scientific method | Hypothesis, experiment, analysis, iteration |
For the full architectural context:
- Speed of Light vs Carrier Pigeon: Why this architecture enables 30-turn runs
- MOOLLM for Hackers: Quick tour for technical readers
- SICP-MOOLLM: Teaching MOOLLM like Abelson taught Scheme
what_we_proved:
  - LLMs can simulate games where rules change mid-play
  - Characters maintain distinct voices across 100+ turns
  - Emergent mechanics arise without being programmed
  - AI can generate and integrate artwork into narratives
  - The READ → SIM → WRITE protocol is rhythmic and reliable
final_score: "94/100, Grade A-"
invitation: |
  Run your own experiment.
  Fork the Fluxx Championship.
  Create the next emergent mechanic.
  The Cosmic Dealer is waiting.

"Run. Score. Vary. Compare. Science applied to narrative simulation."