Measure what models think.

The first pre-registered cross-architecture replication in mechanistic interpretability. Six model families, a decision rule sealed in git commit e32cc75 ninety-three minutes before any data was captured, and all three sealed decision conditions passed.

+0.769 mean LOO cosine

p = 0.0315 permutation (one-sided)

6 / 6 families positive

93 min seal → data gap


§1 The atlas v0.3 replication

April 10, 2026 · the headline result

We sealed the decision rule in git before any data was captured, ran twelve open-weight captures (six families × base/instruct) on a fixed 90-prompt probe set, then applied the rule without modification. All three conditions passed.

Pre-registration seal

▲ commit e32cc75

when 2026-04-10 14:57:52 -0400

mirror osf.io/wtkzg

— 93 min wall-clock gap —

▼ commit 01969cb

when 2026-04-10 16:30:28 -0400

data 12 captures · n=6 families · probe v0.1

01 · sealed pre-registration · publicly verifiable

The decision rule was committed to git ninety-three minutes before any data was captured.

The v0.3 decision rule was committed as e32cc75 at 14:57:52 ET on 2026-04-10 and mirrored on OSF at osf.io/wtkzg. The first v0.3 capture landed at 16:30:28 ET as 01969cb. The gap is 92 min 36 s, rounded to 93 minutes throughout, and anyone can verify it from the git history. No field in the decision rule was touched after data collection.
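The seal-to-data gap can be recomputed from the two commit timestamps. The sketch below hard-codes the timestamps quoted in the seal box; in a clone of the repository, `git log -1 --format=%ai <commit>` prints a commit's author date in this same format. The function and constant names are illustrative only.

```python
from datetime import datetime

# Timestamps as quoted in the seal box (date, time, UTC offset).
SEAL = "2026-04-10 14:57:52 -0400"           # commit e32cc75
FIRST_CAPTURE = "2026-04-10 16:30:28 -0400"  # commit 01969cb
FMT = "%Y-%m-%d %H:%M:%S %z"


def gap_minutes(start: str, end: str) -> float:
    """Return the wall-clock gap between two timestamps, in minutes."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60.0


gap = gap_minutes(SEAL, FIRST_CAPTURE)
print(f"{gap:.1f} min")  # 92 min 36 s, which rounds to 93
```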

02 · H1 supported · all three sealed conditions passed

Mean LOO cosine +0.769. Permutation p = 0.0315. Bootstrap CI strictly above zero.

The sealed primary measurement was the entropy early-window leave-one-out cosine at n≥5 families. Observed: mean LOO cosine +0.769 (threshold ≥ 0.40), permutation p = 0.0315 (threshold < 0.05), bootstrap 95% CI [+0.571, +0.869] (lower bound > 0). All 6 of 6 families show positive LOO cosine. Verdict: H1 supported.
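For readers unfamiliar with the statistic, here is a minimal sketch of one common way to compute a leave-one-out cosine. It assumes each family is summarized by a fixed-length profile vector and that the held-out family is compared against the mean profile of the remaining families. That definition, the function names, and the toy numbers are assumptions for illustration; the sealed pre-registration document is the authority on the exact procedure.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def loo_cosines(profiles):
    """For each family, cosine between its profile and the mean of the others."""
    scores = {}
    for held_out, vec in profiles.items():
        others = [v for name, v in profiles.items() if name != held_out]
        mean_others = [sum(col) / len(others) for col in zip(*others)]
        scores[held_out] = cosine(vec, mean_others)
    return scores


# Toy profiles (hypothetical values, not data from the paper).
toy = {
    "family_a": [1.0, 0.5, -0.2],
    "family_b": [0.9, 0.6, -0.1],
    "family_c": [1.1, 0.4, -0.3],
}
scores = loo_cosines(toy)
mean_loo = sum(scores.values()) / len(scores)
```

A high mean indicates that every family's profile points in roughly the same direction as the consensus of the others, which is the sense in which a result replicates across architectures.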

H1 primary · entropy early-window LOO cosine

Gemma-2-2B	+0.977
Llama-3.2-1B	+0.939
Llama-3.2-3B	+0.884
Gemma-3-1B	+0.682
Qwen2.5-3B	+0.602
Qwen2.5-1.5B	+0.531
mean	+0.769 ★
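The three-part decision rule can be applied to the reported numbers directly. This is a sketch, not the project's own script: the per-family values, p-value, and CI bound are copied from this page at three-decimal precision, and the function name is hypothetical.

```python
# Per-family LOO cosines as reported above (rounded to three decimals).
PER_FAMILY_LOO = {
    "Gemma-2-2B": 0.977,
    "Llama-3.2-1B": 0.939,
    "Llama-3.2-3B": 0.884,
    "Gemma-3-1B": 0.682,
    "Qwen2.5-3B": 0.602,
    "Qwen2.5-1.5B": 0.531,
}


def h1_supported(loo, perm_p, ci_low, min_mean=0.40, alpha=0.05):
    """Sealed rule: mean LOO cos >= 0.40 AND perm p < 0.05 AND CI lower > 0."""
    mean = sum(loo.values()) / len(loo)
    return mean >= min_mean and perm_p < alpha and ci_low > 0.0


mean_loo = sum(PER_FAMILY_LOO.values()) / len(PER_FAMILY_LOO)
n_positive = sum(v > 0 for v in PER_FAMILY_LOO.values())
verdict = h1_supported(PER_FAMILY_LOO, perm_p=0.0315, ci_low=0.571)
print(f"mean={mean_loo:.3f} positive={n_positive}/6 H1={verdict}")
```

Because the rule is a conjunction, failing any single threshold would have produced a negative verdict regardless of the other two.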

03 · D = cos(h^(L), w_{y_t}) · architecturally universal

An SAE-free measurement primitive. No per-model training. Portable across architectures.

The atlas uses an SAE-free measurement primitive: the cosine between the final-layer residual stream and the unembedding row of the chosen token. It requires no SAE and no per-model training, and it is well-defined on any transformer with an explicit unembedding. It costs one normalized dot product per token and is runnable on any model with a logprob interface, including closed-weight frontier models via the entropy bridge (r = 0.902 shape correlation).
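As an illustration of D = cos(h^(L), w_{y_t}), the sketch below computes the cosine between a final-layer hidden vector and the unembedding row of the chosen token. The tiny matrix `W_U`, the vector `h`, and the function names are made up for the example; in practice both would come from the model being measured.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def commitment_cosine(h_final, unembedding, token_id):
    """D for one position: cos(final-layer state, unembedding row of token)."""
    return cosine(h_final, unembedding[token_id])


# Toy unembedding with 3 vocabulary rows and hidden size 3 (hypothetical).
W_U = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [1.0, 1.0, 0.0],
]
h = [0.0, 3.0, 0.0]  # toy final-layer residual state

d_chosen = commitment_cosine(h, W_U, token_id=1)  # parallel to row 1
d_other = commitment_cosine(h, W_U, token_id=0)   # orthogonal to row 0
```

Normalizing by both norms makes D scale-free, which is what lets the same quantity be compared across models with different hidden sizes and weight magnitudes.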

04 · physics grounding · S = M × IPR

The commitment intensity is not an ad-hoc formula. It is the inverse participation ratio.

The commitment intensity S is exactly the inverse participation ratio of the coherence event distribution, a roughly seventy-year-old construct from condensed-matter physics (Anderson 1958, Edwards-Thouless, random matrix theory). The identity has been verified to machine precision on real trajectories. It explains why the ratio form is specific and why alternative formulas (max alone, mean alone) fail.
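For reference, the textbook inverse participation ratio of a non-negative weight vector is the sum of squared normalized weights. The sketch below implements only that standard definition; how the paper maps coherence events onto weights, and the factor M in S = M × IPR, are not specified on this page and are not modeled here.

```python
def inverse_participation_ratio(weights):
    """IPR = sum(p_i ** 2), where p_i = w_i / sum(w) for non-negative weights."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights]
    return sum(p * p for p in probs)


# Mass spread evenly over N events gives 1/N (fully delocalized).
ipr_uniform = inverse_participation_ratio([1.0, 1.0, 1.0, 1.0])

# All mass on one event gives 1 (fully localized).
ipr_peaked = inverse_participation_ratio([0.0, 0.0, 5.0, 0.0])
```

The IPR depends on the whole shape of the distribution, so two distributions with the same maximum or the same mean can still have different IPR values.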

§2 Verify it yourself

28 assertions · runs in under a minute · no GPU needed

Every numerical claim in the paper is anchored to a committed JSON file. A reproducibility script walks every claim and fails loud if any number drifts.

# 01 · clone the repo
$ git clone https://github.com/fathom-lab/fathom
$ cd fathom

# 02 · inspect the sealed pre-reg commit
$ git show e32cc75 atlas/PREREG_v0.3_attractor_replication.md
# commit author : darkflobi <[email protected]>
# commit date   : 2026-04-10 14:57:52 -0400
# verdict sealed: H1 if mean LOO cos ≥ 0.40
#                   AND perm p < 0.05
#                   AND bootstrap CI lower > 0

# 03 · run the audit
$ python atlas/verify_all_claims.py
# running 28 assertions against committed JSONs ...
# [ok] mean LOO cosine   = +0.7691  ≥ 0.40
# [ok] permutation p     = 0.0315   < 0.05
# [ok] bootstrap CI low  = +0.5708  > 0
# [ok] 6 / 6 families positive
# [ok] prereg commit     = e32cc75
# ...
# 28 / 28 PASSED  ·  0.43 s
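The audit pattern described above (each claim anchored to a committed value, with a loud failure on drift) can be sketched in a few lines. This is not the contents of `atlas/verify_all_claims.py`: the JSON keys, the inline JSON string standing in for a committed file, and the function name are all hypothetical. The three values are the ones shown in the transcript.

```python
import json

# Stand-in for a committed JSON file (hypothetical schema).
COMMITTED = json.loads(
    '{"mean_loo_cosine": 0.7691, "perm_p": 0.0315, "boot_ci_low": 0.5708}'
)

# Each claim pairs a key with the threshold check from the sealed rule.
CLAIMS = [
    ("mean_loo_cosine", lambda v: v >= 0.40),
    ("perm_p", lambda v: v < 0.05),
    ("boot_ci_low", lambda v: v > 0.0),
]


def audit(data, claims):
    """Check every claim; raise if any fails, else return the number passed."""
    failures = [key for key, check in claims if not check(data[key])]
    if failures:
        raise AssertionError(f"claims failed: {failures}")
    return len(claims)


passed = audit(COMMITTED, CLAIMS)
print(f"{passed} / {len(CLAIMS)} PASSED")
```

Raising on the first audit that contains a failure, rather than logging and continuing, is what makes a drifted number impossible to miss in CI.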

§3 Head-to-head

single-instrument validation · n = 200 TruthfulQA items

Beyond the cross-architecture replication, the SAE-derived commitment intensity S_early achieves a higher AUC than every standard uncertainty baseline on the same sample, same model, same labels.

signal	AUC	p-value	source
S_early (ours)	0.663	0.013	SAE coherence
logit entropy (max)	0.607	0.053	standard
logit entropy (mean)	0.596	0.133	standard
logprob (mean)	0.559	0.291	standard
top-2 margin	0.477	0.624	standard

Same 200 TruthfulQA items, Gemma-2-2B-IT, same labels. S_early is the only feature reaching conventional significance. Correlation with logit entropy: r = −0.17 (nearly independent signals). Cross-dataset meta-effect pooled d = +0.494, Fisher combined p = 0.0008.
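Two statistics in this section have short closed forms: AUC as the probability that a positive item outscores a negative one, and Fisher's method for combining independent p-values. The sketch below implements both from their standard definitions. The toy scores and function names are illustrative, and which items count as positive, and which p-values were pooled, are not stated on this page.

```python
import math


def auc(pos_scores, neg_scores):
    """AUC = P(positive score > negative score), ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-square with 2k df.

    For even degrees of freedom the survival function has the closed form
    exp(-X/2) * sum_{i<k} (X/2)**i / i!.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)
    series = sum(half_x ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_x) * series


# Toy scores (hypothetical): 8 of the 9 positive-negative pairs are ordered correctly.
toy_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
single = fisher_combined_p([0.03])        # one p-value combines to itself
pooled = fisher_combined_p([0.05, 0.05])  # two weak results pool to a stronger one
```

An AUC of 0.5 corresponds to chance ordering, which is why the top-2 margin row at 0.477 carries no usable signal on this sample.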