Measure what models think.
The first pre-registered cross-architecture replication in mechanistic interpretability: six model families, a decision rule sealed in git commit e32cc75 ninety-three minutes before any data was captured, and all three sealed decision conditions passed.
§1 The atlas v0.3 replication
April 10, 2026 · the headline result
We sealed the decision rule in git, then ran twelve open-weight captures (six families × base/instruct) on a fixed 90-prompt probe set and applied the rule without modification. All three conditions passed.
The decision rule was committed to git ninety-three minutes before any data was captured.
The v0.3 decision rule was committed as e32cc75 at 14:57:52 ET on 2026-04-10 and mirrored on OSF at osf.io/wtkzg. The first v0.3 capture landed at 16:30:28 ET as 01969cb, a 93-minute gap anyone can verify from the git history. No field in the decision rule was touched after data collection.
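The 93-minute figure is plain timestamp arithmetic. A minimal check in Python, assuming both ET times are EDT (UTC-04:00), which matches the `-0400` offset recorded on the sealing commit:

```python
from datetime import datetime, timedelta, timezone

# Both timestamps are Eastern Time; on 2026-04-10 that is EDT (UTC-04:00),
# matching the -0400 offset in the sealing commit's git metadata.
EDT = timezone(timedelta(hours=-4))

sealed = datetime(2026, 4, 10, 14, 57, 52, tzinfo=EDT)         # e32cc75: decision rule
first_capture = datetime(2026, 4, 10, 16, 30, 28, tzinfo=EDT)  # 01969cb: first v0.3 capture

gap = first_capture - sealed
gap_minutes = gap.total_seconds() / 60
print(gap, f"({gap_minutes:.1f} min)")  # 1:32:36 (92.6 min), which rounds to 93
```

To read the raw timestamps from the repository instead of typing them in, `git log -1 --format=%ci <commit>` prints a commit's committer date.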
Mean LOO cosine +0.769. Permutation p = 0.0315. Bootstrap CI strictly above zero.
The sealed primary measurement was the entropy early-window leave-one-out cosine at n≥5 families. Observed: mean LOO cosine +0.769 (threshold ≥ 0.40), permutation p = 0.0315 (threshold < 0.05), bootstrap 95% CI [+0.571, +0.869] (lower bound > 0). All 6 of 6 families show positive LOO cosine. Verdict: H1 supported.
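The exact computation is not spelled out here, so what follows is a hedged sketch of one plausible reading, not the atlas code. It assumes each family contributes one numeric profile vector (for example, an early-window entropy trajectory); that a family's LOO cosine is its cosine with the mean of the other families' vectors; that the permutation null shuffles each family's vector independently; and that the bootstrap resamples the per-family LOO cosines. The three thresholds are the sealed ones.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mean(xs):
    return sum(xs) / len(xs)

def loo_cosines(profiles):
    """Per family: cosine between its profile and the mean profile of all other families."""
    result = []
    for i, own in enumerate(profiles):
        others = [p for j, p in enumerate(profiles) if j != i]
        centroid = [mean(column) for column in zip(*others)]
        result.append(cosine(own, centroid))
    return result

def permutation_p(profiles, n_perm=10_000, seed=0):
    """One-sided p-value: how often shuffled profiles agree at least as well as observed."""
    rng = random.Random(seed)
    observed = mean(loo_cosines(profiles))
    hits = 0
    for _ in range(n_perm):
        # Assumed null: shuffle each family's coordinates independently, destroying shared shape.
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        if mean(loo_cosines(shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive

def bootstrap_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-family values."""
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

def sealed_verdict(profiles, n_perm=10_000):
    """Apply the three sealed conditions; True means H1 supported."""
    if len(profiles) < 5:
        raise ValueError("the sealed rule applies at n >= 5 families")
    loo = loo_cosines(profiles)
    low, _ = bootstrap_ci(loo)
    return mean(loo) >= 0.40 and permutation_p(profiles, n_perm) < 0.05 and low > 0
```

Fixing the seeds makes the permutation p-value and the CI reproducible run to run, which matters when a verifier script compares them against committed numbers.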
An SAE-free measurement primitive. No per-model training. Portable across architectures.
The atlas uses an SAE-free measurement primitive: the cosine between the final-layer residual stream and the unembedding row of the chosen token. It requires no SAE and no per-model training, and it is well-defined on any transformer with an explicit unembedding. The cost is one normalized dot product per token. Models that expose only a logprob interface, including closed-weight frontier models, are reached through the entropy bridge (r = 0.902 shape correlation).
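A minimal sketch of the primitive, assuming you already hold the final-layer residual vector for one position and the unembedding matrix as rows indexed by token id. Whether the final layer norm is applied before taking the cosine is not specified here, so that choice is left to the caller. The second function is a hypothetical illustration of the logprob-only side: it computes Shannon entropy from whatever top-k logprobs an API returns, renormalized over those k tokens. That renormalization is an assumption, not a documented detail of the entropy bridge.

```python
import math

def commitment_cosine(residual, unembedding, token_id):
    """Cosine between the final-layer residual vector and the chosen token's unembedding row.

    residual: list of d floats for one position.
    unembedding: list of rows (one per token id), each a list of d floats.
    """
    row = unembedding[token_id]
    dot = sum(h * w for h, w in zip(residual, row))
    norm_h = math.sqrt(sum(h * h for h in residual))
    norm_w = math.sqrt(sum(w * w for w in row))
    return dot / (norm_h * norm_w)

def topk_entropy(logprobs):
    """Shannon entropy (nats) of top-k logprobs, renormalized over the k returned tokens.

    Hypothetical stand-in for the logprob-only side of the entropy bridge.
    """
    top = max(logprobs)
    weights = [math.exp(lp - top) for lp in logprobs]  # subtract max for numerical stability
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

In a real model the residual and unembedding would be tensors, and the same cosine is one batched matrix operation over all positions.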
The commitment intensity is not an ad-hoc formula. It is the inverse participation ratio.
The commitment intensity S is exactly the inverse participation ratio of the coherence event distribution, a condensed-matter physics construct that dates back nearly seventy years (Anderson 1958, Edwards-Thouless, random matrix theory). The identity is verified to machine precision on real trajectories. It explains why the ratio form matters and why alternative formulas built on the max alone or the mean alone fail.
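For nonnegative event weights w_1, ..., w_n, the inverse participation ratio is Σ w_i² / (Σ w_i)²: the sum of squared probabilities after normalizing the weights to sum to one. It equals 1/n when weight is spread evenly and 1 when a single event carries all of it. The sketch below assumes S is applied to such a weight vector (the exact definition of a coherence event is not given here). It shows concretely why neither the max nor the mean alone can replace the ratio, since each is blind to how the weight is distributed.

```python
def inverse_participation_ratio(weights):
    """IPR = sum(w_i^2) / (sum(w_i))^2 for nonnegative weights; ranges from 1/n to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one positive weight")
    return sum(w * w for w in weights) / (total * total)

spread = [1.0, 1.0, 1.0, 1.0]
spike_same_mean = [4.0, 0.0, 0.0, 0.0]
spike_same_max = [1.0, 0.0, 0.0, 0.0]

# Same mean (1.0), opposite concentration: the mean alone cannot tell them apart.
print(inverse_participation_ratio(spread), inverse_participation_ratio(spike_same_mean))  # 0.25 1.0
# Same max (1.0), opposite concentration: the max alone cannot tell them apart either.
print(inverse_participation_ratio(spread), inverse_participation_ratio(spike_same_max))   # 0.25 1.0
```

The ratio is also scale-invariant: doubling every weight leaves it unchanged, which a raw max or mean does not.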
§2 Verify it yourself
28 assertions · runs in under a minute · no GPU needed
Every numerical claim in the paper is anchored to a committed JSON file. A reproducibility script walks every claim and fails loudly if any number drifts.
```console
# 01 · clone the repo
$ git clone https://github.com/fathom-lab/fathom
$ cd fathom

# 02 · inspect the sealed pre-reg commit
$ git show e32cc75 -- atlas/PREREG_v0.3_attractor_replication.md
# commit author : darkflobi
# commit date   : 2026-04-10 14:57:52 -0400
# verdict sealed: H1 if mean LOO cos ≥ 0.40
#                 AND perm p < 0.05
#                 AND bootstrap CI lower > 0

# 03 · run the audit
$ python atlas/verify_all_claims.py
# running 28 assertions against committed JSONs ...
# [ok] mean LOO cosine = +0.7691 ≥ 0.40
# [ok] permutation p = 0.0315 < 0.05
# [ok] bootstrap CI low = +0.5708 > 0
# [ok] 6 / 6 families positive
# [ok] prereg commit = e32cc75
# ...
# 28 / 28 PASSED · 0.43 s
```
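The audit pattern is simple: a registry of claims, each pointing at a value in a committed JSON file and checked within a tolerance. A minimal sketch of that pattern follows. The claim tuples, file names, and key paths are hypothetical and do not reflect the contents of `atlas/verify_all_claims.py`.

```python
import json
import math
from pathlib import Path

# Hypothetical claim registry: (label, JSON file, key path, expected value, absolute tolerance).
CLAIMS = [
    ("mean LOO cosine", "results/summary.json", ["loo", "mean"], 0.7691, 5e-5),
    ("permutation p", "results/summary.json", ["perm", "p"], 0.0315, 5e-5),
]

def load_from_disk(rel_path, root=Path(".")):
    """Parse one committed JSON file relative to the repository root."""
    return json.loads((root / rel_path).read_text())

def check_claims(claims, load_json=load_from_disk):
    """Return a list of failure messages; print one [ok] line per claim that holds."""
    failures = []
    for label, rel_path, keys, expected, tol in claims:
        value = load_json(rel_path)
        for key in keys:  # walk the nested key path down to the number
            value = value[key]
        if math.isclose(value, expected, rel_tol=0.0, abs_tol=tol):
            print(f"[ok] {label} = {value}")
        else:
            failures.append(f"[FAIL] {label}: expected {expected}, found {value}")
    return failures

def main(claims=CLAIMS):
    """Report every failure, then return a nonzero exit status if anything drifted."""
    problems = check_claims(claims)
    for line in problems:
        print(line)
    return 1 if problems else 0
```

Called as `sys.exit(main())` from a script entry point, a single drifted number turns the run red. Passing `load_json` as a parameter keeps the checker testable without touching the filesystem.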
§3 Head-to-head
single-instrument validation · n = 200 TruthfulQA items
Beyond the cross-architecture replication, the SAE-derived commitment intensity S_early outperforms all four standard uncertainty baselines tested, on the same sample, the same model, and the same labels.
| signal | AUC | p-value | source |
|---|---|---|---|
| S_early (ours) | 0.663 | 0.013 | SAE coherence |
| logit entropy (max) | 0.607 | 0.053 | standard |
| logit entropy (mean) | 0.596 | 0.133 | standard |
| logprob (mean) | 0.559 | 0.291 | standard |
| top-2 margin | 0.477 | 0.624 | standard |
Same 200 TruthfulQA items, Gemma-2-2B-IT, same labels. S_early is the only signal reaching conventional significance (p < 0.05). Its correlation with logit entropy is r = −0.17, so the two signals are largely independent. Across datasets, the pooled effect size is d = +0.494 (Fisher combined p = 0.0008).
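Both summary statistics in this section are standard and easy to recompute. Below, AUC is the Mann-Whitney probability that a randomly chosen positive item outscores a randomly chosen negative one (ties count half). Fisher's method combines k independent p-values through X = -2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the null; for even degrees of freedom the tail has a closed form, so no SciPy is needed. Which label counts as positive, and which per-dataset p-values feed the combination, are not specified here, so the inputs are placeholders.

```python
import math

def auc(pos_scores, neg_scores):
    """Probability a positive outscores a negative; ties count 0.5 (Mann-Whitney AUC)."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos_scores) * len(neg_scores))

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-squared with 2k degrees of freedom."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Chi-squared survival function for even df = 2k: exp(-X/2) * sum_{j<k} (X/2)^j / j!
    return math.exp(-half_x) * sum(half_x ** j / math.factorial(j) for j in range(k))
```

With a single input, Fisher's method returns that p-value unchanged; each additional independent result in the same direction pushes the combined value down.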
§4 Open science artifacts
every claim traceable · every byte open
Nothing crosses unseen.