fathom
sycophancy 0.04 · deception 0.02 · drift 0.11 · overconfidence 0.07 · scored 2026-04-30 by styxx 7.1.0
FATHOM // V15 · ATLAS V0.3 · PREREG E32CC75 · H1 SUPPORTED · 2026-04-10
Pre-registered · 6 of 6 families · H1 supported

Measure what models think.

The first pre-registered cross-architecture replication in mechanistic interpretability. The decision rule for six model families was sealed in git commit e32cc75 ninety-three minutes before any data was captured, and all three sealed decision conditions passed.

+0.769 · mean LOO cosine
p = 0.0315 · permutation (one-sided)
6 / 6 · families positive
93 min · seal → data gap

read the paper · DOI  ·  try the playground  ·  github  ·  ★ styxx — first product

§1 The atlas v0.3 replication

April 10, 2026 · the headline result

We ran twelve open-weight captures (six families × base/instruct) on a fixed 90-prompt probe set, sealed the decision rule in git before any data was captured, then applied it without modification. All three conditions passed.

Pre-registration seal
commit e32cc75
when 2026-04-10 14:57:52 -0400
mirror osf.io/wtkzg
— 93 min wall-clock gap —
commit 01969cb
when 2026-04-10 16:30:28 -0400
data 12 captures · n=6 families · probe v0.1
01 · sealed pre-registration · publicly verifiable

The decision rule was committed to git ninety-three minutes before any data was captured.

The v0.3 decision rule was committed as e32cc75 at 14:57:52 ET on 2026-04-10, mirrored on OSF at osf.io/wtkzg. The first v0.3 capture landed at 16:30:28 ET as 01969cb — a 93-minute gap anyone can verify from the git history. No field in the decision rule was touched after data collection.
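The 93-minute figure can be re-derived from the two commit timestamps quoted above. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the fathom repository.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z"

# Timestamps copied from the seal card above (both carry a -0400 offset).
sealed = datetime.strptime("2026-04-10 14:57:52 -0400", FMT)         # commit e32cc75
first_capture = datetime.strptime("2026-04-10 16:30:28 -0400", FMT)  # commit 01969cb

gap_seconds = int((first_capture - sealed).total_seconds())
gap_minutes = gap_seconds / 60

print(f"gap: {gap_seconds} s = {gap_minutes:.1f} min")
# prints "gap: 5556 s = 92.6 min", which rounds to the quoted 93 minutes
```

The same two dates can be read directly from the git history of the cloned repository, which is the authoritative check.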

02 · H1 supported · all three sealed conditions passed

Mean LOO cosine +0.769. Permutation p = 0.0315. Bootstrap CI strictly above zero.

The sealed primary measurement was the entropy early-window leave-one-out cosine at n≥5 families. Observed: mean LOO cosine +0.769 (threshold ≥ 0.40), permutation p = 0.0315 (threshold < 0.05), bootstrap 95% CI [+0.571, +0.869] (lower bound > 0). All 6 of 6 families show positive LOO cosine. Verdict: H1 supported.
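For readers unfamiliar with the term, one generic reading of a "leave-one-out cosine" is sketched here: each family's signature vector is compared with the mean of all the other families' vectors. This is an assumption about the general technique, not the sealed definition; the pre-registration file is the authority on what was actually measured. The function names and the toy vectors are hypothetical and are not atlas data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def loo_cosines(signatures):
    """For each family, cosine between its vector and the mean of all other families."""
    results = {}
    for held_out, vec in signatures.items():
        others = [v for name, v in signatures.items() if name != held_out]
        mean_others = [sum(col) / len(others) for col in zip(*others)]
        results[held_out] = cosine(vec, mean_others)
    return results

# Toy, made-up signature vectors (NOT data from the atlas).
toy = {
    "family_a": [1.0, 0.5, -0.2],
    "family_b": [0.9, 0.6, -0.1],
    "family_c": [1.1, 0.4, -0.3],
}
scores = loo_cosines(toy)
mean_loo = sum(scores.values()) / len(scores)
print({name: round(s, 3) for name, s in scores.items()}, round(mean_loo, 3))
```

Holding each family out in turn means no family contributes to the reference it is compared against, which is the usual motivation for the leave-one-out form.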

H1 primary · entropy early-window LOO cosine

family          LOO cosine
Gemma-2-2B      +0.977
Llama-3.2-1B    +0.939
Llama-3.2-3B    +0.884
Gemma-3-1B      +0.682
Qwen2.5-3B      +0.602
Qwen2.5-1.5B    +0.531
mean            +0.769 ★
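
The three sealed conditions can be restated as a small check over the numbers reported in this section. This is a sketch for illustration, assuming only the thresholds quoted above; `h1_supported` is a hypothetical helper name, not a function from the fathom repository.

```python
# Per-family LOO cosines and summary statistics as reported in this section.
loo = {
    "Gemma-2-2B": 0.977, "Llama-3.2-1B": 0.939, "Llama-3.2-3B": 0.884,
    "Gemma-3-1B": 0.682, "Qwen2.5-3B": 0.602, "Qwen2.5-1.5B": 0.531,
}
perm_p = 0.0315   # reported permutation p-value
ci_low = 0.571    # reported lower bound of the bootstrap 95% CI

mean_loo = sum(loo.values()) / len(loo)

def h1_supported(mean_cos, p_value, ci_lower):
    """Sealed rule as quoted: mean LOO cos >= 0.40 AND perm p < 0.05 AND CI lower > 0."""
    return mean_cos >= 0.40 and p_value < 0.05 and ci_lower > 0

verdict = h1_supported(mean_loo, perm_p, ci_low)
print(f"mean LOO cosine = {mean_loo:+.3f}, H1 supported: {verdict}")
# prints "mean LOO cosine = +0.769, H1 supported: True"
```

Note that this only re-applies the rule to published summary numbers; it does not recompute the permutation test or the bootstrap.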
03 · D = cos(h^(L), w_{y_t}) · architecturally universal

An SAE-free measurement primitive. No per-model training. Portable across architectures.

The atlas uses an SAE-free measurement primitive: the cosine between the final-layer residual stream and the unembedding row of the chosen token. It requires no SAE and no per-model training, and it is well-defined on any transformer with an explicit unembedding. The cost is one normalized dot product per token. The measurement also extends to any model with a logprob interface, including closed-weight frontier models, via the entropy bridge (r = 0.902 shape correlation).
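
A minimal sketch of the primitive as described: the cosine between a final-layer hidden state and the unembedding row of the chosen token. The function name `d_cosine` and the tiny matrices are hypothetical toy values. Details such as whether the hidden state is taken before or after the final normalization are not stated here, so treat this as an illustration of the formula only.

```python
import math

def d_cosine(hidden, unembed_row):
    """D = cos(h, w): normalized dot product of two equal-length vectors."""
    dot = sum(h * w for h, w in zip(hidden, unembed_row))
    norm_h = math.sqrt(sum(h * h for h in hidden))
    norm_w = math.sqrt(sum(w * w for w in unembed_row))
    return dot / (norm_h * norm_w)

# Toy unembedding matrix: 3 vocabulary rows, model width 4 (made-up values).
unembedding = [
    [0.2, -0.1, 0.4, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
]
hidden_state = [2.0, 0.0, 0.0, 0.0]  # toy final-layer residual stream at one position
chosen_token = 1                     # index of the token the model emitted

d = d_cosine(hidden_state, unembedding[chosen_token])
print(f"D = {d:.3f}")
```

Because the cosine normalizes both vectors, the value is insensitive to the overall scale of the residual stream, which is one reason such a quantity can be compared across models of different widths.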

04 · physics grounding · S = M × IPR

The commitment intensity is not an ad-hoc formula. It is the inverse participation ratio.

The commitment intensity S is exactly the inverse participation ratio of the coherence event distribution, a nearly seventy-year-old construct from condensed-matter physics (Anderson 1958, Edwards-Thouless, random matrix theory). The identity was verified to machine precision on real trajectories. It explains why the ratio form is specific and why alternative formulas (max alone, mean alone) fail.
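
For reference, the textbook inverse participation ratio of a normalized distribution p is the sum of p_i squared: it equals 1/N for a uniform distribution over N outcomes and 1 when all mass sits on one outcome. The sketch below shows only this standard definition. How the coherence event distribution and the factor M are constructed is not specified in this section, so the inputs are made-up toy weights.

```python
def inverse_participation_ratio(weights):
    """IPR = sum(p_i^2) after normalizing non-negative weights to sum to 1."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return sum(p * p for p in probs)

uniform = [1.0, 1.0, 1.0, 1.0]   # mass spread evenly over 4 events
peaked = [4.0, 0.0, 0.0, 0.0]    # all mass on a single event

ipr_uniform = inverse_participation_ratio(uniform)
ipr_peaked = inverse_participation_ratio(peaked)

# Two toy distributions with the same maximum (0.5) but different spread.
same_max_a = [0.5, 0.5]
same_max_b = [0.5, 0.125, 0.125, 0.125, 0.125]
ipr_a = inverse_participation_ratio(same_max_a)
ipr_b = inverse_participation_ratio(same_max_b)

print(ipr_uniform, ipr_peaked, ipr_a, ipr_b)
```

The last pair illustrates one way the maximum alone can miss information that the ratio form keeps: both distributions peak at 0.5, yet their IPR values differ.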

§2 Verify it yourself

28 assertions · runs in under a minute · no GPU needed

Every numerical claim in the paper is anchored to a committed JSON file. A reproducibility script walks every claim and fails loud if any number drifts.

# 01 · clone the repo
$ git clone https://github.com/fathom-lab/fathom
$ cd fathom

# 02 · inspect the sealed pre-reg commit
$ git show e32cc75 atlas/PREREG_v0.3_attractor_replication.md
# commit author : darkflobi
# commit date   : 2026-04-10 14:57:52 -0400
# verdict sealed: H1 if mean LOO cos ≥ 0.40
#                   AND perm p < 0.05
#                   AND bootstrap CI lower > 0

# 03 · run the audit
$ python atlas/verify_all_claims.py
# running 28 assertions against committed JSONs ...
# [ok] mean LOO cosine   = +0.7691  ≥ 0.40
# [ok] permutation p     = 0.0315   < 0.05
# [ok] bootstrap CI low  = +0.5708  > 0
# [ok] 6 / 6 families positive
# [ok] prereg commit     = e32cc75
# ...
# 28 / 28 PASSED  ·  0.43 s
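
To make the "fails loud" idea concrete, here is a sketch of how an audit of this kind could be structured. It is not the contents of atlas/verify_all_claims.py: the file name `h1_results.json`, the JSON keys, and the helper names are all hypothetical, and only the three H1 thresholds quoted above are checked.

```python
import json
import tempfile
from pathlib import Path

def check(name, ok, failures):
    """Print one audit line and remember any failure."""
    print(f"[{'ok' if ok else 'FAIL'}] {name}")
    if not ok:
        failures.append(name)

def audit(results_path):
    """Re-check claims against a committed JSON file; raise if any number drifted."""
    stats = json.loads(Path(results_path).read_text())
    failures = []
    checks = [
        ("mean LOO cosine >= 0.40", stats["mean_loo_cosine"] >= 0.40),
        ("permutation p < 0.05", stats["perm_p"] < 0.05),
        ("bootstrap CI low > 0", stats["ci_low"] > 0),
    ]
    for name, ok in checks:
        check(name, ok, failures)
    if failures:
        raise AssertionError(f"{len(failures)} claim(s) drifted: {failures}")
    return len(checks)

# Demo on a throwaway file holding the values printed in the audit log above.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "h1_results.json"  # hypothetical file name
    path.write_text(json.dumps(
        {"mean_loo_cosine": 0.7691, "perm_p": 0.0315, "ci_low": 0.5708}
    ))
    passed = audit(path)

print(f"{passed} / {passed} PASSED")
```

Raising an exception rather than printing a warning means a drifted number produces a non-zero exit code, so the audit can gate a CI job.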

§3 Head-to-head

single-instrument validation · n = 200 TruthfulQA items

Beyond the cross-architecture replication, the SAE-derived commitment intensity S_early beats every standard uncertainty baseline on the same sample, same model, and same labels.

signal                 AUC     p-value   source
S_early (ours)         0.663   0.013     SAE coherence
logit entropy (max)    0.607   0.053     standard
logit entropy (mean)   0.596   0.133     standard
logprob (mean)         0.559   0.291     standard
top-2 margin           0.477   0.624     standard

All five signals were scored on the same 200 TruthfulQA items, with Gemma-2-2B-IT and the same labels. S_early is the only signal reaching conventional significance (p < 0.05). Its correlation with logit entropy is r = −0.17, so the two signals are nearly independent. The cross-dataset meta-effect is pooled d = +0.494, with Fisher combined p = 0.0008.
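
As a reminder of what the AUC column measures, the sketch below computes AUC in its rank form: the probability that a randomly chosen positive item receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels are made-up toy values, not the TruthfulQA sample, and the function name is illustrative.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data: one signal value and one binary label per item (made up).
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]

toy_auc = auc(toy_scores, toy_labels)
print(f"AUC = {toy_auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking, which is why a value such as 0.477 for the top-2 margin indicates no usable signal on that sample.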

§4 Open science artifacts

every claim traceable · every byte open

★ styxx — first product
Drop-in cognitive vitals monitor for LLM agents. Real-time cross-architecture readout. Built on fathom atlas v0.3.
fathom.darkflobi.com/styxx
paper v15 · unified release
The fathom paper plus the full cognitive atlas v0.3 data bundle (probe set, 12 captures, sealed pre-reg, verify script, audit) on a single Zenodo record.
10.5281/zenodo.19504993
fathom paper series · concept DOI
Permanent identifier for every version of the paper (v1 → v15). Always resolves to the latest release.
10.5281/zenodo.19326174
OSF pre-registration
Hypotheses locked before data collection · mirrored in git.
osf.io/wtkzg
GitHub
Source, analyzer, captures, reproducibility verification.
github.com/fathom-lab/fathom
live playground
Paste a (question, response, reference) triple and see all 7 cognometric signals fire in your browser, no install.
/cognometry-try

Nothing crosses unseen.

Install the instrument.

One line of Python. Cognometric vitals on every response.

pip install -U styxx

github · pypi · spec v1.0