Fathom Lab

Cognitive instruments for machine cognition. Open source. Published failure modes.


What we build

We measure cognitive states of large language models at runtime — refusal, confabulation, retrieval, reasoning, adversarial drift — from signals already carried on the token stream and residual activations. Three of our tools are public today:

  • styxx: one decorator on any LLM call for hallucination detection. Install with pip install styxx[nli], then apply @trust. Cross-validated across 8 public benchmarks, with two declared failure modes published openly in the weights module.

  • fathom — SAE-based depth measurement for transformer internals. Fathom constant 1.0212 measured across two open-weight architectures.

  • Cognometry manifesto (fathom.darkflobi.com/cognometry): three falsifiable laws for cognometric measurement, each with a cross-validated number.
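
As a rough illustration of the "one decorator, any LLM call" pattern, here is a minimal sketch. It does not import styxx and is not its API: `trust_sketch`, `ScoredAnswer`, `toy_risk`, and `fake_llm` are hypothetical names invented for this example, and the word-overlap scorer is a placeholder rather than the NLI-based detector.

```python
import functools
from dataclasses import dataclass


@dataclass
class ScoredAnswer:
    text: str
    risk: float  # 0.0 = every answer word appears in the context, 1.0 = none do


def toy_risk(answer: str, context: str) -> float:
    """Placeholder scorer: fraction of answer words absent from the context."""
    answer_words = answer.lower().split()
    if not answer_words:
        return 0.0
    context_words = set(context.lower().split())
    missing = [w for w in answer_words if w not in context_words]
    return len(missing) / len(answer_words)


def trust_sketch(fn):
    """Hypothetical stand-in for a @trust-style decorator (not styxx's code)."""
    @functools.wraps(fn)
    def wrapper(prompt: str, context: str = "") -> ScoredAnswer:
        text = fn(prompt, context)
        return ScoredAnswer(text=text, risk=toy_risk(text, context))
    return wrapper


@trust_sketch
def fake_llm(prompt: str, context: str = "") -> str:
    # Stands in for any LLM call; returns a canned string.
    return "paris is the capital of france"


result = fake_llm(
    "What is the capital of France?",
    context="the capital of france is paris",
)
print(result.text, result.risk)
```

The design point is that the wrapped function keeps its signature and the caller receives the answer together with a score, so existing call sites need only one added line.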

Current numbers (styxx v4.0.2, 3-seed averaged, n=150/dataset)

Benchmark                  AUC              Note
HaluEval-QA                0.998 ± 0.001
TruthfulQA                 0.994 ± 0.006
HaluBench-RAGTruth         0.807 ± 0.043
HaluBench-PubMedQA         0.719 ± 0.051
HaluEval-Dialog            0.676 ± 0.037
HaluEval-Summarization     0.643 ± 0.060
HaluBench-FinanceBench     0.492 ± 0.026    declared failure
HaluBench-DROP             0.424 ± 0.080    declared failure
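
For readers who want to reproduce figures of this shape, the following sketch shows one way a seed-averaged AUC with a spread could be computed. It is an assumption about the procedure, not the lab's evaluation code: the data is synthetic, `synthetic_run` is a hypothetical helper, and the source does not say whether the ± is a sample or population standard deviation.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def synthetic_run(seed, n=150, separation=1.0):
    """Fabricated labels and scores for illustration only; not benchmark data."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n)]
    scores = [rng.gauss(separation * y, 1.0) for y in labels]
    return labels, scores


# Three seeds with n=150 each, mirroring the setup named in the heading.
aucs = [roc_auc(*synthetic_run(seed)) for seed in (0, 1, 2)]
mean_auc = statistics.mean(aucs)
std_auc = statistics.stdev(aucs)  # sample std; an assumption, see lead-in
print(f"AUC {mean_auc:.3f} ± {std_auc:.3f}")
```

An AUC of 0.5 corresponds to chance, which is why values below it are treated as failures rather than as weak successes.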

Two of the eight came in below chance (AUC < 0.5). They are declared in calibrated_weights_v4.CALIBRATION_NOTES.documented_failure_modes so production callers know where the detector will lie. That honesty is load-bearing for how we run this lab.
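
A production caller might turn the declared failures into a guard along these lines. This is a sketch under stated assumptions: the AUC values are copied from the table above, but `DECLARED_FAILURES` is a hand-written stand-in (the real structure of documented_failure_modes is not shown here), and `should_trust` with its 0.7 threshold is a hypothetical policy, not a recommendation from the lab.

```python
# Mean AUC values copied from the table above (styxx v4.0.2, 3-seed averaged).
REPORTED_AUC = {
    "HaluEval-QA": 0.998,
    "TruthfulQA": 0.994,
    "HaluBench-RAGTruth": 0.807,
    "HaluBench-PubMedQA": 0.719,
    "HaluEval-Dialog": 0.676,
    "HaluEval-Summarization": 0.643,
    "HaluBench-FinanceBench": 0.492,
    "HaluBench-DROP": 0.424,
}

# Hand-written stand-in for the declared failure modes (assumed shape).
DECLARED_FAILURES = {"HaluBench-FinanceBench", "HaluBench-DROP"}


def should_trust(closest_benchmark: str, min_auc: float = 0.7) -> bool:
    """Return True only if the workload resembles a benchmark where the
    detector is neither a declared failure nor below the chosen threshold."""
    if closest_benchmark in DECLARED_FAILURES:
        return False
    # Unknown benchmarks default to 0.0, so they are never trusted.
    return REPORTED_AUC.get(closest_benchmark, 0.0) >= min_auc


for name in REPORTED_AUC:
    print(f"{name}: {'use' if should_trust(name) else 'skip'}")
```

Defaulting unknown workloads to "skip" keeps the guard conservative: a detector score is only consumed where a published number supports it.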

Cognometry leaderboard

Open submission: any lab can PR a detector following the Cognometry Detector Interface v0 protocol and have it auto-evaluated against our 8 benchmarks. Live table:

fathom.darkflobi.com/cognometry/leaderboard
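
To give a feel for what a one-file detector submission could look like, here is a hypothetical shape. It is not the Cognometry Detector Interface v0, whose actual definition lives with the protocol: the `Detector` protocol, the `score` signature, `LengthBaseline`, and `run_detector` are all assumptions made for illustration.

```python
from typing import Protocol


class Detector(Protocol):
    """Hypothetical detector shape; the real v0 interface may differ."""

    def score(self, prompt: str, response: str, context: str) -> float:
        ...


class LengthBaseline:
    """Trivial baseline: longer responses get a higher hallucination score,
    capped at 1.0. Useful only as a sanity-check floor."""

    def score(self, prompt: str, response: str, context: str) -> float:
        return min(len(response.split()) / 50.0, 1.0)


def run_detector(detector: Detector, examples):
    """Score each (prompt, response, context) triple and reject any score
    outside [0, 1] so downstream AUC computation sees clean input."""
    scores = []
    for prompt, response, context in examples:
        s = float(detector.score(prompt, response, context))
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score {s} outside [0, 1]")
        scores.append(s)
    return scores


examples = [
    ("q1", "short answer", "ctx"),
    ("q2", " ".join(["word"] * 100), "ctx"),
]
scores = run_detector(LengthBaseline(), examples)
print(scores)
```

Using a structural protocol rather than a base class means a submission needs no import from the harness, which fits a one-file PR.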

Papers

Also at

How to contribute

  • Disconfirmations welcome. If a number is wrong at your favorite seed, open an issue or PR — we cite disconfirmations in the next paper.
  • Submit a detector to the cognometry leaderboard (one-file PR, protocol above).
  • Extend the benchmark suite. FEVER, FactCC, XSum-Faithful, and PHD-A are on the v4.2 track — PRs welcome.

"nothing crosses unseen"

Popular repositories

  1. styxx (Public)

    Cognitive observability for LLM agents. Nine calibrated cognometric instruments — pure-Python, MIT, no LLM required. 9-for-9 on K=1 phase transition. Every Mind Leaves Vitals (DOI 10.5281/zenodo.19…

    Python 5 1

  2. pneuma (Public)

    a measured-AI desktop chat — the agent's body is its breath. its breath cannot lie. powered by styxx.

    HTML 2

  3. fathom (Public)

    cognitive geometry of llm reasoning via sae feature coherence. cross-architecture. physics-grounded. patent pending.

    Python

  4. darkcity (Public)

    darkcity

    JavaScript

  5. .github (Public)

    Fathom Lab organization profile
