research(tools): NabaOS tool receipt layer — 94.2% hallucination detection at <15ms overhead (arXiv:2603.10060) #2266
Description
Summary
NabaOS (arXiv:2603.10060, March 2026) is a verification layer that categorizes every agent claim by its epistemic source (direct tool output, inference, external testimony, absence, unsupported opinion). It generates a cryptographic receipt per tool call, which prevents the LLM from fabricating or misrepresenting tool results. The paper reports tests on 1,800 scenarios across 4 languages: 94.2% detection of fabricated tool references, 87.6% of count misstatements, and 91.3% of false absence claims, all at under 15ms overhead vs 180,000ms for ZK-proof alternatives.
Applicability to Zeph
Direct fit for zeph-tools. Zeph's ToolExecutor and audit log already capture tool call results. Adding an epistemically-tagged receipt wrapper:
- Fabrication prevention: tag each tool result with its provenance (real output vs. LLM inference), block untrusted claims from entering the context as facts
- Audit trail enrichment: the existing `.local/testing/data/audit-test.jsonl` audit log could carry claim-source tags per tool invocation
- Low overhead: <15ms is well within the tool execution latency budget; no model change required
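As a rough illustration of the audit trail point, here is a minimal sketch of how one JSONL audit entry could carry a claim-source tag. The field names (`tool`, `claim_source`, `duration_ms`) and both functions are assumptions made up for this example; they are not Zeph's actual audit schema or API.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one JSONL audit entry with a claim-source tag.
/// Hypothetical schema: the real audit log may use different fields.
pub fn audit_line(tool: &str, claim_source: &str, duration_ms: u64) -> String {
    format!(
        "{{\"tool\":\"{}\",\"claim_source\":\"{}\",\"duration_ms\":{}}}",
        json_escape(tool),
        json_escape(claim_source),
        duration_ms
    )
}
```

Calling `audit_line("shell", "DirectOutput", 12)` yields a single line that can be appended to the log as-is. A real implementation would more likely derive the entry with a serialization crate; the hand-rolled escaping only keeps the sketch dependency-free.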
Implementation Sketch
- SHORT term (LOW): add `claim_source: ClaimSource` enum to `ToolResult`, with variants `DirectOutput`, `Cached`, `LlmInference`, `NotFound`
- MEDIUM term: expose `claim_source` in debug dumps and audit log; use it in `ContentSanitizer` to downgrade trust of `LlmInference` results
- LONG term: implement lightweight receipt hashing to detect cross-turn result substitution
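The three steps above could look roughly like the sketch below. The `ToolResult` here is a hypothetical stand-in for Zeph's real type, and `call_id`, `receipt`, `is_trusted`, and `ReceiptLog` are names invented for illustration. The digest uses the standard library's `DefaultHasher`, which is not cryptographic; a real receipt layer would presumably use a keyed or cryptographic hash instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Provenance of a tool result (variant names taken from this issue).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ClaimSource {
    DirectOutput,
    Cached,
    LlmInference,
    NotFound,
}

/// Hypothetical stand-in for Zeph's ToolResult.
#[derive(Debug, Clone)]
pub struct ToolResult {
    pub call_id: u64,
    pub tool_name: String,
    pub content: String,
    pub claim_source: ClaimSource,
}

impl ToolResult {
    /// Digest over the fields a receipt should pin down.
    /// DefaultHasher is a non-cryptographic placeholder.
    pub fn receipt(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.call_id.hash(&mut hasher);
        self.tool_name.hash(&mut hasher);
        self.content.hash(&mut hasher);
        self.claim_source.hash(&mut hasher);
        hasher.finish()
    }

    /// Assumed policy: only LlmInference results are downgraded.
    pub fn is_trusted(&self) -> bool {
        self.claim_source != ClaimSource::LlmInference
    }
}

/// Receipts recorded at tool execution time, keyed by call id.
#[derive(Default)]
pub struct ReceiptLog {
    receipts: HashMap<u64, u64>,
}

impl ReceiptLog {
    /// Store the receipt when the tool call completes.
    pub fn record(&mut self, result: &ToolResult) {
        self.receipts.insert(result.call_id, result.receipt());
    }

    /// True only if a receipt exists for this call and still matches.
    pub fn verify(&self, result: &ToolResult) -> bool {
        self.receipts.get(&result.call_id) == Some(&result.receipt())
    }
}
```

In this sketch, a result whose content is swapped in a later turn no longer matches its recorded receipt, so `verify` returns `false`; a result that was never recorded also fails verification. Whether `NotFound` should count as trusted is a policy choice the issue leaves open.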
Related
Complements existing security pipeline: ContentSanitizer (injection flags) + ExfiltrationGuard + PiiFilter. Adds a fourth layer for output provenance.
Source: https://arxiv.org/abs/2603.10060