research(memory): agentic memory benchmarking harness — hit rate, recall latency, compression ratio metrics (arXiv:2602.19320) #2419
Closed
Labels
- P3: Research, medium-high complexity
- memory: zeph-memory crate (SQLite)
- research: Research-driven improvement
Description
Finding
Anatomy of Agentic Memory (arXiv:2602.19320)
Comprehensive taxonomy + evaluation framework for agent memory systems. Defines standardized metrics: recall hit rate, latency per recall, compression ratio, interference rate (new memories degrading old recalls), and context utilization efficiency.
Applicability to Zeph
Zeph has no systematic memory benchmarking. The journal tracks qualitative results ("cross-session recall works") but no quantitative metrics. A benchmarking harness would enable data-driven tuning of thresholds (cross_session_score_threshold, compaction_threshold, admission.threshold, etc.).
Proposed design:
# .local/testing/bench-memory.py
# 1. Seed N facts with known content
# 2. Run M recall queries at different time delays
# 3. Report: hit_rate, avg_recall_latency_ms, compression_ratio, interference_rate
python3 .local/testing/bench-memory.py --facts 50 --queries 100 --sessions 5

Metrics to track:
- hit_rate: fraction of seeded facts recalled correctly
- recall_latency_p50/p99: milliseconds per memory_search call
- compression_ratio: tokens before/after compaction per session
- interference_rate: fraction of recalled facts that are contaminated by newer, unrelated memories
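
A minimal sketch of how the four metrics above could be aggregated once the harness has collected per-query results. Everything here is an assumption for illustration, not part of Zeph: the `RecallResult` record shape, the `percentile` and `summarize` helpers, the choice of nearest-rank percentiles, and the simplification that each query targets exactly one seeded fact (so hit_rate is computed per query).

```python
import math
from dataclasses import dataclass


@dataclass
class RecallResult:
    """One recall query outcome (hypothetical record shape)."""
    hit: bool            # the targeted seeded fact was recalled correctly
    latency_ms: float    # wall-clock time of one memory_search call
    contaminated: bool   # recalled text was mixed with newer, unrelated memories


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct in (0, 100]."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(results, tokens_before, tokens_after):
    """Aggregate per-query results into the metrics listed above."""
    if not results:
        raise ValueError("no recall results to summarize")
    if tokens_after <= 0:
        raise ValueError("tokens_after must be positive")

    recalled = [r for r in results if r.hit]
    latencies = [r.latency_ms for r in results]
    contaminated = [r for r in recalled if r.contaminated]

    return {
        "hit_rate": len(recalled) / len(results),
        "recall_latency_p50": percentile(latencies, 50),
        "recall_latency_p99": percentile(latencies, 99),
        # Ratio > 1 means compaction shrank the stored context.
        "compression_ratio": tokens_before / tokens_after,
        # Measured over recalled facts only; 0.0 when nothing was recalled.
        "interference_rate": (
            len(contaminated) / len(recalled) if recalled else 0.0
        ),
    }
```

In a real run, the latency for each record would be taken around the memory_search call (for example with `time.perf_counter()`), and the token counts would come from whatever the compaction step reports per session.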
Priority
P3 — tooling improvement; enables data-driven tuning of existing thresholds.
Source
- arXiv:2602.19320 — Anatomy of Agentic Memory: A Principled Survey and Evaluation Framework