research: task-continuation metric for post-compaction validation #1609
Closed
Labels
- P1: High ROI, low complexity; do next sprint
- enhancement: New feature or request
- memory: zeph-memory crate (SQLite)
- research: Research-driven improvement
Description
Research Finding
Source: Factory.ai evaluation framework (2025), ICLR 2025
Applicability: Medium | Complexity: Simple
Problem
Zeph has no validation after context compaction. If summarization loses a critical fact (file path, decision, API key location), the next LLM call silently operates on incomplete context. This is the root cause of the class of bugs where agents 'forget' state after long sessions.
Proposed Approach
After each summarization event, run a lightweight 'compaction probe':
- Generate 2-3 factual questions from the original turns that are about to be summarized (e.g., 'What file was modified?', 'What was the user's goal?')
- Inject the new summary as context and ask the questions
- Score answers against expected (stored before compaction)
- If probe score < threshold: log WARN with question/answer pairs, optionally fall back to keeping original turns
The probe adds 1 extra LLM call per compaction event (mitigated by using a cheap fast model via the existing orchestrator).
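The steps above can be sketched in Rust as follows. This is a minimal illustration, not the zeph-memory implementation: the names ProbeQuestion, ProbeResult, and run_probe are hypothetical, the `ask` closure stands in for the extra LLM call, and the token-recall scoring is an assumption, since the issue does not say how answers are compared with the expected ones.

```rust
use std::collections::HashSet;

/// A probe question plus the answer derived from the original turns
/// (stored before compaction). Hypothetical type for illustration.
#[derive(Debug, Clone)]
pub struct ProbeQuestion {
    pub question: String,
    pub expected: String,
}

/// Outcome of one compaction probe. Hypothetical type for illustration.
#[derive(Debug, Clone)]
pub struct ProbeResult {
    /// Mean per-question score in [0.0, 1.0].
    pub score: f64,
    /// True when `score >= threshold`.
    pub passed: bool,
    /// (question, answer) pairs that did not fully match, for the WARN log.
    pub failures: Vec<(String, String)>,
}

/// Lowercased alphanumeric tokens of `text`.
fn tokens(text: &str) -> HashSet<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Assumed scoring rule: fraction of expected tokens found in the answer.
fn answer_score(expected: &str, answer: &str) -> f64 {
    let want = tokens(expected);
    if want.is_empty() {
        return 1.0;
    }
    let got = tokens(answer);
    let hits = want.iter().filter(|t| got.contains(*t)).count();
    hits as f64 / want.len() as f64
}

/// Asks every question against the new summary and averages the scores.
/// `ask(summary, question)` is a stand-in for the probe model call.
pub fn run_probe<F>(
    questions: &[ProbeQuestion],
    summary: &str,
    threshold: f64,
    mut ask: F,
) -> ProbeResult
where
    F: FnMut(&str, &str) -> String,
{
    let mut total = 0.0;
    let mut failures = Vec::new();
    for q in questions {
        let answer = ask(summary, &q.question);
        let s = answer_score(&q.expected, &answer);
        if s < 1.0 {
            failures.push((q.question.clone(), answer));
        }
        total += s;
    }
    // An empty question set is treated as a pass (nothing to validate).
    let score = if questions.is_empty() {
        1.0
    } else {
        total / questions.len() as f64
    };
    ProbeResult { score, passed: score >= threshold, failures }
}
```

A caller would log the `failures` pairs at WARN when `passed` is false and, if configured, keep the original turns instead of the summary.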
Integration Points
- crates/zeph-memory: validate_compaction(before_messages, summary) -> CompactionScore
- [memory.compression] config: probe_enabled = false, probe_model (defaults to summary model), probe_threshold = 0.7
- Debug dump: include compaction_probe section with questions, answers, score
- CLI: no user-facing change (background validation)
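One way the proposed [memory.compression] keys could map onto a Rust config type is sketched below. The field names and default values come from the list above; the struct name CompressionProbeConfig and the effective_model helper are hypothetical, and how zeph actually deserializes its config is not shown here.

```rust
/// Probe settings from the proposed `[memory.compression]` section.
/// Hypothetical struct name; fields and defaults follow the issue text.
#[derive(Debug, Clone, PartialEq)]
pub struct CompressionProbeConfig {
    /// Probe is opt-in: `probe_enabled = false` by default.
    pub probe_enabled: bool,
    /// `None` means "use the summary model".
    pub probe_model: Option<String>,
    /// Scores below this value trigger the WARN path.
    pub probe_threshold: f64,
}

impl Default for CompressionProbeConfig {
    fn default() -> Self {
        Self {
            probe_enabled: false,
            probe_model: None,
            probe_threshold: 0.7,
        }
    }
}

impl CompressionProbeConfig {
    /// Resolves the model to use for the probe, falling back to the
    /// summary model when `probe_model` is unset (assumed helper).
    pub fn effective_model<'a>(&'a self, summary_model: &'a str) -> &'a str {
        self.probe_model.as_deref().unwrap_or(summary_model)
    }
}
```

Keeping probe_model optional makes the "defaults to summary model" rule explicit at the call site rather than baking a model name into the defaults.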
Reference