This was referenced Mar 20, 2026
Adds a post-summarization validation step that fires before the summary
replaces original messages in context. Prevents silent context degradation
from lossy compaction.
Core changes:
- New `crates/zeph-memory/src/compaction_probe.rs` with `CompactionProbeConfig`,
`CompactionProbeResult`, `ProbeVerdict` (Pass/SoftFail/HardFail), and
`score_answers()` using token-set-ratio (handles paraphrasing + substring boost).
`validate_compaction()` orchestrates two LLM calls: question generation
and answer evaluation from summary only.
- `compact_context()` return type changed from `Result<(), AgentError>` to
`Result<CompactionOutcome, AgentError>` where `CompactionOutcome` is
`Compacted | ProbeRejected | NoChange`. This is a breaking internal API change:
`maybe_compact()` now matches on the outcome enum and handles `ProbeRejected`
with a cooldown but without triggering `Exhausted` — which would have been
incorrect since the compactor is not stuck, only the summary was too lossy.
- Config: `[memory.compression.probe]` section with `enabled = false` default.
Backward compatible — existing configs without the section deserialize unchanged.
- Metrics: 4 new `MetricsSnapshot` fields for pass/soft_fail/failure/error counts.
- Debug dump: `dump_compaction_probe()` writes `{N}-compaction-probe.json` with
per-question breakdown (question, expected, actual, score).
12 unit tests for scoring edge cases, serde round-trips, and threshold logic.
6044 tests pass, 0 clippy warnings.
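For reviewers unfamiliar with token-set scoring, here is a minimal sketch of the idea behind `score_answers()`. It is not the code in `compaction_probe.rs`: it substitutes a plain Dice coefficient over token sets for the crate's token-set-ratio, and the names `tokens`, `token_overlap_score`, and `SUBSTRING_BOOST_FLOOR` (with its value of 0.8) are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical floor applied when the expected answer appears verbatim
/// inside the actual answer (the "substring boost"). The value is assumed.
const SUBSTRING_BOOST_FLOOR: f32 = 0.8;

/// Lowercase the text, split on non-alphanumeric characters, drop empties.
fn tokens(text: &str) -> HashSet<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Simplified stand-in for a token-set score in [0.0, 1.0].
/// Word order and repeated words are ignored, so reordered paraphrases
/// of the same facts still score highly.
fn token_overlap_score(expected: &str, actual: &str) -> f32 {
    let exp = tokens(expected);
    let act = tokens(actual);
    if exp.is_empty() || act.is_empty() {
        return 0.0;
    }
    let shared = exp.intersection(&act).count() as f32;
    // Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
    let dice = 2.0 * shared / (exp.len() + act.len()) as f32;
    // Substring boost: a long answer that contains the expected text
    // verbatim should not be penalized for its extra words.
    let needle = expected.trim().to_lowercase();
    if actual.to_lowercase().contains(&needle) {
        dice.max(SUBSTRING_BOOST_FLOOR)
    } else {
        dice
    }
}
```

Under these assumptions, reordered answers such as "8080 port" against "port 8080" score 1.0, while a verbose answer that merely contains the expected text is lifted to the floor instead of being dragged down by its length.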
Add 4 new unit tests for edge cases not covered by the initial 12:
- fewer_answers_than_questions: LLM returns truncated answer list, verifies
repeat(&String::new()) padding and score averaging
- verdict_boundary_at_threshold: exact boundary values (0.6, 0.35) and one-ULP
margins for all three verdict tiers
- config_partial_json_uses_defaults: partial deserialization exercises
#[serde(default)] on CompactionProbeConfig
- config_empty_json_uses_all_defaults: empty object deserialization
Total: 16 compaction_probe tests, all passing.
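The two behaviours these tests target (padding a short answer list and mapping an averaged score to a verdict) can be sketched as follows. This is an illustration, not the crate's code: the constant names, the `classify` and `average_score` helpers, and the choice of inclusive `>=` comparisons at 0.6 and 0.35 are all assumptions.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum ProbeVerdict {
    Pass,
    SoftFail,
    HardFail,
}

// Boundary values taken from the test description; treating them as
// inclusive lower bounds is an assumption.
const PASS_THRESHOLD: f32 = 0.6;
const HARD_FAIL_THRESHOLD: f32 = 0.35;

/// Map an averaged score to one of the three verdict tiers.
fn classify(score: f32) -> ProbeVerdict {
    if score >= PASS_THRESHOLD {
        ProbeVerdict::Pass
    } else if score >= HARD_FAIL_THRESHOLD {
        ProbeVerdict::SoftFail
    } else {
        ProbeVerdict::HardFail
    }
}

/// Average per-question scores. If the LLM returned fewer answers than
/// questions, the missing answers are padded with empty strings so they
/// are scored (typically as misses) instead of being silently skipped.
fn average_score(
    expected: &[String],
    actual: &[String],
    score_one: impl Fn(&str, &str) -> f32,
) -> f32 {
    if expected.is_empty() {
        return 0.0;
    }
    let empty = String::new();
    let padded = actual.iter().chain(std::iter::repeat(&empty));
    let total: f32 = expected
        .iter()
        .zip(padded)
        .map(|(e, a)| score_one(e.as_str(), a.as_str()))
        .sum();
    total / expected.len() as f32
}
```

Dividing by `expected.len()` rather than by the number of answers received is what makes a truncated answer list lower the score.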
SEC-PROBE-01: cap LLM response to max_questions in generate_probe_questions()
- Add output.questions.truncate(max_questions) after the LLM call.
- Prevents a misbehaving LLM from returning hundreds of questions and
inflating the second LLM prompt.
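A minimal sketch of the cap, assuming the parsed LLM output is a struct with a `questions` vector. The `ProbeQuestions` type and `cap_questions` helper are hypothetical stand-ins; only the `truncate(max_questions)` call comes from this commit.

```rust
/// Hypothetical stand-in for the parsed question-generation response.
struct ProbeQuestions {
    questions: Vec<String>,
}

/// Keep at most `max_questions` entries. `Vec::truncate` is a no-op when
/// the vector is already short enough, so well-behaved responses are
/// passed through unchanged.
fn cap_questions(mut output: ProbeQuestions, max_questions: usize) -> ProbeQuestions {
    output.questions.truncate(max_questions);
    output
}
```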
SEC-PROBE-02: apply scrub_content() to probe dump fields
- Import crate::redact::scrub_content in debug_dump/mod.rs.
- Apply scrub_content() to question, expected_answer, and actual answer
fields before writing to the JSON dump file.
- Matches the redaction pattern used in trace.rs (IMP-04).
PERF-03: emit TUI status indicator before probe LLM calls
- Add channel.send_status("Validating compaction quality...") before
validate_compaction() is called.
- Complies with CLAUDE.md TUI rules: every implicit LLM call must have
a visible status indicator.
T-06: add H1 integration test for ProbeRejected state machine invariant
- New test probe_rejected_does_not_trigger_exhausted in context/tests.rs.
- Configures mock provider with refusal answers (score ~0 → HardFail).
- Asserts compact_context() returns ProbeRejected, messages are preserved,
and CompactionState is not Exhausted (H1 design invariant).
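The invariant this test guards can be sketched as a transition function from outcome to state. Only `CompactionOutcome` and the existence of an `Exhausted` state come from this PR; the `Ready` and `Cooldown` variants, the `COOLDOWN_TURNS` value, and the assumption that `NoChange` is what leads to `Exhausted` are hypothetical.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum CompactionOutcome {
    Compacted,
    ProbeRejected,
    NoChange,
}

/// Hypothetical state shape; the real `CompactionState` may differ.
#[derive(Debug, PartialEq, Clone, Copy)]
enum CompactionState {
    Ready,
    Cooldown { turns_remaining: u32 },
    Exhausted,
}

/// Assumed cooldown length, for illustration only.
const COOLDOWN_TURNS: u32 = 2;

/// Decide the next state after a compaction attempt.
fn next_state(outcome: CompactionOutcome) -> CompactionState {
    match outcome {
        // Compaction succeeded: back off before trying again (assumed).
        CompactionOutcome::Compacted => CompactionState::Cooldown {
            turns_remaining: COOLDOWN_TURNS,
        },
        // The summary was too lossy, but the compactor is not stuck:
        // apply a cooldown and never mark it exhausted (H1 invariant).
        CompactionOutcome::ProbeRejected => CompactionState::Cooldown {
            turns_remaining: COOLDOWN_TURNS,
        },
        // Nothing could be compacted: assumed to be the exhausted case.
        CompactionOutcome::NoChange => CompactionState::Exhausted,
    }
}

/// Tick the cooldown down by one turn; other states are left unchanged.
fn tick(state: CompactionState) -> CompactionState {
    match state {
        CompactionState::Cooldown { turns_remaining } if turns_remaining > 1 => {
            CompactionState::Cooldown { turns_remaining: turns_remaining - 1 }
        }
        CompactionState::Cooldown { .. } => CompactionState::Ready,
        other => other,
    }
}
```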
Summary
Implements a post-compression validation system (compaction probe) to detect whether context summarization loses critical facts. When enabled, the probe generates 2-3 factual questions from the context being compacted, answers them using only the summary, and scores the answers. If the score falls below the threshold, the original context is preserved and a warning is logged.
Closes #1609 (research: task-continuation metric for post-compaction validation)
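The flow described in the summary can be sketched end to end as below. This is a simplified illustration under stated assumptions, not the `validate_compaction()` implementation: the two LLM calls are replaced by caller-supplied functions, scoring is a case-insensitive containment check instead of token-set-ratio, a single threshold is used, and `ProbeQuestion` and `compact_with_probe` are hypothetical names.

```rust
/// One probe item: a question plus the answer derived from the original
/// context (hypothetical shape).
struct ProbeQuestion {
    question: String,
    expected: String,
}

/// Run the probe and return the messages that should remain in context.
/// `generate` stands in for the question-generation LLM call and `answer`
/// for the call that answers a question from the summary only.
fn compact_with_probe<G, A>(
    original: Vec<String>,
    summary: String,
    threshold: f32,
    generate: G,
    answer: A,
) -> Vec<String>
where
    G: Fn(&[String]) -> Vec<ProbeQuestion>,
    A: Fn(&str, &str) -> String,
{
    let questions = generate(&original);
    if questions.is_empty() {
        // Nothing to verify; accepting the summary here is an assumption.
        return vec![summary];
    }
    // Count questions whose expected answer is recoverable from the summary.
    let hits = questions
        .iter()
        .filter(|q| {
            answer(summary.as_str(), q.question.as_str())
                .to_lowercase()
                .contains(&q.expected.to_lowercase())
        })
        .count();
    let score = hits as f32 / questions.len() as f32;
    if score < threshold {
        // Too lossy: keep the original messages and log a warning.
        eprintln!("compaction probe failed (score {score:.2}); keeping original context");
        original
    } else {
        vec![summary]
    }
}
```

The key property is that the summary only replaces the original messages after the probe has run, so a rejected summary leaves the context untouched.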
Changes
Core Implementation
- `crates/zeph-memory/src/compaction_probe.rs` (new module): `validate_compaction()` function, `CompactionProbeConfig`, `ProbeVerdict` (Pass/SoftFail/HardFail), token-set-ratio scoring with paraphrasing support
- `crates/zeph-memory/src/config.rs`: Added `[memory.compression.probe]` config section (default: disabled)
- `crates/zeph-core/src/agent/context/summarization.rs`:
  - `compact_context()` return type: `Result<(), AgentError>` → `Result<CompactionOutcome, AgentError>`
  - `CompactionOutcome::ProbeRejected` variant for dedicated error path
  - `maybe_compact()` to handle ProbeRejected (sets cooldown, skips Exhausted check, per the H1 invariant)
- `crates/zeph-core/src/debug_dump/mod.rs`: Added `dump_compaction_probe()` section with questions, answers, score
- `crates/zeph-core/src/metrics.rs`: 4 new counters for probe verdicts and errors
Testing
- `probe_rejected_does_not_trigger_exhausted` verifies ProbeRejected path does not transition to Exhausted
Security Fixes
- `scrub_content()` (prevents secret leakage)
Performance & UX
Design Reference
Validator Feedback
All validators approved with conditional fixes, all addressed:
Integration Notes
- `[memory.compression.probe] enabled = false` (disabled by default for backward compatibility)
- `compact_context()` return type (internal method, properly scoped)
Testing Checklist
Next Steps (Phase 3 — Deferred)
- `--init` wizard update for `[memory.compression.probe]` config option
- `docs/src/concepts/context-management.md` update
Branch & Commits
- `issue-1609-compaction-validation`
- `5889f539` - feat(memory): compaction probe validation core implementation
- `1eb86827` - test(memory): expand compaction probe test coverage
- `e1cee063` - fix(memory): address validator findings (truncate, scrub_content, TUI spinner)
Closes #1609