feat(memory): compaction probe validation for post-compression context integrity (#1609) by bug-ops · Pull Request #2047 · bug-ops/zeph

bug-ops · 2026-03-20T14:40:58Z

Summary

Implements a post-compression validation system (compaction probe) to detect if context summarization loses critical facts. When enabled, the probe generates 2-3 factual questions from context being compacted, answers them using only the summary, and scores the answers. If score falls below threshold, the original context is preserved and a warning is logged.

Closes #1609 (research: task-continuation metric for post-compaction validation)

Changes

Core Implementation

crates/zeph-memory/src/compaction_probe.rs (new module): validate_compaction() function, CompactionProbeConfig, ProbeVerdict (Pass/SoftFail/HardFail), token-set-ratio scoring with paraphrasing support
crates/zeph-memory/src/config.rs: Added [memory.compression.probe] config section (default: disabled)
crates/zeph-core/src/agent/context/summarization.rs:
- Changed compact_context() return type: Result<(), AgentError> → Result<CompactionOutcome, AgentError>
- Added CompactionOutcome::ProbeRejected variant for dedicated error path
- Integrated probe validation hook after summarization
- Updated maybe_compact() to handle ProbeRejected (sets cooldown, skips Exhausted check — H1 invariant)
crates/zeph-core/src/debug_dump/mod.rs: Added dump_compaction_probe() section with questions, answers, score
crates/zeph-core/src/metrics.rs: 4 new counters for probe verdicts and errors

Testing

16 unit tests for scoring, thresholds, serde, unicode edge cases
H1 integration test: probe_rejected_does_not_trigger_exhausted verifies ProbeRejected path does not transition to Exhausted
Threshold boundary tests: Exact f32 values at 0.6, 0.35, and margins
All 6044 workspace tests pass

Security Fixes

Max questions enforced after LLM response (prevents prompt injection from misbehaving LLM)
Debug dump redacts questions/answers via scrub_content() (prevents secret leakage)

Performance & UX

TUI status indicator during probe LLM calls ("Validating compaction quality...")
Non-blocking error handling: probe failure is logged, compaction proceeds normally
~2–3s latency per probe (1–3 LLM calls), ~$0.002 cost with Haiku

Design Reference

Factory.ai evaluation framework inspiration
ICLR 2025 cascade routing insights
Three-tier verdict system:
- Pass (≥0.6): Summary is accurate, discard original
- SoftFail [0.35, 0.6): Borderline summary, log warning but proceed
- HardFail (<0.35): Summary likely lost facts, preserve original

Validator Feedback

All validators approved with conditional fixes, all addressed:

Validator	Verdict	Issues
Architect	✓ Design v2	H1 deadlock fix, token-set-ratio, soft-fail zone
Critic	✓ Approved	Design addresses all security & robustness concerns
Tester	✓ PASS	6044 tests pass, 16 probe tests, no gaps
Perf	✓ CONDITIONAL	PERF-03: TUI spinner added
Security	✓ CONDITIONAL	SEC-PROBE-01/02: truncate + scrub_content
Impl-Critic	✓ CONDITIONAL	T-01/T-06: boundary tests + H1 test
Reviewer	✓ APPROVED	All fixes verified, ready to merge

Integration Notes

Config: [memory.compression.probe] enabled = false (disabled by default for backward compatibility)
CLI: No user-facing change
TUI: Status indicator added
Backward compatibility: Preserved (probe disabled by default)
Breaking changes: Only compact_context() return type (internal method, properly scoped)

Testing Checklist

Unit tests pass (16/16 probe + 67/67 core compact tests)
All workspace tests pass (6044/6044)
H1 integration test verifies ProbeRejected behavior
Boundary threshold tests at exact decision points
Security audit passed (max_questions, debug redaction)
Performance impact within spec (1–3s, ~$0.002 cost)
Pre-commit checks: fmt ✓, clippy ✓, nextest ✓

Next Steps (Phase 3 — Deferred)

--init wizard update for [memory.compression.probe] config option
TUI metrics panel integration for live probe score display
Documentation: docs/src/concepts/context-management.md update

Branch & Commits

Branch: issue-1609-compaction-validation
Commits (3):
1. 5889f539 — feat(memory): compaction probe validation core implementation
2. 1eb86827 — test(memory): expand compaction probe test coverage
3. e1cee063 — fix(memory): address validator findings (truncate, scrub_content, TUI spinner)

Closes #1609

Adds a post-summarization validation step that fires before the summary replaces original messages in context. Prevents silent context degradation from lossy compaction. Core changes: - New `crates/zeph-memory/src/compaction_probe.rs` with `CompactionProbeConfig`, `CompactionProbeResult`, `ProbeVerdict` (Pass/SoftFail/HardFail), and `score_answers()` using token-set-ratio (handles paraphrasing + substring boost). `validate_compaction()` orchestrates two LLM calls: question generation and answer evaluation from summary only. - `compact_context()` return type changed from `Result<(), AgentError>` to `Result<CompactionOutcome, AgentError>` where `CompactionOutcome` is `Compacted | ProbeRejected | NoChange`. This is a breaking internal API change: `maybe_compact()` now matches on the outcome enum and handles `ProbeRejected` with a cooldown but without triggering `Exhausted` — which would have been incorrect since the compactor is not stuck, only the summary was too lossy. - Config: `[memory.compression.probe]` section with `enabled = false` default. Backward compatible — existing configs without the section deserialize unchanged. - Metrics: 4 new `MetricsSnapshot` fields for pass/soft_fail/failure/error counts. - Debug dump: `dump_compaction_probe()` writes `{N}-compaction-probe.json` with per-question breakdown (question, expected, actual, score). 12 unit tests for scoring edge cases, serde round-trips, and threshold logic. 6044 tests pass, 0 clippy warnings.

Add 4 new unit tests for edge cases not covered by the initial 12: - fewer_answers_than_questions: LLM returns truncated answer list, verifies repeat(&String::new()) padding and score averaging - verdict_boundary_at_threshold: exact boundary values (0.6, 0.35) and one-ULP margins for all three verdict tiers - config_partial_json_uses_defaults: partial deserialization exercises #[serde(default)] on CompactionProbeConfig - config_empty_json_uses_all_defaults: empty object deserialization Total: 16 compaction_probe tests, all passing.

SEC-PROBE-01: cap LLM response to max_questions in generate_probe_questions() - Add output.questions.truncate(max_questions) after the LLM call. - Prevents a misbehaving LLM from returning hundreds of questions and inflating the second LLM prompt. SEC-PROBE-02: apply scrub_content() to probe dump fields - Import crate::redact::scrub_content in debug_dump/mod.rs. - Apply scrub_content() to question, expected_answer, and actual answer fields before writing to the JSON dump file. - Matches the redaction pattern used in trace.rs (IMP-04). PERF-03: emit TUI status indicator before probe LLM calls - Add channel.send_status("Validating compaction quality...") before validate_compaction() is called. - Complies with CLAUDE.md TUI rules: every implicit LLM call must have a visible status indicator. T-06: add H1 integration test for ProbeRejected state machine invariant - New test probe_rejected_does_not_trigger_exhausted in context/tests.rs. - Configures mock provider with refusal answers (score ~0 → HardFail). - Asserts compact_context() returns ProbeRejected, messages are preserved, and CompactionState is not Exhausted (H1 design invariant).

github-actions bot added documentation Improvements or additions to documentation memory zeph-memory crate (SQLite) rust Rust code changes core zeph-core crate enhancement New feature or request size/XL Extra large PR (500+ lines) labels Mar 20, 2026

This was referenced Mar 20, 2026

feat(core): add --init wizard step for compaction probe config #2048

Closed

feat(tui): add compaction probe metrics to TUI dashboard #2049

Closed

docs: add compaction probe explanation to context-management guide #2050

Closed

bug-ops enabled auto-merge (squash) March 20, 2026 14:43

bug-ops force-pushed the issue-1609-compaction-validation branch from e1cee06 to d520822 Compare March 20, 2026 14:45

bug-ops added 3 commits March 20, 2026 15:54

bug-ops force-pushed the issue-1609-compaction-validation branch from d520822 to 4c0e24f Compare March 20, 2026 14:54

bug-ops merged commit 46dc6bb into main Mar 20, 2026
25 checks passed

bug-ops deleted the issue-1609-compaction-validation branch March 20, 2026 15:02

This was referenced Mar 20, 2026

feat(init): add --init wizard step for compaction probe config (#2048) #2054

Merged

docs: add compaction probe explanation to context-management guide (#2050) #2072

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(memory): compaction probe validation for post-compression context integrity (#1609)#2047

feat(memory): compaction probe validation for post-compression context integrity (#1609)#2047
bug-ops merged 3 commits intomainfrom
issue-1609-compaction-validation

bug-ops commented Mar 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bug-ops commented Mar 20, 2026

Summary

Changes

Core Implementation

Testing

Security Fixes

Performance & UX

Design Reference

Validator Feedback

Integration Notes

Testing Checklist

Next Steps (Phase 3 — Deferred)

Branch & Commits

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant