feat(memory): structured compaction probe categories (#2164) by bug-ops · Pull Request #2181 · bug-ops/zeph

bug-ops · 2026-03-26T17:00:15Z

Summary

Add four functional probe dimensions (Recall, Artifact, Continuation, Decision) to the compaction probe pipeline with per-category scoring breakdown
New probe_provider config field in [memory.compression.probe] resolves from [[llm.providers]] at startup (follows build_judge_provider pattern); old dead model field removed
TUI memory panel shows per-category scores (Rec/Art/Con/Dec) with threshold-based color coding using compaction_probe_threshold from MetricsSnapshot

Changes

crates/zeph-memory/src/compaction_probe.rs

ProbeCategory enum with #[serde(rename_all = "lowercase")]
CategoryScore struct (category, score, probes_run)
ProbeQuestion.category field with #[serde(default = "default_probe_category")] (backward compat: defaults to Recall)
CompactionProbeResult.category_scores with #[serde(default)] (backward compat with old JSON)
CompactionProbeConfig: probe_provider: String, category_weights: Option<HashMap<ProbeCategory, f32>>; model field removed (was dead code, never read at call site)
compute_category_scores(): weighted average with zero-weight fallback; tracing::warn on negative weights
run_probe(): empty summary guard (< 10 chars), missing-category warn when max_questions >= 4, category-aware LLM prompt

crates/zeph-core/src/metrics.rs

last_probe_category_scores: Option<Vec<CategoryScore>>
compaction_probe_threshold: f32, compaction_probe_hard_fail_threshold: f32

crates/zeph-core/src/agent/context/summarization.rs

Sets last_probe_category_scores, compaction_probe_threshold, compaction_probe_hard_fail_threshold in all verdict branches

crates/zeph-tui/src/widgets/memory.rs

render_probe_last_line(): appends Rec:X.XX Art:X.XX Con:X.XX Dec:X.XX with threshold-based color; absent categories show --
New snapshot test probe_lines_with_category_scores

crates/zeph-core/src/debug_dump/mod.rs

dump_compaction_probe() includes category_scores array and category per question

crates/zeph-core/src/bootstrap/mod.rs

build_probe_provider(): resolves probe_provider name at startup, returns Option<AnyProvider>

crates/zeph-core/src/agent/{builder,state/mod,mod}.rs

ProviderState.probe_provider: Option<AnyProvider>, with_probe_provider() builder, probe_or_summary_provider() fallback

src/{runner,daemon,acp}.rs

Wire build_probe_provider() in all three entry points

src/init.rs

Replace probe_model with probe_provider in wizard state and prompts

Test plan

cargo +nightly fmt --check — clean
cargo clippy -p zeph-memory -p zeph-core -p zeph-tui -- -D warnings — clean
cargo nextest run --workspace --lib --bins — 6050 passed, 15 skipped
New unit tests: ProbeCategory serde, category_weights TOML round-trip, compute_category_scores (equal weights, custom weights, zero-weight fallback, missing category, multi-probe average), backward compat deserialization (old JSON without category_scores/category fields)
New TUI snapshot: probe_lines_with_category_scores

Closes #2164

Add four functional probe dimensions (Recall, Artifact, Continuation, Decision) to the compaction probe pipeline: - ProbeCategory enum with serde lowercase serialization - CategoryScore struct with per-category average and question count - ProbeQuestion gains a `category` field (serde default = Recall for backward compat with old persisted JSON) - CompactionProbeResult gains `category_scores` field (#[serde(default)] for backward compat) - CompactionProbeConfig: add `probe_provider` (replaces dead `model` field) and `category_weights: Option<HashMap<ProbeCategory, f32>>` - compute_category_scores(): weighted average with fallback to equal weighting when all weights are 0; tracing::warn on negative weights - run_probe(): warns per-missing-category when max_questions >= 4; empty-summary guard (< 10 chars) skips probe - Provider wiring: build_probe_provider() at startup (same pattern as build_judge_provider); probe_or_summary_provider() fallback chain - TUI: per-category Rec/Art/Con/Dec scores with threshold-based color coding using compaction_probe_threshold from MetricsSnapshot (IC-01) - MetricsSnapshot: last_probe_category_scores, compaction_probe_threshold, compaction_probe_hard_fail_threshold fields - Debug dump: category_scores array and category per question - Init wizard: probe_provider prompt replaces probe_model - All 6031 tests pass; new snapshot test for category score display

bug-ops enabled auto-merge (squash) March 26, 2026 17:00

bug-ops merged commit 0e36ee6 into main Mar 26, 2026
25 checks passed

bug-ops deleted the research-compression-probe-bas branch March 26, 2026 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(memory): structured compaction probe categories (#2164)#2181

feat(memory): structured compaction probe categories (#2164)#2181
bug-ops merged 1 commit intomainfrom
research-compression-probe-bas

bug-ops commented Mar 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bug-ops commented Mar 26, 2026

Summary

Changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant