
feat(memory): structured compaction probe categories (#2164) #2181

Merged
bug-ops merged 1 commit into main from research-compression-probe-bas
Mar 26, 2026

@bug-ops (Owner) commented Mar 26, 2026

Summary

  • Add four functional probe dimensions (Recall, Artifact, Continuation, Decision) to the compaction probe pipeline, with a per-category score breakdown
  • New probe_provider config field in [memory.compression.probe], resolved from [[llm.providers]] at startup (same pattern as build_judge_provider); the old model field, which was never read, is removed
  • TUI memory panel shows per-category scores (Rec/Art/Con/Dec) with threshold-based color coding using compaction_probe_threshold from MetricsSnapshot

Changes

crates/zeph-memory/src/compaction_probe.rs

  • ProbeCategory enum with #[serde(rename_all = "lowercase")]
  • CategoryScore struct (category, score, probes_run)
  • ProbeQuestion.category field with #[serde(default = "default_probe_category")] (backward compat: defaults to Recall)
  • CompactionProbeResult.category_scores with #[serde(default)] (backward compat with old JSON)
  • CompactionProbeConfig: probe_provider: String, category_weights: Option<HashMap<ProbeCategory, f32>>; model field removed (was dead code, never read at call site)
  • compute_category_scores(): weighted average with zero-weight fallback; tracing::warn on negative weights
  • run_probe(): empty-summary guard (< 10 chars), missing-category warning when max_questions >= 4, category-aware LLM prompt
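A minimal, std-only sketch of how compute_category_scores() might combine per-question results, assuming each answered question yields a (category, score) pair in [0, 1]. Category scores are plain means; the overall score is a weighted mean over the categories that actually ran. The signature, the default weight of 1.0 for unlisted categories, clamping negative weights to 0 (the PR only says they trigger a warning), and eprintln! standing in for tracing::warn! are all assumptions; serde attributes are omitted.

```rust
use std::collections::HashMap;

/// Functional probe dimensions (serde attributes omitted in this std-only sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Default)]
pub enum ProbeCategory {
    /// Default, so older persisted questions without a category read as Recall.
    #[default]
    Recall,
    Artifact,
    Continuation,
    Decision,
}

impl ProbeCategory {
    pub const ALL: [ProbeCategory; 4] = [
        ProbeCategory::Recall,
        ProbeCategory::Artifact,
        ProbeCategory::Continuation,
        ProbeCategory::Decision,
    ];
}

#[derive(Debug, Clone, PartialEq)]
pub struct CategoryScore {
    pub category: ProbeCategory,
    pub score: f32,
    pub probes_run: usize,
}

/// Hypothetical signature: per-category averages plus a weighted overall score.
pub fn compute_category_scores(
    answers: &[(ProbeCategory, f32)],
    weights: Option<&HashMap<ProbeCategory, f32>>,
) -> (Vec<CategoryScore>, Option<f32>) {
    // Per-category mean over the questions that were actually asked.
    let mut scores = Vec::new();
    for category in ProbeCategory::ALL {
        let hits: Vec<f32> = answers
            .iter()
            .filter(|(c, _)| *c == category)
            .map(|(_, s)| *s)
            .collect();
        if hits.is_empty() {
            continue; // missing category: no entry, excluded from the overall score
        }
        let mean = hits.iter().sum::<f32>() / hits.len() as f32;
        scores.push(CategoryScore { category, score: mean, probes_run: hits.len() });
    }
    if scores.is_empty() {
        return (scores, None);
    }

    // Assumption: unlisted categories weigh 1.0; negative weights warn and clamp to 0.
    let weight_of = |category: ProbeCategory| -> f32 {
        let w = weights.and_then(|m| m.get(&category).copied()).unwrap_or(1.0);
        if w < 0.0 {
            eprintln!("warn: negative weight {w} for {category:?}; treating as 0");
            0.0
        } else {
            w
        }
    };
    let mut weighted: Vec<(f32, f32)> =
        scores.iter().map(|s| (weight_of(s.category), s.score)).collect();

    // Zero-weight fallback: if every weight is 0, weight all categories equally.
    if weighted.iter().map(|(w, _)| *w).sum::<f32>() <= 0.0 {
        for (w, _) in &mut weighted {
            *w = 1.0;
        }
    }
    let total: f32 = weighted.iter().map(|(w, _)| *w).sum();
    let overall = weighted.iter().map(|(w, s)| w * s).sum::<f32>() / total;
    (scores, Some(overall))
}
```

In this sketch a category with no questions is skipped rather than scored as 0, so a missing dimension does not drag the overall score down; it appears only as an absent entry (the "--" cell in the TUI).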

crates/zeph-core/src/metrics.rs

  • last_probe_category_scores: Option<Vec<CategoryScore>>
  • compaction_probe_threshold: f32, compaction_probe_hard_fail_threshold: f32

crates/zeph-core/src/agent/context/summarization.rs

  • Sets last_probe_category_scores, compaction_probe_threshold, compaction_probe_hard_fail_threshold in all verdict branches
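One way to guarantee these snapshot fields are populated in every verdict branch is to write them once, before the score is classified. The sketch assumes three verdicts: pass at or above compaction_probe_threshold, soft fail at or above compaction_probe_hard_fail_threshold, hard fail below it. The ProbeVerdict names, the record_probe helper, and the reduced MetricsSnapshot (category scores as plain (&str, f32) pairs) are hypothetical.

```rust
/// Hypothetical verdict set for a compaction probe run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeVerdict {
    Pass,
    SoftFail,
    HardFail,
}

/// Reduced stand-in for MetricsSnapshot with only the fields named in the PR.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct MetricsSnapshot {
    pub last_probe_category_scores: Option<Vec<(&'static str, f32)>>,
    pub compaction_probe_threshold: f32,
    pub compaction_probe_hard_fail_threshold: f32,
}

pub fn record_probe(
    snapshot: &mut MetricsSnapshot,
    overall: f32,
    category_scores: Vec<(&'static str, f32)>,
    threshold: f32,
    hard_fail_threshold: f32,
) -> ProbeVerdict {
    // Write the shared fields before branching, so no verdict path can skip them.
    snapshot.last_probe_category_scores = Some(category_scores);
    snapshot.compaction_probe_threshold = threshold;
    snapshot.compaction_probe_hard_fail_threshold = hard_fail_threshold;

    // Classify the overall score against both thresholds.
    if overall >= threshold {
        ProbeVerdict::Pass
    } else if overall >= hard_fail_threshold {
        ProbeVerdict::SoftFail
    } else {
        ProbeVerdict::HardFail
    }
}
```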

crates/zeph-tui/src/widgets/memory.rs

  • render_probe_last_line(): appends Rec:X.XX Art:X.XX Con:X.XX Dec:X.XX with threshold-based color; absent categories show --
  • New snapshot test probe_lines_with_category_scores
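The cell text and color decision behind render_probe_last_line() can be sketched without any TUI dependency. This assumes a two-way split at compaction_probe_threshold (at or above vs. below) plus a neutral style for absent categories; the Tone enum, category_cells, and category_line are hypothetical names, and mapping a Tone to concrete terminal styles is left out.

```rust
/// Style class for one score cell; the renderer maps it to actual colors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tone {
    AtOrAbove,
    Below,
    Absent,
}

/// Formats the Rec/Art/Con/Dec cells. `scores` is in fixed order:
/// Recall, Artifact, Continuation, Decision (`None` = category not probed).
pub fn category_cells(scores: [Option<f32>; 4], threshold: f32) -> Vec<(String, Tone)> {
    const LABELS: [&str; 4] = ["Rec", "Art", "Con", "Dec"];
    LABELS
        .iter()
        .zip(scores)
        .map(|(label, score)| match score {
            Some(s) if s >= threshold => (format!("{label}:{s:.2}"), Tone::AtOrAbove),
            Some(s) => (format!("{label}:{s:.2}"), Tone::Below),
            None => (format!("{label}:--"), Tone::Absent),
        })
        .collect()
}

/// Joins the cells into the plain-text form appended to the probe line.
pub fn category_line(cells: &[(String, Tone)]) -> String {
    cells.iter().map(|(text, _)| text.as_str()).collect::<Vec<_>>().join(" ")
}
```

For example, scores of [Some(0.9), Some(0.5), None, Some(0.1)] with a threshold of 0.6 render as "Rec:0.90 Art:0.50 Con:-- Dec:0.10", with only the Rec cell in the at-or-above tone.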

crates/zeph-core/src/debug_dump/mod.rs

  • dump_compaction_probe() includes category_scores array and category per question

crates/zeph-core/src/bootstrap/mod.rs

  • build_probe_provider(): resolves probe_provider name at startup, returns Option<AnyProvider>
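A rough sketch of the startup resolution step: look up the configured probe_provider name among the [[llm.providers]] entries and return None when nothing usable is configured. The reduced ProviderEntry fields, the AnyProvider stand-in, whitespace trimming, and warning (rather than failing) on an unknown name are assumptions; the PR only states that it follows the build_judge_provider pattern.

```rust
/// One [[llm.providers]] entry, reduced to the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
pub struct ProviderEntry {
    pub name: String,
    pub model: String,
}

/// Stand-in for the real AnyProvider; here it just records the resolved entry.
#[derive(Debug, Clone, PartialEq)]
pub struct AnyProvider {
    pub name: String,
    pub model: String,
}

/// Resolves `probe_provider` by name; `None` means "use the fallback provider".
pub fn build_probe_provider(probe_provider: &str, providers: &[ProviderEntry]) -> Option<AnyProvider> {
    let name = probe_provider.trim();
    if name.is_empty() {
        return None; // not configured
    }
    match providers.iter().find(|p| p.name == name) {
        Some(p) => Some(AnyProvider { name: p.name.clone(), model: p.model.clone() }),
        None => {
            // Assumption: an unknown name warns and falls back instead of aborting startup.
            eprintln!("warn: probe_provider {name:?} not found in [[llm.providers]]");
            None
        }
    }
}
```

In this sketch the lookup runs once at startup, so a misspelled probe_provider is reported when the agent boots rather than during a compaction.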

crates/zeph-core/src/agent/{builder,state/mod,mod}.rs

  • ProviderState.probe_provider: Option<AnyProvider>, with_probe_provider() builder, probe_or_summary_provider() fallback
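The fallback chain can be sketched generically: use the dedicated probe provider when one was resolved, otherwise reuse the summary provider. The summary_provider field name, the generic parameter P (standing in for AnyProvider), and with_probe_provider() taking an Option so a startup lookup result can be passed straight through are assumptions.

```rust
/// Reduced provider state; `summary_provider` is an assumed field name.
pub struct ProviderState<P> {
    pub summary_provider: P,
    pub probe_provider: Option<P>,
}

impl<P> ProviderState<P> {
    /// Builder-style setter; accepts an Option so a failed lookup simply leaves it unset.
    pub fn with_probe_provider(mut self, probe: Option<P>) -> Self {
        self.probe_provider = probe;
        self
    }

    /// Dedicated probe provider if configured, otherwise the summary provider.
    pub fn probe_or_summary_provider(&self) -> &P {
        self.probe_provider.as_ref().unwrap_or(&self.summary_provider)
    }
}
```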

src/{runner,daemon,acp}.rs

  • Wire build_probe_provider() into all three entry points

src/init.rs

  • Replace probe_model with probe_provider in wizard state and prompts

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy -p zeph-memory -p zeph-core -p zeph-tui -- -D warnings — clean
  • cargo nextest run --workspace --lib --bins — 6050 passed, 15 skipped
  • New unit tests: ProbeCategory serde, category_weights TOML round-trip, compute_category_scores (equal weights, custom weights, zero-weight fallback, missing category, multi-probe average), backward compat deserialization (old JSON without category_scores/category fields)
  • New TUI snapshot: probe_lines_with_category_scores

Closes #2164

Add four functional probe dimensions (Recall, Artifact, Continuation,
Decision) to the compaction probe pipeline:

- ProbeCategory enum with serde lowercase serialization
- CategoryScore struct with per-category average and question count
- ProbeQuestion gains a `category` field (serde default = Recall for
  backward compat with old persisted JSON)
- CompactionProbeResult gains `category_scores` field (#[serde(default)]
  for backward compat)
- CompactionProbeConfig: add `probe_provider` (replaces dead `model`
  field) and `category_weights: Option<HashMap<ProbeCategory, f32>>`
- compute_category_scores(): weighted average with fallback to equal
  weighting when all weights are 0; tracing::warn on negative weights
- run_probe(): emits one warning per missing category when max_questions >= 4;
  empty-summary guard (< 10 chars) skips the probe
- Provider wiring: build_probe_provider() at startup (same pattern as
  build_judge_provider); probe_or_summary_provider() fallback chain
- TUI: per-category Rec/Art/Con/Dec scores with threshold-based color
  coding using compaction_probe_threshold from MetricsSnapshot (IC-01)
- MetricsSnapshot: last_probe_category_scores, compaction_probe_threshold,
  compaction_probe_hard_fail_threshold fields
- Debug dump: category_scores array and category per question
- Init wizard: probe_provider prompt replaces probe_model
- All 6031 tests pass; new snapshot test for category score display
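The two run_probe() guards can be sketched as a standalone pre-check. Assumptions: the helper name check_probe_inputs, measuring the 10-character minimum in chars after trimming whitespace, and eprintln! standing in for tracing::warn!.

```rust
/// Functional probe dimensions (std-only stand-in for the PR's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProbeCategory {
    Recall,
    Artifact,
    Continuation,
    Decision,
}

impl ProbeCategory {
    pub const ALL: [ProbeCategory; 4] = [
        ProbeCategory::Recall,
        ProbeCategory::Artifact,
        ProbeCategory::Continuation,
        ProbeCategory::Decision,
    ];
}

/// Minimum summary length before probing is attempted.
const MIN_SUMMARY_CHARS: usize = 10;

/// Hypothetical pre-check: returns None when the summary is too short to probe,
/// otherwise the categories that no generated question covers.
pub fn check_probe_inputs(
    summary: &str,
    question_categories: &[ProbeCategory],
    max_questions: usize,
) -> Option<Vec<ProbeCategory>> {
    // Empty-summary guard: skip the probe (and its LLM calls) entirely.
    if summary.trim().chars().count() < MIN_SUMMARY_CHARS {
        return None;
    }
    let missing: Vec<ProbeCategory> = ProbeCategory::ALL
        .iter()
        .copied()
        .filter(|c| !question_categories.contains(c))
        .collect();
    // With fewer than 4 questions full coverage is impossible, so only warn at >= 4.
    if max_questions >= 4 {
        for category in &missing {
            eprintln!("warn: no probe question generated for {category:?}");
        }
    }
    Some(missing)
}
```

Gating the warning on max_questions >= 4 avoids noise: with fewer than four questions, at least one of the four categories cannot be covered, so a gap is expected rather than a sign of a bad question set.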
@bug-ops enabled auto-merge (squash) March 26, 2026 17:00
@bug-ops merged commit 0e36ee6 into main Mar 26, 2026
25 checks passed
@bug-ops deleted the research-compression-probe-bas branch March 26, 2026 17:07

Labels

core (zeph-core crate), dependencies (Dependency updates), documentation (Improvements or additions to documentation), memory (zeph-memory crate (SQLite)), rust (Rust code changes)

Development

Linked issue:

research(compression): probe-based evaluation of context compaction quality
