research(routing): KVzip importance scoring as principled heuristic for compaction pruning #1824
Description
Source
KVzip: Query-Agnostic KV Cache Compression
https://arxiv.org/abs/2505.23416 — May 2025 (NeurIPS 2025 Oral)
Summary
KVzip scores KV pairs by how well the LLM can reconstruct the original context from them (via teacher-forced decoding), yielding a query-independent importance signal. It achieves 3-4x cache reduction with negligible accuracy loss.
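Stated schematically (an informal restatement of the idea, not the paper's exact estimator), a cached pair's importance is tied to its influence on the teacher-forced reconstruction of the full context $x_{1:T}$:

```latex
\mathrm{score}(\mathrm{KV}_i) \;\propto\;
\text{influence of } \mathrm{KV}_i \text{ on }
\prod_{t=1}^{T} p_\theta\!\left(x_t \mid x_{<t},\, \text{cache}\right)
```

KV pairs that contribute little to regenerating the context can be dropped with minimal loss, regardless of what query arrives later.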
Applicability to Zeph
MEDIUM. Zeph doesn't control the inference engine KV cache. However, the core idea — ranking context chunks by how well the model could reconstruct them, independent of the current query — is directly applicable as a principled alternative to Zeph's current recency + token-count heuristics for choosing which tool outputs to summarize or prune during tiered compaction.
Proposed adaptation
Rather than implementing teacher-forced decoding (which requires inference-engine access), adapt the importance-signal concept:
- For each tool output that is a candidate for pruning, compute a lightweight proxy importance score from:
  - Length vs. information-density ratio (unique tokens / total tokens)
  - Recency weight (Ebbinghaus decay, already implemented)
  - Reference count (how many subsequent turns reference this tool's output)
- Combine these into a composite `pruning_priority` score
- During soft compaction: prune lowest-priority tool outputs first (vs. the current oldest-first order)
This is a targeted improvement to ContextManager::apply_soft_compaction() in zeph-core.
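A minimal sketch of the composite score, assuming illustrative types and weights (`ToolOutput`, the field names, the 5-turn half-life, and the 0.4/0.4/0.2 weighting are all hypothetical, not the actual zeph-core API):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a tool-output record; the real zeph-core
// type will differ.
struct ToolOutput {
    tokens: Vec<String>,  // tokenized output text
    age_turns: f64,       // turns elapsed since this output was produced
    reference_count: u32, // subsequent turns that reference this output
}

/// Information-density proxy: unique tokens / total tokens.
fn density(out: &ToolOutput) -> f64 {
    if out.tokens.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&String> = out.tokens.iter().collect();
    unique.len() as f64 / out.tokens.len() as f64
}

/// Ebbinghaus-style exponential decay on age (assumed half-life in turns).
fn recency_weight(out: &ToolOutput, half_life_turns: f64) -> f64 {
    (-out.age_turns * std::f64::consts::LN_2 / half_life_turns).exp()
}

/// Composite score: higher = more important = pruned later.
/// Weights are placeholder values to be tuned empirically.
fn pruning_priority(out: &ToolOutput) -> f64 {
    let refs = (1.0 + out.reference_count as f64).ln(); // diminishing returns
    0.4 * density(out) + 0.4 * recency_weight(out, 5.0) + 0.2 * refs
}

fn main() {
    let mut candidates = vec![
        // Recent but highly repetitive output (low density, no references).
        ToolOutput { tokens: vec!["a".to_string(); 100], age_turns: 1.0, reference_count: 0 },
        // Older but dense and referenced output.
        ToolOutput { tokens: (0..100).map(|i| i.to_string()).collect(), age_turns: 8.0, reference_count: 3 },
    ];
    // Soft compaction: prune lowest-priority first, instead of oldest-first.
    candidates.sort_by(|a, b| pruning_priority(a).partial_cmp(&pruning_priority(b)).unwrap());
    for c in &candidates {
        println!("priority = {:.3}", pruning_priority(c));
    }
}
```

Note that under this scoring the older-but-dense, referenced output outranks the recent-but-repetitive one, which is exactly the case the current oldest-first heuristic gets wrong.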
Related
- Tiered compaction (PR #1720: "feat(memory): tiered context compaction — soft at 70%, hard at 90%"; issue #1338)
- Ebbinghaus eviction policy (already in `zeph-memory`)
- Tool output pruning (current oldest-first approach)