research(routing): KVzip importance scoring as principled heuristic for compaction pruning #1824

@bug-ops

Description

Source

KVzip: Query-Agnostic KV Cache Compression
https://arxiv.org/abs/2505.23416 — May 2025 (NeurIPS 2025 Oral)

Summary

KVzip scores KV pairs by how well the LLM can reconstruct the original context from them (via teacher-forced decoding), yielding a query-independent importance signal. It achieves 3-4x cache reduction with negligible accuracy loss.
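The scoring idea can be illustrated with a toy sketch. This is a simplification of KVzip, not its implementation, and the attention matrix below is made up: each cached KV position is scored by the maximum attention it receives across all teacher-forced reconstruction steps, and the lowest-scoring entries are evicted.

```rust
/// Toy version of KVzip's query-agnostic signal: score each KV position
/// by the maximum attention it receives across all teacher-forced
/// reconstruction steps. `attn[step][kv_pos]` holds attention weights.
fn kv_importance(attn: &[Vec<f64>]) -> Vec<f64> {
    let n_kv = attn.first().map_or(0, |row| row.len());
    (0..n_kv)
        .map(|k| attn.iter().map(|row| row[k]).fold(f64::NEG_INFINITY, f64::max))
        .collect()
}

/// Indices of the `keep` highest-scoring KV positions (the rest are evicted).
fn keep_top(scores: &[f64], keep: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    idx.sort_by(|&a, &b| scores[b].partial_cmp(&scores[a]).unwrap());
    idx.truncate(keep);
    idx
}
```

Because the scores come from reconstructing the context rather than from any particular question, the same compressed cache serves all future queries, which is what makes the signal attractive as a pruning heuristic.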

Applicability to Zeph

MEDIUM. Zeph doesn't control the inference engine KV cache. However, the core idea — ranking context chunks by how well the model could reconstruct them, independent of the current query — is directly applicable as a principled alternative to Zeph's current recency + token-count heuristics for choosing which tool outputs to summarize or prune during tiered compaction.

Proposed adaptation

Rather than implementing teacher-forced decoding, which would require access to the inference engine, adapt the importance-signal concept:

  1. For each tool output candidate for pruning, compute a lightweight proxy importance score:
    • Information density (unique tokens / total tokens), which penalizes long, repetitive outputs
    • Recency weight (Ebbinghaus decay, already implemented)
    • Reference count (how many subsequent turns reference this tool's output)
  2. Combine into a composite pruning_priority score
  3. During Soft compaction: prune lowest-priority tool outputs first (vs. current oldest-first)
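A minimal sketch of the composite score, assuming a hypothetical `ToolOutput` shape and illustrative weights and decay constant (none of these are Zeph's actual API):

```rust
use std::collections::HashSet;

// Hypothetical stand-in for Zeph's tool-output record.
struct ToolOutput {
    text: String,
    age_turns: u32,       // turns since this output was produced
    reference_count: u32, // later turns that refer back to it
}

/// Unique-token ratio as a cheap information-density proxy.
fn density(text: &str) -> f64 {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    if tokens.is_empty() {
        return 0.0;
    }
    let unique: HashSet<&str> = tokens.iter().copied().collect();
    unique.len() as f64 / tokens.len() as f64
}

/// Ebbinghaus-style exponential decay for recency; the decay constant
/// is a placeholder, not the value Zeph already uses.
fn recency_weight(age_turns: u32) -> f64 {
    (-(age_turns as f64) / 10.0).exp()
}

/// Higher score = more worth keeping; prune the lowest-scoring outputs first.
/// Weights are illustrative and would need tuning.
fn pruning_priority(out: &ToolOutput) -> f64 {
    0.4 * density(&out.text)
        + 0.4 * recency_weight(out.age_turns)
        + 0.2 * (1.0 - (-(out.reference_count as f64)).exp())
}
```

`apply_soft_compaction()` would then sort prune candidates by ascending `pruning_priority` and drop from the front, rather than dropping oldest-first.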

This is a targeted improvement to ContextManager::apply_soft_compaction() in zeph-core.

Related


Labels

P3 (Research — medium-high complexity) · research (Research-driven improvement)
