
research(context): task-aware self-adaptive context pruning (SWE-Pruner) #1851

@bug-ops

Description


Research Finding

Sources:

  • SWE-Pruner — arXiv 2601.16746 (Jan 2026): task-aware self-adaptive context pruning, 40% API cost reduction with no accuracy regression on SWE-bench
  • COMI — arXiv 2602.01719 (Feb 2026): coarse-to-fine compression via Marginal Information Gain

Problem

Zeph's compaction currently prunes by recency + token count (oldest-first). Both papers propose principled scoring functions to decide which context chunks to retain — they are complementary approaches to the same problem and can be evaluated/combined.

Algorithm A: Task-Goal Guided Pruning (SWE-Pruner)

Before each hard compaction, issue a lightweight LLM call (~50 tokens): "Given the current task goal, summarize in one sentence what context is most important to preserve."

Use the extracted goal string as a relevance signal:

  • Score each tool-output block against it via cosine similarity (embedding already available) or keyword overlap
  • Prune lowest-scoring blocks first, rather than oldest-first
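The scoring-and-ordering step above can be sketched with the keyword-overlap fallback the issue mentions (cosine similarity over embeddings would slot in the same way). All function names here are illustrative, not existing Zeph APIs:

```rust
use std::collections::HashSet;

/// Hypothetical relevance scorer: fraction of goal keywords found in a block.
/// A real implementation could instead use cosine similarity over embeddings.
fn goal_relevance(goal: &str, block: &str) -> f64 {
    let goal_words: HashSet<String> = goal
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    if goal_words.is_empty() {
        return 0.0;
    }
    let hits = block
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .filter(|w| goal_words.contains(w))
        .collect::<HashSet<_>>()
        .len();
    hits as f64 / goal_words.len() as f64
}

/// Return block indices ordered lowest-score-first, i.e. the pruning order
/// replacing oldest-first.
fn pruning_order(goal: &str, blocks: &[&str]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..blocks.len()).collect();
    idx.sort_by(|&a, &b| {
        goal_relevance(goal, blocks[a])
            .partial_cmp(&goal_relevance(goal, blocks[b]))
            .unwrap()
    });
    idx
}
```

Pruning then walks the returned order until the token budget is met, instead of dropping the oldest blocks.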

Config: [memory.compression] strategy = "task_aware" (default remains "reactive").
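As a config sketch (section and field names taken from the issue text; the comment enumerates the strategies proposed below):

```toml
[memory.compression]
# "reactive" (default) | "task_aware" | "mig" | "task_aware_mig"
strategy = "task_aware"
```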

Complexity: MEDIUM — one extra LLM call per compaction event; embedding scoring reuses existing infrastructure.

Algorithm B: Marginal Information Gain (COMI)

Two-stage scoring that replaces oldest-first pruning:

MIG = relevance(unit, query) − redundancy(unit, already-selected-units)

  1. Coarse: partition context into groups (by type: system/user/assistant/tool, or temporal window) → compute inter-group MIG → allocate compaction budget proportionally (low-relevance groups compressed more)
  2. Fine: within each group, greedily select units by per-unit MIG until budget exhausted

Particularly effective when recent messages are redundant (repeated tool calls).
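The fine-grained greedy stage can be sketched with toy word-overlap scores standing in for real relevance/redundancy functions (embedding-based in practice; all names here are illustrative):

```rust
use std::collections::HashSet;

fn words(s: &str) -> HashSet<String> {
    s.split_whitespace().map(|w| w.to_lowercase()).collect()
}

/// Toy overlap score: fraction of `a`'s words also present in `b`.
fn overlap(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    if a.is_empty() {
        return 0.0;
    }
    a.intersection(b).count() as f64 / a.len() as f64
}

/// Greedily select units by MIG = relevance(unit, query) - redundancy(unit,
/// already-selected) until the token budget is exhausted. Redundancy is the
/// maximum overlap with any unit selected so far.
fn select_by_mig(query: &str, units: &[&str], budget: usize) -> Vec<usize> {
    let q = words(query);
    let mut selected: Vec<usize> = Vec::new();
    let mut used = 0usize;
    let mut remaining: Vec<usize> = (0..units.len()).collect();
    while !remaining.is_empty() {
        let (pos, best, _score) = remaining
            .iter()
            .enumerate()
            .map(|(p, &i)| {
                let u = words(units[i]);
                let rel = overlap(&u, &q);
                let red = selected
                    .iter()
                    .map(|&s| overlap(&u, &words(units[s])))
                    .fold(0.0_f64, f64::max);
                (p, i, rel - red)
            })
            .max_by(|a, b| a.2.partial_cmp(&b.2).unwrap())
            .unwrap();
        let cost = units[best].split_whitespace().count();
        if used + cost > budget {
            break; // simplification: stop at the first unit that overflows
        }
        used += cost;
        selected.push(best);
        remaining.remove(pos);
    }
    selected
}
```

With two identical tool outputs and one novel unit, the second copy scores negative MIG (high redundancy) and is skipped in favor of the novel unit, which is exactly the repeated-tool-call case.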

Synergy with existing work: MIG relevance scoring requires embeddings, but the Qdrant embeddings already computed per turn can be reused, so no new embedding infrastructure is needed.

Config: [memory.compression] strategy = "mig".

Complexity: MEDIUM-HIGH — group partitioning + MIG scoring + budget allocation loop.

Implementation Plan

  1. Define CompactionStrategy enum: Reactive | TaskAware | Mig | TaskAwareMig (combined)
  2. Start with Algorithm A (simpler, single LLM call) — validate quality improvement vs baseline
  3. Add Algorithm B as optional upgrade or combine: strategy = "task_aware_mig"
  4. Emit pruning goal and scores at DEBUG for observability
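Step 1 of the plan could look like the following (a minimal sketch; the enum variants and config strings come from the issue, while `from_config` is a hypothetical helper):

```rust
/// Compaction strategies from the implementation plan. `TaskAwareMig` is the
/// combined mode ("task_aware_mig" in config).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactionStrategy {
    Reactive,
    TaskAware,
    Mig,
    TaskAwareMig,
}

impl CompactionStrategy {
    /// Hypothetical parser for the `[memory.compression] strategy` value;
    /// unknown values fall back to the current default.
    fn from_config(s: &str) -> Self {
        match s {
            "task_aware" => Self::TaskAware,
            "mig" => Self::Mig,
            "task_aware_mig" => Self::TaskAwareMig,
            _ => Self::Reactive, // default remains "reactive"
        }
    }
}
```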

Integration Points

  • zeph-core: ContextManager::apply_soft_compaction(), HardCompactor
  • zeph-memory: SemanticMemory::compress()
  • Config: [memory.compression] strategy
  • Debug dump: include pruning goal, per-block scores, strategy used


References

  • arXiv 2601.16746 (SWE-Pruner)
  • arXiv 2602.01719 (COMI — Jiwei Tang et al., Tsinghua/Alibaba, Feb 2026)

Metadata

Labels

P2 (high value, medium complexity) · research (research-driven improvement)
