
bug(core): context compaction loop when budget too tight — each turn compacts, 400 on oversized context #1708

@bug-ops

Description


When context_budget_tokens is set too tight (e.g. 18K with a 12K+ system prompt), the compaction logic enters a multi-turn loop where every single turn triggers compaction, compaction summaries grow the context further, and eventually the API returns 400 (context too large).

Reproduction

Config: context_budget_tokens = 18000, compaction_threshold = 0.65, auto_budget = false, provider = openai
Run 7+ turns with tool calls.
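For reference, a minimal config that reproduces the loop might look like this (the file layout and TOML syntax are assumptions; only the key names and values come from the report):

```toml
# Tight manual budget: the ~12K-token system prompt alone eats most of it.
context_budget_tokens = 18000
# Compaction triggers at 0.65 * 18000 = 11700 tokens.
compaction_threshold = 0.65
# Disabling auto_budget is what exposes the bug; the default avoids it.
auto_budget = false
provider = "openai"
```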

Observed Behavior

  1. Turn 5: cached_tokens=15371, threshold=11700 → should_compact=true — compaction #1 fires: 40 messages → summary_tokens=3104. After: cached_tokens=12329 (still > threshold).
  2. Turn 6: should_compact=true again — compaction #2 fires: 2 messages → summary_tokens=2755. After: cached_tokens=12217 (still > threshold).
  3. Turn 7: compaction #3 fires — 400 Bad Request from API (context too large).

The root problem: compaction summaries are injected back as system messages. With a very tight budget, the system prompt + injected summaries alone exceed the threshold, so every turn triggers compaction even when there are almost no messages left to compact.
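The arithmetic makes the loop inevitable. A minimal sketch (the function name and signature are hypothetical, not the actual codebase API) showing that the incompressible floor — system prompt plus injected summaries — already sits above the trigger point:

```rust
// Hypothetical trigger check: compaction fires whenever cached tokens
// exceed budget * threshold (here 18_000 * 0.65 = 11_700).
fn should_compact(cached_tokens: u32, budget: u32, threshold: f32) -> bool {
    cached_tokens as f32 > budget as f32 * threshold
}

fn main() {
    let budget = 18_000;
    let threshold = 0.65;
    // Fixed overhead that compaction can never remove: the ~12K-token
    // system prompt plus the 3_104-token summary injected by compaction #1.
    let floor = 12_000 + 3_104;
    // Even with zero compactable messages left, the trigger still fires.
    assert!(should_compact(floor, budget, threshold));
    println!("floor={} > trigger={}", floor, (budget as f32 * threshold) as u32);
}
```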

Secondary bug: compaction #2 produced a 2755-token summary for only 2 messages — the summary is larger than what it replaced, making the context worse.

Debug dump evidence

Request #3 (failed): 3 system messages (18630 + 2059 + 13957 chars), tool output 15799 chars, max_tokens=4096. Total context clearly exceeds 18K token budget.

Expected Behavior

  • After compaction, if cached_tokens is still above threshold, emit WARN "context compaction could not reduce usage below threshold (compacted N messages, still at M/B tokens)" and stop attempting further compaction for that session — surface "Stopping: context window is nearly full" to the user.
  • OR: add a post-compaction cooldown: skip should_compact() for the next N turns after a successful compaction.
  • Compaction summary should be bounded — if summary_tokens > freed_tokens, it was counterproductive; log WARN and don't apply.
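The first and third proposals could be combined into a single post-compaction guard. A hedged sketch — all names (`Session`, `apply_compaction`) are hypothetical, not the actual codebase API:

```rust
// Per-session flag: once compaction fails to get below threshold,
// stop attempting it rather than looping every turn.
struct Session {
    compaction_disabled: bool,
}

/// Returns the new cached token count, or None if compaction was skipped.
fn apply_compaction(
    session: &mut Session,
    cached_tokens: u32,
    freed_tokens: u32,
    summary_tokens: u32,
    threshold: u32,
) -> Option<u32> {
    if session.compaction_disabled {
        return None; // a previous compaction already failed to help
    }
    // Guard 1: a summary larger than what it replaced makes things worse.
    if summary_tokens >= freed_tokens {
        eprintln!(
            "WARN: summary ({summary_tokens} tok) >= freed ({freed_tokens} tok); not applying"
        );
        return None;
    }
    let after = cached_tokens - freed_tokens + summary_tokens;
    // Guard 2: still over threshold -> warn and stop trying this session.
    if after > threshold {
        eprintln!(
            "WARN: compaction could not reduce usage below threshold ({after}/{threshold} tokens)"
        );
        session.compaction_disabled = true;
    }
    Some(after)
}

fn main() {
    let mut s = Session { compaction_disabled: false };
    // Numbers from the report: compaction #1 removes 6_146 tokens of messages
    // but injects a 3_104-token summary, leaving 12_329 > the 11_700 trigger.
    assert_eq!(apply_compaction(&mut s, 15_371, 6_146, 3_104, 11_700), Some(12_329));
    assert!(s.compaction_disabled); // no further compaction this session
    assert_eq!(apply_compaction(&mut s, 12_329, 1_000, 500, 11_700), None);
}
```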

Severity

Medium — only affects users with extremely tight context_budget_tokens settings (well below the default). With auto_budget = true (default), this doesn't occur. Manual tight budgets hit this edge case.
