bug(core): context compaction loop when budget too tight — each turn compacts, 400 on oversized context #1708
Description
When context_budget_tokens is set too tight (e.g. 18K with a 12K+ system prompt), the compaction logic enters a multi-turn loop where every single turn triggers compaction, compaction summaries grow the context further, and eventually the API returns 400 (context too large).
Reproduction
Config: context_budget_tokens = 18000, compaction_threshold = 0.65, auto_budget = false, provider = openai
Run 7+ turns with tool calls.
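The failure mode follows directly from the budget math. A minimal sketch (values taken from the config and logs above; the arithmetic is illustrative, not the project's actual code):

```python
# Budget math for this repro. Names mirror the config keys above;
# the 12K system prompt size is the approximate figure from the description.
context_budget_tokens = 18_000
compaction_threshold = 0.65
system_prompt_tokens = 12_000

threshold_tokens = int(context_budget_tokens * compaction_threshold)
print(threshold_tokens)  # 11700, matching the threshold in the turn-5 log

# The system prompt alone already exceeds the compaction threshold,
# so no amount of message compaction can bring usage back under it.
print(system_prompt_tokens > threshold_tokens)  # True
```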
Observed Behavior
- Turn 5: `cached_tokens=15371`, `threshold=11700` → `should_compact=true`. Compaction #1 fires: 40 messages → `summary_tokens=3104`. After: `cached_tokens=12329` (still > threshold).
- Turn 6: `should_compact=true` again. Compaction #2 fires: 2 messages → `summary_tokens=2755`. After: `cached_tokens=12217` (still > threshold).
- Turn 7: compaction #3 fires; 400 Bad Request from API (context too large).
The root problem: compaction summaries are injected back as system messages. With a very tight budget, the system prompt + injected summaries alone exceed the threshold, so every turn triggers compaction even when there are almost no messages left to compact.
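A toy simulation of this feedback loop (summary sizes are hypothetical round numbers near the logged values; it assumes, as described above, that each summary is injected back as a system message and counted toward usage):

```python
# Why should_compact() stays true forever once the system prompt
# alone exceeds the threshold: every compaction ADDS a summary.
threshold = 11_700
system_prompt = 12_000      # tokens; above threshold before any messages
summaries = []              # compaction summaries injected as system messages

def cached_tokens() -> int:
    return system_prompt + sum(summaries)

for turn in (5, 6, 7):
    if cached_tokens() > threshold:   # always true in this regime
        summaries.append(3_000)       # each compaction injects a new summary
        print(turn, cached_tokens())  # usage grows instead of shrinking
```

Each iteration prints a strictly larger total (15000, 18000, 21000), mirroring the observed monotonic growth across turns 5 through 7.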
Secondary bug: compaction #2 produced a 2755-token summary for only 2 messages — the summary is larger than what it replaced, making the context worse.
Debug dump evidence
Request #3 (failed): 3 system messages (18630 + 2059 + 13957 chars), tool output 15799 chars, max_tokens=4096. Total context clearly exceeds 18K token budget.
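A back-of-the-envelope check using the common ~4-characters-per-token heuristic (an assumption; the provider's actual tokenizer will differ):

```python
# Rough token estimate for failed request #3, from the debug dump figures.
system_chars = 18_630 + 2_059 + 13_957  # three system messages
tool_output_chars = 15_799
max_tokens = 4_096                       # reserved for the completion

est_input_tokens = (system_chars + tool_output_chars) // 4
print(est_input_tokens)               # 12611 estimated input tokens
print(est_input_tokens + max_tokens)  # 16707 before any chat history
# The remaining (uncompacted) conversation turns push this past 18K.
```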
Expected Behavior
- After compaction, if `cached_tokens` is still above the threshold, emit `WARN "context compaction could not reduce usage below threshold (compacted N messages, still at M/B tokens)"` and stop attempting further compaction for that session; surface "Stopping: context window is nearly full" to the user.
- OR: add a post-compaction cooldown: skip `should_compact()` for the next N turns after a successful compaction.
- Compaction summaries should be bounded: if `summary_tokens > freed_tokens`, the compaction was counterproductive; log WARN and don't apply it.
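A minimal sketch of the last two guards (all function and type names here are hypothetical, not the project's actual API):

```python
def apply_compaction(freed_tokens: int, summary_tokens: int) -> bool:
    """Only apply a compaction that actually shrinks the context."""
    if summary_tokens >= freed_tokens:
        # Caller should log WARN here: summary larger than what it replaced.
        return False
    return True

class CompactionCooldown:
    """Skip should_compact() for N turns after a successful compaction."""

    def __init__(self, turns: int = 3):
        self.turns = turns
        self.remaining = 0

    def should_compact(self, cached: int, threshold: int) -> bool:
        if self.remaining > 0:
            self.remaining -= 1   # still cooling down: never compact
            return False
        if cached > threshold:
            self.remaining = self.turns  # start cooldown after this compaction
            return True
        return False
```

With a cooldown of 2, the turn-6 and turn-7 compactions from the log above would have been skipped, avoiding the 400.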
Severity
Medium — only affects users with extremely tight context_budget_tokens settings (well below the default). With auto_budget = true (default), this doesn't occur. Manual tight budgets hit this edge case.