[Feature]: Deliver assistant response before auto-compaction starts #35074
Description
Summary
Auto-compaction triggers after a conversation turn but blocks the entire run pipeline — the assistant's already-generated response is not delivered until compaction completes. On a large context (141 messages / 139K chars on Opus 4.6), compaction took 444,767ms (7.4 minutes). The user saw no reply for over 7 minutes despite the response being fully generated before compaction started.
Problem to solve
Any long-running iMessage/Telegram/WhatsApp session that approaches the compaction threshold experiences invisible delays. The longer the session, the longer compaction takes, the longer the user waits for a response that already exists. On a 141-message session with Opus 4.6, the user waited 7.4 minutes for a reply that was fully generated before compaction even started. This destroys conversational flow and makes the assistant appear unresponsive. The current architecture treats compaction as part of the response pipeline, but it has zero dependency on delivery — the response is complete before compaction begins.
Proposed solution
Deliver the assistant response to the channel immediately after generation, before starting post-turn compaction. The response is already complete — compaction is maintenance on the context window with no dependency on delivery.
Pseudocode:
response = model.generate()
channel.deliver(response) // user sees reply immediately
compaction.run() // runs after delivery; user already has the reply
This is a sequencing change, not an architectural one. The compaction logic remains identical — it just runs after delivery instead of before.
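The sequencing change can be sketched as below. This is a minimal illustration, not OpenClaw's actual internals: the names (`Channel`, `Compactor`, `runTurn`) and signatures are assumptions made for the example.

```typescript
// Hypothetical turn pipeline sketch: deliver first, then compact.
// All names here are illustrative, not OpenClaw's real API.

interface Channel { deliver(response: string): Promise<void>; }
interface Compactor { run(): Promise<void>; }

async function runTurn(
  generate: () => Promise<string>,
  channel: Channel,
  compactor: Compactor,
  log: (msg: string) => void,
): Promise<void> {
  const response = await generate();

  // Current order would `await compactor.run()` here, blocking delivery
  // for the full compaction time (7.4 minutes in the reported case).

  // Proposed order: the user sees the reply immediately...
  await channel.deliver(response);
  log("delivered");

  // ...and compaction runs afterward, still serialized within the turn,
  // so the context window is never mutated concurrently.
  await compactor.run();
  log("compacted");
}
```

Note that compaction is still awaited inside the turn; only its position relative to delivery changes, which is why no new concurrency machinery is needed.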
Alternatives considered
- Run compaction in a background worker: More complex, introduces concurrency issues with the context window if the user sends another message mid-compaction.
- Increase compaction threshold: Only delays the problem — eventually the context grows large enough to trigger compaction anyway, and the delay scales with context size.
- Disable auto-compaction: Requires manual intervention and risks hitting context limits mid-conversation.
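To make the concurrency point concrete: because the proposal keeps compaction inside the (already serialized) turn pipeline rather than in a background worker, a message arriving mid-compaction simply queues behind it. A minimal per-session promise chain, with hypothetical names chosen for this sketch, behaves like this:

```typescript
// Illustrative per-session serializer: tasks run strictly one after
// another, so a turn that arrives during compaction waits its turn
// instead of racing on the context window. Names are hypothetical.

class SessionQueue {
  private tail: Promise<void> = Promise.resolve();

  // Chain the task behind whatever is already queued; it starts only
  // after the previous task settles (success or failure).
  enqueue(task: () => Promise<void>): Promise<void> {
    const next = this.tail.then(task, task);
    // Keep the chain alive even if `task` rejects.
    this.tail = next.then(() => {}, () => {});
    return next;
  }
}
```

Under this model, turn N's "deliver, then compact" and turn N+1's processing never overlap, which is exactly the property the background-worker alternative would have to re-engineer with locks.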
Impact
Affected: All users on messaging channels (iMessage, Telegram, WhatsApp) with long-running sessions approaching compaction threshold
Severity: High — 7+ minute invisible delays destroy conversational flow
Frequency: Every compaction event (deterministic, scales with session length)
Consequence: Users perceive the assistant as unresponsive/broken. Compounds with stale-socket restarts: if the iMessage provider restarts during compaction, delivery is delayed even further.
Evidence/examples
Gateway log from 2026-03-04 showing compaction blocking delivery:
4:03:19 PM — Response fully generated (Opus 4.6)
4:03:19 PM — Auto-compaction triggered (141 messages, 139K chars)
4:04:10 PM — iMessage socket flagged stale mid-compaction (stale-socket restart)
4:10:18 PM — Compaction complete (444,767ms / 7.4 minutes)
4:10:18 PM — Response finally delivered to iMessage
The response existed at 4:03:19 PM but the user didn't receive it until 4:10:18 PM.
Environment: OpenClaw v2026.3.2, macOS 15.4 M4, iMessage channel, Opus 4.6
Additional information
This issue compounds with #35072 (stale-socket restarts during idle periods). When compaction runs for 7+ minutes, the health monitor can flag the iMessage socket as stale mid-compaction and restart the provider, further delaying delivery. Fixing delivery ordering (this issue) and stale-socket thresholds (#35072) together would eliminate the combined 7+ minute blackout window.