
[Feature]: Deliver assistant response before auto-compaction starts #35074

@mira-lgtm

Description


Summary

Auto-compaction triggers after a conversation turn but blocks the entire run pipeline — the assistant's already-generated response is not delivered until compaction completes. On a large context (141 messages / 139K chars on Opus 4.6), compaction took 444,767ms (7.4 minutes). The user saw no reply for over 7 minutes despite the response being fully generated before compaction started.

Problem to solve

Any long-running iMessage/Telegram/WhatsApp session that approaches the compaction threshold experiences invisible delays: the longer the session, the longer compaction takes, and the longer the user waits for a response that already exists. On a 141-message session with Opus 4.6, the user waited 7.4 minutes for a reply that was fully generated before compaction even started. This destroys conversational flow and makes the assistant appear unresponsive. The current architecture treats compaction as part of the response pipeline, yet compaction has no dependency on delivery; the response is complete before compaction begins.

Proposed solution

Deliver the assistant response to the channel immediately after generation, before starting post-turn compaction. The response is already complete — compaction is maintenance on the context window with no dependency on delivery.

Pseudocode:

response = model.generate()
channel.deliver(response)    // user sees reply immediately
compaction.run()             // runs after delivery, no user-visible wait

This is a sequencing change, not an architectural one. The compaction logic remains identical — it just runs after delivery instead of before.
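The reordering above can be sketched concretely. This is a minimal, illustrative Python sketch, not OpenClaw's actual API: `Channel`, `run_turn`, and `needs_compaction` are invented names used only to show that delivery strictly precedes compaction within the same turn.

```python
# Illustrative sketch of the proposed sequencing (all names are hypothetical,
# not OpenClaw internals). Events are recorded to show the ordering.
events = []

class Channel:
    def deliver(self, response):
        events.append(("deliver", response))

def compact():
    events.append(("compact",))

def run_turn(generate, channel, needs_compaction):
    response = generate()          # response is fully formed here
    channel.deliver(response)      # user sees the reply immediately
    if needs_compaction():
        compact()                  # context maintenance runs after delivery
    return response

run_turn(lambda: "hello", Channel(), lambda: True)
assert events[0] == ("deliver", "hello")  # delivery happened first
assert events[1] == ("compact",)          # compaction followed, not blocked on
```

Even with a 7-minute `compact()`, the user's reply is already on the wire before it starts, which is the whole fix.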

Alternatives considered

  1. Run compaction in a background worker: More complex, introduces concurrency issues with the context window if the user sends another message mid-compaction.
  2. Increase compaction threshold: Only delays the problem — eventually the context grows large enough to trigger compaction anyway, and the delay scales with context size.
  3. Disable auto-compaction: Requires manual intervention and risks hitting context limits mid-conversation.
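To make the concurrency concern in alternative 1 concrete, here is a hypothetical Python sketch (none of these names come from OpenClaw): once compaction moves to a background worker, the context window becomes shared mutable state, so writes must be serialized with a lock, and a long compaction holding that lock can still stall the next turn.

```python
import threading

# Hypothetical model of alternative 1's hazard: a user message arriving
# mid-compaction races the compactor unless both serialize on a lock.
class ContextWindow:
    def __init__(self):
        self._lock = threading.Lock()
        self.messages = []

    def append(self, message):
        # A message arriving mid-compaction must wait for the lock...
        with self._lock:
            self.messages.append(message)

    def compact(self):
        # ...and the compactor holds it for the full rewrite, so a long
        # compaction can still block the next turn's context write.
        with self._lock:
            self.messages = self.messages[-10:]  # stand-in for summarization

ctx = ContextWindow()
for i in range(20):
    ctx.append(f"msg {i}")

worker = threading.Thread(target=ctx.compact)
worker.start()
ctx.append("user message sent mid-compaction")  # serialized by the lock
worker.join()

assert "user message sent mid-compaction" in ctx.messages  # never lost, but it waited
```

The proposed fix avoids this entirely: compaction stays on the turn's own sequential path, just after delivery instead of before it.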

Impact

Affected: All users on messaging channels (iMessage, Telegram, WhatsApp) with long-running sessions approaching compaction threshold
Severity: High — 7+ minute invisible delays destroy conversational flow
Frequency: Every compaction event (deterministic, scales with session length)
Consequence: Users perceive the assistant as unresponsive/broken. Compounds with stale-socket restarts — if the iMessage provider restarts during compaction, the delivery failure cascades further.

Evidence/examples

Gateway log from 2026-03-04 showing compaction blocking delivery:

4:03:19 PM — Response fully generated (Opus 4.6)
4:03:19 PM — Auto-compaction triggered (141 messages, 139K chars)
4:04:10 PM — iMessage socket flagged stale mid-compaction (stale-socket restart)
4:10:18 PM — Compaction complete (444,767ms / 7.4 minutes)
4:10:18 PM — Response finally delivered to iMessage

The response existed at 4:03:19 PM but the user didn't receive it until 4:10:18 PM.

Environment: OpenClaw v2026.3.2, macOS 15.4 M4, iMessage channel, Opus 4.6

Additional information

This issue compounds with #35072 (stale-socket restarts during idle periods). When compaction runs for 7+ minutes, the health monitor can flag the iMessage socket as stale mid-compaction and restart the provider, further delaying delivery. Fixing delivery ordering (this issue) and stale-socket thresholds (#35072) together would eliminate the combined 7+ minute blackout window.

Metadata

Labels: enhancement (New feature or request)