Closed
Bug Description
After compaction triggers, the embedded summarization run times out, and the session lane remains permanently stuck. No further messages are processed until the gateway is manually restarted.
Steps to Reproduce
- Run a long session until context fills up (~140k+ of 200k tokens)
- Compaction triggers an embedded summarization run
- The embedded run times out (hits 180s or 600s limit)
- Session lane becomes permanently jammed
Expected Behavior
After an embedded run timeout, the session lane should recover and continue processing incoming messages (even if the compaction/summary failed).
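As a minimal sketch of the expected recovery, the following shows a serialized lane that advances to the next task even when the current one times out. All names here (`SessionLane`, `runWithTimeout`, `Task`) are hypothetical and are not taken from the OpenClaw codebase; the actual lane implementation may be structured quite differently.

```typescript
// Hypothetical sketch: these names do not come from the OpenClaw codebase.
type Task<T> = (signal: AbortSignal) => Promise<T>;

// Races a task against a timer. On timeout the task is signalled to abort
// and the returned promise rejects, even if the task never settles.
function runWithTimeout<T>(task: Task<T>, timeoutMs: number): Promise<T> {
  const controller = new AbortController();
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`embedded run timeout: timeoutMs=${timeoutMs}`));
    }, timeoutMs);
    task(controller.signal).then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

class SessionLane {
  // Tail of the serialized queue; each new task chains onto it.
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: Task<T>, timeoutMs: number): Promise<T> {
    const run = this.tail.then(() => runWithTimeout(task, timeoutMs));
    // The lane advances whether the task resolved, rejected, or timed out.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}
```

The design point is that the lane waits on the timeout-bounded promise rather than on the task itself, so a task that ignores the abort signal still cannot hold the lane past its limit.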
Actual Behavior
The lane stays stuck. Symptoms in logs:
[agent/embedded] embedded run timeout: runId=... sessionId=... timeoutMs=180000
[diagnostic] lane wait exceeded: lane=session:agent:main:main waitedMs=97489 queueAhead=0
announce queue drain failed for agent:main:main: Error: gateway timeout after 60000ms
Messages queue up but are never processed. The only fix is a full gateway restart.
Evidence
From a single day (2026-02-14):
- 6 embedded run timeouts on the same session
- 4 lane wait exceeded events (up to 97s wait with 0 queue ahead)
- 2 announce queue drain failures (60s gateway timeout)
- Multiple Suppressed AbortError entries following each timeout
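For anyone wanting to check their own logs for the same pattern, counts like these could be tallied with a small script. This is only a sketch: the three patterns are copied from the log excerpts in this report, while the function name `tallySymptoms` and the assumption that each event occupies a single line are illustrative.

```typescript
// Patterns are taken from the log excerpts in this report.
// Assumption: each event is logged on one line.
const PATTERNS: Record<string, RegExp> = {
  embeddedRunTimeout: /\[agent\/embedded\] embedded run timeout:/,
  laneWaitExceeded: /\[diagnostic\] lane wait exceeded:/,
  announceDrainFailed: /announce queue drain failed for /,
};

// Counts how many lines of the log text match each symptom pattern.
function tallySymptoms(logText: string): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const name of Object.keys(PATTERNS)) {
    counts[name] = 0;
  }
  for (const line of logText.split("\n")) {
    for (const [name, pattern] of Object.entries(PATTERNS)) {
      if (pattern.test(line)) {
        counts[name] += 1;
      }
    }
  }
  return counts;
}
```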
Workaround
Tuning compaction to trigger earlier reduces the likelihood of a timeout (less content to summarize means a faster summarization run):
"compaction": {
  "mode": "safeguard",
  "maxHistoryShare": 0.6,
  "reserveTokensFloor": 40000,
  "memoryFlush": { "enabled": true }
}
But this only reduces the frequency. The core issue is that a timed-out embedded run should not permanently jam the session lane.
Environment
- OpenClaw 2026.2.13 (commit 203b5bd)
- macOS ARM64 (Mac Mini)
- Node v22.22.0
- Model: anthropic/claude-opus-4-6 (200k context window)
- Context pruning active: cache-ttl mode, 1h TTL