Memory leak: chatRunState.buffers not cleaned up for stuck runs #51821
Description
Bug
In src/gateway/server-chat.ts, each streaming response builds an in-memory string buffer via appendUniqueSuffix() in chatRunState.buffers. The buffer is freed in emitChatFinal when the run lifecycle ends cleanly.
However, if the run gets stuck (e.g., an LLM request times out after 10 minutes without triggering a clean lifecycle end), emitChatFinal never fires and the buffer persists indefinitely. The maintenance timer in server-maintenance.ts only cleans up runs that are in chatRunState.abortedRuns — stuck runs that were never explicitly aborted are missed.
This is the direct trigger for the V8 StringAdd_CheckNone OOM crash, since the stuck buffer holds a large concatenated string that can't be GC'd.
Steps to reproduce
- Run the gateway as a long-lived daemon
- Have an LLM request time out (e.g., "embedded run timeout" after 600s)
- The failover path fires, but the original run's buffer in chatRunState.buffers is never deleted
- Over multiple stuck runs, the heap grows until OOM
Expected behavior
The maintenance timer should also clean up buffers for runs that have exceeded a timeout threshold (e.g., the existing ABORTED_RUN_TTL_MS) regardless of whether they're in chatRunState.abortedRuns. Any buffer older than the run timeout + a grace period should be swept.
chatRunState.deltaLastBroadcastLen has the same issue — only cleaned in emitChatFinal.
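A minimal sketch of the age-based sweep proposed here, under stated assumptions. The real shape of chatRunState is not shown in this report, so the sketch models buffers and deltaLastBroadcastLen as Maps keyed by run ID and adds a hypothetical lastActivityMs map that would be updated whenever a buffer is appended to. The names ChatRunStateSketch, sweepStaleRuns, lastActivityMs, and GRACE_MS are illustrative only and do not come from the codebase.

```typescript
// Hypothetical model of the relevant state; the real chatRunState in
// src/gateway/server-chat.ts may be shaped differently.
interface ChatRunStateSketch {
  buffers: Map<string, string>;
  deltaLastBroadcastLen: Map<string, number>;
  // Assumed addition: last time (ms since epoch) each run's buffer was touched.
  lastActivityMs: Map<string, number>;
}

const RUN_TIMEOUT_MS = 600_000; // the 600s run timeout mentioned in this report
const GRACE_MS = 60_000; // assumed grace period, chosen only for illustration

// Drops per-run state for every run that has been idle longer than maxAgeMs,
// whether or not the run was ever explicitly aborted.
// Returns the IDs of the runs that were swept.
function sweepStaleRuns(
  state: ChatRunStateSketch,
  now: number,
  maxAgeMs: number = RUN_TIMEOUT_MS + GRACE_MS,
): string[] {
  const swept: string[] = [];
  // Deleting entries from a Map while iterating it with forEach is safe.
  state.lastActivityMs.forEach((lastSeen, runId) => {
    if (now - lastSeen <= maxAgeMs) {
      return; // still within the allowed window
    }
    state.buffers.delete(runId);
    state.deltaLastBroadcastLen.delete(runId);
    state.lastActivityMs.delete(runId);
    swept.push(runId);
  });
  return swept;
}
```

The maintenance timer could call something like sweepStaleRuns(state, Date.now()) on each tick, passing the existing ABORTED_RUN_TTL_MS as maxAgeMs if that threshold is preferred. Tracking last activity rather than run start time avoids sweeping a long but still-streaming run, at the cost of one extra map write per append.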
Environment
- openclaw 2026.3.13
- Node 25.8.1 (Apple Silicon)
- Gateway running as launchd daemon, crashed with "FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory" after ~19 hours (68M ms uptime) at ~4GB heap
- Crash stack trace shows Builtins_StringAdd_CheckNone as the allocating frame
🤖 Generated with Claude Code