fix: sweep stale chatRunState buffers for stuck runs by karanuppal · Pull Request #52428 · openclaw/openclaw

karanuppal · 2026-03-22T18:45:49Z

Summary

emitChatFinal frees chatRunState.buffers (and deltaSentAt/deltaLastBroadcastLen) on clean run completion. The maintenance timer also sweeps abortedRuns after ABORTED_RUN_TTL_MS. But runs that get stuck — e.g. an LLM request times out without triggering a clean lifecycle end — are never explicitly aborted, so their string buffers persist indefinitely.

This is the direct trigger for the StringAdd_CheckNone OOM crash: stuck buffers hold large concatenated strings that can't be GC'd.

Fix

Add a stale buffer sweep in the existing maintenance timer (server-maintenance.ts) that iterates chatDeltaSentAt and cleans up buffers, deltaSentAt, and deltaLastBroadcastLen for any run not updated within ABORTED_RUN_TTL_MS (1 hour), regardless of whether the run was explicitly aborted.

Changes

src/gateway/server-maintenance.ts — add stale buffer sweep loop, accept chatDeltaLastBroadcastLen param
src/gateway/server-runtime-state.ts — expose chatDeltaLastBroadcastLen from runtime state
src/gateway/server.impl.ts — pass chatDeltaLastBroadcastLen to maintenance timer
src/gateway/server-maintenance.test.ts — update test dep factory for new param

Testing

30 tests pass across gateway test files (server-maintenance, server-chat.agent-events, server-close)
Full pre-commit suite passes (tsgo, oxlint, oxfmt, all boundary lints)

Closes #51821

🤖 AI-assisted (Claude), fully tested, author understands the change.

greptile-apps · 2026-03-22T18:47:56Z

Greptile Summary

This PR fixes an OOM crash (StringAdd_CheckNone) by adding a stale-buffer sweep to the existing maintenance timer. Runs that time out without a clean lifecycle end (no emitChatFinal, no explicit abort) were previously leaving their large concatenated string buffers alive forever; the new sweep reaps any chatDeltaSentAt entry older than ABORTED_RUN_TTL_MS (1 hour) that was not already handled by the explicit-abort TTL path. The change is minimal and well-scoped.

server-maintenance.ts — new sweep loop iterates chatDeltaSentAt, skips already-aborted runs, and deletes chatRunBuffers, chatDeltaSentAt, and chatDeltaLastBroadcastLen for stale entries.
server-runtime-state.ts / server.impl.ts — plumbing to expose and forward chatDeltaLastBroadcastLen to the maintenance timer.
server-maintenance.test.ts — dep factory updated with the new map.
Minor gap: the pre-existing aborted-run TTL sweep (lines 126–133) still does not delete from chatDeltaLastBroadcastLen, so explicitly-aborted runs' entries in that map are never reclaimed. Because the values are small integers the practical impact is negligible, but it is inconsistent with the cleanup done by the new stale sweep.

Confidence Score: 4/5

Safe to merge — the core OOM fix is correct and the only gap is a minor numeric-value leak in chatDeltaLastBroadcastLen for explicitly-aborted runs.
The stale-buffer sweep logic is correct: safe Map mutation during iteration in JS (current entry already yielded before deletion), proper skipping of runs still tracked in abortedRuns, and consistent cleanup of all three maps for the newly targeted case. The one concrete issue — chatDeltaLastBroadcastLen not being cleaned up in the existing aborted-run sweep — is trivially fixable with a one-line addition and doesn't affect the primary OOM fix.
The abortedRuns TTL sweep in src/gateway/server-maintenance.ts (lines 126–133) should also delete from chatDeltaLastBroadcastLen.

Comments Outside Diff (1)

src/gateway/server-maintenance.ts, line 126-133 (link)

chatDeltaLastBroadcastLen not cleaned up for aborted runs

The existing aborted-run TTL sweep deletes from chatRunBuffers and chatDeltaSentAt but does not delete from chatDeltaLastBroadcastLen. Because the stale-buffer sweep (added below) only iterates chatDeltaSentAt and skips entries still present in abortedRuns, explicitly-aborted runs' chatDeltaLastBroadcastLen entries will never be removed by either sweep. After the TTL expiry the entry disappears from abortedRuns and chatDeltaSentAt, making it invisible to both cleanup paths.

The individual values are just numbers so the heap impact is negligible, but the inconsistency means the maps can drift out of sync over time.

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/gateway/server-maintenance.ts
Line: 126-133

Comment:
**`chatDeltaLastBroadcastLen` not cleaned up for aborted runs**

The existing aborted-run TTL sweep deletes from `chatRunBuffers` and `chatDeltaSentAt` but does **not** delete from `chatDeltaLastBroadcastLen`. Because the stale-buffer sweep (added below) only iterates `chatDeltaSentAt` and skips entries still present in `abortedRuns`, explicitly-aborted runs' `chatDeltaLastBroadcastLen` entries will never be removed by either sweep. After the TTL expiry the entry disappears from `abortedRuns` and `chatDeltaSentAt`, making it invisible to both cleanup paths.

The individual values are just numbers so the heap impact is negligible, but the inconsistency means the maps can drift out of sync over time.



How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/gateway/server-maintenance.ts
Line: 126-133

Comment:
**`chatDeltaLastBroadcastLen` not cleaned up for aborted runs**

The existing aborted-run TTL sweep deletes from `chatRunBuffers` and `chatDeltaSentAt` but does **not** delete from `chatDeltaLastBroadcastLen`. Because the stale-buffer sweep (added below) only iterates `chatDeltaSentAt` and skips entries still present in `abortedRuns`, explicitly-aborted runs' `chatDeltaLastBroadcastLen` entries will never be removed by either sweep. After the TTL expiry the entry disappears from `abortedRuns` and `chatDeltaSentAt`, making it invisible to both cleanup paths.

The individual values are just numbers so the heap impact is negligible, but the inconsistency means the maps can drift out of sync over time.

```suggestion
      params.chatRunState.abortedRuns.delete(runId);
      params.chatRunBuffers.delete(runId);
      params.chatDeltaSentAt.delete(runId);
      params.chatDeltaLastBroadcastLen.delete(runId);
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix: sweep stale chatRunState buffers fo..." | Re-trigger Greptile}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e5ba544f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-22T18:50:03Z

src/gateway/server-maintenance.ts

+    for (const [runId, lastSentAt] of params.chatDeltaSentAt) {
+      if (params.chatRunState.abortedRuns.has(runId)) {
+        continue; // already handled above
+      }
+      if (now - lastSentAt <= ABORTED_RUN_TTL_MS) {
+        continue;
+      }
+      params.chatRunBuffers.delete(runId);


Guard stale-buffer sweep against active runs

chatDeltaSentAt is only the timestamp of the last broadcasted assistant delta, not proof that the run is abandoned. We still allow chat runs to stay active far longer than an hour (src/gateway/chat-abort.ts:17-28), so a legitimate run can sit in a long tool call or other quiet phase after already emitting text. In that case this loop deletes chatRunState.buffers, and when output resumes resolveMergedAssistantText/emitChatFinal in src/gateway/server-chat.ts:513-630 can only reconstruct the post-pause suffix. The segmented-text path in src/gateway/server-chat.agent-events.test.ts:339-371 depends on that buffer surviving across tool gaps, so this change will truncate replies for any run that resumes after >1h without a delta.

Useful? React with 👍 / 👎.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1d772eb69

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-22T22:07:44Z

src/gateway/server-maintenance.ts

+      if (params.chatAbortControllers.has(runId)) {
+        continue;


Don't reap active controllerless chat runs

This assumes every active chat run still has a chatAbortControllers entry, but that is not true for other chat-producing gateway flows: agent.request adds chat runs without creating an abort controller (src/gateway/server-methods/agent.ts:503-506), and voice transcript ingress does the same (src/gateway/server-node-events.ts:295-298). Those runs still populate chatRunState.buffers/chatDeltaSentAt through src/gateway/server-chat.ts:502-630, so a legitimate run that pauses in a long tool call for more than an hour will be treated as orphaned here and lose its pre-pause text before the eventual final event.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-22T22:07:44Z

.devcontainer/devcontainer.json

+      "version": "latest"
+    }
+  },
+  "postCreateCommand": "npm install"


Use pnpm for the devcontainer bootstrap

npm install only installs the root package in this repo, because the workspace graph lives in pnpm-workspace.yaml and the root package.json does not declare workspaces. A fresh devcontainer created from this file will therefore come up without the ui, packages/*, and extensions/* dependencies that the documented pnpm build/pnpm test flows rely on, so contributors will hit broken commands until they manually rerun pnpm install.

Useful? React with 👍 / 👎.

…aranuppal

…aranuppal)

…aranuppal

…aranuppal)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fa994c33be

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-22T22:36:26Z

src/gateway/server-maintenance.ts

+    // Sweep stale buffers for runs that were never explicitly aborted.
+    // Only reap orphaned buffers after the abort controller is gone; active
+    // runs can legitimately sit idle while tools/models work.
+    for (const [runId, lastSentAt] of params.chatDeltaSentAt) {


Sweep runs that buffer text without broadcasting it

This loop uses chatDeltaSentAt as the source of truth, but emitChatDelta writes merged assistant text into chatRunState.buffers before the early returns for silent replies and hidden heartbeat output (src/gateway/server-chat.ts:522-530). In those cases a stuck run can still accumulate a large buffer while never creating a chatDeltaSentAt entry, so the new maintenance sweep never reclaims the strings for exactly the classes of runs that do not stream to chat. A wedged heartbeat or NO_REPLY run can therefore still hold onto the same OOM-sized buffers this patch is meant to clear.

Useful? React with 👍 / 👎.

@karanuppal

@karanuppal)

…aranuppal

…aranuppal)

…anuppal

…anuppal)

…anuppal

…anuppal)

@karanuppal

@karanuppal)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fa1c96c95

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-03-22T23:36:54Z

src/agents/pi-embedded-runner/extra-params.ts

+  if (provider === "openai" || provider === "openai-codex") {
+    if (provider === "openai") {
+      // Default OpenAI Responses to WebSocket-first with transparent SSE fallback.
+      agent.streamFn = createOpenAIDefaultTransportWrapper(agent.streamFn);


Avoid rewrapping direct OpenAI transport defaults

For the direct openai provider, the bundled plugin already adds createOpenAIDefaultTransportWrapper(...) in extensions/openai/openai-provider.ts:175-176. Putting the same wrapper here means any model-level extra params applied later by createStreamFnWithExtraParams()—for example transport: "sse" or openaiWsWarmup: true in agents.defaults.models["openai/..."].params—get clobbered by the outer plugin wrapper's fallback transport: "auto" / openaiWsWarmup: false, because that helper merges {...streamParams, ...options}. In practice, direct OpenAI users can no longer force SSE or enable warm-up through config.

Useful? React with 👍 / 👎.

emitChatFinal frees buffers on clean run completion, and the maintenance timer sweeps abortedRuns after ABORTED_RUN_TTL_MS. But runs that get stuck (e.g. LLM timeout without triggering clean lifecycle end) are never aborted and their string buffers persist indefinitely. This is the direct trigger for the StringAdd_CheckNone OOM crash reported in the issue. Add a stale buffer sweep in the maintenance timer that cleans up buffers, deltaSentAt, and deltaLastBroadcastLen for any run not updated within ABORTED_RUN_TTL_MS, regardless of abort status. Closes openclaw#51821

…aranuppal

…aranuppal)

…anuppal

…anuppal)

…anuppal

…anuppal)

@karanuppal

@karanuppal)

…uppal

…uppal)

…ranuppal

…ranuppal)

steipete · 2026-03-22T23:56:40Z

Landed via temp rebase onto main.

Gate: pnpm check; OPENCLAW_TEST_PROFILE=low OPENCLAW_TEST_SERIAL_GATEWAY=1 pnpm test; pnpm test -- src/cli/directory-cli.test.ts
Land commit: c490a98
Merge commit: 7bd12f4

Thanks @karanuppal!

openclaw-barnacle bot added gateway Gateway runtime size: XS labels Mar 22, 2026

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

This was referenced Mar 22, 2026

feat(skills): add web-screenshot skill for reliable full-page screenshots #52388

Open

bug: CI broken on main — TypeScript error in zod-schema.core.ts (thinkingFormat: 'openrouter') #52438

Open

steipete added a commit that referenced this pull request Mar 22, 2026

fix: guard stale chat buffer sweep (#52428) (thanks @karanuppal)

dbd61b2

steipete force-pushed the fix/memory-leak-stale-buffers-51821 branch from 9e5ba54 to dbd61b2 Compare March 22, 2026 21:58

openclaw-barnacle bot added app: web-ui App: web-ui size: S and removed size: XS labels Mar 22, 2026

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

build: refresh plugin-sdk api baseline (openclaw#52428)

e1d772e

openclaw-barnacle bot added the docs Improvements or additions to documentation label Mar 22, 2026

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

steipete force-pushed the fix/memory-leak-stale-buffers-51821 branch from e1d772e to 8c71659 Compare March 22, 2026 22:28

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: guard stale chat buffer sweep (openclaw#52428) (thanks @karanuppal)

9b1c72f

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

build: refresh plugin-sdk api baseline (openclaw#52428)

afb6770

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: stabilize stale buffer sweep landing (openclaw#52428) (thanks @k…

8c71659

…aranuppal)

openclaw-barnacle bot added cli CLI command changes size: M and removed size: S labels Mar 22, 2026

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: guard stale chat buffer sweep (openclaw#52428) (thanks @karanuppal)

753fb74

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

build: refresh plugin-sdk api baseline (openclaw#52428)

782084d

steipete force-pushed the fix/memory-leak-stale-buffers-51821 branch from 8c71659 to fa994c3 Compare March 22, 2026 22:32

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: stabilize stale buffer sweep landing (openclaw#52428) (thanks @k…

2d0fe37

…aranuppal)

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

test: fix post-rebase test drift (openclaw#52428)

fa994c3

openclaw-barnacle bot added size: S and removed size: M labels Mar 22, 2026

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: guard stale chat buffer sweep (openclaw#52428) (thanks @karanuppal)

7d6e75e

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

test: align rebased expectations and typecheck (openclaw#52428) (thanks

d7e707b

@karanuppal)

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: guard stale chat buffer sweep (openclaw#52428) (thanks @karanuppal)

a09a9d6

steipete force-pushed the fix/memory-leak-stale-buffers-51821 branch from d7e707b to 122894e Compare March 22, 2026 23:28

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

build: refresh plugin-sdk api baseline (openclaw#52428)

b7c496d

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: stabilize stale buffer sweep landing (openclaw#52428) (thanks @k…

67a3d7d

…aranuppal)

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

test: fix post-rebase test drift (openclaw#52428)

6b23429

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: stabilize post-rebase CI (openclaw#52428)

80bd0c8

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

fix: restore rebase regression coverage (openclaw#52428) (thanks @kar…

a84e64b

…anuppal)

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

build: serialize formatter checks in CI (openclaw#52428) (thanks @kar…

b37af70

…anuppal)

steipete added a commit to karanuppal/openclaw that referenced this pull request Mar 22, 2026

test: align rebased expectations and typecheck (openclaw#52428) (thanks

c1154ea

@karanuppal)

chatgpt-codex-connector bot reviewed Mar 22, 2026

View reviewed changes

karanuppal and others added 11 commits March 22, 2026 23:51

fix: guard stale chat buffer sweep (openclaw#52428) (thanks @karanuppal)

c68a1c8

build: refresh plugin-sdk api baseline (openclaw#52428)

9295da3

fix: stabilize stale buffer sweep landing (openclaw#52428) (thanks @k…

7cafbf4

…aranuppal)

test: fix post-rebase test drift (openclaw#52428)

0e0b87a

fix: restore rebase regression coverage (openclaw#52428) (thanks @kar…

9a5339c

…anuppal)

build: serialize formatter checks in CI (openclaw#52428) (thanks @kar…

738c3a9

…anuppal)

test: align rebased expectations and typecheck (openclaw#52428) (thanks

12fb039

@karanuppal)

test: normalize rebase fallout (openclaw#52428) (thanks @karanuppal)

b5d9b96

test: stabilize json runtime captures (openclaw#52428) (thanks @karan…

d7d528b

…uppal)

test: fix rebased directory runtime mock (openclaw#52428) (thanks @ka…

c490a98

…ranuppal)

github-actions bot mentioned this pull request Mar 23, 2026

📡 Upstream Digest — 2026-03-23 01:25 UTC curtismercier/openclaw-mods#336

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: sweep stale chatRunState buffers for stuck runs#52428

fix: sweep stale chatRunState buffers for stuck runs#52428
steipete merged 11 commits intoopenclaw:mainfrom
karanuppal:fix/memory-leak-stale-buffers-51821

karanuppal commented Mar 22, 2026

Uh oh!

greptile-apps bot commented Mar 22, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Uh oh!

steipete commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

karanuppal commented Mar 22, 2026

Summary

Fix

Changes

Testing

Uh oh!

greptile-apps bot commented Mar 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Comments Outside Diff (1)

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

steipete commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps bot commented Mar 22, 2026 •

edited

Loading