Skip to content

fix: prevent stack overflow in cache-trace stableStringify on circular refs#38935

Closed
MumuTW wants to merge 1 commit intoopenclaw:mainfrom
MumuTW:fix/cache-trace-circular-ref
Closed

fix: prevent stack overflow in cache-trace stableStringify on circular refs#38935
MumuTW wants to merge 1 commit intoopenclaw:mainfrom
MumuTW:fix/cache-trace-circular-ref

Conversation

@MumuTW
Copy link
Copy Markdown
Contributor

@MumuTW MumuTW commented Mar 7, 2026

Summary

  • Add WeakSet-based circular reference detection to stableStringify() in src/agents/cache-trace.ts
  • Prevents RangeError: Maximum call stack size exceeded when digest() encounters circular object references in session messages (e.g. image attachments near context limit)
  • Follows the same pattern already used by redactImageDataForDiagnostics() in src/agents/payload-redaction.ts

Test plan

  • Added regression test: circular reference in messages no longer causes stack overflow
  • All existing cache-trace tests pass (6/6)
  • pnpm format clean

Fixes #38859

…r refs

Add WeakSet-based circular reference detection to stableStringify(),
preventing RangeError when digesting messages that contain circular
object references (e.g. image attachments near context limit).

Fixes openclaw#38859
@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: S labels Mar 7, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 959af8af59

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +120 to +123
if (seen.has(value)) {
return JSON.stringify("[Circular]");
}
seen.add(value);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use path-scoped cycle tracking in stableStringify

seen is never cleared after descending into a branch, so any repeated reference is treated as "[Circular]" even when there is no cycle (for example, two fields pointing to the same object). In summarizeMessages(), this changes digests based on object identity rather than message content and can collapse distinct payload shapes to the same fingerprint, reducing cache-trace accuracy for debugging. This was introduced by the new seen.has/seen.add logic and should use recursion-path tracking (or delete from seen on unwind) to mark only true cycles.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 7, 2026

Greptile Summary

This PR fixes a RangeError: Maximum call stack size exceeded crash in stableStringify() by adding WeakSet-based circular reference detection, returning a "[Circular]" sentinel instead of recursing infinitely. The fix is minimal, correct, and consistent with the existing pattern in redactImageDataForDiagnostics().

  • The seen WeakSet is threaded through all recursive call sites (plain objects, arrays, Error, Uint8Array) correctly and a new instance is created per top-level call.
  • A regression test is included and passes alongside all existing tests.
  • Minor note: like redactImageDataForDiagnostics, the shared-seen approach will produce a false "[Circular]" for non-circular diamond-shaped references (the same object referenced from two branches of the same message). Given that fingerprints are diagnostic-only, the practical impact is very low.

Confidence Score: 4/5

  • This PR is safe to merge — it fixes a real crash with a correct, well-scoped change that is consistent with existing patterns in the codebase.
  • The fix is straightforward and correct: threading a WeakSet through stableStringify() stops the circular-reference stack overflow without touching any other behaviour. The only non-critical caveat is the known false-positive on diamond-shaped (non-circular) object graphs, which is an acceptable trade-off for a diagnostics-only function and is already present in the sibling redactImageDataForDiagnostics helper. A regression test is included and all 6 existing tests continue to pass.
  • No files require special attention.

Last reviewed commit: 959af8a

Comment on lines +120 to +123
if (seen.has(value)) {
return JSON.stringify("[Circular]");
}
seen.add(value);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared references produce false "[Circular]" in fingerprints

Because seen is shared across the entire traversal, any object (or array) that is referenced from two different places in the same message — a diamond-shaped graph that is not actually circular — will be serialized as "[Circular]" on its second occurrence. This can silently produce incorrect messageFingerprints for valid data.

For example:

const shared = { type: "text", text: "hello" };
const msg = { role: "user", content: [shared, shared] }; // not circular
stableStringify(msg);
// → {"content":[{"text":"hello","type":"text"},"[Circular]"],"role":"user"}
// instead of the correct:
// → {"content":[{"text":"hello","type":"text"},{"text":"hello","type":"text"}],"role":"user"}

Since the fingerprints are only used for diagnostics/tracing (not actual cache-control decisions), the impact is low. However, it's worth being aware of, especially if messageFingerprints are later used for anything more critical. The same trade-off exists in redactImageDataForDiagnostics(), so this is at least consistent.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/cache-trace.ts
Line: 120-123

Comment:
**Shared references produce false `"[Circular]"` in fingerprints**

Because `seen` is shared across the entire traversal, any object (or array) that is referenced from **two different places** in the same message — a diamond-shaped graph that is *not* actually circular — will be serialized as `"[Circular]"` on its second occurrence. This can silently produce incorrect `messageFingerprints` for valid data.

For example:
```ts
const shared = { type: "text", text: "hello" };
const msg = { role: "user", content: [shared, shared] }; // not circular
stableStringify(msg);
// → {"content":[{"text":"hello","type":"text"},"[Circular]"],"role":"user"}
// instead of the correct:
// → {"content":[{"text":"hello","type":"text"},{"text":"hello","type":"text"}],"role":"user"}
```

Since the fingerprints are only used for diagnostics/tracing (not actual cache-control decisions), the impact is low. However, it's worth being aware of, especially if `messageFingerprints` are later used for anything more critical. The same trade-off exists in `redactImageDataForDiagnostics()`, so this is at least consistent.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +173 to +177
expect(lines.length).toBe(1);
const event = JSON.parse(lines[0]?.trim() ?? "{}") as Record<string, unknown>;
expect(event.messageCount).toBe(1);
expect(event.messageFingerprints).toHaveLength(1);
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test could assert "[Circular]" appears in the fingerprint

The test correctly checks that no stack overflow occurs and that a single event line is produced, but it doesn't assert that the circular reference is actually replaced with the "[Circular]" sentinel in the serialized fingerprint. Adding an assertion would more directly confirm the fix works as intended:

expect(event.messageFingerprints).toHaveLength(1);
// Verify the circular reference is replaced rather than causing incorrect output
const fp = event.messageFingerprints as string[];
expect(fp[0]).toBeTruthy(); // fingerprint is a non-empty sha256 hex string

Or, if you want to verify the placeholder itself ends up in the serialized form, you could expose a small unit test on stableStringify directly (via a test-only export or a separate describe block for the helper). Either way, the current test is a solid regression guard — this is just a suggestion to strengthen it.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/cache-trace.test.ts
Line: 173-177

Comment:
**Test could assert `"[Circular]"` appears in the fingerprint**

The test correctly checks that no stack overflow occurs and that a single event line is produced, but it doesn't assert that the circular reference is actually replaced with the `"[Circular]"` sentinel in the serialized fingerprint. Adding an assertion would more directly confirm the fix works as intended:

```ts
expect(event.messageFingerprints).toHaveLength(1);
// Verify the circular reference is replaced rather than causing incorrect output
const fp = event.messageFingerprints as string[];
expect(fp[0]).toBeTruthy(); // fingerprint is a non-empty sha256 hex string
```

Or, if you want to verify the placeholder itself ends up in the serialized form, you could expose a small unit test on `stableStringify` directly (via a test-only export or a separate `describe` block for the helper). Either way, the current test is a solid regression guard — this is just a suggestion to strengthen it.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Copy Markdown
Contributor

@gambletan gambletan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: cache-trace stableStringify circular reference fix

Overall: Good fix, clean implementation.

What works well

  • Using WeakSet<object> is the correct choice — it avoids memory leaks since entries are garbage-collected when no longer referenced.
  • The [Circular] sentinel is JSON-safe and clearly communicates what happened.
  • The default parameter seen: WeakSet<object> = new WeakSet() keeps the public API unchanged — callers don't need to know about the cycle tracking.
  • Test case correctly creates a real circular reference and verifies no crash.

Minor observations

  1. The seen set is shared across the entire recursive tree. This means if the same object appears in two independent branches (not a cycle, just shared reference), it will be serialized as [Circular] on the second encounter. For a diagnostic tracing function this is fine — stability and crash prevention matter more than perfect fidelity. Just noting this is a deliberate trade-off.

  2. New Error and Uint8Array wrapper objects are not added to seen before recursing (lines ~127–140). These freshly created plain objects can't be circular, so this is correct. Good.

  3. The test could optionally verify the output content (e.g., that the fingerprint contains [Circular]), but since the function is internal diagnostic plumbing, verifying "no crash + correct count" is sufficient.

LGTM — clean, minimal fix for a real crash scenario. 👍

Copy link
Copy Markdown

@mdlmarkham mdlmarkham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Circular Reference Stack Overflow Fix ✅

Verdict: Clean fix for a critical crash. Ready to merge.

The Bug

stableStringify() in cache-trace uses deep recursion. When a message contains circular references (e.g., image attachments with parent back-references), the function recurses infinitely until RangeError: Maximum call stack size exceeded.

The Fix

function stableStringify(value: unknown, seen: WeakSet<object> = new WeakSet()): string {
  // ...
  if (seen.has(value)) {
    return JSON.stringify("[Circular]");
  }
  seen.add(value);
  // ... recurse with seen
}

Why WeakSet: Tracks objects only, doesn't prevent GC, O(1) lookup, doesn't leak memory.

Pattern Consistency

This mirrors redactImageDataForDiagnostics() in payload-redaction.ts:

// Same pattern already used elsewhere
const seen = new WeakSet();
if (seen.has(value)) return "[Circular]";

Test Coverage

  • ✅ Circular reference in messages no longer causes stack overflow
  • ✅ Existing tests pass (6/6)
  • ✅ Verify output contains [Circular] placeholder, not crash

Edge Cases

  1. Array circular refs: [{ref: parent}, parent]seen catches the object ✅
  2. Deep nesting before cycle: seen tracks all visited objects ✅
  3. Primitives in cycle: WeakSet only tracks objects, primitives are fine ✅

Minor Observation

The fix replaces circular refs with "[Circular]". If downstream consumers parse this string, they'll get a literal string, not an error. This is better than:

  • Crashing (old behavior)
  • Losing the entire object

If downstream needs to know which field was circular, that would require path tracking — but that's overkill for this use case.


Recommendation: Approve. Critical crash fix, follows established pattern, well-tested.

@steipete
Copy link
Copy Markdown
Contributor

steipete commented Mar 7, 2026

Landed. Thank you @MumuTW.

What I did:

SHA hashes:

  • original PR commit: 959af8af5927
  • landed commit on main: 5effa6043

Thanks again for the fix.

@steipete steipete closed this Mar 7, 2026
mrosmarin added a commit to mrosmarin/openclaw that referenced this pull request Mar 7, 2026
* main: (133 commits)
  reduce image size, offer slim image (openclaw#38479)
  fix(security): harden install base drift cleanup
  fix(agents): respect explicit provider baseUrl in merge mode (openclaw#39103)
  fix(agents): apply contextTokens cap for compaction threshold (openclaw#39099)
  fix(exec): block dangerous override-only env pivots
  fix(security): stage installs before publish
  fix(daemon): normalise whitespace in checkTokenDrift to prevent false-positive warning (openclaw#39108)
  fix(security): harden fs-safe copy writes
  refactor: dedupe bluebubbles webhook auth test setup
  refactor: dedupe discord native command test scaffolding
  refactor: dedupe anthropic probe target test setup
  refactor: dedupe minimax provider auth test setup
  refactor: dedupe runtime snapshot test fixtures
  fix: harden zip extraction writes
  fix(tests): stabilize diffs localReq headers (supersedes openclaw#39063)
  fix: harden workspace skill path containment
  fix(agents): land openclaw#38935 from @MumuTW
  fix(models): land openclaw#38947 from @davidemanuelDEV
  fix(gateway): land openclaw#39064 from @Narcooo
  fix(models-auth): land openclaw#38951 from @MumuTW
  ...
vincentkoc pushed a commit to BryanTegomoh/openclaw-fork that referenced this pull request Mar 8, 2026
openperf pushed a commit to openperf/moltbot that referenced this pull request Mar 8, 2026
mcaxtr pushed a commit to mcaxtr/openclaw that referenced this pull request Mar 8, 2026
Saitop pushed a commit to NomiciAI/openclaw that referenced this pull request Mar 8, 2026
GordonSH-oss pushed a commit to GordonSH-oss/openclaw that referenced this pull request Mar 9, 2026
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
dhoman pushed a commit to dhoman/chrono-claw that referenced this pull request Mar 11, 2026
senw-developers pushed a commit to senw-developers/va-openclaw that referenced this pull request Mar 17, 2026
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 21, 2026
Co-authored-by: MumuTW <[email protected]>
(cherry picked from commit 5effa60)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 21, 2026
Co-authored-by: MumuTW <[email protected]>
(cherry picked from commit 5effa60)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: RangeError: Maximum call stack size exceeded on chat.send with image attachment near context limit

4 participants