fix(shared): extract text from stringified content block arrays (#29028) #29050

Open
ArsalanShakil wants to merge 14 commits into openclaw:main from ArsalanShakil:fix/heartbeat-json-serialization-29028

Conversation

@ArsalanShakil

Summary

  • Problem: extractTextFromChatContent returned raw JSON strings like [{"type":"output_text","text":"..."}] instead of extracting the text, because it didn't detect JSON-stringified content block arrays and didn't recognize "output_text" blocks from the OpenAI Responses API.
  • Why it matters: Heartbeat replies were delivered to Discord as unreadable JSON walls; recursive serialization amplified this into exponentially growing backslash escaping.
  • What changed: Added isTextBlockType() (accepts both "text" and "output_text"), tryParseStringifiedContentBlocks() (detects and extracts text from stringified arrays), and 14 new tests.
  • What did NOT change (scope boundary): No changes to the heartbeat runner, agent runner, session storage, or delivery pipeline — the fix is isolated to the shared text extraction utility.
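
A minimal sketch of how the two helpers named above might look. The function names come from this PR description; the bodies, the `ContentBlock` type, the newline join, and the choice to leave empty arrays untouched are assumptions for illustration, not the code in src/shared/chat-content.ts.

```typescript
// Hypothetical shape of a content block; only `type` is required here.
type ContentBlock = { type: string; text?: unknown };

// Accept both the classic "text" block and the Responses API "output_text" block.
function isTextBlockType(type: unknown): boolean {
  return type === "text" || type === "output_text";
}

// Return the joined text if `raw` is a JSON-stringified content block array,
// otherwise null so the caller can fall back to the original string.
function tryParseStringifiedContentBlocks(raw: string): string | null {
  const trimmed = raw.trim();
  if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return null; // a plain string that merely looks like an array
  }

  // Assumption: an empty array is not treated as content blocks.
  if (!Array.isArray(parsed) || parsed.length === 0) return null;

  // Every element must be an object carrying a string `type` field.
  const isBlock = (v: unknown): v is ContentBlock =>
    typeof v === "object" && v !== null && typeof (v as { type?: unknown }).type === "string";
  if (!parsed.every(isBlock)) return null;

  return (parsed as ContentBlock[])
    .filter((b) => isTextBlockType(b.type) && typeof b.text === "string")
    .map((b) => (b.text as string).trim())
    .filter((t) => t.length > 0)
    .join("\n");
}
```

Returning `null` rather than the input keeps "not a content block array" distinguishable from "a content block array with no text", which the caller may want to handle differently.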

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Heartbeat responses using OpenAI Responses API models are now delivered as plain text instead of raw JSON content block arrays.
  • Recursive escaping / slash loops in heartbeat messages are eliminated.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Ubuntu 24.04 (reporter), macOS (dev)
  • Runtime/container: Node 22+
  • Model/provider: OpenAI Responses API (any model)
  • Integration/channel: Discord (heartbeat delivery)
  • Relevant config: Heartbeat enabled with OpenAI Responses API model

Steps

  1. Configure a heartbeat with an OpenAI Responses API model
  2. Wait for heartbeat to trigger and generate a response
  3. Observe the delivered message in Discord

Expected

  • Plain text heartbeat message (e.g. "Everything looks good")

Actual

  • Raw JSON: [{"type":"output_text","text":"Everything looks good"}]
  • In severe cases: recursive escaping with walls of \ characters
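
To illustrate why the escaping grows so quickly, here is a small sketch that re-serializes an already-serialized reply several times and counts the backslashes at each layer. The helper names (`wrapOnce`, `backslashGrowth`) are hypothetical, and the loop only imitates the faulty round trip; it is not taken from the heartbeat code.

```typescript
// Count backslash characters in a string.
function countBackslashes(s: string): number {
  return s.split("\\").length - 1;
}

// Wrap text in a single output_text block and serialize it, imitating a
// pipeline that stores an already-serialized reply as the text of a new block.
function wrapOnce(text: string): string {
  return JSON.stringify([{ type: "output_text", text }]);
}

// Backslash count of the payload after 0, 1, 2, ... extra serialization layers.
function backslashGrowth(layers: number): number[] {
  const counts: number[] = [];
  let payload = wrapOnce("Everything looks good");
  for (let i = 0; i < layers; i++) {
    counts.push(countBackslashes(payload));
    payload = wrapOnce(payload);
  }
  return counts;
}

console.log(backslashGrowth(4));
```

Each layer escapes every existing backslash and every existing quote, so the backslash count at least doubles per round trip.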

Evidence

  • Failing test/log before + passing after — 14 new tests covering stringified content arrays, output_text blocks, mixed types, sanitizer pass-through, recursive escaping, and edge cases. All 28 tests pass.
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: All 28 unit tests pass including 14 new tests for the fix
  • Edge cases checked: plain strings starting with [, JSON arrays of non-objects, arrays missing type field, whitespace-only text blocks, doubly-stringified content, sanitizer applied to stringified blocks
  • What you did not verify: Live heartbeat delivery to Discord with a real OpenAI Responses API model

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert the single commit on src/shared/chat-content.ts
  • Files/config to restore: src/shared/chat-content.ts
  • Known bad symptoms reviewers should watch for: Regular plain-text strings being incorrectly parsed as content block arrays (mitigated by requiring every array element to have a type field)

Risks and Mitigations

  • Risk: A plain string that happens to be valid JSON matching the content block shape could be incorrectly parsed.
    • Mitigation: Strict validation requires every element to be an object with a type field — this pattern is extremely unlikely in natural text. The function returns the extracted text either way, so the worst case is equivalent output.
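
The false-positive risk can be explored with a small guard like the one below. It is a sketch under the assumption that "has a type field" means "is a non-null object whose `type` is a string"; `looksLikeContentBlocks` and `parsesAsContentBlocks` are hypothetical names, not functions from this PR.

```typescript
// True only when every element is a plain object with a string `type` field.
function looksLikeContentBlocks(value: unknown): boolean {
  return (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every(
      (item) =>
        typeof item === "object" &&
        item !== null &&
        !Array.isArray(item) &&
        typeof (item as Record<string, unknown>).type === "string",
    )
  );
}

// Parse first, then validate; any JSON syntax error means "not content blocks".
function parsesAsContentBlocks(raw: string): boolean {
  try {
    return looksLikeContentBlocks(JSON.parse(raw));
  } catch {
    return false;
  }
}

const samples = [
  "[draft] see notes",                          // not JSON at all
  "[1, 2, 3]",                                  // JSON array of non-objects
  '[{"text":"no type"}]',                       // object without a type field
  '[{"type":"text","text":"a"},{"text":"b"}]',  // one element fails the check
  '[{"type":"text","text":"hi"}]',              // the only sample that qualifies
];
for (const s of samples) {
  console.log(parsesAsContentBlocks(s), s);
}
```

Only the last sample passes, which is the case the risk item describes: a user would have to type a string that is already a well-formed content block array.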

…claw#29028)

Heartbeat responses using the OpenAI Responses API were being delivered
as raw JSON (e.g. `[{"type":"output_text","text":"..."}]`) instead of
extracted plain text. In severe cases, repeated serialization produced
recursive escaping with walls of backslashes.

The root cause: `extractTextFromChatContent` passed through string
content without checking if it was a JSON-serialized content block array,
and only recognized `type: "text"` blocks (missing `"output_text"`).

Changes:
- Add `isTextBlockType()` helper accepting both `"text"` and
  `"output_text"` block types
- Add `tryParseStringifiedContentBlocks()` to detect and extract text
  from JSON-stringified content arrays before they leak as raw JSON
- Use both helpers in `extractTextFromChatContent` for string and array
  content paths
- Add comprehensive tests covering stringified arrays, output_text
  blocks, mixed types, sanitizer pass-through, and edge cases
@greptile-apps
Contributor

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR fixes extractTextFromChatContent to properly handle stringified JSON content block arrays (like [{"type":"output_text","text":"..."}]) by parsing and extracting text instead of returning raw JSON. The fix adds two helper functions: isTextBlockType() to recognize both "text" and "output_text" block types, and tryParseStringifiedContentBlocks() to detect and parse stringified arrays. The implementation is conservative, requiring all array elements to have a type field before treating input as content blocks, which prevents false positives. All 14 new tests pass and cover edge cases including plain strings starting with [, mixed block types, sanitizer application, and whitespace handling. The change is backward compatible and doesn't modify any upstream code paths (heartbeat runner, agent runner, etc.).

Confidence Score: 5/5

  • This PR is safe to merge with no blocking issues
  • The fix is well-isolated to a single shared utility function, includes comprehensive test coverage (14 new tests), maintains backward compatibility, and uses conservative validation logic to prevent false positives. The code follows repository style guidelines, maintains type safety, and solves the reported issue without introducing new risks.
  • No files require special attention

Last reviewed commit: f828fae


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0fa1c38a41

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

The initial fix only parsed one JSON layer, so nested serialized payloads
(e.g. a text field containing another stringified content block array)
still leaked as raw JSON. This is the exact pattern that causes the
recursive escaping / backslash wall behavior.

Changes:
- tryParseStringifiedContentBlocks now recurses into text values that are
  themselves stringified content block arrays (capped at 5 levels)
- The array content path in extractTextFromChatContent also unwraps
  nested stringified text values
- Added 4 new tests: nested unwrap via array, nested unwrap via string,
  triple nesting, and depth cap safety
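
A rough sketch of the depth-capped unwrapping this commit describes. The cap of 5 is from the commit message; the constant name, `parseBlocks`, `unwrapText`, and the newline join are assumptions for illustration rather than the actual implementation.

```typescript
// Cap stated in the commit message; the constant name is hypothetical.
const MAX_UNWRAP_DEPTH = 5;

// Parse `raw` into content blocks, or return null if it is not a stringified block array.
function parseBlocks(raw: string): { type: string; text?: unknown }[] | null {
  const trimmed = raw.trim();
  if (!trimmed.startsWith("[")) return null;
  try {
    const parsed: unknown = JSON.parse(trimmed);
    if (!Array.isArray(parsed) || parsed.length === 0) return null;
    const ok = parsed.every(
      (v) => typeof v === "object" && v !== null && typeof v.type === "string",
    );
    return ok ? parsed : null;
  } catch {
    return null;
  }
}

// Recursively unwrap text values that are themselves stringified block arrays.
function unwrapText(raw: string, depth = 0): string {
  if (depth >= MAX_UNWRAP_DEPTH) return raw; // stop here and return what is left
  const blocks = parseBlocks(raw);
  if (blocks === null) return raw; // ordinary text, nothing to unwrap
  return blocks
    .filter((b) => (b.type === "text" || b.type === "output_text") && typeof b.text === "string")
    .map((b) => unwrapText(b.text as string, depth + 1))
    .join("\n");
}
```

With this shape, a payload nested three layers deep unwraps to its plain text, while a payload nested deeper than the cap comes back with the remaining layers still serialized instead of recursing without bound.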

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12abd48526

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 53a3cfbd5d

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 31a1ee94c1

@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale label (Marked as stale due to inactivity) Mar 6, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b81a8365f1

@openclaw-barnacle openclaw-barnacle bot removed the stale label (Marked as stale due to inactivity) Mar 28, 2026
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale label (Marked as stale due to inactivity) Apr 3, 2026

Labels

size: M, stale (Marked as stale due to inactivity)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Heartbeat responses serialized as raw JSON in Discord delivery — recursive escaping / slash loop

1 participant