fix(shared): extract text from stringified content block arrays (#29028) #29050
ArsalanShakil wants to merge 14 commits into openclaw:main from
…claw#29028)

Heartbeat responses using the OpenAI Responses API were being delivered as raw JSON (e.g. `[{"type":"output_text","text":"..."}]`) instead of extracted plain text. In severe cases, repeated serialization produced recursive escaping with walls of backslashes.

The root cause: `extractTextFromChatContent` passed string content through without checking whether it was a JSON-serialized content block array, and it recognized only `type: "text"` blocks (missing `"output_text"`).

Changes:
- Add `isTextBlockType()` helper accepting both `"text"` and `"output_text"` block types
- Add `tryParseStringifiedContentBlocks()` to detect JSON-stringified content arrays and extract their text before they leak as raw JSON
- Use both helpers in `extractTextFromChatContent` for the string and array content paths
- Add comprehensive tests covering stringified arrays, `output_text` blocks, mixed types, sanitizer pass-through, and edge cases
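A minimal TypeScript sketch of how the two helpers might fit together with `extractTextFromChatContent`. Only the three function names come from the commit message; the `ContentBlock` type, the signatures, the newline used to join text blocks, and the rule that a stringified array counts only if every element is an object with a string `type` are assumptions for illustration, not the code in `src/shared/chat-content.ts`.

```typescript
// Assumed block shape; the real types in src/shared/chat-content.ts may differ.
type ContentBlock = { type?: unknown; text?: unknown };

// Accept Chat Completions-style "text" blocks and Responses API "output_text" blocks.
function isTextBlockType(type: unknown): boolean {
  return type === "text" || type === "output_text";
}

// If `value` is a JSON-stringified array of content blocks, return its joined text;
// otherwise return undefined so the caller keeps the original string.
function tryParseStringifiedContentBlocks(value: string): string | undefined {
  const trimmed = value.trim();
  if (!trimmed.startsWith("[")) return undefined; // cheap pre-check before JSON.parse
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return undefined; // e.g. "[note] hello" is not JSON
  }
  if (!Array.isArray(parsed) || parsed.length === 0) return undefined;
  // Assumption: treat it as content blocks only if every element is an object with a string `type`.
  const looksLikeBlocks = parsed.every(
    (b) => typeof b === "object" && b !== null && typeof (b as ContentBlock).type === "string",
  );
  if (!looksLikeBlocks) return undefined;
  const texts = (parsed as ContentBlock[])
    .filter((b) => isTextBlockType(b.type) && typeof b.text === "string")
    .map((b) => b.text as string);
  return texts.length > 0 ? texts.join("\n") : undefined;
}

function extractTextFromChatContent(content: unknown): string | null {
  if (typeof content === "string") {
    // String path: unwrap a stringified block array, else pass the string through.
    return tryParseStringifiedContentBlocks(content) ?? content;
  }
  if (Array.isArray(content)) {
    // Array path: keep only "text" / "output_text" blocks with string text.
    const texts = content
      .filter((b): b is ContentBlock => typeof b === "object" && b !== null)
      .filter((b) => isTextBlockType(b.type) && typeof b.text === "string")
      .map((b) => b.text as string);
    return texts.length > 0 ? texts.join("\n") : null;
  }
  return null;
}
```

With this sketch, the heartbeat payload `'[{"type":"output_text","text":"Everything looks good"}]'` comes back as `Everything looks good`, while prose such as `"[note] hello"` fails `JSON.parse` and is returned unchanged.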
Greptile Summary
This PR fixes
Confidence Score: 5/5
Last reviewed commit: f828fae
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0fa1c38a41
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The initial fix only parsed one JSON layer, so nested serialized payloads (e.g. a text field containing another stringified content block array) still leaked as raw JSON. This is the exact pattern that causes the recursive escaping / backslash wall behavior.

Changes:
- `tryParseStringifiedContentBlocks` now recurses into text values that are themselves stringified content block arrays (capped at 5 levels)
- The array content path in `extractTextFromChatContent` also unwraps nested stringified text values
- Added 4 new tests: nested unwrap via array, nested unwrap via string, triple nesting, and depth cap safety
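The recursive unwrap with a depth cap can be sketched as a standalone helper. `unwrapStringifiedBlocks` and `MAX_UNWRAP_DEPTH` are hypothetical names, not the PR's code; the only detail taken from the commit is the cap of five levels, and the exact depth semantics (five layers unwrapped, anything deeper returned as raw text) are an assumption.

```typescript
// Assumed block shape, as in the PR description ({ type, text }).
type ContentBlock = { type?: unknown; text?: unknown };

// Mirrors the "capped at 5 levels" in the commit message.
const MAX_UNWRAP_DEPTH = 5;

function isTextBlockType(type: unknown): boolean {
  return type === "text" || type === "output_text";
}

// Parse `value` as a stringified content-block array and return its text,
// recursing into text fields that are themselves stringified arrays.
function unwrapStringifiedBlocks(value: string, depth = 0): string | undefined {
  if (depth >= MAX_UNWRAP_DEPTH) return undefined; // past the cap: leave this layer as text
  const trimmed = value.trim();
  if (!trimmed.startsWith("[")) return undefined;
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return undefined;
  }
  if (!Array.isArray(parsed)) return undefined;
  const texts: string[] = [];
  for (const block of parsed) {
    // Any non-object element or missing string `type` means this is not a block array.
    if (typeof block !== "object" || block === null) return undefined;
    const { type, text } = block as ContentBlock;
    if (typeof type !== "string") return undefined;
    if (isTextBlockType(type) && typeof text === "string") {
      // The text may itself be another serialized array; unwrap it if so.
      texts.push(unwrapStringifiedBlocks(text, depth + 1) ?? text);
    }
  }
  return texts.length > 0 ? texts.join("\n") : undefined;
}
```

Each `JSON.stringify` layer escapes every existing backslash and quote, so a run of n backslashes before a quote grows to 2n + 1 (0, 1, 3, 7, 15, ...), which is where the backslash walls come from. The cap bounds recursion on a pathological payload; layers beyond it are simply returned as text rather than parsed.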
💡 Codex Review
Reviewed commit: 12abd48526
💡 Codex Review
Reviewed commit: 53a3cfbd5d
💡 Codex Review
Reviewed commit: 31a1ee94c1
This pull request has been automatically marked as stale due to inactivity.
💡 Codex Review
Reviewed commit: b81a8365f1
Summary
`extractTextFromChatContent` returned raw JSON strings like `[{"type":"output_text","text":"..."}]` instead of extracting the text, because it didn't detect JSON-stringified content block arrays and didn't recognize `"output_text"` blocks from the OpenAI Responses API.
Adds `isTextBlockType()` (accepts both `"text"` and `"output_text"`), `tryParseStringifiedContentBlocks()` (detects and extracts text from stringified arrays), and 14 new tests.

Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Security Impact (required)
No
No
No
No
No

Repro + Verification
Environment
Steps
Expected
Actual
`[{"type":"output_text","text":"Everything looks good"}]`
`\` characters

Evidence
`output_text` blocks, mixed types, sanitizer pass-through, recursive escaping, and edge cases. All 28 tests pass.

Human Verification (required)
`[`, JSON arrays of non-objects, arrays missing `type` field, whitespace-only text blocks, doubly-stringified content, sanitizer applied to stringified blocks

Compatibility / Migration
Yes
No
No

Failure Recovery (if this breaks)
`src/shared/chat-content.ts`
`src/shared/chat-content.ts`
`type` field)

Risks and Mitigations
`type` field; this pattern is extremely unlikely in natural text. The function returns the extracted text either way, so the worst case is equivalent output.
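To make the false-positive risk concrete, here is a hypothetical detection guard (`looksLikeStringifiedContentBlocks` is an illustrative name, not from the PR). It applies one plausible reading of the narrow criteria suggested by the edge cases listed under Human Verification: the trimmed string must be bracketed, parse as JSON, and yield a non-empty array whose every element is a plain object with a string `type` field.

```typescript
// Hypothetical guard illustrating narrow detection criteria; not the PR's implementation.
function looksLikeStringifiedContentBlocks(value: string): boolean {
  const trimmed = value.trim();
  // Cheap pre-check: only bracketed strings are candidates.
  if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) return false;
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    return false; // e.g. "[draft] notes [end]" is not valid JSON
  }
  return (
    Array.isArray(parsed) &&
    parsed.length > 0 && // "[]" carries no blocks
    parsed.every(
      // Rejects arrays of numbers/strings, nested arrays, and objects without a string `type`.
      (b) => typeof b === "object" && b !== null && !Array.isArray(b) && typeof b.type === "string",
    )
  );
}
```

Under these criteria a bare `[`, `[1, 2, 3]`, or `[{"text":"hi"}]` is left alone, and only input that is literally a JSON array of typed objects is unwrapped, which is the pattern the risk note above calls extremely unlikely in natural text.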