fix(code-execution): preserve rich content through JS sandbox#7787
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1a1e3dbe8b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Reviewed commit: 2143e6abd2
```rust
_ => {
    // Non-text content (resources, images, blobs) — store for later
    rich_contents.push(content.clone());
}
```
Respect audience filtering for non-text tool content
This branch now forwards every non-text Content item into rich_contents without checking content.audience(), while text still enforces assistant/no-audience gating. In runs where a tool emits user-scoped artifacts (for example audience=[Role::User] on an image/resource), execute will now leak that payload back to the assistant/model. Apply the same audience predicate to non-text items before storing them.
This is intentional — the audience metadata is preserved on the Content items and filtering happens downstream. When the tool response is included in messages sent to the model, agent_visible_content() calls filter_for_audience(Role::Assistant) on the ToolResponse, which filters out any content with audience: [User] (see message.rs:285-294). So the model never sees user-only content, but it does reach the UI where it needs to be rendered (e.g. the autovisualiser HTML blob).
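For readers following the thread, the downstream gating described in this reply can be sketched as follows. This is a minimal std-only illustration with stand-in types (`Role`, `ContentItem`, `filter_for_audience` here are illustrative, not goose's actual `message.rs` API):

```rust
// Stand-in for the roles an audience annotation can name.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Role {
    Assistant,
    User,
}

// Stand-in for a Content item carrying optional audience metadata.
// `audience: None` mirrors the "no audience set, visible to all" case.
#[derive(Clone, Debug)]
struct ContentItem {
    data: String,
    audience: Option<Vec<Role>>,
}

// Keep only the items the given role is allowed to see. Items with an
// explicit audience list are dropped unless that list names the role.
fn filter_for_audience(items: &[ContentItem], role: Role) -> Vec<ContentItem> {
    items
        .iter()
        .filter(|item| match &item.audience {
            None => true,
            Some(roles) => roles.contains(&role),
        })
        .cloned()
        .collect()
}
```

Under this scheme a user-scoped HTML blob survives into `rich_contents` but is filtered out before the tool response reaches the model, which is the behavior the reply describes.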
💡 Codex Review
Reviewed commit: 1c883d4534
💡 Codex Review
Reviewed commit: f14e96154b
💡 Codex Review
Reviewed commit: ccaa2d8b14
💡 Codex Review
Reviewed commit: 1e22ea448d
💡 Codex Review
Reviewed commit: a1c86f061e
alexhancock
left a comment
this makes sense and is a great catch
also cc @eliasposen for visibility
…en refs

When code mode wraps tools in a JS sandbox, non-text content (resources, images, blobs) couldn't round-trip through the JSON return value. This fix:
1. Stores non-text content in a ContentStore keyed by unique tokens
2. Returns token references in the JSON payload to the JS runtime
3. After execution, collects referenced tokens and appends resolved content

This allows tools like Autovisualiser to return rich HTML visualizations that survive the JS sandbox boundary. The model's return value stays intact with token references, while the actual content is restored alongside.

Also removed stdout/stderr from the tool output channel: console.log is now ignored and only return values are used, simplifying the interface.

Fixes #6167

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
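The store-and-resolve round trip described in this commit message can be sketched in a few lines. This is a hedged, std-only illustration; the names `ContentStore`, `store`, and `resolve`, and the token format, are assumptions for the sketch, not goose's actual implementation:

```rust
use std::collections::HashMap;

// Rust-side store for rich content that cannot cross the JSON boundary
// into the JS runtime. Content goes in, an opaque token comes out; the
// token is what gets embedded in the JSON payload the script sees.
struct ContentStore {
    items: HashMap<String, String>, // token -> serialized rich content
    next_id: u64,
}

impl ContentStore {
    fn new() -> Self {
        ContentStore { items: HashMap::new(), next_id: 0 }
    }

    // Store one rich content item and return a unique token reference.
    fn store(&mut self, content: String) -> String {
        let token = format!("goose-content-{}", self.next_id);
        self.next_id += 1;
        self.items.insert(token.clone(), content);
        token
    }

    // After the script finishes, tokens found in its return value are
    // resolved back into the original content.
    fn resolve(&self, token: &str) -> Option<&String> {
        self.items.get(token)
    }
}
```

The key property is that the JS runtime only ever handles the token string, so arbitrary binary or HTML content never has to survive JSON serialization.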
…blings

1. Only treat the top-level return as a pure content ref when the object has exactly the shape we created (_goose_content_ref + text_result). If the model spreads extra fields into the ref, include the text output too.
2. Always recurse into all object values in collect_rich_content, even after finding a ref. This ensures nested refs in sibling fields are not silently dropped.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Co-Authored-By: Claude Opus 4.6 <[email protected]>
The locally-generated snapshot included extra skills from the local environment. Restored from main and applied only the console.log instruction change. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Providers that only serialize text content (OpenAI, Codex, Anthropic) would produce no tool output for the model when execute returns only rich content. Always include a text placeholder so the model sees a valid tool result. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When a tool callback returns both rich content and a text_result, use the actual text_result as the text fallback instead of a generic placeholder, so the model still receives meaningful textual data. Co-Authored-By: Claude Opus 4.6 <[email protected]>
text_result can be a JSON object/number/boolean, not just a string. Serialize non-string values to text instead of dropping them. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When text_value is a JSON object, inject _goose_content_ref into it instead of wrapping in a new object. This preserves the original tool output schema so r.items etc. still works in JS code mode. Co-Authored-By: Claude Opus 4.6 <[email protected]>
Use a _goose_pure_ref sentinel in the wrapper object so is_pure_ref can't collide with real tool objects that happen to have text_result. Track seen tokens in collect_rich_content to avoid duplicating rich content when the same ref appears multiple times in the output. Co-Authored-By: Claude Opus 4.6 <[email protected]>
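The seen-token deduplication and the "keep recursing past a ref" rule from these commits can be sketched together. The toy `Val` enum below stands in for `serde_json::Value` so the example stays std-only; `collect_refs` is an illustrative name, not the real `collect_rich_content` signature:

```rust
use std::collections::HashSet;

// Toy value tree standing in for serde_json::Value.
enum Val {
    Str(String),
    Obj(Vec<(String, Val)>),
}

// Walk the whole tree: collect each `_goose_content_ref` token exactly
// once (the `seen` set prevents duplicating rich content when the same
// ref appears multiple times), and keep recursing into every object
// value even after finding a ref, so refs in sibling fields survive.
fn collect_refs(v: &Val, seen: &mut HashSet<String>, out: &mut Vec<String>) {
    if let Val::Obj(fields) = v {
        for (key, child) in fields {
            if key == "_goose_content_ref" {
                if let Val::Str(token) = child {
                    if seen.insert(token.clone()) {
                        out.push(token.clone());
                    }
                }
            }
            // Always recurse; a ref on this object does not end the walk.
            collect_refs(child, seen, out);
        }
    }
}
```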
Force-pushed from 19a2ec9 to 5889b4f
💡 Codex Review
Reviewed commit: e182c5cf60
```rust
.output
    .as_ref()
    .and_then(|v| v.get("_goose_pure_ref"))
    .and_then(|v| v.as_bool())
    .unwrap_or(false);
```
Restrict pure-ref fast path to untouched wrapper values
handle_execute_typescript currently treats any top-level output with _goose_pure_ref: true as an internal wrapper and drops the JSON object in favor of text_result plus rich content. This loses caller-added fields when a script augments a non-object rich tool result (for example const r = await Tool(); return { ...r, status: "ok" }), because the sentinel survives and still triggers this branch. The pure-ref shortcut should only run for the exact wrapper shape produced by create_tool_callback, otherwise structured return data is silently discarded.
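The fix the reviewer suggests amounts to an exact-shape check rather than a single-sentinel check. A minimal sketch, assuming the internal wrapper contains exactly the three fields mentioned in this thread (_goose_pure_ref, _goose_content_ref, text_result); the key set and the helper name are assumptions for illustration:

```rust
// Take the pure-ref fast path only when the output object is exactly
// the wrapper shape produced internally: the sentinel, the content ref,
// and the text result, with no caller-added fields. An object like
// { ...wrapper, status: "ok" } fails this check, so the augmented
// structured return is preserved instead of being discarded.
fn is_pure_ref_wrapper(keys: &[&str]) -> bool {
    keys.len() == 3
        && keys.contains(&"_goose_pure_ref")
        && keys.contains(&"_goose_content_ref")
        && keys.contains(&"text_result")
}
```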
• agent-browser - Debug visual bugs and interact with web apps using agent-browser CLI. Use when debugging, inspecting, navigating, filling forms, clicking buttons, taking screenshots, scraping data, testing web apps, or automating browser tasks. Supports desktop browsers and iOS Simulator (Mobile Safari). 93% less context than Playwright MCP.
• codesearch - Use when searching, finding, locating, discovering, querying, looking up, or browsing code patterns, files, or implementations across all Square repositories via go/codesearch.
• find-skills - Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
Remove machine-local skills from the golden prompt snapshot
These added skill bullets make the snapshot depend on whatever skills are installed on the author’s machine, not just repository fixtures. In summon::discover_filesystem_sources the test input includes global config/home skill directories, so entries like agent-browser/codesearch can vary by environment; committing them here makes test_all_platform_extensions brittle and likely to fail on CI or contributors without the same local skill set.
…cal skills from snapshot
💡 Codex Review
Reviewed commit: f82e166cc7
```rust
map.insert(
    "_goose_content_ref".to_string(),
    Value::String(token),
);
```
Preserve existing _goose_content_ref fields on tool objects
When a tool returns an object plus rich content, create_tool_callback unconditionally does map.insert("_goose_content_ref", token). If the tool schema already includes a _goose_content_ref property, this silently overwrites the real value and changes the returned object shape, so downstream TypeScript code that relies on that field receives corrupted data. This regression is triggered specifically for object-shaped tool results that include non-text content.
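One possible way to avoid the clobbering described here is to pick a non-colliding key when the tool's object already defines `_goose_content_ref`. This is purely an illustrative sketch of that idea, not goose's chosen resolution (the suffixing scheme and `insert_ref` helper are assumptions):

```rust
use std::collections::HashMap;

// Insert the sandbox's content-ref token without overwriting a field
// the tool itself returned. If `_goose_content_ref` is taken, append
// underscores until an unused key is found, and return the key that
// was actually used so the caller can look the token up later.
fn insert_ref(map: &mut HashMap<String, String>, token: String) -> String {
    let mut key = "_goose_content_ref".to_string();
    while map.contains_key(&key) {
        key.push('_');
    }
    map.insert(key.clone(), token);
    key
}
```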
```rust
} else {
    let return_val = serde_json::to_string_pretty(&output.output)
        .unwrap_or_else(|_| json!(&output.output).to_string());
    let mut contents = vec![Content::text(format_output(
```
I would use output.markdown() (https://github.com/portofcontext/pctx/blob/main/crates/pctx_code_mode/src/model.rs#L185) here and delete custom format_output. The execute_typescript tool description says that anything that is returned or console logged by the script will be returned to the model.
@jh-block looks great! Is there a way to test this? Either an actual unit test or just something you have been doing locally in a session?
Summary
Code mode wraps MCP tools in a JS sandbox, but non-text content (resources, images, blobs) couldn't survive the JSON boundary. This fix introduces a token-based reference system: non-text content is stored on the Rust side, token references are passed to JS, and the resolved content is restored after execution.
The model's return value remains intact with token refs visible in the JSON, while actual rich content is appended as separate Content items. Also removed stdout/stderr as a return channel; console.log is now ignored entirely. This simplifies the Code Mode interface (one return channel rather than three) and means the freeform text of stdout/stderr doesn't have to be scanned for possible Content refs.
Type of Change
AI Assistance
Testing
Manual testing confirmed Autovisualiser now renders visualizations successfully when called through code mode, preserving the HTML blob through the JS sandbox boundary.
Related Issues
Relates to #6167
🤖 Generated with Claude Code