Skip to content

fix(code-execution): preserve rich content through JS sandbox#7787

Open
jh-block wants to merge 12 commits intomainfrom
jhugo/code-mode-preserve-rich-content
Open

fix(code-execution): preserve rich content through JS sandbox#7787
jh-block wants to merge 12 commits intomainfrom
jhugo/code-mode-preserve-rich-content

Conversation

@jh-block
Copy link
Copy Markdown
Collaborator

@jh-block jh-block commented Mar 10, 2026

Summary

Code mode wraps MCP tools in a JS sandbox, but non-text content (resources, images, blobs) couldn't survive the JSON boundary. This fix introduces a token-based reference system: non-text content is stored on the Rust side, token references are passed to JS, and the resolved content is restored after execution.

The model's return value remains intact with token refs visible in the JSON, while actual rich content is appended as separate Content items. Also removed stdout/stderr as a return channel—console.log is now ignored entirely. This simplifies the interface of Code Mode (one return channel rather than three) and means that the freeform text of stdout/stderr doesn't have to be scanned for possible Content refs.

Type of Change

  • Bug fix
  • Refactor / Code quality

AI Assistance

  • This PR was created or reviewed with AI assistance

Testing

Manual testing confirmed Autovisualiser now renders visualizations successfully when called through code mode, preserving the HTML blob through the JS sandbox boundary.

Related Issues

Relates to #6167

🤖 Generated with Claude Code

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1a1e3dbe8b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2143e6abd2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +333 to +336
_ => {
// Non-text content (resources, images, blobs) — store for later
rich_contents.push(content.clone());
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Respect audience filtering for non-text tool content

This branch now forwards every non-text Content item into rich_contents without checking content.audience(), while text still enforces assistant/no-audience gating. In runs where a tool emits user-scoped artifacts (for example audience=[Role::User] on an image/resource), execute will now leak that payload back to the assistant/model. Apply the same audience predicate to non-text items before storing them.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is intentional — the audience metadata is preserved on the Content items and filtering happens downstream. When the tool response is included in messages sent to the model, agent_visible_content() calls filter_for_audience(Role::Assistant) on the ToolResponse, which filters out any content with audience: [User] (see message.rs:285-294). So the model never sees user-only content, but it does reach the UI where it needs to be rendered (e.g. the autovisualiser HTML blob).

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c883d4534

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f14e96154b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ccaa2d8b14

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1e22ea448d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1c86f061e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@jh-block jh-block requested a review from DOsinga March 11, 2026 15:57
@DOsinga DOsinga requested a review from alexhancock March 26, 2026 17:08
@DOsinga DOsinga added the needs_human label to set when a robot looks at a PR and can't handle it label Mar 26, 2026
Copy link
Copy Markdown
Collaborator

@alexhancock alexhancock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense and is a great catch

also cc @eliasposen for visibility

jh-block and others added 10 commits March 26, 2026 20:51
…en refs

When code mode wraps tools in a JS sandbox, non-text content (resources,
images, blobs) couldn't round-trip through the JSON return value. This fix:

1. Stores non-text content in a ContentStore keyed by unique tokens
2. Returns token references in the JSON payload to the JS runtime
3. After execution, collects referenced tokens and appends resolved content

This allows tools like Autovisualiser to return rich HTML visualizations
that survive the JS sandbox boundary. The model's return value stays intact
with token references, while the actual content is restored alongside.

Also removed stdout/stderr from the tool output channel—console.log is now
ignored and only return values are used, simplifying the interface.

Fixes #6167

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
…blings

1. Only treat top-level return as a pure content ref when the object has
   exactly the shape we created (_goose_content_ref + text_result). If the
   model spreads extra fields into the ref, include the text output too.

2. Always recurse into all object values in collect_rich_content, even
   after finding a ref. This ensures nested refs in sibling fields are
   not silently dropped.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The locally-generated snapshot included extra skills from the local
environment. Restored from main and applied only the console.log
instruction change.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Providers that only serialize text content (OpenAI, Codex, Anthropic)
would produce no tool output for the model when execute returns only
rich content. Always include a text placeholder so the model sees a
valid tool result.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When a tool callback returns both rich content and a text_result,
use the actual text_result as the text fallback instead of a generic
placeholder, so the model still receives meaningful textual data.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
text_result can be a JSON object/number/boolean, not just a string.
Serialize non-string values to text instead of dropping them.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
When text_value is a JSON object, inject _goose_content_ref into it
instead of wrapping in a new object. This preserves the original
tool output schema so r.items etc. still works in JS code mode.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Use a _goose_pure_ref sentinel in the wrapper object so is_pure_ref
can't collide with real tool objects that happen to have text_result.
Track seen tokens in collect_rich_content to avoid duplicating rich
content when the same ref appears multiple times in the output.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jh-block jh-block force-pushed the jhugo/code-mode-preserve-rich-content branch from 19a2ec9 to 5889b4f Compare March 26, 2026 19:55
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e182c5cf60

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +303 to +307
.output
.as_ref()
.and_then(|v| v.get("_goose_pure_ref"))
.and_then(|v| v.as_bool())
.unwrap_or(false);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict pure-ref fast path to untouched wrapper values

handle_execute_typescript currently treats any top-level output with _goose_pure_ref: true as an internal wrapper and drops the JSON object in favor of text_result plus rich content. This loses caller-added fields when a script augments a non-object rich tool result (for example const r = await Tool(); return { ...r, status: "ok" }), because the sentinel survives and still triggers this branch. The pure-ref shortcut should only run for the exact wrapper shape produced by create_tool_callback, otherwise structured return data is silently discarded.

Useful? React with 👍 / 👎.

Comment on lines +105 to +107
• agent-browser - Debug visual bugs and interact with web apps using agent-browser CLI. Use when debugging, inspecting, navigating, filling forms, clicking buttons, taking screenshots, scraping data, testing web apps, or automating browser tasks. Supports desktop browsers and iOS Simulator (Mobile Safari). 93% less context than Playwright MCP.
• codesearch - Use when searching, finding, locating, discovering, querying, looking up, or browsing code patterns, files, or implementations across all Square repositories via go/codesearch.
• find-skills - Helps users discover and install agent skills when they ask questions like "how do I do X", "find a skill for X", "is there a skill that can...", or express interest in extending capabilities. This skill should be used when the user is looking for functionality that might exist as an installable skill.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove machine-local skills from the golden prompt snapshot

These added skill bullets make the snapshot depend on whatever skills are installed on the author’s machine, not just repository fixtures. In summon::discover_filesystem_sources the test input includes global config/home skill directories, so entries like agent-browser/codesearch can vary by environment; committing them here makes test_all_platform_extensions brittle and likely to fail on CI or contributors without the same local skill set.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f82e166cc7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +420 to +423
map.insert(
"_goose_content_ref".to_string(),
Value::String(token),
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve existing _goose_content_ref fields on tool objects

When a tool returns an object plus rich content, create_tool_callback unconditionally does map.insert("_goose_content_ref", token). If the tool schema already includes a _goose_content_ref property, this silently overwrites the real value and changes the returned object shape, so downstream TypeScript code that relies on that field receives corrupted data. This regression is triggered specifically for object-shaped tool results that include non-text content.

Useful? React with 👍 / 👎.

} else {
let return_val = serde_json::to_string_pretty(&output.output)
.unwrap_or_else(|_| json!(&output.output).to_string());
let mut contents = vec![Content::text(format_output(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use output.markdown() (https://github.com/portofcontext/pctx/blob/main/crates/pctx_code_mode/src/model.rs#L185) here and delete custom format_output. The execute_typescript tool description says that anything that is returned or console logged by the script will be returned to the model.

@eliasposen
Copy link
Copy Markdown
Contributor

@jh-block looks great! is there way to test this? Either an actual unit test or just something you have been doing locally in a session?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs_human label to set when a robot looks at a PR and can't handle it

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants