Skip to content

feat: return structured {stdout, stderr} from shell tool with output schema#7604

Merged
jh-block merged 10 commits intomainfrom
jhugo/shell-structured-output
Mar 2, 2026
Merged

feat: return structured {stdout, stderr} from shell tool with output schema#7604
jh-block merged 10 commits intomainfrom
jhugo/shell-structured-output

Conversation

@jh-block
Copy link
Copy Markdown
Collaborator

@jh-block jh-block commented Mar 2, 2026

Problem

When using code mode with local models, the model naturally writes result.stdout and result.stderr to access shell output. However, the shell tool was:

  1. Merging stdout and stderr into a single string
  2. Not declaring an output schema, so code mode generated Promise<any> as the return type
  3. Describing itself as "return both stdout/stderr" which implied a structured object

This caused result.stdout and result.stderr to both be undefined.

Changes

  • Structured output schema: added ShellOutput { stdout, stderr, exit_code, timed_out } with .with_output_schema::<ShellOutput>() so pctx_code_mode generates typed TypeScript interfaces instead of Promise<any>
  • Structured content on all paths: every return path (success, non-zero exit, timeout, parse error, spawn failure) attaches a ShellOutput-shaped structured_content so code-mode callers always get {stdout, stderr, exit_code, timed_out}
  • Interleaved display text: output is collected as tagged lines preserving chronological arrival order across stdout/stderr for the human-facing text content
  • Raw empty strings in structured fields: empty streams are "" in structured content (the "(no output)" placeholder is only used in display text)
  • Truncation on structured fields: stdout and stderr in structured_content are individually truncated via the same 2000-line/50KB limits, with overflow saved to per-stream temp files
  • Single-copy memory: output is stored once as tagged lines; stdout, stderr, and interleaved strings are derived on demand without duplicate buffering
  • Updated description: clarified that the tool returns an object with separate fields

Testing

All 24 developer extension tests pass, including a new test for per-label overflow file separation.

…schema

The shell tool now returns stdout and stderr as separate fields instead of
merging them into a single string. Adds an output schema so code mode
generates typed TypeScript interfaces (Promise<{stdout: string, stderr: string}>)
instead of Promise<any>, fixing local models that assume the return shape.
@jh-block jh-block requested a review from DOsinga March 2, 2026 16:20
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d344a89c4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…files per stream

Address code review feedback:
- Set structured_content on timeout and non-zero exit error results so
  code-mode callers get {stdout, stderr} even on failed commands
- Use per-label temp files for overflow so stdout and stderr don't
  overwrite each other when both exceed the limit
- Add test for separate overflow file paths
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cb0d0f3c5

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@jamadeo jamadeo requested a review from baxen March 2, 2026 16:45
…g for display

Address code review feedback:
- Add exit_code field to ShellOutput so code-mode callers can detect
  failure without relying on text parsing
- Restore interleaved stdout/stderr collection for the text display
  content, preserving chronological line ordering while still providing
  separate streams in structured_content
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70bff70692

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ntent on all error paths

Address code review feedback:
- Structured stdout/stderr fields now use raw empty strings instead of
  the '(no output)' placeholder, which is only used in the human-facing
  text content
- Early error returns (empty command, spawn failure) now include a
  ShellOutput-shaped structured_content so code-mode callers always get
  {stdout, stderr, exit_code} regardless of the error path
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0edf23052c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

The structured ShellOutput fields were using raw unbounded output,
bypassing the 2000-line/50KB limits that only applied to the display
text. Since code mode prefers structured_content, large commands could
return unbounded payloads. Now stdout and stderr in structured_content
are individually truncated via the same limits.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 29abe3690a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Store output as tagged lines (is_stderr, text) in ExecutionOutput — the
single source of truth. Derive stdout, stderr, and interleaved strings
on demand via split_lines() in shell_with_cwd. This avoids keeping 2x
the output in memory while preserving chronological interleaving in the
display text.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 214efcfd6f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Track whether each stream has started rather than checking if the
buffer is empty, so leading empty lines are correctly preserved.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be220e278c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Structured content now includes a timed_out boolean so code-mode
callers can reliably detect timeout failures without parsing text.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc88c354d7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

jh-block added 2 commits March 2, 2026 14:06
The shell match arm in DeveloperClient::call_tool now uses
ShellTool::error_result for parse errors, ensuring structured_content
matches the declared output schema on all error paths.
…d-output

* origin/main:
  Improve custom provider creation experience (#7541)
  fix(scheduler): schedules added via CLI showing up in UI (#7594)

/// Collect lines from stdout and stderr in arrival order, tagging each with its source.
/// Returns a vec of (is_stderr, line_text) preserving interleaved ordering.
async fn collect_tagged_lines(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just return the three buffers instead of packaging it up like that and then unpacking it again?

Copy link
Copy Markdown
Collaborator

@michaelneale michaelneale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems much better

@jh-block jh-block added this pull request to the merge queue Mar 2, 2026
Merged via the queue into main with commit 6702936 Mar 2, 2026
27 of 28 checks passed
@jh-block jh-block deleted the jhugo/shell-structured-output branch March 2, 2026 20:42
wpfleger96 added a commit that referenced this pull request Mar 3, 2026
* origin/main: (27 commits)
  feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
  Improve custom provider creation experience (#7541)
  fix(scheduler): schedules added via CLI showing up in UI (#7594)
  chore: openai reasoning model cleanup (#7529)
  chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
  chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
  chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
  Better network failure error & antrhopic retry (#7595)
  feat: make the text bar persistent and add a queue for messages (#7560)
  fix: outdated clippy command in goosehints (#7590)
  chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
  chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
  chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
  chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
  chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
  chore(deps): bump minimatch in /ui/desktop (#7572)
  fix: validate configure probe for streaming providers (#7564)
  Dockerfile: add missing build/runtime dependencies (#7546)
  fix(claude-code): Permission routing for smart-approve (#7501)
  Add base_path field to custom provider config (#7558)
  ...
lifeizhou-ap added a commit that referenced this pull request Mar 3, 2026
* main: (74 commits)
  feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
  Improve custom provider creation experience (#7541)
  fix(scheduler): schedules added via CLI showing up in UI (#7594)
  chore: openai reasoning model cleanup (#7529)
  chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
  chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
  chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
  Better network failure error & antrhopic retry (#7595)
  feat: make the text bar persistent and add a queue for messages (#7560)
  fix: outdated clippy command in goosehints (#7590)
  chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
  chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
  chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
  chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
  chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
  chore(deps): bump minimatch in /ui/desktop (#7572)
  fix: validate configure probe for streaming providers (#7564)
  Dockerfile: add missing build/runtime dependencies (#7546)
  fix(claude-code): Permission routing for smart-approve (#7501)
  Add base_path field to custom provider config (#7558)
  ...
tlongwell-block added a commit that referenced this pull request Mar 4, 2026
* origin/main:
  feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
  Improve custom provider creation experience (#7541)
  fix(scheduler): schedules added via CLI showing up in UI (#7594)
  chore: openai reasoning model cleanup (#7529)
  chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
  chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
  chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
  Better network failure error & antrhopic retry (#7595)
  feat: make the text bar persistent and add a queue for messages (#7560)
craigwalkeruk pushed a commit to craigwalkeruk/custom-goose that referenced this pull request Mar 5, 2026
tlongwell-block added a commit that referenced this pull request Mar 5, 2026
* origin/main: (107 commits)
  Merge platform/builtin extensions (#7630)
  Clean up stale references to removed components (#7644)
  fix: scope empty session reuse to current window to prevent session mixing (#7602)
  fix: prevent abort in local inference  (#7633)
  Revert git patch for llama-cpp-2 (#7642)
  docs: update recipe usage step to reflect auto-submit behavior (#7639)
  docs: add guide for customizing the sidebar (#7638)
  docs: update Claude Code approve behavior and model list in cli-providers guide (#7448)
  fix: restore provider and extensions for LRU-evicted sessions (#7616)
  Restore goosed logging (#7622)
  feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
  Improve custom provider creation experience (#7541)
  fix(scheduler): schedules added via CLI showing up in UI (#7594)
  chore: openai reasoning model cleanup (#7529)
  chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
  chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
  chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
  Better network failure error & antrhopic retry (#7595)
  feat: make the text bar persistent and add a queue for messages (#7560)
  fix: outdated clippy command in goosehints (#7590)
  ...

# Conflicts:
#	Cargo.lock
#	Cargo.toml
#	crates/goose-server/src/commands/agent.rs
#	crates/goose-server/src/main.rs
#	crates/goose-server/src/routes/reply.rs
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants