feat: return structured {stdout, stderr} from shell tool with output schema#7604
feat: return structured {stdout, stderr} from shell tool with output schema#7604
Conversation
…schema
The shell tool now returns stdout and stderr as separate fields instead of
merging them into a single string. Adds an output schema so code mode
generates typed TypeScript interfaces (Promise<{stdout: string, stderr: string}>)
instead of Promise<any>, fixing local models that assume the return shape.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d344a89c4d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…files per stream
Address code review feedback:
- Set structured_content on timeout and non-zero exit error results so
code-mode callers get {stdout, stderr} even on failed commands
- Use per-label temp files for overflow so stdout and stderr don't
overwrite each other when both exceed the limit
- Add test for separate overflow file paths
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cb0d0f3c5
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…g for display Address code review feedback: - Add exit_code field to ShellOutput so code-mode callers can detect failure without relying on text parsing - Restore interleaved stdout/stderr collection for the text display content, preserving chronological line ordering while still providing separate streams in structured_content
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 70bff70692
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
…ntent on all error paths
Address code review feedback:
- Structured stdout/stderr fields now use raw empty strings instead of
the '(no output)' placeholder, which is only used in the human-facing
text content
- Early error returns (empty command, spawn failure) now include a
ShellOutput-shaped structured_content so code-mode callers always get
{stdout, stderr, exit_code} regardless of the error path
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0edf23052c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The structured ShellOutput fields were using raw unbounded output, bypassing the 2000-line/50KB limits that only applied to the display text. Since code mode prefers structured_content, large commands could return unbounded payloads. Now stdout and stderr in structured_content are individually truncated via the same limits.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29abe3690a
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Store output as tagged lines (is_stderr, text) in ExecutionOutput — the single source of truth. Derive stdout, stderr, and interleaved strings on demand via split_lines() in shell_with_cwd. This avoids keeping 2x the output in memory while preserving chronological interleaving in the display text.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 214efcfd6f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Track whether each stream has started rather than checking if the buffer is empty, so leading empty lines are correctly preserved.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be220e278c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Structured content now includes a timed_out boolean so code-mode callers can reliably detect timeout failures without parsing text.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc88c354d7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The shell match arm in DeveloperClient::call_tool now uses ShellTool::error_result for parse errors, ensuring structured_content matches the declared output schema on all error paths.
|
|
||
| /// Collect lines from stdout and stderr in arrival order, tagging each with its source. | ||
| /// Returns a vec of (is_stderr, line_text) preserving interleaved ordering. | ||
| async fn collect_tagged_lines( |
There was a problem hiding this comment.
why not just return the three buffers instead of packaging it up like that and then unpacking it again?
* origin/main: (27 commits)
feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
Improve custom provider creation experience (#7541)
fix(scheduler): schedules added via CLI showing up in UI (#7594)
chore: openai reasoning model cleanup (#7529)
chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
Better network failure error & antrhopic retry (#7595)
feat: make the text bar persistent and add a queue for messages (#7560)
fix: outdated clippy command in goosehints (#7590)
chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
chore(deps): bump minimatch in /ui/desktop (#7572)
fix: validate configure probe for streaming providers (#7564)
Dockerfile: add missing build/runtime dependencies (#7546)
fix(claude-code): Permission routing for smart-approve (#7501)
Add base_path field to custom provider config (#7558)
...
* main: (74 commits)
feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
Improve custom provider creation experience (#7541)
fix(scheduler): schedules added via CLI showing up in UI (#7594)
chore: openai reasoning model cleanup (#7529)
chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
Better network failure error & antrhopic retry (#7595)
feat: make the text bar persistent and add a queue for messages (#7560)
fix: outdated clippy command in goosehints (#7590)
chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
chore(deps): bump minimatch in /ui/desktop (#7572)
fix: validate configure probe for streaming providers (#7564)
Dockerfile: add missing build/runtime dependencies (#7546)
fix(claude-code): Permission routing for smart-approve (#7501)
Add base_path field to custom provider config (#7558)
...
* origin/main:
feat: return structured {stdout, stderr} from shell tool with output schema (#7604)
Improve custom provider creation experience (#7541)
fix(scheduler): schedules added via CLI showing up in UI (#7594)
chore: openai reasoning model cleanup (#7529)
chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585)
chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
Better network failure error & antrhopic retry (#7595)
feat: make the text bar persistent and add a queue for messages (#7560)
* origin/main: (107 commits) Merge platform/builtin extensions (#7630) Clean up stale references to removed components (#7644) fix: scope empty session reuse to current window to prevent session mixing (#7602) fix: prevent abort in local inference (#7633) Revert git patch for llama-cpp-2 (#7642) docs: update recipe usage step to reflect auto-submit behavior (#7639) docs: add guide for customizing the sidebar (#7638) docs: update Claude Code approve behavior and model list in cli-providers guide (#7448) fix: restore provider and extensions for LRU-evicted sessions (#7616) Restore goosed logging (#7622) feat: return structured {stdout, stderr} from shell tool with output schema (#7604) Improve custom provider creation experience (#7541) fix(scheduler): schedules added via CLI showing up in UI (#7594) chore: openai reasoning model cleanup (#7529) chore(deps): bump hono from 4.12.1 to 4.12.3 in /evals/open-model-gym/mcp-harness (#7585) chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498) chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368) Better network failure error & antrhopic retry (#7595) feat: make the text bar persistent and add a queue for messages (#7560) fix: outdated clippy command in goosehints (#7590) ... # Conflicts: # Cargo.lock # Cargo.toml # crates/goose-server/src/commands/agent.rs # crates/goose-server/src/main.rs # crates/goose-server/src/routes/reply.rs
Problem
When using code mode with local models, the model naturally writes
result.stdoutandresult.stderrto access shell output. However, the shell tool was:Promise<any>as the return typeThis caused
result.stdoutandresult.stderrto both beundefined.Changes
ShellOutput { stdout, stderr, exit_code, timed_out }with.with_output_schema::<ShellOutput>()sopctx_code_modegenerates typed TypeScript interfaces instead ofPromise<any>ShellOutput-shapedstructured_contentso code-mode callers always get{stdout, stderr, exit_code, timed_out}""in structured content (the"(no output)"placeholder is only used in display text)structured_contentare individually truncated via the same 2000-line/50KB limits, with overflow saved to per-stream temp filesTesting
All 24 developer extension tests pass, including a new test for per-label overflow file separation.