Conversation
Pull request overview
This PR centralizes provider max-token configuration by routing request token limits through ModelConfig::max_output_tokens() (instead of per-provider defaults / ad hoc overrides), and removes an Apps-specific max token override so canonical model limits drive app generation output size.
Changes:
- Replace provider-specific `max_tokens`/`max_new_tokens` defaults with `model_config.max_output_tokens()` across multiple request formatters.
- Always include output token limits in OpenAI (chat + responses) and Google request payloads.
- Remove Apps extension logic that forcibly set `max_tokens = 16384`, relying on the provider's canonical model config instead.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| crates/goose/src/providers/sagemaker_tgi.rs | Uses max_output_tokens() for TGI max_new_tokens instead of a hardcoded default. |
| crates/goose/src/providers/formats/snowflake.rs | Uses max_output_tokens() for Snowflake max_tokens field. |
| crates/goose/src/providers/formats/openai_responses.rs | Always inserts max_output_tokens derived from max_output_tokens(). |
| crates/goose/src/providers/formats/openai.rs | Always inserts either max_tokens or max_completion_tokens from max_output_tokens(). |
| crates/goose/src/providers/formats/google.rs | Always sets generation_config with max_output_tokens from max_output_tokens(). |
| crates/goose/src/providers/formats/databricks.rs | Uses max_output_tokens() for max token fields; refactors thinking budget parsing. |
| crates/goose/src/providers/formats/anthropic.rs | Refactors thinking budget parsing for Claude thinking mode. |
| crates/goose/src/model.rs | Changes the fallback returned by max_output_tokens() when unset. |
| crates/goose/src/agents/platform_extensions/apps.rs | Removes forced max_tokens = 16384 override for Apps tool calls. |
Comments suppressed due to low confidence (1)
crates/goose/src/providers/formats/anthropic.rs:444
`max_tokens + budget_tokens` can overflow `i32` if `CLAUDE_THINKING_BUDGET` is set large, potentially wrapping to a negative value in release builds; use a checked/saturating add (or compute in `i64` and clamp) before inserting into the payload.

```rust
payload
    .as_object_mut()
    .unwrap()
    .insert("max_tokens".to_string(), json!(max_tokens + budget_tokens));
```
```rust
// With thinking enabled, max_tokens must include both output and thinking budget
let max_tokens = model_config.max_output_tokens() + budget_tokens;
payload
    .as_object_mut()
    .unwrap()
    .insert("max_tokens".to_string(), json!(max_tokens));
```
`model_config.max_output_tokens() + budget_tokens` is an `i32` addition that can overflow if `CLAUDE_THINKING_BUDGET` is set large, which can wrap to a negative `max_tokens` in release builds; use checked/saturating arithmetic (or compute in `i64` and clamp) before serializing.
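A minimal sketch of the suggested fix, with `max_output` and `budget_tokens` standing in for the real values (names are illustrative, not the crate's actual API):

```rust
// Hedged sketch: combine output and thinking budgets without i32 overflow.
// `max_output` stands in for model_config.max_output_tokens(); `budget_tokens`
// for the parsed CLAUDE_THINKING_BUDGET value.
fn combined_max_tokens(max_output: i32, budget_tokens: i32) -> i32 {
    // saturating_add pins at i32::MAX instead of wrapping negative in release builds
    max_output.saturating_add(budget_tokens)
}

fn main() {
    assert_eq!(combined_max_tokens(16_384, 1_024), 17_408);
    // A pathological budget saturates rather than wrapping to a negative value:
    assert_eq!(combined_max_tokens(16_384, i32::MAX), i32::MAX);
    println!("ok");
}
```

Computing in `i64` and clamping to a provider-specific ceiling would work equally well; the key point is that the sum never goes negative.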
| "parameters": { | ||
| "max_new_tokens": self.model.max_tokens.unwrap_or(150), | ||
| "max_new_tokens": self.model.max_output_tokens(), | ||
| "temperature": self.model.temperature.unwrap_or(0.7), |
Switching TGI's `max_new_tokens` from a small provider-specific default (150) to `model.max_output_tokens()` means an unset/unknown model will now request 16,384 tokens by default, which is likely to cause slow/expensive generations or SageMaker endpoint errors; consider keeping a conservative TGI fallback (e.g., use `max_tokens` only when explicitly set, otherwise default to ~150 or a configurable env var).
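A sketch of the fallback this comment proposes, assuming `explicit_max_tokens` models `self.model.max_tokens` (the function name and constant are hypothetical):

```rust
// Hedged sketch of the proposed TGI fallback: forward an explicit limit when
// the model sets one, otherwise keep a conservative TGI-sized default.
const TGI_DEFAULT_MAX_NEW_TOKENS: i32 = 150; // conservative, per the review comment

fn tgi_max_new_tokens(explicit_max_tokens: Option<i32>) -> i32 {
    explicit_max_tokens.unwrap_or(TGI_DEFAULT_MAX_NEW_TOKENS)
}

fn main() {
    assert_eq!(tgi_max_new_tokens(None), 150);
    assert_eq!(tgi_max_new_tokens(Some(4_096)), 4_096);
    println!("ok");
}
```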
```rust
let budget_tokens: i32 = std::env::var("CLAUDE_THINKING_BUDGET")
    .ok()
    .and_then(|s| s.parse().ok())
    .unwrap_or(DEFAULT_THINKING_BUDGET);
```
The comment says Anthropic requires `budget_tokens >= 1024`, but `CLAUDE_THINKING_BUDGET` is parsed without enforcing that minimum (or preventing negatives), which can lead to Databricks request rejection; clamp/validate the parsed value to at least 1024 before using it.
Suggested change:

```diff
-    .unwrap_or(DEFAULT_THINKING_BUDGET);
+    .unwrap_or(DEFAULT_THINKING_BUDGET)
+    .max(1024);
```
Co-authored-by: Copilot <[email protected]>
```rust
    }

    // Priority 2: Global default
    4_096
```
Was separately thinking about making this bigger. Too small seems strictly worse as it can lead to truncation.
I had it higher for a bit, but presumably we almost always have a real value here so then erring on a lower side is probably better
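The fallback order being debated can be sketched as follows (a hypothetical shape of `max_output_tokens()`; the real method lives in `crates/goose/src/model.rs`):

```rust
// Hedged sketch: illustrates only the priority order discussed above, not the
// crate's actual ModelConfig definition.
struct ModelConfig {
    max_tokens: Option<i32>, // Priority 1: explicitly configured model limit
}

impl ModelConfig {
    fn max_output_tokens(&self) -> i32 {
        // Priority 2: global default when no real value is present
        self.max_tokens.unwrap_or(4_096)
    }
}

fn main() {
    assert_eq!(ModelConfig { max_tokens: None }.max_output_tokens(), 4_096);
    assert_eq!(ModelConfig { max_tokens: Some(8_192) }.max_output_tokens(), 8_192);
    println!("ok");
}
```

Since most models carry a real configured limit, the global default mainly matters for unknown models, which is the reason given above for erring on the low side.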
Summary
Makes the canonical model config drive the output token limits we set for each provider.