
Apps token limit #7474

Merged
DOsinga merged 5 commits into main from apps-token-limit on Feb 24, 2026

Conversation

DOsinga (Collaborator) commented Feb 24, 2026

Summary

Makes the canonical model config drive the output token limits that we set for the providers.

Copilot AI (Contributor) left a comment


Pull request overview

This PR centralizes provider max-token configuration by routing request token limits through ModelConfig::max_output_tokens() (instead of per-provider defaults / ad hoc overrides), and removes an Apps-specific max token override so canonical model limits drive app generation output size.

Changes:

  • Replace provider-specific max_tokens/max_new_tokens defaults with model_config.max_output_tokens() across multiple request formatters.
  • Always include output token limits in OpenAI (chat + responses) and Google request payloads.
  • Remove Apps extension logic that forcibly set max_tokens = 16384, relying on the provider’s canonical model config instead.
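The "always include output token limits" change for the OpenAI formatter can be sketched roughly as follows. This is an illustrative stand-in, not the merged code: the `uses_completion_tokens` flag and the payload-as-map shape are assumptions, and the real formatters write into serde_json objects.

```rust
use std::collections::BTreeMap;

// Sketch: always write a limit into the request payload, choosing the field
// name by API flavor, with the value taken from the canonical model config
// rather than a per-provider default.
fn set_output_limit(
    payload: &mut BTreeMap<String, i64>,
    canonical_limit: i32,
    uses_completion_tokens: bool,
) {
    let key = if uses_completion_tokens {
        "max_completion_tokens"
    } else {
        "max_tokens"
    };
    payload.insert(key.to_string(), i64::from(canonical_limit));
}
```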

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Summary per file:

  • crates/goose/src/providers/sagemaker_tgi.rs: Uses max_output_tokens() for TGI max_new_tokens instead of a hardcoded default.
  • crates/goose/src/providers/formats/snowflake.rs: Uses max_output_tokens() for the Snowflake max_tokens field.
  • crates/goose/src/providers/formats/openai_responses.rs: Always inserts max_output_tokens derived from max_output_tokens().
  • crates/goose/src/providers/formats/openai.rs: Always inserts either max_tokens or max_completion_tokens from max_output_tokens().
  • crates/goose/src/providers/formats/google.rs: Always sets generation_config with max_output_tokens from max_output_tokens().
  • crates/goose/src/providers/formats/databricks.rs: Uses max_output_tokens() for max token fields; refactors thinking-budget parsing.
  • crates/goose/src/providers/formats/anthropic.rs: Refactors thinking-budget parsing for Claude thinking mode.
  • crates/goose/src/model.rs: Changes the fallback returned by max_output_tokens() when unset.
  • crates/goose/src/agents/platform_extensions/apps.rs: Removes the forced max_tokens = 16384 override for Apps tool calls.
Comments suppressed due to low confidence (1)

crates/goose/src/providers/formats/anthropic.rs:444

  • max_tokens + budget_tokens can overflow i32 if CLAUDE_THINKING_BUDGET is set large, potentially wrapping to a negative value in release builds; use a checked/saturating add (or compute in i64 and clamp) before inserting into the payload.
        payload
            .as_object_mut()
            .unwrap()
            .insert("max_tokens".to_string(), json!(max_tokens + budget_tokens));

Comment on lines +623 to +628

        // With thinking enabled, max_tokens must include both output and thinking budget
        let max_tokens = model_config.max_output_tokens() + budget_tokens;
        payload
            .as_object_mut()
            .unwrap()
            .insert("max_tokens".to_string(), json!(max_tokens));

Copilot AI Feb 24, 2026


model_config.max_output_tokens() + budget_tokens is an i32 addition that can overflow if CLAUDE_THINKING_BUDGET is set large, which can wrap to a negative max_tokens in release builds; use checked/saturating arithmetic (or compute in i64 and clamp) before serializing.

DOsinga (Collaborator, Author)

i32 is 2 billion
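For reference, the reviewer's suggestion can be sketched as a one-line defensive change. Even though i32 tops out around 2.1 billion, a saturating add guarantees the combined limit can never wrap negative if CLAUDE_THINKING_BUDGET is set absurdly large. Names here are illustrative, not the merged code.

```rust
// Combine the output limit and the thinking budget without risking i32
// wraparound: saturating_add caps at i32::MAX instead of wrapping negative
// in release builds.
fn combined_max_tokens(max_output_tokens: i32, budget_tokens: i32) -> i32 {
    max_output_tokens.saturating_add(budget_tokens)
}
```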

Comment on lines 151 to 153

        "parameters": {
-           "max_new_tokens": self.model.max_tokens.unwrap_or(150),
+           "max_new_tokens": self.model.max_output_tokens(),
            "temperature": self.model.temperature.unwrap_or(0.7),

Copilot AI Feb 24, 2026


Switching TGI's max_new_tokens from a small provider-specific default (150) to model.max_output_tokens() means an unset/unknown model will now request 16,384 tokens by default, which is likely to cause slow/expensive generations or SageMaker endpoint errors; consider keeping a conservative TGI fallback (e.g., use max_tokens only when explicitly set, otherwise default to ~150 or a configurable env var).

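The conservative TGI fallback the comment above proposes could look roughly like this. This is a hedged sketch of the reviewer's alternative, not what was merged: the value 150 matches the old default, but the env-var name is invented for illustration.

```rust
// Use an explicit per-model limit when one is set; otherwise fall back to a
// conservative TGI default, optionally overridable via an environment
// variable (name hypothetical).
fn tgi_max_new_tokens(explicit: Option<i32>) -> i32 {
    explicit
        .or_else(|| {
            std::env::var("TGI_MAX_NEW_TOKENS") // hypothetical override
                .ok()
                .and_then(|s| s.parse().ok())
        })
        .unwrap_or(150)
}
```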
    let budget_tokens: i32 = std::env::var("CLAUDE_THINKING_BUDGET")
        .ok()
        .and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_THINKING_BUDGET);

Copilot AI Feb 24, 2026


The comment says Anthropic requires budget_tokens >= 1024, but CLAUDE_THINKING_BUDGET is parsed without enforcing that minimum (or preventing negatives), which can lead to Databricks request rejection; clamp/validate the parsed value to at least 1024 before using it.

Suggested change:

-       .unwrap_or(DEFAULT_THINKING_BUDGET);
+       .unwrap_or(DEFAULT_THINKING_BUDGET)
+       .max(1024);

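The clamped parse suggested above can be sketched as a small testable helper. The default value here is an assumption (only the 1024 minimum comes from the Anthropic constraint cited in the review), and taking the raw string as a parameter rather than reading the env var directly is a testability choice, not the crate's API.

```rust
const DEFAULT_THINKING_BUDGET: i32 = 16_000; // hypothetical default

// Parse a CLAUDE_THINKING_BUDGET-style value, falling back to the default on
// missing or unparseable input, and clamp to the documented minimum so a tiny
// or negative value cannot produce a request the API rejects.
fn thinking_budget(raw: Option<&str>) -> i32 {
    raw.and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_THINKING_BUDGET)
        .max(1024) // Anthropic requires budget_tokens >= 1024
}
```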
Copilot AI review requested due to automatic review settings February 24, 2026 16:58
Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

katzdave (Contributor) left a comment


lgtm!

}

// Priority 2: Global default
4_096
Contributor


Was separately thinking about making this bigger. Too small seems strictly worse as it can lead to truncation.

DOsinga (Collaborator, Author)


I had it higher for a bit, but presumably we almost always have a real value here, so erring on the lower side is probably better.
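The fallback order being discussed can be reconstructed as a sketch. The field name and "Priority 1" source are assumptions; only the 4_096 global default is visible in the diff above.

```rust
// Hypothetical stand-in for the crate's ModelConfig; only max_output_tokens()
// mirrors the method the PR routes through.
struct ModelConfig {
    max_tokens: Option<i32>,
}

impl ModelConfig {
    fn max_output_tokens(&self) -> i32 {
        // Priority 1: explicitly configured limit
        if let Some(t) = self.max_tokens {
            return t;
        }
        // Priority 2: Global default
        4_096
    }
}
```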

@DOsinga DOsinga added this pull request to the merge queue Feb 24, 2026
Merged via the queue into main with commit e7b1566 Feb 24, 2026
21 checks passed
@DOsinga DOsinga deleted the apps-token-limit branch February 24, 2026 19:46
zanesq added a commit that referenced this pull request Feb 25, 2026
…m-cache

* 'main' of github.com:block/goose:
  chore(release): release version 1.25.0 (minor) (#7263)
  Docs: Community all-stars and page update (#7483)
  fix: searchbar zindex modal overlay (#7502)
  blog: Order Lunch Without Leaving Your AI Agent (#7505)
  feat: gateway to chat to goose - telegram etc (#7199)
  Option to add changeable defaults in goose-releases (#7373)
  Fix out of order messages (#7472)
  fix: ensure latest session always displays in sidebar (#7489)
  docs: rename TLDR to Quick Install in MCP tutorials (#7493)
  docs: update Neighborhood extension page with video embed and layout improvements (#7473)
  feat: let AskAI Discord bot see channels in the server (#7480)
  Apps token limit (#7474)
  fix(apps): declare color-scheme to allow transparent MCP App iframes (#7479)
u35tpus pushed a commit to u35tpus/goose that referenced this pull request Feb 26, 2026
Co-authored-by: Douwe Osinga <[email protected]>
Co-authored-by: Copilot <[email protected]>
Signed-off-by: Oleg Levchenko <[email protected]>

3 participants