Conversation
…turn from summarization

Addresses review feedback:
- compute_tool_call_cutoff now uses the effective budget (context_limit * threshold), so users who set a low auto-compact threshold get more aggressive tool summarization
- tool_ids_to_summarize takes protect_last_n so tool calls from the current turn are never summarized, preventing the agent from eating its own tail in long tool sequences
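The effective-budget rule described above can be sketched as follows. This is a minimal illustration, not goose's actual implementation; the function signature is hypothetical, and only the "context_limit * threshold" rule comes from the changelog.

```rust
// Minimal sketch of the effective-budget rule. The signature is hypothetical;
// only "context_limit * threshold" comes from the changelog above.
fn effective_budget(context_limit: usize, auto_compact_threshold: f64) -> usize {
    (context_limit as f64 * auto_compact_threshold).round() as usize
}
```

With a threshold of 0.3, a 1M-token model is treated as having a 300k-token budget, which pulls the tool-summarization cutoff down proportionally.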
Addresses feedback from power users who regularly exceed 100 tool calls on large-context models. Changes:
- Upper clamp raised from 100 to 500 (1M context → cutoff 100, same as before; 5M context → cutoff 500 instead of capping at 100)
- Batch size is now cutoff/5 (minimum 3) instead of a fixed 3, so summarization fires in discrete bursts with breathing room before the next batch. E.g. at cutoff 100 the batch is 20 pairs, and summarization won't fire again for ~20 turns.
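The arithmetic above can be sketched like this. The divisor (10_000) and lower clamp (10) are assumptions chosen only to reproduce the two stated data points; the batch-size rule (cutoff/5, minimum 3) is taken directly from the description.

```rust
// Hypothetical sketch of the cutoff arithmetic: the divisor (10_000) and the
// lower clamp (10) are assumptions that reproduce the stated data points
// (1M context -> 100, 5M context -> 500) with the upper clamp raised to 500.
fn compute_tool_call_cutoff(context_limit: usize) -> usize {
    (context_limit / 10_000).clamp(10, 500)
}

// From the changelog: batch size is cutoff / 5, but never below 3, so
// summarization fires in discrete bursts rather than every turn.
fn batch_size(cutoff: usize) -> usize {
    (cutoff / 5).max(3)
}
```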
# Conflicts:
#   crates/goose/src/context_mgmt/mod.rs
…zation

# Conflicts:
#   crates/goose/src/agents/agent.rs
#   crates/goose/src/context_mgmt/mod.rs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b78ad36a03
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
crates/goose/src/agents/agent.rs (outdated)

```rust
conversation = Conversation::new_unvalidated(updated_messages.clone());
session_manager.add_message(&session_config.id, &summary_msg).await?;
conversation.push(summary_msg);
```
Preserve all generated summaries in agent conversation
When summaries has more than one entry (the new batch summarization path returns up to 10), this loop resets conversation from updated_messages.clone() on each iteration, but updated_messages never includes previously appended summary messages. As a result, once multiple tool pairs are hidden, only the last summary survives in the in-memory conversation used for subsequent turns, so the model loses context for earlier summarized tool calls.
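One way the suggested fix could look, sketched with a toy Conversation type (the real type lives in the goose crates; this only illustrates rebuilding once and then pushing every summary, instead of resetting from updated_messages inside the loop):

```rust
// Toy illustration of the fix: rebuild the conversation once, then push every
// summary, instead of resetting from updated_messages (which never contains
// earlier summaries) on each loop iteration.
struct Conversation(Vec<String>);

impl Conversation {
    fn new_unvalidated(messages: Vec<String>) -> Self {
        Conversation(messages)
    }
    fn push(&mut self, message: String) {
        self.0.push(message);
    }
}

fn apply_summaries(updated_messages: Vec<String>, summaries: Vec<String>) -> Conversation {
    let mut conversation = Conversation::new_unvalidated(updated_messages);
    for summary_msg in summaries {
        conversation.push(summary_msg);
    }
    conversation
}
```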
💡 Codex Review
Reviewed commit: 0415a30a63
```rust
fn tool_pair_summarization_enabled() -> bool {
    Config::global()
        .get_param::<bool>("GOOSE_TOOL_PAIR_SUMMARIZATION")
        .unwrap_or(true)
}
```
Gate background tool summaries for stateful providers
This default now enables background summarization for every provider (unwrap_or(true)), but maybe_summarize_tool_pairs calls provider.complete_fast using the active session_id, so the summarization prompt is sent as an extra model turn. That is unsafe for providers that keep conversation state internally per session, such as Claude Code (persistent CLI process and context per session_id in crates/goose/src/providers/claude_code.rs) and Gemini CLI (builds prompts from only the latest user message once a CLI session exists in crates/goose/src/providers/gemini_cli.rs). Once the tool-call threshold is crossed, these hidden summarization requests can pollute provider-side history and degrade subsequent replies/tool behavior; this should be gated by provider capability or isolated to a separate summarization session id.
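The second mitigation named above (isolating summarization under a separate session id) could look like this; the derived-id scheme is purely hypothetical and not goose's actual API:

```rust
// Hypothetical: derive a separate session id for hidden summarization calls so
// they cannot pollute provider-side history for the real session. The suffix
// scheme is an assumption, not goose's actual API.
fn summarization_session_id(session_id: &str) -> String {
    format!("{session_id}-tool-summarization")
}
```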
michaelneale left a comment
I think it's ok, though for very large contexts it can grow really large!
…zation

# Conflicts:
#   crates/goose/src/agents/agent.rs
💡 Codex Review
Reviewed commit: 1d0350dd30
crates/goose/src/agents/agent.rs (outdated)

```rust
let tool_pair_summarization_task = crate::context_mgmt::maybe_summarize_tool_pairs(
    self.provider().await?,
    session_config.id.clone(),
    conversation.clone(),
    tool_call_cut_off,
    current_turn_tool_count,
);
```
Abort background summarization when reply setup fails
maybe_summarize_tool_pairs is spawned before later fallible ? operations in the loop (including stream_response_from_provider(...).await?), so if the reply path exits early on provider/setup errors, the join handle is dropped without being awaited or aborted. Because summarization is now enabled by default, that detached task can still run hidden complete_fast calls (up to a batch) and consume tokens/credits even though the turn failed and no summaries are applied to session state. Please ensure this task is canceled/joined on early-return paths or spawn it only after fallible setup succeeds.
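One common shape for the fix suggested above is a drop guard around the background task, so any early `?` return cancels it before it spends tokens. The sketch below uses a shared AtomicBool as a stand-in for a real tokio JoinHandle (where JoinHandle::abort would do the actual cancellation); the guard type and its methods are hypothetical.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Drop-guard sketch: if the reply path returns early via `?` before the
// summarization task is awaited, dropping the guard signals cancellation.
// A shared AtomicBool stands in for tokio's JoinHandle::abort here.
struct AbortOnDrop {
    cancelled: Arc<AtomicBool>,
    completed: bool,
}

impl AbortOnDrop {
    fn new(cancelled: Arc<AtomicBool>) -> Self {
        AbortOnDrop { cancelled, completed: false }
    }

    // Call once the task has been properly awaited/joined; dropping the guard
    // afterwards then does nothing.
    fn mark_completed(mut self) {
        self.completed = true;
    }
}

impl Drop for AbortOnDrop {
    fn drop(&mut self) {
        if !self.completed {
            // Early return: signal the background task to stop.
            self.cancelled.store(true, Ordering::SeqCst);
        }
    }
}
```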
```rust
// See ENABLE_TOOL_PAIR_SUMMARIZATION constant above.
if !ENABLE_TOOL_PAIR_SUMMARIZATION {
    return None;
if !tool_pair_summarization_enabled() || provider.manages_own_context() {
```
Propagate stateful-context flag through provider wrappers
This guard only checks provider.manages_own_context() on the top-level provider, but the new trait default is false; wrappers like LeadWorkerProvider therefore report false unless they explicitly override it. In lead/worker configurations that wrap stateful providers (e.g., Claude Code or Gemini CLI), background tool summarization will still run and inject hidden summarization turns into the same session, which reintroduces the context-pollution issue this check is intended to prevent. The capability should be propagated from underlying providers when wrappers are used.
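The propagation the review asks for can be modeled with toy types (names follow the review's examples, not goose's real definitions): the wrapper reports that it manages its own context if any wrapped provider does.

```rust
// Toy model of the propagation: a wrapper provider reports that it manages
// its own context if any wrapped provider does. Type names are modeled on
// the review's examples, not goose's real definitions.
trait Provider {
    fn manages_own_context(&self) -> bool {
        false // the trait default described in the review
    }
}

struct StatefulCli; // e.g. Claude Code or Gemini CLI
impl Provider for StatefulCli {
    fn manages_own_context(&self) -> bool {
        true
    }
}

struct Stateless;
impl Provider for Stateless {}

struct LeadWorker {
    lead: Box<dyn Provider>,
    worker: Box<dyn Provider>,
}

impl Provider for LeadWorker {
    fn manages_own_context(&self) -> bool {
        // Propagate the capability instead of inheriting the false default.
        self.lead.manages_own_context() || self.worker.manages_own_context()
    }
}
```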
💡 Codex Review
Reviewed commit: 06b56d7e54
```rust
let context_limit = self
    .provider()
    .await
    .map(|p| p.get_model_config().context_limit())
    .unwrap_or(crate::model::DEFAULT_CONTEXT_LIMIT);
```
Compute tool cutoff from the active model context
The new dynamic cutoff is derived once from self.provider().get_model_config().context_limit() before the reply loop, but wrapper providers can change the active model per turn. In particular, LeadWorkerProvider::get_model_config() returns the lead model config (crates/goose/src/providers/lead_worker.rs), so when the worker model has a smaller context window this overestimates the cutoff and delays summarization, making worker turns much more likely to hit ContextLengthExceeded. Derive the cutoff from the model actually used for the turn (or a conservative minimum across wrapped providers).
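The conservative alternative suggested above could be sketched as follows; the fallback value stands in for DEFAULT_CONTEXT_LIMIT but the number itself is an assumption:

```rust
// Hypothetical: take the smallest context window across the models a wrapper
// might use during the turn, so the cutoff never overestimates the budget.
// The fallback constant stands in for DEFAULT_CONTEXT_LIMIT (value assumed).
fn conservative_context_limit(limits: &[usize]) -> usize {
    limits.iter().copied().min().unwrap_or(128_000)
}
```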
Somewhat true, yeah, but in general it doesn't matter that much for tool summarization since it is a bonus.
* origin/main:
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
* main: (22 commits)
  feat: add gemini-acp provider, update docs on subscription models + improvements to codex (#8000)
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
  fix: overwrite the deprecated googledrive extension config (#7974)
  refactor: remove unnecessary Arc<Mutex> from tool execution pipeline (#7979)
  Revert message flush & test (#7966)
  docs: add Remote Access section with Telegram Gateway documentation (#7955)
  fix: update webmcp blog post metadata image URL (#7967)
  fix: clean up OAuth token cache on provider deletion (#7908)
  fix: hard-coded tool call id in code mode callback (#7939)
  Fix SSE parsers to accept optional space after data: prefix (#7929)
  docs: add GOOSE_INPUT_LIMIT to config-files.md (#7961)
  Add WebMCP for Beginners blog post (#7957)
  Fix download manager (#7933)
  Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (#7626)
  ...
Co-authored-by: Michael Neale <[email protected]> Co-authored-by: Douwe Osinga <[email protected]> Signed-off-by: esnyder <[email protected]>
Summary
Resurrect tool call/response compacting based on @michaelneale's previous attempt. This adds a fixed batch size to be easier on the KV cache.