Optimize tool summarization#7938

Merged
DOsinga merged 12 commits into main from optimize-tool-summarization
Mar 18, 2026

Conversation

@DOsinga
Collaborator

@DOsinga DOsinga commented Mar 16, 2026

Summary

Resurrect tool call/response compacting based on @michaelneale's previous attempt. This adds a fixed batch size to be easier on the KV cache.

michaelneale and others added 8 commits February 24, 2026 20:13
…turn from summarization

Addresses review feedback:
- compute_tool_call_cutoff now uses effective budget (context_limit * threshold)
  so users who set a low auto-compact threshold get more aggressive tool summarization
- tool_ids_to_summarize takes protect_last_n to never summarize tool calls from the
  current turn, preventing the agent from eating its own tail in long tool sequences
Addresses feedback from power users who regularly exceed 100 tool calls
on large context models. Changes:
- Upper clamp raised from 100 to 500 (1M context → cutoff 100, same as
  before; but 5M context → cutoff 500 instead of capping at 100)
- Batch size is now cutoff/5 (min 3) instead of fixed 3, so summarization
  fires in discrete bursts with breathing room before the next batch.
  e.g. cutoff 100: batch 20 pairs, won't fire again for ~20 turns.
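The cutoff/batch arithmetic described above can be sketched as follows. The per-call divisor and the lower clamp are assumptions inferred from the 1M → 100 and 5M → 500 examples, not the actual constants in crates/goose/src/context_mgmt:

```rust
// Sketch of the dynamic cutoff and batch sizing. The divisor (10_000
// context tokens per allowed tool call) and the lower clamp (10) are
// assumptions inferred from the examples above; the upper clamp of 500
// and the cutoff/5 (min 3) batch rule come from the PR itself.
fn compute_tool_call_cutoff(context_limit: usize, threshold: f64) -> usize {
    // Effective budget honors a user-set auto-compact threshold, so a low
    // threshold yields more aggressive tool summarization.
    let effective_budget = (context_limit as f64 * threshold) as usize;
    (effective_budget / 10_000).clamp(10, 500)
}

fn batch_size(cutoff: usize) -> usize {
    // Summarization fires in bursts of cutoff/5 pairs, never fewer than 3,
    // leaving breathing room before the next batch.
    (cutoff / 5).max(3)
}

fn main() {
    assert_eq!(compute_tool_call_cutoff(1_000_000, 1.0), 100); // 1M context
    assert_eq!(compute_tool_call_cutoff(5_000_000, 1.0), 500); // 5M context
    assert_eq!(batch_size(100), 20); // ~20 turns of breathing room
    println!("ok");
}
```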
# Conflicts:
#	crates/goose/src/context_mgmt/mod.rs
…zation

# Conflicts:
#	crates/goose/src/agents/agent.rs
#	crates/goose/src/context_mgmt/mod.rs

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b78ad36a03

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1709 to +1711
conversation = Conversation::new_unvalidated(updated_messages.clone());
session_manager.add_message(&session_config.id, &summary_msg).await?;
conversation.push(summary_msg);

P1 Badge Preserve all generated summaries in agent conversation

When summaries has more than one entry (the new batch summarization path returns up to 10), this loop resets conversation from updated_messages.clone() on each iteration, but updated_messages never includes previously appended summary messages. As a result, once multiple tool pairs are hidden, only the last summary survives in the in-memory conversation used for subsequent turns, so the model loses context for earlier summarized tool calls.

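A minimal sketch of the accumulation the reviewer is asking for, using illustrative types rather than goose's actual Conversation API:

```rust
// Sketch of the suggested fix: build the conversation once from the
// updated messages, then append every summary, instead of resetting the
// conversation from updated_messages on each loop iteration (which kept
// only the last summary). Message is a stand-in type, not goose's.
#[derive(Clone, Debug, PartialEq)]
struct Message(String);

fn apply_summaries(updated_messages: Vec<Message>, summaries: Vec<Message>) -> Vec<Message> {
    let mut conversation = updated_messages; // constructed once, outside the loop
    for summary_msg in summaries {
        conversation.push(summary_msg); // every summary survives in memory
    }
    conversation
}

fn main() {
    let updated = vec![Message("user turn".into())];
    let summaries = vec![Message("summary 1".into()), Message("summary 2".into())];
    let conversation = apply_summaries(updated, summaries);
    assert_eq!(conversation.len(), 3); // both summaries retained, not just the last
    println!("ok");
}
```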


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0415a30a63


Comment on lines +23 to +26
fn tool_pair_summarization_enabled() -> bool {
Config::global()
.get_param::<bool>("GOOSE_TOOL_PAIR_SUMMARIZATION")
.unwrap_or(true)

P1 Badge Gate background tool summaries for stateful providers

This default now enables background summarization for every provider (unwrap_or(true)), but maybe_summarize_tool_pairs calls provider.complete_fast using the active session_id, so the summarization prompt is sent as an extra model turn. That is unsafe for providers that keep conversation state internally per session, such as Claude Code (persistent CLI process and context per session_id in crates/goose/src/providers/claude_code.rs) and Gemini CLI (builds prompts from only the latest user message once a CLI session exists in crates/goose/src/providers/gemini_cli.rs). Once the tool-call threshold is crossed, these hidden summarization requests can pollute provider-side history and degrade subsequent replies/tool behavior; this should be gated by provider capability or isolated to a separate summarization session id.


Collaborator

@michaelneale michaelneale left a comment


I think ok - as long as for very large contexts it can grow really large!

Douwe Osinga added 2 commits March 18, 2026 06:34

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1d0350dd30


Comment on lines 1194 to 1200
let tool_pair_summarization_task = crate::context_mgmt::maybe_summarize_tool_pairs(
self.provider().await?,
session_config.id.clone(),
conversation.clone(),
tool_call_cut_off,
current_turn_tool_count,
);

P2 Badge Abort background summarization when reply setup fails

maybe_summarize_tool_pairs is spawned before later fallible ? operations in the loop (including stream_response_from_provider(...).await?), so if the reply path exits early on provider/setup errors, the join handle is dropped without being awaited or aborted. Because summarization is now enabled by default, that detached task can still run hidden complete_fast calls (up to a batch) and consume tokens/credits even though the turn failed and no summaries are applied to session state. Please ensure this task is canceled/joined on early-return paths or spawn it only after fallible setup succeeds.

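One common shape for the fix suggested above is a guard that cancels the background task on drop, so any early `?` return in the reply loop tears the task down. The example below uses a plain cancellation flag to stay dependency-free; in goose the background task is a tokio JoinHandle, where the same Drop guard would call handle.abort() instead:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Sketch of a cancel-on-drop guard. Names are illustrative, not goose's;
// only the pattern (cleanup in Drop fires on every exit path) is the point.
struct CancelOnDrop(Arc<AtomicBool>);

impl Drop for CancelOnDrop {
    fn drop(&mut self) {
        // Runs on every exit path, including `?` early returns.
        self.0.store(true, Ordering::SeqCst);
    }
}

fn main() {
    let cancelled = Arc::new(AtomicBool::new(false));
    {
        let _guard = CancelOnDrop(cancelled.clone());
        // ... fallible reply-loop setup; an error here would drop the guard ...
    }
    assert!(cancelled.load(Ordering::SeqCst));
    println!("ok");
}
```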

// See ENABLE_TOOL_PAIR_SUMMARIZATION constant above.
if !ENABLE_TOOL_PAIR_SUMMARIZATION {
return None;
if !tool_pair_summarization_enabled() || provider.manages_own_context() {

P2 Badge Propagate stateful-context flag through provider wrappers

This guard only checks provider.manages_own_context() on the top-level provider, but the new trait default is false; wrappers like LeadWorkerProvider therefore report false unless they explicitly override it. In lead/worker configurations that wrap stateful providers (e.g., Claude Code or Gemini CLI), background tool summarization will still run and inject hidden summarization turns into the same session, which reintroduces the context-pollution issue this check is intended to prevent. The capability should be propagated from underlying providers when wrappers are used.

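The propagation the reviewer suggests could look like the sketch below: a wrapper provider reports "manages own context" if any wrapped provider does, so the summarization gate sees the underlying capability. The trait method name mirrors manages_own_context() from the diff; everything else (LeadWorker, Stateful) is illustrative:

```rust
// Hypothetical provider trait; only manages_own_context() mirrors the PR.
trait Provider {
    fn manages_own_context(&self) -> bool {
        false // trait default, as in the diff under review
    }
}

// Stand-in for a stateful provider such as Claude Code or Gemini CLI.
struct Stateful;
impl Provider for Stateful {
    fn manages_own_context(&self) -> bool {
        true
    }
}

struct LeadWorker<A: Provider, B: Provider> {
    lead: A,
    worker: B,
}

impl<A: Provider, B: Provider> Provider for LeadWorker<A, B> {
    fn manages_own_context(&self) -> bool {
        // Conservative propagation: stateful if either inner provider is.
        self.lead.manages_own_context() || self.worker.manages_own_context()
    }
}

fn main() {
    struct Plain;
    impl Provider for Plain {}
    let lw = LeadWorker { lead: Plain, worker: Stateful };
    assert!(lw.manages_own_context()); // wrapper no longer reports false
    println!("ok");
}
```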

@DOsinga DOsinga added this pull request to the merge queue Mar 18, 2026
@DOsinga DOsinga removed this pull request from the merge queue due to a manual request Mar 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06b56d7e54


Comment on lines +370 to +374
let context_limit = self
.provider()
.await
.map(|p| p.get_model_config().context_limit())
.unwrap_or(crate::model::DEFAULT_CONTEXT_LIMIT);

P2 Badge Compute tool cutoff from the active model context

The new dynamic cutoff is derived once from self.provider().get_model_config().context_limit() before the reply loop, but wrapper providers can change the active model per turn. In particular, LeadWorkerProvider::get_model_config() returns the lead model config (crates/goose/src/providers/lead_worker.rs), so when the worker model has a smaller context window this overestimates the cutoff and delays summarization, making worker turns much more likely to hit ContextLengthExceeded. Derive the cutoff from the model actually used for the turn (or a conservative minimum across wrapped providers).

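The conservative option the reviewer mentions could be as simple as sizing the cutoff for the smallest context window across wrapped models. A dependency-free sketch with a hypothetical function name:

```rust
// Take the minimum context limit across wrapped providers, so the cutoff
// is safe for whichever model actually runs the turn. Name is hypothetical.
fn conservative_context_limit(limits: &[usize]) -> Option<usize> {
    limits.iter().copied().min()
}

fn main() {
    // Lead model 1M tokens, worker model 200k: size the cutoff for 200k.
    assert_eq!(conservative_context_limit(&[1_000_000, 200_000]), Some(200_000));
    println!("ok");
}
```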

Collaborator Author


Somewhat true, yeah, but in general it doesn't matter that much for tool summarization since it is a bonus

@DOsinga DOsinga added this pull request to the merge queue Mar 18, 2026
Merged via the queue into main with commit 475968d Mar 18, 2026
23 checks passed
@DOsinga DOsinga deleted the optimize-tool-summarization branch March 18, 2026 13:15
michaelneale added a commit that referenced this pull request Mar 19, 2026
* origin/main:
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
lifeizhou-ap added a commit that referenced this pull request Mar 20, 2026
* main: (22 commits)
  feat: add gemini-acp provider, update docs on subscription models + improvements to codex (#8000)
  fix(openai): use Responses API for gpt-5.4 (#7982)
  Remove lead/worker provider (#7989)
  chore(release): release version 1.28.0 (#7991)
  Fix empty tool results from resource content (e.g. auto visualiser) (#7866)
  Separate SSE streaming from POST work submission (#7834)
  fix: include token usage in Databricks streaming responses (#7959)
  Optimize tool summarization (#7938)
  fix: overwrite the deprecated googledrive extension config (#7974)
  refactor: remove unnecessary Arc<Mutex> from tool execution pipeline (#7979)
  Revert message flush & test (#7966)
  docs: add Remote Access section with Telegram Gateway documentation (#7955)
  fix: update webmcp blog post metadata image URL (#7967)
  fix: clean up OAuth token cache on provider deletion (#7908)
  fix: hard-coded tool call id in code mode callback (#7939)
  Fix SSE parsers to accept optional space after data: prefix (#7929)
  docs: add GOOSE_INPUT_LIMIT to config-files.md (#7961)
  Add WebMCP for Beginners blog post (#7957)
  Fix download manager (#7933)
  Improve the formatting of tool calls, show thinking, treat Reasoning and Thinking as the same thing (sorry Kant) (#7626)
  ...
elijahsgh pushed a commit to elijahsgh/goose that referenced this pull request Mar 21, 2026
Co-authored-by: Michael Neale <[email protected]>
Co-authored-by: Douwe Osinga <[email protected]>
Signed-off-by: esnyder <[email protected]>