Skip to content

feat: Add summarize tool for deterministic reads#7054

Merged
DOsinga merged 9 commits intomainfrom
baxen/summarize
Mar 11, 2026
Merged

feat: Add summarize tool for deterministic reads#7054
DOsinga merged 9 commits intomainfrom
baxen/summarize

Conversation

@baxen
Copy link
Copy Markdown
Collaborator

@baxen baxen commented Feb 6, 2026

Introduces a built-in summarize tool that lets the agent load files/directories and get an LLM-generated summary in a single call—a lightweight alternative to spawning a subagent when you already know what to analyze.

Copilot AI review requested due to automatic review settings February 6, 2026 19:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new built-in summarize tool to the goose agent so it can deterministically load local files/directories and ask the configured provider for a focused summary in a single tool call (avoiding spinning up a subagent when the scope is already known).

Changes:

  • Registers a new summarize tool in the agent tool list and dispatches tool calls to its handler.
  • Implements file collection (with basic .gitignore support), prompt construction, and provider override settings for the summarization call.
  • Adds unit tests covering file collection behaviors and prompt/tool creation.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
crates/goose/src/agents/mod.rs Exposes the new code_summary_tool module.
crates/goose/src/agents/code_summary_tool.rs Implements the summarize tool (file traversal, prompt building, provider overrides, tests).
crates/goose/src/agents/agent.rs Wires the tool into dispatch and includes it in the tool list returned to the model.

Comment on lines +282 to +299
let mut files = Vec::new();

for path_str in paths {
let path = if Path::new(path_str).is_absolute() {
PathBuf::from(path_str)
} else {
working_dir.join(path_str)
};

if !path.exists() {
return Err(format!("Path does not exist: {}", path.display()));
}

if path.is_dir() {
collect_from_dir(&path, &mut files, extensions, gitignore)?;
} else if path.is_file() && should_include_file(&path, extensions) {
collect_file(&path, &mut files)?;
}
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_files accepts absolute paths and does not enforce that resolved (canonical) paths stay within working_dir, which can allow summarizing arbitrary files outside the project (including via .. segments). Reject absolute paths and/or canonicalize + verify starts_with(working_dir) before reading.

Suggested change
let mut files = Vec::new();
for path_str in paths {
let path = if Path::new(path_str).is_absolute() {
PathBuf::from(path_str)
} else {
working_dir.join(path_str)
};
if !path.exists() {
return Err(format!("Path does not exist: {}", path.display()));
}
if path.is_dir() {
collect_from_dir(&path, &mut files, extensions, gitignore)?;
} else if path.is_file() && should_include_file(&path, extensions) {
collect_file(&path, &mut files)?;
}
let base = working_dir.canonicalize().map_err(|e| {
format!(
"Failed to canonicalize working directory {}: {e}",
working_dir.display()
)
})?;
let mut files = Vec::new();
for path_str in paths {
// Reject absolute paths to avoid escaping the working directory.
if Path::new(path_str).is_absolute() {
return Err(format!(
"Absolute paths are not allowed: {}",
path_str
));
}
let joined_path = base.join(path_str);
if !joined_path.exists() {
return Err(format!("Path does not exist: {}", joined_path.display()));
}
let canonical_path = joined_path.canonicalize().map_err(|e| {
format!(
"Failed to canonicalize path {}: {e}",
joined_path.display()
)
})?;
if !canonical_path.starts_with(&base) {
return Err(format!(
"Path escapes working directory: {}",
joined_path.display()
));
}
if canonical_path.is_dir() {
collect_from_dir(&canonical_path, &mut files, extensions, gitignore)?;
} else if canonical_path.is_file() && should_include_file(&canonical_path, extensions) {
collect_file(&canonical_path, &mut files)?;
}

Copilot uses AI. Check for mistakes.
Comment on lines +319 to +325
if gitignore.matched(&path, path.is_dir()).is_ignore() {
continue;
}

if path.is_dir() {
collect_from_dir(&path, files, extensions, gitignore)?;
} else if path.is_file() && should_include_file(&path, extensions) {
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directory recursion uses path.is_dir()/read_dir() without handling symlinks, so a symlinked directory can escape the intended boundary or create cycles (potential infinite recursion / huge reads). Consider skipping symlinks (via symlink_metadata) or switching to ignore::WalkBuilder with follow_links(false) plus boundary checks.

Suggested change
if gitignore.matched(&path, path.is_dir()).is_ignore() {
continue;
}
if path.is_dir() {
collect_from_dir(&path, files, extensions, gitignore)?;
} else if path.is_file() && should_include_file(&path, extensions) {
let metadata = match std::fs::symlink_metadata(&path) {
Ok(m) => m,
Err(e) => {
tracing::debug!(
"Skipping path {}: failed to get metadata: {}",
path.display(),
e
);
continue;
}
};
if metadata.file_type().is_symlink() {
tracing::debug!("Skipping symlink path {}", path.display());
continue;
}
let is_dir = metadata.is_dir();
let is_file = metadata.is_file();
if gitignore.matched(&path, is_dir).is_ignore() {
continue;
}
if is_dir {
collect_from_dir(&path, files, extensions, gitignore)?;
} else if is_file && should_include_file(&path, extensions) {

Copilot uses AI. Check for mistakes.
Comment on lines +248 to +252
fn build_gitignore(working_dir: &Path) -> Gitignore {
let mut builder = GitignoreBuilder::new(working_dir);

// Add .gitignore if it exists
let gitignore_path = working_dir.join(".gitignore");
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tool description says it "respects .gitignore", but build_gitignore only loads the top-level .gitignore and won’t honor nested .gitignore files (or .git/info/exclude/global ignores). Either update the wording to match the behavior or use ignore::WalkBuilder which applies gitignore rules correctly during traversal.

Copilot uses AI. Check for mistakes.
Comment on lines +333 to +337
fn collect_file(path: &Path, files: &mut Vec<FileContent>) -> Result<(), String> {
let content = match std::fs::read_to_string(path) {
Ok(c) => c,
Err(e) => {
tracing::debug!("Skipping file {}: {}", path.display(), e);
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect_file reads entire files into memory with no per-file or total size limit, which can cause very large prompts (OOM/token-limit failures) when a directory contains big files. Add a maximum size/line cap (and/or truncate with a clear marker) before pushing content into files.

Copilot uses AI. Check for mistakes.
let ext = file.path.extension().and_then(|e| e.to_str()).unwrap_or("");

prompt.push_str(&format!(
"### {} ({} lines)\n```{}\n{}\n```\n\n",
Copy link

Copilot AI Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_prompt wraps raw file contents in triple-backtick fences without escaping, so files containing ``` can break the prompt structure and contaminate the summary. Escape or transform fence sequences (or use an alternative delimiter) before embedding file contents.

Suggested change
"### {} ({} lines)\n```{}\n{}\n```\n\n",
"### {} ({} lines)\n~~~{}\n{}\n~~~\n\n",

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator

@DOsinga DOsinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a useful tool to have and save context. we should bring it inline with the other build in tools though that now all are platform extensions

session.working_dir.clone(),
cancellation_token,
)
} else if tool_call.name == SUMMARIZE_TOOL_NAME {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want to inject this straight into the agent. you should wire this up through the platform extension mechanism

}

pub fn create_summarize_tool() -> Tool {
let schema = json!({
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handcrafting json schema is rather fragile and this is just mirrorring the declaration that you have above, no?

// Add .gitignore if it exists
let gitignore_path = working_dir.join(".gitignore");
if gitignore_path.is_file() {
let _ = builder.add(&gitignore_path);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in our prompt manager we're not actually adding the .gitignore file, is that one faulty?

Copy link
Copy Markdown
Collaborator

@DOsinga DOsinga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a useful tool to have and save context. we should bring it inline with the other build in tools though that now all are platform extensions

Copilot AI review requested due to automatic review settings February 13, 2026 01:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment on lines +390 to +393
*total_size += content.len();
if *total_size > MAX_TOTAL_SIZE {
tracing::debug!("Total content size limit reached, stopping collection");
return Err("Total content size limit (1MB) exceeded".to_string());
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The total size check occurs after reading the file content and adding its length to total_size. This means collection can exceed MAX_TOTAL_SIZE by up to MAX_FILE_SIZE bytes before stopping. Consider checking the total size before reading each file to enforce the limit more strictly.

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +127
map.insert(
summarize_extension::EXTENSION_NAME,
PlatformExtensionDef {
name: summarize_extension::EXTENSION_NAME,
display_name: "Summarize",
description: "Load files/directories and get an LLM summary in a single call",
default_enabled: true,
client_factory: |ctx| {
Box::new(summarize_extension::SummarizeClient::new(ctx).unwrap())
},
},
);
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing required field unprefixed_tools in the PlatformExtensionDef. Based on the struct definition at line 208 in extension.rs and other extensions in this file, this field is required. It should be set to false for this extension since the tool should be prefixed with the extension name.

Copilot uses AI. Check for mistakes.
Comment on lines +789 to +790

prefixed_tools.push(create_summarize_tool());
Copy link

Copilot AI Feb 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The agent.rs changes are architecturally incorrect. The summarize functionality is already implemented as a platform extension (SummarizeClient in summarize_extension.rs) that implements McpClientTrait and is registered in extension.rs. Platform extensions work through the extension_manager dispatch system automatically. These special-case handling additions (checking for SUMMARIZE_TOOL_NAME, calling handle_summarize_tool, and adding create_summarize_tool) are unnecessary and conflict with the extension registration. They should be removed entirely - the extension will work through the normal extension_manager.dispatch_tool_call path at line 549.

Suggested change
prefixed_tools.push(create_summarize_tool());

Copilot uses AI. Check for mistakes.
Resolve conflicts by adapting summarize extension to the new
platform_extensions module structure introduced in main.
Moved summarize_extension.rs to platform_extensions/summarize.rs
and updated to use builder APIs for InitializeResult and
CallToolResult.
@DOsinga DOsinga requested a review from blackgirlbytes as a code owner March 10, 2026 22:02
@github-actions
Copy link
Copy Markdown
Contributor

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://block.github.io/goose/pr-preview/pr-7054/

Built to branch gh-pages at 2026-03-10 22:05 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Douwe Osinga added 4 commits March 10, 2026 18:06
Remove 5 tests that only verify obvious behavior (basic file reading,
string formatting, tool name checking). Keep 6 that guard security
boundaries (path traversal, absolute paths, symlinks) and resource
limits (large files, gitignore, extension filtering).

Set default_enabled: false so the extension ships opt-in while we
gather feedback.
@DOsinga DOsinga removed the request for review from blackgirlbytes March 11, 2026 00:21
@DOsinga DOsinga added this pull request to the merge queue Mar 11, 2026
Merged via the queue into main with commit b10717a Mar 11, 2026
20 of 21 checks passed
@DOsinga DOsinga deleted the baxen/summarize branch March 11, 2026 01:35
lifeizhou-ap added a commit that referenced this pull request Mar 11, 2026
* main: (45 commits)
  fix: resolve {{ recipe_dir }} in nested sub-recipe paths during secret discovery (#7797)
  Add @DOsinga as CODEOWNER for documentation (#7799)
  feat: Add summarize tool for deterministic reads (#7054)
  fix(api): use camelCase in CallToolResponse and add type discriminators to ContentBlock (#7487)
  feat: ACP providers for claude code and codex (#6605)
  chore(deps): bump express-rate-limit from 8.2.1 to 8.3.0 in /evals/open-model-gym/mcp-harness (#7703)
  feat(openai): capture reasoning summaries from responses API (#7375)
  Fix some dependencies (#7794)
  fix: improve keyring availability error detection (#7766)
  feat: add MiniMax provider with Anthropic-compatible API (#7640)
  feat: add Tensorix as a declarative provider (#7712)
  fix(security): remove insecure default secret from GOOSE_EXTERNAL_BACKEND (#7783)
  refactor: Convert Tanzu provider to declarative JSON config (#7124)
  replaces https://github.com/block/goose/pull/7340/changes (#7786)
  feat(summon): make skill supporting files individually loadable via load() (#7583)
  Keep toast open on failed extension (#7771)
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  ...
lifeizhou-ap added a commit that referenced this pull request Mar 11, 2026
* main: (69 commits)
  fix: resolve {{ recipe_dir }} in nested sub-recipe paths during secret discovery (#7797)
  Add @DOsinga as CODEOWNER for documentation (#7799)
  feat: Add summarize tool for deterministic reads (#7054)
  fix(api): use camelCase in CallToolResponse and add type discriminators to ContentBlock (#7487)
  feat: ACP providers for claude code and codex (#6605)
  chore(deps): bump express-rate-limit from 8.2.1 to 8.3.0 in /evals/open-model-gym/mcp-harness (#7703)
  feat(openai): capture reasoning summaries from responses API (#7375)
  Fix some dependencies (#7794)
  fix: improve keyring availability error detection (#7766)
  feat: add MiniMax provider with Anthropic-compatible API (#7640)
  feat: add Tensorix as a declarative provider (#7712)
  fix(security): remove insecure default secret from GOOSE_EXTERNAL_BACKEND (#7783)
  refactor: Convert Tanzu provider to declarative JSON config (#7124)
  replaces https://github.com/block/goose/pull/7340/changes (#7786)
  feat(summon): make skill supporting files individually loadable via load() (#7583)
  Keep toast open on failed extension (#7771)
  fix(ui-desktop): unify path resolution around GOOSE_PATH_ROOT (#7335)
  fix: pass OAuth scopes to DCR and extract granted_scopes from token response (#7571)
  fix: write to real file if config.yaml is symlink (#7669)
  fix: preserve pairings when stopping gateway (#7733)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants