feat: add analyze platform extension with tree-sitter AST parsing by tlongwell-block · Pull Request #7542 · block/goose

tlongwell-block · 2026-02-26T19:04:21Z

Rebuild the analyze tool as a standalone platform extension after #7466 merged in, as suggested by DOsinga in Oct 2025 (issue #5155). Single unprefixed 'analyze' tool with three auto-selected modes:

Structure: directory overview with LOC/function/class counts, language breakdown
Semantic: per-file functions with signatures and return types, classes with fields and inheritance, imports, call frequency (•N)
Focused: symbol call graph with BFS incoming/outgoing chains, cross-file resolution, test/production caller separation

10 languages: Rust, Python, JavaScript, TypeScript, TSX, Go, Java, Kotlin, Swift, Ruby. Tree-sitter queries extract functions, classes, imports, and call sites. Rayon parallelism for directory analysis.

Key design decisions:

1 tool, 3 auto-selected modes (vs narsil-mcp's 90 tools)
Compact LLM-optimized output format
Function signatures with params and return types inline
Class inheritance shown for all languages
File-scoped call graph nodes with language isolation
Deduplicated edges (HashSet) prevent BFS chain explosion
Tree-formatted chain output with shared-prefix grouping
Test callers separated from production callers
Multiline format for large files (>10 symbols)
Subagent delegation pattern for token budget management
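
To make the "deduplicated edges" decision concrete, here is a minimal sketch of a call graph whose edges live in a `HashSet`, with a breadth-first chain walk on top. The `CallGraph` type and its method names are hypothetical and are not taken from graph.rs; the point is only that repeated calls between the same pair of functions collapse to one edge, so the number of chains stays bounded by the graph shape rather than by call-site count.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical call graph: caller -> set of callees.
/// Because callees are stored in a HashSet, a function that calls
/// `helper` five times still contributes a single edge.
#[derive(Default)]
struct CallGraph {
    outgoing: HashMap<String, HashSet<String>>,
}

impl CallGraph {
    fn add_call(&mut self, caller: &str, callee: &str) {
        self.outgoing
            .entry(caller.to_string())
            .or_default()
            .insert(callee.to_string());
    }

    /// Breadth-first enumeration of call chains starting at `start`,
    /// following at most `max_depth` edges. Each chain is a full path.
    fn outgoing_chains(&self, start: &str, max_depth: usize) -> Vec<Vec<String>> {
        let mut chains = Vec::new();
        let mut queue = VecDeque::new();
        queue.push_back(vec![start.to_string()]);

        while let Some(path) = queue.pop_front() {
            let depth = path.len() - 1;
            if depth >= max_depth {
                continue;
            }
            let last = path.last().expect("path is never empty");
            let Some(callees) = self.outgoing.get(last) else {
                continue;
            };
            // Sort for deterministic output; HashSet iteration order is arbitrary.
            let mut sorted: Vec<&String> = callees.iter().collect();
            sorted.sort();
            for callee in sorted {
                if path.contains(callee) {
                    continue; // do not follow cycles
                }
                let mut next = path.clone();
                next.push(callee.clone());
                chains.push(next.clone());
                queue.push_back(next);
            }
        }
        chains
    }
}
```

With a `Vec` of edges instead of a set, three calls from `main` to `run` would produce three identical chains at depth 1 and multiply again at every deeper level.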

Architecture: 5 files, ~2k LOC, no file over 650 LOC

mod.rs: client, McpClientTrait impl, mode selection, tests
parser.rs: tree-sitter parsing, signatures, inheritance extraction
languages.rs: language registry (pure data, one entry per language)
format.rs: output formatting, tree dedup, test separation
graph.rs: call graph, BFS traversal, scoped resolution

Also includes analyze_cli binary for ad-hoc testing.

Validated by:

3 rounds of crossfire review (final: APPROVE 8/10)
2 waves of agent experience (AX) testing across 7 popular OSS repos (FastAPI, Zod, Fiber, Rails, Spring Boot, Ktor, Vapor)
AX understanding metric: 55-70% from analyze alone
Academic validation: aligns with RepoGraph (ICLR 2025), CodeCompass (2602.20048), AST-derived graph benchmarks

…tribution, impl dedup

Bug 1: JS/TS/TSX generator functions (function*, async function*) now detected via generator_function_declaration node type.
Bug 2: forwardRef/memo/React.lazy component expressions detected via lexical_declaration with call_expression value.
Bug 3: Ref count in focused mode header now counts unique direct callers + callees (stable across follow_depth) instead of chain count, which scaled combinatorially.
Bug 4: Calls inside closures and async blocks now bubble up to the nearest named enclosing function instead of returning None.
Bug 5: Inherent impl blocks (no trait) now show as Type(impl) to distinguish from the struct definition.

The forwardRef/memo detection query matched ANY const x = fn(), causing JSON.parse, getElementById, etc. to be falsely detected as function definitions. Removed — arrow function detection (const X = () => {}) already covers the majority of React components without false positives.

Remove 22 'what' comments that restate the code (parser.rs, graph.rs). Keep tree-sitter AST structure docs and 'why' comments.

Structure mode now shows '(N files skipped: no parser)' when non-parseable files exist in the directory, so agents know the filesystem has more files than the analysis covers.

Java fields: collect_field_names now descends into variable_declarator to find the field name identifier. Foo{String,int} → Foo{name,count}.

CLI: imports AnalyzeClient::analyze_file and AnalyzeClient::collect_files instead of reimplementing. Both methods made pub.

Test patterns: is_test_chain now covers all 10 languages: Java Test.java/Tests.java, Kotlin Test.kt, Ruby _spec.rb/_test.rb, Swift Test.swift/Tests.swift, Go _test.go, plus Maven/Gradle src/test/ directory convention.
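
The test-file conventions listed above could be checked with a small path predicate. This is a sketch only: `looks_like_test_path` is a hypothetical name, not the `is_test_chain` function from the PR, and it covers just the suffixes and the `src/test/` directory rule named in this commit message (patterns for the other languages are not spelled out here, so they are omitted).

```rust
/// Hypothetical helper: returns true when a path looks like a test file
/// under the naming conventions listed in the commit message.
fn looks_like_test_path(path: &str) -> bool {
    const SUFFIXES: &[&str] = &[
        "Test.java",
        "Tests.java",
        "Test.kt",
        "_spec.rb",
        "_test.rb",
        "Test.swift",
        "Tests.swift",
        "_test.go",
    ];
    // Normalize Windows separators so the directory check works everywhere.
    let normalized = path.replace('\\', "/");
    let file_name = normalized.rsplit('/').next().unwrap_or("");
    SUFFIXES.iter().any(|s| file_name.ends_with(*s))
        || normalized.contains("/src/test/")
        || normalized.starts_with("src/test/")
}
```

A caller would typically apply this to every file in a chain and label the chain as a test caller if any link matches.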

…ze guard error

Same-file callee resolution (graph.rs): when multiple definitions share a name in the same file, pick the nearest by line proximity instead of returning all matches. Fixes false cross-linking in OO files with repeated method names.

JS/TS import normalization (parser.rs): 'import React from react' now extracts 'react' instead of 'React from react'. Handles all JS/TS import forms: default, named, namespace, type-only.

Swift init/deinit (parser.rs): initializers and deinitializers now captured as function definitions with correct parent class, signatures, and graph node registration. Calls inside init/deinit properly attributed.

Size guard (mod.rs): output exceeding 50k chars now returns CallToolResult::error instead of success, so agents recognize the failure state and can retry with force or narrower scope.
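
The same-file callee resolution described in this commit can be illustrated with a short sketch. The `Def` struct and `nearest_def` function are hypothetical stand-ins, and the tie-breaking rule (prefer the earlier definition) is an assumption; the commit message only says the nearest definition by line proximity is chosen.

```rust
/// Hypothetical definition record: a symbol name and the line it starts on.
#[derive(Debug, Clone, PartialEq)]
struct Def {
    name: String,
    line: usize,
}

/// Among same-file definitions named `callee`, pick the one whose line is
/// closest to the call site. Ties go to the earlier definition (assumed).
fn nearest_def<'a>(defs: &'a [Def], callee: &str, call_line: usize) -> Option<&'a Def> {
    defs.iter()
        .filter(|d| d.name == callee)
        .min_by_key(|d| (d.line.abs_diff(call_line), d.line))
}
```

In a file where two classes each define `save`, a call on line 75 then links to the `save` defined on line 80 rather than to both definitions.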

Class inheritance was baked into Symbol.name (e.g. 'Child(Base)'), breaking graph lookups — --focus Child returned 'not found' because the graph key was 'Child(Base)'. Now inheritance lives in detail while name stays canonical. Display still shows Child:4(Base).
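
A minimal sketch of the split described above, assuming a `Symbol` with separate `name` and `detail` fields (the field types and the `display` and `index_by_name` helpers are illustrative, not copied from the PR). The lookup key stays canonical while the rendered form still carries the base class.

```rust
use std::collections::HashMap;

/// Hypothetical symbol record. `name` is the canonical graph key;
/// `detail` carries display-only information such as the base class.
struct Symbol {
    name: String,
    line: usize,
    detail: Option<String>,
}

impl Symbol {
    /// Render as `Name:line(Base)` when a base class is known.
    fn display(&self) -> String {
        match &self.detail {
            Some(base) => format!("{}:{}({})", self.name, self.line, base),
            None => format!("{}:{}", self.name, self.line),
        }
    }
}

/// Index symbols by canonical name so a focus query for `Child` resolves.
fn index_by_name(symbols: &[Symbol]) -> HashMap<&str, &Symbol> {
    symbols.iter().map(|s| (s.name.as_str(), s)).collect()
}
```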

Module-scope callers (graph.rs): top-level calls like Python script invocations were silently dropped because <module> had no graph node. Now registers a <module> pseudo-node per file and falls back to it when no enclosing function is found.

Go receiver types (parser.rs, languages.rs): Go methods like func (s *Server) Handle() now show as Server.Handle instead of bare Handle. Extracts receiver type from method_declaration's parameter list.

Rust trait impl parent (parser.rs): impl Display for MyType now correctly attributes methods to MyType, not Display. Scans for the 'for' keyword and picks the type after it.
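
The module-scope fallback amounts to choosing a per-file pseudo-node whenever no enclosing function is found. A rough sketch follows; the `file:name` key format and the `caller_node` helper are assumptions made for illustration, not the graph.rs representation.

```rust
/// Name of the per-file pseudo-node used for top-level calls.
const MODULE_NODE: &str = "<module>";

/// Resolve the graph node a call site is attributed to.
/// `enclosing_fn` is the nearest named function, if any (hypothetical input).
/// The `file:name` key format is an assumption for this sketch.
fn caller_node(file: &str, enclosing_fn: Option<&str>) -> String {
    let name = enclosing_fn.unwrap_or(MODULE_NODE);
    format!("{file}:{name}")
}
```

Because the pseudo-node is scoped to the file, top-level calls in two different scripts do not merge into one caller.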

Call frequency (format.rs): Self::foo and module::foo now count toward the bare foo symbol's •N marker. Uses the same rsplit('::') normalization as graph callee resolution.

JS inheritance (parser.rs): class Foo extends React.Component now shows Foo(React.Component) instead of Foo(React). Takes the full heritage expression text instead of the first descendant identifier.
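
The call-frequency normalization can be sketched with the standard `str::rsplit`. The function names `bare_name` and `call_frequency` are hypothetical; only the idea of taking the last `::` segment comes from the commit message.

```rust
use std::collections::HashMap;

/// Strip any `::`-separated path prefix: `Self::foo` and `module::foo`
/// both normalize to `foo`.
fn bare_name(callee: &str) -> &str {
    callee.rsplit("::").next().unwrap_or(callee)
}

/// Count call sites per bare symbol name.
fn call_frequency<'a>(calls: &[&'a str]) -> HashMap<&'a str, usize> {
    let mut counts = HashMap::new();
    for &call in calls {
        *counts.entry(bare_name(call)).or_insert(0) += 1;
    }
    counts
}
```

Using one normalization for both the frequency marker and graph resolution keeps the two views from disagreeing about how often a symbol is called.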

Ref count in FOCUS header now always uses depth=1 graph queries regardless of the user's follow_depth. Previously --follow 0 showed '0 refs' because BFS returned no chains to count from. Now all depths report the same accurate count.
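
One way to get a count that does not depend on follow_depth is to derive it from direct neighbors only. The sketch below assumes edges are available as `(caller, callee)` pairs and that the count is unique direct callers plus unique direct callees; `ref_count` is a hypothetical helper, not the function in the PR.

```rust
use std::collections::HashSet;

/// Count unique direct callers plus unique direct callees of `symbol`.
/// Edges are `(caller, callee)` pairs; duplicates and deeper hops are ignored.
fn ref_count(edges: &[(&str, &str)], symbol: &str) -> usize {
    let callers: HashSet<&str> = edges
        .iter()
        .filter(|(_, to)| *to == symbol)
        .map(|(from, _)| *from)
        .collect();
    let callees: HashSet<&str> = edges
        .iter()
        .filter(|(from, _)| *from == symbol)
        .map(|(_, to)| *to)
        .collect();
    callers.len() + callees.len()
}
```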

michaelneale · 2026-02-27T14:51:06Z

crates/goose/src/bin/analyze_cli.rs

@@ -0,0 +1,88 @@
+//! CLI wrapper for the analyze platform extension.


how come there is an analyze cli separately?

It lets the agents test it as they develop it. And humans can use it, too. Not sure it's worth keeping in, but it makes dev on this tool much, much easier.

michaelneale · 2026-02-27T14:51:38Z

Cargo.toml

 opentelemetry-stdout = { version = "0.31", features = ["trace", "metrics", "logs"] }
 tracing-opentelemetry = "0.32"

+rayon = "1.10"


any idea of heft added by these? I guess we had them before?

Shouldn't be noticeable since this was a dep just a few days ago already

michaelneale

I think good to get it back in if can get it clean

michaelneale · 2026-02-27T14:52:40Z

crates/goose/src/agents/platform_extensions/mod.rs

+                display_name: "Analyze",
+                description:
+                    "Analyze code structure with tree-sitter: directory overviews, file details, symbol call graphs",
+                default_enabled: true,


could we make this not on by default - let platform put it on?

It had been on by default, and makes goose much better in big repos that devs care about. I think on by default is definitely the way to go here. It's one tool and surprisingly light

…m-extension-pr

* origin/main:
Update CODEOWNERS for team restructuring (#7574)
Add snapshot test with platform extensions (#7573)
Handle Bedrock 'prompt is too long' error (#7550)
feat: make pctx/Code Mode an optional dependency via 'code-mode' feature (#7567)
chore(release): release version 1.26.0 (minor) (#7512)
feat: allow goose askai bot to search goose codebase (#7508)
Revert "Reapply "fix: prevent crashes in long-running Electron sessions""
Reapply "fix: prevent crashes in long-running Electron sessions"
Revert "fix: prevent crashes in long-running Electron sessions"
fix: replace unwrap() with graceful error in scheduler execute_job (#7436)
fix: Dictation API error message shows incorrect limit (#7423)
fix(acp): Use ACP schema types for session/list (#7409)
fix(desktop): make bundle and updater asset naming configurable (#7337)
fix(openai): preserve order in Responses API history (#7500)
Use the correct Goose emoji 🪿 instead of Swan in README.md (#7485)
fix: prevent crashes in long-running Electron sessions

…ate snapshot

JS/TS caller resolution: find_enclosing_fn now only treats variable_declarator as a function scope when its value is an arrow_function or function expression. Calls like const data = load() inside function process() now correctly attribute to process, not data.

Instructions: removed reference to the developer extension's tree tool. Our extension shouldn't tell agents about other extensions' tools.

Snapshot: updated all_platform_extensions snapshot to include the analyze extension instructions (merged from #7573).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1408e6cc1a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

crates/goose/src/agents/platform_extensions/analyze/format.rs

Chain links now show relative paths (e.g. platform_extensions/analyze/mod.rs:call_tool:259) instead of bare filenames (mod.rs:call_tool:259). Disambiguates repos with repeated basenames like mod.rs, index.ts, lib.rs across different directories. DEF lines also updated to include symbol name for consistency with chain link format.
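
A relative chain link of this shape can be built with `Path::strip_prefix` from the standard library. The `chain_link` helper and its parameters are hypothetical; the sketch only shows the path:symbol:line layout described above, falling back to the full path when the file is outside the analysis root.

```rust
use std::path::Path;

/// Format a chain link as `relative/path:symbol:line`.
/// Falls back to the path as given when `file` is not under `root`.
fn chain_link(root: &Path, file: &Path, symbol: &str, line: usize) -> String {
    let rel = file.strip_prefix(root).unwrap_or(file);
    format!("{}:{}:{}", rel.display(), symbol, line)
}
```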

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc8c010356

chatgpt-codex-connector · 2026-02-27T20:52:25Z

crates/goose/src/agents/platform_extensions/analyze/parser.rs

+                let is_fn_value = find_child_by_kind(&parent, "arrow_function").is_some()
+                    || find_child_by_kind(&parent, "function").is_some();


Treat function_expression declarators as JS/TS scopes

For JavaScript/TypeScript, calls inside const fn = function() { ... } are misattributed because this check only accepts arrow_function or function children on variable_declarator. Tree-sitter uses function_expression for this common pattern, so find_enclosing_fn skips the declarator and often falls back to <module>, which drops real caller→callee edges in focused mode and distorts semantic caller attribution.
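
The scope rule under discussion can be sketched without tree-sitter by using a small stand-in node type. Everything here is hypothetical (the `Node` struct, the innermost-first `ancestors` slice, and the set of scope kinds); the sketch accepts both `function` and `function_expression` as function values, since the review comment suggests the grammar in use emits the latter.

```rust
/// Minimal stand-in for a syntax node; the real extension walks tree-sitter nodes.
struct Node {
    kind: &'static str,
    name: Option<&'static str>,
    children: Vec<Node>,
}

/// True when a `variable_declarator` holds a function value.
/// Both `function` and `function_expression` are accepted (assumption:
/// the node kind differs between grammar versions).
fn declares_function(declarator: &Node) -> bool {
    declarator.children.iter().any(|c| {
        matches!(
            c.kind,
            "arrow_function" | "function" | "function_expression"
        )
    })
}

/// Return the name of the first enclosing function scope.
/// `ancestors` is ordered innermost-first.
fn find_enclosing_fn(ancestors: &[&Node]) -> Option<&'static str> {
    ancestors.iter().find_map(|n| match n.kind {
        "function_declaration" | "method_definition" => n.name,
        "variable_declarator" if declares_function(n) => n.name,
        _ => None,
    })
}
```

With this rule, `const data = load()` inside `function process()` attributes the call to `process`, while `const handler = function() { ... }` is treated as its own scope.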

chatgpt-codex-connector · 2026-02-27T20:52:25Z

crates/goose/src/agents/platform_extensions/analyze/mod.rs

+        let analyses: Vec<FileAnalysis> = files
+            .par_iter()
+            .filter_map(|f| Self::analyze_file(f))
+            .collect();


Return parse error for unsupported focused file input

In focused mode, unsupported or binary files are silently removed by filter_map, and the tool then reports Symbol ... not found in 0 analyzed files instead of an analysis error. This creates a false negative whenever users focus on a single file that cannot be parsed, making parser failures look like missing symbols.
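
One possible shape for the suggested fix is to keep the `filter_map` but report an error when every input was dropped. This is a sketch under stated assumptions: `analyze_file` here is a stand-in that only checks the extension, `analyze_focused` is a hypothetical wrapper, and the sequential iterator replaces the `par_iter` from the diff to keep the example dependency-free.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
struct FileAnalysis {
    path: PathBuf,
}

/// Stand-in parser: only files with a known extension are "parseable".
fn analyze_file(path: &Path) -> Option<FileAnalysis> {
    let ext = path.extension()?.to_str()?;
    matches!(ext, "rs" | "py" | "go").then(|| FileAnalysis {
        path: path.to_path_buf(),
    })
}

/// Analyze the inputs, but surface an error instead of an empty result
/// when files were supplied and none of them could be parsed.
fn analyze_focused(files: &[PathBuf]) -> Result<Vec<FileAnalysis>, String> {
    let analyses: Vec<FileAnalysis> = files.iter().filter_map(|f| analyze_file(f)).collect();
    if analyses.is_empty() && !files.is_empty() {
        return Err(format!(
            "none of the {} input file(s) could be parsed",
            files.len()
        ));
    }
    Ok(analyses)
}
```

The caller can then distinguish "symbol not found" from "nothing was analyzed" before formatting the focused output.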

…m-extension-pr * origin/main: Restore old system prompt behavior around subagents (#7576)

PlatformExtensionContext gained a session field from #7576. Add session: None to the test helper. Update snapshot.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c3629caac

chatgpt-codex-connector · 2026-02-27T21:07:53Z

crates/goose/src/agents/platform_extensions/analyze/languages.rs

+                (lexical_declaration
+                  (variable_declarator
+                    name: (identifier) @name
+                    value: (arrow_function)))


Capture JS/TS declarator functions beyond arrow syntax

The JavaScript/TypeScript function queries only register variable_declarator entries when the value is an arrow_function inside a lexical_declaration, so common forms like const fn = function() {} and var fn = () => {} never become symbols. In semantic mode those functions disappear from output, and in focused mode calls like fn() cannot resolve to a local definition because no node was created.

chatgpt-codex-connector · 2026-02-27T21:07:53Z

crates/goose/src/agents/platform_extensions/analyze/languages.rs

+            calls: r#"
+                (call_expression function: (identifier) @name)
+                (call_expression function: (member_expression property: (property_identifier) @name))
+                (new_expression constructor: (identifier) @name)


Match namespaced constructor calls in JS/TS queries

Constructor call extraction only matches new_expression with an identifier constructor, so new calls using member expressions (for example new sdk.Client()) are not captured. This drops a common JS/TS instantiation pattern and causes focused call graphs to miss incoming references to those constructors/classes.

* main: (46 commits)
chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
Better network failure error & antrhopic retry (#7595)
feat: make the text bar persistent and add a queue for messages (#7560)
fix: outdated clippy command in goosehints (#7590)
chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
chore(deps): bump minimatch in /ui/desktop (#7572)
fix: validate configure probe for streaming providers (#7564)
Dockerfile: add missing build/runtime dependencies (#7546)
fix(claude-code): Permission routing for smart-approve (#7501)
Add base_path field to custom provider config (#7558)
fix(cli): avoid debug logging by default in CLI (#7569)
fix: panic on corrupted permission.yaml instead of silently allowing all (#7432) (#7458)
fix(openai): handle null reasoning effort in Responses API (#7469)
Allow GOOSE_NODE_DIR override in batch file (#7422)
feat: add analyze platform extension with tree-sitter AST parsing (#7542)
...

tlongwell-block added 6 commits February 26, 2026 14:01

style: cargo fmt

fa91d32

tlongwell-block marked this pull request as ready for review February 27, 2026 03:29

fix: ref count stable across all follow_depth values

52daa76

michaelneale reviewed Feb 27, 2026

michaelneale approved these changes Feb 27, 2026

michaelneale reviewed Feb 27, 2026

docs: note that max_depth also limits focus scan depth

5549241

tlongwell-block enabled auto-merge February 27, 2026 20:22

tlongwell-block disabled auto-merge February 27, 2026 20:27

tlongwell-block added 2 commits February 27, 2026 15:32

chatgpt-codex-connector bot reviewed Feb 27, 2026

chatgpt-codex-connector bot reviewed Feb 27, 2026

tlongwell-block added 2 commits February 27, 2026 15:52

Merge remote-tracking branch 'origin/main' into tyler/analyze-platfor…

366f499

fix: add missing session field to test context, update snapshot

0c3629c

tlongwell-block force-pushed the tyler/analyze-platform-extension-pr branch from 369c1d7 to 0c3629c Compare February 27, 2026 21:01

chatgpt-codex-connector bot reviewed Feb 27, 2026

tlongwell-block added this pull request to the merge queue Feb 27, 2026

Merged via the queue into main with commit 16be0cc Feb 27, 2026
18 of 21 checks passed

tlongwell-block deleted the tyler/analyze-platform-extension-pr branch February 27, 2026 21:22

github-actions bot mentioned this pull request Mar 2, 2026

chore(release): release version 1.27.0 (minor) #7611

Merged

craigwalkeruk pushed a commit to craigwalkeruk/custom-goose that referenced this pull request Mar 5, 2026

feat: add analyze platform extension with tree-sitter AST parsing (bl…

5d684af

…ock#7542)
