feat: add analyze platform extension with tree-sitter AST parsing #7542

Merged
tlongwell-block merged 17 commits into main from tyler/analyze-platform-extension-pr on Feb 27, 2026
Conversation

@tlongwell-block
Collaborator

@tlongwell-block tlongwell-block commented Feb 26, 2026

Rebuild the analyze tool as a standalone platform extension after #7466 merged in, as suggested by DOsinga in Oct 2025 (issue #5155). Single unprefixed 'analyze' tool with three auto-selected modes:

  • Structure: directory overview with LOC/function/class counts, language breakdown
  • Semantic: per-file functions with signatures and return types, classes with fields and inheritance, imports, call frequency (•N)
  • Focused: symbol call graph with BFS incoming/outgoing chains, cross-file resolution, test/production caller separation

10 languages: Rust, Python, JavaScript, TypeScript, TSX, Go, Java, Kotlin, Swift, Ruby. Tree-sitter queries extract functions, classes, imports, and call sites. Rayon parallelism for directory analysis.

Key design decisions:

  • 1 tool, 3 auto-selected modes (vs narsil-mcp's 90 tools)
  • Compact LLM-optimized output format
  • Function signatures with params and return types inline
  • Class inheritance shown for all languages
  • File-scoped call graph nodes with language isolation
  • Deduplicated edges (HashSet) prevent BFS chain explosion
  • Tree-formatted chain output with shared-prefix grouping
  • Test callers separated from production callers
  • Multiline format for large files (>10 symbols)
  • Subagent delegation pattern for token budget management
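
To illustrate the "deduplicated edges (HashSet)" and BFS chain points above, here is a minimal std-only sketch. `CallGraph`, `add_call`, and `outgoing_chains` are hypothetical names for illustration, not the types in graph.rs, and the real traversal may differ.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical call graph: each (caller, callee) edge is stored once.
#[derive(Default)]
pub struct CallGraph {
    edges: HashSet<(String, String)>,
    outgoing: HashMap<String, Vec<String>>,
}

impl CallGraph {
    /// Records an edge; repeated call sites collapse into a single edge,
    /// so BFS does not emit the same chain once per call site.
    pub fn add_call(&mut self, caller: &str, callee: &str) {
        if self.edges.insert((caller.to_string(), callee.to_string())) {
            self.outgoing
                .entry(caller.to_string())
                .or_default()
                .push(callee.to_string());
        }
    }

    /// Breadth-first walk over outgoing edges, returning every chain
    /// of 1..=depth hops that starts at `start`.
    pub fn outgoing_chains(&self, start: &str, depth: usize) -> Vec<Vec<String>> {
        let mut chains = Vec::new();
        let mut queue = VecDeque::new();
        queue.push_back(vec![start.to_string()]);
        while let Some(chain) = queue.pop_front() {
            if chain.len() - 1 == depth {
                continue; // hop budget used up for this chain
            }
            let last = chain.last().expect("chain is never empty");
            for callee in self.outgoing.get(last).into_iter().flatten() {
                if chain.contains(callee) {
                    continue; // avoid looping on recursive calls
                }
                let mut next = chain.clone();
                next.push(callee.clone());
                chains.push(next.clone());
                queue.push_back(next);
            }
        }
        chains
    }
}
```

With `main -> run` recorded twice and `run -> parse` once, a depth of 1 yields the single chain `main -> run`, and a depth of 0 yields no chains at all.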

Architecture: 5 files, ~2k LOC, no file over 650 LOC

  • mod.rs: client, McpClientTrait impl, mode selection, tests
  • parser.rs: tree-sitter parsing, signatures, inheritance extraction
  • languages.rs: language registry (pure data, one entry per language)
  • format.rs: output formatting, tree dedup, test separation
  • graph.rs: call graph, BFS traversal, scoped resolution

Also includes analyze_cli binary for ad-hoc testing.

Validated by:
- 3 rounds of crossfire review (final: APPROVE 8/10)
- 2 waves of agent experience (AX) testing across 7 popular OSS
  repos (FastAPI, Zod, Fiber, Rails, Spring Boot, Ktor, Vapor)
- AX understanding metric: 55-70% from analyze alone
- Academic validation: aligns with RepoGraph (ICLR 2025),
  CodeCompass (2602.20048), AST-derived graph benchmarks

…tribution, impl dedup

Bug 1: JS/TS/TSX generator functions (function*, async function*)
now detected via generator_function_declaration node type.

Bug 2: forwardRef/memo/React.lazy component expressions detected
via lexical_declaration with call_expression value.

Bug 3: Ref count in focused mode header now counts unique direct
callers + callees (stable across follow_depth) instead of chain
count which scaled combinatorially.

Bug 4: Calls inside closures and async blocks now bubble up to
the nearest named enclosing function instead of returning None.

Bug 5: Inherent impl blocks (no trait) now show as Type(impl)
to distinguish from the struct definition.
The forwardRef/memo detection query matched ANY const x = fn(),
causing JSON.parse, getElementById, etc. to be falsely detected
as function definitions. Removed — arrow function detection
(const X = () => {}) already covers the majority of React
components without false positives.
Remove 22 'what' comments that restate the code (parser.rs, graph.rs).
Keep tree-sitter AST structure docs and 'why' comments.

Structure mode now shows '(N files skipped: no parser)' when
non-parseable files exist in the directory, so agents know the
filesystem has more files than the analysis covers.
Java fields: collect_field_names now descends into variable_declarator
to find the field name identifier. Foo{String,int} → Foo{name,count}.

CLI: imports AnalyzeClient::analyze_file and AnalyzeClient::collect_files
instead of reimplementing. Both methods made pub.

Test patterns: is_test_chain now covers all 10 languages — Java
Test.java/Tests.java, Kotlin Test.kt, Ruby _spec.rb/_test.rb,
Swift Test.swift/Tests.swift, Go _test.go, plus Maven/Gradle
src/test/ directory convention.
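
A rough sketch of the path-based test detection described above, limited to the patterns this commit message lists (the Rust, Python, and JS/TS conventions are not spelled out here, so they are omitted). `is_test_path` is a hypothetical helper, not the actual `is_test_chain`.

```rust
/// Hypothetical helper: true when `path` matches one of the test-file
/// conventions listed in the commit message above.
pub fn is_test_path(path: &str) -> bool {
    const SUFFIXES: [&str; 8] = [
        "Test.java",
        "Tests.java",
        "Test.kt",
        "_spec.rb",
        "_test.rb",
        "Test.swift",
        "Tests.swift",
        "_test.go",
    ];
    // Normalize Windows separators so the directory check works everywhere.
    let normalized = path.replace('\\', "/");
    // Maven/Gradle keep tests under src/test/ regardless of file name.
    normalized.contains("src/test/") || SUFFIXES.iter().any(|s| normalized.ends_with(s))
}
```

For example, `pkg/server_test.go` and `app/src/test/java/Foo.java` count as test paths, while `pkg/server.go` does not.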
@tlongwell-block tlongwell-block marked this pull request as ready for review February 27, 2026 03:29

…ze guard error

Same-file callee resolution (graph.rs): when multiple definitions
share a name in the same file, pick the nearest by line proximity
instead of returning all matches. Fixes false cross-linking in OO
files with repeated method names.
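
The line-proximity rule can be sketched as below. `Def` and `nearest_def` are illustrative names, and the real graph.rs may break ties differently.

```rust
/// Hypothetical definition record: a symbol name and its starting line.
#[derive(Debug, Clone, PartialEq)]
pub struct Def {
    pub name: String,
    pub line: usize,
}

/// Among same-file definitions named `name`, pick the one whose line is
/// closest to the call site instead of returning every match.
pub fn nearest_def<'a>(defs: &'a [Def], name: &str, call_line: usize) -> Option<&'a Def> {
    defs.iter()
        .filter(|d| d.name == name)
        .min_by_key(|d| d.line.abs_diff(call_line))
}
```

With two `save` methods at lines 10 and 80, a call on line 75 resolves to the definition at line 80 rather than linking to both.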

JS/TS import normalization (parser.rs): 'import React from react'
now extracts 'react' instead of 'React from react'. Handles all
JS/TS import forms: default, named, namespace, type-only.

Swift init/deinit (parser.rs): initializers and deinitializers now
captured as function definitions with correct parent class, signatures,
and graph node registration. Calls inside init/deinit properly
attributed.

Size guard (mod.rs): output exceeding 50k chars now returns
CallToolResult::error instead of success, so agents recognize the
failure state and can retry with force or narrower scope.
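
A minimal sketch of the size guard, using a plain `Result` as a stand-in for the MCP result type. `guard_output` and `MAX_OUTPUT_CHARS` are hypothetical names, and treating `force` as a bypass is an assumption drawn from the "retry with force" wording.

```rust
/// Assumed limit, taken from the "50k chars" figure in the commit message.
pub const MAX_OUTPUT_CHARS: usize = 50_000;

/// Returns the output unchanged when it fits (or when `force` is set),
/// and an error otherwise so the caller can see the failure state.
pub fn guard_output(output: String, force: bool) -> Result<String, String> {
    let len = output.chars().count();
    if !force && len > MAX_OUTPUT_CHARS {
        return Err(format!(
            "output is {} chars (limit {}); retry with force or a narrower path",
            len, MAX_OUTPUT_CHARS
        ));
    }
    Ok(output)
}
```

Returning an error rather than a truncated success is the point of the change: an agent that gets `Err` knows it has no usable analysis yet.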

Class inheritance was baked into Symbol.name (e.g. 'Child(Base)'),
breaking graph lookups — --focus Child returned 'not found' because
the graph key was 'Child(Base)'. Now inheritance lives in detail
while name stays canonical. Display still shows Child:4(Base).
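
A small sketch of the name/detail split. The `Symbol` fields here are assumptions for illustration (including reading the `4` in `Child:4(Base)` as a line number); the real struct likely carries more.

```rust
use std::collections::HashMap;

/// Hypothetical symbol record: `name` is the canonical lookup key,
/// `detail` carries inheritance for display only.
pub struct Symbol {
    pub name: String,
    pub line: usize,
    pub detail: Option<String>,
}

impl Symbol {
    /// Display form, e.g. `Child:4(Base)`; inheritance is appended here
    /// rather than being baked into `name`.
    pub fn display(&self) -> String {
        match &self.detail {
            Some(d) => format!("{}:{}({})", self.name, self.line, d),
            None => format!("{}:{}", self.name, self.line),
        }
    }
}

/// Graph lookups key on the bare name, so focusing on `Child` finds the symbol.
pub fn index_by_name(symbols: &[Symbol]) -> HashMap<&str, &Symbol> {
    symbols.iter().map(|s| (s.name.as_str(), s)).collect()
}
```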

Module-scope callers (graph.rs): top-level calls like Python script
invocations were silently dropped because <module> had no graph node.
Now registers a <module> pseudo-node per file and falls back to it
when no enclosing function is found.

Go receiver types (parser.rs, languages.rs): Go methods like
func (s *Server) Handle() now show as Server.Handle instead of
bare Handle. Extracts receiver type from method_declaration's
parameter list.

Rust trait impl parent (parser.rs): impl Display for MyType now
correctly attributes methods to MyType, not Display. Scans for
the 'for' keyword and picks the type after it.
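
The "scan for the 'for' keyword" rule can be approximated on the impl header text alone. This is a string-level sketch under stated assumptions (header without the body, no `where` clause); parser.rs works on tree-sitter nodes, and `impl_target` is a hypothetical name.

```rust
/// Given an impl header such as `impl Display for MyType`, return the type
/// that methods should be attributed to.
pub fn impl_target(header: &str) -> Option<&str> {
    let tokens: Vec<&str> = header.split_whitespace().collect();
    let starts_impl = tokens
        .first()
        .map_or(false, |t| *t == "impl" || t.starts_with("impl<"));
    if !starts_impl || tokens.len() < 2 {
        return None;
    }
    match tokens.iter().position(|t| *t == "for") {
        // Trait impl: the implementing type follows `for`.
        Some(i) => tokens.get(i + 1).copied(),
        // Inherent impl: the type is the last token of the header.
        None => tokens.last().copied(),
    }
}
```

Both `impl Display for MyType` and `impl MyType` resolve to `MyType`, so trait methods are no longer attributed to `Display`.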

Call frequency (format.rs): Self::foo and module::foo now count
toward the bare foo symbol's •N marker. Uses the same rsplit('::')
normalization as graph callee resolution.
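
A sketch of the `rsplit("::")` normalization applied to call counting. `bare_name` and `call_frequency` are illustrative names; only the normalization idea comes from the commit message.

```rust
use std::collections::HashMap;

/// Strip any `Self::` or `module::` qualifier, keeping the last path segment.
pub fn bare_name(callee: &str) -> &str {
    callee.rsplit("::").next().unwrap_or(callee)
}

/// Count call sites per bare symbol name, so qualified calls feed the
/// same counter as unqualified ones.
pub fn call_frequency(callees: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for callee in callees {
        *counts.entry(bare_name(callee).to_string()).or_insert(0) += 1;
    }
    counts
}
```

Calls written as `foo`, `Self::foo`, and `module::foo` all land on `foo`, which would then display with a count of 3.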

JS inheritance (parser.rs): class Foo extends React.Component now
shows Foo(React.Component) instead of Foo(React). Takes the full
heritage expression text instead of the first descendant identifier.

Ref count in FOCUS header now always uses depth=1 graph queries
regardless of the user's follow_depth. Previously --follow 0
showed '0 refs' because BFS returned no chains to count from.
Now all depths report the same accurate count.
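
One way to picture a depth-independent ref count is to count unique direct neighbors straight from the edge list, without running BFS at all. `ref_count` below is a hypothetical sketch of that idea, not the code in mod.rs.

```rust
use std::collections::HashSet;

/// Count unique direct callers plus unique direct callees of `symbol`.
/// The result does not depend on how deep chains are later followed.
pub fn ref_count(edges: &[(&str, &str)], symbol: &str) -> usize {
    let callers: HashSet<&str> = edges
        .iter()
        .filter(|(_, to)| *to == symbol)
        .map(|(from, _)| *from)
        .collect();
    let callees: HashSet<&str> = edges
        .iter()
        .filter(|(from, _)| *from == symbol)
        .map(|(_, to)| *to)
        .collect();
    callers.len() + callees.len()
}
```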

@@ -0,0 +1,88 @@
//! CLI wrapper for the analyze platform extension.
Collaborator

how come there is an analyze cli separately?

Collaborator Author

@tlongwell-block tlongwell-block Feb 27, 2026

It lets the agents test it as they develop it. And humans can use it, too. Not sure it's worth keeping in, but it makes dev on this tool much, much easier.

opentelemetry-stdout = { version = "0.31", features = ["trace", "metrics", "logs"] }
tracing-opentelemetry = "0.32"

rayon = "1.10"
Collaborator

any idea of heft added by these? I guess we had them before?

Collaborator Author

Shouldn't be noticeable since this was a dep just a few days ago already

Collaborator

@michaelneale michaelneale left a comment

I think it's good to get it back in if we can get it clean

display_name: "Analyze",
description:
"Analyze code structure with tree-sitter: directory overviews, file details, symbol call graphs",
default_enabled: true,
Collaborator

could we make this not on by default - let platform put it on?

Collaborator Author

It had been on by default, and makes goose much better in big repos that devs care about. I think on by default is definitely the way to go here. It's one tool and surprisingly light

…m-extension-pr

* origin/main:
  Update CODEOWNERS for team restructuring (#7574)
  Add snapshot test with platform extensions (#7573)
  Handle Bedrock 'prompt is too long' error (#7550)
  feat: make pctx/Code Mode an optional dependency via 'code-mode' feature (#7567)
  chore(release): release version 1.26.0 (minor) (#7512)
  feat: allow goose askai bot to search goose codebase (#7508)
  Revert "Reapply "fix: prevent crashes in long-running Electron sessions""
  Reapply "fix: prevent crashes in long-running Electron sessions"
  Revert "fix: prevent crashes in long-running Electron sessions"
  fix: replace unwrap() with graceful error in scheduler execute_job (#7436)
  fix: Dictation API error message shows incorrect limit (#7423)
  fix(acp): Use ACP schema types for session/list (#7409)
  fix(desktop): make bundle and updater asset naming configurable (#7337)
  fix(openai): preserve order in Responses API history (#7500)
  Use the correct Goose emoji 🪿 instead of Swan in README.md (#7485)
  fix: prevent crashes in long-running Electron sessions
…ate snapshot

JS/TS caller resolution: find_enclosing_fn now only treats
variable_declarator as a function scope when its value is an
arrow_function or function expression. Calls like
const data = load() inside function process() now correctly
attribute to process, not data.
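
The scope rule can be sketched without tree-sitter by walking a list of ancestor scopes, innermost first. `Scope` and `enclosing_fn` are hypothetical, the node kind strings are assumed to match the tree-sitter JavaScript grammar, and the `<module>` fallback mirrors the pseudo-node from the earlier commit.

```rust
/// Hypothetical view of one ancestor node of a call site.
pub struct Scope<'a> {
    pub kind: &'a str,
    pub name: Option<&'a str>,
    /// Kind of the initializer, only meaningful for `variable_declarator`.
    pub value_kind: Option<&'a str>,
}

/// Walk ancestors from innermost to outermost and return the first named
/// function scope; fall back to the module pseudo-node.
pub fn enclosing_fn<'a>(ancestors: &[Scope<'a>]) -> &'a str {
    for scope in ancestors {
        let is_fn_scope = match scope.kind {
            "function_declaration" | "method_definition" => true,
            // A declarator is a scope only when it binds a function value.
            "variable_declarator" => matches!(
                scope.value_kind,
                Some("arrow_function" | "function_expression")
            ),
            _ => false,
        };
        if is_fn_scope {
            if let Some(name) = scope.name {
                return name;
            }
        }
    }
    "<module>"
}
```

For `const data = load()` inside `function process()`, the declarator's value is a `call_expression`, so it is skipped and the call is attributed to `process`.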

Instructions: removed reference to the developer extension's
tree tool — our extension shouldn't tell agents about other
extensions' tools.

Snapshot: updated all_platform_extensions snapshot to include
the analyze extension instructions (merged from #7573).

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1408e6cc1a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Chain links now show relative paths (e.g. platform_extensions/
analyze/mod.rs:call_tool:259) instead of bare filenames (mod.rs:
call_tool:259). Disambiguates repos with repeated basenames like
mod.rs, index.ts, lib.rs across different directories.
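
A sketch of building such a chain link from a path relative to the analysis root. `chain_link` is a hypothetical helper, and the `path:symbol:line` layout is inferred from the example in the commit message above.

```rust
use std::path::Path;

/// Format a chain link as `relative/path:symbol:line`, falling back to the
/// full path when `file` is not under `root`.
pub fn chain_link(root: &Path, file: &Path, symbol: &str, line: usize) -> String {
    let shown = file.strip_prefix(root).unwrap_or(file);
    format!("{}:{}:{}", shown.display(), symbol, line)
}
```

Two files both named `mod.rs` in different directories then produce distinct links, which bare filenames could not.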

DEF lines also updated to include symbol name for consistency
with chain link format.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc8c010356

Comment on lines +716 to +717
let is_fn_value = find_child_by_kind(&parent, "arrow_function").is_some()
    || find_child_by_kind(&parent, "function").is_some();

P2: Treat function_expression declarators as JS/TS scopes

For JavaScript/TypeScript, calls inside const fn = function() { ... } are misattributed because this check only accepts arrow_function or function children on variable_declarator. Tree-sitter uses function_expression for this common pattern, so find_enclosing_fn skips the declarator and often falls back to <module>, which drops real caller→callee edges in focused mode and distorts semantic caller attribution.


Comment on lines +208 to +211
let analyses: Vec<FileAnalysis> = files
    .par_iter()
    .filter_map(|f| Self::analyze_file(f))
    .collect();

P2: Return parse error for unsupported focused file input

In focused mode, unsupported or binary files are silently removed by filter_map, and the tool then reports Symbol ... not found in 0 analyzed files instead of an analysis error. This creates a false negative whenever users focus on a single file that cannot be parsed, making parser failures look like missing symbols.


…m-extension-pr

* origin/main:
  Restore old system prompt behavior around subagents (#7576)
PlatformExtensionContext gained a session field from #7576.
Add session: None to the test helper. Update snapshot.
@tlongwell-block tlongwell-block force-pushed the tyler/analyze-platform-extension-pr branch from 369c1d7 to 0c3629c Compare February 27, 2026 21:01

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c3629caac

Comment on lines +96 to +99
(lexical_declaration
  (variable_declarator
    name: (identifier) @name
    value: (arrow_function)))

P1: Capture JS/TS declarator functions beyond arrow syntax

The JavaScript/TypeScript function queries only register variable_declarator entries when the value is an arrow_function inside a lexical_declaration, so common forms like const fn = function() {} and var fn = () => {} never become symbols. In semantic mode those functions disappear from output, and in focused mode calls like fn() cannot resolve to a local definition because no node was created.


calls: r#"
    (call_expression function: (identifier) @name)
    (call_expression function: (member_expression property: (property_identifier) @name))
    (new_expression constructor: (identifier) @name)

P2: Match namespaced constructor calls in JS/TS queries

Constructor call extraction only matches new_expression with an identifier constructor, so new calls using member expressions (for example new sdk.Client()) are not captured. This drops a common JS/TS instantiation pattern and causes focused call graphs to miss incoming references to those constructors/classes.


@tlongwell-block tlongwell-block added this pull request to the merge queue Feb 27, 2026
Merged via the queue into main with commit 16be0cc Feb 27, 2026
18 of 21 checks passed
@tlongwell-block tlongwell-block deleted the tyler/analyze-platform-extension-pr branch February 27, 2026 21:22
lifeizhou-ap added a commit that referenced this pull request Mar 2, 2026
* main: (46 commits)
  chore(deps): bump minimatch from 10.1.1 to 10.2.3 in /evals/open-model-gym/suite (#7498)
  chore(deps): bump swiper from 11.2.10 to 12.1.2 in /documentation (#7368)
  Better network failure error & antrhopic retry (#7595)
  feat: make the text bar persistent and add a queue for messages (#7560)
  fix: outdated clippy command in goosehints (#7590)
  chore(deps): bump hono from 4.11.7 to 4.12.1 in /evals/open-model-gym/mcp-harness (#7417)
  chore(deps-dev): bump ajv from 6.12.6 to 6.14.0 in /ui/desktop (#7437)
  chore(deps): bump ajv from 8.17.1 to 8.18.0 in /evals/open-model-gym/mcp-harness (#7491)
  chore(deps): bump hono from 4.12.0 to 4.12.2 in /ui/desktop (#7515)
  chore(deps-dev): bump rollup from 4.57.1 to 4.59.0 in /ui/desktop (#7522)
  chore(deps): bump minimatch in /ui/desktop (#7572)
  fix: validate configure probe for streaming providers (#7564)
  Dockerfile: add missing build/runtime dependencies (#7546)
  fix(claude-code): Permission routing for smart-approve (#7501)
  Add base_path field to custom provider config (#7558)
  fix(cli): avoid debug logging by default in CLI (#7569)
  fix: panic on corrupted permission.yaml instead of silently allowing all (#7432) (#7458)
  fix(openai): handle null reasoning effort in Responses API (#7469)
  Allow GOOSE_NODE_DIR override in batch file (#7422)
  feat: add analyze platform extension with tree-sitter AST parsing (#7542)
  ...
craigwalkeruk pushed a commit to craigwalkeruk/custom-goose that referenced this pull request Mar 5, 2026
2 participants