Sandbox based MCP extensions #5899
michaelneale
started this conversation in
Ideas
Replies: 0 comments
It's been probably a year of MCP use in goose, and recently this excellent article by Anthropic, https://www.anthropic.com/engineering/code-execution-with-mcp (which follows on from https://blog.cloudflare.com/code-mode/), pointed out some of the challenges of scaling MCP tool usage, along with some pretty nice solutions.
@alexhancock and I were talking and thought this was worth a look:
The general idea is that there is an evaluation/sandbox/REPL (if you like) of some kind which is used to invoke APIs (MCP server tools in this case), instead of using the LLM to directly call tools when told. You can read more at https://www.anthropic.com/engineering/code-execution-with-mcp on how this can have massive savings in context, performance, and time to first token, with fewer round trips. And, of a LOT of interest to goose: being able to effectively work with basically all the extensions you could want turned on, potentially hundreds or thousands of tools.
Some broad strokes
Load each extension's tool definitions (perhaps with enabled:false, so the tools aren't exposed to the LLM directly) and then mechanically translate that into a directory layout with js/jsDoc describing each tool's API. The content of a specific tool file could follow a similar pattern to what Anthropic showed in the post above.
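As a rough sketch (the server name, tool name, schema, and `extensions/<server>/<tool>.js` layout are all hypothetical, not goose's actual structure), a generated tool file could look like:

```javascript
// extensions/github/create_issue.js -- hypothetical generated wrapper.
// In a real run the host injects callMCP; stub it here so the file
// runs stand-alone for illustration.
if (typeof globalThis.callMCP === "undefined") {
  globalThis.callMCP = (server, tool, args) => ({ server, tool, args });
}

/**
 * Create an issue in a GitHub repository.
 * @param {Object} args
 * @param {string} args.repo  - "owner/name" of the target repository
 * @param {string} args.title - issue title
 * @param {string} [args.body] - optional issue body (markdown)
 * @returns {Object} the created issue, as returned by the MCP server
 */
function createIssue(args) {
  // callMCP is the host-provided global that forwards to the real MCP server.
  return callMCP("github", "create_issue", args);
}
```

The jsDoc carries the tool's schema, so the model can read the file as documentation and then emit a plain function call rather than a structured tool call.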
There can be global functions available to the environment (like that callMCP) which literally call the MCP tool (same as goose does now, but from an execution environment rather than from the LLM returning tool calls).
The execution environment (based on some experiments, hence js) could be the Boa library (native Rust), which is a fairly lean, well-maintained, lightweight JavaScript execution environment (example snippet: https://gist.github.com/michaelneale/b4b35d2bec6f13e089c65e65ac5dab67 - https://github.com/boa-dev/boa); this doesn't seem to add a lot of weight to the app. There are tons of other options (all the way to compiling and running wasm for maximum isolation and speed).
With the above in place, goose, instead of using many tools, really just needs one or two (shell, really, or the developer extension, and even that could be trimmed down a lot) to explore the extensions directory when needed (the system prompt tells it to look there). That way it can progressively load these js definitions of the APIs.
When goose goes to invoke a tool, the LLM will really return a function call based on the tool API it wants to use, which goose will apply via an "eval" style tool call that runs in this sandbox environment (with the MCP tools and supporting global functions available), returning the results to the LLM as needed. This means it can call a single tool directly, discover a tool, or even write snippets (which could be persisted in a config dir adjacent to the extensions dir for future use) that chain together more complex calls and repetition (keeping it out of the agent loop). Should it fail to make coherent invocations, that error will be fed back so it can make correct calls from the "documentation".
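As a sketch of that kind of chained snippet (tool names and payloads are hypothetical), the model could return one program for goose to eval instead of making several separate tool-call round trips:

```javascript
// Hypothetical snippet an LLM might emit for a single eval-style tool call:
// list issues, filter for a label, comment on each match -- all inside
// the sandbox, with only a compact summary going back into the context.

// Stub callMCP with canned data so the snippet runs stand-alone;
// the real host injects a callMCP that talks to the MCP server.
if (typeof globalThis.callMCP === "undefined") {
  globalThis.callMCP = (server, tool, args) => {
    if (tool === "list_issues") {
      return [
        { number: 1, labels: ["bug"] },
        { number: 2, labels: ["docs"] },
      ];
    }
    return { ok: true, tool, args };
  };
}

const issues = callMCP("github", "list_issues", { repo: "block/goose" });
const bugs = issues.filter((i) => i.labels.includes("bug"));
for (const issue of bugs) {
  callMCP("github", "add_comment", {
    repo: "block/goose",
    issue: issue.number,
    body: "Triaged: confirmed as a bug.",
  });
}
// Only this result is returned to the LLM, not every raw payload.
const summary = `commented on ${bugs.length} bug(s)`;
```

The filtering and looping happen in the sandbox, which is where the context and round-trip savings come from.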
This wouldn't be a small change (though it may not be a huge amount of code), but it does radically change how MCP tools are loaded and used, and much more exploration is needed (there are other execution environments that could be tried as well).
It does all seem a bit surprising, but a lot of evidence points to something like this yielding good results even with small numbers of tools (and also scaling up). I think this could also help a wider variety of models (of different sizes), as they are expected to do less in terms of well-formed tool calls and syntax (and can stick to what they do well!).
(would love any thoughts as I think I may be missing some nuance from some of the original posts)