Parallel Subagent Execution is Serialized #7201
Description
Bug Summary
When the LLM issues multiple tool calls in a single response that target the same extension (e.g., two subagent calls, or two developer__shell calls), they execute sequentially instead of in parallel, despite the architecture intending to support concurrent execution.
Empirical Evidence
Two subagents were dispatched in the same LLM response, each performing a simple 3-second sleep:
| Subagent | Start | End |
|---|---|---|
| A | 20:29:27 | 20:29:30 |
| B | 20:29:37 | 20:29:40 |
Subagent B started 10 seconds after A started, and 7 seconds after A finished. If execution were parallel, both would have started within ~1 second of each other.
Root Cause
The `McpClientBox` type in `crates/goose/src/agents/extension_manager.rs` wraps the MCP client in an `Arc<Mutex<...>>`:

```rust
type McpClientBox = Arc<Mutex<Box<dyn McpClientTrait>>>;
```

In `dispatch_tool_call` (line ~1295), the lazy future acquires this mutex and holds it for the entire duration of the tool execution:

```rust
let fut = async move {
    let client_guard = client.lock().await; // ← ACQUIRES MUTEX
    client_guard
        .call_tool( // ← HOLDS MUTEX FOR ENTIRE EXECUTION
            &session_id,
            &actual_tool_name,
            arguments,
            working_dir_str.as_deref(),
            cancellation_token,
        )
        .await // ← ONLY RELEASES WHEN TOOL COMPLETES
};
```

Since all tool calls to the same extension share the same `Arc<Mutex<...>>` (via `get_client()`, which clones the `Arc`), the second future blocks on `client.lock().await` until the first future completes.
The downstream consumer (`stream::select_all` in `agent.rs`, line ~1270) correctly polls the futures concurrently; the serialization happens purely because of this mutex.
Scope of Impact
This affects all parallel tool calls to the same extension, not just subagents:
- Two `developer__shell` commands → serialized
- Two `computercontroller__web_scrape` calls → serialized
- Two `subagent` calls → serialized
- Any pair of tools from the same MCP extension → serialized

Tools from different extensions run in parallel as expected (separate `McpClientBox` instances).
Architecture Flow
```
LLM Response: [tool_call_A, tool_call_B]  (both to same extension)
      │
      ▼
handle_approved_and_denied_tools
      │ (creates lazy futures - correct)
      ▼
stream::select_all  (polls both futures concurrently - correct)
      │
      ├── Future A: client.lock().await  ← gets lock ✅
      │       └── call_tool().await      ← holds lock for duration 🐛
      │
      └── Future B: client.lock().await  ← BLOCKED waiting for A 🐛
              └── call_tool().await      ← only runs after A finishes
```
Possible Fix Directions
1. **Channel-based multiplexing**: The MCP protocol already supports concurrent requests over a single transport (stdio/SSE); the mutex is artificially serializing what the protocol can handle natively. Refactoring to send requests over a channel and match responses by ID would be the most correct fix.
2. **`RwLock` instead of `Mutex`**: If `call_tool` doesn't mutate client state, an `RwLock` would allow concurrent readers.
3. **Connection pool**: Spawn multiple MCP client connections per extension, allowing true parallel execution.
4. **Clone client per call**: If the MCP client can be cheaply cloned, each future gets its own instance.

Option 1 is likely the most architecturally correct approach.
Relevant Files
- `crates/goose/src/agents/extension_manager.rs`: `McpClientBox` type (line 52), `dispatch_tool_call` (line ~1276)
- `crates/goose/src/agents/agent.rs`: `handle_approved_and_denied_tools` (line 351), `stream::select_all` (line ~1270)
- `crates/goose/src/agents/tool_execution.rs`: `ToolCallResult` struct (line 18)