
Parallel Subagent Execution is Serialized #7201

@angiejones

Description


Bug Summary

When the LLM issues multiple tool calls in a single response that target the same extension (e.g., two subagent calls, or two developer__shell calls), they execute sequentially instead of in parallel, even though the architecture is designed to support concurrent execution.

Empirical Evidence

Two subagents were dispatched in the same LLM response, each performing a simple 3-second sleep:

Subagent   Start      End
A          20:29:27   20:29:30
B          20:29:37   20:29:40

Subagent B started 10 seconds after A, and 7 seconds after A finished. If parallel, both would have started within ~1 second of each other.

Root Cause

The McpClientBox type in crates/goose/src/agents/extension_manager.rs wraps the MCP client in an Arc<Mutex<...>>:

type McpClientBox = Arc<Mutex<Box<dyn McpClientTrait>>>;

In dispatch_tool_call (line ~1295), the lazy future acquires this mutex and holds it for the entire duration of the tool execution:

let fut = async move {
    let client_guard = client.lock().await;   // ← ACQUIRES MUTEX
    client_guard
        .call_tool(                            // ← HOLDS MUTEX FOR ENTIRE EXECUTION
            &session_id,
            &actual_tool_name,
            arguments,
            working_dir_str.as_deref(),
            cancellation_token,
        )
        .await                                 // ← ONLY RELEASES WHEN TOOL COMPLETES
};

Since all tool calls to the same extension share the same Arc<Mutex<...>> (via get_client() which clones the Arc), the second future blocks on client.lock().await until the first future completes.
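The serializing effect of holding a shared lock for the full duration of each call can be reproduced in isolation. The sketch below is a simplified analogue (std threads and std::sync::Mutex standing in for tokio tasks and tokio::sync::Mutex; the function name and timings are invented for illustration): two 100 ms "calls" under one shared lock take roughly 200 ms wall-clock, just as the two subagents above ran back to back.

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical analogue of dispatch_tool_call: each "call" acquires the
// shared lock and holds it for the entire duration of the work, so two
// concurrent callers are forced to run one after the other.
fn run_two_calls_holding_lock(work: Duration) -> Duration {
    let client = Arc::new(Mutex::new(())); // stands in for McpClientBox
    let start = Instant::now();
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let client = Arc::clone(&client);
            thread::spawn(move || {
                let _guard = client.lock().unwrap(); // acquires the shared lock
                thread::sleep(work);                 // "call_tool" runs under the lock
            })                                       // lock released only here
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let elapsed = run_two_calls_holding_lock(Duration::from_millis(100));
    // Two 100 ms calls under one lock take ~200 ms: fully serialized.
    println!("two 100 ms calls under one shared lock took {elapsed:?}");
}
```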

The downstream consumer (stream::select_all in agent.rs line ~1270) correctly polls futures concurrently — the serialization happens purely because of this mutex.

Scope of Impact

This affects all parallel tool calls to the same extension, not just subagents:

  • Two developer__shell commands → serialized
  • Two computercontroller__web_scrape calls → serialized
  • Two subagent calls → serialized
  • Any pair of tools from the same MCP extension → serialized

Tools from different extensions run in parallel as expected (separate McpClientBox instances).
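For contrast, the same sketch with one lock per "extension" (again a std-thread analogue with invented names and timings, not the real code) finishes in roughly the time of a single call, matching the observed behavior for tools from different extensions:

```rust
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical analogue of two calls to *different* extensions: each call
// gets its own lock (its own McpClientBox), so nothing blocks.
fn run_two_calls_separate_locks(work: Duration) -> Duration {
    let start = Instant::now();
    let handles: Vec<_> = (0..2)
        .map(|_| {
            let client = Arc::new(Mutex::new(())); // a fresh lock per "extension"
            thread::spawn(move || {
                let _guard = client.lock().unwrap(); // uncontended
                thread::sleep(work);
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let elapsed = run_two_calls_separate_locks(Duration::from_millis(100));
    // Both calls overlap, so total time is ~100 ms, not ~200 ms.
    println!("two 100 ms calls with separate locks took {elapsed:?}");
}
```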

Architecture Flow

LLM Response: [tool_call_A, tool_call_B]  (both to same extension)
        │
        ▼
handle_approved_and_denied_tools
        │  (creates lazy futures - correct)
        ▼
stream::select_all  (polls both futures concurrently - correct)
        │
        ├── Future A: client.lock().await  ← gets lock ✅
        │       └── call_tool().await      ← holds lock for duration 🐛
        │
        └── Future B: client.lock().await  ← BLOCKED waiting for A 🐛
                └── call_tool().await      ← only runs after A finishes

Possible Fix Directions

  1. Channel-based multiplexing: The MCP protocol already supports concurrent requests over a single transport (stdio/SSE). The mutex is artificially serializing what the protocol can handle natively. Refactoring to send requests over a channel and match responses by ID would be the most correct fix.

  2. RwLock instead of Mutex: If call_tool doesn't mutate client state, RwLock would allow concurrent readers.

  3. Connection pool: Spawn multiple MCP client connections per extension, allowing true parallel execution.

  4. Clone client per call: If the MCP client can be cheaply cloned, each future gets its own instance.

Option 1 is likely the most architecturally correct approach.
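A minimal shape for option 1 might look like the following. This is a sketch, not the real client: Request, spawn_transport, and call are invented names, std::sync::mpsc stands in for the async channel, and the transport thread completes requests immediately instead of writing to stdio and reading responses back. The point is the structure: one owner of the transport, callers that never hold a shared lock, and responses routed back to callers by request id (as JSON-RPC, which MCP is built on, already allows).

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Hypothetical request envelope; real MCP messages carry JSON-RPC ids.
struct Request {
    id: u64,
    payload: String,
    reply: mpsc::Sender<String>,
}

// One background thread owns the transport. Callers just send on a channel,
// so in-flight requests can overlap instead of queuing on a mutex.
fn spawn_transport() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        // Pending requests keyed by id, so responses can arrive in any order.
        let mut pending: HashMap<u64, mpsc::Sender<String>> = HashMap::new();
        for req in rx {
            pending.insert(req.id, req.reply);
            // A real transport would write the request to stdio here and
            // complete entries later as responses arrive; this stand-in
            // completes immediately, still routing the reply by id.
            if let Some(reply) = pending.remove(&req.id) {
                let _ = reply.send(format!("result:{}:{}", req.id, req.payload));
            }
        }
    });
    tx
}

// Each caller gets a private reply channel; no shared lock is held.
fn call(tx: &mpsc::Sender<Request>, id: u64, payload: &str) -> String {
    let (reply_tx, reply_rx) = mpsc::channel();
    tx.send(Request { id, payload: payload.to_string(), reply: reply_tx })
        .unwrap();
    reply_rx.recv().unwrap()
}

fn main() {
    let tx = spawn_transport();
    let a = call(&tx, 1, "tool_a");
    let b = call(&tx, 2, "tool_b");
    println!("{a}\n{b}");
}
```

In the real fix the reply channels would be oneshot senders and the pending map would live in the transport's read loop, but the caller-side contract is the same: sending a request is cheap and never blocks on another tool's execution.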

Relevant Files

  • crates/goose/src/agents/extension_manager.rs: McpClientBox type (line 52), dispatch_tool_call (line ~1276)
  • crates/goose/src/agents/agent.rs: handle_approved_and_denied_tools (line 351), stream::select_all (line ~1270)
  • crates/goose/src/agents/tool_execution.rs: ToolCallResult struct (line 18)

Metadata

Labels: p1 (Priority 1 - High, supports roadmap)