[Bug]: nested lane hardcoded to maxConcurrent: 1 — sessions_send broadcasts cause cascading timeouts #14214

@sprfrkr

Summary

In multi-agent setups (ours has 10 agents), all sessions_send calls route through the nested command lane, which has a hardcoded maxConcurrent: 1. When an agent broadcasts to N other agents via sequential sessions_send calls, the queue grows linearly with N and each call blocks for up to 30 seconds waiting for its turn. Agents later in the queue reliably hit the 30-second timeout.

For example, with 10 agents, a broadcast to 9 agents takes up to 270 seconds wall-clock (9 × 30s serial), and agents at positions 3+ in the queue always see Session Send: agent:<id>:main failed: timeout.
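To illustrate why a limit of 1 serializes the whole broadcast, here is a minimal sketch of a concurrency-limited lane. It is a toy model, not OpenClaw's implementation: the Lane class, the broadcast helper, and the no-op send are all hypothetical, and the model enqueues every send up front rather than sequentially.

```typescript
// Toy model of a command lane: at most `maxConcurrent` tasks run at once,
// the rest wait in FIFO order. Hypothetical; not OpenClaw's actual code.
class Lane {
  private readonly maxConcurrent: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  peakActive = 0; // highest number of tasks running at the same time
  peakQueued = 0; // highest number of tasks waiting for a slot

  constructor(maxConcurrent: number) {
    this.maxConcurrent = maxConcurrent;
  }

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = (): void => {
        this.active += 1;
        this.peakActive = Math.max(this.peakActive, this.active);
        // Free the slot and start the next waiter once the task settles.
        const release = (): void => {
          this.active -= 1;
          const next = this.waiting.shift();
          if (next) next();
        };
        task().then(
          (value) => { release(); resolve(value); },
          (error) => { release(); reject(error); },
        );
      };
      if (this.active < this.maxConcurrent) {
        start();
      } else {
        this.waiting.push(start);
        this.peakQueued = Math.max(this.peakQueued, this.waiting.length);
      }
    });
  }
}

// Simulate one agent broadcasting to `recipients` other agents.
async function broadcast(maxConcurrent: number, recipients: number) {
  const lane = new Lane(maxConcurrent);
  const send = async (): Promise<void> => {
    await Promise.resolve(); // stand-in for one nested agent run
  };
  await Promise.all(Array.from({ length: recipients }, () => lane.enqueue(send)));
  return { peakActive: lane.peakActive, peakQueued: lane.peakQueued };
}
```

In this model, broadcast(1, 9) never has more than one send in flight and leaves eight waiting, while broadcast(4, 9) runs four at a time and leaves five waiting.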

Steps to reproduce

  1. Set up 3+ agents with agentToAgent enabled
  2. Have one agent use sessions_send to broadcast a message to all other agents sequentially
  3. Observe the nested lane queue growing in the gateway logs:
    lane enqueue: lane=nested queueSize=2
    lane enqueue: lane=nested queueSize=3
    lane enqueue: lane=nested queueSize=4
    ...
    
  4. Agents later in the queue will get timeout errors

Expected behavior

The nested lane concurrency should be configurable via openclaw.json, similar to how agents.defaults.maxConcurrent (Main lane) and agents.defaults.subagents.maxConcurrent (Subagent lane) work today.

For example:

{
  "agents": {
    "defaults": {
      "maxConcurrent": 8,
      "subagents": { "maxConcurrent": 12 },
      "nestedMaxConcurrent": 4
    }
  }
}

Or alternatively via a dedicated sessions config key:

{
  "agents": {
    "defaults": {
      "sessions": { "maxConcurrent": 4 }
    }
  }
}
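Either key could be read by a small resolver in the style of the existing resolveAgentMaxConcurrent / resolveSubagentMaxConcurrent helpers. The sketch below is only a suggestion: resolveNestedMaxConcurrent, the AgentsConfig interface, and the precedence between the two keys are assumptions, not existing OpenClaw API.

```typescript
// Hypothetical config shape covering both proposals from this issue.
interface AgentsConfig {
  agents?: {
    defaults?: {
      maxConcurrent?: number;
      subagents?: { maxConcurrent?: number };
      nestedMaxConcurrent?: number; // proposal 1
      sessions?: { maxConcurrent?: number }; // proposal 2
    };
  };
}

// Falling back to 1 keeps today's behavior when nothing is configured.
const DEFAULT_NESTED_MAX_CONCURRENT = 1;

// Hypothetical resolver: prefers nestedMaxConcurrent, then sessions.maxConcurrent.
function resolveNestedMaxConcurrent(cfg: AgentsConfig): number {
  const defaults = cfg.agents?.defaults;
  const raw = defaults?.nestedMaxConcurrent ?? defaults?.sessions?.maxConcurrent;
  if (typeof raw !== "number" || !Number.isFinite(raw)) {
    return DEFAULT_NESTED_MAX_CONCURRENT;
  }
  // Clamp to a positive integer so a bad value cannot stall the lane.
  return Math.max(1, Math.floor(raw));
}
```

Defaulting to 1 means existing installs see no change unless they opt in.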

Actual behavior

The nested lane is always maxConcurrent: 1, regardless of configuration. The applyGatewayLaneConcurrency() function in src/gateway/server-lanes.ts sets concurrency for Main, Cron, and Subagent lanes but omits Nested:

function applyGatewayLaneConcurrency(cfg) {
  setCommandLaneConcurrency(CommandLane.Cron, cfg.cron?.maxConcurrentRuns ?? 1);
  setCommandLaneConcurrency(CommandLane.Main, resolveAgentMaxConcurrent(cfg));
  setCommandLaneConcurrency(CommandLane.Subagent, resolveSubagentMaxConcurrent(cfg));
  // Missing: setCommandLaneConcurrency(CommandLane.Nested, ???);
}

The same omission exists in the hot-reload path where lane concurrency is re-applied after config changes.
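A possible shape for the fix, sketched with stand-ins so it runs on its own. The CommandLane object, the laneLimits map, the GatewayConfig interface, the fallback values, and the nestedMaxConcurrent key are all placeholders; only the lane name "nested" comes from the logs, and the real resolver helpers are replaced by inline lookups.

```typescript
// Stand-ins for the real gateway symbols. Lane names other than "nested"
// are guesses for illustration only.
const CommandLane = {
  Main: "main",
  Cron: "cron",
  Subagent: "subagent",
  Nested: "nested",
} as const;
type LaneName = (typeof CommandLane)[keyof typeof CommandLane];

// Records the limit applied to each lane (the real function configures the queue).
const laneLimits = new Map<LaneName, number>();
function setCommandLaneConcurrency(lane: LaneName, max: number): void {
  laneLimits.set(lane, Math.max(1, Math.floor(max)));
}

// Hypothetical config shape; nestedMaxConcurrent is the proposed new key.
interface GatewayConfig {
  cron?: { maxConcurrentRuns?: number };
  agents?: {
    defaults?: {
      maxConcurrent?: number;
      subagents?: { maxConcurrent?: number };
      nestedMaxConcurrent?: number;
    };
  };
}

function applyGatewayLaneConcurrency(cfg: GatewayConfig): void {
  const defaults = cfg.agents?.defaults;
  setCommandLaneConcurrency(CommandLane.Cron, cfg.cron?.maxConcurrentRuns ?? 1);
  // Fallbacks of 1 below are placeholders, not OpenClaw's real defaults.
  setCommandLaneConcurrency(CommandLane.Main, defaults?.maxConcurrent ?? 1);
  setCommandLaneConcurrency(CommandLane.Subagent, defaults?.subagents?.maxConcurrent ?? 1);
  // The missing call: apply the configured limit to the nested lane too.
  setCommandLaneConcurrency(CommandLane.Nested, defaults?.nestedMaxConcurrent ?? 1);
}
```

If the hot-reload path called this same function instead of repeating the individual calls, the nested lane could not be left out in one place and not the other.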

Environment

  • OpenClaw version: 2026.2.9
  • OS: macOS 15.3.1 (Sequoia), Apple Silicon Mac Mini
  • Install method: npm global (npm install -g openclaw)
  • Node.js: v22.22.0

Logs or screenshots

Gateway diagnostic logs showing the queue growth during a broadcast:

17:47:40 lane enqueue: lane=nested queueSize=2
17:47:47 lane enqueue: lane=nested queueSize=3
17:48:10 lane enqueue: lane=nested queueSize=4
17:48:40 lane enqueue: lane=nested queueSize=5
17:49:10 lane enqueue: lane=nested queueSize=6
17:49:40 lane enqueue: lane=nested queueSize=7

Each sessions_send tool call returns after ~30 seconds with a timeout status, even though the message is eventually delivered.
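For anyone checking their own logs, a small helper like the one below can compute the gaps between enqueue events. It is a rough sketch that assumes the exact line format shown above; parseEnqueue and enqueueGaps are hypothetical names, not part of OpenClaw.

```typescript
interface EnqueueEvent {
  seconds: number; // time of day in seconds
  queueSize: number;
}

// Parse one "HH:MM:SS lane enqueue: lane=nested queueSize=N" line.
function parseEnqueue(line: string): EnqueueEvent | null {
  const m = /^(\d{2}):(\d{2}):(\d{2}) lane enqueue: lane=nested queueSize=(\d+)$/.exec(line.trim());
  if (!m) return null;
  const seconds = Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
  return { seconds, queueSize: Number(m[4]) };
}

// Seconds elapsed between consecutive nested-lane enqueue events.
function enqueueGaps(lines: string[]): number[] {
  const events = lines
    .map(parseEnqueue)
    .filter((e): e is EnqueueEvent => e !== null);
  return events.slice(1).map((e, i) => e.seconds - events[i].seconds);
}

const sample = [
  "17:47:40 lane enqueue: lane=nested queueSize=2",
  "17:47:47 lane enqueue: lane=nested queueSize=3",
  "17:48:10 lane enqueue: lane=nested queueSize=4",
  "17:48:40 lane enqueue: lane=nested queueSize=5",
  "17:49:10 lane enqueue: lane=nested queueSize=6",
  "17:49:40 lane enqueue: lane=nested queueSize=7",
];

const gaps = enqueueGaps(sample); // [7, 23, 30, 30, 30]
```

After the first two sends, the gaps settle at 30 seconds, which matches each call waiting out the full timeout before the next one is issued.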

Workaround

We're patching the bundled JS to inject setCommandLaneConcurrency(CommandLane.Nested, 4) in both places where lane concurrency is applied: applyGatewayLaneConcurrency() and the hot-reload path. The patch has to be re-applied after every update.

Labels: bug, stale
