Tool calls from active session timeout due to WS self-contention #6508

@amco3008

Summary

When a tool (e.g. cron.add, cron.list) is called from within an active LLM session, the tool call opens a new WebSocket (WS) connection back to the same gateway that is currently busy processing the session's turn. Because the gateway is a single-threaded Node.js process, it cannot respond to the second WS request while blocked on the first, and the call fails with a 10-second timeout.

The job actually succeeds (confirmed by checking after timeout), but the tool returns an error to the LLM, which may cause unnecessary retries and duplicate jobs.

Reproduction

  1. Start a Clawdbot session (e.g., via Telegram)
  2. From within the session, call the cron.add or cron.list tool
  3. Observe: WS connects, sends frame (cron.list), gateway never responds within 10s
  4. Error: gateway timeout after 10000ms
  5. But: the cron job IS created (check with clawdbot cron list from the CLI, which works fine)

Environment

  • Clawdbot 2026.1.24-3
  • Node 22.22.0
  • Gateway bind: tailnet (single Tailscale IP)
  • Linux (Docker, WSL2)
  • Two instances running (separate containers, separate gateways)

Root Cause

The embedded tool runner opens a second WS connection to the gateway to execute cron.* operations. The gateway's event loop is busy processing the current LLM turn (waiting for the API response plus tool execution), so the second WS request sits in the queue and is not answered within the timeout window.

Key evidence from logs:

→ close code=1005 reason= durationMs=10009 handshake=connected lastFrameType=req lastFrameMethod=cron.list

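The WS connected (handshake=connected) and sent the cron.list request frame (lastFrameType=req), but the connection closed after about 10 seconds (durationMs=10009) without a response from the gateway.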
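To illustrate the mechanism described above, here is a minimal sketch of how a blocked Node.js event loop delays a queued callback. It is not Clawdbot code: `blockEventLoop` is a hypothetical stand-in for whatever synchronous work holds the loop during a turn, and the zero-delay timer stands in for the gateway answering the second WS request. Note that merely awaiting an API response does not block the loop; the sketch assumes some synchronous work is involved.

```typescript
// Hypothetical stand-in for synchronous work done while a turn is processed.
function blockEventLoop(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: no timers or socket callbacks can run during this loop.
  }
}

const gatewayState = { responded: false };

// Stand-in for the gateway answering the second WS request.
// The callback is queued right away, but it can only run once the
// currently executing synchronous code has finished.
setTimeout(() => {
  gatewayState.responded = true;
}, 0);

blockEventLoop(50);

// Captured while the synchronous code is still running, so the queued
// callback has not had a chance to execute yet.
const respondedWhileBlocked: boolean = gatewayState.responded;
console.log(`responded while blocked: ${respondedWhileBlocked}`);
```

If the blocking period exceeds the client's timeout, the client gives up before the callback ever runs, which would match the log line above.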

Workaround

Use the exec tool to call the CLI instead of the native tool:

clawdbot cron list
clawdbot cron add --name my-job --schedule '0 */6 * * *' ...

The CLI runs as a separate process with its own WS connection, and no active session is blocking the event loop.

Proposed Fix

Option A (preferred): Route embedded tool calls through the existing session WS channel instead of opening a new connection. The session already has an active WS, so cron.* requests could be multiplexed on it.
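A rough sketch of what multiplexing in Option A could look like: each request gets an id, and responses arriving on the shared connection are matched back to the pending request by that id. The frame shapes, `SessionChannel`, and its methods are hypothetical and chosen for illustration; the real gateway protocol may differ.

```typescript
// Hypothetical frame shapes; the real gateway protocol may differ.
interface RequestFrame {
  type: "req";
  id: number;
  method: string;
  params?: unknown;
}

interface ResponseFrame {
  type: "res";
  id: number;
  ok: boolean;
  payload?: unknown;
  error?: string;
}

interface PendingEntry {
  resolve: (value: unknown) => void;
  reject: (error: Error) => void;
}

class SessionChannel {
  private nextId = 1;
  private pending = new Map<number, PendingEntry>();
  private send: (frame: RequestFrame) => void;

  // `send` writes a frame to the session's already-open WS connection.
  constructor(send: (frame: RequestFrame) => void) {
    this.send = send;
  }

  // Registers a pending entry and sends the request on the shared connection.
  request(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send({ type: "req", id, method, params });
    });
  }

  // Call for every response frame received on the shared connection.
  // Returns false when the id is unknown or was already settled.
  handleResponse(frame: ResponseFrame): boolean {
    const entry = this.pending.get(frame.id);
    if (!entry) return false;
    this.pending.delete(frame.id);
    if (frame.ok) entry.resolve(frame.payload);
    else entry.reject(new Error(frame.error ?? "request failed"));
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}

// Usage with an in-memory stand-in for the socket.
const sentFrames: RequestFrame[] = [];
const channel = new SessionChannel((frame) => {
  sentFrames.push(frame);
});
const listRequest = channel.request("cron.list");
listRequest.then((jobs) => console.log("cron.list result:", jobs));

// Later, when the gateway's reply arrives on the same connection:
channel.handleResponse({ type: "res", id: sentFrames[0].id, ok: true, payload: [] });
```

Because no second connection is opened, there is no second handshake competing with the active session.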

Option B (simpler): Use in-process function calls for tools that are part of the gateway itself (cron, gateway config, etc.) instead of going through WS at all.
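Option B could be sketched as a small in-process dispatcher that maps method names to direct function calls. `CronService`, `createLocalDispatcher`, and the job shape are hypothetical names for illustration only, not the gateway's actual internals.

```typescript
interface CronJob {
  name: string;
  schedule: string;
}

// Hypothetical in-memory cron service living inside the gateway process.
class CronService {
  private jobs: CronJob[] = [];

  add(job: CronJob): CronJob {
    this.jobs.push(job);
    return job;
  }

  list(): CronJob[] {
    return [...this.jobs];
  }
}

type Handler = (params: unknown) => unknown;

// Builds a dispatcher that calls gateway-owned services directly,
// with no WS round trip involved.
function createLocalDispatcher(
  cron: CronService,
): (method: string, params?: unknown) => unknown {
  const handlers = new Map<string, Handler>([
    ["cron.add", (params) => cron.add(params as CronJob)],
    ["cron.list", () => cron.list()],
  ]);
  return (method, params) => {
    const handler = handlers.get(method);
    if (!handler) throw new Error(`no in-process handler for ${method}`);
    return handler(params);
  };
}

const dispatchLocal = createLocalDispatcher(new CronService());
dispatchLocal("cron.add", { name: "my-job", schedule: "0 */6 * * *" });
const listedJobs = dispatchLocal("cron.list") as CronJob[];
console.log(`jobs after add: ${listedJobs.length}`);
```

Methods without a local handler could still fall back to the WS path, so only gateway-owned tools would need to change.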

Option C (band-aid): Increase default timeout for embedded tool WS calls, or make it configurable per-tool.
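For Option C, a per-tool timeout lookup might look like the sketch below. The config shape and key names (`defaultMs`, `perMethodMs`, the `cron.*` wildcard) are hypothetical and are not existing Clawdbot settings; the override values are arbitrary examples.

```typescript
// Hypothetical config shape; key names are illustrative only.
interface ToolTimeoutConfig {
  defaultMs: number;
  perMethodMs?: Record<string, number>;
}

// Resolution order: exact method, then namespace wildcard, then default.
function resolveTimeoutMs(method: string, config: ToolTimeoutConfig): number {
  const exact = config.perMethodMs?.[method];
  if (exact !== undefined) return exact;
  const namespace = method.split(".")[0];
  const wildcard = config.perMethodMs?.[`${namespace}.*`];
  return wildcard ?? config.defaultMs;
}

const timeoutConfig: ToolTimeoutConfig = {
  defaultMs: 10_000, // the 10000ms timeout reported in this issue
  perMethodMs: { "cron.*": 30_000, "cron.list": 15_000 }, // arbitrary examples
};

console.log(resolveTimeoutMs("cron.add", timeoutConfig));
```

A longer timeout only hides the contention and does not remove it, which is why this option is listed as a band-aid.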

Impact

  • Affects any tool that routes through gateway WS from within a session
  • Confirmed on cron.add, cron.list
  • May affect gateway tool calls under heavy load
  • Risk of duplicate cron jobs if LLM retries on false timeout error
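
Until the contention is fixed, the duplicate-job risk could be reduced on the caller side by listing jobs after a timeout before retrying. The sketch below assumes a job is identified by its name and schedule; `decideAfterTimeout` and the `CronJob` shape are hypothetical and not part of Clawdbot.

```typescript
interface CronJob {
  name: string;
  schedule: string;
}

type RetryDecision = "already-created" | "retry";

// After a cron.add timeout, compare the wanted job against the current
// job list and retry only when no matching job exists.
function decideAfterTimeout(existing: CronJob[], wanted: CronJob): RetryDecision {
  const found = existing.some(
    (job) => job.name === wanted.name && job.schedule === wanted.schedule,
  );
  return found ? "already-created" : "retry";
}

const wantedJob: CronJob = { name: "my-job", schedule: "0 */6 * * *" };
console.log(decideAfterTimeout([wantedJob], wantedJob));
```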
