Summary
When tools (e.g. cron.add, cron.list) are called from within an active LLM session, the tool call opens a new WebSocket connection back to the same gateway that is busy processing the session's turn. Because the gateway is single-threaded Node.js, it cannot respond to the second WS request while it is blocked on the first, so the call fails with a 10-second timeout.
The job actually succeeds (confirmed by checking after timeout), but the tool returns an error to the LLM, which may cause unnecessary retries and duplicate jobs.
Reproduction
- Start a Clawdbot session (e.g., via Telegram)
- From within the session, call cron add or cron list
- Observe: WS connects, sends frame (cron.list), gateway never responds within 10s
- Error: gateway timeout after 10000ms
- But: the cron job IS created (check clawdbot cron list from CLI, which works fine)
Environment
- Clawdbot 2026.1.24-3
- Node 22.22.0
- Gateway bind: tailnet (single Tailscale IP)
- Linux (Docker, WSL2)
- Two instances running (separate containers, separate gateways)
Root Cause
The embedded tool runner opens a second WS connection to the gateway to execute cron.* operations. The gateway's event loop is busy processing the current LLM turn (waiting for API response + tool execution). The second WS request sits in the queue and is not processed within the timeout window.
Key evidence from logs:
→ close code=1005 reason= durationMs=10009 handshake=connected lastFrameType=req lastFrameMethod=cron.list
The WS connected, sent the frame, but the gateway never responded.
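One way to picture "sits in the queue" is a gateway that handles requests strictly one at a time. The sketch below is a hypothetical model, not Clawdbot code: `SerialGateway`, `withTimeout`, and `demo` are invented for illustration, and the timeout is shortened so the example runs quickly. Under that assumption it reproduces both symptoms: the tool call times out, yet the job is created once the turn ends.

```typescript
// Hypothetical model (not Clawdbot code): a gateway that runs one request at a time.
class SerialGateway {
  private tail: Promise<unknown> = Promise.resolve();
  readonly jobs: string[] = [];

  // Queue a handler behind whatever request is currently running.
  request<T>(handler: () => Promise<T>): Promise<T> {
    const run = this.tail.then(handler);
    this.tail = run.catch(() => undefined);
    return run;
  }
}

// Reject if `p` has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`gateway timeout after ${ms}ms`)),
      ms,
    );
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

async function demo(
  timeoutMs = 50,
): Promise<{ toolError: string | null; jobs: string[] }> {
  const gateway = new SerialGateway();
  let toolError: string | null = null;

  // The session turn occupies the gateway and issues the tool call from inside itself.
  await gateway.request(async () => {
    const cronAdd = gateway.request(async () => {
      gateway.jobs.push("my-job");
    });
    try {
      await withTimeout(cronAdd, timeoutMs); // cannot run until the turn ends
    } catch (err) {
      toolError = (err as Error).message;
    }
  });

  // After the turn finishes, the queued cron.add finally runs.
  await gateway.request(async () => undefined);
  return { toolError, jobs: gateway.jobs };
}
```

Calling `demo()` resolves with a timeout message in `toolError` and `"my-job"` in `jobs`, which matches the reported behavior of an error followed by a job that exists anyway.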
Workaround
Use the exec tool to call the CLI instead of the native tool:
clawdbot cron list
clawdbot cron add --name my-job --schedule '0 */6 * * *' ...
The CLI runs as a separate process with its own WS connection, with no active session blocking the event loop.
Proposed Fix
Option A (preferred): Route embedded tool calls through the existing session WS channel instead of opening a new connection. The session already has an active WS, so cron.* requests could be multiplexed on it.
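A minimal sketch of what multiplexing on one channel could look like, using request ids to match responses to callers. The frame shapes and the `MultiplexedChannel` class are assumptions made for illustration; the real gateway protocol may differ, and whether this avoids the stall depends on how the gateway schedules frames on a single connection.

```typescript
// Hypothetical frame shapes; the real gateway protocol may differ.
type ReqFrame = { type: "req"; id: number; method: string; params?: unknown };
type ResFrame = {
  type: "res";
  id: number;
  ok: boolean;
  payload?: unknown;
  error?: string;
};

type Pending = { resolve: (value: unknown) => void; reject: (err: Error) => void };

class MultiplexedChannel {
  private nextId = 1;
  private pending = new Map<number, Pending>();
  private send: (frame: ReqFrame) => void;

  // `send` stands in for writing a frame to the session's existing WebSocket.
  constructor(send: (frame: ReqFrame) => void) {
    this.send = send;
  }

  // Issue a request and wait for the response frame carrying the same id.
  call(method: string, params?: unknown): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.send({ type: "req", id, method, params });
    });
  }

  // Feed every response frame received on the shared socket into this method.
  onFrame(frame: ResFrame): void {
    const entry = this.pending.get(frame.id);
    if (!entry) return; // unknown or already-settled request
    this.pending.delete(frame.id);
    if (frame.ok) entry.resolve(frame.payload);
    else entry.reject(new Error(frame.error ?? "request failed"));
  }
}
```

Because responses are matched by id, several cron.* calls can be in flight on the same socket and may complete in any order.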
Option B (simpler): Use in-process function calls for tools that are part of the gateway itself (cron, gateway config, etc.) instead of going through WS at all.
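Option B could be sketched as a small dispatcher that checks for a local handler before falling back to WS. Everything here is hypothetical: `makeDispatcher`, the `Handler` type, and the `viaWebSocket` callback are illustrative names, not existing Clawdbot APIs.

```typescript
type Handler = (params: unknown) => Promise<unknown>;
type RemoteCall = (method: string, params: unknown) => Promise<unknown>;

// Hypothetical dispatcher: methods the gateway implements itself run in-process,
// everything else still goes over the WebSocket path.
function makeDispatcher(local: Map<string, Handler>, viaWebSocket: RemoteCall) {
  return async function callTool(
    method: string,
    params?: unknown,
  ): Promise<unknown> {
    const handler = local.get(method);
    return handler ? handler(params) : viaWebSocket(method, params);
  };
}
```

With cron.list and cron.add registered in `local`, those calls never open a second connection, so there is no WS request to time out.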
Option C (band-aid): Increase default timeout for embedded tool WS calls, or make it configurable per-tool.
Impact
- Affects any tool that routes through gateway WS from within a session
- Confirmed on cron.add, cron.list
- May affect gateway tool calls under heavy load
- Risk of duplicate cron jobs if LLM retries on false timeout error