[Bug]: CLI gateway handshake timeout on WSL2 — 2026.3.13 regression #51879

@markvshaney

Bug type

Regression (worked before, now fails)

Summary

After upgrading to 2026.3.13, all CLI-to-gateway commands (gateway health, devices list, etc.) fail with "gateway closed (1000 normal closure)" on WSL2. The gateway server log shows "handshake timeout" after 3s. The browser Control UI connects successfully to the same gateway with the same token — only the CLI device-identity handshake path is affected.

Steps to reproduce

  1. Install OpenClaw 2026.3.13 on WSL2 Ubuntu (Windows 11 25H2).
  2. Configure gateway with gateway.auth.mode = "token" in ~/.openclaw/openclaw.json.
  3. Start gateway via systemctl --user start openclaw-gateway.service.
  4. Confirm browser Control UI connects successfully at http://127.0.0.1:18789/.
  5. Run: openclaw gateway health --url ws://127.0.0.1:18789 --token "<token>"
  6. Observe: gateway closed (1000 normal closure): no close reason
  7. Gateway log shows: [ws] handshake timeout after ~3s.

Expected behavior

openclaw gateway health should complete the device-identity handshake and return OK, as it did in prior versions and as the browser Control UI does on the same gateway.

Actual behavior

The CLI opens the WebSocket but never completes the auth handshake within the 3s server-side timeout (DEFAULT_HANDSHAKE_TIMEOUT_MS = 3e3 in gateway-cli-CuZs0RlJ.js). The gateway logs [ws] handshake timeout and closes with code 1000. With the timeout extended (see the workaround below), the handshake completes in roughly 1100-1200ms on WSL2 loopback; with the default 3s timeout, the full device-identity path does not finish in time. The browser Control UI is unaffected because it uses the operator-only device-auth bypass introduced in 2026.3.13.

OpenClaw version

2026.3.13 (61d171a)

Operating system

Windows 11 25H2 — WSL2 Ubuntu

Install method

npm global

Model

ollama/glm-4.7-flash

Provider / routing chain

openclaw -> ollama (local, via Tailscale to maingear-pc)

Additional provider/model setup details

This bug is not model or provider related. It affects the CLI-to-gateway WebSocket device-identity handshake before any model routing occurs. The gateway auth mode is shared token (gateway.auth.mode = "token"). Node v22.22.1 via nvm. Gateway runs as a systemd user service inside WSL2.

Workaround: Setting VITEST=1 and OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS=10000 on both the gateway systemd service and CLI invocation allows the handshake to complete (~1166ms).

Root cause: DEFAULT_HANDSHAKE_TIMEOUT_MS is hardcoded to 3000ms in gateway-cli-CuZs0RlJ.js. The 2026.3.13 loopback device-identity bypass only applies when process.platform === 'win32', so it misses WSL2, where process.platform is 'linux'.

Suggested fixes:

  1. Detect WSL2 via /proc/version and apply the loopback bypass.
  2. Expose handshakeTimeoutMs as a user config under gateway.websocket.
  3. Increase DEFAULT_HANDSHAKE_TIMEOUT_MS to 5000ms — 3s is too tight for WSL2/Docker loopback.
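
For suggested fix 1, here is a minimal sketch of what WSL detection could look like. The helper names (looksLikeWsl, readProcVersion, isWindowsLikeHost) are hypothetical and not taken from the OpenClaw source. I am also assuming the bypass condition is a simple platform check. WSL kernels put "microsoft" in /proc/version, and WSL usually sets WSL_DISTRO_NAME / WSL_INTEROP as well, but a systemd user service may not inherit those variables, so the /proc/version check is the one that matters here.

```javascript
// Hypothetical helpers; names and structure are assumptions, not OpenClaw code.
const fs = require('node:fs');

// True when the kernel version string or the environment looks like WSL.
function looksLikeWsl(procVersion, env = {}) {
  if (env.WSL_DISTRO_NAME || env.WSL_INTEROP) return true;
  return /microsoft/i.test(procVersion);
}

// Reads /proc/version; returns '' where the file is missing (macOS, native Windows).
function readProcVersion(path = '/proc/version') {
  try {
    return fs.readFileSync(path, 'utf8');
  } catch {
    return '';
  }
}

// Assumed shape of the bypass condition: native Windows, or Linux running under WSL.
function isWindowsLikeHost(
  platform = process.platform,
  procVersion = readProcVersion(),
  env = process.env,
) {
  if (platform === 'win32') return true;
  return platform === 'linux' && looksLikeWsl(procVersion, env);
}
```

The existing process.platform === 'win32' check could then be swapped for something like isWindowsLikeHost(), assuming the bypass is otherwise safe to apply on WSL2 loopback.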

Logs, screenshots, and evidence

**CLI output:**

$ openclaw gateway health --url ws://127.0.0.1:18789 --token "<redacted>"
gateway connect failed: Error: gateway closed (1000):
Error: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Source: cli --url
Config: /home/bh/.openclaw/openclaw.json


**Gateway server log (journalctl):**

[ws] handshake timeout conn=911ee1ca-07c1-4c15-992f-24c628116066 remote=127.0.0.1
[ws] closed before connect conn=911ee1ca-07c1-4c15-992f-24c628116066 remote=127.0.0.1 fwd=n/a origin=n/a host=127.0.0.1:18789 ua=n/a code=1000 reason=n/a


**Structured log (JSON) confirms:**

{"cause":"handshake-timeout","handshake":"failed","durationMs":9842,"host":"127.0.0.1:18789","handshakeMs":3003}


**Browser Control UI connects successfully to the same gateway at the same time** — server log shows:

[gateway] device pairing auto-approved device=<id> role=operator
[ws]

Impact and severity

Affected: All WSL2 users on 2026.3.13 using CLI commands against a local gateway
Severity: High (blocks all CLI-to-gateway operations: health, devices, logs, status)
Frequency: 100% — every CLI gateway command fails, every time
Consequence: CLI is completely unusable; only the browser Control UI works. Users cannot script, automate, or manage the gateway from the command line.

Additional information

First known bad version: 2026.3.13
Last known good version: unknown (did not test prior versions on this install)

The regression is caused by two 2026.3.13 changes interacting:

  1. "Security/WebSocket preauth: shorten unauthenticated handshake retention" — reduced DEFAULT_HANDSHAKE_TIMEOUT_MS to 3s
  2. "Windows/gateway auth: stop attaching device identity on local loopback" — bypass only checks process.platform === 'win32', missing WSL2

Workaround: Add Environment=VITEST=1 and Environment=OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS=10000 to the systemd service unit, and prefix CLI commands with the same env vars. This uses the test-only timeout override to extend the handshake window to 10s.
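
Since the workaround leans on a test-only override, here is a rough sketch of how a supported setting could be resolved instead. The function names and the gateway.websocket.handshakeTimeoutMs key are hypothetical (the key is the one proposed in the suggested fixes, not an existing option). The VITEST gating is my assumption about how the current override behaves, based only on what I observed.

```javascript
// Sketch only: names and precedence are assumptions, not the actual OpenClaw implementation.
const DEFAULT_HANDSHAKE_TIMEOUT_MS = 3000; // value reported in this issue

// Accepts only finite, positive millisecond values; anything else yields undefined.
function parseTimeoutMs(value) {
  const n = typeof value === 'string' ? Number(value.trim()) : value;
  return typeof n === 'number' && Number.isFinite(n) && n > 0
    ? Math.floor(n)
    : undefined;
}

function resolveHandshakeTimeoutMs(config = {}, env = {}) {
  // Test-only override: assumed to be honoured only when VITEST is set.
  if (env.VITEST) {
    const fromEnv = parseTimeoutMs(env.OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS);
    if (fromEnv !== undefined) return fromEnv;
  }
  // Hypothetical user-facing setting: gateway.websocket.handshakeTimeoutMs
  const fromConfig = parseTimeoutMs(config?.gateway?.websocket?.handshakeTimeoutMs);
  return fromConfig ?? DEFAULT_HANDSHAKE_TIMEOUT_MS;
}
```

With something like this, a WSL2 user could set handshakeTimeoutMs to 10000 in openclaw.json rather than exporting VITEST=1 into a production service.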

Node: v22.22.1 (nvm)
Gateway managed by: systemd user service
Auth mode: shared token

Labels: bug, regression
