fix(gateway): increase WS handshake timeout from 3s to 10s#49262
fix(gateway): increase WS handshake timeout from 3s to 10s#49262obviyus merged 3 commits intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5cac0f34c8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Greptile SummaryThis PR increases the WebSocket handshake timeout from 3 seconds to 10 seconds in
Confidence Score: 5/5
Last reviewed commit: "fix: restore VITEST ..." |
|
Addressed both review findings in c4e0fa8:
The other CI failures (
|
@greptileai review again if you would please |
The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892
…llback, fix formatting
c4e0fa8 to
0442be4
Compare
|
Landed on main. Thanks @fuller-stack-dev. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0442be4e5c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| process.env.OPENCLAW_HANDSHAKE_TIMEOUT_MS || | ||
| (process.env.VITEST && process.env.OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS); |
There was a problem hiding this comment.
Keep the test-only handshake override effective under Vitest
Because getHandshakeTimeoutMs() now prefers OPENCLAW_HANDSHAKE_TIMEOUT_MS even when process.env.VITEST is set, the handshake-timeout tests are no longer hermetic against a developer's shell environment. src/gateway/server.preauth-hardening.test.ts:15-37 and src/gateway/server.auth.default-token.suite.ts:80-85 set OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS to 200/20ms and expect the socket to close within under a second, but a valid exported OPENCLAW_HANDSHAKE_TIMEOUT_MS (for example 60000 while tuning a local gateway) now overrides that and makes those tests stall or fail. The test-only variable should still win under Vitest, or these suites need to clear the runtime override first.
Useful? React with 👍 / 👎.
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]>
…49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 36f394c)
* fix(gateway): skip device pairing when auth.mode=none Fixes openclaw#42931 When gateway.auth.mode is set to "none", authentication succeeds with method "none" but sharedAuthOk remains false because the auth-context only recognises token/password/trusted-proxy methods. This causes all pairing-skip conditions to fail, so Control UI browser connections get closed with code 1008 "pairing required" despite auth being disabled. Short-circuit the skipPairing check: if the operator explicitly disabled authentication, device pairing (which is itself an auth mechanism) must also be bypassed. Fixes openclaw#42931 (cherry picked from commit 9bffa34) * fix(gateway): strip unbound scopes for shared-auth connects (cherry picked from commit 7dc447f) * fix(gateway): increase WS handshake timeout from 3s to 10s (openclaw#49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 36f394c) * fix(gateway): skip Control UI pairing when auth.mode=none (closes openclaw#42931) (openclaw#47148) When auth is completely disabled (mode=none), requiring device pairing for Control UI operator sessions adds friction without security value since any client can already connect without credentials. Add authMode parameter to shouldSkipControlUiPairing so the bypass fires only for Control UI + operator role + auth.mode=none. This avoids the openclaw#43478 regression where a top-level OR disabled pairing for ALL websocket clients. (cherry picked from commit 26e0a3e) * fix(gateway): clear trusted-proxy control ui scopes (cherry picked from commit ccf16cd) * fix(gateway): guard interface discovery failures Closes openclaw#44180. Refs openclaw#47590. Co-authored-by: Peter Steinberger <[email protected]> (cherry picked from commit 3faaf89) * fix(gateway): suppress ciao interface assertions Closes openclaw#38628. Refs openclaw#47159, openclaw#52431. Co-authored-by: Peter Steinberger <[email protected]> (cherry picked from commit c0d4abc) * fix(gateway): run before_tool_call for HTTP tools (cherry picked from commit 8cc0c9b) * fix(gateway): skip seq-gap broadcast for stale post-lifecycle events (openclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 57f1cf6) * fix(gateway): honor trusted proxy hook auth rate limits (cherry picked from commit 4da617e) * fix(gateway): enforce browser origin check regardless of proxy headers In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false whenever proxy headers were present. This allowed browser-originated WebSocket connections from untrusted origins to bypass origin validation entirely, as the check only ran for control-ui and webchat client types. An attacker serving a page from an untrusted origin could connect through a trusted reverse proxy, inherit proxy-injected identity, and obtain operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity path without any origin restriction. Remove the hasProxyHeaders exemption so origin validation runs for all browser-originated connections regardless of how the request arrived. Fixes GHSA-5wcw-8jjv-m286 (cherry picked from commit ebed3bb) * fix(gateway): harden health monitor account gating (openclaw#46749) * gateway: harden health monitor account gating * gateway: tighten health monitor account-id guard (cherry picked from commit 29fec8b) * fix(gateway): bound unanswered client requests (openclaw#45689) * fix(gateway): bound unanswered client requests * fix(gateway): skip default timeout for expectFinal requests * fix(gateway): preserve gateway call timeouts * fix(gateway): localize request timeout policy * fix(gateway): clamp explicit request timeouts * fix(gateway): clamp default request timeout (cherry picked from commit 5fc43ff) * fix(gateway): propagate real gateway client into plugin subagent runtime Plugin subagent dispatch used a hardcoded synthetic client carrying operator.admin, operator.approvals, and operator.pairing for all runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require no gateway auth by design, so an unauthenticated external request could drive admin-only gateway methods (sessions.delete, agent.run) through the subagent runtime. Propagate the real gateway client into the plugin runtime request scope when one is available. Plugin HTTP routes now run inside a scoped runtime client: auth:"plugin" routes receive a non-admin synthetic operator.write client; gateway-authenticated routes retain admin-capable scopes. The security boundary is enforced at the HTTP handler level. Fixes GHSA-xw77-45gv-p728 (cherry picked from commit a1520d7) * fix(gateway): enforce caller-scope subsetting in device.token.rotate device.token.rotate accepted attacker-controlled scopes and forwarded them to rotateDeviceToken without verifying the caller held those scopes. A pairing-scoped token could rotate up to operator.admin on any already-paired device whose approvedScopes included admin. Add a caller-scope subsetting check before rotateDeviceToken: the requested scopes must be a subset of client.connect.scopes via the existing roleScopesAllow helper. Reject with missing scope: <scope> if not. Also add server.device-token-rotate-authz.test.ts covering both the priv-esc path and the admin-to-node-invoke chain. Fixes GHSA-4jpw-hj22-2xmc (cherry picked from commit dafd61b) * fix(gateway): pin plugin webhook route registry (openclaw#47902) (cherry picked from commit a69f619) * fix(gateway): split conversation reset from admin reset (cherry picked from commit c91d162) * fix(gateway): harden token fallback/reconnect behavior and docs (openclaw#42507) * fix(gateway): harden token fallback and auth reconnect handling * docs(gateway): clarify auth retry and token-drift recovery * fix(gateway): tighten auth reconnect gating across clients * fix: harden gateway token retry (openclaw#42507) (thanks @joshavant) (cherry picked from commit a76e810) * fix: adapt cherry-picks for fork TS strictness - Replace OpenClawConfig with RemoteClawConfig in server-channels and server-runtime-state - Replace loadOpenClawPlugins with loadRemoteClawPlugins in server-plugins and remove unsupported runtimeOptions field and dead subagent runtime code - Export HookClientIpConfig type from server-http and thread it through server/hooks into server-runtime-state and server.impl - Create plugins-http/ submodules (path-context, route-match, route-auth) extracted from the monolithic plugins-http.ts by upstream refactor - Create stub modules for gutted upstream layers: acp/control-plane/manager, agents/bootstrap-cache, agents/pi-embedded, agents/internal-events - Strip thinkingLevel, reasoningLevel, skillsSnapshot from SessionEntry literals in agent.ts and session-reset-service.ts (Pi-specific fields) - Remove internalEvents from agent ingress opts and loadGatewayModelCatalog from sessions patch call (not present in fork types) - Fix connect-policy tests to pass booleans instead of role strings for the sharedAuthOk parameter (fork changed the function signature) - Add isHealthMonitorEnabled to ChannelManager mocks in test files - Widen runBeforeToolCallHook mock return type to accept blocked: true - Add explicit string types for msg params in server-plugins logger Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: apply fork naming to cherry-picked bonjour files --------- Co-authored-by: Andrew Demczuk <[email protected]> Co-authored-by: Peter Steinberger <[email protected]> Co-authored-by: fuller-stack-dev <[email protected]> Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> Co-authored-by: caesargattuso <[email protected]> Co-authored-by: Robin Waslander <[email protected]> Co-authored-by: Tak Hoffman <[email protected]> Co-authored-by: Peter Steinberger <[email protected]> Co-authored-by: Josh Avant <[email protected]> Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Summary
Fixes #46892 — the 3-second WS handshake timeout is too aggressive when the gateway event loop is busy, causing spurious
gateway closed (1000)errors on CLI commands.Changes
src/gateway/server-constants.ts:DEFAULT_HANDSHAKE_TIMEOUT_MSfrom3_000to10_000OPENCLAW_HANDSHAKE_TIMEOUT_MSenv var for runtime override (removed VITEST gate)OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MSas fallback so existing tests are unaffectedWhy 10 seconds?
3s is too tight for loopback connections when the event loop is processing concurrent agent sessions, compaction, or long tool calls. 10s provides comfortable headroom while still catching genuinely broken connections. Users who need different values can set
OPENCLAW_HANDSHAKE_TIMEOUT_MS.Testing
OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MScontinue to work (kept as fallback in the nullish coalescing chain)OPENCLAW_HANDSHAKE_TIMEOUT_MSvalidated with sameNumber.isFinite && > 0guard