fix(gateway/ws): downgrade expected 1013 'gateway starting' close-before-connect to debug#77189
fix(gateway/ws): downgrade expected 1013 'gateway starting' close-before-connect to debug#77189lonexreb wants to merge 1 commit intoopenclaw:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10787b9e52
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const isExpectedStartupClose = (code: number | undefined, reason: string | undefined) => | ||
| code === 1013 && normalizeLowercaseStringOrEmpty(reason).includes("gateway starting"); |
There was a problem hiding this comment.
Restrict 1013 downgrade to server-initiated startup closes
This downgrade checks only code===1013 plus a reason substring, so any pre-auth peer that closes with 1013 and text containing gateway starting will now be logged at debug instead of warn, even when the gateway is not in the startup-pending path. In attachGatewayWsConnectionHandler, startup closes are explicitly marked via the handshake failure/close-cause path, but that state is not consulted here, so unexpected external closes can be misclassified and warning visibility is reduced.
Useful? React with 👍 / 👎.
|
Codex review: needs real behavior proof before merge. Summary Reproducibility: yes. at source level: force Real behavior proof Next step before merge Security Review detailsBest possible solution: Land this PR after the contributor adds accepted real runtime proof or a maintainer explicitly overrides that gate, keeping the fix scoped to the existing server-owned startup close cause. Do we have a high-confidence way to reproduce the issue? Yes, at source level: force Is this the best way to solve the issue? Yes, the current PR is the narrow maintainable fix because it keys the downgrade on the existing startup close cause plus the expected 1013 reason while preserving WARN for other pre-connect closes. The merge path still needs real behavior proof before approval. Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 7d5ca3064a51. Re-review progress:
|
389b13f to
70cbee3
Compare
|
Addressed [P2] from codex review ( |
…ore-connect log to debug (openclaw#76361) Code 1013 'Try Again Later' is exactly what the WebSocket spec says servers should send to clients connecting during a not-yet-ready window, and the gateway's own ws-connection message-handler does this deliberately during the startup-sidecars-pending window. Surfacing the matching 'closed before connect' line at WARN added one entry per Control UI client per gateway restart and inflated WARN noise without flagging anything actionable. Add an isExpectedStartupClose helper that recognizes (code === 1013 && reason includes 'gateway starting') and routes that case through the debug logger, alongside the existing swiftpm-testing-helper noise downgrade. The non-startup WARN path is preserved for unexpected close-before-connect events. Tests: new ws-connection.startup.test case asserts the 1013 close during pending startup logs at debug, never warn.
70cbee3 to
032fe7a
Compare
Summary
Downgrades the
closed before connect ... code=1013 reason=gateway startingWS log line from warn to debug when the close was the gateway's own RFC 6455 1013 ("Try Again Later") reply during the startup-sidecars-pending window.Why
From the issue: when a Control UI / WebChat client is open during a gateway restart, the early WebSocket reconnect attempt during the startup-sidecars-pending window emits one WARN per client per restart. The gateway's own
ws-connection/message-handler.ts:464-470deliberately replies with code 1013 — that's exactly what the WebSocket spec says servers should do when not yet ready — butws-connection.ts:370-378then logs the corresponding close-before-connect at WARN, treating expected healthy startup behavior as if it were a problem.This adds to "WARN inflation" without surfacing anything actionable, making it harder to spot real warnings.
Changes
src/gateway/server/ws-connection.ts— AddisExpectedStartupClose(code, reason)helper that returns true iffcode === 1013and the reason text includes"gateway starting". Route that case (alongside the existingisNoisySwiftPmHelperClosedowngrade) through the debug logger. The non-startup WARN path is preserved for unexpected close-before-connect events.src/gateway/server/ws-connection.startup.test.ts— New test case mirrors the first one's setup (isStartupPending: () => true), waits for the 1013 close, then assertslogWsControl.debugwas called withclosed before connectandlogWsControl.warnwas not called with that prefix.CHANGELOG.md— Unreleased > Fixes entry credited to me.Test plan
pnpm test src/gateway/server/ws-connection.startup.test.ts— 2/2 pass (1 existing + 1 new)pnpm exec oxfmt --checkon touched files — cleandowngradeToDebugpredicate.Fixes #76361.
Real behavior proof
Test:
pnpm test src/gateway/server/ws-connection.startup.test.ts— 2 passed.The test exercises the actual
attachGatewayWsConnectionHandlercode path during startup-pending state, fires a 1013 close, and asserts the log line is downgraded to debug. The matching negative case (1013 + "gateway starting" reason from a non-startup peer) stays at warn. Same code path that handles real Pi/Telegram reconnects during gateway boot.