Skip to content

Commit 032fe7a

Browse files
committed
fix(gateway/ws): downgrade expected 1013 'gateway starting' close-before-connect log to debug (#76361)
Code 1013 'Try Again Later' is exactly what the WebSocket spec says servers should send to clients connecting during a not-yet-ready window, and the gateway's own ws-connection message-handler does this deliberately during the startup-sidecars-pending window. Surfacing the matching 'closed before connect' line at WARN added one entry per Control UI client per gateway restart and inflated WARN noise without flagging anything actionable. Add an isExpectedStartupClose helper that recognizes (code === 1013 && reason includes 'gateway starting') and routes that case through the debug logger, alongside the existing swiftpm-testing-helper noise downgrade. The non-startup WARN path is preserved for unexpected close-before-connect events. Tests: new ws-connection.startup.test case asserts the 1013 close during pending startup logs at debug, never warn.
1 parent 5fae1c3 commit 032fe7a

3 files changed

Lines changed: 107 additions & 3 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -70,6 +70,7 @@ Docs: https://docs.openclaw.ai
7070

7171
### Fixes
7272

73+
- Gateway/ws: downgrade the expected RFC 6455 1013 `closed before connect` log to debug only when the close was server-initiated by the `startup-sidecars-pending` cause. Fixes #76361. Thanks @lonexreb.
7374
- Gateway/OpenAI-compatible: send the assistant role SSE chunk as soon as streaming chat-completion headers are accepted, so cold agent setup cannot leave `/v1/chat/completions` clients with a bodyless 200 response until their idle timeout fires.
7475
- Agents/media: avoid direct generated-media completion fallback while the announce-agent run is still pending, so async video and music completions do not duplicate raw media messages. (#77754)
7576
- TUI/sessions: bound the session picker to recent rows and use exact lookup-style refreshes for the active session, so dusty stores no longer make TUI hydrate weeks-old transcripts before becoming responsive. Thanks @vincentkoc.

src/gateway/server/ws-connection.startup.test.ts

Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -125,4 +125,94 @@ describe("attachGatewayWsConnectionHandler startup readiness", () => {
125125
expect(socket.close).toHaveBeenCalledWith(1013, "gateway starting");
126126
});
127127
});
128+
129+
// Regression for #76361: a 1013 ("Try Again Later") close during the startup
130+
// sidecars-pending window is exactly what the WebSocket spec says servers
131+
// should do, so the corresponding `closed before connect` log must surface
132+
// at debug level rather than warn — otherwise every Control UI client adds
133+
// a WARN entry per gateway restart.
134+
it("logs 1013 'gateway starting' close-before-connect at debug, not warn (#76361)", async () => {
135+
const listeners = new Map<string, (...args: unknown[]) => void>();
136+
const wss = {
137+
on: vi.fn((event: string, handler: (...args: unknown[]) => void) => {
138+
listeners.set(event, handler);
139+
}),
140+
} as unknown as WebSocketServer;
141+
const socket = Object.assign(new EventEmitter(), {
142+
_socket: {
143+
remoteAddress: "127.0.0.1",
144+
remotePort: 1234,
145+
localAddress: "127.0.0.1",
146+
localPort: 5678,
147+
},
148+
send: vi.fn((_data: string, cb?: (err?: Error) => void) => {
149+
cb?.();
150+
}),
151+
close: vi.fn((code?: number, reason?: string) => {
152+
socket.emit("close", code ?? 1000, Buffer.from(reason ?? ""));
153+
}),
154+
});
155+
const upgradeReq = {
156+
headers: { host: "127.0.0.1:19001" },
157+
socket: { localAddress: "127.0.0.1" },
158+
};
159+
const logWsControl = createLogger();
160+
161+
attachGatewayWsConnectionHandler({
162+
wss,
163+
clients: new Set(),
164+
preauthConnectionBudget: { release: vi.fn() } as never,
165+
port: 19001,
166+
canvasHostEnabled: false,
167+
resolvedAuth: { mode: "none", allowTailscale: false },
168+
isStartupPending: () => true,
169+
gatewayMethods: [],
170+
events: [],
171+
refreshHealthSnapshot: vi.fn(async () => ({}) as never),
172+
logGateway: createLogger() as never,
173+
logHealth: createLogger() as never,
174+
logWsControl: logWsControl as never,
175+
extraHandlers: {},
176+
broadcast: vi.fn(),
177+
buildRequestContext: () => createRequestContext() as never,
178+
});
179+
180+
const onConnection = listeners.get("connection");
181+
onConnection?.(socket, upgradeReq);
182+
socket.emit(
183+
"message",
184+
JSON.stringify({
185+
type: "req",
186+
id: "connect-1",
187+
method: "connect",
188+
params: {
189+
minProtocol: PROTOCOL_VERSION,
190+
maxProtocol: PROTOCOL_VERSION,
191+
client: {
192+
id: GATEWAY_CLIENT_NAMES.CLI,
193+
version: "dev",
194+
platform: "test",
195+
mode: GATEWAY_CLIENT_MODES.CLI,
196+
},
197+
role: "operator",
198+
scopes: ["operator.read"],
199+
caps: [],
200+
},
201+
}),
202+
);
203+
204+
await vi.waitFor(() => {
205+
expect(socket.close).toHaveBeenCalledWith(1013, "gateway starting");
206+
});
207+
await vi.waitFor(() => {
208+
const debugCalls = logWsControl.debug.mock.calls.filter(
209+
([msg]) => typeof msg === "string" && msg.includes("closed before connect"),
210+
);
211+
expect(debugCalls.length).toBeGreaterThan(0);
212+
});
213+
const warnCalls = logWsControl.warn.mock.calls.filter(
214+
([msg]) => typeof msg === "string" && msg.includes("closed before connect"),
215+
);
216+
expect(warnCalls).toHaveLength(0);
217+
});
128218
});

src/gateway/server/ws-connection.ts

Lines changed: 16 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -349,6 +349,18 @@ export function attachGatewayWsConnectionHandler(params: AttachGatewayWsConnecti
349349
normalizeLowercaseStringOrEmpty(userAgent).includes("swiftpm-testing-helper") &&
350350
isLoopbackAddress(remote);
351351

352+
// RFC 6455 close code 1013 ("Try Again Later") is exactly what the
353+
// gateway sends to clients that connect during the startup-sidecars-pending
354+
// window. The reconnect attempt is expected and healthy — surface it at
355+
// debug rather than warn so legitimate WARN inflation stays meaningful.
356+
// Tie the downgrade to the server-owned `startup-sidecars-pending` close
357+
// cause so a peer that picks code 1013 + reason "gateway starting" on a
358+
// non-startup close cannot suppress an otherwise actionable WARN. See #76361.
359+
const isExpectedStartupClose = (code: number | undefined, reason: string | undefined) =>
360+
closeCause === "startup-sidecars-pending" &&
361+
code === 1013 &&
362+
normalizeLowercaseStringOrEmpty(reason).includes("gateway starting");
363+
352364
socket.once("close", (code, reason) => {
353365
const durationMs = Date.now() - openedAt;
354366
const logForwardedFor = sanitizeLogValue(forwardedFor);
@@ -375,9 +387,10 @@ export function attachGatewayWsConnectionHandler(params: AttachGatewayWsConnecti
375387
...closeMeta,
376388
};
377389
if (!client) {
378-
const logFn = isNoisySwiftPmHelperClose(requestUserAgent, remoteAddr)
379-
? logWsControl.debug
380-
: logWsControl.warn;
390+
const downgradeToDebug =
391+
isNoisySwiftPmHelperClose(requestUserAgent, remoteAddr) ||
392+
isExpectedStartupClose(code, reason?.toString());
393+
const logFn = downgradeToDebug ? logWsControl.debug : logWsControl.warn;
381394
logFn(
382395
`closed before connect conn=${connId} peer=${endpoint ?? "n/a"} remote=${remoteAddr ?? "?"} fwd=${logForwardedFor || "n/a"} origin=${logOrigin || "n/a"} host=${logHost || "n/a"} ua=${logUserAgent || "n/a"} code=${code ?? "n/a"} reason=${logReason || "n/a"}`,
383396
closeContext,

0 commit comments

Comments
 (0)