Fix/gateway silent revocation failures#70676
Fix/gateway silent revocation failures#70676davidangularme wants to merge 5 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR replaces three bare
Confidence Score: 4/5Safe to merge after addressing the unbound method extraction in server-aux-handlers.ts The core security fix and the terminate fallback logic are correct and well-tested. One call site in server-aux-handlers.ts uses an unbound method reference inconsistent with the rest of the PR, which could break the eviction loop if the logger is class-based — worth a one-line fix before merge. src/gateway/server-aux-handlers.ts lines 65–69: unbound warn extraction Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/server-aux-handlers.ts
Line: 65-69
Comment:
**Unbound method extraction may break logger**
`params.log.warn` is extracted as a bare reference and then re-wrapped in a new object literal `{ warn }`. If the underlying logger is a class instance whose `warn` method references `this`, calling it via the re-wrapped object will either throw or silently misbehave — and because `forceCloseSharedAuthClient` calls `logger?.warn(...)` inside a catch block (no guard around the log call itself), a thrown exception there will propagate up and abort the eviction loop for remaining clients.
All other call sites pass the logger object directly (e.g., `logger: params.logReload` in `server-reload-handlers.ts`, `logger: log` in `server.impl.ts`). Here the pattern is inconsistent and less safe. Using an arrow-function wrapper (`warn: (m: string) => params.log.warn!(m)`) binds the call back to `params.log` without extracting the unbound reference.
```suggestion
disconnectStaleSharedGatewayAuthClients({
clients: params.clients,
expectedGeneration: nextSharedGatewaySessionGeneration,
logger: params.log.warn ? { warn: (m: string) => params.log.warn!(m) } : undefined,
});
```
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "test(gateway): cover silent revocation c..." | Re-trigger Greptile |
| const warn = params.log.warn; | ||
| disconnectStaleSharedGatewayAuthClients({ | ||
| clients: params.clients, | ||
| expectedGeneration: nextSharedGatewaySessionGeneration, | ||
| logger: warn ? { warn } : undefined, |
There was a problem hiding this comment.
Unbound method extraction may break logger
params.log.warn is extracted as a bare reference and then re-wrapped in a new object literal { warn }. If the underlying logger is a class instance whose warn method references this, calling it via the re-wrapped object will either throw or silently misbehave — and because forceCloseSharedAuthClient calls logger?.warn(...) inside a catch block (no guard around the log call itself), a thrown exception there will propagate up and abort the eviction loop for remaining clients.
All other call sites pass the logger object directly (e.g., logger: params.logReload in server-reload-handlers.ts, logger: log in server.impl.ts). Here the pattern is inconsistent and less safe. Using an arrow-function wrapper (warn: (m: string) => params.log.warn!(m)) binds the call back to params.log without extracting the unbound reference.
| const warn = params.log.warn; | |
| disconnectStaleSharedGatewayAuthClients({ | |
| clients: params.clients, | |
| expectedGeneration: nextSharedGatewaySessionGeneration, | |
| logger: warn ? { warn } : undefined, | |
| disconnectStaleSharedGatewayAuthClients({ | |
| clients: params.clients, | |
| expectedGeneration: nextSharedGatewaySessionGeneration, | |
| logger: params.log.warn ? { warn: (m: string) => params.log.warn!(m) } : undefined, | |
| }); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-aux-handlers.ts
Line: 65-69
Comment:
**Unbound method extraction may break logger**
`params.log.warn` is extracted as a bare reference and then re-wrapped in a new object literal `{ warn }`. If the underlying logger is a class instance whose `warn` method references `this`, calling it via the re-wrapped object will either throw or silently misbehave — and because `forceCloseSharedAuthClient` calls `logger?.warn(...)` inside a catch block (no guard around the log call itself), a thrown exception there will propagate up and abort the eviction loop for remaining clients.
All other call sites pass the logger object directly (e.g., `logger: params.logReload` in `server-reload-handlers.ts`, `logger: log` in `server.impl.ts`). Here the pattern is inconsistent and less safe. Using an arrow-function wrapper (`warn: (m: string) => params.log.warn!(m)`) binds the call back to `params.log` without extracting the unbound reference.
```suggestion
disconnectStaleSharedGatewayAuthClients({
clients: params.clients,
expectedGeneration: nextSharedGatewaySessionGeneration,
logger: params.log.warn ? { warn: (m: string) => params.log.warn!(m) } : undefined,
});
```
How can I resolve this? If you propose a fix, please make it concise.Greptile review feedback on openclaw#70676: the previous 'const warn = params.log.warn; logger: { warn }' shape detaches the method from its owning logger, so any implementation that relies on 'this' (or that gets rebound later on params.log) would mis-fire. Replace with an arrow-function wrapper that invokes params.log.warn through its owner each call.
|
Addressed in [commit hash] — switched to arrow-function wrapper to preserve binding |
|
Addressed in ab83788 — switched to arrow-function wrapper to preserve binding |
|
Codex review: found issues before merge. Summary Reproducibility: yes. Current main exposes the issue directly in the three bare Next step before merge Security Review findings
Review detailsBest possible solution: Rebase the PR, keep the focused revocation log-and-terminate implementation, preserve current main's untrusted secrets-event handling, add an Do we have a high-confidence way to reproduce the issue? Yes. Current main exposes the issue directly in the three bare Is this the best way to solve the issue? No for the current PR head. The revocation fix is the narrow maintainable approach, but the stale Full review comments:
Overall correctness: patch is incorrect Security concerns:
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 9bedcff904dd. |
…erminate
The three best-effort disconnect loops that force-close stale / revoked WS
clients wrapped socket.close() in a bare catch {}. If close() threw on a
half-closed or malformed socket, the failure was swallowed and the client
kept its authenticated connection. On the device-removal path
(disconnectClientsForDevice) that was a real fail-open: an operator
removing a compromised device would see the call return success while the
attacker's live socket kept processing RPCs, because the per-message
generation check only covers shared-auth clients (message-handler.ts
$L1444-L1458), not device-token clients. The shared-auth-rotation paths
were fail-open only in the delay sense — the per-message check eventually
re-catches them — but silent close failures still hid operator-visible
errors during auth rotation.
Replace the three catches with: log via the caller's logger, then escalate
to socket.terminate() (wrapped in its own swallow so one stubborn socket
can't block the eviction loop). Thread loggers through the five call sites
— server-request-context (logGateway), server-aux-handlers (log),
server-reload-handlers (logReload, four sites), and server.impl
(enforceSharedGatewaySessionGenerationForConfigWrite). Widen the socket
client types with optional terminate() and connId for log context; both
additions are backwards-compatible.
New tests for server-shared-auth-generation and server-request-context exercise the three log-then-terminate paths added in the parent commit: inject a throwing socket.close, assert the logger fires with deviceId/ connId/error context, assert socket.terminate() is called as fallback, and assert remaining clients in the iterable are still evicted. Also covers: already-current generation is skipped (no spurious close), terminate() failures are swallowed so the loop keeps going, and absence of a logger stays silent on the happy path. Refactors the existing createGatewayRequestContext test boilerplate into a makeContextParams helper to keep the new device-removal test focused.
ab83788 to
329f425
Compare
`GatewayRequestContextParams.clients` is `Set<GatewayRequestContextClient>` and `disconnectClientsForDevice` is optional on the shared context type; cast the populated `clients` set and assert the helper before invoking it so `tsgo:core:test` passes.
Set is invariant; direct cast from the populated literal to Set of GatewayRequestContextClient is rejected. Route the cast through unknown so the test fixture compiles under tsgo:core:test.
The secrets-reloader state event interpolates only an enum code and an internal message string, which matches the GHSA-gfmx-pph7-g46x triage note for low-risk callers. Add the explicit trusted: true to self-document intent and clear the opengrep finding; runtime behavior is unchanged because trusted defaults to true.
|
Hi @steipete, @gumadeiras, Regarding the CHANGELOG.md requirement flagged by Clawsweeper: I've completed all technical fixes and unit tests, but I'd prefer if the maintainers could handle the changelog entry during the merge to avoid any potential file conflicts. Everything else is ready and verified. |
Summary
Problem: Three bare catch {} blocks in the gateway WebSocket revocation paths (server-shared-auth-generation.ts:28,43 and server-request-context.ts:116) silently swallow socket.close() failures. When an admin rotates gateway auth or removes a compromised device, a failing close leaves the evicted client connected with no log and no fallback.
Why it matters: On the device-removal path, this is a fail-open security gap — an admin removes a compromised device, the close throws, the attacker keeps their authenticated WebSocket connection. The operator sees no indication that revocation was partial. Shared-auth rotation has per-message generation enforcement as a backstop, but device-token clients do not.
What changed: Replaced all three bare catches with log-then-terminate: capture the error, warn with connId/deviceId/message, fall back to socket.terminate() (force-close, skips the close handshake). Extracted a shared forceCloseSharedAuthClient helper and describeRevocationError utility. Threaded RevocationLogger through all call sites.
What did NOT change: No changes to auth logic, pairing, session management, or any non-revocation paths.
Change Type (select all)
Bug fix
Security hardening
Scope (select all touched areas)
Gateway / orchestration
Auth / tokens
Linked Issue/PR
Related #70515 (same bare-catch pattern found in config recovery)
This PR fixes a bug or regression
Root Cause (if applicable)
Root cause: Bare catch {} around socket.close() in three revocation loops — same anti-pattern as the config-restore bug fixed in #70515.
Missing detection / guardrail: No test injected a throwing socket.close() on the revocation path.
Contributing context: The catch structure is intentional (rethrowing would kill the eviction loop for sibling clients), but the silent swallow is not. Confirmed that shared-auth clients have per-message generation re-check (message-handler.ts:1444-1458), but device-token clients have no equivalent — making the device-removal path the real fail-open.
Regression Test Plan (if applicable)
Coverage level:
Unit test
Target test or file: src/gateway/server-shared-auth-generation.test.ts (new), src/gateway/server-request-context.test.ts (extended)
Scenario: Inject a throwing socket.close, assert (a) logger.warn fires with connId + error message, (b) terminate() is called as fallback, (c) remaining clients in the iterable still get evicted.
Why smallest reliable guardrail: Unit test with injected socket failure covers the exact three branches.
User-visible / Behavior Changes
Operators now see a warning log when a revocation close fails, instead of silence.
Failed closes fall back to socket.terminate() — the evicted client is now force-killed even when the graceful close throws.
Diagram (if applicable)
textBefore:
[socket.close() throws] -> [catch {}] -> [attacker keeps connection] -> [no log]
After:
[socket.close() throws] -> [warn log with connId + error] -> [socket.terminate()] -> [connection killed]
Security Impact (required)
New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
Repro + Verification
Environment
OS: Linux (Ubuntu 24)
Runtime/container: Node 24
Steps
Connect a device via WebSocket with a valid device token
Remove the device via admin action (triggering disconnectClientsForDevice)
Simulate socket.close() throwing (e.g., socket already in CLOSING state)
Expected
Warning log with deviceId, connId, and error message
socket.terminate() called as fallback
Client connection forcibly closed
Actual (before fix)
No log
No terminate fallback
Client connection may survive revocation
Evidence
Failing test/log before + passing after
Human Verification (required)
Verified scenarios: Unit tests with injected throwing socket.close for all three revocation paths (device removal, disconnect-all shared auth, disconnect-stale shared auth)
Edge cases checked: terminate() also throwing (loop continues), non-shared-auth clients skipped, generation-matching clients skipped
What I did not verify: Full integration test with real WebSocket in half-closed state
Compatibility / Migration
Backward compatible? Yes
Config/env changes? No
Migration needed? No
Risks and Mitigations
Risk: RevocationLogger is optional — call sites that don't pass a logger get the old silent behavior
Mitigation: All existing call sites now thread their logger through; new call sites without a logger still get terminate() fallback, just no log