Skip to content

Commit 53849c6

Browse files
docs: describe device-token rotation race as open, not fixed
Codex review on #71156 correctly flagged that the AGENTS.md was describing a synchronous 'invalidated' flag on GatewayWsClient plus a per-RPC dispatch guard as already-landed guarantees, when they live in the still-open fix/gateway-device-token-rpc-revalidation PR. On main, GatewayWsClient has no 'invalidated' field and message-handler.ts only re-checks usesSharedGatewayAuth. Documenting the proposed fix as a shipped guarantee could let reviewers wave through rotate/revoke code paths that are in fact still racy against pipelined RPCs. Rewrite three references in the two AGENTS.md files: - Reframe the 'Related incidents' list in server-methods/AGENTS.md from 'landed' to 'in-flight' and mark rotate/revoke on main as best-effort rather than race-safe. - Rewrite the 'Shared-auth vs device-token' section in channels/plugins/AGENTS.md to describe current main (no per-RPC check for device-token) and position the 'invalidated' flag as a proposal, not a state. - Rewrite the rotate/revoke 'must never happen' bullet from 'the invalidated flag must be set before respond()' to a mechanism- agnostic invariant ('old-token authority must end before response flush, whichever mechanism closes the race'). No code changes. Keeps the content honest against main until the device-token PR lands (which this doc PR does not depend on).
1 parent f371c27 commit 53849c6

2 files changed

Lines changed: 31 additions & 21 deletions

File tree

src/channels/plugins/AGENTS.md

Lines changed: 22 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -55,20 +55,25 @@ This asymmetry is subtle and load-bearing:
5555

5656
- **Shared-auth clients** (`usesSharedGatewayAuth: true`) are re-checked
5757
on every RPC against `sharedGatewaySessionGeneration` at
58-
`src/gateway/server/ws-connection/message-handler.ts:1444-1458`. A token
59-
rotation forces the next RPC from any stale client to close with
60-
`4001 gateway auth changed`.
61-
- **Device-token clients** were historically not re-checked per RPC. A
62-
`device.token.rotate` / `.revoke` scheduled the socket close via
63-
`queueMicrotask`, and RPCs already pipelined in the WS buffer landed
64-
with the old token. Fixed in `fix/gateway-device-token-rpc-revalidation`
65-
by adding a synchronous `invalidated` flag on `GatewayWsClient` set
66-
before `respond()` and checked at the top of the per-RPC dispatch.
58+
`src/gateway/server/ws-connection/message-handler.ts` in the per-request
59+
dispatch block. A token rotation forces the next RPC from any stale
60+
client to close with `4001 gateway auth changed`.
61+
- **Device-token clients are not re-checked per RPC on `main`.** A
62+
`device.token.rotate` / `.revoke` schedules the socket close via
63+
`queueMicrotask`, so RPCs already pipelined in the WS buffer can land
64+
with the old token before the disconnect takes effect. The PR
65+
`fix/gateway-device-token-rpc-revalidation` proposes closing this race
66+
with a synchronous `invalidated` flag on `GatewayWsClient` set before
67+
`respond()` and checked at the top of the per-RPC dispatch — but until
68+
that PR lands, **treat `device.token.rotate` / `.revoke` as
69+
best-effort rather than race-safe**, and be careful about adding
70+
operations whose correctness depends on the rotate response arriving
71+
before any further RPC is processed.
6772

6873
If you add a new auth method in this zone, decide explicitly which of
6974
these two per-RPC check shapes it follows and document it. Don't leave
7075
the third case "I didn't think about it" — that is exactly how the
71-
device-token race landed.
76+
device-token race exists in the first place.
7277

7378
## What must never happen
7479

@@ -78,10 +83,13 @@ device-token race landed.
7883
`allowFrom` list, or vice versa. If the two stores disagree, a
7984
previously-paired sender may silently get rejected or a revoked sender
8085
may silently get accepted.
81-
- A `device.token.rotate` / `.revoke` response arriving at the admin
82-
while the old token is still authenticating RPCs. The `invalidated`
83-
flag must be set _before_ `respond()` fires, not in a microtask that
84-
races the response flush.
86+
- A `device.token.rotate` / `.revoke` completing (from the admin's
87+
point of view) while pipelined RPCs on the rotated client can still
88+
authenticate with the old token. Whichever mechanism closes this race
89+
(the in-flight `invalidated` flag proposal, or something equivalent),
90+
the invariant is that the old-token authority ends **before** the
91+
response flush, not after — a microtask-scheduled disconnect alone is
92+
not sufficient.
8593
- A pairing adapter's `notifyApproval` throwing without a downstream
8694
catch that either (a) rolls the approval back or (b) logs the
8795
partial-commit loudly. Silent failure here is the pattern that put

src/gateway/server-methods/AGENTS.md

Lines changed: 9 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -59,9 +59,9 @@ place to add logic.
5959

6060
### Related incidents (silent-failure shape)
6161

62-
These three PRs landed on the same class of bug in adjacent files. Same
63-
shape, different surfaces — cite them as examples of what "regress silently"
64-
looks like concretely:
62+
Three in-flight PRs target this same class of bug in adjacent files. At
63+
time of writing some may still be open against `main`; treat the shape
64+
description as the invariant, not the merge status:
6565

6666
- `fix/config-restore-false-success` — bare catch on `copyFile` during a
6767
suspicious-read recovery caused the audit log to report `valid: true`
@@ -72,10 +72,12 @@ looks like concretely:
7272
rotation that threw on close was swallowed, leaving the client
7373
authenticated past the point the admin thought they were kicked.
7474
- `fix/gateway-device-token-rpc-revalidation` — rotate/revoke responded
75-
OK before the microtask-scheduled disconnect, and there was no per-RPC
76-
re-check for device-token auth (only for shared-auth). Pipelined RPCs
77-
landed with the rotated token. Fixed by a synchronous `invalidated`
78-
flag plus a dispatcher-level guard.
75+
OK before the microtask-scheduled disconnect, and there is no per-RPC
76+
re-check for device-token auth on `main` (only for shared-auth).
77+
Pipelined RPCs could therefore land with the rotated token. The PR
78+
proposes a synchronous `invalidated` flag plus a dispatcher-level
79+
guard; until it lands, **treat rotate/revoke as best-effort, not
80+
race-safe**.
7981

8082
The common pattern: **a privileged operation claims success before the
8183
security guarantee is actually established.** Watch for it in reviews of

0 commit comments

Comments
 (0)