fix(gateway): skip seq-gap broadcast for stale post-lifecycle events#43751
fix(gateway): skip seq-gap broadcast for stale post-lifecycle events#43751obviyus merged 2 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR adds a one-line guard ( Key observations:
Confidence Score: 4/5
Last reviewed commit: 1cdb818 |
byungsker
left a comment
There was a problem hiding this comment.
The root cause analysis is spot-on: after lifecycle:end calls agentRunSeq.delete(evt.runId), the next ?? 0 fallback turns any stale in-flight event into a spurious gap report. The last > 0 guard is the minimal correct fix.
One side-effect worth noting: the stale event still runs agentRunSeq.set(evt.runId, evt.seq) at line 570, so it re-creates a zombie entry for a completed run. It will never be cleaned up by the lifecycle-end path (which already ran). In practice this is harmless — the run is finished and no further events arrive — but it could cause a subtle gap-suppression if a *new* run somehow reused the same runId (extremely unlikely with UUID run IDs). Not a blocker, just worth knowing.
No tests were added; a unit test covering the post-lifecycle stale-event path would be a nice-to-have but is non-trivial to set up given the mock complexity. LGTM for the fix itself.
|
Hi @obviyus , this is just a one-line bug fix that's already passed botcheck — could you take a look? |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 833bf37612
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65b7067c3a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 13f8d12660
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58d273c3cc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94b7ffd2a7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Landed on main. Thanks @caesargattuso. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 92d23fbdb9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const runId = randomUUID(); | ||
|
|
||
| // Ensure chat UI clients refresh when this run completes (even though it wasn't started via chat.send). | ||
| // This maps agent bus events (keyed by sessionId) to chat events (keyed by clientRunId). | ||
| ctx.addChatRun(sessionId, { | ||
| // This maps agent bus events (keyed by per-turn runId) to chat events (keyed by clientRunId). | ||
| ctx.addChatRun(runId, { |
There was a problem hiding this comment.
Keep voice transcript streams keyed by sessionId
Fresh evidence from this commit: voice.transcript now generates a per-turn UUID here instead of reusing the session ID. The shared iOS/macOS chat UI drops external agent events unless evt.runId == sessionId (apps/shared/OpenClawKit/Sources/OpenClawChatUI/ChatViewModel.swift:889-892), and its regression tests model external streaming with runId: sessionId (apps/shared/OpenClawKit/Tests/OpenClawKitTests/ChatViewModelTests.swift:523-540). In practice, if a user is watching a session while a node voice transcript runs, live assistant text and tool cards stop appearing until the final chat refresh lands.
Useful? React with 👍 / 👎.
…penclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]>
…penclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]>
…penclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]>
…penclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 57f1cf6)
* fix(gateway): skip device pairing when auth.mode=none Fixes openclaw#42931 When gateway.auth.mode is set to "none", authentication succeeds with method "none" but sharedAuthOk remains false because the auth-context only recognises token/password/trusted-proxy methods. This causes all pairing-skip conditions to fail, so Control UI browser connections get closed with code 1008 "pairing required" despite auth being disabled. Short-circuit the skipPairing check: if the operator explicitly disabled authentication, device pairing (which is itself an auth mechanism) must also be bypassed. Fixes openclaw#42931 (cherry picked from commit 9bffa34) * fix(gateway): strip unbound scopes for shared-auth connects (cherry picked from commit 7dc447f) * fix(gateway): increase WS handshake timeout from 3s to 10s (openclaw#49262) * fix(gateway): increase WS handshake timeout from 3s to 10s The 3-second default is too aggressive when the event loop is under load (concurrent sessions, compaction, agent turns), causing spurious 'gateway closed (1000)' errors on CLI commands like `openclaw cron list`. Changes: - Increase DEFAULT_HANDSHAKE_TIMEOUT_MS from 3_000 to 10_000 - Add OPENCLAW_HANDSHAKE_TIMEOUT_MS env var for user override (no VITEST gate) - Keep OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS as fallback for existing tests Fixes openclaw#46892 * fix: restore VITEST guard on test env var, use || for empty-string fallback, fix formatting * fix: cover gateway handshake timeout env override (openclaw#49262) (thanks @fuller-stack-dev) --------- Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 36f394c) * fix(gateway): skip Control UI pairing when auth.mode=none (closes openclaw#42931) (openclaw#47148) When auth is completely disabled (mode=none), requiring device pairing for Control UI operator sessions adds friction without security value since any client can already connect without credentials. Add authMode parameter to shouldSkipControlUiPairing so the bypass fires only for Control UI + operator role + auth.mode=none. This avoids the openclaw#43478 regression where a top-level OR disabled pairing for ALL websocket clients. (cherry picked from commit 26e0a3e) * fix(gateway): clear trusted-proxy control ui scopes (cherry picked from commit ccf16cd) * fix(gateway): guard interface discovery failures Closes openclaw#44180. Refs openclaw#47590. Co-authored-by: Peter Steinberger <[email protected]> (cherry picked from commit 3faaf89) * fix(gateway): suppress ciao interface assertions Closes openclaw#38628. Refs openclaw#47159, openclaw#52431. Co-authored-by: Peter Steinberger <[email protected]> (cherry picked from commit c0d4abc) * fix(gateway): run before_tool_call for HTTP tools (cherry picked from commit 8cc0c9b) * fix(gateway): skip seq-gap broadcast for stale post-lifecycle events (openclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]> (cherry picked from commit 57f1cf6) * fix(gateway): honor trusted proxy hook auth rate limits (cherry picked from commit 4da617e) * fix(gateway): enforce browser origin check regardless of proxy headers In trusted-proxy mode, enforceOriginCheckForAnyClient was set to false whenever proxy headers were present. This allowed browser-originated WebSocket connections from untrusted origins to bypass origin validation entirely, as the check only ran for control-ui and webchat client types. An attacker serving a page from an untrusted origin could connect through a trusted reverse proxy, inherit proxy-injected identity, and obtain operator.admin access via the sharedAuthOk / roleCanSkipDeviceIdentity path without any origin restriction. Remove the hasProxyHeaders exemption so origin validation runs for all browser-originated connections regardless of how the request arrived. Fixes GHSA-5wcw-8jjv-m286 (cherry picked from commit ebed3bb) * fix(gateway): harden health monitor account gating (openclaw#46749) * gateway: harden health monitor account gating * gateway: tighten health monitor account-id guard (cherry picked from commit 29fec8b) * fix(gateway): bound unanswered client requests (openclaw#45689) * fix(gateway): bound unanswered client requests * fix(gateway): skip default timeout for expectFinal requests * fix(gateway): preserve gateway call timeouts * fix(gateway): localize request timeout policy * fix(gateway): clamp explicit request timeouts * fix(gateway): clamp default request timeout (cherry picked from commit 5fc43ff) * fix(gateway): propagate real gateway client into plugin subagent runtime Plugin subagent dispatch used a hardcoded synthetic client carrying operator.admin, operator.approvals, and operator.pairing for all runtime.subagent.* calls. Plugin HTTP routes with auth:"plugin" require no gateway auth by design, so an unauthenticated external request could drive admin-only gateway methods (sessions.delete, agent.run) through the subagent runtime. Propagate the real gateway client into the plugin runtime request scope when one is available. Plugin HTTP routes now run inside a scoped runtime client: auth:"plugin" routes receive a non-admin synthetic operator.write client; gateway-authenticated routes retain admin-capable scopes. The security boundary is enforced at the HTTP handler level. Fixes GHSA-xw77-45gv-p728 (cherry picked from commit a1520d7) * fix(gateway): enforce caller-scope subsetting in device.token.rotate device.token.rotate accepted attacker-controlled scopes and forwarded them to rotateDeviceToken without verifying the caller held those scopes. A pairing-scoped token could rotate up to operator.admin on any already-paired device whose approvedScopes included admin. Add a caller-scope subsetting check before rotateDeviceToken: the requested scopes must be a subset of client.connect.scopes via the existing roleScopesAllow helper. Reject with missing scope: <scope> if not. Also add server.device-token-rotate-authz.test.ts covering both the priv-esc path and the admin-to-node-invoke chain. Fixes GHSA-4jpw-hj22-2xmc (cherry picked from commit dafd61b) * fix(gateway): pin plugin webhook route registry (openclaw#47902) (cherry picked from commit a69f619) * fix(gateway): split conversation reset from admin reset (cherry picked from commit c91d162) * fix(gateway): harden token fallback/reconnect behavior and docs (openclaw#42507) * fix(gateway): harden token fallback and auth reconnect handling * docs(gateway): clarify auth retry and token-drift recovery * fix(gateway): tighten auth reconnect gating across clients * fix: harden gateway token retry (openclaw#42507) (thanks @joshavant) (cherry picked from commit a76e810) * fix: adapt cherry-picks for fork TS strictness - Replace OpenClawConfig with RemoteClawConfig in server-channels and server-runtime-state - Replace loadOpenClawPlugins with loadRemoteClawPlugins in server-plugins and remove unsupported runtimeOptions field and dead subagent runtime code - Export HookClientIpConfig type from server-http and thread it through server/hooks into server-runtime-state and server.impl - Create plugins-http/ submodules (path-context, route-match, route-auth) extracted from the monolithic plugins-http.ts by upstream refactor - Create stub modules for gutted upstream layers: acp/control-plane/manager, agents/bootstrap-cache, agents/pi-embedded, agents/internal-events - Strip thinkingLevel, reasoningLevel, skillsSnapshot from SessionEntry literals in agent.ts and session-reset-service.ts (Pi-specific fields) - Remove internalEvents from agent ingress opts and loadGatewayModelCatalog from sessions patch call (not present in fork types) - Fix connect-policy tests to pass booleans instead of role strings for the sharedAuthOk parameter (fork changed the function signature) - Add isHealthMonitorEnabled to ChannelManager mocks in test files - Widen runBeforeToolCallHook mock return type to accept blocked: true - Add explicit string types for msg params in server-plugins logger Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]> * fix: apply fork naming to cherry-picked bonjour files --------- Co-authored-by: Andrew Demczuk <[email protected]> Co-authored-by: Peter Steinberger <[email protected]> Co-authored-by: fuller-stack-dev <[email protected]> Co-authored-by: Wilfred <[email protected]> Co-authored-by: Ayaan Zaidi <[email protected]> Co-authored-by: caesargattuso <[email protected]> Co-authored-by: Robin Waslander <[email protected]> Co-authored-by: Tak Hoffman <[email protected]> Co-authored-by: Peter Steinberger <[email protected]> Co-authored-by: Josh Avant <[email protected]> Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
…penclaw#43751) * fix: stop stale gateway seq-gap errors (openclaw#43751) (thanks @caesargattuso) * fix: keep agent.request run ids session-scoped --------- Co-authored-by: Ayaan Zaidi <[email protected]>
fix(gateway): skip seq-gap broadcast for stale post-lifecycle events
After a
lifecycle:endevent clearsagentRunSeq, any remainingin-flight events for the same runId arrive with
last === 0, causinga spurious
seq gaperror broadcast (expected: 1, received: N).Guard the check with
last > 0so only genuinely out-of-order events(where a prior seq was recorded) trigger the error.
Summary
lifecycle:endclearsagentRunSeq, residual in-flight events (e.g. a trailingchatorheartbeat) findlast === 0and incorrectly trigger aseq gaperror broadcast.agenterror event, causing downstream consumers (e.g. ClawdbotWebSocketClient) to treat a successful run as failed.createAgentEventHandleris now guarded bylast > 0; only events where a prior seq was already recorded can produce a gap error.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
Spurious
seq gaperror events are no longer broadcast to clients at the end of a successful agent run.Security Impact (required)
Repro + Verification
Environment
Steps
lifecycle:end(seq N) → trailingchat final/heartbeatevents arrive after.agent errorevent withreason: seq gap, expected: 1, received: N+1is broadcast.Expected
seq gaperror event afterlifecycle:end.Actual (before fix)
agent error { reason: "seq gap", expected: 1, received: 307 }broadcast to clients, causing WebSocket client to treat the run as errored.Evidence
Human Verification (required)
lifecycle:endclearsagentRunSeqbefore residual events arrive.Review Conversations
Compatibility / Migration
Failure Recovery (if this breaks)
last > 0 &&guard insrc/gateway/server-chat.ts.src/gateway/server-chat.tsonly.last === 0).Risks and Mitigations
last === 0,seq === 2would have reportedexpected: 1, received: 2— but that is also indistinguishable from the stale-event case); the practical risk is negligible.