Gateway: rebind plugin registry before channel lifecycle#49526
Gateway: rebind plugin registry before channel lifecycle#49526odex21 wants to merge 5 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes a race condition in the BlueBubbles channel hot-reload path where plugin HTTP routes were being registered on a stale Key changes:
Concern — deferred plugin loading path: Confidence Score: 3/5
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c10d6189b4
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
c10d618 to
2882f67
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 707ad0860c
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
|
Quick scope note relative to #45150 and #45568:
So the three fixes sit at different points in the chain:
This PR is specifically the third case. |
707ad08 to
58d7afb
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 58d7afb3a2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
ReviewRebinding the plugin registry before channel lifecycle operations is a targeted fix. Changes to Good test coverage in LGTM. Verify existing gateway startup/reload tests pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4bb5500d2f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b7d1b2d029
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
55f09d9 to
8247c0a
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8247c0a0bd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| } | ||
| } | ||
|
|
||
| return { registry: activeRegistry, cacheKey: activeCacheKey }; |
There was a problem hiding this comment.
Don't promote registries with additional channel IDs
If a running gateway lazily bootstraps an external channel plugin via src/infra/outbound/channel-resolution.ts, this helper now upgrades channel lifecycle calls to that newer registry just because it contains a superset of channel IDs. However startGatewayServer() still builds channelLogs and channelRuntimeEnvs once from the original registry (src/gateway/server.impl.ts:556-561), and startChannelInternal() indexes those frozen maps when starting a channel (src/gateway/server-channels.ts:338-345). A later hot-reload/manual restart of the newly added channel will therefore call its startAccount with runtime: undefined (and usually log: undefined), which can break startup until the whole gateway restarts.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Thanks, I agree this is a real concern for dynamically introduced channel plugins.
That said, I do not think it is the right thing to expand this PR to solve. This patch is scoped to the reproducible bug where an already-configured BlueBubbles channel stops receiving webhooks after a channels.bluebubbles.* hot reload. Newly discovered channel plugins are a different lifecycle boundary: today channelLogs / channelRuntimeEnvs are startup-scoped, so bringing a brand new channel online still effectively requires a full gateway restart.
In other words, I read this as a broader gateway runtime contract issue rather than a regression in the verified BlueBubbles restart path this PR is fixing. I’m leaving that boundary explicit here instead of widening this bug-fix PR further.
8247c0a to
2368548
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2368548429
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| cache: false, | ||
| activate: false, |
There was a problem hiding this comment.
Keep schema snapshots from mutating live plugin globals
Making schema reads non-activating here avoids replacing the active plugin registry, but it does not keep the other plugin globals stable. Snapshot loads still execute api.registerInteractiveHandler() and api.registerContextEngine() in full registration mode (src/plugins/registry.ts:934-971), and those APIs mutate process-wide registries without any activate:false guard. After a config.get or config.schema* request, a newly discovered extension can therefore start handling interactive callbacks or become available through resolveContextEngine() even though the gateway never activated that plugin's live registry or hook runner.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Thanks, I agree this points at a broader plugin-loader side effect issue.
The change in this PR was intentionally narrower: keep config.get / config.schema* from replacing the live active registry and global hook runner while the gateway is serving channel traffic. That was directly tied to the restart-path regression this PR is fixing.
The remaining global side effects from snapshot loads (registerInteractiveHandler, registerContextEngine, etc.) look real, but they are not part of the BlueBubbles hot-reload repro or the local/manual verification for this PR. I think that cleanup belongs in a separate follow-up that can define the intended snapshot-registration contract more explicitly, rather than continuing to widen this bug-fix PR.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7202ce4410
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| pluginRegistry = channelLifecyclePluginRuntimeState.registry; | ||
| pluginRegistryCacheKey = channelLifecyclePluginRuntimeState.cacheKey; |
There was a problem hiding this comment.
Refresh WS method handlers when promoting a live plugin registry
When resolvePluginRuntimeStateForChannels() upgrades pluginRegistry here, the gateway only swaps the registry object and cache key. The WebSocket dispatcher has already snapshotted gatewayMethods and extraHandlers from the startup registry at src/gateway/server.impl.ts:576-577 and src/gateway/server.impl.ts:1139-1147, so the next channel lifecycle call can leave the process in a mixed state: getChannelPlugin()/hook lookups use the promoted registry, but plugin gateway methods from that registry still hit the old handler table and return unknown method. A concrete case is a live upgrade to Matrix's full runtime, which registers matrix.verify.* in extensions/matrix/index.ts; after the promotion those verification RPCs stay unavailable until the whole gateway restarts.
Useful? React with 👍 / 👎.
Summary
Describe the problem and fix in 2–5 bullets:
channels.bluebubbles.*could hot-reload the BlueBubbles channel after plugin runtime state had drifted, so the restart path could lose the live webhook registration and BlueBubbles webhook deliveries would fall through to404 Not Found.http://localhost:18789/bluebubbles-webhook?...still behaved as if the route was missing after channel restart / hot reload.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
In the verified local BlueBubbles hot-reload path, the webhook stays reachable after a
channels.bluebubbles.*config change and channel restart instead of falling through to404 Not Found.Security Impact (required)
Yes/No) NoYes/No) NoYes/No) NoYes/No) NoYes/No) NoYes, explain risk + mitigation:Repro + Verification
Environment
channels.bluebubbles.*with a local webhook URL athttp://localhost:18789/bluebubbles-webhook?...Steps
channels.bluebubbles.*in the local config so the BlueBubbles channel hot-reloads / restarts.Expected
Actual
404 Not Foundafter channel restart / hot reload.Evidence
Attach at least one:
Manual replay evidence from the current local verification:
config change detected; evaluating reload (channels.bluebubbles.allowFrom, channels.bluebubbles.accounts.default.allowFrom)restarting bluebubbles channelBlueBubbles webhook listening on /bluebubbles-webhook400 invalid payloadThat
400 invalid payloadis intentional from an incomplete test payload and shows the request now reaches the BlueBubbles webhook handler instead of missing the route.Human Verification (required)
What you personally verified (not just CI), and how:
channels.bluebubbles.allowFrom/channels.bluebubbles.accounts.default.allowFrom, observed the config watcher trigger a BlueBubbles restart, and replayed the same webhook URL after restart.400 invalid payloadinstead of404 Not Found, confirming handler reachability after the config-triggered restart.Review Conversations
If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.
Compatibility / Migration
Yes/No) YesYes/No) NoYes/No) NoFailure Recovery (if this breaks)
src/gateway/server-channels.ts,src/gateway/server-plugin-runtime-state.ts,src/gateway/server.impl.ts,src/gateway/server-methods/config.ts,src/channels/plugins/index.ts,src/channels/plugins/registry.ts,src/gateway/server-channels.test.ts,src/gateway/server.config-patch.test.tsRisks and Mitigations
List only real risks for this PR. Add/remove entries as needed. If none, write
None.404.