fix(discord): scope command-deploy cache by application id #77367
lonexreb wants to merge 3 commits into openclaw:main.
Conversation
Codex review: needs real behavior proof before merge.

Summary

Reproducibility: yes. Source inspection on current main gives a high-confidence path: a shared hash-store path plus bare cache keys lets a second application with the same command hash return before REST reconcile; the PR adds stub REST regression tests for that path, but no live setup proof.

Real behavior proof

Next step before merge

Security Review findings

Review details

Best possible solution: Keep the application-id namespace and per-path cache serialization, then merge only after live multi-bot Discord proof is supplied and the small lock-map cleanup is addressed or accepted as non-blocking.

Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main gives a high-confidence path: a shared hash-store path plus bare cache keys lets a second application with the same command hash return before REST reconcile; the PR adds stub REST regression tests for that path, but no live setup proof.

Is this the best way to solve the issue? Yes for the core code direction: application-id cache keys plus serialized shared-cache persistence is the narrow, maintainable fix. The remaining merge blocker is proof from a real setup, with one low-priority cleanup in the mutex bookkeeping.

Full review comments:

Overall correctness: patch is correct

What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 7d5ca3064a51.
Force-pushed from 74583ed to d6e0c53.
Addressed [P2] from clawsweeper review. Rebased onto current `main`.
Force-pushed from d6e0c53 to b9fd4e5.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b9fd4e5861
    private scopedCacheKey(suffix: string): string {
      return `app:${this.params.clientId}:${suffix}`;
    }
Merge cache contents before writing app-scoped keys
When the gateway starts multiple Discord accounts at once, src/gateway/server-channels.ts runs up to 4 account startups concurrently, so two deployers sharing command-deploy-cache.json can both load the old file before either writes. With these new per-application keys, the last persistHashes() rename writes only its own in-memory app:<id>:... entry and drops the other account's entry, so subsequent restarts miss the cache for at least one app and re-reconcile slash commands despite no command changes, defeating the rate-limit protection this cache provides. Please re-read/merge the current on-disk hashes before persisting or otherwise serialize writes for the shared file.
Addressed [P2] from codex review.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80d9063861
    const raw = await fs.readFile(storePath, "utf8");
    const parsed = JSON.parse(raw) as { hashes?: unknown };
    if (parsed.hashes && typeof parsed.hashes === "object") {
      for (const [key, value] of Object.entries(parsed.hashes)) {
        if (typeof value === "string" && key.trim() && value.trim()) {
Serialize cache writes across deployers
Fresh evidence in this revision: although persistHashes now re-reads the file, it still does an unsynchronized read→merge→rename cycle, so two deployers persisting at the same time can both read the same old snapshot and the later rename will drop the first writer’s app:<id>:... entries. This means multi-account startup can still lose cache keys and re-trigger unnecessary Discord reconciles/rate-limit exposure; the new test does not cover this because it runs the two deploys sequentially (await deployerA... then await deployerB...).
Multi-bot Discord setups share a single command-deploy-cache.json under the state dir. Cache keys were unscoped (`global:reconcile`, `guild:<id>`), so a later account whose command set hashed identically to an earlier account would hit the shared hash and skip its own application's command reconcile entirely. Discord's Integration panel showed 'This application has no commands' for the secondary bot even though gateway connect, application id, and token were all valid.

Scope every cache key with `app:<clientId>:` so each Discord application reconciles independently.

Add regression tests covering: two applications with identical command sets each call REST against their own application; a single application with the same command set still hits the persisted cache; the on-disk cache JSON contains application-scoped keys.

Fixes openclaw#77359.
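For illustration only, a minimal sketch of the skip-if-unchanged check this scoping implies; `shouldSkipReconcile`, `computeKey`, and `hashes` are hypothetical names, not the actual `DiscordCommandDeployer` internals:

```ts
// Hypothetical sketch of the post-fix cache lookup. The suffix is the old bare
// key ("global:reconcile", "guild:<id>", ...); prefixing the application id
// keeps two bots with identical command sets from sharing an entry.
type HashStore = Record<string, string>;

function computeKey(clientId: string, suffix: string): string {
  return `app:${clientId}:${suffix}`;
}

function shouldSkipReconcile(
  hashes: HashStore,
  clientId: string,
  suffix: string,
  commandSetHash: string,
): boolean {
  // Pre-fix this lookup used the bare suffix, so a second application whose
  // commands hashed identically to the first skipped its own REST reconcile.
  return hashes[computeKey(clientId, suffix)] === commandSetHash;
}
```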
…ncurrent writes

Codex follow-up on openclaw#77359 noted that server-channels.ts can start multiple Discord deployers concurrently, so two deployers that both load the cache file before either persists end up with the second writer overwriting the first writer's app-scoped key, defeating the rate-limit cache that the file exists to provide.

Inside persistHashes, re-read the on-disk cache and merge it with our in-memory entries before the rename. Our in-memory entries always win on key collisions (we just produced them); on-disk entries we don't have in memory are preserved. Refresh in-memory state after the write so future writes from the same deployer also keep entries other deployers added.

This is the lighter of the two repairs the codex review suggested (re-read/merge vs serialize writes); it covers the realistic case where one deployer writes before the other persists. Add a regression test that exercises the load-then-other-deployer-writes-then-persist sequence.
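A rough sketch of that merge-before-rename cycle, assuming Node's `fs/promises` and a temp-file rename; `persistHashesSketch` is an illustrative stand-in for the real method, whose exact shape may differ:

```ts
import { promises as fs } from "node:fs";

// In-memory entries win on key collisions; entries written to disk by other
// deployers since our last load are preserved.
async function persistHashesSketch(
  storePath: string,
  inMemory: Record<string, string>,
): Promise<Record<string, string>> {
  const onDisk: Record<string, string> = {};
  try {
    const raw = await fs.readFile(storePath, "utf8");
    const parsed = JSON.parse(raw) as { hashes?: unknown };
    if (parsed.hashes && typeof parsed.hashes === "object") {
      for (const [key, value] of Object.entries(parsed.hashes)) {
        if (typeof value === "string" && key.trim() && value.trim()) {
          onDisk[key] = value;
        }
      }
    }
  } catch {
    // Missing or unreadable file: fall back to in-memory entries only.
  }
  const merged = { ...onDisk, ...inMemory }; // in-memory wins on collisions
  const tmpPath = `${storePath}.tmp`;
  await fs.writeFile(tmpPath, JSON.stringify({ hashes: merged }), "utf8");
  await fs.rename(tmpPath, storePath); // atomic swap on the same filesystem
  return merged; // caller refreshes its in-memory state from the merged map
}
```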
Force-pushed from 80d9063 to 75f64f1.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 75f64f1ad8
    const raw = await fs.readFile(storePath, "utf8");
    const parsed = JSON.parse(raw) as { hashes?: unknown };
    if (parsed.hashes && typeof parsed.hashes === "object") {
      for (const [key, value] of Object.entries(parsed.hashes)) {
        if (typeof value === "string" && key.trim() && value.trim()) {
Serialize command-deploy cache writes across deployers
Fresh evidence in this commit: cache keys are now per-application (app:<id>:...), so dropping one writer’s key now causes that bot to miss cache hits and re-run Discord reconciliation. persistHashes still performs an unsynchronized read→merge→rename cycle, so if two deployers persist concurrently they can both read the same old snapshot and the later rename overwrites the earlier write, losing entries from the first deployer; this is reachable during multi-account startup where channel accounts are started with concurrency in src/gateway/server-channels.ts.
…mutex

Codex follow-up on openclaw#77367 noted: re-read-before-write inside persistHashes isn't enough; two deployers running persistHashes in true parallel can both read the same snapshot before either writes, and the later rename overwrites the earlier writer's app-scoped entries.

Add a module-level Map<storePath, Promise<void>> mutex and wrap the read-merge-write cycle in withCachePersistLock so concurrent persists for the same on-disk path serialize. In-process is sufficient because Discord deployers only run inside the gateway process.

New regression test fires three deployers via Promise.all on the same tick and asserts all three application-scoped entries survive; pre-fix this race lost at least one entry.
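A minimal sketch of such a per-path promise-chain mutex; `withCachePersistLock` is the name used in the commit, but the body below is an assumption about its shape, including the lock-map cleanup later flagged as a non-blocking follow-up:

```ts
// Serialize persists per on-disk path within one process: each store path maps
// to the tail of a promise chain, and every new persist queues behind it, so
// read-merge-rename cycles for the same file never interleave.
const persistLocks = new Map<string, Promise<void>>();

async function withCachePersistLock<T>(
  storePath: string,
  task: () => Promise<T>,
): Promise<T> {
  const previous = persistLocks.get(storePath) ?? Promise.resolve();
  let release!: () => void;
  const gate = new Promise<void>((resolve) => {
    release = resolve;
  });
  const tail = previous.then(() => gate);
  persistLocks.set(storePath, tail);
  await previous; // earlier persists for this path finish first
  try {
    return await task();
  } finally {
    release();
    // Lock-map cleanup: drop the entry once no newer persist has queued
    // behind this one, so long-lived processes don't grow the map forever.
    if (persistLocks.get(storePath) === tail) {
      persistLocks.delete(storePath);
    }
  }
}
```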
Addressed [P2] codex follow-up: the previous "re-read inside persistHashes" approach alone was insufficient for true concurrent persists. Added an in-process per-path async mutex (`withCachePersistLock`).

Re-review progress:
Summary
Fixes #77359: multi-bot Discord setups where slash commands silently fail to register for the secondary account. The persisted command-deploy cache (`command-deploy-cache.json`) was shared across Discord applications and keyed only by command-set scope (`global:reconcile`, `guild:<id>`, `dev-guild:<id>`, `global:overwrite`). When a later account with the same command set started after the first account had already written the shared hash, `putCommandSetIfChanged` returned early and the later account skipped reconciling its own Discord application, leaving "This application has no commands" in the Integrations panel for that bot even though gateway connect, application id, and token were all valid.

The per-application application id is correctly resolved by `monitorDiscordProvider` (`extensions/discord/src/monitor/provider.ts:316`); the bug is purely in the persistence layer of `DiscordCommandDeployer`.

Fix

Scope every cache key with `app:<clientId>:` inside `DiscordCommandDeployer`, where `clientId` is the resolved Discord application id. The shared cache file is preserved (still good for multi-account rate-limit protection on a single host), but each application now reconciles independently.

This forces a one-time re-reconcile after upgrade, since old keys (`global:reconcile`) no longer match new keys (`app:<id>:global:reconcile`); subsequent restarts hit the cache normally.
Verification

- `pnpm test extensions/discord/src/internal/command-deploy.test.ts`: 14 passed (3 new regression tests)
- `pnpm test extensions/discord/src/internal/client.test.ts extensions/discord/src/monitor/provider.startup.test.ts`: 23 passed
- `pnpm exec oxfmt --check --threads=1 extensions/discord/src/internal/command-deploy.ts extensions/discord/src/internal/command-deploy.test.ts CHANGELOG.md`: clean
- `pnpm check:changed`: passes for the changed lanes; the typecheck failure on `src/infra/command-explainer/{extract,tree-sitter-runtime}.ts` (web-tree-sitter module not found) reproduces on a clean `origin/main` checkout and is unrelated to this PR.
New regression tests cover:

- Two applications with identical command sets each call REST against their own application id (the original bug, which would have silently passed before).
- A single application with the same command set still hits the persisted cache (no regression on the rate-limit-protection path).
- The on-disk cache JSON contains `app:<id>:global:reconcile` keys, never the bare `global:reconcile` key (locking the wire format so this doesn't silently regress).
Closes
Fixes #77359
Real behavior proof
Test: `pnpm test extensions/discord/src/internal/command-deploy.test.ts` passed 15 tests. The regression test exercises `DiscordCommandDeployer` against a stub REST client. Pre-fix, two deployers with the same command set sharing one `command-deploy-cache.json` would have one skip its own application's reconcile because the cache key was unscoped. The post-fix test asserts that each application reconciles against its own client id, and that concurrent writes merge entries instead of overwriting (re-reading on-disk hashes inside `persistHashes`). The same code path runs at gateway startup.
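A hedged sketch of that test's shape, assuming Vitest; the stand-in deployer below only models the cache-check-then-REST behaviour, whereas the real test drives the actual `DiscordCommandDeployer` with a stub REST client:

```ts
import { describe, expect, it } from "vitest";

// Illustrative stand-in: a deployer that skips REST when its (scoped) cache
// entry already matches the current command-set hash.
type SharedCache = Map<string, string>;

function makeStandInDeployer(clientId: string, cache: SharedCache, putCalls: string[]) {
  return {
    async deploy(commandSetHash: string): Promise<void> {
      const key = `app:${clientId}:global:reconcile`; // post-fix scoped key
      if (cache.get(key) === commandSetHash) return; // cache hit: skip REST
      putCalls.push(`/applications/${clientId}/commands`);
      cache.set(key, commandSetHash);
    },
  };
}

describe("command-deploy cache scoping (sketch)", () => {
  it("reconciles each application even when command hashes match", async () => {
    const sharedCache: SharedCache = new Map();
    const putCalls: string[] = [];
    const hash = "sha256:same-command-set";

    await makeStandInDeployer("app-a", sharedCache, putCalls).deploy(hash);
    await makeStandInDeployer("app-b", sharedCache, putCalls).deploy(hash);

    // Pre-fix (bare "global:reconcile" key) the second deploy would have been
    // a false cache hit and app-b would never have called REST.
    expect(putCalls).toEqual([
      "/applications/app-a/commands",
      "/applications/app-b/commands",
    ]);
  });
});
```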