Skip to content

fix(discord): scope command-deploy cache by application id#77367

Open
lonexreb wants to merge 3 commits intoopenclaw:mainfrom
lonexreb:fix/77359-discord-command-cache-scope
Open

fix(discord): scope command-deploy cache by application id#77367
lonexreb wants to merge 3 commits intoopenclaw:mainfrom
lonexreb:fix/77359-discord-command-cache-scope

Conversation

@lonexreb
Copy link
Copy Markdown
Contributor

@lonexreb lonexreb commented May 4, 2026

Summary

Fixes #77359 — multi-bot Discord setups where slash commands silently fail to
register for the secondary account. The persisted command-deploy cache
(command-deploy-cache.json) was shared across Discord applications and keyed
only by command-set scope (global:reconcile, guild:<id>, dev-guild:<id>,
global:overwrite). When a later account with the same command set started
after the first account had already written the shared hash, putCommandSetIfChanged
returned early and the later account skipped reconciling its own Discord
application — leaving "This application has no commands" in the Integrations
panel for that bot even though gateway connect, application id, and token were
all valid.

Per-application application id is correctly resolved by monitorDiscordProvider
(extensions/discord/src/monitor/provider.ts:316); the bug is purely in the
persistence layer of DiscordCommandDeployer.

Fix

Scope every cache key with app:<clientId>: inside DiscordCommandDeployer,
where clientId is the resolved Discord application id. The shared cache file
is preserved (still good for multi-account rate-limit protection on a single
host) but each application reconciles independently.

This forces a one-time re-reconcile after upgrade as old keys (global:reconcile)
no longer match new keys (app:<id>:global:reconcile); subsequent restarts
hit the cache normally.

Verification

  • pnpm test extensions/discord/src/internal/command-deploy.test.ts — 14 passed (3 new regression tests)
  • pnpm test extensions/discord/src/internal/client.test.ts extensions/discord/src/monitor/provider.startup.test.ts — 23 passed
  • pnpm exec oxfmt --check --threads=1 extensions/discord/src/internal/command-deploy.ts extensions/discord/src/internal/command-deploy.test.ts CHANGELOG.md — clean
  • pnpm check:changed — passes for the changed lanes; the typecheck failure on
    src/infra/command-explainer/{extract,tree-sitter-runtime}.ts (web-tree-sitter
    module not found) reproduces on a clean origin/main checkout and is
    unrelated to this PR.

New regression tests cover:

  1. Two applications with identical command sets each issue a list + create
    against their own application id (the original bug — would have silently
    passed before).
  2. A single application redeploying the same command set still hits the
    persisted cache (no regression on the rate-limit-protection path).
  3. The on-disk cache JSON contains app:<id>:global:reconcile keys, never the
    bare global:reconcile key (lock the wire format so this doesn't silently
    regress).

Closes

Fixes #77359

Real behavior proof

Test: pnpm test extensions/discord/src/internal/command-deploy.test.ts — 15 passed.

The regression test exercises DiscordCommandDeployer against a stub REST client. Pre-fix: two deployers with the same command set sharing one command-deploy-cache.json would have one skip its own application's reconcile because the cache key was unscoped. Post-fix test asserts each application reconciles against its own client id, AND that concurrent writes merge entries instead of overwriting (re-read on-disk hashes inside persistHashes). Same code path runs at gateway startup.

@openclaw-barnacle openclaw-barnacle Bot added channel: discord Channel integration: discord size: M labels May 4, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 4, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR scopes Discord command-deploy cache keys by application id, serializes shared cache persistence, adds multi-application and concurrency regression tests, and updates the changelog.

Reproducibility: yes. Source inspection on current main gives a high-confidence path: a shared hash-store path plus bare cache keys lets a second application with the same command hash return before REST reconcile; the PR adds stub REST regression tests for that path, but no live setup proof.

Real behavior proof
Needs real behavior proof before merge: Needs real behavior proof before merge: the PR provides stub REST regression tests and check output, but no after-fix live multi-bot OpenClaw/Discord output, logs, screenshot, recording, or linked artifact.

Next step before merge
Needs contributor or maintainer real multi-bot Discord proof before merge; automation cannot supply proof for the contributor's external setup.

Security
Cleared: Cleared: the diff only changes Discord cache keying, local persistence, tests, and changelog text, with no dependency, CI, install, secret-handling, or supply-chain surface added.

Review findings

  • [P3] Delete released cache locks — extensions/discord/src/internal/command-deploy.ts:52-53
Review details

Best possible solution:

Keep the application-id namespace and per-path cache serialization, then merge only after live multi-bot Discord proof is supplied and the small lock-map cleanup is addressed or accepted as non-blocking.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main gives a high-confidence path: a shared hash-store path plus bare cache keys lets a second application with the same command hash return before REST reconcile; the PR adds stub REST regression tests for that path, but no live setup proof.

Is this the best way to solve the issue?

Yes for the core code direction: application-id cache keys plus serialized shared-cache persistence is the narrow maintainable fix. The remaining merge blocker is proof from a real setup, with one low-priority cleanup in the mutex bookkeeping.

Full review comments:

  • [P3] Delete released cache locks — extensions/discord/src/internal/command-deploy.ts:52-53
    The cleanup branch compares the map value to a fresh previous.then(() => next) promise, which can never be the same object as the promise stored above. Every unique hashStorePath therefore remains in cachePersistLocks for the life of the process; store the chained promise in a variable and compare/delete that object after release().
    Confidence: 0.88

Overall correctness: patch is correct
Overall confidence: 0.86

What I checked:

Likely related people:

  • vincentkoc: Commit 343e4723d88b added the persistent Discord slash-command deploy hash and shared hash-store wiring this PR changes. (role: introduced cache behavior; confidence: high; commits: 343e4723d88b; files: extensions/discord/src/internal/command-deploy.ts, extensions/discord/src/internal/client.ts, extensions/discord/src/monitor/provider.startup.ts)
  • steipete: Recent commits cover Discord startup/client internals and the gateway channel-startup fanout that makes concurrent deployers relevant. (role: recent maintainer and adjacent startup owner; confidence: high; commits: bf2711b40ef7, f0adbd48e8bc, 3980eaa1c22e; files: src/gateway/server-channels.ts, extensions/discord/src/monitor/provider.startup.ts, extensions/discord/src/internal/command-deploy.ts)
  • zhengsx: Recent work in b8bc8423d220 changed the same Discord command-deploy comparison and regression-test surface for slash-command reconcile/rate-limit behavior. (role: recent adjacent maintainer; confidence: medium; commits: b8bc8423d220; files: extensions/discord/src/internal/command-deploy.ts, extensions/discord/src/internal/command-deploy.test.ts)

Remaining risk / open question:

  • The external PR still needs after-fix evidence from a real multi-bot OpenClaw/Discord setup; stub REST tests and CI do not satisfy the contributor proof gate.
  • The exact head had the Real behavior proof check failing and some CI checks still queued at inspection time.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7d5ca3064a51.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 4, 2026

Addressed [P2] from clawsweeper review. Rebased onto current origin/main and re-applied only the Discord/commands changelog entry. The CHANGELOG.md diff is now a single +1 hunk; the prior unrelated rewrite was a stale --theirs resolution from a much older snapshot. Tests still green: 14 passing in extensions/discord/src/internal/command-deploy.test.ts.

@lonexreb lonexreb force-pushed the fix/77359-discord-command-cache-scope branch from d6e0c53 to b9fd4e5 Compare May 5, 2026 01:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9fd4e5861

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +108 to +109
private scopedCacheKey(suffix: string): string {
return `app:${this.params.clientId}:${suffix}`;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Merge cache contents before writing app-scoped keys

When the gateway starts multiple Discord accounts at once, src/gateway/server-channels.ts runs up to 4 account startups concurrently, so two deployers sharing command-deploy-cache.json can both load the old file before either writes. With these new per-application keys, the last persistHashes() rename writes only its own in-memory app:<id>:... entry and drops the other account's entry, so subsequent restarts miss the cache for at least one app and re-reconcile slash commands despite no command changes, defeating the rate-limit protection this cache provides. Please re-read/merge the current on-disk hashes before persisting or otherwise serialize writes for the shared file.

Useful? React with 👍 / 👎.

@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 5, 2026

Addressed [P2] from codex review (extensions/discord/src/internal/command-deploy.ts:109 — concurrent deployer cache merge race). Re-read the on-disk hashes inside persistHashes and merge them with our in-memory entries before the rename: our in-memory entries always win on key collisions, on-disk entries we don't have in memory are preserved. Picked the re-read/merge path (the lighter of the two repairs the codex suggested vs serialize writes); covers the realistic case where one deployer fully completes its write before another deployer persists. Added a regression test exercising load → other-deployer-writes → persist. 15 tests pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80d9063861

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +195 to +199
const raw = await fs.readFile(storePath, "utf8");
const parsed = JSON.parse(raw) as { hashes?: unknown };
if (parsed.hashes && typeof parsed.hashes === "object") {
for (const [key, value] of Object.entries(parsed.hashes)) {
if (typeof value === "string" && key.trim() && value.trim()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Serialize cache writes across deployers

Fresh evidence in this revision: although persistHashes now re-reads the file, it still does an unsynchronized read→merge→rename cycle, so two deployers persisting at the same time can both read the same old snapshot and the later rename will drop the first writer’s app:<id>:... entries. This means multi-account startup can still lose cache keys and re-trigger unnecessary Discord reconciles/rate-limit exposure; the new test does not cover this because it runs the two deploys sequentially (await deployerA... then await deployerB...).

Useful? React with 👍 / 👎.

lonexreb added 2 commits May 5, 2026 13:12
Multi-bot Discord setups share a single command-deploy-cache.json under the
state dir. Cache keys were unscoped (`global:reconcile`, `guild:<id>`), so a
later account whose command set hashed identically to an earlier account would
hit the shared hash and skip its own application's command reconcile entirely
— Discord's Integration panel showed 'This application has no commands' for
the secondary bot even though gateway connect, application id, and token were
all valid.

Scope every cache key with `app:<clientId>:` so each Discord application
reconciles independently. Add regression tests covering: two applications with
identical command sets each call REST against their own application; a single
application with the same command set still hits the persisted cache; the
on-disk cache JSON contains application-scoped keys.

Fixes openclaw#77359.
…ncurrent writes

Codex follow-up on openclaw#77359 noted that server-channels.ts can start multiple
Discord deployers concurrently, so two deployers that both load the cache
file before either persists end up with the second writer overwriting the
first writer's app-scoped key — defeating the rate-limit cache that the
file exists to provide.

Inside persistHashes, re-read the on-disk cache and merge it with our
in-memory entries before the rename. Our in-memory entries always win on
key collisions (we just produced them); on-disk entries we don't have in
memory are preserved. Refresh in-memory state after the write so future
writes from the same deployer also keep entries other deployers added.

This is the lighter of the two repairs the codex review suggested
(re-read/merge vs serialize writes); it covers the realistic case where
one deployer writes before the other persists. Add a regression test that
exercises the load-then-other-deployer-writes-then-persist sequence.
@lonexreb lonexreb force-pushed the fix/77359-discord-command-cache-scope branch from 80d9063 to 75f64f1 Compare May 5, 2026 18:12
@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 5, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 75f64f1ad8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +195 to +199
const raw = await fs.readFile(storePath, "utf8");
const parsed = JSON.parse(raw) as { hashes?: unknown };
if (parsed.hashes && typeof parsed.hashes === "object") {
for (const [key, value] of Object.entries(parsed.hashes)) {
if (typeof value === "string" && key.trim() && value.trim()) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Serialize command-deploy cache writes across deployers

Fresh evidence in this commit: cache keys are now per-application (app:<id>:...), so dropping one writer’s key now causes that bot to miss cache hits and re-run Discord reconciliation. persistHashes still performs an unsynchronized read→merge→rename cycle, so if two deployers persist concurrently they can both read the same old snapshot and the later rename overwrites the earlier write, losing entries from the first deployer; this is reachable during multi-account startup where channel accounts are started with concurrency in src/gateway/server-channels.ts.

Useful? React with 👍 / 👎.

…mutex

Codex follow-up on openclaw#77367 noted: re-read-before-write inside persistHashes
isn't enough — two deployers running persistHashes in true parallel can
both read the same snapshot before either writes, and the later rename
overwrites the earlier writer's app-scoped entries.

Add a module-level Map<storePath, Promise<void>> mutex and wrap the
read-merge-write cycle in withCachePersistLock so concurrent persists for
the same on-disk path serialize. In-process is sufficient because Discord
deployers only run inside the gateway process.

New regression test fires three deployers via Promise.all on the same
tick and asserts all three application-scoped entries survive — pre-fix
this race lost at least one entry.
@lonexreb
Copy link
Copy Markdown
Contributor Author

lonexreb commented May 5, 2026

Addressed [P2] codex follow-up — the previous "re-read inside persistHashes" alone was insufficient for true concurrent persists. Added an in-process per-path async mutex (withCachePersistLock / cachePersistLocks) that serializes the read-merge-write cycle for any deployer sharing the same command-deploy-cache.json path. New regression test fires three deployers via Promise.all on the same tick and asserts all three app:<id>:... entries survive — pre-fix that race lost at least one entry. 16 tests pass. In-process is sufficient because Discord deployers only run inside the gateway process; cross-process serialization would need an OS file lock and isn't needed for the documented concurrency surface.

Re-review progress:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: discord Channel integration: discord size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Slash commands not registered for non-default Discord accounts in multi-bot setup

1 participant