Skip to content

fix(process): use globalThis singleton for command-queue state to survive code-splitting#41903

Closed
best wants to merge 1 commit intoopenclaw:mainfrom
best:fix/command-queue-globalThis-singleton
Closed

fix(process): use globalThis singleton for command-queue state to survive code-splitting#41903
best wants to merge 1 commit intoopenclaw:mainfrom
best:fix/command-queue-globalThis-singleton

Conversation

@best
Copy link
Copy Markdown

@best best commented Mar 10, 2026

Summary

  • Problem: command-queue.ts stores shared runtime state (lanes, gatewayDraining, nextTaskId) as module-scoped variables. tsdown code-splitting duplicates this module into 10 independent chunks, each with isolated state. setCommandLaneConcurrency() only updates one chunk; actual message processing uses different chunks stuck at maxConcurrent=1.
  • Why it matters: agents.defaults.maxConcurrent config is completely ignored for all users. All requests serialize regardless of configuration. Additionally, resetAllLanes() and markGatewayDraining() only affect one chunk.
  • What changed: Replaced 3 module-scoped variables with a globalThis singleton via Symbol.for('openclaw.commandQueueState'), following the same pattern used in internal-hooks.ts and context-engine/registry.ts. Updated 14 call sites.
  • What did NOT change (scope boundary): No changes to tests, build config, other modules, or the lanes.ts lane-name helpers. Error classes remain module-scoped (separate concern).

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • agents.defaults.maxConcurrent now actually takes effect across all chunks (was silently ignored)
  • resetAllLanes() on SIGUSR1 now correctly resets lanes in all chunks (was only resetting one)
  • markGatewayDraining() now visible to pi-embedded chunks during shutdown (was invisible)

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux 6.12.73+deb13-amd64 (x64)
  • Runtime/container: Node.js v22.22.0
  • Model/provider: Any (bug is model-independent)
  • Integration/channel: Telegram + Discord (any multi-channel setup)
  • Relevant config: agents.defaults.maxConcurrent: 10

Steps

  1. Configure agents.defaults.maxConcurrent: 10
  2. Send messages to two channels simultaneously
  3. Observe both process concurrently (before fix: second waits for first)

Expected

  • Messages on different channels process concurrently up to configured limit

Actual

  • Before fix: all requests serialize through maxConcurrent=1
  • After fix: configured concurrency is respected

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets

All 16 existing command-queue.test.ts tests pass unchanged:

✓ src/process/command-queue.test.ts > command queue > resetAllLanes is safe when no lanes have been created
✓ src/process/command-queue.test.ts > command queue > runs tasks one at a time in order
✓ src/process/command-queue.test.ts > command queue > logs enqueue depth after push
...
Test Files  1 passed (1)
     Tests  16 passed (16)

dist/ state duplication confirmed:

$ rg -l 'let nextTaskId = 1;' dist/
dist/compact-BIp3lbnh.js
dist/pi-embedded-CJVNBk0y.js
dist/pi-embedded-DePf5reQ.js
dist/plugin-sdk/dispatch-C5fHAX5a.js
dist/plugin-sdk/dispatch-D36HVShU.js
dist/plugin-sdk/dispatch-DHll-mp1.js
dist/plugin-sdk/dispatch-DJpqKByO.js
dist/plugin-sdk/dispatch-vBEK_Alg.js
dist/plugin-sdk/reply-DYZ7va08.js
dist/reply-BZQv8rey.js

Human Verification (required)

  • Verified scenarios: All 16 existing unit tests pass; manual verification of serial blocking before fix and concurrent processing after applying the fix to dist files
  • Edge cases checked: resetAllLanes() still correctly clears globalThis state; markGatewayDraining() propagates across chunks; primitive values (gatewayDraining, nextTaskId) accessed through state object (not destructured) to ensure mutations propagate
  • What you did not verify: Full pnpm build && pnpm check (shallow clone, no full dev dependencies); cross-chunk instanceof for error classes (separate concern, noted in issue)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert this single commit; the module falls back to module-scoped variables
  • Files/config to restore: src/process/command-queue.ts
  • Known bad symptoms reviewers should watch for: If Symbol.for key collides with external code (extremely unlikely given namespace), queue state could leak

Risks and Mitigations

AI-Assisted Contribution

  • AI-assisted: This PR was authored by an AI agent (OpenClaw/Claude) with human review and approval by @best
  • Degree of testing: All 16 existing unit tests pass unchanged; manual verification of serial blocking before fix and concurrent processing after patching dist files; CI full suite (30/30 checks) passed
  • Understanding confirmed: The contributor and reviewer both understand the change — it applies an existing codebase pattern (globalThis singleton used in internal-hooks.ts and context-engine/registry.ts) to command-queue.ts which was missed
  • Bot review conversations: Greptile review addressed (5/5 confidence, no issues raised). Codex review passed (no issues found).

…vive code-splitting

command-queue.ts stores shared runtime state (lanes, gatewayDraining,
nextTaskId) as module-scoped variables. tsdown code-splitting duplicates
this module into 10 independent chunks, each with its own isolated copy.

This causes:
- setCommandLaneConcurrency() only updates one chunk's lanes Map
- agents.defaults.maxConcurrent config is completely ignored
- All requests serialize through maxConcurrent=1 regardless of config
- resetAllLanes() (SIGUSR1) only resets one chunk's lanes
- markGatewayDraining() is invisible to pi-embedded chunks

Replace module-scoped state with a globalThis singleton using
Symbol.for('openclaw.commandQueueState'), following the same pattern
already used in internal-hooks.ts and context-engine/registry.ts.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR correctly fixes a real bug where tsdown code-splitting duplicated command-queue.ts into up to 10 independent chunks, each with isolated module-scoped state. This caused setCommandLaneConcurrency(), markGatewayDraining(), and resetAllLanes() to only affect one chunk, silently ignoring agents.defaults.maxConcurrent config and serializing all requests.

The fix replaces three module-scoped variables (gatewayDraining, lanes, nextTaskId) with a globalThis singleton keyed by Symbol.for("openclaw.commandQueueState"), following the exact same pattern already used in context-engine/registry.ts and consistent with internal-hooks.ts.

Technical correctness verified:

  • Mutable primitives (nextTaskId, gatewayDraining) are always accessed through the state object reference rather than destructured, ensuring increments/mutations propagate correctly across all chunks
  • The Map for lanes is a reference type accessed safely through the state object
  • The singleton object reference never changes — only its properties are mutated — so closures in drainLane's pump function remain valid even after resetAllLanes() bumps the generation
  • All call sites (14+ locations) correctly updated to use the new singleton pattern
  • All 16 existing tests pass unchanged

The change is minimal, targeted, and follows an established codebase pattern. Error classes remain module-scoped (separate pre-existing concern noted in #41901).

Confidence Score: 5/5

  • Safe to merge — targeted bug fix with proven reference semantics correctness and no behavior regressions.
  • The fix correctly identifies and solves a real code-splitting bug by replacing module-scoped state with a globalThis singleton. All mutation semantics (mutable primitives via state object reference, Map references, stable closure references) are verified as correct. All 16 existing tests pass unchanged. The implementation follows an established codebase pattern. No new logic introduced — only state storage location changes to survive code-splitting.
  • No files require special attention

Last reviewed commit: cbff929

@best
Copy link
Copy Markdown
Author

best commented Mar 11, 2026

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@best
Copy link
Copy Markdown
Author

best commented Mar 11, 2026

@vincentkoc This PR is ready for review. CI green (30/30), Greptile 5/5, Codex passed with no issues.

Fixes a state isolation bug in command-queue.ts where tsdown code-splitting causes maxConcurrent to be silently ignored — the same globalThis singleton pattern already used in internal-hooks.ts. Single file change, all existing tests pass. Details in #41901.

@vincentkoc
Copy link
Copy Markdown
Member

Linking #43683 here because it carries the same command-queue diagnosis but expands the fix across the other split registries that were still producing the cross-channel duplicate/busy-session reports.

The broader audit found additional mutable state duplicated into separate bundled chunks beyond command-queue itself:

  • embedded runs
  • followup queue state / drain callbacks / queued-message dedupe
  • inbound dedupe
  • Slack thread participation
  • Telegram thread bindings
  • Telegram draft-id allocation
  • Telegram sent-message cache

So this is not a competing root cause. It is the same bundled-runtime singleton split, fixed across the rest of the live state surfaces that were still diverging in production builds.

Tracking the broader pass in #43683.

@best
Copy link
Copy Markdown
Author

best commented Mar 12, 2026

Thanks for the comprehensive follow-up in #43683 — that's exactly the broader audit we were planning to do next. Happy to see the fix expanded to cover the followup queue, dedupe cache, and channel-specific state as well.

Closing this PR in favor of #43683 since it covers the same command-queue.ts fix plus all the other affected registries. Our original investigation and evidence in #41901 should still be useful context for reviewers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: command-queue module-scoped state duplicated by code-splitting, maxConcurrent config ignored

2 participants