fix: avoid manual cron deadlock on reentrant lanes by siewcapital · Pull Request #42161 · openclaw/openclaw

siewcapital · 2026-03-10T13:08:57Z

Summary

Problem: manual openclaw cron run <jobId> can enqueue successfully but never execute because the same cron lane is re-entered through the embedded runner path.
Why it matters: ad-hoc cron runs can sit until timeout even though scheduled runs still work.
What changed: the command queue now detects when work re-enters a lane already active in the current async call stack and runs that inner work directly instead of queuing it behind itself.
What did NOT change (scope boundary): this PR does not change cron delivery behavior, cron scheduling rules, or subagent queue limits.

Change Type

Scope

Linked Issue/PR

User-visible / Behavior Changes

Manual cron runs no longer deadlock when the cron lane is re-entered through nested queueing.
Scheduled cron behavior is unchanged.

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: local macOS dev environment for tests
Runtime/container: Node.js dev environment
Model/provider: not required for the regression tests
Integration/channel (if any): not required for the regression tests
Relevant config (redacted): not required

Steps

Queue work in a lane.
Re-enter that same lane through nested async work.
Observe whether the inner task completes or waits behind itself forever.

Expected

Re-entrant same-lane work completes.
The queue does not deadlock on cron -> session -> cron re-entry.

Actual

Before this change, nested re-entry could queue work behind the currently active task in the same lane and time out.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification

What you personally verified (not just CI), and how:

Verified scenarios: ran pnpm vitest run src/process/command-queue.test.ts src/cron/service.issue-regressions.test.ts, pnpm exec oxlint src/process/command-queue.ts src/process/command-queue.test.ts, and pnpm build
Edge cases checked: direct same-lane re-entry and cron -> session -> cron re-entry
What you did not verify: full live manual cron reproduction against a running gateway with a real model provider in this branch

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert this PR or restore the previous queue behavior in src/process/command-queue.ts
Files/config to restore: src/process/command-queue.ts, src/process/command-queue.test.ts
Known bad symptoms reviewers should watch for: unexpected behavior in other intentionally queued same-lane nested workflows

Risks and Mitigations

Risk: same-lane nested work now bypasses queueing when the lane is already active in the current async stack
- Mitigation: behavior is covered by focused queue tests plus the existing cron regression test file, and the change is limited to re-entrant ownership of the same lane

greptile-apps · 2026-03-10T13:14:40Z

Greptile Summary

This PR fixes a deadlock that occurs when openclaw cron run <jobId> re-enters the same cron lane through the embedded runner path. The solution uses AsyncLocalStorage to track which lanes are currently executing in the async call stack and short-circuits enqueueCommandInLane to call the task directly when the target lane is already owned by the current async context.

The fix is well-scoped, properly tested with focused regression tests (direct re-entry and cross-lane re-entry), and includes sensible draining semantics: re-entrant work is allowed to complete even during gateway drain, which is correct because such work is part of an already-committed outer task. The code is safe to merge.

Confidence Score: 5/5

This PR is safe to merge — the re-entrancy fix is well-scoped, uses idiomatic async patterns, and is covered by focused regression tests.
The fix correctly uses AsyncLocalStorage to track lane execution context and short-circuits re-entrant calls. The two regression tests (direct and cross-lane re-entry) comprehensively validate the deadlock prevention. Gateway draining behavior is intentional and tested. No breaking changes to existing behavior. Code is clean and maintainable.
No files require special attention

_{Last reviewed commit: f2e70dc}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2e70dcd4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-03-10T13:15:59Z

src/process/command-queue.ts

        state.activeTaskIds.add(taskId);
        void (async () => {
          const startTime = Date.now();
+          const parentStack = laneExecutionScope.getStore() ?? [];


Preserve caller lane context at enqueue time

This captures lane ancestry at dequeue time (laneExecutionScope.getStore() inside drainLane) instead of when the task is enqueued, so queued middle-lane work can lose the outer lane that is still waiting on it. In a cron -> session -> cron chain where session is already busy, the delayed session callback may start without cron in scope, making the inner enqueueCommandInLane("cron", ...) queue behind the still-active outer cron task and deadlock again under load.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-03-10T13:15:59Z

src/process/command-queue.ts

+          const parentStack = laneExecutionScope.getStore() ?? [];
          try {
-            const result = await entry.task();
+            const result = await laneExecutionScope.run([...parentStack, lane], async () => {


Avoid duplicating lane markers on every dequeue

Building each task scope as run([...parentStack, lane], ...) causes the same lane to be appended repeatedly because pump() is invoked from the previous task’s context; with a long queue this grows the stack every task and repeatedly copies larger arrays, and stale ancestor lanes can persist longer than intended, increasing false-positive reentrancy matches and bypassing normal queueing semantics.

Useful? React with 👍 / 👎.

fix: avoid manual cron deadlock on reentrant lanes

f2e70dc

openclaw-barnacle bot added the size: S label Mar 10, 2026

chatgpt-codex-connector bot reviewed Mar 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: avoid manual cron deadlock on reentrant lanes#42161

fix: avoid manual cron deadlock on reentrant lanes#42161
siewcapital wants to merge 1 commit intoopenclaw:mainfrom
siewcapital:codex/cron-run-41266

siewcapital commented Mar 10, 2026

Uh oh!

greptile-apps bot commented Mar 10, 2026

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

siewcapital commented Mar 10, 2026

Summary

Change Type

Scope

Linked Issue/PR

User-visible / Behavior Changes

Security Impact

Repro + Verification

Environment

Steps

Expected

Actual

Evidence

Human Verification

Review Conversations

Compatibility / Migration

Failure Recovery (if this breaks)

Risks and Mitigations

Uh oh!

greptile-apps bot commented Mar 10, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant