Skip to content

fix: avoid manual cron deadlock on reentrant lanes#42161

Open
siewcapital wants to merge 1 commit intoopenclaw:mainfrom
siewcapital:codex/cron-run-41266
Open

fix: avoid manual cron deadlock on reentrant lanes#42161
siewcapital wants to merge 1 commit intoopenclaw:mainfrom
siewcapital:codex/cron-run-41266

Conversation

@siewcapital
Copy link
Copy Markdown

Summary

  • Problem: manual openclaw cron run <jobId> can enqueue successfully but never execute because the same cron lane is re-entered through the embedded runner path.
  • Why it matters: ad-hoc cron runs can sit until timeout even though scheduled runs still work.
  • What changed: the command queue now detects when work re-enters a lane already active in the current async call stack and runs that inner work directly instead of queuing it behind itself.
  • What did NOT change (scope boundary): this PR does not change cron delivery behavior, cron scheduling rules, or subagent queue limits.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Manual cron runs no longer deadlock when the cron lane is re-entered through nested queueing.
  • Scheduled cron behavior is unchanged.

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: local macOS dev environment for tests
  • Runtime/container: Node.js dev environment
  • Model/provider: not required for the regression tests
  • Integration/channel (if any): not required for the regression tests
  • Relevant config (redacted): not required

Steps

  1. Queue work in a lane.
  2. Re-enter that same lane through nested async work.
  3. Observe whether the inner task completes or waits behind itself forever.

Expected

  • Re-entrant same-lane work completes.
  • The queue does not deadlock on cron -> session -> cron re-entry.

Actual

  • Before this change, nested re-entry could queue work behind the currently active task in the same lane and time out.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification

What you personally verified (not just CI), and how:

  • Verified scenarios: ran pnpm vitest run src/process/command-queue.test.ts src/cron/service.issue-regressions.test.ts, pnpm exec oxlint src/process/command-queue.ts src/process/command-queue.test.ts, and pnpm build
  • Edge cases checked: direct same-lane re-entry and cron -> session -> cron re-entry
  • What you did not verify: full live manual cron reproduction against a running gateway with a real model provider in this branch

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this PR or restore the previous queue behavior in src/process/command-queue.ts
  • Files/config to restore: src/process/command-queue.ts, src/process/command-queue.test.ts
  • Known bad symptoms reviewers should watch for: unexpected behavior in other intentionally queued same-lane nested workflows

Risks and Mitigations

  • Risk: same-lane nested work now bypasses queueing when the lane is already active in the current async stack
    • Mitigation: behavior is covered by focused queue tests plus the existing cron regression test file, and the change is limited to re-entrant ownership of the same lane

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 10, 2026

Greptile Summary

This PR fixes a deadlock that occurs when openclaw cron run <jobId> re-enters the same cron lane through the embedded runner path. The solution uses AsyncLocalStorage to track which lanes are currently executing in the async call stack and short-circuits enqueueCommandInLane to call the task directly when the target lane is already owned by the current async context.

The fix is well-scoped, properly tested with focused regression tests (direct re-entry and cross-lane re-entry), and includes sensible draining semantics: re-entrant work is allowed to complete even during gateway drain, which is correct because such work is part of an already-committed outer task. The code is safe to merge.

Confidence Score: 5/5

  • This PR is safe to merge — the re-entrancy fix is well-scoped, uses idiomatic async patterns, and is covered by focused regression tests.
  • The fix correctly uses AsyncLocalStorage to track lane execution context and short-circuits re-entrant calls. The two regression tests (direct and cross-lane re-entry) comprehensively validate the deadlock prevention. Gateway draining behavior is intentional and tested. No breaking changes to existing behavior. Code is clean and maintainable.
  • No files require special attention

Last reviewed commit: f2e70dc

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f2e70dcd4d

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

state.activeTaskIds.add(taskId);
void (async () => {
const startTime = Date.now();
const parentStack = laneExecutionScope.getStore() ?? [];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve caller lane context at enqueue time

This captures lane ancestry at dequeue time (laneExecutionScope.getStore() inside drainLane) instead of when the task is enqueued, so queued middle-lane work can lose the outer lane that is still waiting on it. In a cron -> session -> cron chain where session is already busy, the delayed session callback may start without cron in scope, making the inner enqueueCommandInLane("cron", ...) queue behind the still-active outer cron task and deadlock again under load.

Useful? React with 👍 / 👎.

const parentStack = laneExecutionScope.getStore() ?? [];
try {
const result = await entry.task();
const result = await laneExecutionScope.run([...parentStack, lane], async () => {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid duplicating lane markers on every dequeue

Building each task scope as run([...parentStack, lane], ...) causes the same lane to be appended repeatedly because pump() is invoked from the previous task’s context; with a long queue this grows the stack every task and repeatedly copies larger arrays, and stale ancestor lanes can persist longer than intended, increasing false-positive reentrancy matches and bypassing normal queueing semantics.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Manual cron run enqueues but never executes on Linux (2026.3.8)

1 participant