Skip to content

Cron: fix manual isolated run deadlock#41532

Open
sahancava wants to merge 1 commit intoopenclaw:mainfrom
sahancava:fix/41507-cron-manual-execution
Open

Cron: fix manual isolated run deadlock#41532
sahancava wants to merge 1 commit intoopenclaw:mainfrom
sahancava:fix/41507-cron-manual-execution

Conversation

@sahancava
Copy link
Copy Markdown
Contributor

Summary

Describe the problem and fix in 2-5 bullets:

  • Problem: Manual cron job triggers from both the API/tooling and the dashboard Run button returned ok: true / enqueued: true, but isolated cron sessions were never spawned, so the job silently never executed.
  • Why it matters: This breaks the operator escape hatch for cron jobs. Scheduled runs still work, but any manual retry, backfill, or on-demand execution path becomes a no-op even though the caller sees a success response.
  • What changed:
    • Identified that manual cron.run dispatch was enqueuing onto the same global cron lane later re-entered by isolated cron agent turns, which created a self-deadlock.
    • Added a dedicated cron-dispatch lane for manual background dispatch while keeping actual cron job execution on the existing cron lane.
    • Wired the new lane into startup and reload concurrency configuration so it respects the same cron concurrency cap.
    • Added regression coverage in src/cron/service.issue-regressions.test.ts to prove manual isolated runs no longer deadlock when the isolated executor re-enters the cron lane.
  • What did NOT change (scope boundary): Scheduled cron execution logic is unchanged. Cron job payload semantics, isolated session behavior, and the dashboard/API contract are unchanged aside from manual triggers now actually executing.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

Users manually running cron jobs from the dashboard or the cron.run API/tool path will now get a real isolated execution instead of a silent enqueue-without-execution no-op.

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node.js 22.x+
  • Model/provider: N/A
  • Integration/channel (if any): Gateway cron API + dashboard/manual trigger path
  • Relevant config (redacted): Cron job with sessionTarget: "isolated" and payload.kind: "agentTurn"

Steps

  1. Create or use an existing cron job that successfully runs on schedule.
  2. Trigger it manually through the dashboard Run button or cron run --jobId <id> --runMode force.
  3. Inspect cron run history and session spawning.

Expected

  • Manual trigger enqueues background work, spawns the isolated cron session, and records run history.

Actual

  • Prior to this PR, the API returned success but the work was deadlocked on the shared cron lane, so no isolated session spawned and no run history was recorded.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

(Note: Added a regression test that simulates the isolated executor re-entering the cron lane and confirms the manual run completes instead of stalling.)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: Ran pnpm exec vitest run --config vitest.config.ts src/cron/service.issue-regressions.test.ts and pnpm exec vitest run --config vitest.config.ts src/gateway/server.cron.test.ts after the fix.
  • Edge cases checked: Verified that the manual dispatch path uses a separate lane while isolated cron execution can still enqueue work onto the existing cron lane without deadlocking.
  • What you did not verify: Did not manually click the dashboard Run button or exercise a live cron job on a running gateway instance.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: Revert this specific PR commit.
  • Known bad symptoms reviewers should watch for: Cron manual triggers start but unexpectedly ignore the configured cron concurrency cap, which would suggest the new dispatch lane is no longer sharing the expected concurrency setting.

Risks and Mitigations

  • Risk: Adding a second lane for manual dispatch could drift from the configured cron concurrency if its limit is not kept in sync.
    • Mitigation: The new cron-dispatch lane is explicitly initialized and reloaded with cfg.cron?.maxConcurrentRuns ?? 1, and regression coverage exercises the deadlock case directly.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR fixes a silent deadlock in manual cron job execution by routing enqueueRun (the manual/dashboard Run path) through a new dedicated CommandLane.CronDispatch lane instead of the shared CommandLane.Cron lane. The root cause was that isolated cron executors re-enter CommandLane.Cron internally, so dispatching onto that same lane from a manual trigger created a self-deadlock where the dispatch held the only available Cron slot while waiting for the isolated executor that needed that same slot.

Key changes:

  • src/process/lanes.ts: New CronDispatch = "cron-dispatch" lane added to CommandLane.
  • src/cron/service/ops.ts: enqueueRun now dispatches onto CronDispatch; also calls setCommandLaneConcurrency defensively on each invocation (see inline comment).
  • src/gateway/server-lanes.ts and server-reload-handlers.ts: CronDispatch concurrency is initialized and kept in sync with the existing Cron lane cap on startup and live reload.
  • src/cron/service.issue-regressions.test.ts: New regression test directly exercises the previously deadlocking scenario and confirms it completes; existing tests updated to clear the new lane.

Confidence Score: 4/5

  • Safe to merge; the fix is correct and well-tested, with one minor side-effect in dispatch logic worth cleaning up.
  • The deadlock fix is structurally sound — separating the dispatch lane from the execution lane is the right solution. The startup, reload, and test paths are all correctly updated. The only concern is setCommandLaneConcurrency being called inside enqueueRun on every invocation: it's redundant with the gateway lifecycle handlers and could in theory revert a live config-reload update if a concurrent enqueueRun carries a stale state.deps.cronConfig. This is a low-severity style issue, not a correctness blocker for the stated fix.
  • src/cron/service/ops.ts — the setCommandLaneConcurrency call at line 538 inside enqueueRun should be reviewed for potential interference with config reload handlers.

Last reviewed commit: 35669a6

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from d1ba93a to 3da7835 Compare March 10, 2026 00:03
@openclaw-barnacle openclaw-barnacle bot added channel: googlechat Channel integration: googlechat extensions: memory-core Extension: memory-core labels Mar 10, 2026
@benclawbot
Copy link
Copy Markdown

For the duplicate message issue, this is likely related to how messages are broadcast. Check if broadcast and nodeSendToSession are both being called.

@sahancava
Copy link
Copy Markdown
Contributor Author

For the duplicate message issue, this is likely related to how messages are broadcast. Check if broadcast and nodeSendToSession are both being called.

This PR only changes cron manual dispatch laneing and does not touch the chat delivery paths that use broadcast / nodeSendToSession. The duplicate-message hypothesis appears unrelated to #41507.

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from 3da7835 to d4b8a00 Compare March 10, 2026 03:18
@openclaw-barnacle openclaw-barnacle bot removed channel: googlechat Channel integration: googlechat extensions: memory-core Extension: memory-core labels Mar 10, 2026
@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch 2 times, most recently from 03c6f9f to f273b57 Compare March 10, 2026 05:12
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f273b5787e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from f273b57 to 1baf480 Compare March 10, 2026 05:19
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1baf480521

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0bc02aef04

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from d7897e7 to 4edc393 Compare March 10, 2026 06:04
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4edc393b2e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd9ccd1194

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from af572f1 to 2ae6857 Compare March 10, 2026 14:50
@openclaw-barnacle openclaw-barnacle bot added agents Agent runtime and tooling size: XL and removed size: M labels Mar 10, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4025ce618a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3be236b71f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cb68156143

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from 9a78fc5 to d585251 Compare March 13, 2026 18:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d5852516e1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9fd7e04f2b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from 9fd7e04 to 98b301a Compare March 13, 2026 19:00
@openclaw-barnacle openclaw-barnacle bot added the channel: telegram Channel integration: telegram label Mar 13, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6da2467687

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 039dc60f3b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from 039dc60 to fd47ae1 Compare March 13, 2026 22:36
@openclaw-barnacle openclaw-barnacle bot added size: L and removed channel: telegram Channel integration: telegram agents Agent runtime and tooling size: XL labels Mar 13, 2026
@sahancava
Copy link
Copy Markdown
Contributor Author

Scope/status update for this PR (fix/41507-cron-manual-execution):

  • Diff remains cron-scoped only (changedFiles: 8):
    • src/cron/service/ops.ts
    • src/cron/service.issue-regressions.test.ts
    • src/cron/service/state.ts
    • src/cron/service/timer.ts
    • src/gateway/server-cron.ts
    • src/gateway/server-lanes.ts
    • src/gateway/server-reload-handlers.ts
    • src/process/lanes.ts
  • No unrelated infra/path fixes were added in this PR.

Baseline evidence (same failures are already present on main):

So current red checks here are baseline/mainline test instability, not regressions introduced by this cron change set.

Targeted PR-scope validation reruns (local) are green:

  • pnpm exec vitest run --config vitest.config.ts src/cron/service.issue-regressions.test.ts (44 passed)
  • pnpm exec vitest run --config vitest.config.ts src/gateway/server.cron.test.ts (7 passed)
  • pnpm exec oxfmt --check on the 8 touched cron/gateway-lane files (clean)

Plan from here: keep this PR scope-locked and rebase/re-run full CI once main is green again.

@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from fd47ae1 to 0adb365 Compare March 13, 2026 23:39
@sahancava sahancava force-pushed the fix/41507-cron-manual-execution branch from 0adb365 to 139c796 Compare March 13, 2026 23:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime size: L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Cron: manual job triggers via API/dashboard enqueue but never execute

2 participants