fix(cron): add failure limit and exponential backoff for isolated tasks #8578
Baoxd123 wants to merge 3 commits into openclaw:main from
Conversation
- Add MAX_CONSECUTIVE_FAILURES (3) to prevent infinite retry loops
- Track consecutiveFailures in CronJobState
- Auto-disable jobs after 3 consecutive failures
- Implement exponential backoff for isolated tasks (1s, 2s, 4s)
- Add comprehensive test suite (6 test cases)
- Prevents API rate limit cooldown from failed cron tasks

Fixes openclaw#8520
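A minimal sketch of how the threshold and backoff described above could fit together. The constant values come from the description and tests, but the helper names and wiring are illustrative assumptions, not the project's actual timer code.

```ts
// Illustrative sketch only; names and structure are assumptions.
const MAX_CONSECUTIVE_FAILURES = 3;
const BASE_BACKOFF_MS = 1_000;

// 1st failure -> 1000ms, 2nd -> 2000ms, 3rd -> 4000ms
function backoffMsFor(consecutiveFailures: number): number {
  return BASE_BACKOFF_MS * 2 ** (consecutiveFailures - 1);
}

// Once the counter reaches the limit, the job is auto-disabled instead of retried.
function shouldAutoDisable(consecutiveFailures: number): boolean {
  return consecutiveFailures >= MAX_CONSECUTIVE_FAILURES;
}
```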
src/cron/service/timer.ts

```ts
// Track consecutive failures
if (status === "error") {
  job.state.consecutiveFailures = (job.state.consecutiveFailures ?? 0) + 1;
} else if (status === "ok") {
  // Reset failure counter on success
  job.state.consecutiveFailures = 0;
}
```
[P0] `consecutiveFailures` is not reset on `status === "skipped"`, so skips can carry forward prior failures and cause an auto-disable sooner than intended.
Example: error → error → skipped → error will disable the job on that last error (counter goes 2→2→3) even though the skipped run likely indicates "not applicable / no-op" rather than continued failure. If the intent is "only consecutive *errors* count," consider resetting on non-error statuses (e.g., treat `skipped` like `ok` for the counter), or explicitly decide and document otherwise.
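A minimal sketch of the suggested counter fix, assuming the `status` values shown in the snippet above; the helper is hypothetical, not the project's code.

```ts
// Only consecutive errors count; "ok" and "skipped" both clear the streak,
// since a skipped run is a no-op rather than a continued failure.
type RunStatus = "ok" | "skipped" | "error";

function nextConsecutiveFailures(prev: number | undefined, status: RunStatus): number {
  return status === "error" ? (prev ?? 0) + 1 : 0;
}

// e.g. in the finish path:
// job.state.consecutiveFailures =
//   nextConsecutiveFailures(job.state.consecutiveFailures, status);
```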
src/cron/service/timer.ts
Outdated
```ts
} finally {
  job.updatedAtMs = nowMs;
  if (!opts.forced && job.enabled && !deleted) {
    // Keep nextRunAtMs in sync in case the schedule advanced during a long run.
    job.state.nextRunAtMs = computeJobNextRunAtMs(job, state.deps.nowMs());
  }
  // Note: Do not override job.state.nextRunAtMs here, as it has already been
  // set correctly in the finish() callback with proper failure protection logic.
```
[P1] The `finally` block no longer recomputes `nextRunAtMs` using the *current* time (`state.deps.nowMs()`), and now always sets `job.updatedAtMs = nowMs` (the function arg) even if the job ran for a long time.
Previously this acted as a safety net ("schedule advanced during a long run") and kept `updatedAtMs` aligned with completion. With the change, long-running jobs can end with stale `updatedAtMs`, and `nextRunAtMs` is locked to `endedAt` computed inside `finish()`. If your scheduler expects `nextRunAtMs` to be based on the wall clock at completion (not start), consider using `endedAt` for `updatedAtMs` and/or restoring the sync logic (while still preserving failure-protection behavior).
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
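A self-contained sketch of one way to address this; the helper name, parameter shapes, and the "backoff already applied" guard are assumptions, not the project's actual `executeJob()` code.

```ts
// Hypothetical shapes standing in for the real cron job types.
type JobState = { nextRunAtMs?: number; consecutiveFailures?: number };
type Job = { enabled: boolean; updatedAtMs: number; state: JobState };

function finalizeRun(
  job: Job,
  deleted: boolean,
  forced: boolean,
  nowMs: () => number, // e.g. state.deps.nowMs
  computeNextRunAtMs: (job: Job, atMs: number) => number, // e.g. computeJobNextRunAtMs
): void {
  // Stamp completion with the wall clock at the end of the run,
  // not the timestamp captured when the run started.
  const endedAtMs = nowMs();
  job.updatedAtMs = endedAtMs;

  // Keep the original safety net (the schedule may have advanced during a long run),
  // but skip it when the finish() path already applied backoff/disable scheduling.
  const backoffApplied = (job.state.consecutiveFailures ?? 0) > 0;
  if (!forced && job.enabled && !deleted && !backoffApplied) {
    job.state.nextRunAtMs = computeNextRunAtMs(job, endedAtMs);
  }
}
```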
src/cron/service/timer.failure-protection.test.ts

```ts
// First failure: should schedule retry after 1s (1000 * 2^0)
await executeJob(mockState as CronServiceState, testJob, nowMs, { forced: false });
expect(testJob.state.consecutiveFailures).toBe(1);
const firstBackoff = testJob.state.nextRunAtMs ? testJob.state.nextRunAtMs - nowMs : 0;
expect(firstBackoff).toBe(1000); // 1 second

// Second failure: should schedule retry after 2s (1000 * 2^1)
nowMs += 1000;
```
[P3] This test name/comment and expectation conflict: the test is titled `"should not auto-disable main session tasks at same threshold"` but it ultimately asserts `expect(testJob.enabled).toBe(false)` after repeated failures.
Consider renaming to reflect the actual behavior being verified (e.g., that main jobs *do* respect `MAX_CONSECUTIVE_FAILURES`) to avoid confusion when maintaining the suite.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
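For example, a hypothetical rename in the suite's vitest style; the title is illustrative wording only, and the existing test body would stay as-is.

```ts
import { it } from "vitest";

// Hypothetical title that matches what the assertions verify:
// main session tasks also auto-disable at the failure threshold.
it("auto-disables main session tasks after MAX_CONSECUTIVE_FAILURES consecutive errors", async () => {
  // ...existing arrangement: repeated failing runs via executeJob(), then
  // expect(testJob.enabled).toBe(false);
});
```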
- Reset consecutiveFailures on skipped status (skipped = not applicable/no-op)
- Use state.deps.nowMs() for updatedAtMs to reflect actual completion time
- Rename test to match actual behavior (main tasks also auto-disable)
bfc1ccb to f92900f
Fixes #8520
What's Fixed
Prevents infinite retry loops in the cron scheduler for isolated tasks.
Changes
Testing
🤖 AI-Assisted Development
Built with Gemini 3 Pro, Claude 4.5 Haiku & Opus, and OpenClaw. The code is fully tested and ready for production. The author confirms understanding of the implementation.
Greptile Overview
Greptile Summary
This PR adds failure protection to the cron scheduler by introducing `CronJobState.consecutiveFailures`, enforcing a max consecutive error threshold (3) that auto-disables a job, and applying exponential backoff for isolated jobs on error. It also adds a focused vitest suite covering failure counting, disabling, and backoff behavior.

The changes are centered in `src/cron/service/timer.ts`'s `executeJob()` finish path, which now mutates job state based on `status` and adjusts `nextRunAtMs` scheduling decisions accordingly. `src/cron/types.ts` extends the persisted job state to track failures.

Confidence Score: 3/5

`skipped` handling can unintentionally carry failures forward, and removing the `finally`-block resync may affect long-running job scheduling/metadata. Neither is a compile-time issue, but both can cause surprising runtime behavior. (2/5)