fix(cron): prevent spin loop when job completes within scheduled second#18073

Merged
steipete merged 1 commit into openclaw:main from widingmarcus-cyber:fix/cron-spin-loop-17821
Feb 16, 2026

Conversation

Contributor

@widingmarcus-cyber widingmarcus-cyber commented Feb 16, 2026

Problem

When a cron job (e.g. 0 13 * * *) fires and completes within the same wall-clock second it was scheduled for, computeNextRunAtMs can return undefined for that second. This causes the scheduler to immediately recompute, find the job still "due", and re-trigger it — creating a spin loop of 100+ phantom executions per second.

Reported in #17821 with clear evidence: jobs firing with durationMs: 0 and nextRunAtMs stuck at the same timestamp.

Root Cause

computeNextRunAtMs floors nowMs to the current second boundary before asking croner for the next run. When the job completes within the same second it was scheduled for (common for fast isolated jobs), the floored time can still match the schedule, and depending on croner version/timezone, nextRun() may return the same second — which fails the nextMs > nowSecondMs check and returns undefined.

An undefined nextRunAtMs causes recomputeNextRuns to retry with the current nowMs, which is still within the same second. The retry produces the same result, so the scheduler enters a tight loop.
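The failure mode described above can be reproduced with a small model. This is a sketch, not the code from jobs.ts: `nextDailyAt13Inclusive` is a hypothetical stand-in for the croner lookup (assumed here to return the reference second itself when it matches the schedule), and `computeNextRunAtMsModel` only mirrors the floor-then-compare logic as the description states it.

```typescript
const DAY_MS = 86_400_000;
const THIRTEEN_MS = 13 * 3_600_000;

// Hypothetical schedule lookup (NOT croner): daily at 13:00:00 UTC.
// Assumption: it is inclusive of the reference time, which models the
// version/timezone-dependent behaviour mentioned in the description.
function nextDailyAt13Inclusive(fromMs: number): number {
  const dayStartMs = Math.floor(fromMs / DAY_MS) * DAY_MS;
  const todayMs = dayStartMs + THIRTEEN_MS;
  return todayMs >= fromMs ? todayMs : todayMs + DAY_MS;
}

// Simplified model of the pre-fix computation: floor nowMs to the second,
// ask for the next run, and reject anything not strictly after the floor.
function computeNextRunAtMsModel(nowMs: number): number | undefined {
  const nowSecondMs = Math.floor(nowMs / 1000) * 1000;
  const nextMs = nextDailyAt13Inclusive(nowSecondMs);
  return nextMs > nowSecondMs ? nextMs : undefined;
}

const scheduledMs = Date.UTC(2026, 1, 16, 13, 0, 0);

// Job finished 7 ms into the scheduled second: the floor equals the
// scheduled time, the lookup returns that same second, the check fails.
const stuck = computeNextRunAtMsModel(scheduledMs + 7);

// One second later the floor no longer matches and the next day is found.
const recovered = computeNextRunAtMsModel(scheduledMs + 1000);
```

In this model `stuck` is undefined for every nowMs inside the scheduled second, which is why retrying with the current time cannot make progress until the second ends.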

Fix (two layers)

1. computeJobNextRunAtMs fallback (jobs.ts)

When computeNextRunAtMs returns undefined for a cron-kind schedule, retry with the ceiling (next second) as reference time. This guarantees we always land on the next valid occurrence rather than returning undefined.
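A minimal sketch of this fallback, under stated assumptions: the function names ending in `Sketch` and the `nextOccurrence` lookup are hypothetical and do not come from the PR's diff, and the ceiling is computed here as "floored second plus 1000 ms" so that the reference always moves forward, which may differ from how jobs.ts derives it.

```typescript
const ONE_DAY_MS = 86_400_000;

// Hypothetical schedule lookup (NOT croner): daily at 13:00:00 UTC,
// inclusive of the reference time so the edge case is reproducible.
function nextOccurrence(fromMs: number): number {
  const todayMs = Math.floor(fromMs / ONE_DAY_MS) * ONE_DAY_MS + 13 * 3_600_000;
  return todayMs >= fromMs ? todayMs : todayMs + ONE_DAY_MS;
}

// Simplified model of computeNextRunAtMs (floor, look up, strict compare).
function computeNextRunAtMsSketch(nowMs: number): number | undefined {
  const nowSecondMs = Math.floor(nowMs / 1000) * 1000;
  const nextMs = nextOccurrence(nowSecondMs);
  return nextMs > nowSecondMs ? nextMs : undefined;
}

// Sketch of the layer-1 fallback: if the first attempt yields undefined,
// retry once with the start of the next second as the reference time.
function computeJobNextRunAtMsSketch(nowMs: number): number | undefined {
  const first = computeNextRunAtMsSketch(nowMs);
  if (first !== undefined) return first;
  const nextSecondMs = Math.floor(nowMs / 1000) * 1000 + 1000;
  return computeNextRunAtMsSketch(nextSecondMs);
}

const firedAtMs = Date.UTC(2026, 1, 16, 13, 0, 0);
const nextAfterFastRun = computeJobNextRunAtMsSketch(firedAtMs + 7);
```

With the retry in place, a run that ends 7 ms into the scheduled second resolves to the following day's 13:00:00 instead of undefined.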

2. MIN_REFIRE_GAP_MS safety net (timer.ts)

After a successful cron job run, ensure nextRunAtMs is at least 2 seconds in the future. This is a belt-and-suspenders guard that breaks any remaining spin-loop edge case. The 2s gap has no effect on normal schedules, where the natural next run is hours or days away, and it applies only to cron-kind schedules (not every or at).
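The guard can be sketched as a clamp with Math.max. Only the constant name MIN_REFIRE_GAP_MS and its 2-second value come from the PR; the helper name `applyMinRefireGap`, the `ScheduleKind` type, the `endedAtMs` parameter, and the choice to pass an undefined next run through untouched are assumptions made for illustration.

```typescript
const MIN_REFIRE_GAP_MS = 2_000;

// Schedule kinds named in the PR description.
type ScheduleKind = "cron" | "every" | "at";

// Hypothetical helper: clamp the natural next run so it is never closer
// than MIN_REFIRE_GAP_MS to the moment the successful run ended.
// Math.max can only push the next run later, never earlier.
function applyMinRefireGap(
  kind: ScheduleKind,
  naturalNextMs: number | undefined,
  endedAtMs: number,
): number | undefined {
  // Assumption: non-cron kinds and undefined values pass through unchanged.
  if (kind !== "cron" || naturalNextMs === undefined) return naturalNextMs;
  return Math.max(naturalNextMs, endedAtMs + MIN_REFIRE_GAP_MS);
}

const endedAtMs = Date.UTC(2026, 1, 16, 13, 0, 0) + 7;

// Degenerate case: the natural next run is in the past, so it is pushed out.
const clamped = applyMinRefireGap("cron", endedAtMs - 7, endedAtMs);

// Daily schedule: the natural next run is far away and is left as-is.
const untouched = applyMinRefireGap("cron", endedAtMs + 86_400_000, endedAtMs);
```

Because the clamp only takes effect when the natural next run falls inside the 2-second window, a schedule that fires every second would be slowed by it, while hourly or daily schedules would not notice it.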

Tests

  • New regression test: simulates a 0 13 * * * job completing in 7ms (within the scheduled second), verifies it fires exactly once and nextRunAtMs advances to the next day
  • All 111 existing cron tests pass
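The regression scenario can be approximated with a fake clock. This is not the test added in the PR: the scheduler loop, `simulateScheduledSecond`, and the inclusive lookup `nextRunInclusive` are hypothetical, and the sketch only illustrates the two properties the test is described as checking (one execution, next run on the following day).

```typescript
const DAY_IN_MS = 86_400_000;

// Hypothetical schedule lookup (NOT croner): daily at 13:00:00 UTC,
// inclusive of the reference time.
function nextRunInclusive(fromMs: number): number {
  const todayMs = Math.floor(fromMs / DAY_IN_MS) * DAY_IN_MS + 13 * 3_600_000;
  return todayMs >= fromMs ? todayMs : todayMs + DAY_IN_MS;
}

// Next-run computation with the next-second retry folded in.
function nextRunWithFallback(nowMs: number): number | undefined {
  const nowSecondMs = Math.floor(nowMs / 1000) * 1000;
  for (const refMs of [nowSecondMs, nowSecondMs + 1000]) {
    const nextMs = nextRunInclusive(refMs);
    if (nextMs > refMs) return nextMs;
  }
  return undefined;
}

// Walk a fake clock through the scheduled second in 1 ms steps, running
// the job whenever it is due, and count how often it fires.
function simulateScheduledSecond(startMs: number, jobDurationMs: number) {
  let clockMs = startMs;
  let nextRunAtMs: number | undefined = startMs;
  let runs = 0;
  while (clockMs < startMs + 1000) {
    if (nextRunAtMs !== undefined && nextRunAtMs <= clockMs) {
      runs += 1;
      clockMs += jobDurationMs; // the job itself takes jobDurationMs
      nextRunAtMs = nextRunWithFallback(clockMs);
    } else {
      clockMs += 1;
    }
  }
  return { runs, nextRunAtMs };
}

const fixedStartMs = Date.UTC(2026, 1, 16, 13, 0, 0);
const outcome = simulateScheduledSecond(fixedStartMs, 7);
```

Here `outcome.runs` counts executions during the scheduled second and `outcome.nextRunAtMs` holds the recomputed next run, which are the two values a regression assertion would inspect.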

Fixes #17821

Greptile Summary

Fixes a spin-loop bug (#17821) where a cron job completing within the same wall-clock second it was scheduled for would cause computeNextRunAtMs to return undefined, leading to immediate recomputation and 100+ phantom re-executions per second.

  • Layer 1 — computeJobNextRunAtMs fallback (jobs.ts): When computeNextRunAtMs returns undefined for a cron-kind schedule, retries with the next-second ceiling as reference time, ensuring the next valid occurrence is always found.
  • Layer 2 — MIN_REFIRE_GAP_MS safety net (timer.ts): After a successful cron job run, ensures nextRunAtMs is at least 2 seconds in the future. This belt-and-suspenders guard only applies to cron-kind schedules and never affects normal schedules (where the natural next run is hours/days away).
  • Regression test: Simulates a daily 0 13 * * * job completing in 7ms within the scheduled second, verifying it fires exactly once and nextRunAtMs correctly advances to the next day.

Both fixes are properly scoped to cron-kind schedules only, leaving every and at schedule types unaffected. The defense-in-depth approach is sound — the computeJobNextRunAtMs fallback addresses the root cause, while the MIN_REFIRE_GAP_MS guard protects against any remaining edge cases from croner/timezone interactions.

Confidence Score: 5/5

  • This PR is safe to merge — it adds targeted, well-scoped defensive guards with no risk to existing behavior.
  • The changes are minimal, well-understood, and narrowly scoped to cron-kind schedules. Both layers of defense are logically sound: the computeJobNextRunAtMs fallback directly addresses the root cause (floored nowMs matching the schedule), and the MIN_REFIRE_GAP_MS guard provides a safety net without affecting normal schedules. The regression test directly validates the exact scenario from the bug report. All guards use Math.max with the natural next run, so they can only delay — never skip — legitimate schedule times.
  • No files require special attention.

Last reviewed commit: 93f0767

…nd (openclaw#17821)

When a cron job fires and completes within the same wall-clock second it
was scheduled for, the next-run computation could return undefined or the
same second, causing the scheduler to re-trigger the job hundreds of
times in a tight loop.

Two-layer fix:

1. computeJobNextRunAtMs: When computeNextRunAtMs returns undefined for a
   cron-kind schedule (edge case where floored nowSecondMs matches the
   schedule), retry with the ceiling (next second) as reference time.
   This ensures we always get the next valid occurrence.

2. applyJobResult: Add MIN_REFIRE_GAP_MS (2s) safety net for cron-kind
   jobs.  After a successful run, nextRunAtMs is guaranteed to be at
   least 2s in the future.  This breaks any remaining spin-loop edge
   cases without affecting normal daily/hourly schedules (where the
   natural next run is hours/days away).

Fixes openclaw#17821
steipete merged commit 8af4712 into openclaw:main on Feb 16, 2026
25 checks passed
Development

Successfully merging this pull request may close these issues.

[Bug]: Cron Job Spin Loop Bug Report