bug: every-type cron jobs never fire — recomputeNextRuns race in onTimer #9788

@pycckuu

Summary

every-type cron jobs (e.g. everyMs: 600000) never execute because onTimer calls recomputeNextRuns() before runDueJobs(), pushing nextRunAtMs past now on every tick.

Environment

  • OpenClaw version: 2026.2.3-1
  • OS: macOS (Darwin 24.5.0, arm64)
  • Node.js: v22.22.0
  • Gateway: Running as daemon

Root Cause

In src/cron/service/timer.ts, the onTimer flow is:

async function onTimer(state) {
  if (state.running) return;
  state.running = true;
  try {
    await locked(state, async () => {
      await ensureLoaded(state, { forceReload: true });  // ← reloads + recomputes
      await runDueJobs(state);                           // ← checks due AFTER recompute
      await persist(state);
      armTimer(state);
    });
  } finally {
    state.running = false;
  }
}

ensureLoaded(forceReload: true) calls recomputeNextRuns(), which calls computeNextRunAtMs() for each job.

For every-type jobs, computeNextRunAtMs uses a ceiling division:

const anchor = Math.max(0, Math.floor(schedule.anchorMs ?? nowMs));
const elapsed = nowMs - anchor;
return anchor + Math.max(1, Math.floor((elapsed + everyMs - 1) / everyMs)) * everyMs;
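
To make the effect of the rounding concrete, here is a minimal standalone sketch of the formula above with a few worked values. The `EverySchedule` type and the function wrapper are assumptions for illustration; only the three lines of arithmetic come from the snippet quoted in this issue.

```typescript
// Hypothetical shape for illustration; the real schedule type may differ.
interface EverySchedule {
  everyMs: number;
  anchorMs?: number;
}

// Same arithmetic as the snippet quoted above, wrapped so it can run on its own.
function computeNextRunAtMs(schedule: EverySchedule, nowMs: number): number {
  const everyMs = schedule.everyMs;
  const anchor = Math.max(0, Math.floor(schedule.anchorMs ?? nowMs));
  const elapsed = nowMs - anchor;
  return anchor + Math.max(1, Math.floor((elapsed + everyMs - 1) / everyMs)) * everyMs;
}

const everyMs = 600_000;
const T = 1_200_000; // a slot boundary when anchorMs = 0

// Exactly on the boundary: the current slot is returned.
const exactlyOnBoundary = computeNextRunAtMs({ everyMs, anchorMs: 0 }, T);
// 5 ms late: the result jumps to the next slot.
const slightlyLate = computeNextRunAtMs({ everyMs, anchorMs: 0 }, T + 5);
// No anchor: the anchor is "now", so the result is always now + everyMs.
const noAnchor = computeNextRunAtMs({ everyMs }, T + 5);

console.log({ exactlyOnBoundary, slightlyLate, noAnchor });
```

Only a call made exactly at `T` yields `T`; any positive delay moves the result a full interval ahead.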

The race:

  1. Timer fires at T + ε (setTimeout always fires a few ms late)
  2. recomputeNextRuns runs with now = T + ε
  3. Without anchorMs: anchor = now, so result is now + everyMs → always in the future
  4. With anchorMs: since now = T + ε > T (past the slot boundary), ceiling division returns T + everyMs → also in the future
  5. runDueJobs checks now >= nextRunAtMs → T + ε >= T + everyMs → false → not due
  6. Timer re-armed for T + everyMs. Same thing repeats forever.

The job is perpetually deferred because recomputeNextRuns always returns a strictly future time.
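
The loop described above can be reproduced in a small simulation. This is a simplified model, not the actual timer code: `SimJob`, `computeNext`, and `simulate` are hypothetical names, the job is assumed to have `anchorMs = 0`, and post-run rescheduling is reduced to "advance by one interval".

```typescript
// Simplified model of one every-type job; field names mirror the issue.
interface SimJob {
  everyMs: number;
  anchorMs: number;
  nextRunAtMs: number;
}

// The ceiling-style formula quoted in this issue.
function computeNext(job: SimJob, nowMs: number): number {
  const elapsed = nowMs - job.anchorMs;
  const steps = Math.max(1, Math.floor((elapsed + job.everyMs - 1) / job.everyMs));
  return job.anchorMs + steps * job.everyMs;
}

// Runs `ticks` timer callbacks, each firing `lateMs` after its target time,
// and returns how many times the job executed.
function simulate(recomputeBeforeDueCheck: boolean, ticks: number, lateMs: number): number {
  const job: SimJob = { everyMs: 600_000, anchorMs: 0, nextRunAtMs: 600_000 };
  let runs = 0;
  for (let i = 0; i < ticks; i++) {
    const nowMs = job.nextRunAtMs + lateMs; // setTimeout fires slightly late
    if (recomputeBeforeDueCheck) {
      job.nextRunAtMs = computeNext(job, nowMs); // what the forced reload does
    }
    if (nowMs >= job.nextRunAtMs) {
      runs += 1;
      job.nextRunAtMs += job.everyMs; // simplification of post-run rescheduling
    }
  }
  return runs;
}

const runsWithRecomputeFirst = simulate(true, 6, 5);
const runsWithDueCheckFirst = simulate(false, 6, 5);
console.log({ runsWithRecomputeFirst, runsWithDueCheckFirst });
```

In this model, recomputing before the due check yields zero runs for any positive lateness, while checking first runs the job on every tick.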

Steps to Reproduce

  1. Create an every-type cron job:
{
  "name": "Test every 5min",
  "schedule": { "kind": "every", "everyMs": 300000 },
  "sessionTarget": "isolated",
  "enabled": true,
  "payload": { "kind": "agentTurn", "message": "Say hello" }
}
  2. Wait for the timer to fire
  3. Check openclaw cron runs --id <jobId> — zero runs
  4. openclaw cron run <jobId> --force works fine (bypasses the timer path)

Observed Behavior

  • All 5 every-type jobs stopped running after gateway restart
  • Zero runs recorded for 6+ hours despite correct nextRunAtMs values
  • nextRunAtMs silently advances by everyMs each tick without executing
  • cron.run --force works, confirming that job execution itself is functional and the fault lies in the timer path
  • cron-expression jobs may also be affected (none fired either after restart)

Evidence

  • Email Checker (every: 15min): ran fine 08:06–10:41 UTC, then zero runs
  • Docker Build Watcher (every: 10min): ran fine until 10:47 UTC, then zero runs
  • Gateway restarted at 12:57 UTC — no jobs ran after restart
  • Watched Docker Build Watcher: nextRunAtMs moved from 1770312000000 to 1770312600000 (+10min) without executing
  • Setting anchorMs did not help (ceiling division still overshoots by ε ms)
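
As a sanity check on the observed values, the arithmetic below confirms that the jump is exactly one 10-minute interval and that the earlier value falls on a 10-minute boundary. The variable names are illustrative; the two timestamps are the ones quoted in the list above.

```typescript
const observedBefore = 1770312000000;
const observedAfter = 1770312600000;
const tenMinutesMs = 10 * 60 * 1000;

// Difference between the two observed nextRunAtMs values.
const advancedByMs = observedAfter - observedBefore; // 600000
// True when the earlier value is a whole multiple of the interval since epoch.
const startsOnBoundary = observedBefore % tenMinutesMs === 0;

console.log({
  advancedByMs,
  startsOnBoundary,
  beforeIso: new Date(observedBefore).toISOString(), // 2026-02-05T17:20:00.000Z
  afterIso: new Date(observedAfter).toISOString(), // 2026-02-05T17:30:00.000Z
});
```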

Suggested Fix

Option A: Snapshot nextRunAtMs before reload

async function onTimer(state) {
  // ...
  // Assumption: wall-clock time; use the service's own clock if it has one.
  const now = Date.now();

  // Remember each job's scheduled time before the reload recomputes it.
  const dueSnapshot = (state.store?.jobs ?? [])
    .filter(j => j.enabled && typeof j.state.nextRunAtMs === "number")
    .map(j => ({ id: j.id, dueAt: j.state.nextRunAtMs }));

  await ensureLoaded(state, { forceReload: true });

  // Restore pre-reload nextRunAtMs for due check
  for (const snap of dueSnapshot) {
    const job = state.store?.jobs.find(j => j.id === snap.id);
    if (job && snap.dueAt <= now) job.state.nextRunAtMs = snap.dueAt;
  }

  await runDueJobs(state);
  // ...
}

Option B: Skip recomputeNextRuns in timer path

await ensureLoaded(state, { forceReload: true, skipRecompute: true });
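
One way such a flag might behave is sketched below. This is not the real ensureLoaded: `ensureLoadedSketch`, `SketchJob`, `LoadOptions`, and `readJobs` are hypothetical, and the choice to still schedule jobs that have no nextRunAtMs at all is an assumption added so that newly created jobs are not left unscheduled.

```typescript
interface SketchJob {
  id: string;
  everyMs: number;
  state: { nextRunAtMs?: number };
}

interface LoadOptions {
  forceReload?: boolean; // accepted for parity with the call above; this sketch always re-reads
  skipRecompute?: boolean; // the flag proposed in Option B
}

// Hypothetical stand-in for ensureLoaded; `readJobs` plays the role of the store read.
function ensureLoadedSketch(
  readJobs: () => SketchJob[],
  nowMs: number,
  opts: LoadOptions = {},
): SketchJob[] {
  const jobs = readJobs();
  for (const job of jobs) {
    const unscheduled = typeof job.state.nextRunAtMs !== "number";
    // Assumption: jobs without a nextRunAtMs are still scheduled even when skipping.
    if (unscheduled || !opts.skipRecompute) {
      job.state.nextRunAtMs = nowMs + job.everyMs; // no-anchor case of the formula
    }
  }
  return jobs;
}

const readStore = (): SketchJob[] => [
  { id: "due", everyMs: 600_000, state: { nextRunAtMs: 1_200_000 } },
  { id: "new", everyMs: 600_000, state: {} },
];

const nowAtTick = 1_200_005; // timer fired 5 ms late
const timerPath = ensureLoadedSketch(readStore, nowAtTick, { forceReload: true, skipRecompute: true });
const defaultPath = ensureLoadedSketch(readStore, nowAtTick, { forceReload: true });
console.log(timerPath, defaultPath);
```

With the flag set, the "due" job keeps its stored time and is therefore due at `nowAtTick`; without it, the same job is pushed a full interval ahead.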

Option C: Change ceiling to floor in computeNextRunAtMs for the current-slot case:

// slotStart is floored, so slotStart <= nowMs: this returns the current slot only when nowMs is exactly on the boundary, next slot otherwise
const slot = Math.floor(elapsed / everyMs);
const slotStart = anchor + slot * everyMs;
return slotStart >= nowMs ? slotStart : slotStart + everyMs;
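
Before adopting Option C it may be worth comparing it against the existing formula for a late tick. The sketch below wraps both snippets from this issue in standalone functions (the wrapper names and the sample values are assumptions) and evaluates them at now = T + 5 ms with anchor 0.

```typescript
// Existing ceiling-style formula, as quoted under Root Cause.
function ceilingNext(anchor: number, everyMs: number, nowMs: number): number {
  const elapsed = nowMs - anchor;
  return anchor + Math.max(1, Math.floor((elapsed + everyMs - 1) / everyMs)) * everyMs;
}

// Option C exactly as written above.
function optionCNext(anchor: number, everyMs: number, nowMs: number): number {
  const elapsed = nowMs - anchor;
  const slot = Math.floor(elapsed / everyMs);
  const slotStart = anchor + slot * everyMs;
  return slotStart >= nowMs ? slotStart : slotStart + everyMs;
}

const intervalMs = 600_000;
const boundaryMs = 1_200_000; // T
const lateNowMs = boundaryMs + 5; // T + ε

const ceilingResult = ceilingNext(0, intervalMs, lateNowMs);
const optionCResult = optionCNext(0, intervalMs, lateNowMs);
console.log({ ceilingResult, optionCResult, stillFuture: optionCResult > lateNowMs });
```

Under these assumptions both functions return T + everyMs for a late tick, which suggests Option C as written would need an additional tolerance or a last-run check to treat the slot that just passed as due.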
