Cron scheduler stuck with stale nextWakeAtMs after gateway restarts — misses all scheduled slots until external API call #10564

@zhiyuanw101

Bug Description

After multiple gateway restarts (triggered by config hot-reload via SIGUSR1), the cron scheduler sets a nextWakeAtMs timestamp that is already in the past. It then never self-corrects — all subsequent scheduled slots are silently missed until an external cron API call (e.g. cron list) forces a recalculation.

Environment

  • OS: macOS 14 (arm64)
  • Node: v25.5.0
  • OpenClaw: latest (installed via npm)
  • Cron job: 0 7-19 * * 1-5 (hourly 7am–7pm PST, weekdays)
  • Session target: isolated agentTurn

Steps to Reproduce

  1. Have a cron job with schedule 0 7-19 * * 1-5 (tz: America/Los_Angeles)
  2. Edit workspace files multiple times between 12:00–1:00 AM PST, triggering gateway config hot-reload (SIGUSR1 restarts)
  3. Gateway restarts ~8 times between 12:14–12:48 AM
  4. After the final restart, the cron module logs nextWakeAtMs: 1770368400000 → 1:00 AM PST (already in the past, and outside the 7-19 schedule window)
  5. Gateway stabilizes and stays running from 12:48 AM onward
  6. Mac stays awake (process keeping it alive)
  7. 7:00 AM, 8:00 AM, 9:00 AM slots are all silently missed — zero cron activity in logs
  8. Only TypeError: fetch failed (non-fatal, unrelated) appears during the 7–9 AM window
  9. At 9:12 AM, a manual cron list API call is made
  10. Scheduler recalculates nextWakeAtMs to 1770400800000 → 10:00 AM PST (correct next slot)
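
The two nextWakeAtMs values from the steps above can be checked against the schedule with a few lines of TypeScript. This is a minimal sketch that uses only the standard Intl API. `localParts` and `matchesSchedule` are hypothetical helper names, not OpenClaw functions, and the window check is hard-coded for `0 7-19 * * 1-5` rather than parsing a cron expression.

```typescript
// Hypothetical helpers for checking a timestamp against `0 7-19 * * 1-5`.
// Nothing here comes from the OpenClaw codebase.

interface LocalParts {
  weekday: string; // "Mon" .. "Sun"
  hour: number;    // 0..23
  minute: number;  // 0..59
}

// Break an epoch-ms value into weekday/hour/minute in the given IANA time zone.
function localParts(ms: number, timeZone: string): LocalParts {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    weekday: "short",
    hour: "2-digit",
    minute: "2-digit",
  }).formatToParts(new Date(ms));
  const get = (type: string): string => {
    const part = parts.find((p) => p.type === type);
    return part ? part.value : "";
  };
  return {
    weekday: get("weekday"),
    hour: Number(get("hour")) % 24, // some engines print midnight as "24"
    minute: Number(get("minute")),
  };
}

// True when the timestamp is a slot of `0 7-19 * * 1-5` in that time zone.
function matchesSchedule(ms: number, timeZone: string): boolean {
  const { weekday, hour, minute } = localParts(ms, timeZone);
  const isWeekday = ["Mon", "Tue", "Wed", "Thu", "Fri"].indexOf(weekday) >= 0;
  return isWeekday && hour >= 7 && hour <= 19 && minute === 0;
}

const tz = "America/Los_Angeles";
// 2026-02-06T09:00:00.000Z false  (1:00 AM local, outside the window)
console.log(new Date(1770368400000).toISOString(), matchesSchedule(1770368400000, tz));
// 2026-02-06T18:00:00.000Z true   (10:00 AM local, a valid slot)
console.log(new Date(1770400800000).toISOString(), matchesSchedule(1770400800000, tz));
```

The first value is not a slot of this job at all, which matches the observation in step 4 that it falls outside the 7-19 window.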

Expected Behavior

The cron scheduler should:

  • Detect stale nextWakeAtMs on startup/restart and recalculate to the next valid future slot
  • Have a periodic self-check timer that catches missed slots
  • Not require an external API call to "unstick" itself
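
One possible shape for the first two points is sketched below, in TypeScript. It is not based on the OpenClaw source: `WakeState`, `ensureFreshWake`, `startWatchdog`, and `nextTopOfHour` are hypothetical names, and `nextTopOfHour` is only a stand-in for the real cron computation (it ignores the 7-19 window and the weekday rule). The sketch only recalculates the wake time; whether missed slots should also be run retroactively is a separate decision.

```typescript
// Hypothetical sketch of stale-wake detection; not OpenClaw code.

interface WakeState {
  nextWakeAtMs: number;
}

// Returns the next scheduled slot strictly after `afterMs`.
type NextSlotFn = (afterMs: number) => number;

// Stand-in for a real cron calculation: the next top of the hour in UTC.
const HOUR_MS = 3600000;
const nextTopOfHour: NextSlotFn = (afterMs) =>
  Math.floor(afterMs / HOUR_MS) * HOUR_MS + HOUR_MS;

// Recalculate nextWakeAtMs when it is older than `graceMs` before now.
// The grace period avoids racing a timer that is about to fire normally.
// Returns true when a stale value was replaced.
function ensureFreshWake(
  state: WakeState,
  nowMs: number,
  nextSlot: NextSlotFn,
  graceMs: number = 5000,
): boolean {
  if (state.nextWakeAtMs >= nowMs - graceMs) {
    return false;
  }
  state.nextWakeAtMs = nextSlot(nowMs);
  return true;
}

// Periodic self-check. Call ensureFreshWake once on startup as well.
// Returns a function that stops the watchdog.
function startWatchdog(
  state: WakeState,
  nextSlot: NextSlotFn,
  intervalMs: number = 60000,
  now: () => number = Date.now,
): () => void {
  const timer = setInterval(() => {
    if (ensureFreshWake(state, now(), nextSlot)) {
      console.warn("cron: stale nextWakeAtMs replaced with", state.nextWakeAtMs);
    }
  }, intervalMs);
  return () => clearInterval(timer);
}

// Replaying the reported values: stale 1:00 AM wake time, checked at 9:12 AM PST.
const reported: WakeState = { nextWakeAtMs: 1770368400000 };
const replaced = ensureFreshWake(reported, 1770397920000, nextTopOfHour);
console.log(replaced, reported.nextWakeAtMs); // true 1770400800000
```

With a check like this running on startup and on a timer, the recalculation that currently only happens on an external call would happen without one, which covers the third point.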

Relevant Log Entries

# Multiple restarts between 08:14–08:48 UTC (12:14–12:48 AM PST)
cron: started  {enabled:true, jobs:2, nextWakeAtMs:1770368400000}  # 1:00 AM PST — already past!
cron: started  {enabled:true, jobs:2, nextWakeAtMs:1770368400000}  # same stale value
# ... repeated 8 times

# 15:00–17:00 UTC (7–9 AM PST): ZERO cron activity
# Only non-fatal fetch errors appear

# 17:12 UTC (9:12 AM PST): External cron API call triggers recalculation
# nextWakeAtMs jumps to 1770400800000 (10:00 AM PST) ✅

Impact

Cron jobs silently fail to fire for hours after gateway restarts. Users have no indication anything is wrong until they manually check.
