Bug Description
After a Gateway SIGUSR1 restart (triggered by a config change), the cron scheduler stops dispatching jobs automatically. The scheduler reports itself as "started" and advances nextWakeAtMs correctly, but never actually spawns isolated sessions to execute due jobs.
Environment
- OpenClaw version: 2026.2.3-1
- OS: WSL2 Ubuntu 24.04 on Windows
- Node.js: v22.22.0
- Runtime: systemd user service (openclaw-gateway.service)
Steps to Reproduce
- Have multiple recurring cron jobs configured (mix of every and cron schedules, all sessionTarget: "isolated")
- Trigger a Gateway SIGUSR1 restart (e.g., via gateway(action="restart") tool call or config change)
- Observe that cron jobs stop firing after the restart
Expected Behavior
After a SIGUSR1 restart, the cron scheduler should re-arm all timers and dispatch jobs when their nextRunAtMs is reached.
Actual Behavior
- cron(action="status") reports { enabled: true, jobs: 11, nextWakeAtMs: <future timestamp> }, which looks healthy
- nextWakeAtMs advances past job fire times, so the scheduler tick is running
- But zero isolated sessions are spawned: no cron:<jobId> sessions appear in sessions_list
- cron(action="runs") shows no new entries after the restart
- No dispatch, spawn, agentTurn, or isolated log entries appear in Gateway logs
- Gateway logs only show cron: started at boot, with no subsequent dispatch activity
Attempted Fixes (all failed to restore auto-dispatch)
- SIGUSR1 restart via gateway(action="restart"): scheduler re-initializes but still doesn't dispatch
- Full process restart via systemctl --user restart openclaw-gateway.service: new PID, scheduler says "started", still doesn't dispatch
- cron(action="wake", mode="now"): returns { ok: true } but no jobs fire
- Manually resetting nextRunAtMs in jobs.json to past timestamps: scheduler recalculates forward from interval on restart, still doesn't dispatch
- Clearing lastRunAtMs and lastStatus from all job state: same result
Workaround That Works
openclaw cron run <jobId> --force successfully executes jobs. The execution pipeline itself is fine — only the automatic timer-based dispatch is broken.
Current workaround: system crontab entries that call openclaw cron run <jobId> --force on each job's interval.
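For anyone who needs the same stopgap, here is a minimal sketch that turns "every" intervals into crontab lines. The job IDs and the IntervalJob shape are hypothetical placeholders, not taken from jobs.json, and the sketch assumes openclaw is reachable on cron's PATH (a full path may be needed otherwise). It only handles intervals that divide an hour or a day evenly, which covers 5m, 15m, 60m, 180m, and 360m.

```typescript
// Hypothetical shape for an interval job; jobId values are placeholders.
interface IntervalJob {
  jobId: string;
  everyMinutes: number;
}

// Convert an interval in minutes to a five-field crontab schedule.
// Supports sub-hour intervals that divide 60 evenly and whole-hour
// intervals that divide 24 evenly; anything else is rejected.
function toCronSchedule(everyMinutes: number): string {
  if (!Number.isInteger(everyMinutes) || everyMinutes <= 0) {
    throw new Error(`invalid interval: ${everyMinutes}`);
  }
  if (everyMinutes < 60 && 60 % everyMinutes === 0) {
    return `*/${everyMinutes} * * * *`;
  }
  if (everyMinutes % 60 === 0) {
    const hours = everyMinutes / 60;
    if (hours === 1) return "0 * * * *";
    if (hours < 24 && 24 % hours === 0) return `0 */${hours} * * *`;
  }
  throw new Error(`unsupported interval: ${everyMinutes} minutes`);
}

// Build one crontab line that forces a run of the given job.
// Assumption: `openclaw` resolves on cron's PATH.
function toCrontabLine(job: IntervalJob): string {
  return `${toCronSchedule(job.everyMinutes)} openclaw cron run ${job.jobId} --force`;
}

// Example with placeholder job IDs.
const exampleJobs: IntervalJob[] = [
  { jobId: "job-a", everyMinutes: 5 },
  { jobId: "job-b", everyMinutes: 180 },
];
const crontabLines: string[] = exampleJobs.map(toCrontabLine);
```

The resulting lines can be pasted into crontab -e. Jobs with schedule.kind: "cron" already carry a cron expression and would not need the conversion step.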
Timeline
- Feb 5, 2:58 PM CT: Last successful automatic cron dispatch (Kalshi scanner job)
- Feb 5, 4:43 PM CT: Gateway SIGUSR1 restart triggered by config change (block streaming + TTS settings)
- Feb 5, 4:43 PM CT → Feb 7, 2:30 AM CT: ~35 hours of zero automatic dispatches across all 11 jobs
- Feb 6, 11:47 PM CT: Another session detected the stall and attempted SIGUSR1 restart — didn't fix it
- Feb 7, 2:04 AM CT: Full systemctl restart — didn't fix auto-dispatch
- Feb 7, 2:30 AM CT: openclaw cron run --force confirmed jobs can execute manually
Key Observation
The wakeMode on all jobs is "next-heartbeat". The heartbeat subsystem IS running (5 min interval, confirmed in logs). But the cron dispatcher doesn't appear to check for due jobs during heartbeat ticks — or the check silently fails/skips.
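To make the expectation concrete, this is roughly the check I would expect to happen on each heartbeat tick. It is a sketch under assumptions, not the actual OpenClaw code: the JobState shape, findDueJobs, and onHeartbeat are hypothetical names, and only the enabled and nextRunAtMs fields come from the job store.

```typescript
// Hypothetical minimal view of a job's state; `id` is an assumed field name.
interface JobState {
  id: string;
  enabled: boolean;
  nextRunAtMs: number;
}

// Return the enabled jobs whose nextRunAtMs is at or before nowMs.
function findDueJobs(jobs: JobState[], nowMs: number): JobState[] {
  return jobs.filter((job) => job.enabled && job.nextRunAtMs <= nowMs);
}

// Hypothetical heartbeat handler: dispatch every due job and
// return the IDs that were handed to the dispatcher.
function onHeartbeat(
  jobs: JobState[],
  nowMs: number,
  dispatch: (job: JobState) => void,
): string[] {
  const due = findDueJobs(jobs, nowMs);
  for (const job of due) {
    dispatch(job);
  }
  return due.map((job) => job.id);
}
```

If the real handler returns early, or is never invoked for "next-heartbeat" jobs, the symptom would match what I see: nextWakeAtMs keeps moving while nothing is dispatched.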
Relevant Log Entries
# Cron says "started" after restart but never dispatches:
{"module":"cron"} {"enabled":true,"jobs":11,"nextWakeAtMs":1770452046127} "cron: started"
# Heartbeat running fine:
{"subsystem":"gateway/heartbeat"} {"intervalMs":300000} "heartbeat: started"
# Zero entries matching: dispatch, spawn, agentTurn, isolated, fire, tick
grep -c "dispatch\|spawn\|agentTurn\|isolated" /tmp/openclaw/openclaw-2026-02-07.log
# Result: 0
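The grep above only covers four of the six terms listed in the comment. A small sketch that checks all six against a set of log lines follows; DISPATCH_KEYWORDS and countMatchingLines are names made up for this example, and reading the log file is left out so the snippet stays self-contained.

```typescript
// The six terms from the comment above; a line counts if it contains any of them.
const DISPATCH_KEYWORDS: string[] = [
  "dispatch",
  "spawn",
  "agentTurn",
  "isolated",
  "fire",
  "tick",
];

// Count the log lines that contain at least one keyword (case-sensitive substring match).
function countMatchingLines(
  lines: string[],
  keywords: string[] = DISPATCH_KEYWORDS,
): number {
  return lines.filter((line) => keywords.some((k) => line.includes(k))).length;
}

// The two log lines quoted in this report.
const sampleLines: string[] = [
  '{"module":"cron"} {"enabled":true,"jobs":11,"nextWakeAtMs":1770452046127} "cron: started"',
  '{"subsystem":"gateway/heartbeat"} {"intervalMs":300000} "heartbeat: started"',
];
const sampleCount: number = countMatchingLines(sampleLines);
```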
Job Store Path
~/.openclaw/cron/jobs.json — 11 jobs, all enabled: true, mix of schedule.kind: "every" (5m, 15m, 60m, 180m, 360m) and schedule.kind: "cron".
Suggested Investigation
The scheduler tick advances nextWakeAtMs but the dispatch codepath that should compare nextRunAtMs against Date.now() and spawn isolated sessions appears to be silently broken after SIGUSR1. Possibly a stale reference to the old scheduler instance, or the dispatch callback isn't re-registered on the new timer.
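To illustrate the second hypothesis (dispatch callback not re-registered), here is a toy model. It is not based on the OpenClaw source: SketchScheduler, registerDispatch, tick, and both restart helpers are invented for this sketch. The point is only that a tick can keep running and silently skip dispatch when the new instance has no callback wired up.

```typescript
type DispatchFn = (jobId: string) => void;

// Toy scheduler: a tick hands due job IDs to a registered dispatch callback.
class SketchScheduler {
  private onDue: DispatchFn | null = null;

  registerDispatch(fn: DispatchFn): void {
    this.onDue = fn;
  }

  // Simulate one timer tick. Returns how many jobs were dispatched.
  // With no callback registered, the tick completes but dispatches nothing,
  // which mirrors the "looks healthy, never fires" symptom.
  tick(dueJobIds: string[]): number {
    if (this.onDue === null) {
      return 0;
    }
    for (const id of dueJobIds) {
      this.onDue(id);
    }
    return dueJobIds.length;
  }
}

// Hypothetical buggy restart: a new instance is created, but the callback
// that was attached to the old instance is never attached to the new one.
function restartWithoutRewire(): SketchScheduler {
  return new SketchScheduler();
}

// Hypothetical fixed restart: the callback is registered on the new instance.
function restartWithRewire(dispatch: DispatchFn): SketchScheduler {
  const next = new SketchScheduler();
  next.registerDispatch(dispatch);
  return next;
}
```

If something like this is happening, logging at the point where the tick decides whether to dispatch would confirm it quickly.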