bug: Stale cron locks persist across gateway restarts #30096
Description
Summary
Cron job locks (runningAtMs field in cron/jobs.json) are not cleared when the gateway restarts. If a cron job is actively running when the server crashes, reboots, or the gateway process is killed, the job remains permanently locked and will never fire again.
Steps to Reproduce
- Have a cron job running (e.g., an agentTurn job with a 30-minute timeout)
- Kill the gateway process or reboot the server while the job is mid-execution
- Restart the gateway
- Observe the job never fires again — it stays in a "running" state indefinitely
Expected Behavior
On gateway startup, any runningAtMs locks in jobs.json should be cleared, since no jobs can actually be running before the gateway process starts. Stale locks from a previous process should not carry over.
Actual Behavior
The runningAtMs timestamp persists in jobs.json across restarts. The cron scheduler sees the job as "already running" and skips it on every subsequent tick. The only fix is manually editing jobs.json or using the API to disable/re-enable the job.
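The skip described above, and the startup reset requested under Expected Behavior, can be sketched roughly as follows. This is a simplified illustration, not the gateway's actual scheduler code: the `tick` and `clear_stale_locks` functions and the `id` field are hypothetical names, and schedule matching is left out so that every unlocked job fires on each tick.

```python
def tick(jobs: list[dict], now_ms: int) -> list[str]:
    """Return the ids of jobs started on this tick (hypothetical scheduler loop)."""
    fired = []
    for job in jobs:
        if "runningAtMs" in job:
            # The lock is trusted blindly, even if the process that set it is gone.
            continue
        job["runningAtMs"] = now_ms  # take the lock
        fired.append(job["id"])
    return fired


def clear_stale_locks(jobs: list[dict]) -> int:
    """Drop every runningAtMs lock; intended to run once at startup."""
    cleared = 0
    for job in jobs:
        if "runningAtMs" in job:
            del job["runningAtMs"]
            cleared += 1
    return cleared
```

In this sketch, a job whose lock survives a restart is skipped on every tick until `clear_stale_locks` runs, which matches the behavior reported here.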
Workaround
We wrote a Python script that runs as ExecStartPre in the systemd service file. It strips runningAtMs from all jobs in jobs.json before the gateway starts:
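
    import json
    import pathlib

    # Clear stale cron locks before the gateway starts.
    p = pathlib.Path.home() / ".openclaw/cron/jobs.json"
    if p.exists():
        data = json.loads(p.read_text())
        # Accept either a bare list of jobs or an object with a "jobs" key.
        jobs = data if isinstance(data, list) else data.get("jobs", [])
        cleared = 0
        for j in jobs:
            if "runningAtMs" in j:
                del j["runningAtMs"]
                cleared += 1
        # Rewrite the file only if a lock was actually removed.
        if cleared:
            p.write_text(json.dumps(data, indent=2))

Impact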
High — any unclean shutdown permanently disables affected cron jobs with no warning. On a system with 26+ cron jobs, this creates silent failures that are hard to detect without manual monitoring.
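Until the gateway clears locks itself, a small check along these lines could surface stuck jobs. It is only a sketch: `find_stale_locks` is a hypothetical helper, the `id` field is assumed, and the one-hour threshold is an arbitrary choice (twice the 30-minute timeout mentioned above).

```python
import json
import pathlib
import time


def find_stale_locks(jobs, now_ms, max_age_ms):
    """Return (job, age_ms) pairs whose runningAtMs lock is older than max_age_ms."""
    stale = []
    for job in jobs:
        started = job.get("runningAtMs")
        if isinstance(started, (int, float)) and now_ms - started > max_age_ms:
            stale.append((job, now_ms - started))
    return stale


if __name__ == "__main__":
    path = pathlib.Path.home() / ".openclaw/cron/jobs.json"
    if path.exists():
        data = json.loads(path.read_text())
        # Accept either a bare list of jobs or an object with a "jobs" key.
        jobs = data if isinstance(data, list) else data.get("jobs", [])
        one_hour_ms = 60 * 60 * 1000  # assumed threshold, tune per job timeout
        for job, age in find_stale_locks(jobs, int(time.time() * 1000), one_hour_ms):
            print(f"stale lock: {job.get('id', '?')} held for {int(age // 60000)} min")
```

Run periodically from outside the gateway, a check like this reports long-held locks but does not remove them.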
Environment
- OpenClaw latest (as of Feb 28, 2026)
- Oracle Cloud ARM64, Linux 6.17
- systemd-managed gateway service