Gateway becomes unresponsive when restarted with overdue cron jobs #18892
Closed
Labels: stale (marked as stale due to inactivity)
Description
Bug Description
When the gateway is restarted while multiple cron jobs have `nextRunAtMs` values in the past (i.e., they were scheduled to fire while the gateway was down or restarting), the gateway attempts to fire all overdue jobs simultaneously on startup. This overwhelms the gateway process and leaves its cron subsystem unresponsive: the UI shows no cron jobs, `openclaw cron list` times out after 30 s, and the WebSocket endpoint stops responding to cron-related requests.
Other gateway functions (e.g., `openclaw status`, and `sessions.list` from the UI) continue to work, which suggests that the cron subsystem specifically is blocked.
Steps to Reproduce
- Set up multiple cron jobs with `nextRunAtMs` state values.
- Stop the gateway (or let it crash during a cron run).
- Wait until several cron jobs become overdue (their `nextRunAtMs` is in the past).
- Restart the gateway.
- Immediately run `openclaw cron list` or check the UI → it times out or shows no cron jobs.
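Before restarting, it can help to see which jobs are about to fire at once. The following is a minimal read-only sketch. It assumes only the file layout that the workaround below relies on (`~/.openclaw/cron/jobs.json` with a top-level `jobs` list whose entries carry `state.nextRunAtMs` in epoch milliseconds); `find_overdue` is a hypothetical helper, not part of OpenClaw.

```python
import json
import os
import time


def find_overdue(data, now_ms):
    """Return (index, nextRunAtMs, overdue_by_ms) for every job whose next
    run is in the past. Jobs without a nextRunAtMs value are ignored."""
    overdue = []
    for i, job in enumerate(data.get('jobs', [])):
        nrm = job.get('state', {}).get('nextRunAtMs')
        if nrm is not None and nrm < now_ms:
            overdue.append((i, nrm, now_ms - nrm))
    # Most overdue first.
    overdue.sort(key=lambda t: t[2], reverse=True)
    return overdue


if __name__ == '__main__':
    path = os.path.expanduser('~/.openclaw/cron/jobs.json')
    if os.path.exists(path):
        with open(path) as f:
            jobs_data = json.load(f)
        now_ms = int(time.time() * 1000)
        for i, nrm, late in find_overdue(jobs_data, now_ms):
            print(f'job[{i}] overdue by {late / 60000:.1f} min (nextRunAtMs={nrm})')
    else:
        print(f'{path} not found')
```

Run it while the gateway is stopped; with the environment described below, it would be expected to list the 10 overdue jobs.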
Expected Behavior
The gateway should do at least one of the following:
- Stagger overdue jobs: fire them one at a time, with a configurable delay between each, rather than all at once.
- Skip overdue jobs: detect that the scheduled time has passed and advance `nextRunAtMs` to the next occurrence based on the cron expression.
- Rate-limit concurrent cron executions: enforce a maximum number of simultaneous cron sessions (ideally configurable, default 1).
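The three options above could be combined into a single startup catch-up pass. The sketch below is hypothetical and is not OpenClaw's code (the gateway itself runs on Node.js). `next_future_run` implements the "skip" option for a job that repeats every `interval_ms` milliseconds; a real cron expression would need a cron parser to find its next occurrence. `run_overdue` implements staggering plus a concurrency cap using an `asyncio.Semaphore`, defaulting to 1 as suggested. `run_job` stands in for whatever actually starts a cron session.

```python
import asyncio


def next_future_run(next_run_ms: int, interval_ms: int, now_ms: int) -> int:
    """'Skip' policy for a job that repeats every interval_ms: drop the missed
    runs and return the first scheduled slot strictly after now_ms."""
    if next_run_ms > now_ms:
        return next_run_ms  # not overdue, leave it alone
    missed = (now_ms - next_run_ms) // interval_ms + 1
    return next_run_ms + missed * interval_ms


async def run_overdue(jobs, run_job, max_concurrent: int = 1, stagger_s: float = 0.0):
    """'Stagger' + 'rate-limit' policy: start overdue jobs stagger_s seconds
    apart and never let more than max_concurrent of them run at once."""
    sem = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        async with sem:  # waits here while max_concurrent jobs are running
            return await run_job(job)

    tasks = []
    for i, job in enumerate(jobs):
        if i > 0 and stagger_s > 0:
            await asyncio.sleep(stagger_s)  # spread out the start times
        tasks.append(asyncio.create_task(guarded(job)))
    # Results come back in the same order as `jobs`.
    return await asyncio.gather(*tasks)
```

For example, `asyncio.run(run_overdue(overdue, start_session, max_concurrent=1, stagger_s=5))` would start the jobs five seconds apart and one at a time, where `start_session` is a hypothetical coroutine. The semaphore alone already enforces the cap; the stagger only spreads out how quickly new jobs are queued.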
Actual Behavior
- All overdue cron jobs fire simultaneously on startup.
- The gateway process (Node.js, ~430 MB RSS) becomes completely unresponsive to cron-related WebSocket requests.
- `openclaw cron list` returns `Error: gateway timeout after 30000ms`.
- The UI shows no cron jobs.
- On subsequent restart attempts, the gateway logs show `shutdown timed out; exiting without full cleanup`.
- The only fix is to manually edit `jobs.json` to push all `nextRunAtMs` values into the future, then restart.
Workaround
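Before restarting the gateway, push all overdue `nextRunAtMs` values into the future:

```python
import json
import os
from datetime import datetime, timedelta, timezone

# open() does not expand "~", so expand the home directory explicitly.
path = os.path.expanduser('~/.openclaw/cron/jobs.json')
with open(path) as f:
    data = json.load(f)

# Earliest allowed next run: 10 minutes from now, in epoch milliseconds.
now = datetime.now(timezone.utc)
min_ms = int((now + timedelta(minutes=10)).timestamp() * 1000)

for job in data['jobs']:
    state = job.get('state', {})
    nrm = state.get('nextRunAtMs')
    # Only touch jobs that actually have a next run scheduled too early.
    if nrm is not None and nrm < min_ms:
        state['nextRunAtMs'] = min_ms

with open(path, 'w') as f:
    json.dump(data, f, indent=2)
```

Then restart the gateway.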
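The workaround moves every overdue job to the same `min_ms`, so they all become due together ten minutes later. A variant that gives each job its own slot, in the spirit of the "stagger" option above, might be gentler. This is an untested sketch: `stagger_overdue`, `rewrite_jobs_file`, and the 2-minute `spacing_ms` default are hypothetical, and it assumes the same `jobs.json` layout as the script above. It also keeps a `.bak` copy of the file before writing.

```python
import json
import os
import shutil
import time


def stagger_overdue(data, now_ms, delay_ms=10 * 60 * 1000, spacing_ms=2 * 60 * 1000):
    """Move each job due before now_ms + delay_ms to its own future slot:
    the first at now_ms + delay_ms, the next spacing_ms later, and so on.
    Returns how many jobs were moved."""
    base = now_ms + delay_ms
    moved = 0
    for job in data.get('jobs', []):
        state = job.get('state', {})
        nrm = state.get('nextRunAtMs')
        if nrm is not None and nrm < base:
            state['nextRunAtMs'] = base + moved * spacing_ms
            moved += 1
    return moved


def rewrite_jobs_file(path='~/.openclaw/cron/jobs.json'):
    """Apply stagger_overdue to jobs.json, keeping a .bak copy first."""
    path = os.path.expanduser(path)
    shutil.copy2(path, path + '.bak')
    with open(path) as f:
        data = json.load(f)
    moved = stagger_overdue(data, int(time.time() * 1000))
    with open(path, 'w') as f:
        json.dump(data, f, indent=2)
    return moved
```

As with the original workaround, call `rewrite_jobs_file()` before restarting the gateway; with 10 overdue jobs and the defaults above, the last one would become due about 28 minutes later.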
Environment
- OpenClaw version: 2026.2.15
- Node.js: 22.17.1
- OS: Ubuntu 24.04 (Linux 6.8.0-94-generic x64)
- Cron jobs: 45 jobs configured, 10 were overdue at time of restart
- Gateway config: local loopback, single instance