
Gateway becomes unresponsive when restarted with overdue cron jobs #18892

@kaisnowy

Bug Description

When the gateway is restarted while multiple cron jobs have nextRunAtMs values in the past (i.e., they were scheduled to fire while the gateway was down or restarting), the gateway attempts to fire all overdue jobs simultaneously on startup. This overwhelms the gateway process and leaves its cron handling unresponsive: the UI shows no cron jobs, openclaw cron list times out after 30s, and the WebSocket endpoint stops responding to cron-related requests.

Other gateway functions (e.g., openclaw status, sessions.list from the UI) continue to work, which suggests that the cron subsystem specifically is blocked.

Steps to Reproduce

  1. Configure multiple cron jobs, each with a nextRunAtMs value in its state
  2. Stop the gateway (or let it crash during a cron run)
  3. Wait until several cron jobs become overdue (their nextRunAtMs is in the past)
  4. Restart the gateway
  5. Immediately run openclaw cron list or open the UI: the command times out and the UI shows no cron jobs
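To reproduce without waiting for jobs to become overdue (step 3), the stored state can be backdated directly while the gateway is stopped. This is a minimal sketch that assumes the same jobs.json layout the workaround below relies on (a top-level "jobs" list whose entries carry state.nextRunAtMs); backdate_jobs and backdate_file are hypothetical helper names, and the five-minute offset is arbitrary.

```python
import json
import os
import time

# Path taken from the workaround in this issue.
JOBS_PATH = os.path.expanduser("~/.openclaw/cron/jobs.json")


def backdate_jobs(data: dict, count: int, now_ms: int, minutes_ago: int = 5) -> list:
    """Move nextRunAtMs into the past for the first `count` jobs.

    Returns the indices of the jobs that were changed.
    """
    past_ms = now_ms - minutes_ago * 60_000
    changed = []
    for i, job in enumerate(data.get("jobs", [])[:count]):
        state = job.get("state")
        if not isinstance(state, dict):
            # Create a state object for jobs that have none yet.
            state = job["state"] = {}
        state["nextRunAtMs"] = past_ms
        changed.append(i)
    return changed


def backdate_file(path: str = JOBS_PATH, count: int = 10) -> list:
    """Apply backdate_jobs to jobs.json in place (run with the gateway stopped)."""
    with open(path) as f:
        data = json.load(f)
    changed = backdate_jobs(data, count, now_ms=int(time.time() * 1000))
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return changed
```

With the gateway stopped, calling backdate_file() makes ten jobs overdue (matching the environment below); restarting the gateway then corresponds to step 4.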

Expected Behavior

The gateway should either:

  • Stagger overdue jobs — fire them one at a time with a configurable delay between each, rather than all at once
  • Skip overdue jobs — detect that the scheduled time has passed and advance nextRunAtMs to the next occurrence based on the cron expression
  • Rate-limit concurrent cron executions — enforce a maximum number of simultaneous cron sessions (ideally configurable, default 1)
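The three options can be illustrated with a small, language-agnostic sketch in Python. It is not OpenClaw's scheduler (the gateway runs on Node.js); Job, skip_overdue and run_overdue are hypothetical names, and the fixed interval_ms field is an assumption standing in for a real cron-expression parser when computing the next occurrence.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class Job:
    # Hypothetical minimal job record, not the real jobs.json schema.
    name: str
    next_run_at_ms: int
    interval_ms: int  # assumption: a fixed interval stands in for a cron expression


def skip_overdue(jobs: List[Job], now_ms: int) -> List[str]:
    """Option 2: advance each overdue job to its first occurrence after now_ms.

    Returns the names of the jobs that were skipped forward.
    """
    skipped = []
    for job in jobs:
        if job.next_run_at_ms <= now_ms:
            missed = (now_ms - job.next_run_at_ms) // job.interval_ms + 1
            job.next_run_at_ms += missed * job.interval_ms
            skipped.append(job.name)
    return skipped


async def run_overdue(
    jobs: List[Job],
    now_ms: int,
    run: Callable[[Job], Awaitable[None]],
    max_concurrent: int = 1,
    stagger_s: float = 1.0,
) -> List[str]:
    """Options 1 and 3: run overdue jobs oldest first, with at most
    max_concurrent in flight and a stagger_s pause before a slot is reused.

    Returns job names in the order they started.
    """
    overdue = sorted(
        (j for j in jobs if j.next_run_at_ms <= now_ms),
        key=lambda j: j.next_run_at_ms,
    )
    slots = asyncio.Semaphore(max_concurrent)
    started: List[str] = []

    async def run_in_slot(job: Job) -> None:
        async with slots:
            started.append(job.name)
            await run(job)
            # Cool-down before the next waiting job can take this slot.
            await asyncio.sleep(stagger_s)

    await asyncio.gather(*(run_in_slot(j) for j in overdue))
    return started
```

With max_concurrent=1 (the default suggested above), overdue jobs start strictly one after another with stagger_s between them, instead of all at once. A real scheduler would also compute each job's next nextRunAtMs after it runs; that step is omitted here.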

Actual Behavior

  • All overdue cron jobs fire simultaneously on startup
  • The gateway process (Node.js, ~430MB RSS) becomes completely unresponsive to cron-related WebSocket requests
  • openclaw cron list returns Error: gateway timeout after 30000ms
  • UI shows no cron jobs
  • On subsequent restart attempts, the gateway logs show "shutdown timed out; exiting without full cleanup"
  • The only fix is to manually edit jobs.json to push all nextRunAtMs values into the future, then restart

Workaround

Before restarting the gateway, push all overdue nextRunAtMs values into the future:

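import json
import os
from datetime import datetime, timezone, timedelta

# open() does not expand "~", so resolve the home directory explicitly
path = os.path.expanduser('~/.openclaw/cron/jobs.json')

with open(path) as f:
    data = json.load(f)

# Earliest allowed run time: 10 minutes from now, in epoch milliseconds
now = datetime.now(timezone.utc)
min_ms = int((now + timedelta(minutes=10)).timestamp() * 1000)

for job in data['jobs']:
    # Create the state object if it is missing, so the assignment cannot raise KeyError
    state = job.setdefault('state', {})
    nrm = state.get('nextRunAtMs') or 0
    if nrm < min_ms:
        state['nextRunAtMs'] = min_ms

with open(path, 'w') as f:
    json.dump(data, f, indent=2)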

Then restart the gateway.
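One caveat with the workaround above: it moves every overdue job to the same timestamp, so those jobs may still all fire together ten minutes after the restart. A variant is to give each overdue job its own slot. This is a sketch under the same jobs.json layout assumption as the workaround; stagger_overdue, stagger_file and the first_delay_min/gap_min defaults are hypothetical, illustrative choices.

```python
import json
import os
import time

JOBS_PATH = os.path.expanduser("~/.openclaw/cron/jobs.json")


def _next_run(job: dict) -> int:
    # Missing or null values count as 0, i.e. overdue.
    return (job.get("state") or {}).get("nextRunAtMs") or 0


def stagger_overdue(data: dict, now_ms: int, first_delay_min: int = 10, gap_min: int = 2) -> int:
    """Reschedule jobs due before now + first_delay_min into distinct slots
    gap_min apart, oldest first. Returns the number of jobs rescheduled."""
    first_ms = now_ms + first_delay_min * 60_000
    gap_ms = gap_min * 60_000
    overdue = sorted(
        (job for job in data.get("jobs", []) if _next_run(job) < first_ms),
        key=_next_run,
    )
    for i, job in enumerate(overdue):
        if not isinstance(job.get("state"), dict):
            job["state"] = {}
        job["state"]["nextRunAtMs"] = first_ms + i * gap_ms
    return len(overdue)


def stagger_file(path: str = JOBS_PATH) -> int:
    """Apply stagger_overdue to jobs.json in place; run with the gateway stopped."""
    with open(path) as f:
        data = json.load(f)
    count = stagger_overdue(data, now_ms=int(time.time() * 1000))
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return count
```

With the defaults and 10 overdue jobs, the first fires about 10 minutes after the edit and the last about 28 minutes after. Jobs that were not overdue keep their schedule, so one could still land close to a reassigned slot.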

Environment

  • OpenClaw version: 2026.2.15
  • Node.js: 22.17.1
  • OS: Ubuntu 24.04 (Linux 6.8.0-94-generic x64)
  • Cron jobs: 45 jobs configured, 10 were overdue at time of restart
  • Gateway config: local loopback, single instance

Metadata

  • Assignees: none
  • Labels: stale (marked as stale due to inactivity)
  • Type: none
  • Projects: none
  • Milestone: none
  • Relationships: none
  • Development: no branches or pull requests