Cron scheduler auto-dispatch stalls after SIGUSR1 restart — jobs never fire despite scheduler reporting 'started'

## Bug Description

After a Gateway SIGUSR1 restart (triggered by a config change), the cron scheduler stops dispatching jobs automatically. The scheduler reports itself as `"started"` and advances `nextWakeAtMs` correctly, but **never actually spawns isolated sessions to execute due jobs**.

## Environment

- **OpenClaw version**: `2026.2.3-1`
- **OS**: WSL2 Ubuntu 24.04 on Windows
- **Node.js**: v22.22.0
- **Runtime**: systemd user service (`openclaw-gateway.service`)

## Steps to Reproduce

1. Configure multiple recurring cron jobs (a mix of `every` and `cron` schedules, all with `sessionTarget: "isolated"`)
2. Trigger a Gateway SIGUSR1 restart (e.g., via a `gateway(action="restart")` tool call or a config change)
3. Observe that cron jobs stop firing after the restart

## Expected Behavior

After a SIGUSR1 restart, the cron scheduler should re-arm all timers and dispatch jobs when their `nextRunAtMs` is reached.
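
The due-check described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: the `CronJob` shape and the `dueJobs` name are assumptions, and only the `enabled` and `nextRunAtMs` field names come from this report.

```typescript
// Assumed, simplified job shape; the real jobs.json layout may differ.
interface CronJob {
  id: string;
  enabled: boolean;
  nextRunAtMs: number;
}

// Returns the enabled jobs whose scheduled run time has been reached.
function dueJobs(jobs: CronJob[], nowMs: number): CronJob[] {
  return jobs.filter((job) => job.enabled && job.nextRunAtMs <= nowMs);
}

// Hypothetical sample data: one due job, one future job, one disabled job.
const sampleNowMs = 1_000_000;
const sampleJobs: CronJob[] = [
  { id: "a", enabled: true, nextRunAtMs: sampleNowMs - 1 },
  { id: "b", enabled: true, nextRunAtMs: sampleNowMs + 60_000 },
  { id: "c", enabled: false, nextRunAtMs: sampleNowMs - 1 },
];

// Only job "a" is both enabled and past its run time.
console.log(dueJobs(sampleJobs, sampleNowMs).map((job) => job.id));
```

Each job returned by such a check would be expected to produce a `cron:<jobId>` isolated session.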

## Actual Behavior

- `cron(action="status")` reports `{ enabled: true, jobs: 11, nextWakeAtMs: <future timestamp> }` — looks healthy
- `nextWakeAtMs` **advances** past job fire times — the scheduler tick is running
- But **zero isolated sessions are spawned** — no `cron:<jobId>` sessions appear in `sessions_list`
- `cron(action="runs")` shows no new entries after the restart
- No `dispatch`, `spawn`, `agentTurn`, or `isolated` log entries appear in Gateway logs
- Gateway logs only show `cron: started` at boot — no subsequent dispatch activity

## Attempted Fixes (all failed to restore auto-dispatch)

1. **SIGUSR1 restart** via `gateway(action="restart")` — scheduler re-initializes but still doesn't dispatch
2. **Full process restart** via `systemctl --user restart openclaw-gateway.service` — new PID, scheduler says "started", still doesn't dispatch
3. **`cron(action="wake", mode="now")`** — returns `{ ok: true }` but no jobs fire
4. **Manually resetting `nextRunAtMs`** in `jobs.json` to past timestamps — scheduler recalculates forward from interval on restart, still doesn't dispatch
5. **Clearing `lastRunAtMs` and `lastStatus`** from all job state — same result

## Workaround That Works

**`openclaw cron run <jobId> --force`** successfully executes jobs. The execution pipeline itself is fine — only the automatic timer-based dispatch is broken.

Current workaround: system crontab entries that call `openclaw cron run <jobId> --force` on each job's interval.
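
As a rough illustration of this workaround, the sketch below derives a system crontab line from an `every` interval. The helper name `crontabLine`, the job ID `example-job`, and the assumption that `openclaw` is on cron's `PATH` are hypothetical; only the `openclaw cron run <jobId> --force` command comes from this report.

```typescript
// Hypothetical helper: builds one system-crontab line that force-runs a job.
// It only handles intervals that map cleanly onto cron fields: divisors of
// 60 minutes, or whole hours that divide 24.
function crontabLine(jobId: string, everyMinutes: number): string {
  if (!Number.isInteger(everyMinutes) || everyMinutes <= 0) {
    throw new Error(`unsupported interval: ${everyMinutes}`);
  }
  const command = `openclaw cron run ${jobId} --force`;
  if (everyMinutes < 60 && 60 % everyMinutes === 0) {
    const minuteField = everyMinutes === 1 ? "*" : `*/${everyMinutes}`;
    return `${minuteField} * * * * ${command}`;
  }
  const hours = everyMinutes / 60;
  if (Number.isInteger(hours) && 24 % hours === 0) {
    const hourField = hours === 1 ? "*" : `*/${hours}`;
    return `0 ${hourField} * * * ${command}`;
  }
  throw new Error(`interval ${everyMinutes}m does not map onto a single cron line`);
}

// The `every` intervals listed in this report: 5m, 15m, 60m, 180m, 360m.
for (const minutes of [5, 15, 60, 180, 360]) {
  console.log(crontabLine("example-job", minutes));
}
```

Note that cron fires on wall-clock boundaries, which may not match the times the built-in scheduler would have chosen.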

## Timeline

- **Feb 5, 2:58 PM CT**: Last successful automatic cron dispatch (Kalshi scanner job)
- **Feb 5, 4:43 PM CT**: Gateway SIGUSR1 restart triggered by config change (block streaming + TTS settings)
- **Feb 5, 4:43 PM CT → Feb 7, 2:30 AM CT**: ~34 hours of zero automatic dispatches across all 11 jobs
- **Feb 6, 11:47 PM CT**: Another session detected the stall and attempted SIGUSR1 restart — didn't fix it
- **Feb 7, 2:04 AM CT**: Full systemctl restart — didn't fix auto-dispatch
- **Feb 7, 2:30 AM CT**: `openclaw cron run --force` confirmed jobs can execute manually

## Key Observation

The `wakeMode` on all jobs is `"next-heartbeat"`. The heartbeat subsystem is running (5 min interval, confirmed in logs). However, the cron dispatcher does not appear to check for due jobs during heartbeat ticks, or the check silently fails or is skipped.

## Relevant Log Entries

```
# Cron says "started" after restart but never dispatches:
{"module":"cron"} {"enabled":true,"jobs":11,"nextWakeAtMs":1770452046127} "cron: started"

# Heartbeat running fine:
{"subsystem":"gateway/heartbeat"} {"intervalMs":300000} "heartbeat: started"

# Zero entries matching dispatch, spawn, agentTurn, or isolated:
grep -c "dispatch\|spawn\|agentTurn\|isolated" /tmp/openclaw/openclaw-2026-02-07.log
# Result: 0
```

## Job Store Path

`~/.openclaw/cron/jobs.json` — 11 jobs, all `enabled: true`, mix of `schedule.kind: "every"` (5m, 15m, 60m, 180m, 360m) and `schedule.kind: "cron"`.

## Suggested Investigation

The scheduler tick advances `nextWakeAtMs`, but the dispatch codepath that should compare `nextRunAtMs` against `Date.now()` and spawn isolated sessions appears to be silently broken after SIGUSR1. One possibility is a stale reference to the old scheduler instance; another is that the dispatch callback isn't re-registered on the new timer. Because a full process restart (new PID) did not restore dispatch either, the cause may not be limited to in-memory state.
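
To make the callback hypothesis concrete, the sketch below models a scheduler whose restart re-arms the wake time but drops the dispatch callback. All names (`SketchScheduler`, `onDue`, `restart`, `tick`) are hypothetical and not taken from OpenClaw's source. It only reproduces the reported symptom (the wake time advances while nothing is dispatched) and does not claim to be the actual cause.

```typescript
type DispatchFn = (jobId: string) => void;

interface SketchJob {
  id: string;
  everyMs: number;
  nextRunAtMs: number;
}

class SketchScheduler {
  nextWakeAtMs = 0;
  private jobs: SketchJob[];
  private dispatch: DispatchFn | undefined;

  constructor(jobs: SketchJob[]) {
    this.jobs = jobs;
  }

  // Registers the callback that should run for every due job.
  onDue(dispatch: DispatchFn): void {
    this.dispatch = dispatch;
  }

  // Models the suspected bug: the wake time is re-armed, the callback is lost.
  restart(): void {
    this.dispatch = undefined;
    this.rearm();
  }

  // One timer tick: dispatch due jobs if a callback exists, then advance them.
  tick(nowMs: number): void {
    for (const job of this.jobs) {
      if (job.nextRunAtMs <= nowMs) {
        this.dispatch?.(job.id); // silently skipped when no callback is registered
        job.nextRunAtMs = nowMs + job.everyMs;
      }
    }
    this.rearm();
  }

  // The next wake time is the earliest upcoming run across all jobs.
  private rearm(): void {
    this.nextWakeAtMs = Math.min(...this.jobs.map((job) => job.nextRunAtMs));
  }
}

const fired: string[] = [];
const scheduler = new SketchScheduler([
  { id: "job-1", everyMs: 300_000, nextRunAtMs: 0 },
]);
scheduler.onDue((jobId) => fired.push(jobId));
scheduler.tick(1_000); // callback registered: job-1 is dispatched
scheduler.restart(); // callback is dropped here
scheduler.tick(400_000); // job-1 is due again, but nothing is dispatched
console.log(fired, scheduler.nextWakeAtMs);
```

In this model the status would still look healthy after the restart, because `nextWakeAtMs` keeps moving forward even though no job runs.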

