Summary
Cron jobs stop executing for long periods (hours or more) even though the Gateway process remains up and interactive commands may still work. The internal cron lane appears to wedge behind a job that never logs a `lane task done: lane=cron` completion.
This causes all scheduled digests/alerts to silently stop, and any cron-based watchdogs also stop (same failure domain). Manual gateway restart clears the condition.
Environment
- Host: macOS (arm64)
- OpenClaw: 2026.2.3-1
- Gateway mode: local (ws://127.0.0.1:18789)
What I observe
- Regular cron jobs (e.g. `telegram-health-check`, `gateway-rpc-health-check`, `notification-audit`, etc.) run normally for a while.
- Then at some point, a cron session starts processing and the system never logs `lane task done: lane=cron` afterwards.
- After that moment, no cron sessions produce transcript activity for many hours (verified by `~/.openclaw/agents/main/sessions/*.jsonl` mtimes and a local watcher state file).
- Gateway restart restores cron execution.
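For reference, the "verified by mtimes" check above is just a filesystem probe. A minimal sketch (the sessions path matches my host, and the 30-minute window is an arbitrary threshold, not an OpenClaw convention):

```shell
# Hypothetical staleness probe: succeed iff some cron session transcript
# was modified within the last N minutes.
cron_fresh() {
  dir="${1:-$HOME/.openclaw/agents/main/sessions}"
  mins="${2:-30}"
  # -mmin is supported by both BSD (macOS) and GNU find
  find "$dir" -name '*.jsonl' -mmin "-$mins" 2>/dev/null | grep -q .
}

if cron_fresh; then echo "cron activity: fresh"; else echo "cron activity: STALE"; fi
```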
Evidence (log excerpts)
Captured from an incident bundle created on the host:
Last known activity
From my local state file (`cron-scheduler-watch.json`):
`lastSeenActivityEpoch: 1770309855` (Feb 5, 2026, ~08:44 PT)
Cron lane dequeues a job and then stops completing
From `incident-20260205-084413.log` around `2026-02-05T16:44:00Z`:

```
lane enqueue: lane=session:agent:main:cron:5eb4a3da-... queueSize=1
lane dequeue: lane=session:agent:main:cron:5eb4a3da-...
lane enqueue: lane=cron queueSize=1
lane dequeue: lane=cron ...
embedded run start: runId=acc80e6c-... sessionId=acc80e6c-...
```
After that point, there is no subsequent `lane task done: lane=cron` entry in that capture.
(For context, earlier in the same capture there are multiple successful `lane task done: lane=cron` lines, e.g. at 16:30, 16:32, 16:34, 16:35, and 16:42Z.)
Impact
- Missed scheduled digests and alerting (e.g. daily suggestions digest, Reddit digest, etc.)
- Cron-based watchdogs cannot detect/recover because they also stop when cron is wedged.
Workaround
I installed an out-of-band watchdog (macOS LaunchAgent) that runs every 5 minutes:
- Checks for recent cron transcript activity.
- If stale, captures an incident bundle and mutex-restarts the gateway.
This reduces user-visible downtime but is obviously not ideal.
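The watchdog body can be sketched roughly as below. All paths and the restart/bundle commands are placeholders from my setup, not OpenClaw APIs; the LaunchAgent just invokes this script on a 5-minute `StartInterval`:

```shell
# Hypothetical watchdog payload (run every 5 minutes by a LaunchAgent).
SESSIONS_DIR="${SESSIONS_DIR:-$HOME/.openclaw/agents/main/sessions}"
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/openclaw-watchdog.lock}"

watchdog() {
  # a transcript touched in the last 30 min means cron is alive; do nothing
  find "$SESSIONS_DIR" -name '*.jsonl' -mmin -30 2>/dev/null | grep -q . && return 0
  # mkdir is atomic, so it doubles as a mutex against overlapping runs
  mkdir "$LOCK_DIR" 2>/dev/null || return 0
  echo "stale: capturing bundle and restarting gateway"
  # tar czf "$HOME/incident-$(date +%Y%m%d-%H%M%S).tgz" "$HOME/.openclaw/logs"
  # <restart command for your gateway here>   # placeholder, setup-specific
  rmdir "$LOCK_DIR"
}

watchdog
```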
Request
- Any known issue/duplicate for cron lane wedging specifically?
- Can cron execution be protected with per-job timeouts / cancellation so one stuck job cannot wedge the entire scheduler?
- Any recommended debug flags to capture where the cron worker is blocked (e.g. a stuck `exec`, a stuck gateway admin request, or a stuck model call)?
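To illustrate the per-job timeout idea from the second request, here is a rough POSIX-sh sketch of the kill-after-N-seconds pattern. This is purely illustrative of the semantics I'm asking for; the real fix would presumably live inside the gateway's lane worker, not in shell:

```shell
# Run "$@" with a hard cap of $1 seconds; kill the job if it overruns,
# so one wedged job cannot block everything queued behind it.
run_with_timeout() {
  secs="$1"; shift
  "$@" &                                        # start the job
  cmd_pid=$!
  ( sleep "$secs"; kill -TERM "$cmd_pid" 2>/dev/null ) &
  watchdog_pid=$!
  wait "$cmd_pid"                               # job's status, or 128+signal if killed
  status=$?
  kill "$watchdog_pid" 2>/dev/null              # cancel the timer if the job finished
  return "$status"
}

run_with_timeout 1 sleep 10                     # wedged job: killed after ~1s
echo "exit status: $?"                          # nonzero when the job was killed
```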
If maintainers want, I can attach the full incident bundle(s) (they contain redacted config summaries and gateway logs).