Summary
One-shot cron jobs (kind: "at") that complete with a non-"ok" status (e.g., "skipped", "error") are never disabled. This causes computeJobNextRunAtMs() to keep returning the past scheduled time, which triggers a setTimeout(..., 0) tight loop that pegs the gateway at 100%+ CPU indefinitely.
Environment
- OpenClaw version: 2026.2.6-3
- OS: Raspberry Pi OS (Linux 6.12.62+rpt-rpi-2712, aarch64)
- Install method: npm global
Steps to Reproduce
1. Create a one-shot cron job scheduled for a time in the near future:

```json
{
  "name": "test-reminder",
  "schedule": { "kind": "at", "atMs": <timestamp_2_minutes_from_now> },
  "sessionTarget": "isolated",
  "wakeMode": "now",
  "payload": { "kind": "systemEvent", "text": "test" }
}
```

2. Wait for the scheduled time to pass.
3. If the job completes with `lastStatus: "skipped"` (or any status other than `"ok"`), the loop begins.
Expected Behavior
kind: "at"jobs should be disabled after any terminal execution — whether"ok","skipped", or"error"- At minimum, there should be a retry cap or exponential backoff for past-due one-shot jobs
Actual Behavior
The job stays enabled: true and enters an infinite tight loop:
1. `armTimer()` → `nextWakeAtMs()` returns a past time → `setTimeout(fn, 0)` (fires immediately)
2. `onTimer()` → executes the job → status is `"skipped"` → `nextRunAtMs` is recomputed to the same past time
3. `armTimer()` again → go to step 1
This repeats ~4,800 times/second, driving the gateway to 100%+ CPU.
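The mechanism can be demonstrated with a minimal model. The function names (`computeJobNextRunAtMs`, `onTimer`) come from the issue; the harness itself is an illustrative sketch, not the actual OpenClaw source:

```javascript
// Minimal model of the tight loop: a non-"ok" status never disables
// the job, so the next-run time stays in the past forever.
function computeJobNextRunAtMs(job) {
  if (job.schedule.kind === "at" && job.enabled) {
    return job.schedule.atMs; // keeps returning the original (past) time
  }
  return null;
}

function onTimer(job) {
  job.lastStatus = "skipped"; // any non-"ok" terminal status
  if (job.schedule.kind === "at" && job.lastStatus === "ok") {
    job.enabled = false; // never reached for "skipped"
  }
}

const job = { enabled: true, schedule: { kind: "at", atMs: 1000 } };
const nowMs = 2000; // scheduled time is already in the past

let iterations = 0;
// Cap at 5 iterations for the demo; the real loop never exits.
while (iterations < 5) {
  const next = computeJobNextRunAtMs(job);
  if (next === null || next > nowMs) break; // would be setTimeout(fn, 0)
  onTimer(job);
  iterations++;
}

console.log(iterations, job.enabled); // 5 true — the job never disables itself
```

The demo hits its iteration cap with the job still enabled, which is exactly why the real scheduler spins at full speed.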
Evidence
CPU Profile (--cpu-prof, 105-second capture)
Top functions by sample count during the busy loop:
| Function | % of samples | Source |
|---|---|---|
| `croner.js` (`g`, `C`, `partToArray`) | ~16% | Schedule evaluation |
| `json5/parse.js` (`lex`, `string`, `push`) | ~6% | `loadCronStore` |
| `saveCronStore` | ~2% | Writing `jobs.json` |
| `loadCronStore` | ~1.6% | Reading `jobs.json` |
| File I/O (`copyFile`, `rename`, `stat`) | ~5% | Per-iteration persistence |
| Idle | ~38.6% | Should be ~99% at rest |
strace confirmation
The loop manifests as ~5,200 reads/second on an eventfd (io_uring completion):
```
read(19, "\1\0\0\0\0\0\0\0", 8) = 8   # repeated thousands of times/sec
```
The eventfd activity is driven by the constant file I/O from saveCronStore/loadCronStore on every iteration.
Before/After
| State | CPU usage |
|---|---|
| With stuck job | ~116% (single core pegged) |
| After removing job | ~3% (normal idle: Telegram polling + heartbeat) |
Root Cause
In the cron timer logic, kind: "at" jobs are only auto-disabled when status === "ok". The relevant logic (pseudocode from bundled source):
```js
// After job execution:
if (job.schedule.kind === "at" && lastStatus === "ok") {
  job.enabled = false; // ← only disables on "ok"
}

// Next-run calculation:
if (job.schedule.kind === "at" && job.enabled) {
  return job.schedule.atMs; // ← returns the original (past) time
}
```

Since `"skipped"` ≠ `"ok"`, the job remains enabled. `computeJobNextRunAtMs()` returns the original `atMs` (which is in the past), causing `setTimeout` to fire with delay 0 on every cycle.
Suggested Fix
// Disable kind:"at" jobs on ANY terminal status, not just "ok":
if (job.schedule.kind === "at") {
job.enabled = false;
}Or more conservatively:
- Add a retry counter with a cap (e.g., 3 attempts)
- Add exponential backoff for past-due one-shot jobs
- Treat
"skipped"and"error"as terminal for one-shot jobs
Impact
- Severity: High — a single stuck job silently pegs an entire CPU core
- In our case, the job ran undetected for 4+ days at 100% CPU on a Raspberry Pi 5, causing significant NVMe write amplification and heat
- The only fix is manual removal of the stuck job from `cron/jobs.json`
Workaround
Identify and remove (or disable) stuck one-shot jobs:
```sh
# Find enabled at-jobs with nextRunAtMs in the past:
openclaw cron list --all --json | jq '[.[] | select(.schedule.kind == "at" and .enabled == true and .state.nextRunAtMs < (now * 1000))]'
```

Related Issues
- Cron scheduler enters infinite retry loop on model rejection #11438 — Same infinite loop mechanism, triggered by model rejection errors
- [Bug]: Cron Reminders Skipping/Silent Failure + Temporary Fix #8298 — Case 2 describes the same "skipped" loop behavior
- Cron jobs created via tool are disabled by default (enabled not defaulting to true) #6483 — Previous cron bug (enabled default), fixed in v2026.2.6