fix(cron): default enabled to true for new jobs #8698

Open

emmick4 wants to merge 3 commits into openclaw:main from emmick4:fix/cron-enabled-default

Conversation


@emmick4 commented on Feb 4, 2026

Problem

Jobs created via the cron.add tool/API were silently disabled because enabled was not set by default. The scheduler treated undefined as disabled, so jobs never ran.
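
For context, a guard along these lines (purely illustrative, not the actual OpenClaw scheduler code) shows why an omitted flag behaves like a disabled job:

// Hypothetical sketch: names and shapes are assumptions.
interface CronJob {
  id: string;
  schedule: string;
  enabled?: boolean; // cron.add omitted this before the fix
}

function shouldRun(job: CronJob): boolean {
  // A truthiness check treats undefined the same as false,
  // so jobs created without an explicit enabled flag never fire.
  return Boolean(job.enabled);
}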

Solution

Add enabled: true to the defaults applied during job creation in normalize.ts, matching the documented behavior that jobs are enabled by default.
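
A minimal sketch of the kind of default described above, assuming a normalize helper that merges caller input with defaults (the surrounding shape is illustrative, not the actual normalize.ts):

// Illustrative shape only; the real normalize.ts likely handles more fields.
interface CronJobInput {
  name: string;
  schedule: string;
  enabled?: boolean;
}

export function normalizeCronJob(input: CronJobInput) {
  return {
    ...input,
    // The one-line fix: default to enabled, but keep an explicit enabled: false.
    enabled: input.enabled ?? true,
  };
}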

Changes

  • One-line fix in src/cron/normalize.ts
  • Two tests added in src/cron/normalize.test.ts (sketched below):
    • Verifies enabled defaults to true when not provided
    • Verifies explicit enabled: false is preserved

Fixes #6988
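
The two tests described above might look roughly like this (a sketch, assuming a normalizeCronJob export and a vitest-style runner; the real test file may differ):

import { describe, expect, it } from "vitest";
import { normalizeCronJob } from "./normalize";

describe("normalizeCronJob", () => {
  it("defaults enabled to true when not provided", () => {
    const job = normalizeCronJob({ name: "nightly", schedule: "0 3 * * *" });
    expect(job.enabled).toBe(true);
  });

  it("preserves explicit enabled: false", () => {
    const job = normalizeCronJob({ name: "paused", schedule: "0 3 * * *", enabled: false });
    expect(job.enabled).toBe(false);
  });
});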

Greptile Overview

Greptile Summary

This PR fixes cron job creation defaults by setting enabled: true when enabled is omitted in src/cron/normalize.ts, with new tests ensuring the default is applied and explicit enabled: false is preserved.

In addition, it introduces a new disk-backed gateway lifecycle/crash detection mechanism (src/infra/lifecycle-state.ts) plus a checkpoint persistence layer (src/agents/checkpoint.ts), and wires lifecycle transitions into gateway startup/shutdown (src/gateway/server.impl.ts, src/gateway/server-close.ts). On startup, it logs a warning if the previous run appears to have crashed and surfaces any incomplete checkpoints.

Main things to double-check are the shutdown path reliably marking stopped even when teardown steps fail, and that lifecycle heartbeats are actually updated periodically so crash time/uptime reporting is accurate.

Confidence Score: 3/5

  • This PR is likely safe to merge, but the new lifecycle/crash-detection additions have a couple correctness gaps around heartbeat accuracy and shutdown-state recording.
  • The cron default change and tests look correct and low risk. However, the PR also adds new persistence and lifecycle tracking; as written, crash time/uptime will be stale without a periodic updateLifecycleHeartbeat() call, and failures during shutdown can prevent marking stopped, affecting future crash classification.
  • src/infra/lifecycle-state.ts, src/gateway/server-close.ts


Commits

Adds a new lifecycle state module that tracks gateway process state to disk.
This enables crash detection on restart:

- markLifecycleStarting() - called on startup, detects if previous run crashed
- markLifecycleRunning() - called after successful startup
- markLifecycleStopping() - called when shutdown begins
- markLifecycleStopped() - called after clean shutdown

If the previous state was 'running' but we're starting up, we know the
gateway crashed. This is the foundation for recovery hooks that can resume
interrupted work (GUPP-style persistence).

Part of self-architecture project to enable more autonomous operation.
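
A rough sketch of how a disk-backed lifecycle file like this can work; the file location, field names, and return type here are assumptions, not the actual src/infra/lifecycle-state.ts:

// Illustrative sketch only.
import { promises as fs } from "node:fs";
import path from "node:path";
import os from "node:os";

type LifecycleStatus = "starting" | "running" | "stopping" | "stopped";

interface LifecycleState {
  status: LifecycleStatus;
  startedAt: number;     // epoch ms of this process start
  lastHeartbeat: number; // epoch ms of the last state write
}

const STATE_FILE = path.join(os.homedir(), ".openclaw", "lifecycle.json"); // assumed path

async function writeState(state: LifecycleState): Promise<void> {
  await fs.mkdir(path.dirname(STATE_FILE), { recursive: true });
  await fs.writeFile(STATE_FILE, JSON.stringify(state), "utf-8");
}

export async function markLifecycleStarting(): Promise<"crash" | "clean" | null> {
  let previous: LifecycleState | null = null;
  try {
    previous = JSON.parse(await fs.readFile(STATE_FILE, "utf-8")) as LifecycleState;
  } catch {
    previous = null; // first run, or the state file is missing/unreadable
  }
  const now = Date.now();
  await writeState({ status: "starting", startedAt: now, lastHeartbeat: now });
  if (!previous) return null;
  // A previous status of "running" at startup means the last process never shut down cleanly.
  return previous.status === "running" ? "crash" : "clean";
}

markLifecycleRunning(), markLifecycleStopping(), and markLifecycleStopped() would then just rewrite the status field and refresh lastHeartbeat.
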
Implements a checkpoint system that allows agents to save progress on
long-running tasks. Checkpoints survive crashes and can be used to
resume interrupted work.

New module: src/agents/checkpoint.ts
- saveCheckpoint() - Save task progress
- loadCheckpoint() - Load checkpoint by session ID
- clearCheckpoint() - Remove checkpoint when work complete
- listActiveCheckpoints() - Find all active checkpoints
- updateCheckpoint() - Create or update checkpoint
- findStaleCheckpoints() - Find checkpoints that may need attention

Integration with lifecycle-state.ts:
- On crash detection, lists any interrupted checkpoints
- Logs interrupted work on startup for visibility

This builds on the earlier lifecycle-state module to provide a complete
crash recovery foundation for OpenClaw agents.
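
A minimal sketch of the checkpoint shape and the listing helper named above; the directory and field names are assumptions rather than the actual src/agents/checkpoint.ts:

import { promises as fs } from "node:fs";
import path from "node:path";
import os from "node:os";

// Assumed checkpoint shape; the real interface may carry more fields.
export interface Checkpoint {
  sessionId: string;
  task: string;
  progress: number;  // 0-100
  updatedAt: number; // epoch ms
}

const CHECKPOINT_DIR = path.join(os.homedir(), ".openclaw", "checkpoints"); // assumed location

export async function listActiveCheckpoints(): Promise<Checkpoint[]> {
  let files: string[];
  try {
    files = await fs.readdir(CHECKPOINT_DIR);
  } catch {
    return []; // directory not created yet
  }
  const checkpoints: Checkpoint[] = [];
  for (const file of files.filter((f) => f.endsWith(".json"))) {
    try {
      const raw = await fs.readFile(path.join(CHECKPOINT_DIR, file), "utf-8");
      checkpoints.push(JSON.parse(raw) as Checkpoint);
    } catch {
      // Skip unreadable or partially written entries.
    }
  }
  return checkpoints;
}
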
Jobs created via the cron.add tool/API were silently disabled because
`enabled` was not set by default. The scheduler treated `undefined` as
disabled, so jobs never ran.

This fix adds `enabled: true` to the defaults applied during job creation,
matching the documented behavior that jobs are enabled by default.

Fixes openclaw#6988
@openclaw-barnacle bot added the gateway (Gateway runtime) and agents (Agent runtime and tooling) labels on Feb 4, 2026

@greptile-apps bot left a comment


3 files reviewed, 3 comments


Comment on src/gateway/server-close.ts, lines +35 to +37
// Mark lifecycle as stopping for crash detection
await markLifecycleStopping();


markLifecycleStopping() is awaited before any shutdown work, but markLifecycleStopped() is only reached if all later cleanup steps succeed. If any awaited call throws (e.g. tailscaleCleanup(), stopChannel(), or httpServer.close()), the lifecycle file will stay at stopping and the next startup won’t be classified as a crash (it’ll be unknown). Consider wrapping the shutdown body in try { ... } finally { await markLifecycleStopped(); } (or otherwise ensuring stopped is recorded on clean exits even when some teardown steps fail).
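
A concise sketch of the suggested pattern, using the teardown call names mentioned in the comment; the import path and exact call order are assumptions, and the real shutdown body in server-close.ts likely does more:

// Sketch only.
import { markLifecycleStopping, markLifecycleStopped } from "../infra/lifecycle-state";
import type { Server } from "node:http";

export async function closeGatewayServer(
  httpServer: Server,
  tailscaleCleanup: () => Promise<void>,
  stopChannel: () => Promise<void>,
): Promise<void> {
  await markLifecycleStopping();
  try {
    await tailscaleCleanup();
    await stopChannel();
    await new Promise<void>((resolve, reject) =>
      httpServer.close((err) => (err ? reject(err) : resolve())),
    );
  } finally {
    // Record "stopped" even when a teardown step above throws,
    // so a mostly-clean exit is not left at "stopping" and misread on the next start.
    await markLifecycleStopped();
  }
}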


Comment on src/infra/lifecycle-state.ts, lines +56 to +70
export async function markLifecycleStarting(): Promise<PreviousShutdown | null> {
  const previous = await loadLifecycleState();
  let previousShutdown: PreviousShutdown | null = null;

  if (previous) {
    const baseInfo = {
      time: previous.lastHeartbeat,
      uptime: previous.lastHeartbeat - previous.startedAt,
    };

    if (previous.status === "running") {
      // Previous state was "running" but we're starting up = crash
      // Check for interrupted checkpoints
      const checkpoints = await listActiveCheckpoints();
      const interrupted = checkpoints.filter((cp) => cp.progress < 100);

Crash detection here hinges on previous.status === "running", but nothing ever calls updateLifecycleHeartbeat() in this PR. Since markLifecycleRunning() sets lastHeartbeat only once, the reported crash time/uptime will be misleading for long-running gateways (it will look like it crashed shortly after start). Either wire updateLifecycleHeartbeat() into an existing periodic timer/heartbeat event, or remove the lastHeartbeat/uptime reporting until it’s actually updated regularly.
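
One possible wiring, assuming updateLifecycleHeartbeat() refreshes lastHeartbeat in the state file and there is no existing periodic timer to piggyback on; the import path and interval length are assumptions:

// Sketch only.
import { updateLifecycleHeartbeat } from "../infra/lifecycle-state";

const HEARTBEAT_INTERVAL_MS = 30_000; // assumed; tune to taste

export function startLifecycleHeartbeat(): NodeJS.Timeout {
  const timer = setInterval(() => {
    // Fire-and-forget: a single missed write only makes the reported uptime slightly stale.
    void updateLifecycleHeartbeat().catch(() => {});
  }, HEARTBEAT_INTERVAL_MS);
  timer.unref(); // don't keep the process alive just for heartbeats
  return timer;
}

// Startup: markLifecycleRunning(), then startLifecycleHeartbeat();
// shutdown: clearInterval(timer) before markLifecycleStopping().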


Comment on src/agents/checkpoint.ts, lines +28 to +34
export async function saveCheckpoint(checkpoint: Checkpoint): Promise<void> {
  await fs.mkdir(CHECKPOINT_DIR, { recursive: true });
  const filePath = path.join(CHECKPOINT_DIR, `${checkpoint.sessionId}.json`);
  const tmp = `${filePath}.${process.pid}.tmp`;
  await fs.writeFile(tmp, JSON.stringify(checkpoint, null, 2), "utf-8");
  await fs.rename(tmp, filePath);
}

saveCheckpoint() writes to a temp file and renames it, but if the process crashes between writeFile and rename, listActiveCheckpoints() currently ignores *.tmp files, so those checkpoints become invisible (and markLifecycleStarting() won’t report them as interrupted). If you want crash recovery to be reliable, consider either including .tmp files in listActiveCheckpoints() (best-effort parse) or cleaning them up / using a more robust atomic-write pattern on platforms where rename may fail (e.g. cross-device).
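
A small sketch of one way to handle the orphaned temp files, reusing the assumed CHECKPOINT_DIR and Checkpoint shape from the earlier checkpoint sketch; this is only one reading of the suggestion:

// Sketch only; CHECKPOINT_DIR and Checkpoint come from the checkpoint sketch above.
import { promises as fs } from "node:fs";
import path from "node:path";

export async function sweepOrphanedTempFiles(): Promise<string[]> {
  const recoveredSessionIds: string[] = [];
  let files: string[];
  try {
    files = await fs.readdir(CHECKPOINT_DIR);
  } catch {
    return recoveredSessionIds; // nothing to sweep yet
  }
  for (const file of files.filter((f) => f.endsWith(".tmp"))) {
    const tmpPath = path.join(CHECKPOINT_DIR, file);
    try {
      // Best-effort parse so an interrupted save is still surfaced as interrupted work.
      const checkpoint = JSON.parse(await fs.readFile(tmpPath, "utf-8")) as Checkpoint;
      recoveredSessionIds.push(checkpoint.sessionId);
    } catch {
      // Unreadable partial write: nothing to recover.
    }
    await fs.rm(tmpPath, { force: true }); // clean up either way
  }
  return recoveredSessionIds;
}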



Labels

agents (Agent runtime and tooling), gateway (Gateway runtime)

Development

Successfully merging this pull request may close these issues.

Cron jobs created via API/tool don't run: enabled defaults to undefined instead of true
