Skip to content

feat: auto-prune expired cron run sessions#12588

Closed
Glucksberg wants to merge 1 commit intoopenclaw:mainfrom
Glucksberg:feat/cron-session-pruning-clean
Closed

feat: auto-prune expired cron run sessions#12588
Glucksberg wants to merge 1 commit intoopenclaw:mainfrom
Glucksberg:feat/cron-session-pruning-clean

Conversation

@Glucksberg
Copy link
Copy Markdown
Contributor

@Glucksberg Glucksberg commented Feb 9, 2026

The problem

Every isolated cron job creates a cron:jobId:run:uuid entry in sessions.json that is never cleaned up. If you have 20 jobs running every 15 minutes, that is ~100 orphan entries per day. In a month, 3000+ entries in a JSON file that gets read and written on every session operation. Performance degrades, disk grows, and any tool that lists sessions gets slower over time.

Closes #12289
Related: #11260, #11643, #12374
See also: #12570 (alternative approach)

What this does

Adds a simple garbage collector for expired cron run sessions, following the same pattern as Kubernetes ttlSecondsAfterFinished — persist normally for observability, then clean up after a configurable retention period.

Default: 24 hours. Configurable:

cron:
  sessionRetention: "24h"   # or "7d", "1h", false to disable

Where the lines come from

  • ~35 lines of actual logic (the reaper sweep function)
  • 115 lines total for the reaper module (types, config parsing, throttle, exports)
  • 174 lines of tests (12 tests covering every path)
  • 34 lines of wiring (connecting to the existing cron timer + passing config)

The complexity is in the tests, not the code.

Why it is safe

  1. Uses updateSessionStore which already has file locking — no race conditions
  2. Only deletes keys matching :cron:*:run:* — regular sessions are untouchable
  3. Wrapped in try/catch — if the reaper fails, it logs a warning and moves on. Zero impact on job execution
  4. Conservative default (24h) — plenty of time for debugging recent runs
  5. Can be fully disabled with sessionRetention: false
  6. Zero breaking changes — existing setups get automatic cleanup with no config needed

How it works

The reaper piggybacks on the cron timer tick that already runs every ~1 minute. It self-throttles to sweep at most every 5 minutes to avoid unnecessary I/O. On each sweep it reads the session store, deletes expired cron:*:run:* entries, and writes back. That is it.

What I tested

Manually tested on a live production gateway:

  1. Baseline: 161 sessions in sessions.json, 156 of them orphaned cron:*:run:* entries (oldest 272 minutes old)
  2. Config: Set cron.sessionRetention: "2m" to accelerate testing (instead of waiting 24h)
  3. Test cron job: Created a lightweight job running every 1 minute (Say "ping" via Sonnet, delivery: none) to generate new sessions and trigger the reaper
  4. Result: After gateway restart with the new build, the reaper cleaned 148 expired sessions (161 → 13). The remaining 8 run sessions were all < 3 minutes old (within the 2-minute retention window)
  5. Cleanup: Removed the test cron job, restored sessionRetention: "24h"

The reaper correctly:

  • Pruned all sessions older than the retention period
  • Preserved sessions within the retention window
  • Left non-cron sessions untouched (1 main session preserved)
  • Left cron base sessions intact (4 base sessions preserved)
  • Ran silently without impacting cron job execution

Comparison with #12570

Both PRs solve the same core problem. Here is a direct comparison:

This PR (#12588) #12570
Strategy TTL-based reaper (delete after retention expires) Immediate delete after run + 10min backup sweeper
Default behavior Keep sessions for 24h, then prune Delete immediately on completion
Observability Recent runs visible for debugging via sessions_list Sessions gone immediately — no post-mortem window
Config cron.sessionRetention: "24h" / "7d" / false Per-job cleanup: "delete" / "keep"
Scope Single config for all cron jobs Per-job control, but no global TTL
Multi-agent stores Path resolved from config, passed explicitly Sweeper only scans default agent store (Greptile finding)
Key matching Regex :cron:[^:]+:run: (anchored to cron) /:run:/ pattern (may match non-cron sessions — Greptile finding)
Files changed 7 files, +326 / -0 9 files, +361 / -3
Tests 12 new, 100/100 cron suite 7 new, 252/252 full suite
Greptile score 3/5 (both findings addressed in follow-up) 3/5 (2 open findings: wrong-store sweep + broad key match)

Key tradeoffs:

@openclaw-barnacle openclaw-barnacle bot added the gateway Gateway runtime label Feb 9, 2026
Copy link
Copy Markdown
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

Comment on lines +73 to +75
const storePath =
params.sessionStorePath ?? resolveStorePath(undefined, { agentId: params.agentId });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignores configured session store
resolveStorePath(undefined, { agentId }) hardcodes the default session store path when sessionStorePath isn’t provided, ignoring any configured cfg.session.store. If sweepCronRunSessions is ever called without sessionStorePath (it already accepts agentId for this), pruning will operate on the wrong sessions.json.

This is currently masked by timer.ts always passing state.deps.sessionStorePath, but the fallback path is incorrect for the function’s API contract.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/session-reaper.ts
Line: 73:75

Comment:
**Ignores configured session store**
`resolveStorePath(undefined, { agentId })` hardcodes the default session store path when `sessionStorePath` isn’t provided, ignoring any configured `cfg.session.store`. If `sweepCronRunSessions` is ever called without `sessionStorePath` (it already accepts `agentId` for this), pruning will operate on the wrong `sessions.json`.

This is currently masked by `timer.ts` always passing `state.deps.sessionStorePath`, but the fallback path is incorrect for the function’s API contract.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +277 to +289
// Piggyback session reaper on timer tick (self-throttled to every 5 min).
if (state.deps.sessionStorePath) {
try {
await sweepCronRunSessions({
cronConfig: state.deps.cronConfig,
sessionStorePath: state.deps.sessionStorePath,
nowMs: state.deps.nowMs(),
log: state.deps.log,
});
} catch (err) {
state.deps.log.warn({ err: String(err) }, "cron: session reaper sweep failed");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reaper runs outside lock
onTimer() calls sweepCronRunSessions() after leaving the locked(state, ...) sections. sweepCronRunSessions uses updateSessionStore (session-store lock), but cron’s own locked() may not coordinate with that. If any cron job (or other code path) touches sessions.json while locked() is held, you can end up with avoidable contention/ordering inversions (cron store lock vs session store lock).

If the intent is “safe + low overhead,” consider invoking the sweep in a consistent lock order (or document/enforce that cron code never holds locked() while operating on sessions).

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 277:289

Comment:
**Reaper runs outside lock**
`onTimer()` calls `sweepCronRunSessions()` after leaving the `locked(state, ...)` sections. `sweepCronRunSessions` uses `updateSessionStore` (session-store lock), but cron’s own `locked()` may not coordinate with that. If any cron job (or other code path) touches `sessions.json` while `locked()` is held, you can end up with avoidable contention/ordering inversions (cron store lock vs session store lock).

If the intent is “safe + low overhead,” consider invoking the sweep in a consistent lock order (or document/enforce that cron code never holds `locked()` while operating on sessions).

How can I resolve this? If you propose a fix, please make it concise.

@Glucksberg
Copy link
Copy Markdown
Contributor Author

Addressed both Greptile findings:

1. Fallback path (session-reaper.ts): Removed the fallback entirely. sessionStorePath is now a required parameter — the caller (server-cron.ts) resolves the correct path from config before passing it in. No ambiguity about which sessions.json gets swept.

2. Lock ordering (timer.ts): This is by design. The sweep runs AFTER all locked(state) sections have been released in onTimer(). The two locks (cron store lock vs session-store file lock) are never held simultaneously, so there's no inversion risk. Added a doc comment in sweepCronRunSessions() to make this explicit.

@Takhoffman
Copy link
Copy Markdown
Contributor

A comprehensive solution at the pi level would be ideal. I am asking bad logic if this is something that pi can solve for us. Related: #12899

Add TTL-based reaper for isolated cron run sessions that accumulate
indefinitely in sessions.json.

New config option:
  cron.sessionRetention: string | false  (default: '24h')

The reaper runs piggy-backed on the cron timer tick, self-throttled
to sweep at most every 5 minutes. It removes session entries matching
the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now.

Design follows the Kubernetes ttlSecondsAfterFinished pattern:
- Sessions are persisted normally (observability/debugging)
- A periodic reaper prunes expired entries
- Configurable retention with sensible default
- Set to false to disable pruning entirely

Files changed:
- src/config/types.cron.ts: Add sessionRetention to CronConfig
- src/config/zod-schema.ts: Add Zod validation for sessionRetention
- src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions)
- src/cron/session-reaper.test.ts: 12 tests covering all paths
- src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps
- src/cron/service/timer.ts: Wire reaper into onTimer tick
- src/gateway/server-cron.ts: Pass config and session store path to deps

Closes openclaw#12289
@gumadeiras
Copy link
Copy Markdown
Member

Fix landed on main via e19a235.

Thanks @Glucksberg !

@gumadeiras gumadeiras closed this Feb 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Isolated cron sessions accumulate indefinitely — no built-in pruning

3 participants