feat: auto-prune expired cron run sessions#12588
feat: auto-prune expired cron run sessions#12588Glucksberg wants to merge 1 commit intoopenclaw:mainfrom
Conversation
src/cron/session-reaper.ts
Outdated
| const storePath = | ||
| params.sessionStorePath ?? resolveStorePath(undefined, { agentId: params.agentId }); | ||
|
|
There was a problem hiding this comment.
Ignores configured session store
resolveStorePath(undefined, { agentId }) hardcodes the default session store path when sessionStorePath isn’t provided, ignoring any configured cfg.session.store. If sweepCronRunSessions is ever called without sessionStorePath (it already accepts agentId for this), pruning will operate on the wrong sessions.json.
This is currently masked by timer.ts always passing state.deps.sessionStorePath, but the fallback path is incorrect for the function’s API contract.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/session-reaper.ts
Line: 73:75
Comment:
**Ignores configured session store**
`resolveStorePath(undefined, { agentId })` hardcodes the default session store path when `sessionStorePath` isn’t provided, ignoring any configured `cfg.session.store`. If `sweepCronRunSessions` is ever called without `sessionStorePath` (it already accepts `agentId` for this), pruning will operate on the wrong `sessions.json`.
This is currently masked by `timer.ts` always passing `state.deps.sessionStorePath`, but the fallback path is incorrect for the function’s API contract.
How can I resolve this? If you propose a fix, please make it concise.| // Piggyback session reaper on timer tick (self-throttled to every 5 min). | ||
| if (state.deps.sessionStorePath) { | ||
| try { | ||
| await sweepCronRunSessions({ | ||
| cronConfig: state.deps.cronConfig, | ||
| sessionStorePath: state.deps.sessionStorePath, | ||
| nowMs: state.deps.nowMs(), | ||
| log: state.deps.log, | ||
| }); | ||
| } catch (err) { | ||
| state.deps.log.warn({ err: String(err) }, "cron: session reaper sweep failed"); | ||
| } | ||
| } |
There was a problem hiding this comment.
Reaper runs outside lock
onTimer() calls sweepCronRunSessions() after leaving the locked(state, ...) sections. sweepCronRunSessions uses updateSessionStore (session-store lock), but cron’s own locked() may not coordinate with that. If any cron job (or other code path) touches sessions.json while locked() is held, you can end up with avoidable contention/ordering inversions (cron store lock vs session store lock).
If the intent is “safe + low overhead,” consider invoking the sweep in a consistent lock order (or document/enforce that cron code never holds locked() while operating on sessions).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service/timer.ts
Line: 277:289
Comment:
**Reaper runs outside lock**
`onTimer()` calls `sweepCronRunSessions()` after leaving the `locked(state, ...)` sections. `sweepCronRunSessions` uses `updateSessionStore` (session-store lock), but cron’s own `locked()` may not coordinate with that. If any cron job (or other code path) touches `sessions.json` while `locked()` is held, you can end up with avoidable contention/ordering inversions (cron store lock vs session store lock).
If the intent is “safe + low overhead,” consider invoking the sweep in a consistent lock order (or document/enforce that cron code never holds `locked()` while operating on sessions).
How can I resolve this? If you propose a fix, please make it concise.|
Addressed both Greptile findings: 1. Fallback path (session-reaper.ts): Removed the fallback entirely. 2. Lock ordering (timer.ts): This is by design. The sweep runs AFTER all |
|
A comprehensive solution at the pi level would be ideal. I am asking bad logic if this is something that pi can solve for us. Related: #12899 |
Add TTL-based reaper for isolated cron run sessions that accumulate indefinitely in sessions.json. New config option: cron.sessionRetention: string | false (default: '24h') The reaper runs piggy-backed on the cron timer tick, self-throttled to sweep at most every 5 minutes. It removes session entries matching the pattern cron:<jobId>:run:<uuid> whose updatedAt + retention < now. Design follows the Kubernetes ttlSecondsAfterFinished pattern: - Sessions are persisted normally (observability/debugging) - A periodic reaper prunes expired entries - Configurable retention with sensible default - Set to false to disable pruning entirely Files changed: - src/config/types.cron.ts: Add sessionRetention to CronConfig - src/config/zod-schema.ts: Add Zod validation for sessionRetention - src/cron/session-reaper.ts: New reaper module (sweepCronRunSessions) - src/cron/session-reaper.test.ts: 12 tests covering all paths - src/cron/service/state.ts: Add cronConfig/sessionStorePath to deps - src/cron/service/timer.ts: Wire reaper into onTimer tick - src/gateway/server-cron.ts: Pass config and session store path to deps Closes openclaw#12289
|
Fix landed on main via e19a235. Thanks @Glucksberg ! |
The problem
Every isolated cron job creates a
cron:jobId:run:uuidentry insessions.jsonthat is never cleaned up. If you have 20 jobs running every 15 minutes, that is ~100 orphan entries per day. In a month, 3000+ entries in a JSON file that gets read and written on every session operation. Performance degrades, disk grows, and any tool that lists sessions gets slower over time.Closes #12289
Related: #11260, #11643, #12374
See also: #12570 (alternative approach)
What this does
Adds a simple garbage collector for expired cron run sessions, following the same pattern as Kubernetes
ttlSecondsAfterFinished— persist normally for observability, then clean up after a configurable retention period.Default: 24 hours. Configurable:
Where the lines come from
The complexity is in the tests, not the code.
Why it is safe
updateSessionStorewhich already has file locking — no race conditions:cron:*:run:*— regular sessions are untouchablesessionRetention: falseHow it works
The reaper piggybacks on the cron timer tick that already runs every ~1 minute. It self-throttles to sweep at most every 5 minutes to avoid unnecessary I/O. On each sweep it reads the session store, deletes expired
cron:*:run:*entries, and writes back. That is it.What I tested
Manually tested on a live production gateway:
sessions.json, 156 of them orphanedcron:*:run:*entries (oldest 272 minutes old)cron.sessionRetention: "2m"to accelerate testing (instead of waiting 24h)Say "ping"via Sonnet, delivery: none) to generate new sessions and trigger the reapersessionRetention: "24h"The reaper correctly:
Comparison with #12570
Both PRs solve the same core problem. Here is a direct comparison:
sessions_listcron.sessionRetention: "24h" / "7d" / falsecleanup: "delete" / "keep":cron:[^:]+:run:(anchored to cron)/:run:/pattern (may match non-cron sessions — Greptile finding)Key tradeoffs: