Closed
Bug Description
cron.list (and other cron read operations) are blocked for 12–421 seconds because they share a promise-chain lock with executeJob, which holds the lock while waiting for LLM agent turns to complete.
Evidence (gateway logs)
# Normal WS calls during the same period: 50-80ms
2026-02-06T09:19:43.731Z [ws] ⇄ res ✓ chat.history 62ms
2026-02-06T09:21:08.788Z [ws] ⇄ res ✓ chat.history 55ms
# cron.list calls: 12,000-421,000ms
2026-02-06T09:01:33.299Z [ws] ⇄ res ✓ cron.list 77920ms
2026-02-06T10:31:38.411Z [ws] ⇄ res ✓ cron.list 89381ms
2026-02-06T11:01:41.640Z [ws] ⇄ res ✓ cron.list 80715ms
2026-02-06T07:37:24.814Z [ws] ⇄ res ✓ cron.list 421122ms
# gateway.err.log — 15 consecutive timeouts
2026-02-05T20:01:10.313Z [tools] cron failed: gateway timeout after 60000ms
2026-02-06T03:01:12.619Z [tools] cron failed: gateway timeout after 60000ms
2026-02-06T05:01:05.948Z [tools] cron failed: gateway timeout after 60000ms
... (15 total)
All slow cron.list calls occur at :00 / :30 — exactly when heartbeat cron jobs fire.
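The timing claim can be checked against the log lines above. The sketch below assumes each logged timestamp marks when the response was sent and that the reported duration covers the whole call; the issue does not state either. The helper names estimatedStart and secondsPastHalfHour are illustrative only.

```typescript
// Entries copied from the gateway log above: response timestamp and reported duration.
const slowCalls: Array<{ respondedAt: string; durationMs: number }> = [
  { respondedAt: "2026-02-06T09:01:33.299Z", durationMs: 77920 },
  { respondedAt: "2026-02-06T10:31:38.411Z", durationMs: 89381 },
  { respondedAt: "2026-02-06T11:01:41.640Z", durationMs: 80715 },
  { respondedAt: "2026-02-06T07:37:24.814Z", durationMs: 421122 },
];

// Assumption: subtracting the duration from the response time approximates
// the moment the request was issued.
function estimatedStart(respondedAt: string, durationMs: number): Date {
  return new Date(Date.parse(respondedAt) - durationMs);
}

// Seconds elapsed since the most recent :00 or :30 boundary (UTC).
function secondsPastHalfHour(d: Date): number {
  return (d.getUTCMinutes() % 30) * 60 + d.getUTCSeconds();
}

for (const call of slowCalls) {
  const start = estimatedStart(call.respondedAt, call.durationMs);
  console.log(start.toISOString(), `+${secondsPastHalfHour(start)}s past the half hour`);
}
```

Under that assumption, all four calls were issued within 30 seconds after a :00 or :30 boundary, which matches the heartbeat schedule.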
Root Cause Analysis
In src/cron/service/locked.ts, all cron operations share a single promise-chain lock:
// Excerpt: storePath, keepAlive, and resolveChain are not shown here.
export async function locked<T>(state: CronServiceState, fn: () => Promise<T>): Promise<T> {
  const storeOp = storeLocks.get(storePath) ?? Promise.resolve();
  const next = Promise.all([resolveChain(state.op), resolveChain(storeOp)]).then(fn);
  state.op = keepAlive;
  storeLocks.set(storePath, keepAlive);
  return (await next) as T;
}

In src/cron/service/timer.ts, runDueJobs holds this lock while sequentially awaiting each agent turn:
// Inside onTimer() → locked() scope:
async function runDueJobs(state: CronServiceState) {
  for (const job of due) {
    await executeJob(state, job, now, { forced: false }); // blocks lock for 90s+ per job
  }
}

And executeJob awaits the full LLM agent turn inside the lock:

const res = await state.deps.runIsolatedAgentJob({...}); // 60-300s for LLM completion

Since cron.list also acquires the same lock:
export async function list(state, opts?) {
  return await locked(state, async () => { // ← blocked until executeJob releases
    await ensureLoaded(state);
    return jobs.toSorted(...);
  });
}

Result: A simple read-only list operation waits for ALL due jobs to finish executing sequentially.
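The blocking behaviour can be reproduced in isolation. The following is a minimal sketch, not the OpenClaw implementation: it reduces the lock to a single promise chain, and the names chain, sleep, listLatencyMs, and demo are invented for illustration. Agent turns are scaled down from roughly 90s to 50ms.

```typescript
// Hypothetical stand-in for the shared lock: each caller queues behind the previous one.
let chain: Promise<unknown> = Promise.resolve();

function locked<T>(fn: () => Promise<T>): Promise<T> {
  const next = chain.then(fn);
  // Keep the chain usable even if fn rejects, so later callers still run.
  chain = next.catch(() => undefined);
  return next;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for runDueJobs: holds the lock while "agent turns" run back to back.
function runDueJobs(jobCount: number, turnMs: number): Promise<void> {
  return locked(async () => {
    for (let i = 0; i < jobCount; i++) {
      await sleep(turnMs); // scaled-down agent turn
    }
  });
}

// Stand-in for cron.list: a cheap read that still has to queue for the lock.
async function listLatencyMs(): Promise<number> {
  const startedAt = Date.now();
  await locked(async () => ["job-a", "job-b"]);
  return Date.now() - startedAt;
}

async function demo(): Promise<number> {
  const running = runDueJobs(4, 50); // 4 jobs x 50ms each
  const waited = await listLatencyMs(); // queued behind all four jobs
  await running;
  return waited;
}

demo().then((ms) => console.log(`list waited ~${ms}ms behind the running jobs`));
```

The read itself does no work, yet its latency is roughly the sum of all job durations queued ahead of it, which is the same shape as the 421s observation above.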
Impact
- Agent tool calls to cron.list time out at the 60s default
- cron.update is also blocked (observed 29s delay)
- With multiple jobs due simultaneously, lock hold time compounds (421s observed = ~4 jobs × 90s each)
- Any cron management during active job execution is effectively impossible
Suggested Fix
Separate the execution lock from the state-read lock. Options:
- Read-write lock: cron.list / cron.status take a read lock that doesn't block on execution
- Execute outside lock: executeJob should release the lock before awaiting runIsolatedAgentJob, then re-acquire it afterwards to update state
- Snapshot for reads: cron.list reads a snapshot of the job array without acquiring the execution lock
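A rough sketch of the "Execute outside lock" and "Snapshot for reads" options, under heavy assumptions: the Job type, the in-memory jobs array, the single-chain locked helper, and the names executeJobOutsideLock and listSnapshot are all hypothetical and do not mirror the real CronServiceState or store handling.

```typescript
type Job = { id: string; lastStatus?: "ok" | "error" };

// Hypothetical in-memory state standing in for the cron store.
const jobs: Job[] = [{ id: "heartbeat" }, { id: "digest" }];
let chain: Promise<unknown> = Promise.resolve();

// Same promise-chain idea as locked(), reduced to a single chain.
function locked<T>(fn: () => Promise<T>): Promise<T> {
  const next = chain.then(fn);
  chain = next.catch(() => undefined);
  return next;
}

// "Execute outside lock": the long await runs with no lock held, and the
// lock is taken only briefly afterwards to record the outcome.
async function executeJobOutsideLock(
  jobId: string,
  runTurn: (jobId: string) => Promise<void>,
): Promise<void> {
  let outcome: "ok" | "error" = "ok";
  try {
    await runTurn(jobId); // stand-in for runIsolatedAgentJob
  } catch {
    outcome = "error";
  }
  await locked(async () => {
    // Look the job up again: it may have been edited or removed meanwhile.
    const current = jobs.find((j) => j.id === jobId);
    if (current) current.lastStatus = outcome;
  });
}

// "Snapshot for reads": copy the job array without touching the lock.
function listSnapshot(): Job[] {
  return jobs.map((j) => ({ ...j }));
}
```

With this shape, a read issued during a long agent turn no longer queues behind it. The trade-off is that state can change while the turn is in flight, so a real fix would likely also need a per-job in-flight marker to avoid running the same job twice.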
Environment
- OpenClaw version: 2026.2.4
- 28 active cron jobs
- Heartbeat schedule: */30 10-20 * * 1-5 (work hours) + 0 0-9,21-23 * * * (off-hours)
- Typical agent turn duration: 60-120s