Cron: disable one-shot at jobs after any terminal status to prevent retry storms (#11438, #11452) #11459
Closed
lailoo wants to merge 2 commits into openclaw:main from
Real-environment verification
Reproduced the retry storm on
The fix correctly disables one-shot
Summary
Fixes #11438
Fixes #11452
One-shot `at` cron jobs that complete with a non-`"ok"` status (e.g. `"error"`, `"skipped"`) enter an infinite retry loop because `computeJobNextRunAtMs` returns the original (past) `atMs`, causing `findDueJobs` to immediately re-trigger the job. This generates 1,900+ failed attempts in 5 seconds and pegs the gateway at 100% CPU.

Problem
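To make the loop concrete, here is a minimal, self-contained sketch of the failure mode. Only the names `computeJobNextRunAtMs` and `findDueJobs` come from this PR; their bodies, the `AtJob` shape, and the helpers `finishRunBeforeFix` and `simulateTimerTicks` are hypothetical simplifications, not the code in `src/cron/service/timer.ts`.

```typescript
// Hypothetical, simplified shapes; the real cron types are not shown in this PR.
type Status = "ok" | "error" | "skipped";

interface AtJob {
  id: string;
  atMs: number; // one-shot fire time
  enabled: boolean;
  nextRunAtMs: number | undefined;
}

// Assumed behavior, per the PR text: for an `at` job the "next" run is always
// the original atMs, even when that time is already in the past.
function computeJobNextRunAtMs(job: AtJob, _nowMs: number): number | undefined {
  return job.enabled ? job.atMs : undefined;
}

// A job is due when it is enabled and its next run time has been reached.
function findDueJobs(jobs: AtJob[], nowMs: number): AtJob[] {
  return jobs.filter(
    (j) => j.enabled && j.nextRunAtMs !== undefined && j.nextRunAtMs <= nowMs,
  );
}

// Pre-fix post-run handling (assumed): only "ok" disables the job.
function finishRunBeforeFix(job: AtJob, status: Status, endedAt: number): void {
  if (status === "ok") {
    job.enabled = false;
    job.nextRunAtMs = undefined;
  } else {
    job.nextRunAtMs = computeJobNextRunAtMs(job, endedAt);
  }
}

// Bounded stand-in for the timer loop: counts how often the job is re-run.
function simulateTimerTicks(job: AtJob, status: Status, maxTicks: number): number {
  let runs = 0;
  let nowMs = job.atMs;
  for (let tick = 0; tick < maxTicks; tick++) {
    if (findDueJobs([job], nowMs).length === 0) break;
    runs++;
    nowMs += 1; // pretend each run takes 1 ms
    finishRunBeforeFix(job, status, nowMs);
  }
  return runs;
}

const failingJob: AtJob = { id: "demo-error", atMs: 1000, enabled: true, nextRunAtMs: 1000 };
const okJob: AtJob = { id: "demo-ok", atMs: 1000, enabled: true, nextRunAtMs: 1000 };

const failingRuns = simulateTimerTicks(failingJob, "error", 50);
const okRuns = simulateTimerTicks(okJob, "ok", 50);
console.log({ failingRuns, okRuns });
```

In this sketch the failing job is re-run on every tick until the cap is hit, while the `"ok"` job runs once and is disabled. The tick cap exists only so the demonstration terminates.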
When an `at` job finishes with any non-`"ok"` status:
- `onTimer` calls `computeJobNextRunAtMs(job, result.endedAt)`
- For `at` jobs, this returns the original `atMs` (already in the past)
- `armTimer` fires immediately, and `findDueJobs` picks up the job again

Solution
Disable one-shot `at` jobs after any terminal status (`ok`, `error`, `skipped`), not just `ok`. One-shot jobs are inherently run-once: if the job fails or is skipped, the user can manually re-enable or re-create it.
This applies to both the `onTimer` (timer-triggered) and `executeJob` (`cron run` command) code paths.

Changes
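As a rough illustration of the rule stated under Solution, the post-run handling could look like the sketch below. The `CronJobSketch` type, the `"every"` schedule kind, and the `applyRunResult` helper are assumptions made for the example; the actual change lives in `src/cron/service/timer.ts` and may be structured differently.

```typescript
type TerminalStatus = "ok" | "error" | "skipped";

// Hypothetical job shape; the "every" variant is included only for contrast.
interface CronJobSketch {
  schedule: { kind: "at"; atMs: number } | { kind: "every"; everyMs: number };
  enabled: boolean;
  nextRunAtMs: number | undefined;
  lastStatus: TerminalStatus | undefined;
}

// Post-run bookkeeping: a one-shot `at` job is disabled after ANY terminal
// status, so it can never be picked up as "due" again.
function applyRunResult(job: CronJobSketch, status: TerminalStatus, endedAt: number): void {
  job.lastStatus = status;
  const schedule = job.schedule;
  if (schedule.kind === "at") {
    job.enabled = false;
    job.nextRunAtMs = undefined;
    return;
  }
  // Recurring jobs (assumed shape) keep running regardless of status.
  job.nextRunAtMs = endedAt + schedule.everyMs;
}

const statuses: TerminalStatus[] = ["ok", "error", "skipped"];
const enabledAfterRun = statuses.map((status) => {
  const job: CronJobSketch = {
    schedule: { kind: "at", atMs: 1000 },
    enabled: true,
    nextRunAtMs: 1000,
    lastStatus: undefined,
  };
  applyRunResult(job, status, 1001);
  return job.enabled;
});

const recurringJob: CronJobSketch = {
  schedule: { kind: "every", everyMs: 60000 },
  enabled: true,
  nextRunAtMs: 5000,
  lastStatus: undefined,
};
applyRunResult(recurringJob, "error", 5000);

console.log(enabledAfterRun, recurringJob.nextRunAtMs);
```

Sharing one helper between the timer path and the manual-run path is a design choice of this sketch: it keeps the two paths from drifting apart, which is the kind of divergence the PR addresses by applying the rule to both `onTimer` and `executeJob`.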
- `src/cron/service/timer.ts`: Simplify `at` job post-execution to disable on any terminal status
- `src/cron/service.runs-one-shot-main-job-disables-it.test.ts`: Add regression test
- `CHANGELOG.md`: Add entry referencing both issues

Testing
- Test command: `pnpm vitest run src/cron/`
- `main` (before fix): `true` (still retrying)
- `fix/cron-at-retry-storm-11438`: `false` (disabled)