fix: clamp cron timer delay to 60s as drift guard#10580
Closed
zhiyuanw101 wants to merge 1 commit intoopenclaw:mainfrom
Closed
fix: clamp cron timer delay to 60s as drift guard#10580zhiyuanw101 wants to merge 1 commit intoopenclaw:mainfrom
zhiyuanw101 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Comment on lines
34
to
36
| afterEach(() => { | ||
| vi.useRealTimers(); | ||
| }); |
Contributor
There was a problem hiding this comment.
Cleanup skipped on failure
Both tests call cron.stop()/store.cleanup() at the end of the test body, so if an assertion throws earlier the temp dir is leaked and the CronService may keep timers open until afterEach runs. This can make the suite flaky over time (especially when running individual tests repeatedly). Consider moving cron.stop() + store.cleanup() into afterEach or a try/finally around the test body so cleanup always happens even on failure.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/cron/service.drift-guard-catches-overdue-jobs.test.ts
Line: 34:36
Comment:
**Cleanup skipped on failure**
Both tests call `cron.stop()`/`store.cleanup()` at the end of the test body, so if an assertion throws earlier the temp dir is leaked and the CronService may keep timers open until `afterEach` runs. This can make the suite flaky over time (especially when running individual tests repeatedly). Consider moving `cron.stop()` + `store.cleanup()` into `afterEach` or a `try/finally` around the test body so cleanup always happens even on failure.
How can I resolve this? If you propose a fix, please make it concise.The cron scheduler relies solely on setTimeout to trigger job execution. If the timer is silently lost (macOS sleep/wake, GC pause, process throttling), no recovery mechanism exists until an external API call happens to re-arm the timer. Clamp the maximum setTimeout delay in armTimer() to 60 seconds. When the timer fires before a job is due, onTimer() finds no overdue jobs, does nothing, and re-arms — effectively polling every 60s. When a job IS overdue (because the previous timer was lost), onTimer() catches it within at most 60 seconds of the missed slot. This is simpler and more robust than a separate setInterval drift guard: - No second timer to manage or leak - No interaction issues with fake timers in tests - Same setTimeout + re-arm pattern already used by the scheduler - Minimal overhead (one timer wake per minute when idle)
801fc4d to
a3061c2
Compare
Member
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Fixes #10564
The cron scheduler relies solely on
setTimeoutto trigger job execution. If the timer is silently lost — due to macOS sleep/wake, GC pause, process throttling, or any other runtime disruption — no recovery mechanism exists. The missed job slot is never executed until an external API call (e.g.cron list) happens to callensureLoaded→recomputeNextRuns→armTimer.Observed behavior: A cron job scheduled for 7:00 AM PST (
0 7-19 * * 1-5) did not fire until manually triggered 2+ hours later. Gateway logs showed the timer was correctly armed after job creation but never fired, likely due to macOS power management suspending the Node.js event loop during sleep.Solution
Clamp the maximum
setTimeoutdelay inarmTimer()to 60 seconds viaDRIFT_GUARD_MAX_DELAY_MS. This means:onTimer()finds no overdue jobs, does nothing, and re-armsThis is simpler and more robust than a separate
setIntervaldrift guard:setTimeout+ re-arm pattern already used by the schedulerChanges
src/cron/service/timer.ts: AddDRIFT_GUARD_MAX_DELAY_MS = 60_000constant, clamp delay inarmTimer()src/cron/service.drift-guard-catches-overdue-jobs.test.ts: Two new tests validating the drift guard behaviorTest Results
All 60 tests pass (58 existing + 2 new):
Greptile Overview
Greptile Summary
setTimeoutdelays inarmTimer()to a maximum of 60s, so overdue jobs can be detected even if the original timer was lost (sleep/GC/throttling).DRIFT_GUARD_MAX_DELAY_MSand updates timer arming to use the minimum of computed delay, Node’s max timeout, and the drift-guard cap.Confidence Score: 4/5
(2/5) Greptile learns from your feedback when you react with thumbs up/down!