fix(cron): implement true LRU cache eviction#39706
fix(cron): implement true LRU cache eviction#39706liangruochong44-ui wants to merge 2 commits intoopenclaw:mainfrom
Conversation
- Add WAL journal mode to memory SQLite database (fixes openclaw#36035) WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write - Add transient SQLite error handling (fixes openclaw#34678) Classify SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR as non-fatal errors to prevent LaunchAgent crash loops
The cronEvalCache was using FIFO eviction (first inserted = first evicted), but Map.get() doesn't reorder entries. This caused frequently accessed entries to be prematurely evicted. Fix: On cache hit, delete and re-set the entry to move it to the end of Map iteration order, implementing true LRU behavior. Fixes openclaw#39679
Greptile SummaryThis PR bundles three distinct changes under the cron LRU cache fix title: (1) the stated LRU fix in
Confidence Score: 2/5
Last reviewed commit: 0a9c952 |
| "SQLITE_CANTOPEN", | ||
| "SQLITE_BUSY", | ||
| "SQLITE_LOCKED", | ||
| "SQLITE_IOERR", | ||
| ]); |
There was a problem hiding this comment.
SQLITE_CANTOPEN and SQLITE_IOERR are not truly transient
SQLITE_CANTOPEN typically indicates a permanent condition — the database file or its parent directory doesn't exist, or the process lacks permissions that won't fix themselves. Classifying it as transient means the gateway will silently continue running while unable to access its database, causing silent data-loss or reads returning stale data instead of crashing-fast.
SQLITE_IOERR is similarly dangerous: it's the parent code for disk-level I/O failures (SQLITE_IOERR_READ, SQLITE_IOERR_WRITE, SQLITE_IOERR_FSYNC, etc.) that can indicate disk corruption or hardware failure — not temporary contention. Swallowing it prevents the operator from knowing the database is in a bad state.
Consider narrowing the set to only codes that are genuinely retryable:
| "SQLITE_CANTOPEN", | |
| "SQLITE_BUSY", | |
| "SQLITE_LOCKED", | |
| "SQLITE_IOERR", | |
| ]); | |
| const TRANSIENT_SQLITE_CODES = new Set([ | |
| "SQLITE_BUSY", | |
| "SQLITE_LOCKED", | |
| ]); |
If SQLITE_CANTOPEN or SQLITE_IOERR must be handled at all, they should at minimum not be suppressed silently — they warrant a distinct code-path that either retries with backoff or shuts down with a clear error.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 72-76
Comment:
**`SQLITE_CANTOPEN` and `SQLITE_IOERR` are not truly transient**
`SQLITE_CANTOPEN` typically indicates a permanent condition — the database file or its parent directory doesn't exist, or the process lacks permissions that won't fix themselves. Classifying it as transient means the gateway will silently continue running while unable to access its database, causing silent data-loss or reads returning stale data instead of crashing-fast.
`SQLITE_IOERR` is similarly dangerous: it's the parent code for disk-level I/O failures (`SQLITE_IOERR_READ`, `SQLITE_IOERR_WRITE`, `SQLITE_IOERR_FSYNC`, etc.) that can indicate disk corruption or hardware failure — not temporary contention. Swallowing it prevents the operator from knowing the database is in a bad state.
Consider narrowing the set to only codes that are genuinely retryable:
```suggestion
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_BUSY",
"SQLITE_LOCKED",
]);
```
If `SQLITE_CANTOPEN` or `SQLITE_IOERR` must be handled at all, they should at minimum not be suppressed silently — they warrant a distinct code-path that either retries with backoff or shuts down with a clear error.
How can I resolve this? If you propose a fix, please make it concise.| const rawMessage = (candidate as { message?: unknown }).message; | ||
| const message = typeof rawMessage === "string" ? rawMessage : ""; | ||
| if (message.includes("SQLITE_CANTOPEN") || message.includes("SQLITE_BUSY") || | ||
| message.includes("SQLITE_LOCKED") || message.includes("SQLITE_IOERR")) { | ||
| return true; |
There was a problem hiding this comment.
Message check skips case normalization
Unlike isTransientNetworkError (line 186), which calls .toLowerCase().trim() before matching, the message here is used as-is and matched against uppercase literals. If a SQLite binding ever surfaces a lowercase variant (e.g. "sqlite_busy") or mixed-case message, these checks would silently miss it.
| const rawMessage = (candidate as { message?: unknown }).message; | |
| const message = typeof rawMessage === "string" ? rawMessage : ""; | |
| if (message.includes("SQLITE_CANTOPEN") || message.includes("SQLITE_BUSY") || | |
| message.includes("SQLITE_LOCKED") || message.includes("SQLITE_IOERR")) { | |
| return true; | |
| const message = typeof rawMessage === "string" ? rawMessage.toUpperCase() : ""; | |
| if (message.includes("SQLITE_CANTOPEN") || message.includes("SQLITE_BUSY") || | |
| message.includes("SQLITE_LOCKED") || message.includes("SQLITE_IOERR")) { |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 234-238
Comment:
**Message check skips case normalization**
Unlike `isTransientNetworkError` (line 186), which calls `.toLowerCase().trim()` before matching, the message here is used as-is and matched against uppercase literals. If a SQLite binding ever surfaces a lowercase variant (e.g. `"sqlite_busy"`) or mixed-case message, these checks would silently miss it.
```suggestion
const message = typeof rawMessage === "string" ? rawMessage.toUpperCase() : "";
if (message.includes("SQLITE_CANTOPEN") || message.includes("SQLITE_BUSY") ||
message.includes("SQLITE_LOCKED") || message.includes("SQLITE_IOERR")) {
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a9c952f7b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const db = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled }); | ||
| // WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write. | ||
| // It also allows concurrent reads during writes. | ||
| db.exec("PRAGMA journal_mode = WAL"); |
There was a problem hiding this comment.
Set busy_timeout before enabling WAL mode
Running PRAGMA journal_mode = WAL before PRAGMA busy_timeout = 5000 makes the first lock-sensitive statement execute with SQLite's default 0ms timeout, so a brief concurrent writer can cause immediate database is locked failures during DB open/reopen. This regresses the retry behavior described in the nearby comment ("retry instead of failing immediately"), and can make startup/reindexing flaky under normal multi-process contention.
Useful? React with 👍 / 👎.
| if (message.includes("SQLITE_CANTOPEN") || message.includes("SQLITE_BUSY") || | ||
| message.includes("SQLITE_LOCKED") || message.includes("SQLITE_IOERR")) { |
There was a problem hiding this comment.
Match transient SQLite errors using normalized messages
The new SQLite suppression path only matches uppercase substrings like SQLITE_BUSY/SQLITE_LOCKED, but common runtime forms in this codebase are normalized lowercase or plain text such as database is locked (for example, qmd-manager already checks those forms). As written, many transient SQLite lock rejections will not be recognized and still trigger process.exit(1), so the intended non-fatal handling is incomplete.
Useful? React with 👍 / 👎.
Summary
The cronEvalCache was using FIFO eviction (first inserted = first evicted), but Map.get() doesn't reorder entries. This caused frequently accessed entries to be prematurely evicted.
Changes
Testing
Closes #39679