fix(database): add WAL journal mode and transient SQLite error handling#39471
fix(database): add WAL journal mode and transient SQLite error handling#39471liangruochong44-ui wants to merge 1 commit intoopenclaw:mainfrom
Conversation
- Add WAL journal mode to memory SQLite database (fixes openclaw#36035) WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write - Add transient SQLite error handling (fixes openclaw#34678) Classify SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR as non-fatal errors to prevent LaunchAgent crash loops
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 800c0dcbb2
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const db = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled }); | ||
| // WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write. | ||
| // It also allows concurrent reads during writes. | ||
| db.exec("PRAGMA journal_mode = WAL"); |
There was a problem hiding this comment.
Move busy_timeout ahead of WAL pragma
Running PRAGMA journal_mode = WAL before setting PRAGMA busy_timeout leaves the very first statement on a new connection with SQLite's default timeout of 0ms, so a concurrent writer can make database open fail immediately with database is locked instead of waiting/retrying. This regresses the existing concurrency behavior documented in the following comment and can cause startup/reopen failures under normal multi-process contention.
Useful? React with 👍 / 👎.
| const TRANSIENT_SQLITE_CODES = new Set([ | ||
| "SQLITE_CANTOPEN", | ||
| "SQLITE_BUSY", | ||
| "SQLITE_LOCKED", | ||
| "SQLITE_IOERR", |
There was a problem hiding this comment.
Match node:sqlite transient errors as non-fatal
The transient SQLite filter only whitelists SQLITE_* codes, but node:sqlite commonly throws lock/open failures as code: "ERR_SQLITE_ERROR" with plain messages like database is locked or unable to open database file; those cases won't match this set and will still fall through to process.exit(1). That means the new unhandled-rejection guard does not actually cover the runtime error shape used by this codepath.
Useful? React with 👍 / 👎.
Greptile SummaryThis PR makes two improvements to SQLite database handling: it sets Key changes:
Concern:
Confidence Score: 3/5
Last reviewed commit: 800c0dc |
| const TRANSIENT_SQLITE_CODES = new Set([ | ||
| "SQLITE_CANTOPEN", | ||
| "SQLITE_BUSY", | ||
| "SQLITE_LOCKED", | ||
| "SQLITE_IOERR", | ||
| ]); |
There was a problem hiding this comment.
SQLITE_CANTOPEN may not be transient
SQLITE_CANTOPEN (error code 14) typically indicates a permanent problem — the database file path doesn't exist, permissions are wrong, or the parent directory is missing. This is fundamentally different from SQLITE_BUSY and SQLITE_LOCKED, which are genuinely transient contention errors.
By treating SQLITE_CANTOPEN as non-fatal and suppressing the crash, the gateway will silently continue in a degraded state where all database operations fail. This risks:
- Hiding misconfigured
store.pathsettings - Masking cases where the database directory was deleted or the user's home directory is unmounted
- Silently losing memory writes (the write fails, the process continues, but the data is never persisted)
Consider removing SQLITE_CANTOPEN from the transient set, or only classifying it as transient if the error occurs during a background/non-critical operation rather than at the handler level where context is lost.
| const TRANSIENT_SQLITE_CODES = new Set([ | |
| "SQLITE_CANTOPEN", | |
| "SQLITE_BUSY", | |
| "SQLITE_LOCKED", | |
| "SQLITE_IOERR", | |
| ]); | |
| const TRANSIENT_SQLITE_CODES = new Set([ | |
| "SQLITE_BUSY", | |
| "SQLITE_LOCKED", | |
| "SQLITE_IOERR", | |
| ]); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 71-76
Comment:
`SQLITE_CANTOPEN` may not be transient
`SQLITE_CANTOPEN` (error code 14) typically indicates a permanent problem — the database file path doesn't exist, permissions are wrong, or the parent directory is missing. This is fundamentally different from `SQLITE_BUSY` and `SQLITE_LOCKED`, which are genuinely transient contention errors.
By treating `SQLITE_CANTOPEN` as non-fatal and suppressing the crash, the gateway will silently continue in a degraded state where all database operations fail. This risks:
- Hiding misconfigured `store.path` settings
- Masking cases where the database directory was deleted or the user's home directory is unmounted
- Silently losing memory writes (the write fails, the process continues, but the data is never persisted)
Consider removing `SQLITE_CANTOPEN` from the transient set, or only classifying it as transient if the error occurs during a background/non-critical operation rather than at the handler level where context is lost.
```suggestion
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_BUSY",
"SQLITE_LOCKED",
"SQLITE_IOERR",
]);
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
This PR fixes two database-related issues:
Fix #36035 - Memory SQLite should use WAL journal mode by default
PRAGMA journal_mode = WALwhen opening the memory SQLite databaseFix #34678 - SQLITE_CANTOPEN should be classified as transient
TRANSIENT_SQLITE_CODESset with SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERRisTransientSqliteError()function to check for SQLite transient errorsinstallUnhandledRejectionHandler()to handle SQLite errors gracefullyCloses #36035
Closes #34678