Skip to content

fix(database): add WAL journal mode and transient SQLite error handling#39471

Open
liangruochong44-ui wants to merge 1 commit intoopenclaw:mainfrom
liangruochong44-ui:fix/database-wal-and-transient-errors
Open

fix(database): add WAL journal mode and transient SQLite error handling#39471
liangruochong44-ui wants to merge 1 commit intoopenclaw:mainfrom
liangruochong44-ui:fix/database-wal-and-transient-errors

Conversation

@liangruochong44-ui
Copy link
Copy Markdown

Summary

This PR fixes two database-related issues:

Fix #36035 - Memory SQLite should use WAL journal mode by default

  • Added PRAGMA journal_mode = WAL when opening the memory SQLite database
  • WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write
  • Also allows concurrent reads during writes
  • Prevents "database disk image is malformed" errors during gateway restarts

Fix #34678 - SQLITE_CANTOPEN should be classified as transient

  • Added TRANSIENT_SQLITE_CODES set with SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR
  • Added isTransientSqliteError() function to check for SQLite transient errors
  • Modified installUnhandledRejectionHandler() to handle SQLite errors gracefully
  • Prevents LaunchAgent crash loops when encountering transient SQLite errors

Closes #36035
Closes #34678

- Add WAL journal mode to memory SQLite database (fixes openclaw#36035)
  WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write
- Add transient SQLite error handling (fixes openclaw#34678)
  Classify SQLITE_CANTOPEN, SQLITE_BUSY, SQLITE_LOCKED, SQLITE_IOERR as
  non-fatal errors to prevent LaunchAgent crash loops
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 800c0dcbb2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

const db = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled });
// WAL mode is crash-safe and survives SIGTERM/SIGKILL mid-write.
// It also allows concurrent reads during writes.
db.exec("PRAGMA journal_mode = WAL");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Move busy_timeout ahead of WAL pragma

Running PRAGMA journal_mode = WAL before setting PRAGMA busy_timeout leaves the very first statement on a new connection with SQLite's default timeout of 0ms, so a concurrent writer can make database open fail immediately with database is locked instead of waiting/retrying. This regresses the existing concurrency behavior documented in the following comment and can cause startup/reopen failures under normal multi-process contention.

Useful? React with 👍 / 👎.

Comment on lines +71 to +75
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_CANTOPEN",
"SQLITE_BUSY",
"SQLITE_LOCKED",
"SQLITE_IOERR",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Match node:sqlite transient errors as non-fatal

The transient SQLite filter only whitelists SQLITE_* codes, but node:sqlite commonly throws lock/open failures as code: "ERR_SQLITE_ERROR" with plain messages like database is locked or unable to open database file; those cases won't match this set and will still fall through to process.exit(1). That means the new unhandled-rejection guard does not actually cover the runtime error shape used by this codepath.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR makes two improvements to SQLite database handling: it sets PRAGMA journal_mode = WAL on every database open in MemoryManagerSyncOps (preventing corruption after abrupt restarts), and it classifies several SQLite error codes as transient so that unhandled rejections containing them log a warning instead of crashing the process.

Key changes:

  • src/memory/manager-sync-ops.ts: PRAGMA journal_mode = WAL is correctly added alongside the existing busy_timeout pragma — straightforward and appropriate.
  • src/infra/unhandled-rejections.ts: New isTransientSqliteError() function and TRANSIENT_SQLITE_CODES set; the unhandled rejection handler now suppresses crashes for these codes.

Concern:

  • SQLITE_CANTOPEN in TRANSIENT_SQLITE_CODES is problematic. Unlike SQLITE_BUSY/SQLITE_LOCKED (contention) and SQLITE_IOERR (I/O blips), SQLITE_CANTOPEN typically indicates a permanent problem — wrong path, missing directory, or bad permissions. Silently continuing when the database can't be opened could mask misconfiguration and cause silent data loss rather than a visible, fixable crash.

Confidence Score: 3/5

  • The WAL pragma change is safe; the transient SQLite error classification is mostly correct but misclassifies SQLITE_CANTOPEN as non-fatal, risking silent data loss.
  • The WAL pragma addition is a well-understood, low-risk improvement for crash safety. The isTransientSqliteError() implementation is structurally sound and mirrors the established network-error pattern. However, including SQLITE_CANTOPEN in the transient set is problematic — it commonly signals a permanent inability to open the database file, and suppressing the crash could cause the gateway to silently operate without a functioning database, hiding misconfiguration from operators and causing data loss. This core issue warrants revision.
  • src/infra/unhandled-rejections.ts — specifically the inclusion of SQLITE_CANTOPEN in TRANSIENT_SQLITE_CODES.

Last reviewed commit: 800c0dc

Comment on lines +71 to +76
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_CANTOPEN",
"SQLITE_BUSY",
"SQLITE_LOCKED",
"SQLITE_IOERR",
]);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLITE_CANTOPEN may not be transient

SQLITE_CANTOPEN (error code 14) typically indicates a permanent problem — the database file path doesn't exist, permissions are wrong, or the parent directory is missing. This is fundamentally different from SQLITE_BUSY and SQLITE_LOCKED, which are genuinely transient contention errors.

By treating SQLITE_CANTOPEN as non-fatal and suppressing the crash, the gateway will silently continue in a degraded state where all database operations fail. This risks:

  • Hiding misconfigured store.path settings
  • Masking cases where the database directory was deleted or the user's home directory is unmounted
  • Silently losing memory writes (the write fails, the process continues, but the data is never persisted)

Consider removing SQLITE_CANTOPEN from the transient set, or only classifying it as transient if the error occurs during a background/non-critical operation rather than at the handler level where context is lost.

Suggested change
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_CANTOPEN",
"SQLITE_BUSY",
"SQLITE_LOCKED",
"SQLITE_IOERR",
]);
const TRANSIENT_SQLITE_CODES = new Set([
"SQLITE_BUSY",
"SQLITE_LOCKED",
"SQLITE_IOERR",
]);
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/unhandled-rejections.ts
Line: 71-76

Comment:
`SQLITE_CANTOPEN` may not be transient

`SQLITE_CANTOPEN` (error code 14) typically indicates a permanent problem — the database file path doesn't exist, permissions are wrong, or the parent directory is missing. This is fundamentally different from `SQLITE_BUSY` and `SQLITE_LOCKED`, which are genuinely transient contention errors.

By treating `SQLITE_CANTOPEN` as non-fatal and suppressing the crash, the gateway will silently continue in a degraded state where all database operations fail. This risks:
- Hiding misconfigured `store.path` settings
- Masking cases where the database directory was deleted or the user's home directory is unmounted
- Silently losing memory writes (the write fails, the process continues, but the data is never persisted)

Consider removing `SQLITE_CANTOPEN` from the transient set, or only classifying it as transient if the error occurs during a background/non-critical operation rather than at the handler level where context is lost.

```suggestion
const TRANSIENT_SQLITE_CODES = new Set([
  "SQLITE_BUSY",
  "SQLITE_LOCKED",
  "SQLITE_IOERR",
]);
```

How can I resolve this? If you propose a fix, please make it concise.

@liangruochong44-ui
Copy link
Copy Markdown
Author

This PR also helps with readonly database issues (#19167, #34710) since the WAL mode and transient error handling improve database stability and prevent many of the race conditions that cause readonly errors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Memory SQLite should use WAL journal mode by default SQLITE_CANTOPEN should be classified as transient, not fatal (kills LaunchAgent)

1 participant