fix(memory): use WAL journal mode for SQLite memory index#36535
fix(memory): use WAL journal mode for SQLite memory index#36535anthonyfig wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Set PRAGMA journal_mode=WAL and PRAGMA busy_timeout=5000 when opening the memory index database. WAL mode survives SIGTERM/SIGKILL mid-write and prevents database corruption that occurs with the default rollback journal when the gateway is killed during write operations. busy_timeout prevents immediate SQLITE_BUSY errors under contention. Closes openclaw#36035
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c10436bd65
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| return new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled }); | ||
| const db = new DatabaseSync(dbPath, { allowExtension: this.settings.store.vector.enabled }); | ||
| // WAL survives SIGTERM/SIGKILL mid-write and allows concurrent readers. | ||
| db.exec("PRAGMA journal_mode = WAL"); |
There was a problem hiding this comment.
Set busy_timeout before switching journal mode
When an existing index is still in journal_mode=delete and another process holds a write transaction (for example, overlapping gateway processes during restart), PRAGMA journal_mode = WAL can raise SQLITE_BUSY; because busy_timeout is configured only afterward, this failure is immediate and aborts manager initialization. Before this change, opening the DB in that state could still proceed (e.g., schema no-op reads), so this introduces a startup regression that can temporarily disable memory search until the lock clears. Configure PRAGMA busy_timeout first (or retry the WAL switch) so first-open migration to WAL is lock-tolerant.
Useful? React with 👍 / 👎.
Greptile SummarySummaryThis PR adds Correctness: Both PRAGMAs are applied before the DB is used, so the main index and the temp reindex DB both get WAL mode through the same code path. Backward compatibility: SQLite converts a WAL mode verification: Tests: 3 new tests cover fresh-DB WAL mode, Confidence Score: 4/5
Last reviewed commit: c10436b |
| db.exec("PRAGMA journal_mode = WAL"); | ||
| db.exec("PRAGMA busy_timeout = 5000"); |
There was a problem hiding this comment.
WAL mode activation not verified
db.exec() discards the result of PRAGMA journal_mode = WAL. SQLite can silently keep the current mode (returning "delete" instead of "wal") when WAL is unsupported — e.g., on network filesystems (NFS/SMB), read-only mounts, or when another connection has the DB open in a different journaling mode. Since this PR's entire purpose is to guarantee WAL protection against corruption, it would be safer to verify the PRAGMA succeeded and log a warning if it didn't:
const modeRow = db.prepare("PRAGMA journal_mode = WAL").get() as { journal_mode: string };
if (modeRow.journal_mode !== "wal") {
log.warn(`memory index: could not enable WAL journal mode (got '${modeRow.journal_mode}'); database may be at risk of corruption under SIGTERM`);
}
db.exec("PRAGMA busy_timeout = 5000");This is a low-probability scenario on typical local filesystems, but since the fix's safety goal could silently fail without warning on edge-case mounts, surfacing a diagnostic message would help operators catch misconfiguration.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/memory/manager-sync-ops.ts
Line: 263-264
Comment:
**WAL mode activation not verified**
`db.exec()` discards the result of `PRAGMA journal_mode = WAL`. SQLite can silently keep the current mode (returning `"delete"` instead of `"wal"`) when WAL is unsupported — e.g., on network filesystems (NFS/SMB), read-only mounts, or when another connection has the DB open in a different journaling mode. Since this PR's entire purpose is to guarantee WAL protection against corruption, it would be safer to verify the PRAGMA succeeded and log a warning if it didn't:
```typescript
const modeRow = db.prepare("PRAGMA journal_mode = WAL").get() as { journal_mode: string };
if (modeRow.journal_mode !== "wal") {
log.warn(`memory index: could not enable WAL journal mode (got '${modeRow.journal_mode}'); database may be at risk of corruption under SIGTERM`);
}
db.exec("PRAGMA busy_timeout = 5000");
```
This is a low-probability scenario on typical local filesystems, but since the fix's safety goal could silently fail without warning on edge-case mounts, surfacing a diagnostic message would help operators catch misconfiguration.
How can I resolve this? If you propose a fix, please make it concise.
Set PRAGMA journal_mode=WAL and PRAGMA busy_timeout=5000 when opening the memory index database. WAL mode survives SIGTERM/SIGKILL mid-write and prevents database corruption that occurs with the default rollback journal when the gateway is killed during write operations.
busy_timeout prevents immediate SQLITE_BUSY errors under contention.
Closes #36035
Summary
journal_mode=delete(SQLite default). If the gateway is killed mid-write (SIGTERM during restart/update/config change), the journal file is deleted before the write completes, corrupting the database.rm + openclaw memory index). Gateway restarts are frequent.PRAGMA journal_mode = WALandPRAGMA busy_timeout = 5000inopenDatabaseAtPath()— the single entry point where all memory SQLite databases are opened (main index + temp reindex). Added 3 tests.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
User-visible / Behavior Changes
None visible. Existing databases are transparently converted to WAL on first open (SQLite handles this automatically). Two sidecar files (
-wal,-shm) will appear next to the database file —moveIndexFiles()andremoveIndexFiles()already handle these suffixes.Security Impact (required)
NoNoNoNoNoRepro + Verification
Environment
Steps
pnpm installvitest run src/memory/manager.wal-journal.test.ts(focused WAL tests)vitest run src/memory/(full memory suite)pnpm test(full project suite)Expected
PRAGMA journal_modereturnswalon opened databases.Actual
Evidence
New test
src/memory/manager.wal-journal.test.tsasserts:PRAGMA journal_modereturnswalafter openingPRAGMA busy_timeoutreturns5000Human Verification (required)
openDatabase()which callsopenDatabaseAtPath().Compatibility / Migration
YesNoNo— SQLite converts automatically on firstPRAGMA journal_mode = WAL.Failure Recovery (if this breaks)
db.exec("PRAGMA ...")lines inmanager-sync-ops.ts:openDatabaseAtPath().src/memory/manager-sync-ops.tsSQLITE_BUSYerrors (unlikely with 5s busy_timeout), unexpected-wal/-shmfiles appearing alongsidemain.sqlite(expected behavior).Risks and Mitigations
-wal,-shm) must travel with the database during atomic reindex swaps.moveIndexFiles()andremoveIndexFiles()already handle["", "-wal", "-shm"]suffixes — this was designed for WAL from the start.