fix: rotate WAL file at 10 MB to prevent unbounded growth#573
fix: rotate WAL file at 10 MB to prevent unbounded growth#573arnoldwender wants to merge 1 commit intoMemPalace:developfrom
Conversation
web3guru888
left a comment
There was a problem hiding this comment.
This fixes a genuine production pain point — WAL files growing to gigabytes silently is something we've run into with high-frequency add_drawer patterns.
The implementation is clean. A few observations:
Overhead: _wal_rotate() calls _WAL_FILE.stat() on every single write. For high-throughput workloads this is a syscall per write. Not catastrophic, but could be batched (e.g., only check every N writes or every 60 seconds). At 10 MB / ~500 bytes per entry that's ~20,000 writes between rotations, so the amortized cost is low. Fine as-is.
Single backup race: If two concurrent writers both hit _wal_rotate() just as the file crosses 10 MB, both will try to rename and one will get FileNotFoundError. This is caught by the outer except Exception and logged, so it won't blow up — but the second caller will then write to a fresh file and the backup from the first rotation will be intact. Acceptable behavior.
0o600 permissions: chmod after rename is a TOCTOU window — the file is temporarily accessible at the backup path without restricted permissions. Ideally you'd rename to a temp path, chmod, then rename to .1. Minor, but worth noting if the WAL contains sensitive memory contents (it does).
The f-string → %s logging fix is a nice bonus (avoids evaluating the format string when the log level is filtered).
Overall: the core fix is correct and the edge cases are non-fatal. LGTM.
…compress keys - Default new collections to cosine distance (hnsw:space=cosine) instead of L2 — fixes negative similarity scores (MemPalace#568) - Filter unexpected MCP tool args before dispatch — prevents TypeError crash from extra params like top_k (MemPalace#572) - Rotate WAL at 10 MB with one backup to prevent unbounded growth (MemPalace#573) - Fix cmd_compress KeyError: align dict keys with compression_stats() return values (MemPalace#569) - Fix spellcheck _load_known_names: read from "people" key, use dict keys as canonical names (MemPalace#570) - Repair command uses cosine distance for rebuilt collection Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Incorporated into #562 — WAL rotation at 10 MB with one |
web3guru888
left a comment
There was a problem hiding this comment.
WAL rotation is a sensible maintenance feature — unbounded growth is a real operational problem for anyone running intensive agent workloads. The 10 MB threshold is a reasonable default.
A few design notes:
Backup count: A single .jsonl.1 backup means you lose visibility into entries older than one rotation cycle. For most use cases that's fine, but if someone is using the WAL for audit purposes (e.g. replay after palace corruption), losing the .1 file on the second rotation is surprising. Worth documenting the retention policy explicitly — even a comment in _wal_rotate() saying "only the most recent backup is kept; use wal_replay before WAL grows >10 MB if you need full audit history" would set expectations correctly.
Rotation under concurrent writers: If two processes call _wal_log() near-simultaneously and both observe the size threshold, both may try to rename write_log.jsonl to write_log.jsonl.1. On POSIX, os.rename is atomic so one will win and the other will rotate an already-rotated file — resulting in some entries in .1 getting overwritten. For single-process use this is fine; for multi-process MCP server setups, worth a file lock around the rotate-then-reopen sequence. A simple fcntl.flock on the WAL file would cover it.
Missing tests: The checklist shows all four test scenarios as unchecked. These aren't hard to write — a tmp_path fixture with a pre-seeded WAL file over the threshold, assert that .1 is created and the original is reset. Would be good to have them before merge.
Configurable threshold: _WAL_MAX_BYTES as a module constant is the right approach — clean and easy to override in tests. Alternatively, expose it via mempalace.yaml config (wal_max_bytes: 10_000_000) so operators running high-throughput pipelines can tune it without patching.
Directionally right. The core logic is straightforward — the concurrent writer edge case is the main thing worth hardening before this goes to production.
[MemPalace-AGI integration — 215 tests, 710 KG entities]
|
Thanks for the detailed review — all valid points. Overhead: Agreed, Concurrent writers: Fair point on the POSIX rename race. For single-process MCP this is non-fatal as you noted. A follow-up could add TOCTOU on chmod: The 0o700 parent directory limits exposure, but noted for a future hardening pass. Tests: Will add rotation coverage — Backup count / retention: Good call on documenting this. Will add an inline comment clarifying single-backup semantics and the recommendation to replay before rotation if full audit history is needed. Configurable threshold: Exposing via |
Summary
_wal_log()— rotateswrite_log.jsonlwhen it exceeds 10 MBwrite_log.jsonl.1), discards older backups on next rotationProblem
The write-ahead log (
~/.mempalace/wal/write_log.jsonl) appends every write operation forever with no rotation or size limit. Over months of heavy use with frequentadd_drawer,kg_add, anddiary_writecalls, this file can grow to gigabytes, consuming disk space silently.Fix
_wal_rotate()function checks file size before each write_WAL_MAX_BYTES).jsonl.1) for recent audit trail0o600permissionsTest plan
.jsonl.1, new file created.jsonl.1backup — overwritten on next rotation