feat: add update command for incremental palace sync#353
feat: add update command for incremental palace sync#353phobicdotno wants to merge 25 commits intoMemPalace:developfrom
Conversation
web3guru888
left a comment
There was a problem hiding this comment.
Review: mempalace update — Incremental Sync
Great addition — incremental sync is something I've been wanting for our integration project. The approach of leveraging existing scan_project() and process_file() is exactly right (no parallel reimplementation), and the new/changed/deleted classification is clean. A few observations:
1. Hardcoded limit=100000 on collection.get()
results = collection.get(
where={"wing": wing},
include=["metadatas"],
limit=100000,
)ChromaDB's get() with limit=None (or omitted) returns all matching results — no pagination needed. The hardcoded 100K silently truncates large palaces: wing gets partially synced, some files appear as "new" every run, and stale drawers never get cleaned up. Since there's no warning when the limit is hit, this would be a hard-to-debug issue.
Suggestion: just drop the limit parameter entirely:
results = collection.get(
where={"wing": wing},
include=["metadatas"],
)2. Silent except Exception on wing query
try:
results = collection.get(
where={"wing": wing},
include=["metadatas"],
limit=100000,
)
except Exception:
results = {"ids": [], "metadatas": []}If the wing query fails (corrupt DB, permission error, ChromaDB bug), this silently treats the entire palace as empty. Every file becomes "new", gets re-mined, and you end up with duplicate drawers alongside the originals. That's the opposite of what a user expects from update.
At minimum, logging the exception would help. But arguably, if we can't read the existing state, the update should abort rather than silently double everything:
try:
results = collection.get(
where={"wing": wing},
include=["metadatas"],
)
except Exception as e:
print(f" ✗ Failed to read wing '{wing}': {e}")
print(" Cannot determine current state — aborting update.")
return3. Float equality for mtime comparison
if stored_mtime is not None and float(stored_mtime) == current_mtime:
unchanged += 1I tested this and ChromaDB does round-trip Python floats bit-exactly (at least on 0.6.x with the SQLite backend), so this works in practice today. But it's worth noting that this is an implementation detail of ChromaDB's storage layer, not a documented guarantee. If they ever switch to a format that loses precision (JSON serialization, for example), this comparison would start seeing every file as "changed".
A small epsilon would make this robust against future storage changes:
if stored_mtime is not None and abs(float(stored_mtime) - current_mtime) < 0.01:
unchanged += 110ms is well below any real file modification granularity (most filesystems have 1-second or coarser resolution). Low priority since it works today, but it's cheap insurance.
4. Test coverage gaps
The single integration test is solid for the happy path, but a couple of edge cases would be nice:
- Dry-run mode: Verify that
--dry-runreports changes but doesn't modify the collection. A quickassert col.count() == initial_countafter a dry-run update would do it. - Deletion-only path: A test where only files are deleted (no new/changed) to exercise the cleanup path in isolation.
- Empty palace: Calling
updatebefore anymine— currently the non-dry-run path would hitget_collection()which creates the collection, then see everything as new. That's probably fine, but worth a test to document the behavior.
5. Minor: redundant float() cast
palace_files[sf]["source_mtime"] = float(meta["source_mtime"])…and later:
if stored_mtime is not None and float(stored_mtime) == current_mtime:The value is already cast to float when building palace_files, so the second float(stored_mtime) is a no-op. Minor, but removing it reduces cognitive load.
Overall this is a clean, well-scoped feature that reuses existing infrastructure correctly. The main things I'd want addressed before merge are #1 (the silent 100K truncation) and #2 (the silent exception on wing query). The rest are nice-to-haves.
Nice work 👍
0a5649a to
782f537
Compare
|
Thanks for the thorough review — all great catches. Pushed fixes: #1 (limit=100000) — Dropped the #2 (silent except) — Now prints the error and aborts instead of silently treating the palace as empty. No more phantom duplicates. #3 (float mtime epsilon) — Added #4 (tests) — Added three new tests:
All 16 tests pass, ruff clean. |
Addresses review feedback: - Remove hardcoded limit=100000 on collection.get() - Abort on wing query failure instead of silently treating palace as empty - Use epsilon comparison for mtime floats - Remove redundant float() cast - Add tests: dry-run, deletion-only, empty-palace edge cases
782f537 to
8b9cbde
Compare
|
Also addressed #5 — applied the same epsilon comparison ( |
The initialize handler hardcoded protocolVersion "2024-11-05", which causes newer MCP clients (e.g. Claude Code) to reject the connection when they negotiate "2025-11-25" or later. Echo the client's requested version if it is in the supported set, otherwise fall back to the latest supported version. This keeps backwards compatibility with older clients while allowing newer ones to connect. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Fix command injection in hook script (pass paths via sys.argv) - Add sanitize_name/sanitize_content validators in config.py - Add 10MB file size guard + symlink skip in miners - Fix SQLite connection leak in knowledge_graph.py (reuse connection) - Use `with conn:` for proper transaction handling - Consolidate shared palace operations into palace.py - Add write-ahead log for audit trail on writes/deletes - Add metadata cache with 30s TTL for status/taxonomy calls - Upgrade md5 → sha256 for drawer/triple IDs - Harden file permissions (0o700/0o600) - Pin chromadb>=0.5.0,<0.7 Based on PR MemPalace#252 by @anthonyonazure with lint fixes applied. Co-Authored-By: anthonyonazure <[email protected]>
…_DIRS - _client → _client_cache to match conftest.py reset fixture - _get_collection now uses _get_client() return value instead of stale ref - Restore .pytest_cache and other dirs missing from palace.py SKIP_DIRS
The 30s TTL metadata cache returned stale data between test runs and after write operations. Reverted to direct col.get() reads which match the original behavior and pass all tests.
Remove datetime.now() from drawer_id hash so same content + wing + room always produces the same ID. This enables the idempotency check that returns "already_exists" on duplicate writes.
… harden security Review fixes (from Sage's review): - Restore mtime check in file_already_mined (check_mtime=True for miner) - Restore limit=10000 on MCP metadata fetches to prevent OOM on large palaces - Apply _SAFE_NAME_RE regex in sanitize_name (was dead code) - Drop raw_aaak metadata duplication in diary_write - chmod 0o700 on WAL dir, 0o600 on WAL file - Add check_same_thread=False on KnowledgeGraph SQLite connection - Remove __del__ (unreliable) and dead PRAGMA foreign_keys=ON
Covers the check_mtime=True path in palace.py to meet 85% coverage threshold.
Three critical bugfixes: 1. MCP server hangs on null arguments (MemPalace#394) — `params.get("arguments", {})` returns None when JSON has `"arguments": null`. Changed to `or {}`. 2. cmd_repair infinite recursion (MemPalace#395) — trailing slash on palace_path caused backup_path to be inside the source dir. Strip trailing sep. 3. OOM on large transcript files (MemPalace#396) — split_mega_files.py and normalize.py load entire files into memory. Added 500MB safety limit with clear skip/error messages. Closes MemPalace#394, MemPalace#395, MemPalace#396.
…ixes - Move file size check before try block so IOError propagates cleanly (not caught by the except OSError handler below it) - Wrap os.path.getsize in its own try/except to preserve existing test_normalize_io_error behavior on missing files - Add test_normalize_rejects_large_file (mocked getsize) - Add test_null_arguments_does_not_hang (MemPalace#394) - Add test_cmd_repair_trailing_slash_does_not_recurse (MemPalace#395) 532 tests pass locally, 0 regressions.
PyPI release cut covering 39 merged PRs since v3.0.0 on 2026-04-06. Highlights: Claude/Codex plugin packaging (MemPalace#270), security hardening (MemPalace#387), honest AAAK stats + benchmark corrections (MemPalace#147), Windows compatibility fixes, Knowledge Graph WAL mode + batching, 10K limit safety caps, and much more. See GitHub release notes for full changelog. Co-authored-by: milla-jovovich <[email protected]>
bump-plugin-version.yml has been failing on every merge to main since today's security + plugin-packaging work, because it tries to push directly to main and branch protection blocks it. It also conflicts with the manual version-management pattern we're currently using (manual bumps in PRs like MemPalace#409 for 3.1.0). Renaming to .yml.disabled so GitHub Actions skips it. If we want auto-bumps later, the workflow needs to open a PR instead of pushing directly, and coordinate with manual version bumps. Co-authored-by: milla-jovovich <[email protected]>
* fix: add mcp command with setup guidance * fix: include --palace guidance in mcp command output * fix: make mcp guidance commands copy-pastable --------- Co-authored-by: Milla J <[email protected]>
Complete OpenClaw skill exposing all MCP tools with session protocol, auto-install spec, and setup instructions for OpenClaw + other MCP hosts. Covers all 20 tools: search, check_duplicate, status, list_wings, list_rooms, get_taxonomy, get_aaak_spec, kg_query, kg_add, kg_invalidate, kg_timeline, kg_stats, traverse, find_tunnels, graph_stats, add_drawer, delete_drawer, diary_write, diary_read. Based on PR MemPalace#207 by @wanikua — updated to v3.1.0, added missing tools (check_duplicate, get_aaak_spec), expanded parameter docs, added OpenClaw CLI setup command. Co-Authored-By: wanikua <[email protected]>
Addresses review feedback: - Remove hardcoded limit=100000 on collection.get() - Abort on wing query failure instead of silently treating palace as empty - Use epsilon comparison for mtime floats - Remove redundant float() cast - Add tests: dry-run, deletion-only, empty-palace edge cases
Summary
Adds
mempalace update <dir>— an incremental sync that detects new, changed, and deleted files without requiring a full re-mine.source_mtime) have old drawers removed and are re-minedUsage
Supports
--wing,--no-gitignore,--include-ignored,--agent,--dry-run— same flags asmine.Changes
mempalace/miner.py— Addedupdate()function using existingscan_project(),process_file(), andsource_mtimetrackingmempalace/cli.py— Addedupdatesubparser andcmd_update()handlertests/test_miner.py— Addedtest_update_detects_changes()covering new/changed/deleted file detectionTest plan
pytest tests/ -v)