feat(cli): add clean subcommand to remove wings and rooms#548
feat(cli): add clean subcommand to remove wings and rooms#548brandonhon wants to merge 3 commits intoMemPalace:developfrom
Conversation
Adds `mempalace clean --wing X [--room Y]` for reclaiming palace space and starting projects over. Removes drawers from both `mempalace_drawers` and `mempalace_compressed` in lockstep, with an interactive confirmation prompt (skippable via `--yes`) and `--dry-run` for previews. Each clean operation appends a single summary entry to the WAL for audit. The knowledge graph is intentionally left untouched and the help text calls this out.
web3guru888
left a comment
There was a problem hiding this comment.
This fills a real gap — there's currently no first-class way to remove a wing once a project is done. Good motivation and clean scope.
What works well:
- Wing/room granularity is the right split. Wing-level clean covers the "project is done" case; room-level covers "re-mine after restructuring."
drawers+compressedkept in lockstep. This is important — without it, compressed stale entries would survive a clean.- WAL logging for the operation. Audit trail is the right default for a destructive command.
- Test coverage is thorough: dry-run, prompt cancel/confirm, no-compressed-collection graceful path, WAL verification. 11 tests is solid.
--yesflag for scripted use.
One issue with count_drawers + delete_drawers on large palaces:
Both functions call collection.get(where=...) to count first, then collection.delete(where=...). On a 10K+ drawer palace this means two full ChromaDB scans. The delete path could do delete(where=...) directly and get the count from the returned deletion count if ChromaDB exposes it — or just accept a single-pass "delete and count" by calling get with include=[] once and immediately deleting the returned IDs.
This is a performance concern, not a correctness concern. For typical palace sizes (hundreds to low thousands) it's not a problem. But worth noting.
Knowledge graph note:
The clean command correctly leaves the KG untouched and documents this explicitly. Worth considering whether --with-kg or a follow-up mempalace clean-kg --wing ... makes sense as a companion command — once you've deleted all drawers from a wing, the KG entities that only had provenance from those drawers become orphaned. Not blocking, just flagging for follow-on.
Minor:
The WAL entry uses datetime.now() (local time). Other WAL entries in the codebase use UTC (datetime.utcnow() or datetime.now(timezone.utc)). Worth being consistent.
Otherwise clean implementation. LGTM with those minor notes.
|
@web3guru888
Will push updates to this branch over the weekend. |
|
@brandonhon — all three resolutions sound right:
No rush — enjoy the weekend. The PR is solid as-is; these are all incremental improvements on an already correct implementation. |
…tion Addresses PR MemPalace#548 review feedback about scan amplification on large palaces. The previous implementation made up to six ChromaDB scans per clean operation: 1. count_drawers(drawers_col) — scan MemPalace#1 2. count_drawers(compressed_col) — scan MemPalace#2 3. delete_drawers(drawers_col) — internal get(where=...) scan MemPalace#3 4. delete_drawers(drawers_col) — delete(where=...) scan MemPalace#4 5. delete_drawers(compressed_col) — scan MemPalace#5 6. delete_drawers(compressed_col) — scan MemPalace#6 Each call was doing its own metadata filter over the collection, meaning a 100K-drawer palace paid the filter cost six times for a single cleanup. This refactor drops it to exactly two scans — one per collection — regardless of palace size. Changes: * palace.py: introduce `find_drawer_ids(col, wing, room)` which returns the matching ID list in a single `get(where=..., include=[])` call. ChromaDB fetches only IDs — no documents, embeddings, or metadatas — so the scan is as cheap as ChromaDB can make it. * palace.py: `count_drawers` is now a thin wrapper around `find_drawer_ids`. The old standalone `delete_drawers` helper is removed because its count-then-delete pattern is exactly what we are trying to avoid. * cli.py::cmd_clean: call `find_drawer_ids` once per collection, reuse the returned ID lists for both the preview counts and the subsequent delete. Deletes go through `col.delete(ids=[...])` which is an O(n) primary-key delete, not another metadata scan. Empty lists are guarded to stay compatible with ChromaDB versions that reject `delete()` with no ids or where filter. Why two scans and not one: ChromaDB collections are independent — there is no cross-collection query. `mempalace_drawers` and `mempalace_compressed` must each be filtered separately. A single-scan variant would have to assume that compressed IDs are a strict subset of drawer IDs and delete compressed by the drawer IDs we already found, but `tool_delete_drawer` in mcp_server.py does not cascade to compressed, so real palaces can contain orphaned compressed rows whose drawer is already gone. Going to one scan would silently leak those orphans. Two scans is the minimum that preserves correctness. Tests: * New `test_find_drawer_ids_*` unit tests cover wing-only, wing+room, and no-match cases. * New `test_find_drawer_ids_single_scan` monkey-patches `collection.get` to assert exactly one call. * New `test_clean_scans_each_collection_exactly_once` is a regression test that wraps `chromadb.PersistentClient` and counts `where`-filtered `get()` calls per collection during a full `cmd_clean` invocation, failing if either collection is scanned more than once. * The existing 15 CLI black-box tests stay identical — the behavior is unchanged, only the scan count dropped. Full suite: 551 passed (was 549, +2 new perf regression tests). Also in this commit: `-V` / `--version` flags, because every good app needs a version flag and we somehow shipped three minor releases without one. The installed version is now embedded in the `-h` description line too, so `mempalace -h` answers "what am I running?" without a separate invocation.
|
@web3guru888 |
Summary
Adds a new
mempalace cleansubcommand for reclaiming palace space and starting projects over. It removes drawers at two granularities — an entire wing, or a specific room within a wing — and keeps themempalace_drawersandmempalace_compressedcollections in lockstep so compressed copies never go stale after a cleanup.Motivation
Today there is no first-class way to remove memories once a project is done. Users either re-mine forever or hand-edit ChromaDB. The new subcommand gives a safe, auditable way to scrub scoped memory without touching the rest of the palace.
UX
Sample output:
Design decisions
--wingis required. No implicit "clean everything" — refuses to run without a wing filter. This is intentional: the cost of an accidental full wipe is very high.--wingand--room. I considered splitting into--room(clear contents) vs--del-room(remove the room itself), but rooms and wings are metadata-only in mempalace today — they exist only as fields on drawer metadata, not as first-class records. Once the last drawer in a room is deleted the room disappears fromlist_roomsautomatically, so "clear contents" and "delete room" would produce identical observable state. Added a registry was deemed scope creep.~/.mempalace/knowledge_graph.sqlite3do not carry awingfield, so they can't be filtered cleanly by wing without false positives. The--helptext explicitly notes this and the runtime output reminds the user after every cleanup.mempalace_compressedshares drawer IDs withmempalace_drawers. Leaving compressed copies behind after a wing/room cleanup would be a correctness bug, so both collections are filtered and deleted in the same operation.clean_wingorclean_roomrecord is appended to~/.mempalace/wal/write_log.jsonlwith{wing, room, drawers_deleted, compressed_deleted}. Per-drawer logging was rejected as noisy for bulk ops.[y/N]with a count summary before deleting.--yesskips the prompt for scripting.--dry-runshows counts and exits without touching anything.mempalace_compresseddoesn't exist (palace was never compressed), the cleanup proceeds againstmempalace_drawersonly without error.Files changed
mempalace/palace.py— added_build_where,count_drawers,delete_drawershelpers (+34 lines). Keeps the CLI layer thin and makes the primitives reusable for a future MCPcleantool.mempalace/cli.py— addedcmd_clean, thecleansubparser, and a dispatch entry. Docstring updated with the new command in the help epilog (+142 lines).tests/test_clean.py— 15 new tests (+221 lines).Tests
Added
tests/test_clean.pywith 15 tests covering:Primitive layer (
palace.count_drawers/delete_drawers):CLI layer (
cli.cmd_clean):mempalace_drawersandmempalace_compressed--dry-runmakes zero changesnat the prompt cancels the deleteyat the prompt proceedsclean_roomWAL entry with the expected counts--wingflag exits non-zeroTest results
All 549 tests in the suite pass (15 new, 534 existing). No regressions.
Test plan
python3 -m pytest tests/test_clean.py -vpasses locallypython3 -m pytest -q— full suite still greenpython3 -m mempalace clean --helpshows the expected usagemempalace clean --wing <some-wing> --dry-runreports accurate countsmempalace statusormempalace_list_roomsthat the wing/room is gone~/.mempalace/wal/write_log.jsonlwithoperation: clean_wing(orclean_room) and correct countsmempalace_compressedwas cleaned in lockstepNotes for reviewers
--helptext and runtime output so users aren't surprised. If you want KG cleanup later, it would need a new wing tag on KG triples — out of scope for this PR.delete_drawersuses ChromaDB's nativedelete(where=...). The_build_wherehelper produces either{"wing": X}or{"$and": [{"wing": X}, {"room": Y}]}depending on scope.brandonhonGitHub noreply email for email-privacy compliance.