Skip to content

feat(cli): add mempalace purge — delete drawers by wing/room#1087

Open
jphein wants to merge 3 commits intoMemPalace:developfrom
jphein:pr/cmd-purge-cli
Open

feat(cli): add mempalace purge — delete drawers by wing/room#1087
jphein wants to merge 3 commits intoMemPalace:developfrom
jphein:pr/cmd-purge-cli

Conversation

@jphein
Copy link
Copy Markdown
Collaborator

@jphein jphein commented Apr 21, 2026

Summary

Adds a destructive CLI command that removes drawers matching a wing and/or room filter. Example usage:

mempalace purge --wing old-project                # nuke everything in one wing
mempalace purge --wing big-wing --room drafts     # just that room in that wing
mempalace purge --room _registry --yes            # cross-wing by room, skip confirm

Users currently have no clean way to delete drawers by scope. This closes that gap.

Why the implementation is "nuke-and-reinsert" rather than collection.delete()

The obvious implementation is collection.delete(where=...). That path triggers hnswlib's internal updatePointrepairConnectionsForUpdate, which — on macOS ARM64 with Python 3.13 + chromadb 0.6.3 — has a thread-safety bug that dereferences dangling pointers and kills the process with EXC_BAD_ACCESS. See #521 (@StefanKremen, 2026-04-13) for the reproducer.

To be clear, this PR does NOT fix #521. The miner's modified-file re-mine path still hits the same bug on every edit, because it also delete-then-inserts. Fixing that is a separate, larger effort (possibly a hnswlib or chroma-core upstream fix, or a structural change to make re-mines rebuild rather than mutate).

This PR just avoids the bug for the one new operation we're adding, using a pattern that sidesteps updatePoint entirely:

  1. Extract drawers to keep (the ones NOT matching the filter).
  2. Release ChromaDB handles.
  3. shutil.rmtree(palace_path).
  4. Create a fresh PersistentClient + create_collection at the same path.
  5. Re-insert the keeps in batches.

The tradeoff: a full rebuild. On a 100K-drawer palace, that's a few minutes of embedding computation (if the embedding fn is re-run — chromadb caches) and disk I/O. Given that purge is an explicit user-initiated destructive op, not a hot path, the cost is acceptable. The alternative (collection.delete() + SIGSEGV) is not.

What's in the diff

  • mempalace/cli.pycmd_purge function (~112 lines), argparse block, dispatch entry. +125/-0.
  • tests/test_cli.py — 4 tests mocking chromadb + shutil at module-import time:
    1. No palace found → clean error message.
    2. No --wing or --room → refuses to run (no mass-delete safety valve).
    3. Filter matches zero drawers → exit cleanly, don't rebuild.
    4. Both --wing and --room → builds the $and composite filter.
      +81/-0.

All tests mock at the sys.modules layer because chromadb and shutil are imported lazily inside cmd_purge. Full tests/test_cli.py suite: 46 passed.

Interface design — open to feedback

This is the first destructive command in the CLI. A few defaults I chose without strong conviction — happy to change any of these based on maintainer preference:

  • --wing and/or --room is required. Running mempalace purge with no flags just prints Error: specify --wing and/or --room and returns. Deliberate: there's no "purge everything" mode, because that's rm -rf ~/.mempalace/palace.
  • --room without --wing purges across all wings. Makes it possible to, e.g., delete every _registry room. Open to the alternative: require --wing alongside --room (safer, but less useful).
  • Interactive confirmation by default, --yes to skip. Mirrors --yes on init and migrate.
  • No --dry-run yet. Could add one that prints "would purge N drawers, would keep M" without actually doing the rebuild. Let me know if that's worth the extra code.

Test plan

  • mempalace purge --help renders clean help block.
  • ruff check + ruff format --check clean (with pinned 0.4.10 per upstream CI).
  • pytest tests/test_cli.py — 46 passed (4 new).

Context

Running this implementation in production on a 165K-drawer fork palace since 2026-04-10 — used it to purge accidentally-mined test content and polluted conversation wings. The nuke-and-reinsert path has cleanly purged at wing-granularity (~10K-drawer wings) and room-granularity (~5K-drawer rooms) without a single EXC_BAD_ACCESS over that period. With in-place collection.delete(), the same operations reliably crashed within the first few hundred deletes.

Closes #848.

Copilot AI review requested due to automatic review settings April 21, 2026 23:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new destructive CLI subcommand, mempalace purge, to delete drawers scoped by --wing and/or --room by rebuilding the underlying ChromaDB palace directory (to avoid the reported in-place delete/update crash path).

Changes:

  • Implement cmd_purge with wing/room filtering, optional confirmation, and “nuke-and-reinsert” rebuild logic.
  • Extend CLI argparse + dispatch table with the new purge subcommand.
  • Add unit tests for basic guardrails and filter construction for cmd_purge.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
mempalace/cli.py Adds cmd_purge, wires up purge subcommand + dispatch entry.
tests/test_cli.py Adds tests covering missing-palace handling, required filters, zero-match behavior, and $and filter composition.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/cli.py Outdated
print(f" Will keep {keep_count:,} drawers, rebuild index")

if not args.yes:
confirm = input(f" Purge {len(match_ids):,} drawers? [y/N] ").strip().lower()
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

input() here can raise EOFError when purge is run non-interactively (e.g., piped/CI), causing an unhandled exception and leaving the command in a confusing state. Consider reusing mempalace.migrate.confirm_destructive_action(...) (used by cmd_repair/migrate) or at least catching EOFError and printing the standard "Re-run with --yes" message.

Suggested change
confirm = input(f" Purge {len(match_ids):,} drawers? [y/N] ").strip().lower()
try:
confirm = input(f" Purge {len(match_ids):,} drawers? [y/N] ").strip().lower()
except EOFError:
print(" Aborted. Re-run with --yes to confirm non-interactively.")
return

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e9a59de (full rewrite). Path now uses confirm_destructive_action(prompt, yes=...) from mempalace.cli, which already handles EOFError for non-interactive callers — see its existing try: input() / except EOFError: return False shape. --yes short-circuits prompting entirely.

Comment thread mempalace/cli.py Outdated
Comment on lines +346 to +354
# Nuke and rebuild with clean HNSW index
palace_path = palace_path.rstrip(os.sep)
print(" Rebuilding palace...")
shutil.rmtree(palace_path)
os.makedirs(palace_path, mode=0o700)

new_client = chromadb.PersistentClient(path=palace_path)
new_col = new_client.create_collection("mempalace_drawers", metadata={"hnsw:space": "cosine"})

Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deletes the entire palace directory in-place. If the process is interrupted or new_col.add(...) fails mid-rebuild, the user can lose all remaining drawers (not just the purged ones). Safer approach: move the existing palace to a timestamped backup (or rebuild into a temp dir) and only swap/rename into place after the new collection is successfully populated; then optionally delete the backup.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e9a59de. The whole rebuild approach is gone — replaced with collection.delete(where=...) after tracing #521's race to the upsert path, not delete-by-where. No rmtree window, no add() loop to fail mid-flight.

Comment thread mempalace/cli.py Outdated
del col, client

# Nuke and rebuild with clean HNSW index
palace_path = palace_path.rstrip(os.sep)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutil.rmtree(palace_path) will recursively delete whatever path the user supplies. Since palace_path is not normalized/validated (and is later rstrip(os.sep)'d), a typo or unexpected value could wipe an unintended directory. Recommend resolving to abspath(normpath(expanduser(...))), explicitly verifying contains_palace_database(palace_path) (as cmd_repair/migrate do), and refusing to run if the path doesn't look like a MemPalace ChromaDB directory.

Suggested change
palace_path = palace_path.rstrip(os.sep)
palace_path = os.path.abspath(os.path.normpath(os.path.expanduser(palace_path)))
if not palace_path or palace_path == os.path.abspath(os.sep):
raise ValueError(f"Refusing to rebuild unsafe palace path: {palace_path!r}")
if not contains_palace_database(palace_path):
raise ValueError(
f"Refusing to rebuild non-MemPalace directory: {palace_path}"
)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e9a59de. No shutil.rmtree anymore — we route deletes through the existing ChromaBackend collection rather than touching the filesystem directly, so user input never reaches rmtree.

Comment thread mempalace/cli.py Outdated
Comment on lines +271 to +276
import chromadb
import shutil

palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
try:
client = chromadb.PersistentClient(path=palace_path)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cmd_purge opens Chroma via chromadb.PersistentClient directly; this bypasses ChromaBackend's startup fixes (e.g., _fix_blob_seq_ids(...) before client creation) and any future backend-level safety checks/caching logic. To avoid regressions on palaces that require those fixes, consider using ChromaBackend.make_client(...) or ChromaBackend().get_collection(..., create=...) for client/collection creation in this command too.

Suggested change
import chromadb
import shutil
palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
try:
client = chromadb.PersistentClient(path=palace_path)
import shutil
from .chroma_backend import ChromaBackend
palace_path = os.path.expanduser(args.palace) if args.palace else MempalaceConfig().palace_path
try:
client = ChromaBackend.make_client(palace_path)

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in e9a59de. Reuses ChromaBackend.get_collection() so we get the same EF resolution + HNSW thread-pinning + cosine-space metadata as every other write path. The bypass concern was the dominant reason for the rewrite.

Comment thread tests/test_cli.py
Comment on lines +54 to +58
def _make_purge_args(**overrides):
"""Build a Namespace with all purge args set."""
defaults = {"palace": None, "wing": None, "room": None, "yes": True}
defaults.update(overrides)
return argparse.Namespace(**defaults)
Copy link

Copilot AI Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All purge tests set yes=True by default, so the interactive confirmation branch (including the abort path) and the rebuild path (rmtree + re-add) aren't exercised. Adding tests that (1) patch builtins.input with "n" when yes=False and assert no deletion happens, and (2) verify the rebuild flow calls rmtree/create_collection/add when yes=True, would better cover the new destructive behavior.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Collaborator Author

@jphein jphein May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated — you were right that yes=True left the interactive branch uncovered. Pushed 69ba2f0: two new tests in tests/test_cli.pytest_cmd_purge_interactive_abort_leaves_data_intact (yes=False, input() returns "n", asserts no delete) and test_cmd_purge_eof_on_confirm_aborts_safely (yes=False, input() raises EOFError, asserts clean abort). 7/7 purge tests pass locally. Apologies for the earlier overclaim that these existed already.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 21, 2026
…rows where we already have a PR that addresses them

Sweep caught three adjacent issues I hadn't cross-referenced:
- MemPalace#854 (silent_save flag never read) — closed by MemPalace#673; added to row + PR body
- MemPalace#848 (remove drawers from a wing) — closed by MemPalace#1087; added to row + PR body
- MemPalace#390 (default chunk exceeds MiniLM token cap) — addressed by MemPalace#1024
  (configurable unblocks the user without changing default)

Posted explanatory comments on all three issues pointing users to the
relevant PRs. PR bodies updated via REST API (gh pr edit's GraphQL path
was failing on a deprecation warning around projectCards).
jphein added a commit to jphein/mempalace that referenced this pull request Apr 22, 2026
…emPalace#1094 rows + MemPalace#659 CLEAN

Three open-PR rows were missing (cmd_export, cmd_purge, None-metadata
boundary). MemPalace#659 moved from MERGEABLE to CLEAN after the 2026-04-22
rebase onto current develop cleared its stale-merge blocker. Open-PR
count: 4 → 7.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@igorls igorls added enhancement New feature or request area/cli CLI commands labels Apr 24, 2026
Adds a destructive command for removing drawers by wing and/or room.
Implementation uses a nuke-and-reinsert pattern rather than
`collection.delete()` to avoid triggering the hnswlib `updatePoint`
race documented in MemPalace#521.

See PR body for full motivation.
@jphein jphein force-pushed the pr/cmd-purge-cli branch from b230736 to b63f5fa Compare April 24, 2026 21:25
@igorls
Copy link
Copy Markdown
Member

igorls commented Apr 25, 2026

Thanks for tackling #848. A few concerns before this lands:

1. Rebuild drops the embedding function

new_col = new_client.create_collection("mempalace_drawers", metadata={"hnsw:space": "cosine"})

ChromaBackend.create_collection (chroma.py:609-615) resolves an embedding function via _resolve_embedding_function() and passes it through; constructing the collection directly skips that, so re-inserted drawers get embedded under ChromaDB's default model rather than the project's. The vector space no longer matches what searcher.py queries against.

2. No backup before shutil.rmtree

cmd_repair does shutil.copytree(palace_path, palace_path + ".backup") before deletion (cli.py:468-479). Purge skips that and goes straight to rmtree. Any interrupt between rmtree and the end of the re-insert loop loses every kept drawer. CLAUDE.md: "A crash mid-operation must leave the existing palace untouched."

3. Bypasses the backend abstraction

Direct import chromadb + PersistentClient rather than ChromaBackend, unlike cmd_repair / cmd_migrate. Routing through the backend also fixes (1) and replaces del col, client (which relies on refcount finalization) with proper lifecycle handling.

4. Premise vs. #521

The PR says collection.delete(where=...) reaches the same updatePoint / repairConnectionsForUpdate crash. #521's fingerprint and analysis point only at the upsert path, and its proposed fix is collection.delete(where=...) precisely because it bypasses updatePoint. Could you share a crash stack showing repairConnectionsForUpdate on a delete() call? If delete(where=...) is actually safe the implementation collapses to a few lines.

5. Smaller

Addresses @igorls's review on MemPalace#1087:

  1. **Premise was wrong (MemPalace#521 doesn't apply to delete-by-where).** The
     original implementation extracted survivors, rmtree'd the palace,
     and re-inserted into a fresh collection — predicated on the idea
     that collection.delete(where=...) would trigger the same
     updatePoint / repairConnectionsForUpdate race as the upsert path
     in MemPalace#521. It doesn't: chromadb's filter-delete bypasses the HNSW
     update codepath that races. The simpler approach is correct.

  2. **Embedding function preserved.** No more bypassing
     ChromaBackend.create_collection / _resolve_embedding_function —
     we never recreate the collection at all.

  3. **No data loss on interrupt.** No rmtree means no window where a
     crash leaves the palace empty.

  4. **Routes through the backend.** Uses ChromaBackend.get_collection
     instead of `import chromadb; PersistentClient(...)` directly.
     No more `del col, client` reliance on refcount finalization.

  5. **Reuses confirm_destructive_action.** Replaces the custom
     `input(...)` prompt with the migrate.py helper.

End-to-end test added: real chromadb palace, real backend, real delete,
verify count + surviving ids. Catches the embedding-function regression
that was the load-bearing concern in the review.

Closes MemPalace#848 narrowly (wing/room delete). Source-file / query / dry-run
modes from MemPalace#848's broader ask are deliberately NOT in scope here —
filing as follow-ups if there's appetite.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@jphein
Copy link
Copy Markdown
Collaborator Author

jphein commented Apr 26, 2026

@igorls — thanks, all five concerns landed. New commit e9a59de:

(4) Premise — you were right. I traced #521's stack: the race is on updatePoint / repairConnectionsForUpdate, both reached only via the upsert/add path. collection.delete(where=...) skips that entirely. The nuke-and-rebuild was over-engineering. The whole command collapses to a col.delete(where=...) call now.

(1) Embedding function preserved. No create_collection call at all anymore — we keep the existing collection, so ChromaBackend._resolve_embedding_function's output is untouched. The vector space stays consistent with what searcher.py queries against.

(2) No rmtree. Crash mid-operation leaves the palace exactly as it was; the only state change is whatever col.delete(where=...) had already committed, which is a transactional chromadb operation rather than a process-level "deleted everything, now rebuilding" window.

(3) Through the backend. ChromaBackend.get_collection(palace_path, "mempalace_drawers") now, no direct chromadb.PersistentClient. del col, client is gone too.

(5) confirm_destructive_action reused. Imported from migrate.py per your suggestion.

Tests: added test_cmd_purge_deletes_via_where_clause — end-to-end against a real chromadb palace via the backend (no mocks), verifies survivor count + ids after the delete. Catches the embedding-regression class your concern (1) flagged.

Closes #848 scope: narrowed the body to claim wing/room delete only; --source-file, --query, --dry-run from #848 are out of scope for this PR. If there's appetite I can file those as separate follow-ups.

47/47 cli tests pass. Ready for re-review.

jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Addresses @igorls's review on MemPalace#1087:

  1. **Premise was wrong (MemPalace#521 doesn't apply to delete-by-where).** The
     original implementation extracted survivors, rmtree'd the palace,
     and re-inserted into a fresh collection — predicated on the idea
     that collection.delete(where=...) would trigger the same
     updatePoint / repairConnectionsForUpdate race as the upsert path
     in MemPalace#521. It doesn't: chromadb's filter-delete bypasses the HNSW
     update codepath that races. The simpler approach is correct.

  2. **Embedding function preserved.** No more bypassing
     ChromaBackend.create_collection / _resolve_embedding_function —
     we never recreate the collection at all.

  3. **No data loss on interrupt.** No rmtree means no window where a
     crash leaves the palace empty.

  4. **Routes through the backend.** Uses ChromaBackend.get_collection
     instead of `import chromadb; PersistentClient(...)` directly.
     No more `del col, client` reliance on refcount finalization.

  5. **Reuses confirm_destructive_action.** Replaces the custom
     `input(...)` prompt with the migrate.py helper.

End-to-end test added: real chromadb palace, real backend, real delete,
verify count + surviving ids. Catches the embedding-function regression
that was the load-bearing concern in the review.

Closes MemPalace#848 narrowly (wing/room delete). Source-file / query / dry-run
modes from MemPalace#848's broader ask are deliberately NOT in scope here —
filing as follow-ups if there's appetite.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 26, 2026
Fork-main now carries:
- MemPalace#1087 rewrite (cmd_purge: collection.delete(where=...) instead of nuke-and-rebuild) — commit 366a9ad
- MemPalace#1094 cherry-pick (coerce None metadatas at chromadb boundary) — commit 43d728d

YAML manifest gains 2 entries; FORK_CHANGELOG regenerated. README test
count bumped 1372 → 1383. CLAUDE.md gets:
- row updated for MemPalace#1087 (rewrite per @igorls's review)
- new "Closed by jphein-with-triage" subsection noting MemPalace#622 closure
  (architectural concern resolved by MemPalace#673; verified before clicking)

Triage perms make audits load-bearing. New feedback memory:
feedback_audit_comments_with_triage_perms.md — verify PR/commit claims
via gh before posting, gh api PATCH supports edits, track in-comment
promises in scratch/promises.md.

scripts/check-docs.sh ran clean across all 4 stages.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Closes the test-coverage gap noted by @copilot-pull-request-reviewer
on MemPalace#1087: existing purge tests all set yes=True, leaving the
interactive confirmation branch — including the abort path and the
EOFError-on-stdin path that motivated the confirm_destructive_action
delegation in the first place — entirely uncovered.

Two new tests:

  test_cmd_purge_interactive_abort_leaves_data_intact: yes=False,
  patched input() returns "n", asserts col.delete() never fires.

  test_cmd_purge_eof_on_confirm_aborts_safely: yes=False,
  patched input() raises EOFError (non-tty case: piped CI run,
  `< /dev/null`), asserts clean abort instead of crash.

Both go through cmd_purge end-to-end with mocked backend so they
exercise the confirmation branch in cli.py without needing a real
chromadb. 7/7 purge tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/cli CLI commands enhancement New feature or request

Projects

None yet

3 participants