
mempalace repair silently truncates drawers to 10,000 — data loss on palaces > 10K #1208

@ttessarolo

Summary

mempalace repair silently caps recovery at 10,000 drawers and discards everything beyond that. On a palace with 67,580 drawers, repair reported success with no warning and left the palace with exactly 10,000 drawers: a loss of 57,580 (~85%).

Environment

  • MemPalace: 3.3.3 (also reproducible on 3.3.2, see context)
  • Python: 3.x via pipx (~/.local/pipx/venvs/mempalace)
  • ChromaDB: 1.5.8
  • OS: macOS 15 (Darwin 25.3.0), Apple Silicon
  • Palace size: ~619 MB, 67,580 drawers in mempalace_drawers, 499 in mempalace_closets

Steps to Reproduce

  1. Palace with > 10,000 drawers; HNSW for mempalace_drawers corrupted (see "Trigger" below).
  2. mempalace status segfaults (exit 139) — known HNSW/sqlite drift symptom.
  3. Quarantine the drawers HNSW files (data_level0.bin, header.bin, length.bin, link_lists.bin, index_metadata.pickle) so repair itself doesn't segfault; see the sketch after this list.
  4. Run mempalace repair --yes.
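
A minimal sketch of the quarantine in step 3, assuming ChromaDB's standard persistent layout where each vector segment is a UUID-named directory next to chroma.sqlite3; the palace and quarantine paths are illustrative:

import shutil
from pathlib import Path

PALACE = Path.home() / ".mempalace" / "palace"       # illustrative path
QUARANTINE = PALACE.parent / "hnsw-quarantine"
QUARANTINE.mkdir(exist_ok=True)

# Each vector segment is a UUID-named directory holding data_level0.bin,
# header.bin, length.bin, link_lists.bin, index_metadata.pickle. This moves
# every such segment aside; here only the drawers segment was corrupt, so
# you may prefer to target that segment's UUID directory specifically.
for seg_dir in PALACE.iterdir():
    if seg_dir.is_dir() and (seg_dir / "data_level0.bin").exists():
        shutil.move(str(seg_dir), str(QUARANTINE / seg_dir.name))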

Observed Output

=======================================================
MemPalace Repair

Palace: /Users/.../.mempalace/palace
Drawers found: 10000
...
Repair complete. 10000 drawers rebuilt.
Backup saved at /Users/.../.mempalace/palace.backup

Drawers found: 10000 — but the underlying ChromaDB metadata segment held 67,580 embeddings (verified directly in chroma.sqlite3):

SELECT c.name, COUNT(*)
FROM embeddings e
JOIN segments s ON e.segment_id = s.id
JOIN collections c ON s.collection = c.id
GROUP BY c.name;
-- mempalace_closets : 499
-- mempalace_drawers : 67580   <-- before repair
-- mempalace_drawers : 10000   <-- after repair

Expected Behavior

Either:
- Repair must process all drawers (paginated collection.get(limit=N, offset=K) loop), or
- Repair must fail loudly when the source set exceeds the extraction cap, refusing to overwrite.

Silently dropping 85% of memories with a "Repair complete" success message is the worst possible outcome for a tool whose entire job is data preservation.

Root Cause (suspected)

collection.get() in ChromaDB defaults to a 10,000-row limit. The repair extraction path likely calls .get() once without paginating. Same issue would affect any palace > 10K drawers.
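
A quick way to confirm the cap hypothesis against a live palace; a sketch assuming a standard PersistentClient pointed at the palace directory (path elided as in the report):

import chromadb

client = chromadb.PersistentClient(path="/Users/.../.mempalace/palace")
col = client.get_collection("mempalace_drawers")

print(col.count())             # 67580: the true row count
print(len(col.get()["ids"]))   # 10000 if an unpaginated get() is silently capped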

Trigger of the Underlying HNSW Corruption

Pre-existing palace state — exact cause unknown. mempalace status, mempalace repair, and chromadb.Collection.count() on mempalace_drawers all segfault when the corrupt HNSW is loaded. Closets collection
unaffected. The 3.3.2 #1000 "Quarantine stale HNSW" fix did not auto-trigger here; manual quarantine of HNSW files was required just to make repair runnable — at which point the 10K cap surfaced.

Mitigation Used

Restored the pre-repair backup (~/.mempalace.bak-<date>) and recovered the lost 57,580 drawers manually by extracting embedding vectors directly from chroma.sqlite3 and rebuilding the HNSW index without going
through mempalace repair.
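
For reference, the rebuild step looked roughly like the following; a sketch that assumes ids, documents, metadatas, and embedding vectors had already been extracted from chroma.sqlite3 into parallel lists (the extraction itself is version-specific and omitted here; ids, docs, metas, and vecs are hypothetical names, and the destination path is illustrative):

import chromadb

client = chromadb.PersistentClient(path="/Users/.../.mempalace/palace.rebuilt")
dst = client.create_collection("mempalace_drawers")

# ids, docs, metas, vecs: parallel lists recovered from chroma.sqlite3
BATCH = 5000
for i in range(0, len(ids), BATCH):
    dst.add(ids=ids[i:i + BATCH], documents=docs[i:i + BATCH],
            metadatas=metas[i:i + BATCH], embeddings=vecs[i:i + BATCH])

assert dst.count() == len(ids)  # refuse to proceed on a partial rebuild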

Suggested Fix

In the repair extraction loop, paginate:

BATCH = 5000
offset = 0
while True:
    # get() always returns "ids", so an empty list means we've walked past the end
    batch = src.get(limit=BATCH, offset=offset,
                    include=["documents", "metadatas", "embeddings"])
    if not batch["ids"]:
        break
    dst.add(ids=batch["ids"], documents=batch["documents"],
            metadatas=batch["metadatas"], embeddings=batch["embeddings"])
    offset += BATCH

Plus a sanity check before declaring success and before overwriting the live palace: assert that the rebuilt collection's .count() equals the source's .count().
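
A minimal form of that check, using the src/dst names from the loop above:

src_total = src.count()  # authoritative row count from the source collection
# ... paginated copy loop from above runs here ...
assert dst.count() == src_total, (
    f"repair extracted {dst.count()} of {src_total} drawers; refusing to overwrite"
)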

Severity

High. Silent data loss in a tool sold as "verbatim memory, 96.6% R@5". The user's only signal that something had gone wrong was the post-repair status showing a number that looked suspiciously round.
