bug: existing 0.6.x palaces need mempalace repair after #1010 — filtered query(where=...) fails with 'Error finding id' until HNSW index is rebuilt #1035

@potterdigital

Summary

After upgrading an existing 0.6.x palace via the chromadb>=1.5.4,<2 pin landed in #1010, mempalace search (and any code path that calls col.query(query_texts=..., where={...})) fails with:

chromadb.errors.InternalError: Error executing plan: Internal error: Error finding id

…until mempalace repair is run against the upgraded palace. Unfiltered queries (col.query(query_texts=...) with no where) work fine even before the repair.

Because mempalace search --wing X and mempalace search --wing X --room Y always build a where filter (build_where_filter, searcher.py:251), essentially every upgrading user will hit this on their first filtered search. The CLI reports the error as "Search error: …" and exits non-zero, with no hint about running repair.
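For readers unfamiliar with the filter path, here is a minimal sketch of how such a where filter might be assembled. This is not the actual code at searcher.py:251: the function body and the "room" metadata key are assumptions for illustration (only the "wing" key appears in the probe in this report). The `$and` operator is Chroma's documented way to combine several metadata conditions.

```python
from typing import Optional


def build_where_filter(wing: Optional[str] = None,
                       room: Optional[str] = None) -> Optional[dict]:
    """Hypothetical stand-in for the helper referenced at searcher.py:251."""
    clauses = []
    if wing:
        clauses.append({"wing": wing})
    if room:
        clauses.append({"room": room})  # "room" key is an assumption
    if not clauses:
        return None  # unfiltered query: this path works even before repair
    if len(clauses) == 1:
        return clauses[0]
    # Multiple conditions are combined with a logical operator.
    return {"$and": clauses}
```

Any non-None result from a helper like this is what sends the query down the failing filtered path.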

Repro

Starting state: a palace built over a long period on chromadb==0.6.3. Upgrade in place:

pipx inject --force mempalace 'chromadb>=1.5.4,<2'
mempalace migrate --dry-run
# → "Palace is already readable by chromadb 1.5.8. N drawers found. No migration needed."

mempalace search "encryption" --wing my_project
#   Search error: Error executing plan: Internal error: Error finding id

mempalace search "encryption"                 # no filter
#   Returns results correctly.

mempalace repair --yes
#   Re-files all N drawers, backs up to palace.backup, swaps in place.

mempalace search "encryption" --wing my_project
#   Works.

Direct chromadb probe (confirms it's the filter path, not the palace contents)

import chromadb
c = chromadb.PersistentClient(path='/Users/.../palace')   # palace built on 0.6.3
col = c.get_collection('mempalace_drawers')

col.count()                                                # 36621 ✅
col.get(limit=5, include=['metadatas'])                    # ✅
col.query(query_texts=['encryption'], n_results=3)         # ✅
col.query(query_texts=['encryption'], n_results=3,
          where={'wing': 'my_project'})
# → InternalError: Error executing plan: Internal error: Error finding id
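The probe above could be wrapped into a small check that a CLI or startup path runs once. The sketch below is an assumption about how that might look, not existing mempalace code: `filtered_query_is_broken` is a hypothetical name, and it matches on the error message text from this report rather than on a specific exception class, so it works with any object exposing a chromadb-style `query` method.

```python
def filtered_query_is_broken(col, where: dict, probe_text: str = "probe") -> bool:
    """Return True if a filtered query fails with the 'Error finding id' signature.

    `col` is expected to behave like a chromadb collection (hypothetical helper).
    """
    try:
        col.query(query_texts=[probe_text], n_results=1, where=where)
    except Exception as exc:  # in this report: chromadb.errors.InternalError
        if "Error finding id" in str(exc):
            return True
        raise  # unrelated failure: do not mask it
    return False
```

Re-raising anything that does not match the signature keeps the check from hiding unrelated problems such as a missing collection.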

The SQLite source of truth (chroma.sqlite3) is complete (74,911 embedding rows, 38,762 unique embedding_ids, 36,621 live drawers reported by col.count()). It's the old 0.6.x HNSW index that chromadb 1.5.x's Rust query path can't combine with a metadata filter.
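The row counts quoted above can be reproduced with a read-only query against chroma.sqlite3. This sketch assumes the table is named `embeddings` and has an `embedding_id` column, which matches the terms used in this report but should be verified against the actual schema before relying on it.

```python
import sqlite3


def embedding_counts(conn: sqlite3.Connection) -> tuple[int, int]:
    """Return (total rows, distinct embedding_ids) from the embeddings table.

    Table and column names are assumptions about Chroma's SQLite schema.
    """
    row = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT embedding_id) FROM embeddings"
    ).fetchone()
    return row[0], row[1]


# Usage (open read-only so the palace is never modified; path is a placeholder):
# conn = sqlite3.connect("file:/path/to/palace/chroma.sqlite3?mode=ro", uri=True)
# total_rows, unique_ids = embedding_counts(conn)
```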

mempalace repair fixes it because it rebuilds the vector index from SQLite:

Drawers found: 36621
Extracting drawers...
Extracted 36621 drawers
Backing up to /Users/.../palace.backup...
Rebuilding collection...
Re-filed 5000/36621 drawers...
...
Repair complete. 36621 drawers rebuilt.
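The log suggests drawers are re-filed in batches of 5000. A rough sketch of that kind of batched rebuild is shown below. It is not mempalace's repair implementation: `refile` is a hypothetical helper, `src` and `dst` are assumed to be chromadb-style collections, and this version re-adds documents (so the destination recomputes embeddings), which may differ from what repair actually does.

```python
def refile(src, dst, batch_size: int = 5000) -> int:
    """Copy every record from src into dst in fixed-size batches.

    Returns the number of records written. Hypothetical sketch only.
    """
    data = src.get(include=["documents", "metadatas"])
    ids = data["ids"]
    docs = data["documents"]
    metas = data["metadatas"]

    done = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        # Adding to a fresh collection is what builds a new vector index.
        dst.add(ids=ids[start:end],
                documents=docs[start:end],
                metadatas=metas[start:end])
        done += len(ids[start:end])
    return done
```

Batching keeps each add call bounded in size; the backup and in-place swap steps from the log are outside the scope of this sketch.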

Why this wasn't caught in #1010's validation

The 50K-drawer migration test referenced in #1010 validated count() and unfiltered reads, which matches the maintainer claim of "zero API breakage". It looks like no filtered query(where={...}) was exercised against a 0.6.x-built HNSW index. Fresh palaces created under 1.5.x don't have this problem — it only affects users with pre-existing 0.6.x palaces.

Suggested fix

One of:

  1. Auto-repair on version mismatch. ChromaBackend._client() already runs _fix_blob_seq_ids preemptively (backends/chroma.py:14). A similar "if we detect a 0.6.x-built HNSW in the palace dir and chromadb is 1.5.x, run repair before returning the client" check would make the upgrade seamless.
  2. Release-note warning. Release notes for the version that ships the 1.5.x pin should tell existing users to run mempalace repair once after upgrade. Ideally the CLI also detects the mismatch and prints a hint on the first filtered-query failure.
  3. Run repair from mempalace migrate when source_version is 0.6.x and the target is 1.5+. Today migrate returns early when chromadb can read the palace ("already readable"), but readable does not mean filter-queryable against the old index. Extending migrate to run the repair pass in that case would keep a single user-facing upgrade command.

Option 1 is the most user-friendly; option 3 is the most consistent with the existing migrate flow.
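Options 1 and 3 both hinge on the same decision: was the palace built by 0.6.x, and is the installed chromadb 1.5 or later? A minimal sketch of that check follows. The function names are hypothetical, and how the source version is detected from the palace directory is left open here; the sketch only assumes both versions are available as strings with at least a major and minor component.

```python
def _major_minor(version: str) -> tuple[int, int]:
    """Parse 'X.Y[.Z...]' into (X, Y). Assumes at least two numeric components."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def needs_index_rebuild(source_version: str, target_version: str) -> bool:
    """True when a 0.6.x-built palace is opened by chromadb 1.5 or later."""
    return (_major_minor(source_version) == (0, 6)
            and _major_minor(target_version) >= (1, 5))
```

With a check like this, `ChromaBackend._client()` or `mempalace migrate` could trigger the repair pass, or at least print a hint, only for the affected upgrade path.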

Environment

  • mempalace: 3.3.1 (pipx)
  • chromadb: post-upgrade 1.5.8 (source: 0.6.3)
  • Python: 3.12.7
  • macOS arm64 (Darwin 25.3.0)
  • Palace size: 36,621 drawers, 525 MB chroma.sqlite3

Happy to help

Can put up a PR for option 1 or option 3 if the maintainers have a preferred direction.

Labels: bug (Something isn't working), storage
