Skip to content

fix(search): hint at mempalace repair when filtered query hits HNSW index mismatch (#1035)#1081

Open
potterdigital wants to merge 2 commits intoMemPalace:developfrom
potterdigital:fix/1035-hnsw-repair-hint
Open

fix(search): hint at mempalace repair when filtered query hits HNSW index mismatch (#1035)#1081
potterdigital wants to merge 2 commits intoMemPalace:developfrom
potterdigital:fix/1035-hnsw-repair-hint

Conversation

@potterdigital
Copy link
Copy Markdown

@potterdigital potterdigital commented Apr 21, 2026

Summary

Fixes #1035. When a palace's HNSW index predates 1.0 (i.e. built under
chromadb 0.6.x), a filtered search on 1.5.x raises an opaque
ValueError: Error finding id .... The repair command already handles
the actual upgrade (mempalace repair), but users upgrading in place
have no way to discover that from the error alone.

This PR catches the exception in both query paths in searcher.py,
detects the specific signature (a where filter was in play and
the exception message contains "Error finding id"), and surfaces a
hint pointing at mempalace repair before re-raising (CLI) or
returning the error dict (API / MCP). The no-palace path already pairs
error + hint in the returned dict, so the API branch mirrors that
shape.

Changes

  • mempalace/searcher.py — detect-and-hint inside the existing
    except blocks in search() (CLI) and search_memories() (API),
    guarded by where and "Error finding id" in str(e)
  • CHANGELOG.md — one-line entry under ### Bug Fixes in the
    [Unreleased] — v3.3.0 section
  • tests/test_searcher.py::TestHNSWRepairHint — three focused tests:
    CLI positive, API positive, and a negative-guard test to ensure
    unrelated failures don't get a misleading repair suggestion

All 1060 tests in the full suite pass. ruff check and
ruff format --check both clean on the touched files.

Scope note for the maintainer

In the issue thread, #851 and #1016 were mentioned as related —
happy to be corrected, but my read is those address SQLite parameter
limits in col.get() pagination, which is a different code path from
the HNSW col.query() failure this PR addresses. If you'd like this
to also cover the pagination path, I'm glad to extend the PR; I kept
it narrow since your earlier reply greenlit a ~5-line detect-and-hint
specifically against the 0.6.x→1.5.x HNSW failure.

Known related gap

The mempalace repair command this hint points at currently
segfaults on a sufficiently corrupt HNSW index — repair calls
col.count() before doing any repair work, and that's the call that
crashes on drift. That's tracked separately in #1108 (wiring
quarantine_stale_hnsw() from #1000 into the backend open path). On
palaces where the index is merely stale (not corrupt enough to crash
count()), repair works as intended and this hint is actionable;
#1108 closes the remaining case.

Test plan

  • pytest tests/test_searcher.py -v — 24 passed (3 new)
  • pytest -x — 1060 passed
  • ruff check mempalace/searcher.py tests/test_searcher.py
  • ruff format --check mempalace/searcher.py tests/test_searcher.py

… index mismatch (MemPalace#1035)

When a pre-1.0 palace is queried with a `where` filter, ChromaDB raises
an opaque `Error finding id ...` and the user has no way to discover
that `mempalace repair` rebuilds the HNSW index. Detect that exact
signature (filter in play + `Error finding id` in the message) in both
the CLI `search()` and the API `search_memories()` paths and surface a
hint pointing at the repair command before re-raising / returning the
error dict.

Guarded so unrelated query failures (no filter, or different error
text) do not get a misleading repair suggestion. Three new tests in
tests/test_searcher.py::TestHNSWRepairHint cover the CLI positive,
API positive, and negative-guard paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown
Collaborator

@jphein jphein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed on review/1081 branch against upstream/develop:

  • All three TestHNSWRepairHint tests pass locally (pytest tests/test_searcher.py::TestHNSWRepairHint -v).
  • Full tests/test_searcher.py suite: 24 passed.
  • Diff reads clean — guard where and "Error finding id" in str(e) is correctly narrow; CLI prints hint before re-raising, API path returns it as a first-class hint key so MCP-server consumers don't have to parse stderr.

On the live-data smoke: my own palace (1.5.x, 165K drawers) doesn't reproduce the stale-HNSW failure mode the hint targets, so the hint is unreachable on my corpus by design — I can't add a "confirmed against real palace" data point the way I had hoped. Left the review as tests-and-diff-only for that reason; no changes requested.

@Qodo-Free-For-OSS
Copy link
Copy Markdown

Hi, Both the CLI and API/MCP hint text tells users to run mempalace repair <palace_path> but never substitutes the actual palace_path, reducing the hint’s usefulness even though the value is available in scope.

Severity: remediation recommended | Category: correctness

How to fix: Interpolate palace_path in hint

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

The repair hint currently includes a literal <palace_path> placeholder in both CLI output and the API return dict, even though palace_path is available.

Issue Context

This hint is shown specifically to help users upgrade pre-1.0 palaces. Providing the concrete path makes the suggested command directly copy/pastable.

Fix Focus Areas

  • mempalace/searcher.py[264-271]
  • mempalace/searcher.py[359-365]

Suggested change

  • Update the printed/returned hint to use an f-string, e.g. mempalace repair {palace_path} (optionally os.path.expanduser(palace_path) if you want it normalized).

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Found by Qodo code review

@igorls igorls added bug Something isn't working area/search Search and retrieval labels Apr 24, 2026
Address Qodo review on MemPalace#1081: the hint string contained a literal
`<palace_path>` placeholder in both the CLI print and the API dict,
even though `palace_path` is in scope at both sites. Use an f-string
so the suggested `mempalace repair <path>` command is directly
copy-pasteable.

Also strengthens both positive tests in `TestHNSWRepairHint` to
assert the interpolated path appears AND the literal `<palace_path>`
placeholder does not — guards against the same mistake in future
edits to the hint text.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@potterdigital
Copy link
Copy Markdown
Author

Fixed in 54029c7 — both hint strings now use an f-string to interpolate palace_path, so the suggested command is directly copy-pasteable. Both positive tests in TestHNSWRepairHint were strengthened to assert the interpolated path appears and that the literal <palace_path> placeholder does not, guarding against the same regression in future edits.

We noticed a couple of other issues in this PR as well - happy to share if helpful.

@qodo-ai-reviewer yes please — go ahead and surface the others.

Copy link
Copy Markdown
Collaborator

@jphein jphein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed 54029c7 against the 2026-04-21 approval baseline.

The change is small (2 lines in searcher.py, strengthened tests in test_searcher.py): replaces the literal <palace_path> placeholder in both hint sites with f-string interpolation, so the suggested command is directly copy-pasteable. Both positive tests in TestHNSWRepairHint now assert both (a) the interpolated path appears and (b) the literal <palace_path> doesn't leak — good regression guard for future edits to the hint text.

No changes to the guard logic (where and "Error finding id" in str(e)) I approved earlier. Re-approving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/search Search and retrieval bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: existing 0.6.x palaces need mempalace repair after #1010 — filtered query(where=...) fails with 'Error finding id' until HNSW index is rebuilt

4 participants