feat(searcher): warnings + sqlite BM25 top-up when vector underdelivers #1005
jphein wants to merge 2 commits into MemPalace:develop
Conversation
Pull request overview
This PR updates MemPalace search behavior to avoid silent underdelivery when ChromaDB’s vector index fails or returns too few hits by surfacing warnings and optionally topping up results from SQLite using BM25 keyword ranking. It also routes the CLI search() through search_memories() so both programmatic and CLI callers share the same fallback/warning behavior.
Changes:
- Make `search_memories()` degrade vector query failures into `"warnings"` instead of returning an `"error"` (except for "No palace found").
- Add SQLite scope accounting + a BM25 top-up fallback when vector results underdeliver and `max_distance` filtering is disabled.
- Update CLI `search()` to delegate to `search_memories()` and print warnings/scope information; update tests for the new contract.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `mempalace/searcher.py` | Adds warnings + SQLite/BM25 fallback logic and unifies CLI and API search behavior via `search_memories()`. |
| `tests/test_searcher.py` | Updates/extends tests to validate warning surfacing and fallback behavior instead of hard failures. |
```python
try:
    pool_kwargs: dict = {"include": ["documents", "metadatas"]}
    if where:
        pool_kwargs["where"] = where
    # Cap the pool — sqlite-variable limits (#950) bite around 32k, and
    # a fallback pool 20x the request is plenty for BM25 ranking.
    pool_kwargs["limit"] = max(n_results * 20, 100)
    pool = drawers_col.get(**pool_kwargs)
except Exception as e:
    warnings.append(f"sqlite fallback unavailable: {e}")
    return available_in_scope, warnings

pool_docs = pool.get("documents") or []
pool_metas = pool.get("metadatas") or []
available_in_scope = len(pool_docs)
```
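For context on the ranking step this excerpt feeds, here is a rough pure-Python sketch of BM25 scoring over such a pool (illustrative only; the PR's actual scorer, tokenizer, and tie-breaking are not shown in this excerpt):

```python
import math
import re

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score docs against query terms with classic Okapi BM25.

    Returns (doc_index, score) pairs sorted best-first; documents
    scoring 0 are dropped, mirroring the rule that the fallback
    never pads results with unrelated content.
    """
    tokenize = lambda s: re.findall(r"\w+", s.lower())
    toks = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in toks) / max(len(toks), 1)
    n = len(docs)
    ranked = []
    for i, t in enumerate(toks):
        score = 0.0
        for term in set(tokenize(query)):
            tf = t.count(term)
            if tf == 0:
                continue
            df = sum(1 for other in toks if term in other)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        if score > 0:
            ranked.append((i, score))
    return sorted(ranked, key=lambda pair: -pair[1])
```

With a query like "federation test" over a pool of drawer documents, only drawers containing at least one query term survive, and a drawer matching both terms outranks one matching a single term.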
available_in_scope is computed as len(pool_docs) after a get(limit=max(n_results*20, 100)), so it will never exceed that limit and is not an authoritative “scope count”. This makes the CLI scope line and the later “unreachable via HNSW” warning potentially misleading on large palaces. Consider computing the full in-scope count separately (e.g., drawers_col.count(where=where) if safe, or paginating get(include=["ids"], limit=..., offset=...) to count) and keep the limited get(include=["documents","metadatas"]) only for BM25 fallback candidates.
Addressed in commit d0f4e32 (squashed rebase of the original b4cbdfd). _count_in_scope() now paginates col.get(include=[], limit=5000, offset=K) to compute an authoritative scope total — independent of the (capped) fallback pool — so available_in_scope reflects sqlite truth, not the pool ceiling. Pagination keeps each query well under #950's SQL-variable limit.
```python
# Surface unreachable data: scope has more matches than we could rank.
if available_in_scope is not None and available_in_scope > len(hits) and len(hits) < n_results:
    warnings.append(
        f"{available_in_scope} drawers match this scope in sqlite; "
        f"{len(hits)} ranked — the rest are unreachable via the current "
        f"HNSW index. Run `mempalace repair` to rebuild."
    )
```
The “Run mempalace repair” warning is currently gated on len(hits) < n_results. If the sqlite BM25 fallback fills the request to n_results, this warning won’t surface even when available_in_scope > len(hits) (i.e., lots of drawers exist in-scope but couldn’t be vector-ranked). If the intended contract is “warn whenever scope > returned”, drop the len(hits) < n_results condition (or otherwise key it off vector underdelivery rather than the final hit count).
Addressed in d0f4e32. The "Run `mempalace repair`" warning now gates off a vector_underdelivered flag captured before the BM25 fallback runs, instead of len(hits) < n_results. So when sqlite top-up papers over a degraded vector path, the user still sees that the vector index is the under-delivering layer.
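A hypothetical sketch of why the flag must be captured before the top-up mutates the hit list (names are illustrative, not the committed code):

```python
def rank_with_fallback(vector_hits, n_results, bm25_candidates):
    """Top up vector hits from BM25 candidates, but snapshot the
    vector layer's shortfall BEFORE the list is mutated."""
    hits = list(vector_hits)
    # Load-bearing ordering: read the flag before the top-up below,
    # or a fully filled list would mask the degraded vector layer.
    vector_underdelivered = len(hits) < n_results
    if vector_underdelivered:
        hits.extend(bm25_candidates[: n_results - len(hits)])
    warnings = []
    if vector_underdelivered:
        warnings.append(
            "vector index under-delivered; run `mempalace repair` to rebuild"
        )
    return hits, warnings
```

If the flag were computed from the final hit count instead, a successful BM25 fill would silently suppress the repair hint.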
```python
print(f"\n {result['error']}")
if "hint" in result:
    print(f" {result['hint']}")
raise SearchError(result["error"])
```
CLI search() used to print the missing-palace path (No palace found at <path>). After delegating to search_memories(), it now prints only No palace found (no path), which is less actionable when multiple palaces/paths are involved. Consider including palace_path in the printed message (either by enriching the search_memories error payload or formatting it in search()).
```diff
-print(f"\n {result['error']}")
-if "hint" in result:
-    print(f" {result['hint']}")
-raise SearchError(result["error"])
+error_message = result["error"]
+if error_message == "No palace found":
+    error_message = f"{error_message} at {palace_path}"
+print(f"\n {error_message}")
+if "hint" in result:
+    print(f" {result['hint']}")
+raise SearchError(error_message)
```
Addressed in d0f4e32. CLI search() now preserves the palace path explicitly when printing the error: structured payload from search_memories stays path-agnostic (right for the API contract), but the CLI prepends at <palace_path> so users still see which palace failed to open.
Three corrections from @copilot-pull-request-reviewer:

1. `available_in_scope` was capped at `n_results * 20` via the pool `get()`, so the "sqlite-authoritative" framing was misleading on large palaces (would report at most 100 even when 5000+ drawers matched). Split the count out of the pool fetch: new `_count_in_scope()` paginates `col.get(include=[], limit=5000, offset=K)` to sum an authoritative total. Pagination keeps each query well under MemPalace#950's SQL-variable limit.
2. The "Run mempalace repair" warning was gated on `len(hits) < n_results`, so it never surfaced when the BM25 fallback filled the request — even though the vector index was the actual degraded layer. Added a `vector_underdelivered` flag captured before fallback runs; the warning now gates off that flag so users see "HNSW can't reach the rest" whenever the vector path fell short, regardless of how much BM25 papered over.
3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor to delegate through `search_memories()`. Preserve it explicitly in `search()` — the structured error payload stays path-agnostic, but the CLI adds "at <palace_path>" when printing so users still see which palace failed to open.

Also updated the `_sqlite_fallback_and_scope()` signature to take `vector_underdelivered` separately from `allow_fallback`, matching what the two warnings now need to distinguish.
README:
- Fork-changes table: expand the None-metadata row to cover all 8 sites (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server handlers). The previous row only called out the CLI print path.
- Add a new Search row: the warnings + sqlite BM25 top-up contract (the "never silent miss" feature) with a pointer to MemPalace#951 + MemPalace#823.
- Open-PR table: expand the MemPalace#999 scope line to mention 8 sites + architectural note, update MemPalace#1000 to reflect the post-MemPalace#995 rebase, add MemPalace#1005 with Copilot fixes noted.

CLAUDE.md:
- PR status header: 7 open -> 8 open (adds MemPalace#1005).
- Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005.
- Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25 feature; keep 12 and 13 as-is.

42 README-claim tests still pass.
Concrete evidence:
- New "What this looks like in practice" section right after the status line, showing a real stop-hook systemMessage output and a real mempalace_search response shape (warnings, available_in_scope, matched_via tags, similarity scores). Someone evaluating the fork can see what "running in production" actually looks like.

Headlines box:
- New "Headlines" subsection at the top of Fork Changes with the three differentiators someone should know if they only read one section — silent-save hook, ChromaDB 1.5.x hardening (quarantine + None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/MemPalace#1000/MemPalace#1005 so readers can jump to the work itself.

Citations for comparison table:
- Every row now links to its upstream repo: Hindsight (vectorize-io), Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai), engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since they've added MCP support since we first wrote the row.
- Replaced the "Systems mentioned without captured primary URLs" footnote (now stale since we have them) with a "Verification note" that's honest about the point-in-time nature of these columns and explains why TagMem is absent.

Structure cleanups:
- Removed the upstream MemPalace logo at the top — it's milla-jovovich's asset and using it in a fork README is awkward.
- Renamed "Roadmap" → "Planned work" — the section is decisive with priorities and time estimates; "Roadmap" was underselling it.

No content removed from the document beyond the stale footnote and the upstream logo. All existing sections intact. 42 README-claim tests pass.
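To make the "response shape" bullet concrete, here is a hypothetical payload of the kind that section could show (the field names `warnings`, `available_in_scope`, `matched_via`, and `similarity` come from this PR; every value and the `results` envelope are invented for illustration):

```python
# Hypothetical search response illustrating the new fields; all
# concrete values (and the "results" envelope itself) are invented.
result = {
    "results": [
        {"id": "drawer-001", "document": "stop-hook design notes",
         "similarity": 0.82, "matched_via": "vector"},
        {"id": "drawer-007", "document": "federation test harness",
         "similarity": None,  # BM25 top-ups carry no vector score
         "matched_via": "sqlite_bm25_fallback"},
    ],
    "available_in_scope": 2280,
    "warnings": [
        "vector search unavailable: Error executing plan: "
        "Internal error: Error finding id",
    ],
}
```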
Every remaining row in "Still ahead of upstream" now carries a status so the reader can tell at a glance whether each change is being upstreamed, pending a PR, or deliberately fork-local.

Dropped:
- "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was overstated. The memory file for today's debugging notes that the try-get/except-create pattern is defensive code that never reproduced a specific crash (the actual crashes traced to HNSW drift). Leaving it in "Still ahead" implied an upstream-candidate fix, which it isn't. The code stays in place as defensive, but the README no longer claims it as a fork-ahead feature.

Moved to Superseded:
- "Stale HNSW mtime detection + mempalace_reconnect" — upstream took a different approach in MemPalace#757. Our broader inode+mtime detection and the mempalace_reconnect MCP tool remain as fork-local convenience; they're just not "ahead of upstream" anymore.

Statuses now populated:
- Linked PR numbers for the 7 changes with active upstream PRs (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005).
- "PR pending" for 3 items that are good candidates but unfiled: epsilon mtime comparison, max_distance parameter, tool output mining.
- "fork-only" for 2 items we keep intentionally without pitching upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined (complementary to upstream's MemPalace#784 file-locking).

A legend sentence added above the table explains the three status values. 42 README-claim tests pass.
Dialectician left a comment
Read the diff end to end, ran the full test suite on your main + this branch (998/998 pass in 40.6s on ChromaDB 1.5.8). Think this lands well. Two things to call out, neither a blocker.
Gate semantics on allow_fallback = (max_distance <= 0.0)
search_memories defaults max_distance = 0.0, so CLI callers (no override) get the BM25 fallback. But tool_search in mcp_server.py defaults to max_distance = 1.5 and always passes it through, so the MCP path evaluates 1.5 <= 0.0 to False and fallback is disabled by default. If an agent hits a drifted palace via the MCP tool, they get exactly the silent underdelivery this PR is designed to prevent, because the gate conflates "permissive default" with "strict caller-set threshold."
Might be intentional if you are scoping to CLI for now. If not, cleaner gate is Optional[float] with None meaning "allow fallback" and any explicit value meaning "respect strict mode." Happy to follow up with a small PR if you confirm which way.
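A sketch of the Optional[float] gate being proposed (the helper name and signature are hypothetical; the point is distinguishing "caller said nothing" from "caller set a threshold"):

```python
from typing import Optional

def allow_bm25_fallback(max_distance: Optional[float]) -> bool:
    """None: the caller expressed no similarity preference, so the
    BM25 top-up is allowed. Any explicit float: strict mode, where
    the caller's threshold is respected and fallback stays off."""
    return max_distance is None
```

Under this scheme the CLI would pass None by default (fallback on), while tool_search's explicit max_distance=1.5 would read as strict, and even an explicit 0.0 would no longer be mistaken for "permissive."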
_count_in_scope on unfiltered scopes
When both wing and room are None, where is None and the function paginates all drawers to count. At 137K with PAGE=5000 that is ~28 round trips per search just for the scope count. drawers_col.count() does this O(1) when no filter is needed.
Also worth noting: the pre-mutation vector_hit_count capture (so the scope warning still fires after BM25 top-up) is load-bearing subtlety. Worth a comment if you have not already lost patience with refactor resistance.
Peace B.Sweet
Dialectician left a comment
Workload validation from a real palace (2487 drawers, 10 wings, active production use as memory backing for an agent stack). This palace has HNSW drift, every filtered-by-wing query hits the planner error. Exact reproduction of the scenario this PR addresses.
Three CLI queries against --wing <real-wing-id> with keywords "federation", "test", "memory":
Before your fork (mempalace 3.3.0, current installed runtime):
Search error: Search error: Error executing plan: Internal error: Error finding id
Hard SearchError. Caller sees nothing.
With your fork:
- Warning: `vector search unavailable: Error executing plan: Internal error: Error finding id`
- Scope count: `Scope has: 22 drawers matching filter` / `2280 drawers matching filter`
- BM25 top-up: `vector search returned 0 of 3 requested; filled N from sqlite+BM25 keyword match`
- Actionable hint: `Run mempalace repair to rebuild the HNSW index for full semantic recall.`
Every contract promise in the PR description fires as advertised on real data. The vector_underdelivered pre-mutation capture also proved load-bearing: in one query the BM25 fallback filled 3 of 3 requested and the "2280 drawers in scope, vector ranked 0" warning still surfaced, which is the correct signal (vector is still the degraded layer even though the final hit count is fine).
Separately: the gate-semantics note from my earlier review (CLI default max_distance=0.0 enables BM25 fallback, MCP default 1.5 disables it) is more significant than I initially weighted it. The MCP path, which is the primary access pattern for agents querying this palace, would see "0 hits" silently on this same failure rather than "0 hits + warning + BM25 fill." For a drifted palace being queried by Claude through MCP, the agent would conclude the memory is empty when it is only the ranking layer that is broken. Happy to send the Optional[float] gate fix as a small follow-up PR if you greenlight.
Peace B.Sweet
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28 paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the MCP path: tool_search passes max_distance=1.5, which the old gate read as "strict" and silently skipped fallback — so agents on a drifted palace saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
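The revised gate described above can be sketched as follows (names assumed; this mirrors the stated semantics rather than the literal diff):

```python
def allow_fallback(max_distance: float, warnings: list) -> bool:
    """Fire the BM25 top-up when the caller is permissive
    (max_distance <= 0.0, i.e. no strict similarity cutoff) OR when
    the vector layer already errored (warnings non-empty), so an MCP
    caller passing max_distance=1.5 still gets recovery on a drifted
    palace instead of a silent zero-hit result."""
    return max_distance <= 0.0 or bool(warnings)
```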
Addressed both Dialectician review points.

Gate semantics: Replaced the strict `max_distance <= 0.0` gate — the BM25 fallback now also fires when a vector error occurred (warnings non-empty), so the MCP path gets fallback coverage plus a warning instead of a silent 0-hit result.

Also added the comment on the `vector_hit_count` pre-mutation capture.
Thanks Dialectician — really appreciate the thorough review. Running the full suite + validating on a live drifted palace is exactly the kind of signal that's hard to fake, and the pre-mutation capture proving load-bearing on real data is the confirmation I was hoping for.

Hoping the fixes hold up as well on your palace as they did in the test run. On the three Copilot inline comments:

CLI missing palace path: The CLI now prepends "at <palace_path>" when printing; the structured payload from search_memories stays path-agnostic.
Pulled b9a6585, ran the test suite (974 pass, 106 deselected), and spot-checked the runtime against the repaired palace.

Good trade. LGTM on this round. Peace B.Sweet
Heads-up on 7c8bb59: one test regression from the `_count_in_scope` change. The test mocks the collection without an explicit `count.return_value`, so the new `col.count()` call returns a MagicMock and the integer comparison fails.

Fix is one line, same pattern you applied to the other four tests in b9a6585:

```python
mock_col = MagicMock()
mock_col.count.return_value = 2  # any int; test does not otherwise care
mock_col.query.return_value = {...}
```

(Runtime behavior is unaffected because real ChromaDB collections return an int from `count()`.)

Peace B.Sweet
Thanks @Dialectician — verified green. The regression at 7c8bb59 was the test-only mock missing `count.return_value`; fixed with the one-line pattern you suggested.
Two new scripts plus legitimate fixes they surfaced.

scripts/preflight.sh — runs the same checks CI does (ruff check, ruff format --check, pytest). Prevents the class of "I ran ruff format locally but CI runs ruff check too" bugs we hit on PR MemPalace#1024 and PR MemPalace#1198 today.

scripts/rebase-on-develop.sh — rebases a fork PR branch onto upstream/develop, then auto-restores fork-only collateral (.claude-plugin/hooks/*.sh, .claude-plugin/.mcp.json) to upstream's state. The collateral cleanup was the recurring step I had to remember manually for each of MemPalace#1005/MemPalace#1024/MemPalace#1177; this codifies it. Supports --finish for the "after I resolved conflicts manually" continuation path. Never auto-pushes — prints the push command for confirmation.

Lint debt the new preflight caught on main:
- tests/test_palace_graph.py: F811 — `invalidate_graph_cache` was imported at line 9, then re-imported inside the chromadb-mock patch.dict block at line 44. Removed the duplicate; the line-9 import is sufficient since the autouse fixture at line 12 already uses it.
- mempalace/backends/chroma.py: ruff format wanted a one-line `collection.modify(configuration=...)` call instead of the wrapped multi-line form.
- mempalace/hooks_cli.py: ruff format wanted a blank line after the import in the daemon-URL try block, and dict-literal style for the json.dumps call.

CI on jphein/mempalace runs both ruff check and ruff format --check; without these fixes the next push to main would have shown red.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Posting an update now that #1306 has landed. The two PRs address overlapping but distinct failure modes, and they aren't strictly redundant.

The observability slice (warnings + `available_in_scope` + the BM25 top-up contract) is what this PR still adds on top. Three paths I'd be happy with — your call.

The current branch is several weeks pre-#1306, so any rebase will be a clean re-apply on top of the new searcher.py — no merge churn for you to read through. Happy to do whichever you prefer.
When the vector index returns fewer than n_results (sparse HNSW post-repair, MemPalace#951 filter-planner failure, drift), search_memories now:

1. Computes an authoritative scope count via paginated col.get(), surfaced as `available_in_scope` in the response. Caps each query below MemPalace#950's SQL-variable limit.
2. Tops up the hits list with BM25-ranked sqlite candidates tagged `matched_via: "sqlite_bm25_fallback"` when the vector path is under-delivering. Skips candidates with BM25 score 0 so the fallback never pads with unrelated content.
3. Returns `warnings: [...]` describing when fallback fired and when the scope contains more drawers than the vector path can rank (gated on a `vector_underdelivered` flag captured before fallback runs, so the warning surfaces even when BM25 papered over the gap).

CLI search() delegates to search_memories() so terminal output and MCP responses share the same retrieval, fallback, and warning semantics. Preserves the palace path in printed errors.

Closes the silent 0-hit failure mode where data was in sqlite but the vector path returned nothing — visible to the user via warnings and `available_in_scope`, fixable via `mempalace repair`.

Tests: 29/29 pass on rebased branch (Python 3.9 floor honored via Optional[int]). Mock setup updated to set count.return_value so the new "more in scope" warning path doesn't fail on MagicMock comparison.

Squashed rebase against current upstream/develop (post-MemPalace#1377). Was filed as 5-commit history; squashed for cleaner review.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Rebased against current upstream/develop and squashed. Local: 1555 passed, 0 failed (Python 3.12, full suite). CI status will populate shortly. Mergeable now. Ready when you are.
…noqa The squash in d0f4e32 pushed search_memories one branch over ruff's C901 threshold (26 > 25). The function is doing meaningful work — dispatch across vector / BM25-only / candidate-strategy / sqlite- fallback / warning paths, each of which is one branch — so an extraction here would either thrash the diff or re-run the A/B benchmarks that gated the original split decisions. Adding a noqa with intent annotated is the smaller, lower-risk fix; an extraction can be a follow-up if maintainers prefer.
Summary
When ChromaDB's HNSW index is sparse, drifted, or rejects a filter (#951's `Error finding id`), today's `search_memories` returns fewer hits than the palace actually contains — silently. A user querying for drawers that exist in sqlite sees an empty or partial result and concludes the palace has forgotten, when only the vector ranking layer is degraded. Silent hit-miss is worse than a crash because callers can't detect it. This is especially painful:

- after running `mempalace repair` while HNSW is still rebuilding
- when `sync_threshold` has left on-disk HNSW lagging behind sqlite

Contract change
`search_memories()` no longer hard-fails on vector errors — it always returns a result (except `"error": "No palace found"`). The return dict gains two fields: `warnings` and `available_in_scope`.

Behavior:

- On a vector query error, append a warning (`"vector search unavailable: <err>"`) and continue — do not raise.
- When `len(hits) < n_results` and the caller did NOT set `max_distance` (strict-similarity mode), top up from the sqlite pool via BM25 keyword ranking.
- Fallback hits carry `matched_via="sqlite_bm25_fallback"`; their `distance`/`similarity` are `None` since there's no vector score.
- When the scope holds more drawers than could be ranked, warn and point at `mempalace repair`.

CLI `search()` now delegates to `search_memories` so both paths share the fallback and warning surface. Warnings print with a `!` prefix above the results.

Relation to #951 and #823
- fix: degrade gracefully when filtered Chroma search fails (#951) catches `Error finding id` at the query layer and falls back to `col.get` with local lexical ranking. This PR widens that idea to any vector underdelivery and adds the visible warning layer. They compose cleanly — if you merge #951 first I can rebase and use its fallback as the inner primitive.
- The #823 side (tuning `sync_threshold` to reduce drift in the first place) is orthogonal and can land separately.

Complexity
Ruff flagged `search_memories` (C901) after the additions. Extracted the fallback + scope-count into `_sqlite_fallback_and_scope()` to stay under the threshold without losing any local context.

Tests
- `test_search_memories_fills_from_sqlite_when_vector_underdelivers` — mocks vector returning 1 hit, sqlite having 4 drawers in scope (2 with query-term matches, 1 without); asserts the 2 fallback hits get promoted via BM25 and the one with zero matches is skipped; asserts `available_in_scope == 4` and the warning text.
- `test_search_memories_query_error_degrades_to_warning` — vector raises, warning surfaced, no hard failure.
- `test_search_query_error_degrades_to_warning` — CLI no longer raises on vector failure, prints the warning.
SearchErroron pure query failure).pytest tests/→ 974 passedruff check/ruff format --check— cleanScope / non-goals
- No public API removals (`results` and `query` still work).
- No changes to the shape of `drawer_results` and the final return.
- Does not fold `quarantine_stale_hnsw` (feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823) #1000) into the read path — intentionally separate.