feat(searcher): warnings + sqlite BM25 top-up when vector underdelivers #1005
jphein wants to merge 2 commits into MemPalace:develop
Conversation
Pull request overview
This PR updates MemPalace search behavior to avoid silent underdelivery when ChromaDB’s vector index fails or returns too few hits by surfacing warnings and optionally topping up results from SQLite using BM25 keyword ranking. It also routes the CLI search() through search_memories() so both programmatic and CLI callers share the same fallback/warning behavior.
Changes:
- Make `search_memories()` degrade vector query failures into `"warnings"` instead of returning an `"error"` (except for "No palace found").
- Add SQLite scope accounting + a BM25 top-up fallback when vector results underdeliver and `max_distance` filtering is disabled.
- Update CLI `search()` to delegate to `search_memories()` and print warnings/scope information; update tests for the new contract.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `mempalace/searcher.py` | Adds warnings + SQLite/BM25 fallback logic and unifies CLI and API search behavior via `search_memories()`. |
| `tests/test_searcher.py` | Updates/extends tests to validate warning surfacing and fallback behavior instead of hard failures. |
```python
try:
    pool_kwargs: dict = {"include": ["documents", "metadatas"]}
    if where:
        pool_kwargs["where"] = where
    # Cap the pool — sqlite-variable limits (#950) bite around 32k, and
    # a fallback pool 20x the request is plenty for BM25 ranking.
    pool_kwargs["limit"] = max(n_results * 20, 100)
    pool = drawers_col.get(**pool_kwargs)
except Exception as e:
    warnings.append(f"sqlite fallback unavailable: {e}")
    return available_in_scope, warnings

pool_docs = pool.get("documents") or []
pool_metas = pool.get("metadatas") or []
available_in_scope = len(pool_docs)
```
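For context on the ranking step this excerpt feeds, here is a rough pure-Python sketch of BM25 scoring over such a pool (illustrative only; the PR's actual scorer, tokenizer, and tie-breaking are not shown in this excerpt):

```python
import math
import re

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Score docs against query terms with classic Okapi BM25.

    Returns (doc_index, score) pairs sorted best-first; documents
    scoring 0 are dropped, mirroring the rule that the fallback
    never pads results with unrelated content.
    """
    tokenize = lambda s: re.findall(r"\w+", s.lower())
    toks = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in toks) / max(len(toks), 1)
    n = len(docs)
    ranked = []
    for i, t in enumerate(toks):
        score = 0.0
        for term in set(tokenize(query)):
            tf = t.count(term)
            if tf == 0:
                continue
            df = sum(1 for other in toks if term in other)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            norm = k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf * (k1 + 1) / (tf + norm)
        if score > 0:
            ranked.append((i, score))
    return sorted(ranked, key=lambda pair: -pair[1])
```

With a query like "federation test" over a pool of drawer documents, only drawers containing at least one query term survive, and a drawer matching both terms outranks one matching a single term.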
available_in_scope is computed as len(pool_docs) after a get(limit=max(n_results*20, 100)), so it will never exceed that limit and is not an authoritative “scope count”. This makes the CLI scope line and the later “unreachable via HNSW” warning potentially misleading on large palaces. Consider computing the full in-scope count separately (e.g., drawers_col.count(where=where) if safe, or paginating get(include=["ids"], limit=..., offset=...) to count) and keep the limited get(include=["documents","metadatas"]) only for BM25 fallback candidates.
Addressed in commit d0f4e32 (squashed rebase of the original b4cbdfd). _count_in_scope() now paginates col.get(include=[], limit=5000, offset=K) to compute an authoritative scope total — independent of the (capped) fallback pool — so available_in_scope reflects sqlite truth, not the pool ceiling. Pagination keeps each query well under #950's SQL-variable limit.
```python
# Surface unreachable data: scope has more matches than we could rank.
if available_in_scope is not None and available_in_scope > len(hits) and len(hits) < n_results:
    warnings.append(
        f"{available_in_scope} drawers match this scope in sqlite; "
        f"{len(hits)} ranked — the rest are unreachable via the current "
        f"HNSW index. Run `mempalace repair` to rebuild."
    )
```
The “Run mempalace repair” warning is currently gated on len(hits) < n_results. If the sqlite BM25 fallback fills the request to n_results, this warning won’t surface even when available_in_scope > len(hits) (i.e., lots of drawers exist in-scope but couldn’t be vector-ranked). If the intended contract is “warn whenever scope > returned”, drop the len(hits) < n_results condition (or otherwise key it off vector underdelivery rather than the final hit count).
Addressed in d0f4e32. The "Run `mempalace repair`" warning now gates off a vector_underdelivered flag captured before the BM25 fallback runs, instead of len(hits) < n_results. So when sqlite top-up papers over a degraded vector path, the user still sees that the vector index is the under-delivering layer.
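A hypothetical sketch of why the flag must be captured before the top-up mutates the hit list (names are illustrative, not the committed code):

```python
def rank_with_fallback(vector_hits, n_results, bm25_candidates):
    """Top up vector hits from BM25 candidates, but snapshot the
    vector layer's shortfall BEFORE the list is mutated."""
    hits = list(vector_hits)
    # Load-bearing ordering: read the flag before the top-up below,
    # or a fully filled list would mask the degraded vector layer.
    vector_underdelivered = len(hits) < n_results
    if vector_underdelivered:
        hits.extend(bm25_candidates[: n_results - len(hits)])
    warnings = []
    if vector_underdelivered:
        warnings.append(
            "vector index under-delivered; run `mempalace repair` to rebuild"
        )
    return hits, warnings
```

If the flag were computed from the final hit count instead, a successful BM25 fill would silently suppress the repair hint.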
```python
print(f"\n {result['error']}")
if "hint" in result:
    print(f" {result['hint']}")
raise SearchError(result["error"])
```
CLI search() used to print the missing-palace path (No palace found at <path>). After delegating to search_memories(), it now prints only No palace found (no path), which is less actionable when multiple palaces/paths are involved. Consider including palace_path in the printed message (either by enriching the search_memories error payload or formatting it in search()).
```diff
-print(f"\n {result['error']}")
-if "hint" in result:
-    print(f" {result['hint']}")
-raise SearchError(result["error"])
+error_message = result["error"]
+if error_message == "No palace found":
+    error_message = f"{error_message} at {palace_path}"
+print(f"\n {error_message}")
+if "hint" in result:
+    print(f" {result['hint']}")
+raise SearchError(error_message)
```
Addressed in d0f4e32. CLI search() now preserves the palace path explicitly when printing the error: structured payload from search_memories stays path-agnostic (right for the API contract), but the CLI prepends at <palace_path> so users still see which palace failed to open.
Three corrections from @copilot-pull-request-reviewer:

1. `available_in_scope` was capped at `n_results * 20` via the pool `get()`, so the "sqlite-authoritative" framing was misleading on large palaces (would report at most 100 even when 5000+ drawers matched). Split the count out of the pool fetch: new `_count_in_scope()` paginates `col.get(include=[], limit=5000, offset=K)` to sum an authoritative total. Pagination keeps each query well under MemPalace#950's SQL-variable limit.
2. The "Run mempalace repair" warning was gated on `len(hits) < n_results`, so it never surfaced when the BM25 fallback filled the request — even though the vector index was the actual degraded layer. Added a `vector_underdelivered` flag captured before fallback runs; the warning now gates off that flag so users see "HNSW can't reach the rest" whenever the vector path fell short, regardless of how much BM25 papered over.
3. CLI "No palace found" lost the palace path after MemPalace#1005's refactor to delegate through `search_memories()`. Preserve it explicitly in `search()` — the structured error payload stays path-agnostic, but the CLI adds "at <palace_path>" when printing so users still see which palace failed to open.

Also updated the `_sqlite_fallback_and_scope()` signature to take `vector_underdelivered` separately from `allow_fallback`, matching what the two warnings now need to distinguish.
README:
- Fork-changes table: expand the None-metadata row to cover all 8 sites (searcher.py CLI + API + closet-boost, miner.status, 4 mcp_server handlers). The previous row only called out the CLI print path.
- Add a new Search row: the warnings + sqlite BM25 top-up contract (the "never silent miss" feature) with a pointer to MemPalace#951 + MemPalace#823.
- Open-PR table: expand the MemPalace#999 scope line to mention 8 sites + architectural note, update MemPalace#1000 to reflect the post-MemPalace#995 rebase, add MemPalace#1005 with Copilot fixes noted.

CLAUDE.md:
- PR status header: 7 open -> 8 open (adds MemPalace#1005).
- Same PR row updates as README for MemPalace#999/MemPalace#1000/MemPalace#1005.
- Fork Changes list: expand entry 11 (None guards) to 8 sites + adapter consolidation proposal on MemPalace#999; add entry 14 for the warnings+BM25 feature; keep 12 and 13 as-is.

42 README-claim tests still pass.
Concrete evidence:
- New "What this looks like in practice" section right after the status line, showing a real stop-hook systemMessage output and a real mempalace_search response shape (warnings, available_in_scope, matched_via tags, similarity scores). Someone evaluating the fork can see what "running in production" actually looks like.

Headlines box:
- New "Headlines" subsection at the top of Fork Changes with the three differentiators someone should know if they only read one section — silent-save hook, ChromaDB 1.5.x hardening (quarantine + None guards), search-never-misses contract. Links to MemPalace#673/MemPalace#999/MemPalace#1000/MemPalace#1005 so readers can jump to the work itself.

Citations for comparison table:
- Every row now links to its upstream repo: Hindsight (vectorize-io), Mem0 + OpenMemory (mem0ai), Cognee (topoteretes), Letta (letta-ai), engram (NickCirv), CaviraOSS OpenMemory. Cognee row updated since they've added MCP support since we first wrote the row.
- Replaced the "Systems mentioned without captured primary URLs" footnote (now stale since we have them) with a "Verification note" that's honest about the point-in-time nature of these columns and explains why TagMem is absent.

Structure cleanups:
- Removed the upstream MemPalace logo at the top — it's milla-jovovich's asset and using it in a fork README is awkward.
- Renamed "Roadmap" → "Planned work" — the section is decisive with priorities and time estimates; "Roadmap" was underselling it.

No content removed from the document beyond the stale footnote and the upstream logo. All existing sections intact. 42 README-claim tests pass.
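To make the "response shape" bullet concrete, here is a hypothetical payload of the kind that section could show (the field names `warnings`, `available_in_scope`, `matched_via`, and `similarity` come from this PR; every value and the `results` envelope are invented for illustration):

```python
# Hypothetical search response illustrating the new fields; all
# concrete values (and the "results" envelope itself) are invented.
result = {
    "results": [
        {"id": "drawer-001", "document": "stop-hook design notes",
         "similarity": 0.82, "matched_via": "vector"},
        {"id": "drawer-007", "document": "federation test harness",
         "similarity": None,  # BM25 top-ups carry no vector score
         "matched_via": "sqlite_bm25_fallback"},
    ],
    "available_in_scope": 2280,
    "warnings": [
        "vector search unavailable: Error executing plan: "
        "Internal error: Error finding id",
    ],
}
```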
Every remaining row in "Still ahead of upstream" now carries a status so the reader can tell at a glance whether each change is being upstreamed, pending a PR, or deliberately fork-local.

Dropped:
- "Guard ChromaDB 1.5.x metadata-mismatch segfault" — this row was overstated. The memory file for today's debugging notes that the try-get/except-create pattern is defensive code that never reproduced a specific crash (the actual crashes traced to HNSW drift). Leaving it in "Still ahead" implied an upstream-candidate fix, which it isn't. The code stays in place as defensive, but the README no longer claims it as a fork-ahead feature.

Moved to Superseded:
- "Stale HNSW mtime detection + mempalace_reconnect" — upstream took a different approach in MemPalace#757. Our broader inode+mtime detection and the mempalace_reconnect MCP tool remain as fork-local convenience; they're just not "ahead of upstream" anymore.

Statuses now populated:
- Linked PR numbers for the 7 changes with active upstream PRs (MemPalace#659, MemPalace#660, MemPalace#661, MemPalace#673 with APPROVED note, MemPalace#999, MemPalace#1000, MemPalace#1005).
- "PR pending" for 3 items that are good candidates but unfiled: epsilon mtime comparison, max_distance parameter, tool output mining.
- "fork-only" for 2 items we keep intentionally without pitching upstream: .blob_seq_ids_migrated marker (narrow), bulk_check_mined (complementary to upstream's MemPalace#784 file-locking).

A legend sentence added above the table explains the three status values. 42 README-claim tests pass.
Dialectician left a comment
Read the diff end to end, ran the full test suite on your main + this branch (998/998 pass in 40.6s on ChromaDB 1.5.8). Think this lands well. Two things to call out, neither a blocker.
Gate semantics on allow_fallback = (max_distance <= 0.0)
search_memories defaults max_distance = 0.0, so CLI callers (no override) get the BM25 fallback. But tool_search in mcp_server.py defaults to max_distance = 1.5 and always passes it through, so the MCP path evaluates 1.5 <= 0.0 to False and fallback is disabled by default. If an agent hits a drifted palace via the MCP tool, they get exactly the silent underdelivery this PR is designed to prevent, because the gate conflates "permissive default" with "strict caller-set threshold."
Might be intentional if you are scoping to CLI for now. If not, cleaner gate is Optional[float] with None meaning "allow fallback" and any explicit value meaning "respect strict mode." Happy to follow up with a small PR if you confirm which way.
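A sketch of the Optional[float] gate being proposed (the helper name and signature are hypothetical; the point is distinguishing "caller said nothing" from "caller set a threshold"):

```python
from typing import Optional

def allow_bm25_fallback(max_distance: Optional[float]) -> bool:
    """None: the caller expressed no similarity preference, so the
    BM25 top-up is allowed. Any explicit float: strict mode, where
    the caller's threshold is respected and fallback stays off."""
    return max_distance is None
```

Under this scheme the CLI would pass None by default (fallback on), while tool_search's explicit max_distance=1.5 would read as strict, and even an explicit 0.0 would no longer be mistaken for "permissive."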
_count_in_scope on unfiltered scopes
When both wing and room are None, where is None and the function paginates all drawers to count. At 137K with PAGE=5000 that is ~28 round trips per search just for the scope count. drawers_col.count() does this O(1) when no filter is needed.
Also worth noting: the pre-mutation vector_hit_count capture (so the scope warning still fires after BM25 top-up) is load-bearing subtlety. Worth a comment if you have not already lost patience with refactor resistance.
Peace B.Sweet
Dialectician left a comment
Workload validation from a real palace (2487 drawers, 10 wings, active production use as memory backing for an agent stack). This palace has HNSW drift, every filtered-by-wing query hits the planner error. Exact reproduction of the scenario this PR addresses.
Three CLI queries against --wing <real-wing-id> with keywords "federation", "test", "memory":
Before your fork (mempalace 3.3.0, current installed runtime):
Search error: Search error: Error executing plan: Internal error: Error finding id
Hard SearchError. Caller sees nothing.
With your fork:
- Warning: `vector search unavailable: Error executing plan: Internal error: Error finding id`
- Scope count: `Scope has: 22 drawers matching filter` / `2280 drawers matching filter`
- BM25 top-up: `vector search returned 0 of 3 requested; filled N from sqlite+BM25 keyword match`
- Actionable hint: `Run mempalace repair to rebuild the HNSW index for full semantic recall.`
Every contract promise in the PR description fires as advertised on real data. The vector_underdelivered pre-mutation capture also proved load-bearing: in one query the BM25 fallback filled 3 of 3 requested and the "2280 drawers in scope, vector ranked 0" warning still surfaced, which is the correct signal (vector is still the degraded layer even though the final hit count is fine).
Separately: the gate-semantics note from my earlier review (CLI default max_distance=0.0 enables BM25 fallback, MCP default 1.5 disables it) is more significant than I initially weighted it. The MCP path, which is the primary access pattern for agents querying this palace, would see "0 hits" silently on this same failure rather than "0 hits + warning + BM25 fill." For a drifted palace being queried by Claude through MCP, the agent would conclude the memory is empty when it is only the ranking layer that is broken. Happy to send the Optional[float] gate fix as a small follow-up PR if you greenlight.
Peace B.Sweet
- _count_in_scope: use col.count() for unfiltered scopes (O(1) vs 28 paginated round trips on 137K drawers when wing/room filter is absent)
- allow_fallback gate: fire BM25 fallback when max_distance is permissive (<=0.0) OR when a vector error occurred (warnings non-empty). Fixes the MCP path: tool_search passes max_distance=1.5, which the old gate read as "strict" and silently skipped fallback — so agents on a drifted palace saw 0 hits instead of BM25 coverage + a warning
- Add explanatory comment on vector_hit_count pre-mutation capture
- Update test mocks with explicit count.return_value (required now that _count_in_scope calls col.count() for unfiltered scopes)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
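The revised gate described above can be sketched as follows (names assumed; this mirrors the stated semantics rather than the literal diff):

```python
def allow_fallback(max_distance: float, warnings: list) -> bool:
    """Fire the BM25 top-up when the caller is permissive
    (max_distance <= 0.0, i.e. no strict similarity cutoff) OR when
    the vector layer already errored (warnings non-empty), so an MCP
    caller passing max_distance=1.5 still gets recovery on a drifted
    palace instead of a silent zero-hit result."""
    return max_distance <= 0.0 or bool(warnings)
```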
Addressed both Dialectician review points.

Gate semantics: Replaced the strict `max_distance <= 0.0` gate — the BM25 fallback now also fires when a vector error occurred (warnings non-empty), so the MCP path gets fallback coverage plus a warning instead of a silent 0-hit result.

Also added the comment on the `vector_hit_count` pre-mutation capture.
Thanks Dialectician — really appreciate the thorough review. Running the full suite + validating on a live drifted palace is exactly the kind of signal that's hard to fake, and the pre-mutation capture proving load-bearing on real data is the confirmation I was hoping for.

Hoping the fixes hold up as well on your palace as they did in the test run. On the three Copilot inline comments:

CLI missing palace path: The CLI now prepends "at <palace_path>" when printing; the structured payload from search_memories stays path-agnostic.
Pulled b9a6585, ran the test suite (974 pass, 106 deselected), and spot-checked the runtime against the repaired palace.

Good trade. LGTM on this round. Peace B.Sweet
Heads-up on 7c8bb59: one test regression from the `_count_in_scope` change. The test mocks the collection without an explicit `count.return_value`, so the new `col.count()` call returns a MagicMock and the integer comparison fails.

Fix is one line, same pattern you applied to the other four tests in b9a6585:

```python
mock_col = MagicMock()
mock_col.count.return_value = 2  # any int; test does not otherwise care
mock_col.query.return_value = {...}
```

(Runtime behavior is unaffected because real ChromaDB collections return an int from `count()`.)

Peace B.Sweet
Thanks @Dialectician — verified green. The regression at 7c8bb59 was the test-only mock missing `count.return_value`; fixed with the one-line pattern you suggested.
Two new scripts plus legitimate fixes they surfaced.

scripts/preflight.sh — runs the same checks CI does (ruff check, ruff format --check, pytest). Prevents the class of "I ran ruff format locally but CI runs ruff check too" bugs we hit on PR MemPalace#1024 and PR MemPalace#1198 today.

scripts/rebase-on-develop.sh — rebases a fork PR branch onto upstream/develop, then auto-restores fork-only collateral (.claude-plugin/hooks/*.sh, .claude-plugin/.mcp.json) to upstream's state. The collateral cleanup was the recurring step I had to remember manually for each of MemPalace#1005/MemPalace#1024/MemPalace#1177; this codifies it. Supports --finish for the "after I resolved conflicts manually" continuation path. Never auto-pushes — prints the push command for confirmation.

Lint debt the new preflight caught on main:
- tests/test_palace_graph.py: F811 — `invalidate_graph_cache` was imported at line 9, then re-imported inside the chromadb-mock patch.dict block at line 44. Removed the duplicate; the line-9 import is sufficient since the autouse fixture at line 12 already uses it.
- mempalace/backends/chroma.py: ruff format wanted a one-line `collection.modify(configuration=...)` call instead of the wrapped multi-line form.
- mempalace/hooks_cli.py: ruff format wanted a blank line after the import in the daemon-URL try block, and dict-literal style for the json.dumps call.

CI on jphein/mempalace runs both ruff check and ruff format --check; without these fixes the next push to main would have shown red.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Posting an update now that #1306 has landed. The two PRs address overlapping but distinct failure modes, and they aren't strictly redundant.

The observability slice (warnings + `available_in_scope` + the BM25 top-up contract) is what this PR still adds on top. Three paths I'd be happy with — your call.

The current branch is several weeks pre-#1306, so any rebase will be a clean re-apply on top of the new searcher.py — no merge churn for you to read through. Happy to do whichever you prefer.
When the vector index returns fewer than n_results (sparse HNSW post-repair, MemPalace#951 filter-planner failure, drift), search_memories now:

1. Computes an authoritative scope count via paginated col.get(), surfaced as `available_in_scope` in the response. Caps each query below MemPalace#950's SQL-variable limit.
2. Tops up the hits list with BM25-ranked sqlite candidates tagged `matched_via: "sqlite_bm25_fallback"` when the vector path is under-delivering. Skips candidates with BM25 score 0 so the fallback never pads with unrelated content.
3. Returns `warnings: [...]` describing when fallback fired and when the scope contains more drawers than the vector path can rank (gated on a `vector_underdelivered` flag captured before fallback runs, so the warning surfaces even when BM25 papered over the gap).

CLI search() delegates to search_memories() so terminal output and MCP responses share the same retrieval, fallback, and warning semantics. Preserves the palace path in printed errors.

Closes the silent 0-hit failure mode where data was in sqlite but the vector path returned nothing — visible to the user via warnings and `available_in_scope`, fixable via `mempalace repair`.

Tests: 29/29 pass on rebased branch (Python 3.9 floor honored via Optional[int]). Mock setup updated to set count.return_value so the new "more in scope" warning path doesn't fail on MagicMock comparison.

Squashed rebase against current upstream/develop (post-MemPalace#1377). Was filed as 5-commit history; squashed for cleaner review.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Rebased against current upstream/develop and squashed. Local: 1555 passed, 0 failed (Python 3.12, full suite). CI status will populate shortly. Mergeable now. Ready when you are.
…noqa The squash in d0f4e32 pushed search_memories one branch over ruff's C901 threshold (26 > 25). The function is doing meaningful work — dispatch across vector / BM25-only / candidate-strategy / sqlite- fallback / warning paths, each of which is one branch — so an extraction here would either thrash the diff or re-run the A/B benchmarks that gated the original split decisions. Adding a noqa with intent annotated is the smaller, lower-risk fix; an extraction can be a follow-up if maintainers prefer.
Summary
When ChromaDB's HNSW index is sparse, drifted, or rejects a filter (#951's `Error finding id`), today's `search_memories` returns fewer hits than the palace actually contains — silently. A user querying for drawers that exist in sqlite sees an empty or partial result and concludes the palace has forgotten, when only the vector ranking layer is degraded. Silent hit-miss is worse than a crash because callers can't detect it. This is especially painful:

- after running `mempalace repair` while HNSW is still rebuilding
- when `sync_threshold` has left on-disk HNSW lagging behind sqlite

Contract change
`search_memories()` no longer hard-fails on vector errors — it always returns a result (except `"error": "No palace found"`). The return dict gains two fields: `warnings` and `available_in_scope`.

Behavior:

- On a vector query error, append a warning (`"vector search unavailable: <err>"`) and continue — do not raise.
- When `len(hits) < n_results` and the caller did NOT set `max_distance` (strict-similarity mode), top up from the sqlite pool via BM25 keyword ranking.
- Fallback hits carry `matched_via="sqlite_bm25_fallback"`; their `distance`/`similarity` are `None` since there's no vector score.
- When the scope holds more drawers than could be ranked, warn and point at `mempalace repair`.

CLI `search()` now delegates to `search_memories` so both paths share the fallback and warning surface. Warnings print with a `!` prefix above the results.

Relation to #951 and #823
- fix: degrade gracefully when filtered Chroma search fails (#951) catches `Error finding id` at the query layer and falls back to `col.get` with local lexical ranking. This PR widens that idea to any vector underdelivery and adds the visible warning layer. They compose cleanly — if you merge #951 first I can rebase and use its fallback as the inner primitive.
- The #823 side (tuning `sync_threshold` to reduce drift in the first place) is orthogonal and can land separately.

Complexity
Ruff flagged `search_memories` (C901) after the additions. Extracted the fallback + scope-count into `_sqlite_fallback_and_scope()` to stay under the threshold without losing any local context.

Tests
- `test_search_memories_fills_from_sqlite_when_vector_underdelivers` — mocks vector returning 1 hit, sqlite having 4 drawers in scope (2 with query-term matches, 1 without); asserts the 2 fallback hits get promoted via BM25 and the one with zero matches is skipped; asserts `available_in_scope == 4` and the warning text.
- `test_search_memories_query_error_degrades_to_warning` — vector raises, warning surfaced, no hard failure.
- `test_search_query_error_degrades_to_warning` — CLI no longer raises on vector failure, prints the warning.
SearchErroron pure query failure).pytest tests/→ 974 passedruff check/ruff format --check— cleanScope / non-goals
- No public API removals (`results` and `query` still work).
- No changes to the shape of `drawer_results` and the final return.
- Does not fold `quarantine_stale_hnsw` (feat(backends): quarantine_stale_hnsw — recover from HNSW/sqlite drift (closes #823) #1000) into the read path — intentionally separate.