fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4)#1179
fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4)#1179
Conversation
There was a problem hiding this comment.
Pull request overview
Improves CLI search quality and debuggability by aligning the CLI search path with the existing hybrid (BM25 + vector) ranking behavior, and adds safeguards to prevent future regressions in Chroma distance-metric configuration.
Changes:
- Wire CLI
search()through the existing_hybrid_rankreranker and surface bothcosine=andbm25=components in output. - Detect legacy Chroma collections missing
hnsw:space=cosineand emit a CLI warning pointing users tomempalace repair. - Add invariant tests ensuring every collection-creation path sets
hnsw:space=cosine.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
mempalace/searcher.py |
Adds legacy-metric warning + applies hybrid rerank to CLI printing path. |
mempalace/backends/chroma.py |
Exposes collection metadata via a passthrough property for metric checks. |
tests/test_searcher.py |
Adds regression tests for hybrid rerank behavior and legacy-metric warning behavior. |
tests/test_collection_metric_invariant.py |
Adds invariant coverage ensuring cosine metric is set across creation paths and persists on reopen. |
CHANGELOG.md |
Documents the CLI search rerank change and the legacy metric warning behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _warn_if_legacy_metric(col) -> None: | ||
| """Print a one-line notice if the palace was created without | ||
| ``hnsw:space=cosine``. | ||
|
|
There was a problem hiding this comment.
Docstring says this prints a “one-line notice”, but the implementation prints a multi-line message (and includes a leading newline). Update the docstring (or the output formatting) so they match, since tests/assertions and the changelog describe a specific user-facing behavior.
| - `mempalace_diary_read(wing="")` now returns diary entries from every wing this agent has written to, matching the #1097 "empty-string as no filter" pattern. Previously defaulted to `wing_<agent>`, siloing entries that hooks wrote to project-derived wings. (#1145) | ||
| - `mempalace mine` now skips the generated `entities.json` file so its contents aren't re-ingested as project content. (#1175) | ||
| - **CLI `mempalace search` retrieval quality.** The CLI was using pure ChromaDB cosine distance with no BM25 rerank, so drawers containing every query term but embedding as noise (directory listings, diff output, shell logs) scored `Match: 0.0` alongside genuinely irrelevant results with no way to tell them apart. Wired the CLI through the same `_hybrid_rank` the `mempalace_search` MCP tool already used, and surfaced both `cosine=` and `bm25=` scores in the output so users see which component of the match is firing. MCP search was unaffected; this fixes the human-facing CLI parity gap. | ||
| - **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a one-line notice pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it. |
There was a problem hiding this comment.
This changelog entry describes the legacy-metric warning as a “one-line notice”, but the CLI currently prints a 3-line NOTICE block (with a blank line before it). Please align the wording here with the actual output format to avoid misleading users/support docs.
| - **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a one-line notice pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it. | |
| - **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a 3-line `NOTICE` block (preceded by a blank line) pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it. |
|
|
||
|
|
||
| def _assert_cosine(col, where: str) -> None: | ||
| meta = col.metadata if hasattr(col, "metadata") else col._collection.metadata |
There was a problem hiding this comment.
hasattr(col, "metadata") will invoke the property getter (and then col.metadata invokes it again), which can be surprising if the property ever does more than return a field. Prefer a single getattr(col, "metadata", None) (or try/except AttributeError) and fall back accordingly to avoid double evaluation.
| meta = col.metadata if hasattr(col, "metadata") else col._collection.metadata | |
| meta = getattr(col, "metadata", None) | |
| if meta is None: | |
| meta = col._collection.metadata |
762dba3 to
e3db748
Compare
bensig
left a comment
There was a problem hiding this comment.
Reviewed locally on `fix/search-metric-quality`. Approve, no asks. This is exactly the shape a bug-fix PR should have.
Full pytest: 1243 passed in 44s — 9 new tests over v3.3.3 baseline, exactly matching the 6 invariant + 3 new searcher tests this PR adds.
Why it lands cleanly
- Failure chain is laid out end-to-end in the PR body: empty-cosine query → semantic-noise drawer → distance ≥ 1.0 → `1 - dist` floors to 0 → CLI shows `Match: 0.0` across every result. Easy to follow, easy to verify the fix actually addresses the root cause and not just a symptom.
- Tests come with the fix, not as a follow-up. The new `test_collection_metric_invariant.py` locks `hnsw:space=cosine` on all four creation paths (legacy `get_or_create` / legacy `create` / RFC 001 typed `get_collection(create=True)` / public `mempalace.palace.get_collection`) plus a re-open round-trip. Future refactor that drops the metadata kwarg gets caught at test time instead of silently producing legacy palaces. New `test_searcher.py` cases cover the BM25 hybrid rerank getting applied + the legacy-metric warning firing/staying silent in the right scenarios.
- Reuses production-proven code. `_hybrid_rank` was already on the MCP path; this just wires a second consumer. Lower regression risk than introducing a new ranker.
- Display format change is honest. `cosine=0.0 bm25=5.5` is more diagnostic than a single opaque number — users see which signal fired.
- Warning points users at existing fix path. `mempalace repair` already exists and does exactly the right thing. The PR doesn't introduce a new code path; it signposts the existing one. Good restraint.
- Backend hardening is minimal and clean. New `metadata` property on `ChromaCollection` is a pass-through with `or {}` so callers don't need None-checks.
Minor (not blockers)
- Warning is stderr-only — respects scriptable callers, but anyone piping stderr to `/dev/null` misses it. Not a real concern; CLI users see it, agents (who'd be the noise-suppressing case) read the structured `hits` list instead.
- `_warn_if_legacy_metric` lives in `searcher.py`. Could go in a utilities module, but it's only called from `search` and is small. Fine where it is.
Ship it.
Three tightly-coupled search-quality fixes for v3.3.3: 1. CLI `mempalace search` now routes through the same `_hybrid_rank` the MCP path already used. Drawers whose text contains every query term but embed as file-tree noise (directory listings, diffs, log fragments) were scoring cosine distance >= 1.0 — the display formula `max(0, 1 - dist)` then floored every result to `Match: 0.0`, with no way for the user to tell a lexical match from a total miss. BM25 catches these cleanly; the display surfaces both `cosine=` and `bm25=` so users see which component is firing. 2. Legacy-palace distance-metric warning. Palaces created before `hnsw:space=cosine` was consistently set silently use ChromaDB's default L2 metric, which breaks the cosine-similarity formula (L2 distances routinely exceed 1.0 on normalized 384-dim vectors). The search path now detects this at query time and prints a one-line notice pointing at `mempalace repair`. Only fires for legacy palaces; new palaces already set cosine correctly. 3. Invariant tests pinning `hnsw:space=cosine` on every collection- creation path — legacy `get_or_create_collection`, legacy `create_collection`, RFC 001 `get_collection(create=True)`, the public `palace.get_collection`, and a round-trip through reopen. Locks down the correctness that new-user palaces already have so a future refactor can't silently regress it. Also adds a `metadata` property to `ChromaCollection` so callers can read the underlying hnsw:space without reaching into `_collection`. Tests: - New regression: simulate three candidates at distance 1.5 (cosine=0), one containing query terms — must rank first with non-zero bm25. - New: legacy metric (empty or non-cosine) produces stderr warning. - New: correctly-configured palace produces no warning. - New: all five creation paths pin cosine metadata. All existing tests still pass.
`test_fresh_palace_via_full_stack_gets_cosine` used `tempfile.Temporary-
Directory()` as a context manager, which tries to delete the temp path
on exit. On Windows, ChromaDB still holds SQLite file handles to
`chroma.sqlite3` when the context closes, producing:
PermissionError: [WinError 32] The process cannot access the file
because it is being used by another process: '...\\chroma.sqlite3'
NotADirectoryError: [WinError 267] The directory name is invalid
Other tests in the same file use pytest's `tmp_path` fixture, which
defers cleanup to session end (when the process is exiting and the
file-lock contention is moot). Align this one with the rest of the
file.
CLAUDE.md already documents the 80% Windows coverage allowance due to
"ChromaDB file lock cleanup" — the fix is to stop fighting the lock.
91ff94d to
ec5f4eb
Compare
…HNSW fixes Bring in 29 commits from upstream/develop since the last merge (2026-04-23): Major absorbed changes: - MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too. - MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank, legacy-metric warning + _warn_if_legacy_metric, invariant tests on hnsw:space=cosine across all 5 collection-creation paths. - MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine. - MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device detection via mempalace/embedding.py. - MemPalace#1182: graceful Ctrl-C during mempalace mine. - MemPalace#1168: tunnel permissions security fix. Conflict resolutions (8 files): - searcher.py: kept fork's CLI delegation through search_memories (cleaner single-source-of-truth path); upstream's inline-retrieval branch dropped. Merged upstream's print formatting (cosine= + bm25=) with fork's matched_via reporting from sqlite BM25 fallback. - backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs (embedding_function support from MemPalace#1185). Removed duplicate _pin_hnsw_threads (fork already cherry-picked Felipe's earlier). - mcp_server.py: kept fork's palace_path arg + upstream's clearer comment on hnsw:num_threads=1 rationale. - miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C), RESTORED fork's strict detect_room — substring matching from upstream breaks fork-only test_detect_room_priority1_no_substring_match. Added `import re` for word-boundary regex matching. Fork-ahead concurrent mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon migration deprioritizes local concurrent mining; can re-port if needed. - CHANGELOG.md: combined fork's segfault-trio narrative with upstream's v3.3.4 release notes. - HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was stale; hooks already use this name per fork-ahead MemPalace#19). - test_convo_miner_unit.py: took both contextlib + pytest imports. - test_searcher.py: imported upstream's 3 new TestSearchCLI tests but skipped them with TODOs — they assume upstream's inline-retrieval CLI with simpler mocks. Rewrite needed for fork's delegated search_memories path (sqlite BM25 fallback + scope counting). Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings. Implications for open fork PRs: - MemPalace#1171 (cross-process write lock at adapter) becomes structurally redundant given MemPalace#976's mine_global_lock + daemon-strict serialization. Slated for close with thank-you to Felipe. - MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
After today's work: - Merged upstream/develop (29 commits), absorbing MemPalace#976 (HNSW + mine_global_lock + PreCompact attempt-cap), MemPalace#1179 (CLI BM25), MemPalace#1180/MemPalace#1182/MemPalace#1183/MemPalace#1185 features. - Filed PR MemPalace#1198 (`_tokenize` None guard) upstream. - Closed PR MemPalace#1171 (adapter-level flock) — superseded by MemPalace#976 + daemon-strict. - palace-daemon migration commits (c09582c + 0e97b19) make the daemon the single canonical writer; in-tree adapter flock no longer carries its weight. README: - Lead paragraph: 2026-04-21 → 2026-04-25 sync date, 165,632 → 151,420 drawer count, daemon-on-disks topology, listed absorbed upstream PRs. - Fork change queue: dropped MemPalace#1171 row, added MemPalace#1198 row. - "Multi-client coordination" section: rewrote — palace-daemon is now the fork's primary answer (was "deferred"), MemPalace#1171 retired, narrative shifted to defense-in-depth around MemPalace#976 + daemon. - Two-layer memory model: storage path is now palace-daemon HTTP, not ~/.mempalace/palace. - Ecosystem palace-daemon row: marks integration as complete (commits cited). - Open upstream PRs table: refreshed to MemPalace org URLs, added MemPalace#1198 row, removed MemPalace#1171, added 2026-04-25 develop sync to merged list, added MemPalace#1171 to closed list with rationale. - Test count 1096 → 1334. CLAUDE.md: - Row 20 (MemPalace#1198): "Upstream PR pending" → "Filed as MemPalace#1198 on 2026-04-24". - Row MemPalace#1171 in PR table: open → closed (with MemPalace#976 supersession). - Added MemPalace#1198 row to PR table. - Top-of-section count: "14 merged, 7 open, 9 closed" → "14 merged, 7 open (including MemPalace#1198), 10 closed (added MemPalace#1171)" + sync date 2026-04-25. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Centralizes upstream MemPalace#1179's legacy-metric warning at search_memories itself rather than the CLI search() function. Both the CLI (which delegates to search_memories) and the MCP path (which calls search_memories directly) now surface the warning when a palace was created without hnsw:space=cosine. Un-skips the three TestSearchCLI tests that were marked TODO in the 2026-04-25 develop merge — they now pass naturally: test_search_applies_bm25_hybrid_rerank test_search_warns_when_palace_uses_wrong_distance_metric test_search_does_not_warn_when_palace_is_correctly_configured Adds `mock_col.count.return_value` to those tests because the fork's search_memories calls col.count() for `available_in_scope`; the upstream mocks didn't set it (their search() did inline retrieval and never counted). 1337 tests pass (was 1334 with 3 skipped). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Blocking fix for v3.3.3. Three tightly-coupled search-quality fixes, all CLI-facing:
hnsw:space=cosinewas consistently set on the collection.The failure chain
A drawer whose text contains every query term can still embed as semantic noise when the drawer is a mechanical artifact (directory listing, diff, log fragment) rather than prose about its subject. Cosine distance to a natural-language query crosses 1.0, so
max(0, 1 - dist)floors to 0.0.Before this PR, the CLI showed
Match: 0.0on every result in that scenario — lexical matches indistinguishable from total misses. Separately, palaces created withouthnsw:space=cosinemetadata were silently using L2 distance, under which the same1 - distformula is nonsense and every result floors to 0.0.Fix 1 — hybrid rerank on the CLI
_hybrid_rankinsearcher.pywas already production-proven on the MCP path. The CLIsearch()function just didn't call it. Two consumers of the same retrieval code with opposite ranking quality.The fix reshapes ChromaDB's columnar query result into row-dicts, calls
_hybrid_rank, and emits from the reordered list. Display now showsMatch: cosine=0.0 bm25=5.5so users see each component separately rather than a single opaque number.Fix 2 — legacy-metric warning
New palaces set
hnsw:space=cosinecorrectly today. Legacy palaces (mined before this metadata was consistently set) silently use L2 distance, breaking the similarity formula. Detecting this on open and printing a one-line notice:mempalace repairalready exists in the codebase (repair.py) and does exactly the right thing — extracts all drawers, deletes the collection, recreates with cosine, re-upserts. The warning just points legacy users at it rather than letting them guess.Fix 3 — invariant tests
New file
tests/test_collection_metric_invariant.pypinshnsw:space=cosineon every creation path:get_or_create_collectioncreate_collectionget_collection(create=True)mempalace.palace.get_collectionIf a future refactor removes the
metadata={"hnsw:space": "cosine"}parameter from any path, this test file catches it — rather than silently producing legacy-style palaces that slowly accumulate bad search.Risk
Low:
Output format for CLI search changes from
Match: 0.0toMatch: cosine=0.0 bm25=0.0. This is human-facing text, not a machine-readable format; no downstream parsers in the codebase.Test plan
mempalace repair