Skip to content

fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4)#1179

Merged
igorls merged 2 commits intodevelopfrom
fix/search-metric-quality
Apr 25, 2026
Merged

fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4)#1179
igorls merged 2 commits intodevelopfrom
fix/search-metric-quality

Conversation

@igorls
Copy link
Copy Markdown
Member

@igorls igorls commented Apr 24, 2026

Summary

Blocking fix for v3.3.3. Three tightly-coupled search-quality fixes, all CLI-facing:

  1. Hybrid BM25+cosine rerank on the CLI search path (the MCP tool already had it).
  2. Warning for legacy palaces that were created before hnsw:space=cosine was consistently set on the collection.
  3. Invariant tests pinning cosine metadata on every collection-creation path so this can't silently regress.

The failure chain

A drawer whose text contains every query term can still embed as semantic noise when the drawer is a mechanical artifact (directory listing, diff, log fragment) rather than prose about its subject. Cosine distance to a natural-language query crosses 1.0, so max(0, 1 - dist) floors to 0.0.

Before this PR, the CLI showed Match: 0.0 on every result in that scenario — lexical matches indistinguishable from total misses. Separately, palaces created without hnsw:space=cosine metadata were silently using L2 distance, under which the same 1 - dist formula is nonsense and every result floors to 0.0.

Fix 1 — hybrid rerank on the CLI

_hybrid_rank in searcher.py was already production-proven on the MCP path. The CLI search() function just didn't call it. Two consumers of the same retrieval code with opposite ranking quality.

The fix reshapes ChromaDB's columnar query result into row-dicts, calls _hybrid_rank, and emits from the reordered list. Display now shows Match: cosine=0.0 bm25=5.5 so users see each component separately rather than a single opaque number.

Fix 2 — legacy-metric warning

New palaces set hnsw:space=cosine correctly today. Legacy palaces (mined before this metadata was consistently set) silently use L2 distance, breaking the similarity formula. Detecting this on open and printing a one-line notice:

NOTICE: this palace was created without cosine distance (no hnsw:space metadata).
        Semantic similarity scores will not be meaningful.
        Run `mempalace repair` to rebuild the index with the correct metric.

mempalace repair already exists in the codebase (repair.py) and does exactly the right thing — extracts all drawers, deletes the collection, recreates with cosine, re-upserts. The warning just points legacy users at it rather than letting them guess.

Fix 3 — invariant tests

New file tests/test_collection_metric_invariant.py pins hnsw:space=cosine on every creation path:

  • Legacy get_or_create_collection
  • Legacy create_collection
  • RFC 001 typed get_collection(create=True)
  • The public mempalace.palace.get_collection
  • Round-trip through close-and-reopen

If a future refactor removes the metadata={"hnsw:space": "cosine"} parameter from any path, this test file catches it — rather than silently producing legacy-style palaces that slowly accumulate bad search.

Risk

Low:

  • The hybrid rerank function is already production-proven on the MCP path; this PR only wires a second consumer.
  • The warning is stderr-only and one line; does not affect search results or programmatic callers.
  • The invariant tests observe existing correct behavior and pin it — they introduce no new behavior.

Output format for CLI search changes from Match: 0.0 to Match: cosine=0.0 bm25=0.0. This is human-facing text, not a machine-readable format; no downstream parsers in the codebase.

Test plan

  • Full test suite: 1243 passing
  • Ruff + format clean
  • New regression test: three candidates at distance 1.5 (cosine = 0), one containing query terms — must rank first with non-zero bm25
  • New test: legacy metric (empty or non-cosine metadata) produces stderr warning pointing at mempalace repair
  • New test: correctly-configured palace produces no warning
  • New test file: cosine metadata pinned on every creation path
  • No changes to MCP search path — agents unaffected

Copilot AI review requested due to automatic review settings April 24, 2026 21:51
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Improves CLI search quality and debuggability by aligning the CLI search path with the existing hybrid (BM25 + vector) ranking behavior, and adds safeguards to prevent future regressions in Chroma distance-metric configuration.

Changes:

  • Wire CLI search() through the existing _hybrid_rank reranker and surface both cosine= and bm25= components in output.
  • Detect legacy Chroma collections missing hnsw:space=cosine and emit a CLI warning pointing users to mempalace repair.
  • Add invariant tests ensuring every collection-creation path sets hnsw:space=cosine.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
mempalace/searcher.py Adds legacy-metric warning + applies hybrid rerank to CLI printing path.
mempalace/backends/chroma.py Exposes collection metadata via a passthrough property for metric checks.
tests/test_searcher.py Adds regression tests for hybrid rerank behavior and legacy-metric warning behavior.
tests/test_collection_metric_invariant.py Adds invariant coverage ensuring cosine metric is set across creation paths and persists on reopen.
CHANGELOG.md Documents the CLI search rerank change and the legacy metric warning behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread mempalace/searcher.py
Comment on lines +239 to +242
def _warn_if_legacy_metric(col) -> None:
"""Print a one-line notice if the palace was created without
``hnsw:space=cosine``.

Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring says this prints a “one-line notice”, but the implementation prints a multi-line message (and includes a leading newline). Update the docstring (or the output formatting) so they match, since tests/assertions and the changelog describe a specific user-facing behavior.

Copilot uses AI. Check for mistakes.
Comment thread CHANGELOG.md Outdated
- `mempalace_diary_read(wing="")` now returns diary entries from every wing this agent has written to, matching the #1097 "empty-string as no filter" pattern. Previously defaulted to `wing_<agent>`, siloing entries that hooks wrote to project-derived wings. (#1145)
- `mempalace mine` now skips the generated `entities.json` file so its contents aren't re-ingested as project content. (#1175)
- **CLI `mempalace search` retrieval quality.** The CLI was using pure ChromaDB cosine distance with no BM25 rerank, so drawers containing every query term but embedding as noise (directory listings, diff output, shell logs) scored `Match: 0.0` alongside genuinely irrelevant results with no way to tell them apart. Wired the CLI through the same `_hybrid_rank` the `mempalace_search` MCP tool already used, and surfaced both `cosine=` and `bm25=` scores in the output so users see which component of the match is firing. MCP search was unaffected; this fixes the human-facing CLI parity gap.
- **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a one-line notice pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it.
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changelog entry describes the legacy-metric warning as a “one-line notice”, but the CLI currently prints a 3-line NOTICE block (with a blank line before it). Please align the wording here with the actual output format to avoid misleading users/support docs.

Suggested change
- **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a one-line notice pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it.
- **Legacy-palace distance-metric warning.** CLI search now detects palaces created before `hnsw:space=cosine` was consistently set and prints a 3-line `NOTICE` block (preceded by a blank line) pointing at `mempalace repair`. Without the warning such palaces silently used L2 distance, under which the similarity display floored every result to `Match: 0.0`. New palaces mined today already set cosine correctly and now have invariant tests pinning that behavior so future refactors can't silently regress it.

Copilot uses AI. Check for mistakes.


def _assert_cosine(col, where: str) -> None:
meta = col.metadata if hasattr(col, "metadata") else col._collection.metadata
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasattr(col, "metadata") will invoke the property getter (and then col.metadata invokes it again), which can be surprising if the property ever does more than return a field. Prefer a single getattr(col, "metadata", None) (or try/except AttributeError) and fall back accordingly to avoid double evaluation.

Suggested change
meta = col.metadata if hasattr(col, "metadata") else col._collection.metadata
meta = getattr(col, "metadata", None)
if meta is None:
meta = col._collection.metadata

Copilot uses AI. Check for mistakes.
@igorls igorls force-pushed the fix/search-metric-quality branch from 762dba3 to e3db748 Compare April 24, 2026 21:57
@igorls igorls changed the title fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests — blocks 3.3.3 fix(search): CLI hybrid rerank, legacy-metric warning, invariant tests (3.3.4) Apr 24, 2026
Copy link
Copy Markdown
Collaborator

@bensig bensig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed locally on `fix/search-metric-quality`. Approve, no asks. This is exactly the shape a bug-fix PR should have.

Full pytest: 1243 passed in 44s — 9 new tests over v3.3.3 baseline, exactly matching the 6 invariant + 3 new searcher tests this PR adds.

Why it lands cleanly

  • Failure chain is laid out end-to-end in the PR body: empty-cosine query → semantic-noise drawer → distance ≥ 1.0 → `1 - dist` floors to 0 → CLI shows `Match: 0.0` across every result. Easy to follow, easy to verify the fix actually addresses the root cause and not just a symptom.
  • Tests come with the fix, not as a follow-up. The new `test_collection_metric_invariant.py` locks `hnsw:space=cosine` on all four creation paths (legacy `get_or_create` / legacy `create` / RFC 001 typed `get_collection(create=True)` / public `mempalace.palace.get_collection`) plus a re-open round-trip. Future refactor that drops the metadata kwarg gets caught at test time instead of silently producing legacy palaces. New `test_searcher.py` cases cover the BM25 hybrid rerank getting applied + the legacy-metric warning firing/staying silent in the right scenarios.
  • Reuses production-proven code. `_hybrid_rank` was already on the MCP path; this just wires a second consumer. Lower regression risk than introducing a new ranker.
  • Display format change is honest. `cosine=0.0 bm25=5.5` is more diagnostic than a single opaque number — users see which signal fired.
  • Warning points users at existing fix path. `mempalace repair` already exists and does exactly the right thing. The PR doesn't introduce a new code path; it signposts the existing one. Good restraint.
  • Backend hardening is minimal and clean. New `metadata` property on `ChromaCollection` is a pass-through with `or {}` so callers don't need None-checks.

Minor (not blockers)

  1. Warning is stderr-only — respects scriptable callers, but anyone piping stderr to `/dev/null` misses it. Not a real concern; CLI users see it, agents (who'd be the noise-suppressing case) read the structured `hits` list instead.
  2. `_warn_if_legacy_metric` lives in `searcher.py`. Could go in a utilities module, but it's only called from `search` and is small. Fine where it is.

Ship it.

igorls added 2 commits April 25, 2026 00:39
Three tightly-coupled search-quality fixes for v3.3.3:

1. CLI `mempalace search` now routes through the same `_hybrid_rank`
   the MCP path already used. Drawers whose text contains every query
   term but embed as file-tree noise (directory listings, diffs, log
   fragments) were scoring cosine distance >= 1.0 — the display formula
   `max(0, 1 - dist)` then floored every result to `Match: 0.0`, with
   no way for the user to tell a lexical match from a total miss. BM25
   catches these cleanly; the display surfaces both `cosine=` and
   `bm25=` so users see which component is firing.

2. Legacy-palace distance-metric warning. Palaces created before
   `hnsw:space=cosine` was consistently set silently use ChromaDB's
   default L2 metric, which breaks the cosine-similarity formula (L2
   distances routinely exceed 1.0 on normalized 384-dim vectors). The
   search path now detects this at query time and prints a one-line
   notice pointing at `mempalace repair`. Only fires for legacy
   palaces; new palaces already set cosine correctly.

3. Invariant tests pinning `hnsw:space=cosine` on every collection-
   creation path — legacy `get_or_create_collection`, legacy
   `create_collection`, RFC 001 `get_collection(create=True)`, the
   public `palace.get_collection`, and a round-trip through reopen.
   Locks down the correctness that new-user palaces already have so a
   future refactor can't silently regress it.

Also adds a `metadata` property to `ChromaCollection` so callers can
read the underlying hnsw:space without reaching into `_collection`.

Tests:
- New regression: simulate three candidates at distance 1.5 (cosine=0),
  one containing query terms — must rank first with non-zero bm25.
- New: legacy metric (empty or non-cosine) produces stderr warning.
- New: correctly-configured palace produces no warning.
- New: all five creation paths pin cosine metadata.

All existing tests still pass.
`test_fresh_palace_via_full_stack_gets_cosine` used `tempfile.Temporary-
Directory()` as a context manager, which tries to delete the temp path
on exit. On Windows, ChromaDB still holds SQLite file handles to
`chroma.sqlite3` when the context closes, producing:

    PermissionError: [WinError 32] The process cannot access the file
    because it is being used by another process: '...\\chroma.sqlite3'
    NotADirectoryError: [WinError 267] The directory name is invalid

Other tests in the same file use pytest's `tmp_path` fixture, which
defers cleanup to session end (when the process is exiting and the
file-lock contention is moot). Align this one with the rest of the
file.

CLAUDE.md already documents the 80% Windows coverage allowance due to
"ChromaDB file lock cleanup" — the fix is to stop fighting the lock.
@igorls igorls force-pushed the fix/search-metric-quality branch from 91ff94d to ec5f4eb Compare April 25, 2026 03:40
@igorls igorls merged commit 91c1d15 into develop Apr 25, 2026
6 checks passed
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
…HNSW fixes

Bring in 29 commits from upstream/develop since the last merge (2026-04-23):

Major absorbed changes:
- MemPalace#976 (Felipe Truman): HNSW graph corruption fix, mine_global_lock for
  fan-out, MAX_PRECOMPACT_BLOCK_ATTEMPTS=2 for /compact deadlocks. Closes
  MemPalace#974/MemPalace#965/MemPalace#955; likely resolves MemPalace#1172 too.
- MemPalace#1179 (Igor): CLI mempalace search routes through _hybrid_rank,
  legacy-metric warning + _warn_if_legacy_metric, invariant tests on
  hnsw:space=cosine across all 5 collection-creation paths.
- MemPalace#1180/MemPalace#1183/MemPalace#1184: cross-wing topic tunnels, init mine UX, --auto-mine.
- MemPalace#1185 (perf/batched-upsert-gpu): batched ChromaDB upserts, GPU device
  detection via mempalace/embedding.py.
- MemPalace#1182: graceful Ctrl-C during mempalace mine.
- MemPalace#1168: tunnel permissions security fix.

Conflict resolutions (8 files):
- searcher.py: kept fork's CLI delegation through search_memories (cleaner
  single-source-of-truth path); upstream's inline-retrieval branch dropped.
  Merged upstream's print formatting (cosine= + bm25=) with fork's
  matched_via reporting from sqlite BM25 fallback.
- backends/chroma.py: kept fork's _BLOB_FIX_MARKER + palace_path arg to
  ChromaCollection (MemPalace#1171 write lock); merged upstream's **ef_kwargs
  (embedding_function support from MemPalace#1185). Removed duplicate
  _pin_hnsw_threads (fork already cherry-picked Felipe's earlier).
- mcp_server.py: kept fork's palace_path arg + upstream's clearer comment
  on hnsw:num_threads=1 rationale.
- miner.py: took upstream's serial mine() flow (mine_global_lock + Ctrl-C),
  RESTORED fork's strict detect_room — substring matching from upstream
  breaks fork-only test_detect_room_priority1_no_substring_match. Added
  `import re` for word-boundary regex matching. Fork-ahead concurrent
  mining (workers=, ThreadPoolExecutor from 5cd14bd) is dropped — daemon
  migration deprioritizes local concurrent mining; can re-port if needed.
- CHANGELOG.md: combined fork's segfault-trio narrative with upstream's
  v3.3.4 release notes.
- HOOKS_TUTORIAL.md: took MEMPALACE_PYTHON env var name (fork README was
  stale; hooks already use this name per fork-ahead MemPalace#19).
- test_convo_miner_unit.py: took both contextlib + pytest imports.
- test_searcher.py: imported upstream's 3 new TestSearchCLI tests but
  skipped them with TODOs — they assume upstream's inline-retrieval CLI
  with simpler mocks. Rewrite needed for fork's delegated search_memories
  path (sqlite BM25 fallback + scope counting).

Test result: 1334 passed, 3 skipped (the upstream tests above), 3 warnings.

Implications for open fork PRs:
- MemPalace#1171 (cross-process write lock at adapter) becomes structurally
  redundant given MemPalace#976's mine_global_lock + daemon-strict serialization.
  Slated for close with thank-you to Felipe.
- MemPalace#1173/MemPalace#1177 still relevant (quarantine threshold + blob_seq marker).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
After today's work:

- Merged upstream/develop (29 commits), absorbing MemPalace#976 (HNSW + mine_global_lock + PreCompact attempt-cap), MemPalace#1179 (CLI BM25), MemPalace#1180/MemPalace#1182/MemPalace#1183/MemPalace#1185 features.
- Filed PR MemPalace#1198 (`_tokenize` None guard) upstream.
- Closed PR MemPalace#1171 (adapter-level flock) — superseded by MemPalace#976 + daemon-strict.
- palace-daemon migration commits (c09582c + 0e97b19) make the daemon the
  single canonical writer; in-tree adapter flock no longer carries its weight.

README:
- Lead paragraph: 2026-04-21 → 2026-04-25 sync date, 165,632 → 151,420
  drawer count, daemon-on-disks topology, listed absorbed upstream PRs.
- Fork change queue: dropped MemPalace#1171 row, added MemPalace#1198 row.
- "Multi-client coordination" section: rewrote — palace-daemon is now the
  fork's primary answer (was "deferred"), MemPalace#1171 retired, narrative shifted
  to defense-in-depth around MemPalace#976 + daemon.
- Two-layer memory model: storage path is now palace-daemon HTTP, not
  ~/.mempalace/palace.
- Ecosystem palace-daemon row: marks integration as complete (commits
  cited).
- Open upstream PRs table: refreshed to MemPalace org URLs, added MemPalace#1198
  row, removed MemPalace#1171, added 2026-04-25 develop sync to merged list, added
  MemPalace#1171 to closed list with rationale.
- Test count 1096 → 1334.

CLAUDE.md:
- Row 20 (MemPalace#1198): "Upstream PR pending" → "Filed as MemPalace#1198 on 2026-04-24".
- Row MemPalace#1171 in PR table: open → closed (with MemPalace#976 supersession).
- Added MemPalace#1198 row to PR table.
- Top-of-section count: "14 merged, 7 open, 9 closed" → "14 merged, 7 open
  (including MemPalace#1198), 10 closed (added MemPalace#1171)" + sync date 2026-04-25.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
jphein added a commit to jphein/mempalace that referenced this pull request Apr 25, 2026
Centralizes upstream MemPalace#1179's legacy-metric warning at search_memories
itself rather than the CLI search() function. Both the CLI (which
delegates to search_memories) and the MCP path (which calls
search_memories directly) now surface the warning when a palace was
created without hnsw:space=cosine.

Un-skips the three TestSearchCLI tests that were marked TODO in the
2026-04-25 develop merge — they now pass naturally:
  test_search_applies_bm25_hybrid_rerank
  test_search_warns_when_palace_uses_wrong_distance_metric
  test_search_does_not_warn_when_palace_is_correctly_configured

Adds `mock_col.count.return_value` to those tests because the fork's
search_memories calls col.count() for `available_in_scope`; the upstream
mocks didn't set it (their search() did inline retrieval and never
counted).

1337 tests pass (was 1334 with 3 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants