feat: add min_similarity threshold to search tool #350

hkf57 wants to merge 4 commits into MemPalace:develop
Conversation
web3guru888
left a comment
This directly addresses something we've been navigating in our integration. We run mempalace_search at different OODA phases with different quality requirements — Orient phase wants breadth (accepts lower similarity), Evaluate phase wants precision (rejects noise). Right now we handle this entirely in post-processing, discarding results below phase-specific thresholds after they come back from the search tool. A min_similarity parameter pushes that filtering to the right layer and avoids returning garbage to the MCP consumer in the first place.
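One way to push that post-processing down is a phase-keyed threshold table. A minimal sketch; the phase names and numbers below are illustrative, not from this PR:

```python
# Illustrative mapping of OODA phases to min_similarity values.
# The values are examples, not numbers from this PR or our integration.
PHASE_MIN_SIMILARITY = {
    "observe": 0.0,    # default: just drop anti-correlated noise
    "orient": -0.2,    # breadth: tolerate weak or even negative matches
    "decide": 0.05,
    "evaluate": 0.15,  # precision: reject anything near the noise floor
}

def threshold_for(phase: str) -> float:
    # Fall back to the PR's 0.0 default for unknown phases.
    return PHASE_MIN_SIMILARITY.get(phase, 0.0)
```

With the parameter in the tool, the caller just passes `min_similarity=threshold_for(phase)` instead of discarding results after the fact.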
The 0.0 default is the right call. Negative cosine similarity results are never useful — they're anti-correlated content — and silently returning them to callers contributes to exactly the class of silent wrong-answer bugs documented in #333.
A few thoughts on the implementation:
The searcher.py change looks clean. One edge case worth testing: when min_similarity filters all results, does search() return an empty list or raise? Empty list is the right behavior (caller can distinguish "nothing found" from an error), and from the PR description it sounds like that's what happens — but worth an explicit test case if not already covered.
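The contract described here can be pinned down with a small filter sketch (hypothetical helper; the real `search()` signature and result shape may differ):

```python
def filter_by_similarity(results, min_similarity=0.0):
    """Drop results below the threshold.

    Returns an empty list when everything is filtered, so callers can
    distinguish "nothing relevant found" from an error (which raises).
    """
    if not -1.0 <= min_similarity <= 1.0:
        raise ValueError("min_similarity must be in [-1.0, 1.0]")
    return [r for r in results if r["similarity"] >= min_similarity]

hits = [{"id": "a", "similarity": 0.31}, {"id": "b", "similarity": -0.02}]
filter_by_similarity(hits)                       # keeps only "a"
filter_by_similarity(hits, min_similarity=0.99)  # [] rather than an exception
```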
On the MCP schema update: would it be worth adding a brief description string to the min_similarity property so LLM tool-callers understand what it controls? Something like "description": "Filter out results below this cosine similarity score. Default 0.0 removes negative-similarity noise." — models are sensitive to property descriptions when deciding whether to pass a parameter.
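In Python terms, assuming the tool's input schema is declared as a dict in mcp_server.py (as MCP servers typically do), that property might look like this; everything beyond the suggested description is an assumption, not quoted from the diff:

```python
# Hypothetical fragment of the search tool's inputSchema properties.
MIN_SIMILARITY_PROPERTY = {
    "min_similarity": {
        "type": "number",
        "minimum": -1.0,
        "maximum": 1.0,
        "default": 0.0,
        "description": (
            "Filter out results below this cosine similarity score. "
            "Default 0.0 removes negative-similarity noise."
        ),
    }
}
```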
The benchmark numbers (overreach queries averaging -0.023, valid hits 0.05+) are a useful data point. We've seen similar distributions in our ChromaDB collections — the gap between noise and signal tends to be small in absolute terms but consistent enough that a fixed threshold works well in practice.
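The fixed-threshold argument in miniature (scores are illustrative, modeled on the numbers quoted above):

```python
overreach = [-0.023, -0.041, -0.008]  # illustrative no-answer query scores
valid = [0.05, 0.12, 0.31]            # illustrative real-hit scores
THRESHOLD = 0.0

# The absolute gap is small, but one fixed cut cleanly separates them.
assert max(overreach) < THRESHOLD <= min(valid)
```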
Overall this is a clean, well-motivated change.
- pyproject.toml: widen chromadb to <2.0 for Python 3.14 compat (MemPalace#302)
- config.py + miner/convo_miner/mcp_server: add hnsw:space=cosine so similarity = 1 - distance stays in [0,1] instead of negative L2 (MemPalace#304)
- searcher.py + layers.py: guard against ChromaDB 1.x empty-outer query results (IndexError on fresh collections) (MemPalace#305)
- mcp_server.py: redirect stdout→stderr at import to protect JSON-RPC wire from chromadb/posthog chatter (MemPalace#306)
- mcp_server.py: replace 10k-limited col.get with paginating _iter_all_metadatas helper; stop swallowing errors silently (MemPalace#307)
- mcp_server.py: drop undocumented wait_for_previous arg injected by Gemini MCP clients (MemPalace#322)
- searcher.py + hybrid_searcher.py + mcp_server.py: add min_similarity threshold filter so callers get a clean "no results" signal (MemPalace#350)
- entity_detector.py: add CODE_KEYWORDS blocklist (~80 terms) to stop Rust types / React / framework names being detected as entities (MemPalace#349)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
PR Review: feat: add min_similarity threshold to search tool

Executive Summary

Affected Areas:
Business Impact: Improves search quality by filtering noise, but silently changes return behavior for existing integrations (MCP consumers, benchmarks, CLI)
Flow Changes:

Ratings / PR Health

High Priority Issues

Bug #1: No input validation on
| Caller | File | Risk |
|---|---|---|
| `cmd_search` | `mempalace/cli.py` | LOW — CLI gets default 0.0, no user control |
| Recall benchmark | `tests/benchmarks/test_recall_threshold.py` | MED — recall measurement may shift |
| Palace boost benchmark | `tests/benchmarks/test_palace_boost.py` | MED — recall measurement may shift |
| Search benchmark | `tests/benchmarks/test_search_bench.py` | LOW — timing unaffected |
| Memory profiler | `tests/benchmarks/test_memory_profile.py` | LOW — RSS unaffected |
Created by Octocode MCP https://octocode.ai
Addressed the review feedback in 1dd1e2c:

- Bug #1 (input validation): Added range validation for `min_similarity` (must be between -1.0 and 1.0).
- Architecture #3 (filtered count): Response dict now includes `total_before_filter`.
- Code Quality #5 (schema constraints): Added `minimum`/`maximum` bounds to the MCP schema.

On Bug #2 (default 0.0 as breaking change): Keeping 0.0 as the default. Negative cosine similarity results are anti-correlated content; returning them is the current bug, not a feature. The benchmark data is clear: overreach queries average -0.023, valid hits are 0.05+.

41/41 tests pass, ruff clean. 3 new tests added for negative thresholds, out-of-range validation, and `total_before_filter`.
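The `total_before_filter` addition implies a response shape along these lines (a sketch; field names beyond `total_before_filter` are assumptions inferred from the test list, not quoted from the diff):

```python
def search_memories_response(all_hits, min_similarity=0.0):
    # Hypothetical: expose the pre-filter count so an MCP caller can
    # tell "everything fell below the threshold" apart from "the
    # collection is empty".
    kept = [h for h in all_hits if h["similarity"] >= min_similarity]
    return {
        "results": kept,
        "total_before_filter": len(all_hits),
    }
```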
Rebased cleanly onto develop after base branch change from main.

- Add min_similarity param (default 0.0) to search() and search_memories()
- Input validation: must be between -1.0 and 1.0
- Default 0.0 automatically filters negative similarities
- Works alongside existing max_distance filtering
- Add min_similarity to MCP schema with bounds
- Fix flaky test queries for small corpus compatibility
- 6 new tests covering threshold filtering, edge cases, validation
Force-pushed df48bfb to 8002e7e.
…rity-threshold

# Conflicts:
#	mempalace/searcher.py
Hi,

Severity: action required | Category: correctness

How to fix: Remove the clamp or change the contract.

Agent prompt to fix - you can give this to your LLM of choice:
We noticed a couple of other issues in this PR as well - happy to share if helpful.

Spotted by Qodo code review - free for open-source projects.
Clamped similarity to [0, 1] contradicted the documented [-1.0, 1.0] contract — anti-correlated results were floored to 0.0, passing any negative threshold but indistinguishable from a neutral match.
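The failure mode is easy to reproduce in isolation. A minimal model of the bug, assuming cosine distance in [0, 2] and similarity = 1 - distance; the function is illustrative, not the code from searcher.py:

```python
def similarity(distance: float, clamp: bool = False) -> float:
    # Cosine similarity derived from cosine distance: sim = 1 - d, d in [0, 2].
    sim = 1.0 - distance
    return max(0.0, sim) if clamp else sim

# An anti-correlated hit at distance 1.5 has true similarity -0.5.
# Unclamped, a caller's min_similarity=-0.2 correctly rejects it;
# clamped, it is floored to 0.0 and slips through, indistinguishable
# from a genuinely neutral match.
assert similarity(1.5) == -0.5
assert similarity(1.5, clamp=True) == 0.0
```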
Thanks — fixed in the latest push. Dropped the clamp so the unclamped similarity is reported and negative thresholds remain meaningful.

Happy to hear the other issues you spotted.
Friendly bump - the bot review feedback from 2026-04-21 was addressed in 1dd1e2c (dropped the clamp).
Resolves four conflicts in mempalace/searcher.py around the BM25 hybrid rerank, closet-boost effective-distance clamp, and the min_similarity threshold added by this PR:

* keep develop's _warn_if_legacy_metric helper alongside the new min_similarity parameter on search()
* in the CLI display loop, use develop's hits/bm25 iteration shape but apply this PR's threshold filter and unclamped vec_sim so negative thresholds remain meaningful
* keep develop's effective_dist clamp [0, 2] (closet-boost-overshoot fix) but compute and report the unclamped similarity so the filter contract still holds
* extract _validate_min_similarity (raises) and _drop_below_threshold helpers, and split search_memories into a wrapper + impl so the validation and post-rerank filter live outside the inner pipeline without breaching the cyclomatic-complexity ceiling
* apply the threshold to the post-rerank result list rather than the pre-rerank candidate pool, so legitimate BM25 lexical lifts are not dropped before they get a chance to be reranked
* update the develop BM25 regression test to use distance=1.0 (not 1.5), matching its stated intent of cosine=0 in the now-unclamped display

Full pytest suite (1558 tests) passes.
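The post-rerank placement can be sketched as follows; the function names are hypothetical and the real pipeline in searcher.py is more involved:

```python
def search_with_threshold(candidates, rerank, min_similarity=0.0):
    # Rerank first: a BM25 lexical lift may promote a hit whose raw
    # vector similarity alone would have fallen below the threshold.
    reranked = rerank(candidates)
    # Then filter on the final, unclamped similarity, so the documented
    # [-1.0, 1.0] contract holds for negative thresholds too.
    return [h for h in reranked if h["similarity"] >= min_similarity]
```

Filtering the candidate pool before the rerank would silently drop exactly the hits the hybrid rerank exists to rescue.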
What does this PR do?

Adds a `min_similarity` parameter to `search()` and `search_memories()` that filters out results below a given similarity score. Defaults to `0.0`, which discards negative-similarity noise without affecting positive matches. Supports the full cosine similarity range (-1.0 to 1.0) with input validation.

Motivation: I ran a 30-question retrieval benchmark against a 10,000-drawer palace (200 Confluence ADR pages). Overreach queries (where no relevant content exists) averaged -0.023 similarity, while valid hits were consistently 0.05+. The current behaviour of always returning N results - even with negative scores - means consumers have no "nothing found" signal. The default of 0.0 eliminates the worst false positives automatically; callers wanting stricter filtering can raise it (e.g. `min_similarity=0.1`), or lower it to include anti-correlated results (e.g. `min_similarity=-0.5`).

Rebased onto develop after the base branch change from main. This is now a clean, minimal diff (62 insertions) on top of develop's hybrid search, closet ranking, and BM25 re-ranking - none of which is touched by this PR.
Changes:

- `searcher.py`: `search()` and `search_memories()` accept `min_similarity` (default 0.0), skip results below threshold. Works alongside existing `max_distance` filtering. Input validation rejects values outside -1.0 to 1.0.
- `mcp_server.py`: Schema updated with `min_similarity` property including `minimum`/`maximum` constraints. The existing `tool_search()` handler already accepts `min_similarity` and converts to `max_distance`; this adds direct filtering in the searcher for programmatic callers.
- `test_searcher.py`: 6 new tests covering filtering, negative thresholds, out-of-range validation, and `total_before_filter`. Fixed two flaky upstream test queries (`test_result_fields`, `test_search_n_results`) that used vague queries producing negative similarities against the small 4-doc test corpus.

Follows the same pattern as `tool_check_duplicate`, which already uses a similarity threshold.

How to test
23 tests pass (811/811 full suite). The 6 new tests cover:

- `test_min_similarity_filters_low_scores` - threshold of 0.99 returns empty results
- `test_min_similarity_keeps_high_scores` - threshold just below top score preserves results
- `test_min_similarity_default_filters_negatives` - default 0.0 guarantees no negative scores
- `test_min_similarity_negative_threshold` - negative threshold includes anti-correlated content
- `test_min_similarity_out_of_range_returns_error` - values outside -1.0 to 1.0 return error
- `test_total_before_filter` - response shows how many results existed before filtering

Checklist
- Tests pass (`python -m pytest tests/ -v` - 811 passed)
- Lint clean (`ruff check .`)