
feat: add min_similarity threshold to search tool#350

Open
hkf57 wants to merge 4 commits into MemPalace:develop from hkf57:feat/min-similarity-threshold

Conversation


@hkf57 hkf57 commented Apr 9, 2026

What does this PR do?

Adds a min_similarity parameter to search() and search_memories() that filters out results below a given similarity score. Defaults to 0.0, which discards negative-similarity noise without affecting positive matches. Supports the full cosine similarity range (-1.0 to 1.0) with input validation.

Motivation: I ran a 30-question retrieval benchmark against a 10,000-drawer palace (200 Confluence ADR pages). Overreach queries (where no relevant content exists) averaged -0.023 similarity, while valid hits were consistently 0.05+. The current behaviour of always returning N results - even with negative scores - means consumers have no "nothing found" signal. The default of 0.0 eliminates the worst false positives automatically; callers wanting stricter filtering can raise it (e.g. min_similarity=0.1), or lower it to include anti-correlated results (e.g. min_similarity=-0.5).

Rebased onto develop after the base branch change from main. This is now a clean, minimal diff (62 insertions) on top of develop's hybrid search, closet ranking, and BM25 re-ranking - none of which is touched by this PR.

Changes:

  • searcher.py: search() and search_memories() accept min_similarity (default 0.0), skip results below threshold. Works alongside existing max_distance filtering. Input validation rejects values outside -1.0 to 1.0.
  • mcp_server.py: Schema updated with min_similarity property including minimum/maximum constraints. The existing tool_search() handler already accepts min_similarity and converts to max_distance; this adds direct filtering in the searcher for programmatic callers.
  • test_searcher.py: 6 new tests covering filtering, negative thresholds, out-of-range validation, and total_before_filter. Fixed two flaky upstream test queries (test_result_fields, test_search_n_results) that used vague queries producing negative similarities against the small 4-doc test corpus.

Follows the same pattern as tool_check_duplicate, which already uses a similarity threshold.
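The filtering behaviour described above can be sketched as follows. This is an illustration of the contract stated in this PR description, not the actual diff; the helper name and result shape are hypothetical:

```python
def filter_by_min_similarity(docs, dists, min_similarity=0.0):
    """Keep only results whose cosine similarity (1 - distance) meets the threshold."""
    if not (-1.0 <= min_similarity <= 1.0):
        raise ValueError(
            f"min_similarity must be between -1.0 and 1.0, got {min_similarity}"
        )
    hits = []
    for doc, dist in zip(docs, dists):
        similarity = round(1 - dist, 3)  # ChromaDB cosine distance -> similarity
        if similarity < min_similarity:
            continue  # below threshold: skipped, per the behaviour described above
        hits.append({"content": doc, "similarity": similarity})
    return hits

# Distances 0.95 and 1.023 give similarities 0.05 and -0.023, the averages
# from the benchmark above. The default threshold keeps only the valid hit.
hits = filter_by_min_similarity(["valid hit", "overreach noise"], [0.95, 1.023])
print([h["content"] for h in hits])  # ['valid hit']
```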

How to test

python -m pytest tests/test_searcher.py -v

23 tests pass (811/811 full suite). The 6 new tests cover:

  • test_min_similarity_filters_low_scores - threshold of 0.99 returns empty results
  • test_min_similarity_keeps_high_scores - threshold just below top score preserves results
  • test_min_similarity_default_filters_negatives - default 0.0 guarantees no negative scores
  • test_min_similarity_negative_threshold - negative threshold includes anti-correlated content
  • test_min_similarity_out_of_range_returns_error - values outside -1.0 to 1.0 return error
  • test_total_before_filter - response shows how many results existed before filtering

Checklist

  • Tests pass (python -m pytest tests/ -v - 811 passed)
  • No hardcoded paths
  • Linter passes (ruff check .)
  • Clean rebase onto develop (no merge noise)


@web3guru888 web3guru888 left a comment


This directly addresses something we've been navigating in our integration. We run mempalace_search at different OODA phases with different quality requirements — Orient phase wants breadth (accepts lower similarity), Evaluate phase wants precision (rejects noise). Right now we handle this entirely in post-processing, discarding results below phase-specific thresholds after they come back from the search tool. A min_similarity parameter pushes that filtering to the right layer and avoids returning garbage to the MCP consumer in the first place.

The 0.0 default is the right call. Negative cosine similarity results are never useful — they're anti-correlated content — and silently returning them to callers contributes to exactly the class of silent wrong-answer bugs documented in #333.

A few thoughts on the implementation:

The searcher.py change looks clean. One edge case worth testing: when min_similarity filters all results, does search() return an empty list or raise? Empty list is the right behavior (caller can distinguish "nothing found" from an error), and from the PR description it sounds like that's what happens — but worth an explicit test case if not already covered.

On the MCP schema update: would it be worth adding a brief description string to the min_similarity property so LLM tool-callers understand what it controls? Something like "description": "Filter out results below this cosine similarity score. Default 0.0 removes negative-similarity noise." — models are sensitive to property descriptions when deciding whether to pass a parameter.
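A sketch of what that property might look like with the suggested description, written as a Python dict for illustration. Only the parameter name, bounds, and the quoted description come from this thread; the bounds shown are the [-1.0, 1.0] range the PR settled on, and the surrounding key set is assumed:

```python
# Hypothetical schema property; the description string is the reviewer's wording.
min_similarity_schema = {
    "type": "number",
    "minimum": -1.0,
    "maximum": 1.0,
    "description": (
        "Filter out results below this cosine similarity score. "
        "Default 0.0 removes negative-similarity noise."
    ),
}

def bounds_ok(value, prop):
    # Minimal check mirroring JSON Schema's minimum/maximum keywords.
    return prop["minimum"] <= value <= prop["maximum"]

print(bounds_ok(0.1, min_similarity_schema))  # True
print(bounds_ok(50, min_similarity_schema))   # False
```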

The benchmark numbers (overreach queries averaging -0.023, valid hits 0.05+) are a useful data point. We've seen similar distributions in our ChromaDB collections — the gap between noise and signal tends to be small in absolute terms but consistent enough that a fixed threshold works well in practice.

Overall this is a clean, well-motivated change.

Perseusxrltd added a commit to Perseusxrltd/mnemion that referenced this pull request Apr 9, 2026
- pyproject.toml: widen chromadb to <2.0 for Python 3.14 compat (MemPalace#302)
- config.py + miner/convo_miner/mcp_server: add hnsw:space=cosine so
  similarity = 1 - distance stays in [0,1] instead of negative L2 (MemPalace#304)
- searcher.py + layers.py: guard against ChromaDB 1.x empty-outer query
  results (IndexError on fresh collections) (MemPalace#305)
- mcp_server.py: redirect stdout→stderr at import to protect JSON-RPC
  wire from chromadb/posthog chatter (MemPalace#306)
- mcp_server.py: replace 10k-limited col.get with paginating
  _iter_all_metadatas helper; stop swallowing errors silently (MemPalace#307)
- mcp_server.py: drop undocumented wait_for_previous arg injected by
  Gemini MCP clients (MemPalace#322)
- searcher.py + hybrid_searcher.py + mcp_server.py: add min_similarity
  threshold filter so callers get a clean "no results" signal (MemPalace#350)
- entity_detector.py: add CODE_KEYWORDS blocklist (~80 terms) to stop
  Rust types / React / framework names being detected as entities (MemPalace#349)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

bgauryy commented Apr 9, 2026

PR Review: feat: add min_similarity threshold to search tool

Executive Summary

Aspect          Value
PR Goal         Add min_similarity parameter to filter low-confidence search results
Files Changed   3
Risk Level      MEDIUM — changes default search behavior for all callers
Review Mode     Quick
Review Effort   2/5
Recommendation  REQUEST_CHANGES

Affected Areas: mempalace/searcher.py (core search), mempalace/mcp_server.py (MCP interface), tests/test_searcher.py

Business Impact: Improves search quality by filtering noise, but silently changes return behavior for existing integrations (MCP consumers, benchmarks, CLI)

Flow Changes: search_memories() and search() now post-filter ChromaDB results against min_similarity before returning. Default 0.0 silently strips negative-similarity results that were previously returned.

Ratings

Aspect           Score
Correctness      3/5
Security         5/5
Performance      5/5
Maintainability  4/5

PR Health

  • Has clear description (excellent motivation with benchmark data)
  • References ticket/issue — N/A (standalone feature)
  • Appropriate size (47 lines, well-scoped)
  • Has relevant tests (3 new tests + 1 updated)

High Priority Issues

Bug #1: No input validation on min_similarity — invalid values silently return zero results

Location: mempalace/searcher.py:21 and mempalace/searcher.py:100 | Confidence: HIGH

The MCP schema describes min_similarity as "0-1" but neither the schema nor the Python code enforces any range. An MCP consumer passing min_similarity=50 (or any value > 1.0) silently receives zero results with no error or warning. This is especially dangerous in an MCP context where the caller is an AI agent that may not understand why results disappeared.

 def search(query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5, min_similarity: float = 0.0):
     """
     Search the palace. Returns verbatim drawer content.
     Optionally filter by wing (project) or room (aspect).
     """
+    if not (0.0 <= min_similarity <= 1.0):
+        raise ValueError(f"min_similarity must be between 0.0 and 1.0, got {min_similarity}")

Same fix needed in search_memories (returns error dict to match existing error pattern):

 def search_memories(
     query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5,
     min_similarity: float = 0.0,
 ) -> dict:
+    if not (0.0 <= min_similarity <= 1.0):
+        return {"error": f"min_similarity must be between 0.0 and 1.0, got {min_similarity}"}

Bug #2: Default 0.0 is a silent breaking change — existing callers lose negative-similarity results

Location: mempalace/searcher.py:21 and mempalace/searcher.py:100 | Confidence: HIGH

Before this PR, search_memories("query", palace_path) returned ALL results including those with negative similarity. After this PR, the same call silently drops negative-similarity results. This is a semantic breaking change hidden behind a default parameter.

Evidence within the PR itself: The existing test_result_fields test had to be modified — its query was changed from "authentication" to "JWT authentication tokens session management" and a new assert len(result["results"]) > 0 guard was added, because the original vague query no longer guaranteed positive-similarity results under the new default.

Blast radius — 7 callers reference search_memories:

  • mempalace/mcp_server.py — updated (OK)
  • tests/test_searcher.py — updated (OK)
  • tests/benchmarks/test_palace_boost.py — NOT updated, measures recall
  • tests/benchmarks/test_search_bench.py — NOT updated, measures timing
  • tests/benchmarks/test_memory_profile.py — NOT updated, measures RSS
  • tests/benchmarks/test_recall_threshold.py — NOT updated, measures recall@5 and recall@10
  • README.md — example code (cosmetic)

The benchmark tests that measure recall are most at risk: if any needle queries previously matched with negative similarity (unlikely but possible with adversarial data), recall numbers silently change.

Recommended fix: Either (a) keep default as None and only filter when explicitly set, preserving backward compatibility:

 def search_memories(
     query: str, palace_path: str, wing: str = None, room: str = None, n_results: int = 5,
-    min_similarity: float = 0.0,
+    min_similarity: float = None,
 ) -> dict:
     ...
     for doc, meta, dist in zip(docs, metas, dists):
         similarity = round(1 - dist, 3)
-        if similarity < min_similarity:
+        if min_similarity is not None and similarity < min_similarity:
             continue

Or (b) keep 0.0 as default but explicitly document the behavior change and update the benchmark tests to pass min_similarity=-1.0 to preserve old behavior.


Medium Priority Issues

Architecture #3: Post-filter can return fewer results than limit with no indication to caller

Location: mempalace/searcher.py:145-158 | Confidence: HIGH

ChromaDB is asked for n_results items, then min_similarity post-filters. If a user requests limit=5 with min_similarity=0.3, they may receive 0-4 results with no indication that filtering occurred. The return dict has no field to communicate "5 were fetched, 3 were filtered".

This matters because MCP consumers (AI agents) may interpret an empty result list as "no relevant content exists" when in fact relevant content exists just below the threshold.

     return {
         "query": query,
         "filters": {"wing": wing, "room": room},
         "results": hits,
+        "total_before_filter": len(docs),
+        "min_similarity": min_similarity,
     }

Code Quality #4: Filtering logic duplicated across search() and search_memories()

Location: mempalace/searcher.py:74-77 and mempalace/searcher.py:145-147 | Confidence: MED

Both search() (CLI) and search_memories() (programmatic) independently implement the same if similarity < min_similarity: continue filter with identical round(1 - dist, 3) conversion. If the filtering formula or rounding changes, both must be updated in sync. Consider extracting the similarity computation and filter into a shared helper.

Not blocking — the duplication is small and the two functions have different output formats (print vs dict).


Code Quality #5: MCP schema lacks minimum/maximum constraints

Location: mempalace/mcp_server.py:626-629 | Confidence: MED

The JSON Schema definition for min_similarity says "0-1" in the description text but doesn't use JSON Schema's built-in minimum/maximum keywords. MCP clients that validate inputs against the schema won't catch out-of-range values.

 "min_similarity": {
     "type": "number",
     "description": "Minimum similarity threshold 0-1 (default 0.0, discarding negative scores). Results below this are omitted.",
+    "minimum": 0.0,
+    "maximum": 1.0,
 },

Flow Impact Analysis

MCP Consumer ──► tool_search(min_similarity=X)
                    │
                    ▼
              search_memories(min_similarity=X)
                    │
                    ├──► ChromaDB.query(n_results=N)  ← fetches N results
                    │
                    ▼
              POST-FILTER: similarity < min_similarity → skip
                    │
                    ▼
              Return 0..N hits  ← caller may receive < N results

CLI User ────► cmd_search(args)
                    │
                    ▼
              search(min_similarity=0.0)  ← CLI has no --min-similarity flag
                    │
                    ▼
              Same post-filter logic, prints to stdout

Callers NOT updated in this PR that use the default:

Caller                  File                                        Risk
cmd_search              mempalace/cli.py                            LOW — CLI gets default 0.0, no user control
Recall benchmark        tests/benchmarks/test_recall_threshold.py   MED — recall measurement may shift
Palace boost benchmark  tests/benchmarks/test_palace_boost.py       MED — recall measurement may shift
Search benchmark        tests/benchmarks/test_search_bench.py       LOW — timing unaffected
Memory profiler         tests/benchmarks/test_memory_profile.py     LOW — RSS unaffected

Created by Octocode MCP https://octocode.ai


hkf57 commented Apr 9, 2026

Addressed the review feedback in 1dd1e2c:

Bug #1 (input validation): Added range validation for min_similarity. Accepts -1.0 to 1.0 (the full cosine similarity range). search() raises ValueError; search_memories() returns an error dict to match existing error patterns. Negative thresholds are supported for callers who want to include anti-correlated content.

Architecture #3 (filtered count): Response dict now includes total_before_filter so callers can distinguish "nothing matched the query" from "matches existed but fell below the threshold".

Code Quality #5 (schema constraints): Added minimum: -1.0 and maximum: 1.0 to the JSON Schema definition.

On Bug #2 (default 0.0 as breaking change): Keeping 0.0 as the default. Negative cosine similarity results are anti-correlated content; returning them is the current bug, not a feature. The benchmark data is clear: overreach queries average -0.023, valid hits are 0.05+. The total_before_filter field gives callers full visibility into what was filtered. Benchmark tests are excluded from default pytest runs via markers and don't call search_memories directly.
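The distinction that total_before_filter enables can be shown with a small sketch; the response shapes below are inferred from the fields named in this thread, not copied from the code:

```python
# Hypothetical consumer-side interpretation of a search response.
def interpret(resp):
    if resp["results"]:
        return "hits"
    if resp["total_before_filter"] > 0:
        return "matches existed but fell below the threshold"
    return "nothing matched the query"

print(interpret({"results": [], "total_before_filter": 5}))
# matches existed but fell below the threshold
print(interpret({"results": [], "total_before_filter": 0}))
# nothing matched the query
```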

41/41 tests pass, ruff clean. 3 new tests added for negative thresholds, out-of-range validation, and total_before_filter.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
Rebased cleanly onto develop after base branch change from main.

- Add min_similarity param (default 0.0) to search() and search_memories()
- Input validation: must be between -1.0 and 1.0
- Default 0.0 automatically filters negative similarities
- Works alongside existing max_distance filtering
- Add min_similarity to MCP schema with bounds
- Fix flaky test queries for small corpus compatibility
- 6 new tests covering threshold filtering, edge cases, validation
@hkf57 hkf57 force-pushed the feat/min-similarity-threshold branch from df48bfb to 8002e7e Compare April 13, 2026 22:29
@igorls igorls added area/mcp MCP server and tools area/search Search and retrieval enhancement New feature or request labels Apr 14, 2026
…rity-threshold

# Conflicts:
#	mempalace/searcher.py
@Qodo-Free-For-OSS

Hi, search() and search_memories() clamp similarity to >=0.0, so anti-correlated (negative) cosine similarities can never be returned or filtered-in, contradicting the new API contract and tool schema that claim support for [-1.0, 1.0].

Severity: action required | Category: correctness

How to fix: Remove clamp or change contract

Agent prompt to fix - you can give this to your LLM of choice:

Issue description

Similarity is clamped to [0,1], but the PR claims similarity supports [-1,1] and that negative thresholds can include anti-correlated results.

Fix Focus Areas

  • mempalace/searcher.py[286-290]
  • mempalace/searcher.py[425-435]
  • mempalace/mcp_server.py[1371-1376]

What to change

  • Pick ONE consistent contract:
    • If you want true cosine similarity range [-1,1]: compute raw_similarity = 1 - dist (or 1 - effective_dist) without max(0.0, ...), then filter/return that value.
    • If you want similarity in [0,1] only: keep the clamp, but tighten validation/docs/schema to min_similarity in [0,1] and remove claims about negative similarity support.
  • Update tests accordingly (especially any tests implying negative similarity support).
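The two contracts can be contrasted in a short sketch, assuming ChromaDB's cosine distance lies in [0, 2]:

```python
def similarity(dist, clamp=False):
    """Convert cosine distance to similarity, optionally floored at 0.0."""
    sim = 1 - dist
    return max(0.0, sim) if clamp else sim

# An anti-correlated result: distance 1.5, true cosine similarity -0.5.
print(similarity(1.5, clamp=True))   # 0.0, floored: passes any negative threshold
print(similarity(1.5, clamp=False))  # -0.5: correctly removed by min_similarity=0.0
```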

We noticed a couple of other issues in this PR as well - happy to share if helpful.


Spotted by Qodo code review - free for open-source projects.

Clamped similarity to [0, 1] contradicted the documented [-1.0, 1.0]
contract — anti-correlated results were floored to 0.0, passing any
negative threshold but indistinguishable from a neutral match.

hkf57 commented Apr 21, 2026

Thanks — fixed in the latest push. Dropped the max(0.0, ...) clamp at both sites in searcher.py, so similarity is now the true cosine value in [-1.0, 1.0]. Validation, schema, and docs were already consistent with the negative-threshold contract; the clamp was the odd one out. All 27 searcher tests pass, including the negative-threshold case which now actually exercises negatives end-to-end.

Happy to hear the other issues you spotted.

Author

hkf57 commented May 6, 2026

Friendly bump - the bot review feedback from 2026-04-21 was addressed in 1dd1e2c (dropped the max(0.0, ...) clamp at both searcher.py sites so similarity is the true cosine value in [-1.0, 1.0]). Branch is current against develop. Let me know if there is anything blocking review or further changes you would like.

Resolves four conflicts in mempalace/searcher.py around the BM25
hybrid rerank, closet-boost effective-distance clamp, and the
min_similarity threshold added by this PR:

* keep develop's _warn_if_legacy_metric helper alongside the new
  min_similarity parameter on search()
* in the CLI display loop, use develop's hits/bm25 iteration shape
  but apply this PR's threshold filter and unclamped vec_sim so
  negative thresholds remain meaningful
* keep develop's effective_dist clamp [0, 2] (closet-boost-overshoot
  fix) but compute and report the unclamped similarity so the filter
  contract still holds
* extract _validate_min_similarity (raises) and _drop_below_threshold
  helpers, and split search_memories into a wrapper + impl so the
  validation and post-rerank filter live outside the inner pipeline
  without breaching the cyclomatic-complexity ceiling
* apply the threshold to the post-rerank result list rather than the
  pre-rerank candidate pool, so legitimate BM25 lexical lifts are not
  dropped before they get a chance to be reranked
* update the develop BM25 regression test to use distance=1.0 (not
  1.5), matching its stated intent of cosine=0 in the now-unclamped
  display

Full pytest suite (1558 tests) passes.
