fix(searcher): clamp effective_distance to valid cosine range [0, 2] by eldar702 · Pull Request #1029 · MemPalace/mempalace

eldar702 · 2026-04-19T08:09:14Z

Summary

search_memories in mempalace/searcher.py computes
effective_dist = dist - boost at line 411. The boost can be as
large as CLOSET_RANK_BOOSTS[0] == 0.40 when a closet hits at rank 0.
When the raw drawer distance is small (any near-exact match — typical
on short queries with strong semantic overlap), the subtraction goes
negative, violating the cosine-distance invariant [0, 2].

Two downstream effects, observed in the wild:

1. similarity > 1.0. Line 418:
   ```
   "similarity": round(max(0.0, 1 - effective_dist), 3),
   ```
   With effective_dist = -0.30 this returns 1.30. The
   max(0.0, ...) only prevents negative similarities; it does not
   cap above 1. API consumers see nonsensical similarity values.
2. Inverted ranking. Line 427 stores _sort_key = effective_dist;
   line 435 sorts scored ascending. A negative _sort_key drops
   below ordinary positive distances, so the strongest hybrid hit
   lands last, the opposite of what hybrid retrieval is supposed to
   deliver.
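
The first effect can be reproduced with plain arithmetic. The sketch below is illustrative only: `unclamped_hit` is a hypothetical helper, not a function in mempalace/searcher.py. It copies just the subtraction and the similarity expression quoted above, and it does not attempt to reproduce the sorting at line 435, which is not shown in this PR.

```python
# Illustrative sketch; `unclamped_hit` is a hypothetical helper, not MemPalace code.
CLOSET_RANK_0_BOOST = 0.40  # value of CLOSET_RANK_BOOSTS[0] quoted in the summary


def unclamped_hit(dist: float, boost: float) -> dict:
    """Mirror the pre-fix arithmetic: subtract the boost with no clamp."""
    effective_dist = dist - boost
    return {
        "effective_distance": effective_dist,
        # Same expression as the quoted line 418: floors at 0.0, no ceiling.
        "similarity": round(max(0.0, 1 - effective_dist), 3),
    }


hit = unclamped_hit(dist=0.10, boost=CLOSET_RANK_0_BOOST)
print(hit["effective_distance"] < 0)  # True: outside [0, 2]
print(hit["similarity"])              # 1.3: outside [0, 1]
```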

Change

One-line clamp to the valid cosine-distance range:

```diff
-    effective_dist = dist - boost
+    # Clamp to the valid cosine-distance range [0, 2]. When a strong
+    # closet boost (up to 0.40) exceeds the raw distance, the subtraction
+    # can go negative — which (a) yields ``similarity > 1.0`` downstream
+    # and (b) makes the sort key land *below* ordinary positive distances,
+    # inverting the ranking so the best hybrid matches sort last.
+    effective_dist = max(0.0, min(2.0, dist - boost))
```

The boost still wins (closet-backed hits still rank first); it just no
longer flips the order or returns out-of-range values.
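
As a rough illustration of the clamped behavior, the sketch below applies the same expression to the two distances used in the PR's test scenario (0.08 with a rank-0 boost, 0.35 with none). The helper name, the `label` key, and the shape of the hit dicts are assumptions made for the example, not the layout used in searcher.py.

```python
# Illustrative sketch; the helper and hit layout are assumptions, not MemPalace code.
def clamp_effective_distance(dist: float, boost: float) -> float:
    """Subtract the boost, then clamp into the cosine-distance range [0, 2]."""
    return max(0.0, min(2.0, dist - boost))


# (label, raw distance, boost): values taken from the PR's test scenario.
candidates = [
    ("plain", 0.35, 0.00),
    ("closet_boosted", 0.08, 0.40),
]

scored = []
for label, dist, boost in candidates:
    effective_dist = clamp_effective_distance(dist, boost)
    scored.append(
        {
            "label": label,
            "effective_distance": effective_dist,
            "similarity": round(max(0.0, 1 - effective_dist), 3),
        }
    )

# Ascending sort on the clamped distance: smallest distance ranks first.
scored.sort(key=lambda hit: hit["effective_distance"])
for hit in scored:
    print(hit)
```

Here the boosted hit ends up with effective_distance 0.0 and similarity 1.0, ahead of the plain hit at 0.35 and 0.65.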

Test plan

New test TestSearchMemories::test_effective_distance_clamped_to_valid_cosine_range —
mocks get_collection + get_closets_collection to return a
low-distance drawer (0.08) with a strong closet match (rank 0,
boost 0.40). Asserts:

- every hit has 0.0 <= similarity <= 1.0
- every hit has 0.0 <= effective_distance <= 2.0
- the closet-boosted source still ranks first

Full searcher + hybrid suite: 27 passed in 102.24s.
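
The three assertions can be written as a small reusable check. This is a sketch under assumptions: `check_hits` and the `source` key are hypothetical names, and the hit dicts are hand-built stand-ins rather than real search_memories output; only the `similarity` and `effective_distance` keys come from the PR text.

```python
# Illustrative sketch; `check_hits` and the "source" key are hypothetical names.
def check_hits(hits: list[dict], boosted_source: str) -> None:
    """Apply the three assertions from the test plan to a list of result dicts."""
    assert hits, "expected at least one hit"
    for hit in hits:
        assert 0.0 <= hit["similarity"] <= 1.0, hit
        assert 0.0 <= hit["effective_distance"] <= 2.0, hit
    assert hits[0]["source"] == boosted_source, hits[0]


# Hand-built stand-ins for what a post-fix search might return.
post_fix_hits = [
    {"source": "boosted.md", "similarity": 1.0, "effective_distance": 0.0},
    {"source": "plain.md", "similarity": 0.65, "effective_distance": 0.35},
]
check_hits(post_fix_hits, boosted_source="boosted.md")

# A pre-fix style hit with a negative distance trips the range check.
pre_fix_hits = [
    {"source": "boosted.md", "similarity": 1.32, "effective_distance": -0.32},
]
try:
    check_hits(pre_fix_hits, boosted_source="boosted.md")
except AssertionError:
    print("out-of-range hit rejected")
```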

Relationship to other open PRs

- Complements #988 (fix: clamp similarity scores to [0,1] to prevent negative values). #988 clamps the output similarity to
  [0, 1], which is a partial mitigation: it hides the nonsensical
  similarity number but leaves effective_distance (returned in the
  result dict) and the sort key in the invalid range. This PR clamps at
  the arithmetic source, so both downstream users stay in range and
  ranking is correct. The two PRs can merge in either order; if #988
  lands first, the max(0.0, ...) guard there becomes redundant but
  harmless.
- Orthogonal to #979 (Fix/check duplicate negative similarity), which covers the tool_check_duplicate guards.
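
The difference between the two approaches can be sketched side by side. Both helpers below are hypothetical simplifications: `clamp_output_only` is an assumed reading of what an output-level clamp does, not the actual code in #988, and `clamp_at_source` reuses the expression from this PR's diff.

```python
# Illustrative sketch; both helpers are hypothetical simplifications.
def clamp_output_only(dist: float, boost: float) -> dict:
    """Clamp only the reported similarity (assumed shape of an output-level fix)."""
    effective_dist = dist - boost
    return {
        "effective_distance": effective_dist,  # can still be negative
        "similarity": round(min(1.0, max(0.0, 1 - effective_dist)), 3),
    }


def clamp_at_source(dist: float, boost: float) -> dict:
    """Clamp the distance itself, so every derived value stays in range."""
    effective_dist = max(0.0, min(2.0, dist - boost))
    return {
        "effective_distance": effective_dist,
        "similarity": round(max(0.0, 1 - effective_dist), 3),
    }


output_only = clamp_output_only(dist=0.08, boost=0.40)
at_source = clamp_at_source(dist=0.08, boost=0.40)

print(output_only["similarity"], output_only["effective_distance"] < 0)  # 1.0 True
print(at_source["similarity"], at_source["effective_distance"])          # 1.0 0.0
```

Both variants report a similarity of 1.0, but only the source-level clamp also keeps effective_distance inside [0, 2].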

Why it doesn't introduce new issues

| Plausible regression | Prevention |
| --- | --- |
| Clamping to `[0, 2]` caps the boost effect at `dist`. | Deliberate: the boost was over-tuned (0.40 vs typical distances 0.3–0.7). A cap still gives the boosted hit the best position without flipping the order. |
| Downstream consumers might expect `effective_distance` in the old range. | Old range was `(-∞, 2]`; new range is `[0, 2]` — strict subset. No downstream consumer is broken by narrower bounds. |
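
Both rows can be checked numerically. In the sketch below, `clamp` is a hypothetical stand-in for the patched expression and the sample values are chosen purely for illustration.

```python
# Illustrative sketch; `clamp` is a hypothetical stand-in for the patched line.
def clamp(value: float) -> float:
    return max(0.0, min(2.0, value))


# Values already inside [0, 2] pass through unchanged ...
in_range = [0.0, 0.08, 0.35, 1.0, 2.0]
print(all(clamp(v) == v for v in in_range))  # True

# ... while every negative result collapses to exactly 0.0.
below_range = [0.08 - 0.40, 0.20 - 0.40, 0.39 - 0.40]
print([clamp(v) for v in below_range])  # [0.0, 0.0, 0.0]
```

The second print shows what "caps the boost effect at `dist`" means in practice: boosted hits whose raw distance is smaller than their boost all land on 0.0, so they tie with one another rather than being separated by the size of the boost.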

``search_memories`` computes ``effective_dist = dist - boost`` where ``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a rank-0 closet hit. When the raw drawer distance is small (any near-exact match) the subtraction goes negative.

Two downstream effects:

1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as ``similarity``. With ``effective_dist = -0.30`` that yields ``similarity = 1.30``, outside the documented ``[0, 1]`` range. The ``max(0.0, ...)`` only prevents negative similarities; it does not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts ``scored`` ascending by that key. A negative key drops *below* the rest, so the strongest hybrid matches end up sorting after weaker ones: ranking inversion under the exact conditions hybrid retrieval is supposed to serve best.

Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``. The boost still wins (closet-backed hit still ranks first), it just no longer flips the order.

Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) + closet_col (rank-0 closet for the 0.08 source) → assert all hits have ``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that the closet-boosted source still ranks first.

Relationship to other PRs:

* **MemPalace#988** clamps the output ``similarity`` alone. That does not fix the sort-key inversion or the invalid ``effective_distance`` in the returned dict. This PR clamps at the arithmetic source so both downstream users of the value stay in range.
* Orthogonal to **MemPalace#979** (``tool_check_duplicate`` negative similarity).

eldar702 requested review from bensig, igorls and milla-jovovich as code owners April 19, 2026 08:09

igorls added the bug and area/search labels Apr 24, 2026

igorls added this to the v3.3.5 milestone May 2, 2026

igorls mentioned this pull request May 6, 2026

fix(searcher): guard against None metadata/doc in search result loops #1019

Merged

eldar702 and others added 2 commits May 6, 2026 02:19

style: ruff format tests/test_searcher.py (CI lint)

aac8437

igorls force-pushed the fix/searcher-effective-distance-clamp branch from ee88b29 to aac8437 Compare May 6, 2026 05:19

igorls merged commit f4617b3 into MemPalace:develop May 6, 2026
6 checks passed

jphein mentioned this pull request May 6, 2026

feat(searcher): warnings + sqlite BM25 top-up when vector underdelivers #1005

Open

igorls merged 2 commits into MemPalace:develop from eldar702:fix/searcher-effective-distance-clamp
