fix(searcher): clamp effective_distance to valid cosine range [0, 2] #1029
Merged
igorls merged 2 commits into MemPalace:develop on May 6, 2026
``search_memories`` computes ``effective_dist = dist - boost`` where ``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a rank-0 closet hit. When the raw drawer distance is small (any near-exact match), the subtraction goes negative. Two downstream effects:

1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as ``similarity``. With ``effective_dist = -0.30`` that yields ``similarity = 1.30``, outside the documented ``[0, 1]`` range. The ``max(0.0, ...)`` only prevents negative similarities; it does not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts ``scored`` ascending by that key. A negative key drops *below* the rest, so the strongest hybrid matches end up sorting after weaker ones: ranking inversion under the exact conditions hybrid retrieval is supposed to serve best.

Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``. The boost still wins (closet-backed hit still ranks first), it just no longer flips the order.

Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) + closet_col (rank-0 closet for the 0.08 source) → assert all hits have ``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that the closet-boosted source still ranks first.

Relationship to other PRs:

* **MemPalace#988** clamps the output ``similarity`` alone. That does not fix the sort-key inversion or the invalid ``effective_distance`` in the returned dict. This PR clamps at the arithmetic source so both downstream users of the value stay in range.
* Orthogonal to **MemPalace#979** (``tool_check_duplicate`` negative similarity).
Summary
`search_memories` in `mempalace/searcher.py` computes `effective_dist = dist - boost` at line 411. The `boost` can be as large as `CLOSET_RANK_BOOSTS[0] == 0.40` when a closet hits at rank 0. When the raw drawer distance is small (any near-exact match, typical on short queries with strong semantic overlap), the subtraction goes negative, violating the cosine-distance invariant `[0, 2]`.

Two downstream effects, observed in the wild:

1. `similarity` > 1.0. Line 418 returns `round(max(0.0, 1 - effective_dist), 3)`. With `effective_dist = -0.30` this returns `1.30`. The `max(0.0, ...)` only prevents negative similarities; it does not cap above 1. API consumers see nonsensical similarity values.
2. Inverted ranking. Line 427 stores `_sort_key = effective_dist`; line 435 sorts `scored` ascending. A negative `_sort_key` drops below ordinary positive distances, so the strongest hybrid hit lands last, the opposite of what hybrid retrieval is supposed to deliver.
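The out-of-range similarity can be reproduced in isolation. The sketch below is not the code from `searcher.py`; it only reuses the similarity expression and the `0.40` boost quoted in this description. The wrapper name `similarity_from`, the constant name `MAX_CLOSET_BOOST`, and the sample raw distance `0.10` are assumptions, chosen so that the unclamped `effective_dist` comes out near `-0.30`.

```python
# Standalone illustration; not the actual code in mempalace/searcher.py.
MAX_CLOSET_BOOST = 0.40  # value quoted for CLOSET_RANK_BOOSTS[0]


def similarity_from(effective_dist: float) -> float:
    """Hypothetical wrapper around the similarity expression quoted for line 418."""
    return round(max(0.0, 1 - effective_dist), 3)


raw_dist = 0.10  # assumed near-exact drawer match
effective_dist = raw_dist - MAX_CLOSET_BOOST  # unclamped, roughly -0.30

print(similarity_from(effective_dist))  # exceeds 1.0
print(similarity_from(1.5))             # max(0.0, ...) only floors the low end
```

The `max(0.0, ...)` guard handles large distances but has no counterpart on the high side, which is why a negative distance leaks through as a similarity above 1.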
Change
One-line clamp to the valid cosine-distance range:
```diff
```
The boost still wins (closet-backed hits still rank first); it just no
longer flips the order or returns out-of-range values.
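The merged diff is not reproduced on this page, so the following is only a sketch of what such a clamp could look like, not the actual change. The helper `clamp_effective_distance`, the constants, and the `scored` dict layout are hypothetical; the distances (`0.08`, `0.35`) and the `0.40` boost are the ones quoted in this PR description.

```python
# Hypothetical sketch of the clamp; the merged diff may be written differently.
COSINE_DIST_MIN = 0.0
COSINE_DIST_MAX = 2.0


def clamp_effective_distance(dist: float, boost: float) -> float:
    """Subtract the closet boost, then clamp to the cosine-distance range."""
    return min(COSINE_DIST_MAX, max(COSINE_DIST_MIN, dist - boost))


# (source, raw distance, boost): values quoted in this PR description.
hits = [("plain", 0.35, 0.0), ("closet_backed", 0.08, 0.40)]

scored = []
for source, dist, boost in hits:
    effective = clamp_effective_distance(dist, boost)
    scored.append(
        {
            "source": source,
            "effective_distance": effective,
            "similarity": round(max(0.0, 1 - effective), 3),
        }
    )

# Ascending sort on the clamped distance: smaller distance ranks first.
scored.sort(key=lambda hit: hit["effective_distance"])
```

In this scenario the closet-backed hit clamps to a distance of `0.0` and a similarity of `1.0`, and it still sorts ahead of the unboosted hit.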
Test plan
New test `TestSearchMemories::test_effective_distance_clamped_to_valid_cosine_range` mocks `get_collection` + `get_closets_collection` to return a low-distance drawer (`0.08`) with a strong closet match (rank 0, boost 0.40). Asserts:

- `0.0 <= similarity <= 1.0`
- `0.0 <= effective_distance <= 2.0`

Full searcher + hybrid suite: `27 passed in 102.24s`.

Relationship to other open PRs
- #988 (fix: clamp similarity scores to [0,1] to prevent negative values) clamps `similarity` to `[0, 1]`, which is a partial mitigation: it hides the nonsensical similarity number but leaves `effective_distance` (returned in the result dict) and the sort key in the invalid range. This PR clamps at the arithmetic source, so both downstream users stay in range and ranking is correct. The two PRs can merge in either order; if #988 lands first, the `max(0.0, ...)` guard there becomes redundant but harmless.
- Orthogonal to #979 (`tool_check_duplicate` guards).

Why it doesn't introduce new issues
- Clamping to `[0, 2]` caps the boost effect at `dist`.
- The old range of `effective_distance` is `(-∞, 2]`; the new range is `[0, 2]`, a strict subset. No downstream consumer is broken by narrower bounds.
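The subset argument can be checked mechanically. The sketch below uses a hypothetical `clamp` helper rather than the project's code, and the grid of distances and boosts is an assumption. It counts any clamped result that falls outside `[0, 2]` and any already-valid value that the clamp would alter; both counters should stay at zero.

```python
# Hypothetical helper; stands in for whatever clamp the merged change uses.
def clamp(value: float, low: float = 0.0, high: float = 2.0) -> float:
    return min(high, max(low, value))


# Grid of raw cosine distances in [0, 2] and boosts in [0, 0.40].
distances = [i / 100 for i in range(0, 201)]
boosts = [j / 100 for j in range(0, 41)]

out_of_range = 0      # clamped results outside [0, 2]
changed_in_range = 0  # already-valid inputs altered by the clamp

for dist in distances:
    for boost in boosts:
        raw = dist - boost
        clamped = clamp(raw)
        if not 0.0 <= clamped <= 2.0:
            out_of_range += 1
        if 0.0 <= raw <= 2.0 and clamped != raw:
            changed_in_range += 1

print(out_of_range, changed_in_range)
```

Because the clamp is the identity on values already inside `[0, 2]`, only the previously invalid negative distances change.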