
fix(searcher): clamp effective_distance to valid cosine range [0, 2]#1029

Merged
igorls merged 2 commits intoMemPalace:developfrom
eldar702:fix/searcher-effective-distance-clamp
May 6, 2026


@eldar702
Contributor

Summary

search_memories in mempalace/searcher.py computes
effective_dist = dist - boost at line 411. The boost can be as
large as CLOSET_RANK_BOOSTS[0] == 0.40 when a closet hits at rank 0.
When the raw drawer distance is small (any near-exact match — typical
on short queries with strong semantic overlap), the subtraction goes
negative, violating the cosine-distance invariant [0, 2].

Two downstream effects, observed in the wild:

  1. similarity > 1.0. Line 418:

    "similarity": round(max(0.0, 1 - effective_dist), 3),

    With effective_dist = -0.30 this returns 1.30. The
    max(0.0, ...) only prevents negative similarities; it does not
    cap above 1. API consumers see nonsensical similarity values.

  2. Inverted ranking. Line 427 stores _sort_key = effective_dist;
    line 435 sorts scored ascending. A negative _sort_key drops
    below ordinary positive distances, so the strongest hybrid hit
    lands last — the opposite of what hybrid retrieval is supposed to
    deliver.
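
The arithmetic behind the first effect can be reproduced outside the codebase. The sketch below is not the code from `mempalace/searcher.py`: `unclamped_hit` is a hypothetical stand-in, the dictionary layout is an assumption, and the raw distance 0.08 is borrowed from the test fixture described under the test plan. Only the `dist - boost` subtraction, the `similarity` expression, and the 0.40 rank-0 boost come from the description above.

```python
# Only the rank-0 boost value is stated in the PR; other ranks are unknown.
CLOSET_RANK_BOOSTS = [0.40]


def unclamped_hit(dist: float, boost: float) -> dict:
    """Hypothetical stand-in for the pre-fix scoring arithmetic."""
    effective_dist = dist - boost  # can go negative when boost > dist
    return {
        "effective_distance": effective_dist,
        # max(0.0, ...) guards the lower bound only; nothing caps at 1.0.
        "similarity": round(max(0.0, 1 - effective_dist), 3),
    }


hit = unclamped_hit(dist=0.08, boost=CLOSET_RANK_BOOSTS[0])
print(hit["effective_distance"] < 0.0)  # the distance left the [0, 2] range
print(hit["similarity"] > 1.0)          # the similarity left the [0, 1] range
```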

Change

One-line clamp to the valid cosine-distance range:

```diff
-    effective_dist = dist - boost
+    # Clamp to the valid cosine-distance range [0, 2]. When a strong
+    # closet boost (up to 0.40) exceeds the raw distance, the subtraction
+    # can go negative — which (a) yields ``similarity > 1.0`` downstream
+    # and (b) makes the sort key land *below* ordinary positive distances,
+    # inverting the ranking so the best hybrid matches sort last.
+    effective_dist = max(0.0, min(2.0, dist - boost))
```

The boost still wins (closet-backed hits still rank first); it just no
longer flips the order or returns out-of-range values.
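
As a rough illustration of the clamped behaviour, the sketch below applies the same `max(0.0, min(2.0, dist - boost))` expression to two made-up hits and sorts them ascending by the clamped value. The helper name `clamp_effective_distance`, the `source` field, and the file names are assumptions for illustration; the distances 0.08 and 0.35 are taken from the test fixture.

```python
def clamp_effective_distance(dist: float, boost: float) -> float:
    """Apply the boost, then clamp to the cosine-distance range [0, 2]."""
    return max(0.0, min(2.0, dist - boost))


# Hypothetical hits: one closet-boosted near-exact match, one plain match.
hits = [
    {"source": "plain.md", "dist": 0.35, "boost": 0.0},
    {"source": "boosted.md", "dist": 0.08, "boost": 0.40},
]

for h in hits:
    h["effective_distance"] = clamp_effective_distance(h["dist"], h["boost"])
    h["similarity"] = round(max(0.0, 1 - h["effective_distance"]), 3)

# Ascending sort on the clamped distance: smaller means a better match.
ranked = sorted(hits, key=lambda h: h["effective_distance"])
print([h["source"] for h in ranked])
```

One consequence visible in this sketch: every hit whose boost exceeds its raw distance collapses to exactly 0.0, so such hits tie on the sort key. Python's `sorted` is stable, which means tied hits keep their input order.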

Test plan

New test TestSearchMemories::test_effective_distance_clamped_to_valid_cosine_range
mocks get_collection + get_closets_collection to return a
low-distance drawer (0.08) with a strong closet match (rank 0,
boost 0.40). Asserts:

  • every hit has 0.0 <= similarity <= 1.0
  • every hit has 0.0 <= effective_distance <= 2.0
  • the closet-boosted source still ranks first

Full searcher + hybrid suite: 27 passed in 102.24s.
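
The three assertions listed above might be expressed roughly as follows. This is a sketch, not the test added in this PR: the real test mocks `get_collection` and `get_closets_collection`, whereas this version checks a hand-built list of hits. The helper `assert_hits_valid`, the `source` key, and the sample values are hypothetical.

```python
def assert_hits_valid(hits: list[dict], boosted_source: str) -> None:
    """Check the range invariants and that the boosted source ranks first."""
    assert hits, "expected at least one hit"
    for hit in hits:
        assert 0.0 <= hit["similarity"] <= 1.0, hit
        assert 0.0 <= hit["effective_distance"] <= 2.0, hit
    assert hits[0]["source"] == boosted_source, hits[0]


# Hand-built stand-in for the result list; values are illustrative only.
sample_hits = [
    {"source": "boosted.md", "similarity": 1.0, "effective_distance": 0.0},
    {"source": "plain.md", "similarity": 0.65, "effective_distance": 0.35},
]
assert_hits_valid(sample_hits, boosted_source="boosted.md")
```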

Relationship to other open PRs

Why it doesn't introduce new issues

| Plausible regression | Prevention |
| --- | --- |
| Clamping to [0, 2] caps the boost effect at dist. | Deliberate: the boost was over-tuned (0.40 vs typical distances 0.3–0.7). A cap still gives the boosted hit the best position without flipping the order. |
| Downstream consumers might expect effective_distance in the old range. | Old range was (-∞, 2]; new range is [0, 2], a strict subset. No downstream consumer is broken by narrower bounds. |

@igorls igorls added bug Something isn't working area/search Search and retrieval labels Apr 24, 2026
@igorls igorls added this to the v3.3.5 milestone May 2, 2026
eldar702 and others added 2 commits May 6, 2026 02:19
``search_memories`` computes ``effective_dist = dist - boost`` where
``boost`` can be as large as ``CLOSET_RANK_BOOSTS[0] == 0.40`` for a
rank-0 closet hit. When the raw drawer distance is small — any
near-exact match — the subtraction goes negative.

Two downstream effects:

1. Line 418 returns ``round(max(0.0, 1 - effective_dist), 3)`` as
   ``similarity``. With ``effective_dist = -0.30`` that yields
   ``similarity = 1.30``, outside the documented ``[0, 1]`` range.
   The ``max(0.0, ...)`` only prevents negative similarities; it does
   not cap above 1.
2. Line 427 stores ``_sort_key: effective_dist`` and line 435 sorts
   ``scored`` ascending by that key. A negative key drops *below* the
   rest, so the strongest hybrid matches end up sorting after weaker
   ones — ranking inversion under the exact conditions hybrid retrieval
   is supposed to serve best.

Clamp ``effective_dist`` to the valid cosine-distance range ``[0, 2]``.
The boost still wins (closet-backed hit still ranks first), it just no
longer flips the order.

Test added: mock drawer_col (base dist 0.08 / 0.35 for two sources) +
closet_col (rank-0 closet for the 0.08 source) → assert all hits have
``0 <= similarity <= 1`` and ``0 <= effective_distance <= 2``, and that
the closet-boosted source still ranks first.

Relationship to other PRs:

* **MemPalace#988** clamps the output ``similarity`` alone. That does not fix
  the sort-key inversion or the invalid ``effective_distance`` in the
  returned dict. This PR clamps at the arithmetic source so both
  downstream users of the value stay in range.
* Orthogonal to **MemPalace#979** (``tool_check_duplicate`` negative similarity).
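
To make the difference between the two clamping points concrete, here is a minimal sketch. It does not reproduce the code of MemPalace#988; `score_output_clamp_only` is a hypothetical rendering of "clamp the reported similarity only", and `score_source_clamp` applies the clamp from this PR. Both function names and the dictionary layout are assumptions.

```python
def score_output_clamp_only(dist: float, boost: float) -> dict:
    """Hypothetical: cap the reported similarity, leave the distance as is."""
    effective_dist = dist - boost
    return {
        "effective_distance": effective_dist,
        "similarity": round(min(1.0, max(0.0, 1 - effective_dist)), 3),
    }


def score_source_clamp(dist: float, boost: float) -> dict:
    """Clamp at the subtraction, so every later use sees a valid distance."""
    effective_dist = max(0.0, min(2.0, dist - boost))
    return {
        "effective_distance": effective_dist,
        "similarity": round(max(0.0, 1 - effective_dist), 3),
    }


output_only = score_output_clamp_only(0.08, 0.40)
at_source = score_source_clamp(0.08, 0.40)
print(output_only["effective_distance"] < 0.0)  # distance is still negative
print(at_source["effective_distance"])          # distance is back in range
```

In both variants the reported similarity stays within `[0, 1]`; only the source clamp also keeps the returned `effective_distance` inside `[0, 2]`.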
@igorls igorls force-pushed the fix/searcher-effective-distance-clamp branch from ee88b29 to aac8437 Compare May 6, 2026 05:19
@igorls igorls merged commit f4617b3 into MemPalace:develop May 6, 2026
6 checks passed