
tool_check_duplicate in mcp_server.py and Layer3.search() in layers.py can return negative similarity scores for very dissimilar content. #978

@shafdev

What happened?

tool_check_duplicate in mcp_server.py and Layer3.search() in layers.py can return negative similarity scores for very dissimilar content.

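Both use round(1 - dist, 3) to convert a ChromaDB cosine distance into a similarity score. With hnsw:space=cosine, ChromaDB distances are in the range [0, 2], not [0, 1]. A distance of exactly 1.0 means the vectors are orthogonal; for very dissimilar content the cosine similarity can be slightly negative, so the distance slightly exceeds 1.0 and 1 - dist becomes negative. (Maximally dissimilar vectors, pointing in opposite directions, would give a distance of 2.0.)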
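To make the [0, 2] range concrete, here is a minimal pure-Python sketch. It assumes cosine distance is defined as 1 minus cosine similarity, which is the convention ChromaDB's cosine space is understood to follow. The function name and the sample vectors are illustrative only and do not come from mempalace.

```python
import math


def cosine_distance(a, b):
    """Illustrative helper: cosine distance as 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Same direction, orthogonal, and opposite vectors span the full range.
identical = cosine_distance([1.0, 0.0], [1.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])

# A slightly negative cosine similarity pushes the distance just past 1.0,
# which is the case that makes an unclamped 1 - dist go negative.
slightly_negative = cosine_distance([1.0, 0.0], [-0.004, 1.0])

for name, dist in [
    ("identical", identical),
    ("orthogonal", orthogonal),
    ("opposite", opposite),
    ("slightly_negative", slightly_negative),
]:
    print(f"{name:18s} distance={dist:.4f}  1 - dist={1 - dist:.4f}")
```

Real text embeddings rarely point in exactly opposite directions, so in practice the overshoot past 1.0 tends to be small, as in the reproduction in this report.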

The rest of the codebase already has the correct pattern: searcher.py line 285 uses round(max(0.0, 1 - dist), 3). The two affected sites are missing the max(0.0, ...) clamp.

Affected lines:

  • mempalace/mcp_server.py, tool_check_duplicate(): similarity = round(1 - dist, 3)
  • mempalace/layers.py, Layer3.search(): similarity = round(1 - dist, 3)
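One way the clamp could be applied consistently at both sites is a small shared helper. This is only a sketch: distance_to_similarity is a hypothetical name that does not exist in mempalace, and the body simply reuses the round(max(0.0, 1 - dist), 3) expression quoted from searcher.py.

```python
def distance_to_similarity(dist: float) -> float:
    """Hypothetical helper: map a cosine distance in [0, 2] to a similarity in [0, 1].

    Distances above 1.0 (negative cosine similarity) are floored at 0.0.
    """
    return round(max(0.0, 1 - dist), 3)


# Ordinary case: a distance below 1.0 is converted as before.
print(distance_to_similarity(0.25))

# The distance from this report no longer produces a negative score.
print(distance_to_similarity(1.0035))
```

Flooring at 0.0 discards the distinction between "unrelated" and "opposite" vectors, which seems acceptable here because the score is only reported as a similarity in [0.0, 1.0].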

What did you expect?

Similarity scores should always be in [0.0, 1.0]. A score of -0.004 is meaningless and could confuse AI clients that read or display the value (e.g. tool_check_duplicate returns the similarity in its JSON response, and Layer3 renders it as (sim=-0.004) in the memory context block).

How to reproduce:

  1. Install mempalace. No palace is required; the snippet below uses an in-memory ChromaDB client.

  2. Run:

import chromadb

client = chromadb.Client()
col = client.get_or_create_collection("bug_demo", metadata={"hnsw:space": "cosine"})

col.add(
    ids=["drawer_1"],
    documents=["The Pythagorean theorem states that a^2 + b^2 = c^2 in a right triangle."],
    metadatas=[{"wing": "math", "room": "geometry"}],
)

results = col.query(
    query_texts=["Chocolate cake recipe with vanilla frosting and strawberries."],
    n_results=1,
    include=["distances"],
)

dist = results["distances"][0][0]
print(f"distance  : {dist:.4f}")           # e.g. 1.0035
print(f"similarity: {round(1 - dist, 3)}") # e.g. -0.004  ← negative!
  3. Observe that similarity is negative.

Output:

distance  : 1.0035
similarity: -0.004

Environment:

  • OS: macOS
  • Python version: 3.11.9
  • MemPal version: git SHA 2792ce8 (develop)
