Collection created without hnsw:space=cosine causes negative similarity scores #218

@SethRosenthal100

Bug

Both convo_miner.py and miner.py call create_collection("mempalace_drawers") without specifying a distance metric. ChromaDB defaults to L2, but searcher.py computes similarity as 1 - dist, which assumes cosine distance. This produces negative similarity scores for all results.

Behavior

All search results return negative Match scores (e.g. -0.174, -0.43, -0.707) regardless of query relevance.

Root Cause

Two separate get_collection functions, one in each file:

  • convo_miner.py line 220 — called by mempalace mine --mode convos
  • miner.py line 189 — called by other mine modes

Both fall through to create_collection("mempalace_drawers") with no metadata, so the collection defaults to L2 distance. The scoring formula similarity = round(1 - dist, 3) in searcher.py is only correct for cosine distance, where dist ∈ [0, 2] and 1 - dist is the cosine similarity.
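To illustrate why the sign flips, here is a minimal sketch in plain Python. It assumes the embeddings are unit-normalized and that the L2 distance ChromaDB reports is the squared Euclidean distance; neither is confirmed in this issue. Under those assumptions squared L2 equals 2 * (1 - cos), so 1 - dist turns negative whenever the cosine similarity is below 0.5. The helper names and the example vectors are made up for illustration.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a, b):
    """Squared Euclidean distance (assumed to be what the L2 space reports)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; the plain dot product suffices for unit vectors."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


# Hypothetical query/document pair with cosine similarity 0.413.
query = unit([1.0, 0.0])
doc = unit([0.413, math.sqrt(1 - 0.413 ** 2)])

score_l2 = round(1 - squared_l2(query, doc), 3)        # -0.174
score_cos = round(1 - cosine_distance(query, doc), 3)  # 0.413

print(score_l2, score_cos)
```

If those assumptions hold, a negative score under the L2 default still carries the ranking information; it is just offset and scaled (2 * cos - 1) rather than being the cosine similarity itself.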

Fix

Add metadata={"hnsw:space": "cosine"} to both create_collection calls:

```python
# convo_miner.py line 220
return client.create_collection("mempalace_drawers", metadata={"hnsw:space": "cosine"})

# miner.py line 189
return client.create_collection("mempalace_drawers", metadata={"hnsw:space": "cosine"})
```

After patching, wipe the existing collection and re-mine. Scores become positive and meaningful (top result 0.658 with wing filter, 0.413 global).

Notes

Existing collections are unaffected by the patch — the metric is set at creation time. Users with a palace already mined will need to wipe and re-mine to benefit from the fix.
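For anyone unsure whether their palace needs a re-mine, a check along these lines might help. It is only a sketch: needs_remine is a hypothetical helper, not part of mempalace, and it assumes the creation-time metadata is available as a plain mapping (with chromadb this would typically be collection.metadata).

```python
from typing import Mapping, Optional


def needs_remine(metadata: Optional[Mapping[str, object]]) -> bool:
    """Return True when a collection was not created with cosine distance.

    Hypothetical helper. `metadata` is the mapping passed at creation time;
    None or an empty mapping means the default metric (L2) is in effect.
    """
    if not metadata:
        return True
    return metadata.get("hnsw:space") != "cosine"


# Collections created before the patch carry no metadata.
print(needs_remine(None))                       # True
print(needs_remine({"hnsw:space": "cosine"}))   # False
```

Usage would look something like needs_remine(client.get_collection("mempalace_drawers").metadata), run before deciding whether to wipe the collection.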

Diagnosed and fixed by Claude (claude-sonnet-4-6) via Claude Code.

Labels: area/mining, area/search, bug