
[Bug]: bm25RankToScore always returns the same value #5767

@papago2355


Summary

bm25RankToScore incorrectly normalized SQLite FTS5 BM25 scores by clamping negative values to zero.
Because FTS5's bm25() returns negative values, where more negative means more relevant, every valid BM25 score collapsed to the same normalized score of 1.0. This discarded all BM25 relevance information in hybrid search.


Steps to reproduce

  1. Use SQLite FTS5 and query with bm25() as the ranking function.

  2. Pass the raw bm25() value (which is typically negative) into bm25RankToScore.

  3. Observe the returned normalized score.

Example:

bm25RankToScore(-4.2)  // => 1.0
bm25RankToScore(-2.1)  // => 1.0
bm25RankToScore(-0.5)  // => 1.0
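To make the failure concrete, here is a minimal TypeScript reconstruction of the pre-fix normalization. It is based only on this issue's own description of the logic (Math.max(0, rank) followed by 1 / (1 + normalized)); the name bm25RankToScoreBuggy is hypothetical, and the project's real source may differ in details.

```typescript
// Hypothetical reconstruction of the pre-fix logic, based only on this issue's
// "Actual behavior" description; not copied from the project's source.
function bm25RankToScoreBuggy(rank: number): number {
  // FTS5 bm25() values are typically negative, so this clamp maps them all to 0.
  const normalized = Math.max(0, rank);
  // 1 / (1 + 0) is always 1, regardless of how relevant the match was.
  return 1 / (1 + normalized);
}

for (const rank of [-4.2, -2.1, -0.5]) {
  console.log(rank, bm25RankToScoreBuggy(rank)); // prints the rank followed by 1
}
```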

Expected behavior

BM25 normalization should preserve relative relevance, such that:

- More negative (better) BM25 scores result in higher normalized scores
- Less negative (worse) BM25 scores result in lower normalized scores
- Different BM25 values produce different normalized outputs

This allows BM25 relevance to meaningfully contribute to hybrid (vector + keyword) ranking.
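One way to meet these expectations, sketched under assumptions: negate the raw bm25() value so that better matches become larger, then squash it into [0, 1) with x / (1 + x). This is an illustrative sketch, not necessarily the change made in the fixing PR; bm25RankToScoreSketch is a hypothetical name.

```typescript
// Hypothetical order-preserving normalization; an illustrative sketch, not
// necessarily the fix that was merged.
function bm25RankToScoreSketch(rank: number): number {
  // FTS5 bm25() gives better matches numerically lower (more negative) values,
  // so negate it: -4.2 becomes 4.2. Unexpected positive ranks are clamped to 0.
  const relevance = Math.max(0, -rank);
  // relevance / (1 + relevance) increases with relevance and stays in [0, 1).
  return relevance / (1 + relevance);
}

const scores = [-4.2, -2.1, -0.5].map(bm25RankToScoreSketch);
console.log(scores.map((s) => s.toFixed(3))); // roughly 0.808, 0.677, 0.333
```

The x / (1 + x) form equals 1 - 1 / (1 + x), so it keeps the bounded shape of the original 1 / (1 + normalized) while increasing with relevance. Per-query min-max scaling would be another reasonable choice; the issue does not prescribe one.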


Actual behavior

All negative BM25 values were clamped to 0 via Math.max(0, rank), resulting in:

normalized = 0
score = 1 / (1 + 0) = 1.0

As a result:

- Every keyword match received the same textScore
- BM25 relevance differences were completely ignored
- Hybrid ranking effectively treated keyword search as a boolean (match / no match) signal
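The last point can be illustrated with a toy hybrid scorer. Everything here is an assumption for illustration: the weighted-sum fusion, the 0.7 / 0.3 weights, the Candidate fields, and the sample scores are hypothetical and not taken from the project. With the clamped normalization every keyword hit contributes the same textScore, so the vector score alone decides the order; with a sign-flipped normalization, a much stronger keyword match can outrank a slightly better vector match.

```typescript
// Hypothetical hybrid scorer: weights, fields, and data are illustrative only.
interface Candidate {
  id: string;
  vectorScore: number; // e.g. a cosine similarity in [0, 1]
  bm25Rank: number;    // raw FTS5 bm25() value (more negative = better)
}

const VECTOR_WEIGHT = 0.7;
const TEXT_WEIGHT = 0.3;

// Weighted sum of vector and keyword scores, sorted best-first; returns ids.
function hybridRank(items: Candidate[], toTextScore: (rank: number) => number): string[] {
  return items
    .map((c) => ({ id: c.id, score: VECTOR_WEIGHT * c.vectorScore + TEXT_WEIGHT * toTextScore(c.bm25Rank) }))
    .sort((x, y) => y.score - x.score)
    .map((c) => c.id);
}

// Pre-fix behavior as described in this issue: every negative rank scores 1.
const clampToScore = (rank: number): number => 1 / (1 + Math.max(0, rank));
// Sign-flipped alternative: more negative rank gives a higher score in [0, 1).
const flipToScore = (rank: number): number => {
  const r = Math.max(0, -rank);
  return r / (1 + r);
};

const candidates: Candidate[] = [
  { id: "strong-keyword", vectorScore: 0.6, bm25Rank: -8.0 },
  { id: "weak-keyword", vectorScore: 0.65, bm25Rank: -0.2 },
];

console.log(hybridRank(candidates, clampToScore)); // weak-keyword first: vector score decides
console.log(hybridRank(candidates, flipToScore));  // strong-keyword first: BM25 now matters
```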


Logs or screenshots

N/A
(Behavior can be reproduced deterministically with direct calls to bm25RankToScore using negative BM25 values.)


Fixes #5214 (by PR).

Labels: bug (Something isn't working)

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions