Conversation

@jpountz (Contributor) commented May 23, 2025

Existing vectorization of scores is a bit fragile since it relies on `SimScorer#score` being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world, where more implementations of `SimScorer` may be used, in particular those from `FeatureField`.

Furthermore, existing vectorization has some room for improvement, as @gf2121 highlighted at #14679 (comment).

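To make the fragility concrete, here is a minimal sketch (illustrative names and formula only, not Lucene's actual API) of the bulk-scoring idea: a bulk method pays one virtual dispatch per buffer rather than one per document, so the hot loop no longer depends on the JIT inlining `score()` at every call site and can be vectorized inside the implementation itself.

```java
public class Main {
    // Illustrative sketch, not Lucene code.
    abstract static class SimScorerSketch {
        abstract float score(float freq, long norm);

        // Bulk variant: one virtual dispatch per buffer instead of per doc.
        // Subclasses can override this so the loop body stays monomorphic
        // and auto-vectorizable regardless of how many impls are loaded.
        void scoreRange(float[] freqs, long[] norms, float[] scores, int count) {
            for (int i = 0; i < count; i++) {
                scores[i] = score(freqs[i], norms[i]);
            }
        }
    }

    static class InverseNormScorer extends SimScorerSketch {
        @Override
        float score(float freq, long norm) {
            return freq / (1 + norm); // made-up formula, for illustration only
        }
    }

    public static void main(String[] args) {
        float[] freqs = {2f, 6f};
        long[] norms = {1L, 2L};
        float[] scores = new float[2];
        new InverseNormScorer().scoreRange(freqs, norms, scores, 2);
        System.out.println(scores[0] + " " + scores[1]); // prints "1.0 2.0"
    }
}
```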
@github-actions (Contributor) commented

This PR does not have an entry in `lucene/CHANGES.txt`. Consider adding one. If the PR doesn't need a changelog entry, add the `skip-changelog-check` label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz jpountz requested review from gf2121 and rmuir May 23, 2025 13:33
* Bulk computation of scores. For each entry in the given {@code buffer}, compute the score of
* the document and set its value in the {@code scores} array at the same index.
*
* <p><b>NOTE</b>: Doc IDs must be sorted, with no duplicates.
Contributor:
Should we override it in `AssertingSimilarity` to check these limitations?

  }
} else {
  if (normInverses == null || normInverses.length < buffer.size) {
    normInverses = new float[ArrayUtil.oversize(buffer.size, Float.BYTES)];
Contributor:
Do we need this new array, or can we just reuse `scores`?

Contributor Author (@jpountz):

Ohhhh, nice idea.
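The suggestion can be sketched as a hypothetical two-pass loop (illustrative only, not the actual patch): the output `scores` array doubles as scratch space for the decoded norm inverses, so no separate `normInverses` allocation is needed.

```java
public class Main {
    // Hypothetical sketch of reusing the output array as scratch space.
    // The decode formula and weight factor are made up for illustration.
    static void bulkScore(long[] norms, float[] freqs, float[] scores,
                          int count, float weight) {
        // Pass 1: stash each norm inverse directly in scores[].
        for (int i = 0; i < count; i++) {
            scores[i] = 1f / (1 + norms[i]);
        }
        // Pass 2: overwrite each slot in place with the final score.
        for (int i = 0; i < count; i++) {
            scores[i] = weight * freqs[i] * scores[i];
        }
    }

    public static void main(String[] args) {
        float[] scores = new float[2];
        bulkScore(new long[] {1L, 3L}, new float[] {2f, 4f}, scores, 2, 3f);
        System.out.println(scores[0] + " " + scores[1]); // prints "3.0 3.0"
    }
}
```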

@jpountz (Author) commented May 23, 2025

Great feedback once again, @gf2121!

@jpountz (Author) commented May 23, 2025

Here is what luceneutil has to say for exhaustive evaluation (`totalHitsThreshold=Integer.MAX_VALUE`):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                            Term       74.75      (2.1%)       73.14      (4.2%)   -2.2% (  -8% -    4%) 0.073
                        PKLookup      318.63      (4.8%)      315.28      (5.8%)   -1.1% ( -11% -    9%) 0.586
                IntervalsOrdered        2.31      (4.0%)        2.29      (3.1%)   -0.7% (  -7% -    6%) 0.606
                        Wildcard       18.53      (6.3%)       18.45      (7.8%)   -0.4% ( -13% -   14%) 0.867
                CountAndHighHigh      359.81      (1.5%)      358.64      (1.7%)   -0.3% (  -3% -    2%) 0.572
                   TermMonthSort     3170.36      (2.9%)     3160.30      (2.4%)   -0.3% (  -5% -    5%) 0.746
                 CountAndHighMed      299.33      (1.5%)      298.56      (1.4%)   -0.3% (  -3% -    2%) 0.617
          CountFilteredOrHighMed      148.00      (0.7%)      147.73      (0.7%)   -0.2% (  -1% -    1%) 0.474
                 CountOrHighHigh      347.23      (1.6%)      346.72      (1.5%)   -0.1% (  -3% -    3%) 0.794
                      OrHighRare       21.29      (2.7%)       21.26      (3.7%)   -0.1% (  -6% -    6%) 0.908
              CombinedOrHighHigh        6.25      (3.7%)        6.24      (3.0%)   -0.1% (  -6% -    6%) 0.914
               CombinedOrHighMed        9.60      (3.6%)        9.59      (3.0%)   -0.1% (  -6% -    6%) 0.925
                 DismaxOrHighMed       10.60      (2.4%)       10.60      (3.0%)   -0.1% (  -5% -    5%) 0.934
                          Fuzzy1       96.54      (2.4%)       96.46      (1.7%)   -0.1% (  -4% -    4%) 0.918
                         Prefix3       14.98      (7.1%)       14.97      (8.1%)   -0.1% ( -14% -   16%) 0.985
         CountFilteredOrHighHigh      136.46      (0.9%)      136.43      (0.6%)   -0.0% (  -1% -    1%) 0.938
                    CombinedTerm       31.89      (3.0%)       31.88      (3.5%)   -0.0% (  -6% -    6%) 0.993
             CountFilteredOrMany       27.57      (1.6%)       27.56      (1.5%)   -0.0% (  -3% -    3%) 0.994
                          Fuzzy2       87.61      (2.4%)       87.66      (1.7%)    0.1% (  -3% -    4%) 0.947
                DismaxOrHighHigh        6.90      (2.2%)        6.90      (2.9%)    0.1% (  -4% -    5%) 0.929
                     CountOrMany       30.40      (1.6%)       30.43      (2.0%)    0.1% (  -3% -    3%) 0.889
                       OrHighMed       29.23      (4.9%)       29.26      (6.8%)    0.1% ( -11% -   12%) 0.961
                       CountTerm     8418.37      (3.6%)     8429.79      (4.0%)    0.1% (  -7% -    8%) 0.922
              FilteredAndHighMed      138.18      (1.5%)      138.39      (1.2%)    0.2% (  -2% -    2%) 0.749
                 FilteredPrefix3       29.68      (3.7%)       29.73      (3.1%)    0.2% (  -6% -    7%) 0.902
                      DismaxTerm       44.65      (4.6%)       44.73      (6.1%)    0.2% ( -10% -   11%) 0.931
                FilteredOr3Terms       52.80      (2.4%)       52.89      (2.7%)    0.2% (  -4% -    5%) 0.848
                    FilteredTerm      112.37      (1.5%)      112.62      (0.9%)    0.2% (  -2% -    2%) 0.628
               TermDayOfYearSort      285.08      (4.0%)      285.98      (3.0%)    0.3% (  -6% -    7%) 0.808
     FilteredAnd2Terms2StopWords      194.14      (1.4%)      194.86      (1.1%)    0.4% (  -2% -    2%) 0.425
               FilteredAnd3Terms      221.69      (1.0%)      222.51      (0.9%)    0.4% (  -1% -    2%) 0.280
                          Phrase        3.04      (2.3%)        3.05      (1.4%)    0.4% (  -3% -    4%) 0.558
                       And3Terms      147.00      (4.0%)      147.60      (2.0%)    0.4% (  -5% -    6%) 0.727
             FilteredAndHighHigh       71.07      (2.9%)       71.36      (1.2%)    0.4% (  -3% -    4%) 0.614
                        Or3Terms       40.07      (3.7%)       40.24      (5.8%)    0.4% (  -8% -   10%) 0.806
                     CountPhrase        4.18      (2.8%)        4.19      (1.6%)    0.4% (  -3% -    4%) 0.586
                  FilteredOrMany        5.74      (3.9%)        5.77      (2.4%)    0.5% (  -5% -    7%) 0.651
                  CountOrHighMed      336.79      (2.5%)      338.77      (2.1%)    0.6% (  -3% -    5%) 0.488
                  FilteredPhrase       23.17      (1.6%)       23.31      (1.1%)    0.6% (  -2% -    3%) 0.227
             CountFilteredPhrase       25.59      (1.6%)       25.75      (1.1%)    0.6% (  -2% -    3%) 0.224
               FilteredOrHighMed       56.17      (2.9%)       56.52      (2.6%)    0.6% (  -4% -    6%) 0.543
              FilteredOrHighHigh       47.21      (2.9%)       47.51      (2.4%)    0.6% (  -4% -    6%) 0.520
            FilteredAndStopWords       48.92      (3.9%)       49.33      (1.7%)    0.8% (  -4% -    6%) 0.438
              CombinedAndHighMed       41.62      (3.5%)       42.00      (2.0%)    0.9% (  -4% -    6%) 0.379
             CombinedAndHighHigh       13.04      (3.5%)       13.16      (2.1%)    0.9% (  -4% -    6%) 0.376
             FilteredOrStopWords       29.17      (3.3%)       29.44      (2.1%)    0.9% (  -4% -    6%) 0.353
                AndMedOrHighHigh       42.45      (4.0%)       42.87      (2.9%)    1.0% (  -5% -    8%) 0.443
                      OrHighHigh       20.07      (5.2%)       20.27      (7.1%)    1.0% ( -10% -   14%) 0.663
                      AndHighMed       90.98      (6.5%)       91.91      (2.6%)    1.0% (  -7% -   10%) 0.572
      FilteredOr2Terms2StopWords       26.93      (3.4%)       27.21      (2.1%)    1.0% (  -4% -    6%) 0.312
             And2Terms2StopWords       85.26      (5.6%)       86.21      (2.6%)    1.1% (  -6% -    9%) 0.487
                 AndHighOrMedMed       33.56      (3.2%)       33.95      (3.1%)    1.1% (  -4% -    7%) 0.315
                  FilteredIntNRQ       98.09      (2.4%)       99.37      (2.4%)    1.3% (  -3% -    6%) 0.136
                    AndStopWords       15.79      (9.8%)       16.04      (4.8%)    1.6% ( -11% -   17%) 0.579
                      TermDTSort      377.84      (3.4%)      384.79      (4.1%)    1.8% (  -5% -    9%) 0.181
                     AndHighHigh       26.00      (8.6%)       26.54      (3.4%)    2.1% (  -9% -   15%) 0.387
                   TermTitleSort       80.54      (5.4%)       82.55      (6.5%)    2.5% (  -8% -   15%) 0.252
              Or2Terms2StopWords        9.03      (5.6%)        9.26      (7.1%)    2.6% (  -9% -   16%) 0.273
                     OrStopWords       10.27      (5.5%)       10.55      (7.1%)    2.7% (  -9% -   16%) 0.245
                          IntNRQ       27.23      (5.3%)       28.07     (12.0%)    3.1% ( -13% -   21%) 0.367
                          OrMany        3.07      (5.4%)        3.24      (6.1%)    5.4% (  -5% -   17%) 0.010

OrMany gets a ~5.5% speedup. Other queries don't seem to get a significant speedup, though seeing Or2Terms2StopWords and OrStopWords next to the bottom of the list is likely not a coincidence.

this.weight = boost * idf.getValue().floatValue();
}

private float score(float freq, float normInverse) {
Member:

Consider the name? It is too dangerous to have both these signatures, even though `float` and `long` have different sizes:

float score(float, float);
float score(float, long);
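To illustrate the hazard (a standalone demo, not Lucene code): with both signatures present, Java's overload resolution silently favors `(float, long)` for an `int` argument, because `long` widens to `float` and the `long` overload is therefore the more specific one.

```java
public class Main {
    static String score(float freq, float normInverse) { return "float overload"; }
    static String score(float freq, long norm) { return "long overload"; }

    public static void main(String[] args) {
        // An int argument matches both overloads by primitive widening; Java
        // picks the more specific method, and since long widens to float,
        // (float, long) wins without any warning.
        System.out.println(score(1f, 2));   // prints "long overload"
        System.out.println(score(1f, 2f));  // prints "float overload"
    }
}
```

A caller passing a raw norm where a precomputed inverse was expected (or vice versa) would thus compile cleanly and silently hit the wrong method.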

@rmuir (Member) commented May 23, 2025

Before adding more complexity to scores, can we resolve the current problem with scoring in Jenkins? https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/3326/

java.lang.AssertionError: Doc 1 scores don't match
TopDocs totalHits=6 hits top=6
	0) doc=1	score=1.8358624
	1) doc=4	score=1.8358624
	2) doc=7	score=1.8358624
	3) doc=3	score=1.7754854
	4) doc=0	score=1.6846089
	5) doc=6	score=0.67934644
TopDocs totalHits=4 hits top=4
	0) doc=1	score=1.8358622
	1) doc=4	score=1.8358622
	2) doc=7	score=1.8358622
	3) doc=3	score=1.7754854
for query:(data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2 expected:<1.8358622> but was:<1.8358624>
...
NOTE: reproduce with: gradlew test --tests TestBooleanMinShouldMatch.testRandomQueries -Dtests.seed=E2469ED1547BF96C -Dtests.multiplier=3 -Dtests.directory=MMapDirectory -Dtests.locale=tok-001 -Dtests.timezone=Asia/Kuching -Dtests.asserts=true -Dtests.file.encoding=UTF-8

This failure has only surfaced recently, so I think it would be good to avoid stacking more changes to scoring on top of it.

@jpountz (Author) commented May 26, 2025

I was looking at it too, as I was worried it might have been caused by recent optimizations, but it looks like an old and rare failure. I opened a tentative test fix at #14715.

@jpountz jpountz merged commit abe18a4 into apache:main Jun 10, 2025
7 checks passed
@jpountz jpountz deleted the better_vectorize_score_computations branch June 10, 2025 14:13
jpountz added a commit that referenced this pull request Jun 10, 2025
jpountz added a commit that referenced this pull request Jun 10, 2025
jpountz added a commit that referenced this pull request Jun 10, 2025
@jpountz (Author) commented Jun 10, 2025

I had to revert this PR; it's not thread-safe: I thought `SimScorer`s were created per `Scorer`, but they are actually created per `Weight` and may be used across multiple threads.
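A minimal sketch of the hazard (illustrative, not the actual patch): if the scratch buffer lives on an object created once per `Weight`, two search threads can resize and fill it concurrently; keeping the buffer local to each call (or on the per-thread `Scorer`) avoids the race.

```java
public class Main {
    // Racy sketch: one instance per Weight, shared by all search threads.
    static class UnsafeBulkScorer {
        private float[] scratch = new float[0]; // shared mutable state

        float[] scratch(int size) {
            if (scratch.length < size) {
                // Thread A may replace the array while thread B is still
                // writing intermediate values into the old one.
                scratch = new float[size];
            }
            return scratch;
        }
    }

    // Safe sketch: no shared mutable state; each caller owns its buffer.
    static class SafeBulkScorer {
        float[] scratch(int size) {
            return new float[size];
        }
    }

    public static void main(String[] args) {
        System.out.println(new Main.SafeBulkScorer().scratch(4).length); // prints "4"
    }
}
```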

@jpountz jpountz removed this from the 10.3.0 milestone Jun 10, 2025
@jpountz (Author) commented Jun 10, 2025

@mikemccand FYI: if a nightly run started with this change, it may fail due to this thread-safety issue. The commit is now reverted, so restarting should be enough to fix the problem.

gf2121 pushed a commit to gf2121/lucene that referenced this pull request Jul 16, 2025
gf2121 pushed a commit to gf2121/lucene that referenced this pull request Jul 19, 2025