Better vectorize score computations. #15039
Conversation
Existing auto-vectorization of scores is a bit fragile since it relies on `SimScorer#score` being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world where more implementations of `SimScorer` may be used, in particular those from `FeatureField`. Furthermore, existing auto-vectorization has some room for improvement as @gf2121 highlighted at apache#14679 (comment).
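The difference between per-element and bulk scoring can be illustrated with a hedged sketch (`Bm25LikeScorer` and its signatures are made up for illustration, not Lucene's actual `SimScorer` API):

```java
// Illustrative sketch, not the actual Lucene API. With per-element scoring,
// auto-vectorization of the caller's loop requires score() to be inlined,
// which breaks once the call site becomes polymorphic. With bulk scoring, the
// loop lives inside the (possibly specialized) implementation, so it can be
// vectorized regardless of how polymorphic the outer call site is.
class Bm25LikeScorer {
    private final float weight;

    Bm25LikeScorer(float weight) {
        this.weight = weight;
    }

    // Per-element scoring: must be inlined into the caller's loop to vectorize.
    float score(float freq, long norm) {
        return weight * freq / (freq + norm);
    }

    // Bulk scoring: the loop is inside the implementation and can vectorize.
    void scoreBulk(float[] freqs, long[] norms, float[] scores) {
        for (int i = 0; i < freqs.length; i++) {
            scores[i] = weight * freqs[i] / (freqs[i] + norms[i]);
        }
    }
}
```

The bulk variant computes exactly the same scores as the per-element one; only the shape of the hot loop changes.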
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.
gf2121
left a comment
Thank you for sticking with this and fixing the thread-safety issue!
I played with a similar approach around a month ago but did not see a noticeable improvement. Maybe our recent optimizations made the improvement show up.
There are still a number of places where score computations are not vectorized, specifically:
- when computing scores from impacts,
- MaxScore's non-essential clauses,
- all of BlockMaxConjunctionBulkScorer's clauses but the leading clause.
++. I guess `MultiNormsLeafSimScorer#scoreRange`, which is used by CombinedFieldQuery, can be optimized as well.
```java
 *
 * <p><b>NOTE</b>: The returned instance is not thread-safe.
 */
public BulkSimScorer asBulkSimScorer() {
```
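For context, a bridge from a per-element API to a bulk API might look like the following sketch (`SimScorerLike`/`BulkSimScorerLike` are hypothetical names, not the actual Lucene types). The thread-safety note above matters because a real implementation may reuse internal scratch buffers across calls:

```java
// Hypothetical sketch, not the actual Lucene API: a bulk interface plus a
// default method that adapts per-element scoring to it. Specialized
// similarities can override asBulk() with a loop the JIT can vectorize.
@FunctionalInterface
interface BulkSimScorerLike {
    void score(float[] freqs, long[] norms, float[] scores, int len);
}

interface SimScorerLike {
    float score(float freq, long norm);

    // Default fallback: one virtual call per element. A real override may
    // reuse internal buffers, which is why such instances are not thread-safe.
    default BulkSimScorerLike asBulk() {
        return (freqs, norms, scores, len) -> {
            for (int i = 0; i < len; i++) {
                scores[i] = score(freqs[i], norms[i]);
            }
        };
    }
}
```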
The change to AssertingSimilarity is not included in this patch; is this intended?
No, it's a miss!
Because nightly benchmarks only test a small set of scenarios, the JVM may end up over-optimizing query evaluation. For instance, it only runs with BM25Similarity, sorting tasks only run against a TermQuery, filtered vector search only exercises the approximate path, not the exact path, etc. This tries to make the benchmark more realistic by running some cheap queries before running benchmarks, whose goal is to pollute call sites so that they are not all magically monomorphic. This will translate into a drop in performance for some tasks, but hopefully we can recover some of it in the future. Related PRs:
- apache/lucene#14968, where we suspected the speedup was due to specialization making a call site monomorphic in nightly benchmarks that would not be monomorphic in the real world,
- apache/lucene#15039, where we are trying to improve behavior with several different similarity impls but the benchmarks only show a small improvement since they always run with BM25Similarity.
I ran experiments locally that suggest that some of the performance decrease from type pollution (mikemccand/luceneutil#436) can be attributed to calls to `SimScorer#score` no longer being inlinable since they are polymorphic. This change helps `BM25Scorer` remain inlinable using similar tricks that we are applying for `Bits#get` and `ImpactsEnum#nextDoc`/`ImpactsEnum#advance`. Hopefully changes such as apache#15039 will help improve performance with other similarities as well in the future.
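The call-site trick mentioned above can be sketched roughly like this (names are hypothetical; Lucene's actual code for `Bits#get` and BM25 differs): check once for the common concrete type and run the hot loop against a variable of that exact type, so the loop keeps a monomorphic, inlinable call even when the interface's call sites are polymorphic elsewhere in the application.

```java
interface ScorerLike {
    float score(float freq);
}

// Stand-in for the common implementation (think BM25Scorer).
final class CommonScorer implements ScorerLike {
    @Override
    public float score(float freq) {
        return freq / (freq + 1f);
    }
}

class ScoreLoop {
    static void scoreAll(ScorerLike scorer, float[] freqs, float[] out) {
        if (scorer instanceof CommonScorer common) {
            // Exact type known here: score() can be inlined and the loop
            // auto-vectorized, regardless of other ScorerLike impls in use.
            for (int i = 0; i < freqs.length; i++) {
                out[i] = common.score(freqs[i]);
            }
        } else {
            // Generic fallback: virtual dispatch, may not inline.
            for (int i = 0; i < freqs.length; i++) {
                out[i] = scorer.score(freqs[i]);
            }
        }
    }
}
```

The cost is one type check per loop, not per element, which is why this pattern pays off on hot scoring loops.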
I pushed an annotation; this helped more than what my local benchmarking reported!
In #15039 we introduced a bulk `SimScorer#score` API and used it to compute scores with the leading conjunctive clause and "essential" clauses of disjunctive queries. With this PR, we are now also using this bulk API when translating (term frequency, length normalization factor) pairs into the maximum possible score that a block of postings may produce. To do it right, I had to change the impacts API to no longer return a List of (term freq, norm) pairs, but instead two parallel arrays of term frequencies and norms that could (almost) directly be passed to the `SimScorer#score` bulk API. Unfortunately this makes the change quite big since many backward formats had to be touched. Co-authored-by: Guo Feng <[email protected]>
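The parallel-array layout described above can be illustrated with a simplified sketch (the `ImpactPair` record and field names here are illustrative, not the actual Lucene impacts API):

```java
import java.util.List;

// Simplified stand-in for an impact: a (term frequency, norm) pair.
record ImpactPair(int freq, long norm) {}

// Instead of exposing List<ImpactPair>, expose two parallel arrays that can
// be handed (almost) directly to a bulk score method, with no per-element
// object access in the hot path.
class ImpactArrays {
    final int[] freqs;
    final long[] norms;

    ImpactArrays(List<ImpactPair> impacts) {
        freqs = new int[impacts.size()];
        norms = new long[impacts.size()];
        for (int i = 0; i < impacts.size(); i++) {
            freqs[i] = impacts.get(i).freq();
            norms[i] = impacts.get(i).norm();
        }
    }
}
```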