Skip to content

Conversation

@jpountz
Copy link
Contributor

@jpountz jpountz commented Aug 8, 2025

Existing auto-vectorization of scores is a bit fragile since it relies on SimScorer#score being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world where more implementations of SimScorer may be used, in particular those from FeatureField.

Furthermore, existing auto-vectorization has some room for improvement as @gf2121 highlighted at
#14679 (comment).

Existing auto-vectorization of scores is a bit fragile since it relies on
`SimScorer#score` being inlined in the for loops where it is called. This is
currently the case in nightly benchmarks, but may not be the case in the real
world where more implementations of `SimScorer` may be used, in particular
those from `FeatureField`.

Furthermore, existing auto-vectorization has some room for improvement as
@gf2121 highlighted at
apache#14679 (comment).
@github-actions
Copy link
Contributor

github-actions bot commented Aug 8, 2025

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz
Copy link
Contributor Author

jpountz commented Aug 8, 2025

Even though SimScorer#score is monomorphic when running luceneutil, this change still yields a speedup.

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                   TermTitleSort       84.03      (9.4%)       82.36      (4.7%)   -2.0% ( -14% -   13%) 0.479
                    CombinedTerm       39.33      (0.8%)       39.03      (0.7%)   -0.8% (  -2% -    0%) 0.005
             FilteredAndHighHigh       78.80      (1.7%)       78.22      (1.7%)   -0.7% (  -4% -    2%) 0.254
            FilteredAndStopWords       65.21      (1.8%)       64.75      (2.0%)   -0.7% (  -4% -    3%) 0.328
                      TermDTSort      384.93      (3.0%)      383.03      (2.8%)   -0.5% (  -6% -    5%) 0.652
                  FilteredPhrase       32.27      (1.1%)       32.17      (1.2%)   -0.3% (  -2% -    2%) 0.485
                       CountTerm     9292.09      (3.8%)     9271.80      (2.2%)   -0.2% (  -6% -    6%) 0.853
               FilteredAnd3Terms      191.13      (0.7%)      190.90      (0.8%)   -0.1% (  -1% -    1%) 0.649
                 CountAndHighMed      310.74      (1.4%)      310.39      (1.4%)   -0.1% (  -2% -    2%) 0.832
                CountAndHighHigh      360.15      (2.1%)      359.87      (1.9%)   -0.1% (  -3% -    3%) 0.918
                FilteredOr3Terms      167.20      (1.0%)      167.09      (0.8%)   -0.1% (  -1% -    1%) 0.843
                     CountPhrase        4.26      (1.2%)        4.26      (1.6%)   -0.0% (  -2% -    2%) 0.964
         CountFilteredOrHighHigh      137.25      (0.9%)      137.24      (0.7%)   -0.0% (  -1% -    1%) 0.998
     FilteredAnd2Terms2StopWords      215.45      (0.9%)      215.45      (0.8%)   -0.0% (  -1% -    1%) 0.999
             CombinedAndHighHigh       23.39      (1.1%)       23.40      (0.9%)    0.0% (  -1% -    2%) 0.924
              CombinedOrHighHigh       23.19      (1.4%)       23.21      (0.9%)    0.1% (  -2% -    2%) 0.883
          CountFilteredOrHighMed      149.14      (0.7%)      149.25      (0.6%)    0.1% (  -1% -    1%) 0.765
             CountFilteredOrMany       27.26      (1.4%)       27.29      (1.1%)    0.1% (  -2% -    2%) 0.862
              FilteredAndHighMed      156.46      (1.0%)      156.63      (1.0%)    0.1% (  -1% -    2%) 0.790
               FilteredOrHighMed      153.23      (1.0%)      153.40      (1.0%)    0.1% (  -1% -    2%) 0.765
                  FilteredOrMany       16.50      (2.4%)       16.53      (1.6%)    0.2% (  -3% -    4%) 0.843
               CombinedOrHighMed       88.23      (0.9%)       88.39      (0.7%)    0.2% (  -1% -    1%) 0.553
              CombinedAndHighMed       89.40      (0.8%)       89.59      (0.9%)    0.2% (  -1% -    1%) 0.510
                AndMedOrHighHigh       88.85      (1.1%)       89.04      (1.3%)    0.2% (  -2% -    2%) 0.619
                  CountOrHighMed      362.64      (1.4%)      363.48      (1.3%)    0.2% (  -2% -    3%) 0.653
             CountFilteredPhrase       25.46      (1.7%)       25.52      (1.5%)    0.3% (  -2% -    3%) 0.669
                          OrMany       23.56      (2.5%)       23.63      (2.2%)    0.3% (  -4% -    5%) 0.744
                  FilteredIntNRQ      298.55      (0.6%)      299.46      (0.8%)    0.3% (  -1% -    1%) 0.261
      FilteredOr2Terms2StopWords      147.06      (1.0%)      147.54      (1.1%)    0.3% (  -1% -    2%) 0.415
                 CountOrHighHigh      343.27      (2.1%)      344.40      (2.0%)    0.3% (  -3% -    4%) 0.672
                    FilteredTerm      162.14      (2.8%)      162.70      (2.2%)    0.3% (  -4% -    5%) 0.718
                     CountOrMany       29.15      (1.4%)       29.27      (1.2%)    0.4% (  -2% -    3%) 0.420
              FilteredOrHighHigh       66.92      (2.1%)       67.25      (1.8%)    0.5% (  -3% -    4%) 0.503
             FilteredOrStopWords       45.53      (2.3%)       45.78      (2.1%)    0.6% (  -3% -    5%) 0.504
                 FilteredPrefix3      151.12      (2.5%)      152.02      (1.8%)    0.6% (  -3% -    5%) 0.467
                   TermMonthSort     3335.12      (2.1%)     3355.82      (2.1%)    0.6% (  -3% -    4%) 0.434
                       And3Terms      242.26      (1.1%)      244.68      (1.0%)    1.0% (  -1% -    3%) 0.012
                 AndHighOrMedMed       50.85      (1.6%)       51.36      (1.1%)    1.0% (  -1% -    3%) 0.059
             And2Terms2StopWords      205.51      (1.4%)      208.13      (1.3%)    1.3% (  -1% -    3%) 0.011
               TermDayOfYearSort      284.17      (2.6%)      288.16      (1.5%)    1.4% (  -2% -    5%) 0.077
                        Or3Terms      232.40      (1.6%)      235.89      (1.3%)    1.5% (  -1% -    4%) 0.006
              Or2Terms2StopWords      205.85      (1.5%)      209.59      (1.2%)    1.8% (   0% -    4%) 0.000
                      AndHighMed      203.33      (1.5%)      207.11      (1.5%)    1.9% (  -1% -    4%) 0.001
                    AndStopWords       47.25      (2.3%)       48.13      (2.0%)    1.9% (  -2% -    6%) 0.022
                            Term      663.37      (5.9%)      675.77      (5.7%)    1.9% (  -9% -   14%) 0.397
                     AndHighHigh       69.26      (2.4%)       70.92      (2.0%)    2.4% (  -1% -    6%) 0.004
                       OrHighMed      257.23      (2.2%)      263.59      (1.8%)    2.5% (  -1% -    6%) 0.001
                     OrStopWords       48.68      (3.1%)       50.25      (2.1%)    3.2% (  -1% -    8%) 0.001
                      OrHighHigh       77.59      (2.9%)       80.15      (2.1%)    3.3% (  -1% -    8%) 0.001
                      OrHighRare      285.82      (9.8%)      299.46      (6.3%)    4.8% ( -10% -   23%) 0.125

There are still a number of places where score computations are not vectorized, specifically:

  • When computing scores from impacts,
  • MaxScore's non-essential clauses,
  • All BlockMaxConjunctionBulkScorer's clauses but the leading clause.

Copy link
Contributor

@gf2121 gf2121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for sticking and fixing thread-safe issue!

I played with similar approach around a month ago but did not see a noticeable improvement. Maybe our recent optimizations make it appear.

There are still a number of places where score computations are not vectorized, specifically:

When computing scores from impacts,
MaxScore's non-essential clauses,
All BlockMaxConjunctionBulkScorer's clauses but the leading clause.

++. I guess MultiNormsLeafSimScorer#scoreRange that used by CombinedFieldQuery can be optimized as well.

*
* <p><b>NOTE</b>: The returned instance is not thread-safe.
*/
public BulkSimScorer asBulkSimScorer() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change of AssertingSimilarity is not included in this patch, is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's a miss!

jpountz added a commit to jpountz/luceneutil that referenced this pull request Aug 12, 2025
Because nightly benchmarks only test a small set of scenarios, the JVM may end
up over-optimizing query evaluation. For instance, it only runs with
BM25Similarity, sorting tasks only run against a TermQuery, filtered vector
search only exercises the approximate path, not the exact path, etc.

This tries to make the benchmark more realistic by running some cheap queries
before running bencharks, whose goal is to pollute call sites so that they are
not all magically monomorphic.

This will translate in a drop in performance for some tasks, but hopefully we
can recover some of it in the future.

Related PR:
 - apache/lucene#14968 where we suspected the speedup
   to be due to specialization making a call site monomorphic in nightly
   benchmarks that would not be monomorphic in the real world,
 - apache/lucene#15039 where we are trying to improve
   behavior with several different similarity impls but the benchmarks only
   show a small improvement since they always run with BM25Similarity.
jpountz added a commit to mikemccand/luceneutil that referenced this pull request Aug 13, 2025
Because nightly benchmarks only test a small set of scenarios, the JVM may end
up over-optimizing query evaluation. For instance, it only runs with
BM25Similarity, sorting tasks only run against a TermQuery, filtered vector
search only exercises the approximate path, not the exact path, etc.

This tries to make the benchmark more realistic by running some cheap queries
before running bencharks, whose goal is to pollute call sites so that they are
not all magically monomorphic.

This will translate in a drop in performance for some tasks, but hopefully we
can recover some of it in the future.

Related PR:
 - apache/lucene#14968 where we suspected the speedup
   to be due to specialization making a call site monomorphic in nightly
   benchmarks that would not be monomorphic in the real world,
 - apache/lucene#15039 where we are trying to improve
   behavior with several different similarity impls but the benchmarks only
   show a small improvement since they always run with BM25Similarity.
jpountz added a commit to jpountz/lucene that referenced this pull request Aug 17, 2025
I ran experiments locally that suggest that some of the performance decrease
from type pollution (mikemccand/luceneutil#436)
can be attributed to calls to `SimScorer#score` no longer being inlinable since
they are polymorphic. This change helps `BM25Scorer` remain inlinable using
similar tricks that we are applying for `Bits#get` and
`ImpactsEnum#nextDoc`/`ImpactsEnum#advance`.

Hopefully changes such as apache#15039 will help improve performance with other
similarities as well in the future.
@jpountz jpountz merged commit 772f63c into apache:main Aug 18, 2025
8 checks passed
@jpountz jpountz deleted the vectorize_sim_scores branch August 18, 2025 05:44
@jpountz
Copy link
Contributor Author

jpountz commented Aug 23, 2025

I pushed an annotation, this helped more than what my local benchmarking reported!

jpountz added a commit that referenced this pull request Aug 30, 2025
Existing auto-vectorization of scores is a bit fragile since it relies on
`SimScorer#score` being inlined in the for loops where it is called. This is
currently the case in nightly benchmarks, but may not be the case in the real
world where more implementations of `SimScorer` may be used, in particular
those from `FeatureField`.

Furthermore, existing auto-vectorization has some room for improvement as
@gf2121 highlighted at
#14679 (comment).
jpountz added a commit to jpountz/lucene that referenced this pull request Sep 3, 2025
In apache#15039 we introduced a bulk `SimScorer#score` API and used it to compute
scores with the leading conjunctive clause and "essential" clauses of
disjunctive queries. With this PR, we are now also using this bulk API when
translating (term frequency, length normalization factor) pairs into the
maximum possible score that a block of postings may produce.

To do it right, I had to change the impacts API to no longer return a List of
(term freq, norm) pairs, but instead two parallel arrays of term frequencies
and norms that could (almost) directly be passed to the `SimScorer#score` bulk
API. Unfortunately this makes the change quite big since many backward formats
had to be touched.
jpountz added a commit that referenced this pull request Sep 8, 2025
In #15039 we introduced a bulk `SimScorer#score` API and used it to compute
scores with the leading conjunctive clause and "essential" clauses of
disjunctive queries. With this PR, we are now also using this bulk API when
translating (term frequency, length normalization factor) pairs into the
maximum possible score that a block of postings may produce.

To do it right, I had to change the impacts API to no longer return a List of
(term freq, norm) pairs, but instead two parallel arrays of term frequencies
and norms that could (almost) directly be passed to the `SimScorer#score` bulk
API. Unfortunately this makes the change quite big since many backward formats
had to be touched.

Co-authored-by: Guo Feng <[email protected]>
jpountz added a commit that referenced this pull request Sep 8, 2025
In #15039 we introduced a bulk `SimScorer#score` API and used it to compute
scores with the leading conjunctive clause and "essential" clauses of
disjunctive queries. With this PR, we are now also using this bulk API when
translating (term frequency, length normalization factor) pairs into the
maximum possible score that a block of postings may produce.

To do it right, I had to change the impacts API to no longer return a List of
(term freq, norm) pairs, but instead two parallel arrays of term frequencies
and norms that could (almost) directly be passed to the `SimScorer#score` bulk
API. Unfortunately this makes the change quite big since many backward formats
had to be touched.

Co-authored-by: Guo Feng <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants