Conversation

@jpountz (Contributor) commented May 23, 2025

Existing vectorization of scores is a bit fragile since it relies on `SimScorer#score` being inlined in the for loops where it is called. This is currently the case in nightly benchmarks, but may not be the case in the real world, where more implementations of `SimScorer` may be used, in particular those from `FeatureField`.

Furthermore, existing vectorization has some room for improvement, as @gf2121 highlighted at #14679 (comment).

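To make the fragility concrete, here is a minimal sketch (illustrative names and formula only, not Lucene's actual API) of the bulk-scoring idea: a bulk method pays one virtual dispatch per buffer rather than one per document, so the hot loop no longer depends on the JIT inlining `score()` at every call site and can be vectorized inside the implementation itself.

```java
public class Main {
    // Illustrative sketch, not Lucene code.
    abstract static class SimScorerSketch {
        abstract float score(float freq, long norm);

        // Bulk variant: one virtual dispatch per buffer instead of per doc.
        // Subclasses can override this so the loop body stays monomorphic
        // and auto-vectorizable regardless of how many impls are loaded.
        void scoreRange(float[] freqs, long[] norms, float[] scores, int count) {
            for (int i = 0; i < count; i++) {
                scores[i] = score(freqs[i], norms[i]);
            }
        }
    }

    static class InverseNormScorer extends SimScorerSketch {
        @Override
        float score(float freq, long norm) {
            return freq / (1 + norm); // made-up formula, for illustration only
        }
    }

    public static void main(String[] args) {
        float[] freqs = {2f, 6f};
        long[] norms = {1L, 2L};
        float[] scores = new float[2];
        new InverseNormScorer().scoreRange(freqs, norms, scores, 2);
        System.out.println(scores[0] + " " + scores[1]); // prints "1.0 2.0"
    }
}
```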
@github-actions (Contributor) commented

This PR does not have an entry in `lucene/CHANGES.txt`. Consider adding one. If the PR doesn't need a changelog entry, add the `skip-changelog-check` label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz jpountz requested review from gf2121 and rmuir May 23, 2025 13:33
* Bulk computation of scores. For each entry in the given {@code buffer}, compute the score of
* the document and set its value in the {@code scores} array at the same index.
*
* <p><b>NOTE</b>: Doc IDs must be sorted, with no duplicates.
Contributor:
Should we override it in `AssertingSimilarity` to check these limitations?

  }
} else {
  if (normInverses == null || normInverses.length < buffer.size) {
    normInverses = new float[ArrayUtil.oversize(buffer.size, Float.BYTES)];
Contributor:
Do we need this new array, or can we just reuse `scores`?

Contributor Author (@jpountz):

Ohhhh, nice idea.
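The suggestion can be sketched as a hypothetical two-pass loop (illustrative only, not the actual patch): the output `scores` array doubles as scratch space for the decoded norm inverses, so no separate `normInverses` allocation is needed.

```java
public class Main {
    // Hypothetical sketch of reusing the output array as scratch space.
    // The decode formula and weight factor are made up for illustration.
    static void bulkScore(long[] norms, float[] freqs, float[] scores,
                          int count, float weight) {
        // Pass 1: stash each norm inverse directly in scores[].
        for (int i = 0; i < count; i++) {
            scores[i] = 1f / (1 + norms[i]);
        }
        // Pass 2: overwrite each slot in place with the final score.
        for (int i = 0; i < count; i++) {
            scores[i] = weight * freqs[i] * scores[i];
        }
    }

    public static void main(String[] args) {
        float[] scores = new float[2];
        bulkScore(new long[] {1L, 3L}, new float[] {2f, 4f}, scores, 2, 3f);
        System.out.println(scores[0] + " " + scores[1]); // prints "3.0 3.0"
    }
}
```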

@jpountz (Author) commented May 23, 2025

Great feedback once again, @gf2121!

@jpountz (Author) commented May 23, 2025

Here is what luceneutil has to say for exhaustive evaluation (`totalHitsThreshold=Integer.MAX_VALUE`):

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                            Term       74.75      (2.1%)       73.14      (4.2%)   -2.2% (  -8% -    4%) 0.073
                        PKLookup      318.63      (4.8%)      315.28      (5.8%)   -1.1% ( -11% -    9%) 0.586
                IntervalsOrdered        2.31      (4.0%)        2.29      (3.1%)   -0.7% (  -7% -    6%) 0.606
                        Wildcard       18.53      (6.3%)       18.45      (7.8%)   -0.4% ( -13% -   14%) 0.867
                CountAndHighHigh      359.81      (1.5%)      358.64      (1.7%)   -0.3% (  -3% -    2%) 0.572
                   TermMonthSort     3170.36      (2.9%)     3160.30      (2.4%)   -0.3% (  -5% -    5%) 0.746
                 CountAndHighMed      299.33      (1.5%)      298.56      (1.4%)   -0.3% (  -3% -    2%) 0.617
          CountFilteredOrHighMed      148.00      (0.7%)      147.73      (0.7%)   -0.2% (  -1% -    1%) 0.474
                 CountOrHighHigh      347.23      (1.6%)      346.72      (1.5%)   -0.1% (  -3% -    3%) 0.794
                      OrHighRare       21.29      (2.7%)       21.26      (3.7%)   -0.1% (  -6% -    6%) 0.908
              CombinedOrHighHigh        6.25      (3.7%)        6.24      (3.0%)   -0.1% (  -6% -    6%) 0.914
               CombinedOrHighMed        9.60      (3.6%)        9.59      (3.0%)   -0.1% (  -6% -    6%) 0.925
                 DismaxOrHighMed       10.60      (2.4%)       10.60      (3.0%)   -0.1% (  -5% -    5%) 0.934
                          Fuzzy1       96.54      (2.4%)       96.46      (1.7%)   -0.1% (  -4% -    4%) 0.918
                         Prefix3       14.98      (7.1%)       14.97      (8.1%)   -0.1% ( -14% -   16%) 0.985
         CountFilteredOrHighHigh      136.46      (0.9%)      136.43      (0.6%)   -0.0% (  -1% -    1%) 0.938
                    CombinedTerm       31.89      (3.0%)       31.88      (3.5%)   -0.0% (  -6% -    6%) 0.993
             CountFilteredOrMany       27.57      (1.6%)       27.56      (1.5%)   -0.0% (  -3% -    3%) 0.994
                          Fuzzy2       87.61      (2.4%)       87.66      (1.7%)    0.1% (  -3% -    4%) 0.947
                DismaxOrHighHigh        6.90      (2.2%)        6.90      (2.9%)    0.1% (  -4% -    5%) 0.929
                     CountOrMany       30.40      (1.6%)       30.43      (2.0%)    0.1% (  -3% -    3%) 0.889
                       OrHighMed       29.23      (4.9%)       29.26      (6.8%)    0.1% ( -11% -   12%) 0.961
                       CountTerm     8418.37      (3.6%)     8429.79      (4.0%)    0.1% (  -7% -    8%) 0.922
              FilteredAndHighMed      138.18      (1.5%)      138.39      (1.2%)    0.2% (  -2% -    2%) 0.749
                 FilteredPrefix3       29.68      (3.7%)       29.73      (3.1%)    0.2% (  -6% -    7%) 0.902
                      DismaxTerm       44.65      (4.6%)       44.73      (6.1%)    0.2% ( -10% -   11%) 0.931
                FilteredOr3Terms       52.80      (2.4%)       52.89      (2.7%)    0.2% (  -4% -    5%) 0.848
                    FilteredTerm      112.37      (1.5%)      112.62      (0.9%)    0.2% (  -2% -    2%) 0.628
               TermDayOfYearSort      285.08      (4.0%)      285.98      (3.0%)    0.3% (  -6% -    7%) 0.808
     FilteredAnd2Terms2StopWords      194.14      (1.4%)      194.86      (1.1%)    0.4% (  -2% -    2%) 0.425
               FilteredAnd3Terms      221.69      (1.0%)      222.51      (0.9%)    0.4% (  -1% -    2%) 0.280
                          Phrase        3.04      (2.3%)        3.05      (1.4%)    0.4% (  -3% -    4%) 0.558
                       And3Terms      147.00      (4.0%)      147.60      (2.0%)    0.4% (  -5% -    6%) 0.727
             FilteredAndHighHigh       71.07      (2.9%)       71.36      (1.2%)    0.4% (  -3% -    4%) 0.614
                        Or3Terms       40.07      (3.7%)       40.24      (5.8%)    0.4% (  -8% -   10%) 0.806
                     CountPhrase        4.18      (2.8%)        4.19      (1.6%)    0.4% (  -3% -    4%) 0.586
                  FilteredOrMany        5.74      (3.9%)        5.77      (2.4%)    0.5% (  -5% -    7%) 0.651
                  CountOrHighMed      336.79      (2.5%)      338.77      (2.1%)    0.6% (  -3% -    5%) 0.488
                  FilteredPhrase       23.17      (1.6%)       23.31      (1.1%)    0.6% (  -2% -    3%) 0.227
             CountFilteredPhrase       25.59      (1.6%)       25.75      (1.1%)    0.6% (  -2% -    3%) 0.224
               FilteredOrHighMed       56.17      (2.9%)       56.52      (2.6%)    0.6% (  -4% -    6%) 0.543
              FilteredOrHighHigh       47.21      (2.9%)       47.51      (2.4%)    0.6% (  -4% -    6%) 0.520
            FilteredAndStopWords       48.92      (3.9%)       49.33      (1.7%)    0.8% (  -4% -    6%) 0.438
              CombinedAndHighMed       41.62      (3.5%)       42.00      (2.0%)    0.9% (  -4% -    6%) 0.379
             CombinedAndHighHigh       13.04      (3.5%)       13.16      (2.1%)    0.9% (  -4% -    6%) 0.376
             FilteredOrStopWords       29.17      (3.3%)       29.44      (2.1%)    0.9% (  -4% -    6%) 0.353
                AndMedOrHighHigh       42.45      (4.0%)       42.87      (2.9%)    1.0% (  -5% -    8%) 0.443
                      OrHighHigh       20.07      (5.2%)       20.27      (7.1%)    1.0% ( -10% -   14%) 0.663
                      AndHighMed       90.98      (6.5%)       91.91      (2.6%)    1.0% (  -7% -   10%) 0.572
      FilteredOr2Terms2StopWords       26.93      (3.4%)       27.21      (2.1%)    1.0% (  -4% -    6%) 0.312
             And2Terms2StopWords       85.26      (5.6%)       86.21      (2.6%)    1.1% (  -6% -    9%) 0.487
                 AndHighOrMedMed       33.56      (3.2%)       33.95      (3.1%)    1.1% (  -4% -    7%) 0.315
                  FilteredIntNRQ       98.09      (2.4%)       99.37      (2.4%)    1.3% (  -3% -    6%) 0.136
                    AndStopWords       15.79      (9.8%)       16.04      (4.8%)    1.6% ( -11% -   17%) 0.579
                      TermDTSort      377.84      (3.4%)      384.79      (4.1%)    1.8% (  -5% -    9%) 0.181
                     AndHighHigh       26.00      (8.6%)       26.54      (3.4%)    2.1% (  -9% -   15%) 0.387
                   TermTitleSort       80.54      (5.4%)       82.55      (6.5%)    2.5% (  -8% -   15%) 0.252
              Or2Terms2StopWords        9.03      (5.6%)        9.26      (7.1%)    2.6% (  -9% -   16%) 0.273
                     OrStopWords       10.27      (5.5%)       10.55      (7.1%)    2.7% (  -9% -   16%) 0.245
                          IntNRQ       27.23      (5.3%)       28.07     (12.0%)    3.1% ( -13% -   21%) 0.367
                          OrMany        3.07      (5.4%)        3.24      (6.1%)    5.4% (  -5% -   17%) 0.010

OrMany gets a ~5.5% speedup. Other queries don't seem to get a significant speedup, though seeing Or2Terms2StopWords and OrStopWords next to the bottom of the list is likely not a coincidence.

this.weight = boost * idf.getValue().floatValue();
}

private float score(float freq, float normInverse) {
Member:

Consider the name? It is too dangerous to have both these signatures, even though `float` and `long` have different sizes:

float score(float, float);
float score(float, long);
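To illustrate the hazard (a standalone demo, not Lucene code): with both signatures present, Java's overload resolution silently favors `(float, long)` for an `int` argument, because `long` widens to `float` and the `long` overload is therefore the more specific one.

```java
public class Main {
    static String score(float freq, float normInverse) { return "float overload"; }
    static String score(float freq, long norm) { return "long overload"; }

    public static void main(String[] args) {
        // An int argument matches both overloads by primitive widening; Java
        // picks the more specific method, and since long widens to float,
        // (float, long) wins without any warning.
        System.out.println(score(1f, 2));   // prints "long overload"
        System.out.println(score(1f, 2f));  // prints "float overload"
    }
}
```

A caller passing a raw norm where a precomputed inverse was expected (or vice versa) would thus compile cleanly and silently hit the wrong method.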

@rmuir (Member) commented May 23, 2025

Before adding more complexity to scores, can we resolve the current problem with scoring in Jenkins? https://jenkins.thetaphi.de/job/Lucene-MMAPv2-Linux/3326/

java.lang.AssertionError: Doc 1 scores don't match
TopDocs totalHits=6 hits top=6
	0) doc=1	score=1.8358624
	1) doc=4	score=1.8358624
	2) doc=7	score=1.8358624
	3) doc=3	score=1.7754854
	4) doc=0	score=1.6846089
	5) doc=6	score=0.67934644
TopDocs totalHits=4 hits top=4
	0) doc=1	score=1.8358622
	1) doc=4	score=1.8358622
	2) doc=7	score=1.8358622
	3) doc=3	score=1.7754854
for query:(data:5 data:5 data:5 data:6 +data:6 data:Z data:X -data:1)~2 expected:<1.8358622> but was:<1.8358624>
...
NOTE: reproduce with: gradlew test --tests TestBooleanMinShouldMatch.testRandomQueries -Dtests.seed=E2469ED1547BF96C -Dtests.multiplier=3 -Dtests.directory=MMapDirectory -Dtests.locale=tok-001 -Dtests.timezone=Asia/Kuching -Dtests.asserts=true -Dtests.file.encoding=UTF-8

This failure has only surfaced recently, so I think it would be good to avoid stacking more changes to scoring on top of it.

@jpountz (Author) commented May 26, 2025

I was looking at it too, as I was worried it might have been caused by recent optimizations, but it looks like an old and rare failure. I opened a tentative test fix at #14715.

@jpountz jpountz merged commit abe18a4 into apache:main Jun 10, 2025
7 checks passed
@jpountz jpountz deleted the better_vectorize_score_computations branch June 10, 2025 14:13
jpountz added a commit that referenced this pull request Jun 10, 2025
jpountz added a commit that referenced this pull request Jun 10, 2025
jpountz added a commit that referenced this pull request Jun 10, 2025
@jpountz (Author) commented Jun 10, 2025

I had to revert this PR; it's not thread-safe: I thought `SimScorer`s were created per `Scorer`, but they are actually created per `Weight` and may be used across multiple threads.
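A minimal sketch of the hazard (illustrative, not the actual patch): if the scratch buffer lives on an object created once per `Weight`, two search threads can resize and fill it concurrently; keeping the buffer local to each call (or on the per-thread `Scorer`) avoids the race.

```java
public class Main {
    // Racy sketch: one instance per Weight, shared by all search threads.
    static class UnsafeBulkScorer {
        private float[] scratch = new float[0]; // shared mutable state

        float[] scratch(int size) {
            if (scratch.length < size) {
                // Thread A may replace the array while thread B is still
                // writing intermediate values into the old one.
                scratch = new float[size];
            }
            return scratch;
        }
    }

    // Safe sketch: no shared mutable state; each caller owns its buffer.
    static class SafeBulkScorer {
        float[] scratch(int size) {
            return new float[size];
        }
    }

    public static void main(String[] args) {
        System.out.println(new Main.SafeBulkScorer().scratch(4).length); // prints "4"
    }
}
```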

@jpountz jpountz removed this from the 10.3.0 milestone Jun 10, 2025
@jpountz (Author) commented Jun 10, 2025

@mikemccand FYI: if a nightly run started with this change, it may fail due to this thread-safety issue. The commit is now reverted, so restarting should be enough to fix the problem.

gf2121 pushed a commit to gf2121/lucene that referenced this pull request Jul 16, 2025
gf2121 pushed a commit to gf2121/lucene that referenced this pull request Jul 19, 2025