
@HUSTERGS
Contributor

Description

This PR proposes a new forEach API on Impacts. It appears to reduce the cost of MaxScoreCache.computeMaxScore. I tried several other approaches that avoid adding a new API, such as explicitly implementing MutableImpactList.forEach (also included in this PR for now), but none of them achieved the same performance gain.

Honestly, I'm not sure about the root cause of the performance gain, especially since the explicit forEach for MutableImpactList did not produce the same improvement.
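For illustration, one plausible shape of the idea is sketched below. The actual signatures in this PR are not shown in this thread, so ImpactConsumer, SimpleScorer, ArrayImpacts and MaxScoreAccumulator are hypothetical names, not Lucene API. The sketch assumes impacts are stored as parallel int/long arrays of (freq, norm) pairs and contrasts an index-based max-score loop over a List of impact objects with a forEach-style callback that receives the primitives directly.

```java
import java.util.List;

public class ImpactsForEachSketch {

  /** Hypothetical (freq, norm) pair, mirroring what an impact carries. */
  record Impact(int freq, long norm) {}

  /** Hypothetical stand-in for a similarity scorer. */
  @FunctionalInterface
  interface SimpleScorer {
    float score(float freq, long norm);
  }

  /** Hypothetical callback that receives one (freq, norm) pair at a time. */
  @FunctionalInterface
  interface ImpactConsumer {
    void accept(int freq, long norm);
  }

  /** Hypothetical impacts holder backed by parallel primitive arrays. */
  static final class ArrayImpacts {
    private final int[] freqs;
    private final long[] norms;

    ArrayImpacts(int[] freqs, long[] norms) {
      if (freqs.length != norms.length) {
        throw new IllegalArgumentException("freqs and norms must have the same length");
      }
      this.freqs = freqs;
      this.norms = norms;
    }

    /** Visits every pair without materializing Impact objects or a List. */
    void forEach(ImpactConsumer consumer) {
      for (int i = 0; i < freqs.length; i++) {
        consumer.accept(freqs[i], norms[i]);
      }
    }
  }

  /** Keeps the highest score seen so far. */
  static final class MaxScoreAccumulator implements ImpactConsumer {
    private final SimpleScorer scorer;
    private float maxScore = 0f;

    MaxScoreAccumulator(SimpleScorer scorer) {
      this.scorer = scorer;
    }

    @Override
    public void accept(int freq, long norm) {
      maxScore = Math.max(maxScore, scorer.score(freq, norm));
    }
  }

  /** Baseline: index-based loop over a List of Impact objects. */
  static float computeMaxScoreFromList(List<Impact> impacts, SimpleScorer scorer) {
    float maxScore = 0f;
    for (int i = 0, size = impacts.size(); i < size; i++) {
      Impact impact = impacts.get(i);
      maxScore = Math.max(maxScore, scorer.score(impact.freq(), impact.norm()));
    }
    return maxScore;
  }

  /** forEach variant: the impacts push primitives into the accumulator. */
  static float computeMaxScore(ArrayImpacts impacts, SimpleScorer scorer) {
    MaxScoreAccumulator acc = new MaxScoreAccumulator(scorer);
    impacts.forEach(acc);
    return acc.maxScore;
  }

  public static void main(String[] args) {
    // Toy scorer, assumed for illustration only: freq / (freq + norm)
    SimpleScorer scorer = (freq, norm) -> freq / (freq + (float) norm);
    int[] freqs = {1, 3, 10};
    long[] norms = {1L, 2L, 8L};
    List<Impact> list = List.of(new Impact(1, 1L), new Impact(3, 2L), new Impact(10, 8L));

    float viaList = computeMaxScoreFromList(list, scorer);
    float viaForEach = computeMaxScore(new ArrayImpacts(freqs, norms), scorer);
    System.out.println(viaList + " " + viaForEach);
  }
}
```

Both variants return the same maximum (0.6 for the toy data); any speed difference between them would come from how the JIT compiles each loop, which this sketch does not measure.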

This PR is still a draft: it contains a lot of duplicate code, and neither tests nor Javadoc have been added yet. I'm not sure whether this change is desired; any comments and suggestions are welcome!

Here are the luceneutil benchmark results on wikimediumall with searchConcurrency=0, taskCountPerCat=5 and taskRepeatCount=50, after 20 iterations:

                            Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff    p-value
                    CombinedTerm       11.07      (4.3%)       10.90      (5.1%)   -1.5% ( -10% -    8%) 0.302
              CombinedOrHighHigh        5.48      (3.9%)        5.44      (4.5%)   -0.8% (  -8% -    8%) 0.556
                    AndStopWords        8.18      (6.3%)        8.12      (6.3%)   -0.8% ( -12% -   12%) 0.699
                      OrHighHigh       20.55      (6.8%)       20.40      (4.4%)   -0.7% ( -11% -   11%) 0.682
                        SpanNear        2.51      (4.5%)        2.50      (5.0%)   -0.3% (  -9% -    9%) 0.828
             CombinedAndHighHigh        5.60      (2.3%)        5.58      (2.5%)   -0.3% (  -4% -    4%) 0.691
                  FilteredPhrase        9.93      (2.3%)        9.91      (1.9%)   -0.2% (  -4% -    4%) 0.740
                          Phrase        7.63      (2.5%)        7.63      (2.9%)   -0.1% (  -5% -    5%) 0.913
         CountFilteredOrHighHigh       15.92      (0.5%)       15.92      (0.7%)    0.0% (  -1% -    1%) 0.932
                CountAndHighHigh       48.66      (1.7%)       48.69      (1.9%)    0.1% (  -3% -    3%) 0.922
                    SloppyPhrase        1.13      (4.1%)        1.13      (4.0%)    0.1% (  -7% -    8%) 0.958
          CountFilteredOrHighMed       18.00      (0.4%)       18.02      (0.5%)    0.1% (   0% -    1%) 0.587
                IntervalsOrdered        2.46      (2.9%)        2.46      (2.5%)    0.1% (  -5% -    5%) 0.899
               CombinedOrHighMed       20.74      (4.3%)       20.77      (4.7%)    0.1% (  -8% -    9%) 0.934
            FilteredAndStopWords        8.37      (2.2%)        8.38      (2.2%)    0.2% (  -4% -    4%) 0.760
                     CountOrMany        5.06      (2.2%)        5.07      (2.2%)    0.2% (  -4% -    4%) 0.753
                     OrStopWords        8.54      (7.7%)        8.56      (6.6%)    0.2% ( -13% -   15%) 0.917
             FilteredAndHighHigh       10.35      (2.2%)       10.37      (2.2%)    0.3% (  -4% -    4%) 0.698
              FilteredAndHighMed       31.51      (1.8%)       31.59      (1.6%)    0.3% (  -3% -    3%) 0.618
                       And3Terms       70.79      (4.2%)       71.00      (3.4%)    0.3% (  -6% -    8%) 0.808
               TermDayOfYearSort      270.56      (2.5%)      271.46      (2.4%)    0.3% (  -4% -    5%) 0.667
                 FilteredPrefix3       71.08      (3.6%)       71.33      (3.3%)    0.3% (  -6% -    7%) 0.754
                         Prefix3       75.79      (3.7%)       76.06      (3.6%)    0.4% (  -6% -    7%) 0.756
             CountFilteredIntNRQ       16.30      (1.3%)       16.36      (1.2%)    0.4% (  -2% -    2%) 0.366
                        Wildcard       47.59      (3.0%)       47.77      (3.6%)    0.4% (  -6% -    7%) 0.727
                 CountAndHighMed       75.05      (2.6%)       75.41      (2.3%)    0.5% (  -4% -    5%) 0.541
                     AndHighHigh       20.77      (4.5%)       20.87      (4.4%)    0.5% (  -8% -    9%) 0.731
                 CountOrHighHigh       50.08      (2.0%)       50.32      (1.6%)    0.5% (  -3% -    4%) 0.403
             CountFilteredOrMany        4.47      (2.3%)        4.50      (2.3%)    0.5% (  -3% -    5%) 0.475
                     CountPhrase        2.64      (4.1%)        2.65      (4.0%)    0.5% (  -7% -    8%) 0.682
              CombinedAndHighMed       21.17      (4.2%)       21.29      (4.0%)    0.6% (  -7% -    9%) 0.669
                   TermMonthSort     2323.87      (2.3%)     2336.78      (2.6%)    0.6% (  -4% -    5%) 0.477
             CountFilteredPhrase        9.13      (2.7%)        9.18      (2.8%)    0.6% (  -4% -    6%) 0.523
                  CountOrHighMed       77.64      (2.7%)       78.19      (2.2%)    0.7% (  -4% -    5%) 0.353
                          IntSet      295.62      (4.6%)      297.80      (5.1%)    0.7% (  -8% -   10%) 0.632
                FilteredOr3Terms       44.25      (3.0%)       44.59      (3.0%)    0.8% (  -5% -    6%) 0.426
                  FilteredOrMany        4.04      (2.6%)        4.07      (2.5%)    0.8% (  -4% -    6%) 0.303
                          OrMany        4.62      (3.1%)        4.66      (4.3%)    0.9% (  -6% -    8%) 0.470
                          IntNRQ       42.14      (3.5%)       42.51      (3.0%)    0.9% (  -5% -    7%) 0.404
                      AndHighMed       52.89      (4.0%)       53.36      (2.9%)    0.9% (  -5% -    8%) 0.422
                       OrHighMed       68.59      (5.5%)       69.25      (3.8%)    1.0% (  -7% -   10%) 0.521
                        Or3Terms       64.60      (5.9%)       65.23      (4.0%)    1.0% (  -8% -   11%) 0.543
                  FilteredIntNRQ       41.84      (3.5%)       42.25      (3.1%)    1.0% (  -5% -    7%) 0.349
                AndMedOrHighHigh       16.59      (2.1%)       16.75      (2.1%)    1.0% (  -3% -    5%) 0.133
                          Fuzzy1       40.79      (3.1%)       41.20      (3.8%)    1.0% (  -5% -    8%) 0.365
                      TermDTSort      144.75      (4.5%)      146.29      (4.3%)    1.1% (  -7% -   10%) 0.442
                          Fuzzy2       36.99      (2.9%)       37.39      (3.6%)    1.1% (  -5% -    7%) 0.291
               FilteredAnd3Terms      101.38      (3.2%)      102.49      (2.9%)    1.1% (  -4% -    7%) 0.258
              FilteredOrHighHigh       13.11      (2.1%)       13.25      (2.1%)    1.1% (  -2% -    5%) 0.095
      FilteredOr2Terms2StopWords       50.55      (4.0%)       51.13      (4.2%)    1.1% (  -6% -    9%) 0.373
               FilteredOrHighMed       39.28      (3.2%)       39.74      (3.1%)    1.2% (  -4% -    7%) 0.229
              Or2Terms2StopWords       61.37      (6.3%)       62.12      (5.4%)    1.2% (  -9% -   13%) 0.510
                   TermTitleSort       51.33      (5.3%)       51.96      (5.3%)    1.2% (  -8% -   12%) 0.466
             FilteredOrStopWords        8.21      (1.9%)        8.31      (2.1%)    1.3% (  -2% -    5%) 0.045
                 AndHighOrMedMed       14.07      (2.3%)       14.26      (2.6%)    1.3% (  -3% -    6%) 0.088
                DismaxOrHighHigh       35.29      (3.9%)       35.82      (2.9%)    1.5% (  -5% -    8%) 0.163
                    FilteredTerm       66.09      (3.9%)       67.15      (2.6%)    1.6% (  -4% -    8%) 0.125
                 DismaxOrHighMed       50.76      (3.4%)       51.58      (2.8%)    1.6% (  -4% -    8%) 0.094
     FilteredAnd2Terms2StopWords       59.42      (4.4%)       60.46      (4.0%)    1.8% (  -6% -   10%) 0.188
                       CountTerm     6836.11      (3.5%)     6958.93      (3.0%)    1.8% (  -4% -    8%) 0.080
             And2Terms2StopWords       59.08      (6.4%)       60.19      (5.8%)    1.9% (  -9% -   15%) 0.332
                         Respell       36.44      (3.4%)       37.12      (3.8%)    1.9% (  -5% -    9%) 0.098
                      DismaxTerm      521.82      (3.8%)      543.14      (3.3%)    4.1% (  -2% -   11%) 0.000
                          Term1M      475.55      (5.4%)      506.55      (3.8%)    6.5% (  -2% -   16%) 0.000
                         Term10K      475.63      (5.4%)      506.91      (3.6%)    6.6% (  -2% -   16%) 0.000
                         Term100      476.02      (5.3%)      507.50      (3.6%)    6.6% (  -2% -   16%) 0.000
                         TermB1M      475.91      (5.4%)      507.55      (3.7%)    6.6% (  -2% -   16%) 0.000
                       TermB1M1P      475.69      (5.4%)      507.37      (3.8%)    6.7% (  -2% -   16%) 0.000
                            Term      475.67      (5.3%)      507.76      (3.8%)    6.7% (  -2% -   16%) 0.000
                      OrHighRare       95.38      (5.6%)      102.70      (4.4%)    7.7% (  -2% -   18%) 0.000
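As a worked check of the Pct diff column, the sketch below assumes it is the relative change of the two mean QPS values, (modified - baseline) / baseline × 100. luceneutil's own code is not shown here, but this formula reproduces the column for the rows used below (Term 6.7%, OrHighRare 7.7%, DismaxTerm 4.1%, CombinedTerm -1.5%). The bracketed range and the p-value come from the per-run standard deviations and are not reproduced.

```java
public class PctDiff {

  /** Relative change of the candidate's mean QPS versus the baseline, in percent. */
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // {baseline QPS, my_modified_version QPS}, copied from the table above
    String[] tasks = {"Term", "OrHighRare", "DismaxTerm", "CombinedTerm"};
    double[][] qps = {{475.67, 507.76}, {95.38, 102.70}, {521.82, 543.14}, {11.07, 10.90}};
    for (int i = 0; i < tasks.length; i++) {
      System.out.printf("%-12s %5.1f%%%n", tasks[i], pctDiff(qps[i][0], qps[i][1]));
    }
  }
}
```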

@jpountz
Contributor

jpountz commented Jul 10, 2025

Thanks for identifying this opportunity for improvement. I'm a bit hesitant about the extra complexity, since Term and OrHighRare are already among the fastest queries. Maybe this is something to keep in the back of our minds in case we find more interesting queries in the future where computing max scores is a bottleneck?

@HUSTERGS
Contributor Author

Thanks for your explanation! I see your point; let's close this PR for now :)
