@jpountz jpountz commented May 31, 2025

This essentially reverts the change from #14701 for conjunctive queries that have not reached their `totalHitsThreshold` yet. This should speed up queries whose total number of matches is on the order of `totalHitsThreshold` or less, such as the filtered conjunctions in the nightly benchmarks.
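
To make the idea concrete, here is a minimal toy model of the strategy choice described above. It is not Lucene's code: the class, the `Strategy` enum, and the rule "use score-first only once a positive minimum competitive score exists" are assumptions made for illustration, as is the simplification that the collector starts reporting a minimum competitive score as soon as the hit count exceeds `totalHitsThreshold`.

```java
/** Toy model, not Lucene's actual code: all names here are hypothetical. */
public class WindowStrategySketch {

  enum Strategy { DOC_FIRST, SCORE_FIRST }

  /**
   * Assumed rule: score-first evaluation is only chosen once the collector has
   * published a positive minimum competitive score.
   */
  static Strategy choose(float minCompetitiveScore) {
    return minCompetitiveScore > 0f ? Strategy.SCORE_FIRST : Strategy.DOC_FIRST;
  }

  /**
   * Simulates a collector that reports a minimum competitive score only after
   * it has counted more than totalHitsThreshold hits (a simplification).
   */
  static Strategy[] simulate(int[] hitsPerWindow, int totalHitsThreshold, float scoreOnceReached) {
    Strategy[] chosen = new Strategy[hitsPerWindow.length];
    int totalHits = 0;
    for (int i = 0; i < hitsPerWindow.length; i++) {
      // The strategy for a window is picked from the state before scoring it.
      float minCompetitiveScore = totalHits > totalHitsThreshold ? scoreOnceReached : 0f;
      chosen[i] = choose(minCompetitiveScore);
      totalHits += hitsPerWindow[i];
    }
    return chosen;
  }

  public static void main(String[] args) {
    Strategy[] chosen = simulate(new int[] {400, 400, 400, 400}, 1000, 2.5f);
    for (int i = 0; i < chosen.length; i++) {
      System.out.println("window " + i + ": " + chosen[i]);
    }
  }
}
```

In this model, a query whose total match count stays at or below the threshold never leaves the doc-first path, which is the case the description expects to get faster.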

jpountz commented May 31, 2025

On wikibigall:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
             And2Terms2StopWords      183.51     (11.4%)      180.04     (11.7%)   -1.9% ( -22% -   23%) 0.676
                        Or3Terms      185.48     (12.5%)      182.55     (13.2%)   -1.6% ( -24% -   27%) 0.753
                       And3Terms      193.71     (13.1%)      190.89     (13.3%)   -1.5% ( -24% -   28%) 0.778
                    AndStopWords       33.42     (15.4%)       32.97     (17.6%)   -1.4% ( -29% -   37%) 0.835
                            Term      579.62     (10.7%)      574.50     (10.2%)   -0.9% ( -19% -   22%) 0.830
                        PKLookup      310.82      (4.4%)      308.13      (4.4%)   -0.9% (  -9% -    8%) 0.618
              Or2Terms2StopWords      179.34     (10.9%)      177.86     (11.8%)   -0.8% ( -21% -   24%) 0.853
                       CountTerm     8514.91      (2.6%)     8446.93      (3.2%)   -0.8% (  -6% -    5%) 0.486
                     OrStopWords       36.05     (16.3%)       35.79     (18.8%)   -0.7% ( -30% -   41%) 0.916
                     AndHighHigh       53.90     (15.2%)       53.54     (14.8%)   -0.7% ( -26% -   34%) 0.911
                FilteredOr3Terms      164.88      (1.3%)      163.93      (1.6%)   -0.6% (  -3% -    2%) 0.313
                 AndHighOrMedMed       46.05      (2.0%)       45.81      (2.0%)   -0.5% (  -4% -    3%) 0.509
                      AndHighMed      177.64     (12.2%)      176.85     (12.1%)   -0.4% ( -22% -   27%) 0.925
               TermDayOfYearSort      285.54      (2.0%)      284.27      (2.4%)   -0.4% (  -4% -    4%) 0.608
               FilteredOrHighMed      150.96      (1.4%)      150.30      (1.2%)   -0.4% (  -2% -    2%) 0.377
                          IntNRQ      306.35      (1.0%)      305.04      (1.4%)   -0.4% (  -2% -    1%) 0.365
                      DismaxTerm      671.60      (8.4%)      668.73      (7.0%)   -0.4% ( -14% -   16%) 0.888
      FilteredOr2Terms2StopWords      145.30      (1.1%)      144.83      (1.3%)   -0.3% (  -2% -    2%) 0.494
                     CountPhrase        4.10      (3.2%)        4.09      (3.5%)   -0.3% (  -6% -    6%) 0.839
                   TermMonthSort     3176.82      (2.5%)     3168.80      (3.3%)   -0.3% (  -5% -    5%) 0.826
                      TermDTSort      390.49      (3.5%)      389.77      (4.6%)   -0.2% (  -7% -    8%) 0.909
             CountFilteredPhrase       25.30      (2.1%)       25.26      (2.1%)   -0.2% (  -4% -    4%) 0.831
                IntervalsOrdered        2.25      (2.9%)        2.25      (3.6%)   -0.2% (  -6% -    6%) 0.902
             FilteredOrStopWords       44.91      (1.8%)       44.84      (2.0%)   -0.1% (  -3% -    3%) 0.851
                  FilteredIntNRQ      298.82      (1.0%)      298.50      (1.4%)   -0.1% (  -2% -    2%) 0.828
              FilteredOrHighHigh       66.44      (1.5%)       66.37      (1.2%)   -0.1% (  -2% -    2%) 0.850
                AndMedOrHighHigh       78.56      (6.2%)       78.49      (6.2%)   -0.1% ( -11% -   13%) 0.974
                          Fuzzy1       95.30      (2.7%)       95.28      (4.2%)   -0.0% (  -6% -    7%) 0.988
                      OrHighRare      283.39      (8.2%)      283.47      (7.0%)    0.0% ( -14% -   16%) 0.992
         CountFilteredOrHighHigh      136.15      (0.9%)      136.36      (0.5%)    0.2% (  -1% -    1%) 0.610
          CountFilteredOrHighMed      147.59      (0.8%)      147.92      (0.5%)    0.2% (  -1% -    1%) 0.375
                 CountAndHighMed      311.40      (1.5%)      312.38      (1.5%)    0.3% (  -2% -    3%) 0.591
                  FilteredOrMany       16.49      (1.5%)       16.54      (1.8%)    0.3% (  -2% -    3%) 0.612
                        Wildcard       91.39      (1.6%)       91.70      (2.2%)    0.3% (  -3% -    4%) 0.662
             CountFilteredOrMany       27.40      (1.6%)       27.50      (1.0%)    0.4% (  -2% -    2%) 0.448
                  CountOrHighMed      366.59      (2.3%)      368.09      (1.9%)    0.4% (  -3% -    4%) 0.616
                          Fuzzy2       80.77      (2.3%)       81.12      (3.5%)    0.4% (  -5% -    6%) 0.709
                  FilteredPhrase       32.73      (1.6%)       32.88      (1.8%)    0.4% (  -2% -    3%) 0.499
                          OrMany       19.65      (8.7%)       19.76      (9.9%)    0.5% ( -16% -   20%) 0.882
                       OrHighMed      234.04     (10.3%)      235.53     (10.6%)    0.6% ( -18% -   23%) 0.877
                          Phrase       14.20      (3.0%)       14.30      (2.7%)    0.7% (  -4% -    6%) 0.554
                    CombinedTerm       31.45      (3.6%)       31.66      (2.6%)    0.7% (  -5% -    7%) 0.590
                 DismaxOrHighMed      178.26      (6.7%)      179.54      (6.9%)    0.7% ( -12% -   15%) 0.788
                    FilteredTerm      158.20      (2.4%)      159.52      (2.0%)    0.8% (  -3% -    5%) 0.341
               CombinedOrHighMed       72.81      (2.3%)       73.42      (1.9%)    0.8% (  -3% -    5%) 0.317
                DismaxOrHighHigh      118.90      (7.3%)      119.91      (7.1%)    0.9% ( -12% -   16%) 0.763
                 CountOrHighHigh      345.64      (2.2%)      348.61      (2.1%)    0.9% (  -3% -    5%) 0.312
                   TermTitleSort       86.50      (5.5%)       87.40      (7.2%)    1.0% ( -11% -   14%) 0.681
                         Prefix3      159.82      (2.4%)      161.51      (2.9%)    1.1% (  -4% -    6%) 0.312
                CountAndHighHigh      357.97      (1.9%)      361.80      (2.0%)    1.1% (  -2% -    5%) 0.159
                 FilteredPrefix3      150.59      (2.1%)      152.34      (2.5%)    1.2% (  -3% -    5%) 0.209
                     CountOrMany       30.36      (1.5%)       30.74      (1.6%)    1.2% (  -1% -    4%) 0.044
                      OrHighHigh       62.49     (13.6%)       63.44     (14.5%)    1.5% ( -23% -   34%) 0.783
              CombinedOrHighHigh       18.43      (3.6%)       18.73      (1.8%)    1.6% (  -3% -    7%) 0.148
              CombinedAndHighMed       39.74      (2.9%)       43.17      (1.8%)    8.6% (   3% -   13%) 0.000
             FilteredAndHighHigh       57.58      (6.9%)       63.06      (2.7%)    9.5% (   0% -   20%) 0.000
            FilteredAndStopWords       40.19      (5.9%)       44.35      (2.4%)   10.3% (   1% -   19%) 0.000
     FilteredAnd2Terms2StopWords      158.44      (5.4%)      177.31      (2.8%)   11.9% (   3% -   21%) 0.000
              FilteredAndHighMed      118.43      (9.5%)      134.44      (8.0%)   13.5% (  -3% -   34%) 0.000
             CombinedAndHighHigh       11.20      (3.4%)       12.98      (2.2%)   15.9% (   9% -   22%) 0.000
               FilteredAnd3Terms      159.89      (7.2%)      189.66      (5.6%)   18.6% (   5% -   33%) 0.000
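
For readers checking the table, the leading percentage in the Pct diff column is consistent with the relative change between the two QPS columns. The sketch below recomputes it for two rows; the class and method names are hypothetical, and it does not attempt to reproduce the range in parentheses or the p-value.

```java
public class PctDiffSketch {

  /** Relative change of the modified QPS against the baseline QPS, in percent. */
  static double pctDiff(double baselineQps, double modifiedQps) {
    return 100.0 * (modifiedQps - baselineQps) / baselineQps;
  }

  public static void main(String[] args) {
    // QPS values copied from the FilteredAnd3Terms and And2Terms2StopWords rows.
    System.out.printf("FilteredAnd3Terms: %.1f%%%n", pctDiff(159.89, 189.66));
    System.out.printf("And2Terms2StopWords: %.1f%%%n", pctDiff(183.51, 180.04));
  }
}
```

Rounded to one decimal, these come out at 18.6% and -1.9%, matching the corresponding rows.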

float maxWindowScore = computeMaxScore(windowMin, windowMax);
scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
} else {
scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);
@gf2121 gf2121 (Contributor) May 31, 2025

So `minCompetitiveScore` won't get a chance to be respected when a filter clause leads the query, because `windowMax` is `DocIdSetIterator#NO_MORE_DOCS`. Could this cause a regression?

Contributor Author

I believe we've always had this problem? I remember trying to make things better but it didn't look great or caused performance regressions with term queries, the case I care about the most.

Contributor

> I believe we've always had this problem?

I agree that the previous version could not skip windows either, but within a window it only needed to run the conjunction over the competitive docs, while this PR could evaluate more.

I'm not sure how much this will matter, though. The FilteredAndHighHigh task should exercise a similar case, and its numbers don't look bad. Let's move on.
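
The trade-off raised in this thread can be illustrated with a toy count of how many lead documents each approach has to check against the other clause inside one window. This is only a sketch under stated assumptions: the class and method names are hypothetical, a document counts as competitive when its lead score plus the other clause's maximum score reaches the minimum competitive score, and none of this is Lucene's actual code.

```java
/** Toy model, not Lucene's actual code: all names here are hypothetical. */
public class ConjunctionWorkSketch {

  /** Doc-first: every lead doc in the window is checked against the other clause. */
  static int docFirstChecks(float[] leadScores) {
    return leadScores.length;
  }

  /**
   * Score-first: only lead docs whose best possible total score is competitive
   * are checked against the other clause.
   */
  static int scoreFirstChecks(float[] leadScores, float maxOtherScore, float minCompetitiveScore) {
    int checks = 0;
    for (float leadScore : leadScores) {
      if (leadScore + maxOtherScore >= minCompetitiveScore) {
        checks++;
      }
    }
    return checks;
  }

  public static void main(String[] args) {
    float[] leadScores = {0.2f, 1.5f, 0.4f, 2.0f, 0.1f};
    System.out.println("doc-first checks: " + docFirstChecks(leadScores));
    System.out.println("score-first checks: " + scoreFirstChecks(leadScores, 1.0f, 2.2f));
  }
}
```

With no minimum competitive score set (a value of 0), both counts are equal in this model. The gap only opens once a positive threshold exists, which is why a single window spanning the whole doc ID range is the case the question is about.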

@jpountz jpountz merged commit 17a40bd into apache:main Jun 2, 2025
7 checks passed
@jpountz jpountz deleted the doc_at_a_time_when_ramping_up branch June 2, 2025 15:03
jpountz added a commit that referenced this pull request Jun 3, 2025
…icks in. (#14739)

jpountz commented Jun 6, 2025
