Keep evaluating conjunction one doc-at-a-time until dynamic pruning kicks in. #14739

jpountz · 2025-05-31T14:01:19Z

This essentially reverts the change from #14701 for conjunctive queries that have not yet reached their `totalHitsThreshold`. This should speed up queries whose total number of matches is on the order of `totalHitsThreshold` or less, such as the filtered conjunctions in the nightly benchmarks.
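The idea can be sketched as a per-window choice between two strategies. This is a toy model rather than Lucene's code: the `Strategy` enum, the `choose` helper, and the use of `minCompetitiveScore > 0` as the signal that dynamic pruning has kicked in are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Toy model (not Lucene code) of picking a conjunction strategy per window. */
class WindowStrategySketch {

  enum Strategy { DOC_FIRST, SCORE_FIRST }

  // Assumption: a positive minimum competitive score means the collector has
  // started dynamic pruning; until then, plain doc-at-a-time evaluation is kept.
  static Strategy choose(float minCompetitiveScore) {
    return minCompetitiveScore > 0f ? Strategy.SCORE_FIRST : Strategy.DOC_FIRST;
  }

  // Doc-first evaluation of a two-clause conjunction: advance whichever sorted
  // doc ID list is behind and emit the doc IDs present in both.
  static int[] intersect(int[] a, int[] b) {
    int[] out = new int[Math.min(a.length, b.length)];
    int i = 0, j = 0, n = 0;
    while (i < a.length && j < b.length) {
      if (a[i] == b[j]) {
        out[n++] = a[i];
        i++;
        j++;
      } else if (a[i] < b[j]) {
        i++;
      } else {
        j++;
      }
    }
    return Arrays.copyOf(out, n);
  }

  public static void main(String[] args) {
    System.out.println(choose(0f));
    System.out.println(choose(1.5f));
    System.out.println(
        Arrays.toString(intersect(new int[] {1, 3, 5, 8}, new int[] {3, 4, 8, 9})));
  }
}
```

In this model, a query that never collects enough hits to raise the minimum competitive score stays on the doc-first path for its whole run, which is the case the description targets.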

jpountz · 2025-05-31T14:10:52Z

On wikibigall:

                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
             And2Terms2StopWords      183.51     (11.4%)      180.04     (11.7%)   -1.9% ( -22% -   23%) 0.676
                        Or3Terms      185.48     (12.5%)      182.55     (13.2%)   -1.6% ( -24% -   27%) 0.753
                       And3Terms      193.71     (13.1%)      190.89     (13.3%)   -1.5% ( -24% -   28%) 0.778
                    AndStopWords       33.42     (15.4%)       32.97     (17.6%)   -1.4% ( -29% -   37%) 0.835
                            Term      579.62     (10.7%)      574.50     (10.2%)   -0.9% ( -19% -   22%) 0.830
                        PKLookup      310.82      (4.4%)      308.13      (4.4%)   -0.9% (  -9% -    8%) 0.618
              Or2Terms2StopWords      179.34     (10.9%)      177.86     (11.8%)   -0.8% ( -21% -   24%) 0.853
                       CountTerm     8514.91      (2.6%)     8446.93      (3.2%)   -0.8% (  -6% -    5%) 0.486
                     OrStopWords       36.05     (16.3%)       35.79     (18.8%)   -0.7% ( -30% -   41%) 0.916
                     AndHighHigh       53.90     (15.2%)       53.54     (14.8%)   -0.7% ( -26% -   34%) 0.911
                FilteredOr3Terms      164.88      (1.3%)      163.93      (1.6%)   -0.6% (  -3% -    2%) 0.313
                 AndHighOrMedMed       46.05      (2.0%)       45.81      (2.0%)   -0.5% (  -4% -    3%) 0.509
                      AndHighMed      177.64     (12.2%)      176.85     (12.1%)   -0.4% ( -22% -   27%) 0.925
               TermDayOfYearSort      285.54      (2.0%)      284.27      (2.4%)   -0.4% (  -4% -    4%) 0.608
               FilteredOrHighMed      150.96      (1.4%)      150.30      (1.2%)   -0.4% (  -2% -    2%) 0.377
                          IntNRQ      306.35      (1.0%)      305.04      (1.4%)   -0.4% (  -2% -    1%) 0.365
                      DismaxTerm      671.60      (8.4%)      668.73      (7.0%)   -0.4% ( -14% -   16%) 0.888
      FilteredOr2Terms2StopWords      145.30      (1.1%)      144.83      (1.3%)   -0.3% (  -2% -    2%) 0.494
                     CountPhrase        4.10      (3.2%)        4.09      (3.5%)   -0.3% (  -6% -    6%) 0.839
                   TermMonthSort     3176.82      (2.5%)     3168.80      (3.3%)   -0.3% (  -5% -    5%) 0.826
                      TermDTSort      390.49      (3.5%)      389.77      (4.6%)   -0.2% (  -7% -    8%) 0.909
             CountFilteredPhrase       25.30      (2.1%)       25.26      (2.1%)   -0.2% (  -4% -    4%) 0.831
                IntervalsOrdered        2.25      (2.9%)        2.25      (3.6%)   -0.2% (  -6% -    6%) 0.902
             FilteredOrStopWords       44.91      (1.8%)       44.84      (2.0%)   -0.1% (  -3% -    3%) 0.851
                  FilteredIntNRQ      298.82      (1.0%)      298.50      (1.4%)   -0.1% (  -2% -    2%) 0.828
              FilteredOrHighHigh       66.44      (1.5%)       66.37      (1.2%)   -0.1% (  -2% -    2%) 0.850
                AndMedOrHighHigh       78.56      (6.2%)       78.49      (6.2%)   -0.1% ( -11% -   13%) 0.974
                          Fuzzy1       95.30      (2.7%)       95.28      (4.2%)   -0.0% (  -6% -    7%) 0.988
                      OrHighRare      283.39      (8.2%)      283.47      (7.0%)    0.0% ( -14% -   16%) 0.992
         CountFilteredOrHighHigh      136.15      (0.9%)      136.36      (0.5%)    0.2% (  -1% -    1%) 0.610
          CountFilteredOrHighMed      147.59      (0.8%)      147.92      (0.5%)    0.2% (  -1% -    1%) 0.375
                 CountAndHighMed      311.40      (1.5%)      312.38      (1.5%)    0.3% (  -2% -    3%) 0.591
                  FilteredOrMany       16.49      (1.5%)       16.54      (1.8%)    0.3% (  -2% -    3%) 0.612
                        Wildcard       91.39      (1.6%)       91.70      (2.2%)    0.3% (  -3% -    4%) 0.662
             CountFilteredOrMany       27.40      (1.6%)       27.50      (1.0%)    0.4% (  -2% -    2%) 0.448
                  CountOrHighMed      366.59      (2.3%)      368.09      (1.9%)    0.4% (  -3% -    4%) 0.616
                          Fuzzy2       80.77      (2.3%)       81.12      (3.5%)    0.4% (  -5% -    6%) 0.709
                  FilteredPhrase       32.73      (1.6%)       32.88      (1.8%)    0.4% (  -2% -    3%) 0.499
                          OrMany       19.65      (8.7%)       19.76      (9.9%)    0.5% ( -16% -   20%) 0.882
                       OrHighMed      234.04     (10.3%)      235.53     (10.6%)    0.6% ( -18% -   23%) 0.877
                          Phrase       14.20      (3.0%)       14.30      (2.7%)    0.7% (  -4% -    6%) 0.554
                    CombinedTerm       31.45      (3.6%)       31.66      (2.6%)    0.7% (  -5% -    7%) 0.590
                 DismaxOrHighMed      178.26      (6.7%)      179.54      (6.9%)    0.7% ( -12% -   15%) 0.788
                    FilteredTerm      158.20      (2.4%)      159.52      (2.0%)    0.8% (  -3% -    5%) 0.341
               CombinedOrHighMed       72.81      (2.3%)       73.42      (1.9%)    0.8% (  -3% -    5%) 0.317
                DismaxOrHighHigh      118.90      (7.3%)      119.91      (7.1%)    0.9% ( -12% -   16%) 0.763
                 CountOrHighHigh      345.64      (2.2%)      348.61      (2.1%)    0.9% (  -3% -    5%) 0.312
                   TermTitleSort       86.50      (5.5%)       87.40      (7.2%)    1.0% ( -11% -   14%) 0.681
                         Prefix3      159.82      (2.4%)      161.51      (2.9%)    1.1% (  -4% -    6%) 0.312
                CountAndHighHigh      357.97      (1.9%)      361.80      (2.0%)    1.1% (  -2% -    5%) 0.159
                 FilteredPrefix3      150.59      (2.1%)      152.34      (2.5%)    1.2% (  -3% -    5%) 0.209
                     CountOrMany       30.36      (1.5%)       30.74      (1.6%)    1.2% (  -1% -    4%) 0.044
                      OrHighHigh       62.49     (13.6%)       63.44     (14.5%)    1.5% ( -23% -   34%) 0.783
              CombinedOrHighHigh       18.43      (3.6%)       18.73      (1.8%)    1.6% (  -3% -    7%) 0.148
              CombinedAndHighMed       39.74      (2.9%)       43.17      (1.8%)    8.6% (   3% -   13%) 0.000
             FilteredAndHighHigh       57.58      (6.9%)       63.06      (2.7%)    9.5% (   0% -   20%) 0.000
            FilteredAndStopWords       40.19      (5.9%)       44.35      (2.4%)   10.3% (   1% -   19%) 0.000
     FilteredAnd2Terms2StopWords      158.44      (5.4%)      177.31      (2.8%)   11.9% (   3% -   21%) 0.000
              FilteredAndHighMed      118.43      (9.5%)      134.44      (8.0%)   13.5% (  -3% -   34%) 0.000
             CombinedAndHighHigh       11.20      (3.4%)       12.98      (2.2%)   15.9% (   9% -   22%) 0.000
               FilteredAnd3Terms      159.89      (7.2%)      189.66      (5.6%)   18.6% (   5% -   33%) 0.000
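
To read the table, it helps to reproduce one row by hand. The sketch below assumes that "Pct diff" is the relative change in mean QPS from the baseline to the modified version; the class and method names are hypothetical, and the benchmark tool's exact rounding is not shown in this thread.

```java
/** Recomputes the "Pct diff" column from two mean QPS values (assumed formula). */
class PctDiffSketch {

  // Relative change of the candidate over the baseline, in percent.
  static double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
  }

  // Rounds to one decimal place, matching the precision shown in the table.
  static double roundToTenth(double value) {
    return Math.round(value * 10.0) / 10.0;
  }

  public static void main(String[] args) {
    // FilteredAnd3Terms row: 159.89 QPS baseline, 189.66 QPS modified.
    System.out.println(roundToTenth(pctDiff(159.89, 189.66)));
    // CombinedAndHighMed row: 39.74 QPS baseline, 43.17 QPS modified.
    System.out.println(roundToTenth(pctDiff(39.74, 43.17)));
  }
}
```

Under that assumption, both rows come out at the values reported in the table (18.6% and 8.6%).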

gf2121 · 2025-05-31T15:28:40Z

lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java

+        float maxWindowScore = computeMaxScore(windowMin, windowMax);
+        scoreWindowScoreFirst(collector, acceptDocs, windowMin, windowMax + 1, maxWindowScore);
+      } else {
+        scoreWindowDocFirst(collector, acceptDocs, windowMin, windowMax + 1);


So `minCompetitiveScore` won't get a chance to be respected when a filter clause leads the query, because `windowMax` is then `DocIdSetIterator#NO_MORE_DOCS`. Could this cause a regression?

jpountz · 2025-06-01

I believe we've always had this problem? I remember trying to make things better, but the result either didn't look great or caused performance regressions with term queries, which is the case I care about the most.

gf2121 · 2025-06-01

> I believe we've always had this problem?

I agree that the previous version could not skip windows, but within a window it only needed to evaluate the conjunction on the competitive docs, while this PR could evaluate more.

I'm not sure how much this will matter, though. The FilteredAndHighHigh tasks should exercise a similar case, and their numbers do not look bad. Let's move on.
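
The trade-off raised in this thread can be illustrated with a toy count of how many docs in a window still need to be checked against the remaining clauses. This is a simplified model, not the logic of `BlockMaxConjunctionBulkScorer`: the method names, the per-doc lead scores, and the single `maxOtherScore` upper bound are all assumptions for illustration.

```java
/** Toy count (not Lucene code) of docs checked against the other clauses in one window. */
class CompetitiveFilterSketch {

  // Doc-first: every doc matched by the lead clause is checked against the rest.
  static int candidatesDocFirst(float[] leadScores) {
    return leadScores.length;
  }

  // Score-first (assumed model): a doc is kept only if its lead score plus the
  // best possible contribution of the other clauses can still be competitive.
  static int candidatesScoreFirst(
      float[] leadScores, float maxOtherScore, float minCompetitiveScore) {
    int kept = 0;
    for (float score : leadScores) {
      if (score + maxOtherScore >= minCompetitiveScore) {
        kept++;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    float[] leadScores = {0.5f, 2.0f, 1.0f, 3.0f};
    System.out.println(candidatesDocFirst(leadScores));
    System.out.println(candidatesScoreFirst(leadScores, 1.0f, 2.5f));
  }
}
```

In this model the two counts are equal while the minimum competitive score is still zero, and they diverge only once it rises, which is why a window spanning the whole doc ID space could leave some pruning unused.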

jpountz · 2025-06-06T07:43:17Z

This restored the performance of filtered queries on nightly benchmarks: https://benchmarks.mikemccandless.com/FilteredAnd3Terms.html. It also improved combined conjunctions: https://benchmarks.mikemccandless.com/CombinedAndHighHigh.html. I'll push an annotation.

jpountz added this to the 10.3.0 milestone May 31, 2025

jpountz added the skip-changelog-check label May 31, 2025

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking May 31, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking May 31, 2025

github-actions bot added the module:core/search label May 31, 2025

gf2121 reviewed May 31, 2025

jpountz merged commit 17a40bd into apache:main Jun 2, 2025
7 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Jun 2, 2025

jpountz deleted the doc_at_a_time_when_ramping_up branch June 2, 2025 15:03
