@gf2121 gf2121 commented May 24, 2025

This tries to speed up TermQuery using the new Scorer#nextDocsAndScores API.
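As a rough illustration of the batch-at-a-time idea behind this description, here is a minimal sketch that does not use Lucene at all. Every name in it (`Batch`, `ToyPostings`, `nextBatch`, `toyScore`, `bestDoc`) is hypothetical, and the score function is arbitrary. It only shows a consumer pulling doc IDs and scores in batches rather than one document per call; the real signature and behaviour of Scorer#nextDocsAndScores may differ.

```java
// Toy model, not Lucene code: every type and method name here is hypothetical.
final class BatchScoringSketch {

  /** Reusable buffer holding one batch of doc IDs and their scores. */
  static final class Batch {
    final int[] docs;
    final float[] scores;
    int size;

    Batch(int capacity) {
      docs = new int[capacity];
      scores = new float[capacity];
    }
  }

  /** Sorted doc IDs with one term frequency per doc. */
  static final class ToyPostings {
    private final int[] docs;
    private final int[] freqs;
    private int pos;

    ToyPostings(int[] docs, int[] freqs) {
      this.docs = docs;
      this.freqs = freqs;
    }

    /** Fills the batch with the next entries whose doc ID is below upTo; size 0 means no more hits. */
    void nextBatch(int upTo, Batch batch) {
      batch.size = 0;
      while (pos < docs.length && docs[pos] < upTo && batch.size < batch.docs.length) {
        batch.docs[batch.size] = docs[pos];
        batch.scores[batch.size] = toyScore(freqs[pos]);
        batch.size++;
        pos++;
      }
    }
  }

  /** Arbitrary saturating score function, chosen only for illustration. */
  static float toyScore(int freq) {
    return freq / (freq + 1.2f);
  }

  /** Returns the doc ID with the highest score (earliest doc wins ties), or -1 if there are no hits. */
  static int bestDoc(ToyPostings postings, int maxDoc, int batchSize) {
    Batch batch = new Batch(batchSize);
    int best = -1;
    float bestScore = Float.NEGATIVE_INFINITY;
    // One call per batch, then a tight loop over plain arrays.
    for (postings.nextBatch(maxDoc, batch); batch.size > 0; postings.nextBatch(maxDoc, batch)) {
      for (int i = 0; i < batch.size; i++) {
        if (batch.scores[i] > bestScore) {
          bestScore = batch.scores[i];
          best = batch.docs[i];
        }
      }
    }
    return best;
  }
}
```

The result does not depend on the batch size; only the number of calls into the postings source changes.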

TopN

                             Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                   TermMonthSort     3577.64      (3.6%)     3491.60      (5.4%)   -2.4% ( -10% -    6%) 0.290
                          OrMany        9.72      (4.0%)        9.56      (4.2%)   -1.6% (  -9% -    6%) 0.424
                AndMedOrHighHigh       51.22      (3.4%)       50.57      (4.7%)   -1.3% (  -9% -    7%) 0.538
                 FilteredPrefix3      647.69      (1.6%)      640.85      (4.0%)   -1.1% (  -6% -    4%) 0.488
                          IntSet      634.61      (2.2%)      628.14      (3.1%)   -1.0% (  -6% -    4%) 0.447
                     CountPhrase        3.98      (3.0%)        3.94      (3.1%)   -0.9% (  -6% -    5%) 0.570
               CombinedOrHighMed       91.42      (1.5%)       90.67      (2.7%)   -0.8% (  -4% -    3%) 0.448
                       CountTerm    10500.02      (9.4%)    10419.36      (8.7%)   -0.8% ( -17% -   19%) 0.865
              CombinedOrHighHigh       12.61      (1.4%)       12.52      (3.6%)   -0.7% (  -5% -    4%) 0.586
               FilteredOrHighMed      163.12      (3.4%)      162.07      (3.6%)   -0.6% (  -7% -    6%) 0.713
               TermDayOfYearSort      328.81      (0.9%)      327.11      (1.7%)   -0.5% (  -3% -    2%) 0.447
                          Fuzzy1      113.73      (2.6%)      113.32      (2.4%)   -0.4% (  -5% -    4%) 0.771
             CountFilteredIntNRQ       21.75      (1.8%)       21.68      (1.7%)   -0.3% (  -3% -    3%) 0.717
          CountFilteredOrHighMed       50.21      (3.8%)       50.05      (3.8%)   -0.3% (  -7% -    7%) 0.868
             CombinedAndHighHigh       10.29      (1.6%)       10.27      (3.0%)   -0.3% (  -4% -    4%) 0.823
         CountFilteredOrHighHigh       42.70      (3.3%)       42.59      (3.5%)   -0.3% (  -6% -    6%) 0.884
                  FilteredIntNRQ       37.85      (0.5%)       37.76      (0.8%)   -0.2% (  -1% -    1%) 0.488
              CombinedAndHighMed       81.00      (1.7%)       80.84      (2.3%)   -0.2% (  -4% -    3%) 0.852
                  FilteredOrMany        7.31      (1.9%)        7.30      (1.8%)   -0.1% (  -3% -    3%) 0.881
             CountFilteredOrMany       13.60      (2.3%)       13.58      (2.2%)   -0.1% (  -4% -    4%) 0.910
                          Fuzzy2      126.14      (2.5%)      126.00      (3.2%)   -0.1% (  -5% -    5%) 0.938
                          IntNRQ       77.45      (0.4%)       77.44      (0.4%)   -0.0% (   0% -    0%) 0.959
             FilteredAndHighHigh       27.40      (3.1%)       27.40      (4.2%)   -0.0% (  -7% -    7%) 0.999
                         Prefix3       73.55      (1.4%)       73.59      (1.6%)    0.1% (  -2% -    3%) 0.936
                       And3Terms      525.88      (3.5%)      526.24      (3.7%)    0.1% (  -6% -    7%) 0.969
             FilteredOrStopWords       25.33      (3.2%)       25.35      (1.8%)    0.1% (  -4% -    5%) 0.938
                    SloppyPhrase        1.10      (1.3%)        1.10      (1.4%)    0.2% (  -2% -    2%) 0.813
              Or2Terms2StopWords      385.17      (1.3%)      385.85      (2.3%)    0.2% (  -3% -    3%) 0.852
                CountAndHighHigh       83.88      (1.4%)       84.06      (1.0%)    0.2% (  -2% -    2%) 0.717
                FilteredOr3Terms       86.29      (3.5%)       86.49      (2.4%)    0.2% (  -5% -    6%) 0.876
            FilteredAndStopWords       16.87      (4.9%)       16.92      (5.1%)    0.2% (  -9% -   10%) 0.922
                        Wildcard      112.39      (1.3%)      112.67      (1.2%)    0.2% (  -2% -    2%) 0.698
                 AndHighOrMedMed       44.53      (1.1%)       44.67      (1.6%)    0.3% (  -2% -    2%) 0.639
                         Respell       83.94      (0.9%)       84.23      (1.4%)    0.3% (  -1% -    2%) 0.562
                   TermTitleSort      145.80      (1.1%)      146.31      (2.0%)    0.4% (  -2% -    3%) 0.658
      FilteredOr2Terms2StopWords      196.81      (1.7%)      197.59      (2.1%)    0.4% (  -3% -    4%) 0.687
                        SpanNear        6.22      (0.8%)        6.26      (1.0%)    0.5% (  -1% -    2%) 0.243
             CountFilteredPhrase       90.57      (4.0%)       91.05      (2.9%)    0.5% (  -6% -    7%) 0.763
     FilteredAnd2Terms2StopWords      459.38      (2.4%)      462.00      (2.8%)    0.6% (  -4% -    5%) 0.660
                 CountOrHighHigh       83.30      (3.8%)       83.78      (2.1%)    0.6% (  -5% -    6%) 0.704
                      OrHighRare      948.62      (4.7%)      954.83      (2.1%)    0.7% (  -5% -    7%) 0.719
                 DismaxOrHighMed       96.87      (2.6%)       97.53      (4.1%)    0.7% (  -5% -    7%) 0.688
                      AndHighMed      116.14      (3.6%)      117.01      (4.0%)    0.7% (  -6% -    8%) 0.694
                     CountOrMany       11.71      (3.3%)       11.81      (2.2%)    0.8% (  -4% -    6%) 0.565
                          Phrase       12.48      (3.3%)       12.61      (2.5%)    1.0% (  -4% -    6%) 0.502
                     OrStopWords       37.89      (5.4%)       38.27      (7.4%)    1.0% ( -11% -   14%) 0.758
                IntervalsOrdered        2.19      (1.9%)        2.21      (1.4%)    1.0% (  -2% -    4%) 0.239
                       OrHighMed      195.95      (2.5%)      198.05      (4.7%)    1.1% (  -6% -    8%) 0.573
              FilteredOrHighHigh       28.37      (3.2%)       28.71      (1.6%)    1.2% (  -3% -    6%) 0.352
              FilteredAndHighMed      112.41      (3.2%)      113.89      (3.8%)    1.3% (  -5% -    8%) 0.452
                  FilteredPhrase       17.13      (4.4%)       17.36      (2.8%)    1.3% (  -5% -    8%) 0.469
                    FilteredTerm      128.12      (4.5%)      129.88      (4.8%)    1.4% (  -7% -   11%) 0.553
               FilteredAnd3Terms      118.53      (3.4%)      120.22      (3.4%)    1.4% (  -5% -    8%) 0.405
                DismaxOrHighHigh       93.55      (3.4%)       94.92      (4.4%)    1.5% (  -6% -    9%) 0.457
                  CountOrHighMed      159.93      (5.6%)      162.51      (5.6%)    1.6% (  -9% -   13%) 0.567
                 CountAndHighMed      133.67      (5.4%)      135.90      (5.4%)    1.7% (  -8% -   13%) 0.535
             And2Terms2StopWords       39.19      (4.3%)       39.94      (6.2%)    1.9% (  -8% -   12%) 0.470
                    CombinedTerm       25.29      (1.3%)       25.78      (2.0%)    1.9% (  -1% -    5%) 0.022
                      OrHighHigh       25.46      (4.2%)       25.96      (7.7%)    2.0% (  -9% -   14%) 0.519
                        Or3Terms      127.74      (5.3%)      130.56      (4.4%)    2.2% (  -7% -   12%) 0.365
                    AndStopWords       37.41      (5.7%)       38.25      (6.7%)    2.2% (  -9% -   15%) 0.473
                      TermDTSort      369.80      (6.3%)      379.16      (8.2%)    2.5% ( -11% -   18%) 0.490
                     AndHighHigh       82.62      (4.4%)       85.31      (3.3%)    3.3% (  -4% -   11%) 0.093
                      DismaxTerm      896.35      (3.7%)     1210.67      (8.1%)   35.1% (  22% -   48%) 0.000
                            Term      981.81      (3.7%)     1344.47      (6.9%)   36.9% (  25% -   49%) 0.000

Exhaustive

                             Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
              FilteredAndHighMed      204.99      (1.6%)      201.23      (1.8%)   -1.8% (  -5% -    1%) 0.035
                          IntNRQ        8.18      (6.0%)        8.05      (5.2%)   -1.6% ( -12% -   10%) 0.556
                   TermMonthSort     3497.23      (5.2%)     3441.48      (1.6%)   -1.6% (  -8% -    5%) 0.409
                CountAndHighHigh       94.33      (2.7%)       92.87      (3.0%)   -1.5% (  -7% -    4%) 0.284
                       And3Terms      584.28      (3.2%)      575.55      (3.7%)   -1.5% (  -8% -    5%) 0.383
                          OrMany        0.97      (5.6%)        0.96      (9.2%)   -1.3% ( -15% -   14%) 0.733
                     CountOrMany       11.97      (4.6%)       11.83      (4.3%)   -1.2% (  -9% -    8%) 0.597
               FilteredAnd3Terms      139.54      (2.7%)      138.19      (2.3%)   -1.0% (  -5% -    4%) 0.441
                 CountOrHighHigh       95.76      (3.0%)       94.86      (3.4%)   -0.9% (  -7% -    5%) 0.556
                 CountAndHighMed      138.99      (4.2%)      137.74      (2.6%)   -0.9% (  -7% -    6%) 0.606
                AndMedOrHighHigh       54.89      (1.9%)       54.44      (2.3%)   -0.8% (  -4% -    3%) 0.441
                      TermDTSort      226.01      (1.4%)      224.45      (1.1%)   -0.7% (  -3% -    1%) 0.278
               TermDayOfYearSort      284.64      (0.7%)      282.69      (1.2%)   -0.7% (  -2% -    1%) 0.180
                  CountOrHighMed      171.81      (2.8%)      170.78      (1.9%)   -0.6% (  -5% -    4%) 0.620
                          IntSet      597.04      (2.5%)      593.54      (2.1%)   -0.6% (  -5% -    4%) 0.617
             CountFilteredOrMany       13.33      (1.5%)       13.26      (1.5%)   -0.5% (  -3% -    2%) 0.479
             FilteredAndHighHigh       34.56      (1.8%)       34.39      (1.9%)   -0.5% (  -4% -    3%) 0.596
         CountFilteredOrHighHigh       38.70      (1.5%)       38.53      (1.7%)   -0.5% (  -3% -    2%) 0.567
                     AndHighHigh       10.80      (0.7%)       10.76      (1.1%)   -0.4% (  -2% -    1%) 0.377
                   TermTitleSort      169.21      (2.5%)      168.52      (1.4%)   -0.4% (  -4% -    3%) 0.692
                         Respell       82.28      (1.9%)       81.95      (2.0%)   -0.4% (  -4% -    3%) 0.682
                        Wildcard       37.88      (4.5%)       37.81      (4.2%)   -0.2% (  -8% -    8%) 0.926
      FilteredOr2Terms2StopWords        9.94      (2.1%)        9.93      (1.5%)   -0.1% (  -3% -    3%) 0.874
          CountFilteredOrHighMed       53.57      (0.9%)       53.52      (1.2%)   -0.1% (  -2% -    1%) 0.851
             CountFilteredPhrase       91.72      (1.5%)       91.69      (2.0%)   -0.0% (  -3% -    3%) 0.976
                         Prefix3        5.93      (3.2%)        5.93      (3.0%)   -0.0% (  -6% -    6%) 0.990
     FilteredAnd2Terms2StopWords      357.95      (2.1%)      357.99      (1.1%)    0.0% (  -3% -    3%) 0.988
                    AndStopWords       13.39      (1.2%)       13.39      (1.4%)    0.0% (  -2% -    2%) 0.962
                    CombinedTerm       24.93      (1.5%)       24.94      (1.6%)    0.0% (  -2% -    3%) 0.955
                          Phrase        3.84      (2.8%)        3.84      (5.0%)    0.1% (  -7% -    8%) 0.978
                  FilteredIntNRQ       16.94      (0.7%)       16.95      (1.4%)    0.1% (  -1% -    2%) 0.879
                FilteredOr3Terms       21.83      (3.5%)       21.86      (2.6%)    0.1% (  -5% -    6%) 0.935
                 FilteredPrefix3        8.43      (1.4%)        8.44      (1.5%)    0.2% (  -2% -    3%) 0.830
               FilteredOrHighMed       21.71      (0.4%)       21.76      (0.6%)    0.2% (   0% -    1%) 0.387
              FilteredOrHighHigh       16.06      (1.7%)       16.10      (1.5%)    0.3% (  -2% -    3%) 0.749
                      AndHighMed       75.44      (2.1%)       75.64      (2.6%)    0.3% (  -4% -    5%) 0.826
             And2Terms2StopWords      297.11      (1.7%)      297.98      (1.5%)    0.3% (  -2% -    3%) 0.710
                 AndHighOrMedMed       43.16      (1.6%)       43.31      (2.0%)    0.3% (  -3% -    3%) 0.702
             CountFilteredIntNRQ       19.77      (0.4%)       19.85      (0.7%)    0.4% (   0% -    1%) 0.170
              CombinedAndHighMed       61.79      (1.7%)       62.07      (1.5%)    0.4% (  -2% -    3%) 0.583
            FilteredAndStopWords       18.77      (1.4%)       18.85      (1.5%)    0.4% (  -2% -    3%) 0.541
                 DismaxOrHighMed       11.22      (4.2%)       11.27      (4.9%)    0.4% (  -8% -    9%) 0.843
                          Fuzzy2       80.82      (3.2%)       81.20      (2.9%)    0.5% (  -5% -    6%) 0.759
                    FilteredTerm       38.29      (2.0%)       38.48      (1.7%)    0.5% (  -3% -    4%) 0.589
              Or2Terms2StopWords        2.49      (5.2%)        2.50     (11.0%)    0.5% ( -14% -   17%) 0.904
             CombinedAndHighHigh       16.43      (1.7%)       16.52      (1.4%)    0.6% (  -2% -    3%) 0.456
                       CountTerm    10806.25      (6.3%)    10880.88      (8.0%)    0.7% ( -12% -   16%) 0.848
                  FilteredPhrase       82.93      (1.6%)       83.52      (2.0%)    0.7% (  -2% -    4%) 0.426
                        SpanNear       34.73      (4.0%)       34.99      (2.3%)    0.8% (  -5% -    7%) 0.643
                    SloppyPhrase       25.49      (6.8%)       25.70      (4.9%)    0.8% ( -10% -   13%) 0.790
                DismaxOrHighHigh        4.21      (3.7%)        4.25      (4.4%)    0.8% (  -6% -    9%) 0.677
                     CountPhrase        6.48      (2.1%)        6.54      (2.1%)    0.9% (  -3% -    5%) 0.397
                      OrHighRare        4.33      (1.4%)        4.37      (1.2%)    0.9% (  -1% -    3%) 0.163
             FilteredOrStopWords        8.37      (1.3%)        8.46      (1.5%)    1.1% (  -1% -    3%) 0.126
                      DismaxTerm       48.75      (5.9%)       49.34      (3.4%)    1.2% (  -7% -   11%) 0.613
                     OrStopWords        3.33      (4.4%)        3.37     (11.3%)    1.3% ( -13% -   17%) 0.762
                  FilteredOrMany        1.86      (2.1%)        1.88      (1.6%)    1.3% (  -2% -    5%) 0.158
                      OrHighHigh        8.31      (3.8%)        8.43     (11.6%)    1.5% ( -13% -   17%) 0.728
                          Fuzzy1       47.01      (1.5%)       47.72      (5.7%)    1.5% (  -5% -    8%) 0.462
                       OrHighMed        6.75      (4.0%)        6.86     (11.9%)    1.6% ( -13% -   18%) 0.714
                IntervalsOrdered       31.01      (4.7%)       31.59      (3.0%)    1.9% (  -5% -   10%) 0.341
                        Or3Terms       24.10      (0.7%)       24.75      (9.6%)    2.7% (  -7% -   13%) 0.434
               CombinedOrHighMed        6.98      (5.0%)        7.23      (3.9%)    3.6% (  -5% -   13%) 0.116
              CombinedOrHighHigh        1.53      (5.4%)        1.59      (3.7%)    3.9% (  -4% -   13%) 0.093
                            Term       73.22      (1.7%)      119.72      (4.4%)   63.5% (  56% -   70%) 0.000
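The headline numbers in the two tables above can be sanity-checked from the QPS columns. The sketch below assumes that "Pct diff" is the relative change of the mean QPS against the baseline; that reading is an assumption about the benchmark tool, and the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the "Pct diff" column from the two QPS columns,
// assuming it is the relative change of mean QPS against the baseline.
final class PctDiffCheck {

  static double pctDiff(double baselineQps, double candidateQps) {
    return 100.0 * (candidateQps - baselineQps) / baselineQps;
  }

  static String format(double pct) {
    return String.format(Locale.ROOT, "%.1f%%", pct);
  }

  public static void main(String[] args) {
    // TopN, Term row: the table reports 36.9%
    System.out.println(format(pctDiff(981.81, 1344.47)));
    // TopN, DismaxTerm row: the table reports 35.1%
    System.out.println(format(pctDiff(896.35, 1210.67)));
    // Exhaustive, Term row: the table reports 63.5%
    System.out.println(format(pctDiff(73.22, 119.72)));
  }
}
```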

@github-actions

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

@jpountz jpountz left a comment

Nice speedup! Term queries are generally fast, but a term query on "the" is one of the slowest queries in the Tantivy benchmark, so it's nice to get it optimized.

public int score(LeafCollector collector, Bits acceptDocs, int min, int max) throws IOException {
  if (collector.competitiveIterator() != null) {
    return new Weight.DefaultBulkScorer(scorer).score(collector, acceptDocs, min, max);
  }
Contributor

I wonder if this should be an implementation detail of DefaultBulkScorer instead of a different class, doing something like:

if (scoreMode == TOP_SCORES && competitiveIterator == null) {
  // new optimization
} else {
  // existing DefaultBulkScorer code
}

Contributor Author

Thanks for the feedback! I moved the implementation into DefaultBulkScorer.

if (scoreMode == TOP_SCORES && competitiveIterator == null)

As the description shows, exhaustive evaluation gets optimized as well, so I use scoreMode.needsScores instead.
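To make the difference between the two conditions in this thread concrete, here is a small sketch. `ToyScoreMode` is a simplified local stand-in modeled on Lucene's ScoreMode, with only three modes included, and both method names are hypothetical. The point is only that a check on `needsScores` also covers exhaustive scoring, while a check on `TOP_SCORES` does not.

```java
// Toy model: ToyScoreMode is a simplified local stand-in, not Lucene's ScoreMode.
final class BatchPathConditionSketch {

  enum ToyScoreMode {
    COMPLETE(true),            // exhaustive, scores needed
    COMPLETE_NO_SCORES(false), // exhaustive, scores not needed
    TOP_SCORES(true);          // top hits by score

    private final boolean needsScores;

    ToyScoreMode(boolean needsScores) {
      this.needsScores = needsScores;
    }

    boolean needsScores() {
      return needsScores;
    }
  }

  /** Condition as first suggested in review: top-scores collection only. */
  static boolean batchPathTopScoresOnly(ToyScoreMode mode, boolean hasCompetitiveIterator) {
    return mode == ToyScoreMode.TOP_SCORES && !hasCompetitiveIterator;
  }

  /** Condition described in the reply: any mode that needs scores. */
  static boolean batchPathNeedsScores(ToyScoreMode mode, boolean hasCompetitiveIterator) {
    return mode.needsScores() && !hasCompetitiveIterator;
  }
}
```

With `COMPLETE` and no competitive iterator, the first condition is false and the second is true, which matches the speedup reported for the exhaustive Term task.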

@jpountz jpountz left a comment

Sorry about my last suggestion: I had missed that DefaultBulkScorer has no way yet to know whether scores are needed, so I think I like your previous approach a bit better, as it keeps DefaultBulkScorer clean.


if (impactsDisi != null) {
  impactsDisi.ensureCompetitive();
}
Contributor

I wonder if we should rather put it at the beginning of the for loop below. For instance, if the first block of docs returned contains only deleted docs, the loop will fetch a new block. It would be good to check whether that block is competitive before fetching it as well.
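The scenario in this comment can be sketched with a toy model. Nothing below is Lucene code: blocks are reduced to a maximum score and a flag saying whether any live doc remains, and `ensureCompetitive` here is a hypothetical stand-in that simply skips blocks whose maximum score is below the threshold. The real ImpactsDISI behaviour is more involved.

```java
// Toy model, not Lucene code: the arrays and method names below are hypothetical.
final class EnsureCompetitiveSketch {

  /** Skips forward from 'from' to the first block whose max score reaches the threshold. */
  static int ensureCompetitive(float[] blockMaxScores, int from, float minCompetitiveScore) {
    int i = from;
    while (i < blockMaxScores.length && blockMaxScores[i] < minCompetitiveScore) {
      i++;
    }
    return i;
  }

  /** Checks competitiveness once, before the loop. Returns the delivered block index, or -1. */
  static int nextBlockCheckingOnce(float[] maxScores, boolean[] hasLiveDocs, float min) {
    int i = ensureCompetitive(maxScores, 0, min);
    for (; i < maxScores.length; i++) {
      if (hasLiveDocs[i]) {
        return i; // may be a non-competitive block reached after an all-deleted one
      }
    }
    return -1;
  }

  /** Checks competitiveness at the start of every iteration. Returns the delivered block index, or -1. */
  static int nextBlockCheckingEachIteration(float[] maxScores, boolean[] hasLiveDocs, float min) {
    int i = 0;
    while (true) {
      i = ensureCompetitive(maxScores, i, min);
      if (i >= maxScores.length) {
        return -1;
      }
      if (hasLiveDocs[i]) {
        return i;
      }
      i++; // block had only deleted docs: move on, then re-check competitiveness
    }
  }
}
```

With block max scores `{5, 1, 5}`, live-doc flags `{false, true, true}`, and a minimum competitive score of 2, the first variant delivers block 1 even though it cannot produce a competitive hit, while the second skips it and delivers block 2.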

Contributor Author

Oh, nice catch!

gf2121 added 3 commits May 26, 2025 21:09
@github-actions github-actions bot added this to the 10.3.0 milestone May 26, 2025
@gf2121 gf2121 merged commit 12c3041 into apache:main May 26, 2025
7 checks passed
asf-gitbox-commits pushed a commit that referenced this pull request May 26, 2025
// nightly-benchmarks-results-changed //
weizijun added a commit to weizijun/lucene that referenced this pull request May 27, 2025
* main: (32 commits)
  update os.makedirs with pathlib mkdir (apache#14710)
  Optimize AbstractKnnVectorQuery#createBitSet with intoBitset (apache#14674)
  Implement #docIDRunEnd() on PostingsEnum. (apache#14693)
  Speed up TermQuery (apache#14709)
  Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. (apache#14701)
  Fix WindowsFS test failure seen on Policeman Jenkins (apache#14706)
  Use a temporary repository location to download certain ecj versions ("drops") (apache#14703)
  Add assumption to ignore occasional test failures due to disconnected graphs (apache#14696)
  Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match (apache#14700)
  Minor access modifier adjustment to a couple of lucene90 backward compat types (apache#14695)
  Speed up exhaustive evaluation. (apache#14679)
  Specify and test that IOContext is immutable (apache#14686)
  deps(java): bump org.gradle.toolchains.foojay-resolver-convention (apache#14691)
  deps(java): bump org.eclipse.jgit:org.eclipse.jgit (apache#14692)
  Clean up how the test framework creates asserting scorables. (apache#14452)
  Make competitive iterators more robust. (apache#14532)
  Remove DISIDocIdStream. (apache#14550)
  Implement AssertingPostingsEnum#intoBitSet. (apache#14675)
  Fix patience knn queries to work with seeded knn queries (apache#14688)
  Added toString() method to BytesRefBuilder (apache#14676)
  ...
jpountz commented May 31, 2025

This change yielded a good speedup on the nightly benchmarks; I pushed an annotation: https://benchmarks.mikemccandless.com/Term.html
