

@jpountz
Contributor

@jpountz jpountz commented Oct 13, 2023

When we initially introduced support for dynamic pruning, we had an implementation of WAND that almost exclusively used `advance()`. Now that we have switched to MAXSCORE, which relies much more on `nextDoc()`, it makes sense to specialize `nextDoc()` as well.
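The difference between the two call patterns can be sketched with a toy block-based postings iterator. Everything here is hypothetical and much simpler than Lucene's actual `BlockImpactsDocsEnum`: the class name `BlockPostingsSketch`, the 4-doc blocks held in a plain `int[]`, and the `skipDataLookups` counter are assumptions for illustration only. A specialized `nextDoc()` just moves a cursor and decodes the next block when the current one runs out, whereas `advance(target)` has to compare against `target`, consult per-block skip data and scan inside the block, which is wasted work when `target` is always `docID() + 1`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified block-based postings iterator. It is NOT
// Lucene's BlockImpactsDocsEnum; it only shows why a dedicated nextDoc() can do
// less work per call than nextDoc() implemented as advance(docID() + 1).
final class BlockPostingsSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  static final int BLOCK_SIZE = 4; // tiny on purpose; real codecs use larger, compressed blocks

  private final int[] docs;                         // sorted doc IDs, standing in for encoded blocks
  private final int[] buffer = new int[BLOCK_SIZE]; // the currently "decoded" block
  private int bufferLen = 0;      // valid entries in buffer
  private int bufferUpto = 0;     // next entry to return from buffer
  private int nextBlockStart = 0; // index in docs of the next undecoded block
  private int doc = -1;
  int skipDataLookups = 0;        // instrumentation: how often "skip data" was consulted

  BlockPostingsSketch(int[] docs) { this.docs = docs; }

  int docID() { return doc; }

  private void decodeBlock(int start) {
    int end = Math.min(start + BLOCK_SIZE, docs.length);
    bufferLen = end - start;
    System.arraycopy(docs, start, buffer, 0, bufferLen); // stands in for real decoding
    bufferUpto = 0;
    nextBlockStart = end;
  }

  // Specialized nextDoc(): no target to compare against, no skip data to consult.
  int nextDoc() {
    if (bufferUpto == bufferLen) {
      if (nextBlockStart >= docs.length) return doc = NO_MORE_DOCS;
      decodeBlock(nextBlockStart);
    }
    return doc = buffer[bufferUpto++];
  }

  // Generic advance(target): uses each block's last doc ID as "skip data" to jump
  // over whole blocks, then scans linearly within the decoded block.
  int advance(int target) {
    if (bufferUpto == bufferLen || buffer[bufferLen - 1] < target) {
      while (true) {
        if (nextBlockStart >= docs.length) return doc = NO_MORE_DOCS;
        int end = Math.min(nextBlockStart + BLOCK_SIZE, docs.length);
        skipDataLookups++;
        if (docs[end - 1] >= target) break;
        nextBlockStart = end; // skip this block without decoding it
      }
      decodeBlock(nextBlockStart);
    }
    while (buffer[bufferUpto] < target) bufferUpto++;
    return doc = buffer[bufferUpto++];
  }

  // Iterates over all docs either via nextDoc() or via advance(docID() + 1).
  static List<Integer> collect(BlockPostingsSketch it, boolean useNextDoc) {
    List<Integer> out = new ArrayList<>();
    while (true) {
      int d = useNextDoc ? it.nextDoc() : it.advance(it.docID() + 1);
      if (d == NO_MORE_DOCS) return out;
      out.add(d);
    }
  }

  public static void main(String[] args) {
    int[] postings = {1, 3, 5, 8, 13, 21, 34, 55, 89, 144};
    BlockPostingsSketch a = new BlockPostingsSketch(postings);
    BlockPostingsSketch b = new BlockPostingsSketch(postings);
    System.out.println(collect(a, true) + " skip lookups=" + a.skipDataLookups);
    System.out.println(collect(b, false) + " skip lookups=" + b.skipDataLookups);
  }
}
```

Both iteration styles return the same documents; only the `advance(docID() + 1)` path pays for skip-data lookups and per-call comparisons against `target`.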

@jpountz jpountz added this to the 9.9.0 milestone Oct 13, 2023
@jpountz
Contributor Author

jpountz commented Oct 13, 2023

Results on wikibigall. Both the baseline and the contender have #12668.

                         Task  QPS baseline      StdDev  QPS my_modified_version      StdDev                Pct diff p-value
                          IntNRQ      279.07      (9.5%)      269.35      (6.9%)   -3.5% ( -18% -   14%) 0.185
                         Prefix3      210.18      (3.2%)      207.28      (4.2%)   -1.4% (  -8% -    6%) 0.239
                  CountOrHighMed       87.59     (13.7%)       86.77     (14.3%)   -0.9% ( -25% -   31%) 0.832
                 CountOrHighHigh       56.41     (13.8%)       55.89     (14.6%)   -0.9% ( -25% -   31%) 0.836
                       LowPhrase       23.15      (3.7%)       23.00      (4.2%)   -0.7% (  -8% -    7%) 0.598
                        Wildcard       36.09      (3.7%)       35.90      (3.8%)   -0.5% (  -7% -    7%) 0.652
           HighTermDayOfYearSort      249.90      (2.0%)      249.63      (1.8%)   -0.1% (  -3% -    3%) 0.857
                         Respell       69.31      (1.8%)       69.38      (2.4%)    0.1% (  -3% -    4%) 0.876
                       MedPhrase       28.69      (2.4%)       28.73      (3.5%)    0.1% (  -5% -    6%) 0.894
                      HighPhrase       54.89      (2.8%)       55.06      (3.7%)    0.3% (  -6% -    7%) 0.771
                        PKLookup      224.50      (2.9%)      226.08      (2.6%)    0.7% (  -4% -    6%) 0.415
                     CountPhrase        4.40      (3.6%)        4.43      (4.8%)    0.8% (  -7% -    9%) 0.542
                          Fuzzy2       79.33      (1.8%)       80.04      (1.6%)    0.9% (  -2% -    4%) 0.092
                          Fuzzy1      116.13      (1.9%)      117.20      (1.7%)    0.9% (  -2% -    4%) 0.106
                       CountTerm    16592.06      (5.5%)    16762.13      (6.2%)    1.0% ( -10% -   13%) 0.580
                        HighTerm      377.81      (7.9%)      381.70      (7.1%)    1.0% ( -12% -   17%) 0.665
                         MedTerm      481.56      (7.2%)      487.41      (6.3%)    1.2% ( -11% -   15%) 0.570
                         LowTerm     1072.51      (6.8%)     1086.51      (6.0%)    1.3% ( -10% -   15%) 0.522
                      AndHighLow      996.73      (2.3%)     1013.39      (3.8%)    1.7% (  -4% -    8%) 0.096
                CountAndHighHigh       40.50      (3.6%)       41.23      (5.0%)    1.8% (  -6% -   10%) 0.187
               HighTermMonthSort     4998.32      (3.8%)     5092.15      (2.2%)    1.9% (  -3% -    8%) 0.056
                 CountAndHighMed      122.05      (2.8%)      124.41      (4.5%)    1.9% (  -5% -    9%) 0.099
                      AndHighMed      140.41      (2.8%)      145.21      (4.1%)    3.4% (  -3% -   10%) 0.002
                     AndHighHigh       51.03      (3.4%)       53.21      (4.6%)    4.3% (  -3% -   12%) 0.001
                       OrHighMed      183.06      (3.7%)      191.28      (5.0%)    4.5% (  -4% -   13%) 0.001
                       OrHighLow      598.61      (3.0%)      627.58      (3.7%)    4.8% (  -1% -   11%) 0.000
                      OrHighHigh       67.20      (6.2%)       71.31      (6.2%)    6.1% (  -5% -   19%) 0.002
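The Pct diff column appears to be the relative change in mean QPS from baseline to contender. Assuming that formula (luceneutil's exact computation is not shown here and may differ), the rows can be checked by hand; for example, OrHighHigh gives (71.31 - 67.20) / 67.20 ≈ 6.1%. The `PctDiff` class below is a hypothetical helper, not part of luceneutil.

```java
import java.util.Locale;

// Hypothetical helper: relative QPS change, in percent, from baseline to contender.
// Assumes Pct diff = (contender - baseline) / baseline; luceneutil's formula may differ.
final class PctDiff {
  static double pctDiff(double baselineQps, double contenderQps) {
    return (contenderQps - baselineQps) / baselineQps * 100.0;
  }

  public static void main(String[] args) {
    // Mean QPS values copied from the corresponding rows of the table above.
    String[] tasks = {"OrHighHigh", "OrHighMed", "AndHighHigh", "IntNRQ"};
    double[][] qps = {{67.20, 71.31}, {183.06, 191.28}, {51.03, 53.21}, {279.07, 269.35}};
    for (int i = 0; i < tasks.length; i++) {
      System.out.printf(Locale.ROOT, "%s: %.1f%%%n", tasks[i], pctDiff(qps[i][0], qps[i][1]));
    }
  }
}
```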

Disjunctions see a bigger improvement than conjunctions, which makes sense since disjunctions tend to rely more on `nextDoc()` and less on `advance()`.
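The reasoning can be illustrated with textbook algorithms rather than Lucene's actual scorers: a leapfrog conjunction lets the lead clause call `nextDoc()` and brings the other clause up to date with `advance()`, while a heap-based disjunction has to visit every match of every clause and therefore moves each clause forward with `nextDoc()`. The `CountingIterator` class and its `conjunction`/`disjunction` helpers are hypothetical; they only count which method each algorithm calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical doc-ID iterator over a sorted int[] that counts nextDoc()/advance() calls.
final class CountingIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs;
  private int idx = -1;
  int nextDocCalls = 0, advanceCalls = 0;

  CountingIterator(int... docs) { this.docs = docs; }

  int docID() { return idx < 0 ? -1 : idx >= docs.length ? NO_MORE_DOCS : docs[idx]; }

  int nextDoc() {
    nextDocCalls++;
    idx++;
    return docID();
  }

  int advance(int target) { // assumes target > docID(), as in Lucene's contract
    advanceCalls++;
    do { idx++; } while (idx < docs.length && docs[idx] < target);
    return docID();
  }

  // Textbook leapfrog conjunction: the lead uses nextDoc(), the other clause catches up with advance().
  static List<Integer> conjunction(CountingIterator lead, CountingIterator other) {
    List<Integer> out = new ArrayList<>();
    int doc = lead.nextDoc();
    while (doc != NO_MORE_DOCS) {
      int o = other.docID() < doc ? other.advance(doc) : other.docID();
      if (o == doc) {
        out.add(doc);
        doc = lead.nextDoc();
      } else if (o == NO_MORE_DOCS) {
        break;
      } else {
        doc = lead.advance(o); // o > doc: let the lead catch up
      }
    }
    return out;
  }

  // Textbook heap-based disjunction: every clause is moved forward with nextDoc().
  static List<Integer> disjunction(CountingIterator... clauses) {
    PriorityQueue<CountingIterator> heap =
        new PriorityQueue<>((x, y) -> Integer.compare(x.docID(), y.docID()));
    for (CountingIterator c : clauses) {
      if (c.nextDoc() != NO_MORE_DOCS) heap.add(c);
    }
    List<Integer> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int doc = heap.peek().docID();
      out.add(doc);
      // Pop every clause positioned on doc, step it with nextDoc(), and re-insert it.
      while (!heap.isEmpty() && heap.peek().docID() == doc) {
        CountingIterator top = heap.poll();
        if (top.nextDoc() != NO_MORE_DOCS) heap.add(top);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    CountingIterator dense = new CountingIterator(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    CountingIterator sparse = new CountingIterator(5, 10);
    System.out.println("AND " + conjunction(sparse, dense)
        + " dense: nextDoc=" + dense.nextDocCalls + " advance=" + dense.advanceCalls);
    dense = new CountingIterator(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
    sparse = new CountingIterator(5, 10);
    System.out.println("OR  " + disjunction(dense, sparse)
        + " dense: nextDoc=" + dense.nextDocCalls + " advance=" + dense.advanceCalls);
  }
}
```

On these inputs the conjunction only calls `advance()` on the dense clause, while the disjunction never calls `advance()` at all, so a faster `nextDoc()` benefits the disjunction most.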

Contributor

@romseygeek romseygeek left a comment


LGTM

@jpountz jpountz merged commit 218edde into apache:main Oct 17, 2023
@jpountz jpountz deleted the optimize_impacts_enum_next_doc branch October 17, 2023 12:14
jpountz added a commit that referenced this pull request Oct 17, 2023
@jpountz
Contributor Author

jpountz commented Oct 18, 2023

This yielded a noticeable speedup on OrHighHigh and OrHighMed. I'll add an annotation.

clayburn added a commit to runningcode/lucene that referenced this pull request Oct 20, 2023
…ache.org

* upstream/main: (239 commits)
  Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation (apache#12633)
  Fix index out of bounds when writing FST to different metaOut (apache#12697) (apache#12698)
  Avoid object construction when linear searching arcs (apache#12692)
  chore: update the Javadoc example in Analyzer (apache#12693)
  coorect position on entry in CHANGES.txt
  Refactor ByteBlockPool so it is just a "shift/mask big array" (apache#12625)
  Extract the hnsw graph merging from being part of the vector writer (apache#12657)
  Specialize `BlockImpactsDocsEnum#nextDoc()`. (apache#12670)
  Speed up TestIndexOrDocValuesQuery. (apache#12672)
  Remove over-counting of deleted terms (apache#12586)
  Use MergeSorter in StableStringSorter (apache#12652)
  Use radix sort to speed up the sorting of terms in TermInSetQuery (apache#12587)
  Add timeouts to github jobs. Estimates taken from empirical run times (actions history), with a generous buffer added. (apache#12687)
  Optimize OnHeapHnswGraph's data structure (apache#12651)
  Add createClassLoader to replicator permissions (block specific to jacoco). (apache#12684)
  Move changes entry before backporting
  CHANGES
  Move testing properties to provider class (no classloading deadlock possible) and fallback to default provider in non-test mode
  simple cleanups to vector code (apache#12680)
  Better detect vector module in non-default setups (e.g., custom module layers) (apache#12677)
  ...
