Specialize `BlockImpactsDocsEnum#nextDoc()`. #12670

jpountz · 2023-10-13T10:28:28Z

When we initially introduced support for dynamic pruning, we had an implementation of WAND that would almost exclusively use `advance()`. Now that we have switched to MAXSCORE and rely much more on `nextDoc()`, it makes sense to specialize `nextDoc()` as well.
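To illustrate what "specializing" means here, below is a toy sketch, not the code from this PR. `ToyBlockPostings`, its fields, and `nextDocViaAdvance()` are hypothetical names made up for the example, and the block layout is a simplification. It assumes doc IDs are decoded one fixed-size block at a time. The generic path answers `nextDoc()` by calling `advance(doc + 1)`, which pays for a skip check and a scan loop on every call. The specialized path only reads the next buffered entry and refills when the buffer runs out.

```java
/** Toy block-decoded postings iterator. Hypothetical, for illustration only. */
final class ToyBlockPostings {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] allDocs;    // stand-in for the encoded postings list (sorted doc IDs)
  private final int[] buffer;     // the currently decoded block
  private int bufferLen = 0;      // number of valid entries in buffer
  private int bufferPos = 0;      // index of the next unread entry in buffer
  private int nextBlockStart = 0; // offset in allDocs of the next block to decode
  private int doc = -1;           // current doc ID, -1 before iteration starts

  /** blockSize is assumed to be greater than zero. */
  ToyBlockPostings(int[] sortedDocs, int blockSize) {
    this.allDocs = sortedDocs;
    this.buffer = new int[blockSize];
  }

  /** "Decodes" the next block into the buffer; returns false when no blocks remain. */
  private boolean refill() {
    if (nextBlockStart >= allDocs.length) {
      bufferPos = bufferLen; // mark the buffer as fully consumed
      return false;
    }
    bufferLen = Math.min(buffer.length, allDocs.length - nextBlockStart);
    System.arraycopy(allDocs, nextBlockStart, buffer, 0, bufferLen);
    nextBlockStart += bufferLen;
    bufferPos = 0;
    return true;
  }

  /** General-purpose path: skip blocks whose last doc is below target, then scan. */
  int advance(int target) {
    while (bufferPos >= bufferLen || buffer[bufferLen - 1] < target) {
      if (!refill()) {
        return doc = NO_MORE_DOCS;
      }
    }
    while (buffer[bufferPos] < target) {
      bufferPos++;
    }
    return doc = buffer[bufferPos++];
  }

  /** Unspecialized nextDoc(): delegates to advance() and pays for its checks. */
  int nextDocViaAdvance() {
    return advance(doc + 1);
  }

  /** Specialized nextDoc(): return the next buffered entry, refilling when needed. */
  int nextDoc() {
    if (bufferPos == bufferLen && !refill()) {
      return doc = NO_MORE_DOCS;
    }
    return doc = buffer[bufferPos++];
  }
}
```

Both paths return the same sequence of doc IDs. The specialized one just does less work per call, which matters once `nextDoc()` is on the hot path.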

jpountz · 2023-10-13T10:30:23Z

Results on wikibigall. Both the baseline and the contender have #12668.

                            Task  QPS baseline    StdDev  QPS my_modified_version    StdDev                Pct diff  p-value
                          IntNRQ      279.07      (9.5%)      269.35      (6.9%)   -3.5% ( -18% -   14%) 0.185
                         Prefix3      210.18      (3.2%)      207.28      (4.2%)   -1.4% (  -8% -    6%) 0.239
                  CountOrHighMed       87.59     (13.7%)       86.77     (14.3%)   -0.9% ( -25% -   31%) 0.832
                 CountOrHighHigh       56.41     (13.8%)       55.89     (14.6%)   -0.9% ( -25% -   31%) 0.836
                       LowPhrase       23.15      (3.7%)       23.00      (4.2%)   -0.7% (  -8% -    7%) 0.598
                        Wildcard       36.09      (3.7%)       35.90      (3.8%)   -0.5% (  -7% -    7%) 0.652
           HighTermDayOfYearSort      249.90      (2.0%)      249.63      (1.8%)   -0.1% (  -3% -    3%) 0.857
                         Respell       69.31      (1.8%)       69.38      (2.4%)    0.1% (  -3% -    4%) 0.876
                       MedPhrase       28.69      (2.4%)       28.73      (3.5%)    0.1% (  -5% -    6%) 0.894
                      HighPhrase       54.89      (2.8%)       55.06      (3.7%)    0.3% (  -6% -    7%) 0.771
                        PKLookup      224.50      (2.9%)      226.08      (2.6%)    0.7% (  -4% -    6%) 0.415
                     CountPhrase        4.40      (3.6%)        4.43      (4.8%)    0.8% (  -7% -    9%) 0.542
                          Fuzzy2       79.33      (1.8%)       80.04      (1.6%)    0.9% (  -2% -    4%) 0.092
                          Fuzzy1      116.13      (1.9%)      117.20      (1.7%)    0.9% (  -2% -    4%) 0.106
                       CountTerm    16592.06      (5.5%)    16762.13      (6.2%)    1.0% ( -10% -   13%) 0.580
                        HighTerm      377.81      (7.9%)      381.70      (7.1%)    1.0% ( -12% -   17%) 0.665
                         MedTerm      481.56      (7.2%)      487.41      (6.3%)    1.2% ( -11% -   15%) 0.570
                         LowTerm     1072.51      (6.8%)     1086.51      (6.0%)    1.3% ( -10% -   15%) 0.522
                      AndHighLow      996.73      (2.3%)     1013.39      (3.8%)    1.7% (  -4% -    8%) 0.096
                CountAndHighHigh       40.50      (3.6%)       41.23      (5.0%)    1.8% (  -6% -   10%) 0.187
               HighTermMonthSort     4998.32      (3.8%)     5092.15      (2.2%)    1.9% (  -3% -    8%) 0.056
                 CountAndHighMed      122.05      (2.8%)      124.41      (4.5%)    1.9% (  -5% -    9%) 0.099
                      AndHighMed      140.41      (2.8%)      145.21      (4.1%)    3.4% (  -3% -   10%) 0.002
                     AndHighHigh       51.03      (3.4%)       53.21      (4.6%)    4.3% (  -3% -   12%) 0.001
                       OrHighMed      183.06      (3.7%)      191.28      (5.0%)    4.5% (  -4% -   13%) 0.001
                       OrHighLow      598.61      (3.0%)      627.58      (3.7%)    4.8% (  -1% -   11%) 0.000
                      OrHighHigh       67.20      (6.2%)       71.31      (6.2%)    6.1% (  -5% -   19%) 0.002

Disjunctions see a bigger improvement than conjunctions, which makes sense since they tend to rely more on `nextDoc()` and less on `advance()` than conjunctions do.
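The difference in call patterns can be seen in a deliberately simplified sketch. It uses exhaustive matching over two clauses with no dynamic pruning, so it is not how MAXSCORE or Lucene's scorers are implemented. `CountingIterator` and `ToyMatchers` are hypothetical names made up for the example. The conjunction leapfrogs with `advance()`, while the disjunction steps every clause forward with `nextDoc()`.

```java
import java.util.ArrayList;
import java.util.List;

/** Array-backed doc ID iterator that counts how it is driven. Hypothetical. */
final class CountingIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  private final int[] docs; // sorted doc IDs
  private int pos = -1;
  int nextDocCalls = 0;
  int advanceCalls = 0;

  CountingIterator(int... sortedDocs) {
    this.docs = sortedDocs;
  }

  int docID() {
    return pos < 0 ? -1 : pos >= docs.length ? NO_MORE_DOCS : docs[pos];
  }

  int nextDoc() {
    nextDocCalls++;
    pos++;
    return docID();
  }

  /** Moves to the first doc at or after target that is beyond the current doc. */
  int advance(int target) {
    advanceCalls++;
    do {
      pos++;
    } while (pos < docs.length && docs[pos] < target);
    return docID();
  }
}

final class ToyMatchers {
  /** Leapfrog conjunction: each clause catches up to the other with advance(). */
  static List<Integer> and(CountingIterator lead, CountingIterator other) {
    List<Integer> hits = new ArrayList<>();
    int doc = lead.nextDoc();
    while (doc != CountingIterator.NO_MORE_DOCS) {
      int otherDoc = other.docID();
      if (otherDoc < doc) {
        otherDoc = other.advance(doc);
      }
      if (otherDoc == doc) {
        hits.add(doc);
        doc = lead.nextDoc();
      } else {
        doc = lead.advance(otherDoc);
      }
    }
    return hits;
  }

  /** Exhaustive disjunction: every clause positioned on the current doc calls nextDoc(). */
  static List<Integer> or(CountingIterator a, CountingIterator b) {
    List<Integer> hits = new ArrayList<>();
    int da = a.nextDoc();
    int db = b.nextDoc();
    while (da != CountingIterator.NO_MORE_DOCS || db != CountingIterator.NO_MORE_DOCS) {
      int doc = Math.min(da, db);
      hits.add(doc);
      if (da == doc) da = a.nextDoc();
      if (db == doc) db = b.nextDoc();
    }
    return hits;
  }
}
```

In this toy version the disjunction never calls `advance()`, whereas the non-lead clause of the conjunction never calls `nextDoc()`. Real scorers mix both calls, but the tendency is the same.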

romseygeek

LGTM

jpountz · 2023-10-18T14:31:51Z

This yielded a noticeable speedup on OrHighHigh and OrHighMed. I'll add an annotation.

jpountz added this to the 9.9.0 milestone Oct 13, 2023

romseygeek approved these changes Oct 17, 2023

jpountz merged commit 218edde into apache:main Oct 17, 2023

jpountz deleted the optimize_impacts_enum_next_doc branch October 17, 2023 12:14
