Avoid object construction when linear searching arcs #12692

gf2121 · 2023-10-18T09:29:05Z

Description

This PR resolves a todo left in FST that we construct some useless objects during linear search. This is also an effort that tries to avoid Outputs#read as we have more outputs distributed in arcs with MSB VLong format.

I'll run luceneutil soon.

See also #12661

gf2121 · 2023-10-18T12:07:00Z

Result on wikimediumall (nothing changed obviously) :

                            TaskQPS baseline      StdDevQPS my_modified_version      StdDev                Pct diff p-value
                         LowTerm      229.72      (4.5%)      225.59      (3.8%)   -1.8% (  -9% -    6%) 0.197
                 LowSloppyPhrase        1.61      (4.3%)        1.59      (5.7%)   -1.6% ( -11% -    8%) 0.356
                     LowSpanNear        4.16      (2.0%)        4.10      (2.9%)   -1.5% (  -6% -    3%) 0.069
                        PKLookup      101.50      (2.8%)      100.16      (2.8%)   -1.3% (  -6% -    4%) 0.155
                    HighSpanNear        6.82      (3.1%)        6.75      (3.1%)   -1.0% (  -6% -    5%) 0.347
                     AndHighHigh       18.56      (2.9%)       18.43      (2.4%)   -0.7% (  -5% -    4%) 0.432
             MedIntervalsOrdered        3.45      (6.6%)        3.43      (7.2%)   -0.6% ( -13% -   14%) 0.802
                      AndHighMed       40.23      (4.9%)       40.02      (3.9%)   -0.5% (  -8% -    8%) 0.714
                   OrHighNotHigh      154.45      (4.2%)      153.91      (4.8%)   -0.4% (  -8% -    9%) 0.814
                     MedSpanNear        1.10      (2.2%)        1.10      (2.1%)   -0.1% (  -4% -    4%) 0.908
                    OrHighNotLow      233.46      (5.4%)      233.40      (4.3%)   -0.0% (  -9% -   10%) 0.986
                      HighPhrase       38.09      (2.7%)       38.13      (4.6%)    0.1% (  -7% -    7%) 0.946
                   OrNotHighHigh      139.46      (4.7%)      139.59      (4.5%)    0.1% (  -8% -    9%) 0.950
             LowIntervalsOrdered        2.74      (4.9%)        2.75      (5.5%)    0.2% (  -9% -   11%) 0.907
                          Fuzzy2       33.67      (2.2%)       33.75      (2.2%)    0.3% (  -4% -    4%) 0.730
                HighSloppyPhrase        3.56      (4.0%)        3.58      (3.7%)    0.4% (  -7% -    8%) 0.769
                       LowPhrase       25.91      (2.4%)       26.01      (3.9%)    0.4% (  -5% -    6%) 0.694
                         Respell       27.34      (2.4%)       27.47      (2.3%)    0.5% (  -4% -    5%) 0.540
                 MedSloppyPhrase        9.25      (3.6%)        9.30      (4.0%)    0.5% (  -6% -    8%) 0.674
                       OrHighLow      187.50      (4.7%)      188.87      (2.9%)    0.7% (  -6% -    8%) 0.578
                         MedTerm      233.89      (4.3%)      235.63      (3.8%)    0.7% (  -7% -    9%) 0.581
                         Prefix3       63.58      (5.8%)       64.11      (3.7%)    0.8% (  -8% -   10%) 0.607
                       MedPhrase        9.37      (2.6%)        9.45      (4.1%)    0.9% (  -5% -    7%) 0.444
                    OrNotHighLow      231.48      (3.0%)      233.63      (4.1%)    0.9% (  -6% -    8%) 0.438
                      AndHighLow      307.03      (3.8%)      310.29      (4.0%)    1.1% (  -6% -    9%) 0.412
                          Fuzzy1       42.13      (2.7%)       42.59      (2.2%)    1.1% (  -3% -    6%) 0.174
                        Wildcard       65.44      (3.1%)       66.25      (2.1%)    1.2% (  -3% -    6%) 0.160
            HighIntervalsOrdered        0.37      (4.9%)        0.37      (4.1%)    1.6% (  -7% -   11%) 0.304
                      OrHighHigh       12.90      (4.4%)       13.10      (5.0%)    1.6% (  -7% -   11%) 0.309
                    OrHighNotMed      117.52      (4.8%)      119.46      (5.2%)    1.7% (  -7% -   12%) 0.318
                    OrNotHighMed      123.61      (6.1%)      125.82      (4.2%)    1.8% (  -7% -   12%) 0.302
                        HighTerm      217.07      (4.1%)      222.24      (3.0%)    2.4% (  -4% -    9%) 0.047
                       OrHighMed       37.00      (4.8%)       37.94      (3.9%)    2.5% (  -5% -   11%) 0.081

mikemccand · 2023-10-18T13:02:05Z

lucene/core/src/java/org/apache/lucene/util/fst/FST.java

+        if (flag(flags, BIT_ARC_HAS_FINAL_OUTPUT)) {
+          outputs.skipFinalOutput(in);
+        }
+        if (!flag(flags, BIT_STOP_NODE) && !flag(flags, BIT_TARGET_NEXT)) {


Could we use == false instead? This more readable form reduces the chance of accidental future refactoring bugs.

Thanks @mikemccand , this pattern is copied from seekToNextNode, i refactored it too.

lucene/core/src/java/org/apache/lucene/util/fst/FST.java

mikemccand

I like this change! Rather than fully decoding each outgoing arc, we only decode the "metadata", enough to check whether the label matches, in the fallback case when we must linearly scan the arcs. I left one small comment about final and non-final outputs ...

…ache.org * upstream/main: (239 commits) Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation (apache#12633) Fix index out of bounds when writing FST to different metaOut (apache#12697) (apache#12698) Avoid object construction when linear searching arcs (apache#12692) chore: update the Javadoc example in Analyzer (apache#12693) coorect position on entry in CHANGES.txt Refactor ByteBlockPool so it is just a "shift/mask big array" (apache#12625) Extract the hnsw graph merging from being part of the vector writer (apache#12657) Specialize `BlockImpactsDocsEnum#nextDoc()`. (apache#12670) Speed up TestIndexOrDocValuesQuery. (apache#12672) Remove over-counting of deleted terms (apache#12586) Use MergeSorter in StableStringSorter (apache#12652) Use radix sort to speed up the sorting of terms in TermInSetQuery (apache#12587) Add timeouts to github jobs. Estimates taken from empirical run times (actions history), with a generous buffer added. (apache#12687) Optimize OnHeapHnswGraph's data structure (apache#12651) Add createClassLoader to replicator permissions (block specific to jacoco). (apache#12684) Move changes entry before backporting CHANGES Move testing properties to provider class (no classloading deadlock possible) and fallback to default provider in non-test mode simple cleanups to vector code (apache#12680) Better detect vector module in non-default setups (e.g., custom module layers) (apache#12677) ...

gf2121 · 2023-10-22T05:01:19Z

Nightly benchmark shows fuzzy queries are a bit happy for this change: https://home.apache.org/~mikemccand/lucenebench/2023.10.19.18.03.18.html.

gf2121 added 2 commits October 18, 2023 17:23

avoid object construct when linear search

65807a5

tidy

ea05c69

mikemccand reviewed Oct 18, 2023

View reviewed changes

gf2121 added 2 commits October 18, 2023 21:47

iter

3dd4697

iter

3d0914a

mikemccand reviewed Oct 18, 2023

View reviewed changes

lucene/core/src/java/org/apache/lucene/util/fst/FST.java Show resolved Hide resolved

mikemccand reviewed Oct 18, 2023

View reviewed changes

mikemccand approved these changes Oct 18, 2023

View reviewed changes

gf2121 changed the title ~~Avoid object construct when linear search~~ Avoid object construction when linear searching arcs Oct 19, 2023

gf2121 merged commit 343a9e7 into apache:main Oct 19, 2023

asfgit pushed a commit that referenced this pull request Oct 19, 2023

Avoid object construction when linear searching arcs (#12692)

6fc3483

gf2121 added this to the 9.9.0 milestone Oct 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Avoid object construction when linear searching arcs #12692

Avoid object construction when linear searching arcs #12692

Uh oh!

gf2121 commented Oct 18, 2023

Uh oh!

gf2121 commented Oct 18, 2023

Uh oh!

mikemccand Oct 18, 2023

Uh oh!

gf2121 Oct 18, 2023

Uh oh!

Uh oh!

mikemccand left a comment

Uh oh!

gf2121 commented Oct 22, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Avoid object construction when linear searching arcs #12692

Avoid object construction when linear searching arcs #12692

Uh oh!

Conversation

gf2121 commented Oct 18, 2023

Description

Uh oh!

gf2121 commented Oct 18, 2023

Uh oh!

mikemccand Oct 18, 2023

Choose a reason for hiding this comment

Uh oh!

gf2121 Oct 18, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mikemccand left a comment

Choose a reason for hiding this comment

Uh oh!

gf2121 commented Oct 22, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants