Skip to content

perf(sql): breaking change 💥- speed up IN comparisons on long lists#6109

Merged
bluestreak01 merged 15 commits intomasterfrom
nw_in_longs
Sep 11, 2025
Merged

perf(sql): breaking change 💥- speed up IN comparisons on long lists#6109
bluestreak01 merged 15 commits intomasterfrom
nw_in_longs

Conversation

@nwoolmer
Copy link
Copy Markdown
Contributor

@nwoolmer nwoolmer commented Sep 7, 2025

This PR changes the algorithm for serving IN(longlist...) predicate to use a hashset instead of a binary search. The binary search does not scale well as the list size increases.

Furthermore, a new config setting is introduced, cairo.sql.jit.max.in.list.size.threshold. This specifies the size of the IN list that will be accepted by the JIT compiler. If this limit is exceeded, the JIT compiler will abort and compile a Java filter instead.

This is because the JIT compiler will unroll IN lists into equality comparisons chained by logical or operators. This degrades the performance for large lists into semi-linear time. Whilst JIT compilation is beneficial for many filters, if the IN comparison is the dominant component, it is a net loss to JIT compile the query.

Users can tune this parameter for their workload.

Future work:

  • Reviewing InDouble, since this also uses a binary search.
  • Any applicability to the new Decimal type
  • The possibility of adding a query hint to control whether JIT is applied on a per-query basis.
Microbench
CREATE TABLE bench AS (SELECT rnd_long(1, 1000, 0) l FROM long_sequence(100_000_000));

SELECT count() FROM bench WHERE l IN (201,994,127,666,896,481,384,650,283,803,426,810,597,53,34,746,527,614,589,456,420,342,272,943,41,385,413,152,397,761,891,186,655,774,888,165,285,163,958,121,152,409,616,138,381,736,164,664,561,918,623,100,127,51,108,357,231,659,841,752,762,720,94,946,193,242,984,905,909,577,818,320,25,124,919,444,231,35,593,296,452,519,174,50,422,949,816,669,313,203,204,133,95,800,602,291,644,507,599,231,301,984,835,429,507,275,877,339,283,939,408,190,572,627,911,808,283,938,635,660,607,958,958,719,325,117,141,907,921,273,748,698,984,808,152,928,824,945,560,619,67,432,1000,36,348,589,70,719,5,574,472,796,655,78,772,932,463,826,124,72,356,25,285,615,420,746,839,47,341,740,737,585,911,955,417,33,985,668,57,356,549,135,37,780,792,296,259,35,65,734,919,951,852,130,768,709,780,214,424,784);
-- Using binary search
-- jit: 315ms
-- java: 328ms

-- Using hash set
-- jit: 360ms 
-- java: 115ms 

todo: fix JIT version

Also fixes an existing bug:

  if (inVals.size() == 0) {
                return new InLongSingleConstFunction(args.getQuick(0), inVals.getQuick(0));
            }

If list is empty, we shouldn't read a value from it.

After fast-failing JIT:

-- 399ms JIT
-- 83ms no JIT

Java comparison for 200 entry IN list, when the element is always in the correct range:

Benchmark                                    (size)  Mode  Cnt   Score   Error  Units
BinarySearchVsHashSetBenchmark.binarySearch     200  avgt    2  26.419          ns/op
BinarySearchVsHashSetBenchmark.hashSet          200  avgt    2   6.878          ns/op

@nwoolmer nwoolmer added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Sep 7, 2025
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Sep 7, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nw_in_longs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nwoolmer nwoolmer changed the title perf(sql): speed up IN comparisons on long lists perf(sql): breaking change 💥- speed up IN comparisons on long lists Sep 8, 2025
@nwoolmer nwoolmer requested a review from mtopolnik September 8, 2025 16:44
mtopolnik
mtopolnik previously approved these changes Sep 8, 2025
bluestreak01
bluestreak01 previously approved these changes Sep 9, 2025
@bluestreak01
Copy link
Copy Markdown
Member

test failure is unrelated but it a master regression, lets wait for test fix PR

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 42 / 43 (97.67%)

file detail

path covered line new line coverage
🔵 io/questdb/cairo/DefaultCairoConfiguration.java 0 1 00.00%
🔵 io/questdb/PropServerConfiguration.java 2 2 100.00%
🔵 io/questdb/jit/CompiledFilterIRSerializer.java 4 4 100.00%
🔵 io/questdb/DynamicPropServerConfiguration.java 2 2 100.00%
🔵 io/questdb/PropertyKey.java 1 1 100.00%
🔵 io/questdb/std/LongList.java 6 6 100.00%
🔵 io/questdb/std/LongHashSet.java 2 2 100.00%
🔵 io/questdb/cairo/CairoConfigurationWrapper.java 1 1 100.00%
🔵 io/questdb/griffin/engine/functions/bool/InLongFunctionFactory.java 24 24 100.00%

@bluestreak01 bluestreak01 merged commit 26b9855 into master Sep 11, 2025
35 checks passed
@bluestreak01 bluestreak01 deleted the nw_in_longs branch September 11, 2025 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants