perf(sql): speed up JIT-compiled filters by reordering predicates and short-circuiting them #6568
Merged
bluestreak01 merged 40 commits into master on Dec 24, 2025
[PR Coverage check]😍 pass : 230 / 239 (96.23%)
bluestreak01 approved these changes on Dec 24, 2025
This PR introduces several performance optimizations to the JIT-compiled filter execution in QuestDB.
Short-circuit evaluation
Implements short-circuit evaluation for scalar `AND`/`OR` predicate chains:
- `AND` chains: If a predicate evaluates to `false`, skip remaining predicates and move to next row
- `OR` chains: If a predicate evaluates to `true`, skip remaining predicates and store the row
- `IN()` operator: Optimized using short-circuit OR chains internally (only in case of the outer `AND` chain)

New IR opcodes: `And_Sc`, `Or_Sc`, `Begin_Sc`, `End_Sc`

Predicate reordering
Predicates are automatically sorted by estimated selectivity to maximize short-circuit benefits.
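To illustrate why ordering matters for a short-circuit `AND` chain, here is a minimal Java sketch. It is not QuestDB code: the class, the `Op` ranks, and the `reorder` and `filterAnd` helpers are hypothetical, and the ranking (equality first, inequality last) is only an assumed stand-in for the selectivity estimate used by the patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch; none of these names come from QuestDB.
final class ShortCircuitFilterSketch {

    // Assumed priority: a lower rank is expected to reject more rows, so it runs first.
    enum Op {
        EQ(0), RANGE(1), NEQ(2);

        final int rank;

        Op(int rank) {
            this.rank = rank;
        }
    }

    // A predicate over a row index; the lambda captures the column it reads.
    static final class FilterPredicate {
        final Op op;
        final IntPredicate test;

        FilterPredicate(Op op, IntPredicate test) {
            this.op = op;
            this.test = test;
        }
    }

    // Returns a copy of the chain sorted by rank. The sort is stable, so
    // predicates with the same rank keep their original relative order.
    static List<FilterPredicate> reorder(List<FilterPredicate> chain) {
        List<FilterPredicate> sorted = new ArrayList<>(chain);
        sorted.sort(Comparator.comparingInt(p -> p.op.rank));
        return sorted;
    }

    // Evaluates an AND chain row by row. The first failing predicate ends the
    // row early. Matching row indexes are appended to `matches`; the return
    // value is the total number of predicate evaluations performed.
    static long filterAnd(int rowCount, List<FilterPredicate> chain, List<Integer> matches) {
        long evaluations = 0;
        nextRow:
        for (int row = 0; row < rowCount; row++) {
            for (FilterPredicate p : chain) {
                evaluations++;
                if (!p.test.test(row)) {
                    continue nextRow; // short-circuit: skip the remaining predicates
                }
            }
            matches.add(row);
        }
        return evaluations;
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] b = {0, 1, 0, 1, 0, 1, 0, 1};
        List<FilterPredicate> chain = new ArrayList<>();
        chain.add(new FilterPredicate(Op.NEQ, row -> b[row] != 2)); // passes for every row
        chain.add(new FilterPredicate(Op.EQ, row -> a[row] == 5));  // passes for one row

        List<Integer> asWritten = new ArrayList<>();
        long before = filterAnd(a.length, chain, asWritten);
        List<Integer> reordered = new ArrayList<>();
        long after = filterAnd(a.length, reorder(chain), reordered);
        System.out.println(before + " vs " + after + " evaluations, matches " + reordered);
    }
}
```

With the eight-row input in `main`, the chain as written costs 16 predicate evaluations, while the reordered chain costs 9 and returns the same single matching row. The gap grows with the number of rows the first predicate rejects.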
Register/memory hoisting
Several caches reduce redundant memory loads inside the hot loop:
- `ColumnAddressCache`
- `ConstantCache`
- `ColumnValueCache`
- `ConstantCacheYmm`

SIMD scatter short-circuiting
In the AVX2 SIMD loop, the scatter phase (writing matching row IDs) is now skipped when the mask is zero (no matches in the batch). This avoids expensive compress/scatter operations when filtering is highly selective.
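The idea can be sketched with a scalar emulation in Java. This is an illustration under stated assumptions, not the generated AVX2 code: `ScatterSkipSketch`, `filterEq`, the `scatterCalls` counter, and the batch width of four (four 64-bit lanes in a 256-bit register) are all hypothetical.

```java
// Hypothetical scalar emulation of a batched (SIMD-style) filter loop.
final class ScatterSkipSketch {

    // Assumed batch width: four 64-bit lanes, as in a 256-bit register.
    static final int LANES = 4;

    // Writes the ids of rows where values[row] == needle into `out` and
    // returns the match count. scatterCalls[0] is incremented once for each
    // batch that actually reaches the scatter phase.
    static int filterEq(long[] values, long needle, long[] out, int[] scatterCalls) {
        int count = 0;
        int row = 0;
        for (; row + LANES <= values.length; row += LANES) {
            // Compare phase: build one bit per lane.
            int mask = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if (values[row + lane] == needle) {
                    mask |= 1 << lane;
                }
            }
            if (mask == 0) {
                continue; // no matches in this batch: skip the scatter phase
            }
            // Scatter phase: write the row id of every set bit, lowest lane first.
            scatterCalls[0]++;
            while (mask != 0) {
                int lane = Integer.numberOfTrailingZeros(mask);
                out[count++] = row + lane;
                mask &= mask - 1; // clear the lowest set bit
            }
        }
        // Scalar tail for rows that do not fill a whole batch.
        for (; row < values.length; row++) {
            if (values[row] == needle) {
                out[count++] = row;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] values = {7, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 0, 0, 7};
        long[] out = new long[values.length];
        int[] scatterCalls = new int[1];
        int count = filterEq(values, 7, out, scatterCalls);
        System.out.println(count + " matches, " + scatterCalls[0] + " scatter phases");
    }
}
```

In this input the middle batch has no matches, so only two of the three full batches run the scatter phase. The more selective the filter, the larger the share of batches that take the early `continue`.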
Code generation improvements
- Uses `TEST` instead of `CMP` for zero-checks where possible
- Eliminates `AND`/`OR` instructions (short-circuit jumps suffice)
- Avoids `TEST`/`SETE` sequences for equality comparisons by tracking comparison flags
- Adds `XOR` instructions to break false dependencies

Benchmarks
On my box (Ryzen 7900x 64GB RAM Ubuntu 24.04) I've got the following difference in ClickBench's Hot Run (patch is on the left, `master` is on the right):

Also, I've got the following results in JMH benchmarks.
`SqlJitCompilerScalarBenchmark` (AND/OR predicate chains)
`SqlJitCompilerSimdBenchmark` (single predicate)
SIMD mode:
Scalar mode:
Key takeaways:
- … (`EQ`) - up to 69% faster for i16
- `AND`/`OR` chains: Significant improvements (23-42% faster) due to short-circuit evaluation
- `NEQ` predicates (low selectivity): Modest improvements since most rows match anyway
- `EQ` predicates on i64/i16 shows slight overhead (~5%), likely due to additional setup for hoisting that doesn't pay off for simple single-predicate filters. On the other hand, single-predicate filters are compiled with SIMD in most cases.

A comparison of the generated code before-after
Let's consider the following query on `hits` table from ClickBench:
On my box it takes 41ms on `master` and 24ms with this patch.
Summary of the before/after assembly:
- … (`TraficSourceID` was loaded twice before)
- `SETE`/`AND` chains - replaced by direct conditional jumps

For highly selective filters (few matches), the short-circuit behavior provides the biggest win since most rows exit after the first 1-2 predicate checks.
On `master` the assembly generated by JIT looks like this:
With the patch, it's the following: