Skip to content

perf(sql): improve latency of parsing SQL expressions#6538

Merged
bluestreak01 merged 4 commits intomasterfrom
vi_speedup_expr
Dec 16, 2025
Merged

perf(sql): improve latency of parsing SQL expressions#6538
bluestreak01 merged 4 commits intomasterfrom
vi_speedup_expr

Conversation

@bluestreak01
Copy link
Copy Markdown
Member

@bluestreak01 bluestreak01 commented Dec 15, 2025

Score is in nanoseconds.

Before

Benchmark                                                                                                                  (expression)  Mode  Cnt      Score     Error  Units
ExpressionParserBenchmark.testParseExpression                                                                                     a + b  avgt    5   2967.608 ±  20.733  ns/op
ExpressionParserBenchmark.testParseExpression                                                                             a + b * c / 2  avgt    5   5616.736 ± 257.162  ns/op
ExpressionParserBenchmark.testParseExpression                                                                       a + b * c(x, y) / 2  avgt    5   8011.693 ±  30.758  ns/op
ExpressionParserBenchmark.testParseExpression                                                                  a = 1 and b = 2 or c = 3  avgt    5   7806.386 ±  25.476  ns/op
ExpressionParserBenchmark.testParseExpression                case when a > 0 then 'positive' when a < 0 then 'negative' else 'zero' end  avgt    5  11142.925 ± 198.106  ns/op
ExpressionParserBenchmark.testParseExpression                                                                      a in (1, 2, 3, 4, 5)  avgt    5   6481.112 ± 200.127  ns/op
ExpressionParserBenchmark.testParseExpression                                                       cast(a as double) + cast(b as long)  avgt    5   9132.837 ± 212.628  ns/op
ExpressionParserBenchmark.testParseExpression                                                    a between 1 and 10 and b like '%test%'  avgt    5   6788.039 ± 334.786  ns/op
ExpressionParserBenchmark.testParseExpression                                                                   coalesce(a, b, c, d, 0)  avgt    5   6560.893 ± 263.608  ns/op
ExpressionParserBenchmark.testParseExpression  sum(a) over (partition by b order by c rows between unbounded preceding and current row)  avgt    5   3467.484 ±  11.306  ns/op

After

Benchmark                                                                                                                  (expression)  Mode  Cnt     Score    Error  Units
ExpressionParserBenchmark.testParseExpression                                                                                     a + b  avgt    5   991.241 ± 14.202  ns/op
ExpressionParserBenchmark.testParseExpression                                                                             a + b * c / 2  avgt    5  1556.387 ± 65.071  ns/op
ExpressionParserBenchmark.testParseExpression                                                                       a + b * c(x, y) / 2  avgt    5  1969.139 ± 12.345  ns/op
ExpressionParserBenchmark.testParseExpression                                                                  a = 1 and b = 2 or c = 3  avgt    5  2245.348 ± 10.338  ns/op
ExpressionParserBenchmark.testParseExpression                case when a > 0 then 'positive' when a < 0 then 'negative' else 'zero' end  avgt    5  3568.643 ± 61.740  ns/op
ExpressionParserBenchmark.testParseExpression                                                                      a in (1, 2, 3, 4, 5)  avgt    5  1847.929 ± 26.421  ns/op
ExpressionParserBenchmark.testParseExpression                                                       cast(a as double) + cast(b as long)  avgt    5  2498.896 ± 11.971  ns/op
ExpressionParserBenchmark.testParseExpression                                                    a between 1 and 10 and b like '%test%'  avgt    5  2113.894 ± 39.459  ns/op
ExpressionParserBenchmark.testParseExpression                                                                   coalesce(a, b, c, d, 0)  avgt    5  1755.415 ± 16.269  ns/op
ExpressionParserBenchmark.testParseExpression  sum(a) over (partition by b order by c rows between unbounded preceding and current row)  avgt    5  1232.815 ±  7.193  ns/op

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 15, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Introduces a new JMH benchmark for measuring SQL expression parsing performance with a custom reverse-Polish-notation builder. Refactors ExpressionParser with new helper methods for improved modularity. Updates fuzz test to use Java switch expressions and adjusts method signatures.

Changes

Cohort / File(s) Summary
New Expression Parser Benchmark
benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java
Adds new JMH benchmark class with CairoEngine and SqlCompiler setup/teardown. Introduces RpnBuilder inner class implementing ExpressionParserListener to track parsed expression nodes. Exposes parameterized expression samples and testParseExpression benchmark method.
Expression Parser Refactoring
core/src/main/java/io/questdb/griffin/ExpressionParser.java
Extracts parsing logic into dedicated helper methods covering dereference processing, keyword validation, whitespace handling, floating-point literals, GeoHash operations, binary operations, decimal type parsing, lambda queries, PostgreSQL timestamps, unary operations, and array qualifiers. Replaces inlined code in parseExpr with calls to these helpers. Adds NotNull annotation import.
Test Modernization
core/src/test/java/io/questdb/test/griffin/ExpressionParserFuzzTest.java
Removes attempts parameter from fuzzTestValidExpressionAgainstOperators, internalizing iteration logic to 1024 fixed loops. Converts randomLiteral and ExpressionElement.wrap switch statements to Java 14+ switch expressions. Updates call site to align with new method signature.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • ExpressionParser refactoring introduces multiple new helper methods with distinct responsibilities—each requires verification that extracted logic maintains original behavior and error handling
  • Interaction between helpers and parseExpr control flow changes warrant careful tracing to ensure no regressions
  • Benchmark class is straightforward but should verify correct engine lifecycle management and RPN output accuracy
  • Test changes are largely syntactic but the method signature removal needs validation that all call sites are updated

Suggested labels

Performance, SQL

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 2.56% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the primary change: improving SQL expression parsing performance, with specific focus on latency reduction.
Description check ✅ Passed The description provides comprehensive before/after benchmark results demonstrating significant performance improvements across multiple SQL expression parsing scenarios.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@puzpuzpuz
Copy link
Copy Markdown
Contributor

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 16, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java (1)

117-122: The rpn() method mutates internal state, which is fine for single-threaded benchmarks but could be surprising.

The method trims the trailing space by modifying the sink in place. Since the benchmark uses Scope.Thread and reset() is called before each invocation, this is safe. Consider documenting this behavior or returning a new CharSequence to avoid confusion in other usage contexts.

If you prefer a side-effect-free approach:

     public final CharSequence rpn() {
         if (!sink.isEmpty() && sink.charAt(sink.length() - 1) == ' ') {
-            sink.clear(sink.length() - 1);
+            return sink.subSequence(0, sink.length() - 1);
         }
         return sink;
     }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0d9a535 and 4dd3b51.

📒 Files selected for processing (3)
  • benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java (1 hunks)
  • core/src/main/java/io/questdb/griffin/ExpressionParser.java (23 hunks)
  • core/src/test/java/io/questdb/test/griffin/ExpressionParserFuzzTest.java (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
core/src/main/java/io/questdb/griffin/ExpressionParser.java (4)
core/src/main/java/io/questdb/griffin/SqlException.java (1)
  • SqlException (37-268)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (1)
  • ExpressionNode (42-412)
core/src/main/java/io/questdb/griffin/GeoHashUtil.java (1)
  • GeoHashUtil (34-99)
core/src/main/java/io/questdb/griffin/OperatorExpression.java (1)
  • OperatorExpression (29-217)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
  • GitHub Check: New pull request (Coverage Report Coverage Report)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
  • GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
  • GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
  • GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-other-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
  • GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
  • GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
  • GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
  • GitHub Check: New pull request (Hosted Running tests on mac-other)
  • GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
  • GitHub Check: New pull request (Hosted Running tests on mac-cairo)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
  • GitHub Check: New pull request (Hosted Running tests on mac-griffin)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
  • GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
  • GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
  • GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (51)
core/src/test/java/io/questdb/test/griffin/ExpressionParserFuzzTest.java (4)

78-78: LGTM!

The call site is correctly updated to match the new method signature that internalizes the iteration count.


114-115: Hardcoded iteration count is acceptable for fuzz testing.

Moving the loop inside the method with a fixed 1024 iterations simplifies the API. The count is reasonable for fuzz testing coverage.


401-405: Switch expression modernization in wrap() is correct.

The logic is preserved while using the more concise switch expression syntax. The default case properly throws an exception for unsupported arities.


244-249: Modern switch expression syntax looks good.

The switch expression uses proper Java 14+ syntax with arrow labels and returns values cleanly. The project targets Java 17, which fully supports this feature.

core/src/main/java/io/questdb/griffin/ExpressionParser.java (41)

151-168: Helper method for dereference error handling is well-structured.

The method improves error messages for invalid type casts like x::[]type. The early return after throwing ensures clean control flow.


170-193: IS keyword validation logic is correct.

The method properly handles IS NULL, IS NOT NULL, IS TRUE, IS FALSE, IS NOT TRUE, and IS NOT FALSE by rewriting to = or != operators. The lexer backtracking is correctly implemented.


195-208: Array type whitespace validation is thorough.

The method correctly rejects whitespace between type name and [] with a descriptive error message. The assertion on line 199 assumes hi != -1 which should always hold given the context.


210-238: Floating point constant creation handles edge cases.

The method correctly handles cases like 1.2 vs 1. 2 and creates appropriate literal nodes. The else branch at line 227 handles the case where the dot follows an operator/control node.


240-265: GeoHash constant creation with optional suffix is correct.

The method properly handles optional /d or /dd suffix syntax and validates that the bits specification contains only decimals.


267-286: Standard GeoHash node creation validates type precision.

The method correctly validates geohash size tokens and ensures proper closing parenthesis.


288-304: Operation node creation is clean and correct.

The method properly sets the node type based on whether it's a SET operation and correctly handles BETWEEN's special 3-parameter case.


372-383: Scientific notation handling for +/- is correct.

The method correctly identifies incomplete scientific notation like 1e+ or 1e- by checking if the previous character was 'e' or 'E'.


385-395: BETWEEN clause stack unwinding is correct.

The method pops operators until finding a BETWEEN keyword, then pushes it back for further processing.


397-409: Array access stack handling is correct.

The method properly pops literals and array-related nodes with precedence 2 for array access operations.


411-428: Function call handling on parenthesis close is correct.

The method properly converts literals to function calls and validates SET_OPERATION parameter counts with a minimum of 2 arguments.


430-454: Aliased wildcard handling (table.*) is well-implemented.

The method correctly handles both a.* (floating sequence) and "foo".* (quoted identifier) cases, properly constructing the compound literal.


456-470: Colon processing for bind variables is correct.

The method handles single-character colon followed by non-whitespace to form bind variable tokens like :param.


472-490: Comma handling with CAST scope validation is correct.

The method properly rejects commas within CAST/CAST_AS scopes and pops operators until finding a control node.


492-523: Decimal type parsing handles precision and scale.

The method correctly parses decimal(p) and decimal(p, s) syntax, creating appropriate constant nodes with encoded type information.


525-554: DISTINCT keyword rewriting for count/string_agg is correct.

The method properly rewrites count(distinct x) to count_distinct(x) and string_agg(distinct x) to string_distinct_agg(x), with appropriate error handling for count(distinct *).


598-629: PostgreSQL timestamp cast handling is correct.

The method properly implements the PostgreSQL-style cast syntax like timestamp '2005-04-02', validating the target type and creating the appropriate FUNCTION node with cast operation.


631-649: Unary minus/complement detection is correct.

The method properly identifies unary operators based on the previous branch context (after operators, left parens, commas, etc.) and returns the appropriate unary operator expression.


651-656: Array slice validation is minimal but sufficient.

The method validates that a colon operator exists on the stack for array slicing syntax like [1:].


658-675: Array qualifier validation ensures no whitespace.

The method validates that the closing bracket immediately follows the opening bracket in array type qualifiers, throwing a descriptive error if whitespace is detected.


722-722: Call site updated to use extracted helper.

The parsePlus helper is correctly invoked for handling scientific notation.


771-771: Comma processing delegated to helper.

The processCommaAndPopOpStack helper is correctly integrated.


777-777: Colon processing delegated to helper.

The processColon helper is correctly integrated.


786-786: Array whitespace validation delegated to helper.

The validateWhitespace helper is correctly integrated.


792-792: Dereference processing delegated to helper.

The processDereference helper is correctly integrated.


811-811: Array stack popping delegated to helper.

The popArrayOpStack helper is correctly integrated.


831-832: Array qualifier validation integrated correctly.

The helper methods are properly called with appropriate error handling.


842-843: Array outlier validation integrated correctly.

The validateArrayOutliers helper is called for the edge case of array slicing syntax.


845-899: Bracket handling restructured with break statements.

The default case now has explicit break statements which improves readability and control flow clarity.


1010-1010: Function stack popping delegated to helper.

The popLiteralAndFunctionOpStack helper is correctly integrated for handling closing parentheses.


1034-1034: Decimal processing delegated to helper.

The processDecimal helper is correctly integrated.


1063-1064: Decimal rewrite processing integrated with continue.

The processDecimalRewrite helper returns a boolean to indicate if processing should continue, correctly used here.


1084-1084: GeoHash node creation delegated to helper.

The createGeoHashNodeStd helper is correctly integrated.


1094-1094: GeoHash constant creation delegated to helper.

The createGeoHashConst helper is correctly integrated.


1162-1162: BETWEEN AND handling delegated to helper.

The popAndOpStack helper is correctly integrated for BETWEEN clause processing.


1294-1294: Floating point constant creation delegated to helper.

The createFloatingPointConstant helper is correctly integrated.


1330-1330: PostgreSQL timestamp cast delegated to helper.

The processPgTimestampCast helper is correctly integrated.


1383-1383: IS keyword validation delegated to helper.

The validateIsKeyword helper is correctly integrated.


1394-1394: Aliased wildcard processing delegated to helper.

The processAliasedWildcard helper is correctly integrated.


1417-1417: Unary minus processing delegated to helper.

The processUnaryMinus helper is correctly integrated.


1484-1484: Operation creation delegated to helper.

The createOperation helper is correctly integrated for creating operation nodes.

benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java (6)

44-47: JMH annotations are correctly configured.

Thread scope, average time mode, and nanosecond output unit are appropriate for this micro-benchmark.


49-49: Static configuration is acceptable for benchmark.

Using java.io.tmpdir for the benchmark configuration is reasonable since no actual data persistence is needed.


51-63: Good variety of SQL expressions for benchmarking.

The parameterized expressions cover arithmetic, logical, case statements, set operations, casts, and window functions, providing comprehensive coverage.


82-92: Resource management follows JMH best practices.

Using Level.Trial for setup/teardown is appropriate for heavyweight resources like CairoEngine. The compiler is properly closed before the engine.


94-99: Benchmark method is correctly structured.

Resetting the builder before each parse and returning the result prevents dead code elimination by JMH.


101-123: RpnBuilder implementation is correct and minimal.

The listener correctly handles both query models and regular tokens, building a space-separated RPN representation suitable for benchmark output verification.

@puzpuzpuz puzpuzpuz self-requested a review December 16, 2025 12:18
@puzpuzpuz puzpuzpuz added SQL Issues or changes relating to SQL execution Performance Performance improvements labels Dec 16, 2025
@questdb-butler
Copy link
Copy Markdown

⚠️ Enterprise CI Failed

The enterprise test suite failed for this PR.

Build: View Details
Tested Commit: 24e11feb9beaf615f5aae4d063e0322d95eaf4d5

Please investigate the failure before merging.

@glasstiger
Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 233 / 237 (98.31%)

file detail

path covered line new line coverage
🔵 io/questdb/griffin/ExpressionParser.java 233 237 98.31%

@bluestreak01 bluestreak01 merged commit 94a578f into master Dec 16, 2025
42 of 43 checks passed
@bluestreak01 bluestreak01 deleted the vi_speedup_expr branch December 16, 2025 18:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Performance Performance improvements SQL Issues or changes relating to SQL execution

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants