perf(sql): improve latency of parsing SQL expressions#6538
perf(sql): improve latency of parsing SQL expressions#6538bluestreak01 merged 4 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughIntroduces a new JMH benchmark for measuring SQL expression parsing performance with a custom reverse-Polish-notation builder. Refactors ExpressionParser with new helper methods for improved modularity. Updates fuzz test to use Java switch expressions and adjusts method signatures. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Suggested labels
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java (1)
117-122: Therpn()method mutates internal state, which is fine for single-threaded benchmarks but could be surprising.The method trims the trailing space by modifying the sink in place. Since the benchmark uses
Scope.Threadandreset()is called before each invocation, this is safe. Consider documenting this behavior or returning a newCharSequenceto avoid confusion in other usage contexts.If you prefer a side-effect-free approach:
public final CharSequence rpn() { if (!sink.isEmpty() && sink.charAt(sink.length() - 1) == ' ') { - sink.clear(sink.length() - 1); + return sink.subSequence(0, sink.length() - 1); } return sink; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java(1 hunks)core/src/main/java/io/questdb/griffin/ExpressionParser.java(23 hunks)core/src/test/java/io/questdb/test/griffin/ExpressionParserFuzzTest.java(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
core/src/main/java/io/questdb/griffin/ExpressionParser.java (4)
core/src/main/java/io/questdb/griffin/SqlException.java (1)
SqlException(37-268)core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (1)
ExpressionNode(42-412)core/src/main/java/io/questdb/griffin/GeoHashUtil.java (1)
GeoHashUtil(34-99)core/src/main/java/io/questdb/griffin/OperatorExpression.java (1)
OperatorExpression(29-217)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (Coverage Report Coverage Report)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (51)
core/src/test/java/io/questdb/test/griffin/ExpressionParserFuzzTest.java (4)
78-78: LGTM!The call site is correctly updated to match the new method signature that internalizes the iteration count.
114-115: Hardcoded iteration count is acceptable for fuzz testing.Moving the loop inside the method with a fixed 1024 iterations simplifies the API. The count is reasonable for fuzz testing coverage.
401-405: Switch expression modernization inwrap()is correct.The logic is preserved while using the more concise switch expression syntax. The default case properly throws an exception for unsupported arities.
244-249: Modern switch expression syntax looks good.The switch expression uses proper Java 14+ syntax with arrow labels and returns values cleanly. The project targets Java 17, which fully supports this feature.
core/src/main/java/io/questdb/griffin/ExpressionParser.java (41)
151-168: Helper method for dereference error handling is well-structured.The method improves error messages for invalid type casts like
x::[]type. The early return after throwing ensures clean control flow.
170-193: IS keyword validation logic is correct.The method properly handles
IS NULL,IS NOT NULL,IS TRUE,IS FALSE,IS NOT TRUE, andIS NOT FALSEby rewriting to=or!=operators. The lexer backtracking is correctly implemented.
195-208: Array type whitespace validation is thorough.The method correctly rejects whitespace between type name and
[]with a descriptive error message. The assertion on line 199 assumeshi != -1which should always hold given the context.
210-238: Floating point constant creation handles edge cases.The method correctly handles cases like
1.2vs1. 2and creates appropriate literal nodes. Theelsebranch at line 227 handles the case where the dot follows an operator/control node.
240-265: GeoHash constant creation with optional suffix is correct.The method properly handles optional
/dor/ddsuffix syntax and validates that the bits specification contains only decimals.
267-286: Standard GeoHash node creation validates type precision.The method correctly validates geohash size tokens and ensures proper closing parenthesis.
288-304: Operation node creation is clean and correct.The method properly sets the node type based on whether it's a SET operation and correctly handles BETWEEN's special 3-parameter case.
372-383: Scientific notation handling for +/- is correct.The method correctly identifies incomplete scientific notation like
1e+or1e-by checking if the previous character was 'e' or 'E'.
385-395: BETWEEN clause stack unwinding is correct.The method pops operators until finding a BETWEEN keyword, then pushes it back for further processing.
397-409: Array access stack handling is correct.The method properly pops literals and array-related nodes with precedence 2 for array access operations.
411-428: Function call handling on parenthesis close is correct.The method properly converts literals to function calls and validates SET_OPERATION parameter counts with a minimum of 2 arguments.
430-454: Aliased wildcard handling (table.*) is well-implemented.The method correctly handles both
a.*(floating sequence) and"foo".*(quoted identifier) cases, properly constructing the compound literal.
456-470: Colon processing for bind variables is correct.The method handles single-character colon followed by non-whitespace to form bind variable tokens like
:param.
472-490: Comma handling with CAST scope validation is correct.The method properly rejects commas within CAST/CAST_AS scopes and pops operators until finding a control node.
492-523: Decimal type parsing handles precision and scale.The method correctly parses
decimal(p)anddecimal(p, s)syntax, creating appropriate constant nodes with encoded type information.
525-554: DISTINCT keyword rewriting for count/string_agg is correct.The method properly rewrites
count(distinct x)tocount_distinct(x)andstring_agg(distinct x)tostring_distinct_agg(x), with appropriate error handling forcount(distinct *).
598-629: PostgreSQL timestamp cast handling is correct.The method properly implements the PostgreSQL-style cast syntax like
timestamp '2005-04-02', validating the target type and creating the appropriate FUNCTION node withcastoperation.
631-649: Unary minus/complement detection is correct.The method properly identifies unary operators based on the previous branch context (after operators, left parens, commas, etc.) and returns the appropriate unary operator expression.
651-656: Array slice validation is minimal but sufficient.The method validates that a colon operator exists on the stack for array slicing syntax like
[1:].
658-675: Array qualifier validation ensures no whitespace.The method validates that the closing bracket immediately follows the opening bracket in array type qualifiers, throwing a descriptive error if whitespace is detected.
722-722: Call site updated to use extracted helper.The
parsePlushelper is correctly invoked for handling scientific notation.
771-771: Comma processing delegated to helper.The
processCommaAndPopOpStackhelper is correctly integrated.
777-777: Colon processing delegated to helper.The
processColonhelper is correctly integrated.
786-786: Array whitespace validation delegated to helper.The
validateWhitespacehelper is correctly integrated.
792-792: Dereference processing delegated to helper.The
processDereferencehelper is correctly integrated.
811-811: Array stack popping delegated to helper.The
popArrayOpStackhelper is correctly integrated.
831-832: Array qualifier validation integrated correctly.The helper methods are properly called with appropriate error handling.
842-843: Array outlier validation integrated correctly.The
validateArrayOutliershelper is called for the edge case of array slicing syntax.
845-899: Bracket handling restructured with break statements.The default case now has explicit break statements which improves readability and control flow clarity.
1010-1010: Function stack popping delegated to helper.The
popLiteralAndFunctionOpStackhelper is correctly integrated for handling closing parentheses.
1034-1034: Decimal processing delegated to helper.The
processDecimalhelper is correctly integrated.
1063-1064: Decimal rewrite processing integrated with continue.The
processDecimalRewritehelper returns a boolean to indicate if processing should continue, correctly used here.
1084-1084: GeoHash node creation delegated to helper.The
createGeoHashNodeStdhelper is correctly integrated.
1094-1094: GeoHash constant creation delegated to helper.The
createGeoHashConsthelper is correctly integrated.
1162-1162: BETWEEN AND handling delegated to helper.The
popAndOpStackhelper is correctly integrated for BETWEEN clause processing.
1294-1294: Floating point constant creation delegated to helper.The
createFloatingPointConstanthelper is correctly integrated.
1330-1330: PostgreSQL timestamp cast delegated to helper.The
processPgTimestampCasthelper is correctly integrated.
1383-1383: IS keyword validation delegated to helper.The
validateIsKeywordhelper is correctly integrated.
1394-1394: Aliased wildcard processing delegated to helper.The
processAliasedWildcardhelper is correctly integrated.
1417-1417: Unary minus processing delegated to helper.The
processUnaryMinushelper is correctly integrated.
1484-1484: Operation creation delegated to helper.The
createOperationhelper is correctly integrated for creating operation nodes.benchmarks/src/main/java/org/questdb/ExpressionParserBenchmark.java (6)
44-47: JMH annotations are correctly configured.Thread scope, average time mode, and nanosecond output unit are appropriate for this micro-benchmark.
49-49: Static configuration is acceptable for benchmark.Using
java.io.tmpdirfor the benchmark configuration is reasonable since no actual data persistence is needed.
51-63: Good variety of SQL expressions for benchmarking.The parameterized expressions cover arithmetic, logical, case statements, set operations, casts, and window functions, providing comprehensive coverage.
82-92: Resource management follows JMH best practices.Using
Level.Trialfor setup/teardown is appropriate for heavyweight resources like CairoEngine. The compiler is properly closed before the engine.
94-99: Benchmark method is correctly structured.Resetting the builder before each parse and returning the result prevents dead code elimination by JMH.
101-123: RpnBuilder implementation is correct and minimal.The listener correctly handles both query models and regular tokens, building a space-separated RPN representation suitable for benchmark output verification.
|
[PR Coverage check]😍 pass : 233 / 237 (98.31%) file detail
|
Score is in nanoseconds.
Before
After