feat(sql): support nested window functions with deduplication optimization#6643
feat(sql): support nested window functions with deduplication optimization#6643bluestreak01 merged 38 commits intomasterfrom
Conversation
…o be an argument to any other function (doh)
WalkthroughRemoved a parser-level rejection for window functions as function arguments; added nested-window extraction, deduplication, validation, and propagation in the optimiser; extended AST equality/hash to include WindowExpression; removed an identity-list utility; and expanded tests for nested and deduplicated window functions. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.5)core/src/main/java/io/questdb/griffin/SqlOptimiser.javaThanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
We need to test more complex queries like window function as an arg to an aggregation function or another window function. For example: CREATE TABLE x AS (
SELECT x, timestamp_sequence('2024-01-01', 1000000) AS ts FROM long_sequence(3)
) TIMESTAMP(ts) PARTITION BY DAY;
-- returns 0.0, 0.0, 0.0 while it should either give an error or return 6.0
SELECT sum(row_number() OVER ()) OVER () FROM x;
-- gives "Window function is not allowed in context of aggregation. Use sub-query." - this error looks spot-on
SELECT row_number() OVER (order by x), count() FROM x;
-- gives "Aggregate function cannot be passed as an argument" - this error is weird since the argument is a window function, not plain sum()
SELECT sum(sum(x) OVER () - sum(x) OVER ()) FROM x;
-- gives "window function called in non-window context, make sure to add OVER clause" - this error is weird since OVER clause is present
SELECT sum(row_number() OVER (order by x)) FROM x; |
|
@puzpuzpuz all addressed, all sqls, except for the group-by where window function is a key, execute correctly. |
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlOptimiser.java`:
- Around line 120-122: The hard-coded MAX_WINDOW_FUNCTION_NESTING_DEPTH in
SqlOptimiser should be made configurable or tied to the existing
configuration.getSqlWindowMaxRecursion(); replace the constant usage with a
lookup to configuration.getSqlWindowMaxRecursion() (or expose a new config
getter if semantics differ) and ensure any thrown error message references the
same user-visible "window nesting depth" value; update references to
MAX_WINDOW_FUNCTION_NESTING_DEPTH in SqlOptimiser and adjust the error text to
use the configuration value so the limit is consistent and user-controllable.
- Around line 2570-2710: replaceIfWindowFunction() registers a discovered window
function but does not call extractAndRegisterNestedWindowFunctions(), so nested
windows inside wrapper expressions (found via emitWindowFunctions()) stay
unextracted; fix by invoking extractAndRegisterNestedWindowFunctions(...)
immediately after you register/add the WindowExpression in
replaceIfWindowFunction() (use the same translatingModel, innerVirtualModel,
baseModel and pass depth+1 or appropriate depth context), mirror the call
pattern used from top-level/group-by paths, and ensure you still call
emitLiterals/replace with nextLiteral(alias) afterwards so AST mutation keeps
compareNodesExact() dedup semantics consistent with code paths that pre-extract
nested windows.
In
`@core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java`:
- Around line 9188-9208: The test method testWindowFunctionAsArgumentToAggregate
is brittle because SUM over row_number() can be printed as 6.0; update the
assertion in the assertQuery call to make the expectation deterministic by
either changing the expected output block from "6" to "6.0" or change the SQL
query ("SELECT sum(row_number() OVER (order by ts)) AS result FROM x") to cast
the aggregate to an integer (e.g. wrap with CAST(... AS INT)) so the result
prints as "6" consistently; modify the expected string or the SQL accordingly in
the assertQuery invocation to match the chosen approach.
🧹 Nitpick comments (4)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
183-185: EnsureinnerWindowModelsis always reset on optimiser reuse (includingclear()).You clear
innerWindowModelsat the start ofrewriteSelectClause0, which is good, butSqlOptimiser.clear()doesn’t clear it. IfSqlOptimiserinstances are pooled/reused across queries, keeping this list uncleared there is a footgun (retained references + surprising cross-query state).Also applies to: 7000-7003
core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (1)
8195-8205: Dedup “different window specs” test is solid; consider making the expression non-commutative.Using
+is fine for asserting “two distinct window specs must both exist”, but-(or another non-commutative op) can make intent clearer if future rewrites ever reorder operands/aliases.Possible tweak (optional)
- "select-virtual row_number1 + row_number column from (select-window [row_number() row_number over (order by id), row_number() row_number1 over (order by ts)] row_number() row_number over (order by id), row_number() row_number1 over (order by ts) from (select [id, ts] from x timestamp (ts)))", - "SELECT row_number() OVER (ORDER BY ts) + row_number() OVER (ORDER BY id) FROM x", + "select-virtual row_number1 - row_number column from (select-window [row_number() row_number over (order by id), row_number() row_number1 over (order by ts)] row_number() row_number over (order by id), row_number() row_number1 over (order by ts) from (select [id, ts] from x timestamp (ts)))", + "SELECT row_number() OVER (ORDER BY ts) - row_number() OVER (ORDER BY id) FROM x",core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
9122-9163: Nice coverage for “window expressions inside scalar funcs” and “window arithmetic inside aggregates”; consider asserting dedup via plan.These two cases are great for regression coverage, but they don’t explicitly validate the dedup optimization (i.e., identical
sum(id) OVER ()computed once). If the plan output is stable enough, adding a lightweightassertPlanNoLeakCheck(...)forSELECT sum(sum(id) OVER () + sum(id) OVER ()) ...would make the test fail if dedup regresses (rather than only catching functional breakage).core/src/test/java/io/questdb/test/griffin/WindowFunctionTest.java (1)
527-553: Consider adding an explicit deduplication verification test.This test uses
sum(id) OVER (ORDER BY ts)andlag(id) OVER (ORDER BY ts)both standalone and in the arithmetic expressionsum(id) OVER (ORDER BY ts) - lag(id) OVER (ORDER BY ts). While this implicitly tests deduplication (same window expressions should be computed once), consider adding a test that explicitly verifies the optimization is working, perhaps by checking query plan or execution statistics if available.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
core/src/main/java/io/questdb/griffin/SqlOptimiser.javacore/src/main/java/io/questdb/griffin/model/ExpressionNode.javacore/src/test/java/io/questdb/test/griffin/ExpressionParserTest.javacore/src/test/java/io/questdb/test/griffin/SqlParserTest.javacore/src/test/java/io/questdb/test/griffin/WindowFunctionTest.javacore/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
🧰 Additional context used
🧬 Code graph analysis (2)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (1)
core/src/main/java/io/questdb/std/IntList.java (1)
IntList(34-410)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (2)
core/src/main/java/io/questdb/griffin/SqlException.java (1)
SqlException(37-273)core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (1)
ExpressionNode(43-464)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
🔇 Additional comments (18)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (3)
83-85: LGTM on extended node comparison.The addition of
compareWindowExpressionsto the equality check ensures that window functions with identical arguments but differing window specifications (frame, partition, order) are correctly distinguished for deduplication purposes.
87-110: Thorough window expression comparison.The method comprehensively compares all components of a window specification—frame mode, row boundaries (including expression-based bounds), exclusion mode, and ignore-nulls flag. The recursive
compareNodesExactcall for boundary expressions correctly supports nested window function scenarios.
112-135: The null-safety concerns in this review are unfounded.The lists returned by
getPartitionBy(),getOrderBy(), andgetOrderByDirection()are guaranteed to be non-null: they areprivate finalfields initialized at declaration (WindowExpression lines 54–56) and are never reassigned. The only method that modifiesorderByandorderByDirectionisaddOrderBy()(lines 75–77), which adds to both lists simultaneously, maintaining their size invariant. Theclear()method also clears all three lists together. Therefore, the implicit assumption thatgetOrderByDirection().size()matchesgetOrderBy().size()is a guaranteed invariant, not a potential bug.Likely an incorrect or invalid review comment.
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
6622-6694: Good integration points for nested extraction in select-window and group-by paths.Calling
extractAndRegisterNestedWindowFunctions(...)before registering the top-level window column (and before group-by processing) is the right place to normalize the AST and avoid later “window function called in non-window context” style errors. The ordering withaddMissingTablePrefixesForGroupByQueries(...)before extraction also makes sense since it validates againstbaseModel.core/src/test/java/io/questdb/test/griffin/ExpressionParserTest.java (1)
2286-2294: LGTM!The test correctly verifies the RPN output for nested window functions. The expected RPN
row_number over () sum over ()follows the established pattern where arguments (including window function results) precede the function name in RPN notation. This aligns well with the PR objective of supporting nested window functions.core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (7)
8164-8176: Good coverage for “window arithmetic inside scalar function” (abs) path.This test exercises the key relaxation (window funcs can participate in larger expressions passed to scalar functions) and validates correct extraction into
select-windowplus alias reuse inselect-virtual.
8178-8193: CASE expression test updated appropriately for dedup behavior.Nice adjustment to reflect that identical
row_number() OVER (ORDER BY ts)occurrences should be extracted once and referenced via the single projected alias.
8207-8220: Window-as-argument-to-window test matches the intended “layer extraction” model.The expected structure (inner
select-windowforrow_number, outerselect-windowforsum(row_number) OVER ()) is a good regression guard for the optimiser change.
8222-8233: Good parser-level coverage for “window as argument to aggregate”.This asserts the window function is computed in a window layer and the aggregate consumes the alias in a group-by layer, which is the critical rewrite for this feature.
8235-8250: Three-level nesting test is a strong end-to-end parser/model assertion.Chaining deepest-first via multiple
select-windowlayers is exactly what can regress easily; this test will catch it.
8252-8264: Nice coverage for “multiple different nested windows in one expression”.Validates extraction of multiple inner windows and correct expression reconstruction for the outer window function.
8266-8275: Depth-limit test is valuable; keep an eye on the hard-coded error position.The
max=8assertion is great; the exact position (43) may become brittle if formatting/normalization changes, so this is worth re-checking if the parser’s tokenization offsets evolve.core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
9210-9272: LGTM: covers window-as-arg (cast, scalar fn) and nested window-of-window.The set cleanly exercises the previously-blocked shapes (window inside
cast(), insideabs(), and nested window functions). The nested case is also order-robust (sum of row_numbers is invariant to row order), which is good for test stability.core/src/test/java/io/questdb/test/griffin/WindowFunctionTest.java (5)
30-35: LGTM! Comprehensive test suite for window function execution.This new test file provides excellent end-to-end coverage for the nested window function feature. The tests systematically cover:
- Basic window functions with different specifications
- Window functions in arithmetic expressions
- Window functions as arguments to other functions, aggregates, and other window functions
- Complex scenarios with CASE expressions, casts, and frame specifications
293-313: Test correctly validates the PR objective.This test addresses the specific case mentioned in the PR comments (
SELECT sum(row_number() OVER (order by ts)) FROM x). The expected result of6is correct: row_number() returns 1, 2, 3 for the three rows, and summing them yields 6.
399-424: Good coverage of different OVER specifications in CASE branches.This test validates an important edge case where the THEN and ELSE branches use window functions with different window specifications (cumulative sum vs partitioned sum). The expected results correctly reflect that each branch computes its window independently.
661-683: Thorough NULL propagation testing.The test correctly verifies that NULL values propagate through window function arithmetic. The expected results properly show NULL when either operand is NULL (e.g.,
id - lag(id)is NULL whenidis NULL or whenlag(id)is NULL).
355-377: No action needed. The nesting depth limit test already exists in SqlParserTest.java (testWindowFunctionNestingDepthLimit()), which correctly validates that 9 levels of nesting exceeds the limit of 8 and triggers the appropriate error. The test in WindowFunctionTest.java appropriately focuses on execution validation for valid nested window functions.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
Show resolved
Hide resolved
|
@CodeRabbit review again, check if the comments have been addressed |
🧩 Analysis chain🏁 Script executed: Length of output: 321586
Two follow-ups I can take if you’d like:
Let me know if you want me to draft those additions. 🏁 Script executed: Length of output: 12838
Addressed by tests
Gaps I still recommend closing in this PR
If you want, I can draft both tests now. Otherwise, from my side the previously raised functional cases are addressed; only the above two assertions would make the coverage airtight. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/model/ExpressionNode.java`:
- Around line 148-155: The hashCode logic in ExpressionNode uses Chars.hashCode
for FUNCTION and LITERAL tokens but compareNodesExact compares those tokens
case-insensitively, causing a hash/equality contract violation; update the
branch inside the token != null check for node.type == FUNCTION || node.type ==
LITERAL to call Chars.lowerCaseHashCode(node.token) instead of
Chars.hashCode(node.token) so hashing matches the case-insensitive equality used
elsewhere (leave the else branch using node.token.hashCode() intact).
In `@core/src/main/java/io/questdb/griffin/SqlOptimiser.java`:
- Around line 2631-2702: extractNestedWindowFunctions() currently materializes
nested window functions into innerWindowModel without validating their WINDOW
SPECs; call validateNoWindowFunctionsInWindowSpec(...) after you extract nested
windows and before registering/adding the current WindowExpression. Concretely,
inside the node.windowExpression != null branch (after
extractAndRegisterNestedWindowFunctions(...) and before
innerWindowModel.addBottomUpColumn(wc) / registerWindowFunction(...)), invoke
validateNoWindowFunctionsInWindowSpec on the current WindowExpression (or its
WindowSpec) so any window functions inside PARTITION BY/ORDER BY are caught for
extracted/nested windows.
♻️ Duplicate comments (3)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (2)
121-122: Make nesting depth limit configurable / consistent with existing recursion limit.
Right nowMAX_WINDOW_FUNCTION_NESTING_DEPTHis hard-coded whilemaxRecursioncomes fromconfiguration.getSqlWindowMaxRecursion(). If these are intentionally different limits, the user-facing error should clearly distinguish them; otherwise, tie this to config.
5105-5134: Nested windows inside wrapper expressions still won’t be extracted; dedup hash map is bypassed.
replaceIfWindowFunction()(used byemitWindowFunctions()) registers the outer window intowindowModelbut does not callextractAndRegisterNestedWindowFunctions(), so nested windows can remain embedded in the window’s AST (e.g.,abs(sum(row_number() over()) over())). It also doesn’tregisterWindowFunction(), so global dedup misses windows discovered via this path.Proposed diff (extract + validate + hash-dedup + registration)
private ExpressionNode replaceIfWindowFunction( `@Transient` ExpressionNode node, QueryModel windowModel, QueryModel translatingModel, QueryModel innerVirtualModel, QueryModel baseModel ) throws SqlException { if (node != null && node.windowExpression != null) { - // Check if an identical window function already exists in the model - ObjList<QueryColumn> existingColumns = windowModel.getBottomUpColumns(); - for (int i = 0, n = existingColumns.size(); i < n; i++) { - QueryColumn existing = existingColumns.getQuick(i); - if (ExpressionNode.compareNodesExact(node, existing.getAst())) { - // Found duplicate - reuse the existing alias - return nextLiteral(existing.getAlias()); - } - } + validateNoWindowFunctionsInWindowSpec(node.windowExpression); + extractAndRegisterNestedWindowFunctions(node, translatingModel, innerVirtualModel, baseModel, 0); + + // Prefer global hash-based dedup (covers all code paths) + CharSequence existingAlias = findDuplicateWindowFunction(node); + if (existingAlias != null) { + return nextLiteral(existingAlias); + } WindowExpression wc = node.windowExpression; // Create alias for the window column if not already set CharSequence alias = wc.getAlias(); if (alias == null) { alias = createColumnAlias(node, windowModel); wc.of(alias, node); } windowModel.addBottomUpColumn(wc); + registerWindowFunction(node, wc); // Emit literals referenced by the window column to inner models emitLiterals(node, translatingModel, innerVirtualModel, true, baseModel, true); return nextLiteral(alias); } return node; }core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
9196-9216: Fix potential output formatting flakiness forsum(row_number() OVER ...)This still looks vulnerable to rendering as
6.0depending on typing/printing rules forsum(...). The earlier review already called this out; please make the assertion deterministic (either expect6.0or cast).Proposed fix (adjust expected output)
@@ assertQuery( """ result - 6 + 6.0 """, "SELECT sum(row_number() OVER (order by ts)) AS result FROM x",
🧹 Nitpick comments (8)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (4)
181-190: New optimiser state: ensure it’s reset on optimiser reuse/error paths (not just per select rewrite).
innerWindowModels,windowFunctionHashMap, andwindowColumnListPoolintroduce long-lived references; consider clearing them inSqlOptimiser.clear()(and/or other lifecycle resets) to avoid holding onto pooledQueryColumn/QueryModelgraphs across failed compiles.
2576-2630: Pool/map reset should also clear pooled lists to avoid retaining old QueryColumn refs.
clearWindowFunctionHashMap()resets the map and pool cursor, butwindowColumnListPoolentries beyond the newwindowColumnListPoolPoscan keep stale references indefinitely (until reused). Consider clearing all pooled lists (or at least all lists ever allocated) on reset.Proposed diff
private void clearWindowFunctionHashMap() { windowFunctionHashMap.clear(); windowColumnListPoolPos = 0; + // Avoid retaining QueryColumn refs across compiles when pool entries are not reused. + for (int i = 0, n = windowColumnListPool.size(); i < n; i++) { + windowColumnListPool.getQuick(i).clear(); + } }
2704-2779: Aliasing hack looks correct; considerputIfAbsent-style helper for clarity.
ThealiasMap.keyIndex(alias) > -1check is correct for “absent”, but it’s easy to misread. If there’s a canonical helper inLowerCaseCharSequenceObjHashMap(or a local wrapper), it’d reduce mistakes later.
7111-7115: Good reset point; consider also resetting in broader optimiser lifecycle.
ClearinginnerWindowModelsandwindowFunctionHashMapat the start ofrewriteSelectClause0()is correct; still consider also doing it inclear()to cover early-failure / reuse scenarios.core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
9021-9079: Good coverage forRESPECT NULLSdedup, but plan-shape assertions may be brittle.Nice addition validating that
lead/lag ... respect nulls(default) deduplicates with the plainlead/lagand the plan lists only 4 functions. Consider keeping these plan expectations as focused as possible on the dedup invariant (count/unique functions) to reduce churn if the optimizer adds/removes wrapper nodes (e.g., extraSelectedRecord).core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (3)
8195-8205: Dedup-negative test looks good; watch for ordering brittleness
The “different ORDER BY => no dedup” intent is solid. Only minor risk: if internal window extraction order becomes canonicalized differently later, the exact “row_number vs row_number1” assignment could flip and fail the string assertion despite correct behavior. If you’ve seen flakiness here before, consider asserting via parsed model inspection instead of full-string equality.
8240-8251: Good coverage for window-as-argument-to-aggregate
This nicely exercises the “window computed first, then aggregated” shape. If the PR discussion item “SELECT sum(sum(x) OVER () - sum(x) OVER ()) FROM x” isn’t already covered elsewhere, this is the natural area to add it.
8311-8320: Nesting depth limit test is valuable; consider minimizing position brittleness
As a regression test this is great. The only concern is the hard-coded error position (Line 8316) being sensitive to minor parser formatting changes. IfassertSyntaxErrorrequires an exact position match, consider using the earliest stable token position (or loosening the position assertion) to reduce churn.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/io/questdb/griffin/SqlOptimiser.javacore/src/main/java/io/questdb/griffin/model/ExpressionNode.javacore/src/test/java/io/questdb/test/griffin/SqlParserTest.javacore/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
🧰 Additional context used
🧬 Code graph analysis (3)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (1)
ExpressionNode(43-526)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (2)
core/src/main/java/io/questdb/std/IntList.java (1)
IntList(34-410)core/src/main/java/io/questdb/std/Chars.java (1)
Chars(43-1646)
core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (1)
core/src/main/java/io/questdb/cairo/ColumnType.java (1)
ColumnType(58-1168)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
- GitHub Check: build
🔇 Additional comments (24)
core/src/main/java/io/questdb/griffin/model/ExpressionNode.java (4)
31-31: LGTM!Import is correctly added for
IntListusage in the new window expression comparison and hashing methods.
82-85: LGTM!The extension to include window expression comparison is correctly integrated with the existing comparison logic.
87-137: LGTM!Comprehensive comparison of all
WindowExpressionattributes including frame specification, boundary expressions, PARTITION BY, and ORDER BY with directions. Null handling and size checks are correctly implemented.
167-199: LGTM!The
hashWindowExpressionimplementation correctly hashes all fields that are compared incompareWindowExpressions, maintaining consistency between hash and equality for window expressions.core/src/main/java/io/questdb/griffin/SqlOptimiser.java (4)
2511-2574: Good separation of traversal stacks for window extraction.
UsingwindowNodeStackavoidsemitLiterals()clobbering traversal state while walking expressions.
2780-2807: Validation messages are clear and targeted.
The PARTITION BY vs ORDER BY error text will help users pinpoint where nesting is disallowed.
6739-6804: Good: top-level window + aggregate paths now extract nested windows and use hash-based dedup.
In particular:validateNoWindowFunctionsInWindowSpec()+extractAndRegisterNestedWindowFunctions()+findDuplicateWindowFunction()aligns with the PR’s semantics.
7677-7687: Chaining inner window models deepest-first looks correct.
The “deepest-first added, forward-iterated” invariant matches the required execution order (inner windows computed closer to base).core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (5)
9130-9150: Arithmetic-of-window-functions inside a regular aggregate is well-covered.The expected result and inline explanation match the query shape and helps guard the original regression.
9152-9171: Covers the “original bug case” clearly.
abs(sum(x) OVER () - sum(x) OVER ())is a good minimal reproducer and the per-row expected output is explicit.
9218-9236: Covers window-as-argument tocast()cleanly.
9238-9256: Covers window-as-argument to a scalar function (abs) cleanly.
9258-9280: Covers nested window functions with a clear expected result.core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (11)
8164-8176: Nice coverage for “window arithmetic inside a regular function”
This test clearly asserts the intended rule (“window funcs can appear inside an expression that is the function argument”), and the expected model shape (select-window extraction + select-virtual referencing aliases) matches the PR design.
8179-8193: Good dedup assertion for repeated window function in CASE
The updated expectation (“singlerow_number()extracted once, referenced twice”) is the right regression check for the new dedup behavior.
8207-8223: Dedup-positive test for PARTITION BY/ORDER BY is solid
This is a good “same PARTITION/ORDER => dedup” guard, and it also verifies that duplicate references in projection survive as separate output columns.
8225-8238: Great parser-level coverage for window-as-argument-to-window
The expected layeredselect-windowchaining matches the implementation approach described in the PR (extract nested windows into their own layers).
8253-8268: Three-level nesting test is a strong regression guard
This asserts the depth-first extraction and alias threading across multiple window layers; great for preventing accidental “flattening” regressions.
8270-8282: Covers multiple distinct inner windows in one expression
Good: ensures extraction handles multiple nested window nodes in a single expression tree, and does not incorrectly dedup different functions.
8284-8296: Good dedup test for duplicate nested windows
This is the exact “compute row_number once, reuse twice” scenario the dedup optimization is supposed to catch.
8298-8309: Good shared-inner-window test across multiple outer windows
This checks the “common subexpression” case across different outer window functions (sum/avg), which is easy to miss.
8322-8330: Good negative test: window function disallowed in PARTITION BY
Covers the direct forbidden placement and ensures the message is clear.
8332-8340: Good negative test: window function disallowed in window ORDER BY
Covers the other common forbidden placement, and the error text is specific enough to be user-actionable.
8342-8351: Good negative test for nested usage inside PARTITION BY expression
This is an important variant: it prevents “only top-level check” bugs by ensuring nested expressions are scanned too.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
|
@CodeRabbit the issues you raised have all been fixed, i am getting bored, find me new issues to fix |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 70258 🏁 Script executed: Length of output: 10637 🏁 Script executed: Length of output: 5693
If you want, I can draft the tests plus the two small validation patches. [plan_code_changes]
If you’re happy, I’ll proceed to draft the patches/tests exactly as above. 🏁 Script executed: Length of output: 323 🏁 Script executed: Length of output: 437
[plan_code_changes] Plan: Add WHERE/JOIN ON window-function validation + negative/dedup testsPhase 1: Add validation helper and wire into WHERE/JOIN ON
Phase 2: Extend tests in SqlParserTest and WindowFunctionTest
Once you implement these, you'll have: The dedup canonicalization (default frame vs explicit) and deepClone immutability are nice-to-haves you can tackle after these core correctness fixes if you're still bored. |
|
@CodeRabbit can you confirm if critical and major issues have been addressed? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 3064 🏁 Script executed: Length of output: 5440
✅ Addressed (Critical Functional)
❌ NOT Addressed (Critical Validation Gaps)Two blocking validation gaps remain:
These two gaps are critical because they violate SQL semantics and could produce incorrect results. The implementation in my earlier Everything else (the functional heavy lifting for nested windows and deduplication) is solid. Want me to finalize those two validation patches now? |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/test/java/io/questdb/test/griffin/model/ExpressionNodeTest.java`:
- Around line 43-45: Remove the brittle identity-hash check in
ExpressionNodeTest by deleting the Assert.assertNotEquals(token1.hashCode(),
token2.hashCode()) assertion; keep Assert.assertNotSame(token1, token2) and the
deepHashCode equality assertion (or, if you really want a guard, replace the
removed assertion with Assume.assumeTrue(token1.hashCode() != token2.hashCode())
to skip the test on a rare hash collision).
core/src/test/java/io/questdb/test/griffin/model/ExpressionNodeTest.java
Outdated
Show resolved
Hide resolved
The assertNotEquals(token1.hashCode(), token2.hashCode()) check relied on identity-based Object.hashCode() being different for two instances, which could fail on rare hash collisions. Kept assertNotSame for instance check. Co-Authored-By: Claude Opus 4.5 <[email protected]>
…l in nested window functions When processing queries like `SELECT x as a, x as b, sum(sum(a) OVER ()) OVER () FROM x`, projection aliases (e.g., "a b" representing "x as b") were incorrectly being copied from innerVirtualModel to translatingModel. This caused the innermost select-choose layer to contain redundant projection aliases that should only appear in outer layers. The fix adds a check to distinguish between: - Projection aliases (token is an alias for a different column) - should NOT be copied - Renamed table columns (token is a table column name) - should be copied Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
5221-5257: Register window functions discovered viaemitWindowFunctions()in the hash map.
replaceIfWindowFunction()dedups only by scanningwindowModel, but it doesn’t register new entries inwindowFunctionHashMap. If a window function is discovered inside an expression before an identical top‑level window column is processed, the later hash-based dedup won’t see the earlier one and can compute duplicates. Consider reusing the hash map here for consistent O(1) dedup.💡 Proposed fix
if (node != null && node.windowExpression != null) { // Extract any nested window functions from arguments to inner window model. // This modifies the AST in-place, replacing nested windows with literal references. extractAndRegisterNestedWindowFunctions(node, translatingModel, innerVirtualModel, baseModel, 0); // Validate that PARTITION BY and ORDER BY don't contain window functions validateNoWindowFunctionsInWindowSpec(node.windowExpression); - // Check if an identical window function already exists in the model - // (must be done AFTER extraction so we compare the modified AST) - ObjList<QueryColumn> existingColumns = windowModel.getBottomUpColumns(); - for (int i = 0, n = existingColumns.size(); i < n; i++) { - QueryColumn existing = existingColumns.getQuick(i); - if (ExpressionNode.compareNodesExact(node, existing.getAst())) { - // Found duplicate - reuse the existing alias - return nextLiteral(existing.getAlias()); - } - } + // O(1) duplicate check + int hash = ExpressionNode.deepHashCode(node); + CharSequence existingAlias = findDuplicateWindowFunction(node, hash); + if (existingAlias != null) { + return nextLiteral(existingAlias); + } WindowExpression wc = node.windowExpression; // Create alias for the window column if not already set CharSequence alias = wc.getAlias(); if (alias == null) { alias = createColumnAlias(node, windowModel); wc.of(alias, node); } windowModel.addBottomUpColumn(wc); + registerWindowFunction(wc, hash); // Emit literals referenced by the window column to inner models emitLiterals(node, translatingModel, innerVirtualModel, true, baseModel, true); return nextLiteral(alias); }
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlOptimiser.java`:
- Around line 1919-1940: collectReferencedAliases currently walks only the
generic expression tree and misses aliases referenced from window
specifications; update collectReferencedAliases to detect WindowExpression nodes
(or the class representing windowed expressions) and recursively traverse its
partitionBy and orderBy lists and any frame boundary expressions (e.g., frame
start/end expressions) in addition to lhs/rhs/args so that aliases referenced in
PARTITION BY / ORDER BY / frame bounds are added to the aliases set; reference
the collectReferencedAliases method and the WindowExpression (or equivalent)
node fields partitionBy, orderBy and frame bounds when making the change.
|
@CodeRabbit review and check if concern has been addressed |
🧩 Analysis chain
🏁 Script executed: Length of output: 13372
✅ All Critical Validations — IMPLEMENTED & TESTED
✅ All Functional Features — WORKING & TESTED
📋 Code Quality
🎯 SummaryZero blocking issues remain. The PR is ready for merge from a functional correctness and validation perspective: ✅ SQL standard compliance (windows disallowed in WHERE/JOIN ON/PARTITION BY/ORDER BY/GROUP BY) Great job addressing all the feedback! 🧠 Learnings used✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
5246-5282: Dedup hash map is not updated for windows found viaemitWindowFunctions()(order‑dependent duplicates).
replaceIfWindowFunction()dedups via linear scan but never registers the window inwindowFunctionHashMap. If a window function is first discovered inside an expression (e.g.,abs(row_number() over())) and later appears as a top‑level window column, the hash‑based dedup path won’t see the earlier one, causing duplicate window computation. This undermines the O(1) dedup goal.♻️ Suggested fix (use hash map consistently)
- // Check if an identical window function already exists in the model - // (must be done AFTER extraction so we compare the modified AST) - ObjList<QueryColumn> existingColumns = windowModel.getBottomUpColumns(); - for (int i = 0, n = existingColumns.size(); i < n; i++) { - QueryColumn existing = existingColumns.getQuick(i); - if (ExpressionNode.compareNodesExact(node, existing.getAst())) { - // Found duplicate - reuse the existing alias - return nextLiteral(existing.getAlias()); - } - } + // Check if an identical window function already exists (hash-based) + int hash = ExpressionNode.deepHashCode(node); + CharSequence existingAlias = findDuplicateWindowFunction(node, hash); + if (existingAlias != null) { + return nextLiteral(existingAlias); + } WindowExpression wc = node.windowExpression; // Create alias for the window column if not already set CharSequence alias = wc.getAlias(); if (alias == null) { alias = createColumnAlias(node, windowModel); wc.of(alias, node); } windowModel.addBottomUpColumn(wc); + registerWindowFunction(wc, hash);
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlOptimiser.java`:
- Around line 3198-3225: The method findWindowFunctionOrNamePosition currently
always iterates node.args which breaks the established traversal convention
using paramCount to distinguish binary nodes (using lhs/rhs) from variadic nodes
(using args), risking double-traversal and pooled-node misuse; update the
traversal to follow the pattern used elsewhere in SqlOptimiser.java by
retrieving the paramCount for the node (via node.paramCount or equivalent) and
if paramCount < 3 recurse only into lhs and rhs, otherwise iterate node.args,
while preserving the initial checks for node.windowExpression and
FunctionFactoryCache.isPureWindowFunction(node.token) so detection remains
correct.
♻️ Duplicate comments (1)
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
9183-9203: Potentially brittle expected output for sum(row_number()).The result may render as
6.0rather than6depending on aggregate formatting; consider normalizing via explicit cast or expected output.
…osition Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
@CodeRabbit review again, your last concern should be addressed now |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain
🏁 Script executed: Length of output: 13883
✅ Complete Implementation VerifiedCore Validations — All Present & Working
Test Coverage — ComprehensiveNegative tests (disallowed contexts):
Positive tests (allowed contexts):
Functional Features — All Working
Code Quality
🚀 Final Status: READY FOR MERGEThe PR is functionally complete, thoroughly tested, and implements all SQL standard compliance requirements for window functions. Excellent work addressing all the feedback! 🧠 Learnings usedIf you found this review helpful, would you consider giving us a shout-out on X? Thank you for using CodeRabbit! |
replaceIfWindowFunction() was adding window functions to windowModel but not registering them in windowFunctionHashMap. This caused findDuplicateWindowFunction() to miss these windows when processing later identical window functions, resulting in "Duplicate column" errors for queries like: SELECT abs(row_number() over(order by x)), row_number() over(order by x) Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
@CodeRabbit review again, addressed outside of diff comment |
🧩 Analysis chain
🏁 Script executed: Length of output: 12761
✅ Latest Fix AppliedCommit 3e6832a: "fix(sql): register window functions in hash map for proper deduplication"
✅ All Critical Validations — VERIFIED IN PLACE
✅ All Tests — VERIFIED PRESENT
✅ Core Features — VERIFIED WORKING
🚀 Final Verdict: READY FOR MERGEThe deduplication registration fix addresses the likely cause of the enterprise CI failure. All SQL standard compliance validations are in place, comprehensively tested, and working correctly. The PR is functionally complete and production-ready. 🧠 Learnings used✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@core/src/main/java/io/questdb/griffin/SqlOptimiser.java`:
- Around line 8671-8687: The validation in validateNoWindowFunctionsInWindowSpec
currently uses checkForChildWindowFunctions which misses standalone window
function names (e.g., row_number() without OVER); replace the calls that inspect
partitionBy and orderBy nodes to use checkForWindowFunctionsOrNames instead of
checkForChildWindowFunctions so pure window function identifiers are detected in
PARTITION BY and ORDER BY of the WindowExpression and cause the same
SqlException as other contexts.
♻️ Duplicate comments (1)
core/src/main/java/io/questdb/griffin/SqlOptimiser.java (1)
121-122: Consider wiring the nesting limit to configuration.This matches a prior review concern; leaving a hard-coded cap may diverge from
configuration.getSqlWindowMaxRecursion()expectations.
🧹 Nitpick comments (1)
core/src/test/java/io/questdb/test/griffin/SqlOptimiserTest.java (1)
5909-5939: Consider asserting results for q2/q3 (with stable aliases).Plan checks cover deduplication, but adding result assertions would protect against execution regressions; aliasing keeps headers deterministic.
🧪 Proposed additions
- String q2 = "select abs(row_number() over(partition by x order by ts)), row_number() over(partition by x order by ts) from t"; + String q2 = "select abs(row_number() over(partition by x order by ts)) as abs_rn, row_number() over(partition by x order by ts) as rn from t"; @@ assertPlanNoLeakCheck( q2, """ VirtualRecord functions: [abs(row_number),row_number] Window functions: [row_number() over (partition by [x])] PageFrame Row forward scan Frame forward scan on: t """ ); + assertSql(""" + abs_rn\trn + 1\t1 + 1\t1 + 1\t1 + """, q2); @@ - String q3 = "select abs(row_number() over(order by x)) + abs(row_number() over(order by x)), row_number() over(order by x) from t"; + String q3 = "select abs(row_number() over(order by x)) + abs(row_number() over(order by x)) as double_rn, row_number() over(order by x) as rn from t"; @@ assertPlanNoLeakCheck( q3, """ VirtualRecord functions: [abs(row_number)+abs(row_number),row_number] CachedWindow orderedFunctions: [[x] => [row_number()]] PageFrame Row forward scan Frame forward scan on: t """ ); + assertSql(""" + double_rn\trn + 2\t1 + 4\t2 + 6\t3 + """, q3);
[PR Coverage check]😍 pass : 361 / 393 (91.86%) file detail
|
closes #4018
Summary
Features
1. Window Functions as Arguments
Enables queries like:
2. Nested Window Functions
Enables window functions as arguments to other window functions:
Each nested window function is extracted to its own
select-windowsubquery layer.3. Window Functions in Aggregates
Enables window functions as arguments to regular aggregate functions:
4. Duplicate Window Function Deduplication
Identical window functions are now computed only once:
Deduplication works at multiple levels:
SELECT row_number() OVER (...), row_number() OVER (...) FROM tSELECT sum(row_number() OVER () + row_number() OVER ()) OVER () FROM tWindow functions with different specifications (ORDER BY, PARTITION BY, frame) are correctly kept separate.
5. Validation: Window Functions Not Allowed in PARTITION BY / ORDER BY
Clear error messages when window functions are used incorrectly:
Implementation
SqlOptimiser.java
extractNestedWindowFunctions()andextractAndRegisterNestedWindowFunctions()methodsselect-windowquery modelMAX_WINDOW_FUNCTION_NESTING_DEPTH = 8limitvalidateNoWindowFunctionsInWindowSpec()to prohibit window functions in PARTITION BY/ORDER BYO(1) Hash-Based Deduplication
windowFunctionHashMapfor O(1) duplicate detection (hash → list of QueryColumns)windowColumnListPoolto avoid allocationsfindDuplicateWindowFunction()- O(1) hash lookup + O(k) comparison (k = collision count, typically 1)registerWindowFunction()- registers window function for future deduplicationExpressionNode.java
compareNodesExact()to compare window expressionscompareWindowExpressions()method that compares all window specification fields:deepHashCode()andhashWindowExpression()for hash-based deduplicationcompareNodesExact()Test Plan
WindowFunctionTestexecution tests pass (213 tests)SqlParserTest#testWindowFunction*parser tests pass (19 tests)🤖 Generated with Claude Code