fix(sql): fix intermittent failures with SQL window functions#6030
fix(sql): fix intermittent failures with SQL window functions#6030bluestreak01 merged 9 commits intomasterfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughThis change refactors window function factories to centralize window parameter validation and state management within the Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant SqlExecutionContext
participant WindowFunctionFactory
participant WindowContext
Client->>WindowFunctionFactory: newInstance(...)
WindowFunctionFactory->>SqlExecutionContext: getWindowContext()
SqlExecutionContext-->>WindowFunctionFactory: WindowContext
WindowFunctionFactory->>WindowContext: validate(position, supportNullsDesc)
WindowContext-->>WindowFunctionFactory: (throws SqlException if invalid)
WindowFunctionFactory->>WindowContext: getRowsLo()/getRowsHi()/etc.
WindowFunctionFactory-->>Client: Function instance
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes detected. All code changes are directly related to window function validation, state management, and regression testing as required by the linked issue. ✨ Finishing Touches🧪 Generate unit tests
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
|
@CodeRabbit please review |
|
@bluestreak01 I'll review the changes in this PR that fixes the race condition with SQL window functions. ✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (1)
core/src/main/java/io/questdb/griffin/engine/functions/window/LastValueDoubleWindowFunctionFactory.java (1)
1764-1771: Incorrect constant used when detecting an unbounded lower bound in the execution-plan printer
rowLois compared toLong.MAX_VALUE, but an unbounded preceding lower bound is encoded asLong.MIN_VALUEeverywhere else in the engine.
This makestoPlan()emit “unbounded” only for the impossible case ofLong.MAX_VALUE, while the validUNBOUNDED PRECEDINGpath prints an absolute number instead.- if (rowLo == Long.MAX_VALUE) { + if (rowLo == Long.MIN_VALUE) { sink.val("unbounded"); } else { sink.val(Math.abs(rowLo)); }
🧹 Nitpick comments (3)
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
2252-2267: Test logic is correct but consider extracting to separate method.The window function test correctly validates the "exclude current row" behavior. The expected results properly reflect that when the current row is excluded from the frame, the function returns the last value from the remaining rows in the frame (or null when the frame becomes empty).
However, this inline test addition within an existing method makes the test harder to maintain and understand.
Consider extracting this test case to a separate method for better organization:
+@Test +public void testLastValueWithCurrentRowExclusion() throws Exception { + assertMemoryLeak(() -> { + execute("create table tab (ts timestamp, i long, j long, d double, s symbol, c VARCHAR) timestamp(ts)"); + execute("insert into tab select x::timestamp, x/4, case when x % 3 = 0 THEN NULL ELSE x%5 END, x%5, 'k' || (x%5) ::symbol, 'k' || x from long_sequence(7)"); + + assertQueryNoLeakCheck( + "ts\ti\tj\tlast_value\n" + + "1970-01-01T00:00:00.000001Z\t0\t1\tnull\n" + + "1970-01-01T00:00:00.000002Z\t0\t2\t1.0\n" + + "1970-01-01T00:00:00.000003Z\t0\tnull\t2.0\n" + + "1970-01-01T00:00:00.000004Z\t1\t4\tnull\n" + + "1970-01-01T00:00:00.000005Z\t1\t0\t4.0\n" + + "1970-01-01T00:00:00.000006Z\t1\tnull\t0.0\n" + + "1970-01-01T00:00:00.000007Z\t1\t2\tnull\n", + "select ts, i, j,\n" + + "last_value(j) over (partition by i order by ts rows between 2 preceding and current row exclude current row)\n" + + "from tab;", + "ts", + false, + true + ); + }); +}core/src/main/java/io/questdb/griffin/engine/functions/window/LastValueDoubleWindowFunctionFactory.java (2)
2039-2045: Minor wording glitch in generated execution plan
toPlan()currently outputs “preceding between and”, which contains a duplicated “between” token and does not match SQL grammar.- sink.val(Math.abs(rowsLo)); - sink.val(" preceding between and "); + sink.val(Math.abs(rowsLo)); + sink.val(" preceding and ");
98-110: Large duplication between NULL-handling branches
generateIgnoreNullsFunction()andgenerateRespectNullsFunction()are ~90 % identical, differing only in the concrete leaf classes they instantiate.
Consider extracting the common decision tree into a single private helper that takes functional interfaces (or anenum) for the two variants.
This will:• cut ~700 lines of boilerplate
• reduce the chance of future divergences between the two code paths
• simplify testing and maintenance.Also applies to: 284-296
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
core/src/main/java/io/questdb/griffin/engine/functions/window/AbstractWindowFunctionFactory.java(0 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/AvgDoubleWindowFunctionFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/CountConstWindowFunctionFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/CountFunctionFactoryHelper.java(1 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/FirstValueDoubleWindowFunctionFactory.java(4 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/LastValueDoubleWindowFunctionFactory.java(4 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/MaxDoubleWindowFunctionFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/MinDoubleWindowFunctionFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/functions/window/SumDoubleWindowFunctionFactory.java(2 hunks)core/src/main/java/io/questdb/griffin/engine/window/WindowContext.java(3 hunks)core/src/main/java/io/questdb/griffin/engine/window/WindowContextImpl.java(6 hunks)core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java(2 hunks)
💤 Files with no reviewable changes (1)
- core/src/main/java/io/questdb/griffin/engine/functions/window/AbstractWindowFunctionFactory.java
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: in doublearrayaccessfunctionfactory.java, overflow validation for array index casting from long to i...
Learnt from: mtopolnik
PR: questdb/questdb#5997
File: core/src/main/java/io/questdb/griffin/engine/functions/array/DoubleArrayAccessFunctionFactory.java:127-131
Timestamp: 2025-07-31T08:21:17.390Z
Learning: In DoubleArrayAccessFunctionFactory.java, overflow validation for array index casting from long to int is performed in the validateIndexArgs() method rather than at each individual cast site. The validateIndexArgs() method is called early in newInstance() to validate all index arguments, making subsequent casts safe.
Applied to files:
core/src/main/java/io/questdb/griffin/engine/functions/window/MinDoubleWindowFunctionFactory.javacore/src/main/java/io/questdb/griffin/engine/functions/window/SumDoubleWindowFunctionFactory.javacore/src/main/java/io/questdb/griffin/engine/functions/window/AvgDoubleWindowFunctionFactory.javacore/src/main/java/io/questdb/griffin/engine/functions/window/MaxDoubleWindowFunctionFactory.javacore/src/main/java/io/questdb/griffin/engine/functions/window/CountConstWindowFunctionFactory.javacore/src/main/java/io/questdb/griffin/engine/window/WindowContextImpl.javacore/src/main/java/io/questdb/griffin/engine/functions/window/FirstValueDoubleWindowFunctionFactory.java
📚 Learning: in questdb ilp tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded ins...
Learnt from: jerrinot
PR: questdb/questdb#5996
File: core/src/test/java/io/questdb/test/cutlass/http/line/LineHttpSenderTest.java:350-356
Timestamp: 2025-08-04T09:16:27.366Z
Learning: In QuestDB ILP tests, maximum dimension values like `(1 << 28) - 1` are intentionally hard-coded instead of using constants like `ArrayView.DIM_MAX_LEN` to serve as "canaries" that will cause test failures if the API limits change, ensuring developers are aware of breaking changes to the ILP API.
Applied to files:
core/src/main/java/io/questdb/griffin/engine/functions/window/MinDoubleWindowFunctionFactory.java
🔇 Additional comments (16)
core/src/main/java/io/questdb/griffin/engine/functions/window/SumDoubleWindowFunctionFactory.java (2)
47-47: Import addition aligns with the refactoring objectives.The import of
WindowContextis necessary for the new stateless validation approach that eliminates race conditions.
76-83: Excellent refactoring to eliminate race conditions.This change successfully addresses the race condition issue described in the PR objectives by:
- Obtaining
WindowContextfromSqlExecutionContextinstead of relying on shared factory state- Using the centralized
validate()method for parameter validation- Extracting frame boundaries directly from the context rather than accessing instance fields
This makes the window function factory stateless and thread-safe for concurrent SQL compilations.
core/src/main/java/io/questdb/griffin/engine/functions/window/CountFunctionFactoryHelper.java (1)
68-71: Helper class correctly adopts the new stateless validation pattern.The refactoring successfully eliminates dependency on
AbstractWindowFunctionFactory's stateful validation by:
- Obtaining
WindowContextdirectly fromSqlExecutionContext- Using the centralized
validate()method- Extracting frame boundaries from the context
This ensures thread safety for concurrent window function compilations.
core/src/main/java/io/questdb/griffin/engine/functions/window/MinDoubleWindowFunctionFactory.java (2)
38-38: Import addition supports the stateless refactoring.Adding the
WindowContextimport enables the new thread-safe validation pattern.
64-71: Consistent implementation of the race condition fix.This factory correctly adopts the new stateless validation approach:
- Retrieves
WindowContextfromSqlExecutionContextinstead of using shared factory state- Performs validation through the centralized
validate()method- Extracts window frame boundaries directly from the context
This eliminates the race condition that occurred when multiple concurrent SQL compilations overwrote each other's parameters in shared factory instances.
core/src/main/java/io/questdb/griffin/engine/window/WindowContext.java (3)
30-30: Import addition supports the new validation method.The
SqlExceptionimport is required for the newvalidate()method signature.
41-42: Method reordering improves interface organization.The repositioning of
getNullsDescPos()andisIgnoreNulls()methods doesn't change functionality but may provide better logical grouping within the interface.Also applies to: 65-66
71-71: Critical method addition enables centralized validation.The new
validate(int position, boolean supportTNullsDesc)method is essential for the race condition fix. It centralizes window parameter validation logic that was previously scattered across individual factory instances, eliminating the shared mutable state that caused race conditions in concurrent SQL compilations.core/src/main/java/io/questdb/griffin/engine/functions/window/AvgDoubleWindowFunctionFactory.java (3)
48-48: LGTM: Necessary import for WindowContext refactoring.The import addition is correctly placed and supports the architectural change to centralize window parameter validation within the WindowContext abstraction.
82-83: LGTM: Correct implementation of stateless validation.This change successfully eliminates the race condition by replacing the stateful
checkWindowParametermethod with WindowContext-based validation. The WindowContext is obtained from the execution context (likely thread-local) and validation is performed without modifying shared factory state.
89-90: LGTM: Proper extraction of window frame parameters.The window frame boundaries are now correctly extracted from the validated WindowContext instead of being accessed as instance fields. This completes the stateless refactoring and ensures thread-safety while maintaining the same functional behavior.
core/src/main/java/io/questdb/griffin/engine/functions/window/MaxDoubleWindowFunctionFactory.java (2)
46-46: LGTM - Import added for WindowContext interface.The import statement correctly adds the WindowContext interface needed for the new stateless approach.
82-89: Excellent fix for the race condition issue!This refactoring successfully addresses the core race condition problem by:
- Eliminating shared state: WindowContext is now retrieved per-execution from
SqlExecutionContextinstead of being stored as instance fields- Thread-safe parameter extraction:
rowsLoandrowsHiare extracted locally within method scope rather than relying on potentially overwritten instance fields- Proper validation: The
windowContext.validate()call replaces the removedcheckWindowParameter()method while maintaining the same validation logicThis change ensures that concurrent SQL compilations no longer overwrite each other's window parameters, resolving the intermittent failures described in issue #5989.
core/src/main/java/io/questdb/griffin/engine/functions/window/CountConstWindowFunctionFactory.java (1)
43-43: LGTM! Correctly fixes the race condition.The refactoring properly removes mutable state from the shared factory instance by delegating window parameter validation and storage to the
WindowContext. This eliminates the race condition where concurrent SQL compilations would overwrite each other's parameters.Also applies to: 62-69
core/src/main/java/io/questdb/griffin/engine/window/WindowContextImpl.java (1)
43-44: Well-structured centralized validation logic.The new validation method and null handling fields properly centralize window parameter validation, eliminating the need for scattered validation logic across different factory classes. The
getRowsHi()adjustment forEXCLUDE_CURRENT_ROWis correctly implemented.Also applies to: 97-100, 122-124, 162-164, 167-169, 214-254
core/src/main/java/io/questdb/griffin/engine/functions/window/FirstValueDoubleWindowFunctionFactory.java (1)
46-46: Consistent refactoring that eliminates the race condition.The changes properly delegate window parameter validation and storage to
WindowContext, following the same pattern as other window function factories. This ensures thread-safety by removing mutable state from the shared factory instance.Also applies to: 77-92, 99-106, 301-310
...in/java/io/questdb/griffin/engine/functions/window/LastValueDoubleWindowFunctionFactory.java
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
Outdated
Show resolved
Hide resolved
[PR Coverage check]😍 pass : 71 / 71 (100.00%) file detail
|
|
CI glitch: |
Summary
SQL queries using window functions could intermittently fail or produce incorrect results when multiple connections executed them concurrently. This PR fixes the race condition causing these failures.
Root Cause
Window function factories were incorrectly designed to be stateful - they stored execution context as instance fields:
AbstractWindowFunctionFactory.checkWindowParameter()would saveWindowContextand window frame boundaries (rowsLo,rowsHi) as instance fieldsFix
Made window function factories stateless. Removed all mutable state from
AbstractWindowFunctionFactory.Fixes #5989