feat(sql): implement ksum() window function with Kahan summation#6642
feat(sql): implement ksum() window function with Kahan summation#6642bluestreak01 merged 7 commits intomasterfrom
Conversation
Add ksum() as a window function that uses the Kahan summation algorithm for improved floating-point precision. This complements the existing ksum() aggregate function by enabling its use in window contexts. The implementation supports all standard window frame types: - Partition by with/without order by - ROWS and RANGE frame modes - Unbounded and bounded preceding/following - Sliding windows with proper Kahan compensation for value removal Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds a new KSumDoubleWindowFunctionFactory implementing Kahan-sum window functions for doubles, registers it in the function list, updates parser expectations, and adjusts window function tests (adds ksum tests and removes a large legacy test file). Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Inline Kahan summation logic to avoid allocating double[1] arrays on every computeNext() call. This is more appropriate for QuestDB's zero-GC design philosophy. Co-Authored-By: Claude Opus 4.5 <[email protected]>
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In
@core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java:
- Around line 278-291: In the else branch where MemoryARW mem is created via
Vm.getCARWInstance(...) before returning new KSumOverRowsFrameFunction(...,
mem), protect the allocation against constructor failure by wrapping the
construction in a try/catch (or try/finally) so that if
KSumOverRowsFrameFunction's constructor throws, you call mem.close()/free()
(matching how other branches clean up) and rethrow the exception; specifically,
allocate MemoryARW mem, then try to return new
KSumOverRowsFrameFunction(args.get(0), rowsLo, rowsHi, mem), and on any
Throwable ensure mem is closed before propagating the error.
In
@core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java:
- Line 38: The test testWindowFunctionFailsInNonWindowContext omits the newly
added KSumDoubleWindowFunctionFactory from its factories array, so add
KSumDoubleWindowFunctionFactory.class to the factories array used in that test
to ensure the new window function is covered; locate the factories declaration
referenced by testWindowFunctionFailsInNonWindowContext and include
KSumDoubleWindowFunctionFactory in the list.
🧹 Nitpick comments (3)
core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java (2)
963-979: Minor: Redundant modulo operation.At line 970,
firstIdx %= sizeis redundant sincefirstIdxis always< capacity(which equalssizeat this point due to the condition at line 963). The operation has no effect. This doesn't affect correctness but could be simplified.Suggested simplification
if (firstIdx == 0) { Vect.memcpy(newAddress, oldAddress, size * RECORD_SIZE); } else { - firstIdx %= size; long firstPieceSize = (size - firstIdx) * RECORD_SIZE; Vect.memcpy(newAddress, oldAddress + firstIdx * RECORD_SIZE, firstPieceSize); Vect.memcpy(newAddress + firstPieceSize, oldAddress, firstIdx * RECORD_SIZE); firstIdx = 0; }
1277-1278: Unused instance fieldc.The instance field
cat line 1277 is written at line 1319 but never read. The compensation value is only needed during computation and doesn't need to persist as instance state.Suggested cleanup
static class KSumOverUnboundedPartitionRowsFrameFunction extends BasePartitionedWindowFunction implements WindowDoubleFunction { - private double c; // Kahan compensation private double sum;And remove line 1319:
value.putDouble(0, sum); value.putDouble(1, c); value.putLong(2, count); this.sum = count != 0 ? sum : Double.NaN; - this.c = c; }core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
3532-3560:testKSumPrecisionis mostly NULL-handling; consider tightening the “precision” claim.The test asserts exact stringified floating output (e.g.,
3.3000000000000003), which is fine for regression, but it doesn’t really demonstrate improved precision vssum()and the comments may overstate what’s being tested.Suggestion: either rename to reflect NULL/accumulation behavior, or add a small adversarial sequence comparison vs
sum()to justify the name.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.javacore/src/main/resources/function_list.txtcore/src/test/java/io/questdb/test/griffin/SqlParserTest.javacore/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
🧰 Additional context used
🧬 Code graph analysis (2)
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (1)
core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java (1)
KSumDoubleWindowFunctionFactory(66-1513)
core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java (7)
core/src/main/java/io/questdb/cairo/ArrayColumnTypes.java (1)
ArrayColumnTypes(30-62)core/src/main/java/io/questdb/cairo/map/MapFactory.java (1)
MapFactory(34-145)core/src/main/java/io/questdb/cairo/sql/VirtualRecord.java (1)
VirtualRecord(40-285)core/src/main/java/io/questdb/cairo/vm/Vm.java (1)
Vm(39-128)core/src/main/java/io/questdb/griffin/model/WindowExpression.java (1)
WindowExpression(38-301)core/src/main/java/io/questdb/std/MemoryTag.java (1)
MemoryTag(27-184)core/src/main/java/io/questdb/std/ObjList.java (1)
ObjList(34-403)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (35)
- GitHub Check: New pull request (Coverage Report Coverage Report)
- GitHub Check: New pull request (Hosted Running tests on windows-other-2)
- GitHub Check: New pull request (Hosted Running tests on windows-other-1)
- GitHub Check: New pull request (Hosted Running tests on windows-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-2)
- GitHub Check: New pull request (Hosted Running tests on windows-cairo-1)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on windows-fuzz1)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-other)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-pgwire)
- GitHub Check: New pull request (Hosted Running tests on windows-griffin-base)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-sub)
- GitHub Check: New pull request (Hosted Running tests on mac-other)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-cairo-root)
- GitHub Check: New pull request (Hosted Running tests on mac-pgwire)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz2)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo-fuzz)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-fuzz1)
- GitHub Check: New pull request (Hosted Running tests on mac-cairo)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-sub)
- GitHub Check: New pull request (SelfHosted Running tests with cover on linux-griffin-root)
- GitHub Check: New pull request (Hosted Running tests on mac-griffin)
- GitHub Check: New pull request (SelfHosted Other tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x64-zfs)
- GitHub Check: New pull request (Rust Test and Lint on linux-jdk17)
- GitHub Check: New pull request (SelfHosted Other tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x64-zfs)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-x64-zfs)
- GitHub Check: New pull request (Trigger Enterprise CI Trigger Enterprise Pipeline)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-x86-graal)
- GitHub Check: New pull request (SelfHosted Griffin tests on linux-arm64)
- GitHub Check: New pull request (SelfHosted Cairo tests on linux-arm64)
- GitHub Check: New pull request (Check Changes Check changes)
🔇 Additional comments (13)
core/src/test/java/io/questdb/test/griffin/SqlParserTest.java (1)
7966-7977: Test correctly validates non-window function error.Switching from
ksum()tonsum()is correct:ksum()is now a window function (breaking the test's intent), whilensum()is a pure aggregation function with no window capability. The error position (7) and message (non-window function called in window context) remain stable and correctly validate the parser/semantic error.core/src/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java (8)
66-77: LGTM!The factory class setup with static column type definitions is well-organized. The column type arrays correctly define the storage schema for each window frame variant (basic sum/compensation/count, rows-based with loIdx/startOffset, and range-based with additional capacity/firstIdx fields).
298-331: LGTM!Simple and correct implementation for the single-row frame case. No Kahan summation needed since it's just returning the current value.
333-408: LGTM!Correct two-pass implementation for whole partition sums. The Kahan summation algorithm is properly applied, and the handling of all-null partitions via
preparePass2is appropriate.
410-685: LGTM!Complex but well-structured implementation for partition-based range frames. The Kahan summation with compensation for both addition and subtraction (when values leave the frame) is correctly implemented. Memory management via
close(),reset(),reopen(), andtoTop()follows the expected lifecycle patterns.
687-877: LGTM!Correct implementation for partition-based rows frames with fixed-size ring buffers. The frame size calculation properly accounts for negative
rowsHivalues, and the Kahan summation is applied correctly for both adding and removing values from the sliding window.
1108-1272: LGTM!Correct implementation for non-partitioned rows frame. The buffer management and Kahan summation logic are properly implemented. The
close()method in constructor's catch block at line 1140 ensures proper cleanup ifinitBuffer()fails.
1354-1426: LGTM!Clean and correct implementation for cumulative sum over the entire result set. State management in
reset()andtoTop()properly reinitializes all fields.
1491-1512: LGTM!The static column type definitions correctly match the storage requirements of each window function variant.
core/src/main/resources/function_list.txt (1)
853-853: LGTM!Correct registration of the new
KSumDoubleWindowFunctionFactoryin the function catalog, properly placed alongside other window functions.core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java (3)
70-79: Verify ksum() argument-type coverage in parameterized frame tests.
FRAME_FUNCTIONS_PARAMETER_COLUMN_NAMEregistersksumwith"i"/"j"(typicallyLONGin this test suite). Ifksumis trulyksum(D)only (no implicit cast from integer types), some parameterized tests may fail before reaching the intended frame-validation assertions.If strict-double, consider switching ksum’s columns here to a known
DOUBLEcolumn (e.g.,"d") or using explicit casts in the generated SQL.
3437-3530: New ksum() window coverage looks solid (partitioned, cumulative, sliding, whole-set).The expected values align with the inserted dataset and exercise multiple frame shapes without being overly complex.
8760-8762: FRAME_FUNCTIONS updated to include ksum().Good integration with the existing normalizeSuffix-based parameterized framework.
...rc/main/java/io/questdb/griffin/engine/functions/window/KSumDoubleWindowFunctionFactory.java
Show resolved
Hide resolved
core/src/test/java/io/questdb/test/griffin/engine/window/WindowFunctionTest.java
Show resolved
Hide resolved
|
@CodeRabbit review |
✅ Actions performedReview triggered.
|
[PR Coverage check]😍 pass : 658 / 697 (94.40%) file detail
|
Summary
ksum()as a window function using the Kahan summation algorithm for improved floating-point precisionksum()aggregate function by enabling its use in window contextsChanges
KSumDoubleWindowFunctionFactory.java- Window function factory with 9 inner classes for different frame typesfunction_list.txt- Added registration entrySqlParserTest.java- Updated test to usensum(sinceksumis now a valid window function)WindowFunctionTest.java- Addedksumto FRAME_FUNCTIONS list and new test casesUsage Examples
Test plan
WindowFunctionTest.java:testKSumWindowFunction- Tests partition, unbounded preceding, sliding window, and whole result settestKSumPrecision- Tests NULL handlingksumadded toFRAME_FUNCTIONSlist for integration with existing parameterized testsSqlParserTest#testNonWindowFunctionInWindowContextstill works withnsum🤖 Generated with Claude Code