-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Multiple Window Group Support #16109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage] Multiple Window Group Support #16109
Conversation
53efb75 to
f9bb276
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16109 +/- ##
============================================
+ Coverage 62.90% 63.29% +0.39%
+ Complexity 1386 1365 -21
============================================
Files 2867 2960 +93
Lines 163354 170635 +7281
Branches 24952 26103 +1151
============================================
+ Hits 102755 108008 +5253
- Misses 52847 54431 +1584
- Partials 7752 8196 +444
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
f9bb276 to
f882733
Compare
f882733 to
0301b38
Compare
| .map(operand -> operand.accept(this)) | ||
| .collect(Collectors.toList()); | ||
| return new Window.RexWinAggCall( | ||
| (SqlAggFunction) winCall.getOperator(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can we have a Preconditions.checkState for clear error handling if the cast is not correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RexWinAggCall currently can only be initialized with SqlAggFunction and the condition to step into this block is that the if (call instanceof Window.RexWinAggCall)
public static class RexWinAggCall extends RexCall {
public final int ordinal;
public final boolean distinct;
public final boolean ignoreNulls;
/** @deprecated */
@Deprecated
public RexWinAggCall(SqlAggFunction aggFun, RelDataType type, List<RexNode> operands, int ordinal, boolean distinct) {
this(aggFun, type, operands, ordinal, distinct, false);
}
public RexWinAggCall(SqlAggFunction aggFun, RelDataType type, List<RexNode> operands, int ordinal, boolean distinct, boolean ignoreNulls) {
super(type, aggFun, operands);
this.ordinal = ordinal;
this.distinct = distinct;
this.ignoreNulls = ignoreNulls;
}
}
23ec998 to
f44a2e9
Compare
Resolve the issue #15985
Background
The Core Problem: Shifting Input Field Counts. A LogicalWindow's expressions (
RexNodes) can refer to two things via RexInputRef: (1) Columns from its input relation (index < inputFieldCount); (2) Literals from its constants list (index >= inputFieldCount).When we chain the windows, the inputFieldCount for each subsequent window increases.
k-th constant is RexInputRef(N + k).N. It addsagg1_countfields. Its output hasN + agg1_countfields.N' = N + agg1_count.If a group in LW2 keep using the original RexInputRef(
N + k), LW2 will interpret it as a reference to the (N + k)-th column of its input. This is now incorrect. It's pointing to one of LW1's aggregate results instead of the intended constant. The reference needs to be shifted to RexInputRef(N' + k).Solution
Correct Row Type construction
cumulativeAggFieldCountvariable correctly tracks our position in the original aggregate field list.Correct RexInputRef pointing to the constant variable
Use a RexShuttle to traverse the expressions in each Window.Group and adjust any RexInputRef that points to a constant. Specially,
visitInputRefis overridden. It checks if an input reference's index points to a constant (index >=originalInputFieldCount). If it does, it creates a new RexInputRef with the index increased by theshiftamount. Otherwise, it returns the original reference. Theshiftamount denotes the number of new fields added by previous windows in the chain.Why groupSet and orderKeys Don't Need Shifting
groupSetandorderKeysRefer To: The integer indices in groupSet (PARTITION BY) and orderKeys (ORDER BY) refer to the columns of the window's direct input relation. Calcite provides mechanisms to fail early or eliminate the constant variables appearing in the window functions'sPartition ByorOrder By. Therefore, these indices will always be < inputFieldCount and will never point to the constants list.Outcome
Dataset: the Quick Start of COLOCATED_JOIN
Test SQL:
Removing
SET usePhysicalOptimizer=true;get the same correct result.Result