-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Support ListAgg WITHIN GROUP clause #13146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13146 +/- ##
============================================
+ Coverage 61.75% 62.22% +0.47%
+ Complexity 207 198 -9
============================================
Files 2436 2531 +95
Lines 133233 138446 +5213
Branches 20636 21428 +792
============================================
+ Hits 82274 86152 +3878
- Misses 44911 45866 +955
- Partials 6048 6428 +380
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
07d980d to
574394e
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should generalize the query flow for aggregate with distinct or within group order by. It should be modeled as projection/distinct on the leaf stage instead of aggregation. The aggregation should only be applied on the intermediate stage, which is similar to the window function handling logic
pinot-common/src/main/java/org/apache/pinot/common/request/context/FunctionContext.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/plannode/AggregateNode.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
Do you mean We should have a new rule or reuse existing rule to modify the query plan? |
I think the distinct will still happen in both leaf and intermediate stage, as there could still be duplicates across leaf nodes. |
360739d to
e5cd5ed
Compare
|
Simplified the code to for both collation and distinct handling. |
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The difference for ListAggFunction and ArrayAggString/ArrayAggDistinctStringFunction is just the extract final results part. Since you need to do special handling on v2 side, is it possible to directly use the ArrayAgg? If not, probably extend that and only override the extract final results
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
|
8bf72f0 to
9610e82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not change this signature, and use argument to pass distinct similar to ARRAYAGG. In the future, we don't want to handle distinct logic within the aggregation function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reverted getAggregationFunction signature, it means we need to have special handle for LISTAGG to append isDistinct to the third argument.
This is due to LISTAGG is registered in SqlStdOperatorTable, so the sql syntax cannot be changed, usage can only be LISTAGG(distinct expr, ' | ') not LISTAGG(expr, ' | ', true).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the error for the query:
SELECT
Origin, Dest,
LISTAGG(Carrier, ', ', 'true') WITHIN GROUP (ORDER BY Carrier)
FROM
airlineStats
GROUP BY
Dest, Origin
ProcessingException(errorCode:150, message:SQLParsingError:
java.lang.Exception: Unable to find table for this query
at org.apache.pinot.controller.api.resources.PinotQueryResource.getMultiStageQueryResponse(PinotQueryResource.java:222)
at org.apache.pinot.controller.api.resources.PinotQueryResource.executeSqlQuery(PinotQueryResource.java:179)
at org.apache.pinot.controller.api.resources.PinotQueryResource.handlePostSql(PinotQueryResource.java:127)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
...
Caused by: java.lang.RuntimeException: Error composing query plan for: SELECT
Origin, Dest,
LISTAGG(Carrier, ', ', 'true') WITHIN GROUP (ORDER BY Carrier)
FROM
airlineStats
...
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 3, column 5 to line 3, column 34: Invalid number of arguments to function 'LISTAGG'. Was expecting 1 arguments
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
...
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Invalid number of arguments to function 'LISTAGG'. Was expecting 1 arguments
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490))
...ain/java/org/apache/pinot/core/query/aggregation/function/array/ListAggDistinctFunction.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/query/aggregation/function/array/ListAggDistinctFunction.java
Outdated
Show resolved
Hide resolved
...re/src/main/java/org/apache/pinot/core/query/aggregation/function/array/ListAggFunction.java
Outdated
Show resolved
Hide resolved
3c88a3c to
a4d2cc5
Compare
Jackie-Jiang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/pinot/core/query/aggregation/function/AggregationFunctionFactory.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/pinot/core/query/aggregation/function/array/ListAggDistinctFunction.java
Show resolved
Hide resolved
...t-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/AggregateOperator.java
Outdated
Show resolved
Hide resolved
* Support ListAgg WITHIN GROUP clause * fix rowCompartor schema * Change rule to insert a sort for WithInGroup clause * move isDistinct field out from FunctionContext * address comments
WITHIN GROUPClause in MultiStage EnginePinotLogicalSortExchangefor WITHIN GROUP instead ofPinotLogicalExchangefor leaf data exchangeListAggaggregation functionDistinctflag toListAggimplementationLimitation:
WITHIN GROUPClause is supported right now.Sample query:
Query Plan:
Query Plan