Skip to content

Conversation

@xiangfu0
Copy link
Contributor

@xiangfu0 xiangfu0 commented Oct 17, 2023

Add aggregation function: ArrayAgg.

  • Usage: ArrayAgg(column, 'dataType' [, 'isDistinct'])
  • Spports data BOOLEAN, INT, LONG, FLOAT(only in V1), DOUBLE, STRING, TIMESTAMP.
    E.g. ArrayAgg(intCol, 'INT') returns ARRAY<INT>

For implementation, separated the implementations per data type:

  • Base abstract class is ArrayAggFunction,
  • Each type has its own abstract class ArrayAggBase<type>Function,
  • There are two implementations:
    • distinct agg: ArrayAggDistinct<type>Function,
    • non-distinct agg: ArrayAgg<type>Function

Note: Float type column is treated as Double in v2 engine, so FLOAT type is not supported.

Release Note

  • Support ArrayAgg Aggregation function

@xiangfu0 xiangfu0 force-pushed the array-agg branch 11 times, most recently from 5c7b1e2 to e50b66b Compare October 17, 2023 20:41
@xiangfu0 xiangfu0 requested review from Jackie-Jiang, richardstartin and walterddr and removed request for richardstartin October 17, 2023 21:09
@xiangfu0 xiangfu0 added feature release-notes Referenced by PRs that need attention when compiling the next release notes multi-stage Related to the multi-stage query engine labels Oct 17, 2023
@xiangfu0 xiangfu0 changed the title Adding ArrayAgg function Add ArrayAgg aggregation function Oct 17, 2023
@apache apache deleted a comment from codecov-commenter Oct 17, 2023
@codecov-commenter
Copy link

codecov-commenter commented Oct 17, 2023

Codecov Report

Attention: Patch coverage is 40.45534% with 340 lines in your changes missing coverage. Please review.

Project coverage is 62.88%. Comparing base (226087f) to head (c9addb1).
Report is 2920 commits behind head on master.

Files with missing lines Patch % Lines
...gation/function/array/BaseArrayAggIntFunction.java 15.15% 26 Missing and 2 partials ⚠️
...ation/function/array/BaseArrayAggLongFunction.java 16.12% 24 Missing and 2 partials ⚠️
...ion/function/array/BaseArrayAggDoubleFunction.java 17.24% 22 Missing and 2 partials ⚠️
...tion/function/array/BaseArrayAggFloatFunction.java 17.24% 22 Missing and 2 partials ⚠️
...gregation/function/array/BaseArrayAggFunction.java 33.33% 23 Missing and 1 partial ⚠️
...ion/function/array/BaseArrayAggStringFunction.java 17.24% 22 Missing and 2 partials ⚠️
...function/array/ArrayAggDistinctDoubleFunction.java 33.33% 16 Missing ⚠️
.../function/array/ArrayAggDistinctFloatFunction.java 33.33% 16 Missing ⚠️
...function/array/ArrayAggDistinctStringFunction.java 30.43% 16 Missing ⚠️
...egation/function/array/ArrayAggDoubleFunction.java 33.33% 16 Missing ⚠️
... and 12 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #11822      +/-   ##
============================================
- Coverage     63.02%   62.88%   -0.14%     
- Complexity     1140     1141       +1     
============================================
  Files          2351     2367      +16     
  Lines        127324   127888     +564     
  Branches      19611    19732     +121     
============================================
+ Hits          80244    80424     +180     
- Misses        41373    41733     +360     
- Partials       5707     5731      +24     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 62.83% <40.45%> (-0.14%) ⬇️
java-21 62.76% <40.45%> (-0.09%) ⬇️
skip-bytebuffers-false 62.87% <40.45%> (-0.14%) ⬇️
skip-bytebuffers-true 62.72% <40.10%> (-0.08%) ⬇️
temurin 62.88% <40.45%> (-0.14%) ⬇️
unittests 62.88% <40.45%> (-0.14%) ⬇️
unittests1 66.88% <40.45%> (-0.20%) ⬇️
unittests2 14.44% <0.00%> (-0.08%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably consider how to parse this. b/c the proper way to express distinct array agg is

SELECT key, ARRAY_AGG(DISTINCT col) 
FROM tbl 
GROUP BY key

given that our functions are not standard it is fine we do it now but good to note this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, I was also thinking about this, however it's not parsed in v1 query engine.

Comment on lines +88 to +89
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC @Jackie-Jiang this is related to #11518

@xiangfu0 xiangfu0 requested a review from walterddr October 19, 2023 06:38
@xiangfu0 xiangfu0 force-pushed the array-agg branch 3 times, most recently from bd2560b to afbd1c1 Compare October 19, 2023 16:34
Comment on lines +52 to +56
for (int i = 0; i < length; i++) {
if (!nullBitmap.contains(i)) {
setGroupByResult(groupByResultHolder, groupKeyArray[i], values[i]);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be postgres compatible. Documentation says:

array_agg ( anynonarray ) → anyarray
Collects all the input values, including nulls, into an array.

And in fact that is how it behaves
Example:

postgres=# insert into A(a) values (1), (2), (3), (null), (5);
INSERT 0 5
postgres=# select array_agg(a) from A;
   array_agg    
----------------
 {1,2,3,NULL,5}
(1 row)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants