-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Allow using 'serverReturnFinalResult' to optimize server partitioned table #13208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow using 'serverReturnFinalResult' to optimize server partitioned table #13208
Conversation
40e85fd to
69c07ba
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13208 +/- ##
============================================
+ Coverage 61.75% 62.16% +0.41%
+ Complexity 207 198 -9
============================================
Files 2436 2534 +98
Lines 133233 139008 +5775
Branches 20636 21535 +899
============================================
+ Hits 82274 86420 +4146
- Misses 44911 46126 +1215
- Partials 6048 6462 +414
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
af4159f to
0eefa89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not directly add finalResult2 to finalResult1 then return finalResult1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good
|
Overall LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this IndexedTable has multiple aggregations, do you need this _hasFinalInput also to be per agg function or it's just an indicate ?
E.g. SELECT distinctCount(pk), avg(other_column) from myTable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In V1 engine we don't have a way to make per function hint. We can make it more smart in V2 which can carry more information.
0eefa89 to
ce47def
Compare
ce47def to
78fee19
Compare
When a column is partitioned on each server (i.e. the same value always show up on the same server), the following queries can be optimized by asking server to directly return final aggregate result instead of intermediate aggregate result.
1:
SELECT DISTINCT_COUNT(partitionedCol) FROM myTable2:
SELECT DISTINCT_COUNT(partitionedCol) FROM myTable GROUP BY col3:
SELECT AGG(col) FROM myTable GROUP BY partitionedColFor all 3 queries, we can ask server to return final aggregate result, but there are some difference between 2 and 3. For 2, server can return final aggregate result, but should still keep enough groups because the aggregate result is not global final result, but only the final result for a partition; For 3, server only needs to keep
LIMITgroups because the aggregate result is global final result for the group.In this PR, user can
SET serverReturnFinalResult = true;to accelerate 1 and 3; user canSET serverReturnFinalResultKeyUnpartitioned = true;to accelerate 2This PR also fixes the following bugs:
GroupByDataTableReducerMinMaxRangeAggregationFunction