Fix grouping set subset satisfaction#19853
Conversation
|
cc @gene-bordegaray do this changes look good to you? |
|
I can confirm that this change does fix the issue, I'll let @gene-bordegaray chime in as the one assigned to the original issue. Unless there are any relevant comments, I'll merge this in as soon as CI passes. |
|
Yes this looks correct, thanks @freakyzoidberg . Let me unassign myself and comment take on issue only thing may be wanting to add plans in the sqllogictests 😄 cc: @gabotechs |
| 04)------AggregateExec: mode=FinalPartitioned, gby=[channel@0 as channel, brand@1 as brand, __grouping_id@2 as __grouping_id], aggr=[sum(sub.total)] | ||
| 05)--------RepartitionExec: partitioning=Hash([channel@0, brand@1, __grouping_id@2], 4), input_partitions=4 | ||
| 06)----------AggregateExec: mode=Partial, gby=[(NULL as channel, NULL as brand), (channel@0 as channel, NULL as brand), (channel@0 as channel, brand@1 as brand)], aggr=[sum(sub.total)] |
There was a problem hiding this comment.
👍 I imagine before this PR the RepartitionExec would not be there right?
gabotechs
left a comment
There was a problem hiding this comment.
👍 Looks good! thanks @freakyzoidberg and @gene-bordegaray
## Summary - Fixes incorrect results from ROLLUP/CUBE/GROUPING SETS queries when using multiple partitions - The subset satisfaction optimization was incorrectly allowing hash partitioning on fewer columns to satisfy requirements that include `__grouping_id` - This caused partial aggregates from different partitions to be finalized independently, producing duplicate grand totals Closes apache#19849
|
Love it -- thank you |
|
Thank you @freakyzoidberg and @gabotechs Do you we know what changes in 52 introduced this problem? |
|
You have the full conclusion here #19849 |
Brings #19853 into `branch-52` Co-authored-by: Pierre Lacave <[email protected]>
|
Thanks @gabotechs -- I'll ask some follow up questions there |
Summary
__grouping_idCloses #19849