Fix grouping set subset satisfaction by freakyzoidberg · Pull Request #19853 · apache/datafusion

freakyzoidberg · 2026-01-16T13:41:59Z

Summary

- Fixes incorrect results from ROLLUP/CUBE/GROUPING SETS queries when using multiple partitions
- The subset satisfaction optimization was incorrectly allowing hash partitioning on fewer columns to satisfy requirements that include `__grouping_id`
- This caused partial aggregates from different partitions to be finalized independently, producing duplicate grand totals
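
To make the failure mode concrete, here is a toy simulation in plain Python. It is not DataFusion code: the rows, the two-partition layout, and the grouping id values (0, 1, 3) are assumptions chosen for illustration. It shows how finalizing per partition, when the input is hash-partitioned only on `channel`, yields one grand total row per partition, while repartitioning on the full key (including the grouping id) merges them into a single row.

```python
from collections import defaultdict

# Toy input, already hash-partitioned on `channel` only (a subset of the group keys).
# Rows are (channel, brand, total); all values are illustrative.
PARTITIONS = [
    [("web", "a", 10), ("web", "b", 20)],
    [("store", "a", 5), ("store", "b", 7)],
]

# ROLLUP(channel, brand) as (keep_channel, keep_brand, grouping_id) triples.
# The ids are an assumed bitmask encoding, used here only to tell the sets apart.
GROUPING_SETS = [(False, False, 3), (True, False, 1), (True, True, 0)]


def partial_aggregate(rows):
    """Partial sums per (channel, brand, grouping_id) key within one partition."""
    acc = defaultdict(int)
    for channel, brand, total in rows:
        for keep_channel, keep_brand, gid in GROUPING_SETS:
            key = (channel if keep_channel else None, brand if keep_brand else None, gid)
            acc[key] += total
    return dict(acc)


def finalize_without_repartition(partitions):
    """Buggy path: each partition finalizes its own partial results."""
    out = []
    for rows in partitions:
        out.extend(partial_aggregate(rows).items())
    return out


def finalize_with_repartition(partitions, n=4):
    """Fixed path: partial results are re-hashed on the full key before finalizing."""
    buckets = [defaultdict(int) for _ in range(n)]
    for rows in partitions:
        for key, value in partial_aggregate(rows).items():
            buckets[hash(key) % n][key] += value  # same key always lands in the same bucket
    return [item for bucket in buckets for item in bucket.items()]


def grand_totals(result):
    """Values of the rows that belong to the empty grouping set."""
    return sorted(value for (_, _, gid), value in result if gid == 3)


buggy_totals = grand_totals(finalize_without_repartition(PARTITIONS))
fixed_totals = grand_totals(finalize_with_repartition(PARTITIONS))
print(buggy_totals, fixed_totals)
```

The per-channel and per-brand groups come out the same in both paths, because partitioning on `channel` keeps each of those keys in one partition. Only the groups that roll `channel` up to NULL are spread across partitions, which is why the symptom shows up as duplicate grand totals.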

Closes #19849

gabotechs · 2026-01-16T13:48:22Z

cc @gene-bordegaray do these changes look good to you?

gabotechs · 2026-01-16T13:53:20Z

I can confirm that this change does fix the issue, I'll let @gene-bordegaray chime in as the one assigned to the original issue.

Unless there are any relevant comments, I'll merge this in as soon as CI passes.

gene-bordegaray · 2026-01-16T13:58:18Z

Yes, this looks correct, thanks @freakyzoidberg. Let me unassign myself so you can comment "take" on the issue.

The only thing you may want to add is plans in the sqllogictests 😄

cc: @gabotechs

gabotechs · 2026-01-16T14:35:09Z

datafusion/sqllogictest/test_files/grouping_set_repartition.slt

+04)------AggregateExec: mode=FinalPartitioned, gby=[channel@0 as channel, brand@1 as brand, __grouping_id@2 as __grouping_id], aggr=[sum(sub.total)]
+05)--------RepartitionExec: partitioning=Hash([channel@0, brand@1, __grouping_id@2], 4), input_partitions=4
+06)----------AggregateExec: mode=Partial, gby=[(NULL as channel, NULL as brand), (channel@0 as channel, NULL as brand), (channel@0 as channel, brand@1 as brand)], aggr=[sum(sub.total)]


👍 I imagine that before this PR the RepartitionExec would not be there, right?
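
As a rough illustration of the rule the summary describes, the following Python sketch models the decision of whether an existing hash partitioning can stand in for a required one. The function name `subset_satisfies` and the set-based comparison are hypothetical simplifications, not DataFusion's API; the real check lives in the Rust optimizer and may differ in detail.

```python
GROUPING_ID = "__grouping_id"


def subset_satisfies(existing_hash_cols, required_hash_cols):
    """Hypothetical model: may data hashed on `existing` serve a requirement on `required`?

    A subset of the required columns is normally enough, because rows that agree
    on every required column also agree on the subset. That argument breaks when
    the requirement includes the grouping id: grouping sets replace columns with
    NULL, so rows from different input partitions can end up with the same key.
    """
    existing, required = set(existing_hash_cols), set(required_hash_cols)
    if not existing:
        return False
    if GROUPING_ID in required:
        # Only an exact match is accepted once the grouping id is involved.
        return existing == required
    return existing <= required


plain_group_by = subset_satisfies(["channel"], ["channel", "brand"])
rollup_subset = subset_satisfies(["channel"], ["channel", "brand", GROUPING_ID])
rollup_exact = subset_satisfies(
    ["channel", "brand", GROUPING_ID], ["channel", "brand", GROUPING_ID]
)
print(plain_group_by, rollup_subset, rollup_exact)
```

Under this model, a `False` result corresponds to the planner inserting a RepartitionExec between the partial and final aggregates, as in the plan quoted above.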

gabotechs

👍 Looks good! thanks @freakyzoidberg and @gene-bordegaray

alamb · 2026-01-16T16:13:27Z

Love it -- thank you

alamb · 2026-01-16T16:15:39Z

Thank you @freakyzoidberg and @gabotechs

Do we know what changes in 52 introduced this problem?

gabotechs · 2026-01-16T16:20:11Z

You have the full conclusion here: #19849

alamb · 2026-01-16T17:10:04Z

Thanks @gabotechs -- I'll ask some follow up questions there

NGA-TRAN

Nice!

freakyzoidberg added 2 commits January 16, 2026 14:36

Add test for ROLLUP with multiple partitions

e4535ae

Fix incorrect ROLLUP results with subset partition satisfaction

67b8c86

github-actions bot added the optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) labels Jan 16, 2026

Use constant grouping_id value

c62e3d3

Add explain in grouping set statisfaction slt

5a3667b

gabotechs reviewed Jan 16, 2026

gabotechs approved these changes Jan 16, 2026

gabotechs added this pull request to the merge queue Jan 16, 2026

Merged via the queue into apache:main with commit 1ab7e41 Jan 16, 2026
32 checks passed

This was referenced Jan 16, 2026

[branch 52] Fix grouping set subset satisfaction #19855

Merged

Release DataFusion 52.1.0 (minor/patch) Release (Jan 2026) #19784

Closed

gabotechs added a commit that referenced this pull request Jan 16, 2026

[branch 52] Fix grouping set subset satisfaction (#19855)

eb00fe2

Brings #19853 into `branch-52` Co-authored-by: Pierre Lacave <[email protected]>

NGA-TRAN reviewed Jan 17, 2026

gabotechs merged 4 commits into apache:main from freakyzoidberg:fix-grouping-set-subset-satisfaction
