-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Add Leaf Stage Worker Assignment / Boundary / Agg Rules #15481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #15481 +/- ##
============================================
+ Coverage 61.75% 62.83% +1.07%
- Complexity 207 1381 +1174
============================================
Files 2436 2856 +420
Lines 133233 162017 +28784
Branches 20636 24851 +4215
============================================
+ Hits 82274 101797 +19523
- Misses 44911 52532 +7621
- Partials 6048 7688 +1640
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
itschrispeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good afaict, though I'd like to take another look after the next couple PRs when I can step through some tests.
| return new PhysicalAggregate(aggRel.getCluster(), aggRel.getTraitSet(), aggRel.getHints(), aggRel.getGroupSet(), | ||
| aggRel.getGroupSets(), aggRel.getAggCallList(), nodeIdGenerator.get(), inputs.get(0), null, false, | ||
| AggregateNode.AggType.DIRECT, false, List.of(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand, why is this always direct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Direct mode aggregate means there's no partial aggregate, and we directly compute the full aggregate.
I am setting it to Direct right now because we are not doing the aggregate split yet. I'll add it in the next PR (or the one after that)
Ongoing work to add support for physical optimizers, this builds on top of these PRs.
Summary
Aggregate > (Leaf-Stage), then it would be possible to make Aggregate a part of the leaf stage. This rule is applied when either the is partitioned by group-by keys hint is present, or when we automatically detect that the data is partitioned on the group-by keys.Callouts
Leaf Stage Aggregate Rule
Right now the aggregate rules are kinda all over the place. Once I have a working E2E version I think it'll be easy to see if we can merge them into 1 rule. E.g. ideally I would have liked to handle this within the Aggregate Split rule itself, but doing that is tricky because the split happens after Exchange is added, and you could have parallelism change around the leaf stage boundary.
Trait Assignment on PRelNodes
During the PoC my approach was to assign traits to RelNodes. But given many RelNodes can drop traits, and we have built our own Physical* versions anyways, I have now switched to using PRelNodes for trait assignment.