-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Optimize group-by and join for single key scenario #11630
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #11630 +/- ##
============================================
+ Coverage 63.08% 63.10% +0.01%
- Complexity 1110 1121 +11
============================================
Files 2330 2334 +4
Lines 125145 125185 +40
Branches 19195 19208 +13
============================================
+ Hits 78952 78994 +42
- Misses 40558 40561 +3
+ Partials 5635 5630 -5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Please fix the CI. |
f4b16c6 to
360748e
Compare
| List<ColocationKey> colocationKeys = ((FieldSelectionKeySelector) selector).getColumnIndices().stream() | ||
| .map(x -> new ColocationKey(x, numPartitions, selector.hashAlgorithm())).collect(Collectors.toList()); | ||
| List<ColocationKey> colocationKeys = | ||
| distributionKeys.stream().map(x -> new ColocationKey(x, numPartitions, KeySelector.DEFAULT_HASH_ALGORITHM)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we make this algorithm configurable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimizer is triggered by a hint, and we are not using it. I believe the long term plan is to remove it. cc @walterddr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think for now we can assume that. the problem is we dont know how users will actually be using. having this configurable can be done in the future when needed.
| private KeySelectorFactory() { | ||
| } | ||
|
|
||
| public static KeySelector<?> getKeySelector(List<Integer> keyIds) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiangfu0 we can add configurable algorithms by adding other factory methods IMO
| private List<Integer> _leftKeys; | ||
| @ProtoProperties | ||
| private KeySelector<Object[], Object[]> _rightJoinKeySelector; | ||
| private List<Integer> _rightKeys; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
superb refactoring. this was a legacy property we added for pluggability that we never used. we should've removed these and use factory methods even before the 1.0 release :-P
Keyobject. Directly use the value as the key in the map.ArrayList, group key map inObject2IntOpenHashMapBitSet