Skip to content

Conversation

@Jackie-Jiang
Copy link
Contributor

@Jackie-Jiang Jackie-Jiang commented Sep 20, 2023

  • For single join key and group key scenario, no need to wrap the values into the Key object. Directly use the value as the key in the map.
  • Optimize the case of no join key
  • In group-by executor, store merged results in ArrayList, group key map in Object2IntOpenHashMap
  • In join operator, store matched right rows in BitSet

@Jackie-Jiang Jackie-Jiang added enhancement performance multi-stage Related to the multi-stage query engine labels Sep 20, 2023
@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2023

Codecov Report

Attention: Patch coverage is 72.17391% with 64 lines in your changes missing coverage. Please review.

Project coverage is 63.10%. Comparing base (47eff88) to head (360748e).
Report is 2666 commits behind head on master.

Files with missing lines Patch % Lines
...hysical/colocated/GreedyShuffleRewriteVisitor.java 0.00% 21 Missing ⚠️
.../apache/pinot/core/util/DataBlockExtractUtils.java 10.52% 17 Missing ⚠️
...t/query/planner/logical/ShuffleRewriteVisitor.java 0.00% 13 Missing ⚠️
...ry/runtime/operator/MultistageGroupByExecutor.java 91.02% 6 Missing and 1 partial ⚠️
...query/runtime/operator/exchange/BlockExchange.java 25.00% 1 Missing and 2 partials ⚠️
...t/query/planner/partitioning/EmptyKeySelector.java 75.00% 1 Missing ⚠️
.../pinot/query/planner/partitioning/KeySelector.java 0.00% 1 Missing ⚠️
...not/query/planner/plannode/MailboxReceiveNode.java 50.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #11630      +/-   ##
============================================
+ Coverage     63.08%   63.10%   +0.01%     
- Complexity     1110     1121      +11     
============================================
  Files          2330     2334       +4     
  Lines        125145   125185      +40     
  Branches      19195    19208      +13     
============================================
+ Hits          78952    78994      +42     
- Misses        40558    40561       +3     
+ Partials       5635     5630       -5     
Flag Coverage Δ
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 63.07% <72.17%> (+<0.01%) ⬆️
java-17 62.96% <72.17%> (+<0.01%) ⬆️
java-20 62.96% <72.17%> (+0.02%) ⬆️
temurin 63.10% <72.17%> (+0.01%) ⬆️
unittests 63.09% <72.17%> (+0.01%) ⬆️
unittests1 67.31% <72.17%> (+0.02%) ⬆️
unittests2 14.46% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0
Copy link
Contributor

Please fix the CI.

List<ColocationKey> colocationKeys = ((FieldSelectionKeySelector) selector).getColumnIndices().stream()
.map(x -> new ColocationKey(x, numPartitions, selector.hashAlgorithm())).collect(Collectors.toList());
List<ColocationKey> colocationKeys =
distributionKeys.stream().map(x -> new ColocationKey(x, numPartitions, KeySelector.DEFAULT_HASH_ALGORITHM))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we make this algorithm configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimizer is triggered by a hint, and we are not using it. I believe the long term plan is to remove it. cc @walterddr

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think for now we can assume that. the problem is we dont know how users will actually be using. having this configurable can be done in the future when needed.

@Jackie-Jiang Jackie-Jiang merged commit f6cb672 into apache:master Sep 20, 2023
@Jackie-Jiang Jackie-Jiang deleted the optimize_map branch September 20, 2023 17:10
private KeySelectorFactory() {
}

public static KeySelector<?> getKeySelector(List<Integer> keyIds) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiangfu0 we can add configurable algorithms by adding other factory methods IMO

Comment on lines +95 to +97
private List<Integer> _leftKeys;
@ProtoProperties
private KeySelector<Object[], Object[]> _rightJoinKeySelector;
private List<Integer> _rightKeys;
Copy link
Contributor

@walterddr walterddr Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

superb refactoring. this was a legacy property we added for pluggability that we never used. we should've removed these and use factory methods even before the 1.0 release :-P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement multi-stage Related to the multi-stage query engine performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants