-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Fix Bugs in SetOp Handling and Multi-Column Join #16330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage] Fix Bugs in SetOp Handling and Multi-Column Join #16330
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #16330 +/- ##
============================================
+ Coverage 62.90% 63.19% +0.28%
+ Complexity 1386 1364 -22
============================================
Files 2867 2974 +107
Lines 163354 172591 +9237
Branches 24952 26435 +1483
============================================
+ Hits 102755 109062 +6307
- Misses 52847 55208 +2361
- Partials 7752 8321 +569
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| @Override | ||
| public SetOp copy(RelTraitSet traitSet, List<RelNode> inputs, boolean all) { | ||
| return new PhysicalIntersect( | ||
| getCluster(), traitSet, getHints(), inputs.stream().map(PRelNode.class::cast).collect(Collectors.toList()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all RelNode castable to PRelNode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, but our design is such that after RelNode optimization steps, we convert all nodes to PRelNode, and then only operate with them.
In other words, if a copy is called on a PRelNode, then it will only be when everything is a PRelNode.
...ery-planner/src/main/java/org/apache/pinot/query/planner/physical/v2/RelToPRelConverter.java
Show resolved
Hide resolved
6fe7df4 to
b81ccb3
Compare
Note: Only touches physical optimizer code.
Summary
Enables the H2 comparison test in
ResourceBasedQueriesTestand fixes the last set of bugs that I couldn't get to before. The bugs and changes are summarized below.Fix Trait Assignment for SetOp
For SetOp, we need to pushdown a hash distribution trait with all fields. Except for
UNION ALL, all other set operations require this hash dist trait.UNION ALLdoesn't need it, because it allows duplicates.Fixes Minus
I was creating
PhysicalUnionfor a minus 🤦 .Adds Intersect
Adds support for intersect.
Fixes Multi-Column Join
RelDistribution.hash(keys)normalizes the keys. That means that if you pass inkeys=[1, 0, 4], it will get sorted and stored as[0, 1, 4].When computing
hashDistToMatchinWorkerExchangeAssignmentRule, I was using List to List comparison for theHashDistributionDescand theRelDistribution.hashkeys. Changed it to do set to set comparison.In the future we can even change the keys to a set in
HashDistributionDesc, or normalize the keys inHashDistributionDesctoo.Fixes Runtime Support for SetOp
I was only handling joins in
meetParentEnforcedDistributionConstraint. Extended it to handleSetOptoo.Test Plan
Unit Tests have extensive coverage. There are around 200+ queries run in the new test I enabled in
ResourceBasedQueriesTest, where the same query is run with H2 and the physical optimizer.