-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage][feature] leaf planning with multi-semi join support #11937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[multistage][feature] leaf planning with multi-semi join support #11937
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11937 +/- ##
==========================================
Coverage 61.42% 61.43%
+ Complexity 1129 207 -922
==========================================
Files 2385 2385
Lines 129188 129218 +30
Branches 20000 20006 +6
==========================================
+ Hits 79355 79380 +25
- Misses 44082 44083 +1
- Partials 5751 5755 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 27 files with indirect coverage changes 📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today! |
f7b7239 to
a4112fa
Compare
581eb76 to
48abf60
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not introduced in this PR, but can we break these checks into multiple steps? We should avoid doing the expensive canPushDynamicBroadcastToLeaf() unless all other checks pass
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes i think this is in the TODO item for this function. unfortunately, there's not too much we can do here.
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotRuleUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we differentiate aggregate after join vs join after aggregate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's no need to differentiate. from a top-down walk. the moment you reach agg, you are guarantee that you have join on top of an aggregate (b/c the starting point of this call is a join
...-query-runtime/src/main/java/org/apache/pinot/query/runtime/operator/MultiStageOperator.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/pinot/query/runtime/operator/LeafStageTransferableBlockOperator.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest not introducing this as subclass but keep the existing ServerPlanRequestContext. If you find these extra properties apply to OpChain in general, we may directly add them into OpChainExecutionContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i need to extend OpChainExecutionContext. b/c the visitor pattern of PhysicalPlanVisitor expects an OpChainExecutionContext
I am not particularly happy about this but in order to reuse the logic in PhysicalPlanVisitor this is the only option i can think of
we can obviously create an interface of the execution context and refactor the logic in PhysicalPlanVisitorvisitor to a utility class. but that's too much refactoring in this PR IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed offline, we reverted the server extension on physical plan visitor and use the base logic instead.
connected with QueryRunner refactor and simplify leaf stage compilation fix linter and further refactoring simplified visitor and context class, add javadoc fix server plan request gen issues
- fix direct on leaf
48abf60 to
c3b052f
Compare
1. removed ServerPhysicalPlan related extensions, moved to base 2. base handles leaf-stage boundary only when leaf server context is set 3. ServerPlanRequestContext reverted back to its own purpose --> to only construct PinotQuery, ServerQueryRequest, etc
pinot-query-planner/src/main/java/org/apache/calcite/rel/rules/PinotRuleUtils.java
Show resolved
Hide resolved
ad0a6fc to
a6c6cf2
Compare
| private static void constructPinotQueryPlan(ServerPlanRequestContext serverContext) { | ||
| DistributedStagePlan stagePlan = serverContext.getStagePlan(); | ||
| PinotQuery pinotQuery = serverContext.getPinotQuery(); | ||
| Integer leafNodeLimit = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactor was wrong here
currently pinotQuery only supports limited amount of PlanNodes
this means an Exchange must be inserted in order to ensure the leaf stage can be converted into a PinotQuery. However this is a misuse of Exchange
This PR plans to split the ServerRequest planning into 2 stages
This PR also follows up with the refactoring from #11439.
Context: alternative PR to #11843