Skip to content

Conversation

@walterddr
Copy link
Contributor

@walterddr walterddr commented Nov 2, 2023

currently pinotQuery only supports limited amount of PlanNodes
this means an Exchange must be inserted in order to ensure the leaf stage can be converted into a PinotQuery. However this is a misuse of Exchange

This PR plans to split the ServerRequest planning into 2 stages

  • first plan as much as possible into PinotQuery
  • for any remainder nodes that cannot be planned into PinotQuery, will be run together with the LeafStageTransferrableBlockOperator as the input locally.

This PR also follows up with the refactoring from #11439.

  • removed server plan request context entirely and replace it with fields used by server side.
  • extends OpChainExecutionContext with additional info from ServerOpChainExecutionContext
  • break down steps from ServerPlanRequestUtils.build method into 3 parts -->
    • compile pinot query
    • convert pinot query into instance requests (based on physical segments)
    • construct an OpChain that sits on top

Context: alternative PR to #11843

@codecov-commenter
Copy link

codecov-commenter commented Nov 2, 2023

Codecov Report

Merging #11937 (a6c6cf2) into master (73d82ec) will increase coverage by 0.00%.
Report is 1 commits behind head on master.
The diff coverage is 86.56%.

@@            Coverage Diff             @@
##             master   #11937    +/-   ##
==========================================
  Coverage     61.42%   61.43%            
+ Complexity     1129      207   -922     
==========================================
  Files          2385     2385            
  Lines        129188   129218    +30     
  Branches      20000    20006     +6     
==========================================
+ Hits          79355    79380    +25     
- Misses        44082    44083     +1     
- Partials       5751     5755     +4     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (ø)
integration <0.01% <0.00%> (ø)
integration1 <0.01% <0.00%> (ø)
integration2 0.00% <0.00%> (ø)
java-11 61.38% <86.56%> (+0.01%) ⬆️
java-21 61.28% <86.56%> (-0.02%) ⬇️
skip-bytebuffers-false 61.41% <86.56%> (+<0.01%) ⬆️
skip-bytebuffers-true 61.26% <86.56%> (+<0.01%) ⬆️
temurin 61.43% <86.56%> (+<0.01%) ⬆️
unittests 61.42% <86.56%> (+<0.01%) ⬆️
unittests1 46.68% <86.56%> (+0.01%) ⬆️
unittests2 27.59% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...va/org/apache/pinot/query/runtime/QueryRunner.java 74.22% <100.00%> (-6.48%) ⬇️
...ot/query/runtime/plan/OpChainExecutionContext.java 96.55% <100.00%> (+0.39%) ⬆️
.../pinot/query/runtime/plan/PhysicalPlanVisitor.java 92.45% <100.00%> (+0.96%) ⬆️
.../runtime/plan/server/ServerPlanRequestContext.java 100.00% <100.00%> (ø)
...ry/runtime/plan/server/ServerPlanRequestUtils.java 82.08% <100.00%> (+3.13%) ⬆️
.../runtime/plan/server/ServerPlanRequestVisitor.java 70.66% <65.38%> (-9.34%) ⬇️

... and 27 files with indirect coverage changes

📣 Codecov offers a browser extension for seamless coverage viewing on GitHub. Try it in Chrome or Firefox today!

@walterddr walterddr force-pushed the poc_leaf_planning_semi_join branch from f7b7239 to a4112fa Compare November 2, 2023 18:32
@walterddr walterddr added feature multi-stage Related to the multi-stage query engine labels Nov 2, 2023
@walterddr walterddr marked this pull request as ready for review November 2, 2023 19:27
@walterddr walterddr force-pushed the poc_leaf_planning_semi_join branch 2 times, most recently from 581eb76 to 48abf60 Compare November 7, 2023 17:16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not introduced in this PR, but can we break these checks into multiple steps? We should avoid doing the expensive canPushDynamicBroadcastToLeaf() unless all other checks pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i think this is in the TODO item for this function. unfortunately, there's not too much we can do here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we differentiate aggregate after join vs join after aggregate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's no need to differentiate. from a top-down walk. the moment you reach agg, you are guarantee that you have join on top of an aggregate (b/c the starting point of this call is a join

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest not introducing this as subclass but keep the existing ServerPlanRequestContext. If you find these extra properties apply to OpChain in general, we may directly add them into OpChainExecutionContext

Copy link
Contributor Author

@walterddr walterddr Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i need to extend OpChainExecutionContext. b/c the visitor pattern of PhysicalPlanVisitor expects an OpChainExecutionContext

I am not particularly happy about this but in order to reuse the logic in PhysicalPlanVisitor this is the only option i can think of

we can obviously create an interface of the execution context and refactor the logic in PhysicalPlanVisitorvisitor to a utility class. but that's too much refactoring in this PR IMO

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, we reverted the server extension on physical plan visitor and use the base logic instead.

Rong Rong added 6 commits November 8, 2023 18:58
connected with QueryRunner

refactor and simplify leaf stage compilation

fix linter and further refactoring

simplified visitor and context class, add javadoc

fix server plan request gen issues
- fix direct on leaf
@walterddr walterddr force-pushed the poc_leaf_planning_semi_join branch from 48abf60 to c3b052f Compare November 9, 2023 03:24
1. removed ServerPhysicalPlan related extensions, moved to base
2. base handles leaf-stage boundary only when leaf server context is set
3. ServerPlanRequestContext reverted back to its own purpose --> to only construct PinotQuery, ServerQueryRequest, etc
@walterddr walterddr merged commit f30fdee into apache:master Nov 13, 2023
private static void constructPinotQueryPlan(ServerPlanRequestContext serverContext) {
DistributedStagePlan stagePlan = serverContext.getStagePlan();
PinotQuery pinotQuery = serverContext.getPinotQuery();
Integer leafNodeLimit =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactor was wrong here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature multi-stage Related to the multi-stage query engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants