-
Notifications
You must be signed in to change notification settings - Fork 1.4k
[multistage] Support Physical Optimizer E2E #15698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #15698 +/- ##
============================================
+ Coverage 62.90% 63.18% +0.28%
- Complexity 1386 1388 +2
============================================
Files 2867 2873 +6
Lines 163354 164212 +858
Branches 24952 25124 +172
============================================
+ Hits 102755 103764 +1009
+ Misses 52847 52605 -242
- Partials 7752 7843 +91
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Do we have data points on overhead of optimizer itself ? |
|
Not yet, but I hope to run it at some point. FWIW I don't expect this new optimizer to add any new overheads of its own. Startree folks have seen some bottlenecks from how Calcite handles some things like large IN list, but we haven't yet tested MSE for such use cases yet. |
|
Apologies if these questions have been discussed earlier. In that case, you can just point to the discussion or doc. I have a couple of questions out of curiosity --
I think it really depends on the query complexity. In my experience, sophisticated query planning can easily go upto 100s of ms or even upto 1sec but then then it could be worth it if still optimizes the execution. I am very interested to know what are the kinds of queries for which you are seeing performance boost with the physical optimizer.
Why is this needed? I am hoping that even in the first working version of the optimizer, there is no query option otherwise we are just going to create a lot of tech debt / conditional paths where people will inevitably bypass it and create their own niche optimizations in the OSS code. Overall I think this is a step in the right direction. It will make Pinot better and help make it feature complete. However, optimizer is a long journey (longer than execution) with lots of consistent non-trivial optimizations/revisions. I suggest we invest systematically with a long term direction which I am not sure what that is. If there is a doc, I would love to see it. |
|
Hey Siddharth, this is the same workstream I had discussed with you in our 1:1 a few months ago, and the design doc is here: https://docs.google.com/document/d/17ApZbvNphKgEdSAOlZwTwAnnL_dAt9QbjcpzjHb4M0w/edit?tab=t.0#heading=h.txhamf1ek7k The overall goal is to restructure the query optimizer to improve Exchange Simplification and make it possible to add the MSE Lite Mode I had proposed here: #14640 Adding a cost model and doing join reordering is a separate topic and this effort doesn't impede that. We might actually contribute the same ourselves in a few months.
Key optimizations that this brings are (there are more.. I'll highlight them in the doc once we can run more tests on our clusters):
Some examples:
Completely agree. The reason this is needed is because the existing query optimizer has a lot of features that we need to port to the new optimizer flow. The idea is that we'll soon backport all features of the existing query optimizer, and then remove the old code path. I also want to give other companies time to test the new optimizer path out and run their own A/B tests in case they want to test the perf difference that the new optimizer brings to their workloads. |
88b44c1 to
b01fb52
Compare
You don't need to do that. Instead, you can rely on Remember we also introduced |
|
@gortiz : thanks for sharing. I'll raise another one after this to address this. As for the cleanup, I wanted to make a change which explicitly tracks whether the QueryEnvironment usage is in the controller or the broker. Right now I think we implicitly infer this by checking whether Ideally I'd like to take a stab at the TODO shown below but I think it's best to do it once I mainline the physical optimizer.
|
itschrispeck
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from my side given the execution plan shared
| "sql": "SET usePhysicalOptimizer=true; EXPLAIN PLAN FOR SELECT col2, col3 FROM a WHERE col1 = 'foo' LIMIT 10, 11", | ||
| "output": [ | ||
| "Execution Plan", | ||
| "\nPhysicalSort(offset=[10], fetch=[11])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For unsorted queries, we can consider fetch directly and not do offsets since a high offset value can result in high memory usage for high offset values but low limit values.
shauryachats
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM

Summary
Completes the first E2E working version of the new Physical Optimizer. There's quite a bit of refactoring and feature porting that still needs to be done in order to make this usable generally, and I am hoping to spend May doing that.
Maintaining Request Id in Context
The current QueryEnvironment I feel is quite bloated and I know several folks have also raised this. For now I have added the requestId to
QueryEnvironment#Config. This eliminates passing the requestId across method calls liketoDispatchableSubPlanso I think it is overall still a relatively clean approach.Determining When Physical Optimizer is Enabled
I have added a new query option, that is temporary:
usePhysicalOptimizer=true/false. By default this is assumed false. When we use the physical optimizer, we need to skip certain sections of the HepProgram and I have made those changes accordingly.Plan Fragmenter and Mailbox Assignment
This does the following:
This is not unit tested right now and I am working on a follow-up PR to add unit tests for this and some other parts of the code.
PinotLogicalQueryPlanner changes
I think we really need to clean up some of these classes. e.g. PinotLogicalQueryPlanner creates the dispatchable plan which is misleading.
I hope to do this incrementally. First, I'll backport the missing features from the existing MSE Optimizer, then work on adding sufficient testing to make sure the new optimizer can be made the main optimizer, and then work on cleaning up the rest of the optimizer structure and removing the old optimizer code.
My hope is to wrap all of this up in H1.
Test Plan
Unit Tests
Have added Unit Test cases which have decent coverage. Check the JSON Plan output file.
Cluster Testing
We have tested this in our cluster and even for some of the simple query shapes on low amount of data, the perf improvement is around 2x, but that may be solely because we don't yet support semi-join dynamic filters in the Physical Optimizer. We'll be running more benchmarks this month to compare the difference between the old and the new optimizer on one of our bigger clusters.