-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Enabling LogicalProject pushdown optimizations to eliminate exchange of unused columns #14198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…of unused columns
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #14198 +/- ##
============================================
+ Coverage 61.75% 63.87% +2.12%
- Complexity 207 1535 +1328
============================================
Files 2436 2622 +186
Lines 133233 144399 +11166
Branches 20636 22103 +1467
============================================
+ Hits 82274 92235 +9961
- Misses 44911 45365 +454
- Partials 6048 6799 +751
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
These rules are repeated in multiple places, and rule sets are executed multiple times, introducing extra overhead and currently very hard to manage. Do you see a better way to re-organize them so that they can cover more general scenarios? cc @gortiz @yashmayya to also take a look |
|
@Jackie-Jiang I tried various permutations of reorganizing these rules which could minimize the projection of unused columns and reduce the modification of the number of existing logical plans which this change should not impact. One alternate way is to remove the last filter pushdown to reduce overhead but it causes many extraneous We have a separate workstream to improve query optimization and we can attempt to tackle it subsequently: #14243 cc: @ankitsultana |
|
Basically filter pushdown rules can impact project pushdown rules and vise versa. Should we consider putting them together as one rule collection? Will it potentially cause infinite loop? |
|
Yes, I tested it out and it does cause an infinite loop since |
| "\n LogicalProject(col3=[$2], $f1=[true])", | ||
| "\n LogicalFilter(condition=[=($0, _UTF-8'bar')])", | ||
| "\n LogicalTableScan(table=[[default, b]])", | ||
| "\n LogicalProject(col3=[$0], $f1=[$1])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I think this project is redundant? Something to note regardless as we work on a proper long term solution. Same in the other parts of this plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can address this subsequently.
| "output": [ | ||
| "Execution Plan", | ||
| "\nLogicalProject(col1=[$0], $1=[$3])", | ||
| "\nLogicalProject(col1=[$0], w0$o0=[$3])", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shauryachats : can we also verify that this window query works after this change? (as in it runs successfully)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.


The current logical planner has a shortcoming for
NOT INsubqueries where it projects unused columns before the exchange stage, resulting in increased data during shuffles and significantly increasing latency.An example would be:
The execution plan is:
It can be observed from the execution plan
that while the join is only on
a.col3and the final result required is acount(*), other columns (col1,col2,col4,col5) are present in theLogicalProjectstep right beforePinotLogicalExchangewhich should ideally have been pruned.The cause of the issue is the
LogicalFilterwhich does not allow projects to be transposed, since the current rules do not havePROJECT_FILTER_TRANSPOSEincluded. Even ifPROJECT_FILTER_TRANSPOSEis included inBASIC_RULES, it would not work for multiple joins sinceBASIC_RULESoptimizations are run only once.This PR aims to fix this issue by running
Projectpushdown (transpose and merge operations) as aRuleCollectionafter the first iteration ofFilterpushdown rules are run to ensureProjectcan be pushed down as much as possible.After the
Projectpushdown is run, theFilterpushdown is run again to push down filters after the project pushdown for two reasons:Filterpushdown afterProjectpushdown results in a Project pushdown after each logical plan node which increases verbosity.