move projection pushdown optimization logic to ExecutionPlan trait#14235
Conversation
|
Thank you @buraksenn. I'll take a look ASAP. |
|
Pushed the rest of the code. I've only left ProjectionExec which I think should be in projection_pushdown.rs file. I've one question, since projection pushdown has utils that other Execs depend on I've put it in |
|
@alamb are we okay with this API extension? I think it is inevitable at some point. I have a WIP PR (which has been in progress for a very long time, but I hope to get back to it soon) that would also benefit from such a method. |
Yes for sure -- I agree it is inevitable In my mind a step forward for making the optimizers more general too -- for example as written ProjectionPushdown doesn't work for user defined ExecutionPlan nodes. With this new API it does |
alamb
left a comment
There was a problem hiding this comment.
Thank you for working on this @buraksenn and @berkaysynnada -- it looks quite close and is 👨🍳 👌 very nice
| 02)--FilterExec: c2@1 > 10, projection=[c1@0] | ||
| 03)----RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
| 04)------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c2], has_header=true | ||
| 01)ProjectionExec: expr=[c1@0 as c1] |
There was a problem hiding this comment.
Some of these plans don't look quite as good (they now have an unecessary projection)
There was a problem hiding this comment.
You're right, just moving the code changed the behavior. I got its cause and fixing now
There was a problem hiding this comment.
It seems I've introduced that while moving it. Thanks for the fix @berkaysynnada
| CardinalityEffect::Unknown | ||
| } | ||
|
|
||
| /// Attempts to push down the given projection into the input of this `ExecutionPlan`. |
There was a problem hiding this comment.
I agree it makes sense to expose ProjectionExec here in this API -- it is a very special operator. In theory it would be nice to avoid the circular reference (ProjectionExec is an ExecutionPlan that depends on ProjetionExec) but if it compiles I say we 🚢 🇮🇹
|
I am merging this once the CI passes one more |
berkaysynnada
left a comment
There was a problem hiding this comment.
Thanks @buraksenn. Now it is ready to go 🚀
alamb
left a comment
There was a problem hiding this comment.
This is epic work @berkaysynnada and @buraksenn -- the fact none of the tests was changed is 👌 very nice
Which issue does this PR close?
It does not directly closes but is related
Rationale for this change
ProjectionPushdownintodatafusion-physical-optimizercrate #14184.For #11502 there is a ongoing work. Projection pushdown optimizer depends on CsvExec which blocks moving out of core crate. By moving swapping with projection logic to execs we are getting rid of dependencies and we can simply call
try_swapping_with_projectionfor most execs.Next steps:
What changes are included in this PR?
try_swapping_with_projectionmethod to traitAre these changes tested?
Existing tests in pushdown
Are there any user-facing changes?
No