feat: Add projection to FilterExec#12281
Conversation
cc744be to
9456c00
Compare
b0aacd6 to
80b2ab3
Compare
berkaysynnada
left a comment
There was a problem hiding this comment.
LGTM, thank you @eejbyfeldt. It seems like the with_projection() API could end up being a method of the ExecutionPlan in the long run, wdyt? (It would improve the performance of all operators which internally refer to some columns but its downstream operators do not actually require that column)
| Ok(Some(Arc::new(UnionExec::new(new_children)))) | ||
| } | ||
|
|
||
| trait EmbededProjection: ExecutionPlan + Sized { |
There was a problem hiding this comment.
EmbededProjection -> EmbeddedProjection
| .map(|map| Self { map }) | ||
| } | ||
|
|
||
| pub fn from_indices(indices: &[usize], schema: &SchemaRef) -> Result<Self> { |
There was a problem hiding this comment.
A brief docstring would be helpful
| } | ||
| } | ||
|
|
||
| pub fn project( |
There was a problem hiding this comment.
Can we write a docstring here also?
80a879a to
0c05fb9
Compare
Probably worth considering in the future. The argument against making it part of ExecutionPlan is that there are some operators like Addressed the comments. |
| 08)--------------AggregateExec: mode=Partial, gby=[v1@0 as v1, v2@1 as v2], aggr=[max(having_test.v1)] | ||
| 09)----------------MemoryExec: partitions=1, partition_sizes=[1] | ||
| 01)CoalesceBatchesExec: target_batch_size=8192 | ||
| 02)--FilterExec: max(having_test.v1)@2 = 3, projection=[v1@0, v2@1] |
alamb
left a comment
There was a problem hiding this comment.
This is a really nice improvement -- thank you @eejbyfeldt .
Thank you @Dandandan and @berkaysynnada for the reviews
I went throught he plan changes carefully and they are 👌 👨🍳
| 22)------------------------------------------HashJoinExec: mode=Partitioned, join_type=Inner, on=[(p_partkey@0, ps_partkey@0)], projection=[p_partkey@0, p_mfgr@1, ps_suppkey@3, ps_supplycost@4] | ||
| 23)--------------------------------------------CoalesceBatchesExec: target_batch_size=8192 | ||
| 24)----------------------------------------------RepartitionExec: partitioning=Hash([p_partkey@0], 4), input_partitions=4 | ||
| 25)------------------------------------------------ProjectionExec: expr=[p_partkey@0 as p_partkey, p_mfgr@1 as p_mfgr] |
There was a problem hiding this comment.
these plans are doubly good -- they save two copies -- one in FilterExec and one in CoalesceBatchesExec
|
🚀 |
|
FWIW I ran clickbench and also saw improvement on Q30 and Q31 Which correspond to several queries where the filter is on a string column ( |
Which issue does this PR close?
Closes #5436.
Rationale for this change
Improves performance by avoiding copying some of the columns during execution of FilterExec.
Comparision tpch_mem10
What changes are included in this PR?
Adds a projection to FilterExec similarly to #9236 for HashJoinExec. It also resuses a lot of the logic from that PR to implement the pushdown.
Are these changes tested?
Existing tests.
Are there any user-facing changes?
Yes, projections can now be pushed into the FilterExec.