ARROW-11790: [Rust][DataFusion] Change builder signatures to take impl Interator<Item=Expr> rather than &[Expr] #9703
Conversation
rust/benchmarks/src/bin/tpch.rs
Outdated
There was a problem hiding this comment.
this is a pretty good example -- the answer_schema was created and immediately dropped after calling df.select(). Now the exprs don't need to be copied
There was a problem hiding this comment.
As I initially found the impl IntoIterator<Item .... syntax hard to grok, I figured I would add an example to LogicalPlanBuilder showing that it lets you use vec![] and the trait business can be somewhat ignored
There was a problem hiding this comment.
it was even worse here -- two copies of each input expr were made -- one copy to pass to validate_unique_names and once to put on the LogicalPlanAggregate.
Admittedly, removing the second copy doesn't require changing the signature of this function
Yeah, it's a pain but Related: is there a reason that |
@jorgecarleitao might know the answer to that as he introduced the |
1ac88f2 to
200fa6f
Compare
|
@alamb , I do not know the reason. I did not use it on |
jorgecarleitao
left a comment
There was a problem hiding this comment.
LGTM. Thanks a lot, @alamb .
There was a problem hiding this comment.
This will be even sweater once IntoIter is supported for arrays: [col()] vs vec![col()], see rust-lang/rust#65819
I can explain :-) My motivation was extensibility. I wanted users to be able to use the DataFusion DataFrame for local (in-process) execution but then be able to run distributed by having a mechanism to tell the context which DataFrame impl to use (configuration change rather than code change). I have a much better idea now though of the requirements so plan on writing up a design once Ballista is in this repo. |
|
And just to be clear, I am fine with removing the trait. |
|
I am going to rebase this PR and merge it once CI passes |
Thanks @andygrove -- I have no particular need to remove the |
|
I wrote up some very brief notes in https://issues.apache.org/jira/browse/ARROW-12064 |
|
Got. Then fwiw, I think that we can build that with this trait, we just need a bit more work / investigation: I am not 100% sure, but I think that the path for libraries that want to expose a different
For the case of Ballista, I think we need the opposite: Ballista has an (This is a hand-waving idea) |
…l Interator<Item=Expr> rather than &[Expr]
200fa6f to
e5a0b52
Compare
NOTE:
Since is a fairly major backwards incompatible change (many callsites need to be updated, though mostly mechanically); I gathered some feedback on this approach in #9692 and this is the PR I propose for merge.
I'll leave this open for several days and also send a note to the mailing lists for additional comment
It is part of my overall plan to make the DataFusion optimizer more idiomatic and do much less copying ARROW-11689
Rationale:
All callsites currently need an owned
Vec(or equivalent) so they can pass in&[Expr]and then Datafusion copies all theExprs. Many times the originalVec<Expr>is discarded immediately after use (I'll point out where this happens in a few places below). Thus I it would better (more idiomatic and often less copy/faster) to take something that could produce an iterator over ExprChanges
Dataframeso it takesVec<Expr>rather than&[Expr]LogicalPlanBuilderso it takesimpl Iterator<Item=Expr>rather than&[Expr]I couldn't figure out how to allow the
DataframeAPI (which is a Trait) to take animpl Iterator<Item=Expr>