Add physical plan properties to protobuf definition#13136
Add physical plan properties to protobuf definition#13136timsaucer wants to merge 1 commit intoapache:mainfrom
Conversation
alamb
left a comment
There was a problem hiding this comment.
Thank you @timsaucer -- I have had a chance to ponder this PR a bit more.
As I understand it, the reason we are thinking this code could help is that we could change the FFI bindings to return a serialized PlanProperties rather than a FFI struct.
However, given our current lackadaisical approach to protobuf compatibility https://docs.rs/datafusion-proto/latest/datafusion_proto/#version-compatibility I think we may run into trouble with this approach
Thus I think we should proceed with the FFI / API based approach rather than try to use serialized protobuf for the stable interface
That being said, I think this PR has potential value in its own right as a way to potentially improve performance (avoid recomputing the properties) though I am not sure is deserializing is faster than recomputing them 🤔
I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.
I do think we should have some basic testing, otherwise there is a real danger this code will be broken accidentally in a future refactoring.
I think the existing tests are in https://github.com/apache/datafusion/blob/10af8a73662f4f6aac09a34157b7cf5fee034502/datafusion/proto/tests/cases/roundtrip_physical_plan.rs#L107-L106
However, the way most of the current serialization worked is that they re-compute the plan properties
to recompute the properties
One thing we could potentially do is to improve the protobuf format to actually serialize the PlanProperties rather than re-compute them when re-creating the plan
That would also result in test coverage of the new code
|
So now that we merged #12920 do we still need this PR 🤔 |
|
I don't mind closing it. I hadn't yet because I wanted to think more about the value you thought could be added by avoiding the call to recompute. |
|
In general I would tend towards less code and more re-compute unless someone finds / reports the properties are very expensive to compute. Even if they do I would personally prefer to try and make it faster to re-compute the properties rather than serialize them as that would benefit planning for all queries (not just ones that were serialzed) |
|
Sounds good to me. |
Which issue does this PR close?
This is to support apache/datafusion-python#823 and to address this comment on #12920
Rationale for this change
This is a pure addition to the protobuf definition to allow for transferring
PlanPropertiesin a serialized manner.What changes are included in this PR?
Are these changes tested?
Tested locally against my code for #12920
I did notice we don't really have unit tests for the other methods, so I didn't and any. I'm willing to do that, if desired.
Are there any user-facing changes?
No changes. Additional message definitions are available for use downstream, and particularly for the upcoming FFI work.