fix: Substrait serializer clippy error: not calling truncate#14723
fix: Substrait serializer clippy error: not calling truncate#14723alamb merged 9 commits intoapache:mainfrom
Conversation
| match std::fs::metadata(path.as_ref()) { | ||
| Ok(meta) if meta.len() > 0 => { | ||
| return Err(DataFusionError::Substrait(format!( | ||
| "Failed to encode substrait plan: the file {} already exists and is not empty", |
There was a problem hiding this comment.
You could remove substrait here because it's already part of the error prefix
datafusion/datafusion/common/src/error.rs
Line 517 in 19fe44c
| "Failed to encode substrait plan: the file {} already exists and is not empty", | |
| "Failed to encode plan: the file {} already exists and is not empty", |
There was a problem hiding this comment.
Adding a substrait is because there already had an error message "Failed to encode substrait plan" in the fuction serialize_bytes. I was following that error message to make error messages consistent.
There was a problem hiding this comment.
Removed the word "substrait" from all error messages.
Co-authored-by: Matthijs Brobbel <[email protected]>
Co-authored-by: Matthijs Brobbel <[email protected]>
mbrobbel
left a comment
There was a problem hiding this comment.
Thanks! I added some follow-up suggestions.
| deserialize_bytes(protobuf_in).await | ||
| } | ||
|
|
||
| /// Deserializes a plan from the bytes. |
There was a problem hiding this comment.
Follow-up suggestion (breaking change): this function is marked async but it doesn't have to be.
| /// Reads the file at `path` and deserializes a plan from the bytes. | ||
| pub async fn deserialize(path: impl AsRef<Path>) -> Result<Box<Plan>> { |
There was a problem hiding this comment.
Follow-up suggestion (breaking change): this function is marked async but it doesn't have to be (see comment on deserialize_bytes).
| /// Reads the file at `path` and deserializes a plan from the bytes. | |
| pub async fn deserialize(path: impl AsRef<Path>) -> Result<Box<Plan>> { | |
| /// Reads the file at `path` and deserializes a plan from the bytes. | |
| pub fn deserialize(path: impl AsRef<Path>) -> Result<Box<Plan>> { |
There was a problem hiding this comment.
I plan to contribute more to the datafusion-substrait crate. Personally, I would like to reserve the breaking change in one of the following PR.
What do you think? @alamb
There was a problem hiding this comment.
Agree -- let's discuss API changes in follow on PRs / issues
| let mut file = OpenOptions::new().write(true).create_new(true).open(path)?; | ||
| file.write_all(&protobuf_out?)?; |
There was a problem hiding this comment.
Follow-up suggestion: use tokio::io::AsyncWriteExt::write_all instead of the blocking std::io::Write::write_all in this async function.
There was a problem hiding this comment.
I have replaced the sync OpenOptions with the async counterpart backed by tokio.
| /// Plans a sql and serializes the generated logical plan to bytes. | ||
| pub async fn serialize_bytes(sql: &str, ctx: &SessionContext) -> Result<Vec<u8>> { |
There was a problem hiding this comment.
If we replace the sql arg with a DataFrame arg, this function (and serialize) can be non-async.
There was a problem hiding this comment.
Not sure which one is preferable for users.
@alamb cc
There was a problem hiding this comment.
I think we should leave this one as a str in this PR and we can potentially add a DataFrame specific one as a follow on
However, given there is already an API to serialize LogicalPlans directly I am not sure how much more value a DataFrame one would add
Co-authored-by: Matthijs Brobbel <[email protected]>
Which issue does this PR close?
truncate#9727Rationale for this change
It's reasonable to always truncate an existing file before serializing a plan.
On the other hand, the following
deserializemethod always read the file to end which does not consider the case that the file is not truncated before serialization.What changes are included in this PR?
Applied a clippy suggestion by adding a
truncate(true).Are these changes tested?
Tested by clippy
Are there any user-facing changes?
No