Fix ipc schema custom_metadata serialization#3282
Merged
tustvold merged 3 commits intoapache:masterfrom Dec 8, 2022
Merged
Conversation
Contributor
Author
|
I'm kinda hesitant on the test changes honestly, since having the pyarrow bytes statically hardcoded doesn't seem ideal |
tustvold
approved these changes
Dec 7, 2022
arrow-ipc/src/convert.rs
Outdated
| let ipc2 = crate::root_as_message(&bytes).unwrap(); | ||
| let schema2 = ipc2.header_as_schema().unwrap(); | ||
|
|
||
| // can't compare schema directly as though is same message, the underlying bytes seem to differ |
Contributor
There was a problem hiding this comment.
This is likely because flatbuffers aren't a canonical representation, that is there are multiple legal ways to represent the same data
| // the schema is: Field("field1", DataType::UInt32, false) | ||
| // Bytes of a schema generated via following python code, using pyarrow 10.0.1: | ||
| // | ||
| // import pyarrow as pa |
| builder.add_fields(fb_field_list); | ||
| builder.add_custom_metadata(fb_metadata_list); | ||
| if !custom_metadata.is_empty() { | ||
| builder.add_custom_metadata(fb_metadata_list); |
Contributor
There was a problem hiding this comment.
FWIW ideally we would not create the fb_metadata_list if not needed, as effectively this encodes an empty list, but then doesn't reference it from a flatbuffer Table. In practice this just means the generated flatbuffer has an orphaned 4 bytes of "padding"
Contributor
Author
There was a problem hiding this comment.
Thanks, I'll change it to do that
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Closes #250.
Rationale for this change
Thought meant to address #250, it is outdated in that the error doesn't seem to be due to empty children array anymore, but persists with empty custom_metadata (at schema level, not field level).
Have updated the bytes generated from pyarrow as those were from a very old version, and also made adjustments to how bytes are generated from Rust too, to make it more dynamic.
What changes are included in this PR?
Fix test
Fix ipc serialization of schema to fb for empty custom_metadata field to ensure it serializes as a None/null instead of an empty array (since pyarrow appears to do the same)
Are there any user-facing changes?