-
Notifications
You must be signed in to change notification settings - Fork 1.1k
fix: respect offset/length when converting ArrayData to StructArray #7366
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: respect offset/length when converting ArrayData to StructArray #7366
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes incorrect behavior in converting ArrayData to StructArray by respecting the offset and length from parent ArrayData when slicing child arrays. Key changes include:
- Updating test cases to validate error scenarios and correct slicing in StructArray.
- Altering the From implementation in StructArray to adjust child array offsets and lengths.
- Adding tests to cover both correct conversion and expected panics when child arrays are undersized.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| arrow-ipc/src/reader.rs | Updated tests to use the new structural field handling for invalid arrays. |
| arrow-array/src/array/struct_array.rs | Modified array conversion to adjust child offsets and lengths, plus added tests. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @westonpace -- I have one question about the assert / safety check. Otherwise this seems like a nice fix to me
FYI @kylebarron and @tustvold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did verify that these tests fail without the code change in this PR
thread 'array::struct_array::tests::test_struct_array_from_data_with_offset_and_length' panicked at arrow-array/src/array/struct_array.rs:551:9:
assertion `left == right` failed
left: StructArray
-- validity:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't this require that child_offset + parent_offset + parent_len is >= child_len?
The check above only verifies that parent_len + parent_offset is greater than child_len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need to consider child_offset. The child_len is "number of items in the underlying array, after skipping any child_offset."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than this somewhat opaque logic, which I am also not sure is correct w.r.t nulls, I wonder if we could just do something along these lines
let child = make_array(cd.clone());
if (parent_offset != 0 || parent_len != cd .len()) {
return child.slice(parent_offset, parent_len);
}
return child
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. That's much nicer, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this need to also slice the child's null buffer, although I left a comment above on how to simplify this logic - ArrayData is very fiddly to use correctly, the less code using it the better
|
It seems we've been blessed with a new version of Rust 😆 I'll work on the clippy errors in a new PR. |
…reate invalid arrays by creating the invalid arrays with a different path.
febe908 to
6737ebe
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @westonpace -- this is looking very good. I have only one more question
Good catch. I've added a test case to ensure it is covered. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @westonpace -- a very nice piece of work
Which issue does this PR close?
Rationale for this change
See issue. When converting arrays from arrow-cpp then sliced struct arrays may have an offset/length. By ignoring these the resulting
StructArrayis incorrect.What changes are included in this PR?
Changes the behavior of
StructArray::from::<ArrayData>Are there any user-facing changes?
No. It's hard to imagine any user would be relying on the past behavior as it was invalid. There was one unit test that had to change because it was relying on this behavior to construct invalid arrays (to ensure the IPC could handle these invalid arrays).