Skip to content

Conversation

@westonpace
Copy link
Member

@westonpace westonpace commented Mar 31, 2025

Which issue does this PR close?

Rationale for this change

See issue. When converting arrays from arrow-cpp then sliced struct arrays may have an offset/length. By ignoring these the resulting StructArray is incorrect.

What changes are included in this PR?

Changes the behavior of StructArray::from::<ArrayData>

Are there any user-facing changes?

No. It's hard to imagine any user would be relying on the past behavior as it was invalid. There was one unit test that had to change because it was relying on this behavior to construct invalid arrays (to ensure the IPC could handle these invalid arrays).

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 31, 2025
@westonpace westonpace requested a review from Copilot April 1, 2025 13:24
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes incorrect behavior in converting ArrayData to StructArray by respecting the offset and length from parent ArrayData when slicing child arrays. Key changes include:

  • Updating test cases to validate error scenarios and correct slicing in StructArray.
  • Altering the From implementation in StructArray to adjust child array offsets and lengths.
  • Adding tests to cover both correct conversion and expected panics when child arrays are undersized.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
arrow-ipc/src/reader.rs Updated tests to use the new structural field handling for invalid arrays.
arrow-array/src/array/struct_array.rs Modified array conversion to adjust child offsets and lengths, plus added tests.

@westonpace westonpace requested review from alamb and tustvold April 1, 2025 17:40
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @westonpace -- I have one question about the assert / safety check. Otherwise this seems like a nice fix to me

FYI @kylebarron and @tustvold

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did verify that these tests fail without the code change in this PR



thread 'array::struct_array::tests::test_struct_array_from_data_with_offset_and_length' panicked at arrow-array/src/array/struct_array.rs:551:9:
assertion `left == right` failed
  left: StructArray
-- validity:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this require that child_offset + parent_offset + parent_len is >= child_len?

The check above only verifies that parent_len + parent_offset is greater than child_len

Copy link
Member Author

@westonpace westonpace Apr 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not need to consider child_offset. The child_len is "number of items in the underlying array, after skipping any child_offset."

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than this somewhat opaque logic, which I am also not sure is correct w.r.t nulls, I wonder if we could just do something along these lines

let child = make_array(cd.clone());
if (parent_offset != 0 || parent_len != cd .len()) {
    return child.slice(parent_offset, parent_len);
}
return child

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. That's much nicer, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this need to also slice the child's null buffer, although I left a comment above on how to simplify this logic - ArrayData is very fiddly to use correctly, the less code using it the better

@westonpace
Copy link
Member Author

It seems we've been blessed with a new version of Rust 😆

I'll work on the clippy errors in a new PR.

@westonpace westonpace force-pushed the fix/struct-array-ignores-offsets branch from febe908 to 6737ebe Compare April 3, 2025 15:24
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @westonpace -- this is looking very good. I have only one more question

@westonpace
Copy link
Member Author

When I changed it to only look at offset, the tests all still pass so that suggests to me it isn't covered 🤔
@alamb

Good catch. I've added a test case to ensure it is covered.

@westonpace westonpace requested a review from alamb April 3, 2025 16:55
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @westonpace -- a very nice piece of work

@westonpace westonpace merged commit 9a5d821 into apache:main Apr 4, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

StructArray from ArrayData Ignores Offset

3 participants