Skip to content

Conversation

@westonpace
Copy link
Member

@westonpace westonpace commented Mar 6, 2025

Which issue does this PR close?

Rationale for this change

See PR

What changes are included in this PR?

  • StructArray::try_new will now return an error if there are no arrays provided
  • StructArray::new will panic if there are no arrays provided
  • StructArray::from(vec![]) will panic

Are there any user-facing changes?

BREAKING CHANGE: StructArray::try_new will now return an error if no child arrays are provided.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 6, 2025
@tustvold
Copy link
Contributor

tustvold commented Mar 6, 2025

I think as this is a breaking change regardless, we should probably just add the length as an explicit argument and avoid potential errors/panics. I don't really feel strongly though, I personally think empty StructArrays are something the arrow spec probably shouldn't permit, but that ship has sailed...

Perhaps @alamb has thoughts on the matter

@westonpace
Copy link
Member Author

Ah, as you were typing that I saw the suggestions from #6732 which suggested adding try_new_with_length (yet another alternative). I've implemented that and changed the docs for try_new to encourage try_new_with_length but I agree it might be more straightforward to force users to do the inference themselves.

@tustvold
Copy link
Contributor

tustvold commented Mar 6, 2025

which suggested adding try_new_with_length (yet another alternative)

I'm personally less a fan of this, as it isn't materially different from the current state of play - where there are multiple alternative methods.

I guess I was thinking something along the lines of

pub fn try_new(
        len: usize,
        fields: Fields,
        arrays: Vec<ArrayRef>,
        nulls: Option<NullBuffer>,
    ) -> Result<Self, ArrowError>

That being said, this would be potentially quite a disruptive change, and I am not sure how common empty StructArray actually are, I can't help feeling most of the time they'd arise from a deficiency in a projection system, rather than someone actually creating them...

For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.

@gatesn
Copy link

gatesn commented Apr 28, 2025

I would push for the breaking change honestly..

Here's another example: #7447

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR makes sense to me and i think is consistent with RecordBatch -- thank you @westonpace and @tustvold and @gatesn

For an additional data point, RecordBatch::try_new is consistent with StructArray, and so not making this change might be more consistent. I don't really feel strongly on this, which I guess would be an argument towards not making a breaking change here... I'll let others weigh in.

I think RecordBatch::try_new errors if there are no columns:

let row_count = options
.row_count
.or_else(|| columns.first().map(|col| col.len()))
.ok_or_else(|| {
ArrowError::InvalidArgumentError(
"must either specify a row count or at least one column".to_string(),
)
})?;

I would push for the breaking change honestly..

One thing we could do is deprecate StructArray::try_new which would give a bunch of warnings and encourage people to upgrade, but that would be inconsistent with RecordBatch::try_new() / RecordBatch::try_new_with_options)

@westonpace westonpace force-pushed the refactor/do-not-default-struct-arr-len branch from 5a5f41d to 907c293 Compare April 30, 2025 13:55
@westonpace
Copy link
Member Author

I've rebased. It seems the consensus is leaning towards making this change now. If there are no other suggestions by Friday I will merge.

@westonpace westonpace merged commit 4491b17 into apache:main May 2, 2025
26 checks passed
@alamb
Copy link
Contributor

alamb commented May 3, 2025

🚀 -- thanks @westonpace and everyone else

@alamb alamb added the api-change Changes to the arrow API label May 3, 2025
@alamb
Copy link
Contributor

alamb commented May 3, 2025

I am adding an API change label to this PR to make sure it is highlighted in the release notes - I think it is fine to release this in a minor release (and not have to wait for the next major release)

@kylebarron
Copy link
Member

IMO I'd have considered this to be a breaking change, as the StructArray::new constructor now panics with input considered valid as of 55.0.

In pyo3-arrow I have a test that checks I can import an empty RecordBatch which goes through this code path, which uses StructArray::new and now panics.

Ideally I'd like to declare my dependency at 55.0 to let downstream users use either 55.0 or 55.1, but I can't use the suggested workarounds of try_new_with_length because that didn't exist in 55.

I figure I'll just bump my requirement to 55.1 and that'll be the least painful way to handle this.

@alamb
Copy link
Contributor

alamb commented May 19, 2025

We can also create a 55.0.1 release and backport the try_new_with_length method if that is better

@kylebarron
Copy link
Member

Given how much work it looks like every time you make a release, I really don't want to make you make a release if you don't need to 😅

In kylebarron/arro3#328 I bumped to require 55.1 and that's fine with me.

@alamb
Copy link
Contributor

alamb commented May 21, 2025

This also seems to have caused a panic when filtering struct arrays:

thorfour pushed a commit to polarsignals/arrow-rs that referenced this pull request May 27, 2025
…_new (apache#7247)

* Do not default the struct array length to 0 in Struct::try_new if there are no child arrays.

* Extend testing for new_empty_fields

* Add try_new_with_length
Blizzara pushed a commit to Blizzara/datafusion that referenced this pull request May 28, 2025
xudong963 pushed a commit to apache/datafusion that referenced this pull request May 29, 2025
* Fix ScalarStructBuilder::build() when the struct is empty

This had been broken by apache/arrow-rs#7247 in Arrow 55.1.0

* fix test for error
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-change Changes to the arrow API arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

StructArray::try_new behavior can be unexpected when there are no child arrays

5 participants