Skip to content

Comments

Revert "Seal Array trait (#9092)", mark Array as unsafe#9234

Merged
alamb merged 2 commits intoapache:mainfrom
gabotechs:gabotechs/revert-seal-array-trait
Jan 26, 2026
Merged

Revert "Seal Array trait (#9092)", mark Array as unsafe#9234
alamb merged 2 commits intoapache:mainfrom
gabotechs:gabotechs/revert-seal-array-trait

Conversation

@gabotechs
Copy link
Contributor

@gabotechs gabotechs commented Jan 21, 2026

This reverts commit 721f373.

Which issue does this PR close?

Rationale for this change

As discussed in Discussed in #9184, rather than sealing the Array trait, it's going to potentially be marked unsafe so that projects have time to find alternatives for their use-cases that are now satisfied by a custom Array implementation.

What changes are included in this PR?

Clean revert of #9092

Are these changes tested?

Not applicable here

Are there any user-facing changes?

Yes, users can no implement the Array trait in their codebases again.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 21, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gabotechs -- this looks good to me

I have some comment suggestions, but I also think we could do them as a follow on PR

@alamb alamb changed the title Revert "Seal Array trait (#9092)" Revert "Seal Array trait (#9092)", mark Array as unsafe Jan 22, 2026
@gabotechs gabotechs force-pushed the gabotechs/revert-seal-array-trait branch from 441b818 to 592bd0a Compare January 23, 2026 07:07
@gabotechs gabotechs force-pushed the gabotechs/revert-seal-array-trait branch from 592bd0a to 946931b Compare January 23, 2026 07:08
@alamb
Copy link
Contributor

alamb commented Jan 25, 2026

I plan to merge this tomorrow, unless there are any additional comments

@alamb alamb merged commit 6505d2a into apache:main Jan 26, 2026
27 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 26, 2026

Thanks again @gabotechs

Copy link
Contributor

@scovich scovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're adding scary docs to explain why it's unadvisable to manually impl this trait, it seems like we're missing one of the biggest reasons?

It's impossible to correctly implement the trait for third party types, because casting is based on the return values of Array::data_type, and third party types cannot extend enum DataType. So any code that attempts casting based on data type (including internal arrow library code) risks a panic.

@gabotechs
Copy link
Contributor Author

@scovich completely agree, and I think it makes sense that eventually this trait gets sealed. However, some crates are do willing to risk panics in order to benefit from the flexibility of trait Array, and this is a good intermediate step while these crates think or even contribute a better way of satisfying their needs.

@scovich
Copy link
Contributor

scovich commented Jan 27, 2026

@scovich completely agree, and I think it makes sense that eventually this trait gets sealed. However, some crates are do willing to risk panics in order to benefit from the flexibility of trait Array, and this is a good intermediate step while these crates think or even contribute a better way of satisfying their needs.

Understood. I was just commenting that the documentation in this PR, explaining why it's marked unsafe (strongly discouraged), doesn't give one of the biggest reasons. Maybe we should add that reason.

Or maybe there's enough truly unsafe code at risk to outweigh the "mere" panic risk of a cast gone bad?

@gabotechs
Copy link
Contributor Author

doesn't give one of the biggest reasons. Maybe we should add that reason.

The closest thing it mentions is that "panics" can happen, it does not enumerate which places specifically. If you have any suggestions about more things to add to the safety comment they are more than welcome.

@scovich
Copy link
Contributor

scovich commented Jan 28, 2026

Skimming down the list of methods:

  • Many code sites -- both internally and in user code -- use Array::data_type to drive casting decisions. Custom implementations of the Array trait risk causing panics in such code, unless Array::as_any actually returns the corresponding concrete type. See e.g. https://docs.rs/arrow/latest/arrow/array/macro.downcast_primitive_array.html
    • Corollary: Any type that attempts to impl Array as a "container" is utterly unusable as an actual array, because the original type could only be recovered if its Array::as_any returns a reference to the original type.
  • Array::into_data and Array::to_data must produce a valid ArrayData (with a valid ArrayData::data_type), or risk causing panics and/or UB in downstream consumers such as make_array.
    • Corollary: make_array can never recover the original custom array type. It will instead recover whatever ArrayData::data_type indicated.
  • Array::is_empty, Array::len, and Array::offset must be accurate, even if this array is the result of Array::slice
  • Array::nulls must be accurate (any entry wrongly marked non-null has an undefined value)

Summing it all up -- any custom Array implementation must either be a complex analogue to dyn Any -- completely ignoring the normal Array API -- or must look and act exactly like a newtype wrapper around one of the built-in Array types (e.g. Arc<dyn Array> can safely impl Array).

Problem is -- there's no way to tell which one you're dealing with, for an arbitrary &dyn Array. The only way to be sure you have a usable Array is to round trip it to ArrayData and back before attempting to work with it (assuming Array::into_data is implemented correctly).

If we really wanted to support the use case of dyn Array as a stand-in for dyn Any (since Array: !Any), then we'd have to define a new method, a "raw" version of Array::as_any, that can be downcast to recover the true type, and possibly also an analogue to Array::data_type that could guide the use of that raw casting. That way, the custom array-as-pointer could maintain the 1:1 correspondence between Array::data_type and Array::as_any, while still allowing to recover the original type. However, it seems like these are really just different use cases and we probably shouldn't try to conflate them in a single type.

alamb pushed a commit to alamb/arrow-rs that referenced this pull request Jan 30, 2026
…ache#9234)

This reverts commit 721f373.

# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Discussed in apache#9184
- Closes apache#9184
 
# Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

As discussed in Discussed in
apache#9184, rather than sealing the
`Array` trait, it's going to potentially be marked `unsafe` so that
projects have time to find alternatives for their use-cases that are now
satisfied by a custom `Array` implementation.


# What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Clean revert of apache#9092

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Not applicable here

# Are there any user-facing changes?

Yes, users can no implement the `Array` trait in their codebases again.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.
-->
@alamb
Copy link
Contributor

alamb commented Jan 30, 2026

@alamb
Copy link
Contributor

alamb commented Jan 30, 2026

I also created a PR to try and summarize the discussion about DataType in the comments:

alamb added a commit that referenced this pull request Feb 2, 2026
…unsafe` (#9234) (#9313)

- Part of #9240
- Related to #9184

This is a backport of the following PR to the 57 line
- #9234 from @gabotechs  

As discussed in #9184, rather
than sealing the Array trait which broke several downstream crates, this
PR markes the trait as `unsafe` so that projects have time to find
alternatives for their use-cases that are now satisfied by a custom
Array implementation.

Co-authored-by: Gabriel <[email protected]>
alamb added a commit that referenced this pull request Feb 2, 2026
# Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax.
-->

- Closes #NNN.

# Rationale for this change

Per discussion on
#9234 (review)
@scovich noted that it would be good to point out the core reason it is
hard/impossible to implement Array safetly


# What changes are included in this PR?

Update the comments on `Array` to try and capture the thread.

# Are these changes tested?

By CI
# Are there any user-facing changes?

Docs only, no functional changes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[regression] Sealing the Array trait broke downstream crates

3 participants