Update Union Array to add UnionMode, match latest Arrow Spec, and rename new -> unsafe new_unchecked()#885
Conversation
arrow/src/array/array_union.rs
Outdated
There was a problem hiding this comment.
here is the API change
arrow/src/array/array_union.rs
Outdated
There was a problem hiding this comment.
here is the core format change -- the length of type_ids must match the length of the values (rather than skipping nulls)
arrow/src/array/data.rs
Outdated
There was a problem hiding this comment.
we can now validate the buffer layout based on DataType!
Codecov Report
@@ Coverage Diff @@
## master #885 +/- ##
==========================================
+ Coverage 82.33% 82.35% +0.02%
==========================================
Files 169 169
Lines 49719 49788 +69
==========================================
+ Hits 40936 41003 +67
- Misses 8783 8785 +2
Continue to review full report at Codecov.
|
3ace1e2 to
ca55696
Compare
new -> unsafe new_unchecked()UnionMode, match latest Arrow Spec, and rename new -> unsafe new_unchecked()
|
Marking PRs that are over a month old as stale -- please let us know if there is additional work planned or if we should close them. |
|
(I do have some idea in my head I can finish this one up) |
8cde1ed to
a094eb7
Compare
a094eb7 to
b481581
Compare
|
Ok, I got the yak shaving bug and cleaned up this PR and marked it ready for review. 🙏 |
|
@jimexist @paddyhoran @nevi-me -- might one of you have time to review this PR? It revamps how UnionArray is supported to conform to the modern Arrow spec, and adds additional validation. I would like to get it in for the 7.0.0 release in a few weeks time |
|
Sorry I've been so quiet on Arrow. I'll find some time to review in the coming days. |
| /// are used to index into the `child_arrays`. | ||
| /// | ||
| /// The `value_offsets` `Buffer` is only provided in the case of a dense union, sparse unions | ||
| /// should use `None`. If provided the `value_offsets` `Buffer` should contain `i32` values. |
There was a problem hiding this comment.
I think you removed the period at the end of this line by mistake, right?
There was a problem hiding this comment.
Yes, I think so. Good catch.
(Built on #810, so to see the changes proposed in this PR just look at the last commit)Which issue does this PR close?
Closes #814
Closes #85
Related to #817
Rationale for this change
Follow Arrow spec for
UnionArray, make it possible to validate with generic code added in #810, and conform to Rust safety conventions.See https://arrow.apache.org/docs/format/Columnar.html#union-layout for a description of the Union layout
What changes are included in this PR?
UnionArray::newtounsafe UnionArray::new_unchecked()to follow standard Rust safety conventionsUnionModetoDataType::Unionso the type carries information on expected formatAre there any user-facing changes?
UnionArray::newhas been renamed tounsafe UnionArray::new_unchecked()DenseorSparsewhen creating aDataType::UnionUnionArrayformat is different