Improve Array Logical Nullability#4691
Conversation
arrow-arith/src/boolean.rs
Outdated
There was a problem hiding this comment.
We no longer need to special case NullArray, whilst also now correctly handling DictionaryArray with value nulls
arrow-array/src/array/mod.rs
Outdated
There was a problem hiding this comment.
I debated calling this method is_required instead, but figured it was better to stick with consistent terminology, and this matches the method on Field
arrow-array/src/array/null_array.rs
Outdated
There was a problem hiding this comment.
This is a breaking change, although I expect the impact to minimal. I debated making is_null and friends always return logical nullability, but this would have been inconsistent with ArrayData and generally seemed like it might just cause more confusion
arrow-ord/src/comparison.rs
Outdated
arrow-ord/src/sort.rs
Outdated
ab28cfe to
721d65e
Compare
721d65e to
55a420b
Compare
| self.as_ref().nulls() | ||
| } | ||
|
|
||
| fn logical_nulls(&self) -> Option<NullBuffer> { |
There was a problem hiding this comment.
I considered having this eagerly constructed and stored on the arrays, but I felt this would be more surprising as it would make the arrays diverge from the arrow specification. Whilst this does raise the prospect of computing this NullBuffer multiple times, the cases where this is required are fairly rare. We can always optimise in future should this become a problem
|
|
||
| /// Returns the logical null buffer of this array if any | ||
| /// | ||
| /// In most cases this will be the same as [`Array::nulls`], except for: |
There was a problem hiding this comment.
I think Array::nulls should be deprecated and a non-deprecated version physical_nulls should be introduced. Users will misuse nulls because:
- most people will read over the small note within the doc string
nullsis easier to find thanlogical_nulls- there are two options of "which null", and we offer an implicit default (because it is not prefixed with a semantic) that is clearly the wrong one for many use cases
There was a problem hiding this comment.
Whilst I do sort of agree, I'm not sure this wouldn't just cause more confusion, as this isn't a distinction found in the arrow spec. Would we also deprecate is_null, is_valid, etc... I think the current API hasn't been a major issue thus far, as the cases where it matters are rare.
FWIW once we have StringView I think there is basically no reason to use DictionaryArray or RunArray
Which issue does this PR close?
Closes #4690
Closes #4689
Closes #4688
Closes #2687
Closes #4616
Relates to #4565
Relates to #4835
Rationale for this change
Various types have nullability that is not accurately reflected in their reported NullBuffer, this causes incorrect behaviour for various kernels that have special handling of the null buffers.
What changes are included in this PR?
Adds two methods to Array:
This logical null mask can then be used in the places where the logic relies on logical nullability
Are there any user-facing changes?