ARROW-10016: [Rust] Implement is null / is not null kernels#8204
ARROW-10016: [Rust] Implement is null / is not null kernels#8204jhorstmann wants to merge 13 commits intoapache:masterfrom
Conversation
|
The |
jorgecarleitao
left a comment
There was a problem hiding this comment.
Great work with through testing. 👍
Left some minor comments on it, but otherwise LGTM
c73f5c4 to
5e4930b
Compare
| if filter_array.len() % 64 != 0 { | ||
| let last_idx = filter_u64.len() - 1; | ||
| let mask = u64::MAX >> (64 - filter_array.len() % 64); | ||
| filter_u64[last_idx] &= mask; |
There was a problem hiding this comment.
This fixes the issue I reported about accessing bits outside of len in https://issues.apache.org/jira/browse/ARROW-10025
| .write_bytes(filter_bytes, u64_buffer.capacity() - filter_bytes.len())?; | ||
| let filter_u64 = u64_buffer.typed_data_mut::<u64>().to_owned(); | ||
| // add to the resulting len so is is a multiple of the size of u64 | ||
| let pad_addional_len = 8 - filter_bytes.len() % 8; |
There was a problem hiding this comment.
The previous code padded to a multiple of 64 bytes which does not seem necessary and makes masking of the last element more difficult.
| } | ||
|
|
||
| pub fn is_null(input: &ArrayRef) -> Result<BooleanArray> { | ||
| if input.offset() % 8 != 0 { |
There was a problem hiding this comment.
Is this limitation fine, or is there a potential way around it? Does it mean that array[len = 30].slice(5, 20).is_null() would fail?
There was a problem hiding this comment.
This is currently a limitation of several kernels that operate on boolean arrays or null bitmaps because we try to operate on whole bytes (or larger) instead of individual bits for performance reasons. We would need a nicer abstraction for those bit packed buffers to iterate over chunks, regardless of offsets. I think it makes sense to look into such an implementation together with @jorgecarleitao proposal for an iterator interface.
There was a problem hiding this comment.
Oh yes, I remember that now. I tried looking at bit manipulation that would address this, but I can't remember where; maybe as part of the Parquet writer 🤔
It's def something that we need to do, if I have more time in the coming months I'll look into it. We could also take inspiration from c++ as I think they have this.
|
@jhorstmann Looks like there is cargo fmt issue |
|
@andygrove format issue is solved |
Needs to be rebased after #8183 is merged