ARROW-10016: [Rust] Implement is null / is not null kernels #8204

Closed
jhorstmann wants to merge 13 commits into apache:master from
jhorstmann:ARROW-10016-implement-is-null-kernels

@jhorstmann
Contributor

Needs to be rebased after #8183 is merged

@jhorstmann changed the title from "Arrow 10016 Implement is null / is not null kernels" to "ARROW-10016: [RUST] Implement is null / is not null kernels" on Sep 16, 2020
@paddyhoran changed the title from "ARROW-10016: [RUST] Implement is null / is not null kernels" to "ARROW-10016: [Rust] Implement is null / is not null kernels" on Sep 16, 2020
@jhorstmann
Contributor Author

The IS NULL kernel, and probably also the NOT kernel, seem to interact unexpectedly with the filter kernel, which reads bits beyond the array's len. I created a ticket for that at https://issues.apache.org/jira/browse/ARROW-10025

Member

@jorgecarleitao jorgecarleitao left a comment

Great work with thorough testing. 👍

Left some minor comments on it, but otherwise LGTM

Member

💯 nan is not null :)
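
To illustrate the point, here is a minimal sketch in plain Rust. It does not use the Arrow API: the function name `is_null_mask` is hypothetical and `Option<f64>` stands in for a nullable float array. Nullness is decided by whether a value is present, not by what the value is, so NaN counts as not null.

```rust
/// Hypothetical helper: returns `true` for each missing (null) slot.
/// `Option<f64>` is an assumed stand-in for a nullable Arrow float array.
fn is_null_mask(values: &[Option<f64>]) -> Vec<bool> {
    // A slot is null only when no value is present; `Some(NaN)` is a valid value.
    values.iter().map(|v| v.is_none()).collect()
}
```

Under these assumptions, `is_null_mask(&[Some(1.0), None, Some(f64::NAN)])` reports only the middle slot as null.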

@jhorstmann force-pushed the ARROW-10016-implement-is-null-kernels branch from c73f5c4 to 5e4930b on September 18, 2020 10:20
if filter_array.len() % 64 != 0 {
    let last_idx = filter_u64.len() - 1;
    let mask = u64::MAX >> (64 - filter_array.len() % 64);
    filter_u64[last_idx] &= mask;
Contributor Author


    .write_bytes(filter_bytes, u64_buffer.capacity() - filter_bytes.len())?;
let filter_u64 = u64_buffer.typed_data_mut::<u64>().to_owned();
// add to the resulting len so it is a multiple of the size of u64
let pad_addional_len = 8 - filter_bytes.len() % 8;
Contributor Author

The previous code padded to a multiple of 64 bytes, which does not seem necessary and makes masking of the last element more difficult.
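
The masking of the last element can be sketched in isolation as follows. This is a standalone illustration, not the code from this PR: the helper name `mask_trailing_bits` is hypothetical, and it assumes the slice holds exactly enough `u64` words to cover `len_bits` bits, so the last word is the only one that can contain bits past the logical length.

```rust
/// Hypothetical helper: clears every bit at position >= `len_bits` in the
/// final word of a bit-packed buffer.
/// Assumes `words.len() == ceil(len_bits / 64)`.
fn mask_trailing_bits(words: &mut [u64], len_bits: usize) {
    let rem = len_bits % 64;
    // When the length is a multiple of 64 the last word is fully used.
    if rem != 0 {
        if let Some(last) = words.last_mut() {
            // Keep the low `rem` bits and zero the rest.
            *last &= u64::MAX >> (64 - rem);
        }
    }
}
```

With a 70-bit buffer stored in two words, only the low 6 bits of the second word survive. If the buffer were padded to a larger multiple, the words after the one containing the last valid bit would also need clearing, which is the extra difficulty mentioned above.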

}

pub fn is_null(input: &ArrayRef) -> Result<BooleanArray> {
    if input.offset() % 8 != 0 {
Contributor

Is this limitation fine, or is there a potential way around it? Does it mean that array[len = 30].slice(5, 20).is_null() would fail?

Contributor Author

This is currently a limitation of several kernels that operate on boolean arrays or null bitmaps, because we try to operate on whole bytes (or larger units) instead of individual bits for performance reasons. We would need a nicer abstraction for those bit-packed buffers that iterates over chunks regardless of offsets. I think it makes sense to look into such an implementation together with @jorgecarleitao's proposal for an iterator interface.
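
As a rough idea of what handling arbitrary offsets could look like, here is a byte-at-a-time sketch in plain Rust. It is not the abstraction discussed here and not part of this PR: the function name `is_null_bits` is hypothetical, and it assumes an LSB-first validity bitmap in which a set bit means valid, with at least `ceil((offset + len) / 8)` bytes available.

```rust
/// Hypothetical sketch: builds a bit-packed "is null" mask for `len` slots
/// starting at bit `offset` of an LSB-first validity bitmap (1 = valid).
/// Assumes `validity.len() >= ceil((offset + len) / 8)`.
fn is_null_bits(validity: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let byte_offset = offset / 8;
    let shift = offset % 8;
    let out_len = (len + 7) / 8;
    let mut out = Vec::with_capacity(out_len);
    for i in 0..out_len {
        let lo = validity[byte_offset + i] >> shift;
        // For unaligned offsets, borrow the missing high bits from the next byte.
        let hi = if shift != 0 {
            validity
                .get(byte_offset + i + 1)
                .map_or(0, |&b| b << (8 - shift))
        } else {
            0
        };
        // Invert: a cleared validity bit means the slot is null.
        out.push(!(lo | hi));
    }
    // Zero the bits past `len` in the final byte.
    let rem = len % 8;
    if rem != 0 {
        if let Some(last) = out.last_mut() {
            *last &= 0xFF >> (8 - rem);
        }
    }
    out
}
```

For the `slice(5, 20)` case asked about above, a null at original index 7 ends up at bit 2 of the result. A real implementation would presumably work on `u64` chunks rather than single bytes to keep the performance benefit.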

Contributor

Oh yes, I remember that now. I tried looking at bit manipulation that would address this, but I can't remember where; maybe as part of the Parquet writer 🤔

It's definitely something that we need to do; if I have more time in the coming months I'll look into it. We could also take inspiration from C++, as I think they have this.

Member

@andygrove andygrove left a comment

LGTM

@andygrove
Member

@jhorstmann Looks like there is a cargo fmt issue

@jhorstmann
Contributor Author

@andygrove the format issue is solved

@nevi-me nevi-me closed this in 697f141 Sep 23, 2020