Add try_unary, binary, try_binary kernels ~90% faster#2666
Add try_unary, binary, try_binary kernels ~90% faster#2666tustvold merged 1 commit intoapache:masterfrom
Conversation
0815487 to
882a655
Compare
arrow/src/compute/kernels/arity.rs
Outdated
| use crate::util::bit_iterator::{BitIndexIterator, BitSliceIterator}; | ||
| use std::sync::Arc; | ||
|
|
||
| fn try_for_each_valid<F: FnMut(usize) -> Result<()>>( |
There was a problem hiding this comment.
This internal iteration style optimises much better than the external style when there are multiple variants, I may create a HybridBitIndexIterator that special cases for_each and try_fold 🤔
| } | ||
| } | ||
|
|
||
| pub fn binary<A, B, F, O>( |
| #[inline] | ||
| fn into_primitive_array_data<I: ArrowPrimitiveType, O: ArrowPrimitiveType>( | ||
| array: &PrimitiveArray<I>, | ||
| unsafe fn build_primitive_array<O: ArrowPrimitiveType>( |
There was a problem hiding this comment.
the new name makes more sense than before
| } | ||
| try_binary(left, right, |a, b| { | ||
| op(a, b).ok_or_else(|| { | ||
| ArrowError::ComputeError(format!("Overflow happened on: {:?}, {:?}", a, b)) |
There was a problem hiding this comment.
Can we point out the op which cause the overflow?
|
|
||
| /// an iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`] | ||
| // Note: This implementation is based on std's [Vec]s' [IntoIter]. | ||
| /// An iterator that returns Some(T) or None, that can be used on any [`ArrayAccessor`] |
| let null_count = array.null_count(); | ||
|
|
||
| let mut buffer = BufferBuilder::<O::Native>::new(len); | ||
| buffer.append_n_zeroed(array.len()); |
There was a problem hiding this comment.
Hmm, do we need zero-initialized values for the buffer?
There was a problem hiding this comment.
It is UB if we don't initialize all values in the buffer, even the null slots. We must therefore zero out the nulls, in the past I have found it is faster to zero initialize everything, and override the valid indexes, than to interleave appending nulls and values.
arrow/src/compute/kernels/arity.rs
Outdated
| if selectivity > 0.8 { | ||
| BitSliceIterator::new(nulls.unwrap(), 0, len) | ||
| .flat_map(|(start, end)| start..end) | ||
| .try_for_each(f) | ||
| } else { | ||
| BitIndexIterator::new(nulls.unwrap(), 0, len).try_for_each(f) |
There was a problem hiding this comment.
Am I read this incorrectly? I think higher selectivity here is more null values in the array? So I guess BitIndexIterator is more performant?
There was a problem hiding this comment.
Oops, i think you're right. I copied this from the filter kernel and forgot to reverse this
| } | ||
| } | ||
|
|
||
| pub fn binary<A, B, F, O>( |
There was a problem hiding this comment.
I guess we need to add docs for binary and try_binary.
| return Ok(PrimitiveArray::from(ArrayData::new_empty(&O::DATA_TYPE))); | ||
| } | ||
|
|
||
| let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap(); |
There was a problem hiding this comment.
| let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len).unwrap(); | |
| let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len)?; |
| right: &PrimitiveArray<RT>, | ||
| op: F, | ||
| ) -> Result<PrimitiveArray<T>> | ||
| ) -> Result<PrimitiveArray<LT>> |
There was a problem hiding this comment.
Why is the return type same as the left type?
There was a problem hiding this comment.
In the style, the type of left and right can be different.
Do we need to support this?
There was a problem hiding this comment.
I believe it is required to allow adding a duration to a time
90e12b1 to
fb9ae62
Compare
Final benchmarksSo anything from 50% to 99% faster for the checked kernels 🎉 . The unchecked kernels regularly vary by +-10% but we're talking single-digit microseconds, so this isn't all that surprising. |
fdd3040 to
f05a3e6
Compare
|
Benchmark runs are scheduled for baseline = d88ed6a and contender = 2d28010. 2d28010 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #.
Rationale for this change
ArrayIter is great from an ergonomics perspective and definitely something we should continue to provide to users, however, the way it interleaves null masks severely makes for pretty poor performance. For primitive arrays, this can result in orders of magnitude more time spent handling the null mask than the values themselves.
What changes are included in this PR?
Are there any user-facing changes?
FYI @viirya @liukun4515