Fix "Incorrect Behavior of Collecting a filtered iterator to a BooleanArray"#8543
Fix "Incorrect Behavior of Collecting a filtered iterator to a BooleanArray"#8543mbrobbel merged 12 commits intoapache:mainfrom
Conversation
…ed len iterators Use BooleanBuilder in FromIterator Add BooleanAdapter Add BooleanArray::from_trusted_len_iter
|
@alamb Could you run the |
mbrobbel
left a comment
There was a problem hiding this comment.
On my M1 Pro:
BooleanArray::from_iter
[26.169 µs 26.194 µs 26.222 µs]
BooleanArray::from_trusted_len_iter
[9.6375 µs 9.6519 µs 9.6659 µs]
@mbrobbel Thanks for the review and running the benchmarks! Interessting how the performance differs on an ARM chip. Here are my numbers (Ryzen 3900X). |
|
I think the benchmark results for impl<Ptr: std::borrow::Borrow<Option<bool>>> FromIterator<Ptr> for BooleanArray {
fn from_iter<I: IntoIterator<Item = Ptr>>(iter: I) -> Self {
let iter = iter.into_iter();
let capacity = match iter.size_hint() {
(lower, Some(upper)) if lower == upper => lower,
_ => 0
};
let mut builder = BooleanBuilder::with_capacity(capacity);
builder.extend(iter.map(|item| *item.borrow()));
builder.finish()
}
}This is probably much slower than the current incorrect implementation, but any optimizations to |
39af718 to
fb8626a
Compare
fb8626a to
f7bd3c5
Compare
|
@jhorstmann Thanks for your input. That was a flaw in my benchmarks you're right. I've adapted them to now use a I really like the idea of re-using I've adapted the implementation so you can take a look. |
|
With the new With the old |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤔 The only one that looks suspicious is Int64Array::from_trusted_len_ter getting significantly slower I will rerun to double check |
|
🤖 |
|
🤖: Benchmark completed Details
|
|
🤔 Int64Array::from_trusted_len_iter 2.78 52.5±0.34µs ? ?/sec 1.00 18.9±0.04µs ? ?/sec |
…ter`" This reverts commit 78700b4.
|
🤖 |
|
🤖: Benchmark completed Details
|
Thanks! I think this is in-line with what I measured locally. |
Yes, I agree with your analysis -- and I think it is ok to take a hit of performance to make it correct 🤣 |
alamb
left a comment
There was a problem hiding this comment.
Thank you @tobixdev -- this code looks very nice and consistent with PrimitiveArray to me
@jhorstmann and @mbrobbel perhaps you can give this PR a final review before we merge as well to make sure we haven't missed anything
|
Looks good. This clearly fixes the incorrect behavior, and adding |
Yes, I agree that it would be great to get similar / better performance without any heap allocations. Thanks for creating the issue! |
|
I think all comments are resolved and some of them will be tackled in the follow-up PR / issue. Thanks for your input! |
Which issue does this PR close?
Rationale for this change
Fix the bug and align
BooleanArray::from_itertoPrimitiveArray::from_iterIn
BooleanArray::from_iter:Collecting to a
Vecand then usingfrom_trusted_len_iterwas almost double as fast as usingBooleanBufferBuilderon my machine.What changes are included in this PR?
BooleanArray::from_iterto fix the wrong behaviorBooleanArray::from_trusted_len_iterfor a more performant version (The old version ofBooleanArray::from_iter, just with unsafe flavor ofbit_util::set_bit_raw)BooleanAdapter, inspired byNativeAdapterfrom thePrimitiveArray. This allows also doingBooleanArray::from_iter([true, false].into_iter()).Are these changes tested?
BooleanArray::from_trusted_len_iterdirectly (oldBooleanArray::from_iteralso cover it indirectly)[false, true, ...](noOption)Are there any user-facing changes?
BooleanArray::from_iterhas a "slight" performance regression that users could observe.BooleanArrayBooleanArray::from_trusted_len_iter