ARROW-10010: [Rust] Speedup arithmetic (1.3-1.9x) #8191
Closed
jorgecarleitao wants to merge 4 commits into apache:master from jorgecarleitao:divide_simd_faster
Conversation
nevi-me
reviewed
Sep 15, 2020
  let null_bit_buffer =
      combine_option_bitmap(left.data_ref(), right.data_ref(), left.len())?;
- let bitmap = null_bit_buffer.map(Bitmap::from);
+ let bitmap = null_bit_buffer.clone().map(Bitmap::from);
Member
Author
Maybe not, but I was unable to get a bitmap reference to set the mask for SIMD from a buffer without a clone.
nevi-me
approved these changes
Sep 15, 2020
This PR speeds up arithmetic ops by leveraging vectorization of non-divide operations (in non-SIMD), as well as removing an unneeded operation in SIMD division.
For non-SIMD, this yields about [-30%,-45%] for all operations (+-*/).
For SIMD, this yields about -30% on division.
The culprit in non-SIMD was that we required the operation to return Result<T::Native>, which was not allowing the compiler to vectorize the operation. Only the division requires Result. For divide, removing the operator further sped up the operation (I do not know the reason).
The culprit in SIMD was primarily one simd_load too many that was not doing anything.
Benchmarks
The benchmark used:
and below are the results for the execution of the second bench, which is the one that gives the differential, on my machine:
Non-SIMD
SIMD
(divide is the only relevant operation)
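The non-SIMD culprit named in the description (a per-element operation that returns Result) can be illustrated with a minimal sketch. This is not the code from the PR: MathError, math_op_fallible, math_op_infallible, add, and divide are hypothetical names, and plain i32 slices stand in for Arrow arrays and T::Native. The idea is only that a closure returning a plain value leaves a branch-free loop body that the compiler may auto-vectorize, while a closure returning Result forces an error check on every element.

```rust
/// Hypothetical error type standing in for the library's error enum.
#[derive(Debug, PartialEq)]
pub enum MathError {
    DivideByZero,
}

/// Element-wise op where every element goes through a fallible closure.
/// The per-element `Result` adds an early-exit branch to the loop body,
/// which tends to get in the way of auto-vectorization.
/// If the slices differ in length, `zip` stops at the shorter one.
pub fn math_op_fallible<F>(left: &[i32], right: &[i32], op: F) -> Result<Vec<i32>, MathError>
where
    F: Fn(i32, i32) -> Result<i32, MathError>,
{
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| op(*l, *r))
        .collect()
}

/// Element-wise op with an infallible closure: the loop body is a plain
/// arithmetic expression with no error branch.
pub fn math_op_infallible<F>(left: &[i32], right: &[i32], op: F) -> Vec<i32>
where
    F: Fn(i32, i32) -> i32,
{
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| op(*l, *r))
        .collect()
}

/// Addition cannot fail (wrapping on overflow), so it uses the infallible path.
pub fn add(left: &[i32], right: &[i32]) -> Vec<i32> {
    math_op_infallible(left, right, |a, b| a.wrapping_add(b))
}

/// Division is the only operation here that needs `Result`, because of
/// the divide-by-zero case.
pub fn divide(left: &[i32], right: &[i32]) -> Result<Vec<i32>, MathError> {
    math_op_fallible(left, right, |a, b| {
        if b == 0 {
            Err(MathError::DivideByZero)
        } else {
            Ok(a.wrapping_div(b))
        }
    })
}
```

Whether the infallible loop is actually vectorized depends on the compiler, target, and optimization level, so any speedup should be confirmed with a benchmark rather than assumed.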