Speed up the substring kernel by about 2x#1512
Conversation
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
| offsets | ||
| .windows(2) | ||
| .map(|pair| (pair[1] + start).max(pair[0])) | ||
| .collect() |
There was a problem hiding this comment.
Might be better to do new_starts/new_length in one iteration over offsets and use unzip to materialize into two Vecs?
There was a problem hiding this comment.
Thinking about it, I think actually materializing new_length to a Vec is not needed. It probably is faster (and saves allocating this Vec) to compute the length again when calculating new_values.
There was a problem hiding this comment.
I tried to remove .collect, actually. But you know, closure values in if-else branch can get no two closures, even if identical, have the same type error. And I did not find a good solution except adding .collect
rust-lang/rust#87961
There was a problem hiding this comment.
Yes, that is not possible in Rust (I think only by Boxing or manually inlining / macros).
Calculating the length in two places with something like let length = *end - *start within both the new_offsets new_values doesn´t seem too bad in terms of duplication?
There was a problem hiding this comment.
Yes, that is not possible in Rust (I think only by Boxing or manually inlining / macros).
Calculating the length in two places with something like
let length = *end - *startwithin both thenew_offsetsnew_valuesdoesn´t seem too bad in terms of duplication?
One way is to allow the closure not to capture the environment by adding more parameters, and the compiler will downcast closure to function. This works for calculating new_starts, but doesn't work for new_length where we must capture Some(length) = length
There was a problem hiding this comment.
My later comment mentioned not allocating the new vec but not allocating the array at all (by moving the calculation to the iteration in new_values). Is that something you want to try?
Thinking about it, I think actually materializing new_length to a Vec is not needed. It probably is faster (and saves allocating this Vec) to compute the length again when calculating new_values.
| let length_i: OffsetSize = offsets[i + 1] - offsets[i]; | ||
| // compute where we should start slicing this entry | ||
| let start = offsets[i] | ||
| + if start >= OffsetSize::zero() { |
There was a problem hiding this comment.
We don't need to compare start with zero multiple time. So I move this outside the for loop
| let start = start.max(offsets[i]).min(offsets[i + 1]); | ||
| // compute the length of the slice | ||
| let length: OffsetSize = length | ||
| .unwrap_or(length_i) |
There was a problem hiding this comment.
Same reason! we don't need to unwrap length in each loop. So move this outside the loop.
| .map(|(start, length)| { | ||
| (start.to_usize().unwrap(), length.to_usize().unwrap()) | ||
| }) | ||
| .map(|(start, length)| &data[start..start + length]) |
There was a problem hiding this comment.
| .map(|(start, length)| { | |
| (start.to_usize().unwrap(), length.to_usize().unwrap()) | |
| }) | |
| .map(|(start, length)| &data[start..start + length]) | |
| .map(|(start, length)| { | |
| let start = start.to_usize().unwrap(); | |
| let length = length.to_usize().unwrap(); | |
| &data[start..start + length] | |
| }) |
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
Codecov Report
@@ Coverage Diff @@
## master #1512 +/- ##
==========================================
+ Coverage 82.72% 82.75% +0.02%
==========================================
Files 189 190 +1
Lines 54561 54707 +146
==========================================
+ Hits 45137 45274 +137
- Misses 9424 9433 +9
Continue to review full report at Codecov.
|
Signed-off-by: remzi <[email protected]>
Signed-off-by: remzi <[email protected]>
|
Hi @Dandandan ! These are some changes since your last review:
|
| length_i + start | ||
| }; | ||
| let cal_new_start: Box<dyn Fn(OffsetSize, OffsetSize) -> OffsetSize> = if start | ||
| >= zero |
There was a problem hiding this comment.
cargo fmt likes this weird format.
| length_so_far += length; | ||
| // start and end offsets for each substring | ||
| let mut new_starts_ends: Vec<(OffsetSize, OffsetSize)> = | ||
| Vec::with_capacity(array.len()); |
There was a problem hiding this comment.
If we don't materialize new_starts_ends here, we have to calculate them twice when calculating new_values
There was a problem hiding this comment.
It is a time-space trade-off, but not the major factor affecting performance
| new_offsets.push(zero); | ||
|
|
||
| new_offsets.push(length_so_far); | ||
| offsets.windows(2).for_each(|pair| { |
There was a problem hiding this comment.
Purely functional style is not appropriate because new_offsets and new_starts_ends have different number of elements. After trying some different ways, I find that mut vec + for each is more readable and efficient.
|
Hi @Dandandan ! What's your opinion? |
alamb
left a comment
There was a problem hiding this comment.
Thank you @HaoYang670
I spent some time reading this code but I ran out of time to properly review it. I am worried there are corner cases in here that might not be covered by tests -- did you review test coverage and is this properly covered?
I think some comments might help me understand the code more quickly as well.
I will try and find some more time later this week or next to review it more
|
Thank you very much @alamb. I have reviewed the Codedev report and all code branches have been covered. |
substring kernelsubstring kernel by about 2x
|
Thanks @HaoYang670 and @Dandandan |
Which issue does this PR close?
Closes #1511.
Rationale for this change
Improve the performance and do some clean-up work.
What changes are included in this PR?
new_offsetsfirst to estimate the memory needed for bufferBenchmark
chip: Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz
About 2X speed-up
Are there any user-facing changes?
No.