ARROW-10010: [Rust] Speedup arithmetic (1.3-1.9x) by jorgecarleitao · Pull Request #8191 · apache/arrow

jorgecarleitao · 2020-09-15T05:55:48Z

This PR speeds up arithmetic ops in two ways: it lets the compiler auto-vectorize the non-divide operations in the non-SIMD kernels, and it removes an unneeded operation from SIMD division.

For non-SIMD, this reduces runtime by about 30% to 45% for all operations (+, -, *, /).
For SIMD, this reduces division runtime by about 30%.
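
The title states speedups as factors, while criterion reports a relative change in time. A minimal sketch of the conversion (the function name is illustrative only): a change c means new_time = old_time * (1 + c), so the speedup is 1 / (1 + c). The inputs are the middle "change" estimates from the results below.

```rust
/// Converts a criterion relative change in time (e.g. -0.44 for -44%)
/// into a speedup factor, i.e. old_time / new_time.
fn speedup_from_change(change: f64) -> f64 {
    // new_time = old_time * (1 + change), so old_time / new_time = 1 / (1 + change)
    1.0 / (1.0 + change)
}

fn main() {
    // Middle "change" estimates copied from the benchmark output.
    let samples = [
        ("add 512 (non-SIMD)", -0.43969),
        ("divide 512 (non-SIMD)", -0.32688),
        ("divide 512 (SIMD)", -0.29457),
    ];
    for &(name, change) in samples.iter() {
        println!("{}: {:.2}x faster", name, speedup_from_change(change));
    }
}
```

With these inputs the program prints about 1.78x, 1.49x and 1.42x respectively.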

The culprit in non-SIMD was that we required the operation to return Result<T::Native>, which prevented the compiler from vectorizing it. Only division needs a Result. For divide, removing the operator sped up the operation further (I do not know why).
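
The PR text does not include the kernel code, so the following is only a std-only sketch of the idea, with hypothetical names (binary_op, try_binary_op, checked_divide, DivideByZero) that are not the arrow crate's API. The infallible path is a plain element-wise loop; the fallible path collects into a Result, which adds a per-element branch and early exit. Whether a given loop actually vectorizes depends on the compiler version and target, so the comments describe typical behavior, not a guarantee.

```rust
/// Hypothetical error type for this sketch (not arrow's ArrowError).
#[derive(Debug, PartialEq)]
struct DivideByZero;

/// Infallible element-wise kernel: a straight-line loop with no early exit,
/// which LLVM can usually auto-vectorize in release builds.
fn binary_op<F>(left: &[i32], right: &[i32], op: F) -> Vec<i32>
where
    F: Fn(i32, i32) -> i32,
{
    left.iter().zip(right.iter()).map(|(&l, &r)| op(l, r)).collect()
}

/// Fallible element-wise kernel: collecting into a Result stops at the first
/// Err, so every element carries a branch that tends to block vectorization.
fn try_binary_op<F>(left: &[i32], right: &[i32], op: F) -> Result<Vec<i32>, DivideByZero>
where
    F: Fn(i32, i32) -> Result<i32, DivideByZero>,
{
    left.iter().zip(right.iter()).map(|(&l, &r)| op(l, r)).collect()
}

/// Division is the only op that needs the fallible path.
fn checked_divide(l: i32, r: i32) -> Result<i32, DivideByZero> {
    if r == 0 {
        Err(DivideByZero)
    } else {
        // wrapping_div avoids the i32::MIN / -1 overflow panic
        Ok(l.wrapping_div(r))
    }
}

fn main() {
    let a = [6, 8, 10];
    let b = [3, 2, 5];
    // add/subtract/multiply can use the infallible kernel
    let sum = binary_op(&a, &b, i32::wrapping_add);
    // divide keeps the Result-returning kernel
    let quotient = try_binary_op(&a, &b, checked_divide);
    println!("{:?} {:?}", sum, quotient);
}
```

Wrapping arithmetic for add is an assumption made here to keep the sketch panic-free; the PR does not say how overflow is handled.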

The culprit in SIMD was primarily an extra simd_load that did nothing useful.
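
The PR only describes the SIMD fix as dropping a redundant load. As a rough std-only illustration of the pattern (no packed_simd, and not the actual kernel), fixed-size chunks stand in for SIMD lanes: each divisor chunk is read once, and that single copy feeds both the zero check and the division. LANES and divide_chunked are hypothetical names for this sketch.

```rust
/// Stand-in for the SIMD lane count; a hypothetical choice for this sketch.
const LANES: usize = 8;

/// Divides `left` by `right` chunk by chunk. Each divisor chunk is loaded
/// once and that single copy is reused for the zero check and the division.
fn divide_chunked(left: &[i32], right: &[i32]) -> Result<Vec<i32>, String> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    let mut out = Vec::with_capacity(left.len());
    let mut l_chunks = left.chunks_exact(LANES);
    let mut r_chunks = right.chunks_exact(LANES);

    for (l, r) in (&mut l_chunks).zip(&mut r_chunks) {
        // Single "load" of the divisor lanes.
        let mut divisor = [0i32; LANES];
        divisor.copy_from_slice(r);
        if divisor.contains(&0) {
            return Err("divide by zero".to_string());
        }
        for (&x, &d) in l.iter().zip(divisor.iter()) {
            out.push(x.wrapping_div(d));
        }
    }

    // Scalar tail for lengths that are not a multiple of LANES.
    for (&x, &d) in l_chunks.remainder().iter().zip(r_chunks.remainder()) {
        if d == 0 {
            return Err("divide by zero".to_string());
        }
        out.push(x.wrapping_div(d));
    }
    Ok(out)
}

fn main() {
    let left: Vec<i32> = (1..=10).map(|x| x * 10).collect();
    let right = vec![2; 10];
    println!("{:?}", divide_chunked(&left, &right));
}
```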

Benchmarks

The benchmark used:

set -e
# Baseline: commit 0852869 (Improved benches for arithmetic)
git checkout 0852869d1a9b7da4a1b91fa7cb7d4ef48e99cdec
cargo bench --bench arithmetic_kernels
# This PR's branch; criterion reports the change relative to the previous run
git checkout divide_simd_faster
cargo bench --bench arithmetic_kernels
echo "##################################"
# Same comparison with the simd feature enabled
git checkout 0852869d1a9b7da4a1b91fa7cb7d4ef48e99cdec
cargo bench --bench arithmetic_kernels --features simd
git checkout divide_simd_faster
cargo bench --bench arithmetic_kernels --features simd

Below are the results of the second run in each pair (the one that reports the change against the baseline), on my machine:

Non-SIMD

Previous HEAD position was 0852869d1 Improved benches for arithmetic.
Switched to branch 'divide_simd_faster'
   Compiling arrow v2.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 37.24s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-d281862a43faaf38
Gnuplot not found, using plotters backend
add 512                 time:   [1.4714 us 1.4758 us 1.4803 us]                     
                        change: [-44.446% -43.969% -43.522%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe

subtract 512            time:   [1.4825 us 1.4844 us 1.4866 us]                          
                        change: [-45.351% -45.018% -44.686%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

multiply 512            time:   [1.4895 us 1.4936 us 1.4990 us]                          
                        change: [-44.822% -44.135% -43.479%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

divide 512              time:   [1.9742 us 1.9773 us 1.9810 us]                        
                        change: [-33.273% -32.688% -32.052%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  7 (7.00%) high mild
  7 (7.00%) high severe

limit 512, 512          time:   [374.66 ns 375.64 ns 376.53 ns]                           
                        change: [-0.1000% +0.4442% +0.9503%] (p = 0.10 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

add_nulls_512           time:   [1.4880 us 1.4982 us 1.5115 us]                           
                        change: [-44.084% -43.116% -42.111%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  3 (3.00%) high mild
  13 (13.00%) high severe

divide_nulls_512        time:   [1.9731 us 1.9758 us 1.9790 us]                              
                        change: [-33.404% -32.570% -31.416%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

SIMD

Only the divide results are relevant here.

Previous HEAD position was 0852869d1 Improved benches for arithmetic.
Switched to branch 'divide_simd_faster'
   Compiling arrow v2.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
    Finished bench [optimized] target(s) in 38.63s
     Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-b8dc1739cfb5ae36
Gnuplot not found, using plotters backend
add 512                 time:   [879.31 ns 883.95 ns 889.17 ns]                     
                        change: [-0.2041% +0.6502% +1.5484%] (p = 0.15 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

subtract 512            time:   [864.99 ns 866.95 ns 868.95 ns]                          
                        change: [-4.8531% -4.1561% -3.5163%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

multiply 512            time:   [862.85 ns 864.87 ns 867.71 ns]                          
                        change: [-3.8532% -3.1774% -2.4459%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high severe

divide 512              time:   [1.9703 us 1.9771 us 1.9843 us]                        
                        change: [-30.046% -29.457% -28.903%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

limit 512, 512          time:   [368.89 ns 369.96 ns 370.96 ns]                           
                        change: [-1.9574% -1.0063% -0.0347%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 26 outliers among 100 measurements (26.00%)
  5 (5.00%) low severe
  6 (6.00%) low mild
  9 (9.00%) high mild
  6 (6.00%) high severe

add_nulls_512           time:   [871.97 ns 876.99 ns 883.57 ns]                           
                        change: [-5.1106% -3.6889% -2.3080%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

divide_nulls_512        time:   [1.9582 us 1.9625 us 1.9678 us]                              
                        change: [-34.188% -33.161% -32.136%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe
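
The add_nulls_512 and divide_nulls_512 benchmarks use inputs with null bitmaps, which the kernel combines with combine_option_bitmap (see the review comment on arithmetic.rs). A minimal sketch of that combination, assuming its semantics are a bitwise AND when both sides have a bitmap and the other side's bitmap when only one does; combine_validity is a hypothetical name, and the sketch ignores the slice offsets and bit lengths that the real function has to handle. Bits follow Arrow's least-significant-bit-first numbering.

```rust
/// Combines two optional validity bitmaps (bit set = slot is valid).
/// A missing bitmap means "every slot is valid", so the result is the other
/// bitmap; when both exist, a slot is valid only if it is valid in both.
fn combine_validity(left: Option<&[u8]>, right: Option<&[u8]>) -> Option<Vec<u8>> {
    match (left, right) {
        (None, None) => None,
        (Some(l), None) => Some(l.to_vec()),
        (None, Some(r)) => Some(r.to_vec()),
        (Some(l), Some(r)) => Some(l.iter().zip(r).map(|(a, b)| a & b).collect()),
    }
}

fn main() {
    // Slots 0, 1 and 3 valid on the left; slots 1 and 2 valid on the right.
    let left = [0b0000_1011u8];
    let right = [0b0000_0110u8];
    let combined = combine_validity(Some(&left[..]), Some(&right[..]));
    // Only slot 1 stays valid, so this prints 00000010.
    println!("{:08b}", combined.unwrap()[0]);
}
```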

github-actions · 2020-09-15T06:06:11Z

https://issues.apache.org/jira/browse/ARROW-10010

nevi-me · 2020-09-15T06:50:05Z

rust/arrow/src/compute/kernels/arithmetic.rs

    let null_bit_buffer =
        combine_option_bitmap(left.data_ref(), right.data_ref(), left.len())?;
-    let bitmap = null_bit_buffer.map(Bitmap::from);
+    let bitmap = null_bit_buffer.clone().map(Bitmap::from);


Is this clone necessary?

jorgecarleitao: Maybe not, but I was unable to get a bitmap reference to set the SIMD mask from a buffer without cloning it.
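
Why the clone shows up: Option::map takes the Option by value, so mapping null_bit_buffer directly would move the buffer into the Bitmap and leave nothing to attach to the output array. A minimal sketch with hypothetical stand-ins for Buffer and Bitmap, assuming Buffer is a reference-counted (Arc-backed) handle so that cloning it only increments a counter instead of copying the bits.

```rust
use std::sync::Arc;

// Hypothetical stand-in, assuming Buffer is an Arc-backed handle.
#[derive(Clone, Debug)]
struct Buffer {
    data: Arc<Vec<u8>>,
}

// Hypothetical stand-in for a bitmap wrapping a buffer.
#[derive(Debug)]
struct Bitmap {
    bits: Buffer,
}

impl From<Buffer> for Bitmap {
    fn from(bits: Buffer) -> Self {
        Bitmap { bits }
    }
}

fn main() {
    let null_bit_buffer: Option<Buffer> = Some(Buffer { data: Arc::new(vec![0b1011]) });

    // Option::map consumes its receiver; cloning first keeps
    // null_bit_buffer usable afterwards (e.g. for the output array).
    let bitmap: Option<Bitmap> = null_bit_buffer.clone().map(Bitmap::from);

    // Both handles point at the same allocation; only the refcount changed.
    let shared = Arc::ptr_eq(
        &bitmap.as_ref().unwrap().bits.data,
        &null_bit_buffer.as_ref().unwrap().data,
    );
    println!("shared allocation: {}", shared);
}
```

Writing null_bit_buffer.as_ref().map(|b| Bitmap::from(b.clone())) is equivalent here: it still clones the handle, just inside the closure.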

jorgecarleitao added 4 commits September 15, 2020 07:05

Improved benches for arithmetic.

0852869

Speed-up simd division.

3983985

Improved speed of math op with nulls.

256f90a

Improved speed of non-divide ops.

9fe0b72

nevi-me reviewed Sep 15, 2020


nevi-me approved these changes Sep 15, 2020


nevi-me closed this in 49e5b46 Sep 15, 2020

jorgecarleitao deleted the divide_simd_faster branch September 15, 2020 09:11

asfimport mentioned this pull request Sep 15, 2020

[Rust] Speedup arithmetic #26034

Closed

jorgecarleitao wants to merge 4 commits into apache:master from jorgecarleitao:divide_simd_faster
