ARROW-9075: [C++] Optimized Filter implementation: faster performance + compilation, smaller code size #7442
wesm wants to merge 3 commits into apache:master
|
Here are benchmark runs on my machine.
If you want to benchmark yourself, please use this branch for the "before": https://github.com/wesm/arrow/tree/ARROW-9075-comparison. It contains the RandomArrayGenerator::Boolean change and some other changes to the benchmarks, without which the results will not be comparable.
|
To give some simple before/after performance numbers in Python, this example uses a high-selectivity filter (all but one value selected) and low-selectivity filters (1/100 and 1/1000): before: after: EDIT: updated benchmarks for low-selectivity optimization
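For readers who want to reproduce this kind of comparison, the selectivity cases can be sketched in plain Python. This is only an illustration and does not use Arrow: `make_mask`, `all_but_one`, and `apply_filter` are hypothetical helpers, and it assumes "1/100" and "1/1000" mean the fraction of values the filter keeps.

```python
import random

def make_mask(n, selected_fraction, seed=0):
    """Boolean mask in which roughly `selected_fraction` of the slots are True."""
    rng = random.Random(seed)
    return [rng.random() < selected_fraction for _ in range(n)]

def all_but_one(n):
    """High-selectivity mask: every value selected except a single one."""
    mask = [True] * n
    mask[n // 2] = False
    return mask

def apply_filter(values, mask):
    """Reference (unoptimized) filter: keep values whose mask slot is True."""
    return [v for v, keep in zip(values, mask) if keep]

values = list(range(100_000))
high = all_but_one(len(values))
low_100 = make_mask(len(values), 1 / 100)
low_1000 = make_mask(len(values), 1 / 1000)

for name, mask in [("all but one", high), ("1/100", low_100), ("1/1000", low_1000)]:
    print(name, len(apply_filter(values, mask)))
```

The three masks exercise different code paths in an optimized filter: the first is dominated by long runs of selected values, the other two by long runs of skipped values.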
|
The RTools 4.0 build failure is spurious. This is ready for review.
|
I implemented some other optimizations, especially for the case where neither values nor filter contain nulls. I'm working on updated benchmarks.
Updated benchmarks: https://gist.github.com/wesm/ad07cec1613b6327926dfe1d95e7f4f0/revisions?diff=split
|
I found some issues in the Python benchmarks I posted before (I was including the cost of converting NumPy booleans to Arrow booleans in the prior results). Here are the updated setup and current numbers. I also added a "worst case scenario" where 50% of values are not selected. setup: before: after:
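The measurement issue described above (input conversion counted inside the timed region) can be illustrated with the standard `timeit` module. This is a sketch under assumptions, not the setup used in this thread: `to_packed_bitmap` is a hypothetical stand-in for the NumPy-to-Arrow boolean conversion, and the alternating mask is just one way to get 50% of values not selected.

```python
import timeit

def to_packed_bitmap(bools):
    """Hypothetical stand-in for an input conversion step: pack bools into bytes, LSB first."""
    out = bytearray((len(bools) + 7) // 8)
    for i, b in enumerate(bools):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)

def filter_with_bitmap(values, bitmap):
    """Keep values[i] when bit i of the packed bitmap is set."""
    return [v for i, v in enumerate(values) if (bitmap[i // 8] >> (i % 8)) & 1]

values = list(range(10_000))
bools = [i % 2 == 0 for i in values]  # 50% of values not selected

# Misleading: the conversion runs inside the timed callable.
t_with_conversion = timeit.timeit(
    lambda: filter_with_bitmap(values, to_packed_bitmap(bools)), number=20
)

# Fairer: convert once up front, then time only the filter itself.
bitmap = to_packed_bitmap(bools)
t_filter_only = timeit.timeit(lambda: filter_with_bitmap(values, bitmap), number=20)

print(f"with conversion: {t_with_conversion:.4f}s, filter only: {t_filter_only:.4f}s")
```

Hoisting the conversion out of the timed callable keeps the benchmark focused on the operation under test.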
|
@ursabot benchmark --help |
|
|
@ursabot benchmark --benchmark-filter=Filter 66df3d0 |
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112487) builder has been succeeded. Revision: 31a66630f6bcb9a3f74912da7d31ac2412e97184
======================================= =============== =============== =========
benchmark baseline contender change
======================================= =============== =============== =========
FilterInt64FilterWithNulls/262144/3 563.800 MiB/sec 576.625 MiB/sec 2.275%
- FilterStringFilterWithNulls/262144/3 498.174 MiB/sec 434.196 MiB/sec -12.842%
FilterFSLInt64FilterNoNulls/262144/9 158.897 MiB/sec 268.195 MiB/sec 68.785%
FilterInt64FilterNoNulls/262144/14 2.793 GiB/sec 6.554 GiB/sec 134.709%
FilterFSLInt64FilterNoNulls/262144/11 2.356 GiB/sec 5.386 GiB/sec 128.589%
FilterStringFilterNoNulls/262144/2 4.937 GiB/sec 10.996 GiB/sec 122.715%
FilterFSLInt64FilterWithNulls/262144/5 1.590 GiB/sec 4.193 GiB/sec 163.732%
FilterInt64FilterWithNulls/262144/12 519.932 MiB/sec 496.829 MiB/sec -4.443%
FilterInt64FilterNoNulls/262144/0 669.365 MiB/sec 7.541 GiB/sec 1053.558%
FilterFSLInt64FilterNoNulls/262144/1 268.027 MiB/sec 560.837 MiB/sec 109.246%
FilterStringFilterNoNulls/262144/6 488.692 MiB/sec 481.827 MiB/sec -1.405%
FilterInt64FilterNoNulls/262144/8 2.735 GiB/sec 6.313 GiB/sec 130.810%
FilterInt64FilterNoNulls/262144/5 2.809 GiB/sec 6.018 GiB/sec 114.267%
- FilterStringFilterWithNulls/262144/12 84.168 MiB/sec 70.410 MiB/sec -16.346%
FilterFSLInt64FilterNoNulls/262144/0 169.867 MiB/sec 718.594 MiB/sec 323.035%
FilterStringFilterWithNulls/262144/14 355.644 MiB/sec 878.914 MiB/sec 147.133%
FilterStringFilterWithNulls/262144/2 3.338 GiB/sec 8.903 GiB/sec 166.736%
FilterFSLInt64FilterWithNulls/262144/1 263.151 MiB/sec 512.905 MiB/sec 94.909%
FilterFSLInt64FilterNoNulls/262144/14 2.395 GiB/sec 5.212 GiB/sec 117.604%
FilterInt64FilterWithNulls/262144/11 1.729 GiB/sec 4.684 GiB/sec 170.948%
FilterInt64FilterNoNulls/262144/9 566.051 MiB/sec 3.083 GiB/sec 457.794%
- FilterStringFilterWithNulls/262144/10 619.724 MiB/sec 578.798 MiB/sec -6.604%
FilterInt64FilterWithNulls/262144/1 541.616 MiB/sec 558.958 MiB/sec 3.202%
FilterFSLInt64FilterWithNulls/262144/14 1.596 GiB/sec 4.061 GiB/sec 154.454%
FilterFSLInt64FilterWithNulls/262144/0 170.064 MiB/sec 398.738 MiB/sec 134.464%
FilterInt64FilterWithNulls/262144/2 1.739 GiB/sec 4.883 GiB/sec 180.721%
FilterInt64FilterWithNulls/262144/4 528.271 MiB/sec 555.772 MiB/sec 5.206%
FilterFSLInt64FilterNoNulls/262144/2 2.383 GiB/sec 6.074 GiB/sec 154.832%
FilterInt64FilterNoNulls/262144/4 584.370 MiB/sec 579.728 MiB/sec -0.794%
FilterInt64FilterNoNulls/262144/12 575.177 MiB/sec 3.023 GiB/sec 438.268%
- FilterStringFilterWithNulls/262144/9 459.179 MiB/sec 394.515 MiB/sec -14.083%
FilterStringFilterNoNulls/262144/5 4.936 GiB/sec 10.562 GiB/sec 113.987%
FilterInt64FilterNoNulls/262144/2 2.838 GiB/sec 7.390 GiB/sec 160.374%
FilterFSLInt64FilterNoNulls/262144/7 261.996 MiB/sec 464.922 MiB/sec 77.454%
FilterStringFilterNoNulls/262144/14 580.305 MiB/sec 1.253 GiB/sec 121.158%
FilterFSLInt64FilterWithNulls/262144/13 249.426 MiB/sec 386.982 MiB/sec 55.149%
- FilterInt64FilterWithNulls/262144/9 530.774 MiB/sec 497.368 MiB/sec -6.294%
FilterStringFilterWithNulls/262144/8 3.270 GiB/sec 8.467 GiB/sec 158.943%
FilterFSLInt64FilterNoNulls/262144/10 257.812 MiB/sec 390.196 MiB/sec 51.349%
- FilterStringFilterNoNulls/262144/13 98.039 MiB/sec 90.475 MiB/sec -7.716%
FilterInt64FilterWithNulls/262144/8 1.737 GiB/sec 4.652 GiB/sec 167.790%
FilterFSLInt64FilterWithNulls/262144/3 167.057 MiB/sec 351.817 MiB/sec 110.597%
- FilterStringFilterWithNulls/262144/6 494.580 MiB/sec 429.801 MiB/sec -13.098%
FilterFSLInt64FilterWithNulls/262144/12 165.174 MiB/sec 262.176 MiB/sec 58.728%
FilterInt64FilterWithNulls/262144/7 526.592 MiB/sec 541.187 MiB/sec 2.772%
FilterStringFilterNoNulls/262144/11 4.531 GiB/sec 9.652 GiB/sec 113.006%
FilterStringFilterWithNulls/262144/1 662.260 MiB/sec 633.359 MiB/sec -4.364%
FilterStringFilterWithNulls/262144/4 670.467 MiB/sec 644.877 MiB/sec -3.817%
FilterStringFilterNoNulls/262144/0 503.582 MiB/sec 550.304 MiB/sec 9.278%
- FilterStringFilterNoNulls/262144/9 443.066 MiB/sec 390.416 MiB/sec -11.883%
FilterFSLInt64FilterNoNulls/262144/13 251.747 MiB/sec 351.809 MiB/sec 39.747%
FilterInt64FilterNoNulls/262144/11 2.788 GiB/sec 6.687 GiB/sec 139.878%
- FilterInt64FilterWithNulls/262144/0 620.421 MiB/sec 585.692 MiB/sec -5.598%
FilterFSLInt64FilterWithNulls/262144/8 1.593 GiB/sec 4.155 GiB/sec 160.783%
- FilterStringFilterNoNulls/262144/7 692.942 MiB/sec 654.463 MiB/sec -5.553%
FilterStringFilterNoNulls/262144/8 4.900 GiB/sec 10.519 GiB/sec 114.694%
FilterInt64FilterWithNulls/262144/10 510.602 MiB/sec 527.612 MiB/sec 3.331%
FilterFSLInt64FilterNoNulls/262144/3 159.401 MiB/sec 555.494 MiB/sec 248.487%
FilterFSLInt64FilterNoNulls/262144/6 162.294 MiB/sec 399.907 MiB/sec 146.410%
- FilterStringFilterWithNulls/262144/0 517.359 MiB/sec 439.657 MiB/sec -15.019%
FilterInt64FilterWithNulls/262144/13 502.220 MiB/sec 527.971 MiB/sec 5.128%
FilterStringFilterWithNulls/262144/7 666.386 MiB/sec 638.254 MiB/sec -4.221%
FilterInt64FilterNoNulls/262144/6 603.261 MiB/sec 3.473 GiB/sec 489.518%
FilterStringFilterWithNulls/262144/11 2.994 GiB/sec 8.094 GiB/sec 170.304%
FilterFSLInt64FilterWithNulls/262144/6 165.225 MiB/sec 335.017 MiB/sec 102.765%
FilterFSLInt64FilterWithNulls/262144/7 257.333 MiB/sec 466.760 MiB/sec 81.383%
FilterInt64FilterNoNulls/262144/7 583.317 MiB/sec 564.896 MiB/sec -3.158%
FilterStringFilterNoNulls/262144/4 691.530 MiB/sec 699.221 MiB/sec 1.112%
FilterFSLInt64FilterWithNulls/262144/11 1.592 GiB/sec 4.057 GiB/sec 154.837%
- FilterStringFilterNoNulls/262144/12 88.970 MiB/sec 70.067 MiB/sec -21.246%
FilterInt64FilterNoNulls/262144/10 562.254 MiB/sec 545.802 MiB/sec -2.926%
FilterInt64FilterWithNulls/262144/14 1.738 GiB/sec 4.747 GiB/sec 173.077%
FilterFSLInt64FilterWithNulls/262144/2 1.570 GiB/sec 4.295 GiB/sec 173.597%
FilterInt64FilterNoNulls/262144/13 558.715 MiB/sec 554.622 MiB/sec -0.733%
FilterInt64FilterWithNulls/262144/6 561.253 MiB/sec 537.786 MiB/sec -4.181%
FilterStringFilterWithNulls/262144/13 91.370 MiB/sec 89.650 MiB/sec -1.882%
FilterFSLInt64FilterNoNulls/262144/12 153.042 MiB/sec 241.416 MiB/sec 57.745%
FilterFSLInt64FilterNoNulls/262144/5 2.414 GiB/sec 5.672 GiB/sec 134.917%
FilterFSLInt64FilterNoNulls/262144/8 2.377 GiB/sec 5.541 GiB/sec 133.082%
- FilterStringFilterNoNulls/262144/10 632.556 MiB/sec 572.816 MiB/sec -9.444%
FilterFSLInt64FilterWithNulls/262144/9 166.869 MiB/sec 288.049 MiB/sec 72.620%
FilterInt64FilterNoNulls/262144/1 599.855 MiB/sec 912.146 MiB/sec 52.061%
FilterStringFilterWithNulls/262144/5 3.295 GiB/sec 8.587 GiB/sec 160.574%
FilterFSLInt64FilterNoNulls/262144/4 263.896 MiB/sec 514.836 MiB/sec 95.091%
FilterFSLInt64FilterWithNulls/262144/4 258.744 MiB/sec 477.042 MiB/sec 84.369%
FilterInt64FilterWithNulls/262144/5 1.735 GiB/sec 4.728 GiB/sec 172.542%
FilterStringFilterNoNulls/262144/3 495.135 MiB/sec 539.178 MiB/sec 8.895%
FilterInt64FilterNoNulls/262144/3 611.978 MiB/sec 3.929 GiB/sec 557.402%
FilterFSLInt64FilterWithNulls/262144/10 255.156 MiB/sec 417.072 MiB/sec 63.458%
FilterStringFilterNoNulls/262144/1 714.448 MiB/sec 713.457 MiB/sec -0.139%
======================================= =============== =============== =========
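When reading these tables, note that baseline and contender are sometimes reported in different units (MiB/sec vs GiB/sec). The change column appears to be the relative difference `(contender - baseline) / baseline`, and rows with a leading `-` appear to be the larger regressions. A small sketch for re-deriving the column from the printed values follows; `parse_rate` and `percent_change` are hypothetical helpers, and small discrepancies are expected because the printed rates are rounded.

```python
# Conversion factors to a common unit (MiB/sec).
UNIT_TO_MIB = {"MiB/sec": 1.0, "GiB/sec": 1024.0}

def parse_rate(text):
    """Parse a string such as '7.541 GiB/sec' into MiB/sec."""
    value, unit = text.split()
    return float(value) * UNIT_TO_MIB[unit]

def percent_change(baseline, contender):
    """Relative change of contender over baseline, in percent."""
    base = parse_rate(baseline)
    return (parse_rate(contender) - base) / base * 100.0

# Rows taken from the table above.
print(round(percent_change("563.800 MiB/sec", "576.625 MiB/sec"), 3))
print(round(percent_change("498.174 MiB/sec", "434.196 MiB/sec"), 3))
print(round(percent_change("669.365 MiB/sec", "7.541 GiB/sec"), 1))
```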
|
The string perf regressions are mostly for the cases where 99.9% of the values are selected. I'll take a closer look at this to see what can be done. The varbinary case is so important that we might want to create a specialized implementation for it.
|
Still, a 10% decrease for strings is quite tolerable in exchange for a 50-150% increase for all other types.
|
True. I think for binary-based types we need to implement bulk-block-appends. It's beyond the scope of this PR -- I will take a brief look to see if there's anything dumb (like messing up the preallocation) that I did that's making things slower.
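The thread does not spell out what bulk-block-appends would look like, so the following is only one plausible reading, sketched in plain Python rather than Arrow C++: when a run of consecutive values is selected, copy the run's bytes from the data buffer in one slice and rebase its offsets, instead of appending value by value. `filter_strings_bulk` is a hypothetical name, and the offsets/data layout mirrors the usual variable-length binary representation.

```python
def filter_strings_bulk(offsets, data, mask):
    """Filter a variable-length binary column given as (offsets, data).

    Value i occupies data[offsets[i]:offsets[i + 1]]. Each run of consecutive
    selected values is copied with a single slice of the data buffer.
    """
    out_offsets = [0]
    out_data = bytearray()
    i, n = 0, len(mask)
    while i < n:
        if not mask[i]:
            i += 1
            continue
        # Find the end of the run [i, j) of selected values.
        j = i
        while j < n and mask[j]:
            j += 1
        start, end = offsets[i], offsets[j]
        # Shift that maps input offsets in this run to output offsets.
        base = len(out_data) - start
        out_data += data[start:end]  # one bulk copy for the whole run
        out_offsets.extend(offsets[k] + base for k in range(i + 1, j + 1))
        i = j
    return out_offsets, bytes(out_data)

# Column holding the values "ab", "c", "", "def".
offsets = [0, 2, 3, 3, 6]
data = b"abcdef"
print(filter_strings_bulk(offsets, data, [True, True, False, True]))
```

With a filter that selects 99.9% of values, almost the entire input falls into a handful of long runs, which is where a per-value append loop would be expected to lose the most.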
|
I'll have to deal with the string optimization in a follow-up PR, so I'm going to leave this for review as is. It would be good to get this merged sooner rather than later. EDIT: opened https://issues.apache.org/jira/browse/ARROW-9152
|
Everything is much faster here, including string filtering. |
pitrou left a comment:
I haven't taken a look at everything.
fsaintjacques left a comment:
Some comments regarding testing and implementation.
cpp/src/arrow/compute/api_vector.h (outdated)
This should probably be extracted as a ScalarFunction named popcount or so (follow up)
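The code this review comment refers to is not shown in the thread, so the following is only a generic illustration of what a popcount over a packed bitmap computes. It assumes least-significant-bit-first numbering within each byte, and `count_set_bits` is a hypothetical helper name.

```python
def count_set_bits(bitmap, length):
    """Count set bits among the first `length` bits of a packed bitmap (LSB first)."""
    full_bytes, rest = divmod(length, 8)
    count = sum(bin(b).count("1") for b in bitmap[:full_bytes])
    if rest:
        # Mask off the bits past `length` in the final partial byte.
        count += bin(bitmap[full_bytes] & ((1 << rest) - 1)).count("1")
    return count

bitmap = bytes([0b10110101, 0b00000111])
print(count_set_bits(bitmap, 16), count_set_bits(bitmap, 10), count_set_bits(bitmap, 3))
```

For a boolean filter without nulls, this count is the number of values the filter keeps, which is one reason such a helper is useful outside a single kernel.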
|
@ursabot benchmark --benchmark-filter=Filter c4f425768 |
|
I think I improved some of the readability problems and addressed the other comments. I'd like to merge this soon once CI is green.
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112952) builder has been succeeded. Revision: f50b39e54c50e8a53606eda486c88e6ec51d7006
======================================= =============== ================ ========
benchmark baseline contender change
======================================= =============== ================ ========
- FilterFSLInt64FilterNoNulls/262144/14 5.457 GiB/sec 4.398 GiB/sec -19.404%
FilterStringFilterWithNulls/262144/4 642.405 MiB/sec 677.920 MiB/sec 5.528%
- FilterFSLInt64FilterNoNulls/262144/7 463.992 MiB/sec 378.391 MiB/sec -18.449%
FilterFSLInt64FilterWithNulls/262144/6 333.996 MiB/sec 320.327 MiB/sec -4.093%
- FilterFSLInt64FilterWithNulls/262144/1 516.189 MiB/sec 459.926 MiB/sec -10.900%
- FilterStringFilterNoNulls/262144/4 681.504 MiB/sec 595.788 MiB/sec -12.577%
- FilterFSLInt64FilterNoNulls/262144/8 5.889 GiB/sec 4.675 GiB/sec -20.610%
- FilterInt64FilterWithNulls/262144/10 606.960 MiB/sec 547.973 MiB/sec -9.718%
- FilterInt64FilterNoNulls/262144/7 638.264 MiB/sec 568.923 MiB/sec -10.864%
FilterStringFilterWithNulls/262144/6 431.474 MiB/sec 484.077 MiB/sec 12.191%
- FilterStringFilterNoNulls/262144/14 1.245 GiB/sec 1008.386 MiB/sec -20.893%
FilterFSLInt64FilterWithNulls/262144/11 4.239 GiB/sec 4.029 GiB/sec -4.954%
- FilterStringFilterNoNulls/262144/8 10.899 GiB/sec 8.494 GiB/sec -22.064%
- FilterFSLInt64FilterNoNulls/262144/4 515.626 MiB/sec 406.426 MiB/sec -21.178%
FilterInt64FilterNoNulls/262144/6 3.697 GiB/sec 3.525 GiB/sec -4.664%
FilterInt64FilterNoNulls/262144/8 6.829 GiB/sec 6.809 GiB/sec -0.301%
- FilterFSLInt64FilterNoNulls/262144/2 6.453 GiB/sec 4.950 GiB/sec -23.289%
- FilterInt64FilterWithNulls/262144/13 606.984 MiB/sec 548.948 MiB/sec -9.561%
- FilterStringFilterNoNulls/262144/1 707.132 MiB/sec 609.027 MiB/sec -13.874%
FilterStringFilterWithNulls/262144/3 436.301 MiB/sec 488.825 MiB/sec 12.038%
FilterStringFilterWithNulls/262144/1 616.105 MiB/sec 675.493 MiB/sec 9.639%
FilterStringFilterNoNulls/262144/3 548.660 MiB/sec 533.539 MiB/sec -2.756%
- FilterFSLInt64FilterNoNulls/262144/9 268.363 MiB/sec 250.359 MiB/sec -6.709%
- FilterStringFilterNoNulls/262144/13 89.995 MiB/sec 76.326 MiB/sec -15.189%
FilterStringFilterWithNulls/262144/12 71.366 MiB/sec 82.415 MiB/sec 15.483%
FilterInt64FilterNoNulls/262144/9 3.209 GiB/sec 3.114 GiB/sec -2.971%
FilterFSLInt64FilterWithNulls/262144/9 288.819 MiB/sec 276.679 MiB/sec -4.203%
FilterStringFilterNoNulls/262144/12 66.141 MiB/sec 65.509 MiB/sec -0.956%
- FilterFSLInt64FilterWithNulls/262144/4 474.907 MiB/sec 429.013 MiB/sec -9.664%
- FilterInt64FilterWithNulls/262144/1 651.659 MiB/sec 556.258 MiB/sec -14.640%
FilterStringFilterWithNulls/262144/14 911.019 MiB/sec 871.756 MiB/sec -4.310%
- FilterInt64FilterNoNulls/262144/4 675.941 MiB/sec 569.448 MiB/sec -15.755%
- FilterFSLInt64FilterNoNulls/262144/13 352.227 MiB/sec 307.638 MiB/sec -12.659%
FilterInt64FilterWithNulls/262144/5 5.129 GiB/sec 4.921 GiB/sec -4.068%
- FilterFSLInt64FilterWithNulls/262144/14 4.168 GiB/sec 3.909 GiB/sec -6.200%
FilterStringFilterWithNulls/262144/9 396.156 MiB/sec 442.591 MiB/sec 11.721%
- FilterFSLInt64FilterNoNulls/262144/3 554.664 MiB/sec 464.787 MiB/sec -16.204%
- FilterStringFilterNoNulls/262144/2 11.394 GiB/sec 8.924 GiB/sec -21.683%
- FilterStringFilterWithNulls/262144/8 8.856 GiB/sec 8.075 GiB/sec -8.825%
- FilterFSLInt64FilterNoNulls/262144/10 389.368 MiB/sec 333.033 MiB/sec -14.468%
- FilterFSLInt64FilterNoNulls/262144/11 5.587 GiB/sec 4.507 GiB/sec -19.338%
FilterStringFilterWithNulls/262144/10 580.314 MiB/sec 612.106 MiB/sec 5.478%
- FilterFSLInt64FilterNoNulls/262144/5 6.032 GiB/sec 4.717 GiB/sec -21.802%
- FilterFSLInt64FilterNoNulls/262144/0 725.211 MiB/sec 565.535 MiB/sec -22.018%
- FilterInt64FilterNoNulls/262144/3 4.266 GiB/sec 3.855 GiB/sec -9.641%
- FilterInt64FilterWithNulls/262144/12 549.159 MiB/sec 499.761 MiB/sec -8.995%
- FilterInt64FilterWithNulls/262144/0 622.810 MiB/sec 497.075 MiB/sec -20.188%
- FilterInt64FilterNoNulls/262144/1 1.021 GiB/sec 980.686 MiB/sec -6.230%
- FilterFSLInt64FilterWithNulls/262144/0 399.890 MiB/sec 375.677 MiB/sec -6.055%
- FilterFSLInt64FilterWithNulls/262144/2 4.497 GiB/sec 4.233 GiB/sec -5.880%
- FilterFSLInt64FilterNoNulls/262144/1 564.700 MiB/sec 431.560 MiB/sec -23.577%
- FilterInt64FilterWithNulls/262144/9 549.832 MiB/sec 499.657 MiB/sec -9.125%
- FilterInt64FilterWithNulls/262144/7 625.701 MiB/sec 550.091 MiB/sec -12.084%
FilterInt64FilterNoNulls/262144/14 6.386 GiB/sec 6.901 GiB/sec 8.073%
FilterInt64FilterWithNulls/262144/8 5.034 GiB/sec 4.958 GiB/sec -1.517%
FilterInt64FilterNoNulls/262144/12 3.215 GiB/sec 3.131 GiB/sec -2.607%
FilterStringFilterNoNulls/262144/0 560.832 MiB/sec 545.275 MiB/sec -2.774%
- FilterStringFilterNoNulls/262144/7 641.313 MiB/sec 582.952 MiB/sec -9.100%
- FilterInt64FilterWithNulls/262144/3 615.558 MiB/sec 496.003 MiB/sec -19.422%
- FilterStringFilterNoNulls/262144/10 578.560 MiB/sec 506.085 MiB/sec -12.527%
FilterInt64FilterWithNulls/262144/14 4.934 GiB/sec 4.873 GiB/sec -1.228%
FilterInt64FilterNoNulls/262144/5 7.145 GiB/sec 6.863 GiB/sec -3.945%
FilterStringFilterWithNulls/262144/7 632.496 MiB/sec 669.411 MiB/sec 5.836%
FilterInt64FilterWithNulls/262144/11 4.937 GiB/sec 4.860 GiB/sec -1.544%
- FilterStringFilterWithNulls/262144/5 9.095 GiB/sec 8.275 GiB/sec -9.015%
FilterStringFilterNoNulls/262144/6 483.482 MiB/sec 470.273 MiB/sec -2.732%
- FilterFSLInt64FilterWithNulls/262144/7 464.358 MiB/sec 418.157 MiB/sec -9.949%
- FilterStringFilterNoNulls/262144/11 10.039 GiB/sec 7.873 GiB/sec -21.572%
FilterInt64FilterNoNulls/262144/11 6.389 GiB/sec 6.942 GiB/sec 8.664%
- FilterFSLInt64FilterNoNulls/262144/6 400.926 MiB/sec 355.070 MiB/sec -11.437%
- FilterStringFilterNoNulls/262144/5 10.942 GiB/sec 8.621 GiB/sec -21.211%
FilterInt64FilterNoNulls/262144/2 7.901 GiB/sec 7.942 GiB/sec 0.526%
- FilterFSLInt64FilterWithNulls/262144/13 387.523 MiB/sec 354.145 MiB/sec -8.613%
- FilterInt64FilterNoNulls/262144/10 635.634 MiB/sec 574.368 MiB/sec -9.639%
- FilterStringFilterWithNulls/262144/11 8.363 GiB/sec 7.663 GiB/sec -8.365%
- FilterInt64FilterWithNulls/262144/4 644.733 MiB/sec 554.689 MiB/sec -13.966%
- FilterInt64FilterWithNulls/262144/2 5.308 GiB/sec 4.950 GiB/sec -6.739%
- FilterInt64FilterWithNulls/262144/6 582.743 MiB/sec 494.561 MiB/sec -15.132%
FilterFSLInt64FilterWithNulls/262144/5 4.299 GiB/sec 4.094 GiB/sec -4.757%
FilterInt64FilterNoNulls/262144/0 7.685 GiB/sec 8.021 GiB/sec 4.371%
- FilterInt64FilterNoNulls/262144/13 634.999 MiB/sec 574.211 MiB/sec -9.573%
- FilterStringFilterWithNulls/262144/2 9.478 GiB/sec 8.593 GiB/sec -9.337%
FilterFSLInt64FilterWithNulls/262144/8 4.256 GiB/sec 4.060 GiB/sec -4.609%
- FilterFSLInt64FilterWithNulls/262144/10 422.316 MiB/sec 380.968 MiB/sec -9.791%
FilterStringFilterNoNulls/262144/9 383.197 MiB/sec 374.020 MiB/sec -2.395%
- FilterFSLInt64FilterNoNulls/262144/12 242.820 MiB/sec 227.762 MiB/sec -6.201%
FilterStringFilterWithNulls/262144/0 429.008 MiB/sec 493.378 MiB/sec 15.004%
- FilterFSLInt64FilterWithNulls/262144/12 267.881 MiB/sec 249.827 MiB/sec -6.739%
FilterFSLInt64FilterWithNulls/262144/3 349.988 MiB/sec 337.076 MiB/sec -3.689%
FilterStringFilterWithNulls/262144/13 90.911 MiB/sec 97.476 MiB/sec 7.222%
======================================= =============== ================ ========
|
Something is weird with the commit history; I'm not sure those benchmarks are right. I'll rebase things again and rerun.
Small fix
More work, start writing filter -> selection vector
Things compiling again finally
BinaryBitBlockCounter tests passing
Consolidate take/filter tests in same module, fix GetTakeIndices / GetFilterOutputSize unit tests and implementations
Finish filter implementation, tests passing again
Clean up includes
Tweak benchmark parameters
Some string streamlining
Python fixes
Python test fixes. Add fast path for low-selectivity filters
Low selectivity path for non-primitive filtering
VisitFilter is not a dependent template
Implement some obvious non-null filter optimizations
Fix typo
…ter paths less spaghetti
Split primitive filter paths between DROP/EMIT_NULL, improve readability
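The commit log mentions converting a filter into a selection vector and splitting the DROP and EMIT_NULL paths. As a rough pure-Python model of those ideas (not the C++ implementation in this PR), a filter slot can be True, False, or None (null); under DROP a null slot is skipped, and under EMIT_NULL it produces a null in the output. The function names below echo the commit log but are hypothetical Python stand-ins.

```python
DROP, EMIT_NULL = "drop", "emit_null"

def get_take_indices(filter_values, null_selection=DROP):
    """Turn a boolean filter (True/False/None) into a list of take indices.

    A None entry in the result means "emit a null at this output position".
    """
    indices = []
    for i, f in enumerate(filter_values):
        if f is True:
            indices.append(i)
        elif f is None and null_selection == EMIT_NULL:
            indices.append(None)
    return indices

def get_filter_output_size(filter_values, null_selection=DROP):
    """Number of output slots, computed without materializing the indices."""
    selected = sum(1 for f in filter_values if f is True)
    if null_selection == EMIT_NULL:
        selected += sum(1 for f in filter_values if f is None)
    return selected

def take(values, indices):
    """Gather values at the given indices; a None index yields a None output."""
    return [None if i is None else values[i] for i in indices]

mask = [True, None, False, True]
values = ["a", "b", "c", "d"]
print(take(values, get_take_indices(mask, DROP)))
print(take(values, get_take_indices(mask, EMIT_NULL)))
```

Knowing the output size up front lets an implementation preallocate its output buffers, and expressing the filter as indices lets Take and Filter share one gather path.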
|
AMD64 Ubuntu 18.04 C++ Benchmark (#112989) builder has been succeeded. Revision: 21227cc
======================================= =============== ================ ========
benchmark baseline contender change
======================================= =============== ================ ========
- FilterStringFilterNoNulls/262144/7 637.909 MiB/sec 572.355 MiB/sec -10.276%
- FilterStringFilterNoNulls/262144/8 10.897 GiB/sec 8.711 GiB/sec -20.057%
FilterStringFilterNoNulls/262144/6 485.775 MiB/sec 476.410 MiB/sec -1.928%
FilterStringFilterWithNulls/262144/4 649.558 MiB/sec 677.796 MiB/sec 4.347%
FilterInt64FilterNoNulls/262144/9 3.212 GiB/sec 3.264 GiB/sec 1.612%
- FilterFSLInt64FilterNoNulls/262144/13 351.877 MiB/sec 308.073 MiB/sec -12.449%
- FilterFSLInt64FilterNoNulls/262144/10 389.471 MiB/sec 333.418 MiB/sec -14.392%
- FilterInt64FilterNoNulls/262144/4 668.729 MiB/sec 625.199 MiB/sec -6.509%
FilterFSLInt64FilterWithNulls/262144/9 287.988 MiB/sec 276.495 MiB/sec -3.991%
- FilterStringFilterWithNulls/262144/2 9.441 GiB/sec 8.793 GiB/sec -6.865%
FilterStringFilterWithNulls/262144/12 73.855 MiB/sec 82.463 MiB/sec 11.656%
- FilterFSLInt64FilterNoNulls/262144/5 6.091 GiB/sec 4.403 GiB/sec -27.714%
- FilterFSLInt64FilterNoNulls/262144/3 550.519 MiB/sec 463.959 MiB/sec -15.723%
FilterInt64FilterNoNulls/262144/2 7.988 GiB/sec 7.976 GiB/sec -0.147%
- FilterStringFilterNoNulls/262144/4 700.795 MiB/sec 605.189 MiB/sec -13.643%
- FilterFSLInt64FilterWithNulls/262144/1 516.544 MiB/sec 460.521 MiB/sec -10.846%
- FilterStringFilterWithNulls/262144/8 8.877 GiB/sec 8.364 GiB/sec -5.779%
- FilterFSLInt64FilterWithNulls/262144/3 350.123 MiB/sec 329.103 MiB/sec -6.004%
FilterStringFilterWithNulls/262144/3 435.836 MiB/sec 494.167 MiB/sec 13.384%
FilterInt64FilterNoNulls/262144/10 630.544 MiB/sec 628.104 MiB/sec -0.387%
- FilterStringFilterNoNulls/262144/5 11.014 GiB/sec 8.788 GiB/sec -20.216%
FilterInt64FilterNoNulls/262144/3 4.263 GiB/sec 4.181 GiB/sec -1.936%
FilterInt64FilterWithNulls/262144/1 635.637 MiB/sec 615.015 MiB/sec -3.244%
FilterStringFilterWithNulls/262144/7 638.645 MiB/sec 678.465 MiB/sec 6.235%
- FilterFSLInt64FilterNoNulls/262144/2 6.506 GiB/sec 5.012 GiB/sec -22.975%
- FilterFSLInt64FilterNoNulls/262144/0 729.854 MiB/sec 569.623 MiB/sec -21.954%
FilterInt64FilterNoNulls/262144/5 6.946 GiB/sec 6.899 GiB/sec -0.674%
FilterInt64FilterWithNulls/262144/12 545.763 MiB/sec 547.657 MiB/sec 0.347%
FilterStringFilterNoNulls/262144/9 383.858 MiB/sec 377.178 MiB/sec -1.740%
- FilterFSLInt64FilterNoNulls/262144/8 5.825 GiB/sec 4.702 GiB/sec -19.289%
FilterInt64FilterNoNulls/262144/13 632.053 MiB/sec 633.157 MiB/sec 0.175%
FilterInt64FilterNoNulls/262144/1 1.020 GiB/sec 1.022 GiB/sec 0.239%
- FilterFSLInt64FilterNoNulls/262144/12 242.197 MiB/sec 228.152 MiB/sec -5.799%
FilterInt64FilterWithNulls/262144/4 640.980 MiB/sec 614.192 MiB/sec -4.179%
FilterInt64FilterWithNulls/262144/8 4.967 GiB/sec 5.071 GiB/sec 2.102%
- FilterFSLInt64FilterWithNulls/262144/0 396.373 MiB/sec 374.388 MiB/sec -5.546%
FilterInt64FilterWithNulls/262144/11 4.934 GiB/sec 4.997 GiB/sec 1.282%
- FilterFSLInt64FilterNoNulls/262144/14 5.435 GiB/sec 4.459 GiB/sec -17.946%
FilterInt64FilterNoNulls/262144/12 3.255 GiB/sec 3.185 GiB/sec -2.144%
FilterStringFilterWithNulls/262144/1 638.704 MiB/sec 690.413 MiB/sec 8.096%
- FilterStringFilterNoNulls/262144/2 11.411 GiB/sec 9.040 GiB/sec -20.778%
FilterInt64FilterWithNulls/262144/6 582.753 MiB/sec 554.462 MiB/sec -4.855%
FilterStringFilterWithNulls/262144/10 586.149 MiB/sec 616.404 MiB/sec 5.162%
FilterInt64FilterNoNulls/262144/0 7.653 GiB/sec 7.971 GiB/sec 4.146%
FilterInt64FilterWithNulls/262144/13 590.396 MiB/sec 607.816 MiB/sec 2.951%
- FilterStringFilterNoNulls/262144/14 1.254 GiB/sec 1011.778 MiB/sec -21.233%
- FilterFSLInt64FilterWithNulls/262144/4 474.573 MiB/sec 428.073 MiB/sec -9.798%
FilterInt64FilterWithNulls/262144/2 5.245 GiB/sec 5.072 GiB/sec -3.310%
- FilterStringFilterWithNulls/262144/11 8.381 GiB/sec 7.793 GiB/sec -7.006%
FilterFSLInt64FilterWithNulls/262144/14 4.065 GiB/sec 3.917 GiB/sec -3.648%
- FilterFSLInt64FilterNoNulls/262144/1 566.516 MiB/sec 432.124 MiB/sec -23.723%
FilterStringFilterWithNulls/262144/6 431.308 MiB/sec 489.475 MiB/sec 13.486%
- FilterFSLInt64FilterNoNulls/262144/9 267.636 MiB/sec 250.549 MiB/sec -6.385%
- FilterFSLInt64FilterWithNulls/262144/2 4.505 GiB/sec 4.244 GiB/sec -5.789%
- FilterStringFilterNoNulls/262144/1 699.807 MiB/sec 605.175 MiB/sec -13.523%
FilterInt64FilterWithNulls/262144/14 4.914 GiB/sec 4.970 GiB/sec 1.141%
- FilterStringFilterNoNulls/262144/11 9.990 GiB/sec 7.988 GiB/sec -20.035%
- FilterStringFilterNoNulls/262144/12 70.677 MiB/sec 65.603 MiB/sec -7.180%
FilterStringFilterWithNulls/262144/9 395.814 MiB/sec 447.434 MiB/sec 13.042%
FilterFSLInt64FilterWithNulls/262144/6 333.780 MiB/sec 319.575 MiB/sec -4.256%
FilterFSLInt64FilterWithNulls/262144/8 4.263 GiB/sec 4.091 GiB/sec -4.021%
FilterInt64FilterNoNulls/262144/14 6.414 GiB/sec 6.933 GiB/sec 8.095%
FilterStringFilterWithNulls/262144/0 441.849 MiB/sec 496.266 MiB/sec 12.316%
FilterInt64FilterNoNulls/262144/11 6.411 GiB/sec 6.874 GiB/sec 7.218%
- FilterInt64FilterNoNulls/262144/7 648.036 MiB/sec 547.011 MiB/sec -15.589%
- FilterFSLInt64FilterWithNulls/262144/10 419.063 MiB/sec 381.681 MiB/sec -8.920%
- FilterFSLInt64FilterWithNulls/262144/13 386.755 MiB/sec 353.726 MiB/sec -8.540%
FilterInt64FilterNoNulls/262144/8 6.724 GiB/sec 7.073 GiB/sec 5.190%
FilterInt64FilterWithNulls/262144/9 545.560 MiB/sec 545.449 MiB/sec -0.020%
- FilterStringFilterNoNulls/262144/10 575.809 MiB/sec 507.681 MiB/sec -11.832%
- FilterStringFilterWithNulls/262144/5 9.154 GiB/sec 8.428 GiB/sec -7.931%
FilterStringFilterNoNulls/262144/0 519.896 MiB/sec 554.802 MiB/sec 6.714%
FilterFSLInt64FilterWithNulls/262144/5 4.294 GiB/sec 4.126 GiB/sec -3.911%
- FilterFSLInt64FilterNoNulls/262144/7 463.085 MiB/sec 378.577 MiB/sec -18.249%
FilterFSLInt64FilterWithNulls/262144/11 4.245 GiB/sec 4.061 GiB/sec -4.333%
FilterStringFilterNoNulls/262144/3 544.102 MiB/sec 542.846 MiB/sec -0.231%
- FilterInt64FilterWithNulls/262144/0 617.474 MiB/sec 560.813 MiB/sec -9.176%
FilterInt64FilterWithNulls/262144/7 619.732 MiB/sec 609.068 MiB/sec -1.721%
FilterStringFilterWithNulls/262144/13 91.185 MiB/sec 97.530 MiB/sec 6.958%
- FilterStringFilterWithNulls/262144/14 929.857 MiB/sec 874.512 MiB/sec -5.952%
- FilterInt64FilterWithNulls/262144/3 604.918 MiB/sec 560.882 MiB/sec -7.280%
- FilterFSLInt64FilterNoNulls/262144/4 514.014 MiB/sec 411.713 MiB/sec -19.902%
- FilterFSLInt64FilterWithNulls/262144/7 463.921 MiB/sec 417.320 MiB/sec -10.045%
- FilterFSLInt64FilterWithNulls/262144/12 267.697 MiB/sec 247.408 MiB/sec -7.579%
- FilterFSLInt64FilterNoNulls/262144/11 5.632 GiB/sec 4.533 GiB/sec -19.515%
- FilterStringFilterNoNulls/262144/13 90.578 MiB/sec 76.367 MiB/sec -15.690%
FilterInt64FilterNoNulls/262144/6 3.709 GiB/sec 3.680 GiB/sec -0.786%
FilterInt64FilterWithNulls/262144/5 5.115 GiB/sec 4.997 GiB/sec -2.309%
FilterInt64FilterWithNulls/262144/10 604.161 MiB/sec 607.760 MiB/sec 0.596%
- FilterFSLInt64FilterNoNulls/262144/6 389.763 MiB/sec 354.969 MiB/sec -8.927%
======================================= =============== ================ ========
|
These "readability" improvements made performance worse, so I'll revert them.
|
AMD64 Ubuntu 18.04 C++ Benchmark (#113048) builder has been succeeded. Revision: 54bb838
======================================= =============== =============== ========
benchmark baseline contender change
======================================= =============== =============== ========
FilterStringFilterWithNulls/262144/9 395.928 MiB/sec 397.664 MiB/sec 0.439%
FilterInt64FilterWithNulls/262144/0 621.828 MiB/sec 613.884 MiB/sec -1.277%
FilterStringFilterWithNulls/262144/10 578.179 MiB/sec 577.449 MiB/sec -0.126%
FilterFSLInt64FilterWithNulls/262144/14 4.068 GiB/sec 4.018 GiB/sec -1.247%
FilterInt64FilterWithNulls/262144/13 604.515 MiB/sec 575.481 MiB/sec -4.803%
FilterFSLInt64FilterNoNulls/262144/13 350.875 MiB/sec 355.061 MiB/sec 1.193%
FilterStringFilterWithNulls/262144/0 441.188 MiB/sec 442.379 MiB/sec 0.270%
FilterInt64FilterWithNulls/262144/7 623.569 MiB/sec 594.423 MiB/sec -4.674%
FilterStringFilterWithNulls/262144/12 73.925 MiB/sec 73.930 MiB/sec 0.007%
FilterStringFilterNoNulls/262144/3 548.889 MiB/sec 548.269 MiB/sec -0.113%
FilterInt64FilterNoNulls/262144/0 7.942 GiB/sec 8.079 GiB/sec 1.727%
FilterInt64FilterNoNulls/262144/6 3.827 GiB/sec 3.725 GiB/sec -2.665%
FilterStringFilterWithNulls/262144/2 9.138 GiB/sec 9.205 GiB/sec 0.726%
FilterFSLInt64FilterWithNulls/262144/13 385.938 MiB/sec 370.599 MiB/sec -3.975%
FilterInt64FilterWithNulls/262144/9 549.281 MiB/sec 542.112 MiB/sec -1.305%
FilterInt64FilterWithNulls/262144/2 5.253 GiB/sec 5.047 GiB/sec -3.918%
FilterFSLInt64FilterNoNulls/262144/5 5.778 GiB/sec 5.676 GiB/sec -1.761%
FilterStringFilterNoNulls/262144/1 711.705 MiB/sec 697.941 MiB/sec -1.934%
FilterStringFilterNoNulls/262144/0 560.111 MiB/sec 560.315 MiB/sec 0.036%
FilterStringFilterWithNulls/262144/5 8.773 GiB/sec 8.976 GiB/sec 2.318%
FilterInt64FilterWithNulls/262144/11 4.863 GiB/sec 4.942 GiB/sec 1.631%
FilterFSLInt64FilterWithNulls/262144/11 4.145 GiB/sec 4.089 GiB/sec -1.362%
FilterInt64FilterNoNulls/262144/2 7.854 GiB/sec 7.609 GiB/sec -3.117%
FilterStringFilterNoNulls/262144/11 9.751 GiB/sec 9.565 GiB/sec -1.904%
FilterStringFilterNoNulls/262144/7 641.570 MiB/sec 650.710 MiB/sec 1.425%
FilterStringFilterWithNulls/262144/3 435.185 MiB/sec 436.932 MiB/sec 0.401%
FilterFSLInt64FilterNoNulls/262144/14 5.202 GiB/sec 5.302 GiB/sec 1.915%
FilterInt64FilterNoNulls/262144/4 674.907 MiB/sec 654.585 MiB/sec -3.011%
FilterInt64FilterNoNulls/262144/5 7.023 GiB/sec 6.971 GiB/sec -0.741%
FilterInt64FilterWithNulls/262144/12 548.203 MiB/sec 542.909 MiB/sec -0.966%
FilterFSLInt64FilterNoNulls/262144/10 387.772 MiB/sec 390.564 MiB/sec 0.720%
FilterInt64FilterWithNulls/262144/8 4.951 GiB/sec 5.094 GiB/sec 2.880%
FilterStringFilterNoNulls/262144/13 90.750 MiB/sec 91.694 MiB/sec 1.040%
FilterFSLInt64FilterWithNulls/262144/12 230.292 MiB/sec 263.113 MiB/sec 14.252%
FilterStringFilterNoNulls/262144/12 70.772 MiB/sec 70.740 MiB/sec -0.044%
FilterStringFilterWithNulls/262144/14 927.254 MiB/sec 925.791 MiB/sec -0.158%
FilterStringFilterNoNulls/262144/5 10.587 GiB/sec 10.322 GiB/sec -2.509%
FilterFSLInt64FilterNoNulls/262144/3 551.473 MiB/sec 556.816 MiB/sec 0.969%
FilterInt64FilterNoNulls/262144/14 6.302 GiB/sec 6.848 GiB/sec 8.656%
FilterInt64FilterWithNulls/262144/14 4.804 GiB/sec 4.945 GiB/sec 2.933%
FilterStringFilterNoNulls/262144/14 1.257 GiB/sec 1.247 GiB/sec -0.814%
FilterFSLInt64FilterNoNulls/262144/6 399.266 MiB/sec 402.455 MiB/sec 0.799%
FilterInt64FilterWithNulls/262144/5 5.037 GiB/sec 4.954 GiB/sec -1.645%
FilterFSLInt64FilterNoNulls/262144/8 5.576 GiB/sec 5.576 GiB/sec -0.004%
FilterFSLInt64FilterNoNulls/262144/7 462.231 MiB/sec 456.668 MiB/sec -1.203%
FilterFSLInt64FilterNoNulls/262144/11 5.377 GiB/sec 5.381 GiB/sec 0.082%
FilterStringFilterNoNulls/262144/6 487.645 MiB/sec 487.464 MiB/sec -0.037%
FilterStringFilterNoNulls/262144/4 687.214 MiB/sec 678.019 MiB/sec -1.338%
FilterFSLInt64FilterWithNulls/262144/9 287.916 MiB/sec 285.805 MiB/sec -0.733%
FilterInt64FilterNoNulls/262144/9 3.245 GiB/sec 3.126 GiB/sec -3.683%
FilterFSLInt64FilterWithNulls/262144/1 514.149 MiB/sec 501.235 MiB/sec -2.512%
FilterInt64FilterNoNulls/262144/11 6.304 GiB/sec 6.838 GiB/sec 8.471%
FilterInt64FilterWithNulls/262144/4 642.597 MiB/sec 617.492 MiB/sec -3.907%
FilterFSLInt64FilterNoNulls/262144/0 723.263 MiB/sec 719.475 MiB/sec -0.524%
FilterFSLInt64FilterWithNulls/262144/2 4.335 GiB/sec 4.281 GiB/sec -1.228%
FilterStringFilterWithNulls/262144/8 8.635 GiB/sec 8.847 GiB/sec 2.451%
FilterFSLInt64FilterWithNulls/262144/4 473.024 MiB/sec 457.711 MiB/sec -3.237%
FilterStringFilterWithNulls/262144/4 637.237 MiB/sec 646.187 MiB/sec 1.405%
FilterStringFilterWithNulls/262144/6 430.118 MiB/sec 433.059 MiB/sec 0.684%
FilterStringFilterNoNulls/262144/10 572.254 MiB/sec 573.892 MiB/sec 0.286%
FilterStringFilterWithNulls/262144/1 644.800 MiB/sec 644.056 MiB/sec -0.115%
FilterStringFilterWithNulls/262144/7 635.644 MiB/sec 640.796 MiB/sec 0.810%
FilterInt64FilterWithNulls/262144/6 581.863 MiB/sec 575.886 MiB/sec -1.027%
FilterFSLInt64FilterNoNulls/262144/4 513.508 MiB/sec 499.319 MiB/sec -2.763%
FilterInt64FilterNoNulls/262144/13 632.203 MiB/sec 613.689 MiB/sec -2.928%
FilterStringFilterNoNulls/262144/8 10.491 GiB/sec 10.181 GiB/sec -2.953%
FilterFSLInt64FilterNoNulls/262144/1 563.147 MiB/sec 540.663 MiB/sec -3.993%
FilterFSLInt64FilterNoNulls/262144/9 267.226 MiB/sec 269.194 MiB/sec 0.736%
FilterFSLInt64FilterWithNulls/262144/10 420.329 MiB/sec 405.197 MiB/sec -3.600%
- FilterInt64FilterNoNulls/262144/1 1.022 GiB/sec 922.850 MiB/sec -11.845%
FilterInt64FilterNoNulls/262144/7 652.709 MiB/sec 631.526 MiB/sec -3.245%
FilterStringFilterNoNulls/262144/2 11.144 GiB/sec 10.843 GiB/sec -2.698%
FilterStringFilterWithNulls/262144/13 91.231 MiB/sec 91.638 MiB/sec 0.446%
FilterInt64FilterNoNulls/262144/12 3.242 GiB/sec 3.112 GiB/sec -4.024%
FilterFSLInt64FilterNoNulls/262144/12 242.297 MiB/sec 242.607 MiB/sec 0.128%
FilterFSLInt64FilterNoNulls/262144/2 6.165 GiB/sec 6.062 GiB/sec -1.679%
FilterFSLInt64FilterWithNulls/262144/6 331.566 MiB/sec 332.386 MiB/sec 0.247%
FilterInt64FilterWithNulls/262144/1 648.702 MiB/sec 622.712 MiB/sec -4.006%
FilterFSLInt64FilterWithNulls/262144/5 4.123 GiB/sec 4.122 GiB/sec -0.014%
FilterFSLInt64FilterWithNulls/262144/0 399.262 MiB/sec 398.338 MiB/sec -0.231%
FilterFSLInt64FilterWithNulls/262144/3 347.643 MiB/sec 349.930 MiB/sec 0.658%
FilterInt64FilterNoNulls/262144/3 4.312 GiB/sec 4.291 GiB/sec -0.478%
FilterStringFilterWithNulls/262144/11 8.207 GiB/sec 8.348 GiB/sec 1.720%
FilterStringFilterNoNulls/262144/9 391.780 MiB/sec 391.367 MiB/sec -0.106%
FilterFSLInt64FilterWithNulls/262144/8 4.142 GiB/sec 4.103 GiB/sec -0.926%
FilterInt64FilterNoNulls/262144/8 6.703 GiB/sec 6.908 GiB/sec 3.063%
FilterInt64FilterWithNulls/262144/10 604.595 MiB/sec 575.671 MiB/sec -4.784%
FilterFSLInt64FilterWithNulls/262144/7 461.693 MiB/sec 447.411 MiB/sec -3.093%
FilterInt64FilterNoNulls/262144/10 632.128 MiB/sec 614.452 MiB/sec -2.796%
FilterInt64FilterWithNulls/262144/3 613.629 MiB/sec 607.939 MiB/sec -0.927%
======================================= =============== =============== ========
|
+1. Thanks all for the comments.
NOTE: the diff is artificially larger due to some code rearranging (that was necessitated because of how some data selection code is shared between the Take and Filter implementations).
Summary:
Some incidental changes:
compute::internal::GetTakeIndices. I have also altered the implementation of filtering a record batch to use this, which should be faster (it would be good to have some benchmarks to confirm this).
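The idea of filtering a record batch through take indices can be sketched as follows: the filter is converted to indices once, and the same indices are then applied to every column, rather than re-scanning the filter per column. This is a pure-Python model with hypothetical names (`selection_indices`, `filter_batch`), a dict of lists standing in for a record batch, and no null handling.

```python
def selection_indices(mask):
    """Indices of the rows that the boolean mask keeps."""
    return [i for i, keep in enumerate(mask) if keep]

def filter_batch(batch, mask):
    """Filter every column of a dict-of-lists 'batch' using one shared index list."""
    indices = selection_indices(mask)  # computed once, reused for all columns
    return {name: [column[i] for i in indices] for name, column in batch.items()}

batch = {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]}
print(filter_batch(batch, [True, False, False, True]))
```

Whether this is faster in practice depends on the number of columns and the selectivity, which is presumably why the description asks for benchmarks to confirm it.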