ARROW-8199: [C++] Add support for multi-column sort indices on Table#8612
ARROW-8199: [C++] Add support for multi-column sort indices on Table#8612kou wants to merge 37 commits intoapache:masterfrom
Conversation
cpp/src/arrow/compute/api_vector.cc
Outdated
There was a problem hiding this comment.
This hunk is for SortToIndices() -> SortIndices() rename.
There was a problem hiding this comment.
This change is for order support on array.
There was a problem hiding this comment.
This change is for order support on array.
There was a problem hiding this comment.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
This change is for order support on array.
There was a problem hiding this comment.
This change is for order support on array.
There was a problem hiding this comment.
Array prefix is added to distinguish array and table related codes.
There was a problem hiding this comment.
For multi-column sorting on table.
There was a problem hiding this comment.
This hunk is for renaming SortToIndices() -> SortIndices(). No logic change.
6b58fbb to
904ee1f
Compare
589ad3d to
2da0ea7
Compare
|
Also update kernel document? |
|
Done. |
|
@kou I'm okay with this patch. Some random thoughts:
|
|
Thanks for your comments!
Each column in table may have the different number of chunks. For example,
It's interesting! I've added the idea to follow-up tasks in the pull request description. |
pitrou
left a comment
There was a problem hiding this comment.
I'm a bit surprised by the approach adopted here.
I would expect the following approach:
- sort(chunked array) sorts each chunk individually, then merges all of them (see
std::inplace_merge) - sort(table) sorts each sort column in reverse order
21fb667 to
a500146
Compare
I've implemented.
I wanted this a follow-up task but I've implemented this in this pull request. Because both of them mentions this approach. My implementation Summary:
Description: N records: 1048576 Narrow range: Almost elements may be same Wide range: Almost elements will be different N chunks: 4 Narrow range:
Wide range:
N chunks: 32 Narrow range:
Wide range:
Can we mark the radix sort approach improvement a follow-up task? Or should we optimize the radix sort approach in this pull request? |
There was a problem hiding this comment.
Can anyone suggest better name...?
2a1a396 to
86ce536
Compare
|
@pitrou Could you review this again? |
|
Thanks for the numbers. I didn't expect "first sort key" to be better, but I'm convinced now. |
Summary:
* Deprecate SortToIndices()
* Add SortIndices() because we use "sort_indices" as kernel name in
apache#7240
* Add support for sort indices in descending order on Array
* Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support RecordBatch and
Table
Benchmark:
Summary:
* No performance regression in existing sort on array
* Same performance as sort on array when the number of sort keys is 1
* 1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
----------------------------------------------------------------------------------
Benchmark Time CPU
----------------------------------------------------------------------------------
SortToIndicesInt64Count/32768/10000/min_time:1.000 15685 ns 15682 ns
SortToIndicesInt64Count/32768/100/min_time:1.000 15961 ns 15957 ns
SortToIndicesInt64Count/32768/10/min_time:1.000 16473 ns 16469 ns
SortToIndicesInt64Count/32768/2/min_time:1.000 27993 ns 27987 ns
SortToIndicesInt64Count/32768/1/min_time:1.000 5609 ns 5608 ns
SortToIndicesInt64Count/32768/0/min_time:1.000 13143 ns 13141 ns
SortToIndicesInt64Count/1048576/1/min_time:1.000 134695 ns 134670 ns
SortToIndicesInt64Count/8388608/1/min_time:1.000 1243992 ns 1243260 ns
SortToIndicesInt64Compare/32768/10000/min_time:1.000 193632 ns 193595 ns
SortToIndicesInt64Compare/32768/100/min_time:1.000 194876 ns 194837 ns
SortToIndicesInt64Compare/32768/10/min_time:1.000 182362 ns 182324 ns
SortToIndicesInt64Compare/32768/2/min_time:1.000 111607 ns 111584 ns
SortToIndicesInt64Compare/32768/1/min_time:1.000 5642 ns 5641 ns
SortToIndicesInt64Compare/32768/0/min_time:1.000 190302 ns 190268 ns
SortToIndicesInt64Compare/1048576/1/min_time:1.000 134743 ns 134718 ns
SortToIndicesInt64Compare/8388608/1/min_time:1.000 1261404 ns 1249362 ns
After:
-------------------------------------------------------------------------------------
Benchmark Time CPU
-------------------------------------------------------------------------------------
ArraySortIndicesInt64Count/32768/10000/min_time:1.000 14769 ns 14766 ns
ArraySortIndicesInt64Count/32768/100/min_time:1.000 15207 ns 15204 ns
ArraySortIndicesInt64Count/32768/10/min_time:1.000 15892 ns 15889 ns
ArraySortIndicesInt64Count/32768/2/min_time:1.000 30107 ns 30100 ns
ArraySortIndicesInt64Count/32768/1/min_time:1.000 5509 ns 5508 ns
ArraySortIndicesInt64Count/32768/0/min_time:1.000 12699 ns 12696 ns
ArraySortIndicesInt64Count/1048576/1/min_time:1.000 132585 ns 132557 ns
ArraySortIndicesInt64Count/8388608/1/min_time:1.000 1236596 ns 1235842 ns
ArraySortIndicesInt64Compare/32768/10000/min_time:1.000 193259 ns 193217 ns
ArraySortIndicesInt64Compare/32768/100/min_time:1.000 190010 ns 189973 ns
ArraySortIndicesInt64Compare/32768/10/min_time:1.000 179923 ns 179879 ns
ArraySortIndicesInt64Compare/32768/2/min_time:1.000 111100 ns 111074 ns
ArraySortIndicesInt64Compare/32768/1/min_time:1.000 5660 ns 5659 ns
ArraySortIndicesInt64Compare/32768/0/min_time:1.000 186521 ns 186476 ns
ArraySortIndicesInt64Compare/1048576/1/min_time:1.000 132863 ns 132832 ns
ArraySortIndicesInt64Compare/8388608/1/min_time:1.000 1266383 ns 1265765 ns
TableSortIndicesInt64Count/32768/10000/min_time:1.000 21812 ns 21807 ns
TableSortIndicesInt64Count/32768/100/min_time:1.000 22494 ns 22490 ns
TableSortIndicesInt64Count/32768/10/min_time:1.000 17300 ns 17296 ns
TableSortIndicesInt64Count/32768/2/min_time:1.000 29927 ns 29921 ns
TableSortIndicesInt64Count/32768/1/min_time:1.000 5877 ns 5875 ns
TableSortIndicesInt64Count/32768/0/min_time:1.000 20394 ns 20390 ns
TableSortIndicesInt64Count/1048576/1/min_time:1.000 132904 ns 132871 ns
TableSortIndicesInt64Count/8388608/1/min_time:1.000 1342693 ns 1341943 ns
TableSortIndicesInt64Compare/32768/10000/min_time:1.000 203163 ns 203106 ns
TableSortIndicesInt64Compare/32768/100/min_time:1.000 199531 ns 199477 ns
TableSortIndicesInt64Compare/32768/10/min_time:1.000 185968 ns 185916 ns
TableSortIndicesInt64Compare/32768/2/min_time:1.000 113571 ns 113540 ns
TableSortIndicesInt64Compare/32768/1/min_time:1.000 6251 ns 6249 ns
TableSortIndicesInt64Compare/32768/0/min_time:1.000 183650 ns 183609 ns
TableSortIndicesInt64Compare/1048576/1/min_time:1.000 131701 ns 131674 ns
TableSortIndicesInt64Compare/8388608/1/min_time:1.000 1264413 ns 1263622 ns
TableSortIndicesInt64Int64/32768/10000/min_time:1.000 313368 ns 313310 ns
TableSortIndicesInt64Int64/32768/100/min_time:1.000 313425 ns 313361 ns
TableSortIndicesInt64Int64/32768/10/min_time:1.000 337051 ns 336987 ns
TableSortIndicesInt64Int64/32768/2/min_time:1.000 402063 ns 401973 ns
TableSortIndicesInt64Int64/32768/1/min_time:1.000 254660 ns 254612 ns
TableSortIndicesInt64Int64/32768/0/min_time:1.000 305887 ns 305815 ns
TableSortIndicesInt64Int64/1048576/1/min_time:1.000 11157920 ns 11155020 ns
TableSortIndicesInt64Int64/8388608/1/min_time:1.000 106529839 ns 106501576 ns
Follow-up tasks:
* Improve performance when the number of sort keys is 2 or greater
* Idea1: Use counting sort for small range data like on array
* Idea2: Use threading and merge sort
* Add support multi-column partition Nth indices on Table
2020-11-18T15:27:12+09:00
Running release/arrow-compute-vector-sort-benchmark
Run on (24 X 3800 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x12)
L1 Instruction 32 KiB (x12)
L2 Unified 512 KiB (x12)
L3 Unified 16384 KiB (x4)
Load Average: 0.48, 0.50, 0.53
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
-----------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 15722 ns 15721 ns 82919 bytes_per_second=1.94115G/s items_per_second=260.537M/s null_percent=0.01 size=32.768k
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16007 ns 16006 ns 89081 bytes_per_second=1.9066G/s items_per_second=255.899M/s null_percent=1 size=32.768k
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 15178 ns 15177 ns 91254 bytes_per_second=2.01071G/s items_per_second=269.873M/s null_percent=10 size=32.768k
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 29886 ns 29885 ns 47570 bytes_per_second=1045.67M/s items_per_second=137.058M/s null_percent=50 size=32.768k
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 5968 ns 5968 ns 235596 bytes_per_second=5.11372G/s items_per_second=686.352M/s null_percent=100 size=32.768k
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 12681 ns 12681 ns 108821 bytes_per_second=2.40665G/s items_per_second=323.015M/s null_percent=0 size=32.768k
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 813376 ns 813331 ns 1777 bytes_per_second=1.2007G/s items_per_second=161.155M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 6543874 ns 6543621 ns 207 bytes_per_second=1.19391G/s items_per_second=160.244M/s null_percent=1 size=8.38861M
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 189957 ns 189956 ns 7344 bytes_per_second=164.511M/s items_per_second=21.5628M/s null_percent=0.01 size=32.768k
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 190269 ns 190267 ns 7346 bytes_per_second=164.243M/s items_per_second=21.5277M/s null_percent=1 size=32.768k
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 175425 ns 175422 ns 7952 bytes_per_second=178.142M/s items_per_second=23.3494M/s null_percent=10 size=32.768k
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 107976 ns 107975 ns 12837 bytes_per_second=289.418M/s items_per_second=37.9346M/s null_percent=50 size=32.768k
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 5941 ns 5941 ns 235078 bytes_per_second=5.13666G/s items_per_second=689.431M/s null_percent=100 size=32.768k
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 184858 ns 184857 ns 7542 bytes_per_second=169.049M/s items_per_second=22.1576M/s null_percent=0 size=32.768k
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9355194 ns 9354942 ns 150 bytes_per_second=106.895M/s items_per_second=14.011M/s null_percent=1 size=1048.58k
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 94101796 ns 94099551 ns 15 bytes_per_second=85.0163M/s items_per_second=11.1433M/s null_percent=1 size=8.38861M
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 1386231938 ns 1386176541 ns 1 items_per_second=756.452k/s
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 1350031141 ns 1349982623 ns 1 items_per_second=776.733k/s
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 1401741018 ns 1401632081 ns 1 items_per_second=748.111k/s
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 1373174328 ns 1373145968 ns 1 items_per_second=763.63k/s
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1035377805 ns 1035362806 ns 1 items_per_second=1012.76k/s
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1035218085 ns 1035201824 ns 1 items_per_second=1012.92k/s
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 6859319 ns 6859251 ns 201 items_per_second=152.87M/s
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 6847314 ns 6847048 ns 209 items_per_second=153.143M/s
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 626909516 ns 626904696 ns 2 items_per_second=1.67262M/s
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 591632035 ns 591602144 ns 2 items_per_second=1.77243M/s
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 613197155 ns 613182727 ns 2 items_per_second=1.71005M/s
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 595568690 ns 595554829 ns 2 items_per_second=1.76067M/s
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 424472984 ns 424453915 ns 3 items_per_second=2.47041M/s
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 397339109 ns 397335604 ns 4 items_per_second=2.63902M/s
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 7027310 ns 7027241 ns 195 items_per_second=149.216M/s
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 6891364 ns 6891272 ns 207 items_per_second=152.16M/s
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 516832749 ns 516823293 ns 3 items_per_second=2.02889M/s
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 495273237 ns 495269975 ns 3 items_per_second=2.11718M/s
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 515550360 ns 515531501 ns 3 items_per_second=2.03397M/s
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 496670125 ns 496663084 ns 3 items_per_second=2.11124M/s
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 340060863 ns 340053172 ns 4 items_per_second=3.08356M/s
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 331281499 ns 331277004 ns 4 items_per_second=3.16525M/s
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 6691062 ns 6690924 ns 212 items_per_second=156.716M/s
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6384382 ns 6384323 ns 221 items_per_second=164.242M/s
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 622544467 ns 622531557 ns 2 items_per_second=1.68437M/s
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 619193843 ns 619155597 ns 2 items_per_second=1.69356M/s
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 615885889 ns 615884764 ns 2 items_per_second=1.70255M/s
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 589917731 ns 589912005 ns 2 items_per_second=1.77751M/s
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 600951149 ns 600947256 ns 2 items_per_second=1.74487M/s
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 592304437 ns 592286953 ns 2 items_per_second=1.77039M/s
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 98781239 ns 98777123 ns 14 items_per_second=10.6156M/s
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 94136230 ns 94085863 ns 15 items_per_second=11.1449M/s
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 232323308 ns 232319347 ns 6 items_per_second=4.51351M/s
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 223708791 ns 223700834 ns 6 items_per_second=4.6874M/s
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 221975360 ns 221968035 ns 6 items_per_second=4.724M/s
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 218214843 ns 218210306 ns 6 items_per_second=4.80535M/s
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 224756430 ns 224745751 ns 6 items_per_second=4.66561M/s
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 220809969 ns 220809044 ns 6 items_per_second=4.74879M/s
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 96159427 ns 96156899 ns 14 items_per_second=10.9048M/s
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 92307749 ns 92304526 ns 15 items_per_second=11.36M/s
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 166390841 ns 166382427 ns 8 items_per_second=6.3022M/s
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 164576492 ns 164570581 ns 9 items_per_second=6.37159M/s
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 165724919 ns 165718584 ns 8 items_per_second=6.32745M/s
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 164048003 ns 164046222 ns 8 items_per_second=6.39195M/s
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 170131788 ns 170124950 ns 8 items_per_second=6.16356M/s
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 170874129 ns 170871245 ns 8 items_per_second=6.13664M/s
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 92314674 ns 92312953 ns 15 items_per_second=11.3589M/s
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 92023019 ns 92022643 ns 15 items_per_second=11.3948M/s
Chunked array sorter implementation is naive.
Radix sort based table sorter is slower than the existing table
sorter. So this is disabled for now.
Benchmark Time CPU
------------------------------------------------------------------------------------------
ArraySortIndicesInt64Narrow/32768/10000/min_time:1.000 16481 ns 16481 ns
ArraySortIndicesInt64Narrow/32768/100/min_time:1.000 16782 ns 16782 ns
ArraySortIndicesInt64Narrow/32768/10/min_time:1.000 16604 ns 16604 ns
ArraySortIndicesInt64Narrow/32768/2/min_time:1.000 28438 ns 28438 ns
ArraySortIndicesInt64Narrow/32768/1/min_time:1.000 6064 ns 6064 ns
ArraySortIndicesInt64Narrow/32768/0/min_time:1.000 13949 ns 13949 ns
ArraySortIndicesInt64Narrow/1048576/100/min_time:1.000 739131 ns 739116 ns
ArraySortIndicesInt64Narrow/8388608/100/min_time:1.000 7270177 ns 7270007 ns
ArraySortIndicesInt64Wide/32768/10000/min_time:1.000 197500 ns 197496 ns
ArraySortIndicesInt64Wide/32768/100/min_time:1.000 198485 ns 198481 ns
ArraySortIndicesInt64Wide/32768/10/min_time:1.000 184462 ns 184460 ns
ArraySortIndicesInt64Wide/32768/2/min_time:1.000 114283 ns 114282 ns
ArraySortIndicesInt64Wide/32768/1/min_time:1.000 6117 ns 6116 ns
ArraySortIndicesInt64Wide/32768/0/min_time:1.000 194275 ns 194267 ns
ArraySortIndicesInt64Wide/1048576/100/min_time:1.000 9688010 ns 9687762 ns
ArraySortIndicesInt64Wide/8388608/100/min_time:1.000 98482097 ns 98480482 ns
TableSortIndicesInt64Narrow/1048576/100/16/32/min_time:1.000 13674075901 ns 13673756372
TableSortIndicesInt64Narrow/1048576/0/16/32/min_time:1.000 13562068073 ns 13561526021
TableSortIndicesInt64Narrow/1048576/100/8/32/min_time:1.000 6816877843 ns 6816737232 ns
TableSortIndicesInt64Narrow/1048576/0/8/32/min_time:1.000 6513599980 ns 6513470119 ns
TableSortIndicesInt64Narrow/1048576/100/2/32/min_time:1.000 1297496469 ns 1297401147 ns
TableSortIndicesInt64Narrow/1048576/0/2/32/min_time:1.000 1198835057 ns 1198795471 ns
TableSortIndicesInt64Narrow/1048576/100/1/32/min_time:1.000 512438066 ns 512427916 ns
TableSortIndicesInt64Narrow/1048576/0/1/32/min_time:1.000 406987001 ns 406983020 ns
TableSortIndicesInt64Narrow/1048576/100/16/4/min_time:1.000 4782555568 ns 4782501462 ns
TableSortIndicesInt64Narrow/1048576/0/16/4/min_time:1.000 4627874931 ns 4627776462 ns
TableSortIndicesInt64Narrow/1048576/100/8/4/min_time:1.000 2347109491 ns 2347030723 ns
TableSortIndicesInt64Narrow/1048576/0/8/4/min_time:1.000 2353793768 ns 2353762906 ns
TableSortIndicesInt64Narrow/1048576/100/2/4/min_time:1.000 329895435 ns 329878554 ns
TableSortIndicesInt64Narrow/1048576/0/2/4/min_time:1.000 318651531 ns 318643264 ns
TableSortIndicesInt64Narrow/1048576/100/1/4/min_time:1.000 27022014 ns 27020904 ns
TableSortIndicesInt64Narrow/1048576/0/1/4/min_time:1.000 21620126 ns 21619471 ns
TableSortIndicesInt64Narrow/1048576/100/16/1/min_time:1.000 2082161247 ns 2082123210 ns
TableSortIndicesInt64Narrow/1048576/0/16/1/min_time:1.000 2125744136 ns 2125688346 ns
TableSortIndicesInt64Narrow/1048576/100/8/1/min_time:1.000 1054223608 ns 1054187702 ns
TableSortIndicesInt64Narrow/1048576/0/8/1/min_time:1.000 1039334371 ns 1039300540 ns
TableSortIndicesInt64Narrow/1048576/100/2/1/min_time:1.000 226713390 ns 226708185 ns
TableSortIndicesInt64Narrow/1048576/0/2/1/min_time:1.000 232277655 ns 232265803 ns
TableSortIndicesInt64Narrow/1048576/100/1/1/min_time:1.000 7317963 ns 7317641 ns
TableSortIndicesInt64Narrow/1048576/0/1/1/min_time:1.000 6936323 ns 6935778 ns
TableSortIndicesInt64Wide/1048576/100/16/32/min_time:1.000 15478483751 ns 15477778551 ns
TableSortIndicesInt64Wide/1048576/0/16/32/min_time:1.000 15323045709 ns 15322716370 ns
TableSortIndicesInt64Wide/1048576/100/8/32/min_time:1.000 7600326817 ns 7600060885 ns
TableSortIndicesInt64Wide/1048576/0/8/32/min_time:1.000 7305830182 ns 7305653322 ns
TableSortIndicesInt64Wide/1048576/100/2/32/min_time:1.000 1603439764 ns 1603419271 ns
TableSortIndicesInt64Wide/1048576/0/2/32/min_time:1.000 1585765386 ns 1585720827 ns
TableSortIndicesInt64Wide/1048576/100/1/32/min_time:1.000 848231188 ns 848213371 ns
TableSortIndicesInt64Wide/1048576/0/1/32/min_time:1.000 599883956 ns 599827161 ns
TableSortIndicesInt64Wide/1048576/100/16/4/min_time:1.000 6105580749 ns 6105405315 ns
TableSortIndicesInt64Wide/1048576/0/16/4/min_time:1.000 6101922415 ns 6092946808 ns
TableSortIndicesInt64Wide/1048576/100/8/4/min_time:1.000 3016793628 ns 3013968137 ns
TableSortIndicesInt64Wide/1048576/0/8/4/min_time:1.000 2989364923 ns 2989345424 ns
TableSortIndicesInt64Wide/1048576/100/2/4/min_time:1.000 595413434 ns 595398793 ns
TableSortIndicesInt64Wide/1048576/0/2/4/min_time:1.000 593904971 ns 593872005 ns
TableSortIndicesInt64Wide/1048576/100/1/4/min_time:1.000 126583175 ns 126578734 ns
TableSortIndicesInt64Wide/1048576/0/1/4/min_time:1.000 115004382 ns 115003229 ns
TableSortIndicesInt64Wide/1048576/100/16/1/min_time:1.000 3013364645 ns 3013319384 ns
TableSortIndicesInt64Wide/1048576/0/16/1/min_time:1.000 3032564737 ns 3032430757 ns
TableSortIndicesInt64Wide/1048576/100/8/1/min_time:1.000 1457310735 ns 1457286492 ns
TableSortIndicesInt64Wide/1048576/0/8/1/min_time:1.000 1465821557 ns 1465809354 ns
TableSortIndicesInt64Wide/1048576/100/2/1/min_time:1.000 335816200 ns 335812614 ns
TableSortIndicesInt64Wide/1048576/0/2/1/min_time:1.000 330472127 ns 330467667 ns
TableSortIndicesInt64Wide/1048576/100/1/1/min_time:1.000 103267823 ns 103262409 ns
TableSortIndicesInt64Wide/1048576/0/1/1/min_time:1.000 102048116 ns 102044125 ns
Diff:
Range/N records/Inverse null probability/N sort keys/N chunks
Narrow/1048576/100/16/32/min_time:1.000
FromTheBeginning: 1386231938 ns
Radix: 13674075901 ns
Narrow/1048576/0/16/32/min_time:1.000
FromTheBeginning: 1350031141 ns
Radix: 13562068073 ns
Narrow/1048576/100/8/32/min_time:1.000
FromTheBeginning: 1401741018 ns
Radix: 6816877843 ns
Narrow/1048576/0/8/32/min_time:1.000
FromTheBeginning: 1373174328 ns
Radix: 6513599980 ns
Narrow/1048576/100/2/32/min_time:1.000
FromTheBeginning: 1035377805 ns
Radix: 1297496469 ns
Narrow/1048576/0/2/32/min_time:1.000
FromTheBeginning: 1035218085 ns
Radix: 1198835057 ns
Narrow/1048576/100/1/32/min_time:1.000
FromTheBeginning: 6859319 ns
Radix: 512438066 ns
Narrow/1048576/0/1/32/min_time:1.000
FromTheBeginning: 6847314 ns
Radix: 406987001 ns
Narrow/1048576/100/16/4/min_time:1.000
FromTheBeginning: 626909516 ns
Radix: 4782555568 ns
Narrow/1048576/0/16/4/min_time:1.000
FromTheBeginning: 591632035 ns
Radix: 4627874931 ns
Narrow/1048576/100/8/4/min_time:1.000
FromTheBeginning: 613197155 ns
Radix: 2347109491 ns
Narrow/1048576/0/8/4/min_time:1.000
FromTheBeginning: 595568690 ns
Radix: 2353793768 ns
Narrow/1048576/100/2/4/min_time:1.000
FromTheBeginning: 424472984 ns
Radix: 329895435 ns
Narrow/1048576/0/2/4/min_time:1.000
FromTheBeginning: 397339109 ns
Radix: 318651531 ns
Narrow/1048576/100/1/4/min_time:1.000
FromTheBeginning: 7027310 ns
Radix: 27022014 ns
Narrow/1048576/0/1/4/min_time:1.000
FromTheBeginning: 6891364 ns
Radix: 21620126 ns
Narrow/1048576/100/16/1/min_time:1.000
FromTheBeginning: 516832749 ns
Radix: 2082161247 ns
Narrow/1048576/0/16/1/min_time:1.000
FromTheBeginning: 495273237 ns
Radix: 2125744136 ns
Narrow/1048576/100/8/1/min_time:1.000
FromTheBeginning: 515550360 ns
Radix: 1054223608 ns
Narrow/1048576/0/8/1/min_time:1.000
FromTheBeginning: 496670125 ns
Radix: 1039334371 ns
Narrow/1048576/100/2/1/min_time:1.000
FromTheBeginning: 340060863 ns
Radix: 226713390 ns
Narrow/1048576/0/2/1/min_time:1.000
FromTheBeginning: 331281499 ns
Radix: 232277655 ns
Narrow/1048576/100/1/1/min_time:1.000
FromTheBeginning: 6691062 ns
Radix: 7317963 ns
Narrow/1048576/0/1/1/min_time:1.000
FromTheBeginning: 6384382 ns
Radix: 6936323 ns
Wide/1048576/100/16/32/min_time:1.000
FromTheBeginning: 622544467 ns
Radix: 15478483751 ns
Wide/1048576/0/16/32/min_time:1.000
FromTheBeginning: 619193843 ns
Radix: 15323045709 ns
Wide/1048576/100/8/32/min_time:1.000
FromTheBeginning: 615885889 ns
Radix: 7600326817 ns
Wide/1048576/0/8/32/min_time:1.000
FromTheBeginning: 589917731 ns
Radix: 7305830182 ns
Wide/1048576/100/2/32/min_time:1.000
FromTheBeginning: 600951149 ns
Radix: 1603439764 ns
Wide/1048576/0/2/32/min_time:1.000
FromTheBeginning: 592304437 ns
Radix: 1585765386 ns
Wide/1048576/100/1/32/min_time:1.000
FromTheBeginning: 98781239 ns
Radix: 848231188 ns
Wide/1048576/0/1/32/min_time:1.000
FromTheBeginning: 94136230 ns
Radix: 599883956 ns
Wide/1048576/100/16/4/min_time:1.000
FromTheBeginning: 232323308 ns
Radix: 6105580749 ns
Wide/1048576/0/16/4/min_time:1.000
FromTheBeginning: 223708791 ns
Radix: 6101922415 ns
Wide/1048576/100/8/4/min_time:1.000
FromTheBeginning: 221975360 ns
Radix: 3016793628 ns
Wide/1048576/0/8/4/min_time:1.000
FromTheBeginning: 218214843 ns
Radix: 2989364923 ns
Wide/1048576/100/2/4/min_time:1.000
FromTheBeginning: 224756430 ns
Radix: 595413434 ns
Wide/1048576/0/2/4/min_time:1.000
FromTheBeginning: 220809969 ns
Radix: 593904971 ns
Wide/1048576/100/1/4/min_time:1.000
FromTheBeginning: 96159427 ns
Radix: 126583175 ns
Wide/1048576/0/1/4/min_time:1.000
FromTheBeginning: 92307749 ns
Radix: 115004382 ns
Wide/1048576/100/16/1/min_time:1.000
FromTheBeginning: 166390841 ns
Radix: 3013364645 ns
Wide/1048576/0/16/1/min_time:1.000
FromTheBeginning: 164576492 ns
Radix: 3032564737 ns
Wide/1048576/100/8/1/min_time:1.000
FromTheBeginning: 165724919 ns
Radix: 1457310735 ns
Wide/1048576/0/8/1/min_time:1.000
FromTheBeginning: 164048003 ns
Radix: 1465821557 ns
Wide/1048576/100/2/1/min_time:1.000
FromTheBeginning: 170131788 ns
Radix: 335816200 ns
Wide/1048576/0/2/1/min_time:1.000
FromTheBeginning: 170874129 ns
Radix: 330472127 ns
Wide/1048576/100/1/1/min_time:1.000
FromTheBeginning: 92314674 ns
Radix: 103267823 ns
Wide/1048576/0/1/1/min_time:1.000
FromTheBeginning: 92023019 ns
Radix: 102048116 ns
Add some chunked array sorting benchmarks
e50e471 to
fecd557
Compare
Summary:
Deprecate SortToIndices()
Add SortIndices() because we use "sort_indices" as kernel name in
ARROW-8792: [C++][Python][R][GLib] New Array compute kernels implementation and execution framework #7240
Add support for sort indices in descending order on Array
Rename existing "sort_indices" kernel to "array_sort_indices" and
introduce "sort_indices" meta function to support ChunkedArray,
RecordBatch and Table
Require benchmark 1.5.2 or later
Benchmark:
Summary:
No performance regression in existing sort on array
Same performance as sort on array when the number of sort keys is 1
1.5x-100x slower than sort on array when the number of sort keys is 2
Before:
After:
Follow-up tasks:
Improve performance when the number of sort keys is 2 or greater
Idea1: Use counting sort for small range data like on array
Idea2: Use threading for radix sort
Add support for multi-column partition Nth indices on Table