ARROW-10058: [C++] Improve repeated levels conversion without BMI2 #8320
pitrou wants to merge 1 commit into apache:master
See the JIRA issue for benchmarks. It would be nice to have benchmarks on other machines. @emkornfield
I also notice that we call
@pitrou I'm devoting most of my bandwidth to finishing the Parquet read component this week. Is it OK if I take a closer look next week (hopefully with enough time before an RC is cut)?
Yeah, it isn't ideal. There may be a better factoring in there, but it seemed hard to do while isolating the BMI2-specific instructions. If this isn't much slower than BMI2 on Intel, we could potentially collapse everything into a single code path, but I would not expect that to be the case.
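Isolating the BMI2-specific instruction usually means confining it to one small function. The following is a minimal sketch that assumes compile-time selection via the GCC/Clang `__BMI2__` macro, which is defined when compiling with `-mbmi2` or a `-march` that includes BMI2. It does not claim to mirror Arrow's actual dispatch code. `ExtractBits` is a hypothetical name, and the fallback is a plain loop over the set mask bits rather than the lookup-table emulation discussed in this PR.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // _pext_u64 (x86-64 with BMI2 only)
#endif

// Hypothetical helper: the only place that touches the BMI2 intrinsic.
// Gathers the bits of `src` selected by `mask` into the low bits of the result.
inline uint64_t ExtractBits(uint64_t src, uint64_t mask) {
#if defined(__BMI2__) && defined(__x86_64__)
  return _pext_u64(src, mask);
#else
  // Portable fallback: visit the set bits of `mask` from lowest to highest.
  uint64_t result = 0;
  for (uint64_t out_bit = 1; mask != 0; out_bit <<= 1) {
    const uint64_t lowest = mask & (~mask + 1);  // isolate lowest set mask bit
    if (src & lowest) result |= out_bit;          // copy it to the next output slot
    mask &= mask - 1;                             // clear that mask bit
  }
  return result;
#endif
}
```

Because only this one function touches the intrinsic, the surrounding level-conversion logic could in principle stay shared between BMI2 and non-BMI2 builds.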
No problem.
Right. The emulation is probably much slower.
Use a lookup table to emulate PEXT 5 bits at a time. Remove the slow scalar path.
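The commit message describes emulating PEXT with a lookup table, 5 bits at a time. Below is a minimal sketch of that idea, assuming a 32 × 32 table indexed by a 5-bit mask chunk and a 5-bit source chunk. It is not the code in this PR. The names `PextSoftware`, `PextNaive`, `PextTable` and `GetPextTable` are hypothetical, and the bit-by-bit `PextNaive` is used only to fill the table.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical sketch (not the PR's code): software emulation of x86 BMI2
// PEXT that consumes 5 mask bits per step via a 32x32 lookup table.
namespace pext_sketch {

constexpr int kLookupBits = 5;
constexpr int kTableSize = 1 << kLookupBits;      // 32 possible 5-bit chunks
constexpr uint64_t kLookupMask = kTableSize - 1;  // 0b11111

// Reference bit-by-bit PEXT, used only to build the table.
inline uint64_t PextNaive(uint64_t src, uint64_t mask) {
  uint64_t result = 0;
  int out_pos = 0;
  for (int i = 0; i < 64; ++i) {
    if ((mask >> i) & 1) {
      // Copy the selected source bit into the next free output position.
      result |= ((src >> i) & 1) << out_pos;
      ++out_pos;
    }
  }
  return result;
}

struct PextTable {
  // value[m][s]: PEXT of 5-bit source chunk s under 5-bit mask chunk m.
  std::array<std::array<uint8_t, kTableSize>, kTableSize> value{};
  // width[m]: number of output bits produced by mask chunk m (its popcount).
  std::array<uint8_t, kTableSize> width{};
};

// Builds the table once, on first use.
inline const PextTable& GetPextTable() {
  static const PextTable table = [] {
    PextTable t;
    for (int m = 0; m < kTableSize; ++m) {
      int bits = 0;
      for (int i = 0; i < kLookupBits; ++i) bits += (m >> i) & 1;
      t.width[m] = static_cast<uint8_t>(bits);
      for (int s = 0; s < kTableSize; ++s) {
        t.value[m][s] = static_cast<uint8_t>(PextNaive(s, m));
      }
    }
    return t;
  }();
  return table;
}

inline uint64_t PextSoftware(uint64_t src, uint64_t mask) {
  const PextTable& t = GetPextTable();
  uint64_t result = 0;
  int out_len = 0;  // number of output bits written so far
  // Stop once no selected bits remain. Inside the loop out_len <= 63,
  // because at least one selected bit is still unprocessed.
  while (mask != 0) {
    const uint64_t m = mask & kLookupMask;
    const uint64_t s = src & kLookupMask;
    result |= static_cast<uint64_t>(t.value[m][s]) << out_len;
    out_len += t.width[m];
    src >>= kLookupBits;
    mask >>= kLookupBits;
  }
  return result;
}

}  // namespace pext_sketch
```

For example, `PextSoftware(0xB6, 0xF0)` selects bits 4 to 7 of `0b10110110` and yields `0b1011` (11), which is what `_pext_u64` returns for the same inputs. The table holds 32 × 32 one-byte entries (1 KiB), and a 64-bit mask needs at most 13 table steps instead of 64 single-bit steps, with an early exit once the remaining mask bits are zero.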
Updated benchmarks on AMD Ryzen:
Force-pushed from cd01f19 to 482797c.
Sorry, some personal issues came up. I hope to have time tonight to review this and other Parquet-related CLs.
For the record, if I profile
And
+1. Thanks.