Enable fast decoding on Apple/AArch64 builds (18-25% faster decompression) #1040
This makes decoding significantly faster on M1. Measured on compressed source code across 8 hardware threads, decompressing 294 MB to 1301 MB takes 513 ms of cumulative work (2.53 GB/s) before this change and 406 ms (3.2 GB/s) after, on an M1 Pro. There's no way to check whether the target architecture is M1 specifically, but the gains are likely to be similar on more recent Apple processors, and the original performance issue was probably specific to Qualcomm chips.
Well, hopefully, the condition
I measured on my M1 MacBook Air and see decompression speed go from 3.9 GB/s to 4.6 GB/s.
This will also enable it for older iPhones, which ran on Qualcomm chips. I'm not sure how much we care about that, or even whether older iPhones had the mentioned performance issues; it may have been Android devices.
I wouldn't be too worried about this. I don't think Apple ever used Qualcomm CPUs; the last ARM Cortex CPU they used was in the Apple A5 (launched in 2011, discontinued in 2016). It's hard to know for certain without measuring, but I wouldn't expect this to cause a regression on Apple's more recent hardware.
Thanks @zeux, this looks like a good trade-off.
This includes the M1 optimization PR: lz4/lz4#1040. As a result, qgrep bruteforce queries run 10-15% faster on M1 Pro.