Optimize bitcount command by SIMD#1741
Conversation
4ba7902 to
8da4334
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #1741 +/- ##
=========================================
Coverage 71.08% 71.08%
=========================================
Files 123 123
Lines 65552 65583 +31
=========================================
+ Hits 46596 46622 +26
- Misses 18956 18961 +5
|
c2b2c1f to
0eedd92
Compare
|
Can you add tests to make sure the same results are achieved, with, and without AVX? |
|
@xbasel Sure! If there is anything I can help, please let me know. |
there was a typo :), I asked whether you can add tests (unless you think the existing tests are enough) |
No problem, I will review the tests to ensure they are enough and get back to you. |
9b9a8aa to
b420ef4
Compare
Test code to check scalar/vector result consistency has been committed. |
a5dc330 to
1cdeebe
Compare
2a05fef to
a3a31cf
Compare
zuiderkwast
left a comment
There was a problem hiding this comment.
This is very good. Just some minor comments.
First I didn't understand anything of this AVX2 code, but I found the paper and I realized this was not invented in this PR. 😄
aaa211a to
16fd90a
Compare
Optimize bitcount command by SIMD Signed-off-by: chzhoo <[email protected]>
Fix complie warning Signed-off-by: chzhoo <[email protected]>
Forget include of stdint.h Signed-off-by: chzhoo <[email protected]>
Optimize unit test error messages Signed-off-by: chzhoo <[email protected]>
Add citations to the original author's work Signed-off-by: chzhoo <[email protected]>
097a241 to
114b338
Compare
The unit test was added in #1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, it fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. Signed-off-by: Ricardo Dias <[email protected]>
In a recent PR #1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in #1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR #1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]> Signed-off-by: Shai Zarka <[email protected]>
**Background** - Currently, we implement bitcount using a lookup table method - By SIMD, parallel table lookups can be achieved, which boosts performance - Most x86 servers support the AVX2 instruction set **BenchMark** | Value Size | QPS (After optimization) | QPS (Before optimization) | change | | ---- | ---- | ---- | ---- | |16 B | 114925| 115924 | -0.8%| |256 B| 112619 | 112201| +0.3%| |4 KB| 105523|96251| +9.6%| |64 KB|79723|36796| +116%| |1MB|21306|3466|+514%| CPU: AMD EPYC 9754 128-Core Processor * 8 OS: Ubuntu Server 22.04 LTS 64bit Memory: 16GB VM: Tencent cloud SA5.2XLARGE16 **Test Plan** Pending. Will add test if it looks okay **Other** This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp --------- Signed-off-by: chzhoo <[email protected]>
The unit test was added in valkey-io#1741. It fails when compiled with UBSAN. Using a local array like `char buf[size]` is undefined behaviour when `size == 0`. This fix just makes it size 1 in this case. The failure I got locally: unit/test_bitops.c:28:13: runtime error: variable length array bound evaluates to non-positive value 0 Signed-off-by: Viktor Söderqvist <[email protected]>
In a recent PR valkey-io#1741 the new header `<immintrin.h>` was added, which transitively includes `<mm_malloc.h>` header, where a function called `_mm_malloc(...)` makes a call to the `malloc` function. The Valkey server code explicitly sets the malloc function as a deprecated function in `server.h`: ```c void *malloc(size_t size) __attribute__((deprecated)); ``` The Valkey server code is then compiled with `-Werror=deprecated-declarations` option to detect the uses of deprecated functions like `malloc`, and due to this, when the `bitops.c` file is compiled with Clang, fails with the following error: ``` In file included from bitops.c:33: In file included from /usr/lib/llvm-18/lib/clang/18/include/immintrin.h:26: In file included from /usr/lib/llvm-18/lib/clang/18/include/xmmintrin.h:31: /usr/lib/llvm-18/lib/clang/18/include/mm_malloc.h:35:12: error: 'malloc' is deprecated [-Werror,-Wdeprecated-declarations] 35 | return malloc(__size); | ^ ./server.h:3874:42: note: 'malloc' has been explicitly marked deprecated here 3874 | void *malloc(size_t size) __attribute__((deprecated)); ``` There is a difference in behavior though, between GCC and Clang. The `bitops.c` file compiles successfully with GCC. I don't know exactly why GCC does not issue a warning in this case. My best guess is that GCC does not issue warnings from code of the standard library. To fix the build error in clang, we explicitly use `pragma` macro to tell clang to ignore deprecated declarations warnings in `bitops.c`. --------- Signed-off-by: Ricardo Dias <[email protected]> Signed-off-by: Ricardo Dias <[email protected]> Co-authored-by: Madelyn Olson <[email protected]>
Background
BenchMark
CPU: AMD EPYC 9754 128-Core Processor * 8
OS: Ubuntu Server 22.04 LTS 64bit
Memory: 16GB
VM: Tencent cloud SA5.2XLARGE16
Test Plan
Pending. Will add test if it looks okay
Other
This PR is based on https://github.com/WojciechMula/sse-popcount/blob/master/popcnt-avx2-lookup.cpp