detect intrinsics: ensure intrinsics are not optimized out before assembler is run #1562
Conversation
Codecov Report

Patch and project coverage have no change.

```
@@           Coverage Diff            @@
##           develop    #1562   +/-   ##
========================================
  Coverage    83.87%   83.88%
========================================
  Files          132      132
  Lines        10843    10843
  Branches      2801     2801
========================================
+ Hits          9095     9096       +1
+ Misses        1049     1048       -1
  Partials       699      699
```

Flags with carried forward coverage won't be shown.
Are checks in CMake (https://github.com/zlib-ng/zlib-ng/blob/develop/cmake/detect-intrinsics.cmake) also affected?
Looks like the CMake version does not need patching; I will check why that is. Probably they don't set optimization flags during the tests. Edit: yeah, that is the case. I think it would be good to keep the tests in sync anyway, and given that certain (hardware vendor) compilers like to optimize by default, I think that's generally an improvement.
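To illustrate why the CMake side is unaffected, here is a hedged sketch of what such a detection probe looks like (the variable name `HAVE_PCLMULQDQ_INTRIN` and the exact flags are illustrative, not the repository's actual code): `check_c_source_compiles` only sees the flags placed in `CMAKE_REQUIRED_FLAGS`, so as long as no `-O2` is added there, the probe body survives and reaches the assembler.

```cmake
# Sketch (illustrative names/flags, not the repo's actual detection code):
# check_c_source_compiles() compiles with CMAKE_REQUIRED_FLAGS, so when
# only the ISA flag is set -- and no -O2 -- the probe is not optimized out.
include(CheckCSourceCompiles)
set(CMAKE_REQUIRED_FLAGS "-mpclmul")
check_c_source_compiles(
    "#include <wmmintrin.h>
     __m128i f(__m128i a, __m128i b) { return _mm_clmulepi64_si128(a, b, 0x10); }
     int main(void) { return 0; }"
    HAVE_PCLMULQDQ_INTRIN)
```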
The review comment is on this part of the probe:

```
        (void)c;
        return 0;
    }"
    #include <wmmintrin.h>
```
This include was added in configure but missing in cmake? 🤔
Here's another fun false-positive issue, which suggests running tests with optimizations enabled. Using GCC 12, this:

```c
// test-1.c
#include <wmmintrin.h>
int main() {
    __m128i a, b;
    _mm_clmulepi64_si128(a, b, 0x10);
}
```

compiles fine without any flags, but when you turn optimization flags on it errors. On the other hand, when defined outside of main:

```c
// test-2.c
#include <wmmintrin.h>
__m128i f(__m128i a, __m128i b) { return _mm_clmulepi64_si128(a, b, 0x10); }
int main() { return 0; }
```

it fails with and without optimization flags.

Edit: the configure / cmake test was correct, the difference is storing the result in
I've confirmed that this addresses the issues reported in #1559

LGTM, but I am awaiting code review by others.
nmoinvaz left a comment
I tested this PR locally against MSVC and it detected AVX2, AVX512, SSE42, SSE2, SSSE3, PCLMULQDQ, VPCLMULQDQ, and XSAVE intrinsics.
I'm surprised the compiler does not ignore the f() function, since it is not used anywhere. I'm not sure if those changes are okay, because we have check_c_source_compile_or_run. The instructions should be executed when WITH_NATIVE_INSTRUCTIONS=ON.
It's because it's not
Does building with CMake option
I think that only marks the symbol as such (for shared linking); the symbol is still added to the binary. Another way symbols get removed is
Regarding native instructions, why don't you trust
Asking because it would simplify my PR 😆
There are "processors" that implement certain instructions incorrectly, and running the executable will make sure we catch any crash issues.
Aren't tests better at catching that issue?
That is a very good question. Not running them would be faster and simpler. Potentially this could be used to compile only the optimized functions that actually run on the current CPU when using a compiler that does not support

If you want to attempt to change this, I think it would be best if that was a separate commit/PR, so it is easier to bisect/revert.
This is the commit 0b8418e where the change was made to use

It seems like they could have just used

I am in favor of having a separate commit that removes

Also AFAIK,
On RHEL9 the GCC is new enough to support AVX512-VNNI, but its assembler (binutils) is not and errors with

```
Error: unsupported instruction vpdpbusd
```

This was already addressed earlier in zlib-ng#1562 to some extent, except that a check for `_mm256_dpbusd_epi32` was not added, which is what the assembler errors over.
Closes #1559
Old binutils may have an assembler that does not support vector instructions generated by gcc. Although it's recommended to use an up-to-date assembler with gcc, it happens in practice that a new gcc and old binutils are mixed.

The problem with the current configure script is that it runs tests with `-O2` cflags, which causes the test inside `main() { ... }` to be optimized out entirely, so the assembler won't see any of the vector instructions, leading to false positives in tests.

This PR ensures that most configure tests use a separate function, taking and returning some `__mm*` type, to avoid optimizations, so that the assembler will error on seeing unsupported instructions.

Alternatively, `-O0` could be passed, but I don't think that is always sufficient, since some compilers cannot not optimize ;p I believe nvhpc is one of those.