detect intrinsics: ensure intrinsics are not optimized out before assembler is run by haampie · Pull Request #1562 · zlib-ng/zlib-ng

haampie · 2023-08-21T09:17:08Z

Old binutils may have an assembler that does not support vector instructions
generated by gcc.

Although it's recommended to use an up-to-date assembler with gcc, it's happens
in practice that new gcc and old binutils are mixed.

The problem with the current configure script is that it runs tests with -O2
cflags, which causes the test inside main() { ... } to be optimized out
entirely, and the assembler won't see any of the vector instructions, leading
to false positives in tests.

This PR ensures that most configure tests use a separate function, taking and
returning some __mm* type, to avoid optimizations, so that the assembler will
error on seeing unsupported instructions.

Alternatively, -O0 could be passed, but I don't think that is always
sufficient, since some compilers cannot not optimize ;p I believe nvhpc is
one of those.

… is run

codecov · 2023-08-21T09:20:53Z

Codecov Report

Patch and project coverage have no change.

Comparison is base (0cb2b37) 83.87% compared to head (1605ec8) 83.88%.
Report is 13 commits behind head on develop.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop    #1562   +/-   ##
========================================
  Coverage    83.87%   83.88%           
========================================
  Files          132      132           
  Lines        10843    10843           
  Branches      2801     2801           
========================================
+ Hits          9095     9096    +1     
+ Misses        1049     1048    -1     
  Partials       699      699

Flag	Coverage Δ
macos_clang	`42.97% <ø> (ø)`
macos_gcc	`73.61% <ø> (ø)`
ubuntu_clang	`82.50% <ø> (ø)`
ubuntu_clang_debug	`81.90% <ø> (-0.24%)`	⬇️
ubuntu_clang_inflate_allow_invalid_dist	`82.15% <ø> (ø)`
ubuntu_clang_inflate_strict	`82.49% <ø> (ø)`
ubuntu_clang_mmap	`82.82% <ø> (ø)`
ubuntu_clang_pigz	`13.96% <ø> (ø)`
ubuntu_clang_pigz_no_optim	`11.51% <ø> (ø)`
ubuntu_clang_pigz_no_threads	`13.73% <ø> (-0.06%)`	⬇️
ubuntu_clang_reduced_mem	`82.67% <ø> (-0.23%)`	⬇️
ubuntu_clang_toolchain_riscv	`∅ <ø> (∅)`
ubuntu_gcc	`75.30% <ø> (+0.07%)`	⬆️
ubuntu_gcc_aarch64	`77.40% <ø> (ø)`
ubuntu_gcc_aarch64_compat_no_opt	`75.65% <ø> (ø)`
ubuntu_gcc_aarch64_no_acle	`76.16% <ø> (ø)`
ubuntu_gcc_aarch64_no_neon	`76.16% <ø> (ø)`
ubuntu_gcc_armhf	`77.47% <ø> (ø)`
ubuntu_gcc_armhf_compat_no_opt	`75.60% <ø> (ø)`
ubuntu_gcc_armhf_no_acle	`77.42% <ø> (ø)`
ubuntu_gcc_armhf_no_neon	`77.32% <ø> (ø)`
ubuntu_gcc_armsf	`74.65% <ø> (ø)`
ubuntu_gcc_armsf_compat_no_opt	`74.09% <ø> (ø)`
ubuntu_gcc_benchmark	`73.57% <ø> (+0.16%)`	⬆️
ubuntu_gcc_compat_no_opt	`76.85% <ø> (+0.01%)`	⬆️
ubuntu_gcc_compat_sprefix	`73.73% <ø> (ø)`
ubuntu_gcc_m32	`73.23% <ø> (-0.16%)`	⬇️
ubuntu_gcc_mingw_i686	`73.50% <ø> (ø)`
ubuntu_gcc_mingw_x86_64	`73.51% <ø> (ø)`
ubuntu_gcc_mips	`74.97% <ø> (ø)`
ubuntu_gcc_mips64	`74.98% <ø> (ø)`
ubuntu_gcc_no_avx2	`74.34% <ø> (+0.05%)`	⬆️
ubuntu_gcc_no_ctz	`74.65% <ø> (ø)`
ubuntu_gcc_no_ctzll	`74.64% <ø> (ø)`
ubuntu_gcc_no_pclmulqdq	`74.28% <ø> (ø)`
ubuntu_gcc_no_sse2	`74.54% <ø> (ø)`
ubuntu_gcc_no_sse42	`74.17% <ø> (-0.56%)`	⬇️
ubuntu_gcc_o1	`74.18% <ø> (+0.05%)`	⬆️
ubuntu_gcc_osb	`∅ <ø> (∅)`
ubuntu_gcc_pigz	`38.23% <ø> (+0.07%)`	⬆️
ubuntu_gcc_pigz_aarch64	`39.02% <ø> (-0.03%)`	⬇️
ubuntu_gcc_ppc	`73.92% <ø> (ø)`
ubuntu_gcc_ppc64	`74.36% <ø> (ø)`
ubuntu_gcc_ppc64_power9	`74.53% <ø> (ø)`
ubuntu_gcc_ppc64le	`74.43% <ø> (ø)`
ubuntu_gcc_ppc64le_novsx	`74.75% <ø> (ø)`
ubuntu_gcc_ppc64le_power9	`74.31% <ø> (ø)`
ubuntu_gcc_ppc_no_power8	`74.63% <ø> (ø)`
ubuntu_gcc_s390x	`74.80% <ø> (ø)`
ubuntu_gcc_s390x_dfltcc	`71.92% <ø> (ø)`
ubuntu_gcc_s390x_dfltcc_compat	`73.98% <ø> (ø)`
ubuntu_gcc_s390x_no_crc32	`74.59% <ø> (ø)`
ubuntu_gcc_sparc64	`74.79% <ø> (ø)`
ubuntu_gcc_sprefix	`73.24% <ø> (ø)`
win64_gcc	`73.95% <ø> (-0.05%)`	⬇️
win64_gcc_compat_no_opt	`74.71% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

see 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

phprus · 2023-08-21T09:50:59Z

Are checks in CMake (https://github.com/zlib-ng/zlib-ng/blob/develop/cmake/detect-intrinsics.cmake) also affected?

haampie · 2023-08-21T10:57:08Z

Looks like the CMake version does not need patching, will check why that is. Probably they don't set optimization flags during tests.

Edit: yeah, that is the case. I think it would be good anyways to keep the tests in sync? And given that certain (hardware vendor) compilers like to optimize by default, I think that's generally an improvement.

haampie · 2023-08-21T11:19:25Z

cmake/detect-intrinsics.cmake

-                (void)c;
-                return 0;
-            }"
+            #include <wmmintrin.h>


This include was added in configure but missing in cmake? 🤔

haampie · 2023-08-21T11:31:17Z

Here's another fun false positive issue, which suggests to run tests with optimizations enabled

From https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64554.

Using GCC 12, this:

// test-1.c
#include <wmmintrin.h>
int main() { 
    __m128i a, b;
    _mm_clmulepi64_si128(a, b, 0x10);
}

compiles fine without any flags:

$ gcc test-1.c

but when you turn optimization flags on it errors:

$ gcc -O1 test-1.c
In file included from test-1.c:1:
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h: In function ‘main’:
/usr/lib/gcc/x86_64-linux-gnu/12/include/wmmintrin.h:116:1: error: inlining failed in call to ‘always_inline’ ‘_mm_clmulepi64_si128’: target specific option mismatch
  116 | _mm_clmulepi64_si128 (__m128i __X, __m128i __Y, const int __I)
      | ^~~~~~~~~~~~~~~~~~~~
test-1.c:4:5: note: called from here
    4 |     _mm_clmulepi64_si128(a, b, 0x10);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On the other hand, when defined outside of main:

// test-2.c
#include <wmmintrin.h>
__m128i f(__m128i a, __m128i b) { return _mm_clmulepi64_si128(a, b, 0x10); }
int main() { return 0; }

it fails with and without optimization flags:

$ gcc test-2.c
In file included from test-2.c:1:
test-2.c: In function ‘f’:
test-2.c:2:42: error: ‘__builtin_ia32_pclmulqdq128’ needs isa option -mpclmul -msse2
    2 | __m128i f(__m128i a, __m128i b) { return _mm_clmulepi64_si128(a, b, 0x10); }
      |

Edit: the configure / cmake test was correct, the difference is storing the result in __m128i c = ...

jrwrigh · 2023-08-21T21:12:46Z

I've confirmed that this addresses the issues reported in #1559

Dead2 · 2023-08-27T10:50:22Z

LGTM, but I am awaiting code review by others.

nmoinvaz

I tested this PR locally against MSVC and it detected AVX2, AVX512, SSE42, SSE2, SSSE3, PCLMULQDQ, VPCLMULQDQ, and XSAVE intrinsics.

I'm surprised about the compiler not ignoring f() function since it is not used anywhere. I'm not sure if those changes are okay, because we have check_c_source_compile_or_run. The instructions should be executed when WITH_NATIVE_INSTRUCTIONS=ON.

cmake/detect-intrinsics.cmake

configure

haampie · 2023-08-29T07:08:16Z

I'm surprised about the compiler not ignoring f() function since it is not used anywhere.

It's because it's not static f(), so C has to create a symbol for it. I think it only gets deleted with -flto and -fwhole-program (the latter is GCC specific, effectively the same as adding static to every function in the compilation unit).

phprus · 2023-08-29T11:31:12Z

It's because it's not static f(), so C has to create a symbol for it. I think it only gets deleted with -flto and -fwhole-program (the latter is GCC specific, effectively the same as adding static to every function in the compilation unit).

Does building with CMake option -DCMAKE_C_VISIBILITY_PRESET=hidden work?

haampie · 2023-08-29T12:14:44Z

It's because it's not static f(), so C has to create a symbol for it. I think it only gets deleted with -flto and -fwhole-program (the latter is GCC specific, effectively the same as adding static to every function in the compilation unit).

Does building with CMake option -DCMAKE_C_VISIBILITY_PRESET=hidden work?

I think that only marks the symbol as such (for shared linking), the symbol is still added to the binary.

Another way to symbols get removed is -O1 -ffunction-sections -Wl,--gc-sections (-O1 to make sure that function get inlined, so that -ffunction-sections -Wl,--gc-sections can remove function that aren't called), but that shouldn't matter for this PR either since the compiler + assembler still create an object file before the linker removes stuff from it.

haampie · 2023-08-29T13:03:35Z

Regarding native instructions, why don't you trust -march=native to do just that? Why run executables?

nmoinvaz · 2023-08-29T15:32:47Z

That is a good question, I don't know. Perhaps @Dead2 or @mtl1979 can chime in.

haampie · 2023-08-29T15:35:51Z

Asking cause it would simplify my PR 😆

mtl1979 · 2023-08-29T15:48:56Z

There are "processors" that implement certain instructions incorrectly and running the executable will make sure we catch any crash issues.

haampie · 2023-08-29T16:22:57Z

Aren't tests better at catching that issue?

Dead2 · 2023-08-30T16:33:02Z

Regarding native instructions, why don't you trust -march=native to do just that? Why run executables?

That is a very good question. Not running them would be faster and simpler.
I honestly can't remember why we started running the instruction set tests, and I suspect it is not really needed. The compiler should be correct, so running it would only be making doubly sure.

Potentially this could be used to compile only the optimized functions that actually run on the current cpu when using a compiler that does not support -march=native or similar, but we don't do that currently. Is there any interest in this? Think MSVC for example. The next step after that would be to only compile in the best optimization for the machine. But it'd require a good chunk of extra cmake code to implement such logic.

If you want to attempt to change this, I think it would be best if that was a separate commit/PR, so it is easier to bisect/revert.

nmoinvaz · 2023-08-30T17:51:14Z

This is the commit 0b8418e where the change was made to use check_c_source_runs:

CMakeLists.txt: use check_c_source_runs instead of check_c_source_compiles
to try to avoid using intrinsics and an instruction set the compiler
knows but the host CPU doesn't support.

It seems like they could have just use WITH_NATIVE_INSTRUCTIONS because that is the purpose. Original commit didn't state why WITH_NATIVE_INSTRUCTIONS is not a valid solution.

I am infavor, of having a separate commit that removes check_c_source_compile_or_run and just going back to using check_c_source_compiles.

Also AFAIK, configure script does not attempt to run anything, only compile.

Dead2

LGTM

On RHEL9 the GCC is new enough to support AVX512-VNNI, but its assembler (binutils) is not and errors with ``` Error: unsupported instruction vpdpbusd ``` This was already addressed earlier in zlib-ng#1562 to some extent, except that a check for `_mm256_dpbusd_epi32` was not added, which is what the assembler errors over.

On RHEL9 the GCC is new enough to support AVX512-VNNI, but its assembler (binutils) is not and errors with ``` Error: unsupported instruction vpdpbusd ``` This was already addressed earlier in #1562 to some extent, except that a check for `_mm256_dpbusd_epi32` was not added, which is what the assembler errors over.

configure: ensure instructions are not optimized out before assembler…

58215ba

… is run

This was referenced Aug 21, 2023

autotools configure script does not discern AVX512 hardware support #1559

Closed

zlib-api: use zlib-ng +compat by default spack/spack#39358

Merged

another

96a6770

Dead2 added the Build Env label Aug 21, 2023

Do the same for detect-intrinsics.cmake

ab8217d

haampie commented Aug 21, 2023

View reviewed changes

haampie changed the title ~~configure: ensure instructions are not optimized out before assembler is run~~ ensure intrinsics are not optimized out before assembler is run Aug 21, 2023

haampie changed the title ~~ensure intrinsics are not optimized out before assembler is run~~ detect intrinsics: ensure intrinsics are not optimized out before assembler is run Aug 21, 2023

Dead2 requested a review from KungFuJesus August 23, 2023 18:42

nmoinvaz reviewed Aug 29, 2023

View reviewed changes

cmake/detect-intrinsics.cmake Outdated Show resolved Hide resolved

configure Outdated Show resolved Hide resolved

cast _xgetbv to int to silence conversion warning

1605ec8

Dead2 approved these changes Sep 11, 2023

View reviewed changes

Dead2 merged commit ca2d4e5 into zlib-ng:develop Sep 13, 2023

nmoinvaz mentioned this pull request Sep 14, 2023

Add ARMv6 version of slide_hash #1538

Merged

Dead2 mentioned this pull request Oct 13, 2023

2.1.4 Release #1586

Merged

haampie deleted the fix/avx-detection-with-old-assembler branch October 21, 2023 15:21

nmoinvaz mentioned this pull request Dec 23, 2023

Make sure uqsub16 mnemonic doesn't get optimized away. #1619

Merged

nmoinvaz mentioned this pull request Sep 11, 2024

Fixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms #1774

Merged

haampie mentioned this pull request Jul 29, 2025

X86_AVX512VNNI: test _mm256_dpbusd_epi32 too #1944

Merged

haampie mentioned this pull request Mar 9, 2026

Installation issue: llvm-amdgpu spack/spack-packages#3696

Open

4 tasks

Uh oh!

Conversation

haampie commented Aug 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Aug 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

phprus commented Aug 21, 2023

Uh oh!

haampie commented Aug 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

haampie Aug 21, 2023

Choose a reason for hiding this comment

Uh oh!

haampie commented Aug 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrwrigh commented Aug 21, 2023

Uh oh!

Dead2 commented Aug 27, 2023

Uh oh!

nmoinvaz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

haampie commented Aug 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phprus commented Aug 29, 2023

Uh oh!

haampie commented Aug 29, 2023

Uh oh!

haampie commented Aug 29, 2023

Uh oh!

nmoinvaz commented Aug 29, 2023

Uh oh!

haampie commented Aug 29, 2023

Uh oh!

mtl1979 commented Aug 29, 2023

Uh oh!

haampie commented Aug 29, 2023

Uh oh!

Dead2 commented Aug 30, 2023

Uh oh!

nmoinvaz commented Aug 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dead2 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

haampie commented Aug 21, 2023 •

edited

Loading

codecov bot commented Aug 21, 2023 •

edited

Loading

haampie commented Aug 21, 2023 •

edited

Loading

haampie commented Aug 21, 2023 •

edited

Loading

haampie commented Aug 29, 2023 •

edited

Loading

nmoinvaz commented Aug 30, 2023 •

edited

Loading