Skip to content

zlib-api: use zlib-ng +compat by default#39358

Merged
tgamblin merged 3 commits intospack:developfrom
haampie:fix/switch-default-provider-zlib-api
Aug 17, 2023
Merged

zlib-api: use zlib-ng +compat by default#39358
tgamblin merged 3 commits intospack:developfrom
haampie:fix/switch-default-provider-zlib-api

Conversation

@haampie
Copy link
Copy Markdown
Member

@haampie haampie commented Aug 9, 2023

In the HPC package manager, we want the fastest zlib implementation by default. zlib-ng is up to 4x faster than stock zlib, and it can do things like take advantage of AVX-512 instructions. This PR makes zlib-ng the default zlib-api provider (zlib-api was introduced earlier, in #37372).

As far as I can see, the only issues you can encounter are:

  1. Build issues with packages that heavily rely on zlib internals. In Gitlab CI only one out of hundreds of packages had that issue (it extended zlib with deflate stuff, and used its own copy of zlib sources).
  2. Packages that like to detect zlib-ng separately and rely on zlib-ng internals. The only issue I've found with this among the hundreds of packages built in CI is perl trying to report more specific zlib-ng version details, and relied on some internals that got refactored. But yeah... that warrants a patch / conflict and is nothing special.

At runtime, you cannot really have any issues, given that zlib and zlib-ng export the exact same symbols (and zlib-ng tests this in their CI).

You can't really have issues with externals when using zlib-ng either. The only type of issue is when system zlib is rather new, and not marked as external; if another external uses new symbols, and Spack builds an older zlib/zlib-ng, then the external might not find the new symbols. But this is a configuration issue, and it's not an issue caused by zlib-ng, as the same would happen with older Spack zlib.

@spackbot-app spackbot-app bot added core PR affects Spack core functionality defaults labels Aug 9, 2023
@haampie haampie changed the title zlib-api: use zlib-ng +compat by default zlib-api: use zlib-ng +compat by default Aug 9, 2023
@haampie haampie closed this Aug 10, 2023
@haampie haampie reopened this Aug 10, 2023
@haampie haampie force-pushed the fix/switch-default-provider-zlib-api branch from 408e1e8 to 886688d Compare August 11, 2023 08:34
Copy link
Copy Markdown
Member

@alalazo alalazo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, and was tested to work with our pipelines with a rebuild everything. LGTM, but I think @tgamblin needs the last word here

@alalazo alalazo requested a review from tgamblin August 16, 2023 08:44
@alalazo alalazo added this to the v0.21.0 milestone Aug 16, 2023
@tgamblin tgamblin merged commit ec500ad into spack:develop Aug 17, 2023
@tgamblin
Copy link
Copy Markdown
Member

Merged! I added the rationale to the description and the commit, and I also added this here:

So that people can easily surface it. Hoping it just works as @haampie describes.

@jrwrigh
Copy link
Copy Markdown

jrwrigh commented Aug 18, 2023

If you notice issues, please comment on the PR.

(I'd open an issue if not for the comment in the discussion thread. Let me know if y'all would still like this to be a separate issue).

I'm currently getting the following build error:

==> Installing zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus [1/1]
==> No binary for zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus found: installing from source
==> Using cached archive: /projects/tools/Spack/var/spack/cache/_source-cache/archive/d2/d20e55f89d71991c59f1c5ad1ef944815e5850526c0d9cd8e504eaed5b24491a.tar.gz
==> No patches needed for zlib-ng
==> zlib-ng: Executing phase: 'autoreconf'
==> zlib-ng: Executing phase: 'configure'
==> zlib-ng: Executing phase: 'build'
==> Error: ProcessError: Command exited with status 2:
    'make' '-j16' 'V=1'

10 errors found in build log:
     118    make[1]: Entering directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     119    /projects/tools/Spack/lib/spack/env/gcc/gcc -O2  -std=c11 -Wall -DNDEBUG -D_LARGEFILE64_SOURCE=1 -DHAVE_POSIX_MEMALIGN -DHAVE_ALIGNED_ALLOC -DHAVE_SYS_AUXV_H -DZLIB_COMPAT -DWITH_GZFILEOP -DHAVE_VISIBILITY_HIDDEN -DHAVE_VISIBILITY_INTERNAL -DHAVE_ATTRIBUTE_ALIGNED -DHA
            VE_THREAD_LOCAL -DHAVE_BUILTIN_CTZ -DHAVE_BUILTIN_CTZLL -DX86_FEATURES -DX86_AVX2 -DX86_AVX512 -DX86_MASK_INTRIN -DX86_AVX512VNNI -DX86_SSE42 -DX86_SSE42_CRC_INTRIN -DX86_SSE2 -DX86_SSSE3 -DX86_PCLMULQDQ_CRC -DX86_VPCLMULQDQ_CRC -mssse3 -fno-lto -I/tmp/tooldeveloper/sp
            ack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86 -I/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src -c -o chunkset_ssse3.o /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3
            -oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86/chunkset_ssse3.c
     120    make -C arch/x86 crc32_pclmulqdq.o
     121    make[1]: Entering directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     122    /projects/tools/Spack/lib/spack/env/gcc/gcc -O2  -std=c11 -Wall -DNDEBUG -D_LARGEFILE64_SOURCE=1 -DHAVE_POSIX_MEMALIGN -DHAVE_ALIGNED_ALLOC -DHAVE_SYS_AUXV_H -DZLIB_COMPAT -DWITH_GZFILEOP -DHAVE_VISIBILITY_HIDDEN -DHAVE_VISIBILITY_INTERNAL -DHAVE_ATTRIBUTE_ALIGNED -DHA
            VE_THREAD_LOCAL -DHAVE_BUILTIN_CTZ -DHAVE_BUILTIN_CTZLL -DX86_FEATURES -DX86_AVX2 -DX86_AVX512 -DX86_MASK_INTRIN -DX86_AVX512VNNI -DX86_SSE42 -DX86_SSE42_CRC_INTRIN -DX86_SSE2 -DX86_SSSE3 -DX86_PCLMULQDQ_CRC -DX86_VPCLMULQDQ_CRC -mpclmul -msse4.2 -fno-lto -I/tmp/toolde
            veloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86 -I/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src -c -o crc32_pclmulqdq.o /tmp/tooldeveloper/spack-stage/spack-stage-zl
            ib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86/crc32_pclmulqdq.c
     123    /tmp/cc7yhZcn.s: Assembler messages:
  >> 124    /tmp/cc7yhZcn.s:66: Error: no such instruction: `vpdpbusd %zmm7,%zmm4,%zmm2'
  >> 125    /tmp/cc7yhZcn.s:68: Error: no such instruction: `vpdpbusd %zmm7,%zmm5,%zmm3'
  >> 126    /tmp/cc7yhZcn.s:122: Error: no such instruction: `vpdpbusd %zmm7,%zmm1,%zmm3'
  >> 127    /tmp/cc7yhZcn.s:212: Error: no such instruction: `vpdpbusd %ymm7,%ymm3,%ymm2'
  >> 128    /tmp/cc7yhZcn.s:216: Error: no such instruction: `vpdpbusd %ymm7,%ymm4,%ymm5'
  >> 129    /tmp/cc7yhZcn.s:268: Error: no such instruction: `vpdpbusd %ymm7,%ymm1,%ymm5'
  >> 130    Makefile:123: recipe for target 'adler32_avx512_vnni.o' failed
  >> 131    make[1]: *** [adler32_avx512_vnni.o] Error 1
     132    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
  >> 133    Makefile:167: recipe for target 'arch/x86/adler32_avx512_vnni.o' failed
  >> 134    make: *** [arch/x86/adler32_avx512_vnni.o] Error 2
     135    make: *** Waiting for unfinished jobs....
     136    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     137    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     138    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     139    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'
     140    make[1]: Leaving directory '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/arch/x86'

See build log for details:
  /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-build-out.txt

I've narrowed it down to a configure problem. For some reason, on a machine I use, the configure process concludes that my CPU has AVX512 instructions when it does not. From the build/configure log:

==> zlib-ng: Executing phase: 'autoreconf'
==> zlib-ng: Executing phase: 'configure'
==> [2023-08-18-13:56:34.546535] Find (recursive): /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src ['configure']
==> [2023-08-18-13:56:34.550041] Find complete: /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src ['configure']
==> [2023-08-18-13:56:34.550400] FILTER FILE: /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/configure [replacing "^(\s*if test x-L = )("\$p" \|\|\s*)$"]
==> [2023-08-18-13:56:34.560086] FILTER FILE: /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/configure [replacing "^(\s*test x-R = )("\$p")(; then\s*)$"]
==> [2023-08-18-13:56:34.568396] FILTER FILE: /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/configure [replacing "^(\s*test \$p = "-R")(; then\s*)$"]
==> [2023-08-18-13:56:34.577842] '/tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src/configure' '--prefix=/projects/tools/Spack/opt/spack/linux-debian8-haswell/gcc-11.2.0/zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus' '--zlib-compat'
Checking for compiler... /projects/tools/Spack/lib/spack/env/gcc/gcc
Checking for shared library support... Building shared library libz.so.1.2.13.zlib-ng with /projects/tools/Spack/lib/spack/env/gcc/gcc.
Checking for off64_t... Yes.
Checking for fseeko... Yes.
Checking for posix_memalign... Yes.
Checking for aligned_alloc... Yes.
Checking for strerror... Yes.
Checking for getauxval() or elf_aux_info() in sys/auxv.h... Yes.
Checking for unistd.h... Yes.
Checking for ptrdiff_t... Yes.
Checking for ANSI C compliant compiler...  Yes.
Checking for -fno-semantic-interposition... Yes.
Checking for -fno-lto... Yes.
Checking for attribute(visibility(hidden)) support... Yes.
Checking for attribute(visibility(internal)) support... Yes.
Checking for attribute(aligned) ... Yes.
Checking for _Thread_local ... Yes.
Checking for __builtin_ctz ... Yes.
Checking for __builtin_ctzll ... Yes.
Checking for AVX2 intrinsics ... Yes.
Checking for AVX512 intrinsics ... Yes.
Checking for AVX512 k-mask intrinsics ... Yes.
Check whether -mtune=cascadelake works ... Yes.
Checking for AVX512VNNI intrinsics ... Yes.
Checking for SSE4.2 CRC inline assembly ... Yes.
Checking for SSE4.2 CRC intrinsics ... Yes.
Checking for SSE2 intrinsics ... Yes.
Checking for SSSE3 intrinsics ... Yes.
Checking for PCLMULQDQ intrinsics ... Yes.
Checking for VPCLMULQDQ intrinsics ... Yes.
Checking for XSAVE intrinsics ... Yes.
ARCH: x86_64
Using arch directory: arch/x86
Architecture-specific static object files: x86_features.o slide_hash_avx2.o chunkset_avx2.o compare256_avx2.o adler32_avx2.o adler32_avx512.o adler32_avx512_vnni.o adler32_sse42.o insert_string_sse42.o chunkset_sse2.o compare256_sse2.o slide_hash_sse2.o adler32_ssse3.o chunkset_ssse3.o crc32_pclmulqdq.o crc32_vpclmulqdq.o
Architecture-specific shared object files: x86_features.lo slide_hash_avx2.lo chunkset_avx2.lo compare256_avx2.lo adler32_avx2.lo adler32_avx512.lo adler32_avx512_vnni.lo adler32_sse42.lo insert_string_sse42.lo chunkset_sse2.lo compare256_sse2.lo slide_hash_sse2.lo adler32_ssse3.lo chunkset_ssse3.lo crc32_pclmulqdq.lo crc32_vpclmulqdq.lo
==> [2023-08-18-13:56:40.862149] Find (recursive): /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src ['libtool']
==> [2023-08-18-13:56:40.865647] Find complete: /tmp/tooldeveloper/spack-stage/spack-stage-zlib-ng-2.1.3-oss7rd36yaiacuvnrjlk7qulzzvdlzus/spack-src ['libtool']
==> zlib-ng: Executing phase: 'build'
...

/proc/cpuinfo does not have avx512 listed on it, so I'm not sure where it's getting that idea from.

Oddly though, if I manually clone and build zlib-ng, configure determines the correct instruction set present and subsequently builds fine:

❯ ./configure
Checking for compiler... gcc
Checking for Symbol versioning... Yes.
Checking for shared library support... Building shared library libz-ng.so.2.1.3 with gcc.
Checking for off64_t... Yes.
Checking for fseeko... Yes.
Checking for posix_memalign... Yes.
Checking for aligned_alloc... Yes.
Checking for strerror... Yes.
Checking for getauxval() or elf_aux_info() in sys/auxv.h... Yes.
Checking for unistd.h... Yes.
Checking for ptrdiff_t... Yes.
Checking for ANSI C compliant compiler...  Yes.
Checking for -fno-semantic-interposition... No.
Checking for -fno-lto... Yes.
Checking for attribute(visibility(hidden)) support... Yes.
Checking for attribute(visibility(internal)) support... Yes.
Checking for attribute(aligned) ... Yes.
Checking for _Thread_local ... Yes.
Checking for __builtin_ctz ... Yes.
Checking for __builtin_ctzll ... Yes.
Checking for AVX2 intrinsics ... Yes.
Checking for AVX512 intrinsics ... No.
Check whether -mtune=cascadelake works ... No.
Check whether -mtune=skylake-avx512 works ... No.
Checking for AVX512VNNI intrinsics ... No.
Checking for SSE4.2 intrinsics ... Yes.
Checking for SSE2 intrinsics ... Yes.
Checking for SSSE3 intrinsics ... Yes.
Checking for PCLMULQDQ intrinsics ... Yes.
Checking for VPCLMULQDQ intrinsics ... No.
Checking for XSAVE intrinsics ... No.
Uname: Linux
ARCH: x86_64
Using arch directory: arch/x86
Architecture-specific static object files: x86_features.o slide_hash_avx2.o chunkset_avx2.o compare256_avx2.o adler32_avx2.o adler32_sse42.o insert_string_sse42.o chunkset_sse2.o compare256_sse2.o slide_hash_sse2.o adler32_ssse3.o chunkset_ssse3.o crc32_pclmulqdq.o
Architecture-specific shared object files: x86_features.lo slide_hash_avx2.lo chunkset_avx2.lo compare256_avx2.lo adler32_avx2.lo adler32_sse42.lo insert_string_sse42.lo chunkset_sse2.lo compare256_sse2.lo slide_hash_sse2.lo adler32_ssse3.lo chunkset_ssse3.lo crc32_pclmulqdq.lo

Any ideas what's happening here and how to fix it?

@jrwrigh
Copy link
Copy Markdown

jrwrigh commented Aug 18, 2023

Update: Loading a newer compiler (10.2.0 vs 4.9.2) does cause the manual build to fail. It looks like their configure script is just going off of whether the given gcc will compile with AVX512 flags without actually checking whether they'll run on the given hardware. (Edit: More accurately, the assembler fails to work with AVX512 instructions)

Regardless, this issue is definitely more of a zlib-ng problem (I'll submit an issue there), but until that's resolved I'd say the switch to zlib-ng might be a bit too soon?

Edit: Here's the zlib-ng issue zlib-ng/zlib-ng#1559

@tgamblin
Copy link
Copy Markdown
Member

@haampie if we can’t patch the build easily to work with more compilers I’m inclined to revert this. Thoughts?

@alalazo
Copy link
Copy Markdown
Member

alalazo commented Aug 21, 2023

(Edit: More accurately, the assembler fails to work with AVX512 instructions)

Can you post the output of:

spack debug report

? I think chances are high you are on a very old system, in which case setting these flags https://github.com/zlib-ng/zlib-ng#advanced-build-options based on the target might be better than reverting the PR.

@haampie haampie deleted the fix/switch-default-provider-zlib-api branch August 21, 2023 06:52
@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 21, 2023

Sounds like a fixable problem. I'd rather collect more issues (if any) than revert the PR, cause otherwise we probably never discover them. We could decide closer to the 0.21 release whether to revert?

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 21, 2023

For some reason, on a machine I use, the configure process concludes that my CPU has AVX512 instructions when it does not

/proc/cpuinfo does not have avx512 listed on it

It's not about your CPU, it's about whether your compiler can emit those instructions, which it can. Apparently your assembler cannot deal with those.

At runtime cpuid is used to dispatch to optimal routines.

Using the right combination of GCC & binutils is definitely a good idea, I don't think zlib-ng will be the only project where you'll have issues. You can also run into issues where binutils doesn't understand gcc's debug info for example.

However, we could check how to make the configure check in zlib-ng more strict.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 21, 2023

The configure script is generally fine, it's just that the configure test uses -O2, which causes the relevant AVX512 instructions to be optimized out, so the assembler never sees them.

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 21, 2023

Fixed by zlib-ng/zlib-ng#1562.

On Centos 7 with spack-built gcc 13 and system binutils 2.27:

Checking for AVX2 intrinsics ... Yes.
Checking for AVX512 intrinsics ... Yes.
Checking for AVX512 k-mask intrinsics ... Yes.
Check whether -mtune=cascadelake works ... Yes.
Checking for AVX512VNNI intrinsics ... No.
Checking for SSE4.2 intrinsics ... Yes.
Checking for SSE2 intrinsics ... Yes.
Checking for SSSE3 intrinsics ... Yes.
Checking for PCLMULQDQ intrinsics ... Yes.
Checking for VPCLMULQDQ intrinsics ... No.
Checking for XSAVE intrinsics ... Yes.

@jrwrigh
Copy link
Copy Markdown

jrwrigh commented Aug 21, 2023

Can you post the output of:

spack debug report

? I think chances are high you are on a very old system,

I can do that once I get to my computer, but I can go ahead and confirm that I am on a very old system (Debian from like 2015ish).

@haampie
Copy link
Copy Markdown
Member Author

haampie commented Aug 21, 2023

@jrwrigh try #39559 instead, it should fix your issue

mpokorny pushed a commit to mpokorny/spack that referenced this pull request Sep 18, 2023
In the HPC package manager, we want the fastest `zlib`  implementation by default.  `zlib-ng` is up to 4x faster than stock `zlib`, and it can do things like take advantage of AVX-512 instructions.  This PR makes `zlib-ng` the default `zlib-api` provider (`zlib-api` was introduced earlier, in spack#37372).

As far as I can see, the only issues you can encounter are:

1. Build issues with packages that heavily rely on `zlib` internals. In Gitlab CI only one out of hundreds of packages had that issue (it extended zlib with deflate stuff, and used its own copy of zlib sources).
2. Packages that like to detect `zlib-ng` separately and rely on `zlib-ng` internals. The only issue I've found with this among the hundreds of packages built in CI is `perl` trying to report more specific zlib-ng version details, and relied on some internals that got refactored. But yeah... that warrants a patch / conflict and is nothing special.

At runtime, you cannot really have any issues, given that zlib and zlib-ng export the exact same symbols (and zlib-ng tests this in their CI).

You can't really have issues with externals when using zlib-ng either. The only type of issue is when system zlib is rather new, and not marked as external; if another external uses new symbols, and Spack builds an older zlib/zlib-ng, then the external might not find the new symbols. But this is a configuration issue, and it's not an issue caused by zlib-ng, as the same would happen with older Spack zlib.

* zlib-api: use zlib-ng +compat by default
* make a trivial change to zlib-ng to trigger a rebuild
* add `haampie` as maintainer
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core PR affects Spack core functionality defaults update-package

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants