Skip to content

hip: mathlibs inherit CudaPackage and ROCmPackage#34586

Merged
haampie merged 6 commits intospack:developfrom
cgmb:rocm-cuda-mathlibs
Apr 3, 2023
Merged

hip: mathlibs inherit CudaPackage and ROCmPackage#34586
haampie merged 6 commits intospack:developfrom
cgmb:rocm-cuda-mathlibs

Conversation

@cgmb
Copy link
Copy Markdown
Contributor

@cgmb cgmb commented Dec 17, 2022

This change enables hipblas, hipsolver, hipsparse, hipfft, and hipcub to be built for either ROCm or CUDA. The +rocm variant is the default (to match the previous behaviour). However, the new amdgpu_target variant is required (as that's how ROCmPackage works).

Related: #33872

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Dec 17, 2022

It's weird how the CI keeps trying to build for gfx701. I suspect it's because the amdgpu_target variant is underspecified in some specs. I've opened #34591 to try to prevent some of those weird failures.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Dec 23, 2022

@spackbot rerun pipeline

edit: opps. wrong PR.

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Dec 23, 2022

I've started that pipeline for you!

@cgmb cgmb marked this pull request as draft December 23, 2022 21:02
@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Dec 23, 2022

The ROCmPackage default of (amdgpu_target=none) is different from how the ROCm libraries have worked historically, which is breaking libraries that depend on the hip mathlibs. I'm going to mark this as a draft until I know how to deal with that.

@cgmb cgmb force-pushed the rocm-cuda-mathlibs branch from 767c107 to e99b92a Compare January 24, 2023 08:12
@spackbot-app spackbot-app bot added build-systems core PR affects Spack core functionality labels Jan 24, 2023
@cgmb cgmb force-pushed the rocm-cuda-mathlibs branch from e99b92a to 3e59444 Compare January 24, 2023 22:55
@cgmb cgmb removed build-systems core PR affects Spack core functionality labels Jan 24, 2023
@cgmb cgmb marked this pull request as ready for review January 25, 2023 00:45
@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Jan 25, 2023

the new amdgpu_target variant is required (as that's how ROCmPackage works).

This is unfortunate. It means that spack spec hipsolver is resolved as hipsolver +cuda since amdgpu_target=none is the default, is illegal, and is sticky.

cgmb added 3 commits March 15, 2023 12:51
Unless the amdgpu_target is overriden, the libraries will default to
being built for cuda, since amdgpu_target=none is both default and in
conflict with +rocm. This requires a custom Disjoint set to include
both the 'auto' variant used by the rocm mathlibs and the 'none'
variant used by ROCmPackage.
@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Mar 17, 2023

@haampie, @renjithravindrankannath, @srekolam, please review these changes.

I've made some changes so that this PR doesn't have to wait on #35233 and #35316. The custom disjoint set that is needed for this is a bit ugly, but the awfulness is at least contained to these libraries. I explicitly avoided extracting it into a function to ensure that it doesn't get used elsewhere. We can remove it once the amdgpu_target is properly detected.

The custom disjoint set is exactly the same as the amdgpu_target variant from ROCmPackage, but it adds the value of auto and sets it as the default. This works with the rocm mathlibs, which also have a default value of auto. This is how my proposed change preserves the existing default behaviour for specs that don't specify the amdgpu_target.

Once there is proper amdgpu_target detection logic in spack (both for concretizing specs and for external find), then the auto variant value can be dropped from rccl, rocalution, roccblas, rocfft, rocprim, rocrand, rocsolver, rocsparse, rocthrust, rocmma, hipblas, hipsolver, hipsparse, hipfft and hipcub.

@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Mar 28, 2023

*poke*

Copy link
Copy Markdown
Contributor

@srekolam srekolam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cgmb sorry for being late on this. I looked at your changes and i took your changes on my local spack repository and build hipfft+ cuda and [email protected]. It built both versions successfully.
A couple of observations with the hip. When i execute the hipcc and hipconfig from both the installed prefix paths of hip + rocm and hip+cuda, i see it always points to cuda driver ..
PR-34586.txt
same with hipconfig too.. Just trying to check with you on this.

@renjithravindrankannath
Copy link
Copy Markdown
Contributor

@cgmb , Please excuse my delayed response. I tried [email protected] +cuda , [email protected] +cuda, [email protected] +cuda, [email protected] +cuda and [email protected] +cuda. hipblas, hipfft, hipsolver were built successfully. But hipcub & hipsparse were failing.
hipcub-5.4.3-cuda-spack-build-out.txt
hipsparse-5.4.3-cuda-spack-build-out.txt

hip-targets-release.cmake and hip-targets.cmake are missing under the lib/cmake/hip/ path

[xyz@ixt-rack-104 spack]$ ls -la opt/spack/linux-ubuntu20.04-skylake_avx512/gcc-9.4.0/hip-5.4.3-ltezppdane3i6xpri3gdl2f7rvp4ucak/lib/cmake/hip/
total 68
drwxr-sr-x. 3 root root  4096 Mar 28 01:31 .
drwxr-sr-x. 3 root root  4096 Mar 28 01:31 ..
drwxr-sr-x. 2 root root  4096 Mar 28 01:31 FindHIP
-rw-r--r--. 1 root root 33450 Jan 20 09:05 FindHIP.cmake
-rw-r--r--. 1 root root 13266 Mar 28 01:29 hip-config.cmake
-rw-r--r--. 1 root root  2890 Mar 28 01:29 hip-config-version.cmake

This patch is not strictly necessary, but it may fix the search for HIP
in certain environments.
@spackbot-app spackbot-app bot added the patch label Mar 28, 2023
@cgmb
Copy link
Copy Markdown
Contributor Author

cgmb commented Mar 28, 2023

@cgmb sorry for being late on this. I looked at your changes and i took your changes on my local spack repository and build hipfft+ cuda and [email protected]. It built both versions successfully. A couple of observations with the hip. When i execute the hipcc and hipconfig from both the installed prefix paths of hip + rocm and hip+cuda, i see it always points to cuda driver .. PR-34586.txt same with hipconfig too.. Just trying to check with you on this.

You need to spack load hip +rocm or spack load hip +cuda to ensure your environment is correctly configured before running hipconfig or hipcc.

hipcub-5.4.3-cuda-spack-build-out.txt

I can't reproduce this, but it looks like a bug in hipcub. The hip-config.cmake file is getting picked up by this call to find_package(hip QUIET CONFIG ...). I've added a couple patches in d2fcda2 to try to fix it for [email protected]: +cuda. However, it works for me even without the patches. So, they're not strictly necessary in some environments.

hipsparse-5.4.3-cuda-spack-build-out.txt

This is a bug in hipsparse. It doesn't support newer versions of CUDA. I've backported the fix in 9fc769d.

hip-targets-release.cmake and hip-targets.cmake are missing under the lib/cmake/hip/ path

That is normal. HIP does not yet support config mode search on the NVIDIA platform.

@renjithravindrankannath
Copy link
Copy Markdown
Contributor

@cgmb sorry for being late on this. I looked at your changes and i took your changes on my local spack repository and build hipfft+ cuda and [email protected]. It built both versions successfully. A couple of observations with the hip. When i execute the hipcc and hipconfig from both the installed prefix paths of hip + rocm and hip+cuda, i see it always points to cuda driver .. PR-34586.txt same with hipconfig too.. Just trying to check with you on this.

You need to spack load hip +rocm or spack load hip +cuda to ensure your environment is correctly configured before running hipconfig or hipcc.

hipcub-5.4.3-cuda-spack-build-out.txt

I can't reproduce this, but it looks like a bug in hipcub. The hip-config.cmake file is getting picked up by this call to find_package(hip QUIET CONFIG ...). I've added a couple patches to try to fix it in [email protected]: +cuda. However, it works for me even without the patches. So, they're not strictly necessary in some environments.

hipsparse-5.4.3-cuda-spack-build-out.txt

This is a bug in hipsparse. It doesn't support newer versions of CUDA. I've backported the fix.

hip-targets-release.cmake and hip-targets.cmake are missing under the lib/cmake/hip/ path

That is normal. HIP does not yet support config mode search on the NVIDIA platform.

Thanks @cgmb . hipcub and hipsparse built successfully for cuda variant.

Copy link
Copy Markdown
Contributor

@renjithravindrankannath renjithravindrankannath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

patch("find-hip-cuda-rocm-5.1.patch", when="@5.1:5.2 +cuda")
patch("find-hip-cuda-rocm-5.3.patch", when="@5.3: +cuda")

def setup_build_environment(self, env):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see setup_build_environment had been removed for other recipes in the PR except for hipcub. Any reason why. Otherwise i am okay with the changes.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hipblas, hipsolver, hipfft and hipsparse are all written in C++ and can be built with a normal C++ compiler, but hipCUB is written in the HIP language and requires a HIP compiler.

@haampie haampie merged commit cfea2d1 into spack:develop Apr 3, 2023
@cgmb cgmb deleted the rocm-cuda-mathlibs branch April 3, 2023 18:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants