Make CUDA and ROCm architecture conditional #27185
@scottwittenburg Do you have any idea why some specs are failing in pipelines? For instance, I can't get a sense of this error
It seems
Force-pushed from 16c8554 to 22f98e0
@alalazo does this mean I can finally set the default of variants depending on
@dev-zero I think so, but to be clear:
For instance, when this PR is merged a spec satisfying
@sethrj FYI, not sure this can make it into v0.17.0
Fixes spack#14337. The variants specifying which architecture to use for CUDA and ROCm are now conditional on +cuda and +rocm, respectively.
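What "conditional on +cuda" means for a concretized spec can be shown with a minimal toy model. This is a sketch under stated assumptions, not Spack's implementation: `VariantDecl`, `ToySpec` and `concretize` are hypothetical names, and the ROCm architecture variant is assumed to be called `amdgpu_target`. In Spack packages the condition itself is given through the `when=` argument to `variant()`. The key behavior: a conditional variant simply does not exist on specs that fail its condition.

```python
# Hypothetical toy model of conditional variants; NOT Spack's implementation.
# A variant declared with a `when` condition only appears on a concretized
# spec whose unconditional variants satisfy that condition.
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class VariantDecl:
    name: str
    default: object
    # ("cuda", True) stands for "+cuda"; None means unconditional.
    when: Optional[Tuple[str, bool]] = None


@dataclass
class ToySpec:
    variants: Dict[str, object] = field(default_factory=dict)


def concretize(decls, requested):
    """Resolve variant values, dropping variants whose condition is false."""
    spec = ToySpec()
    # Pass 1: unconditional variants, which the conditions refer to.
    for d in decls:
        if d.when is None:
            spec.variants[d.name] = requested.get(d.name, d.default)
    # Pass 2: conditional variants, kept only if their condition holds.
    for d in decls:
        if d.when is not None:
            cond_name, cond_value = d.when
            if spec.variants.get(cond_name) == cond_value:
                spec.variants[d.name] = requested.get(d.name, d.default)
    return spec


DECLS = [
    VariantDecl("cuda", False),
    VariantDecl("rocm", False),
    VariantDecl("cuda_arch", ("none",), when=("cuda", True)),
    VariantDecl("amdgpu_target", ("none",), when=("rocm", True)),  # assumed name
]

gpu = concretize(DECLS, {"cuda": True, "cuda_arch": ("70",)})
cpu = concretize(DECLS, {"cuda": False})
print("cuda_arch" in gpu.variants)  # True
print("cuda_arch" in cpu.variants)  # False
```

This toy silently ignores a value requested for an inactive variant; how Spack itself treats such a request is not modeled here.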
Force-pushed from 22f98e0 to 771919e
@spackbot run pipeline
I've started that pipeline for you!
@spackbot run pipeline
I've started that pipeline for you!
Are we going to merge this before 0.17? (I would be in favor :D)
```
variant('cuda_arch_35_k20x', default=False,
        description=('CP2K (resp. DBCSR) has specific parameter sets for'
                     ' different GPU models. Enable this when building'
                     ' with cuda_arch=35 for a K20x instead of a K40'))
```
Not relevant to your changes, but damn this is ugly 😂
Well, luckily Nvidia fixed that afterwards, and many codes can build for multiple models in parallel, hence don't have that problem in the first place.
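To illustrate building for several GPU models in one go: with a multi-valued `cuda_arch`, each selected value can become its own nvcc `-gencode` option, so a single build embeds device code for every listed architecture. This is a minimal sketch; `gencode_flags` is a hypothetical helper, not Spack's API, and it emits only SASS (`code=sm_XX`) without an extra PTX target for forward compatibility.

```python
# Hypothetical helper, not Spack's API: turn a multi-valued cuda_arch
# selection into nvcc flags so one build targets several GPU models.
from typing import Iterable, List


def gencode_flags(cuda_archs: Iterable[str]) -> List[str]:
    flags = []
    for arch in cuda_archs:
        if arch == "none":
            continue  # "none" means no device-specific code was requested
        if not arch.isdigit():
            raise ValueError(f"unexpected cuda_arch value: {arch!r}")
        # One -gencode per architecture: compile for compute_XX, emit sm_XX SASS.
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
    return flags


print(gencode_flags(["35", "70"]))
```

With this scheme a K20x-specific switch like the one above would be unnecessary for codes that pick their parameters per architecture at build time.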
Since spack#27185, the cuda_arch variant values are conditional on +cuda. This means that for -cuda specs, the installation fails with:
```
==> acts: Executing phase: 'cmake'
==> Error: KeyError: 'cuda_arch'

/home/wdconinc/git/spack/var/spack/repos/builtin/packages/acts/package.py:222, in cmake_args:
        219        log_failure_threshold = spec.variants['log_failure_threshold'].value
        220        args.append("-DACTS_LOG_FAILURE_THRESHOLD={0}".format(log_failure_threshold))
        221
  >>    222        cuda_arch = spec.variants['cuda_arch'].value
        223        if cuda_arch != 'none':
        224            args.append('-DCUDA_FLAGS=-arch=sm_{0}'.format(cuda_arch[0]))
        225
```
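The fix for the acts traceback is to read `cuda_arch` only when CUDA is enabled. In an actual Spack package the guard would typically be written `if '+cuda' in spec:`; the sketch below reproduces the pattern with a hypothetical stand-in spec (`make_spec` and `cuda_cmake_args` are illustrative names, not Spack's API) so it runs on its own. It assumes `cuda_arch` is a tuple of strings, as for a multi-valued variant, so the `'none'` check looks at the first element.

```python
# Hypothetical stand-in for a concrete spec: a conditional variant such as
# cuda_arch is simply absent from `variants` when +cuda is false, so
# indexing it unconditionally raises KeyError as in the acts traceback.
from types import SimpleNamespace
from typing import Dict, List


def make_spec(variants: Dict[str, object]) -> SimpleNamespace:
    """Build a toy spec whose variants map names to objects with `.value`."""
    return SimpleNamespace(
        variants={k: SimpleNamespace(value=v) for k, v in variants.items()}
    )


def cuda_cmake_args(spec) -> List[str]:
    """Guarded version of the failing lines from acts' cmake_args."""
    args = []
    # Only touch cuda_arch when CUDA is enabled; with -cuda it does not exist.
    # (In a real package: `if '+cuda' in spec:`.)
    if spec.variants["cuda"].value:
        cuda_arch = spec.variants["cuda_arch"].value
        if cuda_arch and cuda_arch[0] != "none":
            args.append("-DCUDA_FLAGS=-arch=sm_{0}".format(cuda_arch[0]))
    return args


gpu = make_spec({"cuda": True, "cuda_arch": ("70",)})
cpu = make_spec({"cuda": False})  # no cuda_arch, as with -cuda
print(cuda_cmake_args(gpu))  # ['-DCUDA_FLAGS=-arch=sm_70']
print(cuda_cmake_args(cpu))  # []
```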
It seems like
I'm in favor of it. Technically you can have CUDA enabled but not have any device-specific code, but it's really only useful for testing. (You can, for example, build against CUDA on a system without a working CUDA card, and you can still call CUDA runtime APIs such as
I guess the question is what the default should be.
It would be really cool if we could interrogate the system to find the default, but that could be bad when the build and deploy systems differ. It would probably be better to have a way to force the user to select the CUDA architecture.
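The suggestion to force users to pick the architecture explicitly, rather than probing the build host, could look like the check below. This is a hypothetical sketch, not an existing Spack feature: `require_cuda_arch` and `MissingCudaArch` are illustrative names, and variants are modeled as a plain dict with `cuda_arch` as a string or tuple of strings.

```python
# Hypothetical check, not an existing Spack feature: refuse +cuda specs that
# leave cuda_arch at "none" instead of guessing from the build host.
from typing import Dict, Tuple


class MissingCudaArch(ValueError):
    """Raised when CUDA is enabled without an explicit target architecture."""


def require_cuda_arch(variants: Dict[str, object]) -> Tuple[str, ...]:
    if not variants.get("cuda", False):
        return ()  # -cuda: cuda_arch does not apply at all
    raw = variants.get("cuda_arch", ("none",))
    # Accept a single string as well as a tuple of architecture strings.
    archs = (raw,) if isinstance(raw, str) else tuple(raw)
    if not archs or "none" in archs:
        raise MissingCudaArch("+cuda requires an explicit cuda_arch, e.g. cuda_arch=70")
    return archs


print(require_cuda_arch({"cuda": True, "cuda_arch": ("70", "80")}))  # ('70', '80')
```

The trade-off: a check like this rejects the CUDA-without-device-code configuration described above as useful for testing, unless it also offers an explicit opt-out.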
fixes #14337
fixes #27213
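Modifications:
- Make the variants specifying the CUDA and ROCm architecture conditional on `+cuda` and `+rocm`, respectively.
- Update `cp2k` to make all CUDA-related variants conditional on `+cuda`.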