Skip to content

Make CUDA and ROCm architecture conditional #27185

Merged
sethrj merged 2 commits intospack:developfrom
alalazo:mixims/cuda_rocm_conditional_variants
Nov 22, 2021
Merged

Make CUDA and ROCm architecture conditional #27185
sethrj merged 2 commits intospack:developfrom
alalazo:mixims/cuda_rocm_conditional_variants

Conversation

@alalazo
Copy link
Copy Markdown
Member

@alalazo alalazo commented Nov 3, 2021

fixes #14337
fixes #27213

Modifications:

  • The variant to specify which architecture to use for CUDA and ROCm are now conditional on +cuda and
    +rocm respectively.
  • Modify cp2k to make all CUDA related variants conditional on +cuda

dev-zero
dev-zero previously approved these changes Nov 3, 2021
Copy link
Copy Markdown
Contributor

@dev-zero dev-zero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

becker33
becker33 previously approved these changes Nov 3, 2021
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 3, 2021

@scottwittenburg Do you have any idea why some specs are failing in pipelines? For instance, I can't get a sense of this error

@scottwittenburg
Copy link
Copy Markdown
Contributor

@scottwittenburg Do you have any idea why some specs are failing in pipelines? For instance, I can't get a sense of this error

It seems spack install is still exiting 0 in a lot of cases when configure stage fails, and pipelines use that exit code to decide whether to create a buildcache or not. So the error at the bottom of the trace is really a red herring in those cases, and you have to scroll way up to find the real problem. In the case you linked above, this error message appears:

==> [2021-11-03-09:18:21.167201] No patches needed for hpx
==> [2021-11-03-09:18:21.202240] hpx: Executing phase: 'cmake'
==> [2021-11-03-09:18:21.306025] Error: AttributeError: 'tuple' object has no attribute 'values'
/builds/spack/spack/var/spack/repos/builtin/packages/hpx/package.py:166, in instrumentation_args:
        165    def instrumentation_args(self):
  >>    166        for value in self.variants['instrumentation'].values:
        167            if value == 'none':
        168                continue
        169

@alalazo alalazo dismissed stale reviews from becker33 and dev-zero via e1d94b6 November 3, 2021 21:40
@alalazo alalazo force-pushed the mixims/cuda_rocm_conditional_variants branch from 16c8554 to 22f98e0 Compare November 3, 2021 22:12
@dev-zero
Copy link
Copy Markdown
Contributor

dev-zero commented Nov 4, 2021

@alalazo does this mean I can finally set the default of variants depending on +cuda to True, but they get ignored unless +cuda is enabled?

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2021

@dev-zero I think so, but to be clear:

  • We can't set the default value of a package based on some other variant
  • We can make variant conditional

For instance, when this PR is merged a spec satisfying cp2k~cuda will have no cuda_fft variant at all. In that sense you can change the default of cuda_fft to True since the variant will be there only when +cuda is in the spec.

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2021

@sethrj FYI, not sure this can make it to v0.17.0

fixes spack#14337

The variant to specify which architecture to use
for CUDA and ROCm are now conditional on +cuda and
+rocm respectively.
@alalazo alalazo force-pushed the mixims/cuda_rocm_conditional_variants branch from 22f98e0 to 771919e Compare November 4, 2021 19:09
@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2021

@spackbot run pipeline

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Nov 4, 2021

I've started that pipeline for you!

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Nov 4, 2021

@spackbot run pipeline

@spackbot-app
Copy link
Copy Markdown

spackbot-app bot commented Nov 4, 2021

I've started that pipeline for you!

@haampie
Copy link
Copy Markdown
Member

haampie commented Nov 5, 2021

Are we going to merge this before 0.17? (I would be in favor :D)

Copy link
Copy Markdown
Contributor

@sethrj sethrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

variant('cuda_arch_35_k20x', default=False,
description=('CP2K (resp. DBCSR) has specific parameter sets for'
' different GPU models. Enable this when building'
' with cuda_arch=35 for a K20x instead of a K40'))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not relevant to your changes, but damn this is ugly 😂

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, luckily Nvidia fixed that afterwards, and many codes can build for multiple models in parallel, hence don't have that problem in the first place.

@sethrj sethrj merged commit 5eba5dc into spack:develop Nov 22, 2021
@alalazo alalazo deleted the mixims/cuda_rocm_conditional_variants branch November 22, 2021 12:55
wdconinc added a commit to wdconinc/spack that referenced this pull request Dec 5, 2021
Since spack#27185, the cuda_arch variant values are conditional on +cuda. This means that for -cuda specs, the installation fails with:
```
==> acts: Executing phase: 'cmake'
==> Error: KeyError: 'cuda_arch'

/home/wdconinc/git/spack/var/spack/repos/builtin/packages/acts/package.py:222, in cmake_args:
        219        log_failure_threshold = spec.variants['log_failure_threshold'].value
        220        args.append("-DACTS_LOG_FAILURE_THRESHOLD={0}".format(log_failure_threshold))
        221
  >>    222        cuda_arch = spec.variants['cuda_arch'].value
        223        if cuda_arch != 'none':
        224            args.append('-DCUDA_FLAGS=-arch=sm_{0}'.format(cuda_arch[0]))
        225
```
alalazo pushed a commit that referenced this pull request Dec 6, 2021
Since #27185, the cuda_arch variant values are conditional on +cuda. This means that for -cuda specs, the installation fails with:
```
==> acts: Executing phase: 'cmake'
==> Error: KeyError: 'cuda_arch'

/home/wdconinc/git/spack/var/spack/repos/builtin/packages/acts/package.py:222, in cmake_args:
        219        log_failure_threshold = spec.variants['log_failure_threshold'].value
        220        args.append("-DACTS_LOG_FAILURE_THRESHOLD={0}".format(log_failure_threshold))
        221
  >>    222        cuda_arch = spec.variants['cuda_arch'].value
        223        if cuda_arch != 'none':
        224            args.append('-DCUDA_FLAGS=-arch=sm_{0}'.format(cuda_arch[0]))
        225
```
@adamjstewart
Copy link
Copy Markdown
Member

It seems like cuda_arch still has a none option, do we want to disallow this? That would significantly reduce error checking in a lot of packages. I feel like we only added cuda_arch=none as an option so that we could support ~cuda before this variant was conditional.

@sethrj
Copy link
Copy Markdown
Contributor

sethrj commented Feb 2, 2022

I'm in favor of it. Technically you can have CUDA enabled but not have any device-specific code, but it's really only useful for testing. (You can for example build against CUDA on a system without a working CUDA card, and you can still call CUDA runtime APIs such as cudaMalloc even if you don't have any device-specific code.)

@alalazo
Copy link
Copy Markdown
Member Author

alalazo commented Feb 3, 2022

I guess the question would be what should be the default.

@sethrj
Copy link
Copy Markdown
Contributor

sethrj commented Feb 3, 2022

It would be really cool if we could interrogate the system to find the default, but that could be bad when build and deploy systems are different. Probably better for there to be a way to force the user to select the cuda architecture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hpx package.py: self.variants['instrumentation'].values: 'tuple' object has no attribute 'values' make variants in CudaPackage optional

7 participants