Updates to CHAI, RAJA, UMPIRE, and CARE packages. Fixes to hip package for ROCm v3.9+ support#20715
Conversation
|
Could you make a similar change in the camp package? See var/spack/repos/builtin/packages/camp/package.py Notice that umpire depends on camp, so we should have the same flags there for consistency |
|
Indeed. I've made similar changes to the camp package. |
dtaller
left a comment
There was a problem hiding this comment.
In hip/package.py, we use both if '@3.9.0:' in self.spec and '@:3.8.0' in self.spec: for the same check (if the version is less than 3.8) instead of asking if we are greater than 3.9 . Perhaps we should make the two consistent, so it is apparent that they are doing something similar?
| 'device_lib_path': fallback_prefix.lib | ||
| } | ||
| # ROCm 3.9 changes the default location of the device_lib_path | ||
| if '@3.9.0:' in self.spec: |
There was a problem hiding this comment.
If you go to line 96 below, they have the if block the other way ( '@:3.8.0' in self.spec: for if the version is less than 3.8) instead of asking if we are greater than 3.9 . Perhaps we should make the two consistent, so it is apparant that they are doing something similar?
There was a problem hiding this comment.
I've corrected the test on line 96. The original test would exclude versions such as 3.8.X, where X is non-zero. See the commit.
There was a problem hiding this comment.
For reference, @:3.8.9999 is the inverse of @3.9.0:, which I think makes clear how the error occurred.
| arch_str = ",".join(archs) | ||
| options.append( | ||
| '-DHIP_HIPCC_FLAGS=--amdgpu-target={0}'.format(arch_str) | ||
| '-DHIP_HIPCC_FLAGS=--amdgpu-target={0} --rocm-device-lib-path={1}' |
There was a problem hiding this comment.
Like I said in the main conversation, please make similar change to camp library since umpire depends on camp
There was a problem hiding this comment.
Indeed, I've made changes to camp as well. Thanks!
dtaller
left a comment
There was a problem hiding this comment.
Thanks Kostas! Looks good to me, I am happy with this. Approved.
| 'hsa-rocr-dev': fallback_prefix.hsa, | ||
| 'rocminfo': fallback_prefix.bin, | ||
| 'rocm-device-libs': fallback_prefix.amdgcn.bitcode, | ||
| 'device_lib_path': fallback_prefix.amdgcn.bitcode |
There was a problem hiding this comment.
Why do the packages depending on this use $DEVICE_LIB_PATH instead of the value returned here?
There was a problem hiding this comment.
@cmakrides-hpe @dtaller I'm a little concerned that we're proliferating the same information via two mechanisms.
@becker I think there was some issue with the hand-off between hip and its dependent package.
I think previously you suggested doing the handoff via the setup_dependent_package function in hip.py to set something spec.device_path or something similar, but I am ok either way. I am not sure if Kostas tried that and ran into some issue, or if he just preferred this method. We will wait for him to respond
There was a problem hiding this comment.
Within the hip package, I believe the dictionary is exclusively used. I didn't have much success with forwarding this to a dependent package and the workaround was to use the environment. At the time, I realized that the hip package exposed the dictionary via the environment and assumed the intention was for dependent packages to get the rocm paths from the environment.
There was a problem hiding this comment.
Hi @cmakrides-hpe . Could you please show us what errors arise exactly when you do something like what Greg suggested (like adding spec.extra_attributes[‘device_lib_path’] = path etc and accessing it from umpire, care, etc)?
If Greg sees exactly what errors arise in his approach he might be better equipped to offer a fix, a different suggestion, or approve the current approach you are using so we can move forward
There was a problem hiding this comment.
I'll have to re-create those attempts since I don't think I pushed a commit locally with those attempts.
It seems that this approach is the preferred method of passing these arguments?
There was a problem hiding this comment.
I have contributed to spack a bit, but I would defer to Greg and other developers who work on this full time, see what their design philosophy is exactly. But that is my impression from previous comments
There was a problem hiding this comment.
Through some trial and error I was able to use the setup_dependent_package interface and pass a path defined from the hip package. @becker33 do you have advice on if I should keep the environment variables that I previously used to pass the paths around?
There was a problem hiding this comment.
I would recommend using the setup_dependent_package interface exclusively.
In general, I recommend using environment variables only when those are the expected interface to the package, and use the spack spec/package as the interface to convey information to Spack itself.
|
@cmakrides-hpe @dtaller I'm a little concerned that we're proliferating the same information via two mechanisms. |
|
Hi @cmakrides-hpe . Could you merge with develop and also modify lib/spack/docs/build_systems/rocmpackage.rst to reflect your new flag? See #20743 . |
…ependent_package interface. Added supporting documentation to rocmpackage.rst
|
I've restarted the macos test, it failed on what I believe is a transient failure on the test infrastructure. If it fails again we'll look into it, but nothing in this PR suggests it should touch macos support in particular. |
|
We should just try to improve this AMD environment variable madness upstream. I've created issues here: |
There was a problem hiding this comment.
I would object to this change. --rocm-device-lib-path is an AMD-clang flag, and hipcc does not expect this flag; it's just a convenience wrapper around clang that tries to set this flag itself.
If you don't have hipcc, device libs and clang in a default path (so, the spack situation), and you want to use hipcc, you have to configure it through environment variables. If you use clang directly, then use --rocm-device-lib-path.
Hello @haampie . Just to keep you in the loop, I emailed Greg and Kostas yesterday. It was decided that we should get a contact from AMD to take a look and offer his/her perspective. Kostas and Greg will get in touch with the correct person at AMD and likely respond soon after. |
|
@becker33 did you accidentally hit the approve button? We should not merge this, #21852 already did everything this PR tries to accomplish.
@cmakrides-hpe can you test your packages without this PR on current |
Hi @haampie . What happened was that a contact at AMD emailed us saying that the solution in this PR is acceptable, which is why Greg Becker approved. However, if it has already been fixed, then of course this is not needed. I will let @cmakrides-hpe test things out and confirm that the develop branch is working for him on his platform. |
|
Hi @haampie . Just to keep you in the loop, I got an email from @cmakrides-hpe . He said that it seems like the changes already merged into develop work on his system. He will likely proceed with spack develop instead of this branch. Thanks everyone for your patience. |
|
Great, happy to hear that things work. Let me close this PR, but if there are further issues to be addressed, feel free to reopen! |
Greetings,
I have a few changes that derive from the change in ROCm 3.9 for the default path of the device libraries from ${ROCM_ROOT}/lib to ${ROCM_ROOT}/amdgcn/bitcode. This issue then prompted changes in the RAJA, UMPIRE, CHAI, and CARE packages (all similar changes to augment the HIP_HIPCC_FLAGS option)