Skip to content

Updates to CHAI, RAJA, UMPIRE, and CARE packages. Fixes to hip package for ROCm v3.9+ support#20715

Closed
cmakrides-hpe wants to merge 9 commits intospack:developfrom
cmakrides-hpe:features/amd-support-perfPortability
Closed

Updates to CHAI, RAJA, UMPIRE, and CARE packages. Fixes to hip package for ROCm v3.9+ support#20715
cmakrides-hpe wants to merge 9 commits intospack:developfrom
cmakrides-hpe:features/amd-support-perfPortability

Conversation

@cmakrides-hpe
Copy link
Copy Markdown

Greetings,
I have a few changes that derive from the change in ROCm 3.9 for the default path of the device libraries from ${ROCM_ROOT}/lib to ${ROCM_ROOT}/amdgcn/bitcode. This issue then prompted changes in the RAJA, UMPIRE, CHAI, and CARE packages (all similar changes to augment the HIP_HIPCC_FLAGS option)

@dtaller dtaller requested review from becker33 and dtaller January 7, 2021 00:45
@dtaller
Copy link
Copy Markdown
Contributor

dtaller commented Jan 7, 2021

Could you make a similar change in the camp package?

See var/spack/repos/builtin/packages/camp/package.py

Notice that umpire depends on camp, so we should have the same flags there for consistency

@cmakrides-hpe
Copy link
Copy Markdown
Author

Indeed. I've made similar changes to the camp package.

Copy link
Copy Markdown
Contributor

@dtaller dtaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In hip/package.py, we use both if '@3.9.0:' in self.spec and '@:3.8.0' in self.spec: for the same check (if the version is less than 3.8) instead of asking if we are greater than 3.9 . Perhaps we should make the two consistent, so it is apparent that they are doing something similar?

'device_lib_path': fallback_prefix.lib
}
# ROCm 3.9 changes the default location of the device_lib_path
if '@3.9.0:' in self.spec:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you go to line 96 below, they have the if block the other way ( '@:3.8.0' in self.spec: for if the version is less than 3.8) instead of asking if we are greater than 3.9 . Perhaps we should make the two consistent, so it is apparant that they are doing something similar?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've corrected the test on line 96. The original test would exclude versions such as 3.8.X, where X is non-zero. See the commit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, @:3.8.9999 is the inverse of @3.9.0:, which I think makes clear how the error occurred.

arch_str = ",".join(archs)
options.append(
'-DHIP_HIPCC_FLAGS=--amdgpu-target={0}'.format(arch_str)
'-DHIP_HIPCC_FLAGS=--amdgpu-target={0} --rocm-device-lib-path={1}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like I said in the main conversation, please make similar change to camp library since umpire depends on camp

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, I've made changes to camp as well. Thanks!

Copy link
Copy Markdown
Contributor

@dtaller dtaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kostas! Looks good to me, I am happy with this. Approved.

'hsa-rocr-dev': fallback_prefix.hsa,
'rocminfo': fallback_prefix.bin,
'rocm-device-libs': fallback_prefix.amdgcn.bitcode,
'device_lib_path': fallback_prefix.amdgcn.bitcode
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the packages depending on this use $DEVICE_LIB_PATH instead of the value returned here?

Copy link
Copy Markdown
Contributor

@dtaller dtaller Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmakrides-hpe @dtaller I'm a little concerned that we're proliferating the same information via two mechanisms.

@becker I think there was some issue with the hand-off between hip and its dependent package.

I think previously you suggested doing the handoff via the setup_dependent_package function in hip.py to set something spec.device_path or something similar, but I am ok either way. I am not sure if Kostas tried that and ran into some issue, or if he just preferred this method. We will wait for him to respond

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Within the hip package, I believe the dictionary is exclusively used. I didn't have much success with forwarding this to a dependent package and the workaround was to use the environment. At the time, I realized that the hip package exposed the dictionary via the environment and assumed the intention was for dependent packages to get the rocm paths from the environment.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cmakrides-hpe . Could you please show us what errors arise exactly when you do something like what Greg suggested (like adding spec.extra_attributes[‘device_lib_path’] = path etc and accessing it from umpire, care, etc)?

If Greg sees exactly what errors arise in his approach he might be better equipped to offer a fix, a different suggestion, or approve the current approach you are using so we can move forward

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to re-create those attempts since I don't think I pushed a commit locally with those attempts.
It seems that this approach is the preferred method of passing these arguments?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have contributed to spack a bit, but I would defer to Greg and other developers who work on this full time, see what their design philosophy is exactly. But that is my impression from previous comments

Copy link
Copy Markdown
Author

@cmakrides-hpe cmakrides-hpe Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Through some trial and error I was able to use the setup_dependent_package interface and pass a path defined from the hip package. @becker33 do you have advice on if I should keep the environment variables that I previously used to pass the paths around?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend using the setup_dependent_package interface exclusively.

In general, I recommend using environment variables only when those are the expected interface to the package, and use the spack spec/package as the interface to convey information to Spack itself.

@becker33
Copy link
Copy Markdown
Member

@cmakrides-hpe @dtaller I'm a little concerned that we're proliferating the same information via two mechanisms.

@dtaller
Copy link
Copy Markdown
Contributor

dtaller commented Jan 21, 2021

Hi @cmakrides-hpe . Could you merge with develop and also modify lib/spack/docs/build_systems/rocmpackage.rst to reflect your new flag? See #20743 .

@becker33
Copy link
Copy Markdown
Member

I've restarted the macos test, it failed on what I believe is a transient failure on the test infrastructure. If it fails again we'll look into it, but nothing in this PR suggests it should touch macos support in particular.

@haampie
Copy link
Copy Markdown
Member

haampie commented Jan 22, 2021

We should just try to improve this AMD environment variable madness upstream. I've created issues here:

Copy link
Copy Markdown
Member

@haampie haampie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would object to this change. --rocm-device-lib-path is an AMD-clang flag, and hipcc does not expect this flag; it's just a convenience wrapper around clang that tries to set this flag itself.

See https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-3.9.0/bin/hipcc#L830-L832

If you don't have hipcc, device libs and clang in a default path (so, the spack situation), and you want to use hipcc, you have to configure it through environment variables. If you use clang directly, then use --rocm-device-lib-path.

@dtaller
Copy link
Copy Markdown
Contributor

dtaller commented Jan 28, 2021

I would object to this change. --rocm-device-lib-path is an AMD-clang flag, and hipcc does not expect this flag; it's just a convenience wrapper around clang that tries to set this flag itself.

See https://github.com/ROCm-Developer-Tools/HIP/blob/rocm-3.9.0/bin/hipcc#L830-L832

If you don't have hipcc, device libs and clang in a default path (so, the spack situation), and you want to use hipcc, you have to configure it through environment variables. If you use clang directly, then use --rocm-device-lib-path.

Hello @haampie . Just to keep you in the loop, I emailed Greg and Kostas yesterday. It was decided that we should get a contact from AMD to take a look and offer his/her perspective. Kostas and Greg will get in touch with the correct person at AMD and likely respond soon after.

@haampie
Copy link
Copy Markdown
Member

haampie commented Feb 25, 2021

@becker33 did you accidentally hit the approve button? We should not merge this, #21852 already did everything this PR tries to accomplish.

@cmakrides-hpe can you test your packages without this PR on current develop of spack?

@dtaller
Copy link
Copy Markdown
Contributor

dtaller commented Feb 26, 2021

@becker33 did you accidentally hit the approve button? We should not merge this, #21852 already did everything this PR tries to accomplish.

* https://github.com/ROCm-Developer-Tools/HIP/blob/main/bin/hipcc#L829 hipcc sets `--hip-device-lib-path` to [DEVICE_LIB_PATH](https://github.com/ROCm-Developer-Tools/HIP/blob/main/bin/hipcc#L88) and is [equivalent to --rocm-device-lib-path](https://github.com/ROCm-Developer-Tools/llvm-project/blob/3bc5ed38750c6a6daff39ad524b75e40c8c09183/clang/include/clang/Driver/Options.td#L896)

* also we set HIP_DEVICE_LIB_PATH which is used [in LLVM directly](https://github.com/ROCm-Developer-Tools/llvm-project/blob/c4a91444689455a35db1e7f50bcd876a3eb86126/clang/lib/Driver/ToolChains/AMDGPU.cpp#L206) in case --rocm-device-lib-path is not passed to clang.

@cmakrides-hpe can you test your packages without this PR on current develop of spack?

Hi @haampie . What happened was that a contact at AMD emailed us saying that the solution in this PR is acceptable, which is why Greg Becker approved. However, if it has already been fixed, then of course this is not needed. I will let @cmakrides-hpe test things out and confirm that the develop branch is working for him on his platform.

@dtaller
Copy link
Copy Markdown
Contributor

dtaller commented Mar 1, 2021

Hi @haampie . Just to keep you in the loop, I got an email from @cmakrides-hpe . He said that it seems like the changes already merged into develop work on his system. He will likely proceed with spack develop instead of this branch.

Thanks everyone for your patience.

@haampie
Copy link
Copy Markdown
Member

haampie commented Mar 1, 2021

Great, happy to hear that things work. Let me close this PR, but if there are further issues to be addressed, feel free to reopen!

@haampie haampie closed this Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants