Added ROCmPackage (build system) documentation#20743
Added ROCmPackage (build system) documentation#20743tldahlgren merged 3 commits intospack:developfrom
Conversation
dtaller
left a comment
There was a problem hiding this comment.
Thanks for your work, @tldahlgren ! I know documentation is not the most fun task, so I appreciate that you did this for me (and other rocm users). Overall it looks fine, just a few small additions.
| '-DHIP_ROOT_DIR={0}'.format(spec['hip'].prefix]) | ||
| rocm_archs = spec.variants['amdgpu_target'].value | ||
| if 'none' not in rocm_archs: | ||
| args.append('-DHIP_HIPCC_FLAGS=--amdgpu-target={0}' |
There was a problem hiding this comment.
This will have to change soon. There is a pull request #20715 that will add another flag, --rocm-device-lib-path={1}, here also. It might be worth just adding that in now.
There was a problem hiding this comment.
I'm hesitant to change the documentation when the feature is not yet supported.
There was a problem hiding this comment.
Ok, that's fair, no problem
| :emphasize-lines: 1,7-18 | ||
|
|
||
| class MyRocmPackage(CMakePackage, ROCmPackage): | ||
| ... |
There was a problem hiding this comment.
Here, I would add a note about passing rocm and amdgpu_target to dependencies if you have any. This is not done automatically, and you get get strange build issues if you fail to this this. Something like:
# variants +rocm and amdgpu_targets are not automatically passed to
# dependencies, so do it manually.
depends_on('camp+rocm', when='+rocm')
for val in ROCmPackage.amdgpu_targets:
depends_on('camp amdgpu_target=%s' % val, when='amdgpu_target=%s' % val)
| ^^^^^^^^^^^^ | ||
|
|
||
| This package defines basic ``rocm`` dependencies, including ``llvm`` and ``hip``. | ||
|
|
There was a problem hiding this comment.
I would add another sentence or two to remind them that if they want to use a pre-installed version of rocm they must add the following to their packages.yaml: hip (prefix: /opt/rocm/hip) , hsa-rocr-dev (prefix: /opt/rocm) , and llvm-amdgpu ( prefix: /opt/rocm/llvm ), and that they are advised to add the internal hip clang compiler to their compilers.yaml file (cc path /opt/rocm/llvm/bin/clang and cxx path /opt/rocm/llvm/bin/clang++ ). These will NOT be automatically found when you do spack external find --not-buildable and spack compiler find .
You can see this and some other troubleshooting advice at the top of the build_systems/rocm.py file
There was a problem hiding this comment.
I'm still a bit unsure if registering the amd clang compiler is a great idea. The way I see it it is a device compiler, and we're only registering host compilers, right? I don't have nvcc registered as a compiler either.
We should just try to make sure that packages use amd's clang just to compile device-code, and otherwise use the default compiler for host code. (There's a few packages that currently set CXX to hipcc, which doesn't seem right).
There was a problem hiding this comment.
I went through the process of debugging a package that used rocm only to find out they linked both against GNU's OpenMP and (the AMD version of) clang's OpenMP, that was a mess.
There was a problem hiding this comment.
I'm hesitant to add this information -- especially hard-coded paths -- to the documentation (at least at this time).
I followed those instructions when I was preparing for an E4S hackathon and they caused problems with the hackathon environment. I don't recall off-hand whether it was the compilers.yaml change or the packages.yaml file that caused my problems. I think it was the latter but I can't state that with 100% certainty.
How about I ensure a reference to the rocm package?
There was a problem hiding this comment.
To @tldahlgren : That's understandable, referencing the file with suggested changes to compilers.yaml and packages.yaml sounds reasonable.
To further explain my thinking to @haampie : I think that rocm is packaged with both hipcc (device compiler) and a version of clang (host compiler). I previously ran into an issue when I used a different host compiler which referenced a different version of glibc (gnu c library with some common functions). That resulted in errors like ../lib/libumpire.so: undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)@@GLIBCXX_3.4.20' , because the rocm stuff was internally using a different version. Referencing /opt/rocm/llvm/bin/clang++ made my error go away. However, there may be other better ways to resolve this issue
…uggested review base class for settings
dtaller
left a comment
There was a problem hiding this comment.
Thanks, looks good to me
This PR creates and adds (build system) documentation for the
ROCmPackage. It contains the ROCm equivalents of the contents added in #20742 .