Skip to content

Added ROCmPackage (build system) documentation#20743

Merged
tldahlgren merged 3 commits intospack:developfrom
tldahlgren:docs/add-rocmpackage-description
Jan 21, 2021
Merged

Added ROCmPackage (build system) documentation#20743
tldahlgren merged 3 commits intospack:developfrom
tldahlgren:docs/add-rocmpackage-description

Conversation

@tldahlgren
Copy link
Copy Markdown
Contributor

This PR creates and adds (build system) documentation for the ROCmPackage. It contains the ROCm equivalents of the contents added in #20742 .

@tldahlgren tldahlgren added the documentation Improvements or additions to documentation label Jan 8, 2021
@tldahlgren tldahlgren self-assigned this Jan 8, 2021
@tgamblin tgamblin requested review from becker33 and dtaller January 8, 2021 03:04
Copy link
Copy Markdown
Contributor

@dtaller dtaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work, @tldahlgren ! I know documentation is not the most fun task, so I appreciate that you did this for me (and other rocm users). Overall it looks fine, just a few small additions.

'-DHIP_ROOT_DIR={0}'.format(spec['hip'].prefix])
rocm_archs = spec.variants['amdgpu_target'].value
if 'none' not in rocm_archs:
args.append('-DHIP_HIPCC_FLAGS=--amdgpu-target={0}'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have to change soon. There is a pull request #20715 that will add another flag, --rocm-device-lib-path={1}, here also. It might be worth just adding that in now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to change the documentation when the feature is not yet supported.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's fair, no problem

:emphasize-lines: 1,7-18

class MyRocmPackage(CMakePackage, ROCmPackage):
...
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I would add a note about passing rocm and amdgpu_target to dependencies if you have any. This is not done automatically, and you get get strange build issues if you fail to this this. Something like:

# variants +rocm and amdgpu_targets are not automatically passed to
# dependencies, so do it manually.
depends_on('camp+rocm', when='+rocm')
for val in ROCmPackage.amdgpu_targets:
    depends_on('camp amdgpu_target=%s' % val, when='amdgpu_target=%s' % val)

^^^^^^^^^^^^

This package defines basic ``rocm`` dependencies, including ``llvm`` and ``hip``.

Copy link
Copy Markdown
Contributor

@dtaller dtaller Jan 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add another sentence or two to remind them that if they want to use a pre-installed version of rocm they must add the following to their packages.yaml: hip (prefix: /opt/rocm/hip) , hsa-rocr-dev (prefix: /opt/rocm) , and llvm-amdgpu ( prefix: /opt/rocm/llvm ), and that they are advised to add the internal hip clang compiler to their compilers.yaml file (cc path /opt/rocm/llvm/bin/clang and cxx path /opt/rocm/llvm/bin/clang++ ). These will NOT be automatically found when you do spack external find --not-buildable and spack compiler find .

You can see this and some other troubleshooting advice at the top of the build_systems/rocm.py file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit unsure if registering the amd clang compiler is a great idea. The way I see it it is a device compiler, and we're only registering host compilers, right? I don't have nvcc registered as a compiler either.

We should just try to make sure that packages use amd's clang just to compile device-code, and otherwise use the default compiler for host code. (There's a few packages that currently set CXX to hipcc, which doesn't seem right).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the process of debugging a package that used rocm only to find out they linked both against GNU's OpenMP and (the AMD version of) clang's OpenMP, that was a mess.

Copy link
Copy Markdown
Contributor Author

@tldahlgren tldahlgren Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm hesitant to add this information -- especially hard-coded paths -- to the documentation (at least at this time).

I followed those instructions when I was preparing for an E4S hackathon and they caused problems with the hackathon environment. I don't recall off-hand whether it was the compilers.yaml change or the packages.yaml file that caused my problems. I think it was the latter but I can't state that with 100% certainty.

How about I ensure a reference to the rocm package?

Copy link
Copy Markdown
Contributor

@dtaller dtaller Jan 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To @tldahlgren : That's understandable, referencing the file with suggested changes to compilers.yaml and packages.yaml sounds reasonable.

To further explain my thinking to @haampie : I think that rocm is packaged with both hipcc (device compiler) and a version of clang (host compiler). I previously ran into an issue when I used a different host compiler which referenced a different version of glibc (gnu c library with some common functions). That resulted in errors like ../lib/libumpire.so: undefined reference to `std::__throw_out_of_range_fmt(char const*, ...)@@GLIBCXX_3.4.20' , because the rocm stuff was internally using a different version. Referencing /opt/rocm/llvm/bin/clang++ made my error go away. However, there may be other better ways to resolve this issue

Copy link
Copy Markdown
Contributor

@dtaller dtaller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me

@tldahlgren tldahlgren merged commit 25bab31 into spack:develop Jan 21, 2021
@tldahlgren tldahlgren deleted the docs/add-rocmpackage-description branch January 21, 2021 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants