Skip to content

rocm and hip: 4.3.1 → 4.5.2#150767

Merged
lovesegfault merged 10 commits intoNixOS:masterfrom
Flakebi:rocm
Dec 22, 2021
Merged

rocm and hip: 4.3.1 → 4.5.2#150767
lovesegfault merged 10 commits intoNixOS:masterfrom
Flakebi:rocm

Conversation

@Flakebi
Copy link
Member

@Flakebi Flakebi commented Dec 14, 2021

Motivation for this change

Update all rocm packages to 4.5.2.
I tested rocm-smi, some OpenCL examples and a few hip examples from https://github.com/ROCm-Developer-Tools/HIP-Examples and they ran through fine. More testing of hip would be nice, I’m not that familiar with all its features and the binaries that are shipped.

There were a few changes in the rocm build landscape:

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot requested review from acowley and lovesegfault December 14, 2021 21:52
@ofborg ofborg bot added 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Dec 14, 2021
@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

I’ve not yet had a chance to build or run this, but many of the changes look very positive! Thank you! Something missing here as compared to what I did in the overlay is to move rocm-opencl-runtime from Python 2 to 3. I don’t think that change caused any trouble, and it will help address the nixpkgs migration from 2 to 3.

Fix the build with rocclr distributed by source.
rocclr cannot be built alone and needs to be distributed by source now.
Fix compiler-rt build and use ninja for faster builds.
@Flakebi
Copy link
Member Author

Flakebi commented Dec 15, 2021

Thanks for the comment!
I updated the patches to include more of the overlay changes (diff).

@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

One issue I'm having now is that when I try to build something with the hipcc from this PR, I get,

/nix/store/lbxfixyw1yk099pjyaiy3xj5dl7kxm1g-binutils-2.35.2/bin/ld: cannot find -lamdhip64
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)

So it's missing something like a -L $out/lib. Can you see where that might be missing?

@acowley
Copy link
Contributor

acowley commented Dec 15, 2021

I think it might be the way we patch HIP_LIB_PATH in hipcc. You do this nice thing where you build a hip derivation that the final hipamd derivation points to, while I took a single source view approach by copying the hip source into the hipamd build tree. One difference between those approaches is the meaning of $out in different places.

hip was split into multiple repositories.
This builds the version for AMD GPUs.
@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

Thanks for testing and digging into this, the HIP_LIB_PATH is indeed wrong!

I previously tested hip with having hip in nativeBuildInputs in a shell.nix and that also added the hip/lib path to NIX_LDFLAGS, so I didn’t encounter this error in my testing.
I now tried again without hip in the build inputs and it works after removing the HIP_LIB_PATH replacement (which makes HIP_LIB_PATH default to $HIP_ROCCLR_HOME/lib, which is the correct path).

@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

@Flakebi Do you by any chance have cachix setup with CI builds for your fork? Building llvm and clang to test is a real drag on cycle time.

@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

I usually use my beefy work machine to compile anything that contains llvm, that alleviates the pain 🙂 I guess I could push the builds somewhere, but it looks like cachix costs something?

@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

cachix has a free tier for open source projects, and it can be hooked into GitHub CI quite nicely. This is sloppy, but you can see how it's used in the nixos-rocm overlay so that users don't have to build anything.

I've not set it up myself for a nixpkgs fork, so that could be an issue. But there is also the option to manually push store paths which would work for this case.

No matter: my small tests of hip and OpenCL ran properly, so I'm happy with this. If you're satisfied with the PR now, @Flakebi , and neither @matthewbauer nor @lovesegfault chime in, I'll merge the PR later tonight.

@Flakebi
Copy link
Member Author

Flakebi commented Dec 22, 2021

That’s fine with me, thanks for testing again.
Next time, I’ll try pushing the builds to cachix.

@lovesegfault
Copy link
Member

Looks AOK for me :)

@lovesegfault lovesegfault merged commit f403d1f into NixOS:master Dec 22, 2021
@Flakebi Flakebi deleted the rocm branch December 22, 2021 23:46
@acowley
Copy link
Contributor

acowley commented Dec 22, 2021

Thank you for taking this on and seeing it through, @Flakebi!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants