rocmPackages: 6.0.2 -> 6.3.3, and various ROCm build fixes and new packages #367695
prusnak merged 11 commits into NixOS:staging
76e05f1 to c2abb37
I don't know much about the ROCm stack; I'm mainly here because I use btop with ROCm support enabled. I receive the following error when compiling
@Shawn8901 This should be fixed now; it was broken when I first opened the PR.
One thing I've wanted to do for a long time is to completely remove the ROCm LLVM as a stdenv, which should solve a lot of these weird compilation errors. Solus's ROCm stack does this: we compile all non-HIP code with GCC. This way, the entire ROCm LLVM can be compacted into a single derivation, and the complexity of packaging and updating ROCm LLVM is drastically reduced. That is, you should be able to use just the default stdenv with GCC to compile non-HIP code and tell CMake/HIPCC to use the ROCm LLVM only when compiling HIP code. You can achieve this entirely through environment variables. It doesn't make sense that, just because a portion of a codebase contains HIP code, all the C/C++ in that codebase needs to be compiled with ROCm LLVM's C compiler.
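For illustration, here is a minimal Python sketch of the environment split described above. The thread does not say which variables Solus actually sets; this sketch assumes CMake's CC, CXX and HIPCXX environment variables and hipcc's HIP_CLANG_PATH. The /opt/rocm/llvm prefix and the hip_build_env helper are hypothetical.

```python
import posixpath


def hip_build_env(rocm_llvm_prefix, base_env):
    """Return a copy of base_env in which only HIP code is routed to ROCm LLVM.

    rocm_llvm_prefix is a hypothetical install prefix for the ROCm LLVM.
    """
    env = dict(base_env)
    llvm_bin = posixpath.join(rocm_llvm_prefix, "bin")

    # Host C/C++ stays with the default GCC toolchain unless already set.
    env.setdefault("CC", "gcc")
    env.setdefault("CXX", "g++")

    # CMake consults HIPCXX when choosing the compiler for the HIP language.
    env["HIPCXX"] = posixpath.join(llvm_bin, "clang++")

    # hipcc consults HIP_CLANG_PATH to find the clang it wraps.
    env["HIP_CLANG_PATH"] = llvm_bin
    return env


# Assumed prefix, for illustration only.
env = hip_build_env("/opt/rocm/llvm", base_env={})
```

The resulting dictionary could be passed as the env argument of subprocess.run when invoking cmake, so the split needs no changes to the project's CMake files in this simplified picture.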
OK, I just noticed the "Contemplate trying to make a normal Nix style CC wrapper work again" section, so it seems you've already experienced the pain of the ROCm LLVM 😅 I will give my idea a try in the next few days and get back to you.
It looks like upstream is moving away from a separate hipcc and towards using clang (now …
Maintaining a separate HIP-only compiler might require maintaining significant CMake patches to get it used, but if you can work out a way to do this that isn't maintenance hell, that's great.
Please see https://github.com/GZGavinZhao/rocm-llvm-project/commits/solus-rocm-6.2.x for the patches and https://lists.debian.org/debian-ai/2024/12/msg00042.html for more details. I hope they apply cleanly on v6.3; if not, I think the changes are easy enough to rewrite manually. If you need patches for other components, please see
Solus does this, and we didn't have to use any patches. Most of the work was figuring out the environment variables that tell CMake and/or HIPCC which HIP compiler we intend to use. The only thing I'm worried about is locating sysroots, given the non-standard installation prefix; other than that, Solus's experience shows that this is definitely doable.
22f00e1 to c05b8cb
I'm having trouble getting this to build. I get "Failed Tests (2):" during the triton-llvm-19.1.0-rc1 test phase. My setup is an RX 6800 XT on x86_64-linux NixOS, with no overlays or custom config; I'm just trying to create a devshell with python312Packages.torch. I can post a (nearly) minimal reproducible flake:
I tried to use this PR via an overlay, but got a collision when building ollama. Overall, though, it seems to build without errors: when not using ollama, I was able to rebuild my system without errors.
I got a separate HIP compiler working and successfully compiled
ROCm's standard toolchain is clang plus GNU libs, including libstdc++. There are a few packages which don't compile with …
The clashing error occurs because I added a /llvm link to clr for use by other ROCm packages, and ollama already adds one to its ROCm env internally. It can be resolved by dropping the /llvm link from ollama.
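As a rough illustration of why the duplicate /llvm link clashes, the following Python sketch models merging several package trees into one environment and reports entries provided more than once. It is a simplified model, not how nixpkgs builds environments; the find_collisions helper, the package contents, and the "ollama-llvm-link" name are all hypothetical.

```python
def find_collisions(packages):
    """Map each entry provided by more than one package to its providers."""
    owners = {}
    for name, entries in packages.items():
        for entry in entries:
            owners.setdefault(entry, []).append(name)
    return {entry: names for entry, names in owners.items() if len(names) > 1}


# Hypothetical contents of a merged ROCm environment.
rocm_env = {
    "clr": ["bin", "llvm"],            # clr now provides an llvm link
    "ollama-llvm-link": ["llvm"],      # stand-in for the link ollama adds itself
    "rocblas": ["lib/librocblas.so"],
}
collisions = find_collisions(rocm_env)

# Dropping ollama's own link, as suggested above, removes the clash.
fixed_env = {k: v for k, v in rocm_env.items() if k != "ollama-llvm-link"}
```

In this model the only colliding entry is llvm, and removing one of its two providers leaves the merged environment free of duplicates.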
Added exact GPU targets as pkgs.rocmPackages_6.gfx908, gfx1030, etc.
Thank you everyone for your awesome work!
You can track it here: https://nixpk.gs/pr-tracker.html?pr=367695
This will really take effect once NixOS/nixpkgs#367695 lands in unstable
Now in
Ergh, zluda is broken again:
Are you on
On master.
According to nixpk.gs, it should be available on
Fixes #337159
Fixes #383836
Fixes #379354
Bumps the rocmPackages_6 package set to 6.3.3, with associated updates to packages that depend on changed or newly introduced ROCm packages.
Upstream PRs/issues Raised
TODO List
- and msgpack working for hipblaslt. 10GB derivation is not ok.
- Make use of working LLVM packages .override to simplify LLVM. overrideScope isn't present and is needed.
- Allow better build parallelism by creating -minimal versions of some of the huge packages built for no gfx arches. Too difficult, not doing in this PR.
Things done
- … nix.conf? (See Nix manual)
  - sandbox = relaxed
  - sandbox = true
- … nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
- … ./result/bin/)
Add a 👍 reaction to pull requests you find important.