rocmPackages.llvm.libcxx: fix build by disabling test #375850

Merged
collares merged 1 commit into NixOS:master from mschwaig:fix-rocm-build-by-disabling-test on Jan 24, 2025

@mschwaig
Member

Closes #375745

I am not sure yet that this is the fix we should apply, but it will probably just work.

I am running nixpkgs-review on it now.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@github-actions bot added labels on Jan 22, 2025:
  • 6.topic: rocm (ROCm is an Advanced Micro Devices software stack for graphics processing unit programming.)
  • 10.rebuild-darwin: 0 (This PR does not cause any packages to rebuild on Darwin.)
  • 10.rebuild-linux: 101-500 (This PR causes between 101 and 500 packages to rebuild on Linux.)

@Flakebi
Member

Flakebi commented Jan 22, 2025

Sounds good to me based on your investigation in #375745.

@mschwaig
Member Author

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 375850


x86_64-linux

⏩ 13 packages marked as broken and skipped:
  • python312Packages.vllm
  • python312Packages.vllm.dist
  • rocmPackages.llvm.flang
  • rocmPackages.llvm.flang.doc
  • rocmPackages.llvm.flang.info
  • rocmPackages.llvm.flang.man
  • rocmPackages.llvm.libclc
  • rocmPackages.migraphx
  • rocmPackages.mivisionx
  • rocmPackages.mivisionx-cpu
  • rocmPackages.mivisionx-hip
  • rocmPackages.rdc
  • rocmPackages.rdc.doc
❌ 12 packages failed to build:
  • ollama-rocm
  • rocmPackages.hipblas
  • rocmPackages.hipsolver
  • rocmPackages.miopen
  • rocmPackages.rocalution
  • rocmPackages.rocblas
  • rocmPackages.rocmlir
  • rocmPackages.rocmlir-rock
  • rocmPackages.rocmlir.external
  • rocmPackages.rocprofiler
  • rocmPackages.rocsolver
  • zluda
✅ 63 packages built:
  • blender-hip
  • btop-rocm
  • rocmPackages.clang-ocl
  • rocmPackages.clr
  • rocmPackages.clr.icd
  • rocmPackages.composable_kernel
  • rocmPackages.half
  • rocmPackages.hip-common
  • rocmPackages.hipcc
  • rocmPackages.hipcub
  • rocmPackages.hipfft
  • rocmPackages.hipfort
  • rocmPackages.hipify
  • rocmPackages.hiprand
  • rocmPackages.hipsparse
  • rocmPackages.hsa-amd-aqlprofile-bin
  • rocmPackages.llvm.clang
  • rocmPackages.llvm.clang-tools-extra
  • rocmPackages.llvm.clang-tools-extra.doc
  • rocmPackages.llvm.clang-tools-extra.info
  • rocmPackages.llvm.clang-tools-extra.man
  • rocmPackages.llvm.libcxx
  • rocmPackages.llvm.libcxx.doc
  • rocmPackages.llvm.lldb
  • rocmPackages.llvm.lldb.doc
  • rocmPackages.llvm.lldb.info
  • rocmPackages.llvm.lldb.man
  • rocmPackages.llvm.mlir
  • rocmPackages.llvm.openmp
  • rocmPackages.llvm.openmp.doc
  • rocmPackages.llvm.openmp.info
  • rocmPackages.llvm.openmp.man
  • rocmPackages.llvm.polly
  • rocmPackages.llvm.polly.doc
  • rocmPackages.llvm.polly.info
  • rocmPackages.llvm.polly.man
  • rocmPackages.llvm.pstl
  • rocmPackages.llvm.rocmClangStdenv
  • rocmPackages.rccl
  • rocmPackages.rocdbgapi
  • rocmPackages.rocdbgapi.doc
  • rocmPackages.rocfft
  • rocmPackages.rocgdb
  • rocmPackages.rocm-cmake
  • rocmPackages.rocm-comgr
  • rocmPackages.rocm-core
  • rocmPackages.rocm-device-libs
  • rocmPackages.rocm-runtime
  • rocmPackages.rocm-smi
  • rocmPackages.rocm-thunk
  • rocmPackages.rocminfo
  • rocmPackages.rocprim
  • rocmPackages.rocr-debug-agent
  • rocmPackages.rocrand
  • rocmPackages.rocsparse
  • rocmPackages.rocthrust
  • rocmPackages.roctracer
  • rocmPackages.rocwmma
  • rocmPackages.rpp (rocmPackages.rpp-hip)
  • rocmPackages.rpp-cpu
  • rocmPackages.rpp-opencl
  • rocmPackages.tensile
  • rocmPackages.tensile.dist

The key thing that still seems to be failing to build now is rocblas.

ollama-rocm, hipblas, hipsolver, rocalution, and rocsolver were fine before this issue came up, but are downstream from rocblas.

The others were already broken and should maybe be marked as such, or maybe not, since 6.3 seems to be coming up anyway.

I have attached the build log for rocblas here: rocmPackages.rocblas-x86_64-linux.log.

I'm still not sure yet if skipping that test is the right fix. Adding a mechanism to bypass the warning would be another possibility.
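
For anyone who only needs libcxx to build locally while that question is open, here is a minimal sketch of a blunter alternative: skipping the whole check phase rather than one test. This is not the change made in this PR. It assumes `<nixpkgs>` resolves to an affected tree and that `doCheck` is what drives the test run for this package; `overrideAttrs` is the standard nixpkgs mechanism for overriding derivation attributes.

```nix
# Sketch only: this is NOT the change made in this PR, which disables a single test.
let
  # Assumption: <nixpkgs> resolves to a tree where rocmPackages.llvm.libcxx
  # fails during its check phase.
  pkgs = import <nixpkgs> { };
in
pkgs.rocmPackages.llvm.libcxx.overrideAttrs (old: {
  # Skips the entire check phase. Whether this alone is sufficient for this
  # package is an assumption.
  doCheck = false;
})
```

Note that this expression only produces an overridden libcxx; packages that depend on libcxx would still use the unmodified derivation unless the override is wired in through an overlay.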

@mschwaig marked this pull request as ready for review January 22, 2025 22:11
@mschwaig
Member Author

mschwaig commented Jan 22, 2025

I have marked this PR as ready for review, not because I actively want it reviewed or merged before looking into what's up with rocblas, but because if someone can confirm that it fixes their use case I think merging this for now might make sense.

@mschwaig
Member Author

The log output of the rocblas build is full of output produced by this line (probably not from that exact commit though):
https://github.com/ROCm/hipBLASLt/blob/a11ccf64efcd818106dbe37768f69dfcc0a7ff22/tensilelite/Tensile/SolutionStructs.py#L1297

@deftdawg
Contributor

Thanks for this. Since it addresses the only test that fails on my system, I'll apply this locally even if it isn't merged:

********************
Failed Tests (1):
  llvm-libc++-shared.cfg.in :: libcxx/transitive_includes.sh.cpp


Testing Time: 560.01s

Total Discovered Tests: 7869
  Unsupported      :  428 (5.44%)
  Passed           : 7400 (94.04%)
  Expectedly Failed:   40 (0.51%)
  Failed           :    1 (0.01%)
...
error: builder for '/nix/store/35a4dhaa1b27dcscz6qnj97a15qma2bx-rocm-llvm-libcxx-6.0.2.drv' failed with exit code 1
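
A minimal sketch of one way to apply this PR locally before it is merged, assuming a channel-style setup where `<nixpkgs>` resolves to a tree without the fix. `applyPatches`, `fetchpatch`, and `lib.fakeHash` are existing nixpkgs helpers; the derivation name `nixpkgs-with-pr-375850` is made up for illustration, the hash is a placeholder, and the approach relies on import-from-derivation being allowed.

```nix
let
  # Assumption: <nixpkgs> resolves to a checkout that does not yet contain the fix.
  bootstrap = import <nixpkgs> { };

  # Produce a copy of the nixpkgs source tree with the PR's patch applied.
  patched = bootstrap.applyPatches {
    name = "nixpkgs-with-pr-375850"; # hypothetical name
    src = bootstrap.path;
    patches = [
      (bootstrap.fetchpatch {
        url = "https://github.com/NixOS/nixpkgs/pull/375850.patch";
        # Placeholder: replace with the hash Nix reports on the first build attempt.
        hash = bootstrap.lib.fakeHash;
      })
    ];
  };
in
# Import the patched tree and build the package this PR touches.
(import patched { }).rocmPackages.llvm.libcxx
```

The patch GitHub serves for a pull request can change if the PR is updated, so the hash may need refreshing until the PR is merged.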

@sgiath
Contributor

sgiath commented Jan 24, 2025

This works for me too.

@mschwaig
Member Author

I'd say let's merge this then and fix the issue with rocblas in another PR.

@mschwaig
Member Author

Does anybody have an idea how we could turn off/patch out that echo from the stdenv we use in ROCm?

Quoting myself from the PR mentioned above

I'm guessing reverting this or adding an environment variable to turn this off in some places would not be a practical fix because it would cause a mass rebuild and therefore have a really long turnaround time?

@collares merged commit a77fc1d into NixOS:master Jan 24, 2025
28 of 30 checks passed
@collares
Member

I have marked this PR as ready for review, not because I actively want it reviewed or merged before looking into what's up with rocblas, but because if someone can confirm that it fixes their use case I think merging this for now might make sense.

I'd guess a large proportion of people hitting this issue probably care only about gaming, and I've verified Steam works with this patch applied. It seems like a harmless workaround until #370435 is ready.

@olaffreund

Will this be merged into stable as well?

@mschwaig
Member Author

Will this be merged into stable as well?

I think what you might be waiting for is just for this to land in nixpkgs-unstable. It has not landed there yet because it takes some time for channels to advance: a few days at most, unless something is seriously 'blocking' the channel. You can keep track of whether a PR has landed in a channel here: https://nixpk.gs/pr-tracker.html?pr=375850

What people usually do to get away from breakage like this is stay on the last working version until the fix is ready, or move forward to a newer commit on master that has not landed in the channel yet. I have no idea how that works without flakes, though, besides picking the previous generation in the bootloader.
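
As a rough illustration of the "move forward to a newer commit on master" option with flakes, here is a minimal sketch. The flake description and the choice to expose libcxx as the default package are assumptions for illustration only; `flake.lock` is what pins the exact commit, so staying on the last working version amounts to not updating the lock file.

```nix
{
  description = "Hypothetical flake that follows nixpkgs master until the fix reaches nixpkgs-unstable";

  # Assumption: tracking master temporarily is acceptable for this setup.
  # The exact commit is recorded in flake.lock when the input is updated.
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/master";

  outputs = { self, nixpkgs }: {
    # Expose the package fixed by this PR, taken from the pinned nixpkgs.
    packages.x86_64-linux.default =
      nixpkgs.legacyPackages.x86_64-linux.rocmPackages.llvm.libcxx;
  };
}
```

Once the PR tracker shows the change in nixpkgs-unstable, the input URL can point back at that branch.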

It does not look like the change to the wrapper that caused this build failure was backported into a release, and the package seems to be building fine on 24.11:
https://hydra.nixos.org/build/284989140

@olaffreund

Looks like if I use 24.11 and do not mix stable and unstable packages, I'm all good. I was running ollama from unstable with rocm packages from unstable as well. Thanks for the update; I will just need to wait then. Cheers

@collares
Member

Looks like if I use 24.11 and do not mix stable and unstable packages I'm all good. I was running ollama from unstable with rocm packages from unstable as well. Thanks for the update I will just need to wait then. Cheers

Note that the present PR will not fix ollama-rocm, since rocblas still doesn't build (see #375850 (comment)).

pbek added a commit to pbek/nixcfg that referenced this pull request Feb 3, 2025
… after fixed for kernel 6.13

Signed-off-by: Patrizio Bekerle <[email protected]>

Development

Successfully merging this pull request may close these issues.

Build failure: rocmPackages.llvm.libcxx

6 participants