Skip to content

cudaPackages.cuda-samples: misc bumps and fixups#266115

Merged
ConnorBaker merged 9 commits intoNixOS:masterfrom
ConnorBaker:fix/cuda-samples-backendStdenv
Nov 9, 2023
Merged

cudaPackages.cuda-samples: misc bumps and fixups#266115
ConnorBaker merged 9 commits intoNixOS:masterfrom
ConnorBaker:fix/cuda-samples-backendStdenv

Conversation

@ConnorBaker
Copy link
Contributor

@ConnorBaker ConnorBaker commented Nov 7, 2023

Description of changes

  • Don't include cuda-samples in cudaPackages_11_7 because there isn't a compatible release.
    • Better to have it absent than to throw and break evaluation in a way that can't easily be overridden.
  • Fixes a bug where backendStdenv was always taken from the default cudaPackages, regardless of which version of cudaPackages was trying to build cuda-samples.
  • Refactors to use finalAttrs instead of the rec keyword.
  • Fixes CMake-related build failure for CUDA Packages 12.2+.
    • NOTE: This does not fix 12.0+ builds; see below.
  • Adds CUDA team as maintainers.
  • Switches to autoAddOpenGLRunpathHook.

Warning

Builds with CUDA Packages 12+ are currently failing with the following:

cuda-samples> /nix/store/x1254667lvjd1d760lhz0irxr9q40338-cudatoolkit-12.1.1/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_50,code=compute_50 -o memMapIPCDrv helper_multiprocess.o memMapIpc.o -L/nix/store/x1254667lvjd1d760lhz0irxr9q40338-cudatoolkit-12.1.1/lib64/stubs -lcuda
cuda-samples> /nix/store/x1254667lvjd1d760lhz0irxr9q40338-cudatoolkit-12.1.1/include/cuda/std/barrier(144): error: function "operator new" cannot be called with the given argument list
cuda-samples>             argument types are: (unsigned long, cuda::std::__4::__barrier_base<cuda::std::__4::__empty_completion, 2> *)
cuda-samples>           { new (&__b->__barrier) __barrier_base(__expected); }
cuda-samples>             ^
cuda-samples> 1 error detected in the compilation of "tf32TensorCoreGemm.cu".
cuda-samples> make[1]: *** [Makefile:367: tf32TensorCoreGemm.o] Error 255

I imagine this is unrelated to this PR -- the only reason it appears now is because this PR allows compilation with newer versions of CUDA. Prior to this PR, the version used was fixed to the version of cudaPackages defined at the top level in Nixpkgs, regardless of which cudaPackages_*_* was used.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.11 Release Notes (or backporting 23.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

@ConnorBaker ConnorBaker force-pushed the fix/cuda-samples-backendStdenv branch from 5349586 to b807a32 Compare November 7, 2023 20:24
@ConnorBaker ConnorBaker self-assigned this Nov 7, 2023
@ConnorBaker ConnorBaker added the 6.topic: cuda Parallel computing platform and API label Nov 7, 2023
@ConnorBaker ConnorBaker changed the title cudaPackages.cuda-samples: use backendStdenv from current scope cudaPackages.cuda-samples: misc bumps and fixups Nov 7, 2023
@ConnorBaker ConnorBaker marked this pull request as ready for review November 7, 2023 20:36
@ConnorBaker
Copy link
Contributor Author

Result of nixpkgs-review pr 266115 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; cudaCapabilities = [ "8.9" ]; packageOverrides = pkgs: { cudaPackages = pkgs.lib.recurseIntoAttrs pkgs.cudaPackages_11_8; }; }' run on x86_64-linux 1

1 package built:
  • cudaPackages.cuda-samples

@SomeoneSerge
Copy link
Contributor

Better to have it absent than to throw and break evaluation in a way that can't easily be overridden.

Is it?

I think you may be right, but I'll try to argue to the opposite.

Hydra handles broken = true and gracefully (?) skips the broken derivation. It's kind of similar for nixpkgs-review. I did some ugly hacks to mimic that behaviour in https://github.com/SomeoneSerge/nixpkgs-cuda-ci. The convenient thing about it is that broken = true is relatively explicit and unambiguous: some human had a look at the code and decided to mark it as "don't bother trying to build this". I say "relatively", because attributes get also classified as "broken" when it's just their transitive dependencies that are explicitly marked.

Interpreting an absent attribute in the CI to me seems more tricky. Say you want to assert that a valid package set should at least attributes from a certain list, and that they should be buildable. E.g. it should have python3Packages.torchWithCuda, python3Packages.jax, and, well, cudaPackages.cuda-samples. Dropping the cuda-samples attribute depending on the release means we need to implement extra logic in the CI that would judge whether the missing attribute is an error or not based on the cuda/nixpkgs version.

The argument maybe doesn't hold for cuda-samples specifically because I think it is missing for some releases anyway, but in general I'd say having a more static set of attribute paths is a good thing?

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Nov 7, 2023

finalAttrs

😍

buildInputs = [ cudatoolkit ]

Is it time, since you're at it? (Or actually maybe you shouldn't waste the breath, there are more important matters and this otherwise can be merged fast; idk)

@ofborg ofborg bot added 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. labels Nov 7, 2023
@ConnorBaker
Copy link
Contributor Author

Better to have it absent than to throw and break evaluation in a way that can't easily be overridden.

Is it?

To clarify, it's not "break" in the sense that the derivation has broken = true -- I'm replacing builtins.throw which would "break" evaluation entirely.

@ConnorBaker
Copy link
Contributor Author

buildInputs = [ cudatoolkit ]

Is it time, since you're at it? (Or actually maybe you shouldn't waste the breath, there are more important matters and this otherwise can be merged fast; idk)

I'd prefer to do that in a different PR if that's okay with you :)

@SomeoneSerge
Copy link
Contributor

To clarify, it's not "break" in the sense that the derivation has broken = true -- I'm replacing builtins.throw which would "break" evaluation entirely.

Right, we definitely do not want to throw. But broken = true is still an alternative to optionalAttrs, what do you think about those?

@ConnorBaker ConnorBaker merged commit 69de178 into NixOS:master Nov 9, 2023
@ConnorBaker ConnorBaker deleted the fix/cuda-samples-backendStdenv branch November 9, 2023 15:00
@ConnorBaker
Copy link
Contributor Author

Right, we definitely do not want to throw. But broken = true is still an alternative to optionalAttrs, what do you think about those?

I think this deserves its own followup, so I made an issue to track it: #266475

It's a specific case of how we choose to handle unavailable/unknown comparabilities.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: cuda Parallel computing platform and API 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

3 participants