cudaPackages: cleanup post #256324 (#271217)

Cleanup to be done once #256324 is merged.

> [!NOTE]
> I use the term "multiplexed package" to refer to a package which has multiple versions present in a single instance of the `cudaPackages` package set.
> Examples of such "multiplexed packages" include cuDNN and TensorRT, as there are generally several versions of these packages available in each instance of the `cudaPackages` package set.
> A non-example would be NVCC, as there is exactly one in each instance of the `cudaPackages` package set.

### Cleaner abstraction to support overrides

Per a conversation with @SomeoneSerge (thank you for generously giving so much of your time to me so I could throw a bunch of Nix at you!), it isn't clear how one should override the core attributes of the package set: `cudaVersion`, `gpus`, `flags`, and `nvccCompatibilities`. These are the "backbone" attribute sets, which are instantiated outside of the package set's fixed point and manually threaded through the extensions to prevent infinite recursion (which would otherwise occur because the existence of some attributes in the package set depends on these values).

Serge suggested making use of the module system to solve this issue:

Instead of evaluating modules inside the generic multiplex builder and each `extension.nix`, we do it once in `pkgs/top-level/cuda-packages.nix`. We expose two additional attributes in the package set: a list of additional modules to be evaluated alongside the ones which exist in-tree (allowing users to extend what we provide), and the resulting evaluated configuration. The builders would then take some subtree of that configuration instead of performing the module evaluation themselves.
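
A minimal sketch of what that single evaluation could look like, assuming `lib.evalModules` is used directly. The names `extraModules`, `cudaModules`, and `cudaConfig` are hypothetical, and the `"12.2"` default is only a placeholder; nothing here reflects what `pkgs/top-level/cuda-packages.nix` actually contains.

```nix
# Hypothetical sketch: evaluate the CUDA modules once, at the top level.
# `extraModules`, `cudaModules`, and `cudaConfig` are illustrative names only.
{ lib, extraModules ? [ ] }:
let
  # In-tree module declaring one of the "backbone" values as an option.
  inTreeModule = {
    options.cudaVersion = lib.mkOption {
      type = lib.types.str;
      description = "CUDA version used to build the package set.";
    };
    # Placeholder default; a user module can replace it with a plain definition.
    config.cudaVersion = lib.mkDefault "12.2";
  };

  # One evaluation covering in-tree modules plus whatever the user supplies.
  evaluated = lib.evalModules {
    modules = [ inTreeModule ] ++ extraModules;
  };
in
{
  # The two additional attributes: the user-extensible module list
  # and the resulting evaluated configuration.
  cudaModules = extraModules;
  cudaConfig = evaluated.config;
}
```

Under this sketch, a builder would read a subtree such as `cudaConfig.cudaVersion` rather than calling `lib.evalModules` itself.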

### Fragile implementation for multiplexed packages

Multiplexed packages rely on the existence of several paths which depend on the `pname` of the package. As an example, the expression which produces `cudnn` packages requires that all of `pkgs/development/cuda-modules/cudnn/{fixup,shims,releases}.nix` exist, and accesses them by name.

This is fragile and generally unwanted. Improvements suggested in [Cleaner abstraction to support overrides](#cleaner-abstraction-to-support-overrides) may provide a way to re-implement such logic in a cleaner way.
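
To make the fragility concrete, here is a rough sketch of the kind of name-based lookup described above. It is an illustration under assumptions, not the in-tree code: the function shape and the `packageDir` binding are hypothetical, and only the three file names come from the description.

```nix
# Hypothetical sketch of name-based file lookup; not the in-tree code.
{ pname }:
let
  # Directory derived purely from the package name, e.g. ./cudnn.
  packageDir = ./. + "/${pname}";
in
{
  # Each import fails when the attribute is forced if the file is missing,
  # so every multiplexed package must ship all three files.
  fixup = import (packageDir + "/fixup.nix");
  shims = import (packageDir + "/shims.nix");
  releases = import (packageDir + "/releases.nix");
}
```

Because the paths are built from strings, a renamed or missing file only surfaces as an evaluation error at the point of use.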


### Source-available packages and source priorities

Nixpkgs prefers to use source builds when possible. Some packages, like CUDA CCCL, are available both pre-packaged from NVIDIA and from source: <https://github.com/NVIDIA/cccl>.

It's unclear at the moment how to handle builds when multiple sources are available. However, this does seem like something the module system would excel at handling, given we are able to assign priorities to different values.
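
As a small illustration of how priorities could express a source preference, the sketch below uses the module system's `lib.mkDefault`. The option `cccl.source` and its values `"source"` and `"redist"` are hypothetical and exist only for this example.

```nix
# Hypothetical sketch: choosing between sources via module priorities.
# The option `cccl.source` and its values are illustrative assumptions.
{ lib }:
let
  evaluated = lib.evalModules {
    modules = [
      {
        options.cccl.source = lib.mkOption {
          type = lib.types.enum [ "source" "redist" ];
          description = "Which distribution of CCCL to build from.";
        };
      }
      # In-tree preference for source builds, set with mkDefault (priority 1000).
      { config.cccl.source = lib.mkDefault "source"; }
      # A user's plain definition has priority 100; the lower number wins.
      { config.cccl.source = "redist"; }
    ];
  };
in
# Evaluates to "redist"; drop the user module and it falls back to "source".
evaluated.config.cccl.source
```

The same mechanism would let a user opt back into the pre-packaged release without touching the in-tree default.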

### Laundry list

- [ ] Find a way to abstract away cuTENSOR's `extension.nix`. Likely solved by [Cleaner abstraction to support overrides](#cleaner-abstraction-to-support-overrides).
- [ ] Update logic to handle creation of overrides using the new `dependencies` field present in the feature manifests generated by `cuda-redist-find-features`.
- [ ] Document this particular section of NVIDIA's documentation for other CUDA maintainers: <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#host-compiler-support-policy>
  - "We only support libstdc++ (GCC’s implementation) for all the supported host compilers for the platforms listed above."
  - That makes it sound like, even when using Clang, we need to use GCC's C++ standard library.
- [x] Figure out what `cuda_compat` is/does: #267247
- [ ] Figure out how to extend option sets with additional options. Likely solved by [Cleaner abstraction to support overrides](#cleaner-abstraction-to-support-overrides).
- [ ] Investigate behavior of overrides of `cudaVersion` on the `cudaPackages` package set through `override` and `overrideAttrs`. Likely solved by [Cleaner abstraction to support overrides](#cleaner-abstraction-to-support-overrides).
- [ ] Move `cuda-redist-find-features` to `nix-community` (https://github.com/NixOS/nixpkgs/pull/256324#discussion_r1387012952)


@NixOS/cuda-maintainers @SomeoneSerge @samuela thoughts?
