
meta.problems: Custom package problems [RFC127]#478539

Merged
infinisil merged 6 commits intoNixOS:masterfrom
tweag:rfc127
Feb 26, 2026

@infinisil
Member

@infinisil infinisil commented Jan 9, 2026

This PR implements RFC127, originally based on @piegamesde's initial implementation in #177272 and @AkechiShiro's follow up to add some tests in #338267.

This PR improves over the above by:

  • Optimising performance, ending up with the following differences in evaluation statistics, most notably:
    • gc.totalBytes: +0.3781%
    • nrFunctionCalls: +0.7062%
    • nrLookups: +3.8620%
    • nrPrimOpCalls: +1.0484%
    • nrThunks: +0.6940%
  • Adding more tests, achieving nearly complete code coverage of the added code

Things done

  • Tests, runnable with nix-build -A tests.problems
  • Docs
  • Added myself as code owner to relevant files


@infinisil infinisil requested a review from adisbladis January 9, 2026 21:29
@infinisil infinisil added the 1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. label Jan 9, 2026
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation labels Jan 9, 2026
Member

@mdaniels5757 mdaniels5757 left a comment

Ooh, this looks great!

- "deprecated": The package relies on software which has reached its end of life.
- "maintainerless": Automatically generated for packages with `meta.maintainers == []`. Unique, not manually specifiable.

Each problem has a handler that deals with it, which can be one of "error", "warn" or "ignore".
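For illustration, here is a minimal user-side sketch of choosing a handler per problem. It follows the `problems.handlers.<package>.<problem>` shape that appears in the error output later in this thread; the package name `foo` is hypothetical.

```nix
# ~/.config/nixpkgs/config.nix, or the value of `nixpkgs.config` in a NixOS configuration.
# `foo` is a hypothetical package name used only for illustration.
{
  problems.handlers = {
    foo.deprecated = "warn"; # report the problem, but keep evaluating
    foo.maintainerless = "ignore"; # silence this problem entirely
  };
}
```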
Member

Nit: I think I'd prefer it if this were called "severity".

Declaring what it is rather than what should be done with it feels more right to me.

Member

"Handler" also sort of implies that what's declared is an action or a command; perhaps even a custom one.

"Severity" OTOH is just a property of the problem itself.

Member Author

I like the idea. It's a deviation from the RFC, but I don't think we need to follow it to the letter.

Member Author

Although, this also implies that we should change config.problems.handlers, and I'm not sure I want to go that far.

@infinisil
Member Author

Ping: it would be great to get this merged soon so that people can try it out.

@ConnorBaker

This comment was marked as outdated.

@infinisil

This comment was marked as outdated.

Contributor

@ConnorBaker ConnorBaker left a comment

Mostly nits/questions

@ConnorBaker

This comment was marked as outdated.

@nixpkgs-ci nixpkgs-ci bot requested a review from a team February 8, 2026 23:04
@infinisil

This comment was marked as outdated.

@ConnorBaker

This comment was marked as outdated.

Member Author

@infinisil infinisil left a comment

Moving the top-level CUDA conversation with @ConnorBaker into a thread

Member Author

Originally posted by @ConnorBaker in #478539 (comment)

@ConnorBaker Would you like some help with that? If you give me some more details about what you need I could take a look!

I'd love some help!

Most of the CUDA packages come from binary archives provided by NVIDIA. We call these "redists" because they are hosted at a URL with "redist" in the path, come with a "redistrib_.json" manifest, and, for some of them, have a LICENSE that allows redistribution. We have a generic buildRedist function we use to make packages from those binary archives, defined here: https://github.com/NixOS/nixpkgs/blob/9774bb59bd3bb978cf86b38afc70f9c6d5505983/pkgs/development/cuda-modules/buildRedist/default.nix.

Since the availability of CUDA packages depends on platform, CUDA version, CUDA capabilities (think the type of GPU), and versions of dependencies, there are many ways things can break (either when trying to patchelf the binary or at runtime). Making matters more complicated, NVIDIA does a fair amount of optimistic binary loading through dlopen calls at runtime, depending on the code path and the environment (e.g., CUDA version, CUDA capability, platform, etc.).

We try to fail early (during evaluation) when possible to avoid very expensive (time, space, and compute) builds that would fail or produce binaries known to fail. To do that, buildRedist sets two different attributes through passthru: brokenAssertions and platformAssertions.

brokenAssertions is defined here:

# brokenAssertions :: [Attrs]
# Used by mkMetaBroken to set `meta.broken`.
# Example: Broken on a specific version of CUDA or when a dependency has a specific version.
# NOTE: Do not use this when a broken assertion means evaluation will fail! For example, if
# a package is missing and is required for the build -- that should go in platformAssertions,
# because attempts to access attributes on the package will cause evaluation errors.
brokenAssertions = [
  {
    message = "lib output precedes static output";
    assertion =
      let
        libIndex = findFirstIndex (x: x == "lib") null finalAttrs.outputs;
        staticIndex = findFirstIndex (x: x == "static") null finalAttrs.outputs;
      in
      libIndex == null || staticIndex == null || libIndex < staticIndex;
  }
  {
    # NOTE: We cannot (easily) check that all expected outputs have a corresponding outputNameVar attribute in
    # finalAttrs because of the presence of attributes which use the "output" prefix but are not outputNameVars
    # (e.g., outputChecks and outputName).
    message = "outputNameVarFallbacks is a super set of expectedOutputs";
    assertion =
      subtractLists (map mkOutputNameVar finalAttrs.passthru.expectedOutputs) (
        attrNames finalAttrs.passthru.outputNameVarFallbacks
      ) == [ ];
  }
  {
    message = "outputToPatterns is a super set of expectedOutputs";
    assertion =
      subtractLists finalAttrs.passthru.expectedOutputs (attrNames finalAttrs.passthru.outputToPatterns)
      == [ ];
  }
  {
    message = "propagatedBuildOutputs is a subset of outputs";
    assertion = subtractLists finalAttrs.outputs finalAttrs.propagatedBuildOutputs == [ ];
  }
]
++ brokenAssertions;

and platformAssertions is defined here:

# platformAssertions :: [Attrs]
# Used by mkMetaBadPlatforms to set `meta.badPlatforms`.
# Example: Broken on a specific system when some condition is met, like targeting Jetson or
# a required package missing.
# NOTE: Use this when a failed assertion means evaluation can fail!
platformAssertions =
  let
    isSupportedRedistSystem = _redistSystemIsSupported hostRedistSystem finalAttrs.passthru.supportedRedistSystems;
  in
  [
    {
      message = "src is null if and only if hostRedistSystem is unsupported";
      assertion = (finalAttrs.src == null) == !isSupportedRedistSystem;
    }
    {
      message = "hostRedistSystem (${hostRedistSystem}) is supported (${builtins.toJSON finalAttrs.passthru.supportedRedistSystems})";
      assertion = isSupportedRedistSystem;
    }
  ]
  ++ platformAssertions;

Then, in meta, buildRedist uses the final copies of those values to set meta.broken and meta.badPlatforms:

broken = _mkMetaBroken finalAttrs;
badPlatforms = _mkMetaBadPlatforms finalAttrs;

The helper functions produce the expected values, additionally printing out failed assertions through builtins.traceVerbose: https://github.com/NixOS/nixpkgs/blob/3f96296da66f5ecf3d8106c61281b823949a56c0/pkgs/development/cuda-modules/_cuda/lib/meta.nix.
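To make the mechanism concrete, here is a rough sketch of what a helper in the spirit of `_mkMetaBroken` might do. It is an approximation under stated assumptions, not the linked implementation: the name `mkMetaBrokenSketch` is hypothetical, and it takes the assertion list directly instead of `finalAttrs`.

```nix
let
  # Hypothetical stand-in for a helper like _mkMetaBroken; the real one takes
  # finalAttrs and lives in pkgs/development/cuda-modules/_cuda/lib/meta.nix.
  mkMetaBrokenSketch =
    assertions:
    let
      failed = builtins.filter (a: !a.assertion) assertions;
    in
    # Trace each failed assertion (only printed when trace-verbose is enabled),
    # then report the package as broken if anything failed.
    builtins.foldl' (
      acc: a: builtins.traceVerbose "assertion failed: ${a.message}" acc
    ) (failed != [ ]) failed;
in
mkMetaBrokenSketch [
  {
    message = "lib output precedes static output";
    assertion = true;
  }
  {
    message = "propagatedBuildOutputs is a subset of outputs";
    assertion = false;
  }
]
```

Evaluating this with `nix-instantiate --eval --option trace-verbose true` yields `true` and traces the message of the failed assertion; without that option, `builtins.traceVerbose` stays silent.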

As an example of how this is used, look at cuDNN, which uses platformAssertions to ensure it is only available for supported CUDA capabilities:

# NOTE:
# With cuDNN forward compatibility, all non-natively supported compute capabilities JIT compile PTX kernels.
#
# While this is sub-optimal and we should warn the user and encourage them to use a newer version of cuDNN, we
# have no clean mechanism by which we can warn the user, or allow silencing such a warning if the use of an
# older cuDNN is intentional.
#
# As such, we only warn about capabilities which are no longer supported by cuDNN.
#
# NOTE:
#
# NVIDIA promises forward compatibility of cuDNN for major versions of CUDA. As an example, the cuDNN build for
# CUDA 12 is, and will remain, compatible with all CUDA 12 releases. However, this does not
# extend to static linking with CUDA 11!
#
# We don't need to check the CUDA version to see if it falls within some supported range -- if a user decides
# to do static linking against some odd combination of CUDA 11 and cuDNN, that's on them.
#
platformAssertions =
let
# Create variables and use logical OR to allow short-circuiting.
cudnnAtLeast912 = cudnnAtLeast "9.12";
cudnnAtLeast88 = cudnnAtLeast912 || cudnnAtLeast "8.8";
cudnnAtLeast85 = cudnnAtLeast88 || cudnnAtLeast "8.5";
allCCNewerThan75 = lib.all (lib.flip lib.versionAtLeast "7.5") cudaCapabilities;
allCCNewerThan50 = allCCNewerThan75 || lib.all (lib.flip lib.versionAtLeast "5.0") cudaCapabilities;
allCCNewerThan35 = allCCNewerThan50 || lib.all (lib.flip lib.versionAtLeast "3.5") cudaCapabilities;
in
[
# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-850/support-matrix/index.html#cudnn-cuda-hardware-versions
{
message =
"cuDNN releases since 8.5 (found ${finalAttrs.version})"
+ " support CUDA compute capabilities 3.5 and newer (found ${builtins.toJSON cudaCapabilities})";
assertion = cudnnAtLeast85 -> allCCNewerThan35;
}
# https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-880/support-matrix/index.html#cudnn-cuda-hardware-versions
{
message =
"cuDNN releases since 8.8 (found ${finalAttrs.version})"
+ " support CUDA compute capabilities 5.0 and newer (found ${builtins.toJSON cudaCapabilities})";
assertion = cudnnAtLeast88 -> allCCNewerThan50;
}
# https://docs.nvidia.com/deeplearning/cudnn/backend/v9.12.0/reference/support-matrix.html#gpu-cuda-toolkit-and-cuda-driver-requirements
{
message =
"cuDNN releases since 9.12 (found ${finalAttrs.version})"
+ " support CUDA compute capabilities 7.5 and newer (found ${builtins.toJSON cudaCapabilities})";
assertion = cudnnAtLeast912 -> allCCNewerThan75;
}
];

I'd love to see how (or if) the current implementation of the problems RFC would address such a use case.

As an added bonus, I'd love to see the definition for cudaPackages.backendStdenv simplified: https://github.com/NixOS/nixpkgs/blob/9774bb59bd3bb978cf86b38afc70f9c6d5505983/pkgs/development/cuda-modules/backendStdenv/default.nix.

Beyond ensuring that a compatible version of GCC (patched to use glibc/libstdc++ from the default stdenv) is available for NVCC, it also contains a fair amount of logic for determining the default set of CUDA capabilities for a given CUDA version, and for checking whether explicitly requested CUDA capabilities are valid for that version. It doesn't make sense to attach these checks through buildRedist, since they affect all CUDA packages. Making evaluation of backendStdenv fail also lets us fail evaluation in the presence of incorrect configurations, since a number of the utilities in cudaPackages.flags are derived from cudaPackages.backendStdenv.
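A minimal sketch of that fail-fast pattern: validate the requested capabilities during evaluation and `throw` when any are unsupported. The capability lists are placeholder data, not the ones `backendStdenv` actually derives from the CUDA version, and the variable names are hypothetical.

```nix
let
  # Placeholder data: the real lists are derived from the CUDA version in backendStdenv.
  supportedCapabilities = [ "7.5" "8.0" "8.6" "8.9" "9.0" ];
  requestedCapabilities = [ "8.6" "8.9" ];

  unsupportedCapabilities = builtins.filter (
    c: !builtins.elem c supportedCapabilities
  ) requestedCapabilities;
in
# Abort evaluation (rather than starting a doomed build) when the configuration cannot work.
if unsupportedCapabilities == [ ] then
  requestedCapabilities
else
  throw "Requested CUDA capabilities ${builtins.toJSON unsupportedCapabilities} are not supported"
```

As written this evaluates to `[ "8.6" "8.9" ]`; adding, say, `"6.1"` to `requestedCapabilities` makes evaluation abort with the message instead.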


Originally posted by @infinisil in #478539 (comment)

@ConnorBaker Thanks for the detailed explanation! I've created another draft PR based on the one here to implement a broken error kind: tweag#111

With that, it should be possible for you to do this for assertions:

meta.problems =
  optionalAttrs (!assertion) {
    libBeforeStatic.message = "lib output precedes static output";
  }
  // optionalAttrs (!anotherAssertion) {
    # ...
  };

We could also consider adding a generic enable/condition/assertion field to problems to make this more ergonomic.

If an assertion fails, evaluation fails with the message displayed, along with info on how specific assertions can be switched to just warn or be ignored entirely.

I believe this would be appropriate for both the current brokenAssertions and platformAssertions, while not setting any meta.broken and meta.badPlatforms values. I plan to make a PR switching all of them for you to test, but early feedback is also appreciated :)
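As a sketch of the generic assertion-style field idea, the existing list-of-assertions shape could be converted mechanically into the `optionalAttrs` pattern above. The helper `assertionsToProblems` and the `name` field on each entry are hypothetical (the current `brokenAssertions`/`platformAssertions` entries carry no names), and whether each problem also needs an explicit kind depends on tweag#111.

```nix
let
  # Hypothetical helper (not part of Nixpkgs): keep only the failed assertions
  # and turn each into a `meta.problems` entry keyed by its `name`.
  assertionsToProblems =
    assertions:
    builtins.listToAttrs (
      map (a: {
        inherit (a) name;
        value = { inherit (a) message; };
      }) (builtins.filter (a: !a.assertion) assertions)
    );

  # Placeholder input; in a package this would come from the CUDA configuration.
  cudaCapabilities = [ "6.1" ];
in
assertionsToProblems [
  {
    name = "capabilities";
    message = "Not all capabilities are >= 7.0 (${builtins.toJSON cudaCapabilities})";
    assertion = builtins.all (c: builtins.compareVersions c "7.0" >= 0) cudaCapabilities;
  }
  {
    name = "libBeforeStatic";
    message = "lib output precedes static output";
    assertion = true;
  }
]
```

Evaluated on its own, this yields an attrset containing only the `capabilities` entry, because the second assertion holds; the message mirrors the cutlass error shown later in the thread.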

Originally posted by @infinisil in #478539 (comment)

@ConnorBaker Can you check out tweag#112?


Originally posted by @ConnorBaker in #478539 (comment)

I had a chance to look at tweag#111 and tweag#112 and I like what you've done!

We could also consider adding a generic enable/condition/assertion field to problems to make this more ergonomic.

I personally would like to see that, or something which aligns the attributes with the NixOS module system assertions/warnings.

In testing out tweag#112, I noticed there seems to be a double newline before "Package problems":

$ NIXPKGS_ALLOW_UNFREE=1 nix eval .#pkgsForCudaArch.sm_61.cudaPackages.cutlass -L --impure --json
error:
       … in the condition of the assert statement
         at /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/lib/customisation.nix:450:9:
          449|       outPath =
          450|         assert condition;
             |         ^
          451|         drv.outPath;

       … while evaluating the attribute 'handled'
         at /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/pkgs/stdenv/generic/check-meta.nix:677:11:
          676|           valid = if isNull problems.error then "warn" else "no";
          677|           handled = handle {
             |           ^
          678|             inherit attrs meta;

       (stack trace truncated; use '--show-trace' to show the full, detailed trace)

       error: Package ‘cuda12.8-cutlass-3.9.2’ in /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/pkgs/development/cuda-modules/packages/cutlass.nix:211 has the following problems that must be acknowledged: [ "capabilities" ], refusing to evaluate.


       Package problems:

       - capabilities (kind "broken"): Not all capabilities are >= 7.0 (["6.1"])

       You can use it anyway by ignoring its problems, using one of the
       following methods:

       a) For `nixos-rebuild` you can add "warn" or "ignore" entries to
         `nixpkgs.config.problems.handlers` inside configuration.nix,
         like this:

           {
             nixpkgs.config.problems.handlers = {
               cutlass.capabilities = "warn";
             };
           }

       b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
         "warn" or "ignore" to `problems.handlers` in
         ~/.config/nixpkgs/config.nix, like this:

           {
             problems.handlers = {
               cutlass.capabilities = "warn";
             };
           }

       See this page for more details: https://nixos.org/manual/nixpkgs/unstable#sec-problems

Is that intentional?

I'm glad to see these impact meta.available (which I use in some places to handle different configurations).

What do you imagine the use-case for the unsupported kind being and how would you say it differs from broken?


Originally posted by @infinisil in #478539 (comment)

I personally would like to see that, or something which aligns the attributes with the NixOS module system assertions/warnings.

Noted, but probably better in a follow-up PR to keep the scope of this one smaller.

Is that intentional?

Nope. Really, the error format needs some tweaking in general IMO: it's too big and contains duplicate info. How about this instead:

error: Package ‘a-0’ in /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-default.nix:18 has problems:
- deprecated: Package is deprecated and replaced by b.
- maintainerless: This package has no declared maintainer, i.e. an empty `meta.maintainers` and `meta.teams` attribute.
- removal: Package will be removed.

See https://nixos.org/manual/nixpkgs/unstable#sec-problems for info. To allow evaluation regardless, use:
- Nixpkgs import: import nixpkgs { config = <below code>; }
- NixOS: nixpkgs.config = <below code>;
- nix-* commands: Put below code in ~/.config/nixpkgs/config.nix

  {
    problems.handlers = {
      a.deprecated = "warn"; # or "ignore"
      a.maintainerless = "warn"; # or "ignore"
      a.removal = "warn"; # or "ignore"
    };
  }

What do you imagine the use-case for the unsupported kind being and how would you say it differs from broken?

That's actually a very good point. These have very different notions:

  • Unsupported: Not supposed to work, fix upstream
  • Broken: Supposed to work according to upstream, fix the Nix side

We'd want to use both of these for CUDA


Originally posted by @ConnorBaker in #478539 (comment)

Ah, that makes sense. I worry that unsupported may not exactly align with how we'd want to use it for CUDA -- that is, to explain that the user has selected a configuration which will not or cannot work and that there is no upstream fix that would enable it. Does that make sense?

Member Author

@ConnorBaker Right, I see! Then I think it would be best to introduce a separate assertion problem kind for generic internal assertions that have been violated, not necessarily relating to broken or unsupported.

Contributor

Part of the reason I've previously marked such packages as broken or unsupported on the system is to communicate that such a configuration just isn't possible: beyond "this will fail to build", the user might hit evaluation errors (e.g., trying to look up a CUDA binary for an unsupported platform like Darwin in an attribute set will throw a missing-attribute error). It's kind of a "stop digging! nothing you want is here!".

Member Author

For those cases I actually think a simple foo.attrThatMightNotExist or (throw "This does not work!") would be best, because meta.problems will inherently allow users to ignore the problem and try anyway, which doesn't make a lot of sense if we know exactly where during evaluation it will fail.

So I think of meta.problems as a way for package maintainers to say that

I'm warning you that there's a problem, but if you wanna see for yourself or think it's not a problem for you, feel free to ignore this, in which case you're on your own and can't complain if you run into the problem you ignored
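A minimal sketch of the `or (throw ...)` pattern for the Darwin-style case above. The per-system source table is placeholder data: the keys only imitate NVIDIA's redist system names, the URLs are not real, and `srcFor` is a hypothetical name.

```nix
let
  # Placeholder per-system sources; keys imitate redist system names, URLs are fake.
  redistSources = {
    linux-x86_64 = "https://example.invalid/pkg-linux-x86_64.tar.xz";
    linux-sbsa = "https://example.invalid/pkg-linux-sbsa.tar.xz";
  };

  # Fail at the exact lookup that cannot succeed, with a message that says why.
  srcFor =
    redistSystem:
    redistSources.${redistSystem}
      or (throw "No redistributable for ${redistSystem}: this platform is not supported");
in
srcFor "linux-sbsa"
```

`srcFor "linux-sbsa"` evaluates to the placeholder URL, while `srcFor "darwin"` aborts evaluation at that exact lookup. Unlike a `meta.problems` entry, there is no handler a user could set to push past it, which matches the case where ignoring the problem could never succeed.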

Member

@Eveeifyeve Eveeifyeve Feb 17, 2026

A better message could be a warning that says:

Warning!!!!!! The package <package> <problem> and may lead to problems such as: <list>. We strongly recommend <recommendation>.

Please note: the issues listed above may not be covered by an issue.

For example, with the package nim1, where the problem is that it doesn't exist, the recommendation is nim2, and the possible problems include security vulnerabilities and package build issues, it would look like:

Warning!!!!!! The package nim1 doesn't exist and may lead to problems such as: security vulnerabilities & package build issues. We strongly recommend using nim or nim-unwrapped (v2).

Please note: the issues listed above may not be covered by an issue / will be automatically closed.

I suggest this because, personally, I find that the message you mentioned, @infinisil, uses a lot of idioms.

Comment on lines 3 to 18
Member Author

I'm pretty happy with this error message now tbh, but we can of course adjust this later if wanted

@infinisil infinisil force-pushed the rfc127 branch 2 times, most recently from a036cd4 to e382faf Compare February 20, 2026 14:18
@infinisil
Member Author

I've addressed all feedback now and would like to merge this soon.

I'll also try to get follow-ups to this done, including migrating broken and unsupported, and introducing a generic assertion problem type for CUDA and other use cases.

@infinisil
Member Author

Pushed to fix conflict. I'll go ahead with the merge now, I hope nothing breaks from this!

@infinisil infinisil enabled auto-merge February 26, 2026 14:11
@infinisil infinisil added this pull request to the merge queue Feb 26, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2026
@infinisil infinisil added this pull request to the merge queue Feb 26, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Feb 26, 2026
@infinisil
Member Author

There are some spurious GitHub Actions failures; let's try again.

@infinisil infinisil added this pull request to the merge queue Feb 26, 2026
Merged via the queue into NixOS:master with commit 73c2995 Feb 26, 2026
53 of 70 checks passed
@github-project-automation github-project-automation bot moved this to Done in Stdenv Feb 26, 2026
@infinisil
Member Author

Follow-up for meta.broken: #494416


Labels

1.severity: significant Novel ideas, large API changes, notable refactorings, issues with RFC potential, etc. 6.topic: stdenv Standard environment 8.has: documentation This PR adds or changes documentation 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-darwin: 1 This PR causes 1 package to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux.

Projects

Status: Done


8 participants