meta.problems: Custom package problems [RFC127]#478539
meta.problems: Custom package problems [RFC127]#478539infinisil merged 6 commits intoNixOS:masterfrom
Conversation
| - "deprecated": The package relies on software which has reached its end of life. | ||
| - "maintainerless": Automatically generated for packages with `meta.maintainers == []`. Unique, not manually specifiable. | ||
|
|
||
| Each problem has a handler that deals with it, which can be one of "error", "warn" or "ignore". |
There was a problem hiding this comment.
Nit: I think I'd prefer if this was called "severity".
Declaring what it is rather than what should be done with it feels more right to me.
There was a problem hiding this comment.
"Handler" also sort of implies that what's declared is an action or a command; perhaps even a custom one.
"Severity" OTOH is just a property of the problem itself.
There was a problem hiding this comment.
I like the idea. It's a derivation from the RFC, but I don't think we need to follow it to the dot.
There was a problem hiding this comment.
Although, this also implies that we should change config.problems.handlers, and I'm not sure I want to go that far.
|
Ping, would be great to get this merged swiftly so that people can try it out |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
ConnorBaker
left a comment
There was a problem hiding this comment.
Mostly nits/questions
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
infinisil
left a comment
There was a problem hiding this comment.
Moving the top-level CUDA conversation with @ConnorBaker into a thread
There was a problem hiding this comment.
Originally posted by @ConnorBaker in #478539 (comment)
@ConnorBaker Would you like some help with that? If you give me some more details about what you need I could take a look!
I'd love some help!
Most of the CUDA packages come from binary archives provided by NVIDIA (we call these "redists" because we found them at a URL with "redist" in the path, the presence of a "redistrib_.json" manifest, and the LICENSE for some of them allowing redistribution). We have a generic buildRedist function we use to make packages from those binary archives, defined here: https://github.com/NixOS/nixpkgs/blob/9774bb59bd3bb978cf86b38afc70f9c6d5505983/pkgs/development/cuda-modules/buildRedist/default.nix.
Since availability of CUDA packages depends on platform, CUDA version, CUDA capabilities (think the type of GPU), and versions of dependencies, there's a lot of ways things can break (either when trying to patchelf the binary or at runtime). Making matters more complicated, NVIDIA does a fair amount of optimistic binary loading through dlopen calls at runtime, depending on the code path and the environment (e.g., CUDA version, CUDA capability, platform, etc.).
We try to fail early (during evaluation) when possible to avoid very expensive (time, space, and compute) builds that would fail or produce binaries things known to fail. To do that, we have two different attributes set through passthru by buildRedist: brokenAssertions and platformAssertions.
brokenAssertions is defined here:
nixpkgs/pkgs/development/cuda-modules/buildRedist/default.nix
Lines 388 to 425 in 9774bb5
and platformAssertions is defined here:
nixpkgs/pkgs/development/cuda-modules/buildRedist/default.nix
Lines 427 to 446 in 9774bb5
Then, in meta, buildRedist uses the final copies of those values to set meta.broken and meta.badPlatforms:
nixpkgs/pkgs/development/cuda-modules/buildRedist/default.nix
Lines 455 to 456 in 9774bb5
The helper functions produce the expected values, additionally printing out failed assertions through builtins.traceVerbose: https://github.com/NixOS/nixpkgs/blob/3f96296da66f5ecf3d8106c61281b823949a56c0/pkgs/development/cuda-modules/_cuda/lib/meta.nix.
As an example of how this is used, look at cuDNN, which uses platformAssertions to ensure it is only available for supported CUDA capabilities:
nixpkgs/pkgs/development/cuda-modules/packages/cudnn.nix
Lines 52 to 104 in 9774bb5
I'd love to see how (or if) the current implementation of the problems RFC would address such a use case.
As an added bonus, I'd love to see the definition for cudaPackages.backendStdenv simplified: https://github.com/NixOS/nixpkgs/blob/9774bb59bd3bb978cf86b38afc70f9c6d5505983/pkgs/development/cuda-modules/backendStdenv/default.nix.
Beyond ensuring a compatible version of GCC (patched to use glibc/libstdc++ from the default stdenv) is available for NVCC, it also performs a fair amount of logic in determining the default set of CUDA capabilities for some version of CUDA as well as whether explicitly requested CUDA capabilities are valid for the given CUDA version. It doesn't make sense to attach these through buildRedist since it affects all CUDA packages, and causing evaluation of backendStdenv to fail also allows us to fail evaluation in the presence of incorrect configurations (since a number of the utilities in cudaPackages.flags are derived from cudaPackages.backendStdenv).
Originally posted by @infinisil in #478539 (comment)
@ConnorBaker Thanks for the detailed explanation! I've created another draft PR based on the one here to implement a broken error kind: tweag#111
With that, it should be possible for you to do this for assertions:
meta.problems = optionalAttrs (! assertion) {
libBeforeStatic.message = "lib output precedes static output"
} // optionalAttrs (! anotherAssertion) {
# ...
};We could also consider adding a generic enable/condition/assertion field to problems to make this more ergonomic.
If an assertion is wrong, it causes evaluation to fail with the message displayed, along with info on how specific assertions can be switched to just warn or be entirely ignored.
I believe this would be appropriate for both the current brokenAssertions and platformAssertions, while not setting any meta.broken and meta.badPlatforms values. I plan to make a PR switching all of them for you to test, but early feedback is also appreciated :)
Originally posted by @infinisil in #478539 (comment)
@ConnorBaker Can you check out tweag#112?
Originally posted by @ConnorBaker in #478539 (comment)
I had a chance to look at tweag#111 and tweag#112 and I like what you've done!
We could also consider adding a generic enable/condition/assertion field to problems to make this more ergonomic.
I personally would like to see that, or something which aligns the attributes with the NixOS module system assertions/warnings.
In testing out tweag#112, I noticed it seems there's a double newline before "Package problems":
$ NIXPKGS_ALLOW_UNFREE=1 nix eval .#pkgsForCudaArch.sm_61.cudaPackages.cutlass -L --impure --json
error:
… in the condition of the assert statement
at /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/lib/customisation.nix:450:9:
449| outPath =
450| assert condition;
| ^
451| drv.outPath;
… while evaluating the attribute 'handled'
at /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/pkgs/stdenv/generic/check-meta.nix:677:11:
676| valid = if isNull problems.error then "warn" else "no";
677| handled = handle {
| ^
678| inherit attrs meta;
(stack trace truncated; use '--show-trace' to show the full, detailed trace)
error: Package ‘cuda12.8-cutlass-3.9.2’ in /nix/store/yxy9bvxkvijkxpaz9qzap8gsjn4qf5rq-source/pkgs/development/cuda-modules/packages/cutlass.nix:211 has the following problems that must be acknowledged: [ "capabilities" ], refusing to evaluate.
Package problems:
- capabilities (kind "broken"): Not all capabilities are >= 7.0 (["6.1"])
You can use it anyway by ignoring its problems, using one of the
following methods:
a) For `nixos-rebuild` you can add "warn" or "ignore" entries to
`nixpkgs.config.problems.handlers` inside configuration.nix,
like this:
{
nixpkgs.config.problems.handlers = {
cutlass.capabilities = "warn";
};
}
b) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
"warn" or "ignore" to `problems.handlers` in
~/.config/nixpkgs/config.nix, like this:
{
problems.handlers = {
cutlass.capabilities = "warn";
};
}
See this page for more details: https://nixos.org/manual/nixpkgs/unstable#sec-problemsIs that intentional?
I'm glad to see these impact meta.available (which I use in some places to handle different configurations).
What do you imagine the use-case for the unsupported kind being and how would you say it differs from broken?
Originally posted by @infinisil in #478539 (comment)
I personally would like to see that, or something which aligns the attributes with the NixOS module system assertions/warnings.
Noted, but probably better in a follow-up PR to keep the scope of this one smaller.
Is that intentional?
Nope. Really the error format needs some tweaking in general imo, it's too big and contains duplicate info. How about this instead:
error: Package ‘a-0’ in /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-default.nix:18 has problems:
- deprecated: Package is deprecated and replaced by b.
- maintainerless: This package has no declared maintainer, i.e. an empty `meta.maintainers` and `meta.teams` attribute.
- removal: Package will be removed.
See https://nixos.org/manual/nixpkgs/unstable#sec-problems for info. To allow evaluation regardless, use:
- Nixpkgs import: import nixpkgs { config = <below code>; }
- NixOS: nixpkgs.config = <below code>;
- nix-* commands: Put below code in ~/.config/nixpkgs/config.nix
{
problems.handlers = {
a.deprecated = "warn"; # or "ignore"
a.maintainerless = "warn"; # or "ignore"
a.removal = "warn"; # or "ignore"
};
}
What do you imagine the use-case for the unsupported kind being and how would you say it differs from broken?
That's actually a very good point. These have very different notions:
- Unsupported: Not supposed to work, fix upstream
- Broken: Supposed to work according to upstream, fix the Nix side
We'd want to use both of these for CUDA
Originally posted by @ConnorBaker in #478539 (comment)
Ah, that makes sense. I worry that unsupported may not exactly align with how we'd want to use it for CUDA -- that is, to explain that the user has selected a configuration which will not or cannot work and that there is no upstream fix that would enable it. Does that make sense?
There was a problem hiding this comment.
@ConnorBaker Right, I see! Then I think it would be best to introduce a separate assertion problem kind for generic internal assertions that have been violated, not necessarily relating to broken or unsupported.
There was a problem hiding this comment.
Part of the reason I've done broken or unsupported system previously is to indicate or communicate that such a configuration just isn't possible and that even more than "this will fail to build" the user might get evaluation errors (e.g., trying to do a lookup in an attribute set for a CUDA binary for an unsupported platform, like Darwin, will throw a missing attribute exception) -- kind of like a "stop digging! nothing you want is here!".
There was a problem hiding this comment.
For those cases I actually think a simple foo.attrThatMightNotExist or (throw "This does not work!") would be best, because meta.problems will inherently allow users to ignore the problem and try anyways, which doesn't make a lot of sense if we know exactly where during evaluation it will fail.
So I think of meta.problems as a way for package maintainers to say that
I'm warning you that there's a problem, but if you wanna see for yourself or think it's not a problem for you, feel free to ignore this, in which case you're on your own and can't complain if you run into the problem you ignored
There was a problem hiding this comment.
A better message could be a warning that says:
Warning!!!!!! The package <package> <problem> and may lead into these problems like: <list>, We strongly recommend <recommendation>.
Please note: This issues listed above may not be covered in an issue.
So with a package being like nim1 and problem being it doesn't exist, recommend nim2 & problems may include security vulnerabilities, package building issue.
Would look like:
Warning!!!!!! The package nim1, doesn't exist and may lead into these problems like: security vulnerabilities & package building issues , We strongly recommend using nim or nim-unwrapped (v2).
Please note: This issues listed above may not be covered in an issue/will be automatically closed.
Because I find the message that you mentioned @infinisil having a lot of idioms personally.
There was a problem hiding this comment.
I'm pretty happy with this error message now tbh, but we can of course adjust this later if wanted
a036cd4 to
e382faf
Compare
|
I've addressed all feedback now and would like to merge this soon. I'll also try to get follow-ups to this done, including migration of |
The attributes of the warning/invalid values are never used anywhere, and maintaining these would make future changes harder
Not just used for errors anymore
See https://github.com/NixOS/rfcs/blob/master/rfcs/0127-issues-warnings.md Co-Authored-By: piegames <[email protected]> Co-Authored-By: AkechiShiro <[email protected]>
|
Pushed to fix conflict. I'll go ahead with the merge now, I hope nothing breaks from this! |
|
There's some spurious GitHub Actions failures, let's try again.. |
|
Follow-up for |
This PR implements RFC127, originally based on @piegamesde's initial implementation in #177272 and @AkechiShiro's follow up to add some tests in #338267.
This PR improves over the above by:
Things done
nix-build -A tests.problemsAdd a 👍 reaction to pull requests you find important.