stdenv: allow preserving meta fields in derivation#420575
stdenv: allow preserving meta fields in derivation#420575Princemachiavelli wants to merge 6 commits intoNixOS:stagingfrom
Conversation
|
So we'll have a list of meta attribute which gets added to the derivation as well. And It would be nice if we can come up with another name (other than |
|
CC @NixOS/stdenv for visibility. Disclosure: I work with @Princemachiavelli. I'm very interested in the functionality such a change enable with respect to security scans and tracking CVE fixes. |
c7fda36 to
45c4e15
Compare
Thanks, I've added I've renamed the new derivation field to Additionally, I've added a setupHook to also write this |
|
As a general note on |
8ae0b15 to
fd6c366
Compare
fd6c366 to
6bac05e
Compare
|
Looks like evaluation is failing because the parallel eval runs with |
|
Requested review from @philiptaron and @RossComputerGuy. |
|
@Princemachiavelli I'd like to see a check or assertion on the values being serialized into the derivation which prevents addition of any paths, derivations, or strings with context -- I believe allowing any of those would cause their closures to be propagated forever. |
6bac05e to
bce77ce
Compare
|
People likely interested in changes to My two cents: @Princemachiavelli do I understand correctly that you want this because
Not saying yours is wrong, just wondering whether the trade-offs of the variants are properly explored. |
Yes, at a high level this is the issue. My main argument for this PR's solution is it's simplicity and minimal impact on Nix/Nixpkgs. I think both of your proposed approaches have a lot of interesting use-cases but, given their complexity, they should be implemented as part of a larger project to improve many parts of Nix/Nixpkgs.
This is a proposed feature in upstream Nix (NixOS/nix#10780). While this is an interesting solution, it presents some obvious issues. If CPE (or any metadata) is incorrect and must be updated, how would we "force" an update of a store path if we are excluding this data from the input-addressing function? The "actual" problem trying to be solved here is avoiding rebuilds due to metadata changes. However, one could argue that the same issue occurs due to minor changes to the
I'm not 100% sure I'm following but I could imagine a solution that operates on the expression layer. Let me know if this tracks what you are thinking. I could see However, I feel that correctly accounting for stringContext and identifying run-time vs build-time dependencies would become very complicated. Even correctly implemented, it would "require" access to the Nix source code and original expression - it would never allow for computing the SBOM of an arbitrary store path. The association of store path and Nix expression would have to be tracked outside Nix as well. This would severely limit integration into existing security, CVE scanning, etc. software. |
roberth
left a comment
There was a problem hiding this comment.
Thank you for taking this initiative @Princemachiavelli!
I think it's unfortunate that we don't have a good opportunity to integrate meta cleverly into the derivation format without the rebuild trouble, which is feasible. We may get around to it one day, along with other changes.
Before such a future, I agree it's better to deliver a solution that works, and this approach supports that.
| ) | ||
| // optionalAttrs __structuredAttrs { env = checkedEnv; } | ||
| // optionalAttrs (preserveMetaFields != [ ]) { | ||
| nixMetaJSON = builtins.toJSON (filterAttrs (n: _: (elem n preserveMetaFields)) meta); |
There was a problem hiding this comment.
Let's make this a pleasant format and not propagate the pname tech debt. It only exists because we haven't found the courage to fix the problem of these redundant three variables.
For context, the way this works is that store path name == "${pname}-${version}" if the latter both are provided to mkDerivation.
pkg.name got the value of the store path name arguably by mistake.
The store path name part's purpose is for humans to read. It exists at the store layer and did not need to be passed back up to the Nix language layer. (More generally, I find that tacking the argument onto the returned derivation attributes is one of the biggest flaws of the language)
Instead, what Nixpkgs should do is claim the name name for itself, for its own purposes in packaging as opposed to a technical term from its underlying build system.
| nixMetaJSON = builtins.toJSON (filterAttrs (n: _: (elem n preserveMetaFields)) meta); | |
| nixMetaJSON = builtins.toJSON (filterAttrs (n: _: (elem n preserveMetaFields)) meta // { | |
| name = attrs.pname or attr.name; | |
| version = attrs.version or null; | |
| }); |
Having null values is good here. It shows that the omission of a value is not accidental or because of some change in the interface.
name, pname and version can be removed from preserveMetaFields.
| "license" | ||
| "vendor" | ||
| "cpe" | ||
| "mainProgram" |
There was a problem hiding this comment.
Adding mainProgram to any other output than bin or out would be misleading, because no such program exists there.
There was a problem hiding this comment.
We probably also have the opposite granularity problem, where one meta.json isn't sufficient for describing large store paths, for example a link farm, or (worse) a store path with combined content that's not linked but copied.
There was a problem hiding this comment.
It might be wrong to write this to any outputs at all.
These things are quite subtle and by doing this we're not just creating worse data quality, but setting our users up for failure, because outputs by their very nature do not take into account all the build inputs, whose licenses and vulnerabilities will affect the output anyway. The closure of outputs is incomplete, so any analysis done using it will also be incomplete.
To illustrate, a static library dependency becomes part of the output (bytes, vulns, licenses), but it does not leave behind a reference to the original library and its meta.json.
In other words, tools should take into account the derivation closure, not just the output's closure. By storing the meta info in the derivation closure, we provide it faithfully in its original shape, and we set up our users to consider how to deal with these characteristics, instead of nudging them down the wrong path.
There was a problem hiding this comment.
@roberth There's certainly a lot of truth to your analysis. I still think there may be additional value in keeping the meta in the outputs as well - even if it's only ~95% accurate. Even using derivations, do all language ecosystems (rust, go, npm, etc.) have a single derivation per-dependency? If not, we would encounter the same issue.
In any case, updating bonbom/sbomnix to take advantage of the derivation directly will take some time anyway and is probably sufficient for my uses. I'll move the nix-support/meta.json stdenv hook into it separate PR.
There was a problem hiding this comment.
In other words, tools should take into account the derivation closure, not just the output's closure. By storing the meta info in the derivation closure, we provide it faithfully in its original shape, and we set up our users to consider how to deal with these characteristics, instead of nudging them down the wrong path.
I don't understand the exclusive OR here. This PR stores the info both in the derivation (all derivation arguments are exposed in the .drv after all) as well as in the output. It give you both.
I think having meta information of your runtime dependencies can still be useful. e.g. for a runtime threat scanner to be able to tell you if you're running a vulnerable version of something. Or for a container scanner that scans a nix-built container to match store paths to package databases. Then afterwards people can always reach for the derivation-level to get the whole accurate build-time view.
Even using derivations, do all language ecosystems (rust, go, npm, etc.) have a single derivation per-dependency? If not, we would encounter the same issue.
buildGoModule and friends should be responsible for customizing the meta hook to output their SBOM information in nix-support/. Go actually already stores dependency information of what it was built with inside the binary. So the meta information is already available.
% nix shell nixpkgs#go nixpkgs#caddy
% go version -m $(which caddy)
/nix/store/k369dl06sznpappxj4wr5fmh84x0zpjf-caddy-2.10.0/bin/caddy: go1.24.4
path github.com/caddyserver/caddy/v2/cmd/caddy
mod github.com/caddyserver/caddy/v2 (devel)
dep cel.dev/expr v0.19.1
dep dario.cat/mergo v1.0.1
dep filippo.io/edwards25519 v1.1.0
dep github.com/BurntSushi/toml v1.4.0
dep github.com/KimMachineGun/automemlimit v0.7.1
dep github.com/Masterminds/goutils v1.1.1
(...)
build -buildmode=exe
build -compiler=gc
build -tags=nobadger,nomysql,nopgx
build -trimpath=true
build CGO_ENABLED=1
build GOARCH=arm64
build GOOS=darwin
Bombon also has some code that does something similar for rust: https://github.com/nikstur/bombon/blame/main/nix/passthru-vendored.nix
The same we would have to do for statically linked binaries. Ideally the output of the statically built derivation contains meta information on what dependencies ended up being linked into the static binary (though modulo the nix store path. as we don't want references in static outputs I guess?). Similar to how we find propagated build inputs recursively for all our build inputs in setup.sh's findInputs, we can also collect meta info recursively of all out build inputs and then write the union of that as a build output. This might over-approximate as maybe not all buildInputs are used for linking in the end. But if we want exact info then this needs collaboration with the build tools of said packages to produce these info instead. See e.g. https://ariadne.space/2025/02/08/c-sboms-and-how-pkgconf.html
There was a problem hiding this comment.
e.g. something like this (took the code from the pkg-config hook and adjusted.)
The trick is that the addToMetaPath hook gets called for all (propagated) inputs of the derivation. So it shows all our build time dependencies.
# Skip setup hook if we're neither a build-time dep, nor, temporarily, doing a
# native compile.
#
# TODO(@Ericson2314): No native exception
[[ -z ${strictDeps-} ]] || (( "$hostOffset" < 0 )) || return 0
addToMetaPath() {
# See ../setup-hooks/role.bash
local role_post
getHostRoleEnvHook
addToSearchPath "META_DEPENDENCIES${role_post}" "$1/nix-support/meta.json"
}
# See ../setup-hooks/role.bash
getTargetRole
getTargetRoleWrapper
addEnvHooks "$targetOffset" addToMetaPath
# No local scope in sourced file
unset -v role_post
writeMetaJson() {
mkdir -p "$output/nix-support"
# TODO: idk what the role_post stuff is needed here because not familiar with cross compile
IFS=':' read -ra dependencies <<< "$META_DEPENDENCIES"
# add a dependencies key containing the recursive meta json contents of all dependencies
# code currently assumes `meta.json` always exists
{ echo "$nixMetaJSON" ; jq '{dependencies:.}' --slurp "${dependencies[@]}" } | jq --slurp . > "$output/nix-support/meta.json"
}
fixupOutputHooks+=(writeMetaJson)There was a problem hiding this comment.
buildGoModule and friends could have their own setup hooks that add their own specific meta json with information we don't get at the derivation level and have those merged as well.
There was a problem hiding this comment.
I think having meta information of your runtime dependencies can still be useful. e.g. for a runtime threat scanner to be able to tell you if you're running a vulnerable version of something.
I'm not debating the usefulness of runtime dependency info, but we don't have that.
Runtime dependencies and an output's store path references are different concepts. (in a similar way that hermeticity and reproducibility should not be confused)
It surely doesn't help that some of us occasionally use "runtime dependency" for "dependency of an output" and we need to stop doing that. If we're talking about analyses, doubly so.
And herein lies the problem: how do you make sure that the runtime dependencies are even accounted for?
Static linking, header-only C/C++ deps, minified deps, or really any sort of bundled dependencies are invisible to the output's Nix reference graph. Unfortunately for this use case, that is actually a good thing. We don't want unnecessary store paths to be required at runtime.
The usual solution for this problem is to use a different output, but that immediately defeats the apparent goal of having a report readily available without using any Nix commands.
A built output with metadata may well serve a purpose, but any analysis that isn't derivation-aware is bound to have serious problems, so a built metadata thing in an output should only be considered in the context of a derivation-aware analysis.
There was a problem hiding this comment.
Another use case for this output metadata is license checking. This system, for example, would allow one to much more easily check if their deployed system has any GPL components.
Of course, static linking/header only libraries are still an issue, but it's much easier to start from here and add a few items manually than starting from the entire derivation closure and removing all transitive build-time dependencies.
The bash version string may not be the expected format until the fixed point is fully evaluated.
5ba1f79 to
748bc8c
Compare
Is that example still correct? It looks like it is using structured attrs and the JSON is still converted to a string, in contract to the PR description. @robert wrote
I agree. In the past I was more keen on putting the information elsewhere, but now I rather just do whatever Nixpkgs want to not block things. We can always fix the unnecessary rebuild problem later. |
|
I’d be really reluctant to extend the amount by which we mix |
2bd3db2 to
e42b2cd
Compare
|
Yeah that makes sense to me @emilazy. Just to be clear, I didn't mean to hurry it along before release, but was just belatedly responding to the ping from @fricklerhandwerk, and didn't want it to be held up on my account. I'm fine if it is held up for other reasons which are not me. |
|
@Ericson2314 @emilazy @ConnorBaker |
Changes
identifiers,license,mainProgram, andvendorfields from themkDerivationattrSet under a newnixMetafield in the derivation.nixMetarespects__structuredAttrs; if enabled the existing attribute structure of the aforementioned attributes is preserved. If__structuredAttrsis false, the attributes will be encoded as a JSON string.nixMeta.Example Derivation output
For example, a derivation's
envwill look like this if__structuredAttrsis false:{ "/nix/store/aflkbqqs7f8hyp0l6przb3bpxr5xs1dh-hello-2.12.2.drv": { "args": [ "-e", "/nix/store/l622p70vy8k5sh7y5wizi5f2mic6ynpg-source-stdenv.sh", "/nix/store/shkw4qm9qcw5sc5n1k5jznc83ny02r39-default-builder.sh" ], "builder": "/nix/store/mfy6cpswaknpqj1jv7h5s27dh60ziv3p-bash-5.3p3/bin/bash", "env": { "nixMeta": "{\"identifiers\":{\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"vendor\":\"gnu\"},\"possibleCPEs\":[{\"cpe\":\"cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*\",\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"update\":\"*\",\"vendor\":\"gnu\",\"version\":\"2.12.2\"}},{\"cpe\":\"cpe:2.3:a:gnu:hello:2.12:2:*:*:*:*:*:*\",\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"update\":\"2\",\"vendor\":\"gnu\",\"version\":\"2.12\"}}],\"v1\":{\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"vendor\":\"gnu\"},\"possibleCPEs\":[{\"cpe\":\"cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*\",\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"update\":\"*\",\"vendor\":\"gnu\",\"version\":\"2.12.2\"}},{\"cpe\":\"cpe:2.3:a:gnu:hello:2.12:2:*:*:*:*:*:*\",\"cpeParts\":{\"edition\":\"*\",\"language\":\"*\",\"other\":\"*\",\"part\":\"a\",\"product\":\"hello\",\"sw_edition\":\"*\",\"target_hw\":\"*\",\"target_sw\":\"*\",\"update\":\"2\",\"vendor\":\"gnu\",\"version\":\"2.12\"}}]}},\"license\":{\"deprecated\":false,\"free\":true,\"fullName\":\"GNU General Public License v3.0 or later\",\"redistributable\":true,\"shortName\":\"gpl3Plus\",\"spdxId\":\"GPL-3.0-or-later\",\"url\":\"https://spdx.org/licenses/GPL-3.0-or-later.html\"},\"mainProgram\":\"hello\"}", "out": "/nix/store/23p192rg0c8r0lzqgrr0gykk6vlwlzj5-hello-2.12.2", [...] "system": "x86_64-linux", "version": "2.12.2" }, "inputDrvs": {}, [...] }If
__structuredAttrsis enabled, thennixMetais added as a attrSet.{ "/nix/store/5gpn6haap01zy2l8xdnx84my6lafshkn-hello-2.12.2.drv": { "args": [ "-e", "/nix/store/l622p70vy8k5sh7y5wizi5f2mic6ynpg-source-stdenv.sh", "/nix/store/shkw4qm9qcw5sc5n1k5jznc83ny02r39-default-builder.sh" ], "builder": "/nix/store/mfy6cpswaknpqj1jv7h5s27dh60ziv3p-bash-5.3p3/bin/bash", "env": { "out": "/nix/store/g45fpaampg8hfwf8cj61y8pl1nbjwsfs-hello-2.12.2" }, "inputDrvs": { [...] } }, "inputSrcs": [...], "name": "hello-2.12.2", "outputs": { "out": { "path": "/nix/store/g45fpaampg8hfwf8cj61y8pl1nbjwsfs-hello-2.12.2" } }, "structuredAttrs": { [...] "name": "hello-2.12.2", "nativeBuildInputs": [ "/nix/store/671cssvzsgf47jqxm8zcl7lgpnbqivm3-version-check-hook" ], "nixMeta": { "identifiers": { "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "vendor": "gnu" }, "possibleCPEs": [ { "cpe": "cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*", "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "update": "*", "vendor": "gnu", "version": "2.12.2" } }, { "cpe": "cpe:2.3:a:gnu:hello:2.12:2:*:*:*:*:*:*", "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "update": "2", "vendor": "gnu", "version": "2.12" } } ], "v1": { "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "vendor": "gnu" }, "possibleCPEs": [ { "cpe": "cpe:2.3:a:gnu:hello:2.12.2:*:*:*:*:*:*:*", "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "update": "*", "vendor": "gnu", "version": "2.12.2" } }, { "cpe": "cpe:2.3:a:gnu:hello:2.12:2:*:*:*:*:*:*", "cpeParts": { "edition": "*", "language": "*", "other": "*", "part": "a", "product": "hello", "sw_edition": "*", "target_hw": "*", "target_sw": "*", "update": "2", "vendor": "gnu", "version": "2.12" } } ] } }, "license": { "deprecated": false, "free": true, "fullName": "GNU General Public License v3.0 or later", "redistributable": true, "shortName": "gpl3Plus", "spdxId": "GPL-3.0-or-later", "url": "https://spdx.org/licenses/GPL-3.0-or-later.html" }, "mainProgram": "hello" }, "outputChecks": { "out": {} }, "outputs": [ "out" ], "patches": [], "pname": "hello", [...] }Proposal Background
Currently, package metadata is stripped before creating the derivation. Generally, we only retain the package name and version in the derivation and we loosely retain this is the nix store paths except reconstructing the exact name and version is imprecise.
This lost of meta information makes generating correct and complete SBOMs nearly impossible. Tools like
sbomnixrequire matching store paths and derivations to the correctmetainformation solely on the package name and version which often leads to incorrect matches and incomplete SBOM reports.An existing PR (#409797) to improve the state of CVE tracking will add the complete CPE identifier to the
metaattribute set. While this is a great improvement, nixpkgs still lacks method to query the meta/CPE information for a given derivation or store-path. A previous PR #256296, attempted to resolve this by including the entiremetaattrSet in the derivation but concerns over the risk of mass-rebuilds due to minor changes (such as to the maintainers list) stalled it's adoption.Instead of including all meta attributes, this PR includes subset of meta attributes;
identifiers,license,mainProgram, andvendor. Changes to these attributes are in-frequent or would involve a rebuild anyway (e.g updatingmainProgram).