Conversation
Note that this is just a draft implementation. One thing that is puzzling me right now is that with the following expression, the output directory simply disappears after the build:

```nix
pkgs.nodePackages.fetchNodeModules {
  src = pkgs.fetchFromGitHub {
    owner = "nttgin";
    repo = "BGPalerter";
    rev = "v1.28.1";
    sha256 = "sha256-Y0atkJfZxjUOGPQ3goXS/YD5SsX9ZjpbM0Nc5IuaFP4=";
  };
  makeTarball = false;
  exposeBin = true;
  sha256 = "sha256-T9W96csyA+/fwFs1eNccslx5qcbVTpAtqLM1Xd32tk4=";
}
```

I'd like to know whether this problem is reproducible by others, and what the solution is.
What does this do that npmlock2nix (or one of the other 20 attempts to do npm things with nix) doesn't?
I think npmlock2nix is doing import from derivation, which this tool does not. |
I think npm supports yarn.lock as well: https://blog.npmjs.org/post/621733939456933888/npm-v7-series-why-keep-package-lockjson.html
It does with the normal `npm i`, which will automatically generate a `package-lock.json` from `yarn.lock`, but not with the deterministic `npm ci` we are using, which refuses to make any change to the lock files.
Also, this tool doesn't attempt to parse the
(force-pushed from 2c1d3e0 to 044844a)
Could we move the files under lib, share or opt?
Sure, it makes sense to follow the conventions, since we also have a `bin` (if `exposeBin` is set). Could you also test whether you can reproduce the problem I'm seeing with `exposeBin`?
I don't really have the time right now to fiddle with this.
(force-pushed from 044844a to 93c7d52)
Pushed a fix to change the directory structure (
roberth left a comment:
Reproducibility is enforced both in npm (lock file) and Nix (hash of final output).
Enforcement in Nix happens, but in a very narrow way. If you forget to update the hash, you will unknowingly be using stale dependencies until, hopefully, someone without access to your store or binary cache tries to build it, the fetcher fails, and the problem is detected. Considering that these will end up on cache.nixos.org, these problems can linger for a long time.
It's also worth considering that these store paths will duplicate a lot of files that would be shared store paths in the case of other solutions (hard-link optimization only applies to a local Nix store; nix-copy-closure and other forms of distribution, such as Docker images, are still affected).
The hashes in the examples have been updated.
We don't have a method for finding and updating these hashes when a change in logic inevitably requires it again. This is especially true when other projects call this function.
This might be alleviated somewhat by including a hash of relevant inputs in the name, forcing a new store path when those inputs change. However, the selection of those inputs is a trade-off between usability and correctness. Too many and you'll burden the user with unnecessary hash updates. Too few and we'll have reproducibility problems. This is not a choice we should have to make.
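One way to sketch the naming idea above (purely illustrative; the wrapper shape and attribute names are assumptions, not part of this PR):

```nix
# Sketch only: embed a hash of the selected "relevant" inputs in the
# derivation name, so a change to those inputs forces a new store path
# even if the output hash update is forgotten. Choosing which inputs to
# include is exactly the usability/correctness trade-off described above.
{ stdenv }:
{ pname, version, npmFlags ? [ ], sha256 }:
let
  inputHash = builtins.substring 0 8
    (builtins.hashString "sha256" (builtins.toJSON { inherit version npmFlags; }));
in
stdenv.mkDerivation {
  name = "${pname}-node-modules-${inputHash}";
  outputHashAlgo = "sha256";
  outputHashMode = "recursive";
  outputHash = sha256;
  # fetch logic elided
}
```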
Specifically with npm we don't have to make this trade-off because it provides us with perfectly usable lock files that we can parse with Nix to fetch dependencies in a reliably reproducible way.
This problem could be alleviated by including the lock file in the fixed-output derivation as well, and then diffing the lock file from the fixed-output derivation against the lock file shipped in the source.
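A minimal sketch of such a check (the function name and file paths are illustrative, not from this PR):

```shell
# Compare the lock file captured inside the fixed-output derivation
# against the lock file shipped in the source; fail if they diverged,
# which indicates the recorded output hash is stale.
check_lock() {
  fetched="$1"  # lock file from the fetcher output
  shipped="$2"  # lock file from the source tree
  if ! diff -u "$fetched" "$shipped"; then
    echo "package-lock.json changed; the fetcher hash needs updating" >&2
    return 1
  fi
}
```

In a real package this would run in `checkPhase`, after the fetcher output is available.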
Having thousands of node derivations also comes with a cost: evaluation memory and nixpkgs download/unpack size.
This does not really scale in nixpkgs. Dependency trees for the average Node.js package are so large that storing all lock files in nixpkgs adds an unacceptable file-size overhead; otherwise we would have switched to tools like https://github.com/nix-community/yarn2nix or similar. We cannot use import-from-derivation either, because it would increase evaluation time.
To add another data point: PeerTube's yarn.lock file is 8000 lines. That is a huge amount for a single package, even without the full dependency tree. I think this fetcher is the only sane approach for nixpkgs.
I expect this fetcher to be implementable as a non-fixed-output recursive derivation that
Recursive Nix is implemented but experimental, so while we could try it today, we can't use it by default in Nixpkgs. Eelco has raised a concern
However, I believe the impurity can be addressed without disallowing it. If for whatever reason that doesn't work out, this idea seems to be compatible with RFC40 ret-cont recursive Nix, a more restrictive form of recursion. Perhaps even RFC92, although I really can't confirm that now. With this solution on the horizon, I believe we can use this PR in the interim without creating a structural problem for Nix's perceived reliability.
That won't work for everything, because upstream does not always provide an (up-to-date) lock file.
npm can take almost anything as a downloaded file, and we are already fetching random git repos in nodePackages right now.
While true, I do not think it's relevant to my comment. It is a more general problem but the solution is easy enough: make a PR to update the lock file, and fetch from the PR. Preferably tag it in the fork as well to avoid loss on rebase, but that's yet another topic.
I intended
The algorithm you are describing is effectively import-from-derivation. It is not a good idea to implement it this way, because it would force a nix eval process to download all node packages just to evaluate nixpkgs. This is slow, takes a lot of disk space, and moves computational work from the builder to the evaluator (bad for how Nix CIs currently work).
It is not. Evaluation for the (large number of) fetcher calls happens inside the sandbox thanks to recursive Nix. I've clarified the original comment.
To the rest of Nixpkgs, a recursive Nix
Hercules CI has solved the user experience around import-from-derivation. It doesn't have such an evaluation bottleneck, and it's clear when its evaluator is waiting for IFD. IFD is still sequential as of today, but that's not a problem in practice unless you're building at the scale of Nixpkgs.
Ok. But it is not clear when we will be able to use this feature. So far, Nix releases keep being pushed forward year by year while more experimental features accumulate. To me it seems we could add it with a checksum now, and when the time comes, switch over to recursive Nix by dropping the need for a checksum.
That's nice; however, Hercules evaluation is still slower than both plain Nix and Hydra (which uses multiple threads).
So we agree.
Thanks for letting me know. I'll start to optimize eval soon.
Nope, it doesn't. It works in Hydra's restricted eval mode. You can use it with IFD if you don't want to copy the lock file into the repo. Given that this PR proposes yet another big FOD, I am really not in favor of this. We already had enough trouble in the Go and Rust world due to these issues. I was originally planning to propose npmlock2nix for nixpkgs, but I want it to be more mature first.
npmlock2nix is also not a solution; just look at how large a package-lock.json is: https://github.com/andig/evcc/blob/master/package-lock.json. That might work for projects that have the package-lock.json in the repo, but certainly not for nixpkgs. It also still increases evaluation time due to the large dependency graph it needs to process. I don't see any other scalable solution but these fixed-output derivations, and maybe recursive Nix in the future.
You forgot to inherit `exposeBin`, or am I missing something?
Same with `npmFlags`?
Last comment on this; I don't know what the best practice is here. But rather than inheriting variables that are unrelated to mkDerivation (I had to search the codebase to see if those variables were used elsewhere), could it make sense to use those variables directly where they are required? Something like:
- remove the `inherit runScripts`
- inside the build phase, have something like
''
  if ! ${lib.boolToString runScripts}; then
    NPM_FLAGS+=" --ignore-scripts"
  fi
''
Yes, I should add it there for clarity, even though it's pulled in with the other arguments in the last line with `//`.
What I know about bash is that `-n` checks whether a variable is non-empty.
What is a little confusing to me is that `production` is either true or false, but it's never empty.
Perhaps this is a behavior of mkDerivation that converts variables it inherits to empty strings if they are false. It seems a little unintuitive to me. I just want to confirm here that this is the case.
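For what it's worth, that is exactly what `mkDerivation` does when serializing attributes into environment variables: `true` becomes the string `1` and `false` becomes the empty string, which is why an emptiness test like `-n` works. A standalone illustration of the resulting bash checks:

```shell
# mkDerivation-style serialization of booleans into env vars:
# true -> "1", false -> "" (empty string).
production=""   # what the builder sees when production = false
runScripts="1"  # what the builder sees when runScripts = true

if [ -n "$production" ]; then
  echo "production-only install"
else
  echo "full install (including devDependencies)"
fi
```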
One thing that is curious to me is that cargo by default makes a tarball; there is no way to not make a tarball.
(ref)
I wonder what motivated that decision.
Checking the Go code, it does not make a tarball at all. (ref)
I wonder:
- why Go took the opposite approach from Rust
- why it would be good to have both
I'm guessing the tarball can help if you need to remote-build?
A new Nix version was released on nixpkgs-unstable today; just noting that the same disappearing-derivation problem happens with that new version as well.
Perhaps the default should be false here?
My impression is that fetchNodeModules will fetch the node_modules. There is therefore a high chance that people will use those dependencies to build something. Building will require dev dependencies. I don't see a context in which it would make sense to not have the dev dependencies in node_modules (since you won't be able to build the related project without them). Actually, if you want to optimize the closure size, you might want to first have the dev dependencies, build the project, then replace the node_modules folder with only the production dependencies.
Yes, the documentation should encourage packagers to run npm prune --production after building.
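In a consuming derivation, that suggestion could look roughly like this (a sketch; the phase split is illustrative and assumes devDependencies are only needed at build time):

```nix
# Build with the full dependency set, then prune devDependencies so
# the runtime closure captured in $out stays small.
postBuild = ''
  npm prune --production
'';
```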
Ok, then it's a matter of changing what the `production` flag does. At the moment it only installs dependencies and not devDependencies.
Just wondering here what the use case for this would be.
Would we want to expose binaries from the node_modules of a project?
Were you thinking of webpack and those build executables, for example?
I'm wondering if it's not better to provide the node_modules folder as-is and let the consumers of that folder handle the binary paths.
(This is the approach that node2nix takes, if I remember correctly.) Basically, if you want to use a single binary in your consuming derivation, you can address it by path; if you want to make all of them available, you can do `export PATH="$PWD/node_modules/.bin:$PATH"`.
(Sorry for the late response)
Yes, exactly. It's mostly there for convenience since some build scripts attempt to run those commands directly. I can do just fine without them, but we still need to resolve the other reproducibility issues.
In the interest of discussion, I went ahead and did an implementation for Yarn
Have we considered the old
This is the old workflow, though. Nowadays contributors do generate the exprs because it's fast, but it does avoid the technical issues.
To move this forward, I have removed the
For the missing output problem, an issue should be created for Nix once we figure out a minimal reproduction example. I looked a bit more into the tarball reproducibility issue that @happysalada and @yu-re-ka mentioned, and it appears that the only differences come down to the permissions of the symlinks in the tar. For example: ( For reference, I'm using ZFS as my filesystem.
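For anyone comparing tarballs across machines, pinning the usual nondeterminism sources first makes the symlink-mode difference easier to isolate. A sketch using GNU tar's reproducibility flags (this normalizes entry order, timestamps, and ownership, but does not by itself change the symlink modes recorded from the filesystem; the demo tree is illustrative):

```shell
# Minimal stand-in tree for a real node_modules output.
mkdir -p node_modules/.bin
printf '#!/bin/sh\n' > node_modules/.bin/tool
ln -sf tool node_modules/.bin/tool-link

# Pin entry order, timestamps, and ownership in the archive; symlink
# modes as stored by the filesystem may still differ across platforms.
tar --sort=name --mtime='@0' \
    --owner=0 --group=0 --numeric-owner \
    -cf node_modules.tar node_modules
```

Diffing two such archives (e.g. with `tar -tvf` on each) then shows only the genuinely platform-dependent differences, such as the symlink permissions mentioned above.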
(force-pushed from 73b4664 to 4e0c78a)
Thank you for your work on this! I've tested again, and here is the thing that works for me (just to confirm whether we still have a different hash on different platforms). The good news is that the bug of the disappearing path after building is gone with this new update. I think the main point is to figure out whether the hash is still different on different platforms, and if it is, what to do about it. There are a couple of additional nits.
@roberth I like the idea. People could generate their own node-packages to test when adding a package. When they are confident it works, they just commit the result of the expression, and then the weekly version update can be run. There might still be some hiccups, but it could be a short-term solution. I think it would just be a matter of documenting the workflow properly.
@happysalada would you like to lead such an effort? The
@roberth thank you for your idea. I am a little at capacity right now, but I have added testing this approach to my task list for as soon as I get a bit more time.
I marked this as stale due to inactivity.
Still important to me. Can I do something to help, @happysalada?
What is your level of interest in this? If you could do 3 tests with JS packages you are interested in and report the results, that would be amazing.
I've been using this downstream, but we should push this forward. I'll rebase the PR in a bit, and add the
Between medium and high, I am not sure though. I have different platforms, alas (only
Hi all. Over the past week or so, I've been working on a solution to this issue. (I actually didn't even know this PR was a thing before starting work on mine, until someone told me about it.) It populates a cache directory to avoid the issue of different hashes occurring across platforms, and because of its superior reproducibility, is most likely the better solution to push forward. There are more details in #189539 if you're interested in taking a look -- I welcome any and all comments/suggestions.
I haven't tested #189539 yet, but I think the approach there is less prone to breakages. Let's close this and focus our efforts on that.
Motivation for this change
This PR adds a new fetcher, `fetchNodeModules`, which can download deterministic dependency sets from the public NPM registry according to the NPM lock file (`package-lock.json`). This is useful for ensuring that Node.js packages are able to use dependencies at the exact versions specified by upstream, avoiding silent breakages caused by updates to the shared `nodePackages` package set, which does not respect semantic versioning. The fetcher will also greatly simplify the work required to package Node.js projects with complex dependencies without sacrificing reproducibility.

Background
Currently, adding a package that makes use of the NPM ecosystem to Nixpkgs is done by adding all NPM dependencies it requires to one of the following places, both of which come with their own downsides:
Shared `nodePackages` package set
We keep a shared list of NPM packages in `pkgs/development/node-packages/node-packages.json`. When people add packages to this list, a script (`generate.sh`) is run which fetches the latest versions of all packages in the list from NPM, writing the URLs to retrieve the packages, along with their hashes, into `node-packages.nix`. This workflow has several major downsides, all of which inevitably lead to silent breakages of unrelated packages.

An example of silent breakage can be found in the proposed addition of the BGPalerter package (#115474). A major update to `js-yaml` (3.14.1 -> 4.0.0) was introduced in a PR that updates the rust-analyzer package. Unfortunately, this release is not backwards-compatible (as indicated by the increase of the major release number) and it removes several deprecated functions that were available in earlier versions (e.g., `safeLoad`). For BGPalerter, the configuration wizard (run when there is no configuration) works without issue, while actually starting the monitoring daemon fails because of the removed `safeLoad` function. In other words, such breakages can be invisible even to those directly testing the packages.

Private dependency set
Alternatively, maintainers can opt to separate their packages from the global shared package list. These private sets work similarly to the global list, except at a smaller scale, with changes affecting only the package itself. A tool like `node2nix` or `yarn2nix` is used to convert the NPM/Yarn dependencies (with version constraints and, in most cases, lock files) into Nix expressions containing a list of URLs and hashes corresponding to each dependency. Dependencies are downloaded according to the version constraints specified in the package's manifest (`package.json`), and in most cases a lock file (`yarn.lock`, `package-lock.json`) will be used which specifies the exact version of each required dependency. Because of that, breakages do not occur for packages using their private dependency sets.

Due to the interconnected nature of the NPM ecosystem, the generated Nix expressions can be extremely long, often with line counts in the thousands or even tens of thousands. As a result, reviewers can be reluctant to accept new packages with their private dependency sets. The shared package set remains the preferred way for smaller leaf packages, which thus have to suffer from the issues listed above.
A list of packages that have their own dependency sets can be found by searching for `offline_cache =` and `generated by node2nix` in Nixpkgs.

fetchNodeModules

`fetchNodeModules` is similar to the `fetchCargoTarball` fetcher for Rust packages in that it relies on the native language-specific package manager (npm) to complete the entire process of resolving and downloading dependencies, resulting in a bundle of frozen packages. A fixed-output derivation is used to allow npm network access, and the results are compared against a predetermined hash in the Nix expression.
yarn2nixandnode2nixmentioned above. The difference is that the dependency list is not converted to Nix and only standard tools for the language are used. Reproducibility is enforced both in npm (lock file) and Nix (hash of final output).It is important to note that the original lock file and the translated Nix expressions from the aforementioned tools contain the same level of information.
fetchNodeModulesdoes not work with projects without a lock file.Example
A tarball of all runtime dependencies of BGPAlerter can be created with the expression below:
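The invocation discussed earlier in the thread looked like the following; with `makeTarball = true` (assumed here for the tarball case), the output hash would differ from the one shown, which came from the non-tarball run:

```nix
pkgs.nodePackages.fetchNodeModules {
  src = pkgs.fetchFromGitHub {
    owner = "nttgin";
    repo = "BGPalerter";
    rev = "v1.28.1";
    sha256 = "sha256-Y0atkJfZxjUOGPQ3goXS/YD5SsX9ZjpbM0Nc5IuaFP4=";
  };
  makeTarball = true;  # produce a tarball instead of a plain directory
  # Hash quoted earlier in the thread; stale for the tarball variant.
  sha256 = "sha256-T9W96csyA+/fwFs1eNccslx5qcbVTpAtqLM1Xd32tk4=";
}
```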
See also
Things done

- Tested using sandboxing (`sandbox` in `nix.conf` on non-NixOS Linux)
- Ran `nix-shell -p nixpkgs-review --run "nixpkgs-review wip"`
- Tested execution of all binary files (usually in `./result/bin/`)