Add filterFiles to lib and lib.sources#188301
Conversation
ursi
commented
Aug 25, 2022
lib/sources.nix
Outdated
There was a problem hiding this comment.
| This function always scans the entire directory tree. In large code bases, you can improve performance by filtering the second argument. |
(assuming you're flipping the parameters as suggested below, and we fix sourceLike support)
lib/sources.nix
Outdated
There was a problem hiding this comment.
This does not actually support all sourceLike values, such as the result of cleanSource. Types are a bit vague in Nix documentation, so I really can't blame you for not picking up on this subtlety.
There was a problem hiding this comment.
I just tried
lib.filesystem.listFilesRecursive (lib.cleanSource ./.)
in nixpkgs and it worked.
lib/sources.nix
Outdated
There was a problem hiding this comment.
This makes the source O(#files²) . Maybe something the memoizeSource function could be useful.
lib/sources.nix
Outdated
There was a problem hiding this comment.
This problem is not unique to this function, so it should be moved somewhere else.
There was a problem hiding this comment.
Am I good to pull this out into its own function and submit a PR with both, or should it be pulled out and put into a separate PR?
There was a problem hiding this comment.
Actually this is no good, because it reintroduces the parent path impurity.
Unfortunately it's up to us to make sure that the evaluation of a directory does not depend on the location of that directory. That location includes the basename: someone may use a different checkout, or a flake source may resolve to a different location depending on the entire project through the store path hash, thereby largely defeating the purpose of source filtering.
name should be "source" unless the caller specifies something else. In this case you can let the caller set the name afterwards with cleanSourceWith. @infinisil is working on #112083, which will add a more convenient function for setting the name.
There was a problem hiding this comment.
could you elaborate on what exactly "largely defeats the purpose of source filtering"?
There was a problem hiding this comment.
Apologies; I might have responded too quickly here. Nonetheless I am not sure if this new solution should be applied to this new function, and whether this new function fits the larger design of the source combinator library. I'll defer to @infinisil who's currently more up to speed about all of that, including lazy trees.
There was a problem hiding this comment.
@ursi I'm not entirely sure I understand this name handling here. What exactly does it do and what use cases are affected by this?
There was a problem hiding this comment.
It addresses the problem caused by using builtins.filterSource on a generated source. Such a source will have a name of the form /nix/store/c59y63zj2pgjx87184kv1kyxqqs9b5sl-source, which will be updated any time that source changes (A particularly common example of this is the source ./. inside a flake). Because filterSource uses the base name of the file for the name of the new derivation, this means that whenever the generated source changes, the source created from filterSource changes, even if the files are exactly the same - it has a different name.
This function checks to see if you're filtering a source from a derivation, and if you are, it just uses the generic name "source" instead of getting the name from the base name like filterSource would.
There was a problem hiding this comment.
I see, so on one hand this shouldn't be a problem anymore with lazy trees, which makes it so that the path to the local flake is not directly a nix store path anymore. But also, that Nix branch would break the use case this intends to fix, because baseNameOf for lazy paths (default for flakes) returns something like "virtual0000000000000000000000004-source". While I reported an issue with exactly this in that PR, all the proposed solutions would also break the code here (and if you consider paths to always be relative to a filesystem root, it also makes sense). In general I'd say that we shouldn't split "/nix/store/-" paths and try to infer something from that, it's not very sound in general, not just for the lazy trees PR.
On the other hand, @roberth's proposed solution of just always using "source" as the name and deferring changing of that name to other modifier functions sounds much better, I think we should go for that.
There was a problem hiding this comment.
I have gone ahead and just used the name "source" for everything.
lib/sources.nix
Outdated
There was a problem hiding this comment.
These two lines can be simplified to filter after the other changes.
|
Would it make sense to relativize the paths with respect to project root? This way someone can include/exclude directories "for free" with this api, for instance by using (If paths are not relativized, then you could test for |
|
Good point. It is unclear to me why |
Path values in Nix are always absolute. I don't think diverging from that is a great idea, because it creates a patch interface, compared to the other source functions. |
|
well for one, outputs |
|
(Relative paths also improves purity, since strictly speaking otherwise one can write a filter over e.g. |
|
@roberth If it's easy enough to demonstrate, can you give an example of the patch interference in code? |
|
Okay so I realized flakes actually fix the impurity issue of using absolute paths, since everything is moved to the nix store for evaluation, however, I still do think the relative path API has superior ergonomics. |
AFAIU, NixOS/nix#6530 makes it so flakes can be evaluated without copying them to the nix store first. |
|
Well if that's the case, the potentially the impurity argument still stands. |
|
I have pushed a new version with a lot of the changes made that had been pointed out. It is also much faster than before. |
|
I recently developed a safer and easier-to-use abstraction for source filtering in a draft PR to Nixpkgs, please take a look, try it out, and give feedback! https://discourse.nixos.org/t/easy-source-filtering-with-file-sets/29117 In particular the use case of this issue is covered with the generic let
fs = lib.fileset;
in
fs.fileFilter (file: file.ext == "html") ./.Though we can also easily add a convenience function like |
|
@infinisil in your example, does that just get all the html files directly in |
|
@ursi It traverses the entire |
Oh I'm just realizing that I answered something different than I think you asked. So the answer is that it only copies the html files into the Nix store. The other files are traversed recursively but not copied. |
|
I finally looked more into the fileset library. It does have one restriction that this does not - it doesn't work on any path, only local path values. So you cannot filter the result of another derivation. |
|
@ursi Indeed. What's your use case for filtering a store path? I'm wondering because maybe we could also have library functions for that, see also #264541. Note that if you need to filter a It does make a lot of sense for |
|
Well, the toy use case that made me realize this limitation was me trying to use your library to filter over all of nixpkgs to compare its performance with this function. I just tried pulling in nixpkgs from my project's flake input in the repl. |
|
@ursi I see, we obviously can't justify this as a use case for that feature, but if you have a practical use case let me know in #264541. I'd be also very happy to get feedback on the library in general, probably best to open issues and ping me if you encounter problems :) Regarding performance, I've done a benchmark before, and indeed it's less performant, however it's not possible to improve it without updating the builtins, see NixOS/nix#8820. In fact it's the same problem you've had to work around in this very PR here (doing a recursive In any case, I've done my best to optimise the file set library as much as possible without any such builtin, there's even a benchmark to check for regressions. |
|
I'll close this PR, since the file set library is fairly usable now. See #266356 for status, updates and feature requests! |