-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Description
(I wanted to split this thread from #296 (comment) .)
Let's discuss relations with IPFS here. As I see it, mainly a decentralized way to distribute nix-stored data would be appreciated.
What we might start with
The easiest usable step might be to allow distribution of fixed-output derivations over IPFS. That are paths that already are content-addressed, typically by (truncated) sha256 over either a flat file or a tar-like dump of a directory tree; more details are in the docs. These paths are mainly used for compressed tarballs of sources. This step itself should avoid lots of problems with unstable upstream downloads, assuming we could convince enough nixers to serve their files over IPFS.
Converting hashes
One of the difficulties is that we use different kinds of hashing than in IPFS, and I don't think it would be good to require converting those many thousands of hashes in our expressions. (Note that it's infeasible to convert among those hashes unless you have the whole content.) IPFS people might best suggest how to work around this. I imagine we want to "serve" a mapping from the hashes we use to the IPFS's hashes, perhaps realized through IPNS. (I don't know details of IPFS's design, I'm afraid.) There's an advantage that one can easily verify the nix-style hash in the end after obtaining the paths in any way.
Non-fixed content
If we get that far, it shouldn't be too hard to manage distributing everything via IPFS, as for all other derivations we use something we could call indirect content addressing. To explain that, let's look at how we distribute binaries now – our binary caches. We hash the build recipe, including all its recipe dependencies, and we inspect the corresponding narinfo URL on cache.nixos.org. If our build farm has built that recipe, various information is in that file, mainly the hashes of the content of the resulting outputs of that build and crypto-signatures of them.
Note that this narinfo step just converts our problem to the previous fixed-output case, and the conversion itself seems very reminiscent of IPNS.
Deduplication
Note that nix-built stuff has significantly greater than usual potential for chunk-level deduplication. Very often we do a rebuild of a package only because something in a dependency has changed, so there are only very minor changes expected in the results, mainly just exchanging the references to runtime dependencies as their paths have changed. (In seldom occasions even lengths of the paths would change.) There's a great potential to save on that during distribution of binaries, which would be utilized by implementing the section above, and even potential in saving disk space in comparison to our way of hardlinking equal files (the next paragraph).
Saving disk space
Another use might be to actually store the files in a FS similar to what IPFS uses. That seems a little more complex and tricky thing to deploy, e.g. I'm not sure someone already trusts the implementation of the FS enough to have the whole OS running of it.
It's probably premature to speculate too much on this use ATM; I'll just write I can imagine having symlinks from /nix/store/foo to /ipfs/*, representing the locally trusted version of that path. (That's working around the problems related to making /nix/store/foo content-addressed.) Perhaps it could start as a per-path opt-in, so one could move only the less vital paths out of /nix/store itself.
I can help personally with bridging the two communities in my spare time. Not too long ago, I spent many months on researching various ways to handle "highly redundant" data, mainly from the point of view of theoretical computer science.