-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
design: what checksum to use for snapshot content #38
Comments
Not sure if this will help, but have a look at BLAKE2. It's fast, strong and natively supports tree hashing. |
Hashing all the metadata is not easy (although I'd like to see continuity to evolve to handle that). So I think tarsum is OK atm, but should allow specifying other algorithms by design. |
@AkihiroSuda So would you ignore the hardlink issue or fix it while still using the tar headers? |
Oh, I was misunderstanding hardlink just causes false cache miss, but it seems actually causing false cache hit. This hardlink issue seems to need to be fixed. Also, can we open GitHub issue about this at moby/moby repo, so that we can collect information from users about how this issue affects real usecase. |
I'll leave this open as we can make changes if needed but |
docker build
uses a hash that contains some selection of tar headers together with the file data for detecting data changes. For directories, the hashes of subfiles are used.In some cases, it is not correct, for example, the hash would change if a filename changes while it has no effect on the destination where it would be copied.
A bigger problem is that it does not support hardlinks. As the hash is over a tar header the second link would only hash the parent path and be completely wrong. Example of this is in https://gist.github.com/tonistiigi/775cb15d3918958020bdd0165f776005
One other solution would be to use https://github.com/containerd/continuity . A problem with that is that it is not a tree hash, so getting a hash of a directory means recursively walking it.
mtree
has similar problems.We could make our hash version if we pick the right headers and solve a way to normalize hardlinks. As these checksums are local to builder, we could use a faster crypto-hash than sha256 as well.
At first, we should probably go with the current hash as everything needed for that is mostly already implemented in https://github.com/moby/moby/blob/master/builder/remotecontext/tarsum.go . But if we want to fix these above issues, now would be a time to do it.
@dmcgowan
The text was updated successfully, but these errors were encountered: