buildcache create: reproducible tarballs #35623
Merged
alalazo merged 7 commits into spack:develop on Mar 8, 2023
Currently `spack buildcache create` creates compressed tarballs that differ between each invocation, thanks to:
1. The gzip header containing an mtime set to `time.time()`.
2. The generated buildinfo file, which has a different mtime every time.

To avoid this, you have to explicitly construct `GzipFile` yourself, since the `tarfile` API doesn't expose the mtime arg, and we have to manually create the tarinfo object for the buildinfo metadata file.
…ymlinks: set 0o755 permissions in tarfile; other files use 0o644
alalazo previously approved these changes on Mar 8, 2023
alalazo approved these changes on Mar 8, 2023
jmcarcell pushed a commit to key4hep/spack that referenced this pull request on Apr 13, 2023
Currently `spack buildcache create` creates compressed tarballs that differ between each invocation, thanks to:
1. The gzip header containing an mtime set to `time.time()`.
2. The generated buildinfo file, which has a different mtime every time.

To avoid this, you have to explicitly construct `GzipFile` yourself, since the `tarfile` API doesn't expose the mtime arg, and we have to manually create the tarinfo object for the buildinfo metadata file.

Normalize mode: regular files & hardlinks executable by user, dirs, symlinks: set 0o755 permissions in tarfile; other files use 0o644
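Currently `spack buildcache create` creates compressed tarballs that differ between each invocation (even when nothing is modified in the prefix), thanks to:
1. The gzip header containing an mtime set to `time.time()`.
2. The generated buildinfo file, which has a different mtime every time.

To avoid this, construct `GzipFile` explicitly (since the `tarfile` API doesn't
expose the mtime arg) and update the tarinfo entries.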
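As a rough illustration of the approach (not the code from this PR), the sketch below builds a gzip-compressed tarball in memory with a pinned gzip header timestamp and pinned entry timestamps. The helper name `reproducible_targz` and the `{name: bytes}` input are assumptions made for the example; only standard-library calls are used.

```python
import gzip
import io
import tarfile


def reproducible_targz(files):
    """Build a .tar.gz in memory from a {name: bytes} mapping (hypothetical helper)."""
    buf = io.BytesIO()
    # filename="" keeps any file name out of the gzip header, and mtime=0 pins
    # the header timestamp instead of letting it default to time.time().
    with gzip.GzipFile(filename="", mode="wb", compresslevel=6, mtime=0, fileobj=buf) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            # Sort names so the entry order does not depend on dict order.
            for name in sorted(files):
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = 0  # fixed timestamp for every entry
                tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```

Calling the helper twice with the same mapping should yield byte-identical output, which is the property the PR is after.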
Notice that for distribution (`spack buildcache create`), there is no point
in recording the user and group ids and names, since they don't
exist on other systems. In fact, that data is only ever used when Python
is run with privileges, since only then can it try to restore the
ownership. I think it's a really bad idea to have this in Spack; in fact, it
may cause problems when extracting a binary tarball in a Docker
container where the user is root, and Python fails to change user ids
because the users don't exist there. (We've seen this in the past for
source tarballs, actually, so we changed to `--no-same-owner` or something similar.)
To fix the ownership issue: set uid/gid to 0, meaning that if run under
elevated privileges, the user is "changed" to root, i.e. not changed at
all.
Lastly, in binary distribution, we should only ever add dirs, files, symlinks
and hardlinks to the tarball, not character/block devices and fifos, so
this change explicitly drops those types of entries.
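The ownership reset and the entry-type restriction described above can be sketched as a `filter=` callback for `tarfile.TarFile.add`. The function name `normalize_entry` is hypothetical and this is not the PR's implementation; it only shows one way to express the two rules with the standard library.

```python
import tarfile


def normalize_entry(info: tarfile.TarInfo):
    """Hypothetical `filter=` callback for tarfile.TarFile.add."""
    # Keep only directories, regular files, symlinks and hardlinks.
    # Returning None tells TarFile.add to skip the entry, which drops
    # character/block devices and fifos.
    if not (info.isdir() or info.isreg() or info.issym() or info.islnk()):
        return None
    # Ownership is meaningless on other systems: use uid/gid 0 and no names.
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info
```

It would be used as `tar.add(prefix, filter=normalize_entry)`, so every entry found under the prefix passes through the callback before it is written.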
For file modes, normalize like git: `0o644` for files/links that are not
executable by user, and `0o755` otherwise (including dirs, symlinks).

This also prevents surprises when extracting a tarball with `tar xf`,
since users wouldn't have to worry (as much) about newly created
files being world-writable and executable.
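A minimal sketch of that mode rule, assuming a hypothetical helper `normalized_mode` that takes a `tarfile.TarInfo` and returns the mode to store:

```python
import stat
import tarfile


def normalized_mode(info: tarfile.TarInfo) -> int:
    """Return 0o755 or 0o644 for a tar entry (hypothetical helper)."""
    # Directories and symlinks always get 0o755.
    if info.isdir() or info.issym():
        return 0o755
    # Regular files and hardlinks get 0o755 only if the user-executable bit is set.
    if info.mode & stat.S_IXUSR:
        return 0o755
    return 0o644
```

For example, an entry stored on disk as `0o700` would be recorded as `0o755`, and one stored as `0o666` would be recorded as `0o644`.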
A test is added that checks for bitwise reproducibility of a compressed
tarball, even when the timestamps on disk are altered. Also a test is
added to check file mode normalization.
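The reproducibility check could look roughly like the sketch below. This is not the test added in the PR; `tarball_digest`, `_reset` and `check_reproducible` are hypothetical names, and the example uses a temporary directory in place of a real install prefix.

```python
import gzip
import hashlib
import io
import os
import tarfile
import tempfile


def _reset(info):
    # Pin the metadata that would otherwise vary between runs or machines.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def tarball_digest(prefix):
    """SHA-256 of a gzip-compressed tarball of `prefix`, built in memory."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", mtime=0, fileobj=buf) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            tar.add(prefix, arcname="prefix", filter=_reset)
    return hashlib.sha256(buf.getvalue()).hexdigest()


def check_reproducible():
    """Return True if changing timestamps on disk leaves the tarball unchanged."""
    with tempfile.TemporaryDirectory() as prefix:
        path = os.path.join(prefix, "file.txt")
        with open(path, "w") as f:
            f.write("hello")
        first = tarball_digest(prefix)
        # Alter the access and modification times on disk, then rebuild.
        os.utime(path, (1_000_000, 1_000_000))
        second = tarball_digest(prefix)
    return first == second
```

Comparing digests rather than sizes or member lists is what makes this a bitwise check: any stray timestamp in the gzip header or in a tar entry would change the hash.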