
buildcache create: reproducible tarballs #35623

Merged
alalazo merged 7 commits into spack:develop from haampie:fix/reproducible-repeated-buildcache-create on Mar 8, 2023

haampie (Member) commented Feb 22, 2023

Currently `spack buildcache create` creates compressed tarballs that
differ between invocations (even when nothing in the prefix has changed),
because of:

  1. The gzip header containing an mtime set to `time.time()`
  2. `TarFile.add()` copying stat info (uid/gid/uname/gname/mtime) into each entry.

To avoid this, construct the `GzipFile` explicitly (since `tarfile.open`
doesn't expose gzip's `mtime` argument) and normalize the tarinfo entries.
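A minimal sketch of that approach, assuming an in-memory mapping of archive names to file contents (the `write_deterministic_tar_gz` helper and its `files` argument are hypothetical, not Spack's code). `gzip.GzipFile` accepts an `mtime` argument, so wrapping it in a `tarfile.TarFile` gives the same bytes on every run:

```python
import gzip
import io
import tarfile


def write_deterministic_tar_gz(out, files):
    """Write a .tar.gz to the binary file object `out`.

    `files` maps archive names to bytes (a hypothetical input format
    used only for this sketch).
    """
    # tarfile.open(mode="w:gz") offers no way to set the gzip header
    # mtime, so construct the GzipFile ourselves with mtime=0. An empty
    # filename keeps the output file name out of the gzip header as well.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=0) as gz:
        with tarfile.TarFile(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
            # Stable entry order: iterate over sorted names.
            for name in sorted(files):
                data = files[name]
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = 0  # fixed timestamp instead of "now"
                tar.addfile(info, io.BytesIO(data))
```

Fixing each entry's `mtime` and the entry order matters as much as the gzip header: any field that varies between runs breaks bitwise reproducibility.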

Notice that for distribution (`spack buildcache create`), there is no point
in recording user and group ids and names, since they don't exist on
other systems. In fact, that data is only used when Python runs with
elevated privileges, since only then can it try to restore ownership.
I think it's a really bad idea to have this in Spack: it may cause
problems when extracting a binary tarball in a Docker container where
the user is root, and Python fails to change user ids because those
users don't exist there. (We've seen this in the past for source
tarballs, which is why we switched to `--no-same-owner` or something
similar.)

To fix the ownership issue, set uid/gid to 0: if extraction runs with
elevated privileges, the owner is "changed" to root, i.e. not changed at
all.
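As a sketch, this normalization fits the `filter` callback of `TarFile.add()`. The name `normalize_ownership` is hypothetical, and clearing `uname`/`gname` is an assumption of this sketch: when extracting as root, Python's `tarfile` looks up the names before falling back to the numeric ids, so zeroed ids alone would not be enough.

```python
import tarfile


def normalize_ownership(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Hypothetical filter for TarFile.add(): drop ownership data."""
    # uid/gid 0: a privileged extraction "changes" the owner to root.
    info.uid = 0
    info.gid = 0
    # Assumption (not stated in the PR text): also clear the names, since
    # tarfile resolves uname/gname first when extracting as root.
    info.uname = ""
    info.gname = ""
    return info
```

It would be used as `tar.add(path, filter=normalize_ownership)`; the callback runs for every member that `add()` visits, including those found by recursion.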

Lastly, in binary distribution we should only ever add directories,
regular files, symlinks, and hardlinks to the tarball, never
character/block devices or FIFOs, so this change explicitly drops those
types of entries.
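Dropping unsupported entry types can use the same `filter` hook: returning `None` tells `TarFile.add()` to skip that member. A minimal sketch (the function name is hypothetical):

```python
import tarfile


def keep_distributable_types(info: tarfile.TarInfo):
    """Hypothetical filter for TarFile.add(): keep only directories,
    regular files, symlinks and hardlinks."""
    if info.isreg() or info.isdir() or info.issym() or info.islnk():
        return info
    # Character/block devices and FIFOs: returning None excludes the entry.
    return None
```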

For file modes, normalize like git: 0o644 for regular files and hardlinks
that are not executable by the owner, and 0o755 otherwise (including
directories and symlinks). This also prevents surprises when extracting a
tarball with `tar xf`, since users don't have to worry (as much) about
newly created files being world-writable or executable.
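The git-style rule could look like the following sketch, again written as a hypothetical `filter` callback; `stat.S_IXUSR` is the owner-execute bit:

```python
import stat
import tarfile


def normalize_mode(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Hypothetical filter: git-style permission normalization.

    Directories, symlinks, and files/hardlinks executable by the owner
    get 0o755; all other files/hardlinks get 0o644.
    """
    if info.isdir() or info.issym() or info.mode & stat.S_IXUSR:
        info.mode = 0o755
    else:
        info.mode = 0o644
    return info
```

Only the owner-execute bit is consulted, so a file that is executable by group or others but not by its owner ends up as 0o644, and world-writable bits are always dropped.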

A test is added that checks bitwise reproducibility of a compressed
tarball, even when the timestamps on disk are altered. Another test
checks file mode normalization.
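A sketch of such a reproducibility test, assuming a pytest-style test function and the hypothetical helpers `_reset` and `tarball_bytes` (this is not Spack's actual test code). It builds a tarball, rewrites the on-disk timestamps with `os.utime`, builds it again, and compares the bytes:

```python
import gzip
import io
import os
import tarfile
import tempfile


def _reset(info: tarfile.TarInfo) -> tarfile.TarInfo:
    # Hypothetical minimal normalization: fixed mtime, no ownership.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def tarball_bytes(prefix: str) -> bytes:
    # Hypothetical helper: gzip header mtime fixed to 0, entries normalized.
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.TarFile(fileobj=gz, mode="w", format=tarfile.PAX_FORMAT) as tar:
            tar.add(prefix, arcname="prefix", filter=_reset)
    return buf.getvalue()


def test_tarball_is_bitwise_reproducible():
    with tempfile.TemporaryDirectory() as prefix:
        bindir = os.path.join(prefix, "bin")
        os.makedirs(bindir)
        tool = os.path.join(bindir, "tool")
        with open(tool, "w") as f:
            f.write("#!/bin/sh\necho hi\n")
        first = tarball_bytes(prefix)
        # Alter timestamps on disk; the compressed tarball must not change.
        for path in (prefix, bindir, tool):
            os.utime(path, (1_000_000, 1_000_000))
        second = tarball_bytes(prefix)
        assert first == second
```

Since Python 3.7, a recursive `TarFile.add()` visits directory entries in sorted order, so the entry order is stable as well.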

spackbot-app (bot) added the binary-packages and core (PR affects Spack core functionality) labels Feb 22, 2023
haampie force-pushed the fix/reproducible-repeated-buildcache-create branch 2 times, most recently from 42f80d3 to 442dc1f, February 22, 2023 20:05
haampie force-pushed the fix/reproducible-repeated-buildcache-create branch 2 times, most recently from eebb375 to 838e0d0, February 24, 2023 14:22
haampie and others added 4 commits February 24, 2023 16:46
Currently `spack buildcache create` creates compressed tarballs that
differ between invocations, because of:

1. The gzip header containing an mtime set to `time.time()`
2. The generated buildinfo file, which has a different mtime every time.

To avoid this, the `GzipFile` has to be constructed explicitly, since
`tarfile.open` doesn't expose gzip's `mtime` argument, and the tarinfo
object for the buildinfo metadata file has to be created manually.
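Building the metadata entry by hand means no stat info is copied from disk. A minimal sketch, where `add_buildinfo` and the archive name in the usage are hypothetical:

```python
import io
import tarfile


def add_buildinfo(tar: tarfile.TarFile, arcname: str, content: bytes) -> None:
    """Hypothetical helper: add an in-memory metadata file with fixed metadata.

    A hand-built TarInfo already defaults to mtime=0, uid/gid 0 and empty
    uname/gname; they are set explicitly here to make the intent obvious.
    """
    info = tarfile.TarInfo(name=arcname)
    info.size = len(content)
    info.mode = 0o644
    info.mtime = 0  # not the time the file was generated
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    tar.addfile(info, io.BytesIO(content))
```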
haampie force-pushed the fix/reproducible-repeated-buildcache-create branch from 838e0d0 to 1620a1c, February 24, 2023 15:47
spackbot-app (bot) added the tests (General test capability(ies)) label Feb 24, 2023
haampie changed the title from "buildcache create: reproducible tarball across repeated invocations" to "buildcache create: reproducible tarballs" Feb 24, 2023
haampie requested a review from tgamblin February 27, 2023 21:31
alalazo previously approved these changes Mar 8, 2023
alalazo enabled auto-merge (squash) March 8, 2023 15:16
alalazo merged commit 22d4e79 into spack:develop Mar 8, 2023
haampie deleted the fix/reproducible-repeated-buildcache-create branch March 8, 2023 15:52
jmcarcell pushed a commit to key4hep/spack that referenced this pull request Apr 13, 2023
Normalize mode: regular files and hardlinks executable by the user, directories, and symlinks get 0o755 permissions in the tarfile; other files get 0o644.

Labels

binary-packages, core (PR affects Spack core functionality), tests (General test capability(ies))


2 participants