Skip to content

Conversation

@bk2204
Copy link
Member

@bk2204 bk2204 commented Apr 29, 2022

Right now, extracting one of our archives (except for the source archives) causes all files to be extracted in the current directory. This is commonly referred to as a tarbomb, and is generally unwelcome.

Instead, let's create a top-level directory in each of our archives named the same way as we currently name our source tarballs: the string "git-lfs-" and then a version number without the initial "v". This is the same algorithm GitHub would use to create its autogenerated archives.

Because we can no longer use zip's -j option to strip off path components (since we now want to preserve at least some of them), using zip becomes tricky. Instead, let's use the BSD (libarchive) tar, which is natively available on Windows, macOS, and FreeBSD as the system tar implementation, and which can be easily installed on all Linux distros under the name "bsdtar". This program supports tar files, zip files, and a wide variety of other formats that we don't need here and is ideal for this usage.

In addition, to make testing these operations easier, let's move several commands into variables so they can be overridden. Finally, we change our tar archives to use the pax format, which is already used by Git and is substantially more portable than GNU tar's default format.

This series is best reviewed commit by commit.

Fixes #4977

@bk2204 bk2204 force-pushed the tarballs-not-tarbombs branch from 333747d to ec22edf Compare April 29, 2022 15:04
bk2204 added 4 commits May 2, 2022 14:41
We put other commands in variables, so let's move these as well.  In
addition to the pure aesthetic value of this, we can also allow testing
the makefile operations by overriding them with mocked commands or ":",
the POSIX sh null command.
Use a variable for the tar command.  This allows us to test with both
GNU tar and BSD tar to verify correct operation.
Right now, extracting one of our archives (except for the source
archives) causes all files to be extracted in the current directory.
This is commonly referred to as a tarbomb, and is generally unwelcome.

Instead, let's create a top-level directory in each of our archives
named the same way as we currently name our source tarballs: the string
"git-lfs-" and then a version number without the initial "v".  This is
the same algorithm GitHub would use to create its autogenerated
archives.

Because we can no longer use zip's -j option to strip off path
components (since we now want to preserve at least some of them), using
zip becomes tricky.  Instead, let's use the BSD (libarchive) tar, which
is natively available on Windows, macOS, and FreeBSD as the system tar
implementation, and which can be easily installed on all Linux distros
under the name "bsdtar".  This program supports tar files, zip files,
and a wide variety of other formats that we don't need here and is ideal
for this usage.  However, it doesn't support updating zip files in
place, so we need to extract and rebuild the archives instead.
There are a wide variety of possible formats for tar archives.  The
POSIX standard provides the ustar format (Unix standard tar), and GNU
tar has extended this with various extensions to better support long
file names.  The latter is the default format for GNU tar.

However, POSIX has also standardized an additional format, called the
pax format, which is an extension to the ustar format that allows for
functionally unlimited file name lengths, file sizes, and other
attributes.  Because it is standardized, it is substantially more
portable than GNU tar, and it is also used by default in Git, so anyone
who can consume a Git-produced tar archive can therefore handle this
format.

The pax format can be enabled with the --posix option to both GNU and
BSD tar.  Since it's a substantial improvement over the default for GNU
tar, let's produce it by default when we create a tar archive.
@bk2204 bk2204 force-pushed the tarballs-not-tarbombs branch from ec22edf to 1836434 Compare May 2, 2022 14:42
@bk2204 bk2204 marked this pull request as ready for review May 2, 2022 14:42
@bk2204 bk2204 requested a review from a team as a code owner May 2, 2022 14:42
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had one question about the release workflow going back to Go 1.17; otherwise this looked great, thank you!

strategy:
matrix:
go: ['1.18.x']
go: ['1.17.x']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this on 1.18.x?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I rebased and there was a conflict which I apparently misresolved. Will fix.

On macOS and Windows, the default tar implementation is the BSD
(libarchive) tar, which handles various formats.  We need this tar
implementation to handle the creation and modification of zip files,
which GNU tar doesn't support.  As a result, install the
libarchive-tools package on Linux as well so we can make use of it.
@bk2204 bk2204 force-pushed the tarballs-not-tarbombs branch from 1836434 to 31003a8 Compare May 2, 2022 19:22
@bk2204 bk2204 merged commit cc6c955 into git-lfs:main May 3, 2022
@bk2204 bk2204 deleted the tarballs-not-tarbombs branch May 4, 2022 15:17
@chrisd8088 chrisd8088 mentioned this pull request May 24, 2022
@JamesHutchisonCarta
Copy link

JamesHutchisonCarta commented May 25, 2022

It would be helpful if this ticket clarified what the subdirectory naming convention is, with an example.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make tarballs not tarbombs

3 participants