-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Tarballs, not tarbombs #4980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tarballs, not tarbombs #4980
Conversation
333747d to
ec22edf
Compare
We put other commands in variables, so let's move these as well. In addition to the pure aesthetic value of this, we can also allow testing the makefile operations by overriding them with mocked commands or ":", the POSIX sh null command.
Use a variable for the tar command. This allows us to test with both GNU tar and BSD tar to verify correct operation.
Right now, extracting one of our archives (except for the source archives) causes all files to be extracted in the current directory. This is commonly referred to as a tarbomb, and is generally unwelcome. Instead, let's create a top-level directory in each of our archives named the same way as we currently name our source tarballs: the string "git-lfs-" and then a version number without the initial "v". This is the same algorithm GitHub would use to create its autogenerated archives. Because we can no longer use zip's -j option to strip off path components (since we now want to preserve at least some of them), using zip becomes tricky. Instead, let's use the BSD (libarchive) tar, which is natively available on Windows, macOS, and FreeBSD as the system tar implementation, and which can be easily installed on all Linux distros under the name "bsdtar". This program supports tar files, zip files, and a wide variety of other formats that we don't need here and is ideal for this usage. However, it doesn't support updating zip files in place, so we need to extract and rebuild the archives instead.
There are a wide variety of possible formats for tar archives. The POSIX standard provides the ustar format (Unix standard tar), and GNU tar has extended this with various extensions to better support long file names. The latter is the default format for GNU tar. However, POSIX has also standardized an additional format, called the pax format, which is an extension to the ustar format that allows for functionally unlimited file name lengths, file sizes, and other attributes. Because it is standardized, it is substantially more portable than GNU tar, and it is also used by default in Git, so anyone who can consume a Git-produced tar archive can therefore handle this format. The pax format can be enabled with the --posix option to both GNU and BSD tar. Since it's a substantial improvement over the default for GNU tar, let's produce it by default when we create a tar archive.
ec22edf to
1836434
Compare
chrisd8088
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just had one question about the release workflow going back to Go 1.17; otherwise this looked great, thank you!
.github/workflows/release.yml
Outdated
| strategy: | ||
| matrix: | ||
| go: ['1.18.x'] | ||
| go: ['1.17.x'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep this on 1.18.x?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I rebased and there was a conflict which I apparently misresolved. Will fix.
On macOS and Windows, the default tar implementation is the BSD (libarchive) tar, which handles various formats. We need this tar implementation to handle the creation and modification of zip files, which GNU tar doesn't support. As a result, install the libarchive-tools package on Linux as well so we can make use of it.
1836434 to
31003a8
Compare
|
It would be helpful if this ticket clarified what the subdirectory naming convention is, with an example. |
Right now, extracting one of our archives (except for the source archives) causes all files to be extracted in the current directory. This is commonly referred to as a tarbomb, and is generally unwelcome.
Instead, let's create a top-level directory in each of our archives named the same way as we currently name our source tarballs: the string "git-lfs-" and then a version number without the initial "v". This is the same algorithm GitHub would use to create its autogenerated archives.
Because we can no longer use zip's -j option to strip off path components (since we now want to preserve at least some of them), using zip becomes tricky. Instead, let's use the BSD (libarchive) tar, which is natively available on Windows, macOS, and FreeBSD as the system tar implementation, and which can be easily installed on all Linux distros under the name "bsdtar". This program supports tar files, zip files, and a wide variety of other formats that we don't need here and is ideal for this usage.
In addition, to make testing these operations easier, let's move several commands into variables so they can be overridden. Finally, we change our tar archives to use the pax format, which is already used by Git and is substantially more portable than GNU tar's default format.
This series is best reviewed commit by commit.
Fixes #4977