-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Use custom random data generator for all test objects and filenames #5868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We use several techniques in our test suite to acquire random data and filenames, of which the most prevalent one is to read from the /dev/urandom special file. Recently, this appears to no longer work reliably when our CI jobs are run on Windows by our GitHub Actions workflows, possibly due to an upstream change in Git Bash, MSYS2, Cygwin, or for some other reason. We also use the uuidgen(1) command in one test in our t/t-prune.sh test script to populate Git LFS object content. However, that command is not installed with Git Bash, so on Windows this data is simply empty; while the test still passes, this is less than ideal. We already have a number of helper programs written in Go that provide platform-agnostic utilities to our test suite, and Go's crypto/rand package can be used to generate random data on all the platforms we support. On Windows in particular it uses the ProcessPrng API, since /dev/urandom does not exist on that platform. (We are only able to make use of /dev/urandom on Windows because a simulation of it is provided by the MSYS2 environment, and Git Bash, which we use to run our test suite, is built on MSYS2.) Therefore we add a new lfstest-genrandom program for our test suite which uses the crypto/rand Go package to return strings of random data. If a size is provided as a command-line argument, it returns that number of random bytes; otherwise, it returns an effectively endless stream of data. This parallels the design of the Git test suite's test-genrandom helper: https://github.com/git/git/blob/a7dae3bdc8b516d36f630b12bb01e853a667e0d9/t/helper/test-genrandom.c When the --base64 command-line flag is specified, our new program returns data encoded using the regular Base64 specification from RFC 4648. When the --base64url flag is specified instead, the returned data is encoded with the alternate Base64 specification from RFC 4648 that is suitable for use in filenames and URLs. In this case, slash and plus characters are avoided in favour of underscore and dash characters. With either flag, the encoded data has no Base64 padding, and if a size was provided on the command line, the data will be truncated to exactly that number of bytes after encoding (rather than being truncated before encoding, which would produce extra output bytes). While we could choose to only produce the URL-compatible form of Base64 data, we would have to rewrite a number of tests which expect to be able to read the random data we've written into a test file and then pass it to our calc_oid() shell function, which calls the Bash builtin command "printf" on the data. If the data happens to begin with a dash character, "printf" would interpret it as a flag argument and might fail. It is therefore more convenient in these cases to use the regular form of Base64 encoding. Conversely, if we only generated data encoded in the regular form of Base64, we would not be able to use it in the several places where we generate random directory names without also passing the data through something like the sha256sum(1) program, as we do now. By making the URL-compatible encoding available directly from our new utility program, we can simplify these portions of our test suite when, in a subsequent commit in this PR, we refactor our test's shell functions to make use of our new program. Note that we follow the example of several of our other test helper programs, such as lfstest-realpath and lfstest-testutils, in using distinct exit codes (specifically, the values 2 and above) for each error condition.
In a previous commit in this PR we introduced a new utility program for our test suite, lfstest-genrandom, which reliably produces random data on all the platforms we support, either as raw binary data or using one of several forms of Base64 encoding. We now make use of that program throughout our test suite, in place of reading from /dev/urandom directly. (Note that because the lfstest-genrandom program uses the Go crypto/rand package to collect the random data, on macOS and Linux systems the ultimate source of the data will still be /dev/urandom, since the Go package reads from that special file on those platforms. On Windows, however, the Go package makes use of the native Windows ProcessPrng API, so we will no longer be dependent on the Git Bash environment to provide a simulated /dev/urandom on Windows.) In most cases, we pass the --base64 flag to our lfstest-genrandom program, which allows us to eliminate many calls to the base64(1) command. We can also pass the exact size of the returned data we desire as a command-line argument, which eliminates the need to pipe the data through the head(1) command. In two instances, in the remove_and_create_local_repo() and remove_and_create_remote_repo() functions in our t/fixtures/migrate.sh file, we use the --base64url flag so that we can use the returned strings as directory name suffixes without further processing. Previously, we had to pipe the data through base64(1), head(1), sha256sum(1), and cut(1) in order to ensure we had a random string safe for use in a directory name. And in the setup_local_branch_with_special_character_files() function in our t/fixtures/migrate.sh file we don't pass any flags to the lfstest-genrandom program as we want raw binary data for the test files created by that function.
In previous commits in this PR we introduced a new utility program for our test suite, lfstest-genrandom, which reliably produces random data on all the platforms we support (in several forms of Base64 encoding, or as raw binary data), and then converted much of our test suite to make use of this new program. As noted in one of those commits, one test in our t/t-prune.sh test script makes use of the uuidgen(1) command to populate some Git LFS objects with arbitrary text, but as this command does not exist in the Git Bash environment on Windows, the objects actually have empty data on that platform. The test still passes because it doesn't depend on the length of the random data, and because the non-zero exit code caused by the missing uuidgen(1) command is ignored as it appears earlier in a pipeline whose final command (our lfstest-testutils program) always succeeds. We therefore revise this "prune verify large numbers of refs" test (which was added in commit dd19c1b of PR git-lfs#1861) to call our new lfstest-genrandom program instead of the uuidgen(1) command. We pass the --base64 flag to our program and request a slightly larger size for the output Base64-encoded string than the length of a UUID, just to ensure we are using a similar amount of random data as before, although this doesn't actually affect the success or failure of the test. We also adjust the test so that it does not create the Git LFS objects with sizes that do not match the length of their arbitrary content. (Again, the test passed despite this inconsistency in its test data.)
In commit 96775c6 of PR git-lfs#2929 the "migrate import (--object-map)" test was added to the test suite for the "git lfs migrate import" command to validate the --object-map option for that command, which was introduced in the same PR. This test makes use of the mktemp(1) shell command to generate a unique temporary directory, into which the test then writes several files including the Git LFS migration's object map file. Likewise, in commit 7632a20 of PR git-lfs#3084, a similar "migrate export (--object-map)" test was added to the test suite for the "git lfs migrate export" command, which was introduced in that PR, and this test also uses the mktemp(1) shell command to create a unique temporary directory. However, neither test removes these temporary directories after completion, so they remain in place indefinitely in whatever system location is used by the mktemp(1) command. This is in addition to the temporary directory created by the mktemp(1) command which is used throughout all our test suites, whose path we store in the GIT_LFS_TEST_DIR variable so it can be reused by all test suites after the first one to run. (At least, that is the intent of the variable, but in practice, because we do not export it, and because each test script runs in its own shell session, each script creates another temporary directory. We will address this issue in a subsequent PR.) In a previous commit in this PR we introduced a new lfstest-genrandom utility program for our test suite, so we can now make use of it to replace both instances where our migration tests make an extra call to the mktemp(1) shell command (beyond the one used to populate the GIT_LFS_TEST_DIR variable). In both tests, we now create a unique directory whose path begins with the path in the GIT_LFS_TEST_DIR variable, followed by a directory name partially generated by our lfstest-genrandom program. We pass the --base64url flag to the program ensure it returns only characters safe to use in a file or directory name on any OS platform.
In commit cc41ca5 we changed the template used for our call to mktemp(1) to add a suffix of "XXXXXX" in order to comply with the requirements of the GNU version of that command, which expects at least 3 "X" characters in a template for a temporary file or directory name. However, on macOS, the BSD version of mktemp(1) treats these as literal "X" characters if the -t option is used, because in this case the following argument is interpreted as a prefix, not a template. Thus we actually create temporary directories on macOS of the form /var/folders/.../git-lfs_TEMP.XXXXXX.<random>, with 8 random characters appended by the command. Since the "X" characters are unnecessary in this case, we adjust our definition of the TEMPDIR_PREFIX variable to leave them off when our tests are running on macOS.
This was referenced Sep 21, 2024
larsxschneider
approved these changes
Sep 27, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
For the past several months our CI test suite has experienced intermittent failures when running on Windows, which are caused when no data is returned from the simulated
/dev/urandomspecial file in Git Bash.While the cause of this regression in the Git Bash/MSYS2 environment is unclear, to avoid this and any similar issues in the future, we introduce a new test helper program named
lfstest-genrandom, and then use it to generate the contents of all the test files we need throughout our test suite.Further, we make use of this new utility to generate random filenames and directory names, which permits us to replace calls to
mktemp(1)anduuidgen(1)(which doesn't exist in Git Bash) in a number of our tests.Each commit in this PR has a detailed description; however, it should also be possible to review this PR as a single patch diff.
CI Failures on Windows
In the most recent CI job run from the merge of PR #5866 earlier this week, we can see the consequences when
/dev/urandomdoes not return any data.As one example among many other test failures, the
clean stdintest in ourt/t-clean.shtest script does not succeed. The test creates two files filled with data from/dev/urandomand calculates their Git LFS OIDs. In the error log from the failed test, we can see these OIDs were bothe3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, which is the SHA-256 hash of an empty string. When the test then runsgit lfs cleanon the files, that command produces no output (which is expected in the case of zero-byte input files), so when the test compares the OIDs to the output ofgit lfs clean, they don't match and the test fails.stderr log
New Test Helper Program
Our new
lfstest-genrandomprogram takes an optional command-line argument specifying the number of random bytes to return, as well as optional--base64or--base64urlflag arguments.When no size argument is provided, the program returns an effectively endless stream of random data, following the model of the
test-genrandomhelper program in the Git test suite. Otherwise the program returns exactly the specified number of bytes of random data.When the
--base64argument is given, the output is encoded in regular Base64, as defined by RFC 4648, including/and+characters. If the--base64urlargument is used instead, the output is encoded in the alternate Base64 encoding format defined in RFC 4648, which uses the-and_characters. In both cases, no Base64 padding is added, and if a size argument was provided, the output is truncated to the specified size after encoding, not before.The program follows the example of several of our other test helper programs, such as
lfstest-realpathandlfstest-testutils, in using distinct exit codes (specifically, the values 2 and above) for each error condition.With this helper program available, we can then replace all our shell pipelines of the form
base64 < /dev/urandom | head -c 120 >foo.txtwith equivalents likelfstest-genrandom --base64 120 >foo.txt.We can also rewrite a handful of calls to
mktemp(1)anduuidgen(1), simplifying them and also ensuring we always use data of the expected length and type.In particular, the
prune verify large numbers of refstest in ourt/t-prune.shscript callsuuidgen(1)to create random Git LFS object data; however, that command is not available in the Git Bash environment on Windows, so in practice the test uses empty data. While that test's success is not conditional on having valid Git LFS object data, having invalid data is not an ideal practice, and we can easily correct this issue now using thelfstest-genrandomprogram.Temporary Directory Names
Both our
t/t-migrate-export.shandt/t-migrate-import.shtest scripts make calls to themktemp(1)command to create temporary directories. However, they do not remove these after the tests complete, in part because the directories are created outside of the location specified by ourGIT_LFS_TEST_DIRvariable (but see the notes below on this point).We can replace these calls to
mktemp(1)with the use of ourlfstest-genrandomprogram and thus ensure the directories we create have paths which begin with the path stored in theGIT_LFS_TEST_DIRvariable, so at least they will be removed if and when that directory is deleted. Note, though, that due to the fact that we do not export that variable, and because each test script runs in its own shell session, each script creates another temporary directory for its own privateGIT_LFS_TEST_DIRvariable. Moreover, we do not consistently remove these directories after the test suites are run, as we never check or make use of theyesvalue we store in theRM_GIT_LFS_TEST_DIRvariable. We expect to address both issues in subsequent PRs.For now, one additional change we do make is to fix the path generated on macOS by our sole remaining call to
mktemp(1). On macOS, the BSD version ofmktemp(1)treats theXcharacters in our directory name template as literals, leading to names likegit-lfs_TEMP.XXXXXX.y3IwCMwminstead ofgit-lfs_TEMP.y3IwCMwm. (This version of the command treats the argument following the-toption as a prefix rather than a template, unlike the GNU version.) So we simply set ourTEMPDIR_PREFIXtogit-lfs_TEMPon macOS, which ensures we are returned directory names of the form we intend, without spuriousXcharacters.Expected and Unrelated CI Failures
Note that we expect our
Build with specific GoCI job to fail at the moment, due to the recent release of Go version 1.23. We install the latest version of thegoimportspackage, and the installation fails in theBuild with specific GoCI job because thex/toolsmodule requires Go 1.22 as of commit golang/tools@70f5626, and we are still using Go 1.21 for that CI job. We will address this issue in a subsequent PR.