Fix flaky test `t-credentials.sh` #5616

bk2204 · 2024-01-11T19:09:08Z

We've noticed that this test has become flaky sometimes and fails in CI. More recently, a distributor noticed that this test fails consistently when run with 4 jobs at a time with prove. Not all of the tests fail every time, but at least one of them does.

Part of the relevant difference is that our netrc tests use localhost as the machine name instead of the default 127.0.0.1. This is helpful, because it means we don't call the credential helper, when we really want to fall back to the askpass code. That's because we use different credentials for the netrctest remote, so if there were credentials available for localhost, we'll fail since we'll provide the wrong ones.

The reason, therefore, that this test is flaky is that the clone code creates an entry for the credential helper for localhost, and the credential directory is shared among tests. Thus, t-credentials.sh fails whenever these tests run concurrently.

The obvious solution is to simply create a per-test credentials directory so that we aren't affected by what other tests are doing. Let's do that, which makes this test pass consistently.

The initial commit here simply sets up a function so that we can call it in our test.

Fixes #5609

Right now, we do the default credential setup once for each run of the gitserver. However, in the future, we'll want to be able to perform it again in a different directory so we can have test-specific credential directories if needed. To do that, let's move the setup into a function, which we can reuse elsewhere.

We've noticed that this test has become flaky sometimes and fails in CI. More recently, a distributor noticed that this test fails consistently when run with 4 jobs at a time with `prove`. Not all of the tests fail every time, but at least one of them does. Part of the relevant difference is that our netrc tests use `localhost` as the machine name instead of the default 127.0.0.1. This is helpful, because it means we don't call the credential helper, when we really want to fall back to the askpass code. That's because we use different credentials for the `netrctest` remote, so if there were credentials available for `localhost`, we'll fail since we'll provide the wrong ones. The reason, therefore, that this test is flaky is that the clone code creates an entry for the credential helper for `localhost`, and the credential directory is shared among tests. Thus, `t-credentials.sh` fails whenever these tests run concurrently. The obvious solution is to simply create a per-test credentials directory so that we aren't affected by what other tests are doing. Let's do that, which makes this test pass consistently.

opohorel · 2024-01-12T09:25:39Z

I included both of these patches, added them to v3.4.1 release and tested with -j4 inside the CentOS Stream 9 mock build.
Tests now pass as expected, thank you for taking a look and fixing this!

chrisd8088

Thanks for digging into this! I had a question about how this change would work if the problem stems from multiple tests in the same suite conflicting with each other, but if it entirely fixes the problem, I'm good with it.

chrisd8088 · 2024-01-12T18:17:30Z

t/t-credentials.sh

+export CREDSDIR="$REMOTEDIR/creds-credentials"
+setup_creds


I confess to being a bit puzzled here because the commit description indicates the goal is to create a per-test credentials directory, but wouldn't this create a directory shared among all the tests in t/t-credentials.sh, and if I understand the problem there are several tests in that test suite which end up conflicting with each other, so this might not fully resolve the issue? Quite possibly I'm missing some important detail here.

Sorry, by per-test I mean test file, not begin_test/end_test. Thus, the goal is to prevent the t/t-credential.sh and t/t-clone.sh tests from conflicting with each other. The sets of tests in each file are fine on their own, just not together.

In commit b9602f3 of PR git-lfs#5616 we resolved a problem which caused flaky test results in our CI jobs by ensuring that the t/t-credential.sh test script used a unique directory for the credential record files we provide to our git-credential-lfstest helper program. This change ensured that when the "clone (HTTP server/proxy require cookies)" test in our t/t-clone.sh script creates a record file named "localhost" in the common directory specified by the CREDSDIR environment variable, as a copy of the "127.0.0.1" file our setup_creds() shell library function creates, this does not conflict with the "credentials from netrc" test and other related tests in the t/t-credential.sh test script, which depend on there not existing a credential for the "localhost" hostname. However, a previous attempt to resolve this problem was already made in commits bb87944 and bcb9eb3 of PR git-lfs#3825, but assumed that our test scripts would always run sequentially rather than in parallel. In that PR, the "clone (HTTP server/proxy require cookies)" was added, and at the end of the test, it removes the "localhost" credential record file it creates at the beginning of the test. Of course, if the test ever failed, then it would leave the "localhost" file in place, so even if our test scripts always ran sequentially there would still be a conflict with the tests in the t/t-credential.sh script. We can more comprehensively ensure that these two test scripts do not conflict again in the future by using the same technique applied in commit b9602f3 to the t/t-credential.sh script in the t/t-clone.sh script as well. We therefore add a call to setup_creds() at the start of the t/t-clone.sh script, so it will also use a separate copy of the credential record files in the CREDSDIR directory. This means that the "localhost" file will only exist in a credential record directory used by the t/t-clone.sh script. We can then also drop the attempt to clean up the "localhost" file from the end of the "clone (HTTP server/proxy require cookies)" test. To make this change, we also need to slightly refactor our t/testhelpers.sh library so that the setup_creds() function can generate the appropriate credential record filenames on its own. At the moment, it depends on some local variables defined in the setup() function, but these are used nowhere else except in the setup_creds() function, so we can just move their values directly into the strings concatenated and passed to the write_creds_path() function by the setup_creds() function.

In commit b9602f3 of PR git-lfs#5616 we resolved a problem which caused flaky test results in our CI jobs by ensuring that the t/t-credential.sh test script used a unique directory for the credential record files we provide to our git-credential-lfstest helper program. This change ensured that when the "clone (HTTP server/proxy require cookies)" test in our t/t-clone.sh script creates a record file named "localhost" in the common directory specified by the CREDSDIR environment variable, specifically a copy of the "127.0.0.1" file created by the setup_creds() shell library function, this does not conflict with the "credentials from netrc" test and other related tests in the t/t-credential.sh test script, which depend on there being no credential records associated with the "localhost" hostname. However, we already made a previous attempt to resolve this problem in commits bb87944 and bcb9eb3 of PR git-lfs#3825, but that change assumes that our test scripts always run sequentially rather than in parallel. In PR git-lfs#3825 the "clone (HTTP server/proxy require cookies)" test was added, and at the end of the test, it removes the "localhost" credential record file it creates at the beginning of the test. Of course, if the test ever fails, then it leaves the "localhost" record file in place, so even if our test scripts always ran sequentially there would still be a potential conflict with the tests in the t/t-credential.sh script. We can more comprehensively ensure that these two test scripts do not conflict again in the future by using the same technique applied in commit b9602f3 to the t/t-credential.sh script in the t/t-clone.sh script as well. First, we add a call to setup_creds() at the start of the t/t-clone.sh script after setting the CREDSDIR environment variable to a path unique to the t/t-clone.sh script. This ensures that a separate copy of the default credential record file for the 127.0.0.1 hostname is available for the exclusive use of the tests in the script. Next, we move the creation of the credential record files associated with the TLS/SSL client certificate used in the "clone ClientCert" test from the generic setup_creds() function into the test itself. We would otherwise need to define the "certpath" and "keypath" variables, as their values are interpolated by the function into the names of these credential record files. The TLS/SSL client certificate and key files are generated by our lfstest-gitserver utility when it is first executed. Their locations are defined by the LFSTEST_CLIENT_* environment variables passed by the setup() function in our t/testhelpers.sh shell library to the lfstest-count-tests utility, which then runs the lfstest-gitserver program. The values of these environment variables are set from the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE* variables defined by our t/testenv.sh script. We now use these variables in the "clone ClientCert" test when we call the write_creds_file() function directly, instead of allowing the setup_creds() function to perform those calls. We could define the "certpath" and "keypath" variables (using the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE_ENCRYPTED variables) prior to calling setup_creds() at the start of the t/t-clone.sh script. However, the "clone ClientCert" test is the only one which actually makes use of these credential record files, as this is the only test which actively checks the use of an encrypted TLS/SSL client certificate, and so is the only place where we need to create these record files. Further, we can also move into the test the "git config" commands that set the "http.<url>.sslCert" and "http.<url>.sslKey" Git configuration options with the locations of the TLS/SSL client certificate files. Previously, we set these configuration options in the setup() function in our t/testhelpers.sh shell library, so they were configured for all tests in all test scripts, although only the "clone ClientCert" test makes use of them. Finally, the "clone (HTTP server/proxy require cookies)" test no longer needs to attempt to delete the credential record file for the "localhost" hostname after the test is complete, so we can simply remove that code. Note that the "clone ClientCert" test was first introduced in commit daba49a of PR git-lfs#1893, at which time the "git config" commands to set the "http.<url>.sslCert" and "http.<url>.sslKey" options were added to the setup() function, and the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE variables were defined in what was then the test/testenv.sh script. Later, in commit 52f94c2 of PR git-lfs#3270, the LFS_CLIENT_KEY_FILE_ENCRYPTED variable was added along with support in the lfstest-gitserver program to generate an encrypted certificate key file. In another commit in PR git-lfs#3270, commit 706beca, the "clone ClientCert" test was then expanded to validate use of the encrypted key file with the "git lfs clone" command. (Note too that the following test, the "clone ClientCert with homedir certs" test, appears to also depend on credential record files for the TLS/SSL client certificate files it creates in the dedicated home directory used by our tests. However, because this test does not set the "http.sslCertPasswordProtected" Git configuration option or the GIT_SSL_CERT_PASSWORD_PROTECTED environment variable, Git does not attempt to retrieve a passphrase for the certificate files, and so the associated credential record files are never actually used. We will address this issue in a subsequent commit in this PR.)

In commit b9602f3 of PR git-lfs#5616 we resolved a problem which caused flaky test results in our CI jobs by ensuring that the t/t-credential.sh test script used a unique directory for the credential record files we provide to our git-credential-lfstest helper program. This change ensured that when the "clone (HTTP server/proxy require cookies)" test in our t/t-clone.sh script creates a record file named "localhost" in the common directory specified by the CREDSDIR environment variable, specifically a copy of the "127.0.0.1" file created by the setup_creds() shell library function, this does not conflict with the "credentials from netrc" test and other related tests in the t/t-credential.sh test script, which depend on there being no credential records associated with the "localhost" hostname. However, we already made a previous attempt to resolve this problem in commits bb87944 and bcb9eb3 of PR git-lfs#3825, but that change assumes that our test scripts always run sequentially rather than in parallel. In PR git-lfs#3825 the "clone (HTTP server/proxy require cookies)" test was added, and at the end of the test, it removes the "localhost" credential record file it creates at the beginning of the test. Of course, if the test ever fails, then it leaves the "localhost" record file in place, so even if our test scripts always ran sequentially there would still be a potential conflict with the tests in the t/t-credential.sh script. We can more comprehensively ensure that these two test scripts do not conflict again in the future by using the same technique applied in commit b9602f3 to the t/t-credential.sh script in the t/t-clone.sh script as well. First, we add a call to setup_creds() at the start of the t/t-clone.sh script after setting the CREDSDIR environment variable to a path unique to the t/t-clone.sh script. This ensures that a separate copy of the default credential record file for the 127.0.0.1 hostname is available for the exclusive use of the tests in the script. Next, we move the creation of the credential record files associated with the TLS/SSL client certificate used in the "clone ClientCert" test from the generic setup_creds() function into the test itself. We would otherwise need to define the "certpath" and "keypath" variables, as their values are interpolated by the function into the names of these credential record files. The TLS/SSL client certificate and key files are generated by our lfstest-gitserver utility when it is first executed. Their locations are defined by the LFSTEST_CLIENT_* environment variables passed by the setup() function in our t/testhelpers.sh shell library to the lfstest-count-tests utility, which then runs the lfstest-gitserver program. The values of these environment variables are set from the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE* variables defined by our t/testenv.sh script. We now use these variables in the "clone ClientCert" test when we call the write_creds_file() function directly, instead of allowing the setup_creds() function to perform those calls. We could define the "certpath" and "keypath" variables (using the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE_ENCRYPTED variables) prior to calling setup_creds() at the start of the t/t-clone.sh script. However, the "clone ClientCert" test is the only one which actually makes use of these credential record files, as this is the only test which actively checks the use of an encrypted TLS/SSL client certificate, and so is the only place where we need to create these record files. Further, we can also move into the test the "git config" commands that set the "http.<url>.sslCert" and "http.<url>.sslKey" Git configuration options with the locations of the TLS/SSL client certificate files, and we do the same for the "create lock with server using client cert" test in the t/t-lock.sh test script. Previously, we set these configuration options in the setup() function in our t/testhelpers.sh shell library, so they were configured for all tests in all test scripts, although only the "clone ClientCert" test and the "create lock with server using client cert" test make use of them. Finally, the "clone (HTTP server/proxy require cookies)" test no longer needs to attempt to delete the credential record file for the "localhost" hostname after the test is complete, so we can simply remove that code. Note that the "clone ClientCert" test was first introduced in commit daba49a of PR git-lfs#1893, at which time the "git config" commands to set the "http.<url>.sslCert" and "http.<url>.sslKey" options were added to the setup() function, and the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE variables were defined in what was then the test/testenv.sh script. Later, in commit 52f94c2 of PR git-lfs#3270, the LFS_CLIENT_KEY_FILE_ENCRYPTED variable was added along with support in the lfstest-gitserver program to generate an encrypted certificate key file. In another commit in PR git-lfs#3270, commit 706beca, the "clone ClientCert" test was then expanded to validate use of the encrypted key file with the "git lfs clone" command. (Note too that the following test, the "clone ClientCert with homedir certs" test, appears to also depend on credential record files for the TLS/SSL client certificate files it creates in the dedicated home directory used by our tests. However, because this test does not set the "http.sslCertPasswordProtected" Git configuration option or the GIT_SSL_CERT_PASSWORD_PROTECTED environment variable, Git does not attempt to retrieve a passphrase for the certificate files, and so the associated credential record files are never actually used. We will address this issue in a subsequent commit in this PR.)

bk2204 added 2 commits January 11, 2024 18:58

bk2204 mentioned this pull request Jan 11, 2024

t-credential.sh fails when building with lower level of parallelism #5609

Closed

bk2204 marked this pull request as ready for review January 12, 2024 14:03

bk2204 requested a review from a team as a code owner January 12, 2024 14:03

chrisd8088 approved these changes Jan 12, 2024

View reviewed changes

bk2204 merged commit b4a0927 into git-lfs:main Jan 12, 2024

bk2204 deleted the flaky-credential-test branch January 15, 2024 15:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix flaky test `t-credentials.sh` #5616

Fix flaky test `t-credentials.sh` #5616

Uh oh!

bk2204 commented Jan 11, 2024 •

edited

Loading

Uh oh!

opohorel commented Jan 12, 2024

Uh oh!

chrisd8088 left a comment

Uh oh!

chrisd8088 Jan 12, 2024

Uh oh!

bk2204 Jan 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		export CREDSDIR="$REMOTEDIR/creds-credentials"
		setup_creds

Fix flaky test t-credentials.sh #5616

Fix flaky test t-credentials.sh #5616

Uh oh!

Conversation

bk2204 commented Jan 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

opohorel commented Jan 12, 2024

Uh oh!

chrisd8088 left a comment

Choose a reason for hiding this comment

Uh oh!

chrisd8088 Jan 12, 2024

Choose a reason for hiding this comment

Uh oh!

bk2204 Jan 12, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix flaky test `t-credentials.sh` #5616

Fix flaky test `t-credentials.sh` #5616

bk2204 commented Jan 11, 2024 •

edited

Loading