Skip to content

Conversation

@bk2204
Copy link
Member

@bk2204 bk2204 commented Jan 11, 2024

We've noticed that this test has become flaky sometimes and fails in CI. More recently, a distributor noticed that this test fails consistently when run with 4 jobs at a time with prove. Not all of the tests fail every time, but at least one of them does.

Part of the relevant difference is that our netrc tests use localhost as the machine name instead of the default 127.0.0.1. This is helpful, because it means we don't call the credential helper, when we really want to fall back to the askpass code. That's because we use different credentials for the netrctest remote, so if there were credentials available for localhost, we'll fail since we'll provide the wrong ones.

The reason, therefore, that this test is flaky is that the clone code creates an entry for the credential helper for localhost, and the credential directory is shared among tests. Thus, t-credentials.sh fails whenever these tests run concurrently.

The obvious solution is to simply create a per-test credentials directory so that we aren't affected by what other tests are doing. Let's do that, which makes this test pass consistently.

The initial commit here simply sets up a function so that we can call it in our test.

Fixes #5609

Right now, we do the default credential setup once for each run of the
gitserver.  However, in the future, we'll want to be able to perform it
again in a different directory so we can have test-specific credential
directories if needed.  To do that, let's move the setup into a
function, which we can reuse elsewhere.
We've noticed that this test has become flaky sometimes and fails in CI.
More recently, a distributor noticed that this test fails consistently
when run with 4 jobs at a time with `prove`.  Not all of the tests fail
every time, but at least one of them does.

Part of the relevant difference is that our netrc tests use `localhost`
as the machine name instead of the default 127.0.0.1.  This is helpful,
because it means we don't call the credential helper, when we really
want to fall back to the askpass code.  That's because we use different
credentials for the `netrctest` remote, so if there were credentials
available for `localhost`, we'll fail since we'll provide the wrong
ones.

The reason, therefore, that this test is flaky is that the clone code
creates an entry for the credential helper for `localhost`, and the
credential directory is shared among tests.  Thus, `t-credentials.sh`
fails whenever these tests run concurrently.

The obvious solution is to simply create a per-test credentials
directory so that we aren't affected by what other tests are doing.
Let's do that, which makes this test pass consistently.
@opohorel
Copy link
Contributor

I included both of these patches, added them to v3.4.1 release and tested with -j4 inside the CentOS Stream 9 mock build.
Tests now pass as expected, thank you for taking a look and fixing this!

@bk2204 bk2204 marked this pull request as ready for review January 12, 2024 14:03
@bk2204 bk2204 requested a review from a team as a code owner January 12, 2024 14:03
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this! I had a question about how this change would work if the problem stems from multiple tests in the same suite conflicting with each other, but if it entirely fixes the problem, I'm good with it.

Comment on lines +7 to +8
export CREDSDIR="$REMOTEDIR/creds-credentials"
setup_creds
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confess to being a bit puzzled here because the commit description indicates the goal is to create a per-test credentials directory, but wouldn't this create a directory shared among all the tests in t/t-credentials.sh, and if I understand the problem there are several tests in that test suite which end up conflicting with each other, so this might not fully resolve the issue? Quite possibly I'm missing some important detail here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, by per-test I mean test file, not begin_test/end_test. Thus, the goal is to prevent the t/t-credential.sh and t/t-clone.sh tests from conflicting with each other. The sets of tests in each file are fine on their own, just not together.

@bk2204 bk2204 merged commit b4a0927 into git-lfs:main Jan 12, 2024
@bk2204 bk2204 deleted the flaky-credential-test branch January 15, 2024 15:11
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Sep 15, 2024
In commit b9602f3 of PR git-lfs#5616 we
resolved a problem which caused flaky test results in our CI jobs
by ensuring that the t/t-credential.sh test script used a unique
directory for the credential record files we provide to our
git-credential-lfstest helper program.

This change ensured that when the "clone (HTTP server/proxy require
cookies)" test in our t/t-clone.sh script creates a record file named
"localhost" in the common directory specified by the CREDSDIR
environment variable, as a copy of the "127.0.0.1" file our
setup_creds() shell library function creates, this does not conflict
with the "credentials from netrc" test and other related tests
in the t/t-credential.sh test script, which depend on there not
existing a credential for the "localhost" hostname.

However, a previous attempt to resolve this problem was already
made in commits bb87944 and
bcb9eb3 of PR git-lfs#3825, but assumed that
our test scripts would always run sequentially rather than in parallel.
In that PR, the "clone (HTTP server/proxy require cookies)" was
added, and at the end of the test, it removes the "localhost"
credential record file it creates at the beginning of the test.

Of course, if the test ever failed, then it would leave the "localhost"
file in place, so even if our test scripts always ran sequentially
there would still be a conflict with the tests in the t/t-credential.sh
script.

We can more comprehensively ensure that these two test scripts do
not conflict again in the future by using the same technique applied
in commit b9602f3 to the
t/t-credential.sh script in the t/t-clone.sh script as well.

We therefore add a call to setup_creds() at the start of the
t/t-clone.sh script, so it will also use a separate copy of the
credential record files in the CREDSDIR directory.  This means
that the "localhost" file will only exist in a credential record
directory used by the t/t-clone.sh script.  We can then also drop
the attempt to clean up the "localhost" file from the end of the
"clone (HTTP server/proxy require cookies)" test.

To make this change, we also need to slightly refactor our
t/testhelpers.sh library so that the setup_creds() function can
generate the appropriate credential record filenames on its own.
At the moment, it depends on some local variables defined in the
setup() function, but these are used nowhere else except in the
setup_creds() function, so we can just move their values directly
into the strings concatenated and passed to the write_creds_path()
function by the setup_creds() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 7, 2024
In commit b9602f3 of PR git-lfs#5616 we
resolved a problem which caused flaky test results in our CI jobs
by ensuring that the t/t-credential.sh test script used a unique
directory for the credential record files we provide to our
git-credential-lfstest helper program.

This change ensured that when the "clone (HTTP server/proxy require
cookies)" test in our t/t-clone.sh script creates a record file named
"localhost" in the common directory specified by the CREDSDIR
environment variable, as a copy of the "127.0.0.1" file our
setup_creds() shell library function creates, this does not conflict
with the "credentials from netrc" test and other related tests
in the t/t-credential.sh test script, which depend on there not
existing a credential for the "localhost" hostname.

However, a previous attempt to resolve this problem was already
made in commits bb87944 and
bcb9eb3 of PR git-lfs#3825, but assumed that
our test scripts would always run sequentially rather than in parallel.
In that PR, the "clone (HTTP server/proxy require cookies)" was
added, and at the end of the test, it removes the "localhost"
credential record file it creates at the beginning of the test.

Of course, if the test ever failed, then it would leave the "localhost"
file in place, so even if our test scripts always ran sequentially
there would still be a conflict with the tests in the t/t-credential.sh
script.

We can more comprehensively ensure that these two test scripts do
not conflict again in the future by using the same technique applied
in commit b9602f3 to the
t/t-credential.sh script in the t/t-clone.sh script as well.

We therefore add a call to setup_creds() at the start of the
t/t-clone.sh script, so it will also use a separate copy of the
credential record files in the CREDSDIR directory.  This means
that the "localhost" file will only exist in a credential record
directory used by the t/t-clone.sh script.  We can then also drop
the attempt to clean up the "localhost" file from the end of the
"clone (HTTP server/proxy require cookies)" test.

To make this change, we also need to slightly refactor our
t/testhelpers.sh library so that the setup_creds() function can
generate the appropriate credential record filenames on its own.
At the moment, it depends on some local variables defined in the
setup() function, but these are used nowhere else except in the
setup_creds() function, so we can just move their values directly
into the strings concatenated and passed to the write_creds_path()
function by the setup_creds() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 30, 2024
In commit b9602f3 of PR git-lfs#5616 we
resolved a problem which caused flaky test results in our CI jobs
by ensuring that the t/t-credential.sh test script used a unique
directory for the credential record files we provide to our
git-credential-lfstest helper program.

This change ensured that when the "clone (HTTP server/proxy require
cookies)" test in our t/t-clone.sh script creates a record file named
"localhost" in the common directory specified by the CREDSDIR
environment variable, as a copy of the "127.0.0.1" file our
setup_creds() shell library function creates, this does not conflict
with the "credentials from netrc" test and other related tests
in the t/t-credential.sh test script, which depend on there not
existing a credential for the "localhost" hostname.

However, a previous attempt to resolve this problem was already
made in commits bb87944 and
bcb9eb3 of PR git-lfs#3825, but assumed that
our test scripts would always run sequentially rather than in parallel.
In that PR, the "clone (HTTP server/proxy require cookies)" was
added, and at the end of the test, it removes the "localhost"
credential record file it creates at the beginning of the test.

Of course, if the test ever failed, then it would leave the "localhost"
file in place, so even if our test scripts always ran sequentially
there would still be a conflict with the tests in the t/t-credential.sh
script.

We can more comprehensively ensure that these two test scripts do
not conflict again in the future by using the same technique applied
in commit b9602f3 to the
t/t-credential.sh script in the t/t-clone.sh script as well.

We therefore add a call to setup_creds() at the start of the
t/t-clone.sh script, so it will also use a separate copy of the
credential record files in the CREDSDIR directory.  This means
that the "localhost" file will only exist in a credential record
directory used by the t/t-clone.sh script.  We can then also drop
the attempt to clean up the "localhost" file from the end of the
"clone (HTTP server/proxy require cookies)" test.

To make this change, we also need to slightly refactor our
t/testhelpers.sh library so that the setup_creds() function can
generate the appropriate credential record filenames on its own.
At the moment, it depends on some local variables defined in the
setup() function, but these are used nowhere else except in the
setup_creds() function, so we can just move their values directly
into the strings concatenated and passed to the write_creds_path()
function by the setup_creds() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 18, 2024
In commit b9602f3 of PR git-lfs#5616 we
resolved a problem which caused flaky test results in our CI jobs
by ensuring that the t/t-credential.sh test script used a unique
directory for the credential record files we provide to our
git-credential-lfstest helper program.

This change ensured that when the "clone (HTTP server/proxy require
cookies)" test in our t/t-clone.sh script creates a record file named
"localhost" in the common directory specified by the CREDSDIR
environment variable, specifically a copy of the "127.0.0.1" file
created by the setup_creds() shell library function, this does not
conflict with the "credentials from netrc" test and other related
tests in the t/t-credential.sh test script, which depend on there
being no credential records associated with the "localhost" hostname.

However, we already made a previous attempt to resolve this problem
in commits bb87944 and
bcb9eb3 of PR git-lfs#3825, but that change
assumes that our test scripts always run sequentially rather than in
parallel.  In PR git-lfs#3825 the "clone (HTTP server/proxy require cookies)"
test was added, and at the end of the test, it removes the "localhost"
credential record file it creates at the beginning of the test.

Of course, if the test ever fails, then it leaves the "localhost" record
file in place, so even if our test scripts always ran sequentially
there would still be a potential conflict with the tests in the
t/t-credential.sh script.

We can more comprehensively ensure that these two test scripts do
not conflict again in the future by using the same technique applied
in commit b9602f3 to the
t/t-credential.sh script in the t/t-clone.sh script as well.

First, we add a call to setup_creds() at the start of the t/t-clone.sh
script after setting the CREDSDIR environment variable to a path
unique to the t/t-clone.sh script.  This ensures that a separate copy
of the default credential record file for the 127.0.0.1 hostname is
available for the exclusive use of the tests in the script.

Next, we move the creation of the credential record files associated
with the TLS/SSL client certificate used in the "clone ClientCert"
test from the generic setup_creds() function into the test itself.
We would otherwise need to define the "certpath" and "keypath" variables,
as their values are interpolated by the function into the names of
these credential record files.

The TLS/SSL client certificate and key files are generated by our
lfstest-gitserver utility when it is first executed.  Their locations
are defined by the LFSTEST_CLIENT_* environment variables passed by
the setup() function in our t/testhelpers.sh shell library to the
lfstest-count-tests utility, which then runs the lfstest-gitserver
program.  The values of these environment variables are set from the
LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE* variables defined by
our t/testenv.sh script.  We now use these variables in the
"clone ClientCert" test when we call the write_creds_file() function
directly, instead of allowing the setup_creds() function to perform
those calls.

We could define the "certpath" and "keypath" variables (using the
LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE_ENCRYPTED variables)
prior to calling setup_creds() at the start of the t/t-clone.sh script.
However, the "clone ClientCert" test is the only one which actually
makes use of these credential record files, as this is the only test
which actively checks the use of an encrypted TLS/SSL client certificate,
and so is the only place where we need to create these record files.

Further, we can also move into the test the "git config" commands that
set the "http.<url>.sslCert" and "http.<url>.sslKey" Git configuration
options with the locations of the TLS/SSL client certificate files.
Previously, we set these configuration options in the setup() function
in our t/testhelpers.sh shell library, so they were configured for all
tests in all test scripts, although only the "clone ClientCert" test
makes use of them.

Finally, the "clone (HTTP server/proxy require cookies)" test no
longer needs to attempt to delete the credential record file for
the "localhost" hostname after the test is complete, so we can
simply remove that code.

Note that the "clone ClientCert" test was first introduced in
commit daba49a of PR git-lfs#1893, at
which time the "git config" commands to set the "http.<url>.sslCert"
and "http.<url>.sslKey" options were added to the setup() function,
and the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE variables
were defined in what was then the test/testenv.sh script.  Later,
in commit 52f94c2 of PR git-lfs#3270,
the LFS_CLIENT_KEY_FILE_ENCRYPTED variable was added along with
support in the lfstest-gitserver program to generate an encrypted
certificate key file.  In another commit in PR git-lfs#3270, commit
706beca, the "clone ClientCert"
test was then expanded to validate use of the encrypted key file
with the "git lfs clone" command.

(Note too that the following test, the "clone ClientCert with homedir
certs" test, appears to also depend on credential record files for
the TLS/SSL client certificate files it creates in the dedicated home
directory used by our tests.  However, because this test does not
set the "http.sslCertPasswordProtected" Git configuration option or
the GIT_SSL_CERT_PASSWORD_PROTECTED environment variable, Git does
not attempt to retrieve a passphrase for the certificate files, and
so the associated credential record files are never actually used.
We will address this issue in a subsequent commit in this PR.)
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 18, 2024
In commit b9602f3 of PR git-lfs#5616 we
resolved a problem which caused flaky test results in our CI jobs
by ensuring that the t/t-credential.sh test script used a unique
directory for the credential record files we provide to our
git-credential-lfstest helper program.

This change ensured that when the "clone (HTTP server/proxy require
cookies)" test in our t/t-clone.sh script creates a record file named
"localhost" in the common directory specified by the CREDSDIR
environment variable, specifically a copy of the "127.0.0.1" file
created by the setup_creds() shell library function, this does not
conflict with the "credentials from netrc" test and other related
tests in the t/t-credential.sh test script, which depend on there
being no credential records associated with the "localhost" hostname.

However, we already made a previous attempt to resolve this problem
in commits bb87944 and
bcb9eb3 of PR git-lfs#3825, but that change
assumes that our test scripts always run sequentially rather than in
parallel.  In PR git-lfs#3825 the "clone (HTTP server/proxy require cookies)"
test was added, and at the end of the test, it removes the "localhost"
credential record file it creates at the beginning of the test.

Of course, if the test ever fails, then it leaves the "localhost" record
file in place, so even if our test scripts always ran sequentially
there would still be a potential conflict with the tests in the
t/t-credential.sh script.

We can more comprehensively ensure that these two test scripts do
not conflict again in the future by using the same technique applied
in commit b9602f3 to the
t/t-credential.sh script in the t/t-clone.sh script as well.

First, we add a call to setup_creds() at the start of the t/t-clone.sh
script after setting the CREDSDIR environment variable to a path
unique to the t/t-clone.sh script.  This ensures that a separate copy
of the default credential record file for the 127.0.0.1 hostname is
available for the exclusive use of the tests in the script.

Next, we move the creation of the credential record files associated
with the TLS/SSL client certificate used in the "clone ClientCert"
test from the generic setup_creds() function into the test itself.
We would otherwise need to define the "certpath" and "keypath" variables,
as their values are interpolated by the function into the names of
these credential record files.

The TLS/SSL client certificate and key files are generated by our
lfstest-gitserver utility when it is first executed.  Their locations
are defined by the LFSTEST_CLIENT_* environment variables passed by
the setup() function in our t/testhelpers.sh shell library to the
lfstest-count-tests utility, which then runs the lfstest-gitserver
program.  The values of these environment variables are set from the
LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE* variables defined by
our t/testenv.sh script.  We now use these variables in the
"clone ClientCert" test when we call the write_creds_file() function
directly, instead of allowing the setup_creds() function to perform
those calls.

We could define the "certpath" and "keypath" variables (using the
LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE_ENCRYPTED variables)
prior to calling setup_creds() at the start of the t/t-clone.sh script.
However, the "clone ClientCert" test is the only one which actually
makes use of these credential record files, as this is the only test
which actively checks the use of an encrypted TLS/SSL client certificate,
and so is the only place where we need to create these record files.

Further, we can also move into the test the "git config" commands that
set the "http.<url>.sslCert" and "http.<url>.sslKey" Git configuration
options with the locations of the TLS/SSL client certificate files, and
we do the same for the "create lock with server using client cert" test
in the t/t-lock.sh test script.

Previously, we set these configuration options in the setup() function
in our t/testhelpers.sh shell library, so they were configured for all
tests in all test scripts, although only the "clone ClientCert" test and
the "create lock with server using client cert" test make use of them.

Finally, the "clone (HTTP server/proxy require cookies)" test no
longer needs to attempt to delete the credential record file for
the "localhost" hostname after the test is complete, so we can
simply remove that code.

Note that the "clone ClientCert" test was first introduced in
commit daba49a of PR git-lfs#1893, at
which time the "git config" commands to set the "http.<url>.sslCert"
and "http.<url>.sslKey" options were added to the setup() function,
and the LFS_CLIENT_CERT_FILE and LFS_CLIENT_KEY_FILE variables
were defined in what was then the test/testenv.sh script.  Later,
in commit 52f94c2 of PR git-lfs#3270,
the LFS_CLIENT_KEY_FILE_ENCRYPTED variable was added along with
support in the lfstest-gitserver program to generate an encrypted
certificate key file.  In another commit in PR git-lfs#3270, commit
706beca, the "clone ClientCert"
test was then expanded to validate use of the encrypted key file
with the "git lfs clone" command.

(Note too that the following test, the "clone ClientCert with homedir
certs" test, appears to also depend on credential record files for
the TLS/SSL client certificate files it creates in the dedicated home
directory used by our tests.  However, because this test does not
set the "http.sslCertPasswordProtected" Git configuration option or
the GIT_SSL_CERT_PASSWORD_PROTECTED environment variable, Git does
not attempt to retrieve a passphrase for the certificate files, and
so the associated credential record files are never actually used.
We will address this issue in a subsequent commit in this PR.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

t-credential.sh fails when building with lower level of parallelism

3 participants