Skip to content

Conversation

@chrisd8088
Copy link
Member

This PR simplifies and makes consistent the messages reported by the Git LFS client during push operations when a Git LFS object is missing from local internal Git LFS storage and is also not present on the remote server, and the lfs.allowIncompletePush Git configuration option is set to its default value of false.

At present, the client outputs different messages under these conditions, depending on whether or not a file exists in the working tree at the same path as that of the Git LFS pointer which references the missing object. We can demonstrate this variable behaviour of the client as follows:

$ git clone https://github.com/chrisd8088/test
$ cd test
$ git lfs track "*.bin"
$ echo foo >foo.bin
$ git add .gitattributes foo.bin
$ git commit -m test

$ rm -rf .git/lfs/objects
$ git push origin main
Git LFS upload failed:   0% (0/1), 0 B | 0 B/s                                  
Uploading LFS objects:   0% (0/1), 0 B | 0 B/s, done.
  (missing) foo.bin (b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c)
hint: Your push was rejected due to missing or corrupt local objects.
hint: You can disable this check with: `git config lfs.allowincompletepush true`

$ rm foo.bin 
$ git push origin main
Unable to find source for object b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c (try running `git lfs fetch --all`)
Uploading LFS objects:   0% (0/1), 0 B | 0 B/s, done.

Note that in neither case is the push operation successful, but the error messages are different depending on whether or not foo.bin exists in the working tree.

Further, the contents of the file in the working tree do not matter, as shown below in a continuation of the example above:

$ echo bar >foo.bin
$ git push origin main
Git LFS upload failed:   0% (0/1), 0 B | 0 B/s                                  
  (missing) foo.bin (b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c)
Uploading LFS objects:   0% (0/1), 0 B | 0 B/s, done.
hint: Your push was rejected due to missing or corrupt local objects.
hint: You can disable this check with: `git config lfs.allowincompletepush true`

(Note that when pushing to a remote on a local filesystem using the git-lfs-standalone-file(1) adapter, rather than either the of HTTP-based or SSH-based Git LFS object transfer protocols, the client always reports the error message shown above.)

This variable behaviour is the consequence of an incomplete implementation of the ensureFile() method of the uploadContext structure in our commands package, dating from its introduction in PR #176. In that PR's description, the method's purpose was explained as follows (using the original pre-release name for the Git LFS project):

If the .git/hawser/objects directory gets into a weird state (for example, if the user manually removed some files in there), this attempts to re-clean the objects based on the git repository file path.

The code comments preceding the method describe it in the same way, as do several subsequent references to the method in later PRs, such as PR #2574 and #3398. However, the ensureFile() method has never actually replaced missing object files under any circumstances.

This PR removes the ensureFile() method and as a consequence, missing object files should always produce the same error messages regardless of whether a file exists in the working tree at the path associated with an object's Git LFS pointer. For example:

$ git clone https://github.com/chrisd8088/test
$ cd test
$ git lfs track "*.bin"
$ echo foo >foo.bin
$ git add .gitattributes foo.bin
$ git commit -m test

$ rm -rf .git/lfs/objects
$ rm foo.bin
$ git push origin main
Git LFS upload failed:   0% (0/1), 0 B | 0 B/s                                  
Uploading LFS objects:   0% (0/1), 0 B | 0 B/s, done.
  (missing) foo.bin (b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c)
hint: Your push was rejected due to missing or corrupt local objects.
hint: You can disable this check with: `git config lfs.allowincompletepush true`

In addition to making these error messages consistent, this PR also removes some unnecessary mapping of the Missing fields in per-object data structures during upload operations, and refactors and expands the tests that validate our handling of missing and corrupt object files in git lfs pre-push and git push commands.

This PR will be most easily reviewed on a commit-by-commit basis, as each commit contains a detailed description of its changes.

Background

The ensureFile() method checks whether an object file is missing from the local storage directories, and if not, tests whether a file exists in the current working tree at the path of the Git LFS pointer associated with the object. If such a file exists, the method proceeds to run the Clean() method of the GitFilter structure in our lfs package on the file's contents.

The Clean() method calls the copyToTemp() method, which writes a "cleaned" version of the contents of the file from the working tree into a temporary file in the .git/lfs/tmp directory. It does not, though, move this file into the appropriate location under the .git/lfs/objects directory hierarchy. That has always been the responsibility of the clean() function in the commands package, which renames the temporary file into its final location, unless the file in the working tree was found to contain a raw Git LFS pointer, or an error occurred.

While we could try to revise the ensureFile() method to operate as was originally intended, the advantages of such a change are relatively slim, and the disadvantages are several. Most obviously, it requires modifications to our clean() function to guard against the replacement of object files with incorrect data, something the other callers of the function do not need to be concerned about because they only run within the context of our clean filter code, and so receive the input data from Git itself.

That this is a concern at all is in turn due to the reasonable chance that a file found in the current working tree at a given path does not contain the identical data as that of an Git LFS object generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as designed, despite being repeatedly refactored and enhanced over ten years, suggests that its purpose is somewhat obscure and that the requisite logic is less intelligible than would be ideal. Users and developers expect push operations to involve the transfer of data but not the creation (or re-creation) of local data files, so the use of our clean filter code in such a context is not particularly intuitive.

For all these reasons, we just remove the ensureFile() method entirely, which simplifies our handling of missing objects during upload transfer operations, and makes them more consistent, whether a missing file is detected before beginning batch transfer requests, or while preparing a set of objects to be uploaded following a server's batch API response.

@chrisd8088 chrisd8088 requested a review from a team as a code owner April 3, 2025 06:28
@chrisd8088 chrisd8088 force-pushed the drop-unused-push-clean branch from 51047a8 to 04e567f Compare April 3, 2025 07:10
chrisd8088 added 15 commits May 22, 2025 16:50
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109, and the original
versions of the tests in the t/t-pre-push.sh script were added
incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

In the "pre-push with missing pointer not on server" test in the
t/t-pre-push.sh script we remove the use of the "set +e" shell option,
which is not necessary to avoid test failure.  Although we expect the
"git lfs pre-push" command to fail, its output is piped into the tee(1)
command, and when the "set -e" option is in effect the shell will exit
immediately if the last command in a pipeline returns a non-zero exit
code, but not if other commands in the pipeline return such a code.

We do, however, add a check that the "git lfs pre-push" command exits
with a non-zero code by testing the appropriate PIPESTATUS array value,
following the example of the other related tests.

These other tests include the "pre-push with missing and present pointers
(lfs.allowincompletepush true)" and "pre-push reject missing pointers
(lfs.allowincompletepush default)" tests in the t/t-pre-push.sh script,
plus the four tests in the t/t-push-failures-local.sh script.

In all these tests we adjust the error messages generated in the case
that a "git lfs pre-push" or "git push" command fails or succeeds
unexpectedly.  We rewrite these messages so they are consistent with
each other and with those found in many of our other test scripts.

Note that in the "push reject missing objects (lfs.allowincompletepush
false)" test we also correct the message reported if the "git push"
command were to succeed unexpectedly.  At present, the message states
that we expect the command to succeed, but we actually expect it to fail.

Next, in the "push reject missing objects (lfs.allowincompletepush
default)" and "push reject corrupt objects (lfs.allowincompletepush
default)" tests, we update our checks of the "git push" command's exit
code so that we confirm the command exits with a specific non-zero exit
code of 1 rather than simply checking that its exit code is not zero.
This change brings these checks into alignment with those made by the
other related tests.

Lastly, we remove and adjust some whitespace so that these tests
are all formatted in a similar manner to each other.
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109, and the original
versions of the tests in the t/t-pre-push.sh script were added
incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

In a pair of tests in t/t-pre-push.sh script, and in another pair of
tests in the t/t-push-failures-local.sh script, the tests' initial
setup code creates several Git LFS objects and then removes the object
file in the .git/lfs/objects directory hierarchy for one of them.

In each case, we can replace this code with a simple call to our
delete_local_object() test helper function, which performs the
equivalent action of removing an object's file from the internal
Git LFS storage directories.
Our t/t-push-failures-local.sh test script includes a number of tests
which validate the behaviour of the "git push" command when Git LFS
objects are missing or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

A pair of tests in the t/t-push-failures-local.sh script, specifically
the "push reject missing objects (lfs.allowincompletepush default)" and
"push reject corrupt objects (lfs.allowincompletepush default)" tests,
perform a number of additional assertions beyond those performed by
the other tests in the same script.

These extra assertions first check that the tests' initial setup code
has successfully created object files in the local Git LFS object
storage directories, and then that the delete_local_object() and
corrupt_local_object() test helper functions successfully delete or
truncate one of the object files.

However, these assertions are unnecessary, as the conditions they
check are fully validated by the remainder of the tests' actions.
For instance, we check the output of the "git push" command to confirm
that one specific object is missing or corrupt, and then at the end
of the tests we assert that only the remaining object has been
pushed to the remote server.  This final condition would not be
possible if that object had not been successfully created in the
first place.

Therefore, in order to align these two tests with the other related
tests we simply remove the unnecessary assertion statements.  We can
then also remove the setup code which initializes the variables
that store the Git LFS object sizes, because these values were only
used in the assertion calls.
Our t/t-push-failures-local.sh test script includes a number of tests
which validate the behaviour of the "git push" command when Git LFS
objects are missing or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

A pair of tests in the t/t-push-failures-local.sh script, specifically
the "push reject missing objects (lfs.allowincompletepush default)" and
"push reject corrupt objects (lfs.allowincompletepush default)" tests,
perform their final assertions in the reverse order from those performed
in the other tests in the same script.

In order to align the code in all these tests as closely as possible,
we simply revise the final checks in the last two tests in the
t/t-push-failures-local.sh script so they follow the same pattern
as those of the other tests.
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109, and the original
versions of the tests in the t/t-pre-push.sh script were added
incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

A pair of tests in the t/t-push-failures-local.sh script, specifically
the "push reject missing objects (lfs.allowincompletepush default)" and
"push reject corrupt objects (lfs.allowincompletepush default)" tests,
perform their setup of Git LFS objects and Git commits in a different
sequence than the other tests in the same script.

In order to align the code in all these tests as closely as possible,
we simply revise the setup steps of the last two tests in the
t/t-push-failures-local.sh script so they follow the same pattern
as those of the other tests.  This change has no functional effect;
the only notable detail is that the tests now create all their Git LFS
objects in a single commit instead of using a separate commit to create
each object.

As well, we reorder the lists of files we pass to the "git add" command
in the first two tests in the t/t-push-failures-local.sh script so
they now follow the same pattern as those of the other tests in both
that script and in the t/t-pre-push.sh script.

We also adjust the commit message used when creating Git LFS objects
in a pair of tests in the t/t-pre-push.sh script, specifically the
"pre-push with missing and present pointers (lfs.allowincompletepush true)"
and "pre-push reject missing pointers (lfs.allowincompletepush default)"
tests, so they are equivalent to those used in the related tests in
the t/t-push-failures-local.sh script.
Our t/t-pre-push.sh test script includes a number of tests which validate
the behaviour of the "git lfs pre-push" command when Git LFS objects are
present, missing, or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the original versions of the tests in the t/t-pre-push.sh
script were added incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

The "pre-push with existing file" and "pre-push with existing pointer"
tests in the t/t-pre-push.sh script check similar conditions, with the
only difference betwen that the latter test does not invoke the
Git LFS client's "clean" filter and instead creates an object file
in the .git/lfs/objects storage directories directly.  Like the
"pre-push with existing file" test, though, the "pre-push with existing
pointer" test then pushes the new Git LFS object to the remote server
and expect the object to be transferred successfully.

However, unlike the "pre-push with existing file" test, the "pre-push
with existing pointer" test does not confirm that the object exists
in the remote server's storage, so we add that confirmation step now
using an assert_server_object() test helper function call identical
to the one made at the end of the "pre-push with existing file" test.

Another pair of tests, the "pre-push with missing pointer not on server"
test in the t/t-pre-push.sh script and the "push reject missing objects
(lfs.allowincompletepush default)" test in the t/t-push-failures-local.sh
script, also check similar conditions, with the difference that the
former test never creates an object file in the .git/lfs/objects
storage directories, while the latter allows an object file to be
created by the "clean" filter, and then deletes it.  In both cases,
though, they expect the absent object file to cause a failure to be
reported by the "git lfs pre-push" or "git push" commands, respectively.

However, the "pre-push with missing pointer not on server" test, unlike
the "push reject missing objects (lfs.allowincompletepush default)" test,
does not call the refute_server_object() helper function to confirm the
that the object does not exist in the remote server's storage after the
push operation runs, and also does not check for the message "LFS upload
failed" in the output of the push operation, so we add those checks now.
Our t/t-pre-push.sh test script includes a number of tests which validate
the behaviour of the "git lfs pre-push" command when Git LFS objects are
present, missing, or corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the original versions of the tests in the t/t-pre-push.sh
script were added incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

Some of the tests in the t/t-pre-push.sh script which verify the
behaviour of the "git lfs pre-push" command when Git LFS objects are
present or missing use an older form of the common initial test
repository setup steps, where the basename(1) command is called
to generate a prefix from the script's name for each test repository's
name on the remote server, but then a fixed string without that
prefix is used when cloning the repository from the remote.

This older approach to establishing a test repository is not uncommon
in our test suite and is entirely functional.  However, we expect to
rename a number of these particular tests in the t/t-pre-push.sh script
in a subsequent commit in this PR, and also revise the repository names
used by the tests at the same time, to help make the tests as consistent
with each other as possible.

Therefore, before renaming these tests and their repositories, we
adjust their initial setup steps to use follow the same pattern as,
for example, the related tests in the same script and also those found
in the t/t-push-failures-local.sh script.
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are present, missing, or
corrupt.

While these tests are largely similar in their implementation, they
vary in a number of formatting and implementation details unrelated to
the specifics of the different conditions they simulate.  These
variations are artifacts of the evolution of our test suite over time;
for example, the tests in the t/t-push-failures-local.sh script were
refactored and collected from several earlier tests in commit
4d52e08 of PR git-lfs#3109, and the original
versions of the tests in the t/t-pre-push.sh script were added
incrementally in PRs git-lfs#447, git-lfs#2199, and git-lfs#2574.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first revise the related tests in our
test suite so they are as simple and similar as possible.  This will
ensure that when we update the Git LFS client we can clearly identify
the changes that we need to make in our tests to accommodate the
client's new behaviour.

In previous commits in this PR we have revised and reformatted these
tests to increase their consistency with each other.  One additional
adjustment we make now to further increase this consistency, and to
provide greater clarity as to each test's purpose, is to rename both
the tests and test repositories they create and clone.

Note that since these tests typically only create a single missing
or corrupt object, we now use the singular word "object" rather than
"objects" in the test and repository names.
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we update two of the tests to check for
a longer and more specific error message in the case where a
"git lfs pre-push" or "git push" command fails after finding that
a Git LFS object is missing from the local object storage directories
and also that there is no extant file in the Git working tree at
the path associated with the Git LFS pointer that references the
missing object.

Under these conditions, and when the "lfs.allowIncompletePush" Git
configuration option is set to its default value of "false", the
Git LFS client reports the error message generated by the
enqueueAndCollectRetriesFor() method of the TransferQueue structure
in our "tq" Go package.  This message should include the OID of
missing Git LFS object.

We therefore update the "pre-push reject missing pointers
(lfs.allowincompletepush default)" test in the t/t-pre-push.sh script
and the "push reject missing objects (lfs.allowincompletepush false)"
test in the t/t-push-failures-local.sh script so that they check for
the missing object's OID in the text of the error message reported
by the "git lfs pre-push" or "git push" command, respectively.
Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we update two of the tests where the
"lfs.allowIncompletePush" Git configuration option is set to "true"
so that although they remove one Git LFS object file from the local
storage directories under .git/lfs/objects, they do not also remove
the corresponding Git LFS pointer from the repository's most recent
commit, or remove the copy of the object file's contents from the
current working tree.

These additional removal actions are not necessary to simulate the
test condition of a Git LFS object file missing from the local
storage directories, and so long as the "lfs.allowIncompletePush"
option is set to "true", have no effect on the outcome of the
"git lfs pre-push" and "git push" commands.

Note that several other related tests in the t/t-pre-push.sh and
t/t-push-failures-local.sh scripts also perform the same action
of removing an object file from the local storage directories and then
also deleting the associated Git LFS pointer and corresponding file
in the working tree.  In a subsequent commit in this PR we will
eliminate these steps from the other tests as well.

However, we first need to address the fact that the Git LFS client
outputs different error messages when an object file is missing,
depending on whether or not a file exists in the working tree at
the path associated with the Git LFS pointer corresponding to the
object.  The file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the error
message output by the Git LFS client and the point at which the
client abandons the upload push operation, so long as the
"lfs.allowIncompletePush" configuration option is set to its
default values of "false".

In a subsequent commit in this PR we will remove the code which
causes this variable behaviour, which will simplify our handling of
missing objects during upload transfer operations.

At the same time we will also update the relevant tests in the
t/t-pre-push.sh and t/t-push-failures-local.sh scripts, and in
particular we will adjust them so that after they remove a Git LFS
object file from the local storage directories, they do not also
remove the corresponding Git LFS pointer from the repository's Git
history or the associated copy of the object file's contents in the
current working tree.  Thus these tests' setup steps will again
exactly match those of the tests we have modified in this commit.
Our t/t-push-failures-local.sh test script includes a number of tests
which validate the behaviour of the "git push" command when Git LFS
objects are missing or corrupt.

All of these tests exercise the HTTP-based Git LFS object transfer
protocol, as they predate the development and implementation of the
"pure" SSH-based Git LFS object transfer protocol in PR git-lfs#4446
in 2021.

In a subsequent commit in this PR we expect to update the Git LFS
client to remove some non-functional code which attempts to recreate
missing Git LFS objects during push operations.  In many cases this
change will cause the client to report missing objects in an earlier
stage of push operations than it does now.  We also expect to reword
the error message the client will output in such cases.

Before we make these changes, we first introduce additional versions of
all the tests in the t/t-push-failures-local.sh script, adjusting them
to exercise the SSH-based Git LFS object transfer protocol rather than
the HTTP-based one.  We also introduce a new refute_remote_object()
test helper function, akin to the existing refute_server_object()
helper function but like the assert_remote_object() function able
to operate without the use of the HTTP-based Git LFS object transfer
protocol.

Among our new tests, one test in particular, the "push reject missing
object (lfs.allowincompletepush false) (git-lfs-transfer)" test,
validates some logic in the Batch() method of the SSHBatchClient
structure in our "tq" package that we expect to refactor in a
subsequent commit in this PR.

At present, this Batch() method operates in the same manner as the
Batch() method of the tqClient, which implements the HTTP-based
object transfer protocol.  Both methods start by iterating over the
set of objects to be transferred and copying the Missing fields from
the per-object Transfer structures into a local map.  After performing
the actual Batch API request (either over HTTP or SSH), the both
methods then iterate over the set of object records returned by the
server, which are also represented as Transfer structures, and set
these structures' Missing flags based on the values saved in the
methods' local maps.

This behaviour ensures that when the enqueueAndCollectRetriesFor()
method of the TransferQueue structure in our "tq" package iterates
over the per-object Transfer structures representing the server's
response, if the server has requested an upload of the object (meaning
the server lacks a copy of the object), and if the object's Missing
flag is set (meaning the client lacks a local copy of the object data),
then the push operation can be abandoned immediately, instead of
proceeding further and only reporting the missing object later
when attempting to upload the object's data.

As noted above, we expect to refactor this logic in later commits in
this PR, specifically by removing the use of local maps in the Batch()
methods while retaining the ability of the client to abandon push
operations at the batch request stage as soon as the client determines
that an object is missing both locally and on the remote server.

The new tests we add in this commit will therefore provide greater
confidence that these subsequent changes will not cause regressions in
the existing behaviour of the client when performing SSH-based push
operations.
At present, during push operations, the Git LFS client outputs different
error messages when an object file is missing depending on whether or
not a file exists in the working tree at the path associated with the
Git LFS pointer corresponding to the object.  The file in the working
tree may have any content, or be entirely empty; its simple presence
is enough to change the error message output by the Git LFS client and
the point at which the client abandons the upload push operation, so
long as the "lfs.allowIncompletePush" configuration option is set to
its default value of "false".

Under these conditions, the client also abandons the push operation
as soon as it determines that the object is missing on the remote
server as well as on the local system.  Under other conditions, such
as when a file (with any content) exists in the working tree at the
path associated with the missing object, the client continues the
push operation until it attempts to upload the object's data, at
which point it skips the object but continues to try to upload any
other objects in its queue.

In a subsequent commit in this PR we will remove the code which
causes this variable behaviour, which will simplify our handling of
missing objects during upload transfer operations, and cause the
client to output a consistent error message in all cases where an
object file is missing and the "lfs.allowIncompletePush" configuration
option is set to "false".  Our changes will also cause the client to
abandon the push operation as soon as it determines an object to be
missing both locally and on the remote server.

Our t/t-pre-push.sh and t/t-push-failures-local.sh test scripts include
a number of tests which validate the behaviour of the "git lfs pre-push"
and "git push" commands when Git LFS objects are missing or corrupt.

Several of these tests simulate the specific conditions described
above, and check that the push operation has been abandoned immediately
after a Batch API request by checking for the presence of a specific
error message.  This message was originally introduced into the
ensureFile() method of the uploadContext structure in our "commands"
package in commit fea77e1 of PR git-lfs#3398,
but since commit 1412d6e of PR git-lfs#3634
was been output by the enqueueAndCollectRetriesFor() method of the
TransferQueue structure in our "tq" package.

When we revise this message to be consistent with the one output
by the ReportErrors() method of the uploadContext structure, which
we expect to do in a subsequent commit in this PR, our tests will
no longer be able to check for the unique message output by the
enqueueAndCollectRetriesFor() method to confirm that the push
operation has been abandoned immediately after receiving a response
from the remote server.

As we would like our tests to continue to validate this behaviour by
checking for the presence of a specific error message output by the
enqueueAndCollectRetriesFor() method, we update that method now to
output a trace log message, and then revise the relevant tests to
also set the GIT_TRACE environment variable and then check for this
message in the logs generated by the "git lfs pre-push" or "git push"
commands.
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We make one of these
adjustments in this commit, and the other in a following commit
in this PR.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  In order for the method
to determine an object's "missing" value, at present it consults
the Missing element of the object's Transfer structure from the
set of those structures provided in the BatchResponse structure.

However, Git LFS remote servers do not report an object's local
"missing" status, so these Missing fields are not populated from
the server's response.  Instead, the appropriate values for these
Missing fields in the Transfer structures representing the list
of objects in the server's response are set by either the Batch()
method of the tqClient structure, which handles HTTP-based Git LFS
Batch API requests and responses, and the Batch() method of the
SSHBatchClient structure, which handles the equivalent in the SSH
version of the Git LFS object transfer protocol.

These methods are invoked by the Batch() function in the "tq" package,
and are passed a batchRequest structure, whose Objects element has
been populated by that function from the list of Transfer structures
passed to it by the enqueueAndCollectRetriesFor() method.  The
Batch() methods for both the HTTP and SSH versions of the Git LFS
object transfer protocol iterate over this input set of Transfer
structures and create local maps from each object's ID to the
value of the Missing element from the object's input Transfer structure.

After performing the relevant HTTP or SSH batch request and receiving
a response from the remote server, the Batch() methods then iterate
over the list of objects in the server's reply and set each object's
output Transfer structure's Missing element from the value found in
the local map.

This design dates from the original implementation for the HTTP-based
Git LFS transfer protocol in commit 1412d6e
of PR git-lfs#3634, and was then implemented for the SSH-based protocol in
commit 594f8e3 of PR git-lfs#4446, when
that protocol was introduced.

However, we can simplify this design by eliminating the need to
create and reference local maps in the Batch() methods for each
transfer protocol, if we instead make use of the map of object IDs
to lists of objectTuple structures held in the "transfers" element
of the TransferQueue structure.  This "transfers" element contains
 a map from object IDs to "objects" structures, which in turn
have "objects" elements that are lists of objectTuple structures.
These are populated by the remember() method of the TransferQueue
method, which is called by the Add() method to record all the
input data it is passed when an object is added to the transfer queue.

One potential complication is that the "objects" structure allows
for multiple objectTuples to be recorded for a common object ID,
in case the Add() method is called repeatedly for the same Git LFS
object.  Hypothetically, this could allow for different values
of the "missing" argument to the Add() method to be recorded for
the same object ID during the operation of the transfer queue.

In practice, though, this is not possible, because the Add() method
will only be called once per unique object ID during push operations.
When a "git lfs pre-push" or "git lfs push" command runs, the
UploadPointers() method of the uploadContext structure in the
"commands" is the only caller of the Add() method of the TransferQueue
structure, and the UploadPointers() method only performs that call
for the object IDs returned by the prepareUpload() method, which it
calls before the TransferQueue's Add() method.

In the common case where a Git LFS push command scans through the
history of one or more Git references to find Git LFS pointers,
the UploadPointers() method of the uploadContext structure is called
for each individual pointer found in the Git history.  As a consequence,
the objects are passed individually to the prepareUpload() method.
That method checks if the given object have been processed already by
calling the HasUploaded() method, and if that returns "true", the
prepareUpload() method returns nothing, so the UploadPointers() method
in turn does not invoke the UploadPointers() method for that object.
If the HasUploaded() method returns "false", though, the object is
returned by the prepareUpload() method, so the UploadPointers() method
invokes the TransferQueue's Add() method, and then calls the
SetUploaded() method to register the object ID in the set of IDs
that have been processed so far.

Note that is unlikely for multiple Git LFS pointers with the same
object ID to be found in a repository's Git history in the first place,
because these pointers will normally be identical to each other, and
so will have identical Git blob SHA-1 values.  Therefore when we
run "git rev-list --objects ..." to retrieve the Git blobs reachable
from a set of Git references, Git LFS pointers with the same
Git LFS object IDs will normally only be represented by a single
Git blob.

However, as demonstrated by our t/t-duplicate-oids.sh test script,
it is possible for Git LFS pointers with legacy values for their
"version" field to exist, which may then reference the same SHA-256
Git LFS object ID as a modern Git LFS pointer but be stored as
distinct Git blobs.  Similarly, it is possible for a Git LFS pointer
with extension fields to resolve to the same final content data as
a pointer without those fields, or with different extension fields,
and so could also result in distinct Git blobs that are all valid
Git LFS pointers with the same SHA-256 object IDs.

Aside from the common case where a Git LFS push command scans through
the history of one or more Git references to find Git LFS pointers,
the "git lfs push" command may also be invoked with its --object-id
or --stdin options, along with a list of specific Git LFS object IDs
to be uploaded.  When handling these command-line options, the
uploadsWithObjectIDs() function in our "commands" package will
invoke the UploadPointers() method of the uploadContext structure
with the full set of Git LFS object IDs passed to the command.

Because a user may inadvertently supply the same object ID more than
once, this means duplicate IDs may be passed by the UploadPointers()
method to the prepareUpload() method, in which case that method's
call to the HasUploaded() method would return the same "false" value
for all the duplicate object IDs in the list.  Hence, this check is
not sufficient in this case to avoid duplicate IDs being returned
by the prepareUpload() method.  Fortunately, though, the
prepareUpload() method performs another round of de-duplication using
a local StringSet from the "tools" package of the standard Go library.
Each object ID is added to this "uniqOids" set as it is processed,
unless the set already contains that object ID, in which case the
object is skipped.

This second level of de-duplication logic ensures that it is not
possible for UploadPointers() method to receive object IDs from the
prepareUpload() method unless they have not been processed before.
This in turn guarantees that the Add() method of the TransferQueue
structure will never be called for the same Git LFS object ID more
than once during a push operation.

(As a sidebar, the use of the "uniqOids" set to de-duplicate objects
during push operations was originally introduced in commit
d770dfa of PR git-lfs#1600 to guard against
duplicate object IDs which arose from distinct Git LFS pointers with
the same SHA-256 object IDs, as can result from the use of legacy
pointer "version" values.  At the time, the prepareUpload() method
was called by an upload() function rather than the newer
UploadPointers() method.  During normal "git lfs pre-push" and
"git lfs push" commands, without the --object-id or --stdin options,
this upload() function would be passed all the Git LFS objects found
from scanning the appropriate Git history, and so might encounter
duplicate object IDs which would not be filtered by checking the set
maintained by the SetUploaded() method, since that was only called
after the Add() method of the TransferQueue structure.  Later, with
the changes in PR git-lfs#1953, the logic changed such that objects are
only handled individually during push operations when the --object-id
and --stdin options are not supplied, and since then the checks of the
"uniqOids" set in the prepareUpload() method have only guarded against
duplicate object IDs provided via those command-line options.)

As described above, we can rely on the object ID de-duplication in the
"commands" package to prevent the same object being requested for
upload more than once.  This ensures that the "objects" structure
found in the "transfers" map element of the TransferQueue structure
will only contain a single objectTuple.  Therefore, in our revisions
to the enqueueAndCollectRetriesFor() method, we can simply check the
Missing element of that first (and only) objectTuple to determine
whether the "missing" argument was set to "true" when the Add() method
of the TransferQueue structure was called for that unique object.

(Note that in PR git-lfs#2476, the "transfers" structure was updated to
maintain a list of objectTuple structures for each object ID, in order
to handle duplicate OIDs passed during "smudge" filter operations
using Git's "delay" capability for long-running filter drivers
like the "git lfs filter-process" command.  This filter never
handles upload operations, though, so the de-duplication logic in
the "tq" package is not applicable to push operations, since those
are fully de-duplicated in the "commands" package.)

In prior commits in this PR we refactored and expanded the tests
relevant to the handling of missing objects during push operations,
in part to provide additional validation of the changes in this
commit, including several new tests in our t/t-push-failures-local.sh
which attempt to perform push operations with missing objects while
using the SSH-based version of the Git LFS object transfer protocol.

In particular, the "push reject missing object (lfs.allowincompletepush
false) (git-lfs-transfer)" test checks that when an object is missing
locally, and is detected as such by the ensureFile() method of the
uploadContext structure because there is also no file in the working
tree at the path associated with the object's Git LFS pointer, the
Git LFS client abandons the push operation immediately and does not
proceed to try to upload either the missing object or any other objects.

The test validates this behaviour by checking that no objects are
present on the server after the "git push" command exits, and also
by looking for error and trace log messages the client should output
when it halts the push operation.  We added the trace log message in
a previous commit in this PR because we intend to revise the error
message in a subsequent commit, which will make the message more
consistent, so it is not specific to the case where no file is found
in the working tree, but will also mean we cannot use the revised
error message as an indicator that the client has halted the push
operation where we expect that it should.
Since commit 1412d6e of PR git-lfs#3634,
during push operations the Git LFS client has sometimes avoided
reporting an error when an object to be pushed is missing locally
if the remote server reports that it has a copy already.

To implement this feature, a new Missing element was added to the
Transfer and objectTuple structures in our "tq" package, and the
Add() method of the TransferQueue structure in that package was
updated to accept an additional "missing" flag, which the method
uses to set the Missing element of the objectTuple structure it
creates and sends to the "incoming" channel.  Batches of objects to be
pushed are then gathered from this channel by the collectBatches()
method of the TransferQueue structure.

As batches of objectTuple structures are collected, they are passed
to the enqueueAndCollectRetriesFor() method, which converts them
to Transfer structures using the ToTransfers() method and then passes
them to the Batch() function, which is defined in our tq/api.go
source file.  This function initializes a batchRequest structure
which contains the set of Transfer structures as its Objects element,
and then passes those to the Batch() method specific to the current
batch transfer adapter's structure.  These Batch() methods return
a BatchResponse structure, which the Batch() function then returns
to the enqueueAndCollectRetriesFor() method.  The BatchResponse
structure also contains an Objects element which is another set
of Transfer structures that represent the per-object metadata
received from the remote server.

After the enqueueAndCollectRetriesFor() method receives a BatchResponse
during a push operation, if any of the Transfer structures in that
response define an upload action to be performed, this implies that
the remote server does not have a copy of those objects.

As one of the changes we made in PR git-lfs#3634, we introduced a step into
the enqueueAndCollectRetriesFor() method which halts the push operation
if the server's response indicates that the server lacks a copy of an
object, and if the "missing" value passed to the Add() method for that
object was set to "true".  (Initially, this step also decremented the
count of the number of objects waiting to be transferred, but this
created the potential for stalled push operations, and so another
approach to handling an early exit from the batch transfer process was
implemented in commit eb83fcd of
PR git-lfs#3800.)

Also in PR git-lfs#3634, several methods of the uploadContext structure in
our "commands" package were revised to set the "missing" value for
each object before calling the Add() method of the TransferQueue
structure.  Specifically, the ensureFile() method of the uploadContext
structure first checks for the presence of the object data file in
the .git/lfs/objects local storage directories.  If that does not
exist, then the method looks for a file in the working tree at the
path associated with the Git LFS pointer that corresponds to the
object.  If that file also does not exist, and if the
"lfs.allowIncompletePush" Git configuration option is set to "false",
then the method returns "true", and this value is ultimately used
for the "missing" argument in the call to the TransferQueue's
Add() method for the object.

Note that the file in the working tree may have any content, or be
entirely empty; its simple presence is enough to change the value
returned by the ensureFile() method, given the other conditions
described above.  We expect to revise this unintuitive behaviour in
a subsequent commit in this PR.

Before we make that change, however, we first adjust two aspects
of the implementation from PR git-lfs#3634 so as to simplify our handling
of missing objects during push operations.  We made one of these
adjustments in the previous commit in this PR, and we make the
other in this commit.

As noted above, the enqueueAndCollectRetriesFor() method halts a
push operation if the server's response indicates that the server lacks
a copy of an object, and if the "missing" value passed to the Add()
method for that object was set to "true".  When this occurs, the
enqueueAndCollectRetriesFor() method outputs an error message that
is distinct from the message which would otherwise be reported later,
when the partitionTransfers() method of the TransferQueue rechecks
whether individual objects' data files are present in the local
storage directories.

The message reported by the enqueueAndCollectRetriesFor() method
when it abandons a push operation states that the client was
"Unable to find source for object".  This message was originally
introduced in commit fea77e1 of
PR git-lfs#3398, and was generated by the ensureFile() method of the
uploadContext structure in our "commands" package, when that method
was unable to locate either an object file in the local storage
directories, or a corresponding file in the working tree at the path
of the object's Git LFS pointer.

As such, the wording of the message alludes to the expectation that
the ensureFile() method will try to recreate a missing object file
from a file in the working tree, i.e., from the object's "source".
This is how the method is described in the code comments that
precede it, and how it was intended to operate since it was first
added in PR git-lfs#176.

However, the ensureFile() has never actually recreated object files
in this way, due to an oversight in its implementation, and given
the challenges posed by the likelihood that files in the current
working tree do not correspond exactly to the source of missing Git LFS
object files, we expect to simply remove the ensureFile() method in
a subsequent commit in this PR.

In PR git-lfs#3634 the enqueueAndCollectRetriesFor() method of the TransferQueue
structure was revised to halt push operations if the server reports that
it requires the upload of an object for which the "missing" value
provided to the TransferQueue's Add() method was set to "true".  The
error message reported in such a case was copied from the message
formerly output by the ensureFile() method of the uploadContext
structure, and that method was altered so it no longer generated this
error message.

This error message is not the only one reported by the Git LFS client
when an object file is missing during a push operation, though, because
it is only output when the ensureFile() method does not find a file
(with any content) in the working at the path associated with the
object's pointer.  When a file does exist in the working tree, but
the actual object file in the Git LFS local storage directories is
missing, the push operation proceeds past the checks in the
enqueueAndCollectRetriesFor() method and continues to the point where
that method calls the addToAdapter() method of the TransferQueue
structure.  That method in turn calls the partitionTransfers() method
to determine which of the current batch of objects to be uploaded
have local object files present and which do not.

If the partitionTransfers() method finds that an object file is
missing for a given object, it creates a new MalformedObjectError
with the "missing" element of that error structure set to "true".
The method then instantiates a TransferResult structure for the object
and sets the Error element of the TransferResult to the new
MalformedObjectError.  Finally, the method returns this TransferResult
along with the other TransferResult structures it creates for all the
other objects in the batch.

The addToAdapter() method passes these TransferResult structures
individually to the handleTransferResult() method, which checks whether
the Error element is defined for the given TransferResult.  If an
error was encountered, and is one which indicates the object transfer
should not be retried, then the handleTransferResult() method sends
the error to the TransferQueue's "errorc" channel.  For errors of the
MalformedObjectError type, this is always the case.

During a push operation, the CollectErrors() method of the uploadContext
structure in the "commands" package receives these errors from the
channel, and if they are errors of the MalformedObjectError type and
have a "true" values in their "missing" element, the object's ID and
the filename associated with the object's pointer are recorded in the
"missing" map of the uploadContext structure.

When the ReportErrors() method of the uploadContext structure is then
run, it iterates over the keys and values of the "missing" map and
outputs an error message containing both the object ID and the
associated filename of the object's pointer, along with a "(missing)"
prefix.  As well, a leading error message is output, whose exact text
depends on the value of the "lfs.allowIncompletePush" configuration
option, and if this option is set to its default value of "false",
several trailing error messages are output which provide a hint as to
how to set that option if the user wants to allow a subsequent push
operation with missing objects to proceed to completion as best it can.

These per-object error messages were first defined in commit
9be11e8 of PR git-lfs#2082, and the trailing
hints were added in commit f5f5731 of
PR git-lfs#3109, at which time the "lfs.allowIncompletePush" option's default
values was changed to "false".

As mentioned above, in a subsequent commit in this PR we expect to
remove the ensureFile() method of the uploadContext structure.  We
still expect to perform a check for missing object files in the
uploadTransfer() method, though, as this will typically find any
missing object files at the start of a push operation, and thereby
allow the enqueueAndCollectRetriesFor() method of the TransferQueue
structure to halt the operation as soon as one of those objects is
found to be required by the remote server.

For the time being, we will leave the secondary check for missing
object files in place in the partitionTransfers() method of the
TransferQueue structure, as this method also tests the size of the
object file and reports those with unexpected sizes as corrupt.

Nevertheless, we would like to make our error messages as consistent
as possible when handling missing object files.  Therefore we revise
the enqueueAndCollectRetriesFor() method so it no longer returns the
"Unable to find source for object" error message, but instead returns a
new MalformedObjectError with a "true" value for its "missing" element.

One advantage of this change is that we remove the somewhat stale
wording of the previous message, which reflected the assumption that
the ensureFile() method of the uploadContext structure would attempt
to recreate missing object files from "source" files in the working
tree, even though the method has never actually done so.

Another advantage is that by returning a MalformedObjectError, the
existing logic of the CollectErrors() and ReportErrors() methods of the
uploadContext structure will handle the error exactly as if it had been
generated by the TransferQueue's partitionTransfers() method, and will
output both the same leading error message and trailing hint messages
as in that case.

As a result, we also adjust several tests in our t/t-pre-push.sh
and t/t-push-failures-local.sh scripts to expect the error messages
output by the ReportErrors() method instead of the message previously
generated by the enqueueAndCollectRetriesFor() method.
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it never recreates a missing Git LFS object
from a file found in the working tree.  This appears to have been the
case at the time the ensureFile() method was introduced in PR git-lfs#176,
and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function to guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

That this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic is less intelligible than would be ideal.  Users and
developers expect push operations to involve the transfer of data but
not the creation (or re-creation) of local data files, so the use of
some of our "clean" filter code in such a context is not particularly
intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".  We also use the IsNotExist() function from
the "os" package in the Go standard library to ascertain whether an
object file is missing, or if some other type of error prevents us
from reading its state.  This mirrors the more detailed checks performed
on each object file during a push operation by the partitionTransfers()
method of the TransferQueue structure in the "tq" package.

One consequence of this change is that when an object to be uploaded
is missing locally and is also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always abandon
a push operation after the remote server indicates that it expects
the client to upload the object.

This means that in the "push reject missing object
(lfs.allowincompletepush default)*" tests in our
t/t-push-failures-local.sh test script, we should now expect
to find a trace log message output by the client stating that the
push operation's batch queue has been stopped because an object
is missing on both the local system and the remote server.

We added this trace log message in a prior commit in this PR, and were
able to insert checks for it in several other tests in our test suite,
but only because those tests either did not create a file in the working
tree at all, as in the case of the "pre-push reject missing object"
test in our t/t-pre-push.sh script, or removed the file in the working
tree that corresponded to the object file they removed from the
.git/lfs/objects storage directories.

As a result of the changes in this commit, we can also now simplify
the three tests that performed this extra setup step, where they
used to remove the file in the working tree which corresponded to
the object file they removed from the local Git LFS storage directories.
This step is no longer necessary to cause the client to abandon the
push operation after the server indicates that it requires an upload
of an object the client has determined is missing from the local system.
Therefore we can remove these extra setup steps from both of the
"push reject missing object (lfs.allowincompletepush false)*" tests
in the t/t-push-failures-local.sh script, and from the "pre-push
reject missing object (lfs.allowincompletepush default)" test in the
t/t-pre-push.sh script.
@chrisd8088 chrisd8088 force-pushed the drop-unused-push-clean branch from 04e567f to fc57e01 Compare May 22, 2025 23:53
Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@chrisd8088 chrisd8088 merged commit 47c116a into git-lfs:main Jun 5, 2025
10 checks passed
@chrisd8088 chrisd8088 deleted the drop-unused-push-clean branch June 5, 2025 15:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants