Skip to content

Conversation

@ttaylorr
Copy link
Contributor

@ttaylorr ttaylorr commented Dec 29, 2016

This pull-request teaches the push and pre-push commands how to use the new gitscanner without accumulating potentially large slices of *lfs.WrappedPointers.

Previously, in order to push a set of LFS objects to a remote, the pusher needed to have explicit knowledge of all objects they were going to push. In many cases, this isn't a problem, since usually the number of objects is manageable. During initial or large pushes, however, the number of objects to push can grow prohibitively large, causing a problem.

This pull-request applies the work that @technoweenie has done with the gitscanner type and applies it to the push and pre-push commands. Here's a breakdown of what happened:

  1. 4d4f808: since the commands will now call uploadPointers() more than once, re-use the *tq.TransferQueue instance across invocations to prevent needless re-allocation and to take full advantage of batching.
  2. 64a643e: since the uploadPointers() calls can place objects across multiple batches and thus do not guarantee that all objects in the call have been uploaded, introduce Await() to ensure that the *tq.TransferQueue is done before exiting LFS.
  3. a58e91f^1...f1380bd: teach calling code how to asynchronously call uploadPointers.
  4. 028facf^1...430f5a7: 💅
  5. 7af9281: fix small bug in lfs.NewDownloadCheckQueue which created n empty elements in the options array (where n is equal to the length of options provided)
  6. 15673a4: since the download check queue is used many times instead of all at once, do some preparation work to make it reusable. In this commit, provide a mechanism for callers to find out about failed transfers so that calling code can accurately determine when all given items have been checked.
  7. b169435: Wire-up the download check queue to be reused in *commands.uploadContext.

One thing I'd like to investigate before merging is the use of the download check queue in b169435. Prior to that commit, the queue was re-instantiated each time the scanner reported items. This would be OK if the scanner reported items in small batches, since we'd be able to take advantage of the batching functionality of the *tq.TransferQueue, however it does not. What this means is that we have to perform an expensive object allocation (not to mention all of the internal chans, maps, etc.) each time a pointer is read from the gitscanner.

To mitigate this, I did work in 15673a4^1...b169435 to avoid this expensive and repeated allocation. However, as a consequence, the download check queue needs to report synchronously whether or not the server has items that are missing locally in .git/lfs/objects. What this means is a batch size of 1, so that we're not waiting indefinitely to hear of the status a set of items with cardinality less than the batch size.

This may be fine, since I think it's a safe assumption to assume the number of items that needs to go through this check queue (i.e., the number of objects missing from .git/lfs/objects` needed to push) is sufficiently small. If not, it may be worth pursuing alternatives that remove the need for single-length batches.

One option we could pursue is using a second transfer queue to keep track of the items that are not present in .git/lfs/objects, or using the *lfs.Batcher type to batch it ourselves. I'm not sure if this is advantageous or not, since it does carry with it additional complexity. I'd love to hear additional thoughts on this.


/cc @git-lfs/core

@ttaylorr ttaylorr added this to the v2.0.0 milestone Dec 29, 2016
@technoweenie
Copy link
Contributor

Hmm, I don't like this approach. We should definitely not be making download check calls in batches of 1. Maybe we need to stake a step back and consider why there is a download check queue at all? How can we make it a single queue?

I have a hunch that the transfer queue checks to see if the object lives in .git/lfs/objects before pushing. The download queue lets us filter objects on the server out early. Why not add this check to the upload transfer adapters instead?

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Jan 5, 2017

We should definitely not be making download check calls in batches of 1.

I definitely agree 👍 Taking another look at this PR, I think that I can do something cleaner than having the Notify(fn func(oid string, ok bool)) and a batch size of 1.

I have a hunch that the transfer queue checks to see if the object lives in .git/lfs/objects before pushing.

I'm not sure that this is the case. When I read the code, I thought this was checking for objects that the gitscanner thought were needed in the push, but were not present in .git/lfs/objects. When the *commands.uploadContext prepares an unfiltered set of *lfs.WrappedPointers for upload, it checks all of them to see whether lfs.ObjectExistsOfSize is true for that pointer. If the object does exist, it is added to the uploadables slice and then subsequently, the *tq.TransferQueue.

Otherwise, it's added to the missingLocalObjects set and then checkMissing is called on that set. That function adds it to the download check queue, which will tell us whether or not the server has that object (i.e., if it's OK to push even though the object is missing locally). Otherwise, if the server does not have that object present, it'll be added to the main *tq.TransferQueue, which will propagate an error back up when it tries to upload an object it does not have the data for.

Some of the solutions I proposed in the original PR I think are worth considering:

One option we could pursue is using a second transfer queue to keep track of the items that are not present in .git/lfs/objects, or using the *lfs.Batcher type to batch it ourselves.

I originally thought that this problem would have a large enough effect to warrant considering these alternatives seriously. However, if a user tries to push a LFS repository with many objects and they completed their initial clone/fetch with an -X flag touching many objects, this could be a significant problem.

Why not add this check to the upload transfer adapters instead?

I think this is an interesting approach. I like that it allows us to use a single transfer queue and greatly simplifies the implementation of the uploadContext. However, I'm not sure if this brings us as much benefit as we'd like in terms of network/API calls. I'm assuming that we'd do a check something like:

func (a *basicUploadAdapter) DoTransfer(ctx interface{}, t *Transfer, cb ProgressCallback, authOkFunc func()) error {
	rel, err := t.Actions.Get("upload")
	if err != nil {
		return err
		// return fmt.Errorf("No upload action for this object.")
	}

        if !lfs.ObjectExistsOfSize(t.Oid, t.Size) {
                if serverHasObject(t.Oid, t.Size) {
                        // OK, bail out early.
                }

                // Otherwise, we can't upload, and return a failure.
        }

        req, err := httputil.NewHttpRequest("PUT", rel.Href, rel.Header)
        if err != nil {
                return err
        }

        // ...
}

This seems clean to me, but we'd still be making one network call per each object missing locally.

@sinbad
Copy link
Contributor

sinbad commented Jan 6, 2017

Your reasoning about why it checks for missing objects locally is correct; this increasingly happens when you have multiple branches with some shared ancestry but where you've never actually fetched for some of the intermediate steps (which is fine if someone else pushed those objects).

I would advocate for putting as few bits of duplicated logic in the transfer adapters as possible. Dealing with the situation around missing objects locally but present on the server is identical regardless of what transfer adapter you're using, so IMO should not need to be implemented in every adapter. The adapter should only be called when you know you want to upload something. Remember that custom adapters are implemented outside the team, and therefore a clean line of responsibility between lfs and the transfer adapter is very important; you can't refactor a responsibility like this to an adapter without forcing all other adapters to duplicate the logic. That's a bad move IMO.

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Jan 6, 2017

That's a bad move IMO.

I agree. @technoweenie reminded me yesterday in a video call that we actually don't need this check at all. If the server already has an object, it just wont send that object's OID (and thus, any upload action) in the batch response.

@sinbad
Copy link
Contributor

sinbad commented Jan 9, 2017

that we actually don't need this check at all. If the server already has an object, it just wont send that object's OID (and thus, any upload action) in the batch response.

That's true as well, but the pre-check was necessary (at least it used to be) because the pre-push would fail before even calling the API if it thought that the OID was missing locally, since it checked that while collecting the list of things to upload. I guess if the sequence is different this could be eliminated:

  1. Collect list of OIDs to push but don't check if they're present locally
  2. Call API - this strips out those the server already has
  3. Only upload missing, which now must fail if not local because these are now hard requirements

@ttaylorr
Copy link
Contributor Author

After some pairing with @technoweenie, I went ahead and removed the download check queue entirely (338ab40) ✨ . This change essentially ignores all pointer-related errors before the object is in the *tq.TransferQueue, and moves the lfs.ObjectExistsOfSize functionality there instead, no adapter changes required.

This works well, since the *tq.TransferQueue only performs this check if an object has an upload action associated with it (i.e., is needed for the push to complete).

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great next steps forward for the gitscanner 🤘

@ttaylorr ttaylorr merged commit 474df80 into master Jan 11, 2017
@ttaylorr ttaylorr deleted the gitscanner-push branch January 11, 2017 22:47
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 31, 2022
In commit 338ab40 of PR git-lfs#1812 the
prepareUpload() function was refactored so that it no longer called
checked whether an object to be pushed had a local file copy or not
and skipped it if it did not.

(That functionality already existed in the uploadTransfer() function
in that it called the ensureFile() function before returning a
tq.Transfer object.  Note that these functions were in the
commands/commands.go source file at the time.  The uploadTransfer()
function is called for each pointer blob object by uploadPointers().)

However, commit 338ab40 did not
update the comment in prepareUpload() which stated that the function
skipped any objects which did not have local file copies, so we now
revise that comment and add one in the uploadTransfer() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 31, 2022
In commit 338ab40 of PR git-lfs#1812 the
prepareUpload() function was refactored so that it no longer checked
whether an object to be pushed had a local file copy or not and skipped it
if it did not.

(That functionality already existed in the uploadTransfer() function
in that it called the ensureFile() function before returning a
tq.Transfer object.  Note that these functions were in the
commands/commands.go source file at the time.  The uploadTransfer()
function is called for each pointer blob object by uploadPointers().)

However, commit 338ab40 did not
update the comment in prepareUpload() which stated that the function
skipped any objects which did not have local file copies, so we now
revise that comment and add one in the uploadTransfer() function.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 28, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it does not ever actually recreate
a missing Git LFS object from a file found in the working tree.
This appears to have been the case at the time the ensureFile() method
was introduced in PR git-lfs#176, and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

The fact that this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic less intelligible than is ideal.  Users and developers
expect push operations to involve the transfer of data but not
the creation (or re-creation) of local data files, so the use of our
"clean" filter in such a context is not particularly intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".

One consequence of this change is that when objects to be uploaded
are missing locally and are also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always return
a consistent error message.

This occurs because, under these conditions, we now always abandon
the upload operation after the Git LFS Batch API request in which we
determine that an object which is missing locally is also not available
on the remote server.  Previously, we would sometimes continue past
this stage in the upload process, and only report a failure when we
tried to upload the object's data.  This occurred if the ensureFile()
method had found a file in the working tree at the same path as that
of the missing object (regardless of that file's contents), and thus
returned a "false" value to indicate that the object had been
recreated and was no longer missing.

Because the Git LFS client's behaviour is now more consistent when
handling missing objects during a push operation, we update several
tests in the t/t-pre-push.sh and t/t-push-failures-local.sh test
scripts to reflect this change.

In the "pre-push reject missing object" test in the t/t-pre-push.sh
script and the "push reject missing object (lfs.allowincompletepush
default)" test in the t/t-push-failures-local.sh script, we update
the error messages expected by the tests because the client now
reports an error as soon as it detects that the missing object is
not present on the remote server.

In the case of the latter test, this also means that we now expect
that no objects are uploaded to the server, because the client abandons
the upload process after the Batch API request from which it learns
that the missing object is also not present on the server, rather than
proceeding past this stage and uploading the object for which a local
copy is available.

In a previous commit in this PR we altered two other tests, in which
the "lfs.allowIncompletePush" option is set to "true", because those
tests only needed to remove a file from the local .git/lfs/objects
storage directories to simulate the condition where a Git LFS object
file is missing.  These tests did not also need to remove the Git LFS
pointer associated with the object from the repository's most recent
commit, or remove the copy of the object file's contents from the
current working tree.

We can now make the same changes to the two remaining tests that perform
these additional setup steps, namely the "pre-push reject missing object
(lfs.allowincompletepush default)" test in the t/t-pre-push.sh script
and the "push reject missing object (lfs.allowincompletepush false)"
test in the t/t-push-failures-local.sh script.  Because it is no longer
necessary to remove a working tree file in order to avoid causing
the ensureFile() method to report that the corresponding Git LFS object
is actually missing, we can simplify the setup steps in these tests
and bring them back into alignment with those of the related tests
where the "lfs.allowIncompletePush" option is set to "true".
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Mar 28, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it does not ever actually recreate
a missing Git LFS object from a file found in the working tree.
This appears to have been the case at the time the ensureFile() method
was introduced in PR git-lfs#176, and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

The fact that this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic less intelligible than is ideal.  Users and developers
expect push operations to involve the transfer of data but not
the creation (or re-creation) of local data files, so the use of our
"clean" filter in such a context is not particularly intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".

One consequence of this change is that when objects to be uploaded
are missing locally and are also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always return
a consistent error message.

This occurs because, under these conditions, we now always abandon
the upload operation after the Git LFS Batch API request in which we
determine that an object which is missing locally is also not available
on the remote server.  Previously, we would sometimes continue past
this stage in the upload process, and only report a failure when we
tried to upload the object's data.  This occurred if the ensureFile()
method had found a file in the working tree at the same path as that
of the missing object (regardless of that file's contents), and thus
returned a "false" value to indicate that the object had been
recreated and was no longer missing.

Because the Git LFS client's behaviour is now more consistent when
handling missing objects during a push operation, we update several
tests in the t/t-pre-push.sh and t/t-push-failures-local.sh test
scripts to reflect this change.

In the "pre-push reject missing object" test in the t/t-pre-push.sh
script and the "push reject missing object (lfs.allowincompletepush
default)" test in the t/t-push-failures-local.sh script, we update
the error messages expected by the tests because the client now
reports an error as soon as it detects that the missing object is
not present on the remote server.

In the case of the latter test, this also means that we now expect
that no objects are uploaded to the server, because the client abandons
the upload process after the Batch API request from which it learns
that the missing object is also not present on the server, rather than
proceeding past this stage and uploading the object for which a local
copy is available.

In a previous commit in this PR we altered two other tests, in which
the "lfs.allowIncompletePush" option is set to "true", because those
tests only needed to remove a file from the local .git/lfs/objects
storage directories to simulate the condition where a Git LFS object
file is missing.  These tests did not also need to remove the Git LFS
pointer associated with the object from the repository's most recent
commit, or remove the copy of the object file's contents from the
current working tree.

We can now make the same changes to the two remaining tests that perform
these additional setup steps, namely the "pre-push reject missing object
(lfs.allowincompletepush default)" test in the t/t-pre-push.sh script
and the "push reject missing object (lfs.allowincompletepush false)"
test in the t/t-push-failures-local.sh script.  Because it is no longer
necessary to remove a working tree file in order to avoid causing
the ensureFile() method to report that the corresponding Git LFS object
is actually missing, we can simplify the setup steps in these tests
and bring them back into alignment with those of the related tests
where the "lfs.allowIncompletePush" option is set to "true".
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it never recreates a missing Git LFS object
from a file found in the working tree.  This appears to have been the
case at the time the ensureFile() method was introduced in PR git-lfs#176,
and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function to guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

That this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic less intelligible than is ideal.  Users and developers
expect push operations to involve the transfer of data but not
the creation (or re-creation) of local data files, so the use of our
"clean" filter in such a context is not particularly intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".  We also use the IsNotExist() function from
the "os" package in the Go standard library to ascertain whether an
object file is missing, or if some other type of error prevents us
from reading its state.  This mirrors the more detailed checks performed
on each object file during a push operation by the partitionTransfers()
method of the TransferQueue structure in the "tq" package.

One consequence of this change is that when an object to be uploaded
is missing locally and is also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always abandon
a push operation after the remote server indicates that it expects
the client to upload the object.

This means that in the "push reject missing object
(lfs.allowincompletepush default)*" tests in our
t/t-push-failures-local.sh test script, we should now expect
to find a trace log message output by the client stating that the
push operation's batch queue has been stopped because an object
is missing on both the local system and the remote server.

We added this trace log message in a prior commit in this PR, and were
able to insert checks for it in several other tests in our test suite,
but only because those tests either did not create a file in the working
tree at all, as in the case of the "pre-push reject missing object"
test in our t/t-pre-push.sh script, or removed the file in the working
tree that corresponded to the object file they removed from the
.git/lfs/objects storage directories.

As a result of the changes in this commit, we can also now simplify
the three tests that performed this extra setup step, where they
used to remove the file in the working tree which corresponded to
the object file they removed from the local Git LFS storage directories.
This step is no longer necessary to cause the client to abandon the
push operation after the server indicates that it requires an upload
of an object the client has determined is missing from the local system.
Therefore we can remove these extra setup steps from both of the
"push reject missing object (lfs.allowincompletepush false)*" tests
in the t/t-push-failures-local.sh script, and from the "pre-push
reject missing object (lfs.allowincompletepush default)" test in the
t/t-pre-push.sh script.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it never recreates a missing Git LFS object
from a file found in the working tree.  This appears to have been the
case at the time the ensureFile() method was introduced in PR git-lfs#176,
and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function to guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

That this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic is less intelligible than would be ideal.  Users and
developers expect push operations to involve the transfer of data but
not the creation (or re-creation) of local data files, so the use of
some of our "clean" filter code in such a context is not particularly
intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".  We also use the IsNotExist() function from
the "os" package in the Go standard library to ascertain whether an
object file is missing, or if some other type of error prevents us
from reading its state.  This mirrors the more detailed checks performed
on each object file during a push operation by the partitionTransfers()
method of the TransferQueue structure in the "tq" package.

One consequence of this change is that when an object to be uploaded
is missing locally and is also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always abandon
a push operation after the remote server indicates that it expects
the client to upload the object.

This means that in the "push reject missing object
(lfs.allowincompletepush default)*" tests in our
t/t-push-failures-local.sh test script, we should now expect
to find a trace log message output by the client stating that the
push operation's batch queue has been stopped because an object
is missing on both the local system and the remote server.

We added this trace log message in a prior commit in this PR, and were
able to insert checks for it in several other tests in our test suite,
but only because those tests either did not create a file in the working
tree at all, as in the case of the "pre-push reject missing object"
test in our t/t-pre-push.sh script, or removed the file in the working
tree that corresponded to the object file they removed from the
.git/lfs/objects storage directories.

As a result of the changes in this commit, we can also now simplify
the three tests that performed this extra setup step, where they
used to remove the file in the working tree which corresponded to
the object file they removed from the local Git LFS storage directories.
This step is no longer necessary to cause the client to abandon the
push operation after the server indicates that it requires an upload
of an object the client has determined is missing from the local system.
Therefore we can remove these extra setup steps from both of the
"push reject missing object (lfs.allowincompletepush false)*" tests
in the t/t-push-failures-local.sh script, and from the "pre-push
reject missing object (lfs.allowincompletepush default)" test in the
t/t-pre-push.sh script.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it never recreates a missing Git LFS object
from a file found in the working tree.  This appears to have been the
case at the time the ensureFile() method was introduced in PR git-lfs#176,
and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function to guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

That this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic is less intelligible than would be ideal.  Users and
developers expect push operations to involve the transfer of data but
not the creation (or re-creation) of local data files, so the use of
some of our "clean" filter code in such a context is not particularly
intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".  We also use the IsNotExist() function from
the "os" package in the Go standard library to ascertain whether an
object file is missing, or if some other type of error prevents us
from reading its state.  This mirrors the more detailed checks performed
on each object file during a push operation by the partitionTransfers()
method of the TransferQueue structure in the "tq" package.

One consequence of this change is that when an object to be uploaded
is missing locally and is also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always abandon
a push operation after the remote server indicates that it expects
the client to upload the object.

This means that in the "push reject missing object
(lfs.allowincompletepush default)*" tests in our
t/t-push-failures-local.sh test script, we should now expect
to find a trace log message output by the client stating that the
push operation's batch queue has been stopped because an object
is missing on both the local system and the remote server.

We added this trace log message in a prior commit in this PR, and were
able to insert checks for it in several other tests in our test suite,
but only because those tests either did not create a file in the working
tree at all, as in the case of the "pre-push reject missing object"
test in our t/t-pre-push.sh script, or removed the file in the working
tree that corresponded to the object file they removed from the
.git/lfs/objects storage directories.

As a result of the changes in this commit, we can also now simplify
the three tests that performed this extra setup step, where they
used to remove the file in the working tree which corresponded to
the object file they removed from the local Git LFS storage directories.
This step is no longer necessary to cause the client to abandon the
push operation after the server indicates that it requires an upload
of an object the client has determined is missing from the local system.
Therefore we can remove these extra setup steps from both of the
"push reject missing object (lfs.allowincompletepush false)*" tests
in the t/t-push-failures-local.sh script, and from the "pre-push
reject missing object (lfs.allowincompletepush default)" test in the
t/t-pre-push.sh script.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 22, 2025
The original version of the ensureFile() method of the uploadContext
structure in our "commands" package was first introduced in commit
5d239d6 of PR git-lfs#176 in 2015, and has
been refactored a number of times since then, but continues to be called
for each object a push operation intends to upload.

The method is documented as a function that checks whether a Git LFS
object file exists in the local .git/lfs/objects storage directories
for a given pointer, and if it does not, tries to replace the missing
object file by passing the contents of the corresponding file in the
working tree through our "clean" filter.

The description of PR git-lfs#176 explains the function's purpose as follows,
using the original pre-release name for the Git LFS project:

  If the .git/hawser/objects directory gets into a weird state
  (for example, if the user manually removed some files in there),
  this attempts to re-clean the objects based on the git repository
  file path.

The code comments preceding the ensureFile() method also describe it
in the same way, as do the notes in later PRs, such as PR git-lfs#2574, in
which the "lfs.allowIncompletePush" configuration option was introduced,
and PR git-lfs#3398, which refined the error message the Git LFS client reports
during a push operation when an object is missing locally and is also
not present on the remote server.

However, the ensureFile() method has never actually replaced missing
object files under any circumstances.  It does check whether an object
file is missing from the local storage directories, and if not, tests
whether a file exists in the current working tree at the path of the
Git LFS pointer associated with the object.  If such a file exists,
the method proceeds to run the Clean() method of the GitFilter structure
in our "lfs" package on the file's contents.

The Clean() method calculates the SHA-256 hash value of the file's
contents and creates a Pointer structure containing this hash value,
and also writes a copy of the file's data into a temporary file in
the .git/lfs/tmp directory.  It is then the responsibility of the
caller to determine whether or not this temporary file should be
moved into place in the .git/lfs/objects directory hierarchy.

The only other caller of the Clean() method, besides the ensureFile()
method, is the clean() function in the "commands" package, which
is used by multiple Git LFS commands including the "git lfs clean"
and "git lfs filter-process" plumbing commands, as well as the
"git lfs migrate import" command.

The clean() function performs several tasks after invoking the Clean()
method.  First, it checks whether the file processed by the method
was found to contain a Git LFS pointer; if so, no further action is
taken as we assume the file in the working tree has not been passed
through our "smudge" filter, and we do not want to create another
pointer which simply hashes and references the existing one.

Next, the clean() function checks whether a file already exists in the
local .git/lfs/objects storage directories at the location into which
the function would otherwise expect to move the temporary file
created by the Clean() method.  If a file does exist in this location
and has the same size as the temporary file, no further action is
taken, as we assume it contains the same contents and does not need
to be updated.

Assuming neither of these checks causes the clean() function to
return early, the function moves the temporary file created by the
Clean() method into the expected location within the .git/lfs/objects
directory hierarchy.

Unfortunately, because the ensureFile() method invokes the Clean() method
of the GitFilter structure but never performs any of the subsequent steps
taken by the clean() function, it never recreates a missing Git LFS object
from a file found in the working tree.  This appears to have been the
case at the time the ensureFile() method was introduced in PR git-lfs#176,
and has remained so ever since.

We could attempt to remedy this situation by altering the ensureFile()
method so it calls the clean() function.  To do so it would need to
simulate the conditions under which the function usually runs,
specifically within the "clean" filter context where the function is
expected to transform an input data stream into an output data stream.
We would likely use a Discard structure from the "io" package of the
standard Go library to simply discard the output from the clean()
function, as we do not need to send it back to Git in the way the
"git lfs filter-process" or "git lfs clean" commands do.

However, we would have to add logic to the clean() function to guard
against the case where the file in the working tree had different
contents than those of the missing Git LFS object.  Because the
user may check out a Git reference in which a different file exists
at the same path in the working tree, or may simply modify the
file in the working tree independently, there is no guarantee that
the file we pass through the Clean() method is identical to the
one from which the missing Git LFS object was created.

The original implementation of the ensureFile() function, although it
did not fulfil its stated purpose, did include a check to verify
that the SHA hash of the working tree file, as returned by the Clean()
method, matched that of the missing object.  This check was removed in
commit 338ab40 of PR git-lfs#1812, which
would likely have introduced a serious bug, except that the ensureFile()
method never actually replaced any missing objects and so the removal
of this check had no functional impact.

While we could try to revise the ensureFile() method to operate as
was originally intended, the advantages of such a change are relatively
slim, and the disadvantages are several.  Most obviously, it requires
modifications to our clean() function to guard against the replacement
of object files with incorrect data, something the other callers of
the function do not need to be concerned about.

That this is a concern at all is in turn due to the reasonable
chance that a file found in the current working tree at a given path
does not contain the identical data as that of an Git LFS object
generated from another file previously located at the same path.

As well, the fact that the ensureFile() method has never worked as
designed, despite being repeatedly refactored and enhanced over ten
years, suggests that its purpose is somewhat obscure and that the
requisite logic is less intelligible than would be ideal.  Users and
developers expect push operations to involve the transfer of data but
not the creation (or re-creation) of local data files, so the use of
some of our "clean" filter code in such a context is not particularly
intuitive.

For all these reasons, we just remove the ensureFile() method entirely,
which simplifies our handling of missing objects during upload transfer
operations.  Instead, we check for the presence of each object file
we intend to push in the uploadTransfer() method of our uploadContext
structure, and if a file is not found in the local storage directories,
we flag it as missing, unless the "lfs.allowIncompletePush" configuration
option is set to "true".  We also use the IsNotExist() function from
the "os" package in the Go standard library to ascertain whether an
object file is missing, or if some other type of error prevents us
from reading its state.  This mirrors the more detailed checks performed
on each object file during a push operation by the partitionTransfers()
method of the TransferQueue structure in the "tq" package.

One consequence of this change is that when an object to be uploaded
is missing locally and is also not already present on the remote
server, and when the "lfs.allowIncompletePush" Git configuration
option is set to its default value of "false", we now always abandon
a push operation after the remote server indicates that it expects
the client to upload the object.

This means that in the "push reject missing object
(lfs.allowincompletepush default)*" tests in our
t/t-push-failures-local.sh test script, we should now expect
to find a trace log message output by the client stating that the
push operation's batch queue has been stopped because an object
is missing on both the local system and the remote server.

We added this trace log message in a prior commit in this PR, and were
able to insert checks for it in several other tests in our test suite,
but only because those tests either did not create a file in the working
tree at all, as in the case of the "pre-push reject missing object"
test in our t/t-pre-push.sh script, or removed the file in the working
tree that corresponded to the object file they removed from the
.git/lfs/objects storage directories.

As a result of the changes in this commit, we can also now simplify
the three tests that performed this extra setup step, where they
used to remove the file in the working tree which corresponded to
the object file they removed from the local Git LFS storage directories.
This step is no longer necessary to cause the client to abandon the
push operation after the server indicates that it requires an upload
of an object the client has determined is missing from the local system.
Therefore we can remove these extra setup steps from both of the
"push reject missing object (lfs.allowincompletepush false)*" tests
in the t/t-push-failures-local.sh script, and from the "pre-push
reject missing object (lfs.allowincompletepush default)" test in the
t/t-pre-push.sh script.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants