Skip to content

Conversation

@chrisd8088
Copy link
Member

In PR #2851 the git lfs prune command was changed to respect the lfs.fetchexclude configuration option such that objects would always be pruned if they were referenced by files whose paths matched one of the patterns in the configuration option (unless they were referenced by an unpushed commit).

However, this filter is applied using the GitScanner.ScanRef() method, which indirectly utilizes the internal scanRefsToChan() function, and that function only visits unique Git LFS OIDs a single time each, even if they are referenced by multiple tree entries (i.e., if there are multiple files with the same content).

This means that if an LFS object appears in both a file that matches a pattern from lfs.fetchexclude and in a file that
does not match, the object may be pruned if the file path seen during the scan is the matching one regardless of whether the non-matching file would otherwise have its object retained.

To resolve this we change the pruneTaskGetRetainedAtRef() function to use the GitScanner.ScanTree() method instead of ScanRef(), because ScanTree() visits all file paths in each commit. We need to pass our callback to the ScanTree() method so that we can save all non-matching files' OIDs into our list of OIDs to be retained; therefore we need to add a callback argument to ScanTree() in the same manner as is done for ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded paths" test to ensure that we always retain objects when they appear in a commit to be retained and at least one of the files referencing that object ID does not match the lfs.fetchexclude filter.

/cc @larsxschneider as author of #2851.

In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter is applied using the GitScanner.ScanRef()
method, which indirectly utilizes the internal scanRefsToChan()
function, and that function only visits unique OIDs a single
time each, even if they are referenced by multiple tree entries
(i.e., if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise have its object retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching files' OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded
paths" test to ensure that we always retain objects when they
appear in a commit to be retained and at least one of the files
referencing that object ID does not match the "lfs.fetchexclude"
filter.
@chrisd8088 chrisd8088 requested a review from a team as a code owner April 28, 2022 03:33
@chrisd8088 chrisd8088 merged commit 33ad48c into git-lfs:main Apr 28, 2022
@chrisd8088 chrisd8088 deleted the prune-scan-refs-tree branch April 28, 2022 18:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants