Avoid pruning when identical files both match and do not match lfs.fetchexclude
#4973
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In PR #2851 the
git lfs prunecommand was changed to respect thelfs.fetchexcludeconfiguration option such that objects would always be pruned if they were referenced by files whose paths matched one of the patterns in the configuration option (unless they were referenced by an unpushed commit).However, this filter is applied using the
GitScanner.ScanRef()method, which indirectly utilizes the internalscanRefsToChan()function, and that function only visits unique Git LFS OIDs a single time each, even if they are referenced by multiple tree entries (i.e., if there are multiple files with the same content).This means that if an LFS object appears in both a file that matches a pattern from
lfs.fetchexcludeand in a file thatdoes not match, the object may be pruned if the file path seen during the scan is the matching one regardless of whether the non-matching file would otherwise have its object retained.
To resolve this we change the
pruneTaskGetRetainedAtRef()function to use theGitScanner.ScanTree()method instead ofScanRef(), becauseScanTree()visits all file paths in each commit. We need to pass our callback to theScanTree()method so that we can save all non-matching files' OIDs into our list of OIDs to be retained; therefore we need to add a callback argument toScanTree()in the same manner as is done forScanRef()and various otherGitScannermethods.We also introduce additional checks in our
"prune all excluded paths"test to ensure that we always retain objects when they appear in a commit to be retained and at least one of the files referencing that object ID does not match thelfs.fetchexcludefilter./cc @larsxschneider as author of #2851.