Skip to content

Conversation

@larsxschneider
Copy link
Member

@larsxschneider larsxschneider commented Feb 1, 2018

People use 'lfs.fetchexclude' to limit the download of Git LFS content.
If a user excludes Git LFS files with 'lfs.fetchexclude' after a full
clone, then Git LFS would not smudge the files in the working directory
but it would keep the LFS content in the LFS storage directory (usually
.git/lfs/objects). Git LFS would keep the LFS content even after a
'git lfs prune' run.

Change that behavior and prune all objects that match the
'lfs.fetchexclude' pattern independent of their age.


⚠️ I am unfamiliar with these areas of the code. Please review carefully! All good ideas stolen from @ttaylorr 's great work in #2847 !

@larsxschneider
Copy link
Member Author

test-fetch.sh fails right now because git lfs fsck does not respect the exclude pattern here. As soon as #2847 is merged (and this PR rebased) the checks should pass.

Copy link
Contributor

@ttaylorr ttaylorr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I just left a quick note about where I think the filtering could take place.


retainChan <- p.Oid
tracerx.Printf("RETAIN: %v via ref %v", p.Oid, ref)
if gitscanner.Filter.Allows(p.Name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the behavior that you're looking for, but other APIs from the gitscanner code do this filtering from within the lfs package, such that file- (or pointer-) names that don't pass the filter aren't propagated out to the callbacks.

For consistency: what do you think about moving those checks there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course! That was just necessary because I haven't used your changes from #2847 ... stupid me. Now everything is in master and rebased my branch here.

retainChan := make(chan string, 100)

gitscanner := lfs.NewGitScanner(nil)
gitscanner.Filter = filepathfilter.New(nil, cfg.FetchExcludePaths())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think we can avoid the include filters, since we only want to not prune files that have been explicitly excluded from being downloaded in the first place.

People use 'lfs.fetchexclude' to limit the download of Git LFS content.
If a user excludes Git LFS files with 'lfs.fetchexclude' after a full
clone, then Git LFS would not smudge the files in the working directory
but it would keep the LFS content in the LFS storage directory (usually
.git/lfs/objects). Git LFS would keep the LFS content even after a
'git lfs prune' run.

Change that behavior and prune all objects that match the
'lfs.fetchexclude' pattern independent of their age.
Use the same code structure in pruneTaskGetPreviousVersionsOfRef() as
in pruneTaskGetRetainedAtRef().
}

retainChan <- p.Oid
tracerx.Printf("RETAIN: %v via ref %v >= %v", p.Oid, ref, since)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess aee2e19 is just my OCD 😄
Let me know if you don't like this kind of stuff and I remove the commit from the PR!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's great! Thank you.

@ttaylorr ttaylorr merged commit 8eeaa8d into master Feb 14, 2018
@ttaylorr ttaylorr deleted the lars/prune-excluded branch February 14, 2018 01:52
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 20, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files
referenced only by recent commits previous to a recent ref.
This could result in the unexpected consequence that an object
in HEAD would be pruned because it matched the "lfs.fetchexclude"
filter, but its previous version was not pruned.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects the
"lfs.fetchexclude" configuration option, that it applies it to all
files except those referenced by unpushed commits or stashes, and
that it uses gitignore(5)-style path matching, as per our change in
a prior commit in this PR.  This test will fail if gitattributes(5)-
tyle matching is used, and will also fail if files are not filtered
when they are referenced by recent previous commits.

Note that we also add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 22, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files
referenced only by recent commits previous to a recent ref.
This could result in the unexpected consequence that an object
in HEAD would be pruned because it matched the "lfs.fetchexclude"
filter, but its previous version was not pruned.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects the
"lfs.fetchexclude" configuration option, that it applies it to all
files except those referenced by unpushed commits or stashes, and
that it uses gitignore(5)-style path matching, as per our change in
a prior commit in this PR.  This test will fail if gitattributes(5)-
tyle matching is used, and will also fail if files are not filtered
when they are referenced by recent previous commits.

Note that we also add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 22, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter was applied using the GitScanner.ScanRef()
method, which utilizes the internal scanRefsToChan() function,
and that function only visits unique OIDs a single time each,
even if they are referenced by multiple tree entries (i.e.,
if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise be retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching file's OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune unreferenced and
old except in excluded paths" test to ensure that we always
retain objects when they appear in a commit to be retained and
at least one of the files referencing that object ID does not
match the "lfs.fetchexclude" filter.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune keep unpushed" test ensures that Git LFS objects
referenced by unpushed commits are always retained when the
"git lfs prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern, which would
otherwise be pruned regardless of their status, will be retained
because they have not yet been pushed.  Logic in the "git lfs prune"
command to prune objects from files matching the "lfs.fetchexclude"
filter was added in PR git-lfs#2851, and it should demonstrate this
behaviour, but no checks actually confirm it.

We therefore expand the "prune keep unpushed" test so that its
simulated commit history includes objects whose file path matches
a path we then exclude using the "lfs.fetchexclude" filter.  We then
confirm that all these objects are retained by "git lfs prune" until
they are pushed, after which they are all pruned by the same command,
even the object referenced by the new HEAD ref.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files
referenced only by recent commits previous to a recent ref.
This could result in the unexpected consequence that an object
in HEAD would be pruned because it matched the "lfs.fetchexclude"
filter, but its previous version was not pruned.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects
the "lfs.fetchexclude" configuration option insofar as it prunes
Git LFS objects for files whose paths match a pattern in the filter,
both when they appear in commits directly referenced by a recent ref
and when they only appear in commits previous to those.

Our test also checks that "git lfs prune" applies the file using
gitignore(5)-style path matching, as per our change in a prior commit
in this PR.  This test will fail if gitattributes(5)-style matching
is used, and will also fail if files are not filtered when they
are referenced by recent previous commits.

Note that we also add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune worktree" test ensures that Git LFS objects
referenced by worktrees are always retained when the "git lfs
prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern are pruned even
when they are referenced by a commit at the HEAD of a worktree.
Logic in the "git lfs prune" command to prune objects from files
matching the "lfs.fetchexclude" filter was added in PR git-lfs#2851,
and it should demonstrate this behaviour, but no checks actually
confirm it.

We therefore expand the "prune worktree" test so that its
simulated commit history includes objects in each worktree's
branch whose file path matches a path we then exclude using the
"lfs.fetchexclude" filter.  We then confirm that these objects
are pruned by "git lfs prune", despite being referenced by the
worktrees' HEAD refs.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter was applied using the GitScanner.ScanRef()
method, which utilizes the internal scanRefsToChan() function,
and that function only visits unique OIDs a single time each,
even if they are referenced by multiple tree entries (i.e.,
if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise be retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching file's OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded
paths" test to ensure that we always retain objects when they
appear in a commit to be retained and at least one of the files
referencing that object ID does not match the "lfs.fetchexclude"
filter.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune keep unpushed" test ensures that Git LFS objects
referenced by unpushed commits are always retained when the
"git lfs prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern, which would
otherwise be pruned regardless of their status, will be retained
because they have not yet been pushed.  Logic in the "git lfs prune"
command to prune objects from files matching the "lfs.fetchexclude"
filter was added in PR git-lfs#2851, and it should demonstrate this
behaviour, but no checks actually confirm it.

We therefore expand the "prune keep unpushed" test so that its
simulated commit history includes objects whose file path matches
a path we then exclude using the "lfs.fetchexclude" filter.  We then
confirm that all these objects are retained by "git lfs prune" until
they are pushed, after which they are all pruned by the same command,
even the object referenced by the new HEAD ref.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune keep unpushed" test ensures that Git LFS objects
referenced by unpushed commits are always retained when the
"git lfs prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern, which would
otherwise be pruned regardless of their status, will be retained
because they have not yet been pushed.  Logic in the "git lfs prune"
command to prune objects from files matching the "lfs.fetchexclude"
filter was added in PR git-lfs#2851, and it should demonstrate this
behaviour, but no checks actually confirm it.

We therefore expand the "prune keep unpushed" test so that its
simulated commit history includes objects whose file path matches
a path we then exclude using the "lfs.fetchexclude" filter.  We then
confirm that all these objects are retained by "git lfs prune" until
they are pushed, after which they are all pruned by the same command,
even the object referenced by the new HEAD ref.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune worktree" test ensures that Git LFS objects
referenced by worktrees are always retained when the "git lfs
prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern are pruned even
when they are referenced by a commit at the HEAD of a worktree.
Logic in the "git lfs prune" command to prune objects from files
matching the "lfs.fetchexclude" filter was added in PR git-lfs#2851,
and it should demonstrate this behaviour, but no checks actually
confirm it.

We therefore expand the "prune worktree" test so that its
simulated commit history includes objects in each worktree's
branch whose file path matches a path we then exclude using the
"lfs.fetchexclude" filter.  We then confirm that these objects
are pruned by "git lfs prune", despite being referenced by the
worktrees' HEAD refs.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files
referenced only by recent commits previous to a recent ref.
This could result in the unexpected consequence that an object
in HEAD would be pruned because it matched the "lfs.fetchexclude"
filter, but its previous version was not pruned.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects
the "lfs.fetchexclude" configuration option insofar as it prunes
Git LFS objects for files whose paths match a pattern in the filter,
both when they appear in commits directly referenced by a recent ref
and when they only appear in commits previous to those.

Our test also checks that "git lfs prune" applies the file using
gitignore(5)-style path matching, as per our change in a prior commit
in this PR.  This test will fail if gitattributes(5)-style matching
is used, and will also fail if files are not filtered when they
are referenced by recent previous commits.

Note that we also add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter was applied using the GitScanner.ScanRef()
method, which utilizes the internal scanRefsToChan() function,
and that function only visits unique OIDs a single time each,
even if they are referenced by multiple tree entries (i.e.,
if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise be retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching file's OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded
paths" test to ensure that we always retain objects when they
appear in a commit to be retained and at least one of the files
referencing that object ID does not match the "lfs.fetchexclude"
filter.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune keep unpushed" test ensures that Git LFS objects
referenced by unpushed commits are always retained when the
"git lfs prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern, which would
otherwise be pruned regardless of their status, will be retained
because they have not yet been pushed.  Logic in the "git lfs prune"
command to prune objects from files matching the "lfs.fetchexclude"
filter was added in PR git-lfs#2851, and it should demonstrate this
behaviour, but no checks actually confirm it.

We therefore expand the "prune keep unpushed" test so that its
simulated commit history includes objects whose file path matches
a path we then exclude using the "lfs.fetchexclude" filter.  We then
confirm that all these objects are retained by "git lfs prune" until
they are pushed, after which they are all pruned by the same command,
even the object referenced by the new HEAD ref.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune worktree" test ensures that Git LFS objects
referenced by worktrees are always retained when the "git lfs
prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern are pruned even
when they are referenced by a commit at the HEAD of a worktree.
Logic in the "git lfs prune" command to prune objects from files
matching the "lfs.fetchexclude" filter was added in PR git-lfs#2851,
and it should demonstrate this behaviour, but no checks actually
confirm it.

We therefore expand the "prune worktree" test so that its
simulated commit history includes objects in each worktree's
branch whose file path matches a path we then exclude using the
"lfs.fetchexclude" filter.  We then confirm that these objects
are pruned by "git lfs prune", despite being referenced by the
worktrees' HEAD refs.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune keep unpushed" test ensures that Git LFS objects
referenced by unpushed commits are always retained when the
"git lfs prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern, which would
otherwise be pruned regardless of their status, will be retained
because they have not yet been pushed.  Logic in the "git lfs prune"
command to prune objects from files matching the "lfs.fetchexclude"
filter was added in PR git-lfs#2851, and it should demonstrate this
behaviour, but no checks actually confirm it.

We therefore expand the "prune keep unpushed" test so that its
simulated commit history includes objects whose file path matches
a path we then exclude using the "lfs.fetchexclude" filter.  We then
confirm that all these objects are retained by "git lfs prune" until
they are pushed, after which they are all pruned by the same command,
even the object referenced by the new HEAD ref.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 25, 2022
The "prune worktree" test ensures that Git LFS objects
referenced by worktrees are always retained when the "git lfs
prune" command is run.

However, it does not at present check that objects belonging to
files which match a "lfs.fetchexclude" pattern are pruned even
when they are referenced by a commit at the HEAD of a worktree.
Logic in the "git lfs prune" command to prune objects from files
matching the "lfs.fetchexclude" filter was added in PR git-lfs#2851,
and it should demonstrate this behaviour, but no checks actually
confirm it.

We therefore expand the "prune worktree" test so that its
simulated commit history includes objects in each worktree's
branch whose file path matches a path we then exclude using the
"lfs.fetchexclude" filter.  We then confirm that these objects
are pruned by "git lfs prune", despite being referenced by the
worktrees' HEAD refs.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 26, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files
referenced only by recent commits previous to a recent ref.
This could result in the unexpected consequence that an object
in HEAD would be pruned because it matched the "lfs.fetchexclude"
filter, but its previous version was not pruned.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects
the "lfs.fetchexclude" configuration option insofar as it prunes
Git LFS objects for files whose paths match a pattern in the filter,
both when they appear in commits directly referenced by a recent ref
and when they only appear in commits previous to those.

Our test also checks that "git lfs prune" applies the file using
gitignore(5)-style path matching, as per our change in a prior commit
in this PR.  This test will fail if gitattributes(5)-style matching
is used, and will also fail if files are not filtered when they
are referenced by recent previous commits.

Note that we also add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 26, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, while this filter was applied to files referenced by
recent refs (including HEAD), it was not applied to files referenced
only by recent commits previous to a recent ref.  This can result in
the unexpected consequence that an object in HEAD is pruned because
it is referenced only by a file that matches the "lfs.fetchexclude"
filter, but a recent previous version of the object is not pruned
despite also only being referenced by the same file.

We therefore add equivalent filtering to the logPreviousSHAs()
internal function which is called by the ScanPreviousVersions()
GitScanner method used by pruneTaskGetPreviousVersionsOfRef()
in the "git lfs prune" command's main phase.

(Note that in PR git-lfs#1743 the Git log scanning functions were refactored,
and a common parseScannerLogOutput() internal function was created to
be used when performing log scans during "git lfs prune" commands.
This replaced the original logPreviousSHAs() function, which had
support for path-based filtering; however, that functionality was
apparently never used.)

We also add a test which confirms that "git lfs prune" respects
the "lfs.fetchexclude" configuration option insofar as it prunes
Git LFS objects for files whose paths match a pattern in the filter,
both when they appear in commits directly referenced by a recent ref
and when they only appear in commits previous to those.

Note too that we add one extra check to the existing "prune
unreferenced and old" test for consistency with our new test.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 27, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter was applied using the GitScanner.ScanRef()
method, which utilizes the internal scanRefsToChan() function,
and that function only visits unique OIDs a single time each,
even if they are referenced by multiple tree entries (i.e.,
if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise be retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching file's OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded
paths" test to ensure that we always retain objects when they
appear in a commit to be retained and at least one of the files
referencing that object ID does not match the "lfs.fetchexclude"
filter.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 28, 2022
In commit d2221dc of PR git-lfs#2851
the "git lfs prune" command was changed to respect the
"lfs.fetchexclude" configuration option such that objects would
always be pruned if they were referenced by files whose paths
matched one of the patterns in the configuration option (unless
they were referenced by an unpushed commit).

However, this filter is applied using the GitScanner.ScanRef()
method, which indirectly utilizes the internal scanRefsToChan()
function, and that function only visits unique OIDs a single
time each, even if they are referenced by multiple tree entries
(i.e., if there are multiple files with the same content).

This means that if an LFS object appears in both a file that
matches a pattern from "lfs.fetchexclude" and in a file that
does not match, the object may be pruned if the file path seen
during the scan is the matching one regardless of whether the
non-matching file would otherwise have its object retained.

To resolve this we change the pruneTaskGetRetainedAtRef()
function to use the GitScanner.ScanTree() method instead of
ScanRef(), because ScanTree() visits all file paths in each
commit.  We need to pass our callback to the ScanTree() method
so that we can save all non-matching files' OIDs into our list
of OIDs to be retained; therefore we need to add a callback
argument to ScanTree() in the same manner as is done for
ScanRef() and various other GitScanner methods.

We also introduce additional checks in our "prune all excluded
paths" test to ensure that we always retain objects when they
appear in a commit to be retained and at least one of the files
referencing that object ID does not match the "lfs.fetchexclude"
filter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants