@ttaylorr ttaylorr commented Jun 23, 2017

This pull request implements the git-lfs-migrate(1) 'import' subcommand.

The 'import' subcommand is designed to convert large blobs stored in Git history to LFS pointer files based on the --include, --exclude, --include-ref, and --exclude-ref flags. It works by calling the commands.clean() function with the blob loaded in memory, then a) storing that blob's contents in .git/lfs/objects and b) writing out an LFS pointer file in its place in the Git history.
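The pointer-writing step can be sketched as follows. This is a minimal illustration, not the `commands.clean()` implementation: `lfsPointer` is a hypothetical helper that assumes only the pointer layout visible in the example output further down (a spec `version` line, a SHA-256 `oid`, and the `size` in bytes), and it does not store anything in `.git/lfs/objects`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// lfsPointer is a hypothetical helper that builds the text of a Git LFS
// pointer file for a blob held in memory. It hashes the contents with
// SHA-256 and emits the three keys seen in the example output:
// version, oid, and size (in bytes), each terminated by a newline.
func lfsPointer(contents []byte) (oid, pointer string) {
	sum := sha256.Sum256(contents)
	oid = hex.EncodeToString(sum[:])
	pointer = "version https://git-lfs.github.com/spec/v1\n" +
		"oid sha256:" + oid + "\n" +
		fmt.Sprintf("size %d\n", len(contents))
	return oid, pointer
}

func main() {
	// A 64-byte blob, like the a.dat files in the demo, yields "size 64".
	blob := make([]byte, 64)
	oid, pointer := lfsPointer(blob)
	fmt.Println("oid:", oid)
	fmt.Print(pointer)
}
```

In the real command, the blob's contents would also be written to the local object store before the pointer replaces it in the rewritten tree.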

Regarding some of the notes from #2146 about when/where to insert .gitattributes changes @technoweenie:

When is .gitattributes built? I think it'd be cool to slip into the first commit or something.

Right now I'm adding a .gitattributes entry into the first commit that has any paths matching those in the -I or -X filter, but this could easily be changed to go anywhere before that point in history. I think my preference would be to delay adding the .gitattributes entry until the first commit at which that object appears, since that brings us closer to the 1:1 mapping of old history to new, and to me, is clearer in terms of not having a (potentially) huge .gitattributes entry up front.

and @andyneff:

I am actually more inclined to lean towards @ttaylorr's first implementation. I like the idea that the commits before the LFS files are still the same shas, and thus the graphs are topologically connected to each other. I don't know. I just feels good to me ;)

I initially implemented this according to my original suggestion of appending patterns to the .gitattributes file when a file of that kind was first rewritten. This is problematic for two reasons:

  1. It involves duplicate work to figure out which pattern included/excluded a file (see: filepathfilter: teach AllowsPattern() #2341).
  2. It does not work for the -X, --exclude flag. The BlobFn (used to rewrite blobs) is only called on blobs that match the filter, not on ones that don't. As a result, we never see tree entries that are excluded from the filterset, and so never get an opportunity to add the negative matches to the .gitattributes.

Instead, I add the .gitattributes changes to the first commit that we migrate by writing lines like:

```
pattern merge=lfs filter=lfs diff=lfs -text
```

and adding negative entries for --exclude'd patterns like:

```
pattern text -merge -filter -diff
```
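As a rough sketch of how a migrator might turn each --include or --exclude pattern into one of the two line shapes above, consider the following. `attributeLine` is a hypothetical helper written for illustration; it is not git-lfs's code, and it simply reproduces the attribute strings quoted in this description.

```go
package main

import "fmt"

// attributeLine is a hypothetical helper that renders the .gitattributes
// line for one migrated pattern. Included patterns are routed through the
// LFS filter; excluded patterns get a negative entry that unsets the LFS
// merge, filter, and diff attributes and marks the path as text.
func attributeLine(pattern string, excluded bool) string {
	if excluded {
		return pattern + " text -merge -filter -diff"
	}
	return pattern + " merge=lfs filter=lfs diff=lfs -text"
}

func main() {
	fmt.Println(attributeLine("*.dat", false)) // from --include="*.dat"
	fmt.Println(attributeLine("*.txt", true))  // from --exclude="*.txt"
}
```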

We keep track of these in a set of patterns that the LFS migrator has tracked, and merge them each time with the .gitattributes in the root tree, thereby persisting any .gitattributes changes that exist in the original history. See:

```go
// Create a blob of the attributes that are optionally
// present in the "t" tree's .gitattributes blob, and
// union in the patterns that we've tracked.
//
// Perform this Union() operation each time we visit a
// root tree such that if the underlying .gitattributes
// is present and has a diff between commits in the
// range of commits to migrate, those changes are
// preserved.
blob, err := trackedToBlob(db, theirs.Clone().Union(ours))
```
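The `theirs.Clone().Union(ours)` step can be approximated with plain strings to show why existing lines survive. This is a hypothetical sketch: `unionAttributes` is not git-lfs's set type, it works on whole lines, and for brevity it drops blank lines and exact duplicates, which a real implementation might preserve.

```go
package main

import (
	"fmt"
	"strings"
)

// unionAttributes is a hypothetical stand-in for theirs.Clone().Union(ours):
// it keeps every line of the existing .gitattributes blob ("theirs") in
// order, then appends each tracked line ("ours") that is not already present.
// Blank lines and exact duplicates are dropped to keep the sketch short.
func unionAttributes(theirs string, ours []string) string {
	seen := make(map[string]bool)
	var out []string
	add := func(line string) {
		if line == "" || seen[line] {
			return
		}
		seen[line] = true
		out = append(out, line)
	}
	for _, line := range strings.Split(theirs, "\n") {
		add(line)
	}
	for _, line := range ours {
		add(line)
	}
	if len(out) == 0 {
		return ""
	}
	return strings.Join(out, "\n") + "\n"
}

func main() {
	// The pre-existing attributes from the demo repository, plus one tracked pattern.
	theirs := "* text=auto\n* eol=lf\n*.bat eol=crlf\n"
	ours := []string{"*.dat filter=lfs diff=lfs merge=lfs -text"}
	fmt.Print(unionAttributes(theirs, ours))
}
```

Because the merge is repeated at every root tree, a later commit that edits the original .gitattributes still gets the tracked lines appended, and rerunning the union on an already-merged blob leaves it unchanged.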

Here's some example output:

  1. Generate some commits ahead of the remote ref that contain data to be migrated to LFS:

     ```
     ~/g/git-lfs (migrate-demo) $ for i in $(seq 1 10); do
       base64 < /dev/urandom | head -c 64 > a.dat
       git add a.dat
       git commit -m "a.dat: $i"
       done
     # ...
     ```
  2. Run git lfs migrate info on that data:

     ```
     ~/g/git-lfs (migrate-demo) $ git lfs migrate info --above=0b --include="*.dat"
     migrate: Sorting commits: ..., done
     migrate: Rewriting commits: 100% (10/10), done
     *.dat   640 B   10/10 files(s)  100%
     ```
  3. Run git lfs migrate import to rewrite those commits such that the large files are tracked with LFS:

     ```
     ~/g/git-lfs (migrate-demo) $ git lfs migrate import --include="*.dat"
     migrate: Sorting commits: ..., done
     migrate: Rewriting commits: 100% (11/11), done
       migrate-demo  f18bb746d44e8ea5065fc779bb1acdf3cdae7ed8 -> 35b0fe0a7bf3ae6952ec9584895a7fb6ebcd498b
     migrate: Updating refs: ..., done
     ```
  4. Observe the initial commit to show that a) original .gitattributes contents are persisted, b) new .gitattributes patterns are added accordingly, and c) the *.dat file was added to Git LFS.

     ```
     ~/g/git-lfs (migrate-demo) $ git show 3f65c4db7313e9f4fe0c546fcbba8d10593f2025
     commit 3f65c4db7313e9f4fe0c546fcbba8d10593f2025
     Author: Taylor Blau <[email protected]>
     Date:   Fri Jun 23 13:07:12 2017 -0600

         a.dat: 1

     diff --git a/.gitattributes b/.gitattributes
     index e4c71dbf..3716304e 100644
     --- a/.gitattributes
     +++ b/.gitattributes
     @@ -1,3 +1,4 @@
      * text=auto
      * eol=lf
      *.bat eol=crlf
     +*.dat filter=lfs diff=lfs merge=lfs -text
     diff --git a/a.dat b/a.dat
     new file mode 100644
     index 00000000..ec26fbcd
     --- /dev/null
     +++ b/a.dat
     @@ -0,0 +1,3 @@
     +version https://git-lfs.github.com/spec/v1
     +oid sha256:86df498e7a88666447e0c540defbef3f2b2091eecba3c0b7c9ac21962aaef576
     +size 64
     ```
  5. Observe an interdiff to show that changes are persisted through the migration and the diffs apply cleanly:

     ```
     ~/g/git-lfs (migrate-demo) $ git show 94a7b26a4260b04af84c7ce9c7fae6e6586345a6
     commit 94a7b26a4260b04af84c7ce9c7fae6e6586345a6
     Author: Taylor Blau <[email protected]>
     Date:   Fri Jun 23 13:07:12 2017 -0600

         a.dat: 2

     diff --git a/a.dat b/a.dat
     index ec26fbcd..b7fa3fba 100644
     --- a/a.dat
     +++ b/a.dat
     @@ -1,3 +1,3 @@
      version https://git-lfs.github.com/spec/v1
     -oid sha256:86df498e7a88666447e0c540defbef3f2b2091eecba3c0b7c9ac21962aaef576
     +oid sha256:bb1df455c4d90dbc0c6903500c036096eeec78f1abc151997a80387c81ccd7e0
      size 64
     ```
  6. Observe that the working tree contains the checked-out LFS object, the index contains a pointer file, and there is a corresponding entry in .git/lfs/objects:

     ```
     ~/g/git-lfs (migrate-demo) $ git cat-file -p :a.dat
     version https://git-lfs.github.com/spec/v1
     oid sha256:b91016d0851cd677a9209a0c0d11c7e9b2d5654f2baeb6485579b899cf6ffa01
     size 64
     ~/g/git-lfs (migrate-demo) $ cat a.dat
     yVnUDK0PSDqmoSvY4W/zD/a3GD/rEuPEx/kbwnjFK4E9jMLp0xRiLzCqmwwkPROO
     ~/g/git-lfs (migrate-demo) $ cat .git/lfs/objects/b9/10/b91016d0851cd677a9209a0c0d11c7e9b2d5654f2baeb6485579b899cf6ffa01
     yVnUDK0PSDqmoSvY4W/zD/a3GD/rEuPEx/kbwnjFK4E9jMLp0xRiLzCqmwwkPROO
     ```
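The object path in the last command follows a two-level fan-out keyed on the first four hex characters of the OID. The helper below reproduces that layout as observed in this transcript; `lfsObjectPath` is hypothetical, and it only checks the OID's length rather than fully validating it.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// lfsObjectPath is a hypothetical helper that returns where a local LFS
// object is stored, following the two-level fan-out visible in the demo:
// <gitDir>/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>.
// It only checks the OID's length, not that it is valid hexadecimal.
func lfsObjectPath(gitDir, oid string) (string, error) {
	if len(oid) != 64 {
		return "", fmt.Errorf("invalid OID %q: want 64 hex characters", oid)
	}
	return filepath.Join(gitDir, "lfs", "objects", oid[0:2], oid[2:4], oid), nil
}

func main() {
	p, err := lfsObjectPath(".git", "b91016d0851cd677a9209a0c0d11c7e9b2d5654f2baeb6485579b899cf6ffa01")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```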

Closes: #2146.


/cc @git-lfs/core
/refs #2146

@ttaylorr ttaylorr added this to the v2.2.0 milestone Jun 23, 2017
@ttaylorr ttaylorr requested a review from technoweenie June 23, 2017 19:17
@ttaylorr

Opened up #2358, which should cause this branch to pass on CircleCI after it's merged in.

@technoweenie technoweenie merged commit 22ff8aa into master Jun 26, 2017
@technoweenie technoweenie deleted the migrate-subcommand-import branch June 26, 2017 21:56
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 24, 2024
Our README file contains a brief note in its "Example Usage" section
stating that Git LFS requires a Git version higher than 1.8.2 on
Linux and 1.8.5 on macOS.  This statement dates from commit
59a49b0 in PR git-lfs#412 in 2015, and so
is relatively out of date.

In particular, when we added support for the "git lfs migrate" command
in PR git-lfs#2353, the actual minimum supported version of Git was changed
from 1.8.x to 1.9.0 (in commit 1d0e834)
and then to 2.0.0 (in commit 5aea841).

These changes were made to the Travis CI configuration in use at the time,
and later migrated to our current GitHub Actions CI workflow in commit
c32820806229c3f42364d989f7a8597f73cb107ba of PR git-lfs#3808.  This workflow
continues to run our Git LFS test suite using Git 2.0.0.

We therefore now update our README file to remove the outdated note
about Git 1.8.x versions, and add a paragraph to the "Limitations"
section which documents the current minimum supported Git version of
2.0.0 but also strongly advises the use of a more recent Git version.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Apr 3, 2025
Since commit a343a11 of PR git-lfs#1461,
a number of our commands, including "git lfs pull", "git lfs push",
and "git lfs track", have checked the version of the currently
available Git program and reported an error if it was not at least
version 1.8.2.

However, when we added support for the "git lfs migrate" command
in PR git-lfs#2353, the actual minimum supported version of Git was changed
from 1.8.x to 1.9.0 (in commit 1d0e834)
and then to 2.0.0 (in commit 5aea841).

These changes were made to the Travis CI configuration in use at the time,
and later migrated to our current GitHub Actions CI workflow in commit
c32820806229c3f42364d989f7a8597f73cb107ba of PR git-lfs#3808.  This workflow
continues to run our Git LFS test suite using Git 2.0.0.

More recently, in commit 1501265 of
PR git-lfs#5921, we updated our README file to document that the current
minimum supported version of Git we require is v2.0.0.

We therefore now update the minimum Git version required by the Git LFS
client to 2.0.0 by adjusting the version string defined in the
requireGitVersion() function of our "commands" package.
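A minimum-version check of this kind amounts to a numeric, component-by-component comparison. The sketch below is hypothetical: `parseVersion` and `atLeast` are not the code in `requireGitVersion()`, and the treatment of vendor suffixes such as `.windows.1` (ignored after the first non-numeric component) is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical helper that extracts the leading numeric
// components of a version string, so "2.39.2.windows.1" becomes [2 39 2].
func parseVersion(v string) []int {
	var parts []int
	for _, field := range strings.Split(v, ".") {
		n, err := strconv.Atoi(field)
		if err != nil {
			break // stop at the first non-numeric component
		}
		parts = append(parts, n)
	}
	return parts
}

// atLeast reports whether version v is greater than or equal to minimum,
// comparing components numerically from left to right and treating
// missing components as zero (so "2.0" counts as "2.0.0").
func atLeast(v, minimum string) bool {
	a, b := parseVersion(v), parseVersion(minimum)
	for i := 0; i < len(a) || i < len(b); i++ {
		var x, y int
		if i < len(a) {
			x = a[i]
		}
		if i < len(b) {
			y = b[i]
		}
		if x != y {
			return x > y
		}
	}
	return true
}

func main() {
	for _, v := range []string{"1.8.2", "1.9.5", "2.0.0", "2.39.2.windows.1"} {
		fmt.Printf("%-18s meets 2.0.0 minimum: %v\n", v, atLeast(v, "2.0.0"))
	}
}
```

Comparing numerically rather than as strings matters once a component reaches two digits: "10.0.0" sorts before "2.0.0" lexically but is the newer version.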
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Jun 19, 2025
When we build Debian and RPM Linux packages, we define the minimum
versions of Git and Go required by the Git LFS client.  However,
the minimum versions we specify are at present somewhat out of date.

Specifically, both the "control" file for our Debian packages and the
SPEC file for our RPM packages state that we require at least Git
version 1.8.2, and the former also specifies that we require at least
Go version 1.12.0.

In practice, though, since we introduced the "git lfs migrate" command
in PR git-lfs#2353, Git v2.0.0 has been the earliest version of Git we support,
as per commit 5aea841 of that PR.

We have also required at least Go v1.23.0 to build the Git LFS client
since commit 70e23fa of PR git-lfs#5997,
when we updated the minimum version of the x/crypto Go module specified
in our "go.mod" file and the "go mod tidy" command then also updated
the minimum required version of Go to 1.23.0.

Because we anticipate making a v3.7.0 release of the Git LFS client in
the near future, we now update the "control" file for our Debian
packages and the SPEC file for our RPM packages to indicate that
the Git LFS client requires at least Git v2.0.0 and Go v1.23.0.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Nov 26, 2025
When the "git lfs migrate import" subcommand was implemented in PR git-lfs#2353,
a few initial tests were included, beginning with those from commit
e39a767 when the original version of
what is now our t/t-migrate-import.sh test script was first added.

Several of these tests were designed to check that files matching
certain path patterns are converted to Git LFS files, while files which
do not match those patterns are left unchanged.  For instance, the
"migrate import (default branch with filter)" test intends to check that
files matching the pattern "*.md" are converted to Git LFS by the
"git lfs migrate import" command, while files matching the pattern
"*.txt" are not converted.

One of the specific checks performed by these tests is to try to verify
that no Git LFS "filter=lfs" entry has been added to the ".gitattributes"
file for the "*.txt" path pattern.  To do this, they read the
".gitattributes" file's contents from a given branch and then pipe its
contents to a grep(1) command with the -v option, in the expectation
that this will fail if a line is found which matches a regular expression
containing the "*.txt" pattern.

However, the -v option of the "grep" command does not cause the command
to fail (i.e., exit with a non-zero value) if a line is found in its
input which matches the provided regular expression.  Rather, the -v
option causes the "grep" command to filter out any lines from its input
which match the expression, and then the exit status is determined in
the usual manner, so that the command only returns a non-zero value if
no other lines were seen in the input.

Since our tests happen to always generate entries in the ".gitattributes"
files which do not match the "*.txt" pattern, the "grep" commands with
the -v option always succeed, but without actually verifying that
entries with "*.txt" patterns do not appear in the files.  If such
an entry did appear, it would simply be filtered by the "grep" commands
and then the existence of the other lines would still allow the
commands to succeed.
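The flaw is easy to reproduce with a small model of grep's behavior. The sketch below is a simplification, not grep itself: it treats the pattern as a fixed substring (as with grep -F) and models only the exit status of -v and the count printed by -c. With a ".gitattributes" body that does contain a "*.txt" entry, the -v check still exits 0 while the count is non-zero.

```go
package main

import (
	"fmt"
	"strings"
)

// lines splits input the way grep sees it: one entry per line, with
// empty input producing no lines at all.
func lines(input string) []string {
	if input == "" {
		return nil
	}
	return strings.Split(strings.TrimSuffix(input, "\n"), "\n")
}

// grepVStatus models the exit status of `grep -v -F PATTERN`: lines
// containing the pattern are filtered out, and the status is 0 if any
// line remains to be printed, 1 otherwise.
func grepVStatus(input, pattern string) int {
	for _, line := range lines(input) {
		if !strings.Contains(line, pattern) {
			return 0
		}
	}
	return 1
}

// grepCount models the output of `grep -c -F PATTERN`: the number of
// lines containing the pattern.
func grepCount(input, pattern string) int {
	n := 0
	for _, line := range lines(input) {
		if strings.Contains(line, pattern) {
			n++
		}
	}
	return n
}

func main() {
	// The *.txt entry is present, yet grep -v still exits 0 because the
	// *.md line survives the filter.
	attrs := "*.md filter=lfs diff=lfs merge=lfs -text\n" +
		"*.txt filter=lfs diff=lfs merge=lfs -text\n"
	fmt.Println("grep -v status:", grepVStatus(attrs, "*.txt filter=lfs"))
	fmt.Println("grep -c count: ", grepCount(attrs, "*.txt filter=lfs"))
}
```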

In fact, this specific problem affects two of the tests in the
t/t-migrate-import.sh test script, the "migrate import (default branch,
exclude remote refs)" and "migrate import (given branch, exclude
remote refs)" tests.  In both cases, the ".gitattributes" files
that our tests currently intend to prove do not contain entries with
the "*.txt" pattern actually do contain such entries.  We therefore
correct these checks now, as discussed further below.

As additional tests have been added to the t/t-migrate-import.sh test
script over time, misuse of the "grep" command's -v option has been
accidentally propagated into a number of our tests in the script.

We therefore rewrite all of these checks so that they do not use
the -v option of the "grep" command.  Instead, we utilize the "grep"
command's -c option to produce a count of all lines matching the given
pattern, and then verify that the count is zero (except in two cases
where the existing checks are incorrect, as mentioned above).  We use
this idiom throughout many of our other test scripts, in part because
it has several advantages over other possible techniques for ensuring
that a file contains no lines matching a certain pattern.

One alternative approach used in a few of our test scripts, solely for
historical reasons, is to simply run "grep" without any options and
then check that the command's output is empty with the -z or string
comparison shell test operators.  While this generally suffices, if by
chance the "grep" command should fail with an error, the check will
still pass and the error will not be reported.

In such a case, the command would return an exit status of 2.  Although
the "errexit" shell option is enabled in all of our tests by the "set -e"
command, we run the "grep" command in a subshell within the arguments of
the shell "[" builtin, so a non-zero exit status from that command will
not cause the test's shell to exit immediately.  The command would
also likely print an error message to its standard error file descriptor
but would not write anything to its standard output, so our checks using
the -z or string comparison shell test operators would still succeed.

By comparison, when we use the -c option, if by chance the "grep" command
were to fail with an error, it would not output an integer count value
and so the test would fail.
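The difference between the two idioms when grep itself fails can be modeled as follows. This is a hypothetical sketch: `grepResult`, `emptyOutputCheck`, and `zeroCountCheck` are illustrative names, and the shell forms quoted in the comments are examples of each idiom rather than lines copied from our test scripts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// grepResult is a hypothetical record of one grep run captured by a test.
type grepResult struct {
	stdout string
	status int // 0: lines matched, 1: no lines matched, 2: an error occurred
}

// emptyOutputCheck mirrors a check such as `[ -z "$(grep PATTERN FILE)" ]`.
// It looks only at standard output, so an errored run passes silently.
func emptyOutputCheck(r grepResult) bool {
	return r.stdout == ""
}

// zeroCountCheck mirrors a check such as `[ 0 -eq "$(grep -c PATTERN FILE)" ]`.
// Standard output must parse as an integer equal to zero, so an errored run
// with no output fails the check.
func zeroCountCheck(r grepResult) bool {
	n, err := strconv.Atoi(strings.TrimSpace(r.stdout))
	return err == nil && n == 0
}

func main() {
	// grep fails (for example, the input file is missing): exit status 2
	// and nothing written to standard output.
	failed := grepResult{stdout: "", status: 2}
	fmt.Println("empty-output check passes on error:", emptyOutputCheck(failed))
	fmt.Println("zero-count check passes on error:  ", zeroCountCheck(failed))
}
```

When grep succeeds, both idioms agree: a plain grep with no matches prints nothing, and grep -c with no matches prints "0".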

In the case of the two specific tests where we currently attempt to
verify that an entry with the "*.txt" pattern does not appear in a
".gitattributes" file, even though these entries do appear, we simply
remove the -v option from the "grep" command rather than replace it
with the -c option.

The purpose of both of these tests, the "migrate import (default branch,
exclude remote refs)" and "migrate import (given branch, exclude remote
refs)" tests, is to demonstrate that the "git lfs migrate import" command
will not rewrite the Git history of any locally cached remote references.
The tests therefore assert that after running the "git lfs migrate import"
command, no ".gitattributes" file has been added to the Git tree of the
commit associated with the "refs/remotes/origin/main" reference.  This
check alone is sufficient to prove that the reference has not been
altered.

The two tests then also attempted to check that no "*.txt"
entries had been added to the ".gitattributes" files in one of the local
references.  However, the specific "git lfs migrate import" commands
performed by the tests are expected to create such entries, since
they convert all of the files in the local branches, exactly as they
do in the comparable "migrate import (default branch)" and "migrate
import (given branch)" tests.

Finally, note that many of the regular expressions used in the "grep"
commands we modify in this commit do not properly escape the "."
character, so it will technically match any character and not just a
literal "." character, as is intended.  (The asterisk character in these
commands' patterns would also normally be parsed as a metacharacter,
but because it happens to be the first character in the expressions
it is treated as a literal "*" character.)

We could resolve this problem for the regular expressions of the specific
"grep" commands we are modifying in this commit, either by adding the -F
option to the "grep" commands or by escaping the "." character.  However,
since this issue affects multiple other patterns as well (including some
in other test scripts), we would prefer to address this issue in a more
comprehensive fashion.  We therefore defer to a future PR any revisions
of the patterns used by any "grep" commands in our test suite.
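The effect of an unescaped "." can be reproduced with Go's `regexp` package, with one caveat: Go's RE2 syntax rejects a bare leading "*", unlike POSIX basic regular expressions, so the sketch escapes the asterisk explicitly. `regexp.QuoteMeta` stands in for grep's -F option here; it is an analogy, not the same mechanism.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Unescaped ".": matches any character, so "*_txt" also matches.
	// The leading asterisk is escaped because RE2 would reject it otherwise.
	loosePattern = regexp.MustCompile(`\*.txt filter=lfs`)

	// regexp.QuoteMeta escapes every metacharacter, which is roughly what
	// grep's -F option achieves by treating the pattern as a fixed string.
	strictPattern = regexp.MustCompile(regexp.QuoteMeta("*.txt filter=lfs"))
)

func main() {
	for _, line := range []string{
		"*.txt filter=lfs diff=lfs merge=lfs -text",
		"*_txt filter=lfs diff=lfs merge=lfs -text",
	} {
		fmt.Printf("%-42s loose=%v strict=%v\n", line,
			loosePattern.MatchString(line), strictPattern.MatchString(line))
	}
}
```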
Development

Successfully merging this pull request may close these issues.

proposal: migrate sub-command

3 participants