Skip to content

Conversation

@alexkad0
Copy link
Contributor

@alexkad0 alexkad0 commented May 3, 2025

Invoking git-update-ref to rewrite a single reference is a noticeable performance bottleneck during LFS migration in a repository with a lot of references. Particularly, it may take ~3 minutes to update ~20k references.

The pull request makes Git LFS update all references at once by generating a reference update script and piping it into git-update-ref.

Please note that logic that handles nested tags in updateOneRef() assumes that the tag referenced by the tag being updated has already been updated in the Git storage, this assumption no longer holds. The pull request tackles the issue by caching the new state of the reference in the map seen.

@alexkad0 alexkad0 requested a review from a team as a code owner May 3, 2025 23:06
@chrisd8088 chrisd8088 added git-core migration Related to `git lfs migrate` enhancement labels May 7, 2025
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for this proposal and PR!

Using the --stdin option to git update-ref seems like a very good change and one we should definitely try to adopt for use in the git lfs migrate command.

I've put a few notes and possible additional changes into this review, so please take a look when you get a chance and let me know what you think. As a quick summary, my main suggestions are:

  • create a small wrapper in the git package for the command itself
  • merge the existing UpdateRefIn() wrapper back into UpdateRef() and simplify it
  • use the -z option and NUL-delimited input instructions
  • avoid using the transactional instructions with Git prior to v2.27.0
  • start the command before looping over the refs to be updated and then write update instructions to the command's input pipe within the loop

I hope those things all make sense! Please do let me know your thoughts, and if you get a chance to test my suggestions against your real-world use cases, that would be fantastic.

Thank you very much against for this great improvement!

@alexkad0 alexkad0 force-pushed the use-update-ref-script branch from b47ba66 to c5d6f75 Compare May 10, 2025 12:38
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the additional changes! I think this is looking really good. I'll the CI suite run, although I expect it may complain about the ordering of modules in the import statement.

I'm also going to ask for a second pair of eyes from @git-lfs/core because this PR now includes a chunk of my own code. Other than that, though, I think we're set to go.

As for handling the ref updates in a series of transactions instead of one, my own sense is that we should defer that to a follow-up PR if it proves to be necessary, but I'm interested in other opinions.

Thanks again very much for this idea and implementation!

alexkad0 added 2 commits May 17, 2025 13:38
Invoking git-update-ref to rewrite a single reference is a noticeable
performance bottleneck during LFS migration in a repository with a lot
of references. Particularly, it may take ~3 minutes to update ~20k
references.

The patch makes Git LFS update all references at once by generating
a reference update script and piping it into git-update-ref.

Please note that logic that handles nested tags in updateOneRef()
assumes that the tag referenced by the tag being updated has already
been updated in the Git storage, this assumption no longer holds.
The patch tackles the issue by caching the new state of the reference
in the map 'seen'.
The routine git.UpdateRefIn() is no longer invoked
outside the git module.
@alexkad0 alexkad0 force-pushed the use-update-ref-script branch from 6aaae81 to 0f2697c Compare May 17, 2025 12:51
Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! LGTM

As suggested by larsxschneider on PR review, we can simplify the name
of the new UpdateRefsFromStdinInDir() function we added in a previous
commit in this PR to just UpdateRefsFromStdin(), without much loss
of clarity.

There is one other Git command wrapper function in our "git" package,
the AllRefsIn() function, which also sets the Git command's working
directory to an explicit value, which is why this function's name
has an "In" suffix, as did the UpdateRefsIn() function we removed
in a previous commit in this PR.

On the other hand, the NewLsFiles() and NewRevListScanner() constructor
functions in our "git" package also set the working directory for
the Git commands they perform, and their names do not explicitly
indicate this behaviour, so we have a precedent for just naming our
new function without any special suffix.  Like these constructor
functions, we rely on the function's argument list and the comments
which precede the function to clarify its actions.
@chrisd8088 chrisd8088 merged commit b23bc04 into git-lfs:main May 22, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement git-core migration Related to `git lfs migrate`

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants