-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Use a git-update-ref script to update references #6048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
chrisd8088
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for this proposal and PR!
Using the --stdin option to git update-ref seems like a very good change and one we should definitely try to adopt for use in the git lfs migrate command.
I've put a few notes and possible additional changes into this review, so please take a look when you get a chance and let me know what you think. As a quick summary, my main suggestions are:
- create a small wrapper in the
gitpackage for the command itself - merge the existing
UpdateRefIn()wrapper back intoUpdateRef()and simplify it - use the
-zoption and NUL-delimited input instructions - avoid using the transactional instructions with Git prior to v2.27.0
- start the command before looping over the refs to be updated and then write
updateinstructions to the command's input pipe within the loop
I hope those things all make sense! Please do let me know your thoughts, and if you get a chance to test my suggestions against your real-world use cases, that would be fantastic.
Thank you very much against for this great improvement!
b47ba66 to
c5d6f75
Compare
chrisd8088
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional changes! I think this is looking really good. I'll the CI suite run, although I expect it may complain about the ordering of modules in the import statement.
I'm also going to ask for a second pair of eyes from @git-lfs/core because this PR now includes a chunk of my own code. Other than that, though, I think we're set to go.
As for handling the ref updates in a series of transactions instead of one, my own sense is that we should defer that to a follow-up PR if it proves to be necessary, but I'm interested in other opinions.
Thanks again very much for this idea and implementation!
Invoking git-update-ref to rewrite a single reference is a noticeable performance bottleneck during LFS migration in a repository with a lot of references. Particularly, it may take ~3 minutes to update ~20k references. The patch makes Git LFS update all references at once by generating a reference update script and piping it into git-update-ref. Please note that logic that handles nested tags in updateOneRef() assumes that the tag referenced by the tag being updated has already been updated in the Git storage, this assumption no longer holds. The patch tackles the issue by caching the new state of the reference in the map 'seen'.
The routine git.UpdateRefIn() is no longer invoked outside the git module.
6aaae81 to
0f2697c
Compare
larsxschneider
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! LGTM
As suggested by larsxschneider on PR review, we can simplify the name of the new UpdateRefsFromStdinInDir() function we added in a previous commit in this PR to just UpdateRefsFromStdin(), without much loss of clarity. There is one other Git command wrapper function in our "git" package, the AllRefsIn() function, which also sets the Git command's working directory to an explicit value, which is why this function's name has an "In" suffix, as did the UpdateRefsIn() function we removed in a previous commit in this PR. On the other hand, the NewLsFiles() and NewRevListScanner() constructor functions in our "git" package also set the working directory for the Git commands they perform, and their names do not explicitly indicate this behaviour, so we have a precedent for just naming our new function without any special suffix. Like these constructor functions, we rely on the function's argument list and the comments which precede the function to clarify its actions.
Invoking
git-update-refto rewrite a single reference is a noticeable performance bottleneck during LFS migration in a repository with a lot of references. Particularly, it may take ~3 minutes to update ~20k references.The pull request makes Git LFS update all references at once by generating a reference update script and piping it into
git-update-ref.Please note that logic that handles nested tags in
updateOneRef()assumes that the tag referenced by the tag being updated has already been updated in the Git storage, this assumption no longer holds. The pull request tackles the issue by caching the new state of the reference in the mapseen.