Skip to content

Conversation

@pastelsky
Copy link

@pastelsky pastelsky commented Sep 12, 2024

Fixes #5848.

Improves the performance of git lfs pre-push particularly for large repositories with numerous tags.

Previously, the pre-push hook would execute 'git ls-remote' with both --heads and --tags flags, which could be slow in repositories with many tags.

Tags can be omitted for pre-push however.

Improves the performance of git lfs pre-push particularly for large repositories with numerous tags.

Previously, the pre-push hook would execute 'git ls-remote' with both
--heads and --tags flags, which could be slow in repositories with many
tags.

Tags can be omitted for pre-push however.
@chrisd8088
Copy link
Member

Hey, thanks very much for the patch! I'll take a look in the next few days, and consider whether we should add a test.

Our CI jobs are also not working 100% right now, so I won't kick off CI jobs for this PR yet, but I have some fixes in the pipeline and will get those up into a PR soon, so we can return our CI suite to full functionality. Once that's done, we have a small queue of PRs like yours to merge. So, stay tuned, and thanks very much for the contribution!

@pastelsky
Copy link
Author

Thanks! My understanding was that this was just removing an unused flag from a code path, which is why I didn't add a test initially. What are the assertions we could add here in your opinion?

@chrisd8088
Copy link
Member

What are the assertions we could add here in your opinion?

Honestly, I haven't had time to even think about this yet -- it's entirely possible there's nothing we need to do. I only meant to say that I try to always consider, for any PR, whether tests are something we can add or update. It was just a generic comment.

Thanks for you patience, and once $DAYJOB is slightly less intensive, I should be able to get our CI stack working again.

@chrisd8088
Copy link
Member

Just a quick update on the state of our CI suite: PR #5866 fixed the macOS jobs, PR #5868 should fix the Windows jobs, and PR #5872 will upgrade us to Go 1.23, which should fix the Build with specific Go jobs. With any luck, we'll be back to 100% success after those are merged.

Thanks very much for your patience!

@pastelsky
Copy link
Author

pastelsky commented Sep 22, 2024 via email

In a previous commit in this PR we introduced the ignoreTags argument
to the RemoteRefs() function in our "git" package, so that we could
indicate to the function whether or not it should pass the --tags option
to the "git ls-remote" command.  When the ignoreTags flag is set, we do
not pass the --tags option to the command, only the --heads option.

As a result, the "git ls-remote" command should never return tag refs
in this case, so we can now add a check and return an error from the
RemoteRefs() function if a tag ref is found in the set of refs returned
by "git ls-remote" when ignoreTags is "true".

We also remove some excess whitespace to satisfy the "go fmt" linter.
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this PR!

I realized we could go a bit further with these changes, removing some unneeded logic and adding an error check, and that we should perhaps also expand our test suite of the git lfs push command slightly to cover the case where there is a tag ref on the Git remote.

I've written up three commits in my ls-remote-skip-tags branch and I'd like to push these into this PR's branch, if that's OK with you.

We'd then solicit a second review from another member of the core team, and that should be everything we need to get this merged!

Let me know if that's all acceptable. I can also open a second PR, but I think it's easier to combine efforts here in this one, if possible. If I don't hear back in a while, I may go ahead with my plan as I outlined it; I hope that's OK.

Thanks again for the contribution!

@pastelsky
Copy link
Author

pastelsky commented Oct 4, 2024

@chrisd8088 You can add your commits to this, no issues — added you as a collaborator to my fork.

In a previous commit in this PR we revised the RemoteRefs() function
in our "git" package to take an additional ignoreTags argument, and
updated the callers of this function appropriately.  In the
calcSkippedRefs() function of our "lfs" package we now pass a "true"
value to the RemoteRefs() function, which guarantees it will only
return branch refs.

The calcSkippedRefs() function compares this set of branch refs
against the refs returned from the "git" package's CachedRemoteRefs()
function, which also consists exclusively of branch refs.  Any refs
which are in both sets are then returned as a list of refs to be
ignored.  In particular, in subsequent calls to the "git rev-list"
command, any commits reachable from the commits referenced by the
locally cached versions of these ignorable refs will be excluded.

As the RemoteRefs() function will no longer return any tags, we
can slightly simplify the calcSkippedRefs() function, since it does
not need to check for tag refs any longer.  We also rename the
actualRemoteBranchRefs variable to actualRemoteRefsSet, and revise
and expand some inline comments to clarify the code's expectations
and intent.
In previous commits in this PR we have adjusted the arguments we use
when executing the "git ls-remote" command so as to limit the refs it
returns to include only branches, in cases where we do not need to
process any tag refs.

Specifically, the "git lfs push" and "git lfs pre-push" commands invoke
the NewGitScannerForPush() function in our "lfs" package, which uses
the calcSkippedRefs() function in the same package to find the set
of remote branch refs that are also cached locally.  Any commits
reachable from the commits referenced by the locally cached versions
of these refs will then be excluded from the set of commits returned
by the "git rev-list" command when that is subsequently called by
the revListShas() function and the methods of the RevListScanner
structure in our "git" package.  No tag refs are considered during
these steps, so can avoid retrieving them with the "git ls-remote"
command.

Earlier, in commit aab142d of PR git-lfs#4102,
we added the "pre-push with force-pushed ref" test to our t/t-pre-push.sh
test suite, which checks that the "git lfs pre-push" command handles
the condition where a Git commit is referenced by a remote ref, and
is not present locally, but also happens to be one that can be ignored
because it has been replaced in the local version of the ref by a
force-push.  (The test runs the "git lfs pre-push" command indirectly,
as it is executed by the "git push" command.)

Fortunately, this test suffices to validate the changes we have made
in this PR also, at least for the "git lfs pre-push" command, because the
test creates and pushes a tag, and then updates and pushes the tag again.
As a result, it runs the "git lfs pre-push" command in a context where
a tag ref exists on the Git remote.  Prior to our changes in this PR,
the "git ls-remote" command (as performed by the RemoteRefs() function
in the "git" package, when called by the calcSkippedRefs() function)
would return the tag ref as well as the "main" branch ref.  With this
PR's changes, the "git ls-remote" command now only returns the "main"
branch ref under this test's conditions, so we have an effective test
of the "git lfs pre-push" command with a tag ref on a Git remote.

However, we do not have a test of the "git lfs push" command with a
tag ref on the Git remote, so we add such a test now.  This test is
not strictly necessary, but ensures we run the "git lfs push" command
in a context where a tag has been created and pushed to the Git remote.
Without this PR's other changes, the tag ref would then be returned by
"git ls-remote" when that was performed as part of the "git lfs push"
command's operation.  With this PR's changes, though, the tag ref will
not be returned by "git ls-remote".  The test will succeed in either
case because the tag ref is ultimately immaterial in the determination
of the refs returned by the calcSkippedRefs() function.
@chrisd8088
Copy link
Member

You can add your commits to this, no issues — added you as a collaborator to my fork.

Thank you very much! I've done so, and I'll solicit a review from another @git-lfs/core person (just so I'm not reviewing my own commits!), and we'll go from there. Thanks again!

@chrisd8088 chrisd8088 requested a review from a team October 4, 2024 17:39
Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thank you! 😊

As suggested by larsxschneider on PR review, we can avoid one
negation of the new Boolean argument we have added to the RemoteRefs()
function in our "git" package, in a prior commit in this PR, if we
invert the flag's sense and rename it from ignoreTags to withTags.

We also take the opportunity to update the code comments which describe
the action of the RemoteRefs() function so as to indicate that the
inclusion of tag refs in the returned list of refs is now optional.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fetching all refs and tags on every push can be really slow for large repositories

3 participants