Faster send pack for remote#1947
Conversation
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce push latency by avoiding a full reachable-object walk when preparing a pack for remote.PushContext, introducing a commit-diff-based object discovery path and wiring it into Remote.sendPack.
Changes:
- Add
revlist.ObjectsDiff()to compute objects to send by walking local/remote commit graphs and collecting tree diffs. - Switch
remote.sendPack(non-local remotes) fromrevlist.Objects()torevlist.ObjectsDiff(). - Add tests and benchmarks for
ObjectsDiff.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| remote.go | Uses revlist.ObjectsDiff during send-pack object selection for non-local endpoints. |
| plumbing/revlist/diff.go | Implements ObjectsDiff and tree-diff-based object collection. |
| plumbing/revlist/diff_test.go | Adds fixture-based + synthetic tests and benchmarks for ObjectsDiff. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
This implementation of |
There was a problem hiding this comment.
@stiak running benchmarks locally I can definitely see an improvement in time and complexity - great job! 👏
BenchmarkReachableObjects-24 7983 138570 ns/op 95315 B/op 1774 allocs/op
BenchmarkReachableObjectsDiff-24 22647 53123 ns/op 35052 B/op 644 allocs/op
Testing Objects vs ObjectsDiff against the upstream git:
$ go run _examples/open/main.go . refs/tags/v5.12.0 refs/remotes/origin/mainlocal: refs/tags/v5.12.0 (302dddeda962e4bb3477a8e4080bc6f5a253e2bb)
remote: refs/remotes/origin/main (5c776e15a11be5d2dfc6df27692bf867a69c37d8)
git rev-list refs/tags/v5.12.0 --objects --count: 14869
revlist.Objects ref1
objects: 14869 time: 487.578525ms alloc: 9 MB total-alloc: 159 MB gc-cycles: 34
revlist.ObjectsDiff ref1
objects: 14740 time: 444.275267ms alloc: 9 MB total-alloc: 255 MB gc-cycles: 26
git rev-list refs/tags/v5.12.0..refs/remotes/origin/main --objects --count: 0
revlist.Objects ref1..ref2
objects: 0 time: 399.524922ms alloc: 14 MB total-alloc: 217 MB gc-cycles: 17
revlist.ObjectsDiff ref1..ref2
objects: 0 time: 35.240541ms alloc: 15 MB total-alloc: 19 MB gc-cycles: 2
The number of objects returned in ObjectsDiff is slightly off: both Objects and git returned 14869, ObjectsDiff returned 14740.
Comparing with a target that has a delta, shows that we are off by 1 in terms of number of objects - which is less of a problem but it would be good to understand why we have one additional object in this case:
$ go run _examples/open/main.go . refs/remotes/origin/tags/v5.13.0 refs/tags/v5.12.0
local: refs/remotes/origin/tags/v5.13.0 (94bd4af1deb15a64e90c6287eaf9e9f09b192a1f)
remote: refs/tags/v5.12.0 (302dddeda962e4bb3477a8e4080bc6f5a253e2bb)
git rev-list refs/remotes/origin/tags/v5.13.0 --objects --count: 15607
revlist.Objects ref1
objects: 15607 time: 576.640542ms alloc: 10 MB total-alloc: 171 MB gc-cycles: 36
revlist.ObjectsDiff ref1
objects: 15476 time: 487.676894ms alloc: 10 MB total-alloc: 275 MB gc-cycles: 27
git rev-list refs/remotes/origin/tags/v5.13.0..refs/tags/v5.12.0 --objects --count: 738
revlist.Objects ref1..ref2
objects: 738 time: 308.615632ms alloc: 10 MB total-alloc: 146 MB gc-cycles: 13
revlist.ObjectsDiff ref1..ref2
objects: 739 time: 45.168414ms alloc: 10 MB total-alloc: 25 MB gc-cycles: 3
The memory consumption is a lot better (~10x depending on repo) on the hot path, so it should be OK to take the ~50% worse on the less likely. Although we could probably find easy optimisations for it at a later point.
The code used for the tests above:
package main
import (
"fmt"
"os"
"os/exec"
"runtime"
"strings"
"time"
"github.com/go-git/go-git/v6"
. "github.com/go-git/go-git/v6/_examples"
"github.com/go-git/go-git/v6/plumbing"
"github.com/go-git/go-git/v6/plumbing/revlist"
)
func main() {
CheckArgs("<path> <ref1> <ref2>")
path := os.Args[1]
ref1 := os.Args[2]
ref2 := os.Args[3]
r, err := git.PlainOpen(path)
CheckIfError(err)
local, err := r.Reference(plumbing.ReferenceName(ref1), true)
CheckIfError(err)
remote, err := r.Reference(plumbing.ReferenceName(ref2), true)
CheckIfError(err)
Info("local: %s (%s)", local.Name(), local.Hash())
Info("remote: %s (%s)", remote.Name(), remote.Hash())
localHashes := []plumbing.Hash{local.Hash()}
remoteHashes := []plumbing.Hash{remote.Hash()}
// All objects reachable from ref1.
Info("git rev-list %s --objects --count: %s", ref1, gitRevListCount(path, ref1))
bench("revlist.Objects ref1", func() ([]plumbing.Hash, error) {
return revlist.Objects(r.Storer, localHashes, nil)
})
bench("revlist.ObjectsDiff ref1", func() ([]plumbing.Hash, error) {
return revlist.ObjectsDiff(r.Storer, localHashes, nil)
})
// Objects reachable from ref1 but not ref2.
Info("\ngit rev-list %s..%s --objects --count: %s", ref1, ref2, gitRevListCount(path, ref1, "^"+ref2))
bench("revlist.Objects ref1..ref2", func() ([]plumbing.Hash, error) {
return revlist.Objects(r.Storer, localHashes, remoteHashes)
})
bench("revlist.ObjectsDiff ref1..ref2", func() ([]plumbing.Hash, error) {
return revlist.ObjectsDiff(r.Storer, localHashes, remoteHashes)
})
}
func bench(label string, fn func() ([]plumbing.Hash, error)) {
runtime.GC()
var before runtime.MemStats
runtime.ReadMemStats(&before)
t0 := time.Now()
hashes, err := fn()
elapsed := time.Since(t0)
CheckIfError(err)
runtime.GC()
var after runtime.MemStats
runtime.ReadMemStats(&after)
Info("%s", label)
fmt.Printf(" objects: %d time: %v alloc: %d MB total-alloc: %d MB gc-cycles: %d\n",
len(hashes),
elapsed,
after.Alloc/1024/1024,
(after.TotalAlloc-before.TotalAlloc)/1024/1024,
after.NumGC-before.NumGC,
)
}
func gitRevListCount(path string, refs ...string) string {
args := append([]string{"-C", path, "rev-list", "--objects", "--count"}, refs...)
out, err := exec.Command("git", args...).Output()
if err != nil {
return fmt.Sprintf("error: %v", err)
}
return strings.TrimSpace(string(out))
}|
@pjbgf I think I have the correctness issues solved now.
I'll look into the benchmarking and broader replacement next. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
c6c2eab to
1696357
Compare
|
Latest timing output, also including the changes from https://github.com/squishykid/go-git/tree/faster-send-pack as |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Replace the legacy commit/tree walker in revlist.Objects with a new implementation that uses an interleaved wants/haves commit walk with per-commit tree diffing to compute the minimal set of objects to send. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Objects was returning "object not found" when walking commits in a shallow repository because it tried to fetch parents beyond the shallow boundary. Load the shallow set at walk construction and skip parent traversal for shallow commits. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]>
The interleaved timestamp-ordered walk could miss common ancestors when commit times were skewed (e.g. base commit newer than both children). Drain all haves commits first to fully establish the seen boundary, then walk wants. This makes the result purely graph-based rather than depending on committer timestamps. Also removes insertSorted since walk order no longer matters, and renames ObjectsWithRef parameters to wants/haves for consistency. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]>
Align haves expansion with upstream Git's pack-objects behaviour: walk all haves ancestor commits and mark their trees/blobs as seen so that the wants diff walk never includes objects the remote already has. Previously only the tip haves trees were marked, which could re-send blobs reachable from haves ancestry. Also surface errors from haves parent lookups and want parent tree lookups instead of silently swallowing them. Upstream Git treats missing objects in a non-shallow, non-promisor repo as hard errors; promisor support will be handled separately in a future change. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]>
Introduce an objectWalker interface so that storers (e.g. those backed by a native Git ODB) can supply their own revlist implementation. Objects delegates to RevListObjects when the storer implements it, falling back to the built-in walk otherwise. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]>
|
I've tested this and ended up adding a few commits to better align behaviour with upstream. Existing problems:
Fixes to the previous commit:
Additionally, object storers are able to completely override the object walker by adding a func |
| } else { | ||
| hashesToPush, err = revlist.Objects(r.s, objects, haves) | ||
| } | ||
| hashesToPush, err = revlist.Objects(r.s, objects, haves) |
There was a problem hiding this comment.
For future reference: the performance gain on a local-specific approach wasn't worth the additional complexity.
I noticed while using
remote.PushContextthat there was a significant delay between getting the remote refs and the push, even when pushing a relatively small branch (~12 files changed).It looks like the current
revlist.Objects()function walks a lot more than it needs to. This PR introduces a newobjectWalkstruct with functions that aim to improve the algorithm, though native git has other solutions which are likely faster again.This wires the
ObjectsandObjectsWithReffunctions to use the new path. It also removes theObjectsWithStorageForIgnorespath, which I didn't understand. Happy to bring it back if you can explain it to me.