Skip to content

Faster send pack for remote#1947

Merged
pjbgf merged 5 commits into
go-git:mainfrom
stiak:faster-send-pack
Apr 8, 2026
Merged

Faster send pack for remote#1947
pjbgf merged 5 commits into
go-git:mainfrom
stiak:faster-send-pack

Conversation

@stiak
Copy link
Copy Markdown
Contributor

@stiak stiak commented Apr 1, 2026

I noticed while using remote.PushContext that there was a significant delay between getting the remote refs and the push, even when pushing a relatively small branch (~12 files changed).

It looks like the current revlist.Objects() function walks a lot more than it needs to. This PR introduces a new objectWalk struct with functions that aim to improve the algorithm, though native git has other solutions which are likely faster again.

This wires the Objects and ObjectsWithRef functions to use the new path. It also removes the ObjectsWithStorageForIgnores path, which I didn't understand. Happy to bring it back if you can explain it to me.

Copilot AI review requested due to automatic review settings April 1, 2026 06:00
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to reduce push latency by avoiding a full reachable-object walk when preparing a pack for remote.PushContext, introducing a commit-diff-based object discovery path and wiring it into Remote.sendPack.

Changes:

  • Add revlist.ObjectsDiff() to compute objects to send by walking local/remote commit graphs and collecting tree diffs.
  • Switch remote.sendPack (non-local remotes) from revlist.Objects() to revlist.ObjectsDiff().
  • Add tests and benchmarks for ObjectsDiff.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
remote.go Uses revlist.ObjectsDiff during send-pack object selection for non-local endpoints.
plumbing/revlist/diff.go Implements ObjectsDiff and tree-diff-based object collection.
plumbing/revlist/diff_test.go Adds fixture-based + synthetic tests and benchmarks for ObjectsDiff.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread remote.go Outdated
Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/object_walk.go
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff_test.go Outdated
@pjbgf
Copy link
Copy Markdown
Member

pjbgf commented Apr 2, 2026

This implementation of ObjectsDiff seems to fix #1953.

Copy link
Copy Markdown
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stiak running benchmarks locally I can definitely see an improvement in time and complexity - great job! 👏

BenchmarkReachableObjects-24        	    7983	    138570 ns/op	   95315 B/op	    1774 allocs/op
BenchmarkReachableObjectsDiff-24    	   22647	     53123 ns/op	   35052 B/op	     644 allocs/op

Testing Objects vs ObjectsDiff against the upstream git:

$ go run _examples/open/main.go . refs/tags/v5.12.0 refs/remotes/origin/mainlocal:  refs/tags/v5.12.0 (302dddeda962e4bb3477a8e4080bc6f5a253e2bb)
remote: refs/remotes/origin/main (5c776e15a11be5d2dfc6df27692bf867a69c37d8)
git rev-list refs/tags/v5.12.0 --objects --count: 14869
revlist.Objects ref1
  objects: 14869  time: 487.578525ms  alloc: 9 MB  total-alloc: 159 MB  gc-cycles: 34
revlist.ObjectsDiff ref1
  objects: 14740  time: 444.275267ms  alloc: 9 MB  total-alloc: 255 MB  gc-cycles: 26

git rev-list refs/tags/v5.12.0..refs/remotes/origin/main --objects --count: 0
revlist.Objects ref1..ref2
  objects: 0  time: 399.524922ms  alloc: 14 MB  total-alloc: 217 MB  gc-cycles: 17
revlist.ObjectsDiff ref1..ref2
  objects: 0  time: 35.240541ms  alloc: 15 MB  total-alloc: 19 MB  gc-cycles: 2

The number of objects returned in ObjectsDiff is slightly off: both Objects and git returned 14869, ObjectsDiff returned 14740.

Comparing with a target that has a delta, shows that we are off by 1 in terms of number of objects - which is less of a problem but it would be good to understand why we have one additional object in this case:

$ go run _examples/open/main.go . refs/remotes/origin/tags/v5.13.0 refs/tags/v5.12.0
local:  refs/remotes/origin/tags/v5.13.0 (94bd4af1deb15a64e90c6287eaf9e9f09b192a1f)
remote: refs/tags/v5.12.0 (302dddeda962e4bb3477a8e4080bc6f5a253e2bb)
git rev-list refs/remotes/origin/tags/v5.13.0 --objects --count: 15607
revlist.Objects ref1
  objects: 15607  time: 576.640542ms  alloc: 10 MB  total-alloc: 171 MB  gc-cycles: 36
revlist.ObjectsDiff ref1
  objects: 15476  time: 487.676894ms  alloc: 10 MB  total-alloc: 275 MB  gc-cycles: 27

git rev-list refs/remotes/origin/tags/v5.13.0..refs/tags/v5.12.0 --objects --count: 738
revlist.Objects ref1..ref2
  objects: 738  time: 308.615632ms  alloc: 10 MB  total-alloc: 146 MB  gc-cycles: 13
revlist.ObjectsDiff ref1..ref2
  objects: 739  time: 45.168414ms  alloc: 10 MB  total-alloc: 25 MB  gc-cycles: 3

The memory consumption is a lot better (~10x depending on repo) on the hot path, so it should be OK to take the ~50% worse on the less likely. Although we could probably find easy optimisations for it at a later point.

The code used for the tests above:

package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strings"
	"time"

	"github.com/go-git/go-git/v6"
	. "github.com/go-git/go-git/v6/_examples"
	"github.com/go-git/go-git/v6/plumbing"
	"github.com/go-git/go-git/v6/plumbing/revlist"
)

func main() {
	CheckArgs("<path> <ref1> <ref2>")
	path := os.Args[1]
	ref1 := os.Args[2]
	ref2 := os.Args[3]

	r, err := git.PlainOpen(path)
	CheckIfError(err)

	local, err := r.Reference(plumbing.ReferenceName(ref1), true)
	CheckIfError(err)

	remote, err := r.Reference(plumbing.ReferenceName(ref2), true)
	CheckIfError(err)

	Info("local:  %s (%s)", local.Name(), local.Hash())
	Info("remote: %s (%s)", remote.Name(), remote.Hash())

	localHashes := []plumbing.Hash{local.Hash()}
	remoteHashes := []plumbing.Hash{remote.Hash()}

	// All objects reachable from ref1.
	Info("git rev-list %s --objects --count: %s", ref1, gitRevListCount(path, ref1))

	bench("revlist.Objects ref1", func() ([]plumbing.Hash, error) {
		return revlist.Objects(r.Storer, localHashes, nil)
	})

	bench("revlist.ObjectsDiff ref1", func() ([]plumbing.Hash, error) {
		return revlist.ObjectsDiff(r.Storer, localHashes, nil)
	})

	// Objects reachable from ref1 but not ref2.
	Info("\ngit rev-list %s..%s --objects --count: %s", ref1, ref2, gitRevListCount(path, ref1, "^"+ref2))

	bench("revlist.Objects ref1..ref2", func() ([]plumbing.Hash, error) {
		return revlist.Objects(r.Storer, localHashes, remoteHashes)
	})

	bench("revlist.ObjectsDiff ref1..ref2", func() ([]plumbing.Hash, error) {
		return revlist.ObjectsDiff(r.Storer, localHashes, remoteHashes)
	})
}

func bench(label string, fn func() ([]plumbing.Hash, error)) {
	runtime.GC()
	var before runtime.MemStats
	runtime.ReadMemStats(&before)

	t0 := time.Now()
	hashes, err := fn()
	elapsed := time.Since(t0)
	CheckIfError(err)

	runtime.GC()
	var after runtime.MemStats
	runtime.ReadMemStats(&after)

	Info("%s", label)
	fmt.Printf("  objects: %d  time: %v  alloc: %d MB  total-alloc: %d MB  gc-cycles: %d\n",
		len(hashes),
		elapsed,
		after.Alloc/1024/1024,
		(after.TotalAlloc-before.TotalAlloc)/1024/1024,
		after.NumGC-before.NumGC,
	)
}

func gitRevListCount(path string, refs ...string) string {
	args := append([]string{"-C", path, "rev-list", "--objects", "--count"}, refs...)
	out, err := exec.Command("git", args...).Output()
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return strings.TrimSpace(string(out))
}

Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff.go Outdated
Comment thread plumbing/revlist/diff_test.go Outdated
Comment thread plumbing/revlist/diff_test.go Outdated
Comment thread plumbing/revlist/diff_test.go Outdated
@stiak
Copy link
Copy Markdown
Contributor Author

stiak commented Apr 7, 2026

@pjbgf I think I have the correctness issues solved now.

  • The missing objects were due to an early return, meaning we didn't evaluate the diff again if we'd seen the tree before. Early return was removed.
  • The extra objects were due to objects being re-introduced by the commits between haves and wants. Solved by capturing "seen" objects at the tip before starting the commit walk.

I'll look into the benchmarking and broader replacement next.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread plumbing/revlist/object_walk.go Outdated
Comment thread plumbing/revlist/object_walk.go
Comment thread plumbing/revlist/revlist_test.go
Comment thread plumbing/revlist/revlist.go
@stiak stiak force-pushed the faster-send-pack branch 3 times, most recently from c6c2eab to 1696357 Compare April 8, 2026 00:08
@stiak stiak requested a review from Copilot April 8, 2026 00:08
@stiak
Copy link
Copy Markdown
Contributor Author

stiak commented Apr 8, 2026

Latest timing output, also including the changes from https://github.com/squishykid/go-git/tree/faster-send-pack as ObjectsTopo. This PR implementation is Objects and current main branch is ObjectsOld:

     local:  refs/tags/v5.13.0 (94bd4af1deb15a64e90c6287eaf9e9f09b192a1f)
     remote: refs/tags/v5.12.0 (302dddeda962e4bb3477a8e4080bc6f5a253e2bb)
     git rev-list refs/tags/v5.13.0 --objects --count: 15607
     revlist.ObjectsOld ref1
       objects: 15607  time: 557.486125ms  alloc: 10 MB  total-alloc: 169 MB  gc-cycles: 38
     revlist.Objects ref1
       objects: 15607  time: 391.296958ms  alloc: 9 MB  total-alloc: 135 MB  gc-cycles: 15
     revlist.ObjectsTopo ref1
       objects: 15607  time: 566.089792ms  alloc: 9 MB  total-alloc: 197 MB  gc-cycles: 15

     git rev-list refs/tags/v5.13.0..refs/tags/v5.12.0 --objects --count: 738
     revlist.ObjectsOld ref1..ref2
       objects: 738  time: 401.096708ms  alloc: 9 MB  total-alloc: 145 MB  gc-cycles: 14
     revlist.Objects ref1..ref2
       objects: 738  time: 82.081542ms  alloc: 9 MB  total-alloc: 30 MB  gc-cycles: 4
     revlist.ObjectsTopo ref1..ref2
       objects: 738  time: 52.568834ms  alloc: 9 MB  total-alloc: 18 MB  gc-cycles: 3

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread plumbing/revlist/revlist.go Outdated
Comment thread plumbing/revlist/object_walk.go
Comment thread plumbing/revlist/object_walk.go Outdated
Comment thread plumbing/revlist/object_walk.go
Comment thread plumbing/revlist/object_walk.go Outdated
Comment thread remote.go
Replace the legacy commit/tree walker in revlist.Objects with a new
implementation that uses an interleaved wants/haves commit walk with
per-commit tree diffing to compute the minimal set of objects to send.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@stiak stiak force-pushed the faster-send-pack branch from 1696357 to 0294b39 Compare April 8, 2026 00:25
pjbgf added 4 commits April 8, 2026 13:46
Objects was returning "object not found" when walking commits in a
shallow repository because it tried to fetch parents beyond the
shallow boundary. Load the shallow set at walk construction and
skip parent traversal for shallow commits.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
The interleaved timestamp-ordered walk could miss common ancestors when
commit times were skewed (e.g. base commit newer than both children).
Drain all haves commits first to fully establish the seen boundary,
then walk wants. This makes the result purely graph-based rather than
depending on committer timestamps.

Also removes insertSorted since walk order no longer matters, and
renames ObjectsWithRef parameters to wants/haves for consistency.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Align haves expansion with upstream Git's pack-objects behaviour:
walk all haves ancestor commits and mark their trees/blobs as seen
so that the wants diff walk never includes objects the remote already
has. Previously only the tip haves trees were marked, which could
re-send blobs reachable from haves ancestry.

Also surface errors from haves parent lookups and want parent tree
lookups instead of silently swallowing them. Upstream Git treats
missing objects in a non-shallow, non-promisor repo as hard errors;
promisor support will be handled separately in a future change.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
Introduce an objectWalker interface so that storers (e.g. those backed
by a native Git ODB) can supply their own revlist implementation.
Objects delegates to RevListObjects when the storer implements it,
falling back to the built-in walk otherwise.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>
@pjbgf
Copy link
Copy Markdown
Member

pjbgf commented Apr 8, 2026

I've tested this and ended up adding a few commits to better align behaviour with upstream.

Existing problems:

Fixes to the previous commit:

  • Remove assumptions on having previous commits based on a newer commit date.

Additionally, object storers are able to completely override the object walker by adding a func RevListObjects(wants, haves []plumbing.Hash) ([]plumbing.Hash, error). In the future this could be exposed as its own public interface.

Comment thread remote.go
} else {
hashesToPush, err = revlist.Objects(r.s, objects, haves)
}
hashesToPush, err = revlist.Objects(r.s, objects, haves)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference: the performance gain on a local-specific approach wasn't worth the additional complexity.

Copy link
Copy Markdown
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stiak thanks for working on this. 🙇

@pjbgf pjbgf merged commit ba8e2c6 into go-git:main Apr 8, 2026
23 of 24 checks passed
@stiak stiak mentioned this pull request Apr 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants