Skip to content

demo: mmap'ed reads#2153

Open
hanwen wants to merge 1 commit into
go-git:mainfrom
hanwen:push-mpwkuzvpyrsl
Open

demo: mmap'ed reads#2153
hanwen wants to merge 1 commit into
go-git:mainfrom
hanwen:push-mpwkuzvpyrsl

Conversation

@hanwen

@hanwen hanwen commented May 23, 2026

Copy link
Copy Markdown

before:

time /tmp/merge-base ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9 a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real 0m10.530s
user 0m8.033s
sys 0m3.195s

after:

$ time /tmp/merge-base ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9 a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real 0m6.716s
user 0m7.314s
sys 0m0.161s

this change, but osfs.New() with KeepDescriptors: true

$ time /tmp/merge-base ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real 0m11.368s
user 0m8.909s
sys 0m3.257s

CGit:

time git --git-dir ~/vc/linux/.git/ merge-base refs/tags/v6.9 refs/tags/v7.0 a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real 0m2.948s
user 0m2.701s
sys 0m0.230s

After packhandle cache (#2132)

time /tmp/merge-base ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real 0m10.596s
user 0m9.272s
sys 0m2.489s

before:

time /tmp/merge-base  ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real	0m10.530s
user	0m8.033s
sys	0m3.195s

after:

$ time /tmp/merge-base  ~/vc/linux/.git refs/tags/v7.0 refs/tags/v6.9
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real	0m6.716s
user	0m7.314s
sys	0m0.161s

CGit:

time git  --git-dir ~/vc/linux/.git/ merge-base refs/tags/v6.9 refs/tags/v7.0
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real	0m2.948s
user	0m2.701s
sys	0m0.230s
@hanwen

hanwen commented May 24, 2026

Copy link
Copy Markdown
Author

merge_base @ current main:

[pid 505664] pread64(19, "PACK\0\0\0\2\0\1\332M\221\230\2x\234\255X[W\3436\27}\367\2578k^:S"..., 4096, 0) = 4096
[pid 505664] pread64(19, "\257\312\357|_v\313z\323}U\5\347\177\364\252jj\354\273", 4096, 152076825) = 20
[pid 505664] pread64(19, "", 4076, 152076845) = 0
[pid 505664] pread64(18, "\0\0s/", 4, 242852) = 4
[pid 505664] pread64(17, "\1^\333\\", 4, 3033084) = 4
[pid 505664] pread64(18, "\0\0\2527", 4, 121432) = 4
[pid 505664] pread64(17, "\1\3\314\317", 4, 3089436) = 4
[pid 505664] pread64(18, "\0\1z\231", 4, 60720) = 4
[pid 505664] pread64(17, "\0\266\203D", 4, 3302820) = 4
[pid 505664] pread64(18, "\0\0\224~", 4, 30364) = 4
[pid 505664] pread64(17, "\0J\30\315", 4, 3067192) = 4
[pid 505664] pread64(18, "\0\1>\16", 4, 15188) = 4
[pid 505664] pread64(17, "\0(\337\22", 4, 3240824) = 4
[pid 505664] pread64(18, "\0\0\311\241", 4, 22776) = 4
[pid 505664] pread64(17, "\09>Y", 4, 3121604) = 4

@hanwen

hanwen commented May 24, 2026

Copy link
Copy Markdown
Author

There is still space for optimization. Go-git should be able to reach JGit's speed:

hanwen@fedora:~/vc/jgit$ time bazel-bin/org.eclipse.jgit.pgm/jgit --git-dir ~/vc/linux/.git merge-base  refs/tags/v7.0 refs/tags/v6.9
SLF4J(W): No SLF4J providers were found.
SLF4J(W): Defaulting to no-operation (NOP) logger implementation
SLF4J(W): See https://www.slf4j.org/codes.html#noProviders for further details.
a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6

real	0m4.003s
user	0m6.880s
sys	0m0.897s

hiddeco added a commit to hiddeco/go-git that referenced this pull request May 25, 2026
Materialising an `FSObject` returned by `EncodedObject` previously
fell back to a direct `o.fs.Open(packPath)` on every `Reader()`
call, bypassing the per-pack file-descriptor pool introduced in
`billy.Filesystem` that pays an mmap/munmap on each Open, the
regression amplified to ~400x more pack opens over 500 reads on
the basic fixture[1].

The cure threads the dotgit-cached `*packhandle.PackHandle` into
the `*packfile.Packfile` via a new `WithPackHandle` option taking
a `PackHandleResolver`. `objectFromHeader` captures the resolver
in the `FSObject.acquireRandom` closure, so every read acquires
a fresh cursor against the pool-owned `SharedFile`. The resolver
re-runs on every `Reader` call, which lets cached FSObjects
survive `cleanPackList` invalidations such as the one
`NewObjectPack` triggers before a repack.

The internal `*packhandle.PackHandle` stays internal — the
public `packfile.PackHandle` interface returns `io.ReadSeekCloser`
and `RandomReader`, and `DotGit.PackHandle` returns the interface
through an unexported adapter so callers can't accidentally call
`Close()` on a catalog-owned handle. `Packfile.Close` joins the
scanner-close and owned-handle close errors via `errors.Join`.

[1]: go-git#2153 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 25, 2026
Materialising an `FSObject` returned by `EncodedObject` previously
fell back to a direct `o.fs.Open(packPath)` on every `Reader()`
call, bypassing the per-pack file-descriptor pool introduced in
`billy.Filesystem` that pays an mmap/munmap on each Open, the
regression amplified to ~400x more pack opens over 500 reads on
the basic fixture[1].

The cure threads the dotgit-cached `*packhandle.PackHandle` into
the `*packfile.Packfile` via a new `WithPackHandle` option taking
a `PackHandleResolver`. `objectFromHeader` captures the resolver
in the `FSObject.acquireRandom` closure, so every read acquires
a fresh cursor against the pool-owned `SharedFile`. The resolver
re-runs on every `Reader` call, which lets cached FSObjects
survive `cleanPackList` invalidations such as the one
`NewObjectPack` triggers before a repack.

The internal `*packhandle.PackHandle` stays internal — the
public `packfile.PackHandle` interface returns `io.ReadSeekCloser`
and `RandomReader`, and `DotGit.PackHandle` returns the interface
through an unexported adapter so callers can't accidentally call
`Close()` on a catalog-owned handle. `Packfile.Close` joins the
scanner-close and owned-handle close errors via `errors.Join`.

[1]: go-git#2153 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
@cedric-appdirect

Copy link
Copy Markdown
Contributor

Have you looked at https://github.com/go-git/go-git/tree/main/storage/filesystem/mmap? Also mmap is tricky, have a read of https://valyala.medium.com/mmap-in-go-considered-harmful-d92a25cb161d , but basically using mmap in a server process will lead to unpredictable latency . mmap should be optional to not create regression on services that use go-git today. mmap is useful for local and desktop application especially in multi process case. The right approach for go-git will be to look into using io.ReaderAt instead, in my opinion.

@hanwen

hanwen commented May 25, 2026

Copy link
Copy Markdown
Author

https://github.com/go-git/go-git/tree/main/storage/filesystem/mmap?

This looks somewhat suspect, because NewPackScanner is tied to packs. Reftables could be served out of memory backed storage too, and does not depend on packs.

basically using mmap in a server process will lead to unpredictable latency .

Point taken, but I'm interested in a locally running process. Also, in the case that the data does fit in RAM, the alternative to mmap'ing data is to load the data into explicitly, which will cause delays on program startup.

The right approach for go-git will be to look into using io.ReaderAt instead, in my opinion.

IMO, the go-git FS abstraction lacks something BlockSource (see https://github.com/hanwen/reftable/blob/master/api.go#L16). ReaderAt has the disadvantage that the caller must allocate a buffer, and all ReaderAt implementations must copy over the bytes. So it's always slower than an API that just given you the bytes directly.

hiddeco added a commit to hiddeco/go-git that referenced this pull request May 26, 2026
Materialising an `FSObject` returned by `EncodedObject` previously
fell back to a direct `o.fs.Open(packPath)` on every `Reader()`
call, bypassing the per-pack file-descriptor pool introduced in
`billy.Filesystem` that pays an mmap/munmap on each Open, the
regression amplified to ~400x more pack opens over 500 reads on
the basic fixture[1].

The cure threads the dotgit-cached `*packhandle.PackHandle` into
the `*packfile.Packfile` via a new `WithPackHandle` option taking
a `PackHandleResolver`. `objectFromHeader` captures the resolver
in the `FSObject.acquireRandom` closure, so every read acquires
a fresh cursor against the pool-owned `SharedFile`. The resolver
re-runs on every `Reader` call, which lets cached FSObjects
survive resets of dotgit's pack-handle store such as the one
`NewObjectPack` triggers before a repack.

The internal `*packhandle.PackHandle` stays internal — the
public `packfile.PackHandle` interface returns `io.ReadSeekCloser`
and `RandomReader`, and `DotGit.PackHandle` returns the interface
through an unexported adapter so callers can't accidentally call
`Close()` on a dotgit-owned handle. `Packfile.Close` releases
only the scanner cursor; the resolver-owned handle outlives the
Packfile.

[1]: go-git#2153 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 26, 2026
Materialising an `FSObject` returned by `EncodedObject` previously
fell back to a direct `o.fs.Open(packPath)` on every `Reader()`
call, bypassing the per-pack file-descriptor pool introduced in
`billy.Filesystem` that pays an mmap/munmap on each Open, the
regression amplified to ~400x more pack opens over 500 reads on
the basic fixture[1].

The cure threads the dotgit-cached `*packhandle.PackHandle` into
the `*packfile.Packfile` via a new `WithPackHandle` option taking
a `PackHandleResolver`. `objectFromHeader` captures the resolver
in the `FSObject.acquireRandom` closure, so every read acquires
a fresh cursor against the pool-owned `SharedFile`. The resolver
re-runs on every `Reader` call, which lets cached FSObjects
survive resets of dotgit's pack-handle store such as the one
`NewObjectPack` triggers before a repack.

The internal `*packhandle.PackHandle` stays internal — the
public `packfile.PackHandle` interface returns `io.ReadSeekCloser`
and `RandomReader`, and `DotGit.PackHandle` returns the interface
through an unexported adapter so callers can't accidentally call
`Close()` on a dotgit-owned handle. `Packfile.Close` releases
only the scanner cursor; the resolver-owned handle outlives the
Packfile.

[1]: go-git#2153 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 29, 2026
Materialising an `FSObject` returned by `EncodedObject` previously
fell back to a direct `o.fs.Open(packPath)` on every `Reader()`
call, bypassing the per-pack file-descriptor pool introduced in
`billy.Filesystem` that pays an mmap/munmap on each Open, the
regression amplified to ~400x more pack opens over 500 reads on
the basic fixture[1].

The cure threads the dotgit-cached `*packhandle.PackHandle` into
the `*packfile.Packfile` via a new `WithPackHandle` option taking
a `PackHandleResolver`. `objectFromHeader` captures the resolver
in the `FSObject.acquireRandom` closure, so every read acquires
a fresh cursor against the pool-owned `SharedFile`. The resolver
re-runs on every `Reader` call, which lets cached FSObjects
survive resets of dotgit's pack-handle store such as the one
`NewObjectPack` triggers before a repack.

The internal `*packhandle.PackHandle` stays internal — the
public `packfile.PackHandle` interface returns `io.ReadSeekCloser`
and `RandomReader`, and `DotGit.PackHandle` returns the interface
through an unexported adapter so callers can't accidentally call
`Close()` on a dotgit-owned handle. `Packfile.Close` releases
only the scanner cursor; the resolver-owned handle outlives the
Packfile.

[1]: go-git#2153 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants