Skip to content

storage: pool .pack file descriptors via internal/packhandle#2132

Merged
pjbgf merged 19 commits into
go-git:mainfrom
hiddeco:release-desc-packhandle
May 21, 2026
Merged

storage: pool .pack file descriptors via internal/packhandle#2132
pjbgf merged 19 commits into
go-git:mainfrom
hiddeco:release-desc-packhandle

Conversation

@hiddeco

@hiddeco hiddeco commented May 16, 2026

Copy link
Copy Markdown
Member

Introduces internal/packhandle, a per-pack file-descriptor pool that opens lazily, shares one descriptor across concurrent readers, and grace-closes on idle. storage/filesystem routes its read paths through the pool via a new DotGit.OpenPackForReading.

KeepDescriptors and MaxOpenDescriptors become silently honoured no-ops.

Performance

BenchmarkObjectStorage_PackHandle, basic fixture, Apple M4 Pro, darwin/arm64, n=5, benchstat vs origin/main:

sub-benchmark main branch Δ
EncodedObject/G=1/fd 44.83 µs 4.91 µs −89.06%
EncodedObject/G=NumCPU/fd 42.12 µs 16.03 µs −61.95%
EncodedObject/G=1/mmap 46.31 µs 1.47 µs −96.82%
EncodedObject/G=NumCPU/mmap 44.70 µs 1.70 µs −96.20%
HasEncodedObject/G=1/fd 21.12 µs 20.72 µs −1.89%
HasEncodedObject/G=NumCPU/fd 19.45 µs 19.87 µs +2.19%
HasEncodedObject/G=1/mmap 20.34 µs 20.31 µs ~
HasEncodedObject/G=NumCPU/mmap 20.02 µs 20.05 µs ~
geomean 29.99 µs 8.71 µs −70.97%

Comment thread tests/pack/common_test.go Outdated
// registers cleanup that closes all three.
func openFixtureTriple(t *testing.T, f fixtureTriple) (pack, idx, rev billy.File) {
t.Helper()
pack, err := f.Packfile()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this pattern solidifies, we can update go-git-fixtures so that we return the triple tuple in one go as well.

Comment thread storage/filesystem/dotgit/dotgit.go Outdated
@hiddeco hiddeco force-pushed the release-desc-packhandle branch 2 times, most recently from 714268f to d921e28 Compare May 20, 2026 11:34
@hiddeco hiddeco changed the title plumbing: introduce packhandle, refcounted pack-file FD lifecycle storage: pool .pack file descriptors via internal/packhandle May 20, 2026
@hiddeco hiddeco force-pushed the release-desc-packhandle branch 2 times, most recently from d9995a8 to 10d5ca7 Compare May 20, 2026 12:22
go-billy upstream main (commit 0095b06[1]) documents
billy.File.ReadAt as concurrency-safe and introduces
osfs.WithMmap as an opt-in option on read-only opens.
The packhandle redesign depends on both: concurrent ReadAt
is the substrate cursorReader builds on, and WithMmap is
the consumer-side enablement path for the mmap-backed
read pool.

[1]: go-git/go-billy@0095b06

Assisted-by: Claude Opus 4.7 (1M context) <[email protected]>
Signed-off-by: Hidde Beydals <[email protected]>
@hiddeco hiddeco force-pushed the release-desc-packhandle branch 4 times, most recently from c2f7803 to 0617e06 Compare May 20, 2026 23:22
@hiddeco hiddeco marked this pull request as ready for review May 21, 2026 00:00
@hiddeco hiddeco force-pushed the release-desc-packhandle branch 2 times, most recently from 5060afd to 4a91415 Compare May 21, 2026 10:49
hiddeco added 13 commits May 21, 2026 12:59
Establish the package with its module-private invariant in
`doc.go` (no identifier from `internal/packhandle` may appear
on the exported surface of any package outside `internal/`)
and the three sentinel errors callers match via `errors.Is`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`Source` describes how to obtain one file of a pack triple:
an `Open` closure returning `ReadAtCloser` (the narrow
`io.ReaderAt` + `io.ReadCloser` shape, structurally
identical to `idxfile.ReadAtCloser`) and a `Size` closure.
The narrow interface keeps in-memory wrappers trivial — the
`bytes.Reader` fallback used by dotgit when a `.rev` file is
absent on disk is a five-line wrapper rather than a full
`billy.File` stub matrix.

`Sources` bundles the three triple files: `Pack` is required,
`Idx` and `Rev` are optional. Field-named struct literals
stay forward-compatible as future formats (`.bitmap`,
`.midx`) are added.

`PathSource` is the standard-case constructor against a
`billy.Basic`; it works equally for fd-backed and mmap-backed
osfs instances because the mode is selected by the filesystem
at `Open` time, not by `PathSource`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`sharedFile` is the FD-pool primitive: `Acquire` opens the
underlying `ReadAtCloser` if needed and bumps a refcount,
`Release` decrements and schedules a grace-period close via
`time.AfterFunc` when refs reach zero, and `Close` is
terminal and synchronous (stops pending timers, closes the
open FD before returning).

The grace-period mechanism is what gives this PR Windows-safe
delete semantics: dotgit operations that delete or rename a
pack succeed even while another goroutine holds a reader,
because the actual close is deferred until refs reach zero
and the grace window elapses.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`cursorReader` is the concrete reader returned by both
`OpenPackReader` (as `PackReader`) and `OpenRandomReader`
(as `RandomReader`). Each cursor holds an `Acquire` on the
underlying `sharedFile` and its own offset; concurrent
cursors over the same `PackHandle` have independent offsets
and do not serialize on a shared seek position. `ReadAt` is
concurrent-safe via the upstream go-billy contract — the
substrate that lets `PackHandle` scale concurrent readers
near-linearly up to NumCPU.

The `PackReader` and `RandomReader` interfaces are declared
inline here; they are the minimal contracts the two open
methods need.

Assisted-by: Claude Sonnet 4.6
Signed-off-by: Hidde Beydals <[email protected]>
Move the `PackReader` and `RandomReader` interface
declarations out of `cursor_reader.go` into their own
`pack_reader.go` now that the package surface refers to
them externally. Pure structural reshuffle; no behaviour
change.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`PackHandle` is the facade tying `Sources`, the `sharedFile`
FD owner, and per-consumer `cursorReader`s into one type.
`New` validates that the `Pack` source's `Open` and `Size`
are wired and that the supplied `packHash` is non-zero;
`OpenPackReader` and `OpenRandomReader` hand out independent
cursors backed by one refcounted FD. `Close` is idempotent
through `sync.OnceValue` and sets the closed flag before
releasing any FDs so a concurrent retry path bails with
`fs.ErrClosed` rather than reopening idx/rev FDs against a
torn-down pack.

`Sources.Pack.Size` is consulted once per handle: pack files
are immutable post-creation and the on-disk identity is
pinned via `packHash`, so the size is invariant for the
handle's lifetime. The cache turns a per-cursor-open
`fs.Stat` into a single atomic load on the hot path.
Failures are not cached; the next call retries.

A minimal `PackMeta` type stub ships as a struct definition
for the `metaVal` field.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`Meta` reads the 12-byte pack header and `packHash.Size()`-byte
footer, verifying the footer hash against the pinned
`packHash` and the parsed version against the supported set
(2 or 3). Mis-wired `Sources`, corrupted packs, or
unsupported pack-format versions surface as clear errors at
the parse site rather than as downstream parsing failures.

The mutex pattern caches successful reads; transient failures
(e.g. EMFILE on `Acquire`) are not cached, so the next call
retries. This avoids the failure mode where one EMFILE blip
on a long-running server permanently breaks the pack until
process restart.

A closed-flag check on entry and again inside the mutex
returns `fs.ErrClosed` if `Close` has landed, preventing the
half-closed retry race where a goroutine retries `Meta`
after a transient error and re-opens the pack FD after
`Close` has torn it down.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`Index` lazily constructs an `idxfile.LazyIndex` from
`Sources.Idx` and `Sources.Rev`. The opener closures the
method passes to `idxfile.NewLazyIndex` wrap
`Sources.Idx.Open` and `Sources.Rev.Open` directly;
`packhandle.ReadAtCloser` satisfies `idxfile.ReadAtCloser`
structurally so no adapter is needed.

`indexVal` is stored as the concrete `*idxfile.LazyIndex`
(not the `idxfile.Index` interface) so `PackHandle.Close`
can call its `Close()` method directly — the
`idxfile.Index` interface has no `Close`. The concrete-type
field is internal to this package; an `idxfile.Index`
redesign would update this site along with the
cache-construction body.

The mutex pattern caches successes only and reads the
closed flag both outside and inside the lock to close the
half-closed retry race where a goroutine retries `Index`
after a transient error and re-opens idx/rev FDs after
`Close` has torn down the pack.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Pin two regressions that the closed flag prevents.

`Index`/`Meta` retry race: goroutine A's call fails on a
transient `Source` error, goroutine B's `Close` lands, then
goroutine A retries. The retry must return `fs.ErrClosed`,
not a freshly-built `LazyIndex` against a torn-down pack.

Cursor-vs-`Close` race: a goroutine holds an open
`PackReader` cursor; another goroutine calls `Close`; the
cursor's subsequent `Read` must return `fs.ErrClosed`
regardless of what the underlying `ReadAtCloser` reports
post-Close.

`testing/synctest` (stdlib graduation in Go 1.25) makes the
scheduling deterministic — for the retry race, a channel
blocks B until A's first call returns; `synctest.Wait()`
then parks A until B's `Close()` completes and B exits, at
which point the retry executes under the invariant that
`Close` has already set the closed flag.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Add a named private `h *packhandle.PackHandle` field on
`Packfile` and route the `.pack` file's FD access through
it when callers supply both `WithFs` and the new
`WithPackHash` option. In that mode `NewPackfile` builds a
`PackHandle` from `packhandle.PathSource(fs, file.Name())`,
closes the file the caller handed in (now redundant), and
the `Scanner` and `FSObject` acquire cursors from
`OpenPackReader` and `OpenRandomReader`. `Packfile.init`
consults `PackHandle.Meta` for the trailing pack hash so
that parse is cached on the handle rather than re-done from
the scanner on every `Packfile` construction.
`Packfile.Close` then tears down the `PackHandle`, which
closes the scanner cursor and the underlying `sharedFile`.

The field is named rather than embedded so no identifier
from `internal/packhandle` reaches `Packfile`'s exported
surface; `FSObject` likewise grows a private
`acquireRandom` closure that hides the packhandle types.
Callers that omit either of the new options keep the
existing direct-file path — including the closed-FD reopen
probe and the scanner-driven trailing-hash extraction — so
`NewPackfile`'s signature stays frozen.

`Packfile.Close` now atomically sets a closed flag; `Get`,
`GetByOffset`, `GetSizeByOffset`, `GetByType`, and `Scanner`
check the flag on entry and (for the methods that take
`p.m`) again after the mutex. Without the gate, a Close
that returns the bufio to its pool would race against a
concurrent caller still inside the scanner — the race
detector flagged this on macOS when `Get` was invoked after
`Close` while delta-selector workers were active in a
parallel test.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Cache one `*packhandle.PackHandle` per pack hash inside
`DotGit`, keyed off the pack's hash and guarded by a
private mutex. The new unexported `packHandle(hash)`
method is the single construction point: it verifies the
pack exists, then wires a `packhandle.Sources` whose
`Pack` opener delegates to `ObjectPack`, `Idx` to
`ObjectPackIdx`, and `Rev` to `OpenPackRev`. Routing `Rev`
through `OpenPackRev` reuses the existing
disk-vs-in-memory fallback for `.rev`, so the cache is
transparent to whether the pack ships a `.rev` on disk.

Amortizing the handle across callers detaches FD lifecycle
from any single `Packfile`: scanner cursors and
random-access cursors can come and go while the same
`sharedFile` observes their refcount and applies the
grace-period close on idle.

Cache invalidation tracks `packMap`'s wholesale-clear
semantics: `cleanPackList` (called from `NewObjectPack`
and the entry point of `DeleteOldObjectPackAndIndex`)
drains and closes every cached handle, since a pack-set
mutation may have invalidated any of them.
`DeleteOldObjectPackAndIndex` additionally evicts the
specific pack's handle. The function is also reworked to
attempt the `.pack`, `.idx` and `.rev` removals
independently and join their errors, so a partial failure
cannot leave orphan siblings on disk; a missing `.rev` is
not an error since the reverse index is optional.
`DotGit.Close` iterates the map, closes each handle, and
joins any errors with the existing file-close error.

The cache and method are unexported; no signature on
`DotGit` changes. Callers that still go through
`ObjectPack(hash) billy.File` are unaffected.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Add `OpenPackForReading(hash) (billy.File, error)` to
`DotGit` and route `ObjectStorage`'s two read paths through
it instead of the fresh-FD `ObjectPack(hash)`. The returned
`billy.File` is an unexported wrapper over a
`packhandle.PackHandle` cursor: `Read`, `ReadAt`, `Seek`,
`Close` forward to the cursor; `Stat` resolves through the
filesystem; `Write`, `WriteAt`, `Lock`, `Unlock`, and
`Truncate` return an error since the handle is read-only by
contract. The wrapper type keeps `*packhandle.PackHandle`
out of the public surface.

`ObjectStorage.packfile` and `buildPackfileIters` now call
the new method. Both code paths inherit the dotgit-level
cache: across calls for the same pack hash the FD is pooled
while in use and grace-closed when idle, instead of being
opened fresh on every call. `ObjectPack(hash)` itself is
unchanged and continues to hand out fresh descriptors for
external callers.

Remove the deprecated FD-pinning options now that the
PackHandle cache supersedes them. Gone:
`storage/filesystem.Options.KeepDescriptors`,
`storage/filesystem.Options.MaxOpenDescriptors`, and
`storage/filesystem/dotgit.Options.KeepDescriptors`, along
with the legacy `DotGit.files` map and
`ObjectStorage.packfiles`/`packList`/`packListIdx` cache
those options gated (plus their `packfileFromCache` and
`storePackfileInCache` helpers). Callers who relied on
`KeepDescriptors=true` to pin FDs across an external
unlink should keep readers acquired — an active acquire
pins the FD until release. The `_examples/ls` example is
updated to call `filesystem.NewStorage` since the option
no longer exists.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Three end-to-end tests at the `ObjectStorage` layer pin the
user-visible FD-lifecycle properties of the PackHandle
routing.

FD pooling: twelve serial `EncodedObject` calls against the
same pack produce ≤2 `fs.Open` calls. A `countingFS`
wrapper around the `billy.Filesystem` intercepts `.pack`
opens via the interface embedding pattern; only the `Open`
method is overridden.

Concurrent reads: `runtime.NumCPU()` goroutines each issue
200 reads of the same object against a zero-capacity
object cache (so every call traverses the pack-FD path).
All reads succeed and return the correct hash. The perf
claim about parallel scaling is validated by
`BenchmarkObjectStorage_PackHandle`, not here — CI runners
are too noisy to be a reliable wall-clock signal.

Reindex invalidation: calling `DeleteOldObjectPackAndIndex`
with a future time closes the cached `PackHandle` via
`cleanPackList`. The underlying `sharedFile.Close` is
synchronous even with active refs, so an in-flight reader's
next `ReadAt` surfaces `fs.ErrClosed`. Subsequent
`OpenPackForReading` and `EncodedObject` calls also fail
(`ExclusiveAccess=true` ensures the `packMap` is consulted
rather than re-scanned).

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added 4 commits May 21, 2026 12:59
Three micro-benchmarks attribute throughput to specific
PackHandle mechanisms and serve as regression sentinels when
run with `-count=5` piped to `benchstat`:

- `BenchmarkPackHandleAcquireGrace`: steady-state cost of the
  `OpenPackReader → ReadAt → Close` round-trip on a warm
  handle. Isolates `sharedFile` acquire/release plus
  `cursorReader` allocation; the grace-period design means
  `open(2)` never fires in the loop.

- `BenchmarkPackHandleParallelReadAt`: concurrent `ReadAt`
  scaling via `b.RunParallel`. Two sub-benchmarks:
  `packhandle_readat` (cursor against an osfs mmap-backed
  copy) and `baseline_direct_pread` (raw
  `(*os.File).ReadAt`). The baseline is the mmap ceiling;
  the gap measures PackHandle coordination overhead.

- `BenchmarkPackHandleMeta`: `Meta()` first-call vs cached
  cost. `first` constructs a fresh PackHandle each iteration
  (cold sync.Once); `cached` reuses one handle (warm). The
  ratio quantifies the memoisation benefit.

These are attribution-oriented micro-benchmarks. End-to-end
throughput measurement against the public `ObjectStorage`
surface lives in `storage/filesystem/`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
`BenchmarkObjectStorage_PackHandle` is the headline
benchmark for the user-visible PackHandle deliverable.
Eight sub-benchmarks exercise the 2 × 2 × 2 workload
matrix:

  `EncodedObject`  vs `HasEncodedObject`
  G=1 (serial)     vs G=NumCPU (parallel)
  fd               vs mmap

Each sub-benchmark materialises a fresh fixture copy in
`b.TempDir()`, wraps it with the requested osfs variant,
builds an `ObjectStorage` with the default LRU object
cache, and runs a serial warm-up pass over all five
benchmark hashes. The warm-up ensures the pack index is
built and the PackHandle FD pool is primed before the
timed region starts; this isolates steady-state throughput
from first-open overhead.

A `benchWarmPack` helper performs one end-to-end read of
the pack file before the timed region to populate the OS
page cache, so the mmap first-touch faults are paid
outside the loop.

The baseline comparison against `origin/main` is intended
to be done with `benchstat`, not via an inline branch —
keeping a legacy code path in the same binary would
require significant wiring for no correctness gain.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
TestSharedFileGracePeriod flaked on Windows runners with
"Should be true ... file should close after grace period". The
test set `gracePeriod = 50ms`, ran an acquire/release pair, then
slept 100ms and asserted the file was closed. On GitHub-hosted
Windows runners under load the AfterFunc body can lag well past
its fire time, so a 50ms margin over the timer fire is not
enough for the body to actually run before the assertion checks.

Replace the fixed-sleep `assert.True` for "file should be
closed" with `assert.Eventually` so we no longer race AfterFunc
scheduling jitter. The test still covers the same semantics;
only the wait mechanism changes. Mirrors the fix applied to a
sibling test in d185174 [1].

[1]: go-git@d185174

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
The Index interface lacked a Close method even though
implementations holding file descriptors (LazyIndex) need
one. PackHandle.Index returns idxfile.Index, so the
interface is what external code receives; without Close on
the contract, callers had no way to signal resource
ownership through the interface. The omission is documented
in internal/packhandle/doc.go as a layering wart awaiting
the idxfile.Index redesign.

Add Close to the interface. MemoryIndex gets a no-op
implementation since it holds no external resources; the
test-only singleEntryIndex in plumbing/format/packfile gets
the same treatment so it continues to satisfy the interface.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
@hiddeco hiddeco force-pushed the release-desc-packhandle branch from 4a91415 to 7c245f6 Compare May 21, 2026 10:59
The refcounted-FD-with-grace-period primitive existed in two
near-identical copies: one in plumbing/format/idxfile and one in
internal/packhandle. The two had already started to drift —
bugfixes applied to one were not propagated to the other.

Move both implementations to a single internal/sharedfile package
with an exported API (SharedFile, Acquire, Release, Close,
ReadAtCloser). The packhandle copy provided the contract — it
already had the more capable shape with a configurable grace
period and a gen counter for clean timer invalidation. The
idxfile callers (LazyIndex and its entry iterators) move to the
exported API; idxfile.ReadAtCloser and packhandle.ReadAtCloser
both become type aliases for sharedfile.ReadAtCloser to preserve
backward source compatibility with existing call sites.

Polling loops in the tests that previously used a manual
deadline+sleep loop now use assert.Eventually for consistency
across the consolidated test file.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
@hiddeco hiddeco force-pushed the release-desc-packhandle branch from 7c245f6 to 3c57de9 Compare May 21, 2026 11:58
Comment on lines +375 to +396
fsObj = &FSObject{
hash: oh.ID(),
offset: oh.ContentOffset,
size: oh.Size,
typ: oh.Type,
index: p.Index,
fs: p.fs,
cache: p.cache,
acquireRandom: func() (packhandle.RandomReader, error) { return h.OpenRandomReader() },
}
} else {
fsObj = NewFSObject(
oh.ID(),
oh.Type,
oh.ContentOffset,
oh.Size,
p.Index,
p.fs,
p.file,
p.file.Name(),
p.cache,
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's simplify this creating the object once with all the shared values, and use the if only to set the difference.

}

p := packfile.NewPackfile(f,
return packfile.NewPackfile(f,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per off-line chat, we may need to add packfile.WithPackHash(pack) here.

@pjbgf pjbgf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hiddeco Thanks for working on this. 🙇

@pjbgf pjbgf merged commit 3af8745 into go-git:main May 21, 2026
17 checks passed
@pjbgf

pjbgf commented May 21, 2026

Copy link
Copy Markdown
Member

@hiddeco merged as is, the nits above we can be pushed into #2133.

@hiddeco hiddeco deleted the release-desc-packhandle branch May 21, 2026 16:43
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 21, 2026
Build `&FSObject{...}` once with the fields common to both the
PackHandle-backed and legacy branches, and let the `p.h != nil`
conditional flip only the differing fields: `acquireRandom` for
the PackHandle path, `pack`/`packPath` for the legacy path. The
non-delta branch reads as one struct plus a short conditional
instead of two parallel literals kept in lockstep.

The legacy branch no longer routes through `NewFSObject`; the
inline literal already covers it. The exported constructor stays
for external callers.

Addresses pjbgf's nit on b98f623 [1], folded into this branch
per the merge comment [2].

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 21, 2026
Document why `WithPackHash` should *not* be wired when the file
handed to `NewPackfile` already wraps a caller-managed
`packhandle.PackHandle` (e.g. the cursor returned by
`dotgit.OpenPackForReading`). Setting the option in that case
makes `NewPackfile` build a second, private FD owner on top of
the caller's wrapper, spawning a parallel grace-period catalog
per Packfile and bypassing the caller-owned soft-close hooks
such as `Storage.CloseIdleDescriptors`.

Behaviour is unchanged; this is a contract clarification
prompted by pjbgf's "we may need to add `WithPackHash(pack)`
here" nit on 988ebcd [1] (`ObjectStorage.packfile`), folded
into this branch per the merge comment [2]. Wiring it there
regresses `TestIntegration_PackFDIsPooledAcrossCalls` from <=2
to 13 FD opens per 12 lookups, and breaks `TestThinPack` on the
second parse, confirming the omission is load-bearing.

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 22, 2026
Build `&FSObject{...}` once with the fields common to both the
PackHandle-backed and legacy branches, and let the `p.h != nil`
conditional flip only the differing fields: `acquireRandom` for
the PackHandle path, `pack`/`packPath` for the legacy path. The
non-delta branch reads as one struct plus a short conditional
instead of two parallel literals kept in lockstep.

The legacy branch no longer routes through `NewFSObject`; the
inline literal already covers it. The exported constructor stays
for external callers.

Addresses pjbgf's nit on b98f623 [1], folded into this branch
per the merge comment [2].

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 22, 2026
Document why `WithPackHash` should *not* be wired when the file
handed to `NewPackfile` already wraps a caller-managed
`packhandle.PackHandle` (e.g. the cursor returned by
`dotgit.OpenPackForReading`). Setting the option in that case
makes `NewPackfile` build a second, private FD owner on top of
the caller's wrapper, spawning a parallel grace-period catalog
per Packfile and bypassing the caller-owned soft-close hooks
such as `Storage.CloseIdleDescriptors`.

Behaviour is unchanged; this is a contract clarification
prompted by pjbgf's "we may need to add `WithPackHash(pack)`
here" nit on 988ebcd [1] (`ObjectStorage.packfile`), folded
into this branch per the merge comment [2]. Wiring it there
regresses `TestIntegration_PackFDIsPooledAcrossCalls` from <=2
to 13 FD opens per 12 lookups, and breaks `TestThinPack` on the
second parse, confirming the omission is load-bearing.

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
@hanwen hanwen mentioned this pull request May 23, 2026
@hanwen

hanwen commented May 23, 2026

Copy link
Copy Markdown

In #2153, I observe that my optimization to use mmap(), which yields a 30% speed increase, is undone by this change.

stracing into it, I get

[pid 470367] openat(24, "pack-afcaef7c5f76cb7ad37d5505e77ff4aa6a6aecbb.pack", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 23
[pid 470367] close(24)                  = 0
[pid 470367] fcntl(23, F_GETFL)         = 0x28000 (flags O_RDONLY|O_LARGEFILE|O_NOFOLLOW)
[pid 470367] fcntl(23, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW) = 0
[pid 470367] epoll_ctl(6, EPOLL_CTL_ADD, 23, {events=EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, data=0x3fab86a9cc0001bc}) = -1 EPERM (Operation not permitted)
[pid 470367] fcntl(23, F_GETFL)         = 0x28800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_NOFOLLOW)
[pid 470367] fcntl(23, F_SETFL, O_RDONLY|O_LARGEFILE|O_NOFOLLOW) = 0
[pid 470367] close(4)                   = 0
[pid 470367] fstat(23, {st_mode=S_IFREG|0444, st_size=152076845, ...}) = 0
[pid 470367] mmap(NULL, 152076845, PROT_READ, MAP_SHARED, 23, 0) = 0x7f554ee00000
[pid 470367] munmap(0x7f554ee00000, 152076845) = 0
[pid 470367] close(23)                  = 0
[pid 470367] openat(AT_FDCWD, "/home/hanwen/vc/linux/.git", O_RDONLY|O_CLOEXEC) = 4
[pid 470367] fstat(4, {st_mode=S_IFDIR|0755, st_size=142, ...}) = 0
[pid 470367] openat(4, "objects", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 23
[pid 470367] openat(23, "pack", O_RDONLY|O_NOFOLLOW|O_CLOEXEC|O_DIRECTORY) = 24
[pid 470367] close(23)                  = 0
[pid 470367] openat(24, "pack-afcaef7c5f76cb7ad37d5505e77ff4aa6a6aecbb.pack", O_RDONLY|O_NOFOLLOW|O_CLOEXEC) = 23
[pid 470367] close(24)                  = 0
[pid 470367] fcntl(23, F_GETFL <unfinished ...>
[pid 470369] <... epoll_pwait resumed>, [], 128, 6, NULL, 0) = 0

i.e the same pack is open/closed over and over again.

@hiddeco

hiddeco commented May 24, 2026

Copy link
Copy Markdown
Member Author

@hanwen thanks for reporting this.

I have a fix that I'll try to turn into a PR tonight or tomorrow.

hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Build `&FSObject{...}` once with the fields common to both the
PackHandle-backed and legacy branches, and let the `p.h != nil`
conditional flip only the differing fields: `acquireRandom` for
the PackHandle path, `pack`/`packPath` for the legacy path. The
non-delta branch reads as one struct plus a short conditional
instead of two parallel literals kept in lockstep.

The legacy branch no longer routes through `NewFSObject`; the
inline literal already covers it. The exported constructor stays
for external callers.

Addresses pjbgf's nit on b98f623 [1], folded into this branch
per the merge comment [2].

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Document why `WithPackHash` should *not* be wired when the file
handed to `NewPackfile` already wraps a caller-managed
`packhandle.PackHandle` (e.g. the cursor returned by
`dotgit.OpenPackForReading`). Setting the option in that case
makes `NewPackfile` build a second, private FD owner on top of
the caller's wrapper, spawning a parallel grace-period catalog
per Packfile and bypassing the caller-owned soft-close hooks
such as `Storage.CloseIdleDescriptors`.

Behaviour is unchanged; this is a contract clarification
prompted by pjbgf's "we may need to add `WithPackHash(pack)`
here" nit on 988ebcd [1] (`ObjectStorage.packfile`), folded
into this branch per the merge comment [2]. Wiring it there
regresses `TestIntegration_PackFDIsPooledAcrossCalls` from <=2
to 13 FD opens per 12 lookups, and breaks `TestThinPack` on the
second parse, confirming the omission is load-bearing.

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Build `&FSObject{...}` once with the fields common to both the
PackHandle-backed and legacy branches, and let the `p.h != nil`
conditional flip only the differing fields: `acquireRandom` for
the PackHandle path, `pack`/`packPath` for the legacy path. The
non-delta branch reads as one struct plus a short conditional
instead of two parallel literals kept in lockstep.

The legacy branch no longer routes through `NewFSObject`; the
inline literal already covers it. The exported constructor stays
for external callers.

Addresses pjbgf's nit on b98f623 [1], folded into this branch
per the merge comment [2].

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Document why `WithPackHash` should *not* be wired when the file
handed to `NewPackfile` already wraps a caller-managed
`packhandle.PackHandle` (e.g. the cursor returned by
`dotgit.OpenPackForReading`). Setting the option in that case
makes `NewPackfile` build a second, private FD owner on top of
the caller's wrapper, spawning a parallel grace-period catalog
per Packfile and bypassing the caller-owned soft-close hooks
such as `Storage.CloseIdleDescriptors`.

Behaviour is unchanged; this is a contract clarification
prompted by pjbgf's "we may need to add `WithPackHash(pack)`
here" nit on 988ebcd [1] (`ObjectStorage.packfile`), folded
into this branch per the merge comment [2]. Wiring it there
regresses `TestIntegration_PackFDIsPooledAcrossCalls` from <=2
to 13 FD opens per 12 lookups, and breaks `TestThinPack` on the
second parse, confirming the omission is load-bearing.

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Build `&FSObject{...}` once with the fields common to both the
PackHandle-backed and legacy branches, and let the `p.h != nil`
conditional flip only the differing fields: `acquireRandom` for
the PackHandle path, `pack`/`packPath` for the legacy path. The
non-delta branch reads as one struct plus a short conditional
instead of two parallel literals kept in lockstep.

The legacy branch no longer routes through `NewFSObject`; the
inline literal already covers it. The exported constructor stays
for external callers.

Addresses pjbgf's nit on b98f623 [1], folded into this branch
per the merge comment [2].

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
hiddeco added a commit to hiddeco/go-git that referenced this pull request May 27, 2026
Document why `WithPackHash` should *not* be wired when the file
handed to `NewPackfile` already wraps a caller-managed
`packhandle.PackHandle` (e.g. the cursor returned by
`dotgit.OpenPackForReading`). Setting the option in that case
makes `NewPackfile` build a second, private FD owner on top of
the caller's wrapper, spawning a parallel grace-period catalog
per Packfile and bypassing the caller-owned soft-close hooks
such as `Storage.CloseIdleDescriptors`.

Behaviour is unchanged; this is a contract clarification
prompted by pjbgf's "we may need to add `WithPackHash(pack)`
here" nit on 988ebcd [1] (`ObjectStorage.packfile`), folded
into this branch per the merge comment [2]. Wiring it there
regresses `TestIntegration_PackFDIsPooledAcrossCalls` from <=2
to 13 FD opens per 12 lookups, and breaks `TestThinPack` on the
second parse, confirming the omission is load-bearing.

[1]: go-git#2132 (comment)
[2]: go-git#2132 (comment)

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants