Skip to content

worktree: skip ignored directories during Status walk#2048

Merged
pjbgf merged 10 commits into
go-git:mainfrom
Soph:status-skip-ignored-walk
May 4, 2026
Merged

worktree: skip ignored directories during Status walk#2048
pjbgf merged 10 commits into
go-git:mainfrom
Soph:status-skip-ignored-walk

Conversation

@Soph

@Soph Soph commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

Summary

Worktree.Status() walks the working tree via the filesystem merkletrie noder and then drops gitignored entries in excludeIgnoredChanges (worktree_status.go:175-177). Because the walker has no concept of .gitignore, every file under a large ignored directory like node_modules is lstat'd and hashed before being filtered out. CLI git skips such directories at the directory level via treat_directory / read_directory_recursive in dir.c, which is why git status is effectively constant-time in the size of node_modules.

This PR plumbs an IgnoreMatcher into filesystem.Options so the walker can skip ignored entries during traversal, while preserving the "tracked entries are still walked" semantics that match git status.

Design

  • New exported interface filesystem.IgnoreMatcher (single method, satisfied by gitignore.Matcher — kept as an interface to avoid pulling gitignore into a leaf utility package).
  • New optional field filesystem.Options.IgnoreMatcher.
  • At root construction, when both Index and IgnoreMatcher are set, build a trackedDirs set of directory prefixes that contain index entries.
  • In calculateChildren, skip a child entry when:
    • it matches the matcher, and
    • it has no entry in the index (files), or contains no tracked entry (directories).
  • In worktree_status.go, load gitignore.ReadPatterns once per Status call and reuse the slice for both the matcher and the post-walk excludeIgnoredChanges filter (it previously re-read patterns from disk).

The post-walk filter is kept as a defense-in-depth backstop; with the new path it has no work to do but the cost is negligible.

Correctness vs. upstream git

Behavior preserved:

  • Untracked + ignored → not reported. Same as git status.
  • Tracked + matches ignore → still reported. Mirrors git status behavior for files that were committed before an ignore rule was added (e.g. git add -f then later ignored). The walker descends into directories that contain tracked entries even if the directory matches an ignore pattern.
  • Untracked siblings of a tracked-but-ignored file → still filtered. When the walker descends into an ignored directory because a tracked file lives there, untracked siblings are filtered per-entry.

No change to:

  • diffCommitWithStaging (left side of Status) — only the worktree walk is affected.
  • excludeIgnoredChanges semantics — still drops len(ch.From) == 0 entries that match.
  • Any code path that doesn't pass an IgnoreMatcher (this is opt-in via Options).

Benchmark

BenchmarkStatusIgnoredDir (added in this PR) times Status() over a tree with 100 tracked source files plus N untracked files inside a gitignored vendor_ignored/ directory. Apple M4 Max, -benchtime=10x:

Untracked files Before After Speedup
0 1.5 ms 1.9 ms
1,000 7.0 ms 4.4 ms 1.6×
5,000 18.2 ms 4.4 ms 4.2×
20,000 58.4 ms 4.4 ms 13.3×

The "After" column is flat in the size of the ignored tree — the walker no longer descends. The ~0.4 ms overhead at N=0 is the unchanged gitignore.ReadPatterns walk plus the small trackedDirs map; it scales with the tracked tree, not the ignored one.

Tests

Unit tests in utils/merkletrie/filesystem/node_ignore_test.go:

  • TestIgnoredDirIsSkipped — directory matching the matcher with no tracked content is not walked.
  • TestTrackedFileInIgnoredDirReportsModify — a tracked file inside an ignored directory still surfaces as Modify when its content changes.
  • TestUntrackedSiblingsInIgnoredDirAreSkipped — when the walker descends into an ignored directory because of a tracked file, untracked siblings are still filtered.

Integration test in worktree_status_test.go:

  • TestStatusReportsModifiedTrackedFileInIgnoredDirectory — same invariant as the second unit test, exercised end-to-end through Worktree.Status().

Existing test suites (Status*, TestNoder*, gitignore) remain green.

API surface

Additive only:

  • New exported interface filesystem.IgnoreMatcher.
  • New field filesystem.Options.IgnoreMatcher.

gitignore.Matcher satisfies the new interface implicitly, so existing callers can opt in by setting the field.

AI disclosure

Per AI_POLICY.md, this contribution was assisted by Claude Opus 4.7 (commit trailers). The author has reviewed every line and is accountable for correctness; the design (interface, trackedDirs building, the "tracked → still walked" invariant) was specified by the author and validated against upstream git's dir.c behavior.

Copilot AI review requested due to automatic review settings April 30, 2026 17:56

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes Worktree.Status() by avoiding filesystem traversal/hashing for large gitignored directories (e.g., node_modules) by plumbing an ignore matcher into the merkletrie filesystem walker while preserving the “tracked entries are still walked” behavior.

Changes:

  • Add filesystem.IgnoreMatcher + filesystem.Options.IgnoreMatcher and use it during filesystem merkletrie traversal to skip ignored, untracked entries.
  • Update Worktree status logic to read gitignore patterns once per Status() call and reuse them both for walk-skipping and the post-walk filter.
  • Add unit/integration tests and a benchmark covering ignored-directory status performance.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
utils/merkletrie/filesystem/node.go Adds ignore-matcher support and tracked-directory detection to allow skipping ignored/untracked entries during traversal.
utils/merkletrie/filesystem/node_ignore_test.go Adds unit tests validating ignored-dir skipping and tracked-in-ignored semantics.
worktree_status.go Reads ignore patterns once, sets the filesystem walker’s ignore matcher, and reuses patterns for the backstop filter.
worktree_status_test.go Adds end-to-end test asserting a tracked-but-ignored file still appears as modified while untracked ignored siblings do not.
worktree_status_ignored_bench_test.go Adds benchmark to demonstrate performance improvement for large ignored directories.

Comment thread worktree_status_ignored_bench_test.go Outdated
Comment thread worktree_status_ignored_bench_test.go Outdated
Comment thread utils/merkletrie/filesystem/node.go Outdated
Comment thread utils/merkletrie/filesystem/node.go Outdated
@Soph Soph force-pushed the status-skip-ignored-walk branch from 3c75778 to 5d06149 Compare April 30, 2026 18:44
Soph added 2 commits April 30, 2026 20:45
Status() walks the working tree via the filesystem merkletrie noder, then
filters gitignored entries in excludeIgnoredChanges. Every file under a
large ignored directory like node_modules is lstat'd and hashed before
being dropped, even though CLI git skips such directories at the
directory level (dir.c:read_directory_recursive).

Plumb an IgnoreMatcher into filesystem.Options so the walker can skip
entries that match the matcher AND have no corresponding entry in the
index. Tracked entries that happen to match an ignore rule are still
walked so their modifications are reported. Patterns are loaded once in
diffStagingWithWorktree and reused, so the post-walk filter no longer
re-reads them.

Benchmark on a tree with 100 tracked source files and N untracked files
in a gitignored directory (Apple M4 Max, -benchtime=10x):

  Untracked | before  | after  | speedup
  --------- | ------- | ------ | -------
  0         |  1.5 ms | 1.9 ms | -
  1,000     |  7.0 ms | 4.4 ms |  1.6x
  5,000     | 18.2 ms | 4.4 ms |  4.2x
  20,000    | 58.4 ms | 4.4 ms | 13.3x

The small overhead in the 0-untracked case comes from the unchanged
ReadPatterns walk plus a small trackedDirs map; the gain comes from the
diff walk no longer descending into ignored subtrees.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Add coverage for the three correctness invariants of the new
IgnoreMatcher path in the filesystem noder, plus a Status-level
integration test:

  - TestIgnoredDirIsSkipped: an ignored directory containing only
    untracked files is not walked.
  - TestTrackedFileInIgnoredDirReportsModify: a tracked file inside an
    ignored directory still surfaces as Modify.
  - TestUntrackedSiblingsInIgnoredDirAreSkipped: when the walker
    descends into an ignored directory because of a tracked file,
    untracked siblings are still filtered.
  - TestStatusReportsModifiedTrackedFileInIgnoredDirectory: the same
    invariant as the second test, exercised through Worktree.Status().

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
@Soph Soph force-pushed the status-skip-ignored-walk branch from 5d06149 to 9695898 Compare April 30, 2026 18:47
Soph added 2 commits April 30, 2026 22:34
The paralleltest linter is enabled in .golangci.yaml and existing tests
in utils/merkletrie/filesystem follow the t.Parallel() convention. The
three tests added in 9695898 were missing the call, breaking
validate-lint on CI.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
shouldSkipIgnored documents that IgnoreMatcher is a no-op when no Index
is provided, since without an index there is no way to identify tracked
entries. The implementation did not enforce that contract: idxMap nil
caused ignored files to be skipped, and trackedDirs nil (a nil-map
lookup returning false) caused ignored directories to be skipped too.

Add an explicit early return when idxMap is nil, drop the now-dead nil
check on the file path, tighten the doc comment on Options.IgnoreMatcher,
and cover the contract with a regression test.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

@pjbgf pjbgf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Soph thanks for looking into this. The performance results look great. 👏

The PR is largely LGTM apart from the small bits below.

Comment thread utils/merkletrie/filesystem/node.go Outdated
Comment thread utils/merkletrie/filesystem/node.go Outdated
Comment thread utils/merkletrie/filesystem/node.go Outdated
Comment thread worktree_status.go Outdated
Comment thread worktree_status.go Outdated
Comment thread worktree_status_ignored_bench_test.go Outdated
@onee-only

Copy link
Copy Markdown
Contributor

Related issue: #181. The PR #1379 originally attempted to fix this using a similar approach. Thankfully, the author agreed to let us supersede their PR.

Soph added 6 commits May 3, 2026 18:50
The filesystem walker now skips ignored, untracked entries during the
Status walk, so they never enter the merkletrie. The post-walk filter
that removed them again can never observe such an entry, and tracked
entries are walked in both implementations, so excludeIgnoredChanges
agrees with the new walker on every input.

Drop the function and its only caller in diffStagingWithWorktree, and
fold the early return into the DiffTree calls. collectIgnorePatterns is
kept; it now only feeds the IgnoreMatcher.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
trackedDirs is only ever consulted as a presence check; it never stored
a meaningful bool value. Use map[string]struct{} to make the intent
explicit and avoid the per-entry byte.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Keep all Status() benchmarks in a single file. Also refresh the
docstrings: they described the pre-PR behavior where ignored files were
walked and hashed, which no longer matches reality now that the
filesystem walker skips ignored directories.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Drop the local IgnoreMatcher interface and reference gitignore.Matcher
in Options instead. The interface served only to keep this package free
of the gitignore import, but reviewers preferred the simpler direct
reference: every real caller plugs in a gitignore.Matcher anyway, so
the abstraction was paying for a layering boundary nobody was using.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
shouldSkipIgnored consulted trackedDirs in the directory branch, but
trackedDirs only records *parent* paths of index entries. A submodule's
own path is the index entry, so its directory was pruned by the walker
when its parent matched an ignore rule, and modifications to the
submodule were silently dropped from Status().

Check idxMap before falling back to trackedDirs: an entry whose own path
is in the index is tracked, whether it is a regular file or a
directory-shaped entry like a submodule. Add a regression test that
places a submodule under an ignored directory.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Add three regression tests that exercise edge cases of the ignore-walk
optimization, modeled on equivalent scenarios in upstream git's
t/t7061-wtstatus-ignore.sh and t/t7521-ignored-mode.sh:

  - TestDeeplyNestedTrackedFileInIgnoredDir: a tracked file three
    directories deep under an ignored top-level dir, exercising the
    parent-chain population of trackedDirs.
  - TestFilePatternIgnoreSkipsUntrackedSiblings: a *.log file pattern
    rather than a directory pattern, ensuring the file branch of
    shouldSkipIgnored handles per-file ignore rules.
  - TestEmptyIgnoredDirIsSkipped: an ignored directory with no entries
    on disk is still pruned without surfacing as a child.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
@Soph

Soph commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

@pjbgf @onee-only I had the thought to check which tests the git cli has around this scenario. It had 4 tests with coverage this PR hadn't yet. So I added them and the submodule case found an actual bug.

@onee-only onee-only left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the update! I left a small suggestion on trackedDirs initialization. I'd appreciate it if you could share your thoughts.

Comment on lines +107 to 126
var trackedDirs map[string]struct{}

if options.Index != nil {
idxMap = make(map[string]*index.Entry, len(options.Index.Entries))
for _, entry := range options.Index.Entries {
idxMap[entry.Name] = entry
}

if options.IgnoreMatcher != nil {
trackedDirs = make(map[string]struct{})
for _, entry := range options.Index.Entries {
for parent := path.Dir(entry.Name); parent != "." && parent != "/"; parent = path.Dir(parent) {
if _, ok := trackedDirs[parent]; ok {
break
}
trackedDirs[parent] = struct{}{}
}
}
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can combine these idxMap and trackdDirs initializations into one loop? I'm not sure how big this would affect to the performance, but it could take advantage of locality of reference I think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤓 I let claude work a bit for it's money:

Ran a microbenchmark over NewRootNodeWithOptions setup at 1k/10k/100k index entries, 10 samples each, compared via benchstat:

Entries Path Two-loop (current) Fused Δ
1k WithMatcher 147.7 µs 146.6 µs -0.78%
10k WithMatcher 1.027 ms 1.012 ms -1.46%
100k WithMatcher 7.552 ms 7.454 ms -1.30%
1k NoMatcher 11.32 µs 11.46 µs +1.32%
10k NoMatcher 146.1 µs 149.1 µs +2.06%
100k NoMatcher 1.861 ms 1.911 ms +2.71%
geomean 390.1 µs 391.7 µs +0.41%

All deltas are statistically significant (p ≤ 0.01). Allocations are identical.

Fusing wins 1–1.5% on the matcher path but loses 1–3% on the no-matcher path — the per-iteration trackedDirs != nil check costs a comparison every iteration even when nothing else runs. Geomean is +0.41%, essentially flat.

Since the no-matcher path is the common one (only Status with excludeIgnoredChanges populates trackedDirs), I'd rather keep the two passes — the rare-path win doesn't pay for the common-path regression, and the absolute setup cost is well under 1% of Status() either way on realistic repos.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Thanks for looking into it. 🙇

@pjbgf pjbgf left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Soph thanks for working on this. 🙇

@pjbgf pjbgf merged commit 70ab884 into go-git:main May 4, 2026
17 checks passed
Soph added a commit to Soph/go-git that referenced this pull request May 8, 2026
Reset (HardReset, KeepReset, MergeReset) walks the working tree via
diffStagingWithWorktree to find tracked files that are missing or stale
on disk. Without the IgnoreMatcher optimization landed in go-git#2048, that
walk descends into every gitignored subtree before the loop body
discards the resulting Delete actions, so every file under directories
like node_modules is lstat'd for nothing.

The infrastructure to avoid this already exists: passing
excludeIgnoredChanges=true to diffStagingWithWorktree builds a
gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher,
where the noder's shouldSkipIgnored prunes ignored, untracked subtrees
at enumeration time while keeping tracked entries (or directories with
tracked descendants) visible. Flip the flag to true at the two Reset
call sites — resetWorktreeToTree and resetWorktree — to engage it.

The observable result is unchanged. resetWorktreeToTree's loop already
skips Delete actions explicitly to preserve untracked files (the fix
from 0c7da9764), and resetWorktree filters changes by the supplied
files list, which never names an untracked-and-ignored path. The
matcher only avoids the cost of walking those entries.

Benchmark on a tree with 100 tracked source files and N untracked files
in a gitignored directory (Apple M5, -benchtime=10x):

  Untracked | before   | after  | speedup
  --------- | -------- | ------ | -------
  0         |  2.2 ms  | 2.6 ms | -
  1,000     |  7.7 ms  | 3.8 ms |  2.0x
  5,000     | 16.2 ms  | 5.2 ms |  3.1x
  20,000    | 51.8 ms  | 4.6 ms | 11.3x

The small overhead in the 0-untracked case is the same matcher build
cost noted in go-git#2048; the gain comes from the diff walk no longer
descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir
closely because both operations share diffStagingWithWorktree.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Soph added a commit to Soph/go-git that referenced this pull request May 8, 2026
resetWorktreeToTree (HardReset, KeepReset) walks the working tree via
diffStagingWithWorktree to find tracked files that are missing or
stale on disk. Without the IgnoreMatcher optimization landed in go-git#2048,
that walk descends into every gitignored subtree before the loop body
discards the resulting Delete actions, so every file under directories
like node_modules is lstat'd for nothing.

The infrastructure to avoid this already exists: passing
excludeIgnoredChanges=true to diffStagingWithWorktree builds a
gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher,
where the noder's shouldSkipIgnored prunes ignored, untracked subtrees
at enumeration time while keeping tracked entries (or directories with
tracked descendants) visible. Flip the flag to true at the
resetWorktreeToTree call site to engage it.

Leave the resetWorktree (MergeReset) call site at false. resetIndex
runs immediately before resetWorktree and removes paths from the
index. A tracked path inside a gitignored directory that was just
removed from the index would no longer be in idxMap, so the noder
would prune it from the walk and the Delete action needed to remove
it from disk would never be emitted. resetWorktreeToTree is safe
because step 1 (the tree-to-tree diff) handles deletions, and step 2
explicitly skips any Delete action returned from the worktree walk.

The observable result is unchanged for HardReset / KeepReset: step 2's
loop already skips Delete actions explicitly to preserve untracked
files (the fix from 0c7da9764). The matcher only avoids the cost of
walking those entries.

Benchmark on a tree with 100 tracked source files and N untracked files
in a gitignored directory (Apple M5, -benchtime=10x):

  Untracked | before   | after  | speedup
  --------- | -------- | ------ | -------
  0         |  2.2 ms  | 2.6 ms | -
  1,000     |  7.7 ms  | 3.8 ms |  2.0x
  5,000     | 16.2 ms  | 5.2 ms |  3.1x
  20,000    | 51.8 ms  | 4.6 ms | 11.3x

The small overhead in the 0-untracked case is the same matcher build
cost noted in go-git#2048; the gain comes from the diff walk no longer
descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir
closely because both operations share diffStagingWithWorktree.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Soph added a commit to Soph/go-git that referenced this pull request May 8, 2026
Three regression tests pinning the contract this branch relies on at
the worktree-level (the noder-level invariants are already covered by
node_ignore_test.go in go-git#2048).

  - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a
    gitignored directory, deleted on disk, is restored by HardReset.
    Exercises the trackedDirs path through Reset.
  - TestResetHardModifiedTrackedFileWithGitIgnore: complements the
    existing TestResetHardWithGitIgnore (which covers deletion) with
    the modification path: a tracked file whose own name is gitignored
    has its content reverted by HardReset.
  - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason
    resetWorktree (MergeReset) must keep excludeIgnoredChanges=false.
    Engaging the matcher there would prune a tracked path that
    resetIndex just removed from the index, so the Delete needed to
    clean it off disk would never be emitted. Verified by flipping
    the flag locally — the test fails when the matcher is enabled and
    passes when it is not.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Soph added a commit to Soph/go-git that referenced this pull request May 20, 2026
Three regression tests pinning the contract this branch relies on at
the worktree-level (the noder-level invariants are already covered by
node_ignore_test.go in go-git#2048).

  - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a
    gitignored directory, deleted on disk, is restored by HardReset.
    Exercises the trackedDirs path through Reset.
  - TestResetHardModifiedTrackedFileWithGitIgnore: complements the
    existing TestResetHardWithGitIgnore (which covers deletion) with
    the modification path: a tracked file whose own name is gitignored
    has its content reverted by HardReset.
  - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason
    resetWorktree (MergeReset) must keep excludeIgnoredChanges=false.
    Engaging the matcher there would prune a tracked path that
    resetIndex just removed from the index, so the Delete needed to
    clean it off disk would never be emitted. Verified by flipping
    the flag locally — the test fails when the matcher is enabled and
    passes when it is not.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
yuzhuo added a commit to yuzhuo/go-git that referenced this pull request Jun 4, 2026
* plumbing: packp, fix lint error in advrefs_decode

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, add depth validation to match canonical git

Prevent combining deepen <n> with deepen-since or deepen-not, following
canonical git's validation in upload-pack.c:
`deepen and deepen-since (or deepen-not) cannot be used together`

Add validation to both encoder and decoder, with tests for all
invalid combinations.

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: transport, fix unreachable haves sideband progress test

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, rename DepthRequest fields to use "deepen" prefix, matching protocol lines

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, refactor packp decoding and encoding, remove state machine

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: capability, document that the List is not thread safe

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: capability, add validation for capabilities and their arguments

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: capability, add SessionID capability with validation

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: transport, validate capabilities in handshake and serve

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: capability, combine tests into table driven cases

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, advrefs: use ErrorIs in tests

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, use deepen mutually exclusive error

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: packp, remove unused io.ReadCloser cast

Signed-off-by: Ayman Bagabas <[email protected]>

* worktree: skip ignored directories during Status walk (#2048)

* *: skip gitignored directories during Status walk

Status() walks the working tree via the filesystem merkletrie noder, then
filters gitignored entries in excludeIgnoredChanges. Every file under a
large ignored directory like node_modules is lstat'd and hashed before
being dropped, even though CLI git skips such directories at the
directory level (dir.c:read_directory_recursive).

Plumb an IgnoreMatcher into filesystem.Options so the walker can skip
entries that match the matcher AND have no corresponding entry in the
index. Tracked entries that happen to match an ignore rule are still
walked so their modifications are reported. Patterns are loaded once in
diffStagingWithWorktree and reused, so the post-walk filter no longer
re-reads them.

Benchmark on a tree with 100 tracked source files and N untracked files
in a gitignored directory (Apple M4 Max, -benchtime=10x):

  Untracked | before  | after  | speedup
  --------- | ------- | ------ | -------
  0         |  1.5 ms | 1.9 ms | -
  1,000     |  7.0 ms | 4.4 ms |  1.6x
  5,000     | 18.2 ms | 4.4 ms |  4.2x
  20,000    | 58.4 ms | 4.4 ms | 13.3x

The small overhead in the 0-untracked case comes from the unchanged
ReadPatterns walk plus a small trackedDirs map; the gain comes from the
diff walk no longer descending into ignored subtrees.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: test ignored-directory walk skipping in Status

Add coverage for the three correctness invariants of the new
IgnoreMatcher path in the filesystem noder, plus a Status-level
integration test:

  - TestIgnoredDirIsSkipped: an ignored directory containing only
    untracked files is not walked.
  - TestTrackedFileInIgnoredDirReportsModify: a tracked file inside an
    ignored directory still surfaces as Modify.
  - TestUntrackedSiblingsInIgnoredDirAreSkipped: when the walker
    descends into an ignored directory because of a tracked file,
    untracked siblings are still filtered.
  - TestStatusReportsModifiedTrackedFileInIgnoredDirectory: the same
    invariant as the second test, exercised through Worktree.Status().

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: add t.Parallel() to ignore-walk tests

The paralleltest linter is enabled in .golangci.yaml and existing tests
in utils/merkletrie/filesystem follow the t.Parallel() convention. The
three tests added in 96958981 were missing the call, breaking
validate-lint on CI.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* utils: merkletrie/filesystem, ignore IgnoreMatcher when Index is nil

shouldSkipIgnored documents that IgnoreMatcher is a no-op when no Index
is provided, since without an index there is no way to identify tracked
entries. The implementation did not enforce that contract: idxMap nil
caused ignored files to be skipped, and trackedDirs nil (a nil-map
lookup returning false) caused ignored directories to be skipped too.

Add an explicit early return when idxMap is nil, drop the now-dead nil
check on the file path, tighten the doc comment on Options.IgnoreMatcher,
and cover the contract with a regression test.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: drop redundant excludeIgnoredChanges post-walk filter

The filesystem walker now skips ignored, untracked entries during the
Status walk, so they never enter the merkletrie. The post-walk filter
that removed them again can never observe such an entry, and tracked
entries are walked in both implementations, so excludeIgnoredChanges
agrees with the new walker on every input.

Drop the function and its only caller in diffStagingWithWorktree, and
fold the early return into the DiffTree calls. collectIgnorePatterns is
kept; it now only feeds the IgnoreMatcher.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* utils: merkletrie/filesystem, switch trackedDirs to a set

trackedDirs is only ever consulted as a presence check; it never stored
a meaningful bool value. Use map[string]struct{} to make the intent
explicit and avoid the per-entry byte.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: fold ignored-dir status bench into worktree_status_bench_test.go

Keep all Status() benchmarks in a single file. Also refresh the
docstrings: they described the pre-PR behavior where ignored files were
walked and hashed, which no longer matches reality now that the
filesystem walker skips ignored directories.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* utils: merkletrie/filesystem, use gitignore.Matcher in Options directly

Drop the local IgnoreMatcher interface and reference gitignore.Matcher
in Options instead. The interface served only to keep this package free
of the gitignore import, but reviewers preferred the simpler direct
reference: every real caller plugs in a gitignore.Matcher anyway, so
the abstraction was paying for a layering boundary nobody was using.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* utils: merkletrie/filesystem, walk submodules inside ignored directories

shouldSkipIgnored consulted trackedDirs in the directory branch, but
trackedDirs only records *parent* paths of index entries. A submodule's
own path is the index entry, so its directory was pruned by the walker
when its parent matched an ignore rule, and modifications to the
submodule were silently dropped from Status().

Check idxMap before falling back to trackedDirs: an entry whose own path
is in the index is tracked, whether it is a regular file or a
directory-shaped entry like a submodule. Add a regression test that
places a submodule under an ignored directory.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* utils: merkletrie/filesystem, expand ignore-walk test coverage

Add three regression tests that exercise edge cases of the ignore-walk
optimization, modeled on equivalent scenarios in upstream git's
t/t7061-wtstatus-ignore.sh and t/t7521-ignored-mode.sh:

  - TestDeeplyNestedTrackedFileInIgnoredDir: a tracked file three
    directories deep under an ignored top-level dir, exercising the
    parent-chain population of trackedDirs.
  - TestFilePatternIgnoreSkipsUntrackedSiblings: a *.log file pattern
    rather than a directory pattern, ensuring the file branch of
    shouldSkipIgnored handles per-file ignore rules.
  - TestEmptyIgnoredDirIsSkipped: an ignored directory with no entries
    on disk is still pruned without surfacing as a child.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

---------

Signed-off-by: Stefan Haubold <[email protected]>

* build: Update module github.com/pjbgf/sha1cd to v0.6.0

* build: Deprecate ChrootOS

Signed-off-by: Paulo Gomes <[email protected]>

* storage: Ensure alternates resolve BoundOS path

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: Update util.TempFile usage

Signed-off-by: Paulo Gomes <[email protected]>

* *: Use test TempDir() instead of util.TempDir()

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: transport/http, Add test for multi-round Fetch

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: transport, Align flush size with upstream git
Previously go-git flush size was hard-coded to 32, which could be rather inefficient for very
large negotiations. This change aligns the behaviour with upstream, growing exponentially that
number when it is safe to do so, as per next_flush() (fetch-pack.c:277-291).

The initial flush size is now 16 - which is also aligned with upstream.

ref: https://github.com/git/git/blob/9f223ef1c026d91c7ac68cc0211bde255dda6199/fetch-pack.c
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: transport, align ACK handling and negotiation with upstream git

Add applyServerACKs to properly handle multi-ack-detailed responses:
- Reset in_vain on ACK_continue and ACK_ready (fetch-pack.c:550-552)
- Re-queue ACK_common haves for stateless RPC rounds (fetch-pack.c:533-542)
- Suppress duplicate ACK_common to avoid re-queuing already-known objects
- Track ACK_ready and stop negotiation early (fetch-pack.c:555-556, 569-570)
- Send a terminal done round after ACK_ready (fetch-pack.c:577-579)
- Clamp batch size to remaining in-vein budget after got_continue

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: transport/http, Close storage

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Add support for gpgsig-sha256 header

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: objects, Align EncodeWithoutSignature with upstream Git

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: objects, Ignore standard headers in ExtraHeaders

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Decode commit headers via a state machine

Replace the per-iteration flag soup in (*Commit).Decode with a
func-based stateFn loop that mirrors upstream Git's parsing posture.
The new state machine encodes "where we are in the buffer" as the
function being called rather than as a bag of mutable booleans.

Conformance changes vs upstream commit.c:
- tree must be the first header, missing tree errors with
  ErrMalformedCommit (matches the "bogus commit object" check at
  commit.c:508-510).
- Parents are accepted only contiguously after tree; out-of-position
  parent lines are silently dropped (matches parse_commit_buffer's
  parent loop exiting at the first non-parent line and
  read_commit_extra_header_lines filtering parents from extras).
- Author and committer are accepted only at their canonical positions
  (immediately after parents, immediately after author). Out-of-place
  occurrences are dropped, mirroring parse_commit_date silently
  returning 0 plus the standard_header_field filter.
- Encoding/mergetag/gpgsig/gpgsig-sha256 follow first-wins; duplicate
  sig/mergetag continuation lines are discarded by a dedicated
  scanSkipCont state.
- Continuation strip uses line[1:] (single space) instead of
  bytes.TrimLeft, mirroring upstream's `line + 1` at commit.c:1509.
- Header/body boundary is the literal "\n" only, matching
  *line == '\n' at commit.c:1502.

Tests:
- TestDecodeRequiresTreeFirst covers the four hard-error inputs
  (missing tree, parent before tree, extra before tree, empty buffer).
- TestDecodeFirstOccurrenceWins covers the lenient drop semantics for
  duplicate tree/author/committer/gpgsig/gpgsig-sha256, parents after
  author or interleaved with extras, missing committer, encoding
  between author and committer (drops the misplaced committer), and
  author at a non-canonical position (drops author and committer).
- TestMalformedHeader is back to expecting NoError (lenient parsing of
  garbage author/committer values).

The state machine lives in plumbing/object/commit_scanner.go.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: objects, Remove Commit.MergeTag in favour of ExtraHeaders
The previous MergeTag field had two issues:
1. It could only represent a single mergetag, whilst upstream support multiple within a single Commit.
2. Did not allow for a byte-to-byte alignment with upstream, due to the above, and its position
set by go-git during encoding time.

In order to mitigate both issues, it will now be handled as part of ExtraHeaders, which
better aligns with upstream.

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Add support for Tag.SignatureSHA256

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Align Tag.EncodeWithoutSignature with Commit

Tag verification re-encoded from struct fields, so any non-canonical
bytes the signature was computed over (duplicate headers, mutation-only
fields like Signature/SignatureSHA256, etc.) were lost from the payload
and signatures that were valid in upstream Git failed in go-git.

Mirror the approach already used by Commit: retain a reference to the
source plumbing.EncodedObject on Decode, add matchesSource() so
EncodeWithoutSignature can stream the raw bytes (with the inline
trailing PGP block truncated and gpgsig/gpgsig-sha256 headers stripped)
when the struct still matches the source. Mutating struct fields still
falls back to the struct-encoded payload.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Reject multi-signature commits at Verify

Align Commit.Verify with upstream's parse_buffer_signed_by_header
(commit.c:1186) and parse_gpg_output rejection (gpg-interface.c:257-269).

The commit scanner now concatenates every gpgsig/gpgsig-sha256 header
into a single signature buffer instead of first-wins, and Commit.Verify
returns the new ErrMultipleSignatures when that buffer carries more
than one armored block.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Decode Tag headers via a state machine

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Add ErrMalformedTag

Signed-off-by: Paulo Gomes <[email protected]>

* tests: Add git conformance tests for signing verification

Signed-off-by: Paulo Gomes <[email protected]>

* tests: Skip double checks in Git v2.11

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Align Tree handling with upstream

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, Reset object before decode

Signed-off-by: Paulo Gomes <[email protected]>

* internal: git, add server timeout test small tolerance

This adds a small tolerance to workaround OS related time precision
jitter. Without this, time-based tests might fail with really small
deltas like 199.3ms < 200ms.

Signed-off-by: Ayman Bagabas <[email protected]>

* plumbing: transport/ssh, Shell-quote path and args

The SSH transport interpolates `req.URL.Path` and each entry of
`req.Args` into the remote-exec command line. The previous
formatter wrapped each value in literal single quotes via
`'%s'`, which diverges from canonical Git's wire format for any
value containing characters that the shell treats specially.

Port `sq_quote_buf`[1] from canonical Git into a `writeShellQuote`
helper that mirrors upstream's `struct strbuf *dst` signature, and
route both the path and each `req.Args` entry through it. The
helper escapes the same character set that upstream Git escapes
(`'` and `!`), so the bytes go-git puts on the wire match what
`git` itself would send for the same input. Benign inputs are
byte-identical to the previous output; values containing `'` or
`!` now round-trip correctly through any POSIX shell and through
`git-shell`'s `sq_dequote_to_argv`.

Add a table-driven `TestBuildCommand` covering the plain case,
both characters that `sq_quote_buf` escapes, mixed inputs, and
inert metacharacters that pass through unchanged.

[1]: https://github.com/git/git/blob/v2.54.0/quote.c#L28

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* git: submodule, canonical remote for relative URLs

A relative submodule URL like `../sibling` has to be joined onto a
base URL — the superproject's remote — to produce the actual URL
to clone. Submodule.Repository() picked `Repository.Remotes()[0]`
as the base. That slice is built by iterating a Go map, so its
order is randomized: in any superproject with more than one
remote (the common `origin` + `upstream` fork setup), the chosen
base flipped between runs and the same `git submodule update`
could clone the sibling from either namespace.

Canonical Git is deterministic[1][2]: it uses
`branch.<HEAD>.remote` if configured, the sole remote when there
is exactly one, otherwise the literal `"origin"`. The submodule
helper[3] then joins the relative URL onto that remote's URL.

Mirror that rule in an unexported `defaultRemote` and call it
from the relative-URL branch. Tests cover the rule directly and
the end-to-end submodule flow; the integration test loops to
catch any future regression that re-introduces map-order
dependence.

[1]: https://github.com/git/git/blob/v2.54.0/remote.c#L1801-L1809
[2]: https://github.com/git/git/blob/v2.54.0/remote.c#L655-L669
[3]: https://github.com/git/git/blob/v2.54.0/builtin/submodule--helper.c#L53-L75

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Hidde Beydals <[email protected]>

* storage: filesystem, Use repository object format for loose objects
Pass the configured object format into the objfile writer so SHA-256
repositories create loose objects at their SHA-256 object IDs instead
of writing SHA-1-addressed files. Add SHA-256 writer and filesystem
round-trip coverage.

Assisted-by: OpenAI Codex <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: Add commit and ref creation SHA-256 tests

Signed-off-by: Paulo Gomes <[email protected]>

* git: submodule, detect absolute paths on Windows

When a submodule URL parses to a `file://` URL,
Submodule.Repository() checks whether the URL's `.Path` is absolute
and only joins it onto a parent remote if it isn't. The check used
`path.IsAbs`, which only recognizes POSIX-rooted paths (`/foo`).
On Windows a submodule URL can legitimately be a drive-letter path
(`C:\path\to\repo`) or a UNC share (`\\server\share\foo`); those
are absolute under `filepath.IsAbs` but not `path.IsAbs`, so the
code was treating them as relative and joining them onto the
superproject's remote URL, producing nonsense like
`file:///parent/origin/C:\path\to\repo`.

Add the platform-aware `filepath.IsAbs` check alongside the
URL-style `path.IsAbs`. POSIX-rooted paths still work because
`path.IsAbs` catches them; native Windows absolute paths now skip
the relative-URL branch as they should. The new test runs only on
Windows (`//go:build windows`), where the divergence is observable
— on Unix both predicates agree on every case.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* internal: url, disambiguate SCP from local paths

`MatchesScpLike` accepted any string that fit the regex
`^[user@]host:[port:]path`, but the regex's host class permits
`/` and single ASCII letters, so it matched strings that are
actually local filesystem paths:

  - `./relative:path` and `/abs/with:colon/file` matched with host
    `./relative` or `/abs/with`.
  - On Windows, `C:/path/to/repo` matched with host `C` — a
    Windows drive-letter path treated as SSH to a single-character
    hostname.

`transport.ParseURL` then returned an `ssh://` URL where canonical
Git would return a local path. The Windows drive-prefix case is
the practical impact: a submodule URL of `C:/path/to/repo` in
.gitmodules is a valid Windows-absolute path, and go-git was
attempting an SSH connection to a host named `C` instead of
opening the local repository.

Mirror canonical Git's `url_is_local_not_ssh`[1]: after the regex
matches, reject the SCP classification when (a) the URL has a `/`
before the first `:`, or (b) on Windows, the URL begins with a
DOS drive prefix. The drive-prefix check is a small byte-wise
helper rather than a regex, mirroring Git's
`win32_has_dos_drive_prefix`[2] and matching the canonical
implementation more closely.

Add positive tests for the rejected forms (portable for (a),
gated by `runtime.GOOS == "windows"` for (b)) and a regression
test verifying that real SCP forms still match.

[1]: https://github.com/git/git/blob/v2.54.0/connect.c#L710-L716
[2]: https://github.com/git/git/blob/v2.54.0/compat/win32/path-utils.c#L20-L29

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* git: submodule, error on remote without URLs

`defaultRemote`, introduced in 0d02124f, returns a `*RemoteConfig`
that `submodule.Repository()` then dereferences as `base.URLs[0]`
to join a relative submodule URL. `RemoteConfig.Validate()` rejects
an empty `URLs` slice on `SetConfig`, but a config that bypasses
validation — for example a hand-edited `[remote "origin"]` section
with no `url =` entry loaded from disk — could still produce a
remote with no URLs and panic the caller with `index out of range
[0] with length 0`.

Promote the invariant into `lookupRemote` and route the
single-remote branch through it as well, so any code path that
asks `defaultRemote` for the resolution base gets a usable remote
or a clear `remote %q has no configured URL` error.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: objfile, Pass object format into Reader

Thread the repository's object format through objfile.NewReader so
SHA-256 loose objects are hashed with SHA-256 instead of being
hard-coded to SHA-1. Update callers in dotgit, filesystem object
storage, and the dumb HTTP transport, and extend reader/writer tests
with SHA-256 round-trip coverage.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: submodule, Add relative URL regression tests

Mirror the regression-test coverage from v5 PR #2070 (54a32347)
to v6. The v5 commit fixed `Submodule.Repository()` to resolve
relative submodule URLs such as `../X.git` against the parent
repository's remote URL per git-submodule(1)[1] and canonical
Git, rather than absolutizing them against the process CWD.

The v5 root cause — `transport.NewEndpoint` running scheme-less
inputs through `parseFile`, which called `filepath.Abs` — does
not exist on v6: `internal/url.ParseFile` stores the input
verbatim. The relative-resolution branch in
`Submodule.Repository()` therefore already fires correctly, and
PR #2071 (0d02124f) plus PR #2076 (16558108) further hardened
remote selection via `defaultRemote` and `lookupRemote`. So the
production code on v6 already has the fix; this commit only
backfills the matching test coverage so a future refactor that
reintroduces eager normalization in `ParseFile` fails loudly
instead of silently breaking submodule resolution.

Five table-driven cases cover relative URLs against HTTPS and
SSH parents, deep traversal (`../../`), absolute local URL
preservation, and the no-parent-remote case. The last asserts
the `defaultRemote`-wrapped error
`resolving relative submodule URL: remote "origin" not found`,
matching v5's `TestRepositoryRelativeURLNoParentRemote` after
the same upstream remote-selection refactor landed there.

[1]: https://git-scm.com/docs/git-submodule

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: format/idxfile, Validate offset64 indices

When a 32-bit offset entry has its MSB set, the lower 31 bits index
into the `Offset64` table. Neither `MemoryIndex.getOffset` nor
`LazyIndex.offset` validated that the resulting position fell inside
the decoded table; an idx with an MSB-set entry whose lower bits
exceed the number of decoded 64-bit entries would index past the
end of the table.

Bounds-check both lookups against the decoded table size and return
`ErrMalformedIdxFile` when the index is out of range. `MemoryIndex`'s
unexported `getOffset` now returns `(uint64, error)`; the change
propagates through `FindOffset`, `genOffsetHash`, and the `Entries`
iterator. `LazyIndex` caches the 64-bit-entry count discovered during
`init` and uses it for the same check before issuing the `ReadAt`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* storage: filesystem, Search alternates in HashesWithPrefix

EncodedObject, HasEncodedObject and EncodedObjectSize all fall back to
alternate object stores, but HashesWithPrefix only consults local loose
objects and packs. With alternates configured, ResolveRevision on an
abbreviated hash that lives in the alternate returns
ErrReferenceNotFound even though the full hash resolves.

Iterate alternates after local lookup, deduplicating against results
already collected. Also mark pack hits as seen so a hash present in
multiple packs or in both a pack and an alternate is returned once.

Signed-off-by: Andrew Nesbitt <[email protected]>

* plumbing: format/idxfile, Add `MemoryIndex` fuzzer

`FuzzLazyIndex` covers one of the two parallel idx implementations.
The companion `MemoryIndex` (decoder + post-decode `Index` methods +
both iterators) had no fuzz coverage, which is why the offset64
bounds gap fixed in e429f8e2 was not surfaced earlier.

`FuzzMemoryIndex` mirrors the structure of `FuzzLazyIndex`: seed the
corpus with structurally valid prefixes (using `buildMinimalIdx`),
the empty input, and a malformed-but-decodable idx from
`buildOOBOffset64Idx`; the fuzz function then runs `Decode` and
exercises every `Index` method plus both iterators on inputs that
decode successfully.

`buildOOBOffset64Idx` lives in `fuzz_helpers.go` (alongside
`buildMinimalIdx`) rather than `_test.go`, because the OSS-Fuzz
harness extracts fuzz targets into a non-test build context where
sibling `_test.go` symbols are not resolved.

A 30s local run produced ~2.25M executions and 99 corpus inputs
without failure on top of the validated tree.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: format/idxfile, Reformulate offset64 check

The previous bounds check `offset+8 > uint64(len(idx.Offset64))`
was correct because masking and the `uint32` source type bound
`offset` to ~2^34, leaving uint64 headroom. That safety relies on
invariants in surrounding code (mask shape and source type) that
aren't visible at the comparison itself.

Reformulate as `len < 8 || offset > len-8`, which holds without
relying on the upstream bound. Behaviour is unchanged.

`LazyIndex.offset` already uses an index-vs-count comparison
(`loIndex >= s.count64`) with no addition, so it needs no
analogous change.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* storage: filesystem, Close entry iterator on error in HashesWithPrefix

The pack entry iterator was only closed after a full iteration; an
error from Next() returned early without closing it.

Signed-off-by: Andrew Nesbitt <[email protected]>

* git: Add path-validating filesystem wrapper
Introduce worktreeFilesystem, a wrapper around billy.Filesystem that
calls validPath on every mutating operation (Create, OpenFile, Remove,
Rename, Symlink, MkdirAll). This ensures dangerous paths like .git/*,
../, and git~1/ are rejected regardless of which code path writes to
the worktree.

The Worktree.Filesystem field is replaced with a Filesystem() method
that returns the underlying billy.Filesystem, preventing external code
from bypassing the wrapper by reassigning the field.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: Gate NTFS path checks on core.protectNTFS config

Replace the runtime.GOOS == "windows" guard in path validation with
the core.protectNTFS configuration option, matching upstream Git
behaviour. When not explicitly set, defaults to true on Windows.

This allows non-Windows systems to opt in to NTFS protection, and
Windows systems to opt out when not needed.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: Add core.protectHFS config for HFS+ path validation

Add support for core.protectHFS, which detects .git paths obfuscated
with Unicode zero-width and directional characters that HFS+ silently
strips during path normalization. When enabled, paths like
.g\u200cit are rejected. Defaults to true on macOS, matching upstream
Git behaviour.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: Validate .git and dot components at every path level

Extend path validation to check every path component, not just the
first. Single-dot components are also rejected. This matches upstream
Git's verify_path_internal which validates at every directory separator
boundary.

A non-first final .git component (e.g. "submodule/.git") is permitted
because submodule worktrees contain a .git pointer file.

The per-call-site validChange checks are removed since the
worktreeFilesystem wrapper enforces validation on all mutating
operations.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* git: Check for Windows reserved names

Signed-off-by: Paulo Gomes <[email protected]>

* git: Split ntfs and hfs

Signed-off-by: Paulo Gomes <[email protected]>

* git: Expand worktreeFilesystem validation

Signed-off-by: Paulo Gomes <[email protected]>

* git: Refer to internal filesystem field

Signed-off-by: Paulo Gomes <[email protected]>

* build: Skip tests for older git versions

Signed-off-by: Paulo Gomes <[email protected]>

* git: Allow for submodules with .git in Windows

Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: idxfile, widen sharedFile reset test timing

TestSharedFileGracePeriodResetByAcquire flaked repeatedly on the
Windows runners with `Should be false ... new grace period hasn't
expired`. The test set `sf.gracePeriod = 80ms`, ran an acquire/
release pair, slept 50ms and triggered a reset acquire, slept
another 50ms, then asserted the file was still open. That left
only 30ms of margin on the second assertion — well under the
~15ms `time.Sleep` granularity and scheduling jitter that
GitHub-hosted Windows runners exhibit under load, so the close
timer fired before the assertion ran.

Widen the grace period to 200ms, leaving ~150ms of headroom on
the negative assertion, and replace the fixed-sleep `assert.True`
for the final "file should be closed" check with
`assert.Eventually`, so we no longer race timer scheduling jitter
on either side. The test still covers the reset-on-acquire
semantics introduced by fc7566a3 [1]; only the timing knobs
change.

Recent flake on `main`: 25490308205 [2].

[1]: https://github.com/go-git/go-git/commit/fc7566a3f53410298c77fa23635889c81e3dd3b5
[2]: https://github.com/go-git/go-git/actions/runs/25490308205

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* internal: server/git, widen Timeout test tolerance

TestGitServer_Timeout flaked on the Linux runners with messages
like `"193.372166ms" is not greater than or equal to "195ms"` —
the server closed ~7ms before the assertion's lower bound. The
test measures elapsed from the client's perspective:

	start := time.Now()
	_, err = conn.Read(buf)
	elapsed := time.Since(start)

but the server arms its idle deadline during accept and the
first server-side `Read`, both of which run before the client
calls `Read`. Client-measured elapsed therefore systematically
underestimates the real server-side timeout by a small amount,
and the previous 5ms tolerance was simply tighter than realistic
scheduling jitter on shared GitHub-hosted runners.

Widen the tolerance to 30ms (~15% of the 200ms timer). This still
catches a server that closes substantially earlier than
configured (a real bug), but absorbs the inherent client/server
clock skew plus normal CI jitter.

Recent flakes on `main`: 25443754481 [1], 25440100884 [2].

[1]: https://github.com/go-git/go-git/actions/runs/25443754481
[2]: https://github.com/go-git/go-git/actions/runs/25440100884

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* *: close fetch/push session and test storages

On Windows the test runner repeatedly hit

	testing.go:1464: TempDir RemoveAll cleanup: unlinkat
	  …\objects\pack\pack-XXXX.idx:
	  The process cannot access the file because it is being
	  used by another process.

across `TestFetchWithFilters` (run 25132475242 [1]),
`TestSmartInfoRefsObjectFormat` (25498994424 [2]), and several
upload-pack server suite tests. The cause traces back to the
sharedFile-managed `.idx` descriptors introduced in 13daf62f:
they are kept open for a one-second grace period after the last
reader releases, which races `t.TempDir`'s synchronous
`RemoveAll` on test return.

Two distinct shapes leaked storages. First, `Remote.fetch` and
`Remote.PushContext` opened a transport session via `Handshake`
but only closed it on the success path, so every early return —
refspec validation, `GetRemoteRefs` failure, `ErrFilterNotSupported`
— leaked the session, and with it the file transport's storage
and its `.idx` readers. Mirror `Remote.list`'s pattern by adding
`defer ioutil.CheckClose(sess, &err)` immediately after
`Handshake`, and drop the now-redundant explicit close in
`fetch`'s happy path. The push path showed the same leak shape
without an observed flake; it is fixed alongside `fetch` since
it is the same one-line change.

Second, `tagLoader`/`fixturesLoader` in `backend/http_test.go`
and the `testServe` helper in `plumbing/transport` build a
`filesystem.Storage` per request but never close it. Register a
`t.Cleanup` so the storage closes at test end, cancelling the
grace timer and releasing descriptors synchronously.
`TestUploadPackStatefulMultiRoundSendsFinalACK` bypasses
`testServe` and gets its own cleanup.

[1]: https://github.com/go-git/go-git/actions/runs/25132475242
[2]: https://github.com/go-git/go-git/actions/runs/25498994424

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* *: Reject malformed variable-length integers

Variable-length integer decoders accepted continuation chains long
enough to push the running shift past the bit-width of their
accumulator, producing out-of-range values that propagated into
buffer allocation hints and on-disk size fields.

Mirror the bound from canonical Git's `unpack_object_header_buffer`[1]:
reject any continuation byte that would shift past the type's
capacity. The same rule applies to `VariableLengthSize`, `DecodeLEB128`,
`DecodeLEB128FromReader` and `ReadVariableWidthInt`; `DecodeLEB128` now
returns an error so callers surface invalid input rather than
silently zeroing the result.

Add defense-in-depth at every call site that allocates from a parsed
length: cap `bytes.Buffer.Grow` hints in `Parser.parentReader` and
`patchDeltaWriter` at 1 GiB / `maxPatchPreemptionSize`, cap
`parserCache.Reset` at a 4 Mi-entry hint so a hostile `ObjectsQty`
cannot request a multi-gigabyte slice, validate `OFS-delta`
back-references in the streaming and mmap paths (must be `> 0` and
`<= o.offset`), and reject `mmap.fsobject` sizes whose top bit is
set rather than truncating into a negative `int64`.

Cover the new behavior with unit, parser-level regression and fuzz
tests (`FuzzVariableLengthSize`, `FuzzDecodeLEB128`, `FuzzParser`).

[1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L1135-L1158

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* backend: use embedded TB.Cleanup directly

`fixturesLoader` and `tagLoader` embed `testing.TB`, so the
explicit `f.TB.Cleanup` / `l.TB.Cleanup` selectors trip
`staticcheck`'s `QF1008` (redundant embedded-field selector).
Drop the explicit `.TB`. No behaviour change.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: format/packfile, Test patchDeltaWriter cap

`patchDeltaWriter`'s preemptive `Buffer.Grow` is already clamped
at `maxPatchPreemptionSize` since `5b89cb5a`, but no test covered
that path end-to-end. Add a regression test that constructs a
delta whose header advertises a `math.MaxInt64` `targetSz` and
asserts the call returns an error rather than driving the
allocation.

Also restate the rationale for the clamp in the comment, naming
both the input-validation goal and the symmetry with `patchDelta`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: format/packfile, Tighten delta validation

Each delta operation's declared size is validated against the
size still remaining in the target buffer, not the total target
size. The previous check against the total accepted a sequence of
operations whose cumulative output would exceed the declared
target, leaving the unsigned remaining counter unable to reach
zero and the loop free to keep writing past the declared target
until the delta payload was consumed.

Three sibling code paths shared the flaw and are updated together:
`patchDelta` (used by `PatchDelta` and `ApplyDelta`),
`ReaderFromDelta`, and `patchDeltaWriter` (used by the packfile
parser).

The loop structure is rewritten as `for remainingTargetSz > 0`
so the invariant is explicit, and a post-loop check rejects
deltas that leave bytes unconsumed -- matching the `data != top`
sanity check in `patch-delta.c`[1]. Empty-target deltas
(header-only, no operations) are now accepted, also matching
upstream.

Several incidental defects on the same call paths are fixed:
`patchDelta`'s copy-from-src failure used `break`, which only
exited the switch and let the loop keep consuming delta bytes;
`patchDeltaWriter` returned `(0, ZeroHash, nil)` on invalid input,
silently reporting success; and `ReaderFromDelta`'s copy-from-src
failure closed the writer without an error, silently truncating
the consumer stream. Each now propagates `ErrInvalidDelta` (or
`ErrDeltaCmd`) consistently.

`packfile.getMemoryObject` was re-inflating delta payloads into
the buffer the scanner had already populated, appending a
duplicate copy that previously went unnoticed because the loop
silently ignored anything past the target. The fix matches the
cached-content pattern already used in `Scanner.WriteObject`.

[1]: https://github.com/git/git/blob/c2c5f6b1e479f2c38e0e01345350620944e3527f/patch-delta.c

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* config: optbool, accept Git-style boolean syntax

Add a `parseConfigBool` helper that mirrors upstream Git's
`git_parse_maybe_bool`: it accepts `true`/`yes`/`on`/`1` and
`false`/`no`/`off`/`0` case-insensitively, and returns
`OptBoolUnset` for empty or unrecognised values so the
caller's platform default stays in place.

`unmarshalCore` swaps `strconv.ParseBool` for `parseConfigBool`
when reading `core.protectNTFS` and `core.protectHFS`. The
previous `strconv.ParseBool` path silently misinterpreted
user-friendly syntax: writing `protectNTFS = on` made
`strconv.ParseBool` return an error, which left the field at
its zero value (`OptBoolUnset`), so on Windows the platform
default applied — but a user who wrote `protectNTFS = on` to
deliberately enable the protection on Linux would have got the
platform default (`false`) instead of the explicit `true` they
intended. With the tolerant parser, all of `true` / `yes` /
`on` / `1` (and their false counterparts) take effect where
the user expects them to.

Other booleans in this package keep the loose `== "true"`
pattern; aligning them is out of scope. Only the security
toggles are upgraded here, where silent misinterpretation has
the highest cost.

Reference: upstream Git `git_parse_maybe_bool_text` at
`parse.c` L157-L173 and `git_parse_maybe_bool` at `parse.c`
L174-L182 in tag `v2.54.0`[1]. `git_parse_maybe_bool` is the
closer match, since it also accepts integer values via
`git_parse_int`.

[1]: https://github.com/git/git/blob/v2.54.0/parse.c#L157-L182

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* internal: pathutil, port Git path-validity rules

Lift platform-specific dotgit-variant detection out of package
`git` into a self-contained package so it can be reached from
any caller — `config` (#2079's submodule-name validation),
`storage/filesystem/dotgit` (the `Module(name)` containment
check), and the tree-side gates added in a follow-up commit —
without those callers depending on the root `git` package.

The package collects three layers:

  - HFS+ side: the ignored-codepoint table plus
    `IsHFSDot` / `IsHFSDotGit` / `IsHFSDotGitmodules` family
    (zero-width / case-folding aware). Implementations are
    unchanged from the previous `git`-package versions; only
    the package boundary moves.

  - NTFS / Windows side: `IsNTFSDot` (a port of canonical
    Git's `is_ntfs_dot_generic`), `IsNTFSDotGitmodules`, the
    `WindowsValidPath` predicate, and the reserved-name table
    (`CON`, `NUL`, `AUX`, …). The `dotGit = ".git"` constant
    is declared locally to avoid coupling pathutil to
    `git.GitDirName`.

  - Cross-platform helper: `IsDotGitName` matches `.git` and
    its 8.3 NTFS short alias `git~1` case-insensitively.

On top of these primitives sits `ValidTreePath`, the strict
validator applied at the boundary where attacker-controlled
tree data leaves the trusted store. Where the wrapper-layer
`validPath` in package `git` is intentionally tolerant of
final-position `.git` (legitimate `submodule/.git` flows in
submodule cleanup) and only consults HFS+/NTFS variants when
the corresponding `core.protect*` flag is on, `ValidTreePath`
is always-strict regardless of runtime config: tree paths are
canonical UTF-8 with no zero-width characters or 8.3 short-
name forms, so an entry that looks like one is suspicious
anywhere. It rejects control characters, empty / `.` / `..`
components, Windows volume-name prefixes, `.git` and its
HFS+/NTFS variants at every position, and reserved device
names.

Mirrors upstream Git's `verify_path_internal` at
`read-cache.c` L987-L1048 in tag `v2.54.0`[1], stripped of
its runtime `protect_hfs` / `protect_ntfs` gating because the
pathutil layer is consulted from the strict tree boundary,
not from application paths.

[1]: https://github.com/git/git/blob/v2.54.0/read-cache.c#L987-L1048

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* *: gate path materialisation at trust boundaries

Apply `pathutil.ValidTreePath` at the chokepoints where tree
data crosses out of the trusted object store and where
application-supplied paths cross into the index. Layered on
top of the existing tolerant wrapper `validPath` in package
`git`, this gives the worktree two layers of protection:
strict validation at the boundary, tolerant validation at
the filesystem edge for legitimate flows (`submodule/.git`
Stat / Remove during submodule cleanup).

Read-side chokepoints in `plumbing/object`:

  - `(*Tree).FindEntry` — most callers funnel through here:
    `(*Tree).File`, `(*Tree).Tree`, `(*Tree).Size`, and the
    `checkoutChange` Modify/Insert branch. A dangerous tree-
    derived path is refused at the lookup boundary before
    anything materialises.

  - `TreeWalker.Next` — drives `transformChildren` (which
    feeds `merkletrie.DiffTree`), `FileIter`, and the archive
    writers in `internal/archive`. Each leaf entry name is
    validated as it surfaces; a malformed entry stops the
    walk with the validator's error rather than skipping
    silently. Inspection-only callers that need raw access
    can still read `Tree.Entries` directly.

  - `(*Tree).TreeEntryFile` — boundary where a `*File` whose
    Name a caller can hand to filesystem ops leaves the
    store.

Write-side chokepoint in `plumbing/format/index`:

  - `Index.Add(path)` now validates and returns
    `(*Entry, error)`. Every Add and Move flow that records
    a path in the index funnels through here, mirroring
    upstream Git's `verify_path_internal` invocation from
    `make_cache_entry` on the index-addition side.
    `Index.Add`'s signature change is a v6 API break.

Application-side gates in package `git`:

  - The wrapper-level `validPath` continues to gate
    filesystem writes; HFS+/NTFS-aware rejection of
    `.gitmodules` symlink targets is now driven by the same
    `pathutil` predicates so the wrapper and the strict
    validator stay aligned. The control-character loop is
    byte-oriented for upstream parity with
    `verify_path_internal`.

  - `Submodule.Repository`'s `Chroot` validates the
    submodule's tree-stored Path before scoping the
    repository, refusing embedded `.git` / HFS+ / NTFS
    variants regardless of `core.protectHFS` /
    `core.protectNTFS`.

  - `Worktree.checkoutChange`'s Delete branch is reached
    only from `resetWorktree`'s filesystem-vs-index
    merkletrie diff (`resetWorktreeToTree`'s tree-derived
    deletes call `rmFileAndDirsIfEmpty` directly). The path
    source is the local worktree filesystem, where the
    tolerant wrapper is the right fit: cleanup of
    legitimately-tracked shapes like `submodule/.git`
    should not abort the whole reset.

The root-level `hfs.go` / `ntfs.go` files held only the
3-line `defaultProtectHFS` / `defaultProtectNTFS` runtime-
policy helpers after the `pathutil` extraction; they fold
into `worktree_fs.go` next to the wrapper that consumes
them. The wrapper-integration tests (`TestValidPath*`,
`TestWorktreeFilesystem*` for HFS / NTFS variants) move
next to the wrapper they exercise in `worktree_fs_test.go`.

Tests pin every chokepoint against the catalogue of attacker
shapes used elsewhere in this branch — final-position `.git`
(`submodule/.git`), NTFS trailing-space / trailing-dot / ADS
(`.git ` / `.git.` / `.git::$INDEX_ALLOCATION`), HFS+ zero-
width `.git` (e.g. `.g‌it`), Windows reserved `CON` /
`NUL`, the 8.3 short alias `git~1`, dot-dot traversal,
control chars, and the empty path. `Index.Add`'s
`TestIndexAdd` is updated for the new signature, and a new
`TestIndexAddRejectsDangerousPaths` pins the index gate.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* *: skip ignored directories during HardReset walk

resetWorktreeToTree (HardReset, KeepReset) walks the working tree via
diffStagingWithWorktree to find tracked files that are missing or
stale on disk. Without the IgnoreMatcher optimization landed in #2048,
that walk descends into every gitignored subtree before the loop body
discards the resulting Delete actions, so every file under directories
like node_modules is lstat'd for nothing.

The infrastructure to avoid this already exists: passing
excludeIgnoredChanges=true to diffStagingWithWorktree builds a
gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher,
where the noder's shouldSkipIgnored prunes ignored, untracked subtrees
at enumeration time while keeping tracked entries (or directories with
tracked descendants) visible. Flip the flag to true at the
resetWorktreeToTree call site to engage it.

Leave the resetWorktree (MergeReset) call site at false. resetIndex
runs immediately before resetWorktree and removes paths from the
index. A tracked path inside a gitignored directory that was just
removed from the index would no longer be in idxMap, so the noder
would prune it from the walk and the Delete action needed to remove
it from disk would never be emitted. resetWorktreeToTree is safe
because step 1 (the tree-to-tree diff) handles deletions, and step 2
explicitly skips any Delete action returned from the worktree walk.

The observable result is unchanged for HardReset / KeepReset: step 2's
loop already skips Delete actions explicitly to preserve untracked
files (the fix from 0c7da9764). The matcher only avoids the cost of
walking those entries.

Benchmark on a tree with 100 tracked source files and N untracked files
in a gitignored directory (Apple M5, -benchtime=10x):

  Untracked | before   | after  | speedup
  --------- | -------- | ------ | -------
  0         |  2.2 ms  | 2.6 ms | -
  1,000     |  7.7 ms  | 3.8 ms |  2.0x
  5,000     | 16.2 ms  | 5.2 ms |  3.1x
  20,000    | 51.8 ms  | 4.6 ms | 11.3x

The small overhead in the 0-untracked case is the same matcher build
cost noted in #2048; the gain comes from the diff walk no longer
descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir
closely because both operations share diffStagingWithWorktree.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: cover IgnoreMatcher behavior across Reset modes

Three regression tests pinning the contract this branch relies on at
the worktree-level (the noder-level invariants are already covered by
node_ignore_test.go in #2048).

  - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a
    gitignored directory, deleted on disk, is restored by HardReset.
    Exercises the trackedDirs path through Reset.
  - TestResetHardModifiedTrackedFileWithGitIgnore: complements the
    existing TestResetHardWithGitIgnore (which covers deletion) with
    the modification path: a tracked file whose own name is gitignored
    has its content reverted by HardReset.
  - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason
    resetWorktree (MergeReset) must keep excludeIgnoredChanges=false.
    Engaging the matcher there would prune a tracked path that
    resetIndex just removed from the index, so the Delete needed to
    clean it off disk would never be emitted. Verified by flipping
    the flag locally — the test fails when the matcher is enabled and
    passes when it is not.

Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>

* *: align tree-path validation with upstream Git

Restrict `ValidTreePath`'s NTFS gating to the disguised-`.git`
family, dropping the always-on Windows reserved-name check that
made go-git refuse trees upstream Git happily reads. In upstream's
`verify_path_internal`[1], `is_ntfs_dotgit` runs under `protect_ntfs`
(defaulting to 1 on every platform) but `is_valid_win32_path` is
compile-time gated to Windows-native and Cygwin builds. Names such
as `lib/con.go` are well-formed on non-Windows, so a go-git client
on Linux must be able to read trees containing them.

Lift `is_ntfs_dotgit` out as `IsNTFSDotGit` rather than keeping the
disguise logic fused into `WindowsValidPath`. As a side-effect this
closes a gap in the previous implementation: it only recognised the
`.git` prefix, so `git~1 ` (trailing space), `git~1.`, and
`git~1::ads` slipped past, even though upstream's `is_ntfs_dotgit`[2]
also matches the `git~1` short-name prefix. `WindowsValidPath` now
composes `IsNTFSDotGit` with the reserved-name table, retaining its
existing wrapper-layer contract: bare `.git` and `git~1` are allowed,
position-checked by callers.

Defence in depth is preserved at the materialisation boundary:
`worktreeFilesystem.validPath` still enforces both checks under
`core.protectNTFS`, so reserved-name and disguise rejection remain
in place when a path is about to hit disk on Windows. Tests in
`plumbing/format/index`, `plumbing/object`, and the worktree
strict-gate suite are updated to reflect the new contract: reserved
names move out of the always-on tree gate, and `git~1` disguise
variants are added alongside their `.git` counterparts.

[1]: https://github.com/git/git/blob/v2.54.0/read-cache.c#L987-L1048
[2]: https://github.com/git/git/blob/v2.54.0/path.c#L1415-L1449

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* git: default core.protectNTFS to on everywhere

Mirror upstream Git's `PROTECT_NTFS_DEFAULT`, which has been `1`
unconditionally since 9102f958ee5 (CVE-2019-1353)[1]. Until now
go-git gated the default on `runtime.GOOS == "windows"`, leaving
Linux and macOS users without the wrapper-layer `is_ntfs_dotgit`
and reserved-name checks unless they explicitly set
`core.protectNTFS=true`. The motivating scenario is unchanged
from upstream's: WSL mounts Windows drives under `/mnt/`, so a
Linux process can reach an NTFS-backed worktree where the `.git`
directory is also resolvable as `git~1` (or `.git ` /
`.git::$DATA`). Gating the guard on the runtime OS skips that
class of attack on the very system where it is reachable.

Tree-side gates already catch disguised `.git` regardless of
this default — `pathutil.ValidTreePath` is always-on per
ce4cca18. This commit closes the parallel gap at the wrapper
layer: `worktreeFilesystem.validPath` and `validSymlinkName` now
enforce the NTFS rules on non-Windows by default, matching
upstream's protect-by-default posture.

`PROTECT_HFS_DEFAULT` is left untouched. Upstream chose not to
flip that default in 9102f958ee5 (the cost in the cited
benchmark was non-trivial and the WSL-equivalent scenario for
HFS+ is not realistic), and `defaultProtectHFS` already mirrors
that decision via its Darwin-only return.

[1]: https://github.com/git/git/commit/9102f958ee5

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* config: validate submodule names

`Submodule.Validate()` validated `Path` against `dotdotPath` but
left `Name` unchecked, so a `.gitmodules` declaring
`[submodule ".."]` passed validation cleanly. Downstream the same
name reached `storage/filesystem/dotgit.DotGit.Module(name)` where
it was joined into the storage path with no containment check,
letting the submodule storer escape `.git/modules/` and write
into the parent repo's `.git/`.

Mirror canonical Git's `check_submodule_name` from
`submodule-config.c` [1], rejecting empty names and any name with
a `..` path component using both `/` and `\\` as separators (the
canonical loop uses both unconditionally so the rule stays
platform-consistent). On top of that, reject names that are bare
`.`, contain a NUL byte, start or end with a separator, or carry
a Windows drive prefix — go-git-specific edge cases the canonical
loop does not exercise because it treats names as opaque C
strings.

The path-component check is also extended for filesystems whose
canonicalisation can turn an innocent-looking name into `..`:
HFS+ ignores a fixed set of Unicode code points (zero-width and
directional marks) so `.<U+200C>.` resolves to `..` on macOS, and
NTFS strips trailing spaces, periods, and an alternate-data-stream
suffix so `.. `, `....`, and `..::$INDEX_ALLOCATION` all resolve
to `..` on Windows. Both variants are detected by calling
`pathutil.IsHFSDot` and `pathutil.IsNTFSDot` with `.` as the
needle — they already port the relevant fragments of canonical
Git's `next_hfs_char` and `is_ntfs_dotgit` — and run
unconditionally on every submodule name since `.gitmodules` is
attacker-controlled by definition.

Wrap the new `ErrModuleBadName` sentinel (whose message echoes
canonical Git's `ignoring suspicious submodule name` warning)
with the offending name in `Validate()` so callers can both
`errors.Is` the sentinel and read the bad name in the error
string.

[1]: https://github.com/git/git/blob/v2.54.0/submodule-config.c#L214-L237

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* config: skip submodules with invalid names

Extend `unmarshalSubmodules`'s silent-skip to also drop entries
whose `Name` fails `Submodule.Validate()`. The previous check only
matched `ErrModuleBadPath`; with the new `ErrModuleBadName`
validation landed in the preceding commit, a `.gitmodules`
declaring `[submodule ".."]` would still be inserted into the map
keyed by `..` and reach `Submodule.Repository()` callers
downstream. Mirror canonical Git's "ignoring suspicious submodule
name" semantics by treating bad-name entries the same way
bad-path entries are already treated: drop them without failing
the parse, so a malformed `.gitmodules` does not break unrelated
config loading.

Empty-path and empty-URL entries keep their existing lenient
behaviour of being stored despite the validation error,
preserving backward compatibility with `.gitmodules` files in
the wild.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* storage: dotgit, refuse escaping submodule names

`DotGit.Module(name)` joined the raw `name` into
`modulePath = "modules"` and chrooted to the result without any
containment check. A `Submodule` whose `Name` is `..` (or a sibling
traversal like `foo/../..`) caused `Chroot("modules/..")` to resolve
back to `.git/`, so once `Submodule.Repository()` opened that storer
any subsequent `Storer.SetReference` / `SetEncodedObject` call wrote
into the parent repo's `.git/`. The parser-layer fix in the
preceding commit already rejects such names from `.gitmodules`,
but `Module` is also reachable from any caller that constructs a
`Submodule` struct programmatically — including the transactional
storage wrapper — so a defence in depth is warranted.

After joining, clean the result with `path.Clean(filepath.ToSlash)`
and refuse the call unless the cleaned path is exactly `modules` or
remains under `modules/`. This catches `..`-traversal escapes
without rejecting nested but contained names like `lib/foo`, which
canonical Git permits.

Names whose joined form `path.Clean`s back into the modules/
subtree (for example `/etc`, which becomes `modules/etc`, or
`modules/../escape`, which becomes `modules/escape`) are accepted
here — the parser layer is responsible for the shape of the name.
The storage layer's only job is to ensure no caller can write
outside `modules/`.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* git: submodule, test name escape defence

Cover the storage-layer defence: constructing a `Submodule` with
`Name = ".."` programmatically (bypassing the .gitmodules parser)
must not result in `Repository()` opening a storer rooted in the
parent's `.git/` directory, and must leave the parent's HEAD
reference untouched.

Use `filesystem.NewStorage` rather than `memory.NewStorage` for the
parent because only the filesystem-backed `ModuleStorer` exercises
the dotgit defence; the in-memory implementation maps names to
storage units directly without traversing a path. The test asserts
the call returns an error wrapping `dotgit.ErrModuleNameEscape` and
that the parent's HEAD target is unchanged after the failed call.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* git: add repository and remote archive support

Signed-off-by: Ayman Bagabas <[email protected]>

* build: Update module github.com/go-git/go-git-fixtures/v6 to v6.0.0-20260502205956-66bffbe5a6ff

Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com>

* build: Update golang

Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com>

* build: Update go-git

Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com>

* plumbing: format/pktline, validate reader/writer inputs

Guard the read and write entry points against nil io.Reader/io.Writer
and undersized buffers so callers receive a typed error instead of a
panic. Tighten Read so a stale prefix in a recycled buffer cannot be
misread as an error-line, and refuse payloads larger than the caller's
buffer.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: format/pktline, add fuzz tests
Fuzz the Read, ReadLine, PeekLine, and Scanner entry points so the
defensive validation in the previous commit stays panic-free under
arbitrary byte sequences. Seed inputs cover flush/delim/response-end,
the short-buffer cases, and the stale-prefix scenario.

Drop the now-redundant test := test loop-variable shadowing in the
new table tests; the module is on go 1.25 where each iteration has
its own variable.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: format/pktline, drain payload on short buffer

When Read is called with a buffer too small to hold the advertised
packet, it had already consumed the 4-byte header via io.ReadFull but
returned without draining the payload, leaving the stream positioned
mid-packet. Subsequent pkt-line reads would misinterpret the leftover
payload bytes as a fresh pkt-len and desync. Drain length-LenSize bytes
before returning io.ErrUnexpectedEOF so the next read finds the
following packet.

Assisted-by: Claude Opus 4.7 <[email protected]>
Signed-off-by: Paulo Gomes <[email protected]>

* plumbing: object, validate names in Tree.Encode

Tree.Encode previously rejected only embedded NUL bytes and
mis-sorted entries, so a caller assembling a *Tree programmatically
could produce a tree object containing components like ".git", "..",
control characters, or HFS+/NTFS variants of ".git". Run each entry
name through pathutil.ValidTreePath at the encoder boundary so the
library cannot emit such bytes through its API.

The same loop also rejects duplicate names, including the file and
directory collision where `foo` and `foo/` resolve to the same sort
key. With entries already in memory a single pass over `entry.Name`
catches the non-consecutive case that upstream Git's `verify_ordered`
in `fsck.c`[1] tracks with a name stack; the simpler form suffices
because the file/dir variant shares the bare name.

A test helper in worktree_commit_test.go that planted a tree
containing ".git/config" for cherry-pick coverage can no longer
do that through Tree.Encode and now writes the raw object bytes
directly, matching the escape hatch documented in Tree.Encode's
godoc.

[1]: https://github.com/git/git/blob/v2.54.0/fsck.c#L544-L614

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: object, port fsck_tree as Tree.Validate

go-git's tree decoder is permissive on purpose: tools that inspect
or recover repositories need to read tree objects whose entries do
not pass the index-side path-validation gate. Upstream Git keeps
the same split — `parse_tree_buffer`[1] is a no-op on names while
`fsck_tree`[2] is the validator — and exposes the latter through
`git fsck`. Add the read-side counterpart to Tree.Encode's
producer gate: a Tree.Validate method that mirrors `fsck_tree`'s
structural ruleset. Decode stays as-is; callers that want
fsck-shaped reporting call Validate.

The ported rules cover null entry hashes, slashes in names, empty
names, ".", "..", ".git" with HFS+/NTFS disguises (already handled
by pathutil.ValidTreePath), bad file modes, duplicate entries,
mis-sort, names longer than 4096 bytes, and the four
FSCK_MSG_*_SYMLINK rules for ".gitmodules", ".gitattributes",
".gitignore", and ".mailmap". Two fsck_tree warnings —
zero-padded modes and the non-canonical 0100664 bits — are not
surfaced: both require retaining the raw octal mode string from
the wire, which canonicalTreeMode drops. The structural rules are
the load-bearing ones for refusing malformed trees.

The new pathutil helpers IsHFSDotGitattributes /
IsHFSDotGitignore / IsHFSDotMailmap and their NTFS counterparts
extend the existing IsHFSDotGitmodules / IsNTFSDotGitmodules pair
with the short-name prefixes upstream uses for the same names
(`gi7d29`, `gi250a`, `maba30`).

Violations are returned as an errors.Join of individual rule
errors, each wrapping ErrInvalidTree, so callers can match either
the umbrella or specific rules (ErrDuplicateEntry,
ErrEntriesNotSorted, pathutil.ErrInvalidPath) with errors.Is.

[1]: https://github.com/git/git/blob/v2.54.0/tree.c#L175-L184
[2]: https://github.com/git/git/blob/v2.54.0/fsck.c#L616-L800

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <[email protected]>

* plumbing: object, fold Tree.Encode through Tree.Validate

Encode previously enforced a subset of Tree.Validate's rules inline
(ValidTreePath, duplicate detection, sort order). With Validate
available as the structural ruleset, Encode can run it once at the
top and skip the duplicated checks — both shrinking the encoder and
widening the producer-side gate to refuse the rules Validate already
catches but Encode did not: null entry hashes, oversize names, and
symlinks disguised as `.gitmodules` / `.gitattributes` /
`.gitignore` / `.mailmap`.

The widened gate exposes one pre-existing path that built trees
containing entries with the zero hash: buildTreeHelper turned every
index entry into a tree entry, including those carried through #1773
to avoid a panic during rec…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants