worktree: skip ignored directories during Status walk#2048
Conversation
There was a problem hiding this comment.
Pull request overview
This PR optimizes Worktree.Status() by avoiding filesystem traversal/hashing for large gitignored directories (e.g., node_modules) by plumbing an ignore matcher into the merkletrie filesystem walker while preserving the “tracked entries are still walked” behavior.
Changes:
- Add
filesystem.IgnoreMatcher+filesystem.Options.IgnoreMatcherand use it during filesystem merkletrie traversal to skip ignored, untracked entries. - Update
Worktreestatus logic to read gitignore patterns once perStatus()call and reuse them both for walk-skipping and the post-walk filter. - Add unit/integration tests and a benchmark covering ignored-directory status performance.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
utils/merkletrie/filesystem/node.go |
Adds ignore-matcher support and tracked-directory detection to allow skipping ignored/untracked entries during traversal. |
utils/merkletrie/filesystem/node_ignore_test.go |
Adds unit tests validating ignored-dir skipping and tracked-in-ignored semantics. |
worktree_status.go |
Reads ignore patterns once, sets the filesystem walker’s ignore matcher, and reuses patterns for the backstop filter. |
worktree_status_test.go |
Adds end-to-end test asserting a tracked-but-ignored file still appears as modified while untracked ignored siblings do not. |
worktree_status_ignored_bench_test.go |
Adds benchmark to demonstrate performance improvement for large ignored directories. |
3c75778 to
5d06149
Compare
Status() walks the working tree via the filesystem merkletrie noder, then filters gitignored entries in excludeIgnoredChanges. Every file under a large ignored directory like node_modules is lstat'd and hashed before being dropped, even though CLI git skips such directories at the directory level (dir.c:read_directory_recursive). Plumb an IgnoreMatcher into filesystem.Options so the walker can skip entries that match the matcher AND have no corresponding entry in the index. Tracked entries that happen to match an ignore rule are still walked so their modifications are reported. Patterns are loaded once in diffStagingWithWorktree and reused, so the post-walk filter no longer re-reads them. Benchmark on a tree with 100 tracked source files and N untracked files in a gitignored directory (Apple M4 Max, -benchtime=10x): Untracked | before | after | speedup --------- | ------- | ------ | ------- 0 | 1.5 ms | 1.9 ms | - 1,000 | 7.0 ms | 4.4 ms | 1.6x 5,000 | 18.2 ms | 4.4 ms | 4.2x 20,000 | 58.4 ms | 4.4 ms | 13.3x The small overhead in the 0-untracked case comes from the unchanged ReadPatterns walk plus a small trackedDirs map; the gain comes from the diff walk no longer descending into ignored subtrees. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
Add coverage for the three correctness invariants of the new
IgnoreMatcher path in the filesystem noder, plus a Status-level
integration test:
- TestIgnoredDirIsSkipped: an ignored directory containing only
untracked files is not walked.
- TestTrackedFileInIgnoredDirReportsModify: a tracked file inside an
ignored directory still surfaces as Modify.
- TestUntrackedSiblingsInIgnoredDirAreSkipped: when the walker
descends into an ignored directory because of a tracked file,
untracked siblings are still filtered.
- TestStatusReportsModifiedTrackedFileInIgnoredDirectory: the same
invariant as the second test, exercised through Worktree.Status().
Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
5d06149 to
9695898
Compare
The paralleltest linter is enabled in .golangci.yaml and existing tests in utils/merkletrie/filesystem follow the t.Parallel() convention. The three tests added in 9695898 were missing the call, breaking validate-lint on CI. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
shouldSkipIgnored documents that IgnoreMatcher is a no-op when no Index is provided, since without an index there is no way to identify tracked entries. The implementation did not enforce that contract: idxMap nil caused ignored files to be skipped, and trackedDirs nil (a nil-map lookup returning false) caused ignored directories to be skipped too. Add an explicit early return when idxMap is nil, drop the now-dead nil check on the file path, tighten the doc comment on Options.IgnoreMatcher, and cover the contract with a regression test. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
The filesystem walker now skips ignored, untracked entries during the Status walk, so they never enter the merkletrie. The post-walk filter that removed them again can never observe such an entry, and tracked entries are walked in both implementations, so excludeIgnoredChanges agrees with the new walker on every input. Drop the function and its only caller in diffStagingWithWorktree, and fold the early return into the DiffTree calls. collectIgnorePatterns is kept; it now only feeds the IgnoreMatcher. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
trackedDirs is only ever consulted as a presence check; it never stored
a meaningful bool value. Use map[string]struct{} to make the intent
explicit and avoid the per-entry byte.
Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
Keep all Status() benchmarks in a single file. Also refresh the docstrings: they described the pre-PR behavior where ignored files were walked and hashed, which no longer matches reality now that the filesystem walker skips ignored directories. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
Drop the local IgnoreMatcher interface and reference gitignore.Matcher in Options instead. The interface served only to keep this package free of the gitignore import, but reviewers preferred the simpler direct reference: every real caller plugs in a gitignore.Matcher anyway, so the abstraction was paying for a layering boundary nobody was using. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
shouldSkipIgnored consulted trackedDirs in the directory branch, but trackedDirs only records *parent* paths of index entries. A submodule's own path is the index entry, so its directory was pruned by the walker when its parent matched an ignore rule, and modifications to the submodule were silently dropped from Status(). Check idxMap before falling back to trackedDirs: an entry whose own path is in the index is tracked, whether it is a regular file or a directory-shaped entry like a submodule. Add a regression test that places a submodule under an ignored directory. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
Add three regression tests that exercise edge cases of the ignore-walk
optimization, modeled on equivalent scenarios in upstream git's
t/t7061-wtstatus-ignore.sh and t/t7521-ignored-mode.sh:
- TestDeeplyNestedTrackedFileInIgnoredDir: a tracked file three
directories deep under an ignored top-level dir, exercising the
parent-chain population of trackedDirs.
- TestFilePatternIgnoreSkipsUntrackedSiblings: a *.log file pattern
rather than a directory pattern, ensuring the file branch of
shouldSkipIgnored handles per-file ignore rules.
- TestEmptyIgnoredDirIsSkipped: an ignored directory with no entries
on disk is still pruned without surfacing as a child.
Assisted-by: Claude Opus 4.7
Signed-off-by: Stefan Haubold <[email protected]>
|
@pjbgf @onee-only I had the thought to check which tests the git cli has around this scenario. It had 4 tests with coverage this PR hadn't yet. So I added them and the submodule case found an actual bug. |
onee-only
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the update! I left a small suggestion on trackedDirs initialization. I'd appreciate it if you could share your thoughts.
| var trackedDirs map[string]struct{} | ||
|
|
||
| if options.Index != nil { | ||
| idxMap = make(map[string]*index.Entry, len(options.Index.Entries)) | ||
| for _, entry := range options.Index.Entries { | ||
| idxMap[entry.Name] = entry | ||
| } | ||
|
|
||
| if options.IgnoreMatcher != nil { | ||
| trackedDirs = make(map[string]struct{}) | ||
| for _, entry := range options.Index.Entries { | ||
| for parent := path.Dir(entry.Name); parent != "." && parent != "/"; parent = path.Dir(parent) { | ||
| if _, ok := trackedDirs[parent]; ok { | ||
| break | ||
| } | ||
| trackedDirs[parent] = struct{}{} | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Maybe we can combine these idxMap and trackdDirs initializations into one loop? I'm not sure how big this would affect to the performance, but it could take advantage of locality of reference I think.
There was a problem hiding this comment.
🤓 I let claude work a bit for it's money:
Ran a microbenchmark over NewRootNodeWithOptions setup at 1k/10k/100k index entries, 10 samples each, compared via benchstat:
| Entries | Path | Two-loop (current) | Fused | Δ |
|---|---|---|---|---|
| 1k | WithMatcher | 147.7 µs | 146.6 µs | -0.78% |
| 10k | WithMatcher | 1.027 ms | 1.012 ms | -1.46% |
| 100k | WithMatcher | 7.552 ms | 7.454 ms | -1.30% |
| 1k | NoMatcher | 11.32 µs | 11.46 µs | +1.32% |
| 10k | NoMatcher | 146.1 µs | 149.1 µs | +2.06% |
| 100k | NoMatcher | 1.861 ms | 1.911 ms | +2.71% |
| geomean | 390.1 µs | 391.7 µs | +0.41% |
All deltas are statistically significant (p ≤ 0.01). Allocations are identical.
Fusing wins 1–1.5% on the matcher path but loses 1–3% on the no-matcher path — the per-iteration trackedDirs != nil check costs a comparison every iteration even when nothing else runs. Geomean is +0.41%, essentially flat.
Since the no-matcher path is the common one (only Status with excludeIgnoredChanges populates trackedDirs), I'd rather keep the two passes — the rare-path win doesn't pay for the common-path regression, and the absolute setup cost is well under 1% of Status() either way on realistic repos.
There was a problem hiding this comment.
That sounds good. Thanks for looking into it. 🙇
Reset (HardReset, KeepReset, MergeReset) walks the working tree via diffStagingWithWorktree to find tracked files that are missing or stale on disk. Without the IgnoreMatcher optimization landed in go-git#2048, that walk descends into every gitignored subtree before the loop body discards the resulting Delete actions, so every file under directories like node_modules is lstat'd for nothing. The infrastructure to avoid this already exists: passing excludeIgnoredChanges=true to diffStagingWithWorktree builds a gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher, where the noder's shouldSkipIgnored prunes ignored, untracked subtrees at enumeration time while keeping tracked entries (or directories with tracked descendants) visible. Flip the flag to true at the two Reset call sites — resetWorktreeToTree and resetWorktree — to engage it. The observable result is unchanged. resetWorktreeToTree's loop already skips Delete actions explicitly to preserve untracked files (the fix from 0c7da9764), and resetWorktree filters changes by the supplied files list, which never names an untracked-and-ignored path. The matcher only avoids the cost of walking those entries. Benchmark on a tree with 100 tracked source files and N untracked files in a gitignored directory (Apple M5, -benchtime=10x): Untracked | before | after | speedup --------- | -------- | ------ | ------- 0 | 2.2 ms | 2.6 ms | - 1,000 | 7.7 ms | 3.8 ms | 2.0x 5,000 | 16.2 ms | 5.2 ms | 3.1x 20,000 | 51.8 ms | 4.6 ms | 11.3x The small overhead in the 0-untracked case is the same matcher build cost noted in go-git#2048; the gain comes from the diff walk no longer descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir closely because both operations share diffStagingWithWorktree. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
resetWorktreeToTree (HardReset, KeepReset) walks the working tree via diffStagingWithWorktree to find tracked files that are missing or stale on disk. Without the IgnoreMatcher optimization landed in go-git#2048, that walk descends into every gitignored subtree before the loop body discards the resulting Delete actions, so every file under directories like node_modules is lstat'd for nothing. The infrastructure to avoid this already exists: passing excludeIgnoredChanges=true to diffStagingWithWorktree builds a gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher, where the noder's shouldSkipIgnored prunes ignored, untracked subtrees at enumeration time while keeping tracked entries (or directories with tracked descendants) visible. Flip the flag to true at the resetWorktreeToTree call site to engage it. Leave the resetWorktree (MergeReset) call site at false. resetIndex runs immediately before resetWorktree and removes paths from the index. A tracked path inside a gitignored directory that was just removed from the index would no longer be in idxMap, so the noder would prune it from the walk and the Delete action needed to remove it from disk would never be emitted. resetWorktreeToTree is safe because step 1 (the tree-to-tree diff) handles deletions, and step 2 explicitly skips any Delete action returned from the worktree walk. The observable result is unchanged for HardReset / KeepReset: step 2's loop already skips Delete actions explicitly to preserve untracked files (the fix from 0c7da9764). The matcher only avoids the cost of walking those entries. Benchmark on a tree with 100 tracked source files and N untracked files in a gitignored directory (Apple M5, -benchtime=10x): Untracked | before | after | speedup --------- | -------- | ------ | ------- 0 | 2.2 ms | 2.6 ms | - 1,000 | 7.7 ms | 3.8 ms | 2.0x 5,000 | 16.2 ms | 5.2 ms | 3.1x 20,000 | 51.8 ms | 4.6 ms | 11.3x The small overhead in the 0-untracked case is the same matcher build cost noted in go-git#2048; the gain comes from the diff walk no longer descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir closely because both operations share diffStagingWithWorktree. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
Three regression tests pinning the contract this branch relies on at the worktree-level (the noder-level invariants are already covered by node_ignore_test.go in go-git#2048). - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a gitignored directory, deleted on disk, is restored by HardReset. Exercises the trackedDirs path through Reset. - TestResetHardModifiedTrackedFileWithGitIgnore: complements the existing TestResetHardWithGitIgnore (which covers deletion) with the modification path: a tracked file whose own name is gitignored has its content reverted by HardReset. - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason resetWorktree (MergeReset) must keep excludeIgnoredChanges=false. Engaging the matcher there would prune a tracked path that resetIndex just removed from the index, so the Delete needed to clean it off disk would never be emitted. Verified by flipping the flag locally — the test fails when the matcher is enabled and passes when it is not. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
Three regression tests pinning the contract this branch relies on at the worktree-level (the noder-level invariants are already covered by node_ignore_test.go in go-git#2048). - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a gitignored directory, deleted on disk, is restored by HardReset. Exercises the trackedDirs path through Reset. - TestResetHardModifiedTrackedFileWithGitIgnore: complements the existing TestResetHardWithGitIgnore (which covers deletion) with the modification path: a tracked file whose own name is gitignored has its content reverted by HardReset. - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason resetWorktree (MergeReset) must keep excludeIgnoredChanges=false. Engaging the matcher there would prune a tracked path that resetIndex just removed from the index, so the Delete needed to clean it off disk would never be emitted. Verified by flipping the flag locally — the test fails when the matcher is enabled and passes when it is not. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]>
* plumbing: packp, fix lint error in advrefs_decode Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, add depth validation to match canonical git Prevent combining deepen <n> with deepen-since or deepen-not, following canonical git's validation in upload-pack.c: `deepen and deepen-since (or deepen-not) cannot be used together` Add validation to both encoder and decoder, with tests for all invalid combinations. Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: transport, fix unreachable haves sideband progress test Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, rename DepthRequest fields to use "deepen" prefix, matching protocol lines Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, refactor packp decoding and encoding, remove state machine Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: capability, document that the List is not thread safe Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: capability, add validation for capabilities and their arguments Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: capability, add SessionID capability with validation Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: transport, validate capabilities in handshake and serve Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: capability, combine tests into table driven cases Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, advrefs: use ErrorIs in tests Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, use deepen mutually exclusive error Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: packp, remove unused io.ReadCloser cast Signed-off-by: Ayman Bagabas <[email protected]> * worktree: skip ignored directories during Status walk (#2048) * *: skip gitignored directories during Status walk Status() walks the working tree via the filesystem merkletrie noder, then filters gitignored entries in excludeIgnoredChanges. Every file under a large ignored directory like node_modules is lstat'd and hashed before being dropped, even though CLI git skips such directories at the directory level (dir.c:read_directory_recursive). Plumb an IgnoreMatcher into filesystem.Options so the walker can skip entries that match the matcher AND have no corresponding entry in the index. Tracked entries that happen to match an ignore rule are still walked so their modifications are reported. Patterns are loaded once in diffStagingWithWorktree and reused, so the post-walk filter no longer re-reads them. Benchmark on a tree with 100 tracked source files and N untracked files in a gitignored directory (Apple M4 Max, -benchtime=10x): Untracked | before | after | speedup --------- | ------- | ------ | ------- 0 | 1.5 ms | 1.9 ms | - 1,000 | 7.0 ms | 4.4 ms | 1.6x 5,000 | 18.2 ms | 4.4 ms | 4.2x 20,000 | 58.4 ms | 4.4 ms | 13.3x The small overhead in the 0-untracked case comes from the unchanged ReadPatterns walk plus a small trackedDirs map; the gain comes from the diff walk no longer descending into ignored subtrees. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: test ignored-directory walk skipping in Status Add coverage for the three correctness invariants of the new IgnoreMatcher path in the filesystem noder, plus a Status-level integration test: - TestIgnoredDirIsSkipped: an ignored directory containing only untracked files is not walked. - TestTrackedFileInIgnoredDirReportsModify: a tracked file inside an ignored directory still surfaces as Modify. - TestUntrackedSiblingsInIgnoredDirAreSkipped: when the walker descends into an ignored directory because of a tracked file, untracked siblings are still filtered. - TestStatusReportsModifiedTrackedFileInIgnoredDirectory: the same invariant as the second test, exercised through Worktree.Status(). Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: add t.Parallel() to ignore-walk tests The paralleltest linter is enabled in .golangci.yaml and existing tests in utils/merkletrie/filesystem follow the t.Parallel() convention. The three tests added in 96958981 were missing the call, breaking validate-lint on CI. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * utils: merkletrie/filesystem, ignore IgnoreMatcher when Index is nil shouldSkipIgnored documents that IgnoreMatcher is a no-op when no Index is provided, since without an index there is no way to identify tracked entries. The implementation did not enforce that contract: idxMap nil caused ignored files to be skipped, and trackedDirs nil (a nil-map lookup returning false) caused ignored directories to be skipped too. Add an explicit early return when idxMap is nil, drop the now-dead nil check on the file path, tighten the doc comment on Options.IgnoreMatcher, and cover the contract with a regression test. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: drop redundant excludeIgnoredChanges post-walk filter The filesystem walker now skips ignored, untracked entries during the Status walk, so they never enter the merkletrie. The post-walk filter that removed them again can never observe such an entry, and tracked entries are walked in both implementations, so excludeIgnoredChanges agrees with the new walker on every input. Drop the function and its only caller in diffStagingWithWorktree, and fold the early return into the DiffTree calls. collectIgnorePatterns is kept; it now only feeds the IgnoreMatcher. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * utils: merkletrie/filesystem, switch trackedDirs to a set trackedDirs is only ever consulted as a presence check; it never stored a meaningful bool value. Use map[string]struct{} to make the intent explicit and avoid the per-entry byte. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: fold ignored-dir status bench into worktree_status_bench_test.go Keep all Status() benchmarks in a single file. Also refresh the docstrings: they described the pre-PR behavior where ignored files were walked and hashed, which no longer matches reality now that the filesystem walker skips ignored directories. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * utils: merkletrie/filesystem, use gitignore.Matcher in Options directly Drop the local IgnoreMatcher interface and reference gitignore.Matcher in Options instead. The interface served only to keep this package free of the gitignore import, but reviewers preferred the simpler direct reference: every real caller plugs in a gitignore.Matcher anyway, so the abstraction was paying for a layering boundary nobody was using. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * utils: merkletrie/filesystem, walk submodules inside ignored directories shouldSkipIgnored consulted trackedDirs in the directory branch, but trackedDirs only records *parent* paths of index entries. A submodule's own path is the index entry, so its directory was pruned by the walker when its parent matched an ignore rule, and modifications to the submodule were silently dropped from Status(). Check idxMap before falling back to trackedDirs: an entry whose own path is in the index is tracked, whether it is a regular file or a directory-shaped entry like a submodule. Add a regression test that places a submodule under an ignored directory. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * utils: merkletrie/filesystem, expand ignore-walk test coverage Add three regression tests that exercise edge cases of the ignore-walk optimization, modeled on equivalent scenarios in upstream git's t/t7061-wtstatus-ignore.sh and t/t7521-ignored-mode.sh: - TestDeeplyNestedTrackedFileInIgnoredDir: a tracked file three directories deep under an ignored top-level dir, exercising the parent-chain population of trackedDirs. - TestFilePatternIgnoreSkipsUntrackedSiblings: a *.log file pattern rather than a directory pattern, ensuring the file branch of shouldSkipIgnored handles per-file ignore rules. - TestEmptyIgnoredDirIsSkipped: an ignored directory with no entries on disk is still pruned without surfacing as a child. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> --------- Signed-off-by: Stefan Haubold <[email protected]> * build: Update module github.com/pjbgf/sha1cd to v0.6.0 * build: Deprecate ChrootOS Signed-off-by: Paulo Gomes <[email protected]> * storage: Ensure alternates resolve BoundOS path Signed-off-by: Paulo Gomes <[email protected]> * plumbing: Update util.TempFile usage Signed-off-by: Paulo Gomes <[email protected]> * *: Use test TempDir() instead of util.TempDir() Signed-off-by: Paulo Gomes <[email protected]> * plumbing: transport/http, Add test for multi-round Fetch Signed-off-by: Paulo Gomes <[email protected]> * plumbing: transport, Align flush size with upstream git Previously go-git flush size was hard-coded to 32, which could be rather inefficient for very large negotiations. This change aligns the behaviour with upstream, growing exponentially that number when it is safe to do so, as per next_flush() (fetch-pack.c:277-291). The initial flush size is now 16 - which is also aligned with upstream. ref: https://github.com/git/git/blob/9f223ef1c026d91c7ac68cc0211bde255dda6199/fetch-pack.c Signed-off-by: Paulo Gomes <[email protected]> * plumbing: transport, align ACK handling and negotiation with upstream git Add applyServerACKs to properly handle multi-ack-detailed responses: - Reset in_vain on ACK_continue and ACK_ready (fetch-pack.c:550-552) - Re-queue ACK_common haves for stateless RPC rounds (fetch-pack.c:533-542) - Suppress duplicate ACK_common to avoid re-queuing already-known objects - Track ACK_ready and stop negotiation early (fetch-pack.c:555-556, 569-570) - Send a terminal done round after ACK_ready (fetch-pack.c:577-579) - Clamp batch size to remaining in-vein budget after got_continue Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: transport/http, Close storage Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Add support for gpgsig-sha256 header Signed-off-by: Paulo Gomes <[email protected]> * plumbing: objects, Align EncodeWithoutSignature with upstream Git Signed-off-by: Paulo Gomes <[email protected]> * plumbing: objects, Ignore standard headers in ExtraHeaders Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Decode commit headers via a state machine Replace the per-iteration flag soup in (*Commit).Decode with a func-based stateFn loop that mirrors upstream Git's parsing posture. The new state machine encodes "where we are in the buffer" as the function being called rather than as a bag of mutable booleans. Conformance changes vs upstream commit.c: - tree must be the first header, missing tree errors with ErrMalformedCommit (matches the "bogus commit object" check at commit.c:508-510). - Parents are accepted only contiguously after tree; out-of-position parent lines are silently dropped (matches parse_commit_buffer's parent loop exiting at the first non-parent line and read_commit_extra_header_lines filtering parents from extras). - Author and committer are accepted only at their canonical positions (immediately after parents, immediately after author). Out-of-place occurrences are dropped, mirroring parse_commit_date silently returning 0 plus the standard_header_field filter. - Encoding/mergetag/gpgsig/gpgsig-sha256 follow first-wins; duplicate sig/mergetag continuation lines are discarded by a dedicated scanSkipCont state. - Continuation strip uses line[1:] (single space) instead of bytes.TrimLeft, mirroring upstream's `line + 1` at commit.c:1509. - Header/body boundary is the literal "\n" only, matching *line == '\n' at commit.c:1502. Tests: - TestDecodeRequiresTreeFirst covers the four hard-error inputs (missing tree, parent before tree, extra before tree, empty buffer). - TestDecodeFirstOccurrenceWins covers the lenient drop semantics for duplicate tree/author/committer/gpgsig/gpgsig-sha256, parents after author or interleaved with extras, missing committer, encoding between author and committer (drops the misplaced committer), and author at a non-canonical position (drops author and committer). - TestMalformedHeader is back to expecting NoError (lenient parsing of garbage author/committer values). The state machine lives in plumbing/object/commit_scanner.go. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: objects, Remove Commit.MergeTag in favour of ExtraHeaders The previous MergeTag field had two issues: 1. It could only represent a single mergetag, whilst upstream support multiple within a single Commit. 2. Did not allow for a byte-to-byte alignment with upstream, due to the above, and its position set by go-git during encoding time. In order to mitigate both issues, it will now be handled as part of ExtraHeaders, which better aligns with upstream. Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Add support for Tag.SignatureSHA256 Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Align Tag.EncodeWithoutSignature with Commit Tag verification re-encoded from struct fields, so any non-canonical bytes the signature was computed over (duplicate headers, mutation-only fields like Signature/SignatureSHA256, etc.) were lost from the payload and signatures that were valid in upstream Git failed in go-git. Mirror the approach already used by Commit: retain a reference to the source plumbing.EncodedObject on Decode, add matchesSource() so EncodeWithoutSignature can stream the raw bytes (with the inline trailing PGP block truncated and gpgsig/gpgsig-sha256 headers stripped) when the struct still matches the source. Mutating struct fields still falls back to the struct-encoded payload. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Reject multi-signature commits at Verify Align Commit.Verify with upstream's parse_buffer_signed_by_header (commit.c:1186) and parse_gpg_output rejection (gpg-interface.c:257-269). The commit scanner now concatenates every gpgsig/gpgsig-sha256 header into a single signature buffer instead of first-wins, and Commit.Verify returns the new ErrMultipleSignatures when that buffer carries more than one armored block. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Decode Tag headers via a state machine Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Add ErrMalformedTag Signed-off-by: Paulo Gomes <[email protected]> * tests: Add git conformance tests for signing verification Signed-off-by: Paulo Gomes <[email protected]> * tests: Skip double checks in Git v2.11 Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Align Tree handling with upstream Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, Reset object before decode Signed-off-by: Paulo Gomes <[email protected]> * internal: git, add server timeout test small tolerance This adds a small tolerance to workaround OS related time precision jitter. Without this, time-based tests might fail with really small deltas like 199.3ms < 200ms. Signed-off-by: Ayman Bagabas <[email protected]> * plumbing: transport/ssh, Shell-quote path and args The SSH transport interpolates `req.URL.Path` and each entry of `req.Args` into the remote-exec command line. The previous formatter wrapped each value in literal single quotes via `'%s'`, which diverges from canonical Git's wire format for any value containing characters that the shell treats specially. Port `sq_quote_buf`[1] from canonical Git into a `writeShellQuote` helper that mirrors upstream's `struct strbuf *dst` signature, and route both the path and each `req.Args` entry through it. The helper escapes the same character set that upstream Git escapes (`'` and `!`), so the bytes go-git puts on the wire match what `git` itself would send for the same input. Benign inputs are byte-identical to the previous output; values containing `'` or `!` now round-trip correctly through any POSIX shell and through `git-shell`'s `sq_dequote_to_argv`. Add a table-driven `TestBuildCommand` covering the plain case, both characters that `sq_quote_buf` escapes, mixed inputs, and inert metacharacters that pass through unchanged. [1]: https://github.com/git/git/blob/v2.54.0/quote.c#L28 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * git: submodule, canonical remote for relative URLs A relative submodule URL like `../sibling` has to be joined onto a base URL — the superproject's remote — to produce the actual URL to clone. Submodule.Repository() picked `Repository.Remotes()[0]` as the base. That slice is built by iterating a Go map, so its order is randomized: in any superproject with more than one remote (the common `origin` + `upstream` fork setup), the chosen base flipped between runs and the same `git submodule update` could clone the sibling from either namespace. Canonical Git is deterministic[1][2]: it uses `branch.<HEAD>.remote` if configured, the sole remote when there is exactly one, otherwise the literal `"origin"`. The submodule helper[3] then joins the relative URL onto that remote's URL. Mirror that rule in an unexported `defaultRemote` and call it from the relative-URL branch. Tests cover the rule directly and the end-to-end submodule flow; the integration test loops to catch any future regression that re-introduces map-order dependence. [1]: https://github.com/git/git/blob/v2.54.0/remote.c#L1801-L1809 [2]: https://github.com/git/git/blob/v2.54.0/remote.c#L655-L669 [3]: https://github.com/git/git/blob/v2.54.0/builtin/submodule--helper.c#L53-L75 Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Hidde Beydals <[email protected]> * storage: filesystem, Use repository object format for loose objects Pass the configured object format into the objfile writer so SHA-256 repositories create loose objects at their SHA-256 object IDs instead of writing SHA-1-addressed files. Add SHA-256 writer and filesystem round-trip coverage. Assisted-by: OpenAI Codex <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: Add commit and ref creation SHA-256 tests Signed-off-by: Paulo Gomes <[email protected]> * git: submodule, detect absolute paths on Windows When a submodule URL parses to a `file://` URL, Submodule.Repository() checks whether the URL's `.Path` is absolute and only joins it onto a parent remote if it isn't. The check used `path.IsAbs`, which only recognizes POSIX-rooted paths (`/foo`). On Windows a submodule URL can legitimately be a drive-letter path (`C:\path\to\repo`) or a UNC share (`\\server\share\foo`); those are absolute under `filepath.IsAbs` but not `path.IsAbs`, so the code was treating them as relative and joining them onto the superproject's remote URL, producing nonsense like `file:///parent/origin/C:\path\to\repo`. Add the platform-aware `filepath.IsAbs` check alongside the URL-style `path.IsAbs`. POSIX-rooted paths still work because `path.IsAbs` catches them; native Windows absolute paths now skip the relative-URL branch as they should. The new test runs only on Windows (`//go:build windows`), where the divergence is observable — on Unix both predicates agree on every case. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * internal: url, disambiguate SCP from local paths `MatchesScpLike` accepted any string that fit the regex `^[user@]host:[port:]path`, but the regex's host class permits `/` and single ASCII letters, so it matched strings that are actually local filesystem paths: - `./relative:path` and `/abs/with:colon/file` matched with host `./relative` or `/abs/with`. - On Windows, `C:/path/to/repo` matched with host `C` — a Windows drive-letter path treated as SSH to a single-character hostname. `transport.ParseURL` then returned an `ssh://` URL where canonical Git would return a local path. The Windows drive-prefix case is the practical impact: a submodule URL of `C:/path/to/repo` in .gitmodules is a valid Windows-absolute path, and go-git was attempting an SSH connection to a host named `C` instead of opening the local repository. Mirror canonical Git's `url_is_local_not_ssh`[1]: after the regex matches, reject the SCP classification when (a) the URL has a `/` before the first `:`, or (b) on Windows, the URL begins with a DOS drive prefix. The drive-prefix check is a small byte-wise helper rather than a regex, mirroring Git's `win32_has_dos_drive_prefix`[2] and matching the canonical implementation more closely. Add positive tests for the rejected forms (portable for (a), gated by `runtime.GOOS == "windows"` for (b)) and a regression test verifying that real SCP forms still match. [1]: https://github.com/git/git/blob/v2.54.0/connect.c#L710-L716 [2]: https://github.com/git/git/blob/v2.54.0/compat/win32/path-utils.c#L20-L29 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * git: submodule, error on remote without URLs `defaultRemote`, introduced in 0d02124f, returns a `*RemoteConfig` that `submodule.Repository()` then dereferences as `base.URLs[0]` to join a relative submodule URL. `RemoteConfig.Validate()` rejects an empty `URLs` slice on `SetConfig`, but a config that bypasses validation — for example a hand-edited `[remote "origin"]` section with no `url =` entry loaded from disk — could still produce a remote with no URLs and panic the caller with `index out of range [0] with length 0`. Promote the invariant into `lookupRemote` and route the single-remote branch through it as well, so any code path that asks `defaultRemote` for the resolution base gets a usable remote or a clear `remote %q has no configured URL` error. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: objfile, Pass object format into Reader Thread the repository's object format through objfile.NewReader so SHA-256 loose objects are hashed with SHA-256 instead of being hard-coded to SHA-1. Update callers in dotgit, filesystem object storage, and the dumb HTTP transport, and extend reader/writer tests with SHA-256 round-trip coverage. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: submodule, Add relative URL regression tests Mirror the regression-test coverage from v5 PR #2070 (54a32347) to v6. The v5 commit fixed `Submodule.Repository()` to resolve relative submodule URLs such as `../X.git` against the parent repository's remote URL per git-submodule(1)[1] and canonical Git, rather than absolutizing them against the process CWD. The v5 root cause — `transport.NewEndpoint` running scheme-less inputs through `parseFile`, which called `filepath.Abs` — does not exist on v6: `internal/url.ParseFile` stores the input verbatim. The relative-resolution branch in `Submodule.Repository()` therefore already fires correctly, and PR #2071 (0d02124f) plus PR #2076 (16558108) further hardened remote selection via `defaultRemote` and `lookupRemote`. So the production code on v6 already has the fix; this commit only backfills the matching test coverage so a future refactor that reintroduces eager normalization in `ParseFile` fails loudly instead of silently breaking submodule resolution. Five table-driven cases cover relative URLs against HTTPS and SSH parents, deep traversal (`../../`), absolute local URL preservation, and the no-parent-remote case. The last asserts the `defaultRemote`-wrapped error `resolving relative submodule URL: remote "origin" not found`, matching v5's `TestRepositoryRelativeURLNoParentRemote` after the same upstream remote-selection refactor landed there. [1]: https://git-scm.com/docs/git-submodule Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: format/idxfile, Validate offset64 indices When a 32-bit offset entry has its MSB set, the lower 31 bits index into the `Offset64` table. Neither `MemoryIndex.getOffset` nor `LazyIndex.offset` validated that the resulting position fell inside the decoded table; an idx with an MSB-set entry whose lower bits exceed the number of decoded 64-bit entries would index past the end of the table. Bounds-check both lookups against the decoded table size and return `ErrMalformedIdxFile` when the index is out of range. `MemoryIndex`'s unexported `getOffset` now returns `(uint64, error)`; the change propagates through `FindOffset`, `genOffsetHash`, and the `Entries` iterator. `LazyIndex` caches the 64-bit-entry count discovered during `init` and uses it for the same check before issuing the `ReadAt`. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * storage: filesystem, Search alternates in HashesWithPrefix EncodedObject, HasEncodedObject and EncodedObjectSize all fall back to alternate object stores, but HashesWithPrefix only consults local loose objects and packs. With alternates configured, ResolveRevision on an abbreviated hash that lives in the alternate returns ErrReferenceNotFound even though the full hash resolves. Iterate alternates after local lookup, deduplicating against results already collected. Also mark pack hits as seen so a hash present in multiple packs or in both a pack and an alternate is returned once. Signed-off-by: Andrew Nesbitt <[email protected]> * plumbing: format/idxfile, Add `MemoryIndex` fuzzer `FuzzLazyIndex` covers one of the two parallel idx implementations. The companion `MemoryIndex` (decoder + post-decode `Index` methods + both iterators) had no fuzz coverage, which is why the offset64 bounds gap fixed in e429f8e2 was not surfaced earlier. `FuzzMemoryIndex` mirrors the structure of `FuzzLazyIndex`: seed the corpus with structurally valid prefixes (using `buildMinimalIdx`), the empty input, and a malformed-but-decodable idx from `buildOOBOffset64Idx`; the fuzz function then runs `Decode` and exercises every `Index` method plus both iterators on inputs that decode successfully. `buildOOBOffset64Idx` lives in `fuzz_helpers.go` (alongside `buildMinimalIdx`) rather than `_test.go`, because the OSS-Fuzz harness extracts fuzz targets into a non-test build context where sibling `_test.go` symbols are not resolved. A 30s local run produced ~2.25M executions and 99 corpus inputs without failure on top of the validated tree. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: format/idxfile, Reformulate offset64 check The previous bounds check `offset+8 > uint64(len(idx.Offset64))` was correct because masking and the `uint32` source type bound `offset` to ~2^34, leaving uint64 headroom. That safety relies on invariants in surrounding code (mask shape and source type) that aren't visible at the comparison itself. Reformulate as `len < 8 || offset > len-8`, which holds without relying on the upstream bound. Behaviour is unchanged. `LazyIndex.offset` already uses an index-vs-count comparison (`loIndex >= s.count64`) with no addition, so it needs no analogous change. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * storage: filesystem, Close entry iterator on error in HashesWithPrefix The pack entry iterator was only closed after a full iteration; an error from Next() returned early without closing it. Signed-off-by: Andrew Nesbitt <[email protected]> * git: Add path-validating filesystem wrapper Introduce worktreeFilesystem, a wrapper around billy.Filesystem that calls validPath on every mutating operation (Create, OpenFile, Remove, Rename, Symlink, MkdirAll). This ensures dangerous paths like .git/*, ../, and git~1/ are rejected regardless of which code path writes to the worktree. The Worktree.Filesystem field is replaced with a Filesystem() method that returns the underlying billy.Filesystem, preventing external code from bypassing the wrapper by reassigning the field. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: Gate NTFS path checks on core.protectNTFS config Replace the runtime.GOOS == "windows" guard in path validation with the core.protectNTFS configuration option, matching upstream Git behaviour. When not explicitly set, defaults to true on Windows. This allows non-Windows systems to opt in to NTFS protection, and Windows systems to opt out when not needed. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: Add core.protectHFS config for HFS+ path validation Add support for core.protectHFS, which detects .git paths obfuscated with Unicode zero-width and directional characters that HFS+ silently strips during path normalization. When enabled, paths like .g\u200cit are rejected. Defaults to true on macOS, matching upstream Git behaviour. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: Validate .git and dot components at every path level Extend path validation to check every path component, not just the first. Single-dot components are also rejected. This matches upstream Git's verify_path_internal which validates at every directory separator boundary. A non-first final .git component (e.g. "submodule/.git") is permitted because submodule worktrees contain a .git pointer file. The per-call-site validChange checks are removed since the worktreeFilesystem wrapper enforces validation on all mutating operations. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * git: Check for Windows reserved names Signed-off-by: Paulo Gomes <[email protected]> * git: Split ntfs and hfs Signed-off-by: Paulo Gomes <[email protected]> * git: Expand worktreeFilesystem validation Signed-off-by: Paulo Gomes <[email protected]> * git: Refer to internal filesystem field Signed-off-by: Paulo Gomes <[email protected]> * build: Skip tests for older git versions Signed-off-by: Paulo Gomes <[email protected]> * git: Allow for submodules with .git in Windows Signed-off-by: Paulo Gomes <[email protected]> * plumbing: idxfile, widen sharedFile reset test timing TestSharedFileGracePeriodResetByAcquire flaked repeatedly on the Windows runners with `Should be false ... new grace period hasn't expired`. The test set `sf.gracePeriod = 80ms`, ran an acquire/ release pair, slept 50ms and triggered a reset acquire, slept another 50ms, then asserted the file was still open. That left only 30ms of margin on the second assertion — well under the ~15ms `time.Sleep` granularity and scheduling jitter that GitHub-hosted Windows runners exhibit under load, so the close timer fired before the assertion ran. Widen the grace period to 200ms, leaving ~150ms of headroom on the negative assertion, and replace the fixed-sleep `assert.True` for the final "file should be closed" check with `assert.Eventually`, so we no longer race timer scheduling jitter on either side. The test still covers the reset-on-acquire semantics introduced by fc7566a3 [1]; only the timing knobs change. Recent flake on `main`: 25490308205 [2]. [1]: https://github.com/go-git/go-git/commit/fc7566a3f53410298c77fa23635889c81e3dd3b5 [2]: https://github.com/go-git/go-git/actions/runs/25490308205 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * internal: server/git, widen Timeout test tolerance TestGitServer_Timeout flaked on the Linux runners with messages like `"193.372166ms" is not greater than or equal to "195ms"` — the server closed ~7ms before the assertion's lower bound. The test measures elapsed from the client's perspective: start := time.Now() _, err = conn.Read(buf) elapsed := time.Since(start) but the server arms its idle deadline during accept and the first server-side `Read`, both of which run before the client calls `Read`. Client-measured elapsed therefore systematically underestimates the real server-side timeout by a small amount, and the previous 5ms tolerance was simply tighter than realistic scheduling jitter on shared GitHub-hosted runners. Widen the tolerance to 30ms (~15% of the 200ms timer). This still catches a server that closes substantially earlier than configured (a real bug), but absorbs the inherent client/server clock skew plus normal CI jitter. Recent flakes on `main`: 25443754481 [1], 25440100884 [2]. [1]: https://github.com/go-git/go-git/actions/runs/25443754481 [2]: https://github.com/go-git/go-git/actions/runs/25440100884 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * *: close fetch/push session and test storages On Windows the test runner repeatedly hit testing.go:1464: TempDir RemoveAll cleanup: unlinkat …\objects\pack\pack-XXXX.idx: The process cannot access the file because it is being used by another process. across `TestFetchWithFilters` (run 25132475242 [1]), `TestSmartInfoRefsObjectFormat` (25498994424 [2]), and several upload-pack server suite tests. The cause traces back to the sharedFile-managed `.idx` descriptors introduced in 13daf62f: they are kept open for a one-second grace period after the last reader releases, which races `t.TempDir`'s synchronous `RemoveAll` on test return. Two distinct shapes leaked storages. First, `Remote.fetch` and `Remote.PushContext` opened a transport session via `Handshake` but only closed it on the success path, so every early return — refspec validation, `GetRemoteRefs` failure, `ErrFilterNotSupported` — leaked the session, and with it the file transport's storage and its `.idx` readers. Mirror `Remote.list`'s pattern by adding `defer ioutil.CheckClose(sess, &err)` immediately after `Handshake`, and drop the now-redundant explicit close in `fetch`'s happy path. The push path showed the same leak shape without an observed flake; it is fixed alongside `fetch` since it is the same one-line change. Second, `tagLoader`/`fixturesLoader` in `backend/http_test.go` and the `testServe` helper in `plumbing/transport` build a `filesystem.Storage` per request but never close it. Register a `t.Cleanup` so the storage closes at test end, cancelling the grace timer and releasing descriptors synchronously. `TestUploadPackStatefulMultiRoundSendsFinalACK` bypasses `testServe` and gets its own cleanup. [1]: https://github.com/go-git/go-git/actions/runs/25132475242 [2]: https://github.com/go-git/go-git/actions/runs/25498994424 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * *: Reject malformed variable-length integers Variable-length integer decoders accepted continuation chains long enough to push the running shift past the bit-width of their accumulator, producing out-of-range values that propagated into buffer allocation hints and on-disk size fields. Mirror the bound from canonical Git's `unpack_object_header_buffer`[1]: reject any continuation byte that would shift past the type's capacity. The same rule applies to `VariableLengthSize`, `DecodeLEB128`, `DecodeLEB128FromReader` and `ReadVariableWidthInt`; `DecodeLEB128` now returns an error so callers surface invalid input rather than silently zeroing the result. Add defense-in-depth at every call site that allocates from a parsed length: cap `bytes.Buffer.Grow` hints in `Parser.parentReader` and `patchDeltaWriter` at 1 GiB / `maxPatchPreemptionSize`, cap `parserCache.Reset` at a 4 Mi-entry hint so a hostile `ObjectsQty` cannot request a multi-gigabyte slice, validate `OFS-delta` back-references in the streaming and mmap paths (must be `> 0` and `<= o.offset`), and reject `mmap.fsobject` sizes whose top bit is set rather than truncating into a negative `int64`. Cover the new behavior with unit, parser-level regression and fuzz tests (`FuzzVariableLengthSize`, `FuzzDecodeLEB128`, `FuzzParser`). [1]: https://github.com/git/git/blob/v2.54.0/packfile.c#L1135-L1158 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * backend: use embedded TB.Cleanup directly `fixturesLoader` and `tagLoader` embed `testing.TB`, so the explicit `f.TB.Cleanup` / `l.TB.Cleanup` selectors trip `staticcheck`'s `QF1008` (redundant embedded-field selector). Drop the explicit `.TB`. No behaviour change. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: format/packfile, Test patchDeltaWriter cap `patchDeltaWriter`'s preemptive `Buffer.Grow` is already clamped at `maxPatchPreemptionSize` since `5b89cb5a`, but no test covered that path end-to-end. Add a regression test that constructs a delta whose header advertises a `math.MaxInt64` `targetSz` and asserts the call returns an error rather than driving the allocation. Also restate the rationale for the clamp in the comment, naming both the input-validation goal and the symmetry with `patchDelta`. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: format/packfile, Tighten delta validation Each delta operation's declared size is validated against the size still remaining in the target buffer, not the total target size. The previous check against the total accepted a sequence of operations whose cumulative output would exceed the declared target, leaving the unsigned remaining counter unable to reach zero and the loop free to keep writing past the declared target until the delta payload was consumed. Three sibling code paths shared the flaw and are updated together: `patchDelta` (used by `PatchDelta` and `ApplyDelta`), `ReaderFromDelta`, and `patchDeltaWriter` (used by the packfile parser). The loop structure is rewritten as `for remainingTargetSz > 0` so the invariant is explicit, and a post-loop check rejects deltas that leave bytes unconsumed -- matching the `data != top` sanity check in `patch-delta.c`[1]. Empty-target deltas (header-only, no operations) are now accepted, also matching upstream. Several incidental defects on the same call paths are fixed: `patchDelta`'s copy-from-src failure used `break`, which only exited the switch and let the loop keep consuming delta bytes; `patchDeltaWriter` returned `(0, ZeroHash, nil)` on invalid input, silently reporting success; and `ReaderFromDelta`'s copy-from-src failure closed the writer without an error, silently truncating the consumer stream. Each now propagates `ErrInvalidDelta` (or `ErrDeltaCmd`) consistently. `packfile.getMemoryObject` was re-inflating delta payloads into the buffer the scanner had already populated, appending a duplicate copy that previously went unnoticed because the loop silently ignored anything past the target. The fix matches the cached-content pattern already used in `Scanner.WriteObject`. [1]: https://github.com/git/git/blob/c2c5f6b1e479f2c38e0e01345350620944e3527f/patch-delta.c Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * config: optbool, accept Git-style boolean syntax Add a `parseConfigBool` helper that mirrors upstream Git's `git_parse_maybe_bool`: it accepts `true`/`yes`/`on`/`1` and `false`/`no`/`off`/`0` case-insensitively, and returns `OptBoolUnset` for empty or unrecognised values so the caller's platform default stays in place. `unmarshalCore` swaps `strconv.ParseBool` for `parseConfigBool` when reading `core.protectNTFS` and `core.protectHFS`. The previous `strconv.ParseBool` path silently misinterpreted user-friendly syntax: writing `protectNTFS = on` made `strconv.ParseBool` return an error, which left the field at its zero value (`OptBoolUnset`), so on Windows the platform default applied — but a user who wrote `protectNTFS = on` to deliberately enable the protection on Linux would have got the platform default (`false`) instead of the explicit `true` they intended. With the tolerant parser, all of `true` / `yes` / `on` / `1` (and their false counterparts) take effect where the user expects them to. Other booleans in this package keep the loose `== "true"` pattern; aligning them is out of scope. Only the security toggles are upgraded here, where silent misinterpretation has the highest cost. Reference: upstream Git `git_parse_maybe_bool_text` at `parse.c` L157-L173 and `git_parse_maybe_bool` at `parse.c` L174-L182 in tag `v2.54.0`[1]. `git_parse_maybe_bool` is the closer match, since it also accepts integer values via `git_parse_int`. [1]: https://github.com/git/git/blob/v2.54.0/parse.c#L157-L182 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * internal: pathutil, port Git path-validity rules Lift platform-specific dotgit-variant detection out of package `git` into a self-contained package so it can be reached from any caller — `config` (#2079's submodule-name validation), `storage/filesystem/dotgit` (the `Module(name)` containment check), and the tree-side gates added in a follow-up commit — without those callers depending on the root `git` package. The package collects three layers: - HFS+ side: the ignored-codepoint table plus `IsHFSDot` / `IsHFSDotGit` / `IsHFSDotGitmodules` family (zero-width / case-folding aware). Implementations are unchanged from the previous `git`-package versions; only the package boundary moves. - NTFS / Windows side: `IsNTFSDot` (a port of canonical Git's `is_ntfs_dot_generic`), `IsNTFSDotGitmodules`, the `WindowsValidPath` predicate, and the reserved-name table (`CON`, `NUL`, `AUX`, …). The `dotGit = ".git"` constant is declared locally to avoid coupling pathutil to `git.GitDirName`. - Cross-platform helper: `IsDotGitName` matches `.git` and its 8.3 NTFS short alias `git~1` case-insensitively. On top of these primitives sits `ValidTreePath`, the strict validator applied at the boundary where attacker-controlled tree data leaves the trusted store. Where the wrapper-layer `validPath` in package `git` is intentionally tolerant of final-position `.git` (legitimate `submodule/.git` flows in submodule cleanup) and only consults HFS+/NTFS variants when the corresponding `core.protect*` flag is on, `ValidTreePath` is always-strict regardless of runtime config: tree paths are canonical UTF-8 with no zero-width characters or 8.3 short- name forms, so an entry that looks like one is suspicious anywhere. It rejects control characters, empty / `.` / `..` components, Windows volume-name prefixes, `.git` and its HFS+/NTFS variants at every position, and reserved device names. Mirrors upstream Git's `verify_path_internal` at `read-cache.c` L987-L1048 in tag `v2.54.0`[1], stripped of its runtime `protect_hfs` / `protect_ntfs` gating because the pathutil layer is consulted from the strict tree boundary, not from application paths. [1]: https://github.com/git/git/blob/v2.54.0/read-cache.c#L987-L1048 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * *: gate path materialisation at trust boundaries Apply `pathutil.ValidTreePath` at the chokepoints where tree data crosses out of the trusted object store and where application-supplied paths cross into the index. Layered on top of the existing tolerant wrapper `validPath` in package `git`, this gives the worktree two layers of protection: strict validation at the boundary, tolerant validation at the filesystem edge for legitimate flows (`submodule/.git` Stat / Remove during submodule cleanup). Read-side chokepoints in `plumbing/object`: - `(*Tree).FindEntry` — most callers funnel through here: `(*Tree).File`, `(*Tree).Tree`, `(*Tree).Size`, and the `checkoutChange` Modify/Insert branch. A dangerous tree- derived path is refused at the lookup boundary before anything materialises. - `TreeWalker.Next` — drives `transformChildren` (which feeds `merkletrie.DiffTree`), `FileIter`, and the archive writers in `internal/archive`. Each leaf entry name is validated as it surfaces; a malformed entry stops the walk with the validator's error rather than skipping silently. Inspection-only callers that need raw access can still read `Tree.Entries` directly. - `(*Tree).TreeEntryFile` — boundary where a `*File` whose Name a caller can hand to filesystem ops leaves the store. Write-side chokepoint in `plumbing/format/index`: - `Index.Add(path)` now validates and returns `(*Entry, error)`. Every Add and Move flow that records a path in the index funnels through here, mirroring upstream Git's `verify_path_internal` invocation from `make_cache_entry` on the index-addition side. `Index.Add`'s signature change is a v6 API break. Application-side gates in package `git`: - The wrapper-level `validPath` continues to gate filesystem writes; HFS+/NTFS-aware rejection of `.gitmodules` symlink targets is now driven by the same `pathutil` predicates so the wrapper and the strict validator stay aligned. The control-character loop is byte-oriented for upstream parity with `verify_path_internal`. - `Submodule.Repository`'s `Chroot` validates the submodule's tree-stored Path before scoping the repository, refusing embedded `.git` / HFS+ / NTFS variants regardless of `core.protectHFS` / `core.protectNTFS`. - `Worktree.checkoutChange`'s Delete branch is reached only from `resetWorktree`'s filesystem-vs-index merkletrie diff (`resetWorktreeToTree`'s tree-derived deletes call `rmFileAndDirsIfEmpty` directly). The path source is the local worktree filesystem, where the tolerant wrapper is the right fit: cleanup of legitimately-tracked shapes like `submodule/.git` should not abort the whole reset. The root-level `hfs.go` / `ntfs.go` files held only the 3-line `defaultProtectHFS` / `defaultProtectNTFS` runtime- policy helpers after the `pathutil` extraction; they fold into `worktree_fs.go` next to the wrapper that consumes them. The wrapper-integration tests (`TestValidPath*`, `TestWorktreeFilesystem*` for HFS / NTFS variants) move next to the wrapper they exercise in `worktree_fs_test.go`. Tests pin every chokepoint against the catalogue of attacker shapes used elsewhere in this branch — final-position `.git` (`submodule/.git`), NTFS trailing-space / trailing-dot / ADS (`.git ` / `.git.` / `.git::$INDEX_ALLOCATION`), HFS+ zero- width `.git` (e.g. `.git`), Windows reserved `CON` / `NUL`, the 8.3 short alias `git~1`, dot-dot traversal, control chars, and the empty path. `Index.Add`'s `TestIndexAdd` is updated for the new signature, and a new `TestIndexAddRejectsDangerousPaths` pins the index gate. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * *: skip ignored directories during HardReset walk resetWorktreeToTree (HardReset, KeepReset) walks the working tree via diffStagingWithWorktree to find tracked files that are missing or stale on disk. Without the IgnoreMatcher optimization landed in #2048, that walk descends into every gitignored subtree before the loop body discards the resulting Delete actions, so every file under directories like node_modules is lstat'd for nothing. The infrastructure to avoid this already exists: passing excludeIgnoredChanges=true to diffStagingWithWorktree builds a gitignore.Matcher and threads it into filesystem.Options.IgnoreMatcher, where the noder's shouldSkipIgnored prunes ignored, untracked subtrees at enumeration time while keeping tracked entries (or directories with tracked descendants) visible. Flip the flag to true at the resetWorktreeToTree call site to engage it. Leave the resetWorktree (MergeReset) call site at false. resetIndex runs immediately before resetWorktree and removes paths from the index. A tracked path inside a gitignored directory that was just removed from the index would no longer be in idxMap, so the noder would prune it from the walk and the Delete action needed to remove it from disk would never be emitted. resetWorktreeToTree is safe because step 1 (the tree-to-tree diff) handles deletions, and step 2 explicitly skips any Delete action returned from the worktree walk. The observable result is unchanged for HardReset / KeepReset: step 2's loop already skips Delete actions explicitly to preserve untracked files (the fix from 0c7da9764). The matcher only avoids the cost of walking those entries. Benchmark on a tree with 100 tracked source files and N untracked files in a gitignored directory (Apple M5, -benchtime=10x): Untracked | before | after | speedup --------- | -------- | ------ | ------- 0 | 2.2 ms | 2.6 ms | - 1,000 | 7.7 ms | 3.8 ms | 2.0x 5,000 | 16.2 ms | 5.2 ms | 3.1x 20,000 | 51.8 ms | 4.6 ms | 11.3x The small overhead in the 0-untracked case is the same matcher build cost noted in #2048; the gain comes from the diff walk no longer descending into ignored subtrees. The numbers track BenchmarkStatusIgnoredDir closely because both operations share diffStagingWithWorktree. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: cover IgnoreMatcher behavior across Reset modes Three regression tests pinning the contract this branch relies on at the worktree-level (the noder-level invariants are already covered by node_ignore_test.go in #2048). - TestResetHardTrackedFileInIgnoredDir: a tracked file inside a gitignored directory, deleted on disk, is restored by HardReset. Exercises the trackedDirs path through Reset. - TestResetHardModifiedTrackedFileWithGitIgnore: complements the existing TestResetHardWithGitIgnore (which covers deletion) with the modification path: a tracked file whose own name is gitignored has its content reverted by HardReset. - TestMergeResetRemovesTrackedFileInIgnoredDir: pins the reason resetWorktree (MergeReset) must keep excludeIgnoredChanges=false. Engaging the matcher there would prune a tracked path that resetIndex just removed from the index, so the Delete needed to clean it off disk would never be emitted. Verified by flipping the flag locally — the test fails when the matcher is enabled and passes when it is not. Assisted-by: Claude Opus 4.7 Signed-off-by: Stefan Haubold <[email protected]> * *: align tree-path validation with upstream Git Restrict `ValidTreePath`'s NTFS gating to the disguised-`.git` family, dropping the always-on Windows reserved-name check that made go-git refuse trees upstream Git happily reads. In upstream's `verify_path_internal`[1], `is_ntfs_dotgit` runs under `protect_ntfs` (defaulting to 1 on every platform) but `is_valid_win32_path` is compile-time gated to Windows-native and Cygwin builds. Names such as `lib/con.go` are well-formed on non-Windows, so a go-git client on Linux must be able to read trees containing them. Lift `is_ntfs_dotgit` out as `IsNTFSDotGit` rather than keeping the disguise logic fused into `WindowsValidPath`. As a side-effect this closes a gap in the previous implementation: it only recognised the `.git` prefix, so `git~1 ` (trailing space), `git~1.`, and `git~1::ads` slipped past, even though upstream's `is_ntfs_dotgit`[2] also matches the `git~1` short-name prefix. `WindowsValidPath` now composes `IsNTFSDotGit` with the reserved-name table, retaining its existing wrapper-layer contract: bare `.git` and `git~1` are allowed, position-checked by callers. Defence in depth is preserved at the materialisation boundary: `worktreeFilesystem.validPath` still enforces both checks under `core.protectNTFS`, so reserved-name and disguise rejection remain in place when a path is about to hit disk on Windows. Tests in `plumbing/format/index`, `plumbing/object`, and the worktree strict-gate suite are updated to reflect the new contract: reserved names move out of the always-on tree gate, and `git~1` disguise variants are added alongside their `.git` counterparts. [1]: https://github.com/git/git/blob/v2.54.0/read-cache.c#L987-L1048 [2]: https://github.com/git/git/blob/v2.54.0/path.c#L1415-L1449 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * git: default core.protectNTFS to on everywhere Mirror upstream Git's `PROTECT_NTFS_DEFAULT`, which has been `1` unconditionally since 9102f958ee5 (CVE-2019-1353)[1]. Until now go-git gated the default on `runtime.GOOS == "windows"`, leaving Linux and macOS users without the wrapper-layer `is_ntfs_dotgit` and reserved-name checks unless they explicitly set `core.protectNTFS=true`. The motivating scenario is unchanged from upstream's: WSL mounts Windows drives under `/mnt/`, so a Linux process can reach an NTFS-backed worktree where the `.git` directory is also resolvable as `git~1` (or `.git ` / `.git::$DATA`). Gating the guard on the runtime OS skips that class of attack on the very system where it is reachable. Tree-side gates already catch disguised `.git` regardless of this default — `pathutil.ValidTreePath` is always-on per ce4cca18. This commit closes the parallel gap at the wrapper layer: `worktreeFilesystem.validPath` and `validSymlinkName` now enforce the NTFS rules on non-Windows by default, matching upstream's protect-by-default posture. `PROTECT_HFS_DEFAULT` is left untouched. Upstream chose not to flip that default in 9102f958ee5 (the cost in the cited benchmark was non-trivial and the WSL-equivalent scenario for HFS+ is not realistic), and `defaultProtectHFS` already mirrors that decision via its Darwin-only return. [1]: https://github.com/git/git/commit/9102f958ee5 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * config: validate submodule names `Submodule.Validate()` validated `Path` against `dotdotPath` but left `Name` unchecked, so a `.gitmodules` declaring `[submodule ".."]` passed validation cleanly. Downstream the same name reached `storage/filesystem/dotgit.DotGit.Module(name)` where it was joined into the storage path with no containment check, letting the submodule storer escape `.git/modules/` and write into the parent repo's `.git/`. Mirror canonical Git's `check_submodule_name` from `submodule-config.c` [1], rejecting empty names and any name with a `..` path component using both `/` and `\\` as separators (the canonical loop uses both unconditionally so the rule stays platform-consistent). On top of that, reject names that are bare `.`, contain a NUL byte, start or end with a separator, or carry a Windows drive prefix — go-git-specific edge cases the canonical loop does not exercise because it treats names as opaque C strings. The path-component check is also extended for filesystems whose canonicalisation can turn an innocent-looking name into `..`: HFS+ ignores a fixed set of Unicode code points (zero-width and directional marks) so `.<U+200C>.` resolves to `..` on macOS, and NTFS strips trailing spaces, periods, and an alternate-data-stream suffix so `.. `, `....`, and `..::$INDEX_ALLOCATION` all resolve to `..` on Windows. Both variants are detected by calling `pathutil.IsHFSDot` and `pathutil.IsNTFSDot` with `.` as the needle — they already port the relevant fragments of canonical Git's `next_hfs_char` and `is_ntfs_dotgit` — and run unconditionally on every submodule name since `.gitmodules` is attacker-controlled by definition. Wrap the new `ErrModuleBadName` sentinel (whose message echoes canonical Git's `ignoring suspicious submodule name` warning) with the offending name in `Validate()` so callers can both `errors.Is` the sentinel and read the bad name in the error string. [1]: https://github.com/git/git/blob/v2.54.0/submodule-config.c#L214-L237 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * config: skip submodules with invalid names Extend `unmarshalSubmodules`'s silent-skip to also drop entries whose `Name` fails `Submodule.Validate()`. The previous check only matched `ErrModuleBadPath`; with the new `ErrModuleBadName` validation landed in the preceding commit, a `.gitmodules` declaring `[submodule ".."]` would still be inserted into the map keyed by `..` and reach `Submodule.Repository()` callers downstream. Mirror canonical Git's "ignoring suspicious submodule name" semantics by treating bad-name entries the same way bad-path entries are already treated: drop them without failing the parse, so a malformed `.gitmodules` does not break unrelated config loading. Empty-path and empty-URL entries keep their existing lenient behaviour of being stored despite the validation error, preserving backward compatibility with `.gitmodules` files in the wild. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * storage: dotgit, refuse escaping submodule names `DotGit.Module(name)` joined the raw `name` into `modulePath = "modules"` and chrooted to the result without any containment check. A `Submodule` whose `Name` is `..` (or a sibling traversal like `foo/../..`) caused `Chroot("modules/..")` to resolve back to `.git/`, so once `Submodule.Repository()` opened that storer any subsequent `Storer.SetReference` / `SetEncodedObject` call wrote into the parent repo's `.git/`. The parser-layer fix in the preceding commit already rejects such names from `.gitmodules`, but `Module` is also reachable from any caller that constructs a `Submodule` struct programmatically — including the transactional storage wrapper — so a defence in depth is warranted. After joining, clean the result with `path.Clean(filepath.ToSlash)` and refuse the call unless the cleaned path is exactly `modules` or remains under `modules/`. This catches `..`-traversal escapes without rejecting nested but contained names like `lib/foo`, which canonical Git permits. Names whose joined form `path.Clean`s back into the modules/ subtree (for example `/etc`, which becomes `modules/etc`, or `modules/../escape`, which becomes `modules/escape`) are accepted here — the parser layer is responsible for the shape of the name. The storage layer's only job is to ensure no caller can write outside `modules/`. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * git: submodule, test name escape defence Cover the storage-layer defence: constructing a `Submodule` with `Name = ".."` programmatically (bypassing the .gitmodules parser) must not result in `Repository()` opening a storer rooted in the parent's `.git/` directory, and must leave the parent's HEAD reference untouched. Use `filesystem.NewStorage` rather than `memory.NewStorage` for the parent because only the filesystem-backed `ModuleStorer` exercises the dotgit defence; the in-memory implementation maps names to storage units directly without traversing a path. The test asserts the call returns an error wrapping `dotgit.ErrModuleNameEscape` and that the parent's HEAD target is unchanged after the failed call. Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * git: add repository and remote archive support Signed-off-by: Ayman Bagabas <[email protected]> * build: Update module github.com/go-git/go-git-fixtures/v6 to v6.0.0-20260502205956-66bffbe5a6ff Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com> * build: Update golang Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com> * build: Update go-git Signed-off-by: go-git-renovate[bot] <245267575+go-git-renovate[bot]@users.noreply.github.com> * plumbing: format/pktline, validate reader/writer inputs Guard the read and write entry points against nil io.Reader/io.Writer and undersized buffers so callers receive a typed error instead of a panic. Tighten Read so a stale prefix in a recycled buffer cannot be misread as an error-line, and refuse payloads larger than the caller's buffer. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: format/pktline, add fuzz tests Fuzz the Read, ReadLine, PeekLine, and Scanner entry points so the defensive validation in the previous commit stays panic-free under arbitrary byte sequences. Seed inputs cover flush/delim/response-end, the short-buffer cases, and the stale-prefix scenario. Drop the now-redundant test := test loop-variable shadowing in the new table tests; the module is on go 1.25 where each iteration has its own variable. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: format/pktline, drain payload on short buffer When Read is called with a buffer too small to hold the advertised packet, it had already consumed the 4-byte header via io.ReadFull but returned without draining the payload, leaving the stream positioned mid-packet. Subsequent pkt-line reads would misinterpret the leftover payload bytes as a fresh pkt-len and desync. Drain length-LenSize bytes before returning io.ErrUnexpectedEOF so the next read finds the following packet. Assisted-by: Claude Opus 4.7 <[email protected]> Signed-off-by: Paulo Gomes <[email protected]> * plumbing: object, validate names in Tree.Encode Tree.Encode previously rejected only embedded NUL bytes and mis-sorted entries, so a caller assembling a *Tree programmatically could produce a tree object containing components like ".git", "..", control characters, or HFS+/NTFS variants of ".git". Run each entry name through pathutil.ValidTreePath at the encoder boundary so the library cannot emit such bytes through its API. The same loop also rejects duplicate names, including the file and directory collision where `foo` and `foo/` resolve to the same sort key. With entries already in memory a single pass over `entry.Name` catches the non-consecutive case that upstream Git's `verify_ordered` in `fsck.c`[1] tracks with a name stack; the simpler form suffices because the file/dir variant shares the bare name. A test helper in worktree_commit_test.go that planted a tree containing ".git/config" for cherry-pick coverage can no longer do that through Tree.Encode and now writes the raw object bytes directly, matching the escape hatch documented in Tree.Encode's godoc. [1]: https://github.com/git/git/blob/v2.54.0/fsck.c#L544-L614 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: object, port fsck_tree as Tree.Validate go-git's tree decoder is permissive on purpose: tools that inspect or recover repositories need to read tree objects whose entries do not pass the index-side path-validation gate. Upstream Git keeps the same split — `parse_tree_buffer`[1] is a no-op on names while `fsck_tree`[2] is the validator — and exposes the latter through `git fsck`. Add the read-side counterpart to Tree.Encode's producer gate: a Tree.Validate method that mirrors `fsck_tree`'s structural ruleset. Decode stays as-is; callers that want fsck-shaped reporting call Validate. The ported rules cover null entry hashes, slashes in names, empty names, ".", "..", ".git" with HFS+/NTFS disguises (already handled by pathutil.ValidTreePath), bad file modes, duplicate entries, mis-sort, names longer than 4096 bytes, and the four FSCK_MSG_*_SYMLINK rules for ".gitmodules", ".gitattributes", ".gitignore", and ".mailmap". Two fsck_tree warnings — zero-padded modes and the non-canonical 0100664 bits — are not surfaced: both require retaining the raw octal mode string from the wire, which canonicalTreeMode drops. The structural rules are the load-bearing ones for refusing malformed trees. The new pathutil helpers IsHFSDotGitattributes / IsHFSDotGitignore / IsHFSDotMailmap and their NTFS counterparts extend the existing IsHFSDotGitmodules / IsNTFSDotGitmodules pair with the short-name prefixes upstream uses for the same names (`gi7d29`, `gi250a`, `maba30`). Violations are returned as an errors.Join of individual rule errors, each wrapping ErrInvalidTree, so callers can match either the umbrella or specific rules (ErrDuplicateEntry, ErrEntriesNotSorted, pathutil.ErrInvalidPath) with errors.Is. [1]: https://github.com/git/git/blob/v2.54.0/tree.c#L175-L184 [2]: https://github.com/git/git/blob/v2.54.0/fsck.c#L616-L800 Assisted-by: Claude Opus 4.7 Signed-off-by: Hidde Beydals <[email protected]> * plumbing: object, fold Tree.Encode through Tree.Validate Encode previously enforced a subset of Tree.Validate's rules inline (ValidTreePath, duplicate detection, sort order). With Validate available as the structural ruleset, Encode can run it once at the top and skip the duplicated checks — both shrinking the encoder and widening the producer-side gate to refuse the rules Validate already catches but Encode did not: null entry hashes, oversize names, and symlinks disguised as `.gitmodules` / `.gitattributes` / `.gitignore` / `.mailmap`. The widened gate exposes one pre-existing path that built trees containing entries with the zero hash: buildTreeHelper turned every index entry into a tree entry, including those carried through #1773 to avoid a panic during rec…
Summary
Worktree.Status()walks the working tree via the filesystem merkletrie noder and then drops gitignored entries inexcludeIgnoredChanges(worktree_status.go:175-177). Because the walker has no concept of.gitignore, every file under a large ignored directory likenode_modulesislstat'd and hashed before being filtered out. CLI git skips such directories at the directory level viatreat_directory/read_directory_recursivein dir.c, which is whygit statusis effectively constant-time in the size ofnode_modules.This PR plumbs an
IgnoreMatcherintofilesystem.Optionsso the walker can skip ignored entries during traversal, while preserving the "tracked entries are still walked" semantics that matchgit status.Design
filesystem.IgnoreMatcher(single method, satisfied bygitignore.Matcher— kept as an interface to avoid pullinggitignoreinto a leaf utility package).filesystem.Options.IgnoreMatcher.IndexandIgnoreMatcherare set, build atrackedDirsset of directory prefixes that contain index entries.calculateChildren, skip a child entry when:worktree_status.go, loadgitignore.ReadPatternsonce per Status call and reuse the slice for both the matcher and the post-walkexcludeIgnoredChangesfilter (it previously re-read patterns from disk).The post-walk filter is kept as a defense-in-depth backstop; with the new path it has no work to do but the cost is negligible.
Correctness vs. upstream git
Behavior preserved:
git status.git statusbehavior for files that were committed before an ignore rule was added (e.g.git add -fthen later ignored). The walker descends into directories that contain tracked entries even if the directory matches an ignore pattern.No change to:
diffCommitWithStaging(left side of Status) — only the worktree walk is affected.excludeIgnoredChangessemantics — still dropslen(ch.From) == 0entries that match.IgnoreMatcher(this is opt-in via Options).Benchmark
BenchmarkStatusIgnoredDir(added in this PR) timesStatus()over a tree with 100 tracked source files plus N untracked files inside a gitignoredvendor_ignored/directory. Apple M4 Max,-benchtime=10x:The "After" column is flat in the size of the ignored tree — the walker no longer descends. The ~0.4 ms overhead at N=0 is the unchanged
gitignore.ReadPatternswalk plus the smalltrackedDirsmap; it scales with the tracked tree, not the ignored one.Tests
Unit tests in
utils/merkletrie/filesystem/node_ignore_test.go:TestIgnoredDirIsSkipped— directory matching the matcher with no tracked content is not walked.TestTrackedFileInIgnoredDirReportsModify— a tracked file inside an ignored directory still surfaces asModifywhen its content changes.TestUntrackedSiblingsInIgnoredDirAreSkipped— when the walker descends into an ignored directory because of a tracked file, untracked siblings are still filtered.Integration test in
worktree_status_test.go:TestStatusReportsModifiedTrackedFileInIgnoredDirectory— same invariant as the second unit test, exercised end-to-end throughWorktree.Status().Existing test suites (
Status*,TestNoder*, gitignore) remain green.API surface
Additive only:
filesystem.IgnoreMatcher.filesystem.Options.IgnoreMatcher.gitignore.Matchersatisfies the new interface implicitly, so existing callers can opt in by setting the field.AI disclosure
Per
AI_POLICY.md, this contribution was assisted by Claude Opus 4.7 (commit trailers). The author has reviewed every line and is accountable for correctness; the design (interface,trackedDirsbuilding, the "tracked → still walked" invariant) was specified by the author and validated against upstream git'sdir.cbehavior.