Skip to content

git: Improve Status() speed with new index.ModTime check#1747

Merged
pjbgf merged 7 commits intogo-git:mainfrom
cedric-appdirect:improve-status-speed
Feb 22, 2026
Merged

git: Improve Status() speed with new index.ModTime check#1747
pjbgf merged 7 commits intogo-git:mainfrom
cedric-appdirect:improve-status-speed

Conversation

@cedric-appdirect
Copy link
Contributor

This should help #181. This mimick what git cli does by relying on file mtime and size to decide if we need to compute its hash. This does not address the untracked file case.

@hansbogert
Copy link

hansbogert commented Nov 24, 2025

For context, I functionally tried the same in: #1694

This PR surely is more production ready. However, in the time between making my PR and and now, I've looked at upstream git some more. There is are some edge cases that they catch. I think they are needed in the go-git implementation as well. A big one is documented here https://git-scm.com/docs/racy-git

@cedric-appdirect
Copy link
Contributor Author

For context, I functionally tried the same in: #1694

Oh, sorry, I missed it. I have just been hammering my local benchmark and going at everything that stick out.

This PR surely is more production ready. However, in the time between making my PR and and now, I've looked at upstream git some more. There is are some edge cases that they catch. I think they are needed in the go-git implementation as well. A big one is documented here https://git-scm.com/docs/racy-git

I was aware of this issue while reading the git code, but that documentation is a lot more understandable with a lot more context. I had assumed that Golang code in https://github.com/golang/go/blob/master/src/time/time.go#L316 was going to be enough. Seems I need to double check. I wonder how I can write a tests case for this would help ensure we don't have this problem. Thanks a lot for the link.

@cedric-appdirect
Copy link
Contributor Author

The solution is actually simple, we just need to carry around the mtime of the index file and fallback to compute hash in case mtime of the file == mtime of the index. In practice, that shouldn't really happen very often, so the performance impact shouldn't be a big deal for anyone and we keep the race condition out in a simple manner.

// Deprecated: Use NewRootNodeWithIndex instead for better performance.
// This function is kept for backward compatibility and now internally calls
// NewRootNodeWithIndex with a nil index, which disables the metadata
// optimization but maintains the same functionality.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just keep this function or remove them as this is work for v6 and API break are still acceptable at this point in time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep NewRootNodeWithOptions and add the index in the Options instead.

@cedric-appdirect
Copy link
Contributor Author

Performance improvement on the provided benchmark:

                     │ ../../go-git/before-status.txt │             after.txt              │
                     │             sec/op             │   sec/op     vs base               │
Status/Clean-12                           96.77m ± 2%   18.70m ± 5%  -80.68% (p=0.001 n=7)
Status/Modified-12                        98.99m ± 3%   19.08m ± 1%  -80.72% (p=0.001 n=7)
StatusLarge/Clean-12                     251.89m ± 2%   43.95m ± 6%  -82.55% (p=0.001 n=7)
geomean                                   134.1m        25.03m       -81.34%

                     │ ../../go-git/before-status.txt │              after.txt              │
                     │              B/op              │     B/op      vs base               │
Status/Clean-12                         11.497Mi ± 0%   6.586Mi ± 0%  -42.71% (p=0.001 n=7)
Status/Modified-12                      11.505Mi ± 0%   6.695Mi ± 0%  -41.80% (p=0.001 n=7)
StatusLarge/Clean-12                     28.04Mi ± 1%   15.46Mi ± 0%  -44.88% (p=0.001 n=7)
geomean                                  15.48Mi        8.801Mi       -43.15%

                     │ ../../go-git/before-status.txt │             after.txt              │
                     │           allocs/op            │  allocs/op   vs base               │
Status/Clean-12                           217.9k ± 0%   147.9k ± 0%  -32.13% (p=0.001 n=7)
Status/Modified-12                        218.0k ± 0%   148.7k ± 0%  -31.79% (p=0.001 n=7)
StatusLarge/Clean-12                      542.2k ± 0%   367.1k ± 0%  -32.28% (p=0.001 n=7)
geomean                                   295.3k        200.6k       -32.07%

@cedric-appdirect cedric-appdirect force-pushed the improve-status-speed branch 2 times, most recently from e0ac8f6 to 68ad755 Compare February 17, 2026 17:27
@cedric-appdirect
Copy link
Contributor Author

I hadn't covered the in memory storage case previously and recent main test improvement added a tests case that caught that. This solve the problem in both a defensive way (making sure that if we don't have a proper index or modification time set on the index, it will fallback to full status check) and then in an optimized way by setting the index time in memory to match the time we load the index.

@cedric-appdirect cedric-appdirect force-pushed the improve-status-speed branch 3 times, most recently from f2febfb to 751a53c Compare February 17, 2026 21:56
Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cedric-appdirect good stuff, thanks for looking into this it largely LGTM. Please rebase and take a look at the changes below.

Thank you @hansbogert for sharing the additional information around the racy index and helping with the initial feedback. 🙇

// Deprecated: Use NewRootNodeWithIndex instead for better performance.
// This function is kept for backward compatibility and now internally calls
// NewRootNodeWithIndex with a nil index, which disables the metadata
// optimization but maintains the same functionality.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep NewRootNodeWithOptions and add the index in the Options instead.

@pjbgf
Copy link
Member

pjbgf commented Feb 20, 2026

For future reference, a benchstat comparing v5, main and this PR:

goos: linux
goarch: amd64
pkg: github.com/go-git/go-git/v5
cpu: AMD Ryzen AI 9 HX PRO 370 w/ Radeon 890M
                     │   /tmp/v5    │
                     │    sec/op    │
Status/Clean-24        28.83m ± 21%
Status/Modified-24     29.50m ± 18%
StatusLarge/Clean-24   77.68m ± 18%
geomean                40.42m

                     │   /tmp/v5    │
                     │     B/op     │
Status/Clean-24        70.21Mi ± 0%
Status/Modified-24     70.20Mi ± 0%
StatusLarge/Clean-24   174.7Mi ± 0%
geomean                95.14Mi

                     │   /tmp/v5   │
                     │  allocs/op  │
Status/Clean-24        194.2k ± 0%
Status/Modified-24     194.3k ± 0%
StatusLarge/Clean-24   484.1k ± 0%
geomean                263.4k

pkg: github.com/go-git/go-git/v6
                     │   /tmp/main   │               /tmp/pr                │
                     │    sec/op     │    sec/op     vs base                │
Status/Clean-24        21.043m ± 12%   9.636m ± 19%  -54.21% (p=0.000 n=10)
Status/Modified-24     20.659m ± 13%   9.681m ± 15%  -53.14% (p=0.000 n=10)
StatusLarge/Clean-24    54.55m ± 18%   25.41m ± 20%  -53.41% (p=0.000 n=10)
geomean                 28.73m         13.33m        -53.59%

                     │  /tmp/main   │               /tmp/pr                │
                     │     B/op     │     B/op      vs base                │
Status/Clean-24        9.861Mi ± 0%   6.514Mi ± 0%  -33.94% (p=0.000 n=10)
Status/Modified-24     9.866Mi ± 0%   6.626Mi ± 0%  -32.84% (p=0.000 n=10)
StatusLarge/Clean-24   24.06Mi ± 0%   15.27Mi ± 0%  -36.53% (p=0.000 n=10)
geomean                13.28Mi        8.702Mi       -34.46%

                     │  /tmp/main  │               /tmp/pr               │
                     │  allocs/op  │  allocs/op   vs base                │
Status/Clean-24        217.8k ± 0%   147.8k ± 0%  -32.14% (p=0.000 n=10)
Status/Modified-24     217.9k ± 0%   148.6k ± 0%  -31.80% (p=0.000 n=10)
StatusLarge/Clean-24   542.0k ± 0%   367.0k ± 0%  -32.29% (p=0.000 n=10)
geomean                295.2k        200.5k       -32.08%

Add benchmarks to measure Status() performance on repositories with
thousands of files. These benchmarks help identify optimization
opportunities in the status checking code path.

BenchmarkStatusClean measures worst-case performance: a clean repository
where every file's hash is computed unnecessarily. This is the primary
target for the upcoming metadata-first comparison optimization.

BenchmarkStatusModified measures a more realistic case with a small
percentage of modified files.

BenchmarkStatusLarge tests performance on larger repositories with 5000
files to ensure optimizations scale appropriately.
Store the modification time of the index file in the Index structure.
This timestamp is captured when the index is loaded and will be used
to properly handle the racy-git condition in the worktree status
optimization.

The ModTime is obtained by calling Stat() on the open file handle,
ensuring it corresponds exactly to the index content that was read.

Reference: https://git-scm.com/docs/racy-git
… check

Implement metadata-first comparison to avoid unnecessary file hashing
when checking status. This matches native git's ie_match_stat() approach.

Before hashing a file, check if its metadata (mtime, size, mode) matches
the index entry. If metadata matches AND the file's mtime is before the
index file's mtime (not in racy-git window), reuse the hash from the
index instead of reading and hashing the file content.

This optimization dramatically reduces I/O operations for Status() calls
on unchanged files:
- Clean repository (2000 files): 769ms → 143ms (~5.4x faster)
- Repository with 1% modified (2000 files): ~202ms
- Large repository (5000 files): ~533ms

The optimization includes:
- O(1) index entry lookup using a pre-built map
- Cached modification time from ReadDir (no extra Lstat calls)
- Size, mtime, and mode comparison before any file I/O
- Racy-git check using idx.ModTime to ensure correctness

This is the same "trust the index" optimization that makes native git
status fast. The implementation:
1. Builds an entry map on root node creation for fast lookups
2. Caches file metadata (mtime, size, mode) from ReadDir
3. Compares cached metadata with index entries before hashing
4. Checks for racy-git condition (file mtime >= index mtime)
5. Only hashes files when metadata differs or in racy window

The optimization is backward compatible - when idx is nil, the function
works without optimization. The old constructors are deprecated but
continue to work by calling NewRootNodeWithIndex with nil.

Includes test demonstrating proper racy-git handling.

Reference: https://git-scm.com/docs/racy-git
The ModTime field added to index.Index is filesystem metadata that
gets populated from the file modification time when reading, not
from the serialized index content. This means it does not round-trip
through SetIndex/Index operations.

Update the test to exclude ModTime from the equality comparison
since it is expected to differ between the original and read-back
index.
… optimization

The metadata-first optimization for git status compares file metadata
(size, mtime, mode) against the index to avoid expensive content hashing.
To ensure correctness, it relies on a "racy git" safety check when files
are modified in the same second the index was written.

However, with in-memory storage (memory.NewStorage()), idx.ModTime is never
set (remains zero), silently skipping the racy git check. On Windows—where
time.Now() has coarser resolution (~15ms)—a tracked file could be modified
with the same size and mtime, incorrectly appearing unchanged and causing
false negatives in status checks.

This fix conservatively disables the metadata optimization when the racy
git check cannot be performed (idx is nil or idx.ModTime is zero), ensuring
correctness. This only affects in-memory storage used in tests; production
filesystem storage always has idx.ModTime set and retains the full optimization.

Fixes CI test failure in TestStatusCheckedInBeforeIgnored.
The metadata-first optimization relies on racy git detection, which requires
idx.ModTime to be set. A prior change disabled this optimization when ModTime
was zero to ensure correctness. However, this inadvertently disabled the
optimization for all in-memory storage usage.

This change re-enables the optimization by making memory storage set ModTime
to time.Now() during SetIndex, simulating filesystem storage behavior where
ModTime represents the index file's modification time. With ModTime properly
set, the racy git check can now function correctly for in-memory storage.

Additionally, update the index round-trip test to verify that ModTime is
set by SetIndex before zeroing both sides for structural comparison.
… guard

Move the index parameter into the Options struct and remove NewRootNodeWithIndex
in favor of NewRootNodeWithOptions, addressing PR review feedback. The index can
now be passed via options.Index, simplifying the API surface.

Also add a nil guard in metadataMatches to safely handle cases where no index
entry is available, preventing potential panics.
@pjbgf pjbgf changed the title Improve Status() speed by relying on index metadata git: Improve Status() speed with new index.ModTime check Feb 22, 2026
Copy link
Member

@pjbgf pjbgf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cedric-appdirect thanks for working on this. 🙇

@pjbgf pjbgf merged commit 424e996 into go-git:main Feb 22, 2026
24 of 25 checks passed
@cedric-appdirect cedric-appdirect deleted the improve-status-speed branch February 24, 2026 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants