backport, git: Improve Status() speed with new index.ModTime check (#1862)
* git: worktree, add benchmark for Status() with many files
Add benchmarks to measure Status() performance on repositories with
thousands of files. These benchmarks help identify optimization
opportunities in the status checking code path.
BenchmarkStatusClean measures worst-case performance: a clean repository
where every file's hash is computed unnecessarily. This is the primary
target for the upcoming metadata-first comparison optimization.
BenchmarkStatusModified measures a more realistic case with a small
percentage of modified files.
BenchmarkStatusLarge tests performance on larger repositories with 5000
files to ensure optimizations scale appropriately.
* plumbing: format/index, add ModTime field to Index struct
Store the modification time of the index file in the Index structure.
This timestamp is captured when the index is loaded and will be used
to properly handle the racy-git condition in the worktree status
optimization.
The ModTime is obtained by calling Stat() on the open file handle,
ensuring it corresponds exactly to the index content that was read.
Reference: https://git-scm.com/docs/racy-git
* git: utils/merkletrie/filesystem, optimize node hashing with metadata check
Implement metadata-first comparison to avoid unnecessary file hashing
when checking status. This matches native git's ie_match_stat() approach.
Before hashing a file, check if its metadata (mtime, size, mode) matches
the index entry. If metadata matches AND the file's mtime is before the
index file's mtime (not in racy-git window), reuse the hash from the
index instead of reading and hashing the file content.
This optimization substantially reduces I/O for Status() calls on
unchanged files:
- Clean repository (2000 files): 769ms → 143ms (~5.4x faster)
- Repository with 1% of files modified (2000 files): ~202ms
- Large repository (5000 files): ~533ms
The optimization includes:
- O(1) index entry lookup using a pre-built map
- Cached modification time from ReadDir (no extra Lstat calls)
- Size, mtime, and mode comparison before any file I/O
- Racy-git check using idx.ModTime to ensure correctness
This is the same "trust the index" optimization that makes native git
status fast. The implementation:
1. Builds an entry map on root node creation for fast lookups
2. Caches file metadata (mtime, size, mode) from ReadDir
3. Compares cached metadata with index entries before hashing
4. Checks for racy-git condition (file mtime >= index mtime)
5. Only hashes files when metadata differs or in racy window
The optimization is backward compatible: when idx is nil, the function
falls back to hashing file contents as before.
Includes test demonstrating proper racy-git handling.
Reference: https://git-scm.com/docs/racy-git
* git: utils/merkletrie/filesystem, require racy git check for metadata optimization
The metadata-first optimization for git status compares file metadata
(size, mtime, mode) against the index to avoid expensive content hashing.
To ensure correctness, it relies on a "racy git" safety check when files
are modified in the same second the index was written.
However, with in-memory storage (memory.NewStorage()), idx.ModTime is never
set (it remains zero), so the racy git check is silently skipped. On Windows,
where time.Now() has a coarser resolution (~15ms), a tracked file could be
modified while keeping the same size and mtime, incorrectly appear unchanged,
and cause false negatives in status checks.
This fix conservatively disables the metadata optimization when the racy
git check cannot be performed (idx is nil or idx.ModTime is zero), ensuring
correctness. This only affects in-memory storage used in tests; production
filesystem storage always has idx.ModTime set and retains the full optimization.
Fixes CI test failure in TestStatusCheckedInBeforeIgnored.
* storage: set ModTime in memory storage to enable racy git optimization
The metadata-first optimization relies on racy git detection, which requires
idx.ModTime to be set. A prior change disabled this optimization when ModTime
was zero to ensure correctness. However, this inadvertently disabled the
optimization for all in-memory storage usage.
This change re-enables the optimization by making memory storage set ModTime
to time.Now() during SetIndex, simulating filesystem storage behavior where
ModTime represents the index file's modification time. With ModTime properly
set, the racy git check can now function correctly for in-memory storage.
Additionally, update the index round-trip test to verify that ModTime is
set by SetIndex before zeroing both sides for structural comparison.
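The following hypothetical in-memory storer illustrates both parts of this change. SetIndex stamps ModTime with time.Now(), and the structural comparison zeroes ModTime on copies of both sides. The Index and memoryStorage types are simplified stand-ins, not go-git's storage/memory package, and this sketch stamps the passed index in place.

```go
package main

import (
	"reflect"
	"time"
)

// Index is a hypothetical stand-in for format/index.Index.
type Index struct {
	Version uint32
	Entries []string
	ModTime time.Time
}

// memoryStorage is a hypothetical in-memory storer.
type memoryStorage struct{ idx *Index }

// SetIndex records the current time as ModTime, simulating the mtime a
// filesystem storer would observe on a freshly written index file.
func (s *memoryStorage) SetIndex(idx *Index) error {
	idx.ModTime = time.Now()
	s.idx = idx
	return nil
}

func (s *memoryStorage) Index() (*Index, error) { return s.idx, nil }

// sameStructure compares two indexes after zeroing ModTime on copies,
// so only the structural content is checked.
func sameStructure(a, b Index) bool {
	a.ModTime, b.ModTime = time.Time{}, time.Time{}
	return reflect.DeepEqual(a, b)
}
```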
* git: utils/merkletrie/filesystem, add nil guard
Add a nil guard in metadataMatches to safely handle cases where no index
entry is available, preventing potential nil-pointer panics.
* storage: Remove redundant verification
Signed-off-by: Paulo Gomes <[email protected]>
		c.Log("PASS: Correctly detected file change despite metadata match (racy-git handled)")
	} else {
		c.Errorf("FAIL: Racy-git not handled correctly.\nExpected hash: %x (bar)\nGot hash: %x (likely foo)\nThis means the file was not hashed despite being in the racy window.", expectedHash, fileHash)
	}

	c.Assert(expectedHash, DeepEquals, fileHash, Commentf("should hash file content when in racy-git window"))