backport, git: Improve Status() speed with new index.ModTime check #1862

Merged
pjbgf merged 7 commits into go-git:releases/v5.x from cedric-appdirect:backport-status-fix-v5 on Feb 24, 2026

@cedric-appdirect cedric-appdirect commented Feb 23, 2026

This is a backport of #1747. It changes more than I would have liked, because v5 and v6 have diverged somewhat. I will try to highlight in the code what I had to change compared to the v6 PR.

@cedric-appdirect cedric-appdirect changed the title Backport status fix v5 backport, git: Improve Status() speed with new index.ModTime check Feb 23, 2026
Add benchmarks to measure Status() performance on repositories with
thousands of files. These benchmarks help identify optimization
opportunities in the status checking code path.

BenchmarkStatusClean measures worst-case performance: a clean repository
where every file's hash is computed unnecessarily. This is the primary
target for the upcoming metadata-first comparison optimization.

BenchmarkStatusModified measures a more realistic case with a small
percentage of modified files.

BenchmarkStatusLarge tests performance on larger repositories with 5000
files to ensure optimizations scale appropriately.

Store the modification time of the index file in the Index structure.
This timestamp is captured when the index is loaded and will be used
to properly handle the racy-git condition in the worktree status
optimization.

The ModTime is obtained by calling Stat() on the open file handle,
ensuring it corresponds exactly to the index content that was read.

Reference: https://git-scm.com/docs/racy-git
… check

Implement metadata-first comparison to avoid unnecessary file hashing
when checking status. This matches native git's ie_match_stat() approach.

Before hashing a file, check if its metadata (mtime, size, mode) matches
the index entry. If metadata matches AND the file's mtime is before the
index file's mtime (not in racy-git window), reuse the hash from the
index instead of reading and hashing the file content.

This optimization dramatically reduces I/O operations for Status() calls
on unchanged files:
- Clean repository (2000 files): 769ms → 143ms (~5.4x faster)
- Repository with 1% modified (2000 files): ~202ms
- Large repository (5000 files): ~533ms

The optimization includes:
- O(1) index entry lookup using a pre-built map
- Cached modification time from ReadDir (no extra Lstat calls)
- Size, mtime, and mode comparison before any file I/O
- Racy-git check using idx.ModTime to ensure correctness

This is the same "trust the index" optimization that makes native git
status fast. The implementation:
1. Builds an entry map on root node creation for fast lookups
2. Caches file metadata (mtime, size, mode) from ReadDir
3. Compares cached metadata with index entries before hashing
4. Checks for racy-git condition (file mtime >= index mtime)
5. Only hashes files when metadata differs or in racy window

The optimization is backward compatible - when idx is nil, the function
works without optimization.

Includes a test demonstrating proper racy-git handling.

Reference: https://git-scm.com/docs/racy-git
… optimization

The metadata-first optimization for git status compares file metadata
(size, mtime, mode) against the index to avoid expensive content hashing.
To ensure correctness, it relies on a "racy git" safety check when files
are modified in the same second the index was written.

However, with in-memory storage (memory.NewStorage()), idx.ModTime is never
set (remains zero), which silently skips the racy git check. On Windows, where
time.Now() has coarser resolution (~15ms), a tracked file could be modified
while keeping the same size and mtime, so it would incorrectly appear
unchanged, causing false negatives in status checks.

This fix conservatively disables the metadata optimization when the racy
git check cannot be performed (idx is nil or idx.ModTime is zero), ensuring
correctness. This only affects in-memory storage used in tests; production
filesystem storage always has idx.ModTime set and retains the full optimization.

Fixes CI test failure in TestStatusCheckedInBeforeIgnored.

The metadata-first optimization relies on racy git detection, which requires
idx.ModTime to be set. A prior change disabled this optimization when ModTime
was zero to ensure correctness. However, this inadvertently disabled the
optimization for all in-memory storage usage.

This change re-enables the optimization by making memory storage set ModTime
to time.Now() during SetIndex, simulating filesystem storage behavior where
ModTime represents the index file's modification time. With ModTime properly
set, the racy git check can now function correctly for in-memory storage.

Additionally, update the index round-trip test to verify that ModTime is
set by SetIndex before zeroing both sides for structural comparison.
Also add a nil guard in metadataMatches to safely handle cases where no index
entry is available, preventing potential panics.
pjbgf previously approved these changes Feb 24, 2026

@pjbgf pjbgf left a comment

@cedric-appdirect thanks for working on this. 🙇

@pjbgf pjbgf merged commit 8ed442c into go-git:releases/v5.x Feb 24, 2026
12 checks passed
@cedric-appdirect cedric-appdirect deleted the backport-status-fix-v5 branch February 24, 2026 23:39