Conversation

@swills (Contributor) commented Nov 20, 2025

I got a panic on a concurrent read/write of this map when calling Repository.PushContext(). It's difficult to reproduce, but the panic is pretty clear, and this should fix it. I was using v5.16.2 at the time. The patch is for main, but I can backport it if preferred.

        IdxChecksum plumbing.Hash

        offsetHash map[int64]plumbing.Hash
        offsetHashIsFull bool
Contributor:

I would convert this to an atomic and keep it, because the current approach sacrifices performance in FindOffset by continuing to take the write path even after the map is fully generated. Combining buildOnce with an atomic would give better performance.
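
A minimal sketch of what combining buildOnce with an atomic flag might look like (illustrative names and a stand-in hash type, not the actual go-git fields):

        package idxfilesketch

        import (
                "sync"
                "sync/atomic"
        )

        // hashIndex is a stand-in for the real index type; [20]byte
        // stands in for plumbing.Hash.
        type hashIndex struct {
                mu         sync.RWMutex
                offsetHash map[int64][20]byte
                full       atomic.Bool // set once offsetHash is fully populated
                buildOnce  sync.Once
        }

        func (ix *hashIndex) findHash(offset int64) ([20]byte, bool) {
                // Fast path: once the map is fully built it is never
                // written again, so readers skip the build entirely.
                if !ix.full.Load() {
                        ix.buildOnce.Do(func() {
                                m := make(map[int64][20]byte)
                                // ... populate m from the index entries ...
                                ix.mu.Lock()
                                ix.offsetHash = m
                                ix.mu.Unlock()
                                ix.full.Store(true)
                        })
                }
                ix.mu.RLock()
                h, ok := ix.offsetHash[offset]
                ix.mu.RUnlock()
                return h, ok
        }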

Contributor Author:

When I tested it using BenchmarkFindOffset, I saw:

        BenchmarkFindOffset-16 2548224 534.7 ns/op

before this change, and this after:

        BenchmarkFindOffset-16 2696178 459.4 ns/op

This is using the fastest of 3 runs for each. The performance after this change also varied a lot less; before, it varied much more, up to >800 ns/op at times. But maybe you are thinking it should be faster?

Contributor:

Since we're on it and improving concurrent access, I suggested not leaving any obvious performance on the table, but it might really not matter that much with these numbers.

It also seems the existing benchmark doesn't exercise concurrent usage. It would be better to use RunParallel to see the real effect of the changes.
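
A parallel benchmark along those lines might look roughly like this (assuming the fixtureIndex and fixtureHashes helpers used elsewhere in this thread):

        func BenchmarkFindOffsetParallel(b *testing.B) {
                idx, err := fixtureIndex()
                if err != nil {
                        b.Fatalf("failed to build fixture index: %v", err)
                }

                b.RunParallel(func(pb *testing.PB) {
                        i := 0
                        for pb.Next() {
                                // Cycle through the fixture hashes so the
                                // goroutines hit the shared map concurrently.
                                _, _ = idx.FindOffset(fixtureHashes[i%len(fixtureHashes)])
                                i++
                        }
                })
        }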

Contributor Author:

True, but I'm actually not sure how often parallel access happens. If it happened a lot, I suspect the concurrency issue would have been more of a problem and fixed before now. On the other hand, panics, even rare ones, are kind of a showstopper.

Contributor:

I totally agree.

Right now there is a mutex that is meant to support concurrent access, but it's half-baked, left over from experimental changes some time ago. I would defer this nuance, and how performant it needs to be, to the maintainers.

        offsetHashIsFull bool
        mu sync.RWMutex
        offsetHash map[int64]plumbing.Hash
        buildOnce sync.Once
Contributor:

I would add an offset prefix to these fields.
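
For illustration, the prefixed names might look something like this (the exact names are up to the author):

        offsetHashIsFull    bool
        offsetHashMu        sync.RWMutex
        offsetHash          map[int64]plumbing.Hash
        offsetHashBuildOnce sync.Once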

Contributor Author:

Ok, I will do that

Contributor Author:

Done

        }

        idx.offsetHash = make(map[int64]plumbing.Hash, count)
        idx.offsetHashIsFull = true
Contributor:

It might be a good idea to build into a local variable and then assign it to idx at the end, so that lock contention is minimized.
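
A sketch of that pattern, using the prefixed field names suggested above (genOffsetHash and Count are the existing identifiers discussed in this thread; the population loop is elided):

        func (idx *MemoryIndex) genOffsetHash() error {
                count, err := idx.Count()
                if err != nil {
                        return err
                }

                // Build into a local map first; no lock is needed while
                // it is still private to this goroutine.
                offsetHash := make(map[int64]plumbing.Hash, count)
                // ... populate offsetHash from the index entries ...

                // Take the write lock only for the final, cheap assignment.
                idx.offsetHashMu.Lock()
                idx.offsetHash = offsetHash
                idx.offsetHashIsFull = true
                idx.offsetHashMu.Unlock()
                return nil
        }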

Contributor Author:

Ok, will do

Contributor Author:

Do we need to lock idx in case Fanout (Count()) or other parts of the struct change?

Contributor:

It's not supposed to change after decoding, so I believe not.

Contributor Author:

Ok, I did that, but I noticed that performance got worse in BenchmarkFindOffset afterwards. Perhaps I've done something wrong, though.

Contributor Author:

Oh, maybe that was due to other issues; FindOffset doesn't call FindHash. Fixed.

Contributor:

This looks good to me now.

@ferhatelmas (Contributor):

There is clearly a concurrency issue; thanks for fixing it. It would be desirable to see a test that breaks under the race detector. It seems CI isn't using -race, though, which is something else to improve.

@pjbgf (Member) commented Nov 20, 2025

x-ref: related to concurrency issues.

@swills (Contributor Author) commented Nov 20, 2025

I have a test which demonstrates the issue:

func TestOffsetHashConcurrentPopulation(t *testing.T) {
        idx, err := fixtureIndex()
        if err != nil {
                t.Fatalf("failed to build fixture index: %v", err)
        }

        var wg sync.WaitGroup

        // One set of goroutines hammers FindOffset...
        // (the closures rely on Go 1.22+ per-iteration loop variables)
        for _, h := range fixtureHashes {
                wg.Add(1)
                go func() {
                        defer wg.Done()
                        for i := 0; i < 5000; i++ {
                                _, _ = idx.FindOffset(h)
                        }
                }()
        }

        // ...while another set hammers FindHash, so both sides race on
        // the shared offsetHash map and panic without the fix.
        for _, off := range fixtureOffsets {
                wg.Add(1)
                go func() {
                        defer wg.Done()
                        for i := 0; i < 3000; i++ {
                                _, _ = idx.FindHash(off)
                        }
                }()
        }

        wg.Wait()
}

Should I add that as well?

Also, benchmarks are in progress; I can throw those in as well, but they will need a larger dataset to really show the differences. fixtureHashes is only 9 hashes.
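
For reference, a test like this should fail under the race detector before the fix; assuming the usual go-git layout, something like:

        go test -race -run TestOffsetHashConcurrentPopulation ./plumbing/format/idxfile/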

@pjbgf (Member) left a comment:

@swills thanks for looking into this and @ferhatelmas for the great initial review. 👏
The changes LGTM.

Please squash the commits and reword them so that they align with the contributing guidelines.

Note that genOffsetHash is likely going to be impacted by the introduction of reverse indexes (i.e. .rev files), as they will remove the need to preload the offsets into memory.

@pjbgf (Member) commented Nov 20, 2025

@swills no need to add benchmark stats for this specific change. As for the concurrency tests, I think that would be a nice addition to avoid future regressions.

Commit: prevent concurrent map read and write panic, and add test to prevent future regressions
@swills force-pushed the find-hash-panic-fix branch from aa1be42 to f79fe1f on November 21, 2025 00:25
@swills requested a review from @pjbgf on November 21, 2025 00:27
@swills (Contributor Author) commented Nov 21, 2025

Squashed; please let me know if I didn't get the formatting right.

@pjbgf (Member) left a comment:

@swills thanks for working on this. 🙇

@pjbgf merged commit 39fcec4 into go-git:main on Nov 21, 2025. 16 checks passed.