prevent concurrent map read and write panic #1729
Conversation
```go
	IdxChecksum plumbing.Hash

	offsetHash       map[int64]plumbing.Hash
	offsetHashIsFull bool
```
I would convert this to an atomic and keep it, because the current approach sacrifices performance in FindOffset by continuing to write even after the map is fully generated. Combining buildOnce with an atomic would provide better performance.
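A minimal sketch of the suggested shape, combining sync.Once with an atomic flag. The field names mirror this PR, but Hash is a stand-in for plumbing.Hash and the map generation is a placeholder, not go-git's real genOffsetHash:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Hash stands in for plumbing.Hash (a fixed-size byte array in go-git).
type Hash [20]byte

// MemoryIndex sketches the fields under discussion.
type MemoryIndex struct {
	mu               sync.RWMutex
	buildOnce        sync.Once
	offsetHash       map[int64]Hash
	offsetHashIsFull atomic.Bool
}

// FindHash checks the atomic flag first so that, once the map is fully
// generated, readers skip the build path entirely instead of re-entering
// it on every call.
func (idx *MemoryIndex) FindHash(o int64) (Hash, bool) {
	if !idx.offsetHashIsFull.Load() {
		idx.buildOnce.Do(func() {
			m := map[int64]Hash{42: {1}} // placeholder for map generation
			idx.mu.Lock()
			idx.offsetHash = m
			idx.mu.Unlock()
			idx.offsetHashIsFull.Store(true)
		})
	}
	idx.mu.RLock()
	h, ok := idx.offsetHash[o]
	idx.mu.RUnlock()
	return h, ok
}

func main() {
	idx := &MemoryIndex{}
	h, ok := idx.FindHash(42)
	fmt.Println(ok, h[0]) // true 1
}
```

buildOnce guarantees the generation runs exactly once even under concurrent first calls, while the atomic load keeps the steady-state read path free of any write.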
When I tested it using BenchmarkFindOffset I saw this before the change:

BenchmarkFindOffset-16    2548224    534.7 ns/op

and this after:

BenchmarkFindOffset-16    2696178    459.4 ns/op

This uses the fastest of 3 runs for each. The performance after this change also varied a lot less; before, it varied much more, at times exceeding 800 ns/op. But maybe you are thinking it should be faster?
Since we're at it and improving concurrent access, I suggested not leaving any obvious performance on the table, but it might really not matter that much with these numbers.
It also seems the existing benchmark doesn't exercise concurrent usage. It would be better to use RunParallel to see the real effect of the changes.
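A sketch of what such a benchmark could look like with b.RunParallel. The index type and hash key here are simplified stand-ins, not go-git's actual MemoryIndex, and it's wrapped in testing.Benchmark only so it runs as a plain program:

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// index is a minimal stand-in for idxfile.MemoryIndex with RWMutex-guarded reads.
type index struct {
	mu      sync.RWMutex
	offsets map[string]int64
}

func (i *index) FindOffset(h string) (int64, bool) {
	i.mu.RLock()
	defer i.mu.RUnlock()
	o, ok := i.offsets[h]
	return o, ok
}

func main() {
	idx := &index{offsets: map[string]int64{"abc123": 12}}
	// RunParallel splits b.N iterations across GOMAXPROCS goroutines, so the
	// shared index is hit concurrently, unlike a plain for-loop benchmark.
	res := testing.Benchmark(func(b *testing.B) {
		b.RunParallel(func(pb *testing.PB) {
			for pb.Next() {
				idx.FindOffset("abc123")
			}
		})
	})
	fmt.Println(res.N > 0)
}
```

In a real _test.go file this would just be a BenchmarkFindOffsetParallel function; a parallel benchmark would also make lock contention between readers visible, which the serial benchmark cannot show.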
True, but I'm actually not sure how often parallel access happens. If it happened a lot, I suspect the concurrency issue would have surfaced and been fixed before now. On the other hand, panics, even rare ones, are kind of a show stopper.
I totally agree.
Right now there is a mutex that is meant to support concurrent access, but it's half-baked, left over from experimental changes some time ago. I would leave this nuance, and how performant it should be, to the maintainers.
plumbing/format/idxfile/idxfile.go (outdated)
```go
	offsetHashIsFull bool
	mu               sync.RWMutex
	offsetHash       map[int64]plumbing.Hash
	buildOnce        sync.Once
```
I would add an offset prefix to these names.
Ok, I will do that
Done
```go
	}

	idx.offsetHash = make(map[int64]plumbing.Hash, count)
	idx.offsetHashIsFull = true
```
It might be a good idea to build into a local variable and then, at the end, assign it to idx, so that lock contention is minimized.
Ok, will do
Do we need to lock idx in case Fanout (Count()) or other parts of the struct change?
It's not supposed to change after decoding, so I believe not.
Ok, did that, but I noticed that performance got worse in BenchmarkFindOffset afterwards. Perhaps I've done something wrong, though.
Oh, maybe that's due to other issues; FindOffset doesn't call FindHash. Fixed.
This looks good to me now.
There is clearly a concurrency issue, thanks for fixing it. It would be desirable to see a test that breaks under the race detector. It seems CI isn't using the race detector.
x-ref: related to concurrency issues.
I have a test which demonstrates the issue. Should I add that as well? Benchmarks are also in progress; I can throw those in too, but they will need a larger dataset to really show the differences, since fixtureHashes is only 9 hashes.
pjbgf
left a comment
@swills thanks for looking into this and @ferhatelmas for the great initial review. 👏
The changes LGTM.
Please squash the commits and reword them so that they align with the contributing guidelines.
Note that genOffsetHash is likely going to be impacted by the introduction of reverse indexes (i.e. .rev files) as it will remove the need to preload the offsets into memory.
@swills no need to add benchmark stats for this specific change. As for the concurrency tests, I think that would be a nice addition to avoid future regressions.
prevent concurrent map read and write panic, and add test to prevent future regressions
Force-pushed from aa1be42 to f79fe1f.
Squashed, please let me know if I didn't get the formatting right.
pjbgf
left a comment
@swills thanks for working on this. 🙇
Got a panic on a concurrent read/write of this map while calling Repository.PushContext(). It's difficult to reproduce, but the panic is pretty clear, and this should fix it. I was using v5.16.2 at the time. The patch is against main, but I can backport it if preferred.