Fix item not-nil race#45
Merged
jjeffcaii merged 1 commit intojjeffcaii:masterfrom Jan 20, 2025
Merged
Conversation
Owner
|
It has been released with v0.5.6. Thanks for the contribution~ 👍🏽 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Investigation
I caught this bug while stress-testing RSocket-based Client/Server implementation in Facebook Thrift: https://github.com/facebook/fbthrift/blob/main/thrift/lib/go/thrift/stress/server_test.go
I noticed that occasionally an RSocket would return a
rsocket: socket closed alreadyerror for no apparent reason. (Communication in this stress test is done between synthetic client/server via a unix socket - so there should be no errors at all under stress test.)The investigation lead me to the
processorclass in this repo. I had a suspicion that upon RSocket close - the wrongprocessorinstance was getting notified (due to a global processor pool being use in the implementation).https://github.com/rsocket/rsocket-go/blob/099cb5babee5b6e19d9488de5d7ad12f107be5ef/internal/socket/duplex.go#L135
To test my theory I applied the following patch:
And after a few hundred thousand iterations I got the following panic:
Race condition
The race condition is as follows:
processorinstance is obtained from the global pool:https://github.com/echistyakov/reactor-go/blob/ee85b1f144c3b16dc7149f45573cf029c9c139cb/mono/initiate.go#L60-L61
ProcessorFinallyHookwhich is implemented correctly and and has a call toDispose()theprocessorinstance (on "finally").processorinstance is used by its caller to subscribe to viaprocessorSubscriber(in a multi-goroutine environment - where multiple goroutines might have a reference to thisprocessoras aSinkinterface).processorSubscriberruns itsOnCompletemethod, invoking the "on finally" hook of theprocessor:https://github.com/echistyakov/reactor-go/blob/ee85b1f144c3b16dc7149f45573cf029c9c139cb/mono/processor.go#L188
processor.Dispose, in turn, places theprocessorback into the global pool.Disposemethod gets interrupted by another goroutine right in between the following lines:https://github.com/echistyakov/reactor-go/blob/ee85b1f144c3b16dc7149f45573cf029c9c139cb/mono/processor.go#L46-L47
SuccessorErroron theprocessorinstance - setting theitemto some non-nilvalue.Disposeand theprocessoris placed back into the global pool.processoris soon removed again from the pool to be used by another caller.itemis notniland at the same time unrelated to the new caller. That's both a leak (between unrelated callers) and a race at the same time!After the fix
The issue with "socket already closed" goes away.