Auto-port 5.0: Fix race in io.netty.channel.uring.IoUringIoHandler.wakeup#16842
Merged
Conversation
Motivation:
Fixes a shutdown race in the io_uring transport where `wakeup()` can
write to an eventfd after it has already been closed.
The race looks like this:
```text
Thread T1 (non-eventloop) Eventloop thread
------------------------- ----------------
wakeup()
getAndSet(true) -> false
flag = true
[preempted before eventfd_write]
prepareToDestroy()
eventfd_write(1)
submitAndGet()
processCompletionsAndHandleOverflow(...)
handleEventFdRead()
if (!eventFdClosing) {
flag = false
}
destroy()
drainEventFd()
flag.getAndSet(true) -> false
concludes "no pending wakeup"
completeRingClose()
close(eventfd)
T1 resumes
eventfd_write(eventfd)
-> EBADF
```
I also considered setting `eventFdClosing` earlier, but that changes a
different part of the eventfd lifecycle. When `eventFdClosing` is set,
`handleEventFdRead()` stops clearing `eventfdAsyncNotify` and stops
submitting the next eventfd read:
```text
handleEventFdRead()
eventfdReadSubmitted = 0
if (!eventFdClosing) {
eventfdAsyncNotify = false
submitEventFdRead()
}
```
So if `eventFdClosing` is moved too early into `prepareToDestroy()`,
shutdown can get stuck in this shape:
```text
prepareToDestroy()
eventFdClosing = true
eventfd read completion arrives
handleEventFdRead()
does not clear eventfdAsyncNotify
does not submit another eventfd read
destroy()
drainEventFd()
sees eventfdAsyncNotify still true
waits for an eventfd read completion that will not be submitted
```
Modification:
Add a small wakeup write gate around `eventfd_write()`:
```text
wakeup thread event-loop thread
------------- -----------------
reserve wakeup writer slot
eventfd_write(eventfd)
release wakeup writer slot
close gate
wait until writer count is 0
close(eventfd)
```
If the close gate is already closed, `wakeup()` returns without writing
to the eventfd. At that
point there is no event loop left to wake up, so dropping the wakeup is
the correct behavior.
I intentionally did not add a larger eventfd state machine or more
eventfd write states. The race does not require modeling the whole
eventfd lifecycle.
This approach does add some extra atomic read/write overhead on the
wakeup path. However, compared with the state-machine based alternative,
it keeps the fix more easier to read about.
I did not add a normal unit test for this race because the problematic
window is between the `eventfdAsyncNotify` state transition and the
native `eventfd_write()` call, and there is no good deterministic way to
force that interleaving without adding test-only control points to
`IoUringIoHandler`.
Result:
Fixes #16716.
(cherry picked from commit b022b47)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Auto-port of #16836 to 5.0
Cherry-picked commit: b022b47
Motivation:
Fixes a shutdown race in the io_uring transport where
wakeup()can write to an eventfd after it has already been closed.The race looks like this:
I also considered setting
eventFdClosingearlier, but that changes a different part of the eventfd lifecycle. WheneventFdClosingis set,handleEventFdRead()stops clearingeventfdAsyncNotifyand stops submitting the next eventfd read:So if
eventFdClosingis moved too early intoprepareToDestroy(), shutdown can get stuck in this shape:Modification:
Add a small wakeup write gate around
eventfd_write():If the close gate is already closed,
wakeup()returns without writing to the eventfd. At thatpoint there is no event loop left to wake up, so dropping the wakeup is the correct behavior.
I intentionally did not add a larger eventfd state machine or more eventfd write states. The race does not require modeling the whole eventfd lifecycle.
This approach does add some extra atomic read/write overhead on the wakeup path. However, compared with the state-machine based alternative, it keeps the fix more easier to read about.
I did not add a normal unit test for this race because the problematic window is between the
eventfdAsyncNotifystate transition and the nativeeventfd_write()call, and there is no good deterministic way to force that interleaving without adding test-only control points toIoUringIoHandler.Result:
Fixes #16716.
I verified the fix with a dedicated reproducer that pauses
wakeup()in the race window: https://github.com/dreamlike-ocean/netty/tree/repro-16716-eventfd-raceThe reproducer fails before this change with
eventfd_write(...) failed: Bad file descriptorand passes after applying the fix.I may not fully understand all the details of this race yet, please take a look together and help verify whether this fix is reasonable.
@franz1981 @normanmaurer @tsegismont