Prevent pollable from accessing a potentially orphaned/destroyed fd#15776
Merged
sreecha merged 8 commits intogrpc:masterfrom Jun 25, 2018
Merged
Prevent pollable from accessing a potentially orphaned/destroyed fd#15776sreecha merged 8 commits intogrpc:masterfrom
sreecha merged 8 commits intogrpc:masterfrom
Conversation
kpayson64
reviewed
Jun 15, 2018
Contributor
kpayson64
left a comment
There was a problem hiding this comment.
Good investigation!
Just 1 minor comment.
| gpr_mu_init(&(*p)->mu); | ||
| (*p)->epfd = epfd; | ||
| (*p)->owner_fd = nullptr; | ||
| gpr_mu_init(&(*p)->owner_orphan_mu); |
Contributor
There was a problem hiding this comment.
We should have a corresponding gpr_mu_destroy somewhere.
kpayson64
approved these changes
Jun 15, 2018
Contributor
Author
|
Added a test as well to catch similar failures in future. |
Contributor
|
LGTM |
yang-g
approved these changes
Jun 18, 2018
| pollset_transition_pollable_from_empty_to_fd_locked(pollset, fd); | ||
| gpr_mu_lock(&po_at_start->owner_orphan_mu); | ||
| if (po_at_start->owner_orphaned) { | ||
| pollset_transition_pollable_from_empty_to_fd_locked(pollset, fd); |
Contributor
Author
There was a problem hiding this comment.
this is a great catch!. thanks Yang.
Contributor
Author
There was a problem hiding this comment.
(I am not even sure how i missed this. I don't remember touching this line :)
Contributor
|
The tsan failure seems real. |
Contributor
Author
|
Addressed the TSAN failures |
|
|
|
|
|
|
|
|
|
|
|
yang-g
approved these changes
Jun 22, 2018
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This fixes the issue: #15760
(copying the description of the problem again here:)
Issue
in epollex implementation, the pollable struct (see below) does not maintain a ref to owner_fd
Since a pollable can outlive its owner_fd, the owner_fd could be orphaned/destroyed without the pollable knowing about it - and this causes a problem if the pollable tries to access owner_fd later (for example, see the function pollset_add_fd_locked() that tries to access the owner_fd field!)
Example scenario of where this can be a problem:
Assume
epollexis the polling enginech1andch2(lets sayfd1andfd2are corresponding file descriptors)pollable_objfields in both the fds (fd1andfd2) would be set tonullptrat this point)cq(which creates a pollsetps)active_pollablefield in the pollsetpsis set tonullptrC1and sends it on channelch1and usescqpollset_add_fd(ps, fd1)functionpollable(lets call itpo) of typePO_FDfd1->pollable_objis set topo.ps->active_pollableis set topoandps->owner_fdis set tofd1pohas a ref count of 2. One fromfd1and the other frompsC1completes, and the channelch1is destroyed. This causesfd1to be orphaned and drops a reference topo. However, since the completion queue (and thereforeps) is still active, The pollablepois still active still with a refcount of 1C2on channelch2and usescqpollset_add_fd(ps, fd2)functionps->active_pollable->owner_fdfield which is invalid!!The above example might seem like a corner case but there are other scenarios where it can happen too. For example in the
backup_pollercode in thetcp_posixendpoint. The backup poller is creates a global pollset whose owner_fd may refer to a channel that is long gone!Fix
The fix is to synchronize
fd_orphanand the two methods that access theowner_fdfield in apollable(i.epollset_add_fd_lockedandpollset_as_multipollable_locked)