Skip to content

Prevent pollable from accessing a potentially orphaned/destroyed fd#15776

Merged
sreecha merged 8 commits intogrpc:masterfrom
sreecha:epollex-ownerfd-fix
Jun 25, 2018
Merged

Prevent pollable from accessing a potentially orphaned/destroyed fd#15776
sreecha merged 8 commits intogrpc:masterfrom
sreecha:epollex-ownerfd-fix

Conversation

@sreecha
Copy link
Copy Markdown
Contributor

@sreecha sreecha commented Jun 14, 2018

This fixes the issue: #15760
(copying the description of the problem again here:)

Issue

in epollex implementation, the pollable struct (see below) does not maintain a ref to owner_fd

struct pollable {
  ...
  // only for type fd... one ref to the owner fd
  grpc_fd* owner_fd;
...
};

Since a pollable can outlive its owner_fd, the owner_fd could be orphaned/destroyed without the pollable knowing about it - and this causes a problem if the pollable tries to access owner_fd later (for example, see the function pollset_add_fd_locked() that tries to access the owner_fd field!)

Example scenario of where this can be a problem:

Assume epollex is the polling engine

  • Client creates two channels ch1 and ch2 (lets say fd1 and fd2 are corresponding file descriptors)
    • (Note that pollable_obj fields in both the fds (fd1 and fd2) would be set to nullptr at this point)
  • Client creates completion queue cq (which creates a pollset ps)
    • (Note that the active_pollable field in the pollset ps is set to nullptr
  • Client creates a call C1 and sends it on channel ch1 and uses cq
    • This calls pollset_add_fd(ps, fd1) function
    • This ends up creating a pollable (lets call it po) of type PO_FD
    • fd1->pollable_obj is set to po. ps->active_pollable is set to po and ps->owner_fd is set to fd1
    • Also note that po has a ref count of 2. One from fd1 and the other from ps
  • C1 completes, and the channel ch1 is destroyed. This causes fd1 to be orphaned and drops a reference to po. However, since the completion queue (and therefore ps) is still active, The pollable po is still active still with a refcount of 1
  • Client creates a new call C2 on channel ch2 and uses cq
    • This calls pollset_add_fd(ps, fd2) function
    • This accesses ps->active_pollable->owner_fd field which is invalid!!

The above example might seem like a corner case but there are other scenarios where it can happen too. For example in the backup_poller code in the tcp_posix endpoint. The backup poller is creates a global pollset whose owner_fd may refer to a channel that is long gone!

Fix

The fix is to synchronize fd_orphan and the two methods that access the owner_fd field in a pollable (i.e pollset_add_fd_locked and pollset_as_multipollable_locked)

@sreecha sreecha requested review from kpayson64 and yang-g June 14, 2018 23:56
Copy link
Copy Markdown
Contributor

@kpayson64 kpayson64 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good investigation!

Just 1 minor comment.

gpr_mu_init(&(*p)->mu);
(*p)->epfd = epfd;
(*p)->owner_fd = nullptr;
gpr_mu_init(&(*p)->owner_orphan_mu);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a corresponding gpr_mu_destroy somewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@sreecha
Copy link
Copy Markdown
Contributor Author

sreecha commented Jun 16, 2018

Added a test as well to catch similar failures in future.
@yang-g and @kpayson64 PTAL

@kpayson64
Copy link
Copy Markdown
Contributor

LGTM

pollset_transition_pollable_from_empty_to_fd_locked(pollset, fd);
gpr_mu_lock(&po_at_start->owner_orphan_mu);
if (po_at_start->owner_orphaned) {
pollset_transition_pollable_from_empty_to_fd_locked(pollset, fd);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add error = ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great catch!. thanks Yang.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I am not even sure how i missed this. I don't remember touching this line :)

@yang-g
Copy link
Copy Markdown
Contributor

yang-g commented Jun 18, 2018

The tsan failure seems real.

@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@grpc grpc deleted a comment from grpc-testing Jun 20, 2018
@sreecha
Copy link
Copy Markdown
Contributor Author

sreecha commented Jun 20, 2018

Addressed the TSAN failures

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                              FILE SIZE
 ++++++++++++++ GROWING                                ++++++++++++++
  +0.0%     +16 [None]                                    +248  +0.0%
  +1.2%    +144 src/core/lib/iomgr/ev_epollex_linux.cc    +144  +1.2%
       +41%     +72 fd_orphan                                  +72   +41%
      +1.6%     +32 pollset_work                               +32  +1.6%
      +3.8%     +25 pollable_create                            +25  +3.8%
       +15%     +16 pollable_unref                             +16   +15%
      +2.2%     +16 pollable_process_events                    +16  +2.2%

  +0.0%    +160 TOTAL                                     +392  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                              FILE SIZE
 ++++++++++++++ GROWING                                ++++++++++++++
  +0.0%     +48 [None]                                    +248  +0.0%
  +1.2%    +144 src/core/lib/iomgr/ev_epollex_linux.cc    +144  +1.2%
       +41%     +72 fd_orphan                                  +72   +41%
      +1.6%     +32 pollset_work                               +32  +1.6%
      +3.8%     +25 pollable_create                            +25  +3.8%
       +15%     +16 pollable_unref                             +16   +15%
      +2.2%     +16 pollable_process_events                    +16  +2.2%

  +0.0%    +192 TOTAL                                     +392  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] Performance differences noted:
Benchmark                                                                                 cpu_time    real_time
----------------------------------------------------------------------------------------  ----------  -----------
BM_StreamingPingPong<InProcess, NoOpMutator, NoOpMutator>/262144/2                        -5%         -5%
BM_StreamingPingPongMsgs<InProcess, NoOpMutator, NoOpMutator>/32768                       -8%         -8%
BM_StreamingPingPongMsgs<MinInProcess, NoOpMutator, NoOpMutator>/32768                    -4%         -4%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/262144/1/1  -7%         -7%
BM_StreamingPingPongWithCoalescingApi<MinInProcess, NoOpMutator, NoOpMutator>/262144/2/1  -4%         -4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/0/2097152                           +4%         +4%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/0/262144                            +8%         +8%
BM_UnaryPingPong<InProcess, NoOpMutator, NoOpMutator>/262144/262144                       +7%         +7%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/0/262144                         +8%         +8%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/262144/262144                    +9%         +9%
BM_UnaryPingPong<MinInProcess, NoOpMutator, NoOpMutator>/32768/0                          +4%         +4%

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                              FILE SIZE
 ++++++++++++++ GROWING                                ++++++++++++++
  +0.0%     +48 [None]                                    +248  +0.0%
  +1.2%    +144 src/core/lib/iomgr/ev_epollex_linux.cc    +144  +1.2%
       +41%     +72 fd_orphan                                  +72   +41%
      +1.6%     +32 pollset_work                               +32  +1.6%
      +3.8%     +25 pollable_create                            +25  +3.8%
       +15%     +16 pollable_unref                             +16   +15%
      +2.2%     +16 pollable_process_events                    +16  +2.2%

  +0.0%    +192 TOTAL                                     +392  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                              FILE SIZE
 ++++++++++++++ GROWING                                ++++++++++++++
  +0.0%     +48 [None]                                    +240  +0.0%
  +1.2%    +144 src/core/lib/iomgr/ev_epollex_linux.cc    +144  +1.2%
       +41%     +72 fd_orphan                                  +72   +41%
      +1.6%     +32 pollset_work                               +32  +1.6%
      +3.8%     +25 pollable_create                            +25  +3.8%
       +15%     +16 pollable_unref                             +16   +15%
      +2.2%     +16 pollable_process_events                    +16  +2.2%

  +0.0%    +192 TOTAL                                     +384  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] No significant performance differences

@sreecha sreecha merged commit e0198fc into grpc:master Jun 25, 2018
@sreecha sreecha deleted the epollex-ownerfd-fix branch June 25, 2018 20:38
@lock lock bot locked as resolved and limited conversation to collaborators Sep 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants