Skip to content

Unref global backup poller while under its lock#13670

Merged
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_backup_poller_race
Dec 7, 2017
Merged

Unref global backup poller while under its lock#13670
apolcyn merged 1 commit intogrpc:masterfrom
apolcyn:fix_backup_poller_race

Conversation

@apolcyn
Copy link
Copy Markdown
Contributor

@apolcyn apolcyn commented Dec 7, 2017

Fixes #13272

Tweaks the flakey ruby test to exaggerate the failure in that issue, and should fix the issue (I'm not able to reproduce after the fix).

It looks to me like there's a race between the setting of g_poller to nullptr and a newly malloc'd backup_poller.
I wasn't able to run this under TSAN due to difficulty in building with ruby under TSAN, but still there seems to be a race for example in this scenario:

  1. Thread A calls grpc_client_channel_start_backup_polling and sets g_poller to a new backup poller
  2. Thread A calls grpc_client_channel_stop_backup_polling and then unrefs g_poller->refs, but hasn't acquired g_poller_mu yet.
  3. Thread B calls grpc_client_channel_start_backup_polling and acquires g_poller_mu. It doesn't set g_poller to anything new since it isn't nullptr at this point.
  4. Thread A finally acquires g_poller_mu and sets g_poller to nullptr, since it thinks there are no still refs on it.
  5. Later on, g_poller will be unexpectedly set to nullptr. (under gdb, I consistently saw the segfault within grpc_client_channel_start_backup_polling, where g_poller was not set to a valid pointer).

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                            FILE SIZE
 ++++++++++++++ GROWING                                              ++++++++++++++
  +1.2%     +12 src/core/ext/filters/client_channel/backup_poller.cc     +12  +1.2%
      +5.7%     +12 grpc_client_channel_stop_backup_polling                  +12  +5.7%

 -+-+-+-+-+-+-+ MIXED                                                +-+-+-+-+-+-+-
  -0.0%     -12 [None]                                                   +44  +0.0%

  [ = ]       0 TOTAL                                                    +56  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't you have to unconditionally unlock?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It leaves the old unlock within the if, I think it's ok

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes.

@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] No significant performance differences

@apolcyn apolcyn force-pushed the fix_backup_poller_race branch from a44404b to f7a869f Compare December 7, 2017 17:59
@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Dec 7, 2017

Just updated with small change to the ruby test since it got caught in the ruby style checker

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE                                                            FILE SIZE
 ++++++++++++++ GROWING                                              ++++++++++++++
  +1.2%     +12 src/core/ext/filters/client_channel/backup_poller.cc     +12  +1.2%
      +5.7%     +12 grpc_client_channel_stop_backup_polling                  +12  +5.7%

 -+-+-+-+-+-+-+ MIXED                                                +-+-+-+-+-+-+-
  -0.0%     -12 [None]                                                   +44  +0.0%

  [ = ]       0 TOTAL                                                    +56  +0.0%


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

[microbenchmarks] No significant performance differences

@apolcyn
Copy link
Copy Markdown
Contributor Author

apolcyn commented Dec 7, 2017

basic tests macos failure: #13669
basic tests multi-lang linux failure: #13669

@apolcyn apolcyn merged commit 4ff97d2 into grpc:master Dec 7, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jan 21, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ruby crash in tools/run_tests/helper_scripts/run_ruby_end2end_tests.sh

3 participants