Skip to content

[binder] Use combiner & refs to prevent data race#27182

Merged
TaWeiTu merged 1 commit intogrpc:masterfrom
TaWeiTu:combiner
Sep 9, 2021
Merged

[binder] Use combiner & refs to prevent data race#27182
TaWeiTu merged 1 commit intogrpc:masterfrom
TaWeiTu:combiner

Conversation

@TaWeiTu
Copy link
Copy Markdown
Contributor

@TaWeiTu TaWeiTu commented Aug 30, 2021

No description provided.

gbt->transport_stream_receiver->RegisterRecvInitialMetadata(
tx_code, [tx_code, gbs,
gbt](absl::StatusOr<grpc_binder::Metadata> initial_metadata) {
grpc_core::ExecCtx exec_ctx;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Hi @ctiller, this comment was originally left in the client fuzzer thread, but since relevant changes are moved to this PR, I copy it here. Please take a look when you have time. Thanks!)

Hi Craig, we're dealing with combiner and execution context in this PR, and would like your opinion on the following issue here:

The RegisterRecvInitialMetadata function here would register a callback which will be called once a binder transaction comes. The binder transaction will be coming from a separate thread and therefore this callback will be invoked in a non-gRPC thread as well. When this callback is called, we would like to move the received metadata to op->payload and call the corresponding recv_initial_metadata_ready callback. As a result, we would need to instantiate an execution context here, because otherwise the call to ExecCtx::Run() would fail.

We noticed that the HTTP/2 transport does not instantiate any execution context on its own, and also having an execution context in a non-gRPC thread makes the behavior of our transport inconsistent. An example of such inconsistency can be seen from client_fuzzer.cc, where we have to join the thread before the assertion that the cancel operation completes, otherwise grpc_completion_queue_next might return a timeout event. I'm not 100% sure about whether having an execution context in the transport is the root cause of the assertion failure though, but judging from the log it seems like every time the assertion fails, another thread is pushing a callback to the combiner at the same time.

We've tried removing the execution context by implementing something like read_action_locked in HTTP/2 transport, but couldn't make it work because we cannot poll the incoming binder transactions like HTTP/2 does.

What's your opinion on the issue? Is it OK to instantiate execution context in the transport like we did? Or do you have any suggestions on how we might be able to do something similar to polling and have ready callbacks be invoked in the main thread?

There are certain assumptions or statements above that I'm not fully confident about, so please kindly correct me if I'm missing something here.

Thanks!

@TaWeiTu
Copy link
Copy Markdown
Contributor Author

TaWeiTu commented Sep 6, 2021

Hi @sifmelcara, can you take a look at this PR? Thanks!

Copy link
Copy Markdown
Contributor

@sifmelcara sifmelcara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good to me, please make sure this is also tested in real apps with different kinds of RPC calls

@TaWeiTu TaWeiTu force-pushed the combiner branch 3 times, most recently from d41d740 to 47a49c4 Compare September 8, 2021 09:31
@TaWeiTu TaWeiTu merged commit 27eae53 into grpc:master Sep 9, 2021
ashithasantosh pushed a commit to ashithasantosh/grpc that referenced this pull request Sep 10, 2021
lidizheng pushed a commit to lidizheng/grpc that referenced this pull request Sep 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

lang/core release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants