[binder] Use combiner & refs to prevent data race#27182
Conversation
| gbt->transport_stream_receiver->RegisterRecvInitialMetadata( | ||
| tx_code, [tx_code, gbs, | ||
| gbt](absl::StatusOr<grpc_binder::Metadata> initial_metadata) { | ||
| grpc_core::ExecCtx exec_ctx; |
There was a problem hiding this comment.
(Hi @ctiller, this comment was originally left in the client fuzzer thread, but since relevant changes are moved to this PR, I copy it here. Please take a look when you have time. Thanks!)
Hi Craig, we're dealing with combiner and execution context in this PR, and would like your opinion on the following issue here:
The RegisterRecvInitialMetadata function here would register a callback which will be called once a binder transaction comes. The binder transaction will be coming from a separate thread and therefore this callback will be invoked in a non-gRPC thread as well. When this callback is called, we would like to move the received metadata to op->payload and call the corresponding recv_initial_metadata_ready callback. As a result, we would need to instantiate an execution context here, because otherwise the call to ExecCtx::Run() would fail.
We noticed that the HTTP/2 transport does not instantiate any execution context on its own, and also having an execution context in a non-gRPC thread makes the behavior of our transport inconsistent. An example of such inconsistency can be seen from client_fuzzer.cc, where we have to join the thread before the assertion that the cancel operation completes, otherwise grpc_completion_queue_next might return a timeout event. I'm not 100% sure about whether having an execution context in the transport is the root cause of the assertion failure though, but judging from the log it seems like every time the assertion fails, another thread is pushing a callback to the combiner at the same time.
We've tried removing the execution context by implementing something like read_action_locked in HTTP/2 transport, but couldn't make it work because we cannot poll the incoming binder transactions like HTTP/2 does.
What's your opinion on the issue? Is it OK to instantiate execution context in the transport like we did? Or do you have any suggestions on how we might be able to do something similar to polling and have ready callbacks be invoked in the main thread?
There are certain assumptions or statements above that I'm not fully confident about, so please kindly correct me if I'm missing something here.
Thanks!
|
Hi @sifmelcara, can you take a look at this PR? Thanks! |
sifmelcara
left a comment
There was a problem hiding this comment.
Generally looks good to me, please make sure this is also tested in real apps with different kinds of RPC calls
d41d740 to
47a49c4
Compare
No description provided.