Resolve various races in QPS sync client #13576

Merged
vjpai merged 5 commits into grpc:master from vjpai:qps_killer on Dec 4, 2017

Conversation

@vjpai (Contributor) commented Nov 30, 2017

Fixes #13122 and various related issues

This resolves several kinds of races in the QPS sync client:

  1. Check whether a specific RPC client thread needs to be shut down before it has even started (analogous to #13409, "Fix TSAN json_run_localhost flake", for the async client)
  2. Move end-of-test stream cancellation to DestroyMultithreading (which is called before the client has to shut down) rather than the client destructor. Otherwise, test cases such as streaming_from_server could get stuck forever: a client thread still blocked in a Read op could never escape it to reach the thread join, creating a deadlock
  3. Don't let an RPC completion trigger a new RPC if the thread has been asked to shut down in the meantime
  4. Don't let an RPC finishing trigger a refresh of the ClientContext if the thread has been asked to shut down in the meantime
  5. Protect stream-changing operations with a lock so that a cancellation issued from a different thread doesn't conflict with a regular stream operation in the thread that normally handles the stream

Additionally, this PR refactors a common code path by which a "cleanup" operation is applied to all streams at different points.
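As an illustration of point 5, here is a minimal standalone sketch of lock-protected stream cancellation. FakeContext and FakeStream are hypothetical stand-ins; only the field names stream_mu_, stream_, and context_ mirror the actual QPS client code.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

// Sketch of race (5): a cancelling thread and a stream's worker thread
// must never touch the same stream concurrently, so every stream-changing
// operation takes that stream's mutex. FakeContext/FakeStream are
// stand-ins for the real gRPC ClientContext and stream types.
struct FakeContext {
  bool cancelled = false;
  void TryCancel() { cancelled = true; }
};
struct FakeStream {
  bool Read() { return true; }  // a real Read would block on the wire
};

class Client {
 public:
  explicit Client(size_t n) : stream_mu_(n), stream_(n), context_(n) {
    for (auto& s : stream_) s = std::make_unique<FakeStream>();
  }
  // Worker-thread path: a regular stream op, guarded by the stream's lock.
  bool DoRead(size_t i) {
    std::lock_guard<std::mutex> l(stream_mu_[i]);
    return stream_[i] && !context_[i].cancelled && stream_[i]->Read();
  }
  // End-of-test path: cancel under the same per-stream lock, so the
  // cancel cannot interleave with a Read in progress on another thread.
  void CancelAll() {
    for (size_t i = 0; i < stream_.size(); ++i) {
      std::lock_guard<std::mutex> l(stream_mu_[i]);
      if (stream_[i]) context_[i].TryCancel();
    }
  }

 private:
  std::vector<std::mutex> stream_mu_;
  std::vector<std::unique_ptr<FakeStream>> stream_;
  std::vector<FakeContext> context_;
};
```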

@grpc-testing

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing

[trickle] No significant performance differences

@grpc-testing

[microbenchmarks] No significant performance differences


@vjpai changed the title from "Attempt to resolve sync qps test timeouts" to "Resolve various races in QPS sync client" on Dec 4, 2017

@vjpai (Contributor, Author) commented Dec 4, 2017

The flakes on this PR have nothing to do with QPS testing. This is ready to review.

@kpayson64 (Contributor) left a comment

Do you mind adding a quick comment in the PR description about the nature of the races? (Handling Shutdown before initialization should be fine, if I understand this correctly.)

Also, what do you think about changing the cleaner lambda to an overridable function? Feel free to say "lambdas are lit, get with the times, old man".

cleanup_threads.emplace_back([this, i, cleaner]() {
  std::lock_guard<std::mutex> l(stream_mu_[i]);
  if (stream_[i]) {
    shutdown_[i].val = cleaner(&context_[i], stream_[i].get());
Contributor

I'm a little confused by this. What happens if the stream is not initialized by the time we call OnAllStreams? Then shutdown would never get set. It looks like this PR addresses a race condition where we shut down before we finish initialization, but it looks like it's impossible for that to happen.

Contributor Author

Good point, I guess that case wasn't happening yet in practice. I've resolved this.

cleanup_threads.emplace_back([this, i, cleaner]() {
  std::lock_guard<std::mutex> l(stream_mu_[i]);
  if (stream_[i]) {
    shutdown_[i].val = cleaner(&context_[i], stream_[i].get());
Contributor

It looks like all the lambdas return true. Can we just assign shutdown_[i].val = true for simplicity? (At first I was confused about what happens if the cleaner fails.)

Contributor Author

Sounds good. My original intention was to make this a general map-operation-on-streams, but there is no need for such generality now (or possibly ever). Changed.

@vjpai (Contributor, Author) commented Dec 4, 2017

I've also edited the PR description with details about the races being resolved, and changed the lambdas to a virtual function. I was originally resistant to doing that because the base class SynchronousStreamingClient actually called OnAllStreams from two different places: DestroyMultithreading and the destructor. So I pushed the destructor part down to the derived classes and kept only the version that is called when destroying multithreading. Again, it's not the most general thing to do, but it should be fine here.
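The shape being described (end-of-test cleanup driven from DestroyMultithreading, with one cleanup thread per stream and the simplified shutdown_[i].val = true) might be sketched as follows. Only the field names quoted in the review are real; the rest is a hypothetical illustration, not the actual benchmark code.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative-only sketch: end-of-test cleanup runs from
// DestroyMultithreading (i.e. before the worker threads are joined),
// spawning one cleanup thread per stream that acts under the stream's
// lock and marks the stream shut down.
struct Shutdown {
  bool val = false;
};

class SyncStreamingClient {
 public:
  explicit SyncStreamingClient(size_t n) : stream_mu_(n), shutdown_(n) {}

  void DestroyMultithreading() {
    std::vector<std::thread> cleanup_threads;
    for (size_t i = 0; i < stream_mu_.size(); ++i) {
      cleanup_threads.emplace_back([this, i]() {
        std::lock_guard<std::mutex> l(stream_mu_[i]);
        // TryCancel() on the real per-stream ClientContext would go here.
        shutdown_[i].val = true;  // every cleaner returned true anyway
      });
    }
    for (auto& t : cleanup_threads) t.join();
    // Worker threads can now be joined safely: any blocked stream op has
    // been cancelled, and no new RPC starts once val is set.
  }

  bool IsShutdown(size_t i) const { return shutdown_[i].val; }

 private:
  std::vector<std::mutex> stream_mu_;
  std::vector<Shutdown> shutdown_;
};
```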

@vjpai (Contributor, Author) commented Dec 4, 2017

Wait, my last change was dumb. I should have tested it locally first, will fix.


@vjpai (Contributor, Author) commented Dec 4, 2017

I've changed my mind. Lambdas are lit. This is ready for re-review. I think the new form of the lambdas is better.


@grpc-testing

Corrupt JSON data (indicates timeout or crash): 
    bm_fullstack_unary_ping_pong.BM_UnaryPingPong_MinInProcessCHTTP2_NoOpMutator_NoOpMutator__1_0.counters.old: 1


[microbenchmarks] No significant performance differences

@vjpai (Contributor, Author) commented Dec 4, 2017

Again, no QPS-related flakes!

@vjpai merged commit c3b1e55 into grpc:master on Dec 4, 2017
@vjpai deleted the qps_killer branch on December 4, 2017 at 20:03
The lock bot locked this as resolved and limited conversation to collaborators on Jan 21, 2019

Development

Successfully merging this pull request may close these issues.

json_run_localhost:cpp_protobuf_sync_streaming_from_server_qps_unconstrained_secure_low_thread_count timeout (on ASAN and TSAN)

3 participants