Ruby: refactor init/shutdown logic to avoid using atexit; fix windows #17997

Merged
apolcyn merged 1 commit into grpc:master from apolcyn:fix_ruby_windows
Feb 11, 2019

@apolcyn apolcyn commented Feb 10, 2019

Fixes #17799

It looks like grpc-ruby on Windows is running into the issue described in this interesting blog post. Currently, in the Ruby extension on Windows, Ruby loads the grpc-ruby extension library, and that library sets up an atexit handler to call grpc_rb_shutdown, which then calls into the C-core DLL. Apparently, on Windows each DLL has its own separate atexit stack, and the order of process shutdown goes roughly as follows:

  1. the main thread falls off the end of main and starts to kill off background threads
  2. the DLLs loaded by the process have their own atexit handlers called as part of their unloading

So the grpc-ruby extension's atexit handler calls grpc_shutdown and then waits for the timer threads to signal their completion, but by that point the timer threads have already been killed, and so we hang.

Overall it seems atexit is dubious. So this PR refactors the grpc_init/grpc_shutdown logic as follows:

  1. At the top of the "alloc" hook of every ruby object that has a dependency on C-core, call grpc_ruby_init
  2. At the end of every such object's GC hook, call grpc_ruby_shutdown
  3. Wrap the lifetime of grpc-ruby background threads (the call creds thread and the connection poller thread) in a grpc_ruby_init/grpc_ruby_shutdown pair.
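
The three steps above amount to reference counting the core library's lifetime. The following is a rough Ruby model of that idea, not the actual extension code (which is C). `CoreSketch`, `WrappedObjectSketch`, `release`, and `run_background_thread_sketch` are hypothetical names invented for illustration. The sketch also assumes that the init/shutdown pair is reference counted, so the core only stops once every init has been matched by a shutdown.

```ruby
# Hypothetical stand-in for the core library. init/shutdown are reference
# counted: the core is "live" only while the count is above zero.
module CoreSketch
  @lock = Mutex.new
  @refs = 0
  @events = []

  class << self
    attr_reader :events

    def init
      @lock.synchronize do
        @refs += 1
        @events << :core_started if @refs == 1
      end
    end

    def shutdown
      @lock.synchronize do
        raise 'unbalanced shutdown' if @refs.zero?

        @refs -= 1
        @events << :core_stopped if @refs.zero?
      end
    end

    def refs
      @lock.synchronize { @refs }
    end
  end
end

# Hypothetical stand-in for an object that depends on the core.
class WrappedObjectSketch
  def initialize
    CoreSketch.init # step 1: take a reference before anything else
    @released = false
  end

  # Models the GC hook. It is called explicitly here because Ruby runs
  # real finalizers at unpredictable times.
  def release
    return if @released

    @released = true
    # ... the object's own resources would be freed here ...
    CoreSketch.shutdown # step 2: drop the reference last
  end
end

# Step 3: a background thread holds a reference for its whole lifetime.
# The reference is taken before the thread starts and dropped when it ends.
def run_background_thread_sketch(&work)
  CoreSketch.init
  Thread.new do
    begin
      work.call
    ensure
      CoreSketch.shutdown
    end
  end
end
```

In this model no process-exit hook is needed: the core stops when the last object is released and the last background thread finishes, in whichever order that happens.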

Note that we no longer schedule grpc init and shutdown only once, and I've deleted the comment on why that's needed. But I believe the described scenario would have been broken for a long time anyways :), because there are ruby-level grpc background threads for which we don't make any attempt to cleanly reset their state at the end of a ruby VM's lifetime.

apolcyn commented Feb 11, 2019

cloud to prod interop test failure: #18004

@apolcyn apolcyn merged commit d0d93bd into grpc:master Feb 11, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 12, 2019
Development

Successfully merging this pull request may close these issues.

Ruby client hangs on Windows