Skip to content

Conversation

@wjordan
Copy link
Contributor

@wjordan wjordan commented Oct 22, 2020

Description

This PR fixes a hang on shutdown that occurred intermittently in test_refork (example). I reproduced the hang more consistently running test_refork with an extra sleep after the 'b' (boot) message and before it starts the server (e.g., line 106 below):

begin
@worker_write << "b#{Process.pid}:#{index}\n"
rescue SystemCallError, IOError
Thread.current.purge_interrupt_queue if Thread.current.respond_to? :purge_interrupt_queue
STDERR.puts "Master seems to have exited, exiting."
return
end
while restart_server.pop
server_thread = server.run

The PR contains two small fixes.

  1. Prevent server from starting if worker receives TERM signal (0018edd). This handles some (but not 100% of all) edge cases where the TERM signal is received and calls server.stop before the server has started.
  2. Send TERM to fork_worker child processes in Cluster#wait_workers (aae4d4d) - When fork_worker is used, the call to Process.wait in wait_workers returns ECHILD for forked workers. This commit adds a call to w.term in the rescue clause, matching the behavior when fork_worker is not used and ensuring workers shut down properly. This should handle all other edge-cases where a worker receives TERM before the server starts.

Finally, note that this bug started occurring more frequently after #2435, because that PR changed behavior so that calling #stop before #run now has no effect. I've added test_command_ignored_before_run to clarify this new behavior in a unit test. I think this is more intuitive and predictable behavior overall, but we can change it back if anyone disagrees or if the change continues to cause any other issues.

If a worker receives a `TERM` signal before the server has started,
prevent it from starting at all by clearing `restart_server`.
When `fork_worker` is used, the call to `Process.wait` in `wait_workers`
returns `ECHILD` for forked workers.
Add a call to `w.term` in the rescue clause to match the behavior when
`fork_worker` is not used, ensuring workers shut down properly.
Unit test clarifying behavior of commands sent before Server#run is called.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug maintenance waiting-for-review Waiting on review from anyone

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants