Conversation

@sriedel (Contributor) commented Nov 28, 2016

This pull request changes a few things in the cluster:

  • Move the worker check interval into a constant (please check whether all occurrences of 5 were indeed meant to indicate the same interval)
  • Move the killing of workers out of the signal handler into a #cull_workers method called from the #check_workers method
  • Skip processing in #spawn_workers and #cull_workers if there is no difference between the actual and the expected number of workers (to deal with the diff now being able to be < 0 in #spawn_workers)
  • When a worker is culled, it is logged (takes care of "TTOU doesn't log anything" #1147)

This of course changes Puma's behaviour when dealing with TTOU signals: they are no longer processed immediately, but only once the next WORKER_CHECK_INTERVAL expires. On the upside, multiple TTOU signals can be handled at once, and the signal handler becomes leaner and more symmetric to the TTIN signal handler.
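A rough sketch of the reworked flow (helper names, the log message and the instance variables here are illustrative only, not the exact Puma::Cluster code in this PR):

```ruby
# Sketch only -- spawn_worker, log and the instance variables are stand-ins
# for the real Puma::Cluster internals.
WORKER_CHECK_INTERVAL = 5 # seconds; replaces the scattered literal 5s

def check_workers
  @workers.delete_if(&:dead?)
  cull_workers   # culling now happens here instead of in the TTOU trap
  spawn_workers
end

def cull_workers
  diff = @workers.size - @options[:workers]
  return if diff < 1                  # skip when nothing needs culling

  log "- Culling #{diff} worker(s)"   # culled workers are now logged (#1147)
  @workers.last(diff).each(&:term)
end

def spawn_workers
  diff = @options[:workers] - @workers.size
  return if diff < 1                  # skip when nothing needs spawning

  diff.times { spawn_worker }         # spawn_worker stands in for the real spawn logic
end
```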

Unfortunately I did not come up with a way to add tests for these changes (or for the signal handlers themselves) without changing the internal structure of Puma::Launcher and Puma::Cluster. If you can give me pointers on how you'd like this tested, I'd be happy to add the tests.


@workers.delete_if(&:dead?)

cull_workers

Member

Is there really ever a case where we should cull and spawn?

Member

Maybe this should be rearchitected so that cull_workers and spawn_workers take an integer argument, and then you calculate the value to give to those methods here. You could then call cull_workers(1) in your TTOU signal trap.
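Something along these lines, just to sketch the suggestion (not code from this PR; log and term are assumed from the existing cluster code):

```ruby
# Sketch of the suggested alternative: the methods take an explicit count.
def cull_workers(count)
  count = [count, @workers.size - 1].min   # never cull the last worker
  return if count < 1

  log "- Culling #{count} worker(s)"
  @workers.last(count).each(&:term)
end

Signal.trap "TTOU" do
  @options[:workers] -= 1 if @options[:workers] >= 2
  cull_workers(1)
  wakeup!
end
```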

Contributor Author

As I understand it, if the signals come fast enough, they can "queue up" before an indirect handler such as this one has a chance to run. But you're right, maybe having a difference counter is a better approach. I'll have a look at it this week.

Member

I think this approach is fine, no need to combine them for now.

Signal.trap "TTOU" do
  @options[:workers] -= 1 if @options[:workers] >= 2
  @workers.last.term
  wakeup!

Member

So this ends up delaying the response to TTOU until the next run of check_workers, right?

Member

Derp, you said that in your description.

@nateberkopec (Member)

> multiple TTOU signals can be handled at once

Can you explain a little more about how/why this might occur?

@sriedel (Contributor Author) commented Dec 19, 2016

It could happen if the signals are fired in quick succession (and maybe even by some automated process that wants to scale out by, say, 5 workers). If the signal handler isn't fast enough and the process is busy, signals can get dropped by the kernel, or worse, the signal handler can be interrupted by the same signal arriving again (so we have a race condition in the signal handler).

The more I think about it, the more it seems it would be better to rearchitect the entire signal handling block to use a self-pipe (see https://www.sitepoint.com/the-self-pipe-trick-explained/) in order to minimize the potential for race conditions (which already exists with the worker counter increment/decrement code as it stands). But that may be too much change in one go.
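Roughly like this (an untested sketch with simplified names, not a concrete proposal for Puma's internals):

```ruby
# Self-pipe sketch: the trap handlers only write a byte to a pipe; all real
# work happens in the main loop, outside signal context.
read_pipe, write_pipe = IO.pipe

Signal.trap("TTIN") { write_pipe.write_nonblock("+") }
Signal.trap("TTOU") { write_pipe.write_nonblock("-") }

loop do
  # Wake up either for a pending signal byte or when the check interval expires.
  if IO.select([read_pipe], nil, nil, WORKER_CHECK_INTERVAL)
    case read_pipe.read_nonblock(1)
    when "+" then @options[:workers] += 1
    when "-" then @options[:workers] -= 1 if @options[:workers] >= 2
    end
  end
  check_workers
end
```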

Would you rather have a second pull request introducing a self pipe, or should I try and rewrite everything at once?

@nateberkopec (Member)

> and maybe even by some automated process that wants to scale out by, say, 5 workers

Ah right, this is probably a fairly common case.

I have no opinion on whether a self pipe is better or worse here. Not including one in this PR doesn't make the code any worse than it already is, so let's do the changes separately.

@evanphx (Member) commented Dec 21, 2016

:shipit:

@nateberkopec (Member)

You heard the man! MERGE!

@nateberkopec nateberkopec merged commit ba7fa8a into puma:master Dec 21, 2016
