
Conversation

@evanphx
Member

@evanphx evanphx commented Apr 27, 2017

Talking through this with @nateberkopec, @schneems, @FooBarWidget, and others at RailsConf, we realized that accepting new clients without taking into account the available capacity of the thread pool doesn't improve throughput; it only hurts it in the case of multiple workers. If a worker pauses (or starts up before other workers), then it can accidentally suck up a high number of
clients and leave unused capacity inside the other workers.

This change will smooth out this issue, with a minor penalty to maximum throughput.
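
In code terms, the accept loop now blocks in ThreadPool#wait_until_not_full until the pool has spare capacity before pulling the next client off the socket. A rough sketch of the new check (reconstructed from the excerpts quoted later in this thread; not a verbatim copy of the commit):

def wait_until_not_full
  @mutex.synchronize do
    while true
      return if @shutdown

      # A thread is already idle and waiting for work, so we can accept another client.
      return if @waiting > 0

      # If we can still spin up new threads and there is work queued,
      # then accept more work until we would spin up the max number of threads.
      return if @todo.size < @max - @spawned

      @not_full.wait @mutex
    end
  end
end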

@nateberkopec
Member

Linking #1107

@nateberkopec nateberkopec merged commit 482ea5a into master Apr 28, 2017
@nateberkopec
Member

I think this could really improve performance for a lot of people, so I'm going to start looking at packaging up a new release.

@ivoanjo

ivoanjo commented May 30, 2017

@evanphx and @nateberkopec, thanks for looking into this and looking forward to a puma release :)

@nateberkopec
Member

👍 Since the JRuby bug on master was fixed, I think I can push a 3.9.0 soon.

@waynerobinson

This will also resolve some potential deadlocking issues for APIs that can sometimes end up calling themselves (e.g. authentication systems that use microservices which themselves need authenticating).

👍👍

@joealba

joealba commented Jun 13, 2017

@evanphx @schneems Do you recommend this upgrade for apps that are not yet threadsafe? And does that answer change if the app is also behind the Heroku router?

If I have N dynos with X workers each -- and the Heroku router picks a single busy worker in each of my N dynos, would that request fail due to the combination of this PR and a non-threadsafe app?

(I've tested this theory with ab and 4x higher concurrency than the number of workers, and I saw no failed requests. But I thought it prudent to ask you folks as well.)


@schneems
Contributor

apps that are not yet threadsafe?

I actually don't recommend puma for apps that aren't threadsafe, unless it's a step to getting the app threadsafe. I recommend the rack-freeze gem for threadsafety in the middleware (it's a great gem).

I've seen performance issues with Puma using X workers and 1 thread compared to Unicorn with X workers. I'm not totally sure why, but I've seen this several times. You could also try Passenger.

I don't think this patch would hurt if you're only using 1 thread, but I've seen other perf issues with that setup. I would recommend either putting in the leg work to get the app threadsafe, or moving to a different server.
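
For the middleware piece, pulling in rack-freeze is just a Gemfile entry (hypothetical snippet):

# Gemfile
gem 'rack-freeze'  # freezes middleware so accidental mutation raises instead of racing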


  pool << client
- pool.wait_until_not_full unless queue_requests
+ pool.wait_until_not_full


Was there a reason queue_requests was removed here?

Member

I think the reason is explained adequately in the OP?

# If we can still spin up new threads and there
# is work queued, then accept more work until we would
# spin up the max number of threads.
return if @todo.size < @max - @spawned
@ivanpesin ivanpesin Oct 24, 2017

This is not equivalent to the initial condition. With this change, Puma serving a single-threaded app with "threads 1,1" attempts to pass a request to a worker that is already serving a request. We're seeing occasional deadlocks for requests coming right after a long-running request in our environment with a "threads 1,1", "queue_requests false", and "workers 5" configuration. Rolling this single commit back to the original solves the issue completely.
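
For reference, a config reproducing that setup would look something like this (hypothetical file, settings as described above):

# config/puma.rb
workers 5
threads 1, 1
queue_requests false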

Member

@nateberkopec nateberkopec Nov 7, 2017

In the case of a single-threaded worker, @waiting will be 0, @todo.size will be 1, @max will be 1 and @spawned will be 1, so the condition evaluates to 1 < 1 - 1, i.e. 1 < 0, which is false, so Puma waits here.
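
Spelling that out with those values (a quick sanity check, not code from the PR):

waiting, todo_size, max, spawned = 0, 1, 1, 1

todo_size < max - spawned  # => 1 < 0 => false, so no early return here
waiting > 0                # => false, so no early return here either
# the loop falls through to @not_full.wait, and Puma blocks until the single thread frees up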


The problem is in this line:
+ return if @waiting > 0

This is not how the original condition worked: until @todo.size - @waiting < @max - @spawned or @shutdown. The fact that @waiting > 0 is not enough to decide against doing @not_full.wait @mutex.

I prepared a simple test scenario which demonstrates the problem: https://gist.github.com/ivanpesin/f40a76e51a60a83f1190113fdecddaa5

With this commit, the second request always gets blocked until the first request is served, even though there are available workers. The third request goes to an available worker.

With the original condition, the second request goes to the next available worker, as expected. The condition needs either to be reverted to its original form or to have its logic corrected.
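
To make the divergence concrete, here is a hypothetical pool state (made-up values) where the two checks disagree:

# 5 threads, all spawned; 2 still counted as waiting; 7 items queued.
todo_size, waiting, max, spawned, shutdown = 7, 2, 5, 5, false

# Early exit from this PR: a positive @waiting is treated as spare capacity,
# so wait_until_not_full returns and the server accepts more connections.
waiting > 0                                        # => true

# Original condition: outstanding work (todo - waiting) is not below the
# remaining headroom (max - spawned), so keep blocking.
todo_size - waiting < max - spawned || shutdown    # => 5 < 0 => false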


We also had this issue.

Our setup uses 16 workers with 1 thread each, running behind an nginx server. We were getting high wait times between nginx and Rails, as measured by a middleware similar to the one at #1405 (comment). We partially undid the change in this pull request with the patch below, and our wait times dropped, as can be seen in the attached screenshot.

# Revert ThreadPool#wait_until_not_full to the pre-#1278 condition.
require 'puma/thread_pool'

module PumaMonkeypatch
  def wait_until_not_full
    @mutex.synchronize do
      # Block until outstanding work (@todo.size - @waiting) fits within the
      # remaining thread headroom (@max - @spawned), or we are shutting down.
      until @todo.size - @waiting < @max - @spawned or @shutdown
        @not_full.wait @mutex
      end
    end
  end
end

Puma::ThreadPool.prepend(::PumaMonkeypatch)

[screenshot: wait times between nginx and Rails dropping after applying the patch]

Thoughts on how to proceed? Should we file a new issue?

Member

Should we file a new issue?

Please do.


Thanks. Ticket created: #1471

@eprothro
Contributor

eprothro commented Nov 8, 2017

If I'm understanding this change correctly:

Previously
If queue_requests was set, Puma would eagerly accept connections, irrespective of the availability of worker threads in the pool to process them.

Now
Puma will only accept a connection if there is an available worker thread in the pool.

In effect, this leaves requests unaccepted (in the socket) instead of accepted and waiting for processing by the worker ("in" the queue_requests thread). And ultimately, as stated in the OP, this would result in better load balancing in a multi-worker configuration in certain scenarios.

Around the water cooler, it seems some are concerned that queue_requests has been deprecated or no longer has any effect.

To clarify, my understanding is that there is still benefit in having the queue_requests option, as it allows requests to be received without blocking a worker thread from handling another request that has already been received. That has not changed; only how many requests can be in this state (being handled by the queue_requests thread before being handed to a worker thread) has changed.

Is this correct?

@nateberkopec
Member

@eprothro

To clarify, my understanding is that there is still benefit in having the queue_requests option, as it allows requests to be received without blocking a worker thread from handling another request that has already been received. That has not changed; only how many requests can be in this state (being handled by the queue_requests thread before being handed to a worker thread) has changed.

That is correct.
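
In config terms (hypothetical example; queue_requests remains an ordinary option after this change):

# config/puma.rb
workers 2
threads 1, 5
queue_requests true   # the default; slow clients are buffered off the worker threads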

dannyfallon added a commit to dannyfallon/puma that referenced this pull request Apr 17, 2018
As part of the work in puma#1278 we refactored the conditional in
`ThreadPool#wait_until_not_full` because we found it difficult to reason about
at the RailsConf '17 roundtable. We broke it up a bit and everyone went away
happier. However, it turns out that you need to account for both `@waiting` and
`@todo.size` together to make a determination on whether to accept more work
from the socket, and the refactor was not equivalent to the old code.

There are no guarantees that request threads that have been signalled due to
work being added to the thread pool's queue will become active, take the mutex
and decrement `@waiting` before the main worker's thread runs
`ThreadPool#wait_until_not_full`. If we run that before the request thread
decrements `@waiting`, we end up hitting the early exit of `return if @waiting > 0`,
taking more work off the socket than the pool has capacity for.

The correct way to determine the number of requests we have queued in the pool
that are outstanding is `@todo.size - @waiting`, so I've reverted to the old
behaviour and augmented the comment above it.
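
For reference, the restored check has the same shape as the monkeypatch quoted earlier in the thread, with the comment spelled out:

def wait_until_not_full
  @mutex.synchronize do
    # Outstanding work is @todo.size - @waiting: queued items minus the idle
    # threads that will pick them up. Only stop blocking once that fits within
    # the remaining headroom (@max - @spawned), or on shutdown.
    until @todo.size - @waiting < @max - @spawned or @shutdown
      @not_full.wait @mutex
    end
  end
end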
davidor added a commit to 3scale/puma that referenced this pull request Jul 10, 2020
In PR puma#1278 this was changed to avoid using the "queue_requests" param,
so workers only accept new connections if they have a free thread. This
commit reverts that.

I guess that change was beneficial for the most typical deployment
scenarios, but it has a noticeable performance penalty in the case of
Apisonator, given that it is single-threaded. Even with a moderate number
of connections there are lots of time-outs.
davidor added a commit to 3scale/puma that referenced this pull request Jul 13, 2020
davidor added a commit to 3scale/puma that referenced this pull request Feb 11, 2021
eguzki pushed a commit to 3scale/puma that referenced this pull request Nov 16, 2021