Conversation

@bailsman
Contributor

Issue #612

This adds a configuration option `queue_requests`.

With `queue_requests` set to true (the default), workers accept all
requests and queue them before passing them to the handlers.
With it set to false, each worker process accepts exactly as
many requests as it is configured to handle simultaneously.

In combination with `threads 1, 1`, this ensures that requests
are balanced across workers in a single-threaded application.
This can avoid deadlocks when a single-threaded app sends a
request to itself (for example, to generate a PDF).
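For reference, the option would be set in a Puma config file roughly like this (illustrative values only; `config/puma.rb` is just the conventional path):

```ruby
# config/puma.rb (illustrative values only)
workers 2          # several single-threaded worker processes
threads 1, 1       # exactly one thread per worker

# With queue_requests disabled, a worker does not pull new requests off the
# socket until it has a free thread to run them on, so requests are balanced
# across workers instead of being buffered inside one of them.
queue_requests false
```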

evanphx added a commit that referenced this pull request Jan 20, 2015
…deadlock

Add a configuration option that prevents puma from queueing requests.
@evanphx evanphx merged commit bb2aeb8 into puma:master Jan 20, 2015
@bailsman bailsman deleted the issue-612-dependent-requests-deadlock branch January 20, 2015 22:29
schneems added a commit that referenced this pull request Oct 15, 2024
- Short: Removing this feature makes the internals easier to reason about, and I do not believe there is a use case for disabling the reactor that we want to support. Research also suggests that disabling the reactor was an implementation detail of an initial effort to configure the ACCEPT behavior, and over time the option morphed into effectively meaning "disable the reactor".

- Long: 

## Background

The `queue_requests` config was originally introduced in #640 in 2015. It is marked as closing this issue #612 (opened in 2014, ten years ago). 

The use case given in the issue for this option (disabling the reactor) was this:

A worker with 1 thread needs to send a request to itself to finish; the example given is downloading a PDF from the same server. If the reactor is accepting requests as fast as it can, the same worker that sent the request might accept it, but since that worker cannot complete the first request (it is waiting on the second), the second request never executes, i.e. it deadlocks.
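To make the scenario concrete, here is a minimal, hypothetical Rack app (not code from the issue) that exhibits the pattern; run with a single worker and `threads 1, 1` it can deadlock as described:

```ruby
# config.ru: hypothetical app illustrating the self-request deadlock.
# The outer "/report" request holds the worker's only thread while it waits
# on the inner "/report.pdf" request; if that same worker has already
# accepted the inner request, it can never run.
require "net/http"

run lambda { |env|
  if env["PATH_INFO"] == "/report"
    # Outer request: fetch a PDF from this same server (Puma's default port).
    pdf = Net::HTTP.get(URI("http://localhost:9292/report.pdf"))
    [200, { "content-type" => "application/pdf" }, [pdf]]
  else
    # Inner request: pretend to render the PDF.
    [200, { "content-type" => "application/pdf" }, ["%PDF-1.4 ..."]]
  end
}
```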

The problem described does not come from having a reactor that buffers request bodies, but rather from the application ACCEPT-ing a new request from the socket before it has capacity to process it. Disabling the reactor was effectively an implementation detail.

The issue was opened nearly 10 years ago, and the justification given for wanting to run with one max thread was that "not all code is thread safe". In the decade since, the Ruby community has come a long way: Sidekiq's threaded job queue has become the industry standard, and it has been an expectation for quite some time that Ruby code should be thread safe. An application can avoid this deadlock situation by either not requiring this functionality (making requests to itself) or by running with more threads. It's also worth noting that the possibility of deadlock is not zero when moving to 2 threads, since both of those threads might be executing a "give me a PDF" request, but the chances are decreased. I do not believe this is a primary use case we need to support, and further, we've since mitigated the issue in other ways.
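In config terms, the "more threads" mitigation amounts to something like this (illustrative values only):

```ruby
# config/puma.rb (illustrative values only)
# With more than one thread per worker, a request that calls back into the
# same server can usually be served by another thread, so the deadlock
# becomes unlikely, though not impossible if every thread happens to be
# waiting on a self-request at the same time.
workers 2
threads 5, 5
```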

In the same PR a `wait_until_not_full` method was introduced, but it was only called if `queue_requests` was disabled. The method still lives on today, and is now called unconditionally before the socket is read. https://github.com/puma/puma/blob/a59afabe782ec8869bb709ca991b32c0b2cc95fd/lib/puma/server.rb#L363

What does `wait_until_not_full` do? It implements what I call "anti-thundering-herd" measures: a worker first checks whether it has capacity to handle a new request (i.e. whether all of its threads are busy) and only when it has capacity does it ACCEPT a new request from the socket. The thundering-herd situation looks like this: Puma boots with multiple workers, one worker boots before the others and accepts more requests than it can process, and the other workers boot with nothing to work on and sit idle. However, this didn't always work as intended; 9690d8f is one fix. I was present at the 2017 roundtable mentioned in that commit. Since then, this logic has been relatively stable.
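To illustrate the shape of that check, here is a simplified, hypothetical sketch (the class and method names are illustrative; Puma's real implementation lives in its thread pool and differs in detail):

```ruby
# Hypothetical sketch of the "check capacity before ACCEPT" idea.
class CapacityGate
  def initialize(max_threads)
    @max_threads = max_threads
    @busy        = 0
    @mutex       = Mutex.new
    @not_full    = ConditionVariable.new
  end

  # Block the accept loop until at least one thread is free.
  def wait_until_not_full
    @mutex.synchronize do
      @not_full.wait(@mutex) while @busy >= @max_threads
    end
  end

  # Called when a thread picks up a request.
  def checkout
    @mutex.synchronize { @busy += 1 }
  end

  # Called when a thread finishes a request.
  def checkin
    @mutex.synchronize do
      @busy -= 1
      @not_full.signal
    end
  end
end

# In the accept loop, roughly:
#   gate.wait_until_not_full
#   client = server_socket.accept
```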

Since that was introduced, we also added `wait_for_less_busy_worker`, which adds a tiny sleep to busy workers to handle a related yet different scenario caused by OS thread scheduling: the OS might favor scheduling a busy worker rather than one with capacity to take new requests. More details are in the PR where it was added, #2079. The solution was to add a really tiny sleep to the busy workers to hint to the OS to let the less busy ones be scheduled. This became the default.
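The idea can be sketched roughly like this (a hypothetical simplification, not Puma's actual implementation; the delay value is illustrative):

```ruby
# Hypothetical simplification of the "tiny sleep for busy workers" idea.
# A worker that already has requests in flight yields the CPU briefly before
# trying to accept, so the OS is more likely to schedule an idle worker to
# win the accept race.
DELAY = 0.005 # seconds (illustrative)

def wait_for_less_busy_worker(busy_threads)
  sleep(DELAY) if busy_threads > 0
end

# In the accept loop, roughly:
#   wait_for_less_busy_worker(pool_busy_threads)
#   client = server_socket.accept
```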

## Queue requests confusion

The `queue_requests` config, which was originally introduced to gate ACCEPT behavior, has come to mean "disable the reactor" in the current codebase as defaults have changed and morphed over the years.

While the original docs introduced in that PR hint that disabling it deactivates keepalive functionality:

>      # Note that setting this to false disables HTTP keepalive and
>      # slow clients will occupy a handler thread while the request
>      # is being sent. A reverse proxy, such as nginx, can handle
>      # slow clients and queue requests before they reach puma.

It wasn't exactly clear what that meant, i.e. when a keepalive request is made to puma with `queue_requests` disabled, what should the expected behavior be?

The released version of puma checks for several conditions before determining whether or not to send a "close" header to a keepalive request https://github.com/puma/puma/blob/d2b3b309bf76f252819cf6c89d72249895daf629/lib/puma/request.rb#L167-L176. The ability to deactivate this keepalive behavior is recent and is staged on main; it was introduced in #3496. Notably, even HEAD does not deactivate this behavior based on the state of `queue_requests`.

The current behavior is that, while that logic returns `true` and the request is not placed back into the reactor (which is not possible when `queue_requests` is false), the thread will loop and continue to accept keepalive requests over the same connection, because this code https://github.com/puma/puma/blob/9712d29bc156a7fa05a6f4ec6deda5ac20df50be/lib/puma/server.rb#L468-L470 is inside of a while loop.
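As a rough, hypothetical sketch of that loop (the helper names are made up; the real code is in `Puma::Server`):

```ruby
# Hypothetical sketch of the in-thread keepalive loop. When the connection
# cannot be handed back to the reactor (queue_requests false), the same
# handler thread keeps serving requests on this connection until the client
# stops sending them or the connection is marked "close".
def handle_connection(client)
  loop do
    request  = read_request(client)        # hypothetical helper; blocks on a slow client
    response = call_app(request)           # hypothetical helper: run the Rack app
    write_response(client, response)       # hypothetical helper
    break unless request.keepalive?        # "Connection: close" ends the loop
  end
ensure
  client.close
end
```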

## Keepalive connection

This looping behavior for keepalive connections is not limited to when the reactor is disabled. There is currently a bug where keepalive requests effectively monopolize a thread for up to 10 sequential requests (by default), which is described in #3487.

All signs point to needing to put the keepalive request into the reactor as part of the mitigation. However, as pointed out in this comment #3506 (comment), when request queuing is disabled we are left with a difficult path forward: we can either continue monopolizing the thread and preserve the behavior that exists today, or close the connection, which aligns with the docs saying that this setting will not work with keepalive requests. Rather than make that determination for the user, this change suggests we should remove that codepath as an invalid configuration option.

## Usage

As part of informal research into usage of the feature, I asked on Mastodon and no one appears to be using it:

- Link: https://ruby.social/@Schneems/113289823703358711
- Screenshot: https://www.dropbox.com/s/9t1ur0rpqz5zkku/Screenshot%202024-10-14%20at%2012.43.27%E2%80%AFPM.png?dl=0

I asked in Slack groups, and the only example someone could give was this issue comment, where (funnily enough) they're using it to try to diagnose the very same buggy keepalive behavior: #2311 (comment).

From my personal experience debugging Ruby applications at Heroku, I have seen this feature used in the wild, so I know it's used. But when it is used, people are generally not aware that they're losing slow-client protection and disabling the reactor; usually they're after some other side effect that disabling it causes.

In short, I don't believe it's heavily used, and for the cases where it is used, I don't believe those are valid use cases. In the comments I added a note requesting that people inform us of their use cases if they feel they have a valid one, so we can investigate it and possibly add a different configuration option.