MSP-Greg (Member) commented Nov 6, 2024

Description

While looking into #3540, I noticed that the read_nonblock calls used for reading requests passed buffer sizes of either 4k or 16k. For large request bodies, that means a large number of loop iterations to read them.
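
For illustration, here's a minimal sketch (not Puma's actual client.rb code) of the shape of such a read loop; the chunk size caps how much each read_nonblock call can return, so a small value means many more iterations (and string allocations) for a large body:

# Minimal sketch of a chunked nonblocking body read; the constant value
# here is hypothetical, chosen to match the size this PR moves to.
CHUNK_SIZE = 64 * 1_024

def read_body(io, content_length)
  body = +""
  while body.bytesize < content_length
    begin
      body << io.read_nonblock(CHUNK_SIZE)
    rescue IO::WaitReadable
      IO.select([io])  # wait until the socket is readable again
      retry
    end
  end
  body
end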

Using curl, I increased the value to 128k and checked the bytes read per call: they topped out at 64k for both content-length and chunked bodies.

Also, the 16k value in Puma::Const dates back to at least 2007, and networks are much different now.

Your checklist for this pull request

  • I have reviewed the guidelines for contributing to this repository.
  • I have added (or updated) appropriate tests if this PR fixes a bug or adds a feature.
  • My pull request is 100 lines added/removed or less so that it can be easily reviewed.
  • If this PR doesn't need tests (docs change), I added [ci skip] to the title of the PR.
  • If this closes any issues, I have added "Closes #issue" to the PR description or my commit messages.
  • I have updated the documentation accordingly.
  • All new and existing tests passed, including Rubocop.

dentarg (Member) commented Nov 6, 2024

I added the "perf" label here because this is more efficient, right?

How did you check with curl? If you don't mind sharing, I don't think I fully followed.

MSP-Greg (Member, Author) commented Nov 6, 2024

"If you don't mind sharing"

Certainly. I was debating whether to comment out some statements, but instead removed them.

One needs a large file, then add debug statements that output the bytesize of what is read by the @io.read_nonblock statements in client.rb.
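
For example, a hypothetical debug line (not the exact statements I used) placed after a read in lib/puma/client.rb might look like:

data = @io.read_nonblock(CHUNK_SIZE)
STDOUT.syswrite "read_nonblock bytesize: #{data.bytesize}\n"  # debug: log bytes read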

As to curl, something like

curl --form "data=@<large file path>" http://127.0.0.1:<port>/
curl --form "data=@<large file path>" --header "Transfer-Encoding: chunked" http://127.0.0.1:<port>/

So one could take this PR, bump CHUNK_SIZE to 128 kB, and you should see approx 64 kB as the limit for what is read.

MSP-Greg (Member, Author) commented Nov 7, 2024

@dentarg

Try the branch at https://github.com/MSP-Greg/puma/tree/00-lrg-request-body-debug

I used a 4 MB file generated with:

# Writes a 4 MB file (64 chunks of 64 KB)
txt64kb = "*" * (64 * 1_024)
File.open(<file path>, 'w+') { |f|
  64.times { f.syswrite txt64kb }
}

Then set

$ export PUMA_DEBUG_CLIENT_READ=true

You can then run either of the following:

test/runner -v test_puma_server_err.rb

or:

bundle exec bin/puma -b tcp://127.0.0.1:40001 -t1:1 test/rackup/hello.ru

and, in another console:

curl --form "data=@<4 MB file path>" http://127.0.0.1:40001/
curl --form "data=@<4 MB file path>" --header "Transfer-Encoding: chunked" http://127.0.0.1:40001/

The branch sets CHUNK_SIZE to 128 kB; most of the bytesize logging on my system shows approx 64 kB.

MSP-Greg (Member, Author) commented

@dentarg

Finally got around to trying this on macOS Sonoma. With curl, it seems to send 128k packets, or at least that's what Puma reads. As noted above, WSL2/Ubuntu 24.04 sends 64k packets.

Hence, I think the update to 64k is okay. Have you had a chance to look at this?

dentarg (Member) commented Nov 21, 2024

I have tested now, thanks for explaining! :)

dentarg (Member) commented Nov 21, 2024

Should we benchmark this? Can we?

dentarg (Member) commented Nov 21, 2024

I did this:

This PR running on port 8080

~/src/dup/puma 00-req-body-chunk-size
$ echo 'app { |env| [200, {}, ["OK"]] }' | ruby -Ilib ./bin/puma --config /dev/stdin --port 8080
Puma starting in single mode...
* Puma version: 6.4.3 ("The Eagle of Durango")
* Ruby version: ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [arm64-darwin23]
*  Min threads: 0
*  Max threads: 5
*  Environment: development
*          PID: 68485
* Listening on http://0.0.0.0:8080
Use Ctrl-C to stop

Created a big file:

$ dd if=/dev/zero of=bigfile bs=4k count=100000
...
$ ls -ahl bigfile
-rw-r--r--@ 1 dentarg  staff   391M Nov 21 22:54 bigfile
$ for i in {1..10} ; do curl --form "data=@bigfile" --url http://127.0.0.1:8080 --write-out " time_total: %{time_total}\n"; done
OK time_total: 0.003273
OK time_total: 0.003007
OK time_total: 0.002918
OK time_total: 0.002590
OK time_total: 0.002704
OK time_total: 0.004606
OK time_total: 0.002739
OK time_total: 0.002599
OK time_total: 0.002232
OK time_total: 0.002265

master (932f5d0) running on port 8181

~/src/puma master
$ echo 'app { |env| [200, {}, ["OK"]] }' | ruby -Ilib ./bin/puma --config /dev/stdin --port 8181
Puma starting in single mode...
* Puma version: 6.4.3 ("The Eagle of Durango")
* Ruby version: ruby 3.3.5 (2024-09-03 revision ef084cc8f4) [arm64-darwin23]
*  Min threads: 0
*  Max threads: 5
*  Environment: development
*          PID: 62559
* Listening on http://0.0.0.0:8181
Use Ctrl-C to stop
$ for i in {1..10} ; do curl --form "data=@bigfile" --url http://127.0.0.1:8181 --write-out " time_total: %{time_total}\n"; done
OK time_total: 0.374278
OK time_total: 0.306575
OK time_total: 0.300101
OK time_total: 0.280538
OK time_total: 0.288668
OK time_total: 0.300438
OK time_total: 0.294658
OK time_total: 0.301145
OK time_total: 0.292926
OK time_total: 0.292424
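
Averaging the ten runs: this PR comes out to roughly 2.9 ms per upload versus roughly 303 ms on master, about a 105x speedup for this 391M body over loopback.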

That's pretty good.

dentarg (Member) left a comment

Seems to make a lot of sense

MSP-Greg merged commit 89740fb into puma:master on Nov 21, 2024
MSP-Greg (Member, Author) commented

@dentarg

Thanks for testing. I didn't check speed; I knew it would be faster and cause less GC pressure. But two orders of magnitude is a big jump.

Real-world networks may not see that much improvement, but they'll certainly be faster. Also, many sites don't have large request bodies...

MSP-Greg deleted the 00-req-body-chunk-size branch on November 21, 2024, 22:25
mcfadden commented

I think this is a great change, but wanted to mention a gotcha that we ran into after this rolled out.

We run nginx in front of Puma. At some point we learned that responses with large headers would cause errors from nginx:

upstream sent too big header while reading response header from upstream

After that error, nginx would also stop using that upstream connection for a few seconds.

Ultimately, we tracked it down to responses with headers over 8k, and eventually figured out that some of the nginx buffer settings defaulted to 8k while Puma would send chunks that were 16k. When the buffers didn't match, things got unhappy really fast. Matching the Puma and nginx buffer sizes solved it!

At least, until we recently started seeing these errors again for some requests. This week one of my coworkers noted that this value changed in this PR, and we had to update our nginx settings to match the new value as well.

If you run nginx in front of Puma, you may want to take a look at your nginx config too. Specifically, the values I know to check are proxy_buffers, proxy_buffer_size, proxy_busy_buffers_size, and large_client_header_buffers.
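
For example (illustrative values only, not recommendations; tune for your own workload), sizing those buffers to at least match Puma's new 64k chunk size might look like:

# nginx config sketch: keep proxy buffers >= Puma's 64k write chunks
proxy_buffer_size           64k;
proxy_buffers               8 64k;
proxy_busy_buffers_size     128k;
large_client_header_buffers 4 64k;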

The oversized headers typically came from fuzzing requests with giant query strings, but we also saw it with some out-of-control preload Link headers.
