
Conversation

@joseluisq (Collaborator) commented Oct 16, 2022

Description

This PR makes file content and metadata read operations synchronous, which increases performance and reduces resource utilization.

Synopsis

We know that throughput is critical for the file-reading (content/metadata) operations of our static file module.
However, those particular operations were performed asynchronously, causing significant overhead due to the extra cost of routing them through the async runtime.
Since there is extremely little waiting involved (we essentially block on a single task), we turned them into their synchronous counterparts, resulting in a ~58% performance increase and ~10% (CPU) / ~52% (RAM) less resource utilization. A minimal sketch of the change follows.
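
For illustration only (not the exact PR diff): the metadata and content reads now use plain blocking `std::fs` calls instead of their `tokio::fs` counterparts, roughly like this:

```rust
use std::fs;
use std::io::{self, Read};
use std::path::Path;

// Before (simplified): tokio::fs::metadata(path).await and friends,
// which hop through the async runtime for calls that block anyway.
// After (simplified): direct blocking calls, since the wait is negligible.

fn file_metadata(path: &Path) -> io::Result<fs::Metadata> {
    fs::metadata(path) // blocking stat(2); no runtime round-trip
}

fn read_chunk(file: &mut fs::File, buf: &mut [u8]) -> io::Result<usize> {
    file.read(buf) // blocking read(2) into a fixed-size buffer
}
```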

Notes

We now use a static 4KB read buffer size on Unix-like systems, since 4KB still appears to be the most common page size across operating systems.
For Windows, we keep 8KB, which looks like a balanced value. See the sketch below.
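
Roughly, the platform-specific buffer size can be expressed with `cfg` attributes (the constant name here is illustrative; the actual codebase may differ):

```rust
// Hypothetical name; shown only to illustrate the platform split.
#[cfg(unix)]
const READ_BUF_SIZE: usize = 4096; // 4KB: the most common page size

#[cfg(windows)]
const READ_BUF_SIZE: usize = 8192; // 8KB: a balanced value on Windows
```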

Our file_stream implementation was inspired by the ideas behind weihanglo/sfz's analogous FileStream; a simplified sketch follows.
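
A minimal sketch of that sfz-style approach (hypothetical types, not the exact SWS code): each poll deliberately performs one blocking read on the polling thread and yields the chunk as `Bytes`.

```rust
use std::fs::File;
use std::io::Read;
use std::pin::Pin;
use std::task::{Context, Poll};

use bytes::Bytes;
use futures::stream::Stream;

/// Streams a file as `Bytes` chunks; every poll does one blocking read.
struct FileStream {
    file: File,
    buf_size: usize, // e.g. the platform-specific buffer size from above
}

impl Stream for FileStream {
    type Item = std::io::Result<Bytes>;

    fn poll_next(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<Option<Self::Item>> {
        let this = self.get_mut();
        let mut buf = vec![0u8; this.buf_size];
        match this.file.read(&mut buf) {
            Ok(0) => Poll::Ready(None), // EOF: end the stream
            Ok(n) => {
                buf.truncate(n);
                Poll::Ready(Some(Ok(Bytes::from(buf)))) // yield the chunk read
            }
            Err(e) => Poll::Ready(Some(Err(e))),
        }
    }
}
```

Because the read is blocking, `poll_next` never returns `Poll::Pending`; that trade-off is acceptable here since local file reads complete quickly.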

Related Issue

Resolves #146 (Much slower than sfz)

Motivation and Context

How Has This Been Tested?

Below are two one-minute tests using the SWS defaults:

wrk --latency -t4 -c100 -d1m http://localhost

before (v2.13.0):

Running 1m test @ http://localhost
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.47ms    1.15ms  33.78ms   75.84%
    Req/Sec     4.58k   276.47     5.74k    74.96%
  Latency Distribution
     50%    5.47ms
     75%    6.06ms
     90%    6.73ms
     99%    8.49ms
  1094168 requests in 1.00m, 747.13MB read
Requests/sec:  18222.47
Transfer/sec:     12.44MB

after:

Running 1m test @ http://localhost
  4 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.29ms    0.91ms  19.52ms   71.84%
    Req/Sec    10.97k   628.29    13.89k    86.83%
  Latency Distribution
     50%    2.25ms
     75%    2.82ms
     90%    3.40ms
     99%    4.73ms
  2620459 requests in 1.00m, 1.75GB read
Requests/sec:  43648.84
Transfer/sec:     29.80MB

Screenshots (if appropriate):

None.

@joseluisq added the enhancement (New feature or request), v2 (v2 release), and performance (Related to server performance) labels on Oct 16, 2022
@joseluisq self-assigned this on Oct 16, 2022
@joseluisq merged commit d1b72fd into master on Oct 16, 2022
@bors bot deleted the sync-file-reading branch on October 16, 2022 at 22:15
@bjornharrtell commented

Very interesting! It surprises me quite a bit that the Rust async framework has such a large overhead.

@joseluisq (Collaborator, Author) commented Oct 18, 2022

> Very interesting! It surprises me quite a bit that the Rust async framework has such a large overhead.

As far as I can tell, it's not the Tokio runtime per se but the use of tokio::fs / async in particular for our file I/O reads, since those operations are synchronous under the hood anyway (blocking system calls), and going through tokio::fs only adds extra overhead. That's why our old file_stream implementation was definitely involved (and was removed in this PR).
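
To make that overhead concrete, here is roughly what tokio::fs does internally (a simplified sketch, not Tokio's actual source): the blocking syscall is shipped to a worker thread pool and its result awaited, adding scheduling and wake-up costs on top of the syscall itself.

```rust
use std::path::Path;

// Simplified model of tokio::fs::metadata: the blocking call runs on
// Tokio's blocking thread pool, and the async caller awaits the result.
async fn metadata(path: &Path) -> std::io::Result<std::fs::Metadata> {
    let path = path.to_owned();
    tokio::task::spawn_blocking(move || std::fs::metadata(path))
        .await
        .expect("blocking task panicked")
}
```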

Here is a notable explanation from the Tokio website about Linux io_uring support:

> All tokio-uring operations are truly async, unlike APIs provided by tokio::fs, which run on a thread pool. Using synchronous filesystem operations from a thread pool adds significant overhead. With io-uring, we can perform both network and file system operations asynchronously from the same thread. But, io-uring is a lot more.

https://tokio.rs/blog/2021-07-tokio-uring

By the way, I have some ideas in mind to soon add experimental support for truly asynchronous (non-blocking) file read operations on Linux via the tokio-uring crate. A minimal sketch of what that could look like follows.
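
For reference, a tiny tokio-uring read sketch (Linux-only; illustrative, not the eventual SWS integration, and the file path is a placeholder):

```rust
// Requires the `tokio-uring` crate and a Linux kernel with io_uring support.
fn main() -> std::io::Result<()> {
    tokio_uring::start(async {
        let file = tokio_uring::fs::File::open("index.html").await?;
        // tokio-uring owns the buffer during the operation and hands it back.
        let buf = vec![0u8; 4096];
        let (res, buf) = file.read_at(buf, 0).await;
        let n = res?;
        println!("read {} bytes", n);
        let _ = &buf[..n]; // the chunk that was read
        Ok(())
    })
}
```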

