Fix slow WASI stdin reads by passing size hint to worker thread#13256
Conversation
alexcrichton
left a comment
There was a problem hiding this comment.
Thanks! One small comment but otherwise looks good to me 👍
| // Extract the size hint from the request and cap it to `MAX_READ_SIZE_ALLOC` | ||
| // to avoid guest-controlled unbounded allocation. | ||
| let size_hint = match *lock { | ||
| StdinState::ReadRequested(size) => size.min(MAX_READ_SIZE_ALLOC).max(1024), |
There was a problem hiding this comment.
I think it's fine to remove this max(1024) call which I think is just a holdover from before? It seems reasonable to me that the size is clamped but otherwise un-tampered with.
| // Extract the size hint from the request and cap it to `MAX_READ_SIZE_ALLOC` | ||
| // to avoid guest-controlled unbounded allocation. | ||
| let size_hint = match *lock { | ||
| StdinState::ReadRequested(size) => size.min(MAX_READ_SIZE_ALLOC).max(1024), |
There was a problem hiding this comment.
I think it's fine to remove this max(1024) call which I think is just a holdover from before? It seems reasonable to me that the size is clamped but otherwise un-tampered with.
alexcrichton
left a comment
There was a problem hiding this comment.
Thanks! One small comment but otherwise looks good to me 👍
alexcrichton
left a comment
There was a problem hiding this comment.
Thanks! One small comment but otherwise looks good to me 👍
|
Thanks for the suggestion! You're right that However, removing I've addressed it like this:
This removes the arbitrary 1024 floor while guarding against the edge case. |
|
Thanks! |
While working on a program that reads a large amount of data through stdin, I was surprised by slow stdin throughput under wasmtime. Piping 1 GiB into a simple WASI program that reads in 64 KiB chunks took about 38 seconds (~28 MiB/s). I expected something in the gigabytes-per-second order of magnitude.
To demonstrate the issue, below is a minimal WASI program that reads stdin in configurable chunks.
Piping 1 GiB of zeroes and attempting to read in 65536 bytes chunks:
Benchmark application code:
Root cause
Regardless of how many bytes were requested (e.g. 65536) the worker thread would read at most 1024 bytes (hard-coded number) per round-trip, which results in slow performance.
See worker_thread_stdin.rs:108.
Fix
StdinState::ReadRequestednow has ausizesize hint from the caller. The worker thread uses this hint to size its read buffer, clamped to[1, MAX_READ_SIZE_ALLOC]to avoid guest-controlled unbounded allocation while still enabling efficient bulk reads.Callers now store a size hint when transitioning to
ReadRequested;Pollable::readyusesMAX_READ_SIZE_ALLOCbecause it does not know the size of the following read.When reading 1 GiB in 64 KiB chunks, I now get: