Speed up pull with larger buffers while writing to layer tempfile #48602
akx wants to merge 1 commit into moby:master from
Conversation
Signed-off-by: Aarni Koskela <[email protected]>
(I'll take care of the lint error if there's some review activity on this 👍)
Thanks for contributing! I brought this PR (and your other PR) up in the maintainers call for more people to have a look. There's also some discussion about looking into whether there's possibly an issue / missing optimization in the Go stdlib. (Just to let you know this PR is not overlooked, it just needs more eyes 🤗)
// `CopyBufferDirect` avoids a mis-optimization in
// `io.Copy`/`io.CopyBuffer` that would use the `ReadFrom` interface.
// For some reason, `ReadFrom` being used will cause short reads and writes.
_, err = utils.CopyBufferDirect(tmpFileBufWriter, io.TeeReader(reader, ld.verifier), buf)
there's a (likely misplaced!) copy of io.copyBuffer, sans its ReadFrom optimization, vendored in.
Surprisingly, the use of the ReadFrom optimization is the root cause of the suboptimal performance! See https://go.dev/issue/16474. And as mentioned in that proposal description, there is another workaround which does not require the use of any modified stdlib functions.
_, err = pools.Copy(struct{ io.Writer }{tmpFileBufWriter}, io.TeeReader(struct{ io.Reader }{reader}, ld.verifier))

// Allocate rather large buffers to avoid small
// `write()` syscalls to the underlying temporary file.
var buf = make([]byte, 524288)
var tmpFileBufWriter = bufio.NewWriterSize(tmpFile, 524288)
512 kiB is a very large buffer -- sixteen times the default 32 kiB buffer size of our pools.Copy. How did you pick this value? I have my doubts that the increased memory pressure is worth the 2% speed boost. How much performance would we be leaving on the table if pools.Copy was used instead, with the default buffer size?
I can try and run some better numbers tomorrow (it's 10 PM, I'm at home and don't have any sort of Go toolchain on this machine, etc. etc.), but my idea was that a layer is generally, say, 20 megs to multiple gigabytes, so a combined megabyte of buffers seems like peanuts. The size of the buffer could be made a tunable (envvar, maybe?) though?
Without an explicit buffer here, we (empirically) end up doing a lot of pretty small writes; io.Copy and pools.Copy just use the size of the temporary buffer as the maximum size of a single read, and each such read is then written straight to the writer - and if that writer is a File, every one of those writes is its own write syscall.
We have pooled bufio, too.
Suggested change:
- // Allocate rather large buffers to avoid small
- // `write()` syscalls to the underlying temporary file.
- var buf = make([]byte, 524288)
- var tmpFileBufWriter = bufio.NewWriterSize(tmpFile, 524288)
+ var tmpFileBufWriter = pools.BufioWriter32KPool.NewWriteCloserWrapper(pools.BufioWriter32KPool.Get(tmpFile), nil)
+ defer tmpFileBufWriter.Close()
I'll have to dig a little deeper, I think...
I wrote a small standalone program that uses the same basic ideas as dockerd here – basically it copies bytes from a URL opened with http.Get() to a local file with
- various copy buffer sizes
- various write buffer sizes
- the io.Reader cast/hack workaround from the other comment turned on or off

and looking at the results I'm not really sure I'm measuring the right thing, because it looks like none of this matters in a statistically significant way. (Or I might be measuring the right thing, and none of this actually matters, and it's pretty much as optimal as it can be?)
I think in this case the largest read size, regardless of buffer size, is going to be 16KB.
Having a buffered writer could improve things somewhat but perhaps not in a significant way. In this case you are probably only saving the syscall overhead of the write call.
We discussed this today and came to the conclusion that:
What I did
I investigated dockerd with strace and noticed it was doing a lot of rather small (less than 16k) writes to the GetImageBlob temporary files.

How I did it
This PR improves upon that by allocating larger buffers for the copy-to-tempfile operation, so not as much time is spent in tiny little syscalls.
However, due to an apparent quirk in how golang's io.Copy works (see comments within), just using a buffered writer as the write target doesn't actually end up filling the buffered writer before it flushes to disk, so there's a (likely misplaced!) copy of io.copyBuffer, sans its ReadFrom optimization, vendored in.

How to verify it
Empirically, in a lima Linux aarch64 virtual machine on my MacBook, this seems to improve hyperfine 'docker rmi -f quay.io/singularity/singularity:v4.1.0-arm64 && docker system prune -f && docker pull quay.io/singularity/singularity:v4.1.0-arm64' (image chosen because it's not small, and not local – pulling from a localhost:5000 registry, or pulling a more trivial image, does not show as much improvement).

The below seems to show a 2% improvement. I'd love for others to replicate (or prove wrong) these results!

docker-ce 5:27.3.1-1~ubuntu.24.04~noble
This branch
Description for the changelog
A picture of a cute animal (not mandatory but encouraged)
A closeup of Saarinen, a cat. From the archives, vintage 2008.