daemon/logger: Share buffers by sync.Pool #43650
Conversation
writer:      writer,
tag:         tag,
extra:       extra,
buffersPool: makePool(),
Wondering if it would make sense to make the pool global for the whole driver instead of per logger instance?
Yes, a global pool for the whole driver is the most efficient way to use it.
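A minimal sketch of what a driver-global pool could look like, assuming a package-level variable in the json-file driver (the 128-byte seed size is illustrative, not from this PR):

```go
package jsonfilelog

import "sync"

// buffersPool is shared by every logger instance of this driver, so a
// buffer released by one container's logger can be reused by another's.
// New returns *[]byte rather than []byte; see the pool discussion below.
var buffersPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 128)
		return &b
	},
}
```

The trade-off is one process-wide pool instead of one per logger; since sync.Pool drops idle entries at GC anyway, a global pool mostly just improves reuse across containers.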
writeLen := protoSize + (2 * encodeBinaryLen) // + len(messageDelimiter)

buf := *buffer
if writeLen > cap(buf) {
I changed the len(buf) to cap(buf) to stop considering buffers with a smaller len, but enough capacity, as too small for the message.
I think the whole logic tries to reimplement what bytes.Buffer does... I wonder if this should be rewritten to use bytes.Buffer instead. However, it would be a bit hacky, because it would require writing directly to the byte slice returned by buffer.Bytes() and then returning buffer.Bytes()[:writeLen] to the caller.
Also I don't really like *[]byte in sync.Pool, but maybe it's fine?
> I changed the len(buf) to cap(buf) to stop considering buffers with a smaller len, but enough capacity, as too small for the message.
👍 Good catch!
> I think the whole logic tries to reimplement what bytes.Buffer does... I wonder if this should be rewritten to use bytes.Buffer instead. However, it would be a bit hacky, because it would require writing directly to the byte slice returned by buffer.Bytes() and then returning buffer.Bytes()[:writeLen] to the caller.
I came to the same conclusion as well; trying to use a bytes.Buffer would be too hacky to be worth it.
> Also I don't really like *[]byte in sync.Pool, but maybe it's fine?
I don't like it either, but it is the most performant way to use a Pool. As mentioned in the example in the godoc:
// The Pool's New function should generally only return pointer
// types, since a pointer can be put into the return interface
// value without an allocation.

A slice value is wider than a pointer, so putting it directly into the pool would cost the very heap allocation we wanted to avoid. Getting and putting *[]byte values ensures that the heap allocation is only paid once, when the slice is first allocated, no matter how many times it is reused.
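A self-contained sketch of the full cycle being discussed here: a pool of *[]byte, a cap-based (not len-based) size check, and reslicing to reuse spare capacity. The getBuffer name and sizes are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// The pool stores *[]byte: a pointer fits in the interface value without
// allocating, whereas storing the 3-word slice header directly would
// allocate on every Put.
var pool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 128)
		return &b
	},
}

// getBuffer returns a pooled buffer resliced to writeLen bytes. Checking
// cap (not len) means a short-but-roomy buffer still counts as big enough.
func getBuffer(writeLen int) *[]byte {
	buffer := pool.Get().(*[]byte)
	buf := *buffer
	if writeLen > cap(buf) {
		buf = make([]byte, writeLen) // pooled buffer too small: reallocate
	} else {
		buf = buf[:writeLen] // reuse spare capacity beyond len
	}
	*buffer = buf
	return buffer
}

func main() {
	b := getBuffer(64)
	copy(*b, "message bytes go here")
	fmt.Println(len(*b), cap(*b)) // 64 128
	pool.Put(b)                   // hand the buffer back once the write is done
}
```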
}
return &driver{
	logfile: lf,
	buffersPool: sync.Pool{New: func() any {
Same as in the jsonfilelog. Maybe the pool could be global?
// Log converts logger.Message to jsonlog.JSONLog and serializes it to file.
func (l *JSONFileLogger) Log(msg *logger.Message) error {
	defer logger.PutMessage(msg)
	return l.writer.WriteLogEntry(msg)
I wanted to move the PutMessage up to be the responsibility of the caller. But a lot of tests rely on this being done inside Log, so I left it to keep the changes small. Maybe it could be done in a follow-up?
🤔 I'm of two minds on this. On one hand, there are more logging drivers than consumers of the logger.Logger interface, so pushing responsibility for putting the message back into the pool over to the consumer would be DRYer. On the other hand, the existing architecture has "move" semantics, which gives the log drivers great flexibility in handling incoming log messages, as the message's lifetime does not need to be bounded by the Log call. Given that forgetting to put the message back into the pool would only result in a performance regression, not a memory leak (the struct will be GC'ed like normal), I think the flexibility afforded to the drivers by the move semantics is worth the cost of keeping the drivers responsible for returning messages to the pool when they're done.
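To make the "move" contract concrete, here is a runnable toy with simplified stand-ins for daemon/logger's message pool (Message, putMessage, and this Log are illustrative, not the real API):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Message and its pool are simplified stand-ins for daemon/logger's.
type Message struct {
	Line      []byte
	Timestamp time.Time
}

var messagePool = sync.Pool{New: func() any { return &Message{} }}

func newMessage() *Message  { return messagePool.Get().(*Message) }
func putMessage(m *Message) { m.Line = m.Line[:0]; messagePool.Put(m) }

// Log has "move" semantics: the driver owns msg after the call and is
// responsible for recycling it. Forgetting putMessage costs performance
// (a fresh allocation next time), not a leak; msg is GC'ed like normal.
func Log(msg *Message) error {
	defer putMessage(msg)
	_, err := fmt.Printf("%s %s\n", msg.Timestamp.Format(time.RFC3339), msg.Line)
	return err
}

func main() {
	msg := newMessage()
	msg.Line = append(msg.Line, "hello"...)
	msg.Timestamp = time.Now()
	_ = Log(msg) // caller must not touch msg after this point
}
```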
Force-pushed from 506ba72 to cb72814
Marshalling log messages by json-file and local drivers involved serializing the message into a shared buffer. This caused a regression resulting in log corruption with recent changes where Log may be called from multiple goroutines at the same time.

Solution is to use a sync.Pool to manage the buffers used for the serialization. Also removed the MarshalFunc, which the driver had to expose to the LogFile so that it can marshal the message. This is now moved entirely to the driver.

Signed-off-by: Paweł Gronowski <[email protected]>
I rebased the PR which adds the tests for ContainerLogs on this branch: #43642. The output doesn't match, because there are some garbage strings between the lines (25h and 25l).
Force-pushed from cb72814 to 7493342
corhere left a comment:
LGTM!
I see nothing that couldn't be addressed in a follow-up PR, my review comments included.
buf := *buffer
if writeLen > cap(buf) {
	buf = make([]byte, writeLen)
Suggested change:
-	buf = make([]byte, writeLen)
+	buf = make([]byte, 1<<(bits.Len(uint(writeLen))+1))

If we need to pay the cost to reallocate the buffer, we should try to make it big enough to be reusable. Round up to the next power of two?
(Reusing slices with good performance is tricky to get right.)
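A sketch of the round-up (note that bits.Len takes a uint, hence the conversion in the suggestion above; this version rounds to the next power of two, while the suggestion doubles once more for extra headroom):

```go
package main

import (
	"fmt"
	"math/bits"
)

// nextPow2 rounds n up to the nearest power of two so that a freshly
// reallocated buffer has headroom to be reused for similar messages.
func nextPow2(n int) int {
	if n <= 1 {
		return 1
	}
	return 1 << bits.Len(uint(n-1))
}

func main() {
	for _, n := range []int{68, 128, 129, 1000} {
		fmt.Printf("%d -> %d\n", n, nextPow2(n)) // 68->128, 128->128, 129->256, 1000->1024
	}
}
```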
func (d *driver) Log(msg *logger.Message) error {
	defer logger.PutMessage(msg)
	return d.logfile.WriteLogEntry(msg)
Optimization opportunity: msg can be returned to the pool as soon as marshal returns, rather than deferring until after WriteLogEntry. Be mindful of that timestamp, though!
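A sketch of that optimization, using the same stand-in types as the toy above: copy out whatever outlives the message (here, the timestamp) before releasing it, so the message is back in the pool during the comparatively slow write:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type Message struct {
	Line      []byte
	Timestamp time.Time
}

var messagePool = sync.Pool{New: func() any { return &Message{} }}

func putMessage(m *Message) { m.Line = m.Line[:0]; messagePool.Put(m) }

// Log releases msg as soon as marshalling has copied everything out of it,
// rather than deferring the release until after the write returns.
func Log(msg *Message) error {
	marshalled := append([]byte(nil), msg.Line...) // marshal stand-in
	ts := msg.Timestamp                            // copy before releasing!
	putMessage(msg)                                // msg is reusable during the write below

	_, err := fmt.Printf("%s %s\n", ts.Format(time.RFC3339Nano), marshalled)
	return err
}

func main() {
	m := messagePool.Get().(*Message)
	m.Line = append(m.Line, "hello"...)
	m.Timestamp = time.Now()
	_ = Log(m)
}
```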
// Every buffer will have to store the same constant json structure and the message
// len(`{"log":"","stream":"stdout","time":"2000-01-01T00:00:00.000000000Z"}\n`) = 68
// So let's start with a buffer bigger than this
const initialBufSize = 128
Suggested change:
-	const initialBufSize = 128
+	const initialBufSize = 256
60 chars for the log message after overhead seems a touch small to me. It wouldn't fit a full 80-column line of text without resizing. How about sizing it enough for an SMS?
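The arithmetic behind the suggestion, as a sketch:

```go
// The fixed json-file framing costs ~68 bytes per line:
//   {"log":"","stream":"stdout","time":"2000-01-01T00:00:00.000000000Z"}\n
// so a 128-byte buffer leaves only ~60 bytes of message before the first
// reallocation, while 256 leaves ~188, enough for a full 80-column line
// (or an SMS) with room to spare.
const initialBufSize = 256
```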
FWIW: running a build of this PR on the internal test-pipeline of Docker Desktop. Just kicked off a new run, but one test was failing with this (see the Details link).
@thaJeztah the error message points at lines 137 to 138 in 69adaa8. The stdcopy encoder's constructor, stdcopy.NewStdWriter, takes the input header value as an argument, and that argument is specified as a constant at all call sites. Outside of memory corruption, I don't see a way for a log driver to corrupt the stdcopy log stream like that.
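For context, a runnable sketch of that framing using moby's pkg/stdcopy: the stream ID is fixed at construction time, and every Write is prefixed with an 8-byte header carrying that ID and the frame length:

```go
package main

import (
	"bytes"
	"fmt"
	"os"

	"github.com/docker/docker/pkg/stdcopy"
)

func main() {
	var muxed bytes.Buffer

	// The stream ID is passed once, to the constructor; individual writes
	// cannot change it, which is why a log driver alone should not be able
	// to corrupt the multiplexed stream.
	outw := stdcopy.NewStdWriter(&muxed, stdcopy.Stdout)
	errw := stdcopy.NewStdWriter(&muxed, stdcopy.Stderr)
	fmt.Fprintln(outw, "hello from stdout")
	fmt.Fprintln(errw, "hello from stderr")

	// StdCopy demultiplexes the frames back onto two plain writers.
	if _, err := stdcopy.StdCopy(os.Stdout, os.Stderr, &muxed); err != nil {
		panic(err)
	}
}
```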
Yes, it's odd. Got another one of those. I'm not very familiar with the tests in that repository, so I'll do some more digging as well; this is roughly what the logs of that specific test show it's doing:

docker service create --detach=false --name service_logs alpine:3.14 /bin/sh -c echo Hello && sleep 100
docker service logs service_logs

I do notice that, trying to run it manually:

docker service create --detach=false --name service_logs alpine:3.14 /bin/sh -c 'echo Hello && sleep 100'
oymfv2ajy6mj9lkyidtatihyh
overall progress: 1 out of 1 tasks
1/1: running   [==================================================>]
verify: Service converged

docker service logs service_logs
service_logs.1.zegk07hwjj5a@docker-desktop    | Hello
thaJeztah left a comment:
LGTM
let's get this one in, and do further changes as a follow-up 👍
Marshalling log messages by json-file and local drivers involved
serializing the message into a shared buffer. This caused a regression
resulting in log corruption with recent changes where Log may be called
from multiple goroutines at the same time.
Solution is to use a sync.Pool to manage the buffers used for the
serialization. Also removed the MarshalFunc, which the driver had to
expose to the LogFile so that it can marshal the message. This is now
moved entirely to the driver.
- What I did
Fix the log regression described in #43647
- How I did it
Use sync.Pool for sharing the buffers between multiple goroutines.
Move execution of marshal code up to the driver
- How to verify it
docker logs -f $(docker run -d busybox sh -c 'echo stdout; echo stderr >&2')

Run tests from #43642
- Description for the changelog
Fix log corruption
- A picture of a cute animal (not mandatory but encouraged)