Fix windows log file rotation with readers#41100
Conversation
|
Seeing some failures on Windows RS5; |
|
I see the problem. It's an error handling issue with |
5ae6d1c to
aa8a4e8
Compare
|
Should be fixed now. |
|
Hmm.. one failure remaining on RS5; |
|
I do not know why this is failing now... It didn't fail until I changed to not wrap the error from open... |
aa8a4e8 to
fdeba53
Compare
|
Ok, so I had to change the this to evict log readers when there is an error rotating files.
On step 3, the rotation will fail when it is trying to rename the current file. So on step 3, when an error occurs, it will evict readers and try again (a few times until successful || max retries). The thinking here is allowing the container to make progress is more important than the log reader reading the logs. Without this change, log rotation is essentially broken on Windows if there are active log consumers. |
dedf05d to
dd491c4
Compare
|
Added a 2nd commit to fix some race conditions in the loggerutils tests. |
dd491c4 to
0c548d7
Compare
|
@kevpar @kolyshkin ptal |
0c548d7 to
0a1613c
Compare
|
And split up commits for easier review. |
edc0b7d to
df66c91
Compare
|
All green. |
|
@kevpar @kolyshkin ptal |
df66c91 to
3366557
Compare
|
@kolyshkin nits are addressed; PTAL |
|
@kevpar @kolyshkin @tonistiigi ptal |
|
I don't have much context on this code, but the issue I'm aware of should be fixed by switching to do file opens with |
thaJeztah
left a comment
There was a problem hiding this comment.
overall seems good; left one comment/suggestion (as discussed)
Found by running with `go test -race` Signed-off-by: Brian Goff <[email protected]>
- Ignore some pointless errors (like not exist on remove) - Consolidate error handling/logging - Fix race condition reading last log timestamp in the compression goroutine. This needs to be done while holding the write lock, which is not (or may not be) locked while compressing a rotated log file. - Remove some indentation and consolidate mutex unlocking in `compressFile` Signed-off-by: Brian Goff <[email protected]>
This makes sure, on Windows, that all files are opened with FILE_SHARE_DELETE. On non-Windows this just calls the same `os.Open()`. Signed-off-by: Brian Goff <[email protected]>
3366557 to
6d65088
Compare
This fixes the case where log rotation fails on Windows while there are clients reading container logs. Evicts readers if there is an error during rotation and try rotation again. This is needed for Windows with this scenario: 1. `docker logs -f` is called 2. Log rotation occurs (log.txt -> log.txt.1, truncate and re-open log.txt) 3. Log rotation occurs again (rm log.txt.1, log.txt -> log.txt.1) On step 3, before this change, the log rotation will fail with `Access is denied`. In this case, what we have is a reader holding a file handle to the primary log file. The log file is then rotated, but the reader still has a the handle open. `FILE_SHARE_DELETE` allows this to happen... but then we try to do it again for the next rotation and it blows up. So when it blows up we force all the readers to disconnect, close the log file, and try rotation again, which will succeed based on the added tests. Signed-off-by: Brian Goff <[email protected]>
6d65088 to
b102d46
Compare
|
All green! |
| } else { | ||
| var renameErr error | ||
| for i := 0; i < 10; i++ { | ||
| if renameErr = os.Rename(fname, fname+".1"); renameErr != nil && !os.IsNotExist(renameErr) { |
There was a problem hiding this comment.
If fname.1 already exists, this will potentially be the same issue, right? Should this use i or i+1 instead?
(Sorry I'm not more familiar with this code to know whether this even makes sense 🙈)
Discussed on maintainer call, this isn't relevant. 👍
This fixes the case where log rotation fails on Windows while there are
clients reading container logs.
Fixes #40999