Skip to content

Fix windows log file rotation with readers#41100

Merged
tianon merged 4 commits intomoby:masterfrom
cpuguy83:windows_fix_rename_open_file
Nov 5, 2020
Merged

Fix windows log file rotation with readers#41100
tianon merged 4 commits intomoby:masterfrom
cpuguy83:windows_fix_rename_open_file

Conversation

@cpuguy83
Copy link
Copy Markdown
Member

@cpuguy83 cpuguy83 commented Jun 12, 2020

This fixes the case where log rotation fails on Windows while there are
clients reading container logs.

Fixes #40999

@thaJeztah
Copy link
Copy Markdown
Member

Seeing some failures on Windows RS5;

=== RUN   TestReadLog/tail_exact
    --- FAIL: TestReadLog/tail_exact (0.00s)
        local_test.go:146: assertion failed: error is not nil: error opening rotated log file: error opening file: The system cannot find the file specified.

=== RUN   TestReadLog/tail_less_than_available
    --- FAIL: TestReadLog/tail_less_than_available (0.00s)
        local_test.go:156: assertion failed: error is not nil: error opening rotated log file: error opening file: The system cannot find the file specified.

=== RUN   TestReadLog/tail_more_than_available
    --- FAIL: TestReadLog/tail_more_than_available (0.00s)
        local_test.go:164: assertion failed: error is not nil: error opening rotated log file: error opening file: The system cannot find the file specified.

@cpuguy83
Copy link
Copy Markdown
Member Author

I see the problem. It's an error handling issue with openFile

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from 5ae6d1c to aa8a4e8 Compare June 15, 2020 23:38
@cpuguy83
Copy link
Copy Markdown
Member Author

Should be fixed now.

@thaJeztah
Copy link
Copy Markdown
Member

Hmm.. one failure remaining on RS5;

=== RUN   TestCheckCapacityAndRotate
--- FAIL: TestCheckCapacityAndRotate (0.01s)
    logfile_test.go:294: assertion failed: error is not nil: rename C:\Users\ContainerAdministrator\AppData\Local\Temp\TestCheckCapacityAndRotate612011099\log214931454 C:\Users\ContainerAdministrator\AppData\Local\Temp\TestCheckCapacityAndRotate612011099\log214931454.1: Access is denied.

@cpuguy83
Copy link
Copy Markdown
Member Author

I do not know why this is failing now... It didn't fail until I changed to not wrap the error from open...

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from aa8a4e8 to fdeba53 Compare July 14, 2020 19:15
@cpuguy83
Copy link
Copy Markdown
Member Author

Ok, so I had to change the this to evict log readers when there is an error rotating files.
I found that in this scenario:

  1. Client opens log stream
  2. Rotation occurs
  3. Rotation occurs again

On step 3, the rotation will fail when it is trying to rename the current file.
It will either get an "Access is denied" error, or "file is being used by another process" error.

So on step 3, when an error occurs, it will evict readers and try again (a few times until successful || max retries).

The thinking here is allowing the container to make progress is more important than the log reader reading the logs.
We can't really seamlessly transfer the log reader without dropping logs anyway, so take the simpler route and have the client reconnect.
We pass the eviction error along to the client.

Without this change, log rotation is essentially broken on Windows if there are active log consumers.

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch 4 times, most recently from dedf05d to dd491c4 Compare July 15, 2020 00:16
@cpuguy83
Copy link
Copy Markdown
Member Author

Added a 2nd commit to fix some race conditions in the loggerutils tests.
Additionally, I think I fixed the flakey test. See commit for details.

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from dd491c4 to 0c548d7 Compare July 15, 2020 00:18
@cpuguy83 cpuguy83 requested a review from kolyshkin July 15, 2020 16:13
@thaJeztah
Copy link
Copy Markdown
Member

@kevpar @kolyshkin ptal

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from 0c548d7 to 0a1613c Compare July 15, 2020 18:08
@cpuguy83
Copy link
Copy Markdown
Member Author

And split up commits for easier review.

@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch 2 times, most recently from edc0b7d to df66c91 Compare July 15, 2020 19:24
@cpuguy83
Copy link
Copy Markdown
Member Author

All green.

@thaJeztah
Copy link
Copy Markdown
Member

@kevpar @kolyshkin ptal

Comment thread daemon/logger/loggerutils/logfile.go Outdated
Comment thread daemon/logger/loggerutils/logfile.go Outdated
@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from df66c91 to 3366557 Compare August 20, 2020 19:14
@thaJeztah
Copy link
Copy Markdown
Member

@kolyshkin nits are addressed; PTAL

@thaJeztah
Copy link
Copy Markdown
Member

@kevpar @kolyshkin @tonistiigi ptal

@kevpar
Copy link
Copy Markdown
Contributor

kevpar commented Aug 31, 2020

I don't have much context on this code, but the issue I'm aware of should be fixed by switching to do file opens with FILE_SHARE_DELETE. It looks like this PR consistently uses the open/openFile wrappers which do so, so LGTM.

@thaJeztah thaJeztah added this to the 20.10.0 milestone Oct 20, 2020
Comment thread daemon/logger/loggerutils/logfile.go Outdated
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall seems good; left one comment/suggestion (as discussed)

Found by running with `go test -race`

Signed-off-by: Brian Goff <[email protected]>
- Ignore some pointless errors (like not exist on remove)
- Consolidate error handling/logging
- Fix race condition reading last log timestamp in the compression
  goroutine. This needs to be done while holding the write lock, which
  is not (or may not be) locked while compressing a rotated log file.
- Remove some indentation and consolidate mutex unlocking in
  `compressFile`

Signed-off-by: Brian Goff <[email protected]>
This makes sure, on Windows, that all files are opened with
FILE_SHARE_DELETE.

On non-Windows this just calls the same `os.Open()`.

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from 3366557 to 6d65088 Compare October 28, 2020 20:43
This fixes the case where log rotation fails on Windows while there are
clients reading container logs.

Evicts readers if there is an error during rotation and try rotation again.
This is needed for Windows with this scenario:

1. `docker logs -f` is called
2. Log rotation occurs (log.txt -> log.txt.1, truncate and re-open
   log.txt)
3. Log rotation occurs again (rm log.txt.1, log.txt -> log.txt.1)

On step 3, before this change, the log rotation will fail with `Access
is denied`.
In this case, what we have is a reader holding a file handle to the
primary log file. The log file is then rotated, but the reader still has
a the handle open. `FILE_SHARE_DELETE` allows this to happen... but then
we try to do it again for the next rotation and it blows up.
So when it blows up we force all the readers to disconnect, close the
log file, and try rotation again, which will succeed based on the added
tests.

Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the windows_fix_rename_open_file branch from 6d65088 to b102d46 Compare October 29, 2020 18:39
Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if green

@cpuguy83
Copy link
Copy Markdown
Member Author

All green!

Copy link
Copy Markdown
Contributor

@StefanScherer StefanScherer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

} else {
var renameErr error
for i := 0; i < 10; i++ {
if renameErr = os.Rename(fname, fname+".1"); renameErr != nil && !os.IsNotExist(renameErr) {
Copy link
Copy Markdown
Member

@tianon tianon Nov 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fname.1 already exists, this will potentially be the same issue, right? Should this use i or i+1 instead?
(Sorry I'm not more familiar with this code to know whether this even makes sense 🙈)

Discussed on maintainer call, this isn't relevant. 👍

@tianon tianon merged commit 761066f into moby:master Nov 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Windows log rotation errors with active readers

7 participants