Refactor nerdctl logs functionality to pkg/logging. #1468
Refactor nerdctl logs functionality to pkg/logging. #1468AkihiroSuda merged 3 commits intocontainerd:mainfrom
nerdctl logs functionality to pkg/logging. #1468Conversation
cmd/nerdctl/logs.go
Outdated
| if waitCh, err := task.Wait(ctx); err == nil { | ||
| <-waitCh | ||
| logrus.Debugf("container task has finished, sending kill event to log viewer") | ||
| stopChannel <- os.Interrupt |
There was a problem hiding this comment.
What happens when we restart a container ?
There was a problem hiding this comment.
Great catch, seeing as though a restart is just a sequential stop and start, this would kill the existing task and thus stop the logs -f command too. (just tested it myself and can confirm it stops)
We could simply requery the container's current task and status but I'm worried we could catch the container "in between restarts" and before it actually reaches Up status, but I'll try to figure something out...
There was a problem hiding this comment.
@fahedouch I've made some changes for working around a container restart on a separate branch but unfortunately the JSON file based logging rotates the logfile by moving $CONTAINER_ID-json.log to $CONTAINER_ID-json.log.1 every time a container is started, which leads to:
- the
tail-based implementation failing to re-open the new logfile, sotailkeeps feeding the old$CONTAINER_ID-json.log.1file which is no longer being written to until the container is finally stopped or exists - the native implementation will attempt re-open the file at the original path but preserve the file seek, which could lead to it seeking over the rotated logfile's fill
Although both of the above could be theoretically bypassed by sending a syscall.SIGHUP on the stopChannel and having the LogViewer refresh the log viewing, I don't consider the technical overhead to be worth it for something this edge-casey...
Please note I've also tested docker logs from Docker CE 20.10.18 and can report that:
docker logs -falso closes itself if the containerdocker restarted so in that sense the current behavior is correctdocker logsdoes not seem to rotate logfiles so us using logrotate is deviating a bit from how Docker stores logs
There was a problem hiding this comment.
@aznashwan thanks for digging into the logging logic.
Regarding your branch (changes for working around a container restart) . I think we can make it simpler by checking wether the container exists or not not (I am not a fun of sleep :) )
The principal action(s) that can be exerted on the container that we follow its logs are:
nerdctl stop=> do not trigger thestopChannelnerdctl stop=>nerdctl start=> do not trigger thestopChannelnerdctl restart=> do not triggerstopChannelnerdctl kill=> do not trigger thestopChannel, the container exists by no longer emitting logsnerdctl stop=>nerdctl rm=> do triggerstopChannel
==> we Interrupt only when the container is removed
I made a pseudo-code here :
go func() {
for {
//check if the container still exists
info, err := found.Container.Info(ctx)
if err != nil {
stopChannel <- os.Interrupt
return
}
continue
}
}
-
Regarding log file rotation, I am not sure why the
tailis reading the old log files despite passing the principale file to thelogrotate. Anyway it shouldn't write/read to old log files => we should investigate, it may be a bug -
Regarding docker behaviour, from my knowledge, we are compatible with docker regarding logs (plz see logging:
json-filefilenames are incompatible with Docker #1077)
There was a problem hiding this comment.
Regarding log file rotation, I confirm that after restart nerdctl writes in the principal log file.
benchmark of mod time after a restart of the container nerdctl restart 0e8a3fdbf637 :
root@e4276e97ebd3:/home/containerd/nerdctl# stat /var/lib/nerdctl/1935db59/containers/default/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log
File: /var/lib/nerdctl/1935db59/containers/default/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log
Size: 150083 Blocks: 296 IO Block: 4096 regular file
Device: 10000eh/1048590d Inode: 3708060 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-11-01 11:18:47.922219013 +0000
Modify: 2022-11-01 11:17:45.048155011 +0000
Change: 2022-11-01 11:17:45.048155011 +0000
Birth: 2022-11-01 11:17:45.035155011 +0000
root@e4276e97ebd3:/home/containerd/nerdctl# stat /var/lib/nerdctl/1935db59/containers/default/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log.1
File: /var/lib/nerdctl/1935db59/containers/default/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f/0e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log.1
Size: 150128 Blocks: 296 IO Block: 4096 regular file
Device: 10000eh/1048590d Inode: 3708071 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-11-01 11:17:45.035552345 +0000
Modify: 2022-11-01 11:17:45.035552470 +0000
Change: 2022-11-01 11:17:45.034155011 +0000
Birth: 2022-11-01 11:12:35.079202007 +0000
mtime of e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log > mtime of e8a3fdbf637c61db53536d9ce4a6334726b432c5e1b6101fd71e9c48d7dc93f-json.log.1
There was a problem hiding this comment.
@fahedouch apologies for the confusion, I wasn't saying nerdctl is logging to the wrong file, I was saying that the tail executable will continue reading from the old file (which has been renamed to *-json.log.1) instead of the new file (*-json.log).
# NOTE: I've replaced the actual log file path with `$JSONFILE` in all output for readability
nashu@ubuntu:~/nerdctl$ nerdctl create c318242786b1 sh -euc 'echo sleeping && sleep 300 && echo awake'
24b26c08f8d5b8cad3aae8f4a6767c8e97f0f70a19e5e4062add8cdf3f208993
nashu@ubuntu:~/nerdctl$ nerdctl start 24b26c08f8d5b8cad3aae8f4a6767c8e97f0f70a19e5e4062add8cdf3f208993
24b26c08f8d5b8cad3aae8f4a6767c8e97f0f70a19e5e4062add8cdf3f208993
### started `nerdctl tail -f` in a separate tty
nashu@ubuntu:~/nerdctl$ ps aux | grep tail
root 121902 0.0 0.0 6220 1024 pts/3 S+ 13:32 0:00 tail -n +0 -f $JSONFILE
nashu 121904 0.0 0.0 7004 2132 pts/1 S+ 13:32 0:00 grep --color=auto tail
nashu@ubuntu:~/nerdctl$ sudo lsof -p 121902
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
### tail is reading from inode 1843300
tail 121902 root 3r REG 253,0 77 1843300 $JSONFILE
tail 121902 root 4r a_inode 0,14 0 13457 inotify
### Ran restart
nashu@ubuntu:~/nerdctl$ nerdctl restart 24b26c08f8d5b8cad3aae8f4a6767c8e97f0f70a19e5e4062add8cdf3f208993
24b26c08f8d5b8cad3aae8f4a6767c8e97f0f70a19e5e4062add8cdf3f208993
nashu@ubuntu:~/nerdctl$ sudo stat $JSONFILE $JSONFILE.1
File: $JSONFILE
Size: 79 Blocks: 8 IO Block: 4096 regular file
### New "*-json.log" is inode 1843320.
Device: fd00h/64768d Inode: 1843320 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-11-01 13:33:39.971781653 +0000
Modify: 2022-11-01 13:33:39.971781653 +0000
Change: 2022-11-01 13:33:39.971781653 +0000
Birth: 2022-11-01 13:33:39.971781653 +0000
File: $JSONFILE.1
Size: 77 Blocks: 8 IO Block: 4096 regular file
### Old logfile with inode 1843300 now moved as "*-json.log.1" .
Device: fd00h/64768d Inode: 1843300 Links: 1
Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-11-01 13:33:39.976783904 +0000
Modify: 2022-11-01 13:33:39.976784004 +0000
Change: 2022-11-01 13:33:39.971781653 +0000
Birth: 2022-11-01 13:32:30.188465129 +0000
nashu@ubuntu:~/nerdctl$ ps aux | grep tail
root 121902 0.0 0.0 6220 1024 pts/3 S+ 13:32 0:00 tail -n +0 -f $JSONFILE
nashu 122104 0.0 0.0 7004 2156 pts/1 S+ 13:33 0:00 grep --color=auto tail
nashu@ubuntu:~/nerdctl$ sudo lsof -p 121902
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
### tail is still reading from 1843300 (which is now `*-json.log.1`, and no longer being written to)
tail 121902 root 3r REG 253,0 77 1843300 $JSONFILE.1
tail 121902 root 4r a_inode 0,14 0 13457 inotifyTo illustrate the problem using this test, which we would expect to output sleeping sleeping awake:
tailwill only displaysleeping, as the JSON logfile whichtailis reading from gets rotated during the restart as*-json.log.1, and nothing will ever get written to it again- the current native implementation will only print
sleeping awake, as it will first open*-json.log, read the firstsleepingline, then re-open the new*-json.logbut seek it to where it left off, so it will skip the firstsleepingline in the new logfile (or crash if the restarted container didn't produce enough output past the seek point yet) docker logswould simply exit the second the container restarts, only printing the firstsleeping
While it would probably be possible to use the --retry or -F flags when executing tail, those could lead to some undefined behaviours since they'll keep running even if the logfile doesn't exist yet and even if it is deleted.
Either way, the native implementation will need to have a "refresh mechanism" added (likely by sending os.SIGHUP through the stopChannel as I mentioned previously), but it'll also need to check that it doesn't seek too far.
Although none of the above are that involved, I honestly believe it'd add more code overhead than it's worth for such an edge-casey situation, but please let me know which approach you think is best.
There was a problem hiding this comment.
I think we can make it simpler by checking wether the container exists or not
The principal action(s) that can be exerted on the container that we follow its logs are:
nerdctl stop => do not trigger the stopChannel
nerdctl stop => nerdctl start => do not trigger the stopChannel
nerdctl restart => do not trigger stopChannel
nerdctl kill => do not trigger the stopChannel, the container exists by no longer emitting logs
nerdctl stop => nerdctl rm => do trigger stopChannel
==> we Interrupt only when the container is removed
If we were to solely rely on the container's existence, it would make nerdctl logs -f on a running container never stop even if the container stops. (neither by it stopping by itself or being stopped by another command)
As mentioned, docker logs -f exits if the container stops, and there's also a check which looks like it was meant to explicitly test that logs -f exits when a container does, so it looks we'll need to check the container's running status one way or the other.
@fahedouch apologies for the confusion, I wasn't saying nerdctl is logging to the wrong file, I was saying that the tail executable will continue reading from the old file (which has been renamed to -json.log.1) instead of the new file (-json.log).
Although none of the above are that involved, I honestly believe it'd add more code overhead than it's worth for such an edge-casey situation, but please let me know which approach you think is best.
The question of whether or not these workarounds are worth the risk of a blocking/failing nerdctl logs -f command still stands. (IMHO it adds too much overhead for a behaviour docker logs -f itself doesn't do)
There was a problem hiding this comment.
If we were to solely rely on the container's existence, it would make nerdctl logs -f on a running container never stop even if the container stops. (neither by it stopping by itself or being stopped by another command)
As mentioned, docker logs -f exits if the container stops, and there's also a check which looks like it was meant to explicitly test that logs -f exits when a container does, so it looks we'll need to check the container's running status one way or the other.
@aznashwan Agree apologies for the confusion. Yes to be compatible with docker we should consider that following logs is conditioned by container state. so I am ok with this code.
I got your point regarding tail. We have delegated the read responsibility to tail. It looks like it has a limitation when the file inode number changes. I believe we can put it aside for investigation in another PR may be. let's keep the actual behavior for the moment :)
There was a problem hiding this comment.
Yes to be compatible with docker we should consider that following logs is conditioned by container state. so I am ok with this code.
@fahedouch unfortunately that was the commit which initially lead me to discover the issues caused by the log file being rotated with both tail and the native implementation.
I believe we can put it aside for investigation in another PR may be. let's keep the actual behavior for the moment :)
In this sense, I have "downgraded" the logic to immediately send the stop signal when the current container task finishes as is the case in the current main branch.
8c0b8af to
6fb400c
Compare
|
@fahedouch thanks a lot for the review, your older PR (#1115) was a partial inspiration for it! One thing worth noting in this PR is this commit which actually enables the It depends on containerd/containerd#7351, without which the Is an update to the containerd ref in go.mod planned? |
I beleive you can point |
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
71e60d3 to
9f6b45f
Compare
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
9f6b45f to
5b5a18d
Compare
@fahedouch just pushed all revisions including the containerd dependency update. There seems to be a test failure on this run on containerd 1.6.8 on 20.04 in TestExecWithUser but it seems to be pretty flaky since it worked on 20.04 3 other consecutive runs I tested (here, here, and here), and other tests seemed generally flaky overall (most notably TestRunWithJsonFileLogDriver which randomly failed 2/3 runs, and have seen it fail on apparently unrelated PRs). Considering that one of the test runs was green on al platform/containerd combos, I'd suspect the issues are external to |
| // Following the merging of the below, any localhost/loopback registries will | ||
| // get automatically downgraded to HTTP so this will still succceed: | ||
| // https://github.com/containerd/containerd/pull/7393 | ||
| shouldSuccess: true, |
There was a problem hiding this comment.
Unexpectedly, I think the PR7393 is not included in v1.6.8 which version is dependent by nerdctl
There was a problem hiding this comment.
In that case, I've opened a backport request for the LogURIGenerator only and will revert this test and update go.mod once it gets in.
Thank you both!
There was a problem hiding this comment.
I'm not sure I get the point. go.mod in nerdctl should point to v1.7.0-beta.0 (or later), not v1.6.x.
There was a problem hiding this comment.
@aznashwan nerdctl depends on other containerd commits not present on v1.6.x;
git tag -l --contains 3df7674058301f290fc148719d483e47f4118494
v1.7.0-beta.0
We should keep pointing to main until v1.17.x I guess.
@AkihiroSuda what do you mean by expectation here ?
There was a problem hiding this comment.
I'm not sure I get the point. go.mod in nerdctl should point to v1.7.0-beta.0 (or later), not v1.6.x.
My bad, I got confused by the version string currently in go.mod saying 1.6.1.
It looks like this is the containerd commit currently referenced in go.mod, which would put it on early August around the v1.6.8 area.
Unfortunately the LogURIGenerator fixes we depend on are not included in v1.6.9 either so I guess the only option is switching to v1.7.0-beta.0, and keeping the changes to these local registry tests.
Please lemme know if v1.7.0-beta.0 would be okay so I can update the PR and re-run the test suite a couple more times.
@aznashwan hard to live without FlakyTest :p . We have an opened ticket for |
pkg/logging/jsonfile/jsonfile.go
Outdated
| if err != nil { | ||
| return err | ||
| } | ||
| if !e.Time.Before(time.Unix(i, 0)) { |
|
|
||
| func init() { | ||
| RegisterLogViewer("json-file", viewLogsJSONFile) | ||
| RegisterLogViewer("journald", viewLogsJournald) |
There was a problem hiding this comment.
what about fluentd still supported ?
There was a problem hiding this comment.
As of the making of this patch, it doesn't look like viewing logs from fluentd is supported in nerdctl logs.
Specifying nerdctl run --log-driver fluend seems to be supported, as it's even tested, but the tests read the logs from the fluentd files directly instead of validating the output of nerdctl logs like the other logging-related tests.
It's probably a bit late for me to mention this, but my immediate goal (before I saw this TODO) was getting containerd logging and nerdctl logs to work on Windows, so adding log viewing for fluentd/syslog would be even more beyond my initial scope 😆
I had considered the need for adding fluentd and syslog log viewers down the line which is why I tried to make the logging viewers as pluggable and extensible as possible. (hypothetically all the logging options needed to read the logs for a given container will be saved in the logging config on container creation and loaded in LogViewer.loggingConfig so the LogViewer can go from there...)
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
5b5a18d to
dea9243
Compare
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
dea9243 to
3363128
Compare
|
@fahedouch @AkihiroSuda just changed go.mod to point to LE: re-ran the tests and the above passed but TestCopyToContainer and TestComposeUpWithBypass4netns failed on 22.04 this time. |
|
Thanks |
|
Sorry needs rebase |
go.mod
Outdated
| // containerd main | ||
| github.com/containerd/containerd => github.com/containerd/containerd v1.6.1-0.20220807041520-6deb1174fbfc | ||
| // Please see: https://github.com/containerd/nerdctl/pull/1468#discussion_r1014006376 | ||
| github.com/containerd/containerd => github.com/containerd/containerd v1.7.0-beta.0 |
There was a problem hiding this comment.
Probably no longer need to use replace()
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
Signed-off-by: Nashwan Azhari <[email protected]>
This patch refactors log-reading-related logic used by `nerdctl logs` into the pkg/logging module. Additionally, checks and workarounds for the `tail` executable not being available on the system have also been implemented, as it was previously always called for jsonfile-based container logs. Signed-off-by: Nashwan Azhari <[email protected]>
3363128 to
48e9c0f
Compare
This was mainly done to include containerd#7351 for enabling JSON file based logging on Windows. Please see discussion at: containerd#1468 (comment) Signed-off-by: Nashwan Azhari <[email protected]>
NOTE: requires containerd/containerd#7351
This patch refactors log-reading-related logic used by
nerdctl logsinto the pkg/logging module.
Additionally, checks and workarounds for the
tailexecutable notavailable on the system have also been implemented, as it was previously
always called for jsonfile-based container logs.
Signed-off-by: Nashwan Azhari [email protected]