kubelet: fix log files being overwritten on container state loss #99748
/sig node
/test pull-kubernetes-unit
/priority important-longterm
Please switch to structured logging (InfoS with err and path).
One thing I'm thinking about: for crashlooping pods, this may cause the kubelet to retain more logs than it does today. A crashing pod that writes a lot of garbage to its logs could add significant log volume to disk. How do we ensure this disk space is accounted to the pod and, for example, rotated or truncated as necessary?
The kubelet already takes a maximum number of log files and a maximum log file size into account, for example --container-log-max-files int32 (default: 5).
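To put the reply's limits into numbers, here is a rough Go sketch of the per-container ceiling they imply. It assumes the rotation limits cap each container's log series at about max files times max size. The function name worstCaseLogBytes is hypothetical, and the 10 MiB size is an illustrative value, not one quoted in this thread. The arithmetic does not by itself settle whether logs left behind from before a state loss fall under these limits.

```go
package main

import "fmt"

// worstCaseLogBytes returns a rough upper bound on the disk used by one
// container's rotated log series, assuming (hypothetically) that at most
// maxFiles files are kept and each is rotated once it reaches maxSizeBytes.
// Real files may overshoot maxSizeBytes somewhat between rotation checks,
// so treat the result as approximate.
func worstCaseLogBytes(maxFiles int32, maxSizeBytes int64) int64 {
	if maxFiles <= 0 || maxSizeBytes <= 0 {
		return 0
	}
	return int64(maxFiles) * maxSizeBytes
}

func main() {
	const mib = int64(1) << 20

	// 5 is the --container-log-max-files default quoted in the thread;
	// 10 MiB is an illustrative max size, not taken from the thread.
	perContainer := worstCaseLogBytes(5, 10*mib)
	fmt.Printf("per container: %d MiB\n", perContainer/mib)

	// A hypothetical pod with three containers.
	fmt.Printf("3-container pod: %d MiB\n", 3*perContainer/mib)
}
```

With these inputs the program reports a 50 MiB bound per container and 150 MiB for the hypothetical three-container pod.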
/retest
ehashman left a comment:
/lgtm
I'll leave setting the milestone to the approvers. I think we should consider this a bugfix for 1.21.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: mrunalp, rphillips.
…9748-upstream-release-1.21 Automated cherry pick of #99748: check log directory for restartCount
What type of PR is this?
/kind bug
What this PR does / why we need it:
If the container runtime state is wiped after a reboot, we can recover the restart count by inspecting the log files, since they are named {restartCount}.log.
This fixes a situation where pods can overwrite their previous log files: when container state is lost, restartCount starts again at 0, so the new container writes to 0.log again. We see this mostly with static pods, but pods that are not drained, typically DaemonSet and ReplicaSet pods, can exhibit the issue as well.
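A minimal Go sketch of the idea described above, assuming log files named {restartCount}.log and, as an extra assumption, rotated copies that keep that name followed by a dot suffix. The names nextRestartCount and nextRestartCountFromDir are hypothetical and this is not the code merged in the PR; it only illustrates deriving the next restart count from the highest number found in a container's log directory.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
)

// logNameRe matches "<restartCount>.log" and, as an assumption, rotated
// copies such as "<restartCount>.log.<suffix>".
var logNameRe = regexp.MustCompile(`^(\d+)\.log(\..+)?$`)

// nextRestartCount returns one more than the highest restart count encoded
// in names, or 0 if no name looks like a container log file.
func nextRestartCount(names []string) int {
	next := 0
	for _, name := range names {
		m := logNameRe.FindStringSubmatch(name)
		if m == nil {
			continue // not a container log file
		}
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue // e.g. a number too large for int
		}
		if n+1 > next {
			next = n + 1
		}
	}
	return next
}

// nextRestartCountFromDir lists a (hypothetical) container log directory and
// derives the restart count from the file names. A missing directory means
// no logs were written yet, so it yields 0 rather than an error.
func nextRestartCountFromDir(dir string) (int, error) {
	entries, err := os.ReadDir(dir)
	if errors.Is(err, fs.ErrNotExist) {
		return 0, nil
	}
	if err != nil {
		return 0, err
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		if !e.IsDir() {
			names = append(names, e.Name())
		}
	}
	return nextRestartCount(names), nil
}

func main() {
	dir, err := os.MkdirTemp("", "container-logs")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	// Simulate logs left on disk by two earlier runs of the container.
	for _, name := range []string{"0.log", "1.log"} {
		if err := os.WriteFile(filepath.Join(dir, name), nil, 0o644); err != nil {
			panic(err)
		}
	}

	next, err := nextRestartCountFromDir(dir)
	if err != nil {
		panic(err)
	}
	fmt.Println(next) // prints 2
}
```

Taking the highest number plus one, rather than counting files, means a log missing from the middle of the sequence still cannot cause an existing file name to be reused.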
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: