kubelet: do not call RemoveAll on volumes directory for orphaned pods #102576
Hi @dobsonj. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/test all |
|
/retest |
pkg/util/removeall/removeall.go
Outdated
nit: I would probably move fd.Close into a defer right after opening the file, assuming no errors were thrown.
This looks like a deliberate decision in the original implementation. The comment right above that (line 96) says:
// Close directory, because windows won't remove opened directory.
If we were to defer instead, the fd would be closed after the remove() call, which could introduce issues on Windows systems.
ah okay.
gnufied
left a comment
Is there a way we can add some unit tests for this?
Yes, I added one new test case for kubelet_volumes (where |
|
/test all |
|
/lgtm |
|
I'll defer to node/storage approvers on this one. Once this has approval from them, I can ack if needed to get the new OWNERS file in place. |
|
/approve |
ehashman
left a comment
I'm not very familiar with this code but the description, test, and implementation in pkg/kubelet/kubelet_volumes.go LGTM with SIG Storage's 👍
pkg/kubelet/kubelet_volumes.go
Outdated
Error condition looks good. See also https://github.com/kubernetes/community/blob/master/contributors/devel/sig-instrumentation/migration-to-structured-logging.md#change-log-functions-to-structured-equivalent, which explains why there is no "WarningS" equivalent.
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dobsonj, gnufied, liggitt, mrunalp. The full list of commands accepted by this bot can be found here. The pull request process is described here. Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest Review the full test history for this PR. Silence the bot with an |
…576-upstream-release-1.20 Automated cherry pick of #102576: kubelet: do not call RemoveAll on volumes directory for
…576-upstream-release-1.21 Automated cherry pick of #102576: kubelet: do not call RemoveAll on volumes directory for
…576-upstream-release-1.19 Automated cherry pick of #102576: kubelet: do not call RemoveAll on volumes directory for
What type of PR is this?
/kind bug
What this PR does / why we need it:
See issue #101911 for the full story. It is possible in some situations for the volume of a terminating pod to be added back to the DSW after being removed, because pods are currently added and removed based on two different caches. As a result, the orphaned volume cleanup code can call RemoveAll on the pod dir, which deletes vol_data.json, and causes NodeUnstageVolume to fail when attempting to unmount the volume.
To avoid this issue, this PR instead uses rmdir on the volumes dir during orphaned volume cleanup, which will fail if any files are left in the volumes directory. The other subpaths under the pod directory CAN have files that need to be removed during orphan cleanup, so those continue to use RemoveAll.
Here is an example pod directory that needs to be cleaned up:
This PR recursively calls rmdir on the volumes directory, which will be successful only if there are no files or mounts left behind. Then it calls RemoveAll on the other subpaths (containers, etc-hosts, plugins). And if both of those are successful, it calls rmdir on the pod directory itself.
Which issue(s) this PR fixes:
Fixes #101911
Special notes for your reviewer:
cc: @msau42 @gnufied @jingxu97
Does this PR introduce a user-facing change?