Skip to content

Conversation

@msau42
Copy link
Member

@msau42 msau42 commented Dec 22, 2018

What type of PR is this?
/kind bug

What this PR does / why we need it:
Check for existence of volume-subpaths directory before cleaning up the pod directory

Which issue(s) this PR fixes:

Fixes #72257

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Fixes issue where subpath volume content was deleted during orphaned pod cleanup for Local volumes that are directories (and not mount points) on the root filesystem.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 22, 2018
@msau42
Copy link
Member Author

msau42 commented Dec 22, 2018

/assign @jingxu97 @saad-ali

@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 22, 2018
@msau42
Copy link
Member Author

msau42 commented Dec 22, 2018

/priority critical-urgent

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 22, 2018
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened up #72290 for refactoring and fixing these inter-pkg dependency issues

@Shnatsel
Copy link

pull-kubernetes-verify is flaky, which is tracked as #71394

/retest

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modify the comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above podVolumesExist() checking, it calls getMountedVolumePathListFromDisk() to check whether there are mounted volumes exist. And then it calls getPodVolumePathListFromDisk to check whether there are still volume paths exist. I think there are some redundancy here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be redundant, but I don't want to remove it as part of this bug fix, as this code is not very well tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it comment that this may be able to be cleaned up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened up #72346

@msau42 msau42 force-pushed the fix-subpath-orphan branch from 6bfc578 to 639278a Compare December 26, 2018 18:00
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per documentation:

// IsNotMountPoint is more expensive than IsLikelyNotMountPoint.

How frequently is getPodVolumeSubpathsDir(...) called? Want to be careful we don't introduce a performance regression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to remove this commit. Changing to IsNotMountPoint is not strictly necessary to fix this particular bug.

But we should investigate other callers of these functions to see if this may be wrong in other contexts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened up #72347 to investigate all uses of IsLikelyNotMountPoint

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it necessary to call IsLikelyNotMountPoint function at all then? There's no protection that it won't end up removing data if mount is from the same device anyway (which was the case for the bug we hit), so if you are ok with that, it seems like calling the function just gives false sense of confidence.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsLikelyNotMountPoint still works for most other volume types that are actually different devices.

The reason why I didn't want to change this function here is because it's being called in other paths as well, and we only wanted to do a targeted fix for this particular issue. We can do a more holistic investigation in #72346 and #72347 for further areas to cleanup.

@msau42 msau42 force-pushed the fix-subpath-orphan branch from 639278a to 3ebbbbd Compare December 26, 2018 18:50
@saad-ali
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 26, 2018
@saad-ali
Copy link
Member

/test pull-kubernetes-local-e2e-containerized

@msau42
Copy link
Member Author

msau42 commented Dec 26, 2018

/retest

@msau42
Copy link
Member Author

msau42 commented Dec 26, 2018

I manually tested this by:

  1. Create a pod with a subpath volume
  2. Create a directory and random file under /var/lib/kubelet/pods/poduid/volume-subpaths
  3. Delete the pod
  4. Verify the directories and files under 2) were not deleted, and that this log message is printed:
kubelet_volumes.go:154] Orphaned pod "5799cca2-0956-11e9-b2f6-42010a800002" found, but volume subpaths are still present on disk

@msau42
Copy link
Member Author

msau42 commented Dec 26, 2018

/retest

@k8s-ci-robot
Copy link
Contributor

@msau42: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-local-e2e-containerized 3ebbbbd link /test pull-kubernetes-local-e2e-containerized

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@msau42
Copy link
Member Author

msau42 commented Dec 26, 2018

/assign @Random-Liu
for kubelet review

@Random-Liu
Copy link
Member

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, Random-Liu, saad-ali

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 27, 2018
continue
}

// If there are any volume-subpaths, do not cleanup directories

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do not fully follow why we should not cleanup directories if volume-subpaths exists ? May be add some additional comments to explain that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means that all the volumes have not been cleanly unmounted yet

@k8s-ci-robot k8s-ci-robot merged commit 68451f3 into kubernetes:master Dec 27, 2018
k8s-ci-robot added a commit that referenced this pull request Jan 5, 2019
…-upstream-release-1.12

Automated cherry pick of #72291: Check for volume-subpaths directory in orpahaned pod
k8s-ci-robot added a commit that referenced this pull request Jan 5, 2019
…-upstream-release-1.13

Automated cherry pick of #72291: Check for volume-subpaths directory in orpahaned pod
k8s-ci-robot added a commit that referenced this pull request Jan 17, 2019
…-upstream-release-1.11

Automated cherry pick of #72291: Check for volume-subpaths directory in orpahaned pod
@msau42 msau42 deleted the fix-subpath-orphan branch March 4, 2019 04:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Orphaned pod cleanup removed all data in subPaths of a local volume with path on root drive

8 participants