
Orphaned pod cleanup removed all data in subPaths of a local volume with path on root drive #72257

@timoha


What happened:
Network for all nodes of the Kubernetes cluster (29 hosts) was turned off for a day. Once the network was brought back, we scaled down a StatefulSet that mounts a subPath from a local volume whose path is on the root drive, and then scaled it back up to the previous number of replicas. The kubelet on 26 out of 29 hosts proceeded to remove all data from the pod's subPaths. In fact, the removal was ongoing: when we created a file on a mounted subPath inside the container, the file would be cleaned up within a couple of seconds. The issue continued until the volume-subpath was manually unmounted (with the umount command).

We ended up losing a third of all of our data for the whole cluster (we have 3 drives per host and one was wiped).

Equivalent definition below. Note that the host path of the local volume has to be on the root drive:

---
apiVersion: v1
kind: Service
metadata:
  name: busybox
spec:
  ports:
  - port: 1234
    name: placeholder
  clusterIP: None
  selector:
    app: busybox
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: "test-volume"
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: test-data
  local:
    path: /volume/subpath
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - "examplehost.com"
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: busybox
  namespace: default
spec:
  serviceName: busybox
  replicas: 1
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: server
        image: busybox
        command:
        - sleep
        - "36000"
        volumeMounts:
        - name: data
          subPath: subpath
          mountPath: /subpath
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: test-data
      resources:
        requests:
          storage: 1Gi

Kubelet logs during the issue (logs from other pods filtered out):

Dec 11 20:32:44 <hostname> kubelet-1.13.0[11504]: E1211 20:32:44.240919   11504 desired_state_of_world_populator.go:296] Error processing volume "data1" for pod "datanode-19_default(9af0d50b-50f1-11e8-b37b-ac1f6b10eed2)": error processing PVC "default"/"data1-datanode-19": failed to fetch PVC default/data1-datanode-19 from API server. err=Get https://<master>/api/v1/namespaces/default/persistentvolumeclaims/data1-datanode-19: dial tcp <masterip>:443: connect: network is unreachable
... NETWORK CAME BACK UP ...
... RESCALED STATEFULSET ...
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: E1211 21:20:40.606405   11504 remote_runtime.go:119] StopPodSandbox "c139242352da780cf22f10bc6514d59b737f1cfd6b3462893c133251a3cebbd4" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "datanode-19_default" network: could not retrieve port mappings: checkpoint is corrupted
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: E1211 21:20:40.606435   11504 kuberuntime_manager.go:815] Failed to stop sandbox {"docker" "c139242352da780cf22f10bc6514d59b737f1cfd6b3462893c133251a3cebbd4"}
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: E1211 21:20:40.606479   11504 kubelet.go:1576] error killing pod: failed to "KillPodSandbox" for "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"datanode-19_default\" network: could not retrieve port mappings: checkpoint is corrupted"
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: E1211 21:20:40.606490   11504 pod_workers.go:190] Error syncing pod 9af0d50b-50f1-11e8-b37b-ac1f6b10eed2 ("datanode-19_default(9af0d50b-50f1-11e8-b37b-ac1f6b10eed2)"), skipping: error killing pod: failed to "KillPodSandbox" for "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"datanode-19_default\" network: could not retrieve port mappings: checkpoint is corrupted"
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: W1211 21:20:40.876177   11504 docker_sandbox.go:384] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "datanode-19_default": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "c139242352da780cf22f10bc6514d59b737f1cfd6b3462893c133251a3cebbd4"
Dec 11 21:20:40 <hostname> kubelet-1.13.0[11504]: W1211 21:20:40.909701   11504 cni.go:302] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "c139242352da780cf22f10bc6514d59b737f1cfd6b3462893c133251a3cebbd4"
Dec 11 21:20:41 <hostname> kubelet-1.13.0[11504]: I1211 21:20:41.029701   11504 reconciler.go:181] operationExecutor.UnmountVolume started for volume "data1" (UniqueName: "kubernetes.io/local-volume/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2-<local volume name>") pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" (UID: "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2")
Dec 11 21:20:41 <hostname> kubelet-1.13.0[11504]: I1211 21:20:41.065868   11504 operation_generator.go:683] UnmountVolume.TearDown succeeded for volume "kubernetes.io/local-volume/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2-<local volume name>" (OuterVolumeSpecName: "data1") pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" (UID: "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2"). InnerVolumeSpecName "<local volume name>". PluginName "kubernetes.io/local-volume", VolumeGidValue ""
Dec 11 21:20:41 <hostname> kubelet-1.13.0[11504]: I1211 21:20:41.130411   11504 reconciler.go:294] operationExecutor.UnmountDevice started for volume "<local volume name>" (UniqueName: "kubernetes.io/local-volume/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2-<local volume name>") on node "<node name>"
Dec 11 21:20:41 <hostname> kubelet-1.13.0[11504]: I1211 21:20:41.132419   11504 operation_generator.go:768] UnmountDevice succeeded for volume "<local volume name>" %!(EXTRA string=UnmountDevice succeeded for volume "<local volume name>" (UniqueName: "kubernetes.io/local-volume/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2-<local volume name>") on node "<node name>" )
Dec 11 21:20:41 <hostname> kubelet-1.13.0[11504]: I1211 21:20:41.230660   11504 reconciler.go:301] Volume detached for volume "<local volume name>" (UniqueName: "kubernetes.io/local-volume/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2-<local volume name>") on node "<node name>" DevicePath ""
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: E1211 21:22:33.919172   11504 kubelet_volumes.go:133] Failed to remove orphaned pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" dir; err: remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: E1211 21:22:33.919235   11504 kubelet_pods.go:1039] Failed cleaning up orphaned pod directories: [cannot delete directory /var/lib/kubelet/pods/9a77b332-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data/server/0: it is a mount point, remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy]
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: W1211 21:22:33.935707   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: W1211 21:22:33.935716   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: E1211 21:22:33.935907   11504 kubelet_volumes.go:133] Failed to remove orphaned pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" dir; err: remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: E1211 21:22:33.935926   11504 kubelet_pods.go:1039] Failed cleaning up orphaned pod directories: [cannot delete directory /var/lib/kubelet/pods/9a77b332-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data/server/0: it is a mount point, remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy]
Dec 11 21:22:36 <hostname> kubelet-1.13.0[11504]: W1211 21:22:34.501180   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist
Dec 11 21:22:36 <hostname> kubelet-1.13.0[11504]: W1211 21:22:34.501189   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist
Dec 11 21:22:36 <hostname> kubelet-1.13.0[11504]: E1211 21:22:36.641636   11504 kubelet_volumes.go:133] Failed to remove orphaned pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" dir; err: remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy

... CONTINUES UNTIL WE MANUALLY REMOVED MOUNT ...

What you expected to happen:

Expected no data loss for persistent local volumes.

How to reproduce it (as minimally and precisely as possible):
Unfortunately, I was not able to reproduce it. But this bug hit 26 out of 29 nodes simultaneously. You need some way to create a case where a pod becomes orphaned and its "volumes" folder is somehow removed (similar to issue #38498). From there, the following code path can lead to data loss:

Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: W1211 21:22:33.935707   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist
Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: W1211 21:22:33.935716   11504 kubelet_getters.go:277] Path "/var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volumes" does not exist

We are in orphaned pod cleanup and the volumes directory is missing (not sure why it's printed twice):
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_getters.go#L278
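
For illustration, here is a minimal sketch of the behavior that warning suggests. This is a simplified illustration in a hypothetical package, not the actual kubelet getter: when the pod's "volumes" directory is missing, it returns an empty list and a nil error, so the caller cannot distinguish "no volumes" from "volumes directory unexpectedly gone".

package sketch

import (
	"fmt"
	"os"
	"path/filepath"
)

// listPodVolumesFromDisk sketches the getter behavior described above.
// Simplified illustration only, not the real kubelet code.
func listPodVolumesFromDisk(podDir string) ([]string, error) {
	volumesDir := filepath.Join(podDir, "volumes")
	pluginDirs, err := os.ReadDir(volumesDir)
	if err != nil {
		if os.IsNotExist(err) {
			// Matches the kubelet_getters.go warning seen in the logs.
			fmt.Printf("W Path %q does not exist\n", volumesDir)
			return nil, nil // empty list, no error: indistinguishable from "no volumes"
		}
		return nil, err
	}
	var volumes []string
	for _, plugin := range pluginDirs {
		volDirs, err := os.ReadDir(filepath.Join(volumesDir, plugin.Name()))
		if err != nil {
			return nil, err
		}
		for _, vol := range volDirs {
			volumes = append(volumes, filepath.Join(volumesDir, plugin.Name(), vol.Name()))
		}
	}
	return volumes, nil
}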

No error is returned and the volumes list is empty, so we go into this case, which proceeds to remove the pod directory, and the volume-subpaths along with it, even though they might still be mounted (not sure why):
https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kubelet_volumes.go#L132

Dec 11 21:22:33 <hostname> kubelet-1.13.0[11504]: E1211 21:22:33.935907   11504 kubelet_volumes.go:133] Failed to remove orphaned pod "9af0d50b-50f1-11e8-b37b-ac1f6b10eed2" dir; err: remove /var/lib/kubelet/pods/9af0d50b-50f1-11e8-b37b-ac1f6b10eed2/volume-subpaths/data1/server/3: device or resource busy
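
Continuing the sketch above (again a rough illustration with hypothetical helper names, not the actual kubelet code): with an empty volume list and no error, the cleanup removes the whole pod directory, guarded only by the mount-point check discussed next. It uses listPodVolumesFromDisk from the previous sketch and isLikelyNotMountPoint from the next one.

package sketch

import (
	"os"
	"path/filepath"
)

// removeOrphanedPodDir sketches the cleanup branch referenced above.
func removeOrphanedPodDir(podDir string) error {
	volumes, err := listPodVolumesFromDisk(podDir)
	if err != nil || len(volumes) > 0 {
		return err // cleanup is skipped while volumes are still reported
	}
	// Empty list and nil error: the pod dir, including volume-subpaths/,
	// is treated as safe to delete.
	return removeAllGuarded(podDir)
}

// removeAllGuarded recursively deletes path unless the mount-point check says
// it is a mount point. For a subpath bind mount whose source is on the same
// device (the root drive), the check reports "not a mount point", so the
// files are deleted *through* the bind mount, i.e. on the local volume
// itself; only the final rmdir of the still-busy mount point fails, which is
// the "device or resource busy" error in the log above.
func removeAllGuarded(path string) error {
	if notMnt, err := isLikelyNotMountPoint(path); err != nil || !notMnt {
		return err // leave apparent mount points (and errors) alone
	}
	entries, err := os.ReadDir(path)
	if err != nil {
		return err
	}
	for _, e := range entries {
		child := filepath.Join(path, e.Name())
		if e.IsDir() {
			if err := removeAllGuarded(child); err != nil {
				return err
			}
		} else if err := os.Remove(child); err != nil {
			return err
		}
	}
	return os.Remove(path)
}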

IsLikelyNotMountPoint only checks whether the path's device differs from its parent directory's device. We use the root drive for the local volume, so the device is the same, the subpath bind mount is treated as not a mount point, and it is considered OK to remove (this is a bug IMHO):
https://github.com/kubernetes/kubernetes/blob/master/pkg/util/mount/mount_linux.go#L250
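
A sketch of that device comparison (Linux-only; simplified from what the linked code does):

package sketch

import (
	"os"
	"path/filepath"
	"strings"
	"syscall"
)

// isLikelyNotMountPoint mirrors the device-comparison logic pointed at above:
// a path is considered a mount point only if its device differs from its
// parent directory's device. A bind mount whose source lives on the same
// filesystem (here: a local volume on the root drive) has the same device ID
// on both sides, so it is reported as "not a mount point".
func isLikelyNotMountPoint(file string) (bool, error) {
	stat, err := os.Stat(file)
	if err != nil {
		return true, err
	}
	parentStat, err := os.Stat(filepath.Dir(strings.TrimSuffix(file, "/")))
	if err != nil {
		return true, err
	}
	if stat.Sys().(*syscall.Stat_t).Dev != parentStat.Sys().(*syscall.Stat_t).Dev {
		return false, nil // different device => definitely a mount point
	}
	return true, nil // same device => assumed not a mount point (the bug)
}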

Anything else we need to know?:

Kubernetes version upgrades went 1.9.6 -> 1.11.1 -> 1.12.1 -> 1.12.2 -> 1.13.0 without restarting the pods.

Environment:

  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T19:44:19Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.0", GitCommit:"ddf47ac13c1a9483ea035a79cd7c10005ff21a6d", GitTreeState:"clean", BuildDate:"2018-12-03T20:56:12Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: OVH, baremetal servers
  • OS (e.g. from /etc/os-release):
cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1688.5.3
VERSION_ID=1688.5.3
BUILD_ID=2018-04-03-0547
PRETTY_NAME="Container Linux by CoreOS 1688.5.3 (Rhyolite)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"
  • Kernel (e.g. uname -a):
Linux <hostname> 4.14.32-coreos #1 SMP Tue Apr 3 05:21:26 UTC 2018 x86_64 Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz GenuineIntel GNU/Linux
  • Install tools:
    Ansible tasks that download the kubelet and all other required tools from https://storage.googleapis.com/kubernetes-release/release.
  • Others:

/kind bug

Labels

kind/bug, priority/critical-urgent, sig/storage
