Description
We run Kubernetes with containerd, without Docker. A pod gets stuck in Terminating because containerd-shim fails to unmount the rootfs (device is busy). Restarting the kubelet or containerd has no effect; the only workaround is to find and kill the affected shim. Because this is a customer node, we do not know what will run on it; the process holding the rootfs might be, for example, a misconfigured log collector.
Steps to reproduce the issue:
- Create a pod
- Find the corresponding container with crictl
- Change into the container's rootfs and keep something inside it busy, for example:
$ cd /run/containerd/io.containerd.runtime.v1.linux/k8s.io/8b11526e80a5e5d99b9e2b07c742e26f7401cac747aa06d1812733b995ffc101/rootfs
$ bin/sleep 10000
- Delete the pod
Describe the results you received:
The pod stays stuck in Terminating indefinitely.
Describe the results you expected:
The container should be deleted correctly, as it is with Docker.
Output of containerd --version:
containerd github.com/containerd/containerd v1.2.7 85f6aa58b8a3170aec9824568f7a31832878b603
Any other relevant information:
Containerd log
container.log
I think the cause is that, when the pod is deleted, containerd-shim kills the process with runc and then unmounts the rootfs of the bundle:
func (p *Init) delete(ctx context.Context) error {
    err := p.runtime.Delete(ctx, p.id, nil)
    // ...
    if err2 := mount.UnmountAll(p.Rootfs, 0); err2 != nil {
        // ...
    }
}
The unmount returns an error when the rootfs is busy. The kubelet retries, but the remaining attempts also get an error (runc did not terminate sucessfully). Perhaps we could use mount.UnmountAll(p.Rootfs, unix.MNT_DETACH) instead of mount.UnmountAll(p.Rootfs, 0), as Docker does.