Description
We just had an issue with containerd: an application was killed several times by the OOM killer because it reached its cgroup memory limit. Containers on the host are now in a really weird state:
- containers are ok according to `crictl ps`
- `crictl exec` fails with `cannot exec in a stopped state: unknown`
- `ctr -n k8s.io t ls` hangs without any output
- `ps auxf` shows many `containerd-shim` processes without any child process (or sometimes only the pause container)
- `runc --root /run/containerd/runc/k8s.io list` shows some containers in `stopped` state
- the associated `containerd-shim` process is still running without any child
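The checks above can be scripted so that a hanging command does not block the shell. This is only a rough sketch: the three commands are the ones from this report, while the `probe` helper, its return labels, and the 10 second timeout are assumptions made for illustration.

```python
import subprocess

# Commands taken from this report; the timeout below is an arbitrary assumption.
CHECKS = [
    ["crictl", "ps"],
    ["ctr", "-n", "k8s.io", "t", "ls"],
    ["runc", "--root", "/run/containerd/runc/k8s.io", "list"],
]


def probe(cmd, timeout=10.0):
    """Run cmd and classify the outcome as 'ok', 'failed', 'hung' or 'unavailable'."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run once the timeout expires.
        return "hung"
    except OSError:
        # Binary not installed or not executable on this host.
        return "unavailable"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    for cmd in CHECKS:
        print(" ".join(cmd), "->", probe(cmd))
```

On a host in the state described here, `ctr -n k8s.io t ls` would be expected to report `hung` while `crictl ps` still reports `ok`.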
It seems that sometimes, when a container process is OOM-killed because it has reached its cgroup memory limit, the containerd state becomes inconsistent. Once this has happened, it's no longer possible to delete containers. When trying to delete a pod, the containerd logs show:
- containerd tries to stop it (StopContainer)
- stop container xx timed out
- then `error="an error occurs during waiting for container xxx to stop: wait container xxx is cancelled"`
- the container is stopped but not removed
Steps to reproduce the issue:
- Run Kubernetes using containerd as the CRI runtime
- Create a pod with a memory limit
- Allocate more memory than the limit
- After several OOM kills, it should no longer be possible to interact with containerd
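For the "allocate more memory than the limit" step, something along these lines could be run inside the pod. It is a minimal sketch and not the application that triggered the issue: the `allocate` function, the chunk size, and the 4096-byte page size are all assumptions.

```python
import sys
import time

MIB = 1024 * 1024
PAGE = 4096  # assumed page size; used only to touch each page once


def allocate(total_mib, chunk_mib=16):
    """Allocate total_mib MiB in chunks and write to every page so it is resident."""
    chunks = []
    allocated = 0
    while allocated < total_mib:
        size = min(chunk_mib, total_mib - allocated)
        buf = bytearray(size * MIB)
        # Write one byte per page so the kernel has to back it with real memory.
        for offset in range(0, len(buf), PAGE):
            buf[offset] = 1
        chunks.append(buf)
        allocated += size
    return chunks


# Only allocates when a size in MiB is passed explicitly on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    held = allocate(int(sys.argv[1]))
    time.sleep(3600)  # keep the memory referenced until the process is killed
```

Passing a size above the pod's memory limit should get the process OOM-killed; with a restart policy that restarts the container, this repeats the kill several times.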
Describe the results you received:
containerd seems to be stuck in an inconsistent state and is no longer able to fulfill CRI requests
Describe the results you expected:
containerd should clean up OOM-killed containers and remain consistent
Output of `containerd --version`:
```
containerd --version
containerd github.com/containerd/containerd v1.1.0 209a7fc3e4a32ef71a8c7b50c68fc8398415badf
```