Description
We use kubelet with containerd as the runtime. Restarting containerd fails because the CRI plugin finds two containers with the same name:
Aug 02 22:56:05 VM-0-29-centos containerd[36948]: time="2022-08-02T22:56:05.989055410+08:00" level=fatal msg="Failed to run CRI service" error="failed to recover state: failed to reserve container name "kube-proxy_kube-proxy-m28fw_kube-system_0df69e5f-4355-4f99-bfd0-d1c6b2f935aa_0": name "kube-proxy_kube-proxy-m28fw_kube-system_0df69e5f-4355-4f99-bfd0-d1c6b2f935aa_0" is reserved for "73cc6d80cd6602e5ff53fd62db85cf09ecc8fe12b9effe753c404bf45750842a""
Aug 02 22:56:05 VM-0-29-centos systemd[1]: containerd.service: Main process exited, code=exited, status=1/FAILURE
It's easy to reproduce: just fill up the disk.
On restart, the CRI plugin loads all existing containers, and any container that fails to load is skipped. Kubelet then creates a new container/sandbox with the same name and restartCount, because kubelet reads the restartCount from the annotations of the existing containers. If the previously skipped container loads successfully on the next restart, CRI finds two containers with the same name, so it fails fatally.
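The collision comes from name reservation during state recovery. Below is a minimal sketch, assuming a registrar similar to containerd's pkg/registrar that binds each container name to exactly one container ID; the types, the reserve helper, and the container IDs here are illustrative, not containerd's actual code.

package main

import (
	"fmt"
	"log"
)

// registrar maps a container name to the container ID that owns it,
// mirroring the one-name-one-ID invariant CRI enforces on recovery.
type registrar struct {
	names map[string]string
}

// reserve binds name to id and fails if the name already belongs to a
// different id, producing an "is reserved for" error like the log above.
func (r *registrar) reserve(name, id string) error {
	if owner, ok := r.names[name]; ok && owner != id {
		return fmt.Errorf("name %q is reserved for %q", name, owner)
	}
	r.names[name] = id
	return nil
}

func main() {
	r := &registrar{names: map[string]string{}}
	// Two containers end up with the same name and the same _0 restart
	// suffix: the old copy that was skipped during the disk-full restart,
	// and the replacement kubelet created. IDs here are made up.
	containers := []struct{ id, name string }{
		{"73cc6d80cd66", "kube-proxy_kube-proxy-m28fw_kube-system_uid_0"},
		{"1a2b3c4d5e6f", "kube-proxy_kube-proxy-m28fw_kube-system_uid_0"},
	}
	for _, c := range containers {
		if err := r.reserve(c.name, c.id); err != nil {
			// The real CRI plugin wraps this as "failed to recover
			// state" and containerd exits with level=fatal.
			log.Fatalf("failed to reserve container name: %v", err)
		}
	}
}

Because both the skipped old container and kubelet's replacement carry the same name with the _0 suffix, the second reserve call always fails, matching the fatal error in the journal above.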
Steps to reproduce the issue
- Find a node that uses containerd as its runtime
- Fill the disk with a command such as
dd if=/dev/zero of=file bs=1M count=1024
- Restart containerd; you will get a message like "Failed to load container xxx" error="failed to checkpoint status to xxx/.tmp-status106398678: no space left on device" (see the sketch after these steps)
- Kubelet will create new containers with the same name
- Free the disk space and restart containerd again; this restart fails with the fatal "is reserved for" error shown above:
rm file
systemctl restart containerd
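For reference, the "failed to checkpoint status to xxx/.tmp-status..." error suggests the usual write-temp-then-rename pattern for checkpointing container status. A minimal sketch of that pattern follows; the file names and the checkpointStatus helper are illustrative, not containerd's actual implementation.

package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkpointStatus writes data to a temp file and renames it over the
// real status file, so readers never observe a partial write. On a full
// disk the Write fails with ENOSPC and the checkpoint aborts.
func checkpointStatus(dir string, data []byte) error {
	tmp, err := os.CreateTemp(dir, ".tmp-status")
	if err != nil {
		return fmt.Errorf("failed to checkpoint status to %s: %w", dir, err)
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup if we fail below
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		// On a full disk this is "no space left on device".
		return fmt.Errorf("failed to checkpoint status to %s: %w", tmp.Name(), err)
	}
	if err := tmp.Close(); err != nil {
		return fmt.Errorf("failed to checkpoint status to %s: %w", tmp.Name(), err)
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, "status"))
}

func main() {
	if err := checkpointStatus(os.TempDir(), []byte(`{"state":"running"}`)); err != nil {
		fmt.Println(err)
	}
}

When this checkpoint fails during load, the CRI plugin skips the container, and kubelet proceeds as if it never existed, which sets up the name collision on the following restart.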
Describe the results you received and expected
Noop
What version of containerd are you using?
1.4.3
Any other relevant information
Not relevant.
Show configuration if it is related to CRI plugin.
Not relevant.