Description
If the containerd process crashes while creating a new container, the shim process is leaked and is never cleaned up.
When containerd creates a new container, it starts the shim process and then calls the Create method on the task service (runtime/v2/runc/task/service.go). If containerd crashes while the shim is still creating the container, containerd loses all information about the running shim process, and the Create call in the shim fails with a context cancelled error.
When containerd restarts, it connects back to the shim, but the shim has no running tasks. If the client then asks containerd to delete the container, the container is deleted but the shim process lives on. The only way to clean up this shim process is to kill it manually.
Steps to reproduce the issue
- Add a small fake delayed crash in runtime/v2/shim.go:Create, just before the call to s.task.Create(ctx, request), like below:
    go func() {
        time.Sleep(1 * time.Second)
        panic("deliberate panic...")
    }()
- Add a small sleep at the beginning of runtime/v2/runc/container.go:NewContainer so that containerd crashes before container creation in the shim is complete:
    time.Sleep(3 * time.Second)
- Create a new container with:
    ./ctr run docker.io/library/alpine:latest test_container ping 127.0.0.1
  containerd crashes and the shim fails to create the container with a context cancelled error.
- Restart containerd and run ./ctr t list; it shows no tasks. Now delete the container with ./ctr c delete test_container; the container is deleted.
- Run ps -aux | grep containerd to see that the shim process is still running.
Describe the results you received and expected
The shim process is leaked and will never be cleaned up. Expected behavior is that the shim process gets cleaned up.
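One way to picture the expected behavior is the sketch below: after reconnecting, any shim that reports no tasks is shut down. This is only an illustration under stated assumptions. The shimHandle interface, reapIdleShims, and fakeShim are hypothetical names invented for this example and are not containerd's actual types or its fix.

```go
package main

import "fmt"

// shimHandle is a hypothetical, simplified view of a reconnected shim.
// It is not containerd's real shim interface.
type shimHandle interface {
	ID() string
	TaskCount() int
	Shutdown() error
}

// reapIdleShims shuts down every shim that reports zero tasks and returns
// the IDs of the shims it stopped. Shims with tasks are left untouched.
func reapIdleShims(shims []shimHandle) []string {
	var reaped []string
	for _, s := range shims {
		if s.TaskCount() > 0 {
			continue
		}
		if err := s.Shutdown(); err != nil {
			// Leave it for a later attempt rather than reporting it as reaped.
			continue
		}
		reaped = append(reaped, s.ID())
	}
	return reaped
}

// fakeShim is an in-memory stand-in used only to exercise the sketch.
type fakeShim struct {
	id      string
	tasks   int
	stopped bool
}

func (f *fakeShim) ID() string      { return f.id }
func (f *fakeShim) TaskCount() int  { return f.tasks }
func (f *fakeShim) Shutdown() error { f.stopped = true; return nil }

func main() {
	busy := &fakeShim{id: "busy", tasks: 1}
	idle := &fakeShim{id: "idle", tasks: 0}
	fmt.Println("reaped:", reapIdleShims([]shimHandle{busy, idle}))
}
```

In this sketch the shim left behind by the interrupted Create would correspond to the idle entry, while shims that still own tasks are not affected.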
What version of containerd are you using?
Build from latest main
Any other relevant information
No response
Show configuration if it is related to CRI plugin.
No response