Shim process is leaked when containerd crashes during container create #6860

@ambarve

Description

If the containerd process crashes while creating a new container, the shim process is leaked and is never cleaned up.
When containerd creates a new container, it starts the shim process and then calls the Create method on the task service (runtime/v2/runc/task/service.go). If containerd crashes while the shim is still creating the container, containerd loses all information about the running shim process, and the Create call in the shim fails with a context cancelled error.
When containerd restarts, it connects back to the shim, but the shim has no running tasks. If the client then asks containerd to delete the container, the container is deleted but the shim process lives on. The only way to clean up the shim process is to kill it manually.

Steps to reproduce the issue

  1. Add a small, artificial delayed crash in runtime/v2/shim.go:Create, just before the call to s.task.Create(ctx, request), like below:
      go func() {
          time.Sleep(1 * time.Second)
          panic("deliberate panic...")
      }()
  2. Add a small sleep at the beginning of runtime/v2/runc/container.go:NewContainer so that containerd crashes before container creation in the shim is complete:
      time.Sleep(3 * time.Second)
  3. Create a new container with ./ctr run docker.io/library/alpine:latest test_container ping 127.0.0.1. containerd crashes and the shim fails to create the container with a context cancelled error.
  4. Restart containerd and run ./ctr t list; it shows no tasks. Now delete the container with ./ctr c delete test_container, and the container is deleted.
  5. Run ps -aux | grep containerd to see that the shim process is still running.
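
The last step can also be done programmatically. The following is a rough, Linux-only sketch that scans /proc for processes whose executable name starts with `containerd-shim`. The helper names `isShimCmdline` and `findShims` are made up for illustration, and matching on that name prefix is an assumption, so adjust it to the shim binary actually in use.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isShimCmdline reports whether a raw /proc/<pid>/cmdline value (arguments
// separated by NUL bytes) looks like a containerd shim. Matching on the
// "containerd-shim" name prefix is an assumption of this sketch.
func isShimCmdline(raw []byte) bool {
	args := bytes.Split(bytes.TrimRight(raw, "\x00"), []byte{0})
	if len(args) == 0 || len(args[0]) == 0 {
		return false
	}
	return strings.HasPrefix(filepath.Base(string(args[0])), "containerd-shim")
}

// findShims scans procRoot (normally "/proc") and returns the PIDs, as
// strings, of processes whose command line looks like a shim.
func findShims(procRoot string) ([]string, error) {
	paths, err := filepath.Glob(filepath.Join(procRoot, "[0-9]*", "cmdline"))
	if err != nil {
		return nil, err
	}
	var pids []string
	for _, p := range paths {
		raw, err := os.ReadFile(p)
		if err != nil {
			continue // the process exited or is not readable; skip it
		}
		if isShimCmdline(raw) {
			pids = append(pids, filepath.Base(filepath.Dir(p)))
		}
	}
	return pids, nil
}

func main() {
	pids, err := findShims("/proc")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, pid := range pids {
		fmt.Println(pid)
	}
}
```

After step 4, any PID this prints would be a candidate for the leaked shim. The sketch cannot tell a leaked shim apart from one that still serves a live container, so check before killing anything.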

Describe the results you received and expected

The shim process is leaked and will never be cleaned up. Expected behavior is that the shim process gets cleaned up.

What version of containerd are you using?

Built from latest main

Any other relevant information

No response

Show configuration if it is related to CRI plugin.

No response
