cri: cannot remove stopped container #4655

@zhuangqh

Description

crictl stop xxx succeeds, but the container cannot then be removed:

ERRO[0000] removing container "ff62688307157" failed: rpc error: code = Unknown desc = failed to set removing state for container "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2": container state is unknown, to stop first
FATA[0000] unable to remove container(s)

containerd log

time="2020-10-26T16:01:26.710342258+08:00" level=info msg="StopContainer for "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2" with timeout 30 (s)"
time="2020-10-26T16:01:26.712425853+08:00" level=info msg="StopContainer for "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2" returns successfully"
time="2020-10-26T16:01:26.714464191+08:00" level=info msg="RemoveContainer for "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2""
time="2020-10-26T16:01:26.714534014+08:00" level=error msg="RemoveContainer for "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2" failed" error="failed to set removing state for container "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2": container state is unknown, to stop first"

Steps to reproduce the issue:
Using Kata Containers:

  1. Create a CRI sandbox successfully, then fail to start a non-sandbox container xxxx.
  2. Restart containerd.
  3. crictl stop xxxx succeeds, but the container cannot be removed.

Describe the results you received:

Describe the results you expected:

Output of containerd --version:

containerd github.com/containerd/containerd v1.2.10

Any other relevant information:

crictl inspect

{
  "status": {
    "id": "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2",
    "metadata": {
      "attempt": 0,
      "name": "xxxx"
    },
    "state": "CONTAINER_UNKNOWN",
    "createdAt": "2020-10-22T21:09:03.933372337+08:00",
    "startedAt": "1970-01-01T08:00:00+08:00",
    "finishedAt": "2020-10-22T21:09:05.043456809+08:00",
    "exitCode": 128,
...

The container is in the unknown state, but its createdAt and finishedAt timestamps are not empty.

When this container is stopped, the unknown flag is never reset to false because finishedAt is already set; see https://github.com/containerd/containerd/blob/master/pkg/cri/server/events.go#L330 . As a result, the container can never be removed.
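
The interaction described above can be illustrated with a small Go model. This is only a sketch, not containerd's actual code: `Status`, `handleExit`, and `canRemove` are hypothetical names, and the early return on a non-zero `FinishedAt` is an assumption about what the linked line does, inferred from the symptoms in this report.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a simplified, hypothetical stand-in for the CRI plugin's
// container status. Timestamps are UnixNano values; 0 means "unset".
type Status struct {
	CreatedAt  int64
	StartedAt  int64
	FinishedAt int64
	ExitCode   int32
	Unknown    bool
}

// State derives the reported state. The Unknown flag takes precedence
// over every timestamp (assumed behaviour).
func (s Status) State() string {
	switch {
	case s.Unknown:
		return "CONTAINER_UNKNOWN"
	case s.FinishedAt != 0:
		return "CONTAINER_EXITED"
	case s.StartedAt != 0:
		return "CONTAINER_RUNNING"
	case s.CreatedAt != 0:
		return "CONTAINER_CREATED"
	}
	return "CONTAINER_UNKNOWN"
}

// handleExit models the exit handler: if FinishedAt is already set it
// returns early, so the Unknown flag is never cleared on that path.
func handleExit(s Status, exitedAt int64, exitCode int32) Status {
	if s.FinishedAt != 0 {
		return s
	}
	s.FinishedAt = exitedAt
	s.ExitCode = exitCode
	s.Unknown = false
	return s
}

// canRemove models the removal precondition that rejects unknown containers.
func canRemove(s Status) error {
	if s.State() == "CONTAINER_UNKNOWN" {
		return errors.New("container state is unknown, to stop first")
	}
	return nil
}

func main() {
	// Status as shown by crictl inspect: finishedAt set, state unknown.
	s := Status{CreatedAt: 1, FinishedAt: 2, ExitCode: 128, Unknown: true}
	s = handleExit(s, 3, 137) // the stop succeeds, but the status is unchanged
	fmt.Println(s.State(), canRemove(s))
}
```

In this model, a status with `Unknown` set and `FinishedAt` still zero does transition to `CONTAINER_EXITED` after `handleExit`, which is why the problem only shows up when both fields are set at once.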

Root cause analysis
After containerd was restarted, it failed to load this container:

time="2020-10-22T21:09:08.015373417+08:00" level=error msg="Failed to load container status for "ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2"" error="failed to load task: error creating fifo /run/containerd/io.containerd.grpc.v1.cri/containers/ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2/io/057455983/ff62688307157152925177bad8608458b1513fa50408c8212cb9dd22959093f2-stdout: no such file or directory"

However, containerd only sets the unknown field to true here and leaves finishedAt non-empty:
https://github.com/containerd/containerd/blob/master/pkg/cri/server/restart.go#L312
