Task delete fails after unsuccessful container restore. #7357

@NikolaBo

Description

Whenever a container restore fails after containerd has invoked runc (that is, the runc restore itself fails), the task is left in the created state. In this state neither the task nor the container can be deleted.

Steps to reproduce the issue

  1. pull image: ctr -n demo i pull docker.io/library/redis:latest
  2. create container: ctr -n demo c create docker.io/library/redis:latest redis
  3. start container: ctr -n demo task start redis
  4. container checkpoint: ctr -n demo c checkpoint --rw --image --task redis redis-checkpoint
  5. find checkpoint image digest: ctr -n demo i ls | grep redis-checkpoint
  6. find the checkpoint config digest in the image manifest: for me the image manifest is at /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/{digest of image}, and the checkpoint config is the entry with media type application/vnd.containerd.container.checkpoint.config.v1+proto
  7. the checkpoint config file describes the OCI spec of the checkpointed container; modify it so that runc fails to restore. For instance, I replaced namespace type: pid with type: not, which causes the restore to fail. NOTE: if your edit changes the size of the checkpoint config file, containerd fails to parse it and the restore errors out before runc is ever invoked. The issue only shows up when runc itself fails the restore, so keep the file size unchanged.
  8. container restore: ctr -n demo c restore --rw --live redis-checkpoint redis-checkpoint. This command will fail, and a task will be left with status created.
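Steps 6 and 7 can be scripted. The Python sketch below is illustrative only and is not part of containerd or ctr. It assumes that the checkpoint image's top-level blob is JSON with an OCI-style "manifests" (index) or "layers" (manifest) array of descriptors, that blobs live under the default content store path quoted above, and that the OCI spec inside the protobuf config is embedded as compact JSON, so the pid namespace appears as the bytes "type":"pid". The helper names are hypothetical.

```python
import json
from pathlib import Path

# Media type quoted in the issue for the checkpoint config blob.
CHECKPOINT_CONFIG_MEDIA_TYPE = (
    "application/vnd.containerd.container.checkpoint.config.v1+proto"
)
# Default content store layout quoted in the issue (assumption: default root).
BLOB_DIR = Path("/var/lib/containerd/io.containerd.content.v1.content/blobs/sha256")


def blob_path(digest: str) -> Path:
    """Map "sha256:<hex>" to its file in the local content store."""
    algo, _, hex_digest = digest.partition(":")
    if algo != "sha256" or not hex_digest:
        raise ValueError(f"unexpected digest: {digest!r}")
    return BLOB_DIR / hex_digest


def read_json_blob(digest: str) -> dict:
    """Load a JSON blob (e.g. the checkpoint image's manifest) from the store."""
    return json.loads(blob_path(digest).read_bytes())


def find_checkpoint_config_digest(manifest: dict) -> str:
    """Return the digest of the checkpoint config descriptor.

    Assumption: descriptors carry "mediaType" and "digest" fields and sit in
    either a "manifests" (index) or "layers" (manifest) array; both are searched.
    """
    for key in ("manifests", "layers"):
        for desc in manifest.get(key, []):
            if desc.get("mediaType") == CHECKPOINT_CONFIG_MEDIA_TYPE:
                return desc["digest"]
    raise LookupError("no checkpoint config descriptor found")


def patch_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace exactly one occurrence of old with new, keeping the size fixed.

    Per the issue, a size change makes containerd fail to parse the config
    before runc is reached, so old and new must have equal length.
    """
    if len(old) != len(new):
        raise ValueError("replacement must keep the blob size unchanged")
    if data.count(old) != 1:
        raise ValueError(f"expected exactly one occurrence of {old!r}")
    return data.replace(old, new)


def break_pid_namespace(image_digest: str) -> Path:
    """Apply the issue's pid -> not edit in place (needs root; back up first)."""
    manifest = read_json_blob(image_digest)
    config_path = blob_path(find_checkpoint_config_digest(manifest))
    data = config_path.read_bytes()
    config_path.write_bytes(patch_same_length(data, b'"type":"pid"', b'"type":"not"'))
    return config_path
```

Passing the digest reported by ctr -n demo i ls to break_pid_namespace performs steps 6 and 7 in one call; the equal-length check exists so that the edit never trips the size problem described in the note above.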

Describe the results you received and expected

Expected: To be able to delete the task and container created when a restore fails.
Actual: ctr -n demo t ls shows:

TASK                PID       STATUS    
redis               310281    RUNNING
redis-checkpoint    0         CREATED

The task redis-checkpoint cannot be removed: ctr -n demo t rm redis-checkpoint fails with ERRO[0000] unable to delete redis-checkpoint error="task must be stopped before deletion: created: failed precondition"

The task cannot be killed: ctr -n demo t kill redis-checkpoint fails with ctr: no such container: not found

The container cannot be deleted: ERRO[0000] unable to delete redis-checkpoint error="task must be stopped before deletion: created: failed precondition" ctr: task must be stopped before deletion: created: failed precondition

The issue seems to be that the task is left in the created state after the restore fails, which prevents its deletion, while kill cannot move it to stopped because it reports no such container.
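The stuck state can be illustrated with a toy Python model. It is hypothetical and does not mirror containerd's implementation; it only encodes what the error messages above suggest: delete refuses any task that is not stopped, and kill is forwarded to a runtime that has no container after the failed restore, so neither call can make progress. All class and function names are made up for illustration.

```python
from enum import Enum
from typing import List


class Status(Enum):
    CREATED = "created"
    RUNNING = "running"
    STOPPED = "stopped"


class PreconditionError(Exception):
    pass


class NotFoundError(Exception):
    pass


class ToyTask:
    """Hypothetical model, not containerd code: the task's recorded status is
    tracked separately from whether the runtime actually holds a container."""

    def __init__(self, status: Status, runtime_has_container: bool):
        self.status = status
        self.runtime_has_container = runtime_has_container
        self.deleted = False

    def kill(self) -> None:
        # Kill is forwarded to the runtime; with no container there is
        # nothing to signal, so the status never reaches STOPPED.
        if not self.runtime_has_container:
            raise NotFoundError("no such container: not found")
        self.status = Status.STOPPED

    def delete(self) -> None:
        # Delete only accepts stopped tasks, mirroring the observed error.
        if self.status is not Status.STOPPED:
            raise PreconditionError(
                "task must be stopped before deletion: "
                f"{self.status.value}: failed precondition"
            )
        self.deleted = True


def try_cleanup(task: ToyTask) -> List[str]:
    """Attempt kill, then delete, recording each outcome."""
    log = []
    for step in (task.kill, task.delete):
        try:
            step()
            log.append(f"{step.__name__}: ok")
        except (NotFoundError, PreconditionError) as err:
            log.append(f"{step.__name__}: {err}")
    return log
```

In this model, a task left behind by a failed restore (ToyTask(Status.CREATED, runtime_has_container=False)) can be neither killed nor deleted, whereas a created task whose container does exist is killed and then deleted normally.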

What version of containerd are you using?

containerd github.com/containerd/containerd v1.6.6 10c1295

Any other relevant information

runc version 1.1.2

Show configuration if it is related to CRI plugin.

No response
