Description
After any unsuccessful container restore where containerd has invoked runc and the runc restore has failed, the task is left in a created state. In this state you cannot delete the task or the container.
Steps to reproduce the issue
- pull image: ctr -n demo i pull docker.io/library/redis:latest
- create container: ctr -n demo c create docker.io/library/redis:latest redis
- start container: ctr -n demo task start redis
- container checkpoint: ctr -n demo c checkpoint --rw --image --task redis redis-checkpoint
- find checkpoint image digest: ctr -n demo i ls | grep redis-checkpoint
- find checkpoint config digest in image manifest: for me the image manifest is at /var/lib/containerd/io.containerd.content.v1.content/blobs/sha256/{digest of image}, and the checkpoint config has type application/vnd.containerd.container.checkpoint.config.v1+proto
- the checkpoint config file describes the OCI spec of the checkpointed container; modify it so that runc fails to restore. For instance, I replaced namespace type: pid with type: not, which will cause restore to fail. NOTE: if your edit changes the size of the checkpoint config file, containerd will fail to parse it, so you will not make it to runc failing to restore, which reveals the issue at hand.
- container restore: ctr -n demo c restore --rw --live redis-checkpoint redis-checkpoint. This command will fail, and a task will be left with status created.
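The requirement that the edit must not change the size of the checkpoint config file can be checked mechanically. The following is a minimal sketch, not part of the original report: `same_size_replace` is a hypothetical helper, and it assumes the namespace entry appears in the blob as the compact JSON text `"type":"pid"`, which may not match your file.

```shell
#!/usr/bin/env bash
# same_size_replace FILE OLD NEW
# Hypothetical helper: replaces OLD with NEW in FILE only if the file size
# stays the same. OLD is used as a sed pattern, so keep it free of regex
# metacharacters and slashes.
same_size_replace() {
  local file="$1" old="$2" new="$3"
  # Refuse replacements whose lengths differ (ASCII strings assumed).
  if [ "${#old}" -ne "${#new}" ]; then
    echo "refusing: '$old' and '$new' differ in length" >&2
    return 1
  fi
  local before after tmp
  before=$(wc -c < "$file")
  tmp=$(mktemp)
  # LC_ALL=C makes sed treat the input as raw bytes; only the first match
  # on each newline-delimited chunk is replaced.
  LC_ALL=C sed "s/$old/$new/" "$file" > "$tmp"
  after=$(wc -c < "$tmp")
  # Double-check the result before touching the original file.
  if [ "$before" -ne "$after" ]; then
    rm -f "$tmp"
    echo "refusing: file size changed from $before to $after bytes" >&2
    return 1
  fi
  # Overwrite in place so ownership and permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

A possible invocation, where CONFIG_BLOB is a placeholder for the path of the checkpoint config blob found in the previous step: same_size_replace "$CONFIG_BLOB" '"type":"pid"' '"type":"not"'. Back up the blob first, since the edit is made in place.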
Describe the results you received and expected
Expected: To be able to delete the task and container created when a restore fails.
Actual: ctr -n demo t ls lists
TASK PID STATUS
redis 310281 RUNNING
redis-checkpoint 0 CREATED
The task redis-checkpoint cannot be removed: ctr -n demo t rm redis-checkpoint gets ERRO[0000] unable to delete redis-checkpoint error="task must be stopped before deletion: created: failed precondition"
The task cannot be killed: ctr -n demo t kill redis-checkpoint gets ctr: no such container: not found
The container cannot be deleted: ERRO[0000] unable to delete redis-checkpoint error="task must be stopped before deletion: created: failed precondition" ctr: task must be stopped before deletion: created: failed precondition
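To spot tasks stuck in this state, the task listing can be filtered for entries with PID 0 and status CREATED. This is a small sketch, not part of the original report; `stuck_tasks` is a hypothetical helper name, and it assumes the three-column TASK / PID / STATUS layout shown in the listing from this report.

```shell
#!/usr/bin/env bash
# stuck_tasks: reads `ctr t ls` output on stdin and prints the names of
# tasks that have PID 0 and status CREATED (hypothetical helper).
stuck_tasks() {
  # NR > 1 skips the "TASK PID STATUS" header line.
  awk 'NR > 1 && $2 == "0" && $3 == "CREATED" { print $1 }'
}

# Usage (requires a running containerd):
#   ctr -n demo t ls | stuck_tasks
```

Fed the listing from this report, it prints redis-checkpoint. It only identifies the affected tasks; it does not remove them.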
The issue seems to be that the task is left in a created state after the restore fails, preventing its deletion.
What version of containerd are you using?
containerd github.com/containerd/containerd v1.6.6 10c1295
Any other relevant information
runc version 1.1.2
Show configuration if it is related to CRI plugin.
No response