allow client to remove created tasks with PID 0 #7787

Merged
estesp merged 1 commit into containerd:main from ginglis13:restore-fail
Dec 9, 2022

@ginglis13
Contributor

Fixes #7357

If a container is restored from a checkpoint that has a configuration error, the task for the restored container is created, but fails to start and is left in the state CREATED with a PID of 0.

In the original issue, the induced error comes from the call to task.Start:

ctr: OCI runtime restore failed: namespace {"not" ""} does not exist: unknown

The main task created from a restored container always has PID 0 until the task is explicitly started:

task, err := tasks.NewTask(ctx, client, ctr, "", con, false, "", []cio.Opt{}, topts...)
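The lifecycle described above can be illustrated with a minimal, self-contained sketch. It does not use containerd's types: `restoredTask`, `newRestoredTask`, `start`, and `errRestore` are hypothetical names introduced only for illustration, and the model assumes that a failed start leaves the task exactly as it was created.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical stand-in for a task's lifecycle state.
type Status string

const (
	Created Status = "created"
	Running Status = "running"
)

// errRestore is a hypothetical error standing in for a runtime restore failure.
var errRestore = errors.New("namespace does not exist")

// restoredTask is a hypothetical model of a task created from a checkpoint.
type restoredTask struct {
	status Status
	pid    uint32
}

// newRestoredTask models task creation from a checkpoint: no process exists
// yet, so the PID is 0.
func newRestoredTask() *restoredTask {
	return &restoredTask{status: Created, pid: 0}
}

// start models task.Start: on success a PID is assigned; on failure the task
// is left untouched (assumption: CREATED with PID 0).
func (t *restoredTask) start(restore func() (uint32, error)) error {
	pid, err := restore()
	if err != nil {
		return fmt.Errorf("OCI runtime restore failed: %w", err)
	}
	t.pid = pid
	t.status = Running
	return nil
}

func main() {
	task := newRestoredTask()
	err := task.start(func() (uint32, error) { return 0, errRestore })
	fmt.Println(err, task.status, task.pid)
}
```

In this model a failed restore returns an error while the task stays in the created state with PID 0, which is the stuck state this PR addresses.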

I had originally intended to make a change on the go-runc side to prevent PID 0 tasks from even being created, but given that restored tasks are assigned a PID at task.Start, that would have been a breaking change.

If the task fails to start in this case, it is left in the CREATED state with PID 0. Before this change, the only way to remove such a task was to find the PID of the shim monitoring it and kill that process. Now, ctr t rm <task> works on tasks that end up in the CREATED state with PID 0.
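The new behavior can be sketched as a deletion precondition. This is only an illustration under stated assumptions, not the code merged in this PR: `TaskStatus`, `checkDeletable`, and `errMustBeStopped` are hypothetical names, and the sketch assumes that the client otherwise refuses to delete any task that is not stopped.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskStatus is a hypothetical stand-in for a task's lifecycle state.
type TaskStatus string

const (
	StatusCreated TaskStatus = "created"
	StatusRunning TaskStatus = "running"
	StatusStopped TaskStatus = "stopped"
)

// errMustBeStopped is a hypothetical error for tasks that may not be deleted yet.
var errMustBeStopped = errors.New("task must be stopped before deletion")

// checkDeletable reports whether a task in the given state may be removed.
// Assumption: stopped tasks were always deletable; the case added here is a
// created task whose PID is still 0, meaning no process was ever started.
func checkDeletable(status TaskStatus, pid uint32) error {
	switch {
	case status == StatusStopped:
		return nil
	case status == StatusCreated && pid == 0:
		return nil
	default:
		return fmt.Errorf("%w: %s", errMustBeStopped, status)
	}
}

func main() {
	fmt.Println(checkDeletable(StatusCreated, 0))
	fmt.Println(checkDeletable(StatusCreated, 1234))
	fmt.Println(checkDeletable(StatusRunning, 1234))
}
```

Keying the exception on PID 0 rather than on the CREATED state alone keeps the check narrow: a created task that already has a process attached is still rejected in this sketch.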

Signed-off-by: Gavin Inglis [email protected]

Fixes containerd#7357

If a container is restored from a checkpoint that has a configuration
error, the task for the restored container is created, but fails to
start and is left in the state CREATED with a PID of 0. Before this
change, the only way to remove this task was to find the PID of the shim
monitoring the task and kill that process. Now, ctr t rm <task> will
work on tasks that result in the CREATED state with PID 0.

Signed-off-by: Gavin Inglis <[email protected]>
@k8s-ci-robot

Hi @ginglis13. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@AkihiroSuda
Member

/ok-to-test

Member

@estesp estesp left a comment

LGTM

@estesp estesp merged commit e5751d4 into containerd:main on Dec 9, 2022
@dmcgowan dmcgowan added the cherry-picked/1.6.x label (PR commits are cherry-picked into release/1.6 branch) and removed the cherry-pick/1.6.x label on Dec 14, 2022
Labels

cherry-picked/1.6.x (PR commits are cherry-picked into release/1.6 branch), ok-to-test

Development

Successfully merging this pull request may close these issues.

Task delete fails after unsuccesful container restore.

5 participants