Task is left in weird state after containerd is restarted during task deletion. #2150

@Random-Liu

Description

I hit the following case in the cri-containerd restart test:

  1. The test stops a container.
  2. cri-containerd handles the TaskExit event and calls task.Delete on the task.
  3. task.Delete closes the task io, which deletes the fifo directory.
  4. containerd/cri-containerd gets restarted before the task is actually deleted.
  5. After cri-containerd/containerd is back up, we can never load the task and attach its FIFOs again. Because the fifo directory has been removed, even if we call OpenFifo in ioAttach we get an error that the directory is not found.

There are 3 options:

  • Option 1: Do not try to attach io to a stopped task. The problem is that we usually attach io when loading the task, but with the current client we can only learn the task status after loading it. That means that to reliably load a task and attach io, we have to 1) load the task without IO first -> 2) check its status -> 3) load the task again with IO. This is pretty troublesome.
  • Option 2: Ensure the fifo directory exists in loadTask when ioAttach is specified.
  • Option 3: Fix task load. #2151 (comment)
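
To show why option 1 is awkward, here is roughly what the three-step load would look like. `Status`, `Task`, `Loader`, and `loadTask` below are hypothetical stand-ins written for this sketch, not the containerd client API; the fake loader only counts calls.

```go
package main

import "fmt"

// Status, Task and Loader are hypothetical stand-ins for illustration only.
type Status string

const (
	Running Status = "running"
	Stopped Status = "stopped"
)

type Task interface {
	Status() (Status, error)
}

type Loader interface {
	// Load loads the task, attaching io only when attachIO is true.
	Load(attachIO bool) (Task, error)
}

// loadTask follows option 1: load without io, check the status, and load
// again with io only if the task is not stopped.
func loadTask(l Loader) (Task, error) {
	task, err := l.Load(false)
	if err != nil {
		return nil, fmt.Errorf("load without io: %w", err)
	}
	status, err := task.Status()
	if err != nil {
		return nil, fmt.Errorf("check status: %w", err)
	}
	if status == Stopped {
		return task, nil // never attach io to a stopped task
	}
	return l.Load(true)
}

// fakeTask and fakeLoader exist only to exercise loadTask.
type fakeTask struct{ status Status }

func (t fakeTask) Status() (Status, error) { return t.status, nil }

type fakeLoader struct {
	status   Status
	loads    int
	attached bool
}

func (f *fakeLoader) Load(attachIO bool) (Task, error) {
	f.loads++
	f.attached = attachIO
	return fakeTask{f.status}, nil
}

func main() {
	for _, s := range []Status{Stopped, Running} {
		f := &fakeLoader{status: s}
		if _, err := loadTask(f); err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: loads=%d attached=%v\n", s, f.loads, f.attached)
	}
}
```

Every non-stopped task ends up being loaded twice, which is the extra work the option description complains about.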

Option 2 seems simpler to me.

/cc @dnephin
