What is the problem you're trying to solve
In the current design, the containerd runtime v2 task.Delete API ignores Shutdown errors, for instance context.Canceled or context.DeadlineExceeded, and then deletes the shim instance from memory. If the shim never receives the Shutdown call, it won't exit by itself and becomes a leaky shim.
Lines 471 to 481 in ba852fa:

```go
// Don't shutdown sandbox as there may be other containers running.
// Let controller decide when to shutdown.
if !sandboxed {
	if err := s.waitShutdown(ctx); err != nil {
		// FIXME(fuweid):
		//
		// If the error is context canceled, should we use context.TODO()
		// to wait for it?
		log.G(ctx).WithField("id", s.ID()).WithError(err).Error("failed to shutdown shim task and the shim might be leaked")
	}
}
```
I used a Shutdown failpoint to reproduce this issue in #8954; please check out the code:

```go
func TestIssue7496_ShouldRetryShutdown(t *testing.T) {
```
I want to fix the leaky shim issue at the API-design level, instead of adding easy-to-break patches. The key, I think, is to make the task.Delete API retriable.
Related issues:
- shim process leaked when the host having high disk i/o usage #7496
- Containerd-Shim leak when creating/delete pod frequently #8931
Describe the solution you'd like
Before we make the task.Delete API retriable, we should rethink the task events and the runc-shim delete binary call.
Currently, both the task.Exit and task.Delete events can be sent by either the shim or the containerd runtime v2 plugin.
The runtime v2 plugin doesn't know whether the shim has already sent them.
So the runtime v2 plugin sends the events to subscribers (like moby/moby) only if the shim.Delete API call is successful.
Lines 149 to 163 in ba852fa:

```go
func cleanupAfterDeadShim(ctx context.Context, id string, rt *runtime.NSMap[ShimInstance], events *exchange.Exchange, binaryCall *binary) {
	ctx, cancel := timeout.WithContext(ctx, cleanupTimeout)
	defer cancel()

	log.G(ctx).WithField("id", id).Warn("cleaning up after shim disconnected")
	response, err := binaryCall.Delete(ctx)
	if err != nil {
		log.G(ctx).WithError(err).WithField("id", id).Warn("failed to clean up after shim disconnected")
	}

	if _, err := rt.Get(ctx, id); err != nil {
		// Task was never started or was already successfully deleted
		// No need to publish events
		return
	}
```
And the runc-shim doesn't persist the container's exit code in the bundle dir. The runc-shim always uses 137 as the exit code, even if the container's real exit code is zero. For example, #4769.
I think we should define the shim.Delete binary call in more detail in https://github.com/containerd/containerd/blob/main/runtime/v2/README.md#delete: the shim implementation should ensure that the exit code returned by the shim.Delete binary call is aligned with the one returned by the shim.Delete API call.
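For illustration, here is a minimal sketch of how a shim could persist the exit status in the bundle dir so that a later shim.Delete binary call can report the real exit code instead of a hard-coded 137. The file name ("exit-status"), the JSON layout, and both helpers are assumptions for this proposal, not existing runc-shim behavior.

```go
// Sketch only: hypothetical helpers for persisting the exit status in the
// bundle dir. The "exit-status" file name and JSON layout are assumptions.
package shimexit

import (
	"encoding/json"
	"os"
	"path/filepath"
	"time"
)

type persistedExit struct {
	Pid        uint32    `json:"pid"`
	ExitStatus uint32    `json:"exit_status"`
	ExitedAt   time.Time `json:"exited_at"`
}

// writeExitStatus would be called by the shim when the init process exits.
func writeExitStatus(bundle string, e persistedExit) error {
	data, err := json.Marshal(e)
	if err != nil {
		return err
	}
	tmp := filepath.Join(bundle, ".exit-status.tmp")
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	// Rename atomically so a concurrent delete binary call never reads a
	// partially written file.
	return os.Rename(tmp, filepath.Join(bundle, "exit-status"))
}

// readExitStatus would be called from the delete binary path; if the file
// is missing, the caller can fall back to today's behavior (137).
func readExitStatus(bundle string) (persistedExit, error) {
	var e persistedExit
	data, err := os.ReadFile(filepath.Join(bundle, "exit-status"))
	if err != nil {
		return e, err
	}
	return e, json.Unmarshal(data, &e)
}
```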
There is one question we need to confirm:
- Is it allowed to send the same event multiple times?
If so, and the shim.Delete binary call can return the real exit code, the retriable task.Delete would look like:
```go
func (s *shimTask) delete(ctx context.Context, sandboxed bool, removeTask func(ctx context.Context, id string)) (*runtime.Exit, error) {
	response, shimErr := s.task.Delete(ctx, &task.DeleteRequest{ID: s.ID()})
	if shimErr != nil {
		log.G(ctx).WithField("id", s.ID()).WithError(shimErr).Debug("failed to delete task")
		if !errors.Is(shimErr, ttrpc.ErrClosed) {
			shimErr = errdefs.FromGRPC(shimErr)
			if !errdefs.IsNotFound(shimErr) {
				return nil, shimErr
			}
		}
	}

	if err := s.waitShutdown(ctx); err != nil {
		return nil, err
	}
	if err := s.ShimInstance.Delete(ctx); err != nil {
		return nil, err
	}

	// send the exit and delete events
	removeTask(ctx, s.ID())

	return &runtime.Exit{
		Status:    response.ExitStatus,
		Timestamp: protobuf.FromTimestamp(response.ExitedAt),
		Pid:       response.Pid,
	}, nil
}
```

If it's not accepted to send the same event multiple times, I think we should implement the event system in the runtime v2 plugin instead of the shim server.
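Separately from the event question, here is a minimal sketch of the caller-side retry that the retriable delete above would enable. The deleteTask function type, the per-attempt timeout, and the retry predicate are illustrative assumptions, not existing containerd code.

```go
// Sketch only: a caller-side retry loop, assuming the delete path above is
// made idempotent. Names and timeouts here are illustrative.
package shimretry

import (
	"context"
	"errors"
	"time"
)

// deleteTask stands in for whatever ends up invoking shimTask.delete.
type deleteTask func(ctx context.Context) error

// retryDelete keeps calling del with a fresh per-attempt timeout until it
// succeeds, hits a non-transient error, or the outer context is done. Only
// cancellation/deadline errors (e.g. from a slow Shutdown) are retried.
func retryDelete(ctx context.Context, del deleteTask) error {
	for {
		attemptCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
		err := del(attemptCtx)
		cancel()
		if err == nil {
			return nil
		}
		if !errors.Is(err, context.DeadlineExceeded) && !errors.Is(err, context.Canceled) {
			return err
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Second):
		}
	}
}
```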
Additional context
I think moby/moby should not rely on events to update the task status. moby should have a relist function to ensure the status doesn't become out of date.
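As an illustration of the relist idea, here is a minimal sketch using the containerd Go client. The statusStore interface and the reconciliation policy are assumptions standing in for moby's internal state, not moby code.

```go
// Sketch only: a periodic relist pass that reconciles a client-side cache
// with containerd's view of task status. statusStore is hypothetical.
package relist

import (
	"context"

	containerd "github.com/containerd/containerd"
)

type statusStore interface {
	UpdateStatus(id string, status containerd.Status)
}

// relist expects ctx to already carry a containerd namespace.
func relist(ctx context.Context, client *containerd.Client, store statusStore) error {
	containers, err := client.Containers(ctx)
	if err != nil {
		return err
	}
	for _, c := range containers {
		task, err := c.Task(ctx, nil)
		if err != nil {
			// No task (never started or already deleted): nothing to reconcile.
			continue
		}
		status, err := task.Status(ctx)
		if err != nil {
			continue
		}
		store.UpdateStatus(c.ID(), status)
	}
	return nil
}
```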
And IMO, the shim.Wait API call is better than the task events. Based on task.Wait we don't miss any exits; the CRI plugin works this way, for instance.
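For comparison, a minimal sketch of blocking on task.Wait via the containerd Go client instead of subscribing to exit events. The socket path, namespace, and container ID below are placeholders.

```go
// Sketch only: getting the exit status directly from task.Wait rather than
// relying on a TaskExit event. Socket path, namespace and container ID are
// placeholders.
package main

import (
	"context"
	"fmt"
	"log"

	containerd "github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "default")

	container, err := client.LoadContainer(ctx, "my-container")
	if err != nil {
		log.Fatal(err)
	}
	task, err := container.Task(ctx, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Register the wait before doing anything else so the exit can't be missed.
	statusC, err := task.Wait(ctx)
	if err != nil {
		log.Fatal(err)
	}

	status := <-statusC
	code, exitedAt, err := status.Result()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("task exited with %d at %s\n", code, exitedAt)
}
```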