Shim cleanup events fail due to context.Cancelled #4400

@cpuguy83

Description

ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
err := publisher.Publish(ctx, runc.GetTopic(e), e)
cancel()

The shim tries to send an event.
If the send fails, the event is added to a queue and it is retried a few times.

i := &item{
	ev: &v1.Envelope{
		Timestamp: time.Now(),
		Namespace: ns,
		Topic:     topic,
		Event:     any,
	},
	ctx: ctx,
}
if err := l.forwardRequest(i.ctx, &v1.ForwardRequest{Envelope: i.ev}); err != nil {
	l.queue(i)
	return err
}

The problem is that the context stored on the queued item is reused for every retry, but it has already been canceled by the time the send is attempted again: the original caller has returned and called cancel().

Steps to reproduce the issue:

  1. Start a container
  2. Kill containerd
  3. Kill the container process
  4. Start containerd again

You will see logs from the shim saying that it can't send the event because the context was canceled.

Describe the results you received:

time="2020-07-17T23:55:43.878255535Z" level=error msg="forward event" error="context canceled"
time="2020-07-17T23:55:49.879116394Z" level=error msg="evicting /tasks/exit from queue because of retry count"

Describe the results you expected:

The event should be sent successfully once containerd is running again.

Output of containerd --version:

containerd github.com/containerd/containerd v1.4.0-beta.2-17-g4feb8c46.m 4feb8c462393ce6834dda9e3464c4fee8ee73232.m
