Skip to content

[release/1.3] shim: move event context timeout to publisher#4417

Merged
mxpv merged 1 commit intocontainerd:release/1.3from
cpuguy83:1.3-shim2_event_cancelled
Jul 29, 2020
Merged

[release/1.3] shim: move event context timeout to publisher#4417
mxpv merged 1 commit intocontainerd:release/1.3from
cpuguy83:1.3-shim2_event_cancelled

Conversation

@cpuguy83
Copy link
Copy Markdown
Member

Backports #4412 to the 1.3 branch


Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Cancelled because the the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.

The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.

Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
attempt*time.Second.

In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches maxRequeue
(which is currently 5).

This change moves the timeout to the publisher so each send attempt gets
its own timeout.

Signed-off-by: Brian Goff [email protected]
(cherry picked from commit d7b9cb0)
Signed-off-by: Brian Goff [email protected]

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 23, 2020

Build succeeded.

@estesp estesp changed the title [1.3] shim: move event context timeout to publsher [release/1.3] shim: move event context timeout to publisher Jul 28, 2020
@estesp
Copy link
Copy Markdown
Member

estesp commented Jul 28, 2020

Can you rebase on master now that the CI fixes are merged?

@cpuguy83 cpuguy83 force-pushed the 1.3-shim2_event_cancelled branch from 6eea0eb to fb54117 Compare July 28, 2020 18:46
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 28, 2020

Build succeeded.

Before this change, if an event fails to send on the first attempt,
subsequent attempts will fail with context.Cancelled because the the
caller of publish passes a cancellable timeout, which the publisher uses
to send the event.

The publisher returns immediately if the send fails, but adds the event
to an async queue to try again.
Meanwhile the caller will return cancelling the context.

Additionally, subsequent attempts may fail to send because the timeout
was expected to be for a single request but the queue sleeps for
`attempt*time.Second`.

In the shim service, the timeout was set to 5s, which means the send
will fail with context.DeadlineExceeded before it reaches `maxRequeue`
(which is currently 5).

This change moves the timeout to the publisher so each send attempt gets
its own timeout.

Signed-off-by: Brian Goff <[email protected]>
(cherry picked from commit d7b9cb0)
Signed-off-by: Brian Goff <[email protected]>
@cpuguy83 cpuguy83 force-pushed the 1.3-shim2_event_cancelled branch from fb54117 to ad19320 Compare July 28, 2020 19:29
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 28, 2020

Build succeeded.

@crosbymichael
Copy link
Copy Markdown
Member

LGTM

@mxpv mxpv merged commit da85548 into containerd:release/1.3 Jul 29, 2020
Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cpuguy83 cpuguy83 deleted the 1.3-shim2_event_cancelled branch July 29, 2020 17:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants