Add timeout option for docker's exec operation #58925

louyihua · 2018-01-28T08:52:40Z

What this PR does / why we need it:
Currently, Docker's exec operation never times out. This patch adds a new ExecOptions parameter, which carries a timeout value, to libdocker.StartExec. When a positive timeout is passed and the exec operation times out, StartExec returns the libdocker.OperationTimeout error.

Release note:

Add a timeout option to libdocker.StartExec

louyihua · 2018-01-28T08:55:00Z

@yujuhong
This is a more generic patch than #58510.

yujuhong · 2018-02-06T23:54:17Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

Would it make sense to set the timeout in the context?

This patch allows the code that calls StartExec to be notified on timeout. Otherwise, each caller should have its own implementation for detecting timeouts.

dims · 2018-03-07T01:15:04Z

/ok-to-test

louyihua · 2018-04-11T07:19:55Z

@yujuhong
This PR has been reworked. It now matches @ncdc's option 2.

ncdc · 2018-04-11T13:52:20Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

I think this wording might be a bit clearer: "Unable to kill process %d of exec session %s in container %s for timeout termination!", inspect.Pid, startExec, inspect.ContainerID

ncdc · 2018-04-11T14:07:04Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

It's probably a good idea to store the time.Timer returned by time.AfterFunc so we can defer timeoutTimer.Stop() - this will ensure we don't have timers with long timeouts sticking around for a while after the exec has returned successfully. WDYT?

Yes, stopping the timer on function exit is really a good idea.

ncdc · 2018-04-12T13:58:51Z

LGTM. Will defer to sig-node for final lgtm & approval
/assign @yujuhong @Random-Liu @derekwaynecarr

dims · 2018-04-30T12:39:44Z

/uncc @dims

louyihua · 2018-05-22T14:06:13Z

Another month has gone by. Does anyone have any comments on this?
@Random-Liu @derekwaynecarr @yujuhong

mtaufen · 2018-05-22T21:55:33Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

nit: s/does/did

mtaufen · 2018-05-22T21:59:05Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

Suggest inspectError instead of err2, for a more semantic name. Similar advice below.

mtaufen · 2018-05-22T22:03:00Z

pkg/kubelet/dockershim/libdocker/kube_docker_client.go

suggest something like glog.Errorf("failed to kill exec process after timeout: session: %s, container: %s, error: %v", startExec, inspect.ContainerID, inspectError)

Regarding inspect.ContainerID, can you trust that inspect holds a value when you get an error here?

similar suggestions below

Currently, Docker's exec operation never times out. This patch adds a timeout value to the parameters of `libdocker.StartExec`. If the exec operation times out, the `StartExec` function returns the `libdocker.operationTimeout` error.

louyihua · 2018-05-23T00:58:10Z

@mtaufen Thanks for your suggestions.

k8s-ci-robot · 2018-05-23T00:58:33Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: louyihua
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: derekwaynecarr

Assign the PR to them by writing /assign @derekwaynecarr in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

pkg/kubelet/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2018-08-21T07:21:37Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-06-11T21:37:27Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

louyihua · 2019-06-12T00:49:44Z

/remove-lifecycle stale

fejta-bot · 2019-09-10T01:07:09Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2019-10-10T01:53:12Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

pigletfly · 2019-10-13T08:50:15Z

any new progress on this? @louyihua

wangrzneu · 2019-12-24T09:27:38Z

@ncdc Can you help to review this PR?

ncdc · 2020-01-02T16:01:29Z

Sorry, I am not involved in this area of the code any more - I would recommend getting someone from SIG Node to review.

/uncc

fejta-bot · 2020-04-01T17:02:06Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

wangrzneu · 2020-04-08T15:56:30Z

/remove-lifecycle stale

fejta-bot · 2020-07-07T16:54:54Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2020-08-06T17:35:53Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

SergeyKanzhelev · 2020-09-01T20:26:31Z

This PR solves the same problem as the PR that is currently being actively discussed #94115. The agreement from SIG node meeting today was that the code change is only a small part of the fix and we need to actively look at side effects this change may bring to the production payloads. Please join the discussion at the PR: #94115.

/close

k8s-ci-robot · 2020-09-01T20:26:45Z

@SergeyKanzhelev: Closed this PR.


The timeout wrapper in health checks was added in helm/charts#11355 to work around Docker/containerd not respecting timeouts in probes (cf. kubernetes/kubernetes#58925). The upstream issue has been fixed since Kubernetes 1.20 (kubernetes/kubernetes#94115), and this wrapper degrades behavior (i.e., any failure in the wrapped command is reported only as "The monitored command dumped core", without details of the specific failure), so the original behavior should be restored.

k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 28, 2018

k8s-ci-robot requested review from dims and mtaufen January 28, 2018 08:52

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jan 28, 2018

louyihua mentioned this pull request Jan 30, 2018

Allow exec prober to timeout under Docker #58510

Closed

yujuhong reviewed Feb 6, 2018

k8s-ci-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 7, 2018

louyihua force-pushed the exec-timeout branch 2 times, most recently from 049a5e4 to 750ea4b Compare April 11, 2018 06:27

ncdc reviewed Apr 11, 2018

louyihua force-pushed the exec-timeout branch from 750ea4b to 60164a7 Compare April 12, 2018 00:52

k8s-ci-robot assigned derekwaynecarr, Random-Liu and yujuhong Apr 12, 2018

k8s-ci-robot removed the request for review from dims April 30, 2018 12:39

mtaufen reviewed May 22, 2018

louyihua force-pushed the exec-timeout branch from 60164a7 to c64785e Compare May 23, 2018 00:57

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 21, 2018

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2019

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 12, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 1, 2020

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 8, 2020

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 7, 2020

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 6, 2020

SergeyKanzhelev mentioned this pull request Aug 20, 2020

kubelet: respect exec probe timeouts #94115

Merged

k8s-ci-robot closed this Sep 1, 2020

giskou mentioned this pull request Mar 3, 2021

fix(argo-cd): Upgrade redis-ha to v4.10.4 argoproj/argo-helm#608

Merged

5 tasks

stuartpb mentioned this pull request Sep 16, 2021

[bitnami/redis] Use timeoutSeconds to timeout health checks bitnami/charts#7520

Closed

2 tasks
