-
Notifications
You must be signed in to change notification settings - Fork 42k
kubelet: dockershim ExecSync should return context.DeadlineExeceeded on timeout #96495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kubelet: dockershim ExecSync should return context.DeadlineExeceeded on timeout #96495
Conversation
5b79be9 to
5a1b531
Compare
|
/assign @SergeyKanzhelev |
3517e5f to
9a03598
Compare
hasheddan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andrewsykim looks like this may break some of the other probing container tests
|
@hasheddan that's interesting, I thought none of the presubmits run docker (see #96463 (comment)), but this PR only touches dockreshim. |
|
@andrewsykim so it looks like the presubmit does use dockershim, but our periodic does not. From the failing test logs: I know @oomichi tested out that the test did in fact fail without your PR but it wasn't
|
|
@andrewsykim ah found it, we are skipping the test that is failing in the periodic |
ed47ee3 to
ab34ee5
Compare
ab34ee5 to
56fd817
Compare
|
/milestone v1.20 |
pkg/kubelet/dockershim/exec.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing bad will happen, but maybe need to call cancel()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cancel is called with defer on line 112
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, totally. My comment is that we can cancel timeout in case the exec has already finished =)
Signed-off-by: Andrew Sy Kim <[email protected]>
bbdcd20 to
3391546
Compare
|
/test pull-kubernetes-e2e-azure-disk-windows |
derekwaynecarr
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just the one comment about not adding NodeConformance tag.
test/e2e/common/container_probe.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i agree we should defer labeling this NodeConformance
Signed-off-by: Andrew Sy Kim <[email protected]>
3391546 to
f5a82f7
Compare
|
@derekwaynecarr removed exec probe timeout tests from |
|
/lgtm |
|
/kind failing-test |
hasheddan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
@derekwaynecarr could you take another look here and approve if looks good to you?
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andrewsykim, derekwaynecarr, hasheddan The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Andrew Sy Kim [email protected]
What type of PR is this?
/kind bug
What this PR does / why we need it:
In #94115 we fixed a bug where kubelet did not respect exec probe timeouts. That PR also re-enabled some tests to ensure we don't regress. So far all the tests using containerd have been passing, but some tests using dockershim started to fail (see #96463). This wasn't caught during presubmit since jobs that run dockershim only run e2es marked with
[NodeConformance].The bug is in the CRI implementation of dockershim where it should be returning
context.Deadlinein the ExecSync response instead ofexec.TimedoutError.TimedoutErroris actually the error as expected by the prober, which is what should be returned by the gRPC client, not the server. The server is expected to returncontext.DeadlineExeceeded, which results in the cri client returningTimedoutErrorback to the prober.Which issue(s) this PR fixes:
Fixes #96463
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: