bugfix(port-forward): Correctly handle known errors #117493
Conversation
Skipping CI for Draft Pull Request.
@sxllwx is this a user-visible issue? If it is, please provide a release note.
/triage accepted
- structured error types added
- use Close instead of Reset to close the data and error conns
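As a rough illustration of the "structured error types" idea, here is a minimal sketch (hypothetical names, not necessarily the types this PR adds) of an error that wraps the underlying cause so callers can still match it:

```go
package portforward

import "fmt"

// StreamError is a hypothetical structured error for a failed port-forward
// stream; it keeps the underlying error so callers can classify it.
type StreamError struct {
	Port int32
	Err  error
}

func (e *StreamError) Error() string {
	return fmt.Sprintf("forwarding to port %d failed: %v", e.Port, e.Err)
}

// Unwrap exposes the wrapped error, so errors.Is(err, syscall.EPIPE)
// still detects the known broken-pipe case through the wrapper.
func (e *StreamError) Unwrap() error { return e.Err }
```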
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: sxllwx
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test pull-kubernetes-unit
PTAL, thx~ @liggitt
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
Ugh, a stale bot.
/remove-lifecycle stale
ctx := context.Background()
defer p.dataStream.Close()
defer p.errorStream.Close()
defer p.dataStream.Close() //nolint: errcheck
I wonder if we have the same problem as in here and we should Reset() instead of Close()
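For context, a small sketch of the distinction being discussed, assuming the streams implement k8s.io/apimachinery/pkg/util/httpstream.Stream (the helper below is hypothetical, not code from this PR): Close shuts the stream down in an orderly way, while Reset tears it down and discards any unsent data.

```go
package main

import "k8s.io/apimachinery/pkg/util/httpstream"

// finishStream is a hypothetical helper showing the choice under discussion.
func finishStream(s httpstream.Stream, peerGone bool) {
	if peerGone {
		// Abort: discard any buffered, unsent data instead of waiting on a
		// peer that will never read it.
		_ = s.Reset()
		return
	}
	// Orderly shutdown of the stream.
	_ = s.Close()
}

func main() {}
```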
I see, discussed here: #117493 (comment)
A note for everyone who might be looking into this PR and trying it out with the latest kind: kind internally uses containerd, and the version currently used there already includes the fix from containerd/containerd#8418, which means you won't get failures from the e2e tests in this PR. Newer versions of containerd, on the other hand, switched to relying on the k8s-provided streaming implementation; this will be available in the 1.8 and 2.0 versions. So to be able to get the error back, and thus verify the functionality of this PR, one needs to rebuild the base image with a newer containerd and use that when building a node-image from this PR (
Synced offline with @aojea and @soltysh. I still don't think plumbing up specific error details from the kubelet / containerd and skipping alerting the client that there was an error is the correct approach. As I mentioned in #117493 (comment), the loop in the client that currently unconditionally tears down the overall streamConn when an error is seen handling a single portforward request seems like the place we should be making changes.

git diff
diff --git a/staging/src/k8s.io/client-go/tools/portforward/portforward.go b/staging/src/k8s.io/client-go/tools/portforward/portforward.go
index 83ef3e929b3..365b7dd1603 100644
--- a/staging/src/k8s.io/client-go/tools/portforward/portforward.go
+++ b/staging/src/k8s.io/client-go/tools/portforward/portforward.go
@@ -407,12 +407,22 @@ func (pf *PortForwarder) handleConnection(conn net.Conn, port ForwardedPort) {
 	case <-localError:
 	}
+	// This is from https://github.com/kubernetes/kubernetes/pull/126718
+	/*
+		reset dataStream to discard any unsent data, preventing port forwarding from being blocked.
+		we must reset dataStream before waiting on errorChan, otherwise, the blocking data will affect errorStream and cause <-errorChan to block indefinitely.
+	*/
+	_ = dataStream.Reset()
+
 	// always expect something on errorChan (it may be nil)
 	err = <-errorChan
 	if err != nil {
 		runtime.HandleError(err)
-		pf.streamConn.Close()
+		// Don't tear down the whole parent port-forward pf.streamConn when there's an error handling a single request
+		// TODO: *something* has to notice and tear down the whole parent port-forward pf.streamConn when the backend is completely gone... what / where?
 	}
+	// This also forces drain of the errorStream, similar to dataStream above
+	_ = errorStream.Reset()
 }

 // Close stops all listeners of PortForwarder.
@soltysh was going to dig into #117493 (comment) more as well
To keep everyone updated, here are the findings so far: adding the resets as outlined by Jordan in #117493 (comment) only partially addresses the issue. It ensures we receive all the messages, but unfortunately neither our spdy.connection nor the upstream spdy.Connection offers an option to reset the connection without actually closing it, which seems like the cleanest approach at the moment. Without closing the channel we will hang when trying to create a new error channel in the next iteration, and will eventually time out after 30s. (EDIT): The test I primarily focused on is this test from this PR, which continuously sends big chunks of data but, in my tests, gets stuck after sending ~1-2 MB of data. Surprisingly, I can keep the port-forward open and retry sending the data after the failure, and it will again work to send a similar amount of data in subsequent requests.
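To make the "hang when trying to create a new error channel" more concrete, here is a rough sketch of the per-request stream setup that handleConnection performs in client-go's port forwarder (the constants are from k8s.io/api/core/v1; the function name is hypothetical). If the previous request's streams still hold undelivered data on the underlying SPDY connection, the CreateStream call for the next request is where things can block until the stream-creation timeout (the ~30s mentioned above) expires:

```go
package main

import (
	"net/http"
	"strconv"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/httpstream"
)

// openErrorStream mirrors what handleConnection does for each forwarded
// request: open a dedicated error stream identified by port and request ID.
func openErrorStream(conn httpstream.Connection, remotePort int32, requestID int) (httpstream.Stream, error) {
	headers := http.Header{}
	headers.Set(v1.StreamType, v1.StreamTypeError)
	headers.Set(v1.PortHeader, strconv.Itoa(int(remotePort)))
	headers.Set(v1.PortForwardRequestIDHeader, strconv.Itoa(requestID))
	// If earlier streams on conn were closed but never drained or reset,
	// this call is where the next iteration can get stuck.
	return conn.CreateStream(headers)
}

func main() {}
```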
PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closed this PR.
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #74551 #107203
Special notes for your reviewer:
For the detailed fault-localization process, please refer to #74551 (comment)
containerd PR: containerd/containerd#8418
This PR is mainly to correctly handle `EPIPE` errors. So we rely on the error returned by `h.forwarder.PortForward` [undecorated, or the underlying err can be read via `errors.Is(err, syscall.EPIPE)`]; see the sketch at the end of this description.
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
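Finally, as referenced in the special notes above, a hedged sketch (hypothetical names, not kubelet code) of the kind of check described there: inspect the error returned by the forwarder and treat a broken pipe from the client as a known condition rather than a generic failure reported on the error stream.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"syscall"
)

// forwardFunc stands in for something like h.forwarder.PortForward.
type forwardFunc func(port int32, data io.ReadWriteCloser) error

func handlePortForward(forward forwardFunc, port int32, data io.ReadWriteCloser, errorStream io.Writer) {
	err := forward(port, data)
	switch {
	case err == nil:
		// Normal completion; nothing to report.
	case errors.Is(err, syscall.EPIPE):
		// Known error: the client side of the connection went away.
		// Handle it quietly instead of surfacing an opaque failure.
	default:
		// Unknown error: report it back on the error stream as before.
		fmt.Fprintf(errorStream, "error forwarding port %d: %v", port, err)
	}
	_ = data.Close()
}

func main() {}
```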