Signed-off-by: Alex Leong <[email protected]>
alpeb
reviewed
Oct 17, 2023
        // The endStream channel has already been closed so no action is
        // necessary.
    default:
        et.log.Error("endpoint update queue full; ending stream")
Contributor
What information do we have about the stream here? Can we log additional diagnostic information, e.g. the address of the proxy?
Can we also explain what we expect to happen when the stream ends? E.g. is this something the user needs to raise an issue with us about, or do they just need to know that the proxy should reconnect organically?
    et.log.Debugf("Sending destination no endpoints: %+v", u)
    if err := et.stream.Send(u); err != nil {
        et.log.Debugf("Failed to send address update: %s", err)
Contributor
I know this is existing code that we've moved, but why would an error be logged at Debug level?
Member
We expect to see this every time a client proxy restarts/closes a stream.
mateiidavid
reviewed
Oct 18, 2023
mateiidavid
approved these changes
Oct 18, 2023
olix0r
approved these changes
Oct 18, 2023
mateiidavid
pushed a commit
that referenced
this pull request
Oct 26, 2023
When a gRPC client of the `destination.Get` API initiates a request but then doesn't read from that stream, the HTTP/2 stream flow control window fills up and eventually exerts backpressure on the destination controller. This manifests as calls to `Send` on the stream blocking. Since `Send` is called synchronously from the client-go informer callback (by way of the endpoint translator), this blocks the informer callback and prevents all further informer callbacks from firing. As a result, the destination controller stops sending updates to any of its clients.

We add a queue in the endpoint translator so that when it gets an update from the informer callback, that update is queued and we avoid potentially blocking the informer callback. Each endpoint translator spawns a goroutine to process this queue and call `Send`. If there is no capacity in this queue (e.g. because a client has stopped reading and we are experiencing backpressure), then we terminate the stream.

Signed-off-by: Alex Leong <[email protected]>
Merged
alpeb
pushed a commit
that referenced
this pull request
Nov 9, 2023
#11491 changed the EndpointTranslator to use a queue to avoid calling `Send` on a gRPC stream directly from an informer callback goroutine. This change updates the ProfileTranslator in the same way, adding a queue to ensure we do not block the informer thread. Signed-off-by: Alex Leong <[email protected]>
alpeb
added a commit
that referenced
this pull request
Nov 9, 2023
## edge-23.11.2

This edge release contains observability improvements and bug fixes to the Destination controller, and a refinement to the multicluster gateway resolution logic.

* Fixed an issue where the Destination controller could stop processing service profile updates, if a proxy subscribed to those updates stops reading them; this is a followup to the issue [#11491] fixed in edge-23.10.3 ([#11546])
* In the Destination controller, added informer lag histogram metrics to track whenever the objects tracked are falling behind the state in the kube-apiserver ([#11534])
* In the multicluster service mirror, extended the target gateway resolution logic to take into account all the possible IPs a hostname might resolve to, not just the first one (thanks @MrFreezeex!) ([#11499])
* Added probes to the debug container to appease environments requiring probes for all containers ([#11308])
alpeb
added a commit
that referenced
this pull request
Nov 9, 2023
## edge-23.11.2

This edge release contains observability improvements and bug fixes to the Destination controller, and a refinement to the multicluster gateway resolution logic.

* Fixed an issue where the Destination controller could stop processing service profile updates, if a proxy subscribed to those updates stops reading them; this is a followup to the issue [#11491] fixed in [edge-23.10.3] ([#11546])
* In the Destination controller, added informer lag histogram metrics to track whenever the Kubernetes objects watched by the controller are falling behind the state in the kube-apiserver ([#11534])
* In the multicluster service mirror, extended the target gateway resolution logic to take into account all the possible IPs a hostname might resolve to, rather than just the first one (thanks @MrFreezeex!) ([#11499])
* Added probes to the debug container to appease environments requiring probes for all containers ([#11308])

[edge-23.10.3]: https://github.com/linkerd/linkerd2/releases/tag/edge-23.10.3
[#11546]: #11546
[#11534]: #11534
[#11499]: #11499
[#11308]: #11308
adleong
added a commit
that referenced
this pull request
Nov 16, 2023
#11491 changed the EndpointTranslator to use a queue to avoid calling `Send` on a gRPC stream directly from an informer callback goroutine. This change updates the ProfileTranslator in the same way, adding a queue to ensure we do not block the informer thread. Signed-off-by: Alex Leong <[email protected]>
Merged
adleong
added a commit
that referenced
this pull request
Nov 16, 2023
This stable release improves observability for the control plane by adding additional logging to the destination controller and by adding histograms which can detect Kubernetes informer lag. It also adds the ability to configure protocol detection.

* Improved logging in the destination controller by adding the client pod's name to the logging context. This will improve visibility into the messages sent and received by the control plane from a specific proxy ([#11532])
* helm: Introduce configurable values for protocol detection ([#11536])
* Fixed an issue where the Destination controller could stop processing service profile updates, if a proxy subscribed to those updates stops reading them; this is a followup to the issue [#11491] fixed in [stable-2.14.2] ([#11546])
* In the Destination controller, added informer lag histogram metrics to track whenever the Kubernetes objects watched by the controller are falling behind the state in the kube-apiserver ([#11534])
* proxy: Fix grpc_status metric labels for inbound traffic

[stable-2.14.2]: https://github.com/linkerd/linkerd2/releases/tag/stable-2.14.2
[#11532]: #11532
[#11536]: #11536
[#11546]: #11546
[#11534]: #11534

---------

Signed-off-by: Matei David <[email protected]>
Signed-off-by: Alex Leong <[email protected]>
Co-authored-by: Matei David <[email protected]>
When a gRPC client of the `destination.Get` API initiates a request but then doesn't read from that stream, the HTTP/2 stream flow control window fills up and eventually exerts backpressure on the destination controller. This manifests as calls to `Send` on the stream blocking. Since `Send` is called synchronously from the client-go informer callback (by way of the endpoint translator), this blocks the informer callback and prevents all further informer callbacks from firing. As a result, the destination controller stops sending updates to any of its clients.

We add a queue in the endpoint translator so that when it gets an update from the informer callback, that update is queued and we avoid potentially blocking the informer callback. Each endpoint translator spawns a goroutine to process this queue and call `Send`. If there is no capacity in this queue (e.g. because a client has stopped reading and we are experiencing backpressure), then we terminate the stream.
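
For readers who want to see the shape of this pattern in isolation, here is a minimal sketch. It is not the code from this PR: `update`, `sender`, `queuedTranslator`, `stalledStream`, and the use of `sync.Once` to close the end-of-stream channel are all hypothetical names and choices made for illustration. It only assumes what the description states: a bounded queue, one goroutine per translator that calls `Send`, and stream termination when the queue is full.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical stand-in for the message sent to a proxy.
type update struct{ addr string }

// sender is the one method of a server stream that this sketch needs.
type sender interface {
	Send(u update) error
}

// queuedTranslator is a simplified, hypothetical endpoint translator.
type queuedTranslator struct {
	stream    sender
	updates   chan update   // bounded queue filled by informer callbacks
	endStream chan struct{} // closed once when the queue overflows
	stop      chan struct{} // closed to stop the worker goroutine
	overflow  sync.Once     // assumption: guards against closing endStream twice
}

func newQueuedTranslator(s sender, capacity int) *queuedTranslator {
	return &queuedTranslator{
		stream:    s,
		updates:   make(chan update, capacity),
		endStream: make(chan struct{}),
		stop:      make(chan struct{}),
	}
}

// start spawns the goroutine that drains the queue. Only this goroutine calls
// Send, so a stalled client blocks it and nothing else.
func (t *queuedTranslator) start() {
	go func() {
		for {
			select {
			case u := <-t.updates:
				if err := t.stream.Send(u); err != nil {
					fmt.Println("failed to send update:", err)
				}
			case <-t.stop:
				return
			}
		}
	}()
}

// enqueue is what an informer callback would call. It never blocks: if the
// queue is full, it signals that the stream should end and drops the update.
func (t *queuedTranslator) enqueue(u update) bool {
	select {
	case t.updates <- u:
		return true
	default:
		t.overflow.Do(func() { close(t.endStream) })
		return false
	}
}

// stalledStream simulates a client that has stopped reading: Send blocks
// until release is closed.
type stalledStream struct{ release chan struct{} }

func (s stalledStream) Send(update) error {
	<-s.release
	return nil
}

func main() {
	s := stalledStream{release: make(chan struct{})}
	t := newQueuedTranslator(s, 2)
	t.start()
	// Keep enqueueing until the bounded queue rejects an update.
	for i := 0; ; i++ {
		if !t.enqueue(update{addr: fmt.Sprintf("10.0.0.%d:8080", i)}) {
			break
		}
	}
	<-t.endStream // already closed by the overflow, so this does not block
	fmt.Println("queue overflowed; the caller would now end the stream")
	close(s.release)
	close(t.stop)
}
```

The point of the non-blocking `select` with a `default` branch is that the caller (the informer callback, in the PR's terms) always returns promptly, whether or not the client is reading. The cost is that a slow client loses its stream and, as the review thread above suggests, is expected to reconnect.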