server: Fix connection issues when receiving ECONNRESET#123
server: Fix connection issues when receiving ECONNRESET#123dmcgowan merged 1 commit intocontainerd:mainfrom
Conversation
14878b8 to
8a6a44e
Compare
Shutdown()
9079746 to
edfef48
Compare
server.go
Outdated
| // requests due to a terminal error. | ||
| recvErr = nil // connection is now "closing" | ||
| if err == io.EOF || err == io.ErrUnexpectedEOF { | ||
| opErr, isOpErr := err.(*net.OpError) |
There was a problem hiding this comment.
What's the purpose of this type check? Shouldn't we just be able to use errors.Is without this?
There was a problem hiding this comment.
60d8f33 to
031309f
Compare
| if err != nil { | ||
| logrus.WithError(err).Error("error receiving message") | ||
| } | ||
| logrus.WithError(err).Error("error receiving message") |
There was a problem hiding this comment.
Why remove the nil check though? If this message is needed, then perhaps we need to change sendStatus to return error to send on recvError.
It seems to me though that this is trying to catch the shutdown case which this block is purposefully avoiding when there is active connections. If the return after this line is necessary, then the shutdown logic in this goroutine still doesn't seem correct.
There was a problem hiding this comment.
As code at https://github.com/containerd/ttrpc/blob/v1.1.0/server.go#L363 will do the nil check and only send non-nill error to the recvErr channel, I think it's safe to remove the nil check here.
It seems to me though that this is trying to catch the shutdown case which this block is purposefully avoiding when there is active connections. If the return after this line is necessary, then the shutdown logic in this goroutine still doesn't seem correct.
Yes, the return here is not necessary and should be done by the shutdown logic. I‘ll remove it, thx
It seems like the correct behavior would be to handle the connection since it has already been accepted. In this change, the check is being done before, but the same race is still possible since there is no guarantee in execution order between checking For the |
031309f to
9ccde4c
Compare
@dmcgowan You are right, the race condition still exists. I found the possible actual issue is |
fd47460 to
8a747cf
Compare
The ttrpc server somtimes receives `ECONNRESET` rather than `EOF` when the client is exited. Such as reading from a closed connection. Handle it properly to avoid goroutine and connection leak. Change-Id: If32711cfc1347dd2da27ca846dd13c3f5af351bb Signed-off-by: liyuxuan.darfux <[email protected]>
8a747cf to
a03aa04
Compare
Shutdown()|
Thanks for the fix to this leak that we've been seeing for a while It still seems wrong to me that the default case is to set recvErr = nil (which keeps this select case from ever unblocking again) and continuing the loop after logging the error. Yes this fixes ECONNRESET but what happens the next time you see an esoteric error? You will have the same problem. Why dont you just 'return' all the time, and only use the if to decide whether or not to log the error? If your concern is not draining the other channels, then the goroutine that is writing to those channels should close them in a deferral like it does here: This would guarantee that if the writer (This goroutine) dies, the channels are closed and the select isn't stuck hanging. |
Hi @benbuzbee, as discussed before, we considered other cases should be handled by the shutdown logic. But I'm agree with you, all errors should return in the block as the related connection will no longer receive requests. The shutdown case is only used for normal connection without any error. cc @dmcgowan |
The ttrpc server somtimes receives
ECONNRESETrather thanEOFwhen the client is exited. Such as reading from a closed connection. Handle it properly to avoid goroutine and connection leak.Below is a debug example:
Currently there's a possible race condition when shutting down the
server, which causes new connection to be added after
Shutdown()