-
Notifications
You must be signed in to change notification settings - Fork 82
Description
Hey folks, while tracking down a production error I inadvertently found a rare error in the receive message logic for server connections.
To set some context the receive message logic relies on two goroutines working together.
goroutine A: https://github.com/containerd/ttrpc/blob/main/server.go#L136
goroutine B: https://github.com/containerd/ttrpc/blob/main/server.go#L369-L487
A [built-in Go] error channel is used for communication of errors between goroutines A and B.
https://github.com/containerd/ttrpc/blob/main/server.go#L339
The first thing to note is on message header read error the channel returns the underlying error to goroutine B. Based on the current implementation, these errors can only be from io.ReadFull. io.ReadFull can return io.EOF or io.ErrUnexpectedEOF. Reference: https://pkg.go.dev/io#ReadFull
https://github.com/containerd/ttrpc/blob/main/channel.go#L73-L76
This is fine because goroutine A has error handling to check for these errors and stop execution if they occur.
https://github.com/containerd/ttrpc/blob/main/server.go#L545-L555
The issue begins when considering errors on message read by goroutine B. In this branch, goroutine B has successfully read a message header for a message with some length > 0; however, when reading the message an error has occurred by io.ReadFull either io.EOF or io.ErrUnexpectedEOF.
What I have found is for these errors we wrap the error with more information to clarify the error (opposed to on message header read) occurred on message read.
https://github.com/containerd/ttrpc/blob/main/channel.go#L138
However, goroutine A's error handling logic is not currently setup to check for wrapped errors.
https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554
So the effect is goroutine B is stopped due to a [non-gRPC] receive error, but goroutine A is never stopped resulting in leaked goroutine and server connection.
Stop goroutine B on receive error: https://github.com/containerd/ttrpc/blob/main/server.go#L380-L385
Again (currently) only stop goroutine A on some receive errors: https://github.com/containerd/ttrpc/blob/main/server.go#L550-L554