ensure we cancel child context when reading grpc #4629
Signed-off-by: Avi Deitcher <[email protected]>
Hi @deitch. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Build succeeded.
/ok-to-test

Thanks @AkihiroSuda 😄
cpuguy83 left a comment
At first I was thrown off by:

> - Call `RecvMsg` until a non-nil error is returned. A protobuf-generated client-streaming RPC, for instance, might use the helper function `CloseAndRecv` (note that `CloseSend` does not `Recv`, therefore is not guaranteed to release all resources).

But the loop doesn't necessarily read until an error, since it's based on `len(p) > 0`.
So, LGTM.
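The point about `len(p) > 0` can be illustrated with a small simulation. This is a sketch under stated assumptions: `fakeStream` and `readAt` are hypothetical stand-ins for a gRPC server-streaming client and a `ReadAt`-style loop, not containerd's actual code. Because the loop stops as soon as `p` is full, it never calls `Recv` again to observe the terminating error, so the "call `RecvMsg` until a non-nil error" route to releasing the stream is never taken.

```go
package main

import (
	"fmt"
	"io"
)

// fakeStream is a hypothetical stand-in for a server-streaming gRPC client
// stream: each Recv returns the next chunk, then io.EOF once all are consumed.
type fakeStream struct {
	chunks [][]byte
}

func (s *fakeStream) Recv() ([]byte, error) {
	if len(s.chunks) == 0 {
		return nil, io.EOF
	}
	c := s.chunks[0]
	s.chunks = s.chunks[1:]
	return c, nil
}

// readAt is a hypothetical ReadAt-style loop: it keeps calling Recv only while
// p still has room, so it can return before Recv ever reports an error.
// (For brevity, any part of a chunk that does not fit in p is dropped.)
func readAt(s *fakeStream, p []byte) (int, error) {
	n := 0
	for len(p) > 0 {
		chunk, err := s.Recv()
		if err != nil {
			return n, err
		}
		copied := copy(p, chunk)
		p = p[copied:]
		n += copied
	}
	return n, nil
}

func main() {
	s := &fakeStream{chunks: [][]byte{
		[]byte("aaaa"), []byte("bbbb"), []byte("cccc"), []byte("dddd"),
	}}
	p := make([]byte, 8)
	n, err := readAt(s, p)
	// The stream still holds unread chunks: no error was ever observed.
	fmt.Println(n, err, string(p[:n]), len(s.chunks)) // prints: 8 <nil> aaaabbbb 2
}
```

In a real client, those unread messages mean the stream is still open when `ReadAt` returns, which is why cancelling its context matters.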
Fixes #4617
Short summary: we go down the following chain when we try to read data from containerd via grpc:

- `content.Store` implementation, specifically `proxyContentStore`
- `content.ReaderAt` implementation, specifically `remoteReaderAt`
- when `remoteReaderAt` attempts to `ReadAt`, it calls `api/services/content/v1.ContentClient.Read()` here

The comments on `NewStream` explicitly state that, to avoid leaking resources, one of a few actions must be performed, including cancelling the provided context or calling `RecvMsg` until a non-nil error is returned.

Yet we have a single context for the entire life of the connection. With a short content blob, it doesn't matter; with a larger one, you leak hundreds, thousands or tens of thousands of goroutines (and whatever other resources, like memory, are attendant to them).
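The leak pattern can be reproduced in miniature without gRPC. In this sketch, `openStream` is a hypothetical stand-in for `grpc.NewStream` that parks one watcher goroutine until its context is done, loosely modelling the goroutines seen in `select` under `grpc.newClientStream`; `leakyReadAt` and `leakedAfter` are likewise illustrative names, not containerd code. Every read on the shared connection context adds one goroutine that stays alive until the whole connection is torn down.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
)

// openStream is a hypothetical stand-in for grpc.NewStream. It starts a
// watcher goroutine that stays blocked until ctx is done.
func openStream(ctx context.Context) {
	go func() {
		<-ctx.Done()
	}()
}

// leakyReadAt opens a stream on the long-lived connection context and returns
// without cancelling anything, as in the pre-fix code path.
func leakyReadAt(connCtx context.Context) {
	openStream(connCtx)
	// ... Recv until p is full, then return without draining the stream.
}

// leakedAfter performs n reads on one connection context and reports how many
// extra goroutines are still alive before the connection is torn down.
func leakedAfter(n int) int {
	connCtx, cancel := context.WithCancel(context.Background())
	defer cancel() // releases the watchers, but only at "connection" teardown
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		leakyReadAt(connCtx)
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines alive after 1000 reads:", leakedAfter(1000))
}
```

The count grows linearly with the number of reads, which matches the description of small blobs being harmless and large ones leaking thousands of goroutines.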
This creates a derived child context for each `ReadAt()`, and therefore each `NewStream()`, and then cancels the child context as soon as the `ReadAt()` is done. See the attached issue #4617 for much more detail.

Short and simple test:

- `ctr content get` for some large blob
- `kill -USR1 <ctr_pid>` or, if you don't mind killing it entirely, `kill -ABRT <ctr_pid>`
- before this change: many goroutines in `select` state with `grpc.newClientStream`
- after this change: no goroutines in `select` state with `grpc.newClientStream`; it is fixed

As discussed with @dmcgowan
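The fix (a child context derived per `ReadAt()` and cancelled on return) can be sketched with the same kind of stand-in. This is a minimal illustration under assumptions: `openStream`, `fixedReadAt` and `remainingAfter` are hypothetical names, and `runtime.NumGoroutine` takes the place of inspecting the goroutine dump produced by `kill -USR1`/`kill -ABRT`. It shows the per-read goroutine being released once the read finishes, even though the connection context is still alive.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// openStream is a hypothetical stand-in for grpc.NewStream; its watcher
// goroutine exits only when ctx is done.
func openStream(ctx context.Context) {
	go func() {
		<-ctx.Done()
	}()
}

// fixedReadAt derives a child context per call and cancels it on return, so
// the per-stream goroutine is released as soon as the read finishes.
func fixedReadAt(connCtx context.Context) {
	childCtx, cancel := context.WithCancel(connCtx)
	defer cancel() // must always run, or the stream's goroutine lingers
	openStream(childCtx)
	// ... Recv until p is full; returning triggers cancel().
}

// remainingAfter performs n reads on one long-lived connection context and
// reports how many extra goroutines remain while that context is still live.
func remainingAfter(n int) int {
	connCtx, cancel := context.WithCancel(context.Background())
	defer cancel()
	before := runtime.NumGoroutine()
	for i := 0; i < n; i++ {
		fixedReadAt(connCtx)
	}
	// Cancelled watchers exit asynchronously; give them up to a second.
	deadline := time.Now().Add(time.Second)
	for runtime.NumGoroutine() > before && time.Now().Before(deadline) {
		time.Sleep(10 * time.Millisecond)
	}
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("goroutines left after 1000 reads:", remainingAfter(1000))
}
```

Deriving the child from the connection context (rather than from `context.Background()`) keeps the original behaviour that tearing down the connection still cancels any in-flight read.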