Unify dialer implementations#3802
Conversation
|
Build succeeded.
|
57a81e4 to
9ee758d
Compare
|
Build succeeded.
|
Codecov Report
@@ Coverage Diff @@
## master #3802 +/- ##
==========================================
- Coverage 42.35% 38.26% -4.09%
==========================================
Files 130 88 -42
Lines 14683 12357 -2326
==========================================
- Hits 6219 4729 -1490
+ Misses 7539 6983 -556
+ Partials 925 645 -280
Continue to review full report at Codecov.
|
estesp
left a comment
There was a problem hiding this comment.
overall I think this is good; a few questions and one that maybe we need someone from the Windows team to comment on regarding the extra timeout setup for shims
There was a problem hiding this comment.
Would be good to get some feedback from one of the Windows developers on this--this comment and the timeout seems to be specific to shims, but now is in the code used by many different parts of the code (Go client, server-side gRPC connection(s), etc.). It also seems like a lot of timeouts going on.. the timeout from the caller; if you look at the wrapping call it also has a timeout running while calling the dialer func. Would be good to get clarity, and at a minimum if there is specific shim-behavior then there probably has to be a wrapping call for shims to use separately from the unified dialer function.
There was a problem hiding this comment.
@estesp - For RuntimeV2 (shim) activation's Windows uses named pipes not a fifo. Due to this client server model it means that there is a delay between when the call to shim.exe start ... and the actual pipe is served by the shim. This timeout is meant to handle that case but I agree with such a generic dialer it doesn't make a ton of sense. We really should have that caller pass in the increased timeout knowing that its a shim.exe start ... call.
There was a problem hiding this comment.
It also seems weird that the dialer package already handles retries every 10ms up to timeout and this dialer func will do the same
There was a problem hiding this comment.
@jterry75 let me see if I get this correctly, because I really have no idea how any of this works on windows:
-
pkg/dialer/dialer_windows.goshould just dowinio.DialPipe(address, &timeout)as before this PR and ttrpcutil should just call it -
this awkward 5 second retry logic should only be left inside the runtimeV2's shim
There was a problem hiding this comment.
Yes I think that is correct. It should issues a DialPipe as it did previously and only handle os.IsNotExist as a retry candidate in isNoent. Otherwise it should return the error immediately. The caller from runtime v2 shim should use the 5 second timeout
There was a problem hiding this comment.
@estesp - And to your point, this one is already odd. This is the ttrpc connection from a shim BACK to containerd.exe for the events publisher ttrpc API. It makes little to no sense to not fail that call immediately if it didn't connect. IE: a shim tried to connect to containerd.exe but couldn't? containerd.exe must not be running.
|
Build succeeded.
|
35e6e11 to
2a18ddb
Compare
|
Build succeeded.
|
There was a problem hiding this comment.
This is already handled by timeoutDialer you should be able to just call winio.DialPipe(address, &timeout) right?
There was a problem hiding this comment.
Right. I've completely forgot that timeoutDialer handles that.
There was a problem hiding this comment.
What about util_windows.go AnonDialer?
There was a problem hiding this comment.
Erm... what about it? 59374543f4aba28e5ac16ebb1875034ff52c9dab reverts my initial changes and brings back that 5 second timeout as before.
There was a problem hiding this comment.
So we are not going to use dialer for it then?
There was a problem hiding this comment.
I don't really see a good reason for it, unless that comment is wrong:
If there is nobody serving the pipe we limit the timeout for this case to
5 seconds because any shim that would serve this endpoint should serve it
within 5 seconds.
If it's correct, we would still have to create a context with timeout of 5 seconds and wait for it, then call the timeoutDialer which does time.After(timeout) and calls winio.DialPipe where yet another timeout is handled with a context.
Handling timeout on so many layers seems wrong to me.
There was a problem hiding this comment.
@jterry75 does this make sense? Should I anyways use the pkg/dialer?
There was a problem hiding this comment.
Im fine with not merging them I was just making sure you didnt miss it by accident
|
Build succeeded.
|
57f2cf5 to
77e0b7b
Compare
|
Build succeeded.
|
AkihiroSuda
left a comment
There was a problem hiding this comment.
Overall looks good, but TrimPrefix for abstract sockets should be kept back.
|
Build succeeded.
|
ad1a87c to
b89780c
Compare
|
Build succeeded.
|
There was a problem hiding this comment.
This change is related to windows platform or go version?
There was a problem hiding this comment.
Not sure I understand the question. This change is just a simplification. os.IsNotExist unwraps *os.PathError so there's no point in doing it manually.
This function is needed when the ENOENT is wrapped inside *net.OpError (i.e. on unix)
There was a problem hiding this comment.
go1.13 starts to support wrap https://golang.org/doc/go1.13#error_wrapping. containerd don't add limitation of golang right now. If someone uses go1.12 to build it, it might be issue. That is what I said that it is related to golang version.
|
This is ready to merge, but given the 7 commits are sometimes undoing earlier commits, the commit history would be cleaner if you can squash them to a single commit at this point. After that I'm happy to merge it. |
Instead of having several dialer implementations, leave only one in `pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim` which had their own Closes containerd#3471. Signed-off-by: Kiril Vladimiroff <[email protected]>
b89780c to
4dd75be
Compare
|
Build succeeded.
|
|
Thanks for your patience on this one--nice to see +7, -100 lines of code.. good cleanup! |
Instead of having several dialer implementations, leave only one in
pkg/dialerand call it frompkg/ttrpcutil,runtime/v(1|2)/shimwhich had their own.
Closes #3471.
Signed-off-by: Kiril Vladimiroff [email protected]