Skip to content

Unify dialer implementations#3802

Merged
estesp merged 1 commit intocontainerd:masterfrom
vladimiroff:unify-dialers
Feb 26, 2020
Merged

Unify dialer implementations#3802
estesp merged 1 commit intocontainerd:masterfrom
vladimiroff:unify-dialers

Conversation

@vladimiroff
Copy link
Copy Markdown
Contributor

Instead of having several dialer implementations, leave only one in
pkg/dialer and call it from pkg/ttrpcutil, runtime/v(1|2)/shim
which had their own.

Closes #3471.

Signed-off-by: Kiril Vladimiroff [email protected]

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 7, 2019

Build succeeded.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 7, 2019

Build succeeded.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Nov 7, 2019

Codecov Report

Merging #3802 into master will decrease coverage by 4.08%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3802      +/-   ##
==========================================
- Coverage   42.35%   38.26%   -4.09%     
==========================================
  Files         130       88      -42     
  Lines       14683    12357    -2326     
==========================================
- Hits         6219     4729    -1490     
+ Misses       7539     6983     -556     
+ Partials      925      645     -280
Flag Coverage Δ
#linux ?
#windows 38.26% <ø> (+0.44%) ⬆️
Impacted Files Coverage Δ
archive/tar_opts.go 11.76% <0%> (-47.06%) ⬇️
cio/io.go 1.4% <0%> (-45.08%) ⬇️
snapshots/native/native.go 1.73% <0%> (-40.7%) ⬇️
archive/tar.go 18.95% <0%> (-29.24%) ⬇️
metadata/snapshot.go 33.18% <0%> (-15.29%) ⬇️
metadata/adaptors.go 39.04% <0%> (-9.53%) ⬇️
oci/spec_opts.go 32.91% <0%> (-2.65%) ⬇️
gc/scheduler/scheduler.go 66.34% <0%> (-0.97%) ⬇️
content/local/writer.go 57.69% <0%> (-0.97%) ⬇️
platforms/cpuinfo.go 4.28% <0%> (-0.33%) ⬇️
... and 51 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 640ca78...4dd75be. Read the comment docs.

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall I think this is good; a few questions and one that maybe we need someone from the Windows team to comment on regarding the extra timeout setup for shims

Comment thread pkg/dialer/dialer_windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to get some feedback from one of the Windows developers on this--this comment and the timeout seems to be specific to shims, but now is in the code used by many different parts of the code (Go client, server-side gRPC connection(s), etc.). It also seems like a lot of timeouts going on.. the timeout from the caller; if you look at the wrapping call it also has a timeout running while calling the dialer func. Would be good to get clarity, and at a minimum if there is specific shim-behavior then there probably has to be a wrapping call for shims to use separately from the unified dialer function.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@estesp - For RuntimeV2 (shim) activation's Windows uses named pipes not a fifo. Due to this client server model it means that there is a delay between when the call to shim.exe start ... and the actual pipe is served by the shim. This timeout is meant to handle that case but I agree with such a generic dialer it doesn't make a ton of sense. We really should have that caller pass in the increased timeout knowing that its a shim.exe start ... call.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also seems weird that the dialer package already handles retries every 10ms up to timeout and this dialer func will do the same

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jterry75 let me see if I get this correctly, because I really have no idea how any of this works on windows:

  • pkg/dialer/dialer_windows.go should just do winio.DialPipe(address, &timeout) as before this PR and ttrpcutil should just call it

  • this awkward 5 second retry logic should only be left inside the runtimeV2's shim

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I think that is correct. It should issues a DialPipe as it did previously and only handle os.IsNotExist as a retry candidate in isNoent. Otherwise it should return the error immediately. The caller from runtime v2 shim should use the 5 second timeout

Comment thread runtime/v1/shim/client/client.go Outdated
Comment thread runtime/v2/shim/util_unix.go Outdated
Comment thread pkg/ttrpcutil/client_windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@estesp - And to your point, this one is already odd. This is the ttrpc connection from a shim BACK to containerd.exe for the events publisher ttrpc API. It makes little to no sense to not fail that call immediately if it didn't connect. IE: a shim tried to connect to containerd.exe but couldn't? containerd.exe must not be running.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 11, 2019

Build succeeded.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 11, 2019

Build succeeded.

Comment thread pkg/dialer/dialer_windows.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already handled by timeoutDialer you should be able to just call winio.DialPipe(address, &timeout) right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. I've completely forgot that timeoutDialer handles that.

Comment thread runtime/v2/shim/util_unix.go Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about util_windows.go AnonDialer?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm... what about it? 59374543f4aba28e5ac16ebb1875034ff52c9dab reverts my initial changes and brings back that 5 second timeout as before.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we are not going to use dialer for it then?

Copy link
Copy Markdown
Contributor Author

@vladimiroff vladimiroff Nov 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see a good reason for it, unless that comment is wrong:

If there is nobody serving the pipe we limit the timeout for this case to
5 seconds because any shim that would serve this endpoint should serve it
within 5 seconds.

If it's correct, we would still have to create a context with timeout of 5 seconds and wait for it, then call the timeoutDialer which does time.After(timeout) and calls winio.DialPipe where yet another timeout is handled with a context.

Handling timeout on so many layers seems wrong to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jterry75 does this make sense? Should I anyways use the pkg/dialer?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im fine with not merging them I was just making sure you didnt miss it by accident

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Nov 12, 2019

Build succeeded.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Dec 6, 2019

Build succeeded.

Copy link
Copy Markdown
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but TrimPrefix for abstract sockets should be kept back.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jan 21, 2020

Build succeeded.

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jan 21, 2020

Build succeeded.

Comment thread pkg/dialer/dialer_windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is related to windows platform or go version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the question. This change is just a simplification. os.IsNotExist unwraps *os.PathError so there's no point in doing it manually.

This function is needed when the ENOENT is wrapped inside *net.OpError (i.e. on unix)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

go1.13 starts to support wrap https://golang.org/doc/go1.13#error_wrapping. containerd don't add limitation of golang right now. If someone uses go1.12 to build it, it might be issue. That is what I said that it is related to golang version.

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp
Copy link
Copy Markdown
Member

estesp commented Feb 26, 2020

This is ready to merge, but given the 7 commits are sometimes undoing earlier commits, the commit history would be cleaner if you can squash them to a single commit at this point. After that I'm happy to merge it.

Instead of having several dialer implementations, leave only one in
`pkg/dialer` and call it from `pkg/ttrpcutil`, `runtime/v(1|2)/shim`
which had their own

Closes containerd#3471.

Signed-off-by: Kiril Vladimiroff <[email protected]>
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Feb 26, 2020

Build succeeded.

@estesp
Copy link
Copy Markdown
Member

estesp commented Feb 26, 2020

Thanks for your patience on this one--nice to see +7, -100 lines of code.. good cleanup!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unify dialer implementations

6 participants