Skip to content

client: surface a connection error more clearly#4447

Merged
estesp merged 1 commit intocontainerd:mainfrom
kzys:grpc-error
Jul 12, 2021
Merged

client: surface a connection error more clearly#4447
estesp merged 1 commit intocontainerd:mainfrom
kzys:grpc-error

Conversation

@kzys
Copy link
Copy Markdown
Member

@kzys kzys commented Aug 3, 2020

gRPC-Go recently added grpc.WithReturnConnectionError() which improves
the situation of #2576.

Permission errors:

% ./bin/ctr t ls
ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: permission denied"
%

Non-existent sockets:

% ./bin/ctr -a notfound t ls
ctr: failed to dial "notfound": context deadline exceeded: connection error: desc = "transport: error while dialing: dial unix://notfound: timeout"
%

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Aug 3, 2020

Build succeeded.

Copy link
Copy Markdown
Contributor

@Zyqsempai Zyqsempai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Aug 3, 2020

Do you folks think the fix is good enough to resolve #2576?

@estesp
Copy link
Copy Markdown
Member

estesp commented Aug 3, 2020

This looks like a good improvement; I think the important decision point is whether we want a significant version upgrade to gRPC right before the 1.4 release? I don't know the level of impact--obviously CI testing doesn't show any impact on the surface, but would be good to get some further testing of this if possible.

@dims
Copy link
Copy Markdown
Member

dims commented Aug 3, 2020

Apparently this API is experimental :(

grpc/grpc-go@b02de00
https://github.com/grpc/grpc-go/blame/master/dialoptions.go#L293

bumping k/k to this new grpc version will be challenging as well.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Aug 4, 2020

Thanks for reviewing. The issue is not currently blocking anything on my end. I'm fine postponing the change to 1.5.

@dmcgowan dmcgowan added this to the 1.5 milestone Aug 4, 2020
@kzys
Copy link
Copy Markdown
Member Author

kzys commented Sep 21, 2020

Since we have release/1.4 and master is for 1.5, can we merge this change now?

@estesp
Copy link
Copy Markdown
Member

estesp commented Sep 21, 2020

@dims any thoughts on how this needs to play out with other consumers of gRPC? I don't think we have ever been in lock step with components above or below containerd, but I'm not enough of a gRPC expert to state that with conviction. :) I know the CRI plugin has gRPC vendored as well, and we at least need to validate that moving CRI to a later version is workable before we officially move the core containerd vendoring.

@dims
Copy link
Copy Markdown
Member

dims commented Sep 22, 2020

So the chain is ... containerd client is used by cadvisor, which is used by kubernetes and kubernetes in turn uses etcd

Things are stalled. To unblock we have to:

  • updated etcd first
  • get etcd vendored into kubernetes
  • update containerd
  • get containerd vendored into cadvisor
  • then get new cadvisor vendored into kubernetes

This is as far as i can tell at the moment :)

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Jan 21, 2021

etcd's gRPC upgrade PR is now etcd-io/etcd#12636.

@dmcgowan
Copy link
Copy Markdown
Member

Since etcd-io/etcd#12658 got merged, are we good to proceed with the vendor update chain?

@dims
Copy link
Copy Markdown
Member

dims commented Feb 26, 2021

@dmcgowan we need this in tag/release of etcd unfortunately

@dims
Copy link
Copy Markdown
Member

dims commented Apr 5, 2021

UPDATE: we are making some good progress here kubernetes/kubernetes#100488 - but not enough to merge this in containerd 1.5. Let's please move the milestone to 1.6.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Jun 16, 2021

kubernetes/kubernetes#100488 has been merged! Meanwhile I've extracted grpc upgrade as #5613.

gRPC-Go recently added grpc.WithReturnConnectionError() which improves
the situation of containerd#2576.

Permission errors:

  % ./bin/ctr t ls
  ctr: failed to dial "/run/containerd/containerd.sock": connection error: desc = "transport: error while dialing: dial unix /run/containerd/containerd.sock: connect: permission denied"
  %

Non-existent sockets:

  % ./bin/ctr -a notfound t ls
  ctr: failed to dial "notfound": context deadline exceeded: connection error: desc = "transport: error while dialing: dial unix://notfound: timeout"
  %

Signed-off-by: Kazuyoshi Kato <[email protected]>
@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Jul 2, 2021

Build succeeded.

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Jul 6, 2021

@dims @estesp Can you take a look the change again? Since we've upgraded gRPC in main, this PR becomes one-line change now :)

Copy link
Copy Markdown
Member

@dims dims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kzys
Copy link
Copy Markdown
Member Author

kzys commented Jul 12, 2021

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants