Commit 9f73396

daemon: add grpc.WithBlock option
WithBlock makes sure that the following containerd request is reliable.

In one edge case under high load pressure, the kernel kills dockerd, containerd and containerd-shims because of OOM. Both dockerd and containerd then restart, but containerd takes time to recover all the existing containers. Until containerd is serving, dockerd requests fail with a gRPC error. The bad thing is that the restore action still ignores any non-NotFound errors and returns the running state for already stopped containers. That is unexpected behavior, and we need to restart dockerd to make sure that everything is OK.

That is painful. Adding WithBlock prevents the edge case. In the common case, containerd will be serving shortly, so it does no harm to add WithBlock for the containerd connection.

Signed-off-by: Wei Fu <[email protected]>
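The failure mode described in the message can be sketched in a few lines of Go. This is only an illustrative model, not the daemon's actual restore code: `probe`, `lenientRestore`, `errNotFound` and `errUnavailable` are hypothetical names introduced here to show why ignoring every non-NotFound error leaves a stopped container reported as running.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the two error classes named in the commit
// message: a definite NotFound, and any other failure such as containerd
// not serving yet.
var (
	errNotFound    = errors.New("container not found")
	errUnavailable = errors.New("containerd is not serving yet")
)

// probe is a hypothetical result of asking containerd about a container
// whose persisted state says "running".
type probe struct {
	alive bool
	err   error
}

// lenientRestore models the behavior described above: only NotFound marks
// the container as stopped, while any other error is ignored and the
// persisted running state is kept.
func lenientRestore(p probe) bool {
	if p.err != nil {
		return !errors.Is(p.err, errNotFound)
	}
	return p.alive
}

func main() {
	fmt.Println(lenientRestore(probe{err: errNotFound}))    // false
	fmt.Println(lenientRestore(probe{err: errUnavailable})) // true
}
```

In this model the second call is the problematic one: the query failed only because the server was not ready, yet the container is still treated as running.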
1 parent a30990b commit 9f73396

1 file changed

Lines changed: 18 additions & 0 deletions

File tree

daemon/daemon.go

@@ -887,6 +887,24 @@ func NewDaemon(ctx context.Context, config *config.Config, pluginStore *plugin.S
 	registerMetricsPluginCallback(d.PluginStore, metricsSockPath)
 
 	gopts := []grpc.DialOption{
+		// WithBlock makes sure that the following containerd request
+		// is reliable.
+		//
+		// NOTE: In one edge case under high load pressure, the kernel
+		// kills dockerd, containerd and containerd-shims because of OOM.
+		// Both dockerd and containerd then restart, but containerd
+		// takes time to recover all the existing containers. Until
+		// containerd is serving, dockerd requests fail with a gRPC error.
+		// The bad thing is that the restore action still ignores
+		// any non-NotFound errors and returns the running state for
+		// already stopped containers. That is unexpected behavior, and
+		// we need to restart dockerd to make sure that everything is OK.
+		//
+		// That is painful. Adding WithBlock prevents the edge case. In
+		// the common case, containerd will be serving shortly, so
+		// it does no harm to add WithBlock for the containerd connection.
+		grpc.WithBlock(),
+
 		grpc.WithInsecure(),
 		grpc.WithBackoffMaxDelay(3 * time.Second),
 		grpc.WithDialer(dialer.Dialer),
