Skip to content

Retry client connection in waitForStart#7537

Merged
samuelkarp merged 1 commit intocontainerd:mainfrom
swagatbora90:fix-client-wait-start
Oct 27, 2022
Merged

Retry client connection in waitForStart#7537
samuelkarp merged 1 commit intocontainerd:mainfrom
swagatbora90:fix-client-wait-start

Conversation

@swagatbora90
Copy link
Copy Markdown
Contributor

Issue: #7422

  • Increased context timeout to 4 sec
  • Retry client connection errors until context deadline in waitForStart

Signed-off-by: Swagat Bora [email protected]

@k8s-ci-robot
Copy link
Copy Markdown

Hi @swagatbora90. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines +70 to +73
client, err = New(d.addr)
if err != nil {
continue
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked closely in the whole code, but this one felt a bit odd; is it needed to create a new client in the loop (so construct a new client for every request)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a new client can sometimes fast fail, as we saw in here #7422. This change addresses that by retrying the client connection on failure at least until the context deadline has exceeded.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that New(d.addr) already makes a connection? That is a bit surprising (I wouldn't immediately expect that constructing a client requires a connection); if that's the case, I feel like that's something we need to look into.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes,New() makes a connection to the containerd instance at the given address

containerd/client.go

Lines 142 to 155 in 9cd3357

connector := func() (*grpc.ClientConn, error) {
ctx, cancel := context.WithTimeout(context.Background(), copts.timeout)
defer cancel()
conn, err := grpc.DialContext(ctx, dialer.DialAddress(address), gopts...)
if err != nil {
return nil, fmt.Errorf("failed to dial %q: %w", address, err)
}
return conn, nil
}
conn, err := connector()
if err != nil {
return nil, err
}
c.conn, c.connector = conn, connector

Copy link
Copy Markdown
Member

@kzys kzys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Can you rebase this branch? I have made some reporting fixes in the main branch. So rebasing will bring them to this PR.

@kzys
Copy link
Copy Markdown
Member

kzys commented Oct 19, 2022

/ok-to-test

@swagatbora90 swagatbora90 force-pushed the fix-client-wait-start branch from d3d2fe8 to 727b33c Compare October 19, 2022 16:28
@kzys
Copy link
Copy Markdown
Member

kzys commented Oct 19, 2022

/retest

@kzys
Copy link
Copy Markdown
Member

kzys commented Oct 19, 2022

pull-containerd-node-e2e is failing not only this PR, but also others. Wait a moment...

@samuelkarp
Copy link
Copy Markdown
Member

/retest

@kzys
Copy link
Copy Markdown
Member

kzys commented Oct 26, 2022

@thaJeztah @samuelkarp Can you take a look? The tests are all green.

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuelkarp samuelkarp merged commit d577ef8 into containerd:main Oct 27, 2022
@swagatbora90 swagatbora90 deleted the fix-client-wait-start branch October 31, 2022 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants