Skip to content

[Windows] Fix deadline exceeded in daemon restart#6635

Merged
estesp merged 1 commit intocontainerd:mainfrom
gabriel-samfira:fix-deadline-exceeded-in-daemon-restart
Mar 10, 2022
Merged

[Windows] Fix deadline exceeded in daemon restart#6635
estesp merged 1 commit intocontainerd:mainfrom
gabriel-samfira:fix-deadline-exceeded-in-daemon-restart

Conversation

@gabriel-samfira
Copy link
Copy Markdown
Contributor

Windows needs a bit more time to finish the restarting containerd. With the current 2 second timeout, we run the risk of exceeding that deadline.

Windows needs a bit more time to finish the restarting containerd. With
the current 2 second timeout, we run the risk of exceeding that
deadline.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
@k8s-ci-robot
Copy link
Copy Markdown

Hi @gabriel-samfira. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kzys
Copy link
Copy Markdown
Member

kzys commented Mar 8, 2022

/ok-to-test

@theopenlab-ci
Copy link
Copy Markdown

theopenlab-ci Bot commented Mar 8, 2022

Build succeeded.

  • containerd-build-arm64 : RETRY_LIMIT in 3m 41s (non-voting)
  • containerd-test-arm64 : RETRY_LIMIT in 6m 01s (non-voting)
  • containerd-integration-test-arm64 : RETRY_LIMIT in 24m 36s (non-voting)

@jterry75
Copy link
Copy Markdown
Contributor

jterry75 commented Mar 8, 2022

@dcantah - I thought this was already solved by parallel reconnect?

@dcantah
Copy link
Copy Markdown
Member

dcantah commented Mar 8, 2022

@dcantah - I thought this was already solved by parallel reconnect?

Is this PR just Containerd on Windows takes > 2 seconds to startup and that's failing the test? I'd done some work to try reconnecting any task IO if containerd restarted, but not sure this is the reason for the failure in this test. https://github.com/microsoft/hcsshim/blob/master/internal/cmd/io_npipe.go#L77-L159

@gabriel-samfira
Copy link
Copy Markdown
Contributor Author

Is this PR just Containerd on Windows takes > 2 seconds to startup and that's failing the test?

Yes,

I'd done some work to try reconnecting any task IO if containerd restarted, but not sure this is the reason for the failure in this test.

It hink it's just a matter of timing. The wait time is just 2 seconds. If containerd takes longer to start on Windows, the test will fail. It's not an issue with hcsshim. The client tries to reconnect to containerd within 2 seconds after restart and sometimes (not always) fails.

Copy link
Copy Markdown
Contributor

@jterry75 jterry75 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Ok SGTM

Copy link
Copy Markdown
Member

@estesp estesp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@estesp estesp merged commit 8047eb2 into containerd:main Mar 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants