
[release/0.8] Cherry-pick PrepareLayer fixes #1131

Merged
dcantah merged 2 commits into microsoft:release/0.8 from dcantah:cp-preparelayer
Aug 26, 2021

@dcantah
Contributor

@dcantah dcantah commented Aug 26, 2021

This PR cherry-picks some best-effort fixes for the PrepareLayer call/layer setup. The errors that these try to fix have been observed in a couple of k8s scenarios.

From:

  1. Add retry around wclayer operations for process isolated containers #1091
  2. Add sleep before layer operation retries #1122

This change adds a simple retry loop to handle some behavior on RS5. Loopback VHDs
used to be mounted in a different manner on RS5 (ws2019), which led to some
very odd cases where things would succeed when they shouldn't have, or we'd simply
time out if an operation took too long. Many parallel invocations of this code path
and stressing the machine seem to bring out the issues, but not all of the possible failure
paths that bring about the errors we have observed are known.

On 19h1+ this retry loop shouldn't be needed, but the loop exits as soon as everything succeeds, so it is harmless
and shouldn't need a version check.
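
As a rough illustration of the "leave the loop on success" behavior described above, here is a minimal sketch in Go. It is not the code from this PR: `retryOp`, `errTransient`, and the attempt count are hypothetical names and values chosen for the example, and the closure stands in for a real layer operation.

```go
package main

import (
	"errors"
	"fmt"
)

// errTransient is a hypothetical stand-in for a flaky layer-operation failure.
var errTransient = errors.New("transient layer failure")

// retryOp is a hypothetical helper, not the hcsshim implementation. It calls
// op up to attempts times and returns nil as soon as one call succeeds, so a
// platform where op never fails pays for exactly one call.
func retryOp(attempts int, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if lastErr = op(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("operation failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	// Simulated operation that fails twice and then succeeds.
	flaky := func() error {
		calls++
		if calls < 3 {
			return errTransient
		}
		return nil
	}
	err := retryOp(5, flaky)
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the loop returns on the first success, an operation that never fails runs exactly once, which is why no OS version check is needed in a sketch like this.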

Signed-off-by: Daniel Canter <[email protected]>
(cherry picked from commit 01b9911)
Signed-off-by: Daniel Canter <[email protected]>
This change adds a small sleep before each re-attempt on layer operation
failures. These failures should only happen on RS5, and the probable cause is the
different way in which container loopback VHDs were mounted on this OS version.
One theory for why things go awry on RS5 is that some PnP events get reported
too late or too early. If that theory is correct, a small sleep might help get
things back into a "good" state before a re-attempt.
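
The sleep-before-retry idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the cherry-picked commit: `retryWithSleep`, `errLayerOp`, and the delay value are hypothetical, and the real change may structure the wait differently.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errLayerOp is a hypothetical stand-in for a layer-operation failure.
var errLayerOp = errors.New("layer operation failed")

// retryWithSleep is a hypothetical helper, not the hcsshim implementation.
// It calls op up to attempts times and pauses for delay before every
// re-attempt (never before the first attempt), giving the system a moment
// to settle after a failure.
func retryWithSleep(attempts int, delay time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			time.Sleep(delay)
		}
		if lastErr = op(); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("operation failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	start := time.Now()
	// Simulated operation that fails once and then succeeds.
	err := retryWithSleep(3, 50*time.Millisecond, func() error {
		calls++
		if calls < 2 {
			return errLayerOp
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err, "waited at least one delay:", time.Since(start) >= 50*time.Millisecond)
}
```

The first attempt runs immediately, so the delay is only paid on the failure path.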

Signed-off-by: Daniel Canter <[email protected]>
(cherry picked from commit adc35b0)
Signed-off-by: Daniel Canter <[email protected]>
@dcantah dcantah requested a review from a team as a code owner August 26, 2021 00:12

@katiewasnothere katiewasnothere left a comment


lgtm

@ambarve
Contributor

ambarve commented Aug 26, 2021

lgtm

@dcantah dcantah merged commit d8dfad1 into microsoft:release/0.8 Aug 26, 2021

3 participants