Skip to content

Windows: Retry workaround for RS1/RS2 compute system enumeration#31055

Merged
vieux merged 2 commits intomoby:masterfrom
microsoft:jjh/fix30278
Feb 18, 2017
Merged

Windows: Retry workaround for RS1/RS2 compute system enumeration#31055
vieux merged 2 commits intomoby:masterfrom
microsoft:jjh/fix30278

Conversation

@lowenna
Copy link
Copy Markdown
Member

@lowenna lowenna commented Feb 15, 2017

Signed-off-by: John Howard [email protected]

Fixes #30278

@georgyturevich @cons0l3 FYI

There's a race condition in HCS in RS1 and RS2 builds (fixed in RS3) whereby the call to
enumerate compute systems can return access denied if a silo is being torn down in the kernel
underneath it.

The net result is that docker rm may fail. In fact, using these two scripts, I have been
able to repro the failure almost immediately (usually with less than 10 containers having
been started)

docker stop recovertest
docker rm -f recovertest
remove-item "failed.txt" -erroraction silentlycontinue
while ($true) {
    $count++
    Write-Host "Iteration $count $(Get-Date)"
    docker ps -q
    docker run -d -i --name recovertest microsoft/iis
    if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
    docker stop recovertest
    if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
    docker rm recovertest
   if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
   if (Test-Path ".\failed.txt") { Write-Host "Quitting as failed.txt exists"; exit -1 }
}
docker stop recovertest2
docker rm -f recovertest2
remove-item "failed.txt" -erroraction silentlycontinue
while ($true) {
    $count++
    Write-Host "Iteration $count $(Get-Date)"
    docker ps -q
    docker run -d -i --name recovertest2 microsoft/iis
    if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
    docker stop recovertest2
    if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
    Start-Sleep -Seconds 5
    docker rm recovertest2
   if ($lastexitcode -ne 0) { new-item ".\failed.txt"; exit -1 }
   if (Test-Path ".\failed.txt") { Write-Host "Quitting as failed.txt exists"; exit -1 }
}

With this fix, I've had the scripts running for a couple of hours without failure.

@darrenstahlmsft @johnstep PTAL.

@thaJeztah @vieux @friism - this is one that should be strongly considered for a future 1.13 release, and for CS.

Copy link
Copy Markdown
Member

@johnstep johnstep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread daemon/graphdriver/windows/windows.go Outdated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Replace Note is with something like Note: It is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Signed-off-by: John Howard <[email protected]>

This fixes moby#30278 where
there is a race condition in HCS for RS1 and RS2 builds, and enumeration
of compute systems can return access is denied if a silo is being
torn down in the kernel while HCS is attempting to enumerate them.
@darstahl
Copy link
Copy Markdown
Contributor

Not a maintainer, but LGTM

@vieux
Copy link
Copy Markdown
Contributor

vieux commented Feb 18, 2017

LGTM

@vieux vieux merged commit 7761c69 into moby:master Feb 18, 2017
@thaJeztah thaJeztah mentioned this pull request Feb 18, 2017
@lowenna lowenna deleted the jjh/fix30278 branch February 18, 2017 02:03
thaJeztah pushed a commit to thaJeztah/docker that referenced this pull request Feb 18, 2017
Windows: Retry workaround for RS1/RS2 compute system enumeration
(cherry picked from commit 7761c69)
Signed-off-by: Sebastiaan van Stijn <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docker rm throws access denied error after upgrade to new version of docker

6 participants