Windows: Try to detach sandbox on cleanup to avoid permission denied#37712
Windows: Try to detach sandbox on cleanup to avoid permission denied#37712yongtang merged 2 commits intomoby:masterfrom
Conversation
|
Note this might not (won't) pass vendoring until #37674 is merged. |
|
@carlfischer1. Another backport candidate. |
There was a problem hiding this comment.
Kind of strange to call the error exists, since the true case is nil, but okay, I guess.
There was a problem hiding this comment.
Agree this is confusing to read. Does can we just call DetachVhd directly instead of checking existence first?
There was a problem hiding this comment.
I'll update to address both of these.
There was a problem hiding this comment.
Is it really worth retrying if nothing changed (the sandbox file did not exist)?
Codecov Report
@@ Coverage Diff @@
## master #37712 +/- ##
=========================================
Coverage ? 36.02%
=========================================
Files ? 610
Lines ? 45071
Branches ? 0
=========================================
Hits ? 16238
Misses ? 26603
Partials ? 2230 |
|
Rebased. Windows CI and vendor passed (the only path affected here). Rest are completing. @cpuguy83 are you OK with this now? @thaJeztah Could you take a look too please? |
cpuguy83
left a comment
There was a problem hiding this comment.
Can we add some additional context to the error messages?
There was a problem hiding this comment.
Nit, but a little cleaner if we return early when "!os.IsPermission(err)"
There was a problem hiding this comment.
Oooh, missed that comment @cpuguy83. Yes, that's a lot cleaner. Updated.
Signed-off-by: John Howard <[email protected]>
|
Can you just add some extra context to the new errors, e.g. |
@cpuguy83 Done. |
Signed-off-by: John Howard <[email protected]> This is a fix for a few related scenarios where it's impossible to remove layers or containers until the host is rebooted. Generally (or at least easiest to repro) through a forced daemon kill while a container is running. Possibly slightly worse than that, as following a host reboot, the scratch layer would possibly be leaked and left on disk under the dataroot\windowsfilter directory after the container is removed. One such example of a failure: 1. run a long running container with the --rm flag docker run --rm -d --name test microsoft/windowsservercore powershell sleep 30 2. Force kill the daemon not allowing it to cleanup. Simulates a crash or a host power-cycle. 3. (re-)Start daemon 4. docker ps -a PS C:\control> docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 7aff773d782b malloc "powershell start-sl…" 11 seconds ago Removal In Progress malloc 5. Try to remove PS C:\control> docker rm 7aff Error response from daemon: container 7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d: driver "windowsfilter" failed to remove root filesystem: rename C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d C:\control\windowsfilter\7aff773d782bbf35d95095369ffcb170b7b8f0e6f8f65d5aff42abf61234855d-removing: Access is denied. PS C:\control> Step 5 fails.
|
@thaJeztah Do you know if this will make it into 18.09 EE? Not sure what the cut-off was. |
|
@jhowardmsft looks like it didn't (but will double check when I'm back at my keyboard), I can see I opened a backport request for it, but it slipped of my radar. I think a patch release is scheduled, so I'll open backports where needed, so that it will be included in that release |
|
@thaJeztah Thanks. Kind of an important one as I know it fixes multiple production issues I've had to debug. |
|
Still seeing this with 19.03.2 on a clean Windows 2016 VM created from ISO image and fully patched. driver "windowsfilter" failed to remove root filesystem: hcsshim::GetComputeSystems: Access is denied. |
|
We had the same issue as @drnybble. (We used Unlocker Windows to delete the files) |
|
I am still facing this problem. I use the "Docker Desktop" in Windows 10 OS, I built image with "mcr.microsoft.com/windows/servercore:1709", during the build I got the following message I could not remove the left-behind containers with command "docker container prune --force", after I restarted my computer, I could use that command to delete the left-behind containers. Below is the information of "Docker" that I am using |
|
I have the same issue as #37712 (comment) . aka running the |
Fixes #36218
A few different scenarios, but it's possible that the daemon can leak sandbox/scratch layers in the Windows graphdriver, and have side effects where containers can't be removed. Here's one such case:
Before:
After: