This is a follow-up to the debugging I did on docker/for-win#3229.
It turns out that my issue is slightly different from that one.
Setup
- OS: Windows Server 2019 (headless)
- Docker version: 19.03.2 (client and server)
How to reproduce
When building a large Docker image, for example from this Dockerfile:
```dockerfile
FROM mcr.microsoft.com/windows/servercore
RUN @powershell -NoProfile -ExecutionPolicy unrestricted -Command "(iex ((new-object net.webclient).DownloadString('https://chocolatey.org/install.ps1')))"
RUN choco install msys2
```
My actual setup is a bit more complicated: I am building behind corporate proxies, and I have to customize both the choco and msys2 packages to work in my environment.
I have no way to work without the proxies.
I see the same issue with other large Docker images. Dockerizing MATLAB is the first use case where I ran into it, but that one is even harder to reproduce in an open-source collaboration environment.
Symptom
When building this image, docker build freezes right after the "RUN choco install msys2" command finishes.
After many tries, there were a few times where the build actually completed almost instantly.
I tried to reproduce this with a simpler setup (a Dockerfile creating thousands of files), but was unable to.
So I don't know exactly what triggers what appears to be a race condition.
After taking some stack traces, I observed that the code is stuck in the os.RemoveAll call at hcsshim/internal/wclayer/exportlayer.go, line 74 (commit bd9b255):
```
os.RemoveAll(0xc000d1c000, 0x2a, 0x0, 0x0)
	C:/.GOROOT/src/os/path.go:67 +0x3c
github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer.(*legacyLayerReaderWrapper).Close(0xc0000ba980, 0xc0000ba980, 0x2546fe0)
	C:/go/src/github.com/docker/docker/vendor/github.com/Microsoft/hcsshim/internal/wclayer/exportlayer.go:74 +0x95
github.com/docker/docker/daemon/graphdriver/windows.(*Driver).exportLayer.func1.1(0x5f8, 0xc00078e000)
	C:/go/src/github.com/docker/docker/daemon/graphdriver/windows/windows.go:672 +0x120
```
This code uses the exportLayer syscall to export the layer from the windowsfilter directory into a temporary directory.
Once the tar file has been produced, it removes that temporary copy of the layer (e.g. \\?\C:\ProgramData\docker\tmp\hcs425012433).
Workaround
After some back and forth, I tried running docker-ci-zap.exe on the hcs425012433 folder, and the docker build command instantly unfroze.
So I built a patched dockerd-dev.exe with the following hcsshim change:
```diff
diff --git a/internal/wclayer/exportlayer.go b/internal/wclayer/exportlayer.go
index 0425b33..0753ff2 100644
--- a/internal/wclayer/exportlayer.go
+++ b/internal/wclayer/exportlayer.go
@@ -71,6 +71,10 @@ type legacyLayerReaderWrapper struct {
 
 func (r *legacyLayerReaderWrapper) Close() error {
 	err := r.legacyLayerReader.Close()
+	// if the layer is not destroyed at the HCS level before removing it,
+	// we might hit a race condition for large containers
+	// which ends up hanging in the os.RemoveAll() call
+	DestroyLayer(r.root)
 	os.RemoveAll(r.root)
 	return err
 }
```
I have no idea whether this is the right fix for this problem, or whether it is rather an issue in the Windows kernel.
I can submit this patch as a PR if requested.