Set 127GB default sandbox size for WCOW, and ensure storage-opts is honoured on all paths under WCOW and LCOW (#41636)
Conversation
(force-pushed dbb7371 to b771d8c)
(force-pushed 3f3b03a to b266896)
In case we want it later, Jenkins build #8 (b266896) replicates #37352 in an integration test. So now I just need to fix it. ^_^
(force-pushed d02394a to fdea164)
storage-opts setting for _dockerd.exe_ applies to COPY the same way it applies to RUN

(force-pushed fdea164 to 178b073)
Annoyingly, Build #17 shows it successfully passed the test.

A failure for the "COPY got a 20GB sandbox" issue (#37352) is visible in build 8, and looks like this:

The directory mentioned here is the mounted volume for the sandbox VHDX of the layer being created. I suspect I actually ran out of disk space on the test node there; I'm not sure what to do about that, if it's something we have control over. Given that the files being created are all-zeroes, we should only have needed about 63GB of disk space: 21GB for the WCOW layer, another 21GB for the temp directory, then 21GB for the on-disk format, and we ran out of space in the first third of that last 21GB. That suggests either we had consumed a lot of space in earlier tests, or the disk was small to begin with: based on a note in windows.ps1, the build node is an Azure D2 v3 with a 50GB SSD D:\.

Anyway, that demonstrates that "Moved the 127GB WCOW default from the Builder internals to the Windows Filter GraphDriver" worked, so I'll consider this part tested; completing this fix only requires working out what to do about the test itself.
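As a sanity check on the space arithmetic above, a small sketch (using binary GB; the 50GB figure is the assumed D2 v3 SSD size from windows.ps1, and the exact unit doesn't change the conclusion):

```shell
GB=$((1024 * 1024 * 1024))

layer=$((21 * GB))     # all-zero files written into the WCOW layer
tempdir=$((21 * GB))   # second copy staged in the temp directory
ondisk=$((21 * GB))    # third copy in the on-disk format

needed=$((layer + tempdir + ondisk))
disk=$((50 * GB))      # assumed build-node SSD size

echo "needed: $((needed / GB))GB, disk: $((disk / GB))GB"
# needed: 63GB, disk: 50GB -- the node cannot hold all three copies at once
```

So even without earlier tests leaking space, the third copy alone would push past the 50GB drive at roughly the point observed.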
(force-pushed 178b073 to b6a2242)
(force-pushed 81fdd13 to f059b68)
Conceptually this seems sane; what's the downside? (In other words, why was this defaulted so much lower previously?)
The 20GB default is what happens if you ask HCS to create you a sandbox and don't then resize it. Someone at MS picked a random number sometime before 2016, perhaps. It is a nice, round number, and doesn't seem too terrifyingly large or painfully small. The documented rationale leaves a lot to be desired.

Edit: The above was added in late 2017 in MicrosoftDocs/Virtualization-Documentation#691, so years after the 20GB limit was actualised. For comparison, ECR caps image layers at 10GB (which is super-annoying...), and 20GB matches the devicemapper graphdriver base image size, so 20GB looks positively generous in the right contexts.

I'm not aware of any downside to this change, unless perhaps people were relying on the 20GB default size to prevent runaway disk consumption on the backing drive. I checked, and the sandbox.vhdx file created doesn't consume the full 20GB: the vhdx is about 4MB when created, temporarily larger when mounted (37MB), and when unmounted (without doing anything else) shrinks back down to about 9MB. So the "differencing VHDX" file format doesn't appear likely to blow out with this change.

Perhaps those who had tried to change this default in the past were bitten by microsoft/hcsshim#708 (fix shipped in Docker Engine 19.03.6 in February 2020) and gave up... #35925 should have caused that bug to be seen often, but I suspect most people (including the Docker CI pipeline...) were not passing a storage-opt at all.
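For anyone who was relying on the old 20GB cap to bound disk consumption, the daemon-side override discussed here can pin it back down. A sketch of the relevant daemon.json fragment (the 50GB value is illustrative):

```json
{
  "storage-opts": [
    "size=50GB"
  ]
}
```

On Windows this applies to the windowsfilter graphdriver; after this PR it is honoured on all WCOW sandbox paths, not just `RUN` and `docker run`.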
Thank you, that makes sense, and sounds like it brings us closer to how Linux works (unless you're using devicemapper, which we try not to talk too loudly about 😅).
tianon left a comment:

Changes LGTM, but I'm not sure I'm the right person to review all these changes 😇
Who would be the right person to review these changes? We're already unable to build the latest Unreal Engine, as its output size is over 20GB on Windows, and each release seems larger than the last.
thaJeztah left a comment:

Oops, sorry, I thought I reviewed already; left some nits on the last commit (also feel free to open it separately if you want to unblock this PR), LGTM otherwise.
> // Convert this directory into a shared mount point so that we do
> // not rely on propagation properties of parent mount.

Looks like this comment was no longer matching what was actually done?

Suggested change:

- // Convert this directory into a shared mount point so that we do
+ // Convert this directory into a private mount point so that we do
  // not rely on propagation properties of parent mount.
Actually, wondering if this last commit should be in this PR, as it doesn't look related to the change itself 😅 (not a blocker though)
No, it's still shared. This change is just a refactoring:

mount.Mount(X, X, "none", "bind,private") => mount.MakePrivate(X)

and

mount.Mount("none", X, "none", "shared") => mount.MakeShared(X)

In both cases, it generates 3 mount syscalls: bind, private, shared, on that path. The final result is a 'shared' mount (so others mounting this will see mounts done here or there), but one which is not receiving any mount changes from its parent tree. (#41638 (comment) has more details and links, because I also went back and forth over that comment when I first saw it.)

You're right, it's not part of this change. It's a minor cleanup that was pointed out to me in #41638 (comment) (where I moved these tests from integration-cli to integration) after that was merged. Or rather, the comment was made before it was merged, but I didn't see it until after it was merged.

I can trivially pull this commit into a new PR if it's a blocker for the rest of this landing. But I believe the commit is correct now.
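For illustration, the three syscalls described above correspond roughly to this mount(8) sequence (the path is a hypothetical stand-in for X; this needs root, so it's best tried only inside a throwaway mount namespace):

```shell
d=/var/lib/docker/some-dir    # hypothetical stand-in for X above

mount --bind "$d" "$d"        # 1. bind: make the directory its own mount point
mount --make-private "$d"     # 2. private: stop receiving mounts from the parent tree
mount --make-shared "$d"      # 3. shared: propagate mounts made here to peers
```

The end state is a shared mount whose propagation from the parent has been severed, matching the description above.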
Ah, yes, that's definitely confusing. I think what makes it confusing is that the comment is associated with the first of those mounts (mount.MakePrivate), which is only the first step; perhaps we should rephrase it somewhat to outline the steps (bind -> private -> shared).

Anyway, it's indeed not something new, so let's keep that for a follow-up.
Awesome, thank you for landing this. Is there anything I can or should do to help this get into 20.10.3, or will that process proceed simply from the milestone tag, and I should just wait and see?
We'll go through PRs that were merged and look at what to include in upcoming patch releases 👍
@thaJeztah could you please comment on whether this PR got into the 20.10.x releases? I currently see that it has the 20.10.3 milestone label.
It shows the 21.xx milestone, updated on February 3rd.
Oh, I misread it somehow, sorry.
So this was merged almost 2 years ago... but never applied to any Docker release... why exactly? 🤦
Because there hasn't been a major feature release for Docker Engine since it was merged. It'll be part of the next release, which I see has recently been renamed from 22.06.0 (per the latest beta, which contained this fix) to 23.0.0 (per the milestone's current name).
@TBBle I see. Thanks for the heads up. That's a very patient release cycle. (And sorry 'bout the snarkiness of my prev comment. Getting bitten by this limit was a bit frustrating yesterday, but in general I do of course appreciate all the work you guys put into creating all the Docker tooling that (mostly) makes my life as a dev a lot less painful than I'm sure it otherwise would be!)
Yeah, I appreciate that the delay is frustrating. I think it's been surprising for everyone involved, given the milestone was previously known as 22.06 and even 21.x over its existence.

This change also landed very early in the release cycle; it only missed 20.10's initial branch-point by a month or two, AFAIR. So by unlucky timing, it's probably in the top few percent of PRs for "length of time waiting to ship in a stable Docker release since merging into trunk". In hindsight, perhaps this might have been accepted for backporting to the 20.10 series if we'd known how long it'd be until the next feature release. But I don't know for sure, since I am not currently aware of the team's backporting criteria, and this was somewhat impactful on the scale of bugfixes.

In the meantime, alternatives for Docker Engine are around, such as using the 22.06 beta release, or https://github.com/slonopotamus/stevedore (which is tracking 20.10 but includes a backport of this PR). If using Docker Desktop on Windows, I suspect enabling one of the tech previews might pull in a build of the engine with this fix, but I haven't actually checked this.

- What I did

  - Extends the 127GB default sandbox size for `RUN` (introduced in Windows: Bump RW layer size #35925) to all WCOW sandboxes, whether selected with `--platform windows` or by image-based or defaulted platform choice.
  - `storage-opts` overrides the 127GB default sandbox size when both are applied.
  - Applies the `storage-opts` setting for dockerd.exe to all WCOW and LCOW sandboxes, not just the ones underlying a container, i.e. not just `RUN` and `docker run`.
  - Validates `lcow.sandboxsize` (it cannot be less than 20GB) at startup rather than waiting until someone runs a container.

- How I did it

  - Applies `storage-opts` in the Windows Filter and LCOW GraphDrivers' `initfn` instead of merging them into the container-creation parameters.
  - TODO: Update the documentation for the new effective default value of `--storage-opts size`. Source.

- How to verify it
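The shape of the build being tested, as a hedged Dockerfile sketch (file names and the 21GB size are illustrative; the real check lives in `TestBuildWCOWSandboxSize`):

```dockerfile
# escape=`
FROM mcr.microsoft.com/windows/servercore:2004 AS bigrun
# 21GB file: succeeds only if the RUN sandbox exceeds the old 20GB default.
# Deleting it immediately keeps the resulting layer essentially empty.
RUN fsutil file createnew C:\big.bin 22548578304 && del C:\big.bin

FROM mcr.microsoft.com/windows/servercore:2004 AS bigcopy
RUN fsutil file createnew C:\big.bin 22548578304

FROM mcr.microsoft.com/windows/servercore:2004
# COPY needs its own >20GB sandbox for the destination layer (#37352).
COPY --from=bigcopy C:\big.bin C:\big.bin
```

`fsutil file createnew` allocates the file without writing data, so the `RUN` probe is fast even at this size.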
  - Added `TestBuildWCOWSandboxSize` to the integration test suite, checking that the `RUN` sandbox is > 20GB and the `COPY` sandbox is > 20GB.
  - Added a `DOCKER_STORAGE_OPTS` env-var for the integration test script. Setting `size=20GB` will cause `TestBuildWCOWSandboxSize` to fail.
  - If the large file (the `RUN fsutil && delete` call) is deleted, it should then also fail on the last step (the `COPY --from`).
  - The test could be split into two, one for the `RUN` sandbox and one for the `COPY` sandbox, if that's useful. The `RUN` test has almost-zero time or space overhead where it is, as fsutil creates an empty file, and we delete it before it's archived into a layer.
  - To exercise `DOCKER_STORAGE_OPTS` for CI, we'd need a separate pipeline stage. That probably needs separate discussion. And special tests that detect correctly running out of space in `docker build`.
  - Manual testing with `docker run --isolation=hyperv -it --rm mcr.microsoft.com/windows/servercore:2004` showed that storage-opts was taking effect, and if not supplied, a 127GB sandbox was used.

- Description for the changelog
  On Windows, all WCOW read-write layers default to 127GB of free space (up from 20GB in most cases), and all WCOW and LCOW layers will honour the `size` option and `lcow.sandboxsize` option (respectively) in the daemon's `storage-opts` configuration.

- A picture of a cute animal (not mandatory but encouraged)