Set 127GB default sandbox size for WCOW, and ensure storage-opts is honoured on all paths under WCOW and LCOW #41636

Merged
thaJeztah merged 5 commits into moby:master from TBBle:37352-test-and-fix on Jan 25, 2021

Conversation

@TBBle
Contributor

TBBle commented Nov 5, 2020

- What I did

- How I did it

  • Moved the 127GB WCOW default from the Builder internals to the Windows Filter GraphDriver.
  • Processed the daemon's storage-opts in the init functions of the Windows Filter and LCOW GraphDrivers, instead of merging them into the container-creation parameters (see the sketch below).
  • Worked around the limited storage on the Windows RS5 Jenkins build node by detecting when the test passed but the container build failed, see 695b151.
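
A minimal sketch of the second point, assuming a hypothetical parseStorageOptions helper and the go-units size parser; the actual graphdriver code is structured differently and also handles lcow.sandboxsize on the LCOW side:

```go
// Hedged sketch: resolve the sandbox size once, at graphdriver init, from
// the daemon's storage-opts, so the same size applies to every sandbox the
// driver creates (RUN, COPY, and docker run alike). Names are illustrative.
package windows

import (
	"strings"

	units "github.com/docker/go-units"
)

// defaultSandboxSize is the new WCOW default: 127GB.
const defaultSandboxSize = 127 * 1024 * 1024 * 1024

// parseStorageOptions is a hypothetical helper that folds options such as
// ["size=50GB"] into a single sandbox size, defaulting to 127GB.
func parseStorageOptions(options []string) (uint64, error) {
	size := uint64(defaultSandboxSize)
	for _, opt := range options {
		key, val, ok := strings.Cut(opt, "=")
		if !ok {
			continue
		}
		if strings.EqualFold(key, "size") {
			parsed, err := units.RAMInBytes(val)
			if err != nil {
				return 0, err
			}
			size = uint64(parsed)
		}
	}
	return size, nil
}
```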

TODO: Update the documentation for the new effective default value of --storage-opts size. Source.

- How to verify it

  • Added TestBuildWCOWSandboxSize to the integration test suite (a sketch of its Dockerfile strategy follows this list)
    • Creates a 21GB file (and immediately deletes it) to confirm the RUN sandbox is > 20GB.
    • Attempts to copy three 7GB files from a previous layer into a new layer, to confirm the COPY sandbox is > 20GB.
  • Manual testing of storage-opts can be done either by hand, or using the DOCKER_STORAGE_OPTS env-var for the integration test script. Setting size=20GB will cause TestBuildWCOWSandboxSize to fail.
    • If the first failing step (the RUN fsutil && delete call) is removed, the build should then fail on the last step (the COPY --from) instead.
    • This test could be split into two, one for the RUN sandbox and one for the COPY sandbox, if that's useful. The RUN test has almost-zero time or space overhead where it is, as fsutil creates an empty file, and we delete it before it's archived into a layer.
    • TODO: Think of some way to automate this? To adjust DOCKER_STORAGE_OPTS for CI, we'd need a separate pipeline stage, which probably needs separate discussion, plus special tests that correctly detect running out of space in docker build.
  • Manual testing with docker run --isolation=hyperv -it --rm mcr.microsoft.com/windows/servercore:2004 showed that storage-opts was taking effect, and if not supplied, a 127GB sandbox was used.
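
For illustration, a hedged sketch of the test's Dockerfile strategy, driven here through the Docker client SDK rather than the integration-test harness; the embedded Dockerfile mirrors the description above but is not the verbatim test fixture, and the base-image tag is an assumption:

```go
// Hedged sketch of the strategy behind TestBuildWCOWSandboxSize: RUN creates
// (then deletes) a 21GB file, and COPY pulls three 7GB files from an earlier
// stage, so the build fails if either sandbox is still capped at 20GB.
package main

import (
	"archive/tar"
	"bytes"
	"context"
	"io"
	"os"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

// Illustrative Dockerfile; the real test's fixture differs in detail.
const dockerfile = `FROM mcr.microsoft.com/windows/servercore:ltsc2019 AS source
RUN fsutil file createnew C:\bigfile.txt 22548578304 && del C:\bigfile.txt
RUN mkdir C:\stuff
RUN fsutil file createnew C:\stuff\bigfile_1.txt 7516192768
RUN fsutil file createnew C:\stuff\bigfile_2.txt 7516192768
RUN fsutil file createnew C:\stuff\bigfile_3.txt 7516192768

FROM mcr.microsoft.com/windows/servercore:ltsc2019
COPY --from=source C:/stuff C:/stuff
`

func main() {
	// Pack the Dockerfile into an in-memory tar stream: the build context.
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	if err := tw.WriteHeader(&tar.Header{Name: "Dockerfile", Mode: 0o644, Size: int64(len(dockerfile))}); err != nil {
		panic(err)
	}
	if _, err := io.WriteString(tw, dockerfile); err != nil {
		panic(err)
	}
	tw.Close()

	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	resp, err := cli.ImageBuild(context.Background(), &buf, types.ImageBuildOptions{Remove: true})
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	// Stream the build output; with a 20GB sandbox the big steps fail with
	// "There is not enough space on the disk."
	io.Copy(os.Stdout, resp.Body)
}
```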

- Description for the changelog

On Windows, all WCOW read-write layers now default to 127GB of free space (up from 20GB in most cases), and all WCOW and LCOW layers honour the size and lcow.sandboxsize options (respectively) in the daemon's storage-opts configuration.

- A picture of a cute animal (not mandatory but encouraged)

@TBBle
Contributor Author

TBBle commented Nov 5, 2020

In case we want it later, Jenkins build #8 (b266896) replicates #37352 in an integration test. So now I just need to fix it. ^_^

@TBBle TBBle force-pushed the 37352-test-and-fix branch 9 times, most recently from d02394a to fdea164 on November 6, 2020 09:58
@TBBle TBBle changed the title Ensure that the storage-opts setting for _dockerd.exe_ applies to COPY the same way it applies to RUN Set 127GB default sandbox size for WCOW, and ensure storage-opts is honoured on all paths under WCOW and LCOW Nov 6, 2020
@TBBle TBBle force-pushed the 37352-test-and-fix branch from fdea164 to 178b073 on November 6, 2020 11:28
@TBBle
Contributor Author

TBBle commented Nov 6, 2020

Annoyingly, Build #17 shows the test passing the check that COPY gets a sandbox honouring the 127GB allowance, but then failing while archiving the resulting 21GB layer:

re-exec error: exit status 1: output: write \\\\?\\D:\\CI\\PR-41636\\17\\daemon\\tmp\\hcs198867335\\Files\\stuff\\bigfile_1.txt: There is not enough space on the disk.

re-exec is used only for writeLayer calling hcsshim.NewLayerWriter (sketched below), and the directory mentioned here is the temporary directory used in that process. So when this error occurs, we've already passed the problem being tested, and are restreaming the data from the sandbox VHDX (read-write) format into a temporary folder, in a format suitable for importing as a WCOW layer with hcsshim/internal/wclayer.ImportLayer.
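
For context, the re-exec wiring follows moby's pkg/reexec pattern, roughly as in this condensed sketch (function bodies and stdio plumbing elided; the registration name is my recollection of the Windows graphdriver's, so verify it against the tree):

```go
// Condensed sketch of the re-exec pattern referenced above. The Windows
// graphdriver registers a named child entry point; writeLayer then re-invokes
// the dockerd binary under that name so layer import runs in a fresh process.
package windows

import "github.com/docker/docker/pkg/reexec"

func init() {
	// Registration happens once per process, before reexec.Init() dispatch.
	reexec.Register("docker-windows-write-layer", writeLayerReexec)
}

// writeLayerReexec is the child-process side: this is where
// hcsshim.NewLayerWriter streams the layer tar into the temporary directory
// seen in the error above. (Details omitted in this sketch.)
func writeLayerReexec() {
	// ...
}

// writeLayer is the parent side: it spawns the registered child.
func writeLayer() error {
	cmd := reexec.Command("docker-windows-write-layer")
	// Wiring of the layer tar stream over stdio is omitted here.
	return cmd.Run()
}
```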

A failure for the "COPY got a 20GB sandbox" issue (#37352) is visible in build 8, and looks like this:

failed to copy files: failed to copy directory: write \\\\?\\Volume{7981a97f-1f77-11eb-b13e-e3f2701e02c7}\\stuff\\bigfile_3.txt: There is not enough space on the disk.

The directory mentioned here is the mounted volume for the sandbox VHDX of the layer being created.

I suspect I actually ran out of disk space on the test node there. I'm not sure what to do about that, if it's even something we have control over. Given that the files being created are all zeroes, we should only have needed about 63GB of disk space: 21GB for the WCOW layer, another 21GB for the temp directory, then 21GB for the on-disk format, and we ran out of space in the first third of that last 21GB. That suggests that either we have consumed a lot of space in earlier tests (and docker system prune -f might help) or the disk just isn't big enough. Perhaps it's a 50GB drive? 5GB for the Windows Server Core image leaves 45GB free, which would be about right for failing after consuming 42GB and then copying a 7GB file.

That suggests the build node is an Azure D2 v3 with a 50GB SSD as D:\, based on a note about windows.ps1's TESTRUN_DRIVE env-var.

Anyway, that demonstrates "Moved the 127GB WCOW default from the Builder internals to the Windows Filter GraphDriver." worked, so I'll consider this part "tested", and completion of this fix only requires working out what to do about the test itself.

@TBBle TBBle force-pushed the 37352-test-and-fix branch from 178b073 to b6a2242 on November 6, 2020 11:50
@TBBle
Contributor Author

TBBle commented Nov 6, 2020

Validation failures in CI are inherited from #41638 (see #41638 (comment)). Edit: now resolved in the latest version of #41638, and I've rebased on it.

@TBBle TBBle marked this pull request as ready for review November 6, 2020 13:07
@TBBle TBBle force-pushed the 37352-test-and-fix branch 2 times, most recently from 81fdd13 to f059b68 on November 8, 2020 17:15
@tianon
Member

tianon commented Dec 16, 2020

Conceptually this seems sane; what's the downside?

(In other words, why was this defaulted so much lower previously?)

@TBBle
Contributor Author

TBBle commented Dec 16, 2020

The 20GB default is what happens if you ask HCS to create you a sandbox and don't then resize it. Someone at MS picked a random number sometime before 2016, perhaps. It is a nice, round number, and doesn't seem too terrifyingly large, or painfully small.

The documented rationale leaves a lot to be desired:

A common pattern for Windows applications is to query the amount of free disk space before installing or creating new files or as a trigger for cleaning up temporary files. With the goal of maximizing application compatibility, the C: drive in a Windows container represents a virtual free size of 20GB.

Edit: The above was added in late 2017 in MicrosoftDocs/Virtualization-Documentation#691, so years after the 20GB limit was actualised, as part of documenting --storage-opt flag and storage-opts config.

ECR caps image layers at 10GB (which is super-annoying...), which matches the devicemapper graphdriver base image size, so 20GB looks positively generous in the right contexts.

I'm not aware of any downside to this change, unless perhaps people were relying on the 20GB default size to prevent runaway disk consumption on the backing drive.

I checked, and the sandbox.vhdx file created doesn't consume the full 20GB: the vhdx is about 4MB when created, temporarily larger (37MB) while mounted, and shrinks back down to about 9MB when unmounted without doing anything else. So the "differencing VHDX" file format doesn't appear likely to blow out with this change. Using docker run --storage-opt size=127G showed the mounted sandbox.vhdx file was 47MB, so it's definitely not scaling linearly with free space.

Perhaps those who had tried to change this default in the past were bitten by microsoft/hcsshim#708 (fix shipped in Docker Engine 19.03.6 in February 2020) and gave up...

#35925 should have caused that bug to be seen often, but I suspect most people (including the Docker CI pipeline...) were not passing --platform windows, and so #35925's change was not effective for them. For those who were passing --platform windows, perhaps they saw microsoft/hcsshim#708 and just stopped doing so.

@tianon
Member

tianon commented Dec 16, 2020

Thank you, that makes sense, and sounds like it brings us closer to how Linux works (unless you're using devicemapper, which we try not to talk too loudly about 😅).

Member

@tianon tianon left a comment

Changes LGTM, but I'm not sure I'm the right person to review all these changes 😇

@TBBle
Contributor Author

TBBle commented Jan 22, 2021

Who would be the right person to review these changes? We're already unable to build the latest Unreal Engine, as its output size is over 20GB on Windows, and each release seems larger than the last.

Member

@thaJeztah thaJeztah left a comment

oops, sorry, thought I reviewed already; left some nits on the last commit (also feel free to open it separately if you want to unblock this PR), LGTM otherwise

Comment on lines 286 to 287
// Convert this directory into a shared mount point so that we do
// not rely on propagation properties of parent mount.
Member

Looks like this comment was no longer matching what was actually done?

Suggested change
// Convert this directory into a shared mount point so that we do
// not rely on propagation properties of parent mount.
// Convert this directory into a private mount point so that we do
// not rely on propagation properties of parent mount.

Member

Actually; wondering if this last commit should be in this PR, as it doesn't look related to the change itself 😅 (not a blocker though)

Contributor Author

@TBBle TBBle Jan 22, 2021

No, it's still shared. This change is just a refactoring:

mount.Mount(X, X, "none", "bind,private") => mount.MakePrivate(X)

and

mount.Mount("none", X, "none", "shared") => mount.MakeShared(X)

In both cases, the sequence generates three mount syscalls on that path: bind, private, shared (sketched below).

The final result is a 'shared' mount (so others mounting this will see mounts done here or there), but one which does not receive any mount changes from its parent tree. (#41638 (comment) has more details and links, because I also went back and forth over that comment when I first saw it.)
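
As a sketch, those three syscalls correspond to the following raw golang.org/x/sys/unix calls, which the moby mount helpers wrap:

```go
// Sketch of what MakePrivate(x) followed by MakeShared(x) boils down to:
// bind, then private, then shared, all on the same path.
package mounts

import "golang.org/x/sys/unix"

func makeSharedIsolated(path string) error {
	// 1. Bind the path onto itself so it becomes its own mount point.
	if err := unix.Mount(path, path, "", unix.MS_BIND, ""); err != nil {
		return err
	}
	// 2. Mark it private: stop receiving propagation from the parent tree.
	if err := unix.Mount("", path, "", unix.MS_PRIVATE, ""); err != nil {
		return err
	}
	// 3. Mark it shared: mounts made here propagate to this mount's peers.
	return unix.Mount("", path, "", unix.MS_SHARED, "")
}
```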

You're right, it's not part of this change. It's a minor cleanup that was pointed out to me in #41638 (comment) (where I moved these tests from integration-cli to integration) after that was merged. Or rather, the comment was made before it was merged, but I didn't see the comment until after it was merged.

I can trivially pull this commit into a new PR if it's a blocker for the rest of this landing. But I believe the commit is correct now.

Member

Ah, yes, that's definitely confusing. I think what makes it confusing is that the comment is associated with the first of those mounts (mount.MakePrivate), which is only the first step; perhaps we should rephrase it somewhat to outline the steps (bind -> private -> shared).

Anyway; it's indeed not something new, so let's keep that for a follow-up

Comment thread: integration/container/mounts_linux_test.go
Member

@thaJeztah thaJeztah left a comment

LGTM


@thaJeztah thaJeztah merged commit f266f13 into moby:master Jan 25, 2021
@thaJeztah thaJeztah added this to the 20.10.3 milestone Jan 25, 2021
@TBBle TBBle deleted the 37352-test-and-fix branch January 25, 2021 15:05
@TBBle
Contributor Author

TBBle commented Jan 25, 2021

Awesome, thank you for landing this.

Is there anything I can or should do to help this get into 20.10.3, or will that process proceed simply from the milestone tag, and I should just wait and see?

@thaJeztah
Member

We'll go through PR's that were merged and look what to include in upcoming patch releases 👍

@slonopotamus
Contributor

@thaJeztah could you please comment on whether this PR got into the 20.10.x releases? I currently see that it has the 20.10.3 milestone and is labelled with process/cherry-pick, but 20.10.3 was already released and, according to its changelog, it doesn't contain the changes from this PR.

@TBBle
Contributor Author

TBBle commented Mar 31, 2021

It shows the 21.xx milestone, updated on February 3rd.

@slonopotamus
Contributor

Oh, I misread it somehow, sorry.

@TBBle
Contributor Author

TBBle commented Mar 31, 2021

Yeah, the GitHub change log can be confusing. Every time I see
[image: two milestone-change events in the PR timeline]
I have to check whether the change was 'added' or 'removed', because GitHub doesn't distinguish them. >_< In this case, the first was removed, the second was added.

@wbudd

wbudd commented Dec 5, 2022

So this was merged almost 2 years ago...but never applied to any Docker release...why exactly? 🤦

@TBBle
Contributor Author

TBBle commented Dec 6, 2022

Because there hasn't been a major feature release of Docker Engine since it was merged. It'll be part of the next release, which I see has recently been renamed from 22.06.0 (per the latest beta, which contained this fix) to 23.0.0 (per the milestone's current name).

@wbudd

wbudd commented Dec 6, 2022

@TBBle I see. Thanks for the heads up. That's a very patient release cycle.

(And sorry 'bout the snarkiness of my prev comment. Getting bitten by this limit was a bit frustrating yesterday, but in general I do of course appreciate all the work you guys put in creating all the Docker tooling that (mostly) makes my life as a dev a lot less painful than I'm sure it otherwise would be!)

@TBBle
Contributor Author

TBBle commented Dec 6, 2022

Yeah, I appreciate that the delay is frustrating. I think it's been surprising for everyone involved, given the milestone was previously known as 22.06 and even 21.x over its existence. This change also landed very early in the release cycle, it only missed 20.10's initial branch-point by a month or two, AFAIR. So by unlucky timing, it's probably in the top few percent of PRs for "length of time waiting to ship in a stable Docker release since merging into trunk".

In hindsight, perhaps this might have been accepted for backporting to the 20.10 series if we'd known how long it'd be until the next feature release. But I don't know for sure, since I'm not currently aware of the team's backporting criteria, and this was somewhat impactful on the scale of bugfixes.

In the meantime, alternatives for Docker Engine are around, such as using the 22.06 beta release, or https://github.com/slonopotamus/stevedore (which is tracking 20.10 but includes a backport of this PR).

If using Docker Desktop on Windows, I suspect enabling one of the tech previews might pull in a build of the engine with this fix but haven't actually checked this. I just happened to notice an example docker info output in a tech preview's documentation that showed a 22.06-branch build was being used.


Development

Successfully merging this pull request may close these issues.

COPY step fails on Windows when the size is greater than 20GB.

5 participants