
devicemapper: remove container rootfs mountPath after umount #34573

Merged
merged 2 commits into from
Nov 8, 2017

Conversation

cyphar
Contributor

@cyphar cyphar commented Aug 20, 2017

libdm currently has a fairly substantial DoS bug that makes certain
operations fail on a libdm device if the device has active references
through mountpoints. This is a significant problem with the advent of
mount namespaces and MS_PRIVATE, and can cause certain --volume mounts
to cause libdm to no longer be able to remove containers:

% docker run -d --name testA busybox top
% docker run -d --name testB -v /var/lib/docker:/docker busybox top
% docker rm -f testA
[fails on libdm with dm_task_run errors.]

This also solves the problem of unprivileged users being able to DoS
docker by using unprivileged mount namespaces to preserve mounts that
Docker has dropped.


Closes #34542
Signed-off-by: Aleksa Sarai [email protected]

/cc @rhvgoyal This is an alternative to #34542
/cc @vrothberg

@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch from 41b98d1 to a7f8155 Compare August 20, 2017 04:16
@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch 7 times, most recently from 9934083 to a875ebf Compare August 20, 2017 17:13
@cyphar
Contributor Author

cyphar commented Aug 20, 2017

It's a bit of a pain that Jenkins doesn't test any graphdrivers other than vfs for the integration tests. Is there anything we can do about that @dnephin? I've tested it locally (and I bet that @rhvgoyal will also want to).

@dnephin
Member

dnephin commented Aug 21, 2017

We should have separate suites for testing the graph drivers, so we can run the suite for any supported drivers.

// -1 if lval < rval
// 0 if lval = rval
// 1 if lval > rval
func MustKernelCmp(rval kernel.VersionInfo) int {
Member

Could this be inlined into run_test.go? Do you expect to use it outside of container tests?
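[For readers following along: a minimal sketch of the three-way comparison the quoted doc comment describes. `VersionInfo` and `compareKernel` here are illustrative stand-ins, not the actual moby helper.]

```go
package main

import "fmt"

// VersionInfo stands in for the (Kernel, Major, Minor) triple used by the
// quoted snippet; it is illustrative, not the real kernel.VersionInfo.
type VersionInfo struct {
	Kernel, Major, Minor int
}

// compareKernel returns -1 if lval < rval, 0 if they are equal, and 1 if
// lval > rval, comparing Kernel, then Major, then Minor.
func compareKernel(lval, rval VersionInfo) int {
	l := [3]int{lval.Kernel, lval.Major, lval.Minor}
	r := [3]int{rval.Kernel, rval.Major, rval.Minor}
	for i := range l {
		switch {
		case l[i] < r[i]:
			return -1
		case l[i] > r[i]:
			return 1
		}
	}
	return 0
}

func main() {
	// A 3.10 kernel predates 3.18, so the comparison yields -1.
	fmt.Println(compareKernel(VersionInfo{3, 10, 0}, VersionInfo{3, 18, 0}))
}
```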

// tested occurred). We run it anyway just to be safe.
driver, _ := daemonStorageDriver(context.Background(), client)
if driver != "devicemapper" {
t.Logf("Active driver (%s) is not devicemapper, continuing anyway.", driver)
Member

I think this should t.Skip() if it's not devicemapper.

Contributor Author

I think it would be a good measure to make sure the bug doesn't happen on other storage drivers too. I should probably just remove this check altogether.

Member

I don't think this is the right way to test storage drivers. These "integration" tests are very slow, so we need to be conservative about the tests we add here (see TESTING).

We should be able to test storage drivers without the daemon running, which makes it much easier and faster to test them. It looks like each graphdriver has a test suite which uses graphtest. We should test this case in that suite.

Why is it not possible to just check that the mount point doesn't exist anymore from that test suite?

Contributor Author

@cyphar cyphar Aug 21, 2017

I would need to check again, but it's not that it's impossible, it's that I'd have to effectively start re-implementing parts of runc inside the test framework to make sure we're testing the right interaction between mount namespaces and libdm.

Just checking if the mountpoint has gone isn't good enough, you need to check that the mount has disappeared in other namespaces (that's where the bug comes from).
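[A sketch of what such a check might look like against a single namespace: parse a process's mountinfo and look for the target as a mount point (the fifth field of each line). This is illustrative; the interaction cyphar describes also requires inspecting the mountinfo of processes in *other* mount namespaces.]

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountedIn reports whether target appears as a mount point in the given
// mountinfo file (e.g. "/proc/<pid>/mountinfo" for another process).
// The mount point is the fifth whitespace-separated field of each line.
func mountedIn(mountinfoPath, target string) (bool, error) {
	f, err := os.Open(mountinfoPath)
	if err != nil {
		return false, err
	}
	defer f.Close()

	s := bufio.NewScanner(f)
	for s.Scan() {
		fields := strings.Fields(s.Text())
		if len(fields) > 4 && fields[4] == target {
			return true, nil
		}
	}
	return false, s.Err()
}

func main() {
	// "/" is always a mount point in our own namespace.
	ok, err := mountedIn("/proc/self/mountinfo", "/")
	fmt.Println(ok, err)
}
```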

// torvalds/linux@8ed936b5671bfb33d89bc60bdcc7cf0470ba52fe was merged).
oldKernel := util.MustKernelCmp(kernel.VersionInfo{Kernel: 3, Major: 18, Minor: 0}) < 0
if oldKernel {
t.Logf("NOTE: This test may fail on kernels older than 3.18.")
Member

Again, probably t.Skip()

Contributor Author

Some distributions may have backported the fix, so the kernel version alone isn't enough to know whether the test will work (but we don't hard fail on older kernels).

Contributor

You can use kernel.CheckKernelVersion from github.com/docker/docker/pkg/parsers/kernel here. For example:

import "github.com/docker/docker/pkg/parsers/kernel"
...
oldKernel := !kernel.CheckKernelVersion(3, 18, 0)
...

@cyphar
Contributor Author

cyphar commented Aug 21, 2017

@dnephin So where does that leave this PR's tests? Should I not include tests at all (these tests actually currently fail silently because the error handling in the new integration scheme doesn't exist yet -- and we'll eventually need to recreate the old integration-cli testing setup), or should I start working on making it possible to have devicemapper integration tests?

[As an aside, I'm not sure it was a good idea to deprecate integration-cli while integration doesn't have any significant infrastructure for testing, not to mention that integration-cli has three orders of magnitude more tests that we can't touch anymore because of the deprecation warnings.]

@dnephin
Member

dnephin commented Aug 21, 2017

these tests actually currently fail silently because the error handling in the new integration scheme doesn't exist yet

I don't understand this. What do you mean by error handling doesn't exist?

while integration doesn't have any significant infrastructure for testing

It's a work in progress. If you see something significant that is missing, please let me know.

not to mention that integration-cli has three orders of magnitude more tests that we can't touch anymore because of the deprecation warnings

You can change tests, you just can't add new ones

So where does that leave this PR's tests?

The ideal place for this test would be in daemon/graphdriver/devmapper/devmapper_test.go. I don't understand yet why it's not possible to exercise the bug there. If it's not possible, I think what you have now is the right idea, but we should skip it when the graphdriver is not devicemapper.

if err := system.EnsureRemoveAll(mountPath); err != nil {
logrus.Debugf("devmapper: error doing a removeall on unmounted device %s: %v", mountPath, err)
}

Contributor

Why use EnsureRemoveAll? This function is trying to do so much that I don't even understand what it is doing. After the lazy unmount, why don't we simply do an os.Remove()?

So the idea is to use directory removal to yank the mount points which might have leaked into other mount namespaces, and this will only work on newer kernels. I think this should work reasonably well. I'm a little concerned about some use case being broken somewhere; there have been quite a few changes w.r.t. reference counting. Especially make sure that the case of docker daemon being shutdown/start works while containers were running the whole time.

Contributor Author

Yeah, os.Remove is probably a better choice.

Especially make sure that the case of docker daemon being shutdown/start works while containers were running the whole time.

Can you explain what you mean here? Are you referring to the "live restore" stuff?

Contributor

Yes, I was referring to live restore functionality.

@@ -2416,6 +2417,15 @@ func (devices *DeviceSet) UnmountDevice(hash, mountPath string) error {
}
logrus.Debug("devmapper: Unmount done")

// Remove the mountpoint here. This is necessary to avoid cases where a
Contributor

It might be good to mention that removing the directory will remove the mount point on this directory from all the mount namespaces.

@rhvgoyal
Contributor

@cyphar Will we require a similar change in the Remove() path as well? That is, make sure the device is unmounted, then remove the mnt directory, and then try device deactivation and deletion?

@rhvgoyal
Contributor

cc @rhatdan

@cyphar
Contributor Author

cyphar commented Aug 25, 2017

@rhvgoyal Yes, probably. I'll need to figure out the best place to put this change (I'd prefer not to have to duplicate it in both paths).

@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch from a875ebf to ba9ee05 Compare August 31, 2017 16:01
@cyphar
Contributor Author

cyphar commented Aug 31, 2017

Actually @rhvgoyal I don't think we need this in the Remove path. The Remove path is only called from the driver after it's been Put and removed (and Put calls UnmountDevice which removes the mountpoint). I'm testing the live restore stuff now.

@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch 4 times, most recently from ef75b39 to 330461c Compare September 1, 2017 15:13
cyphar added a commit to SUSE/docker that referenced this pull request Oct 21, 2017
libdm currently has a fairly substantial DoS bug that makes certain
operations fail on a libdm device if the device has active references
through mountpoints. This is a significant problem with the advent of
mount namespaces and MS_PRIVATE, and can cause certain --volume mounts
to cause libdm to no longer be able to remove containers:

  % docker run -d --name testA busybox top
  % docker run -d --name testB -v /var/lib/docker:/docker busybox top
  % docker rm -f testA
  [fails on libdm with dm_task_run errors.]

This also solves the problem of unprivileged users being able to DoS
docker by using unprivileged mount namespaces to preserve mounts that
Docker has dropped.

SUSE-Bug: https://bugzilla.suse.com/show_bug.cgi?id=1045628
SUSE-Backport: moby#34573
Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar
Contributor Author

cyphar commented Oct 25, 2017

nudge

@cpuguy83
Member

Maybe a unit test that uses unshare(1), and can skip if it's not available. Not exactly ideal, but ...

@cyphar
Contributor Author

cyphar commented Oct 25, 2017

The problem with such a test is that unshare would possibly break other unit tests. I'll just write a test to make sure that the mountpoint was deleted, I guess...

[Again, this would not be a problem if I could just write a normal integration test... 😞]
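[For reference, a sketch of how a subprocess can be given its own mount namespace from Go via os/exec, skipping when privileges are missing — roughly the shape the final test takes. The helper name is illustrative, not the actual test code.]

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// runInNewMountNS runs a command with its own mount namespace via
// Unshareflags, so any mounts the child pins stay private to it and do
// not disturb the parent process (or other unit tests).
func runInNewMountNS(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Unshareflags: syscall.CLONE_NEWNS}
	return cmd.Run()
}

func main() {
	// CLONE_NEWNS needs privileges, so a test would t.Skip() on failure
	// rather than hard-fail.
	if err := runInNewMountNS("true"); err != nil {
		fmt.Println("skipping: cannot unshare mount namespace:", err)
		return
	}
	fmt.Println("subprocess ran in its own mount namespace")
}
```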

@dnephin
Member

dnephin commented Oct 25, 2017

Why would it break other unit tests?

@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch 4 times, most recently from 262017f to 2ee2f84 Compare November 7, 2017 23:40
@cyphar
Contributor Author

cyphar commented Nov 7, 2017

@dnephin I've added a unit test in graphdriver/devmapper/....

/cc @vieux @cpuguy83

In order to avoid reverting our fix for mount leakage in devicemapper,
add a test which checks that devicemapper's Get() and Put() cycle can
survive having a command running in an rprivate mount propagation setup
in-between. While this is quite rudimentary, it should be sufficient.

We have to skip this test for pre-3.18 kernels.

Signed-off-by: Aleksa Sarai <[email protected]>
@cyphar cyphar force-pushed the dm-dos-prevention-remove-mountpoint branch from 2ee2f84 to 1af8ea6 Compare November 8, 2017 00:02
@cyphar
Contributor Author

cyphar commented Nov 8, 2017

ppc failure is not related to my change, it looks like some test wrapper broke in the middle of running the suite.

@thaJeztah
Member

Restarted PowerPC; I was told there were some connection issues between Jenkins and the PowerPC nodes.

Member

@cpuguy83 cpuguy83 left a comment

LGTM

Thanks @cyphar!

@cyphar
Contributor Author

cyphar commented Nov 8, 2017

/cc @vdemeester

Member

@vdemeester vdemeester left a comment

LGTM 🐮

@vdemeester vdemeester merged commit bbc4f78 into moby:master Nov 8, 2017
Member

@dnephin dnephin left a comment

Thanks for writing the unit test

@cyphar cyphar deleted the dm-dos-prevention-remove-mountpoint branch November 8, 2017 16:25
@cyphar
Contributor Author

cyphar commented Nov 8, 2017

@dnephin Sorry, I never answered your question. I was worried about using Unshare in the context of a Go program (historically that's caused nothing but trouble, and I'd argue that it's not solvable in Go because it gives you so little information about the threads that your program runs on). But in the final patch state, I just ended up using os/exec, which has support for specifying Unshare and Clone flags that get applied correctly. It's a bit of a shame that it depends on running subcommands, but there's no other nice way of testing it.

cyphar added a commit to SUSE/docker that referenced this pull request Nov 23, 2017