-
Notifications
You must be signed in to change notification settings - Fork 18.9k
Mount /proc and /sys read-only, except in privileged containers. #5445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I'm getting a lot of failures and a panic in the integration tests. This isn't happening on master. |
|
Won't this disable a lot of very useful functionality? Like /proc/sys/net has good stuff in it and it's valid to change in a container because of the network namespace. |
|
@unclejack: those tests run fine here. The only tests failing are PingGoogle (because 8.8.8.8 is not reachable on my machine) and the test that checks sysfs access (obviously). How did you run the tests? (I did |
|
Tests updated. All pass here. |
|
I still have tons of failures |
|
@vieux i don't trust you |
|
I fixed the gofmt thing. (Also, the person responsible for messing up with my gofmt git hook has been terminated!) @vieux what kind of failure do you see? FWIW, I'm running |
|
ping @creack |
|
Same as @vieux: plenty of failure in test-integration. (too much output to paste) + endup in panic. (master works fine) |
|
@jpetazzo btrfs 3.13 |
|
@jpetazzo I was running the tests using |
|
Just reproduced the tests within a boot2docker VM (kernel 3.13, docker 0.10, native, aufs). First time, I got a completely bogus error in archive. I switched back to master, test went fine. Then I switched back to my branch, test went fine. I don't even. Meanwhile, I'm running the whole suite 5 times in a row here to see what happens. I'll also compare the fails reported by @creack and @unclejack. @vieux, can you add yours please? Thanks. |
|
@unclejack @creack @vieux Instead of running the tests can you try running a container to get a better error message about what is preventing it from running? |
|
@jpetazzo @crosbymichael Here's what I get when I run the tests: http://showterm.io/72a7e02e87a8ec374ffe6#fast @crosbymichael I'm building a binary to test now. |
|
|
|
Nothing in the daemon logs: |
|
I'm also getting the read-only file system error. |
|
Maybe something with apparmor on their systems? |
|
Ya, I think it maybe apparmor when we go to set the profile for the process |
|
Nothing regarding apparmor in |
|
Just spitballing, but couldn't we |
|
Yes, if it really is the setting the apparmor profile we will have to mount it as rw initially then after we have everything setup remount as ro |
|
Right. Eventually reproduced the issue with a Ubuntu-based VM. And confirming that it comes from the AppArmor thing, thanks a lot to those who helped to figure this out! I think I will just improve |
It has been pointed out that some files in /proc and /sys can be used to break out of containers. However, if those filesystems are mounted read-only, most of the known exploits are mitigated, since they rely on writing some file in those filesystems. This does not replace security modules (like SELinux or AppArmor), it is just another layer of security. Likewise, it doesn't mean that the other mitigations (shadowing parts of /proc or /sys with bind mounts) are useless. Those measures are still useful. As such, the shadowing of /proc/kcore is still enabled with both LXC and native drivers. Special care has to be taken with /proc/1/attr, which still needs to be mounted read-write in order to enable the AppArmor profile. It is bind-mounted from a private read-write mount of procfs. All that enforcement is done in dockerinit. The code doing the real work is in libcontainer. The init function for the LXC driver calls the function from libcontainer to avoid code duplication. Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni <[email protected]> (github: jpetazzo)
|
To summarize a quick exchange with Mike on #docker-dev: there is a catch 22 here, because to enable AppArmor, you need write access to /proc; but once you have done that, you don't have the necessary permissions to remount /proc read-only anymore. I found a workaround: do a temporary mount of /proc read-write, and bind-mount /proc/1/attr (the directory needed by AppArmor) over the "real" read-only /proc. It's not the cleanest thing, but it works pretty well, and I like the fact that instead of masking some things, we mask everything and explicitly re-enable access wherever needed. Regarding the overall structure, I ended up doing the following changes:
Unfortunately, I haven't been able to run all tests on this branch. I have some trouble with my test setup (all tests pass within my boot2docker VM; I have random failures in my local environment; and consistent failures in my Ubuntu 14.04 VM in graphdriver/devmapper unit tests, which doesn't make any sense; those failures also happen on master, which hints at some problem within my VM rather than the code itself). If people could give this a roll while I try to fix my test setup, I would be ever grateful. |
|
#assignee=crosbymichael |
|
@jpetazzo I'll take this PR from here. We are trying a new workflow so instead of asking you to make a lot of small changes i'll just do it and get it merged in unless there are some larger changes that you want to still make. |
It has been pointed out that some files in /proc and /sys can be used
to break out of containers. However, if those filesystems are mounted
read-only, most of the known exploits are mitigated, since they rely
on writing some file in those filesystems.
This does not replace security modules (like SELinux or AppArmor), it
is just another layer of security. Likewise, it doesn't mean that the
other mitigations (shadowing parts of /proc or /sys with bind mounts)
are useless. Those measures are still useful.
Fixes #5444
Docker-DCO-1.1-Signed-off-by: Jérôme Petazzoni [email protected] (github: jpetazzo)