Skip to content

daemon: use 'private' ipc mode by default#35621

Merged
thaJeztah merged 2 commits intomoby:masterfrom
kolyshkin:ipc-private
Mar 14, 2019
Merged

daemon: use 'private' ipc mode by default#35621
thaJeztah merged 2 commits intomoby:masterfrom
kolyshkin:ipc-private

Conversation

@kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Nov 28, 2017

(this is a follow-up to #34087)

This changes the default ipc mode of daemon/engine to be private, meaning the containers will not have their /dev/shm filesystem mounted on the host and then bind-mounted from host to a container by default.

The benefits of doing this are:

  1. No leaked mounts. This eliminates a possibility to leak mounts into other namespaces (and therefore unfortunate errors like "Unable to remove filesystem for $ID: remove /var/lib/docker/containers/$ID/shm: device or resource busy").

  2. Working checkpoint/restore. Make docker checkpoint not lose the contents of /dev/shm, but save it to the dump, and be restored back upon docker start --checkpoint.

  3. Better security. Currently any container is opened to share its /dev/shm with any other container.

The downside is, obviously, this change will break the following usage scenario:

 $ docker run -d --name donor busybox top
 $ docker run --rm -it --ipc container:donor busybox sh
 Error response from daemon: linux spec namespaces: can't join IPC
 of  container <ID>: non-shareable IPC (hint: use --ipc shareable
 for the donor container)

The solution, as hinted by the (amended) error message, is to explicitly enable donor sharing by using --ipc shareable option to docker run or docker create:

 $ docker run -d --name donor --ipc shareable busybox top

Compatibility notes

  1. This only applies to containers created after this change. Existing containers are not affected and will work fine as their ipc mode is stored in HostConfig.

  2. Old (bad, but backward-compatible) behavior (i.e. "shareable" containers by default) can be enabled by either using --default-ipc-mode shareable daemon command line option, or by adding a "default-ipc-mode": shareable" line in docker.json configuration file.

  3. If an older client (API < 1.40) is used, a "shareable" container is created. A test to check that is added.

  4. An item in API version history explaining the new default, and a way to change it back, is added.

  5. As a side effect, this PR fixes bug originally addressed in containers/create: use shareable ipc by default for older clients #38700 (that is obsoleted by this one).

- A picture of a cute animal (not mandatory but encouraged)

See also

screenshot from 2017-11-27 17-05-55
A tiger cub from the 1984 USSR cartoon https://www.youtube.com/watch?v=0DnjD7w-2FI

@kolyshkin kolyshkin changed the title daemon: use 'private' ipc mode by default [WIP] daemon: use 'private' ipc mode by default Nov 30, 2017
@kolyshkin
Copy link
Contributor Author

Need to figure out why TestCleanupMountsAfterDaemonAndContainerKill and TestCleanupMountsAfterDaemonCrash fail

@kolyshkin kolyshkin force-pushed the ipc-private branch 4 times, most recently from cbe64bf to 7870484 Compare December 1, 2017 22:41
@kolyshkin
Copy link
Contributor Author

OK, I got it now why these two tests are failing!

The reason is, all the mounts are now in the private namespace, except for one used for container's /dev/shm. With this patch, this last mount is gone so test is not finding anything.

The solution is to check whether there are any container mounts visible from the host system after container start -- if not, skip the test.

@kolyshkin kolyshkin changed the title [WIP] daemon: use 'private' ipc mode by default daemon: use 'private' ipc mode by default Dec 4, 2017
@kolyshkin kolyshkin force-pushed the ipc-private branch 2 times, most recently from cd3f5d2 to 737176d Compare December 4, 2017 19:56
@kolyshkin
Copy link
Contributor Author

kolyshkin commented Dec 4, 2017

This one is now ready for the prime time

@kolyshkin kolyshkin force-pushed the ipc-private branch 3 times, most recently from 07cc25c to 09e427e Compare December 5, 2017 05:39
@kolyshkin
Copy link
Contributor Author

Powerpc test case failure is unrelated, nevertheless I figured it out and fixed the test case.

@kolyshkin
Copy link
Contributor Author

The following failure is most probably unrelated as I am seeing it in other PRs. I'll take a look nevertheless.

12:29:56 ----------------------------------------------------------------------
12:29:56 FAIL: check_test.go:366: DockerSwarmSuite.TearDownTest
12:29:56 
12:29:56 check_test.go:371:
12:29:56     d.Stop(c)
12:29:56 daemon/daemon.go:395:
12:29:56     t.Fatalf("Error while stopping the daemon %s : %v", d.id, err)
12:29:56 ... Error: Error while stopping the daemon d704ce7bc07d4 : exit status 130
12:29:56 
12:29:56 
12:29:56 ----------------------------------------------------------------------
12:29:56 PANIC: docker_api_swarm_service_test.go:201: DockerSwarmSuite.TestAPISwarmServicesUpdateStartFirst
12:29:56 
12:29:56 [d704ce7bc07d4] waiting for daemon to start
12:29:56 [d704ce7bc07d4] daemon started
12:29:56 
12:29:56 [d704ce7bc07d4] daemon started
12:29:56 Attempt #2: daemon is still running with pid 19775
12:29:56 Attempt #3: daemon is still running with pid 19775
12:29:56 Attempt #4: daemon is still running with pid 19775
12:29:56 [d704ce7bc07d4] exiting daemon
12:29:56 ... Panic: Fixture has panicked (see related PANIC)
12:29:56 
12:29:56 ----------------------------------------------------------------------

@kolyshkin kolyshkin force-pushed the ipc-private branch 2 times, most recently from 91a7089 to 0a1c4c0 Compare December 6, 2017 07:00
@kolyshkin kolyshkin requested a review from tianon as a code owner December 6, 2017 07:13
@cpuguy83
Copy link
Member

cpuguy83 commented Dec 6, 2017

I worry about changing the default.
If the goal is to eliminate issues with EBUSY, shouldn't we dig deeper into the cause of these leaks? This isn't really fixing mount leaks, just making it less likely in the default configuration.

@cpuguy83
Copy link
Member

cpuguy83 commented Dec 7, 2017

We discussed this in the maintainers meeting and it came up "well is Kubernetes depending on shared IPC" ... I believe it is and this will break them... and even if not it's a good example of how this could break people.

note: I mentioned on the call maybe they are using exec but that just doesn't make any sense because they couldn't start containers with different images in the pod, which is required for the pod scenario.

@cpuguy83
Copy link
Member

cpuguy83 commented Feb 9, 2019

@kolyshkin This wasn't confusing at all. Just that it's wrong to break existing users, which is what this does.

@kolyshkin
Copy link
Contributor Author

We disscused this with @thaJeztah and @cpuguy83 on maintainers meeting today. What needs to be added is:

  • a changelog entry explaining the new default;
  • a fix for clients API < 1.41: if new container ipc mode is empty AND compiled-in daemon default is not changed, use "shareable".

@thaJeztah
Copy link
Member

For the API version; upcoming release will be API v1.40, so if it gets merged before that, it's gonna be API < 1.40

@kolyshkin
Copy link
Contributor Author

I'll revisit this one once #38700 is merged.

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Mar 9, 2019

OK I just came to realization that #38700 needs to be merged into this one, for the reasons that I am too lazy to describe right now. Also that @cpuguy83 and @thaJeztah was right from the start asking to forcefully set IpcMode: shareable for older clients.

I have also added an item into API version history explaining the new default, and a way to change it back.

I think this PR is ready for prime time, aside from the fact that it currently includes #38843; I will rebase once it's merged.

@kolyshkin
Copy link
Contributor Author

kolyshkin commented Mar 9, 2019

01:16:03.313 FAIL: docker_api_swarm_test.go:297: DockerSwarmSuite.TestAPISwarmLeaderElection

is a well-known flaky test (#32673), rebuilding

@kolyshkin
Copy link
Contributor Author

Derek add label: rebuild/janky

@derek derek bot added the rebuild/janky label Mar 9, 2019
There are two if statements checking for exactly same conditions:

> if hostConfig != nil && versions.LessThan(version, "1.40")

Merge these.

Signed-off-by: Kir Kolyshkin <[email protected]>
This changes the default ipc mode of daemon/engine to be private,
meaning the containers will not have their /dev/shm bind-mounted
from the host by default. The benefits of doing this are:

 1. No leaked mounts. Eliminate a possibility to leak mounts into
    other namespaces (and therefore unfortunate errors like "Unable to
    remove filesystem for <ID>: remove /var/lib/docker/containers/<ID>/shm:
    device or resource busy").

 2. Working checkpoint/restore. Make `docker checkpoint`
    not lose the contents of `/dev/shm`, but save it to
    the dump, and be restored back upon `docker start --checkpoint`
    (currently it is lost -- while CRIU handles tmpfs mounts,
    the "shareable" mount is seen as external to container,
    and thus rightfully ignored).

3. Better security. Currently any container is opened to share
   its /dev/shm with any other container.

Obviously, this change will break the following usage scenario:

 $ docker run -d --name donor busybox top
 $ docker run --rm -it --ipc container:donor busybox sh
 Error response from daemon: linux spec namespaces: can't join IPC
 of container <ID>: non-shareable IPC (hint: use IpcMode:shareable
 for the donor container)

The soution, as hinted by the (amended) error message, is to
explicitly enable donor sharing by using --ipc shareable:

 $ docker run -d --name donor --ipc shareable busybox top

Compatibility notes:

1. This only applies to containers created _after_ this change.
   Existing containers are not affected and will work fine
   as their ipc mode is stored in HostConfig.

2. Old backward compatible behavior ("shareable" containers
   by default) can be enabled by either using
   `--default-ipc-mode shareable` daemon command line option,
   or by adding a `"default-ipc-mode": "shareable"`
   line in `/etc/docker/daemon.json` configuration file.

3. If an older client (API < 1.40) is used, a "shareable" container
   is created. A test to check that is added.

Signed-off-by: Kir Kolyshkin <[email protected]>
@kolyshkin
Copy link
Contributor Author

Rebased on top of merged #38843

@thaJeztah thaJeztah added status/2-code-review and removed status/0-triage status/failing-ci Indicates that the PR in its current state fails the test suite labels Mar 11, 2019
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants