Docker 0.7 Staging #2094
Conversation
This is a module that uses device-mapper to create CoW snapshots.
You instantiate a DeviceSetDM object on a specified root (/var/lib/docker),
and it will create a subdirectory there called "loopback". It will
contain two sparse files which are attached to loopback devices and
combined into a thin-pool device-mapper device called "docker-pool".
We then create a base snapshot in the pool with an empty filesystem
which can be used as a base for docker snapshots. It also keeps track
of the mapping between docker image ids and the snapshots in the pool.
Typical use is something like this (without error checking):
```go
devices := NewDeviceSetDM("/var/lib/docker")
devices.AddDevice(imageId, "") // "" is the base image id
devices.MountDevice(imageId, "/mnt/image")
// ... extract the base image to /mnt/image ...
devices.AddDevice(containerId, imageId)
devices.MountDevice(containerId, "/mnt/container")
// ... start the container at /mnt/container ...
```
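The same flow can be sketched with the error checking filled in. The `InMemoryDeviceSet` below is a hypothetical stand-in (not part of this PR) so the sequence can run without device-mapper or root privileges; the real `DeviceSetDM` talks to libdevmapper.

```go
package main

import (
	"errors"
	"fmt"
)

// InMemoryDeviceSet is a hypothetical stand-in for DeviceSetDM: it only
// records devices and mounts in maps, so the usage flow can be exercised
// without a thin pool.
type InMemoryDeviceSet struct {
	devices map[string]string // device id -> base id
	mounts  map[string]string // device id -> mount point
}

func NewInMemoryDeviceSet() *InMemoryDeviceSet {
	return &InMemoryDeviceSet{
		// The empty id stands for the base snapshot, which always exists.
		devices: map[string]string{"": ""},
		mounts:  map[string]string{},
	}
}

func (d *InMemoryDeviceSet) AddDevice(id, baseID string) error {
	if _, ok := d.devices[baseID]; !ok {
		return fmt.Errorf("unknown base device %q", baseID)
	}
	if _, ok := d.devices[id]; ok {
		return errors.New("device already exists: " + id)
	}
	d.devices[id] = baseID
	return nil
}

func (d *InMemoryDeviceSet) MountDevice(id, path string) error {
	if _, ok := d.devices[id]; !ok {
		return fmt.Errorf("unknown device %q", id)
	}
	d.mounts[id] = path
	return nil
}

func main() {
	devices := NewInMemoryDeviceSet()

	// Same sequence as the docstring, with each error checked.
	if err := devices.AddDevice("imageId", ""); err != nil {
		panic(err)
	}
	if err := devices.MountDevice("imageId", "/mnt/image"); err != nil {
		panic(err)
	}
	// ... extract the base image to /mnt/image ...
	if err := devices.AddDevice("containerId", "imageId"); err != nil {
		panic(err)
	}
	if err := devices.MountDevice("containerId", "/mnt/container"); err != nil {
		panic(err)
	}
	fmt.Println("container fs mounted at", devices.mounts["containerId"])
}
```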
This may be used for the .dockerinit case if the main binary is not statically linked.
In some builds the main docker binary is not statically linked and is therefore not usable as the .dockerinit binary. For those cases we look for a separately shipped docker-init binary and use that instead.
We will later need the runtime to get access to the VolumeSet singleton, and the container id to give the container's volume a name.
This interface matches the device-mapper implementation (DeviceSetDM) but is free from any dependencies. This allows core docker code to refer to a DeviceSet without having an explicit dependency on the devmapper package. This is important, because the devmapper package has external dependencies which are not wanted in the docker client app, as it needs to run with minimal dependencies in the docker image.
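The shape of such a dependency-free interface might look like the sketch below. The exact method set and the `setupContainer`/`nullSet` helpers are assumptions for illustration, based on the operations this PR describes; code written against the interface compiles in the client without linking libdevmapper.

```go
package main

import "fmt"

// DeviceSet sketches the dependency-free interface described above; the
// method set here is an assumption based on the operations this PR
// mentions (add, mount, unmount, remove).
type DeviceSet interface {
	AddDevice(id, baseID string) error
	MountDevice(id, path string) error
	UnmountDevice(id string) error
	RemoveDevice(id string) error
}

// setupContainer shows why the interface matters: it only sees DeviceSet,
// so it carries no devmapper dependency.
func setupContainer(ds DeviceSet, containerID, imageID, mnt string) error {
	if err := ds.AddDevice(containerID, imageID); err != nil {
		return fmt.Errorf("create container device: %v", err)
	}
	return ds.MountDevice(containerID, mnt)
}

// nullSet is a trivial implementation used only to demonstrate the call.
type nullSet struct{ log []string }

func (n *nullSet) AddDevice(id, baseID string) error {
	n.log = append(n.log, "add "+id)
	return nil
}
func (n *nullSet) MountDevice(id, path string) error {
	n.log = append(n.log, "mount "+id+" at "+path)
	return nil
}
func (n *nullSet) UnmountDevice(id string) error { n.log = append(n.log, "umount "+id); return nil }
func (n *nullSet) RemoveDevice(id string) error  { n.log = append(n.log, "remove "+id); return nil }

func main() {
	ds := &nullSet{}
	if err := setupContainer(ds, "ctr", "img", "/mnt/ctr"); err != nil {
		panic(err)
	}
	fmt.Println(ds.log)
}
```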
This makes docker (but not docker-init) link to libdevmapper, allowing it to use the DeviceSet.
This adds a DeviceSet singleton to the Runtime object which will be used for any DeviceMapper dependent code.
This supports creating images from layers and mounting them for running a container. Not yet supported:
* Creating diffs between images/containers
* Creating layers for new images from a device-mapper container
There is no need to keep all the device-mapper devices active, we can just activate them on demand if needed.
Without this there is really no way to map back from the device-mapper devices to the actual docker image/container ids if the json file somehow gets lost.
This means the default is "docker-*", but for tests we get separate prefixes for each test.
To do diffing we just compare file metadata, relying on things like size and mtime/ctime to catch any changes. It's *possible* to trick this by updating a file without changing the size and setting back the mtime/ctime, but that seems pretty unlikely to happen in reality, and it lets us avoid comparing the actual file data.
There is no need to duplicate the compression flags for every element in the filter.
There are a few changes:
* Callers can specify if they want recursive behaviour or not
* All file listings are sent to tar on stdin, to handle long lists better
* We can pass in a list of filenames which will be created as empty files in the tarball

This is exactly what we want for the creation of layer tarballs given a container fs, a set of files to add, and a set of whiteout files to create.
If an image is deleted and there is a corresponding device for that image, we also delete the device.
This wraps an existing DeviceSet and just adds a prefix to all ids in it. This will be useful for reusing a single DeviceSet for all the tests (but with separate ids)
We wrap the "real" DeviceSet for each test so that we get only a single device-mapper pool and loopback mounts, but still separate out the IDs in the tests. This makes the test run much faster.
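The wrapper might look like the sketch below. The type and field names, and the decision to leave the empty (base) id unprefixed, are assumptions of this illustration rather than the PR's actual code.

```go
package main

import "fmt"

// DeviceSet is the minimal surface needed for this sketch.
type DeviceSet interface {
	AddDevice(id, baseID string) error
	MountDevice(id, path string) error
}

// PrefixDeviceSet wraps another DeviceSet and prefixes every id, so all
// tests can share one thin pool while keeping their ids separate.
type PrefixDeviceSet struct {
	inner  DeviceSet
	prefix string
}

func (p *PrefixDeviceSet) key(id string) string {
	if id == "" {
		return "" // the shared base snapshot stays unprefixed
	}
	return p.prefix + "-" + id
}

func (p *PrefixDeviceSet) AddDevice(id, baseID string) error {
	return p.inner.AddDevice(p.key(id), p.key(baseID))
}

func (p *PrefixDeviceSet) MountDevice(id, path string) error {
	return p.inner.MountDevice(p.key(id), path)
}

// recorder captures calls so we can see the prefixing at work.
type recorder struct{ calls []string }

func (r *recorder) AddDevice(id, baseID string) error {
	r.calls = append(r.calls, fmt.Sprintf("add %q base %q", id, baseID))
	return nil
}
func (r *recorder) MountDevice(id, path string) error {
	r.calls = append(r.calls, fmt.Sprintf("mount %q", id))
	return nil
}

func main() {
	shared := &recorder{}
	t1 := &PrefixDeviceSet{inner: shared, prefix: "test1"}
	t2 := &PrefixDeviceSet{inner: shared, prefix: "test2"}
	t1.AddDevice("img", "")
	t2.AddDevice("img", "") // same test-local id, distinct pool id
	for _, c := range shared.calls {
		fmt.Println(c)
	}
}
```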
This removes some Debugf() calls and changes some direct prints to Debugf(). This means we don't get a bunch of spew when running the tests.
I currently need this to get the tests running, otherwise it will mount the docker.test binary inside the containers, which doesn't work due to the libdevmapper.so dependency.
This directory is copied to each test prefix which is really slow with the large loopback mounts.
This way we don't get any issues with leftovers
Right now this does nothing but add a new layer, but it means that all DeviceMounts are paired with DeviceUnmounts so that we can track (and cleanup) active mounts.
We unmount all mounts and deactivate all device mapper devices to make sure we're left with no leftovers after the test.
This helps us track the unmount
Allow active links to be removed
Add links unit test file
Add build steps to compile docker statically with CGO enabled
Hard code root entity name. Remove test from Dockerfile. Make sure container names work across commands.
Only show full ids with docker ls -a.
Conflicts:
	Dockerfile
	docker/docker.go
	hack/PACKAGERS.md
	hack/make.sh
	hack/make/binary
	hack/make/test
	runtime.go
	runtime_test.go
	server.go
	utils.go
	utils/utils.go
	utils_test.go
This separates out the DeviceSet logic a bit better from the raw device-mapper operations.

devicemapper: Serialize access to the devicemapper deviceset. This code is not safe to run in multiple threads at the same time, and neither is libdevmapper.

DeviceMapper: Move deactivate into UnmountDevice. This way the deactivate is atomic wrt other device-mapper operations and will not fail with EBUSY if someone else starts a devicemapper operation in between unmount and deactivate.

devmapper: Fix loopback mounting regression. Some changes to attach_loop_device added a perror() in a place that caused it to override errno, so that a later errno != EBUSY check failed. This fixes that and cleans up the error reporting a bit.

devmapper: Build on old kernels without the LOOP_CTL_GET_FREE define.
I don't like wrapping the command in an extra shell command to make mounts private. If you need to execute code inside the namespace, do it in dockerinit, that's what it's for. Also, can we do that without shelling out to mount? /cc @alexlarsson
@shykes The problem happens in the lxc-start script, not in the .dockerinit process. I.e. when lxc-start starts to mount things they propagate out to the host where they are then never cleaned up. The right approach would be for lxc-start to change the FS to private itself. In fact, it should probably make just the location it mounts private rather than everything. However, that is not supported atm. Second best would be to do the unshare and the mount private inside the docker daemon after a clone(CLONE_FS) but before execing lxc-start. However, go does not allow running any code between fork/clone and exec as it is problematic wrt threads and whatnot in the go runtime. Instead it allows a limited subset of operations to happen via syscall.SysProcAttr, and what we need is not supported there.
Fix for moby#1333: call LinkDel to delete the link device when err is nil.