Docker 0.7 Staging#2094

Closed
crosbymichael wants to merge 164 commits into master from 0.7-staging

Conversation

@crosbymichael

No description provided.

This is a module that uses device-mapper to create CoW snapshots.
You instantiate a DeviceSetDM object on a specified root (/var/lib/docker),
and it will create a subdirectory there called "loopback". It will
contain two sparse files which are loopback mounted into
a thin-pool device-mapper device called "docker-pool".

We then create a base snapshot in the pool with an empty filesystem
which can be used as a base for docker snapshots. It also keeps track
of the mapping between docker image ids and the snapshots in the pool.

Typical use is something like this (without error checking):

devices := NewDeviceSetDM("/var/lib/docker")
devices.AddDevice(imageId, "") // "" is the base image id
devices.MountDevice(imageId, "/mnt/image")
// ... extract base image to /mnt/image
devices.AddDevice(containerId, imageId)
devices.MountDevice(containerId, "/mnt/container")
// ... start container at /mnt/container
This may be used for the .dockerinit case if the main binary is not
statically linked.
In some builds the main docker binary is not statically linked
and is therefore not usable as the .dockerinit binary. For those
cases we look for a separately shipped docker-init binary and
use that instead.
We will later need the runtime to get access to the VolumeSet
singleton, and the container id to have a name for the volume
for the container
This interface matches the device-mapper implementation (DeviceSetDM)
but is free from any dependencies. This allows core docker code
to refer to a DeviceSet without having an explicit dependency on
the devmapper package.

This is important, because the devmapper package has external
dependencies which are not wanted in the docker client app, as it
needs to run with minimal dependencies in the docker image.
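
The decoupling described above can be sketched roughly as follows. This is a minimal illustration, not the interface from the PR: the method names come from the usage example earlier in the commit log, while the error returns, the in-memory `fakeDeviceSet`, and the `prepareContainer` helper are assumptions made so the sketch runs without libdevmapper.

```go
package main

import "fmt"

// DeviceSet is a hypothetical, dependency-free view of the device set.
// Method names follow the usage example in this PR; signatures are assumed.
type DeviceSet interface {
	AddDevice(id, baseID string) error
	MountDevice(id, path string) error
}

// fakeDeviceSet is an in-memory stand-in for the devmapper-backed
// implementation, used only so this sketch needs no external libraries.
type fakeDeviceSet struct {
	devices map[string]string // device id -> base id
	mounts  map[string]string // device id -> mount path
}

func newFakeDeviceSet() *fakeDeviceSet {
	return &fakeDeviceSet{
		devices: make(map[string]string),
		mounts:  make(map[string]string),
	}
}

func (f *fakeDeviceSet) AddDevice(id, baseID string) error {
	if _, exists := f.devices[id]; exists {
		return fmt.Errorf("device %q already exists", id)
	}
	if baseID != "" {
		if _, ok := f.devices[baseID]; !ok {
			return fmt.Errorf("unknown base device %q", baseID)
		}
	}
	f.devices[id] = baseID
	return nil
}

func (f *fakeDeviceSet) MountDevice(id, path string) error {
	if _, ok := f.devices[id]; !ok {
		return fmt.Errorf("unknown device %q", id)
	}
	f.mounts[id] = path
	return nil
}

// prepareContainer stands for core code: it only knows the interface,
// so it never pulls in the implementation's dependencies.
func prepareContainer(ds DeviceSet, imageID, containerID, path string) error {
	if err := ds.AddDevice(containerID, imageID); err != nil {
		return err
	}
	return ds.MountDevice(containerID, path)
}

func main() {
	ds := newFakeDeviceSet()
	if err := ds.AddDevice("image", ""); err != nil {
		fmt.Println("error:", err)
		return
	}
	if err := prepareContainer(ds, "image", "container", "/mnt/container"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("container mounted at", ds.mounts["container"])
}
```

Because `prepareContainer` accepts the interface, a binary that never constructs the devmapper-backed type does not need to link against libdevmapper.
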
This makes docker (but not docker-init) link to libdevmapper and will
allow it to use the DeviceSet
This adds a DeviceSet singleton to the Runtime object which will be used for
any DeviceMapper dependent code.
This supports creating images from layers and mounting them
for running a container.

Not supported yet are:
* Creating diffs between images/containers
* Creating layers for new images from a device-mapper container
There is no need to keep all the device-mapper devices active, we
can just activate them on demand if needed.
Without this there is really no way to map back from the device-mapper
devices to the actual docker image/container ids in case the json file
somehow got lost
This means the default is "docker-*", but for tests we get separate
prefixes for each test.
To do diffing we just compare file metadata, so this relies
on things like size and mtime/ctime to catch any changes.
It's *possible* to trick this by updating a file without
changing the size and setting back the mtime/ctime, but
that seems pretty unlikely to happen in reality, and lets
us avoid comparing the actual file data.
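
A metadata-only comparison of this kind might look like the sketch below. It is an illustration under assumptions, not the code from this commit: the `fileMeta` and `change` types, the "A"/"C"/"D" kind labels, and the choice to compare size, mode and mtime are all hypothetical, and ctime is left out because `os.FileInfo` does not expose it portably.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// fileMeta is a hypothetical record of the metadata being compared.
type fileMeta struct {
	Size    int64
	Mode    os.FileMode
	MTimeNs int64 // modification time in Unix nanoseconds
}

// metaOf extracts the compared fields from an os.FileInfo.
func metaOf(fi os.FileInfo) fileMeta {
	return fileMeta{Size: fi.Size(), Mode: fi.Mode(), MTimeNs: fi.ModTime().UnixNano()}
}

// metaChanged reports a difference without ever reading file contents.
func metaChanged(a, b fileMeta) bool {
	return a.Size != b.Size || a.Mode != b.Mode || a.MTimeNs != b.MTimeNs
}

// change describes one difference: "A" added, "C" changed, "D" deleted.
type change struct {
	Path string
	Kind string
}

// diff compares two path -> metadata maps and returns changes sorted by path.
func diff(oldTree, newTree map[string]fileMeta) []change {
	var changes []change
	for path, newMeta := range newTree {
		oldMeta, ok := oldTree[path]
		switch {
		case !ok:
			changes = append(changes, change{path, "A"})
		case metaChanged(oldMeta, newMeta):
			changes = append(changes, change{path, "C"})
		}
	}
	for path := range oldTree {
		if _, ok := newTree[path]; !ok {
			changes = append(changes, change{path, "D"})
		}
	}
	sort.Slice(changes, func(i, j int) bool { return changes[i].Path < changes[j].Path })
	return changes
}

func main() {
	oldTree := map[string]fileMeta{
		"/etc/hosts": {Size: 10, Mode: 0644, MTimeNs: 100},
		"/tmp/old":   {Size: 1, Mode: 0600, MTimeNs: 100},
	}
	newTree := map[string]fileMeta{
		"/etc/hosts": {Size: 12, Mode: 0644, MTimeNs: 200},
		"/tmp/new":   {Size: 1, Mode: 0600, MTimeNs: 200},
	}
	for _, c := range diff(oldTree, newTree) {
		fmt.Println(c.Kind, c.Path)
	}
}
```

As the commit message notes, a file rewritten with the same size and a restored timestamp would not be detected by this kind of check.
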
There is no need to duplicate the compression flags for
every element in the filter.
There are a few changes:
* Callers can specify if they want recursive behaviour or not
* All file listings to tar are sent on stdin, to handle long lists better
* We can pass in a list of filenames which will be created as empty
  files in the tarball

This is exactly what we want for the creation of layer tarballs given
a container fs, a set of files to add and a set of whiteout files to create.
If an image is deleted and there is a corresponding device
for that image we also delete the device.
This wraps an existing DeviceSet and just adds a prefix to all ids in
it. This will be useful for reusing a single DeviceSet for all the tests
(but with separate ids)
We wrap the "real" DeviceSet for each test so that we get only
a single device-mapper pool and loopback mounts, but still
separate out the IDs in the tests. This makes the test run
much faster.
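
The wrapping idea can be sketched as below. The type and method names (`prefixedDeviceSet`, `memDeviceSet`, `HasDevice`, `wrap`) and the `prefix-id` naming scheme are assumptions for illustration; the in-memory set stands in for the real device-mapper pool.

```go
package main

import "fmt"

// DeviceSet is a hypothetical minimal interface for this sketch.
type DeviceSet interface {
	AddDevice(id, baseID string) error
	HasDevice(id string) bool
}

// memDeviceSet is an in-memory stand-in for the single shared "real" set.
type memDeviceSet struct {
	devices map[string]string // device id -> base id
}

func newMemDeviceSet() *memDeviceSet {
	return &memDeviceSet{devices: make(map[string]string)}
}

func (m *memDeviceSet) AddDevice(id, baseID string) error {
	if _, exists := m.devices[id]; exists {
		return fmt.Errorf("device %q already exists", id)
	}
	m.devices[id] = baseID
	return nil
}

func (m *memDeviceSet) HasDevice(id string) bool {
	_, ok := m.devices[id]
	return ok
}

// prefixedDeviceSet forwards every call to the wrapped set with a
// per-test prefix on each id, so tests share one pool without clashing.
type prefixedDeviceSet struct {
	wrapped DeviceSet
	prefix  string
}

// wrap leaves the empty id alone, since "" denotes the base image.
func (p *prefixedDeviceSet) wrap(id string) string {
	if id == "" {
		return ""
	}
	return p.prefix + "-" + id
}

func (p *prefixedDeviceSet) AddDevice(id, baseID string) error {
	return p.wrapped.AddDevice(p.wrap(id), p.wrap(baseID))
}

func (p *prefixedDeviceSet) HasDevice(id string) bool {
	return p.wrapped.HasDevice(p.wrap(id))
}

func main() {
	shared := newMemDeviceSet()
	testA := &prefixedDeviceSet{wrapped: shared, prefix: "test-a"}
	testB := &prefixedDeviceSet{wrapped: shared, prefix: "test-b"}

	// Both tests use the same id without colliding in the shared set.
	if err := testA.AddDevice("image", ""); err != nil {
		fmt.Println("error:", err)
	}
	if err := testB.AddDevice("image", ""); err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println(shared.HasDevice("test-a-image"), shared.HasDevice("test-b-image"))
}
```
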
This removes some Debugf() calls and changes some direct prints to
Debugf(). This means we don't get a bunch of spew when running the
tests.
I currently need this to get the tests running, otherwise it will
mount the docker.test binary inside the containers, which doesn't
work due to the libdevmapper.so dependency.
This directory is copied to each test prefix which is really
slow with the large loopback mounts.
This way we don't get any issues with leftovers
Right now this does nothing but add a new layer, but it means
that all DeviceMounts are paired with DeviceUnmounts so that we
can track (and cleanup) active mounts.
We unmount all mounts and deactivate all device mapper devices to
make sure we're left with no leftovers after the test.
This helps us track the unmount
crosbymichael and others added 25 commits October 4, 2013 09:51
Add links unit test file
Add build steps to compile docker statically
with CGO enabled
Hard code root entity name
Remove test from Dockerfile
Make sure container names work across commands
Only show full ids with docker ls -a
Conflicts:
	Dockerfile
	docker/docker.go
	hack/PACKAGERS.md
	hack/make.sh
	hack/make/binary
	hack/make/test
	runtime.go
	runtime_test.go
	server.go
	utils.go
	utils/utils.go
	utils_test.go
This separates out the DeviceSet logic a bit better from the raw
device mapper operations.

devicemapper: Serialize access to the devicemapper deviceset

This code is not safe to run in multiple threads at the same time,
and neither is libdevmapper.
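
One common way to serialize access in Go is a single mutex held for the duration of every operation. The sketch below assumes that approach; the commit message does not say which mechanism was used, and `lockedDeviceSet`, `deviceID` and `Count` are hypothetical names. No libdevmapper calls are made here.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedDeviceSet guards all state with one mutex so that operations
// from different goroutines never interleave.
type lockedDeviceSet struct {
	mu      sync.Mutex
	devices map[string]string // device id -> base id
}

func newLockedDeviceSet() *lockedDeviceSet {
	return &lockedDeviceSet{devices: make(map[string]string)}
}

// AddDevice holds the lock for the whole operation; a real implementation
// would make its device-mapper calls inside this critical section.
func (d *lockedDeviceSet) AddDevice(id, baseID string) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, exists := d.devices[id]; exists {
		return fmt.Errorf("device %q already exists", id)
	}
	d.devices[id] = baseID
	return nil
}

// Count returns the number of known devices, also under the lock.
func (d *lockedDeviceSet) Count() int {
	d.mu.Lock()
	defer d.mu.Unlock()
	return len(d.devices)
}

// deviceID builds a distinct id per worker for the demo below.
func deviceID(n int) string {
	return fmt.Sprintf("dev-%d", n)
}

func main() {
	ds := newLockedDeviceSet()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			if err := ds.AddDevice(deviceID(n), ""); err != nil {
				fmt.Println("error:", err)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println("devices:", ds.Count())
}
```
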

DeviceMapper: Move deactivate into UnmountDevice

This way the deactivate is atomic wrt other device mapper operations
and will not fail with EBUSY if someone else starts a devicemapper
operation in between unmount and deactivate.

devmapper: Fix loopback mounting regression

Some changes were added to attach_loop_device which added
a perror() in a place that caused it to override errno so that
a later errno != EBUSY failed. This fixes that and cleans up
the error reporting a bit.

devmapper: Build on old kernels without LOOP_CTL_GET_FREE define
@shykes

shykes commented Oct 8, 2013

I don't like wrapping the command in an extra shell command to make mounts private. If you need to execute code inside the namespace, do it in dockerinit, that's what it's for.

Also, can we do that without shelling out to mount?

/cc @alexlarsson

@alexlarsson

@shykes The problem happens in the lxc-start script, not in the .dockerinit process. I.e. when lxc-start starts to mount things they propagate out to the host where they are then never cleaned up.

The right approach would be for lxc-start to change the FS to private itself. In fact, it should probably make just the location it mounts private rather than everything. However, that is not supported atm.

Second best would be to do the unshare and the mount private inside the docker daemon after a clone(CLONE_FS) but before execing lxc-start. However, Go does not allow running any code between fork/clone and exec as it is problematic wrt threads and whatnot in the Go runtime. Instead it allows a limited subset of operations to happen via syscall.SysProcAttr, and what we need is not supported there.

cpuguy83 pushed a commit to cpuguy83/docker that referenced this pull request May 25, 2021
fix for moby#1333, calling LinkDel to delete link device when the err is NULL

5 participants