Content addressability by tonistiigi · Pull Request #17924 · moby/moby

tonistiigi · 2015-11-12T01:24:15Z

This PR changes the Docker engine to use content addressable storage for images and layers. This means that image IDs are no longer arbitrarily assigned - now they are computed based on the filesystem contents and the image configuration. Images can securely share their underlying layers without duplicating data on disk.

This work involves a big refactoring of image-related code in the engine. The graph package is entirely removed, and is replaced by a few new packages:

layer: Layer store interface and implementation. This manages filesystem layers without being aware of image configurations. The layer store maintains reference counts for layers and removes unreferenced layers.
image: Image store interface and implementation. This manages image configurations. Image configurations include runtime configuration and reference the underlying filesystem data.
tag: The tag functionality from the graph package has been separated out and moved here. Tags now use types from the distribution/reference package instead of plain strings.
distribution: Push and pull code that originally was part of the graph package has been separated out into its own package.
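
The reference counting described for the layer package can be illustrated with a minimal in-memory sketch. The `layerStore` type and its `Acquire`/`Release` methods are hypothetical names chosen for this example and are not the interface introduced by the PR; the sketch only shows the idea that a shared layer is removed when its last reference goes away.

```go
package main

import (
	"errors"
	"fmt"
)

// layerStore is a hypothetical, in-memory stand-in for a layer store.
// It tracks how many references each layer has and drops a layer when
// the last reference is released.
type layerStore struct {
	refs map[string]int // layer ID -> reference count
}

func newLayerStore() *layerStore {
	return &layerStore{refs: make(map[string]int)}
}

// Acquire registers one more reference to the layer, creating it if needed.
func (s *layerStore) Acquire(id string) {
	s.refs[id]++
}

// Release drops one reference. It reports whether the layer was removed
// because no references remain.
func (s *layerStore) Release(id string) (removed bool, err error) {
	n, ok := s.refs[id]
	if !ok {
		return false, errors.New("unknown layer: " + id)
	}
	if n == 1 {
		delete(s.refs, id)
		return true, nil
	}
	s.refs[id] = n - 1
	return false, nil
}

func main() {
	s := newLayerStore()
	s.Acquire("base") // referenced by image A
	s.Acquire("base") // referenced by image B

	removed, _ := s.Release("base")
	fmt.Println("removed after first release:", removed)
	removed, _ = s.Release("base")
	fmt.Println("removed after second release:", removed)
}
```

Because the store knows nothing about image configurations, two images can hold references to the same layer without either one owning it.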

The first time a version of the engine with these commits is started, it will migrate old graph metadata to the new format. This involves calculating content hashes for the existing data, but it does not move underlying graphdriver filesystem data. It doesn’t remove old graph metadata, so the migration process is not destructive.

The new data model does not have a one-to-one relationship between images and layers. A single image can have many layers. Existing versions of Docker create an image for each layer, and use the parent chain to link them together. That means that pulling a specific image requires pulling all the artifacts from the original build process of that image. With this PR, when an image is pulled from a registry, only a single image is created, to match the new data model. The history is preserved through a list of commands, dates, etc.
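
To make the difference between the two data models concrete, here is a small sketch that folds an old-style parent chain (one image per layer) into a single image with a layer list and a history list. All type and field names (`v1Image`, `image`, `historyEntry`, `flatten`) are assumptions made for this illustration; this is not the migration code from the PR.

```go
package main

import "fmt"

// v1Image models the old scheme: one image per layer, linked by Parent.
// Hypothetical type for illustration only.
type v1Image struct {
	Parent    *v1Image
	Created   string // creation date
	CreatedBy string // command that produced this step
	LayerHash string // content hash of this step's filesystem layer
}

// historyEntry keeps the build metadata that used to live on parent images.
type historyEntry struct {
	Created   string
	CreatedBy string
}

// image models the new scheme: one image that references many layers.
type image struct {
	Layers  []string       // layer hashes, base layer first
	History []historyEntry // one entry per build step, oldest first
}

// flatten walks the parent chain from the top image down to the base and
// returns a single image whose layers and history are ordered base first.
func flatten(top *v1Image) image {
	var chain []*v1Image
	for img := top; img != nil; img = img.Parent {
		chain = append(chain, img)
	}
	var out image
	for i := len(chain) - 1; i >= 0; i-- {
		out.Layers = append(out.Layers, chain[i].LayerHash)
		out.History = append(out.History, historyEntry{
			Created:   chain[i].Created,
			CreatedBy: chain[i].CreatedBy,
		})
	}
	return out
}

func main() {
	// Placeholder hashes and dates; not real values.
	base := &v1Image{Created: "day 1", CreatedBy: "ADD rootfs", LayerHash: "sha256:base"}
	top := &v1Image{Parent: base, Created: "day 2", CreatedBy: "RUN echo hello > /foo", LayerHash: "sha256:top"}

	img := flatten(top)
	fmt.Println("layers:", img.Layers)
	for _, h := range img.History {
		fmt.Println(h.Created, h.CreatedBy)
	}
}
```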

Summary of UI Changes:

Pull/push do not transfer the entire parent chain.
Full-length image IDs have a sha256: prefix. This prefix is hidden for truncated image IDs for convenience.
Images don't have the concept of VirtualSize (because there’s no special meaning for a top layer anymore).
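
As a rough illustration of the ID format described above, the following sketch derives a `sha256:`-prefixed ID from a byte slice and shortens it for display. The function names `imageID` and `truncateID` are hypothetical, the input is a placeholder rather than a real image configuration, and the 12-character short form is taken from the `docker history` output shown later in this thread.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// imageID returns a "sha256:"-prefixed ID computed over the given bytes.
// Which bytes the engine hashes is not shown here; this is a sketch.
func imageID(config []byte) string {
	sum := sha256.Sum256(config)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// truncateID hides the "sha256:" prefix and keeps the first 12 hex
// characters, matching the short IDs printed in image listings.
func truncateID(id string) string {
	short := strings.TrimPrefix(id, "sha256:")
	if len(short) > 12 {
		short = short[:12]
	}
	return short
}

func main() {
	id := imageID([]byte(`{"placeholder":"not a real image config"}`))
	fmt.Println("full ID: ", id)
	fmt.Println("short ID:", truncateID(id))
}
```

The same input always yields the same ID, which is what makes the IDs content-derived rather than arbitrarily assigned.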

Future work:

Add support for new manifest format for push/pull (distribution/distribution#1068, "docs/spec: Proposal for new manifest format").
Move graphdriver initialization/shutdown out of daemon.
Move some layer, image, and distribution code out of docker/docker (remove dependencies on Engine code as necessary).
Additional changes needed for managing the Windows base layer.

Unit test coverage for the new packages:

ok      github.com/docker/docker/image  0.051s  coverage: 85.8% of statements
ok      github.com/docker/docker/layer  0.223s  coverage: 76.0% of statements
ok      github.com/docker/docker/migrate/v1 0.033s  coverage: 71.1% of statements
ok      github.com/docker/docker/tag    0.017s  coverage: 86.1% of statements

For the full design document, see https://gist.github.com/aaronlehmann/b42a2eaf633fc949f93b

lowenna · 2015-11-12T02:11:10Z

@swernli - Stefan can you review from the Windows side please?

thaJeztah · 2015-11-12T07:33:06Z

> The first time a version of the engine with these commits is started, it will migrate old graph metadata to the new format. This involves calculating content hashes for the existing data, but it does not move underlying graphdriver filesystem data. It doesn’t remove old graph metadata, so the migration process is not destructive.

Curious; will repeated upgrades / downgrades work? (e.g., Used 1.10 for testing, then continue on 1.9 and run 1.10 again)

aaronlehmann · 2015-11-12T07:51:44Z

> Curious; will repeated upgrades / downgrades work? (e.g., Used 1.10 for testing, then continue on 1.9 and run 1.10 again)

New images built or pulled in 1.10 wouldn't be visible to 1.9. But the original images would still be there, unless you manually deleted some of them.

runcom · 2015-11-12T13:03:29Z

daemon/daemon.go

should this be versioned? or can't it anymore?

tonistiigi · 2015-11-14

VirtualSize as a concept has been removed, as all images freely share all of their layers and no layer is unique to an image. So the value of Size is now the same as what VirtualSize was previously.

I changed the PR so that VirtualSize isn't cleared any more but shows the same data as Size. That should provide correct data to older clients. New clients don't use this field any more. As this field isn't useful for new clients, we can figure out a deprecation path, and this can be done through API versions, but I think we can do that after this PR is merged.

runcom · 2015-11-14

cool thanks @tonistiigi just making sure we won't forget this

swernli · 2015-11-12T20:05:51Z

Reviewing for Windows...

thaJeztah · 2015-11-12T20:54:04Z

ping @docker/maintainers please review and test this one. It's a huge change, but really exciting stuff.

also:
please try to avoid merging large PRs while this PR isn't merged yet to avoid rebase hell

jessfraz · 2015-11-12T20:57:43Z

\o/


thaJeztah · 2015-11-12T21:11:11Z

A difference I see with this one (probably expected, but just to verify)

Dockerfile

FROM ubuntu:14.04
ENV foo=bar
RUN echo hello > /foo

docker build -t foobar .

Docker 1.9:

docker history foobar
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
408581b4301b        7 seconds ago       /bin/sh -c echo hello > /foo                    6 B
012645df8321        7 seconds ago       /bin/sh -c #(nop) ENV foo=bar                   0 B
e9ae3c220b23        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
a6785352b25c        2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   1.895 kB
0998bf8fb9e9        2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   194.5 kB
0a85502c06c9        2 days ago          /bin/sh -c #(nop) ADD file:531ac3e55db4293b8f   187.7 MB

With this PR:

docker history foobar
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
a12448c314a7        6 minutes ago       /bin/sh -c echo hello > /foo                    2.048 kB
e39f86d96664        6 minutes ago       /bin/sh -c #(nop) ENV foo=bar                   0 B
4b1e42b414f6        2 days ago          /bin/sh -c #(nop) CMD ["/bin/bash"]             1.024 kB
<missing>           2 days ago          /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/   4.608 kB
<missing>           2 days ago          /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic   208.9 kB
<missing>           2 days ago          /bin/sh -c #(nop) ADD file:531ac3e55db4293b8f   196.8 MB

Differences:

sizes reported are different
images from the parent image are reported as <missing> (I expect that to be by design, but just checking)

tonistiigi · 2015-11-12T21:22:49Z

@thaJeztah The sizes are currently from tars so they have the added padding. We are exploring ways to revert this if possible. <missing> is shown if the parent image is not locally available. This is because you probably did a fresh pull and got a flat image without parents for ubuntu.

thaJeztah · 2015-11-12T21:28:35Z

@tonistiigi correct, it was a fresh pull

swernli · 2015-11-12T23:41:15Z

Currently, these changes break the use of alternate ID files in layers necessary for Windows containers to work. I'm looking for a fix that can be applied to these changes to unblock us.

moxiegirl · 2015-11-13T18:04:06Z

Not in doc review but this could have documentation impact that is pretty big depending on the implementation. So, engineering hours could be required in terms of reviewing at the least. I'd like to include this information early in the PR lifecycle.

The image discussion recently added along with the storage driver material: https://github.com/docker/docker/blob/master/docs/userguide/storagedriver/imagesandcontainers.md --- minor changes are likely, I think, in the graphics and in the discussion of how images are built. (Not necessarily an impact on choosing a storage driver.)
Re @thaJeztah's demo of the output of a command --> we have a lot of examples that show image lists in the docs. If the format of the output around images changes significantly, we should make sure to update all the examples across all the projects --- these are things readers notice.

metayd · 2016-02-17T01:48:51Z

> The packages I was talking about are called image and layer. They are in the top-level directory of this repository.
@aaronlehmann Thanks.

metayd · 2016-02-20T05:03:02Z

@aaronlehmann @tonistiigi Hello, are there any tools that I can use to calculate the content hash of a tar file?
I have tried tar -xOf layer.tar | sha256sum, but the hash value I got is different from the hash value calculated by the docker daemon.
Or is there anything special in the Go tar-split package?

stevvooe · 2016-02-22T20:53:23Z

@dbdd4us The content hash of a layer is the hash of the compressed content, as sent to the registry.

tonistiigi · 2016-02-23T03:02:32Z

@stevvooe layer hashes in image config are the hashes of uncompressed content.
@dbdd4us I'm not sure what the layer.tar is in your example. The checksum is over all the bytes in the tar; there is no extraction step involved. The easiest way to get those tars is with docker save. Note that these do not match the image IDs, which are checksums of image config + layer checksums. Make sure to also read the design doc in the first post.

When content addressability was introduced in moby#17924, a compatibility layer for registry v1 pushes was added. When the engine is asked to push an image to a v1 registry, it needs to create v1 IDs for the images. The strategy so far has been to use the full imageID for the first v1 layer and the ChainID for all other layers, effectively creating as many v1 layers as there are in the image. Only the topmost layer contained the image configuration; the other layers had a dummy JSON containing only a parent reference.

This becomes problematic when the first layer of the image is big. Consider the following two Dockerfiles:

    FROM busybox
    RUN create_very_big_file
    CMD /foo

    FROM busybox
    RUN create_very_big_file
    CMD /bar

Both of these images will have the exact same layers, with the layer created by `RUN create_very_big_file` being the topmost one, but their imageIDs will differ since they have a different CMD and therefore different image configs. When pushing to a v1 registry, the `RUN create_very_big_file` layer will be pushed twice, once with the v1 ID set to foo's imageID and once with the v1 ID set to bar's imageID. Also, any clients wanting to pull those images won't realise it's the same layer and will proceed to download it twice.

This commit solves this problem by separating the layers from the image configuration information when pushing to a v1 registry. To do this, all layers of an image are pushed with their ChainIDs, and a synthetic top-level layer is created with its contents set to the EmptyLayer, its config set to the image config, and its v1 ID set to the imageID.

This will have the side effect of adding one layer. To prevent new layers being piled on top of each other forever, the code checks if the topmost layer is already an empty layer and in that case uses it for the image configuration.

Signed-off-by: Petros Angelatos <[email protected]>
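
The situation in this commit message can be sketched in a few lines. The thread does not define ChainID, so the example assumes the recursive definition used by the OCI image specification: the chain ID of a single layer is its own digest, and each further layer yields sha256(parent chain ID + " " + layer digest). The layer digests and config strings are placeholders, and `digest` and `chainID` are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns a "sha256:"-prefixed hex digest of b.
func digest(b []byte) string {
	sum := sha256.Sum256(b)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// chainID folds a list of layer digests (base first) into one ID.
// Assumed definition: one layer maps to its own digest; every further
// layer gives digest(parentChainID + " " + layerDigest).
func chainID(layers []string) string {
	if len(layers) == 0 {
		return ""
	}
	id := layers[0]
	for _, l := range layers[1:] {
		id = digest([]byte(id + " " + l))
	}
	return id
}

func main() {
	// Placeholder digests for the busybox base layer and the big layer.
	layers := []string{digest([]byte("busybox rootfs")), digest([]byte("very big file"))}

	// Placeholder configs that differ only in CMD.
	fooConfig := []byte(`{"cmd":"/foo"}`)
	barConfig := []byte(`{"cmd":"/bar"}`)

	fmt.Println("foo top ChainID:", chainID(layers))
	fmt.Println("bar top ChainID:", chainID(layers))
	fmt.Println("foo image ID:   ", digest(fooConfig))
	fmt.Println("bar image ID:   ", digest(barConfig))
}
```

Both images produce the same chain ID for the big layer while their image IDs differ, which is why keying the big layer by ChainID rather than by imageID avoids pushing it twice.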

ayanamist · 2018-10-15T03:55:28Z

layer/layer_store.go

+	mountID := name
+	if runtime.GOOS != "windows" {
+		// windows has issues if container ID doesn't match mount ID
+		mountID = stringid.GenerateRandomID()


Hi, I'm reading through the moby source code. Looking at this, I wonder whether there is a reason why moby does not always use name as the mount ID, regardless of whether it is running on Windows. Is there a consideration that isn't captured in the comment here?

The v1.10 layout and the migrator were added in 2015 via moby#17924. Although the migrator is not explicitly marked as "deprecated" in cli/docs/deprecated.md, I suppose people should have already migrated from pre-v1.10 and no longer need the migrator, because pre-v1.10 versions do not support schema2 images (and these versions no longer receive security updates).

Signed-off-by: Akihiro Suda <[email protected]>

GordonTheTurtle added area/distribution Image Distribution platform/windows status/0-triage labels Nov 12, 2015

tiborvass added status/2-code-review and removed status/0-triage labels Nov 12, 2015

tiborvass added this to the 1.10 milestone Nov 12, 2015

aaronlehmann force-pushed the content-addressability branch 3 times, most recently from 8b59d05 to 1d793b1 Compare November 12, 2015 01:55

aaronlehmann force-pushed the content-addressability branch 4 times, most recently from 49ec06e to 44e6989 Compare November 12, 2015 03:19

This was referenced Nov 12, 2015

docker manifest [--digest] command #17402

Closed

daemon: isolate windows only graphdriver logic #17216

Closed

runcom reviewed Nov 12, 2015

aaronlehmann force-pushed the content-addressability branch from 59afa95 to 9c6133a Compare November 12, 2015 20:08

aaronlehmann mentioned this pull request Nov 12, 2015

update docker_cli_pull_test.go #16872

Closed

stevvooe mentioned this pull request Nov 13, 2015

Proposal add docker images --tree back #17366

Closed

thaJeztah mentioned this pull request Feb 23, 2016

docker images cannot list repo with port #20599

Closed

ixdy mentioned this pull request Feb 29, 2016

e2e/build/docker flake: Successfully built X, no such id: sha256:<X> kubernetes/kubernetes#21991

Closed

thaJeztah mentioned this pull request Mar 3, 2016

Add a quiet option to docker pull #13588

Closed

thaJeztah mentioned this pull request Apr 15, 2016

changed image ids when using registry v2 #18179

Closed

damienmg mentioned this pull request May 11, 2016

docker_build does not support docker 1.10 bazelbuild/bazel#1113

Closed

igrayson mentioned this pull request Jun 6, 2016

Caching doesn't work for CI builds? grammarly/rocker#108

Closed

tonistiigi mentioned this pull request Jul 19, 2016

Why is Docker load command not secure? #24779

Closed

petrosagg mentioned this pull request Mar 29, 2017

distribution: separate layer and image config for v1 pushes balena-io-archive/docker#12

Merged

ayanamist reviewed Oct 15, 2018


AkihiroSuda mentioned this pull request Nov 24, 2018

Remove v1.10 migrator #38265

Merged

olljanat mentioned this pull request May 23, 2019

Docker run from a image with same image ID and size behaves differently #39247

Closed

olljanat mentioned this pull request Jun 7, 2019

[19.03 backport] Added garbage collector for image layers docker-archive/engine#268

Closed


thaJeztah mentioned this pull request Apr 28, 2023

remove uses of deprecated VirtualSize field docker/cli#4242

Merged

thaJeztah mentioned this pull request Jan 8, 2024

builder-next: builder uses ChecksumForGraphID, which always produces error (?) #47040

Open

thaJeztah mentioned this pull request Jul 3, 2025

image/tarexport: remove suport for loading v0/v1 images #50324

Merged

thaJeztah mentioned this pull request Oct 5, 2025

api/types/image: InspectResponse: remove deprecated fields #51103

Merged


thaJeztah mentioned this pull request Oct 14, 2025

api/types/image: remove deprecated Summary.VirtualSize field #51190

Merged


20 participants