Skip to content

[WIP] Remove non reproducible parts from image config for stable builds#30611

Closed
justincormack wants to merge 1 commit intomoby:masterfrom
justincormack:repro
Closed

[WIP] Remove non reproducible parts from image config for stable builds#30611
justincormack wants to merge 1 commit intomoby:masterfrom
justincormack:repro

Conversation

@justincormack
Copy link
Contributor

If you do the same Docker build again and again, you will currently get
a different outcome. Madness!

The reason for this is that the config file inside the image, from which
the hash is calculated, has some fields that are effectively random or
which change over time.

These are

  • config.Hostname - build containers have random hostnames
  • container_config.Hostname - same
  • container - ID of container used to commit, effectively random
  • created - timestamp of creation
  • history.created - same

This PR leaves the hostname (and domainname) fields blank, and
the container ID.

Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/
for rationale. Users who want timestamps should use labels with eg the commit
time or other repeatable time.

TODO: tests, deprecation notices on fields.

With this patch:

whale:repro justin$ cat Dockerfile
FROM scratch
ADD . .
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578

without:

justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:47170bd064a699b2838ecc8b279813da1edde5b49a00d8e866863fa1da94c477
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:f9e5f8f8ac1e23f1d1ed45150d4c69f251056840d8745c4606001f6c340b2baa
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:5924b383e19182bc2d6e382041d5acc11985cdea76c18db12fcbbcd867fd66fb

Signed-off-by: Justin Cormack [email protected]

cc @riyazdf @stevvooe

little-egret

If you do the same Docker build again and again, you will currently get
a different outcome. Madness!

The reason for this is that the config file inside the image, from which
the hash is calculated, has some fields that are effectively random or
which change over time.

These are

- config.Hostname - build containers have random hostnames
- container_config.Hostname - same
- container - ID of container used to commit, effectively random
- created - timestamp of creation
- history.created - same

This PR leaves the hostname (and domainname) fields blank, and
the container ID.

Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/
for rationale. Users who want timestamps should use labels with eg the commit
time or other repeatable time.

TODO: tests, deprecation notices on fields.

With this patch:
```
whale:repro justin$ cat Dockerfile
FROM scratch
ADD . .
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
whale:repro justin$ docker build -q --no-cache .
sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578
```
without:
```
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:47170bd064a699b2838ecc8b279813da1edde5b49a00d8e866863fa1da94c477
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:f9e5f8f8ac1e23f1d1ed45150d4c69f251056840d8745c4606001f6c340b2baa
justin@zander:~/tmp/repro$ docker build -q --no-cache .
sha256:5924b383e19182bc2d6e382041d5acc11985cdea76c18db12fcbbcd867fd66fb
```

Signed-off-by: Justin Cormack <[email protected]>
@stevvooe
Copy link
Contributor

stevvooe commented Feb 1, 2017

@justincormack Interesting!

So, this is classically handled in merge.

While we really do need timestamps for reproducible builds, eliding them will result in some odd UX behavior. We'll have to consider the omission against this.

@AkihiroSuda AkihiroSuda added the status/failing-ci Indicates that the PR in its current state fails the test suite label Feb 1, 2017
@justincormack
Copy link
Contributor Author

Test failures are:

  • TestBuildSameDockerfileWithAndWithoutCache test uses the fact that build hash is the same to see if cache used, which no longer works, should be fairly easy to fix
  • lots of date based failures - as creation date is not there any more these fail. More problematic.

The other option to still retain creation date is to make reproducible builds an option, not a default, by having a build (and other create operation) time option to set creation date (eg you could set it from the git timestamp of your code, or unix 0 if you don't care about metadata). The other changes are ok anyway I think, it is just the timestamp. This would work for me, as I am fine opting in to reproducibility, but is not as nice as reproducible by default. However given the number of tests we have with dates, I think this is probably the best option, will take a look at it.

@justincormack
Copy link
Contributor Author

Specifying the timestamp would also allow using the equivalent of --clamp-mtime on the layer tarballs which would help a lot see https://reproducible-builds.org/docs/archives/

@tonistiigi
Copy link
Member

I think having an option is the way to go at least initially. I think both timestamp and boolean flag make sense. We also need to clean up the UI, zero time should not show up or in image inspect and images/history should not show that it was created 292 years ago.

return nil
}

// Return the reproducible parts of the config, which currently excludes hostname as random
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just mark hostname as omitempty

@sathieu
Copy link
Contributor

sathieu commented Feb 11, 2017

Why not use SOURCE_DATE_EPOCH as creation time, defaulting to current time ?

@tonistiigi
Copy link
Member

Remembered one other thing that should be checked with this. Local build cache and image deletion logic is based on the parent chain. With this change, it becomes very easy to end up with the same image IDs and that could probably overwrite previous parent IDs or leave them in an inconsistent state. This should happen for example when someone just reorders Dockerfile commands.

@justincormack
Copy link
Contributor Author

@tonistiigi ah interesting, will think about adding some tests for that. It may be ok, but worth checking. Planning to get back to this in a bit when I have some more time...

@justincormack
Copy link
Contributor Author

Not planning to work on this until after Dockercon, if you want to close now thats fine by me...

@vdemeester
Copy link
Member

@justincormack I'll close for now, please ping me when you work on it later so we can re-open 😉

@AkihiroSuda
Copy link
Member

I saw Kaniko is going to implement this: GoogleContainerTools/kaniko#120

@justincormack @tonistiigi
Is this still plan on Moby&BuildKit ?

@tonistiigi
Copy link
Member

BuildKit has bit improvements on this. The image date is the creation date of last layer and layer timestamps are persisted through cache(even remote). So image timestamp should only change if layers change. But I think having manual control is still a good idea and future-wise we should avoid using these fields completely and switch to the local metadata fields.

@tonistiigi
Copy link
Member

Also buildkit doesn't add randomness for hostname/containerid etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status/failing-ci Indicates that the PR in its current state fails the test suite status/1-design-review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants