[WIP] Remove non reproducible parts from image config for stable builds#30611
[WIP] Remove non reproducible parts from image config for stable builds#30611justincormack wants to merge 1 commit intomoby:masterfrom
Conversation
If you do the same Docker build again and again, you will currently get a different outcome. Madness! The reason for this is that the config file inside the image, from which the hash is calculated, has some fields that are effectively random or which change over time. These are - config.Hostname - build containers have random hostnames - container_config.Hostname - same - container - ID of container used to commit, effectively random - created - timestamp of creation - history.created - same This PR leaves the hostname (and domainname) fields blank, and the container ID. Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/ for rationale. Users who want timestamps should use labels with eg the commit time or other repeatable time. TODO: tests, deprecation notices on fields. With this patch: ``` whale:repro justin$ cat Dockerfile FROM scratch ADD . . whale:repro justin$ docker build -q --no-cache . sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578 whale:repro justin$ docker build -q --no-cache . sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578 whale:repro justin$ docker build -q --no-cache . sha256:d6f99a7d1f661973ecda1aa34394c0523774bd61382dee348ec8457471b95578 ``` without: ``` justin@zander:~/tmp/repro$ docker build -q --no-cache . sha256:47170bd064a699b2838ecc8b279813da1edde5b49a00d8e866863fa1da94c477 justin@zander:~/tmp/repro$ docker build -q --no-cache . sha256:f9e5f8f8ac1e23f1d1ed45150d4c69f251056840d8745c4606001f6c340b2baa justin@zander:~/tmp/repro$ docker build -q --no-cache . sha256:5924b383e19182bc2d6e382041d5acc11985cdea76c18db12fcbbcd867fd66fb ``` Signed-off-by: Justin Cormack <[email protected]>
|
@justincormack Interesting! So, this is classically handled in While we really do need timestamps for reproducible builds, eliding them will result in some odd UX behavior. We'll have to consider the omission against this. |
|
Test failures are:
The other option to still retain creation date is to make reproducible builds an option, not a default, by having a build (and other create operation) time option to set creation date (eg you could set it from the git timestamp of your code, or unix 0 if you don't care about metadata). The other changes are ok anyway I think, it is just the timestamp. This would work for me, as I am fine opting in to reproducibility, but is not as nice as reproducible by default. However given the number of tests we have with dates, I think this is probably the best option, will take a look at it. |
|
Specifying the timestamp would also allow using the equivalent of |
|
I think having an option is the way to go at least initially. I think both timestamp and boolean flag make sense. We also need to clean up the UI, zero time should not show up or in |
| return nil | ||
| } | ||
|
|
||
| // Return the reproducible parts of the config, which currently excludes hostname as random |
There was a problem hiding this comment.
Can't we just mark hostname as omitempty
|
Why not use SOURCE_DATE_EPOCH as creation time, defaulting to current time ? |
|
Remembered one other thing that should be checked with this. Local build cache and image deletion logic is based on the parent chain. With this change, it becomes very easy to end up with the same image IDs and that could probably overwrite previous parent IDs or leave them in an inconsistent state. This should happen for example when someone just reorders Dockerfile commands. |
|
@tonistiigi ah interesting, will think about adding some tests for that. It may be ok, but worth checking. Planning to get back to this in a bit when I have some more time... |
|
Not planning to work on this until after Dockercon, if you want to close now thats fine by me... |
|
@justincormack I'll close for now, please ping me when you work on it later so we can re-open 😉 |
|
I saw Kaniko is going to implement this: GoogleContainerTools/kaniko#120 @justincormack @tonistiigi |
|
BuildKit has bit improvements on this. The image date is the creation date of last layer and layer timestamps are persisted through cache(even remote). So image timestamp should only change if layers change. But I think having manual control is still a good idea and future-wise we should avoid using these fields completely and switch to the local metadata fields. |
|
Also buildkit doesn't add randomness for hostname/containerid etc. |
If you do the same Docker build again and again, you will currently get
a different outcome. Madness!
The reason for this is that the config file inside the image, from which
the hash is calculated, has some fields that are effectively random or
which change over time.
These are
This PR leaves the hostname (and domainname) fields blank, and
the container ID.
Timestamps are set to Go zero time. See https://reproducible-builds.org/docs/timestamps/
for rationale. Users who want timestamps should use labels with eg the commit
time or other repeatable time.
TODO: tests, deprecation notices on fields.
With this patch:
without:
Signed-off-by: Justin Cormack [email protected]
cc @riyazdf @stevvooe