Conversation
Signed-off-by: Tamal Saha <[email protected]>
|
I think #3612 is the bug being fixed by this. |
|
I have an open question on this PR with the other maintainers. That is, is the removal of the timestamp a breaking change? We have been discussing these nuances lately. I added this issue to the agenda for the next meeting. |
| Name: filepath.ToSlash(name), | ||
| Mode: 0644, | ||
| Size: int64(len(body)), | ||
| ModTime: time.Now(), |
There was a problem hiding this comment.
IMHO, the current implementation of modTime is also wrong because it just sets the time when helm package was run. If it wants to be correct, it should set the original template files timestamp.
Also, zero-ing out timestamps is a common pattern for reproducible builds GoogleContainerTools/distroless#242
There was a problem hiding this comment.
May be this can be added as a flag to helm package command for those of us want to have reproducible charts.
There was a problem hiding this comment.
Everyone wants a reproducible build. It does not need to be hidden behind a flag. What I am trying to convey is that stripping mtime from the file is NOT the right way to go about addressing the issue.
#3612 (comment) suggests the issue is not due to timestamps, but instead the sorted order of the files within the tar file.
Can you please look further into the suggestions made in that thread before coming up with a solution. Even if you strip mtime fields from the archive, you may still not produce a reproducible build. We've looked into this issue multiple times and discovered it's not the mtime of the original files causing the issue here. Or at least, there are multiple layers causing this issue, and to solve it we need to address them all.
We tested this theory in another area and discovered stripping mtime from the archive does not resolve the issue. See #8216 (review).
In other words, it is possible that
- preserving the original file's mtime, and
- sorting files by name (as you've done so in this PR)
May fix this issue. but without any test infrastructure in place to confirm, we cannot prove it.
If you're attempting to fix something, please provide tests to ensure that your change addresses the issue.
There was a problem hiding this comment.
Can we add a backwards-compatible flag, like helm package --timestamp where timestamp defaults to true, but users can set it to false for reproducibility on that level.
There was a problem hiding this comment.
As discussed in the developer call today, the first step would be to write a test to identify the symptom. Then write a fix to alleviate the symptom.
It is possible that we only need to sort the files in the archive to fix the issue, which means there would not be any need for stripping file mtimes (and therefore no need for a --timestamp flag).
We did test the theory of stripping file mtimes before and that did not allay the symptom. @jdolitsky can also attest to this as he and his coworker tried to fix this for helm chart save. So I'm not confident that stripping file mtimes will fix the problem.
Let's get some tests in first before we start providing solutions.
There was a problem hiding this comment.
This should be tried without the timestamp going to 0. But, I suspect any timestamp difference will cause a different hash. A different time stored in the tar when everything else is the same is still different content that would produce a different hash.
In another part of the PR there is sorting on the content. The order of the files matters and I hope the change here solves the issue. I have not checked, though.
If the timestamp needs to be 0 then that would need to be hidden behind a flag. We can't break backwards compatibility. I would suggest a name like --reproducible-build.
There was a problem hiding this comment.
Please be aware that Git doesn't preserve modification time on files. So an Application Distributor cannot create a reproducible build by preserving the original file's modification time because the modified time will be different whenever the chart source checks out. I think using a flag (--reproducible-build) and setting timestamp to 0 looks acceptable to me—some tests to avoid regression would be great.
mattfarina
left a comment
There was a problem hiding this comment.
Thank you for putting this pull request together.
One thing this PR needs is tests to make sure the tarball build is reproducible. We need to build a package at least twice and compare the hash of them to verify the fix works.
This is important for Helm so that we can catch a regression in the future.
|
@tamalsaha are you interested or able to write the tests for this? If not, someone else can pick it up. |
If someone else can pick up that will be really great! Thanks. |
Signed-off-by: Tamal Saha [email protected]
What this PR does / why we need it:
Currently
helm packagedoes not produce reproducible tarballs. This change sets the timestamps to0to fix this. We store the tarballs in git repos. This helps us avoid unnecessary changes everytime we runhelm package.See also: https://unix.stackexchange.com/a/438330/42136
Special notes for your reviewer:
If applicable: