@mrwilson
Contributor

@mrwilson mrwilson commented Mar 18, 2020

Overview

This PR makes the Java bytecode generated by this project reproducible. This means that under identical build conditions (for example, the same Java version), repeated builds should produce the same output byte-for-byte.

It achieves this by:

  • supporting SOURCE_DATE_EPOCH as a way of fixing build-time-specific metadata added to JARs
  • using Gradle's reproducible archives functionality to fix timestamps and file ordering within JARs
  • re-archiving JARs that are non-deterministic because they use jar to rewrite their contents after build.
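
To illustrate the first point, here is a minimal Java sketch of how a build could pin its notion of "build time" to SOURCE_DATE_EPOCH. The names `BuildClock` and `resolve` are hypothetical and are not taken from this PR, which changes the Gradle build rather than Java code; the sketch only shows the general idea of preferring the environment variable over the wall clock.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical helper, not part of this PR.
final class BuildClock {

    /**
     * Returns the instant described by SOURCE_DATE_EPOCH (seconds since the
     * Unix epoch), or the given fallback when the variable is unset or blank.
     */
    static Instant resolve(Map<String, String> env, Instant fallback) {
        String raw = env.get("SOURCE_DATE_EPOCH");
        if (raw == null || raw.isBlank()) {
            return fallback;
        }
        return Instant.ofEpochSecond(Long.parseLong(raw.trim()));
    }

    public static void main(String[] args) {
        // Uses the real environment; falls back to "now" for ordinary builds.
        System.out.println(resolve(System.getenv(), Instant.now()));
    }
}
```

Passing the environment in as a map keeps the helper easy to exercise without touching the real process environment.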

Confirming the change

#!/bin/bash -e

rm -f checksums*

# Fix build time
export SOURCE_DATE_EPOCH=$(date +%s)

./gradlew clean build -x test --parallel --rerun-tasks

find . -name '*.jar' \
    | grep '/build/libs/' \
    | grep -v 'javadoc' \
    | sort \
    | xargs sha256sum > checksums-1.txt

./gradlew clean build -x test --parallel --rerun-tasks

find . -name '*.jar' \
    | grep '/build/libs/' \
    | grep -v 'javadoc' \
    | sort \
    | xargs sha256sum > checksums-2.txt

diff checksums-{1,2}.txt

The diff should be empty; that is, both independent runs of the build should have generated byte-for-byte identical outputs.
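
For anyone who prefers to run the same comparison from Java rather than a shell, the following is a rough sketch of the checksum step. `JarChecksums` and its methods are hypothetical names; the filtering mirrors the script above (JARs under `build/libs`, excluding javadoc JARs) and is otherwise an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of this PR.
final class JarChecksums {

    /** Returns the lowercase hex SHA-256 digest of a file. */
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Maps each non-javadoc JAR under a build/libs directory to its digest, sorted by path. */
    static SortedMap<String, String> checksums(Path root) throws IOException, NoSuchAlgorithmException {
        List<Path> jars;
        try (Stream<Path> paths = Files.walk(root)) {
            jars = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        SortedMap<String, String> result = new TreeMap<>();
        for (Path jar : jars) {
            String relative = "/" + root.relativize(jar).toString().replace('\\', '/');
            if (relative.endsWith(".jar") && relative.contains("/build/libs/") && !relative.contains("javadoc")) {
                result.put(relative, sha256(jar));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        checksums(Path.of(".")).forEach((path, sum) -> System.out.println(sum + "  " + path));
    }
}
```

Two builds are then considered identical when the maps returned for each run are equal.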

Omitting Javadocs

The Javadoc outputs remain non-deterministic: despite the inclusion of noTimestamp(true) in the project's javadoc configuration, the comment header is still being generated. It looks like the javadoc mechanism in Gradle isn't honouring this flag.

I wasn't able to solve this problem, but solving it would be necessary to achieve full reproducibility.


I hereby agree to the terms of the JUnit Contributor License Agreement. ✅


Definition of Done

@mrwilson mrwilson changed the title Make jar builds reproducible [WIP] Make jar builds reproducible Mar 18, 2020
@mrwilson mrwilson changed the title [WIP] Make jar builds reproducible Make (most) jar builds reproducible Mar 19, 2020
Member

@sormuras sormuras left a comment


lgtm -- @marcphilipp should take a look, too.

@sbrannen
Member

Tentatively slated for 5.7 M1 for team discussion

@marcphilipp
Member

The JARs that it does not apply to are junit-platform-commons and junit-platform-console, as these use direct calls to the jar command as part of their builds, which introduces non-determinism through the last modified timestamps.

Since all other jars use junit-platform-commons, merging this would have limited value. Is there an option for the jar command to use a fixed timestamp or can we add another task action to reset the timestamps of the jars?

@marcphilipp marcphilipp modified the milestones: 5.7 M1, General Backlog Mar 19, 2020
@mrwilson
Contributor Author

mrwilson commented Mar 19, 2020

As far as I'm aware, jar doesn't have the capacity to set timestamps, so we'd need to go with option 2 - happy to add that.

I imagine something that loops over the contents of the archive and sets last modified to the value of buildTimeAndDate, which can be pinned to SOURCE_DATE_EPOCH.
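
A rough Java sketch of that idea follows: copy every entry into a new archive and give each one the same fixed timestamp. `JarTimestampNormalizer` is a hypothetical name and this is not the code from the PR; it uses only `java.util.zip` and ignores details such as entry comments, extra fields and stored (uncompressed) entries.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of this PR.
final class JarTimestampNormalizer {

    /** Copies source to target, setting every entry's last-modified time to fixedMillis. */
    static void rewrite(Path source, Path target, long fixedMillis) throws IOException {
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(source));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                // A fresh entry carries over only the name, so no original metadata leaks through.
                ZipEntry copy = new ZipEntry(entry.getName());
                // Note: setTime converts to MS-DOS time using the JVM's default time zone,
                // so the time zone is part of the "identical build conditions".
                copy.setTime(fixedMillis);
                out.putNextEntry(copy);
                in.transferTo(out); // reads up to the end of the current entry
                out.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarTimestampNormalizer.java <source.jar> <target.jar> <epoch-millis>
        rewrite(Path.of(args[0]), Path.of(args[1]), Long.parseLong(args[2]));
    }
}
```

Entry order is preserved from the source archive, so this only helps if the order is already stable.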

@codecov

codecov bot commented Mar 20, 2020

Codecov Report

Merging #2217 into master will not change coverage.
The diff coverage is n/a.


@@            Coverage Diff            @@
##             master    #2217   +/-   ##
=========================================
  Coverage     91.21%   91.21%           
  Complexity     4466     4466           
=========================================
  Files           383      383           
  Lines         10724    10724           
  Branches        869      869           
=========================================
  Hits           9782     9782           
  Misses          728      728           
  Partials        214      214           

Powered by Codecov. Last update 6cde364...b4ad488. Read the comment docs.

@mrwilson
Contributor Author

mrwilson commented Mar 20, 2020

@mrwilson mrwilson changed the title Make (most) jar builds reproducible Make JAR builds reproducible Mar 20, 2020
@marcphilipp marcphilipp removed this from the General Backlog milestone Mar 21, 2020
Member

@marcphilipp marcphilipp left a comment


Thanks!

@mrwilson
Contributor Author

Those commits have been re-written, and the test script only returns a diff for buildSrc as discussed earlier in the PR.

The jar-rewriting has been moved to its own file under buildSrc and is imported only in the two projects that need it. Feels much cleaner!

@mrwilson mrwilson requested a review from marcphilipp March 21, 2020 16:54
@mrwilson
Contributor Author

This looks like it's almost ready to go - is there a sensible place to record this feature in the User Guide / Release Notes? (Last remaining unchecked box)

@marcphilipp
Member

I'd say as a new section in the appendix. WDYT?

@mrwilson
Contributor Author

mrwilson commented Mar 31, 2020

That sounds good, will add something there.

I've also added another GitHub action which will perform multiple assemble steps and compare their checksums, to ensure that future changes don't regress this property.

EDIT: This is now done and working - ready for re-review @marcphilipp

This change uses Gradle's reproducible archives feature (https://docs.gradle.org/current/userguide/working_with_files.html#sec:reproducible_archives) to consistently build the output JARs.

Additionally, the build now supports SOURCE_DATE_EPOCH to allow overriding of `buildTimeAndDate` as this introduces non-determinism into the build.
This uses the same fixed-time constant as is being used when `isPreserveFileTimestamps` is set to false.
@mrwilson mrwilson requested a review from sormuras April 1, 2020 18:48
Member

@marcphilipp marcphilipp left a comment


Looks great, only found two minor things!

@mrwilson mrwilson requested a review from marcphilipp April 2, 2020 17:41
@mrwilson
Contributor Author

mrwilson commented Apr 2, 2020

Thanks for the feedback, made changes as suggested. 👍

Member

@marcphilipp marcphilipp left a comment


👍

@marcphilipp marcphilipp added this to the 5.7 M1 milestone Apr 3, 2020
@marcphilipp marcphilipp merged commit 5631ebc into junit-team:master Apr 3, 2020
@marcphilipp
Member

Thanks a lot, @mrwilson! 🎉

@jbduncan
Contributor

jbduncan commented Apr 3, 2020

Glad to see this merged in! Great work @mrwilson. 🎉

@vlsi
Contributor

vlsi commented Apr 3, 2020

Frankly speaking, I see no reason to enforce timestamps in jar files.
It could be OK to just leave them as is (==0 timestamps) and avoid the repackage step.


It is required to capture the build environment: timestamp, and Java version/vendor.

Even though most jar files have Build-Date and Build-Time headers, it is not clear how to reproduce the JUnit build.
What value should be passed to SOURCE_DATE_EPOCH ?

val jarEntry = JarEntry(entry.name)

// Use the same constant as the fixed timestamps in normal copy actions
jarEntry.time = ZipCopyAction.CONSTANT_TIME_FOR_ZIP_ENTRIES
Contributor


@mrwilson, this seems to specify the same constant value for all the jars. Is isPreserveFileTimestamps = false enough then? What is the purpose of rewriting the jar?

Contributor Author

@mrwilson mrwilson Apr 3, 2020


This is just for two specific cases where the actual jar command is being used to manipulate the artifact after it's been created by Gradle, which alters the timestamps and makes the build non-reproducible - I tried to replicate what the jar tool was doing in pure Gradle, with no success.

As an immediate solution, this resets all the timestamps of the two artifacts in question - the choice of timestamp was informed by what the isPreserveFileTimestamps = false would do if this step wasn't necessary.

I hope that helps.

Contributor


Thanks, that clarifies things. It would probably be great if something like that was in the comment above.

@hboutemy

FYI, I just added Gradle builds support to reproducible-central
and I tested JUnit5 latest release: https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/org/junit/junit5/README.md
sorry, there are still a few issues

@sormuras
Member

The results linked via 🔍 show differences like:

│ -Created-By: 11.0.14.1 (Oracle Corporation 11.0.14.1+1)
│ +Created-By: 11.0.13 (Eclipse Adoptium 11.0.13+8)

Seems like not all "external" properties are stable - or not all expected differences are excluded from the comparison.
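
To spot this kind of difference quickly, two manifests can be compared attribute by attribute. The sketch below uses only `java.util.jar`; `ManifestDiff` and its methods are hypothetical names, and it looks at main attributes only, ignoring per-entry sections.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the JUnit build.
final class ManifestDiff {

    /** Returns the names of main attributes whose values differ or exist on one side only. */
    static SortedSet<String> differingMainAttributes(Manifest left, Manifest right) {
        Attributes a = left.getMainAttributes();
        Attributes b = right.getMainAttributes();
        Set<Object> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        SortedSet<String> differing = new TreeSet<>();
        for (Object key : keys) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                differing.add(key.toString());
            }
        }
        return differing;
    }

    /** Parses manifest text; every line, including the last, must end with a newline. */
    static Manifest parse(String text) throws IOException {
        return new Manifest(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        Manifest left = parse("Manifest-Version: 1.0\nCreated-By: 11.0.14.1 (Oracle Corporation 11.0.14.1+1)\n");
        Manifest right = parse("Manifest-Version: 1.0\nCreated-By: 11.0.13 (Eclipse Adoptium 11.0.13+8)\n");
        System.out.println(differingMainAttributes(left, right));
    }
}
```

A comparison tool could use such a list to decide which attributes count as expected, environment-specific differences.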

What is the latest result for JUnit 5.10.1?

@hboutemy

> What is the latest result for JUnit 5.10.1?

with my limited knowledge of Gradle, I can't make the build work: I need help

$ ./gradlew --no-daemon clean assemble publishToMavenLocal -x test -x signMavenPublication
...
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':junit-bom:publishMavenPublicationToMavenLocal'.
> Failed to publish publication 'maven' to repository 'mavenLocal'
   > Invalid publication 'maven': artifact file does not exist: '/tmp/junit-bom/junit-bom/build/publications/maven/pom-default.xml.asc'
...

Do you know how I can fix this?

@sormuras
Member

The command we use in checkBuildReproducibility.sh reads:

./gradlew --no-build-cache clean assemble --parallel -Porg.gradle.java.installations.auto-download=false -Dscan.tag.Reproducibility

Does this help?

@hboutemy

I tried to run this command: gives build success, but I don't know what this success means

Then I tried to modify source code and rebuild: same success, even if the expected result for modified code is failure because I obviously should not get the same binaries as the 5.10.1 release

then I don't know what this command does, if it does not do what I expect = check local build output against content downloaded from Maven Central

I'll try to have a look at checkBuildReproducibility.sh to figure out...

@marcphilipp
Member

@hboutemy Our release builds currently require signing but you can use any PGP key you want. The key id in your local GPG can be configured by passing -Psigning.gnupg.keyName=<your-key-id> (e.g. the actual key would be FF6E2C001948C5F2F38B0CC385911F425EC61B51). I'll add a build option to future releases so that signing can be disabled.

@hboutemy

thanks @marcphilipp for helping: ok, I need to drop the -x signMavenPublication, then I'm now able to build 5.10.1

another question: is there a way to configure the publishToMavenLocal output to another place than default .m2/repository? This is a flexibility that would ease my work on creating a rebuild script

@hboutemy

for now, I checked manually (please continue helping me on the ~/.m2/repository part), and found that issues from 5.8.2 remain in 5.10.1, for example:

├── META-INF/MANIFEST.MF
│ @@ -1,17 +1,17 @@
│  Manifest-Version: 1.0
│ -Build-Date: 2023-11-17
│ +Build-Date: 2023-11-05
│  Build-Revision: e5f50d8720741623915979ac154b65bcbcd6a577 
│ -Build-Time: 09:46:53.210+0100
│ +Build-Time: 17:49:13.996+0100
│  Built-By: JUnit Team
│  Bundle-ManifestVersion: 2
│  Bundle-Name: JUnit Vintage Engine
│  Bundle-SymbolicName: junit-vintage-engine
│  Bundle-Version: 5.10.1
│ -Created-By: 17.0.8 (Azul Systems, Inc. 17.0.8+7-LTS)
│ +Created-By: 17.0.5 (BellSoft 17.0.5+8-LTS)
│  Export-Package: org.junit.vintage.engine;version="5.10.1";status=INTER
│   NAL;mandatory:=status;uses:="org.apiguardian.api,org.junit.platform.e
│   ngine",org.junit.vintage.engine.descriptor;version="5.10.1";status=IN
│   TERNAL;mandatory:=status;uses:="org.apiguardian.api,org.junit.platfor
│   m.engine,org.junit.platform.engine.support.descriptor,org.junit.runne
│   r,org.junit.runner.manipulation",org.junit.vintage.engine.discovery;v
│   ersion="5.10.1";status=INTERNAL;mandatory:=status;uses:="org.apiguard

for the Build-Time and Build-Date , maybe there is a command line attribute to add for me to get the same binaries as what is published to Maven Central
for the Created-By: 17.0.8 (Azul Systems, Inc. 17.0.8+7-LTS), it's a pain to have that detailed info on the JDK because I need to install exactly the same one (hoping I can find it)

@sormuras
Member

sormuras commented Nov 17, 2023

> for the Build-Time and Build-Date , maybe there is a command line attribute to add for me to get the same binaries as what is published to Maven Central

Set the SOURCE_DATE_EPOCH environment variable, it's picked up by the build.

# Fix build time
export SOURCE_DATE_EPOCH=$(date +%s)

Details: https://reproducible-builds.org/docs/source-date-epoch/
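
To rebuild an already published artifact, the value has to match the original build rather than the current time. One way to get a candidate value, assuming the Build-Date and Build-Time manifest headers reflect the original build time, is to convert them to epoch seconds. The sketch below does that for the header values shown in the diff earlier in this thread; `ManifestBuildTime` is a hypothetical name, fractional seconds are dropped because SOURCE_DATE_EPOCH is expressed in whole seconds, and nothing in this thread confirms that the build reproduces the exact header text from such a value.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the JUnit build.
final class ManifestBuildTime {

    // Matches "<Build-Date> <Build-Time>", e.g. "2023-11-05 17:49:13.996+0100".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.SSSZ");

    /** Converts the two manifest header values to seconds since the Unix epoch. */
    static long toEpochSeconds(String buildDate, String buildTime) {
        return OffsetDateTime.parse(buildDate + " " + buildTime, FORMAT).toEpochSecond();
    }

    public static void main(String[] args) {
        // Header values taken from the manifest diff posted above.
        System.out.println(toEpochSeconds("2023-11-05", "17:49:13.996+0100"));
    }
}
```

The printed number could then be exported as SOURCE_DATE_EPOCH before invoking the build.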

@marcphilipp
Member

@hboutemy Could you please raise a new issue where we can discuss dropping the Created-By attribute?

@marcphilipp
Member

> another question: is there a way to configure the publishToMavenLocal output to another place than default .m2/repository? This is a flexibility that would ease my work on creating a rebuild script

Yes, you can call publishAllPublicationsToTempRepository instead of publishToMavenLocal to write to build/repo instead.

@marcphilipp
Member

> thanks @marcphilipp for helping: ok, I need to drop the -x signMavenPublication, then I'm now able to build 5.10.1

I've just added an option to disable signing that will help with future releases: 6238d55

@sormuras
Member

> Set the SOURCE_DATE_EPOCH environment variable, ...

Perhaps we can store the value used for SOURCE_DATE_EPOCH inside the Manifest.mf file for easier consumption? 🤔

@vlsi
Contributor

vlsi commented Nov 17, 2023

> Created-By: 17.0.8 (Azul Systems, Inc. 17.0.8+7-LTS), it's a pain to have that detailed info on the JDK because I need to install exactly the same one (hoping I can find it)

In practice, different JVM can produce different jars, so having a detailed JVM version does help.
I am not sure it should be in MANIFEST though, and it might be better to store it side by side with the generated jars (just as we do not store the timestamps in the jars but store them separately).

@hboutemy

@vlsi #3559 created, as asked by @marcphilipp
let's discuss there :)

and thanks @marcphilipp for your hints, I'll be able to improve my Gradle rebuilding script in Reproducible Central to check more projects in the future
