Add option to skip maven execution if source is older than output#1502
Add option to skip maven execution if source is older than output#1502wing328 merged 1 commit intoOpenAPITools:masterfrom
Conversation
There was a problem hiding this comment.
Thanks for the PR. I'm not too familiar with the maven plugin, but I thought I would comment on the PR itself.
One issue we might have with skipping generation when there have been no new spec changes by file age of the input spec is that it's possible there have been spec changes between when the output directory was created and when the output directory was modified. An example would be this scenario:
- generate
- edit spec
- create some nested directory under output directory (this could happen if you build the output directory in-place or clean a previous build)
- edit spec
- regenerate
Here's a mock example in terminal:
# create spec file with mtime in the past
$ touch -a -m -t 201801181205.09 spec.json
# create output directory some time (5 min) after spec modification
$ mkdir output
$ touch -a -m -t 201801181210.09 output
# check that mtimes are spec.json < output
$ ls -la --full-time
total 0
drwxr-xr-x 4 jim wheel 128 2018-11-20 23:11:26.670343174 -0500 .
drwxrwxrwt 8 root wheel 256 2018-11-20 23:11:14.882538014 -0500 ..
drwxr-xr-x 2 jim wheel 64 2018-01-18 12:10:09.000000000 -0500 output
-rw-r--r-- 1 jim wheel 0 2018-01-18 12:05:09.000000000 -0500 spec.json
# pretend we've compiled code under the output directory, resulting in nested directories
$ mkdir -p output/build
# check mtimes again output is now updated to today
$ ls -la --full-time
total 0
drwxr-xr-x 4 jim wheel 128 2018-11-20 23:11:26.670343174 -0500 .
drwxrwxrwt 8 root wheel 256 2018-11-20 23:11:14.882538014 -0500 ..
drwxr-xr-x 3 jim wheel 96 2018-11-20 23:11:55.603842334 -0500 output
-rw-r--r-- 1 jim wheel 0 2018-01-18 12:05:09.000000000 -0500 spec.json
# edit spec in some way
$ cat >> spec.json <<EOF
> {
> }
> EOF
# check mtimes, notice spec.json > output
# I think we can agree that most would probably expect generation here
$ ls -la --full-time
total 4
drwxr-xr-x 4 jim wheel 128 2018-11-20 23:11:26.670343174 -0500 .
drwxrwxrwt 8 root wheel 256 2018-11-20 23:11:14.882538014 -0500 ..
drwxr-xr-x 3 jim wheel 96 2018-11-20 23:11:55.603842334 -0500 output
-rw-r--r-- 1 jim wheel 4 2018-11-20 23:12:37.773754953 -0500 spec.json
# simulate a "clean" build operation
# I agree this _might_ be ok to skip, but seems this logic always skips.
$ rm output/build/
$ ls -la --full-time
total 4
drwxr-xr-x 4 jim wheel 128 2018-11-20 23:11:26.670343174 -0500 .
drwxrwxrwt 8 root wheel 256 2018-11-20 23:11:14.882538014 -0500 ..
drwxr-xr-x 2 jim wheel 64 2018-11-20 23:21:36.119196033 -0500 output
-rw-r--r-- 1 jim wheel 4 2018-11-20 23:12:37.773754953 -0500 spec.jsonIn the above example, I would personally expect regeneration in both of the last two examples since both occurred after the spec document was changed. It seems to me that in this PR, the second to last would not be skipped while the last example would be skipped.
Do you think it might make sense to store a checksum of the specification document in the output directory (e.g. output/.openapi-generator/checksum), then evaluate that checksum and skip generation if the spec's checksum is the same? I think this would achieve the same goal and would avoid any potential surprises, like if a generation occurred after the last example above.
We'd previously discussed storing some details about the original spec in the output directory and I think a checksum or other hash of the input file's contents would be a good addition.
edit: I left out a bit of the example, and cleaned up some confusing wording in the scenario.
There was a problem hiding this comment.
If one were to reuse the property codegen.skip, doesn't that result in all generation being skipped (line 358 in this diff)?
There was a problem hiding this comment.
inputSpecFile here would reference an OpenAPI document and output refers to the output directory.
Is the intention of skipIfSourceIsOlderThanOutput to only regenerate if someone has modified the spec document since the last regeneration? If this is the case, I would suggest renaming the property to avoid confusion… something like skipIfSpecIsUnchanged might be clearer to those reading the properties list. Even after reading the issue and scanning the PR's changes, I thought this was doing something similar to skipOvewrite.
Although… my suggested skipIfSpecIsUnchanged might suggest to users that we're storing the atime/mtime of the spec (which we could definitely persisted to {output}/.openapi-generator/SOME_FILENAME).
|
Thanks for the detailed comments. I have changed the implementation to generate a SHA-256 checksum of the spec and store it in the .openapi-generator folder under the name .sha256 and then use that to compare. The generation of the checksum happens in the CodeGenMojo-class, so it only runs when using Maven. Another option would be to have DefaultGenerator generate it, but that would have higher impact. |
|
@rasmusfaber I've reviewed the changes, and I think it fixes any issues in workflow as I presented above. I'll leave this for review/merge by someone more familiar with the Maven plugin. I'll be happy to do the refactor you mentioned once this is merged, so the same strategy can be used outside of the Maven plugin (like in the gradle plugin). |
|
LGTM (but no expert in the Maven plug-in either) |
|
Greetings! @wing328 pointed out that we have similar PRs addressing similar problems, so I'd like to have a little discussion to see if we should choose one approach or both. My PR is #1397 and issue #1081.
The thing I don't understand about the difference between my approach and this approach is how it may or may not affect downstream build steps. If the maven plugin change avoids firing the code-generation step at all, this may result in fewer ripples down the dependency tree. Unfortunately I don't understand maven plugins and can't judge. It is possible that the combined techniques will be useful. Because even if the maven plugin generates code due to timestamp update, we can still only touch a minimal number of output files. Assuming the maven plugin doesn't delete the output folder first. This may or may not really matter in a java environment where recompiling some source files is fairly quick. One other thing to note: my approach will of course always generate new output files if the output is cleaned first. It basically has no relation to the clean lifecycle. Because of that, build-without-clean may result in lingering output files whose specification has since disappeared -- I don't check for removal of an output file, but perhaps I should? @rasmusfaber your thoughts appreciated. |
|
@wheezil I don't think your contributions are doing the same thing. This PR speeds builds by not wasting cycles if your spec is identical to the spec that previously generated your code. This means that once you've generated code, you assume it shouldn't regenerate unless your spec's hash changes. Maven builds are notoriously slow, which is why I don't personally use it and therefore have little experience with it. This PR helps there. Your PR is only changing files that are not identical to the contents that would be generated. While that could be useful in many cases, it would cause issues for example where someone has modified templates to output template generation time in a file header/comment. If the spec hasn't changed for 20 builds, there's not really a reason to run codegen 20 times just to update that header. I do think the work in this PR could be moved from the Maven plugin into Codegen core, so it'll work the same across all plugins/entrypoints. But I don't think that'll have an effect on your PR. The difference between "should I generate at all" and "should I replace this generated file" are significant enough to keep separate. |
|
@jimschubert Thanks for comment. I see what you mean that they are addressing separate concerns. |
|
Is this check based on file times or file contents? |
|
@jimschubert regarding performance of the build, is this time spent running the generator, or is it time spent compiling the resulting code? I have not yet used a large API so I don't know what to expect. |
|
@wheezil I'm confused by your comment about file time or contents (I can't tell if you're referring to spec or generated code). The work in this PR compares the sha hash of a spec document, and if that matches the hash stored in the existing generated code, it skips executing the generator at all. It has nothing to do with the generated files, other than the metadata file storing the hash. When you ask about referenced files, it's something I'm considering to improve upon when I move into core... most likely parsing into the OpenAPI structure, then hashing against those contents. This would mean rather than file modifications, it would regenerate on changes in our abstraction across versions (which often results in fixes or changes in generated code). This would probably need an option to allow file-based or object-based hashing for those who don't want to consider parsing differences to be spec differences. Your question about Maven is hard to answer. Maven is slow when you have many dependencies, many projects, and other factors. It doesn't have built in caching like Gradle/SBT or other JVM targeting build systems. There are options out there for speeding up builds, but they can be difficult to setup. Maven has parallel build as an option, although I think it's fairly recent. I don't know if it's better to use the exec plugin because, as I mentioned I'm not that familiar with Maven (this being the only project I touch that uses it, and I don't do much of the build management). I do know that I've had issues trying to get the exec plugin to work in the past when I was trying to integrate the gradle plugin into the build. I would personally stick with the Maven plugin, since it has cleanly into the architecture. Or, use our gradle plugin ;) |
|
@jimschubert thanks for clarifying that it compares the contents (hash) of the spec file. I was confused and thought it was using modification times. Regarding maven/gradle, I will stick with maven, since that's our team standard. Thanks again. |
|
@wheezil I understand the confusion. The original implementation used file modification time compared to output directory modification time. |
|
@rasmusfaber thanks for the PR, which has been included in the v3.3.4 release: https://twitter.com/oas_generator/status/1068772409795207168 |
PR checklist
./bin/to update Petstore sample so that CIs can verify the change. (For instance, only need to run./bin/{LANG}-petstore.shand./bin/security/{LANG}-petstore.shif updating the {LANG} (e.g. php, ruby, python, etc) code generator or {LANG} client's mustache templates). Windows batch files can be found in.\bin\windows\.master,3.4.x,4.0.x. Default:master.Description of the PR
Implementation of #1501