Add performance regression comparison in CI #2370
Conversation
I agree with the idea of having different scales for different benchmarks, but I am not sure about the numbers. How about running this check in CI a couple of times to collect running-time data, and then using that data to decide the scale numbers?
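As a rough illustration of that idea (the file layout and metric names here are hypothetical, not benchcomp's actual output format), the spread per benchmark across a few repeated CI runs could be summarized like this:

```python
import json
import statistics
from pathlib import Path

# Hypothetical layout: each CI run saved a JSON file mapping benchmark
# name to its solver/symex runtime in seconds. The real benchcomp
# output format may differ.
runs = [json.loads(p.read_text()) for p in Path("ci-runs").glob("*.json")]

benchmarks = set().union(*(run.keys() for run in runs))
for name in sorted(benchmarks):
    times = [run[name] for run in runs if name in run]
    if len(times) < 2:
        continue
    mean = statistics.mean(times)
    # Coefficient of variation: the relative spread across repeated runs,
    # which hints at how large each benchmark's threshold needs to be.
    spread = statistics.stdev(times) / mean if mean else 0.0
    print(f"{name}: mean={mean:.2f}s spread={spread:.1%}")
```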
celinval left a comment
Just a nit: this PR adds a new workflow for PRs, but it isn't blocking yet. AFAIK, making it blocking is a GitHub configuration, and I think we should activate it only after we adjust the scale so that this job won't give too many false positives.
Thanks, I've changed the commit message. Agreed about leaving this to bake for a while before we make it blocking.
@qinheping here are my thoughts on which numbers to use: I've tried a few experiments with CI and locally. I get a lot less variance locally; often all benchmarks (or all but one) have a variance <10%. This is when comparing identical Kanis. In CI, the numbers that @adpaco-aws suggested seem to work well for now. I do think it would be a good idea to optimize these numbers, but I would prefer to see what the range is on realistic PRs---that is, PRs that actually change the implementation of Kani and so cause an actual performance difference. What I propose:
I suggest that we merge this PR first, though. I would be inclined to merge soon and start getting data as quickly as we can, rather than trying to get the numbers exactly right up front, especially since we don't have data for realistic runs.
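To make that concrete, here is one purely illustrative way the collected data could later be turned into per-benchmark thresholds; the variable names, sample values, and the 20% safety margin are assumptions of mine, not anything agreed in this thread:

```python
# Hypothetical data: per benchmark, the largest relative change observed
# on PRs that should not have affected performance.
observed_noise = {"bench_a": 0.07, "bench_b": 0.18}

def pick_threshold(noise: float, floor: float = 0.10) -> float:
    # Allow 20% headroom above the worst observed noise, but never go
    # below the 10% default used in this PR.
    return max(floor, noise * 1.2)

thresholds = {name: pick_threshold(noise) for name, noise in observed_noise.items()}
print(thresholds)
```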
tautschnig left a comment
I think we need to make sure we have access to metrics with very low noise.
Sounds great! Thank you @karkhaz.
tautschnig left a comment
Got convinced that this is a start, but I do maintain that we need to dig deeper to figure out what the sources of performance variability are.
I agree. I think we should merge the PR but only make it required once we figure that out.
This commit adds a CI job that runs the `benchcomp` tool on the
performance regression suite, comparing the HEAD of the pull request to
the branch that the PR targets.
The CI job fails if any performance benchmark regresses when run using
the HEAD version of Kani with respect to the 'base' branch. Regression
is defined as a regression in symex or solver time of 10% for any
benchmark for which this value is >2s, or if any performance benchmark
fails with the HEAD version while passing with the base.
This fixes #2277.
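A minimal sketch of that regression rule as a Python predicate (the function, field names, and record layout are mine, not benchcomp's actual API):

```python
def regressed(old: dict, new: dict) -> bool:
    """Return True if the HEAD run should fail the CI job.

    `old` and `new` are hypothetical per-benchmark records with a
    `success` flag and symex/solver times in seconds; benchcomp's
    actual data model may differ.
    """
    # A benchmark that passed on the base branch but fails on HEAD is
    # always a regression.
    if old["success"] and not new["success"]:
        return True
    for metric in ("symex_time", "solver_time"):
        # Only compare values large enough to be meaningful (>2s), and
        # flag an increase of more than 10%.
        if old[metric] > 2.0 and new[metric] > old[metric] * 1.10:
            return True
    return False
```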
Call-outs:
This is currently running the 'old' Kani on the old perf suite, and the 'new' Kani on the new suite.
There seems to be a problem with finding the stdlib otherwise. This means that benchmarks that get renamed will not be checked in the PR that renames them (though they will start to be checked again once the PR is merged).
Testing:
Here is a run that failed because one of the benchmarks regressed. Compared with that run, I've changed this PR so that numerical metrics that are less than 10 are not compared between the old and new checkouts.
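If the comparison is written as a per-metric test like the sketch above, that change amounts to skipping small base values altogether (names and layout hypothetical):

```python
# Hypothetical per-metric test: metrics below 10 on the base checkout
# are not compared at all; otherwise a >10% increase counts as a
# regression.
def metric_regressed(old: float, new: float) -> bool:
    return old >= 10 and new > old * 1.10
```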
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.