@karkhaz (Contributor) commented Apr 13, 2023

This commit adds a CI job that runs the benchcomp tool on the
performance regression suite, comparing the HEAD of the pull request to
the branch that the PR targets.

The CI job fails if any performance benchmark regresses when run with the HEAD
version of Kani compared to the 'base' branch. A regression is defined as a 10%
increase in symex or solver time for any benchmark where that time is >2s, or
any performance benchmark failing with the HEAD version while passing with the
base.

This fixes #2277.
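To make the criterion concrete, here is a minimal Python sketch of the rule described above. The metric names (`symex_time`, `solver_time`, `success`) and the `base`/`head` dictionaries are illustrative assumptions; the actual check is expressed in benchcomp's configuration, not in hand-written code like this.

```python
# Illustrative sketch only: benchcomp drives this check from its config file.
# Metric names and data shapes here are assumptions, not benchcomp's schema.

def regressions(base: dict, head: dict) -> list[str]:
    """Return the benchmarks that violate the regression rule."""
    failing = []
    for name, old in base.items():
        new = head.get(name)
        if new is None:
            continue  # benchmark renamed or removed; nothing to compare

        # Passing on the base branch but failing on HEAD is always a regression.
        if old["success"] and not new["success"]:
            failing.append(name)
            continue

        # A >10% increase in symex or solver time counts as a regression,
        # but only for benchmarks where that time exceeds 2 seconds.
        for metric in ("symex_time", "solver_time"):
            if old[metric] > 2.0 and new[metric] > 1.10 * old[metric]:
                failing.append(name)
                break
    return failing
```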

Call-outs:

This is currently running the 'old' Kani on the old perf suite and the 'new' Kani on the new suite; there seems to be a problem with finding the stdlib otherwise. This means that benchmarks that get renamed will not be checked in the PR that renames them (though they will start being checked again once the PR is merged).

Testing:

  • How is this change tested?

Here is a run that failed because one of the benchmarks regressed. Compared with that run, I've changed this PR so that numerical metrics that are less than 10 are not compared between the old and new checkouts.
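As a rough illustration of that cutoff (the value 10 is the one mentioned above; the function name and everything else here is an assumption, not the actual benchcomp configuration):

```python
def comparable(old_value: float, new_value: float, minimum: float = 10.0) -> bool:
    # Skip metrics whose values are too small to compare meaningfully;
    # tiny measurements are dominated by run-to-run noise.
    return old_value >= minimum and new_value >= minimum
```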

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch from 08d2bdf to f4b9875 on April 13, 2023 14:14
@karkhaz karkhaz marked this pull request as ready for review April 13, 2023 15:09
@karkhaz karkhaz requested a review from a team as a code owner April 13, 2023 15:09
@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch 2 times, most recently from f2fa8df to 11be08e on April 13, 2023 15:20
@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch from 11be08e to 9cce08c on April 13, 2023 15:37
@qinheping (Contributor)

This may frequently block on benchmarks which take a small amount of time (e.g., less than 5s). Maybe we should define different scales, for example:

* Less than 1s: Block on >3s

* Less than 30s: Block on >1min

* Anything else: Block on >1.1x

Of course, I haven't experimented on these particular values, so more scales or tweaks might be needed.

I agree with the idea of having different scales for different benchmarks, but I am not sure about the numbers. How about running this check in CI a couple of times to collect running-time data, and then using that data to decide the scale numbers?
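For reference, the tiered scheme quoted above could look roughly like the following sketch; the cut-off values are exactly the ones proposed, and nothing here has been validated against real runs.

```python
def blocks(base_seconds: float, head_seconds: float) -> bool:
    """Decide whether a timing change should block the PR, using the
    tiered thresholds proposed above (illustrative values only)."""
    if base_seconds < 1.0:
        return head_seconds > 3.0             # tiny benchmarks: absolute bound
    if base_seconds < 30.0:
        return head_seconds > 60.0            # small benchmarks: looser absolute bound
    return head_seconds > 1.1 * base_seconds  # everything else: relative bound
```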

@celinval (Contributor) left a comment

Just a nit: this PR is adding a new workflow for PRs, but it isn't blocking yet. AFAIK, that is a GitHub configuration, which I think we should activate only after we adjust the scale to make sure this job won't give too many false positives.

@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch from 9cce08c to 498731e on April 14, 2023 14:32
@karkhaz karkhaz changed the title from "Add PR-blocking performance regression run (#2362)" to "Add performance regression comparison in CI" on Apr 14, 2023
@karkhaz (Contributor, Author) commented Apr 14, 2023

Just a nit: this PR is adding a new workflow for PRs, but it isn't blocking yet. AFAIK, that is a GitHub configuration, which I think we should activate only after we adjust the scale to make sure this job won't give too many false positives.

Thanks, I've changed the commit message. Agreed about leaving this to bake for a while before we make it blocking.

@karkhaz (Contributor, Author) commented Apr 14, 2023

@qinheping here are my thoughts on which numbers to use:

I've tried a few experiments in CI and locally. I get a lot less variance locally: often all benchmarks (or all but one) have a variance of <10%, and this is when comparing identical Kani builds.

In CI, the numbers from @adpaco-aws seem to work well for now. I do think it would be a good idea to optimize these numbers, but I would prefer to see what the range is on realistic PRs, that is, PRs that actually change the implementation of Kani and so cause an actual performance difference.

What I propose:

  • In a future PR, I'll save the results.yaml file as an artifact. This will allow me to download the results of several runs over the last 30 days.
  • benchcomp visualize can be used locally on a results.yaml file, without re-running the benchmarks. So if we download a lot of results.yaml files, we can try lots of different numbers in the config file and see which ones are best suited for checking that wide range of result files.

I suggest that we merge this PR first, though. I would be inclined to merge quickly and start getting data as quickly as we can, rather than trying to get it right from now, especially since we don't have data for realistic runs.
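A rough sketch of how a batch of downloaded results.yaml files could be used to tune the threshold offline; the file layout assumed here (a top-level benchmarks mapping with per-variant timings) is a guess rather than benchcomp's actual schema.

```python
import glob

import yaml  # PyYAML, assumed to be available


def flagged_fraction(pattern: str, ratio: float) -> float:
    """Fraction of saved runs that a candidate regression ratio would flag."""
    flagged = total = 0
    for path in glob.glob(pattern):
        with open(path) as handle:
            results = yaml.safe_load(handle)
        total += 1
        for metrics in results.get("benchmarks", {}).values():
            old = metrics.get("base", {}).get("solver_runtime", 0.0)
            new = metrics.get("head", {}).get("solver_runtime", 0.0)
            if old > 0 and new > ratio * old:
                flagged += 1
                break
    return flagged / total if total else 0.0


# Example: how often would a 10% threshold have blocked the saved runs?
# print(flagged_fraction("results/*.yaml", 1.10))
```

Sweeping the ratio over a few candidate values on real result files would show which threshold keeps false positives acceptably rare.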

@tautschnig (Member) left a comment

I think we need to make sure we have access to metrics that are very low noise.

@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch 2 times, most recently from a9523c7 to bd483f2 on April 14, 2023 16:36
@qinheping (Contributor)

(quoting @karkhaz's proposal above in full)

Sounds great! Thank you @karkhaz.
Yes, we also want to save the results from realistic runs and review them periodically to detect accumulating performance regressions.

@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch from bd483f2 to 20ad552 on April 14, 2023 17:44
@tautschnig (Member) left a comment

Got convinced that this is a start, but I do maintain that we need to dig deeper to figure out what the sources of performance variability are.

@celinval (Contributor)

Got convinced that this is a start, but I do maintain that we need to dig deeper to figure out what the sources of performance variability are.

I agree. I think we should merge the PR but only make it required once we figure that out.

@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch 8 times, most recently from 71ec9e6 to 60c0c43 on April 17, 2023 21:12
@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch 4 times, most recently from fdaa1fa to eae21c3 on April 17, 2023 22:04
This commit adds a CI job that runs the `benchcomp` tool on the
performance regression suite, comparing the HEAD of the pull request to
the branch that the PR targets.

The CI job fails if any performance benchmark regresses when run with the HEAD
version of Kani compared to the 'base' branch. A regression is defined as a 10%
increase in symex or solver time for any benchmark where that time is >2s, or
any performance benchmark failing with the HEAD version while passing with the
base.

This fixes model-checking#2277.
@karkhaz karkhaz force-pushed the kk-benchcomp-kani-perf branch from eae21c3 to aad0cbb on April 17, 2023 22:04
@karkhaz karkhaz merged commit 00c19e2 into model-checking:main Apr 17, 2023
@karkhaz karkhaz deleted the kk-benchcomp-kani-perf branch April 17, 2023 22:43
@tautschnig tautschnig mentioned this pull request Apr 20, 2023