What do performance regressions look like at Mozilla, who monitors them, and what kinds of regressions do we see? I want to answer these questions with a few peeks at the data. I have written plenty of previous blog posts outlining stats, trends, and the process. Let's recap briefly what we do, then look at the breakdown of alerts (not necessarily bugs).
When Talos uploads numbers to graph server, they get stored and eventually run through a calculation loop to find regressions and improvements. As of Jan 1, 2015, we post these alerts to mozilla.dev.tree-alerts as well as emailing the offending patch author (if they can easily be identified). A couple of folks (performance sheriffs) look at the alerts and triage them; if necessary, a bug is filed for further investigation. This brief recap of what happens to our performance numbers probably doesn't inspire anyone; what is interesting is looking at the actual data we have.
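The calculation loop itself boils down to comparing a window of datapoints before each revision with a window after it, and flagging changes that are large enough. Here is a minimal sketch of that idea in Python; the 12-point window and 5% threshold are illustrative values I picked, not what the production analysis uses.

```python
from statistics import median

def find_alerts(values, window=12, threshold=0.05):
    """Toy detector: compare the median of the `window` datapoints before
    each index with the median of the `window` datapoints after it, and
    flag changes larger than `threshold`.  The real Talos analysis is more
    involved; the window size and 5% threshold here are made up."""
    alerts = []
    for i in range(window, len(values) - window + 1):
        before = median(values[i - window:i])
        after = median(values[i:i + window])
        if before and abs(after - before) / before > threshold:
            # Most Talos suites are lower-is-better, so a positive delta is
            # a regression and a negative one is an improvement.
            alerts.append({"index": i, "before": before, "after": after,
                           "delta": (after - before) / before})
    return alerts
```

A real implementation also has to cope with noisy tests and with pinpointing which push in a range is responsible, which is where the retriggers and backfills mentioned later come in.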
Let's start with some basic facts about alerts from the last 12 months:
- We have collected 8232 alerts!
- 4213 of those alerts are regressions (the rest are improvements)
- 3780 of those alerts have a manually marked status
  - the rest have been programmatically marked as merged and associated with the original alert
- 278 bugs have been filed (or 17 alerts/bug)
  - 89 fixed!
  - 61 open!
  - 128 closed otherwise (5 invalid, 8 duplicate, 115 wontfix/worksforme)
As you can see, this is not a casual hobby; it is a real system that helps us fix and understand hundreds of performance issues.
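The numbers above are simple tallies over the alert records. As a rough sketch, assuming each alert is a dict with hypothetical 'type', 'status', and 'bug_resolution' fields (the field names in our actual tooling differ), the bookkeeping looks something like this:

```python
from collections import Counter

def summarize(alerts):
    """Tally alerts the way the stats above are broken down.
    `alerts` is assumed to be a list of dicts with illustrative
    'type' ('regression'/'improvement'), 'status' ('manual'/'merged'),
    and optional 'bug_resolution' fields."""
    return {
        "total": len(alerts),
        "regressions": sum(1 for a in alerts if a["type"] == "regression"),
        "manually_marked": sum(1 for a in alerts if a["status"] == "manual"),
        "bug_resolutions": Counter(a["bug_resolution"]
                                   for a in alerts if a.get("bug_resolution")),
    }
```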
We generate alerts on a variety of branches; here is the breakdown of alerts per branch:
There are a few things to keep in mind here: mobile/mozilla-central/Firefox are the same branch, and the non-PGO branches only cover Linux/Windows/Android, not OS X.
Looking at that graph is not very inspiring: most of the alerts land on fx-team and mozilla-inbound, then show up on the other branches as we merge code. We run more tests/platforms and land/back out patches more frequently on mozilla-inbound and fx-team, which is why those branches have a larger number of alerts.
Given that we have so many alerts and have manually triaged them, what state do the alerts end up in?
The interesting data point here is that 43% of our alerts are duplicates. A few reasons for this:
- we see an alert on non-pgo, then on pgo (we usually mark the pgo ones as duplicates)
- we see an alert on mozilla-inbound, then the same alert shows up on fx-team, b2g-inbound, and firefox (due to merging)
- and then later we see the pgo versions on the merged branches
- sometimes we retrigger or backfill to find the root cause, which often generates a new alert
- in a few cases a patch has landed, been backed out, and relanded, leaving us with duplicate sets of alerts
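Mechanically, spotting a duplicate is mostly pattern matching: same test and platform, a change of similar size, seen again later on a downstream branch or PGO build. Here is a rough sketch of that kind of check; the field names and the 10% tolerance are assumptions for illustration, not what our sheriffing tools actually implement.

```python
def looks_like_duplicate(alert, candidate, tolerance=0.10):
    """Heuristic: is `candidate` probably the same change as `alert`,
    seen again after a merge or on the PGO build?  Field names and the
    10% tolerance are illustrative assumptions, not the real tooling."""
    same_test = (alert["test"] == candidate["test"]
                 and alert["platform"] == candidate["platform"])
    similar_size = (abs(alert["percent_change"] - candidate["percent_change"])
                    <= tolerance * abs(alert["percent_change"]))
    seen_later = candidate["push_timestamp"] >= alert["push_timestamp"]
    different_branch_or_build = (candidate["branch"] != alert["branch"]
                                 or candidate["is_pgo"] != alert["is_pgo"])
    return same_test and similar_size and seen_later and different_branch_or_build
```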
The last piece of information that I would like to share is the breakdown of alerts per test:
There are a few outliers, but keep in mind that active work was being done in certain areas, which explains the large number of alerts for some tests. There are 35 different test types, which would be too many to fit in one image, so I have excluded retired tests, counters, startup tests, and Android tests.
Personally, I am looking forward to the next year as we transition some tools and do some hacking on the reporting, alert generation and overall process. Thanks for reading!