
Conversation

@melotic (Member) commented Jul 6, 2022

To double check:

Epic: dotnet/dnceng#2657

Unfortunately, GitHub doesn't support inline comments on Jupyter notebooks. If you could quote the section of the notebook that you have feedback on for context, that would be much appreciated!

@melotic requested a review from a team July 6, 2022 22:32
@ChadNedzlek (Contributor)

Uh... this doesn't appear to be a one pager... or anything I can read. :-)

@ChadNedzlek (Contributor)

Weird. GitHub can render the file more or less fine if you go to "view file". It's just the diff that's basically worthless. :-)

@melotic (Member Author) commented Jul 6, 2022

> Uh... this doesn't appear to be a one pager... or anything I can read. :-)

GitHub isn't good with PRs and notebooks. You can click on Files Changed -> three dots on the right -> View File to render it in the browser :/

@riarenas (Contributor) commented Jul 6, 2022

> three dots on the right -> View File to render it in the browser :/

That makes it very unfriendly to reviewers when they want to leave a comment. Let's make a proper one pager doc, and you can link your notebook to it.

@ChadNedzlek (Contributor)

This doc is really, really long... Long enough that I'm not sure it should be checked into arcade. There's maybe a sentence or two on every page saying something important... but otherwise I'm not sure what I'm supposed to be getting out of this other than "runtime is doing something weird and we should tell them not to do that" and "averages and stddev are stable, so we can just use those". I'm not sure what the "machine learning" part of this is... it's basically just saying that for a given definition, the build times form a more or less predictable distribution, but at that point it's just basic statistics.

@melotic (Member Author) commented Jul 6, 2022

> This doc is really, really long... Long enough that I'm not sure it should be checked into arcade. There's maybe a sentence or two on every page saying something important... but otherwise I'm not sure what I'm supposed to be getting out of this other than "runtime is doing something weird and we should tell them not to do that" and "averages and stddev are stable, so we can just use those". I'm not sure what the "machine learning" part of this is... it's basically just saying that for a given definition, the build times form a more or less predictable distribution, but at that point it's just basic statistics.

The one-pager outlines the data science process used to arrive at the conclusion that regressing a distribution over the duration of pipelines is a suitable and effective way to predict the duration of a given pipeline. Without this document, how do we answer the questions "How are you giving us predictions?" and "Is your prediction accurate?" The one-pager (I should rename it something else, and write a separate markdown one-pager) walks any user through this process and provides suitable statistical analysis and answers to those questions.

While yes, this is basic statistics, the point was that I proved that using basic statistics is effective in solving this problem. The machine learning part comes from the regression on the distribution.
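
For readers following along, here is a minimal sketch of the kind of "basic statistics" approach being discussed: fit a normal distribution to one pipeline's historical run durations and report a predicted range. This is illustrative only; the sample durations and the 95% band are assumptions for the example, not the notebook's actual code.

```python
# Illustrative sketch only (not the notebook's code): fit a normal distribution
# to one pipeline's historical run durations and turn it into a predicted range.
# The sample durations and the 95% band are assumptions made for this example.
import numpy as np
from scipy import stats

durations_min = np.array([52.1, 48.7, 55.3, 61.0, 49.9, 53.4, 57.8, 50.2])

mu, sigma = stats.norm.fit(durations_min)                    # MLE mean/std
low, high = stats.norm.interval(0.95, loc=mu, scale=sigma)   # central 95% band

print(f"predicted duration: {mu:.1f} min (range {low:.1f}-{high:.1f} min)")
```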

@ChadNedzlek (Contributor) commented Jul 6, 2022

True... but I'm not sure we need a 10-page doc to answer the question "yes, we did the science, and the science says OK" saved (and fetched by everyone that clones arcade) into our public repository for all time. A quick summary of the findings in an MD file is likely sufficient, along the lines of "we did analysis of all the repositories, and most of them form a stable normal distribution".

I trust that you did all the correct analysis if you say you did. I don't really need proof. :-)

@riarenas (Contributor) commented Jul 7, 2022

Initially Justin came to me with the idea of making his one-pager as a Jupyter notebook, and I thought it was worth a try as long as it was possible to review it.

We now see that the approach is not correct. This file is too big to check in, and this PR is difficult to give feedback on.

@ChadNedzlek's suggestion of just summarizing your findings in the one-pager, along with a link to the full notebook for those who want to go through your process sounds right to me.

@markwilkie (Member) commented Jul 7, 2022

I love the idea of the Jupyter notebook being there because I suspect there's a good amount of learning documented. However, I guess a giant doc isn't really a "one pager"... lol. So yeah, a high-level summary of the findings and approach, with the Jupyter doc as source material, seems great.

Oh, and I'm well aware I was wrong when I thought that using the jupyter file as the one pager was "fine" @melotic ..... :)

@melotic (Member Author) commented Jul 8, 2022

Switched it to markdown.

> * Is there a goal to have this work completed by, and what is the risk of not hitting that date? (e.g. missed OKRs, increased pain-points for consumers, functionality is required for the next product release, et cetera)
>   * Aug 12, the end of the internship.
> * Does anything the new feature depend on consume a limited/throttled API resource?
>   * No. Kusto will be queried once a week to retrain the models.
Member:

How many models? How much data is needed per model?

Member Author:

I've arbitrarily set the minimum data to be 50 runs for a pipeline to have a model. Each pipeline has its own model.
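
As an illustration of that rule (a sketch under assumed column names, not the service's actual code), training one simple model per pipeline while skipping definitions with fewer than 50 recorded runs could look like:

```python
# Sketch only: one simple model (mean/std of duration) per pipeline definition,
# skipping any pipeline with fewer than 50 recorded runs. The column names
# (DefinitionId, DurationMinutes) are assumptions for the example.
import pandas as pd
from scipy import stats

MIN_RUNS = 50

def train_models(runs: pd.DataFrame) -> dict:
    models = {}
    for definition_id, group in runs.groupby("DefinitionId"):
        if len(group) < MIN_RUNS:
            continue  # not enough history to model this pipeline
        mu, sigma = stats.norm.fit(group["DurationMinutes"])
        models[definition_id] = (mu, sigma)
    return models
```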

Contributor:

Should this be removed as well since we're no longer using this approach?

> * Are you utilizing any response data that allows intelligent back-off from the service?
>   * We only query Kusto, so there is no need for back-off.
> * What is the plan for getting more capacity if the feature both must exist and needs more capacity than available?
>   * Azure Functions should auto-scale our function if our ML endpoint is being queried hard.
Member:

This implies no cache of already-calculated data. Is that correct?

Member Author:

If by already-calculated you mean the trained models, they are cached in a blob container.
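
Purely for illustration (the connection string, container name, and blob name below are placeholders, not the service's real configuration), caching the trained per-pipeline parameters in a blob could look roughly like:

```python
# Hypothetical sketch of caching trained per-pipeline parameters in a blob
# container; connection string, container name, and blob name are placeholders.
import json
from azure.storage.blob import BlobClient

def cache_models(models: dict, connection_string: str) -> None:
    # models: {definition_id: (mean, std)} as produced by the training step
    blob = BlobClient.from_connection_string(
        connection_string,
        container_name="pipeline-duration-models",  # placeholder name
        blob_name="models.json",                     # placeholder name
    )
    payload = json.dumps(
        {str(k): {"mean": m, "std": s} for k, (m, s) in models.items()}
    )
    blob.upload_blob(payload, overwrite=True)
```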

Member Author:

No longer using Azure Functions.

Contributor:

Let's remove it from the doc, then.

@melotic (Member Author) commented Jul 11, 2022

Testing only fitting normal distributions:
[attached plot]

Accuracy results:

| Statistic | Accuracy |
| --- | --- |
| count | 29 |
| mean | 94.9138 |
| std | 3.14995 |
| min | 84.7935 |
| 25% | 93.7422 |
| 50% | 96.0576 |
| 75% | 96.9298 |
| max | 98.3808 |

Statistics on ranges given over backtesting period, in minutes:

| Statistic | Ranges (min) |
| --- | --- |
| count | 534 |
| mean | 26.7273 |
| std | 24.2594 |
| min | 4.05823 |
| 25% | 11.7814 |
| 50% | 18.9974 |
| 75% | 26.6553 |
| max | 107.596 |

@ChadNedzlek seems like this is a good solution.
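
For context, here is a rough sketch of the kind of backtest that produces numbers like these. It assumes accuracy means the share of a week's actual durations that land inside a mu ± k·sigma band fit on all earlier data, and "range" means that band's width in minutes; the notebook may define these differently.

```python
# Rough backtest sketch (not the notebook's exact code). For each week, fit on
# all earlier runs, predict a mu +/- k*sigma band, then score that week's runs.
import pandas as pd

def backtest(runs: pd.DataFrame, k: float = 2.0) -> pd.DataFrame:
    # runs: columns FinishTime (datetime64) and DurationMinutes, one pipeline
    runs = runs.sort_values("FinishTime")
    weeks = runs["FinishTime"].dt.to_period("W")
    rows = []
    for week in weeks.unique()[1:]:                  # need some history first
        train = runs.loc[weeks < week, "DurationMinutes"]
        test = runs.loc[weeks == week, "DurationMinutes"]
        if len(train) < 50 or test.empty:
            continue
        mu, sigma = train.mean(), train.std()
        low, high = mu - k * sigma, mu + k * sigma
        rows.append({
            "week": str(week),
            "accuracy": 100 * test.between(low, high).mean(),
            "range_min": high - low,
        })
    return pd.DataFrame(rows)
```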

@mathaholic

@melotic you mention overall being concerned about the distributions being different and that being a limitation of your pipeline. Consider using Chebyshev's inequality theorem for a model that is more generalizable: https://statisticsbyjim.com/basics/chebyshevs-theorem-in-statistics/

@melotic (Member Author) commented Jul 12, 2022

> @melotic you mention overall being concerned about the distributions being different and that being a limitation of your pipeline. Consider using Chebyshev's inequality theorem for a model that is more generalizable: https://statisticsbyjim.com/basics/chebyshevs-theorem-in-statistics/

This solves all of our problems! Thanks Nikki :)

I've updated the one-pager with the new method of doing this. We'll use Chebyshev's inequality, and we can do this all in Kusto.
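
For anyone skimming: Chebyshev's inequality guarantees that at least 1 - 1/k² of any distribution lies within k standard deviations of the mean, with no normality assumption. A minimal sketch of turning that into a duration range (in practice this would just be avg()/stdev() in a Kusto query; the values are illustrative):

```python
# Sketch only: a distribution-free duration range via Chebyshev's inequality.
# At least `coverage` of runs fall inside the returned range, whatever the
# shape of the pipeline's duration distribution.
import math

def chebyshev_range(mean_min: float, std_min: float, coverage: float = 0.9):
    # coverage = 1 - 1/k^2  =>  k = 1 / sqrt(1 - coverage)
    k = 1.0 / math.sqrt(1.0 - coverage)
    return mean_min - k * std_min, mean_min + k * std_min

# Example: mean 50 min, std 8 min, 90% coverage => k ~ 3.16, range ~ 24.7-75.3 min
print(chebyshev_range(50.0, 8.0, 0.9))
```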

@melotic (Member Author) commented Jul 14, 2022

I think this is ready for a second review. Let me know your thoughts.

Justin Perez added 2 commits July 19, 2022 14:13
@melotic (Member Author) commented Jul 19, 2022

Addressed feedback. Would love to get thoughts & feedback on the possible remediations I listed for detecting when Helix or AzDo is on the floor so we can hide our predictions:

  1. Use Grafana's Alert API and check a handful of bad alerts.
  2. A test pipeline that periodically runs and sends stuff to Helix that we can monitor.

There's more detail in the one pager (under Caveats -> Possible Solutions)

@ChadNedzlek (Contributor)

@melotic Why not use the "known issues" stuff that @ulisesh has worked on? Trying to decide which alerts mean "don't show things" is going to be really difficult, and so rare that it doesn't seem worth coding up.

Honestly, I say just show it, even in weird broken times. Odds are users will have some inclination that things are horribly broken... either because of a mail, an FR contact, or a "known issue" in the build analysis. Trying to make this feature smart enough to know that it's not smart enough sounds like a crazy twisted bit of logic that hurts my brain. :-) Just... let it be wrong sometimes and let other mechanisms handle telling users that "stuff is weird right now".

@melotic (Member Author) commented Jul 26, 2022

Known issues is definitely the way to go. I think our best bet is to simply hide the duration if there is a Critical known issue.


> In addition, there is the issue of AzDo, Helix, or builds being on the floor, and we still give customers an estimate, blissfully unaware of any infrastructure errors. In the Jupyter notebook, I dive into an anomaly detection model, based on Helix work item wait times, trying to predict this, but the model only improves accuracy by $0.3\%$.
>
> #### Possible Solutions
Contributor:

I think I would just leave this section out, and instead put the solution to each caveat directly underneath.

First, some pipelines, like runtime's...

We will handle this by...


> We backtested the model by training on all previous data before a point, and then testing on 1 week ahead, on data the model has not seen before. Here is a graph of the accuracy over time.
>
> <img src="./PipelineMachineLearning/back-tested-accuracy-vs-time.svg" width="600" height="600">
Contributor:

Can you add a small paragraph that interprets the results in the graph?

@riarenas (Contributor)

We should move this doc to live under the other queue insights docs now that we've decided to integrate it in the same place. This LGTM with only some minor feedback.

@melotic enabled auto-merge (squash) August 2, 2022 17:34
@melotic merged commit f7e668c into dotnet:main Aug 2, 2022