Add one-pager for using machine learning to predict pipeline duration #9901
Conversation
Uh... this doesn't appear to be a one-pager... or anything I can read. :-)

Weird. GitHub can render the file more or less fine if you go to "view file". It's just the diff that's basically worthless. :-)

GitHub isn't good with PRs and notebooks. You can click on Files Changed -> three dots on the right -> View File to render it in the browser :/

That makes it very unfriendly to reviewers when they want to leave a comment. Let's make a proper one-pager doc, and you can link your notebook to it.

This doc is really, really long... Long enough that I'm not sure it should be checked into arcade. There's like a sentence or two every page that's sometimes saying important stuff... but otherwise I'm not sure what I'm supposed to be getting out of this other than "runtime is doing something weird and we should tell them not to do that" and "averages and stddev are stable, so we can just use those". I'm not sure what the "machine learning" part of this is... it's basically just saying that for a given definition, the build times form a more or less predictable distribution, but at that point, it's just basic statistics.

The one-pager outlines the data science process used to arrive at the conclusion that regressing a distribution over the duration of pipelines is a suitable and effective way to predict the duration of a given pipeline. Without this document, how do we answer the questions "How are you giving us predictions?" and "Is your prediction accurate?" The one-pager (I should rename it something else, and write a separate markdown one-pager) walks any user through this process and provides suitable statistical analysis and answers to those questions. While yes, this is basic statistics, the point was that I proved that using basic statistics is effective in solving this problem. The machine learning part comes from the regression on the distribution.

True... but I'm not sure we need a 10-page doc to answer the question "yes, we did the science, and the science says ok" saved (and fetched by everyone that clones arcade) into our public repository for all time. A quick summary of the findings in an MD file is likely sufficient, along the lines of "we did analysis of all the repositories, and most of them form a stable normal distribution". I trust that you did all the correct analysis if you say you did. I don't really need proof. :-)

Initially Justin came to me with the idea of making his one-pager as a Jupyter notebook, and I thought it was worth the try as long as it was possible to review it. We now see that the approach is not correct: this file is too big to check in, and this PR is difficult to give feedback on. @ChadNedzlek's suggestion of just summarizing your findings in the one-pager, along with a link to the full notebook for those who want to go through your process, sounds right to me.

I love the idea of the Jupyter notebook being there because I suspect there's a good amount of learning documented. However, I guess a giant doc isn't really a "one-pager"... lol. So yeah, a high-level summary of the findings and approach, with the Jupyter doc as source material, seems great. Oh, and I'm well aware I was wrong when I thought that using the Jupyter file as the one-pager was "fine" @melotic..... :)

Switched it to markdown.
> * Is there a goal to have this work completed by, and what is the risk of not hitting that date? (e.g. missed OKRs, increased pain-points for consumers, functionality is required for the next product release, et cetera)
>   * Aug 12, the end of the internship.
> * Does anything the new feature depend on consume a limited/throttled API resource?
>   * No. Kusto will be queried once a week to retrain the models.
How many models? How much data is needed per model?
I've arbitrarily set the minimum amount of data to 50 runs for a pipeline to have a model. Each pipeline has its own model.
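For illustration, here's a minimal Python sketch of that gating, assuming we already have (pipeline id, duration-in-minutes) pairs pulled from Kusto; the function and variable names are hypothetical, and only the 50-run threshold comes from the comment above:

```python
from collections import defaultdict
from statistics import mean, stdev

MIN_RUNS = 50  # arbitrary minimum before a pipeline gets its own model

def train_models(runs):
    """runs: iterable of (pipeline_id, duration_minutes) pairs.

    Returns {pipeline_id: (mean, stddev)} for every pipeline with at
    least MIN_RUNS recorded runs; pipelines with fewer runs get no model.
    """
    durations = defaultdict(list)
    for pipeline_id, duration in runs:
        durations[pipeline_id].append(duration)

    return {
        pid: (mean(ds), stdev(ds))
        for pid, ds in durations.items()
        if len(ds) >= MIN_RUNS
    }
```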
Should this be removed as well since we're no longer using this approach?
> * Are you utilizing any response data that allows intelligent back-off from the service?
>   * We only query Kusto, so there is no need for back-off.
> * What is the plan for getting more capacity if the feature both must exist and needs more capacity than available?
>   * Azure Functions should auto-scale our function if our ML endpoint is being queried hard.
This implies no cache of already-calculated data. Is that correct?
If by already-calculated you mean the trained models, they are cached in a blob container.
No longer using Azure Functions.
Let's remove it from the doc, then.
Testing only fitting normal distributions:

Accuracy results:

Statistics on ranges given over the backtesting period, in minutes:

@ChadNedzlek seems like this is a good solution.
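The result tables themselves don't survive here, but for context, both metrics are simple to compute once the backtest produces a predicted range per build: accuracy is the share of actual durations that landed inside the range, and the range statistics summarize the widths in minutes. A rough Python sketch, with all names hypothetical:

```python
from statistics import mean, median

def backtest_metrics(predictions):
    """predictions: list of (low, high, actual_minutes) tuples, one per build.

    Accuracy is the share of builds whose actual duration fell inside the
    predicted range; the remaining stats describe how wide the ranges were.
    """
    hits = sum(1 for low, high, actual in predictions if low <= actual <= high)
    widths = [high - low for low, high, _ in predictions]
    return {
        "accuracy": hits / len(predictions),
        "mean_range_minutes": mean(widths),
        "median_range_minutes": median(widths),
        "max_range_minutes": max(widths),
    }
```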
@melotic you mention overall being concerned about the distributions being different and that being a limitation of your pipeline. Consider using Chebyshev's inequality theorem for a model that is more generalizable: https://statisticsbyjim.com/basics/chebyshevs-theorem-in-statistics/

This solves all of our problems! Thanks Nikki :) I've updated the one-pager with the new method of doing this. We'll use Chebyshev's inequality, and we can do this all in Kusto.
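As a rough illustration of the idea (not the actual Kusto query): Chebyshev's inequality says P(|X - mean| >= k*stddev) <= 1/k^2, so a range of mean ± k·stddev holds for any distribution shape. A minimal Python sketch, assuming a 90% coverage target and hypothetical names:

```python
import math

def chebyshev_range(mean_minutes, stddev_minutes, coverage=0.90):
    """Prediction range that holds for *any* distribution shape.

    Chebyshev's inequality: P(|X - mean| >= k*stddev) <= 1/k**2, so
    choosing k = sqrt(1 / (1 - coverage)) guarantees at least `coverage`
    of runs fall inside [mean - k*stddev, mean + k*stddev].
    """
    k = math.sqrt(1.0 / (1.0 - coverage))
    low = max(0.0, mean_minutes - k * stddev_minutes)  # durations can't be negative
    high = mean_minutes + k * stddev_minutes
    return low, high

# Example with made-up numbers: a pipeline averaging 45 min with a 6 min stddev.
print(chebyshev_range(45.0, 6.0))  # -> roughly (26.0, 64.0)
```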
I think this is ready for a second review. Let me know your thoughts.
Addressed feedback. Would love to get thoughts & feedback on the possible remediations I listed for detecting when Helix or AzDo is on the floor so we can hide our predictions:

There's more detail in the one-pager (under
@melotic Why not use the "known issues" stuff that @ulisesh has worked on? Trying to decide which alerts mean "don't show things" is going to be really difficult, and so rare that it doesn't seem worth coding up. Honestly, I say just show it, even in weird broken times. Odds are users will have some inclination things are horribly broken... either because of a mail, FR contact, or a "known issue" in the build analysis. Trying to make this feature smart enough to know that it's not smart enough sounds like a crazy twisted bit of logic that hurts my brain. :-) Just... let it be wrong sometimes and let other mechanisms handle telling users that "stuff is weird right now".
Known issues is definitely the way to go. I think our best bet is to simply hide the duration if there is a Critical known issue.
> In addition, there is the issue of AzDo, Helix, or builds being on the floor, and we still give customers an estimate, blissfully unaware of any infrastructure errors. In the Jupyter notebook, I dive into an anomaly detection model, based on Helix work item wait times, trying to predict this, but the model only improves accuracy by $0.3\%$.
> #### Possible Solutions
I think I would just leave this section out, and instead put the solution to each caveat directly underneath.
First, some pipelines, like runtime's....
We will handle this by...
> We backtested the model by training on all previous data before a point, and then testing on 1 week ahead, on data the model has not seen before. Here is a graph of the accuracy over time.
> <img src="./PipelineMachineLearning/back-tested-accuracy-vs-time.svg" width="600" height="600">
Can you add a small paragraph that interprets the results in the graph?
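For reference, a minimal sketch of the backtesting scheme described in the quoted passage above: walk forward one week at a time, train on everything that finished before the cut point, and score on the week that follows. The `train` and `score` callables and all other names here are hypothetical stand-ins:

```python
from datetime import timedelta

def backtest(runs, train, score, start, end):
    """runs: (finish_time, pipeline_id, duration_minutes) tuples, sorted by time.

    At each weekly cut point, train only on runs that finished before the
    cut, score on the following week's runs, and record the accuracy, so
    the model is always evaluated on data it has never seen.
    """
    results = []
    cut = start
    while cut < end:
        train_set = [r for r in runs if r[0] < cut]
        test_set = [r for r in runs if cut <= r[0] < cut + timedelta(weeks=1)]
        if train_set and test_set:
            model = train(train_set)
            results.append((cut, score(model, test_set)))
        cut += timedelta(weeks=1)
    return results
```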
We should move this doc to live under the other queue insights docs now that we decided to integrate it in the same place. This LGTM with only some minor feedback.
Co-authored-by: Ricardo Arenas <[email protected]>

To double check:
Epic: dotnet/dnceng#2657
Unfortunately, GitHub doesn't support inline comments with Jupyter notebooks. If you could quote the section of the notebook that you had feedback on for context, that would be much appreciated!