Conversation

@Neeratyoy
Contributor

Reference Issue

Addresses #895.

@Neeratyoy Neeratyoy requested a review from mfeurer March 24, 2021 13:55
@Neeratyoy Neeratyoy requested a review from mfeurer March 29, 2021 22:39
@Neeratyoy Neeratyoy marked this pull request as ready for review March 30, 2021 14:10
Collaborator

@mfeurer mfeurer left a comment


Hey, I renamed the file so I could render it on my local machine. Also, I left yet another few comments :)

@mfeurer
Collaborator

mfeurer commented Apr 6, 2021

Hey, I think the example is really getting along well. Based on the current status I'm wondering three things:

  1. Should we have a summary section which summarizes the behavior of OpenML-Python, scikit-learn and joblib?
  2. Should we have a section which shows that the measured numbers will differ based on the backend?
  3. We currently don't have case 2 from #895 (comment): "Why is computation time not reported if n_jobs != 1 or != None?"

@Neeratyoy
Contributor Author

  1. Should we have a summary section which summarizes the behavior of OpenML-Python, scikit-learn and joblib?

Do you mean their interaction and behaviour? It would be useful, I guess; my only concern is what exactly to summarize.

  2. Should we have a section which shows that the measured numbers will differ based on the backend?

We could, though from the OpenML API standpoint we can also ignore it. If the duty of parallelization is delegated to scikit-learn or OpenML (as we show in the example), the user is unlikely to set the backend with a parallel_backend context.
That said, it might also be fair to add a 4th case to the example that demonstrates changing the backend.

On second thought, this might require explaining how the backend propagates through the entire stack of function calls from OpenML down to the scikit-learn pipelines, which is obviously quite complicated.
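To make the backend point concrete: with a process-based backend (joblib's default, loky), the fit happens in worker processes, so a CPU clock read in the parent misses that work while the wall clock still sees it. A minimal stdlib stand-in (using subprocess in place of joblib; the loop size is an arbitrary choice):

```python
import subprocess
import sys
import time

# A CPU-bound busy loop to run in a *child* process.
busy_loop = "s = 0\nfor i in range(3_000_000):\n    s += i * i\n"

wall0, cpu0 = time.perf_counter(), time.process_time()
subprocess.run([sys.executable, "-c", busy_loop], check=True)
wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0

# The child's CPU work is invisible to the parent's process-wide CPU clock,
# which is why n_jobs != 1 can report near-zero CPU-clock numbers.
print(f"wall={wall:.3f}s  cpu={cpu:.3f}s")
```

With a thread-based backend (e.g. joblib's "threading"), the work would stay in-process and the parent's CPU clock would see it, which is the backend dependence being discussed here.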

  3. We currently don't have case 2 from #895 (comment)

Per the latest SGDClassifier documentation, its parallelization happens through joblib, which I thought we already cover.
HistGradientBoosting still appears to be experimental, so I didn't really consider including it.

@Neeratyoy Neeratyoy requested a review from mfeurer April 6, 2021 15:10
@mfeurer
Collaborator

mfeurer commented Apr 6, 2021

Do you mean their interaction and behaviour? It would be useful, I guess; my only concern is what exactly to summarize.

I was thinking of the main caveats to pay attention to, i.e. the gist of #895.

Though we could, from the OpenML API standpoint, we can also ignore it.

Probably I mixed up something. But an example on how this can be changed might be a good idea!

Per the latest SGDClassifier documentation, its parallelization happens through joblib, which I thought we already cover.
HistGradientBoosting still appears to be experimental, so I didn't really consider including it.

Yes, but that's for fitting multiple classifiers in a "one vs all" scheme. I was thinking more of something like the neural network, where the number of cores used can't be set via the API.

################################################################################
# Summary
# *******
# OpenML records model runtimes for the CPU-clock and the wall-clock times. The above
Collaborator


Hm, should we explicitly say "the scikit-learn extension"? Everything we're writing here is exclusive about the scikit-learn extension, so it could be confusing otherwise.
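The snippet's distinction between the CPU clock and the wall clock is, in Python terms, the difference between time.process_time() and a wall timer such as time.perf_counter(); a small sketch of how the two diverge:

```python
import time

wall0, cpu0 = time.perf_counter(), time.process_time()

time.sleep(0.2)                            # wall clock advances; CPU clock barely moves
s = sum(i * i for i in range(1_000_000))   # CPU-bound work advances both clocks

wall = time.perf_counter() - wall0
cpu = time.process_time() - cpu0
print(f"wall={wall:.2f}s  cpu={cpu:.2f}s")  # wall exceeds cpu because of the sleep
```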

@Neeratyoy Neeratyoy requested a review from mfeurer April 8, 2021 14:07
@mfeurer
Collaborator

mfeurer commented Apr 9, 2021

I still can't annotate all my comments... anyway, linear SVM also releases the GIL: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/svm/_liblinear.pyx#L61

Maybe naive bayes doesn't release the GIL?
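Whether the extension code releases the GIL determines if thread-based parallelism can overlap at all: pure-Python bytecode serializes on the GIL, while native code that drops it lets threads truly run concurrently. A stdlib sketch of the two regimes, with time.sleep standing in for a GIL-releasing native call (like the liblinear code linked above):

```python
import threading
import time

def gil_bound():
    s = 0
    for i in range(5_000_000):  # pure Python holds the GIL throughout
        s += i * i

def gil_released():
    time.sleep(0.2)             # sleep drops the GIL, like GIL-releasing C code

def wall_time(target, n_threads=2):
    t0 = time.perf_counter()
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - t0

# Two GIL-bound threads serialize (roughly 2x one loop); two GIL-releasing
# threads overlap (roughly one sleep), so the wall-clock numbers diverge.
print("GIL held:    ", wall_time(gil_bound))
print("GIL released:", wall_time(gil_released))
```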

@Neeratyoy Neeratyoy requested a review from mfeurer April 12, 2021 18:08
* Minor reshuffling

* Update examples/30_extended/fetch_runtimes_tutorial.py

Co-authored-by: Neeratyoy Mallik <[email protected]>

@mfeurer mfeurer merged commit 3a1dfbd into develop Apr 13, 2021
@mfeurer mfeurer deleted the fix_895 branch April 13, 2021 16:02
github-actions bot pushed a commit that referenced this pull request Apr 13, 2021
PGijsbers pushed a commit to Mirkazemi/openml-python that referenced this pull request Feb 23, 2023

* Unit test to test existence of refit time

* Measuring runtime always

* Removing redundant check in unit test

* Updating docs with runtimes

* Adding more utilities to new example

* Removing refit_time + fetching trace runtime in example

* rename example

* Reiterating with changes to example from @mfeurer suggestions

* Including refit time and other minor formatting

* Adding more cases + a concluding summary

* Cosmetic changes

* Adding 5th case with no release of GIL

* Removing debug code

* Runtime measurement example updates (openml#1052)

* Minor reshuffling

* Update examples/30_extended/fetch_runtimes_tutorial.py

Co-authored-by: Neeratyoy Mallik <[email protected]>


Co-authored-by: Matthias Feurer <[email protected]>