DOC Use notebook style in plot_kernel_ridge_regression.py #22804
Conversation
Thanks for the PR! It's already a net improvement on its own, but if you are willing, it would be great to further improve this example (please consider the suggestions below) by pushing new commits to this same PR branch.
The main point would be to move the last two paragraphs from the introductory text of the example so that the analysis of cell outputs and figures appears right after each relevant code cell, before the header of the next section.
Note that the content of those paragraphs is no longer accurate, most likely because the code of scikit-learn has evolved and some conclusions no longer hold. For instance:
However, prediction of 100000 target values is more than three times faster with SVR
since it has learned a sparse model using only approx. 1/3 of the 100 training
datapoints as support vectors.
Should probably be rewritten to something like:
However, prediction of 100000 target values could in theory be approximately
three times faster with SVR, since it has learned a sparse model using only
approximately 1/3 of the training datapoints as support vectors. In practice,
however, this is not necessarily the case because of implementation details
in the way the kernel function is computed for each model, which can make the
KRR model as fast or even faster despite computing more arithmetic operations.
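To make that claim concrete, here is a minimal sketch (not part of the PR diff; the data generation and hyper-parameter values are assumptions loosely mirroring the example) that checks the support-vector fraction and times both predictions:

import time

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(100, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

# Fit both models on the same 100 training points.
svr = SVR(kernel="rbf", C=1.0, gamma=0.1).fit(X, y)
kr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.1).fit(X, y)

# The fraction of training points kept as support vectors is what could,
# in theory, make SVR prediction proportionally faster than KRR.
print(f"SVR support vectors: {len(svr.support_)} / {len(X)}")

# Time prediction on 100000 points; in practice KRR can still be as fast
# or faster because of how each model computes its kernel expansion.
X_plot = 5 * rng.rand(100_000, 1)
for name, model in [("SVR", svr), ("KRR", kr)]:
    t0 = time.time()
    model.predict(X_plot)
    print(f"{name} prediction time: {time.time() - t0:.3f}s")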
For reference, the rendering of the current state of the example is here:
https://181259-843222-gh.circle-artifacts.com/0/doc/auto_examples/miscellaneous/plot_kernel_ridge_regression.html (from the link named "doc artifact" in the continuous integration report of the last commit).
# Visualize learning curves

from sklearn.model_selection import learning_curve
I think the learning curves figure below would benefit from displaying both the train scores (train_scores_kr / train_scores_svr) and the test scores (already displayed).
To be consistent with the previous figure on training and prediction times, it would be nice to use dashed lines for the test scores ("o--") and the solid line style ("o-") for the training scores (see the sketch after the next comment).
We could also reduce the number of steps from 10 to 7:
train_sizes=np.linspace(0.1, 1, 7)
while increasing the training set size a bit, e.g. from X[:100]/y[:100] to X[:1000]/y[:1000]. Using the full training set would be even more informative, but I am afraid that this would increase the duration of the execution of this code example too much.
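For illustration, a rough sketch combining both suggestions above (this is not the PR code; the models, data and colors are assumptions): both train and test scores are drawn, with "o-" for train and "o--" for test, over 7 training sizes and 1000 samples:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import learning_curve
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(1_000, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(1_000)

for name, model, color in [
    ("SVR", SVR(kernel="rbf", C=1.0, gamma=0.1), "tab:orange"),
    ("KRR", KernelRidge(kernel="rbf", alpha=0.1, gamma=0.1), "tab:blue"),
]:
    sizes, train_scores, test_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1, 7),
        scoring="neg_mean_squared_error",
    )
    # Solid lines for train scores, dashed lines for test scores.
    plt.plot(sizes, -train_scores.mean(axis=1), "o-", color=color, label=f"{name} (train)")
    plt.plot(sizes, -test_scores.mean(axis=1), "o--", color=color, label=f"{name} (test)")

plt.xlabel("Train size")
plt.ylabel("Mean squared error")
plt.legend()
plt.show()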
# Time the SVR fit on the first train_size samples
t0 = time.time()
svr.fit(X[:train_size], y[:train_size])
svr_fit = time.time() - t0
In the lines below, it would be interesting to also display the results of the hyper-parameter search for both models, for instance:
print(f"Best SVR with params: {svr.best_params_} and R2 score: {svr.best_score_:.3f}")
print(f"Best KRR with params: {kr.best_params_} and R2 score: {kr.best_score_:.3f}")Then we could reuse those tuned hyper-parameters for the latest figure instead of using arbitrary values for alpha, C and gamma.
|
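For context, a hypothetical setup under which those two print lines work as written: both models wrapped in GridSearchCV, so that best_params_ and best_score_ are populated after fitting (the data and parameter grids below are placeholders loosely based on the example):

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(1_000, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(1_000)

# GridSearchCV uses the estimator's default scorer (R2 for regressors),
# so best_score_ below is indeed an R2 score.
svr = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1e0, 1e1, 1e2], "gamma": np.logspace(-2, 2, 5)},
)
kr = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid={"alpha": [1e0, 1e-1, 1e-2], "gamma": np.logspace(-2, 2, 5)},
)
svr.fit(X, y)
kr.fit(X, y)

print(f"Best SVR with params: {svr.best_params_} and R2 score: {svr.best_score_:.3f}")
print(f"Best KRR with params: {kr.best_params_} and R2 score: {kr.best_score_:.3f}")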
For information, I was curious and tried to update the learning curves as I suggested above, and here is the result (each model uses hyper-parameters tuned for the largest dataset size, 1000 samples). I really don't know what to make of this plot: in particular, I do not understand why the training error would be significantly larger than the test error for the smallest training set sizes. I have tried several times with different …

For reference, here are the results of the tuning I observed on my machine:
Co-authored-by: Olivier Grisel <[email protected]>
…nto notebook_style_plot_kernel_ridge_regression
I pushed some tweaks to follow @ogrisel's suggestions; some of this is still WIP, and I'll try to follow up on this soon.

I am going to restrict this PR to using notebook style and easy improvements. I will create a separate issue to track the problem mentioned in #22804 (comment).
Also use dashed line for test score for consistency
Note that this is not true anymore:
I am not quite sure what to do about it ...
…nto notebook_style_plot_kernel_ridge_regression
I have opened an issue about how to improve this example further: #23243

Merging. We should revisit the example. @lesteve already opened an issue for this purpose.
…rn#22804)
Co-authored-by: Loïc Estève <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>

Reference Issues/PRs
targets #22406
What does this implement/fix? Explain your changes.
Adapts plot_kernel_ridge_regression.py to follow the notebook style and documentation conventions.
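For readers unfamiliar with the target format, here is a minimal sketch of what sphinx-gallery's notebook style looks like (the cell contents are placeholders, not the actual example code): `# %%` separators split the script into cells, and comment lines right after a separator are rendered as prose.

# %%
# Generate sample data
# --------------------
# Comment lines after a `# %%` marker render as rst text in the HTML page.
import numpy as np

rng = np.random.RandomState(0)
X = 5 * rng.rand(100, 1)
y = np.sin(X).ravel()

# %%
# Construct the models
# --------------------
# Each cell renders as a separate code block followed by its output.
from sklearn.svm import SVR

svr = SVR(kernel="rbf")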
Any other comments?
#pariswimlds