DOC Use notebook style in plot_kernel_ridge_regression.py #22804
Conversation
Thanks for the PR! It's already a net improvement on its own, but if you are willing, it would be great to further improve this example (please consider the suggestions below) by pushing new commits to this same PR branch.
The main point would be to move the last two paragraphs from the introductory text of the example so that the analysis of cell outputs and figures appears right after each relevant code cell, before the header of the next section.
Note that the content of those paragraphs is no longer accurate, most likely because the code of scikit-learn has evolved and some conclusions no longer hold. For instance:
However, prediction of 100000 target values is more than three times faster with SVR
since it has learned a sparse model using only approx. 1/3 of the 100 training
datapoints as support vectors.
Should probably be rewritten to something like:
However, prediction of 100000 target values could in theory be approximately
three times faster with SVR, since it has learned a sparse model using only
approximately 1/3 of the training datapoints as support vectors. In practice,
however, this is not necessarily the case because of implementation details
in the way the kernel function is computed for each model, which can make the
KRR model as fast or even faster despite computing more arithmetic operations.
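To make that claim concrete, here is a minimal sketch (not part of the PR diff; the data generation and hyper-parameter values are assumptions loosely mirroring the example) that checks the support-vector fraction and times both predictions:

import time

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(100, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

# Fit both models on the same 100 training points.
svr = SVR(kernel="rbf", C=1.0, gamma=0.1).fit(X, y)
kr = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.1).fit(X, y)

# The fraction of training points kept as support vectors is what could,
# in theory, make SVR prediction proportionally faster than KRR.
print(f"SVR support vectors: {len(svr.support_)} / {len(X)}")

# Time prediction on 100000 points; in practice KRR can still be as fast
# or faster because of how each model computes its kernel expansion.
X_plot = 5 * rng.rand(100_000, 1)
for name, model in [("SVR", svr), ("KRR", kr)]:
    t0 = time.time()
    model.predict(X_plot)
    print(f"{name} prediction time: {time.time() - t0:.3f}s")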
For reference, the rendering of the current state of the example is here:
https://181259-843222-gh.circle-artifacts.com/0/doc/auto_examples/miscellaneous/plot_kernel_ridge_regression.html (from the link named "doc artifact" in the continuous integration report of the last commit).
# Visualize learning curves

from sklearn.model_selection import learning_curve
I think the learning curves figure below would benefit from displaying both the train scores (train_scores_kr / train_scores_svr) and the test scores (already displayed).
To be consistent with the previous figure on training and prediction times, it would be nice to use dashed lines for the test scores ("o--") and the solid line style ("o-") for the training scores (see the sketch after the next comment).
We could also reduce the number of steps from 10 to 7:
train_sizes=np.linspace(0.1, 1, 7)
while increasing the training set size a bit, e.g. from X[:100]/y[:100] to X[:1000]/y[:1000]. Using the full training set would be even more informative, but I am afraid that this would increase the duration of the execution of this code example too much.
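For illustration, a rough sketch combining both suggestions above (this is not the PR code; the models, data and colors are assumptions): both train and test scores are drawn, with "o-" for train and "o--" for test, over 7 training sizes and 1000 samples:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import learning_curve
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(1_000, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(1_000)

for name, model, color in [
    ("SVR", SVR(kernel="rbf", C=1.0, gamma=0.1), "tab:orange"),
    ("KRR", KernelRidge(kernel="rbf", alpha=0.1, gamma=0.1), "tab:blue"),
]:
    sizes, train_scores, test_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.1, 1, 7),
        scoring="neg_mean_squared_error",
    )
    # Solid lines for train scores, dashed lines for test scores.
    plt.plot(sizes, -train_scores.mean(axis=1), "o-", color=color, label=f"{name} (train)")
    plt.plot(sizes, -test_scores.mean(axis=1), "o--", color=color, label=f"{name} (test)")

plt.xlabel("Train size")
plt.ylabel("Mean squared error")
plt.legend()
plt.show()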
# Time the SVR fit on the first train_size samples
t0 = time.time()
svr.fit(X[:train_size], y[:train_size])
svr_fit = time.time() - t0
In the lines below, it would be interesting to also display the results of the hyper-parameter search for both models, for instance:
print(f"Best SVR with params: {svr.best_params_} and R2 score: {svr.best_score_:.3f}")
print(f"Best KRR with params: {kr.best_params_} and R2 score: {kr.best_score_:.3f}")Then we could reuse those tuned hyper-parameters for the latest figure instead of using arbitrary values for alpha, C and gamma.
|
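For context, a hypothetical setup under which those two print lines work as written: both models wrapped in GridSearchCV, so that best_params_ and best_score_ are populated after fitting (the data and parameter grids below are placeholders loosely based on the example):

import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.RandomState(42)
X = 5 * rng.rand(1_000, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(1_000)

# GridSearchCV uses the estimator's default scorer (R2 for regressors),
# so best_score_ below is indeed an R2 score.
svr = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1e0, 1e1, 1e2], "gamma": np.logspace(-2, 2, 5)},
)
kr = GridSearchCV(
    KernelRidge(kernel="rbf"),
    param_grid={"alpha": [1e0, 1e-1, 1e-2], "gamma": np.logspace(-2, 2, 5)},
)
svr.fit(X, y)
kr.fit(X, y)

print(f"Best SVR with params: {svr.best_params_} and R2 score: {svr.best_score_:.3f}")
print(f"Best KRR with params: {kr.best_params_} and R2 score: {kr.best_score_:.3f}")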
For information, I was curious and tried to update the learning curves as I suggested above, and here is the result (each model uses hyper-parameters tuned for the largest dataset size, 1000 samples). I really don't know what to make of this plot: in particular, I do not understand why the training error would be significantly larger than the test error for the smallest training set sizes. I have tried several times with different …

For reference, here are the results of the tuning I observed on my machine:
Co-authored-by: Olivier Grisel <[email protected]>
…nto notebook_style_plot_kernel_ridge_regression
I pushed some tweaks to follow @ogrisel's suggestions; some of this is still WIP, and I'll try to follow up on this soon.

I am going to restrict this PR to using notebook style and easy improvements. I will create a separate issue to track the problem mentioned in #22804 (comment).
Also use dashed line for test score for consistency
Note that this is not true anymore:
I am not quite sure what to do about it ...
…nto notebook_style_plot_kernel_ridge_regression
I have opened an issue about how to improve this example further: #23243

Merging. We should revisit the example. @lesteve already opened an issue for this purpose.
…rn#22804)
Co-authored-by: Loïc Estève <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>

Reference Issues/PRs
targets #22406
What does this implement/fix? Explain your changes.
Adapts plot_kernel_ridge_regression.py to follow the notebook style and documentation conventions.
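For readers unfamiliar with the target format, here is a minimal sketch of what sphinx-gallery's notebook style looks like (the cell contents are placeholders, not the actual example code): `# %%` separators split the script into cells, and comment lines right after a separator are rendered as prose.

# %%
# Generate sample data
# --------------------
# Comment lines after a `# %%` marker render as rst text in the HTML page.
import numpy as np

rng = np.random.RandomState(0)
X = 5 * rng.rand(100, 1)
y = np.sin(X).ravel()

# %%
# Construct the models
# --------------------
# Each cell renders as a separate code block followed by its output.
from sklearn.svm import SVR

svr = SVR(kernel="rbf")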
Any other comments?
#pariswimlds