DOC Rework Importance of Feature Scaling example #25012

ArturoAmorQ · 2022-11-23T10:36:37Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This example would benefit from a tutorial-style rework ("tutorialization"). In particular, this PR adds a section on how nearest-neighbors models are sensitive to feature scaling.
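As a rough illustration of the sensitivity described above, here is a minimal standard-library sketch. It is not code from this PR. The toy data, the variable names, and the helper functions (`fit_standardizer`, `standardize`, `nearest_label`) are all hypothetical. It assumes a 1-nearest-neighbor rule with Euclidean distance and standardization by the population standard deviation.

```python
import math
import statistics

# Hypothetical toy data: (large-range feature, small-range feature) pairs.
# The class label depends only on the second, small-range feature.
X_train = [(1000.0, 0.6), (1010.0, 1.4), (600.0, 0.6), (610.0, 1.4)]
y_train = ["a", "b", "a", "b"]
query = (1008.0, 0.62)


def fit_standardizer(rows):
    """Compute the per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return means, stds


def standardize(row, means, stds):
    """Center and scale one sample with previously fitted statistics."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))


def nearest_label(points, labels, target):
    """Return the label of the training point closest to target (1-NN)."""
    distances = [math.dist(p, target) for p in points]
    return labels[distances.index(min(distances))]


# Without scaling, the large-range feature dominates the distance.
raw_prediction = nearest_label(X_train, y_train, query)

# With scaling, both features contribute comparably to the distance.
means, stds = fit_standardizer(X_train)
X_scaled = [standardize(row, means, stds) for row in X_train]
query_scaled = standardize(query, means, stds)
scaled_prediction = nearest_label(X_scaled, y_train, query_scaled)

print("prediction on raw features:", raw_prediction)
print("prediction on standardized features:", scaled_prediction)
```

In this toy setup the two predictions differ, because the raw distance is driven almost entirely by the first feature even though it carries no class information.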

Any other comments?

Side effect: Implements notebook style as intended in #22406.

glemaitre · 2022-11-23T19:01:51Z

Weird that the CIs did not start. I merged main into the branch to trigger them.

lorentzenchr

My 5 cents.

examples/preprocessing/plot_scaling_importance.py

Co-authored-by: Christian Lorentzen <[email protected]>

examples/preprocessing/plot_scaling_importance.py

Co-authored-by: Guillaume Lemaitre <[email protected]>

ArturoAmorQ · 2022-12-12T15:09:06Z

I think all of your comments have been addressed, @glemaitre and @lorentzenchr.
Please let me know what you think of my proposed solutions.

glemaitre · 2022-12-12T15:09:51Z

I will have to check the rendering but I think that the proposal is already an improvement.

lorentzenchr

LGTM, some nitpicks.

examples/preprocessing/plot_scaling_importance.py

lorentzenchr · 2022-12-15T16:43:25Z

examples/preprocessing/plot_scaling_importance.py

+    X, y, test_size=0.30, random_state=42
 )
+scaled_X_train = scaler.fit_transform(X_train)
+


Optional: We could show the mean value of each feature, or min and max.
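One way the optional suggestion above could look is sketched below using only the standard library. This is an assumption-laden illustration, not the code adopted in the PR. The feature values are made up, and `summarize` and `standardize` are hypothetical helpers. Standardization here uses the population standard deviation.

```python
import statistics

# Hypothetical raw feature columns with very different ranges.
raw = {
    "proline": [1065.0, 1050.0, 520.0, 680.0],
    "hue": [1.04, 1.05, 0.86, 0.70],
}


def standardize(values):
    """Center a column on its mean and divide by its population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def summarize(columns):
    """Return the mean, min and max of every column."""
    return {
        name: {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
        }
        for name, values in columns.items()
    }


scaled = {name: standardize(values) for name, values in raw.items()}
raw_summary = summarize(raw)
scaled_summary = summarize(scaled)

# Print the statistics side by side for each feature.
for name in raw:
    print(name, "raw:", raw_summary[name])
    print(name, "scaled:", scaled_summary[name])
```

After standardization, every feature has a mean close to zero and a comparable min/max range, which is the contrast such a table would make visible.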

examples/preprocessing/plot_scaling_importance.py

lorentzenchr · 2022-12-15T16:57:11Z

examples/preprocessing/plot_scaling_importance.py

+
+# %%
+# The need for regularization is higher (lower values of `C`) for the data
+# that was not scaled before applying PCA. From the plot we can confirm that


Which plot? Is it over- or underfitting?

ArturoAmorQ · 2023-01-16

By plotting the validation curves, I realized that the training and test accuracy overlap too much to make a proper statement about over- or underfitting for the scenario with no standardization.

I think that it is better to avoid mentioning over-/underfitting to keep the example as simple as possible.

examples/preprocessing/plot_scaling_importance.py

Co-authored-by: Christian Lorentzen <[email protected]>

glemaitre

Only nitpicks. Otherwise LGTM.

examples/preprocessing/plot_scaling_importance.py

Co-authored-by: Guillaume Lemaitre <[email protected]>

glemaitre · 2023-01-17T10:49:33Z

I certainly broke the linter with my suggestion. Sorry @ArturoAmorQ

examples/preprocessing/plot_scaling_importance.py

Co-authored-by: Guillaume Lemaitre <[email protected]>

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>

ArturoAmorQ and others added 4 commits November 21, 2022 17:40

First step to improve notebook style

cf0280e

Add KNeighbors section and refactor narrative

f2636d4

Tweak

43e87c3

Merge branch 'main' into scaling_importance

9dea8cc

ArturoAmorQ added the Documentation label Nov 24, 2022

ArturoAmorQ and others added 8 commits November 24, 2022 14:43

Tweak

9e373a2

Merge branch 'scaling_importance' of github.com:ArturoAmorQ/scikit-le…

3ec7569

…arn into scaling_importance

Tweak

14eb647

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

b15210b

…nto scaling_importance

Add clarifying text on using subset of features

4b6d685

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

edc65b8

…nto scaling_importance

Tweak

e3afbaf

Merge branch 'main' into scaling_importance

6d14bb6

lorentzenchr reviewed Dec 8, 2022

ArturoAmorQ and others added 2 commits December 9, 2022 11:30

Update examples/preprocessing/plot_scaling_importance.py

0491e92

Co-authored-by: Christian Lorentzen <[email protected]>

Merge branch 'scaling_importance' of github.com:ArturoAmorQ/scikit-le…

41d98de

…arn into scaling_importance

glemaitre reviewed Dec 9, 2022

glemaitre self-assigned this Dec 9, 2022

glemaitre reviewed Dec 9, 2022

ArturoAmorQ and others added 4 commits December 9, 2022 16:02

Apply wording suggestion from Christian

803a0a9

Apply suggestions from code review

a2d5085

Co-authored-by: Guillaume Lemaitre <[email protected]>

Merge branch 'scaling_importance' of github.com:ArturoAmorQ/scikit-le…

25e198e

…arn into scaling_importance

Use plt.show only for last plot

d21224e

glemaitre removed their assignment Dec 9, 2022

ArturoAmorQ added 5 commits December 9, 2022 17:47

Use set_output to retain pandas frames

575c2ec

Use pandas for displaying feature names

fb6ec53

Make plot more squared

78c025c

Add interpretation to plot

0ea3afd

Add discussion on regularization parameter

3da42f7

Add discussion on log-loss

407c69a

lorentzenchr approved these changes Dec 15, 2022

ArturoAmorQ and others added 5 commits December 15, 2022 23:58

Apply suggestions from code review

da8f4ad

Co-authored-by: Christian Lorentzen <[email protected]>

Fix format

1cfbcfa

Avoid repetitive print

7c40145

Merge log-loss and accuracy discussions

8ed2ad0

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

5702586

…nto scaling_importance

glemaitre approved these changes Jan 17, 2023

Apply suggestions from code review

ff8ba3b

Co-authored-by: Guillaume Lemaitre <[email protected]>

ArturoAmorQ added 2 commits January 17, 2023 12:00

Fix format

babd485

Add barplot for easier visualization

da49dc5

glemaitre reviewed Jan 19, 2023

glemaitre reviewed Jan 19, 2023

glemaitre reviewed Jan 19, 2023

ArturoAmorQ and others added 3 commits January 19, 2023 14:33

Apply suggestions from code review

de3dad1

Co-authored-by: Guillaume Lemaitre <[email protected]>

Avoid possible command line interruption created by plt.show

62c495a

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

08fa0a3

…nto scaling_importance

glemaitre merged commit 4b55dee into scikit-learn:main Jan 19, 2023

ArturoAmorQ deleted the scaling_importance branch January 19, 2023 16:12

jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023

DOC Rework Importance of Feature Scaling example (scikit-learn#25012)

4575b54

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>

jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023

DOC Rework Importance of Feature Scaling example (scikit-learn#25012)

23fe0c9

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>

jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023

DOC Rework Importance of Feature Scaling example (scikit-learn#25012)

94fd968

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>

adrinjalali pushed a commit that referenced this pull request Jan 24, 2023

DOC Rework Importance of Feature Scaling example (#25012)

546a506

Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>
