ENH Updating defaults for RandomForestRegressor and Classifier#20803
ENH Updating defaults for RandomForestRegressor and Classifier#20803lorentzenchr merged 21 commits intoscikit-learn:mainfrom
Conversation
|
The code to actually raise a |
lorentzenchr
left a comment
There was a problem hiding this comment.
@bsun94 Thanks for tackling this issue. I've some comments and think that we should also deprecate "auto" in the decision trees.
|
thanks @lorentzenchr ! I've put in the suggested changes for the forest classes. Please let me know how they look - if all good, I can do the same for the trees. Regarding the test fail above, it looks like a certain step in the CI pipeline took too long a time. I did add a test to ensure full test coverage of the _forest.py module after my changes, but I ran that test locally and it succeeded (in 0.67s), so shouldn't be a problem. I'm not sure if this would be a problem on my side? |
sklearn/ensemble/_forest.py
Outdated
|
|
||
| if self.max_features == "auto": | ||
| warn( | ||
| "The prior default of 'auto' for max_features is " |
There was a problem hiding this comment.
I find the message ambiguous. I don't really think that we should speak about "default" because the only way that a user gets this warning is that it did not use the default value but explicitly provide a value (that is the same as the default one). So it could be confusing. I would find better a message along these lines:
`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features=1.0` or
remove this parameter as it is also the default value for
RandomForestRegressor and ExtraTreesRegressor.
sklearn/ensemble/_forest.py
Outdated
| "The prior default of 'auto' for max_features is " | ||
| "deprecated. Results of the fit, however, do not " | ||
| "change. Please use the new default for " | ||
| "RandomForestClassifiers and ExtraTreesClassifiers, " | ||
| "which is 'sqrt'.", |
There was a problem hiding this comment.
The same here:
`max_features="auto"` has been deprecated in 1.0 and will be removed in
1.2. To keep the past behaviour, explicitly set `max_features='sqrt'` or
remove this parameter as it is also the default value for
RandomForestClassifier and ExtraTreesClassifier.
sklearn/ensemble/_forest.py
Outdated
| randomness can be achieved by setting smaller values, e.g. 0.3. | ||
|
|
||
| .. deprecated:: 1.0 | ||
| Previous default option "auto" has been deprecated. |
There was a problem hiding this comment.
I think that we need two entries here. One to highlight the change of default and another for the deprecation:
.. versionchanged:: 1.0
The default of `max_features` changed from `"auto"` to 1.0.
.. deprecated:: 1.0
The `"auto"` option was deprecated in 1.0 and will be removed
in 1.2
Could you change for the different entry of all estimators?
|
thanks @glemaitre ! please let me know if these changes look good, and I can go do the same for trees. |
|
@bsun94 I think version 1.0 is now sailing. Could you increment, i.e. 1.0->1.1 and 1.2->1.3. For instance, the whatsnew entry should also be in 1.1. |
glemaitre
left a comment
There was a problem hiding this comment.
Otherwise, I am fine with the changes.
glemaitre
left a comment
There was a problem hiding this comment.
Just added "Request changes" to spot that this PR has been reviewed.
|
Hi @glemaitre @lorentzenchr , please let me know if this looks good. I've put in the version changes. Given this overall is a decently-sized chunk, should we go forward with it (if all looks good) and I can do the trees stuff under a separate PR (to avoid holding this up)? |
|
updated a version number I missed in the latest commit |
|
@lorentzenchr I corrected the last small typo and resolve the conflict with |
lorentzenchr
left a comment
There was a problem hiding this comment.
LGTM
@glemaitre Thanks for finishing.
…t-learn#20803) Co-authored-by: Guillaume Lemaitre <[email protected]>
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
…ivalTrees The new default is "sqrt", which does the same as the previous default "auto". See scikit-learn/scikit-learn#20803
Reference Issues/PRs
Enhancement for #20111 - set
max_featuresdefaults 1 for Regressors and "sqrt" for Classifiers.What does this implement/fix? Explain your changes.
For the
RandomForestandExtraTreesRegressorsclasses, sets default formax_featuresto be 1 with updates to the docstrings. For the correspondingClassifiers, sets the default to be"sqrt"with updates to docstrings. An update was also made to the documentation on the default setting of 1 corresponding to bagged trees.Any other comments?
N/A