6 changes: 2 additions & 4 deletions doc/modules/ensemble.rst
@@ -218,7 +218,7 @@ setting ``oob_score=True``.
The size of the model with the default parameters is :math:`O( M * N * log (N) )`,
where :math:`M` is the number of trees and :math:`N` is the number of samples.
In order to reduce the size of the model, you can change these parameters:
-``min_samples_split``, ``min_samples_leaf``, ``max_leaf_nodes`` and ``max_depth``.
+``min_samples_split``, ``max_leaf_nodes`` and ``max_depth``.

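To make the size/parameter trade-off described in this passage concrete, here is a hedged sketch (not part of the PR; the dataset and parameter values are arbitrary) comparing total node count, a rough proxy for model size, with and without the constraining parameters:

```python
# Illustrative sketch: a default forest versus one constrained by the
# parameters named above. Dataset and values are arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

default = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
pruned = RandomForestClassifier(n_estimators=10, max_depth=4,
                                min_samples_split=10, max_leaf_nodes=16,
                                random_state=0).fit(X, y)

def total_nodes(forest):
    """Sum of node counts over all trees in the ensemble."""
    return sum(tree.tree_.node_count for tree in forest.estimators_)

print(total_nodes(default), total_nodes(pruned))
```

The constrained forest should be substantially smaller, since ``max_leaf_nodes=16`` alone caps each tree at 31 nodes.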
Parallelization
---------------
@@ -382,9 +382,7 @@ The number of weak learners is controlled by the parameter ``n_estimators``. The
the final combination. By default, weak learners are decision stumps. Different
weak learners can be specified through the ``base_estimator`` parameter.
The main parameters to tune to obtain good results are ``n_estimators`` and
-the complexity of the base estimators (e.g., its depth ``max_depth`` or
-minimum required number of samples at a leaf ``min_samples_leaf`` in case of
-decision trees).
+the complexity of the base estimators (e.g., its depth ``max_depth``).

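A hedged sketch of the two tuning knobs named above (illustrative only, not part of the PR). Note an assumption here: the base-estimator parameter is ``base_estimator`` in the scikit-learn release this PR targets, but newer releases renamed it to ``estimator``, so the sketch tries both:

```python
# Illustrative: tuning AdaBoost via n_estimators and base-estimator depth.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)

# Default weak learner: a decision stump (a tree with max_depth=1).
stumps = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Increase weak-learner complexity by passing a deeper tree. The keyword
# depends on the installed scikit-learn version, so we try both names.
deeper_tree = DecisionTreeClassifier(max_depth=3)
try:
    boosted = AdaBoostClassifier(estimator=deeper_tree,
                                 n_estimators=50, random_state=0)
except TypeError:
    boosted = AdaBoostClassifier(base_estimator=deeper_tree,
                                 n_estimators=50, random_state=0)
boosted.fit(X, y)

print(stumps.score(X, y), boosted.score(X, y))
```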
.. topic:: Examples:

17 changes: 7 additions & 10 deletions doc/modules/tree.rst
@@ -330,15 +330,12 @@ Tips on practical use
for each additional level the tree grows to. Use ``max_depth`` to control
the size of the tree to prevent overfitting.

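The tip above can be sketched as follows (illustrative only; not part of the PR, values arbitrary):

```python
# Illustrative sketch: capping ``max_depth`` keeps the tree small.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# flip_y adds label noise so an unconstrained tree must grow deep to fit it.
X, y = make_classification(n_samples=300, flip_y=0.05, random_state=0)

unbounded = DecisionTreeClassifier(random_state=0).fit(X, y)
capped = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# The unconstrained tree grows until leaves are pure; the capped one stops early.
print(unbounded.tree_.max_depth, capped.tree_.max_depth)
```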
-* Use ``min_samples_split`` or ``min_samples_leaf`` to control the number of
-  samples at a leaf node. A very small number will usually mean the tree
-  will overfit, whereas a large number will prevent the tree from learning
-  the data. Try ``min_samples_leaf=5`` as an initial value. If the sample size
-  varies greatly, a float number can be used as percentage in these two parameters.
-  The main difference between the two is that ``min_samples_leaf`` guarantees
-  a minimum number of samples in a leaf, while ``min_samples_split`` can
-  create arbitrary small leaves, though ``min_samples_split`` is more common
-  in the literature.
+* Use ``min_samples_split`` to control the number of samples at a leaf node.
+  A very small number will usually mean the tree will overfit, whereas a
+  large number will prevent the tree from learning the data. If the sample
+  size varies greatly, a float number can be used as a percentage in this
+  parameter. Note that ``min_samples_split`` can create arbitrarily
+  small leaves.

Author: Please check whether my rephrasing of this paragraph is acceptable.

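The integer-versus-float behaviour described in this bullet can be sketched as follows (illustrative only; numbers are arbitrary). An integer ``min_samples_split`` is an absolute count, while a float is interpreted as a fraction of the training set:

```python
# Illustrative: int vs float ``min_samples_split`` on the same data.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

absolute = DecisionTreeClassifier(min_samples_split=20, random_state=0).fit(X, y)
fraction = DecisionTreeClassifier(min_samples_split=0.1, random_state=0).fit(X, y)

# 0.1 * 200 samples == 20 samples, so both trees are grown identically.
print(absolute.tree_.node_count, fraction.tree_.node_count)
```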
* Balance your dataset before training to prevent the tree from being biased
toward the classes that are dominant. Class balancing can be done by
@@ -347,7 +344,7 @@ Tips on practical use
class to the same value. Also note that weight-based pre-pruning criteria,
such as ``min_weight_fraction_leaf``, will then be less biased toward
dominant classes than criteria that are not aware of the sample weights,
-like ``min_samples_leaf``.
+like ``min_samples_split``.
Author: My understanding leads me to believe this change is grammatical, but please check?


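The balancing advice above can be sketched as follows (illustrative only; not part of the PR). ``class_weight='balanced'`` reweights samples inversely to class frequencies, and ``min_weight_fraction_leaf`` then prunes on the *weighted* fraction rather than a raw sample count:

```python
# Illustrative: weight-based pre-pruning on an imbalanced problem.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Roughly 9:1 imbalanced toy dataset.
X, y = make_classification(n_samples=300, weights=[0.9, 0.1], random_state=0)

# With balanced class weights, the minority class carries half the total
# weight, so a 5% weighted-leaf threshold does not starve it of splits.
clf = DecisionTreeClassifier(class_weight='balanced',
                             min_weight_fraction_leaf=0.05,
                             random_state=0).fit(X, y)
print(clf.tree_.node_count)
```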
* If the samples are weighted, it will be easier to optimize the tree
structure using weight-based pre-pruning criterion such as
33 changes: 33 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -190,6 +190,22 @@ Classifiers and regressors
efficient when ``algorithm='brute'``. :issue:`11136` by `Joel Nothman`_
and :user:`Aman Dalmia <dalmia>`.

+- The parameter ``min_samples_leaf`` was deprecated in
+  :class:`ensemble.RandomForestClassifier`,
+  :class:`ensemble.RandomForestRegressor`,
+  :class:`ensemble.ExtraTreesClassifier`,
+  :class:`ensemble.ExtraTreesRegressor`,
+  :class:`ensemble.GradientBoostingClassifier`,
+  :class:`ensemble.GradientBoostingRegressor`,
+  :class:`tree.DecisionTreeClassifier`,
+  :class:`tree.DecisionTreeRegressor`,
+  :class:`tree.ExtraTreeClassifier`,
+  :class:`tree.ExtraTreeRegressor`,
+  and will be fixed to a value of 1 in version 0.22. It was not effective
+  for regularization and, empirically, 1 is the best value.
+  :issue:`10773` by :user:`Bob Chen <lasagnaman>`.


Cluster

- :class:`cluster.KMeans`, :class:`cluster.MiniBatchKMeans` and
@@ -545,6 +561,23 @@ Datasets
API changes summary
-------------------

+Classifiers and regressors
+
+- The parameter ``min_samples_leaf`` was deprecated in
+  :class:`ensemble.RandomForestClassifier`,
+  :class:`ensemble.RandomForestRegressor`,
+  :class:`ensemble.ExtraTreesClassifier`,
+  :class:`ensemble.ExtraTreesRegressor`,
+  :class:`ensemble.GradientBoostingClassifier`,
+  :class:`ensemble.GradientBoostingRegressor`,
+  :class:`tree.DecisionTreeClassifier`,
+  :class:`tree.DecisionTreeRegressor`,
+  :class:`tree.ExtraTreeClassifier`,
+  :class:`tree.ExtraTreeRegressor`,
+  and will be fixed to a value of 1 in version 0.22. It was not effective
+  for regularization and, empirically, 1 is the best value.
+  :issue:`10773` by :user:`Bob Chen <lasagnaman>`.
+
Linear, kernelized and related models

- Deprecate ``random_state`` parameter in :class:`svm.OneClassSVM` as the
4 changes: 2 additions & 2 deletions examples/ensemble/plot_adaboost_hastie_10_2.py
@@ -43,11 +43,11 @@
X_test, y_test = X[2000:], y[2000:]
X_train, y_train = X[:2000], y[:2000]

-dt_stump = DecisionTreeClassifier(max_depth=1, min_samples_leaf=1)
+dt_stump = DecisionTreeClassifier(max_depth=1)
dt_stump.fit(X_train, y_train)
dt_stump_err = 1.0 - dt_stump.score(X_test, y_test)

-dt = DecisionTreeClassifier(max_depth=9, min_samples_leaf=1)
+dt = DecisionTreeClassifier(max_depth=9)
dt.fit(X_train, y_train)
dt_err = 1.0 - dt.score(X_test, y_test)

2 changes: 1 addition & 1 deletion examples/ensemble/plot_gradient_boosting_oob.py
@@ -55,7 +55,7 @@

# Fit classifier with out-of-bag estimates
params = {'n_estimators': 1200, 'max_depth': 3, 'subsample': 0.5,
-'learning_rate': 0.01, 'min_samples_leaf': 1, 'random_state': 3}
+'learning_rate': 0.01, 'random_state': 3}
clf = ensemble.GradientBoostingClassifier(**params)

clf.fit(X_train, y_train)
3 changes: 1 addition & 2 deletions examples/ensemble/plot_gradient_boosting_quantile.py
@@ -41,8 +41,7 @@ def f(x):

clf = GradientBoostingRegressor(loss='quantile', alpha=alpha,
n_estimators=250, max_depth=3,
-learning_rate=.1, min_samples_leaf=9,
-min_samples_split=9)
+learning_rate=.1, min_samples_split=9)

clf.fit(X, y)

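As a hedged aside (not part of the PR), the ``loss='quantile'`` setting shown in this example yields prediction intervals by fitting one model per quantile. A minimal sketch on synthetic data, with arbitrary settings:

```python
# Illustrative: 90% prediction interval from two quantile GBMs.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = X.ravel() + rng.normal(scale=1.0, size=200)

# One model per quantile: lower bound, median, upper bound.
models = {}
for q in (0.05, 0.5, 0.95):
    models[q] = GradientBoostingRegressor(loss='quantile', alpha=q,
                                          n_estimators=100,
                                          max_depth=3).fit(X, y)

lo, med, hi = (models[q].predict(X) for q in (0.05, 0.5, 0.95))
coverage = np.mean((y >= lo) & (y <= hi))
print(coverage)
```

On the training data the interval should cover most targets; out-of-sample coverage would need a held-out set.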
2 changes: 0 additions & 2 deletions examples/model_selection/plot_randomized_search.py
@@ -55,7 +55,6 @@ def report(results, n_top=3):
param_dist = {"max_depth": [3, None],
"max_features": sp_randint(1, 11),
"min_samples_split": sp_randint(2, 11),
-"min_samples_leaf": sp_randint(1, 11),
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}

@@ -74,7 +73,6 @@ def report(results, n_top=3):
param_grid = {"max_depth": [3, None],
"max_features": [1, 3, 10],
"min_samples_split": [2, 3, 10],
-"min_samples_leaf": [1, 3, 10],
"bootstrap": [True, False],
"criterion": ["gini", "entropy"]}

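A hedged sketch of how the trimmed parameter distributions above are consumed (illustrative only; a small synthetic dataset stands in for the example's real data):

```python
# Illustrative: randomized search over the trimmed parameter space.
from scipy.stats import randint as sp_randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

param_dist = {"max_depth": [3, None],
              "max_features": sp_randint(1, 11),
              "min_samples_split": sp_randint(2, 11),
              "bootstrap": [True, False],
              "criterion": ["gini", "entropy"]}

search = RandomizedSearchCV(RandomForestClassifier(n_estimators=10,
                                                   random_state=0),
                            param_distributions=param_dist,
                            n_iter=5, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
```

Each of the 5 sampled candidates draws one value per key, so ``best_params_`` contains exactly the keys of ``param_dist``.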