Commit f819704

jorisvandenbossche authored and glemaitre committed
MAINT: Revert ChainedImputer (scikit-learn#11600)
1 parent 2242c59 commit f819704

File tree

7 files changed: +16 -883 lines changed


doc/modules/classes.rst

Lines changed: 1 addition & 2 deletions

@@ -655,9 +655,8 @@ Kernels:
    :template: class.rst

    impute.SimpleImputer
-   impute.ChainedImputer
    impute.MissingIndicator
-
+
 .. _kernel_approximation_ref:

 :mod:`sklearn.kernel_approximation` Kernel Approximation

doc/modules/impute.rst

Lines changed: 2 additions & 69 deletions

@@ -16,22 +16,6 @@ values. However, this comes at the price of losing data which may be valuable
 i.e., to infer them from the known part of the data. See the :ref:`glossary`
 entry on imputation.

-
-Univariate vs. Multivariate Imputation
-======================================
-
-One type of imputation algorithm is univariate, which imputes values in the i-th
-feature dimension using only non-missing values in that feature dimension
-(e.g. :class:`impute.SimpleImputer`). By contrast, multivariate imputation
-algorithms use the entire set of available feature dimensions to estimate the
-missing values (e.g. :class:`impute.ChainedImputer`).
-
-
-.. _single_imputer:
-
-Univariate feature imputation
-=============================
-
 The :class:`SimpleImputer` class provides basic strategies for imputing missing
 values. Missing values can be imputed with a provided constant value, or using
 the statistics (mean, median or most frequent) of each column in which the

@@ -87,60 +71,9 @@ string values or pandas categoricals when using the ``'most_frequent'`` or
     ['a' 'y']
     ['b' 'y']]

-.. _chained_imputer:
-
-
-Multivariate feature imputation
-===============================

-A more sophisticated approach is to use the :class:`ChainedImputer` class, which
-implements the imputation technique from MICE (Multivariate Imputation by
-Chained Equations). MICE models each feature with missing values as a function of
-other features, and uses that estimate for imputation. It does so in a round-robin
-fashion: at each step, a feature column is designated as output `y` and the other
-feature columns are treated as inputs `X`. A regressor is fit on `(X, y)` for known `y`.
-Then, the regressor is used to predict the unknown values of `y`. This is repeated
-for each feature in a chained fashion, and then is done for a number of imputation
-rounds. Here is an example snippet::
-
-    >>> import numpy as np
-    >>> from sklearn.impute import ChainedImputer
-    >>> imp = ChainedImputer(n_imputations=10, random_state=0)
-    >>> imp.fit([[1, 2], [np.nan, 3], [7, np.nan]])
-    ChainedImputer(imputation_order='ascending', initial_strategy='mean',
-            max_value=None, min_value=None, missing_values=nan, n_burn_in=10,
-            n_imputations=10, n_nearest_features=None, predictor=None,
-            random_state=0, verbose=False)
-    >>> X_test = [[np.nan, 2], [6, np.nan], [np.nan, 6]]
-    >>> print(np.round(imp.transform(X_test)))
-    [[ 1.  2.]
-     [ 6.  4.]
-     [13.  6.]]
-
-Both :class:`SimpleImputer` and :class:`ChainedImputer` can be used in a Pipeline
-as a way to build a composite estimator that supports imputation.
-See :ref:`sphx_glr_auto_examples_plot_missing_values.py`.
-
-.. _multiple_imputation:
-
-Multiple vs. Single Imputation
-==============================
-
-In the statistics community, it is common practice to perform multiple imputations,
-generating, for example, 10 separate imputations for a single feature matrix.
-Each of these 10 imputations is then put through the subsequent analysis pipeline
-(e.g. feature engineering, clustering, regression, classification). The 10 final
-analysis results (e.g. held-out validation error) allow the data scientist to
-obtain understanding of the uncertainty inherent in the missing values. The above
-practice is called multiple imputation. As implemented, the :class:`ChainedImputer`
-class generates a single (averaged) imputation for each missing value because this
-is the most common use case for machine learning applications. However, it can also be used
-for multiple imputations by applying it repeatedly to the same dataset with different
-random seeds with the ``n_imputations`` parameter set to 1.
-
-Note that a call to the ``transform`` method of :class:`ChainedImputer` is not
-allowed to change the number of samples. Therefore multiple imputations cannot be
-achieved by a single call to ``transform``.
+:class:`SimpleImputer` can be used in a Pipeline as a way to build a composite
+estimator that supports imputation. See :ref:`sphx_glr_auto_examples_plot_missing_values.py`.

 .. _missing_indicator:
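The retained documentation line says that :class:`SimpleImputer` can be used in a Pipeline as a composite estimator. A minimal sketch of that pattern (not part of this commit; the toy data and the choice of `LinearRegression` are illustrative assumptions):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Toy data with missing entries encoded as np.nan
X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Mean imputation and regression combined into one composite estimator:
# fit() imputes the training data before fitting the regressor, and
# predict() applies the same learned column means to new data.
pipe = make_pipeline(SimpleImputer(strategy="mean"), LinearRegression())
pipe.fit(X, y)
pred = pipe.predict([[np.nan, 4.0]])  # missing value imputed inside the pipeline
print(pred.shape)  # (1,)
```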

doc/whats_new/v0.20.rst

Lines changed: 0 additions & 5 deletions

@@ -150,11 +150,6 @@ Preprocessing
 - Added :class:`MissingIndicator` which generates a binary indicator for
   missing values. :issue:`8075` by :user:`Maniteja Nandana <maniteja123>` and
   :user:`Guillaume Lemaitre <glemaitre>`.
-
-- Added :class:`impute.ChainedImputer`, which is a strategy for imputing missing
-  values by modeling each feature with missing values as a function of
-  other features in a round-robin fashion. :issue:`8478` by
-  :user:`Sergey Feldman <sergeyf>`.

 - :class:`linear_model.SGDClassifier`, :class:`linear_model.SGDRegressor`,
   :class:`linear_model.PassiveAggressiveClassifier`,
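The reverted changelog entry describes modeling each feature with missing values as a function of the other features in a round-robin fashion. That scheme can be sketched by hand with plain NumPy and a regressor; this is an illustrative approximation of the idea, not the reverted implementation (the number of rounds, the mean initialization, and `LinearRegression` are all assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
X[rng.rand(100, 3) < 0.2] = np.nan  # knock out ~20% of entries

mask = np.isnan(X)
# Initial fill: replace each missing entry with its column mean
filled = np.where(mask, np.nanmean(X, axis=0), X)

for _ in range(5):  # a few imputation rounds
    for j in range(X.shape[1]):  # round-robin over feature columns
        if not mask[:, j].any():
            continue
        # Treat column j as the output, the remaining columns as inputs
        other = np.delete(filled, j, axis=1)
        reg = LinearRegression().fit(other[~mask[:, j]], filled[~mask[:, j], j])
        # Re-impute only the originally missing entries of column j
        filled[mask[:, j], j] = reg.predict(other[mask[:, j]])

print(np.isnan(filled).any())  # False: every missing entry has been imputed
```

Observed entries are never overwritten; only the originally missing positions are refined on each pass.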

examples/plot_missing_values.py

Lines changed: 9 additions & 19 deletions

@@ -3,30 +3,29 @@
 Imputing missing values before building an estimator
 ====================================================

+This example shows that imputing the missing values can give better
+results than discarding the samples containing any missing value.
+Imputing does not always improve the predictions, so please check via
+cross-validation. Sometimes dropping rows or using marker values is
+more effective.
+
 Missing values can be replaced by the mean, the median or the most frequent
 value using the basic :func:`sklearn.impute.SimpleImputer`.
 The median is a more robust estimator for data with high magnitude variables
 which could dominate results (otherwise known as a 'long tail').

-Another option is the :func:`sklearn.impute.ChainedImputer`. This uses
-round-robin linear regression, treating every variable as an output in
-turn. The version implemented assumes Gaussian (output) variables. If your
-features are obviously non-Normal, consider transforming them to look more
-Normal so as to improve performance.
-
 In addition of using an imputing method, we can also keep an indication of the
 missing information using :func:`sklearn.impute.MissingIndicator` which might
 carry some information.
 """
-
 import numpy as np
 import matplotlib.pyplot as plt

 from sklearn.datasets import load_diabetes
 from sklearn.datasets import load_boston
 from sklearn.ensemble import RandomForestRegressor
 from sklearn.pipeline import make_pipeline, make_union
-from sklearn.impute import SimpleImputer, ChainedImputer, MissingIndicator
+from sklearn.impute import SimpleImputer, MissingIndicator
 from sklearn.model_selection import cross_val_score

 rng = np.random.RandomState(0)

@@ -71,18 +70,10 @@ def get_results(dataset):
     mean_impute_scores = cross_val_score(estimator, X_missing, y_missing,
                                          scoring='neg_mean_squared_error')

-    # Estimate the score after chained imputation of the missing values
-    estimator = make_pipeline(
-        make_union(ChainedImputer(missing_values=0, random_state=0),
-                   MissingIndicator(missing_values=0)),
-        RandomForestRegressor(random_state=0, n_estimators=100))
-    chained_impute_scores = cross_val_score(estimator, X_missing, y_missing,
-                                            scoring='neg_mean_squared_error')

     return ((full_scores.mean(), full_scores.std()),
             (zero_impute_scores.mean(), zero_impute_scores.std()),
-            (mean_impute_scores.mean(), mean_impute_scores.std()),
-            (chained_impute_scores.mean(), chained_impute_scores.std()))
+            (mean_impute_scores.mean(), mean_impute_scores.std()))


 results_diabetes = np.array(get_results(load_diabetes()))

@@ -98,8 +89,7 @@ def get_results(dataset):

 x_labels = ['Full data',
             'Zero imputation',
-            'Mean Imputation',
-            'Chained Imputation']
+            'Mean Imputation']
 colors = ['r', 'g', 'b', 'orange']

 # plot diabetes results
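The removed branch of this example combined an imputer with :class:`MissingIndicator` via `make_union`, so that the downstream model sees both the imputed values and a binary missingness mask. The same pattern still works with :class:`SimpleImputer`; a small sketch (toy data and parameter choices are illustrative, not from the commit):

```python
import numpy as np
from sklearn.impute import SimpleImputer, MissingIndicator
from sklearn.pipeline import make_union

# As in the example script, 0 is used as the missing-value marker
X = np.array([[0.0, 2.0],
              [6.0, 0.0],
              [3.0, 4.0]])

# FeatureUnion concatenates the imputed columns with binary indicator
# columns flagging which entries were originally missing.
union = make_union(SimpleImputer(missing_values=0, strategy="mean"),
                   MissingIndicator(missing_values=0))
Xt = union.fit_transform(X)
print(Xt.shape)  # (3, 4): 2 imputed features + 2 indicator columns
```

Feeding `Xt` to a regressor lets the model exploit the missingness pattern itself, which is the motivation the example's docstring gives for keeping an indication of the missing information.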

0 commit comments
