I'm slightly concerned that the common tests currently don't cover as much as I'd like. For example, there are no sparse-data tests for clustering (#4052).
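As an illustration of the kind of check #4052 is missing, here is a minimal sketch written against the current scikit-learn API rather than the one at the time of this issue. It fits clones of a clusterer on dense and CSR copies of the same data and compares the two partitions with `adjusted_rand_score`, because cluster ids may be permuted between runs. The helper name `sparse_dense_agreement` is hypothetical. A real common test would also have to handle clusterers that reject sparse input.

```python
from scipy import sparse
from sklearn.base import clone
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def sparse_dense_agreement(clusterer, X):
    """Hypothetical helper: fit clones of `clusterer` on dense and CSR copies
    of X and return the adjusted Rand index of the two labelings
    (1.0 means both fits produced the same partition)."""
    dense_labels = clone(clusterer).fit(X).labels_
    sparse_labels = clone(clusterer).fit(sparse.csr_matrix(X)).labels_
    return adjusted_rand_score(dense_labels, sparse_labels)


# Three well-separated blobs, so dense and sparse fits should agree exactly.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
score = sparse_dense_agreement(KMeans(n_clusters=3, n_init=10, random_state=0), X)
print(f"ARI dense vs. sparse: {score:.3f}")
```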
I think for clustering, regression, classification and transformers we are in relatively good shape, but there are two cases of "odd" estimators that we need to watch out for:
- estimators not returned by `all_estimators` by default, and
- estimators not belonging to any of the four mixin classes.
For the second, these are the estimators that are not returned by the type-filtered call
estimators = all_estimators(type_filter=['classifier', 'regressor', 'transformer', 'cluster'])
{('CheckingClassifier', sklearn.utils.mocking.CheckingClassifier),
('CountVectorizer', sklearn.feature_extraction.text.CountVectorizer),
('DPGMM', sklearn.mixture.dpgmm.DPGMM),
('EmpiricalCovariance',
sklearn.covariance.empirical_covariance_.EmpiricalCovariance),
('GMM', sklearn.mixture.gmm.GMM),
('GMMHMM', sklearn.hmm.GMMHMM),
('GaussianHMM', sklearn.hmm.GaussianHMM),
('GraphLasso', sklearn.covariance.graph_lasso_.GraphLasso),
('GraphLassoCV', sklearn.covariance.graph_lasso_.GraphLassoCV),
('HashingVectorizer', sklearn.feature_extraction.text.HashingVectorizer),
('KernelDensity', sklearn.neighbors.kde.KernelDensity),
('LSHForest', sklearn.neighbors.approximate.LSHForest),
('LedoitWolf', sklearn.covariance.shrunk_covariance_.LedoitWolf),
('LogOddsEstimator', sklearn.ensemble.gradient_boosting.LogOddsEstimator),
('MDS', sklearn.manifold.mds.MDS),
('MeanEstimator', sklearn.ensemble.gradient_boosting.MeanEstimator),
('MinCovDet', sklearn.covariance.robust_covariance.MinCovDet),
('MultinomialHMM', sklearn.hmm.MultinomialHMM),
('NearestNeighbors', sklearn.neighbors.unsupervised.NearestNeighbors),
('OAS', sklearn.covariance.shrunk_covariance_.OAS),
('OneClassSVM', sklearn.svm.classes.OneClassSVM),
('PatchExtractor', sklearn.feature_extraction.image.PatchExtractor),
('PriorProbabilityEstimator',
sklearn.ensemble.gradient_boosting.PriorProbabilityEstimator),
('QuantileEstimator', sklearn.ensemble.gradient_boosting.QuantileEstimator),
('ScaledLogOddsEstimator',
sklearn.ensemble.gradient_boosting.ScaledLogOddsEstimator),
('ShrunkCovariance', sklearn.covariance.shrunk_covariance_.ShrunkCovariance),
('SpectralBiclustering', sklearn.cluster.bicluster.SpectralBiclustering),
('SpectralCoclustering', sklearn.cluster.bicluster.SpectralCoclustering),
('SpectralEmbedding', sklearn.manifold.spectral_embedding_.SpectralEmbedding),
('TSNE', sklearn.manifold.t_sne.TSNE),
('TfidfVectorizer', sklearn.feature_extraction.text.TfidfVectorizer),
('VBGMM', sklearn.mixture.dpgmm.VBGMM),
('ZeroEstimator', sklearn.ensemble.gradient_boosting.ZeroEstimator),
('_BaseHMM', sklearn.hmm._BaseHMM),
('_BaseRidgeCV', sklearn.linear_model.ridge._BaseRidgeCV),
('_ConstantPredictor', sklearn.multiclass._ConstantPredictor),
('_RidgeGCV', sklearn.linear_model.ridge._RidgeGCV)}
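These are mostly covariance estimators, density models, feature-extraction (preprocessing) steps and manifold learners, plus a few internal helpers such as the gradient boosting init estimators.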
It would be great if we could figure out a good way to test them, too, or make more tests applicable to all estimators, without filtering for the four standard kinds.