Currently the common tests hard-code many things, like support for multi-output, requiring positive input, or not allowing specific kinds of predictions.
That's bad design, but also a big problem for 3rd-party packages that need to adjust the conditions in the common tests (you have to add your estimator to a hard-coded list of estimators with a certain property).
See scikit-learn-contrib/py-earth#96 for an example.
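To make the problem concrete, here is a minimal sketch of the kind of hard-coding involved; the names and lists below are hypothetical, not the actual test code:

```python
# Hypothetical sketch of how the common tests gate checks on hard-coded
# name lists; the estimator names here are illustrative only.
MULTI_OUTPUT_ESTIMATORS = {"DecisionTreeRegressor", "KNeighborsRegressor"}

def should_check_multi_output(name):
    # A 3rd-party estimator can never appear in this baked-in set, so the
    # corresponding check is silently skipped (or wrongly applied).
    return name in MULTI_OUTPUT_ESTIMATORS

print(should_check_multi_output("DecisionTreeRegressor"))  # True
print(should_check_multi_output("Earth"))  # False: py-earth can't opt in
```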
Currently we have only a very limited mechanism for distinguishing classifiers and regressors for similar purposes (but also for deciding the default cross-validation strategy): the `_estimator_type` attribute. That allows only a single tag (classifier, transformer, regressor).
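For reference, a quick illustration of how that single-tag mechanism is used today (the helper functions are real; the trailing comment paraphrases their logic):

```python
# The mixins set a single string attribute, and helpers such as
# is_classifier / is_regressor simply compare against it.
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LogisticRegression, Ridge

print(is_classifier(LogisticRegression()))  # True
print(is_regressor(Ridge()))                # True
# is_classifier boils down to roughly:
#   getattr(estimator, "_estimator_type", None) == "classifier"
```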
I think we should deprecate `_estimator_type` and instead add a more flexible `estimator_properties` dictionary.
This would allow us to programmatically encode the assumptions of the algorithms (like vectorizers taking non-numeric data or naive Bayes requiring non-negative data), as well as clean up our act with the tests.
People wanting to add to scikit-learn-contrib, and auto-sklearn-like settings (TPOT), will appreciate that ;)
List of tags that I've come up with so far (a rough sketch of how they could fit together follows the list):
- supports sparse data
- positive data only
- supports missing data
- semi-supervised
- multi-output only
- multi-label support
- multi-output regression
- multi-label multi-output
- 1d input only
- multi-class support (or maybe "no multi-class support"?)
- needs fitting (or maybe "stateless"? though the GP doesn't need fitting but is not stateless)
- input dtype / dtype conversions?
- sparse matrix formats / conversions
- deterministic?
- label transformation (not for data)
- special input format? like for CountVectorizer and DictVectorizer? Or maybe we want a field "supported inputs" that lists ndarray, sparse formats, list, strings, dicts?
- required parameters?
- integer input / categorical input supported?
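To make the proposal concrete, here is a rough sketch of what an `estimator_properties` dictionary could look like and how the common tests could query it; every key, value, and name below is hypothetical, not a settled API:

```python
# Hypothetical estimator_properties dictionary on an NB-like estimator;
# the keys and values are illustrative only, not a settled API.
class MultinomialNBLike:
    estimator_properties = {
        "estimator_type": "classifier",
        "supports_sparse": True,
        "positive_data_only": True,    # e.g. naive Bayes on counts
        "supports_missing_data": False,
        "multi_label": False,
        "input_dtypes": ["float64", "int64"],
        "sparse_formats": ["csr"],
        "deterministic": True,
        "stateless": False,            # needs fitting
    }

def should_run_positive_input_check(estimator):
    # The common tests could query the tags instead of a hard-coded name
    # list, so 3rd-party estimators get the right checks automatically.
    return estimator.estimator_properties.get("positive_data_only", False)

print(should_run_positive_input_check(MultinomialNBLike()))  # True
```

A plain dictionary with documented defaults would also let downstream tools like auto-sklearn or TPOT filter estimators by capability without maintaining their own lists.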
cc @GaelVaroquaux @ogrisel @mblondel @jnothman @mfeurer @rhiever