@@ -1419,22 +1419,18 @@ advised to maintain notes on the `GitHub wiki
 Specific models
 ---------------

-Classifiers should accept ``y`` (target) arguments to ``fit``
-that are sequences (lists, arrays) of either strings or integers.
-They should not assume that the class labels
-are a contiguous range of integers;
-instead, they should store a list of classes
-in a ``classes_`` attribute or property.
-The order of class labels in this attribute
-should match the order in which ``predict_proba``, ``predict_log_proba``
-and ``decision_function`` return their values.
-The easiest way to achieve this is to put::
+Classifiers should accept ``y`` (target) arguments to ``fit`` that are
+sequences (lists, arrays) of either strings or integers. They should not
+assume that the class labels are a contiguous range of integers; instead, they
+should store a list of classes in a ``classes_`` attribute or property. The
+order of class labels in this attribute should match the order in which
+``predict_proba``, ``predict_log_proba`` and ``decision_function`` return their
+values. The easiest way to achieve this is to put::

     self.classes_, y = np.unique(y, return_inverse=True)

-in ``fit``.
-This returns a new ``y`` that contains class indexes, rather than labels,
-in the range [0, ``n_classes``).
+in ``fit``. This returns a new ``y`` that contains class indexes, rather than
+labels, in the range [0, ``n_classes``).
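A quick sketch of what the ``np.unique`` call returns, using made-up string labels (only NumPy is assumed). The encoded ``y`` indexes into ``classes_``, so the original labels can always be recovered:

```python
import numpy as np

# Arbitrary, non-contiguous string labels, as a user might pass to fit().
y = np.array(["spam", "ham", "eggs", "ham", "spam"])

# classes_ holds the sorted unique labels; y_encoded holds indexes into it.
classes_, y_encoded = np.unique(y, return_inverse=True)

print(classes_)   # ['eggs' 'ham' 'spam']
print(y_encoded)  # [2 1 0 1 2]

# Indexing classes_ with the encoded targets gives back the original labels.
assert (classes_[y_encoded] == y).all()
```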

 A classifier's ``predict`` method should return
 arrays containing class labels from ``classes_``.
@@ -1445,14 +1441,89 @@ this can be achieved with::
         D = self.decision_function(X)
         return self.classes_[np.argmax(D, axis=1)]
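The ``predict`` pattern can be exercised end to end with a toy classifier. ``ToyCentroidClassifier`` is hypothetical (a nearest-centroid rule, chosen only because it is short): it stores labels in ``classes_``, orders the columns of ``decision_function`` to match, and maps the ``argmax`` back to labels. Only NumPy is assumed:

```python
import numpy as np


class ToyCentroidClassifier:
    """Hypothetical nearest-centroid classifier illustrating ``classes_``."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        # Keep the original labels in classes_ and work with indexes internally.
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        # One centroid per class, stored in the same order as classes_.
        self.centroids_ = np.array(
            [X[y_idx == k].mean(axis=0) for k in range(len(self.classes_))]
        )
        return self

    def decision_function(self, X):
        X = np.asarray(X, dtype=float)
        # Column k scores classes_[k]; closer to its centroid means higher.
        sq_dists = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return -sq_dists

    def predict(self, X):
        D = self.decision_function(X)
        # Translate the best column index back into an actual class label.
        return self.classes_[np.argmax(D, axis=1)]


clf = ToyCentroidClassifier().fit([[0.0], [0.2], [5.0], [5.3]],
                                  ["low", "low", "high", "high"])
print(clf.classes_)                 # ['high' 'low']
print(clf.predict([[0.1], [4.9]]))  # ['low' 'high']
```

Because this toy ``decision_function`` always returns one column per class, ``np.argmax(D, axis=1)`` works even with two classes. scikit-learn's own binary classifiers return a 1-D ``decision_function`` instead, so they need a different rule in the binary case.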

-In linear models, coefficients are stored in an array called ``coef_``,
-and the independent term is stored in ``intercept_``.
-``sklearn.linear_model.base`` contains a few base classes and mixins
-that implement common linear model patterns.
+In linear models, coefficients are stored in an array called ``coef_``, and the
+independent term is stored in ``intercept_``. ``sklearn.linear_model.base``
+contains a few base classes and mixins that implement common linear model
+patterns.
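As a hedged illustration of the ``coef_``/``intercept_`` convention, here is a hypothetical least-squares regressor (``ToyLeastSquares`` is not a scikit-learn class) whose ``predict`` is exactly ``X @ coef_ + intercept_``. Only NumPy is assumed:

```python
import numpy as np


class ToyLeastSquares:
    """Hypothetical linear regressor following the coef_/intercept_ naming."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        # Center X and y so the slope and the intercept can be fitted separately.
        X_mean, y_mean = X.mean(axis=0), y.mean()
        self.coef_, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)
        self.intercept_ = y_mean - X_mean @ self.coef_
        return self

    def predict(self, X):
        # Prediction depends only on the two fitted attributes.
        return np.asarray(X, dtype=float) @ self.coef_ + self.intercept_


reg = ToyLeastSquares().fit([[0.0], [1.0], [2.0], [3.0]], [1.0, 3.0, 5.0, 7.0])
print(reg.coef_, reg.intercept_)  # close to [2.] and 1.0
```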

 The :mod:`sklearn.utils.multiclass` module contains useful functions
 for working with multiclass and multilabel problems.
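For example, two helpers from that module classify a target and list its labels. This assumes scikit-learn is installed; the comments show the results for these particular inputs:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target, unique_labels

print(type_of_target([0, 1, 1, 0]))                # binary
print(type_of_target(["a", "b", "c"]))             # multiclass
print(type_of_target(np.array([[1, 0], [0, 1]])))  # multilabel-indicator
print(unique_labels([3, 5, 5, 7]))                 # [3 5 7]
```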

+Estimator Tags
+--------------
+.. warning::
+
+    The estimator tags are experimental and the API is subject to change.
+
+Scikit-learn introduced estimator tags in version 0.21. These are annotations
+of estimators that allow programmatic inspection of their capabilities, such as
+sparse matrix support, supported output types and supported methods. The
+estimator tags are a dictionary returned by the method ``_get_tags()``. These
+tags are used by the common tests and the :func:`sklearn.utils.estimator_checks.check_estimator` function to
+decide what tests to run and what input data is appropriate. Tags can depend on
+estimator parameters or even system architecture and can in general only be
+determined at runtime.
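The sketch below shows one way a ``_get_tags()`` method could build the dictionary at runtime. It is not scikit-learn's implementation: the per-class ``_more_tags`` hook, the ``Sketch*`` classes and the reduced ``DEFAULT_TAGS`` are assumptions made for illustration. It includes a tag (``allow_nan``) whose value depends on a constructor parameter, which is why tags can only be determined at runtime:

```python
import copy
import inspect

# Reduced, hypothetical defaults following the documented rule: every tag is
# False except X_types, which defaults to ["2darray"].
DEFAULT_TAGS = {
    "non_deterministic": False,
    "allow_nan": False,
    "stateless": False,
    "X_types": ["2darray"],
}


class SketchBaseEstimator:
    def _get_tags(self):
        # Deep copy so callers cannot mutate the shared X_types default list.
        tags = copy.deepcopy(DEFAULT_TAGS)
        # Walk from the most generic class to the concrete one, so overrides in
        # more specific classes win. _more_tags is an assumed per-class hook.
        for klass in reversed(inspect.getmro(type(self))):
            more_tags = klass.__dict__.get("_more_tags")
            if more_tags is not None:
                tags.update(more_tags(self))
        return tags


class SketchImputer(SketchBaseEstimator):
    def __init__(self, missing_values=float("nan")):
        self.missing_values = missing_values

    def _more_tags(self):
        # Decided at runtime from a parameter: NaN input is acceptable only
        # when NaN is the placeholder this estimator is meant to replace.
        mv = self.missing_values
        return {"allow_nan": isinstance(mv, float) and mv != mv}


print(SketchImputer()._get_tags()["allow_nan"])                  # True
print(SketchImputer(missing_values=0)._get_tags()["allow_nan"])  # False
```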
+
+The default value of all tags except for ``X_types`` is ``False``.
+
+The current set of estimator tags is:
+
+non_deterministic
+    whether the estimator is not deterministic given a fixed ``random_state``
+
+requires_positive_data - unused for now
+    whether the estimator requires positive X.
+
+no_validation
+    whether the estimator skips input validation. This is only meant for stateless and dummy transformers!
+
+multioutput - unused for now
+    whether a regressor supports multi-target outputs or a classifier supports multi-class multi-output.
+
+multilabel
+    whether the estimator supports multilabel output
+
+stateless
+    whether the estimator can be fitted without access to the data. Even though
+    an estimator is stateless, it might still need a call to ``fit`` for initialization.
+
+allow_nan
+    whether the estimator supports data with missing values encoded as ``np.nan``
+
+poor_score
+    whether the estimator fails to provide a "reasonable" test-set score, which
+    currently for regression is an R2 of 0.5 on a subset of the Boston housing
+    dataset, and for classification an accuracy of 0.83 on
+    ``make_blobs(n_samples=300, random_state=0)``. These datasets and values
+    are based on current estimators in sklearn and might be replaced by
+    something more systematic.
+
+multioutput_only
+    whether the estimator supports only multi-output classification or regression.
+
+_skip_test
+    whether to skip common tests entirely. Don't use this unless you have a *very good* reason.
+
+X_types
+    Supported input types for X as a list of strings. Tests are currently only run if ``'2darray'`` is contained
+    in the list, signifying that the estimator takes continuous 2d numpy arrays as input. The default
+    value is ``['2darray']``. Other possible types are ``'string'``, ``'sparse'``,
+    ``'categorical'``, ``'dict'``, ``'1dlabels'`` and ``'2dlabels'``.
+    The goal is that in the future the supported input type will determine the
+    data used during testing, in particular for ``'string'``, ``'sparse'`` and
+    ``'categorical'`` data. For now, the tests for sparse data do not make use
+    of the ``'sparse'`` tag.
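To make the role of the defaults concrete, here is a sketch of how a test runner might turn a tags dictionary into a list of checks. The check names are hypothetical placeholders, not scikit-learn's common checks; tags missing from the dictionary fall back to the documented defaults (``False``, and ``['2darray']`` for ``X_types``):

```python
def select_checks(tags):
    """Return the (hypothetical) checks to run for an estimator's tags."""
    # Tags missing from the dictionary take their documented default values.
    def tag(name):
        return tags.get(name, ["2darray"] if name == "X_types" else False)

    if tag("_skip_test"):
        return []
    # Common tests currently only run on estimators accepting 2d arrays.
    if "2darray" not in tag("X_types"):
        return []

    checks = ["fit_on_2d_array"]
    checks.append("accepts_nan" if tag("allow_nan") else "rejects_nan")
    if not tag("non_deterministic"):
        checks.append("reproducible_with_fixed_random_state")
    if not tag("poor_score"):
        checks.append("reasonable_score")
    return checks


print(select_checks({}))
print(select_checks({"X_types": ["string"]}))  # []
```

Looking tags up through a single helper keeps the "everything defaults to ``False`` except ``X_types``" rule in one place, so an estimator only has to report the tags that differ from the defaults.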
+
+
+In addition to the tags, estimators also need to declare any non-optional
+parameters to ``__init__`` in the ``_required_parameters`` class attribute,
+which is a list or tuple. If ``_required_parameters`` is only
+``["estimator"]`` or ``["base_estimator"]``, then the estimator will be
+instantiated with an instance of ``LinearDiscriminantAnalysis`` (or
+``Ridge`` if the estimator is a regressor) in the tests. The choice
+of these two models is somewhat idiosyncratic but both should provide robust
+closed-form solutions.
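The instantiation rule for ``_required_parameters`` can be sketched as follows. To stay runnable without scikit-learn, ``StandInClassifier`` and ``StandInRegressor`` are hypothetical placeholders for ``LinearDiscriminantAnalysis`` and the ridge regressor, and regressors are recognised through an ``_estimator_type`` attribute; the real test suite may detect them differently and may skip, rather than reject, estimators it cannot build:

```python
class StandInClassifier:
    """Hypothetical placeholder for the classifier used by the tests."""


class StandInRegressor:
    """Hypothetical placeholder for the regressor used by the tests."""


def construct_instance(Estimator):
    """Sketch of how a test could instantiate Estimator."""
    # A list or a tuple is accepted; normalise to a list for comparison.
    required = list(getattr(Estimator, "_required_parameters", []))
    if not required:
        return Estimator()
    if required in (["estimator"], ["base_estimator"]):
        # Assumption: regressors are recognised by _estimator_type.
        if getattr(Estimator, "_estimator_type", None) == "regressor":
            inner = StandInRegressor()
        else:
            inner = StandInClassifier()
        return Estimator(**{required[0]: inner})
    raise ValueError(
        f"cannot build {Estimator.__name__}: it requires {required}"
    )


class HypotheticalBaggingRegressor:
    _required_parameters = ("base_estimator",)
    _estimator_type = "regressor"

    def __init__(self, base_estimator, n_estimators=10):
        self.base_estimator = base_estimator
        self.n_estimators = n_estimators


est = construct_instance(HypotheticalBaggingRegressor)
print(type(est.base_estimator).__name__)  # StandInRegressor
```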
+
 .. _reading-code:

 Reading the existing code base