
Single sample test time performance #1363

@pprett

Test-time performance on single samples is important in real-world applications. Currently, the cost of predicting an individual sample is often dominated by input validation rather than by model evaluation. Consider the following line profile of `GradientBoostingRegressor.decision_function` for a model trained on the Boston housing dataset with 250 trees:

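```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   645         1          151    151.0     53.7          X = array2d(X, dtype=DTYPE, order='C')
   646         1           49     49.0     17.4          score = self._init_decision_function(X)
   647         1           78     78.0     27.8          predict_stages(self.estimators_, X, self.learning_rate, score)
   648         1            3      3.0      1.1          return score
```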
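To see the fixed per-call cost of the individual validation steps on a single row, independently of any model, they can be timed in isolation. This is a minimal sketch assuming NumPy and a synthetic 1x13 sample (13 matching the number of Boston features); the helper names are hypothetical, and timings depend on the machine, so no numbers are claimed here.

```python
import timeit

import numpy as np


def mean_call_time_us(func, number=10000):
    """Mean wall-clock time per call of a zero-argument callable, in microseconds."""
    return timeit.timeit(func, number=number) / number * 1e6


def profile_single_sample_validation(x, number=10000):
    """Time individual validation steps on one input row (illustrative breakdown)."""
    return {
        # Baseline: the cost of calling an empty lambda through timeit.
        "noop": mean_call_time_us(lambda: None, number),
        # Conversion to a C-ordered float64 array; returns x itself when it
        # already conforms, so this measures checking overhead, not copying.
        "asarray": mean_call_time_us(
            lambda: np.asarray(x, dtype=np.float64, order="C"), number),
        # Element-wise finiteness check over every value of the row.
        "isfinite_all": mean_call_time_us(lambda: np.isfinite(x).all(), number),
    }


# Synthetic single sample: 1 row, 13 features (the Boston data has 13).
x = np.random.RandomState(0).rand(1, 13)
for step, us in profile_single_sample_validation(x).items():
    print("%-13s %.3f us per call" % (step, us))
```

Subtracting the `noop` baseline gives a rough idea of how much each step costs on its own, regardless of the size of the model behind it.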

The main reason is that `sklearn.utils.validation.array2d` calls `scipy.sparse.issparse` twice. This could be fixed, but the overhead of checking that all array values are finite would still be considerable.
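One way to make the finiteness check cheaper without dropping it is to test a single reduction first: any NaN or infinity makes the sum NaN or infinite, so a finite sum proves every entry is finite, and the element-wise check is needed only when the sum overflows. This is a minimal sketch assuming dense NumPy input only (sparse handling, e.g. a single up-front `scipy.sparse.issparse` call, is left out); `assert_all_finite_fast` and `validate_2d` are hypothetical helpers, not scikit-learn's API.

```python
import numpy as np


def assert_all_finite_fast(X):
    """Raise ValueError if X contains NaN or +/-inf (hypothetical helper).

    Fast path: one reduction instead of a temporary boolean array. Any NaN
    or infinity makes the sum NaN or infinite, so a finite sum proves that
    every entry is finite. Large finite values can also overflow the sum to
    inf; only in that case do we fall back to the element-wise check.
    """
    with np.errstate(over="ignore", invalid="ignore"):
        total = X.sum()
    if np.isfinite(total):
        return
    if not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")


def validate_2d(X, dtype=np.float64, check_finite=True):
    """Lean dense-only 2-D input check (hypothetical, not scikit-learn's API)."""
    # np.asarray returns X itself when it already has the requested dtype
    # and memory layout, so well-formed input is not copied.
    X = np.asarray(X, dtype=dtype, order="C")
    if X.ndim == 1:
        X = X.reshape(1, -1)  # treat a 1-D input as a single sample
    elif X.ndim != 2:
        raise ValueError("Expected 1-D or 2-D input, got %d-D" % X.ndim)
    if check_finite:
        assert_all_finite_fast(X)
    return X
```

For example, `validate_2d([1.0, 2.0, 3.0])` returns a C-ordered float64 array of shape `(1, 3)`, while an already conforming 2-D float64 array is passed through without a copy.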

We should either optimize input validation or provide a means to turn it off.
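The second option could look like an opt-out keyword on the prediction method. This is a sketch on a toy linear model; the class name `LinearScorer` and the keyword `check_input` are hypothetical and chosen only for illustration. Callers that already hold validated float64, C-ordered rows pass `check_input=False` and skip validation entirely.

```python
import numpy as np


class LinearScorer:
    """Toy stand-in for a fitted model (hypothetical, for illustration)."""

    def __init__(self, coef, intercept=0.0):
        self.coef_ = np.asarray(coef, dtype=np.float64)
        self.intercept_ = float(intercept)

    def _validate(self, X):
        # Full validation: dtype/layout conversion, shape and finiteness checks.
        X = np.asarray(X, dtype=np.float64, order="C")
        if X.ndim == 1:
            X = X.reshape(1, -1)
        if X.ndim != 2 or X.shape[1] != self.coef_.shape[0]:
            raise ValueError("X has the wrong shape")
        if not np.isfinite(X).all():
            raise ValueError("Input contains NaN or infinity.")
        return X

    def decision_function(self, X, check_input=True):
        # With check_input=False the caller guarantees X is already a valid
        # 2-D float64 array; no conversion or checking is done here.
        if check_input:
            X = self._validate(X)
        return X @ self.coef_ + self.intercept_
```

The trade-off is that with `check_input=False`, bad input such as NaN values is no longer caught and can silently produce NaN scores.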

Labels: Easy, Enhancement