Closed
Labels
Easy, Enhancement
Description
Test-time performance on single samples is important in real-world applications. Currently, prediction time for an individual sample is often dominated by input validation rather than model evaluation. Consider the following line profile of GradientBoostingRegressor.decision_function for a model trained on the Boston housing dataset with 250 trees::

    Line #  Hits  Time  Per Hit  % Time  Line Contents
       645     1   151    151.0    53.7  X = array2d(X, dtype=DTYPE, order='C')
       646     1    49     49.0    17.4  score = self._init_decision_function(X)
       647     1    78     78.0    27.8  predict_stages(self.estimators_, X, self.learning_rate, score)
       648     1     3      3.0     1.1  return score

The main cause is that sklearn.utils.validation.array2d calls scipy.sparse.issparse twice. This could be fixed, but the overhead of checking that all array values are finite would still be considerable.
We should either optimize input validation or provide a means to turn it off.
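As a rough illustration of both options, here is a minimal sketch of a validation helper with an opt-out flag and a cheaper finiteness test. The function name `as_2d_float` and the `check_finite` parameter are hypothetical and are not part of scikit-learn. The sketch assumes dense input only, so no sparse check is needed.

```python
import numpy as np


def as_2d_float(X, dtype=np.float32, check_finite=True):
    """Hypothetical helper: return X as a C-contiguous 2-D array.

    Passing check_finite=False skips the scan for NaN/inf entirely,
    which is the "turn it off" option for trusted inputs.
    """
    X = np.asarray(X, dtype=dtype, order="C")
    if X.ndim == 1:
        # Treat a flat vector as a single sample.
        X = X.reshape(1, -1)
    elif X.ndim != 2:
        raise ValueError("expected a 1-D or 2-D array, got ndim=%d" % X.ndim)

    if check_finite:
        # Cheap first pass: any NaN or inf entry makes the sum non-finite,
        # and summing avoids allocating a boolean mask of X's shape.
        with np.errstate(over="ignore", invalid="ignore"):
            total = X.sum()
        # The sum can also overflow for finite inputs, so confirm with the
        # exact element-wise test before raising.
        if not np.isfinite(total) and not np.isfinite(X).all():
            raise ValueError("input contains NaN or infinity")
    return X


# Single-sample prediction path with validation disabled.
sample = as_2d_float([0.1, 0.2, 0.3], check_finite=False)
```

The flag keeps the safe behaviour as the default, while callers that already trust their data can skip the scan on the single-sample path.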