Description
A fair number of estimators currently have copy=True (or copy_X=True) by default. In practice, this means that the code looks something like
X = check_array(X, copy=copy)
followed by other calculations that may or may not modify X in place. When those later operations are not done in place, we have made a wasteful copy for no good reason.
As discussed in #13923, one example is Ridge(fit_intercept=False), which copies X although it is not needed. At first I couldn't find any in-place operations on X in Ridge even with fit_intercept=True, but I have since found one.
I think in general it would be better to avoid the
X = check_array(X, copy=copy)
pattern, and instead make a copy explicitly where it is needed. It might be OK to skip the copy even with copy=True when no copy is needed. Alternatively, we could introduce copy=None as the default.
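To make the "copy only where needed" idea concrete, here is a minimal sketch using NumPy only. The helper name center_if_needed and its body are hypothetical and are not taken from scikit-learn; the in-place centering is only a stand-in for whatever in-place step an estimator might perform when fit_intercept=True.

```python
import numpy as np


def center_if_needed(X, fit_intercept=True, copy=True):
    """Hypothetical helper: copy X only on the code path that writes to it."""
    # np.asarray does not copy when X is already a float64 ndarray.
    X = np.asarray(X, dtype=float)
    if not fit_intercept:
        # Nothing is written to X on this path, so no copy is made,
        # even though copy=True was requested.
        return X
    if copy:
        # Explicit copy placed right before the in-place operation.
        X = X.copy()
    X -= X.mean(axis=0)  # in-place centering
    return X


X = np.array([[1.0, 2.0], [3.0, 4.0]])
no_copy = center_if_needed(X, fit_intercept=False, copy=True)
centered = center_if_needed(X, fit_intercept=True, copy=True)
```

With this layout, the fit_intercept=False path returns an array that shares memory with the input, while the fit_intercept=True path still leaves the caller's X untouched when copy=True.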
We could also add a common test that checks that Estimator(copy=True).fit(X, y) doesn't change X.
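A rough sketch of what such a check could look like is given below. The function check_fit_does_not_modify_X and the ToyCenteringEstimator class are hypothetical illustrations written for this example, not existing scikit-learn code; the toy estimator only mimics an estimator that modifies X in place unless it copies first.

```python
import numpy as np


def check_fit_does_not_modify_X(estimator, X, y):
    """Hypothetical common check: fit must leave the caller's X unchanged."""
    X_before = X.copy()
    estimator.fit(X, y)
    np.testing.assert_array_equal(X, X_before)


class ToyCenteringEstimator:
    """Stand-in estimator (not a scikit-learn class) that centers X in fit."""

    def __init__(self, copy=True):
        self.copy = copy

    def fit(self, X, y):
        if self.copy:
            X = X.copy()  # protect the caller's array
        X -= X.mean(axis=0)  # in-place operation on X
        self.coef_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)[0]
        return self


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

# Passes silently: with copy=True the caller's X is left untouched.
check_fit_does_not_modify_X(ToyCenteringEstimator(copy=True), X, y)
```

The same check could in principle be pointed at a real estimator, for example Ridge(copy_X=True). Passing ToyCenteringEstimator(copy=False) instead makes the check raise an AssertionError, which is the failure mode the common test is meant to catch.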