You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I use GridSearchCV to optimize the hyperparameters of a pipeline. I set the param grid by inputing transformers or estimators at different steps of the pipeline, following the Pipeline documentation:
A step’s estimator may be replaced entirely by setting the parameter with its name to another estimator, or a transformer removed by setting to None.
I couldn't figure why dumping cv_results_ would take so much memory on disk. It happens that cv_results_['params'] and all cv_results_['param_*'] objects contains fitted estimators, as much as there are points on my grid.
This bug should happen only when n_jobs = 1 (which is my usecase).
I don't think this is intended (else the arguments and attributes refit and best_estimator wouldn't be used).
My guess is that during the grid search, those estimator's aren't cloned before use (which could be a problem if using the same grid search several times, because estimators passed in the next grid would be fitted...).