Description
The underscore notation for specifying grid search parameters is unwieldy, because adding a layer of indirection in the model (e.g. a Pipeline wrapping an estimator you want to search parameters on) means prefixing all corresponding parameters.
We should be able to specify parameter searches using the estimator instances. The interface proposed by @amueller at #4949 (comment) (and elsewhere) suggests a syntax like:
char_vec = CountVectorizer(analyzer="char").search_params(ngram_range=[(3, 3), (3, 5), (5, 5)])
word_vec = CountVectorizer().search_params(ngram_range=[(1, 1), (1, 2), (2, 2)])
svc = LinearSVC().search_params(C=[0.001, 0.1, 10, 100])
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc), cv=..., scoring=...).fit(X, y)

Calling search_params would presumably set an instance attribute on the estimator to record the search information.
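To make the "instance attribute" idea concrete, here is a minimal sketch written as free functions rather than as an estimator method. Everything named here is an assumption for illustration: the `_search_params` attribute, the `search_params` function and the `collect_grid` helper are not part of scikit-learn. The sketch uses `ngram_range` (the actual `CountVectorizer` parameter name) and the existing `make_union` helper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC


def search_params(estimator, **spaces):
    """Record candidate values on the instance and return it (hypothetical)."""
    estimator._search_params = dict(spaces)  # assumed attribute name
    return estimator


def collect_grid(root):
    """Gather the recorded spaces of root and its sub-estimators
    into a grid keyed in the existing underscore notation."""
    grid = dict(getattr(root, "_search_params", {}))
    for path, value in root.get_params(deep=True).items():
        # Non-estimator values simply have no recorded spaces.
        for name, values in getattr(value, "_search_params", {}).items():
            grid[f"{path}__{name}"] = values
    return grid


char_vec = search_params(CountVectorizer(analyzer="char"),
                         ngram_range=[(3, 3), (3, 5), (5, 5)])
word_vec = search_params(CountVectorizer(),
                         ngram_range=[(1, 1), (1, 2), (2, 2)])
svc = search_params(LinearSVC(), C=[0.001, 0.1, 10, 100])

grid = collect_grid(make_pipeline(make_union(char_vec, word_vec), svc))
```

In this sketch the search object would call something like `collect_grid` on the estimator it receives, so the user never writes the prefixed names by hand.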
Questions of fine semantics that need to be clarified for this approach include:
- does a call to search_params overwrite all previous settings for that estimator?
- does clone maintain the prior search_params?
- should this affect the search space of specialised CV objects (e.g. LassoCV)?
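The clone question matters because the search objects clone the estimator before fitting. As a small illustration of the current behaviour, assuming the search information were stored in a plain instance attribute (the name `_search_params` is hypothetical), `clone` rebuilds the estimator from its constructor parameters, so the attribute does not carry over:

```python
from sklearn.base import clone
from sklearn.svm import LinearSVC

svc = LinearSVC()
# Hypothetical attribute standing in for whatever search_params would record.
svc._search_params = {"C": [0.001, 0.1, 10, 100]}

cloned = clone(svc)
# clone() builds a new instance from get_params(), so an attribute that is
# not a constructor parameter is absent on the copy.
survives_clone = hasattr(cloned, "_search_params")
```

Under this assumption, either `clone` would need to copy the attribute explicitly, or the search space would have to be read before any cloning takes place.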
Questions of functionality include:
a) is RandomizedSearchCV supported by merely making one of the search spaces a scipy.stats rv, making some searches GridSearchCV-incompatible?
b) is there any way to support multiple grids, as is currently allowed in GridSearchCV?
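For reference, both features mentioned in (a) and (b) are expressed today through the structure of the search space itself. The sketch below uses the existing `ParameterGrid` and `ParameterSampler` helpers with made-up parameter names; it only illustrates what a per-estimator `search_params` would need to reproduce.

```python
from scipy.stats import loguniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# (b) Multiple grids: a list of dicts, each expanded independently.
grids = ParameterGrid([
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10], "gamma": [0.1, 1.0]},
])
n_candidates = len(grids)  # 2 combinations from the first dict + 4 from the second

# (a) Randomized search: a scipy.stats distribution in place of a list.
samples = list(
    ParameterSampler({"C": loguniform(1e-3, 1e2)}, n_iter=5, random_state=0)
)
```

A list of dicts has no obvious counterpart when each estimator carries its own search space separately, which is the difficulty raised in (b).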
I have proposed an alternative syntax that still avoids problems with underscore notation, and does not have the above issues, but is less user-friendly than the syntax above:
char_vec = CountVectorizer(analyzer="char")
word_vec = CountVectorizer()
svc = LinearSVC()
param_grid = {(char_vec, 'ngram_range'): [(3, 3), (3, 5), (5, 5)],
              (word_vec, 'ngram_range'): [(1, 1), (1, 2), (2, 2)],
              (svc, 'C'): [0.001, 0.1, 10, 100]}
GridSearchCV(make_pipeline(make_union(char_vec, word_vec), svc),
             param_grid,
             cv=..., scoring=...).fit(X, y)

Here, each parameter is identified by an (estimator, parameter name) pair, and the search space is built directly as a grid and passed to GridSearchCV/RandomizedSearchCV.
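One way such a grid could be consumed, sketched here under assumptions, is to translate each (estimator, parameter name) key into the existing underscore notation by locating the instance inside `get_params(deep=True)`. The helper `resolve_param_grid` is hypothetical and not part of scikit-learn; the sketch uses `ngram_range` (the actual `CountVectorizer` parameter name) and the existing `make_union` helper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import LinearSVC


def resolve_param_grid(root, grid):
    """Translate {(estimator, name): values} into underscore notation.

    Hypothetical helper for illustration only.
    """
    deep_params = root.get_params(deep=True)
    resolved = {}
    for (est, name), values in grid.items():
        if est is root:
            resolved[name] = values
            continue
        # Find the nested-parameter path whose value is this exact instance.
        paths = [path for path, value in deep_params.items() if value is est]
        if not paths:
            raise ValueError(f"{est!r} is not part of the searched model")
        # If the instance appears more than once, the first path is used.
        resolved[f"{paths[0]}__{name}"] = values
    return resolved


char_vec = CountVectorizer(analyzer="char")
word_vec = CountVectorizer()
svc = LinearSVC()
pipe = make_pipeline(make_union(char_vec, word_vec), svc)

resolved = resolve_param_grid(pipe, {
    (char_vec, "ngram_range"): [(3, 3), (3, 5), (5, 5)],
    (word_vec, "ngram_range"): [(1, 1), (1, 2), (2, 2)],
    (svc, "C"): [0.001, 0.1, 10, 100],
})
```

The resulting dict can be passed as `param_grid` to the existing `GridSearchCV`. Because the keys are resolved by identity before any cloning, this approach sidesteps the clone question, and a list of such dicts would cover multiple grids.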