-
-
Notifications
You must be signed in to change notification settings - Fork 26.5k
ENH move estimator type to tags #30122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # Test that the best estimator contains the right value for foo_param | ||
| clf = MockClassifier() | ||
| grid_search = GridSearchCV(clf, {"foo_param": [1, 2, 3]}, cv=3, verbose=3) | ||
| grid_search = GridSearchCV(clf, {"foo_param": [1, 2, 3]}, cv=2, verbose=3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the data has only two samples for each class, and in classification case we use stratified cv, which requires cv <= n_samples per class. This wasn't an issue so far cause our MockClassifier wasn't declaring that it's a classifier.
| # The number of samples per class needs to be > n_splits, | ||
| # for StratifiedKFold(n_splits=3) | ||
| y2 = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3]) | ||
| y2 = np.array([1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same issue, now we need 5 samples for each class since devault cv is 5.
sklearn/tests/test_metaestimators.py
Outdated
| ) not in sig: | ||
| continue | ||
|
|
||
| for meta_estimator in _construct_instances(Estimator): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meta estimators' type sometimes depends on their sub-estimator, and we should run the test for all their instances.
adam2392
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to see this getting consolidated!
Co-authored-by: Adam Li <[email protected]>
…into estimator_type
glemaitre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good. One merge, I'll test it in imbalanced-learn. But I think that having the logic in the tag makes things easier sometimes when we want to extend some part.
| ) | ||
|
|
||
|
|
||
| def _get_instance_with_pipeline(meta_estimator, init_params): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm codecov is complaining of not covered line here. I would not expect it since this should be some code that we have before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't complaining anymore, I think.
adrinjalali
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam2392 wanna have another look here? Comments should be resolved now.
| exists = True | ||
| item += para | ||
| lst += item | ||
| if est.__sklearn_tags__().input_tags.allow_nan: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticed that this section wasn't in the scope of the suppress(SkipTest), which it should be
| elif Estimator.__name__ == "FrozenEstimator": | ||
| X, y = make_classification(n_samples=20, n_features=5, random_state=0) | ||
| est = Estimator(LogisticRegression().fit(X, y)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noticed we didn't have this while checking our usages of _construct_instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry few more questions after reading through the code (I wanna do a thorough job :p). These are mostly small comments. Once addressed (assuming those aren't massive changes), then I can approve and merge.
| return Tags( | ||
| estimator_type=None, | ||
| target_tags=TargetTags(required=False), | ||
| transformer_tags=None, | ||
| regressor_tags=None, | ||
| classifier_tags=None, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why aren't these the default_tags anymore? Since it's BaseEstimator, I assumed that would be "default".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cause we need to remove default_tags once we're done with the deprecation period. default_tags depends on detecting the type of estimator from the class, and not the instance, which we're removing in this PR.
| est_is_classifier = getattr(estimator, "_estimator_type", None) == "classifier" | ||
| est_is_regressor = getattr(estimator, "_estimator_type", None) == "regressor" | ||
| target_required = est_is_classifier or est_is_regressor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once _estimator_type is removed, what will we do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll be removing default_tags alltogether.
| return Tags( | ||
| estimator_type=None, | ||
| target_tags=TargetTags(required=False), | ||
| transformer_tags=None, | ||
| regressor_tags=None, | ||
| classifier_tags=None, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cause we need to remove default_tags once we're done with the deprecation period. default_tags depends on detecting the type of estimator from the class, and not the instance, which we're removing in this PR.
| est_is_classifier = getattr(estimator, "_estimator_type", None) == "classifier" | ||
| est_is_regressor = getattr(estimator, "_estimator_type", None) == "regressor" | ||
| target_required = est_is_classifier or est_is_regressor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll be removing default_tags alltogether.
| input_tags: InputTags = field(default_factory=InputTags) | ||
|
|
||
|
|
||
| # TODO(1.8): Remove this function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adam2392 note this comment 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see 😅
adam2392
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. thanks @adrinjalali
|
I open #30227 to solve a bug that we did not catch since we did not build the full documentation. I added an entry in the changelog because things could have gone wrong even before with the wrong ordering of the mixin. |
Closes #16469
Moving
_estimator_tpyeto tags.Right now this doesn't work.