MRG add n_features_out_ attribute #14241
Conversation
Does this look good, and do we want it like this? I also don't like that we can't just overwrite the property :-/
jnothman
left a comment
I also don't like that we can't just overwrite the property :-/
Why not provide a setter??
@n_features_out_.setter
def _(self, val):
self._n_features_out = val
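For illustration, the property/setter pair would look roughly like this on whichever mixin ends up owning the attribute (the mixin name here is made up, this is just a sketch of the pattern being suggested):

class NFeaturesOutMixin:
    """Hypothetical mixin; only illustrates the property-with-setter pattern."""

    @property
    def n_features_out_(self):
        # raises AttributeError until fit() has stored a value, which is the
        # usual behaviour for fitted attributes
        return self._n_features_out

    @n_features_out_.setter
    def n_features_out_(self, value):
        # lets estimators simply write ``self.n_features_out_ = ...`` in fit()
        self._n_features_out = value

(The setter function reuses the property's name so that the decorated result rebinds the same class attribute.)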
return self.fit(X, y, **fit_params).transform(X)

@property
def n_features_out_(self):
I can't say I like this magic determination. I'd rather it be done by specialised mixins for decomposition and feature selection.
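For illustration, such specialised mixins might look roughly like this (a sketch, not the code that ended up in the PR; components_ and get_support() follow existing scikit-learn conventions):

class DecompositionMixin:
    """Sketch: decomposition estimators store components_ with shape
    (n_components, n_features_in)."""

    @property
    def n_features_out_(self):
        return self.components_.shape[0]


class FeatureSelectionMixin:
    """Sketch: feature selectors expose a boolean support mask via
    get_support()."""

    @property
    def n_features_out_(self):
        return int(self.get_support().sum())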
Sure, I could do that.
I have to check how many of these are actually in the decomposition module.
I'm not sure if I prefer having these be mixins or base classes. It seems unlikely you'd want to mix them, and base classes make the code shorter.
Ok so I did everything with mixins. That seems a bit verbose right now, but I'm pretty sure once we add feature names in some way, this will pay off. I'm really not sure why we'd use mixins here instead of base classes, tbh; I would rather use base classes.
@adrinjalali can you please review this? I think having this will be helpful for feature names, and this one is actually one of the easier parts.
sklearn/base.py (outdated diff)
@property
def n_features_out_(self):
    if not hasattr(self, 'transform'):
        raise AttributeError("{} doesn't have n_features_out_"
When I was working on feature_names_out_, I realized you could think of the feature_names_out_ of a classifier xxx as xxx0 to xxx{n-1} for multiclass classification, for instance. That would also make sense in the context of a stacking estimator. What are the feature_names_in_ of a stacking estimator if we're talking about interpreting the model? Should they not be {classifier_{i}_{c} | for i in classifiers and c in classes}, for instance?
Although you could also implement a stacking estimator with a meta-estimator that defines transform and returns the output of each estimator, take the union of those, and then put a model on top of that. Then the problem is solved.
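As a toy illustration of that naming scheme (the helper and the names are made up, not part of the PR):

def stacked_feature_names(classifier_names, classes):
    """Hypothetical helper: one input column per (base classifier, class) pair."""
    return ["{}_{}".format(name, c)
            for name in classifier_names
            for c in classes]


# e.g. two base classifiers and three classes:
print(stacked_feature_names(["lr", "rf"], [0, 1, 2]))
# ['lr_0', 'lr_1', 'lr_2', 'rf_0', 'rf_1', 'rf_2']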
Sorry, I'm not sure what your point is. This is for the ClusterMixin.
And I thought we only wanted to define feature names out for transformers.
Stacking should either be a transformer or a meta-estimator.
Also check what this PR does for VotingClassifier btw. But I'm really not sure I follow.
    n_features = self.n_components
elif hasattr(self, 'components_'):
    n_features = self.components_.shape[0]
return n_features
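Pieced together, the fall-through logic under discussion presumably reads something like this (a reconstruction from the two diff fragments above; the branch order and the final fallback are my assumptions, not the PR's exact code):

class _MagicNFeaturesOutSketch:
    """Reconstruction from the diff fragments above; branch order and the
    final fallback are assumptions."""

    @property
    def n_features_out_(self):
        if not hasattr(self, 'transform'):
            raise AttributeError("{} doesn't have n_features_out_"
                                 .format(type(self).__name__))
        if getattr(self, 'n_components', None) is not None:
            # constructor parameter, set by the user
            n_features = self.n_components
        elif hasattr(self, 'components_'):
            # fitted components are stored as (n_components, n_features_in)
            n_features = self.components_.shape[0]
        else:
            # assumption: nothing to infer from; whether a default belongs
            # here is exactly what the next review comment asks
            n_features = None
        return n_features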
Should there be a "default" value here? Maybe None?
Here? This is for ComponentsMixin. If it doesn't have components it probably shouldn't have the ComponentsMixin.
In general: the route I'm taking here is to require the user to set it to None and error otherwise in the tests. That allowed me to test that it's actually implemented.
What we want to enforce for third-party estimators is a slightly different discussion. This PR currently adds new requirements in check_estimator.
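A common check along those lines could look roughly like this (a sketch only; the function name is made up and this is not the exact check the PR adds to check_estimator):

import numpy as np


def check_n_features_out(name, transformer):
    """Sketch of a common check: the attribute must exist, and when it is not
    None it must match the width of transform's output."""
    X = np.random.RandomState(0).normal(size=(30, 5))
    transformer.fit(X)
    # the attribute must always be defined, even if only set to None
    assert hasattr(transformer, 'n_features_out_'), name
    if transformer.n_features_out_ is not None:
        Xt = transformer.transform(X)
        # when it is set, it has to match what transform actually returns
        assert Xt.shape[1] == transformer.n_features_out_, name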
Yes, I'm happy with this now :)
…On Thu., Aug. 15, 2019, 19:08 Andreas Mueller wrote:
So is that approval from you @adrinjalali? ;)
@jnothman thoughts?
@amueller merge conflicts in the meantime.
jnothman
left a comment
@amueller, this is looking pretty good, but you wrote that "This PR implements it for all estimators but FunctionTransformer, which sets it explicitly to None."... but it doesn't include vectorizers.
It also doesn't add n_features_out_ for Pipeline, FeatureUnion, GridSearchCV, etc., as far as I can see.
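For illustration, a meta-estimator such as Pipeline would presumably just delegate to its last fitted step, roughly like this (a sketch, not code from this PR; the subclass name is made up):

from sklearn.pipeline import Pipeline


class PipelineWithNFeaturesOut(Pipeline):
    """Hypothetical subclass showing how Pipeline could expose the attribute."""

    @property
    def n_features_out_(self):
        # a pipeline's output width is whatever its final step produces
        return self.steps[-1][1].n_features_out_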
class ComponentsMixin:
    @property
    def n_features_out_(self):
        if hasattr(self, 'n_components_'):
Should we consider deprecating n_components_, given the availability of n_features_out_??
I would consider it ;)
Updated to reflect changes from #14812.
Correct, because they are not covered in common tests :(
Should I add this to meta-estimators in this PR as well?
I think if we are certain we want the attribute on meta-estimators, it makes sense to do them here.
I think even if those estimators are tested, they're not tested as transformers... and GridSearchCV would very rarely be used as a transformer.
def n_features_out_(self):
    if hasattr(self, 'n_components_'):
        # n_components could be auto or None
        # this is more likely to be an int
We can also include the isinstance(..., numbers.Integral) check here to be sure.
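Concretely, the suggested check could be factored into a small helper like this (the helper name is made up; just a sketch of the isinstance(..., numbers.Integral) idea):

import numbers


def _as_int_or_none(n_components):
    """Hypothetical helper: only trust n_components when it is an actual
    integer (it may also be None or 'auto')."""
    if isinstance(n_components, numbers.Integral):
        return int(n_components)
    return None


print(_as_int_or_none(5))       # 5
print(_as_int_or_none('auto'))  # None
print(_as_int_or_none(None))    # None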
I guess since we're doing a SLEP for …
@adrinjalali we could. Or we could do one for both? The issues are basically the same. The API is pretty easy and obvious; I think the only questions are about backward compatibility of …
Still needs Pipeline (and XSearchCV? And CountVectorizer?).
Closing as it might be less useful since we have get_feature_names_out now.
This should be easier once we have an n_features_in_ attribute (#13603), and use a OneToOneMixin or something like that.
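A OneToOneMixin along those lines could be as simple as this (a sketch; it assumes an n_features_in_ attribute as proposed in #13603):

class OneToOneMixin:
    """Sketch: transformers that output exactly one column per input column."""

    @property
    def n_features_out_(self):
        # assumes n_features_in_ gets set during fit, as proposed in #13603
        return self.n_features_in_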