
Add get_feature_dependence to transformers #5

Merged
amueller merged 4 commits into master from dependence
Dec 27, 2018

@jnothman
Member

This is, I suppose, a WIP. But I'd like hints for what else needs to be done :)

@jnothman
Member Author

Ping @GaelVaroquaux @amueller @kmike

@amueller
Member

amueller commented Feb 1, 2017

added this to my todo priority queue

@amueller
Member

amueller commented Feb 1, 2017

a total skim tells me that it doesn't say how the feature contributes. I feel like for polynomial features, for example, we could do better. I guess that would be part of describe_features, so I'd like to include that in the SEP.

Can we create some use-cases? I think my main use-case is labeling coefficients of a classifier at the end of a pipeline (or feature importances). get_feature_dependence does not solve that problem.

You gave implementation examples, but no use-case examples.
What would it look like to compress a dataset from a pipeline?
Say we have make_pipeline(SomeFeatureSelection(), LinearSVC(penalty="l1")).

Do only transformers have get_feature_dependence?

So main comment: write code that uses this. I think one use case is getting human-readable string names; the other is knowing which features actually influence the output. If you have more, feel free to add them.
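
As a minimal sketch of the first use case (labeling the coefficients of a final linear model), assume the input feature names are known, a selection step exposes a boolean support mask, and the classifier has one coefficient per selected feature. All names and values here are made up for illustration, and `label_coefficients` is a hypothetical helper in plain Python, not the API proposed in this PR.

```python
# Hypothetical data: names of the features entering the pipeline.
input_names = ["age", "height", "weight", "income"]

# Hypothetical boolean mask from a feature-selection step (True = kept).
support = [True, False, True, True]

# Hypothetical coefficients of a linear model fitted on the kept features.
coef = [0.8, 0.0, -1.5]


def label_coefficients(names, mask, coefficients):
    """Pair each coefficient with the name of the input feature it weights."""
    kept = [name for name, keep in zip(names, mask) if keep]
    if len(kept) != len(coefficients):
        raise ValueError("one coefficient is expected per selected feature")
    return dict(zip(kept, coefficients))


labeled = label_coefficients(input_names, support, coef)
print(labeled)
```

This only works when every step either passes features through or selects among them; a step that combines features needs something richer than a mask, which is where the second use case comes in.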

@jnothman
Member Author

jnothman commented Feb 2, 2017

> a total skim tells me that it doesn't say how the feature contributes.

No, it only says that the feature contributes. Saying how the feature contributes is obviously a lot more complicated when non-linear. Where the input is just an array of features, you can maybe assess contribution by throwing random data at it, but getting an explicit mapping between input and output features for some transformer(s) seems to be a straightforward, consistent way to inspect this roughly.
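
One way to picture such a mapping, as a sketch only: a boolean matrix with one row per output feature and one column per input feature, where True means the output depends on the input. The helper below builds this for a degree-2 polynomial expansion without a bias term. The name `polynomial_dependence` and the return type are assumptions for illustration; what the PR actually specifies may differ.

```python
from itertools import combinations_with_replacement


def polynomial_dependence(n_inputs, degree=2):
    """Return (terms, matrix) for a polynomial expansion without a bias term.

    terms[i] is the tuple of input indices multiplied together in output i.
    matrix[i][j] is True when output i depends on input j.
    """
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(n_inputs), d))
    matrix = [[j in term for j in range(n_inputs)] for term in terms]
    return terms, matrix


terms, dependence = polynomial_dependence(2)
for term, row in zip(terms, dependence):
    print(term, row)
```

Note that the rows for `(0,)` and `(0, 0)` are identical: the matrix records that x0 squared depends on x0, not that it is a square, which is the "that, not how" distinction.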

Given my realisation that a SelectFeaturesByName meta-transformer is only going to work with feature names being passed alongside data (rather than through a separate transform_feature_names function) I am less certain that being able to get feature descriptions only for selected features is necessary. Nonetheless, I will endeavour to add some usage examples.

@amueller
Member

amueller commented Feb 2, 2017

> Given my realisation that a SelectFeaturesByName meta-transformer is only going to work with feature names being passed alongside data (rather than through a separate transform_feature_names function)

I'm not sure I follow, but that might be because I didn't read my last 2000 github notifications.

@jnothman
Member Author

jnothman commented Feb 2, 2017

Haha :) the relevant comment is scikit-learn/scikit-learn#6425 (comment), but don't rush

I think I will draft an example performing model compression with make_pipeline(CountVectorizer(), ..., LogisticRegression(penalty='l1')), where the middle steps could include any of SelectKBest, SparsePCA, PolynomialFeatures. The first step could equally be DictVectorizer or a union of CountVectorizers, meaning that we can eliminate entire feature extraction processes by this method.
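
A rough sketch of how that compression could work, assuming each step reports a boolean dependence matrix (rows are outputs, columns are inputs) and the final L1-penalised model has exact zeros for unused features. The matrices and coefficients below are invented, and `compose` and `required_inputs` are hypothetical helpers, not functions from this PR or from scikit-learn.

```python
def compose(later, earlier):
    """Boolean matrix product: dependence of `later` outputs on `earlier` inputs."""
    n_in = len(earlier[0])
    return [
        [any(row[k] and earlier[k][j] for k in range(len(earlier))) for j in range(n_in)]
        for row in later
    ]


def required_inputs(steps, coef):
    """Indices of original inputs that influence a nonzero coefficient."""
    total = steps[0]
    for step in steps[1:]:
        total = compose(step, total)
    needed = set()
    for row, weight in zip(total, coef):
        if weight != 0:
            needed.update(j for j, dep in enumerate(row) if dep)
    return sorted(needed)


# Step 1 (made up): a selector keeps inputs 0, 2 and 3 out of 4.
select = [
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, True],
]
# Step 2 (made up): pairwise products f0*f1, f0*f2, f1*f2 of the kept features.
pairs = [
    [True, True, False],
    [True, False, True],
    [False, True, True],
]
# Made-up sparse coefficients of the final linear model.
coef = [0.0, 1.2, 0.0]

needed = required_inputs([select, pairs], coef)
print(needed)
```

Any input whose index is not returned could be dropped before the pipeline runs. With a union of vectorizers as the first step, an extractor none of whose columns are needed could be skipped entirely.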

@amueller
Copy link
Copy Markdown
Member

amueller commented Feb 2, 2017

I need to think through your use-case but I think we should be able to support this. [And I might know next week if I get a 2yr grant to work on pandas integration and feature names ;)]

@amueller
Copy link
Copy Markdown
Member

amueller commented Feb 2, 2017

Also, it's 3134 notifications and it makes me sad :-/

@jnothman
Member Author

jnothman commented Feb 2, 2017 via email

@amueller
Member

amueller commented Feb 2, 2017

I'm good but thanks :)


2 participants