Add get_feature_dependence to transformers by jnothman · Pull Request #5 · scikit-learn/enhancement_proposals

jnothman · 2017-01-31T01:50:15Z

This is, I suppose, a WIP. But I'd like hints for what else needs to be done :)

jnothman · 2017-01-31T01:51:56Z

amueller · 2017-02-01T23:05:27Z

added this to my todo priority queue

amueller · 2017-02-01T23:47:57Z

a total skim tells me that it doesn't say how the feature contributes. I feel like for polynomial features for example we could do better. I guess that would be part of describe_features, so I'd like to include that in the SEP.

Can we create some use-cases? I think my main use-case is labeling coefficients of a classifier at the end of a pipeline (or feature importances). get_feature_dependence does not solve that problem.

You gave implementation examples, but no use-case examples.
What would it look like to compress a dataset from a pipeline?
Say we have make_pipeline(SomeFeatureSelection(), LinearSVC(penalty="l1")).

Do only transformers have get_feature_dependence?

So, main comment: write code that uses this. I think one use case is getting human-readable string names; the other is knowing which features actually influence the output. If you have more, feel free to add them.
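
To make the first use case above concrete, here is a minimal sketch of labelling the coefficients of a final linear model with human-readable names after a feature-selection step. Everything in it is an assumption for illustration: `select_names` and `label_coefficients` are hypothetical helpers, not scikit-learn API, and the mask and weights are made-up values rather than output from a fitted pipeline.

```python
# Hypothetical sketch: these helpers are NOT part of scikit-learn.

def select_names(names, support):
    """Keep the names whose mask entry is True, as a feature selector would."""
    return [name for name, keep in zip(names, support) if keep]


def label_coefficients(names, coefs):
    """Pair each final feature name with its weight, largest magnitude first."""
    pairs = list(zip(names, coefs))
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


input_names = ["age", "height", "weight", "income"]
support = [True, False, True, True]   # assumed selector mask (made-up values)
coefs = [0.0, -1.5, 0.25]             # assumed L1-penalised weights (made-up values)

selected = select_names(input_names, support)
labelled = label_coefficients(selected, coefs)

for name, weight in labelled:
    print(f"{name}: {weight}")
```

The names have to be carried through every step of the pipeline for this to work, which is the part that a dependence mask alone would not provide.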

jnothman · 2017-02-02T00:22:15Z

> a total skim tells me that it doesn't say how the feature contributes.

No, it only says that the feature contributes. Saying how the feature contributes is obviously a lot more complicated when non-linear. Where the input is just an array of features, you can maybe assess contribution by throwing random data at it, but getting an explicit mapping between input and output features for some transformer(s) seems to be a straightforward, consistent way to inspect this roughly.

Given my realisation that a SelectFeaturesByName meta-transformer is only going to work with feature names being passed alongside data (rather than through a separate transform_feature_names function) I am less certain that being able to get feature descriptions only for selected features is necessary. Nonetheless, I will endeavour to add some usage examples.
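
One way to picture the "explicit mapping between input and output features" is as a boolean matrix with one row per output feature and one column per input feature, composed across steps by a boolean matrix product. The sketch below assumes that representation; the proposal may define the return value differently. The function names and the toy pairwise-product expansion are hypothetical, not scikit-learn API.

```python
# Hypothetical sketch: dependence[o][i] is True if output o depends on input i.

def selector_dependence(support):
    """One row per kept feature; each output depends on exactly one input."""
    n_in = len(support)
    return [[j == i for j in range(n_in)] for i, keep in enumerate(support) if keep]


def pairwise_product_dependence(n_in):
    """Toy expansion: outputs are every x_i, then every x_i * x_j with i < j."""
    rows = [[j == i for j in range(n_in)] for i in range(n_in)]
    for i in range(n_in):
        for j in range(i + 1, n_in):
            rows.append([k in (i, j) for k in range(n_in)])
    return rows


def compose(later, earlier):
    """Boolean matrix product: an output depends on input i if it depends on
    some intermediate feature m that itself depends on i."""
    n_in = len(earlier[0])
    n_mid = len(earlier)
    return [
        [any(row[m] and earlier[m][i] for m in range(n_mid)) for i in range(n_in)]
        for row in later
    ]


first = selector_dependence([True, False, True, True])   # 4 inputs -> 3 outputs
second = pairwise_product_dependence(3)                   # 3 inputs -> 6 outputs
overall = compose(second, first)                          # 6 outputs x 4 inputs
```

As in the comment above, this records only that a feature contributes, not how: the row for a product term marks both of its factors and nothing more.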

amueller · 2017-02-02T00:24:54Z

> Given my realisation that a SelectFeaturesByName meta-transformer is only going to work with feature names being passed alongside data (rather than through a separate transform_feature_names function)

I'm not sure I follow, but that might be because I didn't read my last 2000 github notifications.

jnothman · 2017-02-02T00:33:00Z

Haha :) the relevant comment is scikit-learn/scikit-learn#6425 (comment), but don't rush

I think I will draft an example performing model compression with make_pipeline(CountVectorizer(), ..., LogisticRegression(penalty='l1')) where the middle steps could include any of SelectKBest, SparsePCA, PolynomialFeatures. The first step could equally be DictVectorizer or a union of CountVectorizers, meaning that we can eliminate entire feature extraction processes by this method.
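
A rough sketch of the compression idea described above: start from the non-zero coefficients of the L1-penalised model and walk the per-step dependence information backwards to find which vocabulary entries are still needed. It assumes each step can report a boolean outputs-by-inputs dependence matrix. The matrices, the weights, and the helpers `needed_inputs` and `trace_back` are all hypothetical and hand-written here, not produced by scikit-learn.

```python
# Hypothetical sketch: dependence[o][i] is True if output o depends on input i.

def needed_inputs(dependence, needed_outputs):
    """Indices of inputs that at least one needed output depends on."""
    n_in = len(dependence[0])
    return [i for i in range(n_in) if any(dependence[o][i] for o in needed_outputs)]


def trace_back(step_dependences, coefs):
    """Walk from the final model's non-zero weights back to the raw inputs."""
    needed = [o for o, weight in enumerate(coefs) if weight != 0]
    for dependence in reversed(step_dependences):
        needed = needed_inputs(dependence, needed)
    return needed


vocabulary = ["cat", "dog", "fish", "bird"]

# Step 1 (assumed): a selector keeping inputs 0, 2 and 3.
selection = [
    [True, False, False, False],
    [False, False, True, False],
    [False, False, False, True],
]
# Step 2 (assumed): passes the three features through and adds f0 * f1.
expansion = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
    [True, True, False],
]
coefs = [0.0, 0.0, 0.0, -0.3]   # made-up sparse weights; only the product term survives

kept = trace_back([selection, expansion], coefs)
kept_vocabulary = [vocabulary[i] for i in kept]
```

Here only the product term has a non-zero weight, so the traced-back vocabulary shrinks to the two words that feed it. With a union of extractors as the first step, an extractor none of whose outputs survive could be dropped in the same way.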

amueller · 2017-02-02T00:36:08Z

I need to think through your use-case but I think we should be able to support this. [And I might know next week if I get a 2yr grant to work on pandas integration and feature names ;)]

amueller · 2017-02-02T00:36:42Z

Also, it's 3134 notifications and it makes me sad :-/

jnothman · 2017-02-02T00:37:32Z

Need a hug?


amueller · 2017-02-02T00:40:03Z

I'm good but thanks :)

Proposal for get_feature_dependence

c508dfd

jnothman force-pushed the dependence branch from a6c0970 to c508dfd on January 31, 2017 01:50

jnothman mentioned this pull request Jan 31, 2017

Support explain_weights(pipeline) TeamHG-Memex/eli5#158

Closed

jnothman added 2 commits January 31, 2017 16:27

Comment on 1d output

341cd12

Comment on examples

708d078

add comment on number of input features

ee5adf1

amueller mentioned this pull request Mar 5, 2017

Can't provide feature indices for OneHotEncoder in pipeline scikit-learn/scikit-learn#8539

Closed

amueller pushed a commit to amueller/enhancement_proposals that referenced this pull request Dec 9, 2018

Fixes scikit-learn#5. Add code syntax highlighting example

6fe8c2a

amueller merged commit ee5adf1 into master Dec 27, 2018

amueller merged 4 commits into master from dependence