[MRG+2] add get_feature_names to PolynomialFeatures by amueller · Pull Request #6372 · scikit-learn/scikit-learn

amueller · 2016-02-16T21:03:41Z

Fixes #6185, replaces #6216

amueller · 2016-02-16T21:14:43Z

jnothman · 2016-02-16T21:26:59Z

sklearn/preprocessing/data.py

        return np.vstack(np.bincount(c, minlength=self.n_input_features_)
                         for c in combinations)

+    def get_feature_names(self, input_features=None):


I've been suggesting for a while that this is an appropriate way to deal with feature names in extractor/transformer pipelines. I'm happy to see this.

+1. Very nice.

I've been wanting to do it but didn't have time. I'm just writing the book chapter about preprocessing and I'm embarrassed not to have that feature ;)

Hey, who knew writing books would be good for something?
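The context lines in this hunk build `powers_` by running `np.bincount(c, minlength=self.n_input_features_)` over each combination of input-feature indices, so every output column gets one exponent vector. The sketch below mirrors that idea in plain Python with `itertools.combinations_with_replacement`. The function name `polynomial_powers` is hypothetical, and `interaction_only` and the NumPy stacking are left out.

```python
from itertools import chain, combinations_with_replacement


def polynomial_powers(n_features, degree=2, include_bias=True):
    """Sketch of powers_: one exponent vector per output feature (hypothetical helper)."""
    start = 0 if include_bias else 1
    # Each combination is a multiset of feature indices, e.g. (0, 0, 1) -> x0^2 x1.
    combos = chain.from_iterable(
        combinations_with_replacement(range(n_features), d)
        for d in range(start, degree + 1)
    )
    powers = []
    for c in combos:
        # Equivalent of bincount(c, minlength=n_features): count each index.
        counts = [0] * n_features
        for idx in c:
            counts[idx] += 1
        powers.append(counts)
    return powers


print(polynomial_powers(2, degree=2))
# [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
```

The first row, all zeros, comes from the empty degree-0 combination and corresponds to the bias column.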

jnothman · 2016-02-16T21:28:07Z

LGTM.

jnothman · 2016-02-16T21:31:37Z

I qualify that LGTM. I think we need to decide whether x_1, x_2, ... is the appropriate naming scheme. If we are to go with get_feature_names transforming its input feature name list, the same convention will apply in, e.g. feature selectors and feature agglomeration.
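If `get_feature_names` maps a list of input names to a list of output names, a feature selector would follow the same convention by filtering that list. A minimal sketch, assuming a boolean support mask with one flag per input column; `selector_feature_names` is a hypothetical helper, not scikit-learn API.

```python
def selector_feature_names(support_mask, input_features=None):
    """Hypothetical selector-style get_feature_names: keep names where the mask is True."""
    if input_features is None:
        # Same default scheme as the polynomial case: x0, x1, ...
        input_features = ["x%d" % i for i in range(len(support_mask))]
    if len(input_features) != len(support_mask):
        raise ValueError("input_features must have one name per input column")
    return [name for name, keep in zip(input_features, support_mask) if keep]


print(selector_feature_names([True, False, True]))               # ['x0', 'x2']
print(selector_feature_names([False, True], ["age", "income"]))  # ['income']
```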

A further small issue: should these be unicodes in Python 2? We should at least test that when unicode input names are passed, an error is not triggered.
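Given exponent vectors like the rows of `powers_`, output names can be built by joining the input names that have a nonzero exponent. This is a hedged pure-Python sketch, not the merged implementation; `feature_names` and the hard-coded `powers` are illustrative. It follows the conventions the PR eventually adopted: `x%d` defaults, `"1"` for the bias column, and no `^1` suffix (commit "don't do ^1"). In Python 3 every `str` is Unicode, so non-ASCII names and the literal `"1"` need no special handling.

```python
def feature_names(powers, input_features=None):
    """Sketch of get_feature_names: one readable name per output column."""
    n_features = len(powers[0])
    if input_features is None:
        # Default names x0, x1, ... (the "x%d" scheme).
        input_features = ["x%d" % i for i in range(n_features)]
    names = []
    for row in powers:
        # Keep features with a nonzero exponent; write exponent 1 without "^1".
        terms = [
            name if exp == 1 else "%s^%d" % (name, exp)
            for name, exp in zip(input_features, row)
            if exp != 0
        ]
        # A row of all zeros is the bias column.
        names.append(" ".join(terms) if terms else "1")
    return names


powers = [[0, 0], [1, 0], [0, 1], [2, 0], [1, 1], [0, 2]]
print(feature_names(powers))
# ['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']
print(feature_names(powers, ["\u262E", "\u05D0"]))
```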

jakevdp · 2016-02-16T21:40:55Z

sklearn/preprocessing/data.py

+        ----------
+        input_features : list of string, length n_features, optional
+            String names for input features if available. By default,
+            "x0", "x1", ... "x_n_features" is used.


Should have underscores (x_0, x_1) to match code below.

Maybe it would be cleaner without underscores? I find things like x_1^2 x_2^1 harder to visually parse than x1^2 x2^1

yeah... well I was thinking about latex. But maybe that's silly.

amueller · 2016-02-17T00:10:55Z

I changed the naming scheme to x%d for easier visual parsing. I think x for data is good. We could also do x[%d], which is what we use in trees.
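For comparison, the three default-name schemes discussed in this thread can each be produced from a printf-style template. `default_names` is a hypothetical helper, used only to show how each scheme renders a term like a^2 b.

```python
def default_names(n_features, template="x%d"):
    """Build default input names from a printf-style template (hypothetical helper)."""
    return [template % i for i in range(n_features)]


for template in ("x%d", "x_%d", "x[%d]"):
    a, b = default_names(2, template)
    # Render the term a^2 b under each naming scheme.
    print(template, "->", "%s^2 %s" % (a, b))
# x%d -> x0^2 x1
# x_%d -> x_0^2 x_1
# x[%d] -> x[0]^2 x[1]
```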

amueller · 2016-02-17T00:17:58Z

added a unicode test. Is that what you had in mind @jnothman ?

jnothman · 2016-02-17T01:43:42Z

sklearn/preprocessing/tests/test_data.py

+    # test some unicode
+    poly = PolynomialFeatures(degree=1, include_bias=True).fit(X)
+    feature_names = poly.get_feature_names([u"\u0001F40D", u"\u262E", u"\u05D0"])
+    assert_array_equal(["1", u"\u0001F40D^1", u"\u262E^1", u"\u05D0^1"],


It's a bit awkward that 1 is not unicode. Can we find a simple convention for such things?
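One detail in this test: in a Python string literal, `\u` consumes exactly four hex digits, so `u"\u0001F40D"` is the control character U+0001 followed by the text `F40D`, not the snake emoji U+1F40D that was presumably intended. Code points above U+FFFF need the eight-digit `\U` escape. The snippet below (Python 3, where the `u` prefix is optional) shows the difference.

```python
# "\u" takes exactly four hex digits: U+0001 followed by the characters "F40D".
short_escape = "\u0001F40D"
# "\U" takes eight hex digits: the single code point U+1F40D.
long_escape = "\U0001F40D"

print(len(short_escape))       # 5
print(len(long_escape))        # 1
print(hex(ord(long_escape)))   # 0x1f40d
```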

jnothman · 2016-02-17T01:44:47Z

We could also do x[%d], which is what we use in trees.

That consistency (I assume you mean with visualisation) might be worthwhile, though I know that the use of the first subscript (as opposed to X[:, %d]) confuses people, despite the lowercase x.

amueller · 2016-02-19T19:56:18Z

hm, I can't reproduce the error... it seems to be in powers_. Weird.

amueller · 2016-02-19T20:49:45Z

powers_ is never tested and is broken in numpy 0.16.1 (fixed in 0.16.2), because it used the bincount of an empty sequence
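The empty sequence here is the degree-0 combination behind the bias column: it contains no feature indices, yet its row in `powers_` must still have length `n_input_features_`. The hypothetical pure-Python `bincount` below stands in for `np.bincount` to show the behavior the fix relies on; it makes no claim about which NumPy releases behave this way.

```python
def bincount(indices, minlength=0):
    """Pure-Python stand-in for numpy.bincount on non-negative ints (hypothetical)."""
    size = max(minlength, max(indices) + 1 if indices else 0)
    counts = [0] * size
    for i in indices:
        counts[i] += 1
    return counts


# The bias combination is the empty tuple; with minlength it must still
# yield a full-length row of zeros instead of failing.
print(bincount((), minlength=3))         # [0, 0, 0]
print(bincount((0, 2, 2), minlength=3))  # [1, 0, 2]
```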

amueller · 2016-02-19T20:56:43Z

should be fixed. I have no opinion re x[0] vs x0. The second has less "noise", I guess?

jakevdp · 2016-02-20T02:43:37Z

I would prefer x0 vs x[0], because it's half the number of characters and leads to easier-to-read variable names.

amueller · 2016-02-22T18:50:13Z

appveyor failure due to #4914.
@jakevdp @jnothman +1?

jakevdp · 2016-02-22T22:49:20Z

+1 for merge

jnothman · 2016-02-23T00:04:12Z

+1

yenchenlin · 2016-02-23T11:56:54Z

sklearn/preprocessing/data.py

+
+        Returns
+        -------
+        output_feature_names : list of string, length n_output_features


Just a nitpick here:
Maybe there should be a "." at the end of this line?

conventionally not.

Oh I see.
So this line:
https://github.com/amueller/scikit-learn/blob/poly_feature_names/sklearn/preprocessing/data.py#L1586
ends with a "." because it's a new sentence used to describe output?

Yes, it is description, not type.
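The convention discussed here is numpydoc's: the `name : type` line is a type specification and takes no trailing period, while an indented description under it is prose and ends with one. A small hypothetical function documented that way:

```python
def describe_output(names):
    """Join output feature names into one display string.

    Parameters
    ----------
    names : list of string, length n_output_features
        Names returned by a get_feature_names-style method.

    Returns
    -------
    summary : string
        Comma-separated names, in column order.
    """
    return ", ".join(names)


print(describe_output(["1", "x0", "x1"]))  # 1, x0, x1
```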

amueller · 2016-02-24T22:08:30Z

should be good now.

jakevdp · 2016-02-24T22:10:30Z

Once tests pass I'd say we can merge

jakevdp · 2016-02-25T19:54:56Z

Thanks @amueller!


amueller force-pushed the poly_feature_names branch from 48eb9e7 to 30e591e, February 16, 2016 21:17

jnothman reviewed Feb 16, 2016

jakevdp reviewed Feb 16, 2016

amueller force-pushed the poly_feature_names branch from 30e591e to ec4d92d, February 17, 2016 00:09

amueller force-pushed the poly_feature_names branch from ec4d92d to bff115c, February 17, 2016 00:17

jnothman reviewed Feb 17, 2016
View reviewed changes

maniteja123 mentioned this pull request Feb 17, 2016

ENH: Add feature_names_ property to PolynomialFeatures #6216

Closed

amueller added 2 commits February 19, 2016 15:52

add get_feature_names to PolynomialFeatures

3fa684d

fix PolynomialFeatures.powers_ in python 0.16.1

ddc1740

amueller force-pushed the poly_feature_names branch from bff115c to ddc1740, February 19, 2016 20:53

don't do ^1

897f8b6

amueller force-pushed the poly_feature_names branch from 9a11f46 to 897f8b6, February 22, 2016 20:09

jnothman mentioned this pull request Feb 22, 2016

Pipeline object does not have a get_feature_names method - intentional? #6421

Closed

jnothman changed the title ~~[MRG] add get_feature_names to PolynomialFeatures~~ [MRG+2] add get_feature_names to PolynomialFeatures Feb 23, 2016

This was referenced Feb 23, 2016

RFC generalised Pipeline.get_feature_names #6424

Closed

Transformative get_feature_names for various transformers #6425

Closed

yenchenlin reviewed Feb 23, 2016

fixed doc for powers, added test

8fb928d

jakevdp added a commit that referenced this pull request Feb 25, 2016

Merge pull request #6372 from amueller/poly_feature_names

7895d38

[MRG+2] add get_feature_names to PolynomialFeatures

jakevdp merged commit 7895d38 into scikit-learn:master Feb 25, 2016

amueller deleted the poly_feature_names branch May 19, 2017 20:24


5 participants