Skip to content

[MRG+2] add get_feature_names to PolynomialFeatures#6372

Merged
jakevdp merged 4 commits intoscikit-learn:masterfrom
amueller:poly_feature_names
Feb 25, 2016
Merged

[MRG+2] add get_feature_names to PolynomialFeatures#6372
jakevdp merged 4 commits intoscikit-learn:masterfrom
amueller:poly_feature_names

Conversation

@amueller
Copy link
Copy Markdown
Member

Fixes #6185, replaces #6216

@amueller
Copy link
Copy Markdown
Member Author

ping @jakevdp ;)

return np.vstack(np.bincount(c, minlength=self.n_input_features_)
for c in combinations)

def get_feature_names(self, input_features=None):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been suggesting for a while that this is an appropriate way to deal with feature names in extractor/transformer pipelines. I'm happy to see this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Very nice.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting to do it but didn't have time. I'm just writing the book chapter about preprocessing and I'm embarrassed not having that feature ;)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, who knew writing books would be good for something?

@jnothman
Copy link
Copy Markdown
Member

LGTM.

@jnothman
Copy link
Copy Markdown
Member

I qualify that LGTM. I think we need to decide whether x_1, x_2, ... is the appropriate naming scheme. If we are to go with get_feature_names transforming its input feature name list, the same convention will apply in, e.g. feature selectors and feature agglomeration.

A further small issue: should these be unicodes in Python 2? We should at least test that when unicode input names are passed, an error is not triggered.

----------
input_features : list of string, length n_features, optional
String names for input features if available. By default,
"x0", "x1", ... "x_n_features" is used.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have underscores (x_0, x_1) to match code below.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be cleaner without underscores? I find things like x_1^2 x_2^1 harder to visually parse than x1^2 x2^1

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah... well I was thinking about latex. But maybe that's silly.

@amueller
Copy link
Copy Markdown
Member Author

I changed the naming scheme to x%d for easier visual parsing. I think x for data is good. We could also do x[%d], which is what we use in trees.

@amueller
Copy link
Copy Markdown
Member Author

added a unicode test. Is that what you had in mind @jnothman ?

# test some unicode
poly = PolynomialFeatures(degree=1, include_bias=True).fit(X)
feature_names = poly.get_feature_names([u"\u0001F40D", u"\u262E", u"\u05D0"])
assert_array_equal(["1", u"\u0001F40D^1", u"\u262E^1", u"\u05D0^1"],
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit awkward that 1 is not unicode. Can we find a simple convention for such things?

@jnothman
Copy link
Copy Markdown
Member

We could also do x[%d], which is what we use in trees.

That consistency (I assume you mean with visualisation) might be worthwhile, though I know that the use of the first subscript (as opposed to X[:, %d]) confuses people, despite the lowercase x.

@amueller
Copy link
Copy Markdown
Member Author

hm, I can't reproduce the error... it seems to be in powers_. Weird.

@amueller
Copy link
Copy Markdown
Member Author

powers_ is never tested and is broken in numpy 0.16.1 (fixed in 0.16.2), because it used the bincount of an empty sequence

@amueller
Copy link
Copy Markdown
Member Author

should be fixed. I have no opinion re x[0] vs x0, The second has less "noise" I guess?

@jakevdp
Copy link
Copy Markdown
Member

jakevdp commented Feb 20, 2016

I would prefer x0 vs x[0], because it's half the number of characters and leads to easier-to-read variable names.

@amueller
Copy link
Copy Markdown
Member Author

appveyor failure due to #4914.
@jakevdp @jnothman +1?

@jakevdp
Copy link
Copy Markdown
Member

jakevdp commented Feb 22, 2016

+1 for merge

@jnothman
Copy link
Copy Markdown
Member

+1

@jnothman jnothman changed the title [MRG] add get_feature_names to PolynomialFeatures [MRG+2] add get_feature_names to PolynomialFeatures Feb 23, 2016

Returns
-------
output_feature_names : list of string, length n_output_features
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nitpick here:
Maybe there should be a "." in the end of this line?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

conventionally not.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is description, not type.

@amueller
Copy link
Copy Markdown
Member Author

should be good now.

@jakevdp
Copy link
Copy Markdown
Member

jakevdp commented Feb 24, 2016

Once tests pass I'd say we can merge

@jakevdp
Copy link
Copy Markdown
Member

jakevdp commented Feb 25, 2016

Thanks @amueller!

jakevdp added a commit that referenced this pull request Feb 25, 2016
[MRG+2] add get_feature_names to PolynomialFeatures
@jakevdp jakevdp merged commit 7895d38 into scikit-learn:master Feb 25, 2016
@amueller amueller deleted the poly_feature_names branch May 19, 2017 20:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants