ENH Add get_feature_names_out to FunctionTransformer #21569
thomasjpfan merged 17 commits into scikit-learn:main
Conversation
thomasjpfan
left a comment
When validate=False and the feature_names_out parameter is set, I propose we set `feature_names_in_` and `n_features_in_`, but not validate them during fit or transform.
As for the API, I am thinking of restricting `feature_names_out` to two options at first:

- `None`: no feature names out
- callable: a user-provided function to compute the feature names out

Two more options for follow-up PRs:

- `'one-to-one'`: feature names out == feature names in
- array-like of strings: I am currently unsure about a use case for this option that the callable cannot cover, but we can discuss it in a follow-up.
Thanks @thomasjpfan. I'll remove the option to set `feature_names_out` to an array-like of strings.
I think the default still needs to be `None`. Let's add `'one-to-one'` as a non-default option.
…ake default 'one-to-one'
I just read your message; I had already updated the PR to remove the option to pass an array-like of strings, and I set the default to `'one-to-one'`.
It is, but I do not think we can assume it. If a user passes a function that creates a column, then …

We can use scikit-learn/sklearn/utils/metaestimators.py (Line 140 in 48e83df).
Thanks @thomasjpfan. I updated the PR to make `None` the default. Right now `get_feature_names_out` raises a `ValueError` if …
I ran `black`, `flake8`, `make test-coverage`, etc., but they didn't catch the issues with the numpydoc docstring (a newline was missing) or with v1.1.rst (someone else had forgotten a backtick). I looked in the Contributing doc, but I can't find instructions for catching these errors before I push the code to GitHub. Did I miss something?
Hi @thomasjpfan, is there anything else you need me to do for this PR?
For some reason the numpydoc validation is done externally and is not part of the main test suite. I am not sure why we do that. We should probably run those checks as part of the main test suite to avoid the confusion.
ogrisel
left a comment
LGTM. I think the PR in its current state should cover most useful cases. I did not see any particular defect. Just a small improvement suggestion for one of the exception messages below:
…geron/scikit-learn into function_transformer_feature_names_out
Thanks for reviewing, Olivier. I just made the change you suggested.
In such cases, should I pull and merge `main`?
That would not hurt, and if the PR is "CI green ticked", it might get a better chance to attract reviewers' attention :) |
Thanks @ogrisel , I merged main, now there's a beautiful green tick. 😊 |
thomasjpfan
left a comment
Thanks for the update @ageron !
Thanks for the review. 👍 |
I copied your function into my scikit-learn environment and tried to use it. However, I still get the error below, where `preprocessor` is my `ColumnTransformer` and I run the following code:
preprocessor.get_feature_names_out()
Transformer argument looks like this:
('log', FunctionTransformer(np.log1p, validate=True), log_features)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [10], in <cell line: 3>()
1 xt = preprocessor.transform(X_test)
2 #mapie.single_estimator_[1].estimator
----> 3 preprocessor.get_feature_names_out()
File ~\miniconda3\envs\Master_ML\lib\site-packages\sklearn\compose\_column_transformer.py:481, in ColumnTransformer.get_feature_names_out(self, input_features)
479 transformer_with_feature_names_out = []
480 for name, trans, column, _ in self._iter(fitted=True):
--> 481 feature_names_out = self._get_feature_name_out_for_transformer(
482 name, trans, column, input_features
483 )
484 if feature_names_out is None:
485 continue
File ~\miniconda3\envs\Master_ML\lib\site-packages\sklearn\compose\_column_transformer.py:446, in ColumnTransformer._get_feature_name_out_for_transformer(self, name, trans, column, feature_names_in)
444 # An actual transformer
445 if not hasattr(trans, "get_feature_names_out"):
--> 446 raise AttributeError(
447 f"Transformer {name} (type {type(trans).__name__}) does "
448 "not provide get_feature_names_out."
449 )
450 if isinstance(column, Iterable) and not all(
451 isinstance(col, str) for col in column
452 ):
453 column = _safe_indexing(feature_names_in, column)
AttributeError: Transformer log (type FunctionTransformer) does not provide get_feature_names_out.
This feature is not released yet and will be released in v1.1. If you want to try out the feature now, you can install the nightly build: `pip install --pre --extra-index https://pypi.anaconda.org/scipy-wheels-nightly/simple scikit-learn`
Are you sure this is working as intended? I just installed the nightly build and I still get exactly this error. The code is in my environment; at least function_transformer_.py has this method implemented.
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer
mean_transformer = FunctionTransformer(
func=np.log1p,
feature_names_out="one-to-one",
validate=True
)
X = pd.DataFrame({"my_feature": [1, 2, 3]})
X_trans = mean_transformer.fit_transform(X)
print(mean_transformer.get_feature_names_out())
# ['my_feature']
Thank you Thomas ... sorry for asking all these questions that might be totally obvious :(

Reference Issues/PRs
Follow-up on #18444.
Part of #21308.
This new feature was discussed in #21079.
What does this implement/fix? Explain your changes.
Adds the `get_feature_names_out` method and a new parameter `feature_names_out` to `preprocessing.FunctionTransformer`. By default, `get_feature_names_out` returns the input feature names, but you can set `feature_names_out` to return a different list, which is especially useful when the number of output features differs from the number of input features.

For example, here's a `FunctionTransformer` that outputs a single feature, equal to the input's mean along axis=1:
The `feature_names_out` parameter may also be a callable. This is useful if the output feature names depend on the input feature names, and/or if they depend on parameters like `kw_args`. Here's an example that uses both: a transformer that appends `n` random features to the existing features.

Any other comments?
I have some concerns regarding the fact that `validate` is `False` by default, which means that `n_features_in_` and `feature_names_in_` are not set automatically. So if you create a `FunctionTransformer` with the default `validate=False` and `feature_names_out=None`, then when you call `get_feature_names_out` without any argument, it will raise an exception (unless `transform` was called before and `func` set `n_features_in_` or `feature_names_in_`). I tried to make this clear in the error message, but I'm worried that this will confuse users. Wdyt?

And if `validate=False` and you set `feature_names_out` to a callable, and call `get_feature_names_out` with no arguments, then the callable will get `input_features=None` as input (unless `transform` was called before and `func` set `n_features_in_` or `feature_names_in_`). Users may be surprised by this. Should we output a warning in this case? Wdyt?

Moreover, as shown in the second code example above, the output feature names may depend on `kw_args`, so if `feature_names_out` is a callable, `get_feature_names_out` passes `self` to it, plus the `input_features`. I considered checking `feature_names_out.__code__.co_varnames` to decide whether to pass no arguments, just the `input_features`, or the `input_features` and `self`. But `__code__` is not used anywhere in the code base, and `inspect` is not used much, so I'm not sure whether such introspection would be frowned upon. I decided that it was simple enough to require users to always have two arguments: the transformer itself, and the `input_features`. Wdyt?

Lastly, when users want to create a `FunctionTransformer` that outputs a single feature, I expect that many will be tempted to set `feature_names_out` to a string instead of a list. To keep things consistent, I decided to raise an exception in this case, with a clear error message telling them to use `["foo"]` instead. Wdyt?