Errors and warnings in Spyder:
2) OneHotEncoder categorical_features deprecated: how do I transform a
specific column?
Warning: FutureWarning: The handling of integer data will change in version 0.22. Currently, the
categories are determined based on the range [0, max(values)], while in the future they will be
determined based on the unique values.
If you want the future behaviour and silence this warning, you can specify "categories='auto'".
In case you used a LabelEncoder before this OneHotEncoder to convert the categories to integers,
then you can now use the OneHotEncoder directly.
warnings.warn(msg, FutureWarning)
F:\Anaconda\lib\site-packages\sklearn\preprocessing\_encoders.py:451: DeprecationWarning: The
'categorical_features' keyword is deprecated in version 0.20 and will be removed in 0.22. You can use
the ColumnTransformer instead.
"use the ColumnTransformer instead.", DeprecationWarning)
Solution:
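1) The first warning says that, for integer data, the categories will in the future be
determined from the unique values rather than from the range [0, max(values)]. To get
that behaviour now and silence the warning, pass categories='auto' to OneHotEncoder.
It also says that if you used a LabelEncoder only to turn the categories into integers,
you can now give the data to OneHotEncoder directly. The second warning says that the
categorical_features argument is deprecated: use ColumnTransformer instead, which works
like a Pipeline for column-wise transformations and lets you choose which columns each
transformer is applied to.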
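A minimal sketch of what the first warning describes, assuming scikit-learn 0.20 or later. The arrays below are made-up sample data. With categories='auto', integer categories come from the unique values (1, 5 and 9 give three columns, not the ten that the old range [0, 9] would give), and string categories can be encoded without a LabelEncoder step.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical integer-coded feature with gaps: only 1, 5 and 9 occur
ints = np.array([[1], [5], [9], [5]])
enc = OneHotEncoder(categories="auto")  # categories taken from the unique values
enc.fit(ints)
print(enc.categories_)  # one array per column, here holding 1, 5 and 9

# Hypothetical string feature: encoded directly, no LabelEncoder needed
countries = np.array([["France"], ["Spain"], ["Germany"], ["Spain"]])
onehot = OneHotEncoder(categories="auto").fit_transform(countries).toarray()
print(onehot)  # columns are the sorted categories: France, Germany, Spain
```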
Here is the equivalent code for your case:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# [0] is the list of columns you want to transform in this step;
# remainder="passthrough" keeps all other columns unchanged
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])],
                       remainder="passthrough")
X = ct.fit_transform(X)
See also: the ColumnTransformer documentation.
For the example above (encoding the Country column):
# Encoding categorical data (i.e. converting text such as country names to numbers)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Encode the Country column (column 0).
# Since scikit-learn 0.20, OneHotEncoder accepts strings, so this LabelEncoder step is optional.
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)
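A self-contained version of the Country example, as a sketch assuming scikit-learn 0.20 or later. The data values are hypothetical (column 0 is a country name, columns 1 and 2 are numeric), and the LabelEncoder step is dropped because OneHotEncoder handles the strings itself. sparse_threshold=0 is an extra choice here so the result is always a dense array.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: column 0 is a country, columns 1-2 are numeric features
X = np.array([
    ["France", 44, 72000],
    ["Spain", 27, 48000],
    ["Germany", 30, 54000],
    ["Spain", 38, 61000],
], dtype=object)

ct = ColumnTransformer(
    [("Country", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",              # keep columns 1 and 2 unchanged
    sparse_threshold=0,                   # always return a dense array
)
# Encoded columns come first (France, Germany, Spain), then the passthrough columns
X_encoded = ct.fit_transform(X).astype(float)
print(X_encoded)
```

Note that the one-hot columns are placed before the untouched columns, so column indices after the transform differ from the original X.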
2) Alternatively, you can do one-hot encoding with pandas:
import pandas as pd
ohe = pd.get_dummies(dataframe_name['column_name'])
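A small runnable sketch of the pandas approach. The DataFrame and its column names are hypothetical. dtype=int is passed so the dummy columns are 0/1 integers regardless of the pandas version (recent versions otherwise return booleans). Passing the whole DataFrame with columns=[...] encodes only the listed columns and keeps the others.

```python
import pandas as pd

# Hypothetical DataFrame with one text column and one numeric column
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Age": [44, 27, 30, 38],
})

# Encode a single column: returns only the dummy columns (France, Germany, Spain)
ohe = pd.get_dummies(df["Country"], dtype=int)

# Encode selected columns of the whole frame; "Age" is kept unchanged
df_encoded = pd.get_dummies(df, columns=["Country"], dtype=int)
print(df_encoded)
```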