
Mastering Categorical Encoding

Uploaded by tegik80106
Copyright © All Rights Reserved

Mastering Categorical Encoding

Encoding categorical variables is a crucial step in preparing data for machine learning models. This presentation will explore the different encoding techniques, their applications, and the trade-offs to consider when choosing the right method.
Nominal Vs. Ordinal Encoding
Nominal Categorical Variables
Variables with unordered categories, like gender or state, where the order of the categories doesn't matter.

Ordinal Categorical Variables
Variables with ordered categories, like education level, where the rank of the categories is important.
One-Hot Encoding
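1 For Nominal Variables
Creates a new binary column for each unique category, indicating the presence or absence of that category.

2 Avoiding the Dummy Variable Trap
Skip one of the columns (for example, the last) to avoid perfect multicollinearity: the full set of columns always sums to 1 in every row. This matters mainly for linear models with an intercept.

3 Downsides
High dimensionality when a variable has many categories (high cardinality).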
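A minimal pure-Python sketch of one-hot encoding. The helper `one_hot_encode` and the `is_<category>` column names are hypothetical, and categories are assumed to be taken in sorted order; passing `drop_last=True` follows the advice of skipping the last column.

```python
from typing import List, Sequence, Tuple


def one_hot_encode(
    values: Sequence[str], drop_last: bool = False
) -> Tuple[List[str], List[List[int]]]:
    """Return (column names, rows of 0/1 indicators), one column per category.

    Categories are taken in sorted order. With drop_last=True the last category
    gets no column and is represented by an all-zero row, which removes the
    perfect multicollinearity (the dummy variable trap).
    """
    categories = sorted(set(values))
    if drop_last:
        categories = categories[:-1]
    columns = [f"is_{c}" for c in categories]
    # One indicator per (row, category): 1 if the row holds that category.
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return columns, rows


states = ["CA", "NY", "TX", "NY"]
print(one_hot_encode(states))
# (['is_CA', 'is_NY', 'is_TX'], [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
print(one_hot_encode(states, drop_last=True))
# (['is_CA', 'is_NY'], [[1, 0], [0, 1], [0, 0], [0, 1]])
```

In practice, pandas' `get_dummies` (with `drop_first=True`) or scikit-learn's `OneHotEncoder` (with `drop="first"`) do the same job; which column is dropped does not matter for removing the collinearity. The example also shows the downside: every new category adds a column.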
Label Encoding
For Ordinal Variables
Assign integer labels to the categories according to their rank (for example, high school = 0, bachelor's = 1, master's = 2).

Preserves Ordering
Maintains the inherent order of the categories.

Caution
Assumes equal distances between consecutive categories, which may not always be the case.

Combination
Can be combined with other techniques, such as target encoding, which may improve performance.
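Note that scikit-learn's `LabelEncoder` sorts labels alphabetically and is intended for target labels, so it does not follow a rank order; for ordered features the order has to be supplied explicitly (scikit-learn's `OrdinalEncoder` accepts it through its `categories` parameter). The sketch below shows the idea with a hypothetical `ordinal_encode` helper and an illustrative education scale.

```python
from typing import Dict, List, Sequence


def ordinal_encode(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Replace each value with the rank (0, 1, 2, ...) of its category in `order`."""
    rank: Dict[str, int] = {category: i for i, category in enumerate(order)}
    unknown = set(values) - set(rank)
    if unknown:
        # Refuse to guess a rank for categories the caller did not order.
        raise ValueError(f"categories missing from order: {sorted(unknown)}")
    return [rank[v] for v in values]


# Illustrative ordering, lowest to highest (an assumption for this example).
education_order = ["high school", "bachelor's", "master's", "PhD"]
levels = ["master's", "high school", "PhD", "bachelor's", "master's"]
print(ordinal_encode(levels, education_order))  # [2, 0, 3, 1, 2]
```

The evenly spaced codes 0 to 3 are exactly the equal-distance assumption that the Caution point warns about: the model sees the step from high school to bachelor's as the same size as the step from master's to PhD.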
Target Encoding
1 For Ordinal Variables
Replace categories with the mean or median of the target variable for each category. Because the encoding ignores the category order, it can be applied to nominal variables as well.

2 Captures Relationship
Exploits the relationship between the categorical variable and the target variable.

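3 Caution
Prone to overfitting and target leakage, because each row's own target value feeds into its encoding. Compute the encodings on training folds only and evaluate with cross-validation.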
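A sketch of one common safeguard, out-of-fold (cross-fitted) target encoding, in which each row is encoded using target statistics from the other folds only. The helper name, the round-robin fold assignment (real pipelines usually shuffle first), and the fallback to the other folds' overall mean for unseen categories are all assumptions for illustration. It uses the mean; swapping in `statistics.median` would give the median variant.

```python
from typing import Dict, List, Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def out_of_fold_target_encode(
    values: Sequence[str], targets: Sequence[float], n_folds: int = 3
) -> List[float]:
    """Encode each row with category target means computed on the other folds.

    Rows are assigned to folds round-robin (row i goes to fold i % n_folds), so a
    row's own target never contributes to its encoding. A category unseen in the
    other folds falls back to their overall target mean.
    """
    n = len(values)
    if n_folds < 2 or n < n_folds:
        raise ValueError("need 2 <= n_folds <= number of rows")
    encoded: List[float] = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        # Group training-fold targets by category.
        groups: Dict[str, List[float]] = {}
        for i in train:
            groups.setdefault(values[i], []).append(targets[i])
        means = {c: _mean(ys) for c, ys in groups.items()}
        fallback = _mean([targets[i] for i in train])
        for i in held_out:
            encoded[i] = means.get(values[i], fallback)
    return encoded


category = ["a", "b", "a", "b", "a", "b"]
target = [1, 0, 1, 0, 0, 1]
print(out_of_fold_target_encode(category, target))  # [0.5, 0.5, 0.5, 0.5, 1.0, 0.0]
```

Rows of the same category can receive different encodings because each fold sees a slightly different sample; that noise is part of what limits overfitting. scikit-learn (1.3 and later) ships `sklearn.preprocessing.TargetEncoder`, whose `fit_transform` applies a similar cross-fitting scheme.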
Mean Encoding

For Nominal Variables
Replace categories with the mean of the target variable for each category (target encoding with the mean as the statistic).

Captures Relationship
Effectively captures the relationship between the categorical variable and the target variable.

Caution
May lead to overfitting, so cross-validation and regularization (for example, smoothing rare categories toward the global mean) are important.
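One way to read "regularization" here is additive smoothing (an m-estimate), which blends each category's mean with the global mean according to how many rows the category has. The helper `smoothed_mean_encoding` and the weight `m` are assumptions for illustration, not something the slide specifies; the toy data is invented.

```python
from typing import Dict, Sequence


def smoothed_mean_encoding(
    values: Sequence[str], targets: Sequence[float], m: float = 10.0
) -> Dict[str, float]:
    """Per-category target mean, shrunk toward the global mean.

    encoding(c) = (sum_c + m * global_mean) / (n_c + m), where sum_c and n_c are
    the target sum and row count of category c. Small categories are pulled
    toward the global mean; a larger m means stronger shrinkage.
    """
    global_mean = sum(targets) / len(targets)
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, y in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + y
        counts[v] = counts.get(v, 0) + 1
    return {c: (sums[c] + m * global_mean) / (counts[c] + m) for c in sums}


city = ["x", "x", "x", "x", "y"]
bought = [1, 1, 1, 0, 0]
encoding = smoothed_mean_encoding(city, bought, m=2.0)
print({c: round(e, 3) for c, e in encoding.items()})  # {'x': 0.7, 'y': 0.4}
```

The raw means would be 0.75 for `x` and 0.0 for `y`; with only one row, `y` is pulled much further toward the global mean (0.6) than `x`. Setting `m=0` recovers plain mean encoding.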
Ordinal Encoding for Nominal Variables

Step 1
Calculate the mean/median of the target variable for each nominal category.

Step 2
Rank the categories based on the calculated mean/median values.

Step 3
Assign numerical labels to the categories based on their rank.
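The three steps above can be sketched as follows, using the mean as the statistic (replace it with `statistics.median` for the median variant). The helper `rank_by_target`, the alphabetical tie-breaking, and the toy data are assumptions for illustration. Because the ranking uses the target, it should be computed on training data only, for the same leakage reason as target encoding.

```python
from typing import Dict, Sequence


def rank_by_target(values: Sequence[str], targets: Sequence[float]) -> Dict[str, int]:
    """Map each nominal category to its rank by mean target (0 = lowest)."""
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for v, y in zip(values, targets):
        sums[v] = sums.get(v, 0.0) + y
        counts[v] = counts.get(v, 0) + 1
    # Step 1: mean of the target for each category.
    means = {c: sums[c] / counts[c] for c in sums}
    # Step 2: order categories by that mean (ties broken by name).
    ordered = sorted(means, key=lambda c: (means[c], c))
    # Step 3: assign integer labels by rank.
    return {c: rank for rank, c in enumerate(ordered)}


city = ["paris", "tokyo", "paris", "lima", "tokyo", "lima"]
price = [300, 500, 340, 100, 460, 120]
ranks = rank_by_target(city, price)
print(ranks)                     # {'lima': 0, 'paris': 1, 'tokyo': 2}
print([ranks[c] for c in city])  # [1, 2, 1, 0, 2, 0]
```

Unlike one-hot encoding, this keeps a single column, and unlike plain target encoding it stores ranks rather than the means themselves.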
Choosing the Right Encoding

Model Performance
Select the encoding that maximizes model accuracy and generalization.

Interpretability
Consider the interpretability of the encoded features for better insights.

Dimensionality
Avoid high-dimensional encodings that can lead to the curse of dimensionality.

Overfitting
Employ cross-validation and regularization to mitigate the risk of overfitting.
