Target Encoding Notes

Uploaded by Raj Narayanan

Target Encoding

Here, we will overcome the disadvantages of Label Encoding and One-Hot Encoding (OHE).

- Label Encoding has the disadvantage of imposing an unwarranted order on categories, making the model think that some categories are superior to others when they actually are not.

- One-Hot Encoding has the issue of the Curse of Dimensionality, whereby the number of features increases drastically if a feature has many categories.

Therefore, to overcome these difficulties, instead of just picking arbitrary numbers to represent the Blue, Red, and Green categories in the dataset on the right, we can calculate the mean value of the Target for each option.

For example, of the 3 people that liked the color Blue, only 1 liked Troll 2.
Therefore, the mean of Blue is 1/3, or 0.33, so we replace Blue with 0.33.

Likewise, only 1 person likes Red, and they don't like Troll 2, so Red's mean will be 0/1 = 0.

Similarly, the mean for Green will be 2/3 ≈ 0.67.

We then substitute these values in place of Blue, Red, and Green.


The dataset will now become:

Because we used the Target to determine the values for the categories, this method is called Target Encoding. This is the simplest type of Target Encoding.
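The steps above can be sketched in plain Python. The 7-row dataset is reconstructed from the counts in the notes (3 Blue with 1 Troll 2 fan, 1 Red with none, 3 Green with 2); it is an assumption consistent with those counts, not a listing from the original figure.

```python
# Each row is (favorite_color, likes_troll_2), reconstructed from the counts
# in the notes: 3 Blue (1 like), 1 Red (0 likes), 3 Green (2 likes).
data = [
    ("Blue", 1), ("Blue", 0), ("Blue", 0),
    ("Red", 0),
    ("Green", 1), ("Green", 1), ("Green", 0),
]

# Collect the Target values for each color, then take the mean.
targets_by_color = {}
for color, target in data:
    targets_by_color.setdefault(color, []).append(target)
encoding = {color: sum(ts) / len(ts) for color, ts in targets_by_color.items()}

# Replace each category with its mean of the Target.
encoded_column = [encoding[color] for color, _ in data]
print({c: round(v, 2) for c, v in encoding.items()})
# {'Blue': 0.33, 'Red': 0.0, 'Green': 0.67}
```

The means match the worked values in the notes: Blue = 1/3 ≈ 0.33, Red = 0, Green = 2/3 ≈ 0.67.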

A more common version of Target Encoding accounts for the fact that only one person liked Red, so there is insufficient data to support Red's value. In contrast, Blue and Green each have 3 people supporting the values we used to replace them.

Because less data supports the value we replaced Red with, we have less confidence that we replaced Red with the best value than we have for Blue and Green.

So, in order to deal with this, Target Encoding usually is done using a Weighted Mean that combines
the mean for a specific option, like Red, with the overall mean of the Target.
Weighted Mean = (n × (option's mean) + m × (overall mean)) / (n + m)

where n = weight of the option's mean (usually the number of rows for that option)
and m = weight of the overall mean (user defined)
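As a quick sketch, the formula translates to a small helper function (the names are mine, not from any library):

```python
def weighted_mean(option_mean, n, overall_mean, m):
    """Smoothed target encoding: blend an option's mean of the Target with
    the overall mean of the Target, weighted by n (rows for the option)
    and m (a user-defined weight for the overall mean)."""
    return (n * option_mean + m * overall_mean) / (n + m)

# Example: Blue has mean 1/3 over n = 3 rows; the overall mean is 3/7; m = 2.
print(round(weighted_mean(1/3, 3, 3/7, 2), 2))  # 0.37
```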

Let’s use this formula:


 We start by plugging in the mean of the Target for the color Blue, which is 1/3.
Hence, option's mean = 1/3.
- Since 3 people were used to calculate the mean of Blue, we use n = 3.
- Then we plug in the overall mean of the Target, 3/7, because 3 of the 7 people overall like Troll 2.
- Let's set m = 2.
Setting m = 2 means we need at least 3 rows of data before the option's mean, the mean we calculated for Blue, becomes more important than the overall mean.
When we substitute the values and do the math, we get:

Weighted Mean(Blue) = (3 × (1/3) + 2 × (3/7)) / (3 + 2) = 0.37
Therefore, the dataset becomes:

 Now we can calculate the weighted mean of Red:


- Mean of Red: 0/1 (0 because no one liked Troll 2 and 1 because there’s only 1 sample)
- n = 1 (as there’s only one sample)
- Overall mean = 3/7
- m =2 (user defined)

Weighted Mean(Red) = (1 × (0/1) + 2 × (3/7)) / (1 + 2) = 0.29

 Now we can calculate the weighted mean of Green:


- Mean of Green: 2/3 (2 because 2 people liked Troll 2 and 3 because there are 3 samples)
- n = 3 (as there’re 3 samples)
- Overall mean = 3/7
- m =2 (user defined)

Weighted Mean(Green) = (3 × (2/3) + 2 × (3/7)) / (3 + 2) = 0.57

The dataset becomes:
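The three weighted means above can be recomputed together; this sketch assumes the same per-color counts as the earlier examples:

```python
def weighted_mean(option_mean, n, overall_mean, m=2):
    # Blend an option's mean with the overall mean, weighted by n and m.
    return (n * option_mean + m * overall_mean) / (n + m)

overall = 3 / 7  # 3 of the 7 people like Troll 2
# For each color: (mean of the Target for that color, number of rows n)
stats = {"Blue": (1/3, 3), "Red": (0/1, 1), "Green": (2/3, 3)}
encoding = {color: round(weighted_mean(mean, n, overall), 2)
            for color, (mean, n) in stats.items()}
print(encoding)  # {'Blue': 0.37, 'Red': 0.29, 'Green': 0.57}
```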

 Now let's compare Target Encoding with the weighted mean to Target Encoding without it.

For Blue and Green, this makes sense because they had a relatively large amount of data supporting their encodings. For Red, it also makes sense that the encoding leans mostly on the overall mean, because we have so little data for Red, only one row.
In a way, we can consider overall mean as the best guess, given no data.
However, as we get more data (more rows for each option), we use the data more, rather than
our best guess, to determine the Target Encoding.

NOTE: If you are familiar with Bayesian methods, this approach may look familiar, because a lot of Bayesian methods boil down to calculating a weighted average between a guess and the data. As a result, some people call this Bayesian Mean Encoding.

 NOTE: We are using the Target to modify the values in the 'Favorite Color' feature. This may lead to Data Leakage.
Data Leakage results in models that work great with training data, but not so well with testing data. In other words, Data Leakage results in models that are overfit.
The good news is that there are a bunch of relatively simple ways to avoid Data Leakage, or at least reduce the amount of it, so that you can use Target Encoding without overfitting your model.

One of the most popular methods to reduce leakage is called K-Fold Target Encoding.

 To understand more about the K-Fold Target Encoding, let’s consider the
initial dataset:
NOTE: The 'Fold' in K-Fold Target Encoding refers to splitting the data into equal-sized subsets, and 'K' refers to how many subsets we create.

Here, let's do 2-Fold Target Encoding. In this case, we split the dataset into 2 equal subsets (or at least as equal as possible).

 Now, to Target Encode Blue in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:

Weighted Mean = (n × (option's mean) + m × (overall mean)) / (n + m)
Here,
- Mean of option (Subset B's Blues): 0/1 (the one Blue person in Subset B does not like Troll 2)
- n: 1 (as there is only one Blue person in Subset B)
- Overall mean: 1/3 (only 1 of the 3 people in Subset B likes Troll 2)
- m: 2

Hence, the equation will be:

Weighted Mean(Blue, Subset A) = (1 × (0/1) + 2 × (1/3)) / (1 + 2) = 0.22

 Now, to Target Encode Blue in Subset B, we ignore all the Target values in Subset B and instead plug the Target values from Subset A into the weighted mean equation:
Here,
- Mean of option (Subset A’s Blues): 1/2
- n: 2
- Overall Mean: 2/4
- m: 2

Weighted Mean(Blue, Subset B) = (2 × (1/2) + 2 × (2/4)) / (2 + 2) = 0.5

Target Encoding 3
NOTE: Different subsets end up with different values for Blue. This is alright, since Favorite Color is becoming a continuous variable, like Height.

 Now, to Target Encode Red in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:
Here,
- Mean of option (Subset B’s Reds): 0 (no record of Red in subset B)
- n: 0 (no record of Red in subset B)
- Overall mean: 1/3 (only 1 likes Troll 2 from 3)
- m: 2

Weighted Mean(Red, Subset A) = (0 × 0 + 2 × (1/3)) / (0 + 2) = 0.33

 Now, to Target Encode Green in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:
Here,
- Mean of option (Subset B's Greens): 1/2 (only 1 of the 2 Greens likes Troll 2)
- n: 2 (the number of Greens in Subset B)
- Overall mean: 1/3 (only 1 of the 3 people in Subset B likes Troll 2)
- m: 2

Weighted Mean(Green, Subset A) = (2 × (1/2) + 2 × (1/3)) / (2 + 2) = 0.42

 Now, to Target Encode Green in Subset B, we ignore all the Target values in Subset B and instead plug the Target values from Subset A into the weighted mean equation:
Here,
- Mean of option (Subset A's Greens): 1/1 (the only Green in Subset A likes Troll 2)
- n: 1
- Overall mean: 2/4 (overall, 2 of the 4 people in Subset A like Troll 2)
- m: 2

Weighted Mean(Green, Subset B) = (1 × (1/1) + 2 × (2/4)) / (1 + 2) = 0.67

 With the encoding done, we merge the subsets back together; the dataset will look like:

 NOTE: This process reduces Data Leakage because the rows do not use their own Target values to calculate their encoding.
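The whole 2-fold procedure can be sketched in Python. The exact row-by-row contents of Subsets A and B are not listed in the notes, so the split below is an assumption inferred from the worked numbers:

```python
# Each row is (favorite_color, likes_troll_2). This split is inferred from
# the worked numbers in the notes, not given explicitly there.
subset_a = [("Blue", 1), ("Blue", 0), ("Green", 1), ("Red", 0)]
subset_b = [("Blue", 0), ("Green", 1), ("Green", 0)]

def encode(color, other_subset, m=2):
    """Encode a color for one fold using ONLY the other fold's Targets."""
    targets = [t for c, t in other_subset if c == color]
    n = len(targets)
    option_mean = sum(targets) / n if n else 0.0  # no rows -> mean term drops out
    overall = sum(t for _, t in other_subset) / len(other_subset)
    return (n * option_mean + m * overall) / (n + m)

print(round(encode("Blue", subset_b), 2))   # 0.22  (Blue in Subset A)
print(round(encode("Blue", subset_a), 2))   # 0.5   (Blue in Subset B)
print(round(encode("Red", subset_b), 2))    # 0.33  (Red in Subset A)
print(round(encode("Green", subset_b), 2))  # 0.42  (Green in Subset A)
print(round(encode("Green", subset_a), 2))  # 0.67  (Green in Subset B)
```

All five values match the worked examples above, including Red in Subset A, where n = 0 leaves only the overall mean of Subset B.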

 Now, if we set K = 7 instead of 2, we would divide the dataset into 7 subsets.

 Target Encoding the 1st subset, which consists of a single row with Favorite Color = Blue, means we ignore its Target value and use the Target values from all the other subsets to calculate the Weighted Mean.
Likewise, for encoding the other subsets, we use all of the other Target values except their own.

 NOTE: When we use all of the Target values except a row's own to Target Encode that row, it is called Leave-One-Out (LOO) Target Encoding.
Some people have success with Leave-One-Out Target Encoding, and others have success with K = 5.
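A minimal sketch of Leave-One-Out Target Encoding, reusing the reconstructed 7-row dataset and m = 2 from the earlier examples (the helper names are mine):

```python
# Each row is (favorite_color, likes_troll_2), reconstructed from the notes.
data = [
    ("Blue", 1), ("Blue", 0), ("Blue", 0),
    ("Red", 0),
    ("Green", 1), ("Green", 1), ("Green", 0),
]

def loo_encode(i, rows, m=2):
    """Encode row i using every other row's Target, but never its own."""
    others = rows[:i] + rows[i + 1:]                  # drop the row's own Target
    same = [t for c, t in others if c == rows[i][0]]  # same color, other rows
    n = len(same)
    option_mean = sum(same) / n if n else 0.0
    overall = sum(t for _, t in others) / len(others)
    return (n * option_mean + m * overall) / (n + m)

encoded = [round(loo_encode(i, data), 2) for i in range(len(data))]
```

Note that, as with K-Fold encoding, two rows with the same color can receive different values, because each row's own Target is excluded from its encoding.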

