Target Encoding Notes

Uploaded by Raj Narayanan

Target Encoding

Here, we will overcome the disadvantages of Label Encoding and One-Hot Encoding (OHE).

- Label Encoding has the disadvantage of imposing an unwarranted order on categories, making the model think that some categories are superior to others when they actually are not.

- One-Hot Encoding has the issue of the Curse of Dimensionality, whereby the number of features increases drastically if a feature has many categories.

Therefore, to overcome these difficulties, instead of just picking arbitrary numbers to represent the Blue, Red, and Green categories in the dataset on the right, we can calculate the mean value of the Target for each option.

For example, of the 3 people that liked the color Blue, only 1 liked Troll 2.
Therefore, the mean of Blue is 1/3, or 0.33, so we replace Blue with 0.33.

Likewise, only 1 person likes Red, and they don't like Troll 2, so Red's mean will be 0/1 = 0.

Similarly, the mean for Green will be 2/3 ≈ 0.67.

We then substitute these values in place of Blue, Red, and Green.


The dataset will now become:

Because we used the Target to determine the values for the categories, this method is called Target Encoding. This is the simplest type of Target Encoding.
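The steps above can be sketched in plain Python. The 7-row dataset is reconstructed from the counts in the notes (3 Blue with 1 Troll 2 fan, 1 Red with none, 3 Green with 2); it is an assumption consistent with those counts, not a listing from the original figure.

```python
# Each row is (favorite_color, likes_troll_2), reconstructed from the counts
# in the notes: 3 Blue (1 like), 1 Red (0 likes), 3 Green (2 likes).
data = [
    ("Blue", 1), ("Blue", 0), ("Blue", 0),
    ("Red", 0),
    ("Green", 1), ("Green", 1), ("Green", 0),
]

# Collect the Target values for each color, then take the mean.
targets_by_color = {}
for color, target in data:
    targets_by_color.setdefault(color, []).append(target)
encoding = {color: sum(ts) / len(ts) for color, ts in targets_by_color.items()}

# Replace each category with its mean of the Target.
encoded_column = [encoding[color] for color, _ in data]
print({c: round(v, 2) for c, v in encoding.items()})
# {'Blue': 0.33, 'Red': 0.0, 'Green': 0.67}
```

The means match the worked values in the notes: Blue = 1/3 ≈ 0.33, Red = 0, Green = 2/3 ≈ 0.67.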

A more common version of Target Encoding accounts for the fact that only one person liked Red, so there is insufficient data to support Red's value. In contrast, Blue and Green each have 3 people supporting the values we used to replace them.

Because less data supports the value we replaced Red with, we have less confidence that we replaced Red with the best value than we have for Blue and Green.

So, in order to deal with this, Target Encoding usually is done using a Weighted Mean that combines
the mean for a specific option, like Red, with the overall mean of the Target.
Weighted Mean = (n × (option's mean) + m × (overall mean)) / (n + m)

where n = weight of the option's mean (usually the number of rows for that option)
and m = weight of the overall mean (user defined)
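As a quick sketch, the formula translates to a small helper function (the names are mine, not from any library):

```python
def weighted_mean(option_mean, n, overall_mean, m):
    """Smoothed target encoding: blend an option's mean of the Target with
    the overall mean of the Target, weighted by n (rows for the option)
    and m (a user-defined weight for the overall mean)."""
    return (n * option_mean + m * overall_mean) / (n + m)

# Example: Blue has mean 1/3 over n = 3 rows; the overall mean is 3/7; m = 2.
print(round(weighted_mean(1/3, 3, 3/7, 2), 2))  # 0.37
```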

Let’s use this formula:


 We start by plugging in the mean of the Target for the color Blue, which is 1/3.
Hence, option's mean = 1/3.
- Since 3 people were used to calculate the mean of Blue, we use n = 3.
- Then we plug in the overall mean of the Target, 3/7, because 3 of the 7 people overall like Troll 2.
- Let's set m = 2.
Setting m = 2 means we need at least 3 rows of data before the option's mean, the mean we calculated for Blue, becomes more important than the overall mean.
When we substitute the values and do the math, we get:

Weighted Mean(Blue) = (3 × (1/3) + 2 × (3/7)) / (3 + 2) = 0.37
Therefore, the dataset becomes:

 Now we can calculate the weighted mean of Red:


- Mean of Red: 0/1 (0 because no one liked Troll 2 and 1 because there’s only 1 sample)
- n = 1 (as there’s only one sample)
- Overall mean = 3/7
- m =2 (user defined)

Weighted Mean(Red) = (1 × (0/1) + 2 × (3/7)) / (1 + 2) = 0.29

 Now we can calculate the weighted mean of Green:


- Mean of Green: 2/3 (2 because 2 people liked Troll 2 and 3 because there are 3 samples)
- n = 3 (as there’re 3 samples)
- Overall mean = 3/7
- m =2 (user defined)

Weighted Mean(Green) = (3 × (2/3) + 2 × (3/7)) / (3 + 2) = 0.57

The dataset becomes:
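The three weighted means above can be recomputed together; this sketch assumes the same per-color counts as the earlier examples:

```python
def weighted_mean(option_mean, n, overall_mean, m=2):
    # Blend an option's mean with the overall mean, weighted by n and m.
    return (n * option_mean + m * overall_mean) / (n + m)

overall = 3 / 7  # 3 of the 7 people like Troll 2
# For each color: (mean of the Target for that color, number of rows n)
stats = {"Blue": (1/3, 3), "Red": (0/1, 1), "Green": (2/3, 3)}
encoding = {color: round(weighted_mean(mean, n, overall), 2)
            for color, (mean, n) in stats.items()}
print(encoding)  # {'Blue': 0.37, 'Red': 0.29, 'Green': 0.57}
```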

 Now let's compare Target Encoding with the weighted mean to Target Encoding without it.

For Blue and Green, this makes sense because they had a relatively large amount of data supporting their encodings. For Red, it also makes sense that the encoding leans mostly on the overall mean, because we have so little data for Red, only one row.
In a way, we can consider overall mean as the best guess, given no data.
However, as we get more data (more rows for each option), we use the data more, rather than
our best guess, to determine the Target Encoding.

NOTE: If you are familiar with Bayesian methods, this approach may look familiar, because a lot of Bayesian methods boil down to calculating a weighted average between a guess and the data. As a result, some people call this Bayesian Mean Encoding.

 NOTE: We are using the Target to modify the values in the 'Favorite Color' feature. This may lead to Data Leakage.
Data Leakage results in models that work great with training data, but not so well with testing data. In other words, Data Leakage results in models that are overfit.
The good news is that there are a bunch of relatively simple ways to avoid Data Leakage, or at least reduce the amount of it, so that you can use Target Encoding without overfitting your model.

One of the most popular methods to reduce leakage is called K-Fold Target Encoding.

 To understand more about the K-Fold Target Encoding, let’s consider the
initial dataset:
NOTE: The 'Fold' in K-Fold Target Encoding refers to splitting the data into equal-sized subsets, and 'K' refers to how many subsets we create.

Here, let's do 2-Fold Target Encoding. In this case, we split the dataset into 2 equal subsets (or at least as equal as possible).

 Now, to Target Encode Blue in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:

Weighted Mean = (n × (option's mean) + m × (overall mean)) / (n + m)
Here,
- Mean of option (Subset B's Blues): 0/1 (the one Blue person in Subset B does not like Troll 2)
- n: 1 (as there is only one Blue person in Subset B)
- Overall mean: 1/3 (only 1 of the 3 people in Subset B likes Troll 2)
- m: 2

Hence, the equation will be:

Weighted Mean(Blue, Subset A) = (1 × (0/1) + 2 × (1/3)) / (1 + 2) = 0.22

 Now, to Target Encode Blue in Subset B, we ignore all the Target values in Subset B and instead plug the Target values from Subset A into the weighted mean equation:
Here,
- Mean of option (Subset A’s Blues): 1/2
- n: 2
- Overall Mean: 2/4
- m: 2

Weighted Mean(Blue, Subset B) = (2 × (1/2) + 2 × (2/4)) / (2 + 2) = 0.5

Target Encoding 3
NOTE: Different subsets end up with different values for Blue. This is alright, since Favorite Color is becoming a continuous variable, like Height.

 Now, to Target Encode Red in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:
Here,
- Mean of option (Subset B’s Reds): 0 (no record of Red in subset B)
- n: 0 (no record of Red in subset B)
- Overall mean: 1/3 (only 1 likes Troll 2 from 3)
- m: 2

Weighted Mean(Red, Subset A) = (0 × 0 + 2 × (1/3)) / (0 + 2) = 0.33

 Now, to Target Encode Green in Subset A, we ignore all the Target values in Subset A and instead plug the Target values from Subset B into the weighted mean equation:
Here,
- Mean of option (Subset B's Greens): 1/2 (only 1 of the 2 Greens likes Troll 2)
- n: 2 (the number of Greens in Subset B)
- Overall mean: 1/3 (only 1 of the 3 people in Subset B likes Troll 2)
- m: 2

Weighted Mean(Green, Subset A) = (2 × (1/2) + 2 × (1/3)) / (2 + 2) = 0.42

 Now, to Target Encode Green in Subset B, we ignore all the Target values in Subset B and instead plug the Target values from Subset A into the weighted mean equation:
Here,
- Mean of option (Subset A's Greens): 1/1 (the only Green in Subset A likes Troll 2)
- n: 1
- Overall mean: 2/4 (overall, 2 of the 4 people in Subset A like Troll 2)
- m: 2

Weighted Mean(Green, Subset B) = (1 × (1/1) + 2 × (2/4)) / (1 + 2) = 0.67

 With the encoding done, we merge the subsets back together; the dataset will look like:

 NOTE: This process reduces Data Leakage because the rows do not use their own Target values to calculate their encoding.
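The whole 2-fold procedure can be sketched in Python. The exact row-by-row contents of Subsets A and B are not listed in the notes, so the split below is an assumption inferred from the worked numbers:

```python
# Each row is (favorite_color, likes_troll_2). This split is inferred from
# the worked numbers in the notes, not given explicitly there.
subset_a = [("Blue", 1), ("Blue", 0), ("Green", 1), ("Red", 0)]
subset_b = [("Blue", 0), ("Green", 1), ("Green", 0)]

def encode(color, other_subset, m=2):
    """Encode a color for one fold using ONLY the other fold's Targets."""
    targets = [t for c, t in other_subset if c == color]
    n = len(targets)
    option_mean = sum(targets) / n if n else 0.0  # no rows -> mean term drops out
    overall = sum(t for _, t in other_subset) / len(other_subset)
    return (n * option_mean + m * overall) / (n + m)

print(round(encode("Blue", subset_b), 2))   # 0.22  (Blue in Subset A)
print(round(encode("Blue", subset_a), 2))   # 0.5   (Blue in Subset B)
print(round(encode("Red", subset_b), 2))    # 0.33  (Red in Subset A)
print(round(encode("Green", subset_b), 2))  # 0.42  (Green in Subset A)
print(round(encode("Green", subset_a), 2))  # 0.67  (Green in Subset B)
```

All five values match the worked examples above, including Red in Subset A, where n = 0 leaves only the overall mean of Subset B.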

 Now, if we set K = 7 instead of 2, we would divide the dataset into 7 subsets.

 Target Encoding the 1st subset, which consists of a single row with Favorite Color = Blue, means we ignore its Target value and use the Target values from all the other subsets to calculate the Weighted Mean.
Likewise, for encoding the other subsets, we use all of the other Target values except their own.

 NOTE: When we use all of the Target values except a row's own to Target Encode that row, it is called Leave-One-Out (LOO) Target Encoding.
Some people have success with Leave-One-Out Target Encoding, and others have success with K = 5.
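A minimal sketch of Leave-One-Out Target Encoding, reusing the reconstructed 7-row dataset and m = 2 from the earlier examples (the helper names are mine):

```python
# Each row is (favorite_color, likes_troll_2), reconstructed from the notes.
data = [
    ("Blue", 1), ("Blue", 0), ("Blue", 0),
    ("Red", 0),
    ("Green", 1), ("Green", 1), ("Green", 0),
]

def loo_encode(i, rows, m=2):
    """Encode row i using every other row's Target, but never its own."""
    others = rows[:i] + rows[i + 1:]                  # drop the row's own Target
    same = [t for c, t in others if c == rows[i][0]]  # same color, other rows
    n = len(same)
    option_mean = sum(same) / n if n else 0.0
    overall = sum(t for _, t in others) / len(others)
    return (n * option_mean + m * overall) / (n + m)

encoded = [round(loo_encode(i, data), 2) for i in range(len(data))]
```

Note that, as with K-Fold encoding, two rows with the same color can receive different values, because each row's own Target is excluded from its encoding.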

