Hinge Loss

Hinge loss is a loss function primarily used in support vector machines (SVMs) that penalizes incorrect predictions as well as correct predictions that lack confidence, with labels represented as -1 or 1. The loss is calculated as max(0, 1 - y * ŷ), where y is the actual label and ŷ is the prediction; it is zero whenever y * ŷ ≥ 1, i.e., when the prediction has the correct sign and clears the margin. Hinge loss is faster to compute than cross-entropy but may lead to degraded accuracy.

Hinge Loss

The 0/1 Loss
Consider y to be the actual label (-1 or 1) and ŷ to be the prediction.
Let's try to multiply the two together: y * ŷ
If the label is -1 and the prediction is -1:
(-1)(-1) = +1 → Positive
If we follow the graph, any positive will give us 0 loss.
If the label is +1 and the prediction is +1:
(+1)(+1) = +1 → Positive
If we follow the graph, any positive will give us 0 loss.

If the label is -1 and the prediction is +1:
(-1)(+1) = -1 → Negative
If we follow the graph, any negative will give us 1 loss.
If the label is +1 and the prediction is -1:
(+1)(-1) = -1 → Negative
If we follow the graph, any negative will give us 1 loss.
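As a minimal sketch of this 0/1 rule in Python (the function name zero_one_loss is illustrative, not from the slides), only the sign of the product y * ŷ matters:

def zero_one_loss(y, y_hat):
    # Positive product: signs match -> no loss. Negative product: signs differ -> loss of 1.
    return 0.0 if y * y_hat > 0 else 1.0

print(zero_one_loss(-1, -1))  # 0.0
print(zero_one_loss(+1, +1))  # 0.0
print(zero_one_loss(-1, +1))  # 1.0
print(zero_one_loss(+1, -1))  # 1.0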
Rather than penalizing every mistake with a flat 1, we make the penalization linear, proportional to the error.
What if we include a margin of 1? We can introduce confidence into the model: we keep optimizing until the prediction clears the margin, rather than accepting any positive product without penalty.

Margin
When signs match → (-)(-) = (+)(+) = + → correct classification and no loss
When signs don't match → (-)(+) = (+)(-) = - → wrong classification and loss
Consider the plot of 1 - x, where x = y * ŷ: the hinge clips this line at 0, so the loss shrinks linearly until x reaches 1 and stays at 0 beyond it.
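In place of the missing plot, this small Python sketch (the x values are chosen for illustration) tabulates max(0, 1 - x) for a few values of x = y * ŷ:

# The hinge is the line 1 - x clipped at 0.
for x in [-1.0, 0.0, 0.5, 1.0, 1.5]:
    print(x, max(0.0, 1.0 - x))
# x < 1  -> positive loss that grows linearly as x decreases
# x >= 1 -> loss is 0: the prediction has the correct sign and is past the margin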
Hinge Loss
• A margin-based loss, usually used for SVMs
• Used when labels are in {-1, 1}
• It penalizes not only wrong predictions, but also correct predictions which are not confident enough.
• Faster than cross-entropy, but accuracy may be degraded
For all samples: Σ max(0, 1 - y * ŷ)

Where y is the actual label (-1 or 1) and ŷ is the prediction.


The loss is 0 when y * ŷ ≥ 1, i.e., when the prediction has the correct sign and clears the margin of 1.
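A minimal NumPy sketch of the summed loss, assuming y holds labels in {-1, 1} and y_hat holds real-valued predictions (the function and variable names are illustrative):

import numpy as np

def hinge_loss(y, y_hat):
    # Element-wise max(0, 1 - y * ŷ), summed over all samples.
    return np.maximum(0.0, 1.0 - y * y_hat).sum()

y = np.array([-1.0, -1.0, 1.0, 1.0])
y_hat = np.array([-0.8, 0.3, 1.1, -1.0])
print(hinge_loss(y, y_hat))  # 0.2 + 1.3 + 0 + 2 = 3.5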

Consider the prediction when the actual label is -1:
max[0, 1 - (-1 * 0.3)] = max[0, 1.3] = 1.3 → Loss is high
max[0, 1 - (-1 * -0.8)] = max[0, 0.2] = 0.2 → Loss is low
max[0, 1 - (-1 * -1.1)] = max[0, -0.1] = 0 → No loss!
max[0, 1 - (-1 * -1)] = max[0, 0] = 0 → No loss!
max[0, 1 - (-1 * 1)] = max[0, 2] = 2 → Loss is very high
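These five cases can be reproduced with one line of Python per prediction (the prediction values are the ones from the slide):

y = -1
for y_hat in [0.3, -0.8, -1.1, -1.0, 1.0]:
    print(y_hat, max(0.0, 1.0 - y * y_hat))  # 1.3, ~0.2, 0.0, 0.0, 2.0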
For all samples: Σ max(0, 1 - y * ŷ)

Where y is the actual label (-1 or 1) and ŷ is the prediction.


The loss is 0 when y * ŷ ≥ 1, i.e., when the prediction has the correct sign and clears the margin of 1.

Consider the prediction when the actual label is +1:
max[0, 1 - (1 * -0.3)] = max[0, 1.3] = 1.3 → Loss is high
max[0, 1 - (1 * 0.8)] = max[0, 0.2] = 0.2 → Loss is low
max[0, 1 - (1 * 1.1)] = max[0, -0.1] = 0 → No loss!
max[0, 1 - (1 * 1)] = max[0, 0] = 0 → No loss!
max[0, 1 - (1 * -1)] = max[0, 2] = 2 → Loss is very high
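The same check for the +1 label, mirroring the sketch above (prediction values taken from the slide):

y = +1
for y_hat in [-0.3, 0.8, 1.1, 1.0, -1.0]:
    print(y_hat, max(0.0, 1.0 - y * y_hat))  # 1.3, ~0.2, 0.0, 0.0, 2.0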
