Hinge Loss
The 0/1 Loss
Consider y to be the actual label (-1 or 1) and ŷ to be the prediction.
Let's try multiplying the two together: y · ŷ
• If the label is -1 and the prediction is -1: (-1)(-1) = +1, positive. Following the graph, any positive product gives 0 loss.
• If the label is +1 and the prediction is +1: (+1)(+1) = +1, positive. Following the graph, any positive product gives 0 loss.
• If the label is -1 and the prediction is +1: (-1)(+1) = -1, negative. Following the graph, any negative product gives a loss of 1.
• If the label is +1 and the prediction is -1: (+1)(-1) = -1, negative. Following the graph, any negative product gives a loss of 1.
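A minimal Python sketch of this 0/1 behavior (the function name zero_one_loss is just for this illustration, not from the slides):

```python
# 0/1 loss from the sign of the product y * y_hat.
# Labels and predictions are in {-1, +1}.
def zero_one_loss(y, y_hat):
    return 0 if y * y_hat > 0 else 1

# The four cases above:
for y, y_hat in [(-1, -1), (+1, +1), (-1, +1), (+1, -1)]:
    print(f"y={y:+d}, y_hat={y_hat:+d} -> loss={zero_one_loss(y, y_hat)}")
# Matching signs give 0 loss; mismatched signs give 1.
```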
Rather than penalizing every mistake with a constant 1, we can make the penalty linear, proportional to the size of the error.
What if we include a margin of 1? We can introduce confidence into the model: we keep penalizing until the prediction clears the margin, rather than giving zero loss to every positive product.
Margin
• When the signs match: (-)(-) = (+)(+) = +. Correct classification, no loss.
• When the signs don't match: (-)(+) = (+)(-) = -. Wrong classification, loss.
Consider the plot of 1 − x:
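As a rough sketch of that plot (assuming matplotlib is available; the x-axis stands for the product y · ŷ):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-2, 2, 200)        # x stands for the product y * y_hat
loss = np.maximum(0, 1 - x)        # 1 - x, clipped at 0: the hinge shape
plt.plot(x, loss)
plt.axvline(1, linestyle="--")     # the margin: loss only reaches 0 at x = 1
plt.xlabel("y * y_hat")
plt.ylabel("loss")
plt.title("max(0, 1 - x)")
plt.show()
```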
Hinge Loss
• A margin-based loss, usually used for SVMs.
• Used when labels are in {-1, +1}.
• It penalizes not only wrong predictions, but also correct predictions that are not confident enough.
• Faster to compute than cross-entropy, but accuracy is typically degraded.
For all samples: max(0, 1 − y · ŷ)
where y is the actual label (-1 or 1) and ŷ is the prediction.
The loss is 0 only when the prediction matches the label's sign by a margin of at least 1, i.e. when y · ŷ ≥ 1.
Consider the prediction when the actual label is -1:
max[0, 1-(-1*0.3)] = max[0, 1.3] = 1.3 Loss is high
max[0, 1-(-1*-0.8)] = max[0, 0.2] = 0.2 Loss is low
max[0, 1-(-1*-1.1)] = max[0, -0.1] = 0 No Loss!
max[0, 1-(-1*-1)] = max[0, 0] = 0 No Loss!
max[0, 1-(-1*1)] = max[0, 2] = 2 Loss is very high
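These numbers are easy to verify; a small sketch reproducing the table above:

```python
def hinge(y, y_hat):
    return max(0, 1 - y * y_hat)

y = -1
for y_hat in [0.3, -0.8, -1.1, -1.0, 1.0]:
    print(f"y_hat = {y_hat:+.1f} -> loss = {hinge(y, y_hat):.1f}")
# Prints 1.3, 0.2, 0.0, 0.0, 2.0 -- matching the worked examples.
```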
max(0, 1 − y · ŷ)
Consider the prediction when the actual label is +1:
max[0, 1-(1*-0.3)] = max[0, 1.3] = 1.3 Loss is high
max[0, 1-(1*0.8)] = max[0, 0.2] = 0.2 Loss is low
max[0, 1-(1*1.1)] = max[0, -0.1] = 0 No Loss!
max[0, 1-(1*1)] = max[0, 0] = 0 No Loss!
max[0, 1-(1*-1)] = max[0, 2] = 2 Loss is very high
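Putting it together, a vectorized sketch over a whole batch (averaging over samples is an assumption here; some formulations sum instead):

```python
import numpy as np

def hinge_loss(y, y_hat):
    """Mean hinge loss over a batch; labels y in {-1, +1}."""
    return np.mean(np.maximum(0, 1 - y * y_hat))

# All ten worked examples from above:
y     = np.array([-1, -1, -1, -1, -1, +1, +1, +1, +1, +1])
y_hat = np.array([0.3, -0.8, -1.1, -1.0, 1.0, -0.3, 0.8, 1.1, 1.0, -1.0])
print(hinge_loss(y, y_hat))  # 0.7: the average of the per-sample losses
```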