Understanding Loss Functions
A Step-by-Step Guide with Examples
Sridhar
August 6, 2025
Introduction
Loss functions measure how well a machine learning model’s predictions match the actual values. This guide
explains four important loss functions with detailed examples.
1 Mean Error (ME)
1.1 Formula
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$$
1.2 Properties
• Measures the average signed difference between actual and predicted values
• Can be positive or negative; opposite-sign errors cancel, so ME can be near zero even when individual predictions are poor
• Simple to compute but sensitive to outliers
1.3 Example Problem
Given:
Actual values y = [3, 5, 2.5, 7]
Predicted values ŷ = [2.5, 5.1, 2, 7.8]
Calculate the Mean Error.
Solution:
$$\text{Errors} = [3 - 2.5,\; 5 - 5.1,\; 2.5 - 2,\; 7 - 7.8] = [0.5,\; -0.1,\; 0.5,\; -0.8]$$
$$\mathrm{ME} = \frac{0.5 + (-0.1) + 0.5 + (-0.8)}{4} = \frac{0.1}{4} = 0.025$$
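The calculation above is easy to verify in plain Python (the helper name `mean_error` is my own; no libraries are needed for four values):

```python
# Mean Error: average signed difference between actual and predicted values.
y = [3, 5, 2.5, 7]          # actual values from the example
y_hat = [2.5, 5.1, 2, 7.8]  # predicted values

def mean_error(y_true, y_pred):
    # Signed errors can cancel each other out, which is why ME
    # alone can understate how far off the predictions are.
    return sum(a - p for a, p in zip(y_true, y_pred)) / len(y_true)

print(round(mean_error(y, y_hat), 4))  # 0.025
```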
2 Mean Squared Error (MSE)
2.1 Formula
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
2.2 Properties
• Always non-negative
• Penalizes larger errors more severely
• Sensitive to outliers
2.3 Example Problem
Using the same values:
y = [3, 5, 2.5, 7]
ŷ = [2.5, 5.1, 2, 7.8]
Calculate the MSE.
Solution:
$$\text{Squared errors} = [0.5^2,\; (-0.1)^2,\; 0.5^2,\; (-0.8)^2] = [0.25,\; 0.01,\; 0.25,\; 0.64]$$
$$\mathrm{MSE} = \frac{0.25 + 0.01 + 0.25 + 0.64}{4} = \frac{1.15}{4} = 0.2875$$
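The same check in plain Python (again, `mean_squared_error` here is a hand-rolled helper, not an import from a library):

```python
y = [3, 5, 2.5, 7]          # actual values from the example
y_hat = [2.5, 5.1, 2, 7.8]  # predicted values

def mean_squared_error(y_true, y_pred):
    # Squaring makes every term non-negative and penalizes
    # large errors quadratically, so outliers dominate the sum.
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)

print(round(mean_squared_error(y, y_hat), 4))  # 0.2875
```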
3 Mean Absolute Error (MAE)
3.1 Formula
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$$
3.2 Properties
• Always non-negative
• Less sensitive to outliers than MSE
• Linear penalty for errors
3.3 Example Problem
Using the same values:
y = [3, 5, 2.5, 7]
ŷ = [2.5, 5.1, 2, 7.8]
Calculate the MAE.
Solution:
$$\text{Absolute errors} = [|0.5|,\; |{-0.1}|,\; |0.5|,\; |{-0.8}|] = [0.5,\; 0.1,\; 0.5,\; 0.8]$$
$$\mathrm{MAE} = \frac{0.5 + 0.1 + 0.5 + 0.8}{4} = \frac{1.9}{4} = 0.475$$
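And the MAE check in the same style (the helper name is again my own):

```python
y = [3, 5, 2.5, 7]          # actual values from the example
y_hat = [2.5, 5.1, 2, 7.8]  # predicted values

def mean_absolute_error(y_true, y_pred):
    # Absolute value gives every error a linear penalty,
    # so a single outlier shifts the result less than with MSE.
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

print(round(mean_absolute_error(y, y_hat), 4))  # 0.475
```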
4 Huber Loss
4.1 Formula
$$L_\delta(y, \hat{y}) =
\begin{cases}
\frac{1}{2}(y - \hat{y})^2 & \text{for } |y - \hat{y}| \le \delta \\
\delta\,|y - \hat{y}| - \frac{1}{2}\delta^2 & \text{otherwise}
\end{cases}$$
4.2 Properties
• Combines MSE and MAE properties
• Less sensitive to outliers than MSE
• Requires choosing a delta parameter
4.3 Example Problem
Using δ = 1 and:
y = [3, 5, 2.5, 7]
ŷ = [2.5, 5.1, 2, 7.8]
Calculate the Huber Loss.
Solution:
$$\text{Errors} = [0.5,\; -0.1,\; 0.5,\; -0.8]$$
$$\text{Absolute errors} = [0.5,\; 0.1,\; 0.5,\; 0.8]$$
For δ = 1, every absolute error satisfies |y − ŷ| ≤ δ, so the quadratic branch applies at all four points:
$$L(3, 2.5) = \tfrac{1}{2}(0.5)^2 = 0.125 \quad (\text{since } 0.5 \le 1)$$
$$L(5, 5.1) = \tfrac{1}{2}(-0.1)^2 = 0.005 \quad (\text{since } 0.1 \le 1)$$
$$L(2.5, 2) = \tfrac{1}{2}(0.5)^2 = 0.125 \quad (\text{since } 0.5 \le 1)$$
$$L(7, 7.8) = \tfrac{1}{2}(-0.8)^2 = 0.32 \quad (\text{since } 0.8 \le 1)$$
$$\text{Total loss} = 0.125 + 0.005 + 0.125 + 0.32 = 0.575$$
$$\text{Average loss} = \frac{0.575}{4} = 0.14375$$
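A sketch of the piecewise rule in plain Python (the helper name `huber_loss` is my own). Note that with δ = 1 all four example errors fall in the quadratic branch; the linear branch only activates for errors larger than δ:

```python
y = [3, 5, 2.5, 7]          # actual values from the example
y_hat = [2.5, 5.1, 2, 7.8]  # predicted values

def huber_loss(y_true, y_pred, delta=1.0):
    total = 0.0
    for a, p in zip(y_true, y_pred):
        e = abs(a - p)
        if e <= delta:
            # Quadratic branch: behaves like MSE for small errors.
            total += 0.5 * e ** 2
        else:
            # Linear branch: behaves like MAE for large errors,
            # shifted by delta^2/2 so the two pieces join smoothly.
            total += delta * e - 0.5 * delta ** 2
    return total / len(y_true)

print(round(huber_loss(y, y_hat, delta=1.0), 5))  # 0.14375
```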
Summary Table
Loss Function              Formula                    Properties               Example Result
Mean Error (ME)            (1/n) Σ (y_i − ŷ_i)        Can be negative, simple  0.025
Mean Squared Error (MSE)   (1/n) Σ (y_i − ŷ_i)²       Sensitive to outliers    0.2875
Mean Absolute Error (MAE)  (1/n) Σ |y_i − ŷ_i|        Robust to outliers       0.475
Huber Loss                 Combination of MSE/MAE     Adjustable robustness    0.14375 (δ = 1)
Conclusion
Key points about loss functions:
• ME shows direction of error but is rarely used alone
• MSE emphasizes large errors (sensitive to outliers)
• MAE treats all errors equally (more robust)
• Huber Loss provides a balance between MSE and MAE
The choice of loss function depends on your specific problem and tolerance for outliers.