Loss Functions in Deep Learning
Loss functions in deep learning measure how well a neural network performs
a given task. They quantify performance by calculating the difference
between predicted values and actual values. This difference, or "loss,"
guides the optimization process that improves the model's accuracy.
We are by now familiar with the mathematical operations happening inside a
neural network. Essentially, there are just two:
● Forward Propagation
● Backpropagation with Gradient Descent
While forward propagation refers to the computational process of predicting
an output for a given input vector x, backpropagation and gradient descent
describe the process of adjusting the weights and biases of the network in
order to make better predictions.
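The two operations above can be sketched in a few lines. The toy example below (a single weight, a single data point, and a squared-error loss; all numbers are illustrative, not from the text) shows one forward pass and one gradient-descent update repeated in a loop:

```python
# Minimal sketch: fit y = w * x to one data point by gradient descent
# on the squared error. The forward pass predicts; the backward pass
# computes the gradient and updates the weight.
x, y = 2.0, 8.0   # single training example; the ideal weight is 4.0
w = 0.0           # initial weight
lr = 0.1          # learning rate

for _ in range(50):
    y_hat = w * x                # forward propagation: predict output
    grad = 2 * (y_hat - y) * x   # d(loss)/dw for loss = (y_hat - y)**2
    w -= lr * grad               # gradient descent update

print(round(w, 3))  # converges toward 4.0
```

Real networks do exactly this, just with millions of weights and the chain rule (backpropagation) to compute each gradient.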
Categories of Loss Functions
The loss function estimates how well a particular algorithm models the
provided data. Loss functions fall into two classes based on the type of
learning task:
● Regression Models: predict continuous values.
● Classification Models: predict the output from a set of finite
categorical values.
Regression Loss Functions in Machine Learning
Regression tasks involve predicting continuous values, such as house prices
or temperatures. Here are some commonly used loss functions for regression:
1. Mean Squared Error (MSE)
MSE is the mean of the squared residuals over all datapoints in the dataset.
A residual is the difference between the actual value and the value predicted
by the model. In machine learning, squaring the residuals is crucial to handle
both positive and negative errors effectively. Since normal errors can be
either positive or negative, summing them up might result in a net error of
zero, misleading the model to believe it is performing well, even when it is
not. To avoid this, we square the residuals, converting all values to positive,
which gives a true representation of the model’s performance.
Squaring also has the added benefit of assigning more weight to larger
errors, meaning that when the cost function is far from its minimal value, the
model is penalized more heavily for larger mistakes, helping it converge to
the minimal value faster.
The Mean Squared Error (MSE) is a common loss function in machine
learning where the mean of the squared residuals is taken rather than just
the sum. This ensures that the loss function is independent of the number of
data points in the training set, making the metric more reliable across
datasets of varying sizes. However, MSE is sensitive to outliers, because
squaring magnifies the effect of large residuals.
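To make the definition concrete, here is a minimal NumPy implementation of MSE (the function name `mse` and the sample numbers are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: mean of the squared residuals."""
    residuals = np.asarray(y_true) - np.asarray(y_pred)
    return np.mean(residuals ** 2)

# Example: three predictions with residuals -10, 10, and -20
y_true = [300.0, 250.0, 400.0]
y_pred = [310.0, 240.0, 420.0]
print(mse(y_true, y_pred))  # (100 + 100 + 400) / 3 = 200.0
```

Note that the two smaller errors (10 each) contribute 100 each, while the error of 20 contributes 400: squaring weights the larger mistake four times as heavily.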
2. Mean Absolute Error (MAE) / L1 Loss
The Mean Absolute Error (MAE) is a commonly used loss function in
machine learning that calculates the mean of the absolute values of the
residuals for all datapoints in the dataset.
● The absolute value of the residuals is taken to convert any negative
differences into positive values, ensuring that all errors are treated
equally.
● Taking the mean makes the loss function independent of the
number of datapoints in the training set, allowing it to provide a
consistent measure of error across datasets of different sizes.
One key advantage of MAE is that it is robust to outliers, meaning that
extreme values do not disproportionately affect the overall error calculation.
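A minimal NumPy sketch of MAE, alongside MSE for comparison, shows how a single outlier inflates MSE far more than MAE (the data values are made up for illustration):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean Absolute Error: mean of the absolute residuals."""
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mse(y_true, y_pred):
    """Mean Squared Error, for comparison."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

# Three perfect predictions plus one extreme outlier (residual = 97)
y_true = [1.0, 2.0, 3.0, 100.0]
y_pred = [1.0, 2.0, 3.0, 3.0]

print(mae(y_true, y_pred))  # 24.25   -- grows linearly with the outlier
print(mse(y_true, y_pred))  # 2352.25 -- dominated by the squared outlier
```

The outlier contributes 97 to the MAE numerator but 97² = 9409 to the MSE numerator, which is why MAE is considered the more robust choice on noisy data.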
Classification Loss Functions in Machine Learning
1. Cross-Entropy Loss
In classification tasks, we are dealing with predictions of probabilities,
meaning the output of a neural network must lie in the range between 0 and 1.
A loss function that measures the error between a predicted probability and
the label representing the actual class is called the log loss / cross-entropy
loss function. It measures how well the predicted probabilities match the
actual labels.
The cross-entropy loss increases as the predicted probability diverges from
the true label. In simpler terms, the farther the model's prediction is from the
actual class, the higher the loss. This makes cross-entropy loss an essential
tool for improving the accuracy of classification models by minimizing the
difference between the predicted and actual labels.
You should always use cross-entropy loss when probabilities are involved,
i.e. when you are doing some kind of classification.
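The behaviour described above can be seen in a small NumPy sketch of binary cross-entropy (the function name and the probability values are illustrative; the clipping constant guards against `log(0)`):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy (log loss) for predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# True class is 1: a confident correct prediction gives a tiny loss,
# a confident wrong prediction gives a large loss.
print(binary_cross_entropy([1], [0.99]))  # ~0.01
print(binary_cross_entropy([1], [0.01]))  # ~4.61
```

The loss grows rapidly as the predicted probability diverges from the true label, which is exactly the property that makes minimizing it push the model toward confident, correct predictions.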
Choosing the Right Loss Function
The choice of a loss function in machine learning is influenced by several
key factors:
1. Nature of the Task: Determine whether you are dealing with
regression or classification problems.
2. Presence of Outliers: Consider how outliers in your dataset may
impact your decision; some loss functions (e.g., Mean Absolute
Error (MAE) and Huber loss) are more robust to outliers than
others.
3. Model Complexity: Simpler models may benefit from more
straightforward loss functions, such as Mean Squared Error (MSE)
or Cross-Entropy.
4. Interpretability: Some loss functions provide more intuitive
explanations than others, making them easier to understand in
practice.
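Point 2 above mentions Huber loss as an outlier-robust option. As a brief sketch (the `huber` helper below is our own illustration, not taken from any particular library), Huber loss is quadratic for residuals up to a threshold delta and linear beyond it, combining MSE-like behaviour near zero with MAE-like robustness to outliers:

```python
import numpy as np

def huber(y_true, y_pred, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    r = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    quadratic = 0.5 * r ** 2               # used when |residual| <= delta
    linear = delta * (r - 0.5 * delta)     # used when |residual| >  delta
    return np.mean(np.where(r <= delta, quadratic, linear))

# Small residual (0.5): behaves like half the squared error
print(huber([0.5], [0.0]))   # 0.125
# Large residual (10): grows linearly, not quadratically
print(huber([10.0], [0.0]))  # 9.5, versus 50.0 for 0.5 * MSE
```

This makes Huber loss a reasonable middle ground when a dataset contains some outliers but you still want smooth, MSE-like gradients near the optimum.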