Regularization in Machine Learning

What is Regularization?
Regularization is one of the most important concepts of machine learning.
It is a technique to prevent the model from overfitting by adding extra
information to it.

Sometimes a machine learning model performs well on the training
data but does not perform well on the test data. This means the model
cannot predict the output correctly when it deals with unseen data,
because it has also learned the noise in the training data; such a model is
called overfitted. This problem can be dealt with using a regularization
technique.

This technique can be applied so that all variables or features are kept in
the model while the magnitude of their coefficients is reduced. Hence, it
maintains accuracy as well as the generalization ability of the model.

It mainly regularizes or shrinks the coefficients of the features toward zero. In
simple words, "in the regularization technique, we reduce the magnitude of
the features while keeping the same number of features."

Regularization refers to techniques used to calibrate machine learning
models to minimize the adjusted loss function and avoid overfitting or
underfitting.
Working of Regularization
Regularization works by adding a penalty or complexity term to the
complex model. Let's consider the simple linear regression equation:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, Y represents the value to be predicted, and
X1, X2, …, Xn are the features for Y.

β1, β2, …, βn are the weights or magnitudes attached to the respective
features, β0 represents the bias of the model, and b represents the
intercept.

Linear regression models try to optimize the weights β0, …, βn and the
intercept b to minimize the cost function. The equation for the cost
function of the linear model is given below:

Cost function = RSS = ∑(yi − ŷi)^2

Here ŷi is the value the model predicts for the i-th sample. We optimize
the parameters so that the model can predict the value of Y accurately.
The loss function for linear regression is called RSS, or the residual
sum of squares.
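As a quick illustration, the sketch below (Python with NumPy; the data, weights, and intercept are made up for the example, not taken from the text) computes the RSS loss of a linear model:

```python
import numpy as np

# Illustrative data: 5 samples, 2 features (not from the text)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

beta = np.array([1.0, 1.0])  # weights attached to the features
b = 0.1                      # bias / intercept term

# Predictions of the linear model: y_hat = X @ beta + b
y_hat = X @ beta + b

# Residual sum of squares (RSS): the sum of squared prediction errors
rss = np.sum((y - y_hat) ** 2)
print("RSS:", rss)
```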

Techniques of Regularization
There are mainly two types of regularization techniques, which are given
below:

o Ridge Regression
o Lasso Regression

Ridge Regression
o Ridge regression is one of the types of linear regression in which a small
amount of bias is introduced so that we can get better long-term
predictions.
o Ridge regression is a regularization technique, which is used to reduce the
complexity of the model. It is also called L2 regularization.
o In this technique, the cost function is altered by adding a penalty term
to it. The amount of bias added to the model is called the ridge regression
penalty. We can calculate it by multiplying lambda by the
squared weight of each individual feature.
o The equation for the cost function in ridge regression will be:

Cost function = ∑(yi − ŷi)^2 + λ x ∑‖w‖^2
o In the above equation, the penalty term regularizes the coefficients of the
model, and hence ridge regression reduces the amplitudes of the
coefficients, which decreases the complexity of the model.
o As we can see from the above equation, if the value of λ tends to zero,
the equation becomes the cost function of the linear regression
model. Hence, for a very small value of λ, the model will resemble the
linear regression model (see the sketch after this list).
o A general linear or polynomial regression will fail if there is high
collinearity between the independent variables, so to solve such problems,
Ridge regression can be used.
o It also helps to solve problems where we have more parameters than samples.
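The behaviour of λ described in the list above can be checked with a small sketch; this assumes scikit-learn is available, alpha is scikit-learn's name for λ, and the data is synthetic, purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data for illustration only
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge_small = Ridge(alpha=1e-8).fit(X, y)   # lambda close to zero
ridge_large = Ridge(alpha=100.0).fit(X, y)  # strong penalty

print("OLS coefficients:    ", ols.coef_)
print("Ridge (alpha ~ 0):   ", ridge_small.coef_)  # nearly identical to OLS
print("Ridge (alpha = 100): ", ridge_large.coef_)  # shrunk toward zero
```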

To restate, ridge regression (L2 regularization) modifies over-fitted or
under-fitted models by adding a penalty equivalent to the sum of the
squares of the magnitudes of the coefficients.

This means that the mathematical function representing our machine
learning model is minimized and the coefficients are calculated. The
magnitudes of the coefficients are squared and added. Ridge regression
performs regularization by shrinking these coefficients. The function
depicted below is the cost function of ridge regression:
Figure 7: Cost function of ridge regression
o In the cost function, the penalty term is represented by lambda (λ).
By changing the value of λ, we control the strength of the penalty.
The higher the penalty, the more the magnitudes of the coefficients
are reduced: it shrinks the parameters. Therefore, ridge regression is
used to prevent multicollinearity, and it reduces model complexity by
coefficient shrinkage.
o Consider the graph illustrated below, which represents linear regression:

Figure 8: Linear regression model

Cost function = Loss + λ x ∑‖w‖^2

For the linear regression line, let's consider two points that are on the line:
Loss = 0 (considering the two points on the line)
λ = 1
w = 1.4
Then, Cost function = 0 + 1 x 1.4^2
= 1.96
For the ridge regression line, let's assume:
Loss = 0.3^2 + 0.2^2 = 0.13
λ = 1
w = 0.7
Then, Cost function = 0.13 + 1 x 0.7^2
= 0.62
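The arithmetic above can be reproduced in a couple of lines; the loss, λ, and w values are the ones assumed in the worked example:

```python
# Ridge cost = loss + lambda * (squared weight), as in the worked example
def ridge_cost(loss, lam, w):
    return loss + lam * w ** 2

print(round(ridge_cost(loss=0.0, lam=1.0, w=1.4), 2))   # 1.96 (linear regression line)
print(round(ridge_cost(loss=0.13, lam=1.0, w=0.7), 2))  # 0.62 (ridge regression line)
```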

Figure 9: Ridge regression model
o Comparing the two models, with all data points, we can see that the
ridge regression line fits the data more accurately than the linear
regression line.

Figure 10: Optimization of model fit using ridge regression
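In practice, ridge regression is available as sklearn.linear_model.Ridge. The sketch below is a minimal example on synthetic, noisy data with many features (all values are illustrative); with this kind of setup the penalized model typically generalizes better to held-out data than plain linear regression:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic, noisy data with many features (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 25))
true_w = np.zeros(25)
true_w[:3] = [1.5, -2.0, 1.0]            # only a few features really matter
y = X @ true_w + rng.normal(scale=2.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

# The penalized model usually scores better on the held-out data here
print("OLS   train/test R^2:", ols.score(X_train, y_train), ols.score(X_test, y_test))
print("Ridge train/test R^2:", ridge.score(X_train, y_train), ridge.score(X_test, y_test))
```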
Lasso Regression
o Lasso regression is another regularization technique to reduce the
complexity of the model. It stands for Least Absolute Shrinkage and
Selection Operator.
o It is similar to the Ridge Regression except that the penalty term contains
only the absolute weights instead of a square of weights.
o Since it takes absolute values, hence, it can shrink the slope to 0, whereas
Ridge Regression can only shrink it near to 0.
o It is also called L1 regularization. The equation for the cost function
of lasso regression will be:

Cost function = ∑(yi − ŷi)^2 + λ x ∑‖w‖

o In this technique, some of the features are completely neglected for model
evaluation, because their coefficients shrink to exactly zero.
o Hence, lasso regression can help us reduce overfitting in the model as
well as perform feature selection.

It modifies over-fitted or under-fitted models by adding a penalty
equivalent to the sum of the absolute values of the coefficients.

Lasso regression also performs coefficient minimization, but instead of
squaring the magnitudes of the coefficients, it takes their absolute
values. This means that some coefficients can shrink all the way to
exactly 0, removing the corresponding features. Consider the cost
function for lasso regression:
Figure 11: Cost function for lasso regression

We can control the coefficient values by controlling the penalty term,
just like we did in ridge regression. Again, consider a linear regression
model:

Figure 12: Linear regression model


Cost function = Loss + λ x ∑‖w‖
For the linear regression line, let's assume:
Loss = 0 (considering the two points on the line)
λ = 1
w = 1.4
Then, Cost function = 0 + 1 x 1.4
= 1.4
For the lasso regression line, let's assume:
Loss = 0.3^2 + 0.1^2 = 0.1
λ = 1
w = 0.7
Then, Cost function = 0.1 + 1 x 0.7
= 0.8
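As with ridge, the numbers above can be checked directly; the loss, λ, and w values are the ones assumed in the worked example:

```python
# Lasso cost = loss + lambda * |w|, as in the worked example
def lasso_cost(loss, lam, w):
    return loss + lam * abs(w)

print(round(lasso_cost(loss=0.0, lam=1.0, w=1.4), 2))  # 1.4 (linear regression line)
print(round(lasso_cost(loss=0.1, lam=1.0, w=0.7), 2))  # 0.8 (lasso regression line)
```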

Comparing the two models, with all data points, we can see that the
lasso regression line fits the data more accurately than the linear
regression line.

Key Difference between Ridge Regression and Lasso Regression
o Ridge regression is mostly used to reduce the overfitting in the model,
and it includes all the features present in the model. It reduces the
complexity of the model by shrinking the coefficients.
o Lasso regression helps to reduce overfitting in the model and also
performs feature selection, as illustrated in the sketch below.
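The difference is easy to see with scikit-learn's Ridge and Lasso estimators. The sketch below uses synthetic data where only two features actually matter (the data and alpha values are illustrative): ridge keeps every coefficient non-zero, while lasso typically drives the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, but only the first two are informative
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("Ridge non-zero coefficients:", int(np.sum(ridge.coef_ != 0)))  # all 10 kept, just shrunk
print("Lasso non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))  # most irrelevant ones dropped to 0
```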
