Linear Regression
Linear regression is a supervised machine learning algorithm.
It tries to find the best linear relationship that describes the data you have:
it assumes that a linear relationship exists between a dependent variable and
one or more independent variables.
The value of the dependent variable of a linear regression model is continuous,
i.e. a real number.
We want to find the best line (a linear function y = f(X)) to
explain the data.
[Figure: scatter plot of the data with a fitted line; y on the vertical axis, X on the horizontal axis.]
Simple Linear Regression
Simple Linear Regression Equation
The equation describes how individual y values relate to x.
The predicted value of y is given by: ŷ = b0 + b1X, where:
y is the dependent variable.
ŷ is the predicted value of y.
X is the independent variable.
b0 and b1 are the regression coefficients:
b0 is the intercept (or bias) that fixes the offset of the line.
b1 is the slope (or weight) that specifies the factor by which X
has an impact on y.
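The prediction equation above can be sketched directly in code. The coefficient values b0 and b1 below are made-up assumptions for illustration, not fitted values:

```python
# Minimal sketch of the simple linear regression prediction y_hat = b0 + b1 * X.
# The coefficient values here are illustrative assumptions.
b0 = 2.0   # intercept: offset of the line
b1 = 0.5   # slope: change in the prediction per unit change in X

def predict(x):
    """Predicted value of y for a given x."""
    return b0 + b1 * x

print(predict(10))  # 2.0 + 0.5 * 10 = 7.0
```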
Error for the Simple Linear Regression Model
Y = β0 + β1X + ε is called the regression model.
ε reflects how individual y values deviate from others with the same
value of x.
For example, at X = 82 the fitted value is Ŷ₈₂ = b0 + b1(82), and the
residual is e₈₂ = Y₈₂ − Ŷ₈₂.
Estimated Simple Linear Regression Equation
Recall: the estimated simple linear regression equation is:
ŷ = b0 + b1X
b0 is the estimate for β0.
b1 is the estimate for β1.
ŷ is the estimated (predicted) value of Y for a given x value.
Least Squares method
• Least Squares Criterion: choose the “best” β0 and β1 to minimize
• S = Σᵢ (Yᵢ − (β0 + β1Xᵢ))²
• Use calculus: take the derivative with respect to β0 and with respect to
β1, set the two resulting equations equal to zero, and solve for β0
and β1.
• Of all possible lines, pick the one that minimizes the sum of the
squared vertical distances of the points from that line.
Least Squares Solution
Slope: b1 = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)²
Intercept: b0 = Ȳ − b1X̄
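The closed-form least-squares solution can be computed directly from the two formulas above. The sample data below is made up for illustration (roughly y = 2x plus noise):

```python
# Least-squares solution for simple linear regression:
# b1 = sum((Xi - X_bar)(Yi - Y_bar)) / sum((Xi - X_bar)^2),  b0 = Y_bar - b1 * X_bar.
# The sample data is an illustrative assumption.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 4.0, 6.2, 7.9, 10.1]

x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)

# Slope: covariance-like numerator over the sum of squared x-deviations.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) \
     / sum((x - x_bar) ** 2 for x in X)
# Intercept: forces the fitted line through the point (x_bar, y_bar).
b0 = y_bar - b1 * x_bar

print(b1, b0)  # slope close to 2, intercept close to 0
```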
Estimating the Variance σ²
• An estimate of σ²: the mean square error (MSE) provides the estimate
of σ², and the notation s² is also used:
s² = MSE = SSE/(n − 2)
where SSE is the Sum of Squared Errors.
If points are close to the regression line, SSE will be small.
If points are far from the regression line, SSE will be large.
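The estimate s² = MSE = SSE/(n − 2) can be sketched as follows; the data and coefficients below are illustrative assumptions (the coefficients are the least-squares fit for this toy sample):

```python
# Sketch of s^2 = MSE = SSE / (n - 2) for a fitted line.
# Data and coefficients are illustrative assumptions.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 4.0, 6.2, 7.9, 10.1]
b0, b1 = 0.09, 1.99  # least-squares fit for this sample

# Residual for each point: observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(X, Y)]
sse = sum(e ** 2 for e in residuals)   # Sum of Squared Errors
mse = sse / (len(X) - 2)               # n - 2 degrees of freedom
print(sse, mse)
```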
Bias, variance tradeoff
Bias
Regression predictions should be unbiased. That is, the
"average of predictions" should ≈ the "average of observations".
Bias measures how far the mean of the predictions is from the mean of the actual
values:
Bias = mean of predictions − mean of actual values (ground-truth labels)
A high-bias model cannot fit the data (it is usually too simplistic).
Increase the complexity of the model to minimize bias.
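The bias definition above is a direct difference of means. The toy numbers below are illustrative assumptions:

```python
# Sketch of the bias definition: gap between the mean prediction and
# the mean of the ground-truth labels. Toy numbers, for illustration.
predictions = [3.0, 5.0, 7.0, 9.0]
actuals     = [4.0, 6.0, 8.0, 10.0]

bias = sum(predictions) / len(predictions) - sum(actuals) / len(actuals)
print(bias)  # -1.0: this model under-predicts by 1 on average
```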
Variance
Variance indicates how much the estimate of the target function would change if different
training data were used.
It describes how much a random variable differs from its expected value.
It measures the inconsistency of predictions across models trained on different
training sets: different samples of training data yield different model fits.
Increase the size of the training data set to minimize variance.
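The idea that different training samples yield different model fits can be demonstrated by refitting the slope on many random samples and measuring the spread of the estimates. The data-generating process below (true slope 2, Gaussian noise) is an illustrative assumption:

```python
# Sketch: fitting the same model on different training samples yields
# different coefficients; the spread of those estimates reflects the
# variance of the learner. The synthetic data is an assumption.
import random

random.seed(0)

def fit_slope(X, Y):
    """Closed-form least-squares slope (intercept omitted for brevity)."""
    x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) \
           / sum((x - x_bar) ** 2 for x in X)

slopes = []
for _ in range(200):
    X = [random.uniform(0, 10) for _ in range(20)]
    Y = [2 * x + random.gauss(0, 1) for x in X]  # true slope 2, plus noise
    slopes.append(fit_slope(X, Y))

mean_slope = sum(slopes) / len(slopes)
variance = sum((b - mean_slope) ** 2 for b in slopes) / len(slopes)
print(mean_slope, variance)  # mean near 2; more data per sample shrinks variance
```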
Overfitting vs. Model Complexity
• Models with high bias will have low variance.
• Models with high variance will have low bias.
Over-fitting
Overfitting is an undesirable behavior in which a learning model gives
accurate predictions for training data but not for new data.
The model fits all of the data points in the training set, or more data
points than required.
The model starts capturing noise.
How to minimize Overfitting
Reduce model complexity
Train with more data
Remove features
Stop training early (early stopping)
Regularization
Regularization and Over-fitting
Adding a regularizer:
[Figure: model error vs. number of iterations, with and without a regularizer.]
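One common regularizer is an L2 (ridge) penalty, which shrinks the coefficients toward zero and can curb overfitting. The slides do not specify a particular regularizer, so this is a sketch of ridge regression for the slope only, with an illustrative toy sample:

```python
# Sketch of L2 (ridge) regularization for simple linear regression.
# With centered data, the penalized slope is b1 = Sxy / (Sxx + lambda):
# a larger lambda shrinks the slope toward zero. Toy data, for illustration.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 4.0, 6.2, 7.9, 10.1]

x_bar, y_bar = sum(X) / len(X), sum(Y) / len(Y)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)

slopes = []
for lam in (0.0, 1.0, 10.0):
    b1 = sxy / (sxx + lam)   # ridge estimate: larger lambda, smaller slope
    slopes.append(b1)
    print(lam, b1)
```

With lam = 0 this reduces to the ordinary least-squares slope; increasing lam progressively shrinks the estimate.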
Underfitting
The model is not able to capture the underlying trend of the
data.
How to minimize underfitting
• Increase the training time of the model.
• Increase the number of features.
• Increase model complexity.
2. Multiple Linear Regression
In multiple linear regression, the dependent variable depends on more than one
independent variable.
The predicted value of y is given by:
ŷ = β0 + β1X1 + β2X2 + β3X3 + … + βnXn
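The multiple-regression coefficients can be estimated with the normal equations, β = (XᵀX)⁻¹Xᵀy, a standard extension of the least-squares method above. The two-feature toy data below is an illustrative assumption (y is generated noise-free from known coefficients so the fit recovers them exactly):

```python
# Sketch of multiple linear regression via the normal equations,
# beta = (X^T X)^{-1} X^T y. Toy data with two features, for illustration.
import numpy as np

# Design matrix with a leading column of ones for the intercept beta_0.
X = np.array([[1, 1.0, 2.0],
              [1, 2.0, 1.0],
              [1, 3.0, 4.0],
              [1, 4.0, 3.0],
              [1, 5.0, 6.0]])
y = X @ np.array([1.0, 2.0, 0.5])  # known true coefficients, noise-free

# Solve (X^T X) beta = X^T y; more stable than forming the inverse explicitly.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # recovers [1.0, 2.0, 0.5] since y has no noise
```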