What is Linear Regression?
Linear Regression is a supervised machine learning algorithm that is
widely used for solving regression problems.
Regression is a type of machine learning problem where the goal is to
predict a continuous output variable based on one or more input
variables.
In Linear Regression, the goal is to find the best-fitting linear equation
to describe the relationship between the input variables (also known
as predictors or features) and the output variable (also known as the
response variable).
The equation for a simple linear regression model can be written as
follows:
y = b0 + b1 * x
Here, y is the dependent variable (the variable we are trying to
predict), x is the independent variable (the predictor or feature), b0 is
the intercept term (the value of y when x is zero), and b1 is the slope
coefficient (the change in y for a unit change in x).
The goal of Linear Regression is to find the best values for b0 and b1
such that the line best fits the data points, minimizing the errors or
the difference between the predicted values and the actual values.
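To make this concrete, here is a minimal sketch of how b0 and b1 can be estimated with the standard least-squares formulas. The NumPy usage and the tiny dataset are made up purely for illustration, not taken from any particular library example.
import numpy as np

# Illustrative toy data (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.0, 9.9])

# Least-squares estimates for simple linear regression:
# b1 = covariance(x, y) / variance(x), b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

print(b0, b1)          # intercept and slope
print(b0 + b1 * 6.0)   # predicted y for x = 6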
Types of Linear Regression
There are two main types of Linear Regression models: Simple Linear
Regression and Multiple Linear Regression.
Simple Linear Regression: In simple linear regression, there is only
one independent variable (also known as the predictor or feature) and
one dependent variable (also known as the response variable). The
goal of simple linear regression is to find the best-fitting line to
describe the relationship between the independent and dependent
variable. The equation for a simple linear regression model can be
written as:
Y = b0 + b1 * X
Here, Y is the dependent variable, X is the independent variable, b0 is
the intercept term, and b1 is the slope coefficient.
Multiple Linear Regression: In multiple linear regression, there are
multiple independent variables and one dependent variable. The goal
of multiple linear regression is to find the best-fitting linear equation
(geometrically a plane or hyperplane rather than a line) that describes
the relationship between the independent variables and the
dependent variable. The equation for a multiple linear regression
model can be written as:
Y = b0 + b1 * X1 + b2 * X2 + … + bn * Xn
Here, Y is the dependent variable, X1, X2, …, Xn are the independent
variables, b0 is the intercept term, and b1, b2, …, bn are the slope
coefficients.
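As a sketch of the multiple-variable case, the same idea can be expressed with NumPy's least-squares solver. The two input columns and their values below are invented for illustration only.
import numpy as np

# Made-up data: two independent variables X1, X2 and one dependent variable Y
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = np.array([5.1, 6.9, 11.2, 12.8, 17.1])

# Design matrix with a column of ones for the intercept b0
A = np.column_stack([np.ones_like(X1), X1, X2])

# Solve for [b0, b1, b2] in the least-squares sense
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = coeffs
print(b0, b1, b2)

# Predict Y for a new observation with X1 = 6, X2 = 2
print(b0 + b1 * 6 + b2 * 2)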
In both types of linear regression, the goal is to find the best values
for the intercept and slope coefficients that minimize the difference
between the predicted values and the actual values. Linear regression
is widely used in many real-world applications, such as finance,
marketing, and healthcare, for predicting outcomes such as stock
prices, customer behavior, and patient outcomes.
Linear Regression Line
In machine learning, a regression line can show two types of
relationships between the input variables (also known as predictors or
features) and the output variable (also known as the response
variable) in a linear regression model.
● Positive Relationship: A positive relationship exists
between the input variables and the output variable when
the slope of the regression line is positive. In other words, as
the values of the input variables increase, the value of the
output variable also increases. This can be seen as an
upward slope on a scatter plot of the data.
● Negative Relationship: A negative relationship exists
between the input variables and the output variable when
the slope of the regression line is negative. In other words,
as the values of the input variables increase, the value of the
output variable decreases. This can be seen as a downward
slope on a scatter plot of the data.
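One simple way to see which kind of relationship a fitted model has found is to check the sign of its slope. The sketch below does exactly that on made-up data; np.polyfit is used here only as a convenient way to fit a line.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([8.0, 6.5, 4.9, 3.1])   # made-up values that decrease as x increases

slope, intercept = np.polyfit(x, y, 1)  # fit a degree-1 polynomial (a line)
if slope > 0:
    print("positive relationship: y tends to increase with x")
else:
    print("negative relationship: y tends to decrease with x")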
Finding the best fit line
In machine learning, finding the best-fitting line is crucial in linear
regression, as it determines the accuracy of the predictions made by
the model. The best-fitting line is the line that has the smallest
difference between the predicted values and the actual values.
To find the best-fitting line in a linear regression model, we use a
process called “ordinary least squares (OLS) regression”. This process
involves calculating the sum of the squared differences between the
predicted values and the actual values for each data point, and then
finding the line that minimizes this sum of squared errors.
The best-fitting line is found by minimizing the residual sum of
squares (RSS), which is the sum of the squared differences between
the predicted values and the actual values. This is achieved by
adjusting the values of the intercept and slope coefficients, also
known as c and m, respectively.
Once the values of c and m are determined, we can use the linear
regression equation to make predictions for new data points. The
equation for a simple linear regression model can be written as:
y = c + m * x
Here, y is the dependent variable (the variable we are trying to
predict), x is the independent variable (the predictor or feature), c is
the intercept term (the value of y when x is zero), and m is the slope
coefficient (the change in y for a unit change in x).
In multiple linear regression, the equation would have more
independent variables, and the slope coefficients for each variable
would be included in the equation.
Overall, finding the best-fitting line in a linear regression model is
critical for accurate predictions and is achieved by minimizing the
residual sum of squares using the OLS regression method.
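As a small illustration of what OLS is minimizing, the sketch below computes the residual sum of squares for candidate values of c and m on a made-up dataset; the particular guesses are arbitrary and only show the calculation.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.9, 6.1, 8.0, 9.8])   # made-up observations

def rss(c, m, x, y):
    """Residual sum of squares for the line y = c + m * x."""
    predictions = c + m * x
    return np.sum((y - predictions) ** 2)

# A poor guess gives a large RSS, a better guess gives a small one
print(rss(0.0, 1.0, x, y))
print(rss(0.2, 2.0, x, y))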
Gradient Descent: Linear Regression
In this tutorial you will learn how the gradient descent algorithm
works and implement it from scratch in Python. First we look at what
linear regression is, then we define the loss function. We then walk
through how the gradient descent algorithm works, and finally we
implement it on a given dataset and make predictions.
Linear Regression
In statistics, linear regression is a linear approach to modeling
the relationship between a dependent variable and one or more
independent variables. Let X be the independent variable and Y
be the dependent variable. We will define a linear relationship
between these two variables as follows:
Y = mX + c
This is the equation for a line that you studied in high school. m
is the slope of the line and c is the y intercept. Today we will use
this equation to train our model with a given dataset and predict
the value of Y for any given value of X. Our challenge today is to
determine the value of m and c, such that the line corresponding
to those values is the best fitting line or gives the minimum
error.
Loss Function
The loss is the error in the values our model predicts with the current
m and c. Our goal is to minimize this error to obtain the most accurate
values of m and c.
We will use the Mean Squared Error function to calculate the
loss. There are three steps in this function:
1. Find the difference between the actual y and the predicted
y value (y = mx + c), for a given x.
2. Square this difference.
3. Find the mean of the squares for every value in X.
Mean Squared Error Equation:
E = (1/n) * Σ (yᵢ - ȳᵢ)²
Here yᵢ is the actual value and ȳᵢ is the predicted
value. Let's substitute the value of ȳᵢ:
E = (1/n) * Σ (yᵢ - (m * xᵢ + c))²
So we square the error and find the mean, hence the name Mean
Squared Error. Now that we have defined the loss function, let's
get to the interesting part: minimizing it and finding m and c.
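A direct translation of those three steps into code might look like the following sketch; the array names and values are placeholders for whatever dataset is being fitted.
import numpy as np

def mse(m, c, x, y):
    """Mean squared error of the line y = m * x + c on data (x, y)."""
    y_pred = m * x + c                    # step 1: predicted values
    squared_errors = (y - y_pred) ** 2    # step 2: square the differences
    return squared_errors.mean()          # step 3: take the mean

# Example with made-up data
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
print(mse(2.0, 0.0, x, y))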
The Gradient Descent Algorithm
Gradient descent is an iterative optimization algorithm to find
the minimum of a function. Here that function is our Loss
Function.
Understanding Gradient Descent
Imagine a valley and a person with no sense of direction who
wants to get to the bottom of the valley. He goes down the slope
and takes large steps when the slope is steep and small steps
when the slope is less steep. He decides his next position based
on his current position and stops when he gets to the bottom of
the valley which was his goal.
Let’s try applying gradient descent to m and c and approach it
step by step:
1. Initially let m = 0 and c = 0. Let L be our learning rate.
This controls how much the value of m changes with
each step. L could be a small value like 0.0001 for good
accuracy.
2. Calculate the partial derivative of the loss function with
respect to m, and plug in the current values of x, y, m
and c in it to obtain the derivative value D.
Derivative with respect to m:
Dₘ = (-2/n) * Σ xᵢ * (yᵢ - ȳᵢ)
Dₘ is the value of the partial derivative with respect to m.
Similarly, let's find the partial derivative with respect to c, Dc:
Derivative with respect to c:
Dc = (-2/n) * Σ (yᵢ - ȳᵢ)
3. Now we update the current values of m and c using the
following equations:
m = m - L * Dₘ
c = c - L * Dc
4. We repeat this process until our loss function becomes very small
or stops decreasing (ideally 0, which would mean the line passes
through every point, something that rarely happens with real data).
The values of m and c that we are left with will be the optimum
values; a single update step is worked through in code just after this list.
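Here is one update step worked through on a tiny made-up dataset of two points, so the derivative and update formulas above can be checked by hand. The data, learning rate, and variable names are invented for this example.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([2.0, 4.0])   # made-up points lying on y = 2x

m, c = 0.0, 0.0            # initial values
L = 0.01                   # learning rate (larger than 0.0001 so one step is visible)
n = len(x)

y_pred = m * x + c                          # both predictions are 0 initially
D_m = (-2 / n) * np.sum(x * (y - y_pred))   # = -(1*2 + 2*4) = -10
D_c = (-2 / n) * np.sum(y - y_pred)         # = -(2 + 4) = -6

m = m - L * D_m   # 0 - 0.01 * (-10) = 0.1
c = c - L * D_c   # 0 - 0.01 * (-6)  = 0.06
print(m, c)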
Now going back to our analogy, m can be considered the current
position of the person. D is equivalent to the steepness of the
slope and L can be the speed with which he moves. Now the new
value of m that we calculate using the above equation will be his
next position, and L×D will be the size of the steps he will take.
When the slope is steeper (D is larger) he takes longer steps,
and when it is less steep (D is smaller), he takes smaller steps.
Finally he arrives at the bottom of the valley which corresponds
to our loss = 0.
Now with the optimum value of m and c our model is ready to
make predictions !
Implementing the Model
Now let’s convert everything above into code and see our model
in action !
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (12.0, 9.0)

# Preprocessing input data
data = pd.read_csv('data.csv')
X = data.iloc[:, 0]
Y = data.iloc[:, 1]
plt.scatter(X, Y)
plt.show()

# Building the model
m = 0
c = 0

L = 0.0001      # The learning rate
epochs = 1000   # The number of iterations to perform gradient descent

n = float(len(X))  # Number of elements in X

# Performing gradient descent
for i in range(epochs):
    Y_pred = m * X + c                      # The current predicted value of Y
    D_m = (-2/n) * sum(X * (Y - Y_pred))    # Derivative wrt m
    D_c = (-2/n) * sum(Y - Y_pred)          # Derivative wrt c
    m = m - L * D_m                         # Update m
    c = c - L * D_c                         # Update c

print(m, c)
1.4796491688889395 0.10148121494753726

# Making predictions
Y_pred = m * X + c
plt.scatter(X, Y)
plt.plot([min(X), max(X)], [min(Y_pred), max(Y_pred)], color='red')  # regression line
plt.show()
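As an optional sanity check, assuming X, Y, m, and c from the code above are still in scope, the slope and intercept found by gradient descent can be compared against NumPy's closed-form fit; the two results should be close if the learning rate and number of epochs were adequate.
import numpy as np

# Closed-form least-squares fit for comparison with the gradient-descent result
slope, intercept = np.polyfit(X, Y, 1)
print("gradient descent: m =", m, ", c =", c)
print("np.polyfit:       m =", slope, ", c =", intercept)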
Gradient descent is one of the simplest and most widely used
algorithms in machine learning, mainly because it can be applied
to almost any differentiable function to optimize it. Learning it lays
the foundation for mastering machine learning.