
Linear Regression (Python Implementation)

This article discusses the basics of linear regression and its implementation
in the Python programming language. Linear regression is a statistical
method for modeling the relationship between a dependent variable and a
given set of independent variables.

Simple Linear Regression


Simple linear regression is an approach for predicting a response using
a single feature. It is one of the most basic machine learning models that a
machine learning enthusiast gets to know. In linear regression, we
assume that the two variables, i.e., the dependent and independent variables,
are linearly related. Hence, we try to find a linear function that predicts the
response value (y) as accurately as possible as a function of the feature, or
independent variable (x). Let us consider a dataset where we have a
response value y for every feature value x:

x: 0  1  2  3  4  5  6  7  8  9
y: 1  3  2  5  7  8  8  9  10 12

For generality, we define:

x as the feature vector, i.e., x = [x_1, x_2, ..., x_n],
y as the response vector, i.e., y = [y_1, y_2, ..., y_n],

for n observations (in the above example, n = 10). A scatter plot of the
above dataset looks like this:

[Figure: scatter plot of the dataset]

Now, the task is to find the line that best fits the scatter plot above, so
that we can predict the response for any new feature value (i.e., a value
of x not present in the dataset). This line is called the regression line.
The equation of the regression line is:

h(x_i) = b_0 + b_1*x_i
Here,
- h(x_i) represents the predicted response value for the ith observation.
- b_0 and b_1 are regression coefficients and represent the y-intercept
  and slope of the regression line respectively.
To create our model, we must “learn” or estimate the values of regression
coefficients b_0 and b_1. And once we’ve estimated these coefficients, we
can use the model to predict responses!
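As a quick illustration of how the fitted model is used, the sketch below evaluates h(x) = b_0 + b_1*x for a new x. The coefficient values here are made up for illustration only; they are not the ones estimated from the dataset above.

```python
# Illustrative (made-up) coefficients, not the estimated ones
b_0, b_1 = 1.0, 2.0

def predict(x):
    # h(x) = b_0 + b_1*x
    return b_0 + b_1 * x

print(predict(3.0))  # 7.0
```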
In this article, we are going to use the principle of Least Squares.
Now consider:

y_i = b_0 + b_1*x_i + e_i

Here, e_i is the residual error of the ith observation, i.e.,
e_i = y_i - h(x_i). So, our aim is to minimize the total residual error.
We define the squared error, or cost function, J as:

J(b_0, b_1) = sum_{i=1..n} e_i^2

and our task is to find the values of b_0 and b_1 for which J(b_0, b_1) is
minimum. Without going into the mathematical details, we present the result
here:

b_1 = SS_xy / SS_xx
b_0 = m_y - b_1*m_x

where m_x and m_y are the means of x and y, and SS_xy is the sum of
cross-deviations of y and x:

SS_xy = sum_{i=1..n} (x_i - m_x)*(y_i - m_y) = sum(x_i*y_i) - n*m_x*m_y

And SS_xx is the sum of squared deviations of x:

SS_xx = sum_{i=1..n} (x_i - m_x)^2 = sum(x_i^2) - n*m_x^2
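As a sanity check on these formulas, the short sketch below computes SS_xy and SS_xx for the article's dataset both from the deviation form and from the computational shortcut, and confirms the two forms agree. Variable names mirror the code later in the article.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
n = x.size
m_x, m_y = x.mean(), y.mean()

# definitional form: products of deviations from the means
SS_xy_dev = np.sum((x - m_x) * (y - m_y))
SS_xx_dev = np.sum((x - m_x) ** 2)

# computational shortcut, as used in the article's code
SS_xy = np.sum(x * y) - n * m_x * m_y
SS_xx = np.sum(x * x) - n * m_x * m_x

print(SS_xy, SS_xx)  # 96.5 and 82.5 for this dataset
```

Both forms give SS_xy = 96.5 and SS_xx = 82.5 here, so b_1 = 96.5/82.5.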

Python Implementation of Linear Regression


We can use the Python language to learn the coefficients of a linear
regression model. For plotting the input data and the best-fit line we will
use the matplotlib library, one of the most widely used Python libraries
for plotting graphs.

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697
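As a cross-check (not part of the original program), the same coefficients can be recovered with NumPy's built-in np.polyfit, which fits a degree-1 polynomial by least squares and returns the coefficients from highest degree down:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# np.polyfit(x, y, 1) returns [slope, intercept] for a degree-1 fit
b_1, b_0 = np.polyfit(x, y, 1)
print(b_0, b_1)  # approximately 1.2364 and 1.1697
```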
And the graph obtained looks like this:

[Figure: scatter plot of the points along with the regression line]
