
MULTIPLE LINEAR REGRESSION IN PYTHON
MACHINE LEARNING

BY JAY PATEL
Implementing Linear Regression with multiple features in Python
Let's start by importing our dependencies. I will import numpy, pandas and matplotlib. numpy is used when we are dealing with matrices, pandas is used when we are dealing with data (and in machine learning we are always dealing with data), and matplotlib is used when we are dealing with graphs.

These three are among the most important libraries when implementing machine learning in Python, so make sure you are familiar with them.

Below are the links for the Numpy and Pandas tutorials:

Numpy: click here

Pandas: click here

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Now let's load our dataset. I have divided the dataset into two sets, one for training and one for testing.

You can download/view the dataset from here: train_dataset, test_dataset

Once you open the above files, copy-paste the entire content into your notepad and save it.

train = pd.read_csv("train_data.csv")
test = pd.read_csv("test_data.csv")

Let's see what our dataset actually looks like.

train.head()

Our dataset has two unwanted features (Unnamed and Id), so we will remove them. At the end you can see we have a sales price.

The sales price is our prediction target, our Y, and all the other features are our X. I have attached a file below which describes our features in detail. Go have a look at it. house-features

Also, the original dataset had a lot of categorical data, with characters or strings in their data fields. We cannot train our machine learning model with categorical (or string) data fields, so I have already pre-processed the data, converting the categorical features into numerical features.
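If you want to try this conversion yourself, here is a minimal sketch using pandas' one-hot encoding. The file name is just a placeholder, and encoding every string column is an assumption for illustration; my actual pre-processing also handled other issues.

import pandas as pd

# Load the raw Kaggle data (file name is a placeholder for this sketch)
raw = pd.read_csv("house_prices_raw.csv")

# Every column with dtype "object" holds strings, i.e. categorical data
categorical_cols = raw.select_dtypes(include="object").columns

# One-hot encode them: each category becomes its own 0/1 numerical column
encoded = pd.get_dummies(raw, columns=categorical_cols)

print(encoded.shape)  # more columns than before, but all of them numeric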

Apart from the above, I have also done a lot of other preprocessing which was essential to make our dataset trainable.

I will cover the pre-processing of the dataset in my future videos, so stay tuned for that! I will also cover how I pre-processed this dataset, which was originally downloaded from kaggle: House Price

Now we will drop those two unwanted features.

train = train.drop(["Unnamed: 0", "Id"], axis = 1)
test = test.drop(["Unnamed: 0", "Id"], axis = 1)

And now let's separate the X and Y.

train_data = train.values
Y = train_data[:, -1].reshape(train_data.shape[0], 1)
X = train_data[:, :-1]

test_data = test.values
Y_test = test_data[:, -1].reshape(test_data.shape[0], 1)
X_test = test_data[:, :-1]

Now let's see the shape of our dataset.

print("Shape of X_train :", X.shape)
print("Shape of Y_train :", Y.shape)
print("Shape of X_test :", X_test.shape)
print("Shape of Y_test :", Y_test.shape)

Shape of X_train : (1200, 70)
Shape of Y_train : (1200, 1)
Shape of X_test : (258, 70)
Shape of Y_test : (258, 1)

So we have data for 1200 houses in our training set, and each house has 70 features. Similarly, we have 258 houses in our test set.

Now let's have a quick overview of linear regression.

If you do not know anything about linear regression, go to this video: Linear Regression, where I have explained what linear regression is.

We know that in linear regression we make our predictions by this equation:

y_pred = θₙxₙ + θₙ₋₁xₙ₋₁ + θₙ₋₂xₙ₋₂ + ... + θ₂x₂ + θ₁x₁ + θ₀

And in Python, we can write the above equation as the matrix multiplication of X and theta:

y_pred = np.dot(X, theta)
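To convince yourself that this matrix multiplication computes exactly the sum above, here is a tiny sanity check on made-up numbers (the values are arbitrary, for illustration only):

import numpy as np

# One sample with a bias term (x0 = 1) and two features
X_toy = np.array([[1.0, 2.0, 3.0]])          # shape (1, 3)
theta_toy = np.array([[0.5], [1.0], [2.0]])  # shape (3, 1): theta0, theta1, theta2

# 0.5*1 + 1.0*2 + 2.0*3 = 8.5, the same result as writing the sum out by hand
print(np.dot(X_toy, theta_toy))              # [[8.5]]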

Now, to do a proper matrix multiplication of X and theta, we will need to add a column of 1s before all the features of X. The reason for doing so is that we multiply θ₂ with x₂ and θ₁ with x₁, but there is no x₀ to multiply with θ₀. So we will add 1 in the place of x₀.

X = np.vstack((np.ones((X.shape[0], )), X.T)).T
X_test = np.vstack((np.ones((X_test.shape[0], )), X_test.T)).T
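A quick way to confirm that the column of 1s really was added is to check the shape before and after. With our training set, this is what you should see:

print(X.shape)   # (1200, 71) -- the 70 features plus the new bias column
print(X[:3, 0])  # [1. 1. 1.] -- the first column is all ones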

Now, we know that the cost function is the representation of the error, and we find the error by subtracting our predicted value from the actual value. Our cost function is calculated by this formula:

cost = (1/(2m)) * ∑(y_pred − Y)²
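As a quick illustration of this formula on made-up numbers (three samples, so m = 3):

import numpy as np

y_pred = np.array([[2.0], [4.0], [6.0]])  # toy predictions
Y_true = np.array([[1.0], [4.0], [8.0]])  # toy actual values
m = Y_true.size

# (1/(2m)) * sum of squared errors = (1/6) * (1 + 0 + 4) = 0.8333...
cost = (1/(2*m)) * np.sum(np.square(y_pred - Y_true))
print(cost)  # 0.8333...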

To know in detail about the Linear Regression cost function, go to this video: Cost Function

Now, to minimize the error (or cost) we need to use something called gradient descent. Gradient descent basically works by reducing the cost value in such a way that it reaches its local minimum.

If you want to know more about gradient descent, then click Gradient Descent, which will take you to a video where I have explained gradient descent as simply and in as much detail as possible.

The update equation for gradient descent is:

θ := θ − α · (1/m) · Xᵀ(y_pred − Y)

where α is the learning rate and m is the number of training samples.
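To see a single update step in action, here is a sketch on toy numbers (all values are made up for illustration):

import numpy as np

X_toy = np.array([[1.0, 2.0], [1.0, 3.0]])  # 2 samples: bias column + 1 feature
Y_toy = np.array([[5.0], [7.0]])            # targets
theta = np.zeros((2, 1))                    # start from all zeros
alpha = 0.1                                 # learning rate
m = Y_toy.size

y_pred = np.dot(X_toy, theta)                      # predictions, all 0 at first
d_theta = (1/m) * np.dot(X_toy.T, y_pred - Y_toy)  # gradient of the cost
theta = theta - alpha * d_theta                    # one gradient descent step

print(theta)  # [[0.6], [1.55]] -- theta has moved from zero toward the data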

So let's make our linear regression model in Python.

1. I'm taking four parameters: X, Y, learning rate (which is alpha), and iterations. Iterations specifies how many times we want to run the loop.
2. Define m as the size of the dataset (which is currently 1200).
3. theta will be a vector of zeros, a matrix of size (n, 1) where n is the number of columns of X, so (71, 1) now that we have added the column of 1s to our 70 features.
4. We run the loop iteration times, and in every iteration we compute the prediction, the cost, the gradient, and the parameter update described above.
5. We also keep track of our cost at every iteration by maintaining a cost_list.
6. Finally, we return the theta parameter (which will now be trained) and cost_list.

def model(X, Y, learning_rate, iteration):
    m = Y.size                             # number of training samples
    theta = np.zeros((X.shape[1], 1))      # initialize parameters to zero
    cost_list = []

    for i in range(iteration):

        y_pred = np.dot(X, theta)          # predictions for the current theta

        cost = (1/(2*m))*np.sum(np.square(y_pred - Y))   # squared-error cost

        d_theta = (1/m)*np.dot(X.T, y_pred - Y)          # gradient of the cost

        theta = theta - learning_rate*d_theta            # gradient descent update

        cost_list.append(cost)

        # print the cost 10 times over the whole run
        if i % (iteration/10) == 0:
            print("Cost is :", cost)

    return theta, cost_list

Now let's call our model.

iteration = 10000
learning_rate = 0.000000005
theta, cost_list = model(X, Y, learning_rate = learning_rate, iteration = iteration)

Cost is : 72.37539364066856
Cost is : 0.027904168310316866
Cost is : 0.017251065372144152
Cost is : 0.016355272705548277
Cost is : 0.016158836087530753
Cost is : 0.016040958498450615
Cost is : 0.015946827323753437
Cost is : 0.01586789631723002
Cost is : 0.015800568014785396
Cost is : 0.015742355306482898

We can see our cost decreasing with every iteration. We can also plot a graph of cost vs. iteration:

rng = np.arange(0, iteration)
plt.plot(rng, cost_list)
plt.show()

Now the fun part!

We will test the accuracy of our model on the test dataset. To measure accuracy, I'm going to calculate the error and subtract it from 1 to get the accuracy.

Below is the equation for the error:

error = (1/m) * ∑|y_pred − Y|

In Python, it will be:

y_pred = np.dot(X_test, theta)
error = (1/X_test.shape[0])*np.sum(np.abs(y_pred - Y_test))

print("Test error is :", error*100, "%")
print("Test Accuracy is :", (1 - error)*100, "%")

Test error is : 12.959999999999999 %
Test Accuracy is : 87.03999999999999 %

Our model has 87 percent accuracy, and we achieved that with just a few lines of code!
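To actually use the trained model on a new house, you just apply the same prediction equation with the trained theta. As a sketch, here I reuse the first row of the test set as a stand-in for a new, pre-processed house:

# X_test already contains the leading 1 for the bias term
new_house = X_test[0].reshape(1, -1)   # shape (1, 71)

predicted_price = np.dot(new_house, theta)
print("Predicted price :", predicted_price[0, 0])
print("Actual price    :", Y_test[0, 0])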

Congratulations!!

Subscribe to CODING LANE for more amazing content:
