0% found this document useful (0 votes)

338 views15 pages

Machine Learning Lab: Regression Analysis

This document summarizes the methods used in a machine learning lab assignment. It discusses two housing datasets used from Kaggle and UCI repositories. It describes preprocessing steps like importing packages and cleaning data. Correlation between features and sale price is found using a heatmap. Three linear regression methods are applied - ordinary least squares, scikit-learn's linear regression function, and a user-defined gradient descent function. Models are trained and predictions are evaluated to find the best fitting features for predicting house prices.

Uploaded by

Nikhilesh Prabhakar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

338 views15 pages

Machine Learning Lab: Regression Analysis

Uploaded by

Nikhilesh Prabhakar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 15

Lab 3 Machine Learning

Nikhilesh Prabhakar
16BCE1158

Datasets used

Apart from the usual House Prices dataset that were used in previous
lab submissions, the dataset that I had worked on for this one was
the one presented in Sebastian Raschka’s “Python Machine
Learning”.

Link: https://www.kaggle.com/c/house-prices-advanced-regression-
techniques/data. There are 79 variables describing almost every
aspect of residential homes for sale in Iowa (at the time of collecting
data).
Link for the second dataset:
https://archive.ics.uci.edu/ml/machine-
learningdatabases/housing/housing.data

Methodology

Step 1: Importing the required Packages

import pandas as pd
import matplotlib as plot
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import seaborn as sb

Step 2: Cleaning the data

This was covered in Lab 1’s submission. Additionally all string
columns were converted to integer by using Panda’s function
get_dummies() which separates all unique string values by making
them separate columns

Step 3: Using a heatmap

The Pearson’s correlation coefficient is figured out between Sale
Price and the other columns in the dataset. The columns that had
absolute correlation above 0.5 were plotted on the map as shown.

corrmat = fdata.corr()
top_corr_features =
corrmat.index[abs(corrmat["SalePrice"])>0.50]
plt.figure(figsize=(10,10))
g =
sb.heatmap(fdata[top_corr_features].corr(),annot=True)
#Sale Price is most correlated with OverallQual,
GrLivArea, GarageCars, GarageArea, TotalBsmtSF, 1stFlrSF
Step 4: The Linear Regression Model
There are 3 methods covered in this lecture to for finding out the
Linear Regression

Method 1: OLS (Ordinary Least Squares)

from statsmodels.formula.api import ols

from IPython.display import HTML, display

housing_model = ols("SalePrice ~ OverallQual",

data=fdata).fit()
Adj. R-squared indicates that 62% of housing prices can be
explained by our predictor variable
The standard error measures the accuracy of OverallQual’s
coefficient by estimating the variation of the coefficient if the same
test were run on a different sample of our population. Our standard
error, 5756.407, is extremely and therefore a linear model doesn’t
suit our data.
A lot more was tried using OLS as can be seen in the python
notebook, even multiple regression’s statistical data can be viewed
through this.
Method 2: sklearn’s Linear Regression Function

from sklearn.linear_model import LinearRegression

X = fdata[["OverallQual"]]
Y = fdata[["SalePrice"]]
clf = LinearRegression()
clf.fit(X,Y)
Out[174]: LinearRegression(copy_X=True, fit_intercept=True,
n_jobs=1, normalize=False)
data_test = pd.read_csv("train.csv")
X_test = data_test[["OverallQual"]]
Y_test = data_test[["SalePrice"]]
clf.score(X_test,Y_test)
Out[179]: 0.62544678976769652

Similar to the Adj R-squared value seen in OLS

Y_pred = clf.predict(X_test)
plt.scatter(X_test, Y_test, color='black')
plt.plot(X_test, Y_pred, color='blue', linewidth=3)
#OverallQual is a discrete data
Same code was tried out with another attribute “GrLivArea” and this
was formed
Method 2: User-defined function given in Sebastian Raschka’s
book

class LinearRegressionGD(object):
def __init__(self, eta=0.001, n_iter=20):
self.eta = eta
self.n_iter = n_iter
def fit(self, X, y):
self.w_ = np.zeros(1 + X.shape[1])
self.cost_ = []
for i in range(self.n_iter):
output = self.net_input(X)
errors = (y - output)
b = self.eta * X.T.dot(errors)
self.w_[1:] += b
self.w_[0] += self.eta * errors.sum()
cost = (errors**2).sum() / 2.0
self.cost_.append(cost)
return self
def net_input(self, X):
return np.dot(X, self.w_[1:]) + self.w_[0]
def predict(self, X):
return self.net_input(X)
def lin_regplot(X, y, model):
plt.scatter(X, y, c='blue')
plt.plot(X, model.predict(X), color='red')
return None
lin_regplot(X_std, y_std, lr)
plt.xlabel('Average number of rooms [RM]
(standardized)')
plt.ylabel('Price in $1000\'s [MEDV] (standardized)')
plt.show()
slr = LinearRegression()
slr.fit(X, y)
print('Slope: %.3f' % slr.coef_[0])
print('Intercept: %.3f' % slr.intercept_)
Slope: 107.130
Intercept: 18569.026

Similar was done with “GrLivArea"

The same code was tried out with the example given in the textbook.
A python notebook along with snippets down below is provided for
proof of execution.

Machine Learning in Mechanical Engineering
No ratings yet
Machine Learning in Mechanical Engineering
20 pages
Bootstrap Powerpoint
100% (1)
Bootstrap Powerpoint
20 pages
Machine Learning Guide Line
No ratings yet
Machine Learning Guide Line
10 pages
Sentiment Analysis Using Recurrent Neural Network
No ratings yet
Sentiment Analysis Using Recurrent Neural Network
7 pages
Variable Selection Techniques in R
No ratings yet
Variable Selection Techniques in R
15 pages
Decision Tree Classification on Iris Dataset
No ratings yet
Decision Tree Classification on Iris Dataset
6 pages
Data Mining Project: Clustering & Model Analysis
100% (1)
Data Mining Project: Clustering & Model Analysis
40 pages
Statistics I
100% (2)
Statistics I
686 pages
H-311 Linear Regression Analysis With R
100% (1)
H-311 Linear Regression Analysis With R
71 pages
Logistic Regression
100% (1)
Logistic Regression
29 pages
Ensemble Methods in Data Analytics
No ratings yet
Ensemble Methods in Data Analytics
23 pages
Multivariate Statistical Modelling Based On Generalized Linear Models 2nd Edition ISBN 0387951873, 9780387951874 PDF
No ratings yet
Multivariate Statistical Modelling Based On Generalized Linear Models 2nd Edition ISBN 0387951873, 9780387951874 PDF
17 pages
Machine Learning
No ratings yet
Machine Learning
30 pages
Semi-Automated EDA in Python
No ratings yet
Semi-Automated EDA in Python
3 pages
Machine Learning Most Important Question For Mid Term Ipu University
No ratings yet
Machine Learning Most Important Question For Mid Term Ipu University
36 pages
7143cem Portfolio March2023 Brief
No ratings yet
7143cem Portfolio March2023 Brief
9 pages
Understanding SHAP Values in ML Models
No ratings yet
Understanding SHAP Values in ML Models
12 pages
Real-Time Car Make and Model Recognition
No ratings yet
Real-Time Car Make and Model Recognition
8 pages
Machine Learning Algorithm Guide
100% (1)
Machine Learning Algorithm Guide
15 pages
Build Your Movie Recommendation System
No ratings yet
Build Your Movie Recommendation System
8 pages
Multivariate Linear Regression Guide
100% (1)
Multivariate Linear Regression Guide
12 pages
Finance-Focused Big Data Techniques
100% (1)
Finance-Focused Big Data Techniques
23 pages
Linear Regression with Python OLS
No ratings yet
Linear Regression with Python OLS
23 pages
ML Unit 1 Notes
100% (1)
ML Unit 1 Notes
19 pages
Automatic Differentiation With Pytorch: Stat 479: Deep Learning, Spring 2019 Sebastian Raschka
No ratings yet
Automatic Differentiation With Pytorch: Stat 479: Deep Learning, Spring 2019 Sebastian Raschka
43 pages
Data Analytics Lab Manual
No ratings yet
Data Analytics Lab Manual
26 pages
MACHINE LEARNING ALGORITHM Unit-II
No ratings yet
MACHINE LEARNING ALGORITHM Unit-II
115 pages
Weka A Tool For Exploratory Data Mining
No ratings yet
Weka A Tool For Exploratory Data Mining
157 pages
Time Series
67% (3)
Time Series
34 pages
Lecture 9 PDF
100% (1)
Lecture 9 PDF
28 pages
Object-Oriented Programming in R
No ratings yet
Object-Oriented Programming in R
138 pages
Random Forests for Data Scientists
100% (1)
Random Forests for Data Scientists
12 pages
Understanding Machine Learning Basics
100% (1)
Understanding Machine Learning Basics
64 pages
Multivariate Linear Regression Guide
No ratings yet
Multivariate Linear Regression Guide
24 pages
Weka Tutorial
No ratings yet
Weka Tutorial
2 pages
Gradient Descent
No ratings yet
Gradient Descent
18 pages
Xgboost in Online Transaction Fraud Detection
100% (1)
Xgboost in Online Transaction Fraud Detection
8 pages
How To Model Residual Errors To Correct Time Series Forecasts With Python
No ratings yet
How To Model Residual Errors To Correct Time Series Forecasts With Python
22 pages
Greet Manual
No ratings yet
Greet Manual
121 pages
Beginner's Guide to ML Models
No ratings yet
Beginner's Guide to ML Models
12 pages
Concepts and Techniques: Data Mining
100% (1)
Concepts and Techniques: Data Mining
81 pages
Machine Learning Cheatsheet
100% (1)
Machine Learning Cheatsheet
15 pages
5 Data Science Project Lifecycle
No ratings yet
5 Data Science Project Lifecycle
33 pages
ARMA Model
No ratings yet
ARMA Model
12 pages
EEE1007 Neural Network and Fuzzy Control
No ratings yet
EEE1007 Neural Network and Fuzzy Control
2 pages
Introduction to Basic Statistics Concepts
No ratings yet
Introduction to Basic Statistics Concepts
27 pages
Vision-Face Recognition Attendance Monitoring System For Surveillance Using Deep Learning Technology and Computer Vision
No ratings yet
Vision-Face Recognition Attendance Monitoring System For Surveillance Using Deep Learning Technology and Computer Vision
5 pages
Rapidminer 4.6 Tutorial
100% (1)
Rapidminer 4.6 Tutorial
695 pages
Data Science Life Cycle Sheet
No ratings yet
Data Science Life Cycle Sheet
191 pages
Customer Lifetime Value Analysis Based On Machine Learning: Xinqian Dai
No ratings yet
Customer Lifetime Value Analysis Based On Machine Learning: Xinqian Dai
5 pages
Machine Learning Project: TITLE: Predicting The Sale Price of A House Using Linear Regression
No ratings yet
Machine Learning Project: TITLE: Predicting The Sale Price of A House Using Linear Regression
20 pages
Housing Price Prediction with Regression
No ratings yet
Housing Price Prediction with Regression
5 pages
ML Record
No ratings yet
ML Record
19 pages
Integrated System Lab
No ratings yet
Integrated System Lab
25 pages
ml2020 Pythonlab02
No ratings yet
ml2020 Pythonlab02
3 pages
Document From Jahnavi
No ratings yet
Document From Jahnavi
20 pages
Exercise4 Solution
No ratings yet
Exercise4 Solution
20 pages
ML Manual
No ratings yet
ML Manual
30 pages
ML Lab Experiment Shivansh
No ratings yet
ML Lab Experiment Shivansh
29 pages
SML - Week 3
No ratings yet
SML - Week 3
5 pages
Enactus Fest 2018 Program Schedule
No ratings yet
Enactus Fest 2018 Program Schedule
1 page
Fall06 Mid Solution
No ratings yet
Fall06 Mid Solution
8 pages
Database Normalization Exercise
33% (6)
Database Normalization Exercise
4 pages
Batch Job Executor Project Report
No ratings yet
Batch Job Executor Project Report
14 pages
Understanding Support Vector Machines
0% (1)
Understanding Support Vector Machines
7 pages
OSPF Routing Setup Guide
100% (1)
OSPF Routing Setup Guide
1 page
8087 Notes
No ratings yet
8087 Notes
8 pages
ER Relational Mapping
No ratings yet
ER Relational Mapping
52 pages
Digital Logic and Number Systems Guide
No ratings yet
Digital Logic and Number Systems Guide
217 pages
Model QP - Dr.S.geetha
No ratings yet
Model QP - Dr.S.geetha
2 pages
Exercise 2 - Tutorial Problem-Solved in Class
No ratings yet
Exercise 2 - Tutorial Problem-Solved in Class
2 pages
B.Tech CSE Digital Logic Exam
No ratings yet
B.Tech CSE Digital Logic Exam
3 pages
University Database ER Diagram Assignment
No ratings yet
University Database ER Diagram Assignment
2 pages
SQL Basics for Students
No ratings yet
SQL Basics for Students
20 pages
Matrix Diagonalization Explained
No ratings yet
Matrix Diagonalization Explained
3 pages
CSE - Database Management Systems
No ratings yet
CSE - Database Management Systems
17 pages
Overview of SQL DDL Commands
No ratings yet
Overview of SQL DDL Commands
13 pages
Document Scanned by CamScanner
No ratings yet
Document Scanned by CamScanner
18 pages
Cauchy-Legendre de IIorder
100% (2)
Cauchy-Legendre de IIorder
2 pages
MAT2002 Applications of Differential and Difference Equations ETH 1 AC37
No ratings yet
MAT2002 Applications of Differential and Difference Equations ETH 1 AC37
3 pages
Eigenvalue Properties Explained
No ratings yet
Eigenvalue Properties Explained
7 pages
New Doc 23
No ratings yet
New Doc 23
14 pages
Cat-1 Model
No ratings yet
Cat-1 Model
1 page
CamScanner Document Scans
No ratings yet
CamScanner Document Scans
27 pages
Particular Integral UCmethod
No ratings yet
Particular Integral UCmethod
2 pages
CHY Exp 4
No ratings yet
CHY Exp 4
5 pages
Pre-Req:: Calculus For Engineers L, T, P, J, C 3, 0,2,0, 4 Topics L Hrs SLO
No ratings yet
Pre-Req:: Calculus For Engineers L, T, P, J, C 3, 0,2,0, 4 Topics L Hrs SLO
2 pages
Experiment-1 Registration NO: 16BCE1158 Faculty Name: Muthunagai Session: L5+ L6 DATE: 8/08/2016 Aim
No ratings yet
Experiment-1 Registration NO: 16BCE1158 Faculty Name: Muthunagai Session: L5+ L6 DATE: 8/08/2016 Aim
5 pages
Programming Problems
No ratings yet
Programming Problems
6 pages
3rd Prep Language Computer Booklet - First Term 2023 - 2024
No ratings yet
3rd Prep Language Computer Booklet - First Term 2023 - 2024
40 pages
JavaScript Interview Questions and Answers PDF - CodeProject
No ratings yet
JavaScript Interview Questions and Answers PDF - CodeProject
8 pages
BNM802 Assignment
No ratings yet
BNM802 Assignment
3 pages
Gradually Varied Flow Simulation in Excel
No ratings yet
Gradually Varied Flow Simulation in Excel
26 pages
Music Tech Focus - Mastering Volume 4
100% (2)
Music Tech Focus - Mastering Volume 4
132 pages
Understanding Programmable Logic Controllers
No ratings yet
Understanding Programmable Logic Controllers
75 pages
OOSE Lab Record - 2025 - IT Dept
No ratings yet
OOSE Lab Record - 2025 - IT Dept
122 pages
Denuvo - Ticket - 2025 11 16 - 21 59 36
No ratings yet
Denuvo - Ticket - 2025 11 16 - 21 59 36
1 page
HCI Proposal
No ratings yet
HCI Proposal
5 pages
E Catalog Platinous PDF
No ratings yet
E Catalog Platinous PDF
8 pages
Integrating Remote Sensing, Image Processing and Machine Learning Algorithms For The Recognition and Counting of Agave Plants
No ratings yet
Integrating Remote Sensing, Image Processing and Machine Learning Algorithms For The Recognition and Counting of Agave Plants
9 pages
ME Project Work
No ratings yet
ME Project Work
2 pages
Google Sketchup Hotkeys PDF
No ratings yet
Google Sketchup Hotkeys PDF
3 pages
OpenSAP Btpt1 Week 1 Transcript en
No ratings yet
OpenSAP Btpt1 Week 1 Transcript en
23 pages
Job Letter Sample PDF Download
100% (2)
Job Letter Sample PDF Download
5 pages
Unifast Application Form
No ratings yet
Unifast Application Form
1 page
Bapi
No ratings yet
Bapi
2 pages
DraftSight 2017 SP3 System Requirements v1
No ratings yet
DraftSight 2017 SP3 System Requirements v1
1 page
Toshiba Drivve - Solutions Overview
No ratings yet
Toshiba Drivve - Solutions Overview
16 pages
Docker Flash Card
No ratings yet
Docker Flash Card
12 pages
8086 Microprocessor Interrupts Explained
No ratings yet
8086 Microprocessor Interrupts Explained
4 pages
CS621 FT Highlighted by Vaniza
No ratings yet
CS621 FT Highlighted by Vaniza
111 pages
Digital Art Brush Installation Guide
No ratings yet
Digital Art Brush Installation Guide
3 pages
Network Security Analyst Steps
No ratings yet
Network Security Analyst Steps
15 pages
Staad Examples
100% (1)
Staad Examples
35 pages
Unit 1 - Sic 1
No ratings yet
Unit 1 - Sic 1
11 pages
C Programs for Stack, Queue, SLL, DLL
No ratings yet
C Programs for Stack, Queue, SLL, DLL
56 pages
Microsoft Publisher Activities
No ratings yet
Microsoft Publisher Activities
1 page
Mca Java Lab Manual Lab 2nd Sem
No ratings yet
Mca Java Lab Manual Lab 2nd Sem
19 pages

Machine Learning Lab: Regression Analysis

Uploaded by

Machine Learning Lab: Regression Analysis

Uploaded by

Lab 3 Machine Learning

Step 1: Importing the required Packages

Step 2: Cleaning the data

Step 3: Using a heatmap

Method 1: OLS (Ordinary Least Squares)

from statsmodels.formula.api import ols

housing_model = ols("SalePrice ~ OverallQual",

from sklearn.linear_model import LinearRegression

Similar to the Adj R-squared value seen in OLS

Similar was done with “GrLivArea"

You might also like