UNIVERSITY OF ENGINEERING AND TECHNOLOGY, TAXILA
FACULTY OF TELECOMMUNICATION AND INFORMATION ENGINEERING
COMPUTER ENGINEERING DEPARTMENT
Machine Learning
Logistic Regression
Dated: 29th Jan, 2024 to 2nd Feb, 2024
Semester: 2024
Lab Instructor: Sheharyar Khan
Objectives:
The objectives of this session are:
1. Train/Test split
2. Saving and loading a trained model
3. Pickle and sklearn joblib
4. Logistic Regression
What is Train/Test
Train/Test is a method to measure the accuracy of your model.
It is called Train/Test because you split the data set into two sets: a training set and a testing set.
A common split is 80% for training and 20% for testing.
You train the model using the training set.
You test the model using the testing set.
Start With a Data Set
Start with a data set you want to test.
Our data set describes 100 customers in a shop and their shopping habits.
import numpy
import matplotlib.pyplot as plt

numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)         # minutes spent in the shop before a purchase
y = numpy.random.normal(150, 40, 100) / x  # money spent on the purchase

plt.scatter(x, y)
plt.show()
Split Into Train/Test
The training set should be a random selection of 80% of the original data.
The testing set should be the remaining 20%.
# The data was generated in random order, so slicing the first
# 80 points gives an effectively random 80/20 split here.
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
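For real data sets, where the rows may be ordered, a shuffled split is safer. A minimal sketch using scikit-learn's train_test_split as an alternative to the manual slicing above:

from sklearn.model_selection import train_test_split

# Shuffle and split: 80% training, 20% testing.
# random_state makes the split reproducible.
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=2)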
Display the Training Set
Display the same scatter plot with the training set:
plt.scatter(train_x, train_y)
plt.show()
Display the Testing Set
To make sure the testing set is not completely different, we will take a look at the testing set as
well.
plt.scatter(test_x, test_y)
plt.show()
Example
Draw a polynomial regression line through the data points:
import numpy
import matplotlib.pyplot as plt
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
myline = numpy.linspace(0, 6, 100)
plt.scatter(train_x, train_y)
plt.plot(myline, mymodel(myline))
plt.show()
The result supports the suggestion that the data set fits a polynomial regression, even though it would give us some strange results if we tried to predict values outside of the data set. Example: the line indicates that a customer spending 6 minutes in the shop would make a purchase worth 200. That is probably a sign of overfitting.
But what about the R-squared score? The R-squared score is a good indicator of how well the data set fits the model.
R2
Remember R2, also known as R-squared?
It measures the strength of the relationship between the x and y values, and the value ranges from 0 to 1, where 0 means no relationship and 1 means totally related.
The sklearn module has a method called r2_score() that will help us find this relationship.
In this case we would like to measure the relationship between the minutes a customer stays in
the shop and how much money they spend.
Example
How well does my training data fit in a polynomial regression?
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(train_y, mymodel(train_x))
print(r2)
Bring in the Testing Set
Now we have made a model that is OK, at least when it comes to training data.
Now we want to test the model with the testing data as well, to see if it gives us the same result.
Example
Let us find the R2 score when using testing data:
import numpy
from sklearn.metrics import r2_score
numpy.random.seed(2)
x = numpy.random.normal(3, 1, 100)
y = numpy.random.normal(150, 40, 100) / x
train_x = x[:80]
train_y = y[:80]
test_x = x[80:]
test_y = y[80:]
mymodel = numpy.poly1d(numpy.polyfit(train_x, train_y, 4))
r2 = r2_score(test_y, mymodel(test_x))
print(r2)
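A training score that is much higher than the testing score is the classic symptom of overfitting, so it helps to print both side by side. A minimal sketch reusing the variables defined above:

# Compare the fit on data the model has seen vs. data it has not.
print("train R2:", r2_score(train_y, mymodel(train_x)))
print("test R2: ", r2_score(test_y, mymodel(test_x)))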
Predict Values
Now that we have established that our model is OK, we can start predicting new values.
Example
How much money will a buying customer spend, if she or he stays in the shop for 5 minutes?
print(mymodel(5))
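To see the extrapolation problem mentioned earlier, evaluate the model outside the range of the observed data; a 4th-degree polynomial quickly produces implausible values there:

# Predictions at the edge of the data (6 minutes) and well
# beyond it (10 minutes) should be treated with suspicion.
print(mymodel(6))
print(mymodel(10))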
Save and Load trained Model
Solving a problem in ML typically consists of two steps. The first step is to train a model using your training dataset; the second step is to ask questions of the trained model, which, somewhat like a human brain, gives you the answers. The training dataset is often quite large, because as its size increases the model becomes more accurate. It is like football training: the more you train, the better you become at the game. When the training dataset is huge, often gigabytes in size, the training step becomes time-consuming. If you save the trained model to a file, you can later use that same model to make predictions, so you don't need to retrain it every time you want to ask a question.
Quick Task: Load Saved Model Example
Open the Linear Regression Python file predicting home prices and make the following changes.
In the above file, add a new code cell in Google Colab and write the code below:
import pickle

# Save the trained model to a file.
with open('model_pickle', 'wb') as file:
    pickle.dump(model, file)

# Load the model back; it can be used exactly like the original.
with open('model_pickle', 'rb') as file:
    mp = pickle.load(file)

mp.coef_
mp.intercept_
mp.predict([[5000]])
Save Trained Model Using joblib (Second Way to Save a Model)
import joblib  # in recent scikit-learn versions, sklearn.externals.joblib has been removed
joblib.dump(model, 'model_joblib')
Load Saved Model
mj = joblib.load('model_joblib')
mj.coef_
mj.intercept_
mj.predict([[5000]])
Question to think about: What is the difference between joblib and pickle?
Logistic Regression
Logistic regression aims to solve classification problems. It does this by predicting categorical
outcomes, unlike linear regression that predicts a continuous outcome.
In the simplest case there are two outcomes, which is called binomial; an example is predicting whether a tumor is malignant or benign. Other cases have more than two outcomes to classify, in which case it is called multinomial. A common example of multinomial logistic regression would be predicting the class of an iris flower among 3 different species.
Here we will be using basic logistic regression to predict a binomial variable. This means it has
only two possible outcomes.
How does it work?
In Python we have modules that will do the work for us. Start by importing the NumPy module.
import numpy
Store the independent variables in X.
Store the dependent variable in y.
Below is a sample dataset:
#X represents the size of a tumor in centimeters.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
#Note: X has to be reshaped into a column from a row for the LogisticRegression() function to work.
#y represents whether or not the tumor is cancerous (0 for "No", 1 for "Yes").
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
We will use a method from the sklearn module, so we will have to import that module as well:
from sklearn import linear_model
From the sklearn module we will use the LogisticRegression() method to create a logistic
regression object.
This object has a method called fit() that takes the independent and dependent values as
parameters and fills the regression object with data that describes the relationship:
logr = linear_model.LogisticRegression()
logr.fit(X,y)
Now we have a logistic regression object that is ready to predict whether a tumor is cancerous based on the tumor size:
#predict if tumor is cancerous where the size is 3.46cm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))
Example
import numpy
from sklearn import linear_model
#Reshaped for Logistic function.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
#predict if tumor is cancerous where the size is 3.46cm:
predicted = logr.predict(numpy.array([3.46]).reshape(-1,1))
print(predicted)
Coefficient
In logistic regression the coefficient is the expected change in log-odds of having the outcome
per unit change in X.
This does not have the most intuitive understanding so let's use it to create something that makes
more sense, odds.
Example
See the whole example in action:
import numpy
from sklearn import linear_model
#Reshaped for Logistic function.
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)
This tells us that as the size of a tumor increases by 1cm, the odds of it being a cancerous tumor increase by roughly 4x.
Probability
The coefficient and intercept values can be used to find the probability that each tumor is
cancerous.
Create a function that uses the model's coefficient and intercept values to return a new value. This new value represents the probability that the given tumor is cancerous:
def logit2prob(logr, x):
    log_odds = logr.coef_ * x + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
Function Explained
To find the log-odds for each observation, we must first create a formula that looks similar to the
one from linear regression, extracting the coefficient and the intercept.
log_odds = logr.coef_ * x + logr.intercept_
To then convert the log-odds to odds we must exponentiate the log-odds.
odds = numpy.exp(log_odds)
Now that we have the odds, we can convert it to probability by dividing it by 1 plus the odds.
probability = odds / (1 + odds)
Let us now use the function with what we have learned to find out the probability that each tumor is cancerous.
Example
See the whole example in action:
import numpy
from sklearn import linear_model
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
logr = linear_model.LogisticRegression()
logr.fit(X,y)
def logit2prob(logr, X):
    log_odds = logr.coef_ * X + logr.intercept_
    odds = numpy.exp(log_odds)
    probability = odds / (1 + odds)
    return probability
print(logit2prob(logr, X))
Results Explained
3.78 → 0.61: The probability that a tumor with the size 3.78cm is cancerous is 61%.
2.44 → 0.19: The probability that a tumor with the size 2.44cm is cancerous is 19%.
2.09 → 0.13: The probability that a tumor with the size 2.09cm is cancerous is 13%.
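Note that scikit-learn can produce these probabilities directly: predict_proba() returns, for each observation, the probability of each class, and its second column should match the output of logit2prob above. A minimal sketch reusing the fitted logr:

# Column 0: P(not cancerous), column 1: P(cancerous).
probs = logr.predict_proba(X)
print(probs[:, 1])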
Task 1
Find the 7_Logistic Regression Python file and upload it to Google Colab along with the data file Insurance.csv.
Run all code cells and write down your understanding of each along with its output.
Task 2
Download the employee retention dataset from here: https://www.kaggle.com/giripujar/hr-analytics.
1. Do some exploratory data analysis to figure out which variables have a direct and clear impact on employee retention (i.e. whether they leave the company or continue to work).
2. Plot bar charts showing the impact of employee salaries on retention.
3. Plot bar charts showing the correlation between department and employee retention.
4. Build a logistic regression model using the variables that were narrowed down in step 1 (a starting sketch follows this list).
5. Measure the accuracy of the model.
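A possible starting point for steps 4 and 5 is sketched below. The file name and column names (satisfaction_level, average_montly_hours, salary, left) are assumptions based on the HR Analytics dataset; verify them against the file you actually download.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed file and column names -- check the downloaded CSV.
df = pd.read_csv('HR_comma_sep.csv')

# One-hot encode the categorical salary column (low/medium/high).
features = pd.get_dummies(df[['satisfaction_level', 'average_montly_hours', 'salary']], columns=['salary'])
target = df['left']  # 1 = employee left, 0 = stayed

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on held-out data (step 5)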
Task 3: Run All Examples in the Lab Manual