0% found this document useful (0 votes)
19 views11 pages

Example of Logistic Regression in Python - Data To Fish

This document provides a tutorial on implementing logistic regression in Python, focusing on predicting university admissions based on candidates' GMAT scores, GPAs, and work experience. It outlines the steps to gather data, import necessary packages, build a dataframe, and apply the logistic regression model, including how to evaluate its accuracy using a confusion matrix. The tutorial concludes with an example of predicting outcomes for new candidates using the trained model.

Uploaded by

A MISHRA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
19 views11 pages

Example of Logistic Regression in Python - Data To Fish

This document provides a tutorial on implementing logistic regression in Python, focusing on predicting university admissions based on candidates' GMAT scores, GPAs, and work experience. It outlines the steps to gather data, import necessary packages, build a dataframe, and apply the logistic regression model, including how to evaluate its accuracy using a confusion matrix. The tutorial concludes with an example of predicting outcomes for new candidates using the trained model.

Uploaded by

A MISHRA
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

11/23/2019 Example of Logistic Regression in Python - Data to Fish

Data To Fish
Data Science Tutorials

L A S T U P D AT E D O N J U LY 4 T H , 2 0 1 9 ,

Example of Logistic Regression in Python


In this guide, I’ll show you an example of Logistic Regression in Python.

In general, a binary logistic regression describes the relationship between the dependent
binary variable and one or more independent variable/s.

The binary dependent variable has two possible outcomes:

• ‘1’ for true/success; or


• ‘0’ for false/failure

Let’s now see how to apply logistic regression in Python using a practical example.

Steps to Apply Logistic Regression in Python

Step 1: Gather your data


To start with a simple example, let’s say that your goal is to build a logistic regression model in
Python in order to determine whether candidates would get admitted to a prestigious
university.

Here, there are two possible outcomes: Admitted (represented by the value of ‘1’) vs.
Rejected (represented by the value of ‘0’).

https://datatofish.com/logistic-regression-python/ 1/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

You can then build a logistic regression in Python, where:

• The dependent variable represents whether a person gets admitted; and


• The 3 independent variables are the GMAT score, GPA and Years of work experience

This is how the dataset would look like:

https://datatofish.com/logistic-regression-python/ 2/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

Note that the above dataset contains 40 observations. In practice, you’ll need a larger sample
size to get more accurate results.

https://datatofish.com/logistic-regression-python/ 3/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

Step 2: Import the needed Python packages


Before you start, make sure that the following packages are installed in Python:

• pandas – used to create the DataFrame to capture the dataset in Python


• sklearn – used to build the logistic regression model in Python
• seaborn – used to display the results via a Confusion Matrix

You’ll then need to import all the packages as follows:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn

Step 3: Build a dataframe


For this step, you’ll need to capture the dataset (from step 1) in Python. You can accomplish
this task using pandas Dataframe:

import pandas as pd
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}

df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','


print (df)

Alternatively, you could import the data into Python from an external file.

https://datatofish.com/logistic-regression-python/ 4/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

Step 4: Create the logistic regression in Python


Now, set the independent variables (represented as X) and the dependent variable
(represented as y) :

X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']

Then, apply train_test_split. For example, you can set the test size to 0.25, and therefore the
model testing will be based on 25% of the dataset, while the model training will be based on
75% of the dataset:

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,rand

Apply the logistic regression as follows:

logistic_regression= LogisticRegression()
logistic_regression.fit(X_train,y_train)
y_pred=logistic_regression.predict(X_test)

Then, use the code below to get the Confusion Matrix:

confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], coln


sn.heatmap(confusion_matrix, annot=True)

For the final part, print the Accuracy:

print('Accuracy: ',metrics.accuracy_score(y_test, y_pred))

Putting all the code components together:


https://datatofish.com/logistic-regression-python/ 5/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7


'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}

df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','

#print (df)

X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,rand

logistic_regression= LogisticRegression()
logistic_regression.fit(X_train,y_train)
y_pred=logistic_regression.predict(X_test)

confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], coln


sn.heatmap(confusion_matrix, annot=True)

print('Accuracy: ',metrics.accuracy_score(y_test, y_pred))

Run the code in Python, and you’ll get the following Confusion Matrix (with an Accuracy of
0.8):

https://datatofish.com/logistic-regression-python/ 6/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

As can be observed from the matrix:

• TP = True Positives = 5
• TN = True Negatives = 3
• FP = False Positives = 2
• FN = False Negatives = 0

You can then also get the Accuracy using:

Accuracy = (TP+TN)/Total = (5+3)/10 = 0.8

The accuracy is therefore 80% for the test set.

Diving Deeper into the Results


Let’s now print two components in the python code:

• print (X_test)
• print (y_pred)

Here is the code used:

import pandas as pd
from sklearn.model_selection import train_test_split

https://datatofish.com/logistic-regression-python/ 7/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

from sklearn.linear_model import LogisticRegression

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7


'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}

df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','

X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,rand

logistic_regression= LogisticRegression()
logistic_regression.fit(X_train,y_train)
y_pred=logistic_regression.predict(X_test)

print (X_test) #test dataset (without the actual outcome)


print (y_pred) #predicted values

Recall that our original dataset (from step 1) had 40 observations. Since we set the test size
to 0.25, then the confusion matrix displayed the results for a total of 10 records (=40*0.25).
These are the 10 test records:

The prediction was also made for those 10 records (where 1 = admitted, while 0 = rejected):

https://datatofish.com/logistic-regression-python/ 8/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

In the actual dataset (from step-1), you’ll see that for the test data, we got the correct results 8
out of 10 times:

This is matching with the accuracy level of 80%

Checking the Prediction for a New Set of Data


Let’s say that you have a new set of data, with 5 new candidates:

gmat gpa work_experience

590 2 3

740 3.7 4

680 3.3 6

610 2.3 1

710 3 5

Your goal is to use the existing logistic regression model to predict whether the new
candidates will get admitted.

The new set of data can then be captured in a second DataFrame called df2:

https://datatofish.com/logistic-regression-python/ 9/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

new_candidates = {'gmat': [590,740,680,610,710],


'gpa': [2,3.7,3.3,2.3,3],
'work_experience': [3,4,6,1,5]
}

df2 = pd.DataFrame(new_candidates,columns= ['gmat', 'gpa','work_experien

And here is the complete code to get the prediction for the 5 new candidates:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7


'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}

df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','

X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,rand

logistic_regression= LogisticRegression()
logistic_regression.fit(X_train,y_train)

new_candidates = {'gmat': [590,740,680,610,710],


'gpa': [2,3.7,3.3,2.3,3],
'work_experience': [3,4,6,1,5]
}

https://datatofish.com/logistic-regression-python/ 10/11
11/23/2019 Example of Logistic Regression in Python - Data to Fish

df2 = pd.DataFrame(new_candidates,columns= ['gmat', 'gpa','work_experien


y_pred=logistic_regression.predict(df2)

print (df2)
print (y_pred)

Run the code, and you’ll get the following prediction:

The first and fourth candidates are not expected to be admitted, while the other candidates
are expected to be admitted.

PYTHON

Home » Blog » Example of Logistic Regression in Python

https://datatofish.com/logistic-regression-python/ 11/11

You might also like