Example of Logistic Regression in Python
Last updated on July 4th, 2019
In this guide, I’ll show you an example of Logistic Regression in Python.
In general, binary logistic regression describes the relationship between a binary dependent
variable and one or more independent variables.
The binary dependent variable has two possible outcomes:
• ‘1’ for true/success; or
• ‘0’ for false/failure
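Under the hood, the model estimates the probability of the ‘1’ outcome by passing a linear combination of the independent variables through the logistic (sigmoid) function. Here is a minimal sketch of that idea, where the coefficient values are made up purely for illustration and are not taken from the model built later in this guide:
import numpy as np

def sigmoid(z):
    # maps any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# hypothetical coefficients: intercept b0 and one weight per feature
b0, b1, b2, b3 = -40.0, 0.05, 1.5, 1.0
gmat, gpa, work_experience = 690, 3.3, 3

# linear combination of the features, squashed into a probability of admission
z = b0 + b1 * gmat + b2 * gpa + b3 * work_experience
print(sigmoid(z))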
Let’s now see how to apply logistic regression in Python using a practical example.
Steps to Apply Logistic Regression in Python
Step 1: Gather your data
To start with a simple example, let’s say that your goal is to build a logistic regression model in
Python in order to determine whether candidates would get admitted to a prestigious
university.
Here, there are two possible outcomes: Admitted (represented by the value of ‘1’) vs.
Rejected (represented by the value of ‘0’).
You can then build a logistic regression in Python, where:
• The dependent variable represents whether a person gets admitted; and
• The 3 independent variables are the GMAT score, GPA, and years of work experience
This is what the dataset looks like (the same data is captured in a DataFrame in Step 3 below):
Note that the above dataset contains 40 observations. In practice, you’ll need a larger sample
size to get more accurate results.
Step 2: Import the needed Python packages
Before you start, make sure that the following packages are installed in Python:
• pandas – used to create the DataFrame to capture the dataset in Python
• sklearn – used to build the logistic regression model in Python
• seaborn – used to display the results via a Confusion Matrix
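If any of these packages are missing, one common way to install them (assuming you use pip) is from the command line; note that sklearn is installed under the name scikit-learn:
pip install pandas scikit-learn seaborn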
You’ll then need to import all the packages as follows:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
Step 3: Build a dataframe
For this step, you’ll need to capture the dataset (from Step 1) in Python. You can accomplish
this task using a pandas DataFrame:
import pandas as pd
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}
df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
print(df)
Alternatively, you could import the data into Python from an external file.
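For instance, if the same four columns were saved in a CSV file, a minimal sketch would look like the following (the file name candidates.csv is just a placeholder, not a file provided with this guide):
import pandas as pd

# hypothetical CSV file with the gmat, gpa, work_experience and admitted columns
df = pd.read_csv('candidates.csv')
print(df)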
Step 4: Create the logistic regression in Python
Now, set the independent variables (represented as X) and the dependent variable
(represented as y):
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
Then, apply train_test_split. For example, you can set the test size to 0.25, and therefore the
model testing will be based on 25% of the dataset, while the model training will be based on
75% of the dataset:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
Apply the logistic regression as follows:
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)
Then, use the code below to get the Confusion Matrix:
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)
For the final part, print the Accuracy:
print('Accuracy: ',metrics.accuracy_score(y_test, y_pred))
Putting all the code components together:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}
df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
#print(df)
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)
print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
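A quick side note: if you run the script above outside of a Jupyter notebook, the seaborn heatmap may not appear on its own. One way to handle this is to add matplotlib’s show() call at the end of the script:
import matplotlib.pyplot as plt

# add this after the sn.heatmap(...) line
plt.show()  # opens a window with the heatmap when running as a plain script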
Run the code in Python, and you’ll get the following Confusion Matrix (with an Accuracy of
0.8):
As can be observed from the matrix:
• TP = True Positives = 5
• TN = True Negatives = 3
• FP = False Positives = 2
• FN = False Negatives = 0
You can then also get the Accuracy using:
Accuracy = (TP+TN)/Total = (5+3)/10 = 0.8
The accuracy is therefore 80% for the test set.
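If you prefer to pull these counts programmatically instead of reading them off the heatmap, sklearn’s confusion_matrix function returns them directly. This is an optional sketch (the function is imported under the alias cm here so it doesn’t clash with the confusion_matrix DataFrame created earlier):
from sklearn.metrics import confusion_matrix as cm

# for binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = cm(y_test, y_pred).ravel()
print('Accuracy:', (tp + tn) / (tp + tn + fp + fn))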
Diving Deeper into the Results
Let’s now print two components in the Python code:
• print (X_test)
• print (y_pred)
Here is the code used:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}
df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)
print(X_test)  # test dataset (without the actual outcome)
print(y_pred)  # predicted values
Recall that our original dataset (from step 1) had 40 observations. Since we set the test size
to 0.25, then the confusion matrix displayed the results for a total of 10 records (=40*0.25).
Printing X_test displays those 10 test records (without the actual outcome), and printing y_pred displays the prediction made for each of them (where 1 = admitted, while 0 = rejected).
In the actual dataset (from Step 1), you’ll see that for the test data, we got the correct results 8
out of 10 times.
This matches the accuracy level of 80%.
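You can also verify this directly by comparing the actual and predicted values in code. This is an optional check, appended to the script above:
# count how many of the 10 test records were predicted correctly
correct = (y_test == y_pred).sum()
print(correct, 'correct predictions out of', len(y_test))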
Checking the Prediction for a New Set of Data
Let’s say that you have a new set of data, with 5 new candidates:
gmat gpa work_experience
590 2 3
740 3.7 4
680 3.3 6
610 2.3 1
710 3 5
Your goal is to use the existing logistic regression model to predict whether the new
candidates will get admitted.
The new set of data can then be captured in a second DataFrame called df2:
new_candidates = {'gmat': [590,740,680,610,710],
'gpa': [2,3.7,3.3,2.3,3],
'work_experience': [3,4,6,1,5]
}
df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work_experience'])
And here is the complete code to get the prediction for the 5 new candidates:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,7
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0
}
df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
new_candidates = {'gmat': [590,740,680,610,710],
'gpa': [2,3.7,3.3,2.3,3],
'work_experience': [3,4,6,1,5]
}
df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work_experience'])
y_pred = logistic_regression.predict(df2)
print(df2)
print(y_pred)
Run the code, and you’ll get the following prediction:
The first and fourth candidates are not expected to be admitted, while the other candidates
are expected to be admitted.
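If you also want the estimated probabilities behind these predictions (rather than just the 0/1 labels), scikit-learn’s predict_proba method can be used. This is an optional extension of the code above:
# each row holds two values: probability of '0' (rejected) and of '1' (admitted)
probabilities = logistic_regression.predict_proba(df2)
print(probabilities[:, 1])  # probability of being admitted for each new candidate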