1. Read the column description and ensure you understand each attribute well
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Load the dataset and inspect its structure
data = pd.read_csv('Bank_Personal_Loan_Modelling.csv')
data.head()
data.dtypes
data.shape
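Before plotting, a quick data-quality check helps in understanding each attribute; a minimal sketch, assuming the standard columns of this dataset (it also flags the negative Experience entries that are corrected in step 5):
# Summary statistics for every attribute
print(data.describe().T)
# Missing values per column
print(data.isnull().sum())
# Count of negative Experience values (data-entry errors, corrected in step 5)
print((data['Experience'] < 0).sum())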
2. Perform univariate analysis of each and every attribute - use an appropriate plot
for a given attribute and mention your insights
sns.histplot(x=data['Age'], hue =data['Personal Loan'], bins = 5 )
sns.countplot(x=data['Family'], hue=data['Personal Loan'])
sns.countplot(x=data['Education'], hue=data['Personal Loan'])
sns.countplot(x=data['Online'], hue=data['Personal Loan'])
sns.countplot(x=data['Securities Account'], hue=data['Personal Loan'])
sns.countplot(x=data['CD Account'], hue=data['Personal Loan'])
sns.countplot(x=data['CreditCard'], hue=data['Personal Loan'])
sns.histplot(x=data['Income'], hue=data['Personal Loan'], hue_order = [1,0])
Inference:
i. Customers in the 35-45 age group take the maximum number of personal loans.
ii. The bank has the most customers with a family size of 1, but this group has the lowest
personal loan count, whereas customers with families of 3 and 4 members have taken a
greater number of personal loans.
iii. More customers with advanced/professional education take personal loans compared to
customers with undergraduate or graduate education.
iv. A greater number of customers who use internet banking avail personal loans than those
who do not, although the percentage for both groups is approximately the same (see the
check after this list).
v. Customers without a securities account with the bank avail more personal loans.
vi. A greater number of customers without a CD account use personal loans, although a
higher percentage of customers with a CD account do so.
vii. A higher number of customers without credit cards avail personal loans.
viii. The count of people who have taken a personal loan is highest in the income range of
$120,000 – $140,000 and rises again in the $170,000 – $190,000 range. Lower income
groups have not taken personal loans.
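The percentage claims in points iv and vi can be verified directly with row-normalised crosstabs; a minimal sketch, assuming the column names used above:
# Loan take-up rate within each Online banking group
print(pd.crosstab(data['Online'], data['Personal Loan'], normalize='index'))
# Loan take-up rate within each CD Account group
print(pd.crosstab(data['CD Account'], data['Personal Loan'], normalize='index'))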
3. Perform correlation analysis among all the variables - you can use Pairplot and
Correlation coefficients of every attribute with every other attribute
datacor = data.corr()
datacor
sns.pairplot(data, diag_kind='kde')
plt.subplots(figsize=(12,10))
sns.heatmap(datacor,annot=True)
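To read the heatmap more easily, the correlation of every attribute with the target can be pulled out and ranked; a minimal sketch, assuming the Personal Loan column name:
# Rank attributes by the strength of their correlation with Personal Loan
loan_corr = datacor['Personal Loan'].drop('Personal Loan')
print(loan_corr.reindex(loan_corr.abs().sort_values(ascending=False).index))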
4. One hot encode the Education variable (3 points)
OHED = pd.get_dummies(data, columns =['Education'])
OHED
5. Separate the data into dependent and independent variables and create training
and test sets out of them (X_train, y_train, X_test, y_test) (2 points)
Before separating, the redundant columns are dropped and the one-hot-encoded Education
columns are used in place of the original Education column.
Here, the dependent variable and variable of interest is Personal Loan; therefore, the
Personal Loan column becomes the target (y) and the remaining columns are treated as
independent variables (X).
# Correct negative Experience values (applied to the encoded frame so the fix carries into dt)
OHED['Experience'] = OHED['Experience'].abs()
dt = OHED
# Drop identifier columns that carry no predictive information
dt = dt.drop(['ID'], axis=1)
dt = dt.drop(['ZIP Code'], axis=1)
X = dt.drop(['Personal Loan'], axis=1)
Y = dt['Personal Loan']
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.7, test_size=0.3, random_state=100)
X_train
Y_train
print("Training set: {0:0.2f}% ".format((len(X_train)/len(data.index)) * 100))
print("Test set: {0:0.2f}% ".format((len(X_test)/len(data.index)) * 100))
6. Use StandardScaler( ) from sklearn, to transform the training and test data into
scaled values ( fit the StandardScaler object to the train data and transform
train and test data using this object, making sure that the test set does not
influence the values of the train set)
from sklearn.preprocessing import StandardScaler
std_scaler = StandardScaler()
# Fit the scaler on the training data only, then apply the same transformation to the test data
X_train = std_scaler.fit_transform(X_train)
X_test = std_scaler.transform(X_test)
X_train
X_test
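The same leakage-free scaling can also be expressed as a single estimator; a minimal sketch using scikit-learn's Pipeline (the LogisticRegression step is only illustrative, and in practice the pipeline would be fitted on the unscaled X_train):
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# The pipeline fits the scaler on the training data only, so the test set never
# influences the scaling parameters
pipe = Pipeline([
    ('scale', StandardScaler()),
    ('clf', LogisticRegression(solver='liblinear')),
])
pipe.fit(X_train, Y_train)
print(pipe.score(X_test, Y_test))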
7. Write a function which takes a model, X_train, X_test, y_train and y_test as
input and returns the accuracy, recall, precision, specificity, f1_score of the
model trained on the train set and evaluated on the test set
def funct(confusion_matrix):
    total = sum(sum(confusion_matrix))
    accuracy = (confusion_matrix[0, 0] + confusion_matrix[1, 1]) / total
    print('Accuracy : {:.2%}'.format(accuracy))
    specificity = confusion_matrix[0, 0] / (confusion_matrix[0, 0] + confusion_matrix[0, 1])
    print('Specificity : {:.2%}'.format(specificity))
    sensitivity = confusion_matrix[1, 1] / (confusion_matrix[1, 0] + confusion_matrix[1, 1])
    print('Sensitivity : {:.2%}'.format(sensitivity))
    precision = confusion_matrix[1, 1] / (confusion_matrix[1, 1] + confusion_matrix[0, 1])
    print('Precision : {:.2%}'.format(precision))
    F1 = 2 * ((precision * sensitivity) / (precision + sensitivity))
    print('F1 Score : {:.2%}'.format(F1))
    return accuracy, sensitivity, specificity, precision, F1
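The question asks for a function that takes the model and the data splits directly; a minimal sketch of a hypothetical wrapper, evaluate_model, that trains the model, builds the confusion matrix and reuses funct for the metrics:
from sklearn.metrics import confusion_matrix

def evaluate_model(model, X_train, X_test, y_train, y_test):
    # Train on the training set and evaluate on the test set
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    # funct returns (accuracy, sensitivity, specificity, precision, F1)
    accuracy, recall, specificity, precision, f1 = funct(cm)
    return accuracy, recall, precision, specificity, f1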
8. Employ multiple Classification models (Logistic, K-NN, Naïve Bayes etc) and use
the function from step 7 to train and get the metrics of the model
Logistic Regression:
# Import libraries
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
# Model
Log_model = LogisticRegression(solver="liblinear")
Log_model.fit(X_train, Y_train)
predict = Log_model.predict(X_test)
coef_df = pd.DataFrame(Log_model.coef_)
coef_df['intercept'] = Log_model.intercept_
print(coef_df)
# Score
score = Log_model.score(X_test, Y_test)
print(score)
# Confusion matrix
from sklearn.metrics import confusion_matrix
conf_mat_Log = confusion_matrix(Y_test, predict)
print(conf_mat_Log)
test_mat = funct(conf_mat_Log)
Naïve Bayes
# Import required libraries
from sklearn.naive_bayes import GaussianNB
# Model
GNB1 = GaussianNB()
GNB1.fit(X_train, Y_train)
predicted_labels_GNB = GNB1.predict(X_test)
GNB1.score(X_test, Y_test)
# Confusion matrix
con_mat = metrics.confusion_matrix(Y_test, predicted_labels_GNB)
print(con_mat)
u = funct(con_mat)
K-NN
# Import required libraries
from sklearn.neighbors import KNeighborsClassifier
# Model
NNH = KNeighborsClassifier(n_neighbors= 5 , weights = 'uniform' )
NNH.fit(X_train, Y_train)
predicted_labels_KNN = NNH.predict(X_test)
NNH.score(X_test, Y_test)
# Create confusion matrix
Conf_matrix_KNN = metrics.confusion_matrix(Y_test, predicted_labels_KNN)
print(Conf_matrix_KNN)
tests = funct(Conf_matrix_KNN)
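The choice of n_neighbors = 5 above can be sanity-checked by scanning a few values of k with cross-validation on the training set; a minimal sketch, assuming 5-fold cross-validation (the range of k is only illustrative):
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
# Compare mean 5-fold CV accuracy on the training set for odd values of k
for k in range(1, 16, 2):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, Y_train, cv=5)
    print('k = {:2d}: mean CV accuracy = {:.4f}'.format(k, scores.mean()))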
9. Create a dataframe with the columns - “Model”, “accuracy”, “recall”,
“precision”, “specificity”, “f1_score”. Populate the dataframe accordingly
df = pd.DataFrame({'Model': ['Logistic', 'Naive Bayes', 'KNN'],
                   'accuracy': [95.47, 90.53, 95.60],
                   'recall': [63.29, 62.03, 62.03],
                   'precision': [90.91, 54.44, 94.23],
                   'specificity': [99.25, 93.89, 99.55],
                   'f1_score': [74.63, 57.99, 74.81]
                   })
df
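Rather than typing the percentages in by hand, the dataframe can also be populated from the tuples returned by funct in step 8; a minimal sketch, assuming the variables test_mat, u and tests defined above:
# funct returns (accuracy, sensitivity, specificity, precision, F1) as fractions
rows = []
for name, m in [('Logistic', test_mat), ('Naive Bayes', u), ('KNN', tests)]:
    accuracy, recall, specificity, precision, f1 = m
    rows.append({'Model': name,
                 'accuracy': round(accuracy * 100, 2),
                 'recall': round(recall * 100, 2),
                 'precision': round(precision * 100, 2),
                 'specificity': round(specificity * 100, 2),
                 'f1_score': round(f1 * 100, 2)})
df = pd.DataFrame(rows)
df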
10. Give your reasoning on which is the best model in this case
In this case, Naïve Bayes is the worst-performing model, with the lowest accuracy, F1 score,
precision and specificity.
K-NN has the highest accuracy (95.60%), specificity (99.55%), precision (94.23%)
and F1 score (74.81%). Although logistic regression has a slightly higher recall (63.29%)
than K-NN, K-NN still shows an acceptable recall of 62.03%.
Therefore, based on the metrics above, K-NN is the best model for the given bank data.