
Maths

Uploaded by

kushalabrijesh
Copyright © All Rights Reserved

Topic

PREDICTING POSSIBLE LOAN DEFAULTERS
OUR TEAM

KUSHALA B GOWDA   1KS23AI023   Coding and Data Analysis
MANISHA T P       1KS23AI028   Coding and Data collection
ANUSHA C          1KS23AI003   Presentation Layout and Design
LIKHITHA M        1KS23AI025   Report and editing
SOUJANYA                       Presentation, Coding, Typing

Objectives
The program aims to predict the likelihood of
loan defaults among borrowers using statistical,
probabilistic, and machine learning techniques.

This helps financial institutions make informed
lending decisions and manage risk effectively.

Language
The programming language used is Python.

LOAN
In the last class we discussed:

1. PROBLEMS FACED BY BANKS
2. SOLUTIONS
3. MATHEMATICAL TOOLS
* STATISTICS
* GRAPHS
* PROBABILITY
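
As an illustration of the probability tool listed above, the conditional probability of default within a group can be estimated as the number of defaulters in that group divided by the group size. The sketch below is only an example: the records are made-up values rather than rows from the project dataset, and `default_rate_by_group` is a hypothetical helper name.

```python
from collections import defaultdict

def default_rate_by_group(records, group_key, flag_key="Risk_Flag"):
    """Estimate P(default | group) as defaulters / total for each group value."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        defaults[group] += row[flag_key]  # flag is 1 for a defaulter, 0 otherwise
    return {group: defaults[group] / totals[group] for group in totals}

# Hypothetical records for illustration only (not taken from the dataset).
records = [
    {"House_Ownership": "rented", "Risk_Flag": 1},
    {"House_Ownership": "rented", "Risk_Flag": 0},
    {"House_Ownership": "rented", "Risk_Flag": 0},
    {"House_Ownership": "rented", "Risk_Flag": 1},
    {"House_Ownership": "owned", "Risk_Flag": 0},
    {"House_Ownership": "owned", "Risk_Flag": 0},
]

rates = default_rate_by_group(records, "House_Ownership")
print(rates)
```

With these made-up records, two of the four renters default, so the estimated rate for "rented" is 0.5, while no owner defaults.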

SAMPLE DATA

ID  Income   Age  Experience  Married/Single  House_Ownership  Car_Ownership  Profession        CITY         STATE           CURRENT_JOB_YRS  CURRENT_HOUSE_YRS
1   7393090  59   19          single          rented           no             Geologist         Malda        West Bengal     4                13
2   1215004  25   5           single          rented           no             Firefighter       Jalna        Maharashtra     5                10
3   8901342  50   12          single          rented           no             Lawyer            Thane        Maharashtra     9                14
4   1944421  49   9           married         rented           yes            Analyst           Latur        Maharashtra     3                12
5   13429    25   18          single          rented           yes            Comedian          Berhampore   West Bengal     13               11
6   3437621  78   14          single          rented           no             Economist         Ramgarh      Jharkhand       3                10
7   5101498  55   0           married         rented           no             Artist            Pallavaram   Tamil Nadu      0                14
8   6716946  70   15          single          rented           yes            Flight attendant  Yamunanagar  Haryana         14               13
9   8369802  43   7           single          rented           no             Secretary         Anand        Gujarat         6                13
10  9565457  65   5           single          rented           yes            Engineer          Nandyal      Andhra Pradesh  3                12
CODE
LANGUAGE -->PYTHON
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import os
# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

import matplotlib.pyplot as plt


import seaborn as sns
%matplotlib inline

sns.set_theme(style = "darkgrid")

data = pd.read_csv("/kaggle/input/loan-prediction-based-on-customer-behavior/Training Data.csv")


data.head()

rows, columns = data.shape  # understanding the data set

print('Rows:', rows)
print('Columns:', columns)

data.info()
data.isnull().sum()

data.columns

data.describe() #Analysing Numerical columns

data.select_dtypes(include="number").corr()  # correlation of the numeric columns only

data.hist( figsize = (22, 20) )


plt.show()

data["Risk_Flag"].value_counts()
fig, ax = plt.subplots( figsize = (12,8) )
corr_matrix = data.select_dtypes(include="number").corr()  # numeric columns only
corr_heatmap = sns.heatmap( corr_matrix, cmap = "flare", annot=True, ax=ax, annot_kws={"size": 14})
plt.show()

def categorical_valcount_hist(feature):  # Analysing the categorical features
    # Print the category counts, then draw them as a bar chart
    print(data[feature].value_counts())
    fig, ax = plt.subplots(figsize=(6, 6))
    sns.countplot(x=feature, ax=ax, data=data)
    plt.show()

categorical_valcount_hist("Married/Single")
categorical_valcount_hist("House_Ownership")

print("Total categories in STATE:", len(data["STATE"].unique()))
print()
print(data["STATE"].value_counts())
print("Total categories in Profession:", len(data["Profession"].unique()))
print()
print(data["Profession"].value_counts())

data.info() #Data Analysis

# One boxplot per numeric feature, split by Risk_Flag; plt.show() keeps each plot on its own figure
for col in ["Income", "Age", "Experience", "CURRENT_JOB_YRS", "CURRENT_HOUSE_YRS"]:
    sns.boxplot(x="Risk_Flag", y=col, data=data)
    plt.show()

fig, ax = plt.subplots(figsize=(8, 6))
sns.countplot(x='House_Ownership', hue='Risk_Flag', ax=ax, data=data)

fig, ax = plt.subplots(figsize=(8, 6))
sns.countplot(x='Car_Ownership', hue='Risk_Flag', ax=ax, data=data)

fig, ax = plt.subplots(figsize=(8, 6))
sns.countplot(x='Married/Single', hue='Risk_Flag', ax=ax, data=data)

fig, ax = plt.subplots(figsize=(10, 8))
sns.boxplot(x="Risk_Flag", y="CURRENT_JOB_YRS", hue='House_Ownership', ax=ax, data=data)

#Feature Engineering
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
import category_encoders as ce

data.info()

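label_encoder = LabelEncoder()
for col in ['Married/Single', 'Car_Ownership']:
    data[col] = label_encoder.fit_transform(data[col])

# One-hot encode House_Ownership into one column per category
onehot_encoder = OneHotEncoder(sparse_output=False)  # scikit-learn >= 1.2; older releases use sparse=False
house_encoded = onehot_encoder.fit_transform(data[['House_Ownership']])
house_columns = onehot_encoder.get_feature_names_out(['House_Ownership'])
data = data.join(pd.DataFrame(house_encoded, columns=house_columns, index=data.index))
data = data.drop(columns='House_Ownership')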

high_card_features = ['Profession', 'CITY', 'STATE']


count_encoder = ce.CountEncoder()

# Transform the features, rename the columns with the _count suffix, and join to dataframe
count_encoded = count_encoder.fit_transform( data[high_card_features] )
data = data.join(count_encoded.add_suffix("_count"))
data.head()
data= data.drop(labels=['Profession', 'CITY', 'STATE'], axis=1)
data.head()
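
The count encoding used above for Profession, CITY and STATE replaces each category with the number of times it occurs in the column. A minimal sketch of that idea in plain Python follows; it is not the `category_encoders` implementation, `count_encode` is a hypothetical helper, and the values are made up for illustration.

```python
from collections import Counter

def count_encode(values):
    """Replace each category with the number of times it occurs in `values`."""
    counts = Counter(values)
    return [counts[v] for v in values]

# Hypothetical STATE column for illustration only.
states = ["Maharashtra", "West Bengal", "Maharashtra", "Gujarat", "Maharashtra"]
encoded = count_encode(states)
print(encoded)  # [3, 1, 3, 1, 3]
```

Frequent categories get large values and rare ones small values, which keeps a high-cardinality column as a single numeric feature.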

#Splitting the data into train and test splits
x = data.drop("Risk_Flag", axis=1)
y = data["Risk_Flag"]

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, stratify=y, random_state=7)
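
The `stratify = y` argument in the split above asks for the same share of defaulters in the train and test parts. A rough sketch of that idea is shown below; it is a simplified stand-in rather than scikit-learn's algorithm, `stratified_split` is a hypothetical helper, and the labels are made up.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=7):
    """Return (train_idx, test_idx) keeping each class's share roughly equal in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # shuffle within the class before cutting
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 40 non-defaulters (0) and 10 defaulters (1).
labels = [0] * 40 + [1] * 10
train_idx, test_idx = stratified_split(labels)
test_defaulters = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_defaulters)
```

Each class is split separately, so the 20% defaulter share of the full list is preserved in the test part.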
#Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

rf_clf = RandomForestClassifier(criterion='gini', bootstrap=True, random_state=100)


smote_sampler = SMOTE(random_state=9)
pipeline = Pipeline(steps = [['smote', smote_sampler], ['classifier', rf_clf]])
pipeline.fit(x_train, y_train)
y_pred = pipeline.predict(x_test)
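
The pipeline above oversamples the minority class with SMOTE before fitting the random forest. The core idea of SMOTE is to create a synthetic point on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows only that interpolation step under simplifying assumptions; it is not the `imblearn` implementation, `smote_like_sample` is a hypothetical helper, and the points are made up.

```python
import math
import random

def smote_like_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Nearest neighbours of `base` among the other minority points.
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]

# Hypothetical minority-class points (two features each), for illustration only.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [3.0, 3.0]]
synthetic = smote_like_sample(minority, rng=random.Random(9))
print(synthetic)
```

Because the new point is an interpolation, it always lies within the range spanned by existing minority samples.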
from sklearn.metrics import (confusion_matrix, precision_score, recall_score, f1_score,
                             accuracy_score, roc_auc_score)

# Report each test metric as a percentage rounded to 4 decimal places
print("-------------------------TEST SCORES-----------------------")
print(f"Recall: {round(recall_score(y_test, y_pred) * 100, 4)}")
print(f"Precision: {round(precision_score(y_test, y_pred) * 100, 4)}")
print(f"F1-Score: {round(f1_score(y_test, y_pred) * 100, 4)}")
print(f"Accuracy score: {round(accuracy_score(y_test, y_pred) * 100, 4)}")
print(f"AUC Score: {round(roc_auc_score(y_test, y_pred) * 100, 4)}")
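
Recall, precision, F1 and accuracy above all follow from the four confusion-matrix counts (true/false positives and negatives). The sketch below computes them by hand on made-up labels to show the formulas; `classification_scores` is a hypothetical helper, not a scikit-learn function.

```python
def classification_scores(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy from binary labels (1 = defaulter)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-defaulters wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaulters correctly passed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical true and predicted labels for illustration only.
y_true_demo = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred_demo = [1, 0, 0, 1, 0, 1, 0, 1]
scores = classification_scores(y_true_demo, y_pred_demo)
print(scores)
```

In this example there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four scores come out to 0.75.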
Reference

YOUTUBE CHANNELS
@NYCDataScienceAcademy
@PyDataTV
WEBSITES
global.pydata.org
numfocus.org
https://www.kaggle.com/

THANK YOU
