Machine Learning Techniques for Predicting Credit Approvals

Prawar Mundra
2018IMG-037
Introduction

● The accurate assessment of consumer credit risk is of utmost importance for lending organizations.

● Credit scoring is a widely used technique that helps financial institutions evaluate the likelihood that a credit applicant will default on a financial obligation and decide whether or not to grant credit.

● The credit industry has experienced tremendous growth in the past few decades. The increased number of potential applicants has spurred the development of sophisticated techniques that automate the credit approval procedure and supervise the financial health of the borrower.
● In the last few decades, various quantitative methods have been proposed in the literature to evaluate consumer loans and improve credit scoring accuracy (for a review, see e.g. Crook et al., 2007).
● The goal of a credit scoring model is to classify credit applicants into two classes: the “good credit” class, which is likely to repay the financial obligation, and the “bad credit” class, which should be denied credit due to the high probability of defaulting on the financial obligation.
● The classification is contingent on the sociodemographic characteristics of the borrower (such as age, education level, occupation, and income), the repayment performance on previous loans, and the type of loan.
● This paper proposes a credit scoring model for consumer loans based on several analytical models.
Objective

● The aim of this paper is to build a machine learning model that can be used to aid credit card approval decisions, using data from the UCI (University of California, Irvine) Machine Learning Repository.

This analysis is organized as follows:


1. Generate several data visualizations to understand the underlying data;

2. Perform data transformations as needed;

3. Develop research questions about the data; and

4. Generate and apply the model to answer the research questions.


Exploratory Data Analysis

In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets in order to summarize their main characteristics, often with visual methods. A statistical model may or may not be used, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis-testing task.

EDA tackles specific tasks such as:

● Spotting mistakes and missing data
● Mapping out the underlying structure of the data
● Identifying the most important variables
● Listing anomalies and outliers
● Testing a hypothesis / checking assumptions related to a specific model

● Clustering and dimension-reduction techniques, which help you create graphical displays of high-dimensional data containing many variables
● Univariate visualization of each field in the raw dataset, with summary statistics
● Bivariate visualizations and summary statistics that allow you to assess the relationship between each variable in the dataset and the target variable you’re looking at
● Multivariate visualizations, for mapping and understanding interactions between different fields in the data
● K-means clustering (creating “centres” for each cluster, based on the nearest mean)
● Predictive models, e.g. linear regression
Dataset and codebook for credit approval data
● The first step in any analysis is to obtain the dataset and codebook. Both the dataset and the
codebook can be downloaded for free from the UCI website.
● Once the dataset is loaded, we’ll use the str() function to quickly understand the types of data in the dataset, as sketched below.
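A minimal sketch of both steps in R, assuming the standard crx.data file; the descriptive column names are an assumption (the raw file only labels the columns A1-A16), chosen to match the variables referenced later in this deck:

    # Read the raw UCI credit approval file; it ships without a header row
    url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data"
    credit <- read.csv(url, header = FALSE, stringsAsFactors = FALSE)

    # Assumed descriptive names; the codebook labels these A1 through A16
    names(credit) <- c("Male", "Age", "Debt", "Married", "BankCustomer",
                       "EducationLevel", "Ethnicity", "YearsEmployed",
                       "PriorDefault", "Employed", "CreditScore",
                       "DriversLicense", "Citizen", "ZipCode", "Income",
                       "Approved")

    # str() prints each column's type and a preview of its values
    str(credit)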
Data Transformations
The binary values, such as Approved, need to be converted to 1s and 0s. We’ll also need to do additional transformations, such as filling in missing values. That process begins by first identifying which values are missing and then determining the best way to address them: we can remove them, zero them out, or estimate a plug value. A scan through the dataset shows that missing values are labeled with ‘?’. For each variable, we’ll convert these to NA, which R interprets differently from an ordinary character value.
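A sketch of these transformations, assuming the data frame loaded above (the "+"/"-" coding of the raw Approved column comes from the UCI codebook):

    # Recode the binary target: "+" (approved) becomes 1, "-" (denied) becomes 0
    credit$Approved <- ifelse(credit$Approved == "+", 1, 0)

    # Replace every "?" placeholder with NA so R treats it as missing
    credit[credit == "?"] <- NA

    # Age was read as character because of the "?" entries; make it numeric
    # (the same conversion applies to any other column read as character)
    credit$Age <- as.numeric(credit$Age)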

Continuous Values:

We will use the summary() function to see descriptive statistics of the numeric values, such as the min, max, mean, and median. The range is the difference between the minimum and maximum values and can be calculated from the summary() output. For the B (Age) variable, the range is 66.5 and the standard deviation is 11.9667.
Missing Values:

One method is to check the relationships among the numeric values and use a linear regression to fill them in. The table below shows the correlations between all of the variables. The largest value in the first row is 0.396, meaning Age is most closely correlated with YearsEmployed. Similarly, Debt is most closely correlated with YearsEmployed.
We can use this information to create a linear regression model between the two variables. The model produces the two coefficients below: the intercept and the YearsEmployed coefficient. These coefficients are used to predict the missing values: the YearsEmployed coefficient is multiplied by the value of YearsEmployed, and the intercept is added.
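A sketch of this imputation, assuming the numeric column names used above; lm() silently drops the rows with a missing Age when fitting, and predict() then fills exactly those rows from their YearsEmployed values:

    # Pairwise correlations among the numeric variables, ignoring NAs
    num_cols <- c("Age", "Debt", "YearsEmployed", "CreditScore", "Income")
    cor(credit[, num_cols], use = "pairwise.complete.obs")

    # Regress Age on its most correlated partner, then plug in fitted values
    fit <- lm(Age ~ YearsEmployed, data = credit)
    missing_age <- is.na(credit$Age)
    credit$Age[missing_age] <- predict(fit, newdata = credit[missing_age, ])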
Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study. They provide
simple summaries about the sample and the measures. Together with simple graphics analysis, they
form the basis of virtually every quantitative analysis of data.
● First, we take the mean and standard deviation calculated above.
● Then we subtract the mean from each value and divide by the standard deviation. The end result is the z-score.
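A sketch of that standardization for Age; the AgeNorm name matches the variable that reappears in the logistic regression summary later in this deck:

    # z-score: subtract the mean, then divide by the standard deviation
    # (Age has no NAs left after the imputation above)
    age_mean <- mean(credit$Age)
    age_sd   <- sd(credit$Age)
    credit$AgeNorm <- (credit$Age - age_mean) / age_sd

    summary(credit$AgeNorm)   # now centered near 0 with unit variance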

We did similar transformations on the other continuous variables and then plotted them.
Categorical Variables (Association Rules)

A categorical variable is a discrete variable that captures qualitative outcomes by placing observations into fixed groups (or levels).

The data is distributed across the factors ‘1’ and ‘0’, plus 12 of the values are missing. Again, missing values will not work well in classifier models, so we’ll need to fill them in. The simplest way to do so is to use the most common value. For example, since the ‘0’ factor is the most common, we could replace all missing values with ‘0’, as sketched below.
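A sketch of that mode imputation, assuming Male is the 1/0-coded column with the 12 missing entries:

    # Count the occurrences of each level; table() excludes NAs by default
    tab <- table(credit$Male)
    mode_level <- names(tab)[which.max(tab)]   # "0" in the deck's recoding

    # Fill every missing value with the most common level
    credit$Male[is.na(credit$Male)] <- mode_level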

Generate Analytic Models


In order to prepare and apply a model to this dataset, we’ll first have to break it into two subsets. The first will be the training set, on which we will develop the model. The second will be the test set, which we will use to measure the accuracy of our model. We will allocate 75% of the items to the training set and 25% to the test set, as sketched below.
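A sketch of the split; the seed is an assumption, added only so the partition is reproducible, and the character columns are converted to factors so the models below treat them as categorical:

    # Convert remaining character columns to factors for modeling
    credit[] <- lapply(credit, function(x) if (is.character(x)) factor(x) else x)

    set.seed(123)                                   # assumed seed
    n <- nrow(credit)
    train_idx <- sample(n, size = floor(0.75 * n))  # 517 of the 690 rows
    train <- credit[train_idx, ]                    # 75% for model development
    test  <- credit[-train_idx, ]                   # 25% held out for testing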
Baseline
There are 517 applications in the training set, 287 (56%) of which were denied. Since more applications were denied than approved, our baseline model will predict that all applications are denied. This simple model would be correct 56% of the time, so our models have to be more accurate than 56% to add value to the business.
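The baseline can be read directly off the training labels:

    table(train$Approved)       # the deck reports 287 denied (0) vs. 230 approved (1)
    mean(train$Approved == 0)   # about 0.56: the accuracy bar any model must beat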

Logistic Regression - Create the Model


Regression models are useful for predicting continuous (numeric) variables. However, the target variable Approved is binary and can only take the values 1 or 0. We could use linear regression to predict the approval decision with a threshold, assigning anything below it to 0 and anything above it to 1. Unfortunately, the predicted values could fall well outside the expected 0-to-1 range, so linear or multivariate regression will not be effective for predicting these values. Instead, logistic regression is more useful because it produces the probability that the target value is 1. Probabilities are always between 0 and 1, so the output matches the target value range much more closely than linear regression does.
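A minimal sketch of the fit; the predictor list is an assumption (the deck does not spell out its exact formula), and 0.5 is the conventional classification threshold:

    # family = binomial makes glm() fit a logistic regression for P(Approved = 1)
    logit_fit <- glm(Approved ~ AgeNorm + Debt + YearsEmployed + PriorDefault +
                       Employed + CreditScore + Income,
                     data = train, family = binomial)
    summary(logit_fit)   # coefficients, p-values, and significance stars

    # Turn predicted probabilities into 1/0 classes at a 0.5 threshold
    train_prob <- predict(logit_fit, type = "response")
    train_pred <- ifelse(train_prob > 0.5, 1, 0)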
The model summary shows the p-values for each coefficient. Alongside these coefficients, the summary gives R’s usual at-a-glance scale of asterisks for significance.

Using this scale, we can see that the coefficients for AgeNorm and Debt3 are not significant. We could likely simplify the model by removing these two variables and get nearly the same accuracy.

The confusion matrix shows the distribution of actual values and predicted values.

Of the 517 observations, the model correctly predicted 398 approval decisions (249 + 149), or about 77% accuracy.
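That matrix and accuracy come straight out of table():

    # Rows are the actual classes, columns are the model's predictions
    conf <- table(actual = train$Approved, predicted = train_pred)
    conf

    sum(diag(conf)) / sum(conf)   # correct / total; about 0.77 in the deck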
Classification and Regression Tree - Create the Model
Classification and Regression Trees (CART) can be used for purposes similar to logistic regression: both can classify the items in a dataset into a binary class attribute. Trees work by splitting the dataset at a series of nodes that eventually segregate the data into the target classes. These models are sometimes referred to as decision trees because, at each node, the model determines which path an item should take. They have an advantage over logistic regression models in that the splits, or decisions, are more easily interpreted than a collection of numerical coefficients and log-odds scores.
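A minimal sketch using the rpart package, R's standard CART implementation; the formula mirrors the logistic model above and is likewise an assumption:

    library(rpart)

    # method = "class" grows a classification tree for the binary target
    cart_fit <- rpart(factor(Approved) ~ AgeNorm + Debt + YearsEmployed +
                        PriorDefault + Employed + CreditScore + Income,
                      data = train, method = "class")

    # Training-set predictions and the resulting confusion matrix
    cart_pred <- predict(cart_fit, train, type = "class")
    cart_conf <- table(actual = train$Approved, predicted = cart_pred)
    sum(diag(cart_conf)) / sum(cart_conf)   # about 0.861 in the deck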

The confusion matrix resulting from this CART model shows that we correctly classified 231 denied credit applications and 214 approved applications. The accuracy score for this model is 86.1%, which is better than the 77% accuracy the logistic regression model scored and significantly better than the baseline model.
Apply the Model
We’ll now apply our classifier model to the test dataset and determine how effective it is. Our confusion matrix shows that 144 items were correctly predicted, for 83% accuracy. We can see that this model is both more effective and easier to interpret than the logistic regression model.
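Scoring the held-out set follows the same pattern:

    # Predict on the unseen test rows and compare against the true labels
    test_pred <- predict(cart_fit, test, type = "class")
    test_conf <- table(actual = test$Approved, predicted = test_pred)
    test_conf

    sum(diag(test_conf)) / sum(test_conf)   # about 0.83 in the deck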

Conclusion and Future Enhancements


In this paper, data preprocessing and transformation techniques were applied and results were generated by implementing analytical models. The performance was analyzed using the confusion matrix. We can also use this model to make detailed testing selections: any credit application that does not have the same outcome as predicted by the model is a potential audit exception. The inherent risk is that a credit card was issued to someone who should have been denied; such an account is more likely to default than a properly approved account, which, in turn, exposes the company to loss. As a future enhancement, different machine learning models can be explored to further improve the prediction accuracy.
