Loan Prediction Using Machine Learning
Hemanth Kumar Sarisa, Varun Khurana, Venkat Chandu Koti and Neha Garg
Department of Computer Science & Engineering
Manav Rachna International Institute of Research and Studies, India
Abstract: Loan prediction is a significant problem in the banking industry, where loan applications are still assessed manually by bank employees. Applying machine learning to loan prediction can improve this process. A machine learning system checks an applicant's loan data quickly and without human error, so the data are processed faster and results are produced in less time. This benefits both applicants and bank employees and helps advance technology in the banking sector.

Machine learning algorithms can be used to analyze historical data and make accurate predictions about whether a loan application will be approved or not. In this study, we propose a machine learning-based loan prediction model that utilizes various features such as credit score, income, loan amount, and loan term. We explore different classification algorithms, including logistic regression, decision tree, and random forest, to build the loan prediction model. Our experimental results show that the random forest algorithm outperforms other models with an accuracy rate of 83.45%. The proposed model can help banks and financial institutions streamline their loan approval process, reduce manual effort, and minimize the risk of bad loans.

Keywords: Python, Machine Learning, Data Pre-Processing, Random Forest Classifier, Decision Tree, K-Neighbours Classifier

I. INTRODUCTION

Machine learning has become an increasingly popular tool for predicting loan outcomes in recent years. Lenders are always looking for ways to minimize risk and increase the chances of loan repayment. Machine learning models can be trained to analyze large datasets and predict whether a borrower will be able to repay a loan based on a variety of factors [1]. The goal of this research paper is to explore the use of machine learning algorithms in predicting loan outcomes. We analyze various factors that influence loan repayment, including credit score, income level, and loan amount [15]. We also evaluate the performance of different machine learning models in predicting loan repayment. This research has significant practical implications for lending institutions, as it can help them make more informed decisions about which loans to approve and which borrowers to lend to. We hope that this research will contribute to a better understanding of the potential of machine learning in the lending industry.

II. OBJECTIVE

The main objective of this research paper on loan prediction using machine learning is to develop a predictive model [9] that can accurately predict the likelihood of loan repayment based on a variety of factors. Specifically, we aim to:
1. Identify the most important factors that influence loan repayment, such as credit score, income level, loan amount, and loan term [4].
2. Collect and analyze a large dataset of loan applications and outcomes to train and test our machine learning model.
3. Evaluate the performance of different machine learning algorithms, such as logistic regression, decision trees, and neural networks, in predicting loan repayment.
4. Compare the accuracy of our machine learning model to traditional credit scoring methods, such as FICO scores.
5. Provide recommendations for lending institutions on how to use machine learning models to improve loan approval processes and minimize risk [7].
6. Ensure that the loan prediction model complies with legal and regulatory requirements related to lending practices and anti-discrimination laws.

Overall, the objective of this research is to demonstrate the potential of machine learning in the lending industry and its ability to improve loan decision-making processes.

III. LITERATURE REVIEW

Both micro and macro factors affect housing expenses. For this investigation, these components are divided into three crucial groups: state of being, thought, and region [2]. The size of the house, the number of rooms, the availability of a kitchen and parking space, the openness of the yard, the land and building area, and the age of the house are states of being that people can observe directly. Thought, by contrast, is an idea offered by architects to attract potential buyers, such as the possibility of a moderate home, strong and green conditions, and world-class conditions [3]. The location has a significant impact on how much a home costs.

Loan prediction through machine learning techniques has gained significant attention in the financial industry, owing to its potential to enhance credit risk assessment [6], streamline loan approval processes, and reduce default rates. Scholars have investigated diverse components influencing loan outcomes.

The analysis categorizes these components into three essential groups: state of being, thought, and territory. State of being refers to the intrinsic properties of loan applicants, including credit history, income, employment status, debt-to-income ratio, and loan purpose [4]. Thought encompasses factors offered by lenders to attract potential borrowers, such as attractive interest rates, flexible repayment terms, and personalized loan products [9]. Territory plays a vital role in shaping loan decisions, as it determines the economic environment, market conditions, collateral, and accessibility to public amenities. Proximity to schools, hospitals, shopping centres, and recreational areas influences loan eligibility and interest rates [10].
Authorized licensed use limited to: Manav Rachna International Institute of Research and Studies. Downloaded on March 22,2024 at 07:05:16 UTC from IEEE Xplore. Restrictions apply.
IV. DATASET
The data used for loan prediction are taken from a dataset whose variables were originally collected by a bank. The dataset contains attributes such as Loan ID, Gender, Married and Dependents [2], as shown in Figure 1. Each record describes a loan applicant. The machine learning algorithm uses these data to determine whether the applicant's loan should be approved or not [3].

The machine learning model uses this dataset as its training data. After the model has been trained, the new entries that applicants fill in at the time of applying act as test data. Based on these tests, the model predicts whether the applicant can repay the loan or not.
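The following is a minimal sketch of how records like these might be read and separated into applicant features and a target label. It uses Python's standard `csv` module. The layout is an assumption: `Loan_ID` identifies each application, and a `Loan_Status` column (Y/N) holds the outcome. These column names, the extra columns `ApplicantIncome`, `LoanAmount` and `Credit_History`, and the sample rows are all hypothetical and are not taken from the paper's dataset.

```python
import csv
import io

# Hypothetical sample in an assumed CSV layout; real data would be read from a file.
SAMPLE_CSV = """Loan_ID,Gender,Married,Dependents,ApplicantIncome,LoanAmount,Credit_History,Loan_Status
LP001,Male,No,0,5849,,1,Y
LP002,Male,Yes,1,4583,128,1,N
LP003,Female,Yes,0,3000,66,,Y
LP004,,No,2,2583,120,0,N
"""

TARGET = "Loan_Status"  # assumed name of the label column
ID_COLUMN = "Loan_ID"   # identifier only: carries no information about repayment


def load_records(text):
    """Parse CSV text into a list of dicts, turning empty cells into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (v if v != "" else None) for k, v in row.items()} for row in reader]


def split_features_target(records):
    """Drop the identifier column and separate applicant features from the label."""
    features, labels = [], []
    for row in records:
        labels.append(row[TARGET])
        features.append({k: v for k, v in row.items() if k not in (TARGET, ID_COLUMN)})
    return features, labels


records = load_records(SAMPLE_CSV)
X, y = split_features_target(records)
print(len(X), sorted(X[0]))
print(y)
```

Empty cells are kept as `None` rather than discarded, so that the pre-processing stage can decide whether to drop or fill them.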
Fig: 2 Steps for loan prediction
A. Data Collection
Data collection refers to the process of gathering
information and acquiring data from various sources. It
involves systematically gathering relevant data to address
specific research questions, support decision-making, or
analyze patterns and trends [8]. Effective data collection
requires careful planning, clear objectives, and appropriate
methodologies. It also involves ensuring the accuracy,
reliability, and ethical handling of data. Depending on the
nature of the data, considerations must be made to protect
privacy and comply with relevant data protection regulations.
We collected various datasets from GitHub and selected one that is suitable for loan prediction. This dataset has less scope for errors and variations [12].
B. Pre-Processing
Pre-processing refers to the steps taken to clean,
transform, and prepare data before analysis or modelling. It
involves various techniques and operations to ensure that the
data is in a suitable format for further analysis and to address
issues such as missing values, outliers, noise, or
inconsistencies. Pre-processing steps are highly dependent on
the specific characteristics of the dataset, the analysis goals,
and the modelling techniques to be applied. It is crucial to
carefully assess and understand the data to determine the
appropriate pre-processing steps required for a particular
analysis task.
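The sketch below shows two common pre-processing steps in plain Python: imputing missing values and encoding categorical attributes as integers. The column names, the choice of median imputation for numeric columns and mode imputation for categorical columns, and the sample rows are assumptions made for illustration. The paper mentions removing null values but does not specify the exact treatment; imputation is shown here as one alternative to dropping incomplete rows.

```python
from collections import Counter
from statistics import median

# Hypothetical applicant records with missing values (None), as they might look after loading.
rows = [
    {"Gender": "Male", "Married": "No", "ApplicantIncome": "5849", "LoanAmount": None},
    {"Gender": "Male", "Married": "Yes", "ApplicantIncome": "4583", "LoanAmount": "128"},
    {"Gender": "Female", "Married": "Yes", "ApplicantIncome": "3000", "LoanAmount": "66"},
    {"Gender": None, "Married": "No", "ApplicantIncome": "2583", "LoanAmount": "120"},
]
NUMERIC = ["ApplicantIncome", "LoanAmount"]  # assumed numeric columns
CATEGORICAL = ["Gender", "Married"]          # assumed categorical columns


def preprocess(rows):
    """Impute missing values and label-encode categorical columns.

    Numeric columns are converted to float and gaps are filled with the column median.
    Categorical gaps are filled with the most frequent value (mode), then each
    category is mapped to an integer code in sorted order.
    """
    cleaned = [dict(r) for r in rows]  # copy so the input rows stay untouched
    for col in NUMERIC:
        present = [float(r[col]) for r in cleaned if r[col] is not None]
        fill = median(present)
        for r in cleaned:
            r[col] = float(r[col]) if r[col] is not None else fill
    encoders = {}
    for col in CATEGORICAL:
        present = [r[col] for r in cleaned if r[col] is not None]
        mode = Counter(present).most_common(1)[0][0]
        encoders[col] = {c: i for i, c in enumerate(sorted(set(present)))}
        for r in cleaned:
            value = r[col] if r[col] is not None else mode
            r[col] = encoders[col][value]
    return cleaned, encoders


clean_rows, encoders = preprocess(rows)
print(clean_rows[0], encoders)
```

In practice, the medians, modes and category codes should be computed on the training split only and then reused on the test data. Otherwise, information from the test set leaks into training.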
Fig: 1 Dataset variables, their descriptions and data types
V. METHODOLOGY

The methodology gives a brief outline of the architecture of the proposed system. Processing follows a pipeline, in which each phase starts only after the previous one has been completed.

Figure 2 shows the steps used to predict whether a loan is approved for an applicant. First, the dataset is collected and pre-processed to clean the data and remove null values [11]. Next, the data are split into two parts: training data and test data. Different machine learning algorithms are then applied and their accuracy on the dataset is measured, followed by a comparative analysis of their accuracy rates.

C. Splitting Data into Train Set & Test Set
Splitting data into a train set and a test set is a common practice in data analysis and machine learning. The purpose of this split is to evaluate the performance of a model on unseen data and to detect overfitting, which occurs when a model performs well on the training data but poorly on new, unseen data. Splitting the data into train and test sets helps estimate the model's ability to generalize to new data. It allows for an unbiased evaluation of model performance and helps identify potential issues such as overfitting or underfitting.

D. Model Selection & Training
Model selection and training is a critical step in machine learning, in which an appropriate algorithm is chosen and trained on the dataset. Evaluation metrics should be chosen based on the problem type, the characteristics of the data, and the specific goals of the project. This means taking into account the context, the potential impact of false positives and false negatives, and any domain-specific requirements. Additionally, note that
evaluation should not be limited to a single metric. Instead, it should be a comprehensive analysis that considers multiple metrics and reflects a thorough understanding of the problem at hand.

E. Model Evaluation
Model evaluation is a crucial step in machine learning for assessing the performance and effectiveness of a trained model. It involves measuring how well the model generalizes to new, unseen data and how accurately it predicts the target variable. Several techniques and metrics are commonly used for model evaluation; the results in this study are reported as classification accuracy. Model selection and training is an iterative process that often involves experimentation, fine-tuning, and comparing different models to find the one that best suits the problem and the data.

F. Prediction
Prediction, in the context of data analysis and machine learning, refers to estimating or forecasting an unknown or future outcome from available data using a trained model. It involves using the trained model to make informed guesses or projections about what might happen in a given situation [5]. The accuracy of loan predictions depends on the quality and representativeness of the training data, the chosen machine learning algorithm, and the features used in the prediction model. Regular model evaluation and monitoring are necessary to maintain the model's performance and to adapt to changes in data patterns or application requirements [16].

VI. ALGORITHMS USED IN MACHINE LEARNING

B. Decision Tree
A decision tree is a supervised learning algorithm and another popular machine learning method. It is used for classification problems, such as loan prediction, as well as for regression problems [7]. A decision tree is a tree-like model in which each internal node represents a feature or attribute and each edge represents a decision or rule based on that feature, as shown in Figure 4. The goal of the decision tree is to split the dataset into increasingly homogeneous subsets until a stopping criterion is met [7].

Fig: 4 Decision Tree (a root decision node branching into two decision nodes, each of which branches into two leaf nodes)
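The notion of "increasingly homogeneous subsets" is usually made precise with an impurity measure. The sketch below assumes the Gini impurity criterion, which is one common choice; the paper does not state which criterion was used, and Quinlan's ID3 [7] uses information gain instead. It finds the threshold on a single numeric feature that minimizes the weighted impurity of the two child nodes. The feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0.0 means the subset is perfectly homogeneous."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split of one numeric feature.

    Samples with value <= threshold go to the left child, the rest to the right.
    If no split improves on the parent node, the threshold is None.
    """
    n = len(labels)
    best = (None, gini(labels))  # impurity of the unsplit parent node
    # The largest distinct value is skipped: it would leave the right child empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        if weighted < best[1]:
            best = (t, weighted)
    return best


# Hypothetical data: credit history flag (1 = meets guidelines) against loan status.
credit_history = [1, 1, 0, 1, 0, 1]
loan_status = ["Y", "Y", "N", "Y", "N", "N"]
print(gini(loan_status))
print(best_threshold(credit_history, loan_status))
```

A tree like the one in Figure 4 is built by applying this search recursively to each resulting subset, choosing the feature and threshold with the lowest weighted impurity at every node. Growth stops when a node is pure or when another stopping criterion, such as a maximum depth, is reached.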
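The splitting, training, evaluation and prediction steps of the methodology can be sketched end to end with a k-nearest-neighbours classifier, the algorithm family that gave the highest accuracy in this study. This is a from-scratch illustration in plain Python, not the implementation used to obtain the reported results. The `train_test_split` helper here is hand-written and hypothetical (not scikit-learn's function of the same name). The feature values, k = 3, the 25% test fraction and the random seed are also assumptions.

```python
import math
import random
from collections import Counter


def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle indices reproducibly and hold out a fraction of the samples for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def knn_predict(X_train, y_train, x, k=3):
    """Predict the majority label among the k training points nearest to x (Euclidean)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive="Y"):
    """Accuracy, plus precision and recall for the positive (approved) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical, already pre-processed features: (credit_history, income in thousands).
X = [(1, 5.8), (1, 4.6), (0, 3.0), (1, 2.6), (0, 6.0), (1, 3.5), (0, 2.2), (1, 4.1)]
y = ["Y", "Y", "N", "Y", "N", "Y", "N", "Y"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y)
y_hat = [knn_predict(X_tr, y_tr, x) for x in X_te]
print(evaluate(y_te, y_hat))
print(knn_predict(X_tr, y_tr, (1, 4.0)))  # prediction for a new applicant
```

Because k-NN relies on distances, features on very different scales (such as a 0/1 credit flag and an income figure) are normally standardized first. Reporting precision and recall alongside accuracy follows the point made under Model Selection & Training: in lending, false positives and false negatives carry different costs.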
VII. RESULTS

Various machine learning algorithms were used to obtain the loan prediction results. Macro and micro factors that affect loan prediction were considered in order to produce the desired result [13-14]. Data collection is carried out first. Data cleaning is then performed to make the data clean and error-free, after which data preparation is completed. The distribution of the data is then depicted through various graphs created with data visualization. Finally, the commercial costs of the homes were calculated precisely. This was possible because multiple regression methods were applied to our house pricing dataset to improve accuracy and produce better results, an improvement made possible by a straightforward stacking algorithm.

In addition to regression techniques, several classification algorithms were also considered: the Decision Tree algorithm, the Random Forest Classifier and the K-Neighbours Classifier. Table 1 shows the accuracy of each machine learning algorithm for loan prediction. It shows that the K-Neighbours Classifier performs better than the Random Forest and Decision Tree classifiers. The graph in Figure 5, plotted from the same values as Table 1, shows the accuracy (in percent) of each algorithm for loan prediction. The accuracy of the Random Forest Classifier is 76.66%, that of the Decision Tree is 67.52%, and that of the K-Neighbours Classifier is 80.83%.

TABLE 1
STATISTICS OF ML ALGORITHMS FOR LOAN PREDICTION

S.No.  Machine Learning Algorithm   Accuracy
1      Random Forest Classifier     76.66%
2      Decision Tree                67.52%
3      K-Neighbours Classifier      80.83%

Fig: 5 Performance Analysis of Algorithms (bar chart of accuracy, on an axis from 60% to 85%, for the Random Forest Classifier, Decision Tree and K-Neighbours Classifier)

VIII. CONCLUSION AND FUTURE SCOPE

The accuracy of prediction was calculated by applying various algorithms. For loan prediction, the Random Forest Classifier achieved an accuracy of 76.66%, the Decision Tree algorithm 67.52%, and the K-Neighbours Classifier 80.83%. Considering the applied algorithms and the accuracy achieved, the K-Neighbours Classifier performed best. The proposed framework can help banks and customers make decisions in less time.

In conclusion, machine learning algorithms have demonstrated their effectiveness in loan prediction tasks. The data were cleaned and then pre-processed. The comparative analysis presented in this research highlights the strengths and weaknesses of decision tree-based models, random forest classifiers, and K-nearest neighbours classifiers.

Future research should focus on exploring advanced techniques, such as deep learning models or hybrid approaches, and on incorporating alternative data sources to further enhance the accuracy and robustness of loan prediction models. Additionally, investigating the ethical considerations, fairness, and transparency of machine learning algorithms in loan decision-making is essential to ensure unbiased and responsible lending practices in the financial industry.

REFERENCES
[1] Arun, K., Ishan, G. and Sanmeet, K., (2016). Loan approval prediction based on machine learning approach. IOSR J. Comput. Eng, 18(3), pp.18-21.
[2] Bhattad, S., Bawane, S., Agrawal, S., Ramteke, U. and Ambhore, P.B., (2021). Loan Prediction using Machine Learning Algorithms. International Journal of Computer Science Trends and Technology, 9(3), pp.143-146.
Dua, D. and Graff, C., (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[3] DeNicola, L., (2023). "What Is A Good Credit Score." Web blog post. Score Advise (6 June 2023). Available at: https://www.novacredit.com/resources/what-is-a-good-credit-score [Accessed July 2023].
[4] Rahadi, R.A., Wiryono, S.K., Koesrindartoto, D.P. and Syamwil, I.B., (2015). Factors influencing the price of housing in Indonesia. International Journal of Housing Markets and Analysis, 8(2), pp.169-188.
[5] Arokianathan, P., Dinesh, V., Elamaran, B., Veluchamy, M. and Sivakumar, S., (2017), April. Automated toll booth and theft detection system. In 2017 IEEE Technological Innovations in ICT for Agriculture and Rural Development (TIAR) (pp. 84-88). IEEE. DOI: 10.1109/TIAR.2017.8273691
[6] Ereiz, Z., (2019), November. Predicting default loans using machine learning (OptiML). In 2019 27th Telecommunications Forum (TELFOR) (pp. 1-4). IEEE.
[7] Quinlan, J.R., (1986). Induction of decision trees. Machine learning, 1, pp.81-106.
[8] Rao, K.H., Srinivas, G., Damodhar, A. and Krishna, M.V., (2011). Implementation of anomaly detection technique using machine learning algorithms. International journal of computer science and telecommunications, 2(3), pp.25-31.
[9] Jain, M., Rajput, H., Garg, N. and Chawla, P., (2020). Prediction of House Pricing using Machine Learning with Python. In 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India (pp. 570-574). IEEE. DOI: 10.1109/ICESC48915.2020.9155839
[10] Fan, C., Cui, Z. and Zhong, X., (2018), February. House Prices Prediction with Machine Learning Algorithms. In Proceedings of the 2018 10th International Conference on Machine Learning and Computing (pp. 6-10). ACM.
[11] Bradley, A.P., (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern recognition, 30(7), pp.1145-1159.
[12] Liu, J., Ye, Y., Shen, C., Wang, Y. and Erdélyi, R., (2018). A New Tool for CME Arrival Time Prediction using Machine Learning Algorithms: CATPUMA. The Astrophysical Journal, 855(2), 109.
[13] Kadir, T. and Gleeson, F., (2018). Lung cancer prediction using machine learning and advanced imaging techniques. Translational Lung Cancer Research, 7(3), pp.304-312.
[14] Goyal, A. and Kaur, R., (2016). Accuracy prediction for loan risk using machine learning models. Int. J. Comput. Sci. Trends Technol, 4(1), pp.52-57.
[15] Sujatha, C.N., Gudipalli, A., Pushyami, B., Karthik, N. and Sanjana, B.N., (2021), November. Loan Prediction Using Machine Learning and Its Deployement On Web Application. In 2021 Innovations in Power and Advanced Computing Technologies (i-PACT) (pp. 1-7). IEEE.
[16] Singh, V., Yadav, A., Awasthi, R. and Partheeban, G.N., (2021), June. Prediction of modernized loan approval system based on machine learning approach. In 2021 International Conference on Intelligent Technologies (CONIT) (pp. 1-4). IEEE.