Irjet V12i425

This study focuses on enhancing loan default prediction and fraud detection using ensemble learning techniques, specifically combining Random Forest, XGBoost, and Multi-Layer Perceptron models. The proposed system achieves a high accuracy of 92.59% in predicting loan defaults and effectively identifies ghost borrowers through a hybrid approach integrating temporal analysis and feature-based classification. By leveraging advanced machine learning algorithms, the framework aims to improve risk management for financial institutions and reduce potential losses.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 04 | Apr 2025 [Link] p-ISSN: 2395-0072

Enhancing Loan Default Prediction and Fraud Detection with Ensemble Learning

Adwait Mandge¹, Rohan Fatehchandka², Kunal Goudani³, Tanaya Shelke⁴, Prof. Pramila M. Chawan⁵

¹,²,³,⁴ B. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India
⁵ Associate Professor, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Loan risk assessment is crucial for financial security. This study combines ensemble learning for loan default prediction and time series analysis for ghost borrower detection. Using PyCaret, we optimize model selection to identify high-risk borrowers. Additionally, ARIMA, LSTMs, and anomaly detection techniques analyze transaction patterns to flag fraudulent behaviors like sudden withdrawals and post-loan inactivity. By integrating predictive modeling and anomaly detection, we enhance early fraud detection. This approach provides financial institutions with a comprehensive risk management framework, improving decision-making and reducing potential losses.

Key Words: Machine Learning, Deep Learning, Ensemble, Loan Default Prediction, Ghost Borrower Detection, TCN

1. INTRODUCTION

Accurate loan default prediction is vital for financial institutions to mitigate risks. Traditional credit assessment methods often overlook hidden patterns, making machine learning a powerful alternative. This study utilizes PyCaret to compare ensemble techniques like bagging, boosting, and stacking for loan default prediction. By analyzing borrower demographics, financial history, and loan details, we evaluate model performance using accuracy, precision, recall, and F1-score, identifying the most effective approach for credit risk assessment.

To enhance fraud detection, we address ghost borrowers: fraudsters who manipulate financial records to evade repayment. We integrate time series analysis using TCN (Temporal Convolutional Networks) to identify suspicious transaction patterns. By combining predictive analytics with fraud detection, our study provides actionable insights to improve lending decisions and reduce financial losses.

2. LITERATURE REVIEW

2.1 Machine Learning Algorithms

Random Forest is a learning technique used to solve a variety of regression and classification problems. It builds several trees during training and aggregates their outcomes to increase prediction accuracy. Features are randomly selected at every node (feature bagging), and each tree in the forest is trained on a random subset of the data (bootstrapping). By averaging the outcomes of these trees, Random Forest reduces variance and overfitting, which enhances generalization. The advantages of this algorithm are handling large datasets, reducing overfitting, and maintaining good accuracy even in the presence of missing data.

XGBoost (Extreme Gradient Boosting) is a scalable and extremely effective gradient boosting implementation. It constructs decision trees sequentially, with each new tree aiming to fix the mistakes made by the preceding trees. XGBoost uses gradient descent to minimize the overall loss function in order to optimize the model. XGBoost is well known for its performance and speed, particularly with tabular or structured data. The advantages of this algorithm are its speed, scalability, handling of missing data, and regularization to minimize overfitting.

2.2 Deep Learning Algorithms

Neural Networks are loosely modeled on the structure of the human brain. They comprise multiple layers of neurons connected by edges with weights that are updated during training. Neural networks are widely used in deep learning because they can recognize complex recurring patterns. The advantages of this algorithm are its flexibility, its ability to learn from huge datasets, and its potential to model complex relationships.

Multi-Layer Perceptron (MLP) is used in deep learning tasks. An MLP consists of multiple interconnected layers, where the output of every neuron in one layer acts as an input to the neurons in the next layer. MLPs are used for classification and regression and serve as the foundation for more intricate neural networks. Their ability to handle both regression and classification problems, and their role as a foundation for more advanced neural networks, are their main advantages.

Temporal Convolutional Networks (TCNs) are a type of deep learning architecture designed for sequence modeling tasks, offering an alternative to recurrent neural networks (RNNs) like LSTMs and GRUs. TCNs leverage 1D

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 157
dilated causal convolutions, ensuring that predictions at any time step depend only on past information, making them suitable for time-series forecasting, natural language processing, and anomaly detection. They utilize residual connections and dilation to capture long-range dependencies efficiently, allowing for parallel computation and stable gradients, unlike RNNs, which suffer from vanishing gradients and sequential processing limitations.

3. Proposed System

3.1 Problem Statement: “To predict loan default using machine learning techniques.”

3.2 Problem Elaboration: For financial organizations, loan default poses a serious problem since it can result in large losses and elevated risk. Conventional credit evaluation techniques are frequently ineffective and have trouble identifying subtle trends in borrower behavior, which leads to imprecise forecasts. The volume of loan data has increased with the growth of digital financial services, necessitating more advanced algorithms to forecast defaults. Machine learning can make loan default prediction automated and data-driven; nevertheless, choosing the best algorithm can be difficult, especially when the datasets are imbalanced and defaults are rare. To determine which machine learning model performs best, this study compares the efficacy of several algorithms in forecasting loan defaults. The aim is to help financial institutions manage risk better and make more informed loan decisions.

3.3 Architecture of the proposed models:

3.3.1 Loan Default Prediction System

Fig-1 Loan Default Prediction System Architecture

The architecture of our loan default prediction model follows a structured machine learning pipeline, integrating ensemble learning techniques to enhance predictive accuracy. The system is designed to preprocess financial datasets, extract meaningful patterns, and make robust predictions through a combination of machine learning models. The workflow consists of several key stages: data preprocessing, model training using ensemble techniques, and performance evaluation with hyperparameter tuning.

The flowchart in Fig-1 represents an ensemble learning pipeline for loan default prediction that integrates Neural Networks, Random Forest, and XGBoost. The dataset undergoes preprocessing, including cleaning and scaling, before training. A weighted voting approach combines predictions based on model performance. After evaluation on a test set using accuracy metrics, hyperparameter tuning is applied if needed. Once optimized, final predictions are generated, leveraging the strengths of multiple models to enhance predictive accuracy and robustness.

3.3.1.1 Data:

Data Collection:

The dataset for this study was collected from Kaggle and contains loan default prediction records with 34 attributes. These attributes include loan_purpose, Credit_Worthiness, open_credit, business_or_commercial, Credit_Score, age, LTV, Region, Security_Type, and Status. These features capture essential financial and demographic details about loan applicants, providing insights into their creditworthiness and likelihood of default. The dataset also includes variables like credit_type, co-applicant_credit_type, and dtir1 (Debt-to-Income Ratio), which are crucial indicators for assessing risk. With a mix of categorical and numerical attributes, this dataset offers a well-rounded foundation for training predictive models.

Data Preprocessing:

Effective preprocessing enhances the reliability of machine learning models by systematically cleaning and transforming data. This study follows a structured pipeline, beginning with data cleaning and feature selection using pandas. Redundant columns (e.g., Interest_rate_spread, credit_type, Upfront_charges) are removed, and missing values are handled by imputing numerical features with the median and categorical features with the mode. Categorical variables are encoded using one-hot encoding with drop_first=True to avoid the dummy variable trap. PyCaret automates key preprocessing tasks, including feature scaling, transformation, imbalance handling, and feature selection,

before splitting the dataset into training (75%) and testing (25%) sets with Status as the target variable.

To address class imbalance, SMOTE generates synthetic samples for the minority class, improving predictive fairness. Numerical features undergo standardization via StandardScaler (mean = 0, std = 1), while PowerTransformer ensures Gaussian-like distributions for skewed data. Finally, column names are sanitized using regular expressions to remove special characters, ensuring compatibility with machine learning frameworks. This streamlined approach optimizes model performance and enhances predictive accuracy.

3.3.1.2 Models Used in This Project and Their Working Mechanism

This project utilizes an ensemble learning approach, combining multiple machine learning models to improve predictive performance. The individual models used are:

1. Multi-Layer Perceptron (MLP) Classifier
2. Random Forest Classifier
3. XGBoost Classifier

The final prediction is made using a weighted ensemble method, where each model's contribution is proportional to its accuracy.

1. Multi-Layer Perceptron (MLP) Classifier

MLP is an artificial neural network with multiple layers that captures complex data relationships. It consists of an input layer, hidden layers, and an output layer, using activation functions like ReLU to introduce non-linearity. It learns through backpropagation and gradient descent, making it effective for non-linear patterns in financial data, as in loan default prediction.

2. Random Forest Classifier

Random Forest is an ensemble learning method that builds multiple decision trees using different data subsets. It reduces overfitting by training trees independently and making predictions through majority voting. It is efficient with high-dimensional and imbalanced datasets, making it suitable for robust classification tasks.

3. XGBoost (Extreme Gradient Boosting) Classifier

XGBoost is a gradient boosting algorithm optimized for high performance and accuracy. It builds decision trees sequentially, correcting previous errors, while using regularization to prevent overfitting. Its efficiency in handling large datasets makes it a top choice for fraud detection and credit risk modeling.

Ensemble Learning Approach

Ensemble learning enhances accuracy and robustness by integrating multiple models to make more reliable predictions. The strengths of MLP (captures complex patterns), Random Forest (reduces overfitting), and XGBoost (enhances generalization) are leveraged together. Predictions from each model are weighted based on accuracy, and a final probability score is calculated. This approach ensures a better bias-variance tradeoff, improving loan default prediction performance compared to individual models.

3.3.1.3 Results

The final ensemble model achieves an accuracy of 92.59%, indicating strong predictive performance. It has a precision of 79.98%, meaning 79.98% of predicted defaulters were actual defaulters, and a recall of 93.29%, showing it successfully identified 93.29% of all actual defaulters. The F1-score of 86.12% balances precision and recall effectively. The ROC-AUC score of 98.47% suggests excellent differentiation between defaulters and non-defaulters. The classification report shows that for non-defaulters (Class 0), precision is 98% and recall is 92%, while for defaulters (Class 1), precision is 80% and recall is 93%. The confusion matrix reveals 25869 true negatives, 2139 false positives, 615 false negatives, and 8545 true positives, showing that while the model correctly predicts most cases, it misclassifies some non-defaulters as defaulters. Overall, the model is well-balanced, with high recall ensuring minimal missed defaults.

High recall (0.9329) indicates the model's strong ability to correctly identify positive cases, minimizing false negatives. This is particularly crucial in applications such as fraud detection, medical diagnosis, and intrusion detection, where missing a true positive can have serious consequences. With a recall of 93.29%, the ensemble model effectively detects the majority of actual positive instances, reducing the risk of undetected critical cases.

Additionally, a high F1-score (0.8612) balances precision and recall, ensuring that the model does not generate too many false positives while still identifying true positives effectively. This trade-off is essential in scenarios where both false positives and false negatives carry significant consequences. An F1-score of 86.12% demonstrates that the model is well-optimized for overall reliable classification, making it a robust choice for practical applications requiring high accuracy and minimal errors.

A model with high recall ensures fewer missed detections, while a high F1-score ensures a balanced and optimal classification performance. This is particularly beneficial in applications where false negatives are costly but precision cannot be sacrificed entirely.
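The reported accuracy, precision, recall, and F1-score can be reproduced from the four confusion matrix counts given in the results. The following is a minimal sketch in plain Python; the function name `classification_metrics` is an illustrative choice and not part of the original project code.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)  # share of predicted defaulters that truly defaulted
    recall = tp / (tp + fn)     # share of actual defaulters that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts taken from the confusion matrix reported in the results
metrics = classification_metrics(tn=25869, fp=2139, fn=615, tp=8545)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9259
# precision: 0.7998
# recall: 0.9329
# f1: 0.8612
```

The printed values agree with the reported 92.59% accuracy, 79.98% precision, 93.29% recall, and 86.12% F1-score, which confirms that the stated metrics are consistent with the confusion matrix.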
Fig-2 Final Ensemble Model Performance Metrics and Confusion Matrix (Loan Default Prediction)

The comparison table shows that the Random Forest Classifier (RF) achieves the highest accuracy (92.75%), AUC (98.20%), and F1-score (85.88%) among all models, making it the best performer overall. The MLP Classifier (Neural Network) follows closely with an accuracy of 92.11% and AUC of 97.85%, indicating strong performance on complex decision boundaries. The AdaBoost Classifier has a slightly lower accuracy (90.90%) but excels in recall (90.04%), making it more suitable for identifying positive cases. Traditional models lag significantly behind in accuracy: Logistic Regression (74.91%), SVM (75.20%), and Naïve Bayes (77.17%). Naïve Bayes nevertheless shows a surprisingly high precision (85.56%), meaning most of the cases it flags as positive are truly positive, but it struggles with overall performance. Random Forest, MLP, and XGBoost, the models combined in the ensemble, outperform the remaining classifiers, reinforcing the advantage of ensemble learning in boosting predictive performance.

Fig-3 Comparative Performance of Machine Learning Models (Loan Default Prediction)

This ROC curve evaluates the classification performance by plotting the True Positive Rate (Recall) against the False Positive Rate at various thresholds. The blue curve represents the model’s performance, with an AUC (Area Under Curve) of 0.98, indicating excellent discrimination between classes. The diagonal gray line represents a random classifier (AUC = 0.5), and the model significantly outperforms it. A higher AUC value suggests that the model effectively distinguishes between positive and negative classes.

Fig-4 ROC Curve for the Ensemble Model (Loan Default Prediction)

3.3.2 Ghost Borrower Detection:

The architecture of our Ghost Borrower Detection System is designed to leverage both temporal patterns in transaction data and aggregated customer-level features to enhance fraud detection accuracy. At its core, the system employs a hybrid ensemble approach, integrating a Temporal Convolutional Network (TCN) for sequential transaction analysis and a Random Forest Classifier for feature-based classification. The TCN extracts temporal dependencies from transaction sequences using 1D convolutional layers with causal padding, ensuring that past data influences future predictions without leakage. Meanwhile, the Random Forest model identifies key risk indicators by analyzing customer behavior at an aggregated level and providing feature importance insights. These models are dynamically weighted based on their individual accuracy, and their predictions are combined using an ensemble mechanism to enhance robustness. This multi-layered architecture ensures a comprehensive evaluation of borrower risk, enabling early detection of ghost borrowers before significant financial loss occurs.
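The accuracy-based weighting of the TCN and Random Forest outputs described for the ghost borrower architecture can be illustrated with a small sketch. The paper does not give the exact weighting formula, so this example assumes the simplest reading: weights are the validation accuracies normalized to sum to 1. The accuracy values, probabilities, and the 0.5 decision threshold are hypothetical placeholders.

```python
def accuracy_weights(accuracies):
    """Normalize per-model validation accuracies so the weights sum to 1."""
    total = sum(accuracies.values())
    return {name: acc / total for name, acc in accuracies.items()}


def weighted_probability(probabilities, weights):
    """Combine per-model ghost-borrower probabilities into a single score."""
    return sum(weights[name] * p for name, p in probabilities.items())


# Hypothetical validation accuracies for the two models (not from the paper)
val_accuracy = {"tcn": 0.90, "random_forest": 0.94}
weights = accuracy_weights(val_accuracy)

# Hypothetical probabilities that one customer is a ghost borrower
customer_probs = {"tcn": 0.80, "random_forest": 0.60}
score = weighted_probability(customer_probs, weights)

is_ghost = score >= 0.5  # assumed decision threshold
print(f"weights={weights}, score={score:.4f}, ghost={is_ghost}")
```

Under this assumption the more accurate model pulls the combined score slightly toward its own estimate, while neither model can dominate the decision alone.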

Fig-5. Ghost Borrower System Architecture

The flowchart in Fig-5 represents an ensemble learning pipeline for detecting ghost borrowers that integrates Temporal Convolutional Networks (TCN) and Random Forest Classifiers. The transaction-level dataset undergoes preprocessing, including cleaning and feature extraction. TCN captures sequential transaction patterns, while Random Forest analyzes aggregated features for classification and feature importance. A dynamic weighting mechanism adjusts based on model accuracy. After training, models are evaluated on a separate dataset using accuracy metrics, with hyperparameter tuning if needed. Final predictions are generated through weighted voting, leveraging both models’ strengths to enhance predictive accuracy and robustness.

3.3.2.1 Data

This study utilizes a synthetically generated dataset containing transaction-level financial records to detect ghost borrowers: fraudulent actors who default after securing a loan. Each record includes key attributes such as customer_id, transaction_type, amount, loan_date, loan_amount, borrower_type (ghost/non-ghost), balance, and days_since_loan. Additionally, aggregated features for a Random Forest classifier capture pre-loan and post-loan transactional behavior, including transaction counts, deposit and withdrawal totals, early withdrawal patterns, and repayment trends. Key indicators like the largest withdrawal ratio and post-loan activity further enhance fraud detection, providing a robust foundation for predictive modeling.

Data Preprocessing:

This study follows a structured preprocessing pipeline to enhance model performance and reliability. Missing values in activity_after_large_withdrawal, loan_payment_total, and loan_payment_count are imputed with zero to maintain consistency. Feature engineering introduces derived variables such as withdrawal_to_loan_ratio, repayment_ratio, and transaction_activity_ratio to improve predictive power. The target variable borrower_type is transformed into a binary format (1 for ghost borrowers, 0 for normal borrowers) for supervised classification. Key features selected include loan_amount, pre_loan_transaction_count, first_week_withdrawal_ratio, post_loan_transaction_count, and repayment_ratio. Finally, numerical features are standardized using StandardScaler to ensure uniformity and optimize model performance.

3.3.2.2 Models Used in This Project and Their Working Mechanism

This project implements a hybrid ensemble strategy to identify ghost borrowers by integrating two complementary models:

1. Temporal Convolutional Network (TCN) Model
2. Random Forest Classifier

The final decision is derived through a weighted combination of their predictions, with each model's influence determined by its validated performance. Both algorithms output the probability of a person being a ghost borrower, and these probabilities are then combined using ensemble learning.

1. Temporal Convolutional Network (TCN) Model

TCN is a neural network architecture tailored for sequential data analysis. It utilizes causal convolutions to capture temporal dependencies in transaction records. Trained on raw transaction data, including features such as customer_id, date, transaction_type, amount, loan_date, loan_amount, borrower_type, and balance, TCN analyzes the sequence of transactions to detect anomalous patterns indicative of ghost borrower behavior. Its strength lies in effectively modeling time-dependent relationships and detecting deviations from normal transactional trends.
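The causal convolution at the heart of a TCN can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the network used in this study: it applies one dilated causal filter to a toy sequence, with left zero-padding so that the output at step t never depends on later steps. The function name, kernel, and sequence values are hypothetical.

```python
def causal_dilated_conv1d(sequence, kernel, dilation=1):
    """Causal 1D convolution: output[t] uses only sequence[t], sequence[t - d], ..."""
    k = len(kernel)
    pad = (k - 1) * dilation              # left padding keeps output length == input length
    padded = [0.0] * pad + list(sequence)
    output = []
    for t in range(len(sequence)):
        acc = 0.0
        for i in range(k):
            # kernel[k - 1] multiplies the current step; earlier taps look further back
            acc += kernel[i] * padded[t + i * dilation]
        output.append(acc)
    return output


# Toy transaction amounts (hypothetical); the filter sums x[t] and x[t - 2]
amounts = [1.0, 2.0, 3.0, 4.0, 5.0]
result = causal_dilated_conv1d(amounts, kernel=[1.0, 1.0], dilation=2)
print(result)  # [1.0, 2.0, 4.0, 6.0, 8.0]

# Changing a future value leaves earlier outputs untouched (no leakage)
changed = causal_dilated_conv1d([1.0, 2.0, 3.0, 4.0, 99.0], kernel=[1.0, 1.0], dilation=2)
print(changed[:4] == result[:4])  # True
```

A full TCN stacks several such layers with increasing dilation (for example 1, 2, 4) and adds residual connections, which lets the receptive field grow quickly while each output still depends only on past transactions.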

2. Random Forest Classifier

The Random Forest classifier operates on a synthetic feature set derived from the raw transaction data. This synthetic dataset is generated by aggregating transaction information into summary statistics, such as pre-loan transaction count, deposit and withdrawal totals, first-week withdrawal metrics, and post-loan activity metrics. By constructing an ensemble of decision trees, Random Forest reduces overfitting and captures non-linear interactions among these engineered features, efficiently classifying borrowers as ghost or non-ghost.

Ensemble Learning Approach

To overcome the individual limitations of the TCN and Random Forest models, an ensemble method is implemented. This approach combines the predictions from both models using a weighted scheme, where each model's contribution is proportional to its accuracy and reliability on validation data. By integrating the temporal insights from the TCN with the aggregated feature perspective of the Random Forest, the ensemble classifier achieves a more robust and balanced bias-variance tradeoff, leading to improved ghost borrower detection performance.

3.3.2.3 Results

Fig-6 Ensemble Model Accuracy Output (Ghost Borrower Detection)

The ensemble model achieved an accuracy of 95.00% in detecting ghost borrowers, demonstrating its effectiveness in identifying fraudulent loan defaulters. The high accuracy suggests that combining multiple models enhances predictive performance by capturing critical patterns in borrower behavior. This result reinforces the model’s reliability in distinguishing ghost borrowers from legitimate ones.

4. Future Scope

The proposed integrated framework for loan default prediction and ghost borrower detection can be enhanced in several directions. Future work may implement advanced ensemble techniques, such as heterogeneous stacking with deep learning models, to boost prediction robustness. Incorporating Explainable AI (XAI) will offer clear, interpretable insights into decision-making, helping financial institutions understand key risk factors. Additionally, leveraging real-time data integration and adaptive learning, possibly through reinforcement learning, can create dynamic loan approval and fraud detection systems that evolve with emerging trends. Expanding training datasets with alternative financial records, macroeconomic indicators, and regional data will enhance generalizability across diverse economic conditions. Finally, further research in enhanced feature engineering for ghost borrowers, incorporating digital footprints, social media data, and other alternative sources, could enrich the synthetic dataset and improve anomaly detection accuracy.

5. CONCLUSION

In this paper, we presented an advanced loan default prediction model using ensemble learning techniques, combining Random Forest, XGBoost, and Neural Networks in a weighted ensemble framework. Detailed data preprocessing, including missing value treatment, feature engineering, and normalization, enhanced model effectiveness. Evaluation using metrics such as Accuracy, Precision, Recall, F1-score, and AUC-ROC, coupled with hyperparameter tuning and cross-validation, demonstrated improved performance over single-model approaches.

Additionally, we addressed the challenge of ghost borrowers by integrating a Temporal Convolutional Network (TCN) to analyze transaction sequences with a Random Forest classifier based on engineered features. This hybrid approach effectively identifies anomalous borrowing behavior. The integrated model offers financial institutions a powerful tool to mitigate loan default risks, detect ghost borrowers, and optimize lending decisions. Future work on Explainable AI, real-time data integration, and adaptive risk assessment will further enhance its impact on financial risk management.

ACKNOWLEDGEMENT

This work is supported in part by Prof. Pramila M. Chawan. We thank the reviewers for their valuable discussions and feedback.

6. REFERENCES

[1] Vijay Kumar, Rachna Narula, Akanksha Kochhar, "Loan Default Prediction using Machine Learning Models," 2024. DOI: 10.5281/zenodo.8337054

[2] Jinchen Lin, "Research on loan default prediction based on logistic regression, randomforest, xgboost and adaboost," 2023. DOI: 10.1051/shsconf/202418102008

[3] Wanjun Wu, "Machine Learning Approaches to Predict Loan Default," 2022. DOI: 10.4236/iim.2022.145011

[4] Platur Gashi, "Loan Default Prediction Model," 2023. DOI: 10.13140/RG.2.2.22985.01126

[5] Lai, L., "Loan default prediction with machine learning techniques," in 2020 International Conference on Computer Communication and Network Security (CCNS), pp. 5–9, IEEE, 2020.

[6] Xu Zhu, Qingyong Chu, Xinchang Song, Ping Hu, Lu Peng, "Explainable prediction of loan default based on machine learning models." DOI: 10.1016/[Link].2023.04.003

[7] Zhao X, Guan S., "CTCN: a novel credit card fraud detection method based on Conditional Tabular Generative Adversarial Networks and Temporal Convolutional Network," PeerJ Comput Sci., 2023 Oct 10;9:e1634. DOI: 10.7717/peerj-cs.1634. PMID: 37869461; PMCID: PMC10588710.

[8] A. Mandge, R. Fatehchandka, K. Goudani, T. Shelke, and P. M. Chawan, "A Survey on Loan Default Prediction using Machine Learning Techniques," International Research Journal of Engineering and Technology (IRJET), vol. 11, no. 11, pp. XX-XX, Nov. 2024.

[9] Boulieris, P., Pavlopoulos, J., Xenos, A. et al., "Fraud detection with natural language processing," Mach Learn 113, 5087–5108 (2024). [Link]

[10] Loan Default Dataset, Kaggle. [Link]

BIOGRAPHIES

Adwait Mandge,
B. Tech Student, Dept. of Computer Engineering and IT, VJTI, Mumbai, Maharashtra, India

Prof. Pramila M. Chawan is working as an Associate Professor in the Computer Engineering Department of VJTI, Mumbai. She completed her B.E. (Computer Engineering) and M.E. (Computer Engineering) at VJTI College of Engineering, Mumbai University. She has 28 years of teaching experience and has guided 85+ M. Tech. projects and 130+ B. Tech. projects. She has published 143 papers in international journals and 20 papers in national/international conferences and symposiums. She has worked as an Organizing Committee member for 25 International Conferences and 5 AICTE/MHRD sponsored Workshops/STTPs/FDPs. She has participated in 16 National/International Conferences. She has worked as Consulting Editor for JEECER, JETR, JETMS, Technology Today, JAM&AER Engg. Today, and The Tech. World; as Editor for Journals of ADR; and as Reviewer for IJEF, Inderscience. She has worked as NBA Coordinator of the Computer Engineering Department of VJTI for 5 years. She wrote a proposal under TEQIP-I in June 2004 for ‘Creating Central Computing Facility at VJTI’. Rs. Eight Crore were sanctioned by the World Bank under TEQIP-I on this proposal. The Central Computing Facility was set up at VJTI through this fund and has played a key role in improving the teaching-learning process at VJTI. She was awarded the Innovative & Dedicated Educationalist Award (Specialization: Computer Engineering & I.T.) by SIESRP in 2020. In the AD Scientific Index Ranking (World Scientist and University Ranking 2022), she ranked 2nd as Best Scientist in the VJTI Computer Science domain and 1138th as Best Scientist in Computer Science, India.

Kunal Goudani,
B-Tech Student, Dept. of Computer Engineering and IT, VJTI, Mumbai, Maharashtra, India

Rohan Fatehchandka,
B-Tech Student, Dept. of Computer Engineering and IT, VJTI, Mumbai, Maharashtra, India

Tanaya Shelke,
B. Tech Student, Dept. of Computer Engineering and IT, VJTI, Mum

