Final Project – Title & Abstract
Group-3
Title of the Project :
Loan Default Prediction : A Machine Learning Approach to Risk Mitigation
Names of Team Members :
Sushant Chadha, Sri Naga Rudrama Kondamudi, Samiksha Gavand
Abstract :
Predicting loan defaults is essential for financial institutions to reduce risks and improve
decision-making processes. This project focuses on building a system that analyzes borrower
data, including financial history and demographic details, to identify patterns that indicate the
likelihood of loan repayment. Using supervised classification models (logistic regression,
random forest, and gradient boosting), the system aims to provide accurate predictions so that
lenders can assess risk and make informed decisions. The goal is to balance minimizing defaults
against maintaining smooth loan approval workflows, contributing to better financial stability
and operational efficiency.
Objective :
Loan defaults pose a significant challenge for financial institutions, leading to financial losses
and increased risk exposure. Accurately predicting loan defaulters is critical to minimizing
these risks and ensuring stable operations. The problem involves identifying patterns and key
factors from borrower data that can predict the likelihood of loan repayment or default.
This problem is important because it directly impacts the profitability, operational efficiency,
and risk management strategies of lenders. Financial institutions can use these predictions to
make more informed decisions, optimize loan approval processes, and take proactive measures
to mitigate risks.
The primary users of this solution are banks, lending companies, and credit agencies seeking
to improve their risk assessment and decision-making processes.
Data :
Dataset link: Loan_default.csv
The dataset contains 255,347 rows and 18 columns. Here’s an overview of its structure:
Key Columns:
LoanID: Unique identifier for each loan.
Age: Borrower's age.
Income: Annual income of the borrower (in dollars).
LoanAmount: Amount borrowed.
CreditScore: Credit score of the borrower.
MonthsEmployed: Total months of employment.
NumCreditLines: Number of active credit lines.
InterestRate: Interest rate for the loan (in %).
LoanTerm: Loan repayment period (in months).
DTIRatio: Debt-to-income ratio.
Education: Borrower’s educational background.
EmploymentType: Nature of employment (e.g., full-time, part-time, unemployed).
MaritalStatus: Marital status of the borrower.
HasMortgage: Whether the borrower has a mortgage (Yes/No).
HasDependents: Whether the borrower has dependents (Yes/No).
LoanPurpose: Purpose of the loan (e.g., auto, business).
HasCoSigner: Whether the borrower has a co-signer (Yes/No).
Default: Whether the borrower defaulted on the loan (0 = No, 1 = Yes).
Observations:
All columns are fully populated, with no missing data.
Numeric columns include details like income, loan amount, and credit score.
Categorical columns include details such as education, employment type, and marital status.
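The observations above (no missing values, a mix of numeric and categorical columns) can be checked programmatically before modeling. The following is a minimal sketch using only the Python standard library. The sample rows, their values, and the helper names `profile_columns` and `_is_number` are made up for illustration; only the column names come from the dataset description, and one blank cell is included on purpose to show how a gap would be reported.

```python
import csv
import io

# Tiny made-up sample using a subset of the column names listed above.
# In the project, this text would be read from Loan_default.csv instead.
SAMPLE = """LoanID,Age,Income,CreditScore,Education,HasMortgage,Default
A1,34,52000,710,Bachelor's,Yes,0
A2,51,38000,580,High School,No,1
A3,27,61000,,Master's,No,0
"""

def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def profile_columns(csv_text):
    """Return {column: {"missing": int, "numeric": bool}} for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    profile = {}
    for col in rows[0].keys():
        values = [row[col].strip() for row in rows]
        present = [v for v in values if v != ""]
        profile[col] = {
            "missing": len(values) - len(present),
            # A column counts as numeric only if every non-blank value parses.
            "numeric": bool(present) and all(_is_number(v) for v in present),
        }
    return profile

profile = profile_columns(SAMPLE)
for name, info in profile.items():
    print(name, info)
```

On the full dataset, every "missing" count would be expected to be zero if the observation above holds.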
Potential Insights:
Default Rate: Percentage of loans that defaulted.
Credit Risk Indicators: Relationships between factors like credit score, DTI ratio, and defaults.
Demographics and Loan Behavior: Influence of age, education, and marital status on default.
Loan Attributes: Impact of interest rates, loan amounts, and terms on default probability.
Employment & Financial Health: Influence of employment status and income on default.
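As a rough illustration of the first and last insights listed above, the overall default rate and the default rate per category can be computed as shown below. This is a sketch under stated assumptions: the records and category values are invented, and `default_rate` and `default_rate_by` are hypothetical helper names. Only the column names `EmploymentType` and `Default` are taken from the dataset description.

```python
from collections import defaultdict

# Made-up records; only the column names come from the dataset description.
loans = [
    {"EmploymentType": "Full-time", "Default": 0},
    {"EmploymentType": "Full-time", "Default": 0},
    {"EmploymentType": "Full-time", "Default": 1},
    {"EmploymentType": "Unemployed", "Default": 1},
    {"EmploymentType": "Unemployed", "Default": 0},
    {"EmploymentType": "Part-time", "Default": 0},
]

def default_rate(records):
    """Fraction of records with Default == 1."""
    return sum(r["Default"] for r in records) / len(records)

def default_rate_by(records, column):
    """Default rate for each distinct value of the given column."""
    groups = defaultdict(list)
    for r in records:
        groups[r[column]].append(r)
    return {key: default_rate(group) for key, group in groups.items()}

overall = default_rate(loans)
by_employment = default_rate_by(loans, "EmploymentType")
print("overall:", round(overall, 3))
print("by employment type:", by_employment)
```

The same grouping could be applied to any categorical column, such as Education or MaritalStatus, to compare default rates across segments.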
Model :
For predicting loan defaulters, the following models will be considered :
• Logistic Regression: A simple and interpretable model that serves as a baseline for
classification tasks.
• Random Forest: An ensemble of decision trees that captures non-linear relationships and, by
averaging many trees, overfits less than a single tree; well suited to structured datasets.
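• Gradient Boosting (e.g., XGBoost): An ensemble that builds trees sequentially, each one
correcting the errors of the previous ones. It often performs well on tabular data, and class
imbalance can be addressed through class weighting (e.g., XGBoost’s scale_pos_weight parameter).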
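To make the logistic regression baseline concrete, here is a dependency-free sketch that fits the model by batch gradient descent. It is an illustration only: the project itself would more likely rely on an established library, the two features are synthetic stand-ins for standardized CreditScore and DTIRatio, the rule that generates the labels is invented, and the function names, learning rate, and epoch count are assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Predicted probability of default for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Synthetic, standardized features: [credit score z-score, DTI z-score].
# The generating rule below is invented for the demo, not taken from the data.
random.seed(0)
X, y = [], []
for _ in range(400):
    credit, dti = random.gauss(0, 1), random.gauss(0, 1)
    p_default = sigmoid(-1.5 * credit + 1.0 * dti - 1.0)
    X.append([credit, dti])
    y.append(1 if random.random() < p_default else 0)

w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
accuracy = sum((p >= 0.5) == (t == 1) for p, t in zip(probs, y)) / len(y)
print("weights:", w, "bias:", b, "train accuracy:", round(accuracy, 3))
```

The sign of each fitted weight can be read directly (here, a higher credit score lowers the predicted default probability and a higher DTI raises it), which is the interpretability that makes this model a useful baseline for the tree ensembles.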
Evaluation Criteria:
To determine the best-performing model, the following metrics will be used:
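• Accuracy: Overall share of correct predictions; to be interpreted with caution if defaults
turn out to be a small minority of loans.
• Precision and Recall: Precision measures how many predicted defaulters actually default
(limiting false positives); recall measures how many actual defaulters are identified
(limiting false negatives).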
• F1-Score: To balance precision and recall, particularly for imbalanced data.
• ROC-AUC Score: To measure the model’s ability to distinguish between defaulters
and non-defaulters.
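The metrics above can be written out in a few lines, which also shows why accuracy alone can mislead when defaulters are rare. This is a minimal sketch with made-up labels and scores. The function names and the 0.5 decision threshold are assumptions, and the pairwise ROC-AUC shown here is meant for small examples; on the full dataset a sort-based or library implementation would be preferable.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with 1 = default as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Probability that a random defaulter outscores a random non-defaulter.

    Ties count as half a win. Runs in O(n_pos * n_neg) time.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up example: 8 non-defaulters, 2 defaulters, with model scores.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.6, 0.25, 0.7, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, "roc_auc:", auc)
```

In this toy example accuracy is 0.8 even though only one of the two defaulters is caught (recall 0.5), which is why precision, recall, F1, and ROC-AUC are reported alongside accuracy.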
Expected Outcome :
• Accurate identification of borrowers likely to default on their loans.
• Reduction in financial losses by minimizing loan defaults.
• Streamlined and efficient loan approval processes.
• Data-driven insights to improve credit policies and lending strategies.
• Improved financial stability and sustainable growth for lending institutions.
Guidance from Instructor:
We are currently in the initial stage of project planning. Any changes to the dataset will be
reported as the project progresses. Once implementation begins, we will reach out if we need
assistance.