Exploratory Data Analysis (EDA) Report

This report presents an exploratory data analysis (EDA) of Geldium's delinquency prediction dataset, focusing on its structure and quality and on identifying risk indicators for customer delinquency. Key findings include Credit_Score, Credit_Utilization, and Missed_Payments as significant risk factors, alongside data quality issues such as inconsistent categorical entries. The next steps involve data cleaning, imputation, and developing a predictive model to forecast delinquency risk.

Uploaded by

majjigagowtham21

1. Introduction
This report summarizes the exploratory data analysis (EDA) conducted on Geldium's
delinquency prediction dataset. The primary goal of this analysis is to understand the dataset's
structure and quality and to identify key variables and patterns that may serve as risk indicators
for predicting customer delinquency. The insights gained will inform the development of a
predictive model and support the refinement of intervention strategies.
2. Dataset Overview
The dataset contains customer financial information, including demographics, credit behavior,
and payment history over a six-month period.
Key dataset attributes:
Number of records: 494
Key variables: Customer_ID, Age, Income, Credit_Score, Credit_Utilization, Missed_Payments,
Delinquent_Account, Loan_Balance, Debt_to_Income_Ratio, Employment_Status,
Account_Tenure, Credit_Card_Type, Location, and Month_1 to Month_6 payment history.
Data types: The dataset contains a mix of numerical (e.g., Age, Income), categorical (e.g.,
Employment_Status, Location), and binary/ordinal (e.g., Delinquent_Account, Month_1 to
Month_6) data types.
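The column list above can be turned into a simple header check. The following is a minimal sketch: the report only classifies Age, Income, Employment_Status, Location, Delinquent_Account, and Month_1 to Month_6 explicitly, so the grouping of the remaining columns is an assumption, and check_header is a hypothetical helper name.

```python
# Column groups taken from the report. Entries marked "assumed" are not
# classified explicitly in the text.
NUMERICAL = [
    "Age", "Income",
    "Credit_Score", "Credit_Utilization", "Missed_Payments",   # assumed
    "Loan_Balance", "Debt_to_Income_Ratio", "Account_Tenure",  # assumed
]
CATEGORICAL = [
    "Employment_Status", "Location",
    "Credit_Card_Type",  # assumed
]
BINARY_OR_ORDINAL = ["Delinquent_Account"] + [f"Month_{i}" for i in range(1, 7)]
EXPECTED_COLUMNS = ["Customer_ID"] + NUMERICAL + CATEGORICAL + BINARY_OR_ORDINAL


def check_header(header):
    """Return (missing, unexpected) column names relative to the expected schema."""
    missing = [c for c in EXPECTED_COLUMNS if c not in header]
    unexpected = [c for c in header if c not in EXPECTED_COLUMNS]
    return missing, unexpected
```

When the data is read with csv.DictReader, the header to pass in would be the reader's fieldnames attribute.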
3. Missing Data Analysis
Identifying and addressing missing data is critical to ensuring model accuracy. This section
outlines missing values in the dataset, the approach taken to handle them, and justifications for
the chosen method.
Key missing data findings:
Variables with missing or inconsistent values:
Employment_Status has inconsistent values (e.g., EMP, employed, Employed).
Payment history columns (Month_1 to Month_6) contain payment status entries (e.g., Missed)
that are recorded inconsistently and should be mapped to a single set of categories.
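One way to surface issues like these is to count missing entries and distinct raw values per column. The sketch below is illustrative only: the set of tokens treated as missing, the helper name audit_column, and the sample rows are all assumptions, not taken from the dataset.

```python
from collections import Counter

# Tokens treated as missing. This set is an assumption, not taken from the report.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def audit_column(rows, column):
    """Count missing entries and distinct raw values for one column.

    `rows` is a list of dicts, e.g. as produced by csv.DictReader.
    """
    missing = 0
    values = Counter()
    for row in rows:
        raw = row.get(column)
        if raw is None or str(raw).strip().lower() in MISSING_TOKENS:
            missing += 1
        else:
            values[str(raw).strip()] += 1
    return missing, values


# Made-up rows for illustration only.
sample = [
    {"Employment_Status": "EMP", "Income": "52000"},
    {"Employment_Status": "employed", "Income": ""},
    {"Employment_Status": "Employed", "Income": "61000"},
]
missing, values = audit_column(sample, "Employment_Status")
print(missing, dict(values))
```

Listing the raw values side by side makes variants such as EMP, employed, and Employed visible before any recoding is applied.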
4. Key Findings and Risk Indicators
This section identifies trends and patterns that may indicate risk factors for delinquency. Feature
relationships and statistical correlations are explored to uncover insights relevant to predictive
modeling.
Key findings:
Correlations observed between key variables:
A strong negative correlation is likely to exist between Credit_Score and the number of
Missed_Payments. A lower credit score is a known indicator of higher credit risk.
Credit_Utilization is a key risk factor. Customers with high credit utilization tend to have a higher
propensity for delinquency.
Missed_Payments and Delinquent_Account are positively correlated. The more payments a
customer has missed, the more likely their account is to be marked as delinquent.
Unexpected anomalies:
Some records may show a low Credit_Score but few Missed_Payments, which could indicate
recent financial distress or data entry errors and requires further investigation.
5. AI & GenAI Usage
Generative AI tools were used to summarize the dataset, impute missing data, and detect
patterns. This section documents AI-generated insights and the prompts used to obtain results.
Example AI prompts used:
Summarize key patterns in the dataset and identify anomalies.
Suggest an imputation strategy for missing income values based on industry best practices.
6. Conclusion & Next Steps
The EDA revealed that while the dataset is relatively clean, some data quality issues, such as
inconsistent categorical entries, need to be addressed before modeling. The analysis identified
several key risk indicators, including Credit_Utilization, Credit_Score, and Missed_Payments.
The next steps involve a comprehensive data cleaning and imputation process, followed by
feature engineering and the development of a predictive model to forecast delinquency risk.
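The report does not state which imputation method will be used. As one common option for a skewed numeric field such as Income, the sketch below fills missing values with the median of the observed values and keeps a flag for imputed rows. The function name impute_median, the use of None to mark missing values, and the sample data are assumptions for illustration.

```python
from statistics import median


def impute_median(rows, column):
    """Return copies of `rows` with missing values in `column` set to the median.

    A value of None is treated as missing. A `<column>_imputed` flag is added
    so imputed values stay identifiable downstream.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    if not observed:
        raise ValueError(f"no observed values in {column!r} to impute from")
    fill = median(observed)
    result = []
    for row in rows:
        was_missing = row[column] is None
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[column] = fill if was_missing else row[column]
        new_row[f"{column}_imputed"] = was_missing
        result.append(new_row)
    return result


# Made-up values for illustration only.
sample = [{"Income": 40000}, {"Income": None}, {"Income": 60000}, {"Income": 90000}]
imputed = impute_median(sample, "Income")
```

Keeping the flag column lets a later model test whether missingness itself carries signal.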
Initial Data Quality Observations
The dataset appears to be of moderate quality, but several inconsistencies and potential issues
were identified. The Employment_Status column contains variants in capitalization and
spelling (e.g., EMP, employed, Employed) that need to be standardized. The payment history
columns (Month_1 to Month_6) also have inconsistent values that require normalization to a
single standard. These issues, if
not addressed, could negatively impact the accuracy and reliability of any predictive model built
on this data. Addressing these inconsistencies is a critical next step.
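The standardization described above could be sketched as a lookup on a normalized key. Only the raw variants EMP, employed, and Employed and the payment status Missed appear in the report, so the canonical labels and the helper name standardize below are assumptions.

```python
# Canonical labels are assumptions. Only the raw variants EMP, employed and
# Employed (and the payment status Missed) are mentioned in the report.
EMPLOYMENT_MAP = {
    "emp": "Employed",
    "employed": "Employed",
}
PAYMENT_MAP = {
    "missed": "Missed",
}


def standardize(value, mapping):
    """Map a raw categorical value to its canonical label.

    Matching ignores case and surrounding whitespace. Unknown values are
    returned stripped but otherwise unchanged, so they can be reviewed
    rather than silently recoded.
    """
    key = value.strip().lower()
    return mapping.get(key, value.strip())
```

For example, standardize("EMP", EMPLOYMENT_MAP) and standardize(" employed ", EMPLOYMENT_MAP) both yield "Employed", while a value missing from the mapping passes through for manual review.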
High-Risk Indicators and Insights
High Credit Utilization: A high ratio of credit used to total credit available is a significant risk
factor, as it indicates a customer may be overextended and more likely to miss payments.
Low Credit Score: A low credit score is a direct measure of a customer's creditworthiness and is
strongly correlated with a higher risk of delinquency.
Multiple Missed Payments: The total count of missed payments serves as a historical record of
poor payment behavior and is a strong predictor of future delinquency.
High Debt-to-Income Ratio: A high ratio of debt to income suggests that a customer's financial
obligations may be too large to manage, increasing the likelihood of missed payments.
Short Account Tenure: Newer customers with a short account history may present a higher risk
due to a lack of established payment behavior.
Unexpected trends: Some accounts with high income and good credit scores show signs of
delinquency, which could indicate anomalies or specific life events not captured in the data.
Further investigation is needed to understand these cases.
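The indicators above can be checked with a correlation coefficient, and the unexpected cases can be listed with a simple filter. The sketch below is illustrative: the helper names, the income and score thresholds, and the sample records are all assumptions, and the report does not specify how its correlations were computed.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def unexpected_delinquents(rows, min_income, min_score):
    """Delinquent accounts whose income and credit score both look healthy."""
    return [
        r for r in rows
        if r["Delinquent_Account"] == 1
        and r["Income"] >= min_income
        and r["Credit_Score"] >= min_score
    ]


# Made-up records for illustration only.
sample = [
    {"Customer_ID": "A", "Income": 30000, "Credit_Score": 520,
     "Missed_Payments": 5, "Delinquent_Account": 1},
    {"Customer_ID": "B", "Income": 45000, "Credit_Score": 610,
     "Missed_Payments": 3, "Delinquent_Account": 0},
    {"Customer_ID": "C", "Income": 80000, "Credit_Score": 700,
     "Missed_Payments": 1, "Delinquent_Account": 0},
    {"Customer_ID": "D", "Income": 120000, "Credit_Score": 780,
     "Missed_Payments": 0, "Delinquent_Account": 1},
]
r = pearson(
    [x["Credit_Score"] for x in sample],
    [x["Missed_Payments"] for x in sample],
)
# Thresholds below are hypothetical cut-offs, not values from the report.
flagged = unexpected_delinquents(sample, min_income=100000, min_score=700)
```

In this made-up sample the coefficient is strongly negative and only customer D is flagged. On the real data, the thresholds would need to be chosen from the observed distributions.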
