Exploratory Data Analysis (EDA) Summary Report
1. Introduction
This report presents the findings from an exploratory data analysis (EDA) conducted on
Geldium’s credit delinquency dataset. The goal is to assess the dataset’s structure, identify
data quality issues, and uncover early indicators of delinquency risk to support the
development of AI-driven predictive models and intervention strategies.
2. Dataset Overview
Key dataset attributes:
Number of records: approximately 1,000 (estimated from a sample)
Key variables: Income, Credit_Utilization, Debt_to_Income_Ratio,
Missed_Payments, Delinquent_Account, Employment_Status, Credit_Score, and the
monthly repayment flags (Month_1 to Month_6)
Data types:
o Numerical: Income, Credit_Utilization, Credit_Score, Debt_to_Income_Ratio
o Categorical: Employment_Status, Credit_Card_Type, Location, Month_1 to
Month_6
3. Missing Data Analysis
Key missing data findings:
Income (18%)
Credit_Utilization (12%)
Employment_Status (9%)
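The missing-value shares above can be reproduced with a per-column check along the following lines. This is a minimal sketch, assuming the data is loaded into a pandas DataFrame; the helper name missing_share and the toy rows are illustrative only and are not taken from the Geldium dataset.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).round(1).sort_values(ascending=False)


# Made-up rows purely for illustration; not Geldium data.
toy = pd.DataFrame({
    "Income": [52000.0, np.nan, 61000.0, np.nan],
    "Credit_Utilization": [0.35, 0.80, np.nan, 0.10],
    "Employment_Status": ["Employed", "Self-employed", "Employed", "Unemployed"],
})

report = missing_share(toy)
print(report)
```

Running the same function on the full dataset should give the figures listed above if they were computed as simple per-column shares.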
Missing data treatment:
Income: Median imputation grouped by employment type
Credit_Utilization: Regression-based imputation using Debt_to_Income_Ratio and
Missed_Payments as predictors
Employment_Status: Mode imputation
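The three treatments above could be sketched as follows. This is an illustration under stated assumptions, not the exact procedure used for this report: it assumes a pandas DataFrame, fills Employment_Status first so that every row belongs to an employment group, uses ordinary least squares as the regression model, and takes Debt_to_Income_Ratio as the "debt ratio" predictor. The function name impute and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Apply mode, grouped-median, and regression imputation to a copy of df."""
    out = df.copy()

    # Employment_Status: fill gaps with the most frequent category (mode).
    mode_status = out["Employment_Status"].mode().iloc[0]
    out["Employment_Status"] = out["Employment_Status"].fillna(mode_status)

    # Income: median within each employment group, with the overall median
    # as a fallback for groups that have no observed income at all.
    group_median = out.groupby("Employment_Status")["Income"].transform("median")
    out["Income"] = out["Income"].fillna(group_median).fillna(out["Income"].median())

    # Credit_Utilization: least-squares fit on rows where it is observed
    # (assumption: a plain linear model is adequate for this sketch).
    predictors = ["Debt_to_Income_Ratio", "Missed_Payments"]
    known = out["Credit_Utilization"].notna()
    X_known = np.column_stack([
        np.ones(known.sum()),
        out.loc[known, predictors].to_numpy(dtype=float),
    ])
    y_known = out.loc[known, "Credit_Utilization"].to_numpy(dtype=float)
    coef, *_ = np.linalg.lstsq(X_known, y_known, rcond=None)

    if (~known).any():
        X_missing = np.column_stack([
            np.ones((~known).sum()),
            out.loc[~known, predictors].to_numpy(dtype=float),
        ])
        out.loc[~known, "Credit_Utilization"] = X_missing @ coef
    return out


# Made-up rows purely for illustration; not Geldium data.
toy = pd.DataFrame({
    "Employment_Status": ["Employed", "Employed", None, "Retired", "Retired", "Employed"],
    "Income": [50000.0, np.nan, 70000.0, 30000.0, np.nan, 60000.0],
    "Debt_to_Income_Ratio": [0.2, 0.4, 0.6, 0.3, 0.5, 0.1],
    "Missed_Payments": [0, 1, 3, 0, 2, 0],
    "Credit_Utilization": [0.3, 0.5, np.nan, 0.4, 0.7, 0.2],
})

filled = impute(toy)
print(filled)
```

The order of the steps matters: because the income median is grouped by employment type, filling Employment_Status first avoids leaving rows without a group. The regression step assumes the two predictors are complete.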
4. Key Findings and Risk Indicators
Key findings:
Credit_Utilization is strongly and positively correlated with Delinquent_Account
A Debt_to_Income_Ratio above 0.6 is associated with a higher likelihood of delinquency
Low Credit_Score and multiple missed payments are major risk signals
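The first two findings above can be checked with a short calculation like the one below. It is a sketch only, assuming Delinquent_Account is coded 0/1 and the data sits in a pandas DataFrame; the function name risk_summary and the toy rows are hypothetical. With a 0/1 target, the Pearson correlation used here is the point-biserial correlation.

```python
import pandas as pd


def risk_summary(df: pd.DataFrame, dti_threshold: float = 0.6) -> dict:
    """Summarise two risk indicators against a 0/1 delinquency flag."""
    # Correlation between utilization and the delinquency flag.
    corr = df["Credit_Utilization"].corr(df["Delinquent_Account"])

    # Delinquency rate above versus at or below the DTI threshold.
    high_dti = df["Debt_to_Income_Ratio"] > dti_threshold
    rate_high = df.loc[high_dti, "Delinquent_Account"].mean()
    rate_low = df.loc[~high_dti, "Delinquent_Account"].mean()

    return {
        "utilization_corr": float(corr),
        "delinquency_rate_high_dti": float(rate_high),
        "delinquency_rate_low_dti": float(rate_low),
    }


# Made-up rows purely for illustration; not Geldium data.
toy = pd.DataFrame({
    "Credit_Utilization": [0.1, 0.9, 0.4, 0.2, 0.8, 0.3],
    "Debt_to_Income_Ratio": [0.2, 0.7, 0.65, 0.3, 0.8, 0.5],
    "Delinquent_Account": [0, 1, 0, 0, 1, 0],
})

summary = risk_summary(toy)
print(summary)
```

Comparing the two delinquency rates shows an association only; it does not establish that a high debt-to-income ratio causes delinquency.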
Unexpected anomalies:
Some customers with high credit scores are marked as delinquent
Some delinquent accounts have a short account tenure (under 3 months)
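Anomalies of this kind can be flagged for manual review with a simple filter. The sketch below rests on several assumptions that are not in the dataset description: a tenure column named Account_Tenure measured in months, a cutoff of 700 for a "high" credit score, and a 0/1 coding of Delinquent_Account. All three should be adjusted to the real schema.

```python
import pandas as pd


def flag_anomalies(
    df: pd.DataFrame,
    high_score: int = 700,       # hypothetical cutoff for a "high" score
    min_tenure_months: int = 3,  # tenure threshold taken from the finding above
) -> pd.DataFrame:
    """Add two boolean columns marking the anomaly patterns for review."""
    delinquent = df["Delinquent_Account"] == 1
    out = df.copy()
    out["high_score_delinquent"] = delinquent & (df["Credit_Score"] >= high_score)
    # Account_Tenure is a hypothetical column name (months since account opening).
    out["short_tenure_delinquent"] = delinquent & (df["Account_Tenure"] < min_tenure_months)
    return out


# Made-up rows purely for illustration; not Geldium data.
toy = pd.DataFrame({
    "Credit_Score": [780, 560, 720, 610],
    "Delinquent_Account": [1, 1, 0, 1],
    "Account_Tenure": [24, 2, 1, 12],
})

flagged = flag_anomalies(toy)
print(flagged[["high_score_delinquent", "short_tenure_delinquent"]])
```

Flagged rows are candidates for checking, for example for labeling errors or data entry problems, rather than confirmed errors.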
5. AI & GenAI Usage
GenAI tools (ChatGPT) were used for:
Summarizing dataset structure
Suggesting imputation methods
Drafting EDA report content
Example prompts used:
“Summarize key patterns, outliers, and missing values in this dataset.”
“Identify the top 3 variables most likely to predict delinquency.”
“Suggest imputation strategies for missing income data.”
6. Conclusion & Next Steps
The dataset is well structured and mostly complete. The missing values in Income (18%),
Credit_Utilization (12%), and Employment_Status (9%) were handled via imputation. Several
risk factors for delinquency have been identified: high credit utilization, a high
debt-to-income ratio (DTI), a low credit score, and missed payments. The next step is to
build and test a predictive model using these insights and to refine intervention
strategies accordingly.