
Task 1

Uploaded by

abhi.patel190105

Exploratory Data Analysis (EDA) Summary Report

1. Introduction
This report presents the findings from an exploratory data analysis (EDA) conducted on
Geldium’s credit delinquency dataset. The goal is to assess the dataset’s structure, identify
data quality issues, and uncover early indicators of delinquency risk to support the
development of AI-driven predictive models and intervention strategies.

2. Dataset Overview
Key dataset attributes:
• Number of records: approximately 1,000 (estimated from a sample)
• Key variables: Income, Credit_Utilization, Debt_to_Income_Ratio, Missed_Payments,
Delinquent_Account, Employment_Status, Credit_Score, and monthly repayment flags
(Month_1 to Month_6)
• Data types:
o Numerical: Income, Credit_Utilization, Credit_Score, Debt_to_Income_Ratio
o Categorical: Employment_Status, Credit_Card_Type, Location, Month_1 to Month_6

3. Missing Data Analysis


Key missing data findings:
• Income (18%)
• Credit_Utilization (12%)
• Employment_Status (9%)
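
Missing-value percentages of this kind can be computed with a short, dependency-free sketch like the one below. The helper name missing_share and the toy records are assumptions made for illustration; they are not taken from the Geldium dataset, and the sketch treats both None and empty strings as missing.

```python
def missing_share(rows: list[dict], column: str) -> float:
    """Return the percentage of rows where `column` is None or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return 100.0 * missing / len(rows)


# Toy records for illustration only (hypothetical values).
sample = [
    {"Income": 52000, "Credit_Utilization": 0.41, "Employment_Status": "Employed"},
    {"Income": None, "Credit_Utilization": 0.77, "Employment_Status": "Self-employed"},
    {"Income": 38000, "Credit_Utilization": None, "Employment_Status": ""},
    {"Income": 61000, "Credit_Utilization": 0.15, "Employment_Status": "Employed"},
]

columns = ("Income", "Credit_Utilization", "Employment_Status")
report = {col: missing_share(sample, col) for col in columns}

for col, pct in report.items():
    print(f"{col}: {pct:.0f}% missing")
```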
Missing data treatment:
• Income: Median imputation grouped by employment type
• Credit_Utilization: Regression-based imputation using the debt-to-income ratio and
missed payments
• Employment_Status: Mode imputation
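
A minimal sketch of the mode and grouped-median treatments follows, using only the Python standard library. The function names and toy rows are hypothetical. Running the mode imputation first is a design assumption of this sketch, made so that every row has a group before incomes are filled; the report does not specify an order. The regression-based imputation for Credit_Utilization is not sketched here.

```python
from collections import Counter, defaultdict
from statistics import median


def impute_mode(rows, column="Employment_Status"):
    """Fill missing values in `column` with the most frequent observed value."""
    counts = Counter(row[column] for row in rows if row[column] is not None)
    mode_value = counts.most_common(1)[0][0]
    for row in rows:
        if row[column] is None:
            row[column] = mode_value
    return rows


def impute_median_by_group(rows, group_col="Employment_Status", target="Income"):
    """Fill missing `target` values with the median of the row's group.

    Falls back to the overall median when a group has no observed values.
    """
    observed = defaultdict(list)
    for row in rows:
        if row[target] is not None:
            observed[row[group_col]].append(row[target])
    overall = median(v for values in observed.values() for v in values)
    for row in rows:
        if row[target] is None:
            values = observed.get(row[group_col])
            row[target] = median(values) if values else overall
    return rows


# Toy rows for illustration only (hypothetical values).
rows = [
    {"Employment_Status": "Employed", "Income": 50000},
    {"Employment_Status": "Employed", "Income": 60000},
    {"Employment_Status": "Employed", "Income": None},
    {"Employment_Status": "Unemployed", "Income": 20000},
    {"Employment_Status": None, "Income": 30000},
    {"Employment_Status": "Unemployed", "Income": None},
]

impute_mode(rows)             # categorical column first
impute_median_by_group(rows)  # then income within each employment group
```

Grouping by employment type keeps an imputed income closer to what is typical for that group than a single global median would.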

4. Key Findings and Risk Indicators


Key findings:
• High correlation between Credit_Utilization and Delinquent_Account
• High Debt_to_Income_Ratio (>0.6) increases delinquency likelihood
• Low Credit_Score and multiple missed payments are major risk signals
Unexpected anomalies:
• Some customers with high credit scores are marked delinquent
• Delinquent accounts found with short account tenure (<3 months)

5. AI & GenAI Usage


GenAI tools (ChatGPT) were used for:
• Summarizing dataset structure
• Suggesting imputation methods
• Drafting EDA report content
Example prompts used:
• “Summarize key patterns, outliers, and missing values in this dataset.”
• “Identify the top 3 variables most likely to predict delinquency.”
• “Suggest imputation strategies for missing income data.”

6. Conclusion & Next Steps


The dataset is well structured and largely complete; the missing values in Income,
Credit_Utilization, and Employment_Status are handled via imputation. Several risk factors
for delinquency have been identified: high credit utilization, a high debt-to-income ratio
(DTI), a low credit score, and missed payments. The next step is to build and test a
predictive model using these insights and to refine intervention strategies accordingly.
