
EDA Summary Report

This report analyzes a dataset for predicting customer delinquency, identifying data quality issues and key predictors such as Credit_Utilization, Missed_Payments, and Credit_Score. It highlights missing data, inconsistencies, and anomalies that need to be addressed for accurate modeling. The next steps involve cleaning the data and conducting further analysis to ensure reliability in delinquency predictions.

Uploaded by

Geeta Birle
Copyright
© All Rights Reserved

Exploratory Data Analysis (EDA) Summary

1. Introduction
This report reviews the dataset used for predicting customer delinquency. The goal
is to identify data quality issues, detect key trends or anomalies, and highlight the
top predictors of delinquency risk.

2. Dataset Overview
Number of records: 110 (sample from CUST0001–CUST0110)

Key variables:

- Demographics: Age, Employment_Status, Location
- Financials: Income, Credit_Score, Credit_Utilization, Loan_Balance,
Debt_to_Income_Ratio
- Behavioral: Missed_Payments, Account_Tenure, Credit_Card_Type,
Month_1–Month_6 (repayment behavior)
- Target: Delinquent_Account (1 = delinquent, 0 = non-delinquent)

Data types: Mostly numerical (Age, Income, Credit_Score, Ratios), some categorical
(Employment_Status, Credit_Card_Type, Location).

3. Missing Data Analysis

Variables with missing or inconsistent values:

- Income: Missing for several customers (e.g., CUST0041, CUST0043, CUST0049,
CUST0060).
- Loan_Balance: Missing in multiple rows.
- Employment_Status: Inconsistent labels such as ‘EMP’, ‘employed’, and
‘Employed’.
- Credit_Utilization: Contains extreme values (as low as 0.05 and above 1.0).

Recommended treatment:
- Impute Income and Loan_Balance using median values by
Employment_Status.
- Standardize categorical text (e.g., unify Employment_Status casing).

Identifying and addressing missing data is critical to ensuring model accuracy. This
section outlines missing values in the dataset, the approach taken to handle them,
and justifications for the chosen method.
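
The recommended treatment above can be sketched in plain Python. This is a minimal illustration, not the method actually applied in the report: the alias table, the helper names, and the sample records are assumptions made for the example, and only the three labels ‘EMP’, ‘employed’, and ‘Employed’ come from the dataset description.

```python
from statistics import median

# Hypothetical alias table; extend it if other spellings appear in the data.
STATUS_ALIASES = {"emp": "Employed", "employed": "Employed"}


def standardize_status(raw):
    """Trim whitespace, ignore case, and map known aliases to one label."""
    key = raw.strip().lower()
    return STATUS_ALIASES.get(key, key.capitalize())


def impute_group_median(rows, column, group_col="Employment_Status"):
    """Fill None values in `column` with the median of the row's group.

    Falls back to the overall median when a group has no observed values.
    Medians are computed from observed values only, before any filling.
    """
    groups = {}
    for row in rows:
        if row[column] is not None:
            groups.setdefault(row[group_col], []).append(row[column])
    observed = [v for values in groups.values() for v in values]
    overall = median(observed) if observed else None
    for row in rows:
        if row[column] is None:
            values = groups.get(row[group_col])
            row[column] = median(values) if values else overall
    return rows


# Made-up records for illustration only (not taken from the dataset).
rows = [
    {"Employment_Status": "EMP", "Income": 50000},
    {"Employment_Status": "employed", "Income": 70000},
    {"Employment_Status": "Employed", "Income": None},
    {"Employment_Status": "Unemployed", "Income": 20000},
]
for row in rows:
    row["Employment_Status"] = standardize_status(row["Employment_Status"])
impute_group_median(rows, "Income")
```

Standardizing the labels before imputing matters here: otherwise ‘EMP’, ‘employed’, and ‘Employed’ would form three separate groups, each with its own median.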

4. Key Findings and Risk Indicators

Correlations and patterns:

- Higher Missed_Payments and frequent ‘Late’/‘Missed’ entries across months
align with delinquency.
- Lower Credit_Score (<450) and high Credit_Utilization (>0.6) correlate with
delinquency.
- Unemployed and Self-employed customers show higher risk.
- Short Account_Tenure (<5 years) may indicate instability.
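
One way to check these patterns is to flag each customer against the thresholds listed above and compare delinquency rates between flagged and unflagged groups. The sketch below assumes the Month_1–Month_6 columns hold text labels including ‘Late’ and ‘Missed’; the ‘On-time’ label, the function names, and the sample records are hypothetical.

```python
def late_or_missed_count(row, months=6):
    """Count Month_1..Month_n entries labelled 'Late' or 'Missed'."""
    return sum(
        1 for m in range(1, months + 1)
        if row.get(f"Month_{m}") in ("Late", "Missed")
    )


def risk_flags(row):
    """Apply the thresholds quoted in the findings above."""
    return {
        "low_score": row["Credit_Score"] < 450,
        "high_utilization": row["Credit_Utilization"] > 0.6,
        "short_tenure": row["Account_Tenure"] < 5,
    }


def delinquency_rate(rows, predicate):
    """Share of delinquent accounts among rows matching `predicate`."""
    matched = [r for r in rows if predicate(r)]
    if not matched:
        return None
    return sum(r["Delinquent_Account"] for r in matched) / len(matched)


def make_row(score, utilization, tenure, months, delinquent):
    """Build one made-up record; 'On-time' is an assumed label."""
    row = {
        "Credit_Score": score,
        "Credit_Utilization": utilization,
        "Account_Tenure": tenure,
        "Delinquent_Account": delinquent,
    }
    for i, label in enumerate(months, start=1):
        row[f"Month_{i}"] = label
    return row


# Made-up records for illustration only.
rows = [
    make_row(400, 0.8, 2, ["Late", "Missed"] + ["On-time"] * 4, 1),
    make_row(700, 0.3, 10, ["On-time"] * 6, 0),
    make_row(430, 0.7, 3, ["On-time"] * 6, 0),
]

rate_low_score = delinquency_rate(rows, lambda r: risk_flags(r)["low_score"])
rate_other = delinquency_rate(rows, lambda r: not risk_flags(r)["low_score"])
```

With only 110 records, rates computed this way rest on small groups, so they are better read as directional signals than as precise estimates.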

Unexpected anomalies:

- Credit_Utilization values above 1.0 (e.g., 1.025) may indicate scaling errors
and should be verified against the source system.
- Mixed casing and abbreviations in Employment_Status.
- Missing financial data (Income, Loan_Balance) reduces completeness.
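
The anomalies above can be surfaced with a short audit pass before any cleaning. The following is a sketch under stated assumptions: the Customer_ID column name, the function name, and the sample records are hypothetical (the report shows IDs such as CUST0041 but does not name the column).

```python
def audit_rows(rows, columns, id_col="Customer_ID"):
    """Return missing-value counts, over-1.0 utilization IDs, and raw labels."""
    # Count None entries per requested column.
    missing = {
        col: sum(1 for r in rows if r.get(col) is None) for col in columns
    }
    # Collect IDs whose utilization exceeds 1.0 (i.e., above 100%).
    over_limit = [
        r[id_col] for r in rows
        if r.get("Credit_Utilization") is not None
        and r["Credit_Utilization"] > 1.0
    ]
    # List the distinct raw Employment_Status spellings.
    status_labels = sorted(
        {r["Employment_Status"] for r in rows
         if r.get("Employment_Status") is not None}
    )
    return missing, over_limit, status_labels


# Made-up records for illustration only.
rows = [
    {"Customer_ID": "A1", "Income": 40000, "Loan_Balance": None,
     "Credit_Utilization": 1.025, "Employment_Status": "EMP"},
    {"Customer_ID": "A2", "Income": None, "Loan_Balance": 12000,
     "Credit_Utilization": 0.40, "Employment_Status": "employed"},
    {"Customer_ID": "A3", "Income": None, "Loan_Balance": None,
     "Credit_Utilization": 0.05, "Employment_Status": "Employed"},
]
missing, over_limit, status_labels = audit_rows(rows, ["Income", "Loan_Balance"])
```

Listing the raw labels first makes it easier to decide which spellings should be merged before grouping or encoding.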

5. AI & GenAI Usage

AI-assisted analysis was used to summarize key data patterns, identify missing
value trends, and detect anomalies. Prompts focused on identifying predictors of
delinquency and highlighting quality issues.

Prompts used:

- ‘Summarize key patterns, outliers, and missing values in this dataset.’
- ‘Identify early indicators of delinquency risk.’
- ‘Recommend imputation and data cleaning strategies for inconsistent
categorical fields.’

6. Conclusion & Next Steps
Top 3 variables most likely to predict delinquency:

1. Credit_Utilization – Higher utilization rates often signal financial strain.
2. Missed_Payments – Direct behavioral evidence of repayment issues.
3. Credit_Score – Reflects long-term creditworthiness.

Next Steps:

- Clean and standardize categorical variables.
- Impute missing financial values.
- Cap or normalize outlier Credit_Utilization values.
- Conduct correlation and feature importance analysis before model training.
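
The capping and correlation steps listed above might look like the following minimal sketch. The cap bounds of 0.0 and 1.0 are an assumption (the report does not fix them), the function names are hypothetical, and the sample values are made up apart from the 1.025 example quoted earlier. Pearson correlation against a 0/1 target is only a first screening; it is not a substitute for the feature importance analysis mentioned in the list.

```python
from math import sqrt


def cap(values, lower=0.0, upper=1.0):
    """Clamp each value into [lower, upper]; bounds are assumed defaults."""
    return [min(max(v, lower), upper) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return None
    return cov / (spread_x * spread_y)


# Made-up values for illustration only.
utilization = [0.2, 0.5, 0.9, 1.025]
delinquent = [0, 0, 1, 1]

capped = cap(utilization)
corr = pearson(capped, delinquent)
```

Capping changes the data, so it is worth recording how many rows were affected; if values above 1.0 turn out to be genuine, keeping them uncapped may be the better choice.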

Overall, the dataset provides a rich mix of behavioral and financial indicators for
delinquency modeling. However, several quality issues were observed, including
missing income and loan data, inconsistent text formatting, and extreme credit
utilization values. The monthly repayment patterns and missed payment counts are
strong early indicators of risk. Once missing values and anomalies are addressed,
the dataset will be suitable for reliable delinquency prediction modeling.
