Exploratory Data Analysis Report

The EDA report on the Delinquency Prediction Dataset identifies missing values in key variables such as Income and Loan_Balance, and highlights inconsistencies in Employment_Status. It reveals high-risk indicators for delinquency, including missed payments, high credit utilization, unemployment, and low credit scores, while also noting unexpected findings regarding retirees and payment patterns. The report emphasizes the need for data standardization and outlines methods for treating missing data to maintain analytical integrity.

Uploaded by

riyang2803

Exploratory Data Analysis (EDA) Report: Delinquency Prediction Dataset

Step 1: Initial Data Review

Key Observations:

1. Missing Values: Several columns contain missing data, including:
   • Income (multiple customers)
   • Loan_Balance (several entries)
   • Credit_Score (one entry)
2. Inconsistencies:
   • Employment_Status uses inconsistent labels and capitalization (e.g., "employed", "Employed", "EMP")
   • Some Credit_Utilization values exceed 1.0 (100%), which may indicate data errors
3. Potential Risk Indicators:
   • High Credit_Utilization (>0.7)
   • Multiple Missed_Payments (≥3)
   • Low Credit_Score (<400)
   • Unemployed status
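The checks behind the observations above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the report's actual procedure: the sample rows are invented, the Customer_ID field is a hypothetical identifier added for readability, and missing values are assumed to be represented as None.

```python
# Hypothetical sample rows: column names follow the report, values are invented.
# Customer_ID is an assumed identifier, not a column named in the report.
records = [
    {"Customer_ID": "C001", "Income": 52000, "Loan_Balance": 12000,
     "Credit_Score": 610, "Credit_Utilization": 0.35},
    {"Customer_ID": "C002", "Income": None, "Loan_Balance": 8000,
     "Credit_Score": 390, "Credit_Utilization": 1.04},
    {"Customer_ID": "C003", "Income": 41000, "Loan_Balance": None,
     "Credit_Score": None, "Credit_Utilization": 0.82},
]


def count_missing(rows, columns):
    """Return {column: number of rows where the value is None}."""
    return {col: sum(1 for row in rows if row.get(col) is None) for col in columns}


def out_of_range_utilization(rows, upper=1.0):
    """Return the IDs of rows whose Credit_Utilization exceeds `upper`."""
    return [
        row["Customer_ID"]
        for row in rows
        if row.get("Credit_Utilization") is not None
        and row["Credit_Utilization"] > upper
    ]


missing = count_missing(records, ["Income", "Loan_Balance", "Credit_Score"])
suspect = out_of_range_utilization(records)
print(missing)
print(suspect)
```

Rows flagged by the utilization check are only candidates for review. A value above 1.0 can be a data error, but it can also reflect a genuinely over-limit account.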

Data Quality Summary:

The dataset contains 500 customer records with 19 variables each. While most data appears complete,
there are notable missing values in income and loan balance fields that require treatment. Several
variables show promising predictive potential for delinquency, particularly payment history patterns
across six months combined with credit utilization and employment status. Data standardization is
needed for employment status fields to ensure consistent analysis.
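One possible way to standardize Employment_Status is a lookup from lower-cased raw labels to canonical categories. The mapping below is an assumption for illustration: only "employed", "Employed", and "EMP" are variants named in this report, and the other entries are guesses at what the full dataset might contain.

```python
# Assumed mapping from lower-cased raw labels to canonical categories.
# Only the "employed"/"emp" variants come from the report; the rest are illustrative.
CANONICAL_STATUS = {
    "employed": "Employed",
    "emp": "Employed",
    "unemployed": "Unemployed",
    "retired": "Retired",
}


def standardize_status(raw):
    """Map a raw Employment_Status label to its canonical form.

    Unknown labels are returned stripped but otherwise unchanged,
    so they stay visible for manual review instead of being silently recoded.
    """
    if raw is None:
        return None
    cleaned = raw.strip()
    return CANONICAL_STATUS.get(cleaned.lower(), cleaned)


standardized = [standardize_status(s) for s in ["employed", "Employed", "EMP"]]
print(standardized)
```

Standardizing before any grouped analysis matters here because the median imputation of Income in Step 2 is grouped by Employment_Status. Unmerged variants would otherwise form separate groups.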

Step 2: Missing Data Treatment


Variable      | Handling Method for Missing Values                           | Justification
--------------+--------------------------------------------------------------+-------------------------------------------------------------------
Income        | Median imputation by Employment_Status                       | Preserves distribution while accounting for employment differences
Loan_Balance  | Predictive imputation using Income and Debt_to_Income_Ratio  | Maintains logical relationships between financial variables
Credit_Score  | Row deletion (only 1 missing)                                | Minimal impact on dataset size
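The three treatments could be sketched as below, in plain Python and under stated assumptions. The report does not specify the predictive model for Loan_Balance, so the sketch substitutes a simple stand-in: a least-squares fit through the origin of Loan_Balance on the product Income × Debt_to_Income_Ratio. The sample rows are invented, and missing values are assumed to be None.

```python
from statistics import median


def impute_income(rows):
    """Fill missing Income with the median Income of the same Employment_Status."""
    by_status = {}
    for r in rows:
        if r["Income"] is not None:
            by_status.setdefault(r["Employment_Status"], []).append(r["Income"])
    medians = {status: median(values) for status, values in by_status.items()}
    # A group with no known incomes has no median, so its gaps stay None.
    return [
        {**r, "Income": medians.get(r["Employment_Status"])}
        if r["Income"] is None else dict(r)
        for r in rows
    ]


def impute_loan_balance(rows):
    """Fill missing Loan_Balance as slope * Income * Debt_to_Income_Ratio.

    Assumed model (not from the report): a through-origin least-squares fit.
    """
    def proxy(r):
        return r["Income"] * r["Debt_to_Income_Ratio"]

    def has_inputs(r):
        return r["Income"] is not None and r["Debt_to_Income_Ratio"] is not None

    complete = [r for r in rows if has_inputs(r) and r["Loan_Balance"] is not None]
    sxx = sum(proxy(r) ** 2 for r in complete)
    if sxx == 0:
        return [dict(r) for r in rows]  # nothing to fit on; leave gaps in place
    slope = sum(proxy(r) * r["Loan_Balance"] for r in complete) / sxx
    result = []
    for r in rows:
        r = dict(r)
        if r["Loan_Balance"] is None and has_inputs(r):
            r["Loan_Balance"] = slope * proxy(r)
        result.append(r)
    return result


def drop_missing_credit_score(rows):
    """Remove rows with no Credit_Score."""
    return [dict(r) for r in rows if r["Credit_Score"] is not None]


# Invented sample data using the report's column names.
rows = [
    {"Employment_Status": "Employed", "Income": 50000,
     "Debt_to_Income_Ratio": 0.2, "Loan_Balance": 10000, "Credit_Score": 650},
    {"Employment_Status": "Employed", "Income": 70000,
     "Debt_to_Income_Ratio": 0.3, "Loan_Balance": 21000, "Credit_Score": 700},
    {"Employment_Status": "Employed", "Income": None,
     "Debt_to_Income_Ratio": 0.25, "Loan_Balance": None, "Credit_Score": 580},
    {"Employment_Status": "Unemployed", "Income": 18000,
     "Debt_to_Income_Ratio": 0.5, "Loan_Balance": 9000, "Credit_Score": None},
]

# Income is imputed first because the Loan_Balance step depends on it.
cleaned = drop_missing_credit_score(impute_loan_balance(impute_income(rows)))
```

The order of operations is a design choice. Deleting the Credit_Score row last lets its known Income and Loan_Balance still contribute to the group medians and the fit.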

Step 3: Risk Factor Analysis

High-Risk Indicators:

1. Missed_Payments ≥3: Strong correlation with Delinquent_Account status as payment behavior directly
reflects delinquency risk.

2. Credit_Utilization >70%: Customers using most of their available credit show 3x higher delinquency
rates in preliminary analysis.

3. Unemployed Status: Unemployment is associated with a 45% higher likelihood of delinquency compared to
employed customers.

4. Recent Payment Patterns: Customers with "Missed" or "Late" in Month_6 (most recent) show
immediate delinquency risk.

5. Low Credit_Score (<400): Found in 12% of delinquent accounts versus 3% of non-delinquent.
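The five indicators above can be expressed as boolean flags, and the delinquency rate can then be compared between flagged and unflagged customers. The sketch below is a hedged illustration only: it assumes Delinquent_Account is coded 0/1 and Employment_Status is already standardized, and it uses invented sample rows, so it does not reproduce the report's figures.

```python
def risk_flags(row):
    """Return the report's five high-risk indicators as booleans for one customer."""
    return {
        "missed_payments_3plus": row["Missed_Payments"] >= 3,
        "high_utilization": row["Credit_Utilization"] > 0.7,
        "unemployed": row["Employment_Status"] == "Unemployed",
        "recent_missed_or_late": row["Month_6"] in ("Missed", "Late"),
        "low_credit_score": row["Credit_Score"] < 400,
    }


def delinquency_rate(rows, flag, value=True):
    """Share of delinquent accounts among rows where `flag` equals `value`.

    Assumes Delinquent_Account is coded 0/1. Returns None for an empty subset.
    """
    subset = [r for r in rows if risk_flags(r)[flag] == value]
    if not subset:
        return None
    return sum(r["Delinquent_Account"] for r in subset) / len(subset)


# Invented sample data using the report's column names.
customers = [
    {"Missed_Payments": 4, "Credit_Utilization": 0.9, "Employment_Status": "Unemployed",
     "Month_6": "Missed", "Credit_Score": 380, "Delinquent_Account": 1},
    {"Missed_Payments": 0, "Credit_Utilization": 0.2, "Employment_Status": "Employed",
     "Month_6": "On-time", "Credit_Score": 700, "Delinquent_Account": 0},
    {"Missed_Payments": 1, "Credit_Utilization": 0.8, "Employment_Status": "Employed",
     "Month_6": "Late", "Credit_Score": 620, "Delinquent_Account": 0},
    {"Missed_Payments": 0, "Credit_Utilization": 0.4, "Employment_Status": "Employed",
     "Month_6": "On-time", "Credit_Score": 660, "Delinquent_Account": 1},
]

rate_flagged = delinquency_rate(customers, "missed_payments_3plus", True)
rate_unflagged = delinquency_rate(customers, "missed_payments_3plus", False)
```

Comparing rate_flagged with rate_unflagged for each flag is one way to arrive at ratios such as "3x higher". With a dataset of 500 records, such ratios would still carry wide uncertainty.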

Unexpected Findings:

• Some retirees show surprisingly high delinquency rates despite stable incomes
• "On-time" payment in Month_1 followed by deterioration predicts 60% of delinquencies
• Platinum card holders have lower-than-expected delinquency rates despite higher credit limits
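The Month_1 deterioration pattern could be checked along the following lines. Several things here are assumptions rather than facts from the report: the columns are taken to be named Month_1 through Month_6 (only Month_1 and Month_6 are named explicitly), "deterioration" is interpreted as any "Late" or "Missed" status after an "On-time" Month_1, and Delinquent_Account is assumed to be coded 0/1.

```python
# Assumed column names; the report names only Month_1 and Month_6 explicitly.
MONTHS = ["Month_1", "Month_2", "Month_3", "Month_4", "Month_5", "Month_6"]
BAD_STATUSES = {"Late", "Missed"}


def deteriorated_after_on_time(row):
    """True if Month_1 is "On-time" and any later month is "Late" or "Missed"."""
    later = [row[m] for m in MONTHS[1:]]
    return row["Month_1"] == "On-time" and any(s in BAD_STATUSES for s in later)


def share_of_delinquents_with_pattern(rows):
    """Fraction of delinquent accounts that show the deterioration pattern."""
    delinquent = [r for r in rows if r["Delinquent_Account"] == 1]
    if not delinquent:
        return None
    return sum(deteriorated_after_on_time(r) for r in delinquent) / len(delinquent)


def make_row(statuses, delinquent):
    """Build an invented sample row from six monthly statuses."""
    row = dict(zip(MONTHS, statuses))
    row["Delinquent_Account"] = delinquent
    return row


sample = [
    make_row(["On-time", "On-time", "On-time", "Late", "Missed", "Missed"], 1),
    make_row(["Late", "Late", "Missed", "Missed", "Missed", "Missed"], 1),
    make_row(["On-time"] * 6, 0),
]

share = share_of_delinquents_with_pattern(sample)
```

Note that this measures how many delinquent accounts show the pattern, which is one reading of "predicts 60% of delinquencies". The alternative reading, how many accounts with the pattern become delinquent, would need the denominator swapped.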
