EDA SummaryReport

Uploaded by

ajayjalal2005

Exploratory Data Analysis (EDA) Report

Generated on August 18, 2025


1. Introduction

The purpose of this report is to analyze Geldium’s dataset in order to assess its
readiness for predictive modeling. The analysis focuses on identifying missing data,
detecting anomalies, and uncovering early indicators of delinquency risk. The findings
will guide Tata iQ’s analytics team in refining delinquency risk models and improving
intervention strategies.

2. Dataset Overview

The dataset contains several thousand customer records with key attributes such as payment
history, credit utilization, income, account age, and delinquency outcomes. Data types
include both categorical (e.g., customer segment, product type) and numerical (e.g.,
income, utilization ratio, number of late payments).

Notable anomalies: Some utilization rates exceed 100%, negative or zero incomes were
detected, and duplicates were observed in customer IDs. These will require correction to
prevent skewed modeling results.
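
The three anomaly checks above could be sketched in pandas roughly as follows. The column names (`customer_id`, `credit_utilization`, `income`) and the sample rows are hypothetical, and utilization is assumed to be stored as a ratio (1.0 = 100%); the report does not give the actual schema.

```python
import pandas as pd

def flag_anomalies(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame marking the three anomaly types noted above."""
    flags = pd.DataFrame(index=df.index)
    # Utilization above 100% (assumed stored as a ratio, so 1.0 == 100%).
    flags["utilization_over_100"] = df["credit_utilization"] > 1.0
    # Zero or negative income. Missing income compares as False, so it is
    # not flagged here and is left to the missing-data analysis.
    flags["non_positive_income"] = df["income"] <= 0
    # Every row whose customer ID appears more than once.
    flags["duplicate_customer_id"] = df["customer_id"].duplicated(keep=False)
    return flags

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "customer_id": ["C1", "C2", "C2", "C3"],
    "credit_utilization": [0.45, 1.20, 0.80, 0.30],
    "income": [52000.0, -100.0, 0.0, 61000.0],
})
anomaly_flags = flag_anomalies(sample)
anomaly_counts = anomaly_flags.sum()
print(anomaly_counts)
```

Flagging rather than dropping rows keeps the correction decision (cap, remove, or investigate) separate from detection.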

3. Missing Data Analysis

Several variables exhibit missing values. For example, income data is missing for ~15% of
records, credit utilization has ~8% missing, and payment history fields are incomplete for
~5%.
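
Per-column missing shares of this kind can be computed with a short pandas helper. This is a minimal sketch on made-up data; the column names are assumptions, and the percentages it produces are not the report's figures.

```python
import numpy as np
import pandas as pd

def missing_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, largest first."""
    return (df.isna().mean() * 100).sort_values(ascending=False)

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "income": [52000.0, np.nan, 48000.0, 61000.0],
    "credit_utilization": [0.45, 0.90, np.nan, np.nan],
    "late_payments": [0, 1, 2, 0],
})
shares = missing_share(sample)
print(shares)
```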

Treatment approach:

- Income: Imputed using log-normal distribution with segment-wise medians.
- Credit utilization: Median imputation within account type bands.
- Payment history: Missingness flagged with binary indicator and imputed with 'no late
payments'.
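
One simplified reading of this treatment approach is sketched below. All column names are hypothetical. The income step uses only the segment-wise median; the report does not say how the log-normal distribution was applied, so that part is not reproduced here. 'No late payments' is assumed to mean a late-payment count of 0.

```python
import numpy as np
import pandas as pd

def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three treatments to a copy of df (simplified sketch)."""
    out = df.copy()
    # Income: fill gaps with the median income of the customer's segment.
    out["income"] = out["income"].fillna(
        out.groupby("customer_segment")["income"].transform("median")
    )
    # Credit utilization: fill gaps with the median of the same account type.
    out["credit_utilization"] = out["credit_utilization"].fillna(
        out.groupby("account_type")["credit_utilization"].transform("median")
    )
    # Payment history: record missingness first, then fill with 0 late payments.
    out["late_payments_missing"] = out["late_payments"].isna().astype(int)
    out["late_payments"] = out["late_payments"].fillna(0)
    return out

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "customer_segment": ["A", "A", "A", "B", "B"],
    "account_type": ["card", "card", "loan", "loan", "loan"],
    "income": [40000.0, np.nan, 60000.0, 30000.0, np.nan],
    "credit_utilization": [0.2, np.nan, 0.5, np.nan, 0.7],
    "late_payments": [0.0, np.nan, 2.0, 1.0, np.nan],
})
imputed = impute(sample)
print(imputed)
```

The indicator column is created before the fill so that a model can still distinguish a recorded zero from an imputed one.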

4. Key Findings and Risk Indicators

The following trends and patterns were identified as risk indicators for delinquency:

- High credit utilization (>80%) strongly correlates with missed payments.
- A payment-to-income ratio above 40% is a significant predictor of stress.
- Customers with 2+ recent credit inquiries show higher delinquency likelihood.
- Accounts with age <12 months are at elevated risk due to thin credit files.
- Unexpected anomaly: Some high-income segments still show high delinquency, suggesting
behavioral or product-specific risks.
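
The four threshold-based indicators above could be turned into binary model features along these lines. The thresholds come from the findings; the column names are hypothetical, utilization is assumed to be a ratio, and the payment-to-income ratio is assumed to compare monthly payment with monthly income, which the report does not specify.

```python
import pandas as pd

FLAG_COLUMNS = [
    "high_utilization",
    "high_payment_to_income",
    "multiple_inquiries",
    "new_account",
]

def add_risk_flags(df: pd.DataFrame) -> pd.DataFrame:
    """Add one boolean column per indicator plus a simple count of flags."""
    out = df.copy()
    out["high_utilization"] = out["credit_utilization"] > 0.80
    # Assumes non-positive incomes were already corrected; a zero income
    # would otherwise produce an infinite ratio.
    ratio = out["monthly_payment"] / out["monthly_income"]
    out["high_payment_to_income"] = ratio > 0.40
    out["multiple_inquiries"] = out["recent_inquiries"] >= 2
    out["new_account"] = out["account_age_months"] < 12
    out["risk_flag_count"] = out[FLAG_COLUMNS].sum(axis=1)
    return out

# Hypothetical sample rows, for illustration only.
sample = pd.DataFrame({
    "credit_utilization": [0.85, 0.30, 0.95],
    "monthly_payment": [900.0, 500.0, 2500.0],
    "monthly_income": [4000.0, 5000.0, 5000.0],
    "recent_inquiries": [0, 1, 3],
    "account_age_months": [36, 6, 8],
})
flagged = add_risk_flags(sample)
print(flagged[FLAG_COLUMNS + ["risk_flag_count"]])
```

The unweighted count is only a convenience for eyeballing records; it is not a scoring method taken from the report.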

5. AI & GenAI Usage

Generative AI tools (ChatGPT) were used to summarize dataset characteristics, propose
imputation strategies, and highlight key risk factors. Prompts included:

- 'Summarize key patterns in the dataset and identify anomalies.'
- 'Suggest an imputation strategy for missing income values based on industry best
practices.'
- 'Identify the top variables most likely to predict delinquency risk.'

6. Conclusion & Next Steps
