Predictive Model Plan – Customer Delinquency Prediction
1. Model Logic (Generated with GenAI)
Objective: Predict whether a customer will become delinquent (Delinquent_Account = 1)
based on financial and demographic features.
Structure:
- Step 1: Load and preprocess dataset
  - Handle missing values (median/mean imputation)
  - Encode categorical variables (e.g., Employment_Status, Credit_Card_Type)
  - Normalize or scale numeric features
- Step 2: Feature selection
  - Identify top predictors (e.g., Missed_Payments, Credit_Score, Credit_Utilization)
- Step 3: Train predictive models (baseline + advanced)
  - Logistic Regression (baseline)
  - Decision Tree / Random Forest / XGBoost
- Step 4: Evaluate model using cross-validation
- Step 5: Generate predictions and interpret results using SHAP or feature importance plots
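Steps 1 to 4 above could be sketched with scikit-learn roughly as follows. This is a minimal sketch on synthetic data, not the plan's actual implementation. The column names come from the plan, but the generated values, the category labels, the choice of median imputation and one-hot encoding, and the helper names make_synthetic_customers and build_pipeline are assumptions made for illustration. Step 2 is represented only by restricting the inputs to the predictors the plan lists. XGBoost and SHAP are left out because they live in separate packages.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["Missed_Payments", "Credit_Score", "Credit_Utilization"]
CATEGORICAL = ["Employment_Status", "Credit_Card_Type"]


def make_synthetic_customers(n=400, seed=0):
    """Hypothetical stand-in for the real dataset (all values are made up)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Missed_Payments": rng.integers(0, 7, n).astype(float),
        "Credit_Score": rng.normal(600, 80, n),
        "Credit_Utilization": rng.uniform(0, 1, n),
        "Employment_Status": rng.choice(["Employed", "Self-employed", "Unemployed"], n),
        "Credit_Card_Type": rng.choice(["Standard", "Gold", "Platinum"], n),
    })
    # Assumed relationship: missed payments and high utilization raise risk.
    logit = (0.6 * df["Missed_Payments"].to_numpy()
             + 2.0 * df["Credit_Utilization"].to_numpy() - 3.5)
    prob = 1.0 / (1.0 + np.exp(-logit))
    df["Delinquent_Account"] = (rng.uniform(0, 1, n) < prob).astype(int)
    # Blank out some scores so the imputation step has work to do.
    df.loc[rng.choice(n, 20, replace=False), "Credit_Score"] = np.nan
    return df


def build_pipeline(model):
    """Step 1 (impute, scale, encode) followed by the Step 3 estimator."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", model)])


df = make_synthetic_customers()
X, y = df[NUMERIC + CATEGORICAL], df["Delinquent_Account"]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
cv_auc = {}
for name, model in models.items():
    # Step 4: cross-validated AUC-ROC; preprocessing is refit inside each fold.
    scores = cross_val_score(build_pipeline(model), X, y, cv=cv, scoring="roc_auc")
    cv_auc[name] = scores.mean()
    print(f"{name}: mean AUC-ROC = {cv_auc[name]:.3f}")
```

Keeping the preprocessing inside the pipeline means imputation and scaling statistics are learned only from each training fold, which avoids leaking information from the validation fold into the model.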
GenAI Prompt Example Used:
'Write a Python-based pseudocode to build a predictive model for customer delinquency
using financial data.'
2. Justification for Model Choice
Chosen Models:
- Logistic Regression for baseline: simple, interpretable, and effective for binary
classification
- Random Forest for performance: handles non-linearity and captures feature interactions
Reasons:
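- Interpretability is crucial in financial services, which is why Logistic Regression
serves as the baseline
- Ensemble models such as Random Forest and XGBoost can capture complex delinquency
patterns that a linear model may miss
- SHAP values or feature importance plots make model results more transparent, which aids
business decision-making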
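As one way to illustrate the feature importance option mentioned above, the sketch below ranks features for a Random Forest using scikit-learn's permutation importance. Permutation importance is swapped in here as a stand-in; it is not SHAP, which requires the separate shap package. The data is synthetic, and the assumption that only Missed_Payments drives the label is made purely so the expected ranking is easy to check.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
feature_names = ["Missed_Payments", "Credit_Score", "Credit_Utilization"]
X = np.column_stack([
    rng.integers(0, 7, n),      # synthetic missed payments
    rng.normal(600, 80, n),     # synthetic credit scores
    rng.uniform(0, 1, n),       # synthetic utilization
])
# Assumption for illustration: only missed payments (plus noise) drive the label.
y = (X[:, 0] + rng.normal(0, 1, n) > 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in AUC-ROC.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=0, scoring="roc_auc"
)
ranking = sorted(
    zip(feature_names, result.importances_mean), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name}: mean AUC drop = {score:.3f}")
```

Measuring importance on held-out data rather than training data gives a ranking that reflects what the model actually relies on when generalizing.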
Business Relevance:
- Supports early identification of high-risk customers
- Enables informed credit and collection strategies
- Aligns with Geldium’s need for explainable, scalable, and reliable risk models
3. Evaluation Strategy
Key Evaluation Metrics:
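- Accuracy: Share of all predictions that are correct; can look high even for a weak model
if delinquent accounts are rare
- Precision & Recall: Precision is the share of flagged customers who are actually
delinquent (penalizes false positives); recall is the share of delinquent customers the
model flags (penalizes false negatives)
- F1 Score: Harmonic mean of precision and recall
- AUC-ROC: Measures how well the model ranks delinquent customers above non-delinquent
ones across all classification thresholds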
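To make the metrics above concrete, here is a small worked example in plain Python. The confusion-matrix counts and the scores are hypothetical numbers chosen for illustration, and the function names classification_metrics and auc_roc are not from the plan.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_roc(y_true, scores):
    """Chance that a random delinquent customer outscores a random
    non-delinquent one (ties count as half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical results on 1,000 customers, 40 of whom are delinquent.
model_metrics = classification_metrics(tp=30, fp=20, fn=10, tn=940)
# A "model" that never flags anyone still reaches 96% accuracy.
never_flag = classification_metrics(tp=0, fp=0, fn=40, tn=960)

print(model_metrics)  # accuracy 0.97, precision 0.6, recall 0.75, f1 ~0.667
print(never_flag)     # accuracy 0.96, precision 0.0, recall 0.0, f1 0.0

# Hypothetical scores: one delinquent customer is ranked below one non-delinquent.
auc = auc_roc([1, 0, 1, 0, 0], [0.9, 0.4, 0.35, 0.2, 0.1])
print(round(auc, 3))  # 0.833
```

The comparison between the two sets of counts shows why accuracy alone is a weak signal when delinquency is rare: the two accuracies differ by one percentage point, while recall differs by 75 points.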
Bias Detection & Ethics:
- Monitor disparate impact across demographic features (e.g., Employment_Status,
Location)
- Evaluate fairness metrics (e.g., demographic parity, meaning similar rates of delinquency
flags across groups)
- Prevent overfitting and ensure data privacy
- Use ethical AI practices to avoid discrimination or biased predictions
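A demographic parity check like the one listed above could be sketched as follows. The group labels and predictions are made-up values, and the function names are illustrative assumptions, not part of the plan or of any fairness library.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of customers flagged as delinquent (prediction == 1) per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged, total]
    for group, pred in zip(groups, predictions):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {group: flagged / total for group, (flagged, total) in counts.items()}


def demographic_parity_difference(rates):
    """Gap between the highest and lowest group flag rates (0 means parity)."""
    return max(rates.values()) - min(rates.values())


def disparate_impact_ratio(rates):
    """Lowest group flag rate divided by the highest (1 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Hypothetical predictions for ten customers split by Employment_Status.
groups = ["Employed"] * 5 + ["Unemployed"] * 5
predictions = [1, 0, 0, 0, 0, 1, 1, 1, 0, 0]

rates = selection_rates(groups, predictions)
gap = demographic_parity_difference(rates)
ratio = disparate_impact_ratio(rates)

print(rates)            # {'Employed': 0.2, 'Unemployed': 0.6}
print(round(gap, 3))    # 0.4
print(round(ratio, 3))  # 0.333
```

A large gap does not by itself prove the model is unfair, since groups may differ in true delinquency rates, but it indicates where a closer review is warranted.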
Plan:
- Apply k-fold cross-validation
- Use a confusion matrix to examine false positives/negatives
- Visualize feature importance using SHAP values
- Retrain the model regularly to reflect evolving data patterns
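The first two items of the plan above can be combined by building the confusion matrix from out-of-fold predictions, so that every customer is scored by a model that did not see them during training. The sketch below assumes scikit-learn and uses synthetic features and labels; the number of folds and the data-generating rule are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 3))  # three synthetic, already-scaled features
# Assumed rule: the first feature plus noise determines delinquency.
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0.8).astype(int)

# Stratified folds keep the delinquency rate similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Each prediction comes from the fold in which that row was held out.
y_pred = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv)

# With labels=[0, 1], ravel() returns counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y, y_pred, labels=[0, 1]).ravel()
print(f"true negatives={tn}, false positives={fp}")
print(f"false negatives={fn}, true positives={tp}")
```

In a delinquency setting, false negatives (missed delinquent customers) and false positives (customers flagged unnecessarily) carry different business costs, so inspecting both counts is more informative than a single score.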