
Predictive Model Plan Report

The report outlines a predictive model plan for assessing customer delinquency using a Random Forest algorithm, emphasizing data preparation, feature selection, model training, and evaluation strategies. It highlights the importance of accuracy, precision, recall, and ethical considerations such as transparency and fairness in the model's deployment. The model aims to balance performance with interpretability to support risk-based decision-making in a financial context.

Uploaded by

sk3818966
Copyright © All Rights Reserved

Predictive model plan report

1 Model Logic (generated with GenAI)


1. Data Preparation

• Impute missing values (Income, Loan_Balance, Credit_Score)

• Cap Credit_Utilization at 1.0

• Encode categorical variables (e.g., Employment_Status, Month_1–Month_6)

• Normalize or scale numerical features if needed
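A minimal pandas sketch of the data-preparation steps above. It assumes the column names listed in this section and a DataFrame holding the raw data. Median imputation and one-hot encoding via `pd.get_dummies` are assumed choices, since the plan does not name a specific imputation or encoding method. Scaling is left out because the Random Forest chosen later does not need it.

```python
import numpy as np
import pandas as pd


def prepare_data(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the preparation steps; imputation/encoding choices are assumptions."""
    out = df.copy()
    # Impute missing numeric values with the column median (assumed strategy)
    for col in ["Income", "Loan_Balance", "Credit_Score"]:
        out[col] = out[col].fillna(out[col].median())
    # Cap utilization at 1.0, i.e. 100% of the credit limit
    out["Credit_Utilization"] = out["Credit_Utilization"].clip(upper=1.0)
    # One-hot encode categoricals, including the monthly payment-status columns
    categorical = ["Employment_Status"] + [f"Month_{i}" for i in range(1, 7)]
    out = pd.get_dummies(out, columns=categorical, dtype=int)
    return out
```

The resulting frame plays the role of `df_cleaned` in the pseudocode later in this section; dummy columns are named `<column>_<value>`, for example `Month_6_Late`.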

2. Feature Selection

Select key predictors:

• Credit_Utilization
• Debt_to_Income_Ratio
• Missed_Payments
• Account_Tenure
• Recent Payment Status (Month_6)
• Employment_Status
• Credit_Score
• Income
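The pseudocode later in this section indexes `df_cleaned[selected_features]` without defining the list. One way to build it, assuming the one-hot column names produced by `pd.get_dummies` (e.g. `Month_6_Late`), is sketched below; the helper `select_features` and both constant lists are hypothetical.

```python
import pandas as pd

# Numeric predictors named in the Feature Selection step
BASE_FEATURES = [
    "Credit_Utilization", "Debt_to_Income_Ratio", "Missed_Payments",
    "Account_Tenure", "Credit_Score", "Income",
]
# Categorical predictors; after one-hot encoding each becomes several columns
CATEGORICAL_PREFIXES = ["Month_6_", "Employment_Status_"]


def select_features(df_encoded: pd.DataFrame) -> list:
    """Return the numeric key predictors plus the dummy columns of the categorical ones."""
    numeric = [c for c in BASE_FEATURES if c in df_encoded.columns]
    dummies = [c for c in df_encoded.columns
               if any(c.startswith(p) for p in CATEGORICAL_PREFIXES)]
    return numeric + dummies
```

Only `Month_6` is kept among the monthly statuses, matching the "Recent Payment Status" bullet; the target column is excluded because it matches none of the names.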

3. Model Training

• Choose model: Logistic Regression (baseline), or Random Forest (for feature importance)

• Train on labeled data (Delinquent_Account as target)
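To compare the two candidate models, a sketch like the following fits a scaled Logistic Regression baseline and a Random Forest on the same stratified split. The data is synthetic (`make_classification` with an assumed delinquency rate of about 15%) and only stands in for the prepared dataset; `class_weight="balanced"` is one possible form of the re-weighting discussed under bias mitigation, not a setting the plan prescribes.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared data (assumption: ~15% delinquent)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    # Baseline: logistic regression needs scaled inputs
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000)),
    # Main candidate: random forest, no scaling required
    "random_forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out set
```

Accuracy alone is a weak basis for choosing between the two; the metrics in the evaluation strategy below are the more relevant comparison.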

4. Model Evaluation

• Use train-test split or cross-validation

• Evaluate with accuracy, precision, recall, F1-score

• Review confusion matrix for false positives/negatives
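For the cross-validation option, the sketch below (again on synthetic stand-in data) averages accuracy, precision, recall, and F1 over five stratified folds, then builds one confusion matrix from out-of-fold predictions so that every customer is scored by a model that did not see them during training. The fold count and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict, cross_validate

X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.85, 0.15], random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Per-fold scores for each metric, then their mean across folds
metrics = ["accuracy", "precision", "recall", "f1"]
cv_results = cross_validate(model, X, y, cv=cv, scoring=metrics)
mean_scores = {m: cv_results[f"test_{m}"].mean() for m in metrics}

# Out-of-fold predictions give a single confusion matrix over all customers
y_oof = cross_val_predict(model, X, y, cv=cv)
tn, fp, fn, tp = confusion_matrix(y, y_oof).ravel()
print(mean_scores)
print(f"FP={fp} (wrongly flagged), FN={fn} (missed delinquents)")
```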

5. Prediction and Risk Scoring

• Output a binary label (0 = Non-delinquent, 1 = Delinquent)

• Optional: Output a probability score for risk ranking.
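Risk scoring can reuse the same fitted model: `predict_proba` gives an estimated probability of delinquency, which can be sorted to rank customers. A minimal sketch on synthetic data, where `customer_idx` is a hypothetical identifier:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=6,
                           weights=[0.85, 0.15], random_state=1)
# Fit on the first 400 rows; treat the last 100 as new customers to score
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X[:400], y[:400])
X_new = X[400:]

risk = pd.DataFrame({
    "customer_idx": range(len(X_new)),               # hypothetical identifier
    "risk_score": model.predict_proba(X_new)[:, 1],  # estimated P(delinquent)
    "predicted_label": model.predict(X_new),         # binary label, 1 = delinquent
})
# Highest-risk customers first
risk = risk.sort_values("risk_score", ascending=False).reset_index(drop=True)
print(risk.head())
```

Random Forest probabilities are not always well calibrated, so they are safer for ranking than as literal default probabilities; scikit-learn's `CalibratedClassifierCV` can be applied if calibrated scores are needed.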

Pseudocode

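from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Step 1: Define features and target
# df_cleaned is the DataFrame produced by the Data Preparation step;
# selected_features is the list of predictor columns from Feature Selection.
X = df_cleaned[selected_features]  # e.g., Credit_Utilization, Debt_to_Income_Ratio, ...
y = df_cleaned['Delinquent_Account']

# Step 2: Train-test split (stratified so both sets keep the same delinquency rate)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Step 3: Model training (fixed random_state for reproducible results)
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Step 4: Prediction
y_pred = model.predict(X_test)

# Step 5: Evaluation (precision, recall, F1-score per class, plus accuracy)
print(classification_report(y_test, y_pred))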

2 Justification for Model Choice


For the task of predicting customer delinquency, the Random Forest model was selected based on a
balance of accuracy, interpretability, and operational fit for Geldium’s business environment.

Factor | Justification
Accuracy | Random Forests are ensemble models that combine multiple decision trees to improve prediction performance and reduce overfitting. They often outperform simpler models on structured financial data.
Transparency | While not as transparent as logistic regression, Random Forests provide feature importance scores, allowing business analysts to understand which variables most influence risk predictions.
Ease of Use | Random Forests are easy to implement using libraries like scikit-learn, require minimal feature scaling, and are relatively robust to outliers and missing data.
Financial Relevance | Tree-based models like Random Forests have a proven track record in credit scoring and fraud detection, making them well aligned with financial use cases.
Business Suitability (Geldium) | Geldium needs fast, interpretable, and deployable solutions to identify high-risk customers. Random Forests offer a strong trade-off between performance and explainability, making them well suited to risk-based decision support systems.
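The feature importance scores mentioned under Transparency can be read from a fitted model's `feature_importances_` attribute. In this sketch the feature names come from the Feature Selection step but the data is synthetic, so the resulting ranking illustrates the API only and says nothing about real drivers of delinquency.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Feature names from the plan; the values are synthetic placeholders
feature_names = ["Credit_Utilization", "Debt_to_Income_Ratio", "Missed_Payments",
                 "Account_Tenure", "Credit_Score", "Income"]
X, y = make_classification(n_samples=800, n_features=len(feature_names),
                           n_informative=4, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

# Impurity-based importances: non-negative and normalized to sum to 1
importances = (pd.Series(model.feature_importances_, index=feature_names)
               .sort_values(ascending=False))
print(importances)
```

Impurity-based importances can favor features with many distinct values; `sklearn.inspection.permutation_importance` on held-out data is a common cross-check before presenting a ranking to analysts.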

Alternative considerations

• Logistic Regression was considered for its simplicity and transparency, but without manual feature
engineering it cannot capture complex, non-linear relationships and interactions in behavioral data.
• Neural Networks were ruled out for now due to their "black-box" nature, which is not ideal
for regulated industries like finance where model interpretability is crucial.

3 Evaluation Strategy
To ensure the model is both effective and responsible, we use a combination of performance
metrics, bias detection techniques, and ethical safeguards.

Metric | What is measured | Why it matters for delinquency predictions
Accuracy | Overall correctness of the model's predictions | Useful for balanced datasets, but can be misleading when classes are imbalanced.
Precision | % of predicted delinquents that were actually delinquent | Important for minimizing false positives (wrongly labeling someone as high-risk).
Recall | % of actual delinquents correctly identified | Critical for catching at-risk customers and preventing financial losses.
F1 score | Harmonic mean of precision and recall | Balances false positives and false negatives; well suited to uneven class distributions.
Area under the ROC curve (AUC) | Ability of the model to distinguish between delinquent and non-delinquent cases | Robust summary of model performance across thresholds; an AUC closer to 1 is ideal (0.5 is no better than random).
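To make the definitions in the table concrete, the sketch below computes each metric by hand for a small invented example of ten customers (three truly delinquent) and cross-checks the results against scikit-learn. The labels and scores are illustrative only.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy example: 1 = delinquent. y_pred is a hard label, y_score a risk probability.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)  # 2 delinquents caught
fp = sum(t == 0 and p == 1 for t, p in pairs)  # 1 customer wrongly flagged
fn = sum(t == 1 and p == 0 for t, p in pairs)  # 1 delinquent missed
tn = sum(t == 0 and p == 0 for t, p in pairs)  # 6 correctly cleared

accuracy = (tp + tn) / len(y_true)                  # 0.8
precision = tp / (tp + fp)                          # 2/3
recall = tp / (tp + fn)                             # 2/3
f1 = 2 * precision * recall / (precision + recall)  # 2/3
auc = roc_auc_score(y_true, y_score)  # 20 of 21 (delinquent, non-delinquent) pairs ranked correctly

# Cross-check the hand-computed values against scikit-learn
assert abs(precision - precision_score(y_true, y_pred)) < 1e-12
assert abs(recall - recall_score(y_true, y_pred)) < 1e-12
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```

Accuracy looks respectable at 0.8 even though one of the three delinquents is missed, which is why the table treats recall and F1 as the more informative measures here.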

Metric interpretation

• High Recall + Moderate Precision: Acceptable in early warning systems to flag potential risk,
followed by manual review.
• Low Recall: Risk of missing truly delinquent customers — unacceptable for financial
applications.
• High F1 Score: Signals a strong balance between precision and recall, a key indicator of a model ready for production.
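The "high recall, moderate precision" setting for an early warning system can be reached by lowering the decision threshold below the default of 0.5. The sketch below assumes the business sets a minimum acceptable recall; the helper `threshold_for_recall` and its 0.8 default are hypothetical. It picks the highest threshold that still meets that recall, which keeps precision as high as possible under the constraint.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_recall(y_true, y_score, min_recall=0.8):
    """Highest score threshold whose recall is still >= min_recall.

    Returns (threshold, precision, recall) at that threshold.
    """
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    # precision/recall have one extra final point (recall 0) with no threshold
    ok = np.where(recall[:-1] >= min_recall)[0]
    best = ok[-1]  # thresholds ascend and recall falls, so take the last qualifying one
    return thresholds[best], precision[best], recall[best]


# Toy labels and risk scores (illustrative only)
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.05]
threshold, prec, rec = threshold_for_recall(y_true, y_score, min_recall=1.0)
# threshold 0.4: every delinquent is flagged, precision drops to 0.75
```

Customers scoring above the chosen threshold would then go to manual review, as the Human Oversight principle below requires.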
Bias detection and mitigation

Technique | Purpose
Stratified sampling | Preserves the proportion of delinquent and non-delinquent cases across training and test sets.
Fairness audits | Evaluate model performance across subgroups (e.g., gender, location, income level).
Feature sensitivity analysis | Detect whether non-relevant features (e.g., ZIP code, ethnicity) are unduly influencing outcomes.
Re-weighting | Adjust class (or sample) weights so the model does not favor the majority class.
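A fairness audit can start by comparing flag rates, precision, and recall per subgroup. The sketch below assumes a `groups` array holding one subgroup label per customer (for example an income band); such attributes would be used only for auditing, not as model inputs, in line with the Fairness principle below. The function name `fairness_audit` is hypothetical.

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score


def fairness_audit(y_true, y_pred, groups) -> pd.DataFrame:
    """Per-subgroup size, actual delinquency rate, flag rate, precision and recall."""
    df = pd.DataFrame({"y_true": y_true, "y_pred": y_pred, "group": groups})
    rows = []
    for name, g in df.groupby("group"):
        rows.append({
            "group": name,
            "n": len(g),
            "actual_rate": g["y_true"].mean(),  # share truly delinquent
            "flag_rate": g["y_pred"].mean(),    # share flagged as high-risk
            "precision": precision_score(g["y_true"], g["y_pred"], zero_division=0),
            "recall": recall_score(g["y_true"], g["y_pred"], zero_division=0),
        })
    return pd.DataFrame(rows).set_index("group")
```

A large recall gap between groups means delinquent customers in one group are missed more often; a large flag-rate gap that the actual rates do not explain suggests one group is being flagged disproportionately.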

Ethical considerations

• Transparency: Customers have a right to know if and why they were flagged as high-risk. The
model must support explainability.
• Fairness: Avoid discrimination against protected groups. Model inputs should be behavior-
and performance-based, not demographic.
• Human Oversight: High-risk predictions should trigger manual review, not automatic
rejections or penalties.
• Data Privacy: All customer data used must be anonymized, securely stored, and aligned with
data protection regulations (e.g., GDPR).
