Understanding Feature Importance in Logistic Regression
Feature importance helps identify which variables most strongly influence the model's predictions. In a logistic regression model, each feature's coefficient determines its influence on the final decision boundary.
Positive Coefficients: Indicate a positive relationship between the feature and the target variable; higher values of the feature increase the probability of the positive class.
Negative Coefficients: Indicate a negative relationship; higher values of the feature decrease the probability of the positive class.
Magnitude of Importance: Larger absolute coefficients (positive or negative) indicate stronger influence. A short sketch for extracting and ranking coefficients follows this list.
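Because these importances come straight from the fitted model's coefficients, they can be extracted and ranked directly. The sketch below assumes a fitted PySpark LogisticRegressionModel named lr_model and a feature_names list aligned with the feature vector (one way to build that list appears after the observations below); both names are hypothetical placeholders.

    # Minimal sketch: rank features by absolute coefficient magnitude.
    # `lr_model` (a fitted LogisticRegressionModel) and `feature_names`
    # (a list aligned with the feature vector) are assumed to exist.
    import pandas as pd

    coefficients = lr_model.coefficients.toArray()  # one weight per feature

    importance = (
        pd.DataFrame({"feature": feature_names, "coefficient": coefficients})
        .assign(abs_coefficient=lambda d: d["coefficient"].abs())
        .sort_values("abs_coefficient", ascending=False)
    )
    print(importance.head(10))  # strongest influences, positive or negative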
Key Observations from the Feature Importance Graph
1. Scaled Numerical Features:
   - The most important features are scaled_numerical_features_X, which are transformed versions of the numerical variables.
   - These represent continuous variables, such as transaction amounts, number of accounts, or customer behavior trends, that strongly impact model predictions.
   - Features like scaled_numerical_features_6, scaled_numerical_features_17, and scaled_numerical_features_5 suggest that certain transactional patterns significantly influence the recommendation.
2. Categorical Features (One-Hot Encoded):
   - Features like marital_status_vec_1.0, employment_status_vec_3.0, and gender_vec_2.0 show how different demographic groups influence model predictions (a sketch for recovering these names from the feature vector's metadata follows this list).
   - The high importance of marital_status_vec_1.0 suggests that marital status plays a strong role in whether a fixed deposit (FD) recommendation is made.
   - employment_status_vec_3.0 indicates that specific employment categories correlate more strongly with FD recommendations.
3. Employment and Occupation Features:
   - Multiple employment_status_vec_X and occupation_vec_X features have medium-to-high importance, suggesting that job stability, income type, or industry may be key indicators of fixed deposit interest.
4. Age and Withdrawal Trends:
   - age_group_vec_X and withdrawal_trends_vec_X features are present but lower in importance, suggesting that while they contribute, they are not the strongest predictors.
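The encoded names above (scaled_numerical_features_X, marital_status_vec_1.0, and so on) can typically be recovered from the ML attribute metadata that VectorAssembler attaches to its output column. A hedged sketch, assuming the transformed DataFrame is called train_df and the assembled column is "features" (both names are assumptions):

    # Recover an index -> feature-name mapping from the vector column's metadata.
    # Assumes `train_df` was produced by a pipeline ending in a VectorAssembler
    # whose output column is "features"; both names are assumptions.
    meta = train_df.schema["features"].metadata["ml_attr"]["attrs"]

    # Attributes are grouped by type ("numeric", "binary", "nominal"); flatten
    # them into a list ordered by vector index.
    index_to_name = {
        attr["idx"]: attr["name"]
        for group in meta.values()
        for attr in group
    }
    feature_names = [index_to_name[i] for i in range(len(index_to_name))]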
How This Helps in Model Optimization
Feature Selection: Features with very low importance can be removed to simplify the model and reduce noise.
Regularization Adjustments: If only a few features dominate, the model may be overfitting to specific variables. Increasing the L1 penalty (elasticNetParam=1.0 makes the penalty pure lasso) drives weak coefficients to exactly zero, encouraging feature sparsity (see the sketch after this list).
Domain Insights: This analysis confirms that customer demographics, employment type, and transaction behavior are crucial factors in FD recommendations.
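As a concrete illustration of the regularization adjustment above, here is a minimal sketch of a pure-L1 configuration in PySpark; the column names and parameter values are illustrative assumptions, not tuned settings.

    from pyspark.ml.classification import LogisticRegression

    lr = LogisticRegression(
        featuresCol="features",               # assumed assembler output column
        labelCol="recommendation_fd_target",
        regParam=0.1,                         # overall penalty strength (assumed)
        elasticNetParam=1.0,                  # 1.0 = pure L1 (lasso) penalty
    )
    lr_model = lr.fit(train_df)               # train_df: hypothetical training data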
Next Steps for Model Improvement
1. Evaluate Model Generalization:
   - A near-perfect score (e.g., ROC AUC = 1.0) often signals overfitting; testing on genuinely new data will confirm this.
2. Feature Pruning:
   - Remove low-importance categorical features if they contribute little.
   - Focus on the highly important transactional features.
3. Hyperparameter Fine-Tuning:
   - Increase L1 regularization to zero out less significant features.
   - Adjust elasticNetParam to balance feature selection and generalization (a tuning sketch follows this list).
4. Check for Data Leakage:
   - If certain features seem too powerful, they may contain unintended correlations with the target variable.
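Steps 1 and 3 can be combined with Spark's built-in tuning utilities. A sketch, assuming the same hypothetical train_df and column names as above; the grid values are arbitrary starting points:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    lr = LogisticRegression(featuresCol="features",
                            labelCol="recommendation_fd_target")

    grid = (
        ParamGridBuilder()
        .addGrid(lr.regParam, [0.01, 0.1, 0.5])
        .addGrid(lr.elasticNetParam, [0.0, 0.5, 1.0])  # ridge -> elastic net -> lasso
        .build()
    )

    evaluator = BinaryClassificationEvaluator(
        labelCol="recommendation_fd_target", metricName="areaUnderROC"
    )

    cv = CrossValidator(estimator=lr, estimatorParamMaps=grid,
                        evaluator=evaluator, numFolds=4)
    cv_model = cv.fit(train_df)
    print(cv_model.avgMetrics)  # mean ROC AUC per parameter combination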
Final Takeaway
The top features in this visualization are strong indicators of how financial behaviors and demographics affect FD recommendations. By refining feature selection and regularization, we can build a more robust and interpretable model.
Class Distribution:
recommendation_fd_target | count
1                        | 13
0                        | 32
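The counts above were presumably produced by grouping on the label column; for reference, a one-line equivalent (df is the hypothetical prepared DataFrame):

    df.groupBy("recommendation_fd_target").count().show()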
Cross-Validation (k-fold, k = 4):
ROC AUC for Fold 1: 0.8846153846153847
ROC AUC for Fold 2: 0.9423076923076923
ROC AUC for Fold 3: 0.9927884615384616
ROC AUC for Fold 4: 0.9951923076923077
LogisticRegressionModel: uid=LogisticRegression_79495f0c73fe, numClasses=2, numFeatures=54
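Spark's CrossValidator reports only metrics averaged across folds, so per-fold scores like those above typically come from a manual loop. A sketch under that assumption, reusing the hypothetical df and column names:

    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    folds = df.randomSplit([0.25, 0.25, 0.25, 0.25], seed=42)  # k = 4
    evaluator = BinaryClassificationEvaluator(
        labelCol="recommendation_fd_target", metricName="areaUnderROC"
    )

    for i, test_fold in enumerate(folds, start=1):
        # Train on the union of the remaining folds, evaluate on the held-out one.
        train_df = None
        for j, fold in enumerate(folds, start=1):
            if j != i:
                train_df = fold if train_df is None else train_df.union(fold)
        model = LogisticRegression(
            featuresCol="features", labelCol="recommendation_fd_target"
        ).fit(train_df)
        auc = evaluator.evaluate(model.transform(test_fold))
        print(f"ROC AUC for Fold {i}: {auc}")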
Correlation matrix