
Prepared by Group 5

Bankruptcy
Prevention
Team Members

Kaushik Sonnakula Sujesh Dash Sasi Anuhya Karlapudi

Likith Tikkireddy Malavika.A.S Kruteeka Samal


Introduction

Bankruptcy is a legal state that occurs when an individual or a company is no longer able to meet financial obligations to creditors. It typically arises when liabilities exceed assets, making it impossible to repay debts. Bankruptcy prediction is therefore a crucial task in finance and risk management: it allows early detection of financial distress, minimizes losses for stakeholders, and gives organizations the opportunity to restructure or take preventive action.
Why Predict Bankruptcy?

Predicting bankruptcy is essential because it enables early identification of companies or individuals facing financial distress. This early warning helps banks, investors, and regulatory bodies minimize losses, reduce financial risk, and take preventive measures such as restructuring debts or improving financial strategy. For businesses, it ensures stability, builds trust with stakeholders, and supports long-term growth. By predicting bankruptcy in advance, organizations can protect both their own financial health and the wider economy.
Dataset Overview
Dataset: Copy of bankruptcy-prevention.csv (250 records)
Objective: Predict whether a company will go bankrupt or not
based on risk factors.
Target Variable: class (0 → Non-bankrupt, 1 → Bankrupt)
Features:
Industrial Risk
Management Risk
Financial Flexibility
Credibility
Competitiveness
Operating Risk
Data Preprocessing
The same 250-record dataset (Copy of bankruptcy-prevention.csv) was used. The six risk features listed above serve as the model inputs, and the binary class column (0 → Non-bankrupt, 1 → Bankrupt) is the target.
Exploratory Data Analysis (EDA)
Class Distribution:
The target variable is class, which indicates whether a company is bankrupt (1) or non-bankrupt (0).
Class distribution simply means how many samples belong to each class. In this dataset the split is 143 non-bankrupt to 107 bankrupt companies, so the imbalance is mild.
Bankruptcy datasets are often heavily imbalanced (far more non-bankrupt companies than bankrupt ones).
Imbalanced data can bias models toward predicting the majority class (non-bankrupt), so techniques like feature selection and ensemble methods, together with proper evaluation metrics (AUC, Precision, Recall, F1-score), are important.
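As a quick sanity check, the class balance and the majority-class baseline can be computed in a few lines (a minimal sketch; the 143/107 counts come from the model-output slides):

```python
from collections import Counter

# Class labels as reported on the model slides:
# 143 non-bankrupt (0) and 107 bankrupt (1) companies.
labels = [0] * 143 + [1] * 107

counts = Counter(labels)
majority_share = counts.most_common(1)[0][1] / len(labels)

print(counts)                                            # class frequencies
print(f"majority-class baseline accuracy: {majority_share:.3f}")
```

A classifier must clearly beat this 0.572 baseline (always predicting "non-bankrupt") before its accuracy means anything.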
Exploratory Data Analysis (EDA)
Feature Correlation:
Feature correlation tells us how strongly two variables are related. Values range between -1 and +1:
+1 → strong positive correlation (both increase together).
-1 → strong negative correlation (one increases while the other decreases).
0 → no correlation (independent).
Key observations (based on the correlation heatmap and Random Forest feature importance in our code):
industrial_risk and operating_risk showed a moderate positive correlation.
management_risk and credibility were weakly negatively correlated.
financial_flexibility had low correlation with most other features, meaning it provided unique information.
No pair of features had very high (> 0.9) correlation, so multicollinearity was not a significant issue.
Feature importance (from Random Forest) confirmed that industrial_risk, financial_flexibility, and management_risk were the most significant predictors of bankruptcy.
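A sketch of how such a correlation matrix and Random Forest importance ranking can be produced. Random placeholder data stands in for the real CSV (which is not reproduced here); only the column names follow the feature list above:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

cols = ["industrial_risk", "management_risk", "financial_flexibility",
        "credibility", "competitiveness", "operating_risk"]

# Placeholder data standing in for the 250-row dataset.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.random((250, 6)), columns=cols)
y = rng.integers(0, 2, size=250)

corr = X.corr()  # pairwise Pearson correlations, each in [-1, +1]
rf = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)
importance = pd.Series(rf.feature_importances_, index=cols)

print(corr.round(2))
print(importance.sort_values(ascending=False).round(3))
```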
EDA Visualizations and Feature Distribution:
Model Building
We implemented 9 classification models plus an ensemble to predict bankruptcy.

Logistic Regression (Using 10-fold Stratified CV)
Simple linear model for binary classification.
Assumes a linear relationship between the features and the target.
Used L2 regularization to avoid overfitting.
✅ Advantages: Easy to interpret; a good baseline model.
❌ Limitations: Not suitable for complex non-linear patterns.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 20 candidates, totalling 200 fits
Best hyperparameters: {'C': 0.01, 'class_weight': None, 'penalty': 'l2'}
Mean Accuracy: 1.0000
Mean F1-Score: 1.0000
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       143
           1       1.00      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 5
Sample Predictions:
   Actual  Predicted  Probability_Class1
0       1          1            0.708022
1       1          1            0.721990
2       1          1            0.681335
3       1          1            0.544603
4       1          1            0.855483
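The CV-plus-grid-search setup above can be sketched as follows. A synthetic dataset stands in for the real one, and the grid mirrors the C / penalty / class_weight choices shown on the slide, not the project's exact grid:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the 250-row, 6-feature dataset.
X, y = make_classification(n_samples=250, n_features=6, random_state=0)

param_grid = {
    "C": [0.01, 0.1, 1, 10],             # inverse regularization strength
    "penalty": ["l2"],                   # L2 regularization, as on the slide
    "class_weight": [None, "balanced"],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      cv=cv, scoring="roc_auc")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print(f"Mean ROC-AUC: {search.best_score_:.4f}")
```

StratifiedKFold keeps the 143:107 class ratio roughly constant in every fold, which is why it is preferred over plain KFold on imbalanced targets.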
Decision Tree (Using 10-fold Stratified CV)
Tree-like structure that splits the data based on feature values.
Captures non-linear relationships.
✅ Advantages: Easy to visualize; interpretable rules.
❌ Limitations: High risk of overfitting if tree depth is not controlled.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 480 candidates, totalling 4800 fits
Best hyperparameters: {'criterion': 'gini', 'max_depth': 3, 'max_features': 'sqrt', 'min_samples_leaf': 1, 'min_samples_split': 2}
Mean Accuracy: 0.9920
Mean F1-Score: 0.9905
Mean ROC-AUC: 0.9921

Classification Report (All Data):
              precision    recall  f1-score   support
           0       0.99      0.99      0.99       143
           1       0.99      0.99      0.99       107
    accuracy                           0.99       250
   macro avg       0.99      0.99      0.99       250
weighted avg       0.99      0.99      0.99       250

Actual vs Predicted values (first 5 rows):
   Actual  Predicted  Probability_Class1
0       1          1                 1.0
1       1          1                 1.0
2       1          1                 1.0
3       1          1                 1.0
4       1          1                 1.0
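The overfitting risk noted above is exactly what the max_depth=3 found by the grid search controls. A small sketch on synthetic (deliberately noisy) data comparing a shallow tree with an unconstrained one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy stand-in dataset (flip_y adds 10% label noise).
X, y = make_classification(n_samples=250, n_features=6, flip_y=0.1,
                           random_state=1)

# Shallow tree, as selected by the grid search on the slide.
shallow = cross_val_score(DecisionTreeClassifier(max_depth=3, random_state=0),
                          X, y, cv=10).mean()
# Unbounded tree: free to memorize noise in the training folds.
deep = cross_val_score(DecisionTreeClassifier(random_state=0),
                       X, y, cv=10).mean()

print(f"CV accuracy, depth=3: {shallow:.3f}  unbounded: {deep:.3f}")
```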
Random Forest (Using 10-fold Stratified CV)
Ensemble of multiple decision trees (bagging technique).
Each tree is trained on a random subset of the data and features.
Final prediction = majority vote (classification).
✅ Advantages: Handles non-linearity; reduces overfitting.
❌ Limitations: Less interpretable than a single tree.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 1152 candidates, totalling 11520 fits
Best hyperparameters: {'criterion': 'entropy', 'max_depth': 5, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 20, 'n_estimators': 200}
Mean Accuracy: 1.0000
Mean F1-Score: 1.0000
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       143
           1       1.00      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 2
Actual vs Predicted values (first 5 rows):
   Actual  Predicted  Probability_Class1
0       1          1            0.995580
1       1          1            0.985940
2       1          1            0.999929
3       1          1            0.999850
4       1          1            0.971016
SVM with RBF Kernel (Using 10-fold Stratified CV)
Creates a hyperplane to separate the classes.
Used the RBF kernel to capture non-linear boundaries.
✅ Advantages: Works well in high-dimensional spaces.
❌ Limitations: Computationally expensive; sensitive to parameter tuning.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 30 candidates, totalling 300 fits
Best hyperparameters: {'C': 1, 'gamma': 'scale'}
Mean Accuracy: 0.9960
Mean F1-Score: 0.9952
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       0.99      1.00      1.00       143
           1       1.00      0.99      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 6
Sample Predictions:
   Actual  Predicted  Probability_Class1
0       1          1            0.999994
1       1          1            0.997130
2       1          1            0.979422
3       1          1            0.985565
4       1          1            0.984048
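Because an RBF-kernel SVM is distance-based, feature scaling should sit inside the cross-validation pipeline. A sketch with the slide's best hyperparameters (C=1, gamma='scale') on a synthetic stand-in dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 250-row, 6-feature dataset.
X, y = make_classification(n_samples=250, n_features=6, random_state=2)

# The scaler is re-fitted inside each CV fold, preventing data leakage
# from the validation fold into the training fold.
model = make_pipeline(StandardScaler(), SVC(C=1, gamma="scale"))
scores = cross_val_score(model, X, y, cv=10, scoring="roc_auc")

print(f"Mean ROC-AUC: {scores.mean():.4f}")
```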
K-Nearest Neighbors (Using 10-fold Stratified CV)
Classifies a sample based on the majority class among its k nearest neighbors.
A distance metric measures closeness; here the grid search selected the Chebyshev metric over the default Euclidean distance.
✅ Advantages: Simple; no explicit training phase.
❌ Limitations: Sensitive to noise; computationally heavy on large datasets.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 720 candidates, totalling 7200 fits
Best hyperparameters: {'leaf_size': 10, 'metric': 'chebyshev', 'n_neighbors': 5, 'p': 1, 'weights': 'distance'}
Mean Accuracy: 0.9960
Mean F1-Score: 0.9957
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      0.99      1.00       143
           1       0.99      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 7
Sample Predictions:
   Actual  Predicted  Probability_Class1
0       1          1                 1.0
1       1          1                 1.0
2       1          1                 1.0
3       1          1                 1.0
4       1          1                 1.0
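The Chebyshev metric picked by the grid search is the largest per-coordinate difference, which can differ noticeably from Euclidean distance; a small illustration, together with the tuned KNN configuration from the slide:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

a = np.array([0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0])

chebyshev = np.max(np.abs(a - b))           # max coordinate gap -> 1.0
euclidean = np.sqrt(np.sum((a - b) ** 2))   # straight-line distance -> sqrt(2)

# The tuned configuration reported on the slide.
knn = KNeighborsClassifier(n_neighbors=5, metric="chebyshev",
                           weights="distance", leaf_size=10)

print(chebyshev, round(float(euclidean), 3))
```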
XGBoost (Extreme Gradient Boosting) (Using 10-fold Stratified CV)
Boosting algorithm that builds trees sequentially, each correcting the errors of the previous ones.
Optimized for speed and performance.
✅ Advantages: High accuracy; handles missing values well.
❌ Limitations: More complex tuning required.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
scale_pos_weight: 1.3364485981308412
Fitting 10 folds for each of 432 candidates, totalling 4320 fits
Best hyperparameters: {'colsample_bytree': 0.8, 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'reg_alpha': 0, 'reg_lambda': 1, 'subsample': 0.8}
Mean Accuracy: 0.9960
Mean F1-Score: 0.9952
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      0.99      1.00       143
           1       0.99      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 8
Sample Predictions:
   Actual  Predicted  Probability_Class1
0       1          1            0.966506
1       1          1            0.966915
2       1          1            0.966268
3       1          1            0.960596
4       1          1            0.966506
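The scale_pos_weight value printed above is simply the negative-to-positive class ratio, which XGBoost uses to upweight the minority (bankrupt) class; reproduced here as a plain calculation:

```python
# Ratio of negative (non-bankrupt) to positive (bankrupt) samples,
# matching the scale_pos_weight value printed on the slide (1.3364...).
n_negative, n_positive = 143, 107
scale_pos_weight = n_negative / n_positive

print(f"scale_pos_weight = {scale_pos_weight:.6f}")  # 1.336449
```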
LightGBM (Using 10-fold Stratified CV)
Gradient boosting framework using a leaf-wise tree growth strategy.
Faster than XGBoost on large datasets.
✅ Advantages: Very efficient; high accuracy.
❌ Limitations: May overfit small datasets if not tuned properly.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 1296 candidates, totalling 12960 fits
Mean Accuracy: 0.9960
Mean F1-Score: 0.9952
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      0.99      1.00       143
           1       0.99      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 3
Actual vs Predicted values (first 5 rows):
   Actual  Predicted  Probability_Class1
0       1          1            0.932579
1       1          1            0.932647
2       1          1            0.932544
3       1          1            0.932544
4       1          1            0.932330
CatBoost (Using 10-fold Stratified CV)
Gradient boosting library optimized for categorical data.
Requires less preprocessing (handles categorical values automatically).
✅ Advantages: Strong performance with default parameters.
❌ Limitations: Training can be slower than LightGBM.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
Fitting 10 folds for each of 144 candidates, totalling 1440 fits
Best hyperparameters: {'depth': 4, 'iterations': 300, 'l2_leaf_reg': 3, 'learning_rate': 0.01}
Mean Accuracy: 0.9960
Mean F1-Score: 0.9952
Mean ROC-AUC: 1.0000

Classification Report (All Data):
              precision    recall  f1-score   support
           0       1.00      0.99      1.00       143
           1       0.99      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 4
Sample Predictions:
   Actual  Predicted  Probability_Class1
0       1          1            0.997050
1       1          1            0.996658
2       1          1            0.996147
3       1          1            0.994708
4       1          1            0.997146
Artificial Neural Network (ANN) (Using 10-fold Stratified CV)
Inspired by the structure of the human brain, with input, hidden, and output layers.
Captures complex non-linear patterns.
✅ Advantages: Very powerful for pattern recognition.
❌ Limitations: Requires more data and careful tuning to avoid overfitting.

Class distribution:
class
0    143
1    107
Name: count, dtype: int64
--- Overall Metrics ---
Mean Accuracy: 0.9960
Mean F1-Score: 0.9952
Mean ROC-AUC: 1.0000

Classification Report:
              precision    recall  f1-score   support
           0       0.99      1.00      1.00       143
           1       1.00      0.99      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Predictions saved for model 9
   Actual  Predicted  Probability_Class1
0       1          1                 1.0
1       1          1                 1.0
2       1          1                 1.0
3       1          1                 1.0
4       1          1                 1.0
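A minimal sketch of such a network using scikit-learn's MLPClassifier, on a synthetic stand-in dataset; the hidden-layer size and iteration count are illustrative choices, not taken from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 250-row, 6-feature dataset.
X, y = make_classification(n_samples=250, n_features=6, random_state=3)

# Input layer (6 features) -> one hidden layer of 16 units -> output layer.
# Scaling the inputs helps the gradient-based solver converge.
ann = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
ann.fit(X, y)

print(f"Training accuracy: {ann.score(X, y):.3f}")
```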
Ensemble of All 9 Models (for the most stable, accurate predictions while avoiding overfitting)
Combined the predictions from all 9 models using a weighted-average ensemble.
Weights were assigned based on each model's AUC score.
The ensemble prediction was more stable and accurate than any individual model.

Ensemble Metrics:
Accuracy: 1.0000
F1-Score: 1.0000
ROC-AUC: 1.0000

Classification Report:
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       143
           1       1.00      1.00      1.00       107
    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250
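The weighted-average ensemble described above can be sketched as follows; the per-model probabilities and AUC weights here are illustrative placeholders, not the project's actual numbers:

```python
import numpy as np

# Rows: samples; columns: each model's predicted P(class = 1).
probas = np.array([
    [0.71, 0.99, 0.97],   # a clearly bankrupt-looking sample
    [0.30, 0.10, 0.20],   # a clearly non-bankrupt-looking sample
])
# One AUC score per model, used as that model's weight.
aucs = np.array([0.99, 1.00, 0.98])

weights = aucs / aucs.sum()          # normalize so the weights sum to 1
ensemble_proba = probas @ weights    # weighted average per sample
prediction = (ensemble_proba >= 0.5).astype(int)

print(ensemble_proba.round(3), prediction)
```

Averaging probabilities smooths out the individual models' errors, which is why the combined prediction is more stable than any single model's.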
Thank you
