
Source: Google cache snapshot (22 Sep 2023) of https://towardsdatascience.com/developing-scorecards-in-python-using-optbinning-ab9a205e1f69


Developing Scorecards in Python using OptBinning


Create industry level Scorecards with just a few lines of code
Gabriel dos Santos Gonçalves
Published in Towards Data Science · 7 min read · Mar 16, 2021

Photo by Avery Evans on Unsplash

1. Introduction
Scorecards are risk models used by lending businesses to evaluate customers trying to access credit. A well-developed scorecard
brings a lot of value to financial institutions and is essential for making decisions in terms of credit policy. Even though the
mathematics and logic behind a scorecard are not complex, developing a well-performing model can be hard, as it takes a lot of time
to organize and process data.

The traditional approach is to separate the variables into numerical and categorical and apply a binning approach to group values that
show a similar relationship with the target (usually binary), using the Weight of Evidence of each value. This binning process can
be time-consuming and imperfect, as some decisions about whether to merge bins can be judgmental and influenced by the scorecard
developer. That’s one of the reasons banks and other institutions can take several months to develop or re-train scorecard models.
2. OptBinning to the rescue!
OptBinning tries to fill the gap between reliability in binning features and scorecard development, and flexibility in terms of having a
library written in Python (a widely used language for data analytics).

“OptBinning is a library written in Python implementing a rigorous and flexible mathematical programming formulation for solving
the optimal binning problem for a binary, continuous or multiclass target type, incorporating constraints not previously addressed”.

More than just offering a powerful method for performing binning, OptBinning also provides a wide variety of tools needed to select
features, create scorecards, and visualize the performance during the development process.

OptBinning uses Scikit-Learn BaseEstimator as the structure of its binning classes, making it intuitive to use, with fit and transform
methods, just like any Scikit-Learn estimator.

3. The logic behind Optimal binning


Binning is the process of dividing the values of a continuous variable into groups that share a similar behavior with respect to a
characteristic. This technique of discretizing values into buckets is extremely valuable for understanding the relationship between the
feature and the target. Binning is an essential step in Scorecard development, as each bin is associated with a Scorecard value,
helping bring explainability to the model.

“From a modeling perspective, the binning technique may address prevalent data issues such as the handling of missing values, the
presence of outliers and statistical noise, and data scaling.”

— Optimal binning: mathematical programming formulation, Navas-Palencia G.

There are many available techniques for performing binning, and although some can be successfully implemented, there is no
guarantee that they reach the optimal bins. The optimal binning of a variable is the process of discretizing its samples into
groups that satisfy a specific constraint while optimizing a divergence (or performance) metric. This constraint can be a
specific number of bins or a minimum number of samples per bin.

OptBinning offers an efficient implementation of the optimal binning process, giving you control over parameters and constraints.

4. Understanding OptBinning Classes


OptBinning has 3 main class types, hierarchically related, that perform all the processing needed to bin your features and create a
Scorecard. The sections below offer a brief description of how these classes are structured. For more detail, please refer to the
official OptBinning documentation.

4.1. OptimalBinning, ContinuousOptimalBinning, and MulticlassOptimalBinning


OptimalBinning is the base class for performing binning of a feature with a binary target. For continuous or multiclass targets, two
other classes are available: ContinuousOptimalBinning and MulticlassOptimalBinning .

As mentioned before, these 3 classes are built following the sklearn.base.BaseEstimator structure, with fit and transform methods.
Binning a feature using these classes is as simple as the code below:
# 1) Define your feature and target arrays
X = df_train['feat_name']
y = df_train['target']

# 2) Instantiate the class and fit to the train dataset
optb = OptimalBinning(name='feat_name', dtype="numerical")
optb.fit(X, y)

# 3) Perform the binning of a dataset
X_binned = optb.transform(X)

# 4) Visualize the results table and plot
optb.binning_table.build()
optb.binning_table.plot(metric="woe")

By default, the binning classes return the Weight of Evidence value for the respective bin category. More parameters besides the
feature name and data type (numerical or categorical) are available, offering a considerable level of customization for this process.

4.2. BinningProcess
The class BinningProcess is built to perform optimal binning over a whole dataset, not just one feature as exemplified
in the section above.
So the best way to view BinningProcess is as a wrapper around OptimalBinning . The usage is fairly simple, with just a few parameters
needed for performing the binning of a full dataset.
# 1) Define list of features and categorical ones
list_features = df_train.drop(columns=['TARGET']).columns.values
list_categorical = df_train.select_dtypes(include=['object', 'category']).columns.values

# 2) Instantiate BinningProcess
binning_process = BinningProcess(
    categorical_variables=list_categorical,
    variable_names=list_features)

# 3) Fit and transform dataset
df_train_binned = binning_process.fit_transform(df_train, y)

4.3. ScoreCard
The class ScoreCard offers the possibility of combining the binned dataset generated from a BinningProcess with a linear estimator
from Scikit-Learn to generate a production-ready Scorecard.
# 1) Define a linear estimator (model)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

# 2) Instantiate a Scorecard and fit to the dataset
scaling_method = "min_max"
scaling_method_data = {"min": 0, "max": 1000}
scorecard = Scorecard(
    target='TARGET',
    binning_process=binning_process,
    estimator=logreg,
    scaling_method=scaling_method,
    scaling_method_params=scaling_method_data,
    intercept_based=False,
    reverse_scorecard=True,
)
scorecard.fit(df_application_train)

So with just a few lines of code, you create a Scorecard model ready to be tested and put into production! The tutorial on how to create
and validate a Scorecard is detailed in the next section.

Figure 1 summarizes the relationship of classes that are part of OptBinning.

Figure 1. Optbinning hierarchy of classes.

5. Tutorial: Creating a Scorecard with OptBinning


To illustrate the process of creating a production-ready Scorecard with OptBinning, we are going to use Kaggle’s Home Credit
Default Risk dataset. You can find the Jupyter Notebook with the code in the tutorial repository.

5.1. Loading the dataset


After downloading the dataset files from Kaggle’s page and extracting the folder, you’ll end up with a few CSV files. These files are
part of Kaggle’s challenge description, with information about features and tables. We are going to use the application_train.csv file to
demonstrate OptBinning. After loading the dataset as a Pandas DataFrame, we set the column SK_ID_CURR as the index and
split the dataset into train and test.
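The notebook code is embedded as a gist in the original article and not reproduced here; a minimal reconstruction of this loading step (the split ratio and random seed are assumptions) could look like:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_split(path, test_size=0.3, seed=42):
    """Load application_train.csv, index it by SK_ID_CURR, and split it
    into train/test sets stratified on the binary TARGET column."""
    df = pd.read_csv(path).set_index("SK_ID_CURR")
    return train_test_split(df, test_size=test_size,
                            random_state=seed, stratify=df["TARGET"])

# df_train, df_test = load_and_split("application_train.csv")
```

Stratifying on TARGET keeps the default rate comparable between the train and test partitions, which matters on a dataset this imbalanced.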

5.2. Exploring Features and BinningProcess


Feature engineering is one of the most important steps in any model development, and Scorecards are no exception. As our focus is
on demonstrating OptBinning usage for Scorecard development, we won’t explore many of the possibilities for engineering features from
the dataset. Our approach is to separate categorical from numerical features and define them when instantiating
BinningProcess, as the optimal binning process deals differently with these types of features. Another parameter we need to set at this
stage is selection_criteria, the constraints used to define the optimal bins.

5.3. Choosing a linear estimator


One great feature of OptBinning is the flexibility to choose any linear model (estimator) from Scikit-Learn to use in your
Scorecard. By tuning the parameters of the linear estimator, you can increase the performance of your Scorecard. We are going to use
Logistic Regression to illustrate the usage, but feel free to explore other estimators.
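For instance, a hypothetically tuned estimator (these parameter values are illustrative, not the article's choices) might look like:

```python
from sklearn.linear_model import LogisticRegression

# Regularization strength and class weighting often matter on
# imbalanced credit datasets; values here are illustrative only
logreg = LogisticRegression(C=0.5, class_weight="balanced",
                            max_iter=1000, solver="lbfgs")
```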

5.4. Creating the Scorecard


After instantiating a BinningProcess and a linear estimator, you need to specify the scaling parameters and pass them to
your Scorecard instance. Next, you run the fit method on your dataset, and that’s it! After a few seconds, your Scorecard is
ready for performance validation.

5.5. Visualizing and validating the Scorecard


OptBinning offers a wide variety of methods for you to visualize and evaluate your Scorecard. You can access the Scorecard table
with metrics for each binned feature and save it as CSV to document the model development.

Below you can see part of the Scorecard table:

Finally, you can visualize your Scorecard’s performance using functions from the optbinning.scorecard.plots module.
Figure. KS and ROC-AUC plots for the trained Scorecard model.

5.6. Using Scorecard in production


One of OptBinning's most valuable features is how easily it can be put into production. You can save your Scorecard model with
pickle, store it, and use it in production. To perform predictions, you simply unpickle the Scorecard object and score samples
from a Pandas DataFrame with the features you used to develop your model.

6. Closing remarks
For many years, the development of scorecards was confined to large financial institutions that had the money to acquire expensive
software tools like SAS. We have demonstrated OptBinning’s power and versatility for binning variables and creating scorecards.

Combining Optbinning with Python’s libraries for Data Science and Analytics (Pandas, Numpy, Matplotlib, Scikit-Learn) could offer
all the tools needed to develop industry-level Scorecard models.

This could be a game-changer for small businesses and fintechs, as all the mentioned libraries are open-source, meaning the only
investment these companies would need to make is in human resources.

Thanks a lot for reading my article!



Resources
For more information about OptBinning, check out the project GitHub page and documentation page. For information about the logic
and mathematics behind optimal binning, you can find the description in the article “Optimal binning: mathematical programming
formulation” by Guillermo Navas-Palencia.

- guillermo-navas-palencia/optbinning — OptBinning GitHub repository (github.com)
- “Optimal binning: mathematical programming formulation”, Guillermo Navas-Palencia (arxiv.org)
- OptBinning: The Python Optimal Binning library — documentation (gnpalencia.org)
- GabrielSGoncalves/optbinning_tutorial — tutorial repository (github.com)