
Source: Google cache snapshot (22 Sep 2023) of https://towardsdatascience.com/developing-scorecards-in-python-using-optbinning-ab9a205e1f69


Developing Scorecards in Python using OptBinning


Create industry level Scorecards with just a few lines of code
Gabriel dos Santos Gonçalves
Published in Towards Data Science · 7 min read · Mar 16, 2021

Photo by Avery Evans on Unsplash

1. Introduction
Scorecards are risk models used by lending businesses to evaluate customers trying to access credit. A well-developed scorecard
brings a lot of value to financial institutions and is essential for making decisions in terms of credit policy. Even though the
mathematics and logic behind a scorecard are not complex, developing a well-performing model can be hard, as it takes a lot of time
to organize and process data.

The traditional approach is to separate the variables into numerical and categorical and apply a binning approach to group values that
show a similar relationship with the target (usually binary), using the Weight of Evidence of each value. This binning process can
be time-consuming and imperfect, as some decisions about whether to merge bins can be judgmental and influenced by the scorecard
developer. That’s one of the reasons banks and other institutions can take several months to develop or re-train scorecard models.
2. OptBinning to the rescue!
OptBinning tries to fill the gap between reliability in binning features and scorecard development, and flexibility in terms of having a
library written in Python (a widely used language for data analytics).

“OptBinning is a library written in Python implementing a rigorous and flexible mathematical programming formulation for solving
the optimal binning problem for a binary, continuous or multiclass target type, incorporating constraints not previously addressed”.

More than just offering a powerful method for performing binning, OptBinning also provides a wide variety of tools needed to select
features, create scorecards, and visualize the performance during the development process.

OptBinning uses Scikit-Learn BaseEstimator as the structure of its binning classes, making it intuitive to use, with fit and transform
methods, just like any Scikit-Learn estimator.

3. The logic behind Optimal binning


Binning is the process of dividing the values of a continuous variable into groups that share a similar behavior with respect to a
characteristic. This technique of discretizing values into buckets is extremely valuable for understanding the relationship between the
feature and the target. Binning is an essential step in Scorecard development, as each bin is associated with a Scorecard value,
helping bring explainability to the model.

“From a modeling perspective, the binning technique may address prevalent data issues such as the handling of missing values, the
presence of outliers and statistical noise, and data scaling.”

— Optimal binning: mathematical programming formulation, Navas-Palencia G.

There are many available techniques for performing binning, and although some can be successfully implemented, there is no
guarantee that they reach the optimal bins. The optimal binning of a variable is the process of discretizing its samples into
groups that satisfy a specific constraint while optimizing a divergence (or performance) metric. This constraint can be a
specific number of bins or a minimum number of samples per bin.

OptBinning offers an efficient implementation of the optimal binning process, giving you control over parameters and constraints.

4. Understanding OptBinning Classes


OptBinning has 3 main class types, hierarchically related, that perform all the processing needed to bin your features and create a
Scorecard. The sections below offer a brief description of how these classes are structured. For more detail, please refer to the
official OptBinning documentation.

4.1. OptimalBinning, ContinuousOptimalBinning, and MulticlassOptimalBinning


OptimalBinning is the base class for performing binning of a feature with a binary target. For continuous or multiclass targets, two
other classes are available: ContinuousOptimalBinning and MulticlassOptimalBinning .

As mentioned before, these 3 classes are built following the sklearn.base.BaseEstimator structure, with fit and transform methods.
Binning a feature using these classes is as simple as the code below:
# 1) Define your feature and target arrays
X = df_train['feat_name']
y = df_train['target']

# 2) Instantiate the class and fit to the train dataset
optb = OptimalBinning(name='feat_name', dtype="numerical")
optb.fit(X, y)

# 3) Perform the binning of a dataset
X_binned = optb.transform(X)

# 4) Visualize the results table and plot
optb.binning_table.build()
optb.binning_table.plot(metric="woe")

By default, the binning classes return the Weight of Evidence value for the respective bin category. More parameters besides the
feature name and data type (numerical or categorical) are available, offering a considerable level of customization for this process.

4.2. BinningProcess
The class BinningProcess is built to perform optimal binning over a whole dataset, not just one feature as exemplified
in the section above.
So the best way to view BinningProcess is as a wrapper around OptimalBinning . The usage is fairly simple, with just a few parameters
needed for performing the binning of a full dataset.
# 1) Define list of features and categorical ones
list_features = df_train.drop(columns=['TARGET']).columns.values
list_categorical = df_train.select_dtypes(include=['object', 'category']).columns.values

# 2) Instantiate BinningProcess
binning_process = BinningProcess(
    categorical_variables=list_categorical,
    variable_names=list_features)

# 3) Fit and transform dataset
df_train_binned = binning_process.fit_transform(df_train, y)

4.3. ScoreCard
The class ScoreCard offers the possibility of combining the binned dataset generated from a BinningProcess with a linear estimator
from Scikit-Learn to generate a production-ready Scorecard.
# 1) Define a linear estimator (model)
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

# 2) Instantiate a Scorecard and fit to the dataset
scaling_method = "min_max"
scaling_method_data = {"min": 0, "max": 1000}
scorecard = Scorecard(
    target='TARGET',
    binning_process=binning_process,
    estimator=logreg,
    scaling_method=scaling_method,
    scaling_method_params=scaling_method_data,
    intercept_based=False,
    reverse_scorecard=True,
)
scorecard.fit(df_application_train)

So with just a few lines of code, you create a Scorecard model ready to be tested and put into production! The tutorial on how to create
and validate a Scorecard is detailed in the next section.

Figure 1 summarizes the relationship of classes that are part of OptBinning.

Figure 1. Optbinning hierarchy of classes.

5. Tutorial: Creating a Scorecard with OptBinning


To illustrate the process of creating a production-ready Scorecard with OptBinning, we are going to use Kaggle’s Home Credit
Default Risk dataset. You can find the Jupyter Notebook with the code in the tutorial repository.

5.1. Loading the dataset


After downloading the dataset files from Kaggle’s page and extracting the folder, you’ll end up with a few CSV files. These files are
part of Kaggle’s challenge description, with information about features and tables. We are going to use the application_train.csv file to
demonstrate OptBinning. After loading the dataset as a Pandas DataFrame, we set the column SK_ID_CURR as the index and
split the dataset into train and test.
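The notebook code is embedded as a gist in the original article and not reproduced here; a minimal reconstruction of this loading step (the split ratio and random seed are assumptions) could look like:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def load_and_split(path, test_size=0.3, seed=42):
    """Load application_train.csv, index it by SK_ID_CURR, and split it
    into train/test sets stratified on the binary TARGET column."""
    df = pd.read_csv(path).set_index("SK_ID_CURR")
    return train_test_split(df, test_size=test_size,
                            random_state=seed, stratify=df["TARGET"])

# df_train, df_test = load_and_split("application_train.csv")
```

Stratifying on TARGET keeps the default rate comparable between the train and test partitions, which matters on a dataset this imbalanced.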

5.2. Exploring Features and BinningProcess


Feature engineering is one of the most important steps in any model development, and Scorecards are no exception. As our focus is
on demonstrating OptBinning usage for Scorecard development, we won’t explore many of the possibilities for engineering features from
the dataset. Our approach is to separate categorical from numerical features and define them when instantiating
BinningProcess, as the optimal binning process deals differently with these types of features. Another parameter we need to set at this
stage is selection_criteria, the constraints used to define the optimal bins.

5.3. Choosing a linear estimator


One great feature of OptBinning is the flexibility to choose any linear model (estimator) from Scikit-Learn to use in your
Scorecard. By tuning the parameters of the linear estimator, you can increase the performance of your Scorecard. We are going to use
Logistic Regression to illustrate the usage, but feel free to explore other estimators.
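For instance, a hypothetically tuned estimator (these parameter values are illustrative, not the article's choices) might look like:

```python
from sklearn.linear_model import LogisticRegression

# Regularization strength and class weighting often matter on
# imbalanced credit datasets; values here are illustrative only
logreg = LogisticRegression(C=0.5, class_weight="balanced",
                            max_iter=1000, solver="lbfgs")
```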

5.4. Creating the Scorecard


After instantiating a BinningProcess and a linear estimator, you need to specify the scaling parameters and pass them to
your Scorecard instance. Next, you run the fit method on your dataset, and that’s it! After a few seconds, your Scorecard is
ready for performance validation.

5.5. Visualizing and validating the Scorecard


OptBinning offers a wide variety of methods for you to visualize and evaluate your Scorecard. You can access the Scorecard table
with metrics for each binned feature and save it as CSV to document the model development.

Below you can see part of the Scorecard table:

Finally, you can visualize your Scorecard’s performance using functions from the optbinning.scorecard.plots module.
Figure. KS and ROC-AUC plots for the trained Scorecard model.

5.6. Using Scorecard in production


One of OptBinning's most valuable features is how easily it can be put into production. You can save your Scorecard model with
pickle, store it, and use it in production. To perform predictions, you simply unpickle the Scorecard object and score samples
from a Pandas DataFrame with the features you used to develop your model.

6. Closing remarks
For many years, the development of scorecards was confined to large financial institutions that had the money to acquire expensive
software tools like SAS. We have demonstrated OptBinning’s power and versatility for binning variables and creating scorecards.

Combining Optbinning with Python’s libraries for Data Science and Analytics (Pandas, Numpy, Matplotlib, Scikit-Learn) could offer
all the tools needed to develop industry-level Scorecard models.

This could be a game-changer for small businesses and fintechs, as all the mentioned libraries are open-source, meaning the only
investment these companies would need to make is in human resources.

Thanks a lot for reading my article!



Resources
For more information about OptBinning, check out the project GitHub page and documentation page. For information about the logic
and mathematics behind optimal binning, you can find the description in the article “Optimal binning: mathematical programming
formulation” by Guillermo Navas-Palencia.

- guillermo-navas-palencia/optbinning — OptBinning GitHub repository (github.com)
- “Optimal binning: mathematical programming formulation”, Guillermo Navas-Palencia (arxiv.org)
- OptBinning: The Python Optimal Binning library — documentation (gnpalencia.org)
- GabrielSGoncalves/optbinning_tutorial — tutorial repository (github.com)