Module 10 – Part III
Advanced Boosting models
Prof. Pedram Jahangiry
Decision Trees: Fundamental Questions
• Four fundamental questions to be answered:
1) Which feature and cut-off to start with?
2) How to split the samples?
3) How to grow a tree?
4) How to combine trees?
Which feature and cut-off to start with?
• Which feature and cut-off adds the most information gain (minimum impurity)?
• Regression trees: MSE
• Classification trees use one of the following impurity measures, which control how the decision tree decides to split the data (see the sketch after this list):
1. Error rate
2. Entropy
3. Gini Index
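As a rough illustration (not from the slides), the sketch below computes the three classification impurity measures for a node and the resulting information gain of a candidate split; the function and toy labels are hypothetical, and a regression tree would use MSE in the same way.

```python
import numpy as np

def impurity(labels, criterion="gini"):
    """Illustrative node impurity: error rate, entropy, or Gini index."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                      # class proportions
    if criterion == "error":                       # misclassification error rate
        return 1.0 - p.max()
    if criterion == "entropy":                     # Shannon entropy
        return -np.sum(p * np.log2(p))
    return 1.0 - np.sum(p ** 2)                    # Gini index

# Information gain of a candidate split = parent impurity minus the
# weighted average impurity of the two child nodes.
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([1, 1])
for c in ("error", "entropy", "gini"):
    gain = impurity(parent, c) - (
        len(left) / len(parent) * impurity(left, c)
        + len(right) / len(parent) * impurity(right, c)
    )
    print(c, round(float(gain), 3))
```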
How to split the samples?
Pre-sorted and histogram-based: sorts the data and creates histograms of the values before splitting the tree. This allows for faster splits but can result in less accurate trees.
GOSS (Gradient-based One-Side Sampling): uses gradient information as a measure of the weight of a sample for splitting. Keeps instances with large gradients while performing random sampling on instances with small gradients (see the sketch below).
Greedy method: selects the best split at each step without considering the impact on future splits. May result in suboptimal trees.
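The GOSS sketch below is a simplified illustration of the idea, not LightGBM's actual implementation; the sampling rates a and b and the function name are assumptions made for the example.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    """Illustrative Gradient-based One-Side Sampling.

    Keep the top `a` fraction of instances by |gradient| and randomly sample
    a `b` fraction of the remainder, up-weighting the sampled small-gradient
    instances by (1 - a) / b so the gradient statistics stay roughly unbiased.
    """
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # indices sorted by |gradient|, descending
    top_k = int(a * n)
    large = order[:top_k]                      # always keep large-gradient instances
    small = rng.choice(order[top_k:], size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[small] = (1 - a) / b               # compensate for the sub-sampling
    keep = np.concatenate([large, small])
    return keep, weights[keep]

grads = np.random.default_rng(1).normal(size=1_000)
kept, w = goss_sample(grads)
print(len(kept), "of", len(grads), "instances kept")
```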
How to grow a tree?
Depth-wise (level-wise): repeatedly splits the data along the feature with the highest information gain until a certain maximum depth is reached, resulting in a tree with a balanced structure where all leaf nodes are at the same depth.
Leaf-wise: repeatedly splits the data along the feature with the highest information gain until all leaf nodes contain only a single class, resulting in a tree with a highly unbalanced structure where some branches are much deeper than others (see the sketch below).
Symmetric: builds the tree by repeatedly splitting the data along the feature with the highest information gain until a certain stopping criterion is met (e.g. a minimum number of samples per leaf node), resulting in a more balanced tree structure than leaf-wise growth.
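In practice these growth strategies show up as hyperparameters. The sketch below assumes the xgboost and lightgbm Python packages are installed and uses made-up data: depth-wise growth is bounded by a maximum depth, while leaf-wise growth is bounded by the number of leaves.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Depth-wise growth: the tree is bounded by its maximum depth,
# so leaves end up at (roughly) the same level.
depth_wise = xgb.XGBClassifier(max_depth=4, n_estimators=50)

# Leaf-wise growth: the tree is bounded by its number of leaves,
# so some branches can grow much deeper than others.
leaf_wise = lgb.LGBMClassifier(num_leaves=31, n_estimators=50)

depth_wise.fit(X, y)
leaf_wise.fit(X, y)
```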
How to combine trees?
• Bagging consists of creating many “copies” of the training data (each copy slightly different from the others), applying the weak learner to each copy to obtain multiple weak models, and then combining them.
• In bagging, the bootstrapped trees are independent of each other.
• Boosting consists of using the “original” training data and iteratively creating multiple models with a weak learner. Each new model tries to “fix” the errors that the previous models made.
• In boosting, each tree is grown using information from the previously grown trees (see the sketch below).
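As a minimal contrast of the two ways of combining trees, the scikit-learn sketch below (assuming scikit-learn is installed; the dataset is synthetic) fits bagged trees independently on bootstrap copies and boosted trees sequentially on the previous trees' errors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: many independent trees, each fit on a bootstrapped copy of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Boosting: trees are grown sequentially, each one correcting the previous trees' errors.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```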
Evolution of XGBoost
XGBoost: eXtreme Gradient Boosting
• XGBoost is an open-source gradient boosting library developed by Tianqi Chen (2014), focused on efficient and scalable machine learning algorithms.
• “Extreme” refers to the fact that the algorithms and methods have been customized to push the limit of what is possible for gradient boosting algorithms.
• XGBoost includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
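A minimal usage sketch with the xgboost Python package (assuming it is installed; the data and hyperparameter values are illustrative). Missing values are passed in as NaN, since XGBoost learns a default direction for them at each split.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan   # missing values are handled natively

model = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4, subsample=0.8)
model.fit(X, y)
print(model.predict_proba(X[:5]))
```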
LightGBM (Light Gradient Boosted Machine)
• LightGBM is an open-source gradient boosting library developed by Microsoft (2016) that
is fast and efficient, making it suitable for large-scale learning tasks.
• LightGBM can handle categorical features through its built-in categorical feature binning; no one-hot encoding is required, only label/integer encoding (or pandas “category” columns) as preprocessing (see the sketch below).
• LightGBM includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
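A minimal sketch of LightGBM's categorical handling (assuming the lightgbm and pandas packages are installed; the data is made up): columns declared with the pandas "category" dtype are binned internally, so no one-hot encoding is needed.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(size=1000),
    "sector": pd.Categorical(rng.choice(["tech", "energy", "health"], size=1000)),
})
y = (df["size"] > 0).astype(int)

# The "sector" column keeps its category dtype; LightGBM bins it internally
# instead of requiring one-hot encoded dummy columns.
model = lgb.LGBMClassifier(n_estimators=100, num_leaves=31)
model.fit(df, y)
print(model.predict(df.head()))
```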
CatBoost (Category Boosting)
• CatBoost is an open-source gradient boosting library developed by Yandex (2017) that is
specifically designed to handle categorical data.
• CatBoost can handle categorical features directly, without the need for one-hot encoding or
other preprocessing.
• CatBoost includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
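A minimal sketch with the catboost package (assuming it is installed; the data is made up): raw string categories are passed directly and declared through cat_features, with no one-hot or label encoding beforehand.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(size=1000),
    "sector": rng.choice(["tech", "energy", "health"], size=1000),  # raw strings
})
y = (df["size"] > 0).astype(int)

# CatBoost encodes the declared categorical columns internally
# (ordered target statistics), so no preprocessing is needed.
model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(df, y, cat_features=["sector"])
print(model.predict(df.head()))
```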
XGBoost vs LightGBM vs CatBoost
Developer: Tianqi Chen, 2014 (XGBoost); Microsoft, 2016 (LightGBM); Yandex, 2017 (CatBoost)
Base model: decision trees for all three
Tree growing algorithm: depth-wise growth (XGBoost), leaf-wise growth (LightGBM), symmetric growth (CatBoost); leaf-wise growth is also available in XGBoost and CatBoost
Parallel training: single GPU (XGBoost), multiple GPUs (LightGBM and CatBoost)
Handling categorical features: encoding required for XGBoost (one-hot, ordinal, target, label, …); automated encoding using categorical feature binning in LightGBM; no encoding required for CatBoost
Splitting method: pre-sorted and histogram-based (XGBoost), GOSS (gradient-based one-side sampling, LightGBM), greedy method (CatBoost)