Module 10 – Part III
Advanced Boosting models
Prof. Pedram Jahangiry
Decision Trees: Fundamental Questions
• Four fundamental questions to be answered:
1) Which feature and cut-off to start with?
2) How to split the samples?
3) How to grow a tree?
4) How to combine trees?
Which feature and cut-off to start with?
• Which feature and cut-off adds the most information gain (minimum impurity)?
• Regression trees: MSE
• Classification trees use one of the following impurity measures, which control how the decision tree decides to split the data (see the sketch after this list):
1. Error rate
2. Entropy
3. Gini Index
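As a rough illustration (not from the slides), the sketch below computes the three classification impurity measures for a node and the resulting information gain of a candidate split; the function and toy labels are hypothetical, and a regression tree would use MSE in the same way.

```python
import numpy as np

def impurity(labels, criterion="gini"):
    """Illustrative node impurity: error rate, entropy, or Gini index."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                      # class proportions
    if criterion == "error":                       # misclassification error rate
        return 1.0 - p.max()
    if criterion == "entropy":                     # Shannon entropy
        return -np.sum(p * np.log2(p))
    return 1.0 - np.sum(p ** 2)                    # Gini index

# Information gain of a candidate split = parent impurity minus the
# weighted average impurity of the two child nodes.
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([1, 1])
for c in ("error", "entropy", "gini"):
    gain = impurity(parent, c) - (
        len(left) / len(parent) * impurity(left, c)
        + len(right) / len(parent) * impurity(right, c)
    )
    print(c, round(float(gain), 3))
```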
How to split the samples?
Pre-sorted and histogram-based: sorts the data and creates histograms of the values before splitting the tree. This allows for faster splits but can result in less accurate trees.
GOSS (Gradient-based One-Side Sampling): uses gradient information as a measure of the weight of a sample for splitting. Keeps instances with large gradients while performing random sampling on instances with small gradients (see the sketch below).
Greedy method: selects the best split at each step without considering the impact on future splits. May result in suboptimal trees.
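The GOSS sketch below is a simplified illustration of the idea, not LightGBM's actual implementation; the sampling rates a and b and the function name are assumptions made for the example.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=np.random.default_rng(0)):
    """Illustrative Gradient-based One-Side Sampling.

    Keep the top `a` fraction of instances by |gradient| and randomly sample
    a `b` fraction of the remainder, up-weighting the sampled small-gradient
    instances by (1 - a) / b so the gradient statistics stay roughly unbiased.
    """
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))     # indices sorted by |gradient|, descending
    top_k = int(a * n)
    large = order[:top_k]                      # always keep large-gradient instances
    small = rng.choice(order[top_k:], size=int(b * n), replace=False)
    weights = np.ones(n)
    weights[small] = (1 - a) / b               # compensate for the sub-sampling
    keep = np.concatenate([large, small])
    return keep, weights[keep]

grads = np.random.default_rng(1).normal(size=1_000)
kept, w = goss_sample(grads)
print(len(kept), "of", len(grads), "instances kept")
```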
How to grow a tree?
Depth-wise (level-wise): repeatedly splits the data along the feature with the highest information gain until a certain maximum depth is reached, resulting in a tree with a balanced structure where all leaf nodes are at the same depth.
Leaf-wise: repeatedly splits the data along the feature with the highest information gain until all leaf nodes contain only a single class, resulting in a tree with a highly unbalanced structure where some branches are much deeper than others (see the sketch below).
Symmetric: builds the tree by repeatedly splitting the data along the feature with the highest information gain until a certain stopping criterion is met (e.g. a minimum number of samples per leaf node), resulting in a more balanced tree structure than leaf-wise growth.
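In practice these growth strategies show up as hyperparameters. The sketch below assumes the xgboost and lightgbm Python packages are installed and uses made-up data: depth-wise growth is bounded by a maximum depth, while leaf-wise growth is bounded by the number of leaves.

```python
import numpy as np
import xgboost as xgb
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.random((500, 10))
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Depth-wise growth: the tree is bounded by its maximum depth,
# so leaves end up at (roughly) the same level.
depth_wise = xgb.XGBClassifier(max_depth=4, n_estimators=50)

# Leaf-wise growth: the tree is bounded by its number of leaves,
# so some branches can grow much deeper than others.
leaf_wise = lgb.LGBMClassifier(num_leaves=31, n_estimators=50)

depth_wise.fit(X, y)
leaf_wise.fit(X, y)
```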
How to combine trees?
• Bagging consists of creating many “copies” of the training data (each copy slightly different from the others), applying the weak learner to each copy to obtain multiple weak models, and then combining them.
• In bagging, the bootstrapped trees are independent of each other.
• Boosting consists of using the “original” training data and iteratively creating multiple models with a weak learner. Each new model tries to “fix” the errors that the previous models made.
• In boosting, each tree is grown using information from the previously grown trees (see the sketch below).
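As a minimal contrast of the two ways of combining trees, the scikit-learn sketch below (assuming scikit-learn is installed; the dataset is synthetic) fits bagged trees independently on bootstrap copies and boosted trees sequentially on the previous trees' errors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: many independent trees, each fit on a bootstrapped copy of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0)

# Boosting: trees are grown sequentially, each one correcting the previous trees' errors.
boosting = GradientBoostingClassifier(n_estimators=100, random_state=0)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```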
Evolution of XGBoost
XGBoost: eXtreme Gradient Boosting
• XGBoost is an open-source gradient boosting library developed by Tianqi Chen (2014), focused on efficient and scalable machine learning algorithms.
• “Extreme” refers to the fact that the algorithms and methods have been customized to push the limit of what is possible for gradient boosting algorithms.
• XGBoost includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
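A minimal usage sketch with the xgboost Python package (assuming it is installed; the data and hyperparameter values are illustrative). Missing values are passed in as NaN, since XGBoost learns a default direction for them at each split.

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan   # missing values are handled natively

model = xgb.XGBClassifier(n_estimators=200, learning_rate=0.1, max_depth=4, subsample=0.8)
model.fit(X, y)
print(model.predict_proba(X[:5]))
```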
LightGBM (Light Gradient Boosted Machine)
• LightGBM is an open-source gradient boosting library developed by Microsoft (2016) that
is fast and efficient, making it suitable for large-scale learning tasks.
• LightGBM can handle categorical features through its built-in categorical feature binning; no one-hot encoding is required, only label/integer encoding (or pandas “category” columns) as preprocessing (see the sketch below).
• LightGBM includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
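A minimal sketch of LightGBM's categorical handling (assuming the lightgbm and pandas packages are installed; the data is made up): columns declared with the pandas "category" dtype are binned internally, so no one-hot encoding is needed.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(size=1000),
    "sector": pd.Categorical(rng.choice(["tech", "energy", "health"], size=1000)),
})
y = (df["size"] > 0).astype(int)

# The "sector" column keeps its category dtype; LightGBM bins it internally
# instead of requiring one-hot encoded dummy columns.
model = lgb.LGBMClassifier(n_estimators=100, num_leaves=31)
model.fit(df, y)
print(model.predict(df.head()))
```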
CatBoost (Category Boosting)
• CatBoost is an open-source gradient boosting library developed by Yandex (2017) that is
specifically designed to handle categorical data.
• CatBoost can handle categorical features directly, without the need for one-hot encoding or
other preprocessing.
• CatBoost includes several other features that can improve model performance, such as
handling missing values, automatic feature selection, and model ensembling.
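A minimal sketch with the catboost package (assuming it is installed; the data is made up): raw string categories are passed directly and declared through cat_features, with no one-hot or label encoding beforehand.

```python
import numpy as np
import pandas as pd
from catboost import CatBoostClassifier

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.normal(size=1000),
    "sector": rng.choice(["tech", "energy", "health"], size=1000),  # raw strings
})
y = (df["size"] > 0).astype(int)

# CatBoost encodes the declared categorical columns internally
# (ordered target statistics), so no preprocessing is needed.
model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(df, y, cat_features=["sector"])
print(model.predict(df.head()))
```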
XGBoost vs LightGBM vs CatBoost
Developer: Tianqi Chen, 2014 (XGBoost); Microsoft, 2016 (LightGBM); Yandex, 2017 (CatBoost)
Base model: decision trees for all three
Tree growing algorithm: depth-wise growth (XGBoost), leaf-wise growth (LightGBM), symmetric growth (CatBoost); leaf-wise growth is also available in XGBoost and CatBoost
Parallel training: single GPU (XGBoost), multiple GPUs (LightGBM and CatBoost)
Handling categorical features: encoding required for XGBoost (one-hot, ordinal, target, label, …); automated encoding using categorical feature binning in LightGBM; no encoding required for CatBoost
Splitting method: pre-sorted and histogram-based (XGBoost), GOSS (gradient-based one-side sampling, LightGBM), greedy method (CatBoost)