Week 11

The document discusses ensemble techniques in machine learning, which improve predictive performance by combining multiple models. It covers methods such as bagging, boosting, and stacking, explaining their mechanisms and applications. Key concepts include the importance of model independence, error correction in boosting, and the use of validation sets in stacking and blending.
Machine Learning

Ensemble Techniques

Ensemble Methods
• Achieve better predictive performance by combining the predictions from multiple models
• Construct a set of multiple random base learners (classifiers/regressors) learned from the training data
• Predict the class label of a test record by combining the predictions made by the multiple classifiers/regressors (e.g., by taking the majority vote, mean, or median)
• Bagging: fit many decision trees on different samples of the same dataset and combine their predictions by averaging or majority voting
• Boosting: add sequential ensemble models that correct the predictions made by prior models and output a weighted average of the predictions
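As a hedged illustration of the combination step, the NumPy sketch below uses made-up example predictions (not from the slides): a majority vote over three classifiers' binary labels, and the mean/median over three regressors' outputs.

```python
import numpy as np

# Hypothetical binary-label predictions from three base classifiers for five test records
clf_preds = np.array([[0, 1, 1, 0, 1],
                      [0, 1, 0, 0, 1],
                      [1, 1, 1, 0, 0]])
majority_vote = (clf_preds.sum(axis=0) > clf_preds.shape[0] / 2).astype(int)
print(majority_vote)                 # [0 1 1 0 1]

# Hypothetical numeric predictions from three base regressors for two test records
reg_preds = np.array([[2.1, 3.0],
                      [1.9, 3.4],
                      [2.3, 3.2]])
print(reg_preds.mean(axis=0))        # combine by mean
print(np.median(reg_preds, axis=0))  # or by median
```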
Necessary Conditions for Ensemble Methods

• Ensemble methods work better than a single base classifier if:
1. All base classifiers are independent of each other
2. All base classifiers perform better than random guessing (error rate < 0.5 for binary classification)
• Rather than building one model and hoping it is the best/most accurate predictor, ensemble methods combine many models and reduce variance (moving from an overfit toward a balanced fit)
• Custom/Voting Ensemble: random row selection with replacement; multiple models, NOT all of them decision trees
• Random Forest: random row and feature selection with replacement; multiple models, ALL of them decision trees
Approach of Ensemble Learning (Bagging)

[Diagram: bootstrap samples are drawn from the training data, a model is fit on each sample, and the predictions are aggregated using majority vote or averaging]

Random row selection with replacement: replacement means that if a row is selected, it is returned to the training dataset for potential re-selection into the same bootstrap sample. As a result, a row of data may be selected zero, one, or multiple times for a given training dataset.
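A minimal scikit-learn sketch of this bootstrap-and-aggregate procedure, assuming the standard BaggingClassifier API and a synthetic dataset (not data from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Bagging: each of the 100 base estimators (decision trees by default) is trained on a
# bootstrap sample (rows drawn with replacement); predictions are aggregated by majority vote.
bag = BaggingClassifier(n_estimators=100, max_samples=1.0,
                        bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print("bagging test accuracy:", bag.score(X_te, y_te))
```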
Bagging (Bootstrap AGGregatING)

• Bootstrap sampling: random sampling with replacement

Original Data      1   2   3   4   5   6   7   8   9   10
Bagging (Round 1)  7   8   10  8   2   5   10  10  5   9
Bagging (Round 2)  1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)  1   8   5   10  5   5   9   6   3   7

• Build a classifier on each bootstrap sample
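The bootstrap rounds in the table above can be reproduced in spirit with a few lines of NumPy (the actual numbers drawn will differ; this is a hedged sketch of sampling with replacement):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
original = np.arange(1, 11)                        # rows 1..10, as in the table above

for r in range(1, 4):
    sample = rng.choice(original, size=original.size, replace=True)  # with replacement
    oob = np.setdiff1d(original, sample)           # rows drawn zero times (out-of-bag)
    print(f"Bagging (Round {r}): {sample}  out-of-bag: {oob}")
```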
Bagging (Bootstrap AGGregatING)

• Probability that a particular observation is chosen in one draw from a set of n observations: 1/n
• Probability that a particular observation is not chosen in one draw: 1 − 1/n
• Probability that the observation is not chosen in any of the n draws: (1 − 1/n)^n
  (n: number of training instances; one bootstrap sample of this size is drawn for each estimator/decision tree)
• Probability of a training instance being selected in a bootstrap sample: 1 − (1 − 1/n)^n
• P(selected) ≈ 0.632 when n is large
• P(not selected) ≈ e^(−1) ≈ 0.368 when n is large; these unselected instances are the out-of-bag samples
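A quick numerical check of the 0.632/0.368 figures, assuming n bootstrap draws from n instances:

```python
import numpy as np

for n in (10, 100, 1000, 100000):
    p_not_selected = (1 - 1 / n) ** n              # (1 - 1/n)^n
    print(n, round(1 - p_not_selected, 4), round(p_not_selected, 4))

print("limit:", round(1 - np.exp(-1), 4), round(np.exp(-1), 4))   # 0.6321, 0.3679
```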
Random Forest Algorithm

• Construct an ensemble of decision trees by manipulating the training set instances/rows as well as the features
– Use a bootstrap sample to train every decision tree (similar to Bagging)
– Use the following tree induction algorithm:
  • At every internal node of the decision tree, randomly sample a subset of attributes for selecting the split criterion
  • Repeat this procedure until all leaves are pure (unpruned tree)
Random Forest Hyperparameters

– n_estimators: default=100. The number of trees in the forest
– max_samples: the number of samples to draw from the training set to train each base estimator
– oob_score: the out-of-bag score of the training dataset, obtained using an out-of-bag estimate
– Since bootstrapping samples the data with the possibility of selecting one sample multiple times, it is very likely that not all samples from the original dataset will be selected. A smart decision is therefore to exploit these unselected samples, called out-of-bag samples, as shown in the sketch below.
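A hedged scikit-learn sketch wiring these three hyperparameters together (synthetic data; the parameter values are illustrative, not prescribed by the slides):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,   # number of trees in the forest (default=100)
    max_samples=0.8,    # fraction of rows bootstrapped for each tree
    oob_score=True,     # evaluate each tree on its out-of-bag samples
    random_state=0,
)
forest.fit(X, y)
print("out-of-bag accuracy estimate:", forest.oob_score_)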
Hard and Soft Voting Classifier

• In hard voting, we combine the outputs by returning the mode, i.e., the most frequently occurring (majority-voted) label among the base classifiers' outputs
• In soft voting, the base classifiers output probabilities or numerical scores
• A soft voting classifier classifies input data based on the probabilities of all the predictions made by the different classifiers

[Worked example: predicted outputs of base classifiers for classes A and B, combined under hard voting and soft voting]
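Hard versus soft voting in scikit-learn, as a minimal hedged sketch with three heterogeneous base classifiers on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB())]

hard = VotingClassifier(estimators=base, voting="hard")  # mode of the predicted labels
soft = VotingClassifier(estimators=base, voting="soft")  # argmax of the averaged probabilities
hard.fit(X_tr, y_tr)
soft.fit(X_tr, y_tr)
print("hard voting:", hard.score(X_te, y_te), " soft voting:", soft.score(X_te, y_te))
```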
Boosting Algorithms

• Boosting: sequential ensemble algorithms that convert weak/base learners into strong learners in series.
• Step 1: The first base learner takes the full training distribution and assigns equal weight/attention to each observation/sample.
• Step 2: If the first base learning algorithm makes prediction errors, we pay higher attention/weight to the observations with prediction errors and then apply the next base learning algorithm. The second model is built to correct the errors present in the first model.
• Step 3: Iterate Step 2 until the limit of base learning algorithms is reached or a higher accuracy is achieved, i.e., the complete training dataset is predicted correctly or the maximum number of models has been added.
• Finally, the outputs of the weak learners are combined into a strong learner, which improves the predictive power of the model. Boosting pays higher focus to examples that are misclassified or have higher errors under the preceding weak rules (see the from-scratch sketch below).
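A from-scratch sketch of Steps 1-3 in the style of discrete AdaBoost (labels mapped to ±1; the constants and names here are illustrative assumptions, not taken from the slides):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
y_pm = np.where(y == 1, 1, -1)              # labels in {-1, +1}

n = len(X)
w = np.full(n, 1 / n)                       # Step 1: equal weight for every observation
learners, alphas = [], []

for _ in range(50):                         # Step 3: iterate up to a maximum number of models
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y_pm)) / np.sum(w)
    if err >= 0.5:                          # weak learner must beat random guessing
        break
    alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
    w *= np.exp(-alpha * y_pm * pred)       # Step 2: upweight misclassified observations
    w /= w.sum()
    learners.append(stump)
    alphas.append(alpha)
    if err == 0:                            # perfect fit on the training data: stop early
        break

# Strong learner: sign of the weighted sum of the weak learners' votes
F = sum(a * m.predict(X) for a, m in zip(alphas, learners))
print("training accuracy:", np.mean(np.sign(F) == y_pm))
```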
Boosting Algorithms

• Improved Accuracy – Boosting can improve the accuracy of the model by combining several weak models, averaging their outputs for regression or voting over them for classification.
• Robustness to Overfitting – Boosting can reduce the risk of overfitting by reweighting the inputs that are classified wrongly.
• Better handling of imbalanced data – Boosting can handle imbalanced data by focusing more on the data points that are misclassified.
• Better Interpretability – Boosting can increase the interpretability of the model by breaking the model's decision process into multiple sequential steps.

1. AdaBoost (Adaptive Boosting)
2. Gradient Tree Boosting
3. XGBoost
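For comparison, a hedged gradient tree boosting sketch in scikit-learn (XGBoost offers a similar interface through its own package); the dataset and settings are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Gradient tree boosting: each new tree is fit to the residual errors
# (gradients of the loss) left by the ensemble built so far.
gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)
print("gradient boosting test accuracy:", gbt.score(X_te, y_te))
```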
Approach of Ensemble Learning (Boosting)

[Diagram: models are trained in sequence, each paying more attention to the samples misclassified by its predecessors, and their weighted predictions are combined]
Boosting Hyperparameters

– estimator: the base estimator from which the boosted ensemble is built. Default = a decision tree with max_depth=1 (a decision stump)
– n_estimators: the maximum number of estimators at which boosting is terminated. In case of a perfect fit, the learning procedure is stopped early
– max_samples: the number of samples to draw to train each base estimator
– algorithm: {'SAMME', 'SAMME.R'}, default='SAMME.R'
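These hyperparameters map roughly onto scikit-learn's AdaBoostClassifier. Note that the weak-learner argument is named estimator in recent releases (base_estimator in older ones) and 'SAMME.R' has been deprecated in favour of 'SAMME' in newer versions, so treat this as a hedged sketch rather than a fixed recipe:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # decision stump (the documented default)
    n_estimators=50,        # maximum number of boosting rounds before termination
    algorithm="SAMME",      # 'SAMME.R' in older scikit-learn versions
    random_state=0,
)
booster.fit(X, y)
print("training accuracy:", booster.score(X, y))
```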
Stacking and Blending Ensemble

– Heterogeneous base/weak learners
– An intermediate meta model is built from the predictions of the base models
– The train set is split into training and validation sets
– We train the base models on the training set
– We make predictions only on the validation set
– The validation predictions are used as features to build a new model
– The outputs of the base learners create a new dataset for the meta learner
– This meta model makes the final predictions on the test set, using the base models' prediction values as features
– Step 1: Split the custom dataset into training, validation and test sets
– Step 2: Train the multiple base models with the training samples
– Step 3: Generate the new dataset from the predictions of the base models on the validation samples
– Step 4: Train the meta model with the new dataset (a sketch of Steps 1-4 follows below)
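Steps 1-4 written out as a hedged sketch (this follows the blending variant with a hold-out validation set; all model choices and split sizes are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: split the dataset into training, validation and test sets
X, y = make_classification(n_samples=2000, random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.15, random_state=0)

# Step 2: train heterogeneous base models on the training samples
base_models = [RandomForestClassifier(random_state=0), KNeighborsClassifier()]
for model in base_models:
    model.fit(X_train, y_train)

# Step 3: the base models' predictions on the validation set form the new dataset
meta_X_val = np.column_stack([m.predict_proba(X_val)[:, 1] for m in base_models])
meta_X_test = np.column_stack([m.predict_proba(X_test)[:, 1] for m in base_models])

# Step 4: train the meta model on the new dataset; final predictions use the
# base models' prediction values on the test set as features
meta_model = LogisticRegression().fit(meta_X_val, y_val)
print("blended test accuracy:", meta_model.score(meta_X_test, y_test))
```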
Stacking and Blending Ensemble

The difference between stacking and blending is that stacking uses out-of-fold (out-of-k-fold) predictions as the training set for the next layer (i.e., the meta model), whereas blending uses a hold-out validation set (say, 10-15% of the training set) to train the next layer.
Stacking and Blending Hyperparameters

1. estimators: the base models
2. final_estimator: the meta model
3. cv: the number of folds
4. passthrough (bool): when False, only the predictions of the estimators are used as training data for the final_estimator; when True, the final_estimator is trained on the predictions as well as the original training data
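The same hyperparameters in scikit-learn's StackingClassifier, as a hedged sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),   # base models
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),   # meta model
    cv=5,                                   # folds used to build out-of-fold predictions
    passthrough=True,                       # True: meta model also sees the original features
)
stack.fit(X_tr, y_tr)
print("stacked test accuracy:", stack.score(X_te, y_te))
```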
