ENSEMBLE TECHNIQUES
-BAGGING AND BOOSTING
-BY: K SUBBA HANEESH, B-SEC
ENG22AM0104
INTRODUCTION
Ensemble learning improves machine learning results by combining several
models, which yields better predictive performance than any single model.
The basic idea is to learn a set of classifiers (experts) and to allow them
to vote. Bagging and Boosting are two types of ensemble learning: both
combine several estimates from different models, with bagging mainly
reducing the variance of a single estimate and boosting mainly reducing its bias.
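To make the idea of "experts that vote" concrete, here is a minimal sketch using scikit-learn's VotingClassifier; the synthetic dataset and the three base models are illustrative assumptions, not part of the original slides.

# A minimal voting-ensemble sketch: three different "experts" vote by majority.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Combine the three experts by hard (majority) voting.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Voting ensemble accuracy:", ensemble.score(X_test, y_test))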
BAGGING
Bootstrap Aggregating, also known as bagging, is a machine
learning ensemble meta-algorithm designed to improve the stability
and accuracy of machine learning algorithms used in statistical
classification and regression. It decreases the variance and helps to
avoid overfitting. It is usually applied to decision tree methods.
Bagging is a special case of the model averaging approach.
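As a quick usage sketch, scikit-learn's BaggingClassifier (which wraps a decision tree by default) can be applied as follows; the synthetic dataset and parameter values are illustrative assumptions.

# Off-the-shelf bagging: 50 decision trees, each fitted on a bootstrap sample.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

bagged_trees = BaggingClassifier(n_estimators=50, random_state=0)
print("Bagging CV accuracy:", cross_val_score(bagged_trees, X, y, cv=5).mean())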
IMPLEMENTATION STEPS OF BAGGING
Step 1: Multiple subsets, each with the same number of tuples as the original
data set, are created by selecting observations with replacement.
Step 2: A base model is created on each of these subsets.
Step 3: The models are learned in parallel, each on its own training set and
independently of the others.
Step 4: The final predictions are determined by combining the predictions
from all the models, for example by majority vote or averaging.
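The following rough sketch mirrors these four steps with NumPy and a scikit-learn decision tree as the base model; the dataset size, the number of models, and the base learner are illustrative assumptions.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
n_models = 25
models = []

# Steps 1-3: draw bootstrap samples (same size as the original data, with
# replacement) and fit one base model per sample; the models are independent,
# so in principle they could be trained in parallel.
for _ in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))   # sampling with replacement
    model = DecisionTreeClassifier(max_depth=3, random_state=0)
    model.fit(X[idx], y[idx])
    models.append(model)

# Step 4: combine the predictions by majority vote (labels here are 0/1).
votes = np.mean([m.predict(X) for m in models], axis=0)
final_pred = (votes >= 0.5).astype(int)
print("Training accuracy of the bagged ensemble:", (final_pred == y).mean())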
EXAMPLE OF BAGGING
The Random Forest model uses bagging, with high-variance decision tree
models as the base learners. It performs random feature selection to grow
each tree, and several such random trees together make a random forest.
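A hedged usage sketch of this idea with scikit-learn's RandomForestClassifier is shown below; the dataset and parameter values are assumptions made for illustration.

# Random forest: bagging of decision trees plus random feature selection at
# each split (max_features="sqrt").
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
forest.fit(X_train, y_train)
print("Random forest test accuracy:", forest.score(X_test, y_test))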
BOOSTING
Boosting is an ensemble modeling technique designed to create a strong classifier by combining multiple
weak classifiers. The process involves building models sequentially, where each new model aims to
correct the errors made by the previous ones.
Initially, a model is built using the training data.
Subsequent models are then trained to address the mistakes of their predecessors.
Boosting assigns weights to the data points in the original dataset:
Higher weights: Instances that were misclassified by the previous model receive higher weights.
Lower weights: Instances that were correctly classified receive lower weights.
Training on weighted data: The subsequent model learns from the weighted dataset, focusing its
attention on harder-to-learn examples (those with higher weights).
This iterative process continues until:
The entire training dataset is accurately predicted, or
A predefined maximum number of models is reached.
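As an illustration, here is a minimal boosting sketch using scikit-learn's AdaBoostClassifier with shallow decision trees as the weak classifiers; it assumes scikit-learn 1.2 or newer (for the estimator parameter), and the dataset and parameter values are illustrative.

# Sequential boosting: each new stump is fitted to a reweighted dataset that
# emphasises the points the previous stumps misclassified.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

booster = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner (decision stump)
    n_estimators=50,
    learning_rate=1.0,
    random_state=1,
)
booster.fit(X_train, y_train)
print("AdaBoost test accuracy:", booster.score(X_test, y_test))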
ALGORITHM:
1) Initialise the dataset and assign an equal weight to each data point.
2) Provide this as input to the model and identify the wrongly classified data points.
3) Increase the weights of the wrongly classified data points, decrease the weights of
the correctly classified data points, and then normalise the weights of all data points.
4) If the required results are obtained, go to step 5; otherwise, go to step 2.
5) End.
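The loop below is a rough NumPy sketch of this weight-update algorithm; the doubling/halving update rule, the weak learner, and the round limit are simplifying assumptions rather than the exact AdaBoost formulas.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=2)
n = len(X)
weights = np.full(n, 1.0 / n)          # step 1: equal weight for every point
max_rounds = 20

for _ in range(max_rounds):
    # Step 2: fit a weak model on the weighted data and find its mistakes.
    stump = DecisionTreeClassifier(max_depth=1, random_state=2)
    stump.fit(X, y, sample_weight=weights)
    wrong = stump.predict(X) != y

    if not wrong.any():                # steps 4-5: stop once everything is correct
        break

    # Step 3: raise the weights of misclassified points, lower the rest, renormalise.
    weights[wrong] *= 2.0
    weights[~wrong] *= 0.5
    weights /= weights.sum()

print("Final weights range from", weights.min(), "to", weights.max())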
DIFFERENCES BETWEEN BAGGING AND BOOSTING
1. Bagging: the simplest way of combining predictions that belong to the same type.
   Boosting: a way of combining predictions that belong to different types.
2. Bagging: aims to decrease variance, not bias.
   Boosting: aims to decrease bias, not variance.
3. Bagging: each model receives equal weight.
   Boosting: models are weighted according to their performance.
4. Bagging: each model is built independently.
   Boosting: new models are influenced by the performance of previously built models.
5. Bagging: tries to solve the over-fitting problem.
   Boosting: tries to reduce bias.
6. Bagging: apply it if the classifier is unstable (high variance).
   Boosting: apply it if the classifier is stable and simple (high bias).
7. Bagging: base classifiers are trained in parallel.
   Boosting: base classifiers are trained sequentially.
8. Bagging: example, the Random Forest model uses bagging.
   Boosting: example, AdaBoost uses boosting techniques.