Understanding Ensembles
● Ensemble learning is a machine learning technique
that combines predictions from multiple models
(learners) to improve performance.
● It follows the idea that many weak models, when
combined, can perform better than a single strong
model.
● Analogy: Like choosing a team of diverse experts on
a quiz show. Individually, they may not know
everything, but together, they cover a wide range of
knowledge.
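As a minimal concrete sketch of this idea (assuming scikit-learn and a toy synthetic dataset; the specific models and parameter values are only illustrative), three different classifiers can be combined with a hard majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset standing in for the quiz-show "questions".
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Three diverse "experts": each captures different kinds of structure.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote over the three predictions
)

print(cross_val_score(ensemble, X, y, cv=5).mean())
```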
🔸 Types of Ensemble Methods
1. Bagging (Bootstrap Aggregating)
● Stands for Bootstrap AGGregatING.
● Uses homogeneous weak learners (same type of
models, like decision trees).
● Each model is trained on a random subset of the
training data (with replacement).
● Training is done in parallel.
● Final output is based on:
○ Majority vote for classification.
○ Average for regression.
● Helps reduce variance and prevent overfitting (see the sketch below).
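A minimal bagging sketch, assuming scikit-learn and decision trees as the homogeneous weak learners; the dataset and parameter values are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 100 trees, each fit on a bootstrap sample drawn with replacement;
# training the trees is embarrassingly parallel (n_jobs=-1).
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=1.0,   # each bootstrap sample is as large as the training set
    bootstrap=True,    # sample with replacement
    n_jobs=-1,
    random_state=0,
)

print(cross_val_score(bagged_trees, X, y, cv=5).mean())
```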
2. Boosting
● Also uses homogeneous weak learners.
● Models are trained sequentially—each new model
tries to fix the errors made by the previous one.
● Gives more weight to misclassified instances in each
step.
● Combines models using a deterministic strategy.
● Helps reduce bias and increase accuracy, but can
overfit if not carefully controlled (see the sketch below).
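One concrete boosting implementation is gradient boosting; the sketch below uses scikit-learn's GradientBoostingClassifier purely to illustrate sequentially trained weak learners (parameters are illustrative, and AdaBoost itself is covered later in these notes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Shallow trees are fit one after another; each new tree is fit to the
# errors (gradients) left by the ensemble built so far.
booster = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,  # smaller values slow learning and help control overfitting
    max_depth=2,        # weak learners: shallow trees
    random_state=0,
)

print(cross_val_score(booster, X, y, cv=5).mean())
```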
3. Stacking (Stacked Generalization)
● Combines heterogeneous weak learners (different
types like SVM, KNN, DT, etc.).
● The base models are trained in parallel, and their
outputs are passed to a meta-model (such as logistic regression).
● Meta-model learns how to best combine the base
model outputs to make the final prediction.
● Offers flexibility and higher performance, especially
when base models have different strengths.
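A minimal stacking sketch, assuming scikit-learn; the choice of SVM, KNN, and decision-tree base learners with a logistic-regression meta-model mirrors the description above, and all parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Heterogeneous base learners; their cross-validated predictions become
# the inputs of a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

print(cross_val_score(stack, X, y, cv=5).mean())
```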
🌲 Random Forest – Key Concepts
● Random Forest is an ensemble of decision trees,
built using bagging.
● Rather than simply averaging predictions from many
near-identical trees, Random Forest injects randomness in two ways:
1. Trains each tree on a random bootstrap sample of the data.
2. At each split, considers only a random subset of the features, not all of them.
● This randomness gives two main benefits:
1. Reduces overfitting by decorrelating the trees.
2. Improves generalization performance on unseen data.
● Each tree contributes to the final prediction through
voting (classification) or averaging (regression).
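A minimal Random Forest sketch with scikit-learn; the parameter values are illustrative, and oob_score is included only to show the built-in out-of-bag estimate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagged trees plus per-split feature subsampling (max_features),
# which is what decorrelates the individual trees.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # random subset of features considered at each split
    oob_score=True,       # out-of-bag samples give a built-in validation estimate
    random_state=0,
)
forest.fit(X, y)

print("OOB accuracy:", forest.oob_score_)
```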
📦 Bootstrap Sampling Method
● A statistical technique for estimating a metric (such as
the mean), which is especially useful when the dataset is small.
● Steps:
1. Draw many sub-samples (e.g., 1,000) from the dataset, with replacement.
2. Calculate the mean (or other metric) for each sub-sample.
3. Average all these values to get a more accurate estimate (see the sketch below).
● It helps in reducing estimation errors and is used in
bagging/random forest.
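A small bootstrap sketch with NumPy; the data values and the choice of 1,000 sub-samples are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small dataset whose mean we want to estimate.
data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 4.4, 5.8])

# Draw 1,000 bootstrap sub-samples (same size, with replacement)
# and record the mean of each one.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(1000)
])

print("bootstrap estimate of the mean:", boot_means.mean())
print("spread of the estimate (std):  ", boot_means.std())
```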
🔁 Bagging – In Detail
● In bagging, each model is trained on a different
bootstrapped version of the data.
● Models are trained using the same learning
algorithm (like decision trees).
● Final prediction is made by aggregating predictions.
● Works best with unstable learners—models that
change a lot when input data changes slightly.
○ Example: Decision trees.
● Main idea: diversity from random sampling increases
model robustness and stability.
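A quick check of the "unstable learner" point, assuming scikit-learn and a synthetic dataset (the exact numbers will vary with the data and seed): a single deep decision tree is compared with 100 bagged trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=0)

# Bagging typically narrows the spread between folds (lower variance)
# and raises the average score for unstable learners like deep trees.
for name, model in [("single tree", single_tree), ("bagged trees", bagged_trees)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:12s} mean={scores.mean():.3f} std={scores.std():.3f}")
```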
🧮 Gini Impurity (Used in Decision Trees)
● Gini impurity is a measure of how impure a node is in
terms of class distribution.
● Formula:
I_G(n) = 1 - \sum_{i=1}^{J} (p_i)^2
where p_i is the proportion of samples in node n belonging to class i.
● Example:
○ A Gini impurity of 0.444 means there is a 44.4%
chance of misclassifying a random sample drawn from
that node if it were labeled according to the node's class distribution.
● Lower Gini = better split.
● The decision tree chooses the feature and threshold
that minimize the weighted Gini impurity of the resulting child nodes (see the helper below).
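A tiny helper implementing the formula above; the class counts are made-up examples, with a 2:1 node reproducing the 0.444 value mentioned earlier:

```python
import numpy as np

def gini_impurity(class_counts):
    """I_G(n) = 1 - sum_i p_i^2, where p_i is the class proportion in the node."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([2, 1]))   # 0.444...  (proportions 2/3 and 1/3)
print(gini_impurity([5, 0]))   # 0.0       (pure node)
print(gini_impurity([3, 3]))   # 0.5       (maximally mixed for two classes)
```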
🔄 Random Forest Pseudocode
1. Randomly choose k features from the total m features (where k ≪ m).
2. Among the k features, find the best feature and threshold to split the node.
3. Divide the data into daughter nodes using the best split.
4. Repeat steps 1–3 until the maximum depth or another stopping criterion is met.
5. Repeat this process for n trees to build the complete forest (a sketch follows below).
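A compact sketch of this pseudocode, assuming scikit-learn's DecisionTreeClassifier for the individual trees; max_features plays the role of k, and the outer loop builds the n trees from bootstrap samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
n_trees, k = 50, "sqrt"   # k << m: number of features tried at each split

forest = []
for _ in range(n_trees):
    # Bootstrap sample of the rows for this tree.
    idx = rng.integers(0, len(X), size=len(X))
    # Steps 1-4 happen inside the tree: at every node it evaluates
    # only a random subset of k features (max_features).
    tree = DecisionTreeClassifier(max_features=k)
    tree.fit(X[idx], y[idx])
    forest.append(tree)

# Step 5: majority vote across the n trees.
votes = np.stack([t.predict(X) for t in forest])   # shape (n_trees, n_samples)
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the hand-rolled forest:", (y_pred == y).mean())
```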
🔸 Boosting and AdaBoost
● Boosting combines multiple weak classifiers to form a
strong classifier.
● It works by:
○ Training a model on data.
○ Creating the next model to correct errors made by
the previous one.
○ Repeating this until the training data is predicted well
or a preset limit on the number of models is reached.
● AdaBoost is the first successful boosting algorithm:
○ Originally for binary classification, later used for
multi-class problems.
○ A great starting point to understand boosting.
● Best used with weak learners (models that perform
slightly better than random guessing).
● Can be applied on top of many machine learning
algorithms to improve their performance.
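A minimal AdaBoost sketch with scikit-learn, using decision stumps (depth-1 trees) as the weak learners; the parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Decision stumps are the classic weak learner for AdaBoost:
# barely better than guessing on their own, strong when combined.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=1.0,
    random_state=0,
)

print(cross_val_score(ada, X, y, cv=5).mean())
```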
🔸 Advantages of Random Forest
● Works for both classification and regression tasks.
● Handles missing values well.
● Avoids overfitting in most classification problems.
● Can be reused for different tasks without changing the
core algorithm.
● Useful for feature engineering:
○ Helps identify the most important features in the
dataset.
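A short sketch of the feature-engineering point, assuming scikit-learn and a synthetic dataset where only some features are informative; a fitted Random Forest exposes feature_importances_, which can be used to rank features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Only 5 of the 20 features are informative by construction.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances; higher means the feature was used in better splits.
ranking = np.argsort(forest.feature_importances_)[::-1]
for i in ranking[:5]:
    print(f"feature {i}: importance {forest.feature_importances_[i]:.3f}")
```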
🔸 Boosting Algorithm (AdaBoost Process)
● Initially, all N samples are given equal weights of 1/N.
● Hard-to-classify samples get higher weights in each
iteration, so the algorithm focuses more on the
misclassified samples.
● In each round:
○ A weak classifier is trained on the weighted samples.
○ A stage weight = ln((1 - error) / error)
is calculated from the classifier's weighted error.
○ The weights of misclassified samples are increased
before the next round (see the sketch after this list).
● Final result is a weighted ensemble of classifiers.
○ This combined model performs better than
individual classifiers.
○ Shows strong potential for accurate
classification.
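A from-scratch sketch of the weight-update loop described above, for binary labels and decision stumps from scikit-learn; the clipping of the error term is my own addition to keep the logarithm finite:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
N, n_rounds = len(X), 25
w = np.full(N, 1.0 / N)            # equal initial weights 1/N
stumps, stage_weights = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    miss = (pred != y)
    error = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)  # weighted error rate
    alpha = np.log((1 - error) / error)               # stage weight
    w = w * np.exp(alpha * miss)                      # boost misclassified samples
    w = w / w.sum()                                   # renormalise
    stumps.append(stump)
    stage_weights.append(alpha)

# Weighted vote: sign of the alpha-weighted sum, with labels mapped to -1/+1.
scores = sum(a * (2 * s.predict(X) - 1) for a, s in zip(stage_weights, stumps))
print("training accuracy:", ((scores > 0).astype(int) == y).mean())
```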