Understanding Ensembles
● Ensemble learning is a machine learning technique
that combines predictions from multiple models
(learners) to improve performance.
● It follows the idea that many weak models, when
combined, can perform better than a single strong
model.
● Analogy: Like choosing a team of diverse experts on
a quiz show. Individually, they may not know
everything, but together, they cover a wide range of
knowledge.
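As a minimal concrete sketch of this idea (assuming scikit-learn and a toy synthetic dataset; the specific models and parameter values are only illustrative), three different classifiers can be combined with a hard majority vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Toy dataset standing in for the quiz-show "questions".
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Three diverse "experts": each captures different kinds of structure.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # majority vote over the three predictions
)

print(cross_val_score(ensemble, X, y, cv=5).mean())
```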
🔸 Types of Ensemble Methods
1. Bagging (Bootstrap Aggregating)
● Stands for Bootstrap AGGregatING.
● Uses homogeneous weak learners (same type of
models, like decision trees).
● Each model is trained on a random subset of the
training data (with replacement).
● Training is done in parallel.
● Final output is based on:
○ Majority vote for classification.
○ Average for regression.
● Helps reduce variance and prevent overfitting (see the sketch below).
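A minimal bagging sketch, assuming scikit-learn and decision trees as the homogeneous weak learners; the dataset and parameter values are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 100 trees, each fit on a bootstrap sample drawn with replacement;
# training the trees is embarrassingly parallel (n_jobs=-1).
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=1.0,   # each bootstrap sample is as large as the training set
    bootstrap=True,    # sample with replacement
    n_jobs=-1,
    random_state=0,
)

print(cross_val_score(bagged_trees, X, y, cv=5).mean())
```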
2. Boosting
● Also uses homogeneous weak learners.
● Models are trained sequentially—each new model
tries to fix the errors made by the previous one.
● Gives more weight to misclassified instances in each
step.
● Combines models using a deterministic strategy.
● Helps reduce bias and increase accuracy, but can
overfit if not carefully controlled (see the sketch below).
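One concrete boosting implementation is gradient boosting; the sketch below uses scikit-learn's GradientBoostingClassifier purely to illustrate sequentially trained weak learners (parameters are illustrative, and AdaBoost itself is covered later in these notes):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Shallow trees are fit one after another; each new tree is fit to the
# errors (gradients) left by the ensemble built so far.
booster = GradientBoostingClassifier(
    n_estimators=200,
    learning_rate=0.1,  # smaller values slow learning and help control overfitting
    max_depth=2,        # weak learners: shallow trees
    random_state=0,
)

print(cross_val_score(booster, X, y, cv=5).mean())
```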
3. Stacking (Stacked Generalization)
● Combines heterogeneous weak learners (different
types like SVM, KNN, DT, etc.).
● The base models are trained in parallel, and their
outputs are passed to a meta-model (such as logistic regression).
● Meta-model learns how to best combine the base
model outputs to make the final prediction.
● Offers flexibility and higher performance, especially
when base models have different strengths.
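A minimal stacking sketch, assuming scikit-learn; the choice of SVM, KNN, and decision-tree base learners with a logistic-regression meta-model mirrors the description above, and all parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Heterogeneous base learners; their cross-validated predictions become
# the inputs of a logistic-regression meta-model.
stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=0)),
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

print(cross_val_score(stack, X, y, cv=5).mean())
```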
🌲 Random Forest – Key Concepts
● Random Forest is an ensemble of decision trees,
built using bagging.
● Rather than simply averaging predictions from many
near-identical trees, Random Forest injects randomness in two ways:
1. Trains each tree on a random bootstrap sample of the data.
2. At each split, considers only a random subset of the features, not all of them.
● This randomness gives two main benefits:
1. Reduces overfitting by decorrelating the trees.
2. Improves generalization performance on unseen data.
● Each tree contributes to the final prediction through
voting (classification) or averaging (regression).
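A minimal Random Forest sketch with scikit-learn; the parameter values are illustrative, and oob_score is included only to show the built-in out-of-bag estimate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagged trees plus per-split feature subsampling (max_features),
# which is what decorrelates the individual trees.
forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",  # random subset of features considered at each split
    oob_score=True,       # out-of-bag samples give a built-in validation estimate
    random_state=0,
)
forest.fit(X, y)

print("OOB accuracy:", forest.oob_score_)
```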
📦 Bootstrap Sampling Method
● A statistical technique for estimating a metric (such as
the mean), which is especially useful when the dataset is small.
● Steps:
1. Draw many sub-samples (e.g., 1,000) from the dataset, with replacement.
2. Calculate the mean (or other metric) for each sub-sample.
3. Average all these values to get a more accurate estimate (see the sketch below).
● It helps in reducing estimation errors and is used in
bagging/random forest.
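A small bootstrap sketch with NumPy; the data values and the choice of 1,000 sub-samples are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# A small dataset whose mean we want to estimate.
data = np.array([4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 4.4, 5.8])

# Draw 1,000 bootstrap sub-samples (same size, with replacement)
# and record the mean of each one.
boot_means = np.array([
    rng.choice(data, size=len(data), replace=True).mean()
    for _ in range(1000)
])

print("bootstrap estimate of the mean:", boot_means.mean())
print("spread of the estimate (std):  ", boot_means.std())
```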
🔁 Bagging – In Detail
● In bagging, each model is trained on a different
bootstrapped version of the data.
● Models are trained using the same learning
algorithm (like decision trees).
● Final prediction is made by aggregating predictions.
● Works best with unstable learners—models that
change a lot when input data changes slightly.
○ Example: Decision trees.
● Main idea: diversity from random sampling increases
model robustness and stability.
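A quick check of the "unstable learner" point, assuming scikit-learn and a synthetic dataset (the exact numbers will vary with the data and seed): a single deep decision tree is compared with 100 bagged trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)
bagged_trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=0)

# Bagging typically narrows the spread between folds (lower variance)
# and raises the average score for unstable learners like deep trees.
for name, model in [("single tree", single_tree), ("bagged trees", bagged_trees)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:12s} mean={scores.mean():.3f} std={scores.std():.3f}")
```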
🧮 Gini Impurity (Used in Decision Trees)
● Gini impurity is a measure of how impure a node is in
terms of class distribution.
● Formula:
I_G(n) = 1 - \sum_{i=1}^{J} (p_i)^2
where p_i is the proportion of samples in node n belonging to class i.
● Example:
○ A Gini impurity of 0.444 means there is a 44.4%
chance of misclassifying a random sample drawn from
that node if it were labeled according to the node's class distribution.
● Lower Gini = better split.
● The decision tree chooses the feature and threshold
that minimize the weighted Gini impurity of the resulting child nodes (see the helper below).
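A tiny helper implementing the formula above; the class counts are made-up examples, with a 2:1 node reproducing the 0.444 value mentioned earlier:

```python
import numpy as np

def gini_impurity(class_counts):
    """I_G(n) = 1 - sum_i p_i^2, where p_i is the class proportion in the node."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([2, 1]))   # 0.444...  (proportions 2/3 and 1/3)
print(gini_impurity([5, 0]))   # 0.0       (pure node)
print(gini_impurity([3, 3]))   # 0.5       (maximally mixed for two classes)
```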
🔄 Random Forest Pseudocode
1. Randomly choose k features from the total m features (where k ≪ m).
2. Among the k features, find the best feature and threshold to split the node.
3. Divide the data into daughter nodes using the best split.
4. Repeat steps 1–3 until the maximum depth or another stopping criterion is met.
5. Repeat this process for n trees to build the complete forest (a sketch follows below).
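A compact sketch of this pseudocode, assuming scikit-learn's DecisionTreeClassifier for the individual trees; max_features plays the role of k, and the outer loop builds the n trees from bootstrap samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
rng = np.random.default_rng(0)
n_trees, k = 50, "sqrt"   # k << m: number of features tried at each split

forest = []
for _ in range(n_trees):
    # Bootstrap sample of the rows for this tree.
    idx = rng.integers(0, len(X), size=len(X))
    # Steps 1-4 happen inside the tree: at every node it evaluates
    # only a random subset of k features (max_features).
    tree = DecisionTreeClassifier(max_features=k)
    tree.fit(X[idx], y[idx])
    forest.append(tree)

# Step 5: majority vote across the n trees.
votes = np.stack([t.predict(X) for t in forest])   # shape (n_trees, n_samples)
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("training accuracy of the hand-rolled forest:", (y_pred == y).mean())
```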
🔸 Boosting and AdaBoost
● Boosting combines multiple weak classifiers to form a
strong classifier.
● It works by:
○ Training a model on data.
○ Creating the next model to correct errors made by
the previous one.
○ Repeating this until the training data is predicted well
or a preset limit on the number of models is reached.
● AdaBoost is the first successful boosting algorithm:
○ Originally for binary classification, later used for
multi-class problems.
○ A great starting point to understand boosting.
● Best used with weak learners (models that perform
slightly better than random guessing).
● Can be applied on top of many machine learning
algorithms to improve their performance.
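A minimal AdaBoost sketch with scikit-learn, using decision stumps (depth-1 trees) as the weak learners; the parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Decision stumps are the classic weak learner for AdaBoost:
# barely better than guessing on their own, strong when combined.
ada = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=1.0,
    random_state=0,
)

print(cross_val_score(ada, X, y, cv=5).mean())
```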
🔸 Advantages of Random Forest
● Works for both classification and regression tasks.
● Handles missing values well.
● Avoids overfitting in most classification problems.
● Can be reused for different tasks without changing the
core algorithm.
● Useful for feature engineering:
○ Helps identify the most important features in the
dataset.
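A short sketch of the feature-engineering point, assuming scikit-learn and a synthetic dataset where only some features are informative; a fitted Random Forest exposes feature_importances_, which can be used to rank features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Only 5 of the 20 features are informative by construction.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances; higher means the feature was used in better splits.
ranking = np.argsort(forest.feature_importances_)[::-1]
for i in ranking[:5]:
    print(f"feature {i}: importance {forest.feature_importances_[i]:.3f}")
```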
🔸 Boosting Algorithm (AdaBoost Process)
● Initially, all N samples are given equal weights of 1/N.
● Hard-to-classify samples get higher weights in each
iteration, so the algorithm focuses more on the
misclassified samples.
● In each round:
○ A weak classifier is trained on the weighted samples.
○ A stage weight = ln((1 - error) / error)
is calculated from the classifier's weighted error.
○ The weights of misclassified samples are increased
before the next round (see the sketch after this list).
● Final result is a weighted ensemble of classifiers.
○ This combined model performs better than
individual classifiers.
○ Shows strong potential for accurate
classification.
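A from-scratch sketch of the weight-update loop described above, for binary labels and decision stumps from scikit-learn; the clipping of the error term is my own addition to keep the logarithm finite:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
N, n_rounds = len(X), 25
w = np.full(N, 1.0 / N)            # equal initial weights 1/N
stumps, stage_weights = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    miss = (pred != y)
    error = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)  # weighted error rate
    alpha = np.log((1 - error) / error)               # stage weight
    w = w * np.exp(alpha * miss)                      # boost misclassified samples
    w = w / w.sum()                                   # renormalise
    stumps.append(stump)
    stage_weights.append(alpha)

# Weighted vote: sign of the alpha-weighted sum, with labels mapped to -1/+1.
scores = sum(a * (2 * s.predict(X) - 1) for a, s in zip(stage_weights, stumps))
print("training accuracy:", ((scores > 0).astype(int) == y).mean())
```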