ENSEMBLE METHODS: INCREASING THE ACCURACY
Ensemble methods
Use a combination of models to increase accuracy
Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
Popular ensemble methods
Bagging: averaging the prediction over a collection of classifiers
Boosting: weighted vote with a collection of classifiers
Ensemble: combining a set of heterogeneous classifiers (see the sketch below)
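As a small illustration of the last point, here is a minimal sketch (not from the slides) that combines heterogeneous classifiers with a majority vote using scikit-learn; the synthetic dataset and the choice of base learners are assumptions made for the example.

```python
# Minimal sketch: a heterogeneous ensemble M* combined by majority (hard) vote.
# Dataset and base learners are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different (heterogeneous) base classifiers, combined by hard voting
M_star = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("nb", GaussianNB()),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="hard",
)
M_star.fit(X_train, y_train)
print("ensemble accuracy:", M_star.score(X_test, y_test))
```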
BAGGING: BOOTSTRAP AGGREGATION
Analogy: Diagnosis based on multiple doctors’ majority vote
Training
Given a set D of d tuples, at each iteration i, a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
A classifier model Mi is learned from each training set Di
Classification: classify an unknown sample X
Each classifier Mi returns its class prediction
The bagged classifier M* counts the votes and assigns the class with the most votes to X
Prediction: can be applied to the prediction of continuous values by taking the average of the predictions for a given test tuple
Accuracy
Often significantly better than a single classifier derived from D
For noisy data: not considerably worse, and more robust
Proven to give improved accuracy in prediction (a bagging sketch follows below)
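A minimal sketch of the bagging procedure described above, assuming scikit-learn decision trees as the base learner; the dataset and the number of models k are illustrative choices.

```python
# Minimal bagging sketch: bootstrap sampling + majority vote.
# Base learner and dataset are illustrative assumptions, not from the slides.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rng = np.random.default_rng(0)
k = 25  # number of bagged models M1..Mk

models = []
for i in range(k):
    # Sample d tuples with replacement from D (a bootstrap sample Di)
    idx = rng.integers(0, len(X), size=len(X))
    Mi = DecisionTreeClassifier().fit(X[idx], y[idx])
    models.append(Mi)

def bagged_predict(x):
    # Each Mi votes; M* returns the most frequent class
    votes = [int(Mi.predict(x.reshape(1, -1))[0]) for Mi in models]
    return Counter(votes).most_common(1)[0][0]

print("prediction for first tuple:", bagged_predict(X[0]))
```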
BOOSTING
Analogy: Consult several doctors and combine their weighted diagnoses, where each weight is assigned based on the accuracy of that doctor's previous diagnoses
How does boosting work?
Weights are assigned to each training tuple
A series of k classifiers is iteratively learned
After a classifier Mi is learned, the weights are updated so that the subsequent classifier, Mi+1, pays more attention to the training tuples that were misclassified by Mi
The final model M* combines the votes of the individual classifiers, where the weight of each classifier's vote is a function of its accuracy
The boosting algorithm can be extended to numeric prediction
Compared with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data (a library-level sketch follows below)
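Before the detailed algorithm on the next slide, here is a minimal sketch of boosting using scikit-learn's AdaBoostClassifier; the dataset and the number of rounds are illustrative assumptions.

```python
# Minimal boosting sketch using scikit-learn's AdaBoost implementation.
# Dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# k = 50 weak classifiers learned iteratively; each round reweights
# the training tuples that the previous classifier misclassified.
M_star = AdaBoostClassifier(n_estimators=50, random_state=1)
M_star.fit(X_train, y_train)
print("boosted accuracy:", M_star.score(X_test, y_test))
```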
ADABOOST (FREUND AND SCHAPIRE, 1997)
Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
Initially, all the weights of tuples are set the same (1/d)
Generate k classifiers in k rounds. At round i,
Tuples from D are sampled (with replacement) to form a training set Di of the same size
Each tuple’s chance of being selected is based on its weight
A classification model Mi is derived from Di
Its error rate is calculated using Di as a test set
If a tuple is misclassified, its weight is increased; otherwise, it is decreased
Error rate: err(Xj) is the misclassification error of tuple Xj (1 if Xj is misclassified, 0 otherwise). The error rate of classifier Mi is the sum of the weights of the misclassified tuples:
error(Mi) = Σ_{j=1..d} wj · err(Xj)
The weight of classifier Mi's vote is log( (1 − error(Mi)) / error(Mi) ) (see the sketch below)
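A minimal sketch of these rounds, assuming decision stumps as the base classifiers and binary labels in {-1, +1}. It follows the weight-update and vote-weight formulas above, but simplifies the resampling step by passing sample weights directly to the learner, which is a common implementation shortcut, not something stated on the slide.

```python
# Minimal AdaBoost sketch for binary labels y in {-1, +1}.
# Uses sample weights directly instead of weighted resampling (an assumption).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=2)
y = np.where(y == 1, 1, -1)

d = len(X)
w = np.full(d, 1.0 / d)                          # initial tuple weights: 1/d
classifiers, alphas = [], []

for i in range(20):                              # k = 20 rounds
    Mi = DecisionTreeClassifier(max_depth=1)     # decision stump
    Mi.fit(X, y, sample_weight=w)
    pred = Mi.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)    # sum of weights of misclassified tuples
    if err >= 0.5 or err == 0:                   # skip degenerate rounds
        break
    alpha = np.log((1 - err) / err)              # weight of Mi's vote
    # Increase weights of misclassified tuples, then renormalize so the
    # correctly classified tuples end up with relatively lower weight
    w *= np.exp(alpha * (pred != y))
    w /= w.sum()
    classifiers.append(Mi)
    alphas.append(alpha)

def adaboost_predict(X_new):
    # Weighted vote of all classifiers
    votes = sum(a * Mi.predict(X_new) for a, Mi in zip(alphas, classifiers))
    return np.sign(votes)

print("training accuracy:", np.mean(adaboost_predict(X) == y))
```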
RANDOM FOREST (BREIMAN, 2001)
Random Forest:
Each classifier in the ensemble is a decision tree classifier, generated using a random selection of attributes at each node to determine the split
During classification, each tree votes and the most popular class is returned
Two methods to construct a random forest:
Forest-RI (random input selection): randomly select, at each node, F attributes as candidates for the split at that node; the CART methodology is used to grow the trees to maximum size
Forest-RC (random linear combinations): creates new attributes (or features) that are linear combinations of the existing attributes (reduces the correlation between individual classifiers)
Comparable in accuracy to AdaBoost, but more robust to errors and outliers
Insensitive to the number of attributes selected for consideration at each split, and faster than bagging or boosting (see the sketch below)
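A minimal Forest-RI-style sketch using scikit-learn, where max_features plays the role of F, the number of attributes considered at each split; the dataset, F, and the forest size are illustrative assumptions.

```python
# Minimal random forest sketch (Forest-RI style): each tree considers a
# random subset of F attributes at every split. Values are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

F = 4  # number of attributes considered as split candidates at each node
forest = RandomForestClassifier(n_estimators=100, max_features=F, random_state=3)
forest.fit(X_train, y_train)

# Each tree votes; the most popular class is returned
print("forest accuracy:", forest.score(X_test, y_test))
```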
CLASSIFICATION OF CLASS-IMBALANCED DATA SETS
Class-imbalance problem: rare positive examples but numerous negative ones, e.g., medical diagnosis, fraud detection, oil spills, faults, etc.
Traditional methods assume a balanced class distribution and equal error costs, so they are not suitable for class-imbalanced data
Typical methods for imbalanced data in two-class classification (see the sketch below):
Oversampling: re-sample data from the positive class
Undersampling: randomly eliminate tuples from the negative class
Threshold-moving: move the decision threshold, t, so that rare-class tuples are easier to classify, leaving less chance of costly false negative errors
Ensemble techniques: combine multiple classifiers, as introduced above
The class-imbalance problem remains difficult for multiclass tasks
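A minimal sketch of two of these methods, oversampling the rare class and threshold-moving, on a synthetic imbalanced dataset; the class ratio, the threshold value t, and the base classifier are illustrative assumptions.

```python
# Minimal sketch: oversampling the rare (positive) class and threshold-moving.
# Class ratio, threshold t, and base classifier are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 5% positive (rare) class, 95% negative class
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=4)

# Oversampling: duplicate positive-class tuples until classes are balanced
pos = np.where(y_train == 1)[0]
neg = np.where(y_train == 0)[0]
rng = np.random.default_rng(4)
pos_over = rng.choice(pos, size=len(neg), replace=True)
X_bal = np.vstack([X_train[neg], X_train[pos_over]])
y_bal = np.concatenate([y_train[neg], y_train[pos_over]])

clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)

# Threshold-moving: lower the decision threshold t so the rare class
# is predicted more readily than with the default t = 0.5
t = 0.3
y_pred = (clf.predict_proba(X_test)[:, 1] >= t).astype(int)
print("predicted positives:", y_pred.sum(), "of", len(y_test))
```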