ENSEMBLE LEARNING:
Ensemble methods in machine learning combine the insights obtained from multiple
learning models to produce more accurate and reliable decisions. These methods follow the
same principle as the air-conditioner purchase example cited above.
Ensemble learning improves machine learning results by combining several models, which
generally yields better predictive performance than any single model alone. The basic idea
is to learn a set of classifiers (experts) and to allow them to vote.
Advantage: Improvement in predictive accuracy.
Disadvantage: An ensemble of classifiers is more difficult to understand than a single classifier.
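As a minimal sketch of this voting idea (assuming scikit-learn is available; the dataset and the three base classifiers are chosen only for illustration), several different models can be combined with a hard majority vote:

# A minimal voting-ensemble sketch: three different classifiers, hard majority vote.
# Assumes scikit-learn is installed; dataset and model choices are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=5000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("nb", GaussianNB()),
    ],
    voting="hard",  # each classifier casts one vote; the majority class wins
)
ensemble.fit(X_train, y_train)
print("Ensemble accuracy:", ensemble.score(X_test, y_test))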
Why do ensembles work?
Dietterich (2002) showed that ensembles overcome three problems –
Statistical Problem –
The Statistical Problem arises when the hypothesis space is too large for the
amount of available data. Many hypotheses then have the same accuracy on the
training data, and the learning algorithm chooses only one of them; there is a
risk that the accuracy of the chosen hypothesis is low on unseen data.
Computational Problem –
The Computational Problem arises when the learning algorithm cannot guarantee
finding the best hypothesis.
Representational Problem –
The Representational Problem arises when the hypothesis space does not contain
any good approximation of the target class(es).
Main Challenge for Developing Ensemble Models
The main challenge is not to obtain highly accurate base models, but rather to obtain base
models that make different kinds of errors. For example, if ensembles are used for
classification, high accuracy can be achieved when the different base models misclassify
different training examples, even if the accuracy of each base classifier is low.
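As a toy illustration of this point (the numbers below are invented purely to show the arithmetic, and NumPy is assumed to be available), three classifiers that are each only 60% accurate, but that err on different examples, can reach 80% accuracy by majority vote:

import numpy as np

# Toy example with invented numbers: three weak classifiers, each 60% accurate,
# that make their mistakes on different examples.
y_true = np.ones(10, dtype=int)   # the true label of all 10 examples is 1
pred_a = y_true.copy()
pred_a[[0, 1, 2, 3]] = 0          # classifier A is wrong on examples 0-3
pred_b = y_true.copy()
pred_b[[3, 4, 5, 6]] = 0          # classifier B is wrong on examples 3-6
pred_c = y_true.copy()
pred_c[[6, 7, 8, 9]] = 0          # classifier C is wrong on examples 6-9

for name, p in [("A", pred_a), ("B", pred_b), ("C", pred_c)]:
    print(f"classifier {name} accuracy: {(p == y_true).mean():.0%}")   # 60% each

# Majority vote: predict 1 when at least 2 of the 3 classifiers predict 1.
votes = pred_a + pred_b + pred_c
majority = (votes >= 2).astype(int)
print(f"majority-vote accuracy: {(majority == y_true).mean():.0%}")    # 80%

The ensemble is wrong only on the two examples where at least two base classifiers err at the same time, which is exactly why diverse errors matter more than individually high accuracy.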
Methods for Independently Constructing Ensembles –
Majority Vote
Bagging and Random Forest
Randomness Injection
Feature-Selection Ensembles
Error-Correcting Output Coding
Methods for Coordinated Construction of Ensembles –
Boosting
Stacking
Reliable Classification: Meta-Classifier Approach
Co-Training and Self-Training
Types of Ensemble Classifier –
Bagging:
Bagging (Bootstrap Aggregation) is an ensemble meta-algorithm designed to improve the
stability and accuracy of machine learning algorithms used in statistical classification and
regression. It reduces variance, helps to avoid overfitting, is usually applied to decision
tree methods, and can be seen as a special case of the model averaging approach. Given a set
D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement
from D (i.e., a bootstrap sample, so the same tuple may appear more than once). A classifier
model Mi is then learned from each training set Di. Each classifier Mi returns its class
prediction, and the bagged classifier M* counts the votes and assigns the class with the most
votes to X (an unknown sample).
Implementation steps of Bagging –
1. Multiple subsets, each with the same number of tuples as the original data set, are
created by selecting observations with replacement.
2. A base model is created on each of these subsets.
3. Each model is learned in parallel from its training set, independently of the others.
4. The final predictions are determined by combining the predictions from all the
models.
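The following sketch follows these four steps with decision-tree base models (scikit-learn and NumPy are assumed to be available; the dataset, the number of models, and the other parameter choices are illustrative):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # Step 1: a bootstrap sample of the same size as the training set (with replacement).
    idx = rng.integers(0, len(X_train), size=len(X_train))
    # Steps 2-3: fit an independent base model on this subset.
    tree = DecisionTreeClassifier()
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Step 4: combine the predictions of all models by majority vote (labels are 0/1 here).
all_preds = np.array([m.predict(X_test) for m in models])
bagged = (all_preds.mean(axis=0) >= 0.5).astype(int)
print("Bagged accuracy:", (bagged == y_test).mean())

scikit-learn also packages the same bootstrap-train-vote procedure behind a single estimator, BaggingClassifier.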
An illustration for the concept of bootstrap aggregating (Bagging)
Example of Bagging
The Random Forest model uses bagging with decision tree base models, which individually
have high variance. It selects features at random to grow each tree, and several such random
trees together make a Random Forest.
Random Forest:
Random Forest is an extension of bagging. Each classifier in the ensemble is a
decision tree classifier generated using a random selection of attributes at each
node to determine the split. During classification, each tree votes and the most popular
class is returned.
Implementation steps of Random Forest –
1. Multiple subsets are created from the original data set, selecting
observations with replacement.
2. A subset of features is selected randomly and whichever feature gives the
best split is used to split the node iteratively.
3. The tree is grown to its largest extent.
4. Repeat the above steps; the final prediction is based on aggregating the
predictions from n trees.
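A brief sketch of these steps using scikit-learn's RandomForestClassifier (assumed to be available; the dataset and parameter values are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,     # number of bootstrapped trees (steps 1 and 4)
    max_features="sqrt",  # random subset of features considered at each split (step 2)
    random_state=0,
)
forest.fit(X_train, y_train)
# The forest aggregates the individual trees' predictions into the final class.
print("Random Forest accuracy:", forest.score(X_test, y_test))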
Boosting
Boosting is an ensemble modeling technique that attempts to build a strong classifier from
a number of weak classifiers. It does so by building weak models in series. First, a model
is built from the training data. Then a second model is built that tries to correct the errors
of the first model. This procedure continues, and models are added until either the complete
training data set is predicted correctly or the maximum number of models has been added.
Boosting Algorithms
There are several boosting algorithms. The original ones, proposed by Robert
Schapire and Yoav Freund, were not adaptive and could not take full advantage of the weak
learners. Schapire and Freund then developed AdaBoost, an adaptive boosting algorithm that
won the prestigious Gödel Prize. AdaBoost was the first really successful boosting algorithm
developed for the purpose of binary classification. AdaBoost is short for Adaptive Boosting
and is a very popular boosting technique that combines multiple “weak classifiers” into a
single “strong classifier”.
Algorithm:
1. Initialise the dataset and assign equal weight to each data point.
2. Provide this as input to the model and identify the wrongly classified data points.
3. Increase the weights of the wrongly classified data points, decrease the weights
of the correctly classified data points, and then normalize the weights of all data
points.
4. if (got required results)
Goto step 5
else
Goto step 2
5. End
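A short sketch of this procedure using scikit-learn's AdaBoostClassifier (assumed to be available; the dataset and parameters are illustrative, and the library's default weak learner is a one-split decision tree, i.e. a decision stump):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Weak models are added in sequence; each is trained on data reweighted toward the
# examples the previous models got wrong (steps 2-3 above).
booster = AdaBoostClassifier(
    n_estimators=50,   # maximum number of weak models to add (step 4's stopping condition)
    random_state=0,
)
booster.fit(X_train, y_train)
print("AdaBoost accuracy:", booster.score(X_test, y_test))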
An illustration presenting the intuition behind the boosting algorithm.