MCA22
Artificial Intelligence and Machine Learning
AIML Syllabus Modules
Sr. No.  Module  Hrs
1  Module 1: Introduction to AI Concepts  6
2  Module 2: Search Strategies  6
3  Module 3: Basics of ML & Supervised Learning  6
4  Module 4: Advanced Techniques in Supervised Learning  6
5  Module 5: Unsupervised Learning and Dimensionality Reduction  7
6  Module 6: Ensemble Learning Methods and Reinforcement Learning  9
Unit 6: Ensemble Learning Methods & Reinforcement Learning (09 Hours)
Sr. No.  Topics
1  Ensemble learning: mixture models; classifier using multiple samples of the data set; improving the classifier by focusing on errors
2  Weak learner with a decision stump
3  Bagging, Stacking, Boosting
4  Implementing the AdaBoost algorithm
5  Classifying with AdaBoost; bootstrapping and cross-validation
6  Reinforcement Learning (RL)
7  Elements of Reinforcement Learning
8  Reinforcement Learning vs. Supervised Learning
9  Approaches to solving Reinforcement Learning: value-based, policy-based, model-based; MDP
Self-Learning: Monte Carlo methods
Module 6: Ensemble Learning Methods & Reinforcement Learning
Different Classifiers (1)
■ Different Classifiers
■ Conduct classification over the same set of class labels
■ May use different inputs or have different parameters
■ May produce different outputs for a given example
■ Learning Different Classifiers
■ Use different training examples
■ Use different features
Different Classifiers (2)
■ Performance
■ Each of the classifiers is not perfect
■ Complementary
■ Examples that are not correctly classified by one classifier may be correctly classified by the other classifiers
■ Potential Improvements?
■ Utilize the complementary property
Ensembles of Classifiers
■ Idea
■ Combine the classifiers to improve performance
■ Ensembles of Classifiers
■ Combine the classification results from different classifiers to produce the final output
■ Unweighted voting
■ Weighted voting
Example: Weather Forecast
(Figure: five forecasters, each making wrong predictions (X) on different days when compared against reality. No single forecaster is always right, but combining their predictions by voting corrects the individual mistakes.)
Ensemble Learning
■ Ensemble Learning
■ A relatively recent field in machine learning
■ Achieves state-of-the-art performance
■ Central Issues in Ensemble Learning
■ How to create classifiers with complementary performance
■ How to conduct the voting
Strong and Weak Learners
■ Strong Learner
■ Takes labeled data for training
■ Produces a classifier which can be arbitrarily accurate
■ This is the objective of machine learning
■ Weak Learner
■ Takes labeled data for training
■ Produces a classifier which is more accurate than random guessing
Ensemble
• We discussed many different learning algorithms previously.
• Though these are generally successful, no single algorithm is always the most accurate.
• Now we discuss models composed of multiple learners that complement each other, so that by combining them we attain higher accuracy.
Ensemble models in machine learning operate on a similar idea: they combine the decisions from multiple models to improve the overall performance.
Ensemble Learning
• Ensemble learning is the technique of using multiple learning algorithms to train models on the same dataset and combining their predictions. After getting the prediction from each model, we use model-combination techniques such as weighted averaging or max voting to obtain the final prediction. This method aims to obtain better predictions than any individual model: better accuracy, less overfitting, and reduced bias and variance. Two popular ensemble methods are:
1. Bagging (Bootstrap Aggregating)
2. Boosting
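A minimal sketch of the max-voting combination mentioned above, assuming scikit-learn and its bundled Iris dataset (the three base-model choices are illustrative, not prescribed by the slides):

```python
# Hard (max) voting over three different classifiers, assuming scikit-learn
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3)),
        ("nb", GaussianNB()),
        ("logreg", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # max voting: the majority class wins
)
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```

With voting="hard" each model casts one vote and the majority class wins; voting="soft" would average the predicted class probabilities instead, a form of model averaging.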
Boosting
■ Learners
■ Strong learners are very difficult to construct
■ Constructing weak learners is relatively easy
■ Strategy
■ Derive a strong learner from weak learners
■ Boost weak classifiers into a strong learner
Construct Weak Classifiers
■ Using Different Data Distributions
■ Start with uniform weighting
■ During each step of learning:
■ Increase the weights of examples that are not correctly learned by the weak learner
■ Decrease the weights of examples that are correctly learned by the weak learner
■ Idea
■ Focus on the difficult examples that were not correctly classified in the previous steps
Combine Weak Classifiers
■ Weighted Voting
■ Construct a strong classifier by weighted voting of the weak classifiers
■ Idea
■ A better weak classifier gets a larger weight
■ Iteratively add weak classifiers
■ Increase the accuracy of the combined classifier through minimization of a cost function
• Recently, with computation and memory getting cheaper, such systems composed of multiple learners have become popular.
• There are basically two questions here:
1. How do we generate base-learners that complement each other?
2. How do we combine the outputs of base-learners for maximum accuracy?
• Model combination is not a trick that always increases accuracy.
• Model combination does always increase the time and space complexity of training and testing, and unless base-learners are trained carefully and their decisions combined smartly, we will only pay for this extra complexity without any significant gain in accuracy.
• Ensemble methods are a machine learning technique that combines several base models in order to produce one optimal predictive model.
Ensemble Methods
• Each learning algorithm dictates a certain model that comes with a set of assumptions.
• This inductive bias leads to error if the assumptions do not hold for the data.
• Learning is an ill-posed problem: with finite data, each algorithm converges to a different solution and fails under different circumstances.
• The performance of a learner may be fine-tuned to get the highest possible accuracy on a validation set, but this fine-tuning is a complex task, and still there are instances on which even the best learner is not accurate enough.
• The idea is that there may be another learner that is accurate on these.
• By suitably combining multiple base-learners, then, accuracy can be improved.
Random forest is an ensemble of decision trees.
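As a quick illustration (a sketch assuming scikit-learn; the dataset and parameters are arbitrary), a random forest builds and votes over many such trees:

```python
# Random forest: an ensemble of decision trees, each trained on a bootstrap
# sample with a random subset of features considered per split (scikit-learn)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("forest accuracy:", forest.score(X_te, y_te))
```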
Why use ensemble models?
Maximizing individual accuracies and the diversity between learners.
There are different ways to achieve this.
• Different Algorithms
– We can use different learning algorithms to train different base-learners.
– Different algorithms make different assumptions about the data and lead to different classifiers. For example, one base-learner may be parametric and another may be nonparametric.
– When we decide on a single algorithm, we give emphasis to a single method and ignore all others.
– By combining multiple learners based on multiple algorithms, we free ourselves from taking that decision and no longer put all our eggs in one basket.
(Ethem, p. 462)
Maximizing individual accuracies and the diversity between learners.
• Different Hyperparameters
– We can use the same learning algorithm but with different hyperparameters.
– Examples are the number of hidden units in a multilayer perceptron, k in k-nearest neighbors, the error threshold in decision trees, the kernel function in support vector machines, etc.
– When we train multiple base-learners with different hyperparameter values, we average over this factor and reduce variance, and therefore error.
(Ethem, p. 462)
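A small sketch of this hyperparameter-diversity idea, assuming scikit-learn; the choice of k-nearest neighbors and the particular k values are illustrative:

```python
# Same algorithm, different hyperparameters: averaging kNN models with
# different k values averages over that factor and reduces its variance
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

probas = []
for k in (1, 5, 15):  # three base-learners differing only in k
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    probas.append(model.predict_proba(X_te))

avg = np.mean(probas, axis=0)   # average the class probabilities over k
y_pred = avg.argmax(axis=1)     # final class = highest mean probability
print("accuracy:", (y_pred == y_te).mean())
```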
Different Input Representations
• Separate base-learners may use different representations of the same input object or event, making it possible to integrate different types of sensors/measurements/modalities.
• Different representations make different characteristics explicit, allowing better identification.
• In many applications there are multiple sources of information, and it is desirable to use all of these data to extract more information and achieve higher accuracy in prediction.
• For example, in speech recognition, to recognize the uttered words we can use the video image of the speaker's lips as the words are spoken, in addition to the acoustic input. This is similar to sensor fusion, where the data from different sensors are integrated to extract more information for a specific application.
• The approach we take is to make separate predictions based on the different sources using separate base-learners, then combine their predictions.
Different Training Sets
• Another possibility is to train different base-learners on different subsets of the training set.
• This can be done randomly by drawing random training sets from the given sample; this is called bagging.
• Or the learners can be trained serially, so that instances on which the preceding base-learners are not accurate are given more emphasis in training later base-learners; examples are boosting and cascading, which actively try to generate complementary learners instead of leaving this to chance.
Diversity vs. Accuracy
• One important note is that when we generate multiple base-learners, we want them to be reasonably accurate but do not require them to be very accurate individually; so they are not, and need not be, optimized separately for best accuracy.
• The base-learners are not chosen for their accuracy, but for their simplicity.
• We do require, however, that the base-learners be diverse, that is, accurate on different instances, specializing in subdomains of the problem.
• What we care about is the final accuracy when the base-learners are combined, rather than the accuracies of the base-learners we started from. Say we have a classifier that is 80 percent accurate. When we decide on a second classifier, we do not care about its overall accuracy; we care only about how accurate it is on the 20 percent that the first classifier misclassifies, as long as we know when to use which one.
Diversity vs. Accuracy (contd.)
• This implies that the required accuracy and diversity of the learners also depend on how their decisions are to be combined. If, as in a voting scheme, a learner is consulted for all inputs, it should be accurate everywhere and diversity should be enforced everywhere.
AdaBoost
• Freund and Schapire (1996) proposed a variant named AdaBoost, short for adaptive boosting, which uses the same training set over and over (so the set need not be large), but the classifiers should be simple so that they do not overfit.
• AdaBoost can combine an arbitrary number of base-learners.
• Many variants of AdaBoost have been proposed; here we discuss the original algorithm.
AdaBoost
Weak learner with a decision stump
• In the AdaBoost algorithm, instead of a multi-level tree, we create a stump.
• A decision stump is a machine learning model consisting of a one-level decision tree.
• That is, it is a decision tree with one internal node (the root) which is immediately connected to the terminal nodes (its leaves).
• A decision stump makes a prediction based on the value of just a single input feature.
• The stump for the first iteration is selected based on entropy (the minimum-entropy tree is selected).
https://www.youtube.com/watch?v=LsK-xG1cLYA
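A decision stump can be built as a depth-1 decision tree; here is a sketch assuming scikit-learn (the breast-cancer dataset is just a convenient built-in example):

```python
# A decision stump: a one-level decision tree that splits on a single
# feature, with the split chosen by minimum entropy (scikit-learn)
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
stump = DecisionTreeClassifier(max_depth=1, criterion="entropy").fit(X, y)

print("split feature index:", stump.tree_.feature[0])
print("split threshold:", stump.tree_.threshold[0])
print("training accuracy:", stump.score(X, y))
```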
AdaBoost / Meta-algorithm
1. Classifier using multiple samples of the data set
If you were going to make an important decision, you'd probably get the advice of multiple experts instead of trusting one person. This is the idea behind a meta-algorithm. Meta-algorithms are a way of combining other algorithms; AdaBoost is one such meta-algorithm.
Till now, we have seen different classifiers: decision tree, naïve Bayes, linear regression, logistic regression, and SVM. One idea that naturally arises is combining multiple classifiers; methods that do this are known as ensemble methods or meta-algorithms. Instead of using a single model, we can use a group of models and, based on the prediction of each model (Pred 1, Pred 2, Pred 3), make the final prediction.
AdaBoost / Meta-algorithm: Mode or Vote
How do we combine models?
• If all the models predict the same thing (Pred 1 = Pred 2 = Pred 3), there is no issue.
• Otherwise, we use the max-voting technique: according to max voting, the majority class is the final prediction.
• We never claim that combining model predictions gives 100% correct predictions, but it will be better than a single model.
• This voting technique is helpful when the target values are discrete.
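A bare-bones sketch of max voting, assuming NumPy and SciPy ≥ 1.9 (the prediction matrix is made up for illustration):

```python
# Max voting: the per-example mode (majority class) across models
import numpy as np
from scipy import stats

# Hypothetical predictions of three classifiers on five test examples
preds = np.array([
    [1, 0, 1, 1, 0],  # model 1
    [1, 1, 1, 0, 0],  # model 2
    [0, 0, 1, 1, 1],  # model 3
])

final, _ = stats.mode(preds, axis=0, keepdims=False)
print(final)  # -> [1 0 1 1 0]
```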
AdaBoost / Meta-algorithm
For a regression problem with a continuous target, averaging is used instead of voting.
Two techniques are used in ensemble methods:
1. Bagging
2. Boosting
AdaBoost / Meta-algorithm
1. Bagging (Bootstrap Aggregation): row sampling with replacement.
Let n be the number of instances in the training data, k the number of bags, and n', n'', ... the number of instances in each bag (each n', n'', ... < n). From the training data we draw k bootstrap samples (bags) by sampling rows with replacement and train one model per bag (Model 1, Model 2, ..., Model k). At testing time, every model predicts on the test data and the predictions are aggregated; e.g., votes of 1, 1, 0, 1 aggregate to a final prediction of 1.
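A sketch of bagging with scikit-learn's BaggingClassifier (assuming that library; the dataset and parameter values are illustrative):

```python
# Bagging: bootstrap row sampling with replacement + aggregation by vote
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k = 10 bags; each bag holds 80% of the rows (n' < n), drawn with
# replacement. The default base learner is a decision tree.
bagger = BaggingClassifier(n_estimators=10, max_samples=0.8,
                           bootstrap=True, random_state=0).fit(X_tr, y_tr)
print("bagging accuracy:", bagger.score(X_te, y_te))
```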
AdaBoost / Meta-algorithm
2. Boosting
Boosting is a technique similar to bagging, but it focuses on the areas in which the system is not performing well. Starting from a random sample of the training data (n instances), models are trained one after another (Model 1, Model 2, ..., Model k), each subsequent model concentrating on the examples the earlier models handled poorly, and the ensemble is then evaluated on the testing data. A well-known algorithm based on this technique is AdaBoost (Adaptive Boosting).
AdaBoost / Meta-algorithm
2. Improving the classifier by focusing on errors
3. Weak learner with a decision stump
The weak classifier (weak learner) used here is a decision stump, and a weight is applied to every example in the training data.

Bagging vs. Boosting:
• Bagging: random samples F1, F2, F3 are drawn from the training data by resampling under a uniform distribution; Model 1, Model 2, Model 3 are trained in parallel and their outputs are combined into F.
• Boosting: samples are reweighted sequentially under a non-uniform distribution (weighted samples), so each model is trained on the examples emphasized by the previous model's errors, and the weighted outputs F1, F2, F3 are combined into F.
AdaBoost / Meta-algorithm
Step 1: Assign initial weights. With the number of observations n = 7, each example gets an initial weight of 1/n = 1/7 ≈ 0.14.
AdaBoost / Meta-algorithm
Step 2: The model selects the stump based on the purest split; the candidate root features here are Age (> 40Y), Sex, and Cholesterol, with leaf predictions 0/1.
Step 3: The selected stump makes 5 right classifications and 2 misclassifications, so
total error = misclassifications / total number of observations = 2/7 ≈ 0.28.
(SAMME: Stagewise Additive Modeling using a Multi-class Exponential loss function.)
AdaBoost / Meta-algorithm: 5. Classifying with AdaBoost
Performance of the stump (α) is computed from the total error: α = 0.5 × ln((1 − total error) / total error) = 0.5 × ln((5/7) / (2/7)) = 0.5 × ln(2.5) ≈ 0.45.
Step 4: Updating weights
New weight = old weight × e^(±α), with + for misclassification and − for right classification:
• Misclassified example: new weight = 0.14 × e^(+0.45) = 0.14 × 1.56 ≈ 0.21
• Correctly classified example: new weight = 0.14 × e^(−0.45) = 0.14 × 0.63 ≈ 0.08
Data whose new weights are larger have a higher probability of being selected for the next round; data whose weights we decrease are already well classified.
AdaBoost / Meta-algorithm: Model 1
Before the update the weights totaled 1; after the update the new weights total 0.82, so each new weight is divided by 0.82 (normalized) so that the weights again sum to 1.
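The whole round can be traced in a few lines of NumPy; this sketch reproduces the worked example's numbers up to rounding (which 2 of the 7 examples are misclassified is a hypothetical choice):

```python
# One AdaBoost weight-update round on 7 examples, in plain NumPy
import numpy as np

n = 7
w = np.full(n, 1 / n)                    # step 1: initial weights, each 1/7 ~ 0.14
miss = np.array([0, 0, 1, 0, 0, 1, 0])   # hypothetical: the stump misses 2 of 7

error = w[miss == 1].sum()                 # step 3: total error = 2/7 ~ 0.28
alpha = 0.5 * np.log((1 - error) / error)  # stump performance ~ 0.45

# step 4: new weight = old weight * e^(+alpha) for misses, e^(-alpha) for hits
w = w * np.exp(np.where(miss == 1, alpha, -alpha))
print(round(w.sum(), 2))   # the new total is below 1, so normalize
w = w / w.sum()            # weights again sum to 1
print(w.round(2))          # misses up-weighted, hits down-weighted
```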
AdaBoost / Meta-algorithm: 5. Classifying with AdaBoost
The weight of each example is decreased when it is classified correctly and increased when it is misclassified. We repeat all these steps until the chosen number of estimators (models) is reached or the desired accuracy is achieved: starting from the original dataset, each weak learner (Model 1, Model 2, ..., Model N) is trained on an updated weighted sample, and the weak learners' individual outputs (e.g., 1, 0, 1, 0, 1) are combined by weighted vote into a single strong learner.
AdaBoost / Meta-algorithm: 5. Classifying with AdaBoost
(Figures: schematic representation of AdaBoost and worked-example slides.)
Reference: https://www.youtube.com/watch?v=LsK-xG1cLYA
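For completeness, a sketch of AdaBoost via scikit-learn (assuming that library; the dataset and estimator count are illustrative), whose default weak learner is exactly the depth-1 stump discussed above:

```python
# AdaBoost with decision stumps as weak learners (scikit-learn)
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The default base learner is a depth-1 decision tree (a stump)
boost = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("AdaBoost accuracy:", boost.score(X_te, y_te))
```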
Motivation for Stacking
• For each of our ensemble methods so far (besides neural networks), we have:
• fit base models of the same type (regression trees, for example), and
• combined the predictions in a naïve way.
• Stacking is a way to generalize the ensembling approach to combine the outputs of various types of models, and it improves on the combination step as well.
Define Stacked Generalization
• Stacked generalization is a technique proposed by Wolpert (1992) that extends voting, in that the way the outputs of the base-learners are combined need not be linear but is learned through a combiner system.
(Ethem, p. 435)
How does stacking work?
• We split the training data into K folds, just as in K-fold cross-validation.
• A base model is fitted on the K−1 parts and predictions are made for the Kth part.
• We do this for each part of the training data.
• The base model is then fitted on the whole training set to calculate its performance on the test set.
• We repeat the last three steps for the other base models.
• Predictions from the training set are used as features for the second-level model.
• The second-level model is used to make a prediction on the test set.
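A sketch of this procedure using scikit-learn's StackingClassifier, which performs the K-fold out-of-fold predictions internally (the base models and meta-learner here are illustrative choices):

```python
# Stacking: a learned combiner (meta-learner) trained on out-of-fold
# predictions of heterogeneous base models (scikit-learn)
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),  # learned combiner
    cv=5,  # K-fold out-of-fold predictions, as described above
).fit(X_tr, y_tr)
print("stacking accuracy:", stack.score(X_te, y_te))
```

The final_estimator plays the role of the combiner system from Wolpert's definition above.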
Sample Questions
• Explain in brief the AdaBoost algorithm.
• Explain the concept of mixture models.
• Explain stacking and how it differs from AdaBoost.
Reinforcement Learning
■ Supervised and unsupervised learning are learning from data
■ RL is not unsupervised learning
■ How did you learn how to cycle?
■ Did you watch a lot of videos or read books (data) to learn?
■ RL is trial-and-error learning: act, get feedback, and learn
■ Inspired by behavioral theory
■ For example, Pavlov's dog experiment:
■ The dog starts salivating on hearing a bell
■ Food is the reward; the bell is the input signal
■ This is behavioral conditioning
There may be a temporal disconnect between the action and the reward/punishment.
The agent learns associations between inputs and actions; these associations give rise to a policy. A policy is not just a sequence of actions.
You do not always get the same responses.
Reinforcement Learning
• Reinforcement learning addresses the question of how an autonomous agent that senses and acts in its environment can learn to choose optimal actions to achieve its goals.
• Each time the agent performs an action in its environment, a trainer may provide a reward or penalty to indicate the desirability of the resulting state.
• For example, when training an agent to play a game, the trainer might provide a positive reward when the game is won, a negative reward when it is lost, and zero reward in all other states.
• The task of the agent is to learn from this indirect, delayed reward to choose sequences of actions that produce the greatest cumulative reward.
Building a Learning Robot
• The robot, or agent, has a set of sensors to observe the state of its environment, and a set of actions it can perform to alter this state.
• For example, a mobile robot may have sensors such as a camera and sonars, and actions such as "move forward" and "turn".
• Its task is to learn a control strategy, or policy, for choosing actions that achieve its goals.
• For example, the robot may have the goal of docking onto its battery charger whenever its battery level is low.
• Agents can learn successful control policies by experimenting in their environment.
• The goals of the agent can be defined by a reward function that assigns a numerical value (an immediate payoff) to each distinct action the agent may take from each distinct state.
• E.g., the goal of docking to the battery charger can be captured by assigning a positive reward (e.g., +100) to state-action transitions that immediately result in a connection to the charger, and a reward of zero to every other state-action transition.
• This reward function may be built into the robot, or known only to an external teacher who provides the reward value for each action performed by the robot.
• The control policy we desire is one that, from any initial state, chooses actions that maximize the reward accumulated over time by the agent.
The reinforcement learning problem differs from other function approximation tasks in several important respects:
• Delayed reward: the trainer provides only a sequence of immediate reward values as the agent executes its sequence of actions. The agent therefore faces the problem of temporal credit assignment: determining which of the actions in its sequence are to be credited with producing the eventual rewards.
• Exploration: the learner faces a tradeoff in choosing whether to favor exploration of unknown states and actions (to gather new information) or exploitation of states and actions that it has already learned will yield high reward (to maximize its cumulative reward).
• Partially observable states: in many practical situations, sensors provide only partial information.
• Life-long learning: robot learning often requires that the robot learn several related tasks within the same environment, using the same sensors. This setting raises the possibility of using previously obtained experience or knowledge to reduce sample complexity when learning new tasks.
Reinforcement Learning – Policy Learning
◻ Policies: what actions should an agent take in a particular situation
◻ Utility estimation: how good is a state (used by the policy)
■ No supervised output, but delayed reward
■ Credit assignment problem (what was responsible for the outcome)
■ Applications (a value-based sketch follows below):
◻ Game playing
◻ Robot in a maze
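As a minimal illustration of the value-based approach named in the syllabus, here is a sketch of tabular Q-learning on a toy corridor world; every detail (states, reward, parameters) is invented for illustration and not taken from the slides:

```python
# Tabular Q-learning on a 1-D corridor: states 0..4, reward only at state 4.
# All states, rewards, and parameters here are hypothetical illustrations.
import random

N_STATES = 5          # reaching state 4 ends the episode with a reward
ACTIONS = [-1, +1]    # move left / move right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate

for episode in range(500):
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: explore with probability eps, otherwise exploit Q
        if random.random() < eps:
            a = random.randrange(2)
        else:
            a = max((0, 1), key=lambda i: Q[s][i])
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0  # delayed reward at the goal only
        # Q-learning update: bootstrap from the best action in the next state
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])  # state values increase toward the goal
```

The printed values grow toward the goal state: the delayed reward at the end has been propagated backward through the updates, which is exactly the temporal credit assignment discussed above.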
Example applications:
• Stanford helicopter experiment (video): https://www.youtube.com/watch?v=M-QUkgk3HyE
• Backgammon
• Robot Soccer Challenge: a football game in which remote-controlled robots stand against each other
• E-learning
• Go
Tesauro (1995) describes the TD-GAMMON program, which
• has used reinforcement learning to become a world-class backgammon player.
• After training on 1.5 million self-generated games, this program is now considered nearly equal to the best human players in the world and has played competitively against top-ranked players in international backgammon tournaments.
End of Unit 6