
Unit VI

Supervised Learning
Generative Models
• A type of ML model that can create new data instances by learning patterns and
distributions from existing data.
• Once these patterns are learned, the model can then generate new data that shares
similar characteristics with the original dataset.

• Imagine you're teaching a child to draw animals. After showing them several pictures of different animals, the child begins to understand the general features of each animal.
• Given some time, the child might draw an animal they've never seen before, combining features they've learned.
• This is analogous to how a generative model operates: it learns from the data it's exposed to and then creates something new based on that knowledge.

How generative modeling works
• Generative models generally run on neural networks.
• To create a generative model, a large data set is typically required.
• The model is trained by feeding it various examples from the data set and
adjusting its parameters to better match the distribution of the data.
• Once the model is trained, it can be used to generate new data by
sampling from the learned distribution.
• The generated data can be similar to the original data set, but with some
variations or noise.
• For example, a data set containing images of horses could be used to build
a model that can generate a new image of a horse that has never existed
but still looks almost realistic.
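As a minimal sketch of "learn the distribution, then sample from it" — the model and library here (a Gaussian mixture fitted with scikit-learn on toy 2-D points) are assumptions for illustration, since the slides do not name a specific generative model:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy "existing data": two clusters of 2-D points standing in for a real dataset.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(500, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(500, 2)),
])

# Training: adjust the model's parameters to match the distribution of the data.
model = GaussianMixture(n_components=2, random_state=0).fit(data)

# Generation: sample new points from the learned distribution; they resemble
# the originals (with some variation) but are not copies of any training example.
new_points, _ = model.sample(10)
print(new_points)
```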
Generative modeling vs.
Discriminative modeling
• Discriminative model: Makes predictions on unseen data based on conditional probability and can be used for either classification or regression problem statements.
• Generative model: Enables computers to use existing content like text, audio and video files, images, and even code to create new possible content.
• The main idea is to generate completely original artifacts that look like the real thing.
Types of Generative models
• 1. Bayesian networks. These are graphical models that represent
the probabilistic relationships among a set of variables. They're
particularly useful in scenarios where understanding causal
relationships is crucial. For example, in medical diagnosis, a
Bayesian network might help determine the likelihood of a disease
given a set of symptoms.
• 2. Generative Adversarial Networks (GANs). GANs consist of
two neural networks, the generator and the discriminator, that are
trained together. The generator tries to produce data, while the
discriminator attempts to distinguish between real and generated
data. Over time, the generator becomes so good that the
discriminator can't tell the difference. GANs are popular in image
generation tasks, such as creating realistic human faces or artworks.
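A rough, hedged sketch of the GAN training loop described above (PyTorch is an assumed choice here, and one-dimensional Gaussian samples stand in for real images): the generator maps random noise to data, while the discriminator learns to tell real from generated samples.

```python
import torch
from torch import nn

# "Real" data: samples from N(4, 1.25) stand in for real images.
def real_batch(n):
    return torch.randn(n, 1) * 1.25 + 4.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    # Train the discriminator: real samples -> 1, generated samples -> 0.
    real = real_batch(64)
    fake = generator(torch.randn(64, 8)).detach()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake), torch.zeros(64, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Train the generator: try to fool the discriminator into predicting 1.
    fake = generator(torch.randn(64, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# After training, the generator maps noise to samples resembling the real data.
print(generator(torch.randn(5, 8)).detach().squeeze())
```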
Applications of Generative ML
models
• 1. Natural Language Generation (NLG): Models like GPT-3 can produce human-like written text when prompted, leading to possible applications in chatbots, content generation or language translation.
• 2. Anomaly Detection: Generative models can be trained to learn the normal distribution of the data and flag any observations that deviate from this distribution in a significant way.
Naïve Bayes Classifier
• Naïve Bayes algorithm is a supervised learning algorithm, which is based
on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification that includes a high-dimensional
training dataset.
• Naïve Bayes Classifier is one of the simplest and most effective
Classification algorithms that help build fast machine learning models that
can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object belonging to a class.
• Some popular examples of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Why is it called Naïve Bayes?
• The Naïve Bayes algorithm is made up of two words, Naïve and Bayes, which can be described as:
• Naïve: It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence, each feature individually contributes to identifying that it is an apple without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of
Bayes' Theorem.
Bayes' Theorem:

• Bayes' theorem is also known as Bayes' Rule or Bayes' law. It is used to determine the probability of a hypothesis with prior knowledge, and it depends on conditional probability.
• The formula for Bayes' theorem is given as:
• P(h|D) = P(D|h) P(h) / P(D)
• P(h) = the probability of hypothesis h being true, called the prior probability of h.
• P(D) = the probability of the data D, called the prior probability of the data (the evidence).
• P(h|D) = the probability of hypothesis h given the data D, known as the posterior probability.
• P(D|h) = the probability of data D given that hypothesis h is true, known as the likelihood.
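A quick numeric sketch of the formula in Python; the spam/"prize" probabilities below are assumed purely for illustration and are not from the slides.

```python
# Bayes' theorem: P(h|D) = P(D|h) * P(h) / P(D)
# hypothesis h = "email is spam", data D = "email contains the word 'prize'"

p_h = 0.30              # P(h): prior probability that an email is spam (assumed)
p_d_given_h = 0.60      # P(D|h): 'prize' appears in 60% of spam emails (assumed)
p_d_given_not_h = 0.05  # P(D|not h): 'prize' appears in 5% of non-spam emails (assumed)

# P(D) via the law of total probability
p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)

# Posterior P(h|D)
p_h_given_d = p_d_given_h * p_h / p_d
print(f"P(spam | 'prize') = {p_h_given_d:.3f}")  # ~0.837
```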
Working of Naïve Bayes Classifier
• Let us understand the working of Naïve Bayes through an example.
• Consider a dataset of weather conditions and the corresponding record of whether sports were played.
• We need to classify whether players will play or not, based on the weather condition.
Algorithm
• Step 1: Calculate the Prior probability for the given class labels.
• Step 2: Find likelihood probability with each attribute for each class.
• Step 3: Put these values in the Bayes Formula and calculate Posterior
probability.
• Step 4: Assign the input to the class with the higher posterior probability.
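A minimal sketch of these four steps in plain Python. The weather/play records below are assumed for illustration (the slides reference such a dataset but do not list its rows).

```python
from collections import Counter

# Assumed illustrative records; each pair is (weather, play).
data = [
    ("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
    ("Overcast", "Yes"), ("Rainy", "No"),
]

def predict(weather):
    n = len(data)
    class_counts = Counter(label for _, label in data)
    posteriors = {}
    for label, count in class_counts.items():
        prior = count / n                                    # Step 1: prior P(class)
        likelihood = sum(1 for w, l in data                  # Step 2: P(weather | class)
                         if w == weather and l == label) / count
        posteriors[label] = likelihood * prior               # Step 3: numerator of Bayes' rule
    return max(posteriors, key=posteriors.get)               # Step 4: larger posterior wins

print(predict("Sunny"))  # -> 'No' (with these counts, play is less likely when sunny)
```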
Decision Tree
• Have you been to a wedding or family function where your relatives are trying to decide your life?
• Think of a decision tree like that.
• Imagine you are trying to decide what career to choose. Your aunt might ask, "Engg. or Doctor?" If you say Engg., the next question might be: Mechanical or IT?
• A decision tree, like your family, tries to make predictions based on data.
• Every question is like a branch in the tree, breaking down into smaller branches, until you reach a leaf – the final prediction or decision.
Decision Tree
• A Decision tree is a type of ML algorithm that uses a tree-like model
to make predictions based on the relationships between the features
in the dataset.
• It works by dividing the data into smaller and smaller groups based on
the values of the features until it reaches a decision about the target
variables for each group.
• It is a tree-like structure that represents a series of decisions and their possible outcomes.
• It is used in ML for classification and regression.
Contd..
• Decision tree learning employs a divide and conquer strategy by
conducting a greedy search to identify the optimal split point within a
tree.
• It is a hierarchical model used in decision support that depicts decisions and their potential outcomes, incorporating chance events, resource costs and utility.
• The tree structure is comprised of a root node, branches, internal
nodes and leaf nodes forming a hierarchical tree-like structure.
Contd..
• Decision trees are drawn upside down, which means the root is at the top, and this root is then split into several nodes.
• It is basically a bunch of if-else statements which check whether a condition is true and, if it is, move on to the next node attached to that decision.
• The goal in ML is to decrease uncertainty or disorder in the dataset, and for this, we use decision trees.
• Nodes resulting from splitting the root node are called decision
nodes.
Decision Tree
• A decision tree is a flow chart created by a computer algorithm to
make decisions or numeric predictions based on information in a
digital data set.
• They're considered a branch of artificial intelligence (AI) and
supervised learning, where algorithms make decisions based on past
known outcomes.
• The data set containing past known outcomes and related variables
that a decision tree algorithm uses to learn is known as training data.
Decision tree terminology

• These terms come up frequently in machine learning and are helpful to know as you
embark on your machine learning journey:
• Root node: The topmost node of a decision tree that represents the entire dataset and the initial decision
• Decision (or internal) node: A node within a decision tree where the
prior node branches into two or more variables
• Leaf (or terminal) node: The leaf node is also called the external node
or terminal node, which means it has no child—it’s the last node in the
decision tree and farthest from the root node
• Splitting: The process of dividing a node into two or more nodes. It’s
the part at which the decision branches off into variables
• Pruning: The opposite of splitting, the process of going through and
reducing the tree to only the most important nodes or outcomes
Types of decision trees in machine learning

• Decision trees in machine learning can either be classification trees or regression trees. Together, both types of algorithms fall into the category of "classification and regression trees" and are sometimes referred to as CART.
• Their respective roles are to "classify" and to "predict."
Contd..
• 1. Classification trees
• Classification trees determine whether an event happened or didn’t
happen. Usually, this involves a “yes” or “no” outcome.
• We often use this type of decision-making in the real world.
• Example : How to spend your free time after work
• What you do after work in your free time can be dependent on the
weather. If it is sunny, you might choose between having a picnic with a
friend, grabbing a drink with a colleague, or running errands.
• If it is raining, you might opt to stay home and watch a movie instead.
There is a clear outcome.
• In this case, that is classified as whether to “go out” or “stay in.”
• 2. Regression trees
• Regression trees, conversely, predict continuous values based on
previous data or information sources.
• For example, they can predict the price of gasoline or whether a
customer will purchase eggs (including which type of eggs and at
which store).
• This type of decision-making is more about programming algorithms
to predict what is likely to happen, given previous behavior or trends.
• Example : Bachelor’s degree graduates in 2025
• A regression tree can help a university predict how many bachelor’s
degree students there will be in 2025. On a graph, one can plot the
number of degree-holding students between 2010 and 2022. If the
number of university graduates increases linearly each year, then
regression analysis can be used to build an algorithm that predicts the
number of students in 2025.
Contd..
• Classification and Regression Tree (CART) is a predictive algorithm
used in machine learning that generates future predictions based on
previous values.
• These decision trees are at the core of machine learning, and serve as
a basis for other machine learning algorithms such as random forest,
bagged decision trees, and boosted decision trees.
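A brief sketch of both tree types, assuming scikit-learn and its iris demo dataset (neither is named in the slides); the graduates-per-year numbers echo the 2025 example above and are made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor
from sklearn.model_selection import train_test_split

# Classification tree: predict a discrete class (which iris species).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression tree: predict a continuous value (a toy linear trend,
# like the graduates-per-year example above).
years = [[2010], [2012], [2014], [2016], [2018], [2020], [2022]]
graduates = [1000, 1100, 1200, 1300, 1400, 1500, 1600]
reg = DecisionTreeRegressor(max_depth=2)
reg.fit(years, graduates)
# Note: a tree cannot extrapolate beyond its training range, so the 2025
# prediction falls in the last leaf learned from the later years.
print("predicted graduates in 2025:", reg.predict([[2025]])[0])
```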
Why are decision trees used in machine learning?

• Decision trees are widely used in machine learning (ML) because of their ability
to handle diverse data types, capture nonlinear relationships and provide clear,
explainable models.
• They provide a clear visual representation of the entire decision-making process.
This makes decision trees highly interpretable and easy to understand even for
nontechnical stakeholders.
• Because decision trees can be used for both classification and regression tasks,
they can be useful for a variety of ML tasks.
• They are easy to understand and interpret. The treelike structure in which they
model decisions and their possible consequences is intuitive and straightforward.
Contd..
• They can provide insights into the importance of different features, which
helps identify the variables that are the most influential in making
predictions.
• Decision trees can manage various data types, including numerical,
categorical and textual data. This flexibility makes them applicable to a wide
range of data sets.
• They are reliable compared to other methods, making them a good choice
for messy data sets.
• They're relatively simple to set up and easier to understand than more complex ML algorithms. This makes decision trees a popular choice for rapid prototyping and for beginners.
Entropy in Decision Trees
• Entropy is one of the key aspects of Machine Learning.
• In machine Learning, entropy measures the level of
disorder or uncertainty in a given dataset or system.
• It is a metric that quantifies the amount of information
in a dataset, and it is commonly used to evaluate the
quality of a model and its ability to make accurate
predictions.
• A higher entropy value indicates a more heterogeneous
dataset with diverse classes, while a lower entropy
signifies a more pure and homogeneous subset of data.
• Decision tree models can use entropy to determine the
best splits to make informed decisions and build
accurate predictive models.
Contd..
• When information is processed in a system, every piece of information has a specific value and can be used to draw conclusions from it.
• If it is easy to draw a valuable conclusion from a piece of information, its entropy is lower; if the entropy is higher, then it is difficult to draw any conclusion from that piece of information.
Entropy In Everyday Life

• Entropy can also explain disorder and complication in everyday life, for example in health and fitness. Naturally, as we get older, our bodies tend to deteriorate and decay until we die, but the rate of decline of our health depends on the level of entropy.
• For example, let's say you set a New Year's resolution to lose weight and get in shape.
• Over two months you stick to a regular exercise and healthy eating routine, shed excess fat and feel healthier.
• But during the third month, you travel on a holiday and fall off your healthy routine.
• Over time you regain weight, lose muscle and experience more illnesses that speed up the decay of your body.
Mathematical Formula for Entropy
• Consider a dataset having a total of N classes; then the entropy (E) can be determined with the formula below:

• E = −(p1·log2(p1) + p2·log2(p2) + … + pN·log2(pN)) = −Σ pi·log2(pi)

• Where:
• pi = probability of randomly selecting an example in class i.

• For a two-class dataset, entropy lies between 0 and 1; depending on the number of classes in the dataset, it can be greater than 1 (its maximum is log2 N).
Contd..
• Let's understand it with an example where we have a
dataset having three colors of fruits as red, green, and
yellow.
• Suppose we have 2 red, 2 green, and 4 yellow observations
throughout the dataset. Then as per the above equation:
• E = −(pr·log2(pr) + pg·log2(pg) + py·log2(py))
• Where;
• Pr = Probability of choosing red fruits;
• Pg = Probability of choosing green fruits and;
• Py = Probability of choosing yellow fruits.
Contd..
• Pr = 2/8 = 1/4 [as only 2 out of the 8 observations are red fruits]
• Pg = 2/8 = 1/4 [as only 2 out of the 8 observations are green fruits]
• Py = 4/8 = 1/2 [as only 4 out of the 8 observations are yellow fruits]
Contd..
• Now our final equation will be:

• E = −(1/4·log2(1/4) + 1/4·log2(1/4) + 1/2·log2(1/2))
    = −(1/4·(−2) + 1/4·(−2) + 1/2·(−1))
    = 0.5 + 0.5 + 0.5

So, the entropy will be 1.5.
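The same calculation as a short Python sketch:

```python
from math import log2

def entropy(counts):
    """Entropy of a dataset given the count of observations in each class."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# 2 red, 2 green and 4 yellow fruits, as in the example above
print(entropy([2, 2, 4]))  # -> 1.5
```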


Ensemble Models: Bagging and
Boosting
• Machine learning models are not like traditional software solutions.
• These models need constant updates as new data becomes available for accurate and
reliable predictions.
• In complex and sensitive scenarios, relying on a single model may not be sufficient to
generate an optimal result, and this is where ensemble modeling can help.
• Anytime we’re trying to make an important decision, we try to collect as much
information as possible and reach out to experts for advice.
• The more information we can gather, the more we trust the decision-making process.
• Machine learning predictions follow a similar behavior.
• Models process given inputs and produce an outcome.
• The outcome is a prediction based on what pattern the models see during the training
process.
Contd..
• We all use the decision tree technique in day-to-day life to make decisions.
• Organizations use supervised machine learning techniques like decision trees to make better decisions and to generate more surplus and profit.
• Ensemble methods combine different decision trees to deliver better predictive results, instead of utilizing a single decision tree.
• The primary principle behind the ensemble model is that a group of weak learners come together to form a strong learner.
• Let’s understand the concept of ensemble learning with an example.
Suppose you are a movie director and you have created a short
movie on a very important and interesting topic. Now, you want to
take preliminary feedback (ratings) on the movie before making it
public.
• A: You may ask one of your friends to rate the movie for you.
Now it’s entirely possible that the person you have chosen loves you
very much and doesn’t want to break your heart by providing a 1-
star rating to the horrible work you have created.
• B: Another way could be by asking 5 colleagues of yours to
rate the movie.
This should provide a better idea of the movie. This method may
provide honest ratings for your movie. But a problem still exists.
These 5 people may not be “Subject Matter Experts” on the topic of
your movie. Sure, they might understand the cinematography, the
shots, or the audio, but at the same time may not be the best judges
of dark humor.
• C: How about asking 50 people to rate the movie?
Some of which can be your friends, some of them can be your
colleagues and some may even be total strangers.
Contd..
• The responses, in this case, would be more generalized
and diversified since now you have people with different
sets of skills.
• And as it turns out – this is a better approach to get
honest ratings than the previous cases we saw.
• With these examples, you can infer that a diverse group
of people are likely to make better decisions as
compared to individuals.
• Similar is true for a diverse set of models in comparison
to single models.
• This diversification in Machine Learning is achieved by a
technique called Ensemble Learning.
Ensemble Models in Machine Learning: Example

• Let's imagine a music manager participating in an international competition. They have access to a wide variety of musicians with different expertise:
• Classical musicians with the ability to compose traditional pieces.
• Electronic musicians who are experts in using electronic instruments.
• Jazz musicians with a great sense of improvisation.
• Soloist musicians who can perform complex solos and highlight their technical abilities.
Contd..
• Given the broad spectrum of musical expertise and background, the
manager can combine all of them to create a unique and memorable
performance.
• Think of ensemble models as an orchestra of musicians, where each
person specializes in a specific instrument like piano, trumpet, drum,
and more. The combination of those skills creates a harmonious
melody.
• Ensemble learning uses the same logic:
• It combines multiple algorithms to obtain better predictive
performance than the one from a single model.
• There is no predefined number of models to consider, and some
business goals may require more models than others.
Contd..
• Ensemble models are a machine learning approach to combine
multiple other models in the prediction process.
• These models are referred to as base estimators.
• Ensemble models offer a solution to overcome the technical
challenges of building a single estimator.
• Ensemble learning is a machine learning technique that enhances
accuracy and resilience in forecasting by merging predictions from
multiple models.
• It aims to mitigate errors or biases that may exist in individual models
by leveraging the collective intelligence of the ensemble.
Contd..
• The underlying concept behind ensemble learning is to
combine the outputs of diverse models to create a more
precise prediction.
• By considering multiple perspectives and utilizing the
strengths of different models, ensemble learning
improves the overall performance of the learning
system.
• This approach not only enhances accuracy but also
provides resilience against uncertainties in the data.
• By effectively merging predictions from multiple models, ensemble learning has proven to be a powerful tool in various domains, offering more robust and reliable predictions.
• A classifier works by learning the relationship between input features and the class labels in the training data, and then applying this learned relationship to predict the class of new examples.
Types of Ensemble Techniques
• There are two main techniques used to build ensembles of decision trees:
• 1. Bagging
• 2. Boosting
Bagging
• Bagging (Bootstrap Aggregating) is an ensemble learning technique
designed to improve the accuracy, reliability, precision and stability of
machine learning algorithms.
• It entails generating numerous subsets of the training data by
employing random sampling with replacement.
• During prediction, the outputs of these base learners are aggregated,
often by averaging (for regression tasks) or voting (for classification
tasks), to produce the final prediction.
• Bagging helps to reduce overfitting by introducing diversity among the
base learners and improves the overall performance by reducing
variance and increasing robustness.
Bagging steps
• It involves the following steps:
• Data Sampling: Creating multiple subsets of the training dataset using
bootstrap sampling (random sampling with replacement).
• Model Training: Training a separate model on each subset of the data.
• Aggregation: Combining the predictions from all individual models
(averaged for regression or majority voting for classification) to
produce the final output.
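A minimal sketch of these three steps, assuming scikit-learn and its breast-cancer demo dataset (neither is named in the slides).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Data sampling + model training: 100 base learners (decision trees by default),
# each fitted on a bootstrap sample drawn with replacement from the training set.
bagging = BaggingClassifier(n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)

# Aggregation: the 100 individual predictions are combined by majority vote.
print("bagged accuracy:", bagging.score(X_test, y_test))
```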
Contd..
• Key Benefits:

• Reduces Variance: By averaging multiple predictions, bagging reduces the variance of the model and helps prevent overfitting. [In machine learning, variance is a measure of how much a model's predictions change when using different parts of the training data.]
• Improves Accuracy: Combining multiple models usually leads to better performance than individual models.
Boosting
• The term 'Boosting' refers to a family of algorithms which convert weak learners into strong learners.
• Boosting is a sequential process, where each
subsequent model attempts to correct the errors of the
previous model.
• The succeeding models are dependent on the previous
model.
• Let’s understand this definition in detail by solving a
problem of spam email identification:
• How would you classify an email as SPAM or not?
• Like everyone else, our initial approach would be to
identify ‘spam’ and ‘not spam’ emails using following
criteria. If:
1.Email has only one image file (promotional image), It’s
a SPAM
2.Email has only link(s), It’s a SPAM
3.Email body consists of sentences like "You won a prize money of $ xxxxxx", It's a SPAM
4.Email from official domain “mriu.edu.in” , Not a SPAM
5.Email from known source, Not a SPAM
• Above, we've defined multiple rules to classify an email into 'spam' or 'not spam'. But do you think these rules individually are strong enough to classify an email successfully? No.
• Individually, these rules are not powerful enough to classify an email into 'spam' or 'not spam'. Therefore, these rules are called weak learners.
• To convert weak learners into a strong learner, we'll combine the prediction of each weak learner using methods like:
• Using an average / weighted average
• Considering the prediction with the higher vote
• For example: Above, we have defined 5 weak learners. Out of these 5, 3 vote 'SPAM' and 2 vote 'Not a SPAM'. In this case, by default, we'll consider the email as SPAM because we have the higher vote (3) for 'SPAM', as sketched below.
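A minimal sketch of this majority-vote combination, assuming the five rule-based predictions above:

```python
from collections import Counter

# Votes from the 5 weak learners (rules) described above.
votes = ["SPAM", "SPAM", "SPAM", "Not a SPAM", "Not a SPAM"]

# Majority vote: the label with the most votes wins (3 SPAM vs 2 Not a SPAM).
prediction = Counter(votes).most_common(1)[0][0]
print(prediction)  # -> SPAM
```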
Boosting
• In Boosting, models are used one after the other, and the predictions
made by the first model are used as input to the next model.
• The last layer of models will use the predictions from all previous
layers to get the final predictions.
• So Boosting enables each subsequent model to boost the
performance of the previous one by overcoming or reducing the error
of the previous model.
Boosting
• How it works:
• A subset of the original training data is created with all the data points
having equal weight.
• This subset creates a base model that predicts the entire data set.
• Based on the actual and predicted values, errors are calculated, and
incorrect observations are given higher weights.
• A new model is created that tries to correct the errors and new
predictions are made on the data set.
• This process continues, with each new model attempting to correct
the previous model's errors until a final model or strong learner is
created.
• This model is the weighted mean of all the weak learners, and it
improves the overall performance of the ensemble.
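As a hedged illustration of this weighted, sequential scheme, AdaBoost (one concrete boosting algorithm; scikit-learn and its breast-cancer dataset are assumed here) re-weights misclassified samples at each round and combines the weak learners by a performance-weighted vote.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost fits weak learners (shallow decision trees by default) one after
# another; misclassified samples receive higher weights for the next learner,
# and the final prediction is a performance-weighted vote of all learners.
boosting = AdaBoostClassifier(n_estimators=100, random_state=42)
boosting.fit(X_train, y_train)
print("boosted accuracy:", boosting.score(X_test, y_test))
```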
Bagging vs. Boosting

• Sampling: In bagging, various training data subsets are randomly drawn with replacement from the whole training dataset; in boosting, each new subset contains the examples that were misclassified by previous models.
• Goal: Bagging attempts to tackle the over-fitting issue; boosting tries to reduce bias.
• When to use: If the classifier is unstable (high variance), apply bagging; if the classifier is steady and straightforward (high bias), apply boosting.
• Model weights: In bagging, every model receives an equal weight; in boosting, models are weighted by their performance.
• Objective: Bagging aims to decrease variance, not bias; boosting aims to decrease bias, not variance.
• Combination: Bagging is the easiest way of combining predictions that belong to the same type; boosting is a way of combining predictions that belong to different types.
• Dependence: In bagging, every model is constructed independently; in boosting, new models are affected by the performance of previously developed models.
