1. Introduction to Bayes Classifier
A Bayes classifier is a simple but powerful classification algorithm that applies
Bayes' Theorem to predict the class label of a given instance. It is based on the
assumption that the probability of a class given the input data can be computed
using prior knowledge about the distribution of data.
It is widely used in applications such as:
• Spam detection
• Medical diagnosis
• Sentiment analysis
• Document classification
Bayes classifiers work by estimating the probability of a class label given a set of
observed features and choosing the class with the highest probability.
Example:
A new COVID-19 test claims to have a 90% true positive rate (sensitivity) and a 98% true negative rate (specificity). In a population with a COVID-19 prevalence of one in 1000, what is the chance that a patient who tested positive is truly positive? Let us consider the following:
Let A be the event that a patient is truly positive, so P(A) = 0.001. Then A′ (the complement of A) is the event of being truly negative, so P(A′) = 0.999. Let B be the event that the patient tested positive.
We want P(A|B).
Let the data of sensitivity and specificity be summarized as shown in Table 4.1.
TABLE 4.1 Relevant probabilities of positive and negative tests

True \ Test     Positive    Negative
Positive        0.90        0.10
Negative        0.02        0.98
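Applying Bayes' rule with these numbers answers the question posed above. First, the total probability of testing positive:

P(B) = P(B|A)·P(A) + P(B|A′)·P(A′) = 0.9 × 0.001 + 0.02 × 0.999 = 0.0009 + 0.01998 = 0.02088

Then the posterior probability of being truly positive given a positive test:

P(A|B) = P(B|A)·P(A) / P(B) = 0.0009 / 0.02088 ≈ 0.0431, i.e. about 4.31%.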
Bayes' Rule and Inference
Bayes’ rule is a mathematical formula used to update probabilities based on new
evidence. It is particularly useful in inference problems where we have prior
knowledge and want to determine the likelihood of an event given observed
data.
The formula for Bayes' theorem is:

P(C|A) = P(A|C) · P(C) / P(A)

Where:
• P(C) is the prior probability of event C occurring.
• P(A|C) is the likelihood, or the probability of observing A given that C is true.
• P(A) is the total probability of observing A.
• P(C|A) is the posterior probability, or the updated probability of C given that A has been observed.
Now, let’s apply this to the given examples.
Example 4: Probability of Choosing Chest C1 Given a White Ball
We are given two chests:
• Chest C1 contains 20 white balls (WB) and 10 red balls (RB).
• Chest C2 contains 15 white balls (WB) and 15 red balls (RB).
Since either chest is picked with equal probability, P(C1) = P(C2) = 1/2.
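Letting W denote the event that a white ball is drawn, Bayes' rule gives the probability that the ball came from chest C1:

P(W|C1) = 20/30 = 2/3 and P(W|C2) = 15/30 = 1/2
P(W) = P(W|C1)·P(C1) + P(W|C2)·P(C2) = (2/3)(1/2) + (1/2)(1/2) = 1/3 + 1/4 = 7/12
P(C1|W) = P(W|C1)·P(C1) / P(W) = (1/3) / (7/12) = 4/7 ≈ 0.571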
Bayes' Rule and Classification
Bayes' Theorem Recap
Bayes' rule allows us to compute the probability of an event given some observed evidence. It is given by:

P(A|B) = P(B|A) · P(A) / P(B)

This allows us to update our belief about event A after observing B.
Returning to the COVID-19 testing example, this means that even if someone tests positive, there is only about a 4.31% chance they actually have COVID-19. The remaining 95.69% of positive tests are false positives, which happens because of the low prevalence of the disease.
2. Chain Rule of Probability
The chain rule helps compute the probability of several events happening together. Given three events A, B, and C, we can compute:

P(A, B, C) = P(A) · P(B|A) · P(C|A, B)
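For example, with purely hypothetical values P(A) = 0.5, P(B|A) = 0.4, and P(C|A, B) = 0.2, the joint probability is:

P(A, B, C) = 0.5 × 0.4 × 0.2 = 0.04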
3. Bayes Rule in Classification
Bayes' theorem can be used for classification by computing the probability of
different classes given observed features.
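In symbols, a test pattern x is assigned to the class Ci with the highest posterior probability:

choose the class Ci that maximizes P(Ci|x) = P(x|Ci) · P(Ci) / P(x)

Since P(x) is the same for every class, only the product P(x|Ci) · P(Ci) needs to be compared.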
4. Example: Classifying Objects (Chairs vs. Tables)
Given:
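The data for this example is not listed above, so the sketch below illustrates the idea with hypothetical class priors and Gaussian weight distributions; all names and numbers are illustrative assumptions, not part of the original example.

    import math

    # Hypothetical class priors and weight statistics (mean, std in kg) -- illustrative only
    priors = {"chair": 0.5, "table": 0.5}
    weight_stats = {
        "chair": (5.0, 1.0),
        "table": (15.0, 3.0),
    }

    def gaussian_pdf(x, mean, std):
        # Class-conditional likelihood p(weight | class), modeled as a Gaussian
        return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

    def classify(weight):
        # Assign the object to the class with the highest posterior P(class) * p(weight | class)
        posteriors = {c: priors[c] * gaussian_pdf(weight, *weight_stats[c]) for c in priors}
        return max(posteriors, key=posteriors.get)

    print(classify(6.0))   # a light object is classified as "chair"
    print(classify(14.0))  # a heavy object is classified as "table"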
Summary
1. Bayes' Rule updates probabilities based on observed evidence.
2. Medical Testing Example showed that when a disease is rare, even a
highly accurate test can have many false positives.
3. The Chain Rule helps compute probabilities in multi-event scenarios.
4. Bayes' Rule in Classification assigns objects to the class with the highest
probability.
5. Chair vs. Table Example showed how we classify based on observed
features like weight.
This forms the basis for many machine learning classification models, like Naïve
Bayes classifiers.
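For instance, scikit-learn ships a Gaussian Naïve Bayes implementation. The following is a minimal usage sketch, assuming scikit-learn is installed; the weight data and labels are illustrative only.

    from sklearn.naive_bayes import GaussianNB

    # Illustrative training data: one feature (weight in kg) per object
    X = [[5.0], [6.0], [4.5], [14.0], [16.0], [15.5]]
    y = ["chair", "chair", "chair", "table", "table", "table"]

    model = GaussianNB()
    model.fit(X, y)                        # estimates per-class means, variances and priors
    print(model.predict([[5.5], [15.0]]))  # expected output: ['chair' 'table']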
The Bayes Classifier and Its Optimality
Understanding Bayes Classifier
The Bayes classifier is a probabilistic model that assigns a test sample to the class with the highest posterior probability. It is based on Bayes' theorem, which states:

P(Ci|x) = P(x|Ci) · P(Ci) / P(x)

where P(Ci) is the prior probability of class Ci, P(x|Ci) is the class-conditional probability of the pattern x, and P(x) is the total probability of observing x.
Geometric Interpretation: Minimal Distance Classifier
When classes follow Gaussian distributions with equal covariance matrices, the
Bayes classifier reduces to a minimal distance classifier (MDC):
• Assign x to the class whose mean is closer to x in Euclidean distance.
• The decision boundary is the perpendicular bisector of the line joining the
two class means.
For general multivariate Gaussian distributions, classification is based on the Mahalanobis distance, which reduces to the Euclidean distance when the shared covariance matrix is a multiple of the identity.
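The following is a minimal sketch of both distance rules using NumPy; the class means and covariance matrix are hypothetical values chosen only to illustrate the computation.

    import numpy as np

    # Hypothetical class means and a shared covariance matrix (illustrative values)
    means = {
        "class1": np.array([0.0, 0.0]),
        "class2": np.array([3.0, 3.0]),
    }
    cov = np.array([[2.0, 0.5],
                    [0.5, 1.0]])
    cov_inv = np.linalg.inv(cov)

    def euclidean_classify(x):
        # Minimal distance classifier: pick the class whose mean is closest in Euclidean distance
        return min(means, key=lambda c: np.linalg.norm(x - means[c]))

    def mahalanobis_classify(x):
        # Pick the class minimizing the Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)
        return min(means, key=lambda c: (x - means[c]) @ cov_inv @ (x - means[c]))

    x = np.array([1.0, 1.5])
    print(euclidean_classify(x), mahalanobis_classify(x))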
Multi-Class Classification
Definition
Multi-class classification is an extension of binary classification where a sample
is assigned to one of more than two classes.
Bayes Classifier for Multi-Class Problems
The Bayes classifier assigns a test pattern z to one of the classes based on the posterior probabilities of the classes given z. It is optimal because it minimizes the probability of classification error.
Bayes' Theorem in Classification
From Bayes' theorem:

P(Ci|z) = P(z|Ci) · P(Ci) / P(z)

and z is assigned to the class Ci for which the posterior probability P(Ci|z) is largest.
Conclusion
The Bayes classifier is optimal because it minimizes the average probability of
error. It assigns data points to the class with the highest posterior probability.
• When distributions are normal with equal covariance matrices,
classification is based on Euclidean distance.
• When covariance matrices differ, classification is based on Mahalanobis
distance.
This classifier serves as the foundation for more advanced machine learning
models, including Gaussian Naïve Bayes and Linear Discriminant Analysis
(LDA).
Class Conditional Independence
Class conditional independence is a key assumption used in probabilistic
classifiers like the Naïve Bayes classifier. It assumes that the features (attributes)
of a given data point are conditionally independent given the class. This means
that the probability of a set of features occurring together can be broken down
into the product of the probabilities of individual features occurring within a
specific class.
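In symbols, for a feature vector (x1, x2, …, xd) and a class C, the class conditional independence assumption states:

P(x1, x2, …, xd | C) = P(x1|C) · P(x2|C) · … · P(xd|C)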
Key Observations
• Class conditional independence allows us to simplify the computation of
probabilities in classification problems.
• If class conditional independence holds, the decisions made by the Naïve
Bayes classifier will coincide with those of a Bayes classifier.
• In practice, the assumption of class conditional independence does not
always hold, which can lead to differences in classification performance.
• Maximum Likelihood Estimation (MLE) is commonly used to estimate these probabilities from the training data, but it can yield zero-valued estimates when a feature value never occurs with a class.
• The Bayesian Estimation (BE) method can be used instead of MLE to
avoid zero-valued estimates.
Naïve Bayes Classifier (NBC)
Definition
The Naïve Bayes classifier is a probabilistic classification technique that
simplifies computations by assuming class conditional independence. It is a
special case of the Bayes classifier with the added assumption that feature values
are independent given the class.
Bayes Theorem
Under the class conditional independence assumption, the posterior probability of class C given features x1, …, xd becomes:

P(C | x1, …, xd) ∝ P(C) · P(x1|C) · P(x2|C) · … · P(xd|C)

and the test pattern is assigned to the class for which this product is largest.
Key Takeaways
• Naïve Bayes classifier is a practical simplification of the Bayes classifier that
assumes class conditional independence.
• The MLE method is typically used to estimate the required probabilities, but it can yield zero values for feature-class combinations that do not appear in the training data.
• In cases where MLE gives zero probabilities, Bayesian Estimation (BE) can
be used.
• NBC performs well when the assumption of conditional independence holds,
but its performance may degrade when the features are highly correlated.
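As a concrete illustration of these takeaways, the following is a minimal sketch (not a production implementation) of a Naïve Bayes classifier for categorical features in Python. The add-one (Laplace) smoothing term plays the role of the Bayesian Estimation step in avoiding zero-valued probability estimates; the toy data, feature names, and function names are hypothetical.

    from collections import Counter, defaultdict

    def train_nbc(X, y, alpha=1.0):
        # Estimate priors and smoothed conditional probabilities (alpha > 0 avoids zero estimates)
        class_counts = Counter(y)
        priors = {c: class_counts[c] / len(y) for c in class_counts}
        counts = defaultdict(int)       # counts[(feature_index, value, class)]
        values = defaultdict(set)       # distinct values observed per feature
        for xi, c in zip(X, y):
            for j, v in enumerate(xi):
                counts[(j, v, c)] += 1
                values[j].add(v)

        def cond_prob(j, v, c):
            # Smoothed estimate of P(feature_j = v | class c)
            return (counts[(j, v, c)] + alpha) / (class_counts[c] + alpha * len(values[j]))

        def classify(xi):
            # Assign the class maximizing P(c) * prod_j P(x_j | c), per class conditional independence
            def score(c):
                p = priors[c]
                for j, v in enumerate(xi):
                    p *= cond_prob(j, v, c)
                return p
            return max(priors, key=score)

        return classify

    # Hypothetical training data: features = (size, material), classes = chair / table
    X = [("small", "wood"), ("small", "plastic"), ("large", "wood"), ("large", "metal")]
    y = ["chair", "chair", "table", "table"]
    classify = train_nbc(X, y)
    print(classify(("small", "wood")))   # expected: "chair"

Because of the smoothing term, the sketch still produces nonzero probabilities for feature values that never occur with a class in the training data, which is exactly the situation where plain MLE estimates break down.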