Lecture #8
Classification: Naïve Bayes’ Classifier
Today’s discussion…
Introduction to Classification
Classification Techniques
Supervised and unsupervised classification
Formal statement of supervised classification technique
Bayesian Classifier
Principle of Bayesian classifier
Bayes’ theorem of probability
Naïve Bayesian Classifier
A Simple Quiz: Identify the objects
Introduction to Classification
Example 8.1
A teacher classifies students as A, B, C, D, or F based on their marks. The
following is one simple classification rule:

Mark ≥ 90 : A
90 > Mark ≥ 80 : B
80 > Mark ≥ 70 : C
70 > Mark ≥ 60 : D
60 > Mark : F
Note:
Here, we apply the above rule to specific data (in this case, a table of marks).
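As a quick illustration, the rule above can be written as a tiny function (a minimal sketch in Python; the function name `grade` is ours, not from the slide):

```python
# A minimal sketch of the grading rule in Example 8.1.
def grade(mark):
    """Map a mark to exactly one of the classes A, B, C, D, F."""
    if mark >= 90:
        return "A"
    elif mark >= 80:
        return "B"
    elif mark >= 70:
        return "C"
    elif mark >= 60:
        return "D"
    return "F"

print([grade(m) for m in (95, 84, 72, 61, 40)])  # ['A', 'B', 'C', 'D', 'F']
```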
Examples of Classification in Data Analytics
Life Science: Predicting tumor cells as benign or malignant
Security: Classifying credit card transactions as legitimate or
fraudulent
Prediction: Weather, voting, political dynamics, etc.
Entertainment: Categorizing news stories as finance, weather,
entertainment, sports, etc.
Social media: Identifying the current trend and future growth
Classification: Definition
Classification is a form of data analysis to extract models describing
important data classes.
Essentially, it involves dividing up objects so that each is assigned to one of
a number of mutually exclusive and exhaustive categories known as classes.
The term “mutually exclusive and exhaustive” simply means that each object
must be assigned to precisely one class;
that is, never to more than one and never to no class at all.
Classification Techniques
Classification consists of assigning a class label to a set of unclassified
cases.
Supervised Classification
The set of possible classes is known in advance.
Unsupervised Classification
The set of possible classes is not known in advance. After classification, we can
try to assign a name to each discovered class.
Unsupervised classification is also called clustering.
Supervised Classification
Unsupervised Classification
Supervised Classification Technique
Given a collection of records (training set)
Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the
other attributes.
Goal: previously unseen records should be assigned a class as accurately
as possible.
The assignment must satisfy the property of “mutually exclusive and exhaustive”.
Illustrating Classification Tasks
The figure on this slide shows the overall flow: a learning algorithm performs
induction on the training set to learn a model, and the model is then applied
(deduction) to assign class labels to the test set.

Training Set
Tid   Attrib1   Attrib2   Attrib3   Class
1     Yes       Large     125K      No
2     No        Medium    100K      No
3     No        Small     70K       No
4     Yes       Medium    120K      No
5     No        Large     95K       Yes
6     No        Medium    60K       No
7     Yes       Large     220K      No
8     No        Small     85K       Yes
9     No        Medium    75K       No
10    No        Small     90K       Yes

Test Set
Tid   Attrib1   Attrib2   Attrib3   Class
11    No        Small     55K       ?
12    Yes       Medium    80K       ?
13    Yes       Large     110K      ?
14    No        Small     95K       ?
15    No        Large     67K       ?
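To make the induction/deduction loop concrete, here is a minimal sketch in Python. The learning rule used (one nearest neighbour on Attrib3) is our arbitrary stand-in for illustration; any classifier fits in this workflow.

```python
# Induction: learn a model from the training set.
# Deduction: apply the model to label the test set.
train = [
    (1, "Yes", "Large", 125, "No"),  (2, "No", "Medium", 100, "No"),
    (3, "No", "Small",  70, "No"),   (4, "Yes", "Medium", 120, "No"),
    (5, "No", "Large",  95, "Yes"),  (6, "No", "Medium",  60, "No"),
    (7, "Yes", "Large", 220, "No"),  (8, "No", "Small",   85, "Yes"),
    (9, "No", "Medium", 75, "No"),   (10, "No", "Small",  90, "Yes"),
]

def learn_model(training_set):
    # Here "learning" simply memorizes the records (a lazy learner).
    return training_set

def apply_model(model, attrib3):
    # Predict the class of the training record whose Attrib3 is closest.
    nearest = min(model, key=lambda rec: abs(rec[3] - attrib3))
    return nearest[4]

model = learn_model(train)
for tid, attrib3 in [(11, 55), (12, 80), (13, 110), (14, 95), (15, 67)]:
    print(tid, apply_model(model, attrib3))
```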
Classification Problem
More precisely, a classification problem can be stated as below:

Definition 8.1: Classification Problem

Given a database $D = \{t_1, t_2, \ldots, t_m\}$ of tuples and a set of classes
$C = \{c_1, c_2, \ldots, c_k\}$, the classification problem is to define a mapping
$f : D \rightarrow C$, where each $t_i$ is assigned to one class.

Note that each tuple $t_i \in D$ is defined by a set of attributes
$A = \{A_1, A_2, \ldots, A_n\}$.
Classification Techniques
A number of classification techniques are known, which can be broadly
classified into the following categories:
1. Statistical-Based Methods
• Regression
• Bayesian Classifier
2. Distance-Based Classification
• K-Nearest Neighbours
3. Decision Tree-Based Classification
• ID3, C4.5, CART
4. Classification using Machine Learning (SVM)
5. Classification using Neural Networks (ANN)
Bayesian Classifier
Bayesian Classifier
Principle
If it walks like a duck, quacks like a duck, then it is probably a duck
Bayesian Classifier
A statistical classifier
Performs probabilistic prediction, i.e., predicts class membership probabilities
Foundation
Based on Bayes’ Theorem.
Assumptions
1. The classes are mutually exclusive and exhaustive.
2. The attributes are independent given the class.
Called the “naïve” classifier because of the attribute-independence assumption (Assumption 2).
Empirically proven to be useful.
Scales very well.
Example: Bayesian Classification
Example 8.2: Air-Traffic Data

Let us consider a set of observations recorded in a database regarding the
arrival of airplanes on routes from any airport to New Delhi under certain
conditions.
Air-Traffic Data
Days Season Fog Rain Class
Weekday Spring None None On Time
Weekday Winter None Slight On Time
Weekday Winter None None On Time
Holiday Winter High Slight Late
Saturday Summer Normal None On Time
Weekday Autumn Normal None Very Late
Holiday Summer High Slight On Time
Sunday Summer Normal None On Time
Weekday Winter High Heavy Very Late
Weekday Summer None Slight On Time
Contd. on next slide…
Air-Traffic Data
Contd. from previous slide…
Days Season Fog Rain Class
Saturday Spring High Heavy Cancelled
Weekday Summer High Slight On Time
Weekday Winter Normal None Late
Weekday Summer High None On Time
Weekday Winter Normal Heavy Very Late
Saturday Autumn High Slight On Time
Weekday Autumn None Heavy On Time
Holiday Spring Normal Slight On Time
Weekday Spring Normal None On Time
Weekday Spring Normal Heavy On Time
Air-Traffic Data
In this database, there are four attributes
A = [Day, Season, Fog, Rain]
with 20 tuples.
The categories of classes are:
C = [On Time, Late, Very Late, Cancelled]
Given this knowledge of the data and the classes, we are to find the most likely
classification for any unseen instance, for example:
Weekday  Winter  High  None  ???
The classification technique should eventually map this tuple to an accurate class.
Bayesian Classifier
In many applications, the relationship between the attribute set and the
class variable is non-deterministic.
In other words, a test instance cannot be assigned a class label with certainty.
In such a situation, the classification can be achieved probabilistically.
The Bayesian classifier is an approach for modelling probabilistic
relationships between the attribute set and the class variable.
More precisely, the Bayesian classifier uses Bayes’ Theorem of Probability for
classification.
Before discussing the Bayesian classifier, we should have a quick look
at the theory of probability and then Bayes’ theorem.
Bayes’ Theorem of Probability
Simple Probability
Definition 8.2: Simple Probability
If there are n elementary events associated with a random experiment and m of
them are favourable to an event A, then the probability of the occurrence of A is

$$P(A) = \frac{m}{n}$$
Simple Probability
Suppose, A and B are any two events and P(A), P(B) denote the
probabilities that the events A and B will occur, respectively.
Mutually Exclusive Events:
Two events are mutually exclusive if the occurrence of one precludes the
occurrence of the other.
Example: Tossing a coin (two outcomes)
Rolling a ludo die (six outcomes)
Can you give an example in which two events are not mutually exclusive?
Hint: Tossing two identical coins; weather (sunny, foggy, warm)
Simple Probability
Independent Events: Two events are independent if the occurrence of one
does not alter the probability of occurrence of the other.
Example: Tossing a coin and rolling a ludo die together.
(How many events are there?)
Can you give an example where an event is dependent on one or more
other event(s)?
Hint: Receiving a message (A) through a communication channel (B)
over a computer (C); rain and dating.
Joint Probability
Definition 8.3: Joint Probability
If P(A) and P(B) are the probabilities of two events, then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

If A and B are mutually exclusive, then $P(A \cap B) = 0$.
If A and B are independent events, then $P(A \cap B) = P(A) \cdot P(B)$.
Thus, for mutually exclusive events,

$$P(A \cup B) = P(A) + P(B)$$
Conditional Probability
Definition 8.4: Conditional Probability

If events are dependent, then their probability is expressed by conditional
probability. The probability that A occurs given that B has occurred is denoted
by $P(A|B)$.

Suppose A and B are two events associated with a random experiment. The
probability of A under the condition that B has already occurred, where
$P(B) \neq 0$, is given by

$$P(A|B) = \frac{\text{number of events in } B \text{ favourable to } A}{\text{number of events in } B}
= \frac{\text{number of events favourable to } A \cap B}{\text{number of events favourable to } B}
= \frac{P(A \cap B)}{P(B)}$$
Conditional Probability
Corollary 8.1: Conditional Probability

$$P(A \cap B) = P(A) \cdot P(B|A), \quad \text{if } P(A) \neq 0$$
or
$$P(A \cap B) = P(B) \cdot P(A|B), \quad \text{if } P(B) \neq 0$$

For three events A, B and C,

$$P(A \cap B \cap C) = P(A) \cdot P(B|A) \cdot P(C|A \cap B)$$

For n events $A_1, A_2, \ldots, A_n$, if all events are mutually independent of each other,

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2) \cdots P(A_n)$$

Note:
$P(A|B) = 0$ if the events are mutually exclusive
$P(A|B) = P(A)$ if A and B are independent
$P(A|B) \cdot P(B) = P(B|A) \cdot P(A)$ otherwise, since $P(A \cap B) = P(B \cap A)$
Conditional Probability
Generalization of Conditional Probability:

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \cap A)}{P(B)} = \frac{P(B|A) \cdot P(A)}{P(B)}$$

since $P(A \cap B) = P(B|A) \cdot P(A) = P(A|B) \cdot P(B)$.

By the law of total probability, $P(B) = P\big((B \cap A) \cup (B \cap \bar{A})\big)$,
where $\bar{A}$ denotes the complement of event A. Thus,

$$P(A|B) = \frac{P(B|A) \cdot P(A)}{P\big((B \cap A) \cup (B \cap \bar{A})\big)}
= \frac{P(B|A) \cdot P(A)}{P(B|A) \cdot P(A) + P(B|\bar{A}) \cdot P(\bar{A})}$$
Conditional Probability
In general, if A, B and C are mutually exclusive and exhaustive events, then for
any event D,

$$P(A|D) = \frac{P(A) \cdot P(D|A)}{P(A) \cdot P(D|A) + P(B) \cdot P(D|B) + P(C) \cdot P(D|C)}$$
Total Probability
Definition 8.5: Total Probability

Let $E_1, E_2, \ldots, E_n$ be n mutually exclusive and exhaustive events associated
with a random experiment. If A is any event which occurs with $E_1$ or $E_2$ or … or
$E_n$, then

$$P(A) = P(E_1) \cdot P(A|E_1) + P(E_2) \cdot P(A|E_2) + \cdots + P(E_n) \cdot P(A|E_n)$$
Total Probability: An Example
Example 8.3

A bag contains 4 red and 3 black balls. A second bag contains 2 red and 4 black balls.
One bag is selected at random, and one ball is drawn from the selected bag. What is
the probability that the ball drawn is red?

This problem can be answered using the concept of total probability, with
$E_1$ = selecting Bag I
$E_2$ = selecting Bag II
A = drawing a red ball

Thus, $P(A) = P(E_1) \cdot P(A|E_1) + P(E_2) \cdot P(A|E_2)$, where $P(A|E_1)$ is the
probability of drawing a red ball when the first bag has been chosen, and $P(A|E_2)$
is the same for the second bag. Here $P(E_1) = P(E_2) = \frac{1}{2}$,
$P(A|E_1) = \frac{4}{7}$ and $P(A|E_2) = \frac{2}{6}$, so
$P(A) = \frac{1}{2} \cdot \frac{4}{7} + \frac{1}{2} \cdot \frac{1}{3} = \frac{19}{42}$.
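The same computation in exact arithmetic (a minimal sketch in Python; the variable names are ours):

```python
from fractions import Fraction

P_E1 = P_E2 = Fraction(1, 2)      # each bag is equally likely to be chosen
P_A_given_E1 = Fraction(4, 7)     # Bag I: 4 red out of 7 balls
P_A_given_E2 = Fraction(2, 6)     # Bag II: 2 red out of 6 balls

# Total probability of drawing a red ball
P_A = P_E1 * P_A_given_E1 + P_E2 * P_A_given_E2
print(P_A)  # 19/42
```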
Reverse Probability
Example 8.4:

A bag (Bag I) contains 4 red and 3 black balls. A second bag (Bag II) contains 2 red
and 4 black balls. One bag is selected at random, and one ball is drawn from it. The
ball is found to be red. What is the probability that the ball was drawn from Bag I?

Here,
$E_1$ = selecting Bag I
$E_2$ = selecting Bag II
A = drawing a red ball

We are to determine $P(E_1|A)$. Such a “reverse” problem can be solved using Bayes’
theorem of probability.
Bayes’ Theorem
Theorem 8.4: Bayes’ Theorem

Let $E_1, E_2, \ldots, E_n$ be n mutually exclusive and exhaustive events associated
with a random experiment. If A is any event which occurs with $E_1$ or $E_2$ or … or
$E_n$, then

$$P(E_i|A) = \frac{P(E_i) \cdot P(A|E_i)}{\sum_{j=1}^{n} P(E_j) \cdot P(A|E_j)}$$
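Applying the theorem to the reverse-probability question of Example 8.4 (a minimal sketch in Python; names are ours):

```python
from fractions import Fraction

priors = [Fraction(1, 2), Fraction(1, 2)]        # P(E1), P(E2)
likelihoods = [Fraction(4, 7), Fraction(2, 6)]   # P(A|E1), P(A|E2)

# Denominator of Bayes' theorem: the total probability P(A)
evidence = sum(p * l for p, l in zip(priors, likelihoods))

# P(E1|A): probability that the red ball came from Bag I
print(priors[0] * likelihoods[0] / evidence)  # 12/19
```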
Prior and Posterior Probabilities
P(A) and P(B) are called prior probabilities; P(A|B) and P(B|A) are called
posterior probabilities.

Example 8.5: Prior versus Posterior Probabilities

The following sample space shows an event Y with two outcomes, A and B, which
depends on another event X with outcomes $x_1$, $x_2$ and $x_3$.

X    Y
x1   A
x2   A
x3   B
x3   A
x2   B
x1   A
x1   B
x3   B
x2   B
x2   A

Case 1: Suppose we don’t have any information about the event X. Then, from the
given sample space, we can calculate P(Y = A) = 5/10 = 0.5.

Case 2: Now suppose we want to calculate P(X = x2 | Y = A) = 2/5 = 0.4.

The latter is the conditional or posterior probability, whereas the former is the
prior probability.
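Both probabilities in Example 8.5 can be recomputed directly from the sample space (a minimal sketch in Python):

```python
rows = [("x1", "A"), ("x2", "A"), ("x3", "B"), ("x3", "A"), ("x2", "B"),
        ("x1", "A"), ("x1", "B"), ("x3", "B"), ("x2", "B"), ("x2", "A")]

# Prior: fraction of all rows with Y = A
prior_A = sum(1 for _, y in rows if y == "A") / len(rows)

# Posterior: among rows with Y = A, the fraction with X = x2
a_rows = [x for x, y in rows if y == "A"]
posterior = a_rows.count("x2") / len(a_rows)

print(prior_A, posterior)  # 0.5 0.4
```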
Naïve Bayesian Classifier
Suppose Y is a class variable and $X = \{X_1, X_2, \ldots, X_n\}$ is a set of
attributes; each training instance of X is labelled with an instance of Y:

INPUT (X)                 CLASS (Y)
…                         …
(x_1, x_2, …, x_n)        y_i
…                         …

The classification problem can then be expressed as the class-conditional
probability

$$P(Y = y_i \mid X_1 = x_1 \text{ AND } X_2 = x_2 \text{ AND } \ldots \text{ AND } X_n = x_n)$$
Naïve Bayesian Classifier
The naïve Bayesian classifier calculates this posterior probability using Bayes’
theorem, as follows.

From Bayes’ theorem on conditional probability, we have

$$P(Y|X) = \frac{P(X|Y) \cdot P(Y)}{P(X)}
= \frac{P(X|Y) \cdot P(Y)}{P(X|Y = y_1) \cdot P(Y = y_1) + \cdots + P(X|Y = y_k) \cdot P(Y = y_k)}$$

where

$$P(X) = \sum_{i=1}^{k} P(X|Y = y_i) \cdot P(Y = y_i)$$

Note:
$P(X)$ is called the evidence (it is the total probability) and, for a given
instance, it is a constant.
The probability $P(Y|X)$ (also called the class-conditional probability) is
therefore proportional to $P(X|Y) \cdot P(Y)$.
Thus, $P(Y|X)$ can be taken as a measure of Y given X:

$$P(Y|X) \propto P(X|Y) \cdot P(Y)$$
Naïve Bayesian Classifier
Suppose that for a given instance of X (say $x = (X_1 = x_1, \ldots, X_n = x_n)$)
we compare any two class-conditional probabilities, namely $P(Y = y_i|X = x)$
and $P(Y = y_j|X = x)$.

If $P(Y = y_i|X = x) > P(Y = y_j|X = x)$, then we say that $y_i$ is stronger
than $y_j$ for the instance X = x.

The strongest $y_i$ is the classification for the instance X = x.
Naïve Bayesian Classifier
Example: With reference to the Air-Traffic dataset mentioned earlier, let us
tabulate all the posterior and prior probabilities as shown below.

                      Class
Attribute             On Time       Late         Very Late    Cancelled
Day
  Weekday             9/14 = 0.64   1/2 = 0.5    3/3 = 1      0/1 = 0
  Saturday            2/14 = 0.14   1/2 = 0.5    0/3 = 0      1/1 = 1
  Sunday              1/14 = 0.07   0/2 = 0      0/3 = 0      0/1 = 0
  Holiday             2/14 = 0.14   0/2 = 0      0/3 = 0      0/1 = 0
Season
  Spring              4/14 = 0.29   0/2 = 0      0/3 = 0      0/1 = 0
  Summer              6/14 = 0.43   0/2 = 0      0/3 = 0      0/1 = 0
  Autumn              2/14 = 0.14   0/2 = 0      1/3 = 0.33   0/1 = 0
  Winter              2/14 = 0.14   2/2 = 1      2/3 = 0.67   0/1 = 0
Naïve Bayesian Classifier
                      Class
Attribute             On Time        Late         Very Late    Cancelled
Fog
  None                5/14 = 0.36    0/2 = 0      0/3 = 0      0/1 = 0
  High                4/14 = 0.29    1/2 = 0.5    1/3 = 0.33   1/1 = 1
  Normal              5/14 = 0.36    1/2 = 0.5    2/3 = 0.67   0/1 = 0
Rain
  None                5/14 = 0.36    1/2 = 0.5    1/3 = 0.33   0/1 = 0
  Slight              8/14 = 0.57    0/2 = 0      0/3 = 0      0/1 = 0
  Heavy               1/14 = 0.07    1/2 = 0.5    2/3 = 0.67   1/1 = 1

Prior Probability     14/20 = 0.70   2/20 = 0.10  3/20 = 0.15  1/20 = 0.05
Naïve Bayesian Classifier
Instance:
Weekday  Winter  High  Heavy  ???

Case 1: Class = On Time:   0.70 × 0.64 × 0.14 × 0.29 × 0.07 = 0.0013
Case 2: Class = Late:      0.10 × 0.50 × 1.00 × 0.50 × 0.50 = 0.0125
Case 3: Class = Very Late: 0.15 × 1.00 × 0.67 × 0.33 × 0.67 = 0.0222
Case 4: Class = Cancelled: 0.05 × 0.00 × 0.00 × 1.00 × 1.00 = 0.0000

Case 3 gives the strongest score; hence the correct classification is Very Late.
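The four scores can be checked mechanically (a minimal sketch in Python, using the rounded values from the tables above):

```python
priors = {"On Time": 0.70, "Late": 0.10, "Very Late": 0.15, "Cancelled": 0.05}
# P(attribute value | class) for the instance (Weekday, Winter, High, Heavy)
likelihoods = {
    "On Time":   [0.64, 0.14, 0.29, 0.07],
    "Late":      [0.50, 1.00, 0.50, 0.50],
    "Very Late": [1.00, 0.67, 0.33, 0.67],
    "Cancelled": [0.00, 0.00, 1.00, 1.00],
}

scores = {}
for c in priors:
    score = priors[c]
    for p in likelihoods[c]:
        score *= p
    scores[c] = score

print(scores)                        # Very Late scores highest (~0.0222)
print(max(scores, key=scores.get))   # 'Very Late'
```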
Naïve Bayesian Classifier
Algorithm: Naïve Bayesian Classification

Input: A set of k mutually exclusive and exhaustive classes
$C = \{c_1, c_2, \ldots, c_k\}$ with prior probabilities
$P(C_1), P(C_2), \ldots, P(C_k)$, and an n-attribute set
$A = \{A_1, A_2, \ldots, A_n\}$ which, for a given instance, has values
$A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n$.

Step: For each $c_i \in C$, calculate the class-conditional score, $i = 1, 2, \ldots, k$:

$$p_i = P(C_i) \times \prod_{j=1}^{n} P(A_j = a_j \mid C_i)$$

$$p_x = \max\{p_1, p_2, \ldots, p_k\}$$

Output: $C_x$ is the classification.

Note: $\sum p_i \neq 1$, because the $p_i$ are not probabilities but values
proportional to the posterior probabilities.
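A sketch of the full algorithm in Python, estimating the priors and conditional probabilities from the 20-tuple air-traffic data and then scoring each class (the function and variable names are ours):

```python
from collections import Counter, defaultdict

# Training tuples: (Day, Season, Fog, Rain) -> Class (the 20-tuple table above).
data = [
    (("Weekday", "Spring", "None", "None"), "On Time"),
    (("Weekday", "Winter", "None", "Slight"), "On Time"),
    (("Weekday", "Winter", "None", "None"), "On Time"),
    (("Holiday", "Winter", "High", "Slight"), "Late"),
    (("Saturday", "Summer", "Normal", "None"), "On Time"),
    (("Weekday", "Autumn", "Normal", "None"), "Very Late"),
    (("Holiday", "Summer", "High", "Slight"), "On Time"),
    (("Sunday", "Summer", "Normal", "None"), "On Time"),
    (("Weekday", "Winter", "High", "Heavy"), "Very Late"),
    (("Weekday", "Summer", "None", "Slight"), "On Time"),
    (("Saturday", "Spring", "High", "Heavy"), "Cancelled"),
    (("Weekday", "Summer", "High", "Slight"), "On Time"),
    (("Weekday", "Winter", "Normal", "None"), "Late"),
    (("Weekday", "Summer", "High", "None"), "On Time"),
    (("Weekday", "Winter", "Normal", "Heavy"), "Very Late"),
    (("Saturday", "Autumn", "High", "Slight"), "On Time"),
    (("Weekday", "Autumn", "None", "Heavy"), "On Time"),
    (("Holiday", "Spring", "Normal", "Slight"), "On Time"),
    (("Weekday", "Spring", "Normal", "None"), "On Time"),
    (("Weekday", "Spring", "Normal", "Heavy"), "On Time"),
]

class_sizes = Counter(label for _, label in data)
priors = {c: n / len(data) for c, n in class_sizes.items()}   # P(Ci)

# counts[(j, c)][v] = number of class-c records whose j-th attribute equals v
counts = defaultdict(Counter)
for attrs, c in data:
    for j, v in enumerate(attrs):
        counts[(j, c)][v] += 1

def classify(instance):
    """Return the class maximizing P(Ci) * prod_j P(Aj = aj | Ci)."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for j, v in enumerate(instance):
            p *= counts[(j, c)][v] / class_sizes[c]   # P(Aj = v | Ci)
        scores[c] = p
    return max(scores, key=scores.get), scores

print(classify(("Weekday", "Winter", "High", "Heavy")))
# -> ('Very Late', {..., 'Very Late': 0.0222..., ...})
```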
Naïve Bayesian Classifier
Pros and Cons
The naïve Bayes’ approach is a very popular one, and it often works well.
However, it has a number of potential problems:
It relies on all attributes being categorical.
If there is little training data, the probability estimates become poor.
Naïve Bayesian Classifier
Approaches to overcome the limitations of naïve Bayesian classification

Estimating the posterior probabilities for continuous attributes

In real-life situations, all attributes are not necessarily categorical; in fact,
there is often a mix of both categorical and continuous attributes.
In the following, we discuss two schemes to deal with continuous attributes in the
Bayesian classifier.

1. We can discretize each continuous attribute and then replace the continuous
attribute values with their corresponding discrete intervals.

2. We can assume a certain form of probability distribution for the continuous
variable and estimate the parameters of the distribution from the training data.
A Gaussian distribution is usually chosen to represent the posterior probabilities
of continuous attributes. The general form of a Gaussian distribution is

$$P(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\mu$ and $\sigma^2$ denote the mean and variance, respectively.
Naïve Bayesian Classifier
For each class $C_i$, the posterior probability for a numeric attribute $A_j$ can
be calculated following the Gaussian normal distribution as

$$P(A_j = a_j \mid C_i) = \frac{1}{\sqrt{2\pi}\,\sigma_{ij}}\, e^{-\frac{(a_j - \mu_{ij})^2}{2\sigma_{ij}^2}}$$

Here, the parameter $\mu_{ij}$ can be calculated as the sample mean of the values
of attribute $A_j$ over the training records that belong to class $C_i$.
Similarly, $\sigma_{ij}^2$ can be estimated as the sample variance of such
training records.
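A sketch of this estimation in Python; the sample values for the numeric attribute are hypothetical, not from the lecture’s data:

```python
import math

# Hypothetical values of numeric attribute A_j among training records of class C_i
values_in_class = [125, 100, 70, 120, 95]
mu = sum(values_in_class) / len(values_in_class)                 # sample mean
var = sum((v - mu) ** 2 for v in values_in_class) / (len(values_in_class) - 1)

def gaussian(x, mu, var):
    """P(A_j = x | C_i) under the Gaussian assumption."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian(110.0, mu, var))
```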
Naïve Bayesian Classifier
M-estimate of Conditional Probability

The M-estimate approach deals with a potential problem of the naïve Bayesian
classifier when the training data size is too small.

If the conditional probability for one of the attribute values is zero, then the
overall class-conditional product for that class vanishes. In other words, if the
training data do not cover many of the attribute values, then we may not be able
to classify some of the test records.

This problem can be addressed by using the M-estimate approach.
M-estimate Approach
The M-estimate approach can be stated as follows:

$$P(A_j = a_j \mid C_i) = \frac{n_{c_i} + m \cdot p}{n + m}$$

where
n = total number of training instances from class $C_i$,
$n_{c_i}$ = number of training examples from class $C_i$ that take the value $A_j = a_j$,
m = a parameter known as the equivalent sample size, and
p = a user-specified parameter (a prior estimate of the probability).

Note: If n = 0, that is, if no training set is available for $C_i$, then
$P(a_j|C_i) = p$; thus p acts as a default value in the absence of samples.
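A sketch of the m-estimate in Python, showing how it removes the zero-probability problem (the numbers are illustrative):

```python
def m_estimate(n_ci, n, m, p):
    """Smoothed estimate of P(Aj = aj | Ci): (n_ci + m*p) / (n + m)."""
    return (n_ci + m * p) / (n + m)

# Plain relative frequency (m = 0): a zero count gives probability 0,
# which wipes out the entire product in the classifier.
print(m_estimate(0, 14, 0, 0.25))   # 0.0
# With m = 3 equivalent samples at prior guess p = 0.25, the zero disappears.
print(m_estimate(0, 14, 3, 0.25))   # ~0.044
```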
A Practice Example
Example 8.6

Class:
C1: buys_computer = ‘yes’
C2: buys_computer = ‘no’

Data instance:
X = (age <= 30, income = medium, student = yes, credit_rating = fair)

age      income   student  credit_rating  buys_computer
<=30     high     no       fair           no
<=30     high     no       excellent      no
31…40    high     no       fair           yes
>40      medium   no       fair           yes
>40      low      yes      fair           yes
>40      low      yes      excellent      no
31…40    low      yes      excellent      yes
<=30     medium   no       fair           no
<=30     low      yes      fair           yes
>40      medium   yes      fair           yes
<=30     medium   yes      excellent      yes
31…40    medium   no       excellent      yes
31…40    high     yes      fair           yes
>40      medium   no       excellent      no
A Practice Example
P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357

Compute P(X|Ci) for each class:
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<=30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

X = (age <= 30, income = medium, student = yes, credit_rating = fair)
P(X|Ci): P(X|buys_computer = “yes”) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 × 0.4 × 0.2 × 0.4 = 0.019

P(X|Ci) × P(Ci): P(X|buys_computer = “yes”) × P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) × P(buys_computer = “no”) = 0.007

Therefore, X belongs to class (“buys_computer = yes”)
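The arithmetic above can be verified in a couple of lines (a sketch in Python, with the counts taken from the table):

```python
p_yes = (9/14) * (2/9) * (4/9) * (6/9) * (6/9)   # P(C1) * P(X|C1)
p_no  = (5/14) * (3/5) * (2/5) * (1/5) * (2/5)   # P(C2) * P(X|C2)
print(round(p_yes, 3), round(p_no, 3))           # 0.028 0.007 -> predict "yes"
```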
Reference
The detailed material related to this lecture can be found in:

Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques,
3rd edn., Morgan Kaufmann, 2011.
Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Introduction to Data Mining,
Addison-Wesley, 2005.