
Probability-Based Learning

Comprehensive Course Material

Lyes HADJAR
May 31, 2025



Course Overview
This comprehensive course covers probability-based learning methods from fundamentals to
advanced techniques, including Naive Bayes and Bayesian Networks with detailed mathematical
derivations and practical examples.

Contents
1 Big Idea
2 Fundamentals
  2.1 Bayes' Theorem
  2.2 Bayesian Prediction
3 Standard Approach: The Naive Bayes Model
  3.1 Naive Bayes Formula
  3.2 Prediction Rule
  3.3 Example: Naive Bayes with Categorical Features
4 Extensions and Variations
  4.1 Smoothing (Laplace Correction)
  4.2 Continuous Features: Probability Density Functions (PDF)
  4.3 Continuous Features: Binning
5 Bayesian Networks
  5.1 Intuition and Purpose
  5.2 Structure and Semantics
  5.3 Conditional Probability Tables (CPTs)
  5.4 Inference in Bayesian Networks
    5.4.1 Exact Inference: Enumeration
    5.4.2 Approximate Inference: Sampling
  5.5 Worked Example: Wet Grass Problem
6 Summary of Key Concepts
7 Practice Problems
  7.1 Problem 1: Medical Diagnosis with Bayes' Theorem
  7.2 Problem 2: Multi-feature Naive Bayes with Smoothing
  7.3 Problem 3: Continuous Features with Gaussian Distribution

1 Big Idea

Big Idea

Probability-based learning is about revising beliefs as new data arrives. It’s built on
Bayes’ Theorem, which connects prior beliefs with observed evidence to form new (pos-
terior) beliefs.

Intuitive Understanding

Imagine you’re playing a game called Find the Lady where a queen is hidden under one
of three cards. If you have no information, you assume equal probability (1/3) for each
card. But if you watch the dealer favor the right position 19 out of 30 times, you revise
your guess — now you believe the queen is more likely to be on the right.
This updating of belief using evidence is the essence of Bayesian thinking.

2 Fundamentals
2.1 Bayes’ Theorem

Intuitive Understanding

Intuitively, Bayes’ Theorem tells us:

Updated belief ∝ Evidence-based likelihood × Prior belief

Formal Idea

Formally:

P(t | d) = P(d | t) · P(t) / P(d)    (1)
Where:

• P (t | d): Posterior probability (probability of target t given data d)

• P (d | t): Likelihood (probability of seeing data d if target is t)

• P (t): Prior (our belief about t before seeing d)

• P (d): Evidence (overall probability of data d)

We often ignore P (d) for classification because it is the same for all classes.

2.2 Bayesian Prediction


To predict a class for an instance d, we compute:

P(t | d) ∝ P(d | t) · P(t)    (2)

We choose the class with the highest posterior, called the MAP (Maximum A Posteriori) prediction:

t* = arg max_t P(d | t) · P(t)    (3)
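As a small illustration, here is a minimal Python sketch of the MAP rule (the priors and likelihoods are made-up numbers, not taken from these notes): score each class by P(d | t) · P(t) and pick the largest.

```python
# Minimal MAP prediction sketch (hypothetical numbers, purely illustrative).
priors = {"spam": 0.5, "not_spam": 0.5}          # P(t)
likelihoods = {"spam": 0.36, "not_spam": 0.02}   # P(d | t) for the observed d

# Score each class by P(d | t) * P(t); P(d) is the same for all classes.
scores = {t: likelihoods[t] * priors[t] for t in priors}
t_map = max(scores, key=scores.get)
print(scores, "->", t_map)
```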

3 Standard Approach: The Naive Bayes Model


The problem with Bayes’ approach is scalability — we need to estimate probabilities for every
possible combination of features. Naive Bayes simplifies this using a strong assumption:

Formal Idea
Naive Bayes Assumption: All features are conditionally independent given the
target.
This reduces the complexity significantly.

3.1 Naive Bayes Formula


Given features d_1, d_2, ..., d_m and class t:

P(t | d) ∝ P(t) · ∏_{i=1}^{m} P(d_i | t)    (4)

3.2 Prediction Rule


t̂ = arg max_t [ P(t) · ∏_{i=1}^{m} P(d_i | t) ]    (5)

3.3 Example: Naive Bayes with Categorical Features

Example

Assume a dataset for predicting whether an email is SPAM or NOT SPAM based on
three binary features:

• Free: email contains the word "free"

• Buy: email contains the word "buy"

• Click: email contains the word "click"

The dataset is summarized below, where each row shows a unique feature combination
and how many emails fall into that category.
Class Free Buy Click Count
Spam Yes Yes Yes 3
Spam No Yes Yes 2
Not No No Yes 3
Not No No No 2
Total number of training examples: 3 + 2 + 3 + 2 = 10
You want to predict the class of an email with: {Free=Yes, Buy=Yes, Click=Yes}.
Step 1: Compute Prior Probabilities

• P(Spam) = (3 + 2)/10 = 5/10 = 0.5

• P(Not) = (3 + 2)/10 = 5/10 = 0.5

Step 2: Compute Conditional Probabilities (Likelihoods)


For class Spam (total Spam emails = 5):

• P(Free = Yes | Spam) = 3/5 = 0.6

• P(Buy = Yes | Spam) = (3 + 2)/5 = 5/5 = 1.0

• P(Click = Yes | Spam) = (3 + 2)/5 = 5/5 = 1.0

For class Not (total Not emails = 5):

• P(Free = Yes | Not) = 0/5 = 0

• P(Buy = Yes | Not) = 0/5 = 0

• P(Click = Yes | Not) = 3/5 = 0.6

Step 3: Apply Naive Bayes Formula for Each Class


For Spam:

P(Spam | d) ∝ P(Spam) · P(Free = Yes | Spam) · P(Buy = Yes | Spam) · P(Click = Yes | Spam)
            = 0.5 · 0.6 · 1.0 · 1.0
            = 0.3

For Not:

P(Not | d) ∝ P(Not) · P(Free = Yes | Not) · P(Buy = Yes | Not) · P(Click = Yes | Not)
           = 0.5 · 0 · 0 · 0.6
           = 0

Final Prediction: Since P(Spam | d) > P(Not | d), the model predicts Spam.
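The same hand calculation can be scripted. Below is a minimal Python sketch, using the counts from the table above, that reproduces the unsmoothed estimates and the two class scores (0.3 for Spam, 0 for Not).

```python
# Unsmoothed Naive Bayes by hand, using the counts from the spam example above.
counts = {  # (Free, Buy, Click) -> number of training emails, per class
    "Spam": {("Yes", "Yes", "Yes"): 3, ("No", "Yes", "Yes"): 2},
    "Not":  {("No", "No", "Yes"): 3, ("No", "No", "No"): 2},
}
total = sum(sum(c.values()) for c in counts.values())  # 10 training emails

def feature_prob(cls, index, value):
    """P(feature at `index` = value | cls), estimated from raw counts."""
    cls_total = sum(counts[cls].values())
    matching = sum(n for combo, n in counts[cls].items() if combo[index] == value)
    return matching / cls_total

test = ("Yes", "Yes", "Yes")  # Free=Yes, Buy=Yes, Click=Yes
for cls in counts:
    prior = sum(counts[cls].values()) / total
    score = prior
    for i, value in enumerate(test):
        score *= feature_prob(cls, i, value)
    print(cls, round(score, 3))   # Spam -> 0.3, Not -> 0.0
```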

Warning

Question: In this example, why did we calculate P (Click = Y es | Spam) but not
P (Click = N o | Spam)?
Answer: This is an excellent question that highlights a key principle of Naive Bayes
classification.
Key Principle: In Naive Bayes, we only compute the probabilities of the specific
feature values observed in the test instance we’re trying to classify.
Our test email has: {Free=Yes, Buy=Yes, Click=Yes}
Therefore, for each class, we calculate:

P (Free=Yes | Class) · P (Buy=Yes | Class) · P (Click=Yes | Class)

We do not need P (Click=No | Spam) because the test instance has Click=Yes, not
Click=No.
When would we compute P (Click=No | Spam)?

• Only if the test instance had Click=No

• When building a complete probability table for analysis

• When predicting a different email with Click=No



Intuition: Think of it as asking "Given this exact email, how likely is it to be spam?"
We’re not concerned with emails that look different — only the observed features matter
for this specific prediction.

4 Extensions and Variations


4.1 Smoothing (Laplace Correction)

Warning

If a feature-value/class combination never appears in training, its estimated probability becomes zero. This causes the entire product to be zero, even if other features are strongly predictive.

To solve this, we apply Laplace Smoothing with smoothing parameter k:


P(d_i = v | t) = (count(d_i = v, t) + k) / (count(t) + k · V)    (6)
Where:
• k is the smoothing parameter (typically k = 1 for Laplace smoothing)
• V is the number of possible values for feature di (for binary features, V = 2)
• Adding k to numerator and kV to denominator avoids zero probabilities

Example

Applying Laplace Smoothing to the Previous Example (using k = 1):

Updated Conditional Probabilities with Smoothing (k = 1, V = 2 for binary features):

For class Spam (total Spam emails = 5):

• P(Free = Yes | Spam) = (3 + 1)/(5 + 1 · 2) = 4/7 ≈ 0.571

• P(Buy = Yes | Spam) = (5 + 1)/(5 + 1 · 2) = 6/7 ≈ 0.857

• P(Click = Yes | Spam) = (5 + 1)/(5 + 1 · 2) = 6/7 ≈ 0.857

For class Not (total Not emails = 5):

• P(Free = Yes | Not) = (0 + 1)/(5 + 1 · 2) = 1/7 ≈ 0.143

• P(Buy = Yes | Not) = (0 + 1)/(5 + 1 · 2) = 1/7 ≈ 0.143

• P(Click = Yes | Not) = (3 + 1)/(5 + 1 · 2) = 4/7 ≈ 0.571
Updated Naive Bayes Predictions with Smoothing:
For Spam:

P(Spam | d) ∝ P(Spam) · P(Free = Yes | Spam) · P(Buy = Yes | Spam) · P(Click = Yes | Spam)
            = 0.5 · (4/7) · (6/7) · (6/7)
            = 0.5 · 144/343 ≈ 0.210

For Not:

P(Not | d) ∝ P(Not) · P(Free = Yes | Not) · P(Buy = Yes | Not) · P(Click = Yes | Not)
           = 0.5 · (1/7) · (1/7) · (4/7)
           = 0.5 · 4/343 ≈ 0.006

Final Prediction with Smoothing: Since P(Spam | d) > P(Not | d) (0.210 > 0.006), the model still predicts Spam, but now with non-zero probabilities for both classes.
Key Improvement: Without smoothing, P(Not | d) = 0 due to zero probabilities. With smoothing (k = 1), both classes have non-zero probabilities, making the model more robust and providing meaningful probability estimates.
Note on Smoothing Parameter: Different values of k can be used:

• k = 1: Standard Laplace smoothing (add-one smoothing)

• k < 1: Less aggressive smoothing for large datasets

• k > 1: More aggressive smoothing for small or noisy datasets
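To see the effect of k concretely, the small helper below (a sketch, not part of the original notes) applies the smoothed estimate from Equation (6) to the spam counts and reproduces the 4/7, 6/7, and 1/7 values above.

```python
# Laplace-smoothed estimate: (count(d_i = v, t) + k) / (count(t) + k * V)
def smoothed_prob(value_count, class_count, k=1, V=2):
    return (value_count + k) / (class_count + k * V)

# Counts from the spam example (5 Spam emails, 5 Not emails).
print(smoothed_prob(3, 5))         # P(Free=Yes | Spam) = 4/7 ≈ 0.571
print(smoothed_prob(5, 5))         # P(Buy=Yes  | Spam) = 6/7 ≈ 0.857
print(smoothed_prob(0, 5))         # P(Free=Yes | Not)  = 1/7 ≈ 0.143
print(smoothed_prob(0, 5, k=0.5))  # smaller k -> less aggressive smoothing
```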

4.2 Continuous Features: Probability Density Functions (PDF)


Instead of estimating probabilities for every value, we model P (di | t) using a probability distri-
bution, typically Gaussian (normal distribution).

Formal Idea

The Gaussian formula:

P(d_i | t) = (1 / √(2πσ²)) · exp(−(d_i − µ)² / (2σ²))    (7)

Where:

• µ: mean of feature d_i for class t

• σ²: variance of feature d_i for class t

Example

Step-by-step example:
If for class Spam:

• Feature: word count d_i = 50

• µ = 40, σ² = 25

Then:

P(d_i | t = Spam) = (1 / √(2π · 25)) · exp(−(50 − 40)² / (2 · 25))
                  = (1 / √157.08) · exp(−100/50)
                  = (1 / 12.53) · exp(−2)
                  ≈ 0.0798 · 0.1353 ≈ 0.0108
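Such density values are easy to check numerically. A minimal Python sketch (note that scipy.stats.norm takes the standard deviation √σ² as its scale argument, not the variance):

```python
import math
from scipy.stats import norm

mu, var = 40.0, 25.0
x = 50.0

# Direct formula: (1 / sqrt(2*pi*var)) * exp(-(x - mu)^2 / (2*var))
manual = (1.0 / math.sqrt(2 * math.pi * var)) * math.exp(-(x - mu) ** 2 / (2 * var))

# Same value via scipy; norm expects the standard deviation as `scale`.
library = norm.pdf(x, loc=mu, scale=math.sqrt(var))

print(round(manual, 4), round(library, 4))  # both ≈ 0.0108
```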

4.3 Continuous Features: Binning


When dealing with continuous features in Naive Bayes, we cannot directly compute probabilities
like P (Age = 25.7 | Class) because continuous values have infinite possibilities. One effective
strategy is to discretize continuous values into bins (intervals) and treat them as categorical
features.

Big Idea

Core Concept: Transform continuous features into discrete categorical features by grouping similar values into meaningful intervals.

Steps for Binning Approach:


1. Choose bin boundaries based on domain knowledge or data distribution
2. Assign each continuous value to its corresponding bin
3. Apply standard Naive Bayes treating bins as categorical values
4. Compute probabilities using bin frequencies

Example

Age Binning Example:


Suppose we have a dataset predicting loan approval based on age:
Step 1: Define Age Bins
• Bin 1: [18-25] (Young adults)

• Bin 2: [26-40] (Early career)

• Bin 3: [41-55] (Mid career)

• Bin 4: [56+] (Senior)


Step 2: Convert Continuous Ages to Bins
Original Age Assigned Bin Loan Approved Class
23 [18-25] Yes Approved
34 [26-40] No Rejected
19 [18-25] Yes Approved
45 [41-55] Yes Approved
62 [56+] No Rejected
Step 3: Compute Conditional Probabilities
For Approved class (3 instances):

• P(Age ∈ [18-25] | Approved) = 2/3

• P(Age ∈ [26-40] | Approved) = 0/3

• P(Age ∈ [41-55] | Approved) = 1/3

• P(Age ∈ [56+] | Approved) = 0/3

For Rejected class (2 instances):

• P(Age ∈ [18-25] | Rejected) = 0/2

• P(Age ∈ [26-40] | Rejected) = 1/2

• P(Age ∈ [41-55] | Rejected) = 0/2

• P(Age ∈ [56+] | Rejected) = 1/2

Step 4: Make Predictions

For a new applicant aged 28 (falls in the [26-40] bin):

P(Approved | Age = 28) ∝ P(Approved) · P(Age ∈ [26-40] | Approved) = (3/5) · (0/3) = 0

P(Rejected | Age = 28) ∝ P(Rejected) · P(Age ∈ [26-40] | Rejected) = (2/5) · (1/2) = 1/5

Prediction: Rejected (but Laplace smoothing should be used in practice!)

Advantages of Binning:
• Simple to implement and understand
• Works well when meaningful intervals exist
• Robust to outliers within bins
• Preserves interpretability
Disadvantages of Binning:
• Loss of information due to discretization
• Arbitrary bin boundaries can affect results
• May not capture smooth relationships
• Requires domain knowledge for optimal binning

Intuitive Understanding

Choosing Good Bins:

• Equal-width bins: Divide range into equal intervals

• Equal-frequency bins: Each bin contains same number of samples

• Domain-based bins: Use meaningful thresholds (e.g., age groups)

• Decision tree-based: Use optimal splits from decision trees
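As a concrete illustration of the binning step, the sketch below (assuming the age bins defined above) maps raw ages to bin labels with numpy.digitize; the resulting labels can then be treated as categorical features exactly as in Section 3.

```python
import numpy as np

ages = np.array([23, 34, 19, 45, 62, 28])
# Bin edges for [18-25], [26-40], [41-55], [56+] (right edges inclusive).
edges = [25, 40, 55]
labels = ["[18-25]", "[26-40]", "[41-55]", "[56+]"]

bin_idx = np.digitize(ages, bins=edges, right=True)  # indices 0..3
binned = [labels[i] for i in bin_idx]
print(list(zip(ages.tolist(), binned)))
# [(23, '[18-25]'), (34, '[26-40]'), (19, '[18-25]'),
#  (45, '[41-55]'), (62, '[56+]'), (28, '[26-40]')]
```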



5 Bayesian Networks
5.1 Intuition and Purpose

Intuitive Understanding

A Bayesian Network is a graphical model that represents the probabilistic relationships between variables using a Directed Acyclic Graph (DAG). Each node represents a variable, and each edge represents a conditional dependency.

Unlike Naive Bayes, Bayesian Networks allow you to relax the conditional independence assumption and explicitly model dependencies between features.

5.2 Structure and Semantics


Let each node in the DAG represent a random variable. The network defines a joint probability
distribution over all variables:
P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))    (11)

Each variable is conditionally independent of its non-descendants, given its parents.

Example

Example DAG:

[Diagram: Weather → Sprinkler, Weather → Rain, Sprinkler → WetGrass, Rain → WetGrass]

Dependency structure:

• Weather → Sprinkler

• Weather → Rain

• Sprinkler, Rain → WetGrass



Joint probability factorization:

P(Weather, Sprinkler, Rain, WetGrass) = P(Weather) · P(Sprinkler | Weather) · P(Rain | Weather) · P(WetGrass | Sprinkler, Rain)    (12)

Intuitive Understanding

Reading the graph:

• Weather is the root node — it has no parents and influences both Sprinkler and
Rain

• Sprinkler and Rain are intermediate nodes — they depend on Weather and influ-
ence WetGrass

• WetGrass is the leaf node — it depends on both Sprinkler and Rain but influences
nothing else

The arrows show causal relationships: Weather affects whether the sprinkler is on and
whether it rains, and both of these factors determine if the grass gets wet.

5.3 Conditional Probability Tables (CPTs)


Each node has a Conditional Probability Table (CPT) specifying probabilities for each
combination of parent values.

Example

Example: Rain | Weather

Weather Rain=True Rain=False


Sunny 0.1 0.9
Cloudy 0.4 0.6
Rainy 0.8 0.2
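In code, a CPT is often just a nested lookup table keyed by the parent values. A minimal sketch mirroring the Rain | Weather table above:

```python
# CPT for P(Rain | Weather), keyed by the parent's value.
cpt_rain = {
    "Sunny":  {"True": 0.1, "False": 0.9},
    "Cloudy": {"True": 0.4, "False": 0.6},
    "Rainy":  {"True": 0.8, "False": 0.2},
}

print(cpt_rain["Cloudy"]["True"])  # P(Rain=True | Weather=Cloudy) = 0.4
```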

5.4 Inference in Bayesian Networks


Inference means computing the posterior probability of one or more variables given some
observed evidence.

5.4.1 Exact Inference: Enumeration


To compute P(Q | E = e), where Q is the query and E is the evidence:

P(Q | E = e) = Σ_H P(Q, E = e, H) / Σ_{Q,H} P(Q, E = e, H)    (15)

Where H are hidden variables. This can be slow for large networks.

5.4.2 Approximate Inference: Sampling


We can use methods like:

• Likelihood weighting
• Gibbs sampling
• Rejection sampling
These simulate values to estimate probabilities without full enumeration.

5.5 Worked Example: Wet Grass Problem

Example

DAG Structure: Weather (W) → Sprinkler (S), Weather (W) → Rain (R), and both S and R → Wet Grass (G).

Variables:

• W: Weather (Sunny, Rainy)

• S: Sprinkler (On/Off)

• R: Rain (Yes/No)

• G: Wet Grass (Yes/No)

Goal: Compute P (R = Yes | G = Yes)


Step 1: Write joint distribution

P (W, S, R, G) = P (W )P (S | W )P (R | W )P (G | S, R)

Step 2: Use enumeration to marginalize non-query variables

P(R = Yes, G = Yes) = Σ_{W,S} P(W) · P(S | W) · P(R = Yes | W) · P(G = Yes | S, R = Yes)    (16)

Step 3: Compute the denominator to normalize

P(G = Yes) = Σ_R P(R, G = Yes)

Step 4: Compute the posterior

P(R = Yes | G = Yes) = P(R = Yes, G = Yes) / P(G = Yes)

Interpretation: This answers the question: "If the grass is wet, what is the probability
that it rained?"
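To make the enumeration concrete, the sketch below plugs in hypothetical CPT values (the notes do not specify them, so these numbers are purely illustrative) and computes P(R = Yes | G = Yes) exactly by summing the joint distribution over the hidden variables W and S.

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers only; the notes leave them unspecified).
P_W = {"Sunny": 0.7, "Rainy": 0.3}
P_S_given_W = {"Sunny": {"On": 0.4, "Off": 0.6}, "Rainy": {"On": 0.1, "Off": 0.9}}
P_R_given_W = {"Sunny": {"Yes": 0.1, "No": 0.9}, "Rainy": {"Yes": 0.8, "No": 0.2}}
P_G_given_SR = {  # P(G=Yes | S, R); P(G=No | S, R) = 1 - value
    ("On", "Yes"): 0.99, ("On", "No"): 0.9,
    ("Off", "Yes"): 0.8, ("Off", "No"): 0.0,
}

def joint(w, s, r, g):
    """P(W=w, S=s, R=r, G=g) via the factorization from Step 1."""
    p_g_yes = P_G_given_SR[(s, r)]
    p_g = p_g_yes if g == "Yes" else 1 - p_g_yes
    return P_W[w] * P_S_given_W[w][s] * P_R_given_W[w][r] * p_g

# Numerator: sum over hidden W and S with R=Yes, G=Yes fixed.
num = sum(joint(w, s, "Yes", "Yes") for w, s in product(P_W, ["On", "Off"]))
# Denominator: additionally sum over R.
den = sum(joint(w, s, r, "Yes") for w, s, r in product(P_W, ["On", "Off"], ["Yes", "No"]))

print(round(num / den, 3))  # P(R=Yes | G=Yes) under the assumed CPTs
```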

Summary

Bayesian Networks are powerful for encoding structured probabilistic relationships and
allow for efficient reasoning under uncertainty.

6 Summary of Key Concepts

Summary
• Bayes' Theorem: Updates beliefs using evidence. P(t | d) = P(d | t) · P(t) / P(d)

• Naive Bayes: Assumes feature independence. P(t | d) ∝ P(t) · ∏_i P(d_i | t)

• Smoothing: Laplace correction avoids zero probabilities.

• Continuous Features: Modeled using Gaussian PDFs or binned into intervals.

• Bayesian Networks: DAGs that model variable dependencies with conditional probability tables.

• Inference: Can be exact (enumeration) or approximate (sampling methods).



7 Practice Problems
7.1 Problem 1: Medical Diagnosis with Bayes’ Theorem

Practice Problem

A rare disease affects 0.2% of the population. A medical test has:

• Sensitivity (true positive rate): P(Test+ | Disease+) = 0.95

• Specificity (true negative rate): P(Test− | Disease−) = 0.98

• Prior probability: P(Disease+) = 0.002

Part A: What is the probability of a false positive, P(Test+ | Disease−)?

Part B: If someone tests positive, what is the probability they actually have the disease, P(Disease+ | Test+)?

Solution:
Part A:

P(Test+ | Disease−) = 1 − P(Test− | Disease−) = 1 − 0.98 = 0.02

Part B: Using Bayes’ theorem:

P(Disease+ | Test+) = P(Test+ | Disease+) · P(Disease+) / P(Test+)    (19)

First, compute P(Test+) using the law of total probability:

P(Test+) = P(Test+ | Disease+) · P(Disease+) + P(Test+ | Disease−) · P(Disease−)
         = 0.95 × 0.002 + 0.02 × 0.998
         = 0.0019 + 0.01996 = 0.02186

Therefore:

P(Disease+ | Test+) = (0.95 × 0.002) / 0.02186 = 0.0019 / 0.02186 ≈ 0.087
Insight: Despite a 95% accurate test, a positive result only indicates 8.7% chance of
having the disease due to the low base rate!
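A quick numerical check of this result (simply re-running the arithmetic above in Python):

```python
sens, spec, prior = 0.95, 0.98, 0.002

p_test_pos = sens * prior + (1 - spec) * (1 - prior)  # law of total probability
posterior = sens * prior / p_test_pos                 # Bayes' theorem
print(round(p_test_pos, 5), round(posterior, 3))      # 0.02186, 0.087
```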

7.2 Problem 2: Multi-feature Naive Bayes with Smoothing

Practice Problem
Email classification dataset with three binary features:

Email Contains "Free" Contains "Click" Length > 100 Class


1 Yes No Yes Spam
2 Yes Yes No Spam
3 No Yes Yes Spam
4 Yes Yes Yes Spam
5 No No No Ham
6 No No Yes Ham
7 No Yes No Ham
8 No No No Ham

Part A: Calculate all conditional probabilities with Laplace smoothing (k = 1, V = 2).


Part B: Classify a new email: [Contains "Free" = Yes, Contains "Click" = No, Length
> 100 = Yes]
Part C: What happens if we don’t use smoothing? Calculate the difference.
Solution:
Part A: Prior probabilities: P(Spam) = 4/8 = 0.5, P(Ham) = 4/8 = 0.5
With Laplace smoothing (k = 1, V = 2):
For Spam (4 emails):

• P(Free = Yes | Spam) = (3 + 1)/(4 + 2) = 4/6 ≈ 0.667

• P(Click = Yes | Spam) = (3 + 1)/(4 + 2) = 4/6 ≈ 0.667

• P(Length > 100 = Yes | Spam) = (3 + 1)/(4 + 2) = 4/6 ≈ 0.667

For Ham (4 emails):

• P(Free = Yes | Ham) = (0 + 1)/(4 + 2) = 1/6 ≈ 0.167

• P(Click = Yes | Ham) = (1 + 1)/(4 + 2) = 2/6 ≈ 0.333

• P(Length > 100 = Yes | Ham) = (1 + 1)/(4 + 2) = 2/6 ≈ 0.333

Part B: For the new email [Free = Yes, Click = No, Length > 100 = Yes]:

(With k = 1 and V = 2, the smoothed Yes and No probabilities for a feature sum to 1, so P(Click = No | Spam) = 1 − 0.667 = 0.333 and P(Click = No | Ham) = 1 − 0.333 = 0.667.)

For Spam:

P(Spam | features) ∝ 0.5 × 0.667 × (1 − 0.667) × 0.667
                   = 0.5 × 0.667 × 0.333 × 0.667
                   ≈ 0.074

For Ham:

P(Ham | features) ∝ 0.5 × 0.167 × (1 − 0.333) × 0.333
                  = 0.5 × 0.167 × 0.667 × 0.333
                  ≈ 0.019

Prediction: Spam (0.074 > 0.019)


Part C: Without smoothing, P(Free = Yes | Ham) = 0/4 = 0, making the entire Ham probability 0, regardless of other features. Smoothing prevents this zero-probability trap.
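For a library cross-check, scikit-learn's BernoulliNB applies the same add-k smoothing to binary features, (count + α) / (class count + 2α), so with alpha = 1 it should reproduce these estimates. A minimal sketch, assuming the 8-email dataset above encoded as 0/1 feature vectors:

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Columns: Free, Click, Length>100 (1 = Yes); rows follow emails 1-8 above.
X = np.array([
    [1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1],   # Spam
    [0, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 0],   # Ham
])
y = np.array(["Spam"] * 4 + ["Ham"] * 4)

clf = BernoulliNB(alpha=1.0)          # Laplace smoothing, k = 1
clf.fit(X, y)

new_email = np.array([[1, 0, 1]])     # Free=Yes, Click=No, Length>100=Yes
print(clf.predict(new_email))         # expected: ['Spam']
print(clf.predict_proba(new_email))   # ≈ [0.20, 0.80] for classes ['Ham', 'Spam']
```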

7.3 Problem 3: Continuous Features with Gaussian Distribution

Practice Problem
Height classification for gender prediction:
Training Data:

• Male heights (cm): [175, 180, 170, 185, 178, 172, 183, 177]

• Female heights (cm): [160, 165, 158, 162, 167, 163, 159, 164]

Part A: Calculate mean and variance for each class.


Part B: What’s the probability of height 174 cm given Male? Given Female?
Part C: Classify someone who is 174 cm tall (assume equal priors).
Solution:
Part A (using the sample variance, i.e. dividing by n − 1 = 7):

Male: µ_M = 177.5 cm, σ²_M ≈ 26.57 cm²

Female: µ_F = 162.25 cm, σ²_F ≈ 9.64 cm²
Part B:

P(174 | Male) = (1 / √(2π × 26.57)) · exp(−(174 − 177.5)² / (2 × 26.57))
              = (1 / 12.92) · exp(−12.25 / 53.14)
              ≈ 0.0774 × 0.794 ≈ 0.061

P(174 | Female) = (1 / √(2π × 9.64)) · exp(−(174 − 162.25)² / (2 × 9.64))
                = (1 / 7.78) · exp(−138.06 / 19.29)
                ≈ 0.1285 × 0.00078 ≈ 0.0001

Part C: Since P(174 | Male) ≫ P(174 | Female), classify as Male.
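These numbers can be verified in a few lines of NumPy (a sketch; np.var(..., ddof=1) gives the sample variance used in Part A):

```python
import numpy as np

male = np.array([175, 180, 170, 185, 178, 172, 183, 177], dtype=float)
female = np.array([160, 165, 158, 162, 167, 163, 159, 164], dtype=float)

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = 174.0
for name, heights in [("Male", male), ("Female", female)]:
    mu, var = heights.mean(), heights.var(ddof=1)   # sample variance (n - 1)
    print(name, round(mu, 2), round(var, 2), round(gaussian_pdf(x, mu, var), 4))
# Male   177.5   26.57  0.0615
# Female 162.25   9.64  0.0001
```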

You might also like