
Aprendizagem 2023

Lab 3: Bayesian learning

Practical exercises

I. Probability theory

1. Consider the following registry where an experiment is repeated six times and four events (A, B, C and D) are detected.

         A   B   C   D
   𝐱1    1   1   0   0
   𝐱2    1   1   1   0
   𝐱3    0   0   0   1
   𝐱4    0   0   0   1
   𝐱5    0   0   0   0
   𝐱6    0   0   0   0

Considering frequentist estimates, compute:

𝑝(𝐴) = 2/6        𝑝(𝐴, 𝐵) = 2/6        𝑝(𝐴, 𝐵, 𝐶) = 1/6        𝑝(𝐴, 𝐵, 𝐶, 𝐷) = 0
𝑝(𝐵|𝐴) = 1        𝑝(𝐴|𝐵, 𝐶) = 1        𝑝(𝐷|𝐴, 𝐵, 𝐶) = 0
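As a quick check, a minimal numpy sketch of these frequentist estimates (the array encoding of the registry is ours):

```python
import numpy as np

# registry: one row per repetition, columns A, B, C, D (1 = event detected)
R = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])
A, B, C, D = R.T.astype(bool)

p_A = A.mean()                                     # 2/6
p_AB = (A & B).mean()                              # 2/6
p_ABC = (A & B & C).mean()                         # 1/6
p_B_given_A = (A & B).sum() / A.sum()              # 1
p_A_given_BC = (A & B & C).sum() / (B & C).sum()   # 1
print(p_A, p_AB, p_ABC, p_B_given_A, p_A_given_BC)
```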

2. Consider the following two-dimensional measurements {(-2,2), (-1,3), (0,1), (-2,1)}.


a) What are the maximum likelihood parameters of a multivariate Gaussian distribution for this
set of points?

𝑁(𝐱|𝜇, Σ), with 𝜇 = [−1.25, 1.75]ᵀ, Σ = [0.92, −0.083; −0.083, 0.92] (rows separated by semicolons), 𝑑𝑒𝑡(Σ) = 0.83, Σ⁻¹ = [1.1, 0.1; 0.1, 1.1]
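A minimal numpy sketch of this estimation (the values above match np.cov's default 1/(n−1) normalization):

```python
import numpy as np

# the four 2-D measurements from the exercise
X = np.array([[-2, 2], [-1, 3], [0, 1], [-2, 1]], dtype=float)

mu = X.mean(axis=0)                 # [-1.25, 1.75]
Sigma = np.cov(X, rowvar=False)     # ddof=1 by default -> the 0.92 / -0.083 values above
det = np.linalg.det(Sigma)          # ~0.83
Sigma_inv = np.linalg.inv(Sigma)    # ~[[1.1, 0.1], [0.1, 1.1]]

print(mu, Sigma, det, Sigma_inv, sep="\n")
```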

b) What is the shape of the Gaussian?


Draw it approximately using a contour map.
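The notes do not include the drawing itself. Since both variances are equal (0.92) and the covariance is small and negative (−0.083), the contours are near-circular ellipses centred at (−1.25, 1.75), slightly elongated along the (1, −1) direction. A minimal matplotlib sketch (plotting choices are ours):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

mu = np.array([-1.25, 1.75])
Sigma = np.array([[0.92, -0.083], [-0.083, 0.92]])

# evaluate the density on a grid around the mean and draw its contours
xs, ys = np.meshgrid(np.linspace(-4, 2, 200), np.linspace(-1, 4.5, 200))
density = multivariate_normal(mean=mu, cov=Sigma).pdf(np.dstack((xs, ys)))

plt.contour(xs, ys, density)
plt.scatter([-2, -1, 0, -2], [2, 3, 1, 1], marker="x")  # the four data points
plt.xlabel("y1"); plt.ylabel("y2")
plt.show()
```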

II. Bayesian learning

3. Consider the following dataset where:


− 0: False and 1: True
− y1: Fast processing
− y2: Decent Battery
− y3: Good Camera
− y4: Good Look and Feel
− y5: Easiness of Use
− class: iPhone

         y1   y2   y3   y4   y5   class
   𝐱1    1    1    0    1    0      1
   𝐱2    1    1    1    0    0      0
   𝐱3    0    1    1    1    0      0
   𝐱4    0    0    0    1    1      0
   𝐱5    1    0    1    1    1      1
   𝐱6    0    0    1    0    0      1
   𝐱7    0    0    0    0    1      1
And the query vector 𝐱 new = [1 1 1 1 1]𝑇
a) Using Bayes’ rule, without making any assumptions, compute the posterior probabilities for
the query vector. How is it classified?
𝑝(𝐶 = 0) = 3/7, 𝑝(𝐶 = 1) = 4/7

𝑝(𝐶 = 0 | 𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1) = 𝑝(𝐶 = 0) 𝑝(𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1 | 𝐶 = 0) / 𝑝(𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1)

𝑝(𝐶 = 1 | 𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1) = 𝑝(𝐶 = 1) 𝑝(𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1 | 𝐶 = 1) / 𝑝(𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1)
According to our estimated likelihoods, the denominators are equal to zero. Posteriors are not defined and,
thus, we cannot classify the input. A small training sample is not enough to decide under a classic Bayes rule.
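A quick empirical check (a sketch, with the table encoded as a numpy array): no training row matches the query, so the frequentist estimate of 𝑝(𝑦1 = 1, …, 𝑦5 = 1) is 0/7 and the posterior is 0/0.

```python
import numpy as np

X = np.array([[1,1,0,1,0],
              [1,1,1,0,0],
              [0,1,1,1,0],
              [0,0,0,1,1],
              [1,0,1,1,1],
              [0,0,1,0,0],
              [0,0,0,0,1]])
y = np.array([1, 0, 0, 0, 1, 1, 1])
x_new = np.array([1, 1, 1, 1, 1])

matches = (X == x_new).all(axis=1)
print(matches.sum() / len(X))                        # p(y1=1,...,y5=1) = 0/7 -> posterior is 0/0
print(matches[y == 0].sum(), matches[y == 1].sum())  # per-class counts, both 0
```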

b) What is the problem of working without assumptions?

There is rarely enough data to estimate a meaningful joint distribution: the number of joint configurations grows exponentially with the number of features, which is especially problematic for high-dimensional datasets or small samples.

c) Compute the class for the same query vector under the naive Bayes assumption.

𝑝(𝐶 = 0 | 𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1) = 𝑝(𝐶 = 0) ∏ᵢ 𝑝(𝑦𝑖 = 1 | 𝐶 = 0) / 𝑝(𝑦1 = 1, …, 𝑦5 = 1)
   = (3/7 × 1/3 × 2/3 × 2/3 × 2/3 × 1/3) / 𝑝(𝑦1 = 1, …, 𝑦5 = 1) = 0.0141 / 𝑝(𝑦1 = 1, …, 𝑦5 = 1)

𝑝(𝐶 = 1 | 𝑦1 = 1, 𝑦2 = 1, 𝑦3 = 1, 𝑦4 = 1, 𝑦5 = 1) = (4/7 × 2/4 × 1/4 × 2/4 × 2/4 × 2/4) / 𝑝(𝑦1 = 1, …, 𝑦5 = 1) = 0.0090 / 𝑝(𝑦1 = 1, …, 𝑦5 = 1)
Label 𝐶 = 0 (not an iPhone).
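A minimal numpy sketch of the naive Bayes computation above (frequentist likelihood estimates, no smoothing; the array encoding is ours):

```python
import numpy as np

# dataset from question 3 (columns y1..y5), class: iPhone
X = np.array([[1,1,0,1,0],
              [1,1,1,0,0],
              [0,1,1,1,0],
              [0,0,0,1,1],
              [1,0,1,1,1],
              [0,0,1,0,0],
              [0,0,0,0,1]])
y = np.array([1, 0, 0, 0, 1, 1, 1])
x_new = np.array([1, 1, 1, 1, 1])

for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    # p(y_i = x_new_i | C = c) estimated by counting, independence assumed
    likelihoods = np.where(x_new == 1, Xc.mean(axis=0), 1 - Xc.mean(axis=0))
    print(c, prior * likelihoods.prod())   # 0.0141 for c=0, 0.0090 for c=1
```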

d) Consider the presence of missing values. Under the same naive Bayes assumption, how do you
classify 𝐱new = [1 ? 1 ? 1]ᵀ?

𝑝(𝐶 = 0 | 𝑦1 = 1, 𝑦2 = ?, 𝑦3 = 1, 𝑦4 = ?, 𝑦5 = 1) = (3/7 × 1/3 × 2/3 × 1/3) / 𝑝(𝑦1 = 1, 𝑦3 = 1, 𝑦5 = 1) = 0.03175 / 𝑝(𝑦1 = 1, 𝑦3 = 1, 𝑦5 = 1)

𝑝(𝐶 = 1 | 𝑦1 = 1, 𝑦2 = ?, 𝑦3 = 1, 𝑦4 = ?, 𝑦5 = 1) = (4/7 × 2/4 × 2/4 × 2/4) / 𝑝(𝑦1 = 1, 𝑦3 = 1, 𝑦5 = 1) = 0.0714 / 𝑝(𝑦1 = 1, 𝑦3 = 1, 𝑦5 = 1)

Under the naive Bayes assumption the missing features are simply marginalized out, i.e. their factors drop out of the product.
Label 𝐶 = 1.

4. Consider the following dataset

         weight (kg)   height (cm)   NBA player
   𝐱1       170            160           0
   𝐱2        80            220           1
   𝐱3        90            200           1
   𝐱4        60            160           0
   𝐱5        50            150           0
   𝐱6        70            190           1

And the query vector 𝐱 new = [100 225]𝑇


a) Compute the most probable class for the query vector assuming that the likelihoods are 2-
dimensional Gaussians.

𝑝(𝐶 = 0) = 1/2, 𝑝(𝐶 = 1) = 1/2

Gaussian likelihood parameters (sample estimates):

          𝑝(𝑦1, 𝑦2 | 𝐶 = 0)                                𝑝(𝑦1, 𝑦2 | 𝐶 = 1)
   𝜇      [93.(3), 156.(6)]ᵀ                               [80, 203.(3)]ᵀ
   Σ      [4433.(3), 216.(6); 216.(6), 33.(3)]             [100, 50; 50, 233.(3)]

𝑝(𝐶 = 0 | 𝑦1 = 100, 𝑦2 = 225) = 𝑝(𝐶 = 0) 𝑝(𝑦1 = 100, 𝑦2 = 225 | 𝐶 = 0) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = (1/2) 𝑁([100, 225]ᵀ | 𝜇 = [93.(3), 156.(6)]ᵀ, Σ = [4433.(3), 216.(6); 216.(6), 33.(3)]) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = 1.74 × 10⁻⁴⁸ / 𝑝(𝑦1 = 100, 𝑦2 = 225)

𝑝(𝐶 = 1 | 𝑦1 = 100, 𝑦2 = 225) = 𝑝(𝐶 = 1) 𝑝(𝑦1 = 100, 𝑦2 = 225 | 𝐶 = 1) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = (1/2) 𝑁([100, 225]ᵀ | 𝜇 = [80, 203.(3)]ᵀ, Σ = [100, 50; 50, 233.(3)]) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = 5.38 × 10⁻⁵ / 𝑝(𝑦1 = 100, 𝑦2 = 225)

Classified as an NBA player.
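A sketch of this computation with scipy's multivariate_normal (covariances with the 1/(n−1) factor, matching the Σ values above; the array encoding is ours):

```python
import numpy as np
from scipy.stats import multivariate_normal

X = np.array([[170, 160], [80, 220], [90, 200], [60, 160], [50, 150], [70, 190]], dtype=float)
y = np.array([0, 1, 1, 0, 0, 1])
x_new = np.array([100, 225], dtype=float)

for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)                 # 1/2 for both classes
    mu = Xc.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)         # ddof=1, as in the table above
    numerator = prior * multivariate_normal(mean=mu, cov=Sigma).pdf(x_new)
    print(c, numerator)                      # ~1.74e-48 vs ~5.38e-5 -> NBA player
```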

b) Compute the most probable class for the query vector, under the Naive Bayes assumption,
using 1-dimensional Gaussians to model the likelihoods

          𝑝(𝑦1 | 𝐶 = 0)    𝑝(𝑦1 | 𝐶 = 1)    𝑝(𝑦2 | 𝐶 = 0)    𝑝(𝑦2 | 𝐶 = 1)
   𝜇         93.(3)             80             156.(6)          203.(3)
   σ         66.58              10              5.77             15.275

𝑝(𝐶 = 0 | 𝑦1 = 100, 𝑦2 = 225) = 𝑝(𝐶 = 0) 𝑝(𝑦1 = 100 | 𝐶 = 0) 𝑝(𝑦2 = 225 | 𝐶 = 0) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = (1/2) 𝑁(100 | 𝜇 = 93.(3), σ = 66.58) 𝑁(225 | 𝜇 = 156.(6), σ = 5.77) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = 7.854 × 10⁻³⁵ / 𝑝(𝑦1 = 100, 𝑦2 = 225)

𝑝(𝐶 = 1 | 𝑦1 = 100, 𝑦2 = 225) = 𝑝(𝐶 = 1) 𝑝(𝑦1 = 100 | 𝐶 = 1) 𝑝(𝑦2 = 225 | 𝐶 = 1) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = (1/2) 𝑁(100 | 𝜇 = 80, σ = 10) 𝑁(225 | 𝜇 = 203.(3), σ = 15.275) / 𝑝(𝑦1 = 100, 𝑦2 = 225)
   = 2.578 × 10⁻⁵ / 𝑝(𝑦1 = 100, 𝑦2 = 225)

Classified as an NBA player.
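The same computation under the naive assumption, as a scipy sketch (standard deviations with ddof=1, matching the σ values above; the array encoding is ours):

```python
import numpy as np
from scipy.stats import norm

X = np.array([[170, 160], [80, 220], [90, 200], [60, 160], [50, 150], [70, 190]], dtype=float)
y = np.array([0, 1, 1, 0, 0, 1])
x_new = np.array([100, 225], dtype=float)

for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    mu = Xc.mean(axis=0)
    sigma = Xc.std(axis=0, ddof=1)                          # per-feature std, as in the table above
    numerator = prior * norm(mu, sigma).pdf(x_new).prod()   # product of univariate likelihoods
    print(c, numerator)                                     # ~7.85e-35 vs ~2.58e-5 -> NBA player
```

Note that sklearn's GaussianNB may give slightly different numerators, since it estimates the variances with the biased 1/n factor.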

5. Assume training examples with 𝑚 features and a binary class.


a) How many parameters do you have to estimate considering features are Boolean and:
i. no assumptions about how the data is distributed
ii. naive Bayes assumption

One parameter for the prior 𝑝(𝑧 = 0) = 1 − 𝑝(𝑧 = 1).


Considering the classic Bayesian model: we need (2ᵐ − 1) × 2 parameters to estimate 𝑝(𝑦1 = 𝑣1, …, 𝑦𝑚 = 𝑣𝑚 | 𝑧 = 𝑐), hence 2ᵐ × 2 − 1 parameters in total.
Considering the naïve Bayes assumption: we need to estimate 𝑝(𝑦𝑖 | 𝑧 = 𝑐). Since there are 2 classes and 𝑚 features, we have 2 × 𝑚 × 1 = 2𝑚 parameters for the likelihoods. The total number of parameters is 1 + 2𝑚.
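For illustration: with 𝑚 = 5 Boolean features, the full joint model needs 2⁵ × 2 − 1 = 63 parameters, whereas naive Bayes needs only 1 + 2 × 5 = 11.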

b) How many parameters do you have to estimate considering features are numeric and:
iii. multivariate Gaussian assumption
iv. naive Bayes with Gaussian assumption
Similarly, one parameter for the prior, 𝑝(𝑧 = 0) = 1 − 𝑝(𝑧 = 1).

A multivariate Gaussian to estimate the likelihood 𝑝(𝐱 | 𝑧 = 𝑐) requires a mean vector and a covariance matrix. For 𝑚 variables, the mean vector has 𝑚 parameters. The covariance is an 𝑚 × 𝑚 matrix; however, the matrix is symmetric, so we only need to count the diagonal and upper-triangular entries, i.e. 𝑚 + 𝑚(𝑚 − 1)/2 = 𝑚(𝑚 + 1)/2 parameters. In this context, the total number of parameters is 2 × (𝑚 + 𝑚(𝑚 + 1)/2) + 1.

Considering the naïve Bayes: we need to estimate 𝑝(𝑦𝑖 | 𝑧 = 𝑐), requiring the fitting of a (univariate)
Gaussian distribution with two parameters: 𝜇𝑖 and 𝜎𝑖 . Since there are 2 classes and m features, we have
2 × 𝑚 × 2 = 4𝑚 parameters for the likelihoods. The total number of parameters is 1 + 4𝑚.
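For illustration: with 𝑚 = 5 numeric features, the multivariate Gaussian model needs 2 × (5 + 15) + 1 = 41 parameters, whereas Gaussian naive Bayes needs 1 + 4 × 5 = 21.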

Programming quests

Resources: Classification and Evaluation notebooks available at the course’s webpage

1. Reuse the sklearn code from the last lab, where we learnt a decision tree on the breast.w data:
a) apply the naïve Bayes classifier with default parameters
b) compare the accuracy of both classifiers using a 10-fold cross-validation
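A possible sketch for both steps, assuming breast.w is the ARFF file used in the previous lab and that its class column is named "Class" (both file name and column name are assumptions here):

```python
import pandas as pd
from scipy.io.arff import loadarff
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# load breast.w (hypothetical file name; adapt to how the data was loaded last lab)
data, _ = loadarff("breast.w.arff")
df = pd.DataFrame(data).dropna()
X = df.drop(columns="Class")
y = df["Class"].str.decode("utf-8")    # nominal ARFF attributes load as bytes

nb = GaussianNB()                      # a) naive Bayes with default parameters
tree = DecisionTreeClassifier()        # decision tree from the last lab

# b) 10-fold cross-validated accuracy of both classifiers
print("naive Bayes  :", cross_val_score(nb, X, y, cv=10, scoring="accuracy").mean())
print("decision tree:", cross_val_score(tree, X, y, cv=10, scoring="accuracy").mean())
```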

2. Consider the accuracy estimates collected under a 5-fold CV for two predictive models M1 and
M2, accM1=(0.7,0.5,0.55,0.55,0.6) and accM2=(0.75,0.6,0.6,0.65,0.55).
Using scipy (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_rel.html),
assess whether the differences in predictive accuracy are statistically significant.
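A sketch using scipy.stats.ttest_rel, i.e. a paired t-test, since both accuracy vectors come from the same folds:

```python
from scipy.stats import ttest_rel

acc_m1 = [0.70, 0.50, 0.55, 0.55, 0.60]
acc_m2 = [0.75, 0.60, 0.60, 0.65, 0.55]

# paired (related-samples) t-test over the per-fold accuracies
stat, p_value = ttest_rel(acc_m1, acc_m2)
print(stat, p_value)   # if p_value < 0.05, reject the null hypothesis of equal mean accuracy
```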
