
UNIVERSITY OF KWAZULU-NATAL

SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE

DISCIPLINE OF COMPUTER SCIENCE

Tutorial I

COURSE CODE: COMP703 Artificial Intelligence

Examiner: Dr E. Jembere
Duration: open

Instructions

1. Attempt all questions in preparation for Theory Test I.

1. A biased coin (with probability of obtaining a Head equal to 𝑝 > 0) is tossed repeatedly and
independently until the first head is observed. Compute the probability that the first head
appears at an even-numbered toss.
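For a quick numerical sanity check of whatever closed form you derive, the short Python sketch below sums the truncated series P(first head at toss k) = (1 − p)^(k−1) · p over even k. The value p = 0.3 is only an illustrative assumption; the question leaves p general.

# Numerical sanity check for Question 1 (p = 0.3 is an illustrative choice).
# P(first head on toss k) = (1 - p)**(k - 1) * p, so the event "first head on
# an even toss" sums this over k = 2, 4, 6, ... Truncating the infinite series
# at a large K gives an approximation to compare against an analytic answer.

p = 0.3

approx = sum((1 - p) ** (k - 1) * p for k in range(2, 2001, 2))
print(f"P(first head on an even toss) is approximately {approx:.6f}")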

2. Consider the Bayesian Network with Boolean variables as shown in Figure 1 below:

[Figure 1: a Bayesian network over the Boolean variables X11, X12, X13, X21, X22, X31, X32 and X33; the edge structure is shown only in the original figure.]

a. List all the variables that are conditionally independent of X32 given X21 and X22.
b. List all the variables that are conditionally independent of X21 given X11 and X12.
c. Are there any variables conditionally independent of X33 given X22? If so, list them all.
d. Write the joint probability P(X11, X12, X13, X21, X22, X31, X32, X33) factored according
to the Bayesian network in Figure 1.
e. How many parameters are necessary to define the conditional probability distributions
for this Bayesian network?
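As an illustration of how part (e) is usually counted, the sketch below tallies 2^|parents| parameters per Boolean node. The parent sets in the dictionary are placeholders only, since the edges of Figure 1 are not reproduced in this text; substitute the actual structure before relying on the total.

# Sketch for Question 2(e): counting CPT parameters in a Bayesian network of
# Boolean variables. Each node needs one free parameter per joint assignment
# of its parents, i.e. 2**len(parents) parameters.
# NOTE: the parent sets below are placeholders; replace them with the actual
# edges from Figure 1.

parents = {
    "X11": [], "X12": [], "X13": [],
    "X21": ["X11", "X12"],            # placeholder structure
    "X22": ["X12", "X13"],            # placeholder structure
    "X31": ["X21"], "X32": ["X21", "X22"], "X33": ["X22"],
}

num_params = sum(2 ** len(ps) for ps in parents.values())
print(f"Total CPT parameters: {num_params}")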
3. Draw a Bayesian Network that corresponds to the conditional probability equations below:
a. P(X1| X2)P(X2|X3, X4)P(X3)P(X4)
b. P(X1)P(X2)P(X3|X1, X2)P(X4|X2)

4. The following dataset contains loan information and can be used to try to predict whether a
borrower will default (the last column is the classification). Use the naïve Bayes classifier
to determine whether a loan X = (Home Owner = No, Marital Status = Married,
Income = High) should be classified as a Defaulted Borrower or not.
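A minimal naïve Bayes sketch follows, showing the prior-times-likelihood computation the question calls for. The rows in the code are hypothetical stand-ins (the loan table itself appears only as an image in the original document), so replace them with the actual dataset before reading anything into the output.

# Sketch of an unsmoothed naïve Bayes classification for Question 4.
# IMPORTANT: the rows below are hypothetical stand-ins; substitute the actual
# loan table from the question before drawing any conclusion.
from collections import Counter

# (Home Owner, Marital Status, Income, Defaulted)
data = [
    ("Yes", "Single",  "High", "No"),   # hypothetical rows
    ("No",  "Married", "High", "No"),
    ("No",  "Single",  "Low",  "Yes"),
    ("No",  "Married", "Low",  "Yes"),
]

query = ("No", "Married", "High")

class_counts = Counter(row[-1] for row in data)
scores = {}
for c, n_c in class_counts.items():
    score = n_c / len(data)                       # prior P(c)
    rows_c = [row for row in data if row[-1] == c]
    for j, value in enumerate(query):             # likelihoods P(x_j | c)
        score *= sum(1 for row in rows_c if row[j] == value) / n_c
    scores[c] = score

print(scores)   # the predicted class is the one with the larger score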

5. Study the Probabilistic Graphical Model in Figure 2 below and answer the questions that
follow:

[Figure 2: a probabilistic graphical model over Y and the variables X1, X2, …, Xn−1, Xn; the edge structure is shown only in the original figure.]

a. Write the joint probability, P(Y, X) = P(Y, X1, X2, …, Xn) factored according to the
Bayes net.
b. Write down the expression for the probability, P(Y|X).

c. State which well-known classifier is defined by the Bayesian network in the figure
above.

6. Kim is building a spam filter. Her hypothesis is that counting the occurrences of the
letter ‘x’ in the e-mails will be a good indicator of spam or no-spam. She collects 7 spam
messages and 7 no-spam messages and counts the number of ‘x’s in each. Here is what she
finds.
Number of ‘x’s in each spam message: [0, 3, 4, 8, 9, 13, 21]

Number of ‘x’s in each no-spam message: [0, 0, 1, 2, 2, 5, 6]

She trains a logistic regression classifier on the data and plots the classifier against the data.

a. Consider an e-mail with no ‘x’s. According to the model, roughly what is the
probability of this message being a spam message, and what is the probability of it
not being spam?
b. How is a logistic regression model normally turned into a binary classifier?
c. If you turn the model into a binary classifier as you suggested above, what is the
accuracy of the classifier on the training data?
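The sketch below fits a logistic regression to the counts given above and shows where the quantities asked for in parts (a) to (c) come from. The use of scikit-learn is an assumption of convenience; any logistic regression implementation would do, and the exact numbers depend on the solver and regularisation settings, so this is a guide rather than the reference answer.

# Sketch for Question 6: fit a logistic regression to Kim's counts and inspect
# the probability for an e-mail with zero 'x's, plus the training accuracy.
# scikit-learn is assumed here purely for convenience.
import numpy as np
from sklearn.linear_model import LogisticRegression

spam_counts = [0, 3, 4, 8, 9, 13, 21]
ham_counts = [0, 0, 1, 2, 2, 5, 6]

X = np.array(spam_counts + ham_counts).reshape(-1, 1)
y = np.array([1] * len(spam_counts) + [0] * len(ham_counts))  # 1 = spam

model = LogisticRegression()
model.fit(X, y)

# Part (a): probabilities of (not spam, spam) for a message with zero 'x's.
print(model.predict_proba([[0]]))

# Parts (b), (c): thresholding the predicted probability at 0.5 (the usual
# convention) gives a binary classifier; its training accuracy is then:
print((model.predict(X) == y).mean())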

7. How exactly does an n-gram model compute the probability P(s) for a string s = w1 w2 … wn,
assuming a bigram model? State the central assumption made in this modelling approach.
8. It is by arbitrary convention that n-gram models go left to right, treating the words to the
left as the context. We could just as well flip the order, treat the words to the right as the
context instead, and compute the probability by reading the sentence right to left. Prove that
the probability of a sentence, w, of length n under a left-ordered bigram model is the same as
under a right-ordered model, using the sentence “I want to eat Chinese food”.
9. Suppose we didn’t use the end-symbol </s>. Train an unsmoothed bigram grammar on the
following training corpus without using the end-symbol </s>:
<s> a b

<s> b b

<s> b a

<s> a a

Demonstrate that your bigram model does not assign a single probability distribution across
all sentence lengths by showing that the sum of the probabilities of the four possible 2-word
sentences over the alphabet {a, b} is 1.0, and the sum of the probabilities of all possible
3-word sentences over the alphabet {a, b} is also 1.0.
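The following sketch carries out exactly this experiment: it estimates unsmoothed bigram probabilities from the four training sentences above (with no </s> symbol) and sums the resulting probabilities over every 2-word and every 3-word sentence over {a, b}.

# Sketch for Question 9: train an unsmoothed bigram model on the four training
# sentences (no </s> symbol) and sum the probabilities of every possible
# 2-word and 3-word sentence over the alphabet {a, b}.
from collections import Counter
from itertools import product

corpus = [["<s>", "a", "b"],
          ["<s>", "b", "b"],
          ["<s>", "b", "a"],
          ["<s>", "a", "a"]]

bigram = Counter()
context = Counter()
for sent in corpus:
    for prev, cur in zip(sent, sent[1:]):
        bigram[(prev, cur)] += 1
        context[prev] += 1

def prob(sentence):
    """P(sentence) = product of MLE bigram probabilities, starting from <s>."""
    p = 1.0
    for prev, cur in zip(["<s>"] + sentence, sentence):
        p *= bigram[(prev, cur)] / context[prev]
    return p

for length in (2, 3):
    total = sum(prob(list(words)) for words in product("ab", repeat=length))
    print(f"Sum over all {length}-word sentences: {total:.4f}")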

10. Give an equation for finding the most probable sequence of part-of-speech (POS) tags that
could be used by a stochastic POS tagger. Assume a trigram model on the POS tag sequence.
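For reference, one standard way to write such an objective, assuming an HMM-style tagger with trigram tag transitions and per-word emissions (and padding tags t0, t−1 at the start of the sentence, which is an assumption of this particular form), is:

\hat{t}_1^{\,n} \;=\; \arg\max_{t_1 \dots t_n} \;\prod_{i=1}^{n} P(t_i \mid t_{i-1}, t_{i-2})\, P(w_i \mid t_i)

Your own answer may differ in how it handles sentence-boundary tags.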

11. Use the POS-tagged corpus in Figure 3 to answer the following questions:

A/DT bat/NN is/VBZ flying/VBG. A/DT Flying/JJ bat/NN is/VBZ scary/JJ. I/PRP
saw/VBZ Mary/NNP flying/VBG a/DT bat/NN. She/PRP likes/VBZ flying/VBG a/DT
scary/JJ bat/NN. I/PRP like/VBZ flying/VBG to/TO JHB/NN. Bats/NN are/VBZ
flying/JJ mammals/NN. Flying/VBG planes/NN is/VBZ scary/JJ.

Figure 3
a. Assuming a bigram model, determine which of the following phrases is the most likely
sequence of the words scary, flying, bat, a, and is, given the corpus in Figure 3.
i. A scary bat is flying
ii. Flying a bat is scary

b. Calculate the likelihood score of the POS-tag sequence VBG NN VBZ JJ, given
the phrase “flying bats is scary”. Assume a bigram model on the transition
probabilities.
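The sketch below shows one way to estimate the transition and emission counts needed for both parts from the word/TAG corpus in Figure 3 and to score a candidate tag sequence. Lowercasing the words, stripping the sentence-final full stops, ignoring sentence boundaries when counting transitions, and omitting an initial-tag probability are simplifying assumptions; your own treatment may differ.

# Sketch for Question 11: estimate tag-transition and word-emission counts from
# the word/TAG corpus in Figure 3, then score a (word, tag) sequence.
# Simplifications: words are lowercased, trailing full stops are stripped,
# sentence boundaries are ignored for transitions, and no initial-tag
# probability is included.
from collections import Counter

figure3 = ("A/DT bat/NN is/VBZ flying/VBG. A/DT Flying/JJ bat/NN is/VBZ scary/JJ. "
           "I/PRP saw/VBZ Mary/NNP flying/VBG a/DT bat/NN. She/PRP likes/VBZ "
           "flying/VBG a/DT scary/JJ bat/NN. I/PRP like/VBZ flying/VBG to/TO "
           "JHB/NN. Bats/NN are/VBZ flying/JJ mammals/NN. Flying/VBG planes/NN "
           "is/VBZ scary/JJ.")

pairs = [tok.rstrip(".").rsplit("/", 1) for tok in figure3.split()]
words = [w.lower() for w, t in pairs]
tags = [t for w, t in pairs]

emission = Counter(zip(tags, words))
transition = Counter(zip(tags, tags[1:]))
tag_count = Counter(tags)

def score(word_seq, tag_seq):
    """Product of emission and bigram transition probabilities (unsmoothed MLE)."""
    s = 1.0
    for i, (w, t) in enumerate(zip(word_seq, tag_seq)):
        s *= emission[(t, w)] / tag_count[t]
        if i > 0:
            s *= transition[(tag_seq[i - 1], t)] / tag_count[tag_seq[i - 1]]
    return s

# Part (b): likelihood score of VBG NN VBZ JJ for "flying bats is scary".
print(score("flying bats is scary".split(), ["VBG", "NN", "VBZ", "JJ"]))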

12. Given the training data in Figure 4, determine which of the following label sequences is
the most likely POS tagging for the sentence “a green bottle”.
a. DT JJ NN
b. DT NN NN

The_DT green_JJ bottle_NN leaked_VVD. The_DT suppliers_NN bottle_VVB
water_NN. Green_JJ water_NN suppliers_NN bottle_VVB. A_DT bottle_NN of_P
water_NN. A_DT green_JJ water_JJ bottle_NN.

Figure 4

13. Some students registered for COMP104 have a tendency to get a senior student to write
online tests for them. The students wrote 3 tests for the module. The lecturer for the
module has past data from the previous year’s offering of the module that shows whether
each student got somebody to write each test (H) or not (W), and whether the student
passed the test (P) or not (F). For any student, the mark obtained is observable, but
whether the student got help writing the test is not observable.
a. The lecturer used the data to fit a hidden Markov model to predict the most likely
label sequences. The transition and emission probabilities are given in Tables 1 and 2.
Table 1: Transition Probabilities

From \ To   H     W     END
Start       0.1   0.9   0
H           0.7   0.2   0.1
W           0.1   0.8   0.1

Table 2: Emission Probabilities

State \ Obs   Start   P     F
Start         1       0     0
H             0       0.6   0.4
W             0       0.5   0.5

Suppose the lecturer observed the sequence F P P for a given student. What is the
most likely sequence of test-writing events? Hint: use the Viterbi algorithm.
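A compact Viterbi sketch over the two hidden states H and W, using the transition and emission values from Tables 1 and 2, is given below. Ignoring the END column is a modelling choice made here for brevity; include those terms if your formulation terminates the sequence explicitly.

# Sketch for Question 13: Viterbi decoding of the observation sequence F P P
# under the transition and emission tables above. The END transition
# probabilities are ignored in this sketch.

states = ["H", "W"]
start = {"H": 0.1, "W": 0.9}
trans = {"H": {"H": 0.7, "W": 0.2}, "W": {"H": 0.1, "W": 0.8}}
emit = {"H": {"P": 0.6, "F": 0.4}, "W": {"P": 0.5, "F": 0.5}}

obs = ["F", "P", "P"]

# viterbi[t][s] = (best probability of a path ending in state s at time t, backpointer)
viterbi = [{s: (start[s] * emit[s][obs[0]], None) for s in states}]
for t in range(1, len(obs)):
    col = {}
    for s in states:
        best_prev = max(states, key=lambda p: viterbi[t - 1][p][0] * trans[p][s])
        col[s] = (viterbi[t - 1][best_prev][0] * trans[best_prev][s] * emit[s][obs[t]],
                  best_prev)
    viterbi.append(col)

# Backtrace the most likely state sequence.
last = max(states, key=lambda s: viterbi[-1][s][0])
path = [last]
for t in range(len(obs) - 1, 0, -1):
    path.append(viterbi[t][path[-1]][1])
path.reverse()
print(path, viterbi[-1][last][0])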

14. A friend of mine at UCT likes climbing Table Mountain. To make a good start to
the coming week, he climbs on a Sunday with probability 0.98. Being concerned for his
own safety, he is less likely to climb today if he climbed yesterday, so
P(climb today|climb yesterday) = 0.4
If he did not climb yesterday then he is very likely to climb today, so
P(climb today|¬climb yesterday) = 0.1
Unfortunately, he is not a very good climber, and is quite likely to injure himself if he goes
climbing, so
P(injury|climb today) = 0.8
whereas
P(injury|¬climb today) = 0.1
a. Explain how my friend’s behavior can be formulated as a Hidden Markov Model.
What assumptions are required?

b. You learn that on Monday and Tuesday evening he obtains an injury, but on
Wednesday evening he does not. Compute the probability that he climbed on
Wednesday.
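The sketch below performs forward filtering on a two-state chain (climb / not climb) with the probabilities stated in the question. How Sunday’s 0.98 prior and the day indexing enter the model is precisely what part (a) asks you to decide, so the initialisation used here is only one possible choice, not the reference formulation.

# Sketch for Question 14(b): forward filtering on a two-state chain
# (climb / not climb) with the probabilities stated in the question.
# The initial belief (Monday's climb probability derived from the Sunday prior
# of 0.98) is one possible modelling choice; part (a) asks you to justify yours.

p_climb_given_climb = 0.4
p_climb_given_not = 0.1
p_injury_given_climb = 0.8
p_injury_given_not = 0.1

def transition(prev_climb_prob):
    """P(climb today) given yesterday's filtered climb probability."""
    return (prev_climb_prob * p_climb_given_climb
            + (1 - prev_climb_prob) * p_climb_given_not)

def update(prior_climb, injured):
    """Bayesian update of P(climb today) after observing injury / no injury."""
    like_c = p_injury_given_climb if injured else 1 - p_injury_given_climb
    like_n = p_injury_given_not if injured else 1 - p_injury_given_not
    num = prior_climb * like_c
    return num / (num + (1 - prior_climb) * like_n)

belief = transition(0.98)               # Monday's prior, before any observation
for injured in [True, True, False]:     # Mon, Tue, Wed evening observations
    belief = update(belief, injured)
    print(f"P(climbed today | evidence so far) = {belief:.4f}")
    belief = transition(belief)         # predict the next day

# The last printed value is P(climbed on Wednesday | injuries Mon & Tue,
# no injury Wed), since Wednesday is the final observed day.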
15. Given the undirected graph with the potential functions shown in the figure below, write
down the CRF expression for the probability of a tag sequence given the input sequence.

[Figure: a linear-chain graph over tag nodes yt−1, yt, …, yn and input nodes xt, …, xn, with pairwise potentials f(yt, yt−1) between adjacent tags and potentials f(yt, xt) linking each tag to its input; the full structure is shown only in the original figure.]

----------------------------------------------------------------------------------------------------------------

WISHING YOU ALL THE BEST

