Module 4 - Problems
Outline
• Bayesian Networks
• Bandit Problems
• Markov Models
• MDPs
• Dempster-Shafer Theory
• Decision Networks
• Bayesian Decision Theory
• MDPs
  • Value Iteration
  • Policy Iteration
Bayesian Networks
You have a new burglar alarm at home. It's fairly reliable at detecting burglaries, but
it also sometimes goes off because of a minor earthquake. You have two neighbors,
John and Mary, who promise to call you when they hear the alarm. John always
calls when he hears the alarm, but he sometimes confuses the phone ringing with
the alarm and calls by mistake. Mary likes loud music and sometimes misses the
alarm.
The variables are:
• B (Burglary): True or False
• E (Earthquake): True or False
• A (Alarm): True or False
• J (John Calls): True or False
• M (Mary Calls): True or False
Bayesian Networks
The causal relationships are:
• Burglary and Earthquake can cause the Alarm to go off.
• The Alarm going off can cause John to call.
• The Alarm going off can cause Mary to call.
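The problem gives the structure but not the conditional probability tables. Below is a minimal Python sketch with illustrative CPT values (assumed here, not part of the problem) showing how a full joint probability factors along the network:

```python
# Burglary network sketch. The CPT values below are illustrative assumptions;
# the problem statement only specifies the network structure.
P_B = 0.001                       # P(Burglary = True)
P_E = 0.002                       # P(Earthquake = True)
P_A = {(True, True): 0.95,        # P(Alarm = True | B, E)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = True | Alarm)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = True | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m), factored along the network's edges."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# e.g. both neighbors call, the alarm sounded, but no burglary or earthquake:
print(joint(False, False, True, True, True))   # ~0.00063 with these assumed CPTs
```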
Markov Models
The matrix for two-day transitions, P^(2), is P × P.

P = | 0.7  0.2  0.1 |
    | 0.4  0.4  0.2 |
    | 0.3  0.4  0.1 |

P^(2) = P × P, and its (1,1) entry is
P^(2)_{1,1} = (0.7 × 0.7) + (0.2 × 0.4) + (0.1 × 0.3) = 0.49 + 0.08 + 0.03 = 0.60
This new matrix tells us, for example, that if today is sunny, there is a 60%
chance it will be sunny in two days.
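A quick NumPy check of the same computation (a minimal sketch using the matrix above):

```python
import numpy as np

# One-step weather transition matrix from the example
# (rows: today's weather, columns: tomorrow's weather).
P = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2],
              [0.3, 0.4, 0.1]])

P2 = P @ P          # two-day transition matrix P^(2)
print(P2[0, 0])     # 0.60 = P(sunny in two days | sunny today)
```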
Markov Models
Consider the Markov chain with three states, S = {1, 2, 3}, that has the
following transition matrix:

P = | 1/2  1/4  1/4 |
    | 1/3   0   2/3 |
    |  0   1/2  1/2 |
Draw the state transition diagram for this chain.
If we know P(X1 = 1) = P(X1 = 2) = 1/4, find P(X1 = 3, X2 = 2, X3 = 1).
Markov Models
Draw the state transition diagram for this chain.
Markov Models
If we know P(X1 = 1) = P(X1 = 2) = 1/4, find P(X1 = 3, X2 = 2, X3 = 1).
For a Markov chain the joint probability factors as
P(X1 = a, X2 = b, X3 = c) = P(X1 = a) · P_{a,b} · P_{b,c}
We are given P(X1 = 1) = P(X1 = 2) = 1/4.
Hence P(X1 = 3) = 1 − 1/4 − 1/4 = 1/2.
From the matrix: P_{3,2} = 1/2 and P_{2,1} = 1/3.
Therefore P(X1 = 3, X2 = 2, X3 = 1) = (1/2) · (1/2) · (1/3) = 1/12.
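The same answer, checked with a short NumPy sketch:

```python
import numpy as np

# Transition matrix of the chain (rows/columns correspond to states 1, 2, 3).
P = np.array([[1/2, 1/4, 1/4],
              [1/3, 0.0, 2/3],
              [0.0, 1/2, 1/2]])

# Initial distribution: P(X1=1) = P(X1=2) = 1/4, so P(X1=3) = 1/2.
pi = np.array([1/4, 1/4, 1/2])

# P(X1=3, X2=2, X3=1) = P(X1=3) * P(3->2) * P(2->1)   (0-based indices below)
p = pi[2] * P[2, 1] * P[1, 0]
print(p)            # 0.0833... = 1/12
```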
MDP
Robot in a Grid World
States (S): The robot can be in three states: S = {s1 = Start, s2 = Middle, s3 = Goal}.
Actions (A): From each state, the robot can choose A = {Left, Right}.
Transition probabilities (P):
If the robot is in Start (s1) and takes:
Right → goes to Middle (s2) with probability 1.
Left → stays in Start (s1) with probability 1.
If the robot is in Middle (s2) and takes:
Right → goes to Goal (s3) with probability 0.8, else falls back to Start (s1) with probability 0.2.
Left → goes back to Start (s1) with probability 1.
If in Goal (s3): absorbing state (stays there regardless of action).
Rewards (R): Moving to Goal (s3) gives reward +10. All other moves give reward 0.
MDP
Robot in a Grid World
State   Action   Next State   Probability   Reward
s1      Right    s2           1.0           0
s1      Left     s1           1.0           0
s2      Right    s3           0.8           +10
s2      Right    s1           0.2           0
s2      Left     s1           1.0           0
s3      any      s3           1.0           0
MDP
Robot in a Grid World
• Suppose the robot always chooses Right (policy π).
• So, the expected reward starting from s2 is:
• V(s2) = 0.8 × 10 + 0.2 × V(s1)
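A small sketch of evaluating the always-Right policy by iterating these equations (assuming an undiscounted return and a zero-reward absorbing Goal, since the problem gives no discount factor):

```python
# Transitions under the always-Right policy: state -> [(prob, next_state, reward)]
model = {
    "s1": [(1.0, "s2", 0)],
    "s2": [(0.8, "s3", 10), (0.2, "s1", 0)],
    "s3": [(1.0, "s3", 0)],                     # absorbing Goal state
}

V = {s: 0.0 for s in model}
for _ in range(100):                            # iterate V(s) = sum_s' p * (r + V(s'))
    V = {s: sum(p * (r + V[nxt]) for p, nxt, r in model[s]) for s in model}

print(V["s2"])   # 0.8*10 + 0.2*V(s1); converges to ~10.0 under these assumptions
print(V["s1"])   # equals V(s2) under always-Right, so also ~10.0
```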
Dempster-Shafer Theory
Two tests give mass functions over the frame Θ = {H1, H2, H3}:
Test A:
• m_A({H1, H2}) = 0.7
• m_A({H3}) = 0.1
• m_A(Θ) = 0.2
Test B:
• m_B({H2}) = 0.6
• m_B({H1, H2}) = 0.2
• m_B(Θ) = 0.2
Dempster-Shafer Theory
Multiply all pairs
• We take one set from A and one set from B, multiply their masses,
and record the intersection.
Example:
• A gives {H1, H2}, B gives {H2}.
Intersection = {H2}.
Multiply masses: 0.7 × 0.6 = 0.42.
• Do this for all pairs:
Dempster-Shafer Theory
Multiply all pairs: do this for every combination of one set from A and one set from B:
From A From B Intersection Mass product
{H1,H2} (0.7) {H2} (0.6) {H2} 0.42
{H1,H2} (0.7) {H1,H2} (0.2) {H1,H2} 0.14
{H1,H2} (0.7) Θ (0.2) {H1,H2} 0.14
{H3} (0.1) {H2} (0.6) ∅ (conflict) 0.06
{H3} (0.1) {H1,H2} (0.2) ∅ (conflict) 0.02
{H3} (0.1) Θ (0.2) {H3} 0.02
Θ (0.2) {H2} (0.6) {H2} 0.12
Θ (0.2) {H1,H2} (0.2) {H1,H2} 0.04
Θ (0.2) Θ (0.2) Θ 0.04
Dempster-Shafer Theory
Add up by set
• For {H2}: 0.42 + 0.12 = 0.54
• For {H1, H2}: 0.14 + 0.14 + 0.04 = 0.32
• For {H3}: 0.02
• For Θ: 0.04
• Conflict K = 0.06 + 0.02 = 0.08
Dempster-Shafer Theory
Normalize
• We must remove the conflict by dividing everything by 1 − 𝐾.
• 1 − 𝐾 = 1 − 0.08 = 0.92
• So:
• m({H2}) = 0.54 / 0.92 ≈ 0.587
• m({H1, H2}) = 0.32 / 0.92 ≈ 0.348
• m({H3}) = 0.02 / 0.92 ≈ 0.022
• m(Θ) = 0.04 / 0.92 ≈ 0.043
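The entire combination can be reproduced with a short Python sketch of Dempster's rule applied to the two tests:

```python
from itertools import product

THETA = frozenset({"H1", "H2", "H3"})
m_A = {frozenset({"H1", "H2"}): 0.7, frozenset({"H3"}): 0.1, THETA: 0.2}
m_B = {frozenset({"H2"}): 0.6, frozenset({"H1", "H2"}): 0.2, THETA: 0.2}

combined, K = {}, 0.0
for (X, mx), (Y, my) in product(m_A.items(), m_B.items()):
    inter = X & Y
    if inter:                                    # mass goes to the intersection
        combined[inter] = combined.get(inter, 0.0) + mx * my
    else:                                        # empty intersection: conflict
        K += mx * my

combined = {S: v / (1 - K) for S, v in combined.items()}   # normalize by 1 - K
print(K)                                         # 0.08
print(combined[frozenset({"H2"})])               # ~0.587
print(combined[frozenset({"H1", "H2"})])         # ~0.348
```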
Dempster-Shafer Theory
Final combined masses:
Covid (H2) → 58.7%
Flu or Covid (H1, H2) → 34.8%
Other (H3) → 2.2%
Uncertainty (Θ) → 4.3%
Decision Networks
[Decision network diagram with nodes: Umbrella (decision), Weather, Forecast]
Decision Networks
• MEU: choose the action which maximizes the expected utility given the evidence
• New node types: chance nodes (just like BNs), a decision node (Umbrella), and a utility node U that depends on the action and the weather.

W     P(W)
sun   0.7
rain  0.3

A      W     U(A, W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Umbrella = leave: EU(leave) = 0.7 × 100 + 0.3 × 0 = 70
Umbrella = take:  EU(take) = 0.7 × 20 + 0.3 × 70 = 35
Optimal decision = leave
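A brief sketch of the same EU/MEU calculation in Python:

```python
# Prior over Weather and the utility table U(A, W) from the slide.
P_W = {"sun": 0.7, "rain": 0.3}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def EU(action, p_weather):
    """Expected utility of an action under a distribution over Weather."""
    return sum(p * U[(action, w)] for w, p in p_weather.items())

print(EU("leave", P_W))                                  # 70
print(EU("take", P_W))                                   # 35
print(max(("leave", "take"), key=lambda a: EU(a, P_W)))  # MEU(∅) action: leave
```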
Decision Networks: Notation
• EU(leave) = expected utility of taking action leave.
In the parentheses, we write an action.
Calculating EU requires taking an expectation over chance-node outcomes.
• MEU(∅) = maximum expected utility, given no information.
In the parentheses, we write the evidence (which nodes we know).
Calculating MEU requires taking a maximum over several expectations (one EU per action).
Optimal decision = leave
Decisions as Outcome Trees
[Outcome tree diagram: root {} with the decision Umbrella, branches over Weather | {} leading to utility U]

Decision Networks
Evidence: Forecast = bad

W     P(W | F = bad)
sun   0.34
rain  0.66

Umbrella = leave: EU(leave | bad) = 0.34 × 100 + 0.66 × 0 = 34
Umbrella = take:  EU(take | bad) = 0.34 × 20 + 0.66 × 70 = 53
Optimal decision = take
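And the same sketch with the weather distribution conditioned on the bad forecast:

```python
# Utility table as before, Weather now conditioned on Forecast = bad.
P_W_bad = {"sun": 0.34, "rain": 0.66}
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take", "sun"): 20, ("take", "rain"): 70}

def EU(action, p_weather):
    return sum(p * U[(action, w)] for w, p in p_weather.items())

print(EU("leave", P_W_bad))   # 34
print(EU("take", P_W_bad))    # 53 -> MEU(F=bad) = 53, optimal decision = take
```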
Decision Networks: Notation
• EU(leave | bad) = expected utility of choosing leave, given you know the forecast is bad.
Left side of the conditioning bar: the action being taken.
Right side of the conditioning bar: the random variable(s) whose value we know (the evidence).
• MEU(F = bad) = maximum expected utility, given you know the forecast is bad.
In the parentheses, we write the evidence (which nodes we know).
[Outcome trees for Umbrella = leave and Umbrella = take, branching on Weather | {b} into utility U]