Module 4

Problems
Outline
• Bayesian Networks
• Markov Models
• MDPs
• Dempster-Shafer Theory
• Decision Networks
• Bayesian Decision Theory
• Bandit Problems
• Value Iteration
• Policy Iteration
Bayesian Networks
You have a new burglar alarm at home. It's fairly reliable at detecting burglaries, but
it also sometimes goes off because of a minor earthquake. You have two neighbors,
John and Mary, who promise to call you when they hear the alarm. John almost always
calls when he hears the alarm, but he sometimes confuses the phone ringing with
the alarm and calls by mistake. Mary likes loud music and sometimes misses the
alarm.
The variables are:
• B (Burglary): True or False
• E (Earthquake): True or False
• A (Alarm): True or False
• J (John Calls): True or False
• M (Mary Calls): True or False
Bayesian Networks
The causal relationships are:
• A Burglary or an Earthquake can cause the Alarm to go off.
• The Alarm going off can cause John to call.
• The Alarm going off can cause Mary to call.

Conditional Probability Tables
• Prior Probabilities:
  • P(B=true) = 0.001
  • P(E=true) = 0.002
• Conditional Probabilities:
  • P(A | B, E):
    • P(A=true | B=true, E=true) = 0.95
    • P(A=true | B=true, E=false) = 0.94
    • P(A=true | B=false, E=true) = 0.29
    • P(A=true | B=false, E=false) = 0.001
  • P(J | A):
    • P(J=true | A=true) = 0.90
    • P(J=true | A=false) = 0.05
  • P(M | A):
    • P(M=true | A=true) = 0.70
    • P(M=true | A=false) = 0.01
Bayesian Networks
Find the probability of a burglary given that John and Mary have both called:
P(B=true | J=true, M=true).
• Using the formula for conditional probability, we get: P(B | J, M) = P(B, J, M) / P(J, M)
• In a Bayesian network, we can expand the joint probability using the chain rule:
  P(B, E, A, J, M) = P(B) × P(E) × P(A | B, E) × P(J | A) × P(M | A)
• Numerator, P(B, J, M):
  • Summing out the unobserved variables A and E: P(B, J, M) = Σ_{A,E} P(B, E, A, J, M)
  • This expands to: P(B, J, M) = Σ_{A,E} P(B) P(E) P(A | B, E) P(J | A) P(M | A)
• To calculate this, we need to consider all possible states of A and E:
  P(B=true, J=true, M=true) = P(B) Σ_E Σ_A P(E) P(A | B, E) P(J | A) P(M | A)
Bayesian Networks
Case 1: The Alarm is ON (A=true)
• P(B=true, E, A=true, J=true, M=true)
• Summing over both states of E:
• P(B) × P(E) × P(A | B, E) × P(J | A) × P(M | A)
• E=true: P(B) × P(E=true) × P(A=true | B=true, E=true) × P(J=true | A=true) × P(M=true | A=true)
  = 0.001 × 0.002 × 0.95 × 0.90 × 0.70 = 0.000001197
• E=false: P(B) × P(E=false) × P(A=true | B=true, E=false) × P(J=true | A=true) × P(M=true | A=true)
  = 0.001 × (1−0.002) × 0.94 × 0.90 × 0.70 ≈ 0.000591016
• Total for Case 1 = 0.000001197 + 0.000591016 = 0.000592213
Bayesian Networks
Case 2: The Alarm is OFF (A=false)
• P(B=true, E, A=false, J=true, M=true)
• Summing over both states of E:
• E=true: P(B) × P(E=true) × P(A=false | B=true, E=true) × P(J=true | A=false) × P(M=true | A=false)
  = 0.001 × 0.002 × (1−0.95) × 0.05 × 0.01 = 0.00000000005 (negligible)
• E=false: P(B) × P(E=false) × P(A=false | B=true, E=false) × P(J=true | A=false) × P(M=true | A=false)
  = 0.001 × (1−0.002) × (1−0.94) × 0.05 × 0.01 ≈ 0.0000000299 (negligible)
• Total for Case 2 ≈ 0.00000003
Bayesian Networks
• Numerator P(B=true, J=true, M=true) is the sum of both cases:
  0.000592213 + 0.00000003 ≈ 0.0005922
Bayesian Networks
Calculation of the Denominator: 𝑷(𝑱 = 𝒕𝒓𝒖𝒆, 𝑴 = 𝒕𝒓𝒖𝒆)
• The denominator is the total probability of John and Mary calling,
regardless of whether a burglary occurred.
• We calculate this by summing the numerator (the burglary case) and
the case where there is no burglary.
• We already have the numerator value.
• Now we need to calculate 𝑃(𝐵 = 𝑓𝑎𝑙𝑠𝑒, 𝐽 = 𝑡𝑟𝑢𝑒, 𝑀 = 𝑡𝑟𝑢𝑒).
Bayesian Networks
• Case C: Alarm is ON (A=true)
• P(¬B, J, M, A) = P(¬B) P(E) P(A | ¬B, E) P(J | A) P(M | A) + P(¬B) P(¬E) P(A | ¬B, ¬E) P(J | A) P(M | A)
• B=false, E=true: 0.999 × 0.002 × 0.29 × 0.90 × 0.70 = 0.000365035
• B=false, E=false: 0.999 × 0.998 × 0.001 × 0.90 × 0.70 = 0.000628111
• Sum of Case C: 0.000365035 + 0.000628111 = 0.000993146
• Case D: Alarm is OFF (A=false)
• P(¬B, J, M, ¬A) = P(¬B) P(E) P(¬A | ¬B, E) P(J | ¬A) P(M | ¬A) + P(¬B) P(¬E) P(¬A | ¬B, ¬E) P(J | ¬A) P(M | ¬A)
• B=false, E=true: 0.999 × 0.002 × (1−0.29) × 0.05 × 0.01 = 0.000000709
• B=false, E=false: 0.999 × 0.998 × (1−0.001) × 0.05 × 0.01 = 0.000498003
• Sum of Case D: 0.000000709 + 0.000498003 = 0.000498712
• Total for no burglary:
• P(B=false, J, M) = 0.000993146 + 0.000498712 = 0.001491858
Bayesian Networks
Total Denominator:
P(J, M) = P(B, J, M) + P(¬B, J, M)
P(J, M) = 0.0005922 + 0.0014919 ≈ 0.0020841
Bayesian Networks
Step 3: Combine the results
• Final Conditional Probability Calculation:
  P(B=true | J=true, M=true) = P(B=true, J=true, M=true) / P(J=true, M=true)
  = 0.0005922 / 0.0020841 ≈ 0.284
  = 28.4%
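As a sanity check, the whole inference can be reproduced by brute-force enumeration of the joint distribution. The short Python sketch below is my own (variable and function names are mine, not from the slides); it uses only the CPTs listed earlier.

from itertools import product

# CPTs from the slides
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}                       # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                       # P(M=true | A)

def joint(b, e, a, j, m):
    # Chain rule of the network: P(B) P(E) P(A|B,E) P(J|A) P(M|A)
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# Sum out A and E with J = true, M = true (inference by enumeration)
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = num + sum(joint(False, e, a, True, True) for e, a in product([True, False], repeat=2))
print(num, den, num / den)   # ≈ 0.000592, 0.002084, 0.284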
Markov Models
Weather Model Example
 Imagine a simplified weather system with three possible states:
 S1: Sunny
 S2: Cloudy
 S3: Rainy
 We want to model the probability of the weather on any given day,
based on the weather of the previous day.
 This dependency on only the previous state makes it a first-order Markov
model.
Markov Models
 Transition Probabilities:
 The relationships between states are defined by a transition matrix.
 Each entry in the matrix, Pij, represents the probability of moving from
  state i to state j in one step (one day).
 Let's use the following probabilities:

To: Sunny (S1) Cloudy (S2) Rainy (S3)


From: Sunny (S1) 0.7 0.2 0.1
From: Cloudy (S2) 0.4 0.4 0.2
From: Rainy (S3) 0.3 0.6 0.1
Markov Models
 Transition Probabilities:
 P11 = 0.7: If today is sunny, there's a 70% chance it will be sunny tomorrow.
 P12 = 0.2: If today is sunny, there's a 20% chance it will be cloudy tomorrow.
 P23 = 0.2: If today is cloudy, there's a 20% chance it will be rainy tomorrow.
 Notice that the sum of the probabilities in each row must equal 1, since the
system must transition to one of the states.

To: Sunny (S1) Cloudy (S2) Rainy (S3)


From: Sunny (S1) 0.7 0.2 0.1
From: Cloudy (S2) 0.4 0.4 0.2
From: Rainy (S3) 0.3 0.6 0.1
Markov Models
 Let's use this model to answer a question: "If today is sunny, what is
  the probability that it will be sunny three days from now?"
 We can find the probability of being in a certain state after multiple
  steps by multiplying the transition matrix by itself.
 The matrix for two-day transitions, P(2), is P × P.
 The matrix for three-day transitions, P(3), is P × P × P.
Markov Models
 The matrix for two-day transitions, P(2), is P × P:

          | 0.7 0.2 0.1 |   | 0.7 0.2 0.1 |
  P(2) =  | 0.4 0.4 0.2 | × | 0.4 0.4 0.2 |
          | 0.3 0.6 0.1 |   | 0.3 0.6 0.1 |

 For example, P(2)11 = (0.7 × 0.7) + (0.2 × 0.4) + (0.1 × 0.3) = 0.49 + 0.08 + 0.03 = 0.60

          | 0.60 0.28 0.12 |
  P(2) =  | 0.50 0.36 0.14 |
          | 0.48 0.36 0.16 |

 This new matrix tells us, for example, that if today is sunny, there is a 60%
 chance it will be sunny in two days.
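The question on the earlier slide asks about three days out, so one more multiplication is needed. A small NumPy sketch (my own, assuming the transition table above) computes both P(2) and P(3):

import numpy as np

# Transition matrix from the slides (rows: from Sunny, Cloudy, Rainy)
P = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2],
              [0.3, 0.6, 0.1]])

P2 = P @ P      # two-day transition probabilities
P3 = P2 @ P     # three-day transition probabilities

print(P2[0, 0])  # ≈ 0.60:  sunny today -> sunny in two days
print(P3[0, 0])  # ≈ 0.568: sunny today -> sunny in three days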
Markov Models
 Consider the Markov chain with three states, S = {1, 2, 3}, that has the
  following transition matrix:

         | 1/2  1/4  1/4 |
   P  =  | 1/3   0   2/3 |
         | 1/2  1/2   0  |

 Draw the state transition diagram for this chain.
 If we know P(X1 = 1) = P(X1 = 2) = 1/4, find P(X1 = 3, X2 = 2, X3 = 1).
Markov Models
 Draw the state transition diagram for this chain.
Markov Models
 If we know P(X1 = 1) = P(X1 = 2) = 1/4, find P(X1 = 3, X2 = 2, X3 = 1).
 For a Markov chain, the joint probability factors as
  P(X1 = a, X2 = b, X3 = c) = P(X1 = a) · P_{a,b} · P_{b,c}
 We are given P(X1 = 1) = P(X1 = 2) = 1/4.
 Hence P(X1 = 3) = 1 − 1/4 − 1/4 = 1/2.
 From the matrix: P_{3,2} = 1/2 and P_{2,1} = 1/3.
 Therefore P(X1 = 3, X2 = 2, X3 = 1) = (1/2) · (1/2) · (1/3) = 1/12.
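A few lines of Python (my own sketch, using exact fractions) reproduce this chain-rule computation:

from fractions import Fraction as F

# Transition matrix of the 3-state chain (state k maps to row/column k-1)
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(0),    F(2, 3)],
     [F(1, 2), F(1, 2), F(0)]]

p_X1 = {1: F(1, 4), 2: F(1, 4), 3: F(1, 2)}   # initial distribution

# P(X1=3, X2=2, X3=1) = P(X1=3) * P_{3,2} * P_{2,1}
prob = p_X1[3] * P[3 - 1][2 - 1] * P[2 - 1][1 - 1]
print(prob)   # 1/12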
MDP
Robot in a Grid World
 States (S): The robot can be in three states: S = {s1 = Start, s2 = Middle, s3 = Goal}
 Actions (A): From each state, the robot can choose: A = {Left, Right}
 Transition probabilities (P):
 If the robot is in Start (s1) and takes:
 Right → goes to Middle (s2) with prob 1.
 Left → stays in Start (s1) with prob 1.
 If the robot is in Middle (s2) and takes:
 Right → goes to Goal (s3) with prob 0.8, else falls back to Start (s1) with prob 0.2.
 Left → goes back to Start (s1) with prob 1.
 If in Goal (s3): absorbing state (stays there regardless of action).
 Rewards (R): Moving to Goal (s3) gives reward +10. All other moves give reward 0.
MDP
Robot in a Grid World
State   Action   Next State   Probability   Reward
s1      Right    s2           1.0           0
s1      Left     s1           1.0           0
s2      Right    s3           0.8           +10
s2      Right    s1           0.2           0
s2      Left     s1           1.0           0
s3      any      s3           1.0           0
MDP
Robot in a Grid World
• Suppose the robot always chooses Right (policy π).
• From s1:
  • It goes to s2 with prob 1.
• From s2 with Right:
  • To Goal (s3) with prob 0.8 → reward = 10
  • To Start (s1) with prob 0.2 → reward = 0, and it must try again.
• So, the expected reward starting from s2 is:
  • V(s2) = 0.8 × 10 + 0.2 × V(s1)
• And from s1:
  • V(s1) = V(s2)
• Solving:
  • V(s2) = 8 + 0.2 V(s1), V(s1) = V(s2)
  • V(s1) = 8 + 0.2 V(s1)
  • 0.8 V(s1) = 8 ⇒ V(s1) = 10
• So, if the robot always goes Right, the expected total reward from the Start is 10.
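The two equations above form a small linear system, and solving it numerically gives the same answer. This is just a sketch of mine, not part of the slides:

import numpy as np

# Equations from the slide, written as A @ [V(s1), V(s2)] = b:
#    V(s1) - V(s2) = 0
#   -0.2 V(s1) + V(s2) = 8
A = np.array([[1.0, -1.0],
              [-0.2, 1.0]])
b = np.array([0.0, 8.0])
print(np.linalg.solve(A, b))   # [10. 10.]  -> V(s1) = V(s2) = 10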
Dempster-Shafer Theory
• We are diagnosing a patient:
  Θ = {H1 = Flu, H2 = Covid, H3 = Other}.
• Two tests (A and B) give basic probability assignments (BPAs).

Test A evidence:
• mA({H1, H2}) = 0.7 (symptoms suggest Flu or Covid)
• mA({H3}) = 0.1 (10% suggests Other)
• mA(Θ) = 0.2 (ignorance)

Test B evidence:
• mB({H2}) = 0.6 (strong sign of Covid)
• mB({H1, H2}) = 0.2 (could be Flu or Covid)
• mB(Θ) = 0.2 (uncertainty)
Dempster-Shafer Theory
Dempster's Rule
• When we combine two pieces of evidence, the mass assigned to a set C is:
  mAB(C) = (1 / (1 − K)) · Σ_{Ai ∩ Bj = C} mA(Ai) mB(Bj)
• The numerator adds up products where the intersection = C.
• K is the conflict mass (the total mass of pairs whose intersection is empty).
Dempster-Shafer Theory
Conflict K
• Look for pairs where the set from A and the set from B don't overlap.
• {H3} and {H2}: intersection = ∅, contribution = 0.1 × 0.6 = 0.06.
• {H3} and {H1, H2}: intersection = ∅, contribution = 0.1 × 0.2 = 0.02.
• So: K = 0.06 + 0.02 = 0.08.
Normalization factor:
• 1 / (1 − K) = 1 / 0.92 ≈ 1.087.
Dempster-Shafer Theory
Test A:
• mA({H1, H2}) = 0.7
• mA({H3}) = 0.1
• mA(Θ) = 0.2

Test B:
• mB({H2}) = 0.6
• mB({H1, H2}) = 0.2
• mB(Θ) = 0.2
Dempster-Shafer Theory
Multiply all pairs
• We take one set from A and one set from B, multiply their masses,
  and record the intersection.
Example:
• A gives {H1, H2}, B gives {H2}.
  Intersection = {H2}.
  Multiply masses: 0.7 × 0.6 = 0.42.
• Do this for all pairs:
Dempster-Shafer Theory
Multiply all pairs: Do this for all pairs:
From A From B Intersection Mass product
{H1,H2} (0.7) {H2} (0.6) {H2} 0.42
{H1,H2} (0.7) {H1,H2} (0.2) {H1,H2} 0.14
{H1,H2} (0.7) Θ (0.2) {H1,H2} 0.14
{H3} (0.1) {H2} (0.6) ∅ (conflict) 0.06
{H3} (0.1) {H1,H2} (0.2) ∅ (conflict) 0.02
{H3} (0.1) Θ (0.2) {H3} 0.02
Θ (0.2) {H2} (0.6) {H2} 0.12
Θ (0.2) {H1,H2} (0.2) {H1,H2} 0.04
Θ (0.2) Θ (0.2) Θ 0.04
Dempster-Shafer Theory
Add up by set
• For 𝐻2 :0.42 + 0.12 = 0.54
• For 𝐻1 𝐻2 :0.14 + 0.14 + 0.04 = 0.32
• For 𝐻3 :0.02
• For Θ: 0.04
• Conflict = 0.06 + 0.02 = 0.08
Dempster-Shafer Theory
Normalize
• We must remove the conflict by dividing everything by 1 − 𝐾.
• 1 − 𝐾 = 1 − 0.08 = 0.92
• So:
• 𝑚 𝐻2 = 0.54/0.92 ≈ 0.587
• 𝑚 𝐻1 𝐻2 = 0.32/0.92 ≈ 0.348
• 𝑚 𝐻3 = 0.02/0.92 ≈ 0.022
• 𝑚 Θ = 0.04/0.92 ≈ 0.043
Dempster-Shafer Theory
Final combined masses:
 Covid (H2) → 58.7%
 Flu or Covid (H1, H2) → 34.8%
 Other (H3) → 2.2%
 Uncertainty (Θ) → 4.3%
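For reference, a small Python sketch (mine, using frozensets as focal elements) that carries out the same combination:

from itertools import product

THETA = frozenset({"H1", "H2", "H3"})

# Basic probability assignments from the two tests (focal element -> mass)
m_A = {frozenset({"H1", "H2"}): 0.7, frozenset({"H3"}): 0.1, THETA: 0.2}
m_B = {frozenset({"H2"}): 0.6, frozenset({"H1", "H2"}): 0.2, THETA: 0.2}

def combine(mA, mB):
    # Dempster's rule of combination
    combined, conflict = {}, 0.0
    for (a, pa), (b, pb) in product(mA.items(), mB.items()):
        c = a & b
        if c:
            combined[c] = combined.get(c, 0.0) + pa * pb
        else:
            conflict += pa * pb                 # mass of contradictory pairs (K)
    return {c: v / (1 - conflict) for c, v in combined.items()}, conflict

m, K = combine(m_A, m_B)
print(K)                                        # 0.08
for s, v in m.items():
    print(sorted(s), round(v, 3))               # {H2} 0.587, {H1,H2} 0.348, {H3} 0.022, Θ 0.043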
Decision Networks

[Diagram: a decision network with an Umbrella action node and chance nodes Weather and Forecast.]
Decision Networks
• MEU: choose the action which maximizes the expected utility given the evidence
 Can directly operationalize this with decision networks
 Bayes nets with nodes for utility and actions
 Lets us calculate the expected utility for each action
 New node types:
 Chance nodes (just like BNs)
 Actions (rectangles, cannot have parents, act as observed evidence)
 Utility node (diamond, depends on action and chance nodes)
[Diagram: Umbrella (action) and Weather (chance) are parents of the utility node U; Forecast is an additional chance node.]
Decision Networks
 Action selection:
 Instantiate all evidence
 Set action node(s) each possible way
 Calculate posterior for all parents of utility node, given the evidence
 Calculate expected utility for each action
 Choose maximizing action
[Diagram: the same Umbrella / Weather / U / Forecast decision network.]
Decision Networks
Evaluate each setting of the action node: Umbrella = leave, Umbrella = take.

W     P(W)
sun   0.7
rain  0.3

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

EU(leave) = 0.7 × 100 + 0.3 × 0 = 70
EU(take) = 0.7 × 20 + 0.3 × 70 = 35

Optimal decision = leave
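A minimal sketch of this action selection (my own code; the helper name expected_utility is mine):

# Utility table U(A, W) and the prior over the chance node W, as on the slide
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take",  "sun"): 20,  ("take",  "rain"): 70}
P_W = {"sun": 0.7, "rain": 0.3}

def expected_utility(action, p_w):
    # EU(action) = sum over weather outcomes of P(w) * U(action, w)
    return sum(p * U[(action, w)] for w, p in p_w.items())

for a in ("leave", "take"):
    print(a, expected_utility(a, P_W))   # leave: 70.0, take: 35.0  -> MEU({}) = 70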
Decision Networks: Notation
 EU(leave) = Expected Utility of taking action leave
 In the parentheses, we write an action
 Calculating EU requires taking an expectation over chance node outcomes
 MEU(ø) = Maximum Expected Utility, given no information
 In the parentheses, we write the evidence (which nodes we know)
 Calculating MEU requires taking a maximum over several expectations (one EU per action)
(Recap from the example: optimal decision = leave.)
Decisions as Outcome Trees
[Diagram: outcome tree rooted at the empty evidence set {}. The Umbrella action branches into take and leave; under each, Weather | {} branches into sun and rain, ending in the leaves U(t,s), U(t,r), U(l,s), U(l,r).]
• Almost exactly like expectimax
• What's changed?
Example: Decision Networks
Evaluate each action given the evidence Forecast = bad: Umbrella = leave, Umbrella = take.

A      W     U(A,W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

W     P(W | F=bad)
sun   0.34
rain  0.66

EU(leave | F=bad) = 0.34 × 100 + 0.66 × 0 = 34
EU(take | F=bad) = 0.34 × 20 + 0.66 × 70 = 53

Optimal decision = take
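Re-running the same computation with the posterior P(W | F=bad) from the slide (again only a sketch):

U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take",  "sun"): 20,  ("take",  "rain"): 70}
P_W_bad = {"sun": 0.34, "rain": 0.66}

eu = {a: sum(p * U[(a, w)] for w, p in P_W_bad.items()) for a in ("leave", "take")}
print(eu)   # {'leave': 34.0, 'take': 53.0}  -> MEU(F=bad) = 53, so take the umbrella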
Decision Networks: Notation
 EU(leave | bad) = Expected Utility of choosing leave, given you know the forecast is bad
 Left side of the conditioning bar: the action being taken
 Right side of the conditioning bar: the random variable(s) we know the value of (evidence)
 MEU(F=bad) = Maximum Expected Utility, given you know the forecast is bad
 In the parentheses, we write the evidence (which nodes we know)
(Recap: with the bad forecast, the optimal decision is take.)
Decisions as Outcome Trees
[Diagram: outcome tree rooted at the evidence set {b} (Forecast = bad). The Umbrella action branches into take and leave; under each, Weather | {b} branches into sun and rain, ending in the leaves U(t,s), U(t,r), U(l,s), U(l,r).]
Bayesian Decision Theory
• A doctor wants to diagnose whether a patient has Flu (H1) or Covid (H2).
• Prior probabilities (before seeing the test result):
  • P(H1) = 0.6, P(H2) = 0.4
  • (Flu is more common than Covid.)
• Test result (Symptom X = "loss of smell"):
  • If the patient has Flu → P(symptom) = 0.2
  • If the patient has Covid → P(symptom) = 0.7
• Loss (or cost) of wrong decisions:
  • Misdiagnosing Covid as Flu = 100 (dangerous mistake).
  • Misdiagnosing Flu as Covid = 10 (mild inconvenience).
  • Correct diagnosis = 0.
Bayesian Decision Theory
• Step 1: Likelihoods
• P(X | H1) = 0.2, P(X | H2) = 0.7

• Step 2: Evidence (normalizing constant)
• P(X) = P(X | H1) P(H1) + P(X | H2) P(H2)
• = (0.2)(0.6) + (0.7)(0.4) = 0.12 + 0.28 = 0.40
Bayesian Decision Theory
• Step 3: Posterior probabilities
• Using Bayes' rule:
• P(H1 | X) = P(X | H1) P(H1) / P(X) = (0.2 · 0.6) / 0.40 = 0.12 / 0.40 = 0.30
• P(H2 | X) = P(X | H2) P(H2) / P(X) = (0.7 · 0.4) / 0.40 = 0.28 / 0.40 = 0.70
• So after seeing the symptom:
• P(H1 | X) = 0.30, P(H2 | X) = 0.70
Bayesian Decision Theory
• Step 4: Expected Risk for each decision
• We compute the expected loss for choosing each hypothesis.
• If we decide H1 (Flu):
  • R(H1 | X) = 0 · P(H1 | X) + 100 · P(H2 | X) = 0 · 0.30 + 100 · 0.70 = 70
• If we decide H2 (Covid):
  • R(H2 | X) = 10 · P(H1 | X) + 0 · P(H2 | X) = 10 · 0.30 + 0 · 0.70 = 3
Bayesian Decision Theory
• Step 5: Decision Rule
• Choose the hypothesis with minimum expected risk.
• R(H1 | X) = 70, R(H2 | X) = 3
• So, the Bayesian decision is: diagnose Covid (H2).
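The same numbers fall out of a short script (my own sketch; the loss dictionary is indexed as loss[decision][true hypothesis]):

# Priors, likelihoods of the symptom X, and losses
priors = {"H1": 0.6, "H2": 0.4}
likelihood = {"H1": 0.2, "H2": 0.7}            # P(X | H)
loss = {"H1": {"H1": 0, "H2": 100},            # decide Flu
        "H2": {"H1": 10, "H2": 0}}             # decide Covid

# Posterior P(H | X) by Bayes' rule
evidence = sum(likelihood[h] * priors[h] for h in priors)
posterior = {h: likelihood[h] * priors[h] / evidence for h in priors}

# Expected risk of each decision; pick the minimum
risk = {d: sum(loss[d][h] * posterior[h] for h in priors) for d in loss}
print(posterior)                 # {'H1': 0.3, 'H2': 0.7}
print(risk)                      # {'H1': 70.0, 'H2': 3.0}
print(min(risk, key=risk.get))   # H2 -> diagnose Covid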


MDPs – Value Iteration
Robot in a Grid
• A robot can be in two rooms:
  • s1 = Room 1 (start)
  • s2 = Room 2
• It can take actions:
  • a = Stay
  • a = Move
• Transitions:
  • If the robot takes Stay → it stays in the same room with probability 1.
  • If the robot takes Move:
    • From s1: goes to s2 with probability 1.
    • From s2: goes to s1 with probability 1.
• Rewards:
  • Staying in s1 gives +1.
  • Moving to s2 gives +5.
  • Staying in s2 gives +2.
  • Moving from s2 back to s1 gives 0.
• Discount factor: γ = 0.9.
MDPs – Value Iteration
• Define the Value Equations
• The Bellman equation for each state is:
  V(s) = max_a [ R(s, a) + γ Σ_{s'} P(s' | s, a) V(s') ]
• Write it for each state.
• For s1:
  • If Stay:  Q(s1, Stay) = 1 + 0.9 V(s1)
  • If Move:  Q(s1, Move) = 5 + 0.9 V(s2)
  • So: V(s1) = max( 1 + 0.9 V(s1), 5 + 0.9 V(s2) )
MDPs – Value Iteration
• For s2:
  • If Stay:  Q(s2, Stay) = 2 + 0.9 V(s2)
  • If Move:  Q(s2, Move) = 0 + 0.9 V(s1)
  • So: V(s2) = max( 2 + 0.9 V(s2), 0.9 V(s1) )
MDPs – Value Iteration
• Step 3: Solve by Iteration (Value Iteration idea)
• Start with V(s1) = V(s2) = 0.
• Iteration 1:
  • V(s1) = max(1 + 0, 5 + 0) = 5,  V(s2) = max(2 + 0, 0) = 2
• Iteration 2:
  • V(s1) = max(1 + 0.9·5, 5 + 0.9·2) = max(5.5, 6.8) = 6.8
  • V(s2) = max(2 + 0.9·2, 0.9·5) = max(3.8, 4.5) = 4.5
• Iteration 3:
  • V(s1) = max(1 + 0.9·6.8, 5 + 0.9·4.5) = max(7.12, 9.05) = 9.05
  • V(s2) = max(2 + 0.9·4.5, 0.9·6.8) = max(6.05, 6.12) = 6.12
• …and it will converge (to roughly V(s1) ≈ 26.3 and V(s2) ≈ 23.7).
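A compact value-iteration sketch (my own code, using the rewards and γ = 0.9 above) reproduces these iterates and the converged values:

GAMMA = 0.9

def backup(v1, v2):
    # One Bellman backup for the two-room MDP
    new_v1 = max(1 + GAMMA * v1,   # Stay in s1
                 5 + GAMMA * v2)   # Move to s2
    new_v2 = max(2 + GAMMA * v2,   # Stay in s2
                 0 + GAMMA * v1)   # Move back to s1
    return new_v1, new_v2

v1, v2 = 0.0, 0.0
for k in range(1, 101):
    v1, v2 = backup(v1, v2)
    if k <= 3:
        print(k, round(v1, 2), round(v2, 2))   # 5/2, 6.8/4.5, 9.05/6.12

print(round(v1, 2), round(v2, 2))              # ≈ 26.32, 23.68 at convergence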


MDPs – Value Iteration
• Policy
• From the updates, the best choices are:
• At 𝑠1 :Move (because reward+future is higher).
• At 𝑠2 :Move (back to 𝑠1 ,since that gives higher long-term value).
• So the optimal policy is:
• Always move between rooms instead of staying.
