Probability
P(a | b) = P(a ∧ b) / P(b)
P(a ∧ b) = P(b)P(a | b)
P(a ∧ b) = P(a)P(b | a)
probability distribution
P(Flight) = ⟨0.6, 0.3, 0.1⟩
independence
the knowledge that one event occurs does
not affect the probability of the other event
independence
P(a ∧ b) = P(a)P(b | a)
Under independence, P(b | a) = P(b), so:
P(a ∧ b) = P(a)P(b)
independence
P(die1 = 6 ∧ die2 = 6) = P(die1 = 6) P(die2 = 6) = 1/6 ⋅ 1/6 = 1/36
independence
P(die1 = 6 ∧ die1 = 4) ≠ P(die1 = 6) P(die1 = 4) = 1/6 ⋅ 1/6 = 1/36
independence
P(die1 = 6 ∧ die1 = 4) = P(die1 = 6) P(die1 = 4 | die1 = 6) = 1/6 ⋅ 0 = 0
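A quick way to check these identities concretely is to enumerate the 36 equally likely outcomes of two dice. A minimal Python sketch (not from the slides; the prob helper is our own):

```python
# Verify independence of two dice by enumerating all 36 outcomes.
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (die1, die2) pairs

def prob(event):
    """Exact fraction of equally likely outcomes where event holds."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

p_a  = prob(lambda o: o[0] == 6)                 # P(die1 = 6) = 1/6
p_b  = prob(lambda o: o[1] == 6)                 # P(die2 = 6) = 1/6
p_ab = prob(lambda o: o[0] == 6 and o[1] == 6)   # P(die1 = 6 ∧ die2 = 6) = 1/36

print(p_ab == p_a * p_b)  # True: the two rolls are independent
```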
Bayes' Rule
P(a ∧ b) = P(b) P(a | b)
P(a ∧ b) = P(a) P(b | a)
P(a) P(b | a) = P(b) P(a | b)
Bayes' Rule
P(b | a) = P(a | b) P(b) / P(a)
Given clouds in the morning,
what's the probability of rain in the afternoon?
• 80% of rainy afternoons start with cloudy
mornings.
• 40% of days have cloudy mornings.
• 10% of days have rainy afternoons.
P(rain | clouds) = P(clouds | rain) P(rain) / P(clouds)
= (0.8)(0.1) / 0.4
= 0.2
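The same arithmetic in a few lines of Python (a sketch using the slide's numbers; the variable names are our own):

```python
# Bayes' rule with the cloud/rain numbers from the slide.
p_clouds_given_rain = 0.8   # P(clouds | rain)
p_rain = 0.1                # P(rain)
p_clouds = 0.4              # P(clouds)

# P(rain | clouds) = P(clouds | rain) P(rain) / P(clouds)
p_rain_given_clouds = p_clouds_given_rain * p_rain / p_clouds
print(p_rain_given_clouds)  # ≈ 0.2
```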
Knowing
P(cloudy morning | rainy afternoon)
we can calculate
P(rainy afternoon | cloudy morning)
Knowing
P(visible effect | unknown cause)
we can calculate
P(unknown cause | visible effect)
Knowing
P(medical test result | disease)
we can calculate
P(disease | medical test result)
Knowing
P(blurry text | counterfeit bill)
we can calculate
P(counterfeit bill | blurry text)
Joint Probability
AM: P(C = cloud) = 0.4, P(C = ¬cloud) = 0.6
PM: P(R = rain) = 0.1, P(R = ¬rain) = 0.9
Joint distribution over AM clouds and PM rain:
             R = rain   R = ¬rain
C = cloud      0.08       0.32
C = ¬cloud     0.02       0.58
Probability Rules
Negation
P(¬a) = 1 − P(a)
Inclusion-Exclusion
P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
Marginalization
P(a) = P(a, b) + P(a, ¬b)
Marginalization
P(X = xi) = ∑j P(X = xi, Y = yj)
Marginalization
             R = rain   R = ¬rain
C = cloud      0.08       0.32
C = ¬cloud     0.02       0.58
P(C = cloud)
= P(C = cloud, R = rain) + P(C = cloud, R = ¬rain)
= 0.08 + 0.32
= 0.40
Conditioning
P(a) = P(a | b)P(b) + P(a | ¬b)P(¬b)
Conditioning
P(X = xi) = ∑j P(X = xi | Y = yj)P(Y = yj)
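These rules can be checked directly against the cloud/rain joint table. A minimal sketch, storing the joint distribution as a dict keyed by (cloud, rain) truth values (our own representation, not the course's):

```python
# The cloud/rain joint distribution from the slides.
joint = {
    (True, True): 0.08, (True, False): 0.32,
    (False, True): 0.02, (False, False): 0.58,
}

# Marginalization: P(cloud) = P(cloud, rain) + P(cloud, ¬rain)
p_cloud = joint[(True, True)] + joint[(True, False)]        # ≈ 0.40

# Negation: P(¬cloud) = 1 − P(cloud)
p_not_cloud = 1 - p_cloud                                   # ≈ 0.60

# Inclusion-exclusion: P(cloud ∨ rain) = P(cloud) + P(rain) − P(cloud ∧ rain)
p_rain = joint[(True, True)] + joint[(False, True)]         # ≈ 0.10
p_cloud_or_rain = p_cloud + p_rain - joint[(True, True)]    # ≈ 0.42

# Conditioning: P(rain) = P(rain | cloud)P(cloud) + P(rain | ¬cloud)P(¬cloud)
p_rain_given_cloud = joint[(True, True)] / p_cloud
p_rain_given_not_cloud = joint[(False, True)] / p_not_cloud
print(p_rain_given_cloud * p_cloud + p_rain_given_not_cloud * p_not_cloud)  # ≈ 0.10
```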
Bayesian Networks
Bayesian network
data structure that represents the
dependencies among random variables
Bayesian network
• directed graph
• each node represents a random variable
• arrow from X to Y means X is a parent of Y
• each node X has probability distribution
P(X | Parents(X))
Rain {none, light, heavy}
Maintenance {yes, no}
Train {on time, delayed}
Appointment {attend, miss}
Edges: Rain → Maintenance, Rain → Train, Maintenance → Train, Train → Appointment
Rain {none, light, heavy}:
none   light   heavy
0.7    0.2     0.1
Maintenance {yes, no}, conditioned on Rain:
R       yes   no
none    0.4   0.6
light   0.2   0.8
heavy   0.1   0.9
Train {on time, delayed}, conditioned on Rain and Maintenance:
R       M     on time   delayed
none    yes   0.8       0.2
none    no    0.9       0.1
light   yes   0.6       0.4
light   no    0.7       0.3
heavy   yes   0.4       0.6
heavy   no    0.5       0.5
Appointment {attend, miss}, conditioned on Train:
T          attend   miss
on time    0.9      0.1
delayed    0.6      0.4
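One plain way to put this network in code is a dict per node mapping parent values to a distribution. A minimal sketch (our own hypothetical representation, not the course's code):

```python
# CPTs for the Rain/Maintenance/Train/Appointment network.
P_rain = {"none": 0.7, "light": 0.2, "heavy": 0.1}

P_maintenance = {  # P(Maintenance | Rain)
    "none":  {"yes": 0.4, "no": 0.6},
    "light": {"yes": 0.2, "no": 0.8},
    "heavy": {"yes": 0.1, "no": 0.9},
}

P_train = {  # P(Train | Rain, Maintenance)
    ("none", "yes"):  {"on time": 0.8, "delayed": 0.2},
    ("none", "no"):   {"on time": 0.9, "delayed": 0.1},
    ("light", "yes"): {"on time": 0.6, "delayed": 0.4},
    ("light", "no"):  {"on time": 0.7, "delayed": 0.3},
    ("heavy", "yes"): {"on time": 0.4, "delayed": 0.6},
    ("heavy", "no"):  {"on time": 0.5, "delayed": 0.5},
}

P_appointment = {  # P(Appointment | Train)
    "on time": {"attend": 0.9, "miss": 0.1},
    "delayed": {"attend": 0.6, "miss": 0.4},
}
```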
Computing Joint Probabilities
P(light)
= P(light)
P(light, no)
= P(light) P(no | light)
P(light, no, delayed)
= P(light) P(no | light) P(delayed | light, no)
P(light, no, delayed, miss)
= P(light) P(no | light) P(delayed | light, no) P(miss | delayed)
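Given the CPT dicts above, the joint probability is just the product along this chain. A sketch reusing those dicts:

```python
# Joint probability: multiply each node's probability given its parents.
def joint(r, m, t, a):
    return (P_rain[r]
            * P_maintenance[r][m]
            * P_train[(r, m)][t]
            * P_appointment[t][a])

print(joint("light", "no", "delayed", "miss"))
# = 0.2 * 0.8 * 0.3 * 0.4 ≈ 0.0192
```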
Inference
Inference
• Query X: variable for which to compute distribution
• Evidence variables E: observed variables for event e
• Hidden variables Y: non-evidence, non-query variables.
• Goal: Calculate P(X | e)
P(Appointment | light, no)
= α P(Appointment, light, no)
= α [P(Appointment, light, no, on time) + P(Appointment, light, no, delayed)]
Inference by Enumeration
P(X | e) = α P(X, e) = α ∑y P(X, e, y)
X is the query variable.
e is the evidence.
y ranges over values of hidden variables.
α normalizes the result.
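A minimal sketch of enumeration for the query P(Appointment | light, no) from the slide above, reusing joint() from the earlier sketch; infer_appointment is our own hypothetical helper:

```python
# Inference by enumeration: sum the joint over hidden variables, normalize.
def infer_appointment(evidence_r, evidence_m):
    """P(Appointment | Rain = evidence_r, Maintenance = evidence_m)."""
    scores = {}
    for a in ["attend", "miss"]:
        # Train is hidden: sum it out.
        scores[a] = sum(joint(evidence_r, evidence_m, t, a)
                        for t in ["on time", "delayed"])
    alpha = 1 / sum(scores.values())  # normalize so values sum to 1
    return {a: alpha * s for a, s in scores.items()}

print(infer_appointment("light", "no"))
# ≈ {'attend': 0.81, 'miss': 0.19} under these CPTs
```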
Approximate Inference
Sampling
Sampling generates one full assignment by working top-down through the network:
Sample from P(Rain) = ⟨0.7, 0.2, 0.1⟩ → R = none
Sample from P(Maintenance | R = none) = ⟨0.4, 0.6⟩ → M = yes
Sample from P(Train | R = none, M = yes) = ⟨0.8, 0.2⟩ → T = on time
Sample from P(Appointment | T = on time) = ⟨0.9, 0.1⟩ → A = attend
Resulting sample: R = none, M = yes, T = on time, A = attend
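The same top-down procedure in code, reusing the CPT dicts from earlier; draw and sample are our own hypothetical helpers:

```python
import random

def draw(dist):
    """Sample a value from a {value: probability} dict."""
    values = list(dist)
    return random.choices(values, weights=[dist[v] for v in values])[0]

def sample():
    """Draw one full assignment, parents before children."""
    r = draw(P_rain)
    m = draw(P_maintenance[r])
    t = draw(P_train[(r, m)])
    a = draw(P_appointment[t])
    return r, m, t, a

print(sample())  # e.g. ('none', 'yes', 'on time', 'attend')
```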
Samples:
(R = light, M = no, T = on time, A = miss)
(R = light, M = yes, T = delayed, A = attend)
(R = none, M = no, T = on time, A = attend)
(R = none, M = yes, T = on time, A = attend)
(R = none, M = yes, T = on time, A = attend)
(R = none, M = yes, T = on time, A = attend)
(R = heavy, M = no, T = delayed, A = miss)
(R = light, M = no, T = on time, A = attend)
P(Train = on time) ?
Of the 8 samples, 6 have T = on time, so we estimate P(Train = on time) ≈ 6/8 = 0.75.
P(Rain = light | Train = on time) ?
Discard the 2 samples where T = delayed; among the 6 remaining, R = light in 2, so we estimate P(Rain = light | Train = on time) ≈ 2/6 ≈ 0.33.
Rejection Sampling
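The counting just performed is rejection sampling: discard samples inconsistent with the evidence, then count among the rest. A sketch reusing sample() from the earlier block; rejection_sample is our own hypothetical helper:

```python
# Rejection sampling estimate of P(Rain = light | Train = on time).
def rejection_sample(n=10000):
    kept = [s for s in (sample() for _ in range(n))
            if s[2] == "on time"]               # keep only T = on time
    return sum(s[0] == "light" for s in kept) / len(kept)

print(rejection_sample())  # ≈ P(Rain = light | Train = on time)
```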
Likelihood Weighting
• Start by fixing the values for evidence variables.
• Sample the non-evidence variables using conditional
probabilities in the Bayesian Network.
• Weight each sample by its likelihood: the probability
of all of the evidence.
P(Rain = light | Train = on time) ?
Fix the evidence: T = on time.
Sample from P(Rain) = ⟨0.7, 0.2, 0.1⟩ → R = light
Sample from P(Maintenance | R = light) = ⟨0.2, 0.8⟩ → M = yes
Train is an evidence variable, so it is not sampled: T = on time
Sample from P(Appointment | T = on time) = ⟨0.9, 0.1⟩ → A = attend
Weight the sample by the probability of the evidence: P(T = on time | R = light, M = yes) = 0.6
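The same procedure in code, reusing the earlier dicts and draw(); likelihood_weighting is our own hypothetical helper:

```python
# Likelihood weighting estimate of P(Rain | Train = on time).
def likelihood_weighting(n=10000):
    totals = {"none": 0.0, "light": 0.0, "heavy": 0.0}
    for _ in range(n):
        r = draw(P_rain)
        m = draw(P_maintenance[r])
        t = "on time"                   # evidence: fixed, never sampled
        a = draw(P_appointment[t])
        totals[r] += P_train[(r, m)][t]  # weight = likelihood of evidence
    z = sum(totals.values())
    return {r: w / z for r, w in totals.items()}

print(likelihood_weighting())  # approximates P(Rain | Train = on time)
```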
Uncertainty over Time
Xt: Weather at time t
Markov assumption
the assumption that the current state
depends on only a finite fixed number of
previous states
Markov Chain
Markov chain
a sequence of random variables where the
distribution of each variable follows the
Markov assumption
Transition Model
                     Tomorrow (Xt+1)
                     sun    rain
Today (Xt)   sun     0.8    0.2
             rain    0.3    0.7
X0 → X1 → X2 → X3 → X4
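Sampling a sequence from this chain needs only the transition table and a start state. A sketch reusing draw() from earlier, and assuming the two states are sun and rain as in the table above:

```python
# Transition model: P(Xt+1 | Xt).
transitions = {
    "sun":  {"sun": 0.8, "rain": 0.2},
    "rain": {"sun": 0.3, "rain": 0.7},
}

def markov_chain(start, length):
    """Sample a state sequence, each state depending only on the last."""
    states = [start]
    while len(states) < length:
        states.append(draw(transitions[states[-1]]))
    return states

print(markov_chain("sun", 5))  # e.g. ['sun', 'sun', 'rain', 'rain', 'rain']
```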
Sensor Models
Hidden State        Observation
robot's position    robot's sensor data
words spoken        audio waveforms
user engagement     website or app analytics
weather             umbrella
Hidden Markov Models
Hidden Markov Model
a Markov model for a system with hidden
states that generate some observed event
Sensor Model
                     Observation (Et)
                     umbrella   no umbrella
State (Xt)   sun     0.2        0.8
             rain    0.9        0.1
sensor Markov assumption
the assumption that the evidence variable
depends only on the corresponding state
X0 → X1 → X2 → X3 → X4
↓    ↓    ↓    ↓    ↓
E0   E1   E2   E3   E4
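Putting the transition and sensor models together, an HMM can be sampled state by state. A sketch reusing markov_chain() and draw() from the earlier blocks; the sensor numbers are those in the table above, with the sun/rain and umbrella labels assumed:

```python
# Sensor model: P(Et | Xt).
sensor = {
    "sun":  {"umbrella": 0.2, "no umbrella": 0.8},
    "rain": {"umbrella": 0.9, "no umbrella": 0.1},
}

def hmm_sample(start, length):
    """Sample hidden states, then one observation per state."""
    states = markov_chain(start, length)
    observations = [draw(sensor[x]) for x in states]
    return states, observations

print(hmm_sample("sun", 5))
```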