Understanding Hidden Markov Models
Ronald J. Williams
CSG220
Spring 2007
Contains several slides adapted from an Andrew Moore tutorial on this topic and a few figures
from Russell & Norvig’s AIMA site and Alpaydin’s Introduction to Machine Learning site.
Markov Property

• For any t, X_{t+1} is conditionally independent of {X_{t-1}, X_{t-2}, ..., X_1} given X_t.
• In other words:

    P(X_{t+1} = s_j | X_t = s_i) = P(X_{t+1} = s_j | X_t = s_i, any earlier history)

• Question: What would be the best Bayes net structure to represent the joint distribution of (X_1, X_2, ..., X_{t-1}, X_t, ...)?
• Answer: a simple chain, shown on the next slide.
Markov chain as a Bayes net

    X_1 → X_2 → ... → X_{t-1} → X_t → ...

CPT for X_1 (initial distribution):

    i    P(X_1 = s_i)
    1    π_1
    2    π_2
    3    π_3
    :    :
    N    π_N

CPT for each X_{t+1} given X_t (transition matrix; row i, column j holds a_{ij} = P(X_{t+1} = s_j | X_t = s_i)):

         1     2     ...  j     ...  N
    1    a_11  a_12  ...  a_1j  ...  a_1N
    2    a_21  a_22  ...  a_2j  ...  a_2N
    3    a_31  a_32  ...  a_3j  ...  a_3N
    :    :     :          :          :
    i    a_i1  a_i2  ...  a_ij  ...  a_iN
    :    :     :          :          :
    N    a_N1  a_N2  ...  a_Nj  ...  a_NN
Computing stuff in Markov chains

• Some notation and assumptions:
• Assume time t runs from 1 to T.
• Recall that X_t is the r.v. representing the state at time t and x_t denotes its actual value.
• Use X_{t1:t2} and x_{t1:t2} as shorthand for (X_{t1}, X_{t1+1}, ..., X_{t2}) and (x_{t1}, x_{t1+1}, ..., x_{t2}), respectively.
• Use notation like P(x_t) as shorthand for P(X_t = x_t).
State sequence as a path

[Trellis figure: states s_1, ..., s_N listed vertically, time steps 1 through T horizontally; any state sequence corresponds to a path through this trellis from left to right.]
What is P(X_t = s_i)? Clever approach

• For each state s_i, define

    p_t(i) = P(X_t = s_i)

• Picture a table with one row per state index (1 through N) and one column per time step (1 through T); cell (i, t) holds p_t(i). Cells in this table correspond to nodes in the trellis.
• Express inductively:

    ∀i  p_1(i) ≡ P(X_1 = s_i) = π_i

    ∀j  p_{t+1}(j) ≡ P(X_{t+1} = s_j)
                   = Σ_{i=1}^N P(X_{t+1} = s_j | X_t = s_i) P(X_t = s_i)
                   = Σ_{i=1}^N a_{ij} p_t(i)

• Computation is simple: just fill in this table one column at a time, from left to right, as in the sketch below.
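As a concrete illustration, here is a minimal Python/NumPy sketch of this column-by-column table fill; the names `state_marginals`, `pi`, and `A` are our own, chosen to mirror the notation above.

    import numpy as np

    def state_marginals(pi, A, T):
        """Fill the N x T table of marginals p_t(i) = P(X_t = s_i).

        pi : length-N initial distribution (pi[i] = P(X_1 = s_{i+1}))
        A  : N x N transition matrix (A[i, j] = a_ij)
        T  : number of time steps
        """
        N = len(pi)
        p = np.zeros((N, T))
        p[:, 0] = pi                     # base case: p_1(i) = pi_i
        for t in range(1, T):            # one column at a time, left to right
            p[:, t] = A.T @ p[:, t - 1]  # p_{t+1}(j) = sum_i a_ij p_t(i)
        return p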
Inductive step:
graphical representation

    p_{t+1}(j) = Σ_{i=1}^N a_{ij} p_t(i)

[Trellis figure: each node p_{t+1}(j) in column t+1 collects the values p_t(i) from every node in column t, weighted by the transition probabilities a_{ij}.]
Hidden State

• Given a Markov model of a process, computation of various quantities of interest (e.g., probabilities) is straightforward if the state is observable – use techniques like the one just described.
• More realistic: assume the true state is not observable – we only have observations that depend on, but do not fully determine, the actual states.
• Examples:
  • Robot localization
    • state = actual location
    • observations = (noisy) sensor readings
  • Speech recognition
    • state sequence => word
    • observations = acoustic signal
• In this situation, we say the state is hidden.
• Model this using a Hidden Markov Model (HMM).
HMMs

• An HMM is just a Markov chain augmented with
  • a set of M possible observations {o_1, o_2, ..., o_M}
  • for each state s_1, s_2, ..., s_N, a distribution over the possible observations that might be sensed in that state
• We'll let Z_t be the r.v. for the observation that occurs at time t (with z_t representing the actual observation).
• In addition, we'll assume that the observation at time t depends only on the state at time t, in the sense about to be described.
Markov Property of Observations

• For any t, Z_t is conditionally independent of {X_{t-1}, X_{t-2}, ..., X_1, Z_{t-1}, Z_{t-2}, ..., Z_1} given X_t.
• In other words:

    P(Z_t = o_j | X_t = s_i) = P(Z_t = o_j | X_t = s_i, any earlier history)

• Question: What would be the best Bayes net structure to represent the joint distribution of (X_1, Z_1, X_2, Z_2, ..., X_{t-1}, Z_{t-1}, X_t, Z_t, ...)?
• Answer:

    X_1 → X_2 → ... → X_{t-1} → X_t → ...
     |     |           |          |
     v     v           v          v
    Z_1   Z_2         Z_{t-1}    Z_t

This is the CPT for every Z node. Notation: b_i(o_k) = P(Z_t = o_k | X_t = s_i); rows are state indices 1..N, columns are observation indices 1..M:

         o_1       o_2       ...  o_k       ...  o_M
    1    b_1(o_1)  b_1(o_2)  ...  b_1(o_k)  ...  b_1(o_M)
    2    b_2(o_1)  b_2(o_2)  ...  b_2(o_k)  ...  b_2(o_M)
    :    :         :              :              :
    N    b_N(o_1)  b_N(o_2)  ...  b_N(o_k)  ...  b_N(o_M)
Are HMMs Useful?

You bet!!
• Robot planning & sensing under uncertainty (e.g., Reid Simmons / Sebastian Thrun / Sven Koenig)
• Robot learning control (e.g., Yangsheng Xu's work)
• Speech Recognition/Understanding
  Phones → words, signal → phones
• Human Genome Project
  Complicated stuff your lecturer knows nothing about.
• Consumer decision modeling
• Economics & Finance
Plus at least 5 other things I haven't thought of.
DBN Example

[Figure: a dynamic Bayes net modeling a robot with position sensors and a battery charge meter.]
Back to HMMs ...

Summary of our HMM notation:
• X_t = state at time t (r.v.)
• Z_t = observation at time t (r.v.)
• V_{t1:t2} = (V_{t1}, V_{t1+1}, ..., V_{t2}) for any time-indexed r.v. V
• Possible states = {s_1, s_2, ..., s_N}
• Possible observations = {o_1, o_2, ..., o_M}
• v_t = actual value of r.v. V at time step t
• v_{t1:t2} = (v_{t1}, v_{t1+1}, ..., v_{t2}) = sequence of actual values of r.v. V from time steps t1 through t2
• Convenient shorthand: e.g., P(x_{1:t} | z_{1:t}) means P(X_{1:t} = x_{1:t} | Z_{1:t} = z_{1:t})
• T = final time step
Here's an HMM

[Figure: a three-state HMM. State s_1 can emit u or v, state s_2 can emit v or w, and state s_3 can emit u or w, choosing one of its two output symbols at random (each with probability 1/2). Arcs between the states carry transition probabilities 1/3, 2/3, and 1/2; the exact arc labels are in the original figure.]

    π_1 = 1/2    π_2 = 1/2    π_3 = 0

• Start randomly in state 1 or 2.
• Choose one of the output symbols in each state at random.
• Let's generate a sequence of observations by repeatedly making a random transition and then emitting a random symbol – this is sketched in code below.

Hidden State

• The observer sees only the generated observation sequence; the underlying state sequence that produced it is hidden.
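To make the generative story concrete, here is a hedged Python sketch of sampling from a three-state HMM of this shape. The initial distribution π = (1/2, 1/2, 0) is from the slide; the specific transition matrix below is a placeholder (the true arc labels are only in the original figure), and the emission table encodes "each state emits one of its two symbols with probability 1/2".

    import numpy as np

    rng = np.random.default_rng(0)

    pi = np.array([1/2, 1/2, 0.0])      # initial distribution (from the slide)
    A = np.array([[1/3, 1/3, 1/3],      # PLACEHOLDER transitions: rows sum to 1, using
                  [1/3, 1/3, 1/3],      # the 1/3s from the figure; the real arc labels
                  [1/3, 1/3, 1/3]])     # (1/2s, 2/3s) can't be recovered from the text
    # observation indices: 0 = u, 1 = v, 2 = w
    B = np.array([[1/2, 1/2, 0.0],      # s_1 emits u or v, each w.p. 1/2
                  [0.0, 1/2, 1/2],      # s_2 emits v or w
                  [1/2, 0.0, 1/2]])     # s_3 emits u or w

    def generate(T):
        """Sample hidden states x_1..x_T and observations z_1..z_T."""
        x = rng.choice(3, p=pi)
        states, obs = [], []
        for _ in range(T):
            states.append(x)
            obs.append(rng.choice(3, p=B[x]))  # emit in the current state
            x = rng.choice(3, p=A[x])          # then make a random transition
        return states, obs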
Problems to solve

• So now we have an HMM (or, more generally, a DBN) that models a temporal process of interest.
• What are some of the kinds of problems we'd like to be able to solve with this?
Temporal Model Problems to Solve

• Filtering: Compute P(X_t | z_{1:t}, λ)
• Prediction: Compute P(X_k | z_{1:t}, λ) for k > t
• Smoothing: Compute P(X_k | z_{1:t}, λ) for k < t
• Observation sequence likelihood: Compute P(z_{1:T} | λ)
• Most probable path (state sequence): Compute the x_{1:T} maximizing P(x_{1:T} | z_{1:T}, λ)
• Maximum likelihood model: Given a set of observation sequences {z^r_{1:T_r}}_r, compute the λ maximizing ∏_r P(z^r_{1:T_r} | λ)
Filtering

    Compute P(X_t | z_{1:t}, λ)

[Timeline figure: Z_1, Z_2, ..., Z_{t-1}, Z_t are observed; t is the current time; we infer a distribution over X_t.]

Prediction

    Compute P(X_k | z_{1:t}, λ) for k > t

[Timeline figure: Z_1, ..., Z_t are observed; t is the current time; we infer a distribution over a future state X_k.]
Smoothing

    Compute P(X_k | z_{1:t}, λ) for k < t

[Timeline figure: Z_1, ..., Z_t are observed; t is the current time; we infer a distribution over a past state X_k. In the common offline case, all of Z_1, ..., Z_T is observed.]

Most Probable Path

    Compute argmax_{x_{1:T}} P(x_{1:T} | z_{1:T}, λ)

[Timeline figure: Z_1, ..., Z_T are observed; we infer only the most probable state sequence.]
Solution methods for these problems

Let's start by considering the observation sequence likelihood problem:

    Given z_{1:T}, compute P(z_{1:T} | λ)
Prob. of a sequence of 3 observations

    P(z_{1:3}) = Σ_{x_{1:3} ∈ paths of length 3} P(z_{1:3} ∧ x_{1:3})

               = Σ_{x_{1:3} ∈ paths of length 3} P(z_{1:3} | x_{1:3}) P(x_{1:3})

[Figure: the three-state example HMM again; each term in the sum corresponds to one length-3 path through its states.]

Note that a direct sum like this has one term per path – N^T terms in general – so we'll want something cleverer; see the brute-force sketch below for contrast.
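For intuition, here is a tiny brute-force sketch of this sum over paths, reusing the `pi`, `A`, `B` arrays from the generation sketch above; it is exponential in T and only meant as a reference to check the forward algorithm against.

    from itertools import product

    def likelihood_bruteforce(pi, A, B, z):
        """P(z_{1:T}) by summing P(z, x) over all N^T state paths x."""
        N, T = len(pi), len(z)
        total = 0.0
        for x in product(range(N), repeat=T):           # every length-T path
            p = pi[x[0]] * B[x[0], z[0]]                # start, then first emission
            for t in range(1, T):
                p *= A[x[t - 1], x[t]] * B[x[t], z[t]]  # transition, then emission
            total += p
        return total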
Computing the forward variables

Base case:

    α_1(i) ≡ P(z_1 ∧ X_1 = s_i)
           = P(z_1 | X_1 = s_i) P(X_1 = s_i)
           = b_i(z_1) π_i

[Trellis figure: the α_1(i) values fill the first column (time step 1) of the trellis; state i's entry combines π_i with the probability of observing z_1 there.]

Note: For simplicity, we'll drop explicit reference to conditioning on the HMM parameters λ for many of the upcoming slides, but it's always there implicitly.
Forward variables: inductive step

    α_{t+1}(j) ≡ P(z_{1:t+1} ∧ X_{t+1} = s_j)

      = Σ_{i=1}^N P(z_{1:t+1} ∧ X_t = s_i ∧ X_{t+1} = s_j)

      = Σ_{i=1}^N P(z_{1:t} ∧ z_{t+1} ∧ X_t = s_i ∧ X_{t+1} = s_j)        [split off last observation]

      = Σ_{i=1}^N P(z_{t+1} ∧ X_{t+1} = s_j | z_{1:t} ∧ X_t = s_i) P(z_{1:t} ∧ X_t = s_i)        [chain rule]

      = Σ_{i=1}^N P(z_{t+1} ∧ X_{t+1} = s_j | X_t = s_i) α_t(i)        [latest state and observation conditionally independent of earlier observations given X_t]

      = Σ_{i=1}^N P(z_{t+1} | X_{t+1} = s_j ∧ X_t = s_i) P(X_{t+1} = s_j | X_t = s_i) α_t(i)        [chain rule]

      = Σ_{i=1}^N P(z_{t+1} | X_{t+1} = s_j) a_{ij} α_t(i)        [latest observation conditionally independent of earlier states given latest state]

      = b_j(z_{t+1}) Σ_{i=1}^N a_{ij} α_t(i)

[Trellis figure: node α_{t+1}(j) in column t+1 combines the α_t(i) values from column t, each weighted by a_{ij}, and then multiplies by b_j(z_{t+1}).]
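Here is a minimal NumPy sketch of this recursion (function and variable names are our own); the columns of `alpha` are exactly the trellis columns above.

    import numpy as np

    def forward(pi, A, B, z):
        """Forward variables alpha[i, t] = P(z_1..z_{t+1}, X_{t+1} = s_i) (0-indexed t)."""
        N, T = len(pi), len(z)
        alpha = np.zeros((N, T))
        alpha[:, 0] = B[:, z[0]] * pi                           # alpha_1(i) = b_i(z_1) pi_i
        for t in range(1, T):
            alpha[:, t] = B[:, z[t]] * (A.T @ alpha[:, t - 1])  # b_j(z_{t+1}) sum_i a_ij alpha_t(i)
        return alpha

    # Observation sequence likelihood and filtering both fall out:
    #   P(z_{1:T}) = alpha[:, -1].sum()
    #   P(X_t = s_i | z_{1:t}) = alpha[i, t] / alpha[:, t].sum()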
In our example

    α_t(i) ≡ P(z_{1:t} ∧ X_t = s_i | λ)
    α_1(i) = b_i(z_1) π_i
    α_{t+1}(j) = b_j(z_{t+1}) Σ_i a_{ij} α_t(i)

[Figure: the three-state example HMM again.]

Observed: z_1 z_2 z_3 = u u w

Filtering

Recall that filtering asks for P(X_t | z_{1:t}, λ). The forward variables give it directly, up to normalization:

    P(X_t = s_i | z_{1:t}, λ) = α_t(i) / Σ_k α_t(k)
Prediction

• Note that the (state) prediction problem can be viewed as a special case of the filtering problem in which there are missing observations.
• That is, trying to compute the probability of X_k given observations up through time step t, with k > t, amounts to filtering with missing observations at time steps t+1, t+2, ..., k.
• Therefore, we now focus on the missing observations problem.
Missing Observations

• Looking at the derivation of the inductive step for computing the forward variables, we see that the last step involves writing

    α_{t+1}(j) = P(z_{t+1} | X_{t+1} = s_j) P(X_{t+1} = s_j ∧ all observations up through time t)
               = b_j(z_{t+1}) Σ_{i=1}^N a_{ij} α_t(i)

• Thus the second factor gives us a prediction of the state at time t+1 based on all earlier observations, which we then multiply by the observation probability at time t+1 given the state at time t+1.
• If there is no observation at time t+1, clearly the set of observations made through time t+1 is the same as the set of observations made through time t.
Missing Observations (cont.)

• Thus we redefine

    α_t(i) = P(X_t = s_i ∧ all available observations through time t)

• This generalizes our earlier definition but allows for the possibility that some observations are present and others are missing.
• Then define

    b′_i(z_t) = { b_i(z_t)  if there is an observation at time t
                { 1         otherwise

• It's not hard to see that the correct forward computation should then proceed as:

    α_1(i) = b′_i(z_1) π_i
    α_{t+1}(j) = b′_j(z_{t+1}) Σ_i a_{ij} α_t(i)

• Amounts to propagating state predictions forward wherever there are no observations; a sketch of this variant appears below.
• Interesting special case: when there are no observations at any time, the α values are identical to the p values we defined earlier for Markov chains.
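A hedged sketch of the same forward pass with optional observations, where `z` entries of `None` mark missing time steps (this encoding is our own):

    import numpy as np

    def forward_missing(pi, A, B, z):
        """Forward pass where z[t] is an observation index, or None if missing."""
        N, T = len(pi), len(z)
        alpha = np.zeros((N, T))

        def b_prime(t):
            # b'_i(z_t): emission column if observed, all ones otherwise
            return B[:, z[t]] if z[t] is not None else np.ones(N)

        alpha[:, 0] = b_prime(0) * pi
        for t in range(1, T):
            alpha[:, t] = b_prime(t) * (A.T @ alpha[:, t - 1])
        return alpha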
Backward variables: inductive step

    β_t(i) ≡ P(z_{t+1:T} | X_t = s_i)

      = Σ_{j=1}^N P(z_{t+1:T} ∧ X_{t+1} = s_j | X_t = s_i)

      = Σ_{j=1}^N P(z_{t+1:T} | X_{t+1} = s_j ∧ X_t = s_i) P(X_{t+1} = s_j | X_t = s_i)

      = Σ_{j=1}^N P(z_{t+1} ∧ z_{t+2:T} | X_{t+1} = s_j ∧ X_t = s_i) a_{ij}

      = Σ_{j=1}^N P(z_{t+1} ∧ z_{t+2:T} | X_{t+1} = s_j) a_{ij}

      = Σ_{j=1}^N P(z_{t+1} | z_{t+2:T} ∧ X_{t+1} = s_j) P(z_{t+2:T} | X_{t+1} = s_j) a_{ij}

      = Σ_{j=1}^N P(z_{t+1} | X_{t+1} = s_j) β_{t+1}(j) a_{ij}

      = Σ_{j=1}^N b_j(z_{t+1}) β_{t+1}(j) a_{ij}

(Base case: β_T(i) = 1 for all i, since z_{T+1:T} is the empty sequence.)

[Trellis figure: node β_t(i) in column t combines the β_{t+1}(j) values from column t+1, each weighted by a_{ij} b_j(z_{t+1}).]
Solving the smoothing problem

• Use the notation

    γ_t(i) = P(X_t = s_i | z_{1:T})

  for the probability we want to compute.
• Then

    γ_t(i) = c P(z_{1:T} | X_t = s_i) P(X_t = s_i)
           = c P(z_{1:t} | X_t = s_i) P(z_{t+1:T} | X_t = s_i) P(X_t = s_i)
           = c P(z_{1:t} ∧ X_t = s_i) P(z_{t+1:T} | X_t = s_i)
           = c α_t(i) β_t(i)

  where c = 1/P(z_{1:T}) is a constant of proportionality we can ignore as long as we normalize to get the actual probs.

Smoothing

Forward-backward algorithm

[Figure: an α pass left to right, a β pass right to left, and γ_t(i) ∝ α_t(i) β_t(i) at every node of the trellis.]
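A minimal sketch of the backward pass and the resulting smoothed marginals, paired with the `forward` function above (again, all names are our own):

    import numpy as np

    def backward(A, B, z):
        """Backward variables beta[i, t] = P(z_{t+2}..z_T | X_{t+1} = s_i) (0-indexed t)."""
        N, T = A.shape[0], len(z)
        beta = np.ones((N, T))                  # base case: beta_T(i) = 1
        for t in range(T - 2, -1, -1):          # right to left
            # beta_t(i) = sum_j a_ij b_j(z_{t+1}) beta_{t+1}(j)
            beta[:, t] = A @ (B[:, z[t + 1]] * beta[:, t + 1])
        return beta

    def smooth(pi, A, B, z):
        """gamma[i, t] = P(X_{t+1} = s_i | z_{1:T}) via forward-backward."""
        alpha, beta = forward(pi, A, B, z), backward(A, B, z)
        gamma = alpha * beta
        return gamma / gamma.sum(axis=0)        # normalize each column (the constant c)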
Solving the most probable path problem

• Want argmax_{x_{1:T}} P(x_{1:T} | z_{1:T})
• One approach:

    argmax_{x_{1:T}} P(x_{1:T} | z_{1:T}) = argmax_{x_{1:T}} P(z_{1:T} | x_{1:T}) P(x_{1:T}) / P(z_{1:T})
                                          = argmax_{x_{1:T}} P(z_{1:T} | x_{1:T}) P(x_{1:T})
DP for MPP (cont.)

• Define the values

    δ_t(i) ≡ max_{x_{1:t-1}} P(x_{1:t-1} ∧ X_t = s_i ∧ z_{1:t})

• We'll show that these values can be computed by an efficient forward computation similar to the computation of the α values.
• But first, let's check that it gives us something useful:

    δ_T(i) = max_{x_{1:T-1}} P(x_{1:T-1} ∧ X_T = s_i ∧ z_{1:T})
           = max_{x_{1:T-1}} P(x_{1:T-1} ∧ X_T = s_i | z_{1:T}) P(z_{1:T})

• Thus a value of i maximizing δ_T(i) identifies a state which represents the final state in a path maximizing P(x_{1:T} | z_{1:T}).
DP for MPP (cont.)

• Using the chain rule and the Markov property, we find that the probability to be maximized can be written as

    δ_{t+1}(j) = max_{x_{1:t}} P(x_{1:t} ∧ X_{t+1} = s_j ∧ z_{1:t+1})
               = max_i max_{x_{1:t-1}} [ b_j(z_{t+1}) a_{ij} P(x_{1:t-1} ∧ X_t = s_i ∧ z_{1:t}) ]
               = b_j(z_{t+1}) max_i [ a_{ij} max_{x_{1:t-1}} P(x_{1:t-1} ∧ X_t = s_i ∧ z_{1:t}) ]
               = b_j(z_{t+1}) max_i a_{ij} δ_t(i)

• This is the inductive step.
• Virtually identical to the computation of the forward variables α – the only difference is that it uses max instead of sum.
• Also need to keep track of which state s_i gives the max for each state s_j at the next time step, to be able to determine the actual MPP, not just its probability.
Viterbi Algorithm for Most Probable Path

Summary
• Base case: ∀i δ_1(i) = b_i(z_1) π_i
• Inductive step: ∀j δ_{t+1}(j) = b_j(z_{t+1}) max_i a_{ij} δ_t(i)
• Compute for all states at t=1, then t=2, etc.
• Also save the index giving the max for each state at each time step (backward pointers):

    ψ_{t+1}(j) = argmax_i a_{ij} δ_t(i)

• Construct the MPP by determining the state with the largest δ_T(i), then following backward pointers to time steps T-1, T-2, etc.

Viterbi Algorithm

Store two numbers at each node in the trellis, one for δ and the other a backward pointer to a node in the previous layer giving the max for this node – this is computed left to right.
To find a most probable path, determine a node in the T layer with max δ value, then follow backward pointers from right to left. A sketch appears below.

[Trellis figure: the inductive step δ_{t+1}(j) = b_j(z_{t+1}) max_i a_{ij} δ_t(i), with ψ_{t+1}(j) recording the maximizing predecessor for node (j, t+1).]
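A minimal Viterbi sketch in the same NumPy style (our own names; `psi` holds the backward pointers):

    import numpy as np

    def viterbi(pi, A, B, z):
        """Most probable state path given observations z (indices), as a list of states."""
        N, T = len(pi), len(z)
        delta = np.zeros((N, T))
        psi = np.zeros((N, T), dtype=int)          # backward pointers
        delta[:, 0] = B[:, z[0]] * pi              # delta_1(i) = b_i(z_1) pi_i
        for t in range(1, T):
            scores = A * delta[:, t - 1][:, None]  # scores[i, j] = a_ij delta_t(i)
            psi[:, t] = scores.argmax(axis=0)      # best predecessor for each state j
            delta[:, t] = B[:, z[t]] * scores.max(axis=0)
        # trace back from the best final state
        path = [int(delta[:, -1].argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(psi[path[-1], t]))
        return path[::-1]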
Prob. of a given transition (cont.)

    ξ_t(i, j) ≡ P(X_t = s_i ∧ X_{t+1} = s_j | z_{1:T})

      = c P(z_{1:T} | X_t = s_i ∧ X_{t+1} = s_j) P(X_t = s_i ∧ X_{t+1} = s_j)

      = c P(z_{1:T} | X_t = s_i ∧ X_{t+1} = s_j) P(X_{t+1} = s_j | X_t = s_i) P(X_t = s_i)

      = c P(z_{1:T} | X_t = s_i ∧ X_{t+1} = s_j) a_{ij} P(X_t = s_i)

      = c P(z_{1:t} | X_t = s_i) P(z_{t+1} | X_{t+1} = s_j) P(z_{t+2:T} | X_{t+1} = s_j) a_{ij} P(X_t = s_i)

      = c P(z_{1:t} ∧ X_t = s_i) b_j(z_{t+1}) P(z_{t+2:T} | X_{t+1} = s_j) a_{ij}

      = c α_t(i) a_{ij} b_j(z_{t+1}) β_{t+1}(j)

Normalizing away c = 1/P(z_{1:T}):

    ξ_t(i, j) = α_t(i) a_{ij} b_j(z_{t+1}) β_{t+1}(j) / Σ_{k,l} α_t(k) a_{kl} b_l(z_{t+1}) β_{t+1}(l)
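In code, all the ξ_t(i, j) values can be formed at once from the α and β tables (a sketch, continuing the conventions above):

    import numpy as np

    def transition_posteriors(alpha, beta, A, B, z):
        """xi[t, i, j] = P(X_{t+1} = s_i, X_{t+2} = s_j | z_{1:T}) for t = 0..T-2."""
        T = len(z)
        xi = np.array([
            # entry (i, j): alpha_t(i) a_ij b_j(z_{t+1}) beta_{t+1}(j)
            alpha[:, t][:, None] * A * (B[:, z[t + 1]] * beta[:, t + 1])[None, :]
            for t in range(T - 1)
        ])
        return xi / xi.sum(axis=(1, 2), keepdims=True)  # normalize each time slice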
Max. Likelihood HMM Inference

Given a state set {s_1, s_2, ..., s_N} and a set of R observation sequences

    z^1_{1:T_1} = (z^1_1, z^1_2, ..., z^1_{T_1})
    z^2_{1:T_2} = (z^2_1, z^2_2, ..., z^2_{T_2})
    ...
    z^R_{1:T_R} = (z^R_1, z^R_2, ..., z^R_{T_R})

determine the parameter set λ = (π_i, {a_{ij}}, {b_i(o_j)}) maximizing

    ∏_{r=1}^R P(z^r_{1:T_r} | λ)

A cheat

Let's first imagine that along with each observation sequence

    z^r_{1:T_r} = (z^r_1, z^r_2, ..., z^r_{T_r})

an oracle also gives us the corresponding state sequence

    x^r_{1:T_r} = (x^r_1, x^r_2, ..., x^r_{T_r})

Then we could obtain max. likelihood estimates of all parameters as follows:

    π̂_i = (# of sequences starting with s_i) / (total # of sequences)

    â_{ij} = (# of transitions s_i → s_j) / (# of visits to state s_i)

    b̂_i(o_k) = (# of visits to state s_i where o_k observed) / (# of visits to state s_i)
A cheat (cont.)

More formally, define the indicator functions

    χ^r_t(i)     = 1 if x^r_t = s_i, 0 otherwise

    χ^r_t(i → j) = 1 if x^r_t = s_i and x^r_{t+1} = s_j, 0 otherwise

    χ^r_t(i : k) = 1 if x^r_t = s_i and z^r_t = o_k, 0 otherwise
A cheat (cont.)

In terms of these indicator functions, our ML estimates would then be

    π̂_i = ( Σ_{r=1}^R χ^r_1(i) ) / R

    â_{ij} = ( Σ_{r=1}^R Σ_{t=1}^{T_r - 1} χ^r_t(i → j) ) / ( Σ_{r=1}^R Σ_{t=1}^{T_r - 1} χ^r_t(i) )

    b̂_i(o_k) = ( Σ_{r=1}^R Σ_{t=1}^{T_r} χ^r_t(i : k) ) / ( Σ_{r=1}^R Σ_{t=1}^{T_r} χ^r_t(i) )

For â_{ij}, we can't use the last state in any of the training sequences because there's no next state – hence the sums run only to T_r - 1.
The bad news ...

• There is no oracle to tell us the state sequence corresponding to each observation sequence.
• So we don't know these actual indicator function values.
• So we can't compute these sums.
The good news ...

• We can compute their expected values efficiently:

    γ^r_t(i) ≡ P(X_t = s_i | z^r_{1:T_r}, λ) = E(χ^r_t(i) | λ)

    ξ^r_t(i, j) ≡ P(X_t = s_i ∧ X_{t+1} = s_j | z^r_{1:T_r}, λ) = E(χ^r_t(i → j) | λ)

• Also:

    E(χ^r_t(i : k) | λ) = P(X_t = s_i ∧ Z_t = o_k | z^r_{1:T_r}, λ)
                        = P(X_t = s_i | z^r_{1:T_r}, λ) if z^r_t = o_k, 0 otherwise
                        = γ^r_t(i) I(z^r_t = o_k)

• Replacing the indicator sums by these expectations gives the Baum-Welch (EM) re-estimation updates:

    π_i ← ( Σ_{r=1}^R γ^r_1(i) ) / R

    a_{ij} ← ( Σ_{r=1}^R Σ_{t=1}^{T_r - 1} ξ^r_t(i, j) ) / ( Σ_{r=1}^R Σ_{t=1}^{T_r - 1} γ^r_t(i) )

    b_i(o_k) ← ( Σ_{r=1}^R Σ_{t=1}^{T_r} γ^r_t(i) I(z^r_t = o_k) ) / ( Σ_{r=1}^R Σ_{t=1}^{T_r} γ^r_t(i) )
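For a single observation sequence (R = 1), one Baum-Welch iteration can be sketched as follows, reusing `forward`, `backward`, and `transition_posteriors` from above. This is only an illustration of the updates under those assumed helpers – no log-space scaling, no convergence test.

    import numpy as np

    def baum_welch_step(pi, A, B, z):
        """One EM re-estimation step for a single observation sequence z."""
        alpha = forward(pi, A, B, z)
        beta = backward(A, B, z)
        gamma = alpha * beta
        gamma /= gamma.sum(axis=0)                        # gamma[i, t]
        xi = transition_posteriors(alpha, beta, A, B, z)  # xi[t, i, j]

        new_pi = gamma[:, 0]                              # pi_i <- gamma_1(i)
        # numerator sums xi over t = 1..T-1; denominator sums gamma over the same range
        new_A = xi.sum(axis=0) / gamma[:, :-1].sum(axis=1)[:, None]
        M = B.shape[1]
        obs_onehot = np.eye(M)[z]                         # I(z_t = o_k), shape (T, M)
        new_B = (gamma @ obs_onehot) / gamma.sum(axis=1)[:, None]
        return new_pi, new_A, new_B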
Remarks on Baum-Welch

• Bad news: There may be many local maxima.
• Good news: The local maxima are usually adequate models of the data.
• Any probabilities initialized to zero will remain zero throughout – useful when one wants a model with limited state transitions.
Some good references

• Standard HMM reference:
  L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.
• Excellent reference for dynamic Bayes nets as a unifying framework for probabilistic temporal models (including HMMs and Kalman filters):
  Chapter 15 of Artificial Intelligence: A Modern Approach, 2nd Edition, by Russell & Norvig.