HMMs and the forward-backward algorithm
Ramesh Sridharan
These notes give a short review of Hidden Markov Models (HMMs) and the forward-backward algorithm. They're written assuming familiarity with the sum-product belief propagation algorithm, but should be accessible to anyone who's seen the fundamentals of HMMs before.
The notation here is borrowed from Introduction to Probability by Bertsekas & Tsitsiklis: random variables are represented with capital letters, values they take are represented with lowercase letters, $p_X$ represents a probability distribution for random variable $X$, and $p_X(x)$ represents the probability of value $x$ (according to $p_X$).
Hidden Markov Models
Figure 1 shows the (undirected) graphical model for HMMs. Here's a quick recap of the important facts:
Figure 1: An undirected graphical model for the HMM. Connections between nodes indicate dependence.
- We observe $Y_1$ through $Y_n$, which we model as being observed from hidden states $X_1$ through $X_n$.
- Any particular state variable $X_k$ depends only on $X_{k-1}$ (what came before it), $X_{k+1}$ (what comes after it), and $Y_k$ (the observation associated with it).
- The goal of the forward-backward algorithm is to find the conditional distribution over hidden states given the data.
In order to specify an HMM, we need three pieces:
Figure 2: A visualization of the forward and backward messages. Each message is a table that indicates what the node at the start point believes about the node at the end point.
- A transition distribution, $p_{X_{k+1}|X_k}(x_{k+1}\,|\,x_k) = W(x_{k+1}\,|\,x_k)$, which describes the distribution for the next state given the current state. (We'll only worry about homogeneous Markov chains, where the transition distribution doesn't change over time: that's why the $W$ and $A$ notations depend only on the values and not on the timepoints.) This is often represented as a matrix that we'll call $A$. Rows of $A$ correspond to the current state, columns correspond to the next state, and each entry corresponds to the transition probability. So, the entry at row $i$ and column $j$, $A_{ij}$, is $p_{X_{k+1}|X_k}(j\,|\,i)$, or equivalently $W(j\,|\,i)$.
- An observation distribution (also called an emission distribution) $p_{Y_k|X_k}(y_k\,|\,x_k) = p_{Y|X}(y_k\,|\,x_k)$, which describes the distribution for the output given the current state. (Once again, we'll focus on chains where the emission distribution doesn't change over time.) We'll represent this with a matrix $B$. Here, rows correspond to the current state, and columns correspond to the observation. So, $B_{ij} = p_{Y|X}(j\,|\,i)$: the probability of observing output $j$ from state $i$ is $B_{ij}$. Since the number of possible observations isn't necessarily the same as the number of possible states, $B$ won't necessarily be square.
- An initial state distribution $p_{X_1}$, which describes the starting distribution over states. We'll represent this with a vector called $\pi_0$, where item $i$ in the vector represents $p_{X_1}(i)$.
The forward-backward algorithm computes forward and backward messages as follows:
$$m_{(k-1)\to k}(x_k) = \sum_{x_{k-1}} \underbrace{m_{(k-2)\to(k-1)}(x_{k-1})}_{\text{prev. message}}\;\underbrace{p_{Y|X}(y_{k-1}\,|\,x_{k-1})}_{\text{observation term}}\;\underbrace{W(x_k\,|\,x_{k-1})}_{\text{transition term}}$$

$$m_{(k+1)\to k}(x_k) = \sum_{x_{k+1}} \underbrace{m_{(k+2)\to(k+1)}(x_{k+1})}_{\text{prev. message}}\;\underbrace{p_{Y|X}(y_{k+1}\,|\,x_{k+1})}_{\text{observation term}}\;\underbrace{W(x_{k+1}\,|\,x_k)}_{\text{transition term}}$$
These messages are illustrated in Figure 2. The first forward message $m_{0\to 1}(x_1)$ is initialized to $\pi_0(x_1) = p_{X_1}(x_1)$. The first backward message $m_{(n+1)\to n}(x_n)$ is initialized to uniform (this is equivalent to not including it at all).
Figure 3 illustrates the computation of one forward message $m_{2\to 3}(x_3)$.
To obtain a marginal distribution for a particular state given all the observations, $p_{X_k|Y_1,\dots,Y_n}$, we simply multiply the incoming messages together with the observation term, and then normalize:
$$p_{X_k|Y_1,\dots,Y_n}(x_k\,|\,y_1,\dots,y_n) \propto m_{(k-1)\to k}(x_k)\; m_{(k+1)\to k}(x_k)\; p_{Y|X}(y_k\,|\,x_k)$$
Here, the symbol $\propto$ means "is proportional to", and indicates that we have to normalize at the end so that the answer sums to 1.
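These recursions and the marginal computation translate directly into code. Below is a minimal NumPy sketch of the unnormalized $m$ messages; the function name hmm_messages and the 0-based indexing conventions are my own assumptions, not notation from these notes.

```python
import numpy as np

def hmm_messages(A, B, pi0, ys):
    """Unnormalized sum-product messages for an HMM (assumed conventions).

    A[i, j]  = W(j | i)        transition probability (row = current, column = next)
    B[i, y]  = p_{Y|X}(y | i)  emission probability (row = state, column = observation)
    pi0[i]   = p_{X1}(i)       initial state distribution
    ys       = observations y_1 ... y_n as 0-based column indices into B

    Returns fwd, bwd, post, where fwd[k] is the forward message into node k,
    bwd[k] is the backward message into node k, and post[k] is p(X_k | y_1, ..., y_n).
    """
    n, S = len(ys), len(pi0)
    fwd = np.zeros((n, S))
    bwd = np.zeros((n, S))

    fwd[0] = pi0                                  # m_{0 -> 1} is the initial distribution
    for k in range(1, n):
        # sum over x_{k-1} of (prev. message * observation term * transition term)
        fwd[k] = (fwd[k - 1] * B[:, ys[k - 1]]) @ A

    bwd[n - 1] = np.ones(S)                       # m_{(n+1) -> n} is uniform
    for k in range(n - 2, -1, -1):
        bwd[k] = A @ (bwd[k + 1] * B[:, ys[k + 1]])

    post = fwd * bwd * B[:, ys].T                 # multiply incoming messages and observation term
    post /= post.sum(axis=1, keepdims=True)       # normalize so each timestep sums to 1
    return fwd, bwd, post
```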
Traditionally, the forward-backward algorithm computes a slightly different set of messages. The forward message $\alpha_k$ represents a message from $k-1$ to $k$ that includes $p_{Y|X}(y_k\,|\,x_k)$, and the backward message $\beta_k$ represents a message from $k+1$ to $k$ identical to $m_{(k+1)\to k}$ above.
$$\alpha_k(x_k) = \underbrace{p_{Y|X}(y_k\,|\,x_k)}_{\text{observation term}} \sum_{x_{k-1}} \underbrace{\alpha_{k-1}(x_{k-1})}_{\text{prev. message}}\;\underbrace{W(x_k\,|\,x_{k-1})}_{\text{transition term}}$$

$$\beta_k(x_k) = \sum_{x_{k+1}} \underbrace{\beta_{k+1}(x_{k+1})}_{\text{prev. message}}\;\underbrace{p_{Y|X}(y_{k+1}\,|\,x_{k+1})}_{\text{observation term}}\;\underbrace{W(x_{k+1}\,|\,x_k)}_{\text{transition term}}$$
These messages have a particularly nice interpretation as probabilities:
$$\alpha_k(x_k) = p_{Y_1,\dots,Y_k,X_k}(y_1,\dots,y_k,x_k)$$
$$\beta_k(x_k) = p_{Y_{k+1},\dots,Y_n|X_k}(y_{k+1},\dots,y_n\,|\,x_k)$$
The initial forward message is initialized to $\alpha_1(x_1) = p_{X_1}(x_1)\, p_{Y|X}(y_1\,|\,x_1)$. To obtain a marginal distribution, we simply multiply the messages together and normalize:
$$p_{X_k|Y_1,\dots,Y_n}(x_k\,|\,y_1,\dots,y_n) \propto \alpha_k(x_k)\,\beta_k(x_k)$$
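As a companion to the sketch of the $m$ messages above, here is a minimal NumPy sketch of the $\alpha$/$\beta$ recursions under the same assumed conventions (the function name and 0-based indexing are mine).

```python
import numpy as np

def alpha_beta(A, B, pi0, ys):
    """alpha/beta recursions sketched from the formulas above (assumed conventions).

    A[i, j]  = W(j | i)        transition probability (row = current, column = next)
    B[i, y]  = p_{Y|X}(y | i)  emission probability (row = state, column = observation)
    pi0[i]   = p_{X1}(i)       initial state distribution
    ys       = observations y_1 ... y_n as 0-based column indices into B
    """
    n, S = len(ys), len(pi0)
    alpha = np.zeros((n, S))
    beta = np.zeros((n, S))

    alpha[0] = pi0 * B[:, ys[0]]                     # alpha_1 = p_{X1}(x_1) p_{Y|X}(y_1 | x_1)
    for k in range(1, n):
        # observation term * sum over x_{k-1} of (prev. message * transition term)
        alpha[k] = B[:, ys[k]] * (alpha[k - 1] @ A)

    beta[n - 1] = np.ones(S)                         # beta_n is all ones (uniform)
    for k in range(n - 2, -1, -1):
        # sum over x_{k+1} of (prev. message * observation term * transition term)
        beta[k] = A @ (beta[k + 1] * B[:, ys[k + 1]])

    post = alpha * beta                              # proportional to p(X_k | y_1, ..., y_n)
    post /= post.sum(axis=1, keepdims=True)          # normalize each timestep
    return alpha, beta, post
```

Kept unnormalized as above, alpha[k] and beta[k] carry exactly the probabilistic interpretations just given; in practice each step is often rescaled to avoid numerical underflow, which does not change the normalized marginals.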
Example
Suppose you send a robot to Mars. Unfortunately, it gets stuck in a canyon while landing
and most of its sensors break. You know the canyon has 3 areas. Areas 1 and 3 are sunny
and hot, while Area 2 is cold. You decide to plan a rescue mission for the robot from Area
3, knowing the following things about the robot:
Figure 3: An illustration of how to compute $m_{2\to 3}(x_3)$. In order for node 2 to summarize its belief about $X_3$, it must incorporate the previous message $m_{1\to 2}(x_2)$, its observation $p_{Y|X}(y_2\,|\,x_2)$, and the relationship $W(x_3\,|\,x_2)$ between $X_2$ and $X_3$.
- Every hour, it tries to move forward by one area (i.e. from Area 1 to Area 2, or Area 2 to Area 3). It succeeds with probability 0.75 and fails with probability 0.25. If it fails, it stays where it is. If it is in Area 3, it always stays there (and waits to be rescued).
- The temperature sensor still works. Every hour, we get a binary reading telling us whether the robot's current environment is hot or cold.
- We have no idea where the robot initially got stuck.
(a) Construct an HMM for this problem: define a transition matrix $A$, an observation matrix $B$, and an initial state distribution $\pi_0$.
(b) Suppose we observe the sequence (hot, cold, hot). First, before doing any computation, determine the sequence of locations. Then, compute the forward and backward messages, and determine the distribution for the second state using the messages. Do your answers match up?

Solution:
(a) We'll start with the transition matrix. Remember that each row corresponds to the current state, and each column corresponds to the next state. We'll use 3 states, each corresponding to an area.
- If the robot is in Area 1, it stays where it is with probability 0.25, moves to Area 2 with probability 0.75, and can't move to Area 3.
- Similarly, if the robot is in Area 2, it stays where it is with probability 0.25, can't move back to Area 1, and moves to Area 3 with probability 0.75.
- If the robot is in Area 3, it always stays in Area 3.
Each item above gives us one row of A. Putting it all together, we obtain
$$A = \begin{pmatrix} 0.25 & 0.75 & 0 \\ 0 & 0.25 & 0.75 \\ 0 & 0 & 1 \end{pmatrix}$$
where rows and columns are both ordered as Areas 1, 2, 3.
Next, let's look at the observation matrix. There are two possible observations, "hot" and "cold". Areas 1 and 3 always produce "hot" readings, while Area 2 always produces a "cold" reading:
$$B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$$
where rows are ordered as Areas 1, 2, 3 and columns are ordered as (hot, cold).
Last but not least, since we have no idea where the robot starts, our initial state distribution will be uniform:
$$\pi_0 = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}$$
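For reference, here is the same model written out in NumPy. The choice of indices (Areas 1, 2, 3 as rows/columns 0, 1, 2; hot as column 0 and cold as column 1) is just a convention I am picking for the sketch.

```python
import numpy as np

# Robot-in-the-canyon HMM from part (a).
# States: Areas 1, 2, 3 -> indices 0, 1, 2.  Observations: hot -> 0, cold -> 1.
A = np.array([[0.25, 0.75, 0.00],   # from Area 1: stay, move forward, (can't reach Area 3)
              [0.00, 0.25, 0.75],   # from Area 2: (can't go back), stay, move forward
              [0.00, 0.00, 1.00]])  # Area 3 is absorbing

B = np.array([[1.0, 0.0],           # Area 1 always reads hot
              [0.0, 1.0],           # Area 2 always reads cold
              [1.0, 0.0]])          # Area 3 always reads hot

pi0 = np.full(3, 1/3)               # uniform: no idea where the robot got stuck
```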
(b) Before doing any computation, we see that the sequence (hot,cold,hot) could only have
been observed from the hidden state sequence (1,2,3). Make sure you convince yourself
this is true before continuing!
We'll start with the forward messages.
$$m_{1\to 2}(x_2) = \sum_{x_1} \underbrace{m_{0\to 1}(x_1)\, p_{Y|X}(y_1\,|\,x_1)}_{\text{depends only on } x_1 \text{ and } y_1}\;\psi(x_1, x_2)$$
where $\psi(x_1, x_2) = W(x_2\,|\,x_1)$ is the transition term.
The output message should have three different possibilities, one for each value of $x_2$. We can therefore represent it as a vector indexed by $x_2$:
$$\begin{pmatrix} \text{value for } x_2 = 1 \\ \text{value for } x_2 = 2 \\ \text{value for } x_2 = 3 \end{pmatrix}$$
For each term in the sum (i.e., each possible value of $x_1$):
- $m_{0\to 1}$ comes from the initial distribution. Normally it would come from the previous message, but our first forward message is always set to the initial state distribution.
- $p_{Y|X}(y_1\,|\,x_1)$ comes from the column of $B$ corresponding to our observation $y_1 = $ hot.
- $\psi$ comes from a row of $A$: we are fixing $x_1$ and asking about possible values for $x_2$, which corresponds exactly to the transition distributions given in the rows of $A$ (remember that the rows of $A$ correspond to the current state and the columns correspond to the next state).
So, we obtain
$$m_{1\to 2} = \underbrace{\tfrac{1}{3}\cdot 1\cdot\begin{pmatrix}0.25\\0.75\\0\end{pmatrix}}_{x_1=1} + \underbrace{\tfrac{1}{3}\cdot 0\cdot\begin{pmatrix}0\\0.25\\0.75\end{pmatrix}}_{x_1=2} + \underbrace{\tfrac{1}{3}\cdot 1\cdot\begin{pmatrix}0\\0\\1\end{pmatrix}}_{x_1=3} \propto \begin{pmatrix}1/8\\3/8\\4/8\end{pmatrix}$$
Since our probabilities are eventually computed by multiplying messages and normalizing, we can arbitrarily renormalize at any step to make the computation easier; that is what the $\propto$ above indicates.
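This hand computation is easy to check numerically. The sketch below redefines the matrices from part (a) and uses the same assumed 0-based indexing (hot = column 0).

```python
import numpy as np

A = np.array([[0.25, 0.75, 0.0], [0.0, 0.25, 0.75], [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # columns: hot, cold
pi0 = np.full(3, 1/3)
hot = 0

# m_{1->2}(x_2) = sum over x_1 of pi0(x_1) * p(hot | x_1) * W(x_2 | x_1)
m12 = (pi0 * B[:, hot]) @ A
print(m12)              # [0.0833... 0.25 0.3333...]
print(m12 / m12.sum())  # [0.125 0.375 0.5], i.e. (1/8, 3/8, 4/8) after renormalizing
```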
For the second message, we perform a similar computation:
$$m_{2\to 3}(x_3) = \sum_{x_2} m_{1\to 2}(x_2)\, p_{Y|X}(y_2\,|\,x_2)\,\psi(x_2, x_3)$$
$$m_{2\to 3} = \underbrace{\tfrac{1}{8}\cdot 0\cdot\begin{pmatrix}0.25\\0.75\\0\end{pmatrix}}_{x_2=1} + \underbrace{\tfrac{3}{8}\cdot 1\cdot\begin{pmatrix}0\\0.25\\0.75\end{pmatrix}}_{x_2=2} + \underbrace{\tfrac{4}{8}\cdot 0\cdot\begin{pmatrix}0\\0\\1\end{pmatrix}}_{x_2=3} \propto \begin{pmatrix}0\\1/4\\3/4\end{pmatrix}$$
The backwards messages are computed using a similar formula:
$$m_{3\to 2}(x_2) = \sum_{x_3} \underbrace{m_{4\to 3}(x_3)\, p_{Y|X}(y_3\,|\,x_3)}_{\text{depends only on } x_3}\;\psi(x_2, x_3)$$
The first backwards message, $m_{4\to 3}(x_3)$, is always initialized to uniform since we have
no information about what the last state should be. Note that this is equivalent to not
including that term at all.
For each value of $x_3$, the transition term $\psi(x_2, x_3)$ is now drawn from a column of $A$, since we are interested in the probability of arriving at $x_3$ from each possible state for $x_2$. We compute the messages as:
$$m_{3\to 2} = \underbrace{1\cdot 1\cdot\begin{pmatrix}0.25\\0\\0\end{pmatrix}}_{x_3=1} + \underbrace{1\cdot 0\cdot\begin{pmatrix}0.75\\0.25\\0\end{pmatrix}}_{x_3=2} + \underbrace{1\cdot 1\cdot\begin{pmatrix}0\\0.75\\1\end{pmatrix}}_{x_3=3} \propto \begin{pmatrix}1/8\\3/8\\4/8\end{pmatrix}$$
Similarly, the second backwards message is:
$$m_{2\to 1} = \underbrace{\tfrac{1}{8}\cdot 0\cdot\begin{pmatrix}0.25\\0\\0\end{pmatrix}}_{x_2=1} + \underbrace{\tfrac{3}{8}\cdot 1\cdot\begin{pmatrix}0.75\\0.25\\0\end{pmatrix}}_{x_2=2} + \underbrace{\tfrac{4}{8}\cdot 0\cdot\begin{pmatrix}0\\0.75\\1\end{pmatrix}}_{x_2=3} \propto \begin{pmatrix}3/4\\1/4\\0\end{pmatrix}$$
Notice from the symmetry of the problem that our forwards messages and backwards
messages were the same.
To compute the marginal distribution for $X_2$ given the data, we multiply the messages and the observation:
$$p_{X_2|Y_1,\dots,Y_n}(x_2\,|\,y_1,\dots,y_n) \propto m_{1\to 2}(x_2)\, m_{3\to 2}(x_2)\, p_{Y|X}(y_2\,|\,x_2) \propto \begin{pmatrix}\tfrac{1}{8}\cdot\tfrac{1}{8}\cdot 0\\[3pt] \tfrac{3}{8}\cdot\tfrac{3}{8}\cdot 1\\[3pt] \tfrac{4}{8}\cdot\tfrac{4}{8}\cdot 0\end{pmatrix} \propto \begin{pmatrix}0\\1\\0\end{pmatrix}$$
Notice that in this case, because of our simplified observation model, the observation
cold allowed us to determine the state. This matches up with our earlier conclusion
that the robot must have been in Area 2 during the second hour.
If we were to compute $\alpha$ and $\beta$ messages instead, we would start with our initial message, $\alpha_1$:
$$\alpha_1(x_1) = p_{X_1}(x_1)\, p_{Y|X}(y_1\,|\,x_1) = \begin{pmatrix}1/3\\0\\1/3\end{pmatrix}$$
The first real message, $\alpha_2$, is computed as follows:
$$\alpha_2 = \begin{pmatrix}0\\1\\0\end{pmatrix} \odot \left[\underbrace{\tfrac{1}{3}\begin{pmatrix}0.25\\0.75\\0\end{pmatrix}}_{x_1=1} + \underbrace{0\cdot\begin{pmatrix}0\\0.25\\0.75\end{pmatrix}}_{x_1=2} + \underbrace{\tfrac{1}{3}\begin{pmatrix}0\\0\\1\end{pmatrix}}_{x_1=3}\right] \propto \begin{pmatrix}0\\1\\0\end{pmatrix}$$
Here the first vector is the observation term $p_{Y|X}(y_2\,|\,x_2)$ (with $y_2 = $ cold), $\odot$ denotes an element-wise product, and the bracketed sum runs over $x_1$.
The second message, $\alpha_3$, is similar:
$$\alpha_3 = \begin{pmatrix}1\\0\\1\end{pmatrix} \odot \left[\underbrace{0\cdot\begin{pmatrix}0.25\\0.75\\0\end{pmatrix}}_{x_2=1} + \underbrace{1\cdot\begin{pmatrix}0\\0.25\\0.75\end{pmatrix}}_{x_2=2} + \underbrace{0\cdot\begin{pmatrix}0\\0\\1\end{pmatrix}}_{x_2=3}\right] \propto \begin{pmatrix}0\\0\\1\end{pmatrix}$$
The $\beta$ messages would be identical to our backwards messages computed earlier.
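Finally, here is a short end-to-end check of part (b): running the $\alpha$/$\beta$ recursions on the robot HMM (same assumed 0-based indexing as in the earlier sketches) reproduces the posterior we found for the second state.

```python
import numpy as np

A = np.array([[0.25, 0.75, 0.0], [0.0, 0.25, 0.75], [0.0, 0.0, 1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])   # columns: hot, cold
pi0 = np.full(3, 1/3)
hot, cold = 0, 1
ys = [hot, cold, hot]                                # the observed sequence

n, S = len(ys), len(pi0)

alpha = np.zeros((n, S))
alpha[0] = pi0 * B[:, ys[0]]                         # alpha_1
for k in range(1, n):
    alpha[k] = B[:, ys[k]] * (alpha[k - 1] @ A)

beta = np.zeros((n, S))
beta[n - 1] = np.ones(S)                             # beta_n
for k in range(n - 2, -1, -1):
    beta[k] = A @ (beta[k + 1] * B[:, ys[k + 1]])

post = alpha * beta
post /= post.sum(axis=1, keepdims=True)
print(post[1])   # [0. 1. 0.] -- the robot was in Area 2 during the second hour
print(post)      # full posterior puts the robot in Area 1, then 2, then 3
```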