Tutorial on Factor Graph and Belief Propagation
References:
Factor Graphs and the Sum-Product Algorithm,
by Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger,
IEEE Transactions on Information Theory, Feb. 2001.
Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms,
by Jonathan S. Yedidia, William T. Freeman, and Yair Weiss.
Marginal Functions
g_i(x_i) = \sum_{\sim\{x_i\}} g(x_1, \ldots, x_n)
where \sim\{x_i\} denotes a sum over all variables other than x_i.
Example:
a posteriori probability distribution p(x | y):
g(x_1, \ldots, x_n) = p(x_1, \ldots, x_n \mid y)
likelihood function:
g(x_1, \ldots, x_n) = p(y \mid x_1, \ldots, x_n)
Marginalize to estimate the individual symbols.
Factor Graph
Consider the class of functions
g(x_1, \ldots, x_n) = \prod_{j \in J} f_j(X_j)
where each X_j is the subset of variables on which factor f_j depends.
Example:
g(x_1, \ldots, x_5) = f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_3, x_4) f_E(x_3, x_5)
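As a brute-force sketch of marginalization, the following computes g_1(x_1) for the example factorization above by summing over all remaining variables. The binary factor tables are made up purely for illustration; the exponential cost of this direct sum is exactly what the sum-product algorithm avoids.

```python
import itertools

# Hypothetical binary factors for the example
# g(x1,...,x5) = fA(x1) fB(x2) fC(x1,x2,x3) fD(x3,x4) fE(x3,x5);
# the numerical values below are made up for illustration.
fA = {0: 0.6, 1: 0.4}
fB = {0: 0.3, 1: 0.7}
fC = {(a, b, c): 1.0 + 0.5 * ((a + b + c) % 2)
      for a in (0, 1) for b in (0, 1) for c in (0, 1)}
fD = {(c, d): 0.9 if c == d else 0.1 for c in (0, 1) for d in (0, 1)}
fE = {(c, e): 0.8 if c == e else 0.2 for c in (0, 1) for e in (0, 1)}

def g(x1, x2, x3, x4, x5):
    return fA[x1] * fB[x2] * fC[(x1, x2, x3)] * fD[(x3, x4)] * fE[(x3, x5)]

# Marginal g1(x1): sum over all other variables (cost grows exponentially in n)
g1 = {x1: sum(g(x1, *rest) for rest in itertools.product((0, 1), repeat=4))
      for x1 in (0, 1)}
print(g1)
```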
Example: Linear Codes, Tanner Graph
Parity check matrix H: a binary matrix with one row per parity check and one column per code symbol x_1, \ldots, x_6.
The code C is the set of words x such that Hx = 0 (mod 2); its characteristic function is
\chi_C(x_1, \ldots, x_6) = \mathbb{1}[(x_1, \ldots, x_6) \in C] = \prod_j \mathbb{1}[\text{$j$-th parity check is satisfied}]
Linear Codes, Noisy Observations
The a posteriori distribution is
p(x \mid y) = g(x) \propto f(y \mid x)\, p(x)
Assuming a memoryless channel and equally likely codewords,
g(x) \propto \frac{1}{|C|}\, \chi_C(x) \prod_{i=1}^{n} f(y_i \mid x_i)
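A small brute-force sketch of these two steps. Since the parity-check matrix above is not legible in the source, a hypothetical 2x4 matrix and a binary symmetric channel are used here purely for illustration: the characteristic function is built from the parity checks, and the bit-wise posterior marginals are computed by enumerating all words.

```python
import itertools

# Hypothetical 2x4 parity-check matrix (illustrative only; not the code
# from the slides, whose matrix has six columns).
H = [[1, 1, 1, 0],
     [0, 1, 1, 1]]

def chi_C(x):
    """Characteristic function: 1 iff every parity check is satisfied (Hx = 0 mod 2)."""
    return int(all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H))

# Memoryless binary symmetric channel with crossover probability eps: f(y_i | x_i)
eps = 0.1
def f(y_i, x_i):
    return 1 - eps if y_i == x_i else eps

y = [0, 1, 1, 0]          # received word (made up)
n = len(y)

# Unnormalized posterior g(x) proportional to chi_C(x) * prod_i f(y_i | x_i)
def g(x):
    p = chi_C(x)
    for yi, xi in zip(y, x):
        p *= f(yi, xi)
    return p

# Bit-wise posterior marginals P(x_i = 1 | y), by summing g over all 2^n words
Z = sum(g(x) for x in itertools.product((0, 1), repeat=n))
marginals = [sum(g(x) for x in itertools.product((0, 1), repeat=n) if x[i] == 1) / Z
             for i in range(n)]
print(marginals)
```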
Example: Kalman Filter
x_{j+1} = A_j x_j + B_j u_j
y_j = C_j x_j + D_j w_j
Consider
g(x_1, \ldots, x_n) = f(x_1, \ldots, x_n \mid y_1, \ldots, y_n) \propto \prod_{j=1}^{n} f(x_j \mid x_{j-1})\, f(y_j \mid x_j)
Marginalizing,
g_n(x_n) = f(x_n \mid y_1, \ldots, y_n)
gives the MMSE estimate of x_n and the estimation-error distribution.
Example: Kalman Filter
Factor graph for Kalman filtering (figure).
Factor Graph and Expression Tree
Idea: sum over one variable at a time.
Example:
g(x_1, \ldots, x_5) = f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_3, x_4) f_E(x_3, x_5)
Factor Graph and Expression Tree
Each factor node sums over all the variables that are its children, since they appear in no other factor:
m_{a \to i}(x_i) = \sum_{x_a \setminus x_i} f_a(x_a) \prod_{j \in N(a) \setminus i} n_{j \to a}(x_j)
Each variable node multiplies together the messages from its children factors:
n_{i \to a}(x_i) = \prod_{b \in N(i) \setminus a} m_{b \to i}(x_i)
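A sketch of the two update rules for discrete variables, with a factor stored as a dictionary over joint configurations. The variable names, the toy parity-check factor, and the incoming messages are assumptions chosen only to show the mechanics.

```python
import itertools

def factor_to_variable(f_table, scope, target, incoming):
    """m_{a->i}(x_i) = sum over x_a \\ x_i of f_a(x_a) * prod_{j in N(a)\\i} n_{j->a}(x_j)."""
    t = scope.index(target)
    msg = {}
    for assignment, val in f_table.items():
        prod = val
        for j, xj in zip(scope, assignment):
            if j != target:
                prod *= incoming[j][xj]
        msg[assignment[t]] = msg.get(assignment[t], 0.0) + prod
    return msg

def variable_to_factor(incoming_msgs, values=(0, 1)):
    """n_{i->a}(x_i) = prod over b in N(i)\\a of m_{b->i}(x_i)."""
    out = {x: 1.0 for x in values}
    for msg in incoming_msgs:
        for x in out:
            out[x] *= msg[x]
    return out

# Toy parity-check factor f_a(x1, x2, x3) = 1[x1 + x2 + x3 = 0 mod 2]  (assumed example)
fa = {xs: float(sum(xs) % 2 == 0) for xs in itertools.product((0, 1), repeat=3)}
n1 = {0: 0.9, 1: 0.1}   # hypothetical incoming messages n_{1->a}, n_{2->a}
n2 = {0: 0.4, 1: 0.6}
print(factor_to_variable(fa, ("x1", "x2", "x3"), "x3", {"x1": n1, "x2": n2}))
# approximately {0: 0.42, 1: 0.58}: the outgoing message m_{a->3}(x3)
```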
Belief Propagation
At a variable node, the belief
b_i(x_i) \propto \prod_{a \in N(i)} m_{a \to i}(x_i)
is an approximation to the exact marginal distribution p_i(x_i).
At a factor node, the belief
b_a(x_a) \propto f_a(x_a) \prod_{i \in N(a)} n_{i \to a}(x_i)
is an approximation of the joint distribution of x_a.
Marginalize:
b_i(x_i) = \sum_{x_a \setminus x_i} b_a(x_a)
Example: parity check node
\sum_{x_1, x_2} f_a(x_3 = 0, x_1, x_2)\, p(x_1, x_2) = P(x_1 + x_2 = 0)
The messages into and out of a variable node i are approximations of the marginal distribution of x_i.
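A quick numerical check of this parity-check example; the joint distribution p(x_1, x_2) below is a made-up table.

```python
import itertools

# f_a enforces x1 + x2 + x3 = 0 (mod 2); fixing x3 = 0 leaves the indicator 1[x1 + x2 = 0]
def fa(x3, x1, x2):
    return float((x1 + x2 + x3) % 2 == 0)

# Hypothetical joint distribution p(x1, x2)
p = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.2}

lhs = sum(fa(0, x1, x2) * p[(x1, x2)] for x1, x2 in itertools.product((0, 1), repeat=2))
rhs = p[(0, 0)] + p[(1, 1)]   # P(x1 + x2 = 0) = P(both 0) + P(both 1)
print(lhs, rhs)               # both 0.7
```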
Example: Kalman Filtering
x_{j+1} = A_j x_j + B_j u_j
y_j = C_j x_j + D_j w_j
Goal: forward estimation
P_{n|n}(x_n) = f(x_n \mid y_1, \ldots, y_n) = \int f(x_1, \ldots, x_n \mid y_1, \ldots, y_n)\, d(\sim\{x_n\})
Example: Kalman Filtering
Update rule:
P_{j|j}(x_j) = P_{j|j-1}(x_j)\, f(y_j \mid x_j) \sim \mathcal{N}(m_{j|j}, \sigma^2_{j|j})
the estimate of x_j based on observations y_1, \ldots, y_j;
P_{j+1|j}(x_{j+1}) = \int P_{j|j}(x_j)\, \mathcal{N}(A_j x_j, B_j^2)\, dx_j \sim \mathcal{N}(m_{j+1|j}, \sigma^2_{j+1|j})
the estimate of x_{j+1} based on observations y_1, \ldots, y_j.
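A scalar, time-invariant sketch of these two updates: the measurement step multiplies two Gaussians in x_j, and the prediction step integrates against the Gaussian transition density. The coefficients A, B, C, D and the observation sequence are assumptions chosen only for illustration.

```python
def measurement_update(m_pred, v_pred, y, C, D):
    """P_{j|j}(x_j) from P_{j|j-1}(x_j) * f(y_j | x_j): product of two Gaussians."""
    prec = 1.0 / v_pred + C * C / (D * D)
    v = 1.0 / prec
    m = v * (m_pred / v_pred + C * y / (D * D))
    return m, v

def prediction_update(m, v, A, B):
    """P_{j+1|j}(x_{j+1}) = integral of P_{j|j}(x_j) * N(x_{j+1}; A x_j, B^2) dx_j."""
    return A * m, A * A * v + B * B

# Hypothetical scalar model x_{j+1} = A x_j + B u_j,  y_j = C x_j + D w_j
A, B, C, D = 0.9, 0.5, 1.0, 0.8
m, v = 0.0, 1.0                    # prior N(m_{1|0}, v_{1|0})
for y in [0.3, -0.1, 0.7, 0.4]:    # made-up observations
    m, v = measurement_update(m, v, y, C, D)   # incorporate y_j
    print(f"filtered mean {m:.3f}, var {v:.3f}")
    m, v = prediction_update(m, v, A, B)       # propagate to x_{j+1}
```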
Computing All Marginal Functions
Each node sends a message along an edge only after it has received messages on all of its other edges.
Example: Kalman Filter (again)
Factor Graphs with Cycles
Iterative Processing
LDPC codes
RA codes
...
Message-passing Schedules
flooding
serial
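Expanding on the flooding schedule listed above, here is a minimal loopy-BP sketch on a 3-cycle of binary variables with made-up pairwise factors: every iteration recomputes all messages from the previous iteration's values (flooding). A serial schedule would instead sweep the edges one at a time, always using the freshest messages.

```python
# Loopy belief propagation with a flooding schedule on the cycle x1 - x2 - x3 - x1.
factors = {  # factor name -> (scope, table over joint assignments); values are made up
    "f12": (("x1", "x2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
    "f23": (("x2", "x3"), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}),
    "f31": (("x3", "x1"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
}
variables = ("x1", "x2", "x3")
neighbors = {v: [a for a, (scope, _) in factors.items() if v in scope] for v in variables}

# factor->variable messages m[a][i] and variable->factor messages n[i][a], start uniform
m = {a: {i: {0: 1.0, 1: 1.0} for i in scope} for a, (scope, _) in factors.items()}
n = {i: {a: {0: 1.0, 1: 1.0} for a in neighbors[i]} for i in variables}

def normalized(msg):
    s = sum(msg.values())
    return {x: v / s for x, v in msg.items()}

for _ in range(30):                                # flooding iterations
    new_m = {a: {} for a in factors}
    new_n = {i: {} for i in variables}
    for a, (scope, table) in factors.items():      # all factor -> variable messages
        for i in scope:
            msg = {0: 0.0, 1: 0.0}
            for assignment, val in table.items():
                prod = val
                for j, xj in zip(scope, assignment):
                    if j != i:
                        prod *= n[j][a][xj]        # uses previous iteration's n
                msg[assignment[scope.index(i)]] += prod
            new_m[a][i] = normalized(msg)
    for i in variables:                            # all variable -> factor messages
        for a in neighbors[i]:
            msg = {0: 1.0, 1: 1.0}
            for b in neighbors[i]:
                if b != a:
                    for x in msg:
                        msg[x] *= m[b][i][x]       # uses previous iteration's m
            new_n[i][a] = normalized(msg)
    m, n = new_m, new_n

# beliefs b_i(x_i) proportional to the product of incoming factor->variable messages
for i in variables:
    bel = {0: 1.0, 1: 1.0}
    for a in neighbors[i]:
        for x in bel:
            bel[x] *= m[a][i][x]
    print(i, normalized(bel))
```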
Free Energy
For a factor graph probability distribution
p(x) = \frac{1}{Z} \prod_{a=1}^{M} f_a(x_a)
define the energy of a state as
E(x) = -\sum_{a=1}^{M} \log f_a(x_a)
and the Helmholtz free energy as
F_{\text{Helmholtz}} = -\log Z
Given E(x), one can in principle sum over all states to compute Z and p(x), but this is hard.
Free Energy
Gibbs free energy: for another distribution b(x),
F(b) = U(b) - H(b) = \sum_x b(x) E(x) + \sum_x b(x) \log b(x) = F_{\text{Helmholtz}} + D(b \| p)
Minimizing the Gibbs free energy would compute Z and recover p, but it is still hard.
Restricting b(x) = \prod_i b_i(x_i) and minimizing the resulting mean-field free energy yields an approximation of p(x).
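A tiny numerical check of the identity F(b) = U(b) - H(b) = F_Helmholtz + D(b||p) on a two-variable toy model; the factor tables and the trial distribution b(x) are made up.

```python
import math
import itertools

# Toy model p(x) = (1/Z) * f1(x1) * f2(x1, x2) on binary variables (made-up factors)
f1 = {0: 2.0, 1: 1.0}
f2 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
states = list(itertools.product((0, 1), repeat=2))

def unnorm(x):
    return f1[x[0]] * f2[x]

Z = sum(unnorm(x) for x in states)
p = {x: unnorm(x) / Z for x in states}
E = {x: -math.log(unnorm(x)) for x in states}           # E(x) = -sum_a log f_a(x_a)
F_helmholtz = -math.log(Z)

# An arbitrary trial distribution b(x)
b = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}

U = sum(b[x] * E[x] for x in states)                    # average energy U(b)
H = -sum(b[x] * math.log(b[x]) for x in states)         # entropy H(b)
D = sum(b[x] * math.log(b[x] / p[x]) for x in states)   # KL divergence D(b||p)

print(U - H, F_helmholtz + D)   # the two expressions for the Gibbs free energy agree
```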
Region-based Free Energy Approximation
Region energy:
E_R(x_R) = -\sum_{a \in F_R} \log f_a(x_a)
where F_R is the set of factors contained in region R.
Region belief bR (xR ) is an approximation of pR (xR ).
Minimizing
F_R(b_R) = U_R(b_R) - H_R(b_R)
gives an approximation of the original distribution.
Problem: regions overlap, leading to over-counting.
Solution: introduce counting numbers.
Bethe Method
The Bethe method is a particular way of choosing the regions and the counting numbers.
Fixed points of the BP algorithm correspond to stationary points of the
Bethe approximation of the free energy.
BP always has a fixed point.
Uniqueness of the stationary points has been studied in the physics literature.
BP does not necessarily decrease the Bethe free energy at each iteration.
Generalized BP.