Notes for E9 333 - Advanced Deep Representation Learning
Prathosh A. P.
ECE, IISc.
Variational Inference & Expectation Maximization
Background Reading:
1. Gradient descent
2. KL divergence
3. Law of the unconscious statistician
4. Expectations & gradients
5. Jensen's inequality
6. Maximum Likelihood Estimation
7. Backpropagation
Mixture Density Models:

Consider data D = {x_1, x_2, ..., x_n}, drawn i.i.d. from p_d (p_d is unknown), with x ∈ R^d. Model p_θ as a convex combination of several homogeneous densities; such models are termed Mixture Models.
That is, let p_θ denote the model density. Then

    p_θ(x) = Σ_{i=1}^{M} π_i p_i(x),   π_i ≥ 0,   Σ_{i=1}^{M} π_i = 1,

where each p_i is any density.
In particular, if p_i(x) = N(x; μ_i, Σ_i), then such a model is termed a Gaussian Mixture Model (GMM):

    p_θ(x) = Σ_{i=1}^{M} π_i N(x; μ_i, Σ_i)
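As a quick illustration (not from the original notes), here is a minimal Python sketch of evaluating such a GMM density; the mixing weights, means and covariances below are made-up toy values.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy 2-component GMM in R^2; all parameter values below are illustrative assumptions.
pis    = np.array([0.3, 0.7])                     # mixing weights pi_j, sum to 1
mus    = [np.zeros(2), np.array([3.0, 3.0])]      # component means mu_j
Sigmas = [np.eye(2), 2.0 * np.eye(2)]             # component covariances Sigma_j

def gmm_density(x):
    # p_theta(x) = sum_j pi_j * N(x; mu_j, Sigma_j)
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))

print(gmm_density(np.array([1.0, 1.0])))          # mixture density at a test point
```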
The problem: Obtain the Maximum Likelihood estimate for the GMM parameters given D. Let θ denote the parameters: θ = {π_i, μ_i, Σ_i}_{i=1}^{M}.
The log-likelihood function:

    ℓ(θ) = Σ_{i=1}^{n} log p_θ(x_i) = Σ_{i=1}^{n} log Σ_{j=1}^{M} π_j N(x_i; μ_j, Σ_j)    ... (1)

Eq. (1) has a log of sums (instead of the usually occurring sum of logs). Thus, setting ∂ℓ(θ)/∂θ = 0 does not lead to closed-form solutions in θ.
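Numerically, the log-of-sums in Eq. (1) is usually evaluated with a log-sum-exp over the per-component log-densities. A minimal sketch with toy parameters and data (the helper name gmm_log_likelihood is ours, for illustration only):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, pis, mus, Sigmas):
    # l(theta) = sum_i log sum_j pi_j N(x_i; mu_j, Sigma_j)   -- Eq. (1)
    log_p = np.stack([np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=S)
                      for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
    return logsumexp(log_p, axis=1).sum()          # a log of sums, not a sum of logs

# Illustrative toy data and parameters (assumed, not from the notes).
X = np.random.randn(100, 2)
print(gmm_log_likelihood(X,
                         pis=np.array([0.3, 0.7]),
                         mus=[np.zeros(2), np.array([3.0, 3.0])],
                         Sigmas=[np.eye(2), 2.0 * np.eye(2)]))
```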
One workaround: Assume a hidden / latent / missing variable in the data.

Complete data: D' = {x_i, z_i}_{i=1}^{n}, i.i.d.
Observed data:  D = {x_i}_{i=1}^{n}, i.i.d.

Under this assumption, D is sampled as below:
(i)  sample z_i ~ p_z   (p_z is the prior on the latent variables)
(ii) sample x_i ~ p_{x|z=z_i}
z_i is unobserved but x_i is observed.
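A minimal sketch of this two-step (ancestral) sampling procedure for the GMM case, where the latent z_i is the component index; the prior and the conditional parameters below are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GMM viewed as a latent-variable model; all values are illustrative assumptions.
pis    = np.array([0.3, 0.7])                     # p_z: prior over the latent component index
mus    = [np.zeros(2), np.array([3.0, 3.0])]      # p_{x|z=j} = N(mu_j, Sigma_j)
Sigmas = [np.eye(2), 2.0 * np.eye(2)]

def sample_complete(n):
    # (i) z_i ~ p_z, then (ii) x_i ~ p_{x|z=z_i}
    zs = rng.choice(len(pis), size=n, p=pis)
    xs = np.stack([rng.multivariate_normal(mus[z], Sigmas[z]) for z in zs])
    return xs, zs

X, Z = sample_complete(5)
print(X)   # observed data D
print(Z)   # latent assignments: generated, but treated as unobserved
```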
Question: How to perform Maximum Likelihood parameter estimation for such latent variable models?
Note: p_θ(x) = ∫ p_θ(x, z) dz, or Σ_z p_θ(x, z) if z is discrete.

∴ ℓ(θ) = log p_θ(x) = log ∫ p_θ(x, z) dz
Since z is unobserved, let us assume a distribution q(z) over it, called the variational distribution. Now,

    ℓ(θ) = log ∫ p_θ(x, z) dz
         = log ∫ q(z) [p_θ(x, z) / q(z)] dz
         ≥ ∫ q(z) log [p_θ(x, z) / q(z)] dz    [Jensen's inequality]
         ≜ F_θ(q)                               ... (2)

where F_θ(q) = ∫ q(z) log p_θ(x, z) dz + H(q), and H(q) = -∫ q(z) log q(z) dz is the entropy of q.
F_θ(q) is a functional that is a lower bound on the data (evidence) log-likelihood, and is hence termed the Evidence Lower Bound (ELBO).
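As a small numerical illustration of Eq. (2) (a sketch with assumed toy parameters, not from the notes): for a discrete latent z, F_θ(q) can be computed exactly, and it never exceeds log p_θ(x) for any choice of q, with equality when q is the exact posterior.

```python
import numpy as np
from scipy.stats import multivariate_normal

# A single observation x and a toy 2-component GMM (all values are illustrative assumptions).
pis    = np.array([0.3, 0.7])
mus    = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]
x      = np.array([1.0, 1.0])

# log p_theta(x, z=j) = log pi_j + log N(x; mu_j, Sigma_j)
log_joint = np.array([np.log(pi) + multivariate_normal.logpdf(x, mean=mu, cov=S)
                      for pi, mu, S in zip(pis, mus, Sigmas)])

def elbo(q):
    # F_theta(q) = E_q[log p_theta(x, z)] + H(q)   -- Eq. (2)
    return np.sum(q * log_joint) - np.sum(q * np.log(q))

log_evidence = np.log(np.sum(np.exp(log_joint)))              # log p_theta(x)
posterior    = np.exp(log_joint) / np.exp(log_joint).sum()    # p_theta(z|x)

for q in [np.array([0.5, 0.5]), np.array([0.1, 0.9]), posterior]:
    print(float(elbo(q)), "<=", float(log_evidence))          # bound is tight when q = p_theta(z|x)
```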
In order to maximize ℓ(θ), we maximize F_θ(q) instead, by alternately optimizing over q and θ. This is called the Expectation Maximization (EM) algorithm.
EM algorithm:

E-step: Maximize F_θ(q) over q, given θ:

    q^{t+1} = argmax_q F_{θ^t}(q)

M-step: Maximize F_θ(q) over θ, given q:

    θ^{t+1} = argmax_θ F_θ(q^{t+1}) = argmax_θ ∫ q^{t+1}(z) log p_θ(x, z) dz
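A compact sketch of these two steps for the GMM case (the helper em_gmm, its initialization and the toy data are our illustrative assumptions, not from the notes). For a GMM the E-step posterior is available in closed form (the responsibilities), and the M-step has closed-form updates for θ = {π_j, μ_j, Σ_j}:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def em_gmm(X, M, iters=50, seed=0):
    # Didactic EM sketch for a GMM on data X of shape (n, d) with M components.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Arbitrary initialization: uniform weights, random data points as means, identity covariances.
    pis    = np.full(M, 1.0 / M)
    mus    = X[rng.choice(n, M, replace=False)]
    Sigmas = np.stack([np.eye(d) for _ in range(M)])
    lls = []
    for _ in range(iters):
        # E-step: q^{t+1}(z_i = j) = p_{theta^t}(z_i = j | x_i)  (responsibilities)
        log_p  = np.stack([np.log(pis[j]) + multivariate_normal.logpdf(X, mus[j], Sigmas[j])
                           for j in range(M)], axis=1)     # log p_theta(x_i, z_i = j)
        log_px = logsumexp(log_p, axis=1)                  # log p_theta(x_i)
        lls.append(log_px.sum())                           # l(theta) at the current iterate
        r = np.exp(log_p - log_px[:, None])                # q(z_i = j), rows sum to 1
        # M-step: theta^{t+1} = argmax_theta sum_i sum_j q(z_i = j) log p_theta(x_i, z_i = j)
        Nj  = r.sum(axis=0)                                # effective count per component
        pis = Nj / n
        mus = (r.T @ X) / Nj[:, None]
        for j in range(M):
            diff = X - mus[j]
            Sigmas[j] = (r[:, j, None] * diff).T @ diff / Nj[j]
    return pis, mus, Sigmas, lls

# Usage on toy two-cluster data (assumed for illustration).
X = np.vstack([np.random.randn(100, 2), np.random.randn(100, 2) + 3.0])
pis, mus, Sigmas, lls = em_gmm(X, M=2)
print(np.all(np.diff(lls) >= -1e-8))   # log-likelihood never decreases (see the lemma below)
```

The final print is an empirical check of the result proved next: with exact E- and M-steps, the per-iteration log-likelihood is non-decreasing.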
Lemma: The EM algorithm never decreases the log-likelihood.

Proof: Consider

    ℓ(θ) - F_θ(q) = log p_θ(x) - ∫ q(z) log [p_θ(x, z) / q(z)] dz
                  = log p_θ(x) - ∫ q(z) log [p_θ(z|x) p_θ(x) / q(z)] dz
                  = log p_θ(x) - log p_θ(x) + ∫ q(z) log [q(z) / p_θ(z|x)] dz
                  = D_KL[q(z) || p_θ(z|x)]

∴ ℓ(θ) = F_θ(q) iff D_KL[q(z) || p_θ(z|x)] = 0, i.e., q(z) = p_θ(z|x).

∴ In the E-step, q^{t+1}(z) = p_{θ^t}(z|x)

⟹ ℓ(θ^t) = F_{θ^t}(q^{t+1}) ≤ F_{θ^{t+1}}(q^{t+1})    [ensured via the M-step]

We also have F_{θ^{t+1}}(q^{t+1}) ≤ ℓ(θ^{t+1})    [Jensen's inequality]

∴ ℓ(θ^t) ≤ ℓ(θ^{t+1}).    Q.E.D.
In summary, to maximize the likelihood in a latent variable (mixture) model, a lower bound on the likelihood is first constructed using a variational distribution on the latent variable. Subsequently, the lower bound is iteratively optimized over the variational and the model parameters. The model can then be used for several tasks such as posterior inference (estimating p_θ(z|x)) and sampling.
Variational Auto Encoders

For mixture densities, EM can be used since the optimization over the variational density is tractable, i.e., q* = p_θ(z|x), which can be computed analytically. However, this may not be the case for arbitrary densities. Thus, it is desirable to have an algorithm that works even in the cases where p_θ(z|x) and p_θ(x) are intractable. Variational Auto Encoders provide an efficient way to accomplish ML estimation, posterior inference and sampling for such cases.