MA5232 Modeling and Numerical Simulations
Lecture 2: Iterative Methods for Mixture-Model Segmentation
8 Apr 2015
National University of Singapore
Last time
PCA reduces the dimensionality of a data set while retaining as much of the data variation as possible.
Statistical view: the leading PCs are given by the leading eigenvectors of the covariance matrix.
Geometric view: fitting a d-dimensional subspace model via the SVD.
Extensions of PCA
Probabilistic PCA via MLE
Kernel PCA via kernel functions and kernel matrices
This lecture
Review basic iterative algorithms for central
clustering
Formulation of the subspace segmentation
problem
Segmentation by Clustering
From: Object Recognition as Machine Translation, Duygulu, Barnard, de Freitas, Forsyth, ECCV02
Example 4.1
Euclidean distance-based clustering is not invariant to linear transformations of the data.
The distance metric needs to be adjusted after a linear transformation.
Central Clustering
Assume the data are sampled from a mixture of Gaussians.
The classical distance metric between a sample and the mean of the jth cluster is the Mahalanobis distance.
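For reference, the standard form of this distance (a textbook definition, not transcribed from the slide), with \( \mu_j \) and \( \Sigma_j \) the mean and covariance of the jth cluster:
\[
d_j(x) = \sqrt{(x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j)}.
\]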
Central Clustering: K-Means
Assume a map function that assigns each ith sample a label j = c(i).
An optimal clustering minimizes the within-cluster scatter,
i.e., the average distance of all samples to their respective cluster means.
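A standard way to write this objective (assuming labels j = c(i) and cluster means \( \mu_j \), as above):
\[
\min_{c,\ \mu_1,\dots,\mu_K} \ \frac{1}{n}\sum_{i=1}^{n} \big\| x_i - \mu_{c(i)} \big\|^2 .
\]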
Central Clustering: K-Means
However, K is user-defined: if K is also allowed to grow, the scatter can be driven to zero by letting each point become a cluster by itself, K = n.
In this chapter, we assume the true K is known.
Algorithm
A chicken-and-egg view: the optimal labels depend on the cluster means, and the optimal means depend on the labels.
Two-Step Iteration
Step 1 (segmentation): assign each sample to its nearest cluster mean.
Step 2 (estimation): recompute each mean as the average of the samples assigned to it.
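A minimal NumPy sketch of this two-step iteration (illustrative only; the function and variable names are mine, not the handout's):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Two-step (assign / update) k-means on the rows of X (n x D)."""
    rng = np.random.default_rng(seed)
    # Initialize means with K randomly chosen samples.
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iter):
        # Step 1 (segmentation): label each sample by its nearest mean.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 2 (estimation): recompute each mean as the average of its samples.
        new_mu = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                           else mu[j] for j in range(K)])
        if np.allclose(new_mu, mu):
            break  # converged to a (possibly local) optimum
        mu = new_mu
    return labels, mu
```

Different random seeds give different local optima, which is why one typically keeps the lowest-scatter result over several runs, as discussed on the following slides.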
Example
http://util.io/k-means
Feature Space
Source: K. Grauman
Results of K-Means Clustering
[Figure: an image, clusters on intensity, and clusters on color: k-means clustering using intensity alone and color alone. From Marc Pollefeys, COMP 256, 2003.]
A bad local optimum
Characteristics of K-Means
It is a greedy algorithm and is not guaranteed to converge to the global optimum.
Given fixed initial clusters/Gaussian models, the iterative process is deterministic.
Results may be improved by running k-means multiple times with different starting conditions.
The segmentation-estimation process can be treated as a generalized expectation-maximization (EM) algorithm.
EM Algorithm [Dempster-Laird-Rubin 1977]
Expectation Maximization (EM) estimates the model parameters and the segmentation in a maximum-likelihood (ML) sense.
Assume the samples are independently drawn from a mixture distribution, with the component indicated by a hidden discrete variable z.
The conditional distributions can be Gaussian.
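In symbols (standard mixture notation, with mixing weights \( \pi_j = P(z = j) \)):
\[
p(x \mid \theta) = \sum_{j=1}^{K} \pi_j \, p_j(x \mid \theta_j).
\]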
The Maximum-Likelihood Estimation
The unknown parameters are the mixing weights and the component parameters, θ = (π1, …, πK, θ1, …, θK).
The likelihood function is the product of the mixture density over all samples.
The optimal solution maximizes the log-likelihood.
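With n independent samples, the log-likelihood takes the standard form:
\[
\ell(\theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \pi_j \, p_j(x_i \mid \theta_j).
\]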
The Maximum-Likelihood Estimation
Directly maximizing the log-likelihood function is a high-dimensional nonlinear optimization problem.
Define a new function g(θ, w):
The first term is called the expected complete log-likelihood function;
The second term is the conditional entropy.
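In standard EM notation (a reconstruction following the usual EM lower bound, not a transcription of the slide), with \( w_{ij} \) the probability assigned to \( z_i = j \):
\[
g(\theta, w) = \sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij} \log p(x_i, z_i = j \mid \theta)
\;-\; \sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij} \log w_{ij},
\]
where the first sum is the expected complete log-likelihood and the second term (with its minus sign) is the conditional entropy of w.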
Observation: for fixed θ, maximizing g(θ, w) over w recovers the log-likelihood, with the maximum attained at the posterior w_ij = p(z_i = j | x_i, θ); hence g is a lower bound on the log-likelihood.
The Maximum-Likelihood Estimation
Regard the (incomplete) log-likelihood as a function of two variables, through g(θ, w).
Maximize g iteratively: an E step (maximizing over w), followed by an M step (maximizing over θ).
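In symbols (standard EM form, with t indexing the iteration):
E step: \( w^{t+1} = \arg\max_{w} g(\theta^{t}, w) \), i.e., \( w_{ij}^{t+1} = p(z_i = j \mid x_i, \theta^{t}) \).
M step: \( \theta^{t+1} = \arg\max_{\theta} g(\theta, w^{t+1}) \).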
The iteration converges to a stationary point of the log-likelihood.
Prop 4.2: Update of w (the E step).
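For a mixture model this update is the familiar posterior (responsibility); a standard form, assuming the notation above:
\[
w_{ij}^{t+1} = \frac{\pi_j^{t}\, p_j(x_i \mid \theta_j^{t})}{\sum_{l=1}^{K} \pi_l^{t}\, p_l(x_i \mid \theta_l^{t})}.
\]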
Update of θ (the M step)
Recall the definition of g(θ, w).
Assume w is fixed; then maximize the expected complete log-likelihood.
To maximize the expected complete log-likelihood, as an example, assume each cluster is an isotropic normal distribution.
Eliminate the constant term in the objective.
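For reference, the resulting closed-form updates (standard for isotropic Gaussians; the handout's exact form may differ in detail), where D is the ambient dimension:
\[
\pi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij}, \qquad
\mu_j = \frac{\sum_{i} w_{ij}\, x_i}{\sum_{i} w_{ij}}, \qquad
\sigma_j^2 = \frac{\sum_{i} w_{ij}\, \|x_i - \mu_j\|^2}{D \sum_{i} w_{ij}}.
\]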
Exer 4.2
Compared to k-means, EM assigns the
samples softly to each cluster according to a
set of probabilities.
EM Algorithm
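A minimal NumPy sketch of the full algorithm for an isotropic Gaussian mixture (illustrative only; the function and variable names are mine, not the handout's):

```python
import numpy as np

def em_gmm_isotropic(X, K, n_iter=100, seed=0):
    """EM for a mixture of K isotropic Gaussians on the rows of X (n x D)."""
    n, D = X.shape
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                      # mixing weights
    mu = X[rng.choice(n, size=K, replace=False)]  # means
    var = np.full(K, X.var())                     # isotropic variances
    for _ in range(n_iter):
        # E step: responsibilities w_ij = P(z_i = j | x_i, theta).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logw = np.log(pi) - 0.5 * D * np.log(2 * np.pi * var) - d2 / (2 * var)
        logw -= logw.max(axis=1, keepdims=True)   # for numerical stability
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)
        # M step: re-estimate pi, mu, var from the soft assignments.
        nj = w.sum(axis=0)
        pi = nj / n
        mu = (w.T @ X) / nj[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (w * d2).sum(axis=0) / (D * nj)
    return pi, mu, var, w
```

In practice one clamps the variances away from zero (e.g., var = np.maximum(var, 1e-8)) to sidestep the singularity issue discussed later in this lecture.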
Example 4.3: A global maximum may not exist.
Alternative view of EM: Coordinate ascent
[Figure sequence: the EM iterates traced as coordinate ascent, alternately maximizing along the w1 and w2 coordinates.]
Visual example of EM
Potential Problems
Incorrect number of Mixture Components
Singularities
Incorrect Number of Gaussians
Singularities
A minority of the data can have a
disproportionate effect on the model
likelihood.
For example:
GMM example
When a mixture component collapses onto a single point, its mean becomes that point and its variance goes to zero.
Consider the likelihood function as the covariance goes to zero:
the likelihood approaches infinity.
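Concretely, if component j collapses onto a data point \( x_i \) (so \( \mu_j = x_i \)) with isotropic covariance \( \sigma_j^2 I \), that point contributes
\[
\mathcal{N}(x_i \mid \mu_j, \sigma_j^2 I) = \frac{1}{(2\pi \sigma_j^2)^{D/2}} \;\longrightarrow\; \infty
\quad \text{as } \sigma_j \to 0,
\]
so the likelihood is unbounded above and no global maximum exists.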
K-means vs. EM
K-means clustering and EM clustering on an artificial data set ("mouse"). The tendency of k-means to produce equal-sized clusters leads to bad results, while EM benefits from the Gaussian distributions present in the data set.
So far
K-means
Expectation Maximization
Next up
Multiple-Subspace Segmentation
K-subspaces
EM for Subspaces
Multiple-Subspace Segmentation
Given data drawn from a union of K linear subspaces, the goal is to simultaneously segment the points and estimate a basis for each subspace.
K-subspaces
With noise, we minimize the sum of squared distances from each point to its nearest subspace.
Unfortunately, unlike PCA, there is no constructive solution to the above minimization problem. The main difficulty is that the foregoing objective is hybrid: it is a combination of minimization over the continuous variables {Uj} and the discrete variable j.
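A standard way to write this objective (assuming orthonormal bases \( U_j \in \mathbb{R}^{D \times d_j} \); the handout's notation may differ in detail):
\[
\min_{U_1,\dots,U_K} \ \sum_{i=1}^{n} \ \min_{1 \le j \le K} \big\| x_i - U_j U_j^{\top} x_i \big\|^2 .
\]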
K-subspaces
The subspace update: given the segmentation, each basis Uj is re-estimated from the points assigned to subspace j, exactly the same as in PCA.
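A minimal NumPy sketch of the K-subspaces alternation (illustrative names; this sketch assumes linear subspaces through the origin and a common dimension d, which are assumptions of the sketch, not the handout):

```python
import numpy as np

def k_subspaces(X, K, d, n_iter=50, seed=0):
    """Fit K linear subspaces of dimension d to the rows of X (n x D)."""
    n, D = X.shape
    rng = np.random.default_rng(seed)
    labels = rng.integers(K, size=n)  # random initial segmentation
    U = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
    for _ in range(n_iter):
        # Subspace update: refit each basis by PCA/SVD of its assigned points.
        for j in range(K):
            Xj = X[labels == j]
            if len(Xj) >= d:
                # The leading d right singular vectors span the best-fit subspace.
                U[j] = np.linalg.svd(Xj, full_matrices=False)[2][:d].T
        # Segmentation update: assign each point to the nearest subspace,
        # measured by the squared residual ||x - U_j U_j^T x||^2.
        resid = np.stack([((X - X @ Uj @ Uj.T) ** 2).sum(axis=1) for Uj in U])
        new_labels = resid.argmin(axis=0)
        if np.array_equal(new_labels, labels):
            break  # hard assignments stopped changing
        labels = new_labels
    return labels, U
```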
EM for Subspaces
In the E step, the expected membership of each sample in each subspace is computed; in the M step, the subspace parameters are re-estimated given these expected memberships.
Relationship between K-subspaces and EM
At each iteration:
The K-subspaces algorithm gives a definite (hard) assignment of every data point to one of the subspaces;
The EM algorithm views the membership as a random variable and uses its expected value to give a probabilistic (soft) assignment of the data point.
Homework
Read the handout, Chapter 4: Iterative Methods for Multiple-Subspace Segmentation.
Complete Exercise 4.2 (page 111) of the handout.