EM Algorithm
• It stands for the Expectation–Maximization algorithm
• It is used to estimate latent variables (variables that are not directly observable but are instead inferred from the values of other, observed variables). A latent variable is a random variable that cannot be observed in either the training or the test phase
• It is the basis of many unsupervised clustering algorithms
Steps involved in EM Algorithm:
1. Initially, a set of starting parameter values is chosen (a set of incomplete data is given to the system).
2. Expectation or E-Step – Here, we use the observed data and the current parameter estimates to estimate (guess) the values of the missing/incomplete data.
3. Maximization or M-Step – Here, we use the complete data generated in the preceding E-step to update the parameter values.
4. We check whether the values are converging. If they are converging, stop the process; otherwise, repeat steps 2 and 3 until convergence occurs (a schematic version of this loop is sketched below).
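A schematic version of this loop in Python (a minimal sketch; run_em, e_step, and m_step are illustrative names, and the parameters are represented as a simple tuple):
def run_em(data, params, e_step, m_step, tol=1e-6, max_iter=100):
    """Generic EM loop: alternate E- and M-steps until convergence."""
    for _ in range(max_iter):
        expectations = e_step(data, params)   # E-step: estimate the missing data
        new_params = m_step(expectations)     # M-step: re-estimate the parameters
        # Convergence check: stop when no parameter moves by more than tol
        if all(abs(a - b) < tol for a, b in zip(params, new_params)):
            return new_params
        params = new_params
    return params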
Usage of EM Algorithm:
• Used to fill in missing data
• Used for unsupervised clustering
• Used to discover the values of latent variables
Advantages:
• With each iteration, the likelihood is guaranteed not to decrease
• The E-step and M-step are usually easy to implement
Disadvantages:
• Convergence can be slow
• It may converge only to a local optimum
Applications of EM Algorithm:
The latent variable model has several real-life applications in Machine Learning:
• Used to estimate the parameters of a Gaussian density function.
• Helpful for filling in missing data in a sample.
• Finds plenty of use in different domains such as Natural Language Processing (NLP), Computer Vision, etc.
• Used in image reconstruction in the fields of Medicine and Structural Engineering.
• Used for estimating the parameters of the Hidden Markov Model (HMM) and also for other mixture models such as Gaussian Mixture Models (see the sketch after this list).
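For instance, scikit-learn's GaussianMixture runs EM internally when fitting a Gaussian Mixture Model. A minimal sketch, assuming scikit-learn and NumPy are installed:
import numpy as np
from sklearn.mixture import GaussianMixture

# Draw 1-D samples from two Gaussians; EM inside fit() recovers the components
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, 200),
                    rng.normal(5, 1, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_.ravel())   # approximately [0, 5] (component order may vary)
print(gmm.weights_)         # approximately [0.5, 0.5]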
Example 1 (Two-Coin Problem)
• Assume that we have two coins, C1 and C2 (referred to as coins A and B below)
• Assume the bias of C1 is t1 (i.e., the probability of getting heads with C1)
• Assume the bias of C2 is t2 (i.e., the probability of getting heads with C2)
• We want to find t1 and t2 by performing a number of trials (i.e., coin tosses)
Solution:
First Experiment –
• We choose one of the two coins 5 times
• Each time, we toss the chosen coin 10 times
Coin   Tosses
B      H T T T H H T H T H
A      H H H H T H H H H H
A      H T H H H H H T H H
B      H T H T T T H H T T
A      T H H H T H H H T H
t1 = Number of heads using C1 / total number of flips using C1
t2 = Number of heads using C2 / total number of flips using C2
Coin A            Coin B
-                 5H, 5T
9H, 1T            -
8H, 2T            -
-                 4H, 6T
7H, 3T            -
Total: 24H, 6T    Total: 9H, 11T
t1 = 24/(24+6) = 24/30 = 0.8
t2 = 9/(9+11) = 9/20 = 0.45
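Because the coin used in each set is known in this first experiment, the ML estimates are simply fractions of heads. A quick check in Python (the trials list restates the table above; the names are illustrative):
# Labeled trials: (coin, toss sequence)
trials = [("B", "HTTTHHTHTH"), ("A", "HHHHTHHHHH"), ("A", "HTHHHHHTHH"),
          ("B", "HTHTTTHHTT"), ("A", "THHHTHHHTH")]

def ml_bias(coin):
    # Concatenate all flips of this coin and take the fraction of heads
    flips = "".join(seq for c, seq in trials if c == coin)
    return flips.count("H") / len(flips)

print(ml_bias("A"), ml_bias("B"))   # 0.8 0.45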
• We have two coins: A and B
• The probabilities of heads are tA and tB
• We have 5 measurement sets, each containing 10 coin tosses
• If we know which coin was tossed in each set, we can calculate the ML estimates of tA and tB directly (as above)
• If we do not know which coin was tossed in each set, the ML estimates cannot be calculated directly → EM Algorithm
• Binomial distribution used to calculate the likelihoods:
L(C) = t^k (1 - t)^(n-k)
where t = probability of heads for coin C,
n = number of flips of that coin,
k = number of heads obtained with that coin.
(The binomial coefficient is omitted because it cancels when the likelihoods are normalized in the next step.)
EM Coin Problem Solved
• Assume that for the first coin (A) the probability of heads is 0.60
• Assume that for the second coin (B) the probability of heads is 0.50
• For the first trial there are 5 heads and 5 tails
• Use the binomial probability distribution for the likelihood
• k, the number of heads, is 5 and n, the number of flips, is 10
Likelihoods of the first trial under each coin:
L(A) = 0.6^5 (1 - 0.6)^(10-5) = 0.0007963
L(B) = 0.5^5 (1 - 0.5)^(10-5) = 0.0009766
P(A) = L(A) / (L(A) + L(B)) = 0.0007963 / (0.0007963 + 0.0009766) = 0.45
P(B) = L(B) / (L(A) + L(B)) = 0.0009766 / (0.0007963 + 0.0009766) = 0.55
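These numbers can be verified with a few lines of Python (a quick sanity check of the arithmetic above):
# First trial: 5 heads in 10 flips; assumed biases 0.6 (coin A) and 0.5 (coin B)
k, n = 5, 10
L_A = 0.6 ** k * (1 - 0.6) ** (n - k)   # approx. 0.0007963
L_B = 0.5 ** k * (1 - 0.5) ** (n - k)   # approx. 0.0009766
print(L_A / (L_A + L_B))                # P(A) approx. 0.45
print(L_B / (L_A + L_B))                # P(B) approx. 0.55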
In a similar fashion, compute P(A) and P(B) for every trial. The expected counts attributed to each coin are then:
L(H) = expected number of heads [posterior probability of the coin × number of heads in the trial]
L(T) = expected number of tails [posterior probability of the coin × number of tails in the trial]
Iteration 1:
                                         Coin A       Coin B
Tosses                  P(A)  P(B)   L(H)  L(T)   L(H)  L(T)
B H T T T H H T H T H   0.45  0.55   2.2   2.2    2.8   2.8
A H H H H T H H H H H   0.80  0.20   7.2   0.8    1.8   0.2
A H T H H H H H T H H   0.73  0.27   5.9   1.5    2.1   0.5
B H T H T T T H H T T   0.35  0.65   1.4   2.1    2.6   3.9
A T H H H T H H H T H   0.65  0.35   4.5   1.9    2.5   1.1
For Coin A
∑ L(H) = 21.3 and ∑ L(T) = 8.6
tA = 21.3 / (21.3 + 8.6) = 0.71
For Coin B
∑ L(H) = 11.7 and ∑ L(T) = 8.4
tB = 11.7 / (11.7 + 8.4) = 0.58
These updated biases become the starting point for the next iteration, and the E-step and M-step are repeated until the values converge.
Python Code:
import random

def coin_em(rolls, theta_A=None, theta_B=None, maxiter=10):
    # Initial guess (random if no starting value is supplied)
    theta_A = theta_A or random.random()
    theta_B = theta_B or random.random()
    thetas = [(theta_A, theta_B)]
    # Iterate the E-step and M-step
    for c in range(maxiter):
        print("#%d:\t%0.2f %0.2f" % (c, theta_A, theta_B))
        heads_A, tails_A, heads_B, tails_B = e_step(rolls, theta_A, theta_B)
        theta_A, theta_B = m_step(heads_A, tails_A, heads_B, tails_B)
        thetas.append((theta_A, theta_B))
    return thetas, (theta_A, theta_B)

def e_step(rolls, theta_A, theta_B):
    """Produce the expected values of heads_A, tails_A, heads_B, tails_B
    over the rolls, given the current coin biases."""
    heads_A, tails_A = 0, 0
    heads_B, tails_B = 0, 0
    for trial in rolls:
        likelihood_A = coin_likelihood(trial, theta_A)
        likelihood_B = coin_likelihood(trial, theta_B)
        # Posterior probability that each coin produced this trial
        p_A = likelihood_A / (likelihood_A + likelihood_B)
        p_B = likelihood_B / (likelihood_A + likelihood_B)
        # Attribute the observed heads/tails to each coin in proportion
        heads_A += p_A * trial.count("H")
        tails_A += p_A * trial.count("T")
        heads_B += p_B * trial.count("H")
        tails_B += p_B * trial.count("T")
    return heads_A, tails_A, heads_B, tails_B

def m_step(heads_A, tails_A, heads_B, tails_B):
    """Produce the values of theta that maximize the expected numbers of heads/tails."""
    theta_A = heads_A / (heads_A + tails_A)
    theta_B = heads_B / (heads_B + tails_B)
    return theta_A, theta_B

def coin_likelihood(roll, bias):
    # P(X | Z, theta): binomial likelihood of the roll given the coin's bias
    numHeads = roll.count("H")
    flips = len(roll)
    return pow(bias, numHeads) * pow(1 - bias, flips - numHeads)

rolls = ["HTTTHHTHTH", "HHHHTHHHHH", "HTHHHHHTHH",
         "HTHTTTHHTT", "THHHTHHHTH"]
thetas, _ = coin_em(rolls, 0.6, 0.5, maxiter=6)
Output:
#0: 0.60 0.50
#1: 0.71 0.58
#2: 0.75 0.57
#3: 0.77 0.55
#4: 0.78 0.53
#5: 0.79 0.53