Course Title – Natural Language Processing
Topic Title – Viterbi Algorithm
Presenter’s Name – Dr. M Nagaraju
Presenter’s ID – IARE10960
Department Name – CSE (AI & ML)
Lecture Number – 3
Presentation Date – 30-05-2025
Decoding: The Viterbi Algorithm
• The Viterbi Algorithm is a dynamic programming
algorithm widely used in NLP to solve problems
involving Hidden Markov Models (HMMs).
• It is particularly useful for tasks like:
• part-of-speech tagging,
• speech recognition,
• named entity recognition, and
• sequence labelling.
Decoding: The Viterbi Algorithm
• For any model, such as an HMM, that contains hidden variables,
the task of determining which sequence of variables is the
underlying source of some sequence of observations is called
the decoding task.
• In the ice cream domain, given a sequence of ice cream
observations 3 1 3 and an HMM, the task of the decoder is to
find the best hidden weather sequence (H H H).
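• Formally (in standard HMM notation, which the slide does not spell out): given observations O = o_1 … o_T and an HMM λ, the decoder seeks

  \hat{Q} = \operatorname*{argmax}_{Q = q_1 \dots q_T} P(Q \mid O, \lambda) = \operatorname*{argmax}_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)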
Decoding: The Viterbi Algorithm
• We might propose to find the best sequence as follows: for
each possible hidden state sequence (HHH, HHC, HCH, etc.),
run the forward algorithm, compute the likelihood of the
observation sequence given that hidden state sequence, and
then pick the sequence with the highest likelihood.
• It should be clear that we cannot do this: for N hidden states
and T observations there are N^T possible state sequences, an
exponentially large number.
• Viterbi also strongly resembles another dynamic programming
variant, the minimum edit distance algorithm.
Decoding: The Viterbi Algorithm - Purpose
• The Viterbi algorithm finds the most probable sequence
of hidden states (e.g., part-of-speech tags) given a
sequence of observed events (e.g., words in a sentence)
and a trained HMM.
1. Initialization:
• Start by calculating the probability of the first
observation being generated by each possible state.
• Use the initial probabilities of the states and the
emission probabilities.
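• In the usual textbook notation (an assumption, since the slide does not introduce symbols), with initial probabilities π_j and emission probabilities b_j(o_1), this step fills the first column of the Viterbi table:

  v_1(j) = \pi_j \, b_j(o_1), \qquad 1 \le j \le N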
Decoding: The Viterbi Algorithm - Purpose
2. Recursion:
• For each subsequent observation, compute the
probability of reaching each state by considering:
1. The transition probabilities from the previous states.
2. The emission probabilities of the current observation.
• Keep track of the most probable path to each state.
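• In the same notation (with a_{ij} the transition probability from state i to state j), the recursion and its backpointer are:

  v_t(j) = \max_{1 \le i \le N} v_{t-1}(i)\, a_{ij}\, b_j(o_t), \qquad bp_t(j) = \operatorname*{argmax}_{1 \le i \le N} v_{t-1}(i)\, a_{ij}, \qquad 2 \le t \le T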
Decoding: The Viterbi Algorithm - Purpose
3. Termination:
• Identify the final state with the highest probability.
4. Backtracking:
• Trace back through the stored paths to reconstruct the
most probable sequence of states.
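• In the same notation, termination and backtracking amount to:

  P^{*} = \max_{1 \le j \le N} v_T(j), \qquad q_T^{*} = \operatorname*{argmax}_{1 \le j \le N} v_T(j), \qquad q_t^{*} = bp_{t+1}(q_{t+1}^{*}) \quad \text{for } t = T-1, \dots, 1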
Decoding: The Viterbi Algorithm - Example
We want to find the most likely sequence of hidden states (e.g., weather
conditions like Hot or Cold) that leads to the observed emissions: [3, 1, 3] (ice
creams eaten).
Decoding: The Viterbi Algorithm - Example
Transition Probability: The transition probability in an HMM is the probability of
moving from one hidden state to another.
“If today is HOT, what’s the probability that tomorrow will be HOT or COLD?”
This means:
• From a Hot day, there is one probability that the next day stays Hot and another that it turns Cold, and these probabilities sum to 1.
• Likewise, from a Cold day, there is a probability of switching to Hot and a probability of staying Cold.
• Transition probabilities describe how the weather evolves from one day to the next; they say nothing about how many ice creams are eaten.
Decoding: The Viterbi Algorithm - Example
In an HMM, the emission probability is the probability of observing a specific
output (also called an "observation" or "emission") given a hidden state.
“If the system is in a certain hidden state, how likely is it to emit a particular
observable value?”
This means:
• On a Hot day, the chance of eating 3 ice creams is 0.4
• On a Cold day, the chance of eating 3 ice creams is only 0.1
• So, if someone eats 3 ice creams, it's more likely the day was Hot.
Decoding: The Viterbi Algorithm - Example
Use a table to store
- V[state][t]: max probability of any path that ends in state at time t.
- Path[state]: best path leading to that state.
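The sketch below is a minimal Python illustration of these two tables for the observation sequence [3, 1, 3]; it is not code from the lecture. Only the emission values P(3 | Hot) = 0.4 and P(3 | Cold) = 0.1 come from the slides; the initial probabilities, transition probabilities, and remaining emission entries are assumed purely for illustration, chosen so that the decoded sequence matches the (H H H) answer quoted earlier.

states = ["Hot", "Cold"]
observations = [3, 1, 3]          # ice creams eaten on each day

# Illustrative parameters: only P(3|Hot)=0.4 and P(3|Cold)=0.1 are from the slides;
# everything else is an assumed value for this sketch.
start_p = {"Hot": 0.8, "Cold": 0.2}
trans_p = {"Hot": {"Hot": 0.7, "Cold": 0.3},
           "Cold": {"Hot": 0.4, "Cold": 0.6}}
emit_p = {"Hot": {1: 0.2, 2: 0.4, 3: 0.4},   # P(3 | Hot) = 0.4 (from the slides)
          "Cold": {1: 0.5, 2: 0.4, 3: 0.1}}  # P(3 | Cold) = 0.1 (from the slides)

def viterbi(obs, states, start_p, trans_p, emit_p):
    T = len(obs)
    # V[t][s]: max probability of any path ending in state s at time t
    # backpointer[t][s]: previous state on that best path (the Path table above)
    V = [{} for _ in range(T)]
    backpointer = [{} for _ in range(T)]

    # Initialization: first observation generated directly from each state.
    for s in states:
        V[0][s] = start_p[s] * emit_p[s][obs[0]]
        backpointer[0][s] = None

    # Recursion: extend the best path into each state, one observation at a time.
    for t in range(1, T):
        for s in states:
            best_prev, best_score = max(
                ((prev, V[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]])
                 for prev in states),
                key=lambda pair: pair[1],
            )
            V[t][s] = best_score
            backpointer[t][s] = best_prev

    # Termination: pick the final state with the highest probability.
    last_state = max(V[T - 1], key=V[T - 1].get)
    best_prob = V[T - 1][last_state]

    # Backtracking: follow the backpointers to recover the full state sequence.
    path = [last_state]
    for t in range(T - 1, 0, -1):
        path.insert(0, backpointer[t][path[0]])
    return path, best_prob

path, prob = viterbi(observations, states, start_p, trans_p, emit_p)
print(path, prob)

With these illustrative numbers the program prints ['Hot', 'Hot', 'Hot'] and the probability of that path (0.012544).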
Thank You