Hidden Markov Model

The document provides an overview of Markov models, specifically focusing on Finite Markov models and Hidden Markov Models (HMMs), which are used to predict future states based on current observations. It explains the components of Markov chains, the algorithms for implementing HMMs, and their applications in various fields such as speech recognition, natural language processing, bioinformatics, and finance. Additionally, it discusses the limitations of HMMs, including limited modeling capabilities, overfitting, lack of robustness, and computational complexity.

By Dr. Samta Gajbhiye
Ø Markov models are a type of probabilistic model used to predict the future state of a system based on its current state.
Ø A finite state machine with N distinct states begins (at time t = 1) in an initial state.
Ø Each state has an associated probability of being in any other state after one step.
Ø The system moves from the current state to the next state according to the transition probabilities associated with the current state.
Ø This kind of system is called a Finite or Discrete Markov model.

Ø Used to model real-world problems where hidden and observable states are involved.
Ø Based on the type of information available for making predictions or decisions, Markov models are classified into two:
i. Hidden Markov models, which deal with hidden variables that cannot be directly observed.
ii. Observable models, also termed Markov chains, in which hidden variables are not involved.

Ø The current state of the system depends on the previous state of the system.
Ø The state of the system at time T+1 depends on the state of the system at time T.

X(t=1) → X(t=2) → X(t=3) → X(t=4) → X(t=5)
Ø A Markov chain can be thought of as a machine or system that hops from one state to another, typically forming a chain.
Ø Markov property: the probability of moving to any particular state next depends only on the current state, not on the previous states.
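This property can be written formally as (standard notation, where X_t denotes the state at time t):

    P(X_{t+1} | X_t, X_{t-1}, …, X_1) = P(X_{t+1} | X_t)
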
Ø A Markov chain consists of three components:
i. Initial probability distribution: a probability distribution over states, giving the probability that the Markov chain will start in a certain state i.
ii. A set of one or more states.
iii. Transition probability distribution: a transition probability matrix A, where each element aij represents the probability of moving from state ‘i’ to state ‘j’.

[Figure: state-transition diagram for the weather Markov chain (Sunny, Cloudy, Rainy) used in the example below]
Using this Markov chain, what is the probability that Wednesday will be cloudy if today (Monday) is sunny? The following transitions can result in a cloudy Wednesday given that Monday is sunny:
Ø Sunny – Sunny (Tuesday) – Cloudy (Wednesday): probability = 0.5 x 0.4 = 0.2
Ø Sunny – Rainy (Tuesday) – Cloudy (Wednesday): probability = 0.1 x 0.3 = 0.03
Ø Sunny – Cloudy (Tuesday) – Cloudy (Wednesday): probability = 0.4 x 0.1 = 0.04
Ø The total probability of a cloudy Wednesday = 0.2 + 0.03 + 0.04 = 0.27
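The same calculation as a small Python sketch (a minimal illustration; the dictionary below contains only the transition probabilities quoted above, not the full matrix from the diagram):

    # Transition probabilities quoted in the example: P(next state | current state)
    trans = {
        ("Sunny", "Sunny"): 0.5, ("Sunny", "Cloudy"): 0.4, ("Sunny", "Rainy"): 0.1,
        ("Cloudy", "Cloudy"): 0.1, ("Rainy", "Cloudy"): 0.3,
    }

    # Sum over the three possible Tuesday states of
    # P(Sunny -> Tuesday state) * P(Tuesday state -> Cloudy)
    p_cloudy_wednesday = sum(
        trans[("Sunny", tuesday)] * trans[(tuesday, "Cloudy")]
        for tuesday in ("Sunny", "Cloudy", "Rainy")
    )
    print(p_cloudy_wednesday)  # 0.2 + 0.04 + 0.03 = 0.27

Summing over the intermediate Tuesday state in this way is exactly what multiplying the transition matrix by itself would do for a two-step transition.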

Ø Idea: the hidden states generate the observations, and the observed data is used to estimate the hidden state sequence.
Ø Inference over the hidden states is typically carried out with the forward-backward algorithm.
Ø Used in situations where the underlying system or process that generates the observations is unknown or hidden, hence the name “Hidden Markov Model.”
Ø Used to predict future observations or to classify sequences, based on the underlying hidden process that generates the data.
Ø An HMM consists of two types of variables: hidden states and observations.
§ The hidden states are the underlying variables that generate the observed data, but they are not directly observable. Their presence is revealed through the observation symbols that the hidden states emit.
§ The observations are the variables that are measured and observed.
§ Example to understand hidden states and observations: we do not know what mood a friend is in (the mood is the hidden state), but we observe their actions (the observable symbols), and from those actions we make a guess about the hidden state they are in.
Ø The relationship between the hidden states and the observations is modeled using a probability distribution.
Ø The Hidden Markov Model (HMM) describes the relationship between the hidden states and the observations using two sets of probabilities: the transition probabilities and the emission probabilities.
§ The transition probabilities describe the probability of transitioning from one hidden state to another.
§ The emission probabilities describe the probability of observing a particular output given a hidden state.
The Hidden Markov Model (HMM) algorithm can be implemented using the following steps:

Step 1: Define the state space and observation space
§ State space: the set of all possible hidden states.
§ Observation space: the set of all possible observations.

Step 2: Define the initial state distribution
§ A probability distribution over the initial state.
Step 3: Define the state transition probabilities
§ Probabilities of transitioning from one state to another.
§ These form the transition matrix, which describes the probability of moving from one state to another.
§ The element aij is the probability of transitioning from state i to state j. Each row of a Markov matrix sums to one, i.e. from any given state, the probability of moving to some next state is one.
Step 4: Define the observation likelihoods
§ Probabilities of generating each observation from each state.
§ These form the emission matrix, which describes the probability of generating each observation from each state.
Step 5: Train the model
§ The state transition probabilities and the observation likelihoods are estimated using the Baum-Welch algorithm, which is built on the forward-backward algorithm.
§ This is done by iteratively updating the parameters until convergence.
Step 6: Decode the most likely sequence of hidden states
§ Given the observed data, the Viterbi algorithm is used to compute the most likely sequence of hidden states.
Step 7: Evaluate
§ The model is evaluated using metrics such as accuracy, precision, recall, or F1 score.
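
As a rough sketch of how these pieces fit together in code (plain NumPy rather than any particular HMM library; the function name and the tiny two-state parameter values below are made up for illustration, not taken from this document), the forward pass used to score an observation sequence looks like this:

    import numpy as np

    def forward_likelihood(pi, A, B, obs):
        # pi : (N,)   initial state distribution (Step 2)
        # A  : (N, N) transition matrix, A[i, j] = P(next state j | current state i) (Step 3)
        # B  : (N, M) emission matrix,   B[i, k] = P(observation k | state i) (Step 4)
        # obs: list of observation indices
        alpha = pi * B[:, obs[0]]              # probability of each state given the first observation
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]      # propagate one time step, then weight by the emission
        return alpha.sum()                     # P(obs | model), summed over all final states

    # Illustrative two-state parameters (made up for this sketch)
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
    print(forward_likelihood(pi, A, B, [0, 1, 0]))

Baum-Welch (Step 5) repeatedly runs this forward pass together with a matching backward pass to re-estimate A and B, while the Viterbi decoder (Step 6) replaces the sum with a max to recover the single best state path.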
Ø Two major assumptions are made in an HMM: the next state (xi+1) and the current observation (oi) depend only on the current state (xi).
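Written out (standard HMM notation, using the same indices as above):

    P(x_{i+1} | x_1, …, x_i, o_1, …, o_i) = P(x_{i+1} | x_i)
    P(o_i | x_1, …, x_i, o_1, …, o_{i-1}) = P(o_i | x_i)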

Ø Suppose you want to know your friend's activity, but you can only observe the weather outside. Your friend's activities, which are the hidden states, “emit” observable symbols, which are the weather conditions.
Ø Step 1: Hidden states and observation symbols
§ Friend's activities (hidden states): Basketball (B), Football (F), Video games (G)
§ Observable symbols: Sunny (S), Cloudy (C), Rainy (R)
§ [Figure: the hidden states and observed symbols]
Ø Step 2: Initial/terminal state probability distributions are used for the calculation.
Ø Initial probabilities: P(X = B) = 0.4, P(X = F) = 0.5, P(X = G) = 0.1
Ø Step 3: State transition probability distribution (rows: current state Xi; columns: next state Xi+1):

              B       F       G     Sum
      B      0.3     0.5     0.2      1
      F      0.5     0.3     0.2      1
      G      0.4     0.2     0.4      1
Ø Step 4: Observation likelihoods (emission probabilities; rows: hidden state Xi; columns: observed symbol Oi):

              S       C       R     Sum
      B      0.6     0.3     0.1      1
      F      0.7     0.2     0.1      1
      G      0.1     0.1     0.8      1
Ø Steps 5 and 6: Train the model and generate the observed sequence
§ An observation sequence is a sequence of observation symbols, from 1 symbol up to N symbols.
§ Every observation sequence is treated as a separate unit, without any knowledge about past or future sequences.
Ø [Figures: two diagrams with the same observed sequence ‘SSCRCSC’ for different hidden state sequences]

Ø Observed sequence ‘RCS’
§ Consider three consecutive days and assume that this scenario happens: on the first day video games were played and it was rainy, on the second day basketball was played and it was cloudy, and on the third day football was played and it was sunny.
§ What is the probability of this scenario [RCS] occurring?
P(Oi = R C S, Xi = G B F) = ?
§ The next state (xi+1) and the current observation (oi) depend only on the current state (xi).
§ Using the Markov property, we can compute this probability as the product of six terms. Here X1 = G, X2 = B, X3 = F are the hidden states and O1 = R, O2 = C, O3 = S are the observed states for the three consecutive days respectively:
i. P(X1 = G)
ii. P(X2 = B | X1 = G)
iii. P(X3 = F | X2 = B)
iv. P(O1 = R | X1 = G)
v. P(O2 = C | X2 = B)
vi. P(O3 = S | X3 = F)
§ Substituting the values (terms ii–vi are read from the transition and observation tables in Steps 3 and 4):
i. P(X1 = G) = 0.4 (initial state)
ii. P(X2 = B | X1 = G) = 0.4
iii. P(X3 = F | X2 = B) = 0.5
iv. P(O1 = R | X1 = G) = 0.8
v. P(O2 = C | X2 = B) = 0.3
vi. P(O3 = S | X3 = F) = 0.7

§ Hence the joint probability of the observed weather sequence ‘RCS’ and the playing sequence ‘G B F’ is the product of all six terms (i, ii, …, vi):
P(Oi = R C S, Xi = G B F)
= 0.4 x 0.4 x 0.5 x 0.8 x 0.3 x 0.7
= 0.01344
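
The same six-term product as a short Python sketch (the transition and observation tables are copied from Steps 3 and 4; the starting probability is passed in explicitly as the 0.4 used above for P(X1 = G)):

    trans = {"B": {"B": 0.3, "F": 0.5, "G": 0.2},
             "F": {"B": 0.5, "F": 0.3, "G": 0.2},
             "G": {"B": 0.4, "F": 0.2, "G": 0.4}}
    emit  = {"B": {"S": 0.6, "C": 0.3, "R": 0.1},
             "F": {"S": 0.7, "C": 0.2, "R": 0.1},
             "G": {"S": 0.1, "C": 0.1, "R": 0.8}}

    def joint_probability(p_start, hidden, observed):
        # p_start: probability of the first hidden state (term i)
        p = p_start
        for prev, cur in zip(hidden, hidden[1:]):
            p *= trans[prev][cur]              # transition terms (ii, iii)
        for state, symbol in zip(hidden, observed):
            p *= emit[state][symbol]           # emission terms (iv, v, vi)
        return p

    print(joint_probability(0.4, "GBF", "RCS"))   # 0.4 x 0.4 x 0.5 x 0.8 x 0.3 x 0.7 = 0.01344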
Similarly, many other computations can be done, for example:
§ “What is the most likely playing sequence for the three consecutive days, given the observed states R C S respectively?”
§ Candidate playing sequences include: BFG, BGF, FBG, FGB, GBF, GFB, BBB, GGG, FFF.
§ To find the most likely sequence, compute the probability of each candidate playing sequence for the given observed states RCS (a sketch of this search follows below):
P(Oi = RCS, Xi = BFG), P(Oi = RCS, Xi = BGF), P(Oi = RCS, Xi = FBG),
P(Oi = RCS, Xi = FGB), P(Oi = RCS, Xi = GBF), P(Oi = RCS, Xi = GFB),
P(Oi = RCS, Xi = BBB), P(Oi = RCS, Xi = GGG), P(Oi = RCS, Xi = FFF)
§ Then select the most likely sequence: the one with the maximum probability.
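
A brute-force version of this search, continuing the previous sketch (it reuses the trans/emit tables and joint_probability defined above, and takes the initial probabilities from Step 2), simply scores every possible three-day playing sequence (all 27, including the nine listed above) and keeps the best one:

    from itertools import product

    init = {"B": 0.4, "F": 0.5, "G": 0.1}      # initial probabilities from Step 2
    observed = "RCS"

    # Score every candidate playing sequence against the observed weather.
    best = max(product("BFG", repeat=len(observed)),
               key=lambda h: joint_probability(init[h[0]], h, observed))
    print("".join(best), joint_probability(init[best[0]], best, observed))
    # Under these tables the highest-scoring path is 'GBF'.

For longer observation sequences this enumeration grows exponentially; the Viterbi algorithm from Step 6 finds the same maximizing state path without enumerating every candidate.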
1. Speech Recognition
§ HMMs are used to model the different sounds and phones that make up speech.
§ The hidden states: different sounds or phones
§ The observations are the acoustic signals that are generated by the speech.
§ The goal is to estimate the hidden state sequence, which corresponds to the transcription of the speech, based on the observed acoustic signals.
2. Natural Language Processing
§ HMMs are used for tasks such as part-of-speech tagging, named entity recognition, and text classification.
§ The hidden states: underlying grammar or structure of the text
§ The observations are the words in the text.
§ The goal is to estimate the hidden state sequence, which corresponds to the structure or meaning of the text, based on the observed words.
3. Bioinformatics
§ Used to model sequences of DNA, RNA, and proteins.
§ The hidden states: different types of residues
§ The observations are the sequences of residues.
§ The goal is to estimate the hidden state sequence, which
corresponds to the underlying structure of the molecule, based on
the observed sequences of residues.

4. Finance
§ Used to model stock prices, interest rates, and currency exchange
rates.
§ The hidden states: different economic states, such as bull and bear
markets
§ The observations are the stock prices, interest rates, or exchange
rates.
§ The goal is to estimate the hidden state sequence, which
corresponds to the underlying economic state, based on the
observed prices, rates, or exchange rates.

1. Limited Modeling Capabilities
§ HMMs are limited in their modeling capabilities.
§ The structure of the data can be quite complex, and the simple structure of HMMs may not be enough to accurately capture all the details.
§ For example, in speech recognition, the complex relationship between the speech sounds and the corresponding acoustic signals may not be fully captured by the simple structure of an HMM.
2. Overfitting
§ HMMs are prone to overfitting.
§ Overfitting occurs when the model fits the training data too well and is unable to generalize to new data.
3. Lack of Robustness
§ HMMs are limited in their robustness to noise and variability in the data.
§ For example, in speech recognition, the acoustic signals generated by speech can be subject to a variety of distortions and noise, which can make it difficult for the HMM to accurately estimate the underlying structure of the data.
4. Computational Complexity
§ HMMs are limited by their computational complexity.
§ The computational complexity of HMMs comes from the need to estimate the parameters of the model and to compute the likelihood of the data given the model.
