COLUMBIA UNIVERSITY
Prof. Marco Avella Medina
Statistical Inference and Modeling Spring 2022
Lecture Notes 7: Markov Chains
1 Some Examples
Until now we have focused on the i.i.d. random sample framework. However, in many
applications there are dependence structures in the data that cannot be ignored. In the
three examples below we model the data as dependent discrete random variables.
1.1 Snow Fall Data
Figure 1: Snow Fall Data. (a) Observed Snow Depth; (b) Binary Time Series
Figure 1 gives the daily depth of snow in Central Park for the period between December 17
and December 31, for the years 1912 to 2009. Figure 1a records the snow depth in mm; an
observed snow depth of at least 50 mm on the jth day is represented with a '1', while '0'
indicates otherwise. This yields the binary time series in Figure 1b.
1.2 DNA Data
Figure 2 gives an example of a DNA sequence. The data consist of the first 150 bases
from a sequence of 1572 bases found in a human genome. The plot in Figure 2a shows the
proportion of times the bases (A, C, T or G) appear in a window of width 100 centered at
position t. The estimated proportions seem to be fairly stable across the different locations
in the sequence. Figure 2b shows the observed frequencies and estimated proportions of the
16 possible successive pairs of bases.
Figure 2: DNA Sequence Data. (a) Proportion of Bases; (b) Successive Pairs
1.3 Breast Cancer Data
Figure 3 gives the initial and follow-up status for 37 breast cancer patients treated for spinal
metastases. The three possible statuses are: able to walk unaided (1), unable to walk unaided
(2), and dead (3). The follow-up times are 0, 3, 6, 12, 24, and 60 months after treatment
began. Note that Woman 24 was alive after 6 months but her ability to walk was not
recorded. In this example, there are apparent relationships between successive statuses.
Furthermore, state 3 (dead) is an absorbing state, since it cannot be followed by any other
status.
Figure 3: Breast Cancer Data
2 Markov Chain Model
2.1 Statistical Model
2.1.1 Basics
Let X_t denote a process taking values in a state space S = {1, . . . , S}. The data are of the
form X_0 = s_0, X_{t_1} = s_1, . . . , X_{t_k} = s_k, where 0 < t_1 < · · · < t_k.
Example.
• Snow fall data: t is the day; there are k = 15 days in total per year (between
December 17 and December 31) and the state space is S = {1, 2} for the states "no
snow" and "snow".
• DNA data: t is the location in the DNA sequence; we have k = 1571 locations in total
and the state space is S = {1, 2, 3, 4} = {A, C, G, T}.
• Breast cancer data: in total, k = 5 at most, with t_0 = 0, t_1 = 3, t_2 = 6, t_3 = 12,
t_4 = 24, and t_5 = 60 months, and state space S = {1, 2, 3} representing the states
"walk unaided", "not able to walk unaided" and "dead".
2.1.2 Markov Property
Assume that we have a sample Xt1 , . . . , Xtk of dependent data. Using conditional probabil-
ities, we can always write the joint frequency function as
\[
\begin{aligned}
&P(X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_k} = s_k) \\
&\quad = P(X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{k-1}} = s_{k-1})\,
P(X_{t_k} = s_k \mid X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{k-1}} = s_{k-1}).
\end{aligned}
\]
The first factor can be factorized using the same rule:
\[
\begin{aligned}
&P(X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{k-1}} = s_{k-1}) \\
&\quad = P(X_0 = s_0, \ldots, X_{t_{k-2}} = s_{k-2})\,
P(X_{t_{k-1}} = s_{k-1} \mid X_0 = s_0, \ldots, X_{t_{k-2}} = s_{k-2}).
\end{aligned}
\]
Iterating conditional probabilities following the same logic leads to
\[
P(X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_k} = s_k)
= P(X_0 = s_0) \prod_{j=1}^{k} P(X_{t_j} = s_j \mid X_0 = s_0, X_{t_1} = s_1, \ldots, X_{t_{j-1}} = s_{j-1}).
\]
The (first-order) Markov property assumption requires that, given the "present" X_i, the
"future" X_{i+1}, X_{i+2}, . . . is independent of the "past" X_{i-1}, X_{i-2}, . . . . This leads to the
simplification
\[
P(X_0 = s_0, \ldots, X_{t_k} = s_k) = P(X_0 = s_0) \prod_{i=1}^{k} P(X_{t_i} = s_i \mid X_{t_{i-1}} = s_{i-1}).
\]
A random process that satisfies this (first-order) Markov property, in which the future is
independent of the past given the present, is called a (first-order) Markov process.
2.1.3 Markov Chains
A Markov Chain is a Markov process in discrete time taking values in a state space
S = {1, . . . , S}.
Figure 4: Markov Chain Models. (a) Snow Fall Data; (b) DNA Sequence Data; (c) Breast Cancer Data
Example. The Markov chain graphs for the three examples above are shown in Figure 4.
The arrows indicate the direction of possible transitions in the following time step.
2.2 Stationarity and Transition Probabilities
• Stationarity
We will assume that the process X_t is stationary, i.e.
\[
P(X_t = s \mid X_u = r) = P(X_{t-u} = s \mid X_0 = r).
\]
Hence the conditional probabilities only depend on the time differences. In particular,
stationarity assumes that the probability of transitioning between states will not change
over time.
• Transition Probabilities
For a stationary chain X_t observed at discrete, equally spaced times t = 0, 1, . . . , k,
we define the transition probabilities as
\[
p_{rs} = P(X_1 = s \mid X_0 = r), \qquad r, s \in S.
\]
Note that we only need to consider the event X_1 = s given X_0 = r, since the transition
probabilities are assumed to stay fixed over time and the observation times are equally spaced.
2.3 Transition Matrix
• The transition probabilities define the S × S transition matrix P whose (r, s) element is
p_{rs}. Summing all elements in the rth row of P gives \sum_{j=1}^{S} p_{rj} = \sum_{j=1}^{S} P(X_1 = j \mid X_0 = r) = 1.
Since each row of the transition matrix sums up to 1, we have that
P 1_S = 1_S, where 1_S denotes the S-dimensional vector of ones.
• Consider an initial probability vector p_{(0)} = (p_1^{(0)}, p_2^{(0)}, . . . , p_S^{(0)})^⊤ where p_r^{(0)} = P(X_0 = r).
Then the probability distribution of the state at time 1 is
\[
p_{(1)}^\top = p_{(0)}^\top P
= \big[p_1^{(0)} \; p_2^{(0)} \; \cdots \; p_S^{(0)}\big]
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1S} \\
p_{21} & p_{22} & \cdots & p_{2S} \\
\vdots & \vdots & \ddots & \vdots \\
p_{S1} & p_{S2} & \cdots & p_{SS}
\end{pmatrix}
= \Big[\sum_{j=1}^{S} p_j^{(0)} p_{j1} \;\; \sum_{j=1}^{S} p_j^{(0)} p_{j2} \;\; \cdots \;\; \sum_{j=1}^{S} p_j^{(0)} p_{jS}\Big].
\]
Therefore the sth element of p_{(1)}^⊤ = p_{(0)}^⊤ P is P(X_1 = s) = \sum_{j=1}^{S} p_j^{(0)} p_{js}. By the same
reasoning, the probability distribution of X_n is given by p_{(n)}^⊤ = p_{(0)}^⊤ P^n.
• A Markov chain is said to have a stationary distribution π if π^⊤ P = π^⊤ and
π^⊤ 1_S = 1. In other words, π is a left eigenvector of P (an eigenvector of P^⊤) associated
with the eigenvalue λ = 1, normalized to sum to one. When the limit exists, it is the
limiting distribution π^⊤ = lim_{n→∞} p_{(n)}^⊤ = lim_{n→∞} p_{(0)}^⊤ P^n.
Example. Consider the simple four-state Markov chain shown in Figure 5. The model
diagram and transition matrix are shown in Figures 5a and 5b respectively.
Figure 5: Four-state Markov chain. (a) Model Diagram; (b) Transition Matrix
By solving π^⊤ P = π^⊤ and π^⊤ 1_S = 1, we get
\[
\begin{cases}
\tfrac{1}{3}\pi_2 = \pi_1 \\
\pi_1 + \tfrac{1}{2}\pi_3 + \tfrac{1}{2}\pi_4 = \pi_2 \\
\tfrac{1}{3}\pi_2 + \tfrac{1}{2}\pi_4 = \pi_3 \\
\tfrac{1}{3}\pi_2 + \tfrac{1}{2}\pi_3 = \pi_4 \\
\pi_1 + \pi_2 + \pi_3 + \pi_4 = 1
\end{cases}
\]
Solving this linear system we get a unique solution π = (1/8, 3/8, 1/4, 1/4)^⊤. Note that
this stationary distribution does not depend on the initial probabilities. In this example
there is a unique stationary distribution; however, this is not always the case.
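As a sanity check, π can be recovered numerically by solving the stationarity equations as a linear system. A short sketch, where the matrix P below is the one implied by the equations above (i.e., the transition matrix of Figure 5b):

```python
import numpy as np

# Transition matrix implied by the stationarity equations above:
# state 1 -> 2 w.p. 1; state 2 -> 1, 3, 4 each w.p. 1/3;
# state 3 -> 2, 4 each w.p. 1/2; state 4 -> 2, 3 each w.p. 1/2.
P = np.array([[0,   1,   0,   0  ],
              [1/3, 0,   1/3, 1/3],
              [0,   1/2, 0,   1/2],
              [0,   1/2, 1/2, 0  ]])

# Stack pi^T (P - I) = 0 with the constraint sum(pi) = 1, solve by least squares.
S = P.shape[0]
A = np.vstack([(P - np.eye(S)).T, np.ones(S)])  # (S+1) x S coefficient matrix
b = np.concatenate([np.zeros(S), [1.0]])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # [0.125 0.375 0.25 0.25] = (1/8, 3/8, 1/4, 1/4)
```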
Example. From the diagram representing the Markov chain in Figure 6a, we can
construct the transition matrix P shown in Figure 6b.
Figure 6: Six-state Markov chain. (a) Model Diagram; (b) Transition Matrix
We can check that the elements in each row of the transition matrix sum to 1 since
\[
P 1_S =
\begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & 0 \\
1/4 & 3/4 & 0 & 0 & 0 & 0 \\
1/4 & 1/4 & 1/4 & 1/4 & 0 & 0 \\
1/4 & 0 & 1/4 & 1/4 & 0 & 1/4 \\
0 & 0 & 0 & 0 & 1/2 & 1/2 \\
0 & 0 & 0 & 0 & 1/2 & 1/2
\end{pmatrix}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
= 1_S.
\]
Further assume that the initial probability vector is p_{(0)} = (1/4, 1/4, 0, 1/4, 0, 1/4)^⊤.
We can then compute P(X_2 = C) as the third element of p_{(0)}^⊤ P²:
\[
p_{(0)}^\top P^2 = \big[0.234 \;\; 0.375 \;\; 0.031 \;\; 0.031 \;\; 0.156 \;\; 0.172\big],
\]
so P(X_2 = C) ≈ 0.031.
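This computation is straightforward to verify numerically; a quick sketch using the transition matrix above:

```python
import numpy as np

P = np.array([[1/2, 1/2, 0,   0,   0,   0  ],
              [1/4, 3/4, 0,   0,   0,   0  ],
              [1/4, 1/4, 1/4, 1/4, 0,   0  ],
              [1/4, 0,   1/4, 1/4, 0,   1/4],
              [0,   0,   0,   0,   1/2, 1/2],
              [0,   0,   0,   0,   1/2, 1/2]])
p0 = np.array([1/4, 1/4, 0, 1/4, 0, 1/4])

p2 = p0 @ np.linalg.matrix_power(P, 2)
print(np.round(p2, 3))  # [0.234 0.375 0.031 0.031 0.156 0.172]
print(p2[2])            # P(X_2 = C) = 0.03125
```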
The stationary distribution π = (π_1, π_2, . . . , π_6)^⊤ has to satisfy π^⊤ P = π^⊤ and
π^⊤ 1_S = 1. Solving these equations we get
\[
\begin{cases}
2\pi_1 = \pi_2 \\
\pi_3 = \pi_4 = 0 \\
\pi_5 = \pi_6 \\
3\pi_1 + 2\pi_5 = 1
\end{cases}
\]
In this case there is not a unique stationary distribution; one of them is π =
(0.2, 0.4, 0, 0, 0.2, 0.2)^⊤. In fact, there is a unique stationary distribution when the chain is
both irreducible and positive recurrent.¹
¹ See the section "Classification of chains", pp. 229-232, in A.C. Davison (2003) for details.
2.4 Irreducibility
We say that state j is reachable from state i, denoted by i → j, if there exists an integer
n ≥ 0 such that P^n_{ij} > 0. A Markov chain is irreducible if and only if all states are reachable
from each other.
Example. The Markov chain illustrated in Figure 5 is irreducible, but the one from
Figure 6 is not.
Remark. If a Markov chain is reducible, there exists at least one "isolated island" of
states that cannot be reached from the remaining states.
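For a finite chain, irreducibility can be checked mechanically from the definition: j is reachable from i exactly when some power of P has a positive (i, j) entry, and paths of length at most S − 1 suffice. A Python sketch (the helper name is ours):

```python
import numpy as np

def is_irreducible(P):
    """Check irreducibility of a finite chain with transition matrix P.

    State j is reachable from i iff (I + A)^(S-1) has a positive (i, j)
    entry, where A is the 0/1 adjacency matrix of possible transitions;
    that power accumulates all paths of length <= S - 1.
    """
    S = P.shape[0]
    reach = np.linalg.matrix_power(np.eye(S) + (P > 0), S - 1)
    return bool((reach > 0).all())

P6 = np.array([[1/2,1/2,0,0,0,0], [1/4,3/4,0,0,0,0], [1/4,1/4,1/4,1/4,0,0],
               [1/4,0,1/4,1/4,0,1/4], [0,0,0,0,1/2,1/2], [0,0,0,0,1/2,1/2]])
print(is_irreducible(P6))  # False: states 3, 4 cannot be reached from 1, 2, 5, 6
```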
2.5 Periodicity (Optional)
The period of a state i in a Markov chain with transition matrix P is given by
\[
d(i) = \gcd\{m \geq 1 : P^m_{ii} > 0\}.
\]
If d(i) = 1, then the state i is called aperiodic.
Remark 1. A Markov chain is called aperiodic if and only if all its states are aperiodic.
Remark 2. If a Markov chain is aperiodic and has finitely many states, then there exists
some positive integer N such that
\[
P^m_{ii} > 0
\]
for any m ≥ N and any state i.
Remark 3. If an irreducible Markov chain is aperiodic and has finitely many states, then
there exists some positive integer N such that
\[
P^m_{ij} > 0
\]
for any m ≥ N and any states i and j.
2.6 Recurrence and Transience (Optional)
Let τ_{i,i} denote the return time to state i given X_0 = i:
\[
\tau_{i,i} = \min\{k \geq 1 : X_k = i \mid X_0 = i\}.
\]
If the Markov chain never revisits i, then τ_{i,i} = ∞. The state i is recurrent if P(τ_{i,i} < ∞) = 1,
and transient if P(τ_{i,i} = ∞) > 0.
If state i is recurrent and E(τ_{i,i}) < ∞, then we say that state i is positive recurrent. Otherwise,
state i is called null recurrent.
Example. Recurrence and transience of the simple random walk on Z^d: it can be proved
that the walk is recurrent when d = 1, 2 but transient when d ≥ 3. Hence the famous
quote: "A drunken man will always find his way home, but a drunken bird may get lost
forever."
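A small simulation illustrates the dichotomy by estimating the probability of returning to the origin within a fixed horizon. This is an illustrative sketch, not part of the original notes; the horizon and number of paths are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_return(d, n_steps=2_000, n_paths=1_000):
    """Estimate P(simple random walk on Z^d returns to 0 within n_steps)."""
    returns = 0
    for _ in range(n_paths):
        pos = np.zeros(d, dtype=int)
        for _ in range(n_steps):
            axis = rng.integers(d)            # pick a coordinate uniformly
            pos[axis] += rng.choice((-1, 1))  # step +/- 1 along it
            if not pos.any():                 # back at the origin
                returns += 1
                break
    return returns / n_paths

print(prob_return(1))  # close to 1: recurrent
print(prob_return(3))  # near 0.3: transient
```

For d = 1 the estimate approaches 1 as the horizon grows, while for d = 3 it stabilizes near the return probability of roughly 0.34 (Polya's constant), strictly below 1.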
Remark. If all states in an irreducible Markov chain are positive recurrent, then we say that
the Markov chain is positive recurrent. If all states in an irreducible Markov chain are null
recurrent, then we say that the Markov chain is null recurrent. If all states in an irreducible
Markov chain are transient, then we say that the Markov chain is transient.
3 Likelihood Inference
3.1 Likelihood Function
Assuming we have observed data s0 , s1 , . . . , sk at times 0, 1, . . . , k, from a stationary Markov
chain Xt , the likelihood function is
\[
\begin{aligned}
P(X_0 = s_0, \ldots, X_k = s_k)
&= P(X_0 = s_0) \prod_{i=0}^{k-1} P(X_{i+1} = s_{i+1} \mid X_i = s_i) \\
&= P(X_0 = s_0) \prod_{i=0}^{k-1} p_{s_i s_{i+1}} \\
&= p_0 \prod_{r=1}^{S} \prod_{s=1}^{S} p_{rs}^{n_{rs}},
\end{aligned}
\]
where p_0 = P(X_0 = s_0) and n_{rs} is the number of observed transitions from r to s. Since
\prod_{i=0}^{k-1} p_{s_i s_{i+1}} = p_{s_0 s_1} \times p_{s_1 s_2} \times \cdots \times p_{s_{k-1} s_k}, it suffices to count the number of times
each transition occurs; therefore \prod_{i=0}^{k-1} p_{s_i s_{i+1}} = \prod_{r=1}^{S} \prod_{s=1}^{S} p_{rs}^{n_{rs}}.
From the expression above we get the log-likelihood
\[
\ell(P) = \sum_{r=1}^{S} \sum_{s=1}^{S} n_{rs} \log p_{rs} + \log p_0.
\]
When taking partial derivatives to compute the MLE of p_{rs}, the term log p_0 disappears,
so the optimization does not depend on the initial probabilities.
The MLE of p_{rs} can be shown to be of the form
\[
\hat{p}_{rs} = \frac{n_{rs}}{n_{r\cdot}}, \qquad \text{where } n_{r\cdot} = \sum_{j=1}^{S} n_{rj}.
\]
This is the most intuitive estimator: it simply computes the observed proportion of
transitions, as in Figure 2b of the DNA example.
Let the rows of the transition matrix be P = (p_1 p_2 . . . p_S)^⊤, where p_r is the transition
probability vector from state r, and let P̂ = (p̂_1 p̂_2 . . . p̂_S)^⊤ denote the MLE of the
transition matrix. Under regularity conditions (ergodicity) the p̂_{rs} have a joint normal
distribution:
\[
\sqrt{n_{r\cdot}}\,(\hat{p}_r - p_r) \xrightarrow[n \to \infty]{D} N(0, V_r),
\]
where n_{r·} is the number of visits the chain makes to state r, and
\[
V_r =
\begin{pmatrix}
p_{r1}(1 - p_{r1}) & -p_{r1} p_{r2} & -p_{r1} p_{r3} & \cdots & -p_{r1} p_{rS} \\
-p_{r2} p_{r1} & p_{r2}(1 - p_{r2}) & -p_{r2} p_{r3} & \cdots & -p_{r2} p_{rS} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-p_{rS} p_{r1} & -p_{rS} p_{r2} & -p_{rS} p_{r3} & \cdots & p_{rS}(1 - p_{rS})
\end{pmatrix},
\]
and p̂_i and p̂_j are asymptotically independent for i ≠ j. Therefore the covariances of the
p̂_{rs} can be estimated by
\[
\widehat{\mathrm{cov}}(\hat{p}_{rs}, \hat{p}_{tu}) =
\begin{cases}
\hat{p}_{rs}(1 - \hat{p}_{rs})/n_{r\cdot}, & r = t,\; s = u, \\
-\hat{p}_{rs}\,\hat{p}_{ru}/n_{r\cdot}, & r = t,\; s \neq u, \\
0, & \text{otherwise.}
\end{cases}
\]
Example. Assume X = (0, 0, 0, 1, 1, 0, 0, 0, 1, 0) was drawn from a first-order Markov
chain with state space {0, 1}, and that the initial probability vector is p_{(0)}^⊤ =
(p_1^{(0)}, p_2^{(0)}) = (0.8, 0.2). The observed transition counts are n_{00} = 4, n_{01} = 2,
n_{10} = 2 and n_{11} = 1, so the observed likelihood function is
\[
P(X) = p_0 \prod_{r=0}^{1} \prod_{s=0}^{1} p_{rs}^{n_{rs}}
= 0.8 \times p_{00}^4 \times p_{01}^2 \times p_{10}^2 \times p_{11}.
\]
The maximum likelihood estimates of the p_{rs} are therefore
\[
\hat{p}_{00}^{MLE} = \frac{4}{6}, \qquad
\hat{p}_{01}^{MLE} = \frac{2}{6}, \qquad
\hat{p}_{10}^{MLE} = \frac{2}{3}, \qquad
\hat{p}_{11}^{MLE} = \frac{1}{3}.
\]
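These counts and estimates are easy to reproduce. A short sketch that also returns the estimated variances p̂_{rs}(1 − p̂_{rs})/n_{r·} from the asymptotic covariance formula above (the function name is ours):

```python
import numpy as np

def fit_markov_chain(x, n_states):
    """MLE of the transition matrix of a first-order chain from one path."""
    counts = np.zeros((n_states, n_states))
    for r, s in zip(x[:-1], x[1:]):  # count observed transitions r -> s
        counts[r, s] += 1
    n_row = counts.sum(axis=1, keepdims=True)  # n_r.
    p_hat = counts / n_row
    var_hat = p_hat * (1 - p_hat) / n_row      # estimated var of each p_hat_rs
    return counts, p_hat, var_hat

x = [0, 0, 0, 1, 1, 0, 0, 0, 1, 0]
counts, p_hat, var_hat = fit_markov_chain(x, n_states=2)
print(counts)  # [[4. 2.] [2. 1.]]
print(p_hat)   # [[0.667 0.333] [0.667 0.333]]
```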
3.2 A Simpler Model
One could test a simpler model where the transition probabilities are independent of the
current state i.e. prs = ps This model is a zeroth-order Markov Chain PS that just assumes a
multinomial distribution.
P In
P this case the likelihood simplifies to j=1 n·j log pj and p̂s =
n·s /n·· with n·· = s n·s = r,s nrs .
The likelihood ratio test statistic for testing a zeroth-order vs. a first-order Markov chain
model is
\[
\begin{aligned}
\Lambda_n &= 2\big\{\ell(\hat{P})_{\text{first order}} - \ell(\hat{P})_{\text{zeroth order}}\big\}
= 2\Big\{\sum_{r=1}^{S}\sum_{s=1}^{S} n_{rs} \log \hat{p}_{rs} - \sum_{s=1}^{S} n_{\cdot s} \log \hat{p}_s\Big\} \\
&= 2\Big\{\sum_{r=1}^{S}\sum_{s=1}^{S} n_{rs} \log \hat{p}_{rs} - \sum_{r=1}^{S}\sum_{s=1}^{S} n_{rs} \log \hat{p}_s\Big\}
= 2\sum_{r=1}^{S}\sum_{s=1}^{S} n_{rs} \log \frac{\hat{p}_{rs}}{\hat{p}_s} \\
&= 2\sum_{r=1}^{S}\sum_{s=1}^{S} n_{rs} \log \frac{n_{rs}\, n_{\cdot\cdot}}{n_{r\cdot}\, n_{\cdot s}}.
\end{aligned}
\]
To test the simpler independence model, the null hypothesis is H_0: "all rows of P equal
(p_1, . . . , p_S)", or equivalently, "the zeroth-order model holds". Under H_0 we have
\[
\Lambda_n \xrightarrow[n \to \infty]{D} \chi^2_{(S-1)^2}.
\]
There are (S − 1)² degrees of freedom since the difference between the numbers of parameters
in the two models is S(S − 1) − (S − 1) = (S − 1)². Note that, since the probabilities in each
row must sum to 1, each row contributes only S − 1 free parameters.
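Given a matrix of transition counts n_{rs}, the statistic and its p-value take only a few lines. A sketch, where the 2-state count matrix is an illustrative choice rather than the DNA data:

```python
import numpy as np
from scipy.stats import chi2

def lrt_zeroth_vs_first(counts):
    """Likelihood ratio test of a zeroth- vs. first-order Markov chain.

    counts[r, s] = number of observed transitions r -> s.
    Returns (Lambda_n, p-value) with df = (S - 1)^2.
    """
    counts = np.asarray(counts, dtype=float)
    S = counts.shape[0]
    n_row = counts.sum(axis=1, keepdims=True)  # n_r.
    n_col = counts.sum(axis=0, keepdims=True)  # n_.s
    n_tot = counts.sum()                       # n_..
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = counts * np.log(counts * n_tot / (n_row * n_col))
    lam = 2 * np.nansum(terms)                 # cells with n_rs = 0 contribute 0
    return lam, chi2.sf(lam, df=(S - 1) ** 2)

counts = np.array([[30, 10],
                   [12, 28]])                  # illustrative 2-state counts
print(lrt_zeroth_vs_first(counts))
```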
Example. DNA Sequence Data
Computing the likelihood ratio statistic for the independence (zeroth-order) model against
the first-order model, Λ_n is asymptotically χ²_9 under H_0; the observed value of the
statistic is t_n = 64.45, with p-value p < .00001. This gives strong evidence against the
independence model.
Figure 7: Fit of Markov Chains to the DNA Data
Note that Λ_n is asymptotically equivalent to the Pearson chi-squared statistic
\sum_i (O_i - E_i)^2 / E_i, where the sum is taken over all the cells in Figure 2b (16 in total),
the O_i are the observed counts in those cells, and the E_i denote the expected counts under
H_0: "the zeroth-order model is correct". The E_i are calculated using the empirical
proportions p̂_s = n_{·s}/n_{··} and the total counts per row in Figure 2b. Since the
standardized residuals (O_i − E_i)/E_i^{1/2} are asymptotically normally distributed under
H_0, they can be used for model checking. From the Q-Q plots in Figure 7 we can also
conclude that the first-order model (right) fits better than the zeroth-order model (left).
In particular, while the left plot shows that the zeroth-order and first-order models give
very different fits, the Q-Q plot on the right-hand side shows no huge discrepancies between
the fits of the first- and second-order models. In that case the O_i denote the counts in the
cells of Figure 8 (64 in total) and the expected counts are computed under H_0: "the
first-order model is correct", using the empirical proportions p̂_{rs} = n_{rs}/n_{r·} and the
total counts of each row of Figure 8.
3.3 Higher Order Models
One can extend the idea of a first-order Markov chain to chains of order m, where the
probability of a transition into s depends on the m previous states:
\[
\begin{aligned}
P(X_j = s \mid X_0 = s_0, \ldots, X_{j-1} = s_{j-1})
&= P(X_j = s \mid X_{j-m} = s_{j-m}, \ldots, X_{j-1} = s_{j-1}) \\
&= p_{s_{j-m} s_{j-m+1} \ldots s_{j-1} s}.
\end{aligned}
\]
Thus the 'current' state X_j depends not only on X_{j−1} but on X_{j−m}, X_{j−m+1}, . . . , X_{j−1}.
When m = 1 we have S(S − 1) parameters, since we had S vectors of parameters (p_{r1}, . . . , p_{rS}),
each with S − 1 free entries. In general there are S^m such vectors, which leads to S^m(S − 1)
parameters. Hence the dimension of the parameter increases exponentially fast with the order
of the Markov chain. If we consider a second-order model for the DNA data example, i.e. a
model where the 'current' state depends on the previous two states, we get the observed
frequencies of Figure 8; see the sketch after this paragraph for how such a table is built.
Figure 8: Second-order Markov Chain Model
The proportions of transitions described by the 'frequencies for third base' in Figure 8
need to sum to 1, so the second-order Markov model has S − 1 = 3 free parameters in each
row; there are S^m(S − 1) = 16 × 3 = 48 unknown parameters in total. A third-order model
would have S^m = 4³ = 64 rows, hence S^m(S − 1) = 64 × 3 = 192 parameters.
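In practice an order-m chain can be fitted by treating each m-tuple of preceding states as the row index of an enlarged transition table, exactly as in Figure 8. A sketch with hypothetical data (the function and variable names are ours):

```python
import numpy as np
from itertools import product

def higher_order_counts(x, n_states, m):
    """Transition counts for an order-m chain: one row per m-tuple of states."""
    rows = {t: i for i, t in enumerate(product(range(n_states), repeat=m))}
    counts = np.zeros((n_states ** m, n_states))
    for j in range(m, len(x)):
        counts[rows[tuple(x[j - m:j])], x[j]] += 1  # (m previous states) -> x[j]
    return counts

x = [0, 1, 3, 2, 0, 1, 1, 3, 2, 0, 2, 1, 3, 0]    # toy sequence, 4 states
counts = higher_order_counts(x, n_states=4, m=2)  # 16 x 4 table, like Figure 8
p_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
```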
Example. Model Selection (DNA Sequence Data)
The likelihood ratio statistic for testing first-order vs. second-order dependence leads to
P(χ²_36 > 55.2) ≈ 0.02, and the likelihood ratio statistic for testing second-order vs.
third-order dependence leads to P(χ²_144 > 150.3) ≈ 0.34. Recall that there is strong
evidence (p < .00001) for the first-order dependence model compared to the independence
(zeroth-order) model; the evidence for the second- compared to the first-order model is
weaker, and there is no suggestion of third-order dependence.
Another way to perform model selection is to compare AIC values. The Akaike Information
Criterion (AIC) is a general likelihood-based model selection criterion. For a parametric
model it can be defined as
\[
\mathrm{AIC}(\hat{\theta}_k) = -2\,\ell_k(\hat{\theta}_k) + 2k,
\]
where θ_k is a k-dimensional parameter, θ̂_k is the MLE of θ_k and ℓ_k(θ̂_k) denotes the log-
likelihood evaluated at the MLE. Using the AIC as the model selection criterion, we choose
the model with minimum AIC.² The AIC values for the zeroth-, first-, second- and third-order
models are 4122.9, 4076.0, 4092.8 and 4230.5. This suggests that m = 1 is the best model.
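The comparison itself is mechanical once the maximized log-likelihoods are available. In the sketch below, the log-likelihood values are the ones implied by the quoted AICs and parameter counts, i.e. ℓ = −(AIC − 2k)/2; they were not computed independently:

```python
# AIC comparison across Markov chain orders for the DNA data (S = 4).
# Parameter counts S^m (S - 1) for orders m = 0, 1, 2, 3.
n_params = {0: 3, 1: 12, 2: 48, 3: 192}
# Log-likelihoods implied by the AIC values quoted in the notes.
loglik = {0: -2058.45, 1: -2026.0, 2: -1998.4, 3: -1923.25}

aic = {m: -2 * loglik[m] + 2 * n_params[m] for m in n_params}
best = min(aic, key=aic.get)
print(aic)   # {0: 4122.9, 1: 4076.0, 2: 4092.8, 3: 4230.5}
print(best)  # 1 -> the first-order model minimizes AIC
```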
4 Final Remarks
The general ideas developed here can be extended in many ways. Major examples include:
• Markov Chain Monte Carlo: an extremely useful method for running simulations and
Bayesian inference.
• Continuous-time models: extend the Markov property to infinitesimal intervals [t, t + δ).
Discrete-time Markov chains miss the precise lengths of time spent in each state before
a transition occurs; continuous-time models accommodate such event times in a more
realistic way.
• Hidden Markov Models: the distribution that generates an observation depends on the
state of an underlying, unobserved Markov process. In the snow fall example, if the
observed states S = {'snow', 'no snow'} also depend on an unobserved atmospheric
pressure state H = {'low', 'high'}, the data can be modeled with a Hidden Markov
Model.
² See Chapter 4.7 of Statistical Models (Davison, 2003) for more details.
Appendix A Eigenvalues and Eigenvectors
Given a square matrix A of dimension n × n, the eigenvalues are defined as the roots of the
characteristic equation
\[
|A - \lambda I_n| = 0.
\]
Let λ be an eigenvalue of A. An eigenvector x associated with the eigenvalue λ satisfies
\[
A x = \lambda x.
\]
In general, eigenvectors are normalized so that x^⊤ x = 1. The multiplicity of an eigenvalue
corresponds to the number of times it is a root of the characteristic equation.
• A and A> have the same eigenvalues, but not necessarily the same eigenvectors.
• If A is non-singular and λ is an eigenvalue of A, then 1/λ is an eigenvalue of A⁻¹.
• If A is a matrix n × n and G a non-singular matrix n × n, then A and G−1 AG have
the same eigenvalues.
• A singular matrix has at least one eigenvalue equal to zero.
• An idempotent matrix has only eigenvalues equal to 0 or 1.
• An orthogonal matrix has only eigenvalues of modulus 1; its real eigenvalues are equal to 1 or -1.
• Given A a real symmetric n × n matrix:
– All its eigenvalues are real.
– The eigenvectors are orthogonal.
– A is positive definite (positive semi-definite) if and only if its eigenvalues are
positive (non-negative).
– There exists an orthogonal matrix P such that
\[
P A P^\top = \Lambda =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},
\]
with P^⊤ Λ P = A.
– If A is in addition positive definite, setting Q = Λ^{-1/2} P we obtain
\[
Q A Q^\top = I.
\]
• Given A an n × n real symmetric, positive definite matrix:
– |A| = \prod_{i=1}^{n} \lambda_i
– \mathrm{tr}(A) = \sum_i a_{ii} = \sum_i \lambda_i
– The eigenvalues of I − A are equal to 1 − λ_i.
• Given A and B two real symmetric n × n matrices, with B positive definite:
– The roots of |A − λB| = 0 are the same as the eigenvalues of (T^⊤)⁻¹ A T⁻¹, where B = T^⊤ T.
– There exists a matrix R such that
\[
R A R^\top = \Lambda \quad \text{and} \quad R B R^\top = I_n.
\]
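These spectral decomposition identities are easy to verify numerically. A small sketch using numpy, with an arbitrary symmetric positive definite matrix built for the purpose:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = B @ B.T + 4 * np.eye(4)   # symmetric positive definite by construction

lam, V = np.linalg.eigh(A)    # eigendecomposition for symmetric matrices
P = V.T                       # rows of P are orthonormal eigenvectors
assert np.allclose(P @ A @ P.T, np.diag(lam))      # P A P^T = Lambda
assert np.allclose(P.T @ np.diag(lam) @ P, A)      # P^T Lambda P = A

Q = np.diag(lam ** -0.5) @ P                       # Q = Lambda^(-1/2) P
assert np.allclose(Q @ A @ Q.T, np.eye(4))         # Q A Q^T = I
print(np.allclose(np.prod(lam), np.linalg.det(A))) # |A| = product of eigenvalues
print(np.allclose(lam.sum(), np.trace(A)))         # tr(A) = sum of eigenvalues
```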