Introduction to
Bayesian Time Series Econometrics
Joshua Chan
Australian National University
5 July 2013
Instructor
Name: Joshua Chan (Josh or Dr. Chan)
Website:
[Link]
Email:
[Link]@[Link]
Overview of the Workshop
Purpose
◦ prepare you for DSGE lectures
◦ go over basic Bayesian time series models and computations
◦ include MATLAB tutorials
Topics covered
◦ linear regression, autoregressive moving average models
◦ vector autoregressive (VAR) models
◦ state space models
◦ Bayesian model comparison
Computation techniques
◦ Monte Carlo simulation
◦ Markov chain Monte Carlo: Gibbs sampling,
Metropolis-Hastings algorithm
Logistics
◦ 8 hours of lecture (9am to 1pm, two days)
◦ 6 hours of computer lab/exercises (2pm to 5pm, two days)
Plan for Today
Four 50-minute Sessions
◦ Intro to Bayesian Econometrics
◦ 1-parameter normal model (Monte Carlo integration)
◦ 2-parameter normal model (Gibbs sampler)
◦ linear regression with iid errors
◦ linear regression with moving average errors (MH algorithm)
Recommended Readings
Textbooks
◦ Koop (2003) Bayesian Econometrics, Wiley
◦ Koop, Poirier and Tobias (2007) Bayesian Econometric
Methods, Cambridge University Press
◦ Geweke (2005) Contemporary Bayesian Econometrics and
Statistics, Wiley
Survey papers
◦ Del Negro and Schorfheide (2011) “Bayesian
Macroeconometrics,” in Geweke, Koop and van Dijk (eds):
The Oxford Handbook of Bayesian Econometrics, Oxford
University Press
◦ Koop and Korobilis (2010) “Bayesian Multivariate Time Series
Methods for Empirical Macroeconomics,” Foundations and
Trends in Econometrics, 3(4): 267-358
Introduction
Frequentist inference
◦ frequentist interpretation of probabilities—an event’s
probability as the limit of its relative frequency in a large
number of trials
◦ θ is thought to be an unknown, but fixed quantity
◦ knowledge of θ is obtained from an observed sample
y1 , . . . , yn (only)
◦ usually summarized by the likelihood function
L(θ; y) = f (y | θ)
Bayesian inference
◦ probability is a (subjective) measure of the “degree of belief”
of the individual assessing the event
◦ θ is a random quantity and has a distribution f (θ) called the
prior distribution
◦ knowledge of θ comes from an observed sample y1 , . . . , yn and
the prior distribution
◦ the goal of the analysis is to obtain the posterior distribution
f (θ | y) using Bayes’ Theorem
f(θ | y) = f(θ)f(y | θ) / m(y) ∝ f(θ)f(y | θ),

where m(y) = ∫ f(θ)f(y | θ) dθ is called the marginal likelihood
◦ then compute E(θ | y), Cov(θ | y), P(θj > 0)
1-parameter Normal Model
Suppose we take n measurements y1 , . . . , yn of an unknown
quantity µ where the magnitude of measurement error is known
From a small-scale study µ is estimated to be around µ0 (but with
substantial uncertainty)
One model is
(yi | µ) ∼ N(µ, σ²), i = 1, . . . , n,
µ ∼ N(µ0, σ0²),

where σ², µ0, and σ0² are known

Recall that for x ∼ N(a, b²), the density of x is

f(x) = (2πb²)^(−1/2) e^(−(x−a)²/(2b²)) ∝ exp( −(1/2)( x²/b² − 2x a/b² ) )
What do we learn from the new measurements?
All the information is summarized by the posterior distribution
f(µ | y) = f(µ)f(y | µ) / m(y),

where y = (y1, . . . , yn)′
Computation: Overview
Immediate goal: derive f (µ | y)
Ignore any constants in f (µ | y) not involving µ
Then try to figure out what density f (µ | y) is
Computation: Details
Now do the computations...
f(µ | y) ∝ f(µ)f(y | µ) ∝ e^(−(µ−µ0)²/(2σ0²)) ∏_{i=1}^n e^(−(yi−µ)²/(2σ²))

∝ exp( −(1/2)[ (µ² − 2µµ0)/σ0² + (nµ² − 2µ Σ yi)/σ² ] )

∝ exp( −(1/2)[ (1/σ0² + n/σ²) µ² − 2µ( µ0/σ0² + nȳ/σ² ) ] )

The exponent is quadratic in µ; hence, (µ | y) ∼ N(µ̂, Dµ)
Computation: Details
Now compare

f(µ | y) ∝ exp( −(1/2)[ (1/σ0² + n/σ²) µ² − 2µ( µ0/σ0² + nȳ/σ² ) ] )

with

f(µ | y) ∝ exp( −(1/2)[ (1/Dµ) µ² − 2µ µ̂/Dµ ] )

We have

Dµ = (1/σ0² + n/σ²)^(−1),    µ̂ = Dµ( µ0/σ0² + nȳ/σ² )
Posterior Mean Interpretation
In particular, the posterior mean is

E(µ | y) = µ̂ = (1/σ0² + n/σ²)^(−1) ( µ0/σ0² + nȳ/σ² )

         = [ (1/σ0²) / (1/σ0² + n/σ²) ] µ0 + [ (1/(σ²/n)) / (1/σ0² + n/σ²) ] ȳ,

a weighted average of the prior mean and the MLE
Numerical Examples
Suppose µ0 = 10, n = 3, ȳ = 20 and σ² = 1

Case 1: σ0² = 0.1; Case 2: σ0² = 10
[Figure: prior and posterior densities of µ under Case 1 (left panel) and Case 2 (right panel)]
Quantities of Interest
Since (µ | y) ∼ N(µ̂, Dµ), we can easily compute Var(µ | y), P(µ > 0 | y), a 95% credible set for µ, etc.
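For instance, a minimal MATLAB sketch (assuming muhat and Dmu hold the posterior mean and variance computed as in the MATLAB code below, and that the Statistics Toolbox functions normcdf and norminv are available):

post_sd = sqrt(Dmu);                          % posterior standard deviation
pr_pos  = 1 - normcdf(0,muhat,post_sd);       % P(mu > 0 | y)
ci95    = norminv([.025 .975],muhat,post_sd); % 95% credible set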
But suppose we wish to obtain E(g(µ) | y) < ∞ for some function g

This can be estimated using Monte Carlo integration
Monte Carlo Integration
First note that

E(g(µ) | y) = ∫ g(µ)f(µ | y) dµ

(cannot be computed analytically in general)

Monte Carlo integration: generate R draws µ(1), . . . , µ(R) from f(µ | y), and compute

ĝ = (1/R) Σ_{r=1}^R g(µ(r))

By the weak law of large numbers, ĝ converges in probability to E(g(µ) | y) < ∞
Numerical Examples (Continued)
Consider Case 2. Then µ̂ = 19.67 and Dµ = 0.3226

Suppose g(x) = log |x|

Recall that if Z ∼ N(0, 1), then Y = a + bZ follows the N(a, b²) distribution
MATLAB Code
% posterior mean and variance
mu0 = 10; sig02 = 10; n = 3; ybar = 20; sig2 = 1;
Dmu = 1/(1/sig02 + n/sig2);
muhat = Dmu*(mu0/sig02 + n*ybar/sig2);
% Monte Carlo integration: draw from N(muhat,Dmu) and average g
R = 10000;
mu = muhat + sqrt(Dmu)*randn(R,1);
ghat = mean(log(abs(mu)));
2-parameter Normal Model
Now we extend the previous model so that σ 2 is unknown
The model is again
(yi | µ) ∼ N(µ, σ 2 ), i = 1, . . . , n,
where both µ and σ 2 are unknown
The same prior for µ: N(µ0, σ0²)

Need a prior for σ²: InvGamma(ν0, S0)
Inverse-Gamma Distribution
A random variable Z is said to have an inverse-gamma distribution
with shape parameter α > 0 and scale parameter β > 0 if its pdf is
given by
f(z; α, β) = (β^α / Γ(α)) z^(−(α+1)) e^(−β/z)

We write Z ∼ InvGamma(α, β)

Moments: EZ = β/(α − 1) for α > 1, Var(Z) = β²/((α − 1)²(α − 2)) for α > 2
Sampling from the Inverse-Gamma Distribution
Sample X ∼ Gamma(α, β) and set Z = 1/X

Exercise: Show that the pdf of Z is

f(z; α, β) = (β^α / Γ(α)) z^(−(α+1)) e^(−β/z),

where the pdf of X is

f(x; α, β) = (β^α / Γ(α)) x^(α−1) e^(−βx)
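In MATLAB (Statistics Toolbox), this is a one-liner with gamrnd; note that gamrnd is parameterized by shape and scale, so a Gamma(α, β) draw with rate β is gamrnd(alpha,1/beta). This is exactly the construction used in the Gibbs samplers below. A minimal sketch with illustrative values of alpha and beta, checking the sample mean against β/(α − 1):

% draw R samples from InvGamma(alpha,beta) via the reciprocal of a gamma draw
alpha = 3; beta = .5; R = 10000;        % illustrative values
Z = 1./gamrnd(alpha,1/beta,R,1);
[mean(Z) beta/(alpha-1)]                % sample mean vs analytical mean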
The Joint Posterior Distribution
By Bayes’ Theorem, the joint posterior is given by
f(µ, σ² | y) ∝ f(µ, σ², y)
            ∝ f(µ)f(σ²)f(y | µ, σ²)
            ∝ e^(−(µ−µ0)²/(2σ0²)) (σ²)^(−(ν0+1)) e^(−S0/σ²) ∏_{i=1}^n (σ²)^(−1/2) e^(−(yi−µ)²/(2σ²))

We wish to obtain E(µ | y), Var(σ² | y), or the quantile µq such that P(µ > µq | y) = q
But those quantities cannot be obtained analytically
Monte Carlo Simulation
Idea: Use Monte Carlo methods. Specifically, obtain draws from
f (µ, σ 2 | y), say, µ(1) , . . . , µ(R) , σ 2(1) , . . . , σ 2(R)
Then compute
(1/R) Σ_{r=1}^R µ(r),    (1/R) Σ_{r=1}^R (σ2(r) − σ̄²)²,    µ(qR)
One method to do that is the Gibbs Sampler
Gibbs Sampler: Overview
Suppose we want to sample from f(Θ) = f(θ1, . . . , θn) (the target distribution)

The Gibbs sampler constructs a Markov chain Θ(1), Θ(2), . . . whose limiting distribution is f

It uses the full conditional distributions f(θi | θ1, . . . , θi−1, θi+1, . . . , θn) as the transition kernels
Gibbs Sampler: Details
Starting from an initial state Θ(0), repeat the following steps from r = 1 to R:

1. Given the current state Θ(r−1) = Θ, generate Y = (Y1, . . . , Yn) as follows:
   1.1 Draw Y1 ∼ f(y1 | θ2, . . . , θn).
   1.2 Draw Yi ∼ f(yi | Y1, . . . , Yi−1, θi+1, . . . , θn), i = 2, . . . , n − 1.
   1.3 Draw Yn ∼ f(yn | Y1, . . . , Yn−1).
2. Set Θ(r) = Y.
Common Misconceptions
The Markov chain Θ(1), Θ(2), . . . does not converge to a fixed point in R^k
Rather, the distribution of Θ(r ) converges to the target distribution
An example:
[Figure: trace plot of the Markov chain over 10,000 iterations; the draws fluctuate around 3 rather than settling at a single point]
Histograms of the previous Markov chain:
[Figure: two histograms of the draws from the chain, both concentrated between roughly 2.9 and 3.1]
2-parameter Normal Model (Continued)
To construct a Gibbs sampler to sample from f (µ, σ 2 | y), we need
to derive f (µ | y, σ 2 ) and f (σ 2 | y, µ)
(1) f(σ² | y, µ):

f(σ² | y, µ) ∝ f(µ, σ² | y)
            ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−n/2) e^(−(1/(2σ²)) Σ_{i=1}^n (yi−µ)²)
            ∝ (σ²)^(−(ν0+n/2+1)) e^(−[S0 + Σ_{i=1}^n (yi−µ)²/2]/σ²)
A known distribution?
Compare
f(σ² | y, µ) ∝ (σ²)^(−(ν0+n/2+1)) e^(−[S0 + Σ_{i=1}^n (yi−µ)²/2]/σ²)

with

f(σ²; α, β) ∝ (σ²)^(−(α+1)) e^(−β/σ²)

Hence,

(σ² | y, µ) ∼ InvGamma( ν0 + n/2, S0 + Σ_{i=1}^n (yi − µ)²/2 )
When σ² is given (i.e., known), the model reduces to the 1-parameter normal model
(2) f(µ | y, σ²):

f(µ | y, σ²) ∝ f(µ)f(y | µ, σ²)
            ∝ exp( −(1/2)[ (1/σ0² + n/σ²) µ² − 2µ( µ0/σ0² + nȳ/σ² ) ] )

Hence, (µ | y, σ²) ∼ N(µ̂, Dµ), where

Dµ = (1/σ0² + n/σ²)^(−1),    µ̂ = Dµ( µ0/σ0² + nȳ/σ² )
Gibbs Sampler for the 2-parameter Normal Model
Pick some initial values µ(0) = a0 and σ 2(0) = b0 > 0. Then,
repeat the following steps from r = 1 to R:
1. Draw σ 2(r ) ∼ f (σ 2 | y, µ(r −1) ) (inverse-gamma).
2. Draw µ(r ) ∼ f (µ | y, σ 2(r ) ) (normal).
Usually discard the first m draws as “burn-in”
Then, given µ(1), . . . , µ(R) and σ2(1), . . . , σ2(R), one can compute, e.g.,

(1/R) Σ_{r=1}^R µ(r)

as an estimate of E(µ | y)
MATLAB Code
% norm_2para.m
nloop = 10000; burnin = 1000;
n = 500; mu = 3; sig2 = .5;
y = mu + sqrt(sig2)*randn(n,1);
% prior
mu0 = 0; sig20 = 100;
nu0 = 3; S0 = .5;
% initialize the Markov chain
mu = 0; sig2 = 1;
store_theta = zeros(nloop,2);
MATLAB Code
for loop=1:nloop
% sample mu
Dmu = 1/(1/sig20 + n/sig2);
muhat = Dmu*(mu0/sig20 + sum(y)/sig2);
mu = muhat + sqrt(Dmu)*randn;
% sample sig2
sig2 = 1/gamrnd(nu0+n/2,1/(S0+sum((y-mu).^2)/2));
% store the parameters
store_theta(loop,:) = [mu sig2];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
Linear Regression Model
Consider the linear regression model:
yt = β0 + xt,1 β1 + · · · + xt,k βk + εt,    εt ∼ N(0, σ²),

where xt = (1, xt,1, . . . , xt,k)′ is a vector of regressors and β = (β0, β1, . . . , βk)′ is the associated vector of regression coefficients

(The 2-parameter normal model is a special case with only an intercept and β0 = µ)
Linear Regression Model: Matrix Form
Hence,
[ y1 ]   [ 1  x1,1  · · ·  x1,k ] [ β0 ]   [ ε1 ]
[ y2 ]   [ 1  x2,1  · · ·  x2,k ] [ β1 ]   [ ε2 ]
[ ⋮  ] = [ ⋮    ⋮           ⋮  ] [ ⋮  ] + [ ⋮  ]
[ yT ]   [ 1  xT,1  · · ·  xT,k ] [ βk ]   [ εT ]

Or equivalently,

y = Xβ + ε,    ε ∼ N(0, σ²IT),

where IT is the T × T identity matrix
Two Useful Results
(1) An affine transformation (linear transformation followed by a
translation) of normal random variables is a (multivariate) normal
random variable
(2) Suppose U has an expectation vector µU and covariance
matrix ΣU . Let V = AU. Then
µV = AµU , ΣV = AΣU A′ .
Since y is an affine transformation of ε ∼ N(0, σ²IT), y has a normal distribution
Moreover, given β and σ 2 , the expectation and covariance matrix
of y are Xβ and σ 2 IT
Therefore, we have
(y | β, σ 2 ) ∼ N(Xβ, σ 2 IT )
Linear Regression Model: Likelihood
Since
(y | β, σ 2 ) ∼ N(Xβ, σ 2 IT ),
the likelihood function is given by:
f(y | β, σ²) = |2πσ²IT|^(−1/2) exp( −(1/2)(y − Xβ)′(σ²IT)^(−1)(y − Xβ) )
            = (2πσ²)^(−T/2) exp( −(1/(2σ²))(y − Xβ)′(y − Xβ) )

Recall that for an n × n matrix A and c ∈ R, |cA| = c^n |A|
Priors and Gibbs Sampler
Independent priors for β and σ 2 :
β ∼ N(β 0 , Vβ ), σ 2 ∼ InvGamma(ν0 , S0 )
Use Gibbs sampler to estimate the model
We need to derive (1) (σ 2 | y, β) and (2) (β | y, σ 2 )
Sample (σ² | y, β)

(1) f(σ² | y, β):

f(σ² | y, β) ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−T/2) exp( −(1/(2σ²))(y − Xβ)′(y − Xβ) )
            ∝ (σ²)^(−(ν0+T/2+1)) exp( −[S0 + (y − Xβ)′(y − Xβ)/2]/σ² )

Hence,

(σ² | y, β) ∼ InvGamma( ν0 + T/2, S0 + (y − Xβ)′(y − Xβ)/2 )

Recall (AB)′ = B′A′, and hence

(y − Xβ)′(y − Xβ) = y′y − y′Xβ − β′X′y + β′X′Xβ
                  = β′X′Xβ − 2β′X′y + y′y,

since β′X′y is a scalar and hence β′X′y = (β′X′y)′ = y′Xβ
Sample (β | y, σ²)

(2) f(β | y, σ²):

f(β | y, σ²) ∝ f(β)f(y | β, σ²)
            ∝ exp( −(1/2)(β − β0)′Vβ^(−1)(β − β0) ) exp( −(1/(2σ²))(y − Xβ)′(y − Xβ) )
            ∝ exp( −(1/2)(β′Vβ^(−1)β − 2β′Vβ^(−1)β0) ) exp( −(1/(2σ²))(β′X′Xβ − 2β′X′y) )
            ∝ exp( −(1/2)[ β′( Vβ^(−1) + (1/σ²)X′X )β − 2β′( Vβ^(−1)β0 + (1/σ²)X′y ) ] )

The exponent is quadratic in β. Hence,

(β | y, σ²) ∼ N(β̂, Dβ)

Now compare

f(β | y, σ²) ∝ exp( −(1/2)[ β′( Vβ^(−1) + (1/σ²)X′X )β − 2β′( Vβ^(−1)β0 + (1/σ²)X′y ) ] )

with

f(β | y, σ²) ∝ exp( −(1/2)[ β′Dβ^(−1)β − 2β′Dβ^(−1)β̂ ] )

We have

Dβ = ( Vβ^(−1) + (1/σ²)X′X )^(−1),    β̂ = Dβ( Vβ^(−1)β0 + (1/σ²)X′y )
Sampling from the Multivariate Normal Distribution
To generate R independent draws from N(µ, Σ) of dimension n,
carry out the following steps:
1. Compute the lower Cholesky factorization Σ = BB′ .
2. Generate Z = (Z1 , . . . , Zn )′ by drawing Z1 , . . . , Zn ∼ N(0, 1).
3. Output U = µ + BZ.
4. Repeat Steps 2 and 3 independently R times.
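For instance, a minimal MATLAB sketch of Steps 1-3 (assuming the mean vector and covariance matrix are stored in mu and Sig):

B = chol(Sig,'lower');            % lower Cholesky factor, Sig = B*B'
U = mu + B*randn(length(mu),1);   % one draw from N(mu,Sig)

This is exactly the construction used in the Gibbs sampler code below (chol(Dbeta,'lower')).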
Gibbs Sampler for the Linear Regression Model
Pick some initial values β (0) = a0 and σ 2(0) = b0 > 0. Then,
repeat the following steps from r = 1 to R:
1. Draw σ 2(r ) ∼ f (σ 2 | y, β (r −1) ) (inverse-gamma).
2. Draw β (r ) ∼ f (β | y, σ 2(r ) ) (multivariate normal).
MATLAB Code
% linreg.m
nloop = 10000; burnin = 1000;
n = 500; beta = [1 5]'; sig2 = .5;
X = [ones(n,1) 1+randn(n,1)];
y = X*beta + sqrt(sig2)*randn(n,1);
% prior
beta0 = [0 0]'; invVbeta0 = speye(2)/100;
nu0 = 3; S0 = .5;
% initialize the Markov chain
beta = (X'*X)\(X'*y); sig2 = 1;
store_theta = zeros(nloop,3);
MATLAB Code
for loop=1:nloop
% sample beta
Dbeta = (invVbeta0 + X'*X/sig2)\speye(2);
betahat = Dbeta*(invVbeta0*beta0 + X'*y/sig2);
C = chol(Dbeta,'lower');
beta = betahat + C*randn(2,1);
% sample sig2
e = y-X*beta;
sig2 = 1/gamrnd(nu0+n/2,1/(S0+e'*e/2));
% store the parameters
store_theta(loop,:) = [beta' sig2];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);
Linear Regression Model with Moving Average Errors
Consider again the linear regression model:
yt = xt′β + εt,

but with MA(1) errors:

εt = ut + ψut−1,

where u0 = 0 and u1, . . . , uT ∼ N(0, σ²)

(The previous linear regression model is a special case with ψ = 0)
Priors and Estimation
Independent priors for β, σ², and ψ:

β ∼ N(β0, Vβ),    σ² ∼ InvGamma(ν0, S0),    ψ ∼ U(−1, 1)
To construct a Gibbs sampler, we need to derive
1. (σ 2 | y, β, ψ),
2. (β | y, σ 2 , ψ),
3. (ψ | y, β, σ 2 ).
Turns out Steps 1 and 2 can be done as before, but f (ψ | y, β, σ 2 )
is not a standard density
Estimation: Overview
We will first derive f (σ 2 | y, β, ψ) (inverse-gamma) and
f (β | y, σ 2 , ψ) (multivariate normal)
Introduce the Metropolis-Hastings algorithm to handle Step 3
(The resulting sampler is sometimes called
Metropolis-within-Gibbs)
Likelihood: Derivation
To derive the likelihood function, rewrite the MA(1) model in matrix form (Chan, 2013):

ε = Hψ u,

where u = (u1, . . . , uT)′ ∼ N(0, σ²IT), and

       [ 1  0  0  · · ·  0 ]
       [ ψ  1  0  · · ·  0 ]
Hψ  =  [ 0  ψ  1  · · ·  0 ]
       [ ⋮        ⋱   ⋱    ]
       [ 0  0  · · ·  ψ  1 ]

is a sparse T × T matrix that contains only 2T − 1 non-zero elements

Hence,

y = Xβ + ε = Xβ + Hψ u,    u ∼ N(0, σ²IT)

By a change of variable, we have

(y | β, σ², ψ) ∼ N(Xβ, σ²Hψ Hψ′)

Note that Hψ is a lower triangular matrix with ones on the main diagonal; hence |Hψ| = 1
Likelihood: Computation
The likelihood function can be written as

f(y | β, σ², ψ) = |2πσ²Hψ Hψ′|^(−1/2) exp( −(1/2)(y − Xβ)′(σ²Hψ Hψ′)^(−1)(y − Xβ) )
               = (2πσ²)^(−T/2) exp( −(1/(2σ²))(y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ) )

The likelihood function can be evaluated quickly

To compute (Hψ Hψ′)^(−1)(y − Xβ), one need not obtain (Hψ Hψ′)^(−1), which is a very time-consuming operation

Instead, solve the system

(Hψ Hψ′)x = y − Xβ

for x

This can be done in MATLAB using the backslash operator \

To evaluate the likelihood function for a general MA(q) model, one only needs to redefine the matrix Hψ appropriately
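For example, a minimal sketch of the likelihood evaluation (assuming y, X, beta, sig2 and psi are already in the workspace; Hψ is built as a sparse matrix exactly as in ppsi.m below):

T = length(y);
Hpsi = speye(T) + psi*sparse(2:T,1:T-1,ones(T-1,1),T,T);  % sparse H_psi
e = y - X*beta;
x = (Hpsi*Hpsi')\e;                           % solves (Hpsi*Hpsi')x = y - X*beta
llike = -T/2*log(2*pi*sig2) - .5/sig2*(e'*x); % log-likelihood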
Sample (σ² | y, β, ψ)

(1) f(σ² | y, β, ψ):

f(σ² | y, β, ψ) ∝ (σ²)^(−(ν0+1)) e^(−S0/σ²) (σ²)^(−T/2) exp( −(1/(2σ²))(y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ) )
               ∝ (σ²)^(−(ν0+T/2+1)) exp( −[S0 + (y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ)/2]/σ² )

Hence,

(σ² | y, β, ψ) ∼ InvGamma( ν0 + T/2, S ),

where S = S0 + (y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ)/2
Sample (β | y, σ², ψ)

(2) f(β | y, σ², ψ):

f(β | y, σ², ψ) ∝ exp( −(1/2)(β − β0)′Vβ^(−1)(β − β0) ) exp( −(1/(2σ²))(y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ) )
               ∝ exp( −(1/2)(β′Vβ^(−1)β − 2β′Vβ^(−1)β0) ) exp( −(1/(2σ²))(β′X′(Hψ Hψ′)^(−1)Xβ − 2β′X′(Hψ Hψ′)^(−1)y) )
               ∝ exp( −(1/2)[ β′( Vβ^(−1) + (1/σ²)X′(Hψ Hψ′)^(−1)X )β − 2β′( Vβ^(−1)β0 + (1/σ²)X′(Hψ Hψ′)^(−1)y ) ] )

Again, the exponent is quadratic in β, and we have

(β | y, σ², ψ) ∼ N(β̂, Dβ),

where

Dβ = ( Vβ^(−1) + (1/σ²)X′(Hψ Hψ′)^(−1)X )^(−1),    β̂ = Dβ( Vβ^(−1)β0 + (1/σ²)X′(Hψ Hψ′)^(−1)y )
Sample (ψ | y, β, σ²)

(3) f(ψ | y, β, σ²):

f(ψ | y, β, σ²) ∝ f(ψ)f(y | β, σ², ψ)
               ∝ exp( −(1/(2σ²))(y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ) )

with −1 < ψ < 1
The density f (ψ | y, β, σ 2 ) is non-standard
We don’t know how to sample from f (ψ | y, β, σ 2 ), but it can be
evaluated quickly (up to a normalization constant)
Metropolis-Hastings Algorithm: Overview
Suppose ψ is the current draw. Obtain a candidate draw ψᶜ, and decide (probabilistically) whether to accept it or not
The acceptance probability is computed so that the detailed
balance equations are satisfied
As a result, the limiting distribution of the Markov chain
constructed is guaranteed to converge to the target
See Chib and Greenberg (1995)
Metropolis-Hastings Algorithm: Implementation
Two variants: random walk sampler and independent sampler
Random walk sampler:
1. the candidate is obtained via ψᶜ = ψ + v, where v ∼ g for some distribution g (usually g is symmetric around 0)

2. accept ψᶜ with probability

   min{ 1, f(ψᶜ | y, β, σ²) / f(ψ | y, β, σ²) }

3. if ψᶜ is not accepted, stay at ψ
Some comments:
◦ easy to implement
◦ g is typically chosen to be a normal or t distribution with 0 mean
◦ might need some fine-tuning, but works for low-dimensional
problems
◦ could be very inefficient for high-dimensional problems
Independent sampler:
1. the candidate is obtained via ψᶜ ∼ q(ψ) (called the proposal density)

2. accept ψᶜ with probability

   min{ 1, [ f(ψᶜ | y, β, σ²) q(ψ) ] / [ f(ψ | y, β, σ²) q(ψᶜ) ] }

3. if ψᶜ is not accepted, stay at ψ
Some comments:
◦ requires more from user/more difficult to implement
◦ choose q(ψ) as “close” as possible to f (ψ | y, β, σ 2 )
◦ (if q(ψ) = f (ψ | y, β, σ 2 ), then acceptance prob. = 1 and it
reduces to the Gibbs sampler)
◦ could be very efficient or very inefficient (depends on the
choice of q(ψ))
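As an illustrative sketch of the independent sampler's accept/reject step (not what the code below implements, which uses the random walk version): suppose psic is a candidate drawn from some proposal, logq is a hypothetical function handle that evaluates the log proposal density, and ppsi is the log target defined in ppsi.m below.

alp = ppsi(psic,y,X,beta,sig2) - ppsi(psi,y,X,beta,sig2) ...
      + logq(psi) - logq(psic);   % log acceptance ratio
if exp(alp) > rand
    psi = psic;                   % accept the candidate; otherwise stay at psi
end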
Sample (ψ | y, β, σ²)

Recall

f(ψ | y, β, σ²) ∝ exp( −(1/(2σ²))(y − Xβ)′(Hψ Hψ′)^(−1)(y − Xβ) ),    −1 < ψ < 1

Use a random walk sampler with ψᶜ = ψ + v, where v ∼ N(0, 0.1)
MATLAB Code: Evaluating log f (ψ | y, β, σ 2)
% ppsi.m
function llike = ppsi(psi,y,X,beta,sig2)
% log of f(psi | y, beta, sig2), up to an additive constant
n = length(y);
Hpsi = speye(n) + ...
    psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi';
llike = -.5/sig2*(y-X*beta)'*(R\(y-X*beta));
end
MATLAB Code
% linreg_ma1_RW.m
nloop = 11000; burnin = 1000;
randn('seed', 314159);
psi = -.5; beta = [4 .6]'; sig2 = .5;
n = 100; y0 = 10; y = zeros(n,1);
u = sqrt(sig2)*randn(n,1);
for i=1:n
if i==1
y(i) = beta(1) + beta(2)*y0 + u(i);
else
y(i) = beta(1) + beta(2)*y(i-1) ...
    + u(i) + psi*u(i-1);
end
end
X = [ones(n,1) [y0; y(1:end-1)]];
MATLAB Code
% prior
beta0 = [0 0]'; invVbeta0 = 1/100*speye(2);
nu0 = 3; S0 = 1;
% initialize the Markov chain
psi = .1; beta = (X'*X)\(X'*y); sig2 = 1;
store_theta = zeros(nloop,4); count = 0;
MATLAB Code
for loop=1:nloop
% sample beta
Hpsi = speye(n) + ...
psi*sparse(2:n,1:n-1,ones(n-1,1),n,n);
R = Hpsi*Hpsi';
Dbeta = (invVbeta0 + X'*(R\X)/sig2)\speye(2);
betahat = Dbeta*(invVbeta0*beta0+X'*(R\y)/sig2);
C = chol(Dbeta,'lower');
beta = betahat + C*randn(2,1);
% sample sig2
e = y-X*beta;
sig2 = 1/gamrnd(nu0+n/2,1/(S0+e'*(R\e)/2));
MATLAB Code
% sample psi
psic = psi + sqrt(.1)*randn;
if psic<1 && psic >-1
alp = ppsi(psic,y,X,beta,sig2) ...
- ppsi(psi,y,X,beta,sig2);
if exp(alp)>rand
psi = psic;
count = count+1;
end
end
% store the parameters
store_theta(loop,:) = [beta' sig2 psi];
end
store_theta = store_theta(burnin+1:end,:);
thetahat = mean(store_theta);