Bayesian Methods for Macroeconometrics
Frank Schorfheide
Department of Economics, University of Pennsylvania
Preliminaries
• Suppose that the random matrix Φ has density
\[
p(\Phi \,|\, \Sigma, X'X) \propto |\Sigma \otimes (X'X)^{-1}|^{-1/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(\Phi - \hat{\Phi})'X'X(\Phi - \hat{\Phi})\big] \right\} \tag{1}
\]
• Let β = vec(Φ) and β̂ = vec(Φ̂).
• Then
\[
\beta \,|\, \Sigma, X'X \sim N\!\left( \hat{\beta},\; \Sigma \otimes (X'X)^{-1} \right). \tag{2}
\]
• Note: to generate a draw Z from a multivariate N(µ, Σ), decompose Σ = CC', where C is the lower-triangular Cholesky factor. Then let Z = µ + C·N(0, I).
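A minimal sketch of this draw in Python/NumPy (the helper name mvn_draw is illustrative):

    import numpy as np

    def mvn_draw(mu, Sigma, rng):
        # Z = mu + C N(0, I), where Sigma = C C' and C is lower triangular
        C = np.linalg.cholesky(Sigma)
        return mu + C @ rng.standard_normal(len(mu))

    # usage: rng = np.random.default_rng(0); z = mvn_draw(np.zeros(3), np.eye(3), rng)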
• The multivariate generalization of the inverted Gamma distribution is called the inverted Wishart distribution.
• Let Σ be an n × n positive definite random matrix. Σ has the inverted Wishart IW(S, ν) distribution if its density is of the form
\[
p(\Sigma \,|\, S, \nu) \propto |S|^{\nu/2} |\Sigma|^{-(\nu+n+1)/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}[\Sigma^{-1}S] \right\} \tag{3}
\]
• To sample a Σ from an inverted Wishart IW(S, ν) distribution, draw n × 1 vectors Z_1, ..., Z_ν from a multivariate normal N(0, S^{-1}) and let
\[
\Sigma = \left[ \sum_{i=1}^{\nu} Z_i Z_i' \right]^{-1}
\]
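This recipe translates directly into Python/NumPy (iw_draw is an illustrative name; the sum is invertible only if ν ≥ n):

    import numpy as np

    def iw_draw(S, nu, rng):
        # Z_i ~ N(0, S^{-1}) for i = 1, ..., nu; Sigma = [sum_i Z_i Z_i']^{-1}
        n = S.shape[0]
        Z = rng.multivariate_normal(np.zeros(n), np.linalg.inv(S), size=nu)
        return np.linalg.inv(Z.T @ Z)  # requires nu >= n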
• Recall:
\[
p(Y \,|\, \Phi, \Sigma, Y_0) \propto |\Sigma|^{-T/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}[\Sigma^{-1}S] \right\} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(\Phi - \hat{\Phi})'X'X(\Phi - \hat{\Phi})\big] \right\}
\]
• Let's interpret the likelihood as a density for Φ and Σ:
\begin{align*}
p(\Phi, \Sigma \,|\, S, \hat{\Phi}, X'X)
&\propto |\Sigma|^{-T/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}[\Sigma^{-1}S] \right\} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(\Phi - \hat{\Phi})'X'X(\Phi - \hat{\Phi})\big] \right\} \\
&\propto |\Sigma|^{-T/2} |\Sigma \otimes (X'X)^{-1}|^{1/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}[\Sigma^{-1}S] \right\} \\
&\quad \times (2\pi)^{-nk/2} |\Sigma \otimes (X'X)^{-1}|^{-1/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(\Phi - \hat{\Phi})'X'X(\Phi - \hat{\Phi})\big] \right\}
\end{align*}
• We now integrate out Φ (note: $|\Sigma \otimes (X'X)^{-1}|^{1/2} = |\Sigma|^{k/2}|X'X|^{-n/2}$):
\[
p(\Sigma \,|\, S, \hat{\Phi}, X'X) \propto |\Sigma|^{-(T-k)/2} |X'X|^{-n/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}[\Sigma^{-1}S] \right\}
\]
• Hence,
\[
\Sigma \,|\, S, \hat{\Phi}, X'X \sim IW(S,\, T - k - n - 1), \qquad
\Phi \,|\, \Sigma, S, \hat{\Phi}, X'X \sim N\!\left( \hat{\Phi},\, \Sigma \otimes (X'X)^{-1} \right)
\]
(matching the exponent of |Σ| with (3), $-(\nu + n + 1)/2 = -(T - k)/2$ yields $\nu = T - k - n - 1$).
Dummy Observation Priors
• Suppose we have T* dummy observations (Y*, X*).
• The likelihood function for the dummy observations is of the form
\[
p(Y^* \,|\, \Phi, \Sigma) = (2\pi)^{-nT^*/2} |\Sigma|^{-T^*/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(Y^{*\prime}Y^* - \Phi'X^{*\prime}Y^* - Y^{*\prime}X^*\Phi + \Phi'X^{*\prime}X^*\Phi)\big] \right\}. \tag{4}
\]
• Combining (4) with the improper prior p(Φ, Σ) ∝ |Σ|^{-(n+1)/2} yields
\[
p(\Phi, \Sigma \,|\, Y^*) = c_*^{-1} |\Sigma|^{-(T^*+n+1)/2} \exp\left\{ -\frac{1}{2} \mathrm{tr}\big[\Sigma^{-1}(Y^{*\prime}Y^* - \Phi'X^{*\prime}Y^* - Y^{*\prime}X^*\Phi + \Phi'X^{*\prime}X^*\Phi)\big] \right\}, \tag{5}
\]
• which can be interpreted as a prior density for Φ and Σ.
• Define
\[
\hat{\Phi}^* = (X^{*\prime}X^*)^{-1}X^{*\prime}Y^*, \qquad
S^* = (Y^* - X^*\hat{\Phi}^*)'(Y^* - X^*\hat{\Phi}^*).
\]
• It can be verified that the prior p(Φ, Σ|Y*) is of the Inverted Wishart–Normal (IW–N) form
\[
\Sigma \sim IW\!\left( S^*,\; T^* - k \right) \tag{6}
\]
\[
\Phi \,|\, \Sigma \sim N\!\left( \hat{\Phi}^*,\; \Sigma \otimes (X^{*\prime}X^*)^{-1} \right). \tag{7}
\]
• The appropriate normalization constant for the prior density is given by
\[
c_* = (2\pi)^{nk/2}\, |X^{*\prime}X^*|^{-n/2}\, |S^*|^{-(T^*-k)/2}\, 2^{n(T^*-k)/2}\, \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma[(T^* - k + 1 - i)/2], \tag{8}
\]
where k is the dimension of x_t and Γ[·] denotes the gamma function.
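For numerical stability, (8) is best evaluated in logs; a sketch in Python/SciPy (the function name log_c_star is illustrative):

    import numpy as np
    from scipy.special import gammaln

    def log_c_star(XstarXstar, S_star, Tstar, n, k):
        # log of the normalization constant (8); inputs are X*'X*, S*, T*, n, k
        return (n * k / 2 * np.log(2 * np.pi)
                - n / 2 * np.linalg.slogdet(XstarXstar)[1]
                - (Tstar - k) / 2 * np.linalg.slogdet(S_star)[1]
                + n * (Tstar - k) / 2 * np.log(2.0)
                + n * (n - 1) / 4 * np.log(np.pi)
                + sum(gammaln((Tstar - k + 1 - i) / 2) for i in range(1, n + 1)))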
• The implementation of priors through dummy variables is often called mixed estimation
and dates back to Theil and Goldberger (1961).
• Now let’s calculate the posterior ...
• Notice that
\[
p(\Phi, \Sigma_u \,|\, Y) \propto p(Y \,|\, \Phi, \Sigma_u)\, p(Y^* \,|\, \Phi, \Sigma_u) \tag{9}
\]
• Define:
\[
\tilde{\Phi} = (X^{*\prime}X^* + X'X)^{-1}(X^{*\prime}Y^* + X'Y) \tag{10}
\]
\[
\tilde{\Sigma}_u = \frac{1}{T^* + T} \Big[ Y^{*\prime}Y^* + Y'Y
- (X^{*\prime}Y^* + X'Y)'(X^{*\prime}X^* + X'X)^{-1}(X^{*\prime}Y^* + X'Y) \Big]. \tag{11}
\]
• Since prior and likelihood function are conjugate, it is straightforward to show that the posterior distribution of Φ and Σ is also of the Inverted Wishart–Normal form:
\[
\Sigma \,|\, Y \sim IW\!\left( (T^* + T)\tilde{\Sigma}_u,\; T^* + T - k \right) \tag{12}
\]
\[
\Phi \,|\, \Sigma, Y \sim N\!\left( \tilde{\Phi},\; \Sigma \otimes (X^{*\prime}X^* + X'X)^{-1} \right). \tag{13}
\]
• Draws (Φ^(s), Σ^(s)), s = 1, ..., nsim, from the posterior can be generated as follows:
(i) draw Σ^(s) from the IW distribution (12);
(ii) draw Φ^(s) from the normal distribution of Φ|Σ^(s), Y in (13).
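A sketch of this direct sampler in Python/NumPy, reusing the iw_draw and Cholesky ideas from above (the function name posterior_draws is illustrative):

    import numpy as np

    def posterior_draws(Y, X, Ystar, Xstar, nsim, rng):
        # Direct sampler for the IW-Normal posterior (12)-(13)
        T, n = Y.shape
        Tstar, k = Xstar.shape[0], X.shape[1]
        XX = Xstar.T @ Xstar + X.T @ X
        XY = Xstar.T @ Ystar + X.T @ Y
        Phi_tilde = np.linalg.solve(XX, XY)                     # (10)
        S_tilde = Ystar.T @ Ystar + Y.T @ Y - XY.T @ Phi_tilde  # = (T*+T) Sigma_tilde, cf. (11)
        C_x = np.linalg.cholesky(np.linalg.inv(XX))             # chol of (X*'X* + X'X)^{-1}
        draws = []
        for _ in range(nsim):
            Sigma = iw_draw(S_tilde, Tstar + T - k, rng)        # step (i), eq. (12)
            # step (ii): vec(Phi) ~ N(vec(Phi_tilde), Sigma kron XX^{-1});
            # use chol(Sigma kron XX^{-1}) = chol(Sigma) kron chol(XX^{-1})
            C = np.kron(np.linalg.cholesky(Sigma), C_x)
            beta = Phi_tilde.flatten(order="F") + C @ rng.standard_normal(n * k)
            draws.append((beta.reshape((k, n), order="F"), Sigma))
        return draws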
• Finally, we can compute the marginal data density ...
• Suppose that we are using a prior constructed from dummy observations. Then the marginal data density is given by
\[
p(Y \,|\, Y^*) = \frac{\int p(Y, Y^* \,|\, \Phi, \Sigma)\, d\Phi\, d\Sigma}{\int p(Y^* \,|\, \Phi, \Sigma)\, d\Phi\, d\Sigma} \tag{14}
\]
• The integrals in the numerator and denominator are given by the appropriate modification of c_* defined above:
\[
\int p(Y \,|\, \Phi, \Sigma)\, d\Phi\, d\Sigma = \pi^{-n(T-k)/2}\, |X'X|^{-n/2}\, |S|^{-(T-k)/2}\, \pi^{n(n-1)/4} \prod_{i=1}^{n} \Gamma[(T - k + 1 - i)/2], \tag{15}
\]
where
\[
\hat{\Phi} = (X'X)^{-1}X'Y, \qquad S = (Y - X\hat{\Phi})'(Y - X\hat{\Phi}).
\]
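In logs, (14)-(15) can be evaluated as follows; a sketch in Python/SciPy, where the numerator simply stacks the dummy and actual observations (the function names are illustrative):

    import numpy as np
    from scipy.special import gammaln

    def log_integral(Y, X):
        # log of (15) for a generic sample (Y, X)
        T, n = Y.shape
        k = X.shape[1]
        Phi_hat = np.linalg.solve(X.T @ X, X.T @ Y)
        S = (Y - X @ Phi_hat).T @ (Y - X @ Phi_hat)
        return (-n * (T - k) / 2 * np.log(np.pi)
                - n / 2 * np.linalg.slogdet(X.T @ X)[1]
                - (T - k) / 2 * np.linalg.slogdet(S)[1]
                + n * (n - 1) / 4 * np.log(np.pi)
                + sum(gammaln((T - k + 1 - i) / 2) for i in range(1, n + 1)))

    def log_mdd(Y, X, Ystar, Xstar):
        # log p(Y|Y*) per (14): numerator stacks dummy and actual observations
        return (log_integral(np.vstack([Ystar, Y]), np.vstack([Xstar, X]))
                - log_integral(Ystar, Xstar))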
Dummy Observation Priors – Examples
• Minnesota Prior
• Training Sample Prior
• DSGE Model Prior: DSGE-VAR
Minnesota Prior
• Reference: Doan, Litterman, and Sims (1984). The version below is described in the
Appendix of Lubik and Schorfheide (Macro Annual, 2005).
• Consider the following Gaussian bivariate VAR(2):
\[
\begin{bmatrix} y_{1,t} \\ y_{2,t} \end{bmatrix}
= \begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}
+ \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix}
\begin{bmatrix} y_{1,t-1} \\ y_{2,t-1} \end{bmatrix}
+ \begin{bmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{bmatrix}
\begin{bmatrix} y_{1,t-2} \\ y_{2,t-2} \end{bmatrix}
+ \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix} \tag{16}
\]
• Define $y_t = [y_{1,t}, y_{2,t}]'$, $x_t = [y_{t-1}', y_{t-2}', 1]'$, and $u_t = [u_{1,t}, u_{2,t}]'$, and
\[
\Phi = \begin{bmatrix}
\beta_{11} & \beta_{21} \\
\beta_{12} & \beta_{22} \\
\gamma_{11} & \gamma_{21} \\
\gamma_{12} & \gamma_{22} \\
\alpha_1 & \alpha_2
\end{bmatrix}. \tag{17}
\]
• The VAR can be rewritten as follows
\[
y_t' = x_t'\Phi + u_t', \quad t = 1, \ldots, T, \quad u_t \sim iid\, N(0, \Sigma_u) \tag{18}
\]
or in matrix form
\[
Y = X\Phi + U. \tag{19}
\]
• Based on a short pre-sample Y_0 (typically the observations used to initialize the lags of the VAR) one calculates s = std(Y_0) and ȳ = mean(Y_0).
• In addition, there are a number of tuning parameters for the prior:
– τ : the overall tightness of the prior. Large values imply a small prior covariance matrix.
– d: the variance for the coefficients of lag ℓ is scaled down by the factor ℓ^{-2d}.
– w: determines the weight for the prior on Σ_u. Suppose that Z_i ∼ N(0, σ²). Then an estimator for σ² is $\hat{\sigma}^2 = \frac{1}{w}\sum_{i=1}^{w} Z_i^2$. The larger w, the more informative the estimator and, in the context of the VAR, the tighter the prior.
– λ and µ: additional tuning parameters.
The dummy observations can be classified as follows:
• Dummies for the β coefficients:
\[
\begin{bmatrix} \tau s_1 & 0 \\ 0 & \tau s_2 \end{bmatrix}
= \begin{bmatrix} \tau s_1 & 0 & 0 & 0 & 0 \\ 0 & \tau s_2 & 0 & 0 & 0 \end{bmatrix} \Phi + u
\]
The first observation implies, for instance, that
\[
\tau s_1 = \tau s_1 \beta_{11} + u_1 \;\Longrightarrow\; \beta_{11} = 1 - \frac{u_1}{\tau s_1} \;\Longrightarrow\; \beta_{11} \sim N\!\left(1,\, \frac{\Sigma_{u,11}}{\tau^2 s_1^2}\right)
\]
\[
0 = \tau s_1 \beta_{21} + u_2 \;\Longrightarrow\; \beta_{21} = -\frac{u_2}{\tau s_1} \;\Longrightarrow\; \beta_{21} \sim N\!\left(0,\, \frac{\Sigma_{u,22}}{\tau^2 s_1^2}\right)
\]
The dummy observations can be classified as follows (continued...):
• Dummies for the γ coefficients:
\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & \tau s_1 2^d & 0 & 0 \\ 0 & 0 & 0 & \tau s_2 2^d & 0 \end{bmatrix} \Phi + u
\]
• The prior for the covariance matrix is implemented by
\[
\begin{bmatrix} s_1 & 0 \\ 0 & s_2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \Phi + u
\]
The dummy observations can be classified as follows (continued...):
• Co-persistence prior dummy observations, reflecting the belief that when data on all y's are stable at their initial levels, they will tend to persist at that level:
\[
\begin{bmatrix} \lambda \bar{y}_1 & \lambda \bar{y}_2 \end{bmatrix}
= \begin{bmatrix} \lambda \bar{y}_1 & \lambda \bar{y}_2 & \lambda \bar{y}_1 & \lambda \bar{y}_2 & \lambda \end{bmatrix} \Phi + u'
\]
• Own-persistence prior dummy observations, reflecting the belief that when y_i has been stable at its initial level, it will tend to persist at that level, regardless of the value of other variables (all of these dummies are assembled in the code sketch below):
\[
\begin{bmatrix} \mu \bar{y}_1 & 0 \\ 0 & \mu \bar{y}_2 \end{bmatrix}
= \begin{bmatrix} \mu \bar{y}_1 & 0 & \mu \bar{y}_1 & 0 & 0 \\ 0 & \mu \bar{y}_2 & 0 & \mu \bar{y}_2 & 0 \end{bmatrix} \Phi + u
\]
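A sketch assembling these dummy observations for the bivariate VAR(2) in Python/NumPy. The function name minnesota_dummies is illustrative, and repeating the covariance dummies w times is an assumption consistent with the role of w described above:

    import numpy as np

    def minnesota_dummies(s, ybar, tau, d, w, lam, mu):
        # Assemble (Y*, X*) for the bivariate VAR(2), x_t = [y_{t-1}', y_{t-2}', 1]'
        s1, s2 = s          # pre-sample standard deviations
        y1, y2 = ybar       # pre-sample means
        Ystar, Xstar = [], []
        # beta dummies: own first-lag coefficients centered at 1, tightness tau
        Ystar += [[tau * s1, 0], [0, tau * s2]]
        Xstar += [[tau * s1, 0, 0, 0, 0], [0, tau * s2, 0, 0, 0]]
        # gamma dummies: second-lag coefficients centered at 0, scaled by 2^d
        Ystar += [[0, 0], [0, 0]]
        Xstar += [[0, 0, tau * s1 * 2**d, 0, 0], [0, 0, 0, tau * s2 * 2**d, 0]]
        # covariance dummies (repeated w times; an assumption, see lead-in)
        for _ in range(int(w)):
            Ystar += [[s1, 0], [0, s2]]
            Xstar += [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
        # co-persistence dummy
        Ystar += [[lam * y1, lam * y2]]
        Xstar += [[lam * y1, lam * y2, lam * y1, lam * y2, lam]]
        # own-persistence dummies
        Ystar += [[mu * y1, 0], [0, mu * y2]]
        Xstar += [[mu * y1, 0, mu * y1, 0, 0], [0, mu * y2, 0, mu * y2, 0]]
        return np.array(Ystar, dtype=float), np.array(Xstar, dtype=float)

The resulting (Y*, X*) can be passed to the posterior sampler sketched earlier; the quantities Φ̂* and S* in (6)-(7) are then simply the OLS statistics computed from these dummy observations.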
Dummy Observation Priors – Examples
• Training Sample Prior: replace dummy observations by actual observations from a pre-
or training sample.
• DSGE Model Prior: use artificial observations generated by a DSGE model. Details
will follow later.