
Quantecon Python Econometria


Intermediate Quantitative Economics

with Python

Thomas J. Sargent & John Stachurski

Apr 30, 2024


CONTENTS

I Tools and Techniques 5


1 Modeling COVID 19 7
1.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 The SIR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Ending Lockdown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2 Linear Algebra 17
2.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4 Solving Systems of Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.5 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6 Further Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3 QR Decomposition 41
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Matrix Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.3 Gram-Schmidt process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4 Some Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
3.6 Using QR Decomposition to Compute Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.7 QR and PCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4 Circulant Matrices 49
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.2 Constructing a Circulant Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3 Connection to Permutation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.4 Examples with Python . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.5 Associated Permutation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

5 Singular Value Decomposition (SVD) 65


5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.2 The Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.3 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.4 Four Fundamental Subspaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.5 Eckart-Young Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.6 Full and Reduced SVD’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.7 Polar Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.8 Application: Principal Components Analysis (PCA) . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.9 Relationship of PCA to SVD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
5.10 PCA with Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.11 Connections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

6 VARs and DMDs 83


6.1 First-Order Vector Autoregressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2 Dynamic Mode Decomposition (DMD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.3 Representation 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
6.4 Representation 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
6.5 Representation 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
6.6 Source for Some Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

7 Using Newton’s Method to Solve Economic Models 95


7.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
7.2 Fixed Point Computation Using Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.3 Root-Finding in One Dimension . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.4 Multivariate Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

II Elementary Statistics 119


8 Elementary Probability with Matrices 121
8.1 Sketch of Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.2 What Does Probability Mean? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.3 Representing Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
8.4 Univariate Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.5 Bivariate Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
8.6 Marginal Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.7 Conditional Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.8 Statistical Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.9 Means and Variances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
8.10 Generating Random Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
8.11 Some Discrete Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.12 Geometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.13 Continuous Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
8.14 A Mixed Discrete-Continuous Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.15 Matrix Representation of Some Bivariate Distributions . . . . . . . . . . . . . . . . . . . . . . . . . 139
8.16 A Continuous Bivariate Random Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.17 Sum of Two Independently Distributed Random Variables . . . . . . . . . . . . . . . . . . . . . . . . 154
8.18 Transition Probability Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
8.19 Coupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
8.20 Copula Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
8.21 Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

9 LLN and CLT 163


9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
9.2 Relationships . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.3 LLN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
9.4 CLT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
9.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

10 Two Meanings of Probability 181
10.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
10.2 Frequentist Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
10.3 Bayesian Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
10.4 Role of a Conjugate Prior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

11 Multivariate Hypergeometric Distribution 199


11.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
11.2 The Administrator’s Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
11.3 Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203

12 Multivariate Normal Distribution 209


12.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
12.2 The Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
12.3 Bivariate Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
12.4 Trivariate Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
12.5 One Dimensional Intelligence (IQ) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
12.6 Information as Surprise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
12.7 Cholesky Factor Magic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
12.8 Math and Verbal Intelligence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
12.9 Univariate Time Series Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
12.10 Stochastic Difference Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
12.11 Application to Stock Price Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
12.12 Filtering Foundations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
12.13 Classic Factor Analysis Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
12.14 PCA and Factor Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242

13 Fault Tree Uncertainties 247


13.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
13.2 Log normal distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
13.3 The Convolution Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
13.4 Approximating Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
13.5 Convolving Probability Mass Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
13.6 Failure Tree Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
13.7 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
13.8 Failure Rates Unknown . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258
13.9 Waste Hoist Failure Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

14 Introduction to Artificial Neural Networks 263


14.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
14.2 A Deep (but not Wide) Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264
14.3 Calibrating Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
14.4 Back Propagation and the Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
14.5 Training Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
14.6 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
14.7 How Deep? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
14.8 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271

15 Randomized Response Surveys 275


15.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
15.2 Warner’s Strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
15.3 Comparing Two Survey Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
15.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283

16 Expected Utilities of Random Responses 285

16.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
16.2 Privacy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
16.3 Zoo of Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
16.4 Respondent’s Expected Utility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
16.5 Utilitarian View of Survey Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
16.6 Criticisms of Proposed Privacy Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
16.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

III Linear Programming 301


17 Optimal Transport 303
17.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
17.2 The Optimal Transport Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
17.3 The Linear Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
17.4 The Dual Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
17.5 The Python Optimal Transport Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316

18 Von Neumann Growth Model (and a Generalization) 321


18.1 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
18.2 Model Ingredients and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
18.3 Dynamic Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
18.4 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
18.5 Interpretation as Two-player Zero-sum Game . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331

IV Introduction to Dynamics 337


19 Finite Markov Chains 339
19.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
19.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340
19.3 Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342
19.4 Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345
19.5 Irreducibility and Aperiodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
19.6 Stationary Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
19.7 Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
19.8 Computing Expectations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
19.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355

20 Inventory Dynamics 363


20.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
20.2 Sample Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 364
20.3 Marginal Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
20.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369

21 Linear State Space Models 373


21.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
21.2 The Linear State Space Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
21.3 Distributions and Moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379
21.4 Stationarity and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
21.5 Noisy Observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
21.6 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 391
21.7 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
21.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393

22 Samuelson Multiplier-Accelerator 395
22.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
22.2 Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
22.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
22.4 Stochastic Shocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
22.5 Government Spending . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 412
22.6 Wrapping Everything Into a Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
22.7 Using the LinearStateSpace Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
22.8 Pure Multiplier Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
22.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432

23 Kesten Processes and Firm Dynamics 433


23.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433
23.2 Kesten Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
23.3 Heavy Tails . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
23.4 Application: Firm Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 439
23.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440

24 Wealth Distribution Dynamics 445


24.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
24.2 Lorenz Curves and the Gini Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
24.3 A Model of Wealth Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
24.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 450
24.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
24.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456

25 A First Look at the Kalman Filter 459


25.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
25.2 The Basic Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 460
25.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
25.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
25.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469

26 Another Look at the Kalman Filter 479


26.1 A worker’s output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
26.2 A firm’s wage-setting policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 480
26.3 A state-space representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
26.4 An Innovations Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
26.5 Some Computational Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
26.6 Future Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494

V Search 495
27 Job Search I: The McCall Search Model 497
27.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
27.2 The McCall Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
27.3 Computing the Optimal Policy: Take 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
27.4 Computing an Optimal Policy: Take 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
27.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507

28 Job Search II: Search and Separation 513


28.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
28.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
28.3 Solving the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
28.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
28.5 Impact of Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
28.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521

29 Job Search III: Fitted Value Function Iteration 525


29.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
29.2 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 526
29.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
29.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530

30 Job Search IV: Correlated Wage Offers 535


30.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
30.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
30.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 537
30.4 Unemployment Duration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
30.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543

31 Job Search V: Modeling Career Choice 545


31.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
31.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
31.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
31.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 553

32 Job Search VI: On-the-Job Search 559


32.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
32.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
32.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
32.4 Solving for Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
32.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 567

33 Job Search VII: A McCall Worker Q-Learns 571


33.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
33.2 Review of McCall Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
33.3 Implied Quality Function 𝑄 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
33.4 From Probabilities to Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
33.5 Q-Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
33.6 Employed Worker Can’t Quit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
33.7 Possible Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587

VI Consumption, Savings and Capital 589


34 Cass-Koopmans Model 591
34.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
34.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 592
34.3 Planning Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
34.4 Shooting Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
34.5 Setting Initial Capital to Steady State Capital . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601
34.6 A Turnpike Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
34.7 A Limiting Infinite Horizon Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604
34.8 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607

35 Cass-Koopmans Competitive Equilibrium 609


35.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
35.2 Review of Cass-Koopmans Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
35.3 Competitive Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
35.4 Market Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
35.5 Firm Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
35.6 Household Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613
35.7 Computing a Competitive Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
35.8 Yield Curves and Hicks-Arrow Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623

36 Cake Eating I: Introduction to Optimal Saving 625


36.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
36.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
36.3 The Value Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 627
36.4 The Optimal Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
36.5 The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 630
36.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632

37 Cake Eating II: Numerical Methods 635


37.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 635
37.2 Reviewing the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
37.3 Value Function Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 636
37.4 Time Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
37.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 645

38 Optimal Growth I: The Stochastic Optimal Growth Model 651


38.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 651
38.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
38.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 656
38.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664

39 Optimal Growth II: Accelerating the Code with Numba 667


39.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
39.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
39.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
39.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 673

40 Optimal Growth III: Time Iteration 679


40.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 679
40.2 The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 680
40.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 682
40.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688

41 Optimal Growth IV: The Endogenous Grid Method 691


41.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 691
41.2 Key Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 692
41.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693

42 The Income Fluctuation Problem I: Basic Model 699


42.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 699
42.2 The Optimal Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
42.3 Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 702
42.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
42.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708

43 The Income Fluctuation Problem II: Stochastic Returns on Assets 717


43.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 717
43.2 The Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718

43.3 Solution Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 719
43.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
43.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 726

VII Bayes Law 731


44 Non-Conjugate Priors 733
44.1 Unleashing MCMC on a Binomial Likelihood . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
44.2 Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
44.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 740
44.4 Alternative Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745
44.5 Posteriors Via MCMC and VI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
44.6 Non-conjugate Prior Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757

45 Posterior Distributions for AR(1) Parameters 779


45.1 PyMC Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 782
45.2 Numpyro Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785

46 Forecasting an AR(1) Process 789


46.1 A Univariate First-Order Autoregressive Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
46.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 791
46.3 Predictive Distributions of Path Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 792
46.4 A Wecker-Like Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 793
46.5 Using Simulations to Approximate a Posterior Distribution . . . . . . . . . . . . . . . . . . . . . . . 794
46.6 Calculating Sample Path Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795
46.7 Original Wecker Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796
46.8 Extended Wecker Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 798
46.9 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 801

VIII Information 805


47 Job Search VII: Search with Learning 807
47.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 807
47.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 808
47.3 Take 1: Solution by VFI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 811
47.4 Take 2: A More Efficient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 816
47.5 Another Functional Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
47.6 Solving the RWFE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
47.7 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 818
47.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
47.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 819
47.10 Appendix A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 821
47.11 Appendix B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 823
47.12 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827

48 Likelihood Ratio Processes 839


48.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
48.2 Likelihood Ratio Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 840
48.3 Nature Permanently Draws from Density g . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 841
48.4 Peculiar Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 843
48.5 Nature Permanently Draws from Density f . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 844
48.6 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 845
48.7 Kullback–Leibler Divergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 850

48.8 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853

49 Computing Mean of a Likelihood Ratio Process 855


49.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855
49.2 Mathematical Expectation of Likelihood Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
49.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 858
49.4 Selecting a Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 859
49.5 Approximating a cumulative likelihood ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
49.6 Distribution of Sample Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
49.7 More Thoughts about Choice of Sampling Distribution . . . . . . . . . . . . . . . . . . . . . . . . . 863

50 A Problem that Stumped Milton Friedman 869


50.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 869
50.2 Origin of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 870
50.3 A Dynamic Programming Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 871
50.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 876
50.5 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 878
50.6 Comparison with Neyman-Pearson Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 884
50.7 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 886

51 Exchangeability and Bayesian Updating 887


51.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 887
51.2 Independently and Identically Distributed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 888
51.3 A Setting in Which Past Observations Are Informative . . . . . . . . . . . . . . . . . . . . . . . . . 889
51.4 Relationship Between IID and Exchangeable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 890
51.5 Exchangeability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891
51.6 Bayes’ Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 891
51.7 More Details about Bayesian Updating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 892
51.8 Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 895
51.9 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901

52 Likelihood Ratio Processes and Bayesian Learning 903


52.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 903
52.2 The Setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 904
52.3 Likelihood Ratio Process and Bayes’ Law . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 905
52.4 Behavior of posterior probability {𝜋𝑡 } under the subjective probability distribution . . . . . . . . . . . 909
52.5 Initial Prior is Verified by Paths Drawn from Subjective Conditional Densities . . . . . . . . . . . . . . 915
52.6 Drilling Down a Little Bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 916
52.7 Sequels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 917

53 Incorrect Models 919


53.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 919
53.2 Sampling from Compound Lottery 𝐻 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 922
53.3 Type 1 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 924
53.4 What a type 1 Agent Learns when Mixture 𝐻 Generates Data . . . . . . . . . . . . . . . . . . . . . . 925
53.5 Kullback-Leibler Divergence Governs Limit of 𝜋𝑡 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927
53.6 Type 2 Agent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 931
53.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933

54 Bayesian versus Frequentist Decision Rules 935


54.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 935
54.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 936
54.3 Frequentist Decision Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 939
54.4 Bayesian Decision Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 945
54.5 Was the Navy Captain’s Hunch Correct? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 952

54.6 More Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 954
54.7 Distribution of Bayesian Decision Rule’s Time to Decide . . . . . . . . . . . . . . . . . . . . . . . . 954
54.8 Probability of Making Correct Decision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 958
54.9 Distribution of Likelihood Ratios at Frequentist’s 𝑡 . . . . . . . . . . . . . . . . . . . . . . . . . . . 960

IX LQ Control 963
55 LQ Control: Foundations 965
55.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 965
55.2 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 966
55.3 Optimality – Finite Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 968
55.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 971
55.5 Extensions and Comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 976
55.6 Further Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 978
55.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 986

56 Lagrangian for LQ Control 995


56.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 995
56.2 Undiscounted LQ DP Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 996
56.3 Lagrangian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 997
56.4 State-Costate Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998
56.5 Reciprocal Pairs Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 998
56.6 Schur decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 999
56.7 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1000
56.8 Other Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1005
56.9 Discounted Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1006

57 Eliminating Cross Products 1009


57.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1009
57.2 Undiscounted Dynamic Programming Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1009
57.3 Kalman Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1010
57.4 Duality table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1011

58 The Permanent Income Model 1013


58.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1013
58.2 The Savings Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1014
58.3 Alternative Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1021
58.4 Two Classic Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1024
58.5 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1027
58.6 Appendix: The Euler Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1028

59 Permanent Income II: LQ Techniques 1029


59.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1029
59.2 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1030
59.3 The LQ Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1032
59.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1033
59.5 Two Example Economies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1036

60 Production Smoothing via Inventories 1049


60.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1049
60.2 Example 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1054
60.3 Inventories Not Useful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1056
60.4 Inventories Useful but are Hardwired to be Zero Always . . . . . . . . . . . . . . . . . . . . . . . . . 1056
60.5 Example 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1057

60.6 Example 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1058
60.7 Example 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1059
60.8 Example 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1061
60.9 Example 6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1062
60.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065

X Multiple Agent Models 1071


61 A Lake Model of Employment and Unemployment 1073
61.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1073
61.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1074
61.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1076
61.4 Dynamics of an Individual Worker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1081
61.5 Endogenous Job Finding Rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1083
61.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1090

62 Rational Expectations Equilibrium 1101


62.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1101
62.2 Rational Expectations Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1104
62.3 Computing an Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1107
62.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1109

63 Stability in Linear Rational Expectations Models 1115


63.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116
63.2 Linear Difference Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1116
63.3 Illustration: Cagan’s Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1118
63.4 Some Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1120
63.5 Alternative Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1122
63.6 Another Perspective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1124
63.7 Log money Supply Feeds Back on Log Price Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1126
63.8 Big 𝑃 , Little 𝑝 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1130
63.9 Fun with SymPy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1132

64 Markov Perfect Equilibrium 1135


64.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1135
64.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136
64.3 Linear Markov Perfect Equilibria . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1137
64.4 Application . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1139
64.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1144

65 Uncertainty Traps 1153


65.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1153
65.2 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1154
65.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1157
65.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1158
65.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1159

66 The Aiyagari Model 1167


66.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1167
66.2 The Economy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1168
66.3 Firms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1169
66.4 Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1170

XI Asset Pricing and Finance 1177
67 Asset Pricing: Finite State Models 1179
67.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1179
67.2 Pricing Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1180
67.3 Prices in the Risk-Neutral Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1181
67.4 Risk Aversion and Asset Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1185
67.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1194

68 Competitive Equilibria with Arrow Securities 1199


68.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1199
68.2 The setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1200
68.3 Recursive Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1201
68.4 State Variable Degeneracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1202
68.5 Markov Asset Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1202
68.6 General Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1204
68.7 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1208
68.8 Finite Horizon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1219

69 Heterogeneous Beliefs and Bubbles 1225


69.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1225
69.2 Structure of the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1226
69.3 Solving the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1228
69.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1233

XII Data and Empirics 1237


70 Pandas for Panel Data 1239
70.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1239
70.2 Slicing and Reshaping Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1240
70.3 Merging Dataframes and Filling NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1245
70.4 Grouping and Summarizing Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1250
70.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256
70.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1257

71 Linear Regression in Python 1261


71.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1261
71.2 Simple Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1262
71.3 Extending the Linear Regression Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1267
71.4 Endogeneity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
71.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1273
71.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1273

72 Maximum Likelihood Estimation 1277


72.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1277
72.2 Set Up and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1278
72.3 Conditional Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1281
72.4 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1283
72.5 MLE with Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1285
72.6 Maximum Likelihood Estimation with statsmodels . . . . . . . . . . . . . . . . . . . . . . . . . 1290
72.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1294
72.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1295

XIII Auctions 1299
73 First-Price and Second-Price Auctions 1301
73.1 First-Price Sealed-Bid Auction (FPSB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1301
73.2 Second-Price Sealed-Bid Auction (SPSB) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
73.3 Characterization of SPSB Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1302
73.4 Uniform Distribution of Private Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1303
73.5 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1303
73.6 First price sealed bid auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1303
73.7 Second Price Sealed Bid Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1304
73.8 Python Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1304
73.9 Revenue Equivalence Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1306
73.10 Calculation of Bid Price in FPSB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1308
73.11 𝜒2 Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1309
73.12 5 Code Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1312
73.13 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1317

74 Multiple Good Allocation Mechanisms 1319


74.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319
74.2 Ascending Bids Auction for Multiple Goods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1319
74.3 A Benevolent Planner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1320
74.4 Equivalence of Allocations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1320
74.5 Ascending Bid Auction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1320
74.6 Pseudocode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1321
74.7 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1323
74.8 A Python Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1331
74.9 Robustness Checks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1340
74.10 A Groves-Clarke Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1352
74.11 An Example Solved by Hand . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1353
74.12 Another Python Class . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1356

XIV Other 1363


75 Troubleshooting 1365
75.1 Fixing Your Local Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1365
75.2 Reporting an Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1366

76 References 1367

77 Execution Statistics 1369

Bibliography 1373

Index 1381

Intermediate Quantitative Economics with Python

This website presents a set of lectures on quantitative economic modeling.


• Tools and Techniques
– Modeling COVID 19
– Linear Algebra
– QR Decomposition
– Circulant Matrices
– Singular Value Decomposition (SVD)
– VARs and DMDs
– Using Newton’s Method to Solve Economic Models
• Elementary Statistics
– Elementary Probability with Matrices
– LLN and CLT
– Two Meanings of Probability
– Multivariate Hypergeometric Distribution
– Multivariate Normal Distribution
– Fault Tree Uncertainties
– Introduction to Artificial Neural Networks
– Randomized Response Surveys
– Expected Utilities of Random Responses
• Linear Programming
– Optimal Transport
– Von Neumann Growth Model (and a Generalization)
• Introduction to Dynamics
– Finite Markov Chains
– Inventory Dynamics
– Linear State Space Models
– Samuelson Multiplier-Accelerator
– Kesten Processes and Firm Dynamics
– Wealth Distribution Dynamics
– A First Look at the Kalman Filter
– Another Look at the Kalman Filter
• Search
– Job Search I: The McCall Search Model
– Job Search II: Search and Separation
– Job Search III: Fitted Value Function Iteration
– Job Search IV: Correlated Wage Offers
– Job Search V: Modeling Career Choice
– Job Search VI: On-the-Job Search
– Job Search VII: A McCall Worker Q-Learns
• Consumption, Savings and Capital
– Cass-Koopmans Model
– Cass-Koopmans Competitive Equilibrium
– Cake Eating I: Introduction to Optimal Saving
– Cake Eating II: Numerical Methods
– Optimal Growth I: The Stochastic Optimal Growth Model
– Optimal Growth II: Accelerating the Code with Numba
– Optimal Growth III: Time Iteration
– Optimal Growth IV: The Endogenous Grid Method
– The Income Fluctuation Problem I: Basic Model
– The Income Fluctuation Problem II: Stochastic Returns on Assets
• Bayes Law
– Non-Conjugate Priors
– Posterior Distributions for AR(1) Parameters
– Forecasting an AR(1) Process
• Information
– Job Search VII: Search with Learning
– Likelihood Ratio Processes
– Computing Mean of a Likelihood Ratio Process
– A Problem that Stumped Milton Friedman
– Exchangeability and Bayesian Updating
– Likelihood Ratio Processes and Bayesian Learning
– Incorrect Models
– Bayesian versus Frequentist Decision Rules
• LQ Control
– LQ Control: Foundations
– Lagrangian for LQ Control
– Eliminating Cross Products
– The Permanent Income Model
– Permanent Income II: LQ Techniques
– Production Smoothing via Inventories
• Multiple Agent Models
– A Lake Model of Employment and Unemployment
– Rational Expectations Equilibrium
– Stability in Linear Rational Expectations Models
– Markov Perfect Equilibrium
– Uncertainty Traps
– The Aiyagari Model
• Asset Pricing and Finance
– Asset Pricing: Finite State Models
– Competitive Equilibria with Arrow Securities
– Heterogeneous Beliefs and Bubbles
• Data and Empirics
– Pandas for Panel Data
– Linear Regression in Python
– Maximum Likelihood Estimation
• Auctions
– First-Price and Second-Price Auctions
– Multiple Good Allocation Mechanisms
• Other
– Troubleshooting
– References
– Execution Statistics

Part I

Tools and Techniques

CHAPTER

ONE

MODELING COVID 19

Contents

• Modeling COVID 19
– Overview
– The SIR Model
– Implementation
– Experiments
– Ending Lockdown

1.1 Overview

This is a Python version of the code for analyzing the COVID-19 pandemic provided by Andrew Atkeson.
See, in particular
• NBER Working Paper No. 26867
• COVID-19 Working papers and code
The purpose of his notes is to introduce economists to quantitative modeling of infectious disease dynamics.
Dynamics are modeled using a standard SIR (Susceptible-Infected-Removed) model of disease spread.
The model dynamics are represented by a system of ordinary differential equations.
The main objective is to study the impact of suppression through social distancing on the spread of the infection.
The focus is on US outcomes but the parameters can be adjusted to study other countries.
We will use the following standard imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numpy import exp

We will also use SciPy’s numerical routine odeint for solving differential equations.


from scipy.integrate import odeint

This routine calls into compiled code from the FORTRAN library odepack.

1.2 The SIR Model

In the version of the SIR model we will analyze there are four states.
All individuals in the population are assumed to be in one of these four states.
The states are: susceptible (S), exposed (E), infected (I) and removed (R).
Comments:
• Those in state R have been infected and either recovered or died.
• Those who have recovered are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.

1.2.1 Time Path

The flow across states follows the path 𝑆 → 𝐸 → 𝐼 → 𝑅.


All individuals in the population are eventually infected when the transmission rate is positive and 𝑖(0) > 0.
The interest is primarily in
• the number of infections at a given time (which determines whether or not the health care system is overwhelmed)
and
• how long the caseload can be deferred (hopefully until a vaccine arrives)
Using lower case letters for the fraction of the population in each state, the dynamics are

$$
\begin{aligned}
\dot s(t) &= -\beta(t)\, s(t)\, i(t) \\
\dot e(t) &= \beta(t)\, s(t)\, i(t) - \sigma e(t) \\
\dot i(t) &= \sigma e(t) - \gamma i(t)
\end{aligned}
\tag{1.1}
$$

In these equations,
• 𝛽(𝑡) is called the transmission rate (the rate at which individuals bump into others and expose them to the virus).
• 𝜎 is called the infection rate (the rate at which those who are exposed become infected).
• 𝛾 is called the recovery rate (the rate at which infected people recover or die).
• the dot symbol 𝑦 ̇ represents the time derivative 𝑑𝑦/𝑑𝑡.
We do not need to model the fraction 𝑟 of the population in state 𝑅 separately because the states form a partition.
In particular, the “removed” fraction of the population is 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖.
We will also track 𝑐 = 𝑖 + 𝑟, which is the cumulative caseload (i.e., all those who have or have had the infection).
The system (1.1) can be written in vector form as

𝑥̇ = 𝐹 (𝑥, 𝑡), 𝑥 ∶= (𝑠, 𝑒, 𝑖) (1.2)

for suitable definition of 𝐹 (see the code below).


1.2.2 Parameters

Both 𝜎 and 𝛾 are thought of as fixed, biologically determined parameters.


As in Atkeson’s note, we set
• 𝜎 = 1/5.2 to reflect an average incubation period of 5.2 days.
• 𝛾 = 1/18 to match an average illness duration of 18 days.
The transmission rate is modeled as
• 𝛽(𝑡) ∶= 𝑅(𝑡)𝛾 where 𝑅(𝑡) is the effective reproduction number at time 𝑡.
(The notation is slightly confusing, since 𝑅(𝑡) is different to 𝑅, the symbol that represents the removed state.)
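To make the link between these parameters concrete, the following minimal sketch evaluates the transmission rate implied by a constant reproduction number. The ASCII names `sigma`, `gamma`, `R_const` and `beta` are illustrative choices, and the value 1.6 is simply one of the values used later in the experiments.

```python
# Fixed biological parameters, as set in the text
sigma = 1 / 5.2   # rate at which the exposed become infected (5.2-day incubation)
gamma = 1 / 18    # rate at which the infected are removed (18-day illness)

# Illustrative constant effective reproduction number (an assumption)
R_const = 1.6

# Transmission rate implied by beta(t) = R(t) * gamma
beta = R_const * gamma

print(f"sigma = {sigma:.4f} per day, gamma = {gamma:.4f} per day")
print(f"beta  = {beta:.4f} per day when R = {R_const}")
```

Because 𝛾 is fixed, any change in 𝑅(𝑡) translates one-for-one into a proportional change in 𝛽(𝑡).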

1.3 Implementation

First we set the population size to match the US.

pop_size = 3.3e8

Next we fix parameters as described above.

γ = 1 / 18
σ = 1 / 5.2

Now we construct a function that represents 𝐹 in (1.2)

def F(x, t, R0=1.6):
    """
    Time derivative of the state vector.

        * x is the state vector (array_like)
        * t is time (scalar)
        * R0 is the effective reproduction number, defaulting to a constant

    """
    s, e, i = x

    # New exposure of susceptibles
    β = R0(t) * γ if callable(R0) else R0 * γ
    ne = β * s * i

    # Time derivatives
    ds = - ne
    de = ne - σ * e
    di = σ * e - γ * i

    return ds, de, di

Note that R0 can be either constant or a given function of time.


The initial conditions are set to


# initial conditions of s, e, i
i_0 = 1e-7
e_0 = 4 * i_0
s_0 = 1 - i_0 - e_0

In vector form the initial condition is

x_0 = s_0, e_0, i_0

We solve for the time path numerically using odeint, at a sequence of dates t_vec.

def solve_path(R0, t_vec, x_init=x_0):
    """
    Solve for i(t) and c(t) via numerical integration,
    given the time path for R0.

    """
    G = lambda x, t: F(x, t, R0)
    s_path, e_path, i_path = odeint(G, x_init, t_vec).transpose()

    c_path = 1 - s_path - e_path       # cumulative cases
    return i_path, c_path

1.4 Experiments

Let’s run some experiments using this code.


The time period we investigate will be 550 days, or around 18 months:

t_length = 550
grid_size = 1000
t_vec = np.linspace(0, t_length, grid_size)

1.4.1 Experiment 1: Constant R0 Case

Let’s start with the case where R0 is constant.


We calculate the time path of infected people under different assumptions for R0:

R0_vals = np.linspace(1.6, 3.0, 6)
labels = [f'$R0 = {r:.2f}$' for r in R0_vals]
i_paths, c_paths = [], []

for r in R0_vals:
    i_path, c_path = solve_path(r, t_vec)
    i_paths.append(i_path)
    c_paths.append(c_path)

Here’s some code to plot the time paths.


def plot_paths(paths, labels, times=t_vec):

    fig, ax = plt.subplots()

    for path, label in zip(paths, labels):
        ax.plot(times, path, label=label)

    ax.legend(loc='upper left')

    plt.show()

Let’s plot current cases as a fraction of the population.

plot_paths(i_paths, labels)

As expected, lower effective transmission rates defer the peak of infections.


They also lead to a lower peak in current cases.
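The two claims above can also be checked numerically rather than read off the plot. The sketch below is self-contained and restates the model with ASCII names; the helper `peak_of_infections` is hypothetical, and the peak is only located within the 550-day simulation window.

```python
import numpy as np
from scipy.integrate import odeint

gamma, sigma = 1 / 18, 1 / 5.2
t_vec = np.linspace(0, 550, 1000)
x_0 = (1 - 5e-7, 4e-7, 1e-7)        # (s, e, i) initial conditions from the text


def F(x, t, R0):
    s, e, i = x
    ne = R0 * gamma * s * i         # new exposures
    return -ne, ne - sigma * e, sigma * e - gamma * i


def peak_of_infections(R0):
    """Return (day, height) of the largest i(t) on the simulated grid."""
    i_path = odeint(F, x_0, t_vec, args=(R0,))[:, 2]
    k = np.argmax(i_path)
    return t_vec[k], i_path[k]


for R0 in (1.6, 3.0):
    day, height = peak_of_infections(R0)
    print(f"R0 = {R0}: largest i(t) is {height:.3f}, around day {day:.0f}")
```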
Here are cumulative cases, as a fraction of population:

plot_paths(c_paths, labels)


1.4.2 Experiment 2: Changing Mitigation

Let’s look at a scenario where mitigation (e.g., social distancing) is successively imposed.
Here’s a specification for R0 as a function of time.

def R0_mitigating(t, r0=3, η=1, r_bar=1.6):
    R0 = r0 * exp(- η * t) + (1 - exp(- η * t)) * r_bar
    return R0

The idea is that R0 starts off at 3 and falls to 1.6.


This is due to progressive adoption of stricter mitigation measures.
The parameter η controls the rate, or the speed at which restrictions are imposed.
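As a quick sanity check on this specification, the sketch below evaluates the same formula at a few dates. It is a standalone restatement that uses the ASCII name `eta` in place of η; the "half-life" `log(2) / eta` is an added interpretation, namely the time it takes for the gap between R0 and `r_bar` to halve.

```python
import numpy as np


def R0_mitigating(t, r0=3, eta=1, r_bar=1.6):
    # Weighted average of r0 and r_bar; the weight on r0 decays at rate eta
    return r0 * np.exp(-eta * t) + (1 - np.exp(-eta * t)) * r_bar


for eta in (1 / 5, 1 / 50):
    half_life = np.log(2) / eta     # days until the gap to r_bar has halved
    print(f"eta = {eta:.2f}: "
          f"R0(0) = {R0_mitigating(0, eta=eta):.2f}, "
          f"R0({half_life:.1f}) = {R0_mitigating(half_life, eta=eta):.2f}")
```

A smaller η therefore means a longer wait before R0 gets close to its long-run value of 1.6.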
We consider several different rates:

η_vals = 1/5, 1/10, 1/20, 1/50, 1/100


labels = [fr'$\eta = {η:.2f}$' for η in η_vals]

This is what the time path of R0 looks like at these alternative rates:

fig, ax = plt.subplots()

for η, label in zip(η_vals, labels):
    ax.plot(t_vec, R0_mitigating(t_vec, η=η), label=label)

ax.legend()
plt.show()


Let’s calculate the time path of infected people:

i_paths, c_paths = [], []

for η in η_vals:
    R0 = lambda t: R0_mitigating(t, η=η)
    i_path, c_path = solve_path(R0, t_vec)
    i_paths.append(i_path)
    c_paths.append(c_path)

These are current cases under the different scenarios:

plot_paths(i_paths, labels)

Here are cumulative cases, as a fraction of population:

plot_paths(c_paths, labels)


1.5 Ending Lockdown

The following replicates additional results by Andrew Atkeson on the timing of lifting lockdown.
Consider these two mitigation scenarios:
1. 𝑅𝑡 = 0.5 for 30 days and then 𝑅𝑡 = 2 for the remaining 17 months. This corresponds to lifting lockdown in 30
days.
2. 𝑅𝑡 = 0.5 for 120 days and then 𝑅𝑡 = 2 for the remaining 14 months. This corresponds to lifting lockdown in 4
months.
The parameters considered here start the model with 25,000 active infections and 75,000 agents already exposed to the
virus and thus soon to be contagious.

# initial conditions
i_0 = 25_000 / pop_size
e_0 = 75_000 / pop_size
s_0 = 1 - i_0 - e_0
x_0 = s_0, e_0, i_0

Let’s calculate the paths:

R0_paths = (lambda t: 0.5 if t < 30 else 2,
            lambda t: 0.5 if t < 120 else 2)

labels = [f'scenario {i}' for i in (1, 2)]

i_paths, c_paths = [], []

for R0 in R0_paths:
    i_path, c_path = solve_path(R0, t_vec, x_init=x_0)
    i_paths.append(i_path)
    c_paths.append(c_path)

Here is the number of active infections:


plot_paths(i_paths, labels)

What kind of mortality can we expect under these scenarios?


Suppose that 1% of cases result in death

ν = 0.01

This is the cumulative number of deaths:

paths = [path * ν * pop_size for path in c_paths]


plot_paths(paths, labels)

This is the daily death rate:


paths = [path * ν * γ * pop_size for path in i_paths]


plot_paths(paths, labels)
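One way to read the expression ν * γ * pop_size * i(t), offered here as an interpretation rather than something stated in the source: summing the three equations in (1.1) gives the flow into the removed state, and a fraction ν of that flow is assumed to die.

```latex
\dot r(t) = -\bigl(\dot s(t) + \dot e(t) + \dot i(t)\bigr) = \gamma\, i(t)
\quad\Longrightarrow\quad
\text{daily deaths at } t \;\approx\; \nu \,\gamma\, i(t) \times \text{pop\_size}
```

The first equality uses 𝑟 = 1 − 𝑠 − 𝑒 − 𝑖, and the second follows because the terms in 𝛽 and 𝜎 cancel when the equations in (1.1) are added.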

Pushing the peak of the curve further into the future may reduce cumulative deaths if a vaccine is found.



CHAPTER

TWO

LINEAR ALGEBRA

Contents

• Linear Algebra
– Overview
– Vectors
– Matrices
– Solving Systems of Equations
– Eigenvalues and Eigenvectors
– Further Topics
– Exercises

2.1 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to invest in.
For example, many applied problems in economics and finance require the solution of a linear system of equations, such
as
$$
\begin{aligned}
y_1 &= a x_1 + b x_2 \\
y_2 &= c x_1 + d x_2
\end{aligned}
$$

or, more generally,

$$
\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k \\
&\;\;\vdots \\
y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k
\end{aligned}
\tag{2.1}
$$

The objective here is to solve for the “unknowns” 𝑥1 , … , 𝑥𝑘 given 𝑎11 , … , 𝑎𝑛𝑘 and 𝑦1 , … , 𝑦𝑛 .
When considering such problems, it is essential that we first consider at least some of the following questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best “approximate” solution?
• If a solution exists, how should we compute it?


These are the kinds of topics addressed by linear algebra.


In this lecture we will cover the basics of linear and matrix algebra, treating both theory and computation.
We admit some overlap with an earlier lecture, where operations on NumPy arrays were first explained.
Note that this lecture is more theoretical than most, and contains background material that will be used in applications as
we go along.
Let’s start with some imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from scipy.linalg import inv, solve, det, eig

2.2 Vectors

A vector of length 𝑛 is just a sequence (or array, or tuple) of 𝑛 numbers, which we write as 𝑥 = (𝑥1 , … , 𝑥𝑛 ) or 𝑥 =
[𝑥1 , … , 𝑥𝑛 ].
We will write these sequences either horizontally or vertically as we please.
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between the two)
The set of all 𝑛-vectors is denoted by ℝ𝑛 .
For example, ℝ2 is the plane, and a vector in ℝ2 is just a point in the plane.
Traditionally, vectors are represented visually as arrows from the origin to the point.
The following figure represents three vectors in this manner

fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))
ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.7,
                                width=0.5))
    ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()


2.2.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we now describe.
As a matter of definition, when we add two vectors, we add them element-by-element

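$$
x + y =
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
+
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
:=
\begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}
$$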

Scalar multiplication is an operation that takes a number 𝛾 and a vector 𝑥 and produces

$$
\gamma x :=
\begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}
$$
Scalar multiplication is illustrated in the next figure

fig, ax = plt.subplots(figsize=(10, 8))
# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))
x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
            arrowprops=dict(facecolor='blue',
                            shrink=0,
                            alpha=1,
                            width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

for s in scalars:
    v = s * x
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.5,
                                width=0.5))
    ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()


In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more commonly represented
as a NumPy array.
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax

x = np.ones(3) # Vector of three ones


y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

array([3., 5., 7.])

4 * x

array([4., 4., 4.])


2.2.2 Inner Product and Norm

The inner product of vectors 𝑥, 𝑦 ∈ ℝ𝑛 is defined as


$$
x' y := \sum_{i=1}^n x_i y_i
$$

Two vectors are called orthogonal if their inner product is zero.


The norm of a vector 𝑥 represents its “length” (i.e., its distance from the zero vector) and is defined as
$$
\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^n x_i^2 \right)^{1/2}
$$

The expression ‖𝑥 − 𝑦‖ is thought of as the distance between 𝑥 and 𝑦.


Continuing on from the previous example, the inner product and norm can be computed as follows

np.sum(x * y) # Inner product of x and y

12.0

np.sqrt(np.sum(x**2)) # Norm of x, take one

1.7320508075688772

np.linalg.norm(x) # Norm of x, take two

1.7320508075688772
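Orthogonality and distance can be checked in the same way. The vectors `u` and `v` below are illustrative values chosen for this sketch, not taken from the text.

```python
import numpy as np

u = np.array((1.0, 1.0))
v = np.array((1.0, -1.0))

inner = np.dot(u, v)                 # inner product u'v
print(inner)
print(np.isclose(inner, 0.0))        # True means u and v are orthogonal

print(np.linalg.norm(u - v))         # distance between u and v
```

Here `u - v` equals (0, 2), so the distance is 2, while the inner product is 1 - 1 = 0.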

2.2.3 Span

Given a set of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 , it’s natural to think about the new vectors we can create by performing
linear operations.
New vectors created in this manner are called linear combinations of 𝐴.
In particular, 𝑦 ∈ ℝ𝑛 is a linear combination of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } if

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 for some scalars 𝛽1 , … , 𝛽𝑘

In this context, the values 𝛽1 , … , 𝛽𝑘 are called the coefficients of the linear combination.
The set of linear combinations of 𝐴 is called the span of 𝐴.
The next figure shows the span of 𝐴 = {𝑎1 , 𝑎2 } in ℝ3 .
The span is a two-dimensional plane passing through these two points and the origin.

ax = plt.figure(figsize=(10, 8)).add_subplot(projection='3d')

x_min, x_max = -5, 5
y_min, y_max = -5, 5

α, β = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),
       xticks=(0,), yticks=(0,), zticks=(0,))

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)


# Fixed linear function, to generate a plane
def f(x, y):
    return α * x + β * y

# Vector locations, by coordinate
x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
    ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
    x = (0, x_coords[i])
    y = (0, y_coords[i])
    z = (0, f(x_coords[i], y_coords[i]))
    ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)


# Draw the plane
grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
                linewidth=0, antialiased=True, alpha=0.2)
plt.show()


Examples

If 𝐴 contains only one vector 𝑎1 ∈ ℝ2 , then its span is just the scalar multiples of 𝑎1 , which is the unique line passing
through both 𝑎1 and the origin.
If 𝐴 = {𝑒1 , 𝑒2 , 𝑒3 } consists of the canonical basis vectors of ℝ3 , that is

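$$
e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad
e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad
e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
$$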

then the span of 𝐴 is all of ℝ3 , because, for any 𝑥 = (𝑥1 , 𝑥2 , 𝑥3 ) ∈ ℝ3 , we can write

𝑥 = 𝑥1 𝑒1 + 𝑥2 𝑒2 + 𝑥3 𝑒3

Now consider 𝐴0 = {𝑒1 , 𝑒2 , 𝑒1 + 𝑒2 }.


If 𝑦 = (𝑦1 , 𝑦2 , 𝑦3 ) is any linear combination of these vectors, then 𝑦3 = 0 (check it).


Hence 𝐴0 fails to span all of ℝ3 .
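The "check it" above can also be done numerically. The sketch below tests span membership by solving a least-squares problem and inspecting the residual. The helper `in_span` and its tolerance are hypothetical choices for this illustration, and the least-squares approach is one convenient numerical test rather than the method used in the text.

```python
import numpy as np


def in_span(vectors, y, tol=1e-10):
    """Return True if y is (numerically) a linear combination of `vectors`."""
    M = np.column_stack(vectors)                    # the vectors become columns
    coeffs, *_ = np.linalg.lstsq(M, y, rcond=None)  # best-fitting coefficients
    return np.linalg.norm(M @ coeffs - y) < tol     # @ is matrix multiplication


e1, e2, e3 = np.identity(3)          # canonical basis vectors of R^3
A0 = (e1, e2, e1 + e2)

print(in_span(A0, np.array((1.0, 2.0, 0.0))))   # third coordinate is zero
print(in_span(A0, e3))                          # third coordinate is nonzero
print(in_span((e1, e2, e3), e3))                # the full basis spans R^3
```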

2.2.4 Linear Independence

As we’ll see, it’s often desirable to find families of vectors with relatively large span, so that many vectors can be described
by linear operators on a few vectors.
The condition we need for a set of vectors to have a large span is what’s called linear independence.
In particular, a collection of vectors 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } in ℝ𝑛 is said to be
• linearly dependent if some strict subset of 𝐴 has the same span as 𝐴.
• linearly independent if it is not linearly dependent.
Put differently, a set of vectors is linearly independent if no vector is redundant to the span and linearly dependent
otherwise.
To illustrate the idea, recall the figure that showed the span of vectors {𝑎1 , 𝑎2 } in ℝ3 as a plane through the origin.
If we take a third vector 𝑎3 and form the set {𝑎1 , 𝑎2 , 𝑎3 }, this set will be
• linearly dependent if 𝑎3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since ℝ𝑛 can be spanned by 𝑛 vectors (see the discussion of canonical basis vectors
above), any collection of 𝑚 > 𝑛 vectors in ℝ𝑛 must be linearly dependent.
The following statements are equivalent to linear independence of 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛
1. No vector in 𝐴 can be formed as a linear combination of the other elements.
2. If 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘 = 0 for scalars 𝛽1 , … , 𝛽𝑘 , then 𝛽1 = ⋯ = 𝛽𝑘 = 0.
(The zero in the first expression is the origin of ℝ𝑛 )
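In numerical work, a common way to test linear independence is to stack the vectors as columns and compare the rank of the resulting matrix with the number of vectors. The sketch below applies this to the two vectors from the span figure; the helper name `is_linearly_independent` and the third vectors are assumptions made for illustration.

```python
import numpy as np


def is_linearly_independent(vectors):
    """Compare the rank of the stacked matrix with the number of vectors."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]


# The two vectors drawn in the span figure (z = 0.2 x + 0.1 y)
a1 = np.array((3.0, 4.0, 1.0))
a2 = np.array((3.0, -4.0, 0.2))

in_plane = 2 * a1 - a2                  # lies in the span of a1 and a2
off_plane = np.array((0.0, 0.0, 1.0))   # does not satisfy z = 0.2 x + 0.1 y

print(is_linearly_independent((a1, a2)))
print(is_linearly_independent((a1, a2, in_plane)))
print(is_linearly_independent((a1, a2, off_plane)))
```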

2.2.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span has a unique representation
as a linear combination of these vectors.
In other words, if 𝐴 ∶= {𝑎1 , … , 𝑎𝑘 } ⊂ ℝ𝑛 is linearly independent and

𝑦 = 𝛽1 𝑎1 + ⋯ + 𝛽𝑘 𝑎𝑘

then no other coefficient sequence 𝛾1 , … , 𝛾𝑘 will produce the same vector 𝑦.


Indeed, if we also have 𝑦 = 𝛾1 𝑎1 + ⋯ + 𝛾𝑘 𝑎𝑘 , then

(𝛽1 − 𝛾1 )𝑎1 + ⋯ + (𝛽𝑘 − 𝛾𝑘 )𝑎𝑘 = 0

Linear independence now implies 𝛾𝑖 = 𝛽𝑖 for all 𝑖.
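A small numerical illustration of uniqueness, under the assumption of two hand-picked independent vectors in the plane: build 𝑦 from known coefficients, then recover them by solving the linear system whose columns are the vectors.

```python
import numpy as np

a1 = np.array((1.0, 1.0))
a2 = np.array((1.0, -1.0))
M = np.column_stack((a1, a2))        # linearly independent columns

beta = np.array((2.0, 0.5))          # coefficients chosen for illustration
y = beta[0] * a1 + beta[1] * a2

recovered = np.linalg.solve(M, y)    # the only coefficients that reproduce y
print(recovered)
```

If the columns of `M` were linearly dependent, `np.linalg.solve` would have no unique answer to return, which mirrors the argument above.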


2.3 Matrices

Matrices are a neat way of organizing data for use in linear operations.
An 𝑛 × 𝑘 matrix is a rectangular array 𝐴 of numbers with 𝑛 rows and 𝑘 columns:

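$$
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nk}
\end{bmatrix}
$$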

Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the start of this
lecture.
For obvious reasons, the matrix 𝐴 is also called a vector if either 𝑛 = 1 or 𝑘 = 1.
In the former case, 𝐴 is called a row vector, while in the latter it is called a column vector.
If 𝑛 = 𝑘, then 𝐴 is called square.
The matrix formed by replacing 𝑎𝑖𝑗 by 𝑎𝑗𝑖 for every 𝑖 and 𝑗 is called the transpose of 𝐴 and denoted 𝐴′ or 𝐴⊤ .
If 𝐴 = 𝐴′ , then 𝐴 is called symmetric.
For a square matrix 𝐴, the 𝑛 elements of the form 𝑎𝑖𝑖 for 𝑖 = 1, … , 𝑛 are called the principal diagonal.
𝐴 is called diagonal if the only nonzero entries are on the principal diagonal.
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then 𝐴 is called the identity matrix
and denoted by 𝐼.

2.3.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices.
Scalar multiplication and addition are immediate generalizations of the vector case:

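$$
\gamma A =
\gamma
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
\gamma a_{11} & \cdots & \gamma a_{1k} \\
\vdots & \vdots & \vdots \\
\gamma a_{n1} & \cdots & \gamma a_{nk}
\end{bmatrix}
$$

and

$$
A + B =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & \cdots & b_{1k} \\
\vdots & \vdots & \vdots \\
b_{n1} & \cdots & b_{nk}
\end{bmatrix}
:=
\begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1k} + b_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} + b_{n1} & \cdots & a_{nk} + b_{nk}
\end{bmatrix}
$$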

In the latter case, the matrices must have the same shape in order for the definition to make sense.
We also have a convention for multiplying two matrices.
The rule for matrix multiplication generalizes the idea of inner products discussed above and is designed to make multiplication play well with basic linear operations.
If 𝐴 and 𝐵 are two matrices, then their product 𝐴𝐵 is formed by taking as its 𝑖, 𝑗-th element the inner product of the 𝑖-th
row of 𝐴 and the 𝑗-th column of 𝐵.
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the Wikipedia page.
If 𝐴 is 𝑛 × 𝑘 and 𝐵 is 𝑗 × 𝑚, then to multiply 𝐴 and 𝐵 we require 𝑘 = 𝑗, and the resulting matrix 𝐴𝐵 is 𝑛 × 𝑚.
As perhaps the most important special case, consider multiplying 𝑛 × 𝑘 matrix 𝐴 and 𝑘 × 1 column vector 𝑥.


According to the preceding rule, this gives us an 𝑛 × 1 column vector

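$$
A x =
\begin{bmatrix}
a_{11} & \cdots & a_{1k} \\
\vdots & \vdots & \vdots \\
a_{n1} & \cdots & a_{nk}
\end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}
:=
\begin{bmatrix}
a_{11} x_1 + \cdots + a_{1k} x_k \\
\vdots \\
a_{n1} x_1 + \cdots + a_{nk} x_k
\end{bmatrix}
\tag{2.2}
$$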

Note: 𝐴𝐵 and 𝐵𝐴 are not generally the same thing.

Another important special case is the identity matrix.


You should check that if 𝐴 is 𝑛 × 𝑘 and 𝐼 is the 𝑘 × 𝑘 identity matrix, then 𝐴𝐼 = 𝐴.
If 𝐼 is the 𝑛 × 𝑛 identity matrix, then 𝐼𝐴 = 𝐴.

2.3.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard matrix operations (see footnote 1).
You can create them manually from tuples of tuples (or lists of lists) as follows

A = ((1, 2),
(3, 4))

type(A)

tuple

A = np.array(A)

type(A)

numpy.ndarray

A.shape

(2, 2)

The shape attribute is a tuple giving the number of rows and columns — see here for more discussion.
To get the transpose of A, use A.transpose() or, more simply, A.T.
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.) — see here.
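The definitions of transpose, symmetric, diagonal and identity matrices given earlier map directly onto a handful of NumPy calls. The following sketch uses small matrices chosen only for illustration.

```python
import numpy as np

A = np.array(((1, 2),
              (3, 4)))

print(A.T)                           # transpose: rows and columns swapped
print(np.array_equal(A, A.T))        # False, so A is not symmetric

S = A + A.T                          # A + A' is always symmetric
print(np.array_equal(S, S.T))

print(np.diag(A))                    # principal diagonal of A
D = np.diag((2.0, 5.0))              # diagonal matrix with the given diagonal
I = np.identity(2)                   # 2 x 2 identity matrix
print(D, I, sep="\n")
```

Note that `np.diag` plays two roles: given a matrix it extracts the principal diagonal, and given a one-dimensional sequence it builds a diagonal matrix.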
Since operations are performed elementwise by default, scalar multiplication and addition have very natural syntax

A = np.identity(3)
B = np.ones((3, 3))
2 * A

1 Although there is a specialized matrix data type defined in NumPy, it’s more standard to work with ordinary NumPy arrays. See this discussion.


array([[2., 0., 0.],
       [0., 2., 0.],
       [0., 0., 2.]])

A + B

array([[2., 1., 1.],
       [1., 2., 1.],
       [1., 1., 2.]])

To multiply matrices we use the @ symbol.


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multiplication.
See here for more discussion.
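The rules for matrix multiplication stated earlier (conformable shapes, entries as inner products, non-commutativity, and the role of the identity) can be confirmed with `@`. The matrices below are arbitrary illustrative values.

```python
import numpy as np

A = np.array(((1.0, 2.0, 0.0),
              (0.0, 1.0, 3.0)))      # 2 x 3
B = np.array(((1.0, 0.0),
              (2.0, 1.0),
              (0.0, 4.0)))           # 3 x 2

print((A @ B).shape)                 # inner dimensions match: result is 2 x 2
print((B @ A).shape)                 # also defined here, but it is 3 x 3

# Entry (i, j) of AB is the inner product of row i of A and column j of B
i, j = 0, 1
print(np.isclose((A @ B)[i, j], A[i, :] @ B[:, j]))

# Even square matrices of the same shape need not commute
C = np.array(((0.0, 1.0), (0.0, 0.0)))
D = np.array(((0.0, 0.0), (1.0, 0.0)))
print(np.array_equal(C @ D, D @ C))

# Multiplying by a conformable identity matrix leaves A unchanged
print(np.array_equal(A @ np.identity(3), A))
print(np.array_equal(np.identity(2) @ A, A))
```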

2.3.3 Matrices as Maps

Each 𝑛 × 𝑘 matrix 𝐴 can be identified with a function 𝑓(𝑥) = 𝐴𝑥 that maps 𝑥 ∈ ℝ𝑘 into 𝑦 = 𝐴𝑥 ∈ ℝ𝑛 .
These kinds of functions have a special property: they are linear.
A function 𝑓 ∶ ℝ𝑘 → ℝ𝑛 is called linear if, for all 𝑥, 𝑦 ∈ ℝ𝑘 and all scalars 𝛼, 𝛽, we have

𝑓(𝛼𝑥 + 𝛽𝑦) = 𝛼𝑓(𝑥) + 𝛽𝑓(𝑦)

You can check that this holds for the function 𝑓(𝑥) = 𝐴𝑥 + 𝑏 when 𝑏 is the zero vector and fails when 𝑏 is nonzero.
In fact, it’s known that 𝑓 is linear if and only if there exists a matrix 𝐴 such that 𝑓(𝑥) = 𝐴𝑥 for all 𝑥.
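The linearity check suggested above can be run numerically. This sketch draws an arbitrary matrix and vectors with a seeded random generator, so every specific value is an assumption made for illustration; `f` is the linear map and `g` adds a nonzero shift `b`.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded so the check is reproducible
A = rng.standard_normal((3, 2))      # arbitrary 3 x 2 matrix
b = np.ones(3)                       # nonzero shift
x, y = rng.standard_normal(2), rng.standard_normal(2)
alpha, beta = 2.0, -0.5


def f(v):
    return A @ v                     # linear map


def g(v):
    return A @ v + b                 # affine map with nonzero b


print(np.allclose(f(alpha * x + beta * y), alpha * f(x) + beta * f(y)))
print(np.allclose(g(alpha * x + beta * y), alpha * g(x) + beta * g(y)))
```

For `g`, the two sides differ by (1 − α − β) b, which vanishes only when b = 0 or α + β = 1.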

2.4 Solving Systems of Equations

Recall again the system of equations (2.1).


If we compare (2.1) and (2.2), we see that (2.1) can now be written more conveniently as

𝑦 = 𝐴𝑥 (2.3)

The problem we face is to determine a vector 𝑥 ∈ ℝ𝑘 that solves (2.3), taking 𝑦 and 𝐴 as given.
This is a special case of a more general problem: Find an 𝑥 such that 𝑦 = 𝑓(𝑥).
Given an arbitrary function 𝑓 and a 𝑦, is there always an 𝑥 such that 𝑦 = 𝑓(𝑥)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

def f(x):
    return 0.6 * np.cos(4 * x) + 1.4


xmin, xmax = -1, 1
x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),
           yticks=(), xticks=())

    ax.plot(x, y, 'k-', lw=2, label='$f$')
    ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
    ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
    ax.text(0.04, -0.3, '$0$', fontsize=16)

ax = axes[0]

ax.legend(loc='upper right', frameon=False)
ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
    ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
    ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()


In the first plot, there are multiple solutions, as the function is not one-to-one, while in the second there are no solutions,
since 𝑦 lies outside the range of 𝑓.
Can we impose conditions on 𝐴 in (2.3) that rule out these problems?
In this context, the most important thing to recognize about the expression 𝐴𝑥 is that it corresponds to a linear combination
of the columns of 𝐴.
In particular, if 𝑎1 , … , 𝑎𝑘 are the columns of 𝐴, then

𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘

Hence the range of 𝑓(𝑥) = 𝐴𝑥 is exactly the span of the columns of 𝐴.
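The identity 𝐴𝑥 = 𝑥1 𝑎1 + ⋯ + 𝑥𝑘 𝑎𝑘 is easy to confirm on a small example. The matrix and vector below are illustrative values only.

```python
import numpy as np

A = np.array(((1.0, 2.0),
              (3.0, 4.0),
              (5.0, 6.0)))           # columns a_1 and a_2 are vectors in R^3
x = np.array((10.0, -1.0))

combo = x[0] * A[:, 0] + x[1] * A[:, 1]   # x_1 a_1 + x_2 a_2
print(A @ x)
print(combo)
```

Both lines print the same vector, so every vector of the form 𝐴𝑥 lies in the span of the columns of 𝐴.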


We want the range to be large so that it contains arbitrary 𝑦.
As you might recall, the condition that we want for the span to be large is linear independence.
A happy fact is that linear independence of the columns of 𝐴 also gives us uniqueness.


Indeed, it follows from our earlier discussion that if {𝑎1 , … , 𝑎𝑘 } are linearly independent and 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 +⋯+𝑥𝑘 𝑎𝑘 ,
then no 𝑧 ≠ 𝑥 satisfies 𝑦 = 𝐴𝑧.

2.4.1 The Square Matrix Case

Let’s discuss some more details, starting with the case where 𝐴 is 𝑛 × 𝑛.
This is the familiar case where the number of unknowns equals the number of equations.
For arbitrary 𝑦 ∈ ℝ𝑛 , we hope to find a unique 𝑥 ∈ ℝ𝑛 such that 𝑦 = 𝐴𝑥.
In view of the observations immediately above, if the columns of 𝐴 are linearly independent, then their span, and hence
the range of 𝑓(𝑥) = 𝐴𝑥, is all of ℝ𝑛 .
Hence there always exists an 𝑥 such that 𝑦 = 𝐴𝑥.
Moreover, the solution is unique.
In particular, the following are equivalent
1. The columns of 𝐴 are linearly independent.
2. For any 𝑦 ∈ ℝ𝑛 , the equation 𝑦 = 𝐴𝑥 has a unique solution.
The property of having linearly independent columns is sometimes expressed as having full column rank.

Inverse Matrices

Can we give some sort of expression for the solution?


If 𝑦 and 𝐴 are scalar with 𝐴 ≠ 0, then the solution is 𝑥 = 𝐴−1 𝑦.
A similar expression is available in the matrix case.
In particular, if square matrix 𝐴 has full column rank, then it possesses a multiplicative inverse matrix 𝐴−1 , with the
property that 𝐴𝐴−1 = 𝐴−1 𝐴 = 𝐼.
As a consequence, if we pre-multiply both sides of 𝑦 = 𝐴𝑥 by 𝐴−1 , we get 𝑥 = 𝐴−1 𝑦.
This is the solution that we’re looking for.

Determinants

Another quick comment about square matrices is that to every such matrix we assign a unique number called the determinant of the matrix (you can find the expression for it here).
If the determinant of 𝐴 is not zero, then we say that 𝐴 is nonsingular.
Perhaps the most important fact about determinants is that 𝐴 is nonsingular if and only if 𝐴 is of full column rank.
This gives us a useful one-number summary of whether or not a square matrix can be inverted.
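The link between the determinant and column rank can be seen on two small matrices. In this sketch `B` is a made-up singular example whose second column is twice the first.

```python
import numpy as np
from scipy.linalg import det

A = np.array(((1.0, 2.0),
              (3.0, 4.0)))           # linearly independent columns
B = np.array(((1.0, 2.0),
              (2.0, 4.0)))           # second column = 2 * first column

for name, M in (("A", A), ("B", B)):
    print(name, "det =", det(M), "rank =", np.linalg.matrix_rank(M))
```

In floating-point arithmetic a determinant that is merely close to zero is hard to interpret, so in practice the rank (or the condition number) is often a more reliable diagnostic than testing `det(M) == 0`.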


2.4.2 More Rows than Columns

This is the 𝑛 × 𝑘 case with 𝑛 > 𝑘.


This case is very important in many settings, not least in the setting of linear regression (where 𝑛 is the number of
observations, and 𝑘 is the number of explanatory variables).
Given arbitrary 𝑦 ∈ ℝ𝑛 , we seek an 𝑥 ∈ ℝ𝑘 such that 𝑦 = 𝐴𝑥.
In this setting, the existence of a solution is highly unlikely.
Without much loss of generality, let’s go over the intuition focusing on the case where the columns of 𝐴 are linearly
independent.
It follows that the span of the columns of 𝐴 is a 𝑘-dimensional subspace of ℝ𝑛 .
This span is very “unlikely” to contain arbitrary 𝑦 ∈ ℝ𝑛 .
To see why, recall the figure above, where 𝑘 = 2 and 𝑛 = 3.
Imagine an arbitrarily chosen 𝑦 ∈ ℝ3 , located somewhere in that three-dimensional space.
What’s the likelihood that 𝑦 lies in the span of {𝑎1 , 𝑎2 } (i.e., the two dimensional plane through these points)?
In a sense, it must be very small, since this plane has zero “thickness”.
As a result, in the 𝑛 > 𝑘 case we usually give up on existence.
However, we can still seek the best approximation, for example, an 𝑥 that makes the distance ‖𝑦 − 𝐴𝑥‖ as small as
possible.
To solve this problem, one can use either calculus or the theory of orthogonal projections.
The solution is known to be 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦 — see for example chapter 3 of these notes.
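As a sketch of how this formula behaves, the code below builds a random overdetermined system (all data are simulated assumptions), computes the closed-form expression by solving (𝐴′𝐴)𝑥 = 𝐴′𝑦, and compares it with NumPy's least-squares routine. It also checks the orthogonal projection property: the residual is orthogonal to every column of 𝐴.

```python
import numpy as np

rng = np.random.default_rng(1)       # seeded for reproducibility
n, k = 20, 3
A = rng.standard_normal((n, k))      # n > k
y = rng.standard_normal(n)

# Closed form from the text, computed by solving (A'A) x = A'y
x_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Library least-squares routine for comparison
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

residual = y - A @ x_hat
print(np.allclose(x_hat, x_lstsq))
print(np.allclose(A.T @ residual, 0))   # residual is orthogonal to the columns
```

Solving the linear system is used here instead of forming the inverse of 𝐴′𝐴 explicitly, which is a design choice for numerical stability rather than a change to the formula.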

2.4.3 More Columns than Rows

This is the 𝑛 × 𝑘 case with 𝑛 < 𝑘, so there are fewer equations than unknowns.
In this case there are either no solutions or infinitely many — in other words, uniqueness never holds.
For example, consider the case where 𝑘 = 3 and 𝑛 = 2.
Thus, the columns of 𝐴 consist of 3 vectors in ℝ2 .
This set can never be linearly independent, since it is possible to find two vectors that span ℝ2 .
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two.
For example, let’s say that 𝑎1 = 𝛼𝑎2 + 𝛽𝑎3 .
Then if 𝑦 = 𝐴𝑥 = 𝑥1 𝑎1 + 𝑥2 𝑎2 + 𝑥3 𝑎3 , we can also write

𝑦 = 𝑥1 (𝛼𝑎2 + 𝛽𝑎3 ) + 𝑥2 𝑎2 + 𝑥3 𝑎3 = (𝑥1 𝛼 + 𝑥2 )𝑎2 + (𝑥1 𝛽 + 𝑥3 )𝑎3

In other words, uniqueness fails.
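A concrete instance of this failure, using a hand-picked 2 × 3 matrix whose third column is the sum of the first two (an assumption made for illustration):

```python
import numpy as np

A = np.array(((1.0, 0.0, 1.0),
              (0.0, 1.0, 1.0)))      # 2 x 3, with a_3 = a_1 + a_2

x_a = np.array((1.0, 1.0, 0.0))
x_b = np.array((0.0, 0.0, 1.0))

print(A @ x_a, A @ x_b)              # two different x vectors, same y

# Any multiple of d = x_a - x_b can be added without changing A x
d = x_a - x_b
print(A @ d)
print(A @ (x_a + 3.5 * d))
```

Since 𝐴𝑑 = 0, every vector of the form `x_a + c * d` solves the same system, which gives infinitely many solutions.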


2.4.4 Linear Equations with SciPy

Here’s an illustration of how to solve linear equations with SciPy’s linalg submodule.
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code

A = ((1, 2), (3, 4))


A = np.array(A)
y = np.ones((2, 1)) # Column vector
det(A) # Check that A is nonsingular, and hence invertible

-2.0

A_inv = inv(A) # Compute the inverse


A_inv

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

x = A_inv @ y # Solution
A @ x # Should equal y

array([[1.],
[1.]])

solve(A, y) # Produces the same solution

array([[-1.],
[ 1.]])

Observe how we can solve for $x = A^{-1} y$ either via inv(A) @ y or using solve(A, y).
The latter method uses a different algorithm (LU decomposition) that is numerically more stable, and hence should almost always be preferred.
To obtain the least-squares solution 𝑥̂ = (𝐴′ 𝐴)−1 𝐴′ 𝑦, use scipy.linalg.lstsq(A, y).
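As a rough illustration of the least-squares case, the sketch below uses a hypothetical overdetermined system (three equations, two unknowns; the numbers are illustrative, not from the text) and compares scipy.linalg.lstsq with a direct solve of the normal equations $(A'A)x = A'y$.

```python
import numpy as np
from scipy.linalg import lstsq, solve

# Hypothetical overdetermined system: n = 3 equations, k = 2 unknowns
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 2.0])

# Least-squares solution from SciPy (also returns residues, rank, singular values)
x_hat, residues, rank, sing_vals = lstsq(A, y)

# The same solution obtained from the normal equations (A'A) x = A'y
x_normal = solve(A.T @ A, A.T @ y)

print(x_hat, x_normal)
```

In practice lstsq is usually the safer choice, since forming 𝐴′𝐴 explicitly can worsen the conditioning of the problem.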

2.5 Eigenvalues and Eigenvectors

Let 𝐴 be an 𝑛 × 𝑛 square matrix.


If 𝜆 is scalar and 𝑣 is a non-zero vector in ℝ𝑛 such that

𝐴𝑣 = 𝜆𝑣

then we say that 𝜆 is an eigenvalue of 𝐴, and 𝑣 is an eigenvector.


Thus, an eigenvector of 𝐴 is a vector such that when the map 𝑓(𝑥) = 𝐴𝑥 is applied, 𝑣 is merely scaled.
The next figure shows two eigenvectors (blue arrows) and their images under 𝐴 (red arrows).
As expected, the image 𝐴𝑣 of each 𝑣 is just a scaled version of the original.


A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)

xmin, xmax = -3, 3
ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

# Plot each eigenvector
for v in evecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the image of each eigenvector
for v in evecs:
    v = A @ v
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the lines they run through
x = np.linspace(xmin, xmax, 3)
for v in evecs:
    a = v[1] / v[0]
    ax.plot(x, a * x, 'b-', lw=0.4)

plt.show()


The eigenvalue equation is equivalent to (𝐴 − 𝜆𝐼)𝑣 = 0, and this has a nonzero solution 𝑣 only when the columns of
𝐴 − 𝜆𝐼 are linearly dependent.
This in turn is equivalent to stating that the determinant is zero.
Hence to find all eigenvalues, we can look for 𝜆 such that the determinant of 𝐴 − 𝜆𝐼 is zero.
This problem can be expressed as one of solving for the roots of a polynomial in 𝜆 of degree 𝑛.
This in turn implies the existence of 𝑛 solutions in the complex plane, although some might be repeated.
Some nice facts about the eigenvalues of a square matrix 𝐴 are as follows:
1. The determinant of 𝐴 equals the product of the eigenvalues.
2. The trace of 𝐴 (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues.
3. If 𝐴 is symmetric, then all of its eigenvalues are real.
4. If 𝐴 is invertible and 𝜆1 , … , 𝜆𝑛 are its eigenvalues, then the eigenvalues of 𝐴−1 are 1/𝜆1 , … , 1/𝜆𝑛 .
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues are nonzero.
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows:

A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evals

array([ 3.+0.j, -1.+0.j])

evecs

array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])

Note that the columns of evecs are the eigenvectors.


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check it), the eig routine normalizes
the length of each eigenvector to one.
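The facts listed above about determinants, traces, symmetry and inverses can be checked numerically. The snippet below is a small sketch for the same 2 × 2 matrix used in this section; the variable names are illustrative.

```python
import numpy as np
from scipy.linalg import eig, det, inv

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
evals, evecs = eig(A)

# Fact 1: the determinant equals the product of the eigenvalues
det_matches = np.isclose(det(A), np.prod(evals).real)

# Fact 2: the trace equals the sum of the eigenvalues
trace_matches = np.isclose(np.trace(A), np.sum(evals).real)

# Fact 3: A is symmetric, so the imaginary parts vanish
all_real = np.allclose(evals.imag, 0.0)

# Fact 4: the eigenvalues of the inverse are the reciprocals
inv_evals = eig(inv(A))[0]
inverse_matches = np.allclose(np.sort(inv_evals.real), np.sort(1 / evals.real))

# eig returns eigenvectors (the columns of evecs) normalized to unit length
unit_length = np.allclose(np.linalg.norm(evecs, axis=0), 1.0)

print(det_matches, trace_matches, all_real, inverse_matches, unit_length)
```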

2.5.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices 𝐴 and 𝐵, seeks generalized
eigenvalues 𝜆 and eigenvectors 𝑣 such that

𝐴𝑣 = 𝜆𝐵𝑣

This can be solved in SciPy via scipy.linalg.eig(A, B).


Of course, if 𝐵 is square and invertible, then we can treat the generalized eigenvalue problem as an ordinary eigenvalue
problem 𝐵−1 𝐴𝑣 = 𝜆𝑣, but this is not always the case.
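To make the generalized problem concrete, here is a minimal sketch with a hypothetical invertible 𝐵 (chosen only for illustration). It checks that each pair returned by scipy.linalg.eig(A, B) satisfies 𝐴𝑣 = 𝜆𝐵𝑣 and that the eigenvalues agree with those of $B^{-1}A$.

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # hypothetical invertible matrix

# Generalized eigenvalues and (right) eigenvectors
gen_evals, gen_evecs = eig(A, B)

# Each column v of gen_evecs should satisfy A v = lambda B v
checks = [np.allclose(A @ v, lam * (B @ v))
          for lam, v in zip(gen_evals, gen_evecs.T)]

# Because B is invertible here, the ordinary problem for B^{-1} A
# has the same eigenvalues
ord_evals = eig(np.linalg.solve(B, A))[0]

print(checks)
print(np.sort(gen_evals.real), np.sort(ord_evals.real))
```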

2.6 Further Topics

We round out our discussion by briefly mentioning several other important topics.

2.6.1 Series Expansions



Recall the usual summation formula for a geometric progression, which states that if $|a| < 1$, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$.
A generalization of this idea exists in the matrix setting.

Matrix Norms

Let 𝐴 be a square matrix, and let

$$\|A\| := \max_{\|x\| = 1} \|Ax\|$$

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix norm — in
this case, the so-called spectral norm.


For example, for a square matrix 𝑆, the condition ‖𝑆‖ < 1 means that 𝑆 is contractive, in the sense that it pulls all vectors towards the origin (see footnote 2).

Neumann’s Theorem

Let 𝐴 be a square matrix and let $A^k := A A^{k-1}$ with $A^1 := A$.

In other words, $A^k$ is the 𝑘-th power of 𝐴.
Neumann’s theorem states the following: If $\|A^k\| < 1$ for some $k \in \mathbb{N}$, then $I - A$ is invertible, and

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \tag{2.4}$$

Spectral Radius

A result known as Gelfand’s formula tells us that, for any square matrix 𝐴,

$$\rho(A) = \lim_{k \to \infty} \|A^k\|^{1/k}$$

Here 𝜌(𝐴) is the spectral radius, defined as $\max_i |\lambda_i|$, where $\{\lambda_i\}_i$ is the set of eigenvalues of 𝐴.
As a consequence of Gelfand’s formula, if all eigenvalues are strictly less than one in modulus, there exists a 𝑘 with $\|A^k\| < 1$, in which case (2.4) is valid.
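The role of the spectral radius can be illustrated with a small sketch. The matrix below is hypothetical: its spectral norm exceeds one, yet its spectral radius is 0.5, so some power has norm below one and the partial sums of the Neumann series approach $(I - A)^{-1}$.

```python
import numpy as np

# Hypothetical matrix with spectral norm > 1 but spectral radius 0.5
A = np.array([[0.5, 2.0],
              [0.0, 0.5]])

spectral_norm = np.linalg.norm(A, 2)
rho = np.max(np.abs(np.linalg.eigvals(A)))

# Some power of A has norm below one, as Gelfand's formula suggests
norm_A50 = np.linalg.norm(np.linalg.matrix_power(A, 50), 2)

# Partial sum I + A + A^2 + ... + A^199 of the Neumann series
S = np.zeros_like(A)
A_power = np.eye(2)
for k in range(200):
    S += A_power
    A_power = A_power @ A

target = np.linalg.inv(np.eye(2) - A)
print(spectral_norm, rho, norm_A50)
print(np.abs(S - target).max())
```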

2.6.2 Positive Definite Matrices

Let 𝐴 be a symmetric 𝑛 × 𝑛 matrix.


We say that 𝐴 is
1. positive definite if $x' A x > 0$ for every $x \in \mathbb{R}^n \setminus \{0\}$
2. positive semi-definite or nonnegative definite if 𝑥′ 𝐴𝑥 ≥ 0 for every 𝑥 ∈ ℝ𝑛
Analogous definitions exist for negative definite and negative semi-definite matrices.
It is notable that if 𝐴 is positive definite, then all of its eigenvalues are strictly positive, and hence 𝐴 is invertible (with
positive definite inverse).
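One common way to test positive definiteness numerically is to inspect the eigenvalues of the symmetric matrix, or to attempt a Cholesky factorization. The sketch below does both for a hypothetical symmetric matrix chosen for illustration.

```python
import numpy as np

# Hypothetical symmetric matrix (illustrative values)
A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])

# For symmetric matrices, eigvalsh returns real eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(A)
is_pd = bool(np.all(eigenvalues > 0))

# np.linalg.cholesky raises LinAlgError when A is not positive definite,
# so a successful factorization is a second confirmation
L = np.linalg.cholesky(A)

# The inverse of a positive definite matrix is again positive definite
inv_is_pd = bool(np.all(np.linalg.eigvalsh(np.linalg.inv(A)) > 0))

print(eigenvalues, is_pd, inv_is_pd)
```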

2.6.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let


• 𝑧, 𝑥 and 𝑎 all be 𝑛 × 1 vectors
• 𝐴 be an 𝑛 × 𝑛 matrix
• 𝐵 be an 𝑚 × 𝑛 matrix and 𝑦 be an 𝑚 × 1 vector
Then
1. $\frac{\partial a' x}{\partial x} = a$
2. $\frac{\partial A x}{\partial x} = A'$
3. $\frac{\partial x' A x}{\partial x} = (A + A') x$
4. $\frac{\partial y' B z}{\partial y} = B z$
5. $\frac{\partial y' B z}{\partial B} = y z'$

Footnote 2: Suppose that $\|S\| < 1$. Take any nonzero vector $x$, and let $r := \|x\|$. We have $\|Sx\| = r \|S(x/r)\| \le r \|S\| < r = \|x\|$. Hence every point is pulled towards the origin.
Exercise 2.7.1 below asks you to apply these formulas.
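A formula such as $\partial x'Ax / \partial x = (A + A')x$ can be sanity-checked with finite differences. The following is a sketch under stated assumptions: the matrix, the vector and the helper `numerical_gradient` are all hypothetical and introduced only for this check.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))  # hypothetical, not necessarily symmetric
x = rng.standard_normal(n)

def quad_form(x):
    "The quadratic form x'Ax."
    return x @ A @ x

def numerical_gradient(f, x, h=1e-6):
    "Central finite-difference approximation of the gradient of f at x."
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

analytic = (A + A.T) @ x
numeric = numerical_gradient(quad_form, x)
print(np.abs(analytic - numeric).max())
```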

2.6.4 Further Reading

The documentation of the scipy.linalg submodule can be found here.


Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the same lines as above, with solved exercises.
If you don’t mind a slightly abstract approach, a nice intermediate-level text on linear algebra is [Jänich, 1994].

2.7 Exercises

Exercise 2.7.1
Let 𝑥 be a given 𝑛 × 1 vector and consider the problem

$$v(x) = \max_{y, u} \{ -y' P y - u' Q u \}$$

subject to the linear constraint

𝑦 = 𝐴𝑥 + 𝐵𝑢

Here
• 𝑃 is an 𝑛 × 𝑛 matrix and 𝑄 is an 𝑚 × 𝑚 matrix
• 𝐴 is an 𝑛 × 𝑛 matrix and 𝐵 is an 𝑛 × 𝑚 matrix
• both 𝑃 and 𝑄 are symmetric and positive semidefinite
(What must the dimensions of 𝑦 and 𝑢 be to make this a well-posed problem?)
One way to solve the problem is to form the Lagrangian

ℒ = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

where 𝜆 is an 𝑛 × 1 vector of Lagrange multipliers.


Try applying the formulas given above for differentiating quadratic and linear forms to obtain the first-order conditions
for maximizing ℒ with respect to 𝑦, 𝑢 and minimizing it with respect to 𝜆.
Show that these conditions imply that
1. 𝜆 = −2𝑃 𝑦.
2. The optimizing choice of 𝑢 satisfies 𝑢 = −(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴𝑥.
3. The function 𝑣 satisfies 𝑣(𝑥) = −𝑥′ 𝑃 ̃ 𝑥 where 𝑃 ̃ = 𝐴′ 𝑃 𝐴 − 𝐴′ 𝑃 𝐵(𝑄 + 𝐵′ 𝑃 𝐵)−1 𝐵′ 𝑃 𝐴.


As we will see, in economic contexts Lagrange multipliers often are shadow prices.

Note: If we don’t care about the Lagrange multipliers, we can substitute the constraint into the objective function, and
then just maximize −(𝐴𝑥 + 𝐵𝑢)′ 𝑃 (𝐴𝑥 + 𝐵𝑢) − 𝑢′ 𝑄𝑢 with respect to 𝑢. You can verify that this leads to the same
maximizer.

Solution to Exercise 2.7.1


We have an optimization problem:

$$v(x) = \max_{y, u} \{ -y' P y - u' Q u \}$$

s.t.

𝑦 = 𝐴𝑥 + 𝐵𝑢

with primitives
• 𝑃, a symmetric and positive semidefinite 𝑛 × 𝑛 matrix
• 𝑄, a symmetric and positive semidefinite 𝑚 × 𝑚 matrix
• 𝐴, an 𝑛 × 𝑛 matrix
• 𝐵, an 𝑛 × 𝑚 matrix
The associated Lagrangian is:

𝐿 = −𝑦′ 𝑃 𝑦 − 𝑢′ 𝑄𝑢 + 𝜆′ [𝐴𝑥 + 𝐵𝑢 − 𝑦]

Step 1.
Differentiating the Lagrangian with respect to 𝑦 and setting the derivative equal to zero yields

$$\frac{\partial L}{\partial y} = -(P + P') y - \lambda = -2 P y - \lambda = 0,$$

since 𝑃 is symmetric.
Accordingly, the first-order condition for maximizing 𝐿 with respect to 𝑦 implies

𝜆 = −2𝑃 𝑦

Step 2.
Differentiating the Lagrangian with respect to 𝑢 and setting the derivative equal to zero yields

$$\frac{\partial L}{\partial u} = -(Q + Q') u + B' \lambda = -2 Q u + B' \lambda = 0$$

Substituting 𝜆 = −2𝑃 𝑦 gives

𝑄𝑢 + 𝐵′ 𝑃 𝑦 = 0

Substituting the linear constraint 𝑦 = 𝐴𝑥 + 𝐵𝑢 into the above equation and collecting the terms in 𝑢 gives

𝑄𝑢 + 𝐵′ 𝑃 (𝐴𝑥 + 𝐵𝑢) = 0


(𝑄 + 𝐵′ 𝑃 𝐵)𝑢 + 𝐵′ 𝑃 𝐴𝑥 = 0
which is the first-order condition for maximizing 𝐿 with respect to 𝑢.
Thus, the optimal choice of 𝑢 must satisfy

$$u = -(Q + B' P B)^{-1} B' P A x,$$

which follows from the first-order conditions of the Lagrangian (assuming that $Q + B'PB$ is invertible).
Step 3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$v(x) = \max_{u} \{ -(Ax + Bu)' P (Ax + Bu) - u' Q u \}$$

Since the optimal choice of 𝑢 satisfies $u = -(Q + B'PB)^{-1} B'PAx$, we have

$$v(x) = -(Ax + Bu)' P (Ax + Bu) - u' Q u \quad \text{with} \quad u = -(Q + B'PB)^{-1} B'PAx$$

To evaluate the function, expand it as follows:

$$
\begin{aligned}
v(x) &= -(Ax + Bu)' P (Ax + Bu) - u' Q u \\
     &= -(x'A' + u'B') P (Ax + Bu) - u' Q u \\
     &= -x'A'PAx - u'B'PAx - x'A'PBu - u'B'PBu - u'Qu \\
     &= -x'A'PAx - 2u'B'PAx - u'(Q + B'PB)u
\end{aligned}
$$

For simplicity, let $S := (Q + B'PB)^{-1} B'PA$, so that $u = -Sx$.

Regarding the second term $-2u'B'PAx$,

$$
\begin{aligned}
-2u'B'PAx &= 2x'S'B'PAx \\
          &= 2x'A'PB(Q + B'PB)^{-1}B'PAx
\end{aligned}
$$

Notice that the term (𝑄 + 𝐵′ 𝑃 𝐵)−1 is symmetric as both P and Q are symmetric.
Regarding the third term −𝑢′ (𝑄 + 𝐵′ 𝑃 𝐵)𝑢,

$$
\begin{aligned}
-u'(Q + B'PB)u &= -x'S'(Q + B'PB)Sx \\
               &= -x'A'PB(Q + B'PB)^{-1}B'PAx
\end{aligned}
$$

Hence, the sum of the second and third terms is $x'A'PB(Q + B'PB)^{-1}B'PAx$.
This implies that

$$
\begin{aligned}
v(x) &= -x'A'PAx - 2u'B'PAx - u'(Q + B'PB)u \\
     &= -x'A'PAx + x'A'PB(Q + B'PB)^{-1}B'PAx \\
     &= -x'[A'PA - A'PB(Q + B'PB)^{-1}B'PA]x
\end{aligned}
$$

Therefore, the value function is $v(x) = -x' \tilde{P} x$, where $\tilde{P} := A'PA - A'PB(Q + B'PB)^{-1}B'PA$.
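The closed-form solution can be checked numerically. The sketch below draws hypothetical random primitives (positive definite 𝑃 and 𝑄, so that the inverse exists), evaluates the objective at the candidate maximizer, and compares it with $-x'\tilde{P}x$ and with nearby perturbed choices of 𝑢. The helper names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2

def random_pd(size):
    "Hypothetical helper: a random symmetric positive definite matrix."
    M = rng.standard_normal((size, size))
    return M.T @ M + np.eye(size)

P, Q = random_pd(n), random_pd(m)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
x = rng.standard_normal(n)

def objective(u):
    "Objective after substituting the constraint y = Ax + Bu."
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

# Candidate maximizer and P-tilde from the formulas in the solution
M = Q + B.T @ P @ B
u_star = -np.linalg.solve(M, B.T @ P @ A @ x)
P_tilde = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(M, B.T @ P @ A)

v_direct = objective(u_star)
v_formula = -x @ P_tilde @ x

# No random perturbation of u_star should improve the objective
perturbed = [objective(u_star + 0.1 * rng.standard_normal(m))
             for _ in range(100)]

print(v_direct, v_formula, max(perturbed))
```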

CHAPTER THREE

QR DECOMPOSITION

3.1 Overview

This lecture describes the QR decomposition and how it relates to


• Orthogonal projection and least squares
• A Gram-Schmidt process
• Eigenvalues and eigenvectors
We’ll write some Python code to help consolidate our understanding.

3.2 Matrix Factorization

The QR decomposition (also called the QR factorization) of a matrix is a decomposition of a matrix into the product of
an orthogonal matrix and a triangular matrix.
A QR decomposition of a real matrix 𝐴 takes the form

𝐴 = 𝑄𝑅

where
• 𝑄 is an orthogonal matrix (so that $Q^T Q = I$)
• 𝑅 is an upper triangular matrix
We’ll use a Gram-Schmidt process to compute a QR decomposition.
Because doing so is so educational, we’ll write our own Python code to do the job.

3.3 Gram-Schmidt process

We’ll start with a square matrix 𝐴.


If a square matrix 𝐴 is nonsingular, then its 𝑄𝑅 factorization is unique up to the signs of the columns of 𝑄 and the corresponding rows of 𝑅.
We’ll deal with a rectangular matrix 𝐴 later.
In fact, our algorithm will also work when 𝐴 is rectangular rather than square.


3.3.1 Gram-Schmidt process for square 𝐴

Here we apply a Gram-Schmidt process to the columns of matrix 𝐴.


In particular, let

𝐴 = [ 𝑎1 𝑎2 ⋯ 𝑎𝑛 ]

Let || · || denote the L2 norm.


The Gram-Schmidt algorithm repeatedly combines the following two steps in a particular order
• normalize a vector to have unit norm
• orthogonalize the next vector
To begin, we set 𝑢1 = 𝑎1 and then normalize:

$$u_1 = a_1, \quad e_1 = \frac{u_1}{\|u_1\|}$$

We orthogonalize first to compute 𝑢2 and then normalize to create 𝑒2:

$$u_2 = a_2 - (a_2 \cdot e_1) e_1, \quad e_2 = \frac{u_2}{\|u_2\|}$$

We invite the reader to verify that 𝑒1 is orthogonal to 𝑒2 by checking that 𝑒1 ⋅ 𝑒2 = 0.
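The first two Gram-Schmidt steps can be carried out by hand in a few lines. This is a minimal sketch with two illustrative vectors in ℝ³ (they are assumptions chosen for the example), confirming that the resulting 𝑒1 and 𝑒2 are orthonormal.

```python
import numpy as np

# Two illustrative, linearly independent vectors in R^3
a1 = np.array([1.0, 1.0, 0.0])
a2 = np.array([1.0, 0.0, 1.0])

# Step 1: normalize the first vector
u1 = a1
e1 = u1 / np.linalg.norm(u1)

# Step 2: remove from a2 its component along e1, then normalize
u2 = a2 - (a2 @ e1) * e1
e2 = u2 / np.linalg.norm(u2)

print(e1 @ e2)                                   # inner product, close to zero
print(np.linalg.norm(e1), np.linalg.norm(e2))    # both close to one
```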


The Gram-Schmidt procedure continues iterating.
Thus, for 𝑘 = 2, … , 𝑛 − 1 we construct
$$u_{k+1} = a_{k+1} - (a_{k+1} \cdot e_1) e_1 - \cdots - (a_{k+1} \cdot e_k) e_k, \quad e_{k+1} = \frac{u_{k+1}}{\|u_{k+1}\|}$$

Here (𝑎𝑗 ⋅ 𝑒𝑖 ) can be interpreted as the linear least squares regression coefficient of 𝑎𝑗 on 𝑒𝑖
• it is the inner product of 𝑎𝑗 and 𝑒𝑖 divided by the inner product of 𝑒𝑖 with itself, where 𝑒𝑖 ⋅ 𝑒𝑖 = 1, as normalization has assured us.
• this regression coefficient has an interpretation as being a covariance divided by a variance
It can be verified that
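$$
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}
  = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
    \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
      0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
      \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & a_n \cdot e_n
    \end{bmatrix}
$$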

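Thus, we have constructed the decomposition

𝐴 = 𝑄𝑅

where

$$Q = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}$$

and

$$
R = \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 \\
      0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 \\
      \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & a_n \cdot e_n
    \end{bmatrix}
$$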


3.3.2 𝐴 not square

Now suppose that 𝐴 is an 𝑛 × 𝑚 matrix where 𝑚 > 𝑛.


Then a 𝑄𝑅 decomposition is

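$$
A = \begin{bmatrix} a_1 & a_2 & \cdots & a_m \end{bmatrix}
  = \begin{bmatrix} e_1 & e_2 & \cdots & e_n \end{bmatrix}
    \begin{bmatrix}
      a_1 \cdot e_1 & a_2 \cdot e_1 & \cdots & a_n \cdot e_1 & a_{n+1} \cdot e_1 & \cdots & a_m \cdot e_1 \\
      0 & a_2 \cdot e_2 & \cdots & a_n \cdot e_2 & a_{n+1} \cdot e_2 & \cdots & a_m \cdot e_2 \\
      \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
      0 & 0 & \cdots & a_n \cdot e_n & a_{n+1} \cdot e_n & \cdots & a_m \cdot e_n
    \end{bmatrix}
$$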

which implies that

𝑎1 = (𝑎1 ⋅ 𝑒1 )𝑒1
𝑎2 = (𝑎2 ⋅ 𝑒1 )𝑒1 + (𝑎2 ⋅ 𝑒2 )𝑒2
⋮ ⋮
𝑎𝑛 = (𝑎𝑛 ⋅ 𝑒1 )𝑒1 + (𝑎𝑛 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑛 ⋅ 𝑒𝑛 )𝑒𝑛
𝑎𝑛+1 = (𝑎𝑛+1 ⋅ 𝑒1 )𝑒1 + (𝑎𝑛+1 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑛+1 ⋅ 𝑒𝑛 )𝑒𝑛
⋮ ⋮
𝑎𝑚 = (𝑎𝑚 ⋅ 𝑒1 )𝑒1 + (𝑎𝑚 ⋅ 𝑒2 )𝑒2 + ⋯ + (𝑎𝑚 ⋅ 𝑒𝑛 )𝑒𝑛

3.4 Some Code

Now let’s write some homemade Python code to implement a QR decomposition by deploying the Gram-Schmidt process
described above.

import numpy as np
from scipy.linalg import qr

def QR_Decomposition(A):
    n, m = A.shape # get the shape of A

    Q = np.empty((n, n)) # initialize matrix Q
    u = np.empty((n, n)) # initialize matrix u

    u[:, 0] = A[:, 0]
    Q[:, 0] = u[:, 0] / np.linalg.norm(u[:, 0])

    for i in range(1, n):

        u[:, i] = A[:, i]
        for j in range(i):
            u[:, i] -= (A[:, i] @ Q[:, j]) * Q[:, j] # get each u vector

        Q[:, i] = u[:, i] / np.linalg.norm(u[:, i]) # compute each e vector

    R = np.zeros((n, m))
    for i in range(n):
        for j in range(i, m):
            R[i, j] = A[:, j] @ Q[:, i]

    return Q, R


The preceding code is fine but can benefit from some further housekeeping.
We want to do this because later in this notebook we want to compare results from using our homemade code above with
the code for a QR that the Python scipy package delivers.
There can be sign differences between the 𝑄 and 𝑅 matrices produced by different numerical algorithms.
All of these are valid QR decompositions because of how the sign differences cancel out when we compute 𝑄𝑅.
However, to make the results from our homemade function and the QR module in scipy comparable, let’s require that
𝑄 have positive diagonal entries.
We do this by adjusting the signs of the columns in 𝑄 and the rows in 𝑅 appropriately.
To accomplish this we’ll define a pair of functions.

def diag_sign(A):
    "Compute the signs of the diagonal of matrix A"

    D = np.diag(np.sign(np.diag(A)))

    return D

def adjust_sign(Q, R):
    """
    Adjust the signs of the columns in Q and rows in R to
    impose positive diagonal of Q
    """

    D = diag_sign(Q)

    Q[:, :] = Q @ D
    R[:, :] = D @ R

    return Q, R

3.5 Example

Now let’s do an example.

A = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0], [0.0, 1.0, 1.0]])
# A = np.array([[1.0, 0.5, 0.2], [0.5, 0.5, 1.0]])

A

array([[1., 1., 0.],
       [1., 0., 1.],
       [0., 1., 1.]])

Q, R = adjust_sign(*QR_Decomposition(A))

Q

array([[ 0.70710678, -0.40824829, -0.57735027],
       [ 0.70710678,  0.40824829,  0.57735027],
       [ 0.        , -0.81649658,  0.57735027]])

R

array([[ 1.41421356,  0.70710678,  0.70710678],
       [ 0.        , -1.22474487, -0.40824829],
       [ 0.        ,  0.        ,  1.15470054]])

Let’s compare outcomes with what the scipy package produces.

Q_scipy, R_scipy = adjust_sign(*qr(A))

print('Our Q: \n', Q)
print('\n')
print('Scipy Q: \n', Q_scipy)

Our Q:
[[ 0.70710678 -0.40824829 -0.57735027]
[ 0.70710678 0.40824829 0.57735027]
[ 0. -0.81649658 0.57735027]]

Scipy Q:
[[ 0.70710678 -0.40824829 -0.57735027]
[ 0.70710678 0.40824829 0.57735027]
[ 0. -0.81649658 0.57735027]]

print('Our R: \n', R)
print('\n')
print('Scipy R: \n', R_scipy)

Our R:
[[ 1.41421356 0.70710678 0.70710678]
[ 0. -1.22474487 -0.40824829]
[ 0. 0. 1.15470054]]

Scipy R:
[[ 1.41421356 0.70710678 0.70710678]
[ 0. -1.22474487 -0.40824829]
[ 0. 0. 1.15470054]]

The above outcomes give us the good news that our homemade function agrees with what scipy produces.
Now let’s do a QR decomposition for a rectangular matrix 𝐴 that is 𝑛 × 𝑚 with 𝑚 > 𝑛.

A = np.array([[1, 3, 4], [2, 0, 9]])

Q, R = adjust_sign(*QR_Decomposition(A))
Q, R

(array([[ 0.4472136 , -0.89442719],
        [ 0.89442719,  0.4472136 ]]),
 array([[ 2.23606798,  1.34164079,  9.8386991 ],
        [ 0.        , -2.68328157,  0.4472136 ]]))

Q_scipy, R_scipy = adjust_sign(*qr(A))
Q_scipy, R_scipy

(array([[ 0.4472136 , -0.89442719],
        [ 0.89442719,  0.4472136 ]]),
 array([[ 2.23606798,  1.34164079,  9.8386991 ],
        [ 0.        , -2.68328157,  0.4472136 ]]))
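Beyond comparing two implementations, it can be useful to check the defining properties of a QR decomposition directly. The following sketch does so for the same rectangular matrix, using scipy.linalg.qr with its default settings; the variable names are illustrative.

```python
import numpy as np
from scipy.linalg import qr

A = np.array([[1.0, 3.0, 4.0],
              [2.0, 0.0, 9.0]])

# With default settings, Q is n x n and R is n x m
Q, R = qr(A)

orthogonal = np.allclose(Q.T @ Q, np.eye(2))     # Q'Q = I
upper_triangular = np.allclose(R, np.triu(R))    # zeros below the diagonal
reconstructs = np.allclose(Q @ R, A)             # QR = A

print(Q.shape, R.shape)
print(orthogonal, upper_triangular, reconstructs)
```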

3.6 Using QR Decomposition to Compute Eigenvalues

Now for a useful fact about the QR algorithm.


The following iterations on the QR decomposition can be used to compute eigenvalues of a square matrix 𝐴.
Here is the algorithm:
1. Set 𝐴0 = 𝐴 and form 𝐴0 = 𝑄0 𝑅0
2. Form 𝐴1 = 𝑅0 𝑄0 . Note that 𝐴1 is similar to 𝐴0 (easy to verify) and so has the same eigenvalues.
3. Form 𝐴1 = 𝑄1 𝑅1 (i.e., form the 𝑄𝑅 decomposition of 𝐴1 ).
4. Form 𝐴2 = 𝑅1 𝑄1 and then 𝐴2 = 𝑄2 𝑅2 .
5. Iterate to convergence.
6. Compute eigenvalues of 𝐴 and compare them to the diagonal values of the limiting 𝐴𝑛 found from this process.
Remark: this algorithm is close to one of the most efficient ways of computing eigenvalues!
Let’s write some Python code to try out the algorithm.

def QR_eigvals(A, tol=1e-12, maxiter=1000):
    "Find the eigenvalues of A using QR decomposition."

    A_old = np.copy(A)
    A_new = np.copy(A)

    diff = np.inf
    i = 0
    while (diff > tol) and (i < maxiter):
        A_old[:, :] = A_new
        Q, R = QR_Decomposition(A_old)

        A_new[:, :] = R @ Q

        diff = np.abs(A_new - A_old).max()
        i += 1

    eigvals = np.diag(A_new)

    return eigvals


Now let’s try the code and compare the results with what np.linalg.eigvals gives us.
Here goes:

# experiment this with one random A matrix


A = np.random.random((3, 3))

sorted(QR_eigvals(A))

[-0.5697946336664133, 0.06239382762551169, 2.0469077458946816]

Compare with NumPy’s np.linalg.eigvals.

sorted(np.linalg.eigvals(A))

[-0.5697946336664135, 0.062393827625510115, 2.046907745894684]
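The claim in step 2 of the algorithm, that $A_1 = R_0 Q_0$ is similar to $A_0$, can be verified in a few lines. This is a sketch for a hypothetical symmetric matrix, using np.linalg.qr rather than the homemade function. A symmetric example is chosen because the plain iteration is known to behave well there; for a nonsymmetric matrix with complex eigenvalues the iterates need not approach a triangular matrix.

```python
import numpy as np

# Hypothetical symmetric matrix (illustrative values)
A0 = np.array([[2.0, 1.0, 0.0],
               [1.0, 3.0, 1.0],
               [0.0, 1.0, 4.0]])

Q0, R0 = np.linalg.qr(A0)
A1 = R0 @ Q0

# Since R0 = Q0' A0, we get A1 = Q0' A0 Q0: a similarity transformation
similar = np.allclose(A1, Q0.T @ A0 @ Q0)

# Similar matrices share eigenvalues
same_eigs = np.allclose(np.linalg.eigvalsh(A0), np.linalg.eigvalsh(A1))

print(similar, same_eigs)
```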

3.7 𝑄𝑅 and PCA

There are interesting connections between the 𝑄𝑅 decomposition and principal components analysis (PCA).
Here are some.
1. Let $X'$ be a 𝑘 × 𝑛 random matrix where the 𝑗th column is a random draw from 𝒩(𝜇, Σ), where 𝜇 is a 𝑘 × 1 vector of means and Σ is a 𝑘 × 𝑘 covariance matrix. We want 𝑛 >> 𝑘; this is an “econometrics example”.
2. Form $X' = QR$ where 𝑄 is 𝑘 × 𝑘 and 𝑅 is 𝑘 × 𝑛.
3. Form the eigenvalues of $RR'$, i.e., we’ll compute $RR' = \tilde{P} \Lambda \tilde{P}'$.
4. Form $X'X = Q \tilde{P} \Lambda \tilde{P}' Q'$ and compare it with the eigen decomposition $X'X = P \hat{\Lambda} P'$.
5. It will turn out that $\Lambda = \hat{\Lambda}$ and that $P = Q \tilde{P}$.
Let’s verify conjecture 5 with some Python code.
Start by simulating a random (𝑛, 𝑘) matrix 𝑋.

k = 5
n = 1000

# generate some random moments
μ = np.random.random(size=k)
C = np.random.random((k, k))
Σ = C.T @ C

# X is a random matrix where each row is a draw from a multivariate normal dist.
X = np.random.multivariate_normal(μ, Σ, size=n)

X.shape

(1000, 5)


Let’s apply the QR decomposition to 𝑋 ′ .

Q, R = adjust_sign(*QR_Decomposition(X.T))

Check the shapes of 𝑄 and 𝑅.

Q.shape, R.shape

((5, 5), (5, 1000))

Now we can construct $RR' = \tilde{P} \Lambda \tilde{P}'$ and form an eigen decomposition.

RR = R @ R.T

λ, P_tilde = np.linalg.eigh(RR)
Λ = np.diag(λ)

We can also apply the decomposition to $X'X = P \hat{\Lambda} P'$.

XX = X.T @ X

λ_hat, P = np.linalg.eigh(XX)
Λ_hat = np.diag(λ_hat)

Compare the eigenvalues that are on the diagonals of $\Lambda$ and $\hat{\Lambda}$.

λ, λ_hat

(array([  36.45694801,  182.4271492 ,  593.23015461, 1315.47957925,
        8259.33586321]),
 array([  36.45694801,  182.4271492 ,  593.23015461, 1315.47957925,
        8259.33586321]))

Let’s compare $P$ and $Q\tilde{P}$.

Again we need to be careful about sign differences between the columns of $P$ and $Q\tilde{P}$.

QP_tilde = Q @ P_tilde

np.abs(P @ diag_sign(P) - QP_tilde @ diag_sign(QP_tilde)).max()

3.344546861683284e-15

Let’s verify that $X'X$ can be decomposed as $Q \tilde{P} \Lambda \tilde{P}' Q'$.

QPΛPQ = Q @ P_tilde @ Λ @ P_tilde.T @ Q.T

np.abs(QPΛPQ - XX).max()

5.002220859751105e-12

CHAPTER FOUR

CIRCULANT MATRICES

4.1 Overview

This lecture describes circulant matrices and some of their properties.


Circulant matrices have a special structure that connects them to useful concepts including
• convolution
• Fourier transforms
• permutation matrices
Because of these connections, circulant matrices are widely used in machine learning, for example, in image processing.
We begin by importing some Python packages

import numpy as np
from numba import njit
import matplotlib.pyplot as plt

np.set_printoptions(precision=3, suppress=True)

4.2 Constructing a Circulant Matrix

To construct an 𝑁 × 𝑁 circulant matrix, we need only the first row, say,

[𝑐0 𝑐1 𝑐2 𝑐3 𝑐4 ⋯ 𝑐𝑁−1 ] .

After setting entries in the first row, the remaining rows of a circulant matrix are determined as follows:

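$$
C = \begin{bmatrix}
  c_0 & c_1 & c_2 & c_3 & c_4 & \cdots & c_{N-1} \\
  c_{N-1} & c_0 & c_1 & c_2 & c_3 & \cdots & c_{N-2} \\
  c_{N-2} & c_{N-1} & c_0 & c_1 & c_2 & \cdots & c_{N-3} \\
  \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
  c_3 & c_4 & c_5 & c_6 & c_7 & \cdots & c_2 \\
  c_2 & c_3 & c_4 & c_5 & c_6 & \cdots & c_1 \\
  c_1 & c_2 & c_3 & c_4 & c_5 & \cdots & c_0
\end{bmatrix} \tag{4.1}
$$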
It is also possible to construct a circulant matrix by creating the transpose of the above matrix, in which case only the first
column needs to be specified.
Let’s write some Python code to generate a circulant matrix.


@njit
def construct_cirlulant(row):

    N = row.size

    C = np.empty((N, N))

    for i in range(N):

        C[i, i:] = row[:N-i]
        C[i, :i] = row[N-i:]

    return C

# a simple case when N = 3
construct_cirlulant(np.array([1., 2., 3.]))

array([[1., 2., 3.],
       [3., 1., 2.],
       [2., 3., 1.]])

4.2.1 Some Properties of Circulant Matrices

Here are some useful properties:


Suppose that 𝐴 and 𝐵 are both circulant matrices. Then it can be verified that
• The transpose of a circulant matrix is a circulant matrix.
• 𝐴 + 𝐵 is a circulant matrix
• 𝐴𝐵 is a circulant matrix
• 𝐴𝐵 = 𝐵𝐴
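These properties are easy to check numerically. The sketch below uses scipy.linalg.circulant, which builds a circulant matrix from its first column rather than its first row; the listed properties do not depend on that convention. The input vectors and the helper `is_circulant` are hypothetical.

```python
import numpy as np
from scipy.linalg import circulant

# Two hypothetical circulant matrices, each built from its first column
A = circulant([1.0, 2.0, 3.0, 4.0])
B = circulant([0.5, -1.0, 0.0, 2.0])

def is_circulant(M):
    "Check that each row is the previous row cyclically shifted one step right."
    return all(np.allclose(M[i], np.roll(M[i - 1], 1))
               for i in range(1, len(M)))

print(is_circulant(A.T))         # the transpose is circulant
print(is_circulant(A + B))       # the sum is circulant
print(is_circulant(A @ B))       # the product is circulant
print(np.allclose(A @ B, B @ A)) # circulant matrices commute
```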
Now consider a circulant matrix with first row

𝑐 = [𝑐0 𝑐1 ⋯ 𝑐𝑁−1 ]

and consider a vector

𝑎 = [𝑎0 𝑎1 ⋯ 𝑎𝑁−1 ]

The convolution of vectors 𝑐 and 𝑎 is defined as the vector 𝑏 = 𝑐 ∗ 𝑎 with components


$$b_k = \sum_{i=0}^{N-1} c_{k-i} a_i \tag{4.2}$$

where the subscript $k - i$ is understood modulo $N$.

We use ∗ to denote convolution via the calculation described in equation (4.2).


It can be verified that the vector 𝑏 satisfies

$$b = C^T a$$

where $C^T$ is the transpose of the circulant matrix defined in equation (4.1).
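The relation between convolution and the circulant matrix can be checked as follows. This is a sketch under the assumption that the subscript in (4.2) wraps around modulo 𝑁 (circular convolution); the vectors are illustrative, and the FFT comparison is an additional cross-check that is not part of the text.

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0, 4.0])    # hypothetical first row of C
a = np.array([0.5, -1.0, 0.0, 2.0])   # hypothetical vector
N = len(c)

# Circulant matrix as in (4.1): row i is the first row shifted right i times
C = np.array([np.roll(c, i) for i in range(N)])

# Convolution computed directly from (4.2), with indices taken modulo N
b_direct = np.array([sum(c[(k - i) % N] * a[i] for i in range(N))
                     for k in range(N)])

# The same vector via the transposed circulant matrix
b_matrix = C.T @ a

# Cross-check: circular convolution via the FFT
b_fft = np.fft.ifft(np.fft.fft(c) * np.fft.fft(a)).real

print(b_direct, b_matrix, b_fft)
```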


4.3 Connection to Permutation Matrix

A good way to construct a circulant matrix is to use a permutation matrix.


Before defining a permutation matrix, we’ll define a permutation.
A permutation of a set of non-negative integers {0, 1, 2, …} is a one-to-one mapping of the set into itself.
A permutation of a set {1, 2, … , 𝑛} rearranges the 𝑛 integers in the set.
A permutation matrix is obtained by permuting the rows of an 𝑛 × 𝑛 identity matrix according to a permutation of the
numbers 1 to 𝑛.
Thus, every row and every column contain precisely a single 1 with 0 everywhere else.
Every permutation corresponds to a unique permutation matrix.
For example, the 𝑁 × 𝑁 matrix

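$$
P = \begin{bmatrix}
  0 & 1 & 0 & 0 & \cdots & 0 \\
  0 & 0 & 1 & 0 & \cdots & 0 \\
  0 & 0 & 0 & 1 & \cdots & 0 \\
  \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
  0 & 0 & 0 & 0 & \cdots & 1 \\
  1 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix} \tag{4.3}
$$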

serves as a cyclic shift operator that, when applied to an 𝑁 × 1 vector ℎ, shifts entries in rows 2 through 𝑁 up one row
and shifts the entry in row 1 to row 𝑁 .
Eigenvalues of the cyclic shift permutation matrix 𝑃 defined in equation (4.3) can be computed by constructing

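$$
P - \lambda I = \begin{bmatrix}
  -\lambda & 1 & 0 & 0 & \cdots & 0 \\
  0 & -\lambda & 1 & 0 & \cdots & 0 \\
  0 & 0 & -\lambda & 1 & \cdots & 0 \\
  \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
  0 & 0 & 0 & 0 & \cdots & 1 \\
  1 & 0 & 0 & 0 & \cdots & -\lambda
\end{bmatrix}
$$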

and solving

$$\det(P - \lambda I) = (-1)^N (\lambda^N - 1) = 0$$

Eigenvalues 𝜆𝑖 can be complex.


Magnitudes ∣ 𝜆𝑖 ∣ of these eigenvalues 𝜆𝑖 all equal 1.
Thus, singular values of the permutation matrix 𝑃 defined in equation (4.3) all equal 1.
It can be verified that permutation matrices are orthogonal matrices:

𝑃𝑃′ = 𝐼
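The statements about the cyclic shift matrix can be confirmed numerically. The sketch below builds 𝑃 with np.roll for a hypothetical size 𝑁 = 5 and checks the shift action, orthogonality, the singular values and the eigenvalues.

```python
import numpy as np

N = 5  # hypothetical size
# Rolling the columns of the identity one step right puts ones on the
# superdiagonal and in the bottom-left corner, as in (4.3)
P = np.roll(np.eye(N), 1, axis=1)

h = np.arange(N, dtype=float)
shifted = P @ h                       # entries move up one row, first goes last

orthogonal = np.allclose(P @ P.T, np.eye(N))
singular_values = np.linalg.svd(P, compute_uv=False)
eigenvalues = np.linalg.eigvals(P)

print(shifted)
print(orthogonal, singular_values)
print(np.abs(eigenvalues))            # all moduli equal one
print(np.allclose(eigenvalues ** N, 1.0))   # each eigenvalue solves z^N = 1
```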


4.4 Examples with Python

Let’s write some Python code to illustrate these ideas.

@njit
def construct_P(N):

    P = np.zeros((N, N))

    for i in range(N-1):
        P[i, i+1] = 1
    P[-1, 0] = 1

    return P

P4 = construct_P(4)
P4

array([[0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.],
       [1., 0., 0., 0.]])

# compute the eigenvalues and eigenvectors
λ, Q = np.linalg.eig(P4)

# note: Q[i, :] prints row i of Q; the eigenvector paired with λ[i] is the column Q[:, i]
for i in range(4):
    print(f'λ{i} = {λ[i]:.1f} \nvec{i} = {Q[i, :]}\n')

λ0 = -1.0+0.0j
vec0 = [-0.5+0.j 0. +0.5j 0. -0.5j -0.5+0.j ]

λ1 = 0.0+1.0j
vec1 = [ 0.5+0.j -0.5+0.j -0.5-0.j -0.5+0.j]

λ2 = 0.0-1.0j
vec2 = [-0.5+0.j 0. -0.5j 0. +0.5j -0.5+0.j ]

λ3 = 1.0+0.0j
vec3 = [ 0.5+0.j 0.5-0.j 0.5+0.j -0.5+0.j]

In graphs below, we shall portray eigenvalues of a shift permutation matrix in the complex plane.
These eigenvalues are uniformly distributed along the unit circle.
They are the 𝑁 roots of unity, meaning they are the 𝑁 numbers 𝑧 that solve $z^N = 1$, where 𝑧 is a complex number.
In particular, the 𝑁 roots of unity are

$$z = \exp\left(\frac{2 \pi j k}{N}\right), \quad k = 0, \ldots, N - 1$$

where 𝑗 denotes the imaginary unit.


fig, ax = plt.subplots(2, 2, figsize=(10, 10))

for i, N in enumerate([3, 4, 6, 8]):

    row_i = i // 2
    col_i = i % 2

    P = construct_P(N)
    λ, Q = np.linalg.eig(P)

    circ = plt.Circle((0, 0), radius=1, edgecolor='b', facecolor='None')
    ax[row_i, col_i].add_patch(circ)

    for j in range(N):
        ax[row_i, col_i].scatter(λ[j].real, λ[j].imag, c='b')

    ax[row_i, col_i].set_title(f'N = {N}')
    ax[row_i, col_i].set_xlabel('real')
    ax[row_i, col_i].set_ylabel('imaginary')

plt.show()


For a vector of coefficients $\{c_i\}_{i=0}^{N-1}$, eigenvectors of 𝑃 are also eigenvectors of

$$C = c_0 I + c_1 P + c_2 P^2 + \cdots + c_{N-1} P^{N-1}.$$

Consider an example in which 𝑁 = 8 and let $w = e^{-2 \pi j / N}$.


It can be verified that the matrix 𝐹8 of eigenvectors of 𝑃8 is

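$$
F_8 = \begin{bmatrix}
  1 & 1 & 1 & \cdots & 1 \\
  1 & w & w^2 & \cdots & w^7 \\
  1 & w^2 & w^4 & \cdots & w^{14} \\
  1 & w^3 & w^6 & \cdots & w^{21} \\
  1 & w^4 & w^8 & \cdots & w^{28} \\
  1 & w^5 & w^{10} & \cdots & w^{35} \\
  1 & w^6 & w^{12} & \cdots & w^{42} \\
  1 & w^7 & w^{14} & \cdots & w^{49}
\end{bmatrix}
$$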
The matrix $F_8$ defines a Discrete Fourier Transform.

To convert it into an orthogonal (more precisely, unitary) eigenvector matrix, we can simply normalize it by dividing every entry by $\sqrt{8}$.
• stare at the first column of $F_8$ above to convince yourself of this fact
The eigenvalues corresponding to each eigenvector are $\{w^j\}_{j=0}^{7}$ in order.

def construct_F(N):

    w = np.e ** (-complex(0, 2*np.pi/N))

    F = np.ones((N, N), dtype=complex)
    for i in range(1, N):
        F[i, 1:] = w ** (i * np.arange(1, N))

    return F, w

F8, w = construct_F(8)

(0.7071067811865476-0.7071067811865475j)

F8

array([[ 1.   +0.j   ,  1.   +0.j   ,  1.   +0.j   ,  1.   +0.j   ,
         1.   +0.j   ,  1.   +0.j   ,  1.   +0.j   ,  1.   +0.j   ],
[ 1. +0.j , 0.707-0.707j, 0. -1.j , -0.707-0.707j,
-1. -0.j , -0.707+0.707j, -0. +1.j , 0.707+0.707j],
[ 1. +0.j , 0. -1.j , -1. -0.j , -0. +1.j ,
1. +0.j , 0. -1.j , -1. -0.j , -0. +1.j ],
[ 1. +0.j , -0.707-0.707j, -0. +1.j , 0.707-0.707j,
-1. -0.j , 0.707+0.707j, 0. -1.j , -0.707+0.707j],
[ 1. +0.j , -1. -0.j , 1. +0.j , -1. -0.j ,
1. +0.j , -1. -0.j , 1. +0.j , -1. -0.j ],
[ 1. +0.j , -0.707+0.707j, 0. -1.j , 0.707+0.707j,
-1. -0.j , 0.707-0.707j, -0. +1.j , -0.707-0.707j],
[ 1. +0.j , -0. +1.j , -1. -0.j , 0. -1.j ,
1. +0.j , -0. +1.j , -1. -0.j , 0. -1.j ],
[ 1. +0.j , 0.707+0.707j, -0. +1.j , -0.707+0.707j,
-1. -0.j , -0.707-0.707j, 0. -1.j , 0.707-0.707j]])

# normalize
Q8 = F8 / np.sqrt(8)

# verify the orthogonality (unitarity)


Q8 @ np.conjugate(Q8)

array([[ 1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,  0.+0.j,  0.+0.j,
         0.+0.j],
       [-0.-0.j,  1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,  0.+0.j,
         0.+0.j],
       [-0.-0.j, -0.-0.j,  1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,  0.+0.j,
         0.+0.j],
       [-0.-0.j, -0.-0.j, -0.-0.j,  1.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,
        -0.+0.j],
       [-0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j,  1.+0.j, -0.+0.j, -0.+0.j,
        -0.+0.j],
       [ 0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j,  1.+0.j, -0.+0.j,
        -0.+0.j],
       [ 0.-0.j,  0.-0.j,  0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j,  1.+0.j,
        -0.+0.j],
       [ 0.-0.j,  0.-0.j,  0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j, -0.-0.j,
         1.+0.j]])

Let’s verify that the 𝑘th column of $Q_8$ is an eigenvector of $P_8$ with eigenvalue $w^k$.

P8 = construct_P(8)

diff_arr = np.empty(8, dtype=complex)
for j in range(8):
    diff = P8 @ Q8[:, j] - w ** j * Q8[:, j]
    diff_arr[j] = diff @ diff.T

diff_arr

array([ 0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j, -0.+0.j,
       -0.+0.j])

4.5 Associated Permutation Matrix

Next, we execute calculations to verify that the circulant matrix 𝐶 defined in equation (4.1) can be written as

$$C = c_0 I + c_1 P + \cdots + c_{N-1} P^{N-1}$$

and that every eigenvector of 𝑃 is also an eigenvector of 𝐶.

We illustrate this for the 𝑁 = 8 case.

c = np.random.random(8)

array([0.421, 0.58 , 0.352, 0.055, 0.428, 0.466, 0.943, 0.027])

C8 = construct_cirlulant(c)

Compute $c_0 I + c_1 P + \cdots + c_{N-1} P^{N-1}$.

N = 8

C = np.zeros((N, N))
P = np.eye(N)

for i in range(N):
    C += c[i] * P
    P = P8 @ P

array([[0.421, 0.58 , 0.352, 0.055, 0.428, 0.466, 0.943, 0.027],
[0.027, 0.421, 0.58 , 0.352, 0.055, 0.428, 0.466, 0.943],
[0.943, 0.027, 0.421, 0.58 , 0.352, 0.055, 0.428, 0.466],
[0.466, 0.943, 0.027, 0.421, 0.58 , 0.352, 0.055, 0.428],
[0.428, 0.466, 0.943, 0.027, 0.421, 0.58 , 0.352, 0.055],
[0.055, 0.428, 0.466, 0.943, 0.027, 0.421, 0.58 , 0.352],
[0.352, 0.055, 0.428, 0.466, 0.943, 0.027, 0.421, 0.58 ],
[0.58 , 0.352, 0.055, 0.428, 0.466, 0.943, 0.027, 0.421]])

C8

array([[0.421, 0.58 , 0.352, 0.055, 0.428, 0.466, 0.943, 0.027],
[0.027, 0.421, 0.58 , 0.352, 0.055, 0.428, 0.466, 0.943],
[0.943, 0.027, 0.421, 0.58 , 0.352, 0.055, 0.428, 0.466],
[0.466, 0.943, 0.027, 0.421, 0.58 , 0.352, 0.055, 0.428],
[0.428, 0.466, 0.943, 0.027, 0.421, 0.58 , 0.352, 0.055],
[0.055, 0.428, 0.466, 0.943, 0.027, 0.421, 0.58 , 0.352],
[0.352, 0.055, 0.428, 0.466, 0.943, 0.027, 0.421, 0.58 ],
[0.58 , 0.352, 0.055, 0.428, 0.466, 0.943, 0.027, 0.421]])

Now let’s compute the difference between two circulant matrices that we have constructed in two different ways.

np.abs(C - C8).max()

0.0

The 𝑘th column of $Q_8$ (counting from 𝑘 = 0), which is an eigenvector of $P_8$ with eigenvalue $w^k$, is an eigenvector of $C_8$ associated with the eigenvalue $\sum_{h=0}^{7} c_h w^{hk}$.

_C8 = np.zeros(8, dtype=complex)

for j in range(8):
    for k in range(8):
        _C8[j] += c[k] * w ** (j * k)

_C8

array([ 3.272+0.j , 0.053+0.49j , -0.446-0.964j, -0.067-0.691j,
1.018+0.j , -0.067+0.691j, -0.446+0.964j, 0.053-0.49j ])

We can verify this by comparing C8 @ Q8[:, j] with _C8[j] * Q8[:, j].


# verify
for j in range(8):
    diff = C8 @ Q8[:, j] - _C8[j] * Q8[:, j]
    print(diff)

[0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j]
[-0.+0.j 0.+0.j 0.-0.j -0.-0.j -0.-0.j -0.+0.j -0.+0.j -0.+0.j]
[ 0.-0.j 0.-0.j -0.-0.j -0.-0.j -0.+0.j 0.+0.j 0.-0.j -0.-0.j]
[ 0.+0.j -0.-0.j -0.-0.j -0.+0.j 0.-0.j -0.-0.j 0.+0.j 0.-0.j]
[ 0.+0.j -0.-0.j 0.-0.j -0.+0.j 0.-0.j -0.+0.j 0.-0.j -0.+0.j]
[ 0.-0.j 0.+0.j 0.-0.j 0.+0.j -0.-0.j 0.-0.j -0.+0.j -0.-0.j]
[ 0.+0.j 0.-0.j 0.-0.j 0.-0.j 0.+0.j -0.+0.j -0.-0.j 0.-0.j]
[ 0.+0.j -0.+0.j 0.-0.j 0.-0.j 0.-0.j 0.+0.j 0.+0.j -0.+0.j]
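The same check can be written in a self-contained way. The sketch below is an illustration rather than the lecture's own code: it assumes the cyclic shift matrix can be built with `np.roll`, and the names `P`, `C`, `lam`, and `max_err` are local to this example. It also relies on the fact that, with 𝑤 = exp(−2𝜋𝑖/𝑁), the eigenvalues of 𝐶 coincide with NumPy's FFT of the vector 𝑐.

```python
import numpy as np

N = 8
rng = np.random.default_rng(0)
c = rng.random(N)

# Cyclic shift matrix: ones just above the diagonal and in the bottom-left corner
P = np.roll(np.eye(N), 1, axis=1)

# C = c_0 I + c_1 P + ... + c_{N-1} P^{N-1}
C = sum(c[h] * np.linalg.matrix_power(P, h) for h in range(N))

w = np.exp(-2j * np.pi / N)
lam = np.fft.fft(c)        # lam[k] = sum_h c[h] * w**(h * k)

max_err = 0.0
for k in range(N):
    v = w ** (k * np.arange(N))    # eigenvector of P with eigenvalue w**k
    max_err = max(max_err, np.abs(P @ v - w**k * v).max())
    max_err = max(max_err, np.abs(C @ v - lam[k] * v).max())

print(max_err)   # should be at the level of floating-point rounding
```

Using the FFT here is a design convenience: it evaluates all 𝑁 eigenvalues at once instead of looping over the double sum.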

4.6 Discrete Fourier Transform

The Discrete Fourier Transform (DFT) allows us to represent a discrete time sequence as a weighted sum of complex sinusoids.
Consider a sequence of 𝑁 real numbers $\{x_j\}_{j=0}^{N-1}$.

The Discrete Fourier Transform maps $\{x_j\}_{j=0}^{N-1}$ into a sequence of complex numbers $\{X_k\}_{k=0}^{N-1}$

where

$$X_k = \sum_{n=0}^{N-1} x_n e^{-2\pi \frac{kn}{N} i}$$

def DFT(x):
    "The discrete Fourier transform."

    N = len(x)
    w = np.e ** (-complex(0, 2*np.pi/N))

    X = np.zeros(N, dtype=complex)
    for k in range(N):
        for n in range(N):
            X[k] += x[n] * w ** (k * n)

    return X

Consider the following example.

$$x_n = \begin{cases} 1/2 & n = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

x = np.zeros(10)
x[0:2] = 1/2


array([0.5, 0.5, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ])

Apply a discrete Fourier transform.

X = DFT(x)

array([ 1. +0.j , 0.905-0.294j, 0.655-0.476j, 0.345-0.476j,
0.095-0.294j, -0. +0.j , 0.095+0.294j, 0.345+0.476j,
0.655+0.476j, 0.905+0.294j])

We can plot magnitudes of a sequence of numbers and the associated discrete Fourier transform.

def plot_magnitude(x=None, X=None):

    data = []
    names = []
    xs = []
    if (x is not None):
        data.append(x)
        names.append('x')
        xs.append('n')
    if (X is not None):
        data.append(X)
        names.append('X')
        xs.append('j')

    num = len(data)
    for i in range(num):
        n = data[i].size
        plt.figure(figsize=(8, 3))
        plt.scatter(range(n), np.abs(data[i]))
        plt.vlines(range(n), 0, np.abs(data[i]), color='b')

        plt.xlabel(xs[i])
        plt.ylabel('magnitude')
        plt.title(names[i])
        plt.show()

plot_magnitude(x=x, X=X)


The inverse Fourier transform transforms a Fourier transform 𝑋 of 𝑥 back to 𝑥.

The inverse Fourier transform is defined as

$$x_n = \sum_{k=0}^{N-1} \frac{1}{N} X_k e^{2\pi \left(\frac{kn}{N}\right) i}, \quad n = 0, 1, \ldots, N-1$$

def inverse_transform(X):

    N = len(X)
    w = np.e ** (complex(0, 2*np.pi/N))

    x = np.zeros(N, dtype=complex)
    for n in range(N):
        for k in range(N):
            x[n] += X[k] * w ** (k * n) / N

    return x


inverse_transform(X)

array([ 0.5+0.j, 0.5-0.j, -0. -0.j, -0. -0.j, -0. -0.j, -0. -0.j,
-0. +0.j, -0. +0.j, -0. +0.j, -0. +0.j])
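As a cross-check on the two definitions above, the following sketch compares a direct evaluation of the DFT sum with NumPy's built-in routines. The helper name `dft_loop` is ours, introduced only for this illustration. `np.fft.fft` uses the same sign and scaling convention as the formula for 𝑋𝑘, and `np.fft.ifft` includes the 1/𝑁 factor of the inverse transform.

```python
import numpy as np

def dft_loop(x):
    "Direct evaluation of X_k = sum_n x_n exp(-2*pi*i*k*n/N)."
    N = len(x)
    w = np.exp(-2j * np.pi / N)
    return np.array([sum(x[n] * w ** (k * n) for n in range(N))
                     for k in range(N)])

x = np.zeros(10)
x[0:2] = 1/2

X_loop = dft_loop(x)
X_fft = np.fft.fft(x)          # same sign and scaling convention
x_back = np.fft.ifft(X_fft)    # inverse transform, includes the 1/N factor

print(np.allclose(X_loop, X_fft))
print(np.allclose(x_back, x))
```

The double loop costs on the order of 𝑁² operations, so for long sequences the FFT routines are the practical choice.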

Another example is

$$x_n = 2\cos\left(2\pi \frac{11}{40} n\right), \quad n = 0, 1, 2, \ldots, 19$$

Since 𝑁 = 20, we cannot use an integer multiple of 1/20 to represent a frequency 11/40.

To handle this, we shall end up using all 𝑁 of the available frequencies in the DFT.

Since 11/40 is in between 10/40 and 12/40 (each of which is an integer multiple of 1/20), the complex coefficients in the DFT have their largest magnitudes at 𝑘 = 5, 6, 14, 15 (counting from 𝑘 = 0), not just at a single frequency.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 11 * j / 40)

X = DFT(x)

plot_magnitude(x=x, X=X)


What happens if we change the last example to $x_n = 2\cos\left(2\pi \frac{10}{40} n\right)$?

Note that 10/40 is an integer multiple of 1/20.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 10 * j / 40)

X = DFT(x)

plot_magnitude(x=x, X=X)
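The contrast between the two examples can also be read off numerically rather than from the plots. The sketch below is ours and uses `np.fft.fft` in place of the `DFT` function above (the two agree under the convention used here). It lists the indices 𝑘, counted from 0, at which the DFT magnitudes are largest when the frequency falls between two multiples of 1/20, and the indices with non-negligible magnitude when it falls exactly on one.

```python
import numpy as np

N = 20
n = np.arange(N)

x_off = 2 * np.cos(2 * np.pi * 11 * n / 40)   # 11/40 lies between 5/20 and 6/20
x_on = 2 * np.cos(2 * np.pi * 10 * n / 40)    # 10/40 equals 5/20 exactly

mag_off = np.abs(np.fft.fft(x_off))
mag_on = np.abs(np.fft.fft(x_on))

# indices of the four largest magnitudes in the off-grid case
top4 = np.sort(np.argsort(mag_off)[-4:])
print(top4)

# indices with non-negligible magnitude in the on-grid case
print(np.flatnonzero(mag_on > 1e-8))
```

In the off-grid case the remaining coefficients are small but not zero, which is the sense in which all 𝑁 frequencies are used.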


If we represent the discrete Fourier transform as a matrix, we discover that it equals the matrix 𝐹𝑁 of eigenvectors of the permutation matrix 𝑃𝑁 .
We can use the example where $x_n = 2\cos\left(2\pi \frac{11}{40} n\right)$, 𝑛 = 0, 1, 2, ⋯ , 19 to illustrate this.

N = 20
x = np.empty(N)

for j in range(N):
    x[j] = 2 * np.cos(2 * np.pi * 11 * j / 40)

array([ 2. , -0.313, -1.902, 0.908, 1.618, -1.414, -1.176, 1.782,
0.618, -1.975, -0. , 1.975, -0.618, -1.782, 1.176, 1.414,
-1.618, -0.908, 1.902, 0.313])

First use the summation formula to transform 𝑥 to 𝑋.

X = DFT(x)
X

array([2. +0.j , 2. +0.558j, 2. +1.218j, 2. +2.174j, 2. +4.087j,
2.+12.785j, 2.-12.466j, 2. -3.751j, 2. -1.801j, 2. -0.778j,
2. -0.j , 2. +0.778j, 2. +1.801j, 2. +3.751j, 2.+12.466j,
2.-12.785j, 2. -4.087j, 2. -2.174j, 2. -1.218j, 2. -0.558j])

Now let’s evaluate the outcome of postmultiplying the eigenvector matrix 𝐹20 by the vector 𝑥, a product that we claim should equal the Fourier transform of the sequence $\{x_n\}_{n=0}^{N-1}$.

F20, _ = construct_F(20)

F20 @ x


array([2. +0.j , 2. +0.558j, 2. +1.218j, 2. +2.174j, 2. +4.087j,
2.+12.785j, 2.-12.466j, 2. -3.751j, 2. -1.801j, 2. -0.778j,
2. -0.j , 2. +0.778j, 2. +1.801j, 2. +3.751j, 2.+12.466j,
2.-12.785j, 2. -4.087j, 2. -2.174j, 2. -1.218j, 2. -0.558j])

Similarly, the inverse DFT can be expressed as an inverse DFT matrix $F_{20}^{-1}$.

F20_inv = np.linalg.inv(F20)
F20_inv @ X

array([ 2. -0.j, -0.313-0.j, -1.902-0.j, 0.908-0.j, 1.618-0.j,
-1.414+0.j, -1.176+0.j, 1.782+0.j, 0.618-0.j, -1.975-0.j,
-0. +0.j, 1.975-0.j, -0.618-0.j, -1.782+0.j, 1.176+0.j,
1.414-0.j, -1.618-0.j, -0.908+0.j, 1.902+0.j, 0.313-0.j])
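For readers who want to reproduce the matrix form without the helper `construct_F`, here is a minimal sketch. It assumes the DFT matrix has entries 𝑤^(𝑘𝑛) with 𝑤 = exp(−2𝜋𝑖/𝑁), which is the convention implied by the `DFT` function above; the names `F` and `F_inv` are local to this example. Under that assumption the inverse has the closed form of the conjugate transpose divided by 𝑁.

```python
import numpy as np

N = 20
n = np.arange(N)
w = np.exp(-2j * np.pi / N)

# DFT matrix with entries F[k, n] = w**(k * n)
F = w ** np.outer(n, n)

x = 2 * np.cos(2 * np.pi * 11 * n / 40)

X = F @ x                      # matrix form of the DFT
F_inv = F.conj().T / N         # closed form of the inverse, since F F^* = N I

print(np.allclose(X, np.fft.fft(x)))
print(np.allclose(F_inv, np.linalg.inv(F)))
print(np.allclose(F_inv @ X, x))
```

Using the closed form avoids a numerical matrix inversion altogether.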

CHAPTER FIVE

SINGULAR VALUE DECOMPOSITION (SVD)

5.1 Overview

The singular value decomposition (SVD) is a workhorse in applications of least squares projection that form foundations for many statistical and machine learning methods.
After defining the SVD, we’ll describe how it connects to
• four fundamental spaces of linear algebra
• under-determined and over-determined least squares regressions
• principal components analysis (PCA)
Like principal components analysis (PCA), dynamic mode decomposition (DMD) can be thought of as a data-reduction procedure that represents salient patterns by projecting data onto a limited set of factors.
In a sequel to this lecture about Dynamic Mode Decompositions, we’ll describe how SVDs provide ways to rapidly compute reduced-order approximations to first-order Vector Autoregressions (VARs).

5.2 The Setting

Let 𝑋 be an 𝑚 × 𝑛 matrix of rank 𝑝.


Necessarily, 𝑝 ≤ min(𝑚, 𝑛).
In much of this lecture, we’ll think of 𝑋 as a matrix of data in which
• each column is an individual – a time period or person, depending on the application
• each row is a random variable describing an attribute of a time period or a person, depending on the application
We’ll be interested in two situations
• A short and fat case in which 𝑚 << 𝑛, so that there are many more columns (individuals) than rows (attributes).
• A tall and skinny case in which 𝑚 >> 𝑛, so that there are many more rows (attributes) than columns (individuals).
We’ll apply a singular value decomposition of 𝑋 in both situations.
In the 𝑚 << 𝑛 case in which there are many more individuals 𝑛 than attributes 𝑚, we can calculate sample moments of
a joint distribution by taking averages across observations of functions of the observations.
In this 𝑚 << 𝑛 case, we’ll look for patterns by using a singular value decomposition to do a principal components
analysis (PCA).
In the 𝑚 >> 𝑛 case in which there are many more attributes 𝑚 than individuals 𝑛 and when we are in a time-series
setting in which 𝑛 equals the number of time periods covered in the data set 𝑋, we’ll proceed in a different way.


We’ll again use a singular value decomposition, but now to construct a dynamic mode decomposition (DMD).

5.3 Singular Value Decomposition

A singular value decomposition of an 𝑚 × 𝑛 matrix 𝑋 of rank 𝑝 ≤ min(𝑚, 𝑛) is

𝑋 = 𝑈 Σ𝑉 ⊤ (5.1)

where
$$\begin{aligned} UU^\top &= I, & U^\top U &= I \\ VV^\top &= I, & V^\top V &= I \end{aligned}$$

and
• 𝑈 is an 𝑚 × 𝑚 orthogonal matrix of left singular vectors of 𝑋
• Columns of 𝑈 are eigenvectors of 𝑋𝑋 ⊤
• 𝑉 is an 𝑛 × 𝑛 orthogonal matrix of right singular vectors of 𝑋
• Columns of 𝑉 are eigenvectors of 𝑋 ⊤ 𝑋
• Σ is an 𝑚 × 𝑛 matrix in which the first 𝑝 places on its main diagonal are positive numbers 𝜎1 , 𝜎2 , … , 𝜎𝑝 called
singular values; remaining entries of Σ are all zero
• The 𝑝 singular values are positive square roots of the eigenvalues of the 𝑚 × 𝑚 matrix 𝑋𝑋 ⊤ and also of the 𝑛 × 𝑛
matrix 𝑋 ⊤ 𝑋
• We adopt a convention that when 𝑈 is a complex valued matrix, 𝑈 ⊤ denotes the conjugate-transpose or Hermitian-transpose of 𝑈 , meaning that the (𝑖, 𝑗) entry of 𝑈 ⊤ is the complex conjugate of 𝑈𝑗𝑖 .
• Similarly, when 𝑉 is a complex valued matrix, 𝑉 ⊤ denotes the conjugate-transpose or Hermitian-transpose of 𝑉
The matrices 𝑈 , Σ, 𝑉 entail linear transformations that reshape vectors in the following ways:
• multiplying vectors by the unitary matrices 𝑈 and 𝑉 rotates them, but leaves angles between vectors and lengths
of vectors unchanged.
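• multiplying vectors by the diagonal matrix Σ rescales each coordinate by the corresponding singular value, so it generally changes lengths of vectors and, when the singular values differ, angles between vectors as well.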
Thus, representation (5.1) asserts that multiplying an 𝑛 × 1 vector 𝑦 by the 𝑚 × 𝑛 matrix 𝑋 amounts to performing the
following three multiplications of 𝑦 sequentially:
• rotating 𝑦 by computing 𝑉 ⊤ 𝑦
• rescaling 𝑉 ⊤ 𝑦 by multiplying it by Σ
• rotating Σ𝑉 ⊤ 𝑦 by multiplying it by 𝑈
This structure of the 𝑚 × 𝑛 matrix 𝑋 opens the door to constructing systems of data encoders and decoders.
Thus,
• 𝑉 ⊤ 𝑦 is an encoder
• Σ is an operator to be applied to the encoded data
• 𝑈 is a decoder to be applied to the output from applying operator Σ to the encoded data
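The three-step reading of representation (5.1) can be traced numerically. The sketch below is illustrative only: the matrix and vector are random, and the names `encoded`, `rescaled`, and `decoded` are ours. Note that `np.linalg.svd` returns 𝑉 ⊤ rather than 𝑉 as its third output. The last lines also check the claim that the singular values are the square roots of the eigenvalues of 𝑋 ⊤ 𝑋.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(n)

U, s, VT = np.linalg.svd(X, full_matrices=True)   # NumPy returns V transposed
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(s)

encoded = VT @ y             # express y in the basis of right singular vectors
rescaled = Sigma @ encoded   # rescale each coordinate by a singular value
decoded = U @ rescaled       # map back using the left singular vectors

print(np.allclose(decoded, X @ y))

# singular values are square roots of the eigenvalues of X^T X
eigvals = np.linalg.eigvalsh(X.T @ X)[::-1]       # sorted in descending order
print(np.allclose(np.sqrt(eigvals), s))
```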


We’ll apply this circle of ideas later in this lecture when we study Dynamic Mode Decomposition.
Road Ahead
What we have described above is called a full SVD.
In a full SVD, the shapes of 𝑈 , Σ, and 𝑉 are (𝑚, 𝑚), (𝑚, 𝑛), (𝑛, 𝑛), respectively.
Later we’ll also describe an economy or reduced SVD.
Before we study a reduced SVD we’ll say a little more about properties of a full SVD.

5.4 Four Fundamental Subspaces

Let 𝒞 denote a column space, 𝒩 denote a null space, and ℛ denote a row space.
Let’s start by recalling the four fundamental subspaces of an 𝑚 × 𝑛 matrix 𝑋 of rank 𝑝.
• The column space of 𝑋, denoted 𝒞(𝑋), is the span of the columns of 𝑋, i.e., all vectors 𝑦 that can be written as
linear combinations of columns of 𝑋. Its dimension is 𝑝.
• The null space of 𝑋, denoted 𝒩(𝑋), consists of all vectors 𝑦 that satisfy 𝑋𝑦 = 0. Its dimension is 𝑛 − 𝑝.
• The row space of 𝑋, denoted ℛ(𝑋), is the column space of 𝑋 ⊤ . It consists of all vectors 𝑧 that can be written as linear combinations of rows of 𝑋. Its dimension is 𝑝.
• The left null space of 𝑋, denoted 𝒩(𝑋 ⊤ ), consists of all vectors 𝑧 such that 𝑋 ⊤ 𝑧 = 0. Its dimension is 𝑚 − 𝑝.
For a full SVD of a matrix 𝑋, the matrix 𝑈 of left singular vectors and the matrix 𝑉 of right singular vectors contain
orthogonal bases for all four subspaces.
They form two pairs of orthogonal subspaces that we’ll describe now.
Let 𝑢𝑖 , 𝑖 = 1, … , 𝑚 be the 𝑚 column vectors of 𝑈 and let 𝑣𝑖 , 𝑖 = 1, … , 𝑛 be the 𝑛 column vectors of 𝑉 .
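Let’s write the full SVD of 𝑋 as

$$X = \begin{bmatrix} U_L & U_R \end{bmatrix} \begin{bmatrix} \Sigma_p & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_L & V_R \end{bmatrix}^\top \tag{5.2}$$

where Σ𝑝 is a 𝑝 × 𝑝 diagonal matrix with the 𝑝 singular values on the diagonal and

$$\begin{aligned} U_L &= \begin{bmatrix} u_1 & \cdots & u_p \end{bmatrix}, & U_R &= \begin{bmatrix} u_{p+1} & \cdots & u_m \end{bmatrix} \\ V_L &= \begin{bmatrix} v_1 & \cdots & v_p \end{bmatrix}, & V_R &= \begin{bmatrix} v_{p+1} & \cdots & v_n \end{bmatrix} \end{aligned}$$

Representation (5.2) implies that

$$X \begin{bmatrix} V_L & V_R \end{bmatrix} = \begin{bmatrix} U_L & U_R \end{bmatrix} \begin{bmatrix} \Sigma_p & 0 \\ 0 & 0 \end{bmatrix}$$

or

$$\begin{aligned} X V_L &= U_L \Sigma_p \\ X V_R &= 0 \end{aligned} \tag{5.3}$$

or

$$\begin{aligned} X v_i &= \sigma_i u_i, & i &= 1, \ldots, p \\ X v_i &= 0, & i &= p + 1, \ldots, n \end{aligned} \tag{5.4}$$

Equations (5.4) tell how the transformation 𝑋 maps a pair of orthonormal vectors 𝑣𝑖 , 𝑣𝑗 for 𝑖 and 𝑗 both less than or equal to the rank 𝑝 of 𝑋 into a pair of orthogonal vectors 𝜎𝑖 𝑢𝑖 , 𝜎𝑗 𝑢𝑗 , that is, into rescaled versions of the orthonormal vectors 𝑢𝑖 , 𝑢𝑗 .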


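Equations (5.3) assert that

$$\begin{aligned} \mathcal{C}(X) &= \mathcal{C}(U_L) \\ \mathcal{N}(X) &= \mathcal{C}(V_R) \end{aligned}$$

Taking transposes on both sides of representation (5.2) implies

$$X^\top \begin{bmatrix} U_L & U_R \end{bmatrix} = \begin{bmatrix} V_L & V_R \end{bmatrix} \begin{bmatrix} \Sigma_p & 0 \\ 0 & 0 \end{bmatrix}$$

or

$$\begin{aligned} X^\top U_L &= V_L \Sigma_p \\ X^\top U_R &= 0 \end{aligned} \tag{5.5}$$

or

$$\begin{aligned} X^\top u_i &= \sigma_i v_i, & i &= 1, \ldots, p \\ X^\top u_i &= 0, & i &= p + 1, \ldots, m \end{aligned} \tag{5.6}$$

Notice how equations (5.6) assert that the transformation 𝑋 ⊤ maps a pair of distinct orthonormal vectors 𝑢𝑖 , 𝑢𝑗 for 𝑖 and 𝑗 both less than or equal to the rank 𝑝 of 𝑋 into a pair of distinct orthogonal vectors 𝜎𝑖 𝑣𝑖 , 𝜎𝑗 𝑣𝑗 .
Equations (5.5) assert that

$$\begin{aligned} \mathcal{R}(X) \equiv \mathcal{C}(X^\top) &= \mathcal{C}(V_L) \\ \mathcal{N}(X^\top) &= \mathcal{C}(U_R) \end{aligned}$$

Thus, taken together, the systems of equations (5.3) and (5.5) describe the four fundamental subspaces of 𝑋 in the following ways:

$$\begin{aligned} \mathcal{C}(X) &= \mathcal{C}(U_L) \\ \mathcal{N}(X^\top) &= \mathcal{C}(U_R) \\ \mathcal{R}(X) \equiv \mathcal{C}(X^\top) &= \mathcal{C}(V_L) \\ \mathcal{N}(X) &= \mathcal{C}(V_R) \end{aligned} \tag{5.7}$$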

Since 𝑈 and 𝑉 are both orthonormal matrices, collection (5.7) asserts that
• 𝑈𝐿 is an orthonormal basis for the column space of 𝑋
• 𝑈𝑅 is an orthonormal basis for the null space of 𝑋 ⊤
• 𝑉𝐿 is an orthonormal basis for the row space of 𝑋
• 𝑉𝑅 is an orthonormal basis for the null space of 𝑋
We have verified the four claims in (5.7) simply by performing the multiplications called for by the right side of (5.2) and
reading them.
The claims in (5.7) and the fact that 𝑈 and 𝑉 are both unitary (i.e., orthonormal) matrices imply that
• the column space of 𝑋 is orthogonal to the null space of 𝑋 ⊤
• the null space of 𝑋 is orthogonal to the row space of 𝑋
Sometimes these properties are described with the following two pairs of orthogonal complement subspaces:
• 𝒞(𝑋) is the orthogonal complement of 𝒩(𝑋 ⊤ )
• ℛ(𝑋) is the orthogonal complement of 𝒩(𝑋)
Let’s do an example.


import numpy as np
import numpy.linalg as LA
import matplotlib.pyplot as plt

Having imported these modules, let’s do the example.

np.set_printoptions(precision=2)

# Define the matrix
A = np.array([[1, 2, 3, 4, 5],
              [2, 3, 4, 5, 6],
              [3, 4, 5, 6, 7],
              [4, 5, 6, 7, 8],
              [5, 6, 7, 8, 9]])

# Compute the SVD of the matrix (np.linalg.svd returns V transposed)
U, S, VT = np.linalg.svd(A, full_matrices=True)

# Compute the rank of the matrix
rank = np.linalg.matrix_rank(A)

# Print the rank of the matrix
print("Rank of matrix:\n", rank)
print("S: \n", S)

# Compute the four fundamental subspaces
col_space = U[:, :rank]
left_null_space = U[:, rank:]
row_space = VT[:rank, :].T
null_space = VT[rank:, :].T

print("U:\n", U)
print("Column space:\n", col_space)
print("Left null space:\n", left_null_space)
print("V.T:\n", VT)
print("Row space:\n", row_space.T)
print("Right null space:\n", null_space.T)

Rank of matrix:
2
S:
[2.69e+01 1.86e+00 1.20e-15 2.24e-16 5.82e-17]
U:
[[-0.27 -0.73 0.63 -0.06 0.06]
[-0.35 -0.42 -0.69 -0.45 0.12]
[-0.43 -0.11 -0.24 0.85 0.12]
[-0.51 0.19 0.06 -0.1 -0.83]
[-0.59 0.5 0.25 -0.24 0.53]]
Column space:
[[-0.27 -0.73]
[-0.35 -0.42]
[-0.43 -0.11]
[-0.51 0.19]
[-0.59 0.5 ]]
Left null space:
[[ 0.63 -0.06 0.06]
[-0.69 -0.45 0.12]
[-0.24 0.85 0.12]
[ 0.06 -0.1 -0.83]
[ 0.25 -0.24 0.53]]
V.T:
[[-0.27 -0.35 -0.43 -0.51 -0.59]
[ 0.73 0.42 0.11 -0.19 -0.5 ]
[ 0.32 -0.65 0.02 0.61 -0.31]
[ 0.54 -0.39 -0.29 -0.41 0.55]
[-0.06 -0.35 0.85 -0.4 -0.04]]
Row space:
[[-0.27 -0.35 -0.43 -0.51 -0.59]
[ 0.73 0.42 0.11 -0.19 -0.5 ]]
Right null space:
[[ 0.32 -0.65 0.02 0.61 -0.31]
[ 0.54 -0.39 -0.29 -0.41 0.55]
[-0.06 -0.35 0.85 -0.4 -0.04]]
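The defining properties in (5.3), (5.5), and (5.7) can be checked directly for this matrix. The following self-contained sketch is a verification aid that we add for illustration. It assumes the convention that `np.linalg.svd` returns 𝑉 ⊤ as its third output, so bases for the row space and null space are read off its rows. The names `U_L`, `U_R`, `V_L`, and `V_R` mirror the notation of (5.2).

```python
import numpy as np

A = np.array([[1, 2, 3, 4, 5],
              [2, 3, 4, 5, 6],
              [3, 4, 5, 6, 7],
              [4, 5, 6, 7, 8],
              [5, 6, 7, 8, 9]], dtype=float)

U, s, VT = np.linalg.svd(A, full_matrices=True)   # VT holds V transposed
p = np.linalg.matrix_rank(A)

U_L, U_R = U[:, :p], U[:, p:]          # column space, left null space
V_L, V_R = VT[:p, :].T, VT[p:, :].T    # row space, null space

print(np.allclose(A @ V_R, 0))             # X v = 0 for v in the null space
print(np.allclose(A.T @ U_R, 0))           # X^T u = 0 for u in the left null space
print(np.allclose(U_L.T @ U_R, 0))         # column space is orthogonal to left null space
print(np.allclose(V_L.T @ V_R, 0))         # row space is orthogonal to null space
print(np.allclose(A @ V_L, U_L * s[:p]))   # X v_i = sigma_i u_i
```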

5.5 Eckart-Young Theorem

Suppose that we want to construct the best rank 𝑟 approximation of an 𝑚 × 𝑛 matrix 𝑋.


By best, we mean a matrix 𝑋𝑟 of rank 𝑟 < 𝑝 that, among all rank 𝑟 matrices, minimizes

||𝑋 − 𝑋𝑟 ||

where || ⋅ || denotes a norm of a matrix 𝑋 and where 𝑋𝑟 belongs to the space of all rank 𝑟 matrices of dimension 𝑚 × 𝑛.
Three popular matrix norms of an 𝑚 × 𝑛 matrix 𝑋 can be expressed in terms of the singular values of 𝑋
• the spectral or 𝑙2 norm $||X||_2 = \max_{||y|| \neq 0} \frac{||Xy||}{||y||} = \sigma_1$
• the Frobenius norm $||X||_F = \sqrt{\sigma_1^2 + \cdots + \sigma_p^2}$
• the nuclear norm $||X||_N = \sigma_1 + \cdots + \sigma_p$

The Eckart-Young theorem states that for each of these three norms, the same rank 𝑟 matrix is best and that it equals

$$\hat X_r = \sigma_1 U_1 V_1^\top + \sigma_2 U_2 V_2^\top + \cdots + \sigma_r U_r V_r^\top \tag{5.8}$$

This is a very powerful theorem: it says that we can take our 𝑚 × 𝑛 matrix 𝑋 of rank 𝑝 and best approximate it by a matrix of lower rank 𝑟 < 𝑝 that is built from the SVD by keeping only the 𝑟 leading terms.
Moreover, if some of these 𝑝 singular values carry more information than others, and if we want to retain the most information with the least amount of data, we can take the 𝑟 leading singular values ordered by magnitude.
We’ll say more about this later when we present Principal Component Analysis.
You can read about the Eckart-Young theorem and some of its uses here.
We’ll make use of this theorem when we discuss principal components analysis (PCA) and also dynamic mode decomposition (DMD).
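To make the theorem concrete, here is a small numerical sketch with a random matrix (the data and the names `X_r`, `err_spectral`, `err_frobenius`, and `err_nuclear` are ours). It builds the rank 𝑟 truncation in (5.8) and checks that the approximation error in each of the three norms is determined by the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
U, s, VT = np.linalg.svd(X, full_matrices=False)

r = 2
# sum of the r leading rank-one terms sigma_i * u_i * v_i^T
X_r = (U[:, :r] * s[:r]) @ VT[:r, :]

err_spectral = np.linalg.norm(X - X_r, 2)
err_frobenius = np.linalg.norm(X - X_r, 'fro')
err_nuclear = np.linalg.norm(X - X_r, 'nuc')

print(np.linalg.matrix_rank(X_r))
print(np.isclose(err_spectral, s[r]))                          # sigma_{r+1}
print(np.isclose(err_frobenius, np.sqrt(np.sum(s[r:]**2))))    # root sum of squares of the rest
print(np.isclose(err_nuclear, np.sum(s[r:])))                  # sum of the rest
```

The code verifies the size of the error for the truncated SVD; the theorem's further claim, that no other rank 𝑟 matrix does better, is not something a single example can establish.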


5.6 Full and Reduced SVD’s

Up to now we have described properties of a full SVD in which shapes of 𝑈 , Σ, and 𝑉 are (𝑚, 𝑚), (𝑚, 𝑛), (𝑛, 𝑛),
respectively.
There is an alternative bookkeeping convention called an economy or reduced SVD in which the shapes of 𝑈 , Σ and 𝑉
are different from what they are in a full SVD.
Thus, note that because we assume that 𝑋 has rank 𝑝, there are only 𝑝 nonzero singular values, where 𝑝 = rank(𝑋) ≤
min (𝑚, 𝑛).
A reduced SVD uses this fact to express 𝑈 , Σ, and 𝑉 as matrices with shapes (𝑚, 𝑝), (𝑝, 𝑝), (𝑛, 𝑝).
You can read about reduced and full SVD here https://numpy.org/doc/stable/reference/generated/numpy.linalg.svd.html
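For a full SVD,

$$\begin{aligned} UU^\top &= I, & U^\top U &= I \\ VV^\top &= I, & V^\top V &= I \end{aligned}$$

But not all these properties hold for a reduced SVD.

Which properties hold depends on whether we are in a tall-skinny case or a short-fat case.
• In a tall-skinny case in which 𝑚 >> 𝑛, for a reduced SVD

$$\begin{aligned} UU^\top &\neq I, & U^\top U &= I \\ VV^\top &= I, & V^\top V &= I \end{aligned}$$

• In a short-fat case in which 𝑚 << 𝑛, for a reduced SVD

$$\begin{aligned} UU^\top &= I, & U^\top U &= I \\ VV^\top &= I, & V^\top V &\neq I \end{aligned}$$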
When we study Dynamic Mode Decomposition below, we shall want to remember these properties when we use a reduced
SVD to compute some DMD representations.
Let’s do an exercise to compare full and reduced SVD’s.
To review,
• in a full SVD
– 𝑈 is 𝑚 × 𝑚
– Σ is 𝑚 × 𝑛
– 𝑉 is 𝑛 × 𝑛
• in a reduced SVD
– 𝑈 is 𝑚 × 𝑝
– Σ is 𝑝 × 𝑝
– 𝑉 is 𝑛 × 𝑝
First, let’s study a case in which 𝑚 = 5 > 𝑛 = 2.
(This is a small example of the tall-skinny case that will concern us when we study Dynamic Mode Decompositions
below.)


import numpy as np
X = np.random.rand(5,2)
U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD
Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD
print('U, S, V =')
U, S, V

U, S, V =

(array([[-0.48, 0.29, -0.29, -0.59, -0.51],
[-0.3 , -0.1 , -0.79, 0.53, 0.05],
[-0.52, -0.76, 0.33, 0.07, -0.21],
[-0.42, 0.57, 0.44, 0.54, -0.15],
[-0.49, 0.09, 0.04, -0.29, 0.82]]),
array([1.93, 0.69]),
array([[-0.52, -0.85],
[-0.85, 0.52]]))

print('Uhat, Shat, Vhat = ')


Uhat, Shat, Vhat

Uhat, Shat, Vhat =

(array([[-0.48, 0.29],
[-0.3 , -0.1 ],
[-0.52, -0.76],
[-0.42, 0.57],
[-0.49, 0.09]]),
array([1.93, 0.69]),
array([[-0.52, -0.85],
[-0.85, 0.52]]))

rr = np.linalg.matrix_rank(X)
print(f'rank of X = {rr}')

rank of X = 2

Properties:
• Where 𝑈 is constructed via a full SVD, 𝑈 ⊤ 𝑈 = 𝐼𝑚×𝑚 and 𝑈 𝑈 ⊤ = 𝐼𝑚×𝑚
• Where 𝑈̂ is constructed via a reduced SVD, although 𝑈̂ ⊤ 𝑈̂ = 𝐼𝑝×𝑝 , it happens that 𝑈̂ 𝑈̂ ⊤ ≠ 𝐼𝑚×𝑚
We illustrate these properties for our example with the following code cells.

UTU = U.T@U
UUT = U@U.T
print('UUT, UTU = ')
UUT, UTU

UUT, UTU =


(array([[ 1.00e+00, -2.73e-16, -4.78e-17, 9.58e-17, -5.26e-17],
[-2.73e-16, 1.00e+00, -3.84e-17, -1.13e-16, -2.19e-16],
[-4.78e-17, -3.84e-17, 1.00e+00, 1.15e-16, -1.66e-16],
[ 9.58e-17, -1.13e-16, 1.15e-16, 1.00e+00, -8.04e-18],
[-5.26e-17, -2.19e-16, -1.66e-16, -8.04e-18, 1.00e+00]]),
array([[ 1.00e+00, -1.96e-16, -1.25e-16, 7.70e-17, 8.84e-17],
[-1.96e-16, 1.00e+00, 1.29e-16, -2.40e-16, -3.68e-17],
[-1.25e-16, 1.29e-16, 1.00e+00, -2.67e-16, -3.88e-17],
[ 7.70e-17, -2.40e-16, -2.67e-16, 1.00e+00, 7.92e-17],
[ 8.84e-17, -3.68e-17, -3.88e-17, 7.92e-17, 1.00e+00]]))

UhatUhatT = Uhat@Uhat.T
UhatTUhat = Uhat.T@Uhat
print('UhatUhatT, UhatTUhat= ')
UhatUhatT, UhatTUhat

UhatUhatT, UhatTUhat=

(array([[ 0.31, 0.11, 0.03, 0.37, 0.26],
[ 0.11, 0.1 , 0.23, 0.07, 0.14],
[ 0.03, 0.23, 0.84, -0.21, 0.18],
[ 0.37, 0.07, -0.21, 0.5 , 0.26],
[ 0.26, 0.14, 0.18, 0.26, 0.25]]),
array([[ 1.00e+00, -1.96e-16],
[-1.96e-16, 1.00e+00]]))

Remarks:
The cells above illustrate the application of the full_matrices=True and full_matrices=False options.
Using full_matrices=False returns a reduced singular value decomposition.
The full and reduced SVD’s both accurately decompose an 𝑚 × 𝑛 matrix 𝑋.
When we study Dynamic Mode Decompositions below, it will be important for us to remember the preceding properties
of full and reduced SVD’s in such tall-skinny cases.
Now let’s turn to a short-fat case.
To illustrate this case, we’ll set 𝑚 = 2 < 5 = 𝑛 and compute both full and reduced SVD’s.

import numpy as np
X = np.random.rand(2,5)
U, S, V = np.linalg.svd(X,full_matrices=True) # full SVD
Uhat, Shat, Vhat = np.linalg.svd(X,full_matrices=False) # economy SVD
print('U, S, V = ')
U, S, V

U, S, V =

(array([[ 0.92, -0.38],
[ 0.38, 0.92]]),
array([1.38, 0.31]),
array([[ 0.45, 0.46, 0.31, 0.55, 0.43],
[-0.34, -0.19, -0.53, 0.16, 0.74],
[-0.28, -0.52, 0.76, 0.02, 0.27],
[-0.58, 0.12, -0.02, 0.7 , -0.39],
[-0.51, 0.68, 0.22, -0.43, 0.19]]))

print('Uhat, Shat, Vhat = ')


Uhat, Shat, Vhat

Uhat, Shat, Vhat =

(array([[ 0.92, -0.38],
[ 0.38, 0.92]]),
array([1.38, 0.31]),
array([[ 0.45, 0.46, 0.31, 0.55, 0.43],
[-0.34, -0.19, -0.53, 0.16, 0.74]]))

Let’s verify that our reduced SVD accurately represents 𝑋.

SShat=np.diag(Shat)
np.allclose(X, Uhat@SShat@Vhat)

True

5.7 Polar Decomposition

A reduced singular value decomposition (SVD) of 𝑋 is related to a polar decomposition of 𝑋

𝑋 = 𝑆𝑄

where
𝑆 = 𝑈 Σ𝑈 ⊤
𝑄 = 𝑈𝑉 ⊤

Here
• 𝑆 is an 𝑚 × 𝑚 symmetric matrix
• 𝑄 is an 𝑚 × 𝑛 orthogonal matrix
and in our reduced SVD
• 𝑈 is an 𝑚 × 𝑝 orthonormal matrix
• Σ is a 𝑝 × 𝑝 diagonal matrix
• 𝑉 is an 𝑛 × 𝑝 orthonormal matrix
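The formulas for 𝑆 and 𝑄 can be tried out on a small example. The sketch below is ours and makes a specific assumption: 𝑋 is short and fat with full row rank, so that the reduced 𝑈 is square and 𝑄 has orthonormal rows. Other shapes would need separate treatment.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 5                      # short-fat example, assumed to have full row rank
X = rng.random((m, n))

U, s, VT = np.linalg.svd(X, full_matrices=False)   # U: (m, m), VT: (m, n)

S = U @ np.diag(s) @ U.T         # m x m symmetric factor
Q = U @ VT                       # m x n factor with orthonormal rows

print(np.allclose(S @ Q, X))             # X = S Q
print(np.allclose(S, S.T))               # S is symmetric
print(np.allclose(Q @ Q.T, np.eye(m)))   # rows of Q are orthonormal
```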


5.8 Application: Principal Components Analysis (PCA)

Let’s begin with a case in which 𝑛 >> 𝑚, so that we have many more individuals 𝑛 than attributes 𝑚.
The matrix 𝑋 is short and fat in an 𝑛 >> 𝑚 case as opposed to a tall and skinny case with 𝑚 >> 𝑛 to be discussed
later.
We regard 𝑋 as an 𝑚 × 𝑛 matrix of data:

$$X = [X_1 \mid X_2 \mid \cdots \mid X_n]$$

where for 𝑗 = 1, … , 𝑛 the column vector $X_j = \begin{bmatrix} X_{1j} \\ X_{2j} \\ \vdots \\ X_{mj} \end{bmatrix}$ is a vector of observations on variables $\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix}$.
In a time series setting, we would think of columns 𝑗 as indexing different times at which random variables are observed,
while rows index different random variables.
In a cross-section setting, we would think of columns 𝑗 as indexing different individuals for which random variables are
observed, while rows index different attributes.
As we have seen before, the SVD is a way to decompose a matrix into useful components, just like polar decomposition,
eigendecomposition, and many others.
PCA, on the other hand, is a method that builds on the SVD to analyze data. The goal is to apply certain steps, to help
better visualize patterns in data, using statistical tools to capture the most important patterns in data.
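Step 1: Standardize the data:
Because our data matrix may hold variables of different units and scales, we first need to standardize the data.
Note that the formulas in Steps 1 to 5 treat each column of the data matrix as a variable and each row as an observation, which is the transpose of the layout described above.
First, compute the average of each column 𝑗 of 𝑋:

$$\bar X_j = \frac{1}{m} \sum_{i=1}^{m} x_{i,j}$$

We then create an average matrix out of these means:

$$\bar X = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} [\bar X_1 \mid \bar X_2 \mid \cdots \mid \bar X_n]$$

And subtract it from the original matrix to create a mean-centered matrix:

$$B = X - \bar X$$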

Step 2: Compute the covariance matrix:

Because we want to extract the relationships between variables rather than just their magnitudes (in other words, we want to know how they can explain each other), we compute the covariance matrix of 𝐵.

$$C = \frac{1}{n} B^\top B$$
Step 3: Decompose the covariance matrix and arrange the singular values:
Because 𝐶 is symmetric, it is diagonalizable, so we can eigendecompose it, find its eigenvalues, and rearrange the eigenvalue and eigenvector matrices in decreasing order.
Equivalently, we can use the SVD 𝐵 = 𝑈 Σ𝑉 ⊤ of the mean-centered matrix:

$$\begin{aligned} B^\top B &= V \Sigma^\top U^\top U \Sigma V^\top \\ &= V \Sigma^\top \Sigma V^\top \end{aligned}$$

$$C = \frac{1}{n} V \Sigma^\top \Sigma V^\top$$

We can then rearrange the columns in the matrices 𝑉 and Σ so that the singular values are in decreasing order.
Step 4: Select singular values, (optional) truncate the rest:
We can now decide how many singular values to keep, based on how much variance we want to retain (e.g., retaining 95% of the total variance).
We can obtain the percentage by calculating the variance contained in the leading 𝑟 factors divided by the variance in total:

$$\frac{\sum_{i=1}^{r} \sigma_i^2}{\sum_{i=1}^{p} \sigma_i^2}$$

Step 5: Create the Score Matrix:

$$\begin{aligned} T &= BV \\ &= U \Sigma V^\top V \\ &= U \Sigma \end{aligned}$$
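The five steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the lecture's implementation: the data are synthetic, rows are taken to be observations and columns to be variables (the layout that the formulas 𝐶 ∝ 𝐵 ⊤ 𝐵 and 𝑇 = 𝐵𝑉 presuppose), and the 95% threshold is just the example value mentioned in Step 4.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs, n_var = 200, 3
# hypothetical data: rows are observations, columns are variables
mix = np.array([[2.0, 0.0, 0.0],
                [1.0, 1.0, 0.0],
                [0.5, 0.2, 0.1]])
data = rng.standard_normal((n_obs, n_var)) @ mix

# Step 1: subtract the mean of each variable (column)
B = data - data.mean(axis=0)

# Steps 2-3: the SVD of B delivers the eigenvectors of B^T B as the columns of V,
# with singular values already sorted in decreasing order
U, s, VT = np.linalg.svd(B, full_matrices=False)

# Step 4: cumulative share of variance captured by the leading components
explained = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(explained, 0.95) + 1)   # smallest r reaching 95%

# Step 5: score matrix T = B V = U Sigma
T = B @ VT.T
print(np.allclose(T, U * s))
print(explained, r)
```

Working with the SVD of 𝐵 rather than forming 𝐵 ⊤ 𝐵 explicitly is a common design choice because it avoids squaring the condition number of the data.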

5.9 Relationship of PCA to SVD

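To relate an SVD to a PCA of data set 𝑋, let’s assume that sample means of all variables are zero, so we don’t need to standardize our matrix.
First construct the SVD of the data matrix 𝑋:

$$X = U \Sigma V^\top = \sigma_1 U_1 V_1^\top + \sigma_2 U_2 V_2^\top + \cdots + \sigma_p U_p V_p^\top \tag{5.9}$$

where

$$U = [U_1 | U_2 | \ldots | U_m]$$

$$V^\top = \begin{bmatrix} V_1^\top \\ V_2^\top \\ \ldots \\ V_n^\top \end{bmatrix}$$

In equation (5.9), each of the 𝑚 × 𝑛 matrices $U_j V_j^\top$ is evidently of rank 1.
Thus, we have

$$X = \sigma_1 \begin{pmatrix} U_{11} V_1^\top \\ U_{21} V_1^\top \\ \cdots \\ U_{m1} V_1^\top \end{pmatrix} + \sigma_2 \begin{pmatrix} U_{12} V_2^\top \\ U_{22} V_2^\top \\ \cdots \\ U_{m2} V_2^\top \end{pmatrix} + \ldots + \sigma_p \begin{pmatrix} U_{1p} V_p^\top \\ U_{2p} V_p^\top \\ \cdots \\ U_{mp} V_p^\top \end{pmatrix} \tag{5.10}$$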

Here is how we would interpret the objects in the matrix equation (5.10) in a time series context:
• for each 𝑘 = 1, … , 𝑛, the object $\{V_{kj}\}_{j=1}^{n}$ is a time series for the 𝑘th principal component
• $U_k = \begin{bmatrix} U_{1k} \\ U_{2k} \\ \ldots \\ U_{mk} \end{bmatrix}$, 𝑘 = 1, … , 𝑚 is a vector of loadings of variables 𝑋𝑖 on the 𝑘th principal component, 𝑖 = 1, … , 𝑚
• 𝜎𝑘 for each 𝑘 = 1, … , 𝑝 is the strength of the 𝑘th principal component, where strength means contribution to the overall covariance of 𝑋.

5.10 PCA with Eigenvalues and Eigenvectors

We now use an eigen decomposition of a sample covariance matrix to do PCA.


Let 𝑋𝑚×𝑛 be our 𝑚 × 𝑛 data matrix.
Let’s assume that sample means of all variables are zero.
We can assure this by pre-processing the data by subtracting sample means.
Define a sample covariance matrix Ω as

Ω = 𝑋𝑋 ⊤

Then use an eigen decomposition to represent Ω as follows:

Ω = 𝑃 Λ𝑃 ⊤

Here
• 𝑃 is 𝑚 × 𝑚 matrix of eigenvectors of Ω
• Λ is a diagonal matrix of eigenvalues of Ω
We can then represent 𝑋 as

𝑋 = 𝑃𝜖

where

𝜖 = 𝑃 −1 𝑋

and

𝜖𝜖⊤ = Λ.

We can verify that

𝑋𝑋 ⊤ = 𝑃 Λ𝑃 ⊤ . (5.11)

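It follows that we can represent the data matrix 𝑋 as

$$X = [X_1 | X_2 | \ldots | X_n] = [P_1 | P_2 | \ldots | P_m] \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \ldots \\ \epsilon_m \end{bmatrix} = P_1 \epsilon_1 + P_2 \epsilon_2 + \ldots + P_m \epsilon_m$$

To reconcile the preceding representation with the PCA that we had obtained earlier through the SVD, we first note that $\epsilon_j \epsilon_j^\top = \lambda_j \equiv \sigma_j^2$.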


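Now define $\tilde\epsilon_j = \frac{\epsilon_j}{\sqrt{\lambda_j}}$, which implies that $\tilde\epsilon_j \tilde\epsilon_j^\top = 1$.

Therefore

$$\begin{aligned} X &= \sqrt{\lambda_1} P_1 \tilde\epsilon_1 + \sqrt{\lambda_2} P_2 \tilde\epsilon_2 + \ldots + \sqrt{\lambda_m} P_m \tilde\epsilon_m \\ &= \sigma_1 P_1 \tilde\epsilon_1 + \sigma_2 P_2 \tilde\epsilon_2 + \ldots + \sigma_m P_m \tilde\epsilon_m , \end{aligned}$$

which agrees with

$$X = \sigma_1 U_1 V_1^\top + \sigma_2 U_2 V_2^\top + \ldots + \sigma_r U_r V_r^\top$$

provided that we set

• $U_j = P_j$ (a vector of loadings of variables on principal component 𝑗)
• $V_k^\top = \tilde\epsilon_k$ (the 𝑘th principal component)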
Because there are alternative algorithms for computing 𝑃 and 𝑈 for a given data matrix 𝑋, we might have sign differences or different orders of eigenvectors, depending on the algorithms used.
We can resolve such ambiguities about 𝑈 and 𝑃 by
1. sorting eigenvalues and singular values in descending order
2. imposing positive diagonals on 𝑃 and 𝑈 and adjusting signs in 𝑉 ⊤ accordingly

5.11 Connections

To pull things together, it is useful to assemble and compare some formulas presented above.
First, consider an SVD of an 𝑚 × 𝑛 matrix:

𝑋 = 𝑈 Σ𝑉 ⊤

Compute:

𝑋𝑋 ⊤ = 𝑈 Σ𝑉 ⊤ 𝑉 Σ⊤ 𝑈 ⊤
≡ 𝑈 ΣΣ⊤ 𝑈 ⊤ (5.12)
≡ 𝑈 Λ𝑈 ⊤

Compare representation (5.12) with equation (5.11) above.


Evidently, 𝑈 in the SVD is the matrix 𝑃 of eigenvectors of 𝑋𝑋 ⊤ and ΣΣ⊤ is the matrix Λ of eigenvalues.
Second, let’s compute

𝑋 ⊤ 𝑋 = 𝑉 Σ⊤ 𝑈 ⊤ 𝑈 Σ𝑉 ⊤
= 𝑉 Σ⊤ Σ𝑉 ⊤

Thus, the matrix 𝑉 in the SVD is the matrix of eigenvectors of 𝑋 ⊤ 𝑋.


Summarizing and fitting things together, we have the eigen decomposition of the sample covariance matrix

𝑋𝑋 ⊤ = 𝑃 Λ𝑃 ⊤

where 𝑃 is an orthogonal matrix.


Further, from the SVD of 𝑋, we know that

𝑋𝑋 ⊤ = 𝑈 ΣΣ⊤ 𝑈 ⊤


where 𝑈 is an orthogonal matrix.


Thus, 𝑃 = 𝑈 and we have the representation of 𝑋

𝑋 = 𝑃 𝜖 = 𝑈 Σ𝑉 ⊤

It follows that

𝑈 ⊤ 𝑋 = Σ𝑉 ⊤ = 𝜖

Note that the preceding implies that

𝜖𝜖⊤ = Σ𝑉 ⊤ 𝑉 Σ⊤ = ΣΣ⊤ = Λ,

so that everything fits together.
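These identities are easy to confirm on simulated data. The sketch below is an illustration with a random, row-demeaned data matrix; the variable names `lam`, `sigma`, and `eps` stand in for Λ's diagonal, the singular values, and 𝜖, and are ours. Eigenvectors and singular vectors are compared only up to sign, since different algorithms may return either sign.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 50
X = rng.standard_normal((m, n))
X = X - X.mean(axis=1, keepdims=True)    # zero sample mean for each variable (row)

# eigen decomposition of X X^T, sorted in descending order
lam, P = np.linalg.eigh(X @ X.T)
order = np.argsort(lam)[::-1]
lam, P = lam[order], P[:, order]

U, sigma, VT = np.linalg.svd(X, full_matrices=False)

eps = P.T @ X                            # principal components via the eigen route

print(np.allclose(lam, sigma**2))                # eigenvalues equal squared singular values
print(np.allclose(eps @ eps.T, np.diag(lam)))    # eps eps^T = Lambda
print(np.allclose(np.abs(P.T @ U), np.eye(m)))   # columns of P and U agree up to sign
```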


Below we define a class DecomAnalysis that wraps PCA and SVD for a given data matrix X.

class DecomAnalysis:
    """
    A class for conducting PCA and SVD.
    X: data matrix
    r_component: chosen rank for best approximation
    """

    def __init__(self, X, r_component=None):

        self.X = X

        self.Ω = (X @ X.T)

        self.m, self.n = X.shape
        self.r = LA.matrix_rank(X)

        if r_component:
            self.r_component = r_component
        else:
            self.r_component = self.m

    def pca(self):

        λ, P = LA.eigh(self.Ω)    # columns of P are eigenvectors

        ind = sorted(range(λ.size), key=lambda x: λ[x], reverse=True)

        # sort by eigenvalues
        self.λ = λ[ind]
        P = P[:, ind]
        self.P = P @ diag_sign(P)

        self.Λ = np.diag(self.λ)

        self.explained_ratio_pca = np.cumsum(self.λ) / self.λ.sum()

        # compute the N by T matrix of principal components
        self.ε = self.P.T @ self.X

        P = self.P[:, :self.r_component]
        ε = self.ε[:self.r_component, :]

        # transform data
        self.X_pca = P @ ε

    def svd(self):

        U, σ, VT = LA.svd(self.X)

        ind = sorted(range(σ.size), key=lambda x: σ[x], reverse=True)

        # sort by eigenvalues
        d = min(self.m, self.n)

        self.σ = σ[ind]
        U = U[:, ind]
        D = diag_sign(U)
        self.U = U @ D
        VT[:d, :] = D @ VT[ind, :]
        self.VT = VT

        self.Σ = np.zeros((self.m, self.n))
        self.Σ[:d, :d] = np.diag(self.σ)

        σ_sq = self.σ ** 2
        self.explained_ratio_svd = np.cumsum(σ_sq) / σ_sq.sum()

        # slicing matrices by the number of components to use
        U = self.U[:, :self.r_component]
        Σ = self.Σ[:self.r_component, :self.r_component]
        VT = self.VT[:self.r_component, :]

        # transform data
        self.X_svd = U @ Σ @ VT

    def fit(self, r_component):

        # pca
        P = self.P[:, :r_component]
        ε = self.ε[:r_component, :]

        # transform data
        self.X_pca = P @ ε

        # svd
        U = self.U[:, :r_component]
        Σ = self.Σ[:r_component, :r_component]
        VT = self.VT[:r_component, :]

        # transform data
        self.X_svd = U @ Σ @ VT

def diag_sign(A):
    "Compute the signs of the diagonal of matrix A"

    D = np.diag(np.sign(np.diag(A)))

    return D

We also define a function that prints out information so that we can compare decompositions obtained by different algorithms.

def compare_pca_svd(da):
    """
    Compare the outcomes of PCA and SVD.
    """

    da.pca()
    da.svd()

    print('Eigenvalues and Singular values\n')

    print(f'λ = {da.λ}\n')
    print(f'σ^2 = {da.σ**2}\n')
    print('\n')

    # loading matrices
    fig, axs = plt.subplots(1, 2, figsize=(14, 5))
    plt.suptitle('loadings')
    axs[0].plot(da.P.T)
    axs[0].set_title('P')
    axs[0].set_xlabel('m')
    axs[1].plot(da.U.T)
    axs[1].set_title('U')
    axs[1].set_xlabel('m')
    plt.show()

    # principal components
    fig, axs = plt.subplots(1, 2, figsize=(14, 5))
    plt.suptitle('principal components')
    axs[0].plot(da.ε.T)
    axs[0].set_title('ε')
    axs[0].set_xlabel('n')
    axs[1].plot(da.VT[:da.r, :].T * np.sqrt(da.λ))
    # raw string so that \t in \top is not read as a tab character
    axs[1].set_title(r'$V^\top *\sqrt{\lambda}$')
    axs[1].set_xlabel('n')
    plt.show()

5.12 Exercises

Exercise 5.12.1
In Ordinary Least Squares (OLS), we learn to compute $\hat\beta = (X^\top X)^{-1} X^\top y$, but there are cases, such as when we have collinearity or an underdetermined system (a short, fat matrix), where this formula breaks down.
In these cases, the $X^\top X$ matrix is either not invertible (its determinant is zero) or ill-conditioned (its determinant is very close to zero).
What we can do instead is to create what is called a pseudoinverse, a full rank approximation of the inverted matrix, so we can compute $\hat\beta$ with it.


Thinking in terms of the Eckart-Young theorem, build the pseudoinverse matrix $X^{+}$ and use it to compute $\hat\beta$.

Solution to Exercise 5.12.1


We can use SVD to compute the pseudoinverse:

𝑋 = 𝑈 Σ𝑉 ⊤

Taking the pseudoinverse of 𝑋, we have:

𝑋 + = 𝑉 Σ+ 𝑈 ⊤

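where:

$$
\Sigma^{+} = \begin{bmatrix}
\frac{1}{\sigma_1} & 0 & \cdots & 0 & 0 \\
0 & \frac{1}{\sigma_2} & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sigma_p} & 0 \\
0 & 0 & \cdots & 0 & 0
\end{bmatrix}
$$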

and finally:

𝛽 ̂ = 𝑋 + 𝑦 = 𝑉 Σ+ 𝑈 ⊤ 𝑦
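
The construction above can be sketched numerically. The following is a minimal illustration, assuming NumPy and a made-up random short-fat design matrix (the data, the seed, the tolerance rule and the variable names are not from the lecture); it builds $X^{+}$ from the SVD, inverting only singular values that are numerically non-zero, and compares the result with `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical short-fat design: 5 observations, 8 regressors
X = rng.standard_normal((5, 8))
y = rng.standard_normal(5)

# Reduced SVD: X = U @ diag(sigma) @ VT
U, sigma, VT = np.linalg.svd(X, full_matrices=False)

# Invert only the singular values that are numerically non-zero
# (the threshold below is an assumption, chosen for illustration)
tol = sigma.max() * max(X.shape) * np.finfo(float).eps
sigma_plus = np.where(sigma > tol, 1 / sigma, 0.0)

# Pseudoinverse X+ = V Sigma+ U^T and coefficients beta_hat = X+ y
X_plus = VT.T @ np.diag(sigma_plus) @ U.T
beta_hat = X_plus @ y

pinv_matches = np.allclose(X_plus, np.linalg.pinv(X))
fits_exactly = np.allclose(X @ beta_hat, y)
```

Because this example system is underdetermined and has full row rank, the fitted values reproduce `y`; with a tall matrix of full column rank the same code would instead return the usual OLS coefficients.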

For an example of PCA applied to analyzing the structure of intelligence tests, see the lecture Multivariable Normal Distribution.
Look at parts of that lecture that describe and illustrate the classic factor analysis model.
As mentioned earlier, in a sequel to this lecture about Dynamic Mode Decompositions, we’ll describe how SVDs provide ways to rapidly compute reduced-order approximations to first-order Vector Autoregressions (VARs).



CHAPTER

SIX

VARS AND DMDS

This lecture applies computational methods that we learned about in the lecture Singular Value Decomposition to
• first-order vector autoregressions (VARs)
• dynamic mode decompositions (DMDs)
• connections between DMDs and first-order VARs

6.1 First-Order Vector Autoregressions

We want to fit a first-order vector autoregression

𝑋𝑡+1 = 𝐴𝑋𝑡 + 𝐶𝜖𝑡+1 , 𝜖𝑡+1 ⟂ 𝑋𝑡 (6.1)

where 𝜖𝑡+1 is the time 𝑡 + 1 component of a sequence of i.i.d. 𝑚 × 1 random vectors with mean vector zero and identity
covariance matrix and where the 𝑚 × 1 vector 𝑋𝑡 is

$$
X_t = \begin{bmatrix} X_{1,t} & X_{2,t} & \cdots & X_{m,t} \end{bmatrix}^\top \tag{6.2}
$$

and where ⋅⊤ again denotes complex transposition and 𝑋𝑖,𝑡 is variable 𝑖 at time 𝑡.
We want to fit equation (6.1).
Our data are organized in an 𝑚 × (𝑛 + 1) matrix 𝑋̃

𝑋̃ = [𝑋1 ∣ 𝑋2 ∣ ⋯ ∣ 𝑋𝑛 ∣ 𝑋𝑛+1 ]

where for 𝑡 = 1, … , 𝑛 + 1, the 𝑚 × 1 vector 𝑋𝑡 is given by (6.2).


Thus, we want to estimate a system (6.1) that consists of 𝑚 least squares regressions of everything on one lagged value
of everything.
The 𝑖’th equation of (6.1) is a regression of 𝑋𝑖,𝑡+1 on the vector 𝑋𝑡 .
We proceed as follows.
From 𝑋,̃ we form two 𝑚 × 𝑛 matrices

𝑋 = [𝑋1 ∣ 𝑋2 ∣ ⋯ ∣ 𝑋𝑛 ]

and

𝑋 ′ = [𝑋2 ∣ 𝑋3 ∣ ⋯ ∣ 𝑋𝑛+1 ]

Here ′ is part of the name of the matrix 𝑋 ′ and does not indicate matrix transposition.


We use ⋅⊤ to denote matrix transposition or its extension to complex matrices.


In forming 𝑋 and 𝑋 ′ , we have in each case dropped a column from 𝑋,̃ the last column in the case of 𝑋, and the first
column in the case of 𝑋 ′ .
Evidently, 𝑋 and 𝑋 ′ are both 𝑚 × 𝑛 matrices.
We denote the rank of 𝑋 as 𝑝 ≤ min(𝑚, 𝑛).
Two cases that interest us are
• 𝑛 >> 𝑚, so that we have many more time series observations 𝑛 than variables 𝑚
• 𝑚 >> 𝑛, so that we have many more variables 𝑚 than time series observations 𝑛
At a general level that includes both of these special cases, a common formula describes the least squares estimator 𝐴 ̂ of
𝐴.
But important details differ.
The common formula is

𝐴̂ = 𝑋′𝑋+ (6.3)

where 𝑋 + is the pseudo-inverse of 𝑋.


For background, please see a reference on the Moore-Penrose pseudo-inverse.
Applicable formulas for the pseudo-inverse differ for our two cases.
Short-Fat Case:
When 𝑛 >> 𝑚, so that we have many more time series observations 𝑛 than variables 𝑚 and when 𝑋 has linearly
independent rows, 𝑋𝑋 ⊤ has an inverse and the pseudo-inverse 𝑋 + is

𝑋 + = 𝑋 ⊤ (𝑋𝑋 ⊤ )−1

Here 𝑋 + is a right-inverse that verifies 𝑋𝑋 + = 𝐼𝑚×𝑚 .


In this case, our formula (6.3) for the least-squares estimator of the population matrix of regression coefficients 𝐴 becomes

𝐴 ̂ = 𝑋 ′ 𝑋 ⊤ (𝑋𝑋 ⊤ )−1 (6.4)

This formula for least-squares regression coefficients is widely used in econometrics.


It is used to estimate vector autoregressions.
The right side of formula (6.4) is proportional to the empirical cross second moment matrix of 𝑋𝑡+1 and 𝑋𝑡 times the
inverse of the second moment matrix of 𝑋𝑡 .
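
To make the short-fat case concrete, here is a small sketch that simulates a first-order VAR and estimates $A$ with formula (6.4). The coefficient matrix `A_true`, the shock loading `C`, the sample size and the seed are all made up for illustration; the sketch also checks that (6.4) agrees with the general formula (6.3) evaluated with `np.linalg.pinv`.

```python
import numpy as np

rng = np.random.default_rng(1)

m, n = 3, 500                           # few variables, many observations
A_true = np.array([[0.8, 0.1, 0.0],
                   [0.0, 0.5, 0.2],
                   [0.1, 0.0, 0.3]])    # hypothetical stable coefficient matrix
C = 0.1 * np.eye(m)                     # hypothetical shock loading

# Simulate X_{t+1} = A X_t + C eps_{t+1}; columns of X_tilde are X_1, ..., X_{n+1}
X_tilde = np.empty((m, n + 1))
X_tilde[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_tilde[:, t + 1] = A_true @ X_tilde[:, t] + C @ rng.standard_normal(m)

X = X_tilde[:, :-1]         # drop the last column
X_prime = X_tilde[:, 1:]    # drop the first column

# Formula (6.4) and the general formula (6.3)
A_hat = X_prime @ X.T @ np.linalg.inv(X @ X.T)
A_hat_pinv = X_prime @ np.linalg.pinv(X)

max_estimation_error = np.max(np.abs(A_hat - A_true))
```

In this setting `X` has linearly independent rows, so the two expressions for the estimator coincide up to rounding error.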
Tall-Skinny Case:
When 𝑚 >> 𝑛, so that we have many more attributes 𝑚 than time series observations 𝑛 and when 𝑋 has linearly
independent columns, 𝑋 ⊤ 𝑋 has an inverse and the pseudo-inverse 𝑋 + is

𝑋 + = (𝑋 ⊤ 𝑋)−1 𝑋 ⊤

Here 𝑋 + is a left-inverse that verifies 𝑋 + 𝑋 = 𝐼𝑛×𝑛 .


In this case, our formula (6.3) for a least-squares estimator of 𝐴 becomes

𝐴 ̂ = 𝑋 ′ (𝑋 ⊤ 𝑋)−1 𝑋 ⊤ (6.5)

Please compare formulas (6.4) and (6.5) for 𝐴.̂


Here we are especially interested in formula (6.5).


The $i$th row of $\hat A$ is a $1 \times m$ vector of regression coefficients of $X_{i,t+1}$ on $X_{j,t}$, $j = 1, \ldots, m$.
If we use formula (6.5) to calculate $\hat A X$ we find that

$$
\hat A X = X'
$$

so that the regression equation fits perfectly.


This is a typical outcome in an underdetermined least-squares model.
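
The perfect fit is easy to confirm numerically. The sketch below uses placeholder random data (not data from the lecture) with many more variables than observations and applies formula (6.5) directly.

```python
import numpy as np

rng = np.random.default_rng(2)

m, n = 50, 6                                  # many variables, few observations
X_tilde = rng.standard_normal((m, n + 1))     # placeholder data
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Formula (6.5): X has linearly independent columns, so X^T X is invertible
A_hat = X_prime @ np.linalg.inv(X.T @ X) @ X.T

perfect_fit = np.allclose(A_hat @ X, X_prime)
rank_A_hat = np.linalg.matrix_rank(A_hat)
```

Although `A_hat` is a 50 × 50 matrix, its rank cannot exceed the number of observations `n`, which is one way to see that the data leave most of $A$ undetermined.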
To reiterate, in the tall-skinny case (described in Singular Value Decomposition) in which we have a number 𝑛 of observations that is small relative to the number 𝑚 of attributes that appear in the vector 𝑋𝑡 , we want to fit equation (6.1).
We confront the facts that the least squares estimator is underdetermined and that the regression equation fits perfectly.
To proceed, we’ll want efficiently to calculate the pseudo-inverse 𝑋 + .
The pseudo-inverse 𝑋 + will be a component of our estimator of 𝐴.
As our estimator $\hat A$ of $A$ we want to form an $m \times m$ matrix that solves the least-squares best-fit problem

$$
\hat A = \operatorname{argmin}_{\check A} \| X' - \check A X \|_F \tag{6.6}
$$

where || ⋅ ||𝐹 denotes the Frobenius (or Euclidean) norm of a matrix.


The Frobenius norm is defined as

$$
\| A \|_F = \sqrt{ \sum_{i=1}^m \sum_{j=1}^m |A_{ij}|^2 }
$$

The minimizer of the right side of equation (6.6) is

𝐴̂ = 𝑋′𝑋+ (6.7)

where the (possibly huge) 𝑛 × 𝑚 matrix 𝑋 + = (𝑋 ⊤ 𝑋)−1 𝑋 ⊤ is again a pseudo-inverse of 𝑋.


For some situations that we are interested in, $X^\top X$ can be close to singular, a situation that can make some numerical algorithms inaccurate.
To acknowledge that possibility, we’ll use efficient algorithms to construct a reduced-rank approximation of $\hat A$ in formula (6.5).
Such an approximation to our vector autoregression will no longer fit perfectly.
An efficient way to compute the pseudo-inverse 𝑋 + is to start with a singular value decomposition

𝑋 = 𝑈 Σ𝑉 ⊤ (6.8)

where we remind ourselves that for a reduced SVD, 𝑋 is an 𝑚 × 𝑛 matrix of data, 𝑈 is an 𝑚 × 𝑝 matrix, Σ is a 𝑝 × 𝑝
matrix, and 𝑉 is an 𝑛 × 𝑝 matrix.
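We can efficiently construct the pertinent pseudo-inverse $X^{+}$ by recognizing the following string of equalities.

$$
\begin{aligned}
X^{+} &= (X^\top X)^{-1} X^\top \\
&= (V \Sigma U^\top U \Sigma V^\top)^{-1} V \Sigma U^\top \\
&= (V \Sigma \Sigma V^\top)^{-1} V \Sigma U^\top \\
&= V \Sigma^{-1} \Sigma^{-1} V^\top V \Sigma U^\top \\
&= V \Sigma^{-1} U^\top
\end{aligned} \tag{6.9}
$$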


(Since we are in the 𝑚 >> 𝑛 case in which 𝑉 ⊤ 𝑉 = 𝐼𝑝×𝑝 in a reduced SVD, we can use the preceding string of equalities
for a reduced SVD as well as for a full SVD.)
Thus, we shall construct a pseudo-inverse 𝑋 + of 𝑋 by using a singular value decomposition of 𝑋 in equation (6.8) to
compute

𝑋 + = 𝑉 Σ−1 𝑈 ⊤ (6.10)

where the matrix $\Sigma^{-1}$ is constructed by replacing each non-zero element $\sigma_j$ of $\Sigma$ with $\sigma_j^{-1}$.

We can use formula (6.10) together with formula (6.7) to compute the matrix 𝐴 ̂ of regression coefficients.
Thus, our estimator 𝐴 ̂ = 𝑋 ′ 𝑋 + of the 𝑚 × 𝑚 matrix of coefficients 𝐴 is

𝐴 ̂ = 𝑋 ′ 𝑉 Σ−1 𝑈 ⊤ (6.11)
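
As a rough numerical companion to formulas (6.10) and (6.11), the sketch below computes the pseudo-inverse from a reduced SVD and checks it against both the normal-equations expression and `np.linalg.pinv`. The data are random placeholders and the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)

m, n = 40, 5
X_tilde = rng.standard_normal((m, n + 1))     # placeholder data
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD: U is m x p, sigma has length p, VT is p x n (here p = n)
U, sigma, VT = np.linalg.svd(X, full_matrices=False)

# Formula (6.10): X+ = V Sigma^{-1} U^T
X_plus = VT.T @ np.diag(1 / sigma) @ U.T

# Formula (6.11): A_hat = X' V Sigma^{-1} U^T
A_hat = X_prime @ X_plus

matches_pinv = np.allclose(X_plus, np.linalg.pinv(X))
matches_normal_eq = np.allclose(X_plus, np.linalg.inv(X.T @ X) @ X.T)
```

The SVD route avoids forming and inverting $X^\top X$ explicitly, which is the step that becomes unreliable when $X^\top X$ is close to singular.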

6.2 Dynamic Mode Decomposition (DMD)

We turn to the 𝑚 >> 𝑛 tall and skinny case associated with Dynamic Mode Decomposition.
Here an 𝑚 × (𝑛 + 1) data matrix 𝑋̃ contains many more attributes (or variables) 𝑚 than time periods 𝑛 + 1.
Dynamic mode decomposition was introduced by [Schmid, 2010].
You can read about Dynamic Mode Decomposition in [Kutz et al., 2016] and [Brunton and Kutz, 2019] (section 7.2).
Dynamic Mode Decomposition (DMD) computes a rank 𝑟 < 𝑝 approximation to the least squares regression coefficients
𝐴 ̂ described by formula (6.11).
We’ll build up gradually to a formulation that is useful in applications.
We’ll do this by describing three alternative representations of our first-order linear dynamic system, i.e., our vector
autoregression.
Guide to three representations: In practice, we’ll mainly be interested in Representation 3.
We use the first two representations to present some useful intermediate steps that help us to appreciate what is under the
hood of Representation 3.
In applications, we’ll use only a small subset of DMD modes to approximate dynamics.
We use such a small subset of DMD modes to construct a reduced-rank approximation to 𝐴.
To do that, we’ll want to use the reduced SVDs affiliated with representation 3, not the full SVDs affiliated with representations 1 and 2.
Guide to impatient reader: In our applications, we’ll be using Representation 3.
You might want to skip the stage-setting representations 1 and 2 on first reading.

6.3 Representation 1

In this representation, we shall use a full SVD of 𝑋.


We use the $m$ columns of $U$, and thus the $m$ rows of $U^\top$, to define an $m \times 1$ vector $\tilde b_t$ as

𝑏̃𝑡 = 𝑈 ⊤ 𝑋𝑡 . (6.12)


The original data 𝑋𝑡 can be represented as

𝑋𝑡 = 𝑈 𝑏̃𝑡 (6.13)

(Here we use 𝑏 to remind ourselves that we are creating a basis vector.)


Since we are now using a full SVD, 𝑈 𝑈 ⊤ = 𝐼𝑚×𝑚 .
So it follows from equation (6.12) that we can reconstruct 𝑋𝑡 from 𝑏̃𝑡 .
In particular,
• Equation (6.12) serves as an encoder that rotates the 𝑚 × 1 vector 𝑋𝑡 to become an 𝑚 × 1 vector 𝑏̃𝑡
• Equation (6.13) serves as a decoder that reconstructs the 𝑚 × 1 vector 𝑋𝑡 by rotating the 𝑚 × 1 vector 𝑏̃𝑡
Define a transition matrix for an $m \times 1$ basis vector $\tilde b_t$ by

$$
\tilde A = U^\top \hat A U \tag{6.14}
$$

We can recover $\hat A$ from

$$
\hat A = U \tilde A U^\top
$$

Dynamics of the $m \times 1$ basis vector $\tilde b_t$ are governed by

$$
\tilde b_{t+1} = \tilde A \tilde b_t
$$

To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce

$$
\overline X_{t+1} = U \tilde A^t U^\top X_1
$$

where we use $\overline X_{t+1}, t \geq 1$ to denote a forecast.
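
The encoder, decoder and forecast formula of Representation 1 can be checked on a toy example. The sketch below assumes random placeholder data in the tall-skinny case and uses a full SVD, so that `U` is an orthogonal $m \times m$ matrix; none of the numbers come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(6)

m, n = 8, 4
X_tilde = rng.standard_normal((m, n + 1))     # placeholder data
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

A_hat = X_prime @ np.linalg.pinv(X)           # least squares estimator (6.3)

U, sigma, VT = np.linalg.svd(X, full_matrices=True)   # full SVD: U is m x m
A_tilde = U.T @ A_hat @ U                              # transition matrix (6.14)

b_1 = U.T @ X[:, 0]                           # encoder (6.12) applied to X_1
decoded_ok = np.allclose(U @ b_1, X[:, 0])    # decoder (6.13) recovers X_1

# Forecast of X_{t+1} from X_1, compared with iterating A_hat directly
t = 3
forecast = U @ np.linalg.matrix_power(A_tilde, t) @ b_1
direct = np.linalg.matrix_power(A_hat, t) @ X[:, 0]
forecast_ok = np.allclose(forecast, direct)

# In the tall-skinny case the fit is perfect, so the forecast of X_4
# coincides with the observed fourth column of the data
in_sample_ok = np.allclose(forecast, X_tilde[:, t])
```

The rotation changes the coordinates in which the dynamics are expressed but not the forecasts themselves.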

6.4 Representation 2

This representation is related to one originally proposed by [Schmid, 2010].


It can be regarded as an intermediate step on the way to obtaining a related representation 3 to be presented later.
As with Representation 1, we continue to
• use a full SVD and not a reduced SVD
As we observed and illustrated in a lecture about the Singular Value Decomposition
• (a) for a full SVD, $U U^\top = I_{m \times m}$ and $U^\top U = I_{m \times m}$ are both identity matrices
• (b) for a reduced SVD of $X$, $U^\top U$ is an identity matrix but $U U^\top$ is not.
As we shall see later, a full SVD is too confining for what we ultimately want to do, namely, cope with situations in which $U U^\top$ is not an identity matrix because we use a reduced SVD of $X$.
But for now, let’s proceed under the assumption that we are using a full SVD so that both $U U^\top$ and $U^\top U$ are identity matrices.
Form an eigendecomposition of the $m \times m$ matrix $\tilde A = U^\top \hat A U$ defined in equation (6.14):

𝐴 ̃ = 𝑊 Λ𝑊 −1 (6.15)


where Λ is a diagonal matrix of eigenvalues and 𝑊 is an 𝑚 × 𝑚 matrix whose columns are eigenvectors corresponding
to rows (eigenvalues) in Λ.
When $U U^\top = I_{m \times m}$, as is true with a full SVD of $X$, it follows that

$$
\hat A = U \tilde A U^\top = U W \Lambda W^{-1} U^\top \tag{6.16}
$$

According to equation (6.16), the diagonal matrix Λ contains eigenvalues of 𝐴 ̂ and corresponding eigenvectors of 𝐴 ̂ are
columns of the matrix 𝑈 𝑊 .
It follows that the systematic (i.e., not random) parts of the 𝑋𝑡 dynamics captured by our first-order vector autoregressions
are described by

𝑋𝑡+1 = 𝑈 𝑊 Λ𝑊 −1 𝑈 ⊤ 𝑋𝑡

Multiplying both sides of the above equation by 𝑊 −1 𝑈 ⊤ gives

𝑊 −1 𝑈 ⊤ 𝑋𝑡+1 = Λ𝑊 −1 𝑈 ⊤ 𝑋𝑡

or

𝑏̂𝑡+1 = Λ𝑏̂𝑡
where our encoder is

𝑏̂𝑡 = 𝑊 −1 𝑈 ⊤ 𝑋𝑡
and our decoder is

𝑋𝑡 = 𝑈 𝑊 𝑏̂𝑡
We can use this representation to construct a predictor $\overline X_{t+1}$ of $X_{t+1}$ conditional on $X_1$ via:

$$
\overline X_{t+1} = U W \Lambda^t W^{-1} U^\top X_1 \tag{6.17}
$$

In effect, [Schmid, 2010] defined an 𝑚 × 𝑚 matrix Φ𝑠 as

Φ𝑠 = 𝑈 𝑊 (6.18)

and a generalized inverse

$$
\Phi_s^{+} = W^{-1} U^\top \tag{6.19}
$$

[Schmid, 2010] then represented equation (6.17) as

$$
\overline X_{t+1} = \Phi_s \Lambda^t \Phi_s^{+} X_1 \tag{6.20}
$$

Components of the basis vector $\hat b_t = W^{-1} U^\top X_t \equiv \Phi_s^{+} X_t$ are DMD projected modes.
To understand why they are called projected modes, notice that

$$
\Phi_s^{+} = (\Phi_s^\top \Phi_s)^{-1} \Phi_s^\top
$$

so that the $m \times p$ matrix

$$
\hat b = \Phi_s^{+} X
$$

is a matrix of regression coefficients of the $m \times n$ matrix $X$ on the $m \times p$ matrix $\Phi_s$.


We’ll say more about this interpretation in a related context when we discuss representation 3, which was suggested by
Tu et al. [Tu et al., 2014].
It is more appropriate to use representation 3 when, as is often the case in practice, we want to use a reduced SVD.


6.5 Representation 3

Departing from the procedures used to construct Representations 1 and 2, each of which deployed a full SVD, we now
use a reduced SVD.
Again, we let 𝑝 ≤ min(𝑚, 𝑛) be the rank of 𝑋.
Construct a reduced SVD

𝑋 = 𝑈̃ Σ̃ 𝑉 ̃ ⊤ ,

where now 𝑈̃ is 𝑚 × 𝑝, Σ̃ is 𝑝 × 𝑝, and 𝑉 ̃ ⊤ is 𝑝 × 𝑛.


Our minimum-norm least-squares approximator of 𝐴 now has representation

𝐴 ̂ = 𝑋 ′ 𝑉 ̃ Σ̃ −1 𝑈̃ ⊤ (6.21)

Computing Dominant Eigenvectors of 𝐴 ̂


Paralleling a step used to construct Representation 1, we begin by defining a transition matrix for a rotated $p \times 1$ state $\tilde b_t$ by

$$
\tilde A = \tilde U^\top \hat A \tilde U \tag{6.22}
$$

Interpretation as projection coefficients


[Brunton and Kutz, 2022] remark that 𝐴 ̃ can be interpreted in terms of a projection of 𝐴 ̂ onto the 𝑝 modes in 𝑈̃ .
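To verify this, first note that, because $\tilde U^\top \tilde U = I$, it follows that

$$
\tilde A = \tilde U^\top \hat A \tilde U = \tilde U^\top X' \tilde V \tilde\Sigma^{-1} \tilde U^\top \tilde U = \tilde U^\top X' \tilde V \tilde\Sigma^{-1} \tag{6.23}
$$

Next, we’ll just compute the regression coefficients in a projection of $\hat A \tilde U$ on $\tilde U$ using a standard least-squares formula

$$
(\tilde U^\top \tilde U)^{-1} \tilde U^\top \hat A \tilde U = (\tilde U^\top \tilde U)^{-1} \tilde U^\top X' \tilde V \tilde\Sigma^{-1} \tilde U^\top \tilde U = \tilde U^\top X' \tilde V \tilde\Sigma^{-1} = \tilde A .
$$

Thus, we have verified that $\tilde A$ is the matrix of least-squares coefficients obtained by projecting $\hat A \tilde U$, the image of the $p$ modes under $\hat A$, onto $\tilde U$.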


An Inverse Challenge
Because we are using a reduced SVD, 𝑈̃ 𝑈̃ ⊤ ≠ 𝐼.
Consequently,

$$
\hat A \neq \tilde U \tilde A \tilde U^\top ,
$$

so we can’t simply recover 𝐴 ̂ from 𝐴 ̃ and 𝑈̃ .


A Blind Alley
We can start by hoping for the best and proceeding to construct an eigendecomposition of the $p \times p$ matrix $\tilde A$:

𝐴 ̃ = 𝑊̃ Λ𝑊̃ −1 (6.24)

where Λ is a diagonal matrix of 𝑝 eigenvalues and the columns of 𝑊̃ are corresponding eigenvectors.
Mimicking our procedure in Representation 2, we cross our fingers and compute an 𝑚 × 𝑝 matrix

Φ̃ 𝑠 = 𝑈̃ 𝑊̃ (6.25)

that corresponds to (6.18) for a full SVD.


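At this point, where $\hat A$ is given by formula (6.21), it is interesting to compute $\hat A \tilde\Phi_s$:

$$
\begin{aligned}
\hat A \tilde\Phi_s &= (X' \tilde V \tilde\Sigma^{-1} \tilde U^\top)(\tilde U \tilde W) \\
&= X' \tilde V \tilde\Sigma^{-1} \tilde W \\
&\neq (\tilde U \tilde W) \Lambda \\
&= \tilde\Phi_s \Lambda
\end{aligned}
$$

That $\hat A \tilde\Phi_s \neq \tilde\Phi_s \Lambda$ means that, unlike the corresponding situation in Representation 2, columns of $\tilde\Phi_s = \tilde U \tilde W$ are not eigenvectors of $\hat A$ corresponding to eigenvalues on the diagonal of matrix $\Lambda$.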
An Approach That Works
Continuing our quest for eigenvectors of $\hat A$ that we can compute with a reduced SVD, let’s define an $m \times p$ matrix $\Phi$ as

$$
\Phi \equiv \hat A \tilde\Phi_s = X' \tilde V \tilde\Sigma^{-1} \tilde W \tag{6.26}
$$

It turns out that columns of $\Phi$ are eigenvectors of $\hat A$.


This is a consequence of a result established by Tu et al. [Tu et al., 2014] that we now present.
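Proposition The $p$ columns of $\Phi$ are eigenvectors of $\hat A$.
Proof: From formula (6.26) we have

$$
\begin{aligned}
\hat A \Phi &= (X' \tilde V \tilde\Sigma^{-1} \tilde U^\top)(X' \tilde V \tilde\Sigma^{-1} \tilde W) \\
&= X' \tilde V \tilde\Sigma^{-1} \tilde A \tilde W \\
&= X' \tilde V \tilde\Sigma^{-1} \tilde W \Lambda \\
&= \Phi \Lambda
\end{aligned}
$$

so that

$$
\hat A \Phi = \Phi \Lambda . \tag{6.27}
$$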

Let $\phi_i$ be the $i$th column of $\Phi$ and $\lambda_i$ be the corresponding $i$th eigenvalue of $\tilde A$ from decomposition (6.24).
Equating the $i$th columns, each an $m \times 1$ vector, on the two sides of equation (6.27) gives

$$
\hat A \phi_i = \lambda_i \phi_i .
$$

This equation confirms that $\phi_i$ is an eigenvector of $\hat A$ that corresponds to eigenvalue $\lambda_i$ of both $\tilde A$ and $\hat A$.
This concludes the proof.
Also see [Brunton and Kutz, 2022] (p. 238).
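
The contrast between the blind alley and the approach that works can be checked numerically. The following sketch uses random placeholder data and illustrative variable names; it builds both candidate matrices from a reduced SVD and tests the eigenvector property directly.

```python
import numpy as np

rng = np.random.default_rng(4)

m, n = 30, 6
X_tilde = rng.standard_normal((m, n + 1))     # placeholder data
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD of X
U, sigma, VT = np.linalg.svd(X, full_matrices=False)
V, Sigma_inv = VT.T, np.diag(1 / sigma)

A_hat = X_prime @ V @ Sigma_inv @ U.T         # (6.21), m x m
A_tilde = U.T @ A_hat @ U                     # (6.22), p x p
Lam, W = np.linalg.eig(A_tilde)               # (6.24), possibly complex

Phi_s = U @ W                                 # (6.25), the blind alley
Phi = X_prime @ V @ Sigma_inv @ W             # (6.26)

# Multiplying by Lam scales column i by eigenvalue i, i.e. Phi @ diag(Lam)
phi_is_eigvecs = np.allclose(A_hat @ Phi, Phi * Lam)
phi_s_is_eigvecs = np.allclose(A_hat @ Phi_s, Phi_s * Lam)
```

With generic data, `phi_is_eigvecs` holds while `phi_s_is_eigvecs` fails, because the columns of $X' \tilde V \tilde\Sigma^{-1} \tilde W$ need not lie in the span of $\tilde U$.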

6.5.1 Decoder of 𝑏̌ as a linear projection

From eigendecomposition (6.27) we can represent 𝐴 ̂ as

𝐴 ̂ = ΦΛΦ+ . (6.28)

From formula (6.28) we can deduce dynamics of the 𝑝 × 1 vector 𝑏̌𝑡 :

𝑏̌𝑡+1 = Λ𝑏̌𝑡


where

𝑏̌𝑡 = Φ+ 𝑋𝑡 (6.29)

Since the 𝑚 × 𝑝 matrix Φ has 𝑝 linearly independent columns, the generalized inverse of Φ is

Φ+ = (Φ⊤ Φ)−1 Φ⊤

and so

𝑏̌ = (Φ⊤ Φ)−1 Φ⊤ 𝑋 (6.30)

The 𝑝 × 𝑛 matrix 𝑏̌ is recognizable as a matrix of least squares regression coefficients of the 𝑚 × 𝑛 matrix 𝑋 on the 𝑚 × 𝑝
matrix Φ and consequently

𝑋̌ = Φ𝑏̌ (6.31)

is an 𝑚 × 𝑛 matrix of least squares projections of 𝑋 on Φ.


Variance Decomposition of 𝑋
By virtue of the least-squares projection theory discussed in this quantecon lecture https://python-advanced.quantecon.org/orth_proj.html, we can represent 𝑋 as the sum of the projection 𝑋̌ of 𝑋 on Φ plus a matrix of errors.
To verify this, note that the least squares projection 𝑋̌ is related to 𝑋 by

𝑋 = 𝑋̌ + 𝜖

or

𝑋 = Φ 𝑏̌ + 𝜖 (6.32)

where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares orthogonality conditions $\epsilon^\top \Phi = 0$ or

$$
(X - \Phi \check b)^\top \Phi = 0_{n \times p} \tag{6.33}
$$

Rearranging the orthogonality conditions (6.33) gives $X^\top \Phi = \check b^\top \Phi^\top \Phi$, which implies formula (6.30).

6.5.2 An Approximation

We now describe a way to approximate the 𝑝 × 1 vector 𝑏̌𝑡 instead of using formula (6.29).
In particular, the following argument adapted from [Brunton and Kutz, 2022] (page 240) provides a computationally
efficient way to approximate 𝑏̌𝑡 .
For convenience, we’ll apply the method at time 𝑡 = 1.
For 𝑡 = 1, from equation (6.32) we have

𝑋̌ 1 = Φ𝑏̌1 (6.34)

where 𝑏̌1 is a 𝑝 × 1 vector.


Recall from representation 1 above that 𝑋1 = 𝑈 𝑏̃1 , where 𝑏̃1 is a time 1 basis vector for representation 1 and 𝑈 is from
the full SVD 𝑋 = 𝑈 Σ𝑉 ⊤ .
It then follows from equation (6.32) that

𝑈 𝑏̃1 = 𝑋 ′ 𝑉 ̃ Σ̃ −1 𝑊̃ 𝑏̌1 + 𝜖1


where 𝜖1 is a least-squares error vector from equation (6.32).


It follows that

$$
\tilde b_1 = U^\top X' \tilde V \tilde\Sigma^{-1} \tilde W \check b_1 + U^\top \epsilon_1
$$

Replacing the error term 𝑈 ⊤ 𝜖1 by zero, and replacing 𝑈 from a full SVD of 𝑋 with 𝑈̃ from a reduced SVD, we obtain
an approximation 𝑏̂1 to 𝑏̃1 :

𝑏̂1 = 𝑈̃ ⊤ 𝑋 ′ 𝑉 ̃ Σ̃ −1 𝑊̃ 𝑏̌1

Recall that from equation (6.23), 𝐴 ̃ = 𝑈̃ ⊤ 𝑋 ′ 𝑉 ̃ Σ̃ −1 .


It then follows that

$$
\hat b_1 = \tilde A \tilde W \check b_1
$$

and therefore, by the eigendecomposition (6.24) of $\tilde A$, we have

𝑏̂1 = 𝑊̃ Λ𝑏̌1

Consequently,

𝑏̂1 = (𝑊̃ Λ)−1 𝑏̃1

or

𝑏̂1 = (𝑊̃ Λ)−1 𝑈̃ ⊤ 𝑋1 , (6.35)

which is a computationally efficient approximation to the following instance of equation (6.29) for the initial vector 𝑏̌1 :

𝑏̌1 = Φ+ 𝑋1 (6.36)

(To highlight that (6.35) is an approximation, users of DMD sometimes call components of basis vector 𝑏̌𝑡 = Φ+ 𝑋𝑡 the
exact DMD modes and components of 𝑏̂𝑡 = (𝑊̃ Λ)−1 𝑈̃ ⊤ 𝑋𝑡 the approximate modes.)
Conditional on 𝑋𝑡 , we can compute a decoded 𝑋̌ 𝑡+𝑗 , 𝑗 = 1, 2, … from the exact modes via

𝑋̌ 𝑡+𝑗 = ΦΛ𝑗 Φ+ 𝑋𝑡 (6.37)

or compute a decoded 𝑋̂ 𝑡+𝑗 from approximate modes via

𝑋̂ 𝑡+𝑗 = ΦΛ𝑗 (𝑊̃ Λ)−1 𝑈̃ ⊤ 𝑋𝑡 . (6.38)

We can then use a decoded 𝑋̌ 𝑡+𝑗 or 𝑋̂ 𝑡+𝑗 to forecast 𝑋𝑡+𝑗 .

6.5.3 Using Fewer Modes

In applications, we’ll actually use only a few modes, often three or fewer.
Some of the preceding formulas assume that we have retained all 𝑝 modes associated with singular values of 𝑋.
We can adjust our formulas to describe a situation in which we instead retain only the 𝑟 < 𝑝 largest singular values.
In that case, we simply replace Σ̃ with the appropriate 𝑟 × 𝑟 matrix of singular values, 𝑈̃ with the 𝑚 × 𝑟 matrix whose
columns correspond to the 𝑟 largest singular values, and 𝑉 ̃ with the 𝑛 × 𝑟 matrix whose columns correspond to the 𝑟
largest singular values.
Counterparts of all of the salient formulas above then apply.
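
To illustrate how the truncated formulas fit together, here is a sketch of a rank-$r$ DMD forecast based on exact modes, in the spirit of (6.26), (6.29) and (6.37). The function name `dmd_forecast`, its signature, and the test data (20 observed variables driven by a hypothetical 2-dimensional linear state) are assumptions made for this example and are not part of the lecture or of any library.

```python
import numpy as np

def dmd_forecast(X_tilde, r, x_t, j):
    """
    Forecast j steps ahead from x_t using r exact DMD modes (a sketch).
    X_tilde is an m x (n+1) data matrix with columns X_1, ..., X_{n+1}.
    """
    X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

    # Keep the r largest singular values and the associated vectors
    U, sigma, VT = np.linalg.svd(X, full_matrices=False)
    U_r, V_r = U[:, :r], VT[:r, :].T
    Sigma_r_inv = np.diag(1 / sigma[:r])

    A_tilde = U_r.T @ X_prime @ V_r @ Sigma_r_inv    # r x r transition matrix
    Lam, W = np.linalg.eig(A_tilde)                  # eigenvalues, eigenvectors
    Phi = X_prime @ V_r @ Sigma_r_inv @ W            # m x r matrix of DMD modes

    b = np.linalg.pinv(Phi) @ x_t                    # encoder, as in (6.29)
    return (Phi @ (Lam**j * b)).real                 # decoder, as in (6.37)

# Hypothetical test data: 20 observed variables driven by a 2-dimensional state
rng = np.random.default_rng(5)
F = np.array([[0.9, 0.2], [-0.2, 0.9]])
G = rng.standard_normal((20, 2))
z = np.empty((2, 12))
z[:, 0] = [1.0, 0.5]
for t in range(11):
    z[:, t + 1] = F @ z[:, t]
data = G @ z                                         # 20 x 12 data matrix

X_tilde = data[:, :9]                                # fit on the first 9 columns
forecast = dmd_forecast(X_tilde, r=2, x_t=X_tilde[:, -1], j=3)
forecast_error = np.max(np.abs(forecast - data[:, 11]))
```

Because the artificial data have exactly two underlying modes and no noise, two DMD modes reproduce the held-out column up to rounding error; with real data the truncation would only approximate the dynamics.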


6.6 Source for Some Python Code

You can find a Python implementation of DMD here:


https://mathlab.sissa.it/pydmd



CHAPTER

SEVEN

USING NEWTON’S METHOD TO SOLVE ECONOMIC MODELS

Contents

• Using Newton’s Method to Solve Economic Models


– Overview
– Fixed Point Computation Using Newton’s Method
– Root-Finding in One Dimension
– Multivariate Newton’s Method
– Exercises

See also:
GPU: A version of this lecture which makes use of jax to run the code on a GPU is available here

7.1 Overview

Many economic problems involve finding fixed points or zeros (sometimes called “roots”) of functions.
For example, in a simple supply and demand model, an equilibrium price is one that makes excess demand zero.
In other words, an equilibrium is a zero of the excess demand function.
There are various computational techniques for solving for fixed points and zeros.
In this lecture we study an important gradient-based technique called Newton’s method.
Newton’s method does not always work but, in situations where it does, convergence is often fast when compared to other
methods.
The lecture will apply Newton’s method in one-dimensional and multi-dimensional settings to solve fixed-point and zero-
finding problems.
• When finding the fixed point of a function 𝑓, Newton’s method updates an existing guess of the fixed point by
solving for the fixed point of a linear approximation to the function 𝑓.
• When finding the zero of a function 𝑓, Newton’s method updates an existing guess by solving for the zero of a
linear approximation to the function 𝑓.
To build intuition, we first consider an easy, one-dimensional fixed point problem where we know the solution and solve
it using both successive approximation and Newton’s method.
Then we apply Newton’s method to multi-dimensional settings to solve for market equilibria with multiple goods.
At the end of the lecture we leverage the power of automatic differentiation in autograd to solve a very high-dimensional equilibrium problem.

!pip install autograd

We use the following imports in this lecture

import matplotlib.pyplot as plt


from collections import namedtuple
from scipy.optimize import root
from autograd import jacobian
# Thinly-wrapped numpy to enable automatic differentiation
import autograd.numpy as np

plt.rcParams["figure.figsize"] = (10, 5.7)

7.2 Fixed Point Computation Using Newton’s Method

In this section we solve for the fixed point of the law of motion for capital in the setting of the Solow growth model.
We will inspect the fixed point visually, solve it by successive approximation, and then apply Newton’s method to achieve
faster convergence.

7.2.1 The Solow Model

In the Solow growth model, assuming Cobb-Douglas production technology and zero population growth, the law of motion
for capital is

𝑘𝑡+1 = 𝑔(𝑘𝑡 ) where 𝑔(𝑘) ∶= 𝑠𝐴𝑘𝛼 + (1 − 𝛿)𝑘 (7.1)

Here
• 𝑘𝑡 is capital stock per worker,
• 𝐴, 𝛼 > 0 are production parameters, 𝛼 < 1
• 𝑠 > 0 is a savings rate, and
• 𝛿 ∈ (0, 1) is a rate of depreciation
In this example, we wish to calculate the unique strictly positive fixed point of 𝑔, the law of motion for capital.
In other words, we seek a 𝑘∗ > 0 such that 𝑔(𝑘∗ ) = 𝑘∗ .
• such a 𝑘∗ is called a steady state, since 𝑘𝑡 = 𝑘∗ implies 𝑘𝑡+1 = 𝑘∗ .
Using pencil and paper to solve $g(k) = k$, you will be able to confirm that

$$
k^* = \left( \frac{sA}{\delta} \right)^{1/(1-\alpha)}
$$


7.2.2 Implementation

Let’s store our parameters in a namedtuple to help us keep our code clean and concise.

SolowParameters = namedtuple("SolowParameters", ('A', 's', 'α', 'δ'))

This function creates a suitable namedtuple with default parameter values.

def create_solow_params(A=2.0, s=0.3, α=0.3, δ=0.4):
    "Creates a Solow model parameterization with default values."
    return SolowParameters(A=A, s=s, α=α, δ=δ)

The next two functions implement the law of motion (7.1) and store the true fixed point 𝑘∗ .

def g(k, params):
    A, s, α, δ = params
    return A * s * k**α + (1 - δ) * k

def exact_fixed_point(params):
    A, s, α, δ = params
    return ((s * A) / δ)**(1/(1 - α))

Here is a function to provide a 45 degree plot of the dynamics.

def plot_45(params, ax, fontsize=14):

    k_min, k_max = 0.0, 3.0
    k_grid = np.linspace(k_min, k_max, 1200)

    # Plot the functions
    lb = r"$g(k) = sAk^{\alpha} + (1 - \delta)k$"
    ax.plot(k_grid, g(k_grid, params), lw=2, alpha=0.6, label=lb)
    ax.plot(k_grid, k_grid, "k--", lw=1, alpha=0.7, label="45")

    # Show and annotate the fixed point
    kstar = exact_fixed_point(params)
    fps = (kstar,)
    ax.plot(fps, fps, "go", ms=10, alpha=0.6)
    ax.annotate(r"$k^* = (sA / \delta)^{\frac{1}{1-\alpha}}$",
                xy=(kstar, kstar),
                xycoords="data",
                xytext=(20, -20),
                textcoords="offset points",
                fontsize=fontsize)

    ax.legend(loc="upper left", frameon=False, fontsize=fontsize)

    ax.set_yticks((0, 1, 2, 3))
    ax.set_yticklabels((0.0, 1.0, 2.0, 3.0), fontsize=fontsize)
    ax.set_ylim(0, 3)
    ax.set_xlabel("$k_t$", fontsize=fontsize)
    ax.set_ylabel("$k_{t+1}$", fontsize=fontsize)

Let’s look at the 45 degree diagram for two parameterizations.


params = create_solow_params()
fig, ax = plt.subplots(figsize=(8, 8))
plot_45(params, ax)
plt.show()

params = create_solow_params(α=0.05, δ=0.5)


fig, ax = plt.subplots(figsize=(8, 8))
plot_45(params, ax)
plt.show()


We see that 𝑘∗ is indeed the unique positive fixed point.

Successive Approximation

First let’s compute the fixed point using successive approximation.


In this case, successive approximation means repeatedly updating capital from some initial state 𝑘0 using the law of
motion.
Here’s a time series from a particular choice of 𝑘0 .

def compute_iterates(k_0, f, params, n=25):
    "Compute time series of length n generated by arbitrary function f."
    k = k_0
    k_iterates = []
    for t in range(n):
        k_iterates.append(k)
        k = f(k, params)
    return k_iterates

params = create_solow_params()
k_0 = 0.25
k_series = compute_iterates(k_0, g, params)
k_star = exact_fixed_point(params)

fig, ax = plt.subplots()
ax.plot(k_series, 'o')
ax.plot([k_star] * len(k_series), 'k--')
ax.set_ylim(0, 3)
plt.show()

Let’s see the output for a long time series.

k_series = compute_iterates(k_0, g, params, n=10_000)


k_star_approx = k_series[-1]
k_star_approx

1.7846741842265788

This is close to the true value.

k_star

1.7846741842265788


Newton’s Method

In general, when applying Newton’s fixed point method to some function 𝑔, we start with a guess 𝑥0 of the fixed point
and then update by solving for the fixed point of a tangent line at 𝑥0 .
To begin with, we recall that the first-order approximation of $g$ at $x_0$ (i.e., the first order Taylor approximation of $g$ at $x_0$) is the function

$$
\hat g(x) \approx g(x_0) + g'(x_0)(x - x_0) \tag{7.2}
$$

We solve for the fixed point of $\hat g$ by calculating the $x_1$ that solves

$$
x_1 = \frac{g(x_0) - g'(x_0) x_0}{1 - g'(x_0)}
$$
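
The expression for $x_1$ follows from a couple of lines of algebra, spelled out here as a sketch. It assumes $g'(x_0) \neq 1$, a condition that is not stated explicitly in the text: set $x_1$ equal to the value of the linear approximation at $x_1$ and collect terms.

```latex
\begin{aligned}
x_1 &= \hat g(x_1) = g(x_0) + g'(x_0)(x_1 - x_0) \\
x_1 \bigl(1 - g'(x_0)\bigr) &= g(x_0) - g'(x_0)\, x_0 \\
x_1 &= \frac{g(x_0) - g'(x_0)\, x_0}{1 - g'(x_0)}
\end{aligned}
```

When $g'(x_0)$ is close to one, the tangent line is nearly parallel to the 45 degree line and the update can move far from $x_0$.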
Generalising the process above, Newton’s fixed point method iterates on

$$
x_{t+1} = \frac{g(x_t) - g'(x_t) x_t}{1 - g'(x_t)}, \quad x_0 \text{ given} \tag{7.3}
$$

To implement Newton’s method we observe that the derivative of the law of motion for capital (7.1) is

𝑔′ (𝑘) = 𝛼𝑠𝐴𝑘𝛼−1 + (1 − 𝛿) (7.4)

Let’s define this:

def Dg(k, params):
    A, s, α, δ = params
    return α * A * s * k**(α-1) + (1 - δ)

Here’s a function 𝑞 representing (7.3).

def q(k, params):
    return (g(k, params) - Dg(k, params) * k) / (1 - Dg(k, params))

Now let’s plot some trajectories.

def plot_trajectories(params,
                      k0_a=0.8,  # first initial condition
                      k0_b=3.1,  # second initial condition
                      n=20,      # length of time series
                      fs=14):    # fontsize

    fig, axes = plt.subplots(2, 1, figsize=(10, 6))
    ax1, ax2 = axes

    ks1 = compute_iterates(k0_a, g, params, n)
    ax1.plot(ks1, "-o", label="successive approximation")

    ks2 = compute_iterates(k0_b, g, params, n)
    ax2.plot(ks2, "-o", label="successive approximation")

    ks3 = compute_iterates(k0_a, q, params, n)
    ax1.plot(ks3, "-o", label="newton steps")

    ks4 = compute_iterates(k0_b, q, params, n)
    ax2.plot(ks4, "-o", label="newton steps")

    for ax in axes:
        ax.plot(k_star * np.ones(n), "k--")
        ax.legend(fontsize=fs, frameon=False)
        ax.set_ylim(0.6, 3.2)
        ax.set_yticks((k_star,))
        ax.set_yticklabels(("$k^*$",), fontsize=fs)
        ax.set_xticks(np.linspace(0, 19, 20))

    plt.show()

params = create_solow_params()
plot_trajectories(params)

We can see that Newton’s method converges faster than successive approximation.

7.3 Root-Finding in One Dimension

In the previous section we computed fixed points.


In fact Newton’s method is more commonly associated with the problem of finding zeros of functions.
Let’s discuss this “root-finding” problem and then show how it is connected to the problem of finding fixed points.


7.3.1 Newton’s Method for Zeros

Let’s suppose we want to find an 𝑥 such that 𝑓(𝑥) = 0 for some smooth function 𝑓 mapping real numbers to real numbers.
Suppose we have a guess 𝑥0 and we want to update it to a new point 𝑥1 .
As a first step, we take the first-order approximation of $f$ around $x_0$:

$$
\hat f(x) \approx f(x_0) + f'(x_0)(x - x_0)
$$

Now we solve for the zero of $\hat f$.
In particular, we set $\hat f(x_1) = 0$ and solve for $x_1$ to get

$$
x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}, \quad x_0 \text{ given}
$$

Generalizing the formula above, for one-dimensional zero-finding problems, Newton’s method iterates on

$$
x_{t+1} = x_t - \frac{f(x_t)}{f'(x_t)}, \quad x_0 \text{ given} \tag{7.5}
$$

The following code implements the iteration (7.5).

def newton(f, Df, x_0, tol=1e-7, max_iter=100_000):
    x = x_0

    # Implement the zero-finding formula
    def q(x):
        return x - f(x) / Df(x)

    error = tol + 1
    n = 0
    while error > tol:
        n += 1
        if(n > max_iter):
            raise Exception('Max iteration reached without convergence')
        y = q(x)
        error = np.abs(x - y)
        x = y
        print(f'iteration {n}, error = {error:.5f}')
    return x

Numerous libraries implement Newton’s method in one dimension, including SciPy, so the code is just for illustrative purposes.
(That said, when we want to apply Newton’s method using techniques such as automatic differentiation or GPU acceleration, it will be helpful to know how to implement Newton’s method ourselves.)
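As a quick sanity check of the iteration (7.5), the sketch below applies a quiet variant of the routine to $f(x) = x^2 - 2$, whose positive zero is $\sqrt{2}$. The function name `newton_1d`, the test function and the starting point are illustrative assumptions rather than part of the lecture code.

```python
import numpy as np

def newton_1d(f, Df, x_0, tol=1e-7, max_iter=100):
    "Iterate x_{t+1} = x_t - f(x_t) / Df(x_t) until successive iterates differ by less than tol."
    x = x_0
    for n in range(1, max_iter + 1):
        y = x - f(x) / Df(x)
        if abs(y - x) < tol:
            return y, n
        x = y
    raise RuntimeError('Max iteration reached without convergence')

# f(x) = x^2 - 2 with derivative 2x; the positive zero is sqrt(2)
root_approx, n_iter = newton_1d(lambda x: x**2 - 2, lambda x: 2 * x, x_0=1.0)
print(root_approx, n_iter, abs(root_approx - np.sqrt(2)))
```

SciPy's `scipy.optimize.newton` accepts the derivative through its `fprime` argument and can be used for the same purpose.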

7.3.2 Application to Finding Fixed Points

Now consider again the Solow fixed-point calculation, where we solve for 𝑘 satisfying 𝑔(𝑘) = 𝑘.
We can convert this to a zero-finding problem by setting 𝑓(𝑥) ∶= 𝑔(𝑥) − 𝑥.
Any zero of 𝑓 is clearly a fixed point of 𝑔.
Let’s apply this idea to the Solow problem


params = create_solow_params()
k_star_approx_newton = newton(f=lambda x: g(x, params) - x,
                              Df=lambda x: Dg(x, params) - 1,
                              x_0=0.8)

iteration 1, error = 1.27209


iteration 2, error = 0.28180
iteration 3, error = 0.00561
iteration 4, error = 0.00000
iteration 5, error = 0.00000

k_star_approx_newton

1.7846741842265788

The result confirms the rapid convergence we saw in the graphs above: a very accurate result is reached with only 5 iterations.
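The Solow fixed point also has a closed form, which gives an independent check on the Newton result. The sketch below is self-contained and assumes the parameter values $A = 2$, $s = 0.3$, $\alpha = 0.3$, $\delta = 0.4$; these are the values used in the solved example of Exercise 7.5.1 and are assumed here to match the defaults behind `create_solow_params()`.

```python
# Assumed Solow parameters (taken from the solved example in Exercise 7.5.1)
A, s, α, δ = 2.0, 0.3, 0.3, 0.4

def g(k):
    "Law of motion for capital."
    return s * A * k**α + (1 - δ) * k

def Dg(k):
    "Derivative of g."
    return α * s * A * k**(α - 1) + (1 - δ)

# Newton's method applied to f(k) = g(k) - k, starting from k = 0.8
k = 0.8
for _ in range(20):
    k_new = k - (g(k) - k) / (Dg(k) - 1)
    converged = abs(k_new - k) < 1e-10
    k = k_new
    if converged:
        break

# Closed form: s A k^α = δ k  implies  k* = (s A / δ)^(1 / (1 - α))
k_star = (s * A / δ)**(1 / (1 - α))
print(k, k_star)
```

Under these assumptions the two numbers agree with the value of `k_star_approx_newton` reported above.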

7.4 Multivariate Newton’s Method

In this section, we introduce a two-good problem, present a visualization of the problem, and solve for the equilibrium of the two-good market using both a zero finder in SciPy and Newton’s method.
We then extend the idea to a larger market with 3,000 goods and compare the performance of the two methods again.
We will see a significant performance gain when using Newton’s method.

7.4.1 A Two Goods Market Equilibrium

Let’s start by computing the market equilibrium of a two-good problem.


We consider a market for two related products, good 0 and good 1, with price vector $p = (p_0, p_1)$.
Supply of good $i$ at price $p$ is

$$q_i^s(p) = b_i \sqrt{p_i}$$

Demand of good $i$ at price $p$ is

$$q_i^d(p) = \exp(-(a_{i0} p_0 + a_{i1} p_1)) + c_i$$

Here $c_i$, $b_i$ and $a_{ij}$ are parameters.
For example, the two goods might be computer components that are typically used together, in which case they are
complements. Hence demand depends on the price of both components.
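The excess demand function is

$$e_i(p) = q_i^d(p) - q_i^s(p), \quad i = 0, 1$$

An equilibrium price vector $p^*$ satisfies $e_i(p^*) = 0$.
We set

$$A = \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}, \quad b = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} \quad \text{and} \quad c = \begin{pmatrix} c_0 \\ c_1 \end{pmatrix}$$

for this particular question.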


A Graphical Exploration

Since our problem is only two-dimensional, we can use graphical analysis to visualize and help understand the problem.
Our first step is to define the excess demand function

$$e(p) = \begin{pmatrix} e_0(p) \\ e_1(p) \end{pmatrix}$$

The function below calculates the excess demand for given parameters

def e(p, A, b, c):
    return np.exp(- A @ p) + c - b * np.sqrt(p)

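Our default parameter values will be

$$A = \begin{pmatrix} 0.5 & 0.4 \\ 0.8 & 0.2 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad c = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$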

A = np.array([
    [0.5, 0.4],
    [0.8, 0.2]
])
b = np.ones(2)
c = np.ones(2)

At a price level of 𝑝 = (1, 0.5), the excess demand is

ex_demand = e((1.0, 0.5), A, b, c)

print(f'The excess demand for good 0 is {ex_demand[0]:.3f} \n'
      f'The excess demand for good 1 is {ex_demand[1]:.3f}')

The excess demand for good 0 is 0.497
The excess demand for good 1 is 0.699

Next we plot the two functions 𝑒0 and 𝑒1 on a grid of (𝑝0 , 𝑝1 ) values, using contour surfaces and lines.
We will use the following function to build the contour plots

def plot_excess_demand(ax, good=0, grid_size=100, grid_max=4, surface=True):

    # Create a grid_size x grid_size grid of prices
    p_grid = np.linspace(0, grid_max, grid_size)
    z = np.empty((grid_size, grid_size))

    for i, p_1 in enumerate(p_grid):
        for j, p_2 in enumerate(p_grid):
            z[i, j] = e((p_1, p_2), A, b, c)[good]

    if surface:
        cs1 = ax.contourf(p_grid, p_grid, z.T, alpha=0.5)
        plt.colorbar(cs1, ax=ax, format="%.6f")

    ctr1 = ax.contour(p_grid, p_grid, z.T, levels=[0.0])
    ax.set_xlabel("$p_0$")
    ax.set_ylabel("$p_1$")
    ax.set_title(f'Excess Demand for Good {good}')
    plt.clabel(ctr1, inline=1, fontsize=13)

Here’s our plot of 𝑒0 :

fig, ax = plt.subplots()
plot_excess_demand(ax, good=0)
plt.show()

Here’s our plot of 𝑒1 :

fig, ax = plt.subplots()
plot_excess_demand(ax, good=1)
plt.show()


We see the black contour line of zero, which tells us when 𝑒𝑖 (𝑝) = 0.
For a price vector 𝑝 such that 𝑒𝑖 (𝑝) = 0 we know that good 𝑖 is in equilibrium (demand equals supply).
If these two contour lines cross at some price vector 𝑝∗ , then 𝑝∗ is an equilibrium price vector.

fig, ax = plt.subplots(figsize=(10, 5.7))
for good in (0, 1):
    plot_excess_demand(ax, good=good, surface=False)
plt.show()


It seems there is an equilibrium close to 𝑝 = (1.6, 1.5).

Using a Multidimensional Root Finder

To solve for 𝑝∗ more precisely, we use a zero-finding algorithm from scipy.optimize.


We supply 𝑝 = (1, 1) as our initial guess.

init_p = np.ones(2)

This uses the modified Powell method to find the zero

%%time
solution = root(lambda p: e(p, A, b, c), init_p, method='hybr')

CPU times: user 332 µs, sys: 0 ns, total: 332 µs


Wall time: 286 µs

Here’s the resulting value:

p = solution.x
p

array([1.57080182, 1.46928838])

This looks close to our guess from observing the figure. We can plug it back into 𝑒 to test that 𝑒(𝑝) ≈ 0:

np.max(np.abs(e(p, A, b, c)))


2.0383694732117874e-13

This is indeed a very small error.

Adding Gradient Information

In many cases, for zero-finding algorithms applied to smooth functions, supplying the Jacobian of the function leads to
better convergence properties.
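Here we manually calculate the elements of the Jacobian

$$J(p) = \begin{pmatrix} \frac{\partial e_0}{\partial p_0}(p) & \frac{\partial e_0}{\partial p_1}(p) \\ \frac{\partial e_1}{\partial p_0}(p) & \frac{\partial e_1}{\partial p_1}(p) \end{pmatrix}$$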

def jacobian_e(p, A, b, c):
    p_0, p_1 = p
    a_00, a_01 = A[0, :]
    a_10, a_11 = A[1, :]
    # Each exponential depends on both prices through a_i0 * p_0 + a_i1 * p_1
    exp_0 = np.exp(-(a_00 * p_0 + a_01 * p_1))
    exp_1 = np.exp(-(a_10 * p_0 + a_11 * p_1))
    j_00 = -a_00 * exp_0 - (b[0]/2) * p_0**(-1/2)
    j_01 = -a_01 * exp_0
    j_10 = -a_10 * exp_1
    j_11 = -a_11 * exp_1 - (b[1]/2) * p_1**(-1/2)
    J = [[j_00, j_01],
         [j_10, j_11]]
    return np.array(J)

%%time
solution = root(lambda p: e(p, A, b, c),
                init_p,
                jac=lambda p: jacobian_e(p, A, b, c),
                method='hybr')

CPU times: user 582 µs, sys: 46 µs, total: 628 µs


Wall time: 449 µs

Now the solution is even more accurate (although, in this low-dimensional problem, the difference is quite small):

p = solution.x
np.max(np.abs(e(p, A, b, c)))

1.3322676295501878e-15
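A hand-coded Jacobian is easy to get wrong, so it is worth comparing it with a finite-difference approximation before passing it to a solver. The sketch below does this for the two-good excess demand function at the price vector $(1, 0.5)$. The helper names `jacobian_analytic` and `jacobian_fd`, the step size `h` and the test point are illustrative assumptions.

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.8, 0.2]])
b = np.ones(2)
c = np.ones(2)

def e(p):
    "Excess demand for the two-good market."
    return np.exp(-A @ p) + c - b * np.sqrt(p)

def jacobian_analytic(p):
    "J[i, j] = -a_ij * exp(-(A p)_i) - 1{i = j} * b_i / (2 sqrt(p_i))."
    return -np.exp(-A @ p)[:, None] * A - np.diag(b / (2 * np.sqrt(p)))

def jacobian_fd(p, h=1e-6):
    "Central finite-difference approximation, built one column (one price) at a time."
    n = len(p)
    J = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (e(p + step) - e(p - step)) / (2 * h)
    return J

p_test = np.array([1.0, 0.5])
max_gap = np.max(np.abs(jacobian_analytic(p_test) - jacobian_fd(p_test)))
print(max_gap)
```

A gap far above the finite-difference error would point to a mistake in the analytic derivatives.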

Using Newton’s Method

Now let’s compute the equilibrium price using the multivariate version of Newton’s method

$$p_{n+1} = p_n - J_e(p_n)^{-1} e(p_n) \tag{7.6}$$

This is a multivariate version of (7.5).
(Here $J_e(p_n)$ is the Jacobian of $e$ evaluated at $p_n$.)


The iteration starts from some initial guess of the price vector 𝑝0 .
Here, instead of coding the Jacobian by hand, we use the jacobian() function in the autograd library to auto-differentiate and calculate the Jacobian.
With only slight modification, we can generalize our previous attempt to multi-dimensional problems

def newton(f, x_0, tol=1e-5, max_iter=10):
    x = x_0
    q = lambda x: x - np.linalg.solve(jacobian(f)(x), f(x))
    error = tol + 1
    n = 0
    while error > tol:
        n+=1
        if(n > max_iter):
            raise Exception('Max iteration reached without convergence')
        y = q(x)
        if(any(np.isnan(y))):
            raise Exception('Solution not found with NaN generated')
        error = np.linalg.norm(x - y)
        x = y
        print(f'iteration {n}, error = {error:.5f}')
    print('\n' + f'Result = {x} \n')
    return x

def e(p, A, b, c):
    return np.exp(- np.dot(A, p)) + c - b * np.sqrt(p)

We find the algorithm terminates in 4 steps

%%time
p = newton(lambda p: e(p, A, b, c), init_p)

iteration 1, error = 0.62515


iteration 2, error = 0.11152
iteration 3, error = 0.00258
iteration 4, error = 0.00000

Result = [1.57080182 1.46928838]

CPU times: user 4.86 ms, sys: 394 µs, total: 5.25 ms
Wall time: 3.62 ms

np.max(np.abs(e(p, A, b, c)))

1.4632739464559563e-13

The result is very accurate.


With the larger overhead, the speed is not better than the optimized scipy function.
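For readers without autograd, the update (7.6) can also be sketched with an analytic Jacobian and plain NumPy. The version below is a minimal illustration for the two-good market, not the lecture's implementation: the helper `J_e`, the stopping tolerance and the iteration cap are assumptions.

```python
import numpy as np

A = np.array([[0.5, 0.4],
              [0.8, 0.2]])
b = np.ones(2)
c = np.ones(2)

def e(p):
    return np.exp(-A @ p) + c - b * np.sqrt(p)

def J_e(p):
    # Row i is the gradient of e_i with respect to (p_0, p_1)
    return -np.exp(-A @ p)[:, None] * A - np.diag(b / (2 * np.sqrt(p)))

p = np.ones(2)                      # same initial guess as in the text
for n in range(1, 21):
    # Solve J_e(p) d = e(p) rather than forming the inverse explicitly
    d = np.linalg.solve(J_e(p), e(p))
    p = p - d
    if np.linalg.norm(d) < 1e-10:
        break

print(n, p, np.max(np.abs(e(p))))
```

Solving the linear system with `np.linalg.solve` avoids computing $J_e(p_n)^{-1}$, which is both cheaper and numerically safer than explicit inversion.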


7.4.2 A High-Dimensional Problem

Our next step is to investigate a large market with 3,000 goods.


A JAX version of this section using GPU accelerated linear algebra and automatic differentiation is available here
The excess demand function is essentially the same, but now the matrix 𝐴 is 3000 × 3000 and the parameter vectors 𝑏
and 𝑐 are 3000 × 1.

dim = 3000
np.random.seed(123)

# Create a random matrix A and normalize so that each column sums to one
A = np.random.rand(dim, dim)
A = np.asarray(A)
s = np.sum(A, axis=0)
A = A / s

# Set up b and c
b = np.ones(dim)
c = np.ones(dim)

Here’s our initial condition

init_p = np.ones(dim)

%%time
p = newton(lambda p: e(p, A, b, c), init_p)

iteration 1, error = 23.22267

iteration 2, error = 3.94538

iteration 3, error = 0.08500

iteration 4, error = 0.00004

iteration 5, error = 0.00000

Result = [1.50185286 1.49865815 1.50028285 ... 1.50875149 1.48724784 1.48577532]

CPU times: user 2min 9s, sys: 1.76 s, total: 2min 11s
Wall time: 32.9 s

np.max(np.abs(e(p, A, b, c)))

6.661338147750939e-16

With the same tolerance, we compare the runtime and accuracy of Newton’s method to SciPy’s root function


%%time
solution = root(lambda p: e(p, A, b, c),
                init_p,
                jac=lambda p: jacobian(e)(p, A, b, c),
                method='hybr',
                tol=1e-5)

CPU times: user 1min 20s, sys: 618 ms, total: 1min 21s
Wall time: 42.7 s

p = solution.x
np.max(np.abs(e(p, A, b, c)))

8.295585953721485e-07
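For this particular excess demand function the Jacobian has a simple closed form, $J_e(p) = -\mathrm{diag}(\exp(-Ap)) A - \mathrm{diag}(b_i / (2\sqrt{p_i}))$, so Newton's method can also be run without automatic differentiation. The sketch below illustrates this under stated assumptions: a much smaller market (`dim = 50`), a generator created with `np.random.default_rng`, and rows of `A` normalized to sum to one. None of these choices come from the lecture, and the timings reported above do not apply to it.

```python
import numpy as np

dim = 50                                  # small, so the sketch runs instantly
rng = np.random.default_rng(123)
A = rng.random((dim, dim))
A = A / A.sum(axis=1, keepdims=True)      # each row sums to one
b = np.ones(dim)
c = np.ones(dim)

def e(p):
    return np.exp(-A @ p) + c - b * np.sqrt(p)

def J_e(p):
    # -diag(exp(-A p)) A - diag(b / (2 sqrt(p)))
    return -np.exp(-A @ p)[:, None] * A - np.diag(b / (2 * np.sqrt(p)))

p = np.ones(dim)
for n in range(1, 51):
    d = np.linalg.solve(J_e(p), e(p))     # Newton step from J_e(p) d = e(p)
    p = p - d
    if np.linalg.norm(d) < 1e-10:
        break

print(n, np.max(np.abs(e(p))))
```

With rows summing to one and $b = c = 1$, every price solves the same scalar equation $e^{-p} + 1 = \sqrt{p}$, so all entries of the solution coincide.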

7.5 Exercises

Exercise 7.5.1
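Consider a three-dimensional extension of the Solow fixed point problem with

$$A = \begin{pmatrix} 2 & 3 & 3 \\ 2 & 4 & 2 \\ 1 & 5 & 1 \end{pmatrix}, \quad s = 0.2, \quad \alpha = 0.5, \quad \delta = 0.8$$

As before the law of motion is

$$k_{t+1} = g(k_t) \quad \text{where} \quad g(k) := sAk^\alpha + (1 - \delta) k$$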

However 𝑘𝑡 is now a 3 × 1 vector.


Solve for the fixed point using Newton’s method with the following initial values:

$$k1_0 = (1, 1, 1), \quad k2_0 = (3, 5, 5), \quad k3_0 = (50, 50, 50)$$

Hint:
• The computation of the fixed point is equivalent to computing $k^*$ such that $g(k^*) - k^* = 0$.
• If you are unsure about your solution, you can start with the solved example:

$$A = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

with $s = 0.3$, $\alpha = 0.3$, and $\delta = 0.4$ and starting value:

$$k_0 = (1, 1, 1)$$

The result should converge to the analytical solution.


Solution to Exercise 7.5.1


Let’s first define the parameters for this problem

A = np.array([[2.0, 3.0, 3.0],
              [2.0, 4.0, 2.0],
              [1.0, 5.0, 1.0]])

s = 0.2
α = 0.5
δ = 0.8

initLs = [np.ones(3),
          np.array([3.0, 5.0, 5.0]),
          np.repeat(50.0, 3)]

Then define the multivariate version of the law of motion for capital

def multivariate_solow(k, A=A, s=s, α=α, δ=δ):
    return (s * np.dot(A, k**α) + (1 - δ) * k)

Let’s run through each starting value and see the output

attempt = 1
for init in initLs:
    print(f'Attempt {attempt}: Starting value is {init} \n')
    %time k = newton(lambda k: multivariate_solow(k) - k, \
                     init)
    print('-'*64)
    attempt += 1

Attempt 1: Starting value is [1. 1. 1.]

iteration 1, error = 50.49630


iteration 2, error = 41.10937
iteration 3, error = 4.29413
iteration 4, error = 0.38543
iteration 5, error = 0.00544
iteration 6, error = 0.00000

Result = [3.84058108 3.87071771 3.41091933]

CPU times: user 26.8 ms, sys: 137 µs, total: 27 ms


Wall time: 6.33 ms
----------------------------------------------------------------
Attempt 2: Starting value is [3. 5. 5.]

iteration 1, error = 2.07011


iteration 2, error = 0.12642
iteration 3, error = 0.00060
iteration 4, error = 0.00000

Result = [3.84058108 3.87071771 3.41091933]


CPU times: user 17.3 ms, sys: 17 µs, total: 17.3 ms
Wall time: 4.06 ms
----------------------------------------------------------------
Attempt 3: Starting value is [50. 50. 50.]

iteration 1, error = 73.00943


iteration 2, error = 6.49379
iteration 3, error = 0.68070
iteration 4, error = 0.01620
iteration 5, error = 0.00001
iteration 6, error = 0.00000

Result = [3.84058108 3.87071771 3.41091933]

CPU times: user 25.5 ms, sys: 0 ns, total: 25.5 ms


Wall time: 5.97 ms
----------------------------------------------------------------

We find that the result is the same for all three starting values, which reflects the fact that this problem is well behaved.
But the number of iterations it takes to converge depends on the starting values.
Let’s substitute the output back into the formula to check our last result

multivariate_solow(k) - k

array([ 0.0000000e+00, -4.4408921e-16, 8.8817842e-16])

Note the error is very small.


We can also test our results on the known solution

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])

s = 0.3
α = 0.3
δ = 0.4

init = np.repeat(1.0, 3)

%time k = newton(lambda k: multivariate_solow(k, A=A, s=s, α=α, δ=δ) - k, \
                 init)

iteration 1, error = 1.57459


iteration 2, error = 0.21345
iteration 3, error = 0.00205
iteration 4, error = 0.00000

Result = [1.78467418 1.78467418 1.78467418]

CPU times: user 14 ms, sys: 4.03 ms, total: 18 ms


Wall time: 4.22 ms


The result is very close to the ground truth but still slightly different.

%time k = newton(lambda k: multivariate_solow(k, A=A, s=s, α=α, δ=δ) - k, \
                 init,\
                 tol=1e-7)

iteration 1, error = 1.57459


iteration 2, error = 0.21345
iteration 3, error = 0.00205
iteration 4, error = 0.00000
iteration 5, error = 0.00000

Result = [1.78467418 1.78467418 1.78467418]

CPU times: user 22.3 ms, sys: 79 µs, total: 22.3 ms


Wall time: 5.23 ms

We can see it steps towards a more accurate solution.
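For the diagonal example the ground truth can be written down explicitly, because each component solves $s a_{ii} k^\alpha = \delta k$ and hence $k^* = (s a_{ii} / \delta)^{1/(1-\alpha)}$. The sketch below computes this value and checks the fixed-point condition; the variable names are illustrative.

```python
import numpy as np

s, α, δ = 0.3, 0.3, 0.4
A = 2.0 * np.eye(3)

# With a diagonal A each component solves s * a_ii * k^α = δ * k
k_star = (s * np.diag(A) / δ)**(1 / (1 - α))

# Residual of the fixed-point condition g(k) - k = 0 at the closed-form solution
residual = s * A @ k_star**α + (1 - δ) * k_star - k_star
print(k_star, np.max(np.abs(residual)))
```

The printed vector can be compared with the `Result` lines above.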

Exercise 7.5.2
In this exercise, let’s try different initial values and check how Newton’s method responds to different starting points.
Let’s define a three-good problem with the following default values:

$$A = \begin{pmatrix} 0.2 & 0.1 & 0.7 \\ 0.3 & 0.2 & 0.5 \\ 0.1 & 0.8 & 0.1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad c = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

For this exercise, use the following extreme price vectors as initial values:

$$p1_0 = (5, 5, 5), \quad p2_0 = (1, 1, 1), \quad p3_0 = (4.5, 0.1, 4)$$

Set the tolerance to 1e-15 for more accurate output.

Solution to Exercise 7.5.2


Define parameters and initial values

A = np.array([
    [0.2, 0.1, 0.7],
    [0.3, 0.2, 0.5],
    [0.1, 0.8, 0.1]
])

b = np.array([1.0, 1.0, 1.0])
c = np.array([1.0, 1.0, 1.0])

initLs = [np.repeat(5.0, 3),
          np.ones(3),
          np.array([4.5, 0.1, 4.0])]

Let’s run through each initial guess and check the output


attempt = 1
for init in initLs:
    print(f'Attempt {attempt}: Starting value is {init} \n')
    %time p = newton(lambda p: e(p, A, b, c), \
                     init, \
                     tol=1e-15, \
                     max_iter=15)
    print('-'*64)
    attempt += 1

Attempt 1: Starting value is [5. 5. 5.]

iteration 1, error = 9.24381

/opt/conda/envs/quantecon/lib/python3.11/site-packages/autograd/tracer.py:48: RuntimeWarning: invalid value encountered in sqrt
  return f_raw(*args, **kwargs)
/opt/conda/envs/quantecon/lib/python3.11/site-packages/autograd/numpy/numpy_vjps.py:99: RuntimeWarning: invalid value encountered in power
  defvjp(anp.sqrt, lambda ans, x : lambda g: g * 0.5 * x**-0.5)

---------------------------------------------------------------------------
Exception Traceback (most recent call last)
File <timed exec>:1

Cell In[34], line 12, in newton(f, x_0, tol, max_iter)


10 y = q(x)
11 if(any(np.isnan(y))):
---> 12 raise Exception('Solution not found with NaN generated')
13 error = np.linalg.norm(x - y)
14 x = y

Exception: Solution not found with NaN generated

----------------------------------------------------------------
Attempt 2: Starting value is [1. 1. 1.]

iteration 1, error = 0.73419


iteration 2, error = 0.12472
iteration 3, error = 0.00269
iteration 4, error = 0.00000
iteration 5, error = 0.00000
iteration 6, error = 0.00000

Result = [1.49744442 1.49744442 1.49744442]

CPU times: user 5.71 ms, sys: 0 ns, total: 5.71 ms


Wall time: 4.85 ms
----------------------------------------------------------------
Attempt 3: Starting value is [4.5 0.1 4. ]

iteration 1, error = 4.89202


iteration 2, error = 1.21206
iteration 3, error = 0.69421
iteration 4, error = 0.16895
iteration 5, error = 0.00521
iteration 6, error = 0.00000
iteration 7, error = 0.00000
iteration 8, error = 0.00000

Result = [1.49744442 1.49744442 1.49744442]

CPU times: user 7.05 ms, sys: 19 µs, total: 7.07 ms


Wall time: 6.03 ms
----------------------------------------------------------------

We see that Newton’s method may fail for some starting values.
Sometimes it may take a few initial guesses to achieve convergence.
Substitute the result back into the formula to check our result

e(p, A, b, c)

array([ 0.00000000e+00, 0.00000000e+00, -2.22044605e-16])

We can see the result is very accurate.

Part II

Elementary Statistics

Chapter 8. Elementary Probability with Matrices

This lecture uses matrix algebra to illustrate some basic ideas about probability theory.
After providing somewhat informal definitions of the underlying objects, we’ll use matrices and vectors to describe
probability distributions.
Concepts that we’ll be studying include
• a joint probability distribution
• marginal distributions associated with a given joint distribution
• conditional probability distributions
• statistical independence of two random variables
• joint distributions associated with a prescribed set of marginal distributions
– couplings
– copulas
• the probability distribution of a sum of two independent random variables
– convolution of marginal distributions
• parameters that define a probability distribution
• sufficient statistics as data summaries
We’ll use a matrix to represent a bivariate probability distribution and a vector to represent a univariate probability distribution.
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install prettytable

As usual, we’ll start with some imports

import numpy as np
import matplotlib.pyplot as plt
import prettytable as pt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('retina')


8.1 Sketch of Basic Concepts

We’ll briefly define what we mean by a probability space, a probability measure, and a random variable.
For most of this lecture, we sweep these objects into the background, but they are there underlying the other objects that
we’ll mainly focus on.
Let Ω be a set of possible underlying outcomes and let 𝜔 ∈ Ω be a particular underlying outcome.
Let 𝒢 ⊂ Ω be a subset of Ω.
Let ℱ be a collection of such subsets 𝒢 ⊂ Ω.
The pair Ω, ℱ forms our probability space on which we want to put a probability measure.
A probability measure 𝜇 maps a set of possible underlying outcomes 𝒢 ∈ ℱ into a scalar number between 0 and 1
• this is the “probability” that 𝑋 belongs to 𝐴, denoted by Prob{𝑋 ∈ 𝐴}.
A random variable 𝑋(𝜔) is a function of the underlying outcome 𝜔 ∈ Ω.
The random variable 𝑋(𝜔) has a probability distribution that is induced by the underlying probability measure 𝜇 and the function 𝑋(𝜔):

$$\textrm{Prob}(X \in A) = \int_{\mathcal{G}} \mu(\omega) \, d\omega \tag{8.1}$$

where 𝒢 is the subset of Ω for which 𝑋(𝜔) ∈ 𝐴.


We call this the induced probability distribution of random variable 𝑋.

8.2 What Does Probability Mean?

Before diving in, we’ll say a few words about what probability theory means and how it connects to statistics.
We also touch on these topics in the quantecon lectures https://python.quantecon.org/prob_meaning.html and https://python.quantecon.org/navy_captain.html.
For much of this lecture we’ll be discussing fixed “population” probabilities.
These are purely mathematical objects.
To appreciate how statisticians connect probabilities to data, the key is to understand the following concepts:
• A single draw from a probability distribution
• Repeated independently and identically distributed (i.i.d.) draws of “samples” or “realizations” from the same
probability distribution
• A statistic defined as a function of a sequence of samples
• An empirical distribution or histogram (a binned empirical distribution) that records observed relative frequencies
• The idea that a population probability distribution is what we anticipate relative frequencies will be in a long
sequence of i.i.d. draws. Here the following mathematical machinery makes precise what is meant by anticipated
relative frequencies
– Law of Large Numbers (LLN)
– Central Limit Theorem (CLT)


Scalar example
Let $X$ be a scalar random variable that takes on the $I$ possible values $0, 1, 2, \ldots, I-1$ with probabilities

$$\textrm{Prob}(X = i) = f_i,$$

where

$$f_i \geq 0, \quad \sum_i f_i = 1.$$

We sometimes write

$$X \sim \{f_i\}_{i=0}^{I-1}$$

as a short-hand way of saying that the random variable $X$ is described by the probability distribution $\{f_i\}_{i=0}^{I-1}$.
Consider drawing a sample $x_0, x_1, \ldots, x_{N-1}$ of $N$ independent and identically distributed draws of $X$.
What do “identical” and “independent” mean in IID or iid (“identically and independently distributed”)?
• “identical” means that each draw is from the same distribution.
• “independent” means that the joint distribution equals the product of the marginal distributions, i.e.,

$$\begin{aligned} \textrm{Prob}\{x_0 = i_0, x_1 = i_1, \ldots, x_{N-1} = i_{N-1}\} &= \textrm{Prob}\{x_0 = i_0\} \cdots \textrm{Prob}\{x_{N-1} = i_{N-1}\} \\ &= f_{i_0} f_{i_1} \cdots f_{i_{N-1}} \end{aligned}$$

We define an empirical distribution as follows.
For each $i = 0, \ldots, I-1$, let

$$\begin{aligned} N_i &= \text{number of times } X = i, \\ N &= \sum_{i=0}^{I-1} N_i \quad \text{total number of draws}, \\ \tilde f_i &= \frac{N_i}{N} \sim \text{frequency of draws for which } X = i \end{aligned}$$

Key ideas that justify connecting probability theory with statistics are laws of large numbers and central limit theorems.
LLN:
• A Law of Large Numbers (LLN) states that $\tilde f_i \to f_i$ as $N \to \infty$
CLT:
• A Central Limit Theorem (CLT) describes a rate at which $\tilde f_i \to f_i$
Remarks
• For “frequentist” statisticians, anticipated relative frequency is all that a probability distribution means.
• But for a Bayesian it means something more or different.
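The link between population probabilities and relative frequencies can be illustrated by simulation. The sketch below draws i.i.d. samples from a hypothetical distribution $f = (0.1, 0.2, 0.3, 0.4)$ on $\{0, 1, 2, 3\}$ and computes the empirical frequencies $\tilde f_i = N_i / N$ for increasing $N$. The distribution, the seed and the sample sizes are assumptions chosen for illustration.

```python
import numpy as np

f = np.array([0.1, 0.2, 0.3, 0.4])       # hypothetical distribution on {0, 1, 2, 3}
rng = np.random.default_rng(0)

for N in (100, 10_000, 1_000_000):
    draws = rng.choice(len(f), size=N, p=f)              # N i.i.d. draws of X
    f_tilde = np.bincount(draws, minlength=len(f)) / N   # relative frequencies N_i / N
    print(N, f_tilde, np.max(np.abs(f_tilde - f)))
```

The largest gap between $\tilde f_i$ and $f_i$ should shrink as $N$ grows, in line with the LLN.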

8.3 Representing Probability Distributions

A probability distribution Prob(𝑋 ∈ 𝐴) can be described by its cumulative distribution function (CDF)

𝐹𝑋 (𝑥) = Prob{𝑋 ≤ 𝑥}.


Sometimes, but not always, a random variable can also be described by a density function $f(x)$ that is related to its CDF by

$$\textrm{Prob}\{X \in B\} = \int_{t \in B} f(t) \, dt$$

$$F(x) = \int_{-\infty}^{x} f(t) \, dt$$
Here 𝐵 is a set of possible 𝑋’s whose probability we want to compute.
When a probability density exists, a probability distribution can be characterized either by its CDF or by its density.
For a discrete-valued random variable
• the number of possible values of 𝑋 is finite or countably infinite
• we replace a density with a probability mass function, a non-negative sequence that sums to one
• we replace integration with summation in the formula like (8.1) that relates a CDF to a probability mass function
In this lecture, we mostly discuss discrete random variables.
Doing this enables us to confine our tool set basically to linear algebra.
Later we’ll briefly discuss how to approximate a continuous random variable with a discrete random variable.

8.4 Univariate Probability Distributions

We’ll devote most of this lecture to discrete-valued random variables, but we’ll say a few things about continuous-valued
random variables.

8.4.1 Discrete random variable

Let $X$ be a discrete random variable that takes possible values: $i = 0, 1, \ldots, I-1 = \bar{X}$.


Here, we choose the maximum index 𝐼 − 1 because of how this aligns nicely with Python’s index convention.
Define $f_i \equiv \textrm{Prob}\{X = i\}$ and assemble the non-negative vector

$$f = \begin{bmatrix} f_0 \\ f_1 \\ \vdots \\ f_{I-1} \end{bmatrix} \tag{8.2}$$

for which $f_i \in [0, 1]$ for each $i$ and $\sum_{i=0}^{I-1} f_i = 1$.
This vector defines a probability mass function.
The distribution (8.2) has parameters $\{f_i\}_{i=0,1,\ldots,I-2}$ since $f_{I-1} = 1 - \sum_{i=0}^{I-2} f_i$.
These parameters pin down the shape of the distribution.
(Sometimes $I = \infty$.)
Such a “non-parametric” distribution has as many “parameters” as there are possible values of the random variable.
We often work with special distributions that are characterized by a small number of parameters.


In these special parametric distributions,

𝑓𝑖 = 𝑔(𝑖; 𝜃)

where 𝜃 is a vector of parameters that is of much smaller dimension than 𝐼.


Remarks:
• The concept of parameter is intimately related to the notion of sufficient statistic.
• Sufficient statistics are nonlinear functions of a data set.
• Sufficient statistics are designed to summarize all information about parameters that is contained in a data set.
• They are important tools that AI uses to summarize a big data set
• R. A. Fisher provided a rigorous definition of information – see https://en.wikipedia.org/wiki/Fisher_information
An example of a parametric probability distribution is a geometric distribution.
It is described by

$$f_i = \textrm{Prob}\{X = i\} = (1 - \lambda) \lambda^i, \quad \lambda \in (0, 1), \quad i = 0, 1, 2, \ldots$$

Evidently, $\sum_{i=0}^{\infty} f_i = 1$.
Let $\theta$ be a vector of parameters of the distribution described by $f$, then

$$f_i(\theta) \geq 0, \quad \sum_{i=0}^{\infty} f_i(\theta) = 1$$

8.4.2 Continuous random variable

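Let $X$ be a continuous random variable that takes values $X \in \tilde{X} \equiv [X_L, X_U]$ and whose distribution has parameters $\theta$.

$$\textrm{Prob}\{X \in A\} = \int_{x \in A} f(x; \theta) \, dx; \quad f(x; \theta) \geq 0$$

where $A$ is a subset of $\tilde{X}$ and

$$\textrm{Prob}\{X \in \tilde{X}\} = 1$$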

8.5 Bivariate Probability Distributions

We’ll now discuss a bivariate joint distribution.


To begin, we restrict ourselves to two discrete random variables.
Let 𝑋, 𝑌 be two discrete random variables that take values:

𝑋 ∈ {0, … , 𝐼 − 1}

𝑌 ∈ {0, … , 𝐽 − 1}
Then their joint distribution is described by a matrix

$$F_{I \times J} = [f_{ij}]_{i \in \{0, \ldots, I-1\}, j \in \{0, \ldots, J-1\}}$$

whose elements are

$$f_{ij} = \textrm{Prob}\{X = i, Y = j\} \geq 0$$

where

$$\sum_i \sum_j f_{ij} = 1$$

8.6 Marginal Probability Distributions

The joint distribution induces marginal distributions

$$\textrm{Prob}\{X = i\} = \sum_{j=0}^{J-1} f_{ij} = \mu_i, \quad i = 0, \ldots, I-1$$

$$\textrm{Prob}\{Y = j\} = \sum_{i=0}^{I-1} f_{ij} = \nu_j, \quad j = 0, \ldots, J-1$$

For example, let a joint distribution over $(X, Y)$ be

$$F = \begin{bmatrix} .25 & .1 \\ .15 & .5 \end{bmatrix} \tag{8.3}$$

The implied marginal distributions are:

Prob{𝑋 = 0} = .25 + .1 = .35


Prob{𝑋 = 1} = .15 + .5 = .65
Prob{𝑌 = 0} = .25 + .15 = .4
Prob{𝑌 = 1} = .1 + .5 = .6
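In matrix terms, the marginal distributions are row sums and column sums of $F$. A minimal NumPy sketch for the joint distribution (8.3), with variable names chosen to mirror $\mu$ and $\nu$:

```python
import numpy as np

F = np.array([[0.25, 0.10],
              [0.15, 0.50]])

μ = F.sum(axis=1)   # marginal of X: sum over j (across columns)
ν = F.sum(axis=0)   # marginal of Y: sum over i (down rows)
print(μ, ν)
```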

Digression: If two random variables 𝑋, 𝑌 are continuous and have joint density 𝑓(𝑥, 𝑦), then marginal distributions can
be computed by

𝑓(𝑥) = ∫ 𝑓(𝑥, 𝑦)𝑑𝑦


𝑓(𝑦) = ∫ 𝑓(𝑥, 𝑦)𝑑𝑥


8.7 Conditional Probability Distributions

Conditional probabilities are defined according to

$$\textrm{Prob}\{A \mid B\} = \frac{\textrm{Prob}\{A \cap B\}}{\textrm{Prob}\{B\}}$$

where $A, B$ are two events.
For a pair of discrete random variables, we have the conditional distribution

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{f_{ij}}{\sum_i f_{ij}} = \frac{\textrm{Prob}\{X = i, Y = j\}}{\textrm{Prob}\{Y = j\}}$$

where $i = 0, \ldots, I-1$, $j = 0, \ldots, J-1$.
Note that

$$\sum_i \textrm{Prob}\{X = i \mid Y = j\} = \frac{\sum_i f_{ij}}{\sum_i f_{ij}} = 1$$

Remark: The mathematics of conditional probability implies Bayes’ Law:

$$\textrm{Prob}\{X = i \mid Y = j\} = \frac{\textrm{Prob}\{X = i, Y = j\}}{\textrm{Prob}\{Y = j\}} = \frac{\textrm{Prob}\{Y = j \mid X = i\} \, \textrm{Prob}\{X = i\}}{\textrm{Prob}\{Y = j\}}$$

For the joint distribution (8.3)

$$\textrm{Prob}\{X = 0 \mid Y = 1\} = \frac{.1}{.1 + .5} = \frac{.1}{.6}$$
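Conditional distributions can be computed from the joint matrix by normalizing its columns or rows. The sketch below does this for (8.3) and also checks Bayes' Law numerically; the array names are illustrative.

```python
import numpy as np

F = np.array([[0.25, 0.10],
              [0.15, 0.50]])

# Column j of F divided by its sum gives Prob{X = i | Y = j}
X_given_Y = F / F.sum(axis=0, keepdims=True)
# Row i of F divided by its sum gives Prob{Y = j | X = i}
Y_given_X = F / F.sum(axis=1, keepdims=True)

# Bayes' Law: Prob{X = i | Y = j} = Prob{Y = j | X = i} Prob{X = i} / Prob{Y = j}
μ, ν = F.sum(axis=1), F.sum(axis=0)
bayes = Y_given_X * μ[:, None] / ν[None, :]

print(X_given_Y[0, 1])   # Prob{X = 0 | Y = 1} = .1 / .6
```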

8.8 Statistical Independence

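Random variables $X$ and $Y$ are statistically independent if

$$\textrm{Prob}\{X = i, Y = j\} = f_i g_j$$

where

$$\begin{aligned} \textrm{Prob}\{X = i\} &= f_i \geq 0, \quad \sum_i f_i = 1 \\ \textrm{Prob}\{Y = j\} &= g_j \geq 0, \quad \sum_j g_j = 1 \end{aligned}$$

Conditional distributions are

$$\begin{aligned} \textrm{Prob}\{X = i \mid Y = j\} &= \frac{f_i g_j}{\sum_i f_i g_j} = \frac{f_i g_j}{g_j} = f_i \\ \textrm{Prob}\{Y = j \mid X = i\} &= \frac{f_i g_j}{\sum_j f_i g_j} = \frac{f_i g_j}{f_i} = g_j \end{aligned}$$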
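Under independence the joint matrix is the outer product of the two marginal vectors. The sketch below builds such a matrix from illustrative marginals (the values are assumptions, chosen to equal the marginals of (8.3)), confirms that conditioning on $Y$ leaves the distribution of $X$ unchanged, and shows that (8.3) itself is not an independent joint distribution.

```python
import numpy as np

f = np.array([0.35, 0.65])   # marginal of X (illustrative values)
g = np.array([0.4, 0.6])     # marginal of Y (illustrative values)

F_indep = np.outer(f, g)     # f_ij = f_i * g_j
X_given_Y = F_indep / F_indep.sum(axis=0, keepdims=True)
print(X_given_Y)             # every column equals f

# The joint distribution (8.3) has the same marginals but is not their outer product
F = np.array([[0.25, 0.10],
              [0.15, 0.50]])
is_independent = np.allclose(F, np.outer(F.sum(axis=1), F.sum(axis=0)))
print(is_independent)
```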

8.9 Means and Variances

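The mean and variance of a discrete random variable $X$ are

$$\mu_X \equiv \mathbb{E}[X] = \sum_k k \, \textrm{Prob}\{X = k\}$$

$$\sigma_X^2 \equiv \mathbb{D}[X] = \sum_k (k - \mathbb{E}[X])^2 \, \textrm{Prob}\{X = k\}$$

A continuous random variable having density $f_X(x)$ has mean and variance

$$\mu_X \equiv \mathbb{E}[X] = \int_{-\infty}^{\infty} x f_X(x) \, dx$$

$$\sigma_X^2 \equiv \mathbb{D}[X] = \mathbb{E}[(X - \mu_X)^2] = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x) \, dx$$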
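For a discrete random variable these formulas are inner products between the probability vector and functions of the possible values. A minimal sketch, using a hypothetical distribution on $\{0, 1, 2, 3\}$:

```python
import numpy as np

values = np.arange(4)                    # possible values 0, 1, 2, 3
f = np.array([0.1, 0.2, 0.3, 0.4])       # hypothetical probabilities

mean = values @ f                        # sum_k k Prob{X = k}
var = (values - mean)**2 @ f             # sum_k (k - E[X])^2 Prob{X = k}
print(mean, var)
```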


8.10 Generating Random Numbers

Suppose we have at our disposal a pseudo random number generator that draws a uniform random variable, i.e., one with probability distribution

$$\textrm{Prob}\{\tilde{X} = i\} = \frac{1}{I}, \quad i = 0, \ldots, I-1$$

How can we transform $\tilde{X}$ to get a random variable $X$ for which $\textrm{Prob}\{X = i\} = f_i$, $i = 0, \ldots, I-1$, where $f_i$ is an arbitrary discrete probability distribution on $i = 0, 1, \ldots, I-1$?
The key tool is the inverse of a cumulative distribution function (CDF).
Observe that the CDF of a distribution is monotone and non-decreasing, taking values between 0 and 1.
We can draw a sample of a random variable 𝑋 with a known CDF as follows:
• draw a random variable 𝑢 from a uniform distribution on [0, 1]
• pass the sample value of 𝑢 into the “inverse” target CDF for 𝑋
• 𝑋 has the target CDF
Thus, knowing the “inverse” CDF of a distribution is enough to simulate from this distribution.

Note: The “inverse” CDF needs to exist for this method to work.

The inverse CDF is

$$F^{-1}(u) \equiv \inf \{x \in \mathbb{R} : F(x) \geq u\} \quad (0 < u < 1)$$

Here we use infimum because a CDF is a non-decreasing and right-continuous function.
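For a discrete distribution the infimum in this definition is simply the smallest index whose cumulative probability is at least $u$, which a sorted search can find. The sketch below is one way to do this with NumPy for a hypothetical target distribution; the distribution, the seed and the sample size are assumptions.

```python
import numpy as np

f = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical target distribution on {0, 1, 2, 3}
cdf = np.cumsum(f)
cdf[-1] = 1.0                        # guard against rounding in the cumulative sum

rng = np.random.default_rng(0)
u = rng.random(1_000_000)            # uniform draws on [0, 1)

# Smallest index i with cdf[i] >= u, i.e. the generalized inverse CDF evaluated at u
x = np.searchsorted(cdf, u, side='left')

freq = np.bincount(x, minlength=len(f)) / len(u)
print(freq)
```

The relative frequencies in `freq` should be close to `f`.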


Thus, suppose that
• 𝑈 is a uniform random variable 𝑈 ∈ [0, 1]
• We want to sample a random variable 𝑋 whose CDF is 𝐹 .
It turns out that if we draw uniform random numbers $U$ and then compute $X$ from

$$X = F^{-1}(U),$$

then $X$ is a random variable with CDF $F_X(x) = F(x) = \textrm{Prob}\{X \leq x\}$.
We’ll verify this in the special case in which $F$ is continuous and bijective so that its inverse function exists and can be denoted by $F^{-1}$.
Note that

$$\begin{aligned} F_X(x) &= \textrm{Prob}\{X \leq x\} \\ &= \textrm{Prob}\{F^{-1}(U) \leq x\} \\ &= \textrm{Prob}\{U \leq F(x)\} \\ &= F(x) \end{aligned}$$

where the last equality occurs because $U$ is distributed uniformly on $[0, 1]$ while $F(x)$ is a constant given $x$ that also lies on $[0, 1]$.
Let’s use numpy to compute some examples.
Example: A continuous geometric (exponential) distribution

Let $X$ follow an exponential distribution with parameter $\lambda > 0$.
Its density function is

$$f(x) = \lambda e^{-\lambda x}$$

Its CDF is

$$F(x) = \int_0^x \lambda e^{-\lambda t} \, dt = 1 - e^{-\lambda x}$$

Let $U$ follow a uniform distribution on $[0, 1]$.
$X$ is a random variable such that $U = F(X)$.
The distribution of $X$ can be deduced from

$$\begin{aligned} U &= F(X) = 1 - e^{-\lambda X} \\ \implies 1 - U &= e^{-\lambda X} \\ \implies \log(1 - U) &= -\lambda X \\ \implies X &= \frac{\log(1 - U)}{-\lambda} \end{aligned}$$

Let’s draw $u$ from $U[0, 1]$ and calculate $x = \frac{\log(1 - u)}{-\lambda}$.
We’ll check whether 𝑋 seems to follow a continuous geometric (exponential) distribution.
Let’s check with numpy.

n, λ = 1_000_000, 0.3

# draw uniform numbers
u = np.random.rand(n)

# transform
x = -np.log(1-u)/λ

# draw from NumPy's exponential distribution for comparison
x_g = np.random.exponential(1 / λ, n)

# plot and compare
plt.hist(x, bins=100, density=True)
plt.show()

plt.hist(x_g, bins=100, density=True, alpha=0.6)
plt.show()

Geometric distribution
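Let 𝑋 be distributed geometrically, that is

$$ \textrm{Prob}(X = i) = (1 - \lambda)\lambda^i, \quad \lambda \in (0, 1), \quad i = 0, 1, \ldots $$

$$ \sum_{i=0}^{\infty} \textrm{Prob}(X = i) = 1 \longleftrightarrow (1 - \lambda) \sum_{i=0}^{\infty} \lambda^i = \frac{1 - \lambda}{1 - \lambda} = 1 $$

Its CDF is given by

$$
\begin{aligned}
\textrm{Prob}(X \le i) &= (1 - \lambda) \sum_{j=0}^{i} \lambda^j \\
&= (1 - \lambda) \left[ \frac{1 - \lambda^{i+1}}{1 - \lambda} \right] \\
&= 1 - \lambda^{i+1} \\
&= F(i) \equiv F_i
\end{aligned}
$$

Again, let 𝑈̃ follow a uniform distribution; we want to find 𝑋 such that 𝐹(𝑋) = 𝑈̃.
Let’s deduce the distribution of 𝑋 from

$$
\begin{aligned}
\tilde{U} &= F(X) = 1 - \lambda^{x+1} \\
1 - \tilde{U} &= \lambda^{x+1} \\
\log(1 - \tilde{U}) &= (x + 1) \log \lambda \\
\frac{\log(1 - \tilde{U})}{\log \lambda} &= x + 1 \\
\frac{\log(1 - \tilde{U})}{\log \lambda} - 1 &= x
\end{aligned}
$$

However, the value 𝑥 = 𝐹⁻¹(𝑈̃) obtained in this way may not be an integer.

So let

$$ x = \left\lceil \frac{\log(1 - \tilde{U})}{\log \lambda} - 1 \right\rceil $$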

where ⌈.⌉ is the ceiling function.


Thus 𝑥 is the smallest integer such that the discrete geometric CDF is greater than or equal to 𝑈̃ .
We can verify that 𝑥 is indeed geometrically distributed by the following numpy program.

Note: The exponential distribution is the continuous analog of the geometric distribution.

n, λ = 1_000_000, 0.8

# draw uniform numbers


u = np.random.rand(n)

# transform
x = np.ceil(np.log(1-u)/np.log(λ) - 1)

# draw from numpy's geometric distribution
# (its support is {1, 2, ...}, so x_g is shifted up by one relative to x)
x_g = np.random.geometric(1-λ, n)

# plot and compare


plt.hist(x, bins=150, density=True)
plt.show()

np.random.geometric(1-λ, n).max()

64

np.log(0.4)/np.log(0.3)

0.7610560044063083

plt.hist(x_g, bins=150, density=True, alpha=0.6)


plt.show()
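As a numerical complement to the histograms, the sketch below compares the relative frequencies of the transformed draws with the probabilities (1 − 𝜆)𝜆^𝑖. It is an illustration rather than part of the original lecture: the seeded generator, the guard against 𝑢 = 0, and the choice to inspect only 𝑖 = 0, …, 5 are assumptions made here. Since numpy's geometric sampler counts trials and so starts at 1, one is subtracted before comparing.

```python
import numpy as np

rng = np.random.default_rng(0)            # seeded generator (an assumption)
n, λ = 1_000_000, 0.8

u = rng.random(n)
x = np.ceil(np.log(1 - u) / np.log(λ) - 1)
x = np.maximum(x, 0).astype(int)          # guard the measure-zero case u == 0

i = np.arange(6)                          # inspect the first few support points only
pmf = (1 - λ) * λ**i                      # Prob(X = i) = (1 - λ) λ^i
freq = np.array([np.mean(x == k) for k in i])

# numpy's sampler has support {1, 2, ...}; shift it to {0, 1, ...}
x_g = rng.geometric(1 - λ, n) - 1
freq_g = np.array([np.mean(x_g == k) for k in i])

print(np.round(pmf, 4))
print(np.round(freq, 4))
print(np.round(freq_g, 4))
```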


8.11 Some Discrete Probability Distributions

Let’s write some Python code to compute means and variances of some univariate random variables.
We’ll use our code to
• compute population means and variances from the probability distribution
• generate a sample of 𝑁 independently and identically distributed draws and compute sample means and variances
• compare population and sample means and variances

8.12 Geometric distribution

$$ \textrm{Prob}(X = k) = (1 - p)^{k-1} p, \quad k = 1, 2, \ldots $$

$$ \mathbb{E}(X) = \frac{1}{p} $$

$$ \mathbb{D}(X) = \frac{1 - p}{p^2} $$

We draw observations from the distribution and compare the sample mean and variance with the theoretical results.


# specify parameters
p, n = 0.3, 1_000_000

# draw observations from the distribution


x = np.random.geometric(p, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)

# compare with theoretical results


print("\nThe population mean is: ", 1/p)
print("The population variance is: ", (1-p)/(p**2))

The sample mean is: 3.33521


The sample variance is: 7.793688255900004

The population mean is: 3.3333333333333335


The population variance is: 7.777777777777778
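The population moments can also be computed directly from the probability mass function rather than from the closed-form expressions. The sketch below does this by truncating the infinite sum; the truncation point of 500 terms is an arbitrary choice made here, and it is harmless because the omitted tail is negligibly small for 𝑝 = 0.3.

```python
import numpy as np

p = 0.3
k = np.arange(1, 500)                 # truncated support (arbitrary cut-off)
pmf = (1 - p)**(k - 1) * p            # Prob(X = k) = (1 - p)^(k-1) p

mean = np.sum(k * pmf)                # sum_k k Prob(X = k)
var = np.sum((k - mean)**2 * pmf)     # sum_k (k - mean)^2 Prob(X = k)

print("mean from pmf:", mean, " closed form:", 1 / p)
print("variance from pmf:", var, " closed form:", (1 - p) / p**2)
```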

8.12.1 Newcomb–Benford distribution

The Newcomb–Benford law fits many data sets, e.g., reports of incomes to tax authorities, in which the leading digit is
more likely to be small than large.
See https://en.wikipedia.org/wiki/Benford’s_law
A Benford probability distribution is
$$ \textrm{Prob}\{X = d\} = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right) $$
where 𝑑 ∈ {1, 2, ⋯ , 9} can be thought of as a first digit in a sequence of digits.
This is a well defined discrete distribution since we can verify that probabilities are nonnegative and sum to 1.
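$$ \log_{10}\left(1 + \frac{1}{d}\right) \ge 0, \quad \sum_{d=1}^{9} \log_{10}\left(1 + \frac{1}{d}\right) = 1 $$

The mean and variance of a Benford distribution are

$$
\begin{aligned}
\mathbb{E}[X] &= \sum_{d=1}^{9} d \log_{10}\left(1 + \frac{1}{d}\right) \simeq 3.4402 \\
\mathbb{V}[X] &= \sum_{d=1}^{9} (d - \mathbb{E}[X])^2 \log_{10}\left(1 + \frac{1}{d}\right) \simeq 6.0565
\end{aligned}
$$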

We verify the above and compute the mean and variance using numpy.

Benford_pmf = np.array([np.log10(1+1/d) for d in range(1,10)])


k = np.array(range(1,10))

# mean
mean = np.sum(Benford_pmf * k)

# variance
var = np.sum([(k-mean)**2 * Benford_pmf])

# verify sum to 1
print(np.sum(Benford_pmf))
print(mean)
print(var)

0.9999999999999999
3.440236967123206
6.056512631375667

# plot distribution
plt.plot(range(1,10), Benford_pmf, 'o')
plt.title('Benford\'s distribution')
plt.show()
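The inverse CDF technique from the section on generating random numbers also applies to this distribution. The sketch below is an illustration added here, not part of the original lecture: it draws first digits by looking up uniform draws in the cumulative probabilities with `np.searchsorted`, then compares sample moments with the population values computed above. The seeded generator and the index guard are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)                 # seeded generator (an assumption)
digits = np.arange(1, 10)
pmf = np.log10(1 + 1 / digits)
cdf = np.cumsum(pmf)

u = rng.random(1_000_000)
idx = np.searchsorted(cdf, u, side='right')    # number of CDF values <= u
idx = np.minimum(idx, 8)                       # guard: cdf[-1] may round just below 1
d = digits[idx]                                # sampled first digits

print("sample mean:", d.mean())
print("sample variance:", d.var())
```

The sample mean and variance should be close to 3.4402 and 6.0565.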


8.12.2 Pascal (negative binomial) distribution

Consider a sequence of independent Bernoulli trials.


Let 𝑝 be the probability of success.
Let 𝑋 be a random variable that represents the number of failures before we get 𝑟 successes.
Its distribution is

$$ X \sim NB(r, p) $$

$$ \textrm{Prob}(X = k; r, p) = \binom{k + r - 1}{r - 1} p^r (1 - p)^k $$

Here, we choose from among 𝑘 + 𝑟 − 1 possible outcomes because the last draw is by definition a success.
We compute the mean and variance to be

$$ \mathbb{E}(X) = \frac{r(1 - p)}{p} $$

$$ \mathbb{V}(X) = \frac{r(1 - p)}{p^2} $$

# specify parameters
r, p, n = 10, 0.3, 1_000_000

# draw observations from the distribution


x = np.random.negative_binomial(r, p, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
print("\nThe population mean is: ", r*(1-p)/p)
print("The population variance is: ", r*(1-p)/p**2)

The sample mean is: 23.309863


The sample variance is: 77.53126792123103

The population mean is: 23.333333333333336


The population variance is: 77.77777777777779
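The population values printed above use 𝑟(1 − 𝑝)/𝑝 and 𝑟(1 − 𝑝)/𝑝². As a cross-check, the sketch below evaluates the probability mass function directly with `math.comb` and sums it over a truncated support. The truncation at 400 terms is an arbitrary choice made for this illustration; the omitted tail is negligible for 𝑟 = 10, 𝑝 = 0.3.

```python
from math import comb

r, p = 10, 0.3
ks = range(400)                                   # truncated support (arbitrary cut-off)

# Prob(X = k) = C(k + r - 1, r - 1) p^r (1 - p)^k
pmf = [comb(k + r - 1, r - 1) * p**r * (1 - p)**k for k in ks]

total = sum(pmf)
mean = sum(k * q for k, q in zip(ks, pmf))
var = sum((k - mean)**2 * q for k, q in zip(ks, pmf))

print("probabilities sum to:", total)
print("mean:", mean, " r(1-p)/p:", r * (1 - p) / p)
print("variance:", var, " r(1-p)/p^2:", r * (1 - p) / p**2)
```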

8.13 Continuous Random Variables

8.13.1 Univariate Gaussian distribution

We write

𝑋 ∼ 𝑁 (𝜇, 𝜎2 )

to indicate the probability distribution


$$ f(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2} (x - \mu)^2 \right] $$

In the example below, we set 𝜇 = 0, 𝜎 = 0.1.


# specify parameters
μ, σ = 0, 0.1

# specify number of draws


n = 1_000_000

# draw observations from the distribution


x = np.random.normal(μ, σ, n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ_hat = np.std(x)

print("The sample mean is: ", μ_hat)


print("The sample standard deviation is: ", σ_hat)

The sample mean is: -2.6670604920545608e-05


The sample standard deviation is: 0.10000647799603994

# compare (the absolute errors should be small)
print(abs(μ - μ_hat) < 1e-3)
print(abs(σ - σ_hat) < 1e-3)

True
True

8.13.2 Uniform Distribution

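$$ X \sim U[a, b] $$

$$
f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}
$$

The population mean and variance are

$$ \mathbb{E}(X) = \frac{a + b}{2} $$

$$ \mathbb{V}(X) = \frac{(b - a)^2}{12} $$
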
# specify parameters
a, b = 10, 20

# specify number of draws


n = 1_000_000

# draw observations from the distribution


x = a + (b-a)*np.random.rand(n)

# compute sample mean and variance


μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)
print("\nThe population mean is: ", (a+b)/2)
print("The population variance is: ", (b-a)**2/12)

The sample mean is: 14.997530408465284


The sample variance is: 8.326682057974937

The population mean is: 15.0


The population variance is: 8.333333333333334

8.14 A Mixed Discrete-Continuous Distribution

We’ll motivate this example with a little story.


Suppose that to apply for a job you take an interview and either pass or fail it.
You have a 5% chance of passing the interview, and you know that, if you pass, your daily salary will be uniformly distributed on the interval [300, 400]; if you fail, it is 0.
We can describe your daily salary as a discrete-continuous variable with the following probabilities:

$$ P(X = 0) = 0.95 $$

$$ P(300 \le X \le 400) = \int_{300}^{400} f(x) \, dx = 0.05 $$

$$ f(x) = 0.0005 $$

Let’s start by generating a random sample and computing sample moments.

x = np.random.rand(1_000_000)
# x[x > 0.95] = 100*x[x > 0.95]+300
x[x > 0.95] = 100*np.random.rand(len(x[x > 0.95]))+300
x[x <= 0.95] = 0

μ_hat = np.mean(x)
σ2_hat = np.var(x)

print("The sample mean is: ", μ_hat, "\nThe sample variance is: ", σ2_hat)

The sample mean is: 17.501194648768948


The sample variance is: 5863.879898691551

The analytical mean and variance can be computed:


$$
\begin{aligned}
\mu &= \int_{300}^{400} x f(x) \, dx \\
&= 0.0005 \int_{300}^{400} x \, dx \\
&= 0.0005 \times \frac{1}{2} x^2 \Big|_{300}^{400}
\end{aligned}
$$

$$
\begin{aligned}
\sigma^2 &= 0.95 \times (0 - 17.5)^2 + \int_{300}^{400} (x - 17.5)^2 f(x) \, dx \\
&= 0.95 \times 17.5^2 + 0.0005 \int_{300}^{400} (x - 17.5)^2 \, dx \\
&= 0.95 \times 17.5^2 + 0.0005 \times \frac{1}{3} (x - 17.5)^3 \Big|_{300}^{400}
\end{aligned}
$$

mean = 0.0005*0.5*(400**2 - 300**2)


var = 0.95*17.5**2+0.0005/3*((400-17.5)**3-(300-17.5)**3)
print("mean: ", mean)
print("variance: ", var)

mean: 17.5
variance: 5860.416666666666
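The same numbers can be reached without integrating, by conditioning on whether the interview is passed. The sketch below is an alternative route added for illustration: it uses the mean and variance of a uniform distribution on [𝑎, 𝑏] together with 𝔼[𝑋²] = Prob(pass) × (variance + mean²), and the variable names are hypothetical.

```python
p_pass = 0.05
a, b = 300, 400

# moments of the salary conditional on passing: uniform on [a, b]
mean_pass = (a + b) / 2
var_pass = (b - a)**2 / 12

# unconditional moments (the salary is 0 with probability 1 - p_pass)
mean = p_pass * mean_pass
second_moment = p_pass * (var_pass + mean_pass**2)
var = second_moment - mean**2

print("mean: ", mean)
print("variance: ", var)
```

Both values agree with the integrals computed above.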

8.15 Matrix Representation of Some Bivariate Distributions

Let’s use matrices to represent a joint distribution, conditional distribution, marginal distribution, and the mean and
variance of a bivariate random variable.
The table below illustrates a probability distribution for a bivariate random variable.

$$ F = [f_{ij}] = \begin{bmatrix} 0.3 & 0.2 \\ 0.1 & 0.4 \end{bmatrix} $$

Marginal distributions are

$$ \textrm{Prob}(X = i) = \sum_j f_{ij} = u_i $$

$$ \textrm{Prob}(Y = j) = \sum_i f_{ij} = v_j $$

Below we draw some samples and confirm that the “sampling” distribution agrees well with the “population” distribution.
Sample results:

# specify parameters
xs = np.array([0, 1])
ys = np.array([10, 20])
f = np.array([[0.3, 0.2], [0.1, 0.4]])
f_cum = np.cumsum(f)

# draw random numbers


p = np.random.rand(1_000_000)
x = np.vstack([xs[1]*np.ones(p.shape), ys[1]*np.ones(p.shape)])
# map to the bivariate distribution

x[0, p < f_cum[2]] = xs[1]


x[1, p < f_cum[2]] = ys[0]

x[0, p < f_cum[1]] = xs[0]


x[1, p < f_cum[1]] = ys[1]

x[0, p < f_cum[0]] = xs[0]
x[1, p < f_cum[0]] = ys[0]
print(x)

[[ 1. 1. 0. ... 1. 0. 0.]
[20. 20. 20. ... 20. 20. 10.]]

Here, we use exactly the inverse CDF technique to generate a sample from the joint distribution 𝐹.

# marginal distribution
xp = np.sum(x[0, :] == xs[0])/1_000_000
yp = np.sum(x[1, :] == ys[0])/1_000_000

# print output
print("marginal distribution for x")
xmtb = pt.PrettyTable()
xmtb.field_names = ['x_value', 'x_prob']
xmtb.add_row([xs[0], xp])
xmtb.add_row([xs[1], 1-xp])
print(xmtb)

print("\nmarginal distribution for y")


ymtb = pt.PrettyTable()
ymtb.field_names = ['y_value', 'y_prob']
ymtb.add_row([ys[0], yp])
ymtb.add_row([ys[1], 1-yp])
print(ymtb)

marginal distribution for x


+---------+---------------------+
| x_value | x_prob |
+---------+---------------------+
| 0 | 0.501237 |
| 1 | 0.49876299999999996 |
+---------+---------------------+

marginal distribution for y


+---------+---------+
| y_value | y_prob |
+---------+---------+
| 10 | 0.40036 |
| 20 | 0.59964 |
+---------+---------+

# conditional distributions
xc1 = x[0, x[1, :] == ys[0]]
xc2 = x[0, x[1, :] == ys[1]]
yc1 = x[1, x[0, :] == xs[0]]
yc2 = x[1, x[0, :] == xs[1]]

xc1p = np.sum(xc1 == xs[0])/len(xc1)


xc2p = np.sum(xc2 == xs[0])/len(xc2)
yc1p = np.sum(yc1 == ys[0])/len(yc1)
yc2p = np.sum(yc2 == ys[0])/len(yc2)

# print output
print("conditional distribution for x")
xctb = pt.PrettyTable()
xctb.field_names = ['y_value', 'prob(x=0)', 'prob(x=1)']
xctb.add_row([ys[0], xc1p, 1-xc1p])
xctb.add_row([ys[1], xc2p, 1-xc2p])
print(xctb)

print("\nconditional distribution for y")


yctb = pt.PrettyTable()
yctb.field_names = ['x_value', 'prob(y=10)', 'prob(y=20)']
yctb.add_row([xs[0], yc1p, 1-yc1p])
yctb.add_row([xs[1], yc2p, 1-yc2p])
print(yctb)

conditional distribution for x


+---------+--------------------+--------------------+
| y_value | prob(x=0) | prob(x=1) |
+---------+--------------------+--------------------+
| 10 | 0.7501148965930663 | 0.2498851034069337 |
| 20 | 0.3350693749583083 | 0.6649306250416918 |
+---------+--------------------+--------------------+

conditional distribution for y


+---------+---------------------+--------------------+
| x_value | prob(y=10) | prob(y=20) |
+---------+---------------------+--------------------+
| 0 | 0.5991497036332114 | 0.4008502963667886 |
| 1 | 0.20058424542317693 | 0.799415754576823 |
+---------+---------------------+--------------------+

Let’s calculate population marginal and conditional probabilities using matrix algebra.
$$
\left[
\begin{array}{c|cc|c}
 & y_1 & y_2 & x \\ \hline
x_1 & 0.3 & 0.2 & 0.5 \\
x_2 & 0.1 & 0.4 & 0.5 \\ \hline
y & 0.4 & 0.6 & 1
\end{array}
\right]
$$

(1) Marginal distribution:

$$
\left[
\begin{array}{c|cc}
var & var_1 & var_2 \\ \hline
x & 0.5 & 0.5 \\ \hline
y & 0.4 & 0.6
\end{array}
\right]
$$

(2) Conditional distribution:

$$
\left[
\begin{array}{c|cc}
x & x_1 & x_2 \\ \hline
y = y_1 & \frac{0.3}{0.4} = 0.75 & \frac{0.1}{0.4} = 0.25 \\ \hline
y = y_2 & \frac{0.2}{0.6} \approx 0.33 & \frac{0.4}{0.6} \approx 0.67
\end{array}
\right]
$$

$$
\left[
\begin{array}{c|cc}
y & y_1 & y_2 \\ \hline
x = x_1 & \frac{0.3}{0.5} = 0.6 & \frac{0.2}{0.5} = 0.4 \\ \hline
x = x_2 & \frac{0.1}{0.5} = 0.2 & \frac{0.4}{0.5} = 0.8
\end{array}
\right]
$$

These population objects closely resemble sample counterparts computed above.
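The population tables above can be reproduced with a few array operations. The following is a minimal sketch using the same joint matrix; the variable names are hypothetical and chosen only for this illustration.

```python
import numpy as np

f = np.array([[0.3, 0.2],
              [0.1, 0.4]])            # f[i, j] = Prob(X = x_i, Y = y_j)

marg_x = f.sum(axis=1)                # sum over j: Prob(X = x_i)
marg_y = f.sum(axis=0)                # sum over i: Prob(Y = y_j)

# conditional distributions; each row is a probability distribution
x_given_y = (f / marg_y).T            # row j: distribution of X given Y = y_j
y_given_x = f / marg_x[:, None]       # row i: distribution of Y given X = x_i

print(marg_x, marg_y)
print(x_given_y)
print(y_given_x)
```

Dividing by `marg_y` relies on broadcasting along columns, while `marg_x[:, None]` turns the row sums into a column so that each row of `f` is scaled by its own total.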
Let’s wrap some of the functions we have used in a Python class for a general discrete bivariate joint distribution.

class discrete_bijoint:

    def __init__(self, f, xs, ys):
        '''initialization
        -----------------
        parameters:
        f: the bivariate joint probability matrix
        xs: values of x vector
        ys: values of y vector
        '''
        self.f, self.xs, self.ys = f, xs, ys

    def joint_tb(self):
        '''print the joint distribution table'''
        xs = self.xs
        ys = self.ys
        f = self.f
        jtb = pt.PrettyTable()
        jtb.field_names = ['x_value/y_value', *ys, 'marginal sum for x']
        for i in range(len(xs)):
            jtb.add_row([xs[i], *f[i, :], np.sum(f[i, :])])
        jtb.add_row(['marginal_sum for y', *np.sum(f, 0), np.sum(f)])
        print("\nThe joint probability distribution for x and y\n", jtb)
        self.jtb = jtb

    def draw(self, n):
        '''draw random numbers
        ----------------------
        parameters:
        n: number of random numbers to draw
        '''
        xs = self.xs
        ys = self.ys
        f_cum = np.cumsum(self.f)
        p = np.random.rand(n)
        x = np.empty([2, p.shape[0]])
        lf = len(f_cum)
        lx = len(xs)-1
        ly = len(ys)-1
        # walk backwards through the cumulative probabilities,
        # overwriting draws that fall below each threshold
        for i in range(lf):
            x[0, p < f_cum[lf-1-i]] = xs[lx]
            x[1, p < f_cum[lf-1-i]] = ys[ly]
            if ly == 0:
                lx -= 1
                ly = len(ys)-1
            else:
                ly -= 1
        self.x = x
        self.n = n

    def marg_dist(self):
        '''marginal distribution'''
        x = self.x
        xs = self.xs
        ys = self.ys
        n = self.n
        xmp = [np.sum(x[0, :] == xs[i])/n for i in range(len(xs))]
        ymp = [np.sum(x[1, :] == ys[i])/n for i in range(len(ys))]

        # print output
        xmtb = pt.PrettyTable()
        ymtb = pt.PrettyTable()
        xmtb.field_names = ['x_value', 'x_prob']
        ymtb.field_names = ['y_value', 'y_prob']
        for i in range(max(len(xs), len(ys))):
            if i < len(xs):
                xmtb.add_row([xs[i], xmp[i]])
            if i < len(ys):
                ymtb.add_row([ys[i], ymp[i]])
        xmtb.add_row(['sum', np.sum(xmp)])
        ymtb.add_row(['sum', np.sum(ymp)])
        print("\nmarginal distribution for x\n", xmtb)
        print("\nmarginal distribution for y\n", ymtb)

        self.xmp = xmp
        self.ymp = ymp

    def cond_dist(self):
        '''conditional distribution'''
        x = self.x
        xs = self.xs
        ys = self.ys
        n = self.n
        xcp = np.empty([len(ys), len(xs)])
        ycp = np.empty([len(xs), len(ys)])
        for i in range(max(len(ys), len(xs))):
            if i < len(ys):
                xi = x[0, x[1, :] == ys[i]]
                idx = xi.reshape(len(xi), 1) == xs.reshape(1, len(xs))
                xcp[i, :] = np.sum(idx, 0)/len(xi)
            if i < len(xs):
                yi = x[1, x[0, :] == xs[i]]
                idy = yi.reshape(len(yi), 1) == ys.reshape(1, len(ys))
                ycp[i, :] = np.sum(idy, 0)/len(yi)

        # print output
        xctb = pt.PrettyTable()
        yctb = pt.PrettyTable()
        xctb.field_names = ['x_value', *xs, 'sum']
        yctb.field_names = ['y_value', *ys, 'sum']
        for i in range(max(len(xs), len(ys))):
            if i < len(ys):
                xctb.add_row([ys[i], *xcp[i], np.sum(xcp[i])])
            if i < len(xs):
                yctb.add_row([xs[i], *ycp[i], np.sum(ycp[i])])

        print("\nconditional distribution for x\n", xctb)
        print("\nconditional distribution for y\n", yctb)

        self.xcp = xcp
        self.ycp = ycp

Let’s apply our code to some examples.


Example 1

# joint
d = discrete_bijoint(f, xs, ys)
d.joint_tb()

The joint probability distribution for x and y


+--------------------+-----+--------------------+--------------------+
| x_value/y_value | 10 | 20 | marginal sum for x |
+--------------------+-----+--------------------+--------------------+
| 0 | 0.3 | 0.2 | 0.5 |
| 1 | 0.1 | 0.4 | 0.5 |
| marginal_sum for y | 0.4 | 0.6000000000000001 | 1.0 |
+--------------------+-----+--------------------+--------------------+

# sample marginal
d.draw(1_000_000)
d.marg_dist()

marginal distribution for x


+---------+----------+
| x_value | x_prob |
+---------+----------+
| 0 | 0.498825 |
| 1 | 0.501175 |
| sum | 1.0 |
+---------+----------+

marginal distribution for y


+---------+---------+
| y_value | y_prob |
+---------+---------+
| 10 | 0.39994 |
| 20 | 0.60006 |
| sum | 1.0 |
+---------+---------+

# sample conditional
d.cond_dist()

conditional distribution for x


+---------+---------------------+--------------------+-----+
| x_value | 0 | 1 | sum |
+---------+---------------------+--------------------+-----+
| 10 | 0.7494524178626794 | 0.2505475821373206 | 1.0 |
| 20 | 0.33178182181781823 | 0.6682181781821818 | 1.0 |
+---------+---------------------+--------------------+-----+

conditional distribution for y


+---------+---------------------+---------------------+-----+
| y_value | 10 | 20 | sum |
+---------+---------------------+---------------------+-----+
| 0 | 0.6008840775823184 | 0.39911592241768157 | 1.0 |
| 1 | 0.19993814535840773 | 0.8000618546415923 | 1.0 |
+---------+---------------------+---------------------+-----+

Example 2

xs_new = np.array([10, 20, 30])


ys_new = np.array([1, 2])
f_new = np.array([[0.2, 0.1], [0.1, 0.3], [0.15, 0.15]])
d_new = discrete_bijoint(f_new, xs_new, ys_new)
d_new.joint_tb()

The joint probability distribution for x and y


+--------------------+---------------------+------+---------------------+
| x_value/y_value | 1 | 2 | marginal sum for x |
+--------------------+---------------------+------+---------------------+
| 10 | 0.2 | 0.1 | 0.30000000000000004 |
| 20 | 0.1 | 0.3 | 0.4 |
| 30 | 0.15 | 0.15 | 0.3 |
| marginal_sum for y | 0.45000000000000007 | 0.55 | 1.0 |
+--------------------+---------------------+------+---------------------+

d_new.draw(1_000_000)
d_new.marg_dist()

marginal distribution for x


+---------+----------+
| x_value | x_prob |
+---------+----------+
| 10 | 0.299336 |
| 20 | 0.400698 |
| 30 | 0.299966 |
| sum | 1.0 |
+---------+----------+

marginal distribution for y


+---------+----------+
| y_value | y_prob |
+---------+----------+
| 1 | 0.449267 |
| 2 | 0.550733 |
| sum | 1.0 |
+---------+----------+

d_new.cond_dist()


conditional distribution for x


+---------+--------------------+---------------------+--------------------+-----+
| x_value | 10 | 20 | 30 | sum |
+---------+--------------------+---------------------+--------------------+-----+
| 1 | 0.4446108884026648 | 0.22235997747441943 | 0.3330291341229158 | 1.0 |
| 2 | 0.180826280611476 | 0.5461793645922798 | 0.2729943547962443 | 1.0 |
+---------+--------------------+---------------------+--------------------+-----+

conditional distribution for y


+---------+--------------------+--------------------+-----+
| y_value | 1 | 2 | sum |
+---------+--------------------+--------------------+-----+
| 10 | 0.6673069727663896 | 0.3326930272336104 | 1.0 |
| 20 | 0.2493124497751424 | 0.7506875502248577 | 1.0 |
| 30 | 0.4987865291399692 | 0.5012134708600308 | 1.0 |
+---------+--------------------+--------------------+-----+
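The sampling step can also be sketched more compactly by drawing a cell index of the flattened joint matrix and mapping it back to a pair of row and column indices. This is an alternative to the `draw` method shown above, not the approach the class itself uses; it relies on `Generator.choice` and `np.unravel_index`, and the seeded generator is an assumption made for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)                       # seeded generator (an assumption)
xs = np.array([10, 20, 30])
ys = np.array([1, 2])
f = np.array([[0.2, 0.1], [0.1, 0.3], [0.15, 0.15]])
n = 1_000_000

# draw a cell of the flattened matrix with probability f[i, j]
cell = rng.choice(f.size, size=n, p=f.ravel())
i, j = np.unravel_index(cell, f.shape)               # recover row and column indices
x_draws, y_draws = xs[i], ys[j]

# empirical joint distribution, arranged like f
f_hat = np.bincount(cell, minlength=f.size).reshape(f.shape) / n
print(f_hat)
```

The matrix `f_hat` should be close to `f` entry by entry, and marginal or conditional frequencies can then be read off it with row and column sums.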

8.16 A Continuous Bivariate Random Vector

A two-dimensional Gaussian distribution has joint density

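$$
\begin{aligned}
f(x, y) &= \left(2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}\right)^{-1} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2} \right) \right] \\
&= \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[ -\frac{1}{2(1 - \rho^2)} \left( \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \frac{(y - \mu_2)^2}{\sigma_2^2} \right) \right]
\end{aligned}
$$

We start with a bivariate normal distribution pinned down by

$$ \mu = \begin{bmatrix} 0 \\ 5 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} 5 & .2 \\ .2 & 1 \end{bmatrix} $$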

# define the joint probability density function


def func(x, y, μ1=0, μ2=5, σ1=np.sqrt(5), σ2=np.sqrt(1), ρ=.2/np.sqrt(5*1)):
    A = (2 * np.pi * σ1 * σ2 * np.sqrt(1 - ρ**2))**(-1)
    B = -1 / 2 / (1 - ρ**2)
    C1 = (x - μ1)**2 / σ1**2
    C2 = 2 * ρ * (x - μ1) * (y - μ2) / σ1 / σ2
    C3 = (y - μ2)**2 / σ2**2
    return A * np.exp(B * (C1 - C2 + C3))

μ1 = 0
μ2 = 5
σ1 = np.sqrt(5)
σ2 = np.sqrt(1)
ρ = .2 / np.sqrt(5 * 1)

x = np.linspace(-10, 10, 1_000)


y = np.linspace(-10, 10, 1_000)
x_mesh, y_mesh = np.meshgrid(x, y, indexing="ij")
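The density written in terms of 𝜎1, 𝜎2 and 𝜌 is the same object as the usual matrix form of the bivariate normal density built from 𝜇 and Σ. The sketch below checks this at a few points; it is an illustration added here, and the function names and the evaluation points are hypothetical choices.

```python
import numpy as np

μ = np.array([0.0, 5.0])
Σ = np.array([[5.0, 0.2], [0.2, 1.0]])

def density_matrix_form(x, y):
    # (2π)^{-1} |Σ|^{-1/2} exp(-0.5 (z - μ)' Σ^{-1} (z - μ))
    z = np.array([x, y]) - μ
    quad = z @ np.linalg.solve(Σ, z)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(Σ)))

def density_rho_form(x, y):
    # the same density expressed with σ1, σ2 and the correlation ρ
    σ1, σ2 = np.sqrt(Σ[0, 0]), np.sqrt(Σ[1, 1])
    ρ = Σ[0, 1] / (σ1 * σ2)
    q = ((x - μ[0])**2 / σ1**2
         - 2 * ρ * (x - μ[0]) * (y - μ[1]) / (σ1 * σ2)
         + (y - μ[1])**2 / σ2**2)
    return np.exp(-q / (2 * (1 - ρ**2))) / (2 * np.pi * σ1 * σ2 * np.sqrt(1 - ρ**2))

points = [(0.0, 5.0), (1.0, 4.0), (-2.0, 6.5)]        # arbitrary test points
gaps = [abs(density_matrix_form(x, y) - density_rho_form(x, y)) for x, y in points]
print(gaps)
```

The gaps are at the level of floating-point rounding, which reflects the identity |Σ| = 𝜎1²𝜎2²(1 − 𝜌²).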

Joint Distribution
Let’s plot the population joint density.


# %matplotlib notebook

fig = plt.figure()
ax = plt.axes(projection='3d')

surf = ax.plot_surface(x_mesh, y_mesh, func(x_mesh, y_mesh), cmap='viridis')


plt.show()

# %matplotlib notebook

fig = plt.figure()
ax = plt.axes(projection='3d')

curve = ax.contour(x_mesh, y_mesh, func(x_mesh, y_mesh), zdir='x')


plt.ylabel('y')
ax.set_zlabel('f')
ax.set_xticks([])
plt.show()


Next we can simulate from a built-in numpy function and calculate a sample marginal distribution from the sample mean
and variance.

μ= np.array([0, 5])
σ= np.array([[5, .2], [.2, 1]])
n = 1_000_000
data = np.random.multivariate_normal(μ, σ, n)
x = data[:, 0]
y = data[:, 1]

Marginal distribution

plt.hist(x, bins=1_000, alpha=0.6)


μx_hat, σx_hat = np.mean(x), np.std(x)
print(μx_hat, σx_hat)
x_sim = np.random.normal(μx_hat, σx_hat, 1_000_000)
plt.hist(x_sim, bins=1_000, alpha=0.4, histtype="step")
plt.show()

-0.0009410653678662386 2.237337853596715


plt.hist(y, bins=1_000, density=True, alpha=0.6)


μy_hat, σy_hat = np.mean(y), np.std(y)
print(μy_hat, σy_hat)
y_sim = np.random.normal(μy_hat, σy_hat, 1_000_000)
plt.hist(y_sim, bins=1_000, density=True, alpha=0.4, histtype="step")
plt.show()

4.999005281178264 1.0003086878642835


Conditional distribution
The population conditional distribution is

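$$ [X | Y = y] \sim \mathbb{N}\left[ \mu_X + \rho\sigma_X \frac{y - \mu_Y}{\sigma_Y}, \; \sigma_X^2 (1 - \rho^2) \right] $$

$$ [Y | X = x] \sim \mathbb{N}\left[ \mu_Y + \rho\sigma_Y \frac{x - \mu_X}{\sigma_X}, \; \sigma_Y^2 (1 - \rho^2) \right] $$

Let’s approximate the joint density by discretizing and mapping the approximating joint density into a matrix.
We can compute the discretized conditional density by just using matrix algebra and noting that

$$ \textrm{Prob}\{X = i | Y = j\} = \frac{f_{ij}}{\sum_i f_{ij}} $$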

Fix 𝑦 = 0.

# discretized conditional density of X given Y = 0


x = np.linspace(-10, 10, 1_000_000)
z = func(x, y=0) / np.sum(func(x, y=0))
plt.plot(x, z)
plt.show()


The mean and variance are computed by

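$$ \mathbb{E}[X | Y = j] = \sum_i i \, \textrm{Prob}\{X = i | Y = j\} = \sum_i i \frac{f_{ij}}{\sum_i f_{ij}} $$

$$ \mathbb{D}[X | Y = j] = \sum_i \left( i - \mu_{X | Y = j} \right)^2 \frac{f_{ij}}{\sum_i f_{ij}} $$

Let’s draw from a normal distribution with the above mean and variance and check how accurate our approximation is.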

# discretized mean
μx = np.dot(x, z)

# discretized standard deviation


σx = np.sqrt(np.dot((x - μx)**2, z))

# sample
zz = np.random.normal(μx, σx, 1_000_000)
plt.hist(zz, bins=300, density=True, alpha=0.3, range=[-10, 10])
plt.show()


Fix 𝑥 = 1.

y = np.linspace(0, 10, 1_000_000)


z = func(x=1, y=y) / np.sum(func(x=1, y=y))
plt.plot(y,z)
plt.show()


# discretized mean and standard deviation


μy = np.dot(y,z)
σy = np.sqrt(np.dot((y - μy)**2, z))

# sample
zz = np.random.normal(μy,σy,1_000_000)
plt.hist(zz, bins=100, density=True, alpha=0.3)
plt.show()


We compare with the analytically computed parameters and note that they are close.

print(μx, σx)
print(μ1 + ρ * σ1 * (0 - μ2) / σ2, np.sqrt(σ1**2 * (1 - ρ**2)))

print(μy, σy)
print(μ2 + ρ * σ2 * (1 - μ1) / σ1, np.sqrt(σ2**2 * (1 - ρ**2)))

-0.9997518414498433 2.22658413316977
-1.0 2.227105745132009
5.039999456960768 0.9959851265795597
5.04 0.9959919678390986

8.17 Sum of Two Independently Distributed Random Variables

Let 𝑋, 𝑌 be two independent discrete random variables that take values in 𝑋̄, 𝑌̄, respectively.
Define a new random variable 𝑍 = 𝑋 + 𝑌.
Evidently, 𝑍 takes values from 𝑍̄ defined as follows:

$$
\begin{aligned}
\bar{X} &= \{0, 1, \ldots, I - 1\}; & f_i &= \textrm{Prob}\{X = i\} \\
\bar{Y} &= \{0, 1, \ldots, J - 1\}; & g_j &= \textrm{Prob}\{Y = j\} \\
\bar{Z} &= \{0, 1, \ldots, I + J - 2\}; & h_k &= \textrm{Prob}\{X + Y = k\}
\end{aligned}
$$

Independence of 𝑋 and 𝑌 implies that

$$ h_k = \textrm{Prob}\{X = 0, Y = k\} + \textrm{Prob}\{X = 1, Y = k - 1\} + \ldots + \textrm{Prob}\{X = k, Y = 0\} $$

$$ h_k = f_0 g_k + f_1 g_{k-1} + \ldots + f_{k-1} g_1 + f_k g_0 \quad \text{for} \quad k = 0, 1, \ldots, I + J - 2 $$

Thus, we have:

$$ h_k = \sum_{i=0}^{k} f_i g_{k-i} \equiv f * g $$

where 𝑓 ∗ 𝑔 denotes the convolution of the 𝑓 and 𝑔 sequences.


Similarly, for two independent random variables 𝑋, 𝑌 with densities 𝑓𝑋, 𝑔𝑌, the density of 𝑍 = 𝑋 + 𝑌 is

$$ f_Z(z) = \int_{-\infty}^{\infty} f_X(x) g_Y(z - x) \, dx \equiv f_X * g_Y $$

where 𝑓𝑋 ∗ 𝑔𝑌 denotes the convolution of the 𝑓𝑋 and 𝑔𝑌 functions.
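For the discrete case, the convolution formula maps directly onto `np.convolve`. The sketch below uses two small made-up distributions (the probabilities in `f` and `g` are hypothetical) and compares the convolution with the simulated distribution of 𝑋 + 𝑌 for independent draws; the seeded generator is an assumption made for reproducibility.

```python
import numpy as np

f = np.array([0.5, 0.5])            # Prob{X = i}, i = 0, 1      (hypothetical values)
g = np.array([0.2, 0.3, 0.5])       # Prob{Y = j}, j = 0, 1, 2   (hypothetical values)

h = np.convolve(f, g)               # h_k = sum_i f_i g_{k-i}, k = 0, ..., I + J - 2
print(h)

# simulation check with independent draws of X and Y
rng = np.random.default_rng(0)      # seeded generator (an assumption)
n = 500_000
x = rng.choice(len(f), size=n, p=f)
y = rng.choice(len(g), size=n, p=g)
h_hat = np.bincount(x + y, minlength=len(h)) / n
print(h_hat)
```

The vector `h` has 𝐼 + 𝐽 − 1 entries, one for each value in 𝑍̄, and its entries sum to one.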

8.18 Transition Probability Matrix

Consider the following joint probability distribution of two random variables.


Let 𝑋, 𝑌 be discrete random variables with joint distribution

Prob{𝑋 = 𝑖, 𝑌 = 𝑗} = 𝜌𝑖𝑗

where 𝑖 = 0, … , 𝐼 − 1; 𝑗 = 0, … , 𝐽 − 1 and

∑ ∑ 𝜌𝑖𝑗 = 1, 𝜌𝑖𝑗 ⩾ 0.
𝑖 𝑗

An associated conditional distribution is


$$ \textrm{Prob}\{Y = j | X = i\} = \frac{\rho_{ij}}{\sum_j \rho_{ij}} = \frac{\textrm{Prob}\{Y = j, X = i\}}{\textrm{Prob}\{X = i\}} $$

We can define a transition probability matrix

$$ p_{ij} = \textrm{Prob}\{Y = j | X = i\} = \frac{\rho_{ij}}{\sum_j \rho_{ij}} $$

where

$$ \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} $$

The first row is the probability of 𝑌 = 𝑗, 𝑗 = 0, 1 conditional on 𝑋 = 0.

The second row is the probability of 𝑌 = 𝑗, 𝑗 = 0, 1 conditional on 𝑋 = 1.
Note that
• ∑𝑗 𝑝𝑖𝑗 = (∑𝑗 𝜌𝑖𝑗)/(∑𝑗 𝜌𝑖𝑗) = 1, so each row of the transition matrix [𝑝𝑖𝑗] is a probability distribution (not so for each column).
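The normalization that turns a joint distribution into a transition probability matrix is a single row-wise division. The sketch below illustrates it with a joint matrix whose entries are chosen here only as an example.

```python
import numpy as np

ρ = np.array([[0.3, 0.2],
              [0.1, 0.4]])                      # joint distribution (example values)

# p_ij = ρ_ij / sum_j ρ_ij : divide each row by its own sum
P = ρ / ρ.sum(axis=1, keepdims=True)

print(P)
print("row sums:", P.sum(axis=1))               # each row is a probability distribution
print("column sums:", P.sum(axis=0))            # columns need not sum to one
```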


8.19 Coupling

Start with a joint distribution

$$ f_{ij} = \textrm{Prob}\{X = i, Y = j\}, \quad i = 0, \cdots, I - 1, \quad j = 0, \cdots, J - 1 $$

stacked to an 𝐼 × 𝐽 matrix, e.g., for 𝐼 = 2, 𝐽 = 2,

$$ \begin{bmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{bmatrix} $$

From the joint distribution, we have shown above that we obtain unique marginal distributions.
Now we’ll try to go in a reverse direction.
We’ll find that from two marginal distributions, we can usually construct more than one joint distribution that verifies these marginals.
Each of these joint distributions is called a coupling of the two marginal distributions.
Let’s start with marginal distributions

$$ \textrm{Prob}\{X = i\} = \sum_j f_{ij} = \mu_i, \quad i = 0, \cdots, I - 1 $$

$$ \textrm{Prob}\{Y = j\} = \sum_i f_{ij} = \nu_j, \quad j = 0, \cdots, J - 1 $$

Given two marginal distributions, 𝜇 for 𝑋 and 𝜈 for 𝑌, a joint distribution 𝑓𝑖𝑗 with these marginals is said to be a coupling of 𝜇 and 𝜈.
Example:
Consider the following bivariate example.

$$
\begin{aligned}
\textrm{Prob}\{X = 0\} &= 1 - q = \mu_0 \\
\textrm{Prob}\{X = 1\} &= q = \mu_1 \\
\textrm{Prob}\{Y = 0\} &= 1 - r = \nu_0 \\
\textrm{Prob}\{Y = 1\} &= r = \nu_1 \\
\text{where } 0 \le q &< r \le 1
\end{aligned}
$$

We construct two couplings.

The first coupling of our two marginal distributions is the joint distribution

$$ f_{ij} = \begin{bmatrix} (1 - q)(1 - r) & (1 - q)r \\ q(1 - r) & qr \end{bmatrix} $$

To verify that it is a coupling, we check that

$$ (1 - q)(1 - r) + (1 - q)r + q(1 - r) + qr = 1 $$

$$
\begin{aligned}
\mu_0 &= (1 - q)(1 - r) + (1 - q)r = 1 - q \\
\mu_1 &= q(1 - r) + qr = q \\
\nu_0 &= (1 - q)(1 - r) + (1 - r)q = 1 - r \\
\nu_1 &= r(1 - q) + qr = r
\end{aligned}
$$


A second coupling of our two marginal distributions is the joint distribution


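$$ f_{ij} = \begin{bmatrix} 1 - r & r - q \\ 0 & q \end{bmatrix} $$

To verify that this is a coupling, note that

$$
\begin{aligned}
(1 - r) + (r - q) + q &= 1 \\
\mu_0 &= 1 - q \\
\mu_1 &= q \\
\nu_0 &= 1 - r \\
\nu_1 &= r
\end{aligned}
$$
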
Thus, our two proposed joint distributions have the same marginal distributions.
But the joint distributions differ.
Thus, multiple joint distributions [𝑓𝑖𝑗 ] can have the same marginals.
Remark:
• Couplings are important in optimal transport problems and in Markov processes.
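The two couplings constructed above can be checked numerically by summing rows and columns. The sketch below picks 𝑞 = 0.4 and 𝑟 = 0.7 as example values satisfying 0 ≤ 𝑞 < 𝑟 ≤ 1; the variable names are hypothetical.

```python
import numpy as np

q, r = 0.4, 0.7                                  # example values with 0 <= q < r <= 1

# first coupling: the product of the two marginals
independent = np.outer([1 - q, q], [1 - r, r])

# second coupling: puts zero probability on (X = 1, Y = 0)
second = np.array([[1 - r, r - q],
                   [0.0,   q]])

for f in (independent, second):
    print(f)
    print("marginal of X:", f.sum(axis=1), " marginal of Y:", f.sum(axis=0))
```

Both matrices return the marginals (1 − 𝑞, 𝑞) and (1 − 𝑟, 𝑟) even though their entries differ.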

8.20 Copula Functions

Suppose that 𝑋1 , 𝑋2 , … , 𝑋𝑁 are 𝑁 random variables and that


• their marginal distributions are 𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ), and
• their joint distribution is 𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 )
Then there exists a copula function 𝐶(⋅) that verifies
𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 ) = 𝐶(𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 )).
We can obtain
$$ C(u_1, u_2, \ldots, u_N) = H[F_1^{-1}(u_1), F_2^{-1}(u_2), \ldots, F_N^{-1}(u_N)] $$
In a reverse direction of logic, given univariate marginal distributions 𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ) and a
copula function 𝐶(⋅), the function 𝐻(𝑥1 , 𝑥2 , … , 𝑥𝑁 ) = 𝐶(𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 )) is a coupling of
𝐹1 (𝑥1 ), 𝐹2 (𝑥2 ), … , 𝐹𝑁 (𝑥𝑁 ).
Thus, for given marginal distributions, we can use a copula function to determine a joint distribution when the associated
univariate random variables are not independent.
Copula functions are often used to characterize dependence of random variables.
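To make the idea concrete, here is a minimal sketch of one way to build dependent random variables with prescribed marginals. Everything specific in it is an assumption made for illustration and does not come from the text above: it uses a Gaussian copula with correlation 0.8, exponential marginals with rates 0.5 and 2, and the standard normal CDF written via `math.erf`. Correlated normal draws are mapped to uniforms by the normal CDF, and the uniforms are then pushed through the inverse exponential CDFs.

```python
import math
import numpy as np

rng = np.random.default_rng(0)                 # seeded generator (an assumption)
n, ρ = 200_000, 0.8                            # copula correlation (assumed value)
λ1, λ2 = 0.5, 2.0                              # exponential rates (assumed values)

# step 1: correlated standard normal draws
cov = np.array([[1.0, ρ], [ρ, 1.0]])
z = rng.multivariate_normal(np.zeros(2), cov, size=n)

# step 2: map each coordinate to a uniform via the standard normal CDF
Phi = np.vectorize(lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2))))
u = Phi(z)

# step 3: apply the inverse marginal CDFs, F^{-1}(u) = -log(1 - u)/λ
x1 = -np.log1p(-u[:, 0]) / λ1
x2 = -np.log1p(-u[:, 1]) / λ2

print("means:", x1.mean(), x2.mean())          # should be near 1/λ1 and 1/λ2
print("correlation:", np.corrcoef(x1, x2)[0, 1])
```

The marginal means stay close to 1/𝜆1 and 1/𝜆2 while the two variables are clearly positively dependent, which is the sense in which a copula separates the marginals from the dependence structure.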
Discrete marginal distribution
As mentioned above, for two given marginal distributions there can be more than one coupling.
For example, consider two random variables 𝑋, 𝑌 with distributions
Prob(𝑋 = 0) = 0.6,
Prob(𝑋 = 1) = 0.4,
Prob(𝑌 = 0) = 0.3,
Prob(𝑌 = 1) = 0.7,
For these two random variables there can be more than one coupling.
Let’s first generate X and Y.


# define parameters
mu = np.array([0.6, 0.4])
nu = np.array([0.3, 0.7])

# number of draws
draws = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws)

# generate draws of X and Y via uniform distribution


x = np.ones(draws)
y = np.ones(draws)
x[p <= mu[0]] = 0
x[p > mu[0]] = 1
y[p <= nu[0]] = 0
y[p > nu[0]] = 1

# calculate parameters from draws


q_hat = sum(x[x == 1])/draws
r_hat = sum(y[y == 1])/draws

# print output
print("distribution for x")
xmtb = pt.PrettyTable()
xmtb.field_names = ['x_value', 'x_prob']
xmtb.add_row([0, 1-q_hat])
xmtb.add_row([1, q_hat])
print(xmtb)

print("distribution for y")


ymtb = pt.PrettyTable()
ymtb.field_names = ['y_value', 'y_prob']
ymtb.add_row([0, 1-r_hat])
ymtb.add_row([1, r_hat])
print(ymtb)

distribution for x
+---------+--------------------+
| x_value | x_prob |
+---------+--------------------+
| 0 | 0.6006279999999999 |
| 1 | 0.399372 |
+---------+--------------------+
distribution for y
+---------+----------+
| y_value | y_prob |
+---------+----------+
| 0 | 0.300752 |
| 1 | 0.699248 |
+---------+----------+
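Because the code above builds x and y from the same uniform draw p, the pair (x, y) is itself one particular coupling of the two marginals. The sketch below, added here as an illustration with a seeded generator as an assumption, tabulates the joint frequencies of such a pair.

```python
import numpy as np

rng = np.random.default_rng(0)                  # seeded generator (an assumption)
mu = np.array([0.6, 0.4])
nu = np.array([0.3, 0.7])
n = 1_000_000

p = rng.random(n)                               # one common uniform draw
x = (p > mu[0]).astype(int)                     # X = 1 when p exceeds Prob(X = 0)
y = (p > nu[0]).astype(int)                     # Y = 1 when p exceeds Prob(Y = 0)

# joint[i, j] = relative frequency of (X = i, Y = j)
joint = np.bincount(2 * x + y, minlength=4).reshape(2, 2) / n
print(joint)
```

The event (𝑋 = 1, 𝑌 = 0) never occurs, since it would require 𝑝 > 0.6 and 𝑝 ≤ 0.3 at once, so the estimated joint distribution is close to the matrix with rows (0.3, 0.3) and (0, 0.4).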

Let’s now take our two marginal distributions, one for 𝑋, the other for 𝑌 , and construct two distinct couplings.
For the first joint distribution:

Prob(𝑋 = 𝑖, 𝑌 = 𝑗) = 𝑓𝑖𝑗


where
$$ [f_{ij}] = \begin{bmatrix} 0.18 & 0.42 \\ 0.12 & 0.28 \end{bmatrix} $$

Let’s use Python to construct this joint distribution and then verify that its marginal distributions are what we want.

# define parameters
f1 = np.array([[0.18, 0.42], [0.12, 0.28]])
f1_cum = np.cumsum(f1)

# number of draws
draws1 = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws1)

# generate draws of first coupling via uniform distribution


c1 = np.vstack([np.ones(draws1), np.ones(draws1)])
# X=0, Y=0
c1[0, p <= f1_cum[0]] = 0
c1[1, p <= f1_cum[0]] = 0
# X=0, Y=1
c1[0, (p > f1_cum[0])*(p <= f1_cum[1])] = 0
c1[1, (p > f1_cum[0])*(p <= f1_cum[1])] = 1
# X=1, Y=0
c1[0, (p > f1_cum[1])*(p <= f1_cum[2])] = 1
c1[1, (p > f1_cum[1])*(p <= f1_cum[2])] = 0
# X=1, Y=1
c1[0, (p > f1_cum[2])*(p <= f1_cum[3])] = 1
c1[1, (p > f1_cum[2])*(p <= f1_cum[3])] = 1

# calculate parameters from draws


f1_00 = sum((c1[0, :] == 0)*(c1[1, :] == 0))/draws1
f1_01 = sum((c1[0, :] == 0)*(c1[1, :] == 1))/draws1
f1_10 = sum((c1[0, :] == 1)*(c1[1, :] == 0))/draws1
f1_11 = sum((c1[0, :] == 1)*(c1[1, :] == 1))/draws1

# print output of first joint distribution


print("first joint distribution for c1")
c1_mtb = pt.PrettyTable()
c1_mtb.field_names = ['c1_x_value', 'c1_y_value', 'c1_prob']
c1_mtb.add_row([0, 0, f1_00])
c1_mtb.add_row([0, 1, f1_01])
c1_mtb.add_row([1, 0, f1_10])
c1_mtb.add_row([1, 1, f1_11])
print(c1_mtb)

first joint distribution for c1


+------------+------------+----------+
| c1_x_value | c1_y_value | c1_prob |
+------------+------------+----------+
| 0 | 0 | 0.179818 |
| 0 | 1 | 0.420259 |
| 1 | 0 | 0.120202 |
| 1 | 1 | 0.279721 |
+------------+------------+----------+


# calculate parameters from draws


c1_q_hat = sum(c1[0, :] == 1)/draws1
c1_r_hat = sum(c1[1, :] == 1)/draws1

# print output
print("marginal distribution for x")
c1_x_mtb = pt.PrettyTable()
c1_x_mtb.field_names = ['c1_x_value', 'c1_x_prob']
c1_x_mtb.add_row([0, 1-c1_q_hat])
c1_x_mtb.add_row([1, c1_q_hat])
print(c1_x_mtb)

print("marginal distribution for y")


c1_ymtb = pt.PrettyTable()
c1_ymtb.field_names = ['c1_y_value', 'c1_y_prob']
c1_ymtb.add_row([0, 1-c1_r_hat])
c1_ymtb.add_row([1, c1_r_hat])
print(c1_ymtb)

marginal distribution for x

+------------+-----------+
| c1_x_value | c1_x_prob |
+------------+-----------+
| 0 | 0.600077 |
| 1 | 0.399923 |
+------------+-----------+
marginal distribution for y
+------------+---------------------+
| c1_y_value | c1_y_prob |
+------------+---------------------+
| 0 | 0.30001999999999995 |
| 1 | 0.69998 |
+------------+---------------------+

Now, let’s construct another joint distribution that is also a coupling of $X$ and $Y$:

$$
[f_{ij}] = \begin{bmatrix} 0.3 & 0.3 \\ 0 & 0.4 \end{bmatrix}
$$

# define parameters
f2 = np.array([[0.3, 0.3], [0, 0.4]])
f2_cum = np.cumsum(f2)

# number of draws
draws2 = 1_000_000

# generate draws from uniform distribution


p = np.random.rand(draws2)

# generate draws of second coupling via uniform distribution


c2 = np.vstack([np.ones(draws2), np.ones(draws2)])
# X=0, Y=0
c2[0, p <= f2_cum[0]] = 0
c2[1, p <= f2_cum[0]] = 0
# X=0, Y=1
c2[0, (p > f2_cum[0])*(p <= f2_cum[1])] = 0
c2[1, (p > f2_cum[0])*(p <= f2_cum[1])] = 1
# X=1, Y=0
c2[0, (p > f2_cum[1])*(p <= f2_cum[2])] = 1
c2[1, (p > f2_cum[1])*(p <= f2_cum[2])] = 0
# X=1, Y=1
c2[0, (p > f2_cum[2])*(p <= f2_cum[3])] = 1
c2[1, (p > f2_cum[2])*(p <= f2_cum[3])] = 1

# calculate parameters from draws


f2_00 = sum((c2[0, :] == 0)*(c2[1, :] == 0))/draws2
f2_01 = sum((c2[0, :] == 0)*(c2[1, :] == 1))/draws2
f2_10 = sum((c2[0, :] == 1)*(c2[1, :] == 0))/draws2
f2_11 = sum((c2[0, :] == 1)*(c2[1, :] == 1))/draws2

# print output of second joint distribution


print("second joint distribution for c2")
c2_mtb = pt.PrettyTable()
c2_mtb.field_names = ['c2_x_value', 'c2_y_value', 'c2_prob']
c2_mtb.add_row([0, 0, f2_00])
c2_mtb.add_row([0, 1, f2_01])
c2_mtb.add_row([1, 0, f2_10])
c2_mtb.add_row([1, 1, f2_11])
print(c2_mtb)

second joint distribution for c2


+------------+------------+----------+
| c2_x_value | c2_y_value | c2_prob |
+------------+------------+----------+
| 0 | 0 | 0.29983 |
| 0 | 1 | 0.300708 |
| 1 | 0 | 0.0 |
| 1 | 1 | 0.399462 |
+------------+------------+----------+

# calculate parameters from draws


c2_q_hat = sum(c2[0, :] == 1)/draws2
c2_r_hat = sum(c2[1, :] == 1)/draws2

# print output
print("marginal distribution for x")
c2_x_mtb = pt.PrettyTable()
c2_x_mtb.field_names = ['c2_x_value', 'c2_x_prob']
c2_x_mtb.add_row([0, 1-c2_q_hat])
c2_x_mtb.add_row([1, c2_q_hat])
print(c2_x_mtb)

print("marginal distribution for y")


c2_ymtb = pt.PrettyTable()
c2_ymtb.field_names = ['c2_y_value', 'c2_y_prob']
c2_ymtb.add_row([0, 1-c2_r_hat])
c2_ymtb.add_row([1, c2_r_hat])
print(c2_ymtb)


marginal distribution for x

+------------+-----------+
| c2_x_value | c2_x_prob |
+------------+-----------+
| 0 | 0.600538 |
| 1 | 0.399462 |
+------------+-----------+
marginal distribution for y
+------------+---------------------+
| c2_y_value | c2_y_prob |
+------------+---------------------+
| 0 | 0.29983000000000004 |
| 1 | 0.70017 |
+------------+---------------------+

We have verified that both joint distributions, 𝑐1 and 𝑐2 , have identical marginal distributions of 𝑋 and 𝑌 , respectively.
So they are both couplings of 𝑋 and 𝑌 .
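
The simulations above recover the marginals only approximately. As a complementary check, the marginals can be read off exactly by summing the rows and columns of each joint matrix. The sketch below assumes the convention used above (rows index $X$, columns index $Y$); the helper name `marginals` is ours and is not part of the lecture code.

```python
import numpy as np

def marginals(f):
    """Return (marginal of X, marginal of Y) for a joint matrix f
    whose rows index X and whose columns index Y."""
    f = np.asarray(f, dtype=float)
    return f.sum(axis=1), f.sum(axis=0)

f1 = np.array([[0.18, 0.42], [0.12, 0.28]])
f2 = np.array([[0.3, 0.3], [0.0, 0.4]])

mu1, nu1 = marginals(f1)
mu2, nu2 = marginals(f2)

# f1 happens to be the independent coupling: the outer product of its marginals
is_independent = np.allclose(f1, np.outer(mu1, nu1))
```

Both matrices give the same pair of marginals, while only the first one factors into a product of them, which is one way to see that the two couplings are genuinely different.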

8.21 Time Series

Suppose that there are two time periods.


• 𝑡 = 0 “today”
• 𝑡 = 1 “tomorrow”
Let 𝑋(0) be a random variable to be realized at 𝑡 = 0, 𝑋(1) be a random variable to be realized at 𝑡 = 1.
Suppose that

$$
\textrm{Prob}\{X(0) = i, X(1) = j\} = f_{ij} \geq 0, \quad i = 0, \ldots, I-1
$$

$$
\sum_i \sum_j f_{ij} = 1
$$

𝑓𝑖𝑗 is a joint distribution over [𝑋(0), 𝑋(1)].


A conditional distribution is

$$
\textrm{Prob}\{X(1) = j \mid X(0) = i\} = \frac{f_{ij}}{\sum_j f_{ij}}
$$

Remark:
• This is a key formula for a theory of optimally predicting a time series.
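
In matrix terms, the conditional distribution is obtained by dividing each row of the joint matrix by its row sum. A minimal sketch, assuming the joint distribution is stored as a NumPy array with rows indexed by the value of $X(0)$; the function name and the example matrix are illustrative choices of ours.

```python
import numpy as np

def conditional_given_first(f):
    """Row-normalize a joint matrix f[i, j] = Prob{X(0)=i, X(1)=j}
    to get Prob{X(1)=j | X(0)=i}. Rows with zero mass are returned as NaN."""
    f = np.asarray(f, dtype=float)
    row_sums = f.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(row_sums > 0, f / row_sums, np.nan)

f = np.array([[0.3, 0.3], [0.0, 0.4]])   # illustrative joint distribution
P = conditional_given_first(f)
```

Each row of `P` is itself a probability distribution over the values of $X(1)$.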

CHAPTER

NINE

LLN AND CLT

Contents

• LLN and CLT


– Overview
– Relationships
– LLN
– CLT
– Exercises

9.1 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The law of large numbers (LLN)
and the central limit theorem (CLT).
These beautiful theorems lie behind many of the most fundamental results in econometrics and quantitative economic
modeling.
The lecture is based around simulations that show the LLN and CLT in action.
We also demonstrate how the LLN and CLT break down when the assumptions they are based on do not hold.
In addition, we examine several useful extensions of the classical theorems, such as
• The delta method, for smooth functions of random variables, and
• the multivariate case.
Some of these extensions are presented as exercises.
We’ll need the following imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import random
import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, uniform
from scipy.stats import gaussian_kde, poisson, binom, norm, chi2
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
from scipy.linalg import inv, sqrtm

9.2 Relationships

The CLT refines the LLN.


The LLN gives conditions under which sample moments converge to population moments as sample size increases.
The CLT provides information about the rate at which sample moments converge to population moments as sample size
increases.

9.3 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to their population means.

9.3.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID) random variables.
Here is the strongest version of the classical LLN, known as Kolmogorov’s strong law.
Let 𝑋1 , … , 𝑋𝑛 be independent and identically distributed scalar random variables, with common distribution 𝐹 .
When it exists, let 𝜇 denote the common mean of this sample:

𝜇 ∶= 𝔼𝑋 = ∫ 𝑥𝐹 (𝑑𝑥)

In addition, let

$$
\bar X_n := \frac{1}{n} \sum_{i=1}^n X_i
$$

Kolmogorov’s strong law states that, if 𝔼|𝑋| is finite, then

ℙ {𝑋̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (9.1)

What does this last expression mean?


Let’s think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random
samples (which of course it can’t).
Let’s also imagine that we can generate infinite sequences so that the statement 𝑋̄ 𝑛 → 𝜇 can be evaluated.
In this setting, (9.1) should be interpreted as meaning that the probability of the computer producing a sequence where
𝑋̄ 𝑛 → 𝜇 fails to occur is zero.


9.3.2 Proof

The proof of Kolmogorov’s strong law is nontrivial – see, for example, theorem 8.3.5 of [Dudley, 2002].
On the other hand, we can prove a weaker version of the LLN very easily and still get most of the intuition.
The version we prove is as follows: If $X_1, \ldots, X_n$ is IID with $\mathbb E X_i^2 < \infty$, then, for any $\epsilon > 0$, we have

$$
\mathbb P \left\{ | \bar X_n - \mu | \geq \epsilon \right\} \to 0
\quad \text{as} \quad n \to \infty \tag{9.2}
$$

(This version is weaker because we claim only convergence in probability rather than almost sure convergence, and assume a finite second moment.)
To see that this is so, fix $\epsilon > 0$, and let $\sigma^2$ be the variance of each $X_i$.
Recall the Chebyshev inequality, which tells us that

$$
\mathbb P \left\{ | \bar X_n - \mu | \geq \epsilon \right\}
\leq \frac{\mathbb E [ (\bar X_n - \mu)^2 ]}{\epsilon^2} \tag{9.3}
$$
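Now observe that

$$
\begin{aligned}
\mathbb E [ (\bar X_n - \mu)^2 ]
    & = \mathbb E \left\{ \left[ \frac{1}{n} \sum_{i=1}^n (X_i - \mu) \right]^2 \right\} \\
    & = \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n \mathbb E (X_i - \mu)(X_j - \mu) \\
    & = \frac{1}{n^2} \sum_{i=1}^n \mathbb E (X_i - \mu)^2 \\
    & = \frac{\sigma^2}{n}
\end{aligned}
$$

Here the crucial step is at the third equality, which follows from independence.
Independence means that if $i \neq j$, then the covariance term $\mathbb E (X_i - \mu)(X_j - \mu)$ drops out.
As a result, $n^2 - n$ terms vanish, leading us to a final expression that goes to zero in $n$.
Combining our last result with (9.3), we come to the estimate

$$
\mathbb P \left\{ | \bar X_n - \mu | \geq \epsilon \right\}
\leq \frac{\sigma^2}{n \epsilon^2} \tag{9.4}
$$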
The claim in (9.2) is now clear.
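
The two facts used in this argument, that the variance of the sample mean is $\sigma^2 / n$ and that the tail probability sits below the bound in (9.4), can be checked numerically. The sketch below is only an illustration under assumptions of our own: Uniform(0, 1) draws, and arbitrary values for `n`, `reps` and `eps`.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
n, reps, eps = 100, 20_000, 0.1  # illustrative values

# Uniform(0, 1) draws: mean 1/2, variance 1/12
mu, sigma2 = 0.5, 1 / 12
X = rng.uniform(0, 1, size=(reps, n))
sample_means = X.mean(axis=1)

var_of_mean = sample_means.var()                       # compare with sigma2 / n
tail_freq = np.mean(np.abs(sample_means - mu) >= eps)  # left side of (9.4)
chebyshev_bound = sigma2 / (n * eps**2)                # right side of (9.4)
```

In runs like this the observed tail frequency is typically far below the Chebyshev bound, a reminder that the bound is valid for every distribution with finite variance but usually not tight.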
Of course, if the sequence 𝑋1 , … , 𝑋𝑛 is correlated, then the cross-product terms 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) are not necessarily
zero.
While this doesn’t mean that the same line of argument is impossible, it does mean that if we want a similar result then
the covariances should be “almost zero” for “most” of these terms.
In a long sequence, this would be true if, for example, 𝔼(𝑋𝑖 − 𝜇)(𝑋𝑗 − 𝜇) approached zero when the difference between
𝑖 and 𝑗 became large.
In other words, the LLN can still work if the sequence 𝑋1 , … , 𝑋𝑛 has a kind of “asymptotic independence”, in the sense
that correlation falls to zero as variables become further apart in the sequence.
This idea is very important in time series analysis, and we’ll come across it again soon enough.
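
As a rough illustration of this point, consider a stationary AR(1) process, in which the correlation between observations $j$ periods apart is $\rho^j$ and so decays to zero. The process, the value of `rho` and the sample length below are assumptions chosen by us for the sketch; they are not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(1)   # seed chosen only for reproducibility
rho, n = 0.9, 200_000            # illustrative values

# X_{t+1} = rho * X_t + W_{t+1} with W IID N(0, 1); the stationary mean is 0
W = rng.standard_normal(n)
X = np.empty(n)
X[0] = rng.standard_normal() / np.sqrt(1 - rho**2)  # draw from the stationary distribution
for t in range(n - 1):
    X[t + 1] = rho * X[t] + W[t + 1]

# Sample mean after 1, 2, ..., n observations
running_mean = np.cumsum(X) / np.arange(1, n + 1)
```

Although consecutive observations are strongly correlated, the running mean still settles near the stationary mean of zero, only more slowly than it would for IID draws with the same variance.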


9.3.3 Illustration

Let’s now illustrate the classical IID law of large numbers using simulation.
In particular, we aim to generate some sequences of IID random variables and plot the evolution of 𝑋̄ 𝑛 as 𝑛 increases.
Below is a figure that does just this.
It shows IID observations from three different distributions and plots 𝑋̄ 𝑛 against 𝑛 in each case.
The dots represent the underlying observations 𝑋𝑖 for 𝑖 = 1, … , 100.
In each of the three cases, convergence of 𝑋̄ 𝑛 to 𝜇 occurs as predicted

n = 100

# Arbitrary collection of distributions
distributions = {"student's t with 10 degrees of freedom": t(10),
                 "β(2, 2)": beta(2, 2),
                 "lognormal LN(0, 1/2)": lognorm(0.5),
                 "γ(5, 1/2)": gamma(5, scale=2),
                 "poisson(4)": poisson(4),
                 # scale = 1/λ; a bare positional argument would set loc instead
                 "exponential with λ = 1": expon(scale=1)}

# Create a figure and some axes
num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(10, 20))

# Set some plotting parameters to improve layout
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
               'bbox_to_anchor': bbox,
               'loc': 3,
               'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
    # Choose a randomly selected distribution
    name = random.choice(list(distributions.keys()))
    distribution = distributions.pop(name)

    # Generate n draws from the distribution
    data = distribution.rvs(n)

    # Compute sample mean at each n
    sample_mean = np.empty(n)
    for i in range(n):
        sample_mean[i] = np.mean(data[:i+1])

    # Plot
    ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
    axlabel = r'$\bar{X}_n$ for $X_i \sim$' + name
    ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
    m = distribution.mean()
    ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label=r'$\mu$')
    ax.vlines(list(range(n)), m, data, lw=0.2)
    ax.legend(**legend_args, fontsize=12)

plt.show()


The three distributions are chosen at random from a selection stored in the dictionary distributions.

9.4 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages
and population means.

9.4.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics.
In the classical IID setting, it tells us the following:
If the sequence $X_1, \ldots, X_n$ is IID, with common mean $\mu$ and common variance $\sigma^2 \in (0, \infty)$, then

$$
\sqrt{n} ( \bar X_n - \mu ) \stackrel{d}{\to} N(0, \sigma^2)
\quad \text{as} \quad n \to \infty \tag{9.5}
$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$.

9.4.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment, the simple operation of adding
independent copies always leads to a Gaussian curve.
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g.,
theorem 9.5.6 of [Dudley, 2002]).
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition.
In fact, all of the proofs of the CLT that we know are similar in this respect.
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli random variables.
In particular, let $X_i$ be binary, with $\mathbb P\{X_i = 0\} = \mathbb P\{X_i = 1\} = 0.5$, and let $X_1, \ldots, X_n$ be independent.
Think of $X_i = 1$ as a “success”, so that $Y_n = \sum_{i=1}^n X_i$ is the number of successes in $n$ trials.
The next figure plots the probability mass function of 𝑌𝑛 for 𝑛 = 1, 2, 4, 8

fig, axes = plt.subplots(2, 2, figsize=(10, 6))
plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):
    b = binom(n, 0.5)
    ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
    ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
           xticks=list(range(9)), yticks=(0, 0.2, 0.4),
           title=f'$n = {n}$')

plt.show()


When 𝑛 = 1, the distribution is flat — one success or no successes have the same probability.
When 𝑛 = 2 we can either have 0, 1 or 2 successes.
Notice the peak in probability mass at the mid-point 𝑘 = 1.
The reason is that there are more ways to get 1 success (“fail then succeed” or “succeed then fail”) than to get zero or two
successes.
Moreover, the two trials are independent, so the outcomes “fail then succeed” and “succeed then fail” are just as likely as
the outcomes “fail then fail” and “succeed then succeed”.
(If there were positive correlation, say, then “succeed then fail” would be less likely than “succeed then succeed”.)
Here, already we have the essence of the CLT: addition under independence leads probability mass to pile up in the middle
and thin out at the tails.
For 𝑛 = 4 and 𝑛 = 8 we again get a peak at the “middle” value (halfway between the minimum and the maximum
possible value).
The intuition is the same — there are simply more ways to get these middle outcomes.
If we continue, the bell-shaped curve becomes even more pronounced.
We are witnessing the binomial approximation of the normal distribution.
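
To put a number on how close the two shapes are, one can compare the probability mass function of $Y_n$ with the normal density that has the same mean $n/2$ and variance $n/4$. The sketch below uses only the standard library; the choice $n = 64$ and the helper names are ours.

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n, p=0.5):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    return exp(-0.5 * ((x - mean) / sd)**2) / (sd * sqrt(2 * pi))

n, p = 64, 0.5                              # illustrative choice of n
mean, sd = n * p, sqrt(n * p * (1 - p))     # mean and std of Y_n

# Largest gap between the pmf of Y_n and the matching normal density
max_gap = max(abs(binom_pmf(k, n, p) - normal_pdf(k, mean, sd))
              for k in range(n + 1))
```

Repeating the calculation for larger `n` is one way to watch the gap shrink.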


9.4.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one good way to build intuition.
To this end, we now perform the following simulation
1. Choose an arbitrary distribution $F$ for the underlying observations $X_i$.
2. Generate independent draws of $Y_n := \sqrt{n} (\bar X_n - \mu)$.
3. Use these draws to compute some measure of their distribution, such as a histogram.
4. Compare the latter to $N(0, \sigma^2)$.
Here’s some code that does exactly this for the exponential distribution $F(x) = 1 - e^{-\lambda x}$.
(Please experiment with other choices of 𝐹 , but remember that, to conform with the conditions of the CLT, the distribution
must have a finite second moment.)

# Set parameters
n = 250 # Choice of n
k = 100000 # Number of draws of Y_n
distribution = expon(scale=2) # Exponential distribution, λ = 1/2 (scale = 1/λ)
μ, s = distribution.mean(), distribution.std()

# Draw underlying RVs. Each row contains a draw of X_1,..,X_n


data = distribution.rvs((k, n))
# Compute mean of each row, producing k draws of \bar X_n
sample_means = data.mean(axis=1)
# Generate observations of Y_n
Y = np.sqrt(n) * (sample_means - μ)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label=r'$N(0, \sigma^2)$')
ax.legend()

plt.show()


Notice the absence of for loops — every operation is vectorized, meaning that the major calculations are all shifted to
highly optimized C code.
The fit to the normal density is already tight and can be further improved by increasing n.
You can also experiment with other specifications of 𝐹 .
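
One way to experiment without relying on a visual comparison is to check a single summary number: under $N(0, \sigma^2)$, about 95% of the mass lies within $1.96 \sigma$ of zero. The sketch below does this for a gamma distribution; the distribution, its parameters and the sample sizes are assumptions of ours, picked only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)   # seed chosen only for reproducibility
n, k = 250, 10_000               # illustrative sizes

shape, scale = 2.0, 1.5          # hypothetical Gamma parameters
mu = shape * scale               # mean of Gamma(shape, scale)
sigma = np.sqrt(shape) * scale   # standard deviation of Gamma(shape, scale)

# Each row holds one sample X_1, ..., X_n; each row yields one draw of Y_n
data = rng.gamma(shape, scale, size=(k, n))
Y = np.sqrt(n) * (data.mean(axis=1) - mu)

# Fraction of draws of Y_n within 1.96 standard deviations of zero
coverage = np.mean(np.abs(Y) <= 1.96 * sigma)
```

A value of `coverage` close to 0.95 is consistent with the normal approximation; a distribution without a finite second moment would not be expected to pass this check.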

9.4.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of $Y_n := \sqrt{n} (\bar X_n - \mu)$ as $n$ increases.
In the simulation, we’ll be working with random variables having $\mu = 0$.
Thus, when $n = 1$, we have $Y_1 = X_1$, so the first distribution is just the distribution of the underlying random variable.
For $n = 2$, the distribution of $Y_2$ is that of $(X_1 + X_2) / \sqrt{2}$, and so on.
What we expect is that, regardless of the distribution of the underlying random variable, the distribution of 𝑌𝑛 will smooth
out into a bell-shaped curve.
The next figure shows this process for 𝑋𝑖 ∼ 𝑓, where 𝑓 was specified as the convex combination of three different beta
densities.
(Taking a convex combination is an easy way to produce an irregular shape for 𝑓.)
In the figure, the closest density is that of 𝑌1 , while the furthest is that of 𝑌5

beta_dist = beta(2, 2)

def gen_x_draws(k):
    """
    Returns a flat array containing k independent draws from the
    distribution of X, the underlying random variable. This distribution
    is itself a convex combination of three beta distributions.
    """
    bdraws = beta_dist.rvs((3, k))
    # Transform rows, so each represents a different distribution
    bdraws[0, :] -= 0.5
    bdraws[1, :] += 0.6
    bdraws[2, :] -= 1.1
    # Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2}
    # (the upper bound of randint is exclusive, so it must be 3)
    js = np.random.randint(0, 3, size=k)
    X = bdraws[js, np.arange(k)]
    # Rescale, so that the random variable has zero mean and unit variance
    m, sigma = X.mean(), X.std()
    return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# Form a matrix Z such that each column is reps independent draws of X
Z = np.empty((reps, nmax))
for i in range(nmax):
    Z[:, i] = gen_x_draws(reps)
# Take cumulative sum across columns
S = Z.cumsum(axis=1)
# Divide j-th column by sqrt j
Y = (1 / np.sqrt(ns)) * S

# Plot
ax = plt.figure(figsize = (10, 6)).add_subplot(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# Build verts
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
    density = gaussian_kde(Y[:, n-1])
    ys = density(xs)
    verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])
poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',
       xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
       zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
# Set the viewing angle: elevation 30 degrees, azimuth 45 degrees
ax.view_init(30, 45)
plt.show()


As expected, the distribution smooths out into a bell curve as 𝑛 increases.


We leave you to investigate the code above if you wish to know more.
If you run the file from the ordinary IPython shell, the figure should pop up in a window that you can rotate with your
mouse, giving different views on the density sequence.

9.4.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional settings.
To state the results, let’s recall some elementary facts about random vectors.
A random vector X is just a sequence of 𝑘 random variables (𝑋1 , … , 𝑋𝑘 ).
Each realization of X is an element of ℝ𝑘 .
A collection of random vectors X1 , … , X𝑛 is called independent if, given any 𝑛 vectors x1 , … , x𝑛 in ℝ𝑘 , we have

ℙ{X1 ≤ x1 , … , X𝑛 ≤ x𝑛 } = ℙ{X1 ≤ x1 } × ⋯ × ℙ{X𝑛 ≤ x𝑛 }

(The vector inequality X ≤ x means that 𝑋𝑗 ≤ 𝑥𝑗 for 𝑗 = 1, … , 𝑘)


Let $\mu_j := \mathbb E [X_j]$ for all $j = 1, \ldots, k$.
The expectation $\mathbb E [\mathbf X]$ of $\mathbf X$ is defined to be the vector of expectations:

$$
\mathbb E [\mathbf X]
:=
\begin{pmatrix}
    \mathbb E [X_1] \\
    \mathbb E [X_2] \\
    \vdots \\
    \mathbb E [X_k]
\end{pmatrix}
=
\begin{pmatrix}
    \mu_1 \\
    \mu_2 \\
    \vdots \\
    \mu_k
\end{pmatrix}
=: \boldsymbol \mu
$$

The variance-covariance matrix of random vector X is defined as

Var[X] ∶= 𝔼[(X − 𝜇)(X − 𝜇)′ ]

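Expanding this out, we get

$$
\mathop{\mathrm{Var}}[\mathbf X] =
\begin{pmatrix}
    \mathbb E [(X_1 - \mu_1)(X_1 - \mu_1)] & \cdots & \mathbb E [(X_1 - \mu_1)(X_k - \mu_k)] \\
    \mathbb E [(X_2 - \mu_2)(X_1 - \mu_1)] & \cdots & \mathbb E [(X_2 - \mu_2)(X_k - \mu_k)] \\
    \vdots & \vdots & \vdots \\
    \mathbb E [(X_k - \mu_k)(X_1 - \mu_1)] & \cdots & \mathbb E [(X_k - \mu_k)(X_k - \mu_k)]
\end{pmatrix}
$$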

The 𝑗, 𝑘-th term is the scalar covariance between 𝑋𝑗 and 𝑋𝑘 .


With this notation, we can proceed to the multivariate LLN and CLT.
Let X1 , … , X𝑛 be a sequence of independent and identically distributed random vectors, each one taking values in ℝ𝑘 .
Let 𝜇 be the vector 𝔼[X𝑖 ], and let Σ be the variance-covariance matrix of X𝑖 .
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

$$
\bar{\mathbf X}_n := \frac{1}{n} \sum_{i=1}^n \mathbf X_i
$$

In this setting, the LLN tells us that

ℙ {X̄ 𝑛 → 𝜇 as 𝑛 → ∞} = 1 (9.6)

Here X̄ 𝑛 → 𝜇 means that ‖X̄ 𝑛 − 𝜇‖ → 0, where ‖ ⋅ ‖ is the standard Euclidean norm.


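The CLT tells us that, provided $\Sigma$ is finite,

$$
\sqrt{n} ( \bar{\mathbf X}_n - \boldsymbol \mu ) \stackrel{d}{\to} N(\mathbf 0, \Sigma)
\quad \text{as} \quad n \to \infty \tag{9.7}
$$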
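
A quick numerical check of the multivariate statements is to simulate many draws of $\sqrt{n} ( \bar{\mathbf X}_n - \boldsymbol \mu )$ and compare their sample covariance matrix with $\Sigma$. The sketch below is built on assumptions of our own: each $\mathbf X_i$ is a linear mix `A @ e` of two independent Uniform(0, 1) shocks, with a hypothetical matrix `A`, so that $\boldsymbol \mu = A (0.5, 0.5)'$ and $\Sigma = A A' / 12$.

```python
import numpy as np

rng = np.random.default_rng(2)           # seed chosen only for reproducibility
n, reps = 200, 5_000                     # illustrative sizes

A = np.array([[1.0, 0.0], [0.5, 2.0]])   # hypothetical mixing matrix
mu = A @ np.array([0.5, 0.5])            # mean vector of X_i
Sigma = A @ A.T / 12                     # variance-covariance matrix of X_i

# shocks[r, i, :] holds the two uniform shocks behind observation i in replication r
shocks = rng.uniform(0, 1, size=(reps, n, 2))
X = shocks @ A.T                         # shape (reps, n, 2)

# One draw of sqrt(n) * (sample mean - mu) per replication
Y = np.sqrt(n) * (X.mean(axis=1) - mu)

Sigma_hat = np.cov(Y, rowvar=False)      # compare with Sigma
```

The grand mean `X.mean(axis=(0, 1))` should also sit close to `mu`, in line with (9.6).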

9.5 Exercises

Exercise 9.5.1
One very useful consequence of the central limit theorem is as follows.
Assume the conditions of the CLT as stated above.
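If $g \colon \mathbb R \to \mathbb R$ is differentiable at $\mu$ and $g'(\mu) \neq 0$, then

$$
\sqrt{n} \{ g(\bar X_n) - g(\mu) \} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2)
\quad \text{as} \quad n \to \infty \tag{9.8}
$$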

This theorem is used frequently in statistics to obtain the asymptotic distribution of estimators — many of which can be
expressed as functions of sample means.
(These kinds of results are often said to use the “delta method”.)


The proof is based on a Taylor expansion of 𝑔 around the point 𝜇.


Taking the result as given, let the distribution 𝐹 of each 𝑋𝑖 be uniform on [0, 𝜋/2] and let 𝑔(𝑥) = sin(𝑥).

Derive the asymptotic distribution of $\sqrt{n} \{ g(\bar X_n) - g(\mu) \}$ and illustrate convergence in the same spirit as the program discussed above.
What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
What is the source of the problem?

Solution to Exercise 9.5.1


Here is one solution

"""
Illustrates the delta method, a consequence of the central limit theorem.
"""

# Set parameters
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# Generate obs of sqrt{n} (g(X_n) - g(μ))


data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1) # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))

# Plot
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = r"$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()
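
The asymptotic distribution plotted by this code can also be worked out by hand. The short derivation below uses only (9.8) together with the standard mean and variance of a uniform distribution on $[a, b]$, namely $(a + b)/2$ and $(b - a)^2 / 12$.

$$
\begin{aligned}
\mu & = \frac{\pi}{4},
\qquad
\sigma^2 = \frac{(\pi / 2)^2}{12} = \frac{\pi^2}{48},
\qquad
g'(\mu) = \cos \left( \frac{\pi}{4} \right) = \frac{1}{\sqrt{2}} \\
g'(\mu)^2 \sigma^2 & = \frac{1}{2} \cdot \frac{\pi^2}{48} = \frac{\pi^2}{96} \approx 0.103 \\
\sqrt{n} \left\{ \sin(\bar X_n) - \sin \left( \frac{\pi}{4} \right) \right\}
    & \stackrel{d}{\to} N \left( 0, \frac{\pi^2}{96} \right)
\end{aligned}
$$

This matches the value of `asymptotic_sd` in the code, which equals $\pi / \sqrt{96} \approx 0.32$.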


What happens when you replace [0, 𝜋/2] with [0, 𝜋]?
In this case, the mean 𝜇 of this distribution is 𝜋/2, and since 𝑔′ = cos, we have 𝑔′ (𝜇) = 0.
Hence the conditions of the delta theorem are not satisfied.
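
A small simulation can make the failure visible. With $g'(\mu) = 0$, the formula in (9.8) would give a variance of zero, and the draws of $\sqrt{n} \{ g(\bar X_n) - g(\mu) \}$ do in fact collapse toward zero as $n$ grows instead of settling into a normal shape. The sketch below is an illustration under our own choice of sample sizes and replication count; the helper name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)   # seed chosen only for reproducibility
reps = 2_000                     # illustrative number of replications

def scaled_errors(n):
    """Draws of sqrt(n) * (sin(sample mean) - sin(mu)) for Uniform(0, pi) data."""
    mu = np.pi / 2
    means = rng.uniform(0, np.pi, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (np.sin(means) - np.sin(mu))

errors_small = scaled_errors(100)
errors_large = scaled_errors(1_600)
sd_small, sd_large = errors_small.std(), errors_large.std()
```

Because $\sin$ attains its maximum at $\pi / 2$, every draw is nonpositive, so the distribution is also one-sided, which a centered normal cannot be.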

Exercise 9.5.2
Here’s a result that’s often used in developing statistical tests, and is connected to the multivariate central limit theorem.
If you study econometric theory, you will see this result used again and again.
Assume the setting of the multivariate CLT discussed above, so that
1. X1 , … , X𝑛 is a sequence of IID random vectors, each taking values in ℝ𝑘 .
2. 𝜇 ∶= 𝔼[X𝑖 ], and Σ is the variance-covariance matrix of X𝑖 .
3. The convergence

$$
\sqrt{n} ( \bar{\mathbf X}_n - \boldsymbol \mu ) \stackrel{d}{\to} N(\mathbf 0, \Sigma) \tag{9.9}
$$

is valid.
In a statistical setting, one often wants the right-hand side to be standard normal so that confidence intervals are easily
computed.
This normalization can be achieved on the basis of three observations.
First, if X is a random vector in ℝ𝑘 and A is constant and 𝑘 × 𝑘, then

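$$
\mathop{\mathrm{Var}}[\mathbf A \mathbf X] = \mathbf A \mathop{\mathrm{Var}}[\mathbf X] \mathbf A'
$$

Second, by the continuous mapping theorem, if $\mathbf Z_n \stackrel{d}{\to} \mathbf Z$ in $\mathbb R^k$ and $\mathbf A$ is constant and $k \times k$, then

$$
\mathbf A \mathbf Z_n \stackrel{d}{\to} \mathbf A \mathbf Z
$$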

Third, if S is a 𝑘 × 𝑘 symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called
the inverse square root of S, such that

QSQ′ = I

Here I is the 𝑘 × 𝑘 identity matrix.
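
One concrete way to build such a matrix is from the eigendecomposition $\mathbf S = \mathbf V \operatorname{diag}(w) \mathbf V'$, taking $\mathbf Q = \mathbf V \operatorname{diag}(w^{-1/2}) \mathbf V'$. The sketch below uses `numpy.linalg.eigh` for this; the function name `inv_sqrt` and the example matrix are ours, and this is an alternative to, not a replacement for, the `scipy.linalg` route mentioned in the hint further on.

```python
import numpy as np

def inv_sqrt(S):
    """Symmetric inverse square root of a symmetric positive definite matrix S,
    built from the eigendecomposition S = V diag(w) V'."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1 / np.sqrt(w)) @ V.T

S = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical example matrix
Q = inv_sqrt(S)

# Q S Q' should reproduce the identity matrix up to rounding error
check = Q @ S @ Q.T
```

The construction requires all eigenvalues in `w` to be strictly positive, which is exactly the positive definiteness assumption.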


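Putting these things together, your first exercise is to show that if $\mathbf Q$ is the inverse square root of $\Sigma$, then

$$
\mathbf Z_n := \sqrt{n} \mathbf Q ( \bar{\mathbf X}_n - \boldsymbol \mu )
\stackrel{d}{\to} \mathbf Z \sim N(\mathbf 0, \mathbf I)
$$

Applying the continuous mapping theorem one more time tells us that

$$
\| \mathbf Z_n \|^2 \stackrel{d}{\to} \| \mathbf Z \|^2
$$

Given the distribution of $\mathbf Z$, we conclude that

$$
n \| \mathbf Q ( \bar{\mathbf X}_n - \boldsymbol \mu ) \|^2 \stackrel{d}{\to} \chi^2(k) \tag{9.10}
$$

where $\chi^2(k)$ is the chi-squared distribution with $k$ degrees of freedom.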


(Recall that 𝑘 is the dimension of X𝑖 , the underlying random vectors.)
Your second exercise is to illustrate the convergence in (9.10) with a simulation.
In doing so, let

$$
\mathbf X_i :=
\begin{pmatrix}
    W_i \\
    U_i + W_i
\end{pmatrix}
$$

where
• each 𝑊𝑖 is an IID draw from the uniform distribution on [−1, 1].
• each 𝑈𝑖 is an IID draw from the uniform distribution on [−2, 2].
• 𝑈𝑖 and 𝑊𝑖 are independent of each other.

Hint:
1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it.
2. You should be able to work out Σ from the preceding information.

Solution to Exercise 9.5.2


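First we want to verify the claim that

$$
\sqrt{n} \mathbf Q ( \bar{\mathbf X}_n - \boldsymbol \mu ) \stackrel{d}{\to} N(\mathbf 0, \mathbf I)
$$

This is straightforward given the facts presented in the exercise.
Let

$$
\mathbf Y_n := \sqrt{n} ( \bar{\mathbf X}_n - \boldsymbol \mu )
\quad \text{and} \quad
\mathbf Y \sim N(\mathbf 0, \Sigma)
$$

By the multivariate CLT and the continuous mapping theorem, we have

$$
\mathbf Q \mathbf Y_n \stackrel{d}{\to} \mathbf Q \mathbf Y
$$

Since linear combinations of normal random variables are normal, the vector $\mathbf Q \mathbf Y$ is also normal.
Its mean is clearly $\mathbf 0$, and its variance-covariance matrix is

$$
\mathop{\mathrm{Var}}[\mathbf Q \mathbf Y]
= \mathbf Q \mathop{\mathrm{Var}}[\mathbf Y] \mathbf Q'
= \mathbf Q \Sigma \mathbf Q'
= \mathbf I
$$

In conclusion, $\mathbf Q \mathbf Y_n \stackrel{d}{\to} \mathbf Q \mathbf Y \sim N(\mathbf 0, \mathbf I)$, which is what we aimed to show.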
Now we turn to the simulation exercise.
Our solution is as follows

# Set parameters
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2)  # Uniform(-1, 1)
du = uniform(loc=-2, scale=4)  # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = ((vw, vw), (vw, vw + vu))
Σ = np.array(Σ)

# Compute Σ^{-1/2}
Q = inv(sqrtm(Σ))

# Generate observations of the normalized sample mean
error_obs = np.empty((2, replications))
for i in range(replications):
    # Generate one sequence of bivariate shocks
    X = np.empty((2, n))
    W = dw.rvs(n)
    U = du.rvs(n)
    # Construct the n observations of the random vector
    X[0, :] = W
    X[1, :] = W + U
    # Construct the i-th observation of Y_n
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# Premultiply by Q and then take the squared norm
temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# Plot
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()


CHAPTER

TEN

TWO MEANINGS OF PROBABILITY

10.1 Overview

This lecture illustrates two distinct interpretations of a probability distribution


• A frequentist interpretation as relative frequencies anticipated to occur in a large i.i.d. sample
• A Bayesian interpretation as a personal opinion (about a parameter or list of parameters) after seeing a collection
of observations
We recommend watching this video about hypothesis testing within the frequentist approach

https://youtu.be/8JIe_cz6qGA

After you watch that video, please watch the following video on the Bayesian approach to constructing coverage intervals

https://youtu.be/Pahyv9i_X2k

After you are familiar with the material in these videos, this lecture uses the Socratic method to help consolidate your
understanding of the different questions that are answered by
• a frequentist confidence interval
• a Bayesian coverage interval
We do this by inviting you to write some Python code.
It would be especially useful if you tried doing this after each question that we pose for you, before proceeding to read
the rest of the lecture.
We provide our own answers as the lecture unfolds, but you’ll learn more if you try writing your own code before reading
and running ours.
Code for answering questions:
In addition to what’s in Anaconda, this lecture will deploy the following library:

pip install prettytable

To answer our coding questions, we’ll start with some imports

import numpy as np
import pandas as pd
import prettytable as pt
import matplotlib.pyplot as plt
from scipy.stats import binom
import scipy.stats as st


Empowered with these Python tools, we’ll now explore the two meanings described above.

10.2 Frequentist Interpretation

Consider the following classic example.


The random variable $X$ takes on possible values $k = 0, 1, 2, \ldots, n$ with probabilities

$$
\textrm{Prob}(X = k | \theta) = \left( \frac{n!}{k! (n-k)!} \right) \theta^k (1 - \theta)^{n-k}
$$

where the fixed parameter 𝜃 ∈ (0, 1).


This is called the binomial distribution.
Here
• 𝜃 is the probability that one toss of a coin will be a head, an outcome that we encode as 𝑌 = 1.
• 1 − 𝜃 is the probability that one toss of the coin will be a tail, an outcome that we denote 𝑌 = 0.
• 𝑋 is the total number of heads that came up after flipping the coin 𝑛 times.
Consider the following experiment:
Take 𝐼 independent sequences of 𝑛 independent flips of the coin
Notice the repeated use of the adjective independent:
• we use it once to describe that we are drawing 𝑛 independent times from a Bernoulli distribution with parameter
𝜃 to arrive at one draw from a Binomial distribution with parameters 𝜃, 𝑛.
• we use it again to describe that we are then drawing 𝐼 sequences of 𝑛 coin draws.
Let $y_h^i \in \{0, 1\}$ be the realized value of $Y$ on the $h$th flip during the $i$th sequence of flips.
Let $\sum_{h=1}^n y_h^i$ denote the total number of times heads come up during the $i$th sequence of $n$ independent coin flips.
Let $f_k^I$ record the fraction of samples of length $n$ for which $\sum_{h=1}^n y_h^i = k$:

$$
f_k^I = \frac{\text{number of samples of length } n \text{ for which } \sum_{h=1}^n y_h^i = k}{I}
$$
The probability Prob(𝑋 = 𝑘|𝜃) answers the following question:
• As 𝐼 becomes large, in what fraction of 𝐼 independent draws of 𝑛 coin flips should we anticipate 𝑘 heads to occur?
As usual, a law of large numbers justifies this answer.

Exercise 10.2.1
1. Please write a Python class to compute 𝑓𝑘𝐼
2. Please use your code to compute 𝑓𝑘𝐼 , 𝑘 = 0, … , 𝑛 and compare them to Prob(𝑋 = 𝑘|𝜃) for various values of 𝜃, 𝑛
and 𝐼
3. With the Law of Large numbers in mind, use your code to say something

Solution to Exercise 10.2.1


Here is one solution:


class frequentist:

    def __init__(self, θ, n, I):
        '''
        initialization
        -----------------
        parameters:
        θ : probability that one toss of a coin will be a head with Y = 1
        n : number of independent flips in each independent sequence of draws
        I : number of independent sequence of draws

        '''
        self.θ, self.n, self.I = θ, n, I

    def binomial(self, k):
        '''compute the theoretical probability for specific input k'''
        θ, n = self.θ, self.n
        self.k = k
        self.P = binom.pmf(k, n, θ)

    def draw(self):
        '''draw n independent flips for I independent sequences'''
        θ, n, I = self.θ, self.n, self.I
        sample = np.random.rand(I, n)
        Y = (sample <= θ) * 1
        self.Y = Y

    def compute_fk(self, kk):
        '''compute f_{k}^I for specific input k'''
        Y, I = self.Y, self.I
        K = np.sum(Y, 1)
        f_kI = np.sum(K == kk) / I
        self.f_kI = f_kI
        self.kk = kk

    def compare(self):
        '''compute and print the comparison'''
        n = self.n
        comp = pt.PrettyTable()
        comp.field_names = ['k', 'Theoretical', 'Frequentist']
        self.draw()
        for i in range(n):
            self.binomial(i+1)
            self.compute_fk(i+1)
            comp.add_row([i+1, self.P, self.f_kI])
        print(comp)


θ, n, k, I = 0.7, 20, 10, 1_000_000

freq = frequentist(θ, n, I)

freq.compare()

+----+------------------------+-------------+
| k | Theoretical | Frequentist |
+----+------------------------+-------------+
| 1 | 1.6271660538000033e-09 | 0.0 |
| 2 | 3.606884752589999e-08 | 0.0 |
| 3 | 5.04963865362601e-07 | 2e-06 |
| 4 | 5.007558331512455e-06 | 3e-06 |
| 5 | 3.7389768875293014e-05 | 4.9e-05 |
| 6 | 0.00021810698510587546 | 0.000211 |
| 7 | 0.001017832597160754 | 0.001035 |
| 8 | 0.003859281930901185 | 0.003907 |
| 9 | 0.012006654896137007 | 0.011892 |
| 10 | 0.030817080900085007 | 0.03103 |
| 11 | 0.06536956554563476 | 0.065302 |
| 12 | 0.11439673970486108 | 0.11459 |
| 13 | 0.1642619852172365 | 0.164278 |
| 14 | 0.19163898275344252 | 0.191064 |
| 15 | 0.17886305056987967 | 0.179323 |
| 16 | 0.1304209743738704 | 0.130184 |
| 17 | 0.07160367220526209 | 0.071683 |
| 18 | 0.027845872524268643 | 0.027709 |
| 19 | 0.006839337111223895 | 0.006971 |
| 20 | 0.0007979226629761189 | 0.000767 |
+----+------------------------+-------------+

From the table above, can you see the law of large numbers at work?

Let’s do some more calculations.


Comparison with different 𝜃
Now we fix

𝑛 = 20, 𝑘 = 10, 𝐼 = 1,000,000

We’ll vary 𝜃 from 0.01 to 0.99 and plot outcomes against 𝜃.

θ_low, θ_high, npt = 0.01, 0.99, 50
thetas = np.linspace(θ_low, θ_high, npt)
P = []
f_kI = []
for i in range(npt):
    freq = frequentist(thetas[i], n, I)
    freq.binomial(k)
    freq.draw()
    freq.compute_fk(k)
    P.append(freq.P)
    f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))
ax.grid()
ax.plot(thetas, P, 'k-.', label='Theoretical')
ax.plot(thetas, f_kI, 'r--', label='Fraction')
plt.title(r'Comparison with different $\theta$', fontsize=16)
plt.xlabel(r'$\theta$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

Comparison with different 𝑛


Now we fix 𝜃 = 0.7, 𝑘 = 10, 𝐼 = 1,000,000 and vary 𝑛 from 1 to 100.
Then we’ll plot outcomes.

n_low, n_high, nn = 1, 100, 50
ns = np.linspace(n_low, n_high, nn, dtype='int')
P = []
f_kI = []
for i in range(nn):
    freq = frequentist(θ, ns[i], I)
    freq.binomial(k)
    freq.draw()
    freq.compute_fk(k)
    P.append(freq.P)
    f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))
ax.grid()
ax.plot(ns, P, 'k-.', label='Theoretical')
ax.plot(ns, f_kI, 'r--', label='Frequentist')
plt.title(r'Comparison with different $n$', fontsize=16)
plt.xlabel(r'$n$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

Comparison with different 𝐼


Now we fix 𝜃 = 0.7, 𝑛 = 20, 𝑘 = 10 and vary log₁₀(𝐼) from 2 to 6.

I_log_low, I_log_high, nI = 2, 6, 200
log_Is = np.linspace(I_log_low, I_log_high, nI)
Is = np.power(10, log_Is).astype(int)
P = []
f_kI = []
for i in range(nI):
    freq = frequentist(θ, n, Is[i])
    freq.binomial(k)
    freq.draw()
    freq.compute_fk(k)
    P.append(freq.P)
    f_kI.append(freq.f_kI)

fig, ax = plt.subplots(figsize=(8, 6))
ax.grid()
ax.plot(Is, P, 'k-.', label='Theoretical')
ax.plot(Is, f_kI, 'r--', label='Fraction')
plt.title(r'Comparison with different $I$', fontsize=16)
plt.xlabel(r'$I$', fontsize=15)
plt.ylabel('Fraction', fontsize=15)
plt.tick_params(labelsize=13)
plt.legend()
plt.show()

From the above graphs, we can see that 𝐼, the number of independent sequences, plays an important role.
When 𝐼 becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller.
Also, as long as 𝐼 is large enough, changing 𝜃 or 𝑛 does not substantially change the accuracy of the observed fraction as
an approximation of Prob(𝑋 = 𝑘|𝜃).
The Law of Large Numbers is at work here.
For each draw of an independent sequence, Prob(𝑋𝑖 = 𝑘|𝜃) is the same, so aggregating all draws forms an i.i.d. sequence
of a binary random variable 𝜌𝑘,𝑖 , 𝑖 = 1, 2, … , 𝐼, which equals 1 if 𝑋𝑖 = 𝑘 and 0 otherwise, with a mean of
Prob(𝑋 = 𝑘|𝜃) and a variance of

$$
\text{Prob}(X = k \mid \theta) \cdot (1 - \text{Prob}(X = k \mid \theta)).
$$

So, by the LLN, the average of 𝜌𝑘,𝑖 converges to

$$
E[\rho_{k,i}] = \text{Prob}(X = k \mid \theta) = \left( \frac{n!}{k!(n-k)!} \right) \theta^k (1-\theta)^{n-k}
$$

as 𝐼 goes to infinity.
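The convergence described above can be checked numerically. The following is a minimal sketch, not part of the solution class: it assumes the same values 𝜃 = 0.7, 𝑛 = 20, 𝑘 = 10, draws 𝑋 directly from a binomial distribution rather than summing individual flips, and uses a fixed seed only for reproducibility. Because the indicator has variance 𝑝(1 − 𝑝) with 𝑝 = Prob(𝑋 = 𝑘|𝜃), the standard deviation of the observed fraction should be about √(𝑝(1 − 𝑝)/𝐼).

```python
import numpy as np
from scipy.stats import binom

θ, n, k = 0.7, 20, 10            # same values as in the solution above
p = binom.pmf(k, n, θ)           # Prob(X = k | θ)

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
for I in (100, 10_000, 1_000_000):
    X = rng.binomial(n, θ, size=I)    # I independent draws of X
    ρ = (X == k)                      # indicator that the i-th draw equals k
    f_kI = ρ.mean()                   # observed fraction f_k^I
    se = np.sqrt(p * (1 - p) / I)     # standard deviation of f_k^I
    print(f"I={I:>9,d}  f_kI={f_kI:.5f}  p={p:.5f}  "
          f"|error|={abs(f_kI - p):.5f}  se={se:.5f}")
```

The absolute error should typically be of the same order as `se`, which shrinks like 1/√𝐼.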

10.3 Bayesian Interpretation

We again use a binomial distribution.


But now we don’t regard 𝜃 as being a fixed number.
Instead, we think of it as a random variable.
𝜃 is described by a probability distribution.
But now this probability distribution means something different than a relative frequency that we can anticipate to occur
in a large i.i.d. sample.
Instead, the probability distribution of 𝜃 is now a summary of our views about likely values of 𝜃 either
• before we have seen any data at all, or
• after we have seen some data but before we have seen more
Thus, suppose that, before seeing any data, you have a personal prior probability distribution saying that

$$
P(\theta) = \frac{\theta^{\alpha-1} (1-\theta)^{\beta-1}}{B(\alpha, \beta)}
$$

where 𝐵(𝛼, 𝛽) is a beta function, so that 𝑃 (𝜃) is a beta distribution with parameters 𝛼, 𝛽.

Exercise 10.3.1
a) Please write down the likelihood function for a sample of length 𝑛 from a binomial distribution with parameter 𝜃.
b) Please write down the posterior distribution for 𝜃 after observing one flip of the coin.
c) Now pretend that the true value of 𝜃 = .4 and that someone who doesn’t know this has a beta prior distribution with
parameters 𝛽 = 𝛼 = .5. Please write a Python class to simulate this person’s personal posterior distribution for 𝜃
for a single sequence of 𝑛 draws.
d) Please plot the posterior distribution for 𝜃 as a function of 𝜃 as 𝑛 grows as 1, 2, ….
e) For various 𝑛’s, please describe and compute a Bayesian coverage interval for the interval [.45, .55].
f) Please tell what question a Bayesian coverage interval answers.
g) Please compute the posterior probability that 𝜃 ∈ [.45, .55] for various values of sample size 𝑛.


h) Please use your Python class to study what happens to the posterior distribution as 𝑛 → +∞, again assuming that the
true value of 𝜃 = .4, though it is unknown to the person doing the updating via Bayes’ Law.

Solution to Exercise 10.3.1


a) Please write down the likelihood function and the posterior distribution for 𝜃 after observing one flip of our coin.
Suppose the outcome is Y.
The likelihood function is:

$$
L(Y \mid \theta) = \text{Prob}(X = Y \mid \theta) = \theta^Y (1-\theta)^{1-Y}
$$

b) Please write the posterior distribution for 𝜃 after observing one flip of our coin.
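The prior distribution is

$$
\text{Prob}(\theta) = \frac{\theta^{\alpha-1} (1-\theta)^{\beta-1}}{B(\alpha, \beta)}
$$

We can derive the posterior distribution for 𝜃 via

$$
\begin{aligned}
\text{Prob}(\theta \mid Y)
&= \frac{\text{Prob}(Y \mid \theta)\,\text{Prob}(\theta)}{\text{Prob}(Y)} \\
&= \frac{\text{Prob}(Y \mid \theta)\,\text{Prob}(\theta)}{\int_0^1 \text{Prob}(Y \mid \theta)\,\text{Prob}(\theta)\,d\theta} \\
&= \frac{\theta^Y (1-\theta)^{1-Y} \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}
        {\int_0^1 \theta^Y (1-\theta)^{1-Y} \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\,d\theta} \\
&= \frac{\theta^{Y+\alpha-1} (1-\theta)^{1-Y+\beta-1}}
        {\int_0^1 \theta^{Y+\alpha-1} (1-\theta)^{1-Y+\beta-1}\,d\theta}
\end{aligned}
$$

which means that

$$
\text{Prob}(\theta \mid Y) \sim \text{Beta}(\alpha + Y, \beta + (1 - Y))
$$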
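The one-flip update can be verified numerically. The sketch below is only an illustration under stated assumptions: the prior parameters 𝛼 = 2, 𝛽 = 3 and the outcome 𝑌 = 1 are arbitrary choices (not values from the exercise), and the denominator of Bayes’ Law is evaluated with `scipy.integrate.quad` instead of analytically. It compares the normalized product of likelihood and prior with the density of Beta(𝛼 + 𝑌, 𝛽 + 1 − 𝑌).

```python
import numpy as np
from scipy import stats as st
from scipy.integrate import quad

α, β = 2.0, 3.0      # illustrative prior parameters (an assumption)
Y = 1                # illustrative outcome of one flip (1 = head)

prior = st.beta(α, β)

def unnormalized(θ):
    """Likelihood times prior, the numerator of Bayes' Law."""
    return θ**Y * (1 - θ)**(1 - Y) * prior.pdf(θ)

# Prob(Y): integrate the numerator over θ in [0, 1]
evidence, _ = quad(unnormalized, 0, 1)

# closed-form posterior implied by conjugacy
posterior = st.beta(α + Y, β + (1 - Y))

θ_grid = np.linspace(0.05, 0.95, 19)
numeric = unnormalized(θ_grid) / evidence
max_gap = np.max(np.abs(numeric - posterior.pdf(θ_grid)))
print(f"Prob(Y) = {evidence:.6f}, largest gap = {max_gap:.2e}")
```

For 𝑌 = 1 the denominator equals the prior mean 𝛼/(𝛼 + 𝛽), and the gap between the two densities should be at the level of numerical integration error.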

c) Now pretend that the true value of 𝜃 = .4 and that someone who doesn’t know this has a beta prior distribution with
parameters 𝛽 = 𝛼 = .5. Please write a Python class to simulate this person’s personal posterior distribution for 𝜃
for a single sequence of 𝑛 draws.

class Bayesian:

    def __init__(self, θ=0.4, n=1_000_000, α=0.5, β=0.5):
        """
        Parameters:
        ----------
        θ : float, ranging from [0,1].
            probability that one toss of a coin will be a head with Y = 1

        n : int.
            number of independent flips in an independent sequence of draws

        α&β : int or float.
            parameters of the prior distribution on θ

        """
        self.θ, self.n, self.α, self.β = θ, n, α, β
        self.prior = st.beta(α, β)

    def draw(self):
        """
        simulate a single sequence of draws of length n, given probability θ

        """
        array = np.random.rand(self.n)
        self.draws = (array < self.θ).astype(int)

    def form_single_posterior(self, step_num):
        """
        form a posterior distribution after observing the first step_num elements of the draws

        Parameters
        ----------
        step_num: int.
            number of steps observed to form a posterior distribution

        Returns
        ------
        the posterior distribution for sake of plotting in the subsequent steps

        """
        heads_num = self.draws[:step_num].sum()
        tails_num = step_num - heads_num

        return st.beta(self.α+heads_num, self.β+tails_num)

    def form_posterior_series(self, num_obs_list):
        """
        form a series of posterior distributions that form after observing different number of draws.

        Parameters
        ----------
        num_obs_list: a list of int.
            a list of the number of observations used to form a series of posterior distributions.

        """
        self.posterior_list = []
        for num in num_obs_list:
            self.posterior_list.append(self.form_single_posterior(num))

d) Please plot the posterior distribution for 𝜃 as a function of 𝜃 as 𝑛 grows from 1, 2, ….

Bay_stat = Bayesian()
Bay_stat.draw()

num_list = [1, 2, 3, 4, 5, 10, 20, 30, 50, 70, 100, 300, 500, 1000,   # this line for finite n
            5000, 10_000, 50_000, 100_000, 200_000, 300_000]          # this line for approximately infinite n

Bay_stat.form_posterior_series(num_list)

θ_values = np.linspace(0.01, 1, 100)

fig, ax = plt.subplots(figsize=(10, 6))

ax.plot(θ_values, Bay_stat.prior.pdf(θ_values), label='Prior Distribution', color='k', linestyle='--')

for ii, num in enumerate(num_list[:14]):
    ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values), label='Posterior with n = %d' % num)

ax.set_title('P.D.F of Posterior Distributions', fontsize=15)
ax.set_xlabel(r"$\theta$", fontsize=15)

ax.legend(fontsize=11)
plt.show()

e) For various 𝑛’s, please describe and compute .05 and .95 quantiles for posterior probabilities.

# the .05 quantile is the lower end of the interval, the .95 quantile the upper end
lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list[:14]]
upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list[:14]]

interval_df = pd.DataFrame()
interval_df['lower'] = lower_bound
interval_df['upper'] = upper_bound
interval_df.index = num_list[:14]
interval_df = interval_df.T
interval_df

           1         2         3        4         5         10        20  \
lower  0.228520  0.097308  0.062413  0.16528  0.260634  0.347322  0.280091
upper  0.998457  0.902692  0.764466  0.83472  0.872224  0.814884  0.629953

             30        50        70       100       300       500      1000
lower  0.293487  0.329116  0.389167  0.418512  0.373839  0.391977  0.393532
upper  0.582293  0.555887  0.583119  0.581488  0.467296  0.464637  0.444813

As 𝑛 increases, we can see that Bayesian coverage intervals narrow and move toward 0.4.
f) Please tell what question a Bayesian coverage interval answers.
The Bayesian coverage interval tells the range of 𝜃 that corresponds to the [𝑝1 , 𝑝2 ] quantiles of the cumulative
distribution function (CDF) of the posterior distribution.
To construct the coverage interval we first compute a posterior distribution of the unknown parameter 𝜃.
If the CDF is 𝐹 (𝜃), then the Bayesian coverage interval [𝑎, 𝑏] for the interval [𝑝1 , 𝑝2 ] is described by

𝐹 (𝑎) = 𝑝1 , 𝐹 (𝑏) = 𝑝2
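As a minimal sketch of this definition, the snippet below computes a coverage interval for a beta posterior using the inverse CDF. The data (4 heads and 6 tails) are hypothetical and chosen only for illustration; the prior parameters follow the exercise. In `scipy.stats`, `ppf` is the inverse of `cdf`, so `ppf(p1)` and `ppf(p2)` return the 𝑎 and 𝑏 that solve 𝐹(𝑎) = 𝑝1 and 𝐹(𝑏) = 𝑝2.

```python
from scipy import stats as st

α, β = 0.5, 0.5          # prior parameters from the exercise
heads, tails = 4, 6      # hypothetical data, chosen only for illustration
p1, p2 = 0.05, 0.95

posterior = st.beta(α + heads, β + tails)

# ppf is the inverse CDF, so these solve F(a) = p1 and F(b) = p2
a, b = posterior.ppf(p1), posterior.ppf(p2)

print(f"coverage interval: [{a:.4f}, {b:.4f}]")
print(f"F(a) = {posterior.cdf(a):.4f}, F(b) = {posterior.cdf(b):.4f}")
print(f"posterior probability of [a, b]: {posterior.cdf(b) - posterior.cdf(a):.4f}")
```

By construction the interval carries posterior probability 𝑝2 − 𝑝1, here 0.9.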

g) Please compute the posterior probability that 𝜃 ∈ [.45, .55] for various values of sample size 𝑛.

left_value, right_value = 0.45, 0.55

posterior_prob_list=[ii.cdf(right_value)-ii.cdf(left_value) for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(posterior_prob_list)
ax.set_title('Posterior Probability that '+ r"$\theta$" +' Ranges from %.2f to %.2f'%(left_value, right_value),
             fontsize=13)
ax.set_xticks(np.arange(0, len(posterior_prob_list), 3))
ax.set_xticklabels(num_list[::3])
ax.set_xlabel('Number of Observations', fontsize=11)

plt.show()


Notice that in the graph above the posterior probability that 𝜃 ∈ [.45, .55] typically exhibits a hump shape as 𝑛 increases.
Two opposing forces are at work.
The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution
becomes more and more realistic, which explains the rise of the posterior probability.
However, [.45, .55] actually excludes the true 𝜃 = .4 that generates the data.
As a result, the posterior probability drops as larger and larger samples refine his posterior probability distribution of 𝜃.
The descent seems precipitous only because the horizontal axis of the graph is not evenly spaced in the number of
observations.
When the number of observations becomes large enough, our Bayesian becomes so confident about 𝜃 that he considers
𝜃 ∈ [.45, .55] very unlikely.
That is why we see a nearly horizontal line when the number of observations exceeds 500.
h) Please use your Python class to study what happens to the posterior distribution as 𝑛 → +∞, again assuming that the
true value of 𝜃 = .4, though it is unknown to the person doing the updating via Bayes’ Law.
Using the Python class we made above, we can see the evolution of posterior distributions as 𝑛 approaches infinity.

fig, ax = plt.subplots(figsize=(10, 6))

for ii, num in enumerate(num_list[14:]):
    ii += 14
    ax.plot(θ_values, Bay_stat.posterior_list[ii].pdf(θ_values),
            label='Posterior with n=%d thousand' % (num/1000))

ax.set_title('P.D.F of Posterior Distributions', fontsize=15)
ax.set_xlabel(r"$\theta$", fontsize=15)
ax.set_xlim(0.3, 0.5)

ax.legend(fontsize=11)
plt.show()

As 𝑛 increases, we can see that the probability density functions concentrate on 0.4, the true value of 𝜃.
Here the posterior mean converges to 0.4 while the posterior standard deviation converges to 0 from above.
To show this, we compute the means and standard deviations of the posterior distributions.

mean_list = [ii.mean() for ii in Bay_stat.posterior_list]
std_list = [ii.std() for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(1, 2, figsize=(14, 5))

ax[0].plot(mean_list)
ax[0].set_title('Mean Values of Posterior Distribution', fontsize=13)
ax[0].set_xticks(np.arange(0, len(mean_list), 3))
ax[0].set_xticklabels(num_list[::3])
ax[0].set_xlabel('Number of Observations', fontsize=11)

ax[1].plot(std_list)
ax[1].set_title('Standard Deviations of Posterior Distribution', fontsize=13)
ax[1].set_xticks(np.arange(0, len(std_list), 3))
ax[1].set_xticklabels(num_list[::3])
ax[1].set_xlabel('Number of Observations', fontsize=11)

plt.show()

How shall we interpret the patterns above?


The answer is encoded in the Bayesian updating formulas.
It is natural to extend the one-step Bayesian update to an 𝑛-step Bayesian update.

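$$
\begin{aligned}
\text{Prob}(\theta \mid k)
&= \frac{\text{Prob}(\theta, k)}{\text{Prob}(k)}
 = \frac{\text{Prob}(k \mid \theta)\,\text{Prob}(\theta)}{\text{Prob}(k)}
 = \frac{\text{Prob}(k \mid \theta)\,\text{Prob}(\theta)}{\int_0^1 \text{Prob}(k \mid \theta)\,\text{Prob}(\theta)\,d\theta} \\
&= \frac{\binom{N}{k} (1-\theta)^{N-k} \theta^k \, \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}}
        {\int_0^1 \binom{N}{k} (1-\theta)^{N-k} \theta^k \, \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha,\beta)}\,d\theta} \\
&= \frac{(1-\theta)^{\beta+N-k-1} \theta^{\alpha+k-1}}
        {\int_0^1 (1-\theta)^{\beta+N-k-1} \theta^{\alpha+k-1}\,d\theta} \\
&= \text{Beta}(\alpha + k, \beta + N - k)
\end{aligned}
$$

A beta distribution with 𝛼 and 𝛽 has the following mean and variance.
The mean is $\frac{\alpha}{\alpha+\beta}$
The variance is $\frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$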

• 𝛼 can be viewed as the number of successes


• 𝛽 can be viewed as the number of failures
The random variables 𝑘 and 𝑁 − 𝑘 are governed by a binomial distribution with 𝜃 = 0.4.
Call this the true data generating process.
According to the Law of Large Numbers, for a large number of observations, observed frequencies of 𝑘 and 𝑁 − 𝑘
will be described by the true data generating process, i.e., the population probability distribution that we assumed when
generating the observations on the computer. (See Exercise 10.2.1).
In particular, 𝑘/𝑁 converges to 0.4, so the posterior mean (𝛼 + 𝑘)/(𝛼 + 𝛽 + 𝑁) also converges to 0.4, while the
posterior variance shrinks at rate 1/𝑁.
Consequently, the mean of the posterior distribution converges to 0.4 and the variance withers to zero.
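The mean and variance formulas can be applied directly to Beta(𝛼 + 𝑘, 𝛽 + 𝑁 − 𝑘) to see this convergence. The following is a small sketch under stated assumptions: the helper `posterior_moments` is a hypothetical name introduced here, the numbers of heads are simulated with a fixed seed, and the prior parameters and true 𝜃 follow the exercise.

```python
import numpy as np

α, β, θ_true = 0.5, 0.5, 0.4     # prior parameters and true θ from the exercise

def posterior_moments(k, N, α=α, β=β):
    """Mean and variance of Beta(α + k, β + N - k)."""
    a, b = α + k, β + N - k
    mean = a / (a + b)
    var = a * b / ((a + b)**2 * (a + b + 1))
    return mean, var

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
for N in (10, 1_000, 100_000):
    k = rng.binomial(N, θ_true)  # number of heads in N flips
    mean, var = posterior_moments(k, N)
    print(f"N={N:>7,d}  k/N={k/N:.4f}  posterior mean={mean:.4f}  "
          f"posterior std={np.sqrt(var):.5f}")
```

As 𝑁 grows, the posterior mean tracks 𝑘/𝑁 and the posterior standard deviation falls roughly like √(0.4 × 0.6/𝑁).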


upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list]
lower_bound = [ii.ppf(0.05) for ii in Bay_stat.posterior_list]

fig, ax = plt.subplots(figsize=(10, 6))

ax.scatter(np.arange(len(upper_bound)), upper_bound, label='95 th Quantile')
ax.scatter(np.arange(len(lower_bound)), lower_bound, label='05 th Quantile')

ax.set_xticks(np.arange(0, len(upper_bound), 2))
ax.set_xticklabels(num_list[::2])
ax.set_xlabel('Number of Observations', fontsize=12)
ax.set_title('Bayesian Coverage Intervals of Posterior Distributions', fontsize=15)

ax.legend(fontsize=11)
plt.show()

After observing a large number of outcomes, the posterior distribution collapses around 0.4.
Thus, the Bayesian statistician comes to believe that 𝜃 is near .4.
As shown in the figure above, as the number of observations grows, the Bayesian coverage intervals (BCIs) become
narrower and narrower around 0.4.
However, if you take a closer look, you will find that the centers of the BCIs are not exactly 0.4, due to the persistent
influence of the prior distribution and the randomness of the simulation path.


10.4 Role of a Conjugate Prior

We have made assumptions that link functional forms of our likelihood function and our prior in a way that has eased our
calculations considerably.
In particular, our assumptions that the likelihood function is binomial and that the prior distribution is a beta distribution
have the consequence that the posterior distribution implied by Bayes’ Law is also a beta distribution.
So posterior and prior are both beta distributions, albeit ones with different parameters.
When a likelihood function and prior fit together like hand and glove in this way, we can say that the prior and posterior
are conjugate distributions.
In this situation, we also sometimes say that we have a conjugate prior for the likelihood function Prob(𝑋|𝜃).
Typically, the functional form of the likelihood function determines the functional form of a conjugate prior.
A natural question to ask is why should a person’s personal prior about a parameter 𝜃 be restricted to be described by a
conjugate prior?
Why not some other functional form that more sincerely describes the person’s beliefs?
To be argumentative, one could ask, why should the form of the likelihood function have anything to say about my personal
beliefs about 𝜃?
A dignified response to that question is, well, it shouldn’t, but if you want to compute a posterior easily you’ll just be
happier if your prior is conjugate to your likelihood.
Otherwise, your posterior won’t have a convenient analytical form and you’ll be in the situation of wanting to apply the
Markov chain Monte Carlo techniques deployed in this quantecon lecture.
We also apply these powerful methods to approximating Bayesian posteriors for non-conjugate priors in this quantecon
lecture and this quantecon lecture.
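To make the convenience of conjugacy concrete, here is a rough sketch that swaps in a much simpler technique than Markov chain Monte Carlo: a grid approximation, which is workable only because 𝜃 is one-dimensional. All specifics are assumptions for illustration: the data (4 heads in 10 flips), the beta prior with 𝛼 = 𝛽 = 2, the triangular prior, and the helper name `grid_posterior`. With the beta prior the grid result can be compared with the closed-form posterior; with the triangular prior only the numerical answer is computed.

```python
import numpy as np
from scipy import stats as st

k, N = 4, 10                                  # hypothetical data: 4 heads in 10 flips
θ_grid = np.linspace(0.001, 0.999, 999)
dθ = θ_grid[1] - θ_grid[0]

def grid_posterior(prior_pdf):
    """Normalize likelihood times prior on the grid with a Riemann sum."""
    unnorm = st.binom.pmf(k, N, θ_grid) * prior_pdf(θ_grid)
    return unnorm / (unnorm.sum() * dθ)

# conjugate case: beta prior, so a closed-form posterior is available
α, β = 2.0, 2.0
post_grid = grid_posterior(st.beta(α, β).pdf)
post_exact = st.beta(α + k, β + N - k).pdf(θ_grid)
max_gap = np.max(np.abs(post_grid - post_exact))

# non-conjugate case: triangular prior on [0, 1] peaked at 0.5
tri_prior = st.triang(0.5).pdf
post_tri = grid_posterior(tri_prior)
post_tri_mean = np.sum(θ_grid * post_tri) * dθ

print(f"largest gap between grid and closed form (beta prior): {max_gap:.2e}")
print(f"posterior mean under the triangular prior: {post_tri_mean:.4f}")
```

In the conjugate case the posterior is obtained by adding counts to the prior parameters; in the non-conjugate case every new data set requires recomputing the normalizing integral numerically.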



CHAPTER

ELEVEN

MULTIVARIATE HYPERGEOMETRIC DISTRIBUTION

Contents

• Multivariate Hypergeometric Distribution


– Overview
– The Administrator’s Problem
– Usage

11.1 Overview

This lecture describes how an administrator deployed a multivariate hypergeometric distribution in order to assess
the fairness of a procedure for awarding research grants.
In the lecture we’ll learn about
• properties of the multivariate hypergeometric distribution
• first and second moments of a multivariate hypergeometric distribution
• using a Monte Carlo simulation of a multivariate normal distribution to evaluate the quality of a normal
approximation
• the administrator’s problem and why the multivariate hypergeometric distribution is the right tool

11.2 The Administrator’s Problem

An administrator in charge of allocating research grants is in the following situation.


To help us forget details that are none of our business here and to protect the anonymity of the administrator and the
subjects, we call research proposals balls and continents of residence of authors of a proposal a color.
There are 𝐾𝑖 balls (proposals) of color 𝑖.
There are 𝑐 distinct colors (continents of residence).
Thus, 𝑖 = 1, 2, … , 𝑐.
So there is a total of $N = \sum_{i=1}^c K_i$ balls.
All 𝑁 of these balls are placed in an urn.


Then 𝑛 balls are drawn randomly.


The selection procedure is supposed to be color blind, meaning that ball quality, a random variable that is supposed to
be independent of ball color, governs whether a ball is drawn.
Thus, the selection procedure is supposed to draw 𝑛 balls from the urn at random.
The 𝑛 balls drawn represent successful proposals and are awarded research funds.
The remaining 𝑁 − 𝑛 balls receive no research funds.

11.2.1 Details of the Awards Procedure Under Study

Let 𝑘𝑖 be the number of balls of color 𝑖 that are drawn.
Things have to add up so $\sum_{i=1}^c k_i = n$.
Under the hypothesis that the selection process judges proposals on their quality and that quality is independent of
the author’s continent of residence, the administrator views the outcome of the selection procedure as a random
vector

$$
X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_c \end{pmatrix}.
$$

To evaluate whether the selection procedure is color blind the administrator wants to study whether the particular
realization of 𝑋 drawn can plausibly be said to be a random draw from the probability distribution that is implied by the
color blind hypothesis.
The appropriate probability distribution is the one described here.
Let’s now instantiate the administrator’s problem, while continuing to use the colored balls metaphor.
The administrator has an urn with 𝑁 = 238 balls.
157 balls are blue, 11 balls are green, 46 balls are yellow, and 24 balls are black.
So (𝐾1 , 𝐾2 , 𝐾3 , 𝐾4 ) = (157, 11, 46, 24) and 𝑐 = 4.
15 balls are drawn without replacement.
So 𝑛 = 15.
The administrator wants to know the probability distribution of outcomes

$$
X = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_4 \end{pmatrix}.
$$

In particular, he wants to know whether a particular outcome - in the form of a 4 × 1 vector of integers recording the
numbers of blue, green, yellow, and black balls, respectively - contains evidence against the hypothesis that the selection
process is fair, which here means that it is color blind and that the balls drawn truly are random draws without
replacement from the population of 𝑁 balls.
The right tool for the administrator’s job is the multivariate hypergeometric distribution.


11.2.2 Multivariate Hypergeometric Distribution

Let’s start with some imports.

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from scipy.special import comb
from scipy.stats import normaltest
from numba import njit, prange

To recapitulate, we assume there are in total 𝑐 types of objects in an urn.


If there are 𝐾𝑖 type 𝑖 objects in the urn and we take 𝑛 draws at random without replacement, then the numbers of type 𝑖
objects in the sample (𝑘1 , 𝑘2 , … , 𝑘𝑐 ) have the multivariate hypergeometric distribution.
Note again that $N = \sum_{i=1}^c K_i$ is the total number of objects in the urn and $n = \sum_{i=1}^c k_i$.
Notation
We use the following notation for binomial coefficients: $\binom{m}{q} = \frac{m!}{q!(m-q)!}$.

The multivariate hypergeometric distribution has the following properties:

Probability mass function:

$$
\Pr\{X_i = k_i \ \forall i\} = \frac{\prod_{i=1}^c \binom{K_i}{k_i}}{\binom{N}{n}}
$$

Mean:

$$
\text{E}(X_i) = n \frac{K_i}{N}
$$

Variances and covariances:

$$
\text{Var}(X_i) = n \frac{N-n}{N-1} \frac{K_i}{N} \left(1 - \frac{K_i}{N}\right)
$$

$$
\text{Cov}(X_i, X_j) = -n \frac{N-n}{N-1} \frac{K_i}{N} \frac{K_j}{N}
$$
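These properties can be checked by brute force on a small urn. The sketch below uses only the standard library and is independent of any class defined in this lecture: it assumes an urn with 5, 10 and 15 objects of three types and 𝑛 = 6 draws, enumerates every feasible count vector, and compares the resulting probabilities and moments with the formulas above. The function name `pmf` and the variable names are illustrative choices.

```python
from itertools import product
from math import comb, prod

K = (5, 10, 15)     # objects of each type in the urn (illustrative)
n = 6               # number of draws without replacement
N = sum(K)

def pmf(k):
    """Multivariate hypergeometric probability of the count vector k."""
    return prod(comb(K_i, k_i) for K_i, k_i in zip(K, k)) / comb(N, sum(k))

# every count vector (k_1, ..., k_c) with k_i <= K_i and k_1 + ... + k_c = n
support = [k for k in product(*(range(min(K_i, n) + 1) for K_i in K))
           if sum(k) == n]

total = sum(pmf(k) for k in support)
mean = [sum(k[i] * pmf(k) for k in support) for i in range(len(K))]
var = [sum((k[i] - mean[i])**2 * pmf(k) for k in support) for i in range(len(K))]
cov_01 = sum((k[0] - mean[0]) * (k[1] - mean[1]) * pmf(k) for k in support)

# closed-form counterparts from the formulas above
scale = n * (N - n) / (N - 1)
var_formula = [scale * K_i / N * (1 - K_i / N) for K_i in K]
cov_01_formula = -scale * K[0] / N * K[1] / N

print(total, mean)
print(var, var_formula)
print(cov_01, cov_01_formula)
```

The probabilities should sum to one, and the enumerated means, variances and covariance should agree with the closed-form expressions up to floating-point error.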
To do our work for us, we’ll write an Urn class.

class Urn:

    def __init__(self, K_arr):
        """
        Initialization given the number of each type i object in the urn.

        Parameters
        ----------
        K_arr: ndarray(int)
            number of each type i object.
        """

        self.K_arr = np.array(K_arr)
        self.N = np.sum(K_arr)
        self.c = len(K_arr)

    def pmf(self, k_arr):
        """
        Probability mass function.

        Parameters
        ----------
        k_arr: ndarray(int)
            number of observed successes of each object.
        """

        K_arr, N = self.K_arr, self.N

        k_arr = np.atleast_2d(k_arr)
        n = np.sum(k_arr, 1)

        num = np.prod(comb(K_arr, k_arr), 1)
        denom = comb(N, n)

        pr = num / denom

        return pr

    def moments(self, n):
        """
        Compute the mean and variance-covariance matrix for
        multivariate hypergeometric distribution.

        Parameters
        ----------
        n: int
            number of draws.
        """

        K_arr, N, c = self.K_arr, self.N, self.c

        # mean
        μ = n * K_arr / N

        # variance-covariance matrix
        Σ = np.full((c, c), n * (N - n) / (N - 1) / N ** 2)
        for i in range(c-1):
            Σ[i, i] *= K_arr[i] * (N - K_arr[i])
            for j in range(i+1, c):
                Σ[i, j] *= - K_arr[i] * K_arr[j]
                Σ[j, i] = Σ[i, j]

        Σ[-1, -1] *= K_arr[-1] * (N - K_arr[-1])

        return μ, Σ

    def simulate(self, n, size=1, seed=None):
        """
        Simulate a sample from multivariate hypergeometric
        distribution where at each draw we take n objects
        from the urn without replacement.

        Parameters
        ----------
        n: int
            number of objects for each draw.
        size: int(optional)
            sample size.
        seed: int(optional)
            random seed.
        """

        K_arr = self.K_arr

        gen = np.random.Generator(np.random.PCG64(seed))
        sample = gen.multivariate_hypergeometric(K_arr, n, size=size)

        return sample

11.3 Usage

11.3.1 First example

Apply this to an example from wiki:

Suppose there are 5 black, 10 white, and 15 red marbles in an urn. If six marbles are chosen without replacement, the
probability that exactly two of each color are chosen is

$$
P(2 \text{ black}, 2 \text{ white}, 2 \text{ red}) = \frac{\binom{5}{2} \binom{10}{2} \binom{15}{2}}{\binom{30}{6}} = 0.079575596816976
$$

# construct the urn


K_arr = [5, 10, 15]
urn = Urn(K_arr)

Now use the Urn class method pmf to compute the probability of the outcome 𝑋 = (2, 2, 2).

k_arr = [2, 2, 2] # array of number of observed successes


urn.pmf(k_arr)

array([0.0795756])

We can use the code to compute probabilities of a list of possible outcomes by constructing a 2-dimensional array k_arr
and pmf will return an array of probabilities for observing each case.

k_arr = [[2, 2, 2], [1, 3, 2]]


urn.pmf(k_arr)

array([0.0795756, 0.1061008])

Now let’s compute the mean vector and variance-covariance matrix.


n = 6
μ, Σ = urn.moments(n)

array([1., 2., 3.])

array([[ 0.68965517, -0.27586207, -0.4137931 ],


[-0.27586207, 1.10344828, -0.82758621],
[-0.4137931 , -0.82758621, 1.24137931]])

11.3.2 Back to The Administrator’s Problem

Now let’s turn to the grant administrator’s problem.


Here the array of numbers of type 𝑖 objects in the urn is (157, 11, 46, 24).

K_arr = [157, 11, 46, 24]


urn = Urn(K_arr)

Let’s compute the probability of the outcome (10, 1, 4, 0).

k_arr = [10, 1, 4, 0]
urn.pmf(k_arr)

array([0.01547738])

We can compute probabilities of three possible outcomes by constructing a 2-dimensional array k_arr with one row per
outcome and utilizing the method pmf of the Urn class.

k_arr = [[5, 5, 4 ,1], [10, 1, 2, 2], [13, 0, 2, 0]]


urn.pmf(k_arr)

array([6.21412534e-06, 2.70935969e-02, 1.61839976e-02])

Now let’s compute the mean and variance-covariance matrix of 𝑋 when 𝑛 = 6.

n = 6 # number of draws
μ, Σ = urn.moments(n)

# mean
μ

array([3.95798319, 0.27731092, 1.15966387, 0.60504202])


# variance-covariance matrix
Σ

array([[ 1.31862604, -0.17907267, -0.74884935, -0.39070401],


[-0.17907267, 0.25891399, -0.05246715, -0.02737417],
[-0.74884935, -0.05246715, 0.91579029, -0.11447379],
[-0.39070401, -0.02737417, -0.11447379, 0.53255196]])

We can simulate a large sample and verify that sample means and covariances closely approximate the population means
and covariances.

size = 10_000_000
sample = urn.simulate(n, size=size)

# mean
np.mean(sample, 0)

array([3.9573046, 0.2774102, 1.1597064, 0.6055788])

# variance covariance matrix


np.cov(sample.T)

array([[ 1.31949123, -0.17936828, -0.74889015, -0.39123281],


[-0.17936828, 0.25914361, -0.05241489, -0.02736044],
[-0.74889015, -0.05241489, 0.91570316, -0.11439812],
[-0.39123281, -0.02736044, -0.11439812, 0.53299137]])

Evidently, the sample means and covariances approximate their population counterparts well.

11.3.3 Quality of Normal Approximation

To judge the quality of a multivariate normal approximation to the multivariate hypergeometric distribution, we draw
a large sample from a multivariate normal distribution with the mean vector and covariance matrix for the
corresponding multivariate hypergeometric distribution and compare the simulated distribution with the population
multivariate hypergeometric distribution.

sample_normal = np.random.multivariate_normal(μ, Σ, size=size)

def bivariate_normal(x, y, μ, Σ, i, j):

    μ_x, μ_y = μ[i], μ[j]
    σ_x, σ_y = np.sqrt(Σ[i, i]), np.sqrt(Σ[j, j])
    σ_xy = Σ[i, j]

    x_μ = x - μ_x
    y_μ = y - μ_y

    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)

    return np.exp(-z / (2 * (1 - ρ**2))) / denom

@njit
def count(vec1, vec2, n):
    # use the length of the input rather than the global `sample`
    size = vec1.shape[0]

    count_mat = np.zeros((n+1, n+1))
    for i in prange(size):
        count_mat[vec1[i], vec2[i]] += 1

    return count_mat

c = urn.c
fig, axs = plt.subplots(c, c, figsize=(14, 14))

# grids for plotting the bivariate Gaussian
x_grid = np.linspace(-2, n+1, 100)
y_grid = np.linspace(-2, n+1, 100)
X, Y = np.meshgrid(x_grid, y_grid)

for i in range(c):
    axs[i, i].hist(sample[:, i], bins=np.arange(0, n, 1), alpha=0.5, density=True, label='hypergeom')
    axs[i, i].hist(sample_normal[:, i], bins=np.arange(0, n, 1), alpha=0.5, density=True, label='normal')
    axs[i, i].legend()
    axs[i, i].set_title('$k_{' +str(i+1) +'}$')
    for j in range(c):
        if i == j:
            continue

        # bivariate Gaussian density function
        Z = bivariate_normal(X, Y, μ, Σ, i, j)
        cs = axs[i, j].contour(X, Y, Z, 4, colors="black", alpha=0.6)
        axs[i, j].clabel(cs, inline=1, fontsize=10)

        # empirical multivariate hypergeometric distribution
        count_mat = count(sample[:, i], sample[:, j], n)
        axs[i, j].pcolor(count_mat.T/size, cmap='Blues')
        axs[i, j].set_title('$(k_{' +str(i+1) +'}, k_{' + str(j+1) + '})$')

plt.show()


The diagonal graphs plot the marginal distributions of 𝑘𝑖 for each 𝑖 using histograms.
Note the substantial differences between the hypergeometric distribution and the approximating normal distribution.
The off-diagonal graphs plot the empirical joint distribution of 𝑘𝑖 and 𝑘𝑗 for each pair (𝑖, 𝑗).
The darker the blue, the more data points are contained in the corresponding cell. (Note that 𝑘𝑖 is on the x-axis and 𝑘𝑗 is on the y-axis).
The contour maps plot the bivariate Gaussian density function of (𝑘𝑖 , 𝑘𝑗 ) with the population mean and covariance given by slices of 𝜇 and Σ that we computed above.
Let’s also test the normality for each 𝑘𝑖 using scipy.stats.normaltest, which implements D’Agostino and Pearson’s test that combines skew and kurtosis to form an omnibus test of normality.
The null hypothesis is that the sample follows a normal distribution.
normaltest returns an array of p-values associated with tests for each 𝑘𝑖 sample.


test_multihyper = normaltest(sample)
test_multihyper.pvalue

array([0., 0., 0., 0.])

As we can see, all the p-values are almost 0 and the null hypothesis is soundly rejected.
By contrast, for the sample drawn from the normal distribution, the test does not reject the null hypothesis.

test_normal = normaltest(sample_normal)
test_normal.pvalue

array([0.8969004 , 0.27041724, 0.9152563 , 0.71988042])

The lesson to take away from this is that the normal approximation is imperfect.

CHAPTER

TWELVE

MULTIVARIATE NORMAL DISTRIBUTION

Contents

• Multivariate Normal Distribution


– Overview
– The Multivariate Normal Distribution
– Bivariate Example
– Trivariate Example
– One Dimensional Intelligence (IQ)
– Information as Surprise
– Cholesky Factor Magic
– Math and Verbal Intelligence
– Univariate Time Series Analysis
– Stochastic Difference Equation
– Application to Stock Price Model
– Filtering Foundations
– Classic Factor Analysis Model
– PCA and Factor Analysis

12.1 Overview

This lecture describes a workhorse in probability theory, statistics, and economics, namely, the multivariate normal
distribution.
In this lecture, you will learn formulas for
• the joint distribution of a random vector 𝑥 of length 𝑁
• marginal distributions for all subvectors of 𝑥
• conditional distributions for subvectors of 𝑥 conditional on other subvectors of 𝑥
We will use the multivariate normal distribution to formulate some useful models:


• a factor analytic model of an intelligence quotient, i.e., IQ
• a factor analytic model of two independent inherent abilities, say, mathematical and verbal
• a more general factor analytic model
• Principal Components Analysis (PCA) as an approximation to a factor analytic model
• time series generated by linear stochastic difference equations
• optimal linear filtering theory

12.2 The Multivariate Normal Distribution

This lecture defines a Python class MultivariateNormal to be used to generate marginal and conditional distributions associated with a multivariate normal distribution.
For a multivariate normal distribution it is very convenient that
• conditional expectations equal linear least squares projections
• conditional distributions are characterized by multivariate linear regressions
We apply our Python class to some examples.
We use the following imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numba import njit
import statsmodels.api as sm

Assume that an 𝑁 × 1 random vector 𝑧 has a multivariate normal probability density.


This means that the probability density takes the form

$$
f(z; \mu, \Sigma) = (2\pi)^{-\frac{N}{2}} \det(\Sigma)^{-\frac{1}{2}} \exp\left(-.5 (z - \mu)' \Sigma^{-1} (z - \mu)\right)
$$

where 𝜇 = 𝐸𝑧 is the mean of the random vector 𝑧 and Σ = 𝐸(𝑧 − 𝜇)(𝑧 − 𝜇)′ is the covariance matrix of 𝑧.
The covariance matrix Σ is symmetric and positive definite.

@njit
def f(z, μ, Σ):
    """
    The density function of multivariate normal distribution.

    Parameters
    ---------------
    z: ndarray(float, dim=2)
        random vector, N by 1
    μ: ndarray(float, dim=1 or 2)
        the mean of z, N by 1
    Σ: ndarray(float, dim=2)
        the covariance matrix of z, N by N
    """

    z = np.atleast_2d(z)
    μ = np.atleast_2d(μ)
    Σ = np.atleast_2d(Σ)

    N = z.size

    temp1 = np.linalg.det(Σ) ** (-1/2)
    temp2 = np.exp(-.5 * (z - μ).T @ np.linalg.inv(Σ) @ (z - μ))

    return (2 * np.pi) ** (-N/2) * temp1 * temp2
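As a quick sanity check on the density formula, here is a minimal sketch that does not rely on Numba. It assumes a diagonal covariance matrix, in which case the joint density should factor into a product of univariate normal densities. The helper names mvn_pdf and normal_pdf and the numbers μ_demo, σ_demo, z_demo are ours and are chosen only for illustration.

```python
import numpy as np

def mvn_pdf(z, μ, Σ):
    # hypothetical helper: the same formula as f, written for 1-D z and μ
    z = np.asarray(z, dtype=float)
    μ = np.asarray(μ, dtype=float)
    N = z.size
    dev = z - μ
    quad = dev @ np.linalg.solve(Σ, dev)      # (z - μ)' Σ^{-1} (z - μ)
    return (2 * np.pi) ** (-N / 2) * np.linalg.det(Σ) ** (-1 / 2) * np.exp(-.5 * quad)

def normal_pdf(x, m, s):
    # univariate normal density with mean m and standard deviation s
    return np.exp(-.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

μ_demo = np.array([.5, 1.])
σ_demo = np.array([1., 2.])        # standard deviations (illustrative values)
Σ_demo = np.diag(σ_demo ** 2)      # diagonal covariance: independent components
z_demo = np.array([0.3, -0.7])

joint = mvn_pdf(z_demo, μ_demo, Σ_demo)
product = np.prod(normal_pdf(z_demo, μ_demo, σ_demo))

print(joint, product)              # the two numbers should agree
```

With a non-diagonal Σ the factorization no longer holds, which is exactly what makes the conditional distributions studied next informative.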

For some integer 𝑘 ∈ {1, … , 𝑁 − 1}, partition 𝑧 as

$$
z = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix},
$$

where 𝑧1 is a 𝑘 × 1 vector and 𝑧2 is an (𝑁 − 𝑘) × 1 vector.

Let

$$
\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
$$

be corresponding partitions of 𝜇 and Σ.

The marginal distribution of 𝑧1 is
• multivariate normal with mean 𝜇1 and covariance matrix Σ11 .
The marginal distribution of 𝑧2 is
• multivariate normal with mean 𝜇2 and covariance matrix Σ22 .
The distribution of 𝑧1 conditional on 𝑧2 is
• multivariate normal with mean

$$
\hat{\mu}_1 = \mu_1 + \beta (z_2 - \mu_2)
$$

and covariance matrix

$$
\hat{\Sigma}_{11} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} = \Sigma_{11} - \beta \Sigma_{22} \beta'
$$

where

$$
\beta = \Sigma_{12} \Sigma_{22}^{-1}
$$

is a 𝑘 × (𝑁 − 𝑘) matrix of population regression coefficients of the 𝑘 × 1 random vector 𝑧1 − 𝜇1 on the (𝑁 − 𝑘) × 1 random vector 𝑧2 − 𝜇2 .
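The two expressions for the conditional covariance matrix can be checked numerically. The sketch below is only an illustration: it assumes a randomly generated positive definite Σ with 𝑁 = 4 and 𝑘 = 2, and it also compares the result with the inverse of the 𝑧1 block of Σ⁻¹, a standard block-inverse identity that is not stated in the text above. All variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 4, 2
A = rng.standard_normal((N, N))
Σ = A @ A.T + N * np.eye(N)        # positive definite by construction

# blocks of the partitioned covariance matrix
Σ11, Σ12 = Σ[:k, :k], Σ[:k, k:]
Σ21, Σ22 = Σ[k:, :k], Σ[k:, k:]

# population regression coefficients of z1 - μ1 on z2 - μ2
β = Σ12 @ np.linalg.inv(Σ22)

# two equivalent expressions for the conditional covariance of z1 given z2
Σ11_hat_a = Σ11 - Σ12 @ np.linalg.inv(Σ22) @ Σ21
Σ11_hat_b = Σ11 - β @ Σ22 @ β.T

# block-inverse identity: the z1 block of Σ^{-1} is the inverse of Σ11_hat
P = np.linalg.inv(Σ)
Σ11_hat_c = np.linalg.inv(P[:k, :k])

print(np.allclose(Σ11_hat_a, Σ11_hat_b), np.allclose(Σ11_hat_a, Σ11_hat_c))
```

Note that the conditional covariance does not depend on the realized value of 𝑧2 ; only the conditional mean does.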
The following class constructs a multivariate normal distribution instance with two methods.
• a method partition computes 𝛽, taking 𝑘 as an input
• a method cond_dist computes either the distribution of 𝑧1 conditional on 𝑧2 or the distribution of 𝑧2 conditional
on 𝑧1

class MultivariateNormal:
    """
    Class of multivariate normal distribution.

    Parameters
    ----------
    μ: ndarray(float, dim=1)
        the mean of z, N by 1
    Σ: ndarray(float, dim=2)
        the covariance matrix of z, N by N

    Attributes
    ----------
    μ, Σ:
        see parameters
    μs: list(ndarray(float, dim=1))
        list of mean vectors μ1 and μ2 in order
    Σs: list(list(ndarray(float, dim=2)))
        2 dimensional list of covariance matrices
        Σ11, Σ12, Σ21, Σ22 in order
    βs: list(ndarray(float, dim=2))
        list of regression coefficients β1 and β2 in order
    """

    def __init__(self, μ, Σ):
        "initialization"
        self.μ = np.array(μ)
        self.Σ = np.atleast_2d(Σ)

    def partition(self, k):
        """
        Given k, partition the random vector z into a size k vector z1
        and a size N-k vector z2. Partition the mean vector μ into
        μ1 and μ2, and the covariance matrix Σ into Σ11, Σ12, Σ21, Σ22
        correspondingly. Compute the regression coefficients β1 and β2
        using the partitioned arrays.
        """
        μ = self.μ
        Σ = self.Σ

        self.μs = [μ[:k], μ[k:]]
        self.Σs = [[Σ[:k, :k], Σ[:k, k:]],
                   [Σ[k:, :k], Σ[k:, k:]]]

        self.βs = [self.Σs[0][1] @ np.linalg.inv(self.Σs[1][1]),
                   self.Σs[1][0] @ np.linalg.inv(self.Σs[0][0])]

    def cond_dist(self, ind, z):
        """
        Compute the conditional distribution of z1 given z2, or the reverse.
        Argument ind determines whether we compute the conditional
        distribution of z1 (ind=0) or z2 (ind=1).

        Returns
        ---------
        μ_hat: ndarray(float, ndim=1)
            The conditional mean of z1 or z2.
        Σ_hat: ndarray(float, ndim=2)
            The conditional covariance matrix of z1 or z2.
        """
        β = self.βs[ind]
        μs = self.μs
        Σs = self.Σs

        μ_hat = μs[ind] + β @ (z - μs[1-ind])
        Σ_hat = Σs[ind][ind] - β @ Σs[1-ind][1-ind] @ β.T

        return μ_hat, Σ_hat

Let’s put this code to work on a suite of examples.


We begin with a simple bivariate example; after that we’ll turn to a trivariate example.
We’ll compute population moments of some conditional distributions using our MultivariateNormal class.
For fun we’ll also compute sample analogs of the associated population regressions by generating simulations and then
computing linear least squares regressions.
We’ll compare those linear least squares regressions for the simulated data to their population counterparts.

12.3 Bivariate Example

We start with a bivariate normal distribution pinned down by

$$
\mu = \begin{bmatrix} .5 \\ 1.0 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} 1 & .5 \\ .5 & 1 \end{bmatrix}
$$

μ = np.array([.5, 1.])
Σ = np.array([[1., .5], [.5 ,1.]])

# construction of the multivariate normal instance


multi_normal = MultivariateNormal(μ, Σ)

k = 1 # choose partition

# partition and compute regression coefficients


multi_normal.partition(k)
multi_normal.βs[0],multi_normal.βs[1]

(array([[0.5]]), array([[0.5]]))

Let’s illustrate the fact that you can regress anything on anything else.
We have computed everything we need to compute two regression lines, one of 𝑧2 on 𝑧1 , the other of 𝑧1 on 𝑧2 .
We’ll represent these regressions as

$$
z_1 = a_1 + b_1 z_2 + \epsilon_1
$$

and

$$
z_2 = a_2 + b_2 z_1 + \epsilon_2
$$

where we have the population least squares orthogonality conditions

$$
E \epsilon_1 z_2 = 0
$$

and

$$
E \epsilon_2 z_1 = 0
$$

Let’s compute 𝑎1 , 𝑎2 , 𝑏1 , 𝑏2 .

beta = multi_normal.βs

a1 = μ[0] - beta[0]*μ[1]
b1 = beta[0]

a2 = μ[1] - beta[1]*μ[0]
b2 = beta[1]

Let’s print out the intercepts and slopes.


For the regression of 𝑧1 on 𝑧2 we have

print ("a1 = ", a1)


print ("b1 = ", b1)

a1 = [[0.]]
b1 = [[0.5]]

For the regression of 𝑧2 on 𝑧1 we have

print ("a2 = ", a2)


print ("b2 = ", b2)

a2 = [[0.75]]
b2 = [[0.5]]

Now let’s plot the two regression lines and stare at them.

z2 = np.linspace(-4,4,100)

a1 = np.squeeze(a1)
b1 = np.squeeze(b1)

a2 = np.squeeze(a2)
b2 = np.squeeze(b2)

z1 = b1*z2 + a1

z1h = z2/b2 - a2/b2

fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(1, 1, 1)
ax.set(xlim=(-4, 4), ylim=(-4, 4))
ax.spines['left'].set_position('center')
ax.spines['bottom'].set_position('zero')
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')
plt.ylabel('$z_1$', loc = 'top')
plt.xlabel('$z_2$', loc = 'right')
plt.title('two regressions')
plt.plot(z2,z1, 'r', label = "$z_1$ on $z_2$")
plt.plot(z2,z1h, 'b', label = "$z_2$ on $z_1$")
plt.legend()
plt.show()

The red line is the expectation of 𝑧1 conditional on 𝑧2 .


The intercept and slope of the red line are


print("a1 = ", a1)


print("b1 = ", b1)

a1 = 0.0
b1 = 0.5

The blue line is the expectation of 𝑧2 conditional on 𝑧1 .


The intercept and slope of the blue line are

print("-a2/b2 = ", - a2/b2)


print("1/b2 = ", 1/b2)

-a2/b2 = -1.5
1/b2 = 2.0

We can use these regression lines or our code to compute conditional expectations.
Let’s compute the mean and variance of the distribution of 𝑧2 conditional on 𝑧1 = 5.
After that we’ll swap the variables on the left and right sides of the regression.

# compute the cond. dist. of z2


ind = 1
z1 = np.array([5.]) # given z1

μ2_hat, Σ2_hat = multi_normal.cond_dist(ind, z1)


print('μ2_hat, Σ2_hat = ', μ2_hat, Σ2_hat)

μ2_hat, Σ2_hat = [3.25] [[0.75]]

Now let’s compute the mean and variance of the distribution of 𝑧1 conditional on 𝑧2 = 5.

# compute the cond. dist. of z1


ind = 0
z2 = np.array([5.]) # given z2

μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)


print('μ1_hat, Σ1_hat = ', μ1_hat, Σ1_hat)

μ1_hat, Σ1_hat = [2.5] [[0.75]]

Let’s compare the preceding population mean and variance with outcomes from drawing a large sample and then regressing
𝑧1 − 𝜇1 on 𝑧2 − 𝜇2 .
We know that

$$
E z_1 | z_2 = (\mu_1 - \beta \mu_2) + \beta z_2
$$

which can be rearranged to

$$
z_1 - \mu_1 = \beta (z_2 - \mu_2) + \epsilon.
$$

We anticipate that for larger and larger sample sizes, estimated OLS coefficients will converge to 𝛽 and the estimated variance of 𝜖 will converge to $\hat{\Sigma}_1$.


n = 1_000_000 # sample size

# simulate multivariate normal random vectors


data = np.random.multivariate_normal(μ, Σ, size=n)
z1_data = data[:, 0]
z2_data = data[:, 1]

# OLS regression
μ1, μ2 = multi_normal.μs
results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()

Let’s compare the preceding population 𝛽 with the OLS sample estimate on 𝑧2 − 𝜇2

multi_normal.βs[0], results.params

(array([[0.5]]), array([0.50068711]))

Let’s compare our population Σ̂ 1 with the degrees-of-freedom adjusted estimate of the variance of 𝜖

Σ1_hat, results.resid @ results.resid.T / (n - 1)

(array([[0.75]]), 0.7504621422788655)

Lastly, let’s compute the estimate of $\hat{E} z_1 | z_2$ and compare it with $\hat{\mu}_1$

μ1_hat, results.predict(z2 - μ2) + μ1

(array([2.5]), array([2.50274842]))

Thus, in each case, for our very large sample size, the sample analogues closely approximate their population counterparts.
A Law of Large Numbers explains why sample analogues approximate population objects.
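The comparison can also be reproduced without statsmodels. The following sketch swaps in np.linalg.lstsq for the OLS step (an assumption on our part, chosen to keep the example dependency-free) and uses a seeded generator and a smaller sample so that it runs quickly. The names β_pop, var_pop, β_hat and var_hat are ours.

```python
import numpy as np

rng = np.random.default_rng(1234)

μ = np.array([.5, 1.])
Σ = np.array([[1., .5], [.5, 1.]])

# population objects for the regression of z1 - μ1 on z2 - μ2
β_pop = Σ[0, 1] / Σ[1, 1]
var_pop = Σ[0, 0] - β_pop * Σ[1, 1] * β_pop

# simulate and run least squares without an intercept on demeaned data
n = 200_000
data = rng.multivariate_normal(μ, Σ, size=n)
x = (data[:, 1] - μ[1]).reshape(-1, 1)
y = data[:, 0] - μ[0]

coef, *_ = np.linalg.lstsq(x, y, rcond=None)
resid = y - x @ coef

β_hat = coef[0]
var_hat = resid @ resid / (n - 1)

print(β_pop, β_hat)      # slope: population vs sample
print(var_pop, var_hat)  # residual variance: population vs sample
```

Raising n should shrink the gaps between the sample and population values, in line with the Law of Large Numbers argument above.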

12.4 Trivariate Example

Let’s apply our code to a trivariate example.


We’ll specify the mean vector and the covariance matrix as follows.

μ = np.random.random(3)
C = np.random.random((3, 3))
Σ = C @ C.T # positive semi-definite

multi_normal = MultivariateNormal(μ, Σ)

μ, Σ

(array([0.96647091, 0.52989787, 0.54470206]),


array([[1.05309198, 0.68622856, 0.92507853],
[0.68622856, 0.45333322, 0.63969818],
[0.92507853, 0.63969818, 1.03456211]]))


k = 1
multi_normal.partition(k)

Let’s compute the distribution of 𝑧1 conditional on $z_2 = \begin{bmatrix} 2 \\ 5 \end{bmatrix}$.

ind = 0
z2 = np.array([2., 5.])

μ1_hat, Σ1_hat = multi_normal.cond_dist(ind, z2)

n = 1_000_000
data = np.random.multivariate_normal(μ, Σ, size=n)
z1_data = data[:, :k]
z2_data = data[:, k:]

μ1, μ2 = multi_normal.μs
results = sm.OLS(z1_data - μ1, z2_data - μ2).fit()

As above, we compare population and sample regression coefficients, the conditional covariance matrix, and the conditional mean vector in that order.

multi_normal.βs[0], results.params

(array([[ 1.97658029, -0.32799991]]), array([ 1.97658228, -0.32800479]))

Σ1_hat, results.resid @ results.resid.T / (n - 1)

(array([[0.00013182]]), 0.0001318492235146378)

μ1_hat, results.predict(z2 - μ2) + μ1

(array([2.41090846]), array([2.41088967]))

Once again, sample analogues do a good job of approximating their population counterparts.

12.5 One Dimensional Intelligence (IQ)

Let’s move closer to a real-life example, namely, inferring a one-dimensional measure of intelligence called IQ from a list
of test scores.
The 𝑖th test score 𝑦𝑖 equals the sum of an unknown scalar IQ 𝜃 and a random noise term 𝜎𝑦 𝑤𝑖 .

𝑦𝑖 = 𝜃 + 𝜎𝑦 𝑤𝑖 , 𝑖 = 1, … , 𝑛

The distribution of IQ’s for a cross-section of people is a normal random variable described by

𝜃 = 𝜇𝜃 + 𝜎𝜃 𝑤𝑛+1 .

We assume that the noises $\{w_i\}_{i=1}^{n}$ in the test scores are IID and not correlated with IQ.

We also assume that $\{w_i\}_{i=1}^{n+1}$ are i.i.d. standard normal:

$$
w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ w_{n+1} \end{bmatrix} \sim N(0, I_{n+1})
$$

The following system describes the (𝑛 + 1) × 1 random vector 𝑋 that interests us:

$$
X = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \\ \theta \end{bmatrix}
= \begin{bmatrix} \mu_\theta \\ \mu_\theta \\ \vdots \\ \mu_\theta \\ \mu_\theta \end{bmatrix}
+ \begin{bmatrix}
\sigma_y & 0 & \cdots & 0 & \sigma_\theta \\
0 & \sigma_y & \cdots & 0 & \sigma_\theta \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \sigma_y & \sigma_\theta \\
0 & 0 & \cdots & 0 & \sigma_\theta
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \\ w_{n+1} \end{bmatrix},
$$

or equivalently,

$$
X = \mu_\theta 1_{n+1} + D w
$$

where $X = \begin{bmatrix} y \\ \theta \end{bmatrix}$, $1_{n+1}$ is a vector of 1s of size 𝑛 + 1, and 𝐷 is an 𝑛 + 1 by 𝑛 + 1 matrix.
Let’s define a Python function that constructs the mean 𝜇 and covariance matrix Σ of the random vector 𝑋 that we know
is governed by a multivariate normal distribution.
As arguments, the function takes the number of tests 𝑛, the mean 𝜇𝜃 and the standard deviation 𝜎𝜃 of the IQ distribution,
and the standard deviation of the randomness in test scores 𝜎𝑦 .

def construct_moments_IQ(n, μθ, σθ, σy):

    μ_IQ = np.full(n+1, μθ)

    D_IQ = np.zeros((n+1, n+1))
    D_IQ[range(n), range(n)] = σy
    D_IQ[:, n] = σθ

    Σ_IQ = D_IQ @ D_IQ.T

    return μ_IQ, Σ_IQ, D_IQ

Now let’s consider a specific instance of this model.


Assume we have recorded 50 test scores and we know that 𝜇𝜃 = 100, 𝜎𝜃 = 10, and 𝜎𝑦 = 10.
We can compute the mean vector and covariance matrix of 𝑋 easily with our construct_moments_IQ function as
follows.

n = 50
μθ, σθ, σy = 100., 10., 10.

μ_IQ, Σ_IQ, D_IQ = construct_moments_IQ(n, μθ, σθ, σy)


μ_IQ, Σ_IQ, D_IQ

(array([100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
100., 100., 100., 100., 100., 100., 100.]),
array([[200., 100., 100., ..., 100., 100., 100.],
[100., 200., 100., ..., 100., 100., 100.],
[100., 100., 200., ..., 100., 100., 100.],
...,
[100., 100., 100., ..., 200., 100., 100.],
[100., 100., 100., ..., 100., 200., 100.],
[100., 100., 100., ..., 100., 100., 100.]]),
array([[10., 0., 0., ..., 0., 0., 10.],
[ 0., 10., 0., ..., 0., 0., 10.],
[ 0., 0., 10., ..., 0., 0., 10.],
...,
[ 0., 0., 0., ..., 10., 0., 10.],
[ 0., 0., 0., ..., 0., 10., 10.],
[ 0., 0., 0., ..., 0., 0., 10.]]))

We can now use our MultivariateNormal class to construct an instance, then partition the mean vector and covariance matrix as we wish.
We want to regress IQ, the random variable 𝜃 (what we don’t know), on the vector 𝑦 of test scores (what we do know).
We choose k=n so that 𝑧1 = 𝑦 and 𝑧2 = 𝜃.

multi_normal_IQ = MultivariateNormal(μ_IQ, Σ_IQ)

k = n
multi_normal_IQ.partition(k)

Using the generator multivariate_normal, we can make one draw of the random vector from our distribution and
then compute the distribution of 𝜃 conditional on our test scores.
Let’s do that and then print out some pertinent quantities.

x = np.random.multivariate_normal(μ_IQ, Σ_IQ)
y = x[:-1] # test scores
θ = x[-1] # IQ

# the true value


θ

103.64946988092446

The method cond_dist takes test scores 𝑦 as input and returns the conditional normal distribution of the IQ 𝜃.
In the following code, ind selects which block of the partition is the left side variable of the regression (ind=0 for 𝑧1 , ind=1 for 𝑧2 ); the other block supplies the right side variables.
Given the way we have defined the vector 𝑋, we want to set ind=1 in order to make 𝜃 the left side variable in the population regression.

ind = 1
multi_normal_IQ.cond_dist(ind, y)

(array([106.80818783]), array([[1.96078431]]))


The first number is the conditional mean $\hat{\mu}_\theta$ and the second is the conditional variance $\hat{\Sigma}_\theta$.
How do additional test scores affect our inferences?
To shed light on this, we compute a sequence of conditional distributions of 𝜃 by varying the number of test scores in the conditioning set from 1 to 𝑛.
We’ll make a pretty graph showing how our judgment of the person’s IQ changes as more test results come in.

# array for containing moments
μθ_hat_arr = np.empty(n)
Σθ_hat_arr = np.empty(n)

# loop over number of test scores
for i in range(1, n+1):
    # construction of multivariate normal distribution instance
    μ_IQ_i, Σ_IQ_i, D_IQ_i = construct_moments_IQ(i, μθ, σθ, σy)
    multi_normal_IQ_i = MultivariateNormal(μ_IQ_i, Σ_IQ_i)

    # partition and compute conditional distribution
    multi_normal_IQ_i.partition(i)
    scores_i = y[:i]
    μθ_hat_i, Σθ_hat_i = multi_normal_IQ_i.cond_dist(1, scores_i)

    # store the results
    μθ_hat_arr[i-1] = μθ_hat_i[0]
    Σθ_hat_arr[i-1] = Σθ_hat_i[0, 0]

# transform variance to standard deviation
σθ_hat_arr = np.sqrt(Σθ_hat_arr)

μθ_hat_lower = μθ_hat_arr - 1.96 * σθ_hat_arr
μθ_hat_higher = μθ_hat_arr + 1.96 * σθ_hat_arr

plt.hlines(θ, 1, n+1, ls='--', label='true $θ$')
# raw strings keep the backslash in \hat from being read as an escape
plt.plot(range(1, n+1), μθ_hat_arr, color='b', label=r'$\hat{μ}_{θ}$')
plt.plot(range(1, n+1), μθ_hat_lower, color='b', ls='--')
plt.plot(range(1, n+1), μθ_hat_higher, color='b', ls='--')
plt.fill_between(range(1, n+1), μθ_hat_lower, μθ_hat_higher,
                 color='b', alpha=0.2, label='95%')

plt.xlabel('number of test scores')
plt.ylabel(r'$\hat{θ}$')
plt.legend()

plt.show()


The solid blue line in the plot above shows $\hat{\mu}_\theta$ as a function of the number of test scores that we have recorded and conditioned on.
The blue area shows the span that comes from adding or subtracting $1.96 \hat{\sigma}_\theta$ from $\hat{\mu}_\theta$.
Therefore, 95% of the probability mass of the conditional distribution falls in this range.
The value of the random 𝜃 that we drew is shown by the dashed horizontal line.
As more and more test scores come in, our estimate of the person’s 𝜃 becomes more and more reliable.
By staring at the changes in the conditional distributions, we see that adding more test scores makes $\hat{\theta}$ settle down and approach 𝜃.
Thus, each 𝑦𝑖 adds information about 𝜃.
If we were to drive the number of tests 𝑛 → +∞, the conditional standard deviation $\hat{\sigma}_\theta$ would converge to 0 at rate $\frac{1}{n^{.5}}$.
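The convergence rate can be made concrete with a small check. For this particular model the conditional variance has the closed form 1/(1/𝜎𝜃² + 𝑛/𝜎𝑦²); this is the standard normal-normal updating formula and is not derived in the text, so treat it here as an assumption to be verified numerically. The sketch compares it with the partitioned-covariance formula and shows that √𝑛 times the conditional standard deviation approaches 𝜎𝑦 . Function and variable names are ours.

```python
import numpy as np

def posterior_var_closed_form(n, σθ, σy):
    # standard normal-normal updating formula (assumed, then checked below)
    return 1 / (1 / σθ**2 + n / σy**2)

def posterior_var_matrix(n, σθ, σy):
    # covariance of X = [y; θ] built as D D', mirroring construct_moments_IQ
    D = np.zeros((n + 1, n + 1))
    D[range(n), range(n)] = σy
    D[:, n] = σθ
    Σ = D @ D.T
    Σyy, Σyθ, Σθθ = Σ[:n, :n], Σ[:n, n], Σ[n, n]
    # Σθθ - Σθy Σyy^{-1} Σyθ
    return Σθθ - Σyθ @ np.linalg.solve(Σyy, Σyθ)

σθ, σy = 10., 10.
ns = [1, 5, 50, 500]

closed = np.array([posterior_var_closed_form(n, σθ, σy) for n in ns])
matrix = np.array([posterior_var_matrix(n, σθ, σy) for n in ns])

# sqrt(n) * conditional standard deviation should level off near σy
scaled_sd = np.sqrt(closed) * np.sqrt(ns)

print(closed)
print(matrix)
print(scaled_sd)
```

For 𝑛 = 50 the conditional variance matches the value 1.96078431 reported by cond_dist above.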

12.6 Information as Surprise

By using a different representation, let’s look at things from a different perspective.


We can represent the random vector 𝑋 defined above as

𝑋 = 𝜇𝜃 1𝑛+1 + 𝐶𝜖, 𝜖 ∼ 𝑁 (0, 𝐼)

where 𝐶 is a lower triangular Cholesky factor of Σ so that

Σ ≡ 𝐷𝐷′ = 𝐶𝐶 ′

and

𝐸𝜖𝜖′ = 𝐼.

It follows that

𝜖 ∼ 𝑁 (0, 𝐼).

Let $G = C^{-1}$.

𝐺 is also lower triangular.


We can compute 𝜖 from the formula

𝜖 = 𝐺 (𝑋 − 𝜇𝜃 1𝑛+1 )

This formula confirms that the orthonormal vector 𝜖 contains the same information as the non-orthogonal vector
(𝑋 − 𝜇𝜃 1𝑛+1 ).
We can say that 𝜖 is an orthogonal basis for (𝑋 − 𝜇𝜃 1𝑛+1 ).
Let 𝑐𝑖 be the 𝑖th element in the last row of 𝐶.
Then we can write

$$
\theta = \mu_\theta + c_1 \epsilon_1 + c_2 \epsilon_2 + \dots + c_n \epsilon_n + c_{n+1} \epsilon_{n+1} \tag{12.1}
$$

The mutual orthogonality of the 𝜖𝑖 ’s provides us with an informative way to interpret them in light of equation (12.1).
Thus, relative to what is known from tests 1, … , 𝑖 − 1, 𝑐𝑖 𝜖𝑖 is the amount of new information about 𝜃 brought by test number 𝑖.
Here new information means surprise or what could not be predicted from earlier information.
Formula (12.1) also provides us with an enlightening way to express conditional means and conditional variances that we computed earlier.
In particular,

$$
E [\theta \mid y_1, \dots, y_k] = \mu_\theta + c_1 \epsilon_1 + \dots + c_k \epsilon_k
$$

and

$$
Var (\theta \mid y_1, \dots, y_k) = c_{k+1}^2 + c_{k+2}^2 + \dots + c_{n+1}^2 .
$$

C = np.linalg.cholesky(Σ_IQ)
G = np.linalg.inv(C)

ε = G @ (x - μθ)

cε = C[n, :] * ε

# compute the sequence of μθ and Σθ conditional on y1, y2, ..., yk


μθ_hat_arr_C = np.array([np.sum(cε[:k+1]) for k in range(n)]) + μθ
Σθ_hat_arr_C = np.array([np.sum(C[n, i+1:n+1] ** 2) for i in range(n)])

To confirm that these formulas give the same answers that we computed earlier, we can compare the means and variances of 𝜃 conditional on $\{y_i\}_{i=1}^k$ with what we obtained above using the formulas implemented in the class MultivariateNormal built on our original representation of conditional distributions for multivariate normal distributions.

# conditional mean
np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10

True

# conditional variance
np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10


True

12.7 Cholesky Factor Magic

Evidently, the Cholesky factorization automatically computes the population regression coefficients and associated statistics that are produced by our MultivariateNormal class.
The Cholesky factorization computes these things recursively.
Indeed, in formula (12.1),
• the random variable 𝑐𝑖 𝜖𝑖 is information about 𝜃 that is not contained in 𝜖1 , 𝜖2 , … , 𝜖𝑖−1
• the coefficient 𝑐𝑖 is the simple population regression coefficient of 𝜃 − 𝜇𝜃 on 𝜖𝑖
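Both bullet points can be checked on a small instance. The sketch below assumes 𝑛 = 5 tests with 𝜎𝜃 = 𝜎𝑦 = 10 (illustrative values) and rebuilds Σ directly so that it runs on its own. It verifies that the last row of the Cholesky factor equals the covariance between 𝜃 and 𝜖, which is the regression coefficient because each 𝜖𝑖 has unit variance, and that the tail sums of squared entries reproduce the conditional variances from the partitioned formula. Variable names are ours.

```python
import numpy as np

n, μθ, σθ, σy = 5, 100., 10., 10.

# covariance of X = [y_1, ..., y_n, θ], built as D D'
D = np.zeros((n + 1, n + 1))
D[range(n), range(n)] = σy
D[:, n] = σθ
Σ = D @ D.T

C = np.linalg.cholesky(Σ)    # lower triangular, Σ = C C'
c = C[n, :]                  # last row: loadings of θ - μθ on ε_1, ..., ε_{n+1}

# Cov(θ, ε) = Σ[n, :] G' with G = C^{-1}; Var(ε_i) = 1, so this is also
# the vector of simple regression coefficients of θ - μθ on each ε_i
G = np.linalg.inv(C)
cov_θ_ε = Σ[n, :] @ G.T

var_regression = np.empty(n)
var_cholesky = np.empty(n)
for k in range(1, n + 1):
    # conditional variance of θ given y_1, ..., y_k from the partitioned formula
    Σyy, Σyθ = Σ[:k, :k], Σ[:k, n]
    var_regression[k-1] = Σ[n, n] - Σyθ @ np.linalg.solve(Σyy, Σyθ)
    # the same object from the Cholesky factor: c_{k+1}^2 + ... + c_{n+1}^2
    var_cholesky[k-1] = np.sum(c[k:] ** 2)

print(np.allclose(cov_θ_ε, c))
print(np.allclose(var_regression, var_cholesky))
```

Because the factor is lower triangular, adding a test score appends a row and column without changing the entries already computed, which is the sense in which the computation is recursive.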

12.8 Math and Verbal Intelligence

We can alter the preceding example to be more realistic.


There is ample evidence that IQ is not a scalar.
Some people are good in math skills but poor in language skills.
Other people are good in language skills but poor in math skills.
So now we shall assume that there are two dimensions of IQ, 𝜃 and 𝜂.
These determine average performances in math and language tests, respectively.
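We observe math scores $\{y_i\}_{i=1}^{n}$ and language scores $\{y_i\}_{i=n+1}^{2n}$.

When 𝑛 = 2, we assume that outcomes are draws from a multivariate normal distribution with representation

$$
X = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \theta \\ \eta \end{bmatrix}
= \begin{bmatrix} \mu_\theta \\ \mu_\theta \\ \mu_\eta \\ \mu_\eta \\ \mu_\theta \\ \mu_\eta \end{bmatrix}
+ \begin{bmatrix}
\sigma_y & 0 & 0 & 0 & \sigma_\theta & 0 \\
0 & \sigma_y & 0 & 0 & \sigma_\theta & 0 \\
0 & 0 & \sigma_y & 0 & 0 & \sigma_\eta \\
0 & 0 & 0 & \sigma_y & 0 & \sigma_\eta \\
0 & 0 & 0 & 0 & \sigma_\theta & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_\eta
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ w_3 \\ w_4 \\ w_5 \\ w_6 \end{bmatrix}
$$

where $w = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_6 \end{bmatrix}$ is a standard normal random vector.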
We construct a Python function construct_moments_IQ2d to construct the mean vector and covariance matrix of
the joint normal distribution.

def construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy):

    μ_IQ2d = np.empty(2*(n+1))
    μ_IQ2d[:n] = μθ
    μ_IQ2d[2*n] = μθ
    μ_IQ2d[n:2*n] = μη
    μ_IQ2d[2*n+1] = μη

    D_IQ2d = np.zeros((2*(n+1), 2*(n+1)))
    D_IQ2d[range(2*n), range(2*n)] = σy
    D_IQ2d[:n, 2*n] = σθ
    D_IQ2d[2*n, 2*n] = σθ
    D_IQ2d[n:2*n, 2*n+1] = ση
    D_IQ2d[2*n+1, 2*n+1] = ση

    Σ_IQ2d = D_IQ2d @ D_IQ2d.T

    return μ_IQ2d, Σ_IQ2d, D_IQ2d

Let’s put the function to work.

n = 2
# means and standard deviations of θ and η, and standard deviation of the noise in y
μθ, σθ, μη, ση, σy = 100., 10., 100., 10, 10

μ_IQ2d, Σ_IQ2d, D_IQ2d = construct_moments_IQ2d(n, μθ, σθ, μη, ση, σy)


μ_IQ2d, Σ_IQ2d, D_IQ2d

(array([100., 100., 100., 100., 100., 100.]),


array([[200., 100., 0., 0., 100., 0.],
[100., 200., 0., 0., 100., 0.],
[ 0., 0., 200., 100., 0., 100.],
[ 0., 0., 100., 200., 0., 100.],
[100., 100., 0., 0., 100., 0.],
[ 0., 0., 100., 100., 0., 100.]]),
array([[10., 0., 0., 0., 10., 0.],
[ 0., 10., 0., 0., 10., 0.],
[ 0., 0., 10., 0., 0., 10.],
[ 0., 0., 0., 10., 0., 10.],
[ 0., 0., 0., 0., 10., 0.],
[ 0., 0., 0., 0., 0., 10.]]))

# take one draw


x = np.random.multivariate_normal(μ_IQ2d, Σ_IQ2d)
y1 = x[:n]
y2 = x[n:2*n]
θ = x[2*n]
η = x[2*n+1]

# the true values


θ, η

(83.26886447129678, 112.92159885842455)

We first compute the joint normal distribution of (𝜃, 𝜂) conditional on the test scores.

multi_normal_IQ2d = MultivariateNormal(μ_IQ2d, Σ_IQ2d)

k = 2*n # the length of data vector


multi_normal_IQ2d.partition(k)

multi_normal_IQ2d.cond_dist(1, [*y1, *y2])

(array([ 85.61557319, 105.80129067]),


array([[33.33333333, 0. ],
[ 0. , 33.33333333]]))

Now let’s compute distributions of 𝜃 and 𝜂 separately conditional on various subsets of test scores.
It will be fun to compare outcomes with the help of an auxiliary function cond_dist_IQ2d that we now construct.

def cond_dist_IQ2d(μ, Σ, data):

    n = len(μ)

    multi_normal = MultivariateNormal(μ, Σ)
    multi_normal.partition(n-1)
    μ_hat, Σ_hat = multi_normal.cond_dist(1, data)

    return μ_hat, Σ_hat

Let’s see how things work for an example.

for indices, IQ, conditions in [([*range(2*n), 2*n], 'θ', 'y1, y2, y3, y4'),
                                ([*range(n), 2*n], 'θ', 'y1, y2'),
                                ([*range(n, 2*n), 2*n], 'θ', 'y3, y4'),
                                ([*range(2*n), 2*n+1], 'η', 'y1, y2, y3, y4'),
                                ([*range(n), 2*n+1], 'η', 'y1, y2'),
                                ([*range(n, 2*n), 2*n+1], 'η', 'y3, y4')]:

    μ_hat, Σ_hat = cond_dist_IQ2d(μ_IQ2d[indices], Σ_IQ2d[indices][:, indices],
                                  x[indices[:-1]])
    print(f'The mean and variance of {IQ} conditional on {conditions: <15} are ' +
          f'{μ_hat[0]:1.2f} and {Σ_hat[0, 0]:1.2f} respectively')

The mean and variance of θ conditional on y1, y2, y3, y4  are 85.62 and 33.33 respectively
The mean and variance of θ conditional on y1, y2          are 85.62 and 33.33 respectively
The mean and variance of θ conditional on y3, y4          are 100.00 and 100.00 respectively
The mean and variance of η conditional on y1, y2, y3, y4  are 105.80 and 33.33 respectively
The mean and variance of η conditional on y1, y2          are 100.00 and 100.00 respectively
The mean and variance of η conditional on y3, y4          are 105.80 and 33.33 respectively

Evidently, math tests provide no information about 𝜂 and language tests provide no information about 𝜃.


12.9 Univariate Time Series Analysis

We can use the multivariate normal distribution and a little matrix algebra to present foundations of univariate linear time
series analysis.
Let 𝑥𝑡 , 𝑦𝑡 , 𝑣𝑡 , 𝑤𝑡+1 each be scalars for 𝑡 ≥ 0.
Consider the following model:

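$$
\begin{aligned}
x_0 & \sim N(0, \sigma_0^2) \\
x_{t+1} & = a x_t + b w_{t+1}, \quad w_{t+1} \sim N(0, 1), \quad t \geq 0 \\
y_t & = c x_t + d v_t, \quad v_t \sim N(0, 1), \quad t \geq 0
\end{aligned}
$$

We can compute the moments of 𝑥𝑡

1. $E x_{t+1}^2 = a^2 E x_t^2 + b^2, \; t \geq 0$, where $E x_0^2 = \sigma_0^2$
2. $E x_{t+j} x_t = a^j E x_t^2, \; \forall t, \; \forall j \geq 0$

Given some 𝑇, we can formulate the sequence $\{x_t\}_{t=0}^T$ as a random vector

$$
X = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_T \end{bmatrix}
$$

and the covariance matrix Σ𝑥 can be constructed using the moments we have computed above.
Similarly, we can define

$$
Y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_T \end{bmatrix}, \quad
V = \begin{bmatrix} v_0 \\ v_1 \\ \vdots \\ v_T \end{bmatrix}
$$

and therefore

$$
Y = C X + D V
$$

where 𝐶 and 𝐷 are both diagonal matrices with constant 𝑐 and 𝑑 as diagonal respectively.
Consequently, the covariance matrix of 𝑌 is

$$
\Sigma_y = E Y Y' = C \Sigma_x C' + D D'
$$

By stacking 𝑋 and 𝑌, we can write

$$
Z = \begin{bmatrix} X \\ Y \end{bmatrix}
$$

and

$$
\Sigma_z = E Z Z' = \begin{bmatrix} \Sigma_x & \Sigma_x C' \\ C \Sigma_x & \Sigma_y \end{bmatrix}
$$

Thus, the stacked sequences $\{x_t\}_{t=0}^T$ and $\{y_t\}_{t=0}^T$ jointly follow the multivariate normal distribution $N(0, \Sigma_z)$.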

# as an example, consider the case where T = 3


T = 3


# standard deviation of the initial distribution of x_0


σ0 = 1.

# parameters of the equation system


a = .9
b = 1.
c = 1.0
d = .05

# construct the covariance matrix of X
Σx = np.empty((T+1, T+1))

Σx[0, 0] = σ0 ** 2
for i in range(T):
    Σx[i, i+1:] = Σx[i, i] * a ** np.arange(1, T+1-i)
    Σx[i+1:, i] = Σx[i, i+1:]

    Σx[i+1, i+1] = a ** 2 * Σx[i, i] + b ** 2

Σx

array([[1. , 0.9 , 0.81 , 0.729 ],


[0.9 , 1.81 , 1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541]])
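One way to cross-check this matrix is to write 𝑋 as a linear map of independent standard normal shocks. Iterating the difference equation gives 𝑥𝑡 = 𝑎ᵗ𝜎0 𝑒0 + 𝑏 ∑ₛ 𝑎ᵗ⁻ˢ 𝑤𝑠 for 𝑠 = 1, … , 𝑡, where 𝑒0 = 𝑥0 /𝜎0 . Stacking gives 𝑋 = 𝐴𝑒 with 𝐴 lower triangular, so Σ𝑥 = 𝐴𝐴′. The sketch below is an illustration under that representation; the matrix name A and the array Σx_ma are ours.

```python
import numpy as np

T, σ0, a, b = 3, 1., .9, 1.

# covariance matrix from the moment recursions in the text
Σx = np.empty((T + 1, T + 1))
Σx[0, 0] = σ0 ** 2
for i in range(T):
    Σx[i, i+1:] = Σx[i, i] * a ** np.arange(1, T + 1 - i)
    Σx[i+1:, i] = Σx[i, i+1:]
    Σx[i+1, i+1] = a ** 2 * Σx[i, i] + b ** 2

# moving-average representation X = A e with e = (x_0/σ0, w_1, ..., w_T) ~ N(0, I)
A = np.zeros((T + 1, T + 1))
for t in range(T + 1):
    A[t, 0] = a ** t * σ0              # loading on the initial condition
    for s in range(1, t + 1):
        A[t, s] = a ** (t - s) * b     # loading on the shock w_s

Σx_ma = A @ A.T

print(np.allclose(Σx, Σx_ma))
```

Since 𝐴 is lower triangular with positive diagonal entries, it is also the Cholesky factor of Σ𝑥 , which ties this construction to the earlier discussion of information as surprise.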

# construct the covariance matrix of Y


C = np.eye(T+1) * c
D = np.eye(T+1) * d

Σy = C @ Σx @ C.T + D @ D.T

# construct the covariance matrix of Z


Σz = np.empty((2*(T+1), 2*(T+1)))

Σz[:T+1, :T+1] = Σx
Σz[:T+1, T+1:] = Σx @ C.T
Σz[T+1:, :T+1] = C @ Σx
Σz[T+1:, T+1:] = Σy

Σz

array([[1. , 0.9 , 0.81 , 0.729 , 1. , 0.9 ,


0.81 , 0.729 ],
[0.9 , 1.81 , 1.629 , 1.4661 , 0.9 , 1.81 ,
1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 , 0.81 , 1.629 ,
2.4661 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541, 0.729 , 1.4661 ,
2.21949 , 2.997541],
[1. , 0.9 , 0.81 , 0.729 , 1.0025 , 0.9 ,
0.81 , 0.729 ],
[0.9 , 1.81 , 1.629 , 1.4661 , 0.9 , 1.8125 ,
1.629 , 1.4661 ],
[0.81 , 1.629 , 2.4661 , 2.21949 , 0.81 , 1.629 ,
2.4686 , 2.21949 ],
[0.729 , 1.4661 , 2.21949 , 2.997541, 0.729 , 1.4661 ,
2.21949 , 3.000041]])

# construct the mean vector of Z


μz = np.zeros(2*(T+1))

The following Python code lets us sample random vectors 𝑋 and 𝑌 .


This is going to be very useful for doing the conditioning to be used in the fun exercises below.

z = np.random.multivariate_normal(μz, Σz)

x = z[:T+1]
y = z[T+1:]

12.9.1 Smoothing Example

This is an instance of a classic smoothing calculation whose purpose is to compute 𝐸𝑋 ∣ 𝑌 .


An interpretation of this example is
• 𝑋 is a random sequence of hidden Markov state variables 𝑥𝑡
• 𝑌 is a sequence of observed signals 𝑦𝑡 bearing information about the hidden state

# construct a MultivariateNormal instance


multi_normal_ex1 = MultivariateNormal(μz, Σz)
x = z[:T+1]
y = z[T+1:]

# partition Z into X and Y


multi_normal_ex1.partition(T+1)

# compute the conditional mean and covariance matrix of X given Y=y

print("X = ", x)
print("Y = ", y)
print(" E [ X | Y] = ", )

multi_normal_ex1.cond_dist(0, y)

X = [0.84498196 0.39657404 1.96415412 1.34909681]


Y = [0.84836004 0.36291572 1.96174386 1.3549349 ]
E [ X | Y] =

(array([0.84536178, 0.36755731, 1.95676737, 1.35594775]),


array([[2.48875094e-03, 5.57449314e-06, 1.24861729e-08, 2.80236945e-11],


[5.57449314e-06, 2.48876343e-03, 5.57452116e-06, 1.25113944e-08],
[1.24861728e-08, 5.57452116e-06, 2.48876346e-03, 5.58575339e-06],
[2.80236945e-11, 1.25113941e-08, 5.58575339e-06, 2.49377812e-03]]))
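The smoothing calculation can also be checked against the closed-form conditioning formulas without the MultivariateNormal class. The following is a minimal sketch, assuming the scalar state-space model implied by the code above (state recursion with coefficients a and b, observation equation with coefficients c and d); the observation vector `y_obs` is a made-up illustration, not a draw used in the lecture.

```python
import numpy as np

T, a, b, c, d, σ0 = 3, 0.9, 1.0, 1.0, 0.05, 1.0

# covariance matrix of X = (x_0, ..., x_T), built as in the text
Σx = np.empty((T + 1, T + 1))
Σx[0, 0] = σ0 ** 2
for i in range(T):
    Σx[i, i+1:] = Σx[i, i] * a ** np.arange(1, T + 1 - i)
    Σx[i+1:, i] = Σx[i, i+1:]
    Σx[i+1, i+1] = a ** 2 * Σx[i, i] + b ** 2

C = c * np.eye(T + 1)
D = d * np.eye(T + 1)
Σy = C @ Σx @ C.T + D @ D.T

# hypothetical observation vector (illustrative values only)
y_obs = np.array([0.8, 0.4, 2.0, 1.3])

# E[X | Y = y] = Σx C' Σy^{-1} y,  Cov[X | Y] = Σx - Σx C' Σy^{-1} C Σx
gain = np.linalg.solve(Σy, C @ Σx).T   # equals Σx C' Σy^{-1} because Σx, Σy are symmetric
cond_mean = gain @ y_obs
cond_cov = Σx - gain @ C @ Σx

print(cond_mean)
print(cond_cov)
```

Because the observation noise is small (d = 0.05), the conditional mean stays close to the observed signal and the conditional variances are tiny, in line with the output shown above.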

12.9.2 Filtering Exercise

Compute 𝐸 [𝑥𝑡 ∣ 𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑦0 ].


To do so, we need to first construct the mean vector and the covariance matrix of the subvector [𝑥𝑡 , 𝑦0 , … , 𝑦𝑡−2 , 𝑦𝑡−1 ].
For example, let’s say that we want the conditional distribution of 𝑥3 .

t = 3

# mean of the subvector


sub_μz = np.zeros(t+1)

# covariance matrix of the subvector


sub_Σz = np.empty((t+1, t+1))

sub_Σz[0, 0] = Σz[t, t] # x_t


sub_Σz[0, 1:] = Σz[t, T+1:T+t+1]
sub_Σz[1:, 0] = Σz[T+1:T+t+1, t]
sub_Σz[1:, 1:] = Σz[T+1:T+t+1, T+1:T+t+1]

sub_Σz

array([[2.997541, 0.729 , 1.4661 , 2.21949 ],


[0.729 , 1.0025 , 0.9 , 0.81 ],
[1.4661 , 0.9 , 1.8125 , 1.629 ],
[2.21949 , 0.81 , 1.629 , 2.4686 ]])

multi_normal_ex2 = MultivariateNormal(sub_μz, sub_Σz)


multi_normal_ex2.partition(1)

sub_y = y[:t]

multi_normal_ex2.cond_dist(0, sub_y)

(array([1.76190901]), array([[1.00201996]]))


12.9.3 Prediction Exercise

Compute 𝐸 [𝑦𝑡 ∣ 𝑦𝑡−𝑗 , … , 𝑦0 ].


As in the filtering exercise above, we construct the mean vector and covariance matrix of the subvector
[𝑦𝑡 , 𝑦0 , … , 𝑦𝑡−𝑗−1 , 𝑦𝑡−𝑗 ].
For example, we take a case in which 𝑡 = 3 and 𝑗 = 2.

t = 3
j = 2

sub_μz = np.zeros(t-j+2)
sub_Σz = np.empty((t-j+2, t-j+2))

sub_Σz[0, 0] = Σz[T+t+1, T+t+1]


sub_Σz[0, 1:] = Σz[T+t+1, T+1:T+t-j+2]
sub_Σz[1:, 0] = Σz[T+1:T+t-j+2, T+t+1]
sub_Σz[1:, 1:] = Σz[T+1:T+t-j+2, T+1:T+t-j+2]

sub_Σz

array([[3.000041, 0.729 , 1.4661 ],


[0.729 , 1.0025 , 0.9 ],
[1.4661 , 0.9 , 1.8125 ]])

multi_normal_ex3 = MultivariateNormal(sub_μz, sub_Σz)


multi_normal_ex3.partition(1)

sub_y = y[:t-j+1]

multi_normal_ex3.cond_dist(0, sub_y)

(array([0.29476547]), array([[1.81413617]]))

12.9.4 Constructing a Wold Representation

Now we’ll apply Cholesky decomposition to decompose Σ𝑦 = 𝐻𝐻 ′ and form

𝜖 = 𝐻 −1 𝑌 .

Then we can represent 𝑦𝑡 as

𝑦𝑡 = ℎ𝑡,𝑡 𝜖𝑡 + ℎ𝑡,𝑡−1 𝜖𝑡−1 + ⋯ + ℎ𝑡,0 𝜖0 .

H = np.linalg.cholesky(Σy)


array([[1.00124922, 0. , 0. , 0. ],
[0.8988771 , 1.00225743, 0. , 0. ],
[0.80898939, 0.89978675, 1.00225743, 0. ],
[0.72809046, 0.80980808, 0.89978676, 1.00225743]])

ε = np.linalg.inv(H) @ y

array([ 0.84730157, -0.39780625, 1.63054582, -0.40605746])

array([0.84836004, 0.36291572, 1.96174386, 1.3549349 ])

This example is an instance of what is known as a Wold representation in time series analysis.
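A defining feature of the innovations in this representation is that they are uncorrelated with unit variance. Here is a small sketch that checks this numerically; the 3 × 3 covariance matrix is an illustrative assumption (it reuses entries printed earlier), and solving the triangular system is used in place of an explicit inverse.

```python
import numpy as np

# an illustrative positive definite covariance matrix
Σy = np.array([[1.0025, 0.9,    0.81  ],
               [0.9,    1.8125, 1.629 ],
               [0.81,   1.629,  2.4686]])

H = np.linalg.cholesky(Σy)        # lower triangular factor, Σy = H H'

# covariance of ε = H^{-1} Y is H^{-1} Σy (H^{-1})' = I
H_inv = np.linalg.inv(H)
Σε = H_inv @ Σy @ H_inv.T

# recover ε from a draw of Y by solving H ε = y, then rebuild y from ε
rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(3), Σy)
ε = np.linalg.solve(H, y)
y_rebuilt = H @ ε                 # y_t = h_{t,t} ε_t + ... + h_{t,0} ε_0

print(Σε)
```

Because H is lower triangular, each 𝑦𝑡 depends only on current and past innovations, which is what makes the representation useful for forecasting.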

12.10 Stochastic Difference Equation

Consider the stochastic second-order linear difference equation


𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 + 𝑢𝑡
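where $u_t \sim N(0, \sigma_u^2)$ and

$$
\begin{bmatrix} y_{-1} \\ y_0 \end{bmatrix} \sim N(\mu_{\tilde y}, \Sigma_{\tilde y})
$$

It can be written as a stacked system

$$
\underbrace{\begin{bmatrix}
1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\
-\alpha_2 & -\alpha_1 & 1 & 0 & \cdots & 0 & 0 & 0 \\
0 & -\alpha_2 & -\alpha_1 & 1 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -\alpha_2 & -\alpha_1 & 1
\end{bmatrix}}_{\equiv A}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ \vdots \\ y_T \end{bmatrix}
=
\underbrace{\begin{bmatrix} \alpha_0 + \alpha_1 y_0 + \alpha_2 y_{-1} \\ \alpha_0 + \alpha_2 y_0 \\ \alpha_0 \\ \alpha_0 \\ \vdots \\ \alpha_0 \end{bmatrix}}_{\equiv b}
+
\underbrace{\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ \vdots \\ u_T \end{bmatrix}}_{\equiv u}
$$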

We can compute 𝑦 by solving the system


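$$
y = A^{-1}(b + u)
$$

We have

$$
\begin{aligned}
\mu_y &= A^{-1} \mu_b \\
\Sigma_y &= A^{-1} E\left[(b - \mu_b + u)(b - \mu_b + u)'\right] (A^{-1})' \\
         &= A^{-1} (\Sigma_b + \Sigma_u) (A^{-1})'
\end{aligned}
$$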
where
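$$
\mu_b = \begin{bmatrix} \alpha_0 + \alpha_1 \mu_{y_0} + \alpha_2 \mu_{y_{-1}} \\ \alpha_0 + \alpha_2 \mu_{y_0} \\ \alpha_0 \\ \vdots \\ \alpha_0 \end{bmatrix}
$$

$$
\Sigma_b = \begin{bmatrix} C \Sigma_{\tilde y} C' & 0_{2 \times (T-2)} \\ 0_{(T-2) \times 2} & 0_{(T-2) \times (T-2)} \end{bmatrix}, \quad
C = \begin{bmatrix} \alpha_2 & \alpha_1 \\ 0 & \alpha_2 \end{bmatrix}
$$

$$
\Sigma_u = \begin{bmatrix} \sigma_u^2 & 0 & \cdots & 0 \\ 0 & \sigma_u^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \sigma_u^2 \end{bmatrix}
$$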
# set parameters
T = 160

# coefficients of the second order difference equation
α0 = 10
α1 = 1.53
α2 = -.9

# standard deviation of u
σu = 10.

# distribution of y_{-1} and y_{0}
μy_tilde = np.array([1., 0.5])
Σy_tilde = np.array([[2., 1.], [1., 0.5]])

# construct A and its inverse
A = np.zeros((T, T))

for i in range(T):
    A[i, i] = 1

    if i-1 >= 0:
        A[i, i-1] = -α1

    if i-2 >= 0:
        A[i, i-2] = -α2

A_inv = np.linalg.inv(A)

# compute the mean vectors of b and y
μb = np.full(T, α0, dtype=float)  # float dtype so the updates below are not truncated
μb[0] += α1 * μy_tilde[1] + α2 * μy_tilde[0]
μb[1] += α2 * μy_tilde[1]

μy = A_inv @ μb

# compute the covariance matrices of b and y
Σu = np.eye(T) * σu ** 2

Σb = np.zeros((T, T))

C = np.array([[α2, α1], [0, α2]])
Σb[:2, :2] = C @ Σy_tilde @ C.T

Σy = A_inv @ (Σb + Σu) @ A_inv.T
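To see that the stacked system really encodes the second-order difference equation, one can compare its solution with a direct recursion. The following is a minimal sketch; the short horizon, the initial conditions `y_m1` and `y_0`, and the random seed are illustrative assumptions.

```python
import numpy as np

T = 6
α0, α1, α2 = 10.0, 1.53, -0.9
y_m1, y_0 = 1.0, 0.5                 # illustrative values for y_{-1} and y_0

rng = np.random.default_rng(1)
u = rng.normal(size=T)               # shocks u_1, ..., u_T

# stacked system A y = b + u
A = np.eye(T) - α1 * np.eye(T, k=-1) - α2 * np.eye(T, k=-2)
b = np.full(T, α0)
b[0] += α1 * y_0 + α2 * y_m1
b[1] += α2 * y_0
y_stacked = np.linalg.solve(A, b + u)

# direct recursion y_t = α0 + α1 y_{t-1} + α2 y_{t-2} + u_t
path = [y_m1, y_0]
for t in range(T):
    path.append(α0 + α1 * path[-1] + α2 * path[-2] + u[t])
y_recursive = np.array(path[2:])

print(np.max(np.abs(y_stacked - y_recursive)))
```

Using `np.linalg.solve` rather than forming the inverse of A is a design choice for this sketch; the lecture code forms the inverse because it is reused for both the mean and the covariance.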


12.11 Application to Stock Price Model

Let
$$
p_t = \sum_{j=0}^{T-t} \beta^j y_{t+j}
$$

Form
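$$
\underbrace{\begin{bmatrix} p_1 \\ p_2 \\ p_3 \\ \vdots \\ p_T \end{bmatrix}}_{\equiv p}
=
\underbrace{\begin{bmatrix}
1 & \beta & \beta^2 & \cdots & \beta^{T-1} \\
0 & 1 & \beta & \cdots & \beta^{T-2} \\
0 & 0 & 1 & \cdots & \beta^{T-3} \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}}_{\equiv B}
\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{bmatrix}
$$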

we have
𝜇𝑝 = 𝐵𝜇𝑦
Σ𝑝 = 𝐵Σ𝑦 𝐵′

β = .96

# construct B
B = np.zeros((T, T))

for i in range(T):
    B[i, i:] = β ** np.arange(0, T-i)

Denote
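$$
z = \begin{bmatrix} y \\ p \end{bmatrix} = \underbrace{\begin{bmatrix} I \\ B \end{bmatrix}}_{\equiv D} y
$$

Thus, $\{y_t\}_{t=1}^T$ and $\{p_t\}_{t=1}^T$ jointly follow the multivariate normal distribution $N(\mu_z, \Sigma_z)$, where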

𝜇𝑧 = 𝐷𝜇𝑦

Σ𝑧 = 𝐷Σ𝑦 𝐷′

D = np.vstack([np.eye(T), B])

μz = D @ μy
Σz = D @ Σy @ D.T

We can simulate paths of 𝑦𝑡 and 𝑝𝑡 and compute the conditional mean 𝐸 [𝑝𝑡 ∣ 𝑦𝑡−1 , 𝑦𝑡 ] using the MultivariateNormal class.

z = np.random.multivariate_normal(μz, Σz)
y, p = z[:T], z[T:]


cond_Ep = np.empty(T-1)

sub_μ = np.empty(3)
sub_Σ = np.empty((3, 3))
for t in range(2, T+1):
    sub_μ[:] = μz[[t-2, t-1, T-1+t]]
    sub_Σ[:, :] = Σz[[t-2, t-1, T-1+t], :][:, [t-2, t-1, T-1+t]]

    multi_normal = MultivariateNormal(sub_μ, sub_Σ)
    multi_normal.partition(2)

    cond_Ep[t-2] = multi_normal.cond_dist(1, y[t-2:t])[0][0]

plt.plot(range(1, T), y[1:], label='$y_{t}$')


plt.plot(range(1, T), y[:-1], label='$y_{t-1}$')
plt.plot(range(1, T), p[1:], label='$p_{t}$')
plt.plot(range(1, T), cond_Ep, label='$Ep_{t}|y_{t}, y_{t-1}$')

plt.xlabel('t')
plt.legend(loc=1)
plt.show()

In the above graph, the green line is what the price of the stock would be if people had perfect foresight about the path of dividends, while the red line is the conditional expectation 𝐸[𝑝𝑡 ∣ 𝑦𝑡 , 𝑦𝑡−1 ], which is what the price would be if people did not have perfect foresight but were optimally predicting future dividends on the basis of the information 𝑦𝑡 , 𝑦𝑡−1 at time 𝑡.


12.12 Filtering Foundations

Assume that 𝑥0 is an 𝑛 × 1 random vector and that 𝑦0 is a 𝑝 × 1 random vector determined by the observation equation

𝑦0 = 𝐺𝑥0 + 𝑣0 , 𝑥0 ∼ 𝒩(𝑥0̂ , Σ0 ), 𝑣0 ∼ 𝒩(0, 𝑅)

where 𝑣0 is orthogonal to 𝑥0 , 𝐺 is a 𝑝 × 𝑛 matrix, and 𝑅 is a 𝑝 × 𝑝 positive definite matrix.


We consider the problem of someone who
• observes 𝑦0
• does not observe 𝑥0 ,
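• knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$
• wants to infer $x_0$ from $y_0$ in light of what he knows about that joint probability distribution.
Therefore, the person wants to construct the probability distribution of $x_0$ conditional on the random vector $y_0$.
The joint distribution of $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ is multivariate normal $\mathcal{N}(\mu, \Sigma)$ with

$$
\mu = \begin{bmatrix} \hat x_0 \\ G \hat x_0 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} \Sigma_0 & \Sigma_0 G' \\ G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix}
$$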

By applying an appropriate instance of the above formulas for the mean vector 𝜇1̂ and covariance matrix Σ̂ 11 of 𝑧1
conditional on 𝑧2 , we find that the probability distribution of 𝑥0 conditional on 𝑦0 is 𝒩(𝑥0̃ , Σ̃ 0 ) where

𝛽0 = Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1
𝑥0̃ = 𝑥0̂ + 𝛽0 (𝑦0 − 𝐺𝑥0̂ )
Σ̃ 0 = Σ0 − Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1 𝐺Σ0

We can express our finding that the probability distribution of 𝑥0 conditional on 𝑦0 is 𝒩(𝑥0̃ , Σ̃ 0 ) by representing 𝑥0 as

𝑥0 = 𝑥0̃ + 𝜁0 (12.2)

where 𝜁0 is a Gaussian random vector that is orthogonal to 𝑥0̃ and 𝑦0 and that has mean vector 0 and conditional covariance
matrix 𝐸[𝜁0 𝜁0′ |𝑦0 ] = Σ̃ 0 .
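The three formulas for 𝛽0, 𝑥̃0 and Σ̃0 translate directly into a few lines of NumPy. The sketch below is one way to code them; the function name `update` and the numerical values are illustrative assumptions, and the prior covariance is chosen to be symmetric.

```python
import numpy as np

def update(x_hat, Σ, G, R, y):
    """Return the mean and covariance of x_0 conditional on y_0 = y."""
    S = G @ Σ @ G.T + R                    # covariance of y_0
    β = np.linalg.solve(S, G @ Σ).T        # Σ G' S^{-1} for symmetric Σ and S
    x_tilde = x_hat + β @ (y - G @ x_hat)
    Σ_tilde = Σ - β @ G @ Σ
    return x_tilde, Σ_tilde

# illustrative inputs (assumptions, not the values used elsewhere in the text)
x_hat = np.array([0.0, 1.0])
Σ = np.array([[1.0, 0.4], [0.4, 2.0]])
G = np.array([[1.0, 3.0]])
R = np.array([[1.0]])
y = np.array([2.3])

x_tilde, Σ_tilde = update(x_hat, Σ, G, R, y)
print(x_tilde)
print(Σ_tilde)
```

Observing 𝑦0 can only reduce uncertainty about 𝑥0, so Σ0 − Σ̃0 is positive semi-definite; in particular the variance of 𝐺𝑥0 falls after conditioning.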

12.12.1 Step toward dynamics

Now suppose that we are in a time series setting and that we have the one-step state transition equation

𝑥1 = 𝐴𝑥0 + 𝐶𝑤1 , 𝑤1 ∼ 𝒩(0, 𝐼)

where 𝐴 is an 𝑛 × 𝑛 matrix and 𝐶 is an 𝑛 × 𝑚 matrix.


Using equation (12.2), we can also represent 𝑥1 as

𝑥1 = 𝐴(𝑥0̃ + 𝜁0 ) + 𝐶𝑤1

It follows that

𝐸𝑥1 |𝑦0 = 𝐴𝑥0̃


and that the corresponding conditional covariance matrix 𝐸(𝑥1 − 𝐸𝑥1 |𝑦0 )(𝑥1 − 𝐸𝑥1 |𝑦0 )′ ≡ Σ1 is

Σ1 = 𝐴Σ̃ 0 𝐴′ + 𝐶𝐶 ′

or

$$
\Sigma_1 = C C' + A \Sigma_0 A' - A \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} G \Sigma_0 A'
$$

We can write the mean of 𝑥1 conditional on 𝑦0 as

𝑥1̂ = 𝐴𝑥0̂ + 𝐴Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1 (𝑦0 − 𝐺𝑥0̂ )

or

𝑥1̂ = 𝐴𝑥0̂ + 𝐾0 (𝑦0 − 𝐺𝑥0̂ )

where

𝐾0 = 𝐴Σ0 𝐺′ (𝐺Σ0 𝐺′ + 𝑅)−1

12.12.2 Dynamic version

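Suppose now that for $t \geq 0$, $\{x_{t+1}, y_t\}_{t=0}^\infty$ are governed by the equations

$$
\begin{aligned}
x_{t+1} &= A x_t + C w_{t+1} \\
y_t &= G x_t + v_t
\end{aligned}
$$

where as before $x_0 \sim \mathcal{N}(\hat x_0, \Sigma_0)$, $w_{t+1}$ is the $(t+1)$th component of an i.i.d. stochastic process distributed as $w_{t+1} \sim \mathcal{N}(0, I)$, $v_t$ is the $t$th component of an i.i.d. process distributed as $v_t \sim \mathcal{N}(0, R)$, and the $\{w_{t+1}\}_{t=0}^\infty$ and $\{v_t\}_{t=0}^\infty$ processes are orthogonal at all pairs of dates.
The logic and formulas that we applied above imply that the probability distribution of $x_t$ conditional on $y^{t-1} \equiv (y_0, y_1, \ldots, y_{t-1})$ is

$$
x_t | y^{t-1} \sim \mathcal{N}(A \tilde x_{t-1}, A \tilde \Sigma_{t-1} A' + C C')
$$

where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be computed by iterating on the following equations starting from $t = 1$ and initial conditions for $\tilde x_0, \tilde \Sigma_0$ computed as we have above:

$$
\begin{aligned}
\Sigma_t &= A \tilde \Sigma_{t-1} A' + C C' \\
\hat x_t &= A \tilde x_{t-1} \\
\beta_t &= \Sigma_t G' (G \Sigma_t G' + R)^{-1} \\
\tilde x_t &= \hat x_t + \beta_t (y_t - G \hat x_t) \\
\tilde \Sigma_t &= \Sigma_t - \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t
\end{aligned}
$$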

If we shift the first equation forward one period and then substitute the expression for Σ̃ 𝑡 on the right side of the fifth
equation into it we obtain

Σ𝑡+1 = 𝐶𝐶 ′ + 𝐴Σ𝑡 𝐴′ − 𝐴Σ𝑡 𝐺′ (𝐺Σ𝑡 𝐺′ + 𝑅)−1 𝐺Σ𝑡 𝐴′ .

This is a matrix Riccati difference equation that is closely related to another matrix Riccati difference equation that appears
in a quantecon lecture on the basics of linear quadratic control theory.


That equation has the form

𝑃𝑡−1 = 𝑅 + 𝐴′ 𝑃𝑡 𝐴 − 𝐴′ 𝑃𝑡 𝐵(𝐵′ 𝑃𝑡 𝐵 + 𝑄)−1 𝐵′ 𝑃𝑡 𝐴.

Stare at the two preceding equations for a moment or two, the first being a matrix difference equation for a conditional covariance matrix, the second being a matrix difference equation in the matrix appearing in a quadratic form for an intertemporal cost or value function.
Although the two equations are not identical, they display striking family resemblances.
• the first equation tells dynamics that work forward in time
• the second equation tells dynamics that work backward in time
• while many of the terms are similar, one equation seems to apply matrix transformations to some matrices that play similar roles in the other equation
The family resemblances of these two equations reflect a transcendent duality that prevails between control theory and filtering theory.
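The forward Riccati recursion for the conditional covariance matrix can be iterated numerically until it settles down. Here is a minimal sketch; the helper name `riccati_step`, the starting value, the tolerance and the iteration cap are assumptions made for illustration, while the matrices A, C, G, R are the ones used in the example in the next subsection.

```python
import numpy as np

def riccati_step(Σ, A, C, G, R):
    """Map Σ_t into Σ_{t+1} using the filtering Riccati equation."""
    S = G @ Σ @ G.T + R
    K = A @ Σ @ G.T @ np.linalg.inv(S)
    return C @ C.T + A @ Σ @ A.T - K @ G @ Σ @ A.T

A = np.array([[0.5, 0.2], [-0.1, 0.3]])
C = np.array([[2.0], [1.0]])
G = np.array([[1.0, 3.0]])
R = np.array([[1.0]])

Σ = np.eye(2)                       # illustrative symmetric starting value
for it in range(500):
    Σ_next = riccati_step(Σ, A, C, G, R)
    if np.max(np.abs(Σ_next - Σ)) < 1e-10:
        break
    Σ = Σ_next

print(it, Σ_next)
```

The limit is a fixed point of the recursion, so it does not depend on the observations; only the conditional means do.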

12.12.3 An example

We can use the Python class MultivariateNormal to construct examples.


Here is an example for a single period problem at time 0

G = np.array([[1., 3.]])
R = np.array([[1.]])

x0_hat = np.array([0., 1.])


Σ0 = np.array([[1., .5], [.3, 2.]])

μ = np.hstack([x0_hat, G @ x0_hat])
Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]])

# construction of the multivariate normal instance


multi_normal = MultivariateNormal(μ, Σ)

multi_normal.partition(2)

# the observation of y
y0 = 2.3

# conditional distribution of x0
μ1_hat, Σ11 = multi_normal.cond_dist(0, y0)
μ1_hat, Σ11

(array([-0.078125, 0.803125]),
array([[ 0.72098214, -0.203125 ],
[-0.403125 , 0.228125 ]]))

A = np.array([[0.5, 0.2], [-0.1, 0.3]])


C = np.array([[2.], [1.]])



# conditional distribution of x1
x1_cond = A @ μ1_hat
Σ1_cond = C @ C.T + A @ Σ11 @ A.T
x1_cond, Σ1_cond

(array([0.1215625, 0.24875 ]),


array([[4.12874554, 1.95523214],
[1.92123214, 1.04592857]]))

12.12.4 Code for Iterating

Here is code for solving a dynamic filtering problem by iterating on our equations, followed by an example.

def iterate(x0_hat, Σ0, A, C, G, R, y_seq):

    p, n = G.shape

    T = len(y_seq)
    x_hat_seq = np.empty((T+1, n))
    Σ_hat_seq = np.empty((T+1, n, n))

    x_hat_seq[0] = x0_hat
    Σ_hat_seq[0] = Σ0

    for t in range(T):
        xt_hat = x_hat_seq[t]
        Σt = Σ_hat_seq[t]
        μ = np.hstack([xt_hat, G @ xt_hat])
        Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]])

        # filtering
        multi_normal = MultivariateNormal(μ, Σ)
        multi_normal.partition(n)
        x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t])

        # forecasting
        x_hat_seq[t+1] = A @ x_tilde
        Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T

    return x_hat_seq, Σ_hat_seq

iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2])

(array([[0. , 1. ],
[0.1215625 , 0.24875 ],
[0.18680212, 0.06904689],
[0.75576875, 0.05558463]]),
array([[[1. , 0.5 ],
[0.3 , 2. ]],

[[4.12874554, 1.95523214],
[1.92123214, 1.04592857]],

[[4.08198663, 1.99218488],
[1.98640488, 1.00886423]],

[[4.06457628, 2.00041999],
[1.99943739, 1.00275526]]]))

The iterative algorithm just described is a version of the celebrated Kalman filter.
We describe the Kalman filter and some applications of it in the lecture A First Look at the Kalman Filter.

12.13 Classic Factor Analysis Model

The factor analysis model widely used in psychology and other fields can be represented as

𝑌 = Λ𝑓 + 𝑈

where
1. 𝑌 is 𝑛 × 1 random vector, 𝐸𝑈 𝑈 ′ = 𝐷 is a diagonal matrix,
2. Λ is 𝑛 × 𝑘 coefficient matrix,
3. 𝑓 is 𝑘 × 1 random vector, 𝐸𝑓𝑓 ′ = 𝐼,
4. 𝑈 is 𝑛 × 1 random vector, and 𝑈 ⟂ 𝑓 (i.e., 𝐸𝑈 𝑓 ′ = 0 )
5. It is presumed that 𝑘 is small relative to 𝑛; often 𝑘 is only 1 or 2, as in our IQ examples.
This implies that

Σ𝑦 = 𝐸𝑌 𝑌 ′ = ΛΛ′ + 𝐷
𝐸𝑌 𝑓 ′ = Λ
𝐸𝑓𝑌 ′ = Λ′

Thus, the covariance matrix Σ𝑦 is the sum of a diagonal matrix 𝐷 and a positive semi-definite matrix ΛΛ′ of rank 𝑘.
This means that all covariances among the 𝑛 components of the 𝑌 vector are intermediated by their common dependencies on the 𝑘 < 𝑛 factors.
Form

$$
Z = \begin{pmatrix} f \\ Y \end{pmatrix}
$$

The covariance matrix of the expanded random vector 𝑍 can be computed as

$$
\Sigma_z = E Z Z' = \begin{pmatrix} I & \Lambda' \\ \Lambda & \Lambda \Lambda' + D \end{pmatrix}
$$

In the following, we first construct the mean vector and the covariance matrix for the case where 𝑁 = 10 and 𝑘 = 2.

N = 10
k = 2


We set the coefficient matrix Λ and the covariance matrix of 𝑈 to be


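$$
\Lambda = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix}, \quad
D = \begin{pmatrix} \sigma_u^2 & 0 & \cdots & 0 \\ 0 & \sigma_u^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & \cdots & \sigma_u^2 \end{pmatrix}
$$

where the first half of the first column of Λ is filled with 1s and the second half with 0s, and the reverse holds for the second column.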
𝐷 is a diagonal matrix with parameter 𝜎𝑢2 on the diagonal.

Λ = np.zeros((N, k))
Λ[:N//2, 0] = 1
Λ[N//2:, 1] = 1

σu = .5
D = np.eye(N) * σu ** 2

# compute Σy
Σy = Λ @ Λ.T + D

We can now construct the mean vector and the covariance matrix for 𝑍.

μz = np.zeros(k+N)

Σz = np.empty((k+N, k+N))

Σz[:k, :k] = np.eye(k)


Σz[:k, k:] = Λ.T
Σz[k:, :k] = Λ
Σz[k:, k:] = Σy

z = np.random.multivariate_normal(μz, Σz)

f = z[:k]
y = z[k:]

multi_normal_factor = MultivariateNormal(μz, Σz)


multi_normal_factor.partition(k)

Let’s compute the conditional distribution of the hidden factor 𝑓 on the observations 𝑌 , namely, 𝑓 ∣ 𝑌 = 𝑦.

multi_normal_factor.cond_dist(0, y)

(array([-0.30191322, 1.22653669]),
array([[0.04761905, 0. ],
[0. , 0.04761905]]))

We can verify that the conditional mean $E[f \mid Y = y] = B y$ where $B = \Lambda' \Sigma_y^{-1}$.


B = Λ.T @ np.linalg.inv(Σy)

B @ y

array([-0.30191322, 1.22653669])
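The conditional covariance of 𝑓 given 𝑌 reported above can be checked the same way. A short sketch, assuming the same Λ and 𝐷 as in the text, computes it from the standard conditioning formula and from the equivalent matrix-inversion-lemma form; the variable names `cov_direct` and `cov_woodbury` are ours.

```python
import numpy as np

N, k, σu = 10, 2, 0.5
Λ = np.zeros((N, k))
Λ[:N//2, 0] = 1
Λ[N//2:, 1] = 1
D = σu ** 2 * np.eye(N)
Σy = Λ @ Λ.T + D

B = Λ.T @ np.linalg.inv(Σy)

# Cov[f | Y] = I - Λ' Σy^{-1} Λ
cov_direct = np.eye(k) - B @ Λ

# equivalent form (I + Λ' D^{-1} Λ)^{-1}, which only inverts k x k and diagonal matrices
cov_woodbury = np.linalg.inv(np.eye(k) + Λ.T @ np.linalg.inv(D) @ Λ)

print(cov_direct)
```

With five observations loading on each factor and noise variance 0.25, each diagonal entry equals 1/(1 + 5/0.25) = 1/21 ≈ 0.0476, matching the output of cond_dist above.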

Similarly, we can compute the conditional distribution 𝑌 ∣ 𝑓.

multi_normal_factor.cond_dist(1, f)

(array([-0.1949429 , -0.1949429 , -0.1949429 , -0.1949429 , -0.1949429 ,


1.36894286, 1.36894286, 1.36894286, 1.36894286, 1.36894286]),
array([[0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25, 0. ],
[0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0.25]]))

It can be verified that the mean is Λ𝐼 −1 𝑓 = Λ𝑓.

Λ @ f

array([-0.1949429 , -0.1949429 , -0.1949429 , -0.1949429 , -0.1949429 ,


1.36894286, 1.36894286, 1.36894286, 1.36894286, 1.36894286])

12.14 PCA and Factor Analysis

To learn about Principal Components Analysis (PCA), please see this lecture Singular Value Decompositions.
For fun, let’s apply a PCA decomposition to a covariance matrix Σ𝑦 that in fact is governed by our factor-analytic model.
Technically, this means that the PCA model is misspecified. (Can you explain why?)
Nevertheless, this exercise will let us study how well the first two principal components from a PCA can approximate the
conditional expectations 𝐸𝑓𝑖 |𝑌 for our two factors 𝑓𝑖 , 𝑖 = 1, 2 for the factor analytic model that we have assumed truly
governs the data on 𝑌 we have generated.
So we compute the PCA decomposition

$$
\Sigma_y = P \tilde\Lambda P'
$$

where $\tilde\Lambda$ is a diagonal matrix.


We have

𝑌 = 𝑃𝜖


and

𝜖 = 𝑃 ′𝑌

Note that we will arrange the eigenvectors in 𝑃 in the descending order of eigenvalues.

λ_tilde, P = np.linalg.eigh(Σy)

# arrange the eigenvectors by eigenvalues
ind = sorted(range(N), key=lambda x: λ_tilde[x], reverse=True)

P = P[:, ind]
λ_tilde = λ_tilde[ind]
Λ_tilde = np.diag(λ_tilde)

print('λ_tilde =', λ_tilde)

λ_tilde = [5.25 5.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25 0.25]

# verify the orthogonality of eigenvectors


np.abs(P @ P.T - np.eye(N)).max()

4.440892098500626e-16

# verify the eigenvalue decomposition is correct


P @ Λ_tilde @ P.T

array([[1.25, 1. , 1. , 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1.25, 1. , 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1.25, 1. , 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1.25, 1. , 0. , 0. , 0. , 0. , 0. ],
[1. , 1. , 1. , 1. , 1.25, 0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. , 1.25, 1. , 1. , 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1.25, 1. , 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1.25, 1. , 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1. , 1.25, 1. ],
[0. , 0. , 0. , 0. , 0. , 1. , 1. , 1. , 1. , 1.25]])

ε = P.T @ y

print("ε = ", ε)

ε = [ 2.87975038 -0.70885341 -0.0648366 0.11824707 0.23763429 -0.25914236


-0.06501703 -0.25015218 -0.30772868 -0.37248783]

# print the values of the two factors

print('f = ', f)

f = [-0.1949429 1.36894286]

Below we’ll plot several things


• the 𝑁 values of 𝑦
• the 𝑁 values of the principal components 𝜖
• the value of the first factor 𝑓1 plotted only for the first 𝑁 /2 observations of 𝑦 for which it receives a non-zero
loading in Λ
• the value of the second factor 𝑓2 plotted only for the final 𝑁 /2 observations for which it receives a non-zero loading
in Λ

plt.scatter(range(N), y, label='y')
plt.scatter(range(N), ε, label=r'$\epsilon$')
plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')
plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')
plt.legend()

plt.show()

Consequently, the first two 𝜖𝑗 correspond to the largest two eigenvalues.


Let’s look at them, after which we’ll look at 𝐸𝑓|𝑦 = 𝐵𝑦

ε[:2]

array([ 2.87975038, -0.70885341])

# compare with Ef|y


B @ y

array([-0.30191322, 1.22653669])

The fraction of variance in 𝑦𝑡 explained by the first two principal components can be computed as below.

λ_tilde[:2].sum() / λ_tilde.sum()


0.84

Compute

$$
\hat Y = P_j \epsilon_j + P_k \epsilon_k
$$

where $P_j$ and $P_k$ correspond to the largest two eigenvalues.

y_hat = P[:, :2] @ ε[:2]

In this example, it turns out that the projection 𝑌̂ of 𝑌 on the first two principal components does a good job of approximating 𝐸𝑓 ∣ 𝑦.
We confirm this in the following plot, which shows 𝑓, 𝐸𝑦 ∣ 𝑓, 𝐸𝑓 ∣ 𝑦, and 𝑦̂ on the vertical axis against the index of the components of 𝑦 on the horizontal axis.

plt.scatter(range(N), Λ @ f, label='$Ey|f$')
plt.scatter(range(N), y_hat, label=r'$\hat{y}$')
plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$')
plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$')

Efy = B @ y
plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$')
plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$')
plt.legend()

plt.show()

The covariance matrix of 𝑌̂ can be computed by first constructing the covariance matrix of 𝜖 and then using the upper left block for 𝜖1 and 𝜖2 .

Σεjk = (P.T @ Σy @ P)[:2, :2]

Pjk = P[:, :2]

Σy_hat = Pjk @ Σεjk @ Pjk.T


print('Σy_hat = \n', Σy_hat)


Σy_hat =
[[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[1.05 1.05 1.05 1.05 1.05 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]
[0. 0. 0. 0. 0. 1.05 1.05 1.05 1.05 1.05]]



CHAPTER

THIRTEEN

FAULT TREE UNCERTAINTIES

13.1 Overview

This lecture puts elementary tools to work to approximate probability distributions of the annual failure rates of a system
consisting of a number of critical parts.
We’ll use log normal distributions to approximate probability distributions of critical component parts.
To approximate the probability distribution of the sum of 𝑛 log normal probability distributions that describes the failure
rate of the entire system, we’ll compute the convolution of those 𝑛 log normal probability distributions.
We’ll use the following concepts and tools:
• log normal distributions
• the convolution theorem that describes the probability distribution of the sum of independent random variables
• fault tree analysis for approximating a failure rate of a multi-component system
• a hierarchical probability model for describing uncertain probabilities
• Fourier transforms and inverse Fourier transforms as efficient ways of computing convolutions of sequences
For more about Fourier transforms see the quantecon lecture Circulant Matrices as well as the lectures Covariance Stationary Processes and Estimation of Spectra.
El-Shanawany, Ardron, and Walker [El-Shanawany et al., 2018] and Greenfield and Sargent [Greenfield and Sargent,
1993] used some of the methods described here to approximate probabilities of failures of safety systems in nuclear
facilities.
These methods respond to some of the recommendations made by Apostolakis [Apostolakis, 1990] for constructing
procedures for quantifying uncertainty about the reliability of a safety system.
We’ll start by bringing in some Python machinery.

!pip install tabulate


import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import fftconvolve
from tabulate import tabulate
import time

np.set_printoptions(precision=3, suppress=True)

13.2 Log normal distribution

If a random variable 𝑥 follows a normal distribution with mean 𝜇 and variance 𝜎², then 𝑦 = exp(𝑥) follows a log normal distribution with parameters 𝜇, 𝜎².
Notice that we said parameters and not mean and variance 𝜇, 𝜎².
• 𝜇 and 𝜎² are the mean and variance of 𝑥 = log(𝑦)
• they are not the mean and variance of 𝑦
• instead, the mean of 𝑦 is $e^{\mu + \frac{1}{2}\sigma^2}$ and the variance of 𝑦 is $(e^{\sigma^2} - 1) e^{2\mu + \sigma^2}$
A log normal random variable 𝑦 is nonnegative.
The density for a log normal random variate 𝑦 is

$$
f(y) = \frac{1}{y \sigma \sqrt{2 \pi}} \exp \left( \frac{-(\log y - \mu)^2}{2 \sigma^2} \right)
$$

for 𝑦 > 0.
Important features of a log normal random variable are

$$
\begin{aligned}
\text{mean:} & \quad e^{\mu + \frac{1}{2}\sigma^2} \\
\text{variance:} & \quad (e^{\sigma^2} - 1) e^{2\mu + \sigma^2} \\
\text{median:} & \quad e^{\mu} \\
\text{mode:} & \quad e^{\mu - \sigma^2} \\
\text{.95 quantile:} & \quad e^{\mu + 1.645\sigma} \\
\text{.95-.05 quantile ratio:} & \quad e^{3.29\sigma}
\end{aligned}
$$

Recall the following stability property of two independent normally distributed random variables:
If 𝑥1 is normal with mean 𝜇1 and variance 𝜎12 and 𝑥2 is independent of 𝑥1 and normal with mean 𝜇2 and variance 𝜎22 ,
then 𝑥1 + 𝑥2 is normally distributed with mean 𝜇1 + 𝜇2 and variance 𝜎12 + 𝜎22 .
Independent log normal distributions have a different stability property.
The product of independent log normal random variables is also log normal.
In particular, if 𝑦1 is log normal with parameters (𝜇1 , 𝜎12 ) and 𝑦2 is log normal with parameters (𝜇2 , 𝜎22 ), then the product
𝑦1 𝑦2 is log normal with parameters (𝜇1 + 𝜇2 , 𝜎12 + 𝜎22 ).

Note: While the product of two log normal distributions is log normal, the sum of two log normal distributions is not
log normal.
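A quick simulation illustrates both statements. The sketch below uses illustrative parameter values, sample size and seed (all assumptions): the log of the product has sample moments close to 𝜇1 + 𝜇2 and 𝜎1² + 𝜎2², while for the sum only the mean has a simple closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
μ1, σ1 = 0.5, 0.3        # illustrative parameters
μ2, σ2 = 1.0, 0.4
n = 200_000

y1 = rng.lognormal(μ1, σ1, n)
y2 = rng.lognormal(μ2, σ2, n)

# the log of the product is normal with mean μ1 + μ2 and variance σ1² + σ2²
log_prod = np.log(y1 * y2)
log_mean, log_var = log_prod.mean(), log_prod.var()

# the sum is not log normal, but its mean is the sum of the two means
pop_mean_sum = np.exp(μ1 + σ1**2 / 2) + np.exp(μ2 + σ2**2 / 2)
samp_mean_sum = (y1 + y2).mean()

print(log_mean, log_var)
print(pop_mean_sum, samp_mean_sum)
```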


This observation sets the stage for the challenge that confronts us in this lecture, namely, to approximate probability distributions of sums of independent log normal random variables.
To compute the probability distribution of the sum of two log normal distributions, we can use the following convolution property of a probability distribution that is a sum of independent random variables.

13.3 The Convolution Property

Let 𝑥 be a random variable with probability density 𝑓(𝑥), where 𝑥 ∈ R.


Let 𝑦 be a random variable with probability density 𝑔(𝑦), where 𝑦 ∈ R.
Let 𝑥 and 𝑦 be independent random variables and let 𝑧 = 𝑥 + 𝑦 ∈ R.
Then the probability distribution of 𝑧 is

$$
h(z) = (f * g)(z) \equiv \int_{-\infty}^\infty f(\tau) g(z - \tau) d\tau
$$

where (𝑓 ∗ 𝑔) denotes the convolution of the two functions 𝑓 and 𝑔.


If the random variables are both nonnegative, then the above formula specializes to

$$
h(z) = (f * g)(z) \equiv \int_{0}^\infty f(\tau) g(z - \tau) d\tau
$$

Below, we’ll use a discretized version of the preceding formula.


In particular, we’ll replace both 𝑓 and 𝑔 with discretized counterparts, normalized to sum to 1 so that they are probability
distributions.
• by discretized we mean an equally spaced sampled version
Then we’ll use the following version of the above formula

$$
h_n = (f * g)_n = \sum_{m=0}^\infty f_m g_{n-m}, \quad n \geq 0
$$

to compute a discretized version of the probability distribution of the sum of two random variables, one with probability
mass function 𝑓, the other with probability mass function 𝑔.
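The discrete formula can be coded directly as a double loop. Here is a minimal sketch; the function name `convolve_pmf` and the two small probability mass functions are illustrative assumptions. It is far slower than library routines, but it makes the index arithmetic explicit.

```python
import numpy as np

def convolve_pmf(f, g):
    """Compute h_n = sum_m f_m g_{n-m} for finite sequences f and g."""
    f, g = np.asarray(f, dtype=float), np.asarray(g, dtype=float)
    h = np.zeros(len(f) + len(g) - 1)
    for n in range(len(h)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):      # terms outside the support are zero
                h[n] += f[m] * g[n - m]
    return h

f = [0.5, 0.5]            # distribution of X on {0, 1}
g = [0.2, 0.3, 0.5]       # distribution of Y on {0, 1, 2}
h = convolve_pmf(f, g)    # distribution of X + Y on {0, 1, 2, 3}

print(h, h.sum())
```

The result has length len(f) + len(g) − 1 because the sum of the two variables ranges from 0 to the sum of their largest values.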
Before applying the convolution property to sums of log normal distributions, let’s practice on some simple discrete
distributions.
To take one example, let’s consider the following two probability distributions

𝑓𝑗 = Prob(𝑋 = 𝑗), 𝑗 = 0, 1

and

𝑔𝑗 = Prob(𝑌 = 𝑗), 𝑗 = 0, 1, 2, 3

and

ℎ𝑗 = Prob(𝑍 ≡ 𝑋 + 𝑌 = 𝑗), 𝑗 = 0, 1, 2, 3, 4

The convolution property tells us that

ℎ=𝑓 ∗𝑔 =𝑔∗𝑓

Let’s compute an example using numpy.convolve and scipy.signal.fftconvolve.


f = [.75, .25]
g = [0., .6, 0., .4]
h = np.convolve(f,g)
hf = fftconvolve(f,g)

print("f = ", f, ", np.sum(f) = ", np.sum(f))


print("g = ", g, ", np.sum(g) = ", np.sum(g))
print("h = ", h, ", np.sum(h) = ", np.sum(h))
print("hf = ", hf, ",np.sum(hf) = ", np.sum(hf))

f = [0.75, 0.25] , np.sum(f) = 1.0


g = [0.0, 0.6, 0.0, 0.4] , np.sum(g) = 1.0
h = [0. 0.45 0.15 0.3 0.1 ] , np.sum(h) = 1.0
hf = [0. 0.45 0.15 0.3 0.1 ] ,np.sum(hf) = 1.0000000000000002

A little later we’ll explain some advantages that come from using scipy.signal.fftconvolve rather than numpy.convolve.
They provide the same answers but, for long sequences, scipy.signal.fftconvolve is much faster.
That’s why we rely on it later in this lecture.

13.4 Approximating Distributions

We’ll construct an example to verify that discretized distributions can do a good job of approximating samples drawn
from underlying continuous distributions.
We’ll start by generating samples of size 25000 of three independent log normal random variates as well as pairwise and
triple-wise sums.
Then we’ll plot histograms and compare them with convolutions of appropriate discretized log normal distributions.

## create sums of two and three log normal random variates:
## ssum2 = s1 + s2 and ssum3 = s1 + s2 + s3

mu1, sigma1 = 5., 1. # mean and standard deviation


s1 = np.random.lognormal(mu1, sigma1, 25000)

mu2, sigma2 = 5., 1. # mean and standard deviation


s2 = np.random.lognormal(mu2, sigma2, 25000)

mu3, sigma3 = 5., 1. # mean and standard deviation


s3 = np.random.lognormal(mu3, sigma3, 25000)

ssum2 = s1 + s2

ssum3 = s1 + s2 + s3

count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')


count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')


count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')

samp_mean2 = np.mean(s2)
pop_mean2 = np.exp(mu2+ (sigma2**2)/2)

pop_mean2, samp_mean2, mu2, sigma2

(244.69193226422038, 245.39218776762786, 5.0, 1.0)

Here are helper functions that create a discretized version of a log normal probability density function.

def p_log_normal(x,μ,σ):
    p = 1 / (σ*x*np.sqrt(2*np.pi)) * np.exp(-1/2*((np.log(x) - μ)/σ)**2)
    return p

def pdf_seq(μ,σ,I,m):
    x = np.arange(1e-7,I,m)
    p_array = p_log_normal(x,μ,σ)
    p_array_norm = p_array/np.sum(p_array)
    return p_array,p_array_norm,x

Now we shall set a truncation value 𝐼 for the grid and a grid increment size 𝑚 for our discretizations (the code below uses 𝑚 = 0.1).

Note: We set 𝐼 equal to a power of two because we want to be free to use a Fast Fourier Transform to compute a
convolution of two sequences (discrete distributions).

We recommend experimenting with different values of the power 𝑝 of 2.


Setting it to 15 rather than 12, for example, improves how well the discretized probability mass function approximates
the original continuous probability density function being studied.

p=15
I = 2**p # Truncation value
m = .1 # increment size

## Cell to check -- note what happens when don't normalize!


## things match up without adjustment. Compare with above

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
## compute number of points to evaluate the probability mass function
NT = x.size

plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:int(NT)],p1[:int(NT)],label = '')
plt.xlim(0,2500)
count, bins, ignored = plt.hist(s1, 1000, density=True, align='mid')

plt.show()

# Compute mean from discretized pdf and compare with the theoretical value

mean= np.sum(np.multiply(x[:NT],p1_norm[:NT]))
meantheory = np.exp(mu1+.5*sigma1**2)
mean, meantheory

(244.69059898302908, 244.69193226422038)
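The roles of normalization and of the grid increment can be checked in isolation. The following is a minimal sketch, not part of the lecture's code: the names `lognormal_pdf`, `grid`, `density` and `pmf` are illustrative. It checks that the unnormalized density values times the increment sum to roughly one, that the normalized values reproduce the theoretical mean, and that dividing the probability mass function by the increment recovers the density (which is why the plots further on divide by `m`).

```python
import numpy as np

def lognormal_pdf(x, mu, sigma):
    """Density of a log normal random variable with parameters (mu, sigma)."""
    return np.exp(-0.5 * ((np.log(x) - mu) / sigma) ** 2) / (sigma * x * np.sqrt(2 * np.pi))

mu, sigma = 5.0, 1.0
m = 0.1                                   # grid increment
grid = np.arange(1e-7, 2**15, m)          # same kind of grid that pdf_seq builds

density = lognormal_pdf(grid, mu, sigma)  # values of the continuous density on the grid
pmf = density / density.sum()             # discretized probability mass function

riemann_mass = density.sum() * m          # Riemann sum of the density, close to 1
mean_from_pmf = np.sum(grid * pmf)        # close to exp(mu + sigma**2 / 2)
max_gap = np.max(np.abs(pmf / m - density))  # pmf / m is (almost) the density

print(riemann_mass, mean_from_pmf, np.exp(mu + sigma**2 / 2), max_gap)
```

Because the mass function sums to one while the density integrates to one, the two differ by a factor of roughly `m`.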


13.5 Convolving Probability Mass Functions

Now let’s use the convolution theorem to compute the probability distribution of a sum of the two log normal random
variables we have parameterized above.
We’ll also compute the probability distribution of a sum of the three log normal random variables constructed above.
Before we do these things, we shall explain our choice of Python algorithm to compute a convolution of two sequences.
Because the sequences that we convolve are long, we use the scipy.signal.fftconvolve function rather than
the numpy.convolve function.
These two functions give virtually equivalent answers but for long sequences scipy.signal.fftconvolve is much
faster.
The program scipy.signal.fftconvolve uses fast Fourier transforms and their inverses to calculate convolu-
tions.
Let’s define the Fourier transform and the inverse Fourier transform.
The Fourier transform of a sequence $\{x_t\}_{t=0}^{T-1}$ is a sequence of complex numbers $\{x(\omega_j)\}_{j=0}^{T-1}$ given by

$$
x(\omega_j) = \sum_{t=0}^{T-1} x_t \exp(-i \omega_j t) \tag{13.1}
$$

where $\omega_j = \frac{2 \pi j}{T}$ for $j = 0, 1, \ldots, T-1$.
The inverse Fourier transform of the sequence $\{x(\omega_j)\}_{j=0}^{T-1}$ is

$$
x_t = T^{-1} \sum_{j=0}^{T-1} x(\omega_j) \exp(i \omega_j t) \tag{13.2}
$$

The sequences $\{x_t\}_{t=0}^{T-1}$ and $\{x(\omega_j)\}_{j=0}^{T-1}$ contain the same information.
The pair of equations (13.1) and (13.2) tell how to recover one series from its Fourier partner.
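As a small check of equations (13.1) and (13.2), here is a sketch that evaluates both sums directly and compares the result with `numpy.fft.fft`, which uses the same sign convention. The function names `dft` and `inverse_dft` and the test sequence are illustrative assumptions.

```python
import numpy as np

def dft(x):
    """Fourier transform computed directly from equation (13.1)."""
    T = len(x)
    t = np.arange(T)
    omega = 2 * np.pi * np.arange(T) / T           # frequencies omega_j
    return np.exp(-1j * np.outer(omega, t)) @ x    # row j sums over t

def inverse_dft(x_omega):
    """Inverse Fourier transform computed directly from equation (13.2)."""
    T = len(x_omega)
    t = np.arange(T)
    omega = 2 * np.pi * np.arange(T) / T
    return np.exp(1j * np.outer(t, omega)) @ x_omega / T   # row t sums over j

x = np.array([0.1, 0.4, 0.3, 0.2])
x_omega = dft(x)
x_back = inverse_dft(x_omega)

print(np.allclose(x_omega, np.fft.fft(x)), np.allclose(x_back, x))
```

The direct sums cost on the order of $T^2$ operations, which is why a fast Fourier transform is preferred for long sequences.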
The program scipy.signal.fftconvolve deploys the theorem that a convolution of two sequences {𝑓𝑘 }, {𝑔𝑘 }
can be computed in the following way:
• Compute Fourier transforms 𝐹 (𝜔), 𝐺(𝜔) of the {𝑓𝑘 } and {𝑔𝑘 } sequences, respectively
• Form the product 𝐻(𝜔) = 𝐹 (𝜔)𝐺(𝜔)
• The convolution 𝑓 ∗ 𝑔 is the inverse Fourier transform of 𝐻(𝜔)
The fast Fourier transform and the associated inverse fast Fourier transform execute these calculations very quickly.
This is the algorithm that scipy.signal.fftconvolve uses.
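The three steps above can be sketched with NumPy's FFT routines. This is an illustration of the theorem under stated assumptions, not SciPy's actual implementation; the function name `fft_convolve_sketch` and the two short sequences are made up for the example. One detail the steps leave implicit is that both sequences are zero-padded to length `len(f) + len(g) - 1`, so that the circular convolution delivered by the discrete transform coincides with the ordinary convolution.

```python
import numpy as np

def fft_convolve_sketch(f, g):
    """Convolve two real sequences via the convolution theorem."""
    n = len(f) + len(g) - 1        # length of the full convolution
    F = np.fft.rfft(f, n)          # transform of f, zero-padded to length n
    G = np.fft.rfft(g, n)          # transform of g, zero-padded to length n
    H = F * G                      # product of the transforms
    return np.fft.irfft(H, n)      # inverse transform gives f * g

f = np.array([0.2, 0.5, 0.3])      # a probability mass function on {0, 1, 2}
g = np.array([0.6, 0.4])           # a probability mass function on {0, 1}

direct = np.convolve(f, g)
via_fft = fft_convolve_sketch(f, g)
print(direct, via_fft)
```

Both results are the probability mass function of the sum of the two independent discrete random variables, so each sums to one.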
Let’s do a warmup calculation that compares the times taken by numpy.convolve and scipy.signal.fftconvolve.

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)
p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)

tic = time.perf_counter()

c1 = np.convolve(p1_norm,p2_norm)
c2 = np.convolve(c1,p3_norm)

toc = time.perf_counter()

tdiff1 = toc - tic

tic = time.perf_counter()

c1f = fftconvolve(p1_norm,p2_norm)
c2f = fftconvolve(c1f,p3_norm)
toc = time.perf_counter()

tdiff2 = toc - tic

print("time with np.convolve = ", tdiff1, "; time with fftconvolve = ", tdiff2)

time with np.convolve = 47.5052065660002 ; time with fftconvolve = 0.16856744300002902

For this calculation, scipy.signal.fftconvolve is more than two orders of magnitude faster than numpy.convolve.
Now let’s plot our computed probability mass function approximation for the sum of two log normal random variables
against the histogram of the sample that we formed above.

NT= np.size(x)

plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:int(NT)],c1f[:int(NT)]/m,label = '')
plt.xlim(0,5000)

count, bins, ignored = plt.hist(ssum2, 1000, density=True, align='mid')


# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')

plt.show()


NT= np.size(x)
plt.figure(figsize = (8,8))
plt.subplot(2,1,1)
plt.plot(x[:int(NT)],c2f[:int(NT)]/m,label = '')
plt.xlim(0,5000)

count, bins, ignored = plt.hist(ssum3, 1000, density=True, align='mid')


# plt.plot(P2P3[:10000],label = 'FFT method',linestyle = '--')

plt.show()

## Let's compute the mean of the discretized pdf


mean= np.sum(np.multiply(x[:NT],c1f[:NT]))
# meantheory = np.exp(mu1+.5*sigma1**2)
mean, 2*meantheory

(489.3810974093853, 489.38386452844077)

## Let's compute the mean of the discretized pdf


mean= np.sum(np.multiply(x[:NT],c2f[:NT]))
# meantheory = np.exp(mu1+.5*sigma1**2)
mean, 3*meantheory

(734.0714863312272, 734.0757967926611)


13.6 Failure Tree Analysis

We shall soon apply the convolution theorem to compute the probability of a top event in a failure tree analysis.
Before applying the convolution theorem, we first describe the model that connects constituent events to the top event whose
failure rate we seek to quantify.
The model is an example of the widely used failure tree analysis described by El-Shanawany, Ardron, and Walker
[El-Shanawany et al., 2018].
To construct the statistical model, we repeatedly use what is called the rare event approximation.
We want to compute the probability of an event 𝐴 ∪ 𝐵.
• the union 𝐴 ∪ 𝐵 is the event that 𝐴 OR 𝐵 occurs
A law of probability tells us that 𝐴 OR 𝐵 occurs with probability

𝑃 (𝐴 ∪ 𝐵) = 𝑃 (𝐴) + 𝑃 (𝐵) − 𝑃 (𝐴 ∩ 𝐵)

where the intersection 𝐴 ∩ 𝐵 is the event that 𝐴 AND 𝐵 both occur and the union 𝐴 ∪ 𝐵 is the event that 𝐴 OR 𝐵
occurs.
If 𝐴 and 𝐵 are independent, then

𝑃 (𝐴 ∩ 𝐵) = 𝑃 (𝐴)𝑃 (𝐵)

If 𝑃 (𝐴) and 𝑃 (𝐵) are both small, then 𝑃 (𝐴)𝑃 (𝐵) is even smaller.
The rare event approximation is

𝑃 (𝐴 ∪ 𝐵) ≈ 𝑃 (𝐴) + 𝑃 (𝐵)

This approximation is widely used in evaluating system failures.

13.7 Application

A system has been designed with the feature that a system failure occurs when any of 𝑛 critical components fails.
The failure probability 𝑃 (𝐴𝑖 ) of each event 𝐴𝑖 is small.
We assume that failures of the components are statistically independent events.
We repeatedly apply the rare event approximation to obtain the following formula for the probability of a system failure:

$$
P(F) \approx P(A_1) + P(A_2) + \cdots + P(A_n)
$$

or

$$
P(F) \approx \sum_{i=1}^{n} P(A_i) \tag{13.3}
$$

Probabilities for each event are recorded as failure rates per year.
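To see how accurate the rare event approximation (13.3) is, the following sketch compares it with the exact probability that at least one of several independent components fails, $1 - \prod_i (1 - P(A_i))$. The four failure probabilities are hypothetical numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical annual failure probabilities of n = 4 independent components
p = np.array([1e-4, 5e-4, 2e-3, 1e-3])

# Exact probability that at least one component fails (independence assumed)
exact = 1 - np.prod(1 - p)

# Rare event approximation (13.3): add up the component probabilities
approx = p.sum()

relative_error = (approx - exact) / exact
print(f"exact = {exact:.6e}, approx = {approx:.6e}, relative error = {relative_error:.2e}")
```

The approximation never understates the exact probability, and the gap is of the order of the pairwise products $P(A_i) P(A_j)$ that it drops.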


13.8 Failure Rates Unknown

Now we come to the problem that really interests us, following [El-Shanawany et al., 2018] and Greenfield and Sargent
[Greenfield and Sargent, 1993] in the spirit of Apostolakis [Apostolakis, 1990].
The constituent probabilities or failure rates 𝑃 (𝐴𝑖 ) are not known a priori and have to be estimated.
We address this problem by specifying probabilities of probabilities that capture one notion of not knowing the con-
stituent probabilities that are inputs into a failure tree analysis.
Thus, we assume that a system analyst is uncertain about the failure rates 𝑃 (𝐴𝑖 ), 𝑖 = 1, … , 𝑛 for components of a system.
The analyst copes with this situation by regarding the system’s failure probability 𝑃 (𝐹 ) and each of the component probabilities 𝑃 (𝐴𝑖 ) as random variables.
• the dispersion of the probability distribution of 𝑃 (𝐴𝑖 ) characterizes the analyst’s uncertainty about the failure probability 𝑃 (𝐴𝑖 )
• the dispersion of the implied probability distribution of 𝑃 (𝐹 ) characterizes his uncertainty about the probability
of a system’s failure.
This leads to what is sometimes called a hierarchical model in which the analyst has probabilities about the probabilities
𝑃 (𝐴𝑖 ).
The analyst formalizes his uncertainty by assuming that
• the failure probability 𝑃 (𝐴𝑖 ) is itself a log normal random variable with parameters (𝜇𝑖 , 𝜎𝑖 ).
• failure rates 𝑃 (𝐴𝑖 ) and 𝑃 (𝐴𝑗 ) are statistically independent for all pairs with 𝑖 ≠ 𝑗.
The analyst calibrates the parameters (𝜇𝑖 , 𝜎𝑖 ) for the failure events 𝑖 = 1, … , 𝑛 by reading reliability studies in engineering
papers that have studied historical failure rates of components that are as similar as possible to the components being used
in the system under study.
The analyst assumes that such information about the observed dispersion of annual failure rates, or times to failure, can
inform him of what to expect about parts’ performances in his system.
The analyst assumes that the random variables 𝑃 (𝐴𝑖 ) are statistically mutually independent.
The analyst wants to approximate a probability mass function and cumulative distribution function of the system’s failure
probability 𝑃 (𝐹 ).
• We say probability mass function because of how we discretize each random variable, as described earlier.
The analyst calculates the probability mass function for the top event 𝐹 , i.e., a system failure, by repeatedly applying
the convolution theorem to compute the probability distribution of a sum of independent log normal random variables, as
described in equation (13.3).

13.9 Waste Hoist Failure Rate

We’ll take something close to a real-world example by assuming that 𝑛 = 14.


The example estimates the annual failure rate of a critical hoist at a nuclear waste facility.
A regulatory agency wants the system to be designed in a way that makes the failure rate of the top event small with high
probability.
This example is Design Option B-2 (Case I) described in Table 10 on page 27 of [Greenfield and Sargent, 1993].
The table describes parameters 𝜇𝑖 , 𝜎𝑖 for fourteen log normal random variables that consist of seven pairs of random
variables that are identically and independently distributed.


• Within a pair, parameters 𝜇𝑖 , 𝜎𝑖 are the same


• As described in table 10 of [Greenfield and Sargent, 1993] p. 27, parameters of log normal distributions for the
seven unique probabilities 𝑃 (𝐴𝑖 ) have been calibrated to be the values in the following Python code:

mu1, sigma1 = 4.28, 1.1947


mu2, sigma2 = 3.39, 1.1947
mu3, sigma3 = 2.795, 1.1947
mu4, sigma4 = 2.717, 1.1947
mu5, sigma5 = 2.717, 1.1947
mu6, sigma6 = 1.444, 1.4632
mu7, sigma7 = -.040, 1.4632

Note: Because the failure rates are all very small, log normal distributions with the above parameter values actually
describe 𝑃 (𝐴𝑖 ) measured in units of $10^{-9}$.

So the probabilities that we’ll put on the 𝑥 axis of the probability mass function and associated cumulative distribution
function should be multiplied by $10^{-9}$.
To extract a table that summarizes computed quantiles, we’ll use a helper function

def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx
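To see what this helper returns, here is a toy example. The five-point grid `x_toy` and mass function `pmf_toy` are made up for illustration; the helper is repeated so that the snippet runs on its own.

```python
import numpy as np

def find_nearest(array, value):
    """Index of the entry of `array` that is closest to `value`."""
    array = np.asarray(array)
    return (np.abs(array - value)).argmin()

# Toy probability mass function on a small grid (illustrative numbers)
x_toy = np.array([10., 20., 30., 40., 50.])
pmf_toy = np.array([0.1, 0.2, 0.3, 0.3, 0.1])
cdf_toy = np.cumsum(pmf_toy)           # cumulative distribution function on the grid

median_idx = find_nearest(cdf_toy, 0.5)
q90_idx = find_nearest(cdf_toy, 0.9)
print(x_toy[median_idx], x_toy[q90_idx])
```

Note that the helper picks the grid point whose cumulative probability is closest to the target, which may lie slightly above or below it; on a fine grid the difference is negligible.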

We compute the required thirteen convolutions in the following code.


(Please feel free to try different values of the power parameter 𝑝 that we use to set the number of points in our grid for
constructing the probability mass functions that discretize the continuous log normal distributions.)
We’ll plot a counterpart to the cumulative distribution function (CDF) in figure 5 on page 29 of [Greenfield and Sargent,
1993] and we’ll also present a counterpart to their Table 11 on page 28.

p=15
I = 2**p # Truncation value
m = .05 # increment size

p1,p1_norm,x = pdf_seq(mu1,sigma1,I,m)
p2,p2_norm,x = pdf_seq(mu2,sigma2,I,m)
p3,p3_norm,x = pdf_seq(mu3,sigma3,I,m)
p4,p4_norm,x = pdf_seq(mu4,sigma4,I,m)
p5,p5_norm,x = pdf_seq(mu5,sigma5,I,m)
p6,p6_norm,x = pdf_seq(mu6,sigma6,I,m)
p7,p7_norm,x = pdf_seq(mu7,sigma7,I,m)
p8,p8_norm,x = pdf_seq(mu7,sigma7,I,m)
p9,p9_norm,x = pdf_seq(mu7,sigma7,I,m)
p10,p10_norm,x = pdf_seq(mu7,sigma7,I,m)
p11,p11_norm,x = pdf_seq(mu7,sigma7,I,m)
p12,p12_norm,x = pdf_seq(mu7,sigma7,I,m)
p13,p13_norm,x = pdf_seq(mu7,sigma7,I,m)
p14,p14_norm,x = pdf_seq(mu7,sigma7,I,m)



tic = time.perf_counter()

c1 = fftconvolve(p1_norm,p2_norm)
c2 = fftconvolve(c1,p3_norm)
c3 = fftconvolve(c2,p4_norm)
c4 = fftconvolve(c3,p5_norm)
c5 = fftconvolve(c4,p6_norm)
c6 = fftconvolve(c5,p7_norm)
c7 = fftconvolve(c6,p8_norm)
c8 = fftconvolve(c7,p9_norm)
c9 = fftconvolve(c8,p10_norm)
c10 = fftconvolve(c9,p11_norm)
c11 = fftconvolve(c10,p12_norm)
c12 = fftconvolve(c11,p13_norm)
c13 = fftconvolve(c12,p14_norm)

toc = time.perf_counter()

tdiff13 = toc - tic

print("time for 13 convolutions = ", tdiff13)

time for 13 convolutions = 11.15301869599989

d13 = np.cumsum(c13)
Nx=int(1400)
plt.figure()
plt.plot(x[0:int(Nx/m)],d13[0:int(Nx/m)]) # show Yad this -- I multiplied by m -- step size

plt.hlines(0.5,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.9,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.95,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.1,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.hlines(0.05,min(x),Nx,linestyles='dotted',colors = {'black'})
plt.ylim(0,1)
plt.xlim(0,Nx)
plt.xlabel("$x10^{-9}$",loc = "right")
plt.show()

x_1 = x[find_nearest(d13,0.01)]
x_5 = x[find_nearest(d13,0.05)]
x_10 = x[find_nearest(d13,0.1)]
x_50 = x[find_nearest(d13,0.50)]
x_66 = x[find_nearest(d13,0.665)]
x_85 = x[find_nearest(d13,0.85)]
x_90 = x[find_nearest(d13,0.90)]
x_95 = x[find_nearest(d13,0.95)]
x_99 = x[find_nearest(d13,0.99)]
x_9978 = x[find_nearest(d13,0.9978)]

print(tabulate([
['1%',f"{x_1}"],
['5%',f"{x_5}"],
['10%',f"{x_10}"],
['50%',f"{x_50}"],


['66.5%',f"{x_66}"],
['85%',f"{x_85}"],
['90%',f"{x_90}"],
['95%',f"{x_95}"],
['99%',f"{x_99}"],
['99.78%',f"{x_9978}"]],
headers = ['Percentile', 'x * 1e-9']))

Percentile x * 1e-9
------------ ----------
1% 76.15
5% 106.5
10% 128.2
50% 260.55
66.5% 338.55
85% 509.4
90% 608.8
95% 807.6
99% 1470.2
99.78% 2474.85

The above table agrees closely with column 2 of Table 11 on p. 28 of [Greenfield and Sargent, 1993].
Discrepancies are probably due to slight differences in the number of digits retained in inputting 𝜇𝑖 , 𝜎𝑖 , 𝑖 = 1, … , 14 and
in the number of points deployed in the discretizations.



CHAPTER

FOURTEEN

INTRODUCTION TO ARTIFICIAL NEURAL NETWORKS

!pip install --upgrade jax jaxlib


!conda install -y -c plotly plotly plotly-orca retrying

Note: If you are running this on Google Colab the above cell will present an error. This is because Google Colab doesn’t
use Anaconda to manage the Python packages. However this lecture will still execute as Google Colab has plotly
installed.

14.1 Overview

Substantial parts of machine learning and artificial intelligence are about


• approximating an unknown function with a known function
• estimating the known function from a set of data on the left- and right-hand variables
This lecture describes the structure of a plain vanilla artificial neural network (ANN) of a type that is widely used to
approximate a function 𝑓 that maps 𝑥 in a space 𝑋 into 𝑦 in a space 𝑌 .
To introduce elementary concepts, we study an example in which 𝑥 and 𝑦 are scalars.
We’ll describe the following concepts that are brick and mortar for neural networks:
• a neuron
• an activation function
• a network of neurons
• A neural network as a composition of functions
• back-propagation and its relationship to the chain rule of differential calculus


14.2 A Deep (but not Wide) Artificial Neural Network

We describe a “deep” neural network of “width” one.


Deep means that the network composes a large number of functions organized into nodes of a graph.
Width refers to the number of right-hand-side variables of the function being approximated.
Setting “width” to one means that the network composes just univariate functions.
Let 𝑥 ∈ ℝ be a scalar and 𝑦 ∈ ℝ be another scalar.
We assume that 𝑦 is a nonlinear function of 𝑥:

𝑦 = 𝑓(𝑥)

We want to approximate 𝑓(𝑥) with another function that we define recursively.


For a network of depth 𝑁 ≥ 1, each layer 𝑖 = 1, … 𝑁 consists of
• an input 𝑥𝑖
• an affine function 𝑤𝑖 𝑥𝑖 + 𝑏𝑖 , where 𝑤𝑖 is a scalar weight placed on the input 𝑥𝑖 and 𝑏𝑖 is a scalar bias
• an activation function ℎ𝑖 that takes (𝑤𝑖 𝑥𝑖 + 𝑏𝑖 ) as an argument and produces an output 𝑥𝑖+1
An example of an activation function $h$ is the sigmoid function

$$
h(z) = \frac{1}{1 + e^{-z}}
$$

Another popular activation function is the rectified linear unit (ReLU) function

ℎ(𝑧) = max(0, 𝑧)

Yet another activation function is the identity function

ℎ(𝑧) = 𝑧

As activation functions below, we’ll use the sigmoid function for layers 1 to $N-1$ and the identity function for layer $N$.
To approximate a function $f(x)$ we construct $\hat f(x)$ by proceeding as follows.
Let

$$
l_i(x) = w_i x + b_i .
$$

We construct $\hat f$ by iterating on compositions of functions $h_i \circ l_i$:

$$
f(x) \approx \hat f(x) = h_N \circ l_N \circ h_{N-1} \circ l_{N-1} \circ \cdots \circ h_1 \circ l_1 (x)
$$

If 𝑁 > 1, we call the right side a “deep” neural net.


The larger is the integer 𝑁 , the “deeper” is the neural net.
Evidently, if we know the parameters $\{w_i, b_i\}_{i=1}^{N}$, then we can compute $\hat f(x)$ for a given $x = \tilde x$ by iterating on the
recursion

$$
x_{i+1} = h_i \circ l_i (x_i), \quad i = 1, \ldots, N \tag{14.1}
$$

starting from $x_1 = \tilde x$.

The value of $x_{N+1}$ that emerges from this iterative scheme equals $\hat f(\tilde x)$.


14.3 Calibrating Parameters

We now consider a neural network like the one described above with width 1, depth $N$, and activation functions $h_i$ for
$1 \leqslant i \leqslant N$ that map $\mathbb{R}$ into itself.
Let $\{(w_i, b_i)\}_{i=1}^{N}$ denote a sequence of weights and biases.

As mentioned above, for a given input 𝑥1 , our approximating function 𝑓 ̂ evaluated at 𝑥1 equals the “output” 𝑥𝑁+1 from
our network that can be computed by iterating on 𝑥𝑖+1 = ℎ𝑖 (𝑤𝑖 𝑥𝑖 + 𝑏𝑖 ).
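The forward recursion can be sketched in plain NumPy as follows. The function name `forward` and the weights and biases are illustrative assumptions; as in the lecture, the sigmoid is used on layers 1 to $N-1$ and the identity on layer $N$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, biases, x_tilde):
    """Return [x_1, ..., x_{N+1}] generated by x_{i+1} = h_i(w_i x_i + b_i)."""
    N = len(weights)
    xs = [x_tilde]
    for i in range(N):
        z = weights[i] * xs[-1] + biases[i]        # affine map l_i
        # sigmoid on layers 1, ..., N-1 and identity on layer N
        xs.append(sigmoid(z) if i < N - 1 else z)
    return np.array(xs)

weights = np.array([0.5, -1.0, 2.0])   # illustrative w_1, w_2, w_3
biases = np.array([0.0, 0.5, -1.0])    # illustrative b_1, b_2, b_3

xs = forward(weights, biases, x_tilde=1.0)
f_hat = xs[-1]                          # value of the approximating function at x = 1
print(xs)
```

The last entry of `xs` is the network's output; the intermediate entries are the inputs to the successive layers.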
For a given prediction $\hat y(x)$ and target $y = f(x)$, consider the loss function

$$
\mathcal{L}(\hat y, y)(x) = \frac{1}{2} (\hat y - y)^2 (x) .
$$

This criterion is a function of the parameters $\{(w_i, b_i)\}_{i=1}^{N}$ and the point $x$.
We’re interested in solving the following problem:

$$
\min_{\{(w_i, b_i)\}_{i=1}^{N}} \int \mathcal{L}(x_{N+1}, y)(x) \, d\mu(x)
$$

where $\mu(x)$ is some measure of points $x \in \mathbb{R}$ over which we want a good approximation $\hat f(x)$ to $f(x)$.
Stack weights and biases into a vector of parameters $p$:

$$
p = \begin{bmatrix} w_1 \\ b_1 \\ w_2 \\ b_2 \\ \vdots \\ w_N \\ b_N \end{bmatrix}
$$

Applying a “poor man’s version” of a stochastic gradient descent algorithm for finding a zero of a function leads to the
following update rule for parameters:

$$
p_{k+1} = p_k - \alpha \frac{d\mathcal{L}}{dx_{N+1}} \frac{dx_{N+1}}{dp_k} \tag{14.2}
$$

where $\frac{d\mathcal{L}}{dx_{N+1}} = x_{N+1} - y = -(y - x_{N+1})$ and $\alpha > 0$ is a step size.
(See this and this to gather insights about how stochastic gradient descent relates to Newton’s method.)
To implement one step of this parameter update rule, we want the vector of derivatives $\frac{dx_{N+1}}{dp_k}$.

In the neural network literature, this step is accomplished by what is known as back propagation.

14.4 Back Propagation and the Chain Rule

Thanks to properties of
• the chain and product rules for differentiation from differential calculus, and
• lower triangular matrices
back propagation can actually be accomplished in one step by


• inverting a lower triangular matrix, and


• matrix multiplication
(This idea is from the last 7 minutes of this great youtube video by MIT’s Alan Edelman)

https://youtu.be/rZS2LGiurKY

Here goes.
Define the derivative of $h(z)$ with respect to $z$ evaluated at $z = z_i$ as $\delta_i$:

$$
\delta_i = \frac{d}{dz} h(z) \Big|_{z = z_i}
$$

or

$$
\delta_i = h'(w_i x_i + b_i) .
$$

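Repeated application of the chain rule and product rule to our recursion (14.1) allows us to obtain:

$$
dx_{i+1} = \delta_i (dw_i \, x_i + w_i \, dx_i + db_i)
$$

After imposing $dx_1 = 0$, we get the following system of equations:

$$
\begin{pmatrix} dx_2 \\ \vdots \\ dx_{N+1} \end{pmatrix}
= \underbrace{\begin{pmatrix}
\delta_1 x_1 & \delta_1 & 0 & 0 & 0 \\
0 & 0 & \ddots & 0 & 0 \\
0 & 0 & 0 & \delta_N x_N & \delta_N
\end{pmatrix}}_{D}
\begin{pmatrix} dw_1 \\ db_1 \\ \vdots \\ dw_N \\ db_N \end{pmatrix}
+ \underbrace{\begin{pmatrix}
0 & 0 & 0 & 0 \\
\delta_2 w_2 & 0 & 0 & 0 \\
0 & \ddots & 0 & 0 \\
0 & 0 & \delta_N w_N & 0
\end{pmatrix}}_{L}
\begin{pmatrix} dx_2 \\ \vdots \\ dx_{N+1} \end{pmatrix}
$$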

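or

$$
dx = D \, dp + L \, dx
$$

which implies that

$$
dx = (I - L)^{-1} D \, dp
$$

which in turn implies

$$
\begin{pmatrix} dx_{N+1}/dw_1 \\ dx_{N+1}/db_1 \\ \vdots \\ dx_{N+1}/dw_N \\ dx_{N+1}/db_N \end{pmatrix} = e_N (I - L)^{-1} D ,
$$

where $e_N$ is the $N$-th unit row vector, so that the right side is the last row of $(I - L)^{-1} D$.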
We can then solve the above problem by applying our update for $p$ multiple times for a collection of input-output pairs
$\{(x_1^i, y^i)\}_{i=1}^{M}$ that we’ll call our “training set”.
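Before turning to the JAX implementation, here is a small NumPy sketch of the linear-algebra route to the derivatives of $x_{N+1}$, checked against central finite differences. It assumes the total differential of the recursion (14.1), namely that $dx_{i+1}$ loads on $dw_i$, $db_i$ and $dx_i$ with coefficients $\delta_i x_i$, $\delta_i$ and $\delta_i w_i$. The function names and parameter values are illustrative and are not the lecture's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, b, x1):
    """Iterate x_{i+1} = h_i(w_i x_i + b_i); also return delta_i = h_i'(w_i x_i + b_i)."""
    N = len(w)
    xs, deltas = [x1], []
    for i in range(N):
        z = w[i] * xs[-1] + b[i]
        if i < N - 1:                       # sigmoid layers
            s = sigmoid(z)
            xs.append(s)
            deltas.append(s * (1 - s))      # derivative of the sigmoid
        else:                               # identity on the last layer
            xs.append(z)
            deltas.append(1.0)
    return np.array(xs), np.array(deltas)

def grad_output(w, b, x1):
    """d x_{N+1} / d(w_1, b_1, ..., w_N, b_N) as the last row of (I - L)^{-1} D."""
    N = len(w)
    xs, deltas = forward(w, b, x1)
    D = np.zeros((N, 2 * N))
    for i in range(N):
        D[i, 2 * i] = deltas[i] * xs[i]     # coefficient on dw_i
        D[i, 2 * i + 1] = deltas[i]         # coefficient on db_i
    L = np.diag(deltas[1:] * w[1:], k=-1)   # coefficient on dx_i in the dx_{i+1} equation
    return np.linalg.solve(np.eye(N) - L, D)[-1]

def output(theta, x1):
    """Network output as a function of theta = (w_1, b_1, ..., w_N, b_N)."""
    return forward(theta[0::2], theta[1::2], x1)[0][-1]

theta = np.array([0.5, 0.0, -1.0, 0.5, 2.0, -1.0])   # illustrative parameters, N = 3
x1 = 1.5
g_la = grad_output(theta[0::2], theta[1::2], x1)

# Central finite differences as an independent check
eps = 1e-6
g_fd = np.array([(output(theta + eps * e, x1) - output(theta - eps * e, x1)) / (2 * eps)
                 for e in np.eye(len(theta))])
print(np.max(np.abs(g_la - g_fd)))
```

Since $I - L$ is lower triangular, a triangular solver could replace `np.linalg.solve`; the general solver is used here only to keep the sketch short.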

14.5 Training Set

Choosing a training set amounts to a choice of measure 𝜇 in the above formulation of our function approximation problem
as a minimization problem.
In this spirit, we shall use a uniform grid of, say, 50 or 200 points.
There are many possible approaches to the minimization problem posed above:


• batch gradient descent in which you use an average gradient over the training set
• stochastic gradient descent in which you sample points randomly and use individual gradients
• something in-between (so-called “mini-batch gradient descent”)
The update rule (14.2) described above amounts to a stochastic gradient descent algorithm.
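The three approaches differ only in how many training points enter each gradient evaluation. The sketch below makes this concrete for a deliberately simple stand-in model, $\hat y = w x + b$ with the same squared loss, rather than for the neural network itself. The function names, step size, and number of epochs are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training set: y = -3x + 2 on a uniform grid (illustrative)
x = np.linspace(0.5, 3, 50)
y = -3 * x + 2

def avg_grad(p, xb, yb):
    """Average gradient of 0.5 * (w*x + b - y)**2 over the points in a batch."""
    w, b = p
    err = w * xb + b - yb
    return np.array([np.mean(err * xb), np.mean(err)])

def run(batch_size, alpha=0.1, epochs=2000):
    p = np.zeros(2)                                  # start from (w, b) = (0, 0)
    for _ in range(epochs):
        order = rng.permutation(len(x))              # reshuffle the training set
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            p = p - alpha * avg_grad(p, x[idx], y[idx])
    return p

p_batch = run(batch_size=len(x))   # batch gradient descent: one step per epoch
p_sgd = run(batch_size=1)          # stochastic gradient descent: one point per step
p_mini = run(batch_size=10)        # mini-batch gradient descent
print(p_batch, p_sgd, p_mini)
```

In this noiseless example all three variants approach $(w, b) = (-3, 2)$; they differ in the number of parameter updates per pass through the training set.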

from IPython.display import Image


import jax.numpy as jnp
from jax import grad, jit, jacfwd, vmap
from jax import random
import jax
import plotly.graph_objects as go

# A helper function to randomly initialize weights and biases
# for a dense neural network layer
def random_layer_params(m, n, key, scale=1.):
    w_key, b_key = random.split(key)
    return scale * random.normal(w_key, (n, m)), scale * random.normal(b_key, (n,))

# Initialize all layers for a fully-connected neural network with sizes "sizes"
def init_network_params(sizes, key):
    keys = random.split(key, len(sizes))
    return [random_layer_params(m, n, k) for m, n, k in zip(sizes[:-1], sizes[1:], keys)]

def compute_xδw_seq(params, x):
    # Initialize arrays
    δ = jnp.zeros(len(params))
    xs = jnp.zeros(len(params) + 1)
    ws = jnp.zeros(len(params))
    bs = jnp.zeros(len(params))

    h = jax.nn.sigmoid

    xs = xs.at[0].set(x)
    for i, (w, b) in enumerate(params[:-1]):
        output = w * xs[i] + b
        activation = h(output[0, 0])

        # Store elements
        δ = δ.at[i].set(grad(h)(output[0, 0]))
        ws = ws.at[i].set(w[0, 0])
        bs = bs.at[i].set(b[0])
        xs = xs.at[i+1].set(activation)

    final_w, final_b = params[-1]
    preds = final_w * xs[-2] + final_b

    # Store elements
    δ = δ.at[-1].set(1.)
    ws = ws.at[-1].set(final_w[0, 0])
    bs = bs.at[-1].set(final_b[0])
    xs = xs.at[-1].set(preds[0, 0])

    return xs, δ, ws, bs



def loss(params, x, y):
    xs, δ, ws, bs = compute_xδw_seq(params, x)
    preds = xs[-1]

    return 1 / 2 * (y - preds) ** 2

# Parameters
N = 3 # Number of layers
layer_sizes = [1, ] * (N + 1)
param_scale = 0.1
step_size = 0.01
params = init_network_params(layer_sizes, random.PRNGKey(1))

An NVIDIA GPU may be present on this machine, but a CUDA-enabled jaxlib is not installed. Falling back to cpu.

x = 5
y = 3
xs, δ, ws, bs = compute_xδw_seq(params, x)

dxs_ad = jacfwd(lambda params, x: compute_xδw_seq(params, x)[0], argnums=0)(params, x)
dxs_ad_mat = jnp.block([dx.reshape((-1, 1)) for dx_tuple in dxs_ad for dx in dx_tuple])[1:]

jnp.block([[δ * xs[:-1]], [δ]])

Array([[8.5726520e-03, 4.0850646e-04, 6.1021698e-01],


[1.7145304e-03, 2.3785222e-01, 1.0000000e+00]], dtype=float32)

L = jnp.diag(δ * ws, k=-1)
L = L[1:, 1:]

D = jax.scipy.linalg.block_diag(*[row.reshape((1, 2)) for row in jnp.block([[δ * xs[:-1]], [δ]]).T])

dxs_la = jax.scipy.linalg.solve_triangular(jnp.eye(N) - L, D, lower=True)

# Check that the `dx` generated by the linear algebra method
# are the same as the ones generated using automatic differentiation
jnp.max(jnp.abs(dxs_ad_mat - dxs_la))

Array(0., dtype=float32)

grad_loss_ad = jnp.block([dx.reshape((-1, 1)) for dx_tuple in grad(loss)(params, x, y) for dx in dx_tuple])


# Check that the gradient of the loss is the same for both approaches
jnp.max(jnp.abs(-(y - xs[-1]) * dxs_la[-1] - grad_loss_ad))

Array(1.4901161e-08, dtype=float32)

@jit
def update_ad(params, x, y):
    grads = grad(loss)(params, x, y)
    return [(w - step_size * dw, b - step_size * db)
            for (w, b), (dw, db) in zip(params, grads)]

@jit
def update_la(params, x, y):
    xs, δ, ws, bs = compute_xδw_seq(params, x)
    N = len(params)
    L = jnp.diag(δ * ws, k=-1)
    L = L[1:, 1:]

    D = jax.scipy.linalg.block_diag(*[row.reshape((1, 2)) for row in jnp.block([[δ * xs[:-1]], [δ]]).T])

    dxs_la = jax.scipy.linalg.solve_triangular(jnp.eye(N) - L, D, lower=True)

    grads = -(y - xs[-1]) * dxs_la[-1]

    return [(w - step_size * dw, b - step_size * db)
            for (w, b), (dw, db) in zip(params, grads.reshape((-1, 2)))]

# Check that both updates are the same


update_la(params, x, y)

[(Array([[-1.3489482]], dtype=float32), Array([0.37956238], dtype=float32)),


(Array([[-0.00782906]], dtype=float32), Array([0.44972023], dtype=float32)),
(Array([[0.22937916]], dtype=float32), Array([-0.04793657], dtype=float32))]

update_ad(params, x, y)

[(Array([[-1.3489482]], dtype=float32), Array([0.37956238], dtype=float32)),


(Array([[-0.00782906]], dtype=float32), Array([0.44972023], dtype=float32)),
(Array([[0.22937916]], dtype=float32), Array([-0.04793657], dtype=float32))]


14.6 Example 1

Consider the function

𝑓 (𝑥) = −3𝑥 + 2

on [0.5, 3].
We use a uniform grid of 200 points and, in each of 500 epochs (the num_epochs value passed to train below), update the parameters once for each point on the grid.
ℎ𝑖 is the sigmoid activation function for all layers except the final one for which we use the identity function and 𝑁 = 3.
Weights are initialized randomly.

def f(x):
    return -3 * x + 2

M = 200
grid = jnp.linspace(0.5, 3, num=M)
f_val = f(grid)

indices = jnp.arange(M)
key = random.PRNGKey(0)

def train(params, grid, f_val, key, num_epochs=300):
    for epoch in range(num_epochs):
        # Draw a fresh subkey so that each epoch visits the grid in a new random order
        key, subkey = random.split(key)
        random_permutation = random.permutation(subkey, indices)
        for x, y in zip(grid[random_permutation], f_val[random_permutation]):
            params = update_la(params, x, y)

    return params

# Parameters
N = 3 # Number of layers
layer_sizes = [1, ] * (N + 1)
params_ex1 = init_network_params(layer_sizes, key)

%%time
params_ex1 = train(params_ex1, grid, f_val, key, num_epochs=500)

CPU times: user 4.83 s, sys: 1.7 ms, total: 4.83 s


Wall time: 4.79 s

predictions = vmap(compute_xδw_seq, in_axes=(None, 0))(params_ex1, grid)[0][:, -1]

fig = go.Figure()
fig.add_trace(go.Scatter(x=grid, y=f_val, name=r'$-3x+2$'))
fig.add_trace(go.Scatter(x=grid, y=predictions, name='Approximation'))

# Export to PNG file


Image(fig.to_image(format="png"))
# fig.show() will provide interactive plot when running
# notebook locally


14.7 How Deep?

It is fun to think about how deepening the neural net for the above example affects the quality of approximation.
• If the network is too deep, you’ll run into the vanishing gradient problem.
• Other parameters such as the step size and the number of epochs can be as important or more important than the
number of layers in the situation considered in this lecture.
• Indeed, since 𝑓 is a linear function of 𝑥, a one-layer network with the identity map as an activation would probably
work best.

14.8 Example 2

We use the same setup as for the previous example with

𝑓 (𝑥) = log (𝑥)

def f(x):
    return jnp.log(x)

grid = jnp.linspace(0.5, 3, num=M)


f_val = f(grid)


# Parameters
N = 1 # Number of layers
layer_sizes = [1, ] * (N + 1)
params_ex2_1 = init_network_params(layer_sizes, key)

# Parameters
N = 2 # Number of layers
layer_sizes = [1, ] * (N + 1)
params_ex2_2 = init_network_params(layer_sizes, key)

# Parameters
N = 3 # Number of layers
layer_sizes = [1, ] * (N + 1)
params_ex2_3 = init_network_params(layer_sizes, key)

params_ex2_1 = train(params_ex2_1, grid, f_val, key, num_epochs=300)

params_ex2_2 = train(params_ex2_2, grid, f_val, key, num_epochs=300)

params_ex2_3 = train(params_ex2_3, grid, f_val, key, num_epochs=300)

predictions_1 = vmap(compute_xδw_seq, in_axes=(None, 0))(params_ex2_1, grid)[0][:, -1]


predictions_2 = vmap(compute_xδw_seq, in_axes=(None, 0))(params_ex2_2, grid)[0][:, -1]
predictions_3 = vmap(compute_xδw_seq, in_axes=(None, 0))(params_ex2_3, grid)[0][:, -1]

fig = go.Figure()
fig.add_trace(go.Scatter(x=grid, y=f_val, name=r'$\log{x}$'))
fig.add_trace(go.Scatter(x=grid, y=predictions_1, name='One-layer neural network'))
fig.add_trace(go.Scatter(x=grid, y=predictions_2, name='Two-layer neural network'))
fig.add_trace(go.Scatter(x=grid, y=predictions_3, name='Three-layer neural network'))

# Export to PNG file


Image(fig.to_image(format="png"))
# fig.show() will provide interactive plot when running
# notebook locally


## to check that gpu is activated in environment

from jax.lib import xla_bridge


print(xla_bridge.get_backend().platform)

cpu

Note: Cloud Environment: This lecture site is built in a server environment that doesn’t have access to a GPU. If you
run this lecture locally, this lets you know where your code is being executed, either on the CPU or on the GPU.



CHAPTER

FIFTEEN

RANDOMIZED RESPONSE SURVEYS

15.1 Overview

Social stigmas can inhibit people from confessing potentially embarrassing activities or opinions.
When people are reluctant to participate in a sample survey about personally sensitive issues, they might decline to participate, and even if they do participate, they might choose to provide incorrect answers to sensitive questions.
These problems induce selection biases that present challenges to interpreting and designing surveys.
To illustrate how social scientists have thought about estimating the prevalence of such embarrassing activities and opin-
ions, this lecture describes a classic approach of S. L. Warner [Warner, 1965].
Warner used elementary probability to construct a way to protect the privacy of individual respondents to surveys while
still estimating the fraction of a collection of individuals who have a socially stigmatized characteristic or who engage in
a socially stigmatized activity.
Warner’s idea was to add noise between the respondent’s answer and the signal about that answer that the survey maker
ultimately receives.
Knowing about the structure of the noise assures the respondent that the survey maker does not observe his answer.
Statistical properties of the noise injection procedure provide the respondent plausible deniability.
Related ideas underlie modern differential privacy systems.
(See https://en.wikipedia.org/wiki/Differential_privacy)

15.2 Warner’s Strategy

As usual, let’s bring in the Python modules we’ll be using.

import numpy as np
import pandas as pd

Suppose that every person in the population belongs to either Group A or Group B.


We want to estimate the proportion 𝜋 who belong to Group A while protecting individual respondents’ privacy.
Warner [Warner, 1965] proposed and analyzed the following procedure.
• A random sample of 𝑛 people is drawn with replacement from the population and each person is interviewed.


• Prepare a random spinner that with 𝑝 probability points to the Letter A and with (1 − 𝑝) probability points to the
Letter B.
• Each subject spins a random spinner and sees an outcome (A or B) that the interviewer does not observe.
• The subject states whether he belongs to the group to which the spinner points.
• If the spinner points to the group to which the subject belongs, the subject reports “yes”; otherwise he reports “no”.
• The subject answers the question truthfully.
Warner constructed a maximum likelihood estimator of the proportion of the population in Group A.
Let
• 𝜋 : True probability of A in the population
• 𝑝 : Probability that the spinner points to A
• $X_i = \begin{cases} 1, & \text{if the } i\text{th subject says yes} \\ 0, & \text{if the } i\text{th subject says no} \end{cases}$
Index the sample set so that the first 𝑛1 report “yes”, while the second 𝑛 − 𝑛1 report “no”.
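The procedure and notation above can be sketched as a short simulation. This is only an illustration under stated assumptions: the helper name `simulate_responses`, the seed, and the parameter values are hypothetical choices, not part of Warner's paper.

```python
import numpy as np

def simulate_responses(pi, p, n, rng):
    """Simulate X_1, ..., X_n under Warner's design (hypothetical helper)."""
    in_A = rng.random(n) < pi          # True if the subject belongs to Group A
    points_to_A = rng.random(n) < p    # True if the spinner points to A
    # The subject says "yes" when the spinner points to his own group
    return (in_A == points_to_A).astype(int)

rng = np.random.default_rng(0)          # seed chosen arbitrarily
pi, p, n = 0.6, 0.7, 100_000            # illustrative values
X = simulate_responses(pi, p, n, rng)

prob_yes = pi * p + (1 - pi) * (1 - p)  # theoretical Pr(X_i = 1)
print(X.mean(), prob_yes)
```

The sample share of “yes” answers should be close to 𝜋𝑝 + (1 − 𝜋)(1 − 𝑝), the probability that enters the likelihood function.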
The likelihood function of a sample set is
$$
L = \left[\pi p + (1-\pi)(1-p)\right]^{n_1} \left[(1-\pi)p + \pi(1-p)\right]^{n-n_1} \tag{15.1}
$$
The log of the likelihood function is:
log(𝐿) = 𝑛1 log [𝜋𝑝 + (1 − 𝜋)(1 − 𝑝)] + (𝑛 − 𝑛1 ) log [(1 − 𝜋)𝑝 + 𝜋(1 − 𝑝)] (15.2)
The first-order necessary condition for maximizing the log likelihood function with respect to 𝜋 is:
$$
\frac{(n-n_1)(2p-1)}{(1-\pi)p + \pi(1-p)} = \frac{n_1(2p-1)}{\pi p + (1-\pi)(1-p)}
$$
or
$$
\pi p + (1-\pi)(1-p) = \frac{n_1}{n} \tag{15.3}
$$
If 𝑝 ≠ 1/2, then the maximum likelihood estimator (MLE) of 𝜋 is:
$$
\hat{\pi} = \frac{p-1}{2p-1} + \frac{n_1}{(2p-1)n} \tag{15.4}
$$
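As a sanity check on (15.4), one can maximize the log likelihood (15.2) numerically and compare the result with the closed form. The sketch below uses a simple grid search; the function names and the sample values of 𝑝, 𝑛, and 𝑛1 are assumptions made for illustration.

```python
import numpy as np

def log_likelihood(pi, p, n, n1):
    """Log likelihood (15.2) for a candidate value pi."""
    q_yes = pi * p + (1 - pi) * (1 - p)       # probability of a "yes"
    return n1 * np.log(q_yes) + (n - n1) * np.log(1 - q_yes)

def mle_closed_form(p, n, n1):
    """Closed-form estimator (15.4); requires p != 1/2."""
    return (p - 1) / (2 * p - 1) + n1 / ((2 * p - 1) * n)

p, n, n1 = 0.7, 1000, 560                     # illustrative values
grid = np.linspace(0.001, 0.999, 9981)        # candidate values of pi
pi_grid = grid[np.argmax(log_likelihood(grid, p, n, n1))]
pi_closed = mle_closed_form(p, n, n1)
print(pi_grid, pi_closed)
```

Note that (15.4) is not restricted to the unit interval: when 𝑛1/𝑛 falls outside the range between 1 − 𝑝 and 𝑝, the formula returns a value below 0 or above 1.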
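We compute the mean and variance of the MLE 𝜋̂ to be:
$$
\begin{aligned}
\mathbb{E}(\hat{\pi}) &= \frac{1}{2p-1}\left[p - 1 + \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}X_i\right] \\
&= \frac{1}{2p-1}\left[p - 1 + \pi p + (1-\pi)(1-p)\right] \\
&= \pi
\end{aligned}
\tag{15.5}
$$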
and
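$$
\begin{aligned}
Var(\hat{\pi}) &= \frac{n Var(X_i)}{(2p-1)^2 n^2} \\
&= \frac{\left[\pi p + (1-\pi)(1-p)\right]\left[(1-\pi)p + \pi(1-p)\right]}{(2p-1)^2 n} \\
&= \frac{\frac{1}{4} + (2p^2 - 2p + \frac{1}{2})(-2\pi^2 + 2\pi - \frac{1}{2})}{(2p-1)^2 n} \\
&= \frac{1}{n}\left[\frac{1}{16(p-\frac{1}{2})^2} - \left(\pi - \frac{1}{2}\right)^2\right]
\end{aligned}
\tag{15.6}
$$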

276 Chapter 15. Randomized Response Surveys



Equation (15.5) indicates that 𝜋̂ is an unbiased estimator of 𝜋, while equation (15.6) tells us the variance of the estimator.
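A quick Monte Carlo experiment can illustrate both properties. The sketch below draws the number of “yes” answers directly from a binomial distribution, a shortcut that is valid because the 𝑋𝑖 are independent Bernoulli draws; the seed, parameter values, and number of replications are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)                   # arbitrary seed
pi, p, n, reps = 0.6, 0.7, 1000, 20_000          # illustrative values

# Probability of a "yes" answer implied by the design
q_yes = pi * p + (1 - pi) * (1 - p)
n1 = rng.binomial(n, q_yes, size=reps)           # "yes" counts, one per survey
pi_hat = (p - 1) / (2 * p - 1) + n1 / ((2 * p - 1) * n)

# Variance predicted by the last line of (15.6)
var_theory = (1 / (16 * (p - 0.5)**2) - (pi - 0.5)**2) / n
print(pi_hat.mean(), pi_hat.var(), var_theory)
```

The simulated mean should be close to 𝜋 and the simulated variance close to the value given by (15.6).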
To compute a confidence interval, first rewrite (15.6) as:
$$
Var(\hat{\pi}) = \frac{\frac{1}{4} - \left(\pi - \frac{1}{2}\right)^2}{n} + \frac{\frac{1}{16(p-\frac{1}{2})^2} - \frac{1}{4}}{n} \tag{15.7}
$$
This equation indicates that the variance of 𝜋̂ can be represented as a sum of the variance due to sampling plus the variance
due to the random device.
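The decomposition can be checked numerically. In this minimal sketch the function name `variance_components` and the parameter values are hypothetical; the first term equals the familiar binomial sampling variance 𝜋(1 − 𝜋)/𝑛, and the second term vanishes when 𝑝 = 1.

```python
def variance_components(pi, p, n):
    """Split Var(pi_hat) into the two terms of (15.7)."""
    sampling = (0.25 - (pi - 0.5)**2) / n            # equals pi * (1 - pi) / n
    device = (1 / (16 * (p - 0.5)**2) - 0.25) / n    # added by the random device
    return sampling, device

sampling, device = variance_components(pi=0.6, p=0.7, n=1000)
total = (1 / (16 * (0.7 - 0.5)**2) - (0.6 - 0.5)**2) / 1000   # last line of (15.6)
print(sampling, device, sampling + device, total)
```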
From the expressions above we can find that:
• When 𝑝 is 1/2, expression (15.1) degenerates to a constant.
• When 𝑝 is 1 or 0, the randomized estimate degenerates to an estimator without randomized sampling.
We shall only discuss situations in which 𝑝 ∈ (1/2, 1)
(a situation in which 𝑝 ∈ (0, 1/2) is symmetric).
From expressions (15.5) and (15.7) we can deduce that:
• The MSE of 𝜋̂ decreases as 𝑝 increases.
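The text mentions confidence intervals without writing one out. A possible sketch, assuming a normal approximation and a plug-in estimate of the variance (both are assumptions, not steps taken from the lecture), is shown below; the helper name `warner_confidence_interval` is hypothetical.

```python
import numpy as np

def warner_confidence_interval(n1, n, p, z=1.96):
    """Approximate interval for pi: estimate +/- z * estimated std. error."""
    pi_hat = (p - 1) / (2 * p - 1) + n1 / ((2 * p - 1) * n)
    # Plug pi_hat into the variance formula (15.6)
    var_hat = (1 / (16 * (p - 0.5)**2) - (pi_hat - 0.5)**2) / n
    half_width = z * np.sqrt(var_hat)
    return pi_hat - half_width, pi_hat + half_width

for p in (0.6, 0.7, 0.8, 0.9):
    n1 = round(1000 * (0.6 * p + 0.4 * (1 - p)))   # expected "yes" count when pi = 0.6
    print(p, warner_confidence_interval(n1, 1000, p))
```

The intervals narrow as 𝑝 rises, in line with the observation that the MSE of 𝜋̂ decreases as 𝑝 increases.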

15.3 Comparing Two Survey Designs

Let’s compare the preceding randomized-response method with a stylized non-randomized response method.
In our non-randomized response method, we suppose that:
• Members of Group A tell the truth with probability 𝑇𝑎, while members of Group B tell the truth with probability 𝑇𝑏.
• 𝑌𝑖 is 1 or 0 according to whether the sample’s 𝑖th member reports belonging to Group A or not.
Then we can estimate 𝜋 as:
$$
\hat{\pi} = \frac{\sum_{i=1}^{n} Y_i}{n} \tag{15.8}
$$
We calculate the expectation, bias, and variance of the estimator to be:

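$$
\mathbb{E}(\hat{\pi}) = \pi T_a + \left[(1-\pi)(1-T_b)\right] \tag{15.9}
$$

$$
\begin{aligned}
Bias(\hat{\pi}) &= \mathbb{E}(\hat{\pi} - \pi) \\
&= \pi\left[T_a + T_b - 2\right] + \left[1 - T_b\right]
\end{aligned}
\tag{15.10}
$$

$$
Var(\hat{\pi}) = \frac{\left[\pi T_a + (1-\pi)(1-T_b)\right]\left[1 - \pi T_a - (1-\pi)(1-T_b)\right]}{n} \tag{15.11}
$$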
It is useful to define an MSE ratio:
$$
\text{MSE Ratio} = \frac{\text{Mean Square Error Randomized}}{\text{Mean Square Error Regular}}
$$
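For a single parameter combination, the ratio can be computed directly from (15.6), (15.10), and (15.11). The function below is a minimal scalar sketch; its name and argument names are hypothetical.

```python
def mse_ratio(pi, n, p, T_a, T_b):
    """MSE of the randomized design divided by MSE of direct questioning."""
    # Randomized response: unbiased, so MSE equals the variance in (15.6)
    mse_randomized = (1 / (16 * (p - 0.5)**2) - (pi - 0.5)**2) / n
    # Direct questioning: MSE = bias^2 + variance, from (15.10) and (15.11)
    mean_report = pi * T_a + (1 - pi) * (1 - T_b)
    bias = mean_report - pi
    mse_regular = bias**2 + mean_report * (1 - mean_report) / n
    return mse_randomized / mse_regular

print(round(mse_ratio(pi=0.6, n=1000, p=0.6, T_a=0.95, T_b=1.0), 2))
```

A ratio below 1 means that the randomized design has the smaller mean square error.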

We can compute MSE Ratios for different survey designs associated with different parameter values.
The following Python code computes objects we want to stare at in order to make comparisons under different values of
𝜋𝐴 and 𝑛:


class Comparison:
    def __init__(self, A, n):
        self.A = A
        self.n = n
        # Pairs (T_a, T_b) of truth-telling probabilities to compare
        TaTb = np.array([[0.95, 1], [0.9, 1], [0.7, 1],
                         [0.5, 1], [1, 0.95], [1, 0.9],
                         [1, 0.7], [1, 0.5], [0.95, 0.95],
                         [0.9, 0.9], [0.7, 0.7], [0.5, 0.5]])
        self.p_arr = np.array([0.6, 0.7, 0.8, 0.9])
        self.p_map = dict(zip(self.p_arr,
                              [f"MSE Ratio: p = {x}" for x in self.p_arr]))
        self.template = pd.DataFrame(columns=self.p_arr)
        self.template[['T_a','T_b']] = TaTb
        self.template['Bias'] = None

    def theoretical(self):
        A = self.A
        n = self.n
        df = self.template.copy()
        # Bias of the non-randomized estimator, equation (15.10)
        df['Bias'] = A * (df['T_a'] + df['T_b'] - 2) + (1 - df['T_b'])
        for p in self.p_arr:
            # Variance (15.6) divided by bias^2 plus variance (15.11)
            df[p] = (1 / (16 * (p - 1/2)**2) - (A - 1/2)**2) / n / \
                    (df['Bias']**2 + ((A * df['T_a'] + (1 - A) * (1 - df['T_b'])) *
                     (1 - A * df['T_a'] - (1 - A) * (1 - df['T_b'])) / n))
            df[p] = df[p].round(2)
        df = df.set_index(["T_a", "T_b", "Bias"]).rename(columns=self.p_map)
        return df

    def MCsimulation(self, size=1000, seed=123456):
        A = self.A
        n = self.n
        df = self.template.copy()
        np.random.seed(seed)
        # Group membership for each respondent; one row per simulated survey
        sample = np.random.rand(size, self.n) <= A
        random_device = np.random.rand(size, n)
        mse_rd = {}
        for p in self.p_arr:
            spinner = random_device <= p
            # "yes" when the spinner points to the respondent's own group
            rd_answer = sample * spinner + (1 - sample) * (1 - spinner)
            n1 = rd_answer.sum(axis=1)
            pi_hat = (p - 1) / (2 * p - 1) + n1 / n / (2 * p - 1)
            mse_rd[p] = np.sum((pi_hat - A)**2)
        for inum, irow in df.iterrows():
            truth_a = np.random.rand(size, self.n) <= irow.T_a
            truth_b = np.random.rand(size, self.n) <= irow.T_b
            # Reports of "Group A" under direct questioning
            trad_answer = sample * truth_a + (1 - sample) * (1 - truth_b)
            pi_trad = trad_answer.sum(axis=1) / n
            df.loc[inum, 'Bias'] = pi_trad.mean() - A
            mse_trad = np.sum((pi_trad - A)**2)
            for p in self.p_arr:
                df.loc[inum, p] = (mse_rd[p] / mse_trad).round(2)
        df = df.set_index(["T_a", "T_b", "Bias"]).rename(columns=self.p_map)
        return df

Let’s put the code to work for parameter values


• 𝜋𝐴 = 0.6


• 𝑛 = 1000
We can generate MSE Ratios theoretically using the above formulas.
We can also perform Monte Carlo simulations of an MSE Ratio.

cp1 = Comparison(0.6, 1000)


df1_theoretical = cp1.theoretical()
df1_theoretical

                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00      -0.03     5.45     1.36     0.60     0.33
0.90  1.00      -0.06     1.62     0.40     0.18     0.10
0.70  1.00      -0.18     0.19     0.05     0.02     0.01
0.50  1.00      -0.30     0.07     0.02     0.01     0.00
1.00  0.95       0.02     9.82     2.44     1.08     0.60
1.00  0.90       0.04     3.41     0.85     0.37     0.21
1.00  0.70       0.12     0.43     0.11     0.05     0.03
1.00  0.50       0.20     0.16     0.04     0.02     0.01
0.95  0.95      -0.01    18.25     4.54     2.00     1.11
0.90  0.90      -0.02     9.70     2.41     1.06     0.59
0.70  0.70      -0.06     1.62     0.40     0.18     0.10
0.50  0.50      -0.10     0.61     0.15     0.07     0.04

df1_mc = cp1.MCsimulation()
df1_mc

                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00  -0.030060     5.76     1.36     0.63     0.35
0.90  1.00  -0.060045     1.73     0.41     0.19      0.1
0.70  1.00  -0.179530     0.21     0.05     0.02     0.01
0.50  1.00  -0.300077     0.07     0.02     0.01      0.0
1.00  0.95   0.019770    10.59      2.5     1.15     0.64
1.00  0.90   0.040050     3.63     0.86     0.39     0.22
1.00  0.70   0.120052     0.46     0.11     0.05     0.03
1.00  0.50   0.199746     0.17     0.04     0.02     0.01
0.95  0.95  -0.010137    18.65     4.41     2.02     1.12
0.90  0.90  -0.020103    10.48     2.48     1.14     0.63
0.70  0.70  -0.060488     1.71      0.4     0.19      0.1
0.50  0.50  -0.099341     0.66     0.16     0.07     0.04

The theoretical calculations do a good job of predicting Monte Carlo results.


We see that in many situations, especially when the bias is not small, the MSE of the randomized-sampling methods is
smaller than that of the non-randomized sampling method.
These differences become larger as 𝑝 increases.
By adjusting parameters 𝜋𝐴 and 𝑛, we can study outcomes in different situations.
For example, for another situation described in Warner [Warner, 1965]:
• 𝜋𝐴 = 0.5
• 𝑛 = 1000
we can use the code

cp2 = Comparison(0.5, 1000)


df2_theoretical = cp2.theoretical()
df2_theoretical

                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00     -0.025     7.15     1.79     0.79     0.45
0.90  1.00     -0.050     2.27     0.57     0.25     0.14
0.70  1.00     -0.150     0.27     0.07     0.03     0.02
0.50  1.00     -0.250     0.10     0.02     0.01     0.01
1.00  0.95      0.025     7.15     1.79     0.79     0.45
1.00  0.90      0.050     2.27     0.57     0.25     0.14
1.00  0.70      0.150     0.27     0.07     0.03     0.02
1.00  0.50      0.250     0.10     0.02     0.01     0.01
0.95  0.95      0.000    25.00     6.25     2.78     1.56
0.90  0.90      0.000    25.00     6.25     2.78     1.56
0.70  0.70      0.000    25.00     6.25     2.78     1.56
0.50  0.50      0.000    25.00     6.25     2.78     1.56

df2_mc = cp2.MCsimulation()
df2_mc

                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00  -0.025230      7.0     1.69     0.75     0.44
0.90  1.00  -0.050279     2.23     0.54     0.24     0.14
0.70  1.00  -0.149866     0.27     0.07     0.03     0.02
0.50  1.00  -0.250211      0.1     0.02     0.01     0.01
1.00  0.95   0.024410     7.38     1.78     0.79     0.46
1.00  0.90   0.049839     2.26     0.54     0.24     0.14
1.00  0.70   0.149769     0.27     0.07     0.03     0.02
1.00  0.50   0.249851      0.1     0.02     0.01     0.01
0.95  0.95  -0.000260    24.29     5.86     2.59     1.52
0.90  0.90  -0.000109    25.73      6.2     2.74     1.61
0.70  0.70  -0.000439    25.75     6.21     2.74     1.61
0.50  0.50   0.000768    24.91     6.01     2.65     1.56

We can also revisit a calculation in the concluding section of Warner [Warner, 1965] in which
• 𝜋𝐴 = 0.6
• 𝑛 = 2000
We use the code

cp3 = Comparison(0.6, 2000)


df3_theoretical = cp3.theoretical()
df3_theoretical


                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00      -0.03     3.05     0.76     0.33     0.19
0.90  1.00      -0.06     0.84     0.21     0.09     0.05
0.70  1.00      -0.18     0.10     0.02     0.01     0.01
0.50  1.00      -0.30     0.03     0.01     0.00     0.00
1.00  0.95       0.02     6.03     1.50     0.66     0.37
1.00  0.90       0.04     1.82     0.45     0.20     0.11
1.00  0.70       0.12     0.22     0.05     0.02     0.01
1.00  0.50       0.20     0.08     0.02     0.01     0.00
0.95  0.95      -0.01    14.12     3.51     1.55     0.86
0.90  0.90      -0.02     5.98     1.49     0.66     0.36
0.70  0.70      -0.06     0.84     0.21     0.09     0.05
0.50  0.50      -0.10     0.31     0.08     0.03     0.02

df3_mc = cp3.MCsimulation()
df3_mc

                       MSE Ratio
T_a   T_b        Bias  p = 0.6  p = 0.7  p = 0.8  p = 0.9
0.95  1.00  -0.030316     3.27      0.8     0.34     0.19
0.90  1.00  -0.060352     0.91     0.22     0.09     0.05
0.70  1.00  -0.180087     0.11     0.03     0.01     0.01
0.50  1.00  -0.299849     0.04     0.01      0.0      0.0
1.00  0.95   0.019734      6.7     1.64     0.69     0.39
1.00  0.90   0.039766     2.01     0.49     0.21     0.12
1.00  0.70   0.119789     0.24     0.06     0.02     0.01
1.00  0.50   0.200138     0.09     0.02     0.01      0.0
0.95  0.95  -0.010475    14.78     3.61     1.53     0.85
0.90  0.90  -0.020373     6.32     1.54     0.65     0.36
0.70  0.70  -0.059945     0.92     0.23      0.1     0.05
0.50  0.50  -0.100103     0.34     0.08     0.03     0.02

Evidently, as 𝑛 increases, the randomized response method performs better in more situations.

15.4 Concluding Remarks

A companion QuantEcon lecture, Expected Utilities of Random Responses, describes some alternative randomized response surveys.

That lecture presents a utilitarian analysis of those alternatives conducted by Lars Ljungqvist [Ljungqvist, 1993].

CHAPTER

SIXTEEN

EXPECTED UTILITIES OF RANDOM RESPONSES

import matplotlib.pyplot as plt
import numpy as np

16.1 Overview

This QuantEcon lecture describes randomized response surveys in the tradition of Warner [Warner, 1965] that are designed
to protect respondents’ privacy.
Lars Ljungqvist [Ljungqvist, 1993] analyzed how a respondent’s decision about whether to answer truthfully depends on
expected utility.
The lecture tells how Ljungqvist used his framework to shed light on alternative randomized response survey techniques
proposed, for example, by [Lanke, 1975], [Lanke, 1976], [Leysieffer and Warner, 1976], [Anderson, 1976], [Fligner et
al., 1977], [Greenberg et al., 1977], [Greenberg et al., 1969].

16.2 Privacy Measures

We consider randomized response models with only two possible answers, “yes” and “no.”
The design determines probabilities

$$
\text{Pr}(\text{yes}|A) = 1 - \text{Pr}(\text{no}|A)
$$
$$
\text{Pr}(\text{yes}|A') = 1 - \text{Pr}(\text{no}|A')
$$

These design probabilities in turn can be used to compute the conditional probability of belonging to the sensitive group
𝐴 for a given response, say 𝑟:

$$
\text{Pr}(A|r) = \frac{\pi_A \text{Pr}(r|A)}{\pi_A \text{Pr}(r|A) + (1-\pi_A)\text{Pr}(r|A')} \tag{16.1}
$$
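Equation (16.1) is Bayes' rule and is easy to evaluate. The sketch below applies it to Warner's design, for which Pr(yes|𝐴) = 𝑝 and Pr(yes|𝐴′) = 1 − 𝑝; the function name `posterior_A` and the parameter values are illustrative assumptions.

```python
def posterior_A(pi_A, pr_r_given_A, pr_r_given_not_A):
    """Pr(A | r) from equation (16.1)."""
    num = pi_A * pr_r_given_A
    return num / (num + (1 - pi_A) * pr_r_given_not_A)

pi_A, p = 0.3, 0.7                           # illustrative values
# Warner's design: Pr(yes|A) = p and Pr(yes|A') = 1 - p
pr_A_yes = posterior_A(pi_A, p, 1 - p)
pr_A_no = posterior_A(pi_A, 1 - p, p)
print(pr_A_yes, pr_A_no)
```

With these values a “yes” raises the probability of membership in 𝐴 above 𝜋𝐴, while a “no” lowers it.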

16.3 Zoo of Concepts

At this point we describe some concepts proposed by various researchers.

16.3.1 Leysieffer and Warner (1976)



The response 𝑟 is regarded as jeopardizing with respect to 𝐴 or 𝐴′ if
$$
\text{Pr}(A|r) > \pi_A \quad \text{or} \quad \text{Pr}(A'|r) > 1 - \pi_A \tag{16.2}
$$
From Bayes’s rule:
$$
\frac{\text{Pr}(A|r)}{\text{Pr}(A'|r)} \times \frac{(1-\pi_A)}{\pi_A} = \frac{\text{Pr}(r|A)}{\text{Pr}(r|A')} \tag{16.3}
$$

If this expression is greater (less) than unity, it follows that 𝑟 is jeopardizing with respect to 𝐴 (𝐴′). Then, the natural measure of jeopardy will be:
$$
g(r|A) = \frac{\text{Pr}(r|A)}{\text{Pr}(r|A')}
\quad \text{and} \quad
g(r|A') = \frac{\text{Pr}(r|A')}{\text{Pr}(r|A)} \tag{16.4}
$$

Suppose, without loss of generality, that Pr(yes|𝐴) > Pr(yes|𝐴′). Then a yes (no) answer is jeopardizing with respect to 𝐴 (𝐴′), that is,
$$
g(\text{yes}|A) > 1
$$
and
$$
g(\text{no}|A') > 1
$$
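The two jeopardy measures are simple likelihood ratios. The following sketch evaluates them for Warner's design with an illustrative 𝑝; the function name `jeopardy` is hypothetical.

```python
def jeopardy(pr_yes_A, pr_yes_notA):
    """Return g(yes|A) and g(no|A') as defined in (16.4)."""
    g_yes_A = pr_yes_A / pr_yes_notA
    g_no_notA = (1 - pr_yes_notA) / (1 - pr_yes_A)
    return g_yes_A, g_no_notA

p = 0.7                                   # illustrative value
g_yes_A, g_no_notA = jeopardy(p, 1 - p)   # Warner: Pr(yes|A) = p, Pr(yes|A') = 1 - p
print(g_yes_A, g_no_notA)
```

In Warner's symmetric design both measures equal 𝑝/(1 − 𝑝), so they exceed 1 whenever 𝑝 > 1/2.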
Leysieffer and Warner proved that the variance of the estimate can only be decreased through an increase in one or both
of these two measures of jeopardy.
An efficient randomized response model is, therefore, any model that attains the maximum acceptable levels of jeopardy
that are consistent with cooperation of the respondents.
As a special example, Leysieffer and Warner considered “a problem in which there is no jeopardy in a no answer”; that is, 𝑔(no|𝐴′) can be of unlimited magnitude.
Evidently, an optimal design must have

Pr(yes|𝐴) = 1

which implies that

Pr(𝐴|no) = 0

16.3.2 Lanke (1976)

Lanke (1975) [Lanke, 1975] argued that “it is membership in Group A that people may want to hide, not membership in
the complementary Group A’.”
For that reason, Lanke (1976) [Lanke, 1976] argued that an appropriate measure of protection is to minimize

max {Pr(𝐴|yes), Pr(𝐴|no)} (16.5)

Holding this measure constant, he explained under what conditions the smallest variance of the estimate was achieved
with the unrelated question model or Warner’s (1965) original model.


16.3.3 Fligner, Policello, and Singh

Fligner, Policello, and Singh [Fligner et al., 1977] reached a conclusion similar to that of Lanke (1976).
They measured “private protection” as
$$
\frac{1 - \max\left\{\text{Pr}(A|\text{yes}), \text{Pr}(A|\text{no})\right\}}{1 - \pi_A} \tag{16.6}
$$
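Both Lanke's criterion (16.5) and the measure in (16.6) depend only on the larger of the two posterior probabilities. A minimal sketch, with hypothetical function names and illustrative inputs, is:

```python
def posterior(pi_A, pr_r_A, pr_r_notA):
    """Pr(A | r) by Bayes' rule, equation (16.1)."""
    return pi_A * pr_r_A / (pi_A * pr_r_A + (1 - pi_A) * pr_r_notA)

def lanke_measure(pi_A, pr_yes_A, pr_yes_notA):
    """max{Pr(A|yes), Pr(A|no)} from (16.5); smaller means more protection."""
    pr_A_yes = posterior(pi_A, pr_yes_A, pr_yes_notA)
    pr_A_no = posterior(pi_A, 1 - pr_yes_A, 1 - pr_yes_notA)
    return max(pr_A_yes, pr_A_no)

def fps_protection(pi_A, pr_yes_A, pr_yes_notA):
    """Measure (16.6); larger means more protection."""
    return (1 - lanke_measure(pi_A, pr_yes_A, pr_yes_notA)) / (1 - pi_A)

# Warner's design with pi_A = 0.3 and p = 0.7 (illustrative values)
print(lanke_measure(0.3, 0.7, 0.3), fps_protection(0.3, 0.7, 0.3))
```

The measure in (16.6) equals 1 when answers reveal nothing (both posteriors equal 𝜋𝐴) and 0 when some answer reveals membership in 𝐴 with certainty.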

16.3.4 Greenberg, Kuebler, Abernathy, and Horvitz (1977)

Greenberg, Kuebler, Abernathy, and Horvitz (1977) [Greenberg et al., 1977] stressed the importance of examining the risk to respondents who do not belong to 𝐴 as well as the risk to those who do belong to the sensitive group.
They defined the hazard for an individual in 𝐴 as the probability that he or she is perceived as belonging to 𝐴:
Pr(yes|𝐴) × Pr(𝐴|yes) + Pr(no|𝐴) × Pr(𝐴|no) (16.7)

Similarly, the hazard for an individual who does not belong to 𝐴 would be
$$
\text{Pr}(\text{yes}|A') \times \text{Pr}(A|\text{yes}) + \text{Pr}(\text{no}|A') \times \text{Pr}(A|\text{no}) \tag{16.8}
$$

Greenberg et al. (1977) also considered an alternative related measure of hazard that “is likely to be closer to the actual
concern felt by a respondent.”

The “limited hazard” for an individual in 𝐴 and 𝐴′ is
$$
\text{Pr}(\text{yes}|A) \times \text{Pr}(A|\text{yes}) \tag{16.9}
$$

and

$$
\text{Pr}(\text{yes}|A') \times \text{Pr}(A|\text{yes}) \tag{16.10}
$$

This measure is just the first term in (16.7), i.e., the probability that an individual answers “yes” and is perceived to belong
to 𝐴.
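The four hazard measures can be computed from the design probabilities and 𝜋𝐴. The sketch below uses a hypothetical function name and Warner's design with illustrative values.

```python
def hazards(pi_A, pr_yes_A, pr_yes_notA):
    """Hazard and limited hazard for members of A and of A'."""
    pr_yes = pi_A * pr_yes_A + (1 - pi_A) * pr_yes_notA
    pr_A_yes = pi_A * pr_yes_A / pr_yes              # Pr(A|yes)
    pr_A_no = pi_A * (1 - pr_yes_A) / (1 - pr_yes)   # Pr(A|no)
    hazard_A = pr_yes_A * pr_A_yes + (1 - pr_yes_A) * pr_A_no            # (16.7)
    hazard_notA = pr_yes_notA * pr_A_yes + (1 - pr_yes_notA) * pr_A_no   # (16.8)
    limited_A = pr_yes_A * pr_A_yes                                      # (16.9)
    limited_notA = pr_yes_notA * pr_A_yes                                # (16.10)
    return hazard_A, hazard_notA, limited_A, limited_notA

# Warner's design with pi_A = 0.3 and p = 0.7 (illustrative values)
h = hazards(0.3, 0.7, 0.3)
print(h)
```

A useful check is that the population-weighted average of the two hazards, 𝜋𝐴 times (16.7) plus (1 − 𝜋𝐴) times (16.8), equals 𝜋𝐴.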

16.4 Respondent’s Expected Utility

16.4.1 Truth Border

Key assumptions that underlie a randomized response technique for estimating the fraction of a population that belongs
to 𝐴 are:
• Assumption 1: Respondents feel discomfort from being thought of as belonging to 𝐴.
• Assumption 2: Respondents prefer answering questions truthfully to lying, so long as the cost of doing so is not too high. The cost is taken to be the discomfort in Assumption 1.
Let 𝑟𝑖 denote individual 𝑖’s response to the randomized question.
𝑟𝑖 can only take values “yes” or “no”.
For a given design of a randomized response interview and a given belief about the fraction of the population that belongs
to 𝐴, the respondent’s answer is associated with a conditional probability Pr(𝐴|𝑟𝑖 ) that the individual belongs to 𝐴.
Given 𝑟𝑖 and complete privacy, the individual’s utility is higher if 𝑟𝑖 represents a truthful answer rather than a lie.
In terms of a respondent’s expected utility as a function of Pr(𝐴|𝑟𝑖 ) and 𝑟𝑖 :
• The higher is Pr(𝐴|𝑟𝑖 ), the lower is individual 𝑖’s expected utility.
• Expected utility is higher if 𝑟𝑖 represents a truthful answer rather than a lie.
Define:
• 𝜙𝑖 ∈ {truth, lie}, a dichotomous variable that indicates whether or not 𝑟𝑖 is a truthful statement.
• 𝑈𝑖 (Pr(𝐴|𝑟𝑖 ), 𝜙𝑖 ), a utility function that is differentiable in its first argument, summarizes individual 𝑖’s expected
utility.
Then there is an 𝑟𝑖 such that

$$
\frac{\partial U_i(\text{Pr}(A|r_i), \phi_i)}{\partial \text{Pr}(A|r_i)} < 0, \quad \text{for } \phi_i \in \{\text{truth}, \text{lie}\} \tag{16.11}
$$

and

𝑈𝑖 (Pr(𝐴|𝑟𝑖 ), truth) > 𝑈𝑖 (Pr(𝐴|𝑟𝑖 ), lie) , for Pr(𝐴|𝑟𝑖 ) ∈ [0, 1] (16.12)

Suppose now that the correct answer for individual 𝑖 is “yes”.


Individual 𝑖 would choose to answer truthfully if

𝑈𝑖 (Pr(𝐴|yes), truth) ≥ 𝑈𝑖 (Pr(𝐴|no), lie) (16.13)

If the correct answer is “no”, individual 𝑖 would volunteer the correct answer only if

𝑈𝑖 (Pr(𝐴|no), truth) ≥ 𝑈𝑖 (Pr(𝐴|yes), lie) (16.14)

Assume that

Pr(𝐴|yes) > 𝜋𝐴 > Pr(𝐴|no)

so that a “yes” answer increases the odds that an individual belongs to 𝐴.


Constraint (16.14) holds for sure.
Consequently, constraint (16.13) becomes the single necessary condition for individual 𝑖 always to answer truthfully.
At equality, constraint (16.13) determines conditional probabilities that make the individual indifferent between telling the
truth and lying when the correct answer is “yes”:

𝑈𝑖 (Pr(𝐴|yes), truth) = 𝑈𝑖 (Pr(𝐴|no), lie) (16.15)

Equation (16.15) defines a “truth border”.


Differentiating (16.15) with respect to the conditional probabilities shows that the truth border has a positive slope in the
space of conditional probabilities:
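$$
\frac{\partial \text{Pr}(A|\text{no})}{\partial \text{Pr}(A|\text{yes})} = \frac{\dfrac{\partial U_i(\text{Pr}(A|\text{yes}), \text{truth})}{\partial \text{Pr}(A|\text{yes})}}{\dfrac{\partial U_i(\text{Pr}(A|\text{no}), \text{lie})}{\partial \text{Pr}(A|\text{no})}} > 0 \tag{16.16}
$$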

The source of the positive relationship is:


• The individual is willing to volunteer a truthful “yes” answer so long as the utility from doing so (i.e., the left side
of (16.15)) is at least as high as the utility of lying on the right side of (16.15).
• Suppose now that Pr(𝐴|yes) increases. That reduces the utility of telling the truth. To preserve indifference between
a truthful answer and a lie, Pr(𝐴|no) must increase to reduce the utility of lying.
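For a concrete case, take the linear utility 𝑈𝑖 = −Pr(𝐴|𝑟𝑖 ) + 𝑓(𝜙𝑖 ). Condition (16.13) then reduces to Pr(𝐴|no) ≥ Pr(𝐴|yes) − [𝑓(truth) − 𝑓(lie)]. The sketch below assumes a gap 𝑓(truth) − 𝑓(lie) of 0.4, which mirrors the intercept used in the plotting code that follows; the function and argument names are hypothetical.

```python
def answers_truthfully(pr_A_yes, pr_A_no, gain_from_truth=0.4):
    """Check condition (16.13) for U = -Pr(A|r) + f(phi).

    gain_from_truth stands for f(truth) - f(lie); 0.4 is an assumed value.
    """
    u_truth = -pr_A_yes + gain_from_truth   # truthful "yes"; f(lie) normalized to 0
    u_lie = -pr_A_no                        # false "no"
    return u_truth >= u_lie

print(answers_truthfully(0.5, 0.2))   # True: Pr(A|no) lies above the border 0.5 - 0.4
print(answers_truthfully(0.9, 0.2))   # False: Pr(A|no) lies below the border 0.9 - 0.4
```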


16.4.2 Drawing a Truth Border

We can deduce two things about the truth border:


• The truth border divides the space of conditional probabilities into two subsets: “truth telling” and “lying”. Thus,
sufficient privacy elicits a truthful answer, whereas insufficient privacy results in a lie. The truth border depends on
a respondent’s utility function.
• Assumptions in (16.11) and (16.12) are sufficient only to guarantee a positive slope of the truth border. The truth
border can have either a concave or a convex shape.
We can draw some truth borders with the following Python code:

x1 = np.arange(0, 1, 0.001)
y1 = x1 - 0.4
x2 = np.arange(0.4**2, 1, 0.001)
y2 = (pow(x2, 0.5) - 0.4)**2
x3 = np.arange(0.4**0.5, 1, 0.001)
y3 = pow(x3**2 - 0.4, 0.5)
plt.figure(figsize=(12, 10))
# Raw strings keep the LaTeX backslashes in the labels intact
plt.plot(x1, y1, 'r-',
         label=r'Truth Border of: $U_i(Pr(A|r_i),\phi_i)=-Pr(A|r_i)+f(\phi_i)$')
plt.fill_between(x1, 0, y1, facecolor='red', alpha=0.05)

plt.plot(x2, y2, 'b-',
         label=r'Truth Border of: $U_i(Pr(A|r_i),\phi_i)=-Pr(A|r_i)^{2}+f(\phi_i)$')
plt.fill_between(x2, 0, y2, facecolor='blue', alpha=0.05)

plt.plot(x3, y3, 'y-',
         label=r'Truth Border of: $U_i(Pr(A|r_i),\phi_i)=-\sqrt{Pr(A|r_i)}+f(\phi_i)$')
plt.fill_between(x3, 0, y3, facecolor='green', alpha=0.05)


plt.plot(x1, x1, ':', linewidth=2)
plt.xlim([0, 1])
plt.ylim([0, 1])

plt.xlabel('Pr(A|yes)')
plt.ylabel('Pr(A|no)')
plt.text(0.42, 0.3, "Truth Telling", fontdict={'size':28, 'style':'italic'})
plt.text(0.8, 0.1, "Lying", fontdict={'size':28, 'style':'italic'})

plt.legend(loc=0, fontsize='large')
plt.title('Figure 1.1')
plt.show()


Figure 1.1: Three types of truth border.


Without loss of generality, we consider the truth border:

𝑈𝑖 (Pr(𝐴|𝑟𝑖 ), 𝜙𝑖 ) = −Pr(𝐴|𝑟𝑖 ) + 𝑓(𝜙𝑖 )

and plot the “truth telling” and “lying” areas of individual 𝑖 in Figure 1.2:

x1 = np.arange(0, 1, 0.001)
y1 = x1 - 0.4
z1 = x1
z2 = 0
plt.figure(figsize=(12, 10))
plt.plot(x1, y1, 'r-',
         label=r'Truth Border of: $U_i(Pr(A|r_i),\phi_i)=-Pr(A|r_i)+f(\phi_i)$')

plt.plot(x1, x1, ':', linewidth=2)


plt.fill_between(x1, y1, z1, facecolor='blue', alpha=0.05, label='truth telling')
plt.fill_between(x1, z2, y1, facecolor='green', alpha=0.05, label='lying')
plt.xlim([0, 1])
plt.ylim([0, 1])

plt.xlabel('Pr(A|yes)')
plt.ylabel('Pr(A|no)')

plt.text(0.5, 0.4, "Truth Telling", fontdict={'size':28, 'style':'italic'})
plt.text(0.8, 0.2, "Lying", fontdict={'size':28, 'style':'italic'})

plt.legend(loc=0, fontsize='large')
plt.title('Figure 1.2')
plt.show()

16.5 Utilitarian View of Survey Design

16.5.1 Iso-variance Curves

A statistician’s objective is
• to find a randomized response survey design that minimizes the bias and the variance of the estimator.
Given a design that ensures truthful answers by all respondents, Anderson (1976, Theorem 1) [Anderson, 1976] showed that the minimum variance estimate in the two-response model has variance
$$
V(\text{Pr}(A|\text{yes}), \text{Pr}(A|\text{no})) = \frac{\pi_A^2 (1-\pi_A)^2}{n} \times \frac{1}{\text{Pr}(A|\text{yes}) - \pi_A} \times \frac{1}{\pi_A - \text{Pr}(A|\text{no})} \tag{16.17}
$$
where the random sample with replacement consists of 𝑛 individuals.


We can use Expression (16.17) to draw iso-variance curves.
The following inequalities restrict the shapes of iso-variance curves:

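$$
\left.\frac{d\,\text{Pr}(A|\text{no})}{d\,\text{Pr}(A|\text{yes})}\right|_{\text{constant variance}} = \frac{\pi_A - \text{Pr}(A|\text{no})}{\text{Pr}(A|\text{yes}) - \pi_A} > 0 \tag{16.18}
$$

$$
\left.\frac{d^2\,\text{Pr}(A|\text{no})}{d\,\text{Pr}(A|\text{yes})^2}\right|_{\text{constant variance}} = -\frac{2\left[\pi_A - \text{Pr}(A|\text{no})\right]}{\left[\text{Pr}(A|\text{yes}) - \pi_A\right]^2} < 0 \tag{16.19}
$$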
From expressions (16.17), (16.18) and (16.19) we can see that:
• Variance can be reduced only by increasing the distance of Pr(𝐴|yes) and/or Pr(𝐴|no) from 𝜋𝐴 .
• Iso-variance curves are always upward-sloping and concave.
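These properties can be verified numerically by solving (16.17) for Pr(𝐴|no) at a fixed variance level. In the sketch below the function names and the chosen variance level are assumptions for illustration.

```python
def variance(pr_A_yes, pr_A_no, pi_A, n):
    """Minimum variance (16.17) when all respondents answer truthfully."""
    return (pi_A**2 * (1 - pi_A)**2 / n
            / ((pr_A_yes - pi_A) * (pi_A - pr_A_no)))

def iso_variance_curve(pr_A_yes, v, pi_A, n):
    """Pr(A|no) that keeps the variance equal to v, obtained by solving (16.17)."""
    return pi_A - pi_A**2 * (1 - pi_A)**2 / (n * v * (pr_A_yes - pi_A))

pi_A, n, v = 0.3, 100, 0.0147             # illustrative values
for x in (0.5, 0.7, 0.9):
    y = iso_variance_curve(x, v, pi_A, n)
    print(x, y, variance(x, y, pi_A, n))
```

Along the curve Pr(𝐴|no) rises with Pr(𝐴|yes), but by smaller and smaller increments, which is the upward-sloping and concave shape described above.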

16.5.2 Drawing Iso-variance Curves

We use Python code to draw iso-variance curves.


The pairs of conditional probabilities can be attained using Warner’s (1965) model.
Note that:
• Any point on the iso-variance curves can be attained with the unrelated question model as long as the statistician
can completely control the model design.
• Warner’s (1965) original randomized response model is less flexible than the unrelated question model.

class Iso_Variance:
    def __init__(self, pi, n):
        self.pi = pi
        self.n = n

    def plotting_iso_variance_curve(self):
        pi = self.pi
        n = self.n

        # nv[i] / n is the variance along the curve labeled V{i+1}
        nv = np.array([0.27, 0.34, 0.49, 0.74, 0.92, 1.1, 1.47, 2.94, 14.7])
        x = np.arange(0, 1, 0.001)
        x0 = np.arange(pi, 1, 0.001)
        x2 = np.arange(0, pi, 0.001)
        y1 = [pi for i in x0]
        y2 = [pi for i in x2]
        # Pairs (Pr(A|yes), Pr(A|no)) attainable with Warner's design
        y0 = 1 / (1 + (x0 * (1 - pi)**2) / ((1 - x0) * pi**2))

        plt.figure(figsize=(12, 10))
        plt.plot(x0, y0, 'm-', label='Warner')
        plt.plot(x, x, 'c:', linewidth=2)
        plt.plot(x0, y1,'c:', linewidth=2)
        plt.plot(y2, x2, 'c:', linewidth=2)
        for i in range(len(nv)):
            # Solve (16.17) for Pr(A|no); 1e-8 avoids division by zero at x0 = pi
            y = pi - (pi**2 * (1 - pi)**2) / (n * (nv[i] / n) * (x0 - pi + 1e-8))
            plt.plot(x0, y, 'k--', alpha=1 - 0.07 * i, label=f'V{i+1}')
        plt.xlim([0, 1])
        plt.ylim([0, 0.5])
        plt.xlabel('Pr(A|yes)')
        plt.ylabel('Pr(A|no)')
        plt.legend(loc=0, fontsize='large')
        plt.text(0.32, 0.28, "High Var", fontdict={'size':15, 'style':'italic'})
        plt.text(0.91, 0.01, "Low Var", fontdict={'size':15, 'style':'italic'})
        plt.title('Figure 2')
        plt.show()

Properties of iso-variance curves are:


• All points on one iso-variance curve share the same variance
• From 𝑉1 to 𝑉9 , the variance of the iso-variance curves increases monotonically, and the plotted curves become progressively lighter
Suppose the parameters of the iso-variance model follow those in Ljungqvist [Ljungqvist, 1993], which are:
• 𝜋 = 0.3
• 𝑛 = 100
Then we can plot the iso-variance curve in Figure 2:

var = Iso_Variance(pi=0.3, n=100)


var.plotting_iso_variance_curve()


16.5.3 Optimal Survey

A point on an iso-variance curve can be attained with the unrelated question design.
We now focus on finding an “optimal survey design” that
• Minimizes the variance of the estimator subject to privacy restrictions.
To obtain an optimal design, we first superimpose all individuals’ truth borders on the iso-variance mapping.
To construct an optimal design
• The statistician should find the intersection of areas above all truth borders; that is, the set of conditional probabilities
ensuring truthful answers from all respondents.
• The point where this set touches the lowest possible iso-variance curve determines an optimal survey design.
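The two steps above can be sketched as a grid search. The example assumes a single type of respondent with the linear truth border Pr(𝐴|no) ≥ Pr(𝐴|yes) − 0.4, together with 𝜋𝐴 = 0.3 and 𝑛 = 100; these values and all variable names are assumptions made for illustration.

```python
import numpy as np

pi_A, n, gain = 0.3, 100, 0.4      # gain = f(truth) - f(lie), an assumed value

def variance(x, y):
    """Variance (16.17) at x = Pr(A|yes), y = Pr(A|no)."""
    return pi_A**2 * (1 - pi_A)**2 / n / ((x - pi_A) * (pi_A - y))

# Candidate designs with Pr(A|no) < pi_A < Pr(A|yes)
xs = np.linspace(pi_A + 0.001, 1.0, 700)
ys = np.linspace(0.0, pi_A - 0.001, 300)
X, Y = np.meshgrid(xs, ys)

# Designs on or above the truth border (small tolerance for rounding error)
truthful = Y >= X - gain - 1e-12
V = np.where(truthful, variance(X, Y), np.inf)
i, j = np.unravel_index(np.argmin(V), V.shape)
print(X[i, j], Y[i, j], V[i, j])
```

The minimizer lies on the truth border, since lowering Pr(𝐴|no) or raising Pr(𝐴|yes) any further would induce lying.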
Consequently, a minimum variance unbiased estimator is pinned down by an individual who is the least willing to volunteer
a truthful answer.
Here are some comments about the model design:
• An individual’s decision of whether or not to answer truthfully depends on his or her belief about other respondents’
behavior, because this determines the individual’s calculation of Pr(𝐴|yes) and Pr(𝐴|no).
• An equilibrium of the optimal design model is a Nash equilibrium of a noncooperative game.
• Assumption (16.12) is sufficient to guarantee existence of an optimal model design. By choosing Pr(𝐴|yes) and
Pr(𝐴|no) sufficiently close to each other, all respondents will find it optimal to answer truthfully. The closer are
these probabilities, the higher the variance of the estimator becomes.
• If respondents experience a large enough increase in expected utility from telling the truth, then there is no need to
use a randomized response model. The smallest possible variance of the estimate is then obtained at Pr(𝐴|yes) = 1
and Pr(𝐴|no) = 0 ; that is, when respondents answer truthfully to direct questioning.
• A more general design problem would be to minimize some weighted sum of the estimator’s variance and bias. It
would be optimal to accept some lies from the most “reluctant” respondents.

16.6 Criticisms of Proposed Privacy Measures

We can use a utilitarian approach to analyze some privacy measures.


We’ll enlist Python code to help us.

16.6.1 Analysis of Lanke’s (1976) Method

Lanke (1976) recommends a privacy protection criterion that minimizes:

max {Pr(𝐴|yes), Pr(𝐴|no)} (16.20)

Following Lanke’s suggestion, the statistician should find the highest possible Pr(𝐴|yes) consistent with truth telling while
Pr(𝐴|no) is fixed at 0. The variance is then minimized at point 𝑋 in Figure 3.
However, we can see that in Figure 3, point 𝑍 offers a smaller variance that still allows cooperation of the respondents, and it is achievable following our discussion of the truth border in Section 16.4:


pi = 0.3
n = 100
nv = [0.27, 0.34, 0.49, 0.74, 0.92, 1.1, 1.47, 2.94, 14.7]
x = np.arange(0, 1, 0.001)
y = x - 0.4
z = x
x0 = np.arange(pi, 1, 0.001)
x2 = np.arange(0, pi, 0.001)
y1 = [pi for i in x0]
y2 = [pi for i in x2]

plt.figure(figsize=(12, 10))
plt.plot(x, x, 'c:', linewidth=2)
plt.plot(x0, y1, 'c:', linewidth=2)
plt.plot(y2, x2, 'c:', linewidth=2)
plt.plot(x, y, 'r-', label='Truth Border')
plt.fill_between(x, y, z, facecolor='blue', alpha=0.05, label='truth telling')
plt.fill_between(x, 0, y, facecolor='green', alpha=0.05, label='lying')
for i in range(len(nv)):
    y = pi - (pi**2 * (1 - pi)**2) / (n * (nv[i] / n) * (x0 - pi + 1e-8))
    plt.plot(x0, y, 'k--', alpha=1 - 0.07 * i, label=f'V{i+1}')

plt.scatter(0.498, 0.1, c='b', marker='*', label='Z', s=150)


plt.scatter(0.4, 0, c='y', label='X', s=150)
plt.xlim([0, 1])
plt.ylim([0, 0.5])
plt.xlabel('Pr(A|yes)')
plt.ylabel('Pr(A|no)')
plt.text(0.45, 0.35, "Truth Telling", fontdict={'size':28, 'style':'italic'})
plt.text(0.85, 0.35, "Lying",fontdict = {'size':28, 'style':'italic'})
plt.text(0.515, 0.095, "Optimal Design", fontdict={'size':16,'color':'b'})
plt.legend(loc=0, fontsize='large')
plt.title('Figure 3')
plt.show()
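The comparison between points 𝑋 and 𝑍 can also be checked directly with formula (16.17). The coordinates below are the ones passed to `plt.scatter` in the code above; the helper name `variance` is hypothetical.

```python
def variance(pr_A_yes, pr_A_no, pi_A=0.3, n=100):
    """Variance (16.17) for the parameter values used in Figure 3."""
    return (pi_A**2 * (1 - pi_A)**2 / n
            / ((pr_A_yes - pi_A) * (pi_A - pr_A_no)))

var_X = variance(0.4, 0.0)      # Lanke's point X
var_Z = variance(0.498, 0.1)    # point Z near the truth border
print(var_X, var_Z)
```

Point 𝑍 delivers the lower variance even though it does not minimize Lanke's criterion.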


16.6.2 Method of Leysieffer and Warner (1976)

Leysieffer and Warner (1976) recommend a two-dimensional measure of jeopardy that reduces to a single dimension when there is “no jeopardy in a ‘no’ answer”, which means that

Pr(yes|𝐴) = 1

and

Pr(𝐴|no) = 0

This is not an optimal choice under a utilitarian approach.


16.6.3 Analysis of Chaudhuri and Mukerjee’s (1988) Method

Chaudhuri and Mukerjee (1988) [Chadhuri and Mukerjee, 1988] argued that, because a “yes” answer may sometimes relate to the sensitive group A, a clever respondent may be inclined always to respond “no”, falsely but safely. In this situation, the truth border is such that individuals choose to lie whenever the truthful answer is “yes” and

Pr(𝐴|no) = 0

Here the gain from lying is too high for someone to volunteer a “yes” answer.
This means that

𝑈𝑖 (Pr(𝐴|yes), truth) < 𝑈𝑖 (Pr(𝐴|no), lie)

in every situation.


As a result, there is no attainable model design.
However, under a utilitarian approach there should exist other survey designs that are consistent with truthful answers.
In particular, respondents will choose to answer truthfully if the relative advantage from lying is eliminated.
We can use Python to show that the optimal model design corresponds to point Q in Figure 4:

def f(x):
if x < 0.16:
return 0
else:
return (pow(x, 0.5) - 0.4)**2

pi = 0.3
n = 100
nv = [0.27, 0.34, 0.49, 0.74, 0.92, 1.1, 1.47, 2.94, 14.7]
x = np.arange(0, 1, 0.001)
y = [f(i) for i in x]
z = x
x0 = np.arange(pi, 1, 0.001)
x2 = np.arange(0, pi, 0.001)
y1 = [pi for i in x0]
y2 = [pi for i in x2]
x3 = np.arange(0.16, 1, 0.001)
y3 = (pow(x3, 0.5) - 0.4)**2

plt.figure(figsize=(12, 10))
plt.plot(x, x, 'c:', linewidth=2)
plt.plot(x0, y1,'c:', linewidth=2)
plt.plot(y2, x2,'c:', linewidth=2)
plt.plot(x3, y3,'b-', label='Truth Border')
plt.fill_between(x, y, z, facecolor='blue', alpha=0.05, label='Truth telling')
plt.fill_between(x3, 0, y3,facecolor='green', alpha=0.05, label='Lying')
for i in range(len(nv)):
    y = pi - (pi**2 * (1 - pi)**2) / (n * (nv[i] / n) * (x0 - pi + 1e-8))
    plt.plot(x0, y, 'k--', alpha=1 - 0.07 * i, label=f'V{i+1}')
plt.scatter(0.61, 0.146, c='r', marker='*', label='Z', s=150)
plt.xlim([0, 1])
plt.ylim([0, 0.5])


plt.xlabel('Pr(A|yes)')
plt.ylabel('Pr(A|no)')
plt.text(0.45, 0.35, "Truth Telling", fontdict={'size':28, 'style':'italic'})
plt.text(0.8, 0.1, "Lying", fontdict={'size':28, 'style':'italic'})
plt.text(0.63, 0.141, "Optimal Design", fontdict={'size':16,'color':'r'})
plt.legend(loc=0, fontsize='large')
plt.title('Figure 4')
plt.show()
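The location of the starred point in Figure 4 can be checked numerically. The sketch below is only a rough check: it assumes that the truth border and the iso-variance curves are exactly the ones drawn by the plotting code above, and it inverts the plotted curve formula to recover a variance level V. The helper names `truth_border` and `variance_level` are hypothetical.

```python
import numpy as np

pi = 0.3   # same value of pi as in the plotting code above

def truth_border(x):
    """Truth border drawn in the figure: (sqrt(x) - 0.4)^2 for x >= 0.16."""
    return np.where(x >= 0.16, (np.sqrt(x) - 0.4)**2, 0.0)

def variance_level(x, y):
    """Level V of the iso-variance curve through (x, y), obtained by
    inverting y = pi - pi^2 (1 - pi)^2 / (V (x - pi)) from the plot."""
    return pi**2 * (1 - pi)**2 / ((x - pi) * (pi - y))

# Candidate designs on the truth border with Pr(A|no) < pi < Pr(A|yes)
x_grid = np.arange(pi + 0.001, 1, 0.001)
y_grid = truth_border(x_grid)
keep = y_grid < pi
x_grid, y_grid = x_grid[keep], y_grid[keep]

# The design with the lowest variance level along the truth border
V = variance_level(x_grid, y_grid)
k = np.argmin(V)
x_star, y_star, V_star = x_grid[k], y_grid[k], V[k]

print(f"Pr(A|yes) = {x_star:.3f}, Pr(A|no) = {y_star:.3f}, V = {V_star:.3f}")
```

Under these assumptions the grid search lands close to the point marked at (0.61, 0.146) in the figure, on a curve whose level is close to the value 0.92 in the list `nv`.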

16.6.4 Method of Greenberg et al. (1977)

[Greenberg et al., 1977]


Greenberg et al. (1977) defined the hazard for an individual in 𝐴 as the probability that he or she is perceived as belonging
to 𝐴:

Pr(yes|𝐴) × Pr(𝐴|yes) + Pr(no|𝐴) × Pr(𝐴|no) (16.21)

The hazard for an individual who does not belong to 𝐴 is


Pr(yes|𝐴′) × Pr(𝐴|yes) + Pr(no|𝐴′) × Pr(𝐴|no) (16.22)


They also considered an alternative related measure of hazard that they said “is likely to be closer to the actual concern
felt by a respondent.”

Their “limited hazard” for an individual in 𝐴 and 𝐴′ is

Pr(yes|𝐴) × Pr(𝐴|yes) (16.23)

and

Pr(yes|𝐴′) × Pr(𝐴|yes) (16.24)

According to Greenberg et al. (1977), a respondent commits himself or herself to answer truthfully on the basis of a
probability in (16.21) or (16.23) before randomly selecting the question to be answered.
Suppose that the appropriate privacy measure is captured by the notion of “limited hazard” in (16.23) and (16.24).
Consider an unrelated question model where the unrelated question is replaced by the instruction “Say the word ‘no’”,
which implies that

Pr(𝐴|yes) = 1

and it follows that:



• Hazard for an individual in 𝐴′ is 0.
• Hazard for an individual in 𝐴 can also be made arbitrarily small by choosing a sufficiently small Pr(yes|𝐴).
Even though this hazard can be set arbitrarily close to 0, an individual in 𝐴 will completely reveal his or her identity
whenever truthfully answering the sensitive question.
However, under a utilitarian framework, this is obviously contradictory.
If the individuals are willing to volunteer this information, it seems that the randomized response design was not necessary
in the first place.
It ignores the fact that respondents retain the option of lying until they have seen the question to be answered.
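The hazard measures (16.21) to (16.24) are easy to evaluate once Pr(𝐴|yes) and Pr(𝐴|no) are obtained from Bayes' rule. The following is a minimal sketch, not code from the lecture: the function names are hypothetical, and the numbers (a population share of 0.3 for group 𝐴 and Pr(yes|𝐴) = 0.05) are made up to mimic the “Say the word ‘no’” design, in which Pr(yes|𝐴′) = 0.

```python
def posteriors(pi_A, yes_given_A, yes_given_not_A):
    """Return Pr(A|yes) and Pr(A|no) by Bayes' rule."""
    p_yes = pi_A * yes_given_A + (1 - pi_A) * yes_given_not_A
    p_no = 1 - p_yes
    A_given_yes = pi_A * yes_given_A / p_yes
    A_given_no = pi_A * (1 - yes_given_A) / p_no
    return A_given_yes, A_given_no

def hazards(pi_A, yes_given_A, yes_given_not_A):
    """Evaluate the hazard measures (16.21)-(16.24)."""
    A_yes, A_no = posteriors(pi_A, yes_given_A, yes_given_not_A)
    return {
        "hazard_A": yes_given_A * A_yes + (1 - yes_given_A) * A_no,              # (16.21)
        "hazard_not_A": yes_given_not_A * A_yes + (1 - yes_given_not_A) * A_no,  # (16.22)
        "limited_A": yes_given_A * A_yes,                                        # (16.23)
        "limited_not_A": yes_given_not_A * A_yes,                                # (16.24)
    }

# Illustrative design: members of A' never say "yes", members of A rarely do
A_yes, A_no = posteriors(0.3, 0.05, 0.0)
h = hazards(0.3, 0.05, 0.0)

print("Pr(A|yes) =", A_yes)
print(h)
```

In this illustration the limited hazards are small even though a “yes” answer identifies the respondent as a member of 𝐴 with certainty, which is the tension discussed above.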

16.7 Concluding Remarks

The justifications for a randomized response procedure are that


• Respondents are thought to feel discomfort from being perceived as belonging to the sensitive group.
• Respondents prefer answering questions truthfully to lying, unless the truthful answer is too revealing.
If a privacy measure is not completely consistent with the rational behavior of the respondents, all efforts to derive an
optimal model design are futile.
A utilitarian approach provides a systematic way to model respondents’ behavior under the assumption that they maximize
their expected utilities.
In a utilitarian analysis:
• A truth border divides the space of conditional probabilities of being perceived as belonging to the sensitive group,
Pr(𝐴|yes) and Pr(𝐴|no), into the truth-telling region and the lying region.
• The optimal model design is obtained at the point where the truth border touches the lowest possible iso-variance
curve.
A practical implication of the analysis of [Ljungqvist, 1993] is that uncertainty about respondents’ demands for privacy
can be acknowledged by choosing Pr(𝐴|yes) and Pr(𝐴|no) sufficiently close to each other.



Part III

Linear Programming

CHAPTER SEVENTEEN

OPTIMAL TRANSPORT

17.1 Overview

The transportation or optimal transport problem is interesting both because of its many applications and because of
its important role in the history of economic theory.
In this lecture, we describe the problem, tell how linear programming is a key tool for solving it, and then provide some
examples.
We will provide other applications in followup lectures.
The optimal transport problem was studied in early work about linear programming, as summarized for example by
[Dorfman et al., 1958]. A modern reference about applications in economics is [Galichon, 2016].
Below, we show how to solve the optimal transport problem using several implementations of linear programming,
including, in order,
1. the linprog solver from SciPy,
2. the linprog_simplex solver from QuantEcon and
3. the simplex-based solvers included in the Python Optimal Transport package.

!pip install --upgrade quantecon


!pip install --upgrade POT

Let’s start with some imports.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import linprog
from quantecon.optimize.linprog_simplex import linprog_simplex
import ot
from scipy.stats import betabinom
import networkx as nx


17.2 The Optimal Transport Problem

Suppose that 𝑚 factories produce goods that must be sent to 𝑛 locations.


Let
• 𝑥𝑖𝑗 denote the quantity shipped from factory 𝑖 to location 𝑗
• 𝑐𝑖𝑗 denote the cost of shipping one unit from factory 𝑖 to location 𝑗
• 𝑝𝑖 denote the capacity of factory 𝑖 and 𝑞𝑗 denote the amount required at location 𝑗.
• 𝑖 = 1, 2, … , 𝑚 and 𝑗 = 1, 2, … , 𝑛.
A planner wants to minimize total transportation costs subject to the following constraints:
• The amount shipped from each factory must equal its capacity.
• The amount shipped to each location must equal the quantity required there.
The figure below shows one visualization of this idea, when factories and target locations are distributed in the plane.

The sizes of the vertices in the figure are proportional to


• capacity, for the factories, and
• demand (amount required) for the target locations.
The arrows show one possible transport plan, which respects the constraints stated above.
The planner’s problem can be expressed as the following constrained minimization problem:
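$$
\begin{aligned}
\min_{x_{ij}} \ & \sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij} \\
\text{subject to } \ & \sum_{j=1}^n x_{ij} = p_i, \quad i = 1, 2, \dots, m \\
& \sum_{i=1}^m x_{ij} = q_j, \quad j = 1, 2, \dots, n \\
& x_{ij} \ge 0
\end{aligned}
\tag{17.1}
$$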

This is an optimal transport problem with


• 𝑚𝑛 decision variables, namely, the entries 𝑥𝑖𝑗 and
• 𝑚 + 𝑛 constraints.


Summing the 𝑞𝑗 ’s across all 𝑗’s and the 𝑝𝑖 ’s across all 𝑖’s indicates that the total capacity of all the factories equals total
requirements at all locations:
$$
\sum_{j=1}^n q_j = \sum_{j=1}^n \sum_{i=1}^m x_{ij} = \sum_{i=1}^m \sum_{j=1}^n x_{ij} = \sum_{i=1}^m p_i
\tag{17.2}
$$

The presence of the restrictions in (17.2) will be the source of one redundancy in the complete set of restrictions that we
describe below.
More about this later.

17.3 The Linear Programming Approach

In this section we discuss using standard linear programming solvers to tackle the optimal transport problem.

17.3.1 Vectorizing a Matrix of Decision Variables

A matrix of decision variables 𝑥𝑖𝑗 appears in problem (17.1).


The SciPy function linprog expects to see a vector of decision variables.
This situation impels us to rewrite our problem in terms of a vector of decision variables.
Let
• 𝑋, 𝐶 be 𝑚 × 𝑛 matrices with entries 𝑥𝑖𝑗 , 𝑐𝑖𝑗 ,
• 𝑝 be an 𝑚-dimensional vector with entries 𝑝𝑖 ,
• 𝑞 be an 𝑛-dimensional vector with entries 𝑞𝑗 .
With 1𝑛 denoting the 𝑛-dimensional column vector (1, 1, … , 1)′ , our problem can now be expressed compactly as:

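$$
\begin{aligned}
\min_{X} \ & \operatorname{tr}(C' X) \\
\text{subject to } \ & X \mathbf{1}_n = p \\
& X' \mathbf{1}_m = q \\
& X \ge 0
\end{aligned}
$$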

We can convert the matrix 𝑋 into a vector by stacking all of its columns into a column vector.
Doing this is called vectorization, an operation that we denote vec(𝑋).
Similarly, we convert the matrix 𝐶 into an 𝑚𝑛-dimensional vector vec(𝐶).
The objective function can be expressed as the inner product between vec(𝐶) and vec(𝑋):

vec(𝐶)′ ⋅ vec(𝑋).
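The equality between the trace form of the objective and the inner product of the vectorized matrices can be checked numerically. The following is a minimal sketch; the small matrices and the helper name `vec` are illustrative assumptions, not part of the lecture's example.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order='F')

# Illustrative 2 x 3 cost and shipment matrices (made-up numbers)
C_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
X_demo = np.array([[0.5, 0.0, 1.0],
                   [2.0, 1.5, 0.0]])

trace_form = np.trace(C_demo.T @ X_demo)   # tr(C'X)
inner_form = vec(C_demo) @ vec(X_demo)     # vec(C)' vec(X)

print(trace_form, inner_form)
```

Both expressions add up the products of corresponding entries of the two matrices, so they agree for any pair of matrices of the same shape.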

To express the constraints in terms of vec(𝑋), we use a Kronecker product denoted by ⊗ and defined as follows.
Suppose 𝐴 is an 𝑚 × 𝑠 matrix with entries (𝑎𝑖𝑗 ) and that 𝐵 is an 𝑛 × 𝑡 matrix.
The Kronecker product of 𝐴 and 𝐵 is defined, in block matrix form, by

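$$
A \otimes B =
\begin{pmatrix}
a_{11} B & a_{12} B & \dots & a_{1s} B \\
a_{21} B & a_{22} B & \dots & a_{2s} B \\
\vdots & \vdots & & \vdots \\
a_{m1} B & a_{m2} B & \dots & a_{ms} B
\end{pmatrix}.
$$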


𝐴 ⊗ 𝐵 is an 𝑚𝑛 × 𝑠𝑡 matrix.
It has the property that for any 𝑚 × 𝑛 matrix 𝑋

vec(𝐴′ 𝑋𝐵) = (𝐵′ ⊗ 𝐴′ ) vec(𝑋). (17.3)
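Property (17.3) can be verified on random matrices with `np.kron`. This is only a sanity check under assumed dimensions; the sizes, the seed, and all variable names are illustrative.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order='F')

rng = np.random.default_rng(0)
rows, cols, s, t = 3, 4, 2, 5          # illustrative dimensions

A_demo = rng.standard_normal((rows, s))     # plays the role of A (m x s)
B_demo = rng.standard_normal((cols, t))     # plays the role of B (n x t)
X_demo = rng.standard_normal((rows, cols))  # plays the role of X (m x n)

lhs = vec(A_demo.T @ X_demo @ B_demo)            # vec(A'XB)
rhs = np.kron(B_demo.T, A_demo.T) @ vec(X_demo)  # (B' ⊗ A') vec(X)

print(np.allclose(lhs, rhs))
```

Note that the identity relies on column-major vectorization; with row-major flattening the roles of the two Kronecker factors are interchanged.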

We can now express our constraints in terms of vec(𝑋).


Let 𝐴 = I′𝑚 , 𝐵 = 1𝑛 .
By equation (17.3)

𝑋 1𝑛 = vec(𝑋 1𝑛 ) = vec(I𝑚 𝑋 1𝑛 ) = (1′𝑛 ⊗ I𝑚 ) vec(𝑋).

where I𝑚 denotes the 𝑚 × 𝑚 identity matrix.


Constraint 𝑋 1𝑛 = 𝑝 can now be written as:

(1′𝑛 ⊗ I𝑚 ) vec(𝑋) = 𝑝.

Similarly, the constraint 𝑋 ′ 1𝑚 = 𝑞 can be rewritten as:

(I𝑛 ⊗ 1′𝑚 ) vec(𝑋) = 𝑞.

With 𝑧 ∶= vec(𝑋), our problem can now be expressed in terms of an 𝑚𝑛-dimensional vector of decision variables:

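$$
\begin{aligned}
\min_{z} \ & \operatorname{vec}(C)' z \\
\text{subject to } \ & A z = b \\
& z \ge 0
\end{aligned}
\tag{17.4}
$$

where

$$
A = \begin{pmatrix} \mathbf{1}_n' \otimes \mathrm{I}_m \\ \mathrm{I}_n \otimes \mathbf{1}_m' \end{pmatrix}
\quad \text{and} \quad
b = \begin{pmatrix} p \\ q \end{pmatrix}
$$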
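To see what the stacked matrix 𝐴 does, one can apply it to the vectorization of a small matrix and compare the result with row and column sums. The sketch below uses a made-up 2 × 3 matrix; all names ending in `_demo` are illustrative.

```python
import numpy as np

m_demo, n_demo = 2, 3
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

A_rows = np.kron(np.ones((1, n_demo)), np.identity(m_demo))  # 1_n' ⊗ I_m
A_cols = np.kron(np.identity(n_demo), np.ones((1, m_demo)))  # I_n ⊗ 1_m'
A_demo = np.vstack([A_rows, A_cols])

z_demo = X_demo.flatten(order='F')   # vec(X), stacking columns
b_demo = A_demo @ z_demo

print(b_demo)               # first m entries: row sums, last n: column sums
print(X_demo.sum(axis=1))   # amounts shipped out of each factory
print(X_demo.sum(axis=0))   # amounts shipped into each location
```

The first block of 𝐴 therefore recovers the amounts shipped from each factory and the second block recovers the amounts received at each location.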

17.3.2 An Application

We now provide an example that takes the form (17.4) that we’ll solve by deploying the function linprog.
The table below provides numbers for the requirements vector 𝑞, the capacity vector 𝑝, and entries 𝑐𝑖𝑗 of the cost-of-
shipping matrix 𝐶.
The numbers in the above table tell us to set 𝑚 = 3, 𝑛 = 5, and construct the following objects:

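$$
p = \begin{pmatrix} 50 \\ 100 \\ 150 \end{pmatrix}, \quad
q = \begin{pmatrix} 25 \\ 115 \\ 60 \\ 30 \\ 70 \end{pmatrix} \quad \text{and} \quad
C = \begin{pmatrix}
10 & 15 & 20 & 20 & 40 \\
20 & 40 & 15 & 30 & 30 \\
30 & 35 & 40 & 55 & 25
\end{pmatrix}.
$$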
Let’s write Python code that sets up the problem and solves it.

# Define parameters
m = 3
n = 5

p = np.array([50, 100, 150])




q = np.array([25, 115, 60, 30, 70])

C = np.array([[10, 15, 20, 20, 40],
              [20, 40, 15, 30, 30],
              [30, 35, 40, 55, 25]])

# Vectorize matrix C
C_vec = C.reshape((m*n, 1), order='F')

# Construct matrix A by Kronecker product


A1 = np.kron(np.ones((1, n)), np.identity(m))
A2 = np.kron(np.identity(n), np.ones((1, m)))
A = np.vstack([A1, A2])

# Construct vector b
b = np.hstack([p, q])

# Solve the primal problem


res = linprog(C_vec, A_eq=A, b_eq=b)

# Print results
print("message:", res.message)
print("nit:", res.nit)
print("fun:", res.fun)
print("z:", res.x)
print("X:", res.x.reshape((m,n), order='F'))

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


nit: 8
fun: 7225.0
z: [ 0. 10. 15. 50. 0. 65. 0. 60. 0. 0. 30. 0. 0. 0. 70.]
X: [[ 0. 50. 0. 0. 0.]
[10. 0. 60. 30. 0.]
[15. 65. 0. 0. 70.]]

Notice how, in the line C_vec = C.reshape((m*n, 1), order='F'), we are careful to vectorize using the
flag order='F'.
This is consistent with converting 𝐶 into a vector by stacking all of its columns into a column vector.
Here 'F' stands for “Fortran”, and we are using Fortran style column-major order.
(For an alternative approach, using Python’s default row-major ordering, see this lecture by Alfred Galichon.)
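The difference between the two orderings is easy to see on a tiny matrix. The example below is purely illustrative and uses a made-up matrix `M`.

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6]])

col_major = M.flatten(order='F')   # stacks columns: [1 4 2 5 3 6]
row_major = M.flatten(order='C')   # stacks rows:    [1 2 3 4 5 6]

# Reshaping with the same order recovers M; mixing orders does not
recovered = col_major.reshape(M.shape, order='F')
scrambled = col_major.reshape(M.shape, order='C')

print(col_major)
print(row_major)
print(recovered)
print(scrambled)
```

This is why the solution vector is reshaped with `order='F'` above: the ordering used to unpack a vector must match the ordering used to build it.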
Interpreting the rank deficiency:
Older versions of SciPy emitted a warning at this point, stating that A is not full rank; the output shown above contains no such warning, but A is rank deficient all the same.
This indicates that the linear program has been set up to include one or more redundant constraints.
Here, the source of the redundancy is the structure of restrictions (17.2).
Let’s explore this further by printing out 𝐴 and staring at it.


array([[1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0.],
[0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0.],
[0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1., 0., 0., 1.],
[1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1.]])

The rank deficiency of 𝐴 reflects that the first three constraints and the last five constraints both imply that “total requirements
equal total capacities”, as expressed in (17.2).
One equality constraint here is redundant.
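The redundancy can be confirmed by computing the rank of the constraint matrix. The sketch below rebuilds the 8 × 15 matrix for 𝑚 = 3 and 𝑛 = 5 under illustrative names, so that it runs on its own.

```python
import numpy as np

m_demo, n_demo = 3, 5
A_demo = np.vstack([np.kron(np.ones((1, n_demo)), np.identity(m_demo)),
                    np.kron(np.identity(n_demo), np.ones((1, m_demo)))])

rank_full = np.linalg.matrix_rank(A_demo)          # all m + n constraints
rank_dropped = np.linalg.matrix_rank(A_demo[:-1])  # last constraint removed

# The first m rows and the last n rows each sum to a vector of ones,
# which is the linear dependence behind the redundancy
gap = A_demo[:m_demo].sum(axis=0) - A_demo[m_demo:].sum(axis=0)

print(A_demo.shape, rank_full, rank_dropped)
print(gap)
```

The rank equals 𝑚 + 𝑛 − 1 with or without the last row, so dropping one constraint loses no information.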
Below we drop one of the equality constraints, and use only 7 of them.
After doing this, we attain the same minimized cost.
The solver need not return the same transportation plan, because the optimal plan is not unique in this example.
Any plan it returns nevertheless attains the same cost.

linprog(C_vec, A_eq=A[:-1], b_eq=b[:-1])

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True
status: 0
fun: 7225.0
x: [ 0.000e+00 1.000e+01 ... 0.000e+00 7.000e+01]
nit: 8
lower: residual: [ 0.000e+00 1.000e+01 ... 0.000e+00
7.000e+01]
marginals: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
upper: residual: [ inf inf ... inf
inf]
marginals: [ 0.000e+00 0.000e+00 ... 0.000e+00
0.000e+00]
eqlin: residual: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
marginals: [ 5.000e+00 1.500e+01 2.500e+01 5.000e+00
1.000e+01 -0.000e+00 1.500e+01]
ineqlin: residual: []
marginals: []
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

%time linprog(C_vec, A_eq=A[:-1], b_eq=b[:-1])

CPU times: user 1.95 ms, sys: 0 ns, total: 1.95 ms


Wall time: 1.54 ms

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True


status: 0
fun: 7225.0
x: [ 0.000e+00 1.000e+01 ... 0.000e+00 7.000e+01]
nit: 8
lower: residual: [ 0.000e+00 1.000e+01 ... 0.000e+00
7.000e+01]
marginals: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
upper: residual: [ inf inf ... inf
inf]
marginals: [ 0.000e+00 0.000e+00 ... 0.000e+00
0.000e+00]
eqlin: residual: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
marginals: [ 5.000e+00 1.500e+01 2.500e+01 5.000e+00
1.000e+01 -0.000e+00 1.500e+01]
ineqlin: residual: []
marginals: []
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

%time linprog(C_vec, A_eq=A, b_eq=b)

CPU times: user 2.01 ms, sys: 38 µs, total: 2.05 ms


Wall time: 1.58 ms

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True
status: 0
fun: 7225.0
x: [ 0.000e+00 1.000e+01 ... 0.000e+00 7.000e+01]
nit: 8
lower: residual: [ 0.000e+00 1.000e+01 ... 0.000e+00
7.000e+01]
marginals: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
upper: residual: [ inf inf ... inf
inf]
marginals: [ 0.000e+00 0.000e+00 ... 0.000e+00
0.000e+00]
eqlin: residual: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00]
marginals: [ 1.000e+01 2.000e+01 3.000e+01 -0.000e+00
5.000e+00 -5.000e+00 1.000e+01 -5.000e+00]
ineqlin: residual: []
marginals: []
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

In this run, it is marginally quicker to work with the system that removed a redundant constraint (1.54 ms versus 1.58 ms of wall time).
Let’s drill down and do some more calculations to help us understand whether or not our finding two different optimal
transport plans reflects our having dropped a redundant equality constraint.


Hint
It will turn out that dropping a redundant equality isn’t really what mattered.

To verify our hint, we shall simply use all of the original equality constraints (including a redundant one), but we’ll just
shuffle the order of the constraints.

arr = np.arange(m+n)

sol_found = []
cost = []

# simulate 1000 times
for i in range(1000):

    np.random.shuffle(arr)
    res_shuffle = linprog(C_vec, A_eq=A[arr], b_eq=b[arr])

    # if find a new solution
    sol = tuple(res_shuffle.x)
    if sol not in sol_found:
        sol_found.append(sol)
        cost.append(res_shuffle.fun)

for i in range(len(sol_found)):
    print(f"transportation plan {i}: ", sol_found[i])
    print(f"     minimized cost {i}: ", cost[i])

transportation plan 0: (0.0, 10.0, 15.0, 50.0, 0.0, 65.0, 0.0, 60.0, 0.0, 0.0, 30.0, 0.0, 0.0, 0.0, 70.0)

minimized cost 0: 7225.0

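In the run shown above, every ordering of the constraints returns the same optimal transportation plan and the same minimized cost.
With other solvers or solver versions, reordering the constraints can uncover distinct optimal plans that
achieve the same minimized cost, since the optimal plan in this example is not unique.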
Next, we show that leaving out the first constraint “accidentally” leads to the initial plan that we computed.

linprog(C_vec, A_eq=A[1:], b_eq=b[1:])

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True
status: 0
fun: 7225.0
x: [ 0.000e+00 1.000e+01 ... 0.000e+00 7.000e+01]
nit: 8
lower: residual: [ 0.000e+00 1.000e+01 ... 0.000e+00
7.000e+01]
marginals: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
upper: residual: [ inf inf ... inf
inf]


marginals: [ 0.000e+00 0.000e+00 ... 0.000e+00
0.000e+00]
eqlin: residual: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
marginals: [ 1.000e+01 2.000e+01 1.000e+01 1.500e+01
5.000e+00 2.000e+01 5.000e+00]
ineqlin: residual: []
marginals: []
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

Let’s compare this transport plan with

res.x

array([ 0., 10., 15., 50., 0., 65., 0., 60., 0., 0., 30., 0., 0.,
0., 70.])

Here the matrix 𝑋 contains entries 𝑥𝑖𝑗 that tell amounts shipped from factory 𝑖 = 1, 2, 3 to location 𝑗 = 1, 2, … , 5.
The vector 𝑧 evidently equals vec(𝑋).
The minimized cost from the optimal transport plan is given by the fun attribute of the result.

17.3.3 Using a Just-in-Time Compiler

We can also solve optimal transportation problems using a powerful tool from QuantEcon, namely,
quantecon.optimize.linprog_simplex.
This routine implements a simplex algorithm, and its code is accelerated by using
a just-in-time compiler shipped in the numba library.
As you will see very soon, by using quantecon.optimize.linprog_simplex the time required to solve an optimal transportation
problem can be reduced significantly.

# construct matrices/vectors for linprog_simplex


c = C.flatten()

# Equality constraints
A_eq = np.zeros((m+n, m*n))
for i in range(m):
    for j in range(n):
        A_eq[i, i*n+j] = 1
        A_eq[m+j, i*n+j] = 1

b_eq = np.hstack([p, q])

Since quantecon.optimize.linprog_simplex does maximization instead of minimization, we need to put a
negative sign before vector c.

res_qe = linprog_simplex(-c, A_eq=A_eq, b_eq=b_eq)

The two LP solvers attain the same minimized cost of 7225 but, as the output below shows, they return different optimal plans, so the optimal plan is not unique.


res_qe.x.reshape((m, n), order='C')

array([[15., 35.,  0.,  0.,  0.],
       [10.,  0., 60., 30.,  0.],
       [ 0., 80.,  0.,  0., 70.]])

res.x.reshape((m, n), order='F')

array([[ 0., 50.,  0.,  0.,  0.],
       [10.,  0., 60., 30.,  0.],
       [15., 65.,  0.,  0., 70.]])
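The two plans printed above can be compared directly. The following sketch hard-codes both plans and the data of the example, under illustrative variable names, and checks feasibility and cost for each plan as well as for their average.

```python
import numpy as np

# Data of the example (copied from the text)
p_demo = np.array([50, 100, 150])
q_demo = np.array([25, 115, 60, 30, 70])
C_demo = np.array([[10, 15, 20, 20, 40],
                   [20, 40, 15, 30, 30],
                   [30, 35, 40, 55, 25]])

# The two plans printed above
X_qe = np.array([[15, 35,  0,  0,  0],
                 [10,  0, 60, 30,  0],
                 [ 0, 80,  0,  0, 70]])
X_scipy = np.array([[ 0, 50,  0,  0,  0],
                    [10,  0, 60, 30,  0],
                    [15, 65,  0,  0, 70]])

def is_feasible(X):
    """Check the capacity and requirement constraints and nonnegativity."""
    return (np.allclose(X.sum(axis=1), p_demo)
            and np.allclose(X.sum(axis=0), q_demo)
            and bool(np.all(X >= 0)))

def total_cost(X):
    return np.sum(C_demo * X)

X_mix = 0.5 * X_qe + 0.5 * X_scipy   # an average of the two plans

for name, X in [("QuantEcon", X_qe), ("SciPy", X_scipy), ("average", X_mix)]:
    print(name, is_feasible(X), total_cost(X))
```

Since both plans are feasible and attain the same cost, any weighted average of them is also feasible and attains that cost, which illustrates that this problem has many optimal plans.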

Let’s do a speed comparison between scipy.optimize.linprog and quantecon.optimize.linprog_simplex.

# scipy.optimize.linprog
%time res = linprog(C_vec, A_eq=A[:-1, :], b_eq=b[:-1])

CPU times: user 2.74 ms, sys: 203 µs, total: 2.95 ms
Wall time: 2.04 ms

# quantecon.optimize.linprog_simplex
%time out = linprog_simplex(-c, A_eq=A_eq, b_eq=b_eq)

CPU times: user 114 µs, sys: 0 ns, total: 114 µs


Wall time: 121 µs

As you can see, quantecon.optimize.linprog_simplex is much faster.


(Note however, that the SciPy version is probably more stable than the QuantEcon version, having been tested more
extensively over a longer period of time.)

17.4 The Dual Problem

Let 𝑢, 𝑣 denote vectors of dual decision variables with entries (𝑢𝑖 ), (𝑣𝑗 ).
The dual to minimization problem (17.1) is the maximization problem:
$$
\begin{aligned}
\max_{u_i, v_j} \ & \sum_{i=1}^m p_i u_i + \sum_{j=1}^n q_j v_j \\
\text{subject to } \ & u_i + v_j \le c_{ij}, \quad i = 1, 2, \dots, m; \ j = 1, 2, \dots, n
\end{aligned}
\tag{17.5}
$$

The dual problem is also a linear programming problem.


It has 𝑚 + 𝑛 dual variables and 𝑚𝑛 constraints.
Vectors 𝑢 and 𝑣 of values are attached to the first and the second sets of primal constraints, respectively.
Thus, 𝑢 is attached to the constraints
• (1′𝑛 ⊗ I𝑚 ) vec(𝑋) = 𝑝


and 𝑣 is attached to constraints


• (I𝑛 ⊗ 1′𝑚 ) vec(𝑋) = 𝑞.
Components of the vectors 𝑢 and 𝑣 of per unit values are shadow prices of the quantities appearing on the right sides of
those constraints.
We can write the dual problem as

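$$
\begin{aligned}
\max_{u, v} \ & p' u + q' v \\
\text{subject to } \ & A' \begin{pmatrix} u \\ v \end{pmatrix} \le \operatorname{vec}(C)
\end{aligned}
\tag{17.6}
$$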

For the same numerical example described above, let’s solve the dual problem.

# Solve the dual problem


res_dual = linprog(-b, A_ub=A.T, b_ub=C_vec,
                   bounds=[(None, None)]*(m+n))

#Print results
print("message:", res_dual.message)
print("nit:", res_dual.nit)
print("fun:", res_dual.fun)
print("u:", res_dual.x[:m])
print("v:", res_dual.x[-n:])

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


nit: 9
fun: -7225.0
u: [-20. -10. 0.]
v: [30. 35. 25. 40. 25.]

We can also solve the dual problem using quantecon.optimize.linprog_simplex.

res_dual_qe = linprog_simplex(b_eq, A_ub=A_eq.T, b_ub=c)

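The shadow prices computed by the two programs are not identical, but they differ only by a constant: adding 25 to each 𝑢𝑖 and subtracting 25 from each 𝑣𝑗 in the SciPy solution gives the QuantEcon solution shown below.
Because total capacity equals total requirements, such a shift leaves the dual objective unchanged, and it also leaves every sum 𝑢𝑖 + 𝑣𝑗 unchanged.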

res_dual_qe.x

array([ 5., 15., 25., 5., 10., 0., 15., 0.])

res_dual.x

array([-20., -10., 0., 30., 35., 25., 40., 25.])
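The two dual solutions printed above can be compared by hand. The sketch below hard-codes them together with the data of the example; the helper `dual_report` and all variable names are illustrative.

```python
import numpy as np

# Data of the example (copied from the text)
p_demo = np.array([50, 100, 150])
q_demo = np.array([25, 115, 60, 30, 70])
C_demo = np.array([[10, 15, 20, 20, 40],
                   [20, 40, 15, 30, 30],
                   [30, 35, 40, 55, 25]])

# Dual solutions printed above, ordered as (u_1, u_2, u_3, v_1, ..., v_5)
x_qe = np.array([5., 15., 25., 5., 10., 0., 15., 0.])
x_scipy = np.array([-20., -10., 0., 30., 35., 25., 40., 25.])

def dual_report(x, m=3):
    """Return the dual objective value and whether u_i + v_j <= c_ij holds."""
    u, v = x[:m], x[m:]
    value = p_demo @ u + q_demo @ v
    feasible = bool(np.all(u[:, None] + v[None, :] <= C_demo))
    return value, feasible

value_qe, feasible_qe = dual_report(x_qe)
value_scipy, feasible_scipy = dual_report(x_scipy)
shift = x_qe - x_scipy

print(value_qe, feasible_qe)
print(value_scipy, feasible_scipy)
print(shift)   # +25 on every u_i, -25 on every v_j
```

Both vectors are dual feasible and attain the value 7225, which matches the minimized primal cost.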

We can compare computational times from using our two tools.

%time linprog(-b, A_ub=A.T, b_ub=C_vec, bounds=[(None, None)]*(m+n))

CPU times: user 3.23 ms, sys: 0 ns, total: 3.23 ms


Wall time: 2.35 ms


message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True
status: 0
fun: -7225.0
x: [-2.000e+01 -1.000e+01 0.000e+00 3.000e+01 3.500e+01
2.500e+01 4.000e+01 2.500e+01]
nit: 9
lower: residual: [ inf inf inf inf
inf inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00]
upper: residual: [ inf inf inf inf
inf inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00 0.000e+00]
eqlin: residual: []
marginals: []
ineqlin: residual: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
marginals: [-0.000e+00 -1.000e+01 ... -0.000e+00
-7.000e+01]
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

%time linprog_simplex(b_eq, A_ub=A_eq.T, b_ub=c)

CPU times: user 379 µs, sys: 25 µs, total: 404 µs


Wall time: 412 µs

SimplexResult(x=array([ 5., 15., 25., 5., 10., 0., 15., 0.]), lambd=array([ 0., 35., 0., 15., 0., 25., 0., 60., 15., 0., 0., 80., 0.,
0., 70.]), fun=7225.0, success=True, status=0, num_iter=24)

In this run, quantecon.optimize.linprog_simplex solves the dual problem roughly six times faster (412 µs versus 2.35 ms of wall time).


Just for completeness, let’s solve the dual problems with full-rank 𝐴 matrices that we create by dropping a redundant
equality constraint.
Try first leaving out the first constraint:

linprog(-b[1:], A_ub=A[1:].T, b_ub=C_vec,
        bounds=[(None, None)]*(m+n-1))

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)


success: True
status: 0
fun: -7225.0
x: [ 1.000e+01 2.000e+01 1.000e+01 1.500e+01 5.000e+00
2.000e+01 5.000e+00]
nit: 12
lower: residual: [ inf inf inf inf
inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00


0.000e+00 0.000e+00 0.000e+00]
upper: residual: [ inf inf inf inf
inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
eqlin: residual: []
marginals: []
ineqlin: residual: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
marginals: [-1.500e+01 -1.000e+01 ... -0.000e+00
-7.000e+01]
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0

Now let’s instead leave out the last constraint:

linprog(-b[:-1], A_ub=A[:-1].T, b_ub=C_vec,
        bounds=[(None, None)]*(m+n-1))

message: Optimization terminated successfully. (HiGHS Status 7: Optimal)
success: True
status: 0
fun: -7225.0
x: [ 5.000e+00 1.500e+01 2.500e+01 5.000e+00 1.000e+01
-0.000e+00 1.500e+01]
nit: 9
lower: residual: [ inf inf inf inf
inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
upper: residual: [ inf inf inf inf
inf inf inf]
marginals: [ 0.000e+00 0.000e+00 0.000e+00 0.000e+00
0.000e+00 0.000e+00 0.000e+00]
eqlin: residual: []
marginals: []
ineqlin: residual: [ 0.000e+00 0.000e+00 ... 1.500e+01
0.000e+00]
marginals: [-0.000e+00 -1.000e+01 ... -0.000e+00
-7.000e+01]
mip_node_count: 0
mip_dual_bound: 0.0
mip_gap: 0.0


17.4.1 Interpretation of dual problem

By strong duality (please see this lecture Linear Programming), we know that:
$$
\sum_{i=1}^m \sum_{j=1}^n c_{ij} x_{ij} = \sum_{i=1}^m p_i u_i + \sum_{j=1}^n q_j v_j
$$

One unit more capacity in factory 𝑖, i.e. 𝑝𝑖 , results in 𝑢𝑖 more transportation costs.
Thus, 𝑢𝑖 describes the cost of shipping one unit from factory 𝑖.
Call this the ship-out cost of one unit shipped from factory 𝑖.
Similarly, 𝑣𝑗 is the cost of shipping one unit to location 𝑗.
Call this the ship-in cost of one unit to location 𝑗.
Strong duality implies that total transportation cost equals total ship-out costs plus total ship-in costs.
It is reasonable that, for one unit of a product, ship-out cost 𝑢𝑖 plus ship-in cost 𝑣𝑗 should equal transportation cost 𝑐𝑖𝑗 .
This equality is assured by complementary slackness conditions that state that whenever 𝑥𝑖𝑗 > 0, meaning that there
are positive shipments from factory 𝑖 to location 𝑗, it must be true that 𝑢𝑖 + 𝑣𝑗 = 𝑐𝑖𝑗 .
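Complementary slackness can be checked on the numbers reported earlier in this lecture. The sketch below hard-codes the plan and the dual variables returned by scipy.optimize.linprog, using illustrative variable names, and compares 𝑢𝑖 + 𝑣𝑗 with 𝑐𝑖𝑗 on every route that carries a positive shipment.

```python
import numpy as np

# Data, primal plan and dual variables as reported earlier in the text
p_demo = np.array([50, 100, 150])
q_demo = np.array([25, 115, 60, 30, 70])
C_demo = np.array([[10, 15, 20, 20, 40],
                   [20, 40, 15, 30, 30],
                   [30, 35, 40, 55, 25]])
X_demo = np.array([[ 0, 50,  0,  0,  0],
                   [10,  0, 60, 30,  0],
                   [15, 65,  0,  0, 70]])
u_demo = np.array([-20., -10., 0.])
v_demo = np.array([30., 35., 25., 40., 25.])

# Slack in each dual constraint: c_ij - (u_i + v_j) >= 0
slack = C_demo - (u_demo[:, None] + v_demo[None, :])
used = X_demo > 0                        # routes with positive shipments

primal_cost = np.sum(C_demo * X_demo)
dual_value = p_demo @ u_demo + q_demo @ v_demo

print(slack[used])      # zero on every route that is used
print(primal_cost, dual_value)
```

On unused routes the slack may be strictly positive, meaning that shipping along them would cost more than the sum of the ship-out and ship-in costs.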

17.5 The Python Optimal Transport Package

There is an excellent Python package for optimal transport that simplifies some of the steps we took above.
In particular, the package takes care of the vectorization steps before passing the data out to a linear programming routine.
(That said, the discussion provided above on vectorization remains important, since we want to understand what happens
under the hood.)

17.5.1 Replicating Previous Results

The following line of code solves the example application discussed above using linear programming.

X = ot.emd(p, q, C)
X

/tmp/ipykernel_9255/1617639716.py:1: UserWarning: Input histogram consists of integer. The transport plan will be casted accordingly, possibly resulting in a loss of precision. If this behaviour is unwanted, please make sure your input histogram consists of floating point elements.
  X = ot.emd(p, q, C)

array([[15, 35,  0,  0,  0],
       [10,  0, 60, 30,  0],
       [ 0, 80,  0,  0, 70]])

Sure enough, we have the same solution as the one found by linprog_simplex, and the same cost

total_cost = np.sum(X * C)
total_cost


7225

17.5.2 A Larger Application

Now let’s try using the same package on a slightly larger application.
The application has the same interpretation as above but we will also give each node (i.e., vertex) a location in the plane.
This will allow us to plot the resulting transport plan as edges in a graph.
The following class defines a node by
• its location (𝑥, 𝑦) ∈ ℝ2 ,
• its group (factory or location, denoted by p or q) and
• its mass (e.g., 𝑝𝑖 or 𝑞𝑗 ).

class Node:

    def __init__(self, x, y, mass, group, name):

        self.x, self.y = x, y
        self.mass, self.group = mass, group
        self.name = name

Next we write a function that repeatedly calls the class above to build instances.
It allocates to the nodes it creates their location, mass, and group.
Locations are assigned randomly.

def build_nodes_of_one_type(group='p', n=100, seed=123):

    nodes = []
    np.random.seed(seed)

    for i in range(n):

        if group == 'p':
            m = 1/n
            x = np.random.uniform(-2, 2)
            y = np.random.uniform(-2, 2)
        else:
            m = betabinom.pmf(i, n-1, 2, 2)
            x = 0.6 * np.random.uniform(-1.5, 1.5)
            y = 0.6 * np.random.uniform(-1.5, 1.5)

        name = group + str(i)
        nodes.append(Node(x, y, m, group, name))

    return nodes

Now we build two lists of nodes, each one containing one type (factories or locations)

n_p = 32
n_q = 32


p_list = build_nodes_of_one_type(group='p', n=n_p)
q_list = build_nodes_of_one_type(group='q', n=n_q)

p_probs = [p.mass for p in p_list]


q_probs = [q.mass for q in q_list]

For the cost matrix 𝐶, we use the Euclidean distance between each factory and location.

c = np.empty((n_p, n_q))
for i in range(n_p):
    for j in range(n_q):
        x0, y0 = p_list[i].x, p_list[i].y
        x1, y1 = q_list[j].x, q_list[j].y
        c[i, j] = np.sqrt((x0-x1)**2 + (y0-y1)**2)
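The double loop is fine at this scale, but the same distance matrix can be built with NumPy broadcasting. The sketch below is self-contained and uses made-up coordinates rather than the nodes of the lecture; the arrays `P` and `Q` are hypothetical stand-ins for the factory and location coordinates.

```python
import numpy as np

rng = np.random.default_rng(123)
P = rng.uniform(-2, 2, size=(4, 2))   # 4 illustrative factory coordinates
Q = rng.uniform(-1, 1, size=(6, 2))   # 6 illustrative location coordinates

# Pairwise differences via broadcasting: shape (4, 6, 2)
diff = P[:, None, :] - Q[None, :, :]
c_vectorized = np.sqrt((diff**2).sum(axis=2))

# The same matrix computed with explicit loops, for comparison
c_loop = np.empty((4, 6))
for i in range(4):
    for j in range(6):
        c_loop[i, j] = np.sqrt((P[i, 0] - Q[j, 0])**2 + (P[i, 1] - Q[j, 1])**2)

print(np.allclose(c_vectorized, c_loop))
```

The vectorized form avoids Python-level loops, which matters more as the number of nodes grows.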

Now we are ready to apply the solver

%time pi = ot.emd(p_probs, q_probs, c)

CPU times: user 699 µs, sys: 46 µs, total: 745 µs


Wall time: 471 µs

Finally, let’s plot the results using networkx.


In the plot below,
• node size is proportional to probability mass
• an edge (arrow) from 𝑖 to 𝑗 is drawn when a positive transfer is made from 𝑖 to 𝑗 under the optimal transport plan.

g = nx.DiGraph()
g.add_nodes_from([p.name for p in p_list])
g.add_nodes_from([q.name for q in q_list])

for i in range(n_p):
    for j in range(n_q):
        if pi[i, j] > 0:
            g.add_edge(p_list[i].name, q_list[j].name, weight=pi[i, j])

node_pos_dict={}
for p in p_list:
    node_pos_dict[p.name] = (p.x, p.y)

for q in q_list:
    node_pos_dict[q.name] = (q.x, q.y)

node_color_list = []
node_size_list = []
scale = 8_000
for p in p_list:
    node_color_list.append('blue')
    node_size_list.append(p.mass * scale)
for q in q_list:
    node_color_list.append('red')
    node_size_list.append(q.mass * scale)

fig, ax = plt.subplots(figsize=(7, 10))


plt.axis('off')

nx.draw_networkx_nodes(g,
                       node_pos_dict,
                       node_color=node_color_list,
                       node_size=node_size_list,
                       edgecolors='grey',
                       linewidths=1,
                       alpha=0.5,
                       ax=ax)

nx.draw_networkx_edges(g,
                       node_pos_dict,
                       arrows=True,
                       connectionstyle='arc3,rad=0.1',
                       alpha=0.6)
plt.show()



CHAPTER EIGHTEEN

VON NEUMANN GROWTH MODEL (AND A GENERALIZATION)

Contents

• Von Neumann Growth Model (and a Generalization)


– Notation
– Model Ingredients and Assumptions
– Dynamic Interpretation
– Duality
– Interpretation as Two-player Zero-sum Game

This lecture uses the class Neumann to calculate key objects of a linear growth model of John von Neumann [von
Neumann, 1937] that was generalized by Kemeny, Morgenstern and Thompson [Kemeny et al., 1956].
Objects of interest are the maximal expansion rate (𝛼), the interest factor (𝛽), the optimal intensities (𝑥), and prices (𝑝).
In addition to watching how the towering mind of John von Neumann formulated an equilibrium model of price and
quantity vectors in balanced growth, this lecture shows how fruitfully to employ the following important tools:
• a zero-sum two-player game
• linear programming
• the Perron-Frobenius theorem
We’ll begin with some imports:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import fsolve, linprog
from textwrap import dedent

np.set_printoptions(precision=2)

The code below provides the Neumann class

class Neumann(object):

    """
    This class describes the Generalized von Neumann growth model as it was
    discussed in Kemeny et al. (1956, ECTA) and Gale (1960, Chapter 9.5):

    Let:
    n ... number of goods
    m ... number of activities
    A ... input matrix is m-by-n
        a_{i,j} - amount of good j consumed by activity i
    B ... output matrix is m-by-n
        b_{i,j} - amount of good j produced by activity i

    x ... intensity vector (m-vector) with non-negative entries
        x'B - the vector of goods produced
        x'A - the vector of goods consumed
    p ... price vector (n-vector) with non-negative entries
        Bp - the revenue vector for every activity
        Ap - the cost of each activity

    Both A and B have non-negative entries. Moreover, we assume that
    (1) Assumption I (every good which is consumed is also produced):
        for all j, b_{.,j} > 0, i.e. at least one entry is strictly positive
    (2) Assumption II (no free lunch):
        for all i, a_{i,.} > 0, i.e. at least one entry is strictly positive

    Parameters
    ----------
    A : array_like or scalar(float)
        Input matrix. It should be `m x n`
    B : array_like or scalar(float)
        Output matrix. It should be `m x n`
    """

    def __init__(self, A, B):

        self.A, self.B = list(map(self.convert, (A, B)))
        self.m, self.n = self.A.shape

        # Check if (A, B) satisfy the basic assumptions
        assert self.A.shape == self.B.shape, \
            'The input and output matrices must have the same dimensions!'
        assert (self.A >= 0).all() and (self.B >= 0).all(), \
            'The input and output matrices must have only non-negative entries!'

        # (1) Check whether Assumption I is satisfied
        #     (use the converted arrays so that scalar inputs also work)
        if (np.sum(self.B, 0) <= 0).any():
            self.AI = False
        else:
            self.AI = True

        # (2) Check whether Assumption II is satisfied
        if (np.sum(self.A, 1) <= 0).any():
            self.AII = False
        else:
            self.AII = True

    def __repr__(self):
        return self.__str__()

    def __str__(self):

        me = """
        Generalized von Neumann expanding model:
          - number of goods          : {n}
          - number of activities     : {m}

        Assumptions:
          - AI:  every column of B has a positive entry    : {AI}
          - AII: every row of A has a positive entry       : {AII}

        """
        # Irreducible                                       : {irr}
        return dedent(me.format(n=self.n, m=self.m,
                                AI=self.AI, AII=self.AII))

    def convert(self, x):
        """
        Convert array_like objects (lists of lists, floats, etc.) into
        well-formed 2D NumPy arrays
        """
        return np.atleast_2d(np.asarray(x))

    def bounds(self):
        """
        Calculate the trivial upper and lower bounds for alpha (expansion rate)
        and beta (interest factor). See the proof of Theorem 9.8 in Gale (1960)
        """

        n, m = self.n, self.m
        A, B = self.A, self.B

        f = lambda α: ((B - α * A) @ np.ones((n, 1))).max()
        g = lambda β: (np.ones((1, m)) @ (B - β * A)).min()

        UB = fsolve(f, 1).item()  # Upper bound for α, β
        LB = fsolve(g, 2).item()  # Lower bound for α, β

        return LB, UB

    def zerosum(self, γ, dual=False):
        """
        Given gamma, calculate the value and optimal strategies of a
        two-player zero-sum game given by the matrix

                M(gamma) = B - gamma * A

        Row player maximizing, column player minimizing

        Zero-sum game as an LP (primal --> α)

            max (0', 1) @ (x', v)
            subject to
            [-M', ones(n, 1)] @ (x', v)' <= 0
            (x', v) @ (ones(m, 1), 0) = 1
            (x', v) >= (0', -inf)

        Zero-sum game as an LP (dual --> beta)

            min (0', 1) @ (p', u)
            subject to
            [M, -ones(m, 1)] @ (p', u)' <= 0
            (p', u) @ (ones(n, 1), 0) = 1
            (p', u) >= (0', -inf)

        Outputs:
        --------
        value: scalar
            value of the zero-sum game

        strategy: vector
            if dual = False, it is the intensity vector,
            if dual = True, it is the price vector
        """

        A, B, n, m = self.A, self.B, self.n, self.m
        M = B - γ * A

        if not dual:
            # Solve the primal LP (for details see the description)
            # (1) Define the problem for v as a maximization (linprog minimizes)
            c = np.hstack([np.zeros(m), -1])

            # (2) Add constraints :
            # ... non-negativity constraints
            bounds = tuple(m * [(0, None)] + [(None, None)])
            # ... inequality constraints
            A_iq = np.hstack([-M.T, np.ones((n, 1))])
            b_iq = np.zeros((n, 1))
            # ... normalization
            A_eq = np.hstack([np.ones(m), 0]).reshape(1, m + 1)
            b_eq = 1

            res = linprog(c, A_ub=A_iq, b_ub=b_iq, A_eq=A_eq, b_eq=b_eq,
                          bounds=bounds)

        else:
            # Solve the dual LP (for details see the description)
            # (1) Define the problem for u as a minimization
            c = np.hstack([np.zeros(n), 1])

            # (2) Add constraints :
            # ... non-negativity constraints
            bounds = tuple(n * [(0, None)] + [(None, None)])
            # ... inequality constraints
            A_iq = np.hstack([M, -np.ones((m, 1))])
            b_iq = np.zeros((m, 1))
            # ... normalization
            A_eq = np.hstack([np.ones(n), 0]).reshape(1, n + 1)
            b_eq = 1

            res = linprog(c, A_ub=A_iq, b_ub=b_iq, A_eq=A_eq, b_eq=b_eq,
                          bounds=bounds)

        if res.status != 0:
            print(res.message)

        # Pull out the required quantities
        value = res.x[-1]
        strategy = res.x[:-1]

        return value, strategy

    def expansion(self, tol=1e-8, maxit=1000):
        """
        The algorithm used here is described in Hamburger-Thompson-Weil
        (1967, ECTA). It is based on a simple bisection argument and utilizes
        the idea that for a given γ (= α or β), the matrix "M = B - γ * A"
        defines a two-player zero-sum game, where the optimal strategies are
        the (normalized) intensity and price vector.

        Outputs:
        --------
        alpha: scalar
            optimal expansion rate
        """

        LB, UB = self.bounds()

        for iter in range(maxit):

            γ = (LB + UB) / 2
            ZS = self.zerosum(γ=γ)
            V = ZS[0]     # value of the game with γ

            if V >= 0:
                LB = γ
            else:
                UB = γ

            if abs(UB - LB) < tol:
                γ = (UB + LB) / 2
                x = self.zerosum(γ=γ)[1]
                p = self.zerosum(γ=γ, dual=True)[1]
                break

        return γ, x, p

    def interest(self, tol=1e-8, maxit=1000):
        """
        The algorithm used here is described in Hamburger-Thompson-Weil
        (1967, ECTA). It is based on a simple bisection argument and utilizes
        the idea that for a given gamma (= alpha or beta),
        the matrix "M = B - γ * A" defines a two-player zero-sum game,
        where the optimal strategies are the (normalized) intensity and price
        vector

        Outputs:
        --------
        beta: scalar
            optimal interest rate
        """

        LB, UB = self.bounds()

        for iter in range(maxit):

            γ = (LB + UB) / 2
            ZS = self.zerosum(γ=γ, dual=True)
            V = ZS[0]

            if V > 0:
                LB = γ
            else:
                UB = γ

            if abs(UB - LB) < tol:
                γ = (UB + LB) / 2
                p = self.zerosum(γ=γ, dual=True)[1]
                x = self.zerosum(γ=γ)[1]
                break

        return γ, x, p

18.1 Notation

We use the following notation.


0 denotes a vector of zeros.
We call an 𝑛-vector positive and write 𝑥 ≫ 0 if 𝑥𝑖 > 0 for all 𝑖 = 1, 2, … , 𝑛.
We call a vector non-negative and write 𝑥 ≥ 0 if 𝑥𝑖 ≥ 0 for all 𝑖 = 1, 2, … , 𝑛.
We call a vector semi-positive and write 𝑥 > 0 if 𝑥 ≥ 0 and 𝑥 ≠ 0.
For two conformable vectors 𝑥 and 𝑦, 𝑥 ≫ 𝑦, 𝑥 ≥ 𝑦 and 𝑥 > 𝑦 mean 𝑥 − 𝑦 ≫ 0, 𝑥 − 𝑦 ≥ 0, and 𝑥 − 𝑦 > 0, respectively.
We let all vectors in this lecture be column vectors; 𝑥𝑇 denotes the transpose of 𝑥 (i.e., a row vector).
Let 𝜄𝑛 denote a column vector composed of 𝑛 ones, i.e. 𝜄𝑛 = (1, 1, … , 1)𝑇 .
Let 𝑒𝑖 denote a vector (of arbitrary size) containing zeros except for the 𝑖 th position where it is one.
We denote matrices by capital letters. For an arbitrary matrix 𝐴, 𝑎𝑖,𝑗 represents the entry in its 𝑖 th row and 𝑗 th column.
𝑎⋅𝑗 and 𝑎𝑖⋅ denote the 𝑗 th column and 𝑖 th row of 𝐴, respectively.
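The three vector orderings above are easy to confuse, so here is a minimal NumPy sketch of them. The helper names `is_positive`, `is_nonnegative` and `is_semipositive` are hypothetical and are not part of the lecture's code.

```python
import numpy as np


def is_positive(x):
    """x >> 0: every entry is strictly positive."""
    return bool(np.all(np.asarray(x) > 0))


def is_nonnegative(x):
    """x >= 0: every entry is non-negative."""
    return bool(np.all(np.asarray(x) >= 0))


def is_semipositive(x):
    """x > 0: non-negative and not equal to the zero vector."""
    x = np.asarray(x)
    return bool(np.all(x >= 0) and np.any(x != 0))


# Illustrative vectors (hypothetical values)
examples = {
    "strictly positive": np.array([1.0, 2.0]),
    "semi-positive only": np.array([0.0, 2.0]),
    "zero vector": np.zeros(2),
}
for label, v in examples.items():
    print(label, is_positive(v), is_semipositive(v), is_nonnegative(v))
```

Every positive vector is semi-positive and every semi-positive vector is non-negative, but neither implication can be reversed.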


18.2 Model Ingredients and Assumptions

A pair (𝐴, 𝐵) of 𝑚 × 𝑛 non-negative matrices defines an economy.


• 𝑚 is the number of activities (or sectors)
• 𝑛 is the number of goods (produced and/or consumed).
• 𝐴 is called the input matrix; 𝑎𝑖,𝑗 denotes the amount of good 𝑗 consumed by activity 𝑖
• 𝐵 is called the output matrix; 𝑏𝑖,𝑗 represents the amount of good 𝑗 produced by activity 𝑖
Two key assumptions restrict economy (𝐴, 𝐵):
• Assumption I: (every good that is consumed is also produced)

𝑏.,𝑗 > 0 ∀𝑗 = 1, 2, … , 𝑛

• Assumption II: (no free lunch)

𝑎𝑖,. > 0 ∀𝑖 = 1, 2, … , 𝑚

A semi-positive intensity 𝑚-vector 𝑥 denotes levels at which activities are operated.


Therefore,
• vector 𝑥𝑇 𝐴 gives the total amount of goods used in production
• vector 𝑥𝑇 𝐵 gives total outputs
An economy (𝐴, 𝐵) is said to be productive, if there exists a non-negative intensity vector 𝑥 ≥ 0 such that 𝑥𝑇 𝐵 > 𝑥𝑇 𝐴.
The semi-positive 𝑛-vector 𝑝 contains prices assigned to the 𝑛 goods.
The 𝑝 vector implies cost and revenue vectors
• the vector 𝐴𝑝 tells costs of the vector of activities
• the vector 𝐵𝑝 tells revenues from the vector of activities
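As a rough illustration of these four vectors, the sketch below evaluates them for a small technology (the same matrices appear as Example 1 later in this section). The intensity vector `x` and price vector `p` are hypothetical choices made only for illustration.

```python
import numpy as np

# Input and output matrices: 3 activities (rows), 4 goods (columns)
A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])

x = np.ones(3)   # hypothetical intensities: run every activity at level one
p = np.ones(4)   # hypothetical prices: every good costs one

goods_used = x @ A      # x'A, one entry per good
goods_made = x @ B      # x'B, one entry per good
costs = A @ p           # Ap, one entry per activity
revenues = B @ p        # Bp, one entry per activity

# x witnesses productivity if x'B > x'A in the semi-positive sense
diff = goods_made - goods_used
productive_at_x = bool(np.all(diff >= 0) and np.any(diff > 0))

print(goods_used, goods_made, costs, revenues, productive_at_x)
```

Finding a single such `x` is enough to conclude that the economy is productive, since the definition only asks for existence.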
Whether an input-output pair (𝐴, 𝐵) satisfies a property called irreducibility (or indecomposability) determines whether
an economy can be decomposed into multiple “sub-economies”.
Definition: For an economy (𝐴, 𝐵), the set of goods 𝑆 ⊂ {1, 2, … , 𝑛} is called an independent subset if it is possible
to produce every good in 𝑆 without consuming goods from outside 𝑆. Formally, the set 𝑆 is independent if ∃𝑇 ⊂
{1, 2, … , 𝑚} (a subset of activities) such that 𝑎𝑖,𝑗 = 0 ∀𝑖 ∈ 𝑇 and 𝑗 ∈ 𝑆 𝑐 and for all 𝑗 ∈ 𝑆, ∃𝑖 ∈ 𝑇 for which 𝑏𝑖,𝑗 > 0.
The economy is irreducible if there are no proper independent subsets.
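The definition can be checked by brute force for small economies. The sketch below is an assumption-laden illustration, not part of the `Neumann` class: the function names are hypothetical, goods are indexed from 0 rather than 1, and the search enumerates every non-empty proper subset of goods, so it is only practical for small 𝑛. For a given 𝑆 it uses the largest candidate 𝑇 (all activities that use no good outside 𝑆), which is without loss of generality because adding such activities can only help produce the goods in 𝑆.

```python
from itertools import combinations

import numpy as np


def is_independent(A, B, S):
    """Check whether the set of goods S (0-based indices) is independent."""
    A, B = np.asarray(A), np.asarray(B)
    m, n = A.shape
    S = sorted(S)
    outside = np.array([j for j in range(n) if j not in S], dtype=int)
    # Largest candidate T: activities that consume no good outside S
    T = np.array([i for i in range(m) if np.all(A[i, outside] == 0)],
                 dtype=int)
    # Every good in S must be produced by at least one activity in T
    return all(bool(np.any(B[T, j] > 0)) for j in S)


def is_irreducible(A, B):
    """True if no non-empty proper subset of goods is independent."""
    n = np.asarray(A).shape[1]
    for size in range(1, n):
        for S in combinations(range(n), size):
            if is_independent(A, B, S):
                return False
    return True


# The two examples studied in this lecture
A1 = np.array([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
B1 = np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 1, 0, 1]])
A2 = np.array([[0, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 1, 0, 0],
               [0, 0, 1, 0, 0, 1], [0, 0, 0, 0, 1, 0]])
B2 = np.array([[1, 0, 0, 1, 0, 0], [0, 1, 0, 0, 0, 0], [0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 2, 0], [0, 0, 0, 1, 0, 1]])

print(is_irreducible(A1, B1), is_irreducible(A2, B2))
```

In the second economy, the goods with 0-based indices {2, 3, 4, 5} can be produced by the last three activities without any input from the first two goods, which is what makes that pair reducible.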
We study two examples, both in Chapter 9.6 of Gale [Gale, 1989].

# (1) Irreducible (A, B) example: α_0 = β_0
A1 = np.array([[0, 1, 0, 0],
               [1, 0, 0, 1],
               [0, 0, 1, 0]])

B1 = np.array([[1, 0, 0, 0],
               [0, 0, 2, 0],
               [0, 1, 0, 1]])

# (2) Reducible (A, B) example: β_0 < α_0
A2 = np.array([[0, 1, 0, 0, 0, 0],
               [1, 0, 1, 0, 0, 0],
               [0, 0, 0, 1, 0, 0],
               [0, 0, 1, 0, 0, 1],
               [0, 0, 0, 0, 1, 0]])

B2 = np.array([[1, 0, 0, 1, 0, 0],
               [0, 1, 0, 0, 0, 0],
               [0, 0, 1, 0, 0, 0],
               [0, 0, 0, 0, 2, 0],
               [0, 0, 0, 1, 0, 1]])

The following code sets up our first Neumann economy or Neumann instance

n1 = Neumann(A1, B1)
n1

Generalized von Neumann expanding model:


- number of goods : 4
- number of activities : 3

Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True

Here is a second instance of a Neumann economy

n2 = Neumann(A2, B2)
n2

Generalized von Neumann expanding model:


- number of goods : 6
- number of activities : 5

Assumptions:
- AI: every column of B has a positive entry : True
- AII: every row of A has a positive entry : True

18.3 Dynamic Interpretation

Attach a time index 𝑡 to the preceding objects, regard an economy as a dynamic system, and study sequences

$$\{(A_t, B_t)\}_{t \geq 0}, \quad \{x_t\}_{t \geq 0}, \quad \{p_t\}_{t \geq 0}$$

An interesting special case holds the technology process constant and investigates the dynamics of quantities and prices
only.
Accordingly, in the rest of this lecture, we assume that (𝐴𝑡 , 𝐵𝑡 ) = (𝐴, 𝐵) for all 𝑡 ≥ 0.
A crucial element of the dynamic interpretation involves the timing of production.
We assume that production (consumption of inputs) takes place in period 𝑡, while the consequent output materializes in
period 𝑡 + 1, i.e., consumption of $x_t^T A$ in period 𝑡 results in $x_t^T B$ amounts of output in period 𝑡 + 1.


These timing conventions imply the following feasibility condition:

$$x_t^T B \geq x_{t+1}^T A \quad \forall t \geq 1$$

which asserts that no more goods can be used today than were produced yesterday.
Accordingly, 𝐴𝑝𝑡 tells the costs of production in period 𝑡 and 𝐵𝑝𝑡 tells revenues in period 𝑡 + 1.

18.3.1 Balanced Growth

We follow John von Neumann in studying “balanced growth”.


Let ./ denote an elementwise division of one vector by another and let 𝛼 > 0 be a scalar.
Then balanced growth is a situation in which

$$x_{t+1} ./ x_t = \alpha, \quad \forall t \geq 0$$

With balanced growth, the law of motion of 𝑥 is evidently $x_{t+1} = \alpha x_t$ and so we can rewrite the feasibility constraint as

$$x_t^T B \geq \alpha x_t^T A \quad \forall t$$
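To make the feasibility condition concrete, the following sketch builds a balanced-growth path and checks the period-by-period constraint. The helper names, the starting vector and the two trial growth factors are hypothetical; the matrices are those of the lecture's first example.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])


def balanced_path(x0, alpha, T):
    """Return x_0, ..., x_T (one per row) with x_{t+1} = alpha * x_t."""
    x0 = np.asarray(x0, dtype=float)
    return np.array([alpha**t * x0 for t in range(T + 1)])


def path_is_feasible(path, A, B, tol=1e-12):
    """Check x_t' B >= x_{t+1}' A for each consecutive pair on the path."""
    return all(bool(np.all(path[t] @ B - path[t + 1] @ A >= -tol))
               for t in range(len(path) - 1))


x0 = np.ones(3)   # hypothetical initial intensities
feasible_slow = path_is_feasible(balanced_path(x0, 1.0, 5), A, B)
feasible_fast = path_is_feasible(balanced_path(x0, 1.5, 5), A, B)
print(feasible_slow, feasible_fast)
```

With these intensities a growth factor of 1 is sustainable, while 1.5 would require more inputs in the next period than the current period produces.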

In the same spirit, define 𝛽 ∈ ℝ as the interest factor per unit of time.
We assume that it is always possible to earn a gross return equal to the constant interest factor 𝛽 by investing “outside the
model”.
Under this assumption about outside investment opportunities, a no-arbitrage condition gives rise to the following (no
profit) restriction on the price sequence:

$$\beta A p_t \geq B p_t \quad \forall t$$

This says that production cannot yield a return greater than that offered by the outside investment opportunity (here we
compare values in period 𝑡 + 1).
The balanced growth assumption allows us to drop time subscripts and conduct an analysis purely in terms of a time-
invariant growth rate 𝛼 and interest factor 𝛽.

18.4 Duality

Two problems are connected by a remarkable dual relationship between technological and valuation characteristics of the
economy:
Definition: The technological expansion problem (TEP) for the economy (𝐴, 𝐵) is to find a semi-positive 𝑚-vector 𝑥 > 0
and a number 𝛼 ∈ ℝ that satisfy

$$
\begin{aligned}
&\max_{\alpha} \ \alpha \\
&\text{s.t. } \ x^T B \geq \alpha x^T A
\end{aligned}
$$

Theorem 9.3 of David Gale’s book [Gale, 1989] asserts that if Assumptions I and II are both satisfied, then a maximum
value of 𝛼 exists and that it is positive.
The maximal value is called the technological expansion rate and is denoted by 𝛼0 . The associated intensity vector 𝑥0 is
the optimal intensity vector.


Definition: The economic expansion problem (EEP) for (𝐴, 𝐵) is to find a semi-positive 𝑛-vector 𝑝 > 0 and a number
𝛽 ∈ ℝ that satisfy
$$
\begin{aligned}
&\min_{\beta} \ \beta \\
&\text{s.t. } \ B p \leq \beta A p
\end{aligned}
$$
Assumptions I and II imply existence of a minimum value 𝛽0 > 0 called the economic expansion rate.
The corresponding price vector 𝑝0 is the optimal price vector.
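One way to build intuition for the two problems is to fix the vector and ask for the best scalar. The sketch below, with hypothetical function names, computes the largest 𝛼 compatible with a given 𝑥 and the smallest 𝛽 compatible with a given 𝑝; it assumes that the given vector makes at least one input use (or cost) strictly positive. Solving the TEP and EEP still requires searching over 𝑥 and 𝑝, which the lecture does later with linear programming.

```python
import numpy as np


def best_alpha_for(x, A, B):
    """Largest alpha with x'B >= alpha * x'A for a fixed intensity vector x."""
    used, made = x @ A, x @ B
    mask = used > 0                  # goods that are not used never bind
    return float(np.min(made[mask] / used[mask]))


def best_beta_for(p, A, B):
    """Smallest beta with Bp <= beta * Ap for a fixed price vector p."""
    cost, revenue = A @ p, B @ p
    mask = cost > 0
    if np.any(revenue[~mask] > 0):   # a costless activity earns revenue
        return np.inf
    return float(np.max(revenue[mask] / cost[mask]))


A = np.array([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 1, 0, 1]])

alpha_at_ones = best_alpha_for(np.ones(3), A, B)
beta_at_ones = best_beta_for(np.ones(4), A, B)
print(alpha_at_ones, beta_at_ones)
```

By construction, the first number can never exceed the technological expansion rate and the second can never fall below the economic expansion rate, so any trial pair brackets the optimal values from the outside.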
Because the criterion functions in the technological expansion problem and the economic expansion problem are both
linearly homogeneous, the optimal vectors 𝑥0 and 𝑝0 are determined only up to a positive scale factor.
For convenience (and to emphasize a close connection to zero-sum games), we normalize both vectors 𝑥0 and 𝑝0 so that
their entries sum to one.
A standard duality argument (see Lemma 9.4 in Gale [Gale, 1989]) implies that under Assumptions I and II,
𝛽0 ≤ 𝛼0.
But to deduce that 𝛽0 ≥ 𝛼0 , Assumptions I and II are not sufficient.
Therefore, von Neumann [von Neumann, 1937] went on to prove the following remarkable “duality” result that connects
TEP and EEP.
Theorem I (von Neumann): If the economy (𝐴, 𝐵) satisfies Assumptions I and II, then there exist (𝛾∗, 𝑥0, 𝑝0), where
𝛾∗ ∈ [𝛽0, 𝛼0] ⊂ ℝ, 𝑥0 > 0 is an 𝑚-vector, 𝑝0 > 0 is an 𝑛-vector, and the following conditions hold:

$$
\begin{aligned}
x_0^T B &\geq \gamma^* x_0^T A \\
B p_0 &\leq \gamma^* A p_0 \\
x_0^T (B - \gamma^* A) p_0 &= 0
\end{aligned}
$$

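Note: Proof (Sketch): Assumptions I and II imply that there exist (𝛼0, 𝑥0) and (𝛽0, 𝑝0) that solve the TEP and EEP,
respectively. If 𝛾∗ > 𝛼0, then by definition of 𝛼0, there cannot exist a semi-positive 𝑥 that satisfies $x^T B \geq \gamma^* x^T A$.
Similarly, if 𝛾∗ < 𝛽0, there is no semi-positive 𝑝 for which $B p \leq \gamma^* A p$. Let 𝛾∗ ∈ [𝛽0, 𝛼0]; then
$x_0^T B \geq \alpha_0 x_0^T A \geq \gamma^* x_0^T A$. Moreover, $B p_0 \leq \beta_0 A p_0 \leq \gamma^* A p_0$. Post-multiplying the first
inequality by 𝑝0 and pre-multiplying the second by $x_0^T$ gives $x_0^T B p_0 \geq \gamma^* x_0^T A p_0$ and
$x_0^T B p_0 \leq \gamma^* x_0^T A p_0$, so these two inequalities imply $x_0^T (B - \gamma^* A) p_0 = 0$.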

Here the constant 𝛾 ∗ is both an expansion factor and an interest factor (not necessarily optimal).
We have already encountered and discussed the first two inequalities that represent feasibility and no-profit conditions.
Moreover, the equality $x_0^T (B - \gamma^* A) p_0 = 0$ concisely expresses the requirements that if any good grows at a rate larger
than 𝛾∗ (i.e., if it is oversupplied), then its price must be zero; and that if any activity provides negative profit, it must be
unused.
Therefore, the conditions stated in Theorem I encode all equilibrium conditions.
So Theorem I essentially states that under Assumptions I and II there always exists an equilibrium (𝛾 ∗ , 𝑥0 , 𝑝0 ) with
balanced growth.
Note that Theorem I is silent about uniqueness of the equilibrium. In fact, it does not rule out (trivial) cases with
$x_0^T B p_0 = 0$ so that nothing of value is produced.
To exclude such uninteresting cases, Kemeny, Morgenstern and Thompson [Kemeny et al., 1956] add an extra requirement

$$x_0^T B p_0 > 0$$

and call the associated equilibria economic solutions.


They show that this extra condition does not affect the existence result, while it significantly reduces the number of
(relevant) solutions.
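The conditions of Theorem I, together with the extra requirement for an economic solution, can be verified numerically for the first example of this lecture. The closed-form candidate used below, 𝛾∗ = 2^(1/3) with 𝑥0 proportional to (𝛾∗, 1, 𝛾∗²) and 𝑝0 proportional to (𝛾∗², 𝛾∗, 1, 0), is a hand derivation for this particular pair (𝐴, 𝐵) and should be read as an assumption of the sketch rather than as output of the lecture's algorithm.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])

g = 2 ** (1 / 3)                       # candidate gamma*

x0 = np.array([g, 1.0, g**2])          # candidate intensities
x0 = x0 / x0.sum()                     # normalize to sum to one
p0 = np.array([g**2, g, 1.0, 0.0])     # candidate prices
p0 = p0 / p0.sum()

tol = 1e-12
feasible = bool(np.all(x0 @ B - g * (x0 @ A) >= -tol))     # x0'B >= g x0'A
no_profit = bool(np.all(B @ p0 - g * (A @ p0) <= tol))     # B p0 <= g A p0
comp_slack = bool(abs(x0 @ (B - g * A) @ p0) < tol)        # equality holds
economic = bool(x0 @ B @ p0 > 0)                           # value is produced

print(feasible, no_profit, comp_slack, economic)
```

The fourth good is oversupplied under these intensities, which is why its price is zero in the candidate price vector.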


18.5 Interpretation as Two-player Zero-sum Game

To compute the equilibrium (𝛾∗, 𝑥0, 𝑝0), we follow the algorithm proposed by Hamburger, Thompson and Weil (1967),
building on the key insight that an equilibrium (with balanced growth) solves a particular two-player zero-sum
game. First, we introduce some notation.
Consider the 𝑚 × 𝑛 matrix 𝐶 as a payoff matrix, with the entries representing payoffs from the minimizing column
player to the maximizing row player and assume that the players can use mixed strategies. Thus,
• the row player chooses the 𝑚-vector 𝑥 > 0 subject to $\iota_m^T x = 1$
• the column player chooses the 𝑛-vector 𝑝 > 0 subject to $\iota_n^T p = 1$.
Definition: The 𝑚 × 𝑛 matrix game 𝐶 has the solution (𝑥∗, 𝑝∗, 𝑉(𝐶)) in mixed strategies if

$$(x^*)^T C e^j \geq V(C) \ \ \forall j \in \{1, \dots, n\} \quad \text{and} \quad (e^i)^T C p^* \leq V(C) \ \ \forall i \in \{1, \dots, m\}$$

The number 𝑉 (𝐶) is called the value of the game.


From the above definition, it is clear that the value 𝑉 (𝐶) has two alternative interpretations:
• by playing the appropriate mixed strategy, the maximizing player can assure himself at least 𝑉(𝐶) (no matter what
the column player chooses)
• by playing the appropriate mixed strategy, the minimizing player can make sure that the maximizing player will not
get more than 𝑉(𝐶) (irrespective of the maximizing player’s choice)
A famous theorem of Nash (1951) tells us that there always exists a mixed strategy Nash equilibrium for any finite two-
player zero-sum game.
Moreover, von Neumann’s Minmax Theorem [von Neumann, 1928] implies that

$$V(C) = \max_x \min_p \ x^T C p = \min_p \max_x \ x^T C p = (x^*)^T C p^*$$

18.5.1 Connection with Linear Programming (LP)

Nash equilibria of a finite two-player zero-sum game solve a linear programming problem.
To see this, we introduce the following notation
• For a fixed 𝑥, let 𝑣 be the value of the minimization problem: $v \equiv \min_p x^T C p = \min_j x^T C e^j$
• For a fixed 𝑝, let 𝑢 be the value of the maximization problem: $u \equiv \max_x x^T C p = \max_i (e^i)^T C p$
Then the max-min problem (the game from the maximizing player’s point of view) can be written as the primal LP

$$
\begin{aligned}
V(C) = \ & \max \ v \\
\text{s.t. } \ & v \iota_n^T \leq x^T C \\
& x \geq 0 \\
& \iota_m^T x = 1
\end{aligned}
$$

while the min-max problem (the game from the minimizing player’s point of view) is the dual LP

$$
\begin{aligned}
V(C) = \ & \min \ u \\
\text{s.t. } \ & u \iota_m \geq C p \\
& p \geq 0 \\
& \iota_n^T p = 1
\end{aligned}
$$
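The two linear programs can be handed to `scipy.optimize.linprog` almost verbatim. The sketch below is a stand-alone illustration on a hypothetical 2 × 2 payoff matrix; the function name `solve_matrix_game` is not part of the lecture's code, although the constraint layout mirrors the `zerosum` method of the `Neumann` class.

```python
import numpy as np
from scipy.optimize import linprog


def solve_matrix_game(C):
    """Solve a zero-sum matrix game (row player maximizes) via primal and dual LPs."""
    C = np.asarray(C, dtype=float)
    m, n = C.shape

    # Primal: variables (x, v); maximize v, i.e. minimize -v
    primal = linprog(
        c=np.r_[np.zeros(m), -1.0],
        A_ub=np.hstack([-C.T, np.ones((n, 1))]),      # v - (x'C)_j <= 0
        b_ub=np.zeros(n),
        A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1),   # entries of x sum to one
        b_eq=[1.0],
        bounds=[(0, None)] * m + [(None, None)],      # x >= 0, v free
    )

    # Dual: variables (p, u); minimize u
    dual = linprog(
        c=np.r_[np.zeros(n), 1.0],
        A_ub=np.hstack([C, -np.ones((m, 1))]),        # (Cp)_i - u <= 0
        b_ub=np.zeros(m),
        A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1),   # entries of p sum to one
        b_eq=[1.0],
        bounds=[(0, None)] * n + [(None, None)],      # p >= 0, u free
    )

    return primal.x[:m], primal.x[-1], dual.x[:n], dual.x[-1]


C = np.array([[2.0, -1.0],
              [-1.0, 1.0]])     # hypothetical payoff matrix
x_star, v, p_star, u = solve_matrix_game(C)
print(x_star, v, p_star, u)
```

Because the game has no saddle point in pure strategies, both players mix, and the primal and dual optimal values coincide, as the minimax theorem requires.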


Hamburger, Thompson and Weil [Hamburger et al., 1967] view the input-output pair of the economy as payoff matrices
of two-player zero-sum games.
Using this interpretation, they restate Assumptions I and II as follows:

𝑉 (−𝐴) < 0 and 𝑉 (𝐵) > 0

Note: Proof (Sketch) of the equivalence between Assumption I and 𝑉(𝐵) > 0:

• ⇒ 𝑉(𝐵) > 0 implies $x_0^T B \gg \mathbf{0}^T$, where 𝑥0 is a maximizing vector. Since 𝐵 is non-negative, this requires that
each column of 𝐵 has at least one positive entry, which is Assumption I.
• ⇐ From Assumption I and the fact that 𝑝 > 0, it follows that 𝐵𝑝 > 0. This implies that the maximizing player
can always choose 𝑥 so that $x^T B p > 0$, so that it must be the case that 𝑉(𝐵) > 0.
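The signs of 𝑉(𝐵) and 𝑉(−𝐴) can be certified without solving any LP, because any mixed strategy of the row player gives a lower bound on the value and any mixed strategy of the column player gives an upper bound. The sketch below uses uniform mixing as the witness for the lecture's first example; the variable names are hypothetical.

```python
import numpy as np

A = np.array([[0, 1, 0, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])
B = np.array([[1, 0, 0, 0],
              [0, 0, 2, 0],
              [0, 1, 0, 1]])
m, n = A.shape

x_unif = np.full(m, 1 / m)      # uniform mixed strategy for the row player
p_unif = np.full(n, 1 / n)      # uniform mixed strategy for the column player

# V(B) >= min_j (x'B)_j for any mixed x
lower_bound_VB = np.min(x_unif @ B)
# V(-A) <= max_i (-A p)_i for any mixed p
upper_bound_VnegA = np.max(-A @ p_unif)

print(lower_bound_VB, upper_bound_VnegA)
```

Uniform mixing works as a witness precisely because every column of 𝐵 and every row of 𝐴 has a positive entry, which is what Assumptions I and II require.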

In order to (re)state Theorem I in terms of a particular two-player zero-sum game, we define a matrix for 𝛾 ∈ ℝ

𝑀 (𝛾) ≡ 𝐵 − 𝛾𝐴

For fixed 𝛾, treating 𝑀 (𝛾) as a matrix game, calculating the solution of the game implies
• If 𝛾 > 𝛼0 , then for all 𝑥 > 0, there ∃𝑗 ∈ {1, … , 𝑛}, s.t. [𝑥𝑇 𝑀 (𝛾)]𝑗 < 0 implying that 𝑉 (𝑀 (𝛾)) < 0.
• If 𝛾 < 𝛽0 , then for all 𝑝 > 0, there ∃𝑖 ∈ {1, … , 𝑚}, s.t. [𝑀 (𝛾)𝑝]𝑖 > 0 implying that 𝑉 (𝑀 (𝛾)) > 0.
• If 𝛾 ∈ {𝛽0, 𝛼0}, then (by Theorem I) the optimal intensity and price vectors 𝑥0 and 𝑝0 satisfy
$$x_0^T M(\gamma) \geq \mathbf{0}^T \quad \text{and} \quad M(\gamma) p_0 \leq \mathbf{0}$$
That is, (𝑥0, 𝑝0, 0) is a solution of the game 𝑀(𝛾) so that 𝑉(𝑀(𝛽0)) = 𝑉(𝑀(𝛼0)) = 0.
• If 𝛽0 < 𝛼0 and 𝛾 ∈ (𝛽0 , 𝛼0 ), then 𝑉 (𝑀 (𝛾)) = 0.
Moreover, if 𝑥′ is optimal for the maximizing player in 𝑀(𝛾′) for 𝛾′ ∈ (𝛽0, 𝛼0) and 𝑝″ is optimal for the minimizing
player in 𝑀(𝛾″) where 𝛾″ ∈ (𝛽0, 𝛾′), then (𝑥′, 𝑝″, 0) is a solution for 𝑀(𝛾) ∀𝛾 ∈ (𝛾″, 𝛾′).
Proof (Sketch): If 𝑥′ is optimal for a maximizing player in game 𝑀(𝛾′), then $(x')^T M(\gamma') \geq \mathbf{0}^T$ and so for all 𝛾 < 𝛾′,

$$(x')^T M(\gamma) = (x')^T M(\gamma') + (\gamma' - \gamma)(x')^T A \geq \mathbf{0}^T$$

hence 𝑉(𝑀(𝛾)) ≥ 0. If 𝑝″ is optimal for a minimizing player in game 𝑀(𝛾″), then $M(\gamma'') p'' \leq \mathbf{0}$ and so for all 𝛾″ < 𝛾,

$$M(\gamma) p'' = M(\gamma'') p'' + (\gamma'' - \gamma) A p'' \leq \mathbf{0}$$

hence 𝑉(𝑀(𝛾)) ≤ 0.
It is clear from the above argument that 𝛽0 , 𝛼0 are the minimal and maximal 𝛾 for which 𝑉 (𝑀 (𝛾)) = 0.
Furthermore, Hamburger et al. [Hamburger et al., 1967] show that the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)) is continuous and
nonincreasing in 𝛾.
This suggests an algorithm to compute (𝛼0 , 𝑥0 ) and (𝛽0 , 𝑝0 ) for a given input-output pair (𝐴, 𝐵).


18.5.2 Algorithm

Hamburger, Thompson and Weil [Hamburger et al., 1967] propose a simple bisection algorithm to find the minimal and
maximal roots (i.e. 𝛽0 and 𝛼0 ) of the function 𝛾 ↦ 𝑉 (𝑀 (𝛾)).

Step 1

First, notice that we can easily find trivial upper and lower bounds for 𝛼0 and 𝛽0 .
• TEP requires that $x^T (B - \alpha A) \geq \mathbf{0}^T$ and 𝑥 > 0, so if 𝛼 is so large that $\max_i \{[(B - \alpha A)\iota_n]_i\} < 0$, then TEP
ceases to have a solution.
Accordingly, let UB be the 𝛼∗ that solves $\max_i \{[(B - \alpha^* A)\iota_n]_i\} = 0$.
• Similar to the upper bound, if 𝛽 is so low that $\min_j \{[\iota_m^T (B - \beta A)]_j\} > 0$, then the EEP has no solution and so
we can define LB as the 𝛽∗ that solves $\min_j \{[\iota_m^T (B - \beta^* A)]_j\} = 0$.
The bounds method calculates these trivial bounds for us

n1.bounds()

(1.0, 2.0)
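The two defining equations for the trivial bounds can also be solved by hand, which gives a useful cross-check on the root finder. Since each term inside the max (or min) is linear and nonincreasing in the scalar, the upper bound is the largest ratio of row sums of 𝐵 to row sums of 𝐴, and the lower bound is the smallest ratio of column sums over goods that are actually used. This closed form is a derivation made for this sketch under Assumptions I and II, and the function name `ratio_bounds` is hypothetical.

```python
import numpy as np


def ratio_bounds(A, B):
    """Closed-form version of the trivial bounds (LB, UB)."""
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    row_A, row_B = A.sum(axis=1), B.sum(axis=1)   # (A iota_n), (B iota_n)
    col_A, col_B = A.sum(axis=0), B.sum(axis=0)   # (iota_m' A), (iota_m' B)

    # Assumption II guarantees that every row sum of A is positive
    UB = np.max(row_B / row_A)

    # Goods that are never used as inputs cannot drive the minimum to zero
    used = col_A > 0
    LB = np.min(col_B[used] / col_A[used])
    return float(LB), float(UB)


A1 = np.array([[0, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]])
B1 = np.array([[1, 0, 0, 0], [0, 0, 2, 0], [0, 1, 0, 1]])

print(ratio_bounds(A1, B1))
```

For the first example this agrees with the values returned by the `bounds` method above.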

Step 2

Compute 𝛼0 and 𝛽0
• Finding 𝛼0
1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with 𝑀(𝛾). We can
use either the primal or the dual LP problem.
2. If 𝑉 (𝑀 (𝛾)) ≥ 0, then set 𝐿𝐵 = 𝛾, otherwise let 𝑈 𝐵 = 𝛾.
3. Iterate on 1. and 2. until |𝑈 𝐵 − 𝐿𝐵| < 𝜖.
• Finding 𝛽0
1. Fix $\gamma = \frac{UB + LB}{2}$ and compute the solution of the two-player zero-sum game associated with 𝑀(𝛾). We can
use either the primal or the dual LP problem.
2. If 𝑉 (𝑀 (𝛾)) > 0, then set 𝐿𝐵 = 𝛾, otherwise let 𝑈 𝐵 = 𝛾.
3. Iterate on 1. and 2. until |𝑈 𝐵 − 𝐿𝐵| < 𝜖.
• Existence: Since 𝑉 (𝑀 (𝐿𝐵)) > 0 and 𝑉 (𝑀 (𝑈 𝐵)) < 0 and 𝑉 (𝑀 (⋅)) is a continuous, nonincreasing function,
there is at least one 𝛾 ∈ [𝐿𝐵, 𝑈 𝐵], s.t. 𝑉 (𝑀 (𝛾)) = 0.
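The only difference between the two searches is whether a zero value moves the lower or the upper end of the bracket. The sketch below isolates that logic on a toy function that stands in for 𝑉(𝑀(𝛾)): it is continuous, nonincreasing and equal to zero on the whole interval [1, 1.5]. Both the toy function and the helper `bisect_root` are hypothetical and do not involve any linear programming.

```python
def V(gamma):
    """Toy stand-in for V(M(gamma)): positive below 1, zero on [1, 1.5], negative above."""
    return max(1.0 - gamma, 0.0) + min(1.5 - gamma, 0.0)


def bisect_root(V, LB, UB, largest, tol=1e-10, maxit=1000):
    """Bisection for the largest (or smallest) root of a nonincreasing function."""
    for _ in range(maxit):
        if UB - LB < tol:
            break
        mid = (LB + UB) / 2
        value = V(mid)
        # Largest root: treat V = 0 as "still feasible" and move LB up.
        # Smallest root: move LB up only while V is strictly positive.
        move_up = value >= 0 if largest else value > 0
        if move_up:
            LB = mid
        else:
            UB = mid
    return (LB + UB) / 2


alpha_like = bisect_root(V, 0.0, 3.0, largest=True)
beta_like = bisect_root(V, 0.0, 3.0, largest=False)
print(alpha_like, beta_like)
```

When the zero set is a single point the two rules return the same number; when it is an interval, the weak inequality converges to its right end and the strict inequality to its left end.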
The zerosum method calculates the value and optimal strategies associated with a given 𝛾.

γ = 2

print(f'Value of the game with γ = {γ}')


print(n1.zerosum(γ=γ)[0])
print('Intensity vector (from the primal)')
print(n1.zerosum(γ=γ)[1])
print('Price vector (from the dual)')
print(n1.zerosum(γ=γ, dual=True)[1])


Value of the game with γ = 2


-0.24
Intensity vector (from the primal)
[0.32 0.28 0.4 ]
Price vector (from the dual)
[0.4 0.32 0.28 0. ]

numb_grid = 100
γ_grid = np.linspace(0.4, 2.1, numb_grid)

value_ex1_grid = np.asarray([n1.zerosum(γ=γ_grid[i])[0]
                             for i in range(numb_grid)])
value_ex2_grid = np.asarray([n2.zerosum(γ=γ_grid[i])[0]
                             for i in range(numb_grid)])

fig, axes = plt.subplots(1, 2, figsize=(14, 5), sharey=True)
fig.suptitle(r'The function $V(M(\gamma))$', fontsize=16)

for ax, grid, N, i in zip(axes, (value_ex1_grid, value_ex2_grid),
                          (n1, n2), (1, 2)):
    ax.plot(γ_grid, grid)
    # raw string so that the backslash in \gamma is not read as an escape
    ax.set(title=f'Example {i}', xlabel=r'$\gamma$')
    ax.axhline(0, c='k', lw=1)
    ax.axvline(N.bounds()[0], c='r', ls='--', label='lower bound')
    ax.axvline(N.bounds()[1], c='g', ls='--', label='upper bound')

plt.show()

The expansion method implements the bisection algorithm for 𝛼0 (and uses the primal LP problem for 𝑥0 )

α_0, x, p = n1.expansion()
print(f'α_0 = {α_0}')
print(f'x_0 = {x}')
print(f'The corresponding p from the dual = {p}')

α_0 = 1.2599210478365421
x_0 = [0.33 0.26 0.41]

The corresponding p from the dual = [0.41 0.33 0.26 0. ]

The interest method implements the bisection algorithm for 𝛽0 (and uses the dual LP problem for 𝑝0 )

β_0, x, p = n1.interest()
print(f'β_0 = {β_0}')
print(f'p_0 = {p}')
print(f'The corresponding x from the primal = {x}')

β_0 = 1.2599210478365421
p_0 = [0.41 0.33 0.26 0. ]
The corresponding x from the primal = [0.33 0.26 0.41]

Of course, when 𝛾∗ is unique, it is irrelevant which of the two methods we use, since both work.
In particular, as will be shown below, in the case of an irreducible (𝐴, 𝐵) (as in Example 1), the maximal and minimal
roots of 𝑉(𝑀(𝛾)) necessarily coincide, implying a “full duality” result, i.e. 𝛼0 = 𝛽0 = 𝛾∗, so that the expansion (and
interest) rate 𝛾∗ is unique.

18.5.3 Uniqueness and Irreducibility

As an illustration, compute first the maximal and minimal roots of 𝑉 (𝑀 (⋅)) for our Example 2 that has a reducible
input-output pair (𝐴, 𝐵)

α_0, x, p = n2.expansion()
print(f'α_0 = {α_0}')
print(f'x_0 = {x}')
print(f'The corresponding p from the dual = {p}')

α_0 = 1.259921052493155
x_0 = [5.27e-10 0.00e+00 3.27e-01 2.60e-01 4.13e-01]
The corresponding p from the dual = [0. 0.21 0.33 0.26 0.21 0. ]

β_0, x, p = n2.interest()
print(f'β_0 = {β_0}')
print(f'p_0 = {p}')
print(f'The corresponding x from the primal = {x}')

β_0 = 1.0000000009313226
p_0 = [ 5.00e-01 5.00e-01 -1.55e-09 -1.24e-09 -9.31e-10 0.00e+00]
The corresponding x from the primal = [-0. 0. 0.25 0.25 0.5 ]

As we can see, with a reducible (𝐴, 𝐵), the roots found by the bisection algorithms might differ, so there might be multiple
𝛾∗ that make the value of the game with 𝑀(𝛾∗) zero (see the figure above).
Indeed, although the von Neumann theorem assures existence of the equilibrium, Assumptions I and II are not sufficient
for uniqueness. Nonetheless, Kemeny et al. (1967) show that there are at most finitely many economic solutions, meaning
that there are only finitely many 𝛾∗ that satisfy 𝑉(𝑀(𝛾∗)) = 0 and $x_0^T B p_0 > 0$ and that for each such $\gamma_i^*$, there is a
self-contained part of the economy (a sub-economy) that in equilibrium can expand independently with the expansion
coefficient $\gamma_i^*$.


The following theorem (see Theorem 9.10. in Gale [Gale, 1989]) asserts that imposing irreducibility is sufficient for
uniqueness of (𝛾 ∗ , 𝑥0 , 𝑝0 ).
Theorem II: Adopt the conditions of Theorem I. If the economy (𝐴, 𝐵) is irreducible, then 𝛾∗ = 𝛼0 = 𝛽0.

18.5.4 A Special Case

There is a special class of economies (𝐴, 𝐵) that allows us to simplify the solution method significantly by invoking the
powerful Perron-Frobenius theorem for non-negative matrices.
Definition: We call an economy simple if it satisfies
• 𝑛=𝑚
• Each activity produces exactly one good
• Each good is produced by one and only one activity.
These assumptions imply that 𝐵 = 𝐼𝑛 , i.e., that 𝐵 can be written as an identity matrix (possibly after reshuffling its rows
and columns).
The simple model has the following special property (Theorem 9.11. in Gale [Gale, 1989]): if 𝑥0 and 𝛼0 > 0 solve the
TEP with (𝐴, 𝐼𝑛 ), then

$$x_0^T = \alpha_0 x_0^T A \quad \Leftrightarrow \quad x_0^T A = \left( \frac{1}{\alpha_0} \right) x_0^T$$

The latter shows that 1/𝛼0 is a positive eigenvalue of 𝐴 and 𝑥0 is the corresponding non-negative left eigenvector.
The classic result of Perron and Frobenius implies that a non-negative matrix has a non-negative eigenvalue-eigenvector
pair.
Moreover, if 𝐴 is irreducible, then the optimal intensity vector 𝑥0 is positive and unique up to multiplication by a positive
scalar.
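For a simple economy this reduces the computation to an eigenvalue problem. The sketch below uses a hypothetical strictly positive (hence irreducible) input matrix with 𝐵 equal to the identity, and takes the eigenvalue of largest real part (the Perron root), since that is the eigenvalue for which a non-negative eigenvector is guaranteed. Left eigenvectors of 𝐴 are obtained as right eigenvectors of its transpose.

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.25, 0.25]])     # hypothetical input matrix of a simple economy
B = np.eye(2)                    # each activity produces exactly one good

eigvals, eigvecs = np.linalg.eig(A.T)   # columns: left eigenvectors of A
k = np.argmax(eigvals.real)             # index of the Perron root
rho = eigvals[k].real

x0 = eigvecs[:, k].real
x0 = x0 / x0.sum()                      # normalize (this also fixes the sign)
alpha0 = 1 / rho

# Check the balanced-growth equality x0' B = alpha0 * x0' A
residual = np.max(np.abs(x0 @ B - alpha0 * (x0 @ A)))
print(alpha0, x0, residual)
```

The matrix is deliberately non-symmetric, so its left and right eigenvectors differ, and using the right eigenvector by mistake would not satisfy the balanced-growth equality.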
Suppose that 𝐴 is reducible with 𝑘 irreducible subsets 𝑆1 , … , 𝑆𝑘 . Let 𝐴𝑖 be the submatrix corresponding to 𝑆𝑖 and let
𝛼𝑖 and 𝛽𝑖 be the associated expansion and interest factors, respectively. Then we have

$$\alpha_0 = \max_i \{\alpha_i\} \quad \text{and} \quad \beta_0 = \min_i \{\beta_i\}$$



Part IV

Introduction to Dynamics

CHAPTER

NINETEEN

FINITE MARKOV CHAINS

Contents

• Finite Markov Chains


– Overview
– Definitions
– Simulation
– Marginal Distributions
– Irreducibility and Aperiodicity
– Stationary Distributions
– Ergodicity
– Computing Expectations
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

19.1 Overview

Markov chains are one of the most useful classes of stochastic processes, being
• simple, flexible and supported by many elegant theoretical results
• valuable for building intuition about random dynamic models
• central to quantitative modeling in their own right
You will find them in many of the workhorse models of economics and finance.
In this lecture, we review some of the theory of Markov chains.
We will also introduce some of the high-quality routines for working with Markov chains available in QuantEcon.py.
Prerequisite knowledge is basic probability and linear algebra.
Let’s start with some standard imports:


import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (11, 5)  # set default figure size
import quantecon as qe
import numpy as np
from mpl_toolkits.mplot3d import Axes3D

19.2 Definitions

The following concepts are fundamental.

19.2.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an 𝑛 × 𝑛 square matrix 𝑃 such that


1. each element of 𝑃 is nonnegative, and
2. each row of 𝑃 sums to one
Each row of 𝑃 can be regarded as a probability mass function over 𝑛 possible outcomes.
It is not too difficult to check (see footnote 1) that if 𝑃 is a stochastic matrix, then so is the 𝑘-th power $P^k$ for all 𝑘 ∈ ℕ.
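A quick numerical check of this closure property is sketched below. The helper `is_stochastic` and the 2 × 2 matrix are hypothetical, and the check is an illustration for one matrix rather than a proof.

```python
import numpy as np


def is_stochastic(P, tol=1e-12):
    """Check that P is square, non-negative, and has rows summing to one."""
    P = np.asarray(P, dtype=float)
    return bool(P.ndim == 2
                and P.shape[0] == P.shape[1]
                and np.all(P >= 0)
                and np.allclose(P.sum(axis=1), 1.0, atol=tol))


P = np.array([[0.9, 0.1],
              [0.4, 0.6]])       # hypothetical stochastic matrix

powers_ok = all(is_stochastic(np.linalg.matrix_power(P, k))
                for k in range(1, 11))
print(powers_ok)
```

The row-sum part of the argument is visible in the code: post-multiplying by a vector of ones is exactly what `P.sum(axis=1)` computes.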

19.2.2 Markov Chains

There is a close connection between stochastic matrices and Markov chains.


To begin, let 𝑆 be a finite set with 𝑛 elements {𝑥1 , … , 𝑥𝑛 }.
The set 𝑆 is called the state space and 𝑥1 , … , 𝑥𝑛 are the state values.
A Markov chain {𝑋𝑡 } on 𝑆 is a sequence of random variables on 𝑆 that have the Markov property.
This means that, for any date 𝑡 and any state 𝑦 ∈ 𝑆,

ℙ{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 } = ℙ{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 , 𝑋𝑡−1 , …} (19.1)

In other words, knowing the current state is enough to know probabilities for future states.
In particular, the dynamics of a Markov chain are fully determined by the set of values

𝑃 (𝑥, 𝑦) ∶= ℙ{𝑋𝑡+1 = 𝑦 | 𝑋𝑡 = 𝑥} (𝑥, 𝑦 ∈ 𝑆) (19.2)

By construction,
• 𝑃 (𝑥, 𝑦) is the probability of going from 𝑥 to 𝑦 in one unit of time (one step)
• 𝑃 (𝑥, ⋅) is the conditional distribution of 𝑋𝑡+1 given 𝑋𝑡 = 𝑥
We can view 𝑃 as a stochastic matrix where

𝑃𝑖𝑗 = 𝑃 (𝑥𝑖 , 𝑥𝑗 ) 1 ≤ 𝑖, 𝑗 ≤ 𝑛

Going the other way, if we take a stochastic matrix 𝑃, we can generate a Markov chain {𝑋𝑡} as follows:
• draw 𝑋0 from a marginal distribution 𝜓
• for each 𝑡 = 0, 1, …, draw $X_{t+1}$ from $P(X_t, \cdot)$
By construction, the resulting process satisfies (19.2).

1 Hint: First show that if 𝑃 and 𝑄 are stochastic matrices then so is their product (to check the row sums, try post-multiplying by a column vector
of ones). Finally, argue that $P^k$ is a stochastic matrix using induction.

19.2.3 Example 1

Consider a worker who, at any given time 𝑡, is either unemployed (state 0) or employed (state 1).
Suppose that, over a one month period,
1. An unemployed worker finds a job with probability 𝛼 ∈ (0, 1).
2. An employed worker loses her job and becomes unemployed with probability 𝛽 ∈ (0, 1).
In terms of a Markov model, we have
• 𝑆 = {0, 1}
• 𝑃 (0, 1) = 𝛼 and 𝑃 (1, 0) = 𝛽
We can write out the transition probabilities in matrix form as

$$
P = \begin{pmatrix}
1 - \alpha & \alpha \\
\beta & 1 - \beta
\end{pmatrix}
\tag{19.3}
$$

Once we have the values 𝛼 and 𝛽, we can address a range of questions, such as
• What is the average duration of unemployment?
• Over the long-run, what fraction of time does a worker find herself unemployed?
• Conditional on employment, what is the probability of becoming unemployed at least once over the next 12 months?
We’ll cover such applications below.
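As a preview, here is a minimal sketch of how two of these questions reduce to short calculations in the two-state model. The values of α and β are hypothetical and chosen only for illustration. The formulas follow from the one-step probabilities in (19.3): an unemployment spell ends each month with probability α, and an employed worker keeps her job each month with probability 1 − β.

```python
import numpy as np

α, β = 0.1, 0.02      # hypothetical monthly job-finding and job-loss probabilities

# Stochastic matrix from (19.3): state 0 = unemployed, state 1 = employed
P = np.array([[1 - α, α],
              [β, 1 - β]])

# Each row is a conditional distribution, so each row must sum to one
row_sums = P.sum(axis=1)

# The length of an unemployment spell is geometric with success probability α,
# so its mean is 1/α months
mean_unemployment_duration = 1 / α

# An employed worker avoids unemployment for 12 straight months with
# probability (1 - β)**12, so the complement is the chance of at least one job loss
prob_job_loss_within_year = 1 - (1 - β)**12

print(mean_unemployment_duration, prob_job_loss_within_year)
```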

19.2.4 Example 2

From US unemployment data, Hamilton [Hamilton, 2005] estimated the stochastic matrix

$$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$$

where
• the frequency is monthly
• the first state represents “normal growth”
• the second state represents “mild recession”
• the third state represents “severe recession”
For example, the matrix tells us that when the state is normal growth, the state will again be normal growth next month
with probability 0.97.
In general, large values on the main diagonal indicate persistence in the process {𝑋𝑡 }.
This Markov process can also be represented as a directed graph, with edges labeled by transition probabilities
Here “ng” is normal growth, “mr” is mild recession, etc.


19.3 Simulation

One natural way to answer questions about Markov chains is to simulate them.
(To approximate the probability of event 𝐸, we can simulate many times and count the fraction of times that 𝐸 occurs).
Nice functionality for simulating Markov chains exists in QuantEcon.py.
• Efficient, bundled with lots of other useful routines for handling Markov chains.
However, it’s also a good exercise to roll our own routines — let’s do that first and then come back to the methods in
QuantEcon.py.
In these exercises, we’ll take the state space to be 𝑆 = {0, … , 𝑛 − 1}.

19.3.1 Rolling Our Own

To simulate a Markov chain, we need its stochastic matrix 𝑃 and a marginal probability distribution 𝜓 from which to
draw a realization of 𝑋0 .
The Markov chain is then constructed as discussed above. To repeat:
1. At time 𝑡 = 0, draw a realization of 𝑋0 from 𝜓.
2. At each subsequent time 𝑡, draw a realization of the new state 𝑋𝑡+1 from 𝑃 (𝑋𝑡 , ⋅).
To implement this simulation procedure, we need a method for generating draws from a discrete distribution.
For this task, we’ll use random.draw from QuantEcon, which works as follows:

ψ = (0.3, 0.7) # probabilities over {0, 1}


cdf = np.cumsum(ψ) # convert into cumulative distribution
qe.random.draw(cdf, 5) # generate 5 independent draws from ψ

array([1, 0, 1, 1, 1])
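The idea of drawing from a cumulative distribution can be sketched with NumPy alone using inverse-transform sampling. This is an illustration of the technique, not a description of how qe.random.draw is implemented; the function name draw_from_cdf is introduced here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded generator, for reproducibility

ψ = (0.3, 0.7)                       # probabilities over {0, 1}
cdf = np.cumsum(ψ)                   # cumulative distribution

def draw_from_cdf(cdf, size, rng):
    "Inverse-transform sampling: return the first index i with u < cdf[i]."
    u = rng.uniform(size=size)                    # uniform draws on [0, 1)
    idx = np.searchsorted(cdf, u, side='right')
    # Guard against cdf[-1] falling slightly below 1 due to rounding
    return np.minimum(idx, len(cdf) - 1)

draws = draw_from_cdf(cdf, 100_000, rng)
freq_of_one = np.mean(draws == 1)    # should be close to ψ[1]
print(freq_of_one)
```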

We’ll write our code as a function that accepts the following three arguments
• A stochastic matrix P
• An initial distribution ψ_0 (optional; if it is omitted, the chain starts in state 0)
• A positive integer sample_size representing the length of the time series the function should return


def mc_sample_path(P, ψ_0=None, sample_size=1_000):

    # set up
    P = np.asarray(P)
    X = np.empty(sample_size, dtype=int)

    # Convert each row of P into a cdf
    n = len(P)
    P_dist = [np.cumsum(P[i, :]) for i in range(n)]

    # draw initial state, defaulting to 0
    if ψ_0 is not None:
        X_0 = qe.random.draw(np.cumsum(ψ_0))
    else:
        X_0 = 0

    # simulate
    X[0] = X_0
    for t in range(sample_size - 1):
        X[t+1] = qe.random.draw(P_dist[X[t]])

    return X

Let’s see how it works using the small matrix

P = [[0.4, 0.6],
[0.2, 0.8]]

As we’ll see later, for a long series drawn from P, the fraction of the sample that takes value 0 will be about 0.25.
Moreover, this is true, regardless of the initial distribution from which 𝑋0 is drawn.
The following code illustrates this

X = mc_sample_path(P, ψ_0=[0.1, 0.9], sample_size=100_000)


np.mean(X == 0)

0.25041

You can try changing the initial distribution to confirm that the output is always close to 0.25, at least for the P matrix
above.

19.3.2 Using QuantEcon’s Routines

As discussed above, QuantEcon.py has routines for handling Markov chains, including simulation.
Here’s an illustration using the same P as the preceding example

from quantecon import MarkovChain

mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1_000_000)
np.mean(X == 0)


0.249516

The QuantEcon.py routine is JIT compiled and much faster.

%time mc_sample_path(P, sample_size=1_000_000) # Our homemade code version

CPU times: user 1.49 s, sys: 0 ns, total: 1.49 s


Wall time: 1.49 s

array([0, 1, 1, ..., 1, 1, 1])

%time mc.simulate(ts_length=1_000_000) # qe code version

CPU times: user 19.6 ms, sys: 4.75 ms, total: 24.3 ms
Wall time: 23.8 ms

array([0, 1, 1, ..., 1, 0, 1])

Adding State Values and Initial Conditions

If we wish to, we can provide a specification of state values to MarkovChain.


These state values can be integers, floats, or even strings.
The following code illustrates

mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))


mc.simulate(ts_length=4, init='employed')

array(['employed', 'employed', 'employed', 'unemployed'], dtype='<U10')

mc.simulate(ts_length=4, init='unemployed')

array(['unemployed', 'employed', 'employed', 'unemployed'], dtype='<U10')

mc.simulate(ts_length=4) # Start at randomly chosen initial state

array(['employed', 'unemployed', 'employed', 'employed'], dtype='<U10')

If we want to see indices rather than state values as outputs, we can use

mc.simulate_indices(ts_length=4)

array([1, 1, 1, 1])


19.4 Marginal Distributions

Suppose that
1. {𝑋𝑡 } is a Markov chain with stochastic matrix 𝑃
2. the marginal distribution of 𝑋𝑡 is known to be 𝜓𝑡
What then is the marginal distribution of 𝑋𝑡+1 , or, more generally, of 𝑋𝑡+𝑚 ?
To answer this, we let 𝜓𝑡 be the marginal distribution of 𝑋𝑡 for 𝑡 = 0, 1, 2, ….
Our first aim is to find 𝜓𝑡+1 given 𝜓𝑡 and 𝑃 .
To begin, pick any 𝑦 ∈ 𝑆.
Using the law of total probability, we can decompose the probability that 𝑋𝑡+1 = 𝑦 as follows:

$$\mathbb{P}\{X_{t+1} = y\} = \sum_{x \in S} \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \cdot \mathbb{P}\{X_t = x\}$$

In words, to get the probability of being at 𝑦 tomorrow, we account for all ways this can happen and sum their probabilities.
Rewriting this statement in terms of marginal and conditional probabilities gives

$$\psi_{t+1}(y) = \sum_{x \in S} P(x, y) \psi_t(x)$$

There are 𝑛 such equations, one for each 𝑦 ∈ 𝑆.


If we think of 𝜓𝑡+1 and 𝜓𝑡 as row vectors, these 𝑛 equations are summarized by the matrix expression

𝜓𝑡+1 = 𝜓𝑡 𝑃 (19.4)

Thus, to move a marginal distribution forward one unit of time, we postmultiply by 𝑃 .


By postmultiplying 𝑚 times, we move a marginal distribution forward 𝑚 steps into the future.
Hence, iterating on (19.4), the expression 𝜓𝑡+𝑚 = 𝜓𝑡 𝑃 𝑚 is also valid — here 𝑃 𝑚 is the 𝑚-th power of 𝑃 .
As a special case, we see that if 𝜓0 is the initial distribution from which 𝑋0 is drawn, then 𝜓0 𝑃 𝑚 is the distribution of
𝑋𝑚 .
This is very important, so let’s repeat it

𝑋0 ∼ 𝜓0 ⟹ 𝑋𝑚 ∼ 𝜓 0 𝑃 𝑚 (19.5)

and, more generally,

𝑋𝑡 ∼ 𝜓𝑡 ⟹ 𝑋𝑡+𝑚 ∼ 𝜓𝑡 𝑃 𝑚 (19.6)
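A minimal numerical sketch of (19.4) to (19.6): we push a distribution forward one step at a time and compare the result with a single postmultiplication by the 𝑚-th power of 𝑃. The initial distribution and the horizon 𝑚 are arbitrary choices made for illustration.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
ψ_0 = np.array([0.1, 0.9])     # arbitrary initial distribution (row vector)
m = 5                          # arbitrary number of steps

# Iterate ψ_{t+1} = ψ_t P a total of m times
ψ = ψ_0.copy()
for _ in range(m):
    ψ = ψ @ P

# Equivalent one-shot calculation using the m-th power of P
ψ_m = ψ_0 @ np.linalg.matrix_power(P, m)

print(ψ, ψ_m)
```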

19.4.1 Multiple Step Transition Probabilities

We know that the probability of transitioning from 𝑥 to 𝑦 in one step is 𝑃 (𝑥, 𝑦).
It turns out that the probability of transitioning from 𝑥 to 𝑦 in 𝑚 steps is 𝑃 𝑚 (𝑥, 𝑦), the (𝑥, 𝑦)-th element of the 𝑚-th
power of 𝑃 .
To see why, consider again (19.6), but now with a 𝜓𝑡 that puts all probability on state 𝑥, so that 𝜓𝑡 is the row vector with 1 in the 𝑥-th position and zero elsewhere.

Inserting this into (19.6), we see that, conditional on 𝑋𝑡 = 𝑥, the distribution of 𝑋𝑡+𝑚 is the 𝑥-th row of 𝑃 𝑚 .
In particular

ℙ{𝑋𝑡+𝑚 = 𝑦 | 𝑋𝑡 = 𝑥} = 𝑃 𝑚 (𝑥, 𝑦) = (𝑥, 𝑦)-th element of 𝑃 𝑚
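The argument can be checked numerically. In the sketch below we use Hamilton's matrix from Example 2, with an arbitrary horizon and starting state chosen for illustration, and confirm that a degenerate distribution pushed forward 𝑚 steps reproduces the corresponding row of the 𝑚-th power of 𝑃.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
m, x = 4, 1                      # arbitrary horizon and starting state index

P_m = np.linalg.matrix_power(P, m)

# Distribution that puts all probability on state x
e_x = np.zeros(len(P))
e_x[x] = 1.0

# Distribution of X_{t+m} given X_t = x
dist_from_x = e_x @ P_m

# The same m-step matrix, built as one step followed by m - 1 steps
two_stage = P @ np.linalg.matrix_power(P, m - 1)

print(dist_from_x)
```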

19.4.2 Example: Probability of Recession

Recall the stochastic matrix 𝑃 for recession and growth considered above.
Suppose that the current state is unknown — perhaps statistics are available only at the end of the current month.
We guess that the probability that the economy is in state 𝑥 is 𝜓(𝑥).
The probability of being in recession (either mild or severe) in 6 months time is given by the inner product

$$\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$$
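Here is a short sketch of this calculation. The guess 𝜓 over the three states is hypothetical; only the matrix 𝑃 comes from the example above.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

ψ = np.array([0.6, 0.3, 0.1])        # hypothetical guess over (ng, mr, sr)
recession = np.array([0, 1, 1])      # indicator of the two recession states

ψ_6 = ψ @ np.linalg.matrix_power(P, 6)   # distribution six months ahead
prob_recession = ψ_6 @ recession         # inner product with the indicator

print(prob_recession)
```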

19.4.3 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as cross-sectional frequencies
that a Law of Large Numbers leads us to anticipate for large samples.
To illustrate, recall our model of employment/unemployment dynamics for a given worker discussed above.
Consider a large population of workers, each of whose lifetime experience is described by the specified dynamics, with
each worker’s outcomes being realizations of processes that are statistically independent of all other workers’ processes.
Let 𝜓 be the current cross-sectional distribution over {0, 1}.
The cross-sectional distribution records fractions of workers employed and unemployed at a given moment.
• For example, 𝜓(0) is the unemployment rate.
What will the cross-sectional distribution be 10 periods hence?
The answer is 𝜓𝑃 10 , where 𝑃 is the stochastic matrix in (19.3).
This is because each worker’s state evolves according to 𝑃 , so 𝜓𝑃 10 is a marginal distribution for a single randomly selected
worker.
But when the sample is large, outcomes and probabilities are roughly equal (by an application of the Law of Large
Numbers).
So for a very large (tending to infinite) population, 𝜓𝑃 10 also represents fractions of workers in each state.
This is exactly the cross-sectional distribution.
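The claim can be illustrated by simulation. The sketch below follows a large population of independent workers for 10 periods and compares the simulated fractions with 𝜓𝑃 10. The parameter values, the initial cross-section and the population size are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(1234)

α, β = 0.3, 0.1                      # hypothetical transition probabilities
P = np.array([[1 - α, α],
              [β, 1 - β]])
ψ = np.array([0.5, 0.5])             # hypothetical current cross-section
N, T = 200_000, 10                   # number of workers and number of periods

# Initial states: 0 (unemployed) with probability ψ[0], else 1 (employed)
X = (rng.uniform(size=N) >= ψ[0]).astype(int)

for _ in range(T):
    u = rng.uniform(size=N)          # one independent shock per worker
    finds_job = (X == 0) & (u < α)
    loses_job = (X == 1) & (u < β)
    X = np.where(finds_job, 1, np.where(loses_job, 0, X))

simulated = np.array([np.mean(X == 0), np.mean(X == 1)])
predicted = ψ @ np.linalg.matrix_power(P, T)

print(simulated, predicted)
```

With a population this large, the two vectors should agree to roughly two decimal places.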


19.5 Irreducibility and Aperiodicity

Irreducibility and aperiodicity are central concepts of modern Markov chain theory.
Let’s see what they’re about.

19.5.1 Irreducibility

Let 𝑃 be a fixed stochastic matrix.


Two states 𝑥 and 𝑦 are said to communicate with each other if there exist positive integers 𝑗 and 𝑘 such that

𝑃 𝑗 (𝑥, 𝑦) > 0 and 𝑃 𝑘 (𝑦, 𝑥) > 0

In view of our discussion above, this means precisely that


• state 𝑥 can eventually be reached from state 𝑦, and
• state 𝑦 can eventually be reached from state 𝑥
The stochastic matrix 𝑃 is called irreducible if all states communicate; that is, if 𝑥 and 𝑦 communicate for all (𝑥, 𝑦) in
𝑆 × 𝑆.
For example, consider the following transition probabilities for wealth of a fictitious set of households

We can translate this into a stochastic matrix, putting zeros where there’s no edge between nodes

$$P := \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

It’s clear from the graph that this stochastic matrix is irreducible: we can eventually reach any state from any other state.
We can also test this using QuantEcon.py’s MarkovChain class

P = [[0.9, 0.1, 0.0],
     [0.4, 0.4, 0.2],
     [0.1, 0.1, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

True


Here’s a more pessimistic scenario in which poor people remain poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor.
Let’s confirm this

P = [[1.0, 0.0, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

False

We can also determine the “communication classes”

mc.communication_classes

[array(['poor'], dtype='<U6'), array(['middle', 'rich'], dtype='<U6')]

It might be clear to you already that irreducibility is going to be important in terms of long run outcomes.
For example, poverty is a life sentence in the second graph but not the first.
We’ll come back to this a bit later.
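For small matrices, irreducibility can also be checked by hand with NumPy. The sketch below relies on a standard fact: an 𝑛 × 𝑛 nonnegative matrix is irreducible exactly when every entry of (𝐼 + 𝐴) raised to the power 𝑛 − 1 is positive, where 𝐴 records which transitions have positive probability. The function name is_irreducible is our own, and this is not a description of how QuantEcon.py performs the test.

```python
import numpy as np

def is_irreducible(P):
    "Check irreducibility through reachability in at most n - 1 steps."
    P = np.asarray(P)
    n = len(P)
    A = (P > 0).astype(int)                  # 1 where a one-step move is possible
    # Entry (x, y) is positive iff y can be reached from x in at most n - 1 steps
    reach = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n - 1)
    return bool(np.all(reach > 0))

P_mobile = [[0.9, 0.1, 0.0],
            [0.4, 0.4, 0.2],
            [0.1, 0.1, 0.8]]

P_trap = [[1.0, 0.0, 0.0],
          [0.1, 0.8, 0.1],
          [0.0, 0.2, 0.8]]

print(is_irreducible(P_mobile), is_irreducible(P_trap))
```

Integer matrix powers can overflow when 𝑛 is large, so this sketch is intended for small examples only.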


19.5.2 Aperiodicity

Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise.
Here’s a trivial example with three states

The chain cycles with period 3:

P = [[0, 1, 0],
[0, 0, 1],
[1, 0, 0]]

mc = qe.MarkovChain(P)
mc.period

More formally, the period of a state 𝑥 is the greatest common divisor of the set of integers

𝐷(𝑥) ∶= {𝑗 ≥ 1 ∶ 𝑃 𝑗 (𝑥, 𝑥) > 0}

In the last example, 𝐷(𝑥) = {3, 6, 9, …} for every state 𝑥, so the period is 3.
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic otherwise.
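The definition can be turned into a rough computation by taking the greatest common divisor of the return times found up to some cutoff. The sketch below does this; the function name state_period and the default cutoff are our own assumptions, and a cutoff that is too small can overstate the period.

```python
from math import gcd
import numpy as np

def state_period(P, x, max_steps=None):
    "Approximate the period of state x using return times j <= max_steps."
    P = np.asarray(P, dtype=float)
    n = len(P)
    if max_steps is None:
        max_steps = 2 * n * n                # heuristic cutoff (an assumption)
    A = (P > 0).astype(int)                  # pattern of one-step transitions
    reach = np.eye(n, dtype=int)
    d = 0                                    # gcd(0, j) = j, so 0 is a neutral start
    for j in range(1, max_steps + 1):
        reach = (reach @ A > 0).astype(int)  # positivity pattern of P^j
        if reach[x, x]:
            d = gcd(d, j)                    # j belongs to D(x)
    return d

P_cycle = [[0, 1, 0],
           [0, 0, 1],
           [1, 0, 0]]

P_positive = [[0.4, 0.6],
              [0.2, 0.8]]

print(state_period(P_cycle, 0), state_period(P_positive, 0))
```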
For example, the stochastic matrix associated with the transition probabilities below is periodic because, for example,
state 𝑎 has period 2

We can confirm that the stochastic matrix is periodic with the following code

P = [[0.0, 1.0, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0, 0.0]]

mc = qe.MarkovChain(P)
mc.period


mc.is_aperiodic

False

19.6 Stationary Distributions

As seen in (19.4), we can shift a marginal distribution forward one unit of time via postmultiplication by 𝑃 .
Some distributions are invariant under this updating process — for example,

P = np.array([[0.4, 0.6],
[0.2, 0.8]])
ψ = (0.25, 0.75)
ψ @ P

array([0.25, 0.75])

Such distributions are called stationary or invariant.


Formally, a marginal distribution 𝜓∗ on 𝑆 is called stationary for 𝑃 if 𝜓∗ = 𝜓∗ 𝑃 .
(This is the same notion of stationarity that we learned about in the lecture on AR(1) processes applied to a different
setting.)
From this equality, we immediately get 𝜓∗ = 𝜓∗ 𝑃 𝑡 for all 𝑡.
This tells us an important fact: If the distribution of 𝑋0 is a stationary distribution, then 𝑋𝑡 will have this same distribution
for all 𝑡.
Hence stationary distributions have a natural interpretation as stochastic steady states — we’ll discuss this more soon.
Mathematically, a stationary distribution is a fixed point of 𝑃 when 𝑃 is thought of as the map 𝜓 ↦ 𝜓𝑃 from (row)
vectors to (row) vectors.
Theorem. Every stochastic matrix 𝑃 has at least one stationary distribution.
(We are assuming here that the state space 𝑆 is finite; if not, more assumptions are required.)
For proof of this result, you can apply Brouwer’s fixed point theorem, or see EDTC, theorem 4.3.5.
There can be many stationary distributions corresponding to a given stochastic matrix 𝑃 .
• For example, if 𝑃 is the identity matrix, then all marginal distributions are stationary.
To get uniqueness of an invariant distribution, the transition matrix 𝑃 must have the property that no nontrivial subsets of
the state space are infinitely persistent.
A subset of the state space is infinitely persistent if other parts of the state space cannot be accessed from it.
Thus, infinite persistence of a non-trivial subset is the opposite of irreducibility.
This gives some intuition for the following fundamental theorem.
Theorem. If 𝑃 is both aperiodic and irreducible, then
1. 𝑃 has exactly one stationary distribution 𝜓∗ .
2. For any initial marginal distribution 𝜓0 , we have ‖𝜓0 𝑃 𝑡 − 𝜓∗ ‖ → 0 as 𝑡 → ∞.


For a proof, see, for example, theorem 5.2 of [Häggström, 2002].


(Note that part 1 of the theorem only requires irreducibility, whereas part 2 requires both irreducibility and aperiodicity)
A stochastic matrix that satisfies the conditions of the theorem is sometimes called uniformly ergodic.
A sufficient condition for aperiodicity and irreducibility is that every element of 𝑃 is strictly positive.
• Try to convince yourself of this.

19.6.1 Example

Recall our model of the employment/unemployment dynamics of a particular worker discussed above.
Assuming 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), the uniform ergodicity condition is satisfied.
Let 𝜓∗ = (𝑝, 1 − 𝑝) be the stationary distribution, so that 𝑝 corresponds to unemployment (state 0).
Using 𝜓∗ = 𝜓∗ 𝑃 and a bit of algebra yields

$$p = \frac{\beta}{\alpha + \beta}$$
This is, in some sense, a steady state probability of unemployment — more about the interpretation of this below.
Not surprisingly it tends to zero as 𝛽 → 0, and to one as 𝛼 → 0.

19.6.2 Calculating Stationary Distributions

As discussed above, a particular Markov matrix 𝑃 can have many stationary distributions.
That is, there can be many row vectors 𝜓 such that 𝜓 = 𝜓𝑃 .
In fact if 𝑃 has two distinct stationary distributions 𝜓1 , 𝜓2 then it has infinitely many, since in this case, as you can verify,
for any 𝜆 ∈ [0, 1]

𝜓3 ∶= 𝜆𝜓1 + (1 − 𝜆)𝜓2

is a stationary distribution for 𝑃 .


If we restrict attention to the case in which only one stationary distribution exists, one way to find it is to solve the
system

𝜓(𝐼𝑛 − 𝑃 ) = 0 (19.7)

for 𝜓, where 𝐼𝑛 is the 𝑛 × 𝑛 identity.


But the zero vector solves system (19.7), so we must proceed cautiously.
We want to impose the restriction that 𝜓 is a probability distribution.
There are various ways to do this.
One option is to regard solving system (19.7) as an eigenvector problem: a vector 𝜓 such that 𝜓 = 𝜓𝑃 is a left eigenvector
associated with the unit eigenvalue 𝜆 = 1.
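The eigenvector idea can be sketched with NumPy as follows. Since np.linalg.eig returns right eigenvectors, we apply it to the transpose of 𝑃 to obtain left eigenvectors of 𝑃. This is a simple illustration that assumes a unique stationary distribution; it is not the specialized algorithm mentioned next.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])

# Right eigenvectors of P' are left eigenvectors of P
eigvals, eigvecs = np.linalg.eig(P.T)

# Select the eigenvector whose eigenvalue is closest to one
k = np.argmin(np.abs(eigvals - 1))
ψ_star = np.real(eigvecs[:, k])

# Impose the restriction that ψ is a probability distribution
ψ_star = ψ_star / ψ_star.sum()

print(ψ_star)
```

Dividing by the sum rules out the zero vector and fixes the arbitrary sign and scale of the eigenvector.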
A stable and sophisticated algorithm specialized for stochastic matrices is implemented in QuantEcon.py.
This is the one we recommend:


P = [[0.4, 0.6],
[0.2, 0.8]]

mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions

array([[0.25, 0.75]])

19.6.3 Convergence to Stationarity

Part 2 of the Markov chain convergence theorem stated above tells us that the marginal distribution of 𝑋𝑡 converges to
the stationary distribution regardless of where we begin.
This adds considerable authority to our interpretation of 𝜓∗ as a stochastic steady state.
The convergence in the theorem is illustrated in the next figure

P = ((0.971, 0.029, 0.000),
     (0.145, 0.778, 0.077),
     (0.000, 0.508, 0.492))
P = np.array(P)

ψ = (0.0, 0.2, 0.8)        # Initial condition

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')

ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
       xticks=(0.25, 0.5, 0.75),
       yticks=(0.25, 0.5, 0.75),
       zticks=(0.25, 0.5, 0.75))

x_vals, y_vals, z_vals = [], [], []
for t in range(20):
    x_vals.append(ψ[0])
    y_vals.append(ψ[1])
    z_vals.append(ψ[2])
    ψ = ψ @ P

ax.scatter(x_vals, y_vals, z_vals, c='r', s=60)
ax.view_init(30, 210)

mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)

plt.show()


Here
• 𝑃 is the stochastic matrix for recession and growth considered above.
• The highest red dot is an arbitrarily chosen initial marginal probability distribution 𝜓, represented as a vector in
ℝ3 .
• The other red dots are the marginal distributions 𝜓𝑃 𝑡 for 𝑡 = 1, 2, ….
• The black dot is 𝜓∗ .
You might like to try experimenting with different initial conditions.

19.7 Ergodicity

Under irreducibility, yet another important result obtains: for all 𝑥 ∈ 𝑆,

$$\frac{1}{m} \sum_{t=1}^{m} \mathbf{1}\{X_t = x\} \to \psi^*(x) \quad \text{as } m \to \infty \tag{19.8}$$

Here
• 1{𝑋𝑡 = 𝑥} = 1 if 𝑋𝑡 = 𝑥 and zero otherwise
• convergence is with probability one
• the result does not depend on the marginal distribution of 𝑋0


The result tells us that the fraction of time the chain spends at state 𝑥 converges to 𝜓∗ (𝑥) as time goes to infinity.
This gives us another way to interpret the stationary distribution — provided that the convergence result in (19.8) is valid.
The convergence asserted in (19.8) is a special case of a law of large numbers result for Markov chains — see EDTC,
section 4.3.4 for some additional information.

19.7.1 Example

Recall our cross-sectional interpretation of the employment/unemployment model discussed above.


Assume that 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), so that irreducibility and aperiodicity both hold.
We saw that the stationary distribution is (𝑝, 1 − 𝑝), where

$$p = \frac{\beta}{\alpha + \beta}$$
In the cross-sectional interpretation, this is the fraction of people unemployed.
In view of our latest (ergodicity) result, it is also the fraction of time that a single worker can expect to spend unemployed.
Thus, in the long-run, cross-sectional averages for a population and time-series averages for a given person coincide.
This is one aspect of the concept of ergodicity.

19.8 Computing Expectations

We sometimes want to compute mathematical expectations of functions of 𝑋𝑡 of the form

𝔼[ℎ(𝑋𝑡 )] (19.9)

and conditional expectations such as

𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] (19.10)

where
• {𝑋𝑡 } is a Markov chain generated by 𝑛 × 𝑛 stochastic matrix 𝑃
• ℎ is a given function, which, in terms of matrix algebra, we’ll think of as the column vector
$$h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix}$$
Computing the unconditional expectation (19.9) is easy.
We just sum over the marginal distribution of 𝑋𝑡 to get

$$\mathbb{E}[h(X_t)] = \sum_{x \in S} (\psi P^t)(x) h(x)$$

Here 𝜓 is the distribution of 𝑋0 .


Since 𝜓 and hence 𝜓𝑃 𝑡 are row vectors, we can also write this as

𝔼[ℎ(𝑋𝑡 )] = 𝜓𝑃 𝑡 ℎ

For the conditional expectation (19.10), we need to sum over the conditional distribution of 𝑋𝑡+𝑘 given 𝑋𝑡 = 𝑥.


We already know that this is 𝑃 𝑘 (𝑥, ⋅), so

𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] = (𝑃 𝑘 ℎ)(𝑥) (19.11)

The vector 𝑃 𝑘 ℎ stores the conditional expectation 𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥] over all 𝑥.
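Both expectations are one-line matrix calculations. In the sketch below, 𝑃 is Hamilton's matrix from Example 2, while the function ℎ, the initial distribution 𝜓 and the horizons are hypothetical values chosen for illustration.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])

h = np.array([1.0, 0.0, -1.0])       # hypothetical values h(x), as a column vector
ψ = np.array([0.2, 0.5, 0.3])        # hypothetical distribution of X_0
t, k = 5, 3                          # arbitrary horizons

# Unconditional expectation E[h(X_t)] = ψ P^t h
unconditional = ψ @ np.linalg.matrix_power(P, t) @ h

# Conditional expectations E[h(X_{t+k}) | X_t = x], one entry per state x
conditional = np.linalg.matrix_power(P, k) @ h

print(unconditional, conditional)
```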

19.8.1 Iterated Expectations

The law of iterated expectations states that

𝔼 [𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥]] = 𝔼[ℎ(𝑋𝑡+𝑘 )]

where the outer 𝔼 on the left side is an unconditional expectation taken with respect to the marginal distribution 𝜓𝑡 of 𝑋𝑡
(again see equation (19.6)).
To verify the law of iterated expectations, use equation (19.11) to substitute (𝑃 𝑘 ℎ)(𝑥) for 𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥], write

𝔼 [𝔼[ℎ(𝑋𝑡+𝑘 ) ∣ 𝑋𝑡 = 𝑥]] = 𝜓𝑡 𝑃 𝑘 ℎ,

and note 𝜓𝑡 𝑃 𝑘 ℎ = 𝜓𝑡+𝑘 ℎ = 𝔼[ℎ(𝑋𝑡+𝑘 )].
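The identity is simply associativity of matrix multiplication, which we can confirm numerically. The matrix, the vector ℎ and the distribution 𝜓𝑡 below are arbitrary illustrative choices.

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
h = np.array([2.0, -1.0])            # hypothetical function values
ψ_t = np.array([0.7, 0.3])           # hypothetical marginal distribution of X_t
k = 4

P_k = np.linalg.matrix_power(P, k)

# Average the conditional expectations (P^k h)(x) over ψ_t
lhs = ψ_t @ (P_k @ h)

# Move the distribution forward k steps, then take the expectation of h
rhs = (ψ_t @ P_k) @ h

print(lhs, rhs)
```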

19.8.2 Expectations of Geometric Sums

Sometimes we want to compute the mathematical expectation of a geometric sum, such as ∑𝑡 𝛽 𝑡 ℎ(𝑋𝑡 ).
In view of the preceding discussion, this is

$$\mathbb{E}\left[\sum_{j=0}^{\infty} \beta^j h(X_{t+j}) \mid X_t = x\right] = [(I - \beta P)^{-1} h](x)$$

where

(𝐼 − 𝛽𝑃 )−1 = 𝐼 + 𝛽𝑃 + 𝛽 2 𝑃 2 + ⋯

Premultiplication by (𝐼 − 𝛽𝑃 )−1 amounts to “applying the resolvent operator”.
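In practice we rarely form the inverse explicitly; solving a linear system gives the same vector. The sketch below assumes a discount factor 𝛽 in (0, 1) and a hypothetical ℎ, and compares the solution with a long truncated sum of the series.

```python
import numpy as np

P = np.array([[0.971, 0.029, 0.000],
              [0.145, 0.778, 0.077],
              [0.000, 0.508, 0.492]])
h = np.array([1.0, 0.0, -1.0])       # hypothetical function values
β = 0.96                             # assumed discount factor in (0, 1)
n = len(P)

# Solve (I - βP) v = h rather than inverting the matrix
v = np.linalg.solve(np.eye(n) - β * P, h)

# Truncated version of h + βPh + β²P²h + ...
v_series = np.zeros(n)
term = h.copy()
for j in range(2000):
    v_series += term
    term = β * (P @ term)

print(v, v_series)
```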

19.9 Exercises

Exercise 19.9.1
According to the discussion above, if a worker’s employment dynamics obey the stochastic matrix

$$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$$

with 𝛼 ∈ (0, 1) and 𝛽 ∈ (0, 1), then, in the long-run, the fraction of time spent unemployed will be
$$p := \frac{\beta}{\alpha + \beta}$$

In other words, if {𝑋𝑡 } represents the Markov chain for employment, then 𝑋̄ 𝑚 → 𝑝 as 𝑚 → ∞, where

$$\bar{X}_m := \frac{1}{m} \sum_{t=1}^{m} \mathbf{1}\{X_t = 0\}$$


This exercise asks you to illustrate convergence by computing 𝑋̄ 𝑚 for large 𝑚 and checking that it is close to 𝑝.
You will see that this statement is true regardless of the choice of initial condition or the values of 𝛼, 𝛽, provided both lie
in (0, 1).

Solution to Exercise 19.9.1


We will address this exercise graphically.
The plots show the time series of 𝑋̄ 𝑚 − 𝑝 for two initial conditions.
As 𝑚 gets large, both series converge to zero.

α = β = 0.1
N = 10000
p = β / (α + β)

P = ((1 - α,       α),               # Careful: P and p are distinct
     (    β,   1 - β))
mc = MarkovChain(P)

fig, ax = plt.subplots(figsize=(9, 6))
ax.set_ylim(-0.25, 0.25)
ax.grid()
ax.hlines(0, 0, N, lw=2, alpha=0.6)   # Horizontal line at zero

for x0, col in ((0, 'blue'), (1, 'green')):
    # Generate time series for worker that starts at x0
    X = mc.simulate(N, init=x0)
    # Compute fraction of time spent unemployed, for each n
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    # Plot
    ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
    ax.plot(X_bar - p, color=col, label=rf'$X_0 = \, {x0} $')
    # Overlay in black--make lines clearer
    ax.plot(X_bar - p, 'k-', alpha=0.6)

ax.legend(loc='upper right')
plt.show()


Exercise 19.9.2
A topic of interest for economics and many other disciplines is ranking.
Let’s now consider one of the most practical and important ranking problems — the rank assigned to web pages by search
engines.
(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking
systems and prices in certain competitive equilibria — see [Du et al., 2013].)
To understand the issue, consider the set of results returned by a query to a web search engine.
For the user, it is desirable to
1. receive a large set of accurate matches
2. have the matches returned in order, where the order corresponds to some measure of “importance”
Ranking according to a measure of importance is the problem we now consider.
The methodology developed to solve this problem by Google founders Larry Page and Sergey Brin is known as PageRank.
To illustrate the idea, consider the following diagram
Imagine that this is a miniature version of the WWW, with
• each node representing a web page
• each arrow representing the existence of a link from one page to another
Now let’s think about which pages are likely to be important, in the sense of being valuable to a search engine user.
One possible criterion for the importance of a page is the number of inbound links — an indication of popularity.


By this measure, m and j are the most important pages, with 5 inbound links each.
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance.
The PageRank algorithm does precisely this.
A slightly simplified presentation that captures the basic idea is as follows.
Letting 𝑗 be (the integer index of) a typical page and 𝑟𝑗 be its ranking, we set
$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i}$$

where
• ℓ𝑖 is the total number of outbound links from 𝑖
• 𝐿𝑗 is the set of all pages 𝑖 such that 𝑖 has a link to 𝑗
This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/ℓ𝑖 ).
There is, however, another interpretation, and it brings us back to Markov chains.
Let 𝑃 be the matrix given by 𝑃 (𝑖, 𝑗) = 1{𝑖 → 𝑗}/ℓ𝑖 where 1{𝑖 → 𝑗} = 1 if 𝑖 has a link to 𝑗 and zero otherwise.
The matrix 𝑃 is a stochastic matrix provided that each page has at least one outbound link.
With this definition of 𝑃 we have
$$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} = \sum_{\text{all } i} \mathbf{1}\{i \to j\} \frac{r_i}{\ell_i} = \sum_{\text{all } i} P(i, j) r_i$$

Writing 𝑟 for the row vector of rankings, this becomes 𝑟 = 𝑟𝑃 .


Hence 𝑟 is the stationary distribution of the stochastic matrix 𝑃 .


Let’s think of 𝑃 (𝑖, 𝑗) as the probability of “moving” from page 𝑖 to page 𝑗.


The value 𝑃 (𝑖, 𝑗) has the interpretation
• 𝑃 (𝑖, 𝑗) = 1/𝑘 if 𝑖 has 𝑘 outbound links and 𝑗 is one of them
• 𝑃 (𝑖, 𝑗) = 0 if 𝑖 has no direct link to 𝑗
Thus, motion from page to page is that of a web surfer who moves from one page to another by randomly clicking on one
of the links on that page.
Here “random” means that each link is selected with equal probability.
Since 𝑟 is the stationary distribution of 𝑃 , assuming that the uniform ergodicity condition is valid, we can interpret 𝑟𝑗 as
the fraction of time that a (very persistent) random surfer spends at page 𝑗.
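Before turning to the full graph, here is a minimal sketch of the random surfer on a toy web of three pages. The link structure is hypothetical and unrelated to the exercise data. We build 𝑃 from the links and then iterate 𝑟 ↦ 𝑟𝑃 until it settles down, which works here because this toy chain is irreducible and aperiodic.

```python
import numpy as np

# Hypothetical three-page web: page -> list of pages it links to
links = {'a': ['b', 'c'],
         'b': ['c'],
         'c': ['a']}

pages = sorted(links)
index = {page: i for i, page in enumerate(pages)}
n = len(pages)

# P[i, j] = 1/ℓ_i if page i links to page j, and zero otherwise
P = np.zeros((n, n))
for page, targets in links.items():
    for target in targets:
        P[index[page], index[target]] = 1 / len(targets)

# Start from the uniform distribution and iterate r = r P
r = np.full(n, 1 / n)
for _ in range(500):
    r = r @ P

ranking = sorted(pages, key=lambda page: r[index[page]], reverse=True)
print(dict(zip(pages, r)), ranking)
```

Solving 𝑟 = 𝑟𝑃 by hand for this toy graph gives 𝑟 = (0.4, 0.2, 0.4), so page b, which has only one inbound link carrying half of a's weight, ranks last.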
Your exercise is to apply this ranking algorithm to the graph pictured above and return the list of pages ordered by rank.
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n.
A typical line from the file has the form

d -> h;

This should be interpreted as meaning that there exists a link from d to h.


The data for this graph is shown below, and is written to a file called web_graph_data.txt when the cell is executed.

%%file web_graph_data.txt
a -> d;
a -> f;
b -> j;
b -> k;
b -> m;
c -> c;
c -> g;
c -> j;
c -> m;
d -> f;
d -> h;
d -> k;
e -> d;
e -> h;
e -> l;
f -> a;
f -> b;
f -> j;
f -> l;
g -> b;
g -> j;
h -> d;
h -> g;
h -> l;
h -> m;
i -> g;
i -> h;
i -> n;
j -> e;
j -> i;
j -> k;
k -> n;
l -> m;
m -> g;
n -> c;
n -> j;
n -> m;

Overwriting web_graph_data.txt

To parse this file and extract the relevant information, you can use regular expressions.
The following code snippet provides a hint as to how you can go about this

import re
re.findall(r'\w', 'x +++ y ****** z') # \w matches alphanumerics

['x', 'y', 'z']

re.findall(r'\w', 'a ^^ b &&& $$ c')

['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while the lowest is a.

Solution to Exercise 19.9.2


Here is one solution:

"""
Return list of pages, ordered by rank
"""
import re
from operator import itemgetter

infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

n = 14 # Total number of web pages (nodes)

# Create a matrix Q indicating existence of links
#  * Q[i, j] = 1 if there is a link from i to j
#  * Q[i, j] = 0 otherwise
Q = np.zeros((n, n), dtype=int)
with open(infile) as f:
    edges = f.readlines()
for edge in edges:
    from_node, to_node = re.findall(r'\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1
# Create the corresponding Markov matrix P
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()
mc = MarkovChain(P)
# Compute the stationary distribution r
r = mc.stationary_distributions[0]
ranked_pages = {alphabet[i] : r[i] for i in range(n)}
# Print solution, sorted from highest to lowest rank
print('Rankings\n ***')
for name, rank in sorted(ranked_pages.items(), key=itemgetter(1), reverse=True):
    print(f'{name}: {rank:.4}')

Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911

Exercise 19.9.3
In numerical work, it is sometimes convenient to replace a continuous model with a discrete one.
In particular, Markov chains are routinely generated as discrete approximations to AR(1) processes of the form

𝑦𝑡+1 = 𝜌𝑦𝑡 + 𝑢𝑡+1

Here 𝑢𝑡 is assumed to be IID and $N(0, \sigma_u^2)$.


The variance of the stationary probability distribution of {𝑦𝑡 } is

$$\sigma_y^2 := \frac{\sigma_u^2}{1 - \rho^2}$$
Tauchen’s method [Tauchen, 1986] is the most common method for approximating this continuous state process with a
finite state Markov chain.
A routine for this already exists in QuantEcon.py but let’s write our own version as an exercise.
As a first step, we choose
• 𝑛, the number of states for the discrete approximation
• 𝑚, an integer that parameterizes the width of the state space
Next, we create a state space {𝑥0 , … , 𝑥𝑛−1 } ⊂ ℝ and a stochastic 𝑛 × 𝑛 matrix 𝑃 such that
• 𝑥0 = −𝑚 𝜎𝑦
• 𝑥𝑛−1 = 𝑚 𝜎𝑦


• 𝑥𝑖+1 = 𝑥𝑖 + 𝑠 where 𝑠 = (𝑥𝑛−1 − 𝑥0 )/(𝑛 − 1)


Let 𝐹 be the cumulative distribution function of the normal distribution $N(0, \sigma_u^2)$.
The values 𝑃 (𝑥𝑖 , 𝑥𝑗 ) are computed to approximate the AR(1) process — omitting the derivation, the rules are as follows:
1. If 𝑗 = 0, then set

𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 𝑃 (𝑥𝑖 , 𝑥0 ) = 𝐹 (𝑥0 − 𝜌𝑥𝑖 + 𝑠/2)

2. If 𝑗 = 𝑛 − 1, then set

𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 𝑃 (𝑥𝑖 , 𝑥𝑛−1 ) = 1 − 𝐹 (𝑥𝑛−1 − 𝜌𝑥𝑖 − 𝑠/2)

3. Otherwise, set

𝑃 (𝑥𝑖 , 𝑥𝑗 ) = 𝐹 (𝑥𝑗 − 𝜌𝑥𝑖 + 𝑠/2) − 𝐹 (𝑥𝑗 − 𝜌𝑥𝑖 − 𝑠/2)

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns {𝑥0 , … , 𝑥𝑛−1 } ⊂
ℝ and 𝑛 × 𝑛 matrix 𝑃 as described above.
• Even better, write a function that returns an instance of QuantEcon.py’s MarkovChain class.

Solution to Exercise 19.9.3


A solution from the QuantEcon.py library can be found here.
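The three rules in the exercise can also be sketched directly with NumPy. The code below is our own minimal version and not the QuantEcon.py implementation; the helper norm_cdf is a name introduced here, and it uses the error function from the standard library to evaluate 𝐹.

```python
from math import erf, sqrt
import numpy as np

def approx_markov(rho, sigma_u, m=3, n=7):
    "Return the grid x and the n x n matrix P built from rules 1 to 3."

    def norm_cdf(z):
        # cdf of N(0, sigma_u^2), written in terms of the error function
        return 0.5 * (1 + erf(z / (sigma_u * sqrt(2))))

    sigma_y = sqrt(sigma_u**2 / (1 - rho**2))      # stationary std of y
    x = np.linspace(-m * sigma_y, m * sigma_y, n)  # evenly spaced grid
    s = (x[-1] - x[0]) / (n - 1)                   # step size

    P = np.empty((n, n))
    for i in range(n):
        # Rule 1: left end point
        P[i, 0] = norm_cdf(x[0] - rho * x[i] + s / 2)
        # Rule 2: right end point
        P[i, n - 1] = 1 - norm_cdf(x[n - 1] - rho * x[i] - s / 2)
        # Rule 3: interior points
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = norm_cdf(z + s / 2) - norm_cdf(z - s / 2)
    return x, P

x, P = approx_markov(0.9, 0.1)
print(P.sum(axis=1))
```

Each row sums to one because consecutive terms telescope: the upper cutoff for state 𝑥𝑗 is the lower cutoff for state 𝑥𝑗+1.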



CHAPTER

TWENTY

INVENTORY DYNAMICS

Contents

• Inventory Dynamics
– Overview
– Sample Paths
– Marginal Distributions
– Exercises

20.1 Overview

In this lecture we will study the time path of inventories for firms that follow so-called s-S inventory dynamics.
Such firms
1. wait until inventory falls below some level 𝑠 and then
2. order sufficient quantities to bring their inventory back up to capacity 𝑆.
These kinds of policies are common in practice and also optimal in certain circumstances.
A review of early literature and some macroeconomic implications can be found in [Caplin, 1985].
Here our main aim is to learn more about simulation, time series and Markov dynamics.
While our Markov environment and many of the concepts we consider are related to those found in our lecture on finite
Markov chains, the state space is a continuum in the current application.
Let’s start with some imports

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numba import njit, float64, prange
from numba.experimental import jitclass


20.2 Sample Paths

Consider a firm with inventory 𝑋𝑡 .


The firm waits until 𝑋𝑡 ≤ 𝑠 and then restocks up to 𝑆 units.
It faces stochastic demand {𝐷𝑡 }, which we assume is IID.
With notation $a^+ := \max\{a, 0\}$, inventory dynamics can be written as

$$
X_{t+1} =
\begin{cases}
(S - D_{t+1})^+ & \text{if } X_t \leq s \\
(X_t - D_{t+1})^+ & \text{if } X_t > s
\end{cases}
$$

In what follows, we will assume that each 𝐷𝑡 is lognormal, so that

𝐷𝑡 = exp(𝜇 + 𝜎𝑍𝑡 )

where 𝜇 and 𝜎 are parameters and {𝑍𝑡 } is IID and standard normal.
Here’s a class that stores parameters and generates time paths for inventory.

firm_data = [
    ('s', float64),          # restock trigger level
    ('S', float64),          # capacity
    ('mu', float64),         # shock location parameter
    ('sigma', float64)       # shock scale parameter
]

@jitclass(firm_data)
class Firm:

    def __init__(self, s=10, S=100, mu=1.0, sigma=0.5):

        self.s, self.S, self.mu, self.sigma = s, S, mu, sigma

    def update(self, x):
        "Update the state from t to t+1 given current state x."

        Z = np.random.randn()
        D = np.exp(self.mu + self.sigma * Z)
        if x <= self.s:
            return max(self.S - D, 0)
        else:
            return max(x - D, 0)

    def sim_inventory_path(self, x_init, sim_length):

        X = np.empty(sim_length)
        X[0] = x_init

        for t in range(sim_length-1):
            X[t+1] = self.update(X[t])
        return X

Let’s run a first simulation, of a single path:


firm = Firm()

s, S = firm.s, firm.S
sim_length = 100
x_init = 50

X = firm.sim_inventory_path(x_init, sim_length)

fig, ax = plt.subplots()
bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 3,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}

ax.plot(X, label="inventory")
ax.plot(np.full(sim_length, s), 'k--', label="$s$")
ax.plot(np.full(sim_length, S), 'k-', label="$S$")
ax.set_ylim(0, S+10)
ax.set_xlabel("time")
ax.legend(**legend_args)

plt.show()

Now let’s simulate multiple paths in order to build a more complete picture of the probabilities of different outcomes:

sim_length=200
fig, ax = plt.subplots()

ax.plot(np.full(sim_length, s), 'k--', label="$s$")
ax.plot(np.full(sim_length, S), 'k-', label="$S$")
ax.set_ylim(0, S+10)
ax.legend(**legend_args)

for i in range(400):
    X = firm.sim_inventory_path(x_init, sim_length)
    ax.plot(X, 'b', alpha=0.2, lw=0.5)

plt.show()

20.3 Marginal Distributions

Now let’s look at the marginal distribution 𝜓𝑇 of 𝑋𝑇 for some fixed 𝑇 .


We will do this by generating many draws of 𝑋𝑇 given initial condition 𝑋0 .
With these draws of 𝑋𝑇 we can build up a picture of its distribution 𝜓𝑇 .
Here’s one visualization, with 𝑇 = 50.

T = 50
M = 200  # Number of draws

ymin, ymax = 0, S + 10

fig, axes = plt.subplots(1, 2, figsize=(11, 6))

for ax in axes:
    ax.grid(alpha=0.4)

ax = axes[0]

ax.set_ylim(ymin, ymax)
ax.set_ylabel('$X_t$', fontsize=16)
ax.vlines((T,), -1.5, 1.5)

ax.set_xticks((T,))
ax.set_xticklabels((r'$T$',))

sample = np.empty(M)
for m in range(M):
    X = firm.sim_inventory_path(x_init, 2 * T)
    ax.plot(X, 'b-', lw=1, alpha=0.5)
    # X[0] holds X_0, so X[T] is the time-T observation X_T
    ax.plot((T,), (X[T],), 'ko', alpha=0.5)
    sample[m] = X[T]

axes[1].set_ylim(ymin, ymax)

axes[1].hist(sample,
             bins=16,
             density=True,
             orientation='horizontal',
             histtype='bar',
             alpha=0.5)

plt.show()

We can build up a clearer picture by drawing more samples

T = 50
M = 50_000

fig, ax = plt.subplots()

sample = np.empty(M)
for m in range(M):
    X = firm.sim_inventory_path(x_init, T+1)
    sample[m] = X[T]

ax.hist(sample,
        bins=36,
        density=True,
        histtype='bar',
        alpha=0.75)

plt.show()

Note that the distribution is bimodal


• Most firms have restocked twice but a few have restocked only once (see figure with paths above).
• Firms in the second category have lower inventory.
We can also approximate the distribution using a kernel density estimator.
Kernel density estimators can be thought of as smoothed histograms.
They are preferable to histograms when the distribution being estimated is likely to be smooth.
We will use a kernel density estimator from scikit-learn

from sklearn.neighbors import KernelDensity

def plot_kde(sample, ax, label=''):

    xmin, xmax = 0.9 * min(sample), 1.1 * max(sample)
    xgrid = np.linspace(xmin, xmax, 200)
    kde = KernelDensity(kernel='gaussian').fit(sample[:, None])
    log_dens = kde.score_samples(xgrid[:, None])

    ax.plot(xgrid, np.exp(log_dens), label=label)

fig, ax = plt.subplots()
plot_kde(sample, ax)
plt.show()


The allocation of probability mass is similar to what was shown by the histogram just above.

20.4 Exercises

Exercise 20.4.1
This model is asymptotically stationary, with a unique stationary distribution.
(See the discussion of stationarity in our lecture on AR(1) processes for background — the fundamental concepts are the
same.)
In particular, the sequence of marginal distributions {𝜓𝑡 } is converging to a unique limiting distribution that does not
depend on initial conditions.
Although we will not prove this here, we can investigate it using simulation.
Your task is to generate and plot the sequence {𝜓𝑡 } at times 𝑡 = 10, 50, 250, 500, 750 based on the discussion above.
(The kernel density estimator is probably the best way to present each distribution.)
You should see convergence, in the sense that differences between successive distributions are getting smaller.
Try different initial conditions to verify that, in the long run, the distribution is invariant across initial conditions.

Solution to Exercise 20.4.1


Below is one possible solution:
The computations involve a lot of CPU cycles so we have tried to write the code efficiently.
This meant writing a specialized function rather than using the class above.

s, S, mu, sigma = firm.s, firm.S, firm.mu, firm.sigma

@njit(parallel=True)
def shift_firms_forward(current_inventory_levels, num_periods):

    num_firms = len(current_inventory_levels)
    new_inventory_levels = np.empty(num_firms)

    for f in prange(num_firms):
        x = current_inventory_levels[f]
        for t in range(num_periods):
            Z = np.random.randn()
            D = np.exp(mu + sigma * Z)
            if x <= s:
                x = max(S - D, 0)
            else:
                x = max(x - D, 0)
        new_inventory_levels[f] = x

    return new_inventory_levels

x_init = 50
num_firms = 50_000

sample_dates = 0, 10, 50, 250, 500, 750

first_diffs = np.diff(sample_dates)

fig, ax = plt.subplots()

X = np.full(num_firms, x_init)

current_date = 0
for d in first_diffs:
    X = shift_firms_forward(X, d)
    current_date += d
    plot_kde(X, ax, label=f't = {current_date}')

ax.set_xlabel('inventory')
ax.set_ylabel('probability')
ax.legend()
plt.show()


Notice that by 𝑡 = 500 or 𝑡 = 750 the densities are barely changing.


We have reached a reasonable approximation of the stationary density.
You can convince yourself that initial conditions don’t matter by testing a few of them.
For example, try rerunning the code above with all firms starting at 𝑋0 = 20 or 𝑋0 = 80.

Exercise 20.4.2
Using simulation, calculate the probability that firms that start with 𝑋0 = 70 need to order twice or more in the first 50
periods.
You will need a large sample size to get an accurate reading.

Solution to Exercise 20.4.2


Here is one solution.
Again, the computations are relatively intensive so we have written a specialized function rather than using the class
above.
We will also use parallelization across firms.

@njit(parallel=True)
def compute_freq(sim_length=50, x_init=70, num_firms=1_000_000):

    firm_counter = 0  # Records number of firms that restock 2x or more

    for m in prange(num_firms):
        x = x_init
        restock_counter = 0  # Will record number of restocks for firm m

        for t in range(sim_length):
            Z = np.random.randn()
            D = np.exp(mu + sigma * Z)
            if x <= s:
                x = max(S - D, 0)
                restock_counter += 1
            else:
                x = max(x - D, 0)

        if restock_counter > 1:
            firm_counter += 1

    return firm_counter / num_firms

Note the time the routine takes to run, as well as the output.

%%time

freq = compute_freq()
print(f"Frequency of at least two stock outs = {freq}")

Frequency of at least two stock outs = 0.447305


CPU times: user 3.75 s, sys: 3.51 ms, total: 3.76 s
Wall time: 918 ms

Try switching the parallel flag to False in the jitted function above.
Depending on your system, the difference can be substantial.
(On our desktop machine, the speed up is by a factor of 5.)



CHAPTER

TWENTYONE

LINEAR STATE SPACE MODELS

Contents

• Linear State Space Models


– Overview
– The Linear State Space Model
– Distributions and Moments
– Stationarity and Ergodicity
– Noisy Observations
– Prediction
– Code
– Exercises

“We may regard the present state of the universe as the effect of its past and the cause of its future” – Marquis
de Laplace
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

21.1 Overview

This lecture introduces the linear state space dynamic system.


The linear state space system is a generalization of the scalar AR(1) process we studied before.
This model is a workhorse that carries a powerful theory of prediction.
Its many applications include:
• representing dynamics of higher-order linear systems
• predicting the position of a system 𝑗 steps into the future
• predicting a geometric sum of future values of a variable like
– non-financial income
– dividends on a stock


– the money supply


– a government deficit or surplus, etc.
• serving as a key ingredient of useful models, such as
– Friedman’s permanent income model of consumption smoothing.
– Barro’s model of smoothing total tax collections.
– Rational expectations version of Cagan’s model of hyperinflation.
– Sargent and Wallace’s “unpleasant monetarist arithmetic,” etc.
Let’s start with some imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from quantecon import LinearStateSpace
from scipy.stats import norm
import random

21.2 The Linear State Space Model

The objects in play are:


• An 𝑛 × 1 vector 𝑥𝑡 denoting the state at time 𝑡 = 0, 1, 2, ….
• An IID sequence of 𝑚 × 1 random vectors 𝑤𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × 1 vector 𝑦𝑡 of observations at time 𝑡 = 0, 1, 2, ….
• An 𝑛 × 𝑛 matrix 𝐴 called the transition matrix.
• An 𝑛 × 𝑚 matrix 𝐶 called the volatility matrix.
• A 𝑘 × 𝑛 matrix 𝐺 sometimes called the output matrix.
Here is the linear state-space system

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1


𝑦𝑡 = 𝐺𝑥𝑡
𝑥0 ∼ 𝑁 (𝜇0 , Σ0 )

21.2.1 Primitives

The primitives of the model are


1. the matrices 𝐴, 𝐶, 𝐺
2. shock distribution, which we have specialized to 𝑁 (0, 𝐼)
3. the distribution of the initial condition 𝑥0 , which we have set to 𝑁 (𝜇0 , Σ0 )
Given 𝐴, 𝐶, 𝐺 and draws of 𝑥0 and 𝑤1 , 𝑤2 , …, the model (21.1) pins down the values of the sequences {𝑥𝑡 } and {𝑦𝑡 }.
Even without these draws, the primitives 1–3 pin down the probability distributions of {𝑥𝑡 } and {𝑦𝑡 }.
Later we’ll see how to compute these distributions and their moments.
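To make the statement about the draws pinning down the sequences concrete, here is a minimal NumPy sketch that iterates the system directly. The function name `simulate_lss` and its arguments are our own choices for illustration; the `LinearStateSpace` class from QuantEcon.py, used later in this lecture, is the tool to reach for in practice.

```python
import numpy as np


def simulate_lss(A, C, G, mu_0, Sigma_0, ts_length, seed=0):
    """Simulate x_{t+1} = A x_t + C w_{t+1}, y_t = G x_t, with w_t ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    A = np.atleast_2d(np.asarray(A, dtype=float))    # n x n
    n = A.shape[0]
    C = np.asarray(C, dtype=float).reshape(n, -1)    # n x m
    G = np.atleast_2d(np.asarray(G, dtype=float))    # k x n
    m = C.shape[1]

    x = np.empty((n, ts_length))
    x[:, 0] = rng.multivariate_normal(mu_0, Sigma_0)  # draw x_0 ~ N(mu_0, Sigma_0)
    for t in range(ts_length - 1):
        w = rng.standard_normal(m)                    # draw w_{t+1} ~ N(0, I)
        x[:, t + 1] = A @ x[:, t] + C @ w
    y = G @ x                                         # k x ts_length
    return x, y


# Deterministic check: with C = 0 and Sigma_0 = 0 the path is x_t = A^t mu_0
x, y = simulate_lss(A=[[0.5]], C=[[0.0]], G=[[1.0]],
                    mu_0=[1.0], Sigma_0=[[0.0]], ts_length=5)
print(y)
```

Fixing the seed fixes the draws of 𝑥0 and 𝑤1 , 𝑤2 , …, so rerunning with the same seed reproduces the same paths.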


Martingale Difference Shocks

We’ve made the common assumption that the shocks are independent standardized normal vectors.
But some of what we say will be valid under the assumption that {𝑤𝑡+1 } is a martingale difference sequence.
A martingale difference sequence is a sequence that is zero mean when conditioned on past information.
In the present case, since {𝑥𝑡 } is our state sequence, this means that it satisfies

𝔼[𝑤𝑡+1 |𝑥𝑡 , 𝑥𝑡−1 , …] = 0

This is a weaker condition than that {𝑤𝑡 } is IID with 𝑤𝑡+1 ∼ 𝑁 (0, 𝐼).

21.2.2 Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of the linear state space model.
The following examples help to highlight this point.
They also illustrate the wise dictum "finding the state is an art".

Second-order Difference Equation

Let {𝑦𝑡 } be a deterministic sequence that satisfies

𝑦𝑡+1 = 𝜙0 + 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1 s.t. 𝑦0 , 𝑦−1 given (21.1)

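To map the difference equation above into our state space system, we set

$$
x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$

You can confirm that under these definitions, the difference equation and the state space system agree.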
The next figure shows the dynamics of this process when 𝜙0 = 1.1, 𝜙1 = 0.8, 𝜙2 = −0.8, 𝑦0 = 𝑦−1 = 1.

def plot_lss(A,
             C,
             G,
             n=3,
             ts_length=50):

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))
    x, y = ar.simulate(ts_length)

    fig, ax = plt.subplots()
    y = y.flatten()
    ax.plot(y, 'b-', lw=2, alpha=0.7)
    ax.grid()
    ax.set_xlabel('time', fontsize=12)
    ax.set_ylabel('$y_t$', fontsize=12)
    plt.show()


ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8

A = [[1, 0, 0 ],
[ϕ_0, ϕ_1, ϕ_2],
[0, 1, 0 ]]

C = np.zeros((3, 1))
G = [0, 1, 0]

plot_lss(A, C, G)

Later you’ll be asked to recreate this figure.
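As a quick sanity check on the mapping, the following sketch iterates the second-order difference equation directly and compares the result with iterating 𝑥𝑡+1 = 𝐴𝑥𝑡 and reading off 𝑦𝑡 = 𝐺𝑥𝑡 . It uses plain NumPy only; the variable names are ours.

```python
import numpy as np

phi_0, phi_1, phi_2 = 1.1, 0.8, -0.8
A = np.array([[1.0,   0.0,   0.0],
              [phi_0, phi_1, phi_2],
              [0.0,   1.0,   0.0]])
G = np.array([0.0, 1.0, 0.0])
T = 20

# Direct iteration of y_{t+1} = phi_0 + phi_1 y_t + phi_2 y_{t-1}
y_direct = np.empty(T)
y_prev, y_curr = 1.0, 1.0          # y_{-1} and y_0
for t in range(T):
    y_direct[t] = y_curr
    y_prev, y_curr = y_curr, phi_0 + phi_1 * y_curr + phi_2 * y_prev

# Iteration of the state space system with x_0 = (1, y_0, y_{-1})
x = np.array([1.0, 1.0, 1.0])
y_ss = np.empty(T)
for t in range(T):
    y_ss[t] = G @ x
    x = A @ x

print(np.allclose(y_direct, y_ss))
```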

Univariate Autoregressive Processes

We can use (21.1) to represent the model

𝑦𝑡+1 = 𝜙1 𝑦𝑡 + 𝜙2 𝑦𝑡−1 + 𝜙3 𝑦𝑡−2 + 𝜙4 𝑦𝑡−3 + 𝜎𝑤𝑡+1 (21.2)

where {𝑤𝑡 } is IID and standard normal.



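To put this in the linear state space format we take $x_t = \begin{bmatrix} y_t & y_{t-1} & y_{t-2} & y_{t-3} \end{bmatrix}'$ and

$$
A = \begin{bmatrix}
\phi_1 & \phi_2 & \phi_3 & \phi_4 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}
$$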

The matrix 𝐴 has the form of the companion matrix to the vector [𝜙1 𝜙2 𝜙3 𝜙4 ].
The next figure shows the dynamics of this process when

𝜙1 = 0.5, 𝜙2 = −0.2, 𝜙3 = 0, 𝜙4 = 0.5, 𝜎 = 0.2, 𝑦0 = 𝑦−1 = 𝑦−2 = 𝑦−3 = 1


ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5


σ = 0.2

A_1 = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],


[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]

C_1 = [[σ],
[0],
[0],
[0]]

G_1 = [1, 0, 0, 0]

plot_lss(A_1, C_1, G_1, n=4, ts_length=200)

Vector Autoregressions

Now suppose that


• 𝑦𝑡 is a 𝑘 × 1 vector
• 𝜙𝑗 is a 𝑘 × 𝑘 matrix and
• 𝑤𝑡 is 𝑘 × 1
Then (21.2) is termed a vector autoregression.
To map this into (21.1), we set

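$$
x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{bmatrix}
\qquad
A = \begin{bmatrix}
\phi_1 & \phi_2 & \phi_3 & \phi_4 \\
I & 0 & 0 & 0 \\
0 & I & 0 & 0 \\
0 & 0 & I & 0
\end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} I & 0 & 0 & 0 \end{bmatrix}
$$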

where 𝐼 is the 𝑘 × 𝑘 identity matrix and 𝜎 is a 𝑘 × 𝑘 matrix.
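The block structure above is easy to assemble numerically. Here is one possible sketch; the helper name `var_to_lss` is hypothetical (it is not part of QuantEcon.py), and it accepts any number of lags rather than just four.

```python
import numpy as np


def var_to_lss(phis, sigma):
    """
    Stack a VAR with coefficient matrices phis = [phi_1, ..., phi_p]
    (each k x k) and k x k volatility matrix sigma into (A, C, G).
    """
    phis = [np.atleast_2d(np.asarray(phi, dtype=float)) for phi in phis]
    sigma = np.atleast_2d(np.asarray(sigma, dtype=float))
    p, k = len(phis), phis[0].shape[0]

    A = np.zeros((k * p, k * p))
    A[:k, :] = np.hstack(phis)             # first block row: phi_1, ..., phi_p
    A[k:, :-k] = np.eye(k * (p - 1))       # identity blocks below the first block row

    C = np.zeros((k * p, k))
    C[:k, :] = sigma                       # shocks enter the first block only

    G = np.zeros((k, k * p))
    G[:, :k] = np.eye(k)                   # observe the first block, y_t
    return A, C, G


# With k = 1 this reproduces the univariate AR(4) matrices used earlier
A, C, G = var_to_lss([[[0.5]], [[-0.2]], [[0.0]], [[0.5]]], [[0.2]])
print(A)
```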


Seasonals

We can use (21.1) to represent


1. the deterministic seasonal 𝑦𝑡 = 𝑦𝑡−4
2. the indeterministic seasonal 𝑦𝑡 = 𝜙4 𝑦𝑡−4 + 𝑤𝑡
In fact, both are special cases of (21.2).
With the deterministic seasonal, the transition matrix becomes

$$
A = \begin{bmatrix}
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
$$

It is easy to check that $A^4 = I$, which implies that $x_t$ is strictly periodic with period 4 (see footnote 1):

$$
x_{t+4} = x_t
$$

Such an 𝑥𝑡 process can be used to model deterministic seasonals in quarterly time series.
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations.
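The periodicity claim for the deterministic seasonal can be checked numerically in a few lines. The initial state `x0` below is an arbitrary choice for illustration.

```python
import numpy as np

A = np.array([[0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0],
              [0, 0, 1, 0]])

# A is a cyclic permutation, so its fourth power is the identity
A4 = np.linalg.matrix_power(A, 4)
print(np.array_equal(A4, np.eye(4)))

# All eigenvalues lie on the unit circle
print(np.abs(np.linalg.eigvals(A)))

# Iterating x_{t+1} = A x_t returns to the starting point every 4 periods
x0 = np.array([1.0, 2.0, 3.0, 4.0])
path = [x0]
for t in range(8):
    path.append(A @ path[-1])
print(np.array_equal(path[4], x0), np.array_equal(path[8], x0))
```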

Time Trends

The model 𝑦𝑡 = 𝑎𝑡 + 𝑏 is known as a linear time trend.


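We can represent this model in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} a & b \end{bmatrix}
\tag{21.3}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$.
In fact, it’s possible to use the state-space system to represent polynomial trends of any order.
For instance, we can represent the model $y_t = a t^2 + b t + c$ in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 2a & a + b & c \end{bmatrix}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}'$.
It follows that

$$
A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}
$$

Then $x_t' = \begin{bmatrix} t(t-1)/2 & t & 1 \end{bmatrix}$. You can now confirm that $y_t = G x_t$ has the correct form.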
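Here is a short numerical confirmation of the quadratic trend representation. The coefficient values for `a`, `b` and `c` are arbitrary choices made for this check.

```python
import numpy as np

a, b, c = 0.5, -1.0, 2.0            # arbitrary trend coefficients

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
G = np.array([2 * a, a + b, c])
x0 = np.array([0, 0, 1])

t_vals = np.arange(10)
# y_t = G A^t x_0, computed from matrix powers
y_ss = np.array([G @ np.linalg.matrix_power(A, int(t)) @ x0 for t in t_vals])
# The polynomial trend computed directly
y_poly = a * t_vals**2 + b * t_vals + c

print(np.allclose(y_ss, y_poly))
```

The check works because $2a \cdot t(t-1)/2 + (a + b) t + c = a t^2 + b t + c$.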
1 The eigenvalues of 𝐴 are (1, −1, 𝑖, −𝑖).


21.2.3 Moving Average Representations

A nonrecursive expression for 𝑥𝑡 as a function of 𝑥0 , 𝑤1 , 𝑤2 , … , 𝑤𝑡 can be found by using (21.1) repeatedly to obtain

$$
\begin{aligned}
x_t &= A x_{t-1} + C w_t \\
    &= A^2 x_{t-2} + A C w_{t-1} + C w_t \\
    &\;\; \vdots \\
    &= \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0
\end{aligned}
$$

The last line of this display is a moving average representation.


It expresses {𝑥𝑡 } as a linear function of
1. current and past values of the process {𝑤𝑡 } and
2. the initial condition 𝑥0
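As an example of a moving average representation, let the model be

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$.
Substituting into the moving average representation above, we obtain

$$
x_{1t} = \sum_{j=0}^{t-1} w_{t-j} + \begin{bmatrix} 1 & t \end{bmatrix} x_0
$$

where $x_{1t}$ is the first entry of $x_t$.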


The first term on the right is a cumulated sum of martingale differences and is therefore a martingale.
The second term is a translated linear function of time.
For this reason, 𝑥1𝑡 is called a martingale with drift.

21.3 Distributions and Moments

21.3.1 Unconditional Moments

Using (21.1), it’s easy to obtain expressions for the (unconditional) means of 𝑥𝑡 and 𝑦𝑡 .
We’ll explain what unconditional and conditional mean soon.
Letting 𝜇𝑡 ∶= 𝔼[𝑥𝑡 ] and using linearity of expectations, we find that

𝜇𝑡+1 = 𝐴𝜇𝑡 with 𝜇0 given (21.4)

Here 𝜇0 is a primitive given in (21.1).


The variance-covariance matrix of 𝑥𝑡 is Σ𝑡 ∶= 𝔼[(𝑥𝑡 − 𝜇𝑡 )(𝑥𝑡 − 𝜇𝑡 )′ ].
Using 𝑥𝑡+1 − 𝜇𝑡+1 = 𝐴(𝑥𝑡 − 𝜇𝑡 ) + 𝐶𝑤𝑡+1 , we can determine this matrix recursively via

Σ𝑡+1 = 𝐴Σ𝑡 𝐴′ + 𝐶𝐶 ′ with Σ0 given (21.5)


As with 𝜇0 , the matrix Σ0 is a primitive given in (21.1).


As a matter of terminology, we will sometimes call
• 𝜇𝑡 the unconditional mean of 𝑥𝑡
• Σ𝑡 the unconditional variance-covariance matrix of 𝑥𝑡
This is to distinguish 𝜇𝑡 and Σ𝑡 from related objects that use conditioning information, to be defined below.
However, you should be aware that these “unconditional” moments do depend on the initial distribution 𝑁 (𝜇0 , Σ0 ).
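The recursions (21.4) and (21.5) translate directly into code. The sketch below is a bare-bones illustration with a hypothetical helper name `moment_path`; the `moment_sequence` method of `LinearStateSpace`, used later in this lecture, serves the same purpose.

```python
import numpy as np


def moment_path(A, C, mu_0, Sigma_0, T):
    """Iterate mu_{t+1} = A mu_t and Sigma_{t+1} = A Sigma_t A' + C C' for T steps."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    C = np.atleast_2d(np.asarray(C, dtype=float))
    mu = np.asarray(mu_0, dtype=float)
    Sigma = np.atleast_2d(np.asarray(Sigma_0, dtype=float))

    mus, Sigmas = [mu], [Sigma]
    for _ in range(T):
        mu = A @ mu
        Sigma = A @ Sigma @ A.T + C @ C.T
        mus.append(mu)
        Sigmas.append(Sigma)
    return mus, Sigmas


# Scalar AR(1) example: x_{t+1} = 0.5 x_t + w_{t+1}, starting from x_0 = 2 with no uncertainty
mus, Sigmas = moment_path(A=[[0.5]], C=[[1.0]], mu_0=[2.0], Sigma_0=[[0.0]], T=200)
print(mus[1], Sigmas[1])       # moments of x_1
print(mus[-1], Sigmas[-1])     # moments of x_200
```

In this scalar example the variance approaches $1 / (1 - 0.5^2) = 4/3$, while the early moments clearly reflect the choice of 𝜇0 and Σ0 .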

Moments of the Observables

Using linearity of expectations again we have

𝔼[𝑦𝑡 ] = 𝔼[𝐺𝑥𝑡 ] = 𝐺𝜇𝑡 (21.6)

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

Var[𝑦𝑡 ] = Var[𝐺𝑥𝑡 ] = 𝐺Σ𝑡 𝐺′ (21.7)

21.3.2 Distributions

In general, knowing the mean and variance-covariance matrix of a random vector is not quite as good as knowing the full
distribution.
However, there are some situations where these moments alone tell us all we need to know.
These are situations in which the mean vector and covariance matrix are all of the parameters that pin down the population
distribution.
One such situation is when the vector in question is Gaussian (i.e., normally distributed).
This is the case here, given
1. our Gaussian assumptions on the primitives
2. the fact that normality is preserved under linear operations
In fact, it’s well-known that

$$
u \sim N(\bar u, S) \quad \text{and} \quad v = a + B u \implies v \sim N(a + B \bar u, B S B')
\tag{21.8}
$$

In particular, given our Gaussian assumptions on the primitives and the linearity of (21.1) we can see immediately that
both 𝑥𝑡 and 𝑦𝑡 are Gaussian for all 𝑡 ≥ 0 (see footnote 2).
Since 𝑥𝑡 is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix.
But in fact we’ve already done this, in (21.4) and (21.5).
Letting 𝜇𝑡 and Σ𝑡 be as defined by these equations, we have

𝑥𝑡 ∼ 𝑁 (𝜇𝑡 , Σ𝑡 ) (21.9)

By similar reasoning combined with (21.6) and (21.7),

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ ) (21.10)


2 The correct way to argue this is by induction. Suppose that $x_t$ is Gaussian. Then (21.1) and (21.8) imply that $x_{t+1}$ is Gaussian. Since $x_0$ is
assumed to be Gaussian, it follows that every $x_t$ is Gaussian. Evidently, this implies that each $y_t$ is Gaussian.


21.3.3 Ensemble Interpretations

How should we interpret the distributions defined by (21.9)–(21.10)?


Intuitively, the probabilities in a distribution correspond to relative frequencies in a large population drawn from that
distribution.
Let’s apply this idea to our setting, focusing on the distribution of 𝑦𝑇 for fixed 𝑇 .
We can generate independent draws of 𝑦𝑇 by repeatedly simulating the evolution of the system up to time 𝑇 , using an
independent set of shocks each time.
The next figure shows 20 simulations, producing 20 time series for {𝑦𝑡 }, and hence 20 draws of 𝑦𝑇 .
The system in question is the univariate autoregressive model (21.2).
The values of 𝑦𝑇 are represented by black dots in the left-hand figure

def cross_section_plot(A,
                       C,
                       G,
                       T=20,                 # Set the time
                       ymin=-0.8,
                       ymax=1.25,
                       sample_size = 20,     # 20 observations/simulations
                       n=4):                 # The number of dimensions for the initial x0

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(n))

    fig, axes = plt.subplots(1, 2, figsize=(16, 5))

    for ax in axes:
        ax.grid(alpha=0.4)
        ax.set_ylim(ymin, ymax)

    ax = axes[0]
    ax.set_ylim(ymin, ymax)
    ax.set_ylabel('$y_t$', fontsize=12)
    ax.set_xlabel('time', fontsize=12)
    ax.vlines((T,), -1.5, 1.5)

    ax.set_xticks((T,))
    ax.set_xticklabels(('$T$',))

    sample = []
    for i in range(sample_size):
        rcolor = random.choice(('c', 'g', 'b', 'k'))
        x, y = ar.simulate(ts_length=T+15)
        y = y.flatten()
        ax.plot(y, color=rcolor, lw=1, alpha=0.5)
        ax.plot((T,), (y[T],), 'ko', alpha=0.5)
        sample.append(y[T])

    axes[1].set_ylim(ymin, ymax)
    axes[1].set_ylabel('$y_t$', fontsize=12)
    axes[1].set_xlabel('relative frequency', fontsize=12)
    axes[1].hist(sample, bins=16, density=True, orientation='horizontal', alpha=0.5)
    plt.show()


ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5


σ = 0.1

A_2 = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4],


[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, 0]]

C_2 = [[σ], [0], [0], [0]]

G_2 = [1, 0, 0, 0]

cross_section_plot(A_2, C_2, G_2)

In the right-hand figure, these values are converted into a rotated histogram that shows relative frequencies from our
sample of 20 𝑦𝑇 ’s.
Here is another figure, this time with 100 observations

# Keep T at its default and raise the number of simulated draws of y_T to 100
cross_section_plot(A_2, C_2, G_2, sample_size=100)

Let’s now try with 500,000 observations, showing only the histogram (without rotation)

T = 100
ymin=-0.8
ymax=1.25
sample_size = 500_000

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))

fig, ax = plt.subplots()
x, y = ar.simulate(sample_size)
mu_x, mu_y, Sigma_x, Sigma_y, Sigma_yx = ar.stationary_distributions()
# .item() extracts the single element and avoids NumPy's array-to-scalar deprecation warning
f_y = norm(loc=mu_y.item(), scale=np.sqrt(Sigma_y).item())
y = y.flatten()
ygrid = np.linspace(ymin, ymax, 150)

ax.hist(y, bins=50, density=True, alpha=0.4)
ax.plot(ygrid, f_y.pdf(ygrid), 'k-', lw=2, alpha=0.8, label=r'true density')
ax.set_xlim(ymin, ymax)
ax.set_xlabel('$y_t$', fontsize=12)
ax.set_ylabel('relative frequency', fontsize=12)
ax.legend(fontsize=12)
plt.show()

The black line is the population density of 𝑦𝑇 calculated from (21.10).


The histogram and population distribution are close, as expected.
By looking at the figures and experimenting with parameters, you will gain a feel for how the population distribution
depends on the model primitives listed above, as intermediated by the distribution’s parameters.


Ensemble Means

In the preceding figure, we approximated the population distribution of 𝑦𝑇 by


1. generating 𝐼 sample paths (i.e., time series) where 𝐼 is a large number
2. recording each observation $y_T^i$
3. histogramming this sample
Just as the histogram approximates the population distribution, the ensemble or cross-sectional average

$$
\bar y_T := \frac{1}{I} \sum_{i=1}^I y_T^i
$$

approximates the expectation 𝔼[𝑦𝑇 ] = 𝐺𝜇𝑇 (as implied by the law of large numbers).
Here’s a simulation comparing the ensemble averages and population means at time points 𝑡 = 0, … , 50.
The parameters are the same as for the preceding figures, and the sample size is relatively small (𝐼 = 20).

I = 20
T = 50
ymin = -0.5
ymax = 1.15

ar = LinearStateSpace(A_2, C_2, G_2, mu_0=np.ones(4))

fig, ax = plt.subplots()

ensemble_mean = np.zeros(T)
for i in range(I):
    x, y = ar.simulate(ts_length=T)
    y = y.flatten()
    ax.plot(y, 'c-', lw=0.8, alpha=0.5)
    ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label=r'$\bar y_t$')
m = ar.moment_sequence()

population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    # .item() extracts the single element and avoids NumPy's array-to-scalar deprecation warning
    population_means.append(μ_y.item())

ax.plot(population_means, color='g', lw=2, alpha=0.8, label=r'$G\mu_t$')
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=12)
ax.set_ylabel('$y_t$', fontsize=12)
ax.legend(ncol=2)
plt.show()


The ensemble mean for 𝑥𝑡 is

$$
\bar x_T := \frac{1}{I} \sum_{i=1}^I x_T^i \to \mu_T
\qquad (I \to \infty)
$$

The limit 𝜇𝑇 is a “long-run average”.
(By long-run average we mean the average for an infinite (𝐼 = ∞) number of sample 𝑥𝑇 ’s.)
Another application of the law of large numbers assures us that

$$
\frac{1}{I} \sum_{i=1}^I (x_T^i - \bar x_T)(x_T^i - \bar x_T)' \to \Sigma_T
\qquad (I \to \infty)
$$

21.3.4 Joint Distributions

In the preceding discussion, we looked at the distributions of 𝑥𝑡 and 𝑦𝑡 in isolation.


This gives us useful information but doesn’t allow us to answer questions like
• what’s the probability that 𝑥𝑡 ≥ 0 for all 𝑡?
• what’s the probability that the process {𝑦𝑡 } exceeds some value 𝑎 before falling below 𝑏?
• etc., etc.
Such questions concern the joint distributions of these sequences.
To compute the joint distribution of 𝑥0 , 𝑥1 , … , 𝑥𝑇 , recall that joint and conditional densities are linked by the rule

𝑝(𝑥, 𝑦) = 𝑝(𝑦 | 𝑥)𝑝(𝑥) (joint = conditional × marginal)

From this rule we get 𝑝(𝑥0 , 𝑥1 ) = 𝑝(𝑥1 | 𝑥0 )𝑝(𝑥0 ).


The Markov property 𝑝(𝑥𝑡 | 𝑥𝑡−1 , … , 𝑥0 ) = 𝑝(𝑥𝑡 | 𝑥𝑡−1 ) and repeated applications of the preceding rule lead us to

$$
p(x_0, x_1, \ldots, x_T) = p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \,|\, x_t)
$$


The marginal 𝑝(𝑥0 ) is just the primitive 𝑁 (𝜇0 , Σ0 ).


In view of (21.1), the conditional densities are

𝑝(𝑥𝑡+1 | 𝑥𝑡 ) = 𝑁 (𝐴𝑥𝑡 , 𝐶𝐶 ′ )

Autocovariance Functions

An important object related to the joint distribution is the autocovariance function

$$
\Sigma_{t+j,t} := \mathbb{E}[(x_{t+j} - \mu_{t+j})(x_t - \mu_t)']
\tag{21.11}
$$

Elementary calculations show that

$$
\Sigma_{t+j,t} = A^j \Sigma_t
\tag{21.12}
$$

Notice that Σ𝑡+𝑗,𝑡 in general depends on both 𝑗, the gap between the two dates, and 𝑡, the earlier date.
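Formula (21.12) can be checked by simulation. The sketch below uses a scalar system with parameters chosen by us so that Σ𝑡 = 1 at every date, which makes the theoretical autocovariance simply $a^j$; the sample size, seed and dates are likewise arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1234)
N = 200_000            # number of independent sample paths
t, j = 5, 3            # compare dates t and t + j
a, c = 0.8, 0.6        # scalar A and C, chosen so that a**2 + c**2 = 1

# x_0 ~ N(0, 1); since a^2 * 1 + c^2 = 1, the variance stays at 1 for all t
x = rng.standard_normal(N)
for _ in range(t):
    x = a * x + c * rng.standard_normal(N)
x_t = x.copy()
for _ in range(j):
    x = a * x + c * rng.standard_normal(N)
x_tj = x

sample_autocov = np.mean((x_tj - x_tj.mean()) * (x_t - x_t.mean()))
theory = a**j * 1.0    # A^j Sigma_t with Sigma_t = 1

print(sample_autocov, theory)
```

The two numbers should agree up to sampling error, which shrinks as `N` grows.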

21.4 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of linear state space models.
Let’s start with the intuition.

21.4.1 Visualizing Stability

Let’s look at some more time series from the same model that we analyzed above.
This picture shows cross-sectional distributions for 𝑦 at times 𝑇 , 𝑇 ′ , 𝑇 ″

def cross_plot(A,
               C,
               G,
               steady_state='False',
               T0 = 10,
               T1 = 50,
               T2 = 75,
               T4 = 100):

    ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))

    if steady_state == 'True':
        μ_x, μ_y, Σ_x, Σ_y, Σ_yx = ar.stationary_distributions()
        ar_state = LinearStateSpace(A, C, G, mu_0=μ_x, Sigma_0=Σ_x)

    ymin, ymax = -0.6, 0.6

    fig, ax = plt.subplots()
    ax.grid(alpha=0.4)
    ax.set_ylim(ymin, ymax)
    ax.set_ylabel('$y_t$', fontsize=12)
    ax.set_xlabel('$time$', fontsize=12)

    ax.vlines((T0, T1, T2), -1.5, 1.5)
    ax.set_xticks((T0, T1, T2))
    ax.set_xticklabels(("$T$", "$T'$", "$T''$"), fontsize=12)
    for i in range(80):
        rcolor = random.choice(('c', 'g', 'b'))

        if steady_state == 'True':
            x, y = ar_state.simulate(ts_length=T4)
        else:
            x, y = ar.simulate(ts_length=T4)

        y = y.flatten()
        ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
        ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
    plt.show()

cross_plot(A_2, C_2, G_2)

Note how the time series “settle down” in the sense that the distributions at 𝑇 ′ and 𝑇 ″ are relatively similar to each other
— but unlike the distribution at 𝑇 .
Apparently, the distributions of 𝑦𝑡 converge to a fixed long-run distribution as 𝑡 → ∞.
When such a distribution exists it is called a stationary distribution.

21.4.2 Stationary Distributions

In our setting, a distribution 𝜓∞ is said to be stationary for 𝑥𝑡 if

$$
x_t \sim \psi_\infty \quad \text{and} \quad x_{t+1} = A x_t + C w_{t+1} \quad \implies \quad x_{t+1} \sim \psi_\infty
$$

Since
1. in the present case, all distributions are Gaussian
2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix


we can restate the definition as follows: 𝜓∞ is stationary for 𝑥𝑡 if

𝜓∞ = 𝑁 (𝜇∞ , Σ∞ )

where 𝜇∞ and Σ∞ are fixed points of (21.4) and (21.5) respectively.
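The fixed point of (21.5) solves a discrete Lyapunov equation, so it can be computed directly. The sketch below does this for the AR(4) example from earlier (with 𝜎 = 0.1) using SciPy's `solve_discrete_lyapunov`, and cross-checks the answer by brute-force iteration on (21.5). This is a sketch of one possible approach rather than a description of how `stationary_distributions` works internally.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.5, -0.2, 0.0, 0.5],
              [1.0,  0.0, 0.0, 0.0],
              [0.0,  1.0, 0.0, 0.0],
              [0.0,  0.0, 1.0, 0.0]])
C = np.array([[0.1], [0.0], [0.0], [0.0]])

# All eigenvalues of A lie strictly inside the unit circle for these parameters
print(np.max(np.abs(np.linalg.eigvals(A))))

# Fixed point of mu_{t+1} = A mu_t is zero in this case
mu_inf = np.zeros(4)

# Fixed point of Sigma = A Sigma A' + C C'
Sigma_inf = solve_discrete_lyapunov(A, C @ C.T)

# Cross-check by iterating (21.5) from Sigma_0 = 0
Sigma = np.zeros((4, 4))
for _ in range(2000):
    Sigma = A @ Sigma @ A.T + C @ C.T

print(np.max(np.abs(Sigma - Sigma_inf)))
```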

21.4.3 Covariance Stationary Processes

Let’s see what happens to the preceding figure if we start 𝑥0 at the stationary distribution.

cross_plot(A_2, C_2, G_2, steady_state='True')

Now the differences in the observed distributions at 𝑇 , 𝑇 ′ and 𝑇 ″ come entirely from random fluctuations due to the
finite sample size.
By
• our choosing 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ )
• the definitions of 𝜇∞ and Σ∞ as fixed points of (21.4) and (21.5) respectively
we’ve ensured that

𝜇𝑡 = 𝜇∞ and Σ𝑡 = Σ∞ for all 𝑡

Moreover, in view of (21.12), the autocovariance function takes the form $\Sigma_{t+j,t} = A^j \Sigma_\infty$, which depends on 𝑗 but not
on 𝑡.
This motivates the following definition.
A process {𝑥𝑡 } is said to be covariance stationary if
• both 𝜇𝑡 and Σ𝑡 are constant in 𝑡
• Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on time 𝑡
In our setting, {𝑥𝑡 } will be covariance stationary if 𝜇0 , Σ0 , 𝐴, 𝐶 assume values that imply that none of 𝜇𝑡 , Σ𝑡 , Σ𝑡+𝑗,𝑡
depends on 𝑡.


21.4.4 Conditions for Stationarity

The Globally Stable Case

The difference equation 𝜇𝑡+1 = 𝐴𝜇𝑡 is known to have unique fixed point 𝜇∞ = 0 if all eigenvalues of 𝐴 have moduli
strictly less than unity.
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True.
The difference equation (21.5) also has a unique fixed point in this case, and, moreover

$$
\mu_t \to \mu_\infty = 0 \quad \text{and} \quad \Sigma_t \to \Sigma_\infty \quad \text{as} \quad t \to \infty
$$

regardless of the initial conditions 𝜇0 and Σ0 .
This is the globally stable case; see these notes for a more theoretical treatment.
However, global stability is more than we need for stationary solutions, and often more than we want.
To illustrate, consider our second order difference equation example.

Here the state is $x_t = \begin{bmatrix} 1 & y_t & y_{t-1} \end{bmatrix}'$.
Because of the constant first component in the state vector, we will never have 𝜇𝑡 → 0.
How can we find stationary solutions that respect a constant state component?

Processes with a Constant State Component

To investigate such a process, suppose that 𝐴 and 𝐶 take the form

$$
A = \begin{bmatrix} A_1 & a \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix}
$$
where
• 𝐴1 is an (𝑛 − 1) × (𝑛 − 1) matrix
• 𝑎 is an (𝑛 − 1) × 1 column vector

Let $x_t = \begin{bmatrix} x_{1t}' & 1 \end{bmatrix}'$ where $x_{1t}$ is (𝑛 − 1) × 1.
It follows that

𝑥1,𝑡+1 = 𝐴1 𝑥1𝑡 + 𝑎 + 𝐶1 𝑤𝑡+1

Let 𝜇1𝑡 = 𝔼[𝑥1𝑡 ] and take expectations on both sides of this expression to get

𝜇1,𝑡+1 = 𝐴1 𝜇1,𝑡 + 𝑎 (21.13)

Assume now that the moduli of the eigenvalues of 𝐴1 are all strictly less than one.
Then (21.13) has a unique stationary solution, namely,

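$$\mu_{1\infty} = (I - A_1)^{-1} a$$

The stationary value of 𝜇𝑡 itself is then $\mu_\infty := \begin{bmatrix} \mu_{1\infty}' & 1 \end{bmatrix}'$.
The stationary values of Σ𝑡 and Σ𝑡+𝑗,𝑡 satisfy

$$
\begin{aligned}
\Sigma_\infty &= A \Sigma_\infty A' + CC' \\
\Sigma_{t+j,t} &= A^j \Sigma_\infty
\end{aligned}
$$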


Notice that here Σ𝑡+𝑗,𝑡 depends on the time gap 𝑗 but not on calendar time 𝑡.
In conclusion, if
• 𝑥0 ∼ 𝑁 (𝜇∞ , Σ∞ ) and
• the moduli of the eigenvalues of 𝐴1 are all strictly less than unity
then the {𝑥𝑡 } process is covariance stationary, with constant state component.

Note: If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any initial value, the mean and
variance-covariance matrix both converge to their stationary values; and (b) iterations on (21.5) converge to the fixed
point of the discrete Lyapunov equation Σ∞ = 𝐴Σ∞𝐴′ + 𝐶𝐶′ given above.
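As a rough illustration of the stationary mean formula, consider a second-order difference equation written with the constant as the last state component. The coefficients `φ0`, `φ1`, `φ2` below are hypothetical values chosen so that the eigenvalues of 𝐴1 lie inside the unit circle; this is a sketch under those assumptions, not the lecture's own code.

```python
import numpy as np

# Hypothetical coefficients for y_{t+1} = φ0 + φ1 y_t + φ2 y_{t-1} + shock
φ0, φ1, φ2 = 1.1, 0.8, -0.3

# State ordered as x_t = [y_t, y_{t-1}, 1]' so that the constant comes last
A1 = np.array([[φ1, φ2],
               [1.0, 0.0]])
a = np.array([φ0, 0.0])

# Eigenvalue condition on A1
stable = bool((np.abs(np.linalg.eigvals(A1)) < 1).all())

# Stationary mean of the non-constant block: solve (I - A1) μ = a
μ1_inf = np.linalg.solve(np.eye(2) - A1, a)

# Stationary mean of the full state, with the constant appended
μ_inf = np.append(μ1_inf, 1.0)
print(stable, μ_inf)
```

Both entries of `μ1_inf` should equal φ0 / (1 − φ1 − φ2), since 𝑦𝑡 and 𝑦𝑡−1 share the same stationary mean.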

21.4.5 Ergodicity

Let’s suppose that we’re working with a covariance stationary process.


In this case, we know that the ensemble mean will converge to 𝜇∞ as the sample size 𝐼 approaches infinity.

Averages over Time

Ensemble averages across simulations are interesting theoretically, but in real life, we usually observe only a single real-
ization {𝑥𝑡 , 𝑦𝑡 }𝑇𝑡=0 .
So now let’s take a single realization and form the time-series averages

$$
\bar x := \frac{1}{T} \sum_{t=1}^T x_t
\quad \text{and} \quad
\bar y := \frac{1}{T} \sum_{t=1}^T y_t
$$

Do these time series averages converge to something interpretable in terms of our basic state-space representation?
The answer depends on something called ergodicity.
Ergodicity is the property that time series and ensemble averages coincide.
More formally, ergodicity implies that time series sample averages converge to their expectation under the stationary
distribution.
In particular,
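• $\frac{1}{T} \sum_{t=1}^T x_t \to \mu_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_t - \bar x_T)(x_t - \bar x_T)' \to \Sigma_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_{t+j} - \bar x_T)(x_t - \bar x_T)' \to A^j \Sigma_\infty$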
In our linear Gaussian setting, any covariance stationary process is also ergodic.
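A small simulation can make the ergodicity claim concrete. The sketch below uses a hypothetical scalar model $x_{t+1} = a x_t + b + c\, w_{t+1}$ (the values of `a`, `b`, `c`, the seed, and the sample length are all assumptions made for illustration) and compares time averages along a single long path with the stationary mean and variance.

```python
import numpy as np

# Hypothetical scalar model x_{t+1} = a x_t + b + c w_{t+1}, with |a| < 1
a, b, c = 0.9, 1.0, 1.0
μ_inf = b / (1 - a)             # stationary mean
σ2_inf = c**2 / (1 - a**2)      # stationary variance

rng = np.random.default_rng(0)
T = 200_000
w = rng.standard_normal(T)

# Draw x_0 from the stationary distribution and simulate one long path
x = np.empty(T)
x[0] = μ_inf + np.sqrt(σ2_inf) * rng.standard_normal()
for t in range(T - 1):
    x[t + 1] = a * x[t] + b + c * w[t + 1]

# Time-series averages along the single realization
time_avg = x.mean()
time_var = x.var()
print(time_avg, μ_inf)
print(time_var, σ2_inf)
```

With a long enough sample the two printed pairs should be close, which is what ergodicity asserts for this covariance stationary process.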


21.5 Noisy Observations

In some settings, the observation equation 𝑦𝑡 = 𝐺𝑥𝑡 is modified to include an error term.
Often this error term represents the idea that the true state can only be observed imperfectly.
To include an error term in the observation we introduce
• An IID sequence of ℓ × 1 random vectors 𝑣𝑡 ∼ 𝑁 (0, 𝐼).
• A 𝑘 × ℓ matrix 𝐻.
and extend the linear state-space system to

𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1


𝑦𝑡 = 𝐺𝑥𝑡 + 𝐻𝑣𝑡
𝑥0 ∼ 𝑁 (𝜇0 , Σ0 )

The sequence {𝑣𝑡 } is assumed to be independent of {𝑤𝑡 }.


The process {𝑥𝑡 } is not modified by noise in the observation equation and its moments, distributions and stability prop-
erties remain the same.
The unconditional moments of 𝑦𝑡 from (21.6) and (21.7) now become

𝔼[𝑦𝑡 ] = 𝔼[𝐺𝑥𝑡 + 𝐻𝑣𝑡 ] = 𝐺𝜇𝑡 (21.14)

The variance-covariance matrix of 𝑦𝑡 is easily shown to be

Var[𝑦𝑡 ] = Var[𝐺𝑥𝑡 + 𝐻𝑣𝑡 ] = 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ (21.15)

The distribution of 𝑦𝑡 is therefore

𝑦𝑡 ∼ 𝑁 (𝐺𝜇𝑡 , 𝐺Σ𝑡 𝐺′ + 𝐻𝐻 ′ )

21.6 Prediction

The theory of prediction for linear state space systems is elegant and simple.

21.6.1 Forecasting Formulas – Conditional Means

The natural way to predict variables is to use conditional distributions.


For example, the optimal forecast of 𝑥𝑡+1 given information known at time 𝑡 is

𝔼𝑡 [𝑥𝑡+1 ] ∶= 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 ] = 𝐴𝑥𝑡

The right-hand side follows from 𝑥𝑡+1 = 𝐴𝑥𝑡 + 𝐶𝑤𝑡+1 and the fact that 𝑤𝑡+1 is zero mean and independent of
𝑥𝑡 , 𝑥𝑡−1 , … , 𝑥0 .
That 𝔼𝑡 [𝑥𝑡+1 ] = 𝔼[𝑥𝑡+1 ∣ 𝑥𝑡 ] is an implication of {𝑥𝑡 } having the Markov property.
The one-step-ahead forecast error is

𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ] = 𝐶𝑤𝑡+1


The covariance matrix of the forecast error is

𝔼[(𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ])(𝑥𝑡+1 − 𝔼𝑡 [𝑥𝑡+1 ])′ ] = 𝐶𝐶 ′

More generally, we’d like to compute the 𝑗-step ahead forecasts 𝔼𝑡 [𝑥𝑡+𝑗 ] and 𝔼𝑡 [𝑦𝑡+𝑗 ].
With a bit of algebra, we obtain

$$x_{t+j} = A^j x_t + A^{j-1} C w_{t+1} + A^{j-2} C w_{t+2} + \cdots + A^0 C w_{t+j}$$

In view of the IID property, current and past state values provide no information about future values of the shock.
Hence $\mathbb{E}_t[w_{t+k}] = \mathbb{E}[w_{t+k}] = 0$.
It now follows from linearity of expectations that the 𝑗-step ahead forecast of 𝑥 is

$$\mathbb{E}_t[x_{t+j}] = A^j x_t$$

The 𝑗-step ahead forecast of 𝑦 is therefore

$$\mathbb{E}_t[y_{t+j}] = \mathbb{E}_t[G x_{t+j} + H v_{t+j}] = G A^j x_t$$
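The forecasting formulas translate directly into a few lines of NumPy. The matrices `A`, `G`, the state `x_t`, and the helper name `forecast` are hypothetical choices made for this sketch.

```python
import numpy as np

# Hypothetical system matrices and current state (illustrative values only)
A = np.array([[0.9, 0.2],
              [0.0, 0.7]])
G = np.array([[1.0, 0.0]])
x_t = np.array([1.0, 2.0])

def forecast(j):
    "Return (E_t[x_{t+j}], E_t[y_{t+j}]) = (A^j x_t, G A^j x_t)."
    x_hat = np.linalg.matrix_power(A, j) @ x_t
    return x_hat, G @ x_hat

for j in (1, 2, 5):
    x_hat, y_hat = forecast(j)
    print(j, x_hat, y_hat)
```

Neither 𝐶 nor 𝐻 enters these point forecasts because the future shocks have zero conditional mean; they matter only for the forecast error covariances.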

21.6.2 Covariance of Prediction Errors

It is useful to obtain the covariance matrix of the vector of 𝑗-step-ahead prediction errors

$$x_{t+j} - \mathbb{E}_t[x_{t+j}] = \sum_{s=0}^{j-1} A^s C w_{t-s+j} \tag{21.16}$$

Evidently,

$$V_j := \mathbb{E}_t[(x_{t+j} - \mathbb{E}_t[x_{t+j}])(x_{t+j} - \mathbb{E}_t[x_{t+j}])'] = \sum_{k=0}^{j-1} A^k C C' (A^k)' \tag{21.17}$$

𝑉𝑗 defined in (21.17) can be calculated recursively via 𝑉1 = 𝐶𝐶 ′ and

𝑉𝑗 = 𝐶𝐶 ′ + 𝐴𝑉𝑗−1 𝐴′ , 𝑗≥2 (21.18)

𝑉𝑗 is the conditional covariance matrix of the errors in forecasting 𝑥𝑡+𝑗 , conditioned on time 𝑡 information 𝑥𝑡 .
Under particular conditions, 𝑉𝑗 converges to

𝑉∞ = 𝐶𝐶 ′ + 𝐴𝑉∞ 𝐴′ (21.19)

Equation (21.19) is an example of a discrete Lyapunov equation in the covariance matrix 𝑉∞ .


A sufficient condition for 𝑉𝑗 to converge is that the eigenvalues of 𝐴 be strictly less than one in modulus.
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one in modulus with elements
of 𝐶 that equal 0.
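The recursion for 𝑉𝑗 and its limit can be checked numerically. In the sketch below the matrices `A` and `C` are hypothetical, and the function names `V_recursive` and `V_sum` are assumptions introduced for illustration; the first applies $V_j = CC' + A V_{j-1} A'$ and the second evaluates the sum $\sum_{k=0}^{j-1} A^k CC' (A^k)'$ directly.

```python
import numpy as np

# Hypothetical matrices with all eigenvalues of A inside the unit circle
A = np.array([[0.8, 0.1],
              [0.0, 0.5]])
C = np.array([[1.0],
              [0.5]])
CC = C @ C.T

def V_recursive(j):
    "Compute V_j via V_1 = CC' and V_j = CC' + A V_{j-1} A'."
    V = CC
    for _ in range(j - 1):
        V = CC + A @ V @ A.T
    return V

def V_sum(j):
    "Compute V_j directly as the sum of A^k CC' (A^k)' for k = 0, ..., j-1."
    return sum(np.linalg.matrix_power(A, k) @ CC @ np.linalg.matrix_power(A, k).T
               for k in range(j))

# The two constructions agree
print(np.allclose(V_recursive(6), V_sum(6)))

# For large j the recursion settles at a solution of V = CC' + A V A'
V_inf = V_recursive(500)
print(np.allclose(V_inf, CC + A @ V_inf @ A.T))
```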

21.7 Code

Our preceding simulations and calculations are based on code in the file lss.py from the QuantEcon.py package.
The code implements a class for handling linear state space models (simulations, calculating moments, etc.).


One Python construct you might not be familiar with is the use of a generator function in the method moment_sequence().
Go back and read the relevant documentation if you’ve forgotten how generator functions work.
Examples of usage are given in the solutions to the exercises.

21.8 Exercises

Exercise 21.8.1
In several contexts, we want to compute forecasts of geometric sums of future random variables governed by the linear
state-space system (21.1).
We want the following objects

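• Forecast of a geometric sum of future 𝑥’s, or $\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right]$.

• Forecast of a geometric sum of future 𝑦’s, or $\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right]$.
These objects are important components of some famous and interesting dynamic models.
For example,

• if {𝑦𝑡 } is a stream of dividends, then $\mathbb{E} \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t \right]$ is a model of a stock price

• if {𝑦𝑡 } is the money supply, then $\mathbb{E} \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t \right]$ is a model of the price level
Show that:

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right] = [I - \beta A]^{-1} x_t$$

and

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right] = G [I - \beta A]^{-1} x_t$$

What must the modulus of every eigenvalue of 𝐴 be less than?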

Solution to Exercise 21.8.1


Suppose that every eigenvalue of 𝐴 has modulus strictly less than 1/𝛽.
It then follows that $I + \beta A + \beta^2 A^2 + \cdots = [I - \beta A]^{-1}$.
This leads to our formulas:
• Forecast of a geometric sum of future 𝑥’s

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j x_{t+j} \right] = [I + \beta A + \beta^2 A^2 + \cdots] x_t = [I - \beta A]^{-1} x_t$$

• Forecast of a geometric sum of future 𝑦’s

$$\mathbb{E}_t \left[ \sum_{j=0}^\infty \beta^j y_{t+j} \right] = G [I + \beta A + \beta^2 A^2 + \cdots] x_t = G [I - \beta A]^{-1} x_t$$
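The closed-form expression in the solution can be compared against a truncated version of the infinite sum. The values of `A`, `G`, `x_t`, and `β` below are hypothetical, chosen so that the eigenvalue condition holds; the truncation length of 2000 terms is likewise an arbitrary assumption.

```python
import numpy as np

# Hypothetical system matrices, current state, and discount factor
A = np.array([[0.9, 0.2],
              [0.0, 0.7]])
G = np.array([[1.0, 0.0]])
x_t = np.array([1.0, 2.0])
β = 0.95

# The formula requires every eigenvalue of A to have modulus below 1/β
condition_ok = bool((np.abs(np.linalg.eigvals(A)) < 1 / β).all())

# Closed form: (I - βA)^{-1} x_t, computed with a linear solve
x_sum = np.linalg.solve(np.eye(2) - β * A, x_t)
y_sum = G @ x_sum

# Truncated series: sum over j of β^j A^j x_t, as a numerical check
x_trunc = np.zeros(2)
term = x_t.copy()
for _ in range(2000):
    x_trunc += term
    term = β * A @ term

print(condition_ok, np.allclose(x_sum, x_trunc))
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice; either approach matches the formula.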



CHAPTER

TWENTYTWO

SAMUELSON MULTIPLIER-ACCELERATOR

Contents

• Samuelson Multiplier-Accelerator
– Overview
– Details
– Implementation
– Stochastic Shocks
– Government Spending
– Wrapping Everything Into a Class
– Using the LinearStateSpace Class
– Pure Multiplier Model
– Summary

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

22.1 Overview

This lecture creates non-stochastic and stochastic versions of Paul Samuelson’s celebrated multiplier accelerator model
[Samuelson, 1939].
In doing so, we extend the example of the Solow model class in our second OOP lecture.
Our objectives are to
• provide a more detailed example of OOP and classes
• review a famous model
• review linear difference equations, both deterministic and stochastic
Let’s start with some standard imports:


import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np

We’ll also use the following for various tasks described below:

from quantecon import LinearStateSpace


import cmath
import math
import sympy
from sympy import Symbol, init_printing
from cmath import sqrt

22.1.1 Samuelson’s Model

Samuelson used a second-order linear difference equation to represent a model of national output based on three compo-
nents:
• a national output identity asserting that national output or national income is the sum of consumption plus investment
plus government purchases.
• a Keynesian consumption function asserting that consumption at time 𝑡 is equal to a constant times national output
at time 𝑡 − 1.
• an investment accelerator asserting that investment at time 𝑡 equals a constant called the accelerator coefficient times
the difference in output between period 𝑡 − 1 and 𝑡 − 2.
Consumption plus investment plus government purchases constitute aggregate demand, which automatically calls forth an
equal amount of aggregate supply.
(To read about linear difference equations see here or chapter IX of [Sargent, 1987].)
Samuelson used the model to analyze how particular values of the marginal propensity to consume and the accelerator
coefficient might give rise to transient business cycles in national output.
Possible dynamic properties include
• smooth convergence to a constant level of output
• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode
Later we present an extension that adds a random shock to the right side of the national income identity representing
random fluctuations in aggregate demand.
This modification makes national output become governed by a second-order stochastic linear difference equation that,
with appropriate parameter values, gives rise to recurrent irregular business cycles.
(To read about stochastic linear difference equations see chapter XI of [Sargent, 1987].)


22.2 Details

Let’s assume that


• {𝐺𝑡 } is a sequence of levels of government expenditures – we’ll start by setting 𝐺𝑡 = 𝐺 for all 𝑡.
• {𝐶𝑡 } is a sequence of levels of aggregate consumption expenditures, a key endogenous variable in the model.
• {𝐼𝑡 } is a sequence of rates of investment, another key endogenous variable.
• {𝑌𝑡 } is a sequence of levels of national income, yet another endogenous variable.
• 𝑎 is the marginal propensity to consume in the Keynesian consumption function 𝐶𝑡 = 𝑎𝑌𝑡−1 + 𝛾.
• 𝑏 is the “accelerator coefficient” in the “investment accelerator” 𝐼𝑡 = 𝑏(𝑌𝑡−1 − 𝑌𝑡−2 ).
• {𝜖𝑡 } is an IID sequence of standard normal random variables.
• 𝜎 ≥ 0 is a “volatility” parameter — setting 𝜎 = 0 recovers the non-stochastic case that we’ll start with.
The model combines the consumption function
𝐶𝑡 = 𝑎𝑌𝑡−1 + 𝛾 (22.1)
with the investment accelerator
𝐼𝑡 = 𝑏(𝑌𝑡−1 − 𝑌𝑡−2 ) (22.2)
and the national income identity
𝑌𝑡 = 𝐶𝑡 + 𝐼𝑡 + 𝐺𝑡 (22.3)
• The parameter 𝑎 is people’s marginal propensity to consume out of income - equation (22.1) asserts that people
consume a fraction 𝑎 ∈ (0, 1) of each additional dollar of income.
• The parameter 𝑏 > 0 is the investment accelerator coefficient - equation (22.2) asserts that people invest in physical
capital when income is increasing and disinvest when it is decreasing.
Equations (22.1), (22.2), and (22.3) imply the following second-order linear difference equation for national income:
𝑌𝑡 = (𝑎 + 𝑏)𝑌𝑡−1 − 𝑏𝑌𝑡−2 + (𝛾 + 𝐺𝑡 )
or
𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2 + (𝛾 + 𝐺𝑡 ) (22.4)
where 𝜌1 = (𝑎 + 𝑏) and 𝜌2 = −𝑏.
To complete the model, we require two initial conditions.
If the model is to generate time series for 𝑡 = 0, … , 𝑇 , we require initial values

$$Y_{-1} = \bar Y_{-1}, \qquad Y_{-2} = \bar Y_{-2}$$

We’ll ordinarily set the parameters (𝑎, 𝑏) so that starting from an arbitrary pair of initial conditions $(\bar Y_{-1}, \bar Y_{-2})$, national
income 𝑌𝑡 converges to a constant value as 𝑡 becomes large.
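The constant value that 𝑌𝑡 approaches can be read off (22.4) by setting 𝑌𝑡 = 𝑌𝑡−1 = 𝑌𝑡−2 = 𝑌*, which gives 𝑌* = (𝛾 + 𝐺)/(1 − 𝜌1 − 𝜌2) = (𝛾 + 𝐺)/(1 − 𝑎). The sketch below checks this by direct iteration; the parameter values and starting values are hypothetical, chosen so that the roots lie inside the unit circle.

```python
# Hypothetical parameter values (illustrative only)
a, b = 0.6, 0.4      # marginal propensity to consume, accelerator coefficient
γ, G = 10.0, 10.0    # consumption intercept, constant government spending
ρ1, ρ2 = a + b, -b

# Iterate Y_t = ρ1 Y_{t-1} + ρ2 Y_{t-2} + (γ + G) from arbitrary initial values
Y = [50.0, 80.0]     # stands in for (Y_{-2}, Y_{-1})
for _ in range(300):
    Y.append(ρ1 * Y[-1] + ρ2 * Y[-2] + γ + G)

# Steady state implied by the difference equation
Y_star = (γ + G) / (1 - ρ1 - ρ2)
print(Y[-1], Y_star)
```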
We are interested in studying
• the transient fluctuations in 𝑌𝑡 as it converges to its steady state level
• the rate at which it converges to a steady state level
The deterministic version of the model described so far — meaning that no random shocks hit aggregate demand — has
only transient fluctuations.
We can convert the model to one that has persistent irregular fluctuations by adding a random shock to aggregate demand.


22.2.1 Stochastic Version of the Model

We create a random or stochastic version of the model by adding a random process of shocks or disturbances {𝜎𝜖𝑡 }
to the right side of equation (22.4), leading to the second-order scalar linear stochastic difference equation:

$$Y_t = G_t + \gamma + (a + b) Y_{t-1} - b Y_{t-2} + \sigma \epsilon_t \tag{22.5}$$

22.2.2 Mathematical Analysis of the Model

To get started, let’s set 𝐺𝑡 ≡ 0, 𝜎 = 0, and 𝛾 = 0.


Then we can write equation (22.5) as

𝑌𝑡 = 𝜌1 𝑌𝑡−1 + 𝜌2 𝑌𝑡−2

or

𝑌𝑡+2 − 𝜌1 𝑌𝑡+1 − 𝜌2 𝑌𝑡 = 0 (22.6)

To discover the properties of the solution of (22.6), it is useful first to form the characteristic polynomial for (22.6):

$$z^2 - \rho_1 z - \rho_2 \tag{22.7}$$

where 𝑧 is possibly a complex number.


We want to find the two zeros (a.k.a. roots) – namely 𝜆1 , 𝜆2 – of the characteristic polynomial.
These are two special values of 𝑧, say 𝑧 = 𝜆1 and 𝑧 = 𝜆2 , such that if we set 𝑧 equal to one of these values in expression
(22.7), the characteristic polynomial (22.7) equals zero:

$$z^2 - \rho_1 z - \rho_2 = (z - \lambda_1)(z - \lambda_2) = 0 \tag{22.8}$$

Equation (22.8) is said to factor the characteristic polynomial.


When the roots are complex, they will occur as a complex conjugate pair.
When the roots are complex, it is convenient to represent them in the polar form

$$\lambda_1 = r e^{i\omega}, \quad \lambda_2 = r e^{-i\omega}$$

where 𝑟 is the modulus (amplitude) of the complex number and 𝜔 is its angle or phase.
These can also be represented as

$$\lambda_1 = r(\cos(\omega) + i \sin(\omega))$$

$$\lambda_2 = r(\cos(\omega) - i \sin(\omega))$$
(To read about the polar form, see here)
Given initial conditions 𝑌−1 , 𝑌−2 , we want to generate a solution of the difference equation (22.6).
It can be represented as

$$Y_t = \lambda_1^t c_1 + \lambda_2^t c_2$$

where 𝑐1 and 𝑐2 are constants that depend on the two initial conditions and on 𝜌1 , 𝜌2 .
When the roots are complex, it is useful to pursue the following calculations.


Notice that
$$
\begin{aligned}
Y_t &= c_1 (r e^{i\omega})^t + c_2 (r e^{-i\omega})^t \\
&= c_1 r^t e^{i\omega t} + c_2 r^t e^{-i\omega t} \\
&= c_1 r^t [\cos(\omega t) + i \sin(\omega t)] + c_2 r^t [\cos(\omega t) - i \sin(\omega t)] \\
&= (c_1 + c_2) r^t \cos(\omega t) + i (c_1 - c_2) r^t \sin(\omega t)
\end{aligned}
$$

The only way that 𝑌𝑡 can be a real number for each 𝑡 is if 𝑐1 + 𝑐2 is a real number and 𝑐1 − 𝑐2 is an imaginary number.
This happens only when 𝑐1 and 𝑐2 are complex conjugates, in which case they can be written in the polar forms

$$c_1 = v e^{i\theta}, \quad c_2 = v e^{-i\theta}$$

So we can write

$$
\begin{aligned}
Y_t &= v e^{i\theta} r^t e^{i\omega t} + v e^{-i\theta} r^t e^{-i\omega t} \\
&= v r^t [e^{i(\omega t + \theta)} + e^{-i(\omega t + \theta)}] \\
&= 2 v r^t \cos(\omega t + \theta)
\end{aligned}
$$

where 𝑣 and 𝜃 are constants that must be chosen to satisfy initial conditions for 𝑌−1 , 𝑌−2 .
This formula shows that when the roots are complex, 𝑌𝑡 displays oscillations with period $\check p = 2\pi / \omega$ and damping factor 𝑟.
We say that 𝑝̌ is the period because in that amount of time the cosine wave cos(𝜔𝑡+𝜃) goes through exactly one complete
cycle.
(Draw a cosine function to convince yourself of this please)
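The equivalence between the general solution and the cosine form can also be verified numerically. The values of `r`, `ω`, `v`, and `θ` below are hypothetical and serve only to illustrate the identity.

```python
import cmath
import math

# Hypothetical values for the root (r, ω) and the constant (v, θ)
r, ω = 0.9, 2 * math.pi / 8
v, θ = 1.5, 0.3

λ1, λ2 = cmath.rect(r, ω), cmath.rect(r, -ω)   # complex conjugate roots
c1, c2 = cmath.rect(v, θ), cmath.rect(v, -θ)   # complex conjugate constants

for t in range(5):
    general = c1 * λ1**t + c2 * λ2**t               # c1 λ1^t + c2 λ2^t
    cosine = 2 * v * r**t * math.cos(ω * t + θ)     # 2 v r^t cos(ωt + θ)
    # The imaginary part vanishes up to rounding error
    print(t, round(general.real, 6), round(cosine, 6), abs(general.imag) < 1e-12)

# Length of one complete cycle of the cosine wave
period = 2 * math.pi / ω
print(period)
```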
Remark: Following [Samuelson, 1939], we want to choose the parameters 𝑎, 𝑏 of the model so that the absolute values
(of the possibly complex) roots 𝜆1 , 𝜆2 of the characteristic polynomial are both strictly less than one:

|𝜆𝑗 | < 1 for 𝑗 = 1, 2

Remark: When both roots 𝜆1 , 𝜆2 of the characteristic polynomial have absolute values strictly less than one, the absolute
value of the larger one governs the rate of convergence to the steady state of the non stochastic version of the model.

22.2.3 Things This Lecture Does

We write a function to generate simulations of a {𝑌𝑡 } sequence as a function of time.


The function requires that we put in initial conditions for 𝑌−1 , 𝑌−2 .
The function checks that 𝑎, 𝑏 are set so that 𝜆1 , 𝜆2 are less than unity in absolute value (also called “modulus”).
The function also tells us whether the roots are complex, and, if they are complex, returns both their real and imaginary
parts.
If the roots are both real, the function returns their values.
We also use a function to simulate paths that are stochastic (when 𝜎 > 0).
We have written the function in a way that allows us to input {𝐺𝑡 } paths of a few simple forms, e.g.,
• one time jumps in 𝐺 at some time
• a permanent jump in 𝐺 that occurs at some time
We proceed to use the Samuelson multiplier-accelerator model as a laboratory to make a simple OOP example.
The “state” that determines next period’s 𝑌𝑡+1 is now not just the current value 𝑌𝑡 but also the once lagged value 𝑌𝑡−1 .
This involves a little more bookkeeping than is required in the Solow model class definition.


We use the Samuelson multiplier-accelerator model as a vehicle for teaching how we can gradually add more features to
the class.
We want to have a method in the class that automatically generates a simulation, either non-stochastic (𝜎 = 0) or stochastic
(𝜎 > 0).
We also show how to map the Samuelson model into a simple instance of the LinearStateSpace class described
here.
We can use a LinearStateSpace instance to do various things that we did above with our homemade function and
class.
Among other things, we show by example that the eigenvalues of the matrix 𝐴 that we use to form the instance of the
LinearStateSpace class for the Samuelson model equal the roots of the characteristic polynomial (22.7) for the
Samuelson multiplier accelerator model.
Here is the formula for the matrix 𝐴 in the linear state space system in the case that government expenditures are a
constant 𝐺:
$$
A = \begin{bmatrix}
1 & 0 & 0 \\
\gamma + G & \rho_1 & \rho_2 \\
0 & 1 & 0
\end{bmatrix}
$$
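A quick numerical check of the eigenvalue claim, using hypothetical parameter values: the matrix 𝐴 has one eigenvalue equal to 1, coming from the constant state component, and its other two eigenvalues coincide with the zeros of 𝑧² − 𝜌1𝑧 − 𝜌2. This is only a sketch with NumPy and does not use the LinearStateSpace class.

```python
import numpy as np

# Hypothetical parameter values (illustrative only)
a, b, γ, G = 0.6, 0.4, 10.0, 10.0
ρ1, ρ2 = a + b, -b

# Transition matrix for the state x_t = [1, Y_t, Y_{t-1}]'
A = np.array([[1.0,   0.0, 0.0],
              [γ + G, ρ1,  ρ2],
              [0.0,   1.0, 0.0]])

eigs = np.linalg.eigvals(A)
roots = np.roots([1, -ρ1, -ρ2])   # zeros of z^2 - ρ1 z - ρ2

print(np.sort_complex(eigs))
print(np.sort_complex(np.append(roots, 1.0)))
```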

22.3 Implementation

We’ll start by drawing an informative graph from page 189 of [Sargent, 1987]

def param_plot():
    """This function creates the graph on page 189 of
    Sargent Macroeconomic Theory, second edition, 1987.
    """

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_aspect('equal')

    # Set axis
    xmin, ymin = -3, -2
    xmax, ymax = -xmin, -ymin
    plt.axis([xmin, xmax, ymin, ymax])

    # Set axis labels
    ax.set(xticks=[], yticks=[])
    ax.set_xlabel(r'$\rho_2$', fontsize=16)
    ax.xaxis.set_label_position('top')
    ax.set_ylabel(r'$\rho_1$', rotation=0, fontsize=16)
    ax.yaxis.set_label_position('right')

    # Draw (t1, t2) points
    ρ1 = np.linspace(-2, 2, 100)
    ax.plot(ρ1, -abs(ρ1) + 1, c='black')
    ax.plot(ρ1, np.full_like(ρ1, -1), c='black')
    ax.plot(ρ1, -(ρ1**2 / 4), c='black')

    # Turn normal axes off
    for spine in ['left', 'bottom', 'top', 'right']:
        ax.spines[spine].set_visible(False)

    # Add arrows to represent axes
    axes_arrows = {'arrowstyle': '<|-|>', 'lw': 1.3}
    ax.annotate('', xy=(xmin, 0), xytext=(xmax, 0), arrowprops=axes_arrows)
    ax.annotate('', xy=(0, ymin), xytext=(0, ymax), arrowprops=axes_arrows)

    # Annotate the plot with equations
    plot_arrowsl = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=-0.2"}
    plot_arrowsr = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=0.2"}
    ax.annotate(r'$\rho_1 + \rho_2 < 1$', xy=(0.5, 0.3), xytext=(0.8, 0.6),
                arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'$\rho_1 + \rho_2 = 1$', xy=(0.38, 0.6), xytext=(0.6, 0.8),
                arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'$\rho_2 < 1 + \rho_1$', xy=(-0.5, 0.3), xytext=(-1.3, 0.6),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'$\rho_2 = 1 + \rho_1$', xy=(-0.38, 0.6), xytext=(-1, 0.8),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'$\rho_2 = -1$', xy=(1.5, -1), xytext=(1.8, -1.3),
                arrowprops=plot_arrowsl, fontsize='12')
    ax.annotate(r'${\rho_1}^2 + 4\rho_2 = 0$', xy=(1.15, -0.35),
                xytext=(1.5, -0.3), arrowprops=plot_arrowsr, fontsize='12')
    ax.annotate(r'${\rho_1}^2 + 4\rho_2 < 0$', xy=(1.4, -0.7),
                xytext=(1.8, -0.6), arrowprops=plot_arrowsr, fontsize='12')

    # Label categories of solutions
    ax.text(1.5, 1, 'Explosive\n growth', ha='center', fontsize=16)
    ax.text(-1.5, 1, 'Explosive\n oscillations', ha='center', fontsize=16)
    ax.text(0.05, -1.5, 'Explosive oscillations', ha='center', fontsize=16)
    ax.text(0.09, -0.5, 'Damped oscillations', ha='center', fontsize=16)

    # Add small marker to y-axis
    ax.axhline(y=1.005, xmin=0.495, xmax=0.505, c='black')
    ax.text(-0.12, -1.12, '-1', fontsize=10)
    ax.text(-0.12, 0.98, '1', fontsize=10)

    return fig

param_plot()
plt.show()


The graph portrays regions in which the (𝜆1 , 𝜆2 ) root pairs implied by the (𝜌1 = (𝑎 + 𝑏), 𝜌2 = −𝑏) difference equation
parameter pairs in the Samuelson model are such that:
• (𝜆1 , 𝜆2 ) are complex with modulus less than 1 - in this case, the {𝑌𝑡 } sequence displays damped oscillations.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth.
• (𝜆1 , 𝜆2 ) are both real, but one is strictly less than −1 - this leads to explosive oscillations.
• (𝜆1 , 𝜆2 ) are both real and both are less than 1 in absolute value - in this case, there is smooth convergence to the
steady state without damped cycles.
Later we’ll present the graph with a red mark showing the particular point implied by the setting of (𝑎, 𝑏).

22.3.1 Function to Describe Implications of Characteristic Polynomial

def categorize_solution(ρ1, ρ2):
    """This function takes values of ρ1 and ρ2 and uses them
    to classify the type of solution
    """

    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; '
              'therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; '
              'therefore get smooth convergence to a steady state')

### Test the categorize_solution function

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth␣
↪convergence to a steady state

22.3.2 Function for Plotting Paths

A useful function for our work below is

def plot_y(function=None):
    """Function plots path of Y_t"""

    plt.subplots(figsize=(10, 6))
    plt.plot(function)
    plt.xlabel('Time $t$')
    plt.ylabel('$Y_t$', rotation=0)
    plt.grid()
    plt.show()

22.3.3 Manual or “by hand” Root Calculations

The following function calculates roots of the characteristic polynomial using high school algebra.
(We’ll calculate the roots in other ways later)
The function also plots a 𝑌𝑡 starting from initial conditions that we set

# This is a 'manual' method

def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):
    """Takes values of parameters and computes the roots of characteristic
    polynomial. It tells whether they are real or complex and whether they
    are less than unity in absolute value. It also computes a simulation of
    length n starting from the two given initial conditions for national
    income
    """

    roots = []

    ρ1 = α + β
    ρ2 = -β

    print(f'ρ_1 is {ρ1}')
    print(f'ρ_2 is {ρ2}')

    discriminant = ρ1 ** 2 + 4 * ρ2

    # Roots of z^2 - ρ1 z - ρ2 are (ρ1 ± sqrt(ρ1^2 + 4 ρ2)) / 2
    if discriminant == 0:
        roots.append(ρ1 / 2)
        print('Single real root: ')
        print(''.join(str(roots)))
    elif discriminant > 0:
        roots.append((ρ1 + sqrt(discriminant).real) / 2)
        roots.append((ρ1 - sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(''.join(str(roots)))
    else:
        roots.append((ρ1 + sqrt(discriminant)) / 2)
        roots.append((ρ1 - sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(''.join(str(roots)))

    if all(abs(root) < 1 for root in roots):
        print('Absolute values of roots are less than one')
    else:
        print('Absolute values of roots are not less than one')

    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

    y_t = [y_0, y_1]

    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one


22.3.4 Reverse-Engineering Parameters to Generate Damped Cycles

The next cell writes code that takes as inputs the modulus 𝑟 and phase 𝜙 of a conjugate pair of complex numbers in polar
form

𝜆1 = 𝑟 exp(𝑖𝜙), 𝜆2 = 𝑟 exp(−𝑖𝜙)

• The code assumes that these two complex numbers are the roots of the characteristic polynomial
• It then reverse-engineers (𝑎, 𝑏) and (𝜌1 , 𝜌2 ), pairs that would generate those roots

### code to reverse-engineer a cycle


### y_t = r^t (c_1 cos(ϕ t) + c2 sin(ϕ t))
###

def f(r, ϕ):
    """
    Takes modulus r and angle ϕ of complex number r exp(j ϕ)
    and creates ρ1 and ρ2 of characteristic polynomial for which
    r exp(j ϕ) and r exp(- j ϕ) are complex roots.

    Returns the multiplier coefficient a and the accelerator coefficient b
    that verifies those roots.
    """
    g1 = cmath.rect(r, ϕ)   # Generate two complex roots
    g2 = cmath.rect(r, -ϕ)
    ρ1 = g1 + g2            # Implied ρ1, ρ2
    ρ2 = -g1 * g2
    b = -ρ2                 # Reverse-engineer a and b that validate these
    a = ρ1 - b
    return ρ1, ρ2, a, b

## Now let's use the function in an example


## Here are the example parameters

r = .95
period = 10 # Length of cycle in units of time
ϕ = 2 * math.pi/period

## Apply the function

ρ1, ρ2, a, b = f(r, ϕ)

print(f"a, b = {a}, {b}")


print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)

## Print the real components of ρ1 and ρ2

ρ1 = ρ1.real
ρ2 = ρ2.real

ρ1, ρ2

(1.5371322893124, -0.9024999999999999)
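The reverse-engineering step has a simple closed form. Expanding (𝑧 − 𝑟𝑒^{𝑖𝜙})(𝑧 − 𝑟𝑒^{−𝑖𝜙}) gives 𝑧² − 2𝑟 cos(𝜙)𝑧 + 𝑟², and matching terms with 𝑧² − 𝜌1𝑧 − 𝜌2 yields 𝜌1 = 2𝑟 cos(𝜙) and 𝜌2 = −𝑟². The sketch below compares these formulas with values computed from the roots themselves; the variable names ending in `_closed` and `_roots` are introduced here for illustration.

```python
import cmath
import math

r, period = 0.95, 10
ϕ = 2 * math.pi / period

# Closed forms obtained by matching coefficients of the two polynomials
ρ1_closed = 2 * r * math.cos(ϕ)
ρ2_closed = -r**2

# Same quantities computed from the conjugate pair of roots
λ1, λ2 = cmath.rect(r, ϕ), cmath.rect(r, -ϕ)
ρ1_roots = (λ1 + λ2).real
ρ2_roots = (-λ1 * λ2).real

print(ρ1_closed, ρ1_roots)
print(ρ2_closed, ρ2_roots)
```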

22.3.5 Root Finding Using Numpy

Here we’ll use numpy to compute the roots of the characteristic polynomial

r1, r2 = np.roots([1, -ρ1, -ρ2])

p1 = cmath.polar(r1)
p2 = cmath.polar(r2)

print(f"r, ϕ = {r}, {ϕ}")


print(f"p1, p2 = {p1}, {p2}")
# print(f"g1, g2 = {g1}, {g2}")

print(f"a, b = {a}, {b}")


print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999


##=== This method uses numpy to calculate roots ===#

def y_nonstochastic(y_0=100, y_1=80, α=.9, β=.8, γ=10, n=80):
    """ Rather than computing the roots of the characteristic
    polynomial by hand as we did earlier, this function
    enlists numpy to do the work for us
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(f'Roots are {roots}')

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Define transition equation
    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

    # Set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_nonstochastic())

Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
Roots are less than one


22.3.6 Reverse-Engineered Complex Roots: Example

The next cell studies the implications of reverse-engineered complex roots.


We’ll generate an undamped cycle of period 10

r = 1 # Generates undamped, nonexplosive cycles

period = 10 # Length of cycle in units of time


ϕ = 2 * math.pi/period

## Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, ϕ)

# Drop the imaginary part so that it is a valid input into y_nonstochastic


a = a.real
b = b.real

print(f"a, b = {a}, {b}")

ytemp = y_nonstochastic(α=a, β=b, y_0=20, y_1=30)


plot_y(ytemp)

a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are not less than one


22.3.7 Digression: Using Sympy to Find Roots

We can also use sympy to compute analytic formulas for the roots

init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

$$
\left[ \frac{\rho_1}{2} - \frac{\sqrt{\rho_1^2 + 4\rho_2}}{2}, \quad \frac{\rho_1}{2} + \frac{\sqrt{\rho_1^2 + 4\rho_2}}{2} \right]
$$

a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

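$$
\left[ \frac{\alpha}{2} + \frac{\beta}{2} - \frac{\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta}}{2}, \quad \frac{\alpha}{2} + \frac{\beta}{2} + \frac{\sqrt{\alpha^2 + 2\alpha\beta + \beta^2 - 4\beta}}{2} \right]
$$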


22.4 Stochastic Shocks

Now we’ll construct some code to simulate the stochastic version of the model that emerges when we add a random shock
process to aggregate demand

def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):
    """This function takes parameters of a stochastic version of
    the model and proceeds to analyze the roots of the characteristic
    polynomial and also generate a simulation.
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    # Categorize solution
    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(roots)

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Generate shocks
    ϵ = np.random.normal(0, 1, n)

    # Define transition equation
    def transition(x, t):
        return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + σ * ϵ[t]

    # Set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):
        y_t.append(transition(y_t, t))

    return y_t

plot_y(y_stochastic())

Roots are real and absolute values are less than one; therefore get smooth␣
↪convergence to a steady state

[0.7236068 0.2763932]
Roots are real
Roots are less than one

Let’s do a simulation in which there are shocks and the characteristic polynomial has complex roots

r = .97

period = 10 # Length of cycle in units of time


ϕ = 2 * math.pi/period

### Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, ϕ)

# Drop the imaginary part so that it is a valid input into y_stochastic


a = a.real
b = b.real

print(f"a, b = {a}, {b}")


plot_y(y_stochastic(y_0=40, y_1 = 42, α=a, β=b, σ=2, n=100))

a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
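The printed roots can be tied back to the chosen modulus and cycle length. The sketch below assumes that the reverse-engineering function `f` uses the standard mapping ρ1 = 2 r cos(ϕ), ρ2 = −r² for a complex pair r·exp(±iϕ), which is consistent with the values printed above, and then recovers r and the period from the roots.

```python
import numpy as np

r, period = 0.97, 10
ϕ = 2 * np.pi / period

# A complex pair r*exp(±iϕ) solves z**2 - ρ1*z - ρ2 = 0 when
# ρ1 = 2 r cos(ϕ) and ρ2 = -r**2 (assumed mapping, see lead-in)
ρ1, ρ2 = 2 * r * np.cos(ϕ), -r**2
roots = np.roots([1, -ρ1, -ρ2])

modulus = np.abs(roots[0])                                # damping factor per period
implied_period = 2 * np.pi / np.abs(np.angle(roots[0]))   # cycle length in periods
print(roots)
print(modulus, implied_period)
```

A modulus below one gives damped oscillations, and the angle of the roots fixes the length of the cycle.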


22.5 Government Spending

This function computes a response to either a permanent or one-off increase in government expenditures

def y_stochastic_g(y_0=20,
                   y_1=20,
                   α=0.8,
                   β=0.2,
                   γ=10,
                   n=100,
                   σ=2,
                   g=0,
                   g_t=0,
                   duration='permanent'):
    """This program computes a response to a permanent or one-off
    increase in government expenditures that occurs at time g_t
    """

    # Useful constants
    ρ1 = α + β
    ρ2 = -β

    # Categorize solution
    categorize_solution(ρ1, ρ2)

    # Find roots of polynomial
    roots = np.roots([1, -ρ1, -ρ2])
    print(roots)

    # Check if real or complex
    if all(isinstance(root, complex) for root in roots):
        print('Roots are complex')
    else:
        print('Roots are real')

    # Check if roots are less than one
    if all(abs(root) < 1 for root in roots):
        print('Roots are less than one')
    else:
        print('Roots are not less than one')

    # Generate shocks
    ϵ = np.random.normal(0, 1, n)

    def transition(x, t, g):

        # Non-stochastic
        if σ == 0:
            return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g

        # Stochastic - reuse the shocks drawn above
        else:
            return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

    # Create list and set initial conditions
    y_t = [y_0, y_1]

    # Generate y_t series
    for t in range(2, n):

        # No government spending
        if g == 0:
            y_t.append(transition(y_t, t, g=0))

        # Government spending (no shock): g applies in every period
        elif duration is None:
            y_t.append(transition(y_t, t, g=g))

        # Permanent government spending shock
        elif duration == 'permanent':
            if t < g_t:
                y_t.append(transition(y_t, t, g=0))
            else:
                y_t.append(transition(y_t, t, g=g))

        # One-off government spending shock
        elif duration == 'one-off':
            if t == g_t:
                y_t.append(transition(y_t, t, g=g))
            else:
                y_t.append(transition(y_t, t, g=0))
    return y_t


A permanent government spending shock can be simulated as follows

plot_y(y_stochastic_g(g=10, g_t=20, duration='permanent'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

[0.7236068 0.2763932]
Roots are real
Roots are less than one
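To see where the path settles after a permanent shock, it can help to shut off the noise. The sketch below is our own deterministic version of the recursion (the names `steady_state` and `deterministic_path` are hypothetical). Setting Y_t = Y_{t-1} = Y_{t-2} in the difference equation gives a steady state of (γ + g)/(1 − α), so with α = 0.8 and γ = 10 a permanent g = 10 should move the long-run level from roughly 50 to roughly 100.

```python
def steady_state(α, γ, g):
    # Solve Y = (α + β) Y - β Y + γ + g for Y; β cancels out
    return (γ + g) / (1 - α)

def deterministic_path(α=0.8, β=0.2, γ=10, g=10, g_t=20, n=100, y_0=20, y_1=20):
    ρ1, ρ2 = α + β, -β
    y = [y_0, y_1]
    for t in range(2, n):
        g_now = g if t >= g_t else 0      # permanent shock from date g_t onward
        y.append(ρ1 * y[-1] + ρ2 * y[-2] + γ + g_now)
    return y

path = deterministic_path()
print(steady_state(0.8, 10, 0), steady_state(0.8, 10, 10), path[-1])
```

The ratio 1/(1 − α) is the familiar Keynesian multiplier. The accelerator coefficient affects only the transition path, not the long-run level.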

We can also see the response to a one time jump in government expenditures

plot_y(y_stochastic_g(g=500, g_t=50, duration='one-off'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

[0.7236068 0.2763932]
Roots are real
Roots are less than one


22.6 Wrapping Everything Into a Class

Up to now, we have written functions to do the work.


Now we’ll roll up our sleeves and write a Python class called Samuelson for the Samuelson model

class Samuelson():
    r"""This class represents the Samuelson model, otherwise known as the
    multiplier-accelerator model. The model combines the Keynesian multiplier
    with the accelerator theory of investment.

    The path of output is governed by a linear second-order difference equation

    .. math::

        Y_t = \gamma + g + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2}
              + \sigma \epsilon_t

    Parameters
    ----------
    y_0 : scalar
        Initial condition for Y_0
    y_1 : scalar
        Initial condition for Y_1
    α : scalar
        Marginal propensity to consume
    β : scalar
        Accelerator coefficient
    γ : scalar
        Constant term in the consumption function
    n : int
        Number of iterations
    σ : scalar
        Volatility parameter. It must be greater than or equal to 0. Set
        equal to 0 for a non-stochastic model.
    g : scalar
        Government spending shock
    g_t : int
        Time at which government spending shock occurs. Must be specified
        when duration != None.
    duration : {None, 'permanent', 'one-off'}
        Specifies type of government spending shock. If None, government
        spending equal to g for all t.

    """

    def __init__(self,
                 y_0=100,
                 y_1=50,
                 α=1.3,
                 β=0.2,
                 γ=10,
                 n=100,
                 σ=0,
                 g=0,
                 g_t=0,
                 duration=None):

        self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
        self.n, self.g, self.g_t, self.duration = n, g, g_t, duration
        self.γ, self.σ = γ, σ
        self.ρ1 = α + β
        self.ρ2 = -β
        self.roots = np.roots([1, -self.ρ1, -self.ρ2])

    def root_type(self):
        if all(isinstance(root, complex) for root in self.roots):
            return 'Complex conjugate'
        elif len(self.roots) > 1:
            return 'Double real'
        else:
            return 'Single real'

    def root_less_than_one(self):
        # Always return a bool (the original returned None when False)
        return all(abs(root) < 1 for root in self.roots)

    def solution_type(self):
        ρ1, ρ2 = self.ρ1, self.ρ2
        discriminant = ρ1 ** 2 + 4 * ρ2
        if ρ2 >= 1 + ρ1 or ρ2 <= -1:
            return 'Explosive oscillations'
        elif ρ1 + ρ2 >= 1:
            return 'Explosive growth'
        elif discriminant < 0:
            return 'Damped oscillations'
        else:
            return 'Steady state'

    def _transition(self, x, t, g):

        # Non-stochastic - separated to avoid generating random numbers
        # when not needed
        if self.σ == 0:
            return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

        # Stochastic - draw a single standard normal shock for period t
        else:
            ϵ = np.random.normal(0, 1)
            return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g \
                + self.σ * ϵ

    def generate_series(self):

        # Create list and set initial conditions
        y_t = [self.y_0, self.y_1]

        # Generate y_t series
        for t in range(2, self.n):

            # No government spending
            if self.g == 0:
                y_t.append(self._transition(y_t, t, g=0))

            # Government spending (no shock): g applies in every period
            elif self.duration is None:
                y_t.append(self._transition(y_t, t, g=self.g))

            # Permanent government spending shock
            elif self.duration == 'permanent':
                if t < self.g_t:
                    y_t.append(self._transition(y_t, t, g=0))
                else:
                    y_t.append(self._transition(y_t, t, g=self.g))

            # One-off government spending shock
            elif self.duration == 'one-off':
                if t == self.g_t:
                    y_t.append(self._transition(y_t, t, g=self.g))
                else:
                    y_t.append(self._transition(y_t, t, g=0))
        return y_t

    def summary(self):
        print('Summary\n' + '-' * 50)
        print(f'Root type: {self.root_type()}')
        print(f'Solution type: {self.solution_type()}')
        print(f'Roots: {str(self.roots)}')

        if self.root_less_than_one():
            print('Absolute value of roots is less than one')
        else:
            print('Absolute value of roots is not less than one')

        if self.σ > 0:
            print('Stochastic series with σ = ' + str(self.σ))
        else:
            print('Non-stochastic series')

        if self.g != 0:
            print('Government spending equal to ' + str(self.g))

        if self.duration is not None:
            print(self.duration.capitalize() +
                  ' government spending shock at t = ' + str(self.g_t))

    def plot(self):
        fig, ax = plt.subplots(figsize=(10, 6))
        ax.plot(self.generate_series())
        ax.set(xlabel='Iteration', xlim=(0, self.n))
        ax.set_ylabel('$Y_t$', rotation=0)
        ax.grid()

        # Add parameter values to plot
        paramstr = (f'$\\alpha={self.α:.2f}$ \n $\\beta={self.β:.2f}$ \n '
                    f'$\\gamma={self.γ:.2f}$ \n $\\sigma={self.σ:.2f}$ \n '
                    f'$\\rho_1={self.ρ1:.2f}$ \n $\\rho_2={self.ρ2:.2f}$')
        props = dict(fc='white', pad=10, alpha=0.5)
        ax.text(0.87, 0.05, paramstr, transform=ax.transAxes,
                fontsize=12, bbox=props, va='bottom')

        return fig

    def param_plot(self):

        # Uses the param_plot() function defined earlier (it is then able
        # to be used standalone or as part of the model)
        fig = param_plot()
        ax = fig.gca()

        # Add λ values to legend
        for i, root in enumerate(self.roots):
            if isinstance(root, complex):
                # Need to fill operator for positive as string is split apart
                operator = ['+', '']
                label = (rf'$\lambda_{i+1} = {self.roots[i].real:.2f} '
                         rf'{operator[i]} {self.roots[i].imag:.2f}i$')
            else:
                label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f}$'
            ax.scatter(0, 0, 0, label=label)  # dummy to add to legend

        # Add ρ pair to plot
        ax.scatter(self.ρ1, self.ρ2, 100, 'red', '+',
                   label=r'$(\ \rho_1, \ \rho_2 \ )$', zorder=5)

        plt.legend(fontsize=12, loc=3)

        return fig


22.6.1 Illustration of Samuelson Class

Now we’ll put our Samuelson class to work on an example

sam = Samuelson(α=0.8, β=0.5, σ=2, g=10, g_t=20, duration='permanent')


sam.summary()

Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20

sam.plot()
plt.show()


22.6.2 Using the Graph

We’ll use our graph to show where the parameters lie and how their location is consistent with the behavior of the path just graphed.
The red + sign marks the pair (ρ1, ρ2) implied by the model's parameters, while the roots themselves are reported in the legend

sam.param_plot()
plt.show()

22.7 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the work that we have done from
scratch above.
Here is how we map the Samuelson model into an instance of a LinearStateSpace class

"""This script maps the Samuelson model into the
``LinearStateSpace`` class
"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100

A = [[1,        0,      0],
     [γ + g,   ρ1,     ρ2],
     [0,        1,      0]]

G = [[γ + g, ρ1,   ρ2],         # this is Y_{t+1}
     [γ,      α,    0],         # this is C_{t+1}
     [0,      β,   -β]]         # this is I_{t+1}

μ_0 = [1, 100, 50]
C = np.zeros((3,1))
C[1] = σ # stochastic

sam_t = LinearStateSpace(A, C, G, mu_0=μ_0)

x, y = sam_t.simulate(ts_length=n)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
    ax.plot(series, color=color)
    ax.set(title=title, xlim=(0, n))
    ax.grid()

axes[-1].set_xlabel('Iteration')

plt.show()
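The mapping can be checked by hand without QuantEcon.py. The sketch below sets the shock to zero, iterates the state vector x_t = (1, Y_t, Y_{t-1}) with the matrix A, and compares the result with the scalar second-order recursion. The variable names are our own.

```python
import numpy as np

α, β, γ, g = 0.8, 0.9, 10, 10
ρ1, ρ2 = α + β, -β

A = np.array([[1,     0,  0],
              [γ + g, ρ1, ρ2],
              [0,     1,  0]])

x = np.array([1.0, 100.0, 50.0])   # state (1, Y_t, Y_{t-1}), as in μ_0
y_prev, y_curr = 50.0, 100.0       # the same initial conditions as scalars

for _ in range(10):
    x = A @ x                                                   # state space form
    y_prev, y_curr = y_curr, ρ1 * y_curr + ρ2 * y_prev + γ + g  # scalar form

print(x[1], y_curr)
```

The first state stays fixed at one and carries the constant γ + g, while the second and third states track current and lagged output.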


22.7.1 Other Methods in the LinearStateSpace Class

Let’s plot impulse response functions for the instance of the Samuelson model using a method in the LinearStateSpace class

imres = sam_t.impulse_response()
imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape

(2, 6, 1)

Now let’s compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of 𝐴

A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)

[0.85+0.42130749j 0.85-0.42130749j 1. +0.j ]
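The extra unit eigenvalue comes from the constant state in the first row of A. As a small sketch of this point (names are ours), the remaining two eigenvalues can be compared with the roots of the characteristic polynomial computed by `np.roots`:

```python
import numpy as np

ρ1, ρ2, const = 1.7, -0.9, 20     # α = 0.8, β = 0.9, γ + g = 20
A = np.array([[1,     0,  0],
              [const, ρ1, ρ2],
              [0,     1,  0]])

eigs = np.linalg.eigvals(A)
poly_roots = np.roots([1, -ρ1, -ρ2])

# Drop the unit eigenvalue associated with the constant state
non_unit = eigs[~np.isclose(eigs, 1.0)]
print(non_unit)
print(poly_roots)
```

Because A is block lower triangular, its eigenvalues are 1 together with those of the lower-right 2 × 2 block, which is the companion matrix of z² − ρ1 z − ρ2.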


22.7.2 Inheriting Methods from LinearStateSpace

We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more functions
to use

class SamuelsonLSS(LinearStateSpace):
    """
    This subclass creates a Samuelson multiplier-accelerator model
    as a linear state space system.
    """
    def __init__(self,
                 y_0=100,
                 y_1=50,
                 α=0.8,
                 β=0.9,
                 γ=10,
                 σ=1,
                 g=10):

        self.α, self.β = α, β
        self.y_0, self.y_1, self.g = y_0, y_1, g
        self.γ, self.σ = γ, σ

        # Define initial conditions
        self.μ_0 = [1, y_0, y_1]

        self.ρ1 = α + β
        self.ρ2 = -β

        # Define transition matrix
        self.A = [[1,                 0,         0],
                  [γ + g,       self.ρ1,   self.ρ2],
                  [0,                 1,         0]]

        # Define output matrix
        self.G = [[γ + g, self.ρ1, self.ρ2],         # this is Y_{t+1}
                  [γ,           α,       0],         # this is C_{t+1}
                  [0,           β,      -β]]         # this is I_{t+1}

        self.C = np.zeros((3, 1))
        self.C[1] = σ  # stochastic

        # Initialize LSS with parameters from Samuelson model
        LinearStateSpace.__init__(self, self.A, self.C, self.G, mu_0=self.μ_0)

    def plot_simulation(self, ts_length=100, stationary=True):

        # Temporarily store original parameters
        temp_mu = self.mu_0
        temp_Sigma = self.Sigma_0

        # Set distribution parameters equal to their stationary
        # values for simulation
        if stationary:
            try:
                self.mu_x, self.mu_y, self.Sigma_x, self.Sigma_y, self.Sigma_yx = \
                    self.stationary_distributions()
                self.mu_0 = self.mu_x
                self.Sigma_0 = self.Sigma_x
            # Exception where no convergence achieved when
            # calculating stationary distributions
            except ValueError:
                print('Stationary distribution does not exist')

        x, y = self.simulate(ts_length)

        fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
        titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
        colors = ['darkblue', 'red', 'purple']
        for ax, series, title, color in zip(axes, y, titles, colors):
            ax.plot(series, color=color)
            # Use the method's own argument rather than a global n
            ax.set(title=title, xlim=(0, ts_length))
            ax.grid()

        axes[-1].set_xlabel('Iteration')

        # Reset distribution parameters to their initial values
        self.mu_0 = temp_mu
        self.Sigma_0 = temp_Sigma

        return fig

    def plot_irf(self, j=5):

        x, y = self.impulse_response(j)

        # Reshape into 3 x (j + 1) matrix for plotting purposes
        yimf = np.array(y).flatten().reshape(j+1, 3).T

        fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
        labels = ['$Y_t$', '$C_t$', '$I_t$']
        colors = ['darkblue', 'red', 'purple']
        for ax, series, label, color in zip(axes, yimf, labels, colors):
            ax.plot(series, color=color)
            ax.set(xlim=(0, j))
            ax.set_ylabel(label, rotation=0, fontsize=14, labelpad=10)
            ax.grid()

        axes[0].set_title('Impulse Response Functions')
        axes[-1].set_xlabel('Iteration')

        return fig

    def multipliers(self, j=5):
        x, y = self.impulse_response(j)
        return np.sum(np.array(y).flatten().reshape(j+1, 3), axis=0)


22.7.3 Illustrations

Let’s show how we can use the SamuelsonLSS

samlss = SamuelsonLSS()

samlss.plot_simulation(100, stationary=False)
plt.show()

samlss.plot_simulation(100, stationary=True)
plt.show()


samlss.plot_irf(100)
plt.show()


samlss.multipliers()

array([7.414389, 6.835896, 0.578493])

22.8 Pure Multiplier Model

Let’s shut down the accelerator by setting 𝑏 = 0 to get a pure multiplier model
• the absence of cycles gives an idea about why Samuelson included the accelerator

pure_multiplier = SamuelsonLSS(α=0.95, β=0)

pure_multiplier.plot_simulation()
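Without the accelerator, the deterministic part of the model collapses to the first-order recursion Y_t = α Y_{t-1} + γ + g. The sketch below (our own code, with the shock switched off) illustrates that such a path approaches its steady state monotonically, so no cycles can arise.

```python
α, γ, g = 0.95, 10, 10

y = [100.0]
for t in range(200):
    # With b = 0 the lagged term Y_{t-2} drops out of the recursion
    y.append(α * y[-1] + γ + g)

steady = (γ + g) / (1 - α)
increasing = all(later > earlier for earlier, later in zip(y[:-1], y[1:]))
print(steady, y[-1], increasing)
```

A single real root α in (0, 1) produces geometric convergence, which is why cyclical behavior in this model has to come from the accelerator term.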


pure_multiplier = SamuelsonLSS(α=0.8, β=0)

pure_multiplier.plot_simulation()


pure_multiplier.plot_irf(100)


22.9 Summary

In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the multiplier-accelerator model described in [Samuelson, 1939].
We saw that different parameter values led to different output paths, which could be stationary, explosive, or oscillating.
We were also able to represent the model using the QuantEcon.py LinearStateSpace class.



CHAPTER

TWENTYTHREE

KESTEN PROCESSES AND FIRM DYNAMICS

Contents

• Kesten Processes and Firm Dynamics


– Overview
– Kesten Processes
– Heavy Tails
– Application: Firm Dynamics
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon


!pip install --upgrade yfinance

23.1 Overview

Previously we learned about linear scalar-valued stochastic processes (AR(1) models).


Now we generalize these linear models slightly by allowing the multiplicative coefficient to be stochastic.
Such processes are known as Kesten processes after German–American mathematician Harry Kesten (1931–2019).
Although simple to write down, Kesten processes are interesting for at least two reasons:
1. A number of significant economic processes are or can be described as Kesten processes.
2. Kesten processes generate interesting dynamics, including, in some cases, heavy-tailed cross-sectional distributions.
We will discuss these issues as we go along.
Let’s start with some imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import quantecon as qe

The following two lines are only added to avoid a FutureWarning caused by compatibility issues between pandas and
matplotlib.


from pandas.plotting import register_matplotlib_converters


register_matplotlib_converters()

Additional technical background related to this lecture can be found in the monograph of [Buraczewski et al., 2016].

23.2 Kesten Processes

A Kesten process is a stochastic process of the form

$$X_{t+1} = a_{t+1} X_t + \eta_{t+1} \tag{23.1}$$

where {𝑎𝑡 }𝑡≥1 and {𝜂𝑡 }𝑡≥1 are IID sequences.


We are interested in the dynamics of {𝑋𝑡 }𝑡≥0 when 𝑋0 is given.
We will focus on the nonnegative scalar case, where 𝑋𝑡 takes values in ℝ+ .
In particular, we will assume that
• the initial condition 𝑋0 is nonnegative,
• {𝑎𝑡 }𝑡≥1 is a nonnegative IID stochastic process and
• {𝜂𝑡 }𝑡≥1 is another nonnegative IID stochastic process, independent of the first.

23.2.1 Example: GARCH Volatility

The GARCH model is common in financial applications, where time series such as asset returns exhibit time varying
volatility.
For example, consider the following plot of daily returns on the Nasdaq Composite Index for the period 1st January 2006
to 1st November 2019.

import yfinance as yf

s = yf.download('^IXIC', '2006-1-1', '2019-11-1')['Adj Close']

r = s.pct_change()

fig, ax = plt.subplots()

ax.plot(r, alpha=0.7)

ax.set_ylabel('returns', fontsize=12)
ax.set_xlabel('date', fontsize=12)

plt.show()

[*********************100%%**********************] 1 of 1 completed


Notice how the series exhibits bursts of volatility (high variance) and then settles down again.
GARCH models can replicate this feature.
The GARCH(1, 1) volatility process takes the form
$$\sigma_{t+1}^2 = \alpha_0 + \sigma_t^2 (\alpha_1 \xi_{t+1}^2 + \beta) \tag{23.2}$$

where $\{\xi_t\}$ is IID with $\mathbb{E} \xi_t^2 = 1$ and all parameters are positive.
Returns on a given asset are then modeled as

$$r_t = \sigma_t \zeta_t \tag{23.3}$$

where {𝜁𝑡 } is again IID and independent of {𝜉𝑡 }.


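The volatility sequence $\{\sigma_t^2\}$, which drives the dynamics of returns, is a Kesten process: in the notation of (23.1), $a_{t+1} = \alpha_1 \xi_{t+1}^2 + \beta$ and $\eta_{t+1} = \alpha_0$.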
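As a rough illustration of how GARCH(1, 1) volatility fits the Kesten form, the sketch below draws the multiplicative coefficient α1 ξ² + β for illustrative parameter values (the same ones used in Exercise 23.5.1) and estimates the mean of its logarithm by Monte Carlo. The variable names and the sample size are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
α_0, α_1, β = 1e-5, 0.1, 0.9          # illustrative parameter values

ξ = rng.standard_normal(100_000)
a = α_1 * ξ**2 + β                    # multiplicative coefficient in (23.1)
η = α_0                               # additive term in (23.1), constant here

mean_log_a = np.log(a).mean()         # Monte Carlo estimate of E ln a
print(a.mean(), mean_log_a)
```

Here the coefficient has mean α1 + β = 1, yet by Jensen's inequality the mean of its logarithm is negative, so the volatility process does not drift off to infinity even though it is frequently scaled up.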

23.2.2 Example: Wealth Dynamics

Suppose that a given household saves a fixed fraction 𝑠 of its current wealth in every period.
The household earns labor income 𝑦𝑡 at the start of time 𝑡.
Wealth then evolves according to

$$w_{t+1} = R_{t+1} s w_t + y_{t+1} \tag{23.4}$$

where {𝑅𝑡 } is the gross rate of return on assets.


If {𝑅𝑡 } and {𝑦𝑡 } are both IID, then (23.4) is a Kesten process.


23.2.3 Stationarity

In earlier lectures, such as the one on AR(1) processes, we introduced the notion of a stationary distribution.
In the present context, we can define a stationary distribution as follows:
The distribution 𝐹 ∗ on ℝ is called stationary for the Kesten process (23.1) if

$$X_t \sim F^* \implies a_{t+1} X_t + \eta_{t+1} \sim F^* \tag{23.5}$$

In other words, if the current state 𝑋𝑡 has distribution 𝐹 ∗ , then so does the next period state 𝑋𝑡+1 .
We can write this alternatively as

$$F^*(y) = \int \mathbb{P}\{a_{t+1} x + \eta_{t+1} \leq y\} F^*(dx) \quad \text{for all } y \geq 0 \tag{23.6}$$

The right hand side is the distribution of the next period state when the current state is drawn from $F^*$.
The equality in (23.6) states that this distribution is unchanged.

23.2.4 Cross-Sectional Interpretation

There is an important cross-sectional interpretation of stationary distributions, discussed previously but worth repeating
here.
Suppose, for example, that we are interested in the wealth distribution — that is, the current distribution of wealth across
households in a given country.
Suppose further that
• the wealth of each household evolves independently according to (23.4),
• 𝐹 ∗ is a stationary distribution for this stochastic process and
• there are many households.
Then 𝐹 ∗ is a steady state for the cross-sectional wealth distribution in this country.
In other words, if 𝐹 ∗ is the current wealth distribution then it will remain so in subsequent periods, ceteris paribus.
To see this, suppose that 𝐹 ∗ is the current wealth distribution.
What is the fraction of households with wealth less than 𝑦 next period?
To obtain this, we sum the probability that wealth is less than 𝑦 tomorrow, given that current wealth is 𝑤, weighted by the
fraction of households with wealth 𝑤.
Noting that the fraction of households with wealth in interval 𝑑𝑤 is 𝐹 ∗ (𝑑𝑤), we get

$$\int \mathbb{P}\{R_{t+1} s w + y_{t+1} \leq y\} F^*(dw)$$

By the definition of stationarity and the assumption that 𝐹 ∗ is stationary for the wealth process, this is just 𝐹 ∗ (𝑦).
Hence the fraction of households with wealth in [0, 𝑦] is the same next period as it is this period.
Since 𝑦 was chosen arbitrarily, the distribution is unchanged.
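The cross-sectional interpretation can be illustrated by simulation. The sketch below tracks many households whose wealth follows (23.4), with lognormal returns and labor income (distributional choices and parameter values are our own assumptions), and compares quantiles of the cross-section at two dates far from the initial condition.

```python
import numpy as np

rng = np.random.default_rng(1)
M, s = 50_000, 0.75                   # number of households, saving rate (assumed)

def update(w):
    R = np.exp(0.2 * rng.standard_normal(M))   # gross returns, lognormal (assumed)
    y = np.exp(0.5 * rng.standard_normal(M))   # labor income, lognormal (assumed)
    return R * s * w + y

w = np.ones(M)
for t in range(200):
    w = update(w)
q_early = np.quantile(w, [0.25, 0.5, 0.75])

for t in range(100):
    w = update(w)
q_late = np.quantile(w, [0.25, 0.5, 0.75])

print(q_early)
print(q_late)
```

Individual households keep moving around, but the quantiles of the cross-section should be nearly unchanged across the two dates, which is what stationarity of the distribution means here.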


23.2.5 Conditions for Stationarity

The Kesten process 𝑋𝑡+1 = 𝑎𝑡+1 𝑋𝑡 + 𝜂𝑡+1 does not always have a stationary distribution.
For example, if 𝑎𝑡 ≡ 𝜂𝑡 ≡ 1 for all 𝑡, then 𝑋𝑡 = 𝑋0 + 𝑡, which diverges to infinity.
To prevent this kind of divergence, we require that {𝑎𝑡 } is strictly less than 1 most of the time.
In particular, if

$$\mathbb{E} \ln a_t < 0 \quad \text{and} \quad \mathbb{E} \eta_t < \infty \tag{23.7}$$

then a unique stationary distribution exists on ℝ+ .


• See, for example, theorem 2.1.3 of [Buraczewski et al., 2016], which provides slightly weaker conditions.
As one application of this result, we see that the wealth process (23.4) will have a unique stationary distribution whenever
labor income has finite mean and 𝔼 ln 𝑅𝑡 + ln 𝑠 < 0.

23.3 Heavy Tails

Under certain conditions, the stationary distribution of a Kesten process has a Pareto tail.
(See our earlier lecture on heavy-tailed distributions for background.)
This fact is significant for economics because of the prevalence of Pareto-tailed distributions.

23.3.1 The Kesten–Goldie Theorem

To state the conditions under which the stationary distribution of a Kesten process has a Pareto tail, we first recall that a
random variable is called nonarithmetic if its distribution is not concentrated on {… , −2𝑡, −𝑡, 0, 𝑡, 2𝑡, …} for any 𝑡 ≥ 0.
For example, any random variable with a density is nonarithmetic.
The famous Kesten–Goldie Theorem (see, e.g., [Buraczewski et al., 2016], theorem 2.4.4) states that if
1. the stationarity conditions in (23.7) hold,
2. the random variable 𝑎𝑡 is positive with probability one and nonarithmetic,
3. ℙ{𝑎𝑡 𝑥 + 𝜂𝑡 = 𝑥} < 1 for all 𝑥 ∈ ℝ+ and
4. there exists a positive constant 𝛼 such that
$$\mathbb{E} a_t^\alpha = 1, \quad \mathbb{E} \eta_t^\alpha < \infty, \quad \text{and} \quad \mathbb{E}[a_t^{\alpha+1}] < \infty$$
then the stationary distribution of the Kesten process has a Pareto tail with tail index 𝛼.
More precisely, if 𝐹 ∗ is the unique stationary distribution and 𝑋 ∗ ∼ 𝐹 ∗ , then

$$\lim_{x \to \infty} x^\alpha \, \mathbb{P}\{X^* > x\} = c$$

for some positive constant 𝑐.


23.3.2 Intuition

Later we will illustrate the Kesten–Goldie Theorem using rank-size plots.


Prior to doing so, we can give the following intuition for the conditions.
Two important conditions are that $\mathbb{E} \ln a_t < 0$, so the model is stationary, and $\mathbb{E} a_t^\alpha = 1$ for some $\alpha > 0$.

The first condition implies that the distribution of 𝑎𝑡 has a large amount of probability mass below 1.
The second condition implies that the distribution of 𝑎𝑡 has at least some probability mass at or above 1.
The first condition gives us existence of the stationary distribution.
The second condition means that the current state can be expanded by $a_t$.
If this occurs for several consecutive periods, the effects compound each other, since $a_t$ is multiplicative.
This leads to spikes in the time series, which fill out the extreme right hand tail of the distribution.
The spikes in the time series are visible in the following simulation, which generates 10 paths when $a_t$ and $\eta_t$ (called b in the code) are lognormal.

μ = -0.5
σ = 1.0

def kesten_ts(ts_length=100):
    x = np.zeros(ts_length)
    for t in range(ts_length-1):
        a = np.exp(μ + σ * np.random.randn())
        b = np.exp(np.random.randn())
        x[t+1] = a * x[t] + b
    return x

fig, ax = plt.subplots()

num_paths = 10
np.random.seed(12)

for i in range(num_paths):
    ax.plot(kesten_ts())

ax.set(xlabel='time', ylabel='$X_t$')
plt.show()


23.4 Application: Firm Dynamics

As noted in our lecture on heavy tails, for common measures of firm size such as revenue or employment, the US firm
size distribution exhibits a Pareto tail (see, e.g., [Axtell, 2001], [Gabaix, 2016]).
Let us try to explain this rather striking fact using the Kesten–Goldie Theorem.

23.4.1 Gibrat’s Law

It was postulated many years ago by Robert Gibrat [Gibrat, 1931] that firm size evolves according to a simple rule whereby
size next period is proportional to current size.
This is now known as Gibrat’s law of proportional growth.
We can express this idea by stating that a suitably defined measure 𝑠𝑡 of firm size obeys
$$\frac{s_{t+1}}{s_t} = a_{t+1} \tag{23.8}$$

for some positive IID sequence {𝑎𝑡 }.


One implication of Gibrat’s law is that the growth rate of individual firms does not depend on their size.
However, over the last few decades, research contradicting Gibrat’s law has accumulated in the literature.
For example, it is commonly found that, on average,
1. small firms grow faster than large firms (see, e.g., [Evans, 1987] and [Hall, 1987]) and
2. the growth rate of small firms is more volatile than that of large firms [Dunne et al., 1989].
On the other hand, Gibrat’s law is generally found to be a reasonable approximation for large firms [Evans, 1987].
We can accommodate these empirical findings by modifying (23.8) to

$$s_{t+1} = a_{t+1} s_t + b_{t+1} \tag{23.9}$$

where {𝑎𝑡 } and {𝑏𝑡 } are both IID and independent of each other.


In the exercises you are asked to show that (23.9) is more consistent with the empirical findings presented above than
Gibrat’s law in (23.8).

23.4.2 Heavy Tails

So what has this to do with Pareto tails?


The answer is that (23.9) is a Kesten process.
If the conditions of the Kesten–Goldie Theorem are satisfied, then the firm size distribution is predicted to have heavy
tails — which is exactly what we see in the data.
In the exercises below we explore this idea further, generalizing the firm size dynamics and examining the corresponding
rank-size plots.
We also try to illustrate why the Pareto tail finding is significant for quantitative analysis.

23.5 Exercises

Exercise 23.5.1
Simulate and plot 15 years of daily returns (consider each year as having 250 working days) using the GARCH(1, 1)
process in (23.2)–(23.3).
Take 𝜉𝑡 and 𝜁𝑡 to be independent and standard normal.
Set 𝛼0 = 0.00001, 𝛼1 = 0.1, 𝛽 = 0.9 and 𝜎0 = 0.
Compare visually with the Nasdaq Composite Index returns shown above.
While the time path differs, you should see bursts of high volatility.

Solution to Exercise 23.5.1


Here is one solution:

α_0 = 1e-5
α_1 = 0.1
β = 0.9

years = 15
days = years * 250

def garch_ts(ts_length=days):
    σ2 = 0
    r = np.zeros(ts_length)
    for t in range(ts_length-1):
        ξ = np.random.randn()
        σ2 = α_0 + σ2 * (α_1 * ξ**2 + β)
        r[t] = np.sqrt(σ2) * np.random.randn()
    return r

fig, ax = plt.subplots()

np.random.seed(12)

ax.plot(garch_ts(), alpha=0.7)

# The plotted series is the return r_t, not the variance
ax.set(xlabel='time', ylabel='returns')
plt.show()

Exercise 23.5.2
In our discussion of firm dynamics, it was claimed that (23.9) is more consistent with the empirical literature than Gibrat’s
law in (23.8).
(The empirical literature was reviewed immediately above (23.9).)
In what sense is this true (or false)?

Solution to Exercise 23.5.2


The empirical findings are that
1. small firms grow faster than large firms and
2. the growth rate of small firms is more volatile than that of large firms.
Also, Gibrat’s law is generally found to be a more reasonable approximation for large firms than for small firms.
The claim is that the dynamics in (23.9) are more consistent with points 1-2 than Gibrat’s law.
To see why, we rewrite (23.9) in terms of growth dynamics:
$$\frac{s_{t+1}}{s_t} = a_{t+1} + \frac{b_{t+1}}{s_t} \tag{23.10}$$

Taking 𝑠𝑡 = 𝑠 as given, the mean and variance of firm growth are

$$\mathbb{E} a + \frac{\mathbb{E} b}{s} \quad \text{and} \quad \mathbb{V} a + \frac{\mathbb{V} b}{s^2}$$


Both of these decline with firm size 𝑠, consistent with the data.
Moreover, the law of motion (23.10) clearly approaches Gibrat’s law (23.8) as 𝑠𝑡 gets large.
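The formulas for the mean and variance of growth can be checked with a short Monte Carlo sketch. The lognormal distributions and their parameters below are our own illustrative assumptions. The code draws the right hand side of (23.10) for a small, a medium and a large firm and reports the sample mean and variance of growth.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
a = np.exp(-0.05 + 0.1 * rng.standard_normal(N))   # assumed lognormal draws of a
b = np.exp(0.5 * rng.standard_normal(N))           # assumed lognormal draws of b

stats = {}
for s in (1.0, 10.0, 100.0):
    growth = a + b / s                              # right hand side of (23.10)
    stats[s] = (growth.mean(), growth.var())
    print(f"s = {s:6.1f}: mean = {stats[s][0]:.3f}, variance = {stats[s][1]:.4f}")
```

Both statistics should fall as s rises, and for large s the growth rate is driven almost entirely by a, which is Gibrat's law.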

Exercise 23.5.3
Consider an arbitrary Kesten process as given in (23.1).
Suppose that {𝑎𝑡 } is lognormal with parameters (𝜇, 𝜎).
In other words, each 𝑎𝑡 has the same distribution as exp(𝜇 + 𝜎𝑍) when 𝑍 is standard normal.
Suppose further that 𝔼𝜂𝑡𝑟 < ∞ for every 𝑟 > 0, as would be the case if, say, 𝜂𝑡 is also lognormal.
Show that the conditions of the Kesten–Goldie theorem are satisfied if and only if 𝜇 < 0.
Obtain the value of 𝛼 that makes the Kesten–Goldie conditions hold.

Solution to Exercise 23.5.3


Since 𝑎𝑡 has a density it is nonarithmetic.
Since 𝑎𝑡 has the same density as 𝑎 = exp(𝜇 + 𝜎𝑍) when 𝑍 is standard normal, we have

𝔼 ln 𝑎𝑡 = 𝔼(𝜇 + 𝜎𝑍) = 𝜇,

and since 𝜂𝑡 has finite moments of all orders, the stationarity condition holds if and only if 𝜇 < 0.
Given the properties of the lognormal distribution (which has finite moments of all orders), the only other condition in
doubt is the existence of a positive constant $\alpha$ such that $\mathbb{E} a_t^\alpha = 1$.

This is equivalent to the statement

$$\exp\left(\alpha \mu + \frac{\alpha^2 \sigma^2}{2}\right) = 1.$$

Solving for $\alpha$ gives $\alpha = -2\mu/\sigma^2$.
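The closed form can be confirmed numerically. The sketch below uses a hand-written bisection routine (our own helper, not a library call) to find the positive root of the log moment αμ + α²σ²/2, using the values μ = −0.5 and σ = 1 from the simulation earlier in the lecture.

```python
μ, σ = -0.5, 1.0

def log_moment(α):
    # ln E[a^α] when a = exp(μ + σ Z) and Z is standard normal
    return α * μ + 0.5 * α**2 * σ**2

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

α_numeric = bisect(log_moment, 0.1, 10.0)   # bracket excludes the trivial root α = 0
α_closed = -2 * μ / σ**2
print(α_numeric, α_closed)
```

The bracket starts away from zero because α = 0 always solves the equation trivially, while the theorem requires a strictly positive α.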

Exercise 23.5.4
One unrealistic aspect of the firm dynamics specified in (23.9) is that it ignores entry and exit.
In any given period and in any given market, we observe significant numbers of firms entering and exiting the market.
Empirical discussion of this can be found in a famous paper by Hugo Hopenhayn [Hopenhayn, 1992].
In the same paper, Hopenhayn builds a model of entry and exit that incorporates profit maximization by firms and market
clearing quantities, wages and prices.
In his model, a stationary equilibrium occurs when the number of entrants equals the number of exiting firms.
In this setting, firm dynamics can be expressed as

$$s_{t+1} = e_{t+1} \mathbb{1}\{s_t < \bar{s}\} + (a_{t+1} s_t + b_{t+1}) \mathbb{1}\{s_t \geq \bar{s}\} \tag{23.11}$$

Here
• the state variable 𝑠𝑡 represents productivity (which is a proxy for output and hence firm size),
• the IID sequence {𝑒𝑡 } is thought of as a productivity draw for a new entrant and

• the variable $\bar s$ is a threshold value that we take as given, although it is determined endogenously in Hopenhayn’s model.
The idea behind (23.11) is that firms stay in the market as long as their productivity $s_t$ remains at or above $\bar s$.
• In this case, their productivity updates according to (23.9).
Firms choose to exit when their productivity $s_t$ falls below $\bar s$.
• In this case, they are replaced by a new firm with productivity $e_{t+1}$.
What can we say about dynamics?
Although (23.11) is not a Kesten process, it does update in the same way as a Kesten process when 𝑠𝑡 is large.
So perhaps its stationary distribution still has Pareto tails?
Your task is to investigate this question via simulation and rank-size plots.
The approach will be to
1. generate 𝑀 draws of 𝑠𝑇 when 𝑀 and 𝑇 are large and
2. plot the largest 1,000 of the resulting draws in a rank-size plot.
(The distribution of 𝑠𝑇 will be close to the stationary distribution when 𝑇 is large.)
In the simulation, assume that
• each of 𝑎𝑡 , 𝑏𝑡 and 𝑒𝑡 is lognormal,
• the parameters are

μ_a = -0.5 # location parameter for a


σ_a = 0.1 # scale parameter for a
μ_b = 0.0 # location parameter for b
σ_b = 0.5 # scale parameter for b
μ_e = 0.0 # location parameter for e
σ_e = 0.5 # scale parameter for e
s_bar = 1.0 # threshold
T = 500 # sampling date
M = 1_000_000 # number of firms
s_init = 1.0 # initial condition for each firm

Solution to Exercise 23.5.4


Here’s one solution. First we generate the observations:

import numpy as np
from numba import njit, prange
from numpy.random import randn

@njit(parallel=True)
def generate_draws(μ_a=-0.5,
                   σ_a=0.1,
                   μ_b=0.0,
                   σ_b=0.5,
                   μ_e=0.0,
                   σ_e=0.5,
                   s_bar=1.0,
                   T=500,
                   M=1_000_000,
                   s_init=1.0):

    draws = np.empty(M)
    for m in prange(M):
        s = s_init
        for t in range(T):
            if s < s_bar:
                # Firm exits and is replaced by a new entrant
                new_s = np.exp(μ_e + σ_e * randn())
            else:
                # Incumbent updates via the Kesten-style rule
                a = np.exp(μ_a + σ_a * randn())
                b = np.exp(μ_b + σ_b * randn())
                new_s = a * s + b
            s = new_s
        draws[m] = s

    return draws

data = generate_draws()

Now we produce the rank-size plot:

fig, ax = plt.subplots()

rank_data, size_data = qe.rank_size(data, c=0.01)


ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")

plt.show()

The plot produces a straight line, consistent with a Pareto tail.
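To put a rough number on "straight line", one option is to fit a line to the log rank-size data: for a Pareto tail with index $a$, the slope of log size against log rank is approximately $-1/a$. The sketch below illustrates this on synthetic Pareto draws rather than on the simulated firm sizes; the sample, the seed, the 1% cutoff and the use of `np.polyfit` are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1234)
a = 2.0                                         # true tail index (assumed for the demo)
n = 100_000
sample = (1 - rng.uniform(size=n)) ** (-1 / a)  # Pareto draws via inverse transform

# Keep the largest 1% of observations, sorted from largest to smallest
k = int(0.01 * n)
size_data = np.sort(sample)[::-1][:k]
rank_data = np.arange(1, k + 1)

# OLS fit of log size on log rank; the slope should be near -1/a
slope, intercept = np.polyfit(np.log(rank_data), np.log(size_data), 1)
a_hat = -1 / slope
print(slope, a_hat)
```

The same fit could be applied to the `rank_data` and `size_data` arrays produced above, bearing in mind that an OLS slope is only a crude tail-index estimate.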


CHAPTER

TWENTYFOUR

WEALTH DISTRIBUTION DYNAMICS

Contents

• Wealth Distribution Dynamics


– Overview
– Lorenz Curves and the Gini Coefficient
– A Model of Wealth Dynamics
– Implementation
– Applications
– Exercises

See also:
A version of this lecture using a GPU is available here
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

24.1 Overview

This notebook gives an introduction to wealth distribution dynamics, with a focus on


• modeling and computing the wealth distribution via simulation,
• measures of inequality such as the Lorenz curve and Gini coefficient, and
• how inequality is affected by the properties of wage income and returns on assets.
One interesting property of the wealth distribution we discuss is Pareto tails.
The wealth distribution in many countries exhibits a Pareto tail
• See this lecture for a definition.
• For a review of the empirical evidence, see, for example, [Benhabib and Bisin, 2018].
This is consistent with high concentration of wealth amongst the richest households.
It also gives us a way to quantify such concentration, in terms of the tail index.


One question of interest is whether or not we can replicate Pareto tails from a relatively simple model.

24.1.1 A Note on Assumptions

The evolution of wealth for any given household depends on their savings behavior.
Modeling such behavior will form an important part of this lecture series.
However, in this particular lecture, we will be content with rather ad hoc (but plausible) savings rules.
We do this to more easily explore the implications of different specifications of income dynamics and investment returns.
At the same time, all of the techniques discussed here can be plugged into models that use optimization to obtain savings
rules.
We will use the following imports.

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import quantecon as qe
from numba import njit, float64, prange
from numba.experimental import jitclass

24.2 Lorenz Curves and the Gini Coefficient

Before we investigate wealth dynamics, we briefly review some measures of inequality.

24.2.1 Lorenz Curves

One popular graphical measure of inequality is the Lorenz curve.


The package QuantEcon.py, already imported above, contains a function to compute Lorenz curves.
To illustrate, suppose that

n = 10_000 # size of sample


w = np.exp(np.random.randn(n)) # lognormal draws

is data representing the wealth of 10,000 households.


We can compute and plot the Lorenz curve as follows:

f_vals, l_vals = qe.lorenz_curve(w)

fig, ax = plt.subplots()
ax.plot(f_vals, l_vals, label='Lorenz curve, lognormal sample')
ax.plot(f_vals, f_vals, label='Lorenz curve, equality')
ax.legend()
plt.show()


This curve can be understood as follows: if point (𝑥, 𝑦) lies on the curve, it means that, collectively, the bottom (100𝑥)%
of the population holds (100𝑦)% of the wealth.
The “equality” line is the 45 degree line (which might not be exactly 45 degrees in the figure, depending on the aspect
ratio).
A sample that produces this line exhibits perfect equality.
The other line in the figure is the Lorenz curve for the lognormal sample, which deviates significantly from perfect equality.
For example, the bottom 80% of the population holds around 40% of total wealth.
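To make the definition concrete, here is a minimal NumPy sketch that builds Lorenz curve points by hand for a tiny sample. The helper name `lorenz_points` and the four-household example are hypothetical, and this is not claimed to be how `qe.lorenz_curve` is implemented.

```python
import numpy as np

def lorenz_points(w):
    """Return (f_vals, l_vals) for a 1-D array of non-negative wealth values."""
    w_sorted = np.sort(w)                      # poorest household first
    n = len(w_sorted)
    cum_wealth = np.cumsum(w_sorted)
    f_vals = np.arange(n + 1) / n              # population shares 0, 1/n, ..., 1
    l_vals = np.concatenate(([0.0], cum_wealth / cum_wealth[-1]))  # wealth shares
    return f_vals, l_vals

w_example = np.array([1.0, 1.0, 2.0, 6.0])     # four households, total wealth 10
f_vals_ex, l_vals_ex = lorenz_points(w_example)
print(f_vals_ex)   # population shares
print(l_vals_ex)   # cumulative wealth shares
```

In this toy sample the bottom 75% of households hold 40% of total wealth, which is the point (0.75, 0.4) on the curve.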
Here is another example, which shows how the Lorenz curve shifts as the underlying distribution changes.
We generate 10,000 observations using the Pareto distribution with a range of parameters, and then compute the Lorenz
curve corresponding to each set of observations.

a_vals = (1, 2, 5) # Pareto tail index


n = 10_000 # size of each sample
fig, ax = plt.subplots()
for a in a_vals:
    u = np.random.uniform(size=n)
    y = u**(-1/a)  # distributed as Pareto with tail index a
    f_vals, l_vals = qe.lorenz_curve(y)
    ax.plot(f_vals, l_vals, label=f'$a = {a}$')
ax.plot(f_vals, f_vals, label='equality')
ax.legend()
plt.show()


You can see that, as the tail parameter of the Pareto distribution increases, inequality decreases.
This is to be expected, because a higher tail index implies less weight in the tail of the Pareto distribution.

24.2.2 The Gini Coefficient

The definition and interpretation of the Gini coefficient can be found on the corresponding Wikipedia page.
A value of 0 indicates perfect equality (corresponding to the case where the Lorenz curve matches the 45 degree line) and
a value of 1 indicates complete inequality (all wealth held by the richest household).
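To make these two endpoints concrete, the following sketch computes the Gini coefficient from the standard sorted-sample formula $G = \frac{2 \sum_i i \, y_{(i)}}{n \sum_i y_i} - \frac{n+1}{n}$. The function name `gini_sketch` is hypothetical and the code is an illustration under that formula, not a description of the library routine.

```python
import numpy as np

def gini_sketch(y):
    """Gini coefficient of a 1-D sample via the sorted-sample formula."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    i = np.arange(1, n + 1)                    # ranks 1, ..., n
    return 2 * np.sum(i * y) / (n * np.sum(y)) - (n + 1) / n

equal = gini_sketch(np.ones(100))                         # everyone holds the same wealth
concentrated = gini_sketch(np.append(np.zeros(99), 1.0))  # one household holds everything
print(equal, concentrated)
```

With a finite sample of size $n$, the fully concentrated case gives $(n-1)/n$ rather than exactly 1, so the upper endpoint is reached only in the limit.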
The QuantEcon.py library contains a function to calculate the Gini coefficient.
We can test it on the Weibull distribution with parameter 𝑎, where the Gini coefficient is known to be

$$ G = 1 - 2^{-1/a} $$

Let’s see if the Gini coefficient computed from a simulated sample matches this at each fixed value of 𝑎.

a_vals = range(1, 20)


ginis = []
ginis_theoretical = []
n = 100

fig, ax = plt.subplots()
for a in a_vals:
    y = np.random.weibull(a, size=n)
    ginis.append(qe.gini_coefficient(y))
    ginis_theoretical.append(1 - 2**(-1/a))
ax.plot(a_vals, ginis, label='estimated gini coefficient')
ax.plot(a_vals, ginis_theoretical, label='theoretical gini coefficient')
ax.legend()
ax.set_xlabel("Weibull parameter $a$")
ax.set_ylabel("Gini coefficient")
plt.show()


The simulation shows that the fit is good.

24.3 A Model of Wealth Dynamics

Having discussed inequality measures, let us now turn to wealth dynamics.


The model we will study is

$$ w_{t+1} = (1 + r_{t+1}) s(w_t) + y_{t+1} \tag{24.1} $$

where
• 𝑤𝑡 is wealth at time 𝑡 for a given household,
• 𝑟𝑡 is the rate of return of financial assets,
• 𝑦𝑡 is current non-financial (e.g., labor) income and
• 𝑠(𝑤𝑡 ) is current wealth net of consumption
Letting $\{z_t\}$ be a correlated state process of the form

$$ z_{t+1} = a z_t + b + \sigma_z \epsilon_{t+1} $$

we’ll assume that

$$ R_t := 1 + r_t = c_r \exp(z_t) + \exp(\mu_r + \sigma_r \xi_t) $$

and

$$ y_t = c_y \exp(z_t) + \exp(\mu_y + \sigma_y \zeta_t) $$

Here $\{(\epsilon_t, \xi_t, \zeta_t)\}$ is IID and standard normal in $\mathbb{R}^3$.


The value of 𝑐𝑟 should be close to zero, since rates of return on assets do not exhibit large trends.
When we simulate a population of households, we will assume all shocks are idiosyncratic (i.e., specific to individual
households and independent across them).
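Because the state process is a Gaussian AR(1), it has a stationary distribution with mean $b/(1-a)$ and variance $\sigma_z^2/(1-a^2)$ whenever $|a| < 1$. The sketch below checks these two formulas by simulation; the parameter values match the defaults used later in this lecture, while the seed, the number of paths and the number of steps are assumptions chosen for the illustration.

```python
import numpy as np

a, b, σ_z = 0.5, 0.0, 0.1        # defaults used later in the lecture
z_mean = b / (1 - a)             # stationary mean
z_var = σ_z**2 / (1 - a**2)      # stationary variance

rng = np.random.default_rng(42)
n_paths, n_steps = 200_000, 50
z = np.zeros(n_paths)            # start every path at zero
for _ in range(n_steps):
    z = a * z + b + σ_z * rng.standard_normal(n_paths)

print(z_mean, z.mean())
print(z_var, z.var())
```

Since $\exp(z_t)$ is then lognormal, its stationary mean is $\exp(\text{z\_mean} + \text{z\_var}/2)$, which is the quantity that enters the means of $R_t$ and $y_t$.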


Regarding the savings function 𝑠, our default model will be

$$ s(w) = s_0 w \cdot \mathbb{1}\{w \geq \hat w\} \tag{24.2} $$

where $s_0$ is a positive constant.

Thus, for $w < \hat w$, the household saves nothing. For $w \geq \hat w$, the household saves a fraction $s_0$ of their wealth.
We are using something akin to a fixed savings rate model, while acknowledging that low wealth households tend to save
very little.
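As a small illustration of (24.2), the savings rule can be written as a vectorized function. The name `savings` is hypothetical, and the default arguments simply mirror the default values of `s_0` and `w_hat` used in the implementation section.

```python
import numpy as np

def savings(w, s_0=0.75, w_hat=1.0):
    """Savings rule (24.2): save the fraction s_0 of wealth when w >= w_hat, else nothing."""
    w = np.asarray(w, dtype=float)
    return np.where(w >= w_hat, s_0 * w, 0.0)

print(savings([0.5, 1.0, 4.0]))
```

Note that the threshold itself belongs to the saving region, since the indicator in (24.2) uses a weak inequality.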

24.4 Implementation

Here’s some type information to help Numba.

wealth_dynamics_data = [
    ('w_hat', float64),    # savings parameter
    ('s_0', float64),      # savings parameter
    ('c_y', float64),      # labor income parameter
    ('μ_y', float64),      # labor income parameter
    ('σ_y', float64),      # labor income parameter
    ('c_r', float64),      # rate of return parameter
    ('μ_r', float64),      # rate of return parameter
    ('σ_r', float64),      # rate of return parameter
    ('a', float64),        # aggregate shock parameter
    ('b', float64),        # aggregate shock parameter
    ('σ_z', float64),      # aggregate shock parameter
    ('z_mean', float64),   # mean of z process
    ('z_var', float64),    # variance of z process
    ('y_mean', float64),   # mean of y process
    ('R_mean', float64)    # mean of R process
]

Here’s a class that stores instance data and implements methods that update the aggregate state and household wealth.

@jitclass(wealth_dynamics_data)
class WealthDynamics:

    def __init__(self,
                 w_hat=1.0,
                 s_0=0.75,
                 c_y=1.0,
                 μ_y=1.0,
                 σ_y=0.2,
                 c_r=0.05,
                 μ_r=0.1,
                 σ_r=0.5,
                 a=0.5,
                 b=0.0,
                 σ_z=0.1):

        self.w_hat, self.s_0 = w_hat, s_0
        self.c_y, self.μ_y, self.σ_y = c_y, μ_y, σ_y
        self.c_r, self.μ_r, self.σ_r = c_r, μ_r, σ_r
        self.a, self.b, self.σ_z = a, b, σ_z

        # Record stationary moments
        self.z_mean = b / (1 - a)
        self.z_var = σ_z**2 / (1 - a**2)
        exp_z_mean = np.exp(self.z_mean + self.z_var / 2)
        self.R_mean = c_r * exp_z_mean + np.exp(μ_r + σ_r**2 / 2)
        self.y_mean = c_y * exp_z_mean + np.exp(μ_y + σ_y**2 / 2)

        # Test a stability condition that ensures wealth does not diverge
        # to infinity.
        α = self.R_mean * self.s_0
        if α >= 1:
            raise ValueError("Stability condition failed.")

    def parameters(self):
        """
        Collect and return parameters.
        """
        parameters = (self.w_hat, self.s_0,
                      self.c_y, self.μ_y, self.σ_y,
                      self.c_r, self.μ_r, self.σ_r,
                      self.a, self.b, self.σ_z)
        return parameters

    def update_states(self, w, z):
        """
        Update one period, given current wealth w and persistent
        state z.
        """

        # Simplify names
        params = self.parameters()
        w_hat, s_0, c_y, μ_y, σ_y, c_r, μ_r, σ_r, a, b, σ_z = params
        zp = a * z + b + σ_z * np.random.randn()

        # Update wealth
        y = c_y * np.exp(zp) + np.exp(μ_y + σ_y * np.random.randn())
        wp = y
        if w >= w_hat:
            R = c_r * np.exp(zp) + np.exp(μ_r + σ_r * np.random.randn())
            wp += R * s_0 * w
        return wp, zp

Here’s a function to simulate the time series of wealth for an individual household.

@njit
def wealth_time_series(wdy, w_0, n):
    """
    Generate a single time series of length n for wealth given
    initial value w_0.

    The initial persistent state z_0 for each household is drawn from
    the stationary distribution of the AR(1) process.

        * wdy: an instance of WealthDynamics
        * w_0: scalar
        * n: int

    """
    z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
    w = np.empty(n)
    w[0] = w_0
    for t in range(n-1):
        w[t+1], z = wdy.update_states(w[t], z)
    return w

Now here’s a function to simulate a cross section of households forward in time.


Note the use of parallelization to speed up computation.

@njit(parallel=True)
def update_cross_section(wdy, w_distribution, shift_length=500):
    """
    Shifts a cross-section of households forward in time

    * wdy: an instance of WealthDynamics
    * w_distribution: array_like, represents current cross-section

    Takes a current distribution of wealth values as w_distribution
    and updates each w_t in w_distribution to w_{t+j}, where
    j = shift_length.

    Returns the new distribution.

    """
    new_distribution = np.empty_like(w_distribution)

    # Update each household
    for i in prange(len(new_distribution)):
        z = wdy.z_mean + np.sqrt(wdy.z_var) * np.random.randn()
        w = w_distribution[i]
        for t in range(shift_length-1):
            w, z = wdy.update_states(w, z)
        new_distribution[i] = w
    return new_distribution

Parallelization is very effective in the function above because the time path of each household can be calculated independently once the path for the aggregate state is known.


24.5 Applications

Let’s try simulating the model at different parameter values and investigate the implications for the wealth distribution.

24.5.1 Time Series

Let’s look at the wealth dynamics of an individual household.

wdy = WealthDynamics()
ts_length = 200
w = wealth_time_series(wdy, wdy.y_mean, ts_length)

fig, ax = plt.subplots()
ax.plot(w)
plt.show()

Notice the large spikes in wealth over time.


Such spikes are similar to what we observed in time series when we studied Kesten processes.

24.5.2 Inequality Measures

Let’s look at how inequality varies with returns on financial assets.


The next function generates a cross section and then computes the Lorenz curve and Gini coefficient.

def generate_lorenz_and_gini(wdy, num_households=100_000, T=500):
    """
    Generate the Lorenz curve data and Gini coefficient corresponding to a
    WealthDynamics model by simulating num_households forward to time T.
    """
    ψ_0 = np.full(num_households, wdy.y_mean)
    z_0 = wdy.z_mean

    ψ_star = update_cross_section(wdy, ψ_0, shift_length=T)
    return qe.gini_coefficient(ψ_star), qe.lorenz_curve(ψ_star)

Now we investigate how the Lorenz curves associated with the wealth distribution change as return to savings varies.
The code below plots Lorenz curves for three different values of 𝜇𝑟 .
If you are running this yourself, note that it will take one or two minutes to execute.
This is unavoidable because we are executing a CPU intensive task.
In fact the code, which is JIT compiled and parallelized, runs extremely fast relative to the number of computations.

%%time

fig, ax = plt.subplots()
μ_r_vals = (0.0, 0.025, 0.05)
gini_vals = []

for μ_r in μ_r_vals:
    wdy = WealthDynamics(μ_r=μ_r)
    gv, (f_vals, l_vals) = generate_lorenz_and_gini(wdy)
    # Raw f-string so that the LaTeX backslashes are not read as escapes
    ax.plot(f_vals, l_vals, label=rf'$\psi^*$ at $\mu_r = {μ_r:0.2}$')
    gini_vals.append(gv)

ax.plot(f_vals, f_vals, label='equality')


ax.legend(loc="upper left")
plt.show()

CPU times: user 1min 29s, sys: 96.4 ms, total: 1min 29s
Wall time: 12.3 s

The Lorenz curve shifts downwards as returns on financial income rise, indicating a rise in inequality.
We will look at this again via the Gini coefficient immediately below, but first consider the following image of our system
resources when the code above is executing:


Since the code is both efficiently JIT compiled and fully parallelized, it’s close to impossible to make this sequence of
tasks run faster without changing hardware.
Now let’s check the Gini coefficient.

fig, ax = plt.subplots()
ax.plot(μ_r_vals, gini_vals, label='gini coefficient')
ax.set_xlabel(r"$\mu_r$")
ax.legend()
plt.show()

Once again, we see that inequality increases as returns on financial income rise.
Let’s finish this section by investigating what happens when we change the volatility term 𝜎𝑟 in financial returns.

%%time

fig, ax = plt.subplots()
σ_r_vals = (0.35, 0.45, 0.52)
gini_vals = []

for σ_r in σ_r_vals:
    wdy = WealthDynamics(σ_r=σ_r)
    gv, (f_vals, l_vals) = generate_lorenz_and_gini(wdy)
    # Raw f-string so that the LaTeX backslashes are not read as escapes
    ax.plot(f_vals, l_vals, label=rf'$\psi^*$ at $\sigma_r = {σ_r:0.2}$')
    gini_vals.append(gv)

ax.plot(f_vals, f_vals, label='equality')


ax.legend(loc="upper left")
plt.show()


CPU times: user 1min 28s, sys: 23.3 ms, total: 1min 28s
Wall time: 11.4 s

We see that greater volatility has the effect of increasing inequality in this model.

24.6 Exercises

Exercise 24.6.1
For a wealth or income distribution with Pareto tail, a higher tail index suggests lower inequality.
Indeed, it is possible to prove that the Gini coefficient of the Pareto distribution with tail index 𝑎 is 1/(2𝑎 − 1).
To the extent that you can, confirm this by simulation.
In particular, generate a plot of the Gini coefficient against the tail index using both the theoretical value just given and
the value computed from a sample via qe.gini_coefficient.
For the values of the tail index, use a_vals = np.linspace(1, 10, 25).
Use a sample of size 1,000 for each 𝑎 and the sampling method for generating Pareto draws employed in the discussion of
Lorenz curves for the Pareto distribution.
To the extent that you can, interpret the monotone relationship between the Gini index and 𝑎.

Solution to Exercise 24.6.1


Here is one solution, which produces a good match between theory and simulation.

a_vals = np.linspace(1, 10, 25) # Pareto tail index


ginis = np.empty_like(a_vals)

n = 1000 # size of each sample


fig, ax = plt.subplots()
for i, a in enumerate(a_vals):
    y = np.random.uniform(size=n)**(-1/a)
    ginis[i] = qe.gini_coefficient(y)
ax.plot(a_vals, ginis, label='sampled')
ax.plot(a_vals, 1/(2*a_vals - 1), label='theoretical')
ax.legend()
plt.show()

In general, for a Pareto distribution, a higher tail index implies less weight in the right hand tail.
This means less extreme values for wealth and hence more equality.
More equality translates to a lower Gini index.

Exercise 24.6.2
The wealth process (24.1) is similar to a Kesten process.
This is because, according to (24.2), the savings rate is constant for all wealth levels above $\hat w$.
When the savings rate is constant, the wealth process has the same quasi-linear structure as a Kesten process, with multiplicative and additive shocks.
The Kesten–Goldie theorem tells us that Kesten processes have Pareto tails under a range of parameterizations.
The theorem does not directly apply here, since the savings rate is not always constant and since the multiplicative and additive terms in (24.1) are not IID.
At the same time, given the similarities, perhaps Pareto tails will arise.
To test this, run a simulation that generates a cross-section of wealth and generate a rank-size plot.
If you like, you can use the function rank_size from the quantecon library (documentation here).
In viewing the plot, remember that Pareto tails generate a straight line. Is this what you see?
For sample size and initial conditions, use


num_households = 250_000
T = 500 # shift forward T periods
ψ_0 = np.full(num_households, wdy.y_mean) # initial distribution
z_0 = wdy.z_mean

Solution to Exercise 24.6.2


First let’s generate the distribution:

num_households = 250_000
T = 500 # how far to shift forward in time
wdy = WealthDynamics()
ψ_0 = np.full(num_households, wdy.y_mean)
z_0 = wdy.z_mean

ψ_star = update_cross_section(wdy, ψ_0, shift_length=T)

Now let’s see the rank-size plot:

fig, ax = plt.subplots()

rank_data, size_data = qe.rank_size(ψ_star, c=0.001)


ax.loglog(rank_data, size_data, 'o', markersize=3.0, alpha=0.5)
ax.set_xlabel("log rank")
ax.set_ylabel("log size")

plt.show()


CHAPTER

TWENTYFIVE

A FIRST LOOK AT THE KALMAN FILTER

Contents

• A First Look at the Kalman Filter


– Overview
– The Basic Idea
– Convergence
– Implementation
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

25.1 Overview

This lecture provides a simple and intuitive introduction to the Kalman filter, for those who either
• have heard of the Kalman filter but don’t know how it works, or
• know the Kalman filter equations, but don’t know where they come from
For additional (more advanced) reading on the Kalman filter, see
• [Ljungqvist and Sargent, 2018], section 2.7
• [Anderson and Moore, 2005]
The second reference presents a comprehensive treatment of the Kalman filter.
Required knowledge: Familiarity with matrix manipulations, multivariate normal distributions, covariance matrices, etc.
We’ll need the following imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
from scipy import linalg
import numpy as np
import matplotlib.cm as cm

from quantecon import Kalman, LinearStateSpace
from scipy.stats import norm
from scipy.integrate import quad
from scipy.linalg import eigvals

25.2 The Basic Idea

The Kalman filter has many applications in economics, but for now let’s pretend that we are rocket scientists.
A missile has been launched from country Y and our mission is to track it.
Let $x \in \mathbb{R}^2$ denote the current location of the missile, a pair indicating latitude-longitude coordinates on a map.
At the present moment in time, the precise location 𝑥 is unknown, but we do have some beliefs about 𝑥.
One way to summarize our knowledge is a point prediction 𝑥̂
• But what if the President wants to know the probability that the missile is currently over the Sea of Japan?
• Then it is better to summarize our initial beliefs with a bivariate probability density 𝑝
– $\int_E p(x) \, dx$ indicates the probability that we attach to the missile being in region $E$.
The density 𝑝 is called our prior for the random variable 𝑥.
To keep things tractable in our example, we assume that our prior is Gaussian.
In particular, we take

$$ p = N(\hat x, \Sigma) \tag{25.1} $$

where $\hat x$ is the mean of the distribution and $\Sigma$ is a $2 \times 2$ covariance matrix. In our simulations, we will suppose that

$$ \hat x = \begin{pmatrix} 0.2 \\ -0.2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 0.4 & 0.3 \\ 0.3 & 0.45 \end{pmatrix} \tag{25.2} $$

This density $p(x)$ is shown below as a contour map, with the center of the red ellipse being equal to $\hat x$.

# Set up the Gaussian prior density p


Σ = [[0.4, 0.3], [0.3, 0.45]]
Σ = np.matrix(Σ)
x_hat = np.matrix([0.2, -0.2]).T
# Define the matrices G and R from the equation y = G x + N(0, R)
G = [[1, 0], [0, 1]]
G = np.matrix(G)
R = 0.5 * Σ
# The matrices A and Q
A = [[1.2, 0], [0, -0.2]]
A = np.matrix(A)
Q = 0.3 * Σ
# The observed value of y
y = np.matrix([2.3, -1.9]).T

# Set up grid for plotting


x_grid = np.linspace(-1.5, 2.9, 100)
y_grid = np.linspace(-3.1, 1.7, 100)
X, Y = np.meshgrid(x_grid, y_grid)

def bivariate_normal(x, y, σ_x=1.0, σ_y=1.0, μ_x=0.0, μ_y=0.0, σ_xy=0.0):
    """
    Compute and return the probability density function of bivariate normal
    distribution of normal random variables x and y

    Parameters
    ----------
    x : array_like(float)
        Random variable

    y : array_like(float)
        Random variable

    σ_x : array_like(float)
          Standard deviation of random variable x

    σ_y : array_like(float)
          Standard deviation of random variable y

    μ_x : scalar(float)
          Mean value of random variable x

    μ_y : scalar(float)
          Mean value of random variable y

    σ_xy : array_like(float)
           Covariance of random variables x and y

    """

    x_μ = x - μ_x
    y_μ = y - μ_y

    ρ = σ_xy / (σ_x * σ_y)
    z = x_μ**2 / σ_x**2 + y_μ**2 / σ_y**2 - 2 * ρ * x_μ * y_μ / (σ_x * σ_y)
    denom = 2 * np.pi * σ_x * σ_y * np.sqrt(1 - ρ**2)
    return np.exp(-z / (2 * (1 - ρ**2))) / denom

def gen_gaussian_plot_vals(μ, C):
    "Z values for plotting the bivariate Gaussian N(μ, C)"
    # μ is a 2 x 1 np.matrix, so index both axes to extract scalars
    m_x, m_y = float(μ[0, 0]), float(μ[1, 0])
    s_x, s_y = np.sqrt(C[0, 0]), np.sqrt(C[1, 1])
    s_xy = C[0, 1]
    return bivariate_normal(X, Y, s_x, s_y, m_x, m_y, s_xy)

# Plot the figure

fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)

plt.show()


25.2.1 The Filtering Step

We are now presented with some good news and some bad news.
The good news is that the missile has been located by our sensors, which report that the current location is 𝑦 = (2.3, −1.9).
The next figure shows the original prior 𝑝(𝑥) and the new reported location 𝑦

fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
ax.contourf(X, Y, Z, 6, alpha=0.6, cmap=cm.jet)
cs = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs, inline=1, fontsize=10)
ax.text(float(y[0, 0]), float(y[1, 0]), "$y$", fontsize=20, color="black")

plt.show()


The bad news is that our sensors are imprecise.


In particular, we should interpret the output of our sensor not as 𝑦 = 𝑥, but rather as

$$ y = G x + v, \quad \text{where} \quad v \sim N(0, R) \tag{25.3} $$

Here 𝐺 and 𝑅 are 2 × 2 matrices with 𝑅 positive definite. Both are assumed known, and the noise term 𝑣 is assumed to
be independent of 𝑥.


How then should we combine our prior $p(x) = N(\hat x, \Sigma)$ and this new information $y$ to improve our understanding of the location of the missile?
As you may have guessed, the answer is to use Bayes’ theorem, which tells us to update our prior $p(x)$ to $p(x \mid y)$ via

$$ p(x \mid y) = \frac{p(y \mid x) \, p(x)}{p(y)} $$

where $p(y) = \int p(y \mid x) \, p(x) \, dx$.
In solving for $p(x \mid y)$, we observe that
• $p(x) = N(\hat x, \Sigma)$.
• In view of (25.3), the conditional density $p(y \mid x)$ is $N(Gx, R)$.
• 𝑝(𝑦) does not depend on 𝑥, and enters into the calculations only as a normalizing constant.
Because we are in a linear and Gaussian framework, the updated density can be computed by calculating population linear
regressions.
In particular, the solution is known¹ to be

$$ p(x \mid y) = N(\hat x^F, \Sigma^F) $$

where

$$ \hat x^F := \hat x + \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat x) \quad \text{and} \quad \Sigma^F := \Sigma - \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma \tag{25.4} $$

Here $\Sigma G' (G \Sigma G' + R)^{-1}$ is the matrix of population regression coefficients of the hidden object $x - \hat x$ on the surprise $y - G \hat x$.
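The update in (25.4) can be sketched with plain NumPy arrays and the `@` operator, as an alternative to `np.matrix`. The variable names below (`prior_mean`, `prior_cov`, `G_obs`, `R_obs`, `y_obs`) are illustrative choices, not names used elsewhere in this lecture; the numerical values are the ones given in the text.

```python
import numpy as np

# Prior N(x_hat, Σ) from (25.2), observation model y = G x + v with v ~ N(0, R)
prior_mean = np.array([0.2, -0.2])
prior_cov = np.array([[0.4, 0.3], [0.3, 0.45]])
G_obs = np.eye(2)
R_obs = 0.5 * prior_cov
y_obs = np.array([2.3, -1.9])

# M = Σ G' (G Σ G' + R)^{-1}, computed with a linear solve instead of an explicit
# inverse (valid here because Σ and G Σ G' + R are symmetric)
S = G_obs @ prior_cov @ G_obs.T + R_obs
M = np.linalg.solve(S, G_obs @ prior_cov).T

post_mean = prior_mean + M @ (y_obs - G_obs @ prior_mean)   # filtered mean
post_cov = prior_cov - M @ G_obs @ prior_cov                # filtered covariance

print(post_mean)
print(post_cov)
```

With $G = I$ and $R = 0.5 \Sigma$ the regression matrix reduces to $(2/3) I$, so the filtered mean moves two thirds of the way from $\hat x$ to $y$ and the covariance shrinks to $\Sigma / 3$.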
This new density $p(x \mid y) = N(\hat x^F, \Sigma^F)$ is shown in the next figure via contour lines and the color map.
The original density is left in as contour lines for comparison

fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
new_Z = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0, 0]), float(y[1, 0]), "$y$", fontsize=20, color="black")

plt.show()

¹ See, for example, page 93 of [Bishop, 2006]. To get from his expressions to the ones used above, you will also need to apply the Woodbury matrix identity.


Our new density twists the prior $p(x)$ in a direction determined by the new information $y - G \hat x$.
In generating the figure, we set 𝐺 to the identity matrix and 𝑅 = 0.5Σ for Σ defined in (25.2).

25.2.2 The Forecast Step

What have we achieved so far?


We have obtained probabilities for the current location of the state (missile) given prior and current information.
This is called “filtering” rather than forecasting because we are filtering out noise rather than looking into the future.
• $p(x \mid y) = N(\hat x^F, \Sigma^F)$ is called the filtering distribution
But now let’s suppose that we are given another task: to predict the location of the missile after one unit of time (whatever
that may be) has elapsed.
To do this we need a model of how the state evolves.


Let’s suppose that we have one, and that it’s linear and Gaussian. In particular,

$$ x_{t+1} = A x_t + w_{t+1}, \quad \text{where} \quad w_t \sim N(0, Q) \tag{25.5} $$

Our aim is to combine this law of motion and our current distribution $p(x \mid y) = N(\hat x^F, \Sigma^F)$ to come up with a new predictive distribution for the location in one unit of time.
In view of (25.5), all we have to do is introduce a random vector $x^F \sim N(\hat x^F, \Sigma^F)$ and work out the distribution of $A x^F + w$ where $w$ is independent of $x^F$ and has distribution $N(0, Q)$.
Since linear combinations of Gaussians are Gaussian, $A x^F + w$ is Gaussian.
Elementary calculations and the expressions in (25.4) tell us that

$$ \mathbb{E}[A x^F + w] = A \mathbb{E} x^F + \mathbb{E} w = A \hat x^F = A \hat x + A \Sigma G' (G \Sigma G' + R)^{-1} (y - G \hat x) $$

and

$$ \operatorname{Var}[A x^F + w] = A \operatorname{Var}[x^F] A' + Q = A \Sigma^F A' + Q = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q $$

The matrix $A \Sigma G' (G \Sigma G' + R)^{-1}$ is often written as $K_\Sigma$ and called the Kalman gain.
• The subscript $\Sigma$ has been added to remind us that $K_\Sigma$ depends on $\Sigma$, but not $y$ or $\hat x$.
Using this notation, we can summarize our results as follows.
Our updated prediction is the density 𝑁 (𝑥𝑛𝑒𝑤
̂ , Σ𝑛𝑒𝑤 ) where

𝑥𝑛𝑒𝑤
̂ ∶= 𝐴𝑥̂ + 𝐾Σ (𝑦 − 𝐺𝑥)̂
Σ𝑛𝑒𝑤 ∶= 𝐴Σ𝐴′ − 𝐾Σ 𝐺Σ𝐴′ + 𝑄

• The density 𝑝𝑛𝑒𝑤 (𝑥) = 𝑁 (𝑥𝑛𝑒𝑤


̂ , Σ𝑛𝑒𝑤 ) is called the predictive distribution
The predictive distribution is the new density shown in the following figure, where the update has used parameters.

1.2 0.0
𝐴=( ), 𝑄 = 0.3 ∗ Σ
0.0 −0.2
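As a rough numerical check of these formulas, the sketch below (plain NumPy, not the lecture's own code) computes the predictive moments in two ways: by filtering and then forecasting, and directly through the Kalman gain. The function names are hypothetical, and the values of x̂, Σ, 𝑦 and 𝑅 are illustrative assumptions; only 𝐴, 𝐺 = 𝐼 and 𝑄 = 0.3Σ follow the parameters stated in the text.

```python
import numpy as np

def filter_then_forecast(x_hat, Sigma, y, A, G, Q, R):
    """Two-stage route: filtering distribution first, then one-step forecast."""
    M = Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)
    x_hat_F = x_hat + M @ (y - G @ x_hat)      # filtering mean
    Sigma_F = Sigma - M @ G @ Sigma            # filtering covariance
    return A @ x_hat_F, A @ Sigma_F @ A.T + Q

def kalman_gain_update(x_hat, Sigma, y, A, G, Q, R):
    """Direct route: predictive moments written with the Kalman gain K."""
    K = A @ Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)
    x_new = A @ x_hat + K @ (y - G @ x_hat)
    Sigma_new = A @ Sigma @ A.T - K @ G @ Sigma @ A.T + Q
    return x_new, Sigma_new, K

# Illustrative prior, observation and noise (assumed values)
Sigma = np.array([[0.4, 0.3],
                  [0.3, 0.45]])
x_hat = np.array([0.2, -0.2])
y = np.array([2.3, -1.9])
G = np.eye(2)
R = 0.5 * Sigma

# Law of motion parameters as stated in the text
A = np.array([[1.2, 0.0],
              [0.0, -0.2]])
Q = 0.3 * Sigma

x_two_step, Sigma_two_step = filter_then_forecast(x_hat, Sigma, y, A, G, Q, R)
x_gain, Sigma_gain, K = kalman_gain_update(x_hat, Sigma, y, A, G, Q, R)

print("Predictive mean:", x_gain)
print("Routes agree:", np.allclose(x_two_step, x_gain),
      np.allclose(Sigma_two_step, Sigma_gain))
```

Both routes return the same mean and covariance, which is the content of the summary equations: the gain form simply folds the filtering step into the forecast.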

fig, ax = plt.subplots(figsize=(10, 8))


ax.grid()

# Density 1
Z = gen_gaussian_plot_vals(x_hat, Σ)
cs1 = ax.contour(X, Y, Z, 6, colors="black")
ax.clabel(cs1, inline=1, fontsize=10)

# Density 2
M = Σ * G.T * linalg.inv(G * Σ * G.T + R)
x_hat_F = x_hat + M * (y - G * x_hat)
Σ_F = Σ - M * G * Σ
Z_F = gen_gaussian_plot_vals(x_hat_F, Σ_F)
cs2 = ax.contour(X, Y, Z_F, 6, colors="black")
ax.clabel(cs2, inline=1, fontsize=10)

# Density 3
new_x_hat = A * x_hat_F
new_Σ = A * Σ_F * A.T + Q
new_Z = gen_gaussian_plot_vals(new_x_hat, new_Σ)
cs3 = ax.contour(X, Y, new_Z, 6, colors="black")
ax.clabel(cs3, inline=1, fontsize=10)
ax.contourf(X, Y, new_Z, 6, alpha=0.6, cmap=cm.jet)
ax.text(float(y[0]), float(y[1]), "$y$", fontsize=20, color="black")

plt.show()


25.2.3 The Recursive Procedure

Let’s look back at what we’ve done.


We started the current period with a prior 𝑝(𝑥) for the location 𝑥 of the missile.
We then used the current measurement 𝑦 to update to 𝑝(𝑥 | 𝑦).
Finally, we used the law of motion (25.5) for {𝑥𝑡 } to update to 𝑝𝑛𝑒𝑤 (𝑥).
If we now step into the next period, we are ready to go round again, taking 𝑝𝑛𝑒𝑤 (𝑥) as the current prior.
Swapping notation 𝑝𝑡(𝑥) for 𝑝(𝑥) and 𝑝𝑡+1(𝑥) for 𝑝_new(𝑥), the full recursive procedure is:
1. Start the current period with prior 𝑝𝑡(𝑥) = 𝑁(𝑥̂𝑡, Σ𝑡).
2. Observe current measurement 𝑦𝑡.
3. Compute the filtering distribution 𝑝𝑡(𝑥 | 𝑦) = 𝑁(𝑥̂𝑡^𝐹, Σ𝑡^𝐹) from 𝑝𝑡(𝑥) and 𝑦𝑡, applying Bayes rule and the conditional distribution (25.3).
4. Compute the predictive distribution 𝑝𝑡+1(𝑥) = 𝑁(𝑥̂𝑡+1, Σ𝑡+1) from the filtering distribution and (25.5).
5. Increment 𝑡 by one and go to step 1.
Repeating the update equations for 𝑥̂_new and Σ_new, the dynamics for 𝑥̂𝑡 and Σ𝑡 are as follows

$$\begin{aligned} \hat{x}_{t+1} &= A \hat{x}_t + K_{\Sigma_t} (y_t - G \hat{x}_t) \\ \Sigma_{t+1} &= A \Sigma_t A' - K_{\Sigma_t} G \Sigma_t A' + Q \end{aligned}$$

These are the standard dynamic equations for the Kalman filter (see, for example, [Ljungqvist and Sargent, 2018], page
58).

25.3 Convergence

The matrix Σ𝑡 is a measure of the uncertainty of our prediction 𝑥̂𝑡 of 𝑥𝑡 .
Apart from special cases, this uncertainty will never be fully resolved, regardless of how much time elapses.
One reason is that our prediction 𝑥̂𝑡 is made based on information available at 𝑡 − 1, not 𝑡.
Even if we knew the precise value of 𝑥𝑡−1 (which we don’t), the transition equation (25.5) implies that 𝑥𝑡 = 𝐴𝑥𝑡−1 + 𝑤𝑡 .
Since the shock 𝑤𝑡 is not observable at 𝑡−1, any time 𝑡−1 prediction of 𝑥𝑡 will incur some error (unless 𝑤𝑡 is degenerate).
However, it is certainly possible that Σ𝑡 converges to a constant matrix as 𝑡 → ∞.
To study this topic, let’s expand the dynamic equation for Σ𝑡+1 by substituting in the definition of the Kalman gain:

$$\Sigma_{t+1} = A \Sigma_t A' - A \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t A' + Q \tag{25.6}$$

This is a nonlinear difference equation in Σ𝑡 .


A fixed point of (25.6) is a constant matrix Σ such that

$$\Sigma = A \Sigma A' - A \Sigma G' (G \Sigma G' + R)^{-1} G \Sigma A' + Q \tag{25.7}$$

Equation (25.6) is known as a discrete-time Riccati difference equation.


Equation (25.7) is known as a discrete-time algebraic Riccati equation.
Conditions under which a fixed point exists and the sequence {Σ𝑡 } converges to it are discussed in [Anderson et al., 1996]
and [Anderson and Moore, 2005], chapter 4.


A sufficient (but not necessary) condition is that all the eigenvalues 𝜆𝑖 of 𝐴 satisfy |𝜆𝑖 | < 1 (cf. e.g., [Anderson and
Moore, 2005], p. 77).
(This strong condition assures that the unconditional distribution of 𝑥𝑡 converges as 𝑡 → +∞.)
In this case, for any initial choice of Σ0 that is both non-negative and symmetric, the sequence {Σ𝑡 } in (25.6) converges
to a non-negative symmetric matrix Σ that solves (25.7).
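To make this convergence claim concrete, here is a minimal sketch that iterates the Riccati difference equation (25.6) from two different starting matrices. It uses plain NumPy rather than QuantEcon.py, the helper names are hypothetical, and the parameter values are borrowed from Exercise 25.5.3, where the eigenvalues of 𝐴 are 0.9 and −0.1.

```python
import numpy as np

def riccati_step(Sigma, A, G, Q, R):
    """Apply the right-hand side of the Riccati difference equation once."""
    gain = A @ Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)
    return A @ Sigma @ A.T - gain @ G @ Sigma @ A.T + Q

def iterate_to_fixed_point(Sigma_0, A, G, Q, R, tol=1e-12, max_iter=10_000):
    """Iterate Sigma_{t+1} = riccati_step(Sigma_t) until successive iterates agree."""
    Sigma = Sigma_0
    for _ in range(max_iter):
        Sigma_next = riccati_step(Sigma, A, G, Q, R)
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    raise RuntimeError("Riccati iteration did not converge")

# Parameters taken from Exercise 25.5.3
A = np.array([[0.5, 0.4],
              [0.6, 0.3]])
G = np.eye(2)
R = 0.5 * np.eye(2)
Q = 0.3 * np.eye(2)

# Two different symmetric, non-negative definite starting points
Sigma_a = iterate_to_fixed_point(np.array([[0.9, 0.3], [0.3, 0.9]]), A, G, Q, R)
Sigma_b = iterate_to_fixed_point(np.zeros((2, 2)), A, G, Q, R)

print("Moduli of eigenvalues of A:", np.abs(np.linalg.eigvals(A)))
print("Limit from first start:\n", Sigma_a)
print("Same limit from second start:", np.allclose(Sigma_a, Sigma_b))
```

Both starting points lead to the same limit, which also satisfies the algebraic equation (25.7) and agrees with the stationary matrix printed in the solution to Exercise 25.5.3.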

25.4 Implementation

The class Kalman from the QuantEcon.py package implements the Kalman filter.
• Instance data consists of:
– the moments (𝑥̂𝑡, Σ𝑡) of the current prior.
– An instance of the LinearStateSpace class from QuantEcon.py.
The latter represents a linear state space model of the form

$$\begin{aligned} x_{t+1} &= A x_t + C w_{t+1} \\ y_t &= G x_t + H v_t \end{aligned}$$

where the shocks 𝑤𝑡 and 𝑣𝑡 are IID standard normals.
To connect this with the notation of this lecture we set

$$Q := C C' \quad \text{and} \quad R := H H'$$

• The class Kalman from the QuantEcon.py package has a number of methods, some of which we will wait to use until
we study more advanced applications in subsequent lectures.
• Methods pertinent for this lecture are:
– prior_to_filtered, which updates (𝑥̂𝑡, Σ𝑡) to (𝑥̂𝑡^𝐹, Σ𝑡^𝐹)
– filtered_to_forecast, which updates the filtering distribution to the predictive distribution, which
becomes the new prior (𝑥̂𝑡+1, Σ𝑡+1)
– update, which combines the last two methods
– stationary_values, which computes the solution to (25.7) and the corresponding (stationary)
Kalman gain
You can view the program on GitHub.

25.5 Exercises

Exercise 25.5.1
Consider the following simple application of the Kalman filter, loosely based on [Ljungqvist and Sargent, 2018], section
2.9.2.
Suppose that
• all variables are scalars
• the hidden state {𝑥𝑡 } is in fact constant, equal to some 𝜃 ∈ ℝ unknown to the modeler


State dynamics are therefore given by (25.5) with 𝐴 = 1, 𝑄 = 0 and 𝑥0 = 𝜃.


The measurement equation is 𝑦𝑡 = 𝜃 + 𝑣𝑡 where 𝑣𝑡 is 𝑁 (0, 1) and IID.
The task of this exercise is to simulate the model and, using the code from kalman.py, plot the first five predictive densities
𝑝𝑡(𝑥) = 𝑁(𝑥̂𝑡, Σ𝑡).
As shown in [Ljungqvist and Sargent, 2018], sections 2.9.1–2.9.2, these distributions asymptotically put all mass on the
unknown value 𝜃.
In the simulation, take 𝜃 = 10, 𝑥̂0 = 8 and Σ0 = 1.
Your figure should, modulo randomness, look something like this

Solution to Exercise 25.5.1

# Parameters
θ = 10  # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

# Set prior, initialize kalman filter
x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

# Draw observations of y from state space model
N = 5
x, y = ss.simulate(N)
y = y.flatten()

# Set up plot
fig, ax = plt.subplots(figsize=(10,8))
xgrid = np.linspace(θ - 5, θ + 2, 200)

for i in range(N):
    # Record the current predicted mean and variance
    # (.item() extracts the scalar from each one-element array)
    m, v = [np.asarray(z).item() for z in (kalman.x_hat, kalman.Sigma)]
    # Plot, update filter
    ax.plot(xgrid, norm.pdf(xgrid, loc=m, scale=np.sqrt(v)), label=f'$t={i}$')
    kalman.update(y[i])

ax.set_title(f'First {N} densities when $\\theta = {θ:.1f}$')
ax.legend(loc='upper left')
plt.show()
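The shrinking variance in this exercise can also be checked by hand. With 𝐴 = 𝐺 = 𝑅 = 1 and 𝑄 = 0, the variance recursion reduces to Σ𝑡+1 = Σ𝑡 /(1 + Σ𝑡 ), so starting from Σ0 = 1 we get Σ𝑡 = 1/(1 + 𝑡), and the predicted mean is a running average of the prior mean and the observations. The sketch below verifies this with a scalar recursion written directly in NumPy; it does not use the Kalman class, and the seed and variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)       # arbitrary seed, for reproducibility only
theta = 10.0                         # constant hidden state
x_hat, Sigma = 8.0, 1.0              # prior mean and variance
N = 5
y = theta + rng.standard_normal(N)   # measurements y_t = theta + v_t

means, variances = [x_hat], [Sigma]
for t in range(N):
    K = Sigma / (Sigma + 1.0)            # Kalman gain when A = G = R = 1
    x_hat = x_hat + K * (y[t] - x_hat)   # predicted mean for next period
    Sigma = Sigma - K * Sigma            # predicted variance (Q = 0)
    means.append(x_hat)
    variances.append(Sigma)

print("Variances:", variances)
print("1 / (1 + t):", list(1 / (1 + np.arange(N + 1))))
print("Final mean equals running average:",
      np.isclose(means[-1], (8.0 + y.sum()) / (N + 1)))
```

In this special case the prior acts like one extra observation with value 𝑥̂0, which is why the densities tighten around 𝜃 at rate 1/𝑡.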


Exercise 25.5.2
The preceding figure gives some support to the idea that probability mass converges to 𝜃.
To get a better idea, choose a small 𝜖 > 0 and calculate

$$z_t := 1 - \int_{\theta - \epsilon}^{\theta + \epsilon} p_t(x) \, dx$$

for 𝑡 = 0, 1, 2, … , 𝑇 .
Plot 𝑧𝑡 against 𝑡, setting 𝜖 = 0.1 and 𝑇 = 600.
Your figure should show the error declining erratically, something like this

Solution to Exercise 25.5.2

ϵ = 0.1
θ = 10  # Constant value of state x_t
A, C, G, H = 1, 0, 1, 1
ss = LinearStateSpace(A, C, G, H, mu_0=θ)

x_hat_0, Σ_0 = 8, 1
kalman = Kalman(ss, x_hat_0, Σ_0)

T = 600
z = np.empty(T)
x, y = ss.simulate(T)
y = y.flatten()

for t in range(T):
    # Record the current predicted mean and variance
    # (.item() extracts the scalar from each one-element array)
    m, v = [np.asarray(temp).item() for temp in (kalman.x_hat, kalman.Sigma)]

    # Integrate the predictive density over (θ - ϵ, θ + ϵ)
    f = lambda x: norm.pdf(x, loc=m, scale=np.sqrt(v))
    integral, error = quad(f, θ - ϵ, θ + ϵ)
    z[t] = 1 - integral

    kalman.update(y[t])

fig, ax = plt.subplots(figsize=(9, 7))
ax.set_ylim(0, 1)
ax.set_xlim(0, T)
ax.plot(range(T), z)
ax.fill_between(range(T), np.zeros(T), z, color="blue", alpha=0.2)
plt.show()
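Because each 𝑝𝑡 is a normal density, the integral in 𝑧𝑡 can also be written in closed form with the normal CDF instead of numerical quadrature. The sketch below is an alternative to the quad call, not the lecture's method; the helper names are hypothetical and it uses only the standard library.

```python
from math import erf, sqrt

def normal_cdf(x, mean, var):
    """CDF of N(mean, var) evaluated at x, via the error function."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))

def mass_outside(mean, var, theta, eps):
    """Probability that N(mean, var) puts outside (theta - eps, theta + eps)."""
    inside = normal_cdf(theta + eps, mean, var) - normal_cdf(theta - eps, mean, var)
    return 1.0 - inside

theta, eps = 10.0, 0.1

# Illustration: a density centered exactly at theta with shrinking variance
for var in (1.0, 0.1, 0.01, 0.001):
    print(f"variance {var:>6}: mass outside = {mass_outside(theta, var, theta, eps):.4f}")
```

Inside the loop of the solution, `z[t] = mass_outside(m, v, θ, ϵ)` would play the same role as the quadrature step.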


Exercise 25.5.3
As discussed above, if the shock sequence {𝑤𝑡 } is not degenerate, then it is not in general possible to predict 𝑥𝑡 without
error at time 𝑡 − 1 (and this would be the case even if we could observe 𝑥𝑡−1 ).
Let’s now compare the prediction 𝑥̂𝑡 made by the Kalman filter against a competitor who is allowed to observe 𝑥𝑡−1 .
This competitor will use the conditional expectation 𝔼[𝑥𝑡 | 𝑥𝑡−1 ], which in this case is 𝐴𝑥𝑡−1 .
The conditional expectation is known to be the optimal prediction method in terms of minimizing mean squared error.
(More precisely, the minimizer of 𝔼 ‖𝑥𝑡 − 𝑔(𝑥𝑡−1 )‖² with respect to 𝑔 is 𝑔∗ (𝑥𝑡−1 ) ∶= 𝔼[𝑥𝑡 | 𝑥𝑡−1 ])
Thus we are comparing the Kalman filter against a competitor who has more information (in the sense of being able to
observe the latent state) and behaves optimally in terms of minimizing squared error.
Our horse race will be assessed in terms of squared error.
In particular, your task is to generate a graph plotting observations of both ‖𝑥𝑡 − 𝐴𝑥𝑡−1 ‖² and ‖𝑥𝑡 − 𝑥̂𝑡 ‖² against 𝑡 for
𝑡 = 1, … , 50.
For the parameters, set 𝐺 = 𝐼, 𝑅 = 0.5𝐼 and 𝑄 = 0.3𝐼, where 𝐼 is the 2 × 2 identity.
Set

$$A = \begin{pmatrix} 0.5 & 0.4 \\ 0.6 & 0.3 \end{pmatrix}$$

To initialize the prior density, set

$$\Sigma_0 = \begin{pmatrix} 0.9 & 0.3 \\ 0.3 & 0.9 \end{pmatrix}$$

and 𝑥̂0 = (8, 8).
Finally, set 𝑥0 = (0, 0).
You should end up with a figure similar to the following (modulo randomness)

Observe how, after an initial learning period, the Kalman filter performs quite well, even relative to the competitor who
predicts optimally with knowledge of the latent state.

Solution to Exercise 25.5.3

# Define A, C, G, H
G = np.identity(2)
H = np.sqrt(0.5) * np.identity(2)

A = [[0.5, 0.4],
     [0.6, 0.3]]
C = np.sqrt(0.3) * np.identity(2)

# Set up state space model, initial value x_0 set to zero
ss = LinearStateSpace(A, C, G, H, mu_0 = np.zeros(2))

# Define the prior density
Σ = [[0.9, 0.3],
     [0.3, 0.9]]
Σ = np.array(Σ)
x_hat = np.array([8, 8])

# Initialize the Kalman filter
kn = Kalman(ss, x_hat, Σ)

# Print eigenvalues of A
print("Eigenvalues of A:")
print(eigvals(A))

# Print stationary Σ
S, K = kn.stationary_values()
print("Stationary prediction error variance:")
print(S)

# Generate the plot
T = 50
x, y = ss.simulate(T)

e1 = np.empty(T-1)
e2 = np.empty(T-1)

for t in range(1, T):
    kn.update(y[:,t])
    # Squared error of the Kalman filter prediction
    e1[t-1] = np.sum((x[:, t] - kn.x_hat.flatten())**2)
    # Squared error of the conditional expectation A x_{t-1}
    e2[t-1] = np.sum((x[:, t] - A @ x[:, t-1])**2)

fig, ax = plt.subplots(figsize=(9,6))
ax.plot(range(1, T), e1, 'k-', lw=2, alpha=0.6,
        label='Kalman filter error')
ax.plot(range(1, T), e2, 'g-', lw=2, alpha=0.6,
        label='Conditional expectation error')
ax.legend()
plt.show()

Eigenvalues of A:
[ 0.9+0.j -0.1+0.j]
Stationary prediction error variance:
[[0.40329108 0.1050718 ]
[0.1050718 0.41061709]]


Exercise 25.5.4
Try varying the coefficient 0.3 in 𝑄 = 0.3𝐼 up and down.
Observe how the diagonal values in the stationary solution Σ (see (25.7)) increase and decrease in line with this coefficient.
The interpretation is that more randomness in the law of motion for 𝑥𝑡 causes more (permanent) uncertainty in prediction.
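One way to carry out this experiment without QuantEcon.py is to iterate the Riccati difference equation (25.6) to its fixed point for several values of the coefficient. The sketch below does this in plain NumPy; the helper name `stationary_sigma` and the grid of coefficients are assumptions made for illustration, while 𝐴, 𝐺 and 𝑅 are those of Exercise 25.5.3.

```python
import numpy as np

def stationary_sigma(A, G, Q, R, tol=1e-12, max_iter=10_000):
    """Approximate the fixed point of (25.7) by iterating (25.6) from zero."""
    Sigma = np.zeros_like(Q)
    for _ in range(max_iter):
        K = A @ Sigma @ G.T @ np.linalg.inv(G @ Sigma @ G.T + R)
        Sigma_next = A @ Sigma @ A.T - K @ G @ Sigma @ A.T + Q
        if np.max(np.abs(Sigma_next - Sigma)) < tol:
            return Sigma_next
        Sigma = Sigma_next
    raise RuntimeError("Riccati iteration did not converge")

A = np.array([[0.5, 0.4],
              [0.6, 0.3]])
G = np.eye(2)
R = 0.5 * np.eye(2)

coefficients = [0.1, 0.3, 0.5, 1.0]   # values of q in Q = q * I (assumed grid)
diagonals = np.array([np.diag(stationary_sigma(A, G, q * np.eye(2), R))
                      for q in coefficients])

for q, d in zip(coefficients, diagonals):
    print(f"Q = {q} I  ->  diagonal of stationary Sigma: {d}")
```

Each diagonal entry rises with the coefficient, in line with the interpretation given above.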



CHAPTER

TWENTYSIX

ANOTHER LOOK AT THE KALMAN FILTER

Contents

• Another Look at the Kalman Filter


– A worker’s output
– A firm’s wage-setting policy
– A state-space representation
– An Innovations Representation
– Some Computational Experiments
– Future Extensions

In this quantecon lecture A First Look at the Kalman filter, we used a Kalman filter to estimate locations of a rocket.
In this lecture, we’ll use the Kalman filter to infer a worker’s human capital and the effort that the worker devotes to
accumulating human capital, neither of which the firm observes directly.
The firm learns about those things only by observing a history of the output that the worker generates for the firm, and
from understanding how that output depends on the worker’s human capital and how human capital evolves as a function
of the worker’s effort.
We’ll posit a rule that expresses how much the firm pays the worker each period as a function of the firm’s information
each period.
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

To conduct simulations, we bring in these imports, as in A First Look at the Kalman filter.

import matplotlib.pyplot as plt


import numpy as np
from quantecon import Kalman, LinearStateSpace
from collections import namedtuple
from scipy.stats import multivariate_normal
import matplotlib as mpl
mpl.rcParams['text.usetex'] = True
mpl.rcParams['text.latex.preamble'] = r'\usepackage{{amsmath}}'


26.1 A worker’s output

A representative worker is permanently employed at a firm.


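The worker’s output is described by the following dynamic process:

$$\begin{aligned} h_{t+1} &= \alpha h_t + \beta u_t + c w_{t+1}, \quad w_{t+1} \sim \mathcal{N}(0, 1) \\ u_{t+1} &= u_t \\ y_t &= g h_t + v_t, \quad v_t \sim \mathcal{N}(0, R) \end{aligned} \tag{26.1}$$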

Here
• ℎ𝑡 is the logarithm of human capital at time 𝑡
• 𝑢𝑡 is the logarithm of the worker’s effort at accumulating human capital at 𝑡
• 𝑦𝑡 is the logarithm of the worker’s output at time 𝑡
• ℎ0 ∼ 𝒩(ℎ̂ 0 , 𝜎ℎ,0 )
• 𝑢0 ∼ 𝒩(𝑢̂0 , 𝜎𝑢,0 )

Parameters of the model are 𝛼, 𝛽, 𝑐, 𝑅, 𝑔, ℎ̂ 0 , 𝑢̂0 , 𝜎ℎ , 𝜎𝑢 .


At time 0, a firm has hired the worker.
The worker is permanently attached to the firm and so works for the same firm at all dates 𝑡 = 0, 1, 2, ….
At the beginning of time 0, the firm observes neither the worker’s innate initial human capital ℎ0 nor its hard-wired
permanent effort level 𝑢0 .
The firm believes that 𝑢0 for a particular worker is drawn from a Gaussian probability distribution, and so is described by
𝑢0 ∼ 𝒩(𝑢̂0 , 𝜎𝑢,0 ).
The ℎ𝑡 part of a worker’s “type” moves over time, but the effort component of the worker’s type is 𝑢𝑡 = 𝑢0 .
This means that from the firm’s point of view, the worker’s effort is effectively an unknown fixed “parameter”.
At time 𝑡 ≥ 1, for a particular worker the firm has observed the history 𝑦^{𝑡−1} = [𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑦0 ].
The firm does not observe the worker’s “type” (ℎ0 , 𝑢0 ).
But the firm does observe the worker’s output 𝑦𝑡 at time 𝑡 and remembers the worker’s past outputs 𝑦^{𝑡−1}.

26.2 A firm’s wage-setting policy

Based on information about the worker that the firm has at time 𝑡 ≥ 1, the firm pays the worker log wage

$$w_t = g E[h_t \mid y^{t-1}], \quad t \geq 1$$

and at time 0 pays the worker a log wage equal to the unconditional mean of 𝑦0 :

𝑤0 = 𝑔ℎ̂ 0

In using this payment rule, the firm is taking into account that the worker’s log output today is partly due to the random
component 𝑣𝑡 that comes entirely from luck, and that is assumed to be independent of ℎ𝑡 and 𝑢𝑡 .


26.3 A state-space representation

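Write system (26.1) in the state-space form

$$\begin{aligned} \begin{bmatrix} h_{t+1} \\ u_{t+1} \end{bmatrix} &= \begin{bmatrix} \alpha & \beta \\ 0 & 1 \end{bmatrix} \begin{bmatrix} h_t \\ u_t \end{bmatrix} + \begin{bmatrix} c \\ 0 \end{bmatrix} w_{t+1} \\ y_t &= \begin{bmatrix} g & 0 \end{bmatrix} \begin{bmatrix} h_t \\ u_t \end{bmatrix} + v_t \end{aligned}$$

which is equivalent to

$$\begin{aligned} x_{t+1} &= A x_t + C w_{t+1} \\ y_t &= G x_t + v_t \\ x_0 &\sim \mathcal{N}(\hat{x}_0, \Sigma_0) \end{aligned} \tag{26.2}$$

where

$$x_t = \begin{bmatrix} h_t \\ u_t \end{bmatrix}, \quad \hat{x}_0 = \begin{bmatrix} \hat{h}_0 \\ \hat{u}_0 \end{bmatrix}, \quad \Sigma_0 = \begin{bmatrix} \sigma_{h,0} & 0 \\ 0 & \sigma_{u,0} \end{bmatrix}$$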

To compute the firm’s wage setting policy, we first create a namedtuple to store the parameters of the model

WorkerModel = namedtuple("WorkerModel",
                ('A', 'C', 'G', 'R', 'xhat_0', 'Σ_0'))

def create_worker(α=.8, β=.2, c=.2,
                  R=.5, g=1.0, hhat_0=4, uhat_0=4,
                  σ_h=4, σ_u=4):

    A = np.array([[α, β],
                  [0, 1]])
    C = np.array([[c],
                  [0]])
    G = np.array([g, 1])

    # Define initial state and covariance matrix
    xhat_0 = np.array([[hhat_0],
                       [uhat_0]])

    Σ_0 = np.array([[σ_h, 0],
                    [0, σ_u]])

    return WorkerModel(A=A, C=C, G=G, R=R, xhat_0=xhat_0, Σ_0=Σ_0)

Please note how the WorkerModel namedtuple collects all of the objects required to form the associated state-space
representation (26.2).
This is handy, because in order to simulate a history {𝑦𝑡 , ℎ𝑡 } for a worker, we’ll want to form a state space system for
him/her by using the LinearStateSpace class.

# Define A, C, G, R, xhat_0, Σ_0
worker = create_worker()
A, C, G, R = worker.A, worker.C, worker.G, worker.R
xhat_0, Σ_0 = worker.xhat_0, worker.Σ_0

# Create a LinearStateSpace object
ss = LinearStateSpace(A, C, G, np.sqrt(R),
        mu_0=xhat_0, Sigma_0=np.zeros((2,2)))

T = 100
x, y = ss.simulate(T)
y = y.flatten()

h_0, u_0 = x[0, 0], x[1, 0]

Next, to compute the firm’s policy for setting the log wage based on the information it has about the worker, we use the
Kalman filter described in this quantecon lecture A First Look at the Kalman filter.
In particular, we want to compute all of the objects in an “innovation representation”.

26.4 An Innovations Representation

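We have all the objects in hand required to form an innovations representation for the output process {𝑦𝑡 }, 𝑡 = 0, … , 𝑇 , for a worker.
Let’s code that up now.

$$\begin{aligned} \hat{x}_{t+1} &= A \hat{x}_t + K_t a_t \\ y_t &= G \hat{x}_t + a_t \end{aligned}$$

where 𝐾𝑡 is the Kalman gain matrix at time 𝑡 and 𝑎𝑡 = 𝑦𝑡 − 𝐺𝑥̂𝑡 is the innovation in 𝑦𝑡 .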
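To see the pieces of this representation at work, the following is a minimal NumPy sketch of the recursion for one simulated worker, written without the Kalman class. It is an illustration under stated assumptions rather than the lecture's implementation: it takes 𝐺 = [𝑔 0] exactly as in the state-space form (26.2), uses the default parameter values of create_worker, draws the hidden (ℎ0 , 𝑢0 ) from the firm's prior, and the seed and variable names are arbitrary.

```python
import numpy as np

# Default parameter values from create_worker
alpha, beta, c, R, g = 0.8, 0.2, 0.2, 0.5, 1.0
A = np.array([[alpha, beta],
              [0.0, 1.0]])
C = np.array([[c],
              [0.0]])
G = np.array([[g, 0.0]])     # assumption: output loads on h_t only, as in (26.2)

# Firm's prior over the hidden state x_0 = (h_0, u_0)
x_hat = np.array([4.0, 4.0])
Sigma = 4.0 * np.eye(2)

rng = np.random.default_rng(1)               # arbitrary seed
x = rng.multivariate_normal(x_hat, Sigma)    # hidden (h_0, u_0), drawn from the prior
u_0 = x[1]

T = 100
log_wage = np.empty(T)
innovation = np.empty(T)

for t in range(T):
    # Worker's log output this period
    y_t = (G @ x).item() + np.sqrt(R) * rng.standard_normal()

    # Log wage w_t = g E[h_t | y^{t-1}], set before y_t is used
    log_wage[t] = g * x_hat[0]

    # Innovation a_t and Kalman gain K_t (a 2 x 1 matrix)
    a_t = y_t - (G @ x_hat).item()
    K_t = (A @ Sigma @ G.T) / ((G @ Sigma @ G.T).item() + R)

    # Innovations-representation update of the conditional mean and covariance
    x_hat = A @ x_hat + K_t[:, 0] * a_t
    Sigma = A @ Sigma @ A.T - K_t @ G @ Sigma @ A.T + C @ C.T
    innovation[t] = a_t

    # Hidden state moves according to the law of motion
    x = A @ x + C[:, 0] * rng.standard_normal()

print("True effort u_0:        ", u_0)
print("Firm's final estimate:  ", x_hat[1])
print("Final conditional covariance:\n", Sigma)
```

The first wage equals 𝑔ℎ̂0, and the conditional variance of effort falls well below its initial value of 4 as output observations accumulate.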


We accomplish this in the following code that uses the Kalman class.

kalman = Kalman(ss, xhat_0, Σ_0)
Σ_t = np.zeros((*Σ_0.shape, T-1))
y_hat_t = np.zeros(T-1)
x_hat_t = np.zeros((2, T-1))

for t in range(1, T):
    kalman.update(y[t])
    x_hat, Σ = kalman.x_hat, kalman.Sigma
    Σ_t[:, :, t-1] = Σ
    x_hat_t[:, t-1] = x_hat.reshape(-1)
    # G @ x_hat is a one-element array; .item() extracts the scalar
    y_hat_t[t-1] = (worker.G @ x_hat).item()

x_hat_t = np.concatenate((x[:, 1][:, np.newaxis],
                    x_hat_t), axis=1)
Σ_t = np.concatenate((worker.Σ_0[:, :, np.newaxis],
                    Σ_t), axis=2)
u_hat_t = x_hat_t[1, :]


For a draw of ℎ0 , 𝑢0 , we plot 𝐸𝑦𝑡 = 𝐺𝑥̂𝑡 where 𝑥̂𝑡 = 𝐸[𝑥𝑡 | 𝑦^{𝑡−1}].
We also plot 𝐸[𝑢0 | 𝑦^{𝑡−1}], which is the firm’s inference about a worker’s hard-wired “work ethic” 𝑢0 , conditioned on
information 𝑦^{𝑡−1} that it has about him or her coming into period 𝑡.
We can watch as the firm’s inference 𝐸[𝑢0 | 𝑦^{𝑡−1}] of the worker’s work ethic converges toward the hidden 𝑢0 , which is not
directly observed by the firm.

fig, ax = plt.subplots(1, 2)

ax[0].plot(y_hat_t, label=r'$E[y_t| y^{t-1}]$')


ax[0].set_xlabel('Time')
ax[0].set_ylabel(r'$E[y_t]$')
ax[0].set_title(r'$E[y_t]$ over time')
ax[0].legend()

ax[1].plot(u_hat_t, label=r'$E[u_t|y^{t-1}]$')
ax[1].axhline(y=u_0, color='grey',
linestyle='dashed', label=fr'$u_0={u_0:.2f}$')
ax[1].set_xlabel('Time')
ax[1].set_ylabel(r'$E[u_t|y^{t-1}]$')
ax[1].set_title('Inferred work ethic over time')
ax[1].legend()

fig.tight_layout()
plt.show()


26.5 Some Computational Experiments

Let’s look at Σ0 and Σ𝑇 in order to see how much the firm learns about the hidden state during the horizon we have set.

print(Σ_t[:, :, 0])

[[4. 0.]
[0. 4.]]

print(Σ_t[:, :, -1])

[[0.08805027 0.00100377]
[0.00100377 0.00398351]]

Evidently, entries in the conditional covariance matrix become smaller over time.
It is enlightening to portray how conditional covariance matrices Σ𝑡 evolve by plotting confidence ellipsoids around
𝐸[𝑥𝑡 | 𝑦^{𝑡−1}] at various 𝑡’s.

# Create a grid of points for contour plotting
h_range = np.linspace(x_hat_t[0, :].min()-0.5*Σ_t[0, 0, 1],
                      x_hat_t[0, :].max()+0.5*Σ_t[0, 0, 1], 100)
u_range = np.linspace(x_hat_t[1, :].min()-0.5*Σ_t[1, 1, 1],
                      x_hat_t[1, :].max()+0.5*Σ_t[1, 1, 1], 100)
h, u = np.meshgrid(h_range, u_range)

# Create a figure with subplots for each time step
fig, axs = plt.subplots(1, 3, figsize=(12, 7))

# Iterate through each time step
for i, t in enumerate(np.linspace(0, T-1, 3, dtype=int)):
    # Create a multivariate normal distribution with x_hat and Σ at time step t
    mu = x_hat_t[:, t]
    cov = Σ_t[:, :, t]
    mvn = multivariate_normal(mean=mu, cov=cov)

    # Evaluate the multivariate normal PDF on the grid
    pdf_values = mvn.pdf(np.dstack((h, u)))

    # Create a contour plot for the PDF
    con = axs[i].contour(h, u, pdf_values, cmap='viridis')
    axs[i].clabel(con, inline=1, fontsize=10)
    axs[i].set_title(f'Time Step {t+1}')
    axs[i].set_xlabel(r'$h_{{{}}}$'.format(str(t+1)))
    axs[i].set_ylabel(r'$u_{{{}}}$'.format(str(t+1)))

    cov_latex = r'$\Sigma_{{{}}}= \begin{{bmatrix}} {:.2f} & {:.2f} \\ {:.2f} & {:.2f} \end{{bmatrix}}$'.format(
        t+1, cov[0, 0], cov[0, 1], cov[1, 0], cov[1, 1]
    )
    axs[i].text(0.33, -0.15, cov_latex, transform=axs[i].transAxes)

plt.tight_layout()
plt.show()


Note how the accumulation of evidence 𝑦𝑡 affects the shape of the confidence ellipsoid as sample size 𝑡 grows.
Now let’s use our code to set the hidden state 𝑥0 to a particular vector in order to watch how a firm learns starting from
some 𝑥0 we are interested in.
For example, let’s say ℎ0 = 0 and 𝑢0 = 4.
Here is one way to do this.

# For example, we might want h_0 = 0 and u_0 = 4


mu_0 = np.array([0.0, 4.0])

# Create a LinearStateSpace object with Sigma_0 as a matrix of zeros


ss_example = LinearStateSpace(A, C, G, np.sqrt(R), mu_0=mu_0,
# This line forces exact h_0=0 and u_0=4
Sigma_0=np.zeros((2, 2))
)

T = 100
x, y = ss_example.simulate(T)
y = y.flatten()

# Now h_0=0 and u_0=4


h_0, u_0 = x[0, 0], x[1, 0]
print('h_0 =', h_0)
print('u_0 =', u_0)

h_0 = 0.0
u_0 = 4.0
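The reason a zero Sigma_0 pins down the initial state is that a normal distribution with zero covariance is degenerate: every draw equals the mean. The short NumPy check below illustrates this, on the assumption (suggested by the code comments above) that the simulation draws the initial state from 𝒩(mu_0, Sigma_0).

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed
mu_0 = np.array([0.0, 4.0])         # desired (h_0, u_0)
Sigma_0 = np.zeros((2, 2))          # zero covariance: no randomness in the draw

# Five draws from the degenerate normal distribution
draws = rng.multivariate_normal(mu_0, Sigma_0, size=5)
print(draws)
print("All draws equal mu_0:", np.allclose(draws, mu_0))
```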

Another way to accomplish the same goal is to use the following code.


# If we want to set the initial
# h_0 = hhat_0 = 0 and u_0 = uhat_0 = 4.0:
worker = create_worker(hhat_0=0.0, uhat_0=4.0)

ss_example = LinearStateSpace(A, C, G, np.sqrt(R),
                              # This line takes h_0=hhat_0 and u_0=uhat_0
                              mu_0=worker.xhat_0,
                              # This line forces exact h_0=hhat_0 and u_0=uhat_0
                              Sigma_0=np.zeros((2, 2))
                             )

T = 100
x, y = ss_example.simulate(T)
y = y.flatten()

# Now h_0 and u_0 will be exactly hhat_0 and uhat_0
h_0, u_0 = x[0, 0], x[1, 0]
print('h_0 =', h_0)
print('u_0 =', u_0)

h_0 = 0.0
u_0 = 4.0

For this worker, let’s generate a plot like the one above.

# First we compute the Kalman filter with initial xhat_0 and Σ_0
kalman = Kalman(ss, xhat_0, Σ_0)
Σ_t = []
y_hat_t = np.zeros(T-1)
u_hat_t = np.zeros(T-1)

# Then we iteratively update the Kalman filter class using
# observation y based on the linear state model above:
for t in range(1, T):
    kalman.update(y[t])
    x_hat, Σ = kalman.x_hat, kalman.Sigma
    Σ_t.append(Σ)
    # Extract scalars explicitly from the array results
    y_hat_t[t-1] = (worker.G @ x_hat).item()
    u_hat_t[t-1] = x_hat[1, 0]

# Generate plots for y_hat_t and u_hat_t
fig, ax = plt.subplots(1, 2)

ax[0].plot(y_hat_t, label=r'$E[y_t| y^{t-1}]$')
ax[0].set_xlabel('Time')
ax[0].set_ylabel(r'$E[y_t]$')
ax[0].set_title(r'$E[y_t]$ over time')
ax[0].legend()

ax[1].plot(u_hat_t, label=r'$E[u_t|y^{t-1}]$')
ax[1].axhline(y=u_0, color='grey',
            linestyle='dashed', label=fr'$u_0={u_0:.2f}$')
ax[1].set_xlabel('Time')
ax[1].set_ylabel(r'$E[u_t|y^{t-1}]$')
ax[1].set_title('Inferred work ethic over time')
ax[1].legend()

fig.tight_layout()
plt.show()


More generally, we can change some or all of the parameters defining a worker by passing new values to our create_worker function.
Here is an example.

# We can set these parameters when creating a worker -- just like classes!
hard_working_worker = create_worker(α=.4, β=.8,
hhat_0=7.0, uhat_0=100, σ_h=2.5, σ_u=3.2)

print(hard_working_worker)


WorkerModel(A=array([[0.4, 0.8],
[0. , 1. ]]), C=array([[0.2],
[0. ]]), G=array([1., 1.]), R=0.5, xhat_0=array([[ 7.],
[100.]]), Σ_0=array([[2.5, 0. ],
[0. , 3.2]]))

We can also simulate the system for 𝑇 = 50 periods for different workers.
The difference between the inferred work ethics and true work ethics converges to 0 over time.
This shows that the filter is gradually teaching the worker and firm about the worker’s effort.

num_workers = 3
T = 50
fig, ax = plt.subplots(figsize=(7, 7))

for i in range(num_workers):
    worker = create_worker(uhat_0=4+2*i)
    simulate_workers(worker, T, ax)
ax.set_ylim(ymin=-2, ymax=2)
plt.show()


# We can also generate plots of u_t:

T = 50
fig, ax = plt.subplots(figsize=(7, 7))

uhat_0s = [2, -2, 1]
αs = [0.2, 0.3, 0.5]
βs = [0.1, 0.9, 0.3]

for i, (uhat_0, α, β) in enumerate(zip(uhat_0s, αs, βs)):
    worker = create_worker(uhat_0=uhat_0, α=α, β=β)
    simulate_workers(worker, T, ax,
                    # By setting diff=False, it will give u_t
                    diff=False, name=r'$u_{{{}, t}}$'.format(i))

ax.axhline(y=u_0, xmin=0, xmax=0, color='grey',
            linestyle='dashed', label=r'$u_{i, 0}$')
ax.legend(bbox_to_anchor=(1, 0.5))
plt.show()


# We can also use exact u_0=1 and h_0=2 for all workers

T = 50
fig, ax = plt.subplots(figsize=(7, 7))

# These two lines set u_0=1 and h_0=2 for all workers
mu_0 = np.array([[1],
                 [2]])
Sigma_0 = np.zeros((2,2))

uhat_0s = [2, -2, 1]
αs = [0.2, 0.3, 0.5]
βs = [0.1, 0.9, 0.3]

for i, (uhat_0, α, β) in enumerate(zip(uhat_0s, αs, βs)):
    worker = create_worker(uhat_0=uhat_0, α=α, β=β)
    simulate_workers(worker, T, ax, mu_0=mu_0, Sigma_0=Sigma_0,
                    diff=False, name=r'$u_{{{}, t}}$'.format(i))

# This controls the boundary of plots
ax.set_ylim(ymin=-3, ymax=3)
ax.axhline(y=u_0, xmin=0, xmax=0, color='grey',
            linestyle='dashed', label=r'$u_{i, 0}$')
ax.legend(bbox_to_anchor=(1, 0.5))
plt.show()


# We can generate a plot for only one of the workers:

T = 50
fig, ax = plt.subplots(figsize=(7, 7))

mu_0_1 = np.array([[1],
                   [100]])
mu_0_2 = np.array([[1],
                   [30]])
Sigma_0 = np.zeros((2,2))

# Parameters for this single worker
uhat_0 = 100
α = 0.5
β = 0.3

worker = create_worker(uhat_0=uhat_0, α=α, β=β)
simulate_workers(worker, T, ax, mu_0=mu_0_1, Sigma_0=Sigma_0,
                diff=False, name=r'Hard-working worker')
simulate_workers(worker, T, ax, mu_0=mu_0_2, Sigma_0=Sigma_0,
                diff=False,
                title='A hard-working worker and a less hard-working worker',
                name=r'Normal worker')
ax.axhline(y=u_0, xmin=0, xmax=0, color='grey',
            linestyle='dashed', label=r'$u_{i, 0}$')
ax.legend(bbox_to_anchor=(1, 0.5))
plt.show()


26.6 Future Extensions

We can do lots of enlightening experiments by creating new types of workers and letting the firm learn about their hidden
(to the firm) states by observing just their output histories.



Part V

Search

CHAPTER

TWENTYSEVEN

JOB SEARCH I: THE MCCALL SEARCH MODEL

Contents

• Job Search I: The McCall Search Model


– Overview
– The McCall Model
– Computing the Optimal Policy: Take 1
– Computing an Optimal Policy: Take 2
– Exercises

“Questioning a McCall worker is like having a conversation with an out-of-work friend: ‘Maybe you are
setting your sights too high’, or ‘Why did you quit your old job before you had a new one lined up?’ This is
real social science: an attempt to model, to understand, human behavior by visualizing the situation people
find themselves in, the options they face and the pros and cons as they themselves see them.” – Robert E.
Lucas, Jr.
In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

27.1 Overview

The McCall search model [McCall, 1970] helped transform economists’ way of thinking about labor markets.
To clarify notions such as “involuntary” unemployment, McCall modeled the decision problem of an unemployed worker
in terms of factors including
• current and likely future wages
• impatience
• unemployment compensation
To solve the decision problem McCall used dynamic programming.
Here we set up McCall’s model and use dynamic programming to analyze it.
As we’ll see, McCall’s model is not only interesting in its own right but also an excellent vehicle for learning dynamic
programming.


Let’s start with some imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numba import jit, float64
from numba.experimental import jitclass
import quantecon as qe
from quantecon.distributions import BetaBinomial

27.2 The McCall Model

An unemployed agent receives in each period a job offer at wage 𝑤𝑡 .


In this lecture, we adopt the following simple environment:
• The offer sequence {𝑤𝑡 }𝑡≥0 is IID, with 𝑞(𝑤) being the probability of observing wage 𝑤 in finite set 𝕎.
• The agent observes 𝑤𝑡 at the start of 𝑡.
• The agent knows that {𝑤𝑡 } is IID with common distribution 𝑞 and can use this when computing expectations.
(In later lectures, we will relax these assumptions.)
At time 𝑡, our agent has two choices:
1. Accept the offer and work permanently at constant wage 𝑤𝑡 .
2. Reject the offer, receive unemployment compensation 𝑐, and reconsider next period.
The agent is infinitely lived and aims to maximize the expected discounted sum of earnings

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t
$$

The constant 𝛽 lies in (0, 1) and is called a discount factor.


The smaller is 𝛽, the more the agent discounts future utility relative to current utility.
The variable 𝑦𝑡 is income, equal to
• his/her wage 𝑤𝑡 when employed
• unemployment compensation 𝑐 when unemployed

27.2.1 A Trade-Off

The worker faces a trade-off:


• Waiting too long for a good offer is costly, since the future is discounted.
• Accepting too early is costly, since better offers might arrive in the future.
To decide optimally in the face of this trade-off, we use dynamic programming.
Dynamic programming can be thought of as a two-step procedure that
1. first assigns values to “states” and
2. then deduces optimal actions given those values
We’ll go through these steps in turn.


27.2.2 The Value Function

In order to optimally trade off current and future rewards, we need to think about two things:
1. the current payoffs we get from different choices
2. the different states that those choices will lead to in the next period
To weigh these two aspects of the decision problem, we need to assign values to states.
To this end, let 𝑣∗ (𝑤) be the total lifetime value accruing to a worker who enters the current period unemployed when the wage offer is 𝑤 ∈ 𝕎.
In particular, the agent has wage offer 𝑤 in hand.
More precisely, 𝑣∗ (𝑤) denotes the value of the objective function $\mathbb{E} \sum_{t=0}^{\infty} \beta^t y_t$ when an agent in this situation makes optimal
decisions now and at all future points in time.
Of course 𝑣∗ (𝑤) is not trivial to calculate because we don’t yet know what decisions are optimal and what aren’t!
But think of 𝑣∗ as a function that assigns to each possible wage 𝑤 the maximal lifetime value that can be obtained with
that offer in hand.
A crucial observation is that this function 𝑣∗ must satisfy the recursion

$$
v^*(w) = \max \left\{ \frac{w}{1-\beta}, \; c + \beta \sum_{w' \in \mathbb{W}} v^*(w') q(w') \right\} \tag{27.1}
$$

for every possible 𝑤 in 𝕎.


This important equation is a version of the Bellman equation, which is ubiquitous in economic dynamics and other fields
involving planning over time.
The intuition behind it is as follows:
• the first term inside the max operation is the lifetime payoff from accepting current offer, since
$$
\frac{w}{1-\beta} = w + \beta w + \beta^2 w + \cdots
$$
• the second term inside the max operation is the continuation value, which is the lifetime payoff from rejecting the
current offer and then behaving optimally in all subsequent periods
If we optimize and pick the best of these two options, we obtain maximal lifetime value from today, given current offer
𝑤.
But this is precisely 𝑣∗ (𝑤), which is the left-hand side of (27.1).
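The claim that accepting a wage forever is worth 𝑤/(1 − 𝛽) can be checked numerically. The sketch below truncates the infinite discounted sum after many terms and compares it with the closed form; the particular values of `beta`, `w` and the truncation length `T` are illustrative assumptions, not values taken from the lecture.

```python
import numpy as np

beta = 0.99   # discount factor (illustrative value)
w = 30.0      # constant wage received in every period (illustrative value)

# Truncate the infinite sum w + beta*w + beta^2*w + ... after T terms
T = 5_000
truncated_sum = np.sum(w * beta ** np.arange(T))

# Closed form for the infinite geometric sum
closed_form = w / (1 - beta)

print(truncated_sum, closed_form)
```

The two numbers should agree up to a truncation error of order `beta**T`, which is negligible here.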

27.2.3 The Optimal Policy

Suppose for now that we are able to solve (27.1) for the unknown function 𝑣∗ .
Once we have this function in hand we can behave optimally (i.e., make the right choice between accept and reject).
All we have to do is select the maximal choice on the right-hand side of (27.1).
The optimal action is best thought of as a policy, which is, in general, a map from states to actions.
Given any 𝑤, we can read off the corresponding best choice (accept or reject) by picking the max on the right-hand side
of (27.1).
Thus, we have a map from ℝ to {0, 1}, with 1 meaning accept and 0 meaning reject.


We can write the policy as follows

$$
\sigma(w) := \mathbf{1} \left\{ \frac{w}{1-\beta} \geq c + \beta \sum_{w' \in \mathbb{W}} v^*(w') q(w') \right\}
$$

Here 1{𝑃 } = 1 if statement 𝑃 is true and equals 0 otherwise.


We can also write this as

$$
\sigma(w) := \mathbf{1} \{ w \geq \bar w \}
$$

where

$$
\bar w := (1 - \beta) \left\{ c + \beta \sum_{w'} v^*(w') q(w') \right\} \tag{27.2}
$$

Here 𝑤̄ (called the reservation wage) is a constant depending on 𝛽, 𝑐 and the wage distribution.
The agent should accept if and only if the current wage offer is at least as large as the reservation wage.
In view of (27.2), we can compute this reservation wage if we can compute the value function.
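To make the equivalence between the two forms of the policy concrete, here is a small sketch. It takes the continuation term 𝑐 + 𝛽 ∑ 𝑣∗(𝑤′)𝑞(𝑤′) as a given number; the value `cont_value = 4_500.0`, the offers and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def reservation_wage(cont_value, beta):
    """Reservation wage implied by a continuation value, as in (27.2)."""
    return (1 - beta) * cont_value

def policy(w, cont_value, beta):
    """Return 1 (accept) if w / (1 - beta) >= cont_value, else 0 (reject)."""
    return int(w / (1 - beta) >= cont_value)

beta = 0.99
cont_value = 4_500.0      # hypothetical value of c + beta * sum(v* q)
w_bar = reservation_wage(cont_value, beta)

offers = np.array([30.0, 44.0, 46.0, 60.0])          # hypothetical wage offers
decisions = [policy(w, cont_value, beta) for w in offers]
threshold_rule = [int(w >= w_bar) for w in offers]   # same rule via w_bar

print(w_bar, decisions, threshold_rule)
```

Both lists coincide, since comparing 𝑤/(1 − 𝛽) with the continuation value is the same as comparing 𝑤 with the reservation wage.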

27.3 Computing the Optimal Policy: Take 1

To put the above ideas into action, we need to compute the value function at each possible state 𝑤 ∈ 𝕎.
To simplify notation, let’s set

$$
\mathbb{W} := \{w_1, \ldots, w_n\} \quad \text{and} \quad v^*(i) := v^*(w_i)
$$

The value function is then represented by the vector $v^* = (v^*(i))_{i=1}^n$.


In view of (27.1), this vector satisfies the nonlinear system of equations

$$
v^*(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \leq j \leq n} v^*(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{27.3}
$$

27.3.1 The Algorithm

To compute this vector, we use successive approximations:


Step 1: pick an arbitrary initial guess 𝑣 ∈ ℝ𝑛 .
Step 2: compute a new vector 𝑣′ ∈ ℝ𝑛 via

$$
v'(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \leq j \leq n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{27.4}
$$

Step 3: calculate a measure of the discrepancy between 𝑣 and 𝑣′ , such as max𝑖 |𝑣(𝑖) − 𝑣′ (𝑖)|.
Step 4: if the deviation is larger than some fixed tolerance, set 𝑣 = 𝑣′ and go to step 2, else continue.
Step 5: return 𝑣.
For a small tolerance, the returned function 𝑣 is a close approximation to the value function 𝑣∗ .
The theory below elaborates on this point.
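The five steps can be sketched in plain NumPy as follows. This is a minimal illustration rather than the implementation used later in the lecture: the uniform offer distribution, the function names `T` and `successive_approx`, and the parameter values are assumptions made for the example.

```python
import numpy as np

def T(v, w, q, c, beta):
    """One update step as in (27.4): returns v' given v."""
    accept = w / (1 - beta)              # value of accepting at each wage
    reject = c + beta * np.sum(v * q)    # value of rejecting (a scalar)
    return np.maximum(accept, reject)

def successive_approx(w, q, c, beta, tol=1e-6, max_iter=500):
    v = w / (1 - beta)                         # Step 1: initial guess
    for _ in range(max_iter):
        v_new = T(v, w, q, c, beta)            # Step 2: update
        error = np.max(np.abs(v_new - v))      # Step 3: discrepancy
        v = v_new                              # Step 4: replace v by v'
        if error <= tol:
            break
    return v                                   # Step 5: return v

# Hypothetical inputs: 51 equally likely wages between 10 and 60
w = np.linspace(10, 60, 51)
q = np.ones(51) / 51
v = successive_approx(w, q, c=25, beta=0.99)

# Residual of the fixed point equation at the returned vector
print(np.max(np.abs(T(v, w, q, 25, 0.99) - v)))
```

The printed residual should be no larger than the tolerance, which is one way to confirm that the loop stopped because of convergence and not because `max_iter` was reached.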


27.3.2 Fixed Point Theory

What’s the mathematics behind these ideas?


First, one defines a mapping 𝑇 from ℝ𝑛 to itself via

$$
(Tv)(i) = \max \left\{ \frac{w(i)}{1-\beta}, \; c + \beta \sum_{1 \leq j \leq n} v(j) q(j) \right\} \quad \text{for } i = 1, \ldots, n \tag{27.5}
$$

(A new vector 𝑇 𝑣 is obtained from given vector 𝑣 by evaluating the r.h.s. at each 𝑖.)
The element $v_k$ in the sequence $\{v_k\}$ of successive approximations corresponds to $T^k v$.
• This is $T$ applied $k$ times, starting at the initial guess $v$.
One can show that the conditions of the Banach fixed point theorem are satisfied by $T$ on $\mathbb{R}^n$.
One implication is that $T$ has a unique fixed point in $\mathbb{R}^n$.
• That is, a unique vector $\bar v$ such that $T \bar v = \bar v$.
Moreover, it’s immediate from the definition of $T$ that this fixed point is $v^*$.
A second implication of the Banach contraction mapping theorem is that $\{T^k v\}$ converges to the fixed point $v^*$ regardless
of $v$.
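The key property behind the Banach theorem here is that 𝑇 is a contraction of modulus 𝛽 in the sup norm, that is, max𝑖 |(𝑇𝑣)(𝑖) − (𝑇𝑣′)(𝑖)| ≤ 𝛽 max𝑖 |𝑣(𝑖) − 𝑣′(𝑖)|. The sketch below checks this inequality for one pair of randomly drawn vectors. It is a numerical illustration, not a proof, and the wage grid, the uniform probabilities and the random vectors are assumptions made for the example.

```python
import numpy as np

def T(v, w, q, c, beta):
    """The operator in (27.5)."""
    return np.maximum(w / (1 - beta), c + beta * np.sum(v * q))

rng = np.random.default_rng(0)
n = 51
w = np.linspace(10, 60, n)     # hypothetical wage grid
q = np.ones(n) / n             # hypothetical uniform probabilities
c, beta = 25.0, 0.99

# Two arbitrary candidate value functions
v1 = rng.uniform(0, 6000, n)
v2 = rng.uniform(0, 6000, n)

lhs = np.max(np.abs(T(v1, w, q, c, beta) - T(v2, w, q, c, beta)))
rhs = beta * np.max(np.abs(v1 - v2))

print(lhs, rhs, lhs <= rhs)
```

Repeating the experiment with other seeds or parameter values should always give `lhs <= rhs`, up to floating point error.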

27.3.3 Implementation

Our default for 𝑞, the distribution of the state process, will be Beta-binomial.

n, a, b = 50, 200, 100 # default parameters


q_default = BetaBinomial(n, a, b).pdf() # default choice of q

Our default set of values for wages will be

w_min, w_max = 10, 60


w_default = np.linspace(w_min, w_max, n+1)

Here’s a plot of the probabilities of different wage outcomes:

fig, ax = plt.subplots()
ax.plot(w_default, q_default, '-o', label='$q(w(i))$')
ax.set_xlabel('wages')
ax.set_ylabel('probabilities')

plt.show()


We are going to use Numba to accelerate our code.


• See, in particular, the discussion of @jitclass in our lecture on Numba.
The following helps Numba by providing some type information

mccall_data = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('w', float64[:]), # array of wage values, w[i] = wage at state i
('q', float64[:]) # array of probabilities
]

Here’s a class that stores the data and computes the values of state-action pairs, i.e. the value in the maximum bracket on
the right hand side of the Bellman equation (27.4), given the current state and an arbitrary feasible action.
Default parameter values are embedded in the class.

@jitclass(mccall_data)
class McCallModel:

    def __init__(self, c=25, β=0.99, w=w_default, q=q_default):

        self.c, self.β = c, β
        self.w, self.q = w, q    # store the supplied wage grid and probabilities

    def state_action_values(self, i, v):
        """
        The values of state-action pairs.
        """
        # Simplify names
        c, β, w, q = self.c, self.β, self.w, self.q
        # Evaluate value for each state-action pair
        # Consider action = accept or reject the current offer
        accept = w[i] / (1 - β)
        reject = c + β * np.sum(v * q)

        return np.array([accept, reject])


Based on these defaults, let’s try plotting the first few approximate value functions in the sequence $\{T^k v\}$.
We will start from guess 𝑣 given by 𝑣(𝑖) = 𝑤(𝑖)/(1 − 𝛽), which is the value of accepting at every given wage.
Here’s a function to implement this:

def plot_value_function_seq(mcm, ax, num_plots=6):
    """
    Plot a sequence of value functions.

        * mcm is an instance of McCallModel
        * ax is an axes object that implements a plot method.

    """

    n = len(mcm.w)
    v = mcm.w / (1 - mcm.β)
    v_next = np.empty_like(v)
    for i in range(num_plots):
        ax.plot(mcm.w, v, '-', alpha=0.4, label=f"iterate {i}")
        # Update guess
        for j in range(n):
            v_next[j] = np.max(mcm.state_action_values(j, v))
        v[:] = v_next  # copy contents into v

    ax.legend(loc='lower right')

Now let’s create an instance of McCallModel and watch iterations $T^k v$ converge from below:

mcm = McCallModel()

fig, ax = plt.subplots()
ax.set_xlabel('wage')
ax.set_ylabel('value')
plot_value_function_seq(mcm, ax)
plt.show()

You can see that convergence is occurring: successive iterates are getting closer together.


Here’s a more serious iteration effort to compute the limit, which continues until measured deviation between successive
iterates is below tol.
Once we obtain a good approximation to the limit, we will use it to calculate the reservation wage.
We’ll be using JIT compilation via Numba to turbocharge our loops.

@jit(nopython=True)
def compute_reservation_wage(mcm,
                             max_iter=500,
                             tol=1e-6):

    # Simplify names
    c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

    # == First compute the value function == #

    n = len(w)
    v = w / (1 - β)          # initial guess
    v_next = np.empty_like(v)
    i = 0                    # iteration counter, kept separate from the state index j
    error = tol + 1
    while i < max_iter and error > tol:

        for j in range(n):
            v_next[j] = np.max(mcm.state_action_values(j, v))

        error = np.max(np.abs(v_next - v))
        i += 1

        v[:] = v_next  # copy contents into v

    # == Now compute the reservation wage == #

    return (1 - β) * (c + β * np.sum(v * q))

The next line computes the reservation wage at default parameters

compute_reservation_wage(mcm)

47.316499710024964

27.3.4 Comparative Statics

Now that we know how to compute the reservation wage, let’s see how it varies with parameters.
In particular, let’s look at what happens when we change 𝛽 and 𝑐.

grid_size = 25
R = np.empty((grid_size, grid_size))

c_vals = np.linspace(10.0, 30.0, grid_size)


β_vals = np.linspace(0.9, 0.99, grid_size)

for i, c in enumerate(c_vals):
    for j, β in enumerate(β_vals):
        mcm = McCallModel(c=c, β=β)
        R[i, j] = compute_reservation_wage(mcm)

fig, ax = plt.subplots()

cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)


ctr1 = ax.contour(c_vals, β_vals, R.T)

plt.clabel(ctr1, inline=1, fontsize=13)


plt.colorbar(cs1, ax=ax)

ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)

ax.ticklabel_format(useOffset=False)

plt.show()

As expected, the reservation wage increases both with patience and with unemployment compensation.


27.4 Computing an Optimal Policy: Take 2

The approach to dynamic programming just described is standard and broadly applicable.
But for our McCall search model there’s also an easier way that circumvents the need to compute the value function.
Let ℎ denote the continuation value, where 𝑠′ indexes next period’s state and 𝑤(𝑠′) is the wage offered in that state:

$$
h = c + \beta \sum_{s'} v^*(s') q(s') \tag{27.6}
$$

The Bellman equation can now be written as

$$
v^*(s') = \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\}
$$

Substituting this last equation into (27.6) gives

$$
h = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \tag{27.7}
$$

This is a nonlinear equation that we can solve for ℎ.


As before, we will use successive approximations:
Step 1: pick an initial guess ℎ.
Step 2: compute the update ℎ′ via

$$
h' = c + \beta \sum_{s' \in \mathbb{S}} \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \tag{27.8}
$$

Step 3: calculate the deviation |ℎ − ℎ′ |.


Step 4: if the deviation is larger than some fixed tolerance, set ℎ = ℎ′ and go to step 2, else return ℎ.
One can again use the Banach contraction mapping theorem to show that this process always converges.
The big difference here, however, is that we’re iterating on a scalar ℎ, rather than an 𝑛-vector, 𝑣(𝑖), 𝑖 = 1, … , 𝑛.
Here’s an implementation:

@jit(nopython=True)
def compute_reservation_wage_two(mcm,
                                 max_iter=500,
                                 tol=1e-5):

    # Simplify names
    c, β, w, q = mcm.c, mcm.β, mcm.w, mcm.q

    # == First compute h == #

    h = np.sum(w * q) / (1 - β)
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:

        s = np.maximum(w / (1 - β), h)
        h_next = c + β * np.sum(s * q)

        error = np.abs(h_next - h)
        i += 1

        h = h_next

    # == Now compute the reservation wage == #

    return (1 - β) * h

You can use this code to solve the exercise below.

27.5 Exercises

Exercise 27.5.1
Compute the average duration of unemployment when 𝛽 = 0.99 and 𝑐 takes the following values
c_vals = np.linspace(10, 40, 25)
That is, start the agent off as unemployed, compute their reservation wage given the parameters, and then simulate to see
how long it takes to accept.
Repeat a large number of times and take the average.
Plot mean unemployment duration as a function of 𝑐 in c_vals.

Solution to Exercise 27.5.1


Here’s one solution

cdf = np.cumsum(q_default)

@jit(nopython=True)
def compute_stopping_time(w_bar, seed=1234):

    np.random.seed(seed)
    t = 1
    while True:
        # Generate a wage draw
        w = w_default[qe.random.draw(cdf)]
        # Stop when the draw is above the reservation wage
        if w >= w_bar:
            stopping_time = t
            break
        else:
            t += 1
    return stopping_time

@jit(nopython=True)
def compute_mean_stopping_time(w_bar, num_reps=100000):
    obs = np.empty(num_reps)
    for i in range(num_reps):
        obs[i] = compute_stopping_time(w_bar, seed=i)
    return obs.mean()

c_vals = np.linspace(10, 40, 25)


stop_times = np.empty_like(c_vals)
for i, c in enumerate(c_vals):
    mcm = McCallModel(c=c)
    w_bar = compute_reservation_wage_two(mcm)
    stop_times[i] = compute_mean_stopping_time(w_bar)

fig, ax = plt.subplots()

ax.plot(c_vals, stop_times, label="mean unemployment duration")


ax.set(xlabel="unemployment compensation", ylabel="months")
ax.legend()

plt.show()
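As a cross-check on simulations of this kind, note that offers are IID and the worker accepts the first offer with 𝑤 ≥ 𝑤̄, so the acceptance date is geometrically distributed with success probability 𝑝 = ∑ 𝑞(𝑤) over wages 𝑤 ≥ 𝑤̄, and its mean is 1/𝑝. The sketch below compares this formula with a simulated average. The uniform offer distribution, the value `w_bar = 47.5`, the truncation `horizon` and the function names are hypothetical choices made for the example.

```python
import numpy as np

def mean_duration_exact(w, q, w_bar):
    """Mean of a geometric acceptance time: 1 / P(w >= w_bar)."""
    p_accept = np.sum(q[w >= w_bar])
    return 1 / p_accept

def mean_duration_simulated(w, q, w_bar, num_reps=20_000, horizon=200, seed=0):
    """Average first period in which an IID offer reaches w_bar."""
    rng = np.random.default_rng(seed)
    # One row per simulated worker, one column per period
    offers = rng.choice(w, size=(num_reps, horizon), p=q)
    accepted = offers >= w_bar
    # Truncating at `horizon` is harmless only if every worker accepts in time
    assert accepted.any(axis=1).all()
    first_accept = accepted.argmax(axis=1) + 1   # periods are counted from 1
    return first_accept.mean()

# Hypothetical inputs: 51 equally likely wages between 10 and 60
w = np.linspace(10, 60, 51)
q = np.ones(51) / 51
w_bar = 47.5

exact = mean_duration_exact(w, q, w_bar)
simulated = mean_duration_simulated(w, q, w_bar)
print(exact, simulated)
```

The two numbers should be close, with the gap shrinking as `num_reps` grows.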

Exercise 27.5.2
The purpose of this exercise is to show how to replace the discrete wage offer distribution used above with a continuous
distribution.
This is a significant topic because many convenient distributions are continuous (i.e., have a density).
Fortunately, the theory changes little in our simple model.
Recall that ℎ in (27.6) denotes the value of not accepting a job in this period but then behaving optimally in all subsequent
periods.
To shift to a continuous offer distribution, we can replace (27.6) by

$$
h = c + \beta \int v^*(s') q(s') \, ds' \tag{27.9}
$$


Equation (27.7) becomes

$$
h = c + \beta \int \max \left\{ \frac{w(s')}{1-\beta}, \; h \right\} q(s') \, ds' \tag{27.10}
$$
The aim is to solve this nonlinear equation by iteration, and from it obtain the reservation wage.
Try to carry this out, setting
• the state sequence {𝑠𝑡 } to be IID and standard normal and
• the wage function to be 𝑤(𝑠) = exp(𝜇 + 𝜎𝑠).
You will need to implement a new version of the McCallModel class that assumes a lognormal wage distribution.
Calculate the integral by Monte Carlo, by averaging over a large number of wage draws.
For default parameters, use c=25, β=0.99, σ=0.5, μ=2.5.
Once your code is working, investigate how the reservation wage changes with 𝑐 and 𝛽.

Solution to Exercise 27.5.2


Here is one solution:

mccall_data_continuous = [
('c', float64), # unemployment compensation
('β', float64), # discount factor
('σ', float64), # scale parameter in lognormal distribution
('μ', float64), # location parameter in lognormal distribution
('w_draws', float64[:]) # draws of wages for Monte Carlo
]

@jitclass(mccall_data_continuous)
class McCallModelContinuous:

    def __init__(self, c=25, β=0.99, σ=0.5, μ=2.5, mc_size=1000):

        self.c, self.β, self.σ, self.μ = c, β, σ, μ

        # Draw and store shocks
        np.random.seed(1234)
        s = np.random.randn(mc_size)
        self.w_draws = np.exp(μ + σ * s)

@jit(nopython=True)
def compute_reservation_wage_continuous(mcmc, max_iter=500, tol=1e-5):

    c, β, σ, μ, w_draws = mcmc.c, mcmc.β, mcmc.σ, mcmc.μ, mcmc.w_draws

    h = np.mean(w_draws) / (1 - β)  # initial guess
    i = 0
    error = tol + 1
    while i < max_iter and error > tol:

        integral = np.mean(np.maximum(w_draws / (1 - β), h))
        h_next = c + β * integral

        error = np.abs(h_next - h)
        i += 1

        h = h_next

    # == Now compute the reservation wage == #

    return (1 - β) * h

Now we investigate how the reservation wage changes with 𝑐 and 𝛽.


We will do this using a contour plot.

grid_size = 25
R = np.empty((grid_size, grid_size))

c_vals = np.linspace(10.0, 30.0, grid_size)


β_vals = np.linspace(0.9, 0.99, grid_size)

for i, c in enumerate(c_vals):
    for j, β in enumerate(β_vals):
        mcmc = McCallModelContinuous(c=c, β=β)
        R[i, j] = compute_reservation_wage_continuous(mcmc)

fig, ax = plt.subplots()

cs1 = ax.contourf(c_vals, β_vals, R.T, alpha=0.75)


ctr1 = ax.contour(c_vals, β_vals, R.T)

plt.clabel(ctr1, inline=1, fontsize=13)


plt.colorbar(cs1, ax=ax)

ax.set_title("reservation wage")
ax.set_xlabel("$c$", fontsize=16)
ax.set_ylabel("$β$", fontsize=16)

ax.ticklabel_format(useOffset=False)

plt.show()



CHAPTER

TWENTYEIGHT

JOB SEARCH II: SEARCH AND SEPARATION

Contents

• Job Search II: Search and Separation


– Overview
– The Model
– Solving the Model
– Implementation
– Impact of Parameters
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

28.1 Overview

Previously we looked at the McCall job search model [McCall, 1970] as a way of understanding unemployment and
worker decisions.
One unrealistic feature of the model is that every job is permanent.
In this lecture, we extend the McCall model by introducing job separation.
Once separation enters the picture, the agent comes to view
• the loss of a job as a capital loss, and
• a spell of unemployment as an investment in searching for an acceptable job
The other minor addition is that a utility function will be included to make worker preferences slightly more sophisticated.
We’ll need the following imports

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
from numba import njit, float64


from numba.experimental import jitclass
from quantecon.distributions import BetaBinomial

28.2 The Model

The model is similar to the baseline McCall job search model.


It concerns the life of an infinitely lived worker and
• the opportunities he or she (let’s say he to save one character) has to work at different wages
• exogenous events that destroy his current job
• his decision making process while unemployed
The worker can be in one of two states: employed or unemployed.
He wants to maximize

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t u(y_t) \tag{28.1}
$$

At this stage the only difference from the baseline model is that we’ve added some flexibility to preferences by introducing
a utility function 𝑢.
It satisfies 𝑢′ > 0 and 𝑢″ < 0.

28.2.1 The Wage Process

For now we will drop the separation of state process and wage process that we maintained for the baseline model.
In particular, we simply suppose that wage offers {𝑤𝑡 } are IID with common distribution 𝑞.
The set of possible wage values is denoted by 𝕎.
(Later we will go back to having a separate state process {𝑠𝑡 } driving random outcomes, since this formulation is usually
convenient in more sophisticated models.)

28.2.2 Timing and Decisions

At the start of each period, the agent can be either


• unemployed or
• employed at some existing wage level 𝑤𝑒 .
At the start of a given period, the current wage offer 𝑤𝑡 is observed.
If currently employed, the worker
1. receives utility 𝑢(𝑤𝑒 ) and
2. is fired with some (small) probability 𝛼.


If currently unemployed, the worker either accepts or rejects the current offer 𝑤𝑡 .
If he accepts, then he begins work immediately at wage 𝑤𝑡 .
If he rejects, then he receives unemployment compensation 𝑐.
The process then repeats.

Note: We do not allow for job search while employed—this topic is taken up in a later lecture.

28.3 Solving the Model

We drop time subscripts in what follows and primes denote next period values.
Let
• 𝑣(𝑤𝑒 ) be total lifetime value accruing to a worker who enters the current period employed with existing wage 𝑤𝑒
• ℎ(𝑤) be total lifetime value accruing to a worker who enters the current period unemployed and receives wage
offer 𝑤.
Here value means the value of the objective function (28.1) when the worker makes optimal decisions at all future points
in time.
Our first aim is to obtain these functions.

28.3.1 The Bellman Equations

Suppose for now that the worker can calculate the functions 𝑣 and ℎ and use them in his decision making.
Then 𝑣 and ℎ should satisfy

$$
v(w_e) = u(w_e) + \beta \left[ (1 - \alpha) v(w_e) + \alpha \sum_{w' \in \mathbb{W}} h(w') q(w') \right] \tag{28.2}
$$

and

$$
h(w) = \max \left\{ v(w), \; u(c) + \beta \sum_{w' \in \mathbb{W}} h(w') q(w') \right\} \tag{28.3}
$$

Equation (28.2) expresses the value of being employed at wage 𝑤𝑒 in terms of


• current reward 𝑢(𝑤𝑒 ) plus
• discounted expected reward tomorrow, given the 𝛼 probability of being fired
Equation (28.3) expresses the value of being unemployed with offer 𝑤 in hand as a maximum over the value of two
options: accept or reject the current offer.
Accepting transitions the worker to employment and hence yields reward 𝑣(𝑤).
Rejecting leads to unemployment compensation and unemployment tomorrow.
Equations (28.2) and (28.3) are the Bellman equations for this model.
They provide enough information to solve for both 𝑣 and ℎ.


28.3.2 A Simplifying Transformation

Rather than jumping straight into solving these equations, let’s see if we can simplify them somewhat.
(This process will be analogous to our second pass at the plain vanilla McCall model, where we simplified the Bellman
equation.)
First, let

$$
d := \sum_{w' \in \mathbb{W}} h(w') q(w') \tag{28.4}
$$

be the expected value of unemployment tomorrow.


We can now write (28.3) as

ℎ(𝑤) = max {𝑣(𝑤), 𝑢(𝑐) + 𝛽𝑑}

or, shifting time forward one period

$$
\sum_{w' \in \mathbb{W}} h(w') q(w') = \sum_{w' \in \mathbb{W}} \max \left\{ v(w'), \; u(c) + \beta d \right\} q(w')
$$

Using (28.4) again now gives

$$
d = \sum_{w' \in \mathbb{W}} \max \left\{ v(w'), \; u(c) + \beta d \right\} q(w') \tag{28.5}
$$

Finally, (28.2) can now be rewritten as

𝑣(𝑤) = 𝑢(𝑤) + 𝛽 [(1 − 𝛼)𝑣(𝑤) + 𝛼𝑑] (28.6)

In the last expression, we wrote 𝑤𝑒 as 𝑤 to make the notation simpler.
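Since (28.6) is linear in 𝑣(𝑤) for a given 𝑑, it can also be rearranged directly. The steps below use only (28.6) and the fact that 𝛽(1 − 𝛼) < 1, which holds because 𝛽 lies in (0, 1) and 𝛼 is a probability:

$$
\begin{aligned}
v(w) &= u(w) + \beta (1 - \alpha) v(w) + \beta \alpha d \\
[1 - \beta (1 - \alpha)] \, v(w) &= u(w) + \beta \alpha d \\
v(w) &= \frac{u(w) + \beta \alpha d}{1 - \beta (1 - \alpha)}
\end{aligned}
$$

So once 𝑑 is known, 𝑣 follows immediately, and 𝑣 is increasing in 𝑤 whenever 𝑢 is.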

28.3.3 The Reservation Wage

Suppose we can use (28.5) and (28.6) to solve for 𝑑 and 𝑣.


(We will do this soon.)
We can then determine optimal behavior for the worker.
From (28.3), we see that an unemployed agent accepts current offer 𝑤 if 𝑣(𝑤) ≥ 𝑢(𝑐) + 𝛽𝑑.
This means precisely that the value of accepting is higher than the expected value of rejecting.
It is clear that 𝑣 is (at least weakly) increasing in 𝑤, since the agent is never made worse off by a higher wage offer.
Hence, we can express the optimal choice as accepting wage offer 𝑤 if and only if

$$
w \geq \bar w \quad \text{where} \quad \bar w \text{ solves } v(\bar w) = u(c) + \beta d
$$

28.3.4 Solving the Bellman Equations

We’ll use the same iterative approach to solving the Bellman equations that we adopted in the first job search lecture.
Here this amounts to
1. make guesses for 𝑑 and 𝑣
2. plug these guesses into the right-hand sides of (28.5) and (28.6)


3. update the left-hand sides from this rule and then repeat
In other words, we are iterating using the rules

$$
d_{n+1} = \sum_{w' \in \mathbb{W}} \max \left\{ v_n(w'), \; u(c) + \beta d_n \right\} q(w') \tag{28.7}
$$

$$
v_{n+1}(w) = u(w) + \beta \left[ (1 - \alpha) v_n(w) + \alpha d_n \right] \tag{28.8}
$$


starting from some initial conditions 𝑑0 , 𝑣0 .
As before, the system always converges to the true solutions—in this case, the 𝑣 and 𝑑 that solve (28.5) and (28.6).
(A proof can be obtained via the Banach contraction mapping theorem.)
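The update rules (28.7) and (28.8) can be sketched in a few lines of plain NumPy. This is an illustration under stated assumptions rather than the implementation developed below: the CRRA form of `u`, the uniform pmf `q`, the function name `iterate` and the parameter values are all choices made for the example.

```python
import numpy as np

def u(x, sigma=2.0):
    """CRRA utility (an assumed functional form for this sketch)."""
    return (x**(1 - sigma) - 1) / (1 - sigma)

def iterate(w, q, alpha=0.2, beta=0.98, c=6.0, tol=1e-8, max_iter=5000):
    v = np.ones_like(w)    # initial condition v_0
    d = 1.0                # initial condition d_0
    for _ in range(max_iter):
        d_new = np.sum(np.maximum(v, u(c) + beta * d) * q)     # rule (28.7)
        v_new = u(w) + beta * ((1 - alpha) * v + alpha * d)    # rule (28.8)
        error = max(np.max(np.abs(v_new - v)), abs(d_new - d))
        v, d = v_new, d_new
        if error < tol:
            break
    return v, d

# Hypothetical inputs: 60 equally likely wages between 10 and 20
w = np.linspace(10, 20, 60)
q = np.ones(60) / 60
v, d = iterate(w, q)

# Residuals of (28.5) and (28.6) at the returned pair
res_d = abs(d - np.sum(np.maximum(v, u(6.0) + 0.98 * d) * q))
res_v = np.max(np.abs(v - (u(w) + 0.98 * (0.8 * v + 0.2 * d))))
print(res_d, res_v)
```

Both residuals should be close to zero, which indicates that the returned pair approximately solves (28.5) and (28.6). Note that both updates use the previous iterates 𝑣𝑛 and 𝑑𝑛, as the rules require.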

28.4 Implementation

Let’s implement this iterative process.


In the code, you’ll see that we use a class to store the various parameters and other objects associated with a given model.
This helps to tidy up the code and provides an object that’s easy to pass to functions.
The default utility function is a CRRA utility function

@njit
def u(c, σ=2.0):
    return (c**(1 - σ) - 1) / (1 - σ)

Also, here’s a default wage distribution, based around the BetaBinomial distribution:

n = 60 # n possible outcomes for w


w_default = np.linspace(10, 20, n) # wages between 10 and 20
a, b = 600, 400 # shape parameters
dist = BetaBinomial(n-1, a, b)
q_default = dist.pdf()

Here’s our jitted class for the McCall model with separation.

mccall_data = [
('α', float64), # job separation rate
('β', float64), # discount factor
('c', float64), # unemployment compensation
('w', float64[:]), # list of wage values
('q', float64[:]) # pmf of random variable w
]

@jitclass(mccall_data)
class McCallModel:
    """
    Stores the parameters and functions associated with a given model.
    """

    def __init__(self, α=0.2, β=0.98, c=6.0, w=w_default, q=q_default):

        self.α, self.β, self.c, self.w, self.q = α, β, c, w, q

    def update(self, v, d):

        α, β, c, w, q = self.α, self.β, self.c, self.w, self.q

        v_new = np.empty_like(v)

        for i in range(len(w)):
            v_new[i] = u(w[i]) + β * ((1 - α) * v[i] + α * d)

        d_new = np.sum(np.maximum(v, u(c) + β * d) * q)

        return v_new, d_new

Now we iterate until successive iterates are closer together than some small tolerance level.
We then return the current iterate as an approximate solution.

@njit
def solve_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    * mcm is an instance of McCallModel
    """

    v = np.ones_like(mcm.w)    # Initial guess of v
    d = 1                      # Initial guess of d
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        v_new, d_new = mcm.update(v, d)
        error_1 = np.max(np.abs(v_new - v))
        error_2 = np.abs(d_new - d)
        error = max(error_1, error_2)
        v = v_new
        d = d_new
        i += 1

    return v, d

28.4.1 The Reservation Wage: First Pass

The optimal choice of the agent is summarized by the reservation wage.


As discussed above, the reservation wage is the $\bar w$ that solves $v(\bar w) = h$ where $h := u(c) + \beta d$ is the continuation value.
Let’s compare 𝑣 and ℎ to see what they look like.
We’ll use the default parameterizations found in the code above.

mcm = McCallModel()
v, d = solve_model(mcm)
h = u(mcm.c) + mcm.β * d

fig, ax = plt.subplots()

ax.plot(mcm.w, v, 'b-', lw=2, alpha=0.7, label='$v$')


ax.plot(mcm.w, [h] * len(mcm.w),
'g-', lw=2, alpha=0.7, label='$h$')
ax.set_xlim(min(mcm.w), max(mcm.w))
ax.legend()

plt.show()

The value 𝑣 is increasing because higher 𝑤 generates a higher wage flow conditional on staying employed.

28.4.2 The Reservation Wage: Computation

Here’s a function compute_reservation_wage that takes an instance of McCallModel and returns the associated
reservation wage.

@njit
def compute_reservation_wage(mcm):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that v(w) > h.

    If no such w exists, then w_bar is set to np.inf.
    """

    v, d = solve_model(mcm)
    h = u(mcm.c) + mcm.β * d

    i = np.searchsorted(v, h, side='right')
    if i == len(mcm.w):
        # No wage on the grid is acceptable
        return np.inf

    return mcm.w[i]

Next we will investigate how the reservation wage varies with parameters.


28.5 Impact of Parameters

In each instance below, we’ll show you a figure and then ask you to reproduce it in the exercises.

28.5.1 The Reservation Wage and Unemployment Compensation

First, let’s look at how 𝑤̄ varies with unemployment compensation.


In the figure below, we use the default parameters in the McCallModel class, apart from c (which takes the values given
on the horizontal axis).

As expected, higher unemployment compensation causes the worker to hold out for higher wages.
In effect, the cost of continuing job search is reduced.

28.5.2 The Reservation Wage and Discounting

Next, let’s investigate how 𝑤̄ varies with the discount factor.


The next figure plots the reservation wage associated with different values of 𝛽.
Again, the results are intuitive: more patient workers will hold out for higher wages.

28.5.3 The Reservation Wage and Job Destruction

Finally, let’s look at how 𝑤̄ varies with the job separation rate 𝛼.
Higher 𝛼 translates to a greater chance that a worker will face termination in each period once employed.
Once more, the results are in line with our intuition.
If the separation rate is high, then the benefit of holding out for a higher wage falls.
Hence the reservation wage is lower.

28.6 Exercises

Exercise 28.6.1
Reproduce all the reservation wage figures shown above.
Regarding the values on the horizontal axis, use

grid_size = 25
c_vals = np.linspace(2, 12, grid_size) # unemployment compensation
beta_vals = np.linspace(0.8, 0.99, grid_size) # discount factors
alpha_vals = np.linspace(0.05, 0.5, grid_size) # separation rate

Solution to Exercise 28.6.1


Here’s the first figure.

mcm = McCallModel()

w_bar_vals = np.empty_like(c_vals)

fig, ax = plt.subplots()

for i, c in enumerate(c_vals):
    mcm.c = c
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='unemployment compensation',
       ylabel='reservation wage')
ax.plot(c_vals, w_bar_vals, label=r'$\bar w$ as a function of $c$')
ax.legend()

plt.show()

Here’s the second one.

mcm = McCallModel()  # reset to the default parameters before varying β

fig, ax = plt.subplots()

for i, β in enumerate(beta_vals):
    mcm.β = β
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='discount factor', ylabel='reservation wage')


ax.plot(beta_vals, w_bar_vals, label=r'$\bar w$ as a function of $\beta$')
ax.legend()

plt.show()

Here’s the third.

mcm = McCallModel()  # reset to the default parameters before varying α

fig, ax = plt.subplots()

for i, α in enumerate(alpha_vals):
    mcm.α = α
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='separation rate', ylabel='reservation wage')


ax.plot(alpha_vals, w_bar_vals, label=r'$\bar w$ as a function of $\alpha$')
ax.legend()

plt.show()

CHAPTER

TWENTYNINE

JOB SEARCH III: FITTED VALUE FUNCTION ITERATION

Contents

• Job Search III: Fitted Value Function Iteration


– Overview
– The Algorithm
– Implementation
– Exercises

29.1 Overview

In this lecture we again study the McCall job search model with separation, but now with a continuous wage distribution.
While we already considered continuous wage distributions briefly in the exercises of the first job search lecture, the change
was relatively trivial in that case.
This is because we were able to reduce the problem to solving for a single scalar value (the continuation value).
Here, with separation, the change is less trivial, since a continuous wage distribution leads to an uncountably infinite state
space.
The infinite state space leads to additional challenges, particularly when it comes to applying value function iteration (VFI).
These challenges will lead us to modify VFI by adding an interpolation step.
The combination of VFI and this interpolation step is called fitted value function iteration (fitted VFI).
Fitted VFI is very common in practice, so we will take some time to work through the details.
We will use the following imports:

import matplotlib.pyplot as plt


import numpy as np
from numba import njit, float64
from numba.experimental import jitclass

29.2 The Algorithm

The model is the same as the McCall model with job separation we studied before, except that the wage offer distribution
is continuous.
We are going to start with the two Bellman equations we obtained for the model with job separation after a simplifying
transformation.
Modified to accommodate continuous wage draws, they take the following form:

𝑑 = ∫ max {𝑣(𝑤′ ), 𝑢(𝑐) + 𝛽𝑑} 𝑞(𝑤′ )𝑑𝑤′ (29.1)

and

𝑣(𝑤) = 𝑢(𝑤) + 𝛽 [(1 − 𝛼)𝑣(𝑤) + 𝛼𝑑] (29.2)

The unknowns here are the function 𝑣 and the scalar 𝑑.


The differences between these and the pair of Bellman equations we previously worked on are
1. In (29.1), what used to be a sum over a finite number of wage values is an integral over an infinite set.
2. The function 𝑣 in (29.2) is defined over all 𝑤 ∈ ℝ+ .
The function 𝑞 in (29.1) is the density of the wage offer distribution.
Its support is taken as equal to ℝ+ .

29.2.1 Value Function Iteration

In theory, we should now proceed as follows:


1. Begin with a guess 𝑣, 𝑑 for the solutions to (29.1)–(29.2).
2. Plug 𝑣, 𝑑 into the right hand side of (29.1)–(29.2) and compute the left hand side to obtain updates 𝑣′ , 𝑑′ .
3. Unless some stopping condition is satisfied, set (𝑣, 𝑑) = (𝑣′ , 𝑑′ ) and go to step 2.
However, there is a problem we must confront before we implement this procedure: The iterates of the value function
can neither be calculated exactly nor stored on a computer.
To see the issue, consider (29.2).
Even if 𝑣 is a known function, the only way to store its update 𝑣′ is to record its value 𝑣′ (𝑤) for every 𝑤 ∈ ℝ+ .
Clearly, this is impossible.

29.2.2 Fitted Value Function Iteration

What we will do instead is use fitted value function iteration.


The procedure is as follows:
Let a current guess 𝑣 be given.
Now we record the value of the function 𝑣′ at only finitely many “grid” points 𝑤1 < 𝑤2 < ⋯ < 𝑤𝐼 and then reconstruct
𝑣′ from this information when required.
More precisely, the algorithm will be
1. Begin with an array v representing the values of an initial guess of the value function on some grid points {𝑤𝑖 }.
2. Build a function 𝑣 on the state space ℝ+ by interpolation or approximation, based on v and {𝑤𝑖 }.
3. Obtain and record the samples of the updated function 𝑣′ (𝑤𝑖 ) on each grid point 𝑤𝑖 .
4. Unless some stopping condition is satisfied, take this as the new array and go to step 1.
How should we go about step 2?
This is a problem of function approximation, and there are many ways to approach it.
What’s important here is that the function approximation scheme must not only produce a good approximation to each 𝑣,
but also that it combines well with the broader iteration algorithm described above.
One good choice in both respects is continuous piecewise linear interpolation.
This method
1. combines well with value function iteration (see, e.g., [Gordon, 1995] or [Stachurski, 2008]) and
2. preserves useful shape properties such as monotonicity and concavity/convexity.
Linear interpolation will be implemented using numpy.interp.
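As a minimal sketch of steps 2 and 3, the snippet below turns an array of values stored on grid points into a function via np.interp and evaluates it on and off the grid. The grid and the stored values are hypothetical numbers chosen only for illustration. Note that, with its default settings, np.interp returns the endpoint values for arguments outside the grid rather than extrapolating.

```python
import numpy as np

# Hypothetical grid and stored values, chosen only for illustration
w_grid = np.array([0.0, 1.0, 2.0])
v = np.array([0.0, 10.0, 14.0])

def v_func(x):
    "Piecewise linear interpolant built from (w_grid, v)."
    return np.interp(x, w_grid, v)

print(v_func(1.0))    # a grid point: returns the stored value
print(v_func(0.5))    # midpoint of the first segment
print(v_func(1.5))    # midpoint of the second segment
print(v_func(3.0))    # right of the grid: clamped to the last stored value
print(v_func(-1.0))   # left of the grid: clamped to the first stored value
```

This clamping matters whenever the points at which the function is evaluated (for example, wage draws) fall outside the grid.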
The next figure illustrates piecewise linear interpolation of an arbitrary function on grid points 0, 0.2, 0.4, 0.6, 0.8, 1.

def f(x):
    y1 = 2 * np.cos(6 * x) + np.sin(14 * x)
    return y1 + 2.5

c_grid = np.linspace(0, 1, 6)
f_grid = np.linspace(0, 1, 150)

def Af(x):
    return np.interp(x, c_grid, f(c_grid))

fig, ax = plt.subplots()

ax.plot(f_grid, f(f_grid), 'b-', label='true function')


ax.plot(f_grid, Af(f_grid), 'g-', label='linear approximation')
ax.vlines(c_grid, c_grid * 0, f(c_grid), linestyle='dashed', alpha=0.5)

ax.legend(loc="upper center")

ax.set(xlim=(0, 1), ylim=(0, 6))


plt.show()

29.3 Implementation

The first step is to build a jitted class for the McCall model with separation and a continuous wage offer distribution.
We will take the utility function to be the log function for this application, with 𝑢(𝑐) = ln 𝑐.
We will adopt the lognormal distribution for wages, with 𝑤 = exp(𝜇 + 𝜎𝑧) where 𝑧 is standard normal and 𝜇, 𝜎 are parameters.

@njit
def lognormal_draws(n=1000, μ=2.5, σ=0.5, seed=1234):
    np.random.seed(seed)
    z = np.random.randn(n)
    w_draws = np.exp(μ + σ * z)
    return w_draws
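As a quick sanity check on this parameterization, the sketch below compares the sample mean of lognormal draws with the theoretical mean exp(μ + σ²/2). It uses plain NumPy rather than the jitted function, and the sample size and seed are arbitrary choices made for this illustration.

```python
import numpy as np

μ, σ = 2.5, 0.5                      # same defaults as lognormal_draws
rng = np.random.default_rng(1234)    # seed chosen arbitrarily

z = rng.standard_normal(1_000_000)
w_draws = np.exp(μ + σ * z)

sample_mean = w_draws.mean()
theoretical_mean = np.exp(μ + σ**2 / 2)

print(sample_mean, theoretical_mean)
```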

Here’s our class.

mccall_data_continuous = [
    ('c', float64),          # unemployment compensation
    ('α', float64),          # job separation rate
    ('β', float64),          # discount factor
    ('w_grid', float64[:]),  # grid of points for fitted VFI
    ('w_draws', float64[:])  # draws of wages for Monte Carlo
]

@jitclass(mccall_data_continuous)
class McCallModelContinuous:

    def __init__(self,
                 c=1,
                 α=0.1,
                 β=0.96,
                 grid_min=1e-10,
                 grid_max=5,
                 grid_size=100,
                 w_draws=lognormal_draws()):

        self.c, self.α, self.β = c, α, β

        self.w_grid = np.linspace(grid_min, grid_max, grid_size)
        self.w_draws = w_draws

    def update(self, v, d):

        # Simplify names
        c, α, β = self.c, self.α, self.β
        w = self.w_grid
        u = lambda x: np.log(x)

        # Interpolate array represented value function
        vf = lambda x: np.interp(x, w, v)

        # Update d using Monte Carlo to evaluate integral
        d_new = np.mean(np.maximum(vf(self.w_draws), u(c) + β * d))

        # Update v
        v_new = u(w) + β * ((1 - α) * v + α * d)

        return v_new, d_new

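The next function applies this update repeatedly until successive iterates differ by less than a tolerance, and then returns the current iterate as an approximate solution.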

@njit
def solve_model(mcm, tol=1e-5, max_iter=2000):
    """
    Iterates to convergence on the Bellman equations

    * mcm is an instance of McCallModelContinuous
    """

    v = np.ones_like(mcm.w_grid)    # Initial guess of v
    d = 1                           # Initial guess of d
    i = 0
    error = tol + 1

    while error > tol and i < max_iter:
        v_new, d_new = mcm.update(v, d)
        error_1 = np.max(np.abs(v_new - v))
        error_2 = np.abs(d_new - d)
        error = max(error_1, error_2)
        v = v_new
        d = d_new
        i += 1

    return v, d

Here’s a function compute_reservation_wage that takes an instance of McCallModelContinuous and returns the associated reservation wage.
If 𝑣(𝑤) < ℎ for all 𝑤, then the function returns np.inf.

@njit
def compute_reservation_wage(mcm):
    """
    Computes the reservation wage of an instance of the McCall model
    by finding the smallest w such that v(w) >= h.

    If no such w exists, then w_bar is set to np.inf.
    """
    u = lambda x: np.log(x)

    v, d = solve_model(mcm)
    h = u(mcm.c) + mcm.β * d

    w_bar = np.inf
    for i, wage in enumerate(mcm.w_grid):
        if v[i] > h:
            w_bar = wage
            break

    return w_bar
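For readers who want to experiment without Numba, here is a minimal plain-NumPy sketch of the same fitted VFI loop. The function name fitted_vfi and its argument list are hypothetical, introduced only for this illustration; the update rule mirrors the update method above, and the seed is arbitrary.

```python
import numpy as np

def fitted_vfi(w_grid, w_draws, c=1.0, α=0.1, β=0.96,
               tol=1e-5, max_iter=2000):
    """
    Hypothetical plain-NumPy version of the fitted VFI loop.
    Returns the array v (values on w_grid) and the scalar d.
    """
    u = np.log
    v = np.ones_like(w_grid)      # initial guess of v on the grid
    d = 1.0                       # initial guess of d
    for _ in range(max_iter):
        # Evaluate the interpolated v at the Monte Carlo draws
        v_at_draws = np.interp(w_draws, w_grid, v)
        d_new = np.mean(np.maximum(v_at_draws, u(c) + β * d))
        v_new = u(w_grid) + β * ((1 - α) * v + α * d)
        error = max(np.max(np.abs(v_new - v)), abs(d_new - d))
        v, d = v_new, d_new
        if error < tol:
            break
    return v, d

# Grid and wage distribution matching the defaults used in this lecture
rng = np.random.default_rng(1234)              # arbitrary seed
w_grid = np.linspace(1e-10, 5, 100)
w_draws = np.exp(2.5 + 0.5 * rng.standard_normal(1000))

v, d = fitted_vfi(w_grid, w_draws)
h = np.log(1.0) + 0.96 * d                     # continuation value
print(d, h)
```

The array v returned here is increasing in the wage, as in the discrete model, because each update adds the increasing function u to a weighted copy of the previous iterate.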

The exercises ask you to explore the solution and how it changes with parameters.

29.4 Exercises

Exercise 29.4.1
Use the code above to explore what happens to the reservation wage when the wage parameter 𝜇 changes.
Use the default parameters and 𝜇 in mu_vals = np.linspace(0.0, 2.0, 15).
Is the impact on the reservation wage as you expected?

Solution to Exercise 29.4.1


Here is one solution

mcm = McCallModelContinuous()
mu_vals = np.linspace(0.0, 2.0, 15)
w_bar_vals = np.empty_like(mu_vals)

fig, ax = plt.subplots()

for i, m in enumerate(mu_vals):
    mcm.w_draws = lognormal_draws(μ=m)
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='mean', ylabel='reservation wage')


ax.plot(mu_vals, w_bar_vals, label=r'$\bar w$ as a function of $\mu$')
ax.legend()

plt.show()

Not surprisingly, the agent is more inclined to wait when the distribution of offers shifts to the right.

Exercise 29.4.2
Let us now consider how the agent responds to an increase in volatility.
To try to understand this, compute the reservation wage when the wage offer distribution is uniform on (𝑚 − 𝑠, 𝑚 + 𝑠)
and 𝑠 varies.
The idea here is that we are holding the mean constant and spreading the support.
(This is a form of mean-preserving spread.)
Use s_vals = np.linspace(1.0, 2.0, 15) and m = 2.0.
State how you expect the reservation wage to vary with 𝑠.

Now compute it. Is this as you expected?

Solution to Exercise 29.4.2


Here is one solution

mcm = McCallModelContinuous()
s_vals = np.linspace(1.0, 2.0, 15)
m = 2.0
w_bar_vals = np.empty_like(s_vals)

fig, ax = plt.subplots()

for i, s in enumerate(s_vals):
    a, b = m - s, m + s
    mcm.w_draws = np.random.uniform(low=a, high=b, size=10_000)
    w_bar = compute_reservation_wage(mcm)
    w_bar_vals[i] = w_bar

ax.set(xlabel='volatility', ylabel='reservation wage')


ax.plot(s_vals, w_bar_vals, label=r'$\bar w$ as a function of wage volatility')
ax.legend()

plt.show()

The reservation wage increases with volatility.


One might think that higher volatility would make the agent more inclined to take a given offer, since doing so represents certainty and waiting represents risk.
But job search is like holding an option: the worker is only exposed to upside risk (since, in a free market, no one can force them to take a bad offer).
More volatility means higher upside potential, which encourages the agent to wait.
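The option-value argument can be checked with a short calculation. The sketch below is an illustration under simplifying assumptions, not part of the model code: it fixes a hypothetical value of rejecting, h = m, and estimates E max{X, h} for X uniform on (m − s, m + s). In this special case the expectation equals m + s/4, so it rises with the spread s even though the mean of X is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)    # arbitrary seed
m = 2.0                           # mean of the offer distribution
h = m                             # hypothetical value of rejecting, for illustration

estimates = {}
for s in (1.0, 1.5, 2.0):
    x = rng.uniform(m - s, m + s, size=1_000_000)
    estimates[s] = np.mean(np.maximum(x, h))   # Monte Carlo estimate
    exact = m + s / 4                          # closed form when h = m
    print(f"s = {s}: Monte Carlo {estimates[s]:.4f}, exact {exact:.4f}")
```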

CHAPTER

THIRTY

JOB SEARCH IV: CORRELATED WAGE OFFERS

Contents

• Job Search IV: Correlated Wage Offers


– Overview
– The Model
– Implementation
– Unemployment Duration
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

30.1 Overview

In this lecture we solve a McCall style job search model with persistent and transitory components to wages.
In other words, we relax the unrealistic assumption that randomness in wages is independent over time.
At the same time, we will go back to assuming that jobs are permanent and no separation occurs.
This is to keep the model relatively simple as we study the impact of correlation.
We will use the following imports:

import matplotlib.pyplot as plt


import numpy as np
import quantecon as qe
from numpy.random import randn
from numba import njit, prange, float64
from numba.experimental import jitclass

30.2 The Model

Wages at each point in time are given by

𝑤𝑡 = exp(𝑧𝑡 ) + 𝑦𝑡

where

𝑦𝑡 ∼ exp(𝜇 + 𝑠𝜁𝑡 ) and 𝑧𝑡+1 = 𝑑 + 𝜌𝑧𝑡 + 𝜎𝜖𝑡+1

Here {𝜁𝑡 } and {𝜖𝑡 } are both IID and standard normal.
Here {𝑦𝑡 } is a transitory component and {𝑧𝑡 } is persistent.
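To get a feel for this wage process, here is a minimal simulation sketch in plain NumPy. The helper simulate_wages is a hypothetical name introduced for this example, and the parameter values, initial state and seed are illustrative assumptions.

```python
import numpy as np

def simulate_wages(T=200, μ=0.0, s=1.0, d=0.0, ρ=0.9, σ=0.1,
                   z0=0.0, seed=1234):
    """
    Simulate w_t = exp(z_t) + y_t for t = 0, ..., T-1.
    (Hypothetical helper for illustration; defaults are assumptions.)
    """
    rng = np.random.default_rng(seed)
    z = np.empty(T)
    z[0] = z0
    for t in range(T - 1):
        # Persistent component follows an AR(1) process
        z[t + 1] = d + ρ * z[t] + σ * rng.standard_normal()
    y = np.exp(μ + s * rng.standard_normal(T))   # transitory component
    w = np.exp(z) + y
    return z, y, w

z, y, w = simulate_wages()
print(w[:5])
```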
As before, the worker can either
1. accept an offer and work permanently at that wage, or
2. take unemployment compensation 𝑐 and wait till next period.
The value function satisfies the Bellman equation

$$
v^*(w, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; u(c) + \beta \, \mathbb{E}_z v^*(w', z') \right\}
$$

In this expression, 𝑢 is a utility function and 𝔼𝑧 is the expectation of next period variables given current 𝑧.
The variable 𝑧 enters as a state in the Bellman equation because its current value helps predict future wages.

30.2.1 A Simplification

There is a way that we can reduce dimensionality in this problem, which greatly accelerates computation.
To start, let 𝑓 ∗ be the continuation value function, defined by

𝑓 ∗ (𝑧) ∶= 𝑢(𝑐) + 𝛽 𝔼𝑧 𝑣∗ (𝑤′ , 𝑧 ′ )

The Bellman equation can now be written

$$
v^*(w, z) = \max \left\{ \frac{u(w)}{1 - \beta}, \; f^*(z) \right\}
$$

Combining the last two expressions, we see that the continuation value function satisfies

$$
f^*(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; f^*(z') \right\}
$$

We’ll solve this functional equation for 𝑓 ∗ by introducing the operator

$$
Qf(z) = u(c) + \beta \, \mathbb{E}_z \max \left\{ \frac{u(w')}{1 - \beta}, \; f(z') \right\}
$$

By construction, 𝑓 ∗ is a fixed point of 𝑄, in the sense that 𝑄𝑓 ∗ = 𝑓 ∗ .


Under mild assumptions, it can be shown that 𝑄 is a contraction mapping over a suitable space of continuous functions
on ℝ.
By Banach’s contraction mapping theorem, this means that 𝑓 ∗ is the unique fixed point and we can calculate it by iterating
with 𝑄 from any reasonable initial condition.
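To illustrate the idea of computing a fixed point by successive approximation, here is a minimal sketch that uses a toy contraction on the real line rather than the operator 𝑄 itself. The map T and the helper successive_approx are hypothetical names introduced for this example.

```python
def successive_approx(T, x0, tol=1e-10, max_iter=1000):
    "Iterate x <- T(x) until successive iterates are within tol."
    x = x0
    for i in range(max_iter):
        x_new = T(x)
        if abs(x_new - x) < tol:
            return x_new, i + 1
        x = x_new
    return x, max_iter

# Toy contraction with modulus 0.5; its unique fixed point solves x = 0.5 x + 1
T = lambda x: 0.5 * x + 1

results = []
for x0 in (-100.0, 0.0, 50.0):
    x_star, n_iter = successive_approx(T, x0)
    results.append(x_star)
    print(x0, x_star, n_iter)
```

Every starting point leads to the same limit, which is the practical content of the contraction mapping theorem.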

Once we have 𝑓 ∗ , we can solve the search problem by stopping when the reward for accepting exceeds the continuation value, or

$$
\frac{u(w)}{1 - \beta} \geq f^*(z)
$$

For utility we take 𝑢(𝑐) = ln(𝑐).
The reservation wage is the wage where equality holds in the last expression.
That is,

$$
\bar w(z) := \exp(f^*(z)(1 - \beta)) \tag{30.1}
$$

Our main aim is to solve for the reservation rule and study its properties and implications.
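For completeness, the step from the stopping condition to (30.1) can be written out as follows, using 𝑢(𝑤) = ln 𝑤 and setting the reward for accepting equal to the continuation value:

```latex
\frac{\ln \bar w(z)}{1 - \beta} = f^*(z)
\quad \Longleftrightarrow \quad
\ln \bar w(z) = (1 - \beta) f^*(z)
\quad \Longleftrightarrow \quad
\bar w(z) = \exp\big( f^*(z) (1 - \beta) \big)
```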

30.3 Implementation

Let 𝑓 be our initial guess of 𝑓 ∗ .


When we iterate, we use the fitted value function iteration algorithm.
In particular, 𝑓 and all subsequent iterates are stored as a vector of values on a grid.
These points are interpolated into a function as required, using piecewise linear interpolation.
The integral in the definition of 𝑄𝑓 is calculated by Monte Carlo.
The following list helps Numba by providing some type information about the data we will work with.

job_search_data = [
    ('μ', float64),             # transient shock log mean
    ('s', float64),             # transient shock log standard deviation
    ('d', float64),             # shift coefficient of persistent state
    ('ρ', float64),             # correlation coefficient of persistent state
    ('σ', float64),             # state volatility
    ('β', float64),             # discount factor
    ('c', float64),             # unemployment compensation
    ('z_grid', float64[:]),     # grid over the state space
    ('e_draws', float64[:,:])   # Monte Carlo draws for integration
]

Here’s a class that stores the data and the right hand side of the Bellman equation.
Default parameter values are embedded in the class.

@jitclass(job_search_data)
class JobSearch:

    def __init__(self,
                 μ=0.0,       # transient shock log mean
                 s=1.0,       # transient shock log standard deviation
                 d=0.0,       # shift coefficient of persistent state
                 ρ=0.9,       # correlation coefficient of persistent state
                 σ=0.1,       # state volatility
                 β=0.98,      # discount factor
                 c=5,         # unemployment compensation
                 mc_size=1000,
                 grid_size=100):

        self.μ, self.s, self.d = μ, s, d
        self.ρ, self.σ, self.β, self.c = ρ, σ, β, c

        # Set up grid
        z_mean = d / (1 - ρ)
        z_sd = σ / np.sqrt(1 - ρ**2)
        k = 3  # std devs from mean
        a, b = z_mean - k * z_sd, z_mean + k * z_sd
        self.z_grid = np.linspace(a, b, grid_size)

        # Draw and store shocks
        np.random.seed(1234)
        self.e_draws = randn(2, mc_size)

    def parameters(self):
        """
        Return all parameters as a tuple.
        """
        return self.μ, self.s, self.d, \
               self.ρ, self.σ, self.β, self.c
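The grid bounds in __init__ rest on the stationary distribution of the AR(1) state, which is normal with mean d / (1 − ρ) and standard deviation σ / √(1 − ρ²) when |ρ| < 1. The following sketch checks these two formulas by simulation in plain NumPy; the path length and seed are arbitrary choices made for this illustration.

```python
import numpy as np

d, ρ, σ = 0.0, 0.9, 0.1             # same defaults as JobSearch
z_mean = d / (1 - ρ)
z_sd = σ / np.sqrt(1 - ρ**2)

rng = np.random.default_rng(1234)   # arbitrary seed
T = 200_000
shocks = rng.standard_normal(T)
z = np.empty(T)
z[0] = z_mean
for t in range(T - 1):
    z[t + 1] = d + ρ * z[t] + σ * shocks[t]

print(z.mean(), z_mean)             # simulated vs. stationary mean
print(z.std(), z_sd)                # simulated vs. stationary std dev

# Share of simulated states inside the grid bounds of 3 standard deviations
inside = np.mean(np.abs(z - z_mean) <= 3 * z_sd)
print(inside)
```

Since almost all of the simulated states fall inside the bounds, a grid spanning three standard deviations either side of the mean covers the region the state actually visits.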

Next we implement the 𝑄 operator.

@njit(parallel=True)
def Q(js, f_in, f_out):
    """
    Apply the operator Q.

        * js is an instance of JobSearch
        * f_in and f_out are arrays that represent f and Qf respectively

    """

    μ, s, d, ρ, σ, β, c = js.parameters()
    M = js.e_draws.shape[1]

    for i in prange(len(js.z_grid)):
        z = js.z_grid[i]
        expectation = 0.0
        for m in range(M):
            e1, e2 = js.e_draws[:, m]
            z_next = d + ρ * z + σ * e1
            go_val = np.interp(z_next, js.z_grid, f_in)  # f(z')
            y_next = np.exp(μ + s * e2)                  # y' draw
            w_next = np.exp(z_next) + y_next             # w' draw
            stop_val = np.log(w_next) / (1 - β)
            expectation += max(stop_val, go_val)
        expectation = expectation / M
        f_out[i] = np.log(c) + β * expectation

Here’s a function to compute an approximation to the fixed point of 𝑄.

def compute_fixed_point(js,
                        use_parallel=True,
                        tol=1e-4,
                        max_iter=1000,
                        verbose=True,
                        print_skip=25):

    f_init = np.full(len(js.z_grid), np.log(js.c))
    f_out = np.empty_like(f_init)

    # Set up loop
    f_in = f_init
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        Q(js, f_in, f_out)
        error = np.max(np.abs(f_in - f_out))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        f_in[:] = f_out

    if error > tol:
        print("Failed to converge!")
    elif verbose:
        print(f"\nConverged in {i} iterations.")

    return f_out

Let’s try generating an instance and solving the model.

js = JobSearch()

qe.tic()
f_star = compute_fixed_point(js, verbose=True)
qe.toc()

Error at iteration 25 is 0.5762477839587632.

Error at iteration 50 is 0.11808817939665062.

Error at iteration 75 is 0.02857744138523799.

Error at iteration 100 is 0.00715833638517438.

Error at iteration 125 is 0.0018027870994501427.

Error at iteration 150 is 0.0004548908741099922.

Error at iteration 175 is 0.00011479050299101345.

Converged in 178 iterations.
TOC: Elapsed: 0:00:6.43

6.438979864120483

Next we will compute and plot the reservation wage function defined in (30.1).

res_wage_function = np.exp(f_star * (1 - js.β))

fig, ax = plt.subplots()
ax.plot(js.z_grid, res_wage_function, label="reservation wage given $z$")
ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()

Notice that the reservation wage is increasing in the current state 𝑧.


This is because a higher state leads the agent to predict higher future wages, increasing the option value of waiting.
Let’s try changing unemployment compensation and look at its impact on the reservation wage:

c_vals = 1, 2, 3

fig, ax = plt.subplots()

for c in c_vals:
    js = JobSearch(c=c)
    f_star = compute_fixed_point(js, verbose=False)
    res_wage_function = np.exp(f_star * (1 - js.β))
    ax.plot(js.z_grid, res_wage_function, label=rf"$\bar w$ at $c = {c}$")

ax.set(xlabel="$z$", ylabel="wage")
ax.legend()
plt.show()

As expected, higher unemployment compensation shifts the reservation wage up at all state values.

30.4 Unemployment Duration

Next we study how mean unemployment duration varies with unemployment compensation.
For simplicity we’ll fix the initial state at 𝑧𝑡 = 0.

def compute_unemployment_duration(js, seed=1234):

    f_star = compute_fixed_point(js, verbose=False)
    μ, s, d, ρ, σ, β, c = js.parameters()
    z_grid = js.z_grid
    np.random.seed(seed)

    @njit
    def f_star_function(z):
        return np.interp(z, z_grid, f_star)

    @njit
    def draw_tau(t_max=10_000):
        z = 0
        t = 0
        τ = t_max  # value returned if no offer is accepted before t_max

        unemployed = True
        while unemployed and t < t_max:
            # draw current wage
            y = np.exp(μ + s * np.random.randn())
            w = np.exp(z) + y
            res_wage = np.exp(f_star_function(z) * (1 - β))
            # if optimal to stop, record t
            if w >= res_wage:
                unemployed = False
                τ = t
            # else increment data and state
            else:
                z = ρ * z + d + σ * np.random.randn()
                t += 1
        return τ

    @njit(parallel=True)
    def compute_expected_tau(num_reps=100_000):
        sum_value = 0
        for i in prange(num_reps):
            sum_value += draw_tau()
        return sum_value / num_reps

    return compute_expected_tau()

Let’s test this out with some possible values for unemployment compensation.

c_vals = np.linspace(1.0, 10.0, 8)


durations = np.empty_like(c_vals)
for i, c in enumerate(c_vals):
    js = JobSearch(c=c)
    τ = compute_unemployment_duration(js)
    durations[i] = τ

Here is a plot of the results.

fig, ax = plt.subplots()
ax.plot(c_vals, durations)
ax.set_xlabel("unemployment compensation")
ax.set_ylabel("mean unemployment duration")
plt.show()

Not surprisingly, unemployment duration increases when unemployment compensation is higher.


This is because the value of waiting increases with unemployment compensation.

30.5 Exercises

Exercise 30.5.1
Investigate how mean unemployment duration varies with the discount factor 𝛽.
• What is your prior expectation?
• Do your results match up?

Solution to Exercise 30.5.1


Here is one solution

beta_vals = np.linspace(0.94, 0.99, 8)


durations = np.empty_like(beta_vals)
for i, β in enumerate(beta_vals):
    js = JobSearch(β=β)
    τ = compute_unemployment_duration(js)
    durations[i] = τ

fig, ax = plt.subplots()
ax.plot(beta_vals, durations)
ax.set_xlabel(r"$\beta$")
ax.set_ylabel("mean unemployment duration")
plt.show()

The figure shows that more patient individuals tend to wait longer before accepting an offer.

CHAPTER

THIRTYONE

JOB SEARCH V: MODELING CAREER CHOICE

Contents

• Job Search V: Modeling Career Choice


– Overview
– Model
– Implementation
– Exercises

In addition to what’s in Anaconda, this lecture will need the following libraries:

!pip install quantecon

31.1 Overview

Next, we study a computational problem concerning career and job choices.


The model is originally due to Derek Neal [Neal, 1999].
This exposition draws on the presentation in [Ljungqvist and Sargent, 2018], section 6.5.
We begin with some imports:

import matplotlib.pyplot as plt


plt.rcParams["figure.figsize"] = (11, 5) #set default figure size
import numpy as np
import quantecon as qe
from numba import njit, prange
from quantecon.distributions import BetaBinomial
from scipy.special import binom, beta
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

31.1.1 Model Features

• Career and job within career both chosen to maximize expected discounted wage flow.
• Infinite horizon dynamic programming with two state variables.

31.2 Model

In what follows we distinguish between a career and a job, where


• a career is understood to be a general field encompassing many possible jobs, and
• a job is understood to be a position with a particular firm
For workers, wages can be decomposed into the contribution of job and career
• 𝑤𝑡 = 𝜃𝑡 + 𝜖𝑡 , where
– 𝜃𝑡 is the contribution of career at time 𝑡
– 𝜖𝑡 is the contribution of the job at time 𝑡
At the start of time 𝑡, a worker has the following options
• retain a current (career, job) pair (𝜃𝑡 , 𝜖𝑡 ) — referred to hereafter as “stay put”
• retain a current career 𝜃𝑡 but redraw a job 𝜖𝑡 — referred to hereafter as “new job”
• redraw both a career 𝜃𝑡 and a job 𝜖𝑡 — referred to hereafter as “new life”
Draws of 𝜃 and 𝜖 are independent of each other and past values, with
• 𝜃𝑡 ∼ 𝐹
• 𝜖𝑡 ∼ 𝐺
Notice that the worker does not have the option to retain a job but redraw a career — starting a new career always requires
starting a new job.
A young worker aims to maximize the expected sum of discounted wages

$$
\mathbb{E} \sum_{t=0}^{\infty} \beta^t w_t \tag{31.1}
$$

subject to the choice restrictions specified above.


Let 𝑣(𝜃, 𝜖) denote the value function, which is the maximum of (31.1) over all feasible (career, job) policies, given the initial state (𝜃, 𝜖).
The value function obeys

𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}

where
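$$
\begin{aligned}
I &= \theta + \epsilon + \beta v(\theta, \epsilon) \\
II &= \theta + \int \epsilon' G(d\epsilon') + \beta \int v(\theta, \epsilon') G(d\epsilon') \\
III &= \int \theta' F(d\theta') + \int \epsilon' G(d\epsilon') + \beta \int \int v(\theta', \epsilon') G(d\epsilon') F(d\theta')
\end{aligned}
\tag{31.2}
$$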

Evidently 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 correspond to “stay put”, “new job” and “new life”, respectively.

31.2.1 Parameterization

As in [Ljungqvist and Sargent, 2018], section 6.5, we will focus on a discrete version of the model, parameterized as
follows:
• both 𝜃 and 𝜖 take values in the set np.linspace(0, B, grid_size) — an even grid of points between 0
and 𝐵 inclusive
• grid_size = 50
• B = 5
• β = 0.95
The distributions 𝐹 and 𝐺 are discrete distributions generating draws from the grid points np.linspace(0, B,
grid_size).
A very useful family of discrete distributions is the Beta-binomial family, with probability mass function

$$
p(k \,|\, n, a, b) = \binom{n}{k} \frac{B(k + a, n - k + b)}{B(a, b)}, \qquad k = 0, \ldots, n
$$

Interpretation:
• draw 𝑞 from a Beta distribution with shape parameters (𝑎, 𝑏)
• run 𝑛 independent binary trials, each with success probability 𝑞
• 𝑝(𝑘 | 𝑛, 𝑎, 𝑏) is the probability of 𝑘 successes in these 𝑛 trials
Nice properties:
• very flexible class of distributions, including uniform, symmetric unimodal, etc.
• only three parameters
Here’s a figure showing the effect on the pmf of different shape parameters when 𝑛 = 50.

def gen_probs(n, a, b):
    probs = np.zeros(n+1)
    for k in range(n+1):
        probs[k] = binom(n, k) * beta(k + a, n - k + b) / beta(a, b)
    return probs

n = 50
a_vals = [0.5, 1, 100]
b_vals = [0.5, 1, 100]
fig, ax = plt.subplots(figsize=(10, 6))
for a, b in zip(a_vals, b_vals):
    ab_label = f'$a = {a:.1f}$, $b = {b:.1f}$'
    ax.plot(list(range(0, n+1)), gen_probs(n, a, b), '-o', label=ab_label)
ax.legend()
plt.show()
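The two-stage interpretation of the Beta-binomial distribution can be checked numerically. The sketch below, which uses only the standard library and NumPy, computes the pmf from the formula and compares it with frequencies from simulating the two stages directly. The helper names log_beta and beta_binomial_pmf are hypothetical, and the seed and sample size are arbitrary. With 𝑎 = 𝑏 = 1 the pmf is uniform on 0, …, 𝑛, which the first printed number confirms.

```python
import math
import numpy as np

def log_beta(x, y):
    "Log of the Beta function, via log-gamma."
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def beta_binomial_pmf(n, a, b):
    "Beta-binomial pmf on k = 0, ..., n (hypothetical helper for illustration)."
    return np.array([
        math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))
        for k in range(n + 1)
    ])

n, a, b = 50, 1, 1
pmf = beta_binomial_pmf(n, a, b)

# Two-stage simulation: draw q from Beta(a, b), then k from Binomial(n, q)
rng = np.random.default_rng(0)      # arbitrary seed
q = rng.beta(a, b, size=500_000)
k = rng.binomial(n, q)
freq = np.bincount(k, minlength=n + 1) / len(k)

print(np.max(np.abs(pmf - 1 / (n + 1))))   # distance from the uniform pmf
print(np.max(np.abs(freq - pmf)))          # simulation error
```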

31.3 Implementation

We will first create a class CareerWorkerProblem which will hold the default parameterizations of the model and
an initial guess for the value function.

class CareerWorkerProblem:

    def __init__(self,
                 B=5.0,          # Upper bound
                 β=0.95,         # Discount factor
                 grid_size=50,   # Grid size
                 F_a=1,
                 F_b=1,
                 G_a=1,
                 G_b=1):

        self.β, self.grid_size, self.B = β, grid_size, B

        self.θ = np.linspace(0, B, grid_size)     # Set of θ values
        self.ϵ = np.linspace(0, B, grid_size)     # Set of ϵ values

        self.F_probs = BetaBinomial(grid_size - 1, F_a, F_b).pdf()
        self.G_probs = BetaBinomial(grid_size - 1, G_a, G_b).pdf()
        self.F_mean = np.sum(self.θ * self.F_probs)
        self.G_mean = np.sum(self.ϵ * self.G_probs)

        # Store these parameters for str and repr methods
        self._F_a, self._F_b = F_a, F_b
        self._G_a, self._G_b = G_a, G_b

The following function takes an instance of CareerWorkerProblem and returns the corresponding Bellman operator
𝑇 and the greedy policy function.
In this model, 𝑇 is defined by 𝑇 𝑣(𝜃, 𝜖) = max{𝐼, 𝐼𝐼, 𝐼𝐼𝐼}, where 𝐼, 𝐼𝐼 and 𝐼𝐼𝐼 are as given in (31.2).

def operator_factory(cw, parallel_flag=True):

    """
    Returns jitted versions of the Bellman operator and the
    greedy policy function

    cw is an instance of ``CareerWorkerProblem``
    """

    θ, ϵ, β = cw.θ, cw.ϵ, cw.β
    F_probs, G_probs = cw.F_probs, cw.G_probs
    F_mean, G_mean = cw.F_mean, cw.G_mean

    @njit(parallel=parallel_flag)
    def T(v):
        "The Bellman operator"

        v_new = np.empty_like(v)

        for i in prange(len(v)):
            for j in prange(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]                    # Stay put
                v2 = θ[i] + G_mean + β * v[i, :] @ G_probs        # New job
                v3 = G_mean + F_mean + β * F_probs @ v @ G_probs  # New life
                v_new[i, j] = max(v1, v2, v3)

        return v_new

    @njit
    def get_greedy(v):
        "Computes the v-greedy policy"

        σ = np.empty(v.shape)

        for i in range(len(v)):
            for j in range(len(v)):
                v1 = θ[i] + ϵ[j] + β * v[i, j]
                v2 = θ[i] + G_mean + β * v[i, :] @ G_probs
                v3 = G_mean + F_mean + β * F_probs @ v @ G_probs
                if v1 > max(v2, v3):
                    action = 1
                elif v2 > max(v1, v3):
                    action = 2
                else:
                    action = 3
                σ[i, j] = action

        return σ

    return T, get_greedy
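The same operator can also be written without explicit loops. The following is a minimal plain-NumPy sketch, assuming uniform distributions 𝐹 and 𝐺 (the Beta-binomial case 𝑎 = 𝑏 = 1, which matches the class defaults) so that it runs without Numba or quantecon. The name T_vectorized is hypothetical and the code is an illustration, not a replacement for the jitted version.

```python
import numpy as np

B, β, grid_size = 5.0, 0.95, 50
θ = np.linspace(0, B, grid_size)
ϵ = np.linspace(0, B, grid_size)

# Uniform probabilities (assumption: Beta-binomial with a = b = 1)
F_probs = np.full(grid_size, 1 / grid_size)
G_probs = np.full(grid_size, 1 / grid_size)
F_mean, G_mean = θ @ F_probs, ϵ @ G_probs

def T_vectorized(v):
    "Vectorized Bellman operator; v[i, j] is the value at (θ[i], ϵ[j])."
    stay_put = θ[:, None] + ϵ[None, :] + β * v               # shape (n, n)
    new_job = (θ + G_mean + β * v @ G_probs)[:, None]         # depends on θ only
    new_life = G_mean + F_mean + β * F_probs @ v @ G_probs    # scalar
    return np.maximum(np.maximum(stay_put, new_job), new_life)

# Iterate from a constant initial guess until successive iterates are close
v = np.full((grid_size, grid_size), 100.0)
for i in range(1000):
    v_new = T_vectorized(v)
    error = np.max(np.abs(v_new - v))
    v = v_new
    if error < 1e-4:
        break

print(i + 1, error)
```

Broadcasting handles the three options at once: "stay put" varies with both states, "new job" only with 𝜃, and "new life" is a single number.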

Lastly, solve_model will take an instance of CareerWorkerProblem and iterate using the Bellman operator to
find the fixed point of the Bellman equation.

def solve_model(cw,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    T, _ = operator_factory(cw, parallel_flag=use_parallel)

    # Set up loop
    v = np.full((cw.grid_size, cw.grid_size), 100.)  # Initial guess
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if error > tol:
        print("Failed to converge!")

    elif verbose:
        print(f"\nConverged in {i} iterations.")

    return v_new

Here’s the solution to the model – an approximate value function

cw = CareerWorkerProblem()
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
ax.plot_surface(tg,
                eg,
                v_star.T,
                cmap=cm.jet,
                alpha=0.5,
                linewidth=0.25)
ax.set(xlabel='θ', ylabel='ϵ', zlim=(150, 200))
ax.view_init(ax.elev, 225)
plt.show()


And here is the optimal policy

fig, ax = plt.subplots(figsize=(6, 6))
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
ax.set(xlabel='θ', ylabel='ϵ')
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 2.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()


Interpretation:
• If both job and career are poor or mediocre, the worker will experiment with a new job and new career.
• If career is sufficiently good, the worker will hold it and experiment with new jobs until a sufficiently good one is
found.
• If both job and career are good, the worker will stay put.
Notice that the worker will always hold on to a sufficiently good career, but not necessarily hold on to even the best paying
job.
The reason is that high lifetime wages require both variables to be large, and the worker cannot change careers without
changing jobs.
• Sometimes a good job must be sacrificed in order to change to a better career.


31.4 Exercises

Exercise 31.4.1
Using the default parameterization in the class CareerWorkerProblem, generate and plot typical sample paths for 𝜃
and 𝜖 when the worker follows the optimal policy.
In particular, modulo randomness, reproduce the following figure (where the horizontal axis represents time)

Hint: To generate the draws from the distributions 𝐹 and 𝐺, use quantecon.random.draw().
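
For readers who want to see what such a draw involves, here is a minimal NumPy sketch of inverse-transform sampling from a discrete distribution given its cumulative probabilities. The helper name draw_index and the probabilities are hypothetical, and this illustrates the idea rather than reproducing the quantecon implementation.

```python
import numpy as np

def draw_index(cdf, rng):
    """Return index k with probability cdf[k] - cdf[k-1] (inverse transform)."""
    u = rng.uniform()                               # u ~ Uniform[0, 1)
    k = int(np.searchsorted(cdf, u, side="right"))  # first k with cdf[k] > u
    return min(k, len(cdf) - 1)                     # guard against rounding in cdf[-1]

probs = np.array([0.2, 0.5, 0.3])                   # illustrative probabilities
cdf = np.cumsum(probs)                              # cumulative probabilities
rng = np.random.default_rng(0)
draws = np.array([draw_index(cdf, rng) for _ in range(20_000)])
freqs = np.bincount(draws, minlength=len(probs)) / len(draws)
```

With enough draws, the empirical frequencies in freqs should be close to probs. This is why the solution code passes np.cumsum(cw.F_probs) and np.cumsum(cw.G_probs), rather than the probabilities themselves, to the sampler.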

Solution to Exercise 31.4.1


Simulate job/career paths.
In reading the code, recall that optimal_policy[i, j] is the policy at (𝜃𝑖 , 𝜖𝑗 ) and takes the value 1, 2 or 3, meaning ‘stay put’,
‘new job’ and ‘new life’ respectively.


F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
v_star = solve_model(cw, verbose=False)
T, get_greedy = operator_factory(cw)
greedy_star = get_greedy(v_star)

def gen_path(optimal_policy, F, G, t=20):
    i = j = 0
    θ_index = []
    ϵ_index = []
    for _ in range(t):
        if optimal_policy[i, j] == 1:       # Stay put
            pass
        elif optimal_policy[i, j] == 2:     # New job
            j = qe.random.draw(G)
        else:                               # New life
            i, j = qe.random.draw(F), qe.random.draw(G)
        θ_index.append(i)
        ϵ_index.append(j)
    return cw.θ[θ_index], cw.ϵ[ϵ_index]

fig, axes = plt.subplots(2, 1, figsize=(10, 8))

for ax in axes:
    θ_path, ϵ_path = gen_path(greedy_star, F, G)
    ax.plot(ϵ_path, label='ϵ')
    ax.plot(θ_path, label='θ')
    ax.set_ylim(0, 6)

plt.legend()
plt.show()


Exercise 31.4.2
Let’s now consider how long it takes for the worker to settle down to a permanent job, given a starting point of (𝜃, 𝜖) =
(0, 0).
In other words, we want to study the distribution of the random variable

𝑇 ∗ ∶= the first point in time from which the worker's job no longer changes

Evidently, the worker’s job becomes permanent if and only if (𝜃𝑡 , 𝜖𝑡 ) enters the “stay put” region of (𝜃, 𝜖) space.
Letting 𝑆 denote this region, 𝑇 ∗ can be expressed as the first passage time to 𝑆 under the optimal policy:

𝑇 ∗ ∶= inf{𝑡 ≥ 0 | (𝜃𝑡 , 𝜖𝑡 ) ∈ 𝑆}

Collect 25,000 draws of this random variable and compute the median (which should be about 7).
Repeat the exercise with 𝛽 = 0.99 and interpret the change.

Solution to Exercise 31.4.2


The median for the original parameterization can be computed as follows


cw = CareerWorkerProblem()
F = np.cumsum(cw.F_probs)
G = np.cumsum(cw.G_probs)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

@njit
def passage_time(optimal_policy, F, G):
    t = 0
    i = j = 0
    while True:
        if optimal_policy[i, j] == 1:    # Stay put
            return t
        elif optimal_policy[i, j] == 2:  # New job
            j = qe.random.draw(G)
        else:                            # New life
            i, j = qe.random.draw(F), qe.random.draw(G)
        t += 1

@njit(parallel=True)
def median_time(optimal_policy, F, G, M=25000):
    samples = np.empty(M)
    for i in prange(M):
        samples[i] = passage_time(optimal_policy, F, G)
    return np.median(samples)

median_time(greedy_star, F, G)

7.0

To compute the median with 𝛽 = 0.99 instead of the default value 𝛽 = 0.95, replace cw = CareerWorkerProblem()
with cw = CareerWorkerProblem(β=0.99).
The medians are subject to randomness but should be about 7 and 14 respectively.
Not surprisingly, more patient workers will wait longer to settle down to their final job.

Exercise 31.4.3
Set the parameterization to G_a = G_b = 100 and generate a new optimal policy figure – interpret.

Solution to Exercise 31.4.3


Here is one solution

cw = CareerWorkerProblem(G_a=100, G_b=100)
T, get_greedy = operator_factory(cw)
v_star = solve_model(cw, verbose=False)
greedy_star = get_greedy(v_star)

fig, ax = plt.subplots(figsize=(6, 6))
tg, eg = np.meshgrid(cw.θ, cw.ϵ)
lvls = (0.5, 1.5, 2.5, 3.5)
ax.contourf(tg, eg, greedy_star.T, levels=lvls, cmap=cm.winter, alpha=0.5)
ax.contour(tg, eg, greedy_star.T, colors='k', levels=lvls, linewidths=2)
ax.set(xlabel='θ', ylabel='ϵ')
ax.text(1.8, 2.5, 'new life', fontsize=14)
ax.text(4.5, 1.5, 'new job', fontsize=14, rotation='vertical')
ax.text(4.0, 4.5, 'stay put', fontsize=14)
plt.show()

In the new figure, you see that the region for which the worker stays put has grown because the distribution for 𝜖 has
become more concentrated around the mean, making high-paying jobs less realistic.



CHAPTER

THIRTYTWO

JOB SEARCH VI: ON-THE-JOB SEARCH

Contents

• Job Search VI: On-the-Job Search


– Overview
– Model
– Implementation
– Solving for Policies
– Exercises

32.1 Overview

In this section, we solve a simple on-the-job search model


• based on [Ljungqvist and Sargent, 2018], exercise 6.18, and [Jovanovic, 1979]
Let’s start with some imports:

import matplotlib.pyplot as plt


import numpy as np
import scipy.stats as stats
from numba import njit, prange

32.1.1 Model Features

• job-specific human capital accumulation combined with on-the-job search


• infinite-horizon dynamic programming with one state variable and two controls


32.2 Model

Let 𝑥𝑡 denote the time-𝑡 job-specific human capital of a worker employed at a given firm and let 𝑤𝑡 denote current wages.
Let 𝑤𝑡 = 𝑥𝑡 (1 − 𝑠𝑡 − 𝜙𝑡 ), where
• 𝜙𝑡 is investment in job-specific human capital for the current role and
• 𝑠𝑡 is search effort, devoted to obtaining new offers from other firms.
For as long as the worker remains in the current job, evolution of {𝑥𝑡 } is given by 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ).
When search effort at 𝑡 is 𝑠𝑡 , the worker receives a new job offer with probability 𝜋(𝑠𝑡 ) ∈ [0, 1].
The value of the offer, measured in job-specific human capital, is 𝑢𝑡+1 , where {𝑢𝑡 } is IID with common distribution 𝑓.
The worker can reject the current offer and continue with the existing job.
Hence 𝑥𝑡+1 = 𝑢𝑡+1 if he/she accepts and 𝑥𝑡+1 = 𝑔(𝑥𝑡 , 𝜙𝑡 ) otherwise.
Let 𝑏𝑡+1 ∈ {0, 1} be a binary random variable, where 𝑏𝑡+1 = 1 indicates that the worker receives an offer at the end of
time 𝑡.
We can write

𝑥𝑡+1 = (1 − 𝑏𝑡+1 )𝑔(𝑥𝑡 , 𝜙𝑡 ) + 𝑏𝑡+1 max{𝑔(𝑥𝑡 , 𝜙𝑡 ), 𝑢𝑡+1 } (32.1)
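
The law of motion (32.1) can be sketched in a few lines of plain Python. This is an illustration only: it assumes the default parameterization given later in this lecture (𝑔(𝑥, 𝜙) = 𝐴(𝑥𝜙)^𝛼, offer probability √𝑠 and Beta(2, 2) offers), and the function names next_capital and simulate_step are hypothetical.

```python
import random

def g(x, ϕ, A=1.4, α=0.6):
    """Capital next period if the worker stays: g(x, ϕ) = A (x ϕ)^α."""
    return A * (x * ϕ) ** α

def next_capital(x, ϕ, b_next, u_next):
    """Equation (32.1): keep g(x, ϕ) unless an offer arrives and is better."""
    stay = g(x, ϕ)
    return (1 - b_next) * stay + b_next * max(stay, u_next)

def simulate_step(x, s, ϕ, rng):
    """One random transition, assuming π(s) = sqrt(s) and u ~ Beta(2, 2)."""
    b_next = 1 if rng.random() < s ** 0.5 else 0   # offer arrives w.p. π(s)
    u_next = rng.betavariate(2, 2)                 # value of the offer
    return next_capital(x, ϕ, b_next, u_next)

rng = random.Random(1234)
x_next = simulate_step(0.4, s=0.2, ϕ=0.5, rng=rng)
```

Note that an offer can only raise next-period capital relative to 𝑔(𝑥, 𝜙), since the worker is free to reject it.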

Agent’s objective: maximize expected discounted sum of wages via controls {𝑠𝑡 } and {𝜙𝑡 }.
Taking the expectation of 𝑣(𝑥𝑡+1 ) and using (32.1), the Bellman equation for this problem can be written as

$$
v(x) = \max_{s + \phi \leq 1} \left\{ x(1 - s - \phi) + \beta (1 - \pi(s)) v[g(x, \phi)] + \beta \pi(s) \int v[g(x, \phi) \vee u] f(du) \right\} \tag{32.2}
$$

Here nonnegativity of 𝑠 and 𝜙 is understood, while 𝑎 ∨ 𝑏 ∶= max{𝑎, 𝑏}.

32.2.1 Parameterization

In the implementation below, we will focus on the parameterization



$$
g(x, \phi) = A (x \phi)^{\alpha}, \quad \pi(s) = \sqrt{s} \quad \text{and} \quad f = \text{Beta}(2, 2)
$$

with default parameter values


• 𝐴 = 1.4
• 𝛼 = 0.6
• 𝛽 = 0.96
The Beta(2, 2) distribution is supported on (0, 1) - it has a unimodal, symmetric density peaked at 0.5.
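
For reference, the density and mean can be written out explicitly. These are standard properties of the Beta family rather than anything specific to this lecture:

```latex
f(u) = \frac{u(1-u)}{B(2,2)} = 6u(1-u), \quad 0 < u < 1,
\qquad
\mathbb{E}u = \int_0^1 6u^2(1-u)\,du = 6\left(\frac{1}{3} - \frac{1}{4}\right) = \frac{1}{2}
```

Symmetry about 0.5 follows from f(u) = f(1 − u).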

32.2.2 Back-of-the-Envelope Calculations

Before we solve the model, let’s make some quick calculations that provide intuition on what the solution should look like.
To begin, observe that the worker has two instruments to build capital and hence wages:
1. invest in capital specific to the current job via 𝜙
2. search for a new job with better job-specific capital match via 𝑠


Since wages are 𝑥(1 − 𝑠 − 𝜙), marginal cost of investment via either 𝜙 or 𝑠 is identical.
Our risk-neutral worker should focus on whatever instrument has the highest expected return.
The relative expected return will depend on 𝑥.
For example, suppose first that 𝑥 = 0.05
• If 𝑠 = 1 and 𝜙 = 0, then since 𝑔(𝑥, 𝜙) = 0, taking expectations of (32.1) gives expected next period capital equal
to 𝜋(𝑠)𝔼𝑢 = 𝔼𝑢 = 0.5.
• If 𝑠 = 0 and 𝜙 = 1, then next period capital is 𝑔(𝑥, 𝜙) = 𝑔(0.05, 1) ≈ 0.23.
Both rates of return are good, but the return from search is better.
Next, suppose that 𝑥 = 0.4
• If 𝑠 = 1 and 𝜙 = 0, then expected next period capital is again 0.5
• If 𝑠 = 0 and 𝜙 = 1, then 𝑔(𝑥, 𝜙) = 𝑔(0.4, 1) ≈ 0.8
Return from investment via 𝜙 dominates expected return from search.
Combining these observations gives us two informal predictions:
1. At any given state 𝑥, the two controls 𝜙 and 𝑠 will function primarily as substitutes: the worker will focus on whichever
instrument has the higher expected return.
2. For sufficiently small 𝑥, search will be preferable to investment in job-specific human capital. For larger 𝑥, the
reverse will be true.
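
The numbers in these back-of-the-envelope comparisons can be reproduced directly. The sketch below uses the default values of 𝐴 and 𝛼 and the mean 0.5 of the Beta(2, 2) offer distribution. The variable x_cross, the level of 𝑥 at which the two corner strategies give the same expected capital, is an illustrative addition and applies only to this crude all-or-nothing comparison.

```python
A, α = 1.4, 0.6                   # default parameters from above
expected_offer = 0.5              # mean of Beta(2, 2)

def g(x, ϕ):
    """Transition function g(x, ϕ) = A (x ϕ)^α."""
    return A * (x * ϕ) ** α

for x in (0.05, 0.4):
    invest_all = g(x, 1.0)        # s = 0, ϕ = 1
    search_all = expected_offer   # s = 1, ϕ = 0: g(x, 0) = 0 and π(1) = 1
    better = "search" if search_all > invest_all else "investment"
    print(f"x = {x}: invest -> {invest_all:.2f}, "
          f"search -> {search_all:.2f}, prefer {better}")

# Capital level where the two corner strategies tie: A x^α = 0.5
x_cross = (expected_offer / A) ** (1 / α)
```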
Now let’s turn to implementation, and see if we can match our predictions.

32.3 Implementation

We will set up a class JVWorker that holds the parameters of the model described above

class JVWorker:
    r"""
    A Jovanovic-type model of employment with on-the-job search.

    """

    def __init__(self,
                 A=1.4,
                 α=0.6,
                 β=0.96,         # Discount factor
                 π=np.sqrt,      # Search effort function
                 a=2,            # Parameter of f
                 b=2,            # Parameter of f
                 grid_size=50,
                 mc_size=100,
                 ɛ=1e-4):

        self.A, self.α, self.β, self.π = A, α, β, π
        self.mc_size, self.ɛ = mc_size, ɛ

        self.g = njit(lambda x, ϕ: A * (x * ϕ)**α)    # Transition function
        self.f_rvs = np.random.beta(a, b, mc_size)

        # Max of grid is the max of a large quantile value for f and the
        # fixed point y = g(y, 1)
        grid_max = max(A**(1 / (1 - α)), stats.beta(a, b).ppf(1 - ɛ))

        # Human capital
        self.x_grid = np.linspace(ɛ, grid_max, grid_size)
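
The fixed point mentioned in the comment follows from solving 𝑦 = 𝑔(𝑦, 1) = 𝐴𝑦^𝛼 for 𝑦 > 0, which gives 𝑦^(1−𝛼) = 𝐴 and hence 𝑦 = 𝐴^(1/(1−𝛼)). A quick numerical check under the default parameters, with illustrative variable names:

```python
A, α = 1.4, 0.6
y_star = A ** (1 / (1 - α))            # candidate fixed point of y -> A * y**α
residual = A * y_star ** α - y_star    # zero up to floating-point rounding
```

Since the Beta(2, 2) distribution is supported on (0, 1), every quantile of 𝑓 is below 1, so with the default parameters the maximum defining grid_max is attained by the fixed point.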

The function operator_factory takes an instance of this class and returns a jitted version of the Bellman operator
T, i.e.
$$
Tv(x) = \max_{s + \phi \leq 1} w(s, \phi)
$$

where

𝑤(𝑠, 𝜙) ∶= 𝑥(1 − 𝑠 − 𝜙) + 𝛽(1 − 𝜋(𝑠))𝑣[𝑔(𝑥, 𝜙)] + 𝛽𝜋(𝑠) ∫ 𝑣[𝑔(𝑥, 𝜙) ∨ 𝑢]𝑓(𝑑𝑢) (32.3)

When we represent 𝑣, it will be with a NumPy array v giving values on grid x_grid.
But to evaluate the right-hand side of (32.3), we need a function, so we replace the arrays v and x_grid with a function
v_func that gives linear interpolation of v on x_grid.
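
As a small illustration of this step, the sketch below builds a piecewise-linear interpolant with np.interp. The grid and the stored values are hypothetical and chosen only so that the results are easy to check by hand.

```python
import numpy as np

x_grid = np.linspace(0.0, 2.0, 5)      # illustrative grid: 0, 0.5, 1, 1.5, 2
v = x_grid ** 2                        # hypothetical values stored on the grid

def v_func(x):
    """Piecewise-linear interpolant of v on x_grid."""
    return np.interp(x, x_grid, v)

on_grid = v_func(1.0)                  # reproduces the stored value at a grid point
between = v_func(0.75)                 # midpoint of the values at 0.5 and 1.0
beyond = v_func(3.0)                   # outside the grid: clamped to the last value
```

The clamping behavior matters here because 𝑔(𝑥, 𝜙) can fall outside the grid, for example below its lower end when 𝜙 is very small, in which case np.interp returns the value at the nearest endpoint rather than extrapolating.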
Inside the for loop, for each x in the grid over the state space, we set up the function 𝑤(𝑧) = 𝑤(𝑠, 𝜙) defined in (32.3).
The function is maximized over all feasible (𝑠, 𝜙) pairs.
Another function, get_greedy, returns the optimal choice of 𝑠 and 𝜙 at each 𝑥, given a value function.
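
The integral in (32.3) is approximated in the code by an average over draws from 𝑓. The following sketch shows the idea on a case where the exact answer is known. It assumes a hypothetical test function 𝑣(𝑥) = 𝑥 and a fixed value c standing in for 𝑔(𝑥, 𝜙). Neither is part of the model solution.

```python
import numpy as np

rng = np.random.default_rng(0)
draws = rng.beta(2, 2, size=100_000)       # u ~ Beta(2, 2)

def v(x):
    """Hypothetical test function v(x) = x, chosen so the integral is known."""
    return x

c = 0.5                                    # stands in for g(x, ϕ)
mc_estimate = np.mean(v(np.maximum(c, draws)))

# Exact value of E[max(c, u)] when u has density 6u(1 - u) on (0, 1):
# c * F(c) plus the integral of u f(u) over (c, 1)
cdf_c = 3 * c**2 - 2 * c**3
exact = c * cdf_c + (0.5 - 2 * c**3 + 1.5 * c**4)
```

The class JVWorker draws only mc_size = 100 values by default and stores them once in f_rvs, so its estimate is noisier than this one, but reusing the same draws means the operator T is a deterministic function of v.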

def operator_factory(jv, parallel_flag=True):
    """
    Returns jitted versions of the Bellman operator T and the
    greedy policy function get_greedy

    jv is an instance of JVWorker

    """

    π, β = jv.π, jv.β
    x_grid, ɛ, mc_size = jv.x_grid, jv.ɛ, jv.mc_size
    f_rvs, g = jv.f_rvs, jv.g

    @njit
    def state_action_values(z, x, v):
        s, ϕ = z
        v_func = lambda x: np.interp(x, x_grid, v)

        integral = 0
        for m in range(mc_size):
            u = f_rvs[m]
            integral += v_func(max(g(x, ϕ), u))
        integral = integral / mc_size

        q = π(s) * integral + (1 - π(s)) * v_func(g(x, ϕ))
        return x * (1 - ϕ - s) + β * q

    @njit(parallel=parallel_flag)
    def T(v):
        """
        The Bellman operator
        """

        v_new = np.empty_like(v)
        for i in prange(len(x_grid)):
            x = x_grid[i]

            # Search on a grid
            search_grid = np.linspace(ɛ, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
            v_new[i] = max_val

        return v_new

    @njit
    def get_greedy(v):
        """
        Computes the v-greedy policy of a given function v
        """
        s_policy, ϕ_policy = np.empty_like(v), np.empty_like(v)

        for i in range(len(x_grid)):
            x = x_grid[i]
            # Search on a grid
            search_grid = np.linspace(ɛ, 1, 15)
            max_val = -1
            for s in search_grid:
                for ϕ in search_grid:
                    current_val = state_action_values((s, ϕ), x, v) if s + ϕ <= 1 else -1
                    if current_val > max_val:
                        max_val = current_val
                        max_s, max_ϕ = s, ϕ
            s_policy[i], ϕ_policy[i] = max_s, max_ϕ
        return s_policy, ϕ_policy

    return T, get_greedy

To solve the model, we will write a function that uses the Bellman operator and iterates to find a fixed point.

def solve_model(jv,
                use_parallel=True,
                tol=1e-4,
                max_iter=1000,
                verbose=True,
                print_skip=25):

    """
    Solves the model by value function iteration

    * jv is an instance of JVWorker

    """

    T, _ = operator_factory(jv, parallel_flag=use_parallel)

    # Set up loop
    v = jv.x_grid * 0.5  # Initial condition
    i = 0
    error = tol + 1

    while i < max_iter and error > tol:
        v_new = T(v)
        error = np.max(np.abs(v - v_new))
        i += 1
        if verbose and i % print_skip == 0:
            print(f"Error at iteration {i} is {error}.")
        v = v_new

    if error > tol:
        print("Failed to converge!")
    elif verbose:
        print(f"\nConverged in {i} iterations.")

    return v_new

32.4 Solving for Policies

Let’s generate the optimal policies and see what they look like.

jv = JVWorker()
T, get_greedy = operator_factory(jv)
v_star = solve_model(jv)
s_star, ϕ_star = get_greedy(v_star)

Error at iteration 25 is 0.1511111530757523.

Error at iteration 50 is 0.05445996342335668.

Error at iteration 75 is 0.019627192017987127.

Error at iteration 100 is 0.007073575564430001.

Error at iteration 125 is 0.00254929340986898.

Error at iteration 150 is 0.0009187569752260316.


Error at iteration 175 is 0.0003311169974651307.

Error at iteration 200 is 0.0001193334787821243.

Converged in 205 iterations.

Here are the plots:

plots = [s_star, ϕ_star, v_star]
titles = ["s policy", "ϕ policy", "value function"]

fig, axes = plt.subplots(3, 1, figsize=(12, 12))

for ax, plot, title in zip(axes, plots, titles):
    ax.plot(jv.x_grid, plot)