
Lectures in Quantitative Economics

with Python

Thomas J. Sargent and John Stachurski

July 1, 2019

https://lectures.quantecon.org/py/
Contents

I Introduction to Python

1 About Python

2 Setting up Your Python Environment

3 An Introductory Example

4 Python Essentials

5 OOP I: Introduction to Object Oriented Programming

II The Scientific Libraries

6 NumPy

7 Matplotlib

8 SciPy

9 Numba

10 Other Scientific Libraries

III Advanced Python Programming

11 Writing Good Code

12 OOP II: Building Classes

13 OOP III: Samuelson Multiplier Accelerator

14 More Language Features

15 Debugging

IV Data and Empirics

16 Pandas

17 Pandas for Panel Data

18 Linear Regression in Python

19 Maximum Likelihood Estimation

V Tools and Techniques

20 Geometric Series for Elementary Economics

21 Linear Algebra

22 Complex Numbers and Trigonometry

23 Orthogonal Projections and Their Applications

24 LLN and CLT

25 Linear State Space Models

26 Finite Markov Chains

27 Continuous State Markov Chains

28 Cass-Koopmans Optimal Growth Model

29 A First Look at the Kalman Filter

30 Reverse Engineering a la Muth

VI Dynamic Programming

31 Shortest Paths

32 Job Search I: The McCall Search Model

33 Job Search II: Search and Separation

34 A Problem that Stumped Milton Friedman

35 Job Search III: Search with Learning

36 Job Search IV: Modeling Career Choice

37 Job Search V: On-the-Job Search

38 Optimal Growth I: The Stochastic Optimal Growth Model

39 Optimal Growth II: Time Iteration

40 Optimal Growth III: The Endogenous Grid Method

41 LQ Dynamic Programming Problems

42 Optimal Savings I: The Permanent Income Model

43 Optimal Savings II: LQ Techniques

44 Consumption and Tax Smoothing with Complete and Incomplete Markets

45 Optimal Savings III: Occasionally Binding Constraints

46 Robustness

47 Discrete State Dynamic Programming

VII Multiple Agent Models

48 Schelling's Segregation Model

49 A Lake Model of Employment and Unemployment

50 Rational Expectations Equilibrium

51 Markov Perfect Equilibrium

52 Robust Markov Perfect Equilibrium

53 Uncertainty Traps

54 The Aiyagari Model

55 Default Risk and Income Fluctuations

56 Globalization and Cycles

57 Coase's Theory of the Firm

VIII Recursive Models of Dynamic Linear Economies

58 Recursive Models of Dynamic Linear Economies

59 Growth in Dynamic Linear Economies

60 Lucas Asset Pricing Using DLE

61 IRFs in Hall Models

62 Permanent Income Model using the DLE Class

63 Rosen Schooling Model

64 Cattle Cycles

65 Shock Non Invertibility

IX Classic Linear Models

66 Von Neumann Growth Model (and a Generalization)

X Time Series Models

67 Covariance Stationary Processes

68 Estimation of Spectra

69 Additive and Multiplicative Functionals

70 Classical Control with Linear Algebra

71 Classical Prediction and Filtering With Linear Algebra

XI Asset Pricing and Finance

72 Asset Pricing I: Finite State Models

73 Asset Pricing II: The Lucas Asset Pricing Model

74 Asset Pricing III: Incomplete Markets

75 Two Modifications of Mean-variance Portfolio Theory

XII Dynamic Programming Squared

76 Stackelberg Plans

77 Ramsey Plans, Time Inconsistency, Sustainable Plans

78 Optimal Taxation in an LQ Economy

79 Optimal Taxation with State-Contingent Debt

80 Optimal Taxation without State-Contingent Debt

81 Fluctuating Interest Rates Deliver Fiscal Insurance

82 Fiscal Risk and Government Debt

83 Competitive Equilibria of Chang Model

84 Credible Government Policies in Chang Model

Part I

Introduction to Python

1 About Python

1.1 Contents

• Overview 1.2

• What's Python? 1.3

• Scientific Programming 1.4

• Learn More 1.5

1.2 Overview

In this lecture we will

• Outline what Python is
• Showcase some of its abilities
• Compare it to some other languages

At this stage, it's not our intention that you try to replicate all you see
We will work through what follows at a slow pace later in the lecture series
Our only objective for this lecture is to give you some feel of what Python is, and what it can
do

1.3 What's Python?

Python is a general-purpose programming language conceived in 1989 by Dutch programmer Guido van Rossum
Python is free and open source, with development coordinated through the Python Software
Foundation
Python has experienced rapid adoption in the last decade and is now one of the most popular
programming languages


1.3.1 Common Uses

Python is a general-purpose language used in almost all application domains

• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.

Used extensively by Internet service and high tech companies such as

• Google
• Dropbox
• Reddit
• YouTube
• Walt Disney Animation, etc., etc.

Often used to teach computer science and programming


For reasons we will discuss, Python is particularly popular within the scientific community

• academia, NASA, CERN, Wall St., etc., etc.

1.3.2 Relative Popularity

The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python

The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science

For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here
(The corresponding time path for MATLAB is shown for comparison)

Note that pandas takes off in 2012, which is the same year that we see Python's popularity begin to spike in the first figure
Overall, it's clear that

• Python is one of the most popular programming languages worldwide
• Python is a major tool for scientific computing, accounting for a rapidly rising share of scientific work around the globe

1.3.3 Features

Python is a high-level language suitable for rapid development


It has a relatively small core language supported by many libraries
Other features:

• A multiparadigm language, in that multiple programming styles are supported (procedural, object-oriented, functional, etc.)
• Interpreted rather than compiled

1.3.4 Syntax and Design

One nice feature of Python is its elegant syntax — we'll see many examples later on
Elegant code might sound superfluous but in fact it's highly beneficial because it makes the
syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don't need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is an elegant design

Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
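As a small taste of that expressiveness, here is a sketch of ours (not from the lecture) comparing a list comprehension with its lazy generator counterpart:

```python
# A list comprehension builds the whole list in one readable line
squares = [n ** 2 for n in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# The generator-expression version computes values lazily, one at a time,
# so no intermediate list is stored when we only need the sum
total = sum(n ** 2 for n in range(5))
print(total)  # 30
```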

1.4 Scientific Programming

Python has become one of the core languages of scientific computing


It's either the dominant player or a major player in

• Machine learning and data science
• Astronomy
• Artificial intelligence
• Chemistry
• Computational biology
• Meteorology
• etc., etc.

Its popularity in economics is also beginning to rise


This section briefly showcases some examples of Python for scientific programming

• All of these topics will be covered in detail later on

1.4.1 Numerical Programming

Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library
NumPy provides the basic array data type plus some simple processing operations
For example, let's build some arrays

In [1]: import numpy as np # Load the library

a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -ฯ€ to ฯ€


b = np.cos(a) # Apply cosine to each element of a
c = np.sin(a) # Apply sin to each element of a

Now let's take the inner product:

In [2]: b @ c

Out[2]: 1.5265566588595902e-16

The number you see here might vary slightly but it's essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
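If you want to check the equivalence yourself, here is a small sketch of ours (not the lecture's code) comparing the two calls and confirming the inner product is numerically zero:

```python
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)   # Even grid from -π to π
b, c = np.cos(a), np.sin(a)

# The @ operator and np.dot compute the same inner product
assert np.isclose(b @ c, np.dot(b, c))

# cos and sin are orthogonal over a symmetric grid, up to rounding error
assert abs(b @ c) < 1e-12
```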
The SciPy library is built on top of NumPy and provides additional functionality
For example, let's calculate $\int_{-2}^{2} \phi(z) \, dz$ where $\phi$ is the standard normal density

In [3]: from scipy.stats import norm


from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2) # Integrate using Gaussian quadrature
value

Out[3]: 0.9544997361036417

SciPy includes many of the standard routines used in

• linear algebra
• integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.
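As one more illustration (our own sketch, not part of the lecture), here is SciPy's brentq routine finding the root of a simple function:

```python
import numpy as np
from scipy.optimize import brentq

# Find the root of f(x) = x**2 - 2 on the bracketing interval [0, 2]
f = lambda x: x ** 2 - 2
root = brentq(f, 0, 2)
print(root)  # approximately 1.41421, i.e. sqrt(2)
```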

1.4.2 Graphics

The most popular and comprehensive Python library for creating figures and graphs is Matplotlib

• Plots, histograms, contour images, 3D, bar charts, etc., etc.
• Output in many formats (PDF, PNG, EPS, etc.)
• LaTeX integration

Example 2D plot with embedded LaTeX annotations

Example contour plot



Example 3D plot

More examples can be found in the Matplotlib thumbnail gallery


Other graphics libraries include

• Plotly
• Bokeh
• VPython — 3D graphics and animations

1.4.3 Symbolic Algebra

It's useful to be able to manipulate symbolic expressions, as in Mathematica or Maple


The SymPy library provides this functionality from within the Python shell

In [4]: from sympy import Symbol

x, y = Symbol('x'), Symbol('y') # Treat 'x' and 'y' as algebraic symbols


x + x + x + y

Out[4]: 3*x + y

We can manipulate expressions

In [5]: expression = (x + y)**2


expression.expand()

Out[5]: x**2 + 2*x*y + y**2

solve polynomials

In [6]: from sympy import solve

solve(x**2 + x + 2)

Out[6]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]

and calculate limits, derivatives and integrals

In [7]: from sympy import limit, sin, diff

limit(1 / x, x, 0)

Out[7]: oo

In [8]: limit(sin(x) / x, x, 0)

Out[8]: 1

In [9]: diff(sin(x), x)

Out[9]: cos(x)

The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
Can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
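For instance, the following sketch (ours, with an arbitrary expression) integrates symbolically and then generates LaTeX output with sympy.latex:

```python
from sympy import Symbol, sin, integrate, latex

x = Symbol('x')

# Symbolic integration reverses the differentiation shown above
antiderivative = integrate(sin(x), x)
print(antiderivative)       # -cos(x)

# latex() renders any expression as a LaTeX string for papers or figures
print(latex((x + 1) ** 2))
```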

1.4.4 Statistics

Python's data manipulation and statistics libraries have improved rapidly over the last few
years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here's a simple example, using some fake data

In [10]: import pandas as pd


np.random.seed(1234)

data = np.random.randn(5, 2) # 5x2 matrix of N(0, 1) random draws


dates = pd.date_range('28/12/2010', periods=5)

df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)


print(df)

price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685

In [11]: df.mean()

Out[11]: price 0.411768


weight -0.699135
dtype: float64

Other Useful Statistics Libraries


- statsmodels — various statistical routines
- scikit-learn — machine learning in Python (sponsored by Google, among others)
- pyMC — for Bayesian data analysis
- pystan — Bayesian analysis based on stan

1.4.5 Networks and Graphs

Python has many libraries for studying graphs


One well-known example is NetworkX

• Standard graph algorithms for analyzing network structure, etc.
• Plotting routines
• etc., etc.

Here's some example code that generates and plots a random graph, with node color determined by shortest path length from a central node

In [12]: import networkx as nx


import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(1234)

# Generate a random graph


p = dict((i,(np.random.uniform(0, 1),np.random.uniform(0, 1))) for i in range(200))
G = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(G, 'pos')

# find node nearest the center point (0.5, 0.5)


dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)

# Plot graph, coloring by path length from central node


p = nx.single_source_shortest_path_length(G, ncenter)
plt.figure()
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_nodes(G,
pos,
nodelist=list(p.keys()),
node_size=120, alpha=0.5,
node_color=list(p.values()),
cmap=plt.cm.jet_r)
plt.show()


1.4.6 Cloud Computing

Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise

See also
- Amazon Elastic Compute Cloud
- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud

1.4.7 Parallel Processing

Apart from the cloud computing options listed above, you might like to consider
- Parallel computing through IPython clusters
- The Starcluster interface to Amazon's EC2
- GPU programming through PyCuda, PyOpenCL, Theano or similar

1.4.8 Other Developments

There are many other interesting developments with scientific programming in Python
Some representative examples include
- Jupyter — Python in your browser with code cells, embedded images, etc.
- Numba — Make Python run at the same speed as native machine code!
- Blaze — a generalization of NumPy
- PyTables — manage large data sets
- CVXPY — convex optimization in Python

1.5 Learn More

• Browse some Python projects on GitHub


• Have a look at some of the Jupyter notebooks people have shared on various scientific topics

- Visit the Python Package Index


- View some of the questions people are asking about Python on Stackoverflow
- Keep up to date on what's happening in the Python community with the Python subreddit
2 Setting up Your Python Environment

2.1 Contents

• Overview 2.2

• Anaconda 2.3

• Jupyter Notebooks 2.4

• Installing Libraries 2.5

• Working with Files 2.6

• Editors and IDEs 2.7

• Exercises 2.8

2.2 Overview

In this lecture, you will learn how to

1. get a Python environment up and running with all the necessary tools
2. execute simple Python commands
3. run a sample program
4. install the code libraries that underpin these lectures

2.3 Anaconda

The core Python package is easy to install but not what you should choose for these lectures
These lectures require the entire scientific programming ecosystem, which

• the core installation doesn't provide
• is painful to install one piece at a time


Hence the best approach for our purposes is to install a free Python distribution that contains

1. the core Python language and


2. the most popular scientific libraries

The best such distribution is Anaconda


Anaconda is

• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name

Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!

2.3.1 Installing Anaconda

Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:

• Install the latest version
• If you are asked during the installation process whether you'd like to make Anaconda your default Python installation, say yes
• Otherwise, you can accept all of the defaults

2.3.2 Updating Anaconda

Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Anaconda distribution
As a practice run, please execute the following

1. Open up a terminal
2. Type conda update anaconda

For more information on conda, type conda help in a terminal

2.4 Jupyter Notebooks

Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries
They use a browser-based interface to Python with

• The ability to write and execute Python commands
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions

Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem
Here's an image showing execution of some code (borrowed from here) in a Jupyter notebook

You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here
While Jupyter isn't the only way to code in Python, it's great for when you wish to

• start coding in Python
• test new ideas or interact with small pieces of code
• share or collaborate on scientific ideas with students or colleagues

These lectures are designed for executing in Jupyter notebooks



2.4.1 Starting the Jupyter Notebook

Once you have installed Anaconda, you can start the Jupyter notebook
Either

• search for Jupyter in your applications menu, or

• open up a terminal and type jupyter notebook

  - Windows users should substitute "Anaconda command prompt" for "terminal" in the previous line

If you use the second option, you will see something like this (click to enlarge)

The output tells us the notebook is running at http://localhost:8888/

• localhost is the name of the local machine
• 8888 refers to port number 8888 on your computer

Thus, the Jupyter kernel is listening for Python commands on port 8888 of our local machine
Hopefully, your default browser has also opened up with a web page that looks something like
this (click to enlarge)

What you see here is called the Jupyter dashboard


If you look at the URL at the top, it should be localhost:8888 or similar, matching the
message above
Assuming all this has worked OK, you can now click on New at the top right and select
Python 3 or similar
Here's what shows up on our machine:

The notebook displays an active cell, into which you can type Python commands

2.4.2 Notebook Basics

Let's start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you're ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter

(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are

1. Edit mode

• Indicated by a green border around one cell
• Whatever you type appears as is in that cell

2. Command mode

• The green border is replaced by a grey border
• Key strokes are interpreted as commands — for example, typing b adds a new cell below the current one

To switch to

• command mode from edit mode, hit the Esc key or Ctrl-M

• edit mode from command mode, hit Enter or click in a cell

The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it
User Interface Tour
At this stage, we recommend you take your time to

• look at the various options in the menus and see what they do
• take the "user interface tour", which can be accessed through the help menu

Inserting Unicode (e.g., Greek Letters)


Python 3 introduced support for unicode characters, allowing the use of characters such as α and β in your code
Unicode characters can be typed quickly in Jupyter using the tab key
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard
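Once entered, such characters are ordinary identifiers, so code like the following sketch (the parameter values are made up) runs as-is:

```python
# Greek letters are valid variable names in Python 3
# (typed in Jupyter as \alpha then Tab, \beta then Tab)
α, β = 0.5, 2.0
print(α * β)  # 1.0
```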
A Test Program
Let's run a test program
Here's an arbitrary program we can use: http://matplotlib.org/1.4.1/examples/pie_and_polar_charts/polar_bar_demo.html
On that page, you'll see the following code

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

N = 20
ฮธ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)

ax = plt.subplot(111, polar=True)
bars = ax.bar(ฮธ, radii, width=width, bottom=0.0)

# Use custom colors and opacity


for r, bar in zip(radii, bars):
bar.set_facecolor(plt.cm.jet(r / 10.))
bar.set_alpha(0.5)

plt.show()

Don't worry about the details for now — let's just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
(In older versions of Jupyter you might need to add the command %matplotlib inline before you generate the figure)

2.4.3 Working with the Notebook

Here are a few more tips on working with Jupyter notebooks


Tab Completion
In the previous program, we executed the line import numpy as np

• NumPy is a numerical library we'll work with in depth

After this import command, functions in NumPy can be accessed with np.<function_name> type syntax

• For example, try np.random.randn(3)

We can explore these attributes of np using the Tab key


For example, here we type np.ran and hit Tab (click to enlarge)

Jupyter offers up the two possible completions, random and rank


In this way, the Tab key helps remind you of what's available and also saves you typing
On-Line Help
To get help on np.rank, say, we can execute np.rank?
Documentation appears in a split window of the browser, like so

Clicking on the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code

Next we Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we Shift+Enter to produce this

2.4.4 Sharing Notebooks

Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files — or by using web services such as nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above

2.4.5 QuantEcon Notes

QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon Notes
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the community

2.5 Installing Libraries

Most of the libraries we need come in Anaconda


Other libraries can be installed with pip
One library we'll be using is QuantEcon.py
You can install QuantEcon.py by starting Jupyter and typing

!pip install quantecon

into a cell
Alternatively, you can type the following into a terminal

pip install quantecon

More instructions can be found on the library page


To upgrade to the latest version, which you should do regularly, use

pip install --upgrade quantecon

Another library we will be using is interpolation.py


This can be installed by typing in Jupyter

!pip install interpolation

2.6 Working with Files

How does one run a locally saved Python file?


There are a number of ways to do this but let's focus on methods using Jupyter notebooks

2.6.1 Option 1: Copy and Paste

The steps are:

1. Navigate to your file with your mouse/trackpad using a file browser


2. Click on your file to open it with a text editor
3. Copy and paste into a cell and Shift-Enter

2.6.2 Option 2: Run

Using the run command is often easier than copy and paste

โ€ข For example, %run test.py will run the file test.py



(You might find that the % is unnecessary — use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn't in that directory, you will get an error
Let's look at a successful example, where we run a file test.py with contents:

In [2]: for i in range(5):


print('foobar')

foobar
foobar
foobar
foobar
foobar

Here's the notebook (click to enlarge)

Here

• pwd asks Jupyter to show the PWD (or %pwd — see the comment about automagic above)

  - This is where Jupyter is going to look for files to run
  - Your output will look a bit different depending on your OS

• ls asks Jupyter to list files in the PWD (or %ls)



  - Note that test.py is there (on our computer, because we saved it there earlier)

• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on Windows)

• run test.py runs the file and prints any output

2.6.3 But File X isn't in my PWD!

If you're trying to run a file not in the present working directory, you'll get an error
To fix this error you need to either

1. Shift the file into the PWD, or


2. Change the PWD to where the file lives

One way to achieve the first option is to use the Upload button

• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture

The second option can be achieved using the cd command

• On Windows it might look like this cd C:/Python27/Scripts/dir
• On Linux / OSX it might look like this cd /home/user/scripts/dir

Note: You can type the first letter or two of each directory name and then use the tab key to
expand

2.6.4 Loading Files

It's often convenient to be able to see your code before you run it

In the following example, we execute load white_noise_plot.py where white_noise_plot.py is in the PWD
(Use %load if automagic is off)
Now the code from the file appears in a cell ready to execute
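As a concrete example, a file like white_noise_plot.py might contain something along the following lines (this sketch is ours; the lecture's actual file may differ):

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot 100 draws from a standard normal distribution
x = np.random.randn(100)
plt.plot(x)
plt.title('White noise')
plt.show()
```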

2.6.5 Saving Files

To save the contents of a cell as file foo.py

• put %%file foo.py as the first line of the cell
• Shift+Enter

Here %%file is an example of a cell magic

2.7 Editors and IDEs

The preceding discussion covers most of what you need to know to interact with this website
However, as you start to write longer programs, you might want to experiment with your
workflow
There are many different options and we mention them only in passing

2.7.1 JupyterLab

JupyterLab is an integrated development environment centered around Jupyter notebooks


It is available through Anaconda and will soon be made the default environment for Jupyter
notebooks
Reading the docs or searching for a recent YouTube video will give you more information

2.7.2 Text Editors

A text editor is an application that is specifically designed to work with text files — such as Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide

โ€ข efficient text editing commands (e.g., copy, paste, search and replace)
โ€ข syntax highlighting, etc.

Among the most popular are Sublime Text and Atom


For a top quality open source text editor with a steeper learning curve, try Emacs
If you want an outstanding free text editor and don't mind a seemingly vertical learning curve plus long days of pain and suffering while all your neural pathways are rewired, try Vim

2.7.3 Text Editors Plus IPython Shell

A text editor is for writing programs


To run them you can continue to use Jupyter as described above
Another option is to use the excellent IPython shell
To use an IPython shell, open up a terminal and type ipython
You should see something like this

The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow keys
The up arrow key brings previously typed commands to the prompt
This saves a lot of typing...
Here's one set up, on a Linux box, with
Hereโ€™s one set up, on a Linux box, with

• a file being edited in Vim
• An IPython shell next to it, to run the file

2.7.4 IDEs

IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code — please tell us about your experiences on the forum

2.8 Exercises

2.8.1 Exercise 1

If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the kernel without launching the browser
Note also the startup message: It should give you a URL such as
http://localhost:8888 where the notebook is running
Now

1. Start your browser — or open a new tab if it's already running


2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the
top

You should now be able to run a standard Jupyter notebook session


This is an alternative way to start the notebook that can also be handy

2.8.2 Exercise 2

This exercise will familiarize you with git and GitHub


Git is a version control system — a piece of software used to manage digital projects such as code libraries
In many cases, the associated collections of files — called repositories — are stored on GitHub
GitHub is a wonderland of collaborative coding projects
For example, it hosts many of the scientific libraries we'll be using later on, such as this one
Git is the underlying software used to manage these projects
Git is an extremely powerful tool for distributed collaboration โ€” for example, we use it to
share and synchronize all the source files for these lectures
There are two main flavors of Git

1. the plain vanilla command line Git version


2. the various point-and-click GUI versions

โ€ข See, for example, the GitHub version

As an exercise, try

1. Installing Git
2. Getting a copy of QuantEcon.py using Git

For example, if you've installed the command line version, open up a terminal and enter

git clone https://github.com/QuantEcon/QuantEcon.py

(This is just git clone in front of the URL for the repository)
Even better,

1. Sign up to GitHub
2. Look into 'forking' GitHub repositories (forking means making your own copy of a GitHub repository, stored on GitHub)
3. Fork QuantEcon.py
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo
5. If you made a valuable improvement, send us a pull request!

For reading on these and other topics, try

โ€ข The official Git documentation


โ€ข Reading through the docs on GitHub
โ€ข Pro Git Book by Scott Chacon and Ben Straub
โ€ข One of the thousands of Git tutorials on the Net
3

An Introductory Example

3.1 Contents

• Overview 3.2
• The Task: Plotting a White Noise Process 3.3
• Version 1 3.4
• Alternative Versions 3.5
• Exercises 3.6
• Solutions 3.7

We're now ready to start learning the Python language itself


The level of this and the next few lectures will suit those with some basic knowledge of programming
But don't give up if you have none — you are not excluded
You just need to cover a few of the fundamentals of programming before returning here
Good references for first time programmers include:

• The first 5 or 6 chapters of How to Think Like a Computer Scientist
• Automate the Boring Stuff with Python
• The start of Dive into Python 3

Note: These references offer help on installing Python but you should probably stick with the
method on our set up page
You'll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course

3.2 Overview

In this lecture, we will write and then pick apart small Python programs


The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures

3.2.1 Prerequisites

The lecture on getting started with Python

3.3 The Task: Plotting a White Noise Process

Suppose we want to simulate and plot the white noise process ϵ_0, ϵ_1, …, ϵ_T, where each draw ϵ_t is independent standard normal
In other words, we want to generate figures that look something like this:

Weโ€™ll do this in several different ways

3.4 Version 1

Here are a few lines of code that perform the task we set

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

x = np.random.randn(100)
plt.plot(x)
plt.show()

Let's break this program down and see how it works

3.4.1 Import Statements

The first two lines of the program import functionality


The first line imports NumPy, a favorite Python package for tasks like

โ€ข working with arrays (vectors and matrices)


โ€ข common mathematical functions like cos and sqrt
โ€ข generating random numbers
โ€ข linear algebra, etc.

After import numpy as np we have access to these attributes via the syntax np.
Here's another example

In [2]: import numpy as np

np.sqrt(4)

Out[2]: 2.0

We could also just write

In [3]: import numpy

numpy.sqrt(4)

Out[3]: 2.0

But the former method is convenient and more standard


Why all the Imports?
Remember that Python is a general-purpose language
The core language is quite small so it's easy to learn and maintain
When you want to do something interesting with Python, you almost always need to import
additional functionality
Scientific work in Python is no exception
Most of our programs start off with lines similar to the import statements seen above
Packages
As stated above, NumPy is a Python package
Packages are used by developers to organize a code library
In fact, a package is just a directory containing

1. files with Python code — called modules in Python speak


2. possibly some compiled code that can be accessed by Python (e.g., functions compiled
from C or FORTRAN code)
3. a file called __init__.py that specifies what will be executed when we type import
package_name

In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around
On this machine, it's located in

anaconda3/lib/python3.6/site-packages/numpy

Subpackages
Consider the line x = np.random.randn(100)
Here np refers to the package NumPy, while random is a subpackage of NumPy
You can see the contents here
Subpackages are just packages that are subdirectories of another package
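To see this concretely, a subpackage can also be imported on its own (a small illustrative sketch of our own, not part of the lecture's code):

```python
# Import the random subpackage of NumPy directly
from numpy import random

x = random.randn()      # Same function as np.random.randn()
print(type(x))
```

Either spelling reaches the same function; which one you use is a matter of readability.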

3.4.2 Importing Names Directly

Recall this code that we saw above

In [4]: import numpy as np

np.sqrt(4)

Out[4]: 2.0

Here's another way to access NumPy's square root function



In [5]: from numpy import sqrt

sqrt(4)

Out[5]: 2.0

This is also fine


The advantage is less typing if we use sqrt often in our code
The disadvantage is that, in a long program, these two lines might be separated by many
other lines
Then it's harder for readers to know where sqrt came from, should they wish to

3.5 Alternative Versions

Let's try writing some alternative versions of our first program


Our aim in doing this is to illustrate some more Python syntax and semantics
The programs below are less efficient but

โ€ข help us understand basic constructs like loops


โ€ข illustrate common data types like lists

3.5.1 A Version with a For Loop

Here's a version that illustrates loops and Python lists

In [6]: ts_length = 100
ϵ_values = []   # Empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()

In brief,

• The first pair of lines import functionality as before
• The next line sets the desired length of the time series
• The next line creates an empty list called ϵ_values that will store the ϵ_t values as we generate them
• The next three lines are the for loop, which repeatedly draws a new random number ϵ_t and appends it to the end of the list ϵ_values
• The last two lines generate the plot and display it to the user

Let's study some parts of this program in more detail

3.5.2 Lists

Consider the statement ϵ_values = [], which creates an empty list


Lists are a native Python data structure used to group a collection of objects
For example, try

In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)

Out[7]: list

The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)

In [8]: x

Out[8]: [10, 'foo', False]

In [9]: x.append(2.5)
x

Out[9]: [10, 'foo', False, 2.5]

Here append() is what's called a method, which is a function "attached to" an object — in this case, the list x
We'll learn all about methods later on, but just to give you some idea,

โ€ข Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object
โ€ข String objects have string methods, list objects have list methods, etc.

Another useful list method is pop()

In [10]: x

Out[10]: [10, 'foo', False, 2.5]

In [11]: x.pop()

Out[11]: 2.5

In [12]: x

Out[12]: [10, 'foo', False]

The full set of list methods can be found here


Following C, C++, Java, etc., lists in Python are zero-based

In [13]: x

Out[13]: [10, 'foo', False]

In [14]: x[0]

Out[14]: 10

In [15]: x[1]

Out[15]: 'foo'

3.5.3 The For Loop

Now let's consider the for loop from the program above, which was

In [16]: for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the "block" of code that we are looping over
Unlike most other languages, Python knows the extent of the code block only from indentation
In our program, indentation decreases after the line ϵ_values.append(e), telling Python that this line marks the lower limit of the code block
More on indentation below — for now, let's look at another example of a for loop

In [17]: animals = ['dog', 'cat', 'bird']


for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")

The plural of dog is dogs


The plural of cat is cats
The plural of bird is birds

This example helps to clarify how the for loop works: When we execute a loop of the form

for variable_name in sequence:
    <code block>

The Python interpreter performs the following:

• For each element of the sequence, it "binds" the name variable_name to that element and then executes the code block

The sequence object can in fact be a very general object, as we'll see soon enough
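As a small illustration of our own, strings and tuples are sequences too, so they can appear to the right of in:

```python
# Strings are sequences: the loop steps through each character
for letter in "abc":
    print(letter)

# Tuples are sequences too
total = 0
for value in (10, 20, 30):
    total += value
print(total)
```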

3.5.4 Code Blocks and Indentation

In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a good thing: It

โ€ข forces clean, consistent indentation, improving readability


โ€ข removes clutter, such as the brackets or end statements used in other languages

On the other hand, it takes a bit of care to get right, so please remember:

โ€ข The line before the start of a code block always ends in a colon

– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.

โ€ข All lines in a code block must have the same amount of indentation

• The Python standard is 4 spaces, and that's what you should use

Tabs vs Spaces
One small "gotcha" here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs — try searching online

3.5.5 While Loops

The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let's modify the program above to use a while loop instead

In [18]: ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1
plt.plot(ϵ_values)
plt.show()

Note that

• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1

3.5.6 User-Defined Functions

Now let's go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:

1. A user-defined function that generates a list of random variables

2. The main part of the program that

   1. calls this function to get data
   2. plots the data

This is accomplished in the next program

In [19]: def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100)
plt.plot(data)
plt.show()

Let's go over this carefully, in case you're not familiar with functions and how they work
We have defined a function called generate_data() as follows

• def is a Python keyword used to start function definitions
• def generate_data(n): indicates that the function is called generate_data and that it has a single argument n
• The indented code is a code block called the function body — in this case, it creates an IID list of random draws using the same logic as before
• The return keyword indicates that ϵ_values is the object that should be returned to the calling code

This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100
The net result is that the name data is bound to the list ϵ_values returned by the function

3.5.7 Conditions

Our function generate_data() is rather limited


Let's make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on (0, 1) as required
This is achieved in the next piece of code

In [20]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()

Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks
Notes

โ€ข We are passing the argument U as a string, which is why we write it as 'U'

โ€ข Notice that equality is tested with the == syntax, not =

– For example, the statement a = 10 assigns the name a to the value 10
– The expression a == 10 evaluates to either True or False, depending on the value of a

Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function
To understand this, consider the following version

In [21]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)


plt.plot(data)
plt.show()

Now, when we call the function generate_data(), we pass np.random.uniform as the second argument
This object is a function
When the function call generate_data(100, np.random.uniform) is executed,
Python runs the function code block with n equal to 100 and the name generator_type
โ€œboundโ€ to the function np.random.uniform

• While these lines are executed, the names generator_type and np.random.uniform are "synonyms", and can be used in identical ways

This principle works more generally — for example, consider the following piece of code

In [22]: max(7, 2, 4) # max() is a built-in Python function

Out[22]: 7

In [23]: m = max
m(7, 2, 4)

Out[23]: 7

Here we created another name for the built-in function max(), which could then be used in
identical ways
In the context of our program, the ability to bind new names to functions means that there is no problem passing a function as an argument to another function — as we did above

3.5.8 List Comprehensions

We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line

In [24]: animals = ['dog', 'cat', 'bird']


plurals = [animal + 's' for animal in animals]
plurals

Out[24]: ['dogs', 'cats', 'birds']

Here's another example

In [25]: range(8)

Out[25]: range(0, 8)

In [26]: doubles = [2 * x for x in range(8)]


doubles

Out[26]: [0, 2, 4, 6, 8, 10, 12, 14]

With the list comprehension syntax, we can simplify the lines

ϵ_values = []
for i in range(n):
    e = generator_type()
    ϵ_values.append(e)

into

ϵ_values = [generator_type() for i in range(n)]
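As an aside not covered above, list comprehensions can also filter elements with an if clause; a small sketch of our own:

```python
# Keep only the even numbers from 0 to 9
evens = [x for x in range(10) if x % 2 == 0]
print(evens)
```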

3.6 Exercises

3.6.1 Exercise 1

Recall that ๐‘›! is read as โ€œ๐‘› factorialโ€ and defined as ๐‘›! = ๐‘› ร— (๐‘› โˆ’ 1) ร— โ‹ฏ ร— 2 ร— 1


There are functions to compute this in various modules, but let's write our own version as an exercise
In particular, write a function factorial such that factorial(n) returns n! for any positive integer n

3.6.2 Exercise 2

The binomial random variable Y ~ Bin(n, p) represents the number of successes in n binary trials, where each trial succeeds with probability p
Without any import besides from numpy.random import uniform, write a function binomial_rv such that binomial_rv(n, p) generates one draw of Y
Hint: If U is uniform on (0, 1) and p ∈ (0, 1), then the expression U < p evaluates to True with probability p

3.6.3 Exercise 3

Compute an approximation to π using Monte Carlo. Use no imports besides

In [27]: import numpy as np

Your hints are as follows:

โ€ข If ๐‘ˆ is a bivariate uniform random variable on the unit square (0, 1)2 , then the proba-
bility that ๐‘ˆ lies in a subset ๐ต of (0, 1)2 is equal to the area of ๐ต
โ€ข If ๐‘ˆ1 , โ€ฆ , ๐‘ˆ๐‘› are IID copies of ๐‘ˆ , then, as ๐‘› gets large, the fraction that falls in ๐ต, con-
verges to the probability of landing in ๐ต
โ€ข For a circle, area = pi * radius^2

3.6.4 Exercise 4

Write a program that prints one realization of the following random device:

• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing

Use no import besides from numpy.random import uniform

3.6.5 Exercise 5

Your next task is to simulate and plot the correlated time series

๐‘ฅ๐‘ก+1 = ๐›ผ ๐‘ฅ๐‘ก + ๐œ–๐‘ก+1 where ๐‘ฅ0 = 0 and ๐‘ก = 0, โ€ฆ , ๐‘‡

The sequence of shocks {๐œ–๐‘ก } is assumed to be IID and standard normal


In your solution, restrict your import statements to

In [28]: import numpy as np


import matplotlib.pyplot as plt

Set ๐‘‡ = 200 and ๐›ผ = 0.9



3.6.6 Exercise 6

To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea

In [29]: import numpy as np


import matplotlib.pyplot as plt

x = [np.random.randn() for i in range(100)]


plt.plot(x, label="white noise")
plt.legend()
plt.show()

Now, starting with your solution to exercise 5, plot three simulated time series, one for each of the cases α = 0, α = 0.8 and α = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows

(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the α values
Important hints:

• If you call the plot() function multiple times before calling show(), all of the lines you produce will end up on the same figure

  – And if you omit the argument 'b-' to the plot function, Matplotlib will automatically select different colors for each line

โ€ข The expression 'foo' + str(42) evaluates to 'foo42'

3.7 Solutions

3.7.1 Exercise 1
In [30]: def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

factorial(4)

Out[30]: 24

3.7.2 Exercise 2
In [31]: from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1  # Or count += 1
    return count

binomial_rv(10, 0.5)

Out[31]: 5

3.7.3 Exercise 3

Consider the circle of diameter 1 embedded in the unit square

Let A be its area and let r = 1/2 be its radius
If we know π then we can compute A via A = πr²
But here the point is to compute π, which we can do by π = A/r²
Summary: If we can estimate the area of the unit circle, then dividing by r² = (1/2)² = 1/4 gives an estimate of π
We estimate the area by sampling bivariate uniforms and looking at the fraction that falls
into the unit circle

In [32]: n = 100000

count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n

print(area_estimate * 4)  # dividing by radius**2

3.13976

3.7.4 Exercise 4
In [33]: from numpy.random import uniform

payoff = 0
count = 0

for i in range(10):
    U = uniform()
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)

1

3.7.5 Exercise 5

The next line embeds all subsequent figures in the browser itself

In [34]: α = 0.9
ts_length = 200
current_x = 0

x_values = []
for i in range(ts_length + 1):
    x_values.append(current_x)
    current_x = α * current_x + np.random.randn()
plt.plot(x_values)
plt.show()

3.7.6 Exercise 6

In [35]: αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
4

Python Essentials

4.1 Contents

• Data Types 4.2
• Input and Output 4.3
• Iterating 4.4
• Comparisons and Logical Operators 4.5
• More Functions 4.6
• Coding Style and PEP8 4.7
• Exercises 4.8
• Solutions 4.9

In this lecture, we'll cover features of the language that are essential to reading and writing
Python code

4.2 Data Types

We've already met several built-in Python data types, such as strings, integers, floats and lists
Let's learn a bit more about them

4.2.1 Primitive Data Types

One simple data type is Boolean values, which can be either True or False

In [1]: x = True
x

Out[1]: True


In the next line of code, the interpreter evaluates the expression on the right of = and binds y
to this value

In [2]: y = 100 < 10


y

Out[2]: False

In [3]: type(y)

Out[3]: bool

In arithmetic expressions, True is converted to 1 and False is converted to 0


This is called Boolean arithmetic and is often useful in programming
Here are some examples

In [4]: x + y

Out[4]: 1

In [5]: x * y

Out[5]: 0

In [6]: True + True

Out[6]: 2

In [7]: bools = [True, True, False, True] # List of Boolean values

sum(bools)

Out[7]: 3

The two most common data types used to represent numbers are integers and floats

In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)

Out[8]: int

In [9]: type(c)

Out[9]: float

Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate
As long as you're using Python 3.x, division of integers yields floats

In [10]: 1 / 2

Out[10]: 0.5

But be careful! If you're still using Python 2.x, division of two integers returns only the integer part
For integer division in Python 3.x use this syntax:

In [11]: 1 // 2

Out[11]: 0

Complex numbers are another primitive data type in Python

In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y

Out[12]: 5j

4.2.2 Containers

Python has several basic types for storing collections of (possibly heterogeneous) data
Weโ€™ve already discussed lists
A related data type is tuples, which are "immutable" lists

In [13]: x = ('a', 'b') # Parentheses instead of the square brackets


x = 'a', 'b' # Or no brackets --- the meaning is identical
x

Out[13]: ('a', 'b')

In [14]: type(x)

Out[14]: tuple

In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable

In [15]: x = [1, 2]
x[0] = 10
x

Out[15]: [10, 2]

But tuples are not

In [16]: x = (1, 2)
x[0] = 10

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-16-d1b2647f6c81> in <module>
1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment

We'll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be "unpacked" as follows

In [17]: integers = (10, 20, 30)


x, y, z = integers
x

Out[17]: 10

In [18]: y

Out[18]: 20

You've actually seen an example of this already


Tuple unpacking is convenient and we'll use it often
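One handy consequence, sketched here with our own example: tuple unpacking lets you swap two variables without a temporary:

```python
# Swap two variables in one line via tuple unpacking
x, y = 1, 2
x, y = y, x      # The right-hand side is packed into a tuple, then unpacked
print(x, y)
```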
Slice Notation
To access multiple elements of a list or tuple, you can use Python's slice notation
For example,

In [19]: a = [2, 4, 6, 8]
a[1:]

Out[19]: [4, 6, 8]

In [20]: a[1:3]

Out[20]: [4, 6]

The general rule is that a[m:n] returns n - m elements, starting at a[m]


Negative numbers are also permissible

In [21]: a[-2:] # Last two elements of the list

Out[21]: [6, 8]

The same slice notation works on tuples and strings

In [22]: s = 'foobar'
s[-3:] # Select the last three elements

Out[22]: 'bar'

Sets and Dictionaries


Two other container types we should mention before moving on are sets and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered

In [23]: d = {'name': 'Frodo', 'age': 33}


type(d)

Out[23]: dict

In [24]: d['age']

Out[24]: 33

The names 'name' and 'age' are called the keys


The objects that the keys are mapped to ('Frodo' and 33) are called the values
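As a brief illustration of our own, dictionaries also come with methods for inspecting keys and values, and new entries are added with the same bracket syntax:

```python
d = {'name': 'Frodo', 'age': 33}

# keys() and values() return iterable views of the dictionary
print(list(d.keys()))
print(list(d.values()))

# Adding a new entry
d['height'] = 1.06
print(d['height'])
```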
Sets are unordered collections without duplicates, and set methods provide the usual set-
theoretic operations

In [25]: s1 = {'a', 'b'}


type(s1)

Out[25]: set

In [26]: s2 = {'b', 'c'}


s1.issubset(s2)

Out[26]: False

In [27]: s1.intersection(s2)

Out[27]: {'b'}

The set() function creates sets from sequences

In [28]: s3 = set(('foo', 'bar', 'foo'))


s3

Out[28]: {'bar', 'foo'}

4.3 Input and Output

Let's briefly review reading and writing to text files, starting with writing

In [29]: f = open('newfile.txt', 'w') # Open 'newfile.txt' for writing


f.write('Testing\n') # Here '\n' means new line
f.write('Testing again')
f.close()

Here

โ€ข The built-in function open() creates a file object for writing to


โ€ข Both write() and close() are methods of file objects

Where is this file that we've created?

Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or IPython via

In [30]: %pwd

Out[30]: '/home/anju/Desktop/lecture-source-py/_build/jupyter/executed'

If a path is not specified, then this is where Python writes to


We can also use Python to read the contents of newfile.txt as follows

In [31]: f = open('newfile.txt', 'r')


out = f.read()
out

Out[31]: 'Testing\nTesting again'

In [32]: print(out)

Testing
Testing again

4.3.1 Paths

Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case, you can shift the file to the pwd or specify the full path to the file

f = open('insert_full_path_to_file/newfile.txt', 'r')

4.4 Iterating

One of the most important tasks in computing is stepping through a sequence of data and
performing a given action
One of Pythonโ€™s strengths is its simple, flexible interface to this kind of iteration via the for
loop

4.4.1 Looping over Different Objects

Many Python objects are "iterable", in the sense that they can be looped over
To give an example, let's write the file us_cities.txt, which lists US cities and their population, to the present working directory

In [33]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Overwriting us_cities.txt

Suppose that we want to make the information more readable, by capitalizing names and adding commas to mark thousands
The program us_cities.py reads the data in and makes the conversion:

In [34]: data_file = open('us_cities.txt', 'r')


for line in data_file:
city, population = line.split(':') # Tuple unpacking
city = city.title() # Capitalize city names
population = f'{int(population):,}' # Add commas to numbers
print(city.ljust(15) + population)
data_file.close()

New York 8,244,910


Los Angeles 3,819,702
Chicago 2,707,120
Houston 2,145,146
Philadelphia 1,536,471
Phoenix 1,469,471
San Antonio 1,359,758
San Diego 1,326,179
Dallas 1,223,229

Here the f-string f'{int(population):,}' is used for inserting a formatted value into a string
The reformatting of each line is the result of three different string methods, the details of which can be left till later
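To see these string operations in isolation, here is a small sketch of our own applying them to a single line of the file:

```python
line = 'new york: 8244910'

city, population = line.split(':')    # Split the line at the colon
city = city.title()                   # Capitalize each word
population = f'{int(population):,}'   # Add thousands separators
print(city.ljust(15) + population)    # Left-justify the name in 15 characters
```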
The interesting part of this program for us is line 2, which shows that

1. The file object data_file is iterable, in the sense that it can be placed to the right of in within a for loop
2. Iteration steps through each line in the file

This leads to the clean, convenient syntax shown in our program


Many other kinds of objects are iterable, and we'll discuss some of them later on

4.4.2 Looping without Indices

One thing you might have noticed is that Python tends to favor looping without explicit indexing
For example,

In [35]: x_values = [1, 2, 3] # Some iterable x


for x in x_values:
print(x * x)

1
4
9

is preferred to

In [36]: for i in range(len(x_values)):


print(x_values[i] * x_values[i])

1
4
9

When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code

In [37]: countries = ('Japan', 'Korea', 'China')


cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
print(f'The capital of {country} is {city}')

The capital of Japan is Tokyo


The capital of Korea is Seoul
The capital of China is Beijing

The zip() function is also useful for creating dictionaries — for example

In [38]: names = ['Tom', 'John']


marks = ['E', 'F']
dict(zip(names, marks))

Out[38]: {'Tom': 'E', 'John': 'F'}

If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example

In [39]: letter_list = ['a', 'b', 'c']


for index, letter in enumerate(letter_list):
print(f"letter_list[{index}] = '{letter}'")

letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'

The output of the loop is

In [40]: letter_list[0] = 'a'


letter_list[1] = 'b'
letter_list[2] = 'c'

4.5 Comparisons and Logical Operators

4.5.1 Comparisons

Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False)
A common type is comparisons, such as

In [41]: x, y = 1, 2
x < y

Out[41]: True

In [42]: x > y

Out[42]: False

One of the nice features of Python is that we can chain inequalities

In [43]: 1 < 2 < 3

Out[43]: True

In [44]: 1 <= 2 <= 3

Out[44]: True

As we saw earlier, when testing for equality we use ==

In [45]: x = 1 # Assignment
x == 2 # Comparison

Out[45]: False

For โ€œnot equalโ€ use !=

In [46]: 1 != 2

Out[46]: True

Note that when testing conditions, we can use any valid Python expression

In [47]: x = 'yes' if 42 else 'no'


x

Out[47]: 'yes'

In [48]: x = 'yes' if [] else 'no'


x

Out[48]: 'no'

What's going on here?


The rule is:

• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.) and None are all equivalent to False

  – for example, [] and () are equivalent to False in an if clause

• All other values are equivalent to True

  – for example, 42 is equivalent to True in an if clause
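A quick way to check these rules is to pass objects to the built-in bool(), as in this short sketch:

```python
# bool() reveals the truth value Python assigns to an object
print(bool(0))        # False: zero is falsy
print(bool(42))       # True: nonzero numbers are truthy
print(bool([]))       # False: empty containers are falsy
print(bool('hello'))  # True: nonempty strings are truthy
print(bool(None))     # False: None is falsy
```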

4.5.2 Combining Expressions

We can combine expressions using and, or and not


These are the standard logical connectives (conjunction, disjunction and denial)

In [49]: 1 < 2 and 'f' in 'foo'

Out[49]: True

In [50]: 1 < 2 and 'g' in 'foo'

Out[50]: False

In [51]: 1 < 2 or 'g' in 'foo'

Out[51]: True

In [52]: not True

Out[52]: False

In [53]: not not True

Out[53]: True

Remember

โ€ข P and Q is True if both are True, else False


โ€ข P or Q is False if both are False, else True
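One detail worth knowing is that and and or use short-circuit evaluation: the second operand is only evaluated when the first does not already determine the result. A small sketch:

```python
def noisy(label, value):
    print('evaluating', label)
    return value

# `and` stops at the first falsy operand, so B is never evaluated
result = noisy('A', False) and noisy('B', True)
print(result)   # False

# `or` stops at the first truthy operand, so D is never evaluated
result = noisy('C', True) or noisy('D', False)
print(result)   # True
```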

4.6 More Functions

Let's talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some

In [54]: max(19, 20)

Out[54]: 20

In [55]: range(4) # in python3 this returns a range iterator object

Out[55]: range(0, 4)

In [56]: list(range(4)) # will evaluate the range iterator and create a list

Out[56]: [0, 1, 2, 3]

In [57]: str(22)

Out[57]: '22'

In [58]: type(22)

Out[58]: int

Two more useful built-in functions are any() and all()

In [59]: bools = False, True, True


all(bools) # True if all are True and False otherwise

Out[59]: False

In [60]: any(bools) # False if all are False and True otherwise

Out[60]: True

The full list of Python built-ins is here


Now let's talk some more about user-defined functions constructed using the keyword def

4.6.1 Why Write Functions?

User-defined functions are important for improving the clarity of your code by

โ€ข separating different strands of logic


โ€ข facilitating code reuse

(Writing the same thing twice is almost always a bad idea)


The basics of user-defined functions were discussed here

4.6.2 The Flexibility of Python Functions

As we discussed in the previous lecture, Python functions are very flexible


In particular

โ€ข Any number of functions can be defined in a given file


โ€ข Functions can be (and often are) defined inside other functions
โ€ข Any object can be passed to a function as an argument, including other functions
โ€ข A function can return any kind of object, including functions

We already gave an example of how straightforward it is to pass a function to a function


Note that a function can have arbitrarily many return statements (including zero)
Execution of the function terminates when the first return is hit, allowing code like the following example

In [61]: def f(x):


if x < 0:
return 'negative'
return 'nonnegative'

Functions without a return statement automatically return the special Python object None
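For example, a function whose body only prints returns None, which is easy to verify:

```python
def greet(name):
    print(f"Hello, {name}")   # No return statement

result = greet("world")
print(result is None)   # True: greet returned None
```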

4.6.3 Docstrings

Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this

In [62]: def f(x):


"""
This function squares its argument
"""
return x**2

After running this code, the docstring is available

In [63]: f?

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument

In [64]: f??

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
"""
This function squares its argument
"""
return x**2

With one question mark we bring up the docstring, and with two we get the source code as
well
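Outside IPython, the same docstring is available programmatically through the function's __doc__ attribute, and help() displays it too:

```python
def f(x):
    """
    This function squares its argument
    """
    return x**2

print(f.__doc__.strip())   # This function squares its argument
help(f)                    # help() also displays the docstring
```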

4.6.4 One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line


For example, the definitions

In [65]: def f(x):


return x**3

and

In [66]: f = lambda x: x**3

are entirely equivalent


To see why lambda is useful, suppose that we want to calculate ∫_0^2 x^3 dx (and have forgotten
our high-school calculus)
The SciPy library has a function called quad that will do this calculation for us
The syntax of the quad function is quad(f, a, b) where f is a function and a and b are
numbers
To create the function f(x) = x^3 we can use lambda as follows

In [67]: from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)

Out[67]: (4.0, 4.440892098500626e-14)

Here the function created by lambda is said to be anonymous because it was never given a name
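Another common use of lambda is supplying a short function as an argument, for example as a sort key:

```python
words = ['banana', 'fig', 'apple']
words.sort(key=lambda w: len(w))   # Sort by string length
print(words)   # ['fig', 'apple', 'banana']
```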

4.6.5 Keyword Arguments

If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")



In this call to Matplotlib's plot function, notice that the last argument is passed in
name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order

• plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")

Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it's hard to remember the right order
You can adopt keyword arguments in user-defined functions with no difficulty
The next example illustrates the syntax

In [68]: def f(x, a=1, b=1):


return a + b * x

The keyword argument values we supplied in the definition of f become the default values

In [69]: f(2)

Out[69]: 3

They can be modified as follows

In [70]: f(2, a=4, b=5)

Out[70]: 14
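You can also override just one of the defaults while keeping the others in place:

```python
def f(x, a=1, b=1):
    return a + b * x

print(f(2, b=5))   # a keeps its default 1, so 1 + 5*2 = 11
print(f(2, a=0))   # b keeps its default 1, so 0 + 1*2 = 2
```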

4.7 Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We've all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true

• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page

In Python, the standard style is set out in PEP8


(Occasionally we'll deviate from PEP8 in these lectures to better match mathematical notation)

4.8 Exercises

Solve the following exercises


(For some, the built-in function sum() comes in handy)
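As a refresher, sum() adds up the elements of any iterable, including generator expressions:

```python
print(sum([1, 2, 3]))                 # 6
print(sum(x**2 for x in range(4)))    # 0 + 1 + 4 + 9 = 14
```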

4.8.1 Exercise 1

Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip()
Part 2: In one line, count the number of even numbers in 0,…,99

โ€ข Hint: x % 2 returns 0 if x is even, 1 otherwise

Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even

4.8.2 Exercise 2

Consider the polynomial

๐‘›
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘› ๐‘ฅ๐‘› = โˆ‘ ๐‘Ž๐‘– ๐‘ฅ๐‘– (1)
๐‘–=0

Write a function p(x, coeff) that computes the value in Eq. (1) given a point
x and a list of coefficients coeff
Try to use enumerate() in your loop

4.8.3 Exercise 3

Write a function that takes a string as an argument and returns the number of capital letters
in the string
Hint: 'foo'.upper() returns 'FOO'

4.8.4 Exercise 4

Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False

• By "sequence" we mean a list, a tuple or a string


โ€ข Do the exercise without using sets and set methods

4.8.5 Exercise 5

When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation

Nevertheless, let's write our own function approximation routine as an exercise


In particular, without using any imports, write a function linapprox that takes as arguments

• A function f mapping some interval [a, b] into ℝ
• Two scalars a and b providing the limits of this interval
• An integer n determining the number of grid points
• A number x satisfying a <= x <= b

and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency

4.9 Solutions

4.9.1 Exercise 1

Part 1 Solution:
Here's one possible solution

In [71]: x_vals = [1, 2, 3]


y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])

Out[71]: 6

This also works

In [72]: sum(x * y for x, y in zip(x_vals, y_vals))

Out[72]: 6

Part 2 Solution:
One solution is

In [73]: sum([x % 2 == 0 for x in range(100)])

Out[73]: 50

This also works:

In [74]: sum(x % 2 == 0 for x in range(100))

Out[74]: 50

Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are

In [75]: len([x for x in range(100) if x % 2 == 0])

Out[75]: 50

and

In [76]: sum([1 for x in range(100) if x % 2 == 0])

Out[76]: 50

Part 3 Solution:
Here's one possibility

In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])

Out[77]: 2

4.9.2 Exercise 2
In [78]: def p(x, coeff):
return sum(a * x**i for i, a in enumerate(coeff))

In [79]: p(1, (2, 4))

Out[79]: 6

4.9.3 Exercise 3

Here's one solution:

In [80]: def f(string):


count = 0
for letter in string:
if letter == letter.upper() and letter.isalpha():
count += 1
return count
f('The Rain in Spain')

Out[80]: 3

4.9.4 Exercise 4

Here's a solution:

In [81]: def f(seq_a, seq_b):


is_subset = True
for a in seq_a:
if a not in seq_b:
is_subset = False
return is_subset

# == test == #

print(f([1, 2], [1, 2, 3]))


print(f([1, 2, 3], [1, 2]))

True
False

Of course, if we use the set data type then the solution is easier

In [82]: def f(seq_a, seq_b):


return set(seq_a).issubset(set(seq_b))
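As a quick check of the set-based version, note that the set comparison operator <= tests the same subset relation as issubset():

```python
def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))

print(f([1, 2], [1, 2, 3]))      # True
print(f([1, 2, 3], [1, 2]))      # False
print(set('ab') <= set('abc'))   # The <= operator tests the same relation: True
```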

4.9.5 Exercise 5
In [83]: def linapprox(f, a, b, n, x):
"""
Evaluates the piecewise linear interpolant of f at x on the interval
[a, b], with n evenly spaced grid points.

Parameters
===========
f : function
The function to approximate

x, a, b : scalars (floats or integers)


Evaluation point and endpoints, with a <= x <= b

n : integer
Number of grid points

Returns
=========
A float. The interpolant evaluated at x

"""
length_of_interval = b - a
num_subintervals = n - 1
step = length_of_interval / num_subintervals

# === find first grid point larger than x === #


point = a
while point <= x:
point += step

# === x must lie between the gridpoints (point - step) and point === #
u, v = point - step, point

return f(u) + (x - u) * (f(v) - f(u)) / (v - u)


5

OOP I: Introduction to Object Oriented Programming

5.1 Contents

โ€ข Overview 5.2

โ€ข Objects 5.3

โ€ข Summary 5.4

5.2 Overview

OOP is one of the major paradigms in programming


The traditional programming paradigm (think Fortran, C, MATLAB, etc.) is called procedural
It works as follows

โ€ข The program has a state corresponding to the values of its variables


โ€ข Functions are called to act on these data
โ€ข Data are passed back and forth via function calls

In contrast, in the OOP paradigm

โ€ข data and functions are โ€œbundled togetherโ€ into โ€œobjectsโ€

(Functions in this context are referred to as methods)
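To make the contrast concrete, here is a toy sketch of the two styles (the Account class and all names here are invented for illustration):

```python
# Procedural style: data and functions live separately,
# and data is passed back and forth via function calls
balance = 100
def deposit(balance, amount):
    return balance + amount
balance = deposit(balance, 50)

# OOP style: data and behavior bundled together in an object
class Account:
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):   # A method acting on the object's own data
        self.balance += amount

acct = Account(100)
acct.deposit(50)
print(balance, acct.balance)   # 150 150
```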

5.2.1 Python and OOP

Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach
However, at a foundational level, Python is object-oriented


In particular, in Python, everything is an object


In this lecture, we explain what that statement means and why it matters

5.3 Objects

In Python, an object is a collection of data and instructions held in computer memory that
consists of

1. a type
2. a unique identity
3. data (i.e., content)
4. methods

These concepts are defined and discussed sequentially below

5.3.1 Type

Python provides for different types of objects, to accommodate different categories of data
For example

In [1]: s = 'This is a string'


type(s)

Out[1]: str

In [2]: x = 42 # Now let's create an integer


type(x)

Out[2]: int

The type of an object matters for many expressions


For example, the addition operator between two strings means concatenation

In [3]: '300' + 'cc'

Out[3]: '300cc'

On the other hand, between two numbers it means ordinary addition

In [4]: 300 + 400

Out[4]: 700

Consider the following expression

In [5]: '300' + 400



---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str

Here we are mixing types, and it's unclear to Python whether the user wants to

โ€ข convert '300' to an integer and then add it to 400, or


โ€ข convert 400 to string and then concatenate it with '300'

Some languages might try to guess but Python is strongly typed

โ€ข Type is important, and implicit type conversion is rare


โ€ข Python will respond instead by raising a TypeError

To avoid the error, you need to clarify by changing the relevant type
For example,

In [6]: int('300') + 400 # To add as numbers, change the string to an integer

Out[6]: 700
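Conversely, to treat both operands as strings, convert the integer with str():

```python
print('300' + str(400))   # Concatenates as strings: '300400'
```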

5.3.2 Identity

In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object
The identity of an object can be obtained via the id() function

In [7]: y = 2.5
z = 2.5
id(y)

Out[7]: 140535456630128

In [8]: id(z)

Out[8]: 140535456630080

In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object
The identity of an object is in fact just the address of the object in memory
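Note that equal values do not imply equal identity: the is operator compares identities, while == compares values. A small sketch (the floats are built at run-time with float() so that, in CPython at least, they are guaranteed to be distinct objects):

```python
y = float('2.5')
z = float('2.5')

print(y == z)           # True: same value
print(y is z)           # False: two distinct objects in memory
print(id(y) == id(z))   # False: `is` compares exactly these ids
```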

5.3.3 Object Content: Data and Attributes

If we set x = 42 then we create an object of type int that contains the data 42
In fact, it contains more, as the following example shows

In [9]: x = 42
x

Out[9]: 42

In [10]: x.imag

Out[10]: 0

In [11]: x.__class__

Out[11]: int

When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type
Any name following a dot is called an attribute of the object to the left of the dot

• e.g., imag and __class__ are attributes of x

We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so letโ€™s discuss them in-depth

5.3.4 Methods

Methods are functions that are bundled with objects


Formally, methods are attributes of objects that are callable (i.e., can be called as functions)

In [12]: x = ['foo', 'bar']


callable(x.append)

Out[12]: True

In [13]: callable(x.__doc__)

Out[13]: False

Methods typically act on the data contained in the object they belong to, or combine that
data with other data

In [14]: x = ['a', 'b']


x.append('c')
s = 'This is a string'
s.upper()

Out[14]: 'THIS IS A STRING'

In [15]: s.lower()

Out[15]: 'this is a string'

In [16]: s.replace('This', 'That')

Out[16]: 'That is a string'

A great deal of Python functionality is organized around method calls


For example, consider the following piece of code

In [17]: x = ['a', 'b']


x[0] = 'aa' # Item assignment using square bracket notation
x

Out[17]: ['aa', 'b']

It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows

In [18]: x = ['a', 'b']


x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
x

Out[18]: ['aa', 'b']

(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something totally different)
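The same correspondence holds for reading elements: square bracket access calls the __getitem__ method under the hood:

```python
x = ['a', 'b']
print(x[0])               # 'a'
print(x.__getitem__(0))   # Equivalent method call: 'a'
```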

5.4 Summary

In Python, everything in memory is treated as an object


This includes not just lists, strings, etc., but also less obvious things, such as

โ€ข functions (once they have been read into memory)


โ€ข modules (ditto)
โ€ข files opened for reading or writing
โ€ข integers, etc.

Consider, for example, functions


When Python reads a function definition, it creates a function object and stores it in memory
The following code illustrates

In [19]: def f(x): return x**2


f

Out[19]: <function __main__.f(x)>

In [20]: type(f)

Out[20]: function

In [21]: id(f)

Out[21]: 140535456543336

In [22]: f.__name__

Out[22]: 'f'

We can see that f has type, identity, attributes and so on — just like any other object
It also has methods
One example is the __call__ method, which just evaluates the function

In [23]: f.__call__(3)

Out[23]: 9

Another is the __dir__ method, which returns a list of attributes
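For instance, the built-in dir() (which uses __dir__ internally) lists the attributes we have been discussing:

```python
def f(x):
    return x**2

attrs = dir(f)
print('__call__' in attrs, '__doc__' in attrs, '__name__' in attrs)   # True True True
```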


Modules loaded into memory are also treated as objects

In [24]: import math

id(math)

Out[24]: 140535632790936

This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent
Part II

The Scientific Libraries

6

NumPy

6.1 Contents

โ€ข Overview 6.2

โ€ข Introduction to NumPy 6.3

โ€ข NumPy Arrays 6.4

โ€ข Operations on Arrays 6.5

โ€ข Additional Functionality 6.6

โ€ข Exercises 6.7

โ€ข Solutions 6.8

"Let's be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results." – Michael Crichton

6.2 Overview

NumPy is a first-rate library for numerical programming

โ€ข Widely used in academia, finance and industry


โ€ข Mature, fast, stable and under continuous development

In this lecture, we introduce NumPy arrays and the fundamental array processing operations
provided by NumPy

6.2.1 References

โ€ข The official NumPy documentation


6.3 Introduction to NumPy

The essential problem that NumPy solves is fast array processing


For example, suppose we want to create an array of 1 million random draws from a uniform
distribution and compute the mean
If we did this in pure Python it would be orders of magnitude slower than C or Fortran
This is because

โ€ข Loops in Python over Python data types like lists carry significant overhead
โ€ข C and Fortran code contains a lot of type information that can be used for optimization
โ€ข Various optimizations can be carried out during compilation when the compiler sees the
instructions as a whole

However, for a task like the one described above, there's no need to switch back to C or Fortran
Instead, we can use NumPy, where the instructions look like this:

In [1]: import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x.mean()

Out[1]: 0.5004892850074708

The operations of creating the array and computing its mean are both passed out to carefully
optimized machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to Matlab, which provides an interface to fast Fortran routines

6.3.1 A Comment on Vectorization

NumPy is great for operations that are naturally vectorized


Vectorized operations are precompiled routines that can be sent in batches, like

โ€ข matrix multiplication and other linear algebra routines


โ€ข generating a vector of random numbers
โ€ข applying a fixed transformation (e.g., sine or cosine) to an entire array

In a later lecture, weโ€™ll discuss code that isnโ€™t easy to vectorize and how such routines can
also be optimized
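To make the comparison concrete, here is a sketch contrasting a pure Python loop with the equivalent vectorized call. Both compute the same mean; the NumPy version is typically orders of magnitude faster (actual timings vary by machine):

```python
import numpy as np

x = np.random.uniform(0, 1, size=1_000_000)

# Pure Python loop: each iteration pays interpreter overhead
total = 0.0
for value in x:
    total += value
loop_mean = total / len(x)

# Vectorized: a single call into optimized compiled code
vec_mean = x.mean()

print(abs(loop_mean - vec_mean) < 1e-8)   # True: same result
```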

6.4 NumPy Arrays

The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray

NumPy arrays power a large proportion of the scientific Python ecosystem


To create a NumPy array containing only zeros we use np.zeros

In [2]: a = np.zeros(3)
a

Out[2]: array([0., 0., 0.])

In [3]: type(a)

Out[3]: numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that

โ€ข Data must be homogeneous (all elements of the same type)


โ€ข These types must be one of the data types (dtypes) provided by NumPy

The most important of these dtypes are:

โ€ข float64: 64 bit floating-point number


โ€ข int64: 64 bit integer
โ€ข bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64

In [4]: a = np.zeros(3)
type(a[0])

Out[4]: numpy.float64

If we want to use integers we can specify as follows:

In [5]: a = np.zeros(3, dtype=int)


type(a[0])

Out[5]: numpy.int64

6.4.1 Shape and Dimension

Consider the following assignment

In [6]: z = np.zeros(10)

Here z is a flat array with no dimension — neither row nor column vector
The dimension is recorded in the shape attribute, which is a tuple

In [7]: z.shape

Out[7]: (10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma)
To give it dimension, we can change the shape attribute

In [8]: z.shape = (10, 1)


z

Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])

In [9]: z = np.zeros(4)
z.shape = (2, 2)
z

Out[9]: array([[0., 0.],


[0., 0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z = np.zeros((2, 2))

6.4.2 Creating Arrays

As we've seen, the np.zeros function creates an array of zeros


You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data

In [10]: z = np.empty(3)
z

Out[10]: array([0., 0., 0.])

The numbers you see here are garbage values


(Python allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those
memory slots are interpreted as float64 values)
To set up a grid of evenly spaced numbers use np.linspace

In [11]: z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements

To create an identity matrix use either np.identity or np.eye

In [12]: z = np.identity(2)
z

Out[12]: array([[1., 0.],


[0., 1.]])

In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array

In [13]: z = np.array([10, 20]) # ndarray from Python list


z

Out[13]: array([10, 20])

In [14]: type(z)

Out[14]: numpy.ndarray

In [15]: z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'


z

Out[15]: array([10., 20.])

In [16]: z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists


z

Out[16]: array([[1, 2],


[3, 4]])

See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array

In [17]: na = np.linspace(10, 20, 2)


na is np.asarray(na) # Does not copy NumPy arrays

Out[17]: True

In [18]: na is np.array(na) # Does make a new copy --- perhaps unnecessarily

Out[18]: False

To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt — see the documentation for details
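A quick self-contained sketch, using an in-memory text buffer in place of a file on disk:

```python
import io
import numpy as np

# np.loadtxt accepts any file-like object, so StringIO stands in for a file
data = np.loadtxt(io.StringIO("1 2\n3 4"))
print(data.shape)   # (2, 2)
```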

6.4.3 Array Indexing

For a flat array, indexing is the same as Python sequences:

In [19]: z = np.linspace(1, 2, 5)
z

Out[19]: array([1. , 1.25, 1.5 , 1.75, 2. ])

In [20]: z[0]

Out[20]: 1.0

In [21]: z[0:2] # Two elements, starting at element 0

Out[21]: array([1. , 1.25])

In [22]: z[-1]

Out[22]: 2.0

For 2D arrays the index syntax is as follows:

In [23]: z = np.array([[1, 2], [3, 4]])


z

Out[23]: array([[1, 2],


[3, 4]])

In [24]: z[0, 0]

Out[24]: 1

In [25]: z[0, 1]

Out[25]: 2

And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows

In [26]: z[0, :]

Out[26]: array([1, 2])

In [27]: z[:, 1]

Out[27]: array([2, 4])

NumPy arrays of integers can also be used to extract elements

In [28]: z = np.linspace(2, 4, 5)
z

Out[28]: array([2. , 2.5, 3. , 3.5, 4. ])

In [29]: indices = np.array((0, 2, 3))


z[indices]

Out[29]: array([2. , 3. , 3.5])

Finally, an array of dtype bool can be used to extract elements

In [30]: z

Out[30]: array([2. , 2.5, 3. , 3.5, 4. ])

In [31]: d = np.array([0, 1, 1, 0, 0], dtype=bool)


d

Out[31]: array([False, True, True, False, False])

In [32]: z[d]

Out[32]: array([2.5, 3. ])

We'll see why this is useful below


An aside: all elements of an array can be set equal to one number using slice notation

In [33]: z = np.empty(3)
z

Out[33]: array([2. , 3. , 3.5])

In [34]: z[:] = 42
z

Out[34]: array([42., 42., 42.])

6.4.4 Array Methods

Arrays have useful methods, all of which are carefully optimized

In [35]: a = np.array((4, 3, 2, 1))


a

Out[35]: array([4, 3, 2, 1])

In [36]: a.sort() # Sorts a in place


a

Out[36]: array([1, 2, 3, 4])

In [37]: a.sum() # Sum

Out[37]: 10

In [38]: a.mean() # Mean

Out[38]: 2.5

In [39]: a.max() # Max

Out[39]: 4

In [40]: a.argmax() # Returns the index of the maximal element



Out[40]: 3

In [41]: a.cumsum() # Cumulative sum of the elements of a

Out[41]: array([ 1, 3, 6, 10])

In [42]: a.cumprod() # Cumulative product of the elements of a

Out[42]: array([ 1, 2, 6, 24])

In [43]: a.var() # Variance

Out[43]: 1.25

In [44]: a.std() # Standard deviation

Out[44]: 1.118033988749895

In [45]: a.shape = (2, 2)


a.T # Equivalent to a.transpose()

Out[45]: array([[1, 3],


[2, 4]])

Another method worth knowing is searchsorted()


If z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element of z that is >= a

In [46]: z = np.linspace(2, 4, 5)
z

Out[46]: array([2. , 2.5, 3. , 3.5, 4. ])

In [47]: z.searchsorted(2.2)

Out[47]: 1

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [48]: a = np.array((4, 3, 2, 1))

In [49]: np.sum(a)

Out[49]: 10

In [50]: np.mean(a)

Out[50]: 2.5

6.5 Operations on Arrays

6.5.1 Arithmetic Operations

The operators +, -, *, / and ** all act elementwise on arrays

In [51]: a = np.array([1, 2, 3, 4])


b = np.array([5, 6, 7, 8])
a + b

Out[51]: array([ 6, 8, 10, 12])

In [52]: a * b

Out[52]: array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [53]: a + 10

Out[53]: array([11, 12, 13, 14])

Scalar multiplication is similar

In [54]: a * 10

Out[54]: array([10, 20, 30, 40])

The two-dimensional arrays follow the same general rules

In [55]: A = np.ones((2, 2))


B = np.ones((2, 2))
A + B

Out[55]: array([[2., 2.],


[2., 2.]])

In [56]: A + 10

Out[56]: array([[11., 11.],


[11., 11.]])

In [57]: A * B

Out[57]: array([[1., 1.],


[1., 1.]])

In particular, A * B is not the matrix product, it is an element-wise product



6.5.2 Matrix Multiplication

With Anaconda's scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:

In [58]: A = np.ones((2, 2))


B = np.ones((2, 2))
A @ B

Out[58]: array([[2., 2.],


[2., 2.]])

(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays

In [59]: A = np.array((1, 2))


B = np.array((10, 20))
A @ B

Out[59]: 50

In fact, we can use @ when one element is a Python list or tuple

In [60]: A = np.array(((1, 2), (3, 4)))


A

Out[60]: array([[1, 2],


[3, 4]])

In [61]: A @ (0, 1)

Out[61]: array([2, 4])

Since we are post-multiplying, the tuple is treated as a column vector
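Conversely, when pre-multiplying, a list or tuple is treated as a row vector:

```python
import numpy as np

A = np.array(((1, 2), (3, 4)))
print((0, 1) @ A)   # Row vector times matrix: [3 4]
```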

6.5.3 Mutability and Copying Arrays

NumPy arrays are mutable data types, like Python lists


In other words, their contents can be altered (mutated) in memory after initialization
We already saw examples above
Here's another example:

In [62]: a = np.array([42, 44])


a

Out[62]: array([42, 44])

In [63]: a[-1] = 0 # Change last element to 0


a

Out[63]: array([42, 0])



Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)

In [64]: a = np.random.randn(3)
a

Out[64]: array([ 1.05287718, -0.90366748, -1.51731058])

In [65]: b = a
b[0] = 0.0
a

Out[65]: array([ 0. , -0.90366748, -1.51731058])

What's happened is that we have changed a by changing b


The name b is bound to a and becomes just another reference to the array (the Python assignment model is described in more detail later in the course)
Hence, it has equal rights to make changes to that array
This is in fact the most sensible default behavior!
It means that we pass around only pointers to data, rather than making copies
Making copies is expensive in terms of both speed and memory
Making Copies
It is of course possible to make b an independent copy of a when required
This can be done using np.copy

In [66]: a = np.random.randn(3)
a

Out[66]: array([-0.19842005, 0.08435544, -0.34056112])

In [67]: b = np.copy(a)
b

Out[67]: array([-0.19842005, 0.08435544, -0.34056112])

Now b is an independent copy (called a deep copy)

In [68]: b[:] = 1
b

Out[68]: array([1., 1., 1.])

In [69]: a

Out[69]: array([-0.19842005, 0.08435544, -0.34056112])

Note that the change to b has not affected a



6.6 Additional Functionality

Let's look at some other useful things we can do with NumPy

6.6.1 Vectorized Functions

NumPy provides versions of the standard functions log, exp, sin, etc. that act element-wise on arrays

In [70]: z = np.array([1, 2, 3])


np.sin(z)

Out[70]: array([0.84147098, 0.90929743, 0.14112001])

This eliminates the need for explicit element-by-element loops such as

In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
y[i] = np.sin(z[i])

Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for "universal functions"
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions

In [72]: z

Out[72]: array([1, 2, 3])

In [73]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)

Out[73]: array([0.24197072, 0.05399097, 0.00443185])

Not all user-defined functions will act element-wise


For example, passing the function f defined below a NumPy array causes a ValueError

In [74]: def f(x):


return 1 if x > 0 else 0

The NumPy function np.where provides a vectorized alternative:

In [75]: x = np.random.randn(4)
x

Out[75]: array([ 1.61695912, -0.70388772, 0.17046687, 0.89294672])

In [76]: np.where(x > 0, 1, 0) # Insert 1 if x > 0 true, otherwise 0

Out[76]: array([1, 0, 1, 1])



You can also use np.vectorize to vectorize a given function

In [77]: def f(x): return 1 if x > 0 else 0

f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example

Out[77]: array([1, 0, 1, 1])

However, this approach doesn't always obtain the same speed as a more carefully crafted vectorized function

6.6.2 Comparisons

As a rule, comparisons on arrays are done element-wise

In [78]: z = np.array([2, 3])


y = np.array([2, 3])
z == y

Out[78]: array([ True, True])

In [79]: y[0] = 5
z == y

Out[79]: array([False, True])

In [80]: z != y

Out[80]: array([ True, False])

The situation is similar for >, <, >= and <=


We can also do comparisons against scalars

In [81]: z = np.linspace(0, 10, 5)


z

Out[81]: array([ 0. , 2.5, 5. , 7.5, 10. ])

In [82]: z > 3

Out[82]: array([False, False, True, True, True])

This is particularly useful for conditional extraction

In [83]: b = z > 3
b

Out[83]: array([False, False, True, True, True])

In [84]: z[b]

Out[84]: array([ 5. , 7.5, 10. ])

Of course we can — and frequently do — perform this in one step

In [85]: z[z > 3]

Out[85]: array([ 5. , 7.5, 10. ])
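Boolean indexing also works on the left-hand side of an assignment, which gives a concise way to modify elements conditionally:

```python
import numpy as np

z = np.linspace(0, 10, 5)
z[z > 3] = 0   # Zero out every element greater than 3
print(z)       # Array contents: 0, 2.5, 0, 0, 0
```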



6.6.3 Sub-packages

NumPy provides some additional functionality related to scientific programming through its
sub-packages
We've already seen how we can generate random variables using np.random

In [86]: z = np.random.randn(10000) # Generate standard normals


y = np.random.binomial(10, 0.5, size=1000) # 1,000 draws from Bin(10, 0.5)
y.mean()

Out[86]: 5.034

Another commonly used subpackage is np.linalg

In [87]: A = np.array([[1, 2], [3, 4]])

np.linalg.det(A) # Compute the determinant

Out[87]: -2.0000000000000004

In [88]: np.linalg.inv(A) # Compute the inverse

Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])

Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy
Weโ€™ll cover the SciPy versions in more detail soon
For a comprehensive list of whatโ€™s available in NumPy see this documentation

6.7 Exercises

6.7.1 Exercise 1

Consider the polynomial expression

๐‘
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘ ๐‘ฅ๐‘ = โˆ‘ ๐‘Ž๐‘› ๐‘ฅ๐‘› (1)
๐‘›=0

Earlier, you wrote a simple function p(x, coeff) to evaluate Eq. (1) without considering efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don't use this class)

โ€ข Hint: Use np.cumprod()



6.7.2 Exercise 2

Let q be a NumPy array of length n with q.sum() == 1


Suppose that q represents a probability mass function
We wish to generate a discrete random variable x such that P{x = i} = q_i
In other words, x takes values in range(len(q)) and x = i with probability q[i]
The standard (inverse transform) algorithm is as follows:

• Divide the unit interval [0, 1] into n subintervals I_0, I_1, …, I_{n−1} such that the length of I_i is q_i
• Draw a uniform random variable U on [0, 1] and return the i such that U ∈ I_i

The probability of drawing i is the length of I_i, which is equal to q_i


We can implement the algorithm as follows

In [89]: from random import uniform

def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]

If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]. It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops

โ€ข Hint: Use np.searchsorted and np.cumsum

If you can, implement the functionality as a class called DiscreteRV, where

โ€ข the data for an instance of the class is the vector of probabilities q


• the class has a draw() method, which returns one draw according to the algorithm described above

If you can, write the method so that draw(k) returns k draws from q

6.7.3 Exercise 3

Recall our earlier discussion of the empirical cumulative distribution function


Your task is to

1. Make the __call__ method more efficient using NumPy


2. Add a method that plots the ECDF over [a, b], where a and b are method parameters

6.8 Solutions
In [90]: import matplotlib.pyplot as plt
%matplotlib inline

6.8.1 Exercise 1

This code does the job

In [91]: def p(x, coef):


X = np.empty(len(coef))
X[0] = 1
X[1:] = x
y = np.cumprod(X) # y = [1, x, x**2,...]
return coef @ y

Letโ€™s test it

In [92]: coef = np.ones(3)


print(coef)
print(p(1, coef))
# For comparison
q = np.poly1d(coef)
print(q(1))

[1. 1. 1.]
3.0
3.0

6.8.2 Exercise 2

Hereโ€™s our first pass at a solution:

In [93]: from numpy import cumsum


from numpy.random import uniform

class DiscreteRV:
"""
Generates an array of draws from a discrete random variable with vector of
probabilities given by q.
"""

def __init__(self, q):


"""
The argument q is a NumPy array, or array like, nonnegative and sums
to 1
"""
self.q = q
self.Q = cumsum(q)

def draw(self, k=1):


"""
Returns k draws from q. For each such draw, the value i is returned
with probability q[i].
"""
return self.Q.searchsorted(uniform(0, 1, size=k))

The logic is not obvious, but if you take your time and read it slowly, you will understand
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by

In [94]: q = (0.1, 0.9)


d = DiscreteRV(q)
d.q = (0.5, 0.5)

The problem is that Q does not change accordingly, and Q is the data used in the draw
method
To deal with this, one option is to compute Q every time the draw method is called
But this is inefficient relative to computing Q once-off
A better option is to use descriptors
A solution from the quantecon library using descriptors that behaves as we desire can be
found here
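In the meantime, here is one way to sketch the idea using Python's built-in property (which is itself implemented via the descriptor protocol): recompute Q inside a setter, so that reassigning q keeps the two in sync. This is an illustrative variant of the DiscreteRV class above, not the quantecon implementation

```python
import numpy as np

class DiscreteRV:
    """As before, but the cumulative sum Q is kept in sync with q."""

    def __init__(self, q):
        self.q = q                      # routed through the property setter below

    @property
    def q(self):
        return self._q

    @q.setter
    def q(self, values):
        # Recompute the cumulative sum whenever q is (re)assigned
        self._q = np.asarray(values)
        self._Q = np.cumsum(self._q)

    def draw(self, k=1):
        return self._Q.searchsorted(np.random.uniform(0, 1, size=k))

d = DiscreteRV((0.1, 0.9))
d.q = (0.5, 0.5)                        # the cumulative sum is now recomputed automatically
```
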

6.8.3 Exercise 3

An example solution is given below


In essence, weโ€™ve just taken this code from QuantEcon and added in a plot method

In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method

"""

class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.

Parameters
----------
observations : array_like
An array of observations

Attributes
----------
observations : array_like
An array of observations

"""

def __init__(self, observations):


self.observations = np.asarray(observations)

def __call__(self, x):


"""
Evaluates the ecdf at x

Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated

Returns
-------
scalar(float)
Fraction of the sample less than x

"""
return np.mean(self.observations <= x)

def plot(self, a=None, b=None):


"""
Plot the ecdf on the interval [a, b].

Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval

"""

# === choose reasonable interval if [a, b] not specified === #


if a is None:
a = self.observations.min() - self.observations.std()
if b is None:
b = self.observations.max() + self.observations.std()

# === generate plot === #


x_vals = np.linspace(a, b, num=100)
f = np.vectorize(self.__call__)
plt.plot(x_vals, f(x_vals))
plt.show()

Hereโ€™s an example of usage

In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
7 Matplotlib

7.1 Contents

โ€ข Overview 7.2

โ€ข The APIs 7.3

โ€ข More Features 7.4

โ€ข Further Reading 7.5

โ€ข Exercises 7.6

โ€ข Solutions 7.7

7.2 Overview

Weโ€™ve already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with

โ€ข high-quality 2D and 3D plots


โ€ข output in all the usual formats (PDF, PNG, etc.)
โ€ข LaTeX integration
โ€ข fine-grained control over all aspects of presentation
โ€ข animation, etc.

7.2.1 Matplotlibโ€™s Split Personality

Matplotlib is unusual in that it offers two different interfaces to plotting


One is a simple MATLAB-style API (Application Programming Interface) that was written to
help MATLAB refugees find a ready home
The other is a more โ€œPythonicโ€ object-oriented API
For reasons described below, we recommend that you use the second API
But first, letโ€™s discuss the difference


7.3 The APIs

7.3.1 The MATLAB-style API

Hereโ€™s the kind of easy example you might find in introductory treatments

In [1]: import matplotlib.pyplot as plt


%matplotlib inline
import numpy as np

x = np.linspace(0, 10, 200)


y = np.sin(x)

plt.plot(x, y, 'b-', linewidth=2)


plt.show()

This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line)
This leads us to the alternative, object-oriented Matplotlib API

7.3.2 The Object-Oriented API

Hereโ€™s the code corresponding to the preceding figure using the object-oriented API

In [2]: fig, ax = plt.subplots()


ax.plot(x, y, 'b-', linewidth=2)
plt.show()

Here the call fig, ax = plt.subplots() returns a pair, where

โ€ข fig is a Figure instanceโ€”like a blank canvas


โ€ข ax is an AxesSubplot instanceโ€”think of a frame for plotting in

The plot() function is actually a method of ax


While thereโ€™s a bit more typing, the more explicit use of objects gives us better control
This will become more clear as we go along

7.3.3 Tweaks

Here weโ€™ve changed the line to red and added a legend

In [3]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()

Weโ€™ve also used alpha to make the line slightly transparentโ€”which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center')

In [4]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

If everything is properly configured, then adding LaTeX is trivial



In [5]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

Controlling the ticks, adding titles and so on is also straightforward

In [6]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()

7.4 More Features

Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them
We mention just a few

7.4.1 Multiple Plots on One Axis

Itโ€™s straightforward to generate multiple plots on the same axes


Hereโ€™s an example that randomly generates three normal densities and adds a label with their
mean

In [7]: from scipy.stats import norm


from random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()

7.4.2 Multiple Subplots

Sometimes we want multiple subplots in one figure



Hereโ€™s an example that generates 6 histograms

In [8]: num_rows, num_cols = 3, 2


fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
for i in range(num_rows):
for j in range(num_cols):
m, s = uniform(-1, 1), uniform(1, 2)
x = norm.rvs(loc=m, scale=s, size=100)
axes[i, j].hist(x, alpha=0.6, bins=20)
t = f'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
plt.show()

7.4.3 3D Plots

Matplotlib does a nice job of 3D plots โ€” here is one example

In [9]: from mpl_toolkits.mplot3d.axes3d import Axes3D


from matplotlib import cm

def f(x, y):


return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)


ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
y,
f(x, y),
rstride=2, cstride=2,
cmap=cm.jet,
alpha=0.7,
linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()

7.4.4 A Customizing Function

Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid

Hereโ€™s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow whatโ€™s going on

In [10]: def subplots():


"Custom subplots with axes through the origin"
fig, ax = plt.subplots()

# Set the axes through the origin


for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.grid()
return fig, ax

fig, ax = subplots() # Call the local version, not plt.subplots()


x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()

The custom subplots function

1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code

7.5 Further Reading

โ€ข The Matplotlib gallery provides many examples


โ€ข A nice Matplotlib tutorial by Nicolas Rougier, Mike Muller and Gael Varoquaux

โ€ข mpltools allows easy switching between plot styles


โ€ข Seaborn facilitates common statistics plots in Matplotlib

7.6 Exercises

7.6.1 Exercise 1

Plot the function

f(x) = cos(πθx) exp(−x)

over the interval [0, 5] for each θ in np.linspace(0, 2, 10)


Place all the curves in the same figure
The output should look like this

7.7 Solutions

7.7.1 Exercise 1

Hereโ€™s one solution

In [11]: θ_vals = np.linspace(0, 2, 10)

x = np.linspace(0, 5, 200)
fig, ax = plt.subplots()

for θ in θ_vals:
    ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))

plt.show()
8 SciPy

8.1 Contents

โ€ข SciPy versus NumPy 8.2

โ€ข Statistics 8.3

โ€ข Roots and Fixed Points 8.4

โ€ข Optimization 8.5

โ€ข Integration 8.6

โ€ข Linear Algebra 8.7

โ€ข Exercises 8.8

โ€ข Solutions 8.9

SciPy builds on top of NumPy to provide common tools for scientific programming such as

โ€ข linear algebra
โ€ข numerical integration
โ€ข interpolation
โ€ข optimization
โ€ข distributions and random number generation
โ€ข signal processing
โ€ข etc., etc

Like NumPy, SciPy is stable, mature and widely used


Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as
LAPACK, BLAS, etc.
Itโ€™s not really necessary to โ€œlearnโ€ SciPy as a whole
A more common approach is to get some idea of what's in the library and then look up documentation as required
In this lecture, we aim only to highlight some useful parts of the package


8.2 SciPy versus NumPy

SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file

In [1]: # Import numpy symbols to scipy namespace


import numpy as _num
linalg = None
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']

del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')

However, itโ€™s more common and better practice to use NumPy functionality explicitly

In [2]: import numpy as np

a = np.identity(3)

What is useful in SciPy is the functionality in its sub-packages

โ€ข scipy.optimize, scipy.integrate, scipy.stats, etc.

These sub-packages and their attributes need to be imported separately

In [3]: from scipy.integrate import quad


from scipy.optimize import brentq
# etc

Letโ€™s explore some of the major sub-packages

8.3 Statistics

The scipy.stats subpackage supplies

• numerous random variable objects (densities, cumulative distributions, random sampling, etc.)
โ€ข some estimation procedures
โ€ข some statistical tests

8.3.1 Random Variables and Distributions

Recall that numpy.random provides functions for generating random variables

In [4]: np.random.beta(5, 5, size=3)

Out[4]: array([0.46025917, 0.2775525 , 0.25400856])

This generates a draw from the distribution below when a, b = 5, 5

๐‘ฅ(๐‘Žโˆ’1) (1 โˆ’ ๐‘ฅ)(๐‘โˆ’1)
๐‘“(๐‘ฅ; ๐‘Ž, ๐‘) = 1
(0 โ‰ค ๐‘ฅ โ‰ค 1) (1)
โˆซ0 ๐‘ข(๐‘Žโˆ’1) (1 โˆ’ ๐‘ข)(๐‘โˆ’1) ๐‘‘๐‘ข

Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface
Hereโ€™s an example of usage

In [5]: from scipy.stats import beta


import matplotlib.pyplot as plt
%matplotlib inline

q = beta(5, 5) # Beta(a, b), with a = b = 5


obs = q.rvs(2000) # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots(figsize=(10, 6))


ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()

In this code, we created a so-called rv_frozen object, via the call q = beta(5, 5)

The โ€œfrozenโ€ part of the notation implies that q represents a particular distribution with a
particular set of parameters
Once weโ€™ve done so, we can then generate random numbers, evaluate the density, etc., all
from this fixed distribution

In [6]: q.cdf(0.4) # Cumulative distribution function

Out[6]: 0.26656768000000003

In [7]: q.pdf(0.4) # Density function

Out[7]: 2.0901888000000013

In [8]: q.ppf(0.8) # Quantile (inverse cdf) function

Out[8]: 0.6339134834642708

In [9]: q.mean()

Out[9]: 0.5

The general syntax for creating these objects is

identifier = scipy.stats.distribution_name(shape_parameters)

where distribution_name is one of the distribution names in scipy.stats


There are also two keyword arguments, loc and scale, which following our example above,
are called as

identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)

These transform the original random variable X into Y = c + dX


The methods rvs, pdf, cdf, etc. are transformed accordingly
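As a quick check of the loc and scale logic, here is a small sketch using the Beta distribution from above: with loc=c and scale=d, the mean and cdf transform exactly as Y = c + dX predicts

```python
from scipy.stats import beta

c, d = 2.0, 3.0
q = beta(5, 5, loc=c, scale=d)      # Y = c + d*X with X ~ Beta(5, 5)

# The mean shifts and scales accordingly: E[Y] = c + d * E[X] = 2 + 3 * 0.5
print(q.mean())                                    # 3.5

# The cdf transforms the same way: P{Y <= y} = P{X <= (y - c) / d}
print(q.cdf(3.5), beta(5, 5).cdf((3.5 - c) / d))   # 0.5 0.5
```
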
Before finishing this section, we note that there is an alternative way of calling the methods
described above
For example, the previous code can be replaced by

In [10]: obs = beta.rvs(5, 5, size=2000)


grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()

8.3.2 Other Goodies in scipy.stats

There are a variety of statistical functions in scipy.stats


For example, scipy.stats.linregress implements simple linear regression

In [11]: from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept

Out[11]: (2.0015196606243273, 0.009718239356687364)

To see the full list, consult the documentation

8.4 Roots and Fixed Points

A root of a real function f on [a, b] is an x ∈ [a, b] such that f(x) = 0


For example, if we plot the function

f(x) = sin(4(x − 1/4)) + x + x^{20} − 1        (2)

with x ∈ [0, 1] we get

In [12]: f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1


x = np.linspace(0, 1, 100)

plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()

The unique root is approximately 0.408


Letโ€™s consider some numerical techniques for finding roots

8.4.1 Bisection

One of the most common algorithms for numerical root-finding is bisection


To understand the idea, recall the well-known game where

โ€ข Player A thinks of a secret number between 1 and 100

โ€ข Player B asks if itโ€™s less than 50

โ€“ If yes, B asks if itโ€™s less than 25


โ€“ If no, B asks if itโ€™s less than 75

And so on
This is bisection
Hereโ€™s a fairly simplistic implementation of the algorithm in Python
It works for all sufficiently well behaved increasing continuous functions with f(a) < 0 < f(b)

In [13]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b

while upper - lower > tol:


middle = 0.5 * (upper + lower)
# === if root is between lower and middle === #
if f(middle) > 0:
lower, upper = lower, middle
# === if root is between middle and upper === #
else:
lower, upper = middle, upper

return 0.5 * (upper + lower)

In fact, SciPy provides its own bisection function, which we now test using the function f defined in Eq. (2)

In [14]: from scipy.optimize import bisect

bisect(f, 0, 1)

Out[14]: 0.4082935042806639

8.4.2 The Newton-Raphson Method

Another very common root-finding algorithm is the Newton-Raphson method


In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:

• When the function is well-behaved, the Newton-Raphson method is faster than bisection
โ€ข When the function is less well-behaved, the Newton-Raphson might fail

Let's investigate this using the same function f, first looking at potential instability

In [15]: from scipy.optimize import newton

newton(f, 0.2) # Start the search at initial condition x = 0.2

Out[15]: 0.40829350427935673

In [16]: newton(f, 0.7) # Start the search at x = 0.7 instead

Out[16]: 0.7001700000000279

The second initial condition leads to failure of convergence


On the other hand, using IPythonโ€™s timeit magic, we see that newton can be much faster

In [17]: %timeit bisect(f, 0, 1)



62.4 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [18]: %timeit newton(f, 0.2)

149 µs ± 5.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

8.4.3 Hybrid Methods

So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm is robust but relatively slow
This illustrates a general principle

โ€ข If you have specific knowledge about your function, you might be able to exploit it to
generate efficiency
โ€ข If not, then the algorithm choice involves a trade-off between the speed of convergence
and robustness

In practice, most default algorithms for root-finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following manner:

1. Attempt to use a fast method


2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm

In scipy.optimize, the function brentq is such a hybrid method and a good default

In [19]: brentq(f, 0, 1)

Out[19]: 0.40829350427936706

In [20]: %timeit brentq(f, 0, 1)

15.6 µs ± 840 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Here the correct solution is found and the speed is almost the same as newton

8.4.4 Multivariate Root-Finding

Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK


See the documentation for details

8.4.5 Fixed Points

SciPy has a function for finding (scalar) fixed points too

In [21]: from scipy.optimize import fixed_point

fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess

Out[21]: array(1.)

If you don't get good results, you can always switch back to the brentq root finder, since
the fixed point of a function f is the root of g(x) := x − f(x)
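For instance, here is a small sketch using the illustrative map f(x) = cos(x), which has a unique fixed point on [0, 1]

```python
import numpy as np
from scipy.optimize import brentq, fixed_point

f = np.cos                               # the fixed point satisfies x = cos(x)

# The fixed point of f is the root of g(x) = x - f(x)
x_star = brentq(lambda x: x - f(x), 0, 1)
print(x_star)                            # approximately 0.739085

# This agrees with SciPy's dedicated routine
x_fp = fixed_point(f, 0.5)
print(x_fp)
```
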

8.5 Optimization

Most numerical packages provide only functions for minimization


Maximization can be performed by recalling that the maximizer of a function f on domain D
is the minimizer of −f on D
Minimization is closely related to root-finding: for smooth functions, interior optima correspond to roots of the first derivative
The speed/robustness trade-off described above is present with numerical optimization too
Unless you have some prior information you can exploit, it's usually best to use hybrid methods
For constrained, univariate (i.e., scalar) minimization, a good hybrid option is fminbound

In [22]: from scipy.optimize import fminbound

fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]

Out[22]: 0.0
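To maximize instead, we can apply the negation trick described above. Here is a sketch with an illustrative concave function (not taken from the lecture)

```python
from scipy.optimize import fminbound

f = lambda x: -(x - 1.5)**2 + 2          # concave, with maximum at x = 1.5

# Maximize f on [-1, 2] by minimizing -f on the same interval
x_max = fminbound(lambda x: -f(x), -1, 2)
print(x_max)                             # approximately 1.5
```
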

8.5.1 Multivariate Optimization

Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg, fmin_bfgs, and fmin_ncg
Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc, and fmin_cobyla
See the documentation for details

8.6 Integration

Most numerical integration methods work by computing the integral of an approximating polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn depends on how “regular” the integrand is

In SciPy, the relevant module for numerical integration is scipy.integrate


A good default for univariate integration is quad

In [23]: from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)


integral

Out[23]: 0.33333333333333337

In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK
For finite-interval problems it uses adaptive Gauss-Kronrod quadrature, subdividing the interval and extrapolating to estimate the error
There are other options for univariate integrationโ€”a useful one is fixed_quad, which is fast
and hence works well inside for loops
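As a sketch, fixed_quad applies a Gaussian rule of fixed order n and returns a (value, None) pair, with no adaptive error estimate

```python
from scipy.integrate import fixed_quad

# Order-5 Gaussian quadrature is exact for this low-degree polynomial
val, _ = fixed_quad(lambda x: x**2, 0, 1, n=5)
print(val)                               # approximately 1/3
```
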
There are also functions for multivariate integration
See the documentation for more details

8.7 Linear Algebra

We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines

8.8 Exercises

8.8.1 Exercise 1

Previously we discussed the concept of recursive function calls


Write a recursive implementation of the bisection function described above, which we repeat
here for convenience

In [24]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b

while upper - lower > tol:


middle = 0.5 * (upper + lower)
# === if root is between lower and middle === #
if f(middle) > 0:
lower, upper = lower, middle
# === if root is between middle and upper === #
else:
lower, upper = middle, upper

return 0.5 * (upper + lower)



Test it on the function f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1 discussed above

8.9 Solutions

8.9.1 Exercise 1

Hereโ€™s a reasonable solution:

In [25]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root-finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b
if upper - lower < tol:
return 0.5 * (upper + lower)
else:
middle = 0.5 * (upper + lower)
print(f'Current mid point = {middle}')
if f(middle) > 0: # Implies root is between lower and middle
return bisect(f, lower, middle)
else: # Implies root is between middle and upper
return bisect(f, middle, upper)

We can test it as follows

In [26]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1


bisect(f, 0, 1)

Current mid point = 0.5


Current mid point = 0.25
Current mid point = 0.375
Current mid point = 0.4375
Current mid point = 0.40625
Current mid point = 0.421875
Current mid point = 0.4140625
Current mid point = 0.41015625
Current mid point = 0.408203125
Current mid point = 0.4091796875
Current mid point = 0.40869140625
Current mid point = 0.408447265625
Current mid point = 0.4083251953125
Current mid point = 0.40826416015625

Out[26]: 0.408294677734375
9 Numba

9.1 Contents

โ€ข Overview 9.2

โ€ข Where are the Bottlenecks? 9.3

โ€ข Vectorization 9.4

โ€ข Numba 9.5

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

9.2 Overview

In our lecture on NumPy, we learned one method to improve speed and efficiency in numerical work
That method, called vectorization, involved sending array processing operations in batch to
efficient low-level code
This clever idea dates back to MATLAB, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory-intensive
Another problem is that only some algorithms can be vectorized
In the last few years, a new Python library called Numba has appeared that solves many of
these problems
It does so through something called just in time (JIT) compilation
JIT compilation is effective in many numerical settings and can generate extremely fast, efficient code
It can also do other tricks such as facilitate multithreading (a form of parallelization well
suited to numerical work)


9.2.1 The Need for Speed

To understand what Numba does and why, we need some background knowledge
Letโ€™s start by thinking about higher-level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment

โ€ข specifying variable types


โ€ข memory allocation/deallocation, etc.

The upside is that, compared to low-level languages, Python is typically faster to write, less
error-prone and easier to debug
The downside is that Python is harder to optimize โ€” that is, turn into fast machine code โ€”
than languages like C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high-speed languages for the great majority of scientific computing tasks
This is because

1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a combination of NumPy and Numba

This lecture provides a guide

9.3 Where are the Bottlenecks?

Let's start by trying to understand why high-level languages like Python are slower than compiled code

9.3.1 Dynamic Typing

Consider this Python operation

In [2]: a, b = 10, 10
a + b

Out[2]: 20

Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation

In [3]: a, b = 'foo', 'bar'


a + b

Out[3]: 'foobar'

If a and b are lists, then a + b requires list concatenation

In [4]: a, b = ['foo'], ['bar']


a + b

Out[4]: ['foo', 'bar']

(We say that the operator + is overloaded โ€” its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}

The variables i and sum are explicitly declared to be integers


Hence, the meaning of addition here is completely unambiguous

9.3.2 Data Access

Another drag on speed for high-level languages is data access


To illustrate, letโ€™s consider the problem of summing some data โ€” say, a collection of integers
Summing with Compiled Code
In C or Fortran, these integers would typically be stored in an array, which is a simple data
structure for storing homogeneous data
Such an array is stored in a single contiguous block of memory

โ€ข In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)

• For example, a 64 bit integer is stored in 8 bytes of memory

• An array of n such integers occupies 8n consecutive memory slots

Moreover, the compiler is made aware of the data type by the programmer

โ€ข In this case 64 bit integers

Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount

โ€ข In this case 8 bytes
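We can inspect these facts directly from NumPy, whose arrays follow the same contiguous layout

```python
import numpy as np

a = np.arange(5, dtype=np.int64)

print(a.itemsize)    # 8 -- each 64 bit integer occupies 8 bytes
print(a.nbytes)      # 40 -- n = 5 integers occupy one block of 8n bytes
print(a.strides)     # (8,) -- step 8 bytes forward to reach the next element
```
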

Summing in Pure Python


Python tries to replicate these ideas to some degree
For example, in the standard Python implementation (CPython), list elements are placed in
memory locations that are in a sense contiguous
However, these list elements are more like pointers to data rather than actual data
Hence, there is still overhead involved in accessing the data values themselves
This is a considerable drag on speed
In fact, it's generally true that memory traffic is a major culprit when it comes to slow execution
Letโ€™s look at some ways around these problems
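A rough timing sketch (machine-dependent numbers) makes this overhead visible: summing the same integers from a Python list and from a NumPy array

```python
import timeit

import numpy as np

n = 1_000_000
python_list = list(range(n))     # elements are boxed Python ints (pointers to data)
numpy_array = np.arange(n)       # elements are raw 64 bit integers, stored contiguously

t_list = timeit.timeit(lambda: sum(python_list), number=10)
t_array = timeit.timeit(lambda: numpy_array.sum(), number=10)

# The array sum avoids the pointer-chasing and is typically far faster
print(f"list: {t_list:.4f}s   array: {t_array:.4f}s")
```
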

9.4 Vectorization

Vectorization is about sending batches of related operations to native machine code

โ€ข The machine code itself is typically compiled from carefully optimized C or Fortran

This can greatly accelerate many (but not all) numerical computations

9.4.1 Operations on Arrays

First, let's run some imports

In [5]: import random


import numpy as np
import quantecon as qe

Now let's try this non-vectorized code

In [6]: qe.util.tic() # Start timing

n = 100_000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2
qe.util.toc() # End timing

TOC: Elapsed: 0:00:0.04

Out[6]: 0.04178762435913086

Now compare this vectorized code

In [7]: qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[7]: 0.0038301944732666016

The second code block — which achieves the same thing as the first — runs much faster
The reason is that in the second implementation we have broken the loop down into three
basic operations

1. draw n uniforms
2. square them
3. sum them

These are sent as batch operations to optimized machine code


Apart from minor overheads associated with sending data back and forth, the result is C or
Fortran-like speed
When we run batch operations on arrays like this, we say that the code is vectorized
Vectorized code is typically fast and efficient
It is also surprisingly flexible, in the sense that many operations can be vectorized
The next section illustrates this point

9.4.2 Universal Functions

Many functions provided by NumPy are so-called universal functions — also called ufuncs
This means that they

• map scalars into scalars, as expected


• map arrays into arrays, acting element-wise

For example, np.cos is a ufunc:

In [8]: np.cos(1.0)

Out[8]: 0.5403023058681398

In [9]: np.cos(np.linspace(0, 1, 3))

Out[9]: array([1. , 0.87758256, 0.54030231])

By exploiting ufuncs, many operations can be vectorized


For example, consider the problem of maximizing a function f of two variables (x, y) over the
square [-a, a] × [-a, a]
For f and a let's choose

    f(x, y) = cos(x² + y²) / (1 + x² + y²)   and   a = 3

Here's a plot of f

In [10]: import matplotlib.pyplot as plt


%matplotlib inline
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)


ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
y,
f(x, y),
rstride=2, cstride=2,
cmap=cm.jet,
alpha=0.7,
linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()

To maximize it, we're going to use a naive grid search:

1. Evaluate f for all (x, y) in a grid on the square


2. Return the maximum of observed values

Here's a non-vectorized version that uses Python loops

In [11]: def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)
m = -np.inf

qe.tic()
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
qe.toc()

TOC: Elapsed: 0:00:2.74

Out[11]: 2.7486989498138428

And here's a vectorized version

In [12]: def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)


x, y = np.meshgrid(grid, grid)

qe.tic()
np.max(f(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.02

Out[12]: 0.02516627311706543

In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(We'll make it even faster again below when we discuss Numba)

9.4.3 Pros and Cons of Vectorization

At its best, vectorization yields fast, simple code


However, it's not without disadvantages
One issue is that it can be highly memory-intensive
For example, the vectorized maximization routine above is far more memory intensive than
the non-vectorized version that preceded it
Another issue is that not all algorithms can be vectorized
In these kinds of settings, we need to go back to loops
Fortunately, there are nice ways to speed up Python loops

9.5 Numba

One exciting development in this direction is Numba


Numba aims to automatically compile functions to native machine code instructions on the
fly
The process isn't flawless, since Numba needs to infer type information on all variables to
generate pure machine instructions
Such inference isn't possible in every setting
But for simple routines, Numba infers types very well
Moreover, the "hot loops" at the heart of our code that we need to speed up are often such
simple routines

9.5.1 Prerequisites

If you followed our set up instructions, then Numba should be installed


Make sure you have the latest version of Anaconda by running conda update anaconda
from a terminal (Mac, Linux) / Anaconda command prompt (Windows)

9.5.2 An Example

Let's consider some problems that are difficult to vectorize


One is generating the trajectory of a difference equation given an initial condition
Let's take the difference equation to be the quadratic map

    x_{t+1} = 4 x_t (1 - x_t)

Here's the plot of a typical trajectory, starting from x_0 = 0.1, with t on the x-axis

In [13]: def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

x = qm(0.1, 250)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()

Speeding this up with Numba's jit function is trivial

In [14]: from numba import jit

qm_numba = jit(qm) # qm_numba is now a 'compiled' version of qm

Let's time and compare identical function calls across these two versions:

In [15]: qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()

TOC: Elapsed: 0:00:0.06

In [16]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.11

The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:

In [17]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.00

In [18]: time1 / time2 # Calculate speed gain

Out[18]: 174.51294400963275

That's a speed increase of two orders of magnitude!


Your mileage will of course vary depending on hardware and so on
Nonetheless, two orders of magnitude is huge relative to how simple and clear the implementation is
Decorator Notation
If you don't need a separate name for the "numbafied" version of qm, you can just put @jit
before the function

In [19]: @jit
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

This is equivalent to qm = jit(qm)

9.5.3 How and When it Works

Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions

In an ideal setting, Numba can infer all necessary type information


This allows it to generate native machine code, without having to call the Python runtime
environment
In such a setting, Numba will be on par with machine code from low-level languages
When Numba cannot infer all type information, some Python objects are given generic object
status, and some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains — or none at all
Hence, it's prudent when using Numba to focus on speeding up small, time-critical snippets of
code
This will give you much better performance than blanketing your Python programs with
@jit statements
A Gotcha: Global Variables
Consider the following example

In [20]: a = 1

@jit
def add_x(x):
    return a + x

print(add_x(10))

11

In [21]: a = 2

print(add_x(10))

11

Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability

9.5.4 Numba for Vectorization

Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximization
problem discussed above

In [22]: from numba import vectorize

@vectorize
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)


x, y = np.meshgrid(grid, grid)

np.max(f_vec(x, y)) # Run once to compile

qe.tic()
np.max(f_vec(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.03

Out[22]: 0.030055522918701172

This is faster than our vectorized version using NumPyโ€™s ufuncs


Why should that be? After all, anything vectorized with NumPy will be running in fast C or
Fortran code
The reason is that it's much less memory-intensive
For example, when NumPy computes np.cos(x**2 + y**2) it first creates the intermediate
arrays x**2 and y**2, then it creates the array np.cos(x**2 + y**2)
In our @vectorize version using Numba, the entire operation is reduced to a single vectorized
process and none of these intermediate arrays are created
We can gain further speed improvements using Numba's automatic parallelization feature by
specifying target='parallel'
In this case, we need to specify the types of our inputs and outputs

In [23]: @vectorize('float64(float64, float64)', target='parallel')
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

np.max(f_vec(x, y)) # Run once to compile

qe.tic()
np.max(f_vec(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.02

Out[23]: 0.023700714111328125

This is a striking speed up with very little effort


10

Other Scientific Libraries

10.1 Contents

• Overview 10.2

• Cython 10.3

• Joblib 10.4

• Other Options 10.5

• Exercises 10.6

• Solutions 10.7

In addition to what's in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

10.2 Overview

In this lecture, we review some other scientific libraries that are useful for economic research
and analysis
We have, however, already picked most of the low-hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass

10.3 Cython

Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you'll recall, Numba solves this problem (where possible) by inferring type
Cython's approach is different — programmers add type definitions directly to their "Python"
code


As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code
Cython also takes care of building language extensions โ€” the wrapper code that interfaces
between the resulting compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook

• See the book Cython by Kurt Smith or the online documentation

10.3.1 A First Example

Let's start with a rather artificial example


๐‘›
Suppose that we want to compute the sum โˆ‘๐‘–=0 ๐›ผ๐‘– for given ๐›ผ, ๐‘›
Suppose further that weโ€™ve forgotten the basic formula

๐‘›
1 โˆ’ ๐›ผ๐‘›+1
โˆ‘ ๐›ผ๐‘– =
๐‘–=0
1โˆ’๐›ผ

for a geometric progression and hence have resolved to rely on a loop


Python vs C
Here's a pure Python function that does the job

In [2]: def geo_prog(alpha, n):
    current = 1.0
    sum = current
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum
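As a quick sanity check, the loop agrees with the closed-form formula for the geometric sum (a small aside, restating geo_prog here so the snippet is self-contained):

```python
def geo_prog(alpha, n):
    # Sum alpha**i for i = 0, ..., n with an explicit loop
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

alpha, n = 0.5, 10
closed_form = (1 - alpha**(n + 1)) / (1 - alpha)
print(abs(geo_prog(alpha, n) - closed_form) < 1e-12)  # True
```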

This works fine but for large n it is slow


Here's a C function that will do the same thing

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}

If you're not familiar with C, the main thing you should take notice of is the type definitions

• int means integer


• double means double precision floating-point number
• the double in double geo_prog(... indicates that the function will return a double

Not surprisingly, the C code is faster than the Python code


A Cython Implementation
Cython implementations look like a convex combination of Python and C
We're going to run our Cython code in the Jupyter notebook, so we'll start by loading the
Cython extension in a notebook cell

In [3]: %load_ext Cython

In the next cell, we execute the following

In [4]: %%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

Here cdef is a Cython keyword indicating a variable declaration and is followed by a type
The %%cython line at the top is not actually Cython code — it's a Jupyter cell magic indicating
the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within
Python
What you are in fact calling is compiled C code with a Python call interface

In [5]: import quantecon as qe


qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.08

Out[5]: 0.0884397029876709

In [6]: qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[6]: 0.03421354293823242

10.3.2 Example 2: Cython with NumPy Arrays

Let's go back to the first problem that we worked with: generating the iterates of the
quadratic map

    x_{t+1} = 4 x_t (1 - x_t)

The problem of computing iterates and returning a time series requires us to work with
arrays
The natural array type to work with is NumPy arrays
Here's a Cython implementation that initializes, populates and returns a NumPy array

In [7]: %%cython
import numpy as np

def qm_cython_first_pass(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

If you run this code and time it, you will see that its performance is disappointing — nothing
like the speed gain we got from Numba

In [8]: qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[8]: 0.03150629997253418

This example was also computed in the Numba lecture, and you can see Numba is around 90
times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cython's typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here's an example:

In [9]: %%cython
import numpy as np
from numpy cimport float_t

def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)
    cdef float_t [:] x = x_np_array
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

Here

• cimport pulls in some compile-time information from NumPy


• cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array
x_np_array
• the return statement uses np.asarray(x) to convert the memoryview back to a
NumPy array

Let's time it:

In [10]: qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[10]: 0.0006136894226074219

This is fast, although still slightly slower than qm_numba

10.3.3 Summary

Cython requires more expertise than Numba, and is a little more fiddly in terms of getting
good performance
In fact, it's surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,

• Cython is a very mature, stable and widely used tool


• Cython can be more useful than Numba when working with larger, more sophisticated
applications

10.4 Joblib

Joblib is a popular Python library for caching and parallelization


To install it, start Jupyter and type

In [11]: !pip install joblib

Requirement already satisfied: joblib in /home/anju/anaconda3/lib/python3.7/site-packages (0.13.2)

from within a notebook


Here we review just the basics

10.4.1 Caching

Perhaps, like us, you sometimes run a long computation that simulates a model at a given set
of parameters — to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all
again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to
you when you repeat the calculation

10.4.2 An Example

Let's look at a toy example, related to the quadratic map model discussed above
Let's say we want to generate a long trajectory from a certain initial condition x_0 and see
what fraction of the sample is below 0.1
(We'll omit JIT compilation or other speedups for simplicity)
Here's our code

In [12]: from joblib import Memory

location = './joblib_cache'
memory = Memory(location=location)

@memory.cache
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return np.mean(x < 0.1)

We are using joblib to cache the result of calling qm at a given set of parameters
With the argument location='./joblib_cache', any call to this function results in both the
input values and output values being stored in a subdirectory joblib_cache of the present working
directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output
that notes information being cached

In [13]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

________________________________________________________________________________
[Memory] Calling __main__--home-anju-Desktop-lecture-source-py-_build-jupyter-executed-__ipython-input__.qm…
qm(0.2, 10000000)
_______________________________________________________________qm - 8.9s, 0.1min
TOC: Elapsed: 0:00:8.85

Out[13]: 8.85545039176941

The next time we call the function with the same set of parameters, the result is returned
almost instantaneously

In [14]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[14]: 0.0007827281951904297

10.5 Other Options

There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try

10.6 Exercises

10.6.1 Exercise 1

Later we'll learn all about finite-state Markov chains


For now, let's just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low
The transition probabilities across states are as follows

For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be

• high with probability 0.8


• low with probability 0.2
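Since the transition graph itself is not reproduced here, the same information can be written as a matrix (rows are the current state, columns the next state, ordering low then high; these numbers match the p, q used in the solution below):

```python
import numpy as np

# Transition matrix for the volatility chain
# rows: current state, columns: next state; ordering: [low, high]
P = np.array([[0.9, 0.1],    # from low:  stay with prob 0.9, leave with 0.1
              [0.2, 0.8]])   # from high: leave with prob 0.2, stay with 0.8

print(P.sum(axis=1))  # each row is a probability distribution
```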

Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare
speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3

10.7 Solutions

10.7.1 Exercise 1

We let

• 0 represent "low"
• 1 represent "high"

In [15]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively

Here's a pure Python version of the function

In [16]: def compute_series(n):
    x = np.empty(n, dtype=int)
    x[0] = 1 # Start in state 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x

Let's run this code and check that the fraction of time spent in the low state is about 0.666

In [17]: n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0

0.6629
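The printed fraction is close to the theoretical value: for a two-state chain with probability p of leaving the low state and q of leaving the high state, the stationary probability of the low state is q/(p + q) (a standard result, stated here as a check):

```python
p, q = 0.1, 0.2  # probabilities of leaving the low and high states

# Stationary probability of the low state for this two-state chain
pi_low = q / (p + q)
print(pi_low)  # about 2/3
```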

Now let's time it

In [18]: qe.util.tic()
compute_series(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.07

Out[18]: 0.0751335620880127

Next let's implement a Numba version, which is easy

In [19]: from numba import jit

compute_series_numba = jit(compute_series)

Let's check we still get the right numbers

In [20]: x = compute_series_numba(n)
print(np.mean(x == 0))

0.66566

Let's see the time

In [21]: qe.util.tic()
compute_series_numba(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[21]: 0.0015265941619873047

This is a nice speed improvement for one line of code


Now let's implement a Cython version

In [22]: %load_ext Cython

The Cython extension is already loaded. To reload it, use:


%reload_ext Cython

In [23]: %%cython
import numpy as np
from numpy cimport int_t, float_t

def compute_series_cy(int n):
    # == Create NumPy arrays first == #
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    # == Now create memoryviews of the arrays == #
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    # == Other variable declarations == #
    cdef float p = 0.1
    cdef float q = 0.2
    cdef int t
    # == Main loop == #
    x[0] = 1
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x)

In [24]: compute_series_cy(10)

Out[24]: array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])

In [25]: x = compute_series_cy(n)
print(np.mean(x == 0))

0.66746

In [26]: qe.util.tic()
compute_series_cy(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[26]: 0.0033597946166992188

The Cython implementation is fast but not as fast as Numba


Part III

Advanced Python Programming

11

Writing Good Code

11.1 Contents

• Overview 11.2

• An Example of Bad Code 11.3

• Good Coding Practice 11.4

• Revisiting the Example 11.5

• Summary 11.6

11.2 Overview

When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing — such as just in time compilation
— and how they affect good program design

11.3 An Example of Bad Code

Let's have a look at some poorly written code


The job of the code is to generate and plot time series of the simplified Solow model

๐‘˜๐‘ก+1 = ๐‘ ๐‘˜๐‘ก๐›ผ + (1 โˆ’ ๐›ฟ)๐‘˜๐‘ก , ๐‘ก = 0, 1, 2, โ€ฆ (1)

Here


โ€ข ๐‘˜๐‘ก is capital at time ๐‘ก and


โ€ข ๐‘ , ๐›ผ, ๐›ฟ are parameters (savings, a productivity parameter and depreciation)

For each parameterization, the code

1. sets k_0 = 1
2. iterates using Eq. (1) to produce a sequence k_0, k_1, k_2, …, k_T
3. plots the sequence

The plots will be grouped into three subfigures


In each subfigure, two parameters are held fixed while another varies

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

# Allocate memory for time series


k = np.empty(50)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

# Trajectories with different α

δ = 0.1
s = 0.4
α = (0.25, 0.33, 0.45)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
    axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")

axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True, fontsize=14)

# Trajectories with different s

δ = 0.1
α = 0.33
s = (0.3, 0.4, 0.5)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
    axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")

axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True, fontsize=14)

# Trajectories with different δ

δ = (0.05, 0.1, 0.15)
α = 0.33
s = 0.4

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
    axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")

axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True, fontsize=14)

plt.show()

True, the code more or less follows PEP8


At the same time, it's very poorly structured
Let's talk about why that's the case, and what we can do about it

11.4 Good Coding Practice

There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesn't matter too much
But if you are ambitious and want to produce useful things, you'll write medium to large programs
too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts

11.4.1 Don't Use Magic Numbers

If you look at the code above, you'll see numbers like 50 and 49 and 3 scattered through the
code
These kinds of numeric literals in the body of your code are sometimes called "magic numbers"
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should certainly
be replaced by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:

• the meaning is much clearer throughout


• to alter the time series length, you only need to change one value
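As a sketch of the idea (the constant name is hypothetical, not from the original program):

```python
import numpy as np

# A named constant replaces the magic numbers 50 and 49
TIME_SERIES_LENGTH = 50
s, α, δ = 0.4, 0.33, 0.1

k = np.empty(TIME_SERIES_LENGTH)
k[0] = 1
for t in range(TIME_SERIES_LENGTH - 1):
    k[t+1] = s * k[t]**α + (1 - δ) * k[t]
```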

11.4.2 Don't Repeat Yourself

The other mortal sin in the code snippet above is repetition


Blocks of logic (such as the loop to generate time series) are repeated with only minor
changes
This violates a fundamental tenet of programming: Don't repeat yourself (DRY)

• Also called DIE (duplication is evil)

Yes, we realize that you can just cut and paste and change a few symbols
But as a programmer, your aim should be to automate repetition, not do it yourself
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong
If you want to know more, read the excellent summary found on this page
We'll talk about how to avoid repetition below

11.4.3 Minimize Global Variables

Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient
Rookie programmers typically use global variables with abandon — as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since

• they can affect what happens in any part of your program


• they can be changed by any function

This makes it much harder to be certain about what some small part of a given piece of code
actually commands
Here's a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them
(We'll discuss how just below)
JIT Compilation
In fact, there's now another good reason to avoid global variables
In scientific computing, we're witnessing the rapid growth of just in time (JIT) compilation
JIT compilation can generate excellent performance for scripting languages like Python and
Julia
But the task of the compiler used for JIT compilation becomes much harder when many
global variables are present
(This is because data type instability hinders the generation of efficient machine code — we'll
learn more about such topics later on)

11.4.4 Use Functions or Classes

Fortunately, we can easily avoid the evils of global variables and WET code

• WET stands for "we love typing" and is the opposite of DRY

We can do this by making frequent use of functions or classes


In fact, functions and classes are designed specifically to help us avoid shaming ourselves by
repeating code or excessive use of global variables
Which One, Functions or Classes?
Both can be useful, and in fact they work well with each other
We'll learn more about these topics over time
(Personal preference is part of the story too)
What's really important is that you use one or the other or both

11.5 Revisiting the Example

Here's some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that

• global variables are quarantined by collecting them together at the end, not the start of the
program
• magic numbers are avoided
• the loop at the end where the actual work is done is short and relatively simple

In [2]: from itertools import product

def plot_path(ax, αs, s_vals, δs, series_length=50):
    """
    Add a time series plot to the axes ax for all given parameters.
    """
    k = np.empty(series_length)

    for (α, s, δ) in product(αs, s_vals, δs):
        k[0] = 1
        for t in range(series_length-1):
            k[t+1] = s * k[t]**α + (1 - δ) * k[t]
        ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")

    ax.grid(lw=0.2)
    ax.set_xlabel('time')
    ax.set_ylabel('capital')
    ax.set_ylim(0, 18)
    ax.legend(loc='upper left', frameon=True, fontsize=14)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

# Parameters (αs, s_vals, δs)
set_one = ([0.25, 0.33, 0.45], [0.4], [0.1])
set_two = ([0.33], [0.3, 0.4, 0.5], [0.1])
set_three = ([0.33], [0.4], [0.05, 0.1, 0.15])

for (ax, params) in zip(axes, (set_one, set_two, set_three)):
    αs, s_vals, δs = params
    plot_path(ax, αs, s_vals, δs)

plt.show()

11.6 Summary

Writing decent code isn't hard


It's also fun and intellectually satisfying
We recommend that you cultivate good habits and style even when you write relatively short
programs
12

OOP II: Building Classes

12.1 Contents

• Overview 12.2

• OOP Review 12.3

• Defining Your Own Classes 12.4

• Special Methods 12.5

• Exercises 12.6

• Solutions 12.7

12.2 Overview

In an earlier lecture, we learned some foundations of object-oriented programming


The objectives of this lecture are

• cover OOP in more depth


• learn how to build our own objects, specialized to our needs

For example, you already know how to

• create lists, strings and other Python objects


• use their methods to modify their contents

So imagine now you want to write a program with consumers, who can

• hold and spend cash


• consume goods
• work and earn cash

A natural solution in Python would be to create consumers as objects with


• data, such as cash on hand


• methods, such as buy or work that affect this data

Python makes it easy to do this, by providing you with class definitions


Classes are blueprints that help you build objects according to your own specifications
It takes a little while to get used to the syntax so we'll provide plenty of examples

12.3 OOP Review

OOP is supported in many languages:

โ€ข JAVA and Ruby are relatively pure OOP


โ€ข Python supports both procedural and object-oriented programming
โ€ข Fortran and MATLAB are mainly procedural, some OOP recently tacked on
โ€ข C is a procedural language, while C++ is C with OOP added on top

Letโ€™s cover general OOP concepts before we specialize to Python

12.3.1 Key Concepts

As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into “objects”
An example is a Python list, which not only stores data but also knows how to sort itself, etc.

In [1]: x = [1, 5, 4]
x.sort()
x

Out[1]: [1, 4, 5]

As we now know, sort is a function that is โ€œpart ofโ€ the list object โ€” and hence called a
method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers)
It describes

โ€ข What kind of data the class stores


โ€ข What methods it has for acting on these data

An object or instance is a realization of the class, created from the blueprint

โ€ข Each instance has its own unique data


โ€ข Methods set out in the class definition act on this (and other) data

In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via โ€œdotted attribute notationโ€

โ€ข object_name.data
โ€ข object_name.method_name()

In the example

In [2]: x = [1, 5, 4]
x.sort()
x.__class__

Out[2]: list

โ€ข x is an object or instance, created from the definition for Python lists, but with its own
particular data
โ€ข x.sort() and x.__class__ are two attributes of x
โ€ข dir(x) can be used to view all the attributes of x

12.3.2 Why is OOP Useful?

OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure
For example,

โ€ข a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
โ€ข a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
โ€ข a game consists of a list of players, lists of actions available to each player, player pay-
offs as functions of all playersโ€™ actions, and a timing protocol

These are all abstractions that collect together โ€œobjectsโ€ of the same โ€œtypeโ€
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method thatโ€™s useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object

12.4 Defining Your Own Classes

Letโ€™s build some simple classes to start off



12.4.1 Example: A Consumer Class

First, weโ€™ll build a Consumer class with

โ€ข a wealth attribute that stores the consumerโ€™s wealth (data)


โ€ข an earn method, where earn(y) increments the consumerโ€™s wealth by y
โ€ข a spend method, where spend(x) either decreases wealth by x or returns an error if
insufficient funds exist

Admittedly a little contrived, this example of a class helps us internalize some new syntax
Hereโ€™s one implementation

In [3]: class Consumer:

            def __init__(self, w):
                "Initialize consumer with w dollars of wealth"
                self.wealth = w

            def earn(self, y):
                "The consumer earns y dollars"
                self.wealth += y

            def spend(self, x):
                "The consumer spends x dollars if feasible"
                new_wealth = self.wealth - x
                if new_wealth < 0:
                    print("Insufficient funds")
                else:
                    self.wealth = new_wealth

Thereโ€™s some special syntax here so letโ€™s step through carefully

โ€ข The class keyword indicates that we are building a class

This class defines instance data wealth and three methods: __init__, earn and spend

โ€ข wealth is instance data because each consumer we create (each instance of the Con-
sumer class) will have its own separate wealth data

The ideas behind the earn and spend methods were discussed above
Both of these act on the instance data wealth
The __init__ method is a constructor method
Whenever we create an instance of the class, this method will be called automatically
Calling __init__ sets up a โ€œnamespaceโ€ to hold the instance data โ€” more on this soon
Weโ€™ll also discuss the role of self just below
Usage
Hereโ€™s an example of usage

In [4]: c1 = Consumer(10) # Create instance with initial wealth 10


c1.spend(5)
c1.wealth

Out[4]: 5

In [5]: c1.earn(15)
c1.spend(100)

Insufficient funds

We can of course create multiple instances each with its own data

In [6]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth

Out[6]: 8

In [7]: c1.wealth

Out[7]: 10

In fact, each instance stores its data in a separate namespace dictionary

In [8]: c1.__dict__

Out[8]: {'wealth': 10}

In [9]: c2.__dict__

Out[9]: {'wealth': 8}

When we access or set attributes weโ€™re actually just modifying the dictionary maintained by
the instance
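To see this concretely, here is a small standalone sketch (it restates a minimal version of the Consumer class so the snippet runs on its own) showing that attribute access and the instance dictionary are two views of the same data:

```python
class Consumer:
    # Minimal restatement of the Consumer class, just enough for this demo
    def __init__(self, w):
        self.wealth = w

c1 = Consumer(10)

# Reading the attribute and reading the namespace dictionary agree
assert c1.__dict__ == {'wealth': 10}

# Writing to the dictionary is equivalent to setting the attribute
c1.__dict__['wealth'] = 100
print(c1.wealth)   # 100
```

In practice you would set `c1.wealth` directly; the point is only that both routes modify the same underlying dictionary.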
Self
If you look at the Consumer class definition again youโ€™ll see the word self throughout the
code
The rules with self are that

โ€ข Any instance data should be prepended with self

โ€“ e.g., the earn method references self.wealth rather than just wealth

โ€ข Any method defined within the class should have self as its first argument

โ€“ e.g., def earn(self, y) rather than just def earn(y)

โ€ข Any method referenced within the class should be called as self.method_name

There are no examples of the last rule in the preceding code but we will see some shortly
Details
In this section, we look at some more formal details related to classes and self

โ€ข You might wish to skip to the next section on first pass of this lecture
โ€ข You can return to these details after youโ€™ve familiarized yourself with more examples

Methods actually live inside a class object formed when the interpreter reads the class defini-
tion

In [10]: print(Consumer.__dict__) # Show __dict__ attribute of class object

{'__module__': '__main__', '__init__': <function Consumer.__init__ at 0x7f89127b42f0>, 'earn': <function Consu

Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code

In [11]: c1 = Consumer(10)
c1.earn(10)
c1.wealth

Out[11]: 20

When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argu-
ment 10 to Consumer.earn
In fact, the following are equivalent

โ€ข c1.earn(10)
โ€ข Consumer.earn(c1, 10)

In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter

In [12]: def earn(self, y):


"The consumer earns y dollars"
self.wealth += y

The end result is that self is bound to the instance c1 inside the function call
Thatโ€™s why the statement self.wealth += y inside earn ends up modifying c1.wealth
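The equivalence is easy to check directly; the following standalone sketch restates a minimal Consumer class and calls earn both ways:

```python
class Consumer:
    # Minimal restatement of the Consumer class for this demo
    def __init__(self, w):
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

c1 = Consumer(10)
c2 = Consumer(10)

c1.earn(10)            # bound method call: self is filled in automatically
Consumer.earn(c2, 10)  # plain function call: we pass the instance ourselves

print(c1.wealth, c2.wealth)   # 20 20
```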

12.4.2 Example: The Solow Growth Model

For our next example, letโ€™s write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita ๐‘˜๐‘ก evolves according to the rule

๐‘ ๐‘ง๐‘˜๐‘ก๐›ผ + (1 โˆ’ ๐›ฟ)๐‘˜๐‘ก
๐‘˜๐‘ก+1 = (1)
1+๐‘›

Here

โ€ข ๐‘  is an exogenously given savings rate


โ€ข ๐‘ง is a productivity parameter
โ€ข ๐›ผ is capitalโ€™s share of income
โ€ข ๐‘› is the population growth rate
โ€ข ๐›ฟ is the depreciation rate

The steady state of the model is the ๐‘˜ that solves Eq. (1) when ๐‘˜๐‘ก+1 = ๐‘˜๐‘ก = ๐‘˜
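Setting $k_{t+1} = k_t = k$ in Eq. (1) and solving gives the closed form that the steady_state method below computes:

```latex
k = \frac{s z k^{\alpha} + (1 - \delta) k}{1 + n}
\quad\Longrightarrow\quad (n + \delta) k = s z k^{\alpha}
\quad\Longrightarrow\quad k = \left( \frac{s z}{n + \delta} \right)^{1/(1-\alpha)}
```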
Hereโ€™s a class that implements this model
Some points of interest in the code are

โ€ข An instance maintains a record of its current capital stock in the variable self.k

โ€ข The h method implements the right-hand side of Eq. (1)

โ€ข The update method uses h to update capital as per Eq. (1)

โ€“ Notice how inside update the reference to the local method h is self.h

The methods steady_state and generate_sequence are fairly self-explanatory

In [13]: class Solow:
             r"""
             Implements the Solow growth model with the update rule

                 k_{t+1} = [(s z k^α_t) + (1 - δ)k_t] / (1 + n)

             """
             def __init__(self, n=0.05,  # population growth rate
                                s=0.25,  # savings rate
                                δ=0.1,   # depreciation rate
                                α=0.3,   # capital share of income
                                z=2.0,   # productivity
                                k=1.0):  # current capital stock

                 self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
                 self.k = k

             def h(self):
                 "Evaluate the h function"
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Apply the update rule
                 return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

             def update(self):
                 "Update the current state (i.e., the capital stock)."
                 self.k = self.h()

             def steady_state(self):
                 "Compute the steady state value of capital."
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Compute and return steady state
                 return ((s * z) / (n + δ))**(1 / (1 - α))

             def generate_sequence(self, t):
                 "Generate and return a time series of length t"
                 path = []
                 for i in range(t):
                     path.append(self.k)
                     self.update()
                 return path

Hereโ€™s a little program that uses the class to compute time series from two different initial
conditions
The common steady state is also plotted for comparison

In [14]: import matplotlib.pyplot as plt
         %matplotlib inline

         s1 = Solow()
         s2 = Solow(k=8.0)

         T = 60
         fig, ax = plt.subplots(figsize=(9, 6))

         # Plot the common steady state value of capital
         ax.plot([s1.steady_state()]*T, 'k-', label='steady state')

         # Plot time series for each economy
         for s in s1, s2:
             lb = f'capital series from initial state {s.k}'
             ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)

         ax.legend()
         plt.show()
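As a quick numerical cross-check (a standalone sketch that re-implements the update rule as a plain function rather than reusing the class), capital paths from both initial conditions should end up at the common steady state:

```python
# Default Solow parameters from the class above
n, s, δ, α, z = 0.05, 0.25, 0.1, 0.3, 2.0

def h(k):
    # Right-hand side of the Solow update rule
    return (s * z * k**α + (1 - δ) * k) / (1 + n)

k_star = ((s * z) / (n + δ))**(1 / (1 - α))

gaps = []
for k in 1.0, 8.0:           # the two initial conditions plotted above
    for t in range(200):
        k = h(k)
    gaps.append(abs(k - k_star))

print(gaps)                  # both gaps should be negligible
```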

12.4.3 Example: A Market

Next, letโ€™s write a class for a simple one good market where agents are price takers
The market consists of the following objects:

โ€ข A linear demand curve ๐‘„ = ๐‘Ž๐‘‘ โˆ’ ๐‘๐‘‘ ๐‘


โ€ข A linear supply curve ๐‘„ = ๐‘Ž๐‘ง + ๐‘๐‘ง (๐‘ โˆ’ ๐‘ก)

Here

โ€ข ๐‘ is price paid by the consumer, ๐‘„ is quantity and ๐‘ก is a per-unit tax


โ€ข Other symbols are demand and supply parameters

The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus
Hereโ€™s our implementation

In [15]: from scipy.integrate import quad

         class Market:

             def __init__(self, ad, bd, az, bz, tax):
                 """
                 Set up market parameters. All parameters are scalars. See
                 https://lectures.quantecon.org/py/python_oop.html for interpretation.

                 """
                 self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
                 if ad < az:
                     raise ValueError('Insufficient demand.')

             def price(self):
                 "Return equilibrium price"
                 return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

             def quantity(self):
                 "Compute equilibrium quantity"
                 return self.ad - self.bd * self.price()

             def consumer_surp(self):
                 "Compute consumer surplus"
                 # == Compute area under inverse demand function == #
                 integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return area - self.price() * self.quantity()

             def producer_surp(self):
                 "Compute producer surplus"
                 # == Compute area above inverse supply curve, excluding tax == #
                 integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return (self.price() - self.tax) * self.quantity() - area

             def taxrev(self):
                 "Compute tax revenue"
                 return self.tax * self.quantity()

             def inverse_demand(self, x):
                 "Compute inverse demand"
                 return self.ad / self.bd - (1 / self.bd) * x

             def inverse_supply(self, x):
                 "Compute inverse supply curve"
                 return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

             def inverse_supply_no_tax(self, x):
                 "Compute inverse supply curve without tax"
                 return -(self.az / self.bz) + (1 / self.bz) * x

Hereโ€™s a sample of usage

In [16]: baseline_params = 15, .5, -2, .5, 3


m = Market(*baseline_params)
print("equilibrium price = ", m.price())

equilibrium price = 18.5

In [17]: print("consumer surplus = ", m.consumer_surp())

consumer surplus = 33.0625
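As a sanity check on the price method (a standalone sketch that uses only the baseline parameter values, not the class itself), the equilibrium price should equate quantity demanded with quantity supplied:

```python
ad, bd, az, bz, tax = 15, .5, -2, .5, 3   # baseline parameters from above

# Equilibrium price, as in Market.price()
p = (ad - az + bz * tax) / (bd + bz)

demand = ad - bd * p           # quantity demanded at p
supply = az + bz * (p - tax)   # quantity supplied at p, net of the tax

print(p, demand, supply)       # 18.5 5.75 5.75
```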

Hereโ€™s a short program that uses this class to plot an inverse demand curve together with in-
verse supply curves with and without taxes

In [18]: import numpy as np

# Baseline ad, bd, az, bz, tax


baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)

q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)

fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()

The next program provides a function that

โ€ข takes an instance of Market as a parameter



โ€ข computes dead weight loss from the imposition of the tax

In [19]: def deadw(m):
             "Computes deadweight loss for market m."
             # == Create analogous market with no tax == #
             m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
             # == Compare surplus, return difference == #
             surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
             surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
             return surp1 - surp2

Hereโ€™s an example of usage

In [20]: baseline_params = 15, .5, -2, .5, 3


m = Market(*baseline_params)
deadw(m) # Show deadweight loss

Out[20]: 1.125
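Because demand and supply are linear, the deadweight loss is the familiar triangle, half the tax times the fall in traded quantity, which gives an independent check on the number above (a standalone sketch using the baseline parameters):

```python
ad, bd, az, bz, tax = 15, .5, -2, .5, 3

def quantity(t):
    # Equilibrium quantity under a per-unit tax t
    p = (ad - az + bz * t) / (bd + bz)
    return ad - bd * p

# Triangle formula: half the tax times the drop in traded quantity
dwl = 0.5 * tax * (quantity(0) - quantity(tax))
print(dwl)   # 1.125
```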

12.4.4 Example: Chaos

Letโ€™s look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map

๐‘ฅ๐‘ก+1 = ๐‘Ÿ๐‘ฅ๐‘ก (1 โˆ’ ๐‘ฅ๐‘ก ), ๐‘ฅ0 โˆˆ [0, 1], ๐‘Ÿ โˆˆ [0, 4] (2)

Letโ€™s write a class for generating time series from this model
Hereโ€™s one implementation

In [21]: class Chaos:
             """
             Models the dynamical system with :math:`x_{t+1} = r x_t (1 - x_t)`
             """
             def __init__(self, x0, r):
                 """
                 Initialize with state x0 and parameter r
                 """
                 self.x, self.r = x0, r

             def update(self):
                 "Apply the map to update state."
                 self.x = self.r * self.x * (1 - self.x)

             def generate_sequence(self, n):
                 "Generate and return a sequence of length n."
                 path = []
                 for i in range(n):
                     path.append(self.x)
                     self.update()
                 return path

Hereโ€™s an example of usage

In [22]: ch = Chaos(0.1, 4.0)     # x0 = 0.1 and r = 4.0
         ch.generate_sequence(5)  # First 5 iterates

Out[22]: [0.1, 0.36000000000000004, 0.9216, 0.28901376000000006, 0.8219392261226498]



This piece of code plots a longer trajectory

In [23]: ch = Chaos(0.1, 4.0)


ts_length = 250

fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()

The next piece of code provides a bifurcation diagram

In [24]: fig, ax = plt.subplots()
         ch = Chaos(0.1, 4)
         r = 2.5
         while r < 4:
             ch.r = r
             t = ch.generate_sequence(1000)[950:]
             ax.plot([r] * len(t), t, 'b.', ms=0.6)
             r = r + 0.005

         ax.set_xlabel('$r$', fontsize=16)
         plt.show()

On the horizontal axis is the parameter ๐‘Ÿ in Eq. (2)


The vertical axis is the state space [0, 1]
For each ๐‘Ÿ we compute a long time series and then plot the tail (the last 50 points)
The tail of the sequence shows us where the trajectory concentrates after settling down to
some kind of steady state, if a steady state exists
Whether it settles down, and the character of the steady state to which it does settle down,
depend on the value of ๐‘Ÿ
For ๐‘Ÿ between about 2.5 and 3, the time series settles into a single fixed point plotted on the
vertical axis
For ๐‘Ÿ between about 3 and 3.45, the time series settles down to oscillating between the two
values plotted on the vertical axis
For ๐‘Ÿ a little bit higher than 3.45, the time series settles down to oscillating among the four
values plotted on the vertical axis
Notice that there is no value of ๐‘Ÿ that leads to a steady state oscillating among three values
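We can confirm the two-cycle region numerically; this standalone sketch iterates the map directly at r = 3.2 (a value we picked from the interval described above) and checks that the tail alternates between two values:

```python
r, x = 3.2, 0.1

for t in range(1000):            # burn in: let transients die out
    x = r * x * (1 - x)

cycle = []
for t in range(4):               # record four successive states
    cycle.append(round(x, 6))
    x = r * x * (1 - x)

print(cycle)                     # a repeating pair of values
```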

12.5 Special Methods

Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function

In [25]: x = (10, 20)


len(x)

Out[25]: 2

If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method

In [26]: class Foo:

             def __len__(self):
                 return 42

Now we get

In [27]: f = Foo()
len(f)

Out[27]: 42

A special method we will use regularly is the __call__ method


This method can be used to make your instances callable, just like functions

In [28]: class Foo:

             def __call__(self, x):
                 return x + 42

After running we get

In [29]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)

Out[29]: 50

Exercise 1 provides a more useful example

12.6 Exercises

12.6.1 Exercise 1

The empirical cumulative distribution function (ecdf) corresponding to a sample \{X_i\}_{i=1}^{n} is defined as

F_n(x) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\} \qquad (x \in \mathbb{R}) \tag{3}

Here 1{๐‘‹๐‘– โ‰ค ๐‘ฅ} is an indicator function (one if ๐‘‹๐‘– โ‰ค ๐‘ฅ and zero otherwise) and hence ๐น๐‘› (๐‘ฅ)
is the fraction of the sample that falls below ๐‘ฅ
The Glivenkoโ€“Cantelli Theorem states that, provided that the sample is IID, the ecdf ๐น๐‘› con-
verges to the true distribution function ๐น
Implement ๐น๐‘› as a class called ECDF, where

โ€ข A given sample {๐‘‹๐‘– }๐‘›๐‘–=1 are the instance data, stored as self.observations
โ€ข The class implements a __call__ method that returns ๐น๐‘› (๐‘ฅ) for any ๐‘ฅ

Your code should work as follows (modulo randomness)

from random import uniform

samples = [uniform(0, 1) for i in range(10)]


F = ECDF(samples)
F(0.5) # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]


F(0.5)

Aim for clarity, not efficiency

12.6.2 Exercise 2

In an earlier exercise, you wrote a function for evaluating polynomials


This exercise is an extension, where the task is to build a simple class called Polynomial for
representing and manipulating polynomial functions such as

๐‘
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘ ๐‘ฅ๐‘ = โˆ‘ ๐‘Ž๐‘› ๐‘ฅ๐‘› (๐‘ฅ โˆˆ R) (4)
๐‘›=0

The instance data for the class Polynomial will be the coefficients (in the case of Eq. (4),
the numbers ๐‘Ž0 , โ€ฆ , ๐‘Ž๐‘ )
Provide methods that

1. Evaluate the polynomial Eq. (4), returning ๐‘(๐‘ฅ) for any ๐‘ฅ


2. Differentiate the polynomial, replacing the original coefficients with those of its deriva-
tive ๐‘โ€ฒ

Avoid using any import statements

12.7 Solutions

12.7.1 Exercise 1
In [30]: class ECDF:

             def __init__(self, observations):
                 self.observations = observations

             def __call__(self, x):
                 counter = 0.0
                 for obs in self.observations:
                     if obs <= x:
                         counter += 1
                 return counter / len(self.observations)

In [31]: # == test == #
         from random import uniform

         samples = [uniform(0, 1) for i in range(10)]
         F = ECDF(samples)
         print(F(0.5))  # Evaluate ecdf at x = 0.5

         F.observations = [uniform(0, 1) for i in range(1000)]
         print(F(0.5))

0.4
0.484

12.7.2 Exercise 2
In [32]: class Polynomial:

             def __init__(self, coefficients):
                 """
                 Creates an instance of the Polynomial class representing

                     p(x) = a_0 x^0 + ... + a_N x^N,

                 where a_i = coefficients[i].
                 """
                 self.coefficients = coefficients

             def __call__(self, x):
                 "Evaluate the polynomial at x."
                 y = 0
                 for i, a in enumerate(self.coefficients):
                     y += a * x**i
                 return y

             def differentiate(self):
                 "Reset self.coefficients to those of p' instead of p."
                 new_coefficients = []
                 for i, a in enumerate(self.coefficients):
                     new_coefficients.append(i * a)
                 # Remove the first element, which is zero
                 del new_coefficients[0]
                 # And reset coefficients data to new values
                 self.coefficients = new_coefficients
                 return new_coefficients
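A quick usage check of the solution (the class is restated compactly here so the snippet runs on its own; its behavior matches the version above):

```python
class Polynomial:
    # Compact restatement of the solution above
    def __init__(self, coefficients):
        self.coefficients = coefficients

    def __call__(self, x):
        return sum(a * x**i for i, a in enumerate(self.coefficients))

    def differentiate(self):
        self.coefficients = [i * a for i, a in enumerate(self.coefficients)][1:]
        return self.coefficients

p = Polynomial([2, 1, 3])   # p(x) = 2 + x + 3x^2
print(p(2))                 # 2 + 2 + 12 = 16
p.differentiate()           # now p(x) = 1 + 6x
print(p(2))                 # 1 + 12 = 13
```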
13 OOP III: Samuelson Multiplier Accelerator

13.1 Contents

โ€ข Overview 13.2
โ€ข Details 13.3
โ€ข Implementation 13.4
โ€ข Stochastic Shocks 13.5
โ€ข Government Spending 13.6
โ€ข Wrapping Everything Into a Class 13.7
โ€ข Using the LinearStateSpace Class 13.8
โ€ข Pure Multiplier Model 13.9
โ€ข Summary 13.10

Co-author: Natasha Watkins


In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

13.2 Overview

This lecture creates non-stochastic and stochastic versions of Paul Samuelsonโ€™s celebrated
multiplier accelerator model [115]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to

โ€ข provide a more detailed example of OOP and classes


โ€ข review a famous model
โ€ข review linear difference equations, both deterministic and stochastic


13.2.1 Samuelsonโ€™s Model

Samuelson used a second-order linear difference equation to represent a model of national out-
put based on three components:

โ€ข a national output identity asserting that national output is the sum of consumption
plus investment plus government purchases
โ€ข a Keynesian consumption function asserting that consumption at time ๐‘ก is equal to a
constant times national output at time ๐‘ก โˆ’ 1
โ€ข an investment accelerator asserting that investment at time ๐‘ก equals a constant called
the accelerator coefficient times the difference in output between period ๐‘ก โˆ’ 1 and ๐‘ก โˆ’ 2
โ€ข the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply

(To read about linear difference equations see here or chapter IX of [118])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output
Possible dynamic properties include

โ€ข smooth convergence to a constant level of output


โ€ข damped business cycles that eventually converge to a constant level of output
โ€ข persistent business cycles that neither dampen nor explode

Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles
(To read about stochastic linear difference equations see chapter XI of [118])

13.3 Details

Letโ€™s assume that

โ€ข {๐บ๐‘ก } is a sequence of levels of government expenditures โ€“ weโ€™ll start by setting ๐บ๐‘ก = ๐บ


for all ๐‘ก

โ€ข {๐ถ๐‘ก } is a sequence of levels of aggregate consumption expenditures, a key endogenous


variable in the model

โ€ข {๐ผ๐‘ก } is a sequence of rates of investment, another key endogenous variable

โ€ข {๐‘Œ๐‘ก } is a sequence of levels of national income, yet another endogenous variable

โ€ข ๐‘Ž is the marginal propensity to consume in the Keynesian consumption function ๐ถ๐‘ก =


๐‘Ž๐‘Œ๐‘กโˆ’1 + ๐›พ

โ€ข ๐‘ is the โ€œaccelerator coefficientโ€ in the โ€œinvestment acceleratorโ€ ๐ผ_๐‘ก = ๐‘(๐‘Œ _๐‘ก โˆ’ 1 โˆ’


๐‘Œ _๐‘ก โˆ’ 2)

โ€ข {๐œ–๐‘ก } is an IID sequence standard normal random variables

โ€ข ๐œŽ โ‰ฅ 0 is a โ€œvolatilityโ€ parameter โ€” setting ๐œŽ = 0 recovers the non-stochastic case that


weโ€™ll start with

The model combines the consumption function

๐ถ๐‘ก = ๐‘Ž๐‘Œ๐‘กโˆ’1 + ๐›พ (1)

with the investment accelerator

๐ผ๐‘ก = ๐‘(๐‘Œ๐‘กโˆ’1 โˆ’ ๐‘Œ๐‘กโˆ’2 ) (2)

and the national income identity

๐‘Œ๐‘ก = ๐ถ๐‘ก + ๐ผ๐‘ก + ๐บ๐‘ก (3)

โ€ข The parameter ๐‘Ž is peoplesโ€™ marginal propensity to consume out of income - equation


Eq. (1) asserts that people consume a fraction of math:a in (0,1) of each additional dol-
lar of income
โ€ข The parameter ๐‘ > 0 is the investment accelerator coefficient - equation Eq. (2) asserts
that people invest in physical capital when income is increasing and disinvest when it is
decreasing

Equations Eq. (1), Eq. (2), and Eq. (3) imply the following second-order linear difference
equation for national income:

๐‘Œ๐‘ก = (๐‘Ž + ๐‘)๐‘Œ๐‘กโˆ’1 โˆ’ ๐‘๐‘Œ๐‘กโˆ’2 + (๐›พ + ๐บ๐‘ก )

or

๐‘Œ๐‘ก = ๐œŒ1 ๐‘Œ๐‘กโˆ’1 + ๐œŒ2 ๐‘Œ๐‘กโˆ’2 + (๐›พ + ๐บ๐‘ก ) (4)

where ๐œŒ1 = (๐‘Ž + ๐‘) and ๐œŒ2 = โˆ’๐‘
To complete the model, we require two initial conditions
If the model is to generate time series for ๐‘ก = 0, โ€ฆ , ๐‘‡ , we require initial values

Y_{-1} = \bar{Y}_{-1}, \qquad Y_{-2} = \bar{Y}_{-2}

We’ll ordinarily set the parameters (a, b) so that starting from an arbitrary pair of initial conditions (\bar{Y}_{-1}, \bar{Y}_{-2}), national income Y_t converges to a constant value as t becomes large

We are interested in studying

โ€ข the transient fluctuations in ๐‘Œ๐‘ก as it converges to its steady state level



โ€ข the rate at which it converges to a steady state level

The deterministic version of the model described so far โ€” meaning that no random shocks
hit aggregate demand โ€” has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand

13.3.1 Stochastic Version of the Model

We create a random or stochastic version of the model by adding a random process of shocks or disturbances {σε_t} to the right side of equation Eq. (4), leading to the second-order scalar linear stochastic difference equation:

Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + \gamma + G_t + \sigma \epsilon_t \tag{5}

13.3.2 Mathematical Analysis of the Model

To get started, letโ€™s set ๐บ๐‘ก โ‰ก 0, ๐œŽ = 0, and ๐›พ = 0


Then we can write equation Eq. (5) as

๐‘Œ๐‘ก = ๐œŒ1 ๐‘Œ๐‘กโˆ’1 + ๐œŒ2 ๐‘Œ๐‘กโˆ’2

or

๐‘Œ๐‘ก+2 โˆ’ ๐œŒ1 ๐‘Œ๐‘ก+1 โˆ’ ๐œŒ2 ๐‘Œ๐‘ก = 0 (6)

To discover the properties of the solution of Eq. (6), it is useful first to form the characteris-
tic polynomial for Eq. (6):

z^2 - \rho_1 z - \rho_2 \tag{7}

where ๐‘ง is possibly a complex number


We want to find the two zeros (a.k.a. roots) โ€“ namely ๐œ†1 , ๐œ†2 โ€“ of the characteristic polyno-
mial
These are two special values of ๐‘ง, say ๐‘ง = ๐œ†1 and ๐‘ง = ๐œ†2 , such that if we set ๐‘ง equal to one of
these values in expression Eq. (7), the characteristic polynomial Eq. (7) equals zero:

๐‘ง2 โˆ’ ๐œŒ1 ๐‘ง โˆ’ ๐œŒ2 = (๐‘ง โˆ’ ๐œ†1 )(๐‘ง โˆ’ ๐œ†2 ) = 0 (8)

Equation Eq. (8) is said to factor the characteristic polynomial


When the roots are complex, they will occur as a complex conjugate pair
When the roots are complex, it is convenient to represent them in the polar form

๐œ†1 = ๐‘Ÿ๐‘’๐‘–๐œ” , ๐œ†2 = ๐‘Ÿ๐‘’โˆ’๐‘–๐œ”

where ๐‘Ÿ is the amplitude of the complex number and ๐œ” is its angle or phase
These can also be represented as

๐œ†1 = ๐‘Ÿ(๐‘๐‘œ๐‘ (๐œ”) + ๐‘– sin(๐œ”))

๐œ†2 = ๐‘Ÿ(๐‘๐‘œ๐‘ (๐œ”) โˆ’ ๐‘– sin(๐œ”))

(To read about the polar form, see here)


Given initial conditions ๐‘Œโˆ’1 , ๐‘Œโˆ’2 , we want to generate a solution of the difference equation
Eq. (6)
It can be represented as

๐‘Œ๐‘ก = ๐œ†๐‘ก1 ๐‘1 + ๐œ†๐‘ก2 ๐‘2

where ๐‘1 and ๐‘2 are constants that depend on the two initial conditions and on ๐œŒ1 , ๐œŒ2
When the roots are complex, it is useful to pursue the following calculations
Notice that

๐‘Œ๐‘ก = ๐‘1 (๐‘Ÿ๐‘’๐‘–๐œ” )๐‘ก + ๐‘2 (๐‘Ÿ๐‘’โˆ’๐‘–๐œ” )๐‘ก
= ๐‘1 ๐‘Ÿ๐‘ก ๐‘’๐‘–๐œ”๐‘ก + ๐‘2 ๐‘Ÿ๐‘ก ๐‘’โˆ’๐‘–๐œ”๐‘ก
= ๐‘1 ๐‘Ÿ๐‘ก [cos(๐œ”๐‘ก) + ๐‘– sin(๐œ”๐‘ก)] + ๐‘2 ๐‘Ÿ๐‘ก [cos(๐œ”๐‘ก) โˆ’ ๐‘– sin(๐œ”๐‘ก)]
= (๐‘1 + ๐‘2 )๐‘Ÿ๐‘ก cos(๐œ”๐‘ก) + ๐‘–(๐‘1 โˆ’ ๐‘2 )๐‘Ÿ๐‘ก sin(๐œ”๐‘ก)

The only way that ๐‘Œ๐‘ก can be a real number for each ๐‘ก is if ๐‘1 + ๐‘2 is a real number and ๐‘1 โˆ’ ๐‘2
is an imaginary number
This happens only when ๐‘1 and ๐‘2 are complex conjugates, in which case they can be written
in the polar forms

๐‘1 = ๐‘ฃ๐‘’๐‘–๐œƒ , ๐‘2 = ๐‘ฃ๐‘’โˆ’๐‘–๐œƒ

So we can write

๐‘Œ๐‘ก = ๐‘ฃ๐‘’๐‘–๐œƒ ๐‘Ÿ๐‘ก ๐‘’๐‘–๐œ”๐‘ก + ๐‘ฃ๐‘’โˆ’๐‘–๐œƒ ๐‘Ÿ๐‘ก ๐‘’โˆ’๐‘–๐œ”๐‘ก


= ๐‘ฃ๐‘Ÿ๐‘ก [๐‘’๐‘–(๐œ”๐‘ก+๐œƒ) + ๐‘’โˆ’๐‘–(๐œ”๐‘ก+๐œƒ) ]
= 2๐‘ฃ๐‘Ÿ๐‘ก cos(๐œ”๐‘ก + ๐œƒ)

where ๐‘ฃ and ๐œƒ are constants that must be chosen to satisfy initial conditions for ๐‘Œโˆ’1 , ๐‘Œโˆ’2
This formula shows that when the roots are complex, Y_t displays oscillations with period \check{p} = 2\pi / \omega and damping factor r

We say that \check{p} is the period because in that amount of time the cosine wave \cos(\omega t + \theta) goes through exactly one complete cycle
(Draw a cosine function to convince yourself of this please)

Remark: Following [115], we want to choose the parameters ๐‘Ž, ๐‘ of the model so that the ab-
solute values (of the possibly complex) roots ๐œ†1 , ๐œ†2 of the characteristic polynomial are both
strictly less than one:

|๐œ†๐‘— | < 1 for ๐‘— = 1, 2

Remark: When both roots ๐œ†1 , ๐œ†2 of the characteristic polynomial have absolute values
strictly less than one, the absolute value of the larger one governs the rate of convergence to
the steady state of the non stochastic version of the model
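As an illustration (the parameter values here are made up for the example, not taken from the lecture), we can compute the roots with NumPy and check the stability condition:

```python
import numpy as np

a, b = 0.9, 0.8        # example MPC and accelerator coefficient (made up)
ρ1, ρ2 = a + b, -b     # coefficients from the derivation above

# Roots of z² - ρ1 z - ρ2; np.roots takes the polynomial's coefficients
roots = np.roots([1, -ρ1, -ρ2])

# The product of the roots is -ρ2 = b, so a complex pair has |λ| = sqrt(b)
print(roots, np.abs(roots))
```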

13.3.3 Things This Lecture Does

We write a function to generate simulations of a {๐‘Œ๐‘ก } sequence as a function of time


The function requires that we put in initial conditions for ๐‘Œโˆ’1 , ๐‘Œโˆ’2
The function checks that ๐‘Ž, ๐‘ are set so that ๐œ†1 , ๐œ†2 are less than
unity in absolute value (also called โ€œmodulusโ€)
The function also tells us whether the roots are complex, and, if they are complex, returns
both their real and complex parts
If the roots are both real, the function returns their values
We use our function written to simulate paths that are stochastic (when ๐œŽ > 0)
We have written the function in a way that allows us to input {๐บ๐‘ก } paths of a few simple
forms, e.g.,

โ€ข one time jumps in ๐บ at some time


โ€ข a permanent jump in ๐บ that occurs at some time

We proceed to use the Samuelson multiplier-accelerator model as a laboratory to make a sim-


ple OOP example
The โ€œstateโ€ that determines next periodโ€™s ๐‘Œ๐‘ก+1 is now not just the current value ๐‘Œ๐‘ก but also
the once lagged value ๐‘Œ๐‘กโˆ’1
This involves a little more bookkeeping than is required in the Solow model class definition
We use the Samuelson multiplier-accelerator model as a vehicle for teaching how we can grad-
ually add more features to the class
We want to have a method in the class that automatically generates a simulation, either non-
stochastic (๐œŽ = 0) or stochastic (๐œŽ > 0)
We also show how to map the Samuelson model into a simple instance of the LinearStateSpace class described here
We can use a LinearStateSpace instance to do various things that we did above with our
homemade function and class
Among other things, we show by example that the eigenvalues of the matrix ๐ด that we use to
form the instance of the LinearStateSpace class for the Samuelson model equal the roots
of the characteristic polynomial Eq. (7) for the Samuelson multiplier accelerator model

Here is the formula for the matrix A in the linear state space system in the case that government expenditures are a constant G:

        ⎡   1     0    0  ⎤
    A = ⎢ γ + G   ρ1   ρ2 ⎥
        ⎣   0     1    0  ⎦

13.4 Implementation

Weโ€™ll start by drawing an informative graph from page 189 of [118]

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

def param_plot():

"""This function creates the graph on page 189 of Sargent's Macroeconomic Theory, second edition"""

fig, ax = plt.subplots(figsize=(10, 6))


ax.set_aspect('equal')

# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])

# Set axis labels


ax.set(xticks=[], yticks=[])
ax.set_xlabel(r'$\rho_2$', fontsize=16)
ax.xaxis.set_label_position('top')
ax.set_ylabel(r'$\rho_1$', rotation=0, fontsize=16)
ax.yaxis.set_label_position('right')

# Draw (t1, t2) points


ฯ1 = np.linspace(-2, 2, 100)
ax.plot(ฯ1, -abs(ฯ1) + 1, c='black')
ax.plot(ฯ1, np.ones_like(ฯ1) * -1, c='black')
ax.plot(ฯ1, -(ฯ1**2 / 4), c='black')

# Turn normal axes off


for spine in ['left', 'bottom', 'top', 'right']:
ax.spines[spine].set_visible(False)

# Add arrows to represent axes


axes_arrows = {'arrowstyle': '<|-|>', 'lw': 1.3}
ax.annotate('', xy=(xmin, 0), xytext=(xmax, 0), arrowprops=axes_arrows)
ax.annotate('', xy=(0, ymin), xytext=(0, ymax), arrowprops=axes_arrows)

# Annotate the plot with equations


plot_arrowsl = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=-0.2"}
plot_arrowsr = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=0.2"}
ax.annotate(r'$\rho_1 + \rho_2 < 1$', xy=(0.5, 0.3), xytext=(0.8, 0.6),
arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'$\rho_1 + \rho_2 = 1$', xy=(0.38, 0.6), xytext=(0.6, 0.8),
arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'$\rho_2 < 1 + \rho_1$', xy=(-0.5, 0.3), xytext=(-1.3, 0.6),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'$\rho_2 = 1 + \rho_1$', xy=(-0.38, 0.6), xytext=(-1, 0.8),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'$\rho_2 = -1$', xy=(1.5, -1), xytext=(1.8, -1.3),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'${\rho_1}^2 + 4\rho_2 = 0$', xy=(1.15, -0.35),
xytext=(1.5, -0.3), arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'${\rho_1}^2 + 4\rho_2 < 0$', xy=(1.4, -0.7),
xytext=(1.8, -0.6), arrowprops=plot_arrowsr, fontsize='12')

# Label categories of solutions


ax.text(1.5, 1, 'Explosive\n growth', ha='center', fontsize=16)
ax.text(-1.5, 1, 'Explosive\n oscillations', ha='center', fontsize=16)
ax.text(0.05, -1.5, 'Explosive oscillations', ha='center', fontsize=16)
ax.text(0.09, -0.5, 'Damped oscillations', ha='center', fontsize=16)

# Add small marker to y-axis


ax.axhline(y=1.005, xmin=0.495, xmax=0.505, c='black')
ax.text(-0.12, -1.12, '-1', fontsize=10)
ax.text(-0.12, 0.98, '1', fontsize=10)

return fig

param_plot()
plt.show()

The graph portrays regions in which the (λ1, λ2) root pairs implied by the (ρ1 = (a + b), ρ2 =
-b) difference equation parameter pairs in the Samuelson model are such that:

• (λ1, λ2) are complex with modulus less than 1 - in this case, the {Y_t} sequence displays
damped oscillations
• (λ1, λ2) are both real, but one is strictly greater than 1 - this leads to explosive growth
• (λ1, λ2) are both real, but one is strictly less than -1 - this leads to explosive oscilla-
tions
• (λ1, λ2) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles

Later we'll present the graph with a red mark showing the particular point implied by the
setting of (a, b)
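These cases can also be checked numerically: for any (ρ1, ρ2) pair, the roots of z² - ρ1 z - ρ2 = 0 tell us which region of the graph we are in. Here is a small sketch of that check (the helper name and parameter values are illustrative, not from the lecture):

```python
import numpy as np

def classify(ρ1, ρ2):
    """Classify the (λ1, λ2) pair implied by (ρ1, ρ2)."""
    roots = np.roots([1, -ρ1, -ρ2])
    if np.iscomplex(roots).any():
        # complex conjugate pair: the modulus decides damped vs explosive
        return 'damped oscillations' if (np.abs(roots) < 1).all() else 'explosive oscillations'
    if (np.abs(roots) < 1).all():
        return 'smooth convergence'
    return 'explosive growth'

print(classify(1.42, -0.5))     # real roots inside the unit circle
print(classify(1.53, -0.9025))  # complex roots with modulus 0.95
```

The first call corresponds to the (a, b) = (.92, .5) example used below; the second to a reverse-engineered damped cycle.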

13.4.1 Function to Describe Implications of Characteristic Polynomial


In [3]: def categorize_solution(ρ1, ρ2):
"""This function takes values of ρ1 and ρ2 and uses them to classify the type of solution"""

discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 > 1 + ρ1 or ρ2 < -1:
print('Explosive oscillations')
elif ρ1 + ρ2 > 1:
print('Explosive growth')
elif discriminant < 0:
print('Roots are complex with modulus less than one; therefore damped oscillations')
else:
print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')

In [4]: ### Test the categorize_solution function

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

13.4.2 Function for Plotting Paths

A useful function for our work below is

In [5]: def plot_y(function=None):


"""function plots path of Y_t"""
plt.subplots(figsize=(10, 6))
plt.plot(function)
plt.xlabel('Time $t$')
plt.ylabel('$Y_t$', rotation=0)
plt.grid()
plt.show()

13.4.3 Manual or โ€œby handโ€ Root Calculations

The following function calculates roots of the characteristic polynomial using high school algebra
(Weโ€™ll calculate the roots in other ways later)
The function also plots a ๐‘Œ๐‘ก starting from initial conditions that we set

In [6]: from cmath import sqrt

##=== This is a 'manual' method ===#

def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):

"""Takes values of parameters and computes the roots of characteristic polynomial.

It tells whether they are real or complex and whether they are less than unity in absolute value.
It also computes a simulation of length n starting from the two given initial conditions for
national output"""

roots = []

ρ1 = α + β
ρ2 = -β

print(f'ρ_1 is {ρ1}')
print(f'ρ_2 is {ρ2}')

discriminant = ρ1 ** 2 + 4 * ρ2

if discriminant == 0:
roots.append(ρ1 / 2)
print('Single real root: ')
print(''.join(str(roots)))
elif discriminant > 0:
roots.append((ρ1 + sqrt(discriminant).real) / 2)
roots.append((ρ1 - sqrt(discriminant).real) / 2)
print('Two real roots: ')
print(''.join(str(roots)))
else:
roots.append((ρ1 + sqrt(discriminant)) / 2)
roots.append((ρ1 - sqrt(discriminant)) / 2)
print('Two complex roots: ')
print(''.join(str(roots)))

if all(abs(root) < 1 for root in roots):
print('Absolute values of roots are less than one')
else:
print('Absolute values of roots are not less than one')

def transition(x, t): return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

y_t = [y_0, y_1]

for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one

13.4.4 Reverse-Engineering Parameters to Generate Damped Cycles

The next cell writes code that takes as inputs the modulus r and phase φ of a conjugate pair
of complex numbers in polar form

λ1 = r exp(iφ), λ2 = r exp(-iφ)

• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (a, b) and (ρ1, ρ2) pairs that would generate those roots
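The algebra behind this: expanding (z - λ1)(z - λ2) with λ1 = r exp(iφ) and λ2 = r exp(-iφ) gives z² - 2r cos(φ)z + r², so matching the characteristic polynomial z² - ρ1 z - ρ2 requires ρ1 = 2r cos φ and ρ2 = -r². A quick numerical check of these identities, using only the standard library:

```python
import cmath
import math

r, φ = 0.95, 2 * math.pi / 10          # modulus and phase of the conjugate pair
λ1, λ2 = cmath.rect(r, φ), cmath.rect(r, -φ)

ρ1 = (λ1 + λ2).real                     # sum of the roots
ρ2 = (-λ1 * λ2).real                    # minus the product of the roots

assert abs(ρ1 - 2 * r * math.cos(φ)) < 1e-12
assert abs(ρ2 - (-r ** 2)) < 1e-12
print(ρ1, ρ2)                           # ≈ 1.5371, -0.9025
```

These are exactly the values the function f below produces for r = .95 and a cycle of period 10.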

In [7]: ### code to reverse-engineer a cycle
### y_t = r^t (c_1 cos(φ t) + c_2 sin(φ t))
###

import cmath
import math

def f(r, φ):

"""
Takes modulus r and angle φ of complex number r exp(j φ)
and creates ρ1 and ρ2 of characteristic polynomial for which
r exp(j φ) and r exp(- j φ) are complex roots.

Returns the multiplier coefficient a and the accelerator coefficient b
that verifies those roots.
"""
g1 = cmath.rect(r, φ) # Generate two complex roots
g2 = cmath.rect(r, -φ)
ρ1 = g1 + g2 # Implied ρ1, ρ2
ρ2 = -g1 * g2
b = -ρ2 # Reverse-engineer a and b that validate these
a = ρ1 - b
return ρ1, ρ2, a, b

## Now let's use the function in an example
## Here are the example parameters

r = .95
period = 10 # Length of cycle in units of time
φ = 2 * math.pi/period

## Apply the function

ρ1, ρ2, a, b = f(r, φ)

print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)

In [8]: ## Print the real components of ρ1 and ρ2

ρ1 = ρ1.real
ρ2 = ρ2.real

ρ1, ρ2

Out[8]: (1.5371322893124, -0.9024999999999999)



13.4.5 Root Finding Using Numpy

Here weโ€™ll use numpy to compute the roots of the characteristic polynomial

In [9]: r1, r2 = np.roots([1, -ρ1, -ρ2])

p1 = cmath.polar(r1)
p2 = cmath.polar(r2)

print(f"r, φ = {r}, {φ}")
print(f"p1, p2 = {p1}, {p2}")
# print(f"g1, g2 = {g1}, {g2}")

print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

r, φ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999

In [10]: ##=== This method uses numpy to calculate roots ===#

def y_nonstochastic(y_0=100, y_1=80, α=.9, β=.8, γ=10, n=80):

""" Rather than computing the roots of the characteristic polynomial by hand as we did earlier,
this function enlists numpy to do the work for us """

# Useful constants
ρ1 = α + β
ρ2 = -β

categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(f'Roots are {roots}')

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Define transition equation
def transition(x, t): return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

# Set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_nonstochastic())

Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex

Roots are less than one

13.4.6 Reverse-Engineered Complex Roots: Example

The next cell studies the implications of reverse-engineered complex roots


Weโ€™ll generate an undamped cycle of period 10

In [11]: r = 1 # generates undamped, nonexplosive cycles

period = 10 # length of cycle in units of time
φ = 2 * math.pi/period

## Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, φ)

a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real

print(f"a, b = {a}, {b}")

ytemp = y_nonstochastic(α=a, β=b, y_0=20, y_1=30)
plot_y(ytemp)

a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one

13.4.7 Digression: Using Sympy to Find Roots

We can also use sympy to compute analytic formulas for the roots

In [12]: import sympy
from sympy import Symbol, init_printing
init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

Out[12]:

[ρ_1/2 - √(ρ_1² + 4ρ_2)/2,  ρ_1/2 + √(ρ_1² + 4ρ_2)/2]

In [13]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

Out[13]:

๐›ผ ๐›ฝ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ ๐›ผ ๐›ฝ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ


[ + โˆ’ , + + ]
2 2 2 2 2 2

๐›ผ ๐›ฝ 1 ๐›ผ ๐›ฝ 1
[ + โˆ’ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ, + + โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ]
2 2 2 2 2 2

13.5 Stochastic Shocks

Now weโ€™ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand

In [14]: def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):

"""This function takes parameters of a stochastic version of the model and proceeds to analyze
the roots of the characteristic polynomial and also generate a simulation"""

# Useful constants
ρ1 = α + β
ρ2 = -β

# Categorize solution
categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(roots)

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Generate shocks
ϵ = np.random.normal(0, 1, n)

# Define transition equation
def transition(x, t): return ρ1 * \
x[t - 1] + ρ2 * x[t - 2] + γ + σ * ϵ[t]

# Set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_stochastic())

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

Letโ€™s do a simulation in which there are shocks and the characteristic polynomial has complex
roots

In [15]: r = .97

period = 10 # length of cycle in units of time
φ = 2 * math.pi/period

### apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, φ)

a = a.real # drop the imaginary part so that it is a valid input into y_stochastic
b = b.real

print(f"a, b = {a}, {b}")
plot_y(y_stochastic(y_0=40, y_1=42, α=a, β=b, σ=2, n=100))

a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one

13.6 Government Spending

This function computes a response to either a permanent or one-off increase in government
expenditures

In [16]: def y_stochastic_g(y_0=20,
y_1=20,
α=0.8,
β=0.2,
γ=10,
n=100,
σ=2,
g=0,
g_t=0,
duration='permanent'):

"""This program computes a response to a permanent or one-off increase in government
expenditures occurring at time g_t"""

# Useful constants
ρ1 = α + β
ρ2 = -β

# Categorize solution
categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(roots)

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Generate shocks
ϵ = np.random.normal(0, 1, n)

def transition(x, t, g=0):

# Non-stochastic - separated to avoid generating random series when not needed
if σ == 0:
return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g

# Stochastic
else:
ϵ = np.random.normal(0, 1, n)
return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

# Create list and set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):

# No government spending
if g == 0:
y_t.append(transition(y_t, t))

# Government spending (no shock)
elif g != 0 and duration is None:
y_t.append(transition(y_t, t, g=g))

# Permanent government spending shock
elif duration == 'permanent':
if t < g_t:
y_t.append(transition(y_t, t, g=0))
else:
y_t.append(transition(y_t, t, g=g))

# One-off government spending shock
elif duration == 'one-off':
if t == g_t:
y_t.append(transition(y_t, t, g=g))
else:
y_t.append(transition(y_t, t, g=0))
return y_t

A permanent government spending shock can be simulated as follows

In [17]: plot_y(y_stochastic_g(g=10, g_t=20, duration='permanent'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

We can also see the response to a one time jump in government expenditures

In [18]: plot_y(y_stochastic_g(g=500, g_t=50, duration='one-off'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

13.7 Wrapping Everything Into a Class

Up to now, we have written functions to do the work

Now we'll roll up our sleeves and write a Python class called Samuelson for the Samuelson
model

In [19]: class Samuelson():

r"""This class represents the Samuelson model, otherwise known as the
multiplier-accelerator model. The model combines the Keynesian multiplier
with the accelerator theory of investment.

The path of output is governed by a linear second-order difference equation

.. math::

Y_t = \gamma + g + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2}

Parameters
----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If none, government
spending equal to g for all t.

"""

def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):

self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
self.n, self.g, self.g_t, self.duration = n, g, g_t, duration
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
self.roots = np.roots([1, -self.ρ1, -self.ρ2])

def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'

def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True

def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'

def _transition(self, x, t, g=0):

# Non-stochastic - separated to avoid generating random series when not needed
if self.σ == 0:
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

# Stochastic
else:
ϵ = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ϵ[t]

def generate_series(self):

# Create list and set initial conditions


y_t = [self.y_0, self.y_1]

# Generate y_t series


for t in range(2, self.n):

# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t, g=0))

# Government spending (no shock)
elif self.g != 0 and self.duration is None:
y_t.append(self._transition(y_t, t, g=self.g))

# Permanent government spending shock


elif self.duration == 'permanent':
if t < self.g_t:
y_t.append(self._transition(y_t, t, g=0))
else:
y_t.append(self._transition(y_t, t, g=self.g))

# One-off government spending shock


elif self.duration == 'one-off':
if t == self.g_t:
y_t.append(self._transition(y_t, t, g=self.g))
else:
y_t.append(self._transition(y_t, t, g=0))
return y_t

def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')

if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')

if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:

print('Non-stochastic series')

if self.g != 0:
print('Government spending equal to ' + str(self.g))

if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))

def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()

# Add parameter values to plot


paramstr = f'$\\alpha={self.α:.2f}$ \n $\\beta={self.β:.2f}$ \n $\\gamma={self.γ:.2f}$ \n \
$\\sigma={self.σ:.2f}$ \n $\\rho_1={self.ρ1:.2f}$ \n $\\rho_2={self.ρ2:.2f}$'
props = dict(fc='white', pad=10, alpha=0.5)
ax.text(0.87, 0.05, paramstr, transform=ax.transAxes,
fontsize=12, bbox=props, va='bottom')

return fig

def param_plot(self):

# Uses the param_plot() function defined earlier (it is then able


# to be used standalone or as part of the model)

fig = param_plot()
ax = fig.gca()

# Add λ values to legend
for i, root in enumerate(self.roots):
if isinstance(root, complex):
operator = ['+', ''] # Need to fill operator for positive as string is split apart
label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f} {operator[i]} {self.roots[i].imag:.2f}i$'
else:
label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f}$'
ax.scatter(0, 0, 0, label=label) # dummy to add to legend

# Add ฯ pair to plot


ax.scatter(self.ฯ1, self.ฯ2, 100, 'red', '+', label=r'$(\ \rho_1, \ \rho_2 \ )$', zorder=5)

plt.legend(fontsize=12, loc=3)

return fig

13.7.1 Illustration of Samuelson Class

Now weโ€™ll put our Samuelson class to work on an example

In [20]: sam = Samuelson(α=0.8, β=0.5, σ=2, g=10, g_t=20, duration='permanent')
sam.summary()

Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20

In [21]: sam.plot()
plt.show()

13.7.2 Using the Graph

Weโ€™ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed
The red + sign shows the location of the roots

In [22]: sam.param_plot()
plt.show()

13.8 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class

In [23]: from quantecon import LinearStateSpace

""" This script maps the Samuelson model into the ``LinearStateSpace`` class"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100

A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]

G = [[γ + g, ρ1, ρ2], # this is Y_{t+1}
[γ, α, 0], # this is C_{t+1}
[0, β, -β]] # this is I_{t+1}

μ_0 = [1, 100, 100]

C = np.zeros((3,1))
C[1] = σ # stochastic

sam_t = LinearStateSpace(A, C, G, mu_0=μ_0)

x, y = sam_t.simulate(ts_length=n)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
ax.plot(series, color=color)
ax.set(title=title, xlim=(0, n))
ax.grid()

axes[-1].set_xlabel('Iteration')

plt.show()

13.8.1 Other Methods in the LinearStateSpace Class

Letโ€™s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class

In [24]: imres = sam_t.impulse_response()


imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape

Out[24]: (2, 6, 1)

Now let's compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of A

In [25]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)

[0.85+0.42130749j 0.85-0.42130749j 1. +0.j ]
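As a check on the claim above, the non-unit eigenvalues of A coincide with the roots of z² - ρ1 z - ρ2, while the remaining unit eigenvalue comes from the constant in the state vector. A sketch of the comparison, using the same α = 0.8, β = 0.9, γ = g = 10 as in the cell that built A:

```python
import numpy as np

α, β, γ, g = 0.8, 0.9, 10, 10
ρ1, ρ2 = α + β, -β

A = np.array([[1, 0, 0],
              [γ + g, ρ1, ρ2],
              [0, 1, 0]])

eigs = np.linalg.eigvals(A)
poly_roots = np.roots([1, -ρ1, -ρ2])

# each root of the characteristic polynomial is an eigenvalue of A
for root in poly_roots:
    assert np.min(np.abs(eigs - root)) < 1e-8

# and the remaining eigenvalue is 1
assert np.min(np.abs(eigs - 1)) < 1e-8
```

This follows from det(A - zI) = (1 - z)(z² - ρ1 z - ρ2), which you can verify by expanding along the first row.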



13.8.2 Inheriting Methods from LinearStateSpace

We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more functions to use

In [26]: class SamuelsonLSS(LinearStateSpace):

"""
this subclass creates a Samuelson multiplier-accelerator model
as a linear state space system
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):

self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ

# Define initial conditions
self.μ_0 = [1, y_0, y_1]

self.ρ1 = α + β
self.ρ2 = -β

# Define transition matrix
self.A = [[1, 0, 0],
[γ + g, self.ρ1, self.ρ2],
[0, 1, 0]]

# Define output matrix
self.G = [[γ + g, self.ρ1, self.ρ2], # this is Y_{t+1}
[γ, α, 0], # this is C_{t+1}
[0, β, -β]] # this is I_{t+1}

self.C = np.zeros((3, 1))
self.C[1] = σ # stochastic

# Initialize LSS with parameters from Samuelson model
LinearStateSpace.__init__(self, self.A, self.C, self.G, mu_0=self.μ_0)

def plot_simulation(self, ts_length=100, stationary=True):

# Temporarily store original parameters
temp_mu = self.mu_0
temp_Sigma = self.Sigma_0

# Set distribution parameters equal to their stationary values for simulation
if stationary == True:
try:
μ_x, μ_y, σ_x, σ_y = self.stationary_distributions()
self.mu_0 = μ_y
self.Sigma_0 = σ_y
# Exception where no convergence achieved when calculating stationary distributions
except ValueError:
print('Stationary distribution does not exist')

x, y = self.simulate(ts_length)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
ax.plot(series, color=color)
ax.set(title=title, xlim=(0, ts_length))
ax.grid()

axes[-1].set_xlabel('Iteration')

# Reset distribution parameters to their initial values
self.mu_0 = temp_mu
self.Sigma_0 = temp_Sigma

return fig

def plot_irf(self, j=5):

x, y = self.impulse_response(j)

# Reshape into 3 x j matrix for plotting purposes


yimf = np.array(y).flatten().reshape(j+1, 3).T

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))


labels = ['$Y_t$', '$C_t$', '$I_t$']
colors = ['darkblue', 'red', 'purple']
for ax, series, label, color in zip(axes, yimf, labels, colors):
ax.plot(series, color=color)
ax.set(xlim=(0, j))
ax.set_ylabel(label, rotation=0, fontsize=14, labelpad=10)
ax.grid()

axes[0].set_title('Impulse Response Functions')


axes[-1].set_xlabel('Iteration')

return fig

def multipliers(self, j=5):


x, y = self.impulse_response(j)
return np.sum(np.array(y).flatten().reshape(j+1, 3), axis=0)

13.8.3 Illustrations

Letโ€™s show how we can use the SamuelsonLSS

In [27]: samlss = SamuelsonLSS()

In [28]: samlss.plot_simulation(100, stationary=False)


plt.show()

In [29]: samlss.plot_simulation(100, stationary=True)


plt.show()

In [30]: samlss.plot_irf(100)
plt.show()

In [31]: samlss.multipliers()

Out[31]: array([7.414389, 6.835896, 0.578493])

13.9 Pure Multiplier Model

Let's shut down the accelerator by setting b = 0 to get a pure multiplier model

• the absence of cycles gives an idea about why Samuelson included the accelerator

In [32]: pure_multiplier = SamuelsonLSS(α=0.95, β=0)

In [33]: pure_multiplier.plot_simulation()

Stationary distribution does not exist

Out[33]:

In [34]: pure_multiplier = SamuelsonLSS(α=0.8, β=0)

In [35]: pure_multiplier.plot_simulation()

Out[35]:

In [36]: pure_multiplier.plot_irf(100)

Out[36]:

13.10 Summary

In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the Samuelson (1939) multiplier-accelerator model, described in [115]
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
14 More Language Features

14.1 Contents

โ€ข Overview 14.2
โ€ข Iterables and Iterators 14.3
โ€ข Names and Name Resolution 14.4
โ€ข Handling Errors 14.5
โ€ข Decorators and Descriptors 14.6
โ€ข Generators 14.7
โ€ข Recursive Function Calls 14.8
โ€ข Exercises 14.9
โ€ข Solutions 14.10

14.2 Overview

With this last lecture, our advice is to skip it on first pass, unless you have a burning desire to read it
Itโ€™s here

1. as a reference, so we can link back to it when required, and


2. for those who have worked through a number of applications, and now want to learn
more about the Python language

A variety of topics are treated in the lecture, including generators, exceptions and descriptors

14.3 Iterables and Iterators

Weโ€™ve already said something about iterating in Python


Now let's look more closely at how it all works, focusing on Python's implementation of the
for loop


14.3.1 Iterators

Iterators are a uniform interface to stepping through elements in a collection


Here weโ€™ll talk about using iteratorsโ€”later weโ€™ll learn how to build our own
Formally, an iterator is an object with a __next__ method
For example, file objects are iterators
To see this, letโ€™s have another look at the US cities data, which is written to the present
working directory in the following cell

In [1]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Writing us_cities.txt

In [2]: f = open('us_cities.txt')
f.__next__()

Out[2]: 'new york: 8244910\n'

In [3]: f.__next__()

Out[3]: 'los angeles: 3819702\n'

We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file
The next method can also be accessed via the builtin function next(), which directly calls
this method

In [4]: next(f)

Out[4]: 'chicago: 2707120\n'

The objects returned by enumerate() are also iterators

In [5]: e = enumerate(['foo', 'bar'])


next(e)

Out[5]: (0, 'foo')

In [6]: next(e)

Out[6]: (1, 'bar')



as are the reader objects from the csv module


Letโ€™s create a small csv file that contains data from the NIKKEI index

In [7]: %%file test_table.csv


Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83

Writing test_table.csv

In [8]: from csv import reader

f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)

Out[8]: ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [9]: next(nikkei_data)

Out[9]: ['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']

14.3.2 Iterators in For Loops

All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write

for x in iterator:
<code block>

then the interpreter

• calls iterator.__next__() and binds x to the result
• executes the code block
• repeats until a StopIteration error occurs

So now you know how this magical looking syntax works

f = open('somefile.txt', 'r')
for line in f:
    # do something

The interpreter just keeps

1. calling f.__next__() and binding line to the result


2. executing the body of the loop

This continues until a StopIteration error occurs
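To see that there is no magic here, the loop can be written out by hand using only the iterator protocol
This is an illustrative sketch; the helper name manual_for is our own, not a standard function

```python
# A hand-rolled version of `for x in iterator: action(x)`,
# using only iter/next and the StopIteration signal
def manual_for(iterator, action):
    while True:
        try:
            x = next(iterator)      # same as iterator.__next__()
        except StopIteration:
            break                   # the loop ends when StopIteration is raised
        action(x)

results = []
manual_for(iter([1, 2, 3]), results.append)
print(results)   # [1, 2, 3]
```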



14.3.3 Iterables

You already know that we can put a Python list to the right of in in a for loop

In [10]: for i in ['spam', 'eggs']:
    print(i)

spam
eggs

So does that mean that a list is an iterator?


The answer is no

In [11]: x = ['foo', 'bar']
type(x)

Out[11]: list

In [12]: next(x)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)

TypeError: 'list' object is not an iterator

So why can we iterate over a list in a for loop?


The reason is that a list is iterable (as opposed to an iterator)
Formally, an object is iterable if it can be converted to an iterator using the built-in function
iter()
Lists are one such object

In [13]: x = ['foo', 'bar']
type(x)

Out[13]: list

In [14]: y = iter(x)
type(y)

Out[14]: list_iterator

In [15]: next(y)

Out[15]: 'foo'

In [16]: next(y)

Out[16]: 'bar'

In [17]: next(y)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)

StopIteration:

Many other objects are iterable, such as dictionaries and tuples


Of course, not all objects are iterable

In [18]: iter(42)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)

TypeError: 'int' object is not iterable

To conclude our discussion of for loops

• for loops work on either iterators or iterables
• In the second case, the iterable is converted into an iterator before the loop starts
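For instance, any class that defines an __iter__ method is iterable
Here's a small sketch; the class Squares is our own illustration, not part of the lecture

```python
class Squares:
    "An illustrative iterable yielding the first n squares"

    def __init__(self, n):
        self.n = n

    def __iter__(self):     # called by iter(), and hence by for loops
        return iter(i * i for i in range(self.n))

squares = list(Squares(4))  # the loop machinery calls iter() for us
print(squares)   # [0, 1, 4, 9]
```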

14.3.4 Iterators and built-ins

Some built-in functions that act on sequences also work with iterables

• max(), min(), sum(), all(), any()

For example

In [19]: x = [10, -10]
max(x)

Out[19]: 10

In [20]: y = iter(x)
type(y)

Out[20]: list_iterator

In [21]: max(y)

Out[21]: 10

One thing to remember about iterators is that they are depleted by use

In [22]: x = [10, -10]
y = iter(x)
max(y)

Out[22]: 10

In [23]: max(y)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)

ValueError: max() arg is an empty sequence
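To make the depletion point vivid, here's a small sketch: once an iterator has been consumed, it stays empty

```python
x = [10, -10]
y = iter(x)
first_pass = list(y)     # consumes every element of the iterator
second_pass = list(y)    # the iterator is now depleted
print(first_pass)    # [10, -10]
print(second_pass)   # []
```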

14.4 Names and Name Resolution

14.4.1 Variable Names in Python

Consider the Python statement

In [24]: x = 42

We now know that when this statement is executed, Python creates an object of type int in
your computerโ€™s memory, containing

• the value 42
• some associated attributes

But what is x itself?


In Python, x is called a name, and the statement x = 42 binds the name x to the integer
object we have just discussed
Under the hood, this process of binding names to objects is implemented as a dictionary — more about this in a moment
There is no problem binding two or more names to the one object, regardless of what that
object is

In [25]: def f(string):    # Create a function called f
    print(string)          # that prints any string it's passed

g = f
id(g) == id(f)

Out[25]: True

In [26]: g('test')

test

In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here's an example of this situation, where the name x is first bound to one object and then rebound to another

In [27]: x = 'foo'
id(x)

Out[27]: 139979150881488

In [28]: x = 'bar' # No names bound to the first object

What happens here is that the first object is garbage collected


In other words, the memory slot that stores that object is deallocated, and returned to the
operating system
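In CPython, this garbage collection is driven by reference counting, which we can peek at with sys.getrefcount
A sketch; the exact counts are an implementation detail, but binding an extra name raises the count by one

```python
import sys

x = ['some', 'list']
before = sys.getrefcount(x)   # the call itself holds one temporary reference
y = x                         # bind a second name to the same object
after = sys.getrefcount(x)
print(after - before)   # 1
```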

14.4.2 Namespaces

Recall from the preceding discussion that the statement

In [29]: x = 42

binds the name x to the integer object on the right-hand side


We also mentioned that this process of binding x to the correct object is implemented as a
dictionary
This dictionary is called a namespace
Definition: A namespace is a symbol table that maps names to objects in memory
Python uses multiple namespaces, creating them on the fly as necessary
For example, every time we import a module, Python creates a namespace for that module
To see this in action, suppose we write a script math2.py with a single line

In [30]: %%file math2.py
pi = 'foobar'

Writing math2.py

Now we start the Python interpreter and import it



In [31]: import math2

Next let's import the math module from the standard library

In [32]: import math

Both of these modules have an attribute called pi

In [33]: math.pi

Out[33]: 3.141592653589793

In [34]: math2.pi

Out[34]: 'foobar'

These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary
We can look at the dictionary directly, using module_name.__dict__

In [35]: import math

math.__dict__.items()

Out[35]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

In [36]: import math2

math2.__dict__.items()

Out[36]: dict_items([('__name__', 'math2'), ('__doc__', None), ('__package__', ''), ('__loader__', <_frozen_im


All Rights Reserved.

Copyright (c) 2000 BeOpen.com.


All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives.


All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.


All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of
for supporting Python development. See www.python.org for more information., 'license': Type lic

As you know, we access elements of the namespace using the dotted attribute notation

In [37]: math.pi

Out[37]: 3.141592653589793

In fact this is entirely equivalent to math.__dict__['pi']

In [38]: math.__dict__['pi'] == math.pi

Out[38]: True

14.4.3 Viewing Namespaces

As we saw above, the math namespace can be printed by typing math.__dict__


Another way to see its contents is to type vars(math)

In [39]: vars(math).items()

Out[39]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

If you just want to see the names, you can type

In [40]: dir(math)[0:10]

Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']

Notice the special names __doc__ and __name__


These are initialized in the namespace when any module is imported

โ€ข __doc__ is the doc string of the module


โ€ข __name__ is the name of the module

In [41]: print(math.__doc__)

This module is always available. It provides access to the
mathematical functions defined by the C standard.

In [42]: math.__name__

Out[42]: 'math'

14.4.4 Interactive Sessions

In Python, all code executed by the interpreter runs in some module


What about commands typed at the prompt?
These are also regarded as being executed within a module — in this case, a module called __main__
To check this, we can look at the current module name via the value of __name__ given at
the prompt

In [43]: print(__name__)

__main__

When we run a script using IPython's run command, the contents of the file are executed as part of __main__ too
To see this, let's create a file mod.py that prints its own __name__ attribute

In [44]: %%file mod.py
print(__name__)

Writing mod.py

Now let's look at two different ways of running it in IPython

In [45]: import mod # Standard import

mod

In [46]: %run mod.py # Run interactively

__main__

In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos

In [47]: x = 2
y = 3

import numpy as np

%whos

Variable Type Data/Info
-----------------------------------------------------
e enumerate <enumerate object at 0x7f4f6c16f708>
f function <function f at 0x7f4f6c1c7048>
g function <function f at 0x7f4f6c1c7048>
i str eggs
math module <module 'math' from '/hom<…>37m-x86_64-linux-gnu.so'>
math2 module <module 'math2' from '/ho<…>pyter/executed/math2.py'>
mod module <module 'mod' from '/home<…>jupyter/executed/mod.py'>
nikkei_data reader <_csv.reader object at 0x7f4f6c178588>
np module <module 'numpy' from '/ho<…>kages/numpy/__init__.py'>
reader builtin_function_or_method <built-in function reader>
x int 2
y int 3

14.4.5 The Global Namespace

Python documentation often makes reference to the "global namespace"


The global namespace is the namespace of the module currently being executed
For example, suppose that we start the interpreter and begin making assignments
We are now working in the module __main__, and hence the namespace for __main__ is
the global namespace
Next, we import a module called amodule

import amodule

At this point, the interpreter creates a namespace for the module amodule and starts executing commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the import statement was made
In this case it's __main__, so the namespace of __main__ again becomes the global namespace

14.4.6 Local Namespaces

Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with locals()
For example, consider

In [48]: def f(x):
    a = 2
    print(locals())
    return a * x

Now let's call the function

In [49]: f(1)

{'x': 1, 'a': 2}

Out[49]: 2

You can see the local namespace of f before it is destroyed



14.4.7 The __builtins__ Namespace

We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?

• These definitions are stored in the builtins module
• They have their own namespace, called __builtins__

In [50]: dir()[0:10]

Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']

In [51]: dir(__builtins__)[0:10]

Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']

We can access elements of the namespace as follows

In [52]: __builtins__.max

Out[52]: <function max>

But __builtins__ is special, because we can always access them directly as well

In [53]: max

Out[53]: <function max>

In [54]: __builtins__.max == max

Out[54]: True
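This fallback also means a builtin can be shadowed by a global name and recovered again
A small sketch using the builtins module

```python
import builtins

max = 'no longer a function'   # shadow the builtin name in the global namespace
print(builtins.max(1, 2))      # 2 -- the original function is still reachable
del max                        # remove the shadow
print(max(1, 2))               # 2 -- name resolution falls back to the builtin
```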

The next section explains how this works …

14.4.8 Name Resolution

Namespaces are great because they help us organize variable names


(Type import this at the prompt and look at the last item that's printed)
However, we do need to understand how the Python interpreter works with multiple namespaces

At any point of execution, there are in fact at least two namespaces that can be accessed directly
("Accessed directly" means without using a dot, as in pi rather than math.pi)
These namespaces are

• The global namespace (of the module being executed)
• The builtin namespace

If the interpreter is executing a function, then the directly accessible namespaces are

• The local namespace of the function
• The global namespace (of the module being executed)
• The builtin namespace

Sometimes functions are defined within other functions, like so

In [55]: def f():
    a = 2
    def g():
        b = 4
        print(a * b)
    g()

Here f is the enclosing function for g, and each function gets its own namespaces
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is

1. the local namespace (if it exists)


2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace

If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here's an example that helps to illustrate
Consider a script test.py that looks as follows

In [56]: %%file test.py
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print("a = ", a, "y = ", y)

Writing test.py

What happens when we run this script?



In [57]: %run test.py

a = 0 y = 11

In [58]: x

Out[58]: 2

First,

• The global namespace {} is created
• The function object is created, and g is bound to it within the global namespace
• The name a is bound to 0, again in the global namespace

Next g is called via y = g(10), leading to the following sequence of actions

• The local namespace for the function is created
• Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}
• Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to the result
• This value is returned, and y is bound to it in the global namespace
• Local x and a are discarded (and the local namespace is deallocated)

Note that the global a was not affected by the local a
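The enclosing layer of the rule can be seen directly too
In this sketch the inner function finds a in the enclosing namespace before ever reaching the global one

```python
a = 'global'

def outer():
    a = 'enclosing'
    def inner():
        return a       # found in the enclosing namespace of outer
    return inner()

print(outer())   # enclosing
print(a)         # global -- untouched
```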

14.4.9 Mutable Versus Immutable Parameters

This is a good time to say a little more about mutable vs immutable objects
Consider the code segment

In [59]: def f(x):
    x = x + 1
    return x

x = 1
print(f(x), x)

2 1

We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it's a different story when we use a mutable data type such as a list

In [60]: def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)

[2] [2]

This prints [2] as the value of f(x) and the same for x

Here's what happens

• f is registered as a function in the global namespace
• x is bound to [1] in the global namespace
• The call f(x)

  – Creates a local namespace
  – Adds x to the local namespace, bound to [1]
  – The list [1] is modified to [2]
  – Returns the list [2]
  – The local namespace is deallocated, and local x is lost

• Global x has been modified
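If we want to shield the caller from such side effects, one common pattern is for the function to work on a copy
A sketch

```python
def f(x):
    x = x[:]            # shallow copy, so the caller's list is left alone
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)   # [2] [1]
```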

14.5 Handling Errors

Sometimes it's possible to anticipate errors as we're writing code

For example, the unbiased sample variance of a sample $y_1, \ldots, y_n$ is defined as

$$
s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \text{sample mean}
$$

This can be calculated in NumPy using np.var


But if you were writing a function to handle such a calculation, you might anticipate a divide-by-zero error when the sample size is one
One possible action is to do nothing — the program will just crash, and spit out an error message
But sometimes it's worth writing your code in a way that anticipates and deals with runtime errors that you think might arise
Why?

• Because the debugging information provided by the interpreter is often less useful than the information on possible errors you have in your head when writing code
• Because errors causing execution to stop are frustrating if you're in the middle of a large computation
• Because it reduces confidence in your code on the part of your users (if you are writing for others)

14.5.1 Assertions

A relatively easy way to handle checks is with the assert keyword


For example, pretend for a moment that the np.var function doesn't exist and we need to write our own

In [61]: def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)

If we run this with an array of length one, the program will terminate and print our error
message

In [62]: var([1])

---------------------------------------------------------------------------

AssertionError Traceback (most recent call last)

<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])

<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.

The advantage is that we can

• fail early, as soon as we know there will be a problem
• supply specific information on why a program is failing
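One caveat: assert statements are skipped entirely when Python runs with the -O flag, so for user-facing checks it can be safer to raise an exception explicitly
A pure-Python sketch of the same check

```python
def var(y):
    n = len(y)
    if n <= 1:
        raise ValueError('Sample size must be greater than one.')
    ybar = sum(y) / n
    return sum((yi - ybar)**2 for yi in y) / (n - 1)

print(var([2.0, 4.0, 6.0]))   # 4.0
```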

14.5.2 Handling Errors During Runtime

The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Let's look at how this is done
Exceptions
Here's an example of a common error type

In [63]: def f:

File "<ipython-input-63-262a7e387ba5>", line 1


def f:
^
SyntaxError: invalid syntax

Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Here's a different kind of error, unrelated to syntax

In [64]: 1 / 0

---------------------------------------------------------------------------

ZeroDivisionError Traceback (most recent call last)

<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero

Here's another

In [65]: x1 = y1

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1

NameError: name 'y1' is not defined

And another

In [66]: 'foo' + 6

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str

And another

In [67]: X = []
x = X[0]

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]

IndexError: list index out of range



On each occasion, the interpreter informs us of the error type

• NameError, TypeError, IndexError, ZeroDivisionError, etc.

In Python, these errors are called exceptions


Catching Exceptions
We can catch and deal with exceptions using try – except blocks
Here's a simple example

In [68]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
        return None

When we call f we get the following output

In [69]: f(2)

Out[69]: 0.5

In [70]: f(0)

Error: division by zero. Returned None

In [71]: f(0.0)

Error: division by zero. Returned None

The error is caught and execution of the program is not terminated


Note that other error types are not caught
If we are worried the user might pass in a string, we can catch that error too

In [72]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: Division by zero. Returned None')
    except TypeError:
        print('Error: Unsupported operation. Returned None')
    return None

Here's what happens

In [73]: f(2)

Out[73]: 0.5

In [74]: f(0)

Error: Division by zero. Returned None

In [75]: f('foo')

Error: Unsupported operation. Returned None

If we feel lazy we can catch these errors together

In [76]: def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print('Error: Unsupported operation. Returned None')
        return None

Here's what happens

In [77]: f(2)

Out[77]: 0.5

In [78]: f(0)

Error: Unsupported operation. Returned None

In [79]: f('foo')

Error: Unsupported operation. Returned None

If we feel extra lazy we can catch all error types as follows

In [80]: def f(x):
    try:
        return 1.0 / x
    except:
        print('Error. Returned None')
        return None

In general it's better to be specific
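If needed, the exception object itself can be captured with the as keyword, which keeps a handler generic while still reporting what went wrong
A sketch

```python
def f(x):
    try:
        return 1.0 / x
    except Exception as e:      # bind the exception object to the name e
        print(f'Error ({type(e).__name__}). Returned None')
        return None

print(f(2))       # 0.5
print(f(0))       # prints the error message, returns None
print(f('foo'))
```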

14.6 Decorators and Descriptors

Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other people's code
Hence you need to understand them at some stage of your Python education

14.6.1 Decorators

Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popular
It's very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
Suppose we are working on a program that looks something like this

In [81]: import numpy as np

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Now suppose there's a problem: occasionally negative numbers get fed to f and g in the calculations that follow
If you try it, you'll see that when these functions are called with negative numbers they return a NumPy object called nan
This stands for "not a number" (and indicates that you are trying to evaluate a mathematical function at a point where it is not defined)
Perhaps this isn't what we want, because it causes other problems that are hard to pick up later on
Suppose that instead we want the program to terminate whenever this happens, with a sensible error message
This change is easy enough to implement

In [82]: import numpy as np

def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try hard to avoid
Here it's not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times

The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater

In [83]: import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g

This looks complicated so let's work through it slowly

To unravel the logic, consider what happens when we say f = check_nonneg(f)
This calls the function check_nonneg with parameter func set equal to f
Now check_nonneg creates a new function called safe_function that verifies x as nonnegative and then calls func on it (which is the same as f)
Finally, the global name f is then set equal to safe_function
Now the behavior of f is as we desire, and the same is true of g
At the same time, the test logic is written only once
Enter Decorators
The last version of our code is still not ideal
For example, if someone is reading our code and wants to know how f works, they will be
looking for the function definition, which is

In [84]: def f(x):
    return np.log(np.log(x))

They may well miss the line f = check_nonneg(f)


For this and other reasons, decorators were introduced to Python
With decorators, we can replace the lines

In [85]: def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with

In [86]: @check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)

These two pieces of code do exactly the same thing


If they do the same thing, do we really need decorator syntax?
Well, notice that the decorators sit right on top of the function definitions
Hence anyone looking at the definition of the function will see them and be aware that the
function is modified
In the opinion of many people, this makes the decorator syntax a significant improvement to
the language
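One wrinkle: the wrapper returned by check_nonneg replaces the original function's name and docstring
The standard library's functools.wraps fixes this, as in the following sketch

```python
import functools

def check_nonneg(func):
    @functools.wraps(func)      # copy __name__, __doc__ etc. onto the wrapper
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

@check_nonneg
def f(x):
    "Return the square root of x"
    return x ** 0.5

print(f.__name__)   # 'f', not 'safe_function'
print(f.__doc__)    # the original docstring survives
```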

14.6.2 Descriptors

Descriptors solve a common problem regarding management of variables


To understand the issue, consider a Car class, that simulates a car
Suppose that this class defines the variables miles and kms, which give the distance traveled
in miles and kilometers respectively
A highly simplified version of the class might look as follows

In [87]: class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted

One potential problem we might have here is that a user alters one of these variables but not
the other

In [88]: car = Car()
car.miles

Out[88]: 1000

In [89]: car.kms

Out[89]: 1610.0

In [90]: car.miles = 6000
car.kms

Out[90]: 1610.0

In the last two lines we see that miles and kms are out of sync

What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated
A Solution
In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class

In [91]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)

First let's check that we get the desired behavior

In [92]: car = Car()
car.miles

Out[92]: 1000

In [93]: car.miles = 6000
car.kms

Out[93]: 9660.0

Yep, that's what we want — car.kms is automatically updated


How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the
variables
The objects miles and kms are properties, a common kind of descriptor
The methods get_miles, set_miles, get_kms and set_kms define what happens when
you get (i.e. access) or set (bind) these variables

• So-called "getter" and "setter" methods



The builtin Python function property takes getter and setter methods and creates a property
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is triggered — in this case set_miles
Decorators and Properties
These days it's very common to see the property function used via a decorator
Hereโ€™s another version of our Car class that works as before but now uses decorators to set
up the properties

In [94]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61

We won't go through all the details here


For further information you can refer to the descriptor documentation
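Properties are one ready-made descriptor; the underlying protocol is just the methods __get__ and __set__ (plus __set_name__ on Python 3.6+)
Here's a sketch of a custom descriptor written from scratch; the class Positive is our own illustration, not part of the lecture

```python
class Positive:
    "An illustrative descriptor that rejects non-positive values"

    def __set_name__(self, owner, name):    # called when the owning class is built
        self.storage = '_' + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f'{self.storage[1:]} must be positive')
        setattr(obj, self.storage, value)

class Car:
    miles = Positive()                      # miles is now managed by the descriptor

    def __init__(self, miles=1000):
        self.miles = miles                  # triggers Positive.__set__

car = Car()
print(car.miles)   # 1000
```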

14.7 Generators

A generator is a kind of iterator (i.e., it works with a next function)


We will study two ways to build generators: generator expressions and generator functions

14.7.1 Generator Expressions

The easiest way to build generators is using generator expressions


Just like a list comprehension, but with round brackets
Here is the list comprehension:

In [95]: singular = ('dog', 'cat', 'bird')
type(singular)

Out[95]: tuple

In [96]: plural = [string + 's' for string in singular]
plural

Out[96]: ['dogs', 'cats', 'birds']

In [97]: type(plural)

Out[97]: list

And here is the generator expression

In [98]: singular = ('dog', 'cat', 'bird')
plural = (string + 's' for string in singular)
type(plural)

Out[98]: generator

In [99]: next(plural)

Out[99]: 'dogs'

In [100]: next(plural)

Out[100]: 'cats'

In [101]: next(plural)

Out[101]: 'birds'

Since sum() can be called on iterators, we can do this

In [102]: sum((x * x for x in range(10)))

Out[102]: 285

The function sum() calls next() to get the items and adds successive terms
In fact, we can omit the outer brackets in this case

In [103]: sum(x * x for x in range(10))

Out[103]: 285

14.7.2 Generator Functions

The most flexible way to create generator objects is to use generator functions
Let's look at some examples
Example 1
Here's a very simple example of a generator function

In [104]: def f():
    yield 'start'
    yield 'middle'
    yield 'end'

It looks like a function, but uses a keyword yield that we haven't met before
Let's see how it works after running this code

In [105]: type(f)

Out[105]: function

In [106]: gen = f()
gen

Out[106]: <generator object f at 0x7f4f6c1bb1b0>

In [107]: next(gen)

Out[107]: 'start'

In [108]: next(gen)

Out[108]: 'middle'

In [109]: next(gen)

Out[109]: 'end'

In [110]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen)

• Executes code in the body of f() until it meets a yield statement
• Returns that value to the caller of next(gen)

The second call to next(gen) starts executing from the next line

In [111]: def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'

and continues until the next yield statement


At that point it returns the value following yield to the caller of next(gen), and so on
When the code block ends, the generator throws a StopIteration error
Example 2
Our next example receives an argument x from the caller

In [112]: def g(x):
    while x < 100:
        yield x
        x = x * x

Let's see how it works

In [113]: g

Out[113]: <function __main__.g(x)>

In [114]: gen = g(2)
type(gen)

Out[114]: generator

In [115]: next(gen)

Out[115]: 2

In [116]: next(gen)

Out[116]: 4

In [117]: next(gen)

Out[117]: 16

In [118]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The call gen = g(2) binds gen to a generator


Inside the generator, the name x is bound to 2
When we call next(gen)

โ€ข The body of g() executes until the line yield x, and the value of x is returned

Note that the value of x is retained inside the generator


When we call next(gen) again, execution continues from where it left off

In [119]: def g(x):
              while x < 100:
                  yield x
                  x = x * x # execution continues from here

When x < 100 fails, the generator throws a StopIteration error


Incidentally, the loop inside the generator can be infinite

In [120]: def g(x):
              while 1:
                  yield x
                  x = x * x
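One standard way to consume finitely many values from such an infinite generator is itertools.islice, which lazily takes a prefix of the stream. A minimal sketch:

```python
from itertools import islice

def g(x):
    # An infinite generator: yields x, x**2, x**4, ...
    while 1:
        yield x
        x = x * x

# islice pulls only the first four values, so the loop never runs away
print(list(islice(g(2), 4)))  # [2, 4, 16, 256]
```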

14.7.3 Advantages of Iterators

Whatโ€™s the advantage of using an iterator here?


Suppose we want to sample from a binomial(n, 0.5) distribution, i.e., count the number of heads in n flips of a fair coin
One way to do it is as follows

In [121]: import random
          n = 10000000
          draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
          sum(draws)

Out[121]: 5001162

But we are creating a huge list here: draws holds all n Boolean values at once
(In Python 3, range(n) is itself lazy, so it is the list comprehension that eats the memory)
This uses lots of memory and is very slow
If we make n even bigger then this happens

In [122]: n = 100000000
          draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

We can avoid these problems using iterators


Here is the generator function

In [123]: def f(n):
              i = 1
              while i <= n:
                  yield random.uniform(0, 1) < 0.5
                  i += 1

Now letโ€™s do the sum

In [124]: n = 10000000
          draws = f(n)
          draws

Out[124]: <generator object f at 0x7f4f4fdfbb88>

In [125]: sum(draws)

Out[125]: 5000216

In summary, iterables

โ€ข avoid the need to create big lists/tuples, and


โ€ข provide a uniform interface to iteration that can be used transparently in for loops
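Generator expressions give the same laziness with even less code. Here is a sketch of the binomial draw above written that way, with a smaller n and a fixed seed so repeated runs give the same count (both choices are ours, not the lecture's):

```python
import random

random.seed(1234)   # fixed seed, purely so the result is reproducible
n = 1_000_000

# A generator expression: no list of draws is ever materialized;
# sum() consumes the Booleans one at a time
count = sum(random.uniform(0, 1) < 0.5 for i in range(n))
print(count)
```

Memory use stays constant in n, just as with the handwritten generator function.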

14.8 Recursive Function Calls

This is not something that you will use every day, but it is still useful โ€” you should learn it
at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing $x_t$ for some $t$ when

$$x_{t+1} = 2 x_t, \qquad x_0 = 1 \tag{1}$$

Obviously the answer is $2^t$


We can compute this easily enough with a loop

In [126]: def x_loop(t):
              x = 1
              for i in range(t):
                  x = 2 * x
              return x

We can also use a recursive solution, as follows

In [127]: def x(t):
              if t == 0:
                  return 1
              else:
                  return 2 * x(t-1)

What happens here is that each successive call uses its own frame in the stack

โ€ข a frame is where the local variables of a given function call are held
โ€ข the stack is the memory used to process function calls
โ€“ a Last In First Out (LIFO) structure

This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution
Weโ€™ll meet less contrived applications of recursion later on

14.9 Exercises

14.9.1 Exercise 1

The Fibonacci numbers are defined by

$$x_{t+1} = x_t + x_{t-1}, \qquad x_0 = 0, \; x_1 = 1 \tag{2}$$

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the $t$-th Fibonacci number for any $t$

14.9.2 Exercise 2

Complete the following code, and test it using this csv file, which we assume that youโ€™ve put
in your current working directory

def column_iterator(target_file, column_number):
    """A generator function for CSV files.

    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print(date)

14.9.3 Exercise 3

Suppose we have a text file numbers.txt containing the following lines

prices
3
8

7
21

Using try โ€“ except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers

14.10 Solutions

14.10.1 Exercise 1

Hereโ€™s the standard solution

In [128]: def x(t):
              if t == 0:
                  return 0
              if t == 1:
                  return 1
              else:
                  return x(t-1) + x(t-2)

Letโ€™s test it

In [129]: print([x(i) for i in range(10)])

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
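The doubly-recursive solution above recomputes the same subproblems exponentially many times. One standard remedy (a sketch, not part of the lecture's solution) is to memoize with functools.lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)    # cache every value ever computed
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    return x(t-1) + x(t-2)

print([x(i) for i in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(x(100))                      # fast, despite the double recursion
```

Each x(t) is now computed at most once, so the running time is linear in t.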

14.10.2 Exercise 2

One solution is as follows

In [130]: def column_iterator(target_file, column_number):
              """A generator function for CSV files.

              When called with a file name target_file (string) and column number
              column_number (integer), the generator function returns a generator
              which steps through the elements of column column_number in file
              target_file.
              """
              f = open(target_file, 'r')
              for line in f:
                  yield line.split(',')[column_number - 1]
              f.close()

          dates = column_iterator('test_table.csv', 1)

          i = 1
          for date in dates:
              print(date)
              if i == 10:
                  break
              i += 1

Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
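One caveat about this solution: because the loop above breaks out early, the f.close() line after the yield is never reached. Wrapping the file in a with block is a safer variant (a sketch, with the docstring shortened):

```python
def column_iterator(target_file, column_number):
    """Yield the entries of column column_number in target_file."""
    with open(target_file, 'r') as f:    # the file is closed even if the
        for line in f:                   # caller abandons iteration early
            yield line.split(',')[column_number - 1]
```

When the caller calls gen.close() on the generator, or the generator is garbage-collected, GeneratorExit is raised inside the with block and the file handle is released.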

14.10.3 Exercise 3

Letโ€™s save the data first



In [131]: %%file numbers.txt
          prices
          3
          8

          7
          21

Writing numbers.txt

In [132]: f = open('numbers.txt')
          total = 0.0
          for line in f:
              try:
                  total += float(line)
              except ValueError:
                  pass
          f.close()
          print(total)

39.0
15 Debugging

15.1 Contents

โ€ข Overview 15.2

โ€ข Debugging 15.3

โ€ข Other Useful Magics 15.4

โ€œDebugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.โ€ โ€“ Brian Kernighan

15.2 Overview

Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that
(OK, sometimes we still do thatโ€ฆ)
But once you start writing larger programs youโ€™ll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here weโ€™ll focus on Jupyter and leave you to explore other settings
Weโ€™ll need the following imports

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

15.3 Debugging

15.3.1 The debug Magic

Letโ€™s consider a simple (and rather contrived) example


In [2]: def plot_log():
            fig, ax = plt.subplots(2, 1)
            x = np.linspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-2-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-2-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

This code is intended to plot the log function over the interval [1, 2]
But thereโ€™s an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method

But letโ€™s pretend that we donโ€™t understand this for the moment
We might suspect thereโ€™s something wrong with ax but when we try to investigate this object, we get the following exception:

In [3]: ax

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-3-b00e77935981> in <module>
----> 1 ax

NameError: name 'ax' is not defined

The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates
Letโ€™s try doing it a different way
We run the first cell block again, generating the same error

In [4]: def plot_log():
            fig, ax = plt.subplots(2, 1)
            x = np.linspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-4-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-4-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'



But this time we type in the following cell block

%debug

You should be dropped into a new prompt that looks something like this

ipdb>

(You might see pdb> instead)


Now we can investigate the value of our variables at this point in the program, step forward
through the code, etc.
For example, here we simply type the name ax to see whatโ€™s happening with this object:

ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)

Itโ€™s now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help

ipdb> h

Documented commands (type help <topic>):


========================================
EOF bt cont enable jump pdef r tbreak w
a c continue exit l pdoc restart u whatis
alias cl d h list pinfo return unalias where

args clear debug help n pp run unt


b commands disable ignore next q s until
break condition down j p quit step up

Miscellaneous help topics:


==========================
exec pdb

Undocumented commands:
======================
retval rv

ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.

15.3.2 Setting a Break Point

The preceding approach is handy but sometimes insufficient


Consider the following modified version of our function above

In [5]: def plot_log():
            fig, ax = plt.subplots()
            x = np.logspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log()

Here the original problem is fixed, but weโ€™ve accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)

Now there wonโ€™t be any exception, but the plot wonโ€™t look right
To investigate, it would be helpful if we could inspect variables like x during execution of the
function
To this end, we add a โ€œbreak pointโ€ by inserting breakpoint() inside the function code
block

def plot_log():
    breakpoint()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Now letโ€™s run the script, and investigate via the debugger

> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])

We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q

15.4 Other Useful Magics

In this lecture, we used the %debug IPython magic


There are many other useful magics:

โ€ข %precision 4 sets printed precision for floats to 4 decimal places


โ€ข %whos gives a list of variables and their values
โ€ข %quickref gives a list of magics

The full list of magics is here


Part IV

Data and Empirics

16 Pandas

16.1 Contents

โ€ข Overview 16.2

โ€ข Series 16.3

โ€ข DataFrames 16.4

โ€ข On-Line Data Sources 16.5

โ€ข Exercises 16.6

โ€ข Solutions 16.7

16.2 Overview

Pandas is a package of fast, efficient data analysis tools for Python


Its popularity has surged in recent years, coincident with the rise of fields such as data science
and machine learning
Hereโ€™s a popularity comparison over time against STATA and SAS, courtesy of Stack Overflow Trends


Just as NumPy provides the basic array data type plus core array operations, pandas

1. defines fundamental structures for working with data and


2. endows them with methods that facilitate operations such as

โ€ข reading in data
โ€ข adjusting indices
โ€ข working with dates and time series
โ€ข sorting, grouping, re-ordering and general data munging [1]
โ€ข dealing with missing values, etc., etc.

More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas
This lecture will provide a basic introduction to pandas
Throughout the lecture, we will assume that the following imports have taken place

In [1]: import pandas as pd
        import numpy as np

16.3 Series

Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a โ€œcolumnโ€ of data, such as a collection of observations on a
single variable
A DataFrame is an object for storing related columns of data
Letโ€™s start with Series

In [2]: s = pd.Series(np.random.randn(4), name='daily returns')
        s

Out[2]: 0 0.246617
1 1.616297

2 1.371344
3 -0.854713
Name: daily returns, dtype: float64

Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares
Pandas Series are built on top of NumPy arrays and support many similar operations

In [3]: s * 100

Out[3]: 0 24.661661
1 161.629724
2 137.134394
3 -85.471300
Name: daily returns, dtype: float64

In [4]: np.abs(s)

Out[4]: 0 0.246617
1 1.616297
2 1.371344
3 0.854713
Name: daily returns, dtype: float64

But Series provide more than NumPy arrays


Not only do they have some additional (statistically oriented) methods

In [5]: s.describe()

Out[5]: count 4.000000


mean 0.594886
std 1.135605
min -0.854713
25% -0.028716
50% 0.808980
75% 1.432582
max 1.616297
Name: daily returns, dtype: float64

But their indices are more flexible

In [6]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
        s

Out[6]: AMZN 0.246617


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same typeโ€”in this case, floats)
In fact, you can use much of the same syntax as Python dictionaries

In [7]: s['AMZN']

Out[7]: 0.24661661104520952

In [8]: s['AMZN'] = 0
s

Out[8]: AMZN 0.000000


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

In [9]: 'AAPL' in s

Out[9]: True
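One more difference from NumPy arrays worth knowing: arithmetic between two Series aligns on the index labels rather than on position. A toy sketch (the tickers and numbers are invented):

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0], index=['AMZN', 'AAPL'])
s2 = pd.Series([10.0, 20.0], index=['AAPL', 'GOOG'])

# Addition matches labels: 'AAPL' appears in both Series,
# the other labels produce NaN in the result
combined = s1 + s2
print(combined)
```

This label alignment is what makes combining data from different sources safe: values are never added positionally by accident.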

16.4 DataFrames

While a Series is a single column of data, a DataFrame is several columns, one for each
variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns
Letโ€™s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here
Hereโ€™s the content of test_pwt.csv

"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","

Supposing you have this data saved as test_pwt.csv in the present working directory (type
%pwd in Jupyter to see what this is), it can be read in as follows:

In [10]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pwt.csv')
         type(df)

Out[10]: pandas.core.frame.DataFrame

In [11]: df

Out[11]: country country isocode year POP XRAT tcgdp \


0 Argentina ARG 2000 37335.653 0.999500 2.950722e+05
1 Australia AUS 2000 19053.186 1.724830 5.418047e+05

2 India IND 2000 1006300.297 44.941600 1.728144e+06


3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03
5 South Africa ZAF 2000 45064.098 6.939830 2.272424e+05
6 United States USA 2000 282171.957 1.000000 9.898700e+06
7 Uruguay URY 2000 3219.793 12.099592 2.525596e+04

cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068

We can select particular rows using standard Python array slicing notation

In [12]: df[2:5]

Out[12]: country country isocode year POP XRAT tcgdp \


2 India IND 2000 1006300.297 44.941600 1.728144e+06
3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03

cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954

To select columns, we can pass a list containing the names of the desired columns represented
as strings

In [13]: df[['country', 'tcgdp']]

Out[13]: country tcgdp


0 Argentina 2.950722e+05
1 Australia 5.418047e+05
2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04

To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]

In [14]: df.iloc[2:5, 0:4]

Out[14]: country country isocode year POP


2 India IND 2000 1006300.297
3 Israel ISR 2000 6114.570
4 Malawi MWI 2000 11801.505

To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way

In [15]: df.loc[df.index[2:5], ['country', 'tcgdp']]



Out[15]: country tcgdp


2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
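Rows can also be selected with a boolean condition: df[mask] keeps the rows where mask is True. A sketch on a toy frame (the labels, values and threshold below are invented):

```python
import pandas as pd

toy = pd.DataFrame({'country': ['A', 'B', 'C'],
                    'POP': [100.0, 5000.0, 250.0]})

# The comparison produces a Boolean Series; indexing with it
# keeps only the rows where the condition holds
large = toy[toy['POP'] > 200]
print(large)
```

This pattern combines naturally with loc, e.g. toy.loc[toy['POP'] > 200, 'country'].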

Letโ€™s imagine that weโ€™re only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above

In [16]: df = df[['country', 'POP', 'tcgdp']]
         df

Out[16]: country POP tcgdp


0 Argentina 37335.653 2.950722e+05
1 Australia 19053.186 5.418047e+05
2 India 1006300.297 1.728144e+06
3 Israel 6114.570 1.292539e+05
4 Malawi 11801.505 5.026222e+03
5 South Africa 45064.098 2.272424e+05
6 United States 282171.957 9.898700e+06
7 Uruguay 3219.793 2.525596e+04

Here the index 0, 1,..., 7 is redundant because we can use the country names as an index
To do this, we set the index to be the country variable in the dataframe

In [17]: df = df.set_index('country')
         df

Out[17]: POP tcgdp


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Letโ€™s give the columns slightly better names

In [18]: df.columns = 'population', 'total GDP'
         df

Out[18]: population total GDP


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Population is in thousands; letโ€™s revert to single units

In [19]: df['population'] = df['population'] * 1e3
         df

Out[19]: population total GDP


country
Argentina 3.733565e+07 2.950722e+05
Australia 1.905319e+07 5.418047e+05
India 1.006300e+09 1.728144e+06
Israel 6.114570e+06 1.292539e+05
Malawi 1.180150e+07 5.026222e+03
South Africa 4.506410e+07 2.272424e+05
United States 2.821720e+08 9.898700e+06
Uruguay 3.219793e+06 2.525596e+04

Next, weโ€™re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions

In [20]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']
         df

Out[20]: population total GDP GDP percap


country
Argentina 3.733565e+07 2.950722e+05 7903.229085
Australia 1.905319e+07 5.418047e+05 28436.433261
India 1.006300e+09 1.728144e+06 1717.324719
Israel 6.114570e+06 1.292539e+05 21138.672749
Malawi 1.180150e+07 5.026222e+03 425.896679
South Africa 4.506410e+07 2.272424e+05 5042.647686
United States 2.821720e+08 9.898700e+06 35080.381854
Uruguay 3.219793e+06 2.525596e+04 7843.970620

One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita

In [21]: import matplotlib.pyplot as plt
         %matplotlib inline

         df['GDP percap'].plot(kind='bar')
         plt.show()

At the moment the data frame is ordered alphabetically on the countriesโ€”letโ€™s change it to
GDP per capita

In [22]: df = df.sort_values(by='GDP percap', ascending=False)
         df

Out[22]: population total GDP GDP percap


country
United States 2.821720e+08 9.898700e+06 35080.381854
Australia 1.905319e+07 5.418047e+05 28436.433261
Israel 6.114570e+06 1.292539e+05 21138.672749
Argentina 3.733565e+07 2.950722e+05 7903.229085
Uruguay 3.219793e+06 2.525596e+04 7843.970620
South Africa 4.506410e+07 2.272424e+05 5042.647686
India 1.006300e+09 1.728144e+06 1717.324719
Malawi 1.180150e+07 5.026222e+03 425.896679

Plotting as before now yields

In [23]: df['GDP percap'].plot(kind='bar')
         plt.show()

16.5 On-Line Data Sources

Python makes it straightforward to query online databases programmatically


An important database for economists is FRED โ€” a vast collection of time series data maintained by the St. Louis Fed
For example, suppose that we are interested in the unemployment rate
Via FRED, the entire series for the US civilian unemployment rate can be downloaded directly by entering this URL into your browser (note that this requires an internet connection)

https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv

(Equivalently, click here: https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv)
This request returns a CSV file, which will be handled by your default application for this
class of files
Alternatively, we can access the CSV file from within a Python program
This can be done with a variety of methods
We start with a relatively low-level method and then return to pandas

16.5.1 Accessing Data with requests

One option is to use requests, a popular third-party Python library for requesting data over the Internet
To begin, try the following code on your computer

In [24]: import requests

         r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

If thereโ€™s no error message, then the call has succeeded


If you do get an error, then there are two likely causes

1. You are not connected to the Internet โ€” hopefully, this isnโ€™t the case
2. Your machine is accessing the Internet through a proxy server, and Python isnโ€™t aware
of this

In the second case, you can either

โ€ข switch to another machine


โ€ข solve your proxy problem by reading the documentation

Assuming that all is working, you can now proceed to use the source object returned by the call requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

In [25]: url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
         source = requests.get(url).content.decode().split("\n")
         source[0]

Out[25]: 'DATE,VALUE\r'

In [26]: source[1]

Out[26]: '1948-01-01,3.4\r'

In [27]: source[2]

Out[27]: '1948-02-01,3.8\r'

We could now write some additional code to parse this text and store it as an array
But this is unnecessary โ€” pandasโ€™ read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering

In [28]: data = pd.read_csv(url, index_col=0, parse_dates=True)

The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
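Incidentally, read_csv doesn't need a URL or a file on disk: it can parse text already sitting in memory via io.StringIO. A small sketch with made-up rows in the same DATE,VALUE layout as the FRED file:

```python
import io
import pandas as pd

# Two invented rows in the DATE,VALUE layout
text = "DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n"

# io.StringIO wraps the string in a file-like object that read_csv accepts
frame = pd.read_csv(io.StringIO(text), index_col=0, parse_dates=True)
print(frame)
```

This is handy when, as above, the raw text has already been downloaded with requests.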

In [29]: type(data)

Out[29]: pandas.core.frame.DataFrame

In [30]: data.head() # A useful method to get a quick look at a data frame

Out[30]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5

In [31]: pd.set_option('precision', 1)
         data.describe() # Your output might differ slightly

Out[31]: VALUE
count 857.0
mean 5.8
std 1.6
min 2.5
25% 4.6
50% 5.6
75% 6.8
max 10.8

We can also plot the unemployment rate from 2006 to 2012 as follows

In [32]: data['2006':'2012'].plot()
         plt.show()

16.5.2 Accessing World Bank Data

Letโ€™s look at one more example of downloading and manipulating data โ€” this time from the
World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, hereโ€™s some data on government debt as a ratio to GDP
If you click on โ€œDOWNLOAD DATAโ€ you will be given the option to download the data as
an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots
time series for the US and Australia

In [33]: import matplotlib.pyplot as plt
         import requests
         import pandas as pd

         # == Get data and read into file gd.xls == #
         wb_data_query = "http://api.worldbank.org/v2/en/indicator/gc.dod.totl.gd.zs?downloadformat=excel"
         r = requests.get(wb_data_query)
         with open('gd.xls', 'wb') as output:
             output.write(r.content)

         # == Parse data into a DataFrame == #
         govt_debt = pd.read_excel('gd.xls', sheet_name='Data', skiprows=3, index_col=1)

         # == Take desired values and plot == #
         govt_debt = govt_debt.transpose()
         govt_debt = govt_debt[['AUS', 'USA']]
         govt_debt = govt_debt[38:]
         govt_debt.plot(lw=2)
         plt.show()

(The file is pandas/wb_download.py, and can be downloaded here)



16.6 Exercises

16.6.1 Exercise 1

Write a program to calculate the percentage price change over 2013 for the following shares

In [34]: ticker_list = {'INTC': 'Intel',
                        'MSFT': 'Microsoft',
                        'IBM': 'IBM',
                        'BHP': 'BHP',
                        'TM': 'Toyota',
                        'AAPL': 'Apple',
                        'AMZN': 'Amazon',
                        'BA': 'Boeing',
                        'QCOM': 'Qualcomm',
                        'KO': 'Coca-Cola',
                        'GOOG': 'Google',
                        'SNE': 'Sony',
                        'PTR': 'PetroChina'}

A dataset of daily closing prices for the above firms can be found in pandas/data/ticker_data.csv and can be downloaded here
Plot the result as a bar graph like the one that follows

16.7 Solutions

16.7.1 Exercise 1
In [35]: ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/ticker_data.csv')
         ticker.set_index('Date', inplace=True)

         ticker_list = {'INTC': 'Intel',
                        'MSFT': 'Microsoft',
                        'IBM': 'IBM',
                        'BHP': 'BHP',
                        'TM': 'Toyota',
                        'AAPL': 'Apple',
                        'AMZN': 'Amazon',
                        'BA': 'Boeing',
                        'QCOM': 'Qualcomm',
                        'KO': 'Coca-Cola',
                        'GOOG': 'Google',
                        'SNE': 'Sony',
                        'PTR': 'PetroChina'}

         price_change = pd.Series()

         for tick in ticker_list:
             change = 100 * (ticker.loc[ticker.index[-1], tick] - ticker.loc[ticker.index[0], tick]) / ticker.loc[ticker.index[0], tick]
             name = ticker_list[tick]
             price_change[name] = change

         price_change.sort_values(inplace=True)
         fig, ax = plt.subplots(figsize=(10,8))
         price_change.plot(kind='bar', ax=ax)
         plt.show()
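The per-ticker loop can also be written in one vectorized line, since iloc[0] and iloc[-1] select whole rows and pandas then operates on every column at once. A sketch on a toy two-row price table (the tickers and numbers are invented):

```python
import pandas as pd

# Two rows standing in for the first and last trading day of the year
prices = pd.DataFrame({'INTC': [20.0, 25.0],
                       'MSFT': [30.0, 33.0]})

# Percentage change from the first row to the last, computed per column
price_change = 100 * (prices.iloc[-1] - prices.iloc[0]) / prices.iloc[0]
print(price_change)
```

On the real ticker DataFrame the same expression would replace the loop in the solution above.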

Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
17 Pandas for Panel Data

17.1 Contents

โ€ข Overview 17.2

โ€ข Slicing and Reshaping Data 17.3

โ€ข Merging Dataframes and Filling NaNs 17.4

โ€ข Grouping and Summarizing Data 17.5

โ€ข Final Remarks 17.6

โ€ข Exercises 17.7

โ€ข Solutions 17.8

17.2 Overview

In an earlier lecture on pandas, we looked at working with simple data sets


Econometricians often need to work with more complex data sets, such as panels
Common tasks include

โ€ข Importing data, cleaning it and reshaping it across several axes


โ€ข Selecting a time series or cross-section from a panel
โ€ข Grouping and summarizing data

pandas (derived from โ€˜panelโ€™ and โ€˜dataโ€™) contains powerful and easy-to-use tools for solving exactly these kinds of problems
In what follows, we will use a panel data set of real minimum wages from the OECD to create:

โ€ข summary statistics over multiple dimensions of our data


โ€ข a time series of the average minimum wage of countries in the dataset
โ€ข kernel density estimates of wages by continent


We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex
Additional detail will be added to our DataFrame using pandasโ€™ merge function, and data
will be summarized with the groupby function
Most of this lecture was created by Natasha Watkins

17.3 Slicing and Reshaping Data

We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory

In [1]: import pandas as pd

        # Display 6 columns for viewing purposes
        pd.set_option('display.max_columns', 6)

        # Reduce decimal points to 2
        pd.options.display.float_format = '{:,.2f}'.format

        realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/realwage.csv')

Letโ€™s have a look at what weโ€™ve got to work with

In [2]: realwage.head() # Show first 5 rows

Out[2]: Unnamed: 0 Time Country Series \


0 0 2006-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
1 1 2007-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
2 2 2008-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
3 3 2009-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
4 4 2010-01-01 Ireland In 2015 constant prices at 2015 USD PPPs

Pay period value


0 Annual 17,132.44
1 Annual 18,100.92
2 Annual 17,747.41
3 Annual 18,580.14
4 Annual 18,755.83

The data is currently in long format, which is difficult to analyze when there are several dimensions to the data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis

In [3]: realwage = realwage.pivot_table(values='value',
                                        index='Time',
                                        columns=['Country', 'Series', 'Pay period'])
        realwage.head()

Out[3]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Time
2006-01-01 20,410.65 10.33
2007-01-01 21,087.57 10.67
2008-01-01 20,718.24 10.48
2009-01-01 20,984.77 10.62
2010-01-01 20,879.33 10.57

Country โ€ฆ \
Series In 2015 constant prices at 2015 USD exchange rates โ€ฆ
Pay period Annual โ€ฆ
Time โ€ฆ
2006-01-01 23,826.64 โ€ฆ
2007-01-01 24,616.84 โ€ฆ
2008-01-01 24,185.70 โ€ฆ
2009-01-01 24,496.84 โ€ฆ
2010-01-01 24,373.76 โ€ฆ

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Hourly
Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

[5 rows x 128 columns]

To more easily filter our time series data later on, we will convert the index into a DatetimeIndex

In [4]: realwage.index = pd.to_datetime(realwage.index)


type(realwage.index)

Out[4]: pandas.core.indexes.datetimes.DatetimeIndex

The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas

In [5]: type(realwage.columns)

Out[5]: pandas.core.indexes.multi.MultiIndex

In [6]: realwage.columns.names

Out[6]: FrozenList(['Country', 'Series', 'Pay period'])
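For intuition, a hierarchical column index like this can be built by hand; here is a toy sketch with invented labels (not the wage data):

```python
import pandas as pd

# Three hierarchically ordered levels, as in Country > Series > Pay period
cols = pd.MultiIndex.from_tuples(
    [('A', 'PPP', 'Annual'), ('A', 'PPP', 'Hourly'), ('B', 'PPP', 'Annual')],
    names=['Country', 'Series', 'Pay period'])
toy = pd.DataFrame([[1.0, 2.0, 3.0]], index=['2006'], columns=cols)

# Selecting the top level returns all columns under that country
print(toy['A'])
```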

Like before, we can select the country (the top level of our MultiIndex)

In [7]: realwage['United States'].head()

Out[7]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to reshape our dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite direction - try it out)
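The rotation can be sketched on a toy two-level frame (labels made up) to see that .stack() and .unstack() are inverses:

```python
import pandas as pd

# Toy frame with a two-level column MultiIndex
cols = pd.MultiIndex.from_product([['A', 'B'], ['Annual', 'Hourly']],
                                  names=['Country', 'Pay'])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=['2006'], columns=cols)

# .stack() moves the lowest column level ('Pay') into the row index
stacked = df.stack()

# .unstack() moves it back, recovering the original shape
roundtrip = stacked.unstack()
print(stacked)
```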

In [8]: realwage.stack().head()

Out[8]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 20,410.65
Hourly 10.33
2007-01-01 Annual 21,087.57
Hourly 10.67
2008-01-01 Annual 20,718.24

Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70

Country Belgium โ€ฆ \
Series In 2015 constant prices at 2015 USD PPPs โ€ฆ
Time Pay period โ€ฆ
2006-01-01 Annual 21,042.28 โ€ฆ
Hourly 10.09 โ€ฆ
2007-01-01 Annual 21,310.05 โ€ฆ
Hourly 10.22 โ€ฆ
2008-01-01 Annual 21,416.96 โ€ฆ

Country United Kingdom \


Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 20,376.32
Hourly 9.81
2007-01-01 Annual 20,954.13
Hourly 10.07
2008-01-01 Annual 20,902.87

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05

2007-01-01 Annual 12,974.40


Hourly 6.24
2008-01-01 Annual 14,097.56

Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56

[5 rows x 64 columns]

We can also pass in an argument to select the level we would like to stack

In [9]: realwage.stack(level='Country').head()

Out[9]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time Country
2006-01-01 Australia 20,410.65 10.33
Belgium 21,042.28 10.09
Brazil 3,310.51 1.41
Canada 13,649.69 6.56
Chile 5,201.65 2.22

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time Country
2006-01-01 Australia 23,826.64 12.06
Belgium 20,228.74 9.70
Brazil 2,032.87 0.87
Canada 14,335.12 6.89
Chile 3,333.76 1.42

Using a DatetimeIndex makes it easy to select a particular time period


Selecting one year and stacking the two lower levels of the MultiIndex creates a cross-section of our panel data

In [10]: realwage['2015'].stack(level=(1, 2)).transpose().head()

Out[10]: Time 2015-01-01 \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Country
Australia 21,715.53 10.99
Belgium 21,588.12 10.35
Brazil 4,628.63 2.00
Canada 16,536.83 7.95
Chile 6,633.56 2.80

Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81

For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages across countries and time, measured in 2015 US dollars

To create our filtered dataframe (realwage_f), we can use the xs method to select values at lower levels in the MultiIndex, while keeping the higher levels (countries in this case)

In [11]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
level=('Pay period', 'Series'), axis=1)
realwage_f.head()

Out[11]: Country Australia Belgium Brazil โ€ฆ Turkey United Kingdom \


Time โ€ฆ
2006-01-01 12.06 9.70 0.87 โ€ฆ 2.27 9.81
2007-01-01 12.46 9.82 0.92 โ€ฆ 2.26 10.07
2008-01-01 12.24 9.87 0.96 โ€ฆ 2.22 10.04
2009-01-01 12.40 10.21 1.03 โ€ฆ 2.28 10.15
2010-01-01 12.34 10.05 1.08 โ€ฆ 2.30 9.96

Country United States


Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

[5 rows x 32 columns]

17.4 Merging Dataframes and Filling NaNs

Similar to relational databases like SQL, pandas has built-in methods to merge datasets together
Using country information from WorldData.info, weโ€™ll add the continent of each country to
realwage_f with the merge function
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here

In [12]: worlddata = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel


worlddata.head()

Out[12]: Country (en) Country (de) Country (local) … Deathrate \


0 Afghanistan Afghanistan Afganistan/Afqanestan … 13.70
1 Egypt Ägypten Misr … 4.70
2 Åland Islands Ålandinseln Åland … 0.00
3 Albania Albanien Shqipëria … 6.70
4 Algeria Algerien Al-Jaza'ir/Algérie … 4.30

Life expectancy Url


0 51.30 https://www.laenderdaten.info/Asien/Afghanista…
1 72.70 https://www.laenderdaten.info/Afrika/Aegypten/…
2 0.00 https://www.laenderdaten.info/Europa/Aland/ind…
3 78.30 https://www.laenderdaten.info/Europa/Albanien/…
4 76.80 https://www.laenderdaten.info/Afrika/Algerien/…

[5 rows x 17 columns]

First, we'll select just the country and continent variables from worlddata and rename the column to 'Country'

In [13]: worlddata = worlddata[['Country (en)', 'Continent']]


worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()

Out[13]: Country Continent


0 Afghanistan Asia
1 Egypt Africa
2 ร…land Islands Europe
3 Albania Europe
4 Algeria Africa

We want to merge our new dataframe, worlddata, with realwage_f


The pandas merge function allows dataframes to be joined together by rows
Our dataframes will be merged using country names, requiring us to use the transpose of realwage_f so that rows correspond to country names in both dataframes

In [14]: realwage_f.transpose().head()

Out[14]: Time 2006-01-01 2007-01-01 2008-01-01 โ€ฆ 2014-01-01 2015-01-01 \


Country โ€ฆ
Australia 12.06 12.46 12.24 โ€ฆ 12.67 12.83
Belgium 9.70 9.82 9.87 โ€ฆ 10.01 9.95
Brazil 0.87 0.92 0.96 โ€ฆ 1.21 1.21
Canada 6.89 6.96 7.24 โ€ฆ 8.22 8.35
Chile 1.42 1.45 1.44 โ€ฆ 1.76 1.81

Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91

[5 rows x 11 columns]

We can use either left, right, inner, or outer join to merge our datasets:

• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or right datasets
• inner join includes only countries common to both the left and right datasets

By default, merge will use an inner join
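The four join types can be compared on two tiny invented frames (the country names here are placeholders, not the lecture's data):

```python
import pandas as pd

left = pd.DataFrame({'Country': ['A', 'B'], 'wage': [10.0, 12.0]})
right = pd.DataFrame({'Country': ['B', 'C'], 'Continent': ['Europe', 'Asia']})

# inner: only countries present in both frames
inner = pd.merge(left, right, how='inner', on='Country')

# left: all countries from `left`; unmatched rows get NaN in `Continent`
left_join = pd.merge(left, right, how='left', on='Country')

# outer: countries from either frame
outer = pd.merge(left, right, how='outer', on='Country')
print(left_join)
```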


Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do not have a corresponding data entry in realwage_f
This is illustrated by the red shading in the following diagram

We will also need to specify where the country name is located in each dataframe, which will be the key that is used to merge the dataframes 'on'
Our 'left' dataframe (realwage_f.transpose()) contains countries in the index, so we set left_index=True
Our 'right' dataframe (worlddata) contains countries in the 'Country' column, so we set right_on='Country'

In [15]: merged = pd.merge(realwage_f.transpose(), worlddata,


how='left', left_index=True, right_on='Country')
merged.head()

Out[15]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


17 12.06 12.46 12.24 โ€ฆ
23 9.70 9.82 9.87 โ€ฆ
32 0.87 0.92 0.96 โ€ฆ
100 6.89 6.96 7.24 โ€ฆ
38 1.42 1.45 1.44 โ€ฆ

2016-01-01 00:00:00 Country Continent


17 12.98 Australia Australia
23 9.76 Belgium Europe
32 1.24 Brazil South America
100 8.48 Canada North America
38 1.91 Chile South America

[5 rows x 13 columns]

Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent column
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe

In [16]: merged[merged['Continent'].isnull()]

Out[16]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


247 3.42 3.74 3.87 โ€ฆ
247 0.23 0.45 0.39 โ€ฆ
247 1.50 1.64 1.71 โ€ฆ

2016-01-01 00:00:00 Country Continent


247 5.28 Korea NaN
247 0.55 Russian Federation NaN
247 2.08 Slovak Republic NaN

[3 rows x 13 columns]

We have three missing values!


One option to deal with NaN values is to create a dictionary containing these countries and
their respective continents
.map() will match countries in merged['Country'] with their continent from the dictionary
Notice how countries not in our dictionary are mapped to NaN

In [17]: missing_continents = {'Korea': 'Asia',


'Russian Federation': 'Europe',
'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)

Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object

We don't want to overwrite the entire series with this mapping


.fillna() only fills in NaN values in merged['Continent'] with the mapping, while
leaving other values in the column unchanged

In [18]: merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))

# Check for whether continents were correctly mapped

merged[merged['Country'] == 'Korea']

Out[18]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


247 3.42 3.74 3.87 โ€ฆ

2016-01-01 00:00:00 Country Continent


247 5.28 Korea Asia

[1 rows x 13 columns]

We will also combine the Americas into a single continent - this will make our visualization
nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want
to replace

In [19]: replace = ['Central America', 'North America', 'South America']

for country in replace:


merged['Continent'].replace(to_replace=country,
value='America',
inplace=True)

Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex
We should also sort the index using .sort_index() so that we can efficiently filter our dataframe later on
By default, levels will be sorted top-down

In [20]: merged = merged.set_index(['Continent', 'Country']).sort_index()


merged.head()

Out[20]: 2006-01-01 2007-01-01 2008-01-01 โ€ฆ 2014-01-01 \


Continent Country โ€ฆ
America Brazil 0.87 0.92 0.96 โ€ฆ 1.21
Canada 6.89 6.96 7.24 โ€ฆ 8.22
Chile 1.42 1.45 1.44 โ€ฆ 1.76
Colombia 1.01 1.02 1.01 โ€ฆ 1.13
Costa Rica nan nan nan โ€ฆ 2.41

2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63

[5 rows x 11 columns]

While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime format

In [21]: merged.columns

Out[21]: Index([2006-01-01 00:00:00, 2007-01-01 00:00:00, 2008-01-01 00:00:00,


2009-01-01 00:00:00, 2010-01-01 00:00:00, 2011-01-01 00:00:00,
2012-01-01 00:00:00, 2013-01-01 00:00:00, 2014-01-01 00:00:00,
2015-01-01 00:00:00, 2016-01-01 00:00:00],
dtype='object')

Now that we have set the merged columns as the index, we can recreate a DatetimeIndex
using .to_datetime()

In [22]: merged.columns = pd.to_datetime(merged.columns)


merged.columns = merged.columns.rename('Time')
merged.columns

Out[22]: DatetimeIndex(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',


'2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01'],
dtype='datetime64[ns]', name='Time', freq=None)

The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged

In [23]: merged = merged.transpose()


merged.head()

Out[23]: Continent America โ€ฆ Europe


Country Brazil Canada Chile โ€ฆ Slovenia Spain United Kingdom
Time โ€ฆ
2006-01-01 0.87 6.89 1.42 โ€ฆ 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 โ€ฆ 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 โ€ฆ 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 โ€ฆ 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 โ€ฆ 4.81 4.30 9.96

[5 rows x 32 columns]

17.5 Grouping and Summarizing Data

Grouping and summarizing data can be particularly useful for understanding large panel
datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max()
For example, we can calculate the average real minimum wage for each country over the period 2006 to 2016 (the default is to aggregate over rows)

In [24]: merged.mean().head(10)

Out[24]: Continent Country


America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64

Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set

In [25]: import matplotlib.pyplot as plt


%matplotlib inline
import matplotlib
matplotlib.style.use('seaborn')

merged.mean().sort_values(ascending=False).plot(kind='bar', title="Average real minimum wage 2006 - 2016")

# Set country labels

country_labels = merged.mean().sort_values(ascending=False).index.get_level_values('Country').tolist()
plt.xticks(range(0, len(country_labels)), country_labels)
plt.xlabel('Country')

plt.show()

Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)

In [26]: merged.mean(axis=1).head()

Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64

We can plot this time series as a line graph

In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')

plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We can also specify a level of the MultiIndex (in the column axis) to aggregate over

In [28]: merged.mean(level='Continent', axis=1).head()

Out[28]: Continent America Asia Australia Europe


Time
2006-01-01 2.80 4.29 10.25 4.80
2007-01-01 2.85 4.44 10.73 4.94
2008-01-01 2.99 4.45 10.76 4.99
2009-01-01 3.23 4.53 10.97 5.16
2010-01-01 3.34 4.53 10.95 5.17

We can plot the average minimum wages in each continent as a time series

In [29]: merged.mean(level='Continent', axis=1).plot()


plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We will drop Australia as a continent for plotting purposes

In [30]: merged = merged.drop('Australia', level='Continent', axis=1)


merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

.describe() is useful for quickly retrieving a number of common summary statistics

In [31]: merged.stack().describe()

Out[31]: Continent America Asia Europe


count 69.00 44.00 200.00
mean 3.19 4.70 5.15
std 3.02 1.56 3.82
min 0.52 2.22 0.23
25% 1.03 3.37 2.02
50% 1.44 5.48 3.54
75% 6.96 5.95 9.70
max 8.48 6.65 12.39

This is a simplified way to use groupby


Using groupby generally follows a 'split-apply-combine' process:

• split: data is grouped based on one or more keys
• apply: a function is called on each group independently
• combine: the results of the function calls are combined into a new data structure
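The three steps above can be sketched on a toy frame with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'Continent': ['America', 'America', 'Europe'],
                   'wage': [7.0, 1.0, 9.0]})

# split: group rows by continent
grouped = df.groupby('Continent')

# apply + combine: the mean is computed per group and the
# results are gathered into a new Series indexed by group key
means = grouped['wage'].mean()
print(means)
```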

The groupby method achieves the first step of this process, creating a new
DataFrameGroupBy object with data split into groups
Let's split merged by continent again, this time using the groupby function, and name the resulting object grouped

In [32]: grouped = merged.groupby(level='Continent', axis=1)


grouped

Out[32]: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f59c27f9da0>

Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using
.size()
In this case, our new data structure is a Series

In [33]: grouped.size()

Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64

Calling .get_group() to return just the countries in a single group, we can create a kernel density estimate of the distribution of real minimum wages in 2015 for each continent
grouped.groups.keys() will return the keys from the groupby object

In [34]: import seaborn as sns

continents = grouped.groups.keys()

for continent in continents:


sns.kdeplot(grouped.get_group(continent)['2015'].unstack(), label=continent, shade=True)

plt.title('Real minimum wages in 2015')


plt.xlabel('US dollars')
plt.show()

17.6 Final Remarks

This lecture has provided an introduction to some of pandas' more advanced features, including multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a python package that
extends pandas to N-dimensional data structures

17.7 Exercises

17.7.1 Exercise 1

In these exercises, you'll work with a dataset of employment rates in Europe by age and sex
from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns

Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex

17.7.2 Exercise 2

Filter the above dataframe to only include employment as a percentage of 'active population'
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries

17.8 Solutions

17.8.1 Exercise 1
In [35]: employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/em
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()

Out[35]: UNIT Percentage of total population โ€ฆ \


AGE From 15 to 24 years โ€ฆ
SEX Females โ€ฆ
INDIC_EM Active population โ€ฆ
GEO Austria Belgium Bulgaria โ€ฆ
DATE โ€ฆ
2007-01-01 56.00 31.60 26.00 โ€ฆ
2008-01-01 56.20 30.80 26.10 โ€ฆ
2009-01-01 56.20 29.90 24.80 โ€ฆ
2010-01-01 54.00 29.80 26.60 โ€ฆ
2011-01-01 54.80 29.80 24.80 โ€ฆ

UNIT Thousand persons \


AGE From 55 to 64 years
SEX Total
INDIC_EM Total employment (resident population concept - LFS)
GEO Switzerland Turkey
DATE
2007-01-01 nan 1,282.00
2008-01-01 nan 1,354.00
2009-01-01 nan 1,449.00
2010-01-01 640.00 1,583.00
2011-01-01 661.00 1,760.00

UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00

[5 rows x 1440 columns]

This is a large dataset so it is useful to explore the levels and variables available

In [36]: employ.columns.names

Out[36]: FrozenList(['UNIT', 'AGE', 'SEX', 'INDIC_EM', 'GEO'])

Variables within levels can be quickly retrieved with a loop

In [37]: for name in employ.columns.names:


print(name, employ.columns.get_level_values(name).unique())

UNIT Index(['Percentage of total population', 'Thousand persons'], dtype='object', name='UNIT')


AGE Index(['From 15 to 24 years', 'From 25 to 54 years', 'From 55 to 64 years'], dtype='object', name='AGE')
SEX Index(['Females', 'Males', 'Total'], dtype='object', name='SEX')
INDIC_EM Index(['Active population', 'Total employment (resident population concept - LFS)'], dtype='object',
GEO Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
'Denmark', 'Estonia', 'Euro area (17 countries)',
'Euro area (18 countries)', 'Euro area (19 countries)',
'European Union (15 countries)', 'European Union (27 countries)',
'European Union (28 countries)', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

17.8.2 Exercise 2

To easily filter by country, swap GEO to the top level and sort the MultiIndex

In [38]: employ.columns = employ.columns.swaplevel(0,-1)


employ = employ.sort_index(axis=1)

We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with 'Euro'

In [39]: geo_list = employ.columns.get_level_values('GEO').unique().tolist()


countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()

Out[39]: Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',


'Denmark', 'Estonia', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

Select only percentage employed in the active population from the dataframe

In [40]: employ_f = employ.xs(('Percentage of total population', 'Active population'),


level=('UNIT', 'INDIC_EM'),
axis=1)
employ_f.head()

Out[40]: GEO Austria โ€ฆ United Kingdom \


AGE From 15 to 24 years โ€ฆ From 55 to 64 years
SEX Females Males Total โ€ฆ Females Males
DATE โ€ฆ
2007-01-01 56.00 62.90 59.40 โ€ฆ 49.90 68.90
2008-01-01 56.20 62.90 59.50 โ€ฆ 50.20 69.80
2009-01-01 56.20 62.90 59.50 โ€ฆ 50.60 70.30
2010-01-01 54.00 62.60 58.30 โ€ฆ 51.10 69.20
2011-01-01 54.80 63.60 59.20 โ€ฆ 51.30 68.40

GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70

[5 rows x 306 columns]

Drop the 'Total' value before creating the grouped boxplot

In [41]: employ_f = employ_f.drop('Total', level='SEX', axis=1)

In [42]: box = employ_f['2015'].unstack().reset_index()


sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
18

Linear Regression in Python

18.1 Contents

• Overview 18.2

• Simple Linear Regression 18.3

• Extending the Linear Regression Model 18.4

• Endogeneity 18.5

• Summary 18.6

• Exercises 18.7

• Solutions 18.8

In addition to what's in Anaconda, this lecture will need the following libraries

In [1]: !pip install linearmodels

18.2 Overview

Linear regression is a standard tool for analyzing the relationship between two or more variables
In this lecture, we'll use the Python package statsmodels to estimate, interpret, and visualize linear regression models
Along the way, we'll discuss a variety of topics, including

• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares

As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal paper [3]


• You can download a copy here

In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences
Such variation is needed to determine whether it is institutions that give rise to greater economic growth, rather than the other way around

18.2.1 Prerequisites

This lecture assumes you are familiar with basic econometrics


For an introductory text covering these topics, see, for example, [135]

18.2.2 Comments

This lecture is coauthored with Natasha Watkins

18.3 Simple Linear Regression

[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes
How do we measure institutional differences and economic outcomes?
In this paper,

• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange rates
• institutional differences are proxied by an index of protection against expropriation on average over 1985-95, constructed by the Political Risk Services Group

These variables and other data used in the paper are available for download on Daron Acemoglu's webpage
We will use pandas' .read_stata() function to read in data contained in the .dta files to dataframes

In [2]: import pandas as pd

df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dt
df1.head()

Out[2]: shortnam euro1900 excolony avexpr logpgp95 cons1 cons90 democ00a \


0 AFG 0.000000 1.0 NaN NaN 1.0 2.0 1.0
1 AGO 8.000000 1.0 5.363636 7.770645 3.0 3.0 0.0
2 ARE 0.000000 1.0 7.181818 9.804219 NaN NaN NaN
3 ARG 60.000004 1.0 6.386364 9.133459 1.0 6.0 3.0
4 ARM 0.000000 0.0 NaN 7.682482 NaN NaN NaN

cons00a extmort4 logem4 loghjypl baseco


0 1.0 93.699997 4.540098 NaN NaN
1 1.0 280.000000 5.634789 -3.411248 1.0

2 NaN NaN NaN NaN NaN


3 3.0 68.900002 4.232656 -0.872274 1.0
4 NaN NaN NaN NaN NaN

Let's use a scatterplot to see whether any obvious relationship exists between GDP per capita and the protection against expropriation index

In [3]: import matplotlib.pyplot as plt


%matplotlib inline
plt.style.use('seaborn')

df1.plot(x='avexpr', y='logpgp95', kind='scatter')


plt.show()

The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption
We can write our model as

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– + ๐‘ข๐‘–

where:

โ€ข ๐›ฝ0 is the intercept of the linear trend line on the y-axis


282 18. LINEAR REGRESSION IN PYTHON

โ€ข ๐›ฝ1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
โ€ข ๐‘ข๐‘– is a random error term (deviations of observations from the linear trend due to fac-
tors not included in the model)

Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])

In [4]: import numpy as np

# Dropping NA's is required to use numpy's polyfit


df1_subset = df1.dropna(subset=['logpgp95', 'avexpr'])

# Use only 'base sample' for plotting purposes


df1_subset = df1_subset[df1_subset['baseco'] == 1]

X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']

# Replace markers with country labels


plt.scatter(X, y, marker='')

for i, label in enumerate(labels):


plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()

The most common technique to estimate the parameters (the $\beta$'s) of the linear model is Ordinary Least Squares (OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the sum of squared residuals, i.e.

$$
\min_{\hat{\beta}} \sum_{i=1}^{N} \hat{u}_i^2
$$

where $\hat{u}_i$ is the difference between the observation and the predicted value of the dependent variable

To estimate the constant term $\beta_0$, we need to add a column of 1's to our dataset (consider the equation if $\beta_0$ was replaced with $\beta_0 x_i$ and $x_i = 1$)
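As a sketch of what this minimization does under the hood, it has the closed-form solution $\hat{\beta} = (X'X)^{-1}X'y$; here it is applied to simulated data (illustrative only, not the AJR sample):

```python
import numpy as np

# Simulated data from a known line plus small noise (made-up parameters)
rng = np.random.default_rng(0)
x = rng.uniform(3, 10, size=100)
y = 4.6 + 0.5 * x + rng.normal(scale=0.1, size=100)

# Add a column of ones so the first coefficient is the intercept
X = np.column_stack([np.ones_like(x), x])

# OLS closed form: solve (X'X) beta = X'y rather than inverting explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

With the noise this small, the recovered coefficients land close to the intercept and slope used to generate the data.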

In [5]: df1['const'] = 1

Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels, however standard arrays can also be
used as arguments

In [6]: import statsmodels.api as sm

reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')


type(reg1)

Out[6]: statsmodels.regression.linear_model.OLS

So far we have simply constructed our model


We need to use .fit() to obtain parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

In [7]: results = reg1.fit()


type(results)

Out[7]: statsmodels.regression.linear_model.RegressionResultsWrapper

We now have the fitted regression model stored in results


To view the OLS regression results, we can call the .summary() method
Note that an observation was mistakenly dropped from the results in the original paper (see the note located in maketable2.do from Acemoglu's webpage), and thus the coefficients differ slightly

In [8]: print(results.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.611
Model: OLS Adj. R-squared: 0.608
Method: Least Squares F-statistic: 171.4
Date: Fri, 21 Jun 2019 Prob (F-statistic): 4.16e-24
Time: 15:39:14 Log-Likelihood: -119.71

No. Observations: 111 AIC: 243.4


Df Residuals: 109 BIC: 248.8
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.6261 0.301 15.391 0.000 4.030 5.222
avexpr 0.5319 0.041 13.093 0.000 0.451 0.612
==============================================================================
Omnibus: 9.251 Durbin-Watson: 1.689
Prob(Omnibus): 0.010 Jarque-Bera (JB): 9.170
Skew: -0.680 Prob(JB): 0.0102
Kurtosis: 3.362 Cond. No. 33.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From our results, we see that

โ€ข The intercept ๐›ฝ0ฬ‚ = 4.63


โ€ข The slope ๐›ฝ1ฬ‚ = 0.53
• The positive β̂₁ parameter estimate implies that institutional quality has a positive effect on economic outcomes, as we saw in the figure
โ€ข The p-value of 0.000 for ๐›ฝ1ฬ‚ implies that the effect of institutions on GDP is statistically
significant (using p < 0.05 as a rejection rule)
โ€ข The R-squared value of 0.611 indicates that around 61% of variation in log GDP per
capita is explained by protection against expropriation

Using our parameter estimates, we can now write our estimated relationship as

$$
\widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i
$$

This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38

In [9]: mean_expr = np.mean(df1_subset['avexpr'])


mean_expr

Out[9]: 6.515625

In [10]: predicted_logpdp95 = 4.63 + 0.53 * 7.07


predicted_logpdp95

Out[10]: 8.3771

An easier (and more accurate) way to obtain this result is to use .predict() and set
๐‘๐‘œ๐‘›๐‘ ๐‘ก๐‘Ž๐‘›๐‘ก = 1 and ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐‘š๐‘’๐‘Ž๐‘›_๐‘’๐‘ฅ๐‘๐‘Ÿ

In [11]: results.predict(exog=[1, mean_expr])



Out[11]: array([8.09156367])

We can obtain an array of predicted ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– for every value of ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– in our dataset by
calling .predict() on our results
Plotting the predicted values against avexpr shows that the predicted values lie along the
line that we fitted above
The observed values of ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– are also plotted for comparison purposes

In [12]: # Drop missing observations from whole sample

df1_plot = df1.dropna(subset=['logpgp95', 'avexpr'])

# Plot predicted values

plt.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5, label='predicted')

# Plot observed values

plt.scatter(df1_plot['avexpr'], df1_plot['logpgp95'], alpha=0.5, label='observed')

plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()

18.4 Extending the Linear Regression Model

So far we have only accounted for institutions affecting economic performance; almost certainly there are numerous other factors affecting GDP that are not included in our model

Leaving out variables that affect ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– will result in omitted variable bias, yielding
biased and inconsistent parameter estimates
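A minimal simulation (with made-up data and coefficients, not the lecture's dataset) can make the omitted variable bias concrete: when a confounder that drives both the regressor and the outcome is left out, the short regression's slope absorbs part of its effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# w is a confounder: it affects both the regressor x and the outcome y
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)
y = 1.0 * x + 2.0 * w + rng.normal(size=n)

X_short = np.column_stack([np.ones(n), x])      # omits w
X_long = np.column_stack([np.ones(n), x, w])    # includes w

b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# The short regression's slope is biased upward:
# plim = 1 + 2 * cov(x, w) / var(x) = 1 + 2 * 0.5 / 1.25 = 1.8
print(b_short[1], b_long[1])
```

With the confounder included, the slope estimate returns to (approximately) the true value of 1.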
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘–
[3] consider other factors such as:

โ€ข the effect of climate on economic outcomes; latitude is used to proxy this


• differences that affect both economic performance and institutions, e.g. cultural, historical, etc.; controlled for with the use of continent dummies

Letโ€™s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta

In [13]: df2 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable2.d

# Add constant term to dataset


df2['const'] = 1

# Create lists of variables to be used in each regression


X1 = ['const', 'avexpr']
X2 = ['const', 'avexpr', 'lat_abst']
X3 = ['const', 'avexpr', 'lat_abst', 'asia', 'africa', 'other']

# Estimate an OLS regression for each set of variables


reg1 = sm.OLS(df2['logpgp95'], df2[X1], missing='drop').fit()
reg2 = sm.OLS(df2['logpgp95'], df2[X2], missing='drop').fit()
reg3 = sm.OLS(df2['logpgp95'], df2[X3], missing='drop').fit()

Now that we have fitted our models, we will use summary_col to display the results in a
single table (model numbers correspond to those in the paper)

In [14]: from statsmodels.iolib.summary2 import summary_col

info_dict={'R-squared' : lambda x: f"{x.rsquared:.2f}",


'No. observations' : lambda x: f"{int(x.nobs):d}"}

results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])

results_table.add_title('Table 2 - OLS Regressions')

print(results_table)

Table 2 - OLS Regressions


=========================================
Model 1 Model 3 Model 4
-----------------------------------------
const 4.63*** 4.87*** 5.85***
(0.30) (0.33) (0.34)
avexpr 0.53*** 0.46*** 0.39***
(0.04) (0.06) (0.05)
lat_abst 0.87* 0.33
(0.49) (0.45)
asia -0.15
(0.15)
africa -0.92***
(0.17)
other 0.30
(0.37)
R-squared 0.61 0.62 0.72
No. observations 111 111 111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

18.5 Endogeneity

As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:

โ€ข richer countries may be able to afford or prefer better institutions


โ€ข variables that affect income may also be correlated with institutional differences
โ€ข the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions

To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression
This method requires replacing the endogenous variable ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– with a variable that is:

1. correlated with ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘–


2. not correlated with the error term (i.e. it should not directly affect the dependent variable, otherwise it would be correlated with uᵢ due to omitted variable bias)

The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences
The main contribution of [3] is the use of settler mortality rates to instrument for institutional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of institutions that were more extractive in nature (less protection against expropriation), and these institutions still persist today
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authorsโ€™ hypothesis and satisfying
the first condition of a valid instrument

In [15]: # Dropping NA's is required to use numpy's polyfit


df1_subset2 = df1.dropna(subset=['logem4', 'avexpr'])

X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']

# Replace markers with country labels



plt.scatter(X, y, marker='')

for i, label in enumerate(labels):


plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()

The second condition may not be satisfied if settler mortality rates in the 17th to 19th centuries have a direct effect on current GDP (in addition to their indirect effect through institutions)
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance
[3] argue this is unlikely because:

โ€ข The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people
โ€ข The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization

As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates
First stage

The first stage involves regressing the endogenous variable (๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– ) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘–
Therefore, we will estimate the first-stage regression as

๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐›ฟ0 + ๐›ฟ1 ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘– + ๐‘ฃ๐‘–

The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)

In [16]: # Import and select the data


df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d
df4 = df4[df4['baseco'] == 1]

# Add a constant variable


df4['const'] = 1

# Fit the first stage regression and print summary


results_fs = sm.OLS(df4['avexpr'],
df4[['const', 'logem4']],
missing='drop').fit()
print(results_fs.summary())

OLS Regression Results


==============================================================================
Dep. Variable: avexpr R-squared: 0.270
Model: OLS Adj. R-squared: 0.258
Method: Least Squares F-statistic: 22.95
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.08e-05
Time: 15:39:17 Log-Likelihood: -104.83
No. Observations: 64 AIC: 213.7
Df Residuals: 62 BIC: 218.0
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 9.3414 0.611 15.296 0.000 8.121 10.562
logem4 -0.6068 0.127 -4.790 0.000 -0.860 -0.354
==============================================================================
Omnibus: 0.035 Durbin-Watson: 2.003
Prob(Omnibus): 0.983 Jarque-Bera (JB): 0.172
Skew: 0.045 Prob(JB): 0.918
Kurtosis: 2.763 Cond. No. 19.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Second stage
We need to retrieve the predicted values of ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– using .predict()
We then replace the endogenous variable ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– with the predicted values ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ
ฬ‚ ๐‘– in the
original linear model
Our second stage regression is thus

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ
ฬ‚ ๐‘– + ๐‘ข๐‘–

In [17]: df4['predicted_avexpr'] = results_fs.predict()

results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.477
Model: OLS Adj. R-squared: 0.469
Method: Least Squares F-statistic: 56.60
Date: Fri, 21 Jun 2019 Prob (F-statistic): 2.66e-10
Time: 15:39:17 Log-Likelihood: -72.268
No. Observations: 64 AIC: 148.5
Df Residuals: 62 BIC: 152.9
Df Model: 1
Covariance Type: nonrobust
====================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------
const 1.9097 0.823 2.320 0.024 0.264 3.555
predicted_avexpr 0.9443 0.126 7.523 0.000 0.693 1.195
==============================================================================
Omnibus: 10.547 Durbin-Watson: 2.137
Prob(Omnibus): 0.005 Jarque-Bera (JB): 11.010
Skew: -0.790 Prob(JB): 0.00407
Kurtosis: 4.277 Cond. No. 58.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not, and for this
reason computing 2SLS 'manually' (in stages with OLS) is not recommended
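To see why, here is a minimal sketch on simulated data (all variables and coefficients below are made up): the two-stage point estimate is fine, but the naive second-stage residuals are computed from the first-stage fitted values, whereas correct 2SLS standard errors use residuals computed with the original regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate an endogenous regressor x with a valid instrument z
z = rng.normal(size=n)
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # x is correlated with u
y = 1.0 + 2.0 * x + u

Z = np.column_stack([np.ones(n), z])          # instruments (const + z)
X = np.column_stack([np.ones(n), x])          # regressors (const + x)

# First stage: fitted values of X from a regression on Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Second stage: OLS of y on the fitted values gives the 2SLS point estimate
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

k = X.shape[1]
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)

# Correct 2SLS residuals use the original X ...
sigma2 = np.sum((y - X @ beta)**2) / (n - k)
se_correct = np.sqrt(sigma2 * np.diag(XtX_inv))

# ... whereas naive second-stage OLS residuals use X_hat
sigma2_naive = np.sum((y - X_hat @ beta)**2) / (n - k)
se_naive = np.sqrt(sigma2_naive * np.diag(XtX_inv))

print(beta[1], se_correct[1], se_naive[1])  # same point estimate, different SEs
```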
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels

In [18]: from linearmodels.iv import IV2SLS

Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)

In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')

print(iv.summary)

IV-2SLS Estimation Summary


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.1870
Estimator: IV-2SLS Adj. R-squared: 0.1739
No. Observations: 64 F-statistic: 37.568
Date: Fri, Jun 21 2019 P-value (F-stat) 0.0000
Time: 15:39:17 Distribution: chi2(1)
Cov. Estimator: unadjusted

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================

Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False

Given that we now have consistent and unbiased estimates, we can infer from the model we
have estimated that institutional differences (stemming from institutions set up during colonization) can help to explain differences in income levels across countries today
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile
and Nigeria (i.e., institutional quality) implies up to a 7-fold difference in income, emphasizing
the significance of institutions in economic development

18.6 Summary

We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels
If you are familiar with R, you may want to use the formula interface to statsmodels, or
consider using rpy2 to call R from within Python
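As a quick hedged sketch of the formula interface (on made-up data with the same variable names, not the lecture's dataset), note that an intercept is added automatically, so no constant column is needed:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({'avexpr': rng.uniform(3, 10, size=200)})
df['logpgp95'] = 4.6 + 0.53 * df['avexpr'] + rng.normal(scale=0.5, size=200)

# R-style formula: 'Intercept' is included by default
res = smf.ols('logpgp95 ~ avexpr', data=df).fit()
print(res.params)
```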

18.7 Exercises

18.7.1 Exercise 1

In the lecture, we think the original model suffers from endogeneity bias due to the likely effect income has on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , and the errors, ๐‘ข๐‘–

๐ป0 โˆถ ๐ถ๐‘œ๐‘ฃ(๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , ๐‘ข๐‘– ) = 0 (๐‘›๐‘œ ๐‘’๐‘›๐‘‘๐‘œ๐‘”๐‘’๐‘›๐‘’๐‘–๐‘ก๐‘ฆ)


๐ป1 โˆถ ๐ถ๐‘œ๐‘ฃ(๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , ๐‘ข๐‘– ) โ‰  0 (๐‘’๐‘›๐‘‘๐‘œ๐‘”๐‘’๐‘›๐‘’๐‘–๐‘ก๐‘ฆ)

This test is run in two stages


First, we regress ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– on the instrument, ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘–

๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐œ‹0 + ๐œ‹1 ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘– + ๐œ๐‘–

Second, we retrieve the residuals ๐œ๐‘–ฬ‚ and include them in the original equation

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– + ๐›ผ๐œ๐‘–ฬ‚ + ๐‘ข๐‘–



If ๐›ผ is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– is endogenous
Using the above information, estimate a Hausman test and interpret your results

18.7.2 Exercise 2

The OLS parameter ๐›ฝ can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)

๐‘ฆ = ๐‘‹๐›ฝ + ๐‘ข

To solve for the unknown parameter ๐›ฝ, we want to minimize the sum of squared residuals

min๐‘ขฬ‚โ€ฒ ๐‘ขฬ‚
๐›ฝฬ‚

Rearranging the first equation and substituting into the second equation, we can write

min (๐‘Œ โˆ’ ๐‘‹ ๐›ฝ)ฬ‚ โ€ฒ (๐‘Œ โˆ’ ๐‘‹ ๐›ฝ)ฬ‚


๐›ฝฬ‚

Solving this optimization problem gives the solution for the ๐›ฝ ฬ‚ coefficients

๐›ฝ ฬ‚ = (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Using the above information, compute ๐›ฝ ฬ‚ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture

18.8 Solutions

18.8.1 Exercise 1
In [20]: # Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d

# Add a constant term


df4['const'] = 1

# Estimate the first stage regression


reg1 = sm.OLS(endog=df4['avexpr'],
exog=df4[['const', 'logem4']],
missing='drop').fit()

# Retrieve the residuals


df4['resid'] = reg1.resid

# Estimate the second stage residuals


reg2 = sm.OLS(endog=df4['logpgp95'],
exog=df4[['const', 'avexpr', 'resid']],
missing='drop').fit()

print(reg2.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.689
Model: OLS Adj. R-squared: 0.679
Method: Least Squares F-statistic: 74.05
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.07e-17
Time: 15:39:17 Log-Likelihood: -62.031
No. Observations: 70 AIC: 130.1
Df Residuals: 67 BIC: 136.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.4782 0.547 4.530 0.000 1.386 3.570
avexpr 0.8564 0.082 10.406 0.000 0.692 1.021
resid -0.4951 0.099 -5.017 0.000 -0.692 -0.298
==============================================================================
Omnibus: 17.597 Durbin-Watson: 2.086
Prob(Omnibus): 0.000 Jarque-Bera (JB): 23.194
Skew: -1.054 Prob(JB): 9.19e-06
Kurtosis: 4.873 Cond. No. 53.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The output shows that the coefficient on the residuals is statistically significant, indicating
๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– is endogenous

18.8.2 Exercise 2
In [21]: # Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.d
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])

# Add a constant term


df1['const'] = 1

# Define the X and y variables


y = np.asarray(df1['logpgp95'])
X = np.asarray(df1[['const', 'avexpr']])

# Compute ฮฒ_hat
ฮฒ_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Print out the results from the 2 x 1 vector ฮฒ_hat


print(f'ฮฒ_0 = {ฮฒ_hat[0]:.2}')
print(f'ฮฒ_1 = {ฮฒ_hat[1]:.2}')

ฮฒ_0 = 4.6
ฮฒ_1 = 0.53

It is also possible to use np.linalg.inv(X.T @ X) @ X.T @ y to solve for β; however,
.solve() is preferred as it involves fewer computations
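A quick sketch (with hypothetical data) confirming that the two approaches agree numerically:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical regression data: a constant and one regressor
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([4.6, 0.53]) + rng.normal(scale=0.5, size=50)

A, b = X.T @ X, X.T @ y

β_solve = np.linalg.solve(A, b)   # solves Aβ = b without forming A⁻¹
β_inv = np.linalg.inv(A) @ b      # forms A⁻¹ explicitly

print(np.allclose(β_solve, β_inv))
```

For a well-conditioned problem like this, both give the same answer up to floating-point error; `solve` is still preferred on speed and numerical-stability grounds.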
19 Maximum Likelihood Estimation

19.1 Contents

โ€ข Overview 19.2

โ€ข Set Up and Assumptions 19.3

โ€ข Conditional Distributions 19.4

โ€ข Maximum Likelihood Estimation 19.5

โ€ข MLE with Numerical Methods 19.6

โ€ข Maximum Likelihood Estimation 19.7

โ€ข Summary 19.8

โ€ข Exercises 19.9

โ€ข Solutions 19.10

19.2 Overview

In a previous lecture, we estimated the relationship between dependent and explanatory variables using linear regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables
Here we illustrate maximum likelihood by replicating Daniel Treisman's (2016) paper, Russia's Billionaires, which connects the number of billionaires in a country to its economic characteristics
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict

295

19.2.1 Prerequisites

We assume familiarity with basic probability and multivariate calculus

19.2.2 Comments

This lecture is co-authored with Natasha Watkins

19.3 Set Up and Assumptions

Letโ€™s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study

19.3.1 Flow of Ideas

The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data

โ€ข e.g., the class of all normal distributions, or the class of all gamma distributions

Each such class is a family of distributions indexed by a finite number of parameters

โ€ข e.g., the class of normal distributions is a family of distributions indexed by its mean
๐œ‡ โˆˆ (โˆ’โˆž, โˆž) and standard deviation ๐œŽ โˆˆ (0, โˆž)

Weโ€™ll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates

19.3.2 Counting Billionaires

Treisman [129] is interested in estimating the number of billionaires in different countries


The number of billionaires is integer-valued
Hence we consider distributions that take values only in the nonnegative integers
(This is one reason least squares regression is not the best tool for the present problem, since
the dependent variable in linear regression is not restricted to integer values)
One integer distribution is the Poisson distribution, the probability mass function (pmf) of
which is

๐œ‡๐‘ฆ โˆ’๐œ‡
๐‘“(๐‘ฆ) = ๐‘’ , ๐‘ฆ = 0, 1, 2, โ€ฆ , โˆž
๐‘ฆ!

We can plot the Poisson distribution over ๐‘ฆ for different values of ๐œ‡ as follows

In [1]: from numpy import exp


from scipy.special import factorial
import matplotlib.pyplot as plt
%matplotlib inline

poisson_pmf = lambda y, ฮผ: ฮผ**y / factorial(y) * exp(-ฮผ)


y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))

for ฮผ in [1, 5, 10]:


distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, ฮผ))
ax.plot(y_values,
distribution,
label=f'$\mu$={ฮผ}',
alpha=0.5,
marker='o',
markersize=8)

ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)

plt.show()

Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
๐‘ฆ increases
Letโ€™s have a look at the distribution of the data weโ€™ll be working with in this lecture
Treismanโ€™s main source of data is Forbesโ€™ annual rankings of billionaires and their estimated
net worth
The dataset mle/fp.dta can be downloaded here or from its AER page

In [2]: import pandas as pd


pd.options.display.max_columns = 10

# Load in data and view


df = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/mle/fp.dta')
df.head()

Out[2]: country ccode year cyear numbil โ€ฆ topint08 rintr \


0 United States 2.0 1990.0 21990.0 NaN โ€ฆ 39.799999 4.988405
1 United States 2.0 1991.0 21991.0 NaN โ€ฆ 39.799999 4.988405
2 United States 2.0 1992.0 21992.0 NaN โ€ฆ 39.799999 4.988405
3 United States 2.0 1993.0 21993.0 NaN โ€ฆ 39.799999 4.988405
4 United States 2.0 1994.0 21994.0 NaN โ€ฆ 39.799999 4.988405

noyrs roflaw nrrents


0 20.0 1.61 NaN
1 20.0 1.61 NaN
2 20.0 1.61 NaN
3 20.0 1.61 NaN
4 20.0 1.61 NaN

[5 rows x 36 columns]

Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)

In [3]: numbil0_2008 = df[(df['year'] == 2008) & (


df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(xmin=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()

/home/anju/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py:3215: MatplotlibDeprecationWarning:
The `xmin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `left` instead.
alternative='`left`', obj_type='argument')

From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low ๐œ‡ and some outliers)

19.4 Conditional Distributions

In Treisman's paper, the dependent variable — the number of billionaires yᵢ in country i —
is modeled as a function of GDP per capita, population size, and years membership in GATT
and WTO
Hence, the distribution of ๐‘ฆ๐‘– needs to be conditioned on the vector of explanatory variables x๐‘–
The standard formulation — the so-called Poisson regression model — is as follows:

$$
f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots, \infty \tag{1}
$$

where ๐œ‡๐‘– = exp(xโ€ฒ๐‘– ๐›ฝ) = exp(๐›ฝ0 + ๐›ฝ1 ๐‘ฅ๐‘–1 + โ€ฆ + ๐›ฝ๐‘˜ ๐‘ฅ๐‘–๐‘˜ )

To illustrate the idea that the distribution of ๐‘ฆ๐‘– depends on x๐‘– letโ€™s run a simple simulation
We use our poisson_pmf function from above and arbitrary values for ๐›ฝ and x๐‘–

In [4]: import numpy as np

y_values = range(0, 20)

# Define a parameter vector with estimates


ฮฒ = np.array([0.26, 0.18, 0.25, -0.1, -0.22])

# Create some observations X


datasets = [np.array([0, 1, 1, 1, 2]),
np.array([2, 3, 2, 4, 0]),
np.array([3, 4, 5, 3, 2]),
np.array([6, 5, 4, 4, 7])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
ฮผ = exp(X @ ฮฒ)
distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, ฮผ))
ax.plot(y_values,
distribution,
label=f'$\mu_i$={ฮผ:.1}',
marker='o',
markersize=8,
alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()

We can see that the distribution of ๐‘ฆ๐‘– is conditional on x๐‘– (๐œ‡๐‘– is no longer constant)

19.5 Maximum Likelihood Estimation

In our model for number of billionaires, the conditional distribution contains 4 (k = 4) parameters that we need to estimate
We will label our entire parameter vector as ๐›ฝ where

$$
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
$$

To estimate the model using MLE, we want to maximize the likelihood that our estimate ๐›ฝฬ‚ is
the true parameter ๐›ฝ
Intuitively, we want to find the ๐›ฝฬ‚ that best fits our data
First, we need to construct the likelihood function ℒ(β), which is similar to a joint probability density function
Assume we have some data ๐‘ฆ๐‘– = {๐‘ฆ1 , ๐‘ฆ2 } and ๐‘ฆ๐‘– โˆผ ๐‘“(๐‘ฆ๐‘– )
If ๐‘ฆ1 and ๐‘ฆ2 are independent, the joint pmf of these data is ๐‘“(๐‘ฆ1 , ๐‘ฆ2 ) = ๐‘“(๐‘ฆ1 ) โ‹… ๐‘“(๐‘ฆ2 )
If ๐‘ฆ๐‘– follows a Poisson distribution with ๐œ† = 7, we can visualize the joint pmf like so

In [5]: from mpl_toolkits.mplot3d import Axes3D

def plot_joint_poisson(ฮผ=7, y_n=20):



yi_values = np.arange(0, y_n, 1)

# Create coordinate points of X and Y


X, Y = np.meshgrid(yi_values, yi_values)

# Multiply distributions together


Z = poisson_pmf(X, ฮผ) * poisson_pmf(Y, ฮผ)

fig = plt.figure(figsize=(12, 8))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z.T, cmap='terrain', alpha=0.6)
ax.scatter(X, Y, Z.T, color='black', alpha=0.5, linewidths=1)
ax.set(xlabel='$y_1$', ylabel='$y_2$')
ax.set_zlabel('$f(y_1, y_2)$', labelpad=10)
plt.show()

plot_joint_poisson(ฮผ=7, y_n=20)

Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribution) can be written as

$$
f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta) = \prod_{i=1}^n \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}
$$

๐‘ฆ๐‘– is conditional on both the values of x๐‘– and the parameters ๐›ฝ


The likelihood function is the same as the joint pmf, but treats the parameter ๐›ฝ as a random
variable and takes the observations (๐‘ฆ๐‘– , x๐‘– ) as given

$$
\begin{aligned}
\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)
&= \prod_{i=1}^n \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \\
&= f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
\end{aligned}
$$

Now that we have our likelihood function, we want to find the ๐›ฝฬ‚ that yields the maximum
likelihood value

maxโ„’(๐›ฝ)
๐›ฝ

In doing so it is generally easier to maximize the log-likelihood (consider differentiating
f(x) = x exp(x) vs. f(x) = log(x) + x)
Given that taking a logarithm is a monotone increasing transformation, a maximizer of the
likelihood function will also be a maximizer of the log-likelihood function
In our case the log-likelihood is

log โ„’(๐›ฝ) = log (๐‘“(๐‘ฆ1 ; ๐›ฝ) โ‹… ๐‘“(๐‘ฆ2 ; ๐›ฝ) โ‹… โ€ฆ โ‹… ๐‘“(๐‘ฆ๐‘› ; ๐›ฝ))


๐‘›
= โˆ‘ log ๐‘“(๐‘ฆ๐‘– ; ๐›ฝ)
๐‘–=1
๐‘› ๐‘ฆ
๐œ‡๐‘– ๐‘– โˆ’๐œ‡๐‘–
= โˆ‘ log ( ๐‘’ )
๐‘–=1
๐‘ฆ๐‘– !
๐‘› ๐‘› ๐‘›
= โˆ‘ ๐‘ฆ๐‘– log ๐œ‡๐‘– โˆ’ โˆ‘ ๐œ‡๐‘– โˆ’ โˆ‘ log ๐‘ฆ!
๐‘–=1 ๐‘–=1 ๐‘–=1

The MLE β̂ of the Poisson regression model can be obtained by solving

๐‘› ๐‘› ๐‘›
max( โˆ‘ ๐‘ฆ๐‘– log ๐œ‡๐‘– โˆ’ โˆ‘ ๐œ‡๐‘– โˆ’ โˆ‘ log ๐‘ฆ!)
๐›ฝ
๐‘–=1 ๐‘–=1 ๐‘–=1

However, no analytical solution exists to the above problem โ€“ to find the MLE we need to use
numerical methods

19.6 MLE with Numerical Methods

Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates
One such numerical method is the Newton-Raphson algorithm
Our goal is to find the maximum likelihood estimate ๐›ฝฬ‚
At ๐›ฝ,ฬ‚ the first derivative of the log-likelihood function will be equal to 0
Letโ€™s illustrate this by supposing

log โ„’(๐›ฝ) = โˆ’(๐›ฝ โˆ’ 10)2 โˆ’ 10

In [6]: ฮฒ = np.linspace(1, 20)


logL = -(ฮฒ - 10) ** 2 - 10
dlogL = -2 * ฮฒ + 20

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))



ax1.plot(ฮฒ, logL, lw=2)


ax2.plot(ฮฒ, dlogL, lw=2)

ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()

The plot shows that the maximum likelihood value (the top plot) occurs when
d log ℒ(β) / dβ = 0 (the bottom plot)
Therefore, the likelihood is maximized when β = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, β₀ (the OLS parameter
estimates might be a reasonable guess), then

1. Use the updating rule to iterate the algorithm

๐›ฝ (๐‘˜+1) = ๐›ฝ (๐‘˜) โˆ’ ๐ป โˆ’1 (๐›ฝ (๐‘˜) )๐บ(๐›ฝ (๐‘˜) )

where:

G(β^(k)) = d log ℒ(β^(k)) / dβ^(k)

H(β^(k)) = d² log ℒ(β^(k)) / dβ^(k) dβ^(k)′
2. Check whether β^(k+1) − β^(k) < tol

   • If true, then stop iterating and set β̂ = β^(k+1)
   • If false, then update β^(k+1) and return to step 1

As can be seen from the updating equation, β^(k+1) = β^(k) only when G(β^(k)) = 0, i.e. where
the first derivative is equal to 0
(In practice, we stop iterating when the difference is below a small tolerance threshold)
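In one dimension the steps above reduce to a few lines; here is a minimal sketch of ours on the quadratic log-likelihood used earlier (for this example G(β) = −2(β − 10) and H(β) = −2, so a single update lands exactly on the maximum)

```python
def G(β):                        # first derivative of log L(β) = -(β - 10)**2 - 10
    return -2 * (β - 10)

def H(β):                        # second derivative (constant for a quadratic)
    return -2.0

β, tol = 1.0, 1e-8               # initial guess and tolerance
for k in range(100):
    β_new = β - G(β) / H(β)      # scalar version of the updating rule
    converged = abs(β_new - β) < tol
    β = β_new
    if converged:
        break

print(β)                         # β = 10.0, the maximizer found above
```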
Letโ€™s have a go at implementing the Newton-Raphson algorithm
First, weโ€™ll create a class called PoissonRegression so we can easily recompute the values
of the log likelihood, gradient and Hessian for every iteration

In [7]: class PoissonRegression:

            def __init__(self, y, X, β):
                self.X = X
                self.n, self.k = X.shape
                self.y = y.reshape(self.n, 1)    # Reshape y as an n_by_1 column vector
                self.β = β.reshape(self.k, 1)    # Reshape β as a k_by_1 column vector

            def μ(self):
                return np.exp(self.X @ self.β)

            def logL(self):
                y = self.y
                μ = self.μ()
                return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

            def G(self):
                X = self.X
                y = self.y
                μ = self.μ()
                return X.T @ (y - μ)

            def H(self):
                X = self.X
                μ = self.μ()
                return -(X.T @ (μ * X))

Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector ๐›ฝ 0
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates
Iteration will end when either:

โ€ข The difference between the parameter and the updated parameter is below a tolerance
level
โ€ข The maximum number of iterations has been achieved (meaning convergence is not
achieved)

So we can get an idea of whatโ€™s going on while the algorithm is running, an option dis-
play=True is added to print out values at each iteration

In [8]: def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):

            i = 0
            error = 100    # Initial error value

            # Print header of output
            if display:
                header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
                print(header)
                print("-" * len(header))

            # While loop runs while any value in error is greater
            # than the tolerance until max iterations are reached
            while np.any(error > tol) and i < max_iter:
                H, G = model.H(), model.G()
                β_new = model.β - (np.linalg.inv(H) @ G)
                error = β_new - model.β
                model.β = β_new

                # Print iterations
                if display:
                    β_list = [f'{t:.3}' for t in list(model.β.flatten())]
                    update = f'{i:<13}{model.logL():<16.8}{β_list}'
                    print(update)

                i += 1

            print(f'Number of iterations: {i}')
            print(f'β_hat = {model.β.flatten()}')

            # Return a flat array for β (instead of a k_by_1 column vector)
            return model.β.flatten()

Letโ€™s try out our algorithm with a small dataset of 5 observations and 3 variables in X

In [9]: X = np.array([[1, 2, 5],
                      [1, 1, 3],
                      [1, 4, 2],
                      [1, 5, 2],
                      [1, 3, 1]])

        y = np.array([1, 0, 1, 1, 0])

        # Take a guess at initial βs
        init_β = np.array([0.1, 0.1, 0.1])

        # Create an object with Poisson model values
        poi = PoissonRegression(y, X, β=init_β)

        # Use newton_raphson to find the MLE
        β_hat = newton_raphson(poi, display=True)

Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
3 -3.3788646 ['-5.92', '0.909', '0.82']
4 -3.3783559 ['-6.07', '0.933', '0.843']
5 -3.3783555 ['-6.08', '0.933', '0.843']
Number of iterations: 6
ฮฒ_hat = [-6.07848205 0.93340226 0.84329625]

As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations

You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve
Also, note that the increase in log ℒ(β^(k)) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller
The gradient vector should be close to 0 at β̂

In [10]: poi.G()

Out[10]: array([[-3.95169228e-07],
[-1.00114805e-06],
[-7.73114562e-07]])

The iterative process can be visualized in the following diagram, where the maximum is found
at β = 10

In [11]: logL = lambda x: -(x - 10) ** 2 - 10

def find_tangent(ฮฒ, a=0.01):


y1 = logL(ฮฒ)
y2 = logL(ฮฒ+a)
x = np.array([[ฮฒ, 1], [ฮฒ+a, 1]])
m, c = np.linalg.lstsq(x, np.array([y1, y2]), rcond=None)[0]
return m, c

ฮฒ = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(ฮฒ, logL(ฮฒ), lw=2, c='black')

for ฮฒ in [7, 8.5, 9.5, 10]:


ฮฒ_line = np.linspace(ฮฒ-2, ฮฒ+2)
m, c = find_tangent(ฮฒ)
y = m * ฮฒ_line + c
ax.plot(ฮฒ_line, y, '-', c='purple', alpha=0.8)
ax.text(ฮฒ+2.05, y[-1], f'$G({ฮฒ}) = {abs(m):.0f}$', fontsize=12)
ax.vlines(ฮฒ, -24, logL(ฮฒ), linestyles='--', alpha=0.5)
ax.hlines(logL(ฮฒ), 6, ฮฒ, linestyles='--', alpha=0.5)

ax.set(ylim=(-24, -4), xlim=(6, 13))


ax.set_xlabel(r'$\beta$', fontsize=15)
ax.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=25,
fontsize=15)
ax.grid(alpha=0.3)
plt.show()

Note that our implementation of the Newton-Raphson algorithm is rather basic โ€” for more
robust implementations see, for example, scipy.optimize

19.7 Maximum Likelihood Estimation with statsmodels

Now that we know whatโ€™s going on under the hood, we can apply MLE to an interesting ap-
plication
Weโ€™ll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, letโ€™s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value

In [12]: from statsmodels.api import Poisson


from scipy import stats

X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

stats_poisson = Poisson(y, X).fit()


print(stats_poisson.summary())

Optimization terminated successfully.


Current function value: 0.675671
Iterations 7
Poisson Regression Results
==============================================================================
Dep. Variable: y No. Observations: 5


Model: Poisson Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2546
Time: 15:37:09 Log-Likelihood: -3.3784
converged: True LL-Null: -4.5325
LLR p-value: 0.3153
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -6.0785 5.279 -1.151 0.250 -16.425 4.268
x1 0.9334 0.829 1.126 0.260 -0.691 2.558
x2 0.8433 0.798 1.057 0.291 -0.720 2.407
==============================================================================

Now letโ€™s replicate results from Daniel Treismanโ€™s paper, Russiaโ€™s Billionaires, mentioned ear-
lier in the lecture
Treisman starts by estimating equation Eq. (1), where:

โ€ข ๐‘ฆ๐‘– is ๐‘›๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘๐‘–๐‘™๐‘™๐‘–๐‘œ๐‘›๐‘Ž๐‘–๐‘Ÿ๐‘’๐‘ ๐‘–
โ€ข ๐‘ฅ๐‘–1 is log ๐บ๐ท๐‘ƒ ๐‘๐‘’๐‘Ÿ ๐‘๐‘Ž๐‘๐‘–๐‘ก๐‘Ž๐‘–
โ€ข ๐‘ฅ๐‘–2 is log ๐‘๐‘œ๐‘๐‘ข๐‘™๐‘Ž๐‘ก๐‘–๐‘œ๐‘›๐‘–
โ€ข ๐‘ฅ๐‘–3 is ๐‘ฆ๐‘’๐‘Ž๐‘Ÿ๐‘  ๐‘–๐‘› ๐บ๐ด๐‘‡ ๐‘‡ ๐‘– โ€“ years membership in GATT and WTO (to proxy access to in-
ternational markets)

The paper only considers the year 2008 for estimation


We will set up our variables for estimation like so (you should have the data assigned to df
from earlier in the lecture)

In [13]: # Keep only year 2008


df = df[df['year'] == 2008]

# Add a constant
df['const'] = 1

# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']

Then we can use the Poisson function from statsmodels to fit the model
Weโ€™ll use robust standard errors as in the authorโ€™s paper

In [14]: import statsmodels.api as sm

# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())

Optimization terminated successfully.


Current function value: 2.226090
Iterations 9
Poisson Regression Results
==============================================================================
Dep. Variable: numbil0 No. Observations: 197
Model: Poisson Df Residuals: 193
Method: MLE Df Model: 3


Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.8574
Time: 15:37:10 Log-Likelihood: -438.54
converged: True LL-Null: -3074.7
LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -29.0495 2.578 -11.268 0.000 -34.103 -23.997
lngdppc 1.0839 0.138 7.834 0.000 0.813 1.355
lnpop 1.1714 0.097 12.024 0.000 0.980 1.362
gattwto08 0.0060 0.007 0.868 0.386 -0.008 0.019
==============================================================================

Success! The algorithm was able to achieve convergence in 9 iterations


Our output indicates that GDP per capita, population, and years of membership in the Gen-
eral Agreement on Tariffs and Trade (GATT) are positively related to the number of billion-
aires a country has, as expected
Letโ€™s also estimate the authorโ€™s more full-featured models and display them in a single table

In [15]: from statsmodels.iolib.summary2 import summary_col

regs = [reg1, reg2, reg3]


reg_names = ['Model 1', 'Model 2', 'Model 3']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
'No. observations': lambda x: f"{int(x.nobs):d}"}
regressor_order = ['const',
'lngdppc',
'lnpop',
'gattwto08',
'lnmcap08',
'rintr',
'topint08',
'nrrents',
'roflaw']
results = []

for reg in regs:


result = sm.Poisson(df[['numbil0']], df[reg],
missing='drop').fit(cov_type='HC0', maxiter=100, disp=0)
results.append(result)

results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)

Table 1 - Explaining the Number of Billionaires in 2008


=================================================
Model 1 Model 2 Model 3
-------------------------------------------------
const -29.050*** -19.444*** -20.858***
(2.578) (4.820) (4.255)
lngdppc 1.084*** 0.717*** 0.737***
(0.138) (0.244) (0.233)
lnpop 1.171*** 0.806*** 0.929***
(0.097) (0.213) (0.195)
gattwto08 0.006 0.007 0.004
(0.007) (0.006) (0.006)
lnmcap08 0.399** 0.286*
(0.172) (0.167)
rintr -0.010 -0.009
(0.010) (0.010)
topint08 -0.051***-0.058***
(0.011) (0.012)
nrrents -0.005
(0.010)
roflaw 0.203
(0.372)
Pseudo R-squared 0.86 0.90 0.90
No. observations 197 131 131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate
To analyze our results by country, we can plot the difference between the predicted and actual
values, then sort from highest to lowest and plot the first 15

In [16]: data = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08', 'rintr',


'topint08', 'nrrents', 'roflaw', 'numbil0', 'country']
results_df = df[data].dropna()

# Use last model (model 3)


results_df['prediction'] = results[-1].predict()

# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']

# Sort in descending order


results_df.sort_values('difference', ascending=False, inplace=True)

# Plot the first 15 data points


results_df[:15].plot('country', 'difference', kind='bar', figsize=(12,8), legend=False)
plt.ylabel('Number of billionaires above predicted level')
plt.xlabel('Country')
plt.show()

As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russiaโ€™s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR

19.8 Summary

In this lecture, we used Maximum Likelihood Estimation to estimate the parameters of a


Poisson model
statsmodels contains other built-in likelihood models such as Probit and Logit
For further flexibility, statsmodels provides a way to specify the distribution manually us-
ing the GenericLikelihoodModel class - an example notebook can be found here

19.9 Exercises

19.9.1 Exercise 1

Suppose we wanted to estimate the probability of an event yᵢ occurring, given some
observations
We could use a probit regression model, where the pmf of ๐‘ฆ๐‘– is

f(yᵢ; β) = μᵢ^yᵢ (1 − μᵢ)^(1−yᵢ),    yᵢ = 0, 1

where μᵢ = Φ(x′ᵢ β)

ฮฆ represents the cumulative normal distribution and constrains the predicted ๐‘ฆ๐‘– to be be-
tween 0 and 1 (as required for a probability)
๐›ฝ is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf
of the normal distribution

19.9.2 Exercise 2

Use the following dataset and initial values of ๐›ฝ to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture

        ⎡1  2  4⎤        ⎡1⎤
        ⎢1  1  1⎥        ⎢0⎥           ⎡0.1⎤
    X = ⎢1  4  3⎥    y = ⎢1⎥    β⁽⁰⁾ = ⎢0.1⎥
        ⎢1  5  6⎥        ⎢1⎥           ⎣0.1⎦
        ⎣1  3  5⎦        ⎣0⎦

Verify your results with statsmodels - you can import the Probit function with the follow-
ing import statement

In [17]: from statsmodels.discrete.discrete_model import Probit

Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values

19.10 Solutions

19.10.1 Exercise 1

The log-likelihood can be written as

๐‘›
log โ„’ = โˆ‘ [๐‘ฆ๐‘– log ฮฆ(xโ€ฒ๐‘– ๐›ฝ) + (1 โˆ’ ๐‘ฆ๐‘– ) log(1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ))]
๐‘–=1

Using the fundamental theorem of calculus, the derivative of a cumulative probability
distribution is its marginal distribution

∂Φ(s)/∂s = ϕ(s)

where ๐œ™ is the marginal normal distribution


The gradient vector of the Probit model is

๐‘›
๐œ• log โ„’ ๐œ™(xโ€ฒ๐‘– ๐›ฝ) ๐œ™(xโ€ฒ๐‘– ๐›ฝ)
= โˆ‘ [๐‘ฆ๐‘– โˆ’ (1 โˆ’ ๐‘ฆ ๐‘– ) ]x
๐œ•๐›ฝ ๐‘–=1
ฮฆ(xโ€ฒ๐‘– ๐›ฝ) 1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ) ๐‘–

The Hessian of the Probit model is

๐‘›
๐œ• 2 log โ„’ โ€ฒ ๐œ™(xโ€ฒ๐‘– ๐›ฝ) + xโ€ฒ๐‘– ๐›ฝฮฆ(xโ€ฒ๐‘– ๐›ฝ) ๐œ™๐‘– (xโ€ฒ๐‘– ๐›ฝ) โˆ’ xโ€ฒ๐‘– ๐›ฝ(1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ))
โ€ฒ = โˆ’ โˆ‘ ๐œ™(x ๐‘– ๐›ฝ)[๐‘ฆ ๐‘– โ€ฒ 2
+ (1 โˆ’ ๐‘ฆ ๐‘– ) โ€ฒ 2
]x๐‘– xโ€ฒ๐‘–
๐œ•๐›ฝ๐œ•๐›ฝ ๐‘–=1
[ฮฆ(x ๐‘– ๐›ฝ)] [1 โˆ’ ฮฆ(x ๐‘– ๐›ฝ)]

Using these results, we can write a class for the Probit model as follows

In [18]: from scipy.stats import norm

         class ProbitRegression:

             def __init__(self, y, X, β):
                 self.X, self.y, self.β = X, y, β
                 self.n, self.k = X.shape

             def μ(self):
                 return norm.cdf(self.X @ self.β.T)

             def ϕ(self):
                 return norm.pdf(self.X @ self.β.T)

             def logL(self):
                 y = self.y
                 μ = self.μ()
                 return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

             def G(self):
                 X, y = self.X, self.y
                 μ = self.μ()
                 ϕ = self.ϕ()
                 return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                               axis=1)

             def H(self):
                 X, y = self.X, self.y
                 β = self.β
                 μ = self.μ()
                 ϕ = self.ϕ()
                 a = (ϕ + (X @ β.T) * μ) / μ**2
                 b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
                 return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X

19.10.2 Exercise 2
In [19]: X = np.array([[1, 2, 4],
[1, 1, 1],
[1, 4, 3],
[1, 5, 6],
[1, 3, 5]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial ฮฒs


ฮฒ = np.array([0.1, 0.1, 0.1])

# Create instance of Probit regression class


prob = ProbitRegression(y, X, ฮฒ)

# Run Newton-Raphson algorithm


newton_raphson(prob)

Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
ฮฒ_hat = [-1.54625858 0.77778952 -0.09709757]

Out[19]: array([-1.54625858, 0.77778952, -0.09709757])

In [20]: # Use statsmodels to verify results

print(Probit(y, X).fit().summary())

Optimization terminated successfully.


Current function value: 0.473746
Iterations 6
Probit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 5
Model: Probit Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2961
Time: 15:37:10 Log-Likelihood: -2.3687
converged: True LL-Null: -3.3651
LLR p-value: 0.3692
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -1.5463 1.866 -0.829 0.407 -5.204 2.111
x1 0.7778 0.788 0.986 0.324 -0.768 2.323
x2 -0.0971 0.590 -0.165 0.869 -1.254 1.060
==============================================================================
Part V

Tools and Techniques

20

Geometric Series for Elementary Economics

20.1 Contents

โ€ข Overview 20.2
โ€ข Key Formulas 20.3
โ€ข Example: The Money Multiplier in Fractional Reserve Banking 20.4
โ€ข Example: The Keynesian Multiplier 20.5
โ€ข Example: Interest Rates and Present Values 20.6
โ€ข Back to the Keynesian Multiplier 20.7

20.2 Overview

The lecture describes important ideas in economics that use the mathematics of geometric
series
Among these are

โ€ข the Keynesian multiplier


โ€ข the money multiplier that prevails in fractional reserve banking systems
โ€ข interest rates and present values of streams of payouts from assets

(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that

    "in economics, a little knowledge of geometric series goes a long way"

Below weโ€™ll use the following imports

In [1]: import matplotlib.pyplot as plt


import numpy as np


20.3 Key Formulas

To start, let c be a real number that lies strictly between −1 and 1

• We often write this as c ∈ (−1, 1)
• Here (−1, 1) denotes the collection of all real numbers that are strictly less than 1 and
  strictly greater than −1
• The symbol ∈ means in or belongs to the set after the symbol

We want to evaluate geometric series of two types โ€“ infinite and finite

20.3.1 Infinite Geometric Series

The first type of geometric series that interests us is the infinite series

1 + c + c² + c³ + ⋯

Where ⋯ means that the series continues without limit


The key formula is

1 + c + c² + c³ + ⋯ = 1/(1 − c)    (1)
To prove key formula Eq. (1), multiply both sides by (1 − c) and verify that if c ∈ (−1, 1),
then the outcome is the equation 1 = 1

20.3.2 Finite Geometric Series

The second series that interests us is the finite geometric series

1 + ๐‘ + ๐‘2 + ๐‘3 + โ‹ฏ + ๐‘๐‘‡

where ๐‘‡ is a positive integer


The key formula here is

1 โˆ’ ๐‘๐‘‡ +1
1 + ๐‘ + ๐‘2 + ๐‘3 + โ‹ฏ + ๐‘๐‘‡ =
1โˆ’๐‘
Remark: The above formula works for any value of the scalar c. We don't have to restrict c
to be in the set (−1, 1)
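Both key formulas are easy to verify numerically; a quick sketch (the particular values c = 0.9 and T = 50 are arbitrary choices of ours)

```python
c, T = 0.9, 50

# Finite sum: 1 + c + ... + c**T  versus  (1 - c**(T + 1)) / (1 - c)
finite_sum = sum(c**k for k in range(T + 1))
finite_formula = (1 - c**(T + 1)) / (1 - c)
print(finite_sum, finite_formula)     # the two agree

# Infinite sum: partial sums approach 1 / (1 - c) when |c| < 1
partial = sum(c**k for k in range(10_000))
print(partial, 1 / (1 - c))           # both ≈ 10
```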
We now move on to describe some famous economic applications of geometric series

20.4 Example: The Money Multiplier in Fractional Reserve


Banking

In a fractional reserve banking system, banks hold only a fraction ๐‘Ÿ โˆˆ (0, 1) of cash behind
each deposit receipt that they issue

โ€ข In recent times

โ€“ cash consists of pieces of paper issued by the government and called dollars or
pounds or โ€ฆ
โ€“ a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash

โ€ข When the UK and France and the US were on either a gold or silver standard (before
1914, for example)

โ€“ cash was a gold or silver coin


โ€“ a deposit receipt was a bank note that the bank promised to convert into gold or
silver on demand; (sometimes it was also a checking or savings account balance)

Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits
In a fractional reserve banking system (one in which the reserve ratio r satisfies
0 < r < 1), banks create money by issuing deposits backed by fractional reserves plus loans
that they make to their customers
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system
The geometric series formula Eq. (1) is at the heart of the classic model of the money cre-
ation process โ€“ one that leads us to the celebrated money multiplier

20.4.1 A Simple Model

There is a set of banks named ๐‘– = 0, 1, 2, โ€ฆ


Bank ๐‘–โ€™s loans ๐ฟ๐‘– , deposits ๐ท๐‘– , and reserves ๐‘…๐‘– must satisfy the balance sheet equation (be-
cause balance sheets balance):

๐ฟ๐‘– + ๐‘…๐‘– = ๐ท๐‘–

The left side of the above equation is the sum of the bankโ€™s assets, namely, the loans ๐ฟ๐‘– it
has outstanding plus its reserves of cash ๐‘…๐‘–
The right side records bank ๐‘–โ€™s liabilities, namely, the deposits ๐ท๐‘– held by its depositors; these
are IOUโ€™s from the bank to its depositors in the form of either checking accounts or savings
accounts (or before 1914, bank notes issued by a bank stating promises to redeem note for
gold or silver on demand)
Each bank i sets its reserves to satisfy the equation

Rᵢ = r Dᵢ    (2)

where ๐‘Ÿ โˆˆ (0, 1) is its reserve-deposit ratio or reserve ratio for short

โ€ข the reserve ratio is either set by a government or chosen by banks for precautionary rea-
sons

Next we add a theory stating that bank ๐‘– + 1โ€™s deposits depend entirely on loans made by
bank ๐‘–, namely

๐ท๐‘–+1 = ๐ฟ๐‘– (3)

Thus, we can think of the banks as being arranged along a line with loans from bank ๐‘– being
immediately deposited in ๐‘– + 1

โ€ข in this way, the debtors to bank ๐‘– become creditors of bank ๐‘– + 1

Finally, we add an initial condition about an exogenous level of bank 0โ€™s deposits

๐ท0 is given exogenously

We can think of ๐ท0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number ๐‘– = 0
Now we do a little algebra
Combining equations Eq. (2) and Eq. (3) tells us that

๐ฟ๐‘– = (1 โˆ’ ๐‘Ÿ)๐ท๐‘– (4)

This states that bank ๐‘– loans a fraction (1 โˆ’ ๐‘Ÿ) of its deposits and keeps a fraction ๐‘Ÿ as cash
reserves
Combining equation Eq. (4) with equation Eq. (3) tells us that

๐ท๐‘–+1 = (1 โˆ’ ๐‘Ÿ)๐ท๐‘– for ๐‘– โ‰ฅ 0

which implies that

๐ท๐‘– = (1 โˆ’ ๐‘Ÿ)๐‘– ๐ท0 for ๐‘– โ‰ฅ 0 (5)

Equation Eq. (5) expresses Dᵢ as the i-th term in the product of D₀ and the geometric series

1, (1 − r), (1 − r)², ⋯

Therefore, the sum of all deposits in our banking system i = 0, 1, 2, … is

∑ᵢ₌₀^∞ (1 − r)ⁱ D₀ = D₀ / (1 − (1 − r)) = D₀ / r    (6)

20.4.2 Money Multiplier

The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system
Equation Eq. (6) asserts that the money multiplier is 1/r

• an initial deposit of cash of D₀ in bank 0 leads the banking system to create total
  deposits of D₀/r
• The initial deposit D₀ is held as reserves, distributed throughout the banking system
  according to D₀ = ∑ᵢ₌₀^∞ Rᵢ
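The deposit pattern in Eq. (5) and the multiplier in Eq. (6) can be simulated directly; a minimal sketch, with an assumed reserve ratio r = 0.1 and initial deposit D₀ = 100

```python
r, D0 = 0.1, 100.0

# Deposits along the chain of banks: D_i = (1 - r)**i * D0
deposits = [(1 - r)**i * D0 for i in range(500)]

total_deposits = sum(deposits)
print(total_deposits, D0 / r)             # both ≈ 1000: the multiplier is 1/r = 10

# Reserves R_i = r * D_i sum back to the initial cash injection
total_reserves = sum(r * d for d in deposits)
print(total_reserves)                     # ≈ 100
```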

20.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model in-
tended to determine national income ๐‘ฆ in circumstances in which

โ€ข there are substantial unemployed resources, in particular excess supply of labor and
capital
โ€ข prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
โ€ข national income is entirely determined by aggregate demand

20.5.1 Static Version

An elementary Keynesian model of national income determination consists of three equations
that describe aggregate demand for y and its components
The first equation is a national income identity asserting that consumption ๐‘ plus investment
๐‘– equals national income ๐‘ฆ:

๐‘+๐‘– = ๐‘ฆ

The second equation is a Keynesian consumption function asserting that people consume a
fraction ๐‘ โˆˆ (0, 1) of their income:

c = b y

The fraction ๐‘ โˆˆ (0, 1) is called the marginal propensity to consume


The fraction 1 โˆ’ ๐‘ โˆˆ (0, 1) is called the marginal propensity to save
The third equation simply states that investment is exogenous at level ๐‘–

โ€ข exogenous means determined outside this model

Substituting the second equation into the first gives (1 − b) y = i

Solving this equation for y gives

y = (1 / (1 − b)) i

The quantity 1/(1 − b) is called the investment multiplier or simply the multiplier
Applying the formula for the sum of an infinite geometric series, we can write the above equa-
tion as

y = i ∑ₜ₌₀^∞ bᵗ

where ๐‘ก is a nonnegative integer


So we arrive at the following equivalent expressions for the multiplier:

โˆž
1
= โˆ‘ ๐‘๐‘ก
1โˆ’๐‘ ๐‘ก=0

The expression ∑ₜ₌₀^∞ bᵗ motivates an interpretation of the multiplier as the outcome of a
dynamic process that we describe next

20.5.2 Dynamic Version

We arrive at a dynamic version by interpreting the nonnegative integer ๐‘ก as indexing time and
changing our specification of the consumption function to take time into account

โ€ข we add a one-period lag in how income affects consumption

We let ๐‘๐‘ก be consumption at time ๐‘ก and ๐‘–๐‘ก be investment at time ๐‘ก


We modify our consumption function to assume the form

๐‘๐‘ก = ๐‘๐‘ฆ๐‘กโˆ’1

so that ๐‘ is the marginal propensity to consume (now) out of last periodโ€™s income
We begin wtih an initial condition stating that

๐‘ฆโˆ’1 = 0

We also assume that

iₜ = i    for all t ≥ 0

so that investment is constant over time


It follows that

y₀ = i + c₀ = i + b y₋₁ = i

and

y₁ = c₁ + i = b y₀ + i = (1 + b) i

and

y₂ = c₂ + i = b y₁ + i = (1 + b + b²) i

and more generally

yₜ = b yₜ₋₁ + i = (1 + b + b² + ⋯ + bᵗ) i

or

yₜ = ((1 − bᵗ⁺¹) / (1 − b)) i

Evidently, as t → +∞,

yₜ → (1 / (1 − b)) i

Remark 1: The above formula is often applied to assert that an exogenous increase in
investment of Δi at time 0 ignites a dynamic process of increases in national income by
amounts

Δi, (1 + b)Δi, (1 + b + b²)Δi, ⋯

at times 0, 1, 2, …
Remark 2: Let gₜ be an exogenous sequence of government expenditures
If we generalize the model so that the national income identity becomes

cₜ + iₜ + gₜ = yₜ

then a version of the preceding argument shows that the government expenditures multiplier
is also 1/(1 − b), so that a permanent increase in government expenditures ultimately leads
to an increase in national income equal to the multiplier times the increase in government
expenditures
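The dynamic process is a one-line recursion, and simulating it shows yₜ climbing toward the multiplier formula; a minimal sketch, with assumed values b = 0.6 and i = 1

```python
b, i = 0.6, 1.0          # marginal propensity to consume, constant investment

y = 0.0                  # initial condition y_{-1} = 0
path = []
for t in range(60):
    y = b * y + i        # y_t = b * y_{t-1} + i
    path.append(y)

print(path[0], path[1])           # i and (1 + b) i
print(path[-1], i / (1 - b))      # the path converges to i / (1 - b) = 2.5
```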

20.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of
streams of dollar payments that extend over time
We work in discrete time and assume that ๐‘ก = 0, 1, 2, โ€ฆ indexes time
We let ๐‘Ÿ โˆˆ (0, 1) be a one-period net nominal interest rate

โ€ข if the nominal interest rate is 5 percent, then ๐‘Ÿ = .05

A one-period gross nominal interest rate R is defined as

R = 1 + r ∈ (1, 2)

• if r = .05, then R = 1.05



Remark: The gross nominal interest rate R is an exchange rate or relative price of dollars
between times t and t + 1. The units of R are dollars at time t + 1 per dollar at time t
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now
The price at which these exchanges occur is the gross nominal interest rate

• If I sell x dollars to you today, you pay me Rx dollars tomorrow
• This means that you borrowed x dollars from me at a gross interest rate R and a net
  interest rate r

We assume that the net nominal interest rate ๐‘Ÿ is fixed over time, so that ๐‘… is the gross nom-
inal interest rate at times ๐‘ก = 0, 1, 2, โ€ฆ
Two important geometric sequences are

1, R, R², ⋯    (7)

and

1, R⁻¹, R⁻², ⋯    (8)

Sequence Eq. (7) tells us how dollar values of an investment accumulate through time
Sequence Eq. (8) tells us how to discount future dollars to get their values in terms of to-
dayโ€™s dollars

20.6.1 Accumulation

Geometric sequence Eq. (7) tells us how one dollar invested and re-invested in a project with
gross one period nominal rate of return accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest r dollars after one period, so we have
  r + 1 = R dollars at time 1
• at time 1 we reinvest 1 + r = R dollars and receive interest of rR dollars at time 2 plus
  the principal R dollars, so we receive rR + R = (1 + r)R = R² dollars at the end of
  period 2
• and so on

Evidently, if we invest x dollars at time 0 and reinvest the proceeds, then the sequence

x, xR, xR², ⋯

tells how our account accumulates at dates ๐‘ก = 0, 1, 2, โ€ฆ



20.6.2 Discounting

Geometric sequence Eq. (8) tells us how much future dollars are worth in terms of todayโ€™s
dollars
Remember that the units of ๐‘… are dollars at ๐‘ก + 1 per dollar at ๐‘ก
It follows that

โ€ข the units of ๐‘…โˆ’1 are dollars at ๐‘ก per dollar at ๐‘ก + 1


โ€ข the units of ๐‘…โˆ’2 are dollars at ๐‘ก per dollar at ๐‘ก + 2
โ€ข and so on; the units of ๐‘…โˆ’๐‘— are dollars at ๐‘ก per dollar at ๐‘ก + ๐‘—

So if someone has a claim on ๐‘ฅ dollars at time ๐‘ก + ๐‘—, it is worth ๐‘ฅ๐‘…โˆ’๐‘— dollars at time ๐‘ก (e.g.,
today)
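Here is a minimal numerical illustration of this discounting rule (the interest rate and helper function are our own choices for the example):

```python
# Discounting: a claim on x dollars j periods ahead is worth x * R**(-j) today
r = 0.05
R = 1 + r

def present_value(x, j):
    """Value today of a claim on x dollars j periods ahead."""
    return x * R**(-j)

pv = present_value(100.0, 2)   # claim on 100 dollars two periods ahead
print(pv)
```

The claim is worth less than its face value because a dollar tomorrow buys less than a dollar today.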

20.6.3 Application to Asset Pricing

A lease requires a stream of payments of ๐‘ฅ๐‘ก dollars at times ๐‘ก = 0, 1, 2, โ€ฆ where

๐‘ฅ๐‘ก = ๐บ๐‘ก ๐‘ฅ0

where ๐บ = (1 + ๐‘”) and ๐‘” โˆˆ (0, 1)


Thus, lease payments increase at rate ๐‘” per period
For a reason soon to be revealed, we assume that ๐บ < ๐‘…
The present value of the lease is

๐‘0 = ๐‘ฅ0 + ๐‘ฅ1 /๐‘… + ๐‘ฅ2 /(๐‘…2 )+ โ‹ฑ
= ๐‘ฅ0 (1 + ๐บ๐‘…โˆ’1 + ๐บ2 ๐‘…โˆ’2 + โ‹ฏ)
1
= ๐‘ฅ0
1 โˆ’ ๐บ๐‘…โˆ’1

where the last line uses the formula for an infinite geometric series
Recall that ๐‘… = 1 + ๐‘Ÿ and ๐บ = 1 + ๐‘” and that ๐‘… > ๐บ and ๐‘Ÿ > ๐‘” and that ๐‘Ÿ and๐‘” are typically
small numbers, e.g., .05 or .03
Use the Taylor series of 1/(1 + ๐‘Ÿ) about ๐‘Ÿ = 0, namely,

1/(1 + ๐‘Ÿ) = 1 โˆ’ ๐‘Ÿ + ๐‘Ÿ2 โˆ’ ๐‘Ÿ3 + โ‹ฏ

and the fact that ๐‘Ÿ is small to approximate 1/(1 + ๐‘Ÿ) โ‰ˆ 1 โˆ’ ๐‘Ÿ
Use this approximation to write ๐‘0 as

๐‘0 = ๐‘ฅ0 /(1 โˆ’ ๐บ๐‘…โˆ’1 )
   = ๐‘ฅ0 /(1 โˆ’ (1 + ๐‘”)(1 โˆ’ ๐‘Ÿ))
   = ๐‘ฅ0 /(1 โˆ’ (1 + ๐‘” โˆ’ ๐‘Ÿ โˆ’ ๐‘Ÿ๐‘”))
   โ‰ˆ ๐‘ฅ0 /(๐‘Ÿ โˆ’ ๐‘”)

where the last step uses the approximation ๐‘Ÿ๐‘” โ‰ˆ 0


The approximation

๐‘0 = ๐‘ฅ0 /(๐‘Ÿ โˆ’ ๐‘”)
is known as the Gordon formula for the present value or current price of an infinite pay-
ment stream ๐‘ฅ0 ๐บ๐‘ก when the nominal one-period interest rate is ๐‘Ÿ and when ๐‘Ÿ > ๐‘”
We can also extend the asset pricing formula so that it applies to finite leases
Let the payment stream on the lease now be ๐‘ฅ๐‘ก for ๐‘ก = 0, 1, โ€ฆ , ๐‘‡ , where again

๐‘ฅ๐‘ก = ๐บ๐‘ก ๐‘ฅ0

The present value of this lease is:

๐‘0 = ๐‘ฅ0 + ๐‘ฅ1 /๐‘… + โ‹ฏ + ๐‘ฅ๐‘‡ /๐‘…๐‘‡
= ๐‘ฅ0 (1 + ๐บ๐‘…โˆ’1 + โ‹ฏ + ๐บ๐‘‡ ๐‘…โˆ’๐‘‡ )
๐‘ฅ0 (1 โˆ’ ๐บ๐‘‡ +1 ๐‘…โˆ’(๐‘‡ +1) )
=
1 โˆ’ ๐บ๐‘…โˆ’1

Applying the Taylor series to ๐‘…โˆ’(๐‘‡ +1) about ๐‘Ÿ = 0 we get:

1/(1 + ๐‘Ÿ)๐‘‡+1 = 1 โˆ’ ๐‘Ÿ(๐‘‡ + 1) + (1/2) ๐‘Ÿ2 (๐‘‡ + 1)(๐‘‡ + 2) + โ‹ฏ โ‰ˆ 1 โˆ’ ๐‘Ÿ(๐‘‡ + 1)

Similarly, applying the Taylor series to ๐บ๐‘‡ +1 about ๐‘” = 0:

(1 + ๐‘”)๐‘‡ +1 = 1 + (๐‘‡ + 1)๐‘”(1 + ๐‘”)๐‘‡ + (๐‘‡ + 1)๐‘‡ ๐‘”2 (1 + ๐‘”)๐‘‡ โˆ’1 + โ‹ฏ โ‰ˆ 1 + (๐‘‡ + 1)๐‘”

Thus, we get the following approximation:

๐‘0 = ๐‘ฅ0 (1 โˆ’ (1 + (๐‘‡ + 1)๐‘”)(1 โˆ’ ๐‘Ÿ(๐‘‡ + 1))) / (1 โˆ’ (1 โˆ’ ๐‘Ÿ)(1 + ๐‘”))

Expanding:

๐‘0 = ๐‘ฅ0 (1 โˆ’ 1 + (๐‘‡ + 1)2 ๐‘Ÿ๐‘” + ๐‘Ÿ(๐‘‡ + 1) โˆ’ ๐‘”(๐‘‡ + 1)) / (1 โˆ’ 1 + ๐‘Ÿ โˆ’ ๐‘” + ๐‘Ÿ๐‘”)
   = ๐‘ฅ0 (๐‘‡ + 1)((๐‘‡ + 1)๐‘Ÿ๐‘” + ๐‘Ÿ โˆ’ ๐‘”) / (๐‘Ÿ โˆ’ ๐‘” + ๐‘Ÿ๐‘”)
   โ‰ˆ ๐‘ฅ0 (๐‘‡ + 1)(๐‘Ÿ โˆ’ ๐‘”)/(๐‘Ÿ โˆ’ ๐‘”) + ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”)
   = ๐‘ฅ0 (๐‘‡ + 1) + ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”)

We could have also approximated by removing the second term ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”) when ๐‘‡ is relatively small compared to 1/(๐‘Ÿ๐‘”), which leaves ๐‘ฅ0 (๐‘‡ + 1) as the second finite-stream approximation
We will plot the true finite-stream present value and the two approximations, under different values of ๐‘‡ , ๐‘” and ๐‘Ÿ, in Python
First we plot the true finite-stream present value after computing it below

In [2]: # True present value of a finite lease

def finite_lease_pv(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))

# First approximation for our finite lease
def finite_lease_pv_approx_f(T, g, r, x_0):
    p = x_0 * (T + 1) + x_0 * r * g * (T + 1) / (r - g)
    return p

# Second approximation for our finite lease
def finite_lease_pv_approx_s(T, g, r, x_0):
    return x_0 * (T + 1)

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))

Now that we have defined our functions, we can plot some outcomes
First we study the quality of our approximations

In [3]: g = 0.02
r = 0.03
x_0 = 1
T_max = 50
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = finite_lease_pv_approx_f(T, g, r, x_0)
y_3 = finite_lease_pv_approx_s(T, g, r, x_0)
ax.plot(T, y_1, label='True T-period Lease PV')
ax.plot(T, y_2, label='T-period Lease First-order Approx.')
ax.plot(T, y_3, label='T-period Lease First-order Approx. adj.')
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()

Evidently our approximations perform well for small values of ๐‘‡
However, holding ๐‘” and ๐‘Ÿ fixed, our approximations deteriorate as ๐‘‡ increases
Next we compare the infinite and finite duration lease present values over different lease
lengths ๐‘‡

In [4]: # Convergence of infinite and finite


T_max = 1000
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Infinite and Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = np.ones(T_max+1)*infinite_lease(g, r, x_0)
ax.plot(T, y_1, label='T-period lease PV')
ax.plot(T, y_2, '--', label='Infinite lease PV')
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
ax.legend()
plt.show()

The above graph shows how, as duration ๐‘‡ โ†’ +โˆž, the value of a lease of duration ๐‘‡ approaches the value of a perpetual lease
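We can confirm this convergence numerically; the following self-contained sketch (which re-implements the two formulas above so it can run on its own, with parameter values of our own) measures the gap between the finite and perpetual lease values at a large ๐‘‡:

```python
# Convergence check: the finite-lease PV approaches the perpetual-lease PV
# as T grows, provided g < r
g, r, x_0 = 0.02, 0.03, 1.0
G, R = 1 + g, 1 + r

def finite_pv(T):
    # x_0 (1 - G^{T+1} R^{-(T+1)}) / (1 - G R^{-1})
    return x_0 * (1 - G**(T + 1) * R**(-T - 1)) / (1 - G / R)

infinite_pv = x_0 / (1 - G / R)
gap = infinite_pv - finite_pv(1000)   # remaining gap at T = 1000
print(gap)
```

The gap is positive (a finite lease is worth less than a perpetual one) and shrinks geometrically in ๐‘‡.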
Now we consider two different views of what happens as ๐‘Ÿ and ๐‘” covary

In [5]: # First view


# Changing r and g
fig, ax = plt.subplots()
ax.set_title('Value of lease of length $T$')
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T=np.arange(0, T_max+1)
# r >> g, much bigger than g
r = 0.9
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r\gg g$')
# r > g
r = 0.5
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r>g$', color='green')

# r ~ g, not defined when r = g, but approximately goes to straight line with slope 1
r = 0.4001
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r \approx g$', color='orange')

# r < g
r = 0.4
g = 0.5
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r<g$', color='red')
ax.legend()
plt.show()

The above graph gives a big hint for why the condition ๐‘Ÿ > ๐‘” is necessary if a lease of length ๐‘‡ = +โˆž is to have finite value
For fans of 3-d graphs the same point comes through in the following graph
If you arenโ€™t enamored of 3-d graphs, feel free to skip the next visualization!

In [6]: # Second view


from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
T = 3
ax = fig.gca(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.01, 0.99, 0.005)

rr, gg = np.meshgrid(r, g)
z = finite_lease_pv(T, gg, rr, x_0)

# Removes points where undefined


same = (rr == gg)
z[same] = np.nan
surf = ax.plot_surface(rr, gg, z, cmap=cm.coolwarm, antialiased=True, clim=(0, 15))
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
ax.view_init(20, 10)
ax.set_title('Three Period Lease PV with Varying $g$ and $r$')
plt.show()


We can use a little calculus to study how the present value ๐‘0 of a lease varies with ๐‘Ÿ and ๐‘”
We will use a library called SymPy
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease
After that, weโ€™ll use SymPy to compute derivatives

In [7]: import sympy as sym


from sympy import init_printing

# Creates algebraic symbols that can be used in an algebraic expression


g, r, x0 = sym.symbols('g, r, x0')
G = (1 + g)
R = (1 + r)
p0 = x0 / (1 - G * R**(-1))
init_printing()
print('Our formula is:')
p0

Our formula is:

Out[7]:

๐‘ฅ0 / (โˆ’(๐‘” + 1)/(๐‘Ÿ + 1) + 1)

In [8]: print('dp0 / dg is:')


dp_dg = sym.diff(p0, g)
dp_dg

dp0 / dg is:

Out[8]:

๐‘ฅ0 / ((๐‘Ÿ + 1) (โˆ’(๐‘” + 1)/(๐‘Ÿ + 1) + 1)2 )

In [9]: print('dp0 / dr is:')


dp_dr = sym.diff(p0, r)
dp_dr

dp0 / dr is:

Out[9]:

๐‘ฅ0 (โˆ’๐‘” โˆ’ 1)
2 2
(๐‘Ÿ + 1) (โˆ’ ๐‘”+1
๐‘Ÿ+1 + 1)

We can see that ๐œ•๐‘0 /๐œ•๐‘Ÿ < 0 as long as ๐‘Ÿ > ๐‘”, ๐‘Ÿ > 0 and ๐‘” > 0 and ๐‘ฅ0 is positive, so this derivative will always be negative
Similarly, ๐œ•๐‘0 /๐œ•๐‘” > 0 as long as ๐‘Ÿ > ๐‘”, ๐‘Ÿ > 0 and ๐‘” > 0 and ๐‘ฅ0 is positive, so this derivative will always be positive
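These signs can also be confirmed numerically, without SymPy, by applying central finite differences to the closed-form present value (a sketch with parameter values of our own choosing):

```python
# Finite-difference check of the signs of dp0/dr and dp0/dg
def p0(g, r, x0=1.0):
    # Closed-form PV of the infinite lease: x0 / (1 - G R^{-1})
    return x0 / (1 - (1 + g) / (1 + r))

g, r, eps = 0.02, 0.05, 1e-6

dp_dr = (p0(g, r + eps) - p0(g, r - eps)) / (2 * eps)   # should be negative
dp_dg = (p0(g + eps, r) - p0(g - eps, r)) / (2 * eps)   # should be positive
print(dp_dr, dp_dg)
```

Raising the interest rate lowers the present value; raising the growth rate of payments raises it.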

20.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of ๐‘ฆ๐‘ก ,
given that consumption is a constant fraction of national income, and investment is fixed

In [10]: # Function that calculates a path of y


def calculate_y(i, b, g, T, y_init):
y = np.zeros(T+1)
y[0] = i + b * y_init + g
for t in range(1, T+1):
y[t] = b * y[t-1] + i + g
return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100

fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()

In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed

In [11]: # Changing fraction of consumption


b_0 = 1/3
b_1 = 2/3
b_2 = 5/6
b_3 = 0.9

fig,ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in (b_0, b_1, b_2, b_3):
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()

Increasing the marginal propensity to consume ๐‘ increases the path of output over time

In [12]: x = np.arange(0, T+1)


y_0 = calculate_y(i_0, b, g_0, T, y_init)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))
fig.subplots_adjust(hspace=0.3)

# Changing initial investment:


i_1 = 0.4
y_1 = calculate_y(i_1, b, g_0, T, y_init)
ax1.set_title('An Increase in Investment on Output')
ax1.plot(x, y_0, label=r'$i=0.3$', linestyle='--')
ax1.plot(x, y_1, label=r'$i=0.4$')
ax1.legend()
ax1.set_ylabel('$y_t$')
ax1.set_xlabel('$t$')

# Changing government spending


g_1 = 0.4
y_1 = calculate_y(i_0, b, g_1, T, y_init)
ax2.set_title('An Increase in Government Spending on Output')
ax2.plot(x, y_0, label=r'$g=0.3$', linestyle='--')
ax2.plot(x, y_1, label=r'$g=0.4$')
ax2.legend()
ax2.set_ylabel('$y_t$')
ax2.set_xlabel('$t$')
plt.show()

Notice that whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts in the graphs are identical
21

Linear Algebra

21.1 Contents

โ€ข Overview 21.2

โ€ข Vectors 21.3

โ€ข Matrices 21.4

โ€ข Solving Systems of Equations 21.5

โ€ข Eigenvalues and Eigenvectors 21.6

โ€ข Further Topics 21.7

โ€ข Exercises 21.8

โ€ข Solutions 21.9

21.2 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as

๐‘ฆ1 = ๐‘Ž๐‘ฅ1 + ๐‘๐‘ฅ2
๐‘ฆ2 = ๐‘๐‘ฅ1 + ๐‘‘๐‘ฅ2

or, more generally,

๐‘ฆ1 = ๐‘Ž11 ๐‘ฅ1 + ๐‘Ž12 ๐‘ฅ2 + โ‹ฏ + ๐‘Ž1๐‘˜ ๐‘ฅ๐‘˜


โ‹ฎ (1)
๐‘ฆ๐‘› = ๐‘Ž๐‘›1 ๐‘ฅ1 + ๐‘Ž๐‘›2 ๐‘ฅ2 + โ‹ฏ + ๐‘Ž๐‘›๐‘˜ ๐‘ฅ๐‘˜

The objective here is to solve for the โ€œunknownsโ€ ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ given ๐‘Ž11 , โ€ฆ , ๐‘Ž๐‘›๐‘˜ and ๐‘ฆ1 , โ€ฆ , ๐‘ฆ๐‘›


When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions

โ€ข Does a solution actually exist?


โ€ข Are there in fact many solutions, and if so how should we interpret them?
โ€ข If no solution exists, is there a best โ€œapproximateโ€ solution?
โ€ข If a solution exists, how should we compute it?

These are the kinds of topics addressed by linear algebra


In this lecture we will cover the basics of linear and matrix algebra, treating both theory and
computation
We admit some overlap with this lecture, where operations on NumPy arrays were first ex-
plained
Note that this lecture is more theoretical than most, and contains background material that
will be used in applications as we go along

21.3 Vectors

A vector of length ๐‘› is just a sequence (or array, or tuple) of ๐‘› numbers, which we write as
๐‘ฅ = (๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› ) or ๐‘ฅ = [๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distin-
guish between the two)
The set of all ๐‘›-vectors is denoted by R๐‘›
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner

In [1]: import matplotlib.pyplot as plt


%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.7,
width=0.5))
ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()

21.3.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we
now describe
As a matter of definition, when we add two vectors, we add them element-by-element

๐‘ฅ1 ๐‘ฆ1 ๐‘ฅ1 + ๐‘ฆ1
โŽก๐‘ฅ โŽค โŽก๐‘ฆ โŽค โŽก๐‘ฅ + ๐‘ฆ โŽค
๐‘ฅ + ๐‘ฆ = โŽข 2 โŽฅ + โŽข 2 โŽฅ โˆถ= โŽข 2 2โŽฅ
โŽข โ‹ฎ โŽฅ โŽข โ‹ฎ โŽฅ โŽข โ‹ฎ โŽฅ
๐‘ฅ
โŽฃ ๐‘›โŽฆ โŽฃ ๐‘›โŽฆ๐‘ฆ ๐‘ฅ
โŽฃ ๐‘› + ๐‘ฆ ๐‘›โŽฆ

Scalar multiplication is an operation that takes a number ๐›พ and a vector ๐‘ฅ and produces

๐›พ๐‘ฅ1
โŽก ๐›พ๐‘ฅ โŽค
๐›พ๐‘ฅ โˆถ= โŽข 2 โŽฅ
โŽข โ‹ฎ โŽฅ
โŽฃ๐›พ๐‘ฅ๐‘› โŽฆ

Scalar multiplication is illustrated in the next figure

In [2]: import numpy as np

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin

for spine in ['left', 'bottom']:


ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=1,
width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is
more commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural
syntax

In [3]: x = np.ones(3) # Vector of three ones


y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

Out[3]: array([3., 5., 7.])

In [4]: 4 * x

Out[4]: array([4., 4., 4.])

21.3.2 Inner Product and Norm

The inner product of vectors ๐‘ฅ, ๐‘ฆ โˆˆ R๐‘› is defined as

๐‘›
๐‘ฅโ€ฒ ๐‘ฆ โˆถ= โˆ‘ ๐‘ฅ๐‘– ๐‘ฆ๐‘–
๐‘–=1

Two vectors are called orthogonal if their inner product is zero


The norm of a vector ๐‘ฅ represents its โ€œlengthโ€ (i.e., its distance from the zero vector) and is
defined as

                  ๐‘›        1/2
โ€–๐‘ฅโ€– โˆถ= โˆš๐‘ฅโ€ฒ ๐‘ฅ โˆถ= ( โˆ‘ ๐‘ฅ2๐‘– )
                 ๐‘–=1

The expression โ€–๐‘ฅ โˆ’ ๐‘ฆโ€– is thought of as the distance between ๐‘ฅ and ๐‘ฆ


Continuing on from the previous example, the inner product and norm can be computed as
follows

In [5]: np.sum(x * y) # Inner product of x and y

Out[5]: 12.0

In [6]: np.sqrt(np.sum(x**2)) # Norm of x, take one

Out[6]: 1.7320508075688772

In [7]: np.linalg.norm(x) # Norm of x, take two

Out[7]: 1.7320508075688772

21.3.3 Span

Given a set of vectors ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } in R๐‘› , itโ€™s natural to think about the new vectors we
can create by performing linear operations
New vectors created in this manner are called linear combinations of ๐ด
In particular, ๐‘ฆ โˆˆ R๐‘› is a linear combination of ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } if

๐‘ฆ = ๐›ฝ1 ๐‘Ž1 + โ‹ฏ + ๐›ฝ๐‘˜ ๐‘Ž๐‘˜ for some scalars ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜



In this context, the values ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜ are called the coefficients of the linear combination
The set of linear combinations of ๐ด is called the span of ๐ด
The next figure shows the span of ๐ด = {๐‘Ž1 , ๐‘Ž2 } in R3
The span is a two-dimensional plane passing through these two points and the origin

In [8]: from matplotlib import cm


from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import interp2d

fig = plt.figure(figsize=(10, 8))


ax = fig.gca(projection='3d')

x_min, x_max = -5, 5


y_min, y_max = -5, 5

ฮฑ, ฮฒ = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),


xticks=(0,), yticks=(0,), zticks=(0,))

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

# Fixed linear function, to generate a plane


def f(x, y):
return ฮฑ * x + ฮฒ * y

# Vector locations, by coordinate


x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)

# Draw the plane


grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
linewidth=0, antialiased=True, alpha=0.2)
plt.show()

Examples
If ๐ด contains only one vector ๐‘Ž1 โˆˆ R2 , then its span is just the scalar multiples of ๐‘Ž1 , which is
the unique line passing through both ๐‘Ž1 and the origin
If ๐ด = {๐‘’1 , ๐‘’2 , ๐‘’3 } consists of the canonical basis vectors of R3 , that is

       โŽก1โŽค         โŽก0โŽค         โŽก0โŽค
๐‘’1 โˆถ= โŽข0โŽฅ , ๐‘’2 โˆถ= โŽข1โŽฅ , ๐‘’3 โˆถ= โŽข0โŽฅ
       โŽฃ0โŽฆ         โŽฃ0โŽฆ         โŽฃ1โŽฆ

then the span of ๐ด is all of R3 , because, for any ๐‘ฅ = (๐‘ฅ1 , ๐‘ฅ2 , ๐‘ฅ3 ) โˆˆ R3 , we can write

๐‘ฅ = ๐‘ฅ 1 ๐‘’1 + ๐‘ฅ 2 ๐‘’2 + ๐‘ฅ 3 ๐‘’3

Now consider ๐ด0 = {๐‘’1 , ๐‘’2 , ๐‘’1 + ๐‘’2 }


If ๐‘ฆ = (๐‘ฆ1 , ๐‘ฆ2 , ๐‘ฆ3 ) is any linear combination of these vectors, then ๐‘ฆ3 = 0 (check it)
Hence ๐ด0 fails to span all of R3

21.3.4 Linear Independence

As weโ€™ll see, itโ€™s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors

The condition we need for a set of vectors to have a large span is whatโ€™s called linear inde-
pendence
In particular, a collection of vectors ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } in R๐‘› is said to be

โ€ข linearly dependent if some strict subset of ๐ด has the same span as ๐ด


โ€ข linearly independent if it is not linearly dependent

Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {๐‘Ž1 , ๐‘Ž2 } in R3 as a
plane through the origin
If we take a third vector ๐‘Ž3 and form the set {๐‘Ž1 , ๐‘Ž2 , ๐‘Ž3 }, this set will be

โ€ข linearly dependent if ๐‘Ž3 lies in the plane


โ€ข linearly independent otherwise

As another illustration of the concept, since R๐‘› can be spanned by ๐‘› vectors (see the discus-
sion of canonical basis vectors above), any collection of ๐‘š > ๐‘› vectors in R๐‘› must be linearly
dependent
The following statements are equivalent to linear independence of ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } โŠ‚ R๐‘›

1. No vector in ๐ด can be formed as a linear combination of the other elements


2. If ๐›ฝ1 ๐‘Ž1 + โ‹ฏ ๐›ฝ๐‘˜ ๐‘Ž๐‘˜ = 0 for scalars ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜ , then ๐›ฝ1 = โ‹ฏ = ๐›ฝ๐‘˜ = 0

(The zero in the first expression is the origin of R๐‘› )
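One practical way to test for linear independence (this NumPy-based check is our own addition, not the lecture's code) is to stack the vectors as columns of a matrix and compute its rank: the columns are linearly independent exactly when the rank equals the number of columns.

```python
import numpy as np

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.0, 1.0, 0.0])
a3 = a1 + a2                      # lies in the span of {a1, a2}

A_indep = np.column_stack([a1, a2])        # 3 x 2, independent columns
A_dep = np.column_stack([a1, a2, a3])      # 3 x 3, dependent columns

print(np.linalg.matrix_rank(A_indep))
print(np.linalg.matrix_rank(A_dep))
```

The second matrix has rank 2 with 3 columns, confirming that {๐‘Ž1 , ๐‘Ž2 , ๐‘Ž3 } is linearly dependent.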

21.3.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors
In other words, if ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } โŠ‚ R๐‘› is linearly independent and

๐‘ฆ = ๐›ฝ 1 ๐‘Ž1 + โ‹ฏ ๐›ฝ ๐‘˜ ๐‘Ž๐‘˜

then no other coefficient sequence ๐›พ1 , โ€ฆ , ๐›พ๐‘˜ will produce the same vector ๐‘ฆ


Indeed, if we also have ๐‘ฆ = ๐›พ1 ๐‘Ž1 + โ‹ฏ ๐›พ๐‘˜ ๐‘Ž๐‘˜ , then

(๐›ฝ1 โˆ’ ๐›พ1 )๐‘Ž1 + โ‹ฏ + (๐›ฝ๐‘˜ โˆ’ ๐›พ๐‘˜ )๐‘Ž๐‘˜ = 0

Linear independence now implies ๐›พ๐‘– = ๐›ฝ๐‘– for all ๐‘–

21.4 Matrices

Matrices are a neat way of organizing data for use in linear operations

An ๐‘› ร— ๐‘˜ matrix is a rectangular array ๐ด of numbers with ๐‘› rows and ๐‘˜ columns:

๐‘Ž11 ๐‘Ž12 โ‹ฏ ๐‘Ž1๐‘˜


โŽก๐‘Ž ๐‘Ž22 โ‹ฏ ๐‘Ž2๐‘˜ โŽค
๐ด = โŽข 21 โŽฅ
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽฅ
โŽฃ๐‘Ž๐‘›1 ๐‘Ž๐‘›2 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ
Often, the numbers in the matrix represent coefficients in a system of linear equations, as dis-
cussed at the start of this lecture
For obvious reasons, the matrix ๐ด is also called a vector if either ๐‘› = 1 or ๐‘˜ = 1
In the former case, ๐ด is called a row vector, while in the latter it is called a column vector
If ๐‘› = ๐‘˜, then ๐ด is called square
The matrix formed by replacing ๐‘Ž๐‘–๐‘— by ๐‘Ž๐‘—๐‘– for every ๐‘– and ๐‘— is called the transpose of ๐ด and
denoted ๐ดโ€ฒ or ๐ดโŠค
If ๐ด = ๐ดโ€ฒ , then ๐ด is called symmetric
For a square matrix ๐ด, the ๐‘› elements of the form ๐‘Ž๐‘–๐‘– for ๐‘– = 1, โ€ฆ , ๐‘› are called the principal diagonal
๐ด is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
๐ด is called the identity matrix and denoted by ๐ผ

21.4.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐›พ๐‘Ž11 โ‹ฏ ๐›พ๐‘Ž1๐‘˜


๐›พ๐ด = ๐›พ โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค โˆถ= โŽก โ‹ฎ
โŽฅ โŽข โ‹ฎ โ‹ฎ โŽคโŽฅ
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐›พ๐‘Ž๐‘›1 โ‹ฏ ๐›พ๐‘Ž๐‘›๐‘˜ โŽฆ
and

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐‘11 โ‹ฏ ๐‘1๐‘˜ ๐‘Ž11 + ๐‘11 โ‹ฏ ๐‘Ž1๐‘˜ + ๐‘1๐‘˜


๐ด+๐ต = โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค+โŽก โ‹ฎ
โŽฅ โŽข โ‹ฎ โ‹ฎ โŽค โˆถ= โŽก
โŽฅ โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค
โŽฅ
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐‘๐‘›1 โ‹ฏ ๐‘๐‘›๐‘˜ โŽฆ โŽฃ๐‘Ž๐‘›1 + ๐‘๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ + ๐‘๐‘›๐‘˜ โŽฆ
In the latter case, the matrices must have the same shape in order for the definition to make
sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations
If ๐ด and ๐ต are two matrices, then their product ๐ด๐ต is formed by taking as its ๐‘–, ๐‘—-th element
the inner product of the ๐‘–-th row of ๐ด and the ๐‘—-th column of ๐ต
There are many tutorials to help you visualize this operation, such as this one, or the discus-
sion on the Wikipedia page

If ๐ด is ๐‘› ร— ๐‘˜ and ๐ต is ๐‘— ร— ๐‘š, then to multiply ๐ด and ๐ต we require ๐‘˜ = ๐‘—, and the resulting


matrix ๐ด๐ต is ๐‘› ร— ๐‘š
As perhaps the most important special case, consider multiplying ๐‘› ร— ๐‘˜ matrix ๐ด and ๐‘˜ ร— 1
column vector ๐‘ฅ
According to the preceding rule, this gives us an ๐‘› ร— 1 column vector

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐‘ฅ1 ๐‘Ž11 ๐‘ฅ1 + โ‹ฏ + ๐‘Ž1๐‘˜ ๐‘ฅ๐‘˜


๐ด๐‘ฅ = โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค โŽก โ‹ฎ โŽค โˆถ= โŽก
โŽฅโŽข โŽฅ โŽข โ‹ฎ โŽค
โŽฅ (2)
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐‘ฅ๐‘˜ โŽฆ โŽฃ๐‘Ž๐‘›1 ๐‘ฅ1 + โ‹ฏ + ๐‘Ž๐‘›๐‘˜ ๐‘ฅ๐‘˜ โŽฆ

Note
๐ด๐ต and ๐ต๐ด are not generally the same thing

Another important special case is the identity matrix


You should check that if ๐ด is ๐‘› ร— ๐‘˜ and ๐ผ is the ๐‘˜ ร— ๐‘˜ identity matrix, then ๐ด๐ผ = ๐ด
If ๐ผ is the ๐‘› ร— ๐‘› identity matrix, then ๐ผ๐ด = ๐ด

21.4.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations [1]
You can create them manually from tuples of tuples (or lists of lists) as follows

In [9]: A = ((1, 2),


(3, 4))

type(A)

Out[9]: tuple

In [10]: A = np.array(A)

type(A)

Out[10]: numpy.ndarray

In [11]: A.shape

Out[11]: (2, 2)

The shape attribute is a tuple giving the number of rows and columns โ€” see here for more
discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) โ€” see here
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax

In [12]: A = np.identity(3)
B = np.ones((3, 3))
2 * A

Out[12]: array([[2., 0., 0.],


[0., 2., 0.],
[0., 0., 2.]])

In [13]: A + B

Out[13]: array([[2., 1., 1.],


[1., 2., 1.],
[1., 1., 2.]])

To multiply matrices we use the @ symbol


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multipli-
cation
See here for more discussion
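A small example (our own, with arbitrary numbers) makes the difference concrete:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # matrix product: rows of A times columns of B
print(A * B)   # element-by-element product
```

Note that `A @ B` here swaps the columns of `A`, while `A * B` just zeroes out the entries where `B` is zero.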

21.4.3 Matrices as Maps

Each ๐‘› ร— ๐‘˜ matrix ๐ด can be identified with a function ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ that maps ๐‘ฅ โˆˆ R๐‘˜ into
๐‘ฆ = ๐ด๐‘ฅ โˆˆ R๐‘›
These kinds of functions have a special property: they are linear
A function ๐‘“ โˆถ R๐‘˜ โ†’ R๐‘› is called linear if, for all ๐‘ฅ, ๐‘ฆ โˆˆ R๐‘˜ and all scalars ๐›ผ, ๐›ฝ, we have

๐‘“(๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ) = ๐›ผ๐‘“(๐‘ฅ) + ๐›ฝ๐‘“(๐‘ฆ)

You can check that this holds for the function ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ + ๐‘ when ๐‘ is the zero vector and
fails when ๐‘ is nonzero
In fact, itโ€™s known that ๐‘“ is linear if and only if there exists a matrix ๐ด such that ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ
for all ๐‘ฅ

21.5 Solving Systems of Equations

Recall again the system of equations Eq. (1)


If we compare Eq. (1) and Eq. (2), we see that Eq. (1) can now be written more conveniently
as

๐‘ฆ = ๐ด๐‘ฅ (3)

The problem we face is to determine a vector ๐‘ฅ โˆˆ R๐‘˜ that solves Eq. (3), taking ๐‘ฆ and ๐ด as
given
This is a special case of a more general problem: Find an ๐‘ฅ such that ๐‘ฆ = ๐‘“(๐‘ฅ)
Given an arbitrary function ๐‘“ and a ๐‘ฆ, is there always an ๐‘ฅ such that ๐‘ฆ = ๐‘“(๐‘ฅ)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

In [14]: def f(x):


return 0.6 * np.cos(4 * x) + 1.4

xmin, xmax = -1, 1


x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),


yticks=(), xticks=())

ax.plot(x, y, 'k-', lw=2, label='$f$')


ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
ax.text(0.04, -0.3, '$0$', fontsize=16)

ax = axes[0]

ax.legend(loc='upper right', frameon=False)


ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()

In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since ๐‘ฆ lies outside the range of ๐‘“
Can we impose conditions on ๐ด in Eq. (3) that rule out these problems?
In this context, the most important thing to recognize about the expression ๐ด๐‘ฅ is that it cor-
responds to a linear combination of the columns of ๐ด
In particular, if ๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ are the columns of ๐ด, then

๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + โ‹ฏ + ๐‘ฅ๐‘˜ ๐‘Ž๐‘˜

Hence the range of ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ is exactly the span of the columns of ๐ด


We want the range to be large so that it contains arbitrary ๐‘ฆ
As you might recall, the condition that we want for the span to be large is linear indepen-
dence
A happy fact is that linear independence of the columns of ๐ด also gives us uniqueness
Indeed, it follows from our earlier discussion that if {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } are linearly independent and
๐‘ฆ = ๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + โ‹ฏ + ๐‘ฅ๐‘˜ ๐‘Ž๐‘˜ , then no ๐‘ง โ‰  ๐‘ฅ satisfies ๐‘ฆ = ๐ด๐‘ง

21.5.1 The Square Matrix Case

Letโ€™s discuss some more details, starting with the case where ๐ด is ๐‘› ร— ๐‘›
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary ๐‘ฆ โˆˆ R๐‘› , we hope to find a unique ๐‘ฅ โˆˆ R๐‘› such that ๐‘ฆ = ๐ด๐‘ฅ
In view of the observations immediately above, if the columns of ๐ด are linearly independent,
then their span, and hence the range of ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ, is all of R๐‘›
Hence there always exists an ๐‘ฅ such that ๐‘ฆ = ๐ด๐‘ฅ
Moreover, the solution is unique
In particular, the following are equivalent

1. The columns of ๐ด are linearly independent


2. For any ๐‘ฆ โˆˆ R๐‘› , the equation ๐‘ฆ = ๐ด๐‘ฅ has a unique solution

The property of having linearly independent columns is sometimes expressed as having full
column rank
Inverse Matrices
Can we give some sort of expression for the solution?
If ๐‘ฆ and ๐ด are scalar with ๐ด โ‰  0, then the solution is ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ
A similar expression is available in the matrix case
In particular, if square matrix ๐ด has full column rank, then it possesses a multiplicative in-
verse matrix ๐ดโˆ’1 , with the property that ๐ด๐ดโˆ’1 = ๐ดโˆ’1 ๐ด = ๐ผ
As a consequence, if we pre-multiply both sides of ๐‘ฆ = ๐ด๐‘ฅ by ๐ดโˆ’1 , we get ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ
This is the solution that weโ€™re looking for
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix โ€” you can find the expression for it here
If the determinant of ๐ด is not zero, then we say that ๐ด is nonsingular
Perhaps the most important fact about determinants is that ๐ด is nonsingular if and only if ๐ด
is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted

21.5.2 More Rows than Columns

This is the ๐‘› ร— ๐‘˜ case with ๐‘› > ๐‘˜


This case is very important in many settings, not least in the setting of linear regression
(where ๐‘› is the number of observations, and ๐‘˜ is the number of explanatory variables)
Given arbitrary ๐‘ฆ โˆˆ R๐‘› , we seek an ๐‘ฅ โˆˆ R๐‘˜ such that ๐‘ฆ = ๐ด๐‘ฅ
In this setting, the existence of a solution is highly unlikely

Without much loss of generality, letโ€™s go over the intuition focusing on the case where the
columns of ๐ด are linearly independent
It follows that the span of the columns of ๐ด is a ๐‘˜-dimensional subspace of R๐‘›
This span is very โ€œunlikelyโ€ to contain arbitrary ๐‘ฆ โˆˆ R๐‘›
To see why, recall the figure above, where ๐‘˜ = 2 and ๐‘› = 3
Imagine an arbitrarily chosen ๐‘ฆ โˆˆ R3 , located somewhere in that three-dimensional space
Whatโ€™s the likelihood that ๐‘ฆ lies in the span of {๐‘Ž1 , ๐‘Ž2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero โ€œthicknessโ€
As a result, in the ๐‘› > ๐‘˜ case we usually give up on existence
However, we can still seek the best approximation, for example, an ๐‘ฅ that makes the distance
โ€–๐‘ฆ โˆ’ ๐ด๐‘ฅโ€– as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be ๐‘ฅฬ‚ = (๐ดโ€ฒ ๐ด)โˆ’1 ๐ดโ€ฒ ๐‘ฆ โ€” see for example chapter 3 of these notes

21.5.3 More Columns than Rows

This is the ๐‘› ร— ๐‘˜ case with ๐‘› < ๐‘˜, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many โ€” in other words, uniqueness
never holds
For example, consider the case where ๐‘˜ = 3 and ๐‘› = 2
Thus, the columns of ๐ด consist of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span
R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, letโ€™s say that ๐‘Ž1 = ๐›ผ๐‘Ž2 + ๐›ฝ๐‘Ž3
Then if ๐‘ฆ = ๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + ๐‘ฅ2 ๐‘Ž2 + ๐‘ฅ3 ๐‘Ž3 , we can also write

๐‘ฆ = ๐‘ฅ1 (๐›ผ๐‘Ž2 + ๐›ฝ๐‘Ž3 ) + ๐‘ฅ2 ๐‘Ž2 + ๐‘ฅ3 ๐‘Ž3 = (๐‘ฅ1 ๐›ผ + ๐‘ฅ2 )๐‘Ž2 + (๐‘ฅ1 ๐›ฝ + ๐‘ฅ3 )๐‘Ž3

In other words, uniqueness fails
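As a quick numerical illustration (with a matrix and null-space vector chosen purely for the example), the sketch below exhibits two distinct solutions to the same underdetermined system:

```python
import numpy as np

# An underdetermined system: 2 equations, 3 unknowns
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y = np.array([6.0, 15.0])

# One particular solution: the minimum-norm solution returned by lstsq
x1, *_ = np.linalg.lstsq(A, y, rcond=None)

# Adding any element of the null space of A gives another solution
null_vec = np.array([1.0, -2.0, 1.0])  # A @ null_vec = 0 for this A
x2 = x1 + null_vec

print(np.allclose(A @ x1, y), np.allclose(A @ x2, y))  # True True
```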

21.5.4 Linear Equations with SciPy

Hereโ€™s an illustration of how to solve linear equations with SciPyโ€™s linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code

In [15]: from scipy.linalg import inv, solve, det

A = ((1, 2), (3, 4))
A = np.array(A)
y = np.ones((2, 1)) # Column vector
det(A) # Check that A is nonsingular, and hence invertible

Out[15]: -2.0

In [16]: A_inv = inv(A) # Compute the inverse


A_inv

Out[16]: array([[-2. , 1. ],
[ 1.5, -0.5]])

In [17]: x = A_inv @ y # Solution


A @ x # Should equal y

Out[17]: array([[1.],
[1.]])

In [18]: solve(A, y) # Produces the same solution

Out[18]: array([[-1.],
[ 1.]])

Observe how we can solve for ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ either via inv(A) @ y or via solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred
To obtain the least-squares solution ๐‘ฅฬ‚ = (๐ดโ€ฒ ๐ด)โˆ’1 ๐ดโ€ฒ ๐‘ฆ, use scipy.linalg.lstsq(A, y)
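As a sketch of this (with illustrative random data), the normal-equations formula and scipy.linalg.lstsq should agree in a well-conditioned overdetermined problem:

```python
import numpy as np
from scipy.linalg import inv, lstsq

np.random.seed(0)
n, k = 50, 3                      # more rows than columns
A = np.random.randn(n, k)
y = np.random.randn(n)

# Normal-equations formula for the least-squares solution
x_hat = inv(A.T @ A) @ A.T @ y

# SciPy's dedicated routine, which is numerically preferable
x_ls, *_ = lstsq(A, y)

print(np.allclose(x_hat, x_ls))  # True
```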

21.6 Eigenvalues and Eigenvectors

Let ๐ด be an ๐‘› ร— ๐‘› square matrix


If ๐œ† is scalar and ๐‘ฃ is a non-zero vector in R๐‘› such that

๐ด๐‘ฃ = ๐œ†๐‘ฃ

then we say that ๐œ† is an eigenvalue of ๐ด, and ๐‘ฃ is an eigenvector


Thus, an eigenvector of ๐ด is a vector such that when the map ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ is applied, ๐‘ฃ is
merely scaled
The next figure shows two eigenvectors (blue arrows) and their images under ๐ด (red arrows)
As expected, the image ๐ด๐‘ฃ of each ๐‘ฃ is just a scaled version of the original

In [19]: from scipy.linalg import eig

A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)

xmin, xmax = -3, 3
ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

# Plot each eigenvector
for v in evecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the image of each eigenvector
for v in evecs:
    v = A @ v
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the lines they run through
x = np.linspace(xmin, xmax, 3)
for v in evecs:
    a = v[1] / v[0]
    ax.plot(x, a * x, 'b-', lw=0.4)

plt.show()

The eigenvalue equation is equivalent to (๐ด โˆ’ ๐œ†๐ผ)๐‘ฃ = 0, and this has a nonzero solution ๐‘ฃ only
when the columns of ๐ด โˆ’ ๐œ†๐ผ are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for ๐œ† such that the determinant of ๐ด โˆ’ ๐œ†๐ผ is zero
This problem can be expressed as one of solving for the roots of a polynomial in ๐œ† of degree ๐‘›
This in turn implies the existence of ๐‘› solutions in the complex plane, although some might
be repeated
Some nice facts about the eigenvalues of a square matrix ๐ด are as follows

1. The determinant of ๐ด equals the product of the eigenvalues


2. The trace of ๐ด (the sum of the elements on the principal diagonal) equals the sum of
the eigenvalues
3. If ๐ด is symmetric, then all of its eigenvalues are real
4. If ๐ด is invertible and ๐œ†1 , โ€ฆ , ๐œ†๐‘› are its eigenvalues, then the eigenvalues of ๐ดโˆ’1 are
1/๐œ†1 , โ€ฆ , 1/๐œ†๐‘›

A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows

In [20]: A = ((1, 2),


(2, 1))

A = np.array(A)
evals, evecs = eig(A)
evals

Out[20]: array([ 3.+0.j, -1.+0.j])

In [21]: evecs

Out[21]: array([[ 0.70710678, -0.70710678],


[ 0.70710678, 0.70710678]])

Note that the columns of evecs are the eigenvectors


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check
it), the eig routine normalizes the length of each eigenvector to one
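A quick numerical check of facts 1 and 2 above, and of the unit-length normalization, using the same illustrative matrix:

```python
import numpy as np
from scipy.linalg import eig, det

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
evals, evecs = eig(A)

# Fact 1: the determinant equals the product of the eigenvalues
print(np.isclose(det(A), np.prod(evals).real))        # True

# Fact 2: the trace equals the sum of the eigenvalues
print(np.isclose(np.trace(A), np.sum(evals).real))    # True

# Each eigenvector returned by eig has unit length
print(np.allclose(np.linalg.norm(evecs, axis=0), 1))  # True
```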

21.6.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces ๐ด and ๐ต, seeks generalized eigenvalues ๐œ† and eigenvectors ๐‘ฃ such that

๐ด๐‘ฃ = ๐œ†๐ต๐‘ฃ

This can be solved in SciPy via scipy.linalg.eig(A, B)


Of course, if ๐ต is square and invertible, then we can treat the generalized eigenvalue problem
as an ordinary eigenvalue problem ๐ตโˆ’1 ๐ด๐‘ฃ = ๐œ†๐‘ฃ, but this is not always the case
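As a sketch of this equivalence (with an illustrative invertible ๐ต of our choosing), we can compare scipy.linalg.eig(A, B) with the ordinary eigenvalues of ๐ตโˆ’1 ๐ด:

```python
import numpy as np
from scipy.linalg import eig, inv

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # square and invertible

# Generalized eigenvalues solving A v = ฮป B v
gen_vals = eig(A, B, right=False)

# Ordinary eigenvalues of B^{-1} A -- these should coincide
ord_vals = eig(inv(B) @ A, right=False)

print(np.allclose(np.sort(gen_vals), np.sort(ord_vals)))  # True
```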

21.7 Further Topics

We round out our discussion by briefly mentioning several other important topics

21.7.1 Series Expansions

Recall the usual summation formula for a geometric progression, which states that if |๐‘Ž| < 1,
then โˆ‘โˆž๐‘˜=0 ๐‘Ž๐‘˜ = (1 โˆ’ ๐‘Ž)โˆ’1
A generalization of this idea exists in the matrix setting
Matrix Norms
Let ๐ด be a square matrix, and let

โ€–๐ดโ€– โˆถ= max โ€–๐ด๐‘ฅโ€–


โ€–๐‘ฅโ€–=1

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm โ€” in this case, the so-called spectral norm
For example, for a square matrix ๐‘†, the condition โ€–๐‘†โ€– < 1 means that ๐‘† is contractive, in the
sense that it pulls all vectors towards the origin [2]
Neumannโ€™s Theorem
Let ๐ด be a square matrix and let ๐ด๐‘˜ โˆถ= ๐ด๐ด๐‘˜โˆ’1 with ๐ด1 โˆถ= ๐ด
In other words, ๐ด๐‘˜ is the ๐‘˜-th power of ๐ด
Neumannโ€™s theorem states the following: If โ€–๐ด๐‘˜ โ€– < 1 for some ๐‘˜ โˆˆ N, then ๐ผ โˆ’ ๐ด is invertible,
and

(๐ผ โˆ’ ๐ด)โˆ’1 = โˆ‘โˆž๐‘˜=0 ๐ด๐‘˜ (4)

Spectral Radius
A result known as Gelfandโ€™s formula tells us that, for any square matrix ๐ด,

๐œŒ(๐ด) = lim โ€–๐ด๐‘˜ โ€–1/๐‘˜


๐‘˜โ†’โˆž

Here ๐œŒ(๐ด) is the spectral radius, defined as max๐‘– |๐œ†๐‘– |, where {๐œ†๐‘– }๐‘– is the set of eigenvalues of ๐ด
As a consequence of Gelfandโ€™s formula, if all eigenvalues are strictly less than one in modulus,
there exists a ๐‘˜ with โ€–๐ด๐‘˜ โ€– < 1
In which case Eq. (4) is valid
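Here is a small numerical sketch of the Neumann series, with an illustrative matrix whose spectral radius is below one; the partial sums converge to (๐ผ โˆ’ ๐ด)โˆ’1:

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# All eigenvalues are inside the unit circle here
rho = max(abs(np.linalg.eigvals(A)))
print(rho < 1)  # True

# Partial sums of the Neumann series I + A + A^2 + ...
S = np.zeros((2, 2))
term = np.eye(2)
for _ in range(100):
    S = S + term
    term = term @ A

print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))  # True
```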

21.7.2 Positive Definite Matrices

Let ๐ด be a symmetric ๐‘› ร— ๐‘› matrix


We say that ๐ด is

1. positive definite if ๐‘ฅโ€ฒ ๐ด๐‘ฅ > 0 for every nonzero ๐‘ฅ โˆˆ R๐‘›


2. positive semi-definite or nonnegative definite if ๐‘ฅโ€ฒ ๐ด๐‘ฅ โ‰ฅ 0 for every ๐‘ฅ โˆˆ R๐‘›

Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if ๐ด is positive definite, then all of its eigenvalues are strictly positive, and
hence ๐ด is invertible (with positive definite inverse)

21.7.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let

โ€ข ๐‘ง, ๐‘ฅ and ๐‘Ž all be ๐‘› ร— 1 vectors


โ€ข ๐ด be an ๐‘› ร— ๐‘› matrix
โ€ข ๐ต be an ๐‘š ร— ๐‘› matrix and ๐‘ฆ be an ๐‘š ร— 1 vector

Then

๐œ•๐‘Žโ€ฒ ๐‘ฅ
1. ๐œ•๐‘ฅ = ๐‘Ž
๐œ•๐ด๐‘ฅ โ€ฒ
2. ๐œ•๐‘ฅ = ๐ด
โ€ฒ
๐œ•๐‘ฅ ๐ด๐‘ฅ
3. ๐œ•๐‘ฅ = (๐ด + ๐ดโ€ฒ )๐‘ฅ
๐œ•๐‘ฆโ€ฒ ๐ต๐‘ง
4. ๐œ•๐‘ฆ = ๐ต๐‘ง
๐œ•๐‘ฆโ€ฒ ๐ต๐‘ง โ€ฒ
5. ๐œ•๐ต = ๐‘ฆ๐‘ง

Exercise 1 below asks you to apply these formulas
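Formula 3 above can also be spot-checked numerically against finite differences; the sketch below uses illustrative random data:

```python
import numpy as np

np.random.seed(1)
n = 4
A = np.random.randn(n, n)
x = np.random.randn(n)

f = lambda v: v @ A @ v          # the quadratic form x'Ax
grad_formula = (A + A.T) @ x     # formula 3 above

# Central finite differences as an independent check
h = 1e-6
grad_fd = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = h
    grad_fd[i] = (f(x + e) - f(x - e)) / (2 * h)

print(np.allclose(grad_formula, grad_fd, atol=1e-4))  # True
```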

21.7.4 Further Reading

The documentation of the scipy.linalg submodule can be found here


Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the
same lines as above, with solved exercises
If you donโ€™t mind a slightly abstract approach, a nice intermediate-level text on linear algebra
is [69]

21.8 Exercises

21.8.1 Exercise 1

Let ๐‘ฅ be a given ๐‘› ร— 1 vector and consider the problem

๐‘ฃ(๐‘ฅ) = max {โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ฆ,๐‘ข

subject to the linear constraint

๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข

Here

โ€ข ๐‘ƒ is an ๐‘› ร— ๐‘› matrix and ๐‘„ is an ๐‘š ร— ๐‘š matrix


โ€ข ๐ด is an ๐‘› ร— ๐‘› matrix and ๐ต is an ๐‘› ร— ๐‘š matrix
โ€ข both ๐‘ƒ and ๐‘„ are symmetric and positive semidefinite

(What must the dimensions of ๐‘ฆ and ๐‘ข be to make this a well-posed problem?)


One way to solve the problem is to form the Lagrangian

โ„’ = โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข + ๐œ†โ€ฒ [๐ด๐‘ฅ + ๐ต๐‘ข โˆ’ ๐‘ฆ]

where ๐œ† is an ๐‘› ร— 1 vector of Lagrange multipliers


Try applying the formulas given above for differentiating quadratic and linear forms to ob-
tain the first-order conditions for maximizing โ„’ with respect to ๐‘ฆ, ๐‘ข and minimizing it with
respect to ๐œ†
Show that these conditions imply that

1. ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ
2. The optimizing choice of ๐‘ข satisfies ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
3. The function ๐‘ฃ satisfies ๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ where ๐‘ƒ ฬƒ = ๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด

As we will see, in economic contexts Lagrange multipliers often are shadow prices

Note
If we donโ€™t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’
๐‘ขโ€ฒ ๐‘„๐‘ข with respect to ๐‘ข. You can verify that this leads to the same maximizer.

21.9 Solutions

21.9.1 Solution to Exercise 1

We have an optimization problem:

๐‘ฃ(๐‘ฅ) = max{โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ฆ,๐‘ข

s.t.

๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข

with primitives

โ€ข ๐‘ƒ be a symmetric and positive semidefinite ๐‘› ร— ๐‘› matrix


โ€ข ๐‘„ be a symmetric and positive semidefinite ๐‘š ร— ๐‘š matrix
โ€ข ๐ด an ๐‘› ร— ๐‘› matrix
โ€ข ๐ต an ๐‘› ร— ๐‘š matrix

The associated Lagrangian is :

๐ฟ = โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข + ๐œ†โ€ฒ [๐ด๐‘ฅ + ๐ต๐‘ข โˆ’ ๐‘ฆ]

1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields

๐œ•๐ฟ
= โˆ’(๐‘ƒ + ๐‘ƒ โ€ฒ )๐‘ฆ โˆ’ ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ โˆ’ ๐œ† = 0 ,
๐œ•๐‘ฆ

since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies

๐œ† = โˆ’2๐‘ƒ ๐‘ฆ

2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields

๐œ•๐ฟ
= โˆ’(๐‘„ + ๐‘„โ€ฒ )๐‘ข โˆ’ ๐ตโ€ฒ ๐œ† = โˆ’2๐‘„๐‘ข + ๐ตโ€ฒ ๐œ† = 0
๐œ•๐‘ข
Substituting ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ gives

๐‘„๐‘ข + ๐ตโ€ฒ ๐‘ƒ ๐‘ฆ = 0

Substituting the linear constraint ๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข into above equation gives

๐‘„๐‘ข + ๐ตโ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) = 0

(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข + ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ = 0

which is the first-order condition for maximizing L w.r.t. u


Thus, the optimal choice of u must satisfy

๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ ,

which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get

๐‘ฃ(๐‘ฅ) = max{โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ข

Since we know the optimal choice of u satisfies ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ, then

๐‘ฃ(๐‘ฅ) = โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข ๐‘ค๐‘–๐‘กโ„Ž ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ



To evaluate the function

๐‘ฃ(๐‘ฅ) = โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข


= โˆ’(๐‘ฅโ€ฒ ๐ดโ€ฒ + ๐‘ขโ€ฒ ๐ตโ€ฒ )๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข
= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต๐‘ข โˆ’ ๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ต๐‘ข โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข
= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ 2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข

For simplicity, denote ๐‘† โˆถ= (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด, so that ๐‘ข = โˆ’๐‘†๐‘ฅ


Regarding the second term โˆ’2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ,

โˆ’2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ = โˆ’2๐‘ฅโ€ฒ ๐‘† โ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
= 2๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ

Notice that the term (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 is symmetric as both P and Q are symmetric
Regarding the third term โˆ’๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข,

โˆ’๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข = โˆ’๐‘ฅโ€ฒ ๐‘† โ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘†๐‘ฅ


= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ

Hence, the summation of second and third terms is ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ


This implies that

๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ 2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข


= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ + ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
= โˆ’๐‘ฅโ€ฒ [๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด]๐‘ฅ

Therefore, the solution to the optimization problem ๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ follows the above result by
denoting ๐‘ƒ ฬƒ โˆถ= ๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด
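As a numerical sanity check of this solution (with illustrative random primitives of our choosing), we can verify that the closed-form ๐‘ข attains the value โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ and satisfies the first-order condition:

```python
import numpy as np

np.random.seed(2)
n, m = 3, 2
# Random symmetric positive semidefinite P and Q
Z = np.random.randn(n, n); P = Z.T @ Z
W = np.random.randn(m, m); Q = W.T @ W
A = np.random.randn(n, n)
B = np.random.randn(n, m)
x = np.random.randn(n)

def objective(u):
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

# Closed-form maximizer and value function from the solution above
u_star = -np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A @ x)
P_tilde = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)

print(np.isclose(objective(u_star), -x @ P_tilde @ x))  # True

# First-order condition: the gradient vanishes at u_star
grad = -2 * (Q + B.T @ P @ B) @ u_star - 2 * B.T @ P @ A @ x
print(np.allclose(grad, 0))  # True
```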
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, itโ€™s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that โ€–๐‘†โ€– < 1. Take any nonzero vector ๐‘ฅ, and let ๐‘Ÿ โˆถ= โ€–๐‘ฅโ€–. We have โ€–๐‘†๐‘ฅโ€– =
๐‘Ÿโ€–๐‘†(๐‘ฅ/๐‘Ÿ)โ€– โ‰ค ๐‘Ÿโ€–๐‘†โ€– < ๐‘Ÿ = โ€–๐‘ฅโ€–. Hence every point is pulled towards the origin.
22

Complex Numbers and Trigonometry

22.1 Contents

โ€ข Overview 22.2

โ€ข De Moivreโ€™s Theorem 22.3

โ€ข Applications of de Moivreโ€™s Theorem 22.4

22.2 Overview

This lecture introduces some elementary mathematics and trigonometry


Useful and interesting in their own right, these concepts reap substantial rewards when studying
dynamics generated by linear difference equations or linear differential equations
For example, these tools are keys to understanding outcomes attained by Paul Samuelson
(1939) [115] in his classic paper on interactions between the investment accelerator and the
Keynesian consumption function, our topic in the lecture Samuelson Multiplier Accelerator
In addition to providing foundations for Samuelsonโ€™s work and extensions of it, this lec-
ture can be read as a stand-alone quick reminder of key results from elementary high school
trigonometry
So letโ€™s dive in

22.2.1 Complex Numbers

A complex number has a real part ๐‘ฅ and a purely imaginary part ๐‘ฆ


The Euclidean, polar, and trigonometric forms of a complex number ๐‘ง are:

๐‘ง = ๐‘ฅ + ๐‘–๐‘ฆ = ๐‘Ÿ๐‘’๐‘–๐œƒ = ๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ)

The second equality above is known as Eulerโ€™s formula

โ€ข Euler contributed many other formulas too!


The complex conjugate ๐‘ง ฬ„ of ๐‘ง is defined as

๐‘ง ฬ„ = ๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ = ๐‘Ÿ(cos ๐œƒ โˆ’ ๐‘– sin ๐œƒ)

The value ๐‘ฅ is the real part of ๐‘ง and ๐‘ฆ is the imaginary part of ๐‘ง


The symbol |๐‘ง| = โˆš๐‘ง๐‘งฬ„ = ๐‘Ÿ represents the modulus of ๐‘ง
The value ๐‘Ÿ is the Euclidean distance of vector (๐‘ฅ, ๐‘ฆ) from the origin:

๐‘Ÿ = |๐‘ง| = โˆš๐‘ฅ2 + ๐‘ฆ2

The value ๐œƒ is the angle of (๐‘ฅ, ๐‘ฆ) with respect to the real axis
Evidently, the tangent of ๐œƒ is (๐‘ฆ/๐‘ฅ)
Therefore,

๐œƒ = tanโˆ’1 (๐‘ฆ/๐‘ฅ)

Three elementary trigonometric functions are

๐‘ฅ ๐‘’๐‘–๐œƒ + ๐‘’โˆ’๐‘–๐œƒ ๐‘ฆ ๐‘’๐‘–๐œƒ โˆ’ ๐‘’โˆ’๐‘–๐œƒ ๐‘ฅ


cos ๐œƒ = = , sin ๐œƒ = = , tan ๐œƒ =
๐‘Ÿ 2 ๐‘Ÿ 2๐‘– ๐‘ฆ

Weโ€™ll need the following imports

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

22.2.2 An Example
Consider the complex number ๐‘ง = 1 + โˆš3 ๐‘–
For ๐‘ง = 1 + โˆš3 ๐‘–, we have ๐‘ฅ = 1, ๐‘ฆ = โˆš3
It follows that ๐‘Ÿ = 2 and ๐œƒ = tanโˆ’1 (โˆš3) = ๐œ‹/3 = 60ยฐ
Letโ€™s use Python to plot the trigonometric form of the complex number ๐‘ง = 1 + โˆš3 ๐‘–

In [2]: # Abbreviate useful values and functions


ฯ€ = np.pi
zeros = np.zeros
ones = np.ones

# Set parameters
r = 2
ฮธ = ฯ€/3
x = r * np.cos(ฮธ)
x_range = np.linspace(0, x, 1000)
ฮธ_range = np.linspace(0, ฮธ, 1000)

# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')

ax.plot((0, ฮธ), (0, r), marker='o', color='b') # plot r
ax.plot(zeros(x_range.shape), x_range, color='b') # plot x
ax.plot(ฮธ_range, x / np.cos(ฮธ_range), color='b') # plot y
ax.plot(ฮธ_range, ones(ฮธ_range.shape) * 0.1, color='r') # plot ฮธ

ax.margins(0) # Make the plot start at the origin

ax.set_title("Trigonometry of complex numbers", va='bottom', fontsize='x-large')

ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # less radial ticks
ax.set_rlabel_position(-88.5) # get radial labels away from plotted line

ax.text(ฮธ, r+0.01 , r'$z = x + iy = 1 + \sqrt{3}\, i$') # label z


ax.text(ฮธ+0.2, 1 , '$r = 2$') # label r
ax.text(0-0.2, 0.5, '$x = 1$') # label x
ax.text(0.5, 1.2, r'$y = \sqrt{3}$') # label y
ax.text(0.25, 0.15, r'$\theta = 60^o$') # label ฮธ

ax.grid(True)
plt.show()
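We can double-check the modulus and angle computed above with Pythonโ€™s standard cmath module:

```python
import cmath
import numpy as np

z = 1 + np.sqrt(3) * 1j

r, ฮธ = cmath.polar(z)            # modulus and argument of z
print(np.isclose(r, 2))          # True
print(np.isclose(ฮธ, np.pi / 3))  # True: ฮธ = tan^{-1}(โˆš3)
```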

22.3 De Moivreโ€™s Theorem

de Moivreโ€™s theorem states that:

(๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ))๐‘› = ๐‘Ÿ๐‘› ๐‘’๐‘–๐‘›๐œƒ = ๐‘Ÿ๐‘› (cos ๐‘›๐œƒ + ๐‘– sin ๐‘›๐œƒ)

To prove de Moivreโ€™s theorem, note that

๐‘›
(๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ))๐‘› = (๐‘Ÿ๐‘’๐‘–๐œƒ )

and compute
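The theorem is easy to spot-check numerically for particular (illustrative) values of ๐‘Ÿ, ๐œƒ and ๐‘›:

```python
import numpy as np

r, ฮธ, n = 1.3, 0.7, 5

lhs = (r * (np.cos(ฮธ) + 1j * np.sin(ฮธ)))**n
rhs = r**n * (np.cos(n * ฮธ) + 1j * np.sin(n * ฮธ))

print(np.isclose(lhs, rhs))  # True
```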

22.4 Applications of de Moivreโ€™s Theorem

22.4.1 Example 1

We can use de Moivreโ€™s theorem to show that ๐‘Ÿ = โˆš(๐‘ฅ2 + ๐‘ฆ2 )


We have

1 = ๐‘’๐‘–๐œƒ ๐‘’โˆ’๐‘–๐œƒ
 = (cos ๐œƒ + ๐‘– sin ๐œƒ)(cos (โˆ’๐œƒ) + ๐‘– sin (โˆ’๐œƒ))
 = (cos ๐œƒ + ๐‘– sin ๐œƒ)(cos ๐œƒ โˆ’ ๐‘– sin ๐œƒ)
 = cos2 ๐œƒ + sin2 ๐œƒ
 = ๐‘ฅ2 /๐‘Ÿ2 + ๐‘ฆ2 /๐‘Ÿ2

and thus

๐‘ฅ2 + ๐‘ฆ2 = ๐‘Ÿ2

We recognize this as a theorem of Pythagoras

22.4.2 Example 2

Let ๐‘ง = ๐‘Ÿ๐‘’๐‘–๐œƒ and ๐‘ง ฬ„ = ๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ so that ๐‘ง ฬ„ is the complex conjugate of ๐‘ง


(๐‘ง, ๐‘ง)ฬ„ form a complex conjugate pair of complex numbers
Let ๐‘Ž = ๐‘๐‘’๐‘–๐œ” and ๐‘Žฬ„ = ๐‘๐‘’โˆ’๐‘–๐œ” be another complex conjugate pair
For each element of a sequence of integers ๐‘› = 0, 1, 2, โ€ฆ , we want to compute ๐‘ฅ๐‘› = ๐‘Ž๐‘ง๐‘› + ๐‘Žฬ„๐‘งฬ„๐‘›
To do so, we can apply de Moivreโ€™s formula
Thus,

๐‘ฅ๐‘› = ๐‘Ž๐‘ง ๐‘› + ๐‘Ž๐‘งฬ„ ๐‘›ฬ„
= ๐‘๐‘’๐‘–๐œ” (๐‘Ÿ๐‘’๐‘–๐œƒ )๐‘› + ๐‘๐‘’โˆ’๐‘–๐œ” (๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ )๐‘›
= ๐‘๐‘Ÿ๐‘› ๐‘’๐‘–(๐œ”+๐‘›๐œƒ) + ๐‘๐‘Ÿ๐‘› ๐‘’โˆ’๐‘–(๐œ”+๐‘›๐œƒ)
= ๐‘๐‘Ÿ๐‘› [cos (๐œ” + ๐‘›๐œƒ) + ๐‘– sin (๐œ” + ๐‘›๐œƒ) + cos (๐œ” + ๐‘›๐œƒ) โˆ’ ๐‘– sin (๐œ” + ๐‘›๐œƒ)]
= 2๐‘๐‘Ÿ๐‘› cos (๐œ” + ๐‘›๐œƒ)

22.4.3 Example 3

This example provides machinery that is at the heart of Samuelsonโ€™s analysis of his
multiplier-accelerator model [115]
Thus, consider a second-order linear difference equation

๐‘ฅ๐‘›+2 = ๐‘1 ๐‘ฅ๐‘›+1 + ๐‘2 ๐‘ฅ๐‘›

whose characteristic polynomial is

๐‘ง2 โˆ’ ๐‘1 ๐‘ง โˆ’ ๐‘2 = 0

or

(๐‘ง2 โˆ’ ๐‘1 ๐‘ง โˆ’ ๐‘2 ) = (๐‘ง โˆ’ ๐‘ง1 )(๐‘ง โˆ’ ๐‘ง2 ) = 0

has roots ๐‘ง1 , ๐‘ง2
A solution is a sequence {๐‘ฅ๐‘› }โˆž๐‘›=0 that satisfies the difference equation

Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation

โ€ข the roots ๐‘ง1 , ๐‘ง2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
โ€ข the values ๐‘ฅ0 , ๐‘ฅ1 are given initial conditions

To solve the difference equation, recall from example 2 that

๐‘ฅ๐‘› = 2๐‘๐‘Ÿ๐‘› cos (๐œ” + ๐‘›๐œƒ)

where ๐œ”, ๐‘ are coefficients to be determined from information encoded in the initial conditions
๐‘ฅ1 , ๐‘ฅ0
Since ๐‘ฅ0 = 2๐‘ cos ๐œ” and ๐‘ฅ1 = 2๐‘๐‘Ÿ cos (๐œ” + ๐œƒ) the ratio of ๐‘ฅ1 to ๐‘ฅ0 is

๐‘ฅ1 ๐‘Ÿ cos (๐œ” + ๐œƒ)
=
๐‘ฅ0 cos ๐œ”

We can solve this equation for ๐œ”, then solve for ๐‘ using ๐‘ฅ0 = 2๐‘ cos ๐œ”
With the sympy package in Python, we are able to solve and plot the dynamics of ๐‘ฅ๐‘› given
different values of ๐‘›
In this example, we set the initial values:

โ€ข ๐‘Ÿ = 0.9
โ€ข ๐œƒ = ๐œ‹/4
โ€ข ๐‘ฅ0 = 4
โ€ข ๐‘ฅ1 = ๐‘Ÿ โ‹… 2โˆš2 = 1.8โˆš2
We first numerically solve for ๐œ” and ๐‘ using nsolve in the sympy package based on the
above initial condition:

In [3]: from sympy import *

# Set parameters
r = 0.9
ฮธ = ฯ€/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be calculated


ฯ‰, p = symbols('ฯ‰ p', real=True)

# Solve for ฯ‰
## Note: we choose the solution near 0
eq1 = Eq(x1/x0, r * cos(ฯ‰+ฮธ) / cos(ฯ‰))
ฯ‰ = nsolve(eq1, ฯ‰, 0)
ฯ‰ = float(ฯ‰)
print(f'ฯ‰ = {ฯ‰:1.3f}')

# Solve for p
eq2 = Eq(x0, 2 * p * cos(ฯ‰))
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')

ฯ‰ = 0.000
p = 2.000

Using the code above, we compute that ๐œ” = 0 and ๐‘ = 2


Then we plug in the values we solve for ๐œ” and ๐‘ and plot the dynamic

In [4]: # Define range of n


max_n = 30
n = np.arange(0, max_n+1, 0.01)

# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ฯ‰ + n * ฮธ)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))

ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')

ax.spines['bottom'].set_position('center') # Set x-axis in the middle of the plot


ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ticklab = ax.xaxis.get_ticklabels()[0] # Set x-label position


trans = ticklab.get_transform()
ax.xaxis.set_label_coords(31, 0, transform=trans)

ticklab = ax.yaxis.get_ticklabels()[0] # Set y-label position


trans = ticklab.get_transform()
ax.yaxis.set_label_coords(0, 5, transform=trans)

ax.grid()
plt.show()

22.4.4 Trigonometric Identities

We can obtain a complete suite of trigonometric identities by appropriately manipulating
polar forms of complex numbers
Weโ€™ll get many of them by deducing implications of the equality

๐‘’๐‘–(๐œ”+๐œƒ) = ๐‘’๐‘–๐œ” ๐‘’๐‘–๐œƒ

For example, weโ€™ll calculate identities for cos (๐œ” + ๐œƒ) and sin (๐œ” + ๐œƒ)
Using the sine and cosine formulas presented at the beginning of this lecture, we have:

๐‘’๐‘–(๐œ”+๐œƒ) + ๐‘’โˆ’๐‘–(๐œ”+๐œƒ)
cos (๐œ” + ๐œƒ) =
2
๐‘’๐‘–(๐œ”+๐œƒ) โˆ’ ๐‘’โˆ’๐‘–(๐œ”+๐œƒ)
sin (๐œ” + ๐œƒ) =
2๐‘–

We can also obtain the trigonometric identities as follows:

cos (๐œ” + ๐œƒ) + ๐‘– sin (๐œ” + ๐œƒ) = ๐‘’๐‘–(๐œ”+๐œƒ)


= ๐‘’๐‘–๐œ” ๐‘’๐‘–๐œƒ
= (cos ๐œ” + ๐‘– sin ๐œ”)(cos ๐œƒ + ๐‘– sin ๐œƒ)
= (cos ๐œ” cos ๐œƒ โˆ’ sin ๐œ” sin ๐œƒ) + ๐‘–(cos ๐œ” sin ๐œƒ + sin ๐œ” cos ๐œƒ)

Since both real and imaginary parts of the above formula should be equal, we get:

cos (๐œ” + ๐œƒ) = cos ๐œ” cos ๐œƒ โˆ’ sin ๐œ” sin ๐œƒ


sin (๐œ” + ๐œƒ) = cos ๐œ” sin ๐œƒ + sin ๐œ” cos ๐œƒ

The equations above are also known as the angle sum identities. We can verify the equa-
tions using the simplify function in the sympy package:

In [5]: # Define symbols


ฯ‰, ฮธ = symbols('ฯ‰ ฮธ', real=True)

# Verify
print("cos(ฯ‰)cos(ฮธ) - sin(ฯ‰)sin(ฮธ) =", simplify(cos(ฯ‰)*cos(ฮธ) - sin(ฯ‰) * sin(ฮธ)))
print("cos(ฯ‰)sin(ฮธ) + sin(ฯ‰)cos(ฮธ) =", simplify(cos(ฯ‰)*sin(ฮธ) + sin(ฯ‰) * cos(ฮธ)))

cos(ฯ‰)cos(ฮธ) - sin(ฯ‰)sin(ฮธ) = cos(ฮธ + ฯ‰)


cos(ฯ‰)sin(ฮธ) + sin(ฯ‰)cos(ฮธ) = sin(ฮธ + ฯ‰)

22.4.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers
For example, we want to solve the following integral:

โˆซ_{โˆ’๐œ‹}^{๐œ‹} cos(๐œ”) sin(๐œ”) ๐‘‘๐œ”

Using Eulerโ€™s formula, we have:

โˆซ cos(๐œ”) sin(๐œ”) ๐‘‘๐œ” = โˆซ (๐‘’๐‘–๐œ” + ๐‘’โˆ’๐‘–๐œ” )/2 โ‹… (๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )/(2๐‘–) ๐‘‘๐œ”
 = (1/4๐‘–) โˆซ ๐‘’2๐‘–๐œ” โˆ’ ๐‘’โˆ’2๐‘–๐œ” ๐‘‘๐œ”
 = (1/4๐‘–) ((โˆ’๐‘–/2) ๐‘’2๐‘–๐œ” โˆ’ (๐‘–/2) ๐‘’โˆ’2๐‘–๐œ” + ๐ถ1 )
 = โˆ’(1/8) [(๐‘’๐‘–๐œ” )2 + (๐‘’โˆ’๐‘–๐œ” )2 โˆ’ 2] + ๐ถ2
 = โˆ’(1/8) (๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )2 + ๐ถ2
 = (1/2) ((๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )/(2๐‘–))2 + ๐ถ2
 = (1/2) sin2 (๐œ”) + ๐ถ2

and thus:

โˆซ_{โˆ’๐œ‹}^{๐œ‹} cos(๐œ”) sin(๐œ”) ๐‘‘๐œ” = (1/2) sin2 (๐œ‹) โˆ’ (1/2) sin2 (โˆ’๐œ‹) = 0

We can verify the analytical as well as numerical results using integrate in the sympy
package:

In [6]: # Set initial printing


init_printing()

ฯ‰ = Symbol('ฯ‰')
print('The analytical solution for integral of cos(ฯ‰)sin(ฯ‰) is:')
integrate(cos(ฯ‰) * sin(ฯ‰), ฯ‰)

The analytical solution for integral of cos(ฯ‰)sin(ฯ‰) is:

Out[6]:

sin2 (๐œ”)
2

In [7]: print('The numerical solution for the integral of cos(ฯ‰)sin(ฯ‰) from -ฯ€ to ฯ€ is:')
integrate(cos(ฯ‰) * sin(ฯ‰), (ฯ‰, -ฯ€, ฯ€))

The numerical solution for the integral of cos(ฯ‰)sin(ฯ‰) from -ฯ€ to ฯ€ is:

Out[7]:

0
23

Orthogonal Projections and Their Applications

23.1 Contents

โ€ข Overview 23.2

โ€ข Key Definitions 23.3

โ€ข The Orthogonal Projection Theorem 23.4

โ€ข Orthonormal Basis 23.5

โ€ข Projection Using Matrix Algebra 23.6

โ€ข Least Squares Regression 23.7

โ€ข Orthogonalization and Decomposition 23.8

โ€ข Exercises 23.9

โ€ข Solutions 23.10

23.2 Overview

Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions
These include, but are not limited to,

โ€ข Least squares projection, also known as linear regression


โ€ข Conditional expectations for multivariate normal (Gaussian) distributions
โ€ข Gramโ€“Schmidt orthogonalization
โ€ข QR decomposition
โ€ข Orthogonal polynomials
โ€ข etc

In this lecture, we focus on


โ€ข key ideas
โ€ข least squares regression

23.2.1 Further Reading

For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [109]
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter

23.3 Key Definitions

Assume ๐‘ฅ, ๐‘ง โˆˆ R๐‘›
Define โŸจ๐‘ฅ, ๐‘งโŸฉ = โˆ‘๐‘– ๐‘ฅ๐‘– ๐‘ง๐‘–
Recall โ€–๐‘ฅโ€–2 = โŸจ๐‘ฅ, ๐‘ฅโŸฉ
The law of cosines states that โŸจ๐‘ฅ, ๐‘งโŸฉ = โ€–๐‘ฅโ€–โ€–๐‘งโ€– cos(๐œƒ) where ๐œƒ is the angle between the vectors
๐‘ฅ and ๐‘ง
When โŸจ๐‘ฅ, ๐‘งโŸฉ = 0, then cos(๐œƒ) = 0 and ๐‘ฅ and ๐‘ง are said to be orthogonal and we write ๐‘ฅ โŸ‚ ๐‘ง

For a linear subspace ๐‘† โŠ‚ R๐‘› , we call ๐‘ฅ โˆˆ R๐‘› orthogonal to ๐‘† if ๐‘ฅ โŸ‚ ๐‘ง for all ๐‘ง โˆˆ ๐‘†, and


write ๐‘ฅ โŸ‚ ๐‘†

The orthogonal complement of linear subspace ๐‘† โŠ‚ R๐‘› is the set ๐‘† โŸ‚ โˆถ= {๐‘ฅ โˆˆ R๐‘› โˆถ ๐‘ฅ โŸ‚ ๐‘†}

๐‘† โŸ‚ is a linear subspace of R๐‘›

โ€ข To see this, fix ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘† โŸ‚ and ๐›ผ, ๐›ฝ โˆˆ R


โ€ข Observe that if ๐‘ง โˆˆ ๐‘†, then

โŸจ๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ, ๐‘งโŸฉ = ๐›ผโŸจ๐‘ฅ, ๐‘งโŸฉ + ๐›ฝโŸจ๐‘ฆ, ๐‘งโŸฉ = ๐›ผ ร— 0 + ๐›ฝ ร— 0 = 0

โ€ข Hence ๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ โˆˆ ๐‘† โŸ‚ , as was to be shown



A set of vectors {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } โŠ‚ R๐‘› is called an orthogonal set if ๐‘ฅ๐‘– โŸ‚ ๐‘ฅ๐‘— whenever ๐‘– โ‰  ๐‘—


If {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } is an orthogonal set, then the Pythagorean Law states that

โ€–๐‘ฅ1 + โ‹ฏ + ๐‘ฅ๐‘˜ โ€–2 = โ€–๐‘ฅ1 โ€–2 + โ‹ฏ + โ€–๐‘ฅ๐‘˜ โ€–2

For example, when ๐‘˜ = 2, ๐‘ฅ1 โŸ‚ ๐‘ฅ2 implies

โ€–๐‘ฅ1 + ๐‘ฅ2 โ€–2 = โŸจ๐‘ฅ1 + ๐‘ฅ2 , ๐‘ฅ1 + ๐‘ฅ2 โŸฉ = โŸจ๐‘ฅ1 , ๐‘ฅ1 โŸฉ + 2โŸจ๐‘ฅ2 , ๐‘ฅ1 โŸฉ + โŸจ๐‘ฅ2 , ๐‘ฅ2 โŸฉ = โ€–๐‘ฅ1 โ€–2 + โ€–๐‘ฅ2 โ€–2

23.3.1 Linear Independence vs Orthogonality

If ๐‘‹ โŠ‚ R๐‘› is an orthogonal set and 0 โˆ‰ ๐‘‹, then ๐‘‹ is linearly independent


Proving this is a nice exercise
While the converse is not true, a kind of partial converse holds, as weโ€™ll see below

23.4 The Orthogonal Projection Theorem

What vector within a linear subspace of R๐‘› best approximates a given vector in R๐‘› ?


The next theorem provides an answer to this question
Theorem (OPT) Given ๐‘ฆ โˆˆ R๐‘› and linear subspace ๐‘† โŠ‚ R๐‘› , there exists a unique solution
to the minimization problem

๐‘ฆ ฬ‚ โˆถ= arg min๐‘งโˆˆ๐‘† โ€–๐‘ฆ โˆ’ ๐‘งโ€–

The minimizer ๐‘ฆ ฬ‚ is the unique vector in R๐‘› that satisfies

โ€ข ๐‘ฆฬ‚ โˆˆ ๐‘†
โ€ข ๐‘ฆ โˆ’ ๐‘ฆฬ‚ โŸ‚ ๐‘†

The vector ๐‘ฆ ฬ‚ is called the orthogonal projection of ๐‘ฆ onto ๐‘†


The next figure provides some intuition

23.4.1 Proof of Sufficiency

Weโ€™ll omit the full proof.


But we will prove sufficiency of the asserted conditions
To this end, let ๐‘ฆ โˆˆ R๐‘› and let ๐‘† be a linear subspace of R๐‘›
Let ๐‘ฆ ฬ‚ be a vector in R๐‘› such that ๐‘ฆ ฬ‚ โˆˆ ๐‘† and ๐‘ฆ โˆ’ ๐‘ฆ ฬ‚ โŸ‚ ๐‘†
Let ๐‘ง be any other point in ๐‘† and use the fact that ๐‘† is a linear subspace to deduce

โ€–๐‘ฆ โˆ’ ๐‘งโ€–2 = โ€–(๐‘ฆ โˆ’ ๐‘ฆ)ฬ‚ + (๐‘ฆ ฬ‚ โˆ’ ๐‘ง)โ€–2 = โ€–๐‘ฆ โˆ’ ๐‘ฆโ€–ฬ‚ 2 + โ€–๐‘ฆ ฬ‚ โˆ’ ๐‘งโ€–2

Hence โ€–๐‘ฆ โˆ’ ๐‘งโ€– โ‰ฅ โ€–๐‘ฆ โˆ’ ๐‘ฆโ€–,
ฬ‚ which completes the proof

23.4.2 Orthogonal Projection as a Mapping

For a linear space ๐‘Œ and a fixed linear subspace ๐‘†, we have a functional relationship

๐‘ฆ โˆˆ ๐‘Œ โ†ฆ its orthogonal projection ๐‘ฆ ฬ‚ โˆˆ ๐‘†

By the OPT, this is a well-defined mapping or operator from R๐‘› to R๐‘›


In what follows we denote this operator by a matrix ๐‘ƒ

โ€ข ๐‘ƒ ๐‘ฆ represents the projection ๐‘ฆ ฬ‚


โ€ข This is sometimes expressed as ๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ, where ๐ธฬ‚ denotes a wide-sense expectations operator and the subscript ๐‘† indicates that we are projecting ๐‘ฆ onto the linear subspace ๐‘†

The operator ๐‘ƒ is called the orthogonal projection mapping onto ๐‘†



It is immediate from the OPT that for any ๐‘ฆ โˆˆ R๐‘›

1. ๐‘ƒ ๐‘ฆ โˆˆ ๐‘† and
2. ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘†

From this, we can deduce additional useful properties, such as

1. โ€–๐‘ฆโ€–2 = โ€–๐‘ƒ ๐‘ฆโ€–2 + โ€–๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆโ€–2 and


2. โ€–๐‘ƒ ๐‘ฆโ€– โ‰ค โ€–๐‘ฆโ€–

For example, to prove 1, observe that ๐‘ฆ = ๐‘ƒ ๐‘ฆ + ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ and apply the Pythagorean law
Orthogonal Complement
Let ๐‘† โŠ‚ R๐‘› .
The orthogonal complement of ๐‘† is the linear subspace ๐‘† โŸ‚ that satisfies ๐‘ฅ1 โŸ‚ ๐‘ฅ2 for every
๐‘ฅ1 โˆˆ ๐‘† and ๐‘ฅ2 โˆˆ ๐‘† โŸ‚
Let ๐‘Œ be a linear space with linear subspace ๐‘† and its orthogonal complement ๐‘† โŸ‚
We write

๐‘Œ = ๐‘† โŠ• ๐‘†โŸ‚

to indicate that for every ๐‘ฆ โˆˆ ๐‘Œ there is a unique ๐‘ฅ1 โˆˆ ๐‘† and a unique ๐‘ฅ2 โˆˆ ๐‘† โŸ‚ such that ๐‘ฆ = ๐‘ฅ1 + ๐‘ฅ2
Moreover, ๐‘ฅ1 = ๐ธ๐‘†ฬ‚ ๐‘ฆ and ๐‘ฅ2 = ๐‘ฆ โˆ’ ๐ธ๐‘†ฬ‚ ๐‘ฆ
This amounts to another version of the OPT:
Theorem. If ๐‘† is a linear subspace of R๐‘› , ๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ and ๐ธ๐‘†ฬ‚ โŸ‚ ๐‘ฆ = ๐‘€ ๐‘ฆ, then

๐‘ƒ ๐‘ฆ โŸ‚ ๐‘€๐‘ฆ and ๐‘ฆ = ๐‘ƒ ๐‘ฆ + ๐‘€ ๐‘ฆ for all ๐‘ฆ โˆˆ R๐‘›



The next figure illustrates

23.5 Orthonormal Basis

An orthogonal set of vectors ๐‘‚ โŠ‚ R๐‘› is called an orthonormal set if โ€–๐‘ขโ€– = 1 for all ๐‘ข โˆˆ ๐‘‚


Let ๐‘† be a linear subspace of R๐‘› and let ๐‘‚ โŠ‚ ๐‘†
If ๐‘‚ is orthonormal and span ๐‘‚ = ๐‘†, then ๐‘‚ is called an orthonormal basis of ๐‘†
๐‘‚ is necessarily a basis of ๐‘† (being independent by orthogonality and the fact that no element is the zero vector)
One example of an orthonormal set is the canonical basis {๐‘’1 , โ€ฆ , ๐‘’๐‘› }, which forms an orthonormal basis of R๐‘› , where ๐‘’๐‘– is the ๐‘– th unit vector
If {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } is an orthonormal basis of linear subspace ๐‘†, then

๐‘˜
๐‘ฅ = โˆ‘โŸจ๐‘ฅ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– for all ๐‘ฅโˆˆ๐‘†
๐‘–=1

To see this, observe that since ๐‘ฅ โˆˆ span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ }, we can find scalars ๐›ผ1 , โ€ฆ , ๐›ผ๐‘˜ that verify

๐‘˜
๐‘ฅ = โˆ‘ ๐›ผ๐‘— ๐‘ข๐‘— (1)
๐‘—=1

Taking the inner product with respect to ๐‘ข๐‘– gives

๐‘˜
โŸจ๐‘ฅ, ๐‘ข๐‘– โŸฉ = โˆ‘ ๐›ผ๐‘— โŸจ๐‘ข๐‘— , ๐‘ข๐‘– โŸฉ = ๐›ผ๐‘–
๐‘—=1

Combining this result with Eq. (1) verifies the claim
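The expansion can be confirmed numerically. In the sketch below, the orthonormal basis is generated via a QR factorization of an arbitrary matrix — a convenience only, since any orthonormal set would serve:

```python
import numpy as np

np.random.seed(42)
A = np.random.randn(4, 2)
U, _ = np.linalg.qr(A)        # columns of U: an orthonormal basis of span(A)

x = 2.0 * U[:, 0] - 3.0 * U[:, 1]   # a point in the subspace

# Rebuild x from the expansion sum_i <x, u_i> u_i
x_rebuilt = sum((x @ U[:, i]) * U[:, i] for i in range(U.shape[1]))
print(np.allclose(x, x_rebuilt))  # True
```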



23.5.1 Projection onto an Orthonormal Basis

When we have an orthonormal basis for the subspace onto which we are projecting, computing the projection simplifies:
Theorem If {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } is an orthonormal basis for ๐‘†, then

๐‘˜
๐‘ƒ ๐‘ฆ = โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– , โˆ€ ๐‘ฆ โˆˆ R๐‘› (2)
๐‘–=1

Proof: Fix ๐‘ฆ โˆˆ R๐‘› and let ๐‘ƒ ๐‘ฆ be defined as in Eq. (2)


Clearly, ๐‘ƒ ๐‘ฆ โˆˆ ๐‘†
We claim that ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘† also holds
It suffices to show that ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ any basis vector ๐‘ข๐‘– (why?)
This is true because

๐‘˜ ๐‘˜
โŸจ๐‘ฆ โˆ’ โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– , ๐‘ข๐‘— โŸฉ = โŸจ๐‘ฆ, ๐‘ข๐‘— โŸฉ โˆ’ โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉโŸจ๐‘ข๐‘– , ๐‘ข๐‘— โŸฉ = 0
๐‘–=1 ๐‘–=1

23.6 Projection Using Matrix Algebra

Let ๐‘† be a linear subspace of R๐‘› and let ๐‘ฆ โˆˆ R๐‘›


We want to compute the matrix ๐‘ƒ that verifies

๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ

Evidently ๐‘ƒ ๐‘ฆ is a linear function from ๐‘ฆ โˆˆ R๐‘› to ๐‘ƒ ๐‘ฆ โˆˆ R๐‘›


This reference is useful https://en.wikipedia.org/wiki/Linear_map#Matrices
Theorem. Let the columns of ๐‘› ร— ๐‘˜ matrix ๐‘‹ form a basis of ๐‘†. Then

๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ

Proof: Given arbitrary ๐‘ฆ โˆˆ R๐‘› and ๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ , our claim is that

1. ๐‘ƒ ๐‘ฆ โˆˆ ๐‘†, and
2. ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘†

Claim 1 is true because

๐‘ƒ ๐‘ฆ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ = ๐‘‹๐‘Ž when ๐‘Ž โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

An expression of the form ๐‘‹๐‘Ž is precisely a linear combination of the columns of ๐‘‹, and


hence an element of ๐‘†
Claim 2 is equivalent to the statement

๐‘ฆ โˆ’ ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ โŸ‚ ๐‘‹๐‘ for all ๐‘ โˆˆ R๐พ

This is true: If ๐‘ โˆˆ R๐พ , then

(๐‘‹๐‘)โ€ฒ [๐‘ฆ โˆ’ ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ] = ๐‘โ€ฒ [๐‘‹ โ€ฒ ๐‘ฆ โˆ’ ๐‘‹ โ€ฒ ๐‘ฆ] = 0

The proof is now complete
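Both claims can be confirmed numerically; the matrix ๐‘‹ and vector ๐‘ฆ below are arbitrary random draws used purely for illustration:

```python
import numpy as np

np.random.seed(0)
n, k = 5, 2
X = np.random.randn(n, k)        # columns form a basis of S (full column rank a.s.)
y = np.random.randn(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T

# Claim 1: Py is a linear combination of the columns of X, i.e. Py is in S
a = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(P @ y, X @ a))            # True

# Claim 2: the residual y - Py is orthogonal to each column of X
print(np.allclose(X.T @ (y - P @ y), 0))    # True
```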

23.6.1 Starting with the Basis

It is common in applications to start with ๐‘› ร— ๐‘˜ matrix ๐‘‹ with linearly independent columns


and let

๐‘† โˆถ= span ๐‘‹ โˆถ= span{col1 ๐‘‹, โ€ฆ , col๐‘˜ ๐‘‹}

Then the columns of ๐‘‹ form a basis of ๐‘†


From the preceding theorem, ๐‘ƒ ๐‘ฆ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ is the projection of ๐‘ฆ onto ๐‘†
In this context, ๐‘ƒ is often called the projection matrix

โ€ข The matrix ๐‘€ = ๐ผ โˆ’ ๐‘ƒ satisfies ๐‘€ ๐‘ฆ = ๐ธ๐‘†ฬ‚ โŸ‚ ๐‘ฆ and is sometimes called the annihilator matrix

23.6.2 The Orthonormal Case

Suppose that ๐‘ˆ is ๐‘› ร— ๐‘˜ with orthonormal columns


Let ๐‘ข๐‘– โˆถ= col ๐‘ˆ๐‘– for each ๐‘–, let ๐‘† โˆถ= span ๐‘ˆ and let ๐‘ฆ โˆˆ R๐‘›
We know that the projection of ๐‘ฆ onto ๐‘† is

๐‘ƒ ๐‘ฆ = ๐‘ˆ (๐‘ˆ โ€ฒ ๐‘ˆ )โˆ’1 ๐‘ˆ โ€ฒ ๐‘ฆ

Since ๐‘ˆ has orthonormal columns, we have ๐‘ˆ โ€ฒ ๐‘ˆ = ๐ผ


Hence

๐‘˜
๐‘ƒ ๐‘ฆ = ๐‘ˆ ๐‘ˆ โ€ฒ ๐‘ฆ = โˆ‘โŸจ๐‘ข๐‘– , ๐‘ฆโŸฉ๐‘ข๐‘–
๐‘–=1

We have recovered our earlier result about projecting onto the span of an orthonormal basis
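As a quick numerical confirmation that the general formula collapses to ๐‘ƒ ๐‘ฆ = ๐‘ˆ ๐‘ˆ โ€ฒ ๐‘ฆ in the orthonormal case (here ๐‘ˆ is built via QR purely for convenience):

```python
import numpy as np

np.random.seed(1)
U, _ = np.linalg.qr(np.random.randn(6, 3))   # 6 x 3, orthonormal columns
y = np.random.randn(6)

print(np.allclose(U.T @ U, np.eye(3)))       # U'U = I
Py_general = U @ np.linalg.inv(U.T @ U) @ U.T @ y
Py_simple = U @ U.T @ y
print(np.allclose(Py_general, Py_simple))    # True
```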

23.6.3 Application: Overdetermined Systems of Equations

Let ๐‘ฆ โˆˆ R๐‘› and let ๐‘‹ is ๐‘› ร— ๐‘˜ with linearly independent columns


Given ๐‘‹ and ๐‘ฆ, we seek ๐‘ โˆˆ R๐‘˜ satisfying the system of linear equations ๐‘‹๐‘ = ๐‘ฆ
If ๐‘› > ๐‘˜ (more equations than unknowns), then ๐‘ is said to be overdetermined

Intuitively, we may not be able to find a ๐‘ that satisfies all ๐‘› equations


The best approach here is to

โ€ข Accept that an exact solution may not exist


โ€ข Look instead for an approximate solution

By approximate solution, we mean a ๐‘ โˆˆ R๐‘˜ such that ๐‘‹๐‘ is as close to ๐‘ฆ as possible


The next theorem shows that the solution is well defined and unique
The proof uses the OPT
Theorem The unique minimizer of โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€– over ๐‘ โˆˆ R๐พ is

๐›ฝ ฬ‚ โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Proof: Note that

๐‘‹ ๐›ฝ ฬ‚ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ = ๐‘ƒ ๐‘ฆ

Since ๐‘ƒ ๐‘ฆ is the orthogonal projection onto span(๐‘‹) we have

โ€–๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆโ€– โ‰ค โ€–๐‘ฆ โˆ’ ๐‘งโ€– for any ๐‘ง โˆˆ span(๐‘‹)

Because ๐‘‹๐‘ โˆˆ span(๐‘‹)

โ€–๐‘ฆ โˆ’ ๐‘‹ ๐›ฝโ€–ฬ‚ โ‰ค โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€– for any ๐‘ โˆˆ R๐พ

This is what we aimed to show
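The closed-form solution can be checked against np.linalg.lstsq, which solves the same least squares problem; the data below are arbitrary random draws:

```python
import numpy as np

np.random.seed(2)
N, K = 8, 3
X = np.random.randn(N, K)        # N > K: an overdetermined system
y = np.random.randn(N)

beta_formula = np.linalg.inv(X.T @ X) @ X.T @ y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_formula, beta_lstsq))  # True
```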

23.7 Least Squares Regression

Letโ€™s apply the theory of orthogonal projection to least squares regression


This approach provides insights about many geometric properties of linear regression
We treat only some examples

23.7.1 Squared Risk Measures

Given pairs (๐‘ฅ, ๐‘ฆ) โˆˆ R๐พ ร— R, consider choosing ๐‘“ โˆถ R๐พ โ†’ R to minimize the risk

๐‘…(๐‘“) โˆถ= E [(๐‘ฆ โˆ’ ๐‘“(๐‘ฅ))2 ]

If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:

1 ๐‘
min โˆ‘(๐‘ฆ โˆ’ ๐‘“(๐‘ฅ๐‘› ))2
๐‘“โˆˆโ„ฑ ๐‘ ๐‘›=1 ๐‘›

Minimizing this expression is called empirical risk minimization


The set โ„ฑ is sometimes called the hypothesis space
The theory of statistical learning tells us that to prevent overfitting we should take the set โ„ฑ
to be relatively simple
If we let โ„ฑ be the class of linear functions and drop the constant 1/๐‘ (which does not affect the minimizer), the problem becomes

๐‘
min โˆ‘(๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2
๐‘โˆˆR๐พ
๐‘›=1

This is the sample linear least squares problem

23.7.2 Solution

Define the matrices

๐‘ฆ โˆถ= (๐‘ฆ1 , ๐‘ฆ2 , โ€ฆ , ๐‘ฆ๐‘ )โ€ฒ and ๐‘ฅ๐‘› โˆถ= (๐‘ฅ๐‘›1 , ๐‘ฅ๐‘›2 , โ€ฆ , ๐‘ฅ๐‘›๐พ )โ€ฒ = the ๐‘›-th observation on all regressors

and the ๐‘ ร— ๐พ matrix

๐‘‹ โˆถ= (๐‘ฅโ€ฒ1 , ๐‘ฅโ€ฒ2 , โ€ฆ , ๐‘ฅโ€ฒ๐‘ )โ€ฒ , whose ๐‘›-th row is ๐‘ฅโ€ฒ๐‘› , so that the (๐‘›, ๐‘˜)-th element of ๐‘‹ is ๐‘ฅ๐‘›๐‘˜

We assume throughout that ๐‘ > ๐พ and ๐‘‹ is full column rank


๐‘
If you work through the algebra, you will be able to verify that โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€–2 = โˆ‘๐‘›=1 (๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2
Since monotone transforms donโ€™t affect minimizers, we have

๐‘
min โˆ‘(๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2 = min โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€–
๐‘โˆˆR๐พ ๐‘โˆˆR๐พ
๐‘›=1

By our results about overdetermined linear systems of equations, the solution is

๐›ฝ ฬ‚ โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Let ๐‘ƒ and ๐‘€ be the projection and annihilator associated with ๐‘‹:

๐‘ƒ โˆถ= ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and ๐‘€ โˆถ= ๐ผ โˆ’ ๐‘ƒ

The vector of fitted values is

๐‘ฆ ฬ‚ โˆถ= ๐‘‹ ๐›ฝ ฬ‚ = ๐‘ƒ ๐‘ฆ

The vector of residuals is

๐‘ขฬ‚ โˆถ= ๐‘ฆ โˆ’ ๐‘ฆ ฬ‚ = ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ = ๐‘€ ๐‘ฆ

Here are some more standard definitions:

โ€ข The total sum of squares is TSS โˆถ= โ€–๐‘ฆโ€–2
โ€ข The sum of squared residuals is SSR โˆถ= โ€–๐‘ขโ€–ฬ‚ 2
โ€ข The explained sum of squares is ESS โˆถ= โ€–๐‘ฆโ€–ฬ‚ 2

These quantities satisfy

TSS = ESS + SSR

We can prove this easily using the OPT


From the OPT we have ๐‘ฆ = ๐‘ฆ ฬ‚ + ๐‘ขฬ‚ and ๐‘ขฬ‚ โŸ‚ ๐‘ฆ ฬ‚
Applying the Pythagorean law completes the proof
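Here is a numerical confirmation of TSS = ESS + SSR on arbitrary random data:

```python
import numpy as np

np.random.seed(3)
N, K = 10, 3
X = np.random.randn(N, K)
y = np.random.randn(N)

P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y          # fitted values
u_hat = y - y_hat      # residuals

TSS, ESS, SSR = y @ y, y_hat @ y_hat, u_hat @ u_hat
print(np.allclose(TSS, ESS + SSR))   # True
```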

23.8 Orthogonalization and Decomposition

Letโ€™s return to the connection between linear independence and orthogonality touched on
above
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent sets
The next section gives details

23.8.1 Gram-Schmidt Orthogonalization

Theorem For each linearly independent set {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } โŠ‚ R๐‘› , there exists an orthonormal
set {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } with

span{๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘– } = span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘– } for ๐‘– = 1, โ€ฆ , ๐‘˜

The Gram-Schmidt orthogonalization procedure constructs an orthonormal set {๐‘ข1 , ๐‘ข2 , โ€ฆ , ๐‘ข๐‘˜ }
One description of this procedure is as follows:

โ€ข For ๐‘– = 1, โ€ฆ , ๐‘˜, form ๐‘†๐‘– โˆถ= span{๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘– } and ๐‘†๐‘–โŸ‚


โ€ข Set ๐‘ฃ1 = ๐‘ฅ1
โ€ข For ๐‘– โ‰ฅ 2 set ๐‘ฃ๐‘– โˆถ= ๐ธ๐‘†ฬ‚ ๐‘–โˆ’1
โŸ‚ ๐‘ฅ๐‘– and ๐‘ข๐‘– โˆถ= ๐‘ฃ๐‘– /โ€–๐‘ฃ๐‘– โ€–

The sequence ๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ has the stated properties


A Gram-Schmidt orthogonalization construction is a key idea behind the Kalman filter described in A First Look at the Kalman Filter
In some exercises below, you are asked to implement this algorithm and test it using projection

23.8.2 QR Decomposition

The following result uses the preceding algorithm to produce a useful decomposition
Theorem If ๐‘‹ is ๐‘› ร— ๐‘˜ with linearly independent columns, then there exists a factorization
๐‘‹ = ๐‘„๐‘… where

โ€ข ๐‘… is ๐‘˜ ร— ๐‘˜, upper triangular, and nonsingular


โ€ข ๐‘„ is ๐‘› ร— ๐‘˜ with orthonormal columns

Proof sketch: Let

โ€ข ๐‘ฅ๐‘— โˆถ=๐‘— (๐‘‹)
โ€ข {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } be orthonormal with the same span as {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } (to be constructed using
Gramโ€“Schmidt)
โ€ข ๐‘„ be formed from cols ๐‘ข๐‘–

Since ๐‘ฅ๐‘— โˆˆ span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘— }, we have

๐‘—
๐‘ฅ๐‘— = โˆ‘โŸจ๐‘ข๐‘– , ๐‘ฅ๐‘— โŸฉ๐‘ข๐‘– for ๐‘— = 1, โ€ฆ , ๐‘˜
๐‘–=1

Some rearranging gives ๐‘‹ = ๐‘„๐‘…
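NumPy computes this factorization directly via np.linalg.qr, which returns the reduced (๐‘› ร— ๐‘˜) factorization by default. A quick sketch checking the stated properties on arbitrary random data:

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(5, 3)     # linearly independent columns (a.s.)
Q, R = np.linalg.qr(X)        # reduced factorization: Q is 5 x 3, R is 3 x 3

print(np.allclose(Q.T @ Q, np.eye(3)))  # Q has orthonormal columns
print(np.allclose(X, Q @ R))            # X = QR
print(np.allclose(R, np.triu(R)))       # R is upper triangular
```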

23.8.3 Linear Regression via QR Decomposition

For matrices ๐‘‹ and ๐‘ฆ that overdetermine ๐›ฝ in the linear equation system ๐‘ฆ = ๐‘‹๐›ฝ, we found the least squares approximator ๐›ฝ ฬ‚ = (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ
Using the QR decomposition ๐‘‹ = ๐‘„๐‘… gives

๐›ฝ ฬ‚ = (๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘„๐‘…)โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ
= (๐‘…โ€ฒ ๐‘…)โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ
= ๐‘…โˆ’1 (๐‘…โ€ฒ )โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ = ๐‘…โˆ’1 ๐‘„โ€ฒ ๐‘ฆ

Numerical routines would in this case use the alternative form ๐‘…๐›ฝ ฬ‚ = ๐‘„โ€ฒ ๐‘ฆ and back substitution
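Here is a sketch of this alternative route, using scipy.linalg.solve_triangular for the back substitution step and arbitrary random data:

```python
import numpy as np
from scipy.linalg import solve_triangular

np.random.seed(5)
N, K = 9, 3
X = np.random.randn(N, K)
y = np.random.randn(N)

beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # via the normal equations

Q, R = np.linalg.qr(X)
beta_qr = solve_triangular(R, Q.T @ y)           # back substitution on R beta = Q'y

print(np.allclose(beta_normal, beta_qr))         # True
```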

23.9 Exercises

23.9.1 Exercise 1

Show that, for any linear subspace ๐‘† โŠ‚ R๐‘› , ๐‘† โˆฉ ๐‘† โŸ‚ = {0}

23.9.2 Exercise 2

Let ๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and let ๐‘€ = ๐ผ โˆ’ ๐‘ƒ . Show that ๐‘ƒ and ๐‘€ are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?

23.9.3 Exercise 3

Using Gram-Schmidt orthogonalization, produce a linear projection of ๐‘ฆ onto the column space of ๐‘‹ and verify this using the projection matrix ๐‘ƒ โˆถ= ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and also using QR decomposition, where:

๐‘ฆ โˆถ= (1, 3, โˆ’3)โ€ฒ ,

and

๐‘‹ โˆถ= the 3 ร— 2 matrix with rows (1, 0), (0, โˆ’6) and (2, 2)

23.10 Solutions

23.10.1 Exercise 1

If ๐‘ฅ โˆˆ ๐‘† and ๐‘ฅ โˆˆ ๐‘† โŸ‚ , then we have in particular that โŸจ๐‘ฅ, ๐‘ฅโŸฉ = 0, ut then ๐‘ฅ = 0

23.10.2 Exercise 2

Symmetry and idempotence of ๐‘€ and ๐‘ƒ can be established using standard rules for matrix
algebra. The intuition behind idempotence of ๐‘€ and ๐‘ƒ is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself.)

23.10.3 Exercise 3

Hereโ€™s a function that computes the orthonormal vectors using the GS algorithm given in the
lecture

In [1]: import numpy as np

def gram_schmidt(X):
"""
Implements Gram-Schmidt orthogonalization.

Parameters
----------
X : an n x k array with linearly independent columns

Returns
-------
U : an n x k array with orthonormal columns

"""

# Set up
n, k = X.shape
U = np.empty((n, k))

I = np.eye(n)

# The first col of U is just the normalized first col of X


v1 = X[:,0]
U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

for i in range(1, k):


# Set up
b = X[:, i] # The vector we're going to project
Z = X[:, 0:i] # First i columns of X

# Project onto the orthogonal complement of the col span of Z


M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
u = M @ b

# Normalize
U[:, i] = u / np.sqrt(np.sum(u * u))

return U

Here are the arrays weโ€™ll work with

In [2]: y = [1, 3, -3]

X = [[1, 0],
[0, -6],
[2, 2]]

X, y = [np.asarray(z) for z in (X, y)]

First, letโ€™s try projection of ๐‘ฆ onto the column space of ๐‘‹ using the ordinary matrix expression:

In [3]: Py1 = X @ np.linalg.inv(X.T @ X) @ X.T @ y


Py1

Out[3]: array([-0.56521739, 3.26086957, -2.2173913 ])

Now letโ€™s do the same using an orthonormal basis created from our gram_schmidt function

In [4]: U = gram_schmidt(X)
U

Out[4]: array([[ 0.4472136 , -0.13187609],


[ 0. , -0.98907071],
[ 0.89442719, 0.06593805]])

In [5]: Py2 = U @ U.T @ y


Py2

Out[5]: array([-0.56521739, 3.26086957, -2.2173913 ])

This is the same answer. So far so good. Finally, letโ€™s try the same thing but with the basis
obtained via QR decomposition:

In [6]: from scipy.linalg import qr

Q, R = qr(X, mode='economic')
Q

Out[6]: array([[-0.4472136 , -0.13187609],


[-0. , -0.98907071],
[-0.89442719, 0.06593805]])

In [7]: Py3 = Q @ Q.T @ y


Py3

Out[7]: array([-0.56521739, 3.26086957, -2.2173913 ])

Again, we obtain the same answer


24 LLN and CLT

24.1 Contents

โ€ข Overview 24.2

โ€ข Relationships 24.3

โ€ข LLN 24.4

โ€ข CLT 24.5

โ€ข Exercises 24.6

โ€ข Solutions 24.7

24.2 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold
In addition, we examine several useful extensions of the classical theorems, such as

โ€ข The delta method, for smooth functions of random variables


โ€ข The multivariate case

Some of these extensions are presented as exercises

24.3 Relationships

The CLT refines the LLN


The LLN gives conditions under which sample moments converge to population moments as
sample size increases
The CLT provides information about the rate at which sample moments converge to population moments as sample size increases

24.4 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to
their population means

24.4.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID)
random variables
Here is the strongest version of the classical LLN, known as Kolmogorovโ€™s strong law
Let ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› be independent and identically distributed scalar random variables, with com-
mon distribution ๐น
When it exists, let ๐œ‡ denote the common mean of this sample:

๐œ‡ โˆถ= E๐‘‹ = โˆซ ๐‘ฅ๐น (๐‘‘๐‘ฅ)

In addition, let

1 ๐‘›
๐‘‹ฬ„ ๐‘› โˆถ= โˆ‘ ๐‘‹๐‘–
๐‘› ๐‘–=1

Kolmogorovโ€™s strong law states that, if E|๐‘‹| is finite, then

P {๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ as ๐‘› โ†’ โˆž} = 1 (1)

What does this last expression mean?


Letโ€™s think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random samples (which of course it canโ€™t)
Letโ€™s also imagine that we can generate infinite sequences so that the statement ๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ can
be evaluated
In this setting, Eq. (1) should be interpreted as meaning that the probability of the computer
producing a sequence where ๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ fails to occur is zero

24.4.2 Proof

The proof of Kolmogorovโ€™s strong law is nontrivial โ€“ see, for example, theorem 8.3.5 of [38]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition

The version we prove is as follows: If ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is IID with E๐‘‹๐‘–2 < โˆž, then, for any ๐œ– > 0,
we have

P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ†’ 0 as ๐‘›โ†’โˆž (2)

(This version is weaker because we claim only convergence in probability rather than almost
sure convergence, and assume a finite second moment)
To see that this is so, fix ๐œ– > 0, and let ๐œŽ2 be the variance of each ๐‘‹๐‘–
Recall the Chebyshev inequality, which tells us that

E[(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)2 ]
P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ‰ค (3)
๐œ–2

Now observe that

E[(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)2 ] = E{[(1/๐‘›) โˆ‘๐‘–=1,โ€ฆ,๐‘› (๐‘‹๐‘– โˆ’ ๐œ‡)]2 }
             = (1/๐‘›2 ) โˆ‘๐‘–=1,โ€ฆ,๐‘› โˆ‘๐‘—=1,โ€ฆ,๐‘› E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡)
             = (1/๐‘›2 ) โˆ‘๐‘–=1,โ€ฆ,๐‘› E(๐‘‹๐‘– โˆ’ ๐œ‡)2
             = ๐œŽ2 /๐‘›

Here the crucial step is at the third equality, which follows from independence
Independence means that if ๐‘– โ‰  ๐‘—, then the covariance term E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) drops out
As a result, ๐‘›2 โˆ’ ๐‘› terms vanish, leading us to a final expression that goes to zero in ๐‘›
Combining our last result with Eq. (3), we come to the estimate

P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ‰ค ๐œŽ2 /(๐‘›๐œ–2 ) (4)

The claim in Eq. (2) is now clear
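The key intermediate result in the derivation — that the variance of ๐‘‹ฬ„ ๐‘› equals ๐œŽ2 /๐‘› under independence — is easy to confirm by simulation. The sketch below uses standard normal draws, so ๐œŽ2 = 1:

```python
import numpy as np

np.random.seed(6)
n, reps = 50, 200_000
draws = np.random.randn(reps, n)      # reps independent samples of size n
sample_means = draws.mean(axis=1)

print(sample_means.var())             # close to sigma^2 / n = 1 / 50 = 0.02
```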


Of course, if the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is correlated, then the cross-product terms E(๐‘‹๐‘– โˆ’
๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) are not necessarily zero
While this doesnโ€™t mean that the same line of argument is impossible, it does mean that if we
want a similar result then the covariances should be โ€œalmost zeroโ€ for โ€œmostโ€ of these terms
In a long sequence, this would be true if, for example, E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) approached zero
when the difference between ๐‘– and ๐‘— became large
In other words, the LLN can still work if the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› has a kind of โ€œasymptotic
independenceโ€, in the sense that correlation falls to zero as variables become further apart in
the sequence
This idea is very important in time series analysis, and weโ€™ll come across it again soon enough

24.4.3 Illustration

Letโ€™s now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolution of ๐‘‹ฬ„ ๐‘› as ๐‘› increases
Below is a figure that does just this
It shows IID observations from three different distributions and plots ๐‘‹ฬ„ ๐‘› against ๐‘› in each
case
The dots represent the underlying observations ๐‘‹๐‘– for ๐‘– = 1, โ€ฆ , 100
In each of the three cases, convergence of ๐‘‹ฬ„ ๐‘› to ๐œ‡ occurs as predicted

In [1]: import random


import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, poisson
import matplotlib.pyplot as plt
%matplotlib inline

n = 100

# == Arbitrary collection of distributions == #


distributions = {"student's t with 10 degrees of freedom": t(10),
"ฮฒ(2, 2)": beta(2, 2),
"lognormal LN(0, 1/2)": lognorm(0.5),
"ฮณ(5, 1/2)": gamma(5, scale=2),
"poisson(4)": poisson(4),
"exponential with ฮป = 1": expon()}

# == Create a figure and some axes == #


num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(20, 20))

# == Set some plotting parameters to improve layout == #


bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
# == Choose a randomly selected distribution == #
name = random.choice(list(distributions.keys()))
distribution = distributions.pop(name)

# == Generate n draws from the distribution == #


data = distribution.rvs(n)

# == Compute sample mean at each n == #


sample_mean = np.empty(n)
for i in range(n):
sample_mean[i] = np.mean(data[:i+1])

# == Plot == #
ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
axlabel = '$\\bar X_n$ for $X_i \sim$' + name
ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
m = distribution.mean()
ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')
ax.vlines(list(range(n)), m, data, lw=0.2)
ax.legend(**legend_args)

plt.show()

The three distributions are chosen at random from a selection stored in the dictionary dis-
tributions

24.4.4 Infinite Mean

What happens if the condition E|๐‘‹| < โˆž in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy-tailed โ€” the best-known example is the Cauchy distribution, which has density

๐‘“(๐‘ฅ) = 1/(๐œ‹(1 + ๐‘ฅ2 )) (๐‘ฅ โˆˆ R)

The next figure shows 100 independent draws from this distribution

In [2]: from scipy.stats import cauchy

n = 100
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))


data = distribution.rvs(n)

ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5)


ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title(f"{n} observations from the Cauchy distribution")

plt.show()

Notice how extreme observations are far more prevalent here than the previous figure
Letโ€™s now have a look at the behavior of the sample mean

In [3]: n = 1000
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))


data = distribution.rvs(n)

# == Compute sample mean at each n == #


sample_mean = np.empty(n)

for i in range(1, n):


sample_mean[i] = np.mean(data[:i])

# == Plot == #
ax.plot(list(range(n)), sample_mean, 'r-', lw=3, alpha=0.6,
label='$\\bar X_n$')
ax.plot(list(range(n)), [0] * n, 'k--', lw=0.5)
ax.legend()

plt.show()

Here weโ€™ve increased ๐‘› to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take ๐‘› even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

๐œ™(๐‘ก) = E๐‘’๐‘–๐‘ก๐‘‹ = โˆซ ๐‘’๐‘–๐‘ก๐‘ฅ ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ = ๐‘’โˆ’|๐‘ก| (5)

Using independence, the characteristic function of the sample mean becomes

ฬ„ ๐‘ก ๐‘›
E๐‘’๐‘–๐‘ก๐‘‹๐‘› = E exp {๐‘– โˆ‘ ๐‘‹๐‘— }
๐‘› ๐‘—=1
๐‘›
๐‘ก
= E โˆ exp {๐‘– ๐‘‹๐‘— }
๐‘—=1
๐‘›
๐‘›
๐‘ก
= โˆ E exp {๐‘– ๐‘‹๐‘— } = [๐œ™(๐‘ก/๐‘›)]๐‘›
๐‘—=1
๐‘›

In view of Eq. (5), this is just ๐‘’โˆ’|๐‘ก|


Thus, in the case of the Cauchy distribution, the sample mean itself has the very same
Cauchy distribution, regardless of ๐‘›
In particular, the sequence ๐‘‹ฬ„ ๐‘› does not converge to a point
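This claim can be supported by simulation: a Kolmogorov-Smirnov test (an addition here, not part of the argument above) should find that draws of ๐‘‹ฬ„ ๐‘› are consistent with a standard Cauchy distribution:

```python
import numpy as np
from scipy.stats import cauchy, kstest

np.random.seed(7)
n, reps = 100, 5000
data = cauchy.rvs(size=(reps, n))
sample_means = data.mean(axis=1)

# Under the claim, each sample mean is itself standard Cauchy
stat, pvalue = kstest(sample_means, 'cauchy')
print(stat)    # a small KS statistic: consistent with the Cauchy claim
```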

24.5 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means

24.5.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics
In the classical IID setting, it tells us the following:
If the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is IID, with common mean ๐œ‡ and common variance ๐œŽ2 โˆˆ (0, โˆž),
then

โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡) โ†’๐‘‘ ๐‘ (0, ๐œŽ2 ) as ๐‘› โ†’ โˆž (6)

๐‘‘
Here โ†’ ๐‘ (0, ๐œŽ2 ) indicates convergence in distribution to a centered (i.e, zero mean) normal
with standard deviation ๐œŽ

24.5.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [38])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact, all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables
In particular, let ๐‘‹๐‘– be binary, with P{๐‘‹๐‘– = 0} = P{๐‘‹๐‘– = 1} = 0.5, and let ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› be
independent
๐‘›
Think of ๐‘‹๐‘– = 1 as a โ€œsuccessโ€, so that ๐‘Œ๐‘› = โˆ‘๐‘–=1 ๐‘‹๐‘– is the number of successes in ๐‘› trials
The next figure plots the probability mass function of ๐‘Œ๐‘› for ๐‘› = 1, 2, 4, 8

In [4]: from scipy.stats import binom

fig, axes = plt.subplots(2, 2, figsize=(10, 6))


plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):


b = binom(n, 0.5)
ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
xticks=list(range(9)), yticks=(0, 0.2, 0.4),
title=f'$n = {n}$')

plt.show()

When ๐‘› = 1, the distribution is flat โ€” one success or no successes have the same probability
When ๐‘› = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point ๐‘˜ = 1
The reason is that there are more ways to get 1 success (โ€œfail then succeedโ€ or โ€œsucceed then
failโ€) than to get zero or two successes
Moreover, the two trials are independent, so the outcomes โ€œfail then succeedโ€ and โ€œsucceed
then failโ€ are just as likely as the outcomes โ€œfail then failโ€ and โ€œsucceed then succeedโ€
(If there were positive correlation, say, then โ€œsucceed then failโ€ would be less likely than โ€œsucceed then succeedโ€)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For ๐‘› = 4 and ๐‘› = 8 we again get a peak at the โ€œmiddleโ€ value (halfway between the mini-
mum and the maximum possible value)
The intuition is the same โ€” there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes even more pronounced
We are witnessing the binomial approximation of the normal distribution

24.5.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition
To this end, we now perform the following simulation

1. Choose an arbitrary distribution ๐น for the underlying observations ๐‘‹๐‘–


2. Generate independent draws of ๐‘Œ๐‘› โˆถ= โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)
3. Use these draws to compute some measure of their distribution โ€” such as a histogram
4. Compare the latter to ๐‘ (0, ๐œŽ2 )

Hereโ€™s some code that does exactly this for the exponential distribution ๐น (๐‘ฅ) = 1 โˆ’ ๐‘’โˆ’๐œ†๐‘ฅ
(Please experiment with other choices of ๐น , but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment)

In [5]: from scipy.stats import norm

# == Set parameters == #
n = 250 # Choice of n
k = 100000 # Number of draws of Y_n
distribution = expon(scale=2) # Exponential distribution, ฮป = 1/2
ฮผ, s = distribution.mean(), distribution.std()

# == Draw underlying RVs. Each row contains a draw of X_1,..,X_n == #


data = distribution.rvs((k, n))
# == Compute mean of each row, producing k draws of \bar X_n == #
sample_means = data.mean(axis=1)
# == Generate observations of Y_n == #
Y = np.sqrt(n) * (sample_means - ฮผ)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \sigma^2)$')
ax.legend()

plt.show()

Notice the absence of for loops โ€” every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code

The fit to the normal density is already tight and can be further improved by increasing n
You can also experiment with other specifications of ๐น

24.5.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of ๐‘Œ๐‘› โˆถ= โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡) as ๐‘› increases
In the simulation, weโ€™ll be working with random variables having ๐œ‡ = 0
Thus, when ๐‘› = 1, we have ๐‘Œ1 = ๐‘‹1 , so the first distribution is just the distribution of the
underlying random variable
For ๐‘› = 2, the distribution of ๐‘Œ2 is that of (๐‘‹1 + ๐‘‹2 )/โˆš2, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of ๐‘Œ๐‘› will smooth out into a bell-shaped curve
The next figure shows this process for ๐‘‹๐‘– โˆผ ๐‘“, where ๐‘“ was specified as the convex combination of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for ๐‘“)
In the figure, the closest density is that of ๐‘Œ1 , while the furthest is that of ๐‘Œ5

In [6]: from scipy.stats import beta, gaussian_kde

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection

beta_dist = beta(2, 2)

def gen_x_draws(k):
"""
Returns a flat array containing k independent draws from the
distribution of X, the underlying random variable. This distribution is
itself a convex combination of three beta distributions.
"""
bdraws = beta_dist.rvs((3, k))
# == Transform rows, so each represents a different distribution == #
bdraws[0, :] -= 0.5
bdraws[1, :] += 0.6
bdraws[2, :] -= 1.1
# == Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2} == #
js = np.random.randint(0, 3, size=k)
X = bdraws[js, np.arange(k)]
# == Rescale, so that the random variable is zero mean == #
m, sigma = X.mean(), X.std()
return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# == Form a matrix Z such that each column is reps independent draws of X == #


Z = np.empty((reps, nmax))
for i in range(nmax):
Z[:, i] = gen_x_draws(reps)
# == Take cumulative sum across columns
S = Z.cumsum(axis=1)
# == Multiply j-th column by sqrt j == #
Y = (1 / np.sqrt(ns)) * S

# == Plot == #

fig = plt.figure(figsize = (10, 6))



ax = fig.gca(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# == Build verts == #
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
density = gaussian_kde(Y[:, n-1])
ys = density(xs)
verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])


poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',
xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
ax.view_init(30, 45) # Rotates the plot 30 deg on z axis and 45 deg on x axis
plt.show()

As expected, the distribution smooths out into a bell curve as ๐‘› increases


We leave you to investigate the simulation code above if you wish to know more
If you run the file from the ordinary IPython shell, the figure should pop up in a window that
you can rotate with your mouse, giving different views on the density sequence

24.5.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional
settings
To state the results, letโ€™s recall some elementary facts about random vectors
A random vector X is just a sequence of ๐‘˜ random variables (๐‘‹1 , โ€ฆ , ๐‘‹๐‘˜ )

Each realization of X is an element of R๐‘˜


A collection of random vectors X1 , โ€ฆ , X๐‘› is called independent if, given any ๐‘› vectors
x1 , โ€ฆ , x๐‘› in R๐‘˜ , we have

P{X1 โ‰ค x1 , โ€ฆ , X๐‘› โ‰ค x๐‘› } = P{X1 โ‰ค x1 } ร— โ‹ฏ ร— P{X๐‘› โ‰ค x๐‘› }

(The vector inequality X โ‰ค x means that ๐‘‹๐‘— โ‰ค ๐‘ฅ๐‘— for ๐‘— = 1, โ€ฆ , ๐‘˜)


Let ๐œ‡๐‘— โˆถ= E[๐‘‹๐‘— ] for all ๐‘— = 1, โ€ฆ , ๐‘˜
The expectation E[X] of X is defined to be the vector of expectations:

E[๐‘‹1 ] ๐œ‡1
โŽ›
โŽœ E[๐‘‹2 ] โŽž
โŽŸ โŽ›
โŽœ ๐œ‡2 โŽž
โŽŸ
E[X] โˆถ= โŽœ
โŽœ โŽŸ
โŽŸ =โŽœ โŽŸ =โˆถ ๐œ‡
โŽœ โ‹ฎ โŽŸ โŽœโŽœ โ‹ฎ โŽŸโŽŸ
โŽ E[๐‘‹ ๐‘˜] ๐œ‡
โŽ  โŽ ๐‘˜ โŽ 

The variance-covariance matrix of random vector X is defined as

Var[X] โˆถ= E[(X โˆ’ ๐œ‡)(X โˆ’ ๐œ‡)โ€ฒ ]

Expanding this out, we get

E[(๐‘‹1 โˆ’ ๐œ‡1 )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹1 โˆ’ ๐œ‡1 )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )]


โŽ› E[(๐‘‹ โŽž
โŽœ 2 โˆ’ ๐œ‡2 )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹2 โˆ’ ๐œ‡2 )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )] โŽŸ
Var[X] = โŽœ
โŽœ โŽŸ
โŽŸ
โŽœ โ‹ฎ โ‹ฎ โ‹ฎ โŽŸ
โŽ E[(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )] โŽ 

The ๐‘—, ๐‘˜-th term is the scalar covariance between ๐‘‹๐‘— and ๐‘‹๐‘˜


With this notation, we can proceed to the multivariate LLN and CLT
Let X1 , โ€ฆ , X๐‘› be a sequence of independent and identically distributed random vectors, each
one taking values in R๐‘˜
Let ๐œ‡ be the vector E[X๐‘– ], and let ฮฃ be the variance-covariance matrix of X๐‘–
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

1 ๐‘›
Xฬ„ ๐‘› โˆถ= โˆ‘ X๐‘–
๐‘› ๐‘–=1

In this setting, the LLN tells us that

P {Xฬ„ ๐‘› โ†’ ๐œ‡ as ๐‘› โ†’ โˆž} = 1 (7)

Here Xฬ„ ๐‘› โ†’ ๐œ‡ means that โ€–Xฬ„ ๐‘› โˆ’ ๐œ‡โ€– โ†’ 0, where โ€– โ‹… โ€– is the standard Euclidean norm


The CLT tells us that, provided ฮฃ is finite,

โˆš ๐‘‘
๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, ฮฃ) as ๐‘›โ†’โˆž (8)
400 24. LLN AND CLT

24.6 Exercises

24.6.1 Exercise 1

One very useful consequence of the central limit theorem is as follows


Assume the conditions of the CLT as stated above
If ๐‘” โˆถ R โ†’ R is differentiable at ๐œ‡ and ๐‘”โ€ฒ (๐œ‡) โ‰  0, then

โˆš ๐‘‘
๐‘›{๐‘”(๐‘‹ฬ„ ๐‘› ) โˆ’ ๐‘”(๐œ‡)} โ†’ ๐‘ (0, ๐‘”โ€ฒ (๐œ‡)2 ๐œŽ2 ) as ๐‘›โ†’โˆž (9)

This theorem is used frequently in statistics to obtain the asymptotic distribution of estima-
tors โ€” many of which can be expressed as functions of sample means
(These kinds of results are often said to use the โ€œdelta methodโ€)
The proof is based on a Taylor expansion of ๐‘” around the point ๐œ‡
Taking the result as given, let the distribution ๐น of each ๐‘‹๐‘– be uniform on [0, ๐œ‹/2] and let
๐‘”(๐‘ฅ) = sin(๐‘ฅ)
โˆš
Derive the asymptotic distribution of ๐‘›{๐‘”(๐‘‹ฬ„ ๐‘› ) โˆ’ ๐‘”(๐œ‡)} and illustrate convergence in the
same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, ๐œ‹/2] with [0, ๐œ‹]?
What is the source of the problem?

24.6.2 Exercise 2

Hereโ€™s a result thatโ€™s often used in developing statistical tests, and is connected to the multi-
variate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that

1. X1 , โ€ฆ , X๐‘› is a sequence of IID random vectors, each taking values in R๐‘˜


2. ๐œ‡ โˆถ= E[X๐‘– ], and ฮฃ is the variance-covariance matrix of X๐‘–
3. The convergence

โˆš ๐‘‘
๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, ฮฃ) (10)

is valid
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in R๐‘˜ and A is constant and ๐‘˜ ร— ๐‘˜, then

Var[AX] = A Var[X]Aโ€ฒ

๐‘‘
Second, by the continuous mapping theorem, if Z๐‘› โ†’ Z in R๐‘˜ and A is constant and ๐‘˜ ร— ๐‘˜,
then

๐‘‘
AZ๐‘› โ†’ AZ

Third, if S is a ๐‘˜ ร— ๐‘˜ symmetric positive definite matrix, then there exists a symmetric posi-
tive definite matrix Q, called the inverse square root of S, such that

QSQโ€ฒ = I

Here I is the ๐‘˜ ร— ๐‘˜ identity matrix
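Anticipating the hint at the end of this exercise, here is a small sketch (the 2 × 2 matrix S is our own choice) of how such a Q can be computed with scipy.linalg.sqrtm and verified

```python
import numpy as np
from scipy.linalg import sqrtm, inv

# Sketch: for a symmetric positive definite S (our own 2 x 2 example),
# Q = (S^{1/2})^{-1} satisfies Q S Q' = I
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])

Q = inv(sqrtm(S))          # inverse square root of S
I_check = Q @ S @ Q.T      # numerically the 2 x 2 identity
```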


Putting these things together, your first exercise is to show that if $\mathbf Q$ is the inverse square root of $\Sigma$, then

โˆš ๐‘‘
Z๐‘› โˆถ= ๐‘›Q(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ Z โˆผ ๐‘ (0, I)

Applying the continuous mapping theorem one more time tells us that

๐‘‘
โ€–Z๐‘› โ€–2 โ†’ โ€–Zโ€–2

Given the distribution of Z, we conclude that

๐‘‘
๐‘›โ€–Q(Xฬ„ ๐‘› โˆ’ ๐œ‡)โ€–2 โ†’ ๐œ’2 (๐‘˜) (11)

where ๐œ’2 (๐‘˜) is the chi-squared distribution with ๐‘˜ degrees of freedom


(Recall that ๐‘˜ is the dimension of X๐‘– , the underlying random vectors)
Your second exercise is to illustrate the convergence in Eq. (11) with a simulation
In doing so, let

๐‘Š๐‘–
X๐‘– โˆถ= ( )
๐‘ˆ๐‘– + ๐‘Š ๐‘–

where

โ€ข each ๐‘Š๐‘– is an IID draw from the uniform distribution on [โˆ’1, 1]


โ€ข each ๐‘ˆ๐‘– is an IID draw from the uniform distribution on [โˆ’2, 2]
โ€ข ๐‘ˆ๐‘– and ๐‘Š๐‘– are independent of each other

Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it


2. You should be able to work out ฮฃ from the preceding information

24.7 Solutions

24.7.1 Exercise 1

Here is one solution

In [7]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""

from scipy.stats import uniform

# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
ฮผ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# == Generate obs of sqrt{n} (g(X_n) - g(ฮผ)) == #


data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1) # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(ฮผ))

# == Plot == #
asymptotic_sd = g_prime(ฮผ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(ฮผ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

What happens when you replace [0, ๐œ‹/2] with [0, ๐œ‹]?
In this case, the mean ๐œ‡ of this distribution is ๐œ‹/2, and since ๐‘”โ€ฒ = cos, we have ๐‘”โ€ฒ (๐œ‡) = 0
Hence the conditions of the delta theorem are not satisfied
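To see the failure numerically, here is a minimal sketch (the values of n and the replication count are our own choices): since $g'(\mu) = 0$, a second-order Taylor expansion gives $\sqrt{n}\{g(\bar X_n) - g(\mu)\} \approx -\sqrt{n}(\bar X_n - \mu)^2/2 = O(1/\sqrt{n})$, so the scaled errors collapse toward zero instead of settling on a nondegenerate normal limit

```python
import numpy as np
from scipy.stats import uniform

# Sketch: with F uniform on [0, π], μ = π/2 and g'(μ) = cos(π/2) = 0, so the
# √n scaling of the delta method is too weak.
# (n and the replication count below are our own choices.)
np.random.seed(0)
n = 10_000
replications = 1_000
dist = uniform(loc=0, scale=np.pi)       # uniform on [0, π]
μ = dist.mean()                          # π / 2

sample_means = dist.rvs((replications, n)).mean(axis=1)
scaled_errors = np.sqrt(n) * (np.sin(sample_means) - np.sin(μ))
```

Since $\sin(x) \leq 1$ everywhere, every scaled error is nonpositive, and all of them concentrate near zero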

24.7.2 Exercise 2

First we want to verify the claim that

โˆš ๐‘‘
๐‘›Q(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, I)

This is straightforward given the facts presented in the exercise


Let

โˆš
Y๐‘› โˆถ= ๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) and Y โˆผ ๐‘ (0, ฮฃ)

By the multivariate CLT and the continuous mapping theorem, we have

๐‘‘
QY๐‘› โ†’ QY

Since linear combinations of normal random variables are normal, the vector QY is also nor-
mal
Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = QVar[Y]Qโ€ฒ = QฮฃQโ€ฒ = I

๐‘‘
In conclusion, QY๐‘› โ†’ QY โˆผ ๐‘ (0, I), which is what we aimed to show
Now we turn to the simulation exercise
Our solution is as follows

In [8]: from scipy.stats import chi2


from scipy.linalg import inv, sqrtm

# == Set parameters == #
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2) # Uniform(-1, 1)
du = uniform(loc=-2, scale=4) # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
ฮฃ = ((vw, vw), (vw, vw + vu))
ฮฃ = np.array(ฮฃ)

# == Compute ฮฃ^{-1/2} == #
Q = inv(sqrtm(ฮฃ))

# == Generate observations of the normalized sample mean == #


error_obs = np.empty((2, replications))
for i in range(replications):
# == Generate one sequence of bivariate shocks == #
X = np.empty((2, n))
W = dw.rvs(n)
U = du.rvs(n)

# == Construct the n observations of the random vector == #


X[0, :] = W
X[1, :] = W + U
# == Construct the i-th observation of Y_n == #
error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# == Premultiply by Q and then take the squared norm == #


temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
25 Linear State Space Models

25.1 Contents

โ€ข Overview 25.2

โ€ข The Linear State Space Model 25.3

โ€ข Distributions and Moments 25.4

โ€ข Stationarity and Ergodicity 25.5

โ€ข Noisy Observations 25.6

โ€ข Prediction 25.7

โ€ข Code 25.8

โ€ข Exercises 25.9

โ€ข Solutions 25.10

โ€œWe may regard the present state of the universe as the effect of its past and the
cause of its futureโ€ โ€“ Marquis de Laplace

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

25.2 Overview

This lecture introduces the linear state space dynamic system


This model is a workhorse that carries a powerful theory of prediction
Its many applications include:

โ€ข representing dynamics of higher-order linear systems

โ€ข predicting the position of a system ๐‘— steps into the future


โ€ข predicting a geometric sum of future values of a variable like

โ€“ non-financial income
โ€“ dividends on a stock
โ€“ the money supply
โ€“ a government deficit or surplus, etc.

โ€ข key ingredient of useful models

โ€“ Friedmanโ€™s permanent income model of consumption smoothing


โ€“ Barroโ€™s model of smoothing total tax collections
โ€“ Rational expectations version of Caganโ€™s model of hyperinflation
โ€“ Sargent and Wallaceโ€™s โ€œunpleasant monetarist arithmetic,โ€ etc.

25.3 The Linear State Space Model

The objects in play are:

โ€ข An ๐‘› ร— 1 vector ๐‘ฅ๐‘ก denoting the state at time ๐‘ก = 0, 1, 2, โ€ฆ


โ€ข An IID sequence of ๐‘š ร— 1 random vectors ๐‘ค๐‘ก โˆผ ๐‘ (0, ๐ผ)
โ€ข A ๐‘˜ ร— 1 vector ๐‘ฆ๐‘ก of observations at time ๐‘ก = 0, 1, 2, โ€ฆ
โ€ข An ๐‘› ร— ๐‘› matrix ๐ด called the transition matrix
โ€ข An ๐‘› ร— ๐‘š matrix ๐ถ called the volatility matrix
โ€ข A ๐‘˜ ร— ๐‘› matrix ๐บ sometimes called the output matrix

Here is the linear state-space system

๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1


๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก (1)
๐‘ฅ0 โˆผ ๐‘ (๐œ‡0 , ฮฃ0 )

25.3.1 Primitives

The primitives of the model are

1. the matrices ๐ด, ๐ถ, ๐บ
2. shock distribution, which we have specialized to ๐‘ (0, ๐ผ)
3. the distribution of the initial condition ๐‘ฅ0 , which we have set to ๐‘ (๐œ‡0 , ฮฃ0 )

Given ๐ด, ๐ถ, ๐บ and draws of ๐‘ฅ0 and ๐‘ค1 , ๐‘ค2 , โ€ฆ, the model Eq. (1) pins down the values of the
sequences {๐‘ฅ๐‘ก } and {๐‘ฆ๐‘ก }
Even without these draws, the primitives 1โ€“3 pin down the probability distributions of {๐‘ฅ๐‘ก }
and {๐‘ฆ๐‘ก }
Later weโ€™ll see how to compute these distributions and their moments
Martingale Difference Shocks
Weโ€™ve made the common assumption that the shocks are independent standardized normal
vectors

But some of what we say will be valid under the assumption that {๐‘ค๐‘ก+1 } is a martingale
difference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information
In the present case, since {๐‘ฅ๐‘ก } is our state sequence, this means that it satisfies

E[๐‘ค๐‘ก+1 |๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ] = 0

This is a weaker condition than that {๐‘ค๐‘ก } is IID with ๐‘ค๐‘ก+1 โˆผ ๐‘ (0, ๐ผ)

25.3.2 Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of


the linear state space model
The following examples help to highlight this point
They also illustrate the wise dictum finding the state is an art
Second-order Difference Equation
Let {๐‘ฆ๐‘ก } be a deterministic sequence that satisfies

๐‘ฆ๐‘ก+1 = ๐œ™0 + ๐œ™1 ๐‘ฆ๐‘ก + ๐œ™2 ๐‘ฆ๐‘กโˆ’1 s.t. ๐‘ฆ0 , ๐‘ฆโˆ’1 given (2)

To map Eq. (2) into our state space system Eq. (1), we set

$$
x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$

You can confirm that under these definitions, Eq. (1) and Eq. (2) agree
The next figure shows the dynamics of this process when ๐œ™0 = 1.1, ๐œ™1 = 0.8, ๐œ™2 = โˆ’0.8, ๐‘ฆ0 =
๐‘ฆโˆ’1 = 1

Later youโ€™ll be asked to recreate this figure
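As a quick sketch in the meantime (the check itself is ours; the parameter values are those just stated), we can confirm that the state-space mapping reproduces the scalar recursion Eq. (2)

```python
import numpy as np

# Sketch: simulate y_{t+1} = φ0 + φ1 y_t + φ2 y_{t-1} both through the
# state-space system and by direct iteration, and confirm they agree
φ0, φ1, φ2 = 1.1, 0.8, -0.8
A = np.array([[1.0, 0.0, 0.0],
              [φ0,  φ1,  φ2],
              [0.0, 1.0, 0.0]])
G = np.array([0.0, 1.0, 0.0])

T = 50
x = np.array([1.0, 1.0, 1.0])       # x_0 = (1, y_0, y_{-1})' with y_0 = y_{-1} = 1
y_state_space = []
for t in range(T):
    y_state_space.append(G @ x)     # y_t = G x_t
    x = A @ x                       # x_{t+1} = A x_t  (deterministic: C w = 0)

# Direct iteration of the scalar difference equation
y_direct = [1.0]
y_prev = 1.0                        # y_{-1}
for t in range(T - 1):
    y_next = φ0 + φ1 * y_direct[-1] + φ2 * y_prev
    y_prev = y_direct[-1]
    y_direct.append(y_next)
```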


Univariate Autoregressive Processes
We can use Eq. (1) to represent the model

๐‘ฆ๐‘ก+1 = ๐œ™1 ๐‘ฆ๐‘ก + ๐œ™2 ๐‘ฆ๐‘กโˆ’1 + ๐œ™3 ๐‘ฆ๐‘กโˆ’2 + ๐œ™4 ๐‘ฆ๐‘กโˆ’3 + ๐œŽ๐‘ค๐‘ก+1 (3)

where {๐‘ค๐‘ก } is IID and standard normal


โ€ฒ
To put this in the linear state space format we take ๐‘ฅ๐‘ก = [๐‘ฆ๐‘ก ๐‘ฆ๐‘กโˆ’1 ๐‘ฆ๐‘กโˆ’2 ๐‘ฆ๐‘กโˆ’3 ] and

๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ๐œŽ
โŽก1 0 0 0โŽค โŽก0โŽค
๐ด=โŽข โŽฅ ๐ถ=โŽข โŽฅ ๐บ = [1 0 0 0]
โŽข0 1 0 0โŽฅ โŽข0โŽฅ
โŽฃ0 0 1 0โŽฆ โŽฃ0โŽฆ

The matrix ๐ด has the form of the companion matrix to the vector [๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ]
The next figure shows the dynamics of this process when

๐œ™1 = 0.5, ๐œ™2 = โˆ’0.2, ๐œ™3 = 0, ๐œ™4 = 0.5, ๐œŽ = 0.2, ๐‘ฆ0 = ๐‘ฆโˆ’1 = ๐‘ฆโˆ’2 = ๐‘ฆโˆ’3 = 1

Vector Autoregressions
Now suppose that

โ€ข ๐‘ฆ๐‘ก is a ๐‘˜ ร— 1 vector
โ€ข ๐œ™๐‘— is a ๐‘˜ ร— ๐‘˜ matrix and
โ€ข ๐‘ค๐‘ก is ๐‘˜ ร— 1

Then Eq. (3) is termed a vector autoregression


To map this into Eq. (1), we set

๐‘ฆ๐‘ก ๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ๐œŽ
โŽก๐‘ฆ โŽค โŽก๐ผ 0 0 0โŽค โŽก0โŽค
๐‘ฅ๐‘ก = โŽข ๐‘กโˆ’1 โŽฅ ๐ด=โŽข โŽฅ ๐ถ=โŽข โŽฅ ๐บ = [๐ผ 0 0 0]
โŽข๐‘ฆ๐‘กโˆ’2 โŽฅ โŽข0 ๐ผ 0 0โŽฅ โŽข0โŽฅ
โŽฃ๐‘ฆ๐‘กโˆ’3 โŽฆ โŽฃ0 0 ๐ผ 0โŽฆ โŽฃ0โŽฆ

where ๐ผ is the ๐‘˜ ร— ๐‘˜ identity matrix and ๐œŽ is a ๐‘˜ ร— ๐‘˜ matrix


Seasonals
We can use Eq. (1) to represent

1. the deterministic seasonal ๐‘ฆ๐‘ก = ๐‘ฆ๐‘กโˆ’4


2. the indeterministic seasonal ๐‘ฆ๐‘ก = ๐œ™4 ๐‘ฆ๐‘กโˆ’4 + ๐‘ค๐‘ก

In fact, both are special cases of Eq. (3)


With the deterministic seasonal, the transition matrix becomes

$$
A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$$

It is easy to check that ๐ด4 = ๐ผ, which implies that ๐‘ฅ๐‘ก is strictly periodic with period 4:[1]

๐‘ฅ๐‘ก+4 = ๐‘ฅ๐‘ก

Such an ๐‘ฅ๐‘ก process can be used to model deterministic seasonals in quarterly time series
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations
Time Trends
The model ๐‘ฆ๐‘ก = ๐‘Ž๐‘ก + ๐‘ is known as a linear time trend
We can represent this model in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} a & c \end{bmatrix} \tag{4}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$
In fact, itโ€™s possible to use the state-space system to represent polynomial trends of any order
For instance, let

$$
x_0 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

It follows that

$$
A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}
$$

Then ๐‘ฅโ€ฒ๐‘ก = [๐‘ก(๐‘ก โˆ’ 1)/2 ๐‘ก 1], so that ๐‘ฅ๐‘ก contains linear and quadratic time trends

25.3.3 Moving Average Representations

A nonrecursive expression for ๐‘ฅ๐‘ก as a function of ๐‘ฅ0 , ๐‘ค1 , ๐‘ค2 , โ€ฆ , ๐‘ค๐‘ก can be found by using


Eq. (1) repeatedly to obtain

๐‘ฅ๐‘ก = ๐ด๐‘ฅ๐‘กโˆ’1 + ๐ถ๐‘ค๐‘ก
= ๐ด2 ๐‘ฅ๐‘กโˆ’2 + ๐ด๐ถ๐‘ค๐‘กโˆ’1 + ๐ถ๐‘ค๐‘ก
โ‹ฎ (5)
๐‘กโˆ’1
= โˆ‘ ๐ด๐‘— ๐ถ๐‘ค๐‘กโˆ’๐‘— + ๐ด๐‘ก ๐‘ฅ0
๐‘—=0

Representation Eq. (5) is a moving average representation


It expresses {๐‘ฅ๐‘ก } as a linear function of

1. current and past values of the process {๐‘ค๐‘ก } and


2. the initial condition ๐‘ฅ0

As an example of a moving average representation, let the model be

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$
Substituting into the moving average representation Eq. (5), we obtain

๐‘กโˆ’1
๐‘ฅ1๐‘ก = โˆ‘ ๐‘ค๐‘กโˆ’๐‘— + [1 ๐‘ก] ๐‘ฅ0
๐‘—=0

where ๐‘ฅ1๐‘ก is the first entry of ๐‘ฅ๐‘ก


The first term on the right is a cumulated sum of martingale differences and is therefore a
martingale
The second term is a translated linear function of time
For this reason, ๐‘ฅ1๐‘ก is called a martingale with drift
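As a sketch (the initial condition, horizon and seed are our own choices), we can verify the moving average representation Eq. (5) against direct iteration of Eq. (1) for this example

```python
import numpy as np
from numpy.linalg import matrix_power

# Sketch: verify the moving average representation (5) against direct
# recursion for A = [[1, 1], [0, 1]], C = (1, 0)'
np.random.seed(1)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([1.0, 0.0])
x0 = np.array([0.5, 1.0])           # an arbitrary initial condition (ours)
T = 20
w = np.random.randn(T + 1)          # w_1, ..., w_T live in w[1:]

# Direct recursion: x_t = A x_{t-1} + C w_t
x = x0.copy()
for t in range(1, T + 1):
    x = A @ x + C * w[t]

# Moving average form: x_T = Σ_{j=0}^{T-1} A^j C w_{T-j} + A^T x_0
x_ma = matrix_power(A, T) @ x0
for j in range(T):
    x_ma = x_ma + matrix_power(A, j) @ C * w[T - j]
```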

25.4 Distributions and Moments

25.4.1 Unconditional Moments

Using Eq. (1), itโ€™s easy to obtain expressions for the (unconditional) means of ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก
Weโ€™ll explain what unconditional and conditional mean soon

Letting ๐œ‡๐‘ก โˆถ= E[๐‘ฅ๐‘ก ] and using linearity of expectations, we find that

๐œ‡๐‘ก+1 = ๐ด๐œ‡๐‘ก with ๐œ‡0 given (6)

Here ๐œ‡0 is a primitive given in Eq. (1)


The variance-covariance matrix of ๐‘ฅ๐‘ก is ฮฃ๐‘ก โˆถ= E[(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )โ€ฒ ]
Using ๐‘ฅ๐‘ก+1 โˆ’ ๐œ‡๐‘ก+1 = ๐ด(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก ) + ๐ถ๐‘ค๐‘ก+1 , we can determine this matrix recursively via

ฮฃ๐‘ก+1 = ๐ดฮฃ๐‘ก ๐ดโ€ฒ + ๐ถ๐ถ โ€ฒ with ฮฃ0 given (7)

As with ๐œ‡0 , the matrix ฮฃ0 is a primitive given in Eq. (1)


As a matter of terminology, we will sometimes call

โ€ข ๐œ‡๐‘ก the unconditional mean of ๐‘ฅ๐‘ก


โ€ข ฮฃ๐‘ก the unconditional variance-covariance matrix of ๐‘ฅ๐‘ก

This is to distinguish ๐œ‡๐‘ก and ฮฃ๐‘ก from related objects that use conditioning information, to be
defined below
However, you should be aware that these โ€œunconditionalโ€ moments do depend on the initial
distribution ๐‘ (๐œ‡0 , ฮฃ0 )
Moments of the Observations
Using linearity of expectations again we have

E[๐‘ฆ๐‘ก ] = E[๐บ๐‘ฅ๐‘ก ] = ๐บ๐œ‡๐‘ก (8)

The variance-covariance matrix of ๐‘ฆ๐‘ก is easily shown to be

Var[๐‘ฆ๐‘ก ] = Var[๐บ๐‘ฅ๐‘ก ] = ๐บฮฃ๐‘ก ๐บโ€ฒ (9)

25.4.2 Distributions

In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given

1. our Gaussian assumptions on the primitives


2. the fact that normality is preserved under linear operations

In fact, itโ€™s well-known that

๐‘ข โˆผ ๐‘ (๐‘ข,ฬ„ ๐‘†) and ๐‘ฃ = ๐‘Ž + ๐ต๐‘ข โŸน ๐‘ฃ โˆผ ๐‘ (๐‘Ž + ๐ต๐‘ข,ฬ„ ๐ต๐‘†๐ตโ€ฒ ) (10)

In particular, given our Gaussian assumptions on the primitives and the linearity of Eq. (1)
we can see immediately that both ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก are Gaussian for all ๐‘ก โ‰ฅ 0 [2]
Since ๐‘ฅ๐‘ก is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix
But in fact weโ€™ve already done this, in Eq. (6) and Eq. (7)
Letting ๐œ‡๐‘ก and ฮฃ๐‘ก be as defined by these equations, we have

๐‘ฅ๐‘ก โˆผ ๐‘ (๐œ‡๐‘ก , ฮฃ๐‘ก ) (11)

By similar reasoning combined with Eq. (8) and Eq. (9),

๐‘ฆ๐‘ก โˆผ ๐‘ (๐บ๐œ‡๐‘ก , ๐บฮฃ๐‘ก ๐บโ€ฒ ) (12)

25.4.3 Ensemble Interpretations

How should we interpret the distributions defined by Eq. (11)โ€“Eq. (12)?


Intuitively, the probabilities in a distribution correspond to relative frequencies in a large
population drawn from that distribution
Letโ€™s apply this idea to our setting, focusing on the distribution of ๐‘ฆ๐‘‡ for fixed ๐‘‡
We can generate independent draws of ๐‘ฆ๐‘‡ by repeatedly simulating the evolution of the sys-
tem up to time ๐‘‡ , using an independent set of shocks each time
The next figure shows 20 simulations, producing 20 time series for {๐‘ฆ๐‘ก }, and hence 20 draws
of ๐‘ฆ๐‘‡
The system in question is the univariate autoregressive model Eq. (3)
The values of ๐‘ฆ๐‘‡ are represented by black dots in the left-hand figure

In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 ๐‘ฆ๐‘‡ โ€™s
(The parameters and source code for the figures can be found in file linear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations

Letโ€™s now try with 500,000 observations, showing only the histogram (without rotation)

The black line is the population density of ๐‘ฆ๐‘‡ calculated from Eq. (12)
The histogram and population distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distributionโ€™s sufficient statistics
Ensemble Means
In the preceding figure, we approximated the population distribution of ๐‘ฆ๐‘‡ by

1. generating ๐ผ sample paths (i.e., time series) where ๐ผ is a large number


2. recording each observation ๐‘ฆ๐‘‡๐‘–
3. histogramming this sample

Just as the histogram approximates the population distribution, the ensemble or cross-
sectional average

1 ๐ผ ๐‘–
๐‘ฆ๐‘‡ฬ„ โˆถ= โˆ‘๐‘ฆ
๐ผ ๐‘–=1 ๐‘‡

approximates the expectation E[๐‘ฆ๐‘‡ ] = ๐บ๐œ‡๐‘‡ (as implied by the law of large numbers)
Hereโ€™s a simulation comparing the ensemble averages and population means at time points
๐‘ก = 0, โ€ฆ , 50

The parameters are the same as for the preceding figures, and the sample size is relatively
small (๐ผ = 20)

The ensemble mean for ๐‘ฅ๐‘ก is

1 ๐ผ ๐‘–
๐‘ฅ๐‘‡ฬ„ โˆถ= โˆ‘ ๐‘ฅ โ†’ ๐œ‡๐‘‡ (๐ผ โ†’ โˆž)
๐ผ ๐‘–=1 ๐‘‡

The limit ๐œ‡๐‘‡ is a โ€œlong-run averageโ€


(By long-run average we mean the average for an infinite (๐ผ = โˆž) number of sample ๐‘ฅ๐‘‡ โ€™s)
Another application of the law of large numbers assures us that

1 ๐ผ
โˆ‘(๐‘ฅ๐‘– โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘–๐‘‡ โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ฮฃ๐‘‡ (๐ผ โ†’ โˆž)
๐ผ ๐‘–=1 ๐‘‡

25.4.4 Joint Distributions

In the preceding discussion, we looked at the distributions of ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก in isolation


This gives us useful information but doesnโ€™t allow us to answer questions like

โ€ข whatโ€™s the probability that ๐‘ฅ๐‘ก โ‰ฅ 0 for all ๐‘ก?


โ€ข whatโ€™s the probability that the process {๐‘ฆ๐‘ก } exceeds some value ๐‘Ž before falling below
๐‘?
โ€ข etc., etc.

Such questions concern the joint distributions of these sequences


To compute the joint distribution of ๐‘ฅ0 , ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘‡ , recall that joint and conditional densities
are linked by the rule

๐‘(๐‘ฅ, ๐‘ฆ) = ๐‘(๐‘ฆ | ๐‘ฅ)๐‘(๐‘ฅ) (joint = conditional ร— marginal)



From this rule we get ๐‘(๐‘ฅ0 , ๐‘ฅ1 ) = ๐‘(๐‘ฅ1 | ๐‘ฅ0 )๐‘(๐‘ฅ0 )


The Markov property ๐‘(๐‘ฅ๐‘ก | ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0 ) = ๐‘(๐‘ฅ๐‘ก | ๐‘ฅ๐‘กโˆ’1 ) and repeated applications of the preced-
ing rule lead us to

๐‘‡ โˆ’1
๐‘(๐‘ฅ0 , ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘‡ ) = ๐‘(๐‘ฅ0 ) โˆ ๐‘(๐‘ฅ๐‘ก+1 | ๐‘ฅ๐‘ก )
๐‘ก=0

The marginal ๐‘(๐‘ฅ0 ) is just the primitive ๐‘ (๐œ‡0 , ฮฃ0 )


In view of Eq. (1), the conditional densities are

๐‘(๐‘ฅ๐‘ก+1 | ๐‘ฅ๐‘ก ) = ๐‘ (๐ด๐‘ฅ๐‘ก , ๐ถ๐ถ โ€ฒ )

Autocovariance Functions
An important object related to the joint distribution is the autocovariance function

ฮฃ๐‘ก+๐‘—,๐‘ก โˆถ= E[(๐‘ฅ๐‘ก+๐‘— โˆ’ ๐œ‡๐‘ก+๐‘— )(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )โ€ฒ ] (13)

Elementary calculations show that

ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃ๐‘ก (14)

Notice that ฮฃ๐‘ก+๐‘—,๐‘ก in general depends on both ๐‘—, the gap between the two dates, and ๐‘ก, the
earlier date

25.5 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models
Letโ€™s start with the intuition

25.5.1 Visualizing Stability

Letโ€™s look at some more time series from the same model that we analyzed above
This picture shows cross-sectional distributions for ๐‘ฆ at times ๐‘‡ , ๐‘‡ โ€ฒ , ๐‘‡ โ€ณ

Note how the time series โ€œsettle downโ€ in the sense that the distributions at ๐‘‡ โ€ฒ and ๐‘‡ โ€ณ are
relatively similar to each other โ€” but unlike the distribution at ๐‘‡
Apparently, the distributions of ๐‘ฆ๐‘ก converge to a fixed long-run distribution as ๐‘ก โ†’ โˆž
When such a distribution exists it is called a stationary distribution

25.5.2 Stationary Distributions

In our setting, a distribution ๐œ“โˆž is said to be stationary for ๐‘ฅ๐‘ก if

๐‘ฅ๐‘ก โˆผ ๐œ“โˆž and ๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1 โŸน ๐‘ฅ๐‘ก+1 โˆผ ๐œ“โˆž

Since

1. in the present case, all distributions are Gaussian


2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix

we can restate the definition as follows: ๐œ“โˆž is stationary for ๐‘ฅ๐‘ก if

๐œ“โˆž = ๐‘ (๐œ‡โˆž , ฮฃโˆž )

where ๐œ‡โˆž and ฮฃโˆž are fixed points of Eq. (6) and Eq. (7) respectively

25.5.3 Covariance Stationary Processes

Letโ€™s see what happens to the preceding figure if we start ๐‘ฅ0 at the stationary distribution

Now the differences in the observed distributions at ๐‘‡ , ๐‘‡ โ€ฒ and ๐‘‡ โ€ณ come entirely from random
fluctuations due to the finite sample size
By

โ€ข our choosing ๐‘ฅ0 โˆผ ๐‘ (๐œ‡โˆž , ฮฃโˆž )


โ€ข the definitions of ๐œ‡โˆž and ฮฃโˆž as fixed points of Eq. (6) and Eq. (7) respectively

weโ€™ve ensured that

๐œ‡๐‘ก = ๐œ‡โˆž and ฮฃ๐‘ก = ฮฃโˆž for all ๐‘ก

Moreover, in view of Eq. (14), the autocovariance function takes the form ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃโˆž ,
which depends on ๐‘— but not on ๐‘ก
This motivates the following definition
A process {๐‘ฅ๐‘ก } is said to be covariance stationary if

โ€ข both ๐œ‡๐‘ก and ฮฃ๐‘ก are constant in ๐‘ก


โ€ข ฮฃ๐‘ก+๐‘—,๐‘ก depends on the time gap ๐‘— but not on time ๐‘ก

In our setting, {๐‘ฅ๐‘ก } will be covariance stationary if ๐œ‡0 , ฮฃ0 , ๐ด, ๐ถ assume values that imply that
none of ๐œ‡๐‘ก , ฮฃ๐‘ก , ฮฃ๐‘ก+๐‘—,๐‘ก depends on ๐‘ก

25.5.4 Conditions for Stationarity

The Globally Stable Case


The difference equation ๐œ‡๐‘ก+1 = ๐ด๐œ‡๐‘ก is known to have unique fixed point ๐œ‡โˆž = 0 if all eigen-
values of ๐ด have moduli strictly less than unity
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True

The difference equation Eq. (7) also has a unique fixed point in this case, and, moreover

๐œ‡๐‘ก โ†’ ๐œ‡โˆž = 0 and ฮฃ๐‘ก โ†’ ฮฃโˆž as ๐‘กโ†’โˆž

regardless of the initial conditions ๐œ‡0 and ฮฃ0


This is the globally stable case — see these notes for a more theoretical treatment
However, global stability is more than we need for stationary solutions, and often more than
we want
To illustrate, consider our second order difference equation example
โ€ฒ
Here the state is ๐‘ฅ๐‘ก = [1 ๐‘ฆ๐‘ก ๐‘ฆ๐‘กโˆ’1 ]
Because of the constant first component in the state vector, we will never have ๐œ‡๐‘ก โ†’ 0
How can we find stationary solutions that respect a constant state component?
Processes with a Constant State Component
To investigate such a process, suppose that ๐ด and ๐ถ take the form

๐ด1 ๐‘Ž ๐ถ1
๐ด=[ ] ๐ถ=[ ]
0 1 0

where

โ€ข ๐ด1 is an (๐‘› โˆ’ 1) ร— (๐‘› โˆ’ 1) matrix
โ€ข ๐‘Ž is an (๐‘› โˆ’ 1) ร— 1 column vector

โ€ฒ
Let ๐‘ฅ๐‘ก = [๐‘ฅโ€ฒ1๐‘ก 1] where ๐‘ฅ1๐‘ก is (๐‘› โˆ’ 1) ร— 1
It follows that

๐‘ฅ1,๐‘ก+1 = ๐ด1 ๐‘ฅ1๐‘ก + ๐‘Ž + ๐ถ1 ๐‘ค๐‘ก+1

Let ๐œ‡1๐‘ก = E[๐‘ฅ1๐‘ก ] and take expectations on both sides of this expression to get

๐œ‡1,๐‘ก+1 = ๐ด1 ๐œ‡1,๐‘ก + ๐‘Ž (15)

Assume now that the moduli of the eigenvalues of ๐ด1 are all strictly less than one
Then Eq. (15) has a unique stationary solution, namely,

๐œ‡1โˆž = (๐ผ โˆ’ ๐ด1 )โˆ’1 ๐‘Ž

โ€ฒ
The stationary value of ๐œ‡๐‘ก itself is then ๐œ‡โˆž โˆถ= [๐œ‡โ€ฒ1โˆž 1]
The stationary values of ฮฃ๐‘ก and ฮฃ๐‘ก+๐‘—,๐‘ก satisfy

ฮฃโˆž = ๐ดฮฃโˆž ๐ดโ€ฒ + ๐ถ๐ถ โ€ฒ
(16)
ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃโˆž

Notice that here ฮฃ๐‘ก+๐‘—,๐‘ก depends on the time gap ๐‘— but not on calendar time ๐‘ก
In conclusion, if

โ€ข ๐‘ฅ0 โˆผ ๐‘ (๐œ‡โˆž , ฮฃโˆž ) and
โ€ข the moduli of the eigenvalues of ๐ด1 are all strictly less than unity

then the {๐‘ฅ๐‘ก } process is covariance stationary, with constant state component

Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on Eq. (7) converge to the fixed point of the discrete Lyapunov equation in the first line of Eq. (16)
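In practice Σ∞ can be computed directly; here is a sketch (the matrices A and C are our own stable example) using scipy.linalg.solve_discrete_lyapunov, cross-checked against iteration on Eq. (7)

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Sketch: Σ_∞ is the fixed point of Σ = A Σ A' + C C', which SciPy's
# discrete Lyapunov solver computes directly
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])          # eigenvalues inside the unit circle (our example)
C = np.array([[0.2],
              [0.1]])

Σ_inf = solve_discrete_lyapunov(A, C @ C.T)

Σ = 10 * np.eye(2)                  # start iteration far from the fixed point
for t in range(500):
    Σ = A @ Σ @ A.T + C @ C.T
```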

25.5.5 Ergodicity

Letโ€™s suppose that weโ€™re working with a covariance stationary process


In this case, we know that the ensemble mean will converge to ๐œ‡โˆž as the sample size ๐ผ ap-
proaches infinity
Averages over Time
Ensemble averages across simulations are interesting theoretically, but in real life, we usually
observe only a single realization {๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก }๐‘‡๐‘ก=0
So now letโ€™s take a single realization and form the time-series averages

1 ๐‘‡ 1 ๐‘‡
๐‘ฅฬ„ โˆถ= โˆ‘๐‘ฅ and ๐‘ฆ ฬ„ โˆถ= โˆ‘๐‘ฆ
๐‘‡ ๐‘ก=1 ๐‘ก ๐‘‡ ๐‘ก=1 ๐‘ก

Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution
In particular,

1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 ๐‘ฅ๐‘ก โ†’ ๐œ‡โˆž
1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 (๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ฮฃโˆž
1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 (๐‘ฅ๐‘ก+๐‘— โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ๐ด๐‘— ฮฃโˆž

In our linear Gaussian setting, any covariance stationary process is also ergodic
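As an illustration, we can check the first two of these limits for a scalar Gaussian AR(1) โ€” a covariance stationary process when |ฯ| < 1 (the parameter values here are illustrative):

```python
import numpy as np

ฯ, ฯƒ = 0.5, 1.0
ฮผ_inf = 0.0                     # stationary mean of x_{t+1} = ฯ x_t + ฯƒ w_{t+1}
ฮฃ_inf = ฯƒ**2 / (1 - ฯ**2)       # stationary variance

np.random.seed(42)
T = 200_000
x = np.empty(T)
x[0] = 0.0
w = np.random.randn(T)
for t in range(T - 1):
    x[t+1] = ฯ * x[t] + ฯƒ * w[t+1]

# Time-series averages from a single realization
x_bar = x.mean()
v_bar = ((x - x_bar)**2).mean()

assert abs(x_bar - ฮผ_inf) < 0.05   # time average close to the stationary mean
assert abs(v_bar - ฮฃ_inf) < 0.05   # time variance close to the stationary variance
```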

25.6 Noisy Observations

In some settings, the observation equation ๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce

โ€ข An IID sequence of โ„“ ร— 1 random vectors ๐‘ฃ๐‘ก โˆผ ๐‘ (0, ๐ผ)


โ€ข A ๐‘˜ ร— โ„“ matrix ๐ป

and extend the linear state-space system to

๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1


๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก (17)
๐‘ฅ0 โˆผ ๐‘ (๐œ‡0 , ฮฃ0 )

The sequence {๐‘ฃ๐‘ก } is assumed to be independent of {๐‘ค๐‘ก }


The process {๐‘ฅ๐‘ก } is not modified by noise in the observation equation and its moments, distri-
butions and stability properties remain the same
The unconditional moments of ๐‘ฆ๐‘ก from Eq. (8) and Eq. (9) now become

E[๐‘ฆ๐‘ก ] = E[๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก ] = ๐บ๐œ‡๐‘ก (18)

The variance-covariance matrix of ๐‘ฆ๐‘ก is easily shown to be

Var[๐‘ฆ๐‘ก ] = Var[๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก ] = ๐บฮฃ๐‘ก ๐บโ€ฒ + ๐ป๐ป โ€ฒ (19)

The distribution of ๐‘ฆ๐‘ก is therefore

๐‘ฆ๐‘ก โˆผ ๐‘ (๐บ๐œ‡๐‘ก , ๐บฮฃ๐‘ก ๐บโ€ฒ + ๐ป๐ป โ€ฒ )

25.7 Prediction

The theory of prediction for linear state space systems is elegant and simple

25.7.1 Forecasting Formulas โ€“ Conditional Means

The natural way to predict variables is to use conditional distributions


For example, the optimal forecast of ๐‘ฅ๐‘ก+1 given information known at time ๐‘ก is

E๐‘ก [๐‘ฅ๐‘ก+1 ] โˆถ= E[๐‘ฅ๐‘ก+1 โˆฃ ๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0 ] = ๐ด๐‘ฅ๐‘ก

The right-hand side follows from ๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1 and the fact that ๐‘ค๐‘ก+1 is zero mean and
independent of ๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0
That E๐‘ก [๐‘ฅ๐‘ก+1 ] = E[๐‘ฅ๐‘ก+1 โˆฃ ๐‘ฅ๐‘ก ] is an implication of {๐‘ฅ๐‘ก } having the Markov property

The one-step-ahead forecast error is

๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ] = ๐ถ๐‘ค๐‘ก+1

The covariance matrix of the forecast error is

E[(๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ])(๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ])โ€ฒ ] = ๐ถ๐ถ โ€ฒ

More generally, weโ€™d like to compute the ๐‘—-step ahead forecasts E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] and E๐‘ก [๐‘ฆ๐‘ก+๐‘— ]
With a bit of algebra, we obtain

๐‘ฅ๐‘ก+๐‘— = ๐ด๐‘— ๐‘ฅ๐‘ก + ๐ด๐‘—โˆ’1 ๐ถ๐‘ค๐‘ก+1 + ๐ด๐‘—โˆ’2 ๐ถ๐‘ค๐‘ก+2 + โ‹ฏ + ๐ด0 ๐ถ๐‘ค๐‘ก+๐‘—

In view of the IID property, current and past state values provide no information about fu-
ture values of the shock
Hence E๐‘ก [๐‘ค๐‘ก+๐‘˜ ] = E[๐‘ค๐‘ก+๐‘˜ ] = 0
It now follows from linearity of expectations that the ๐‘—-step ahead forecast of ๐‘ฅ is

E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] = ๐ด๐‘— ๐‘ฅ๐‘ก

The ๐‘—-step ahead forecast of ๐‘ฆ is therefore

E๐‘ก [๐‘ฆ๐‘ก+๐‘— ] = E๐‘ก [๐บ๐‘ฅ๐‘ก+๐‘— + ๐ป๐‘ฃ๐‘ก+๐‘— ] = ๐บ๐ด๐‘— ๐‘ฅ๐‘ก

25.7.2 Covariance of Prediction Errors

It is useful to obtain the covariance matrix of the vector of ๐‘—-step-ahead prediction errors

๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] = โˆ‘_{๐‘ =0}^{๐‘—โˆ’1} ๐ด๐‘  ๐ถ๐‘ค๐‘กโˆ’๐‘ +๐‘—        (20)

Evidently,

๐‘‰๐‘— โˆถ= E๐‘ก [(๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ])(๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ])โ€ฒ ] = โˆ‘_{๐‘˜=0}^{๐‘—โˆ’1} ๐ด๐‘˜ ๐ถ๐ถ โ€ฒ (๐ด๐‘˜ )โ€ฒ        (21)

๐‘‰๐‘— defined in Eq. (21) can be calculated recursively via ๐‘‰1 = ๐ถ๐ถ โ€ฒ and

๐‘‰๐‘— = ๐ถ๐ถ โ€ฒ + ๐ด๐‘‰๐‘—โˆ’1 ๐ดโ€ฒ , ๐‘—โ‰ฅ2 (22)

๐‘‰๐‘— is the conditional covariance matrix of the errors in forecasting ๐‘ฅ๐‘ก+๐‘— , conditioned on time ๐‘ก
information ๐‘ฅ๐‘ก
Under particular conditions, ๐‘‰๐‘— converges to

๐‘‰โˆž = ๐ถ๐ถ โ€ฒ + ๐ด๐‘‰โˆž ๐ดโ€ฒ (23)

Equation Eq. (23) is an example of a discrete Lyapunov equation in the covariance matrix ๐‘‰โˆž
A sufficient condition for ๐‘‰๐‘— to converge is that the eigenvalues of ๐ด be strictly less than one
in modulus
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of ๐ถ that equal 0
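We can confirm that the recursion in Eq. (22) reproduces the sum in Eq. (21) with a short sketch (the matrices ๐ด and ๐ถ are illustrative):

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[0.3],
              [0.2]])

def V_recursive(j):
    # V_1 = C C',  V_j = C C' + A V_{j-1} A'
    V = C @ C.T
    for _ in range(j - 1):
        V = C @ C.T + A @ V @ A.T
    return V

def V_sum(j):
    # V_j = sum_{k=0}^{j-1} A^k C C' (A^k)'
    return sum(matrix_power(A, k) @ C @ C.T @ matrix_power(A, k).T
               for k in range(j))

for j in (1, 2, 5, 10):
    assert np.allclose(V_recursive(j), V_sum(j))
```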

25.7.3 Forecasts of Geometric Sums

In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system Eq. (1)
We want the following objects

โ€ข Forecast of a geometric sum of future ๐‘ฅโ€™s, or E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฅ๐‘ก+๐‘— ]
โ€ข Forecast of a geometric sum of future ๐‘ฆโ€™s, or E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— ]

These objects are important components of some famous and interesting dynamic models
For example,

โ€ข if {๐‘ฆ๐‘ก } is a stream of dividends, then E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— |๐‘ฅ๐‘ก ] is a model of a stock price
โ€ข if {๐‘ฆ๐‘ก } is the money supply, then E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— |๐‘ฅ๐‘ก ] is a model of the price level

Formulas
Fortunately, it is easy to use a little matrix algebra to compute these objects
Suppose that every eigenvalue of ๐ด has modulus strictly less than 1/๐›ฝ
It then follows that ๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ = [๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1
This leads to our formulas:

โ€ข Forecast of a geometric sum of future ๐‘ฅโ€™s

E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฅ๐‘ก+๐‘— ] = [๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ ]๐‘ฅ๐‘ก = [๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1 ๐‘ฅ๐‘ก

โ€ข Forecast of a geometric sum of future ๐‘ฆโ€™s

E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— ] = ๐บ[๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ ]๐‘ฅ๐‘ก = ๐บ[๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1 ๐‘ฅ๐‘ก

25.8 Code

Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package

The code implements a class for handling linear state space models (simulations, calculating
moments, etc.)
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence()
Go back and read the relevant documentation if youโ€™ve forgotten how generator functions
work
Examples of usage are given in the solutions to the exercises

25.9 Exercises

25.9.1 Exercise 1

Replicate this figure using the LinearStateSpace class from lss.py

25.9.2 Exercise 2

Replicate this figure modulo randomness using the same class

25.9.3 Exercise 3

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise

25.9.4 Exercise 4

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise, except that
the initial condition is the stationary distribution
Hint: You can use the stationary_distributions method to get the initial conditions
The number of sample paths is 80, and the time horizon in the figure is 100
Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates
10, 50 and 75

25.10 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace

25.10.1 Exercise 1
In [3]: ฯ•_0, ฯ•_1, ฯ•_2 = 1.1, 0.8, -0.8

A = [[1, 0, 0 ],
[ฯ•_0, ฯ•_1, ฯ•_2],
[0, 1, 0 ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots(figsize=(10, 6))


y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.2 Exercise 2
In [4]: ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.2

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)

fig, ax = plt.subplots(figsize=(10, 6))


y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.3 Exercise 3
In [5]: from scipy.stats import norm
import random

ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.1

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

I = 20
T = 50
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots(figsize=(8, 5))

ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)

ensemble_mean = np.zeros(T)
for i in range(I):
x, y = ar.simulate(ts_length=T)
y = y.flatten()
ax.plot(y, 'c-', lw=0.8, alpha=0.5)
ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')

m = ar.moment_sequence()
population_means = []

for t in range(T):
ฮผ_x, ฮผ_y, ฮฃ_x, ฮฃ_y = next(m)
population_means.append(float(ฮผ_y))
ax.plot(population_means, color='g', lw=2, alpha=0.8, label='$G\\mu_t$')
ax.legend(ncol=2)
plt.show()

25.10.4 Exercise 4
In [6]: ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.1

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

T0 = 10
T1 = 50
T2 = 75
T4 = 100

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))


ymin, ymax = -0.6, 0.6

fig, ax = plt.subplots(figsize=(8, 5))

ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)

ax.set_xticks((T0, T1, T2))


ax.set_xticklabels(("$T$", "$T'$", "$T''$"), fontsize=14)

ฮผ_x, ฮผ_y, ฮฃ_x, ฮฃ_y = ar.stationary_distributions()


ar.mu_0 = ฮผ_x
ar.Sigma_0 = ฮฃ_x

for i in range(80):
rcolor = random.choice(('c', 'g', 'b'))
x, y = ar.simulate(ts_length=T4)
y = y.flatten()
ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()

Footnotes
[1] The eigenvalues of ๐ด are (1, โˆ’1, ๐‘–, โˆ’๐‘–).
[2] The correct way to argue this is by induction. Suppose that ๐‘ฅ๐‘ก is Gaussian. Then Eq. (1)
and Eq. (10) imply that ๐‘ฅ๐‘ก+1 is Gaussian. Since ๐‘ฅ0 is assumed to be Gaussian, it follows that
every ๐‘ฅ๐‘ก is Gaussian. Evidently, this implies that each ๐‘ฆ๐‘ก is Gaussian.
26

Finite Markov Chains

26.1 Contents

โ€ข Overview 26.2

โ€ข Definitions 26.3

โ€ข Simulation 26.4

โ€ข Marginal Distributions 26.5

โ€ข Irreducibility and Aperiodicity 26.6

โ€ข Stationary Distributions 26.7

โ€ข Ergodicity 26.8

โ€ข Computing Expectations 26.9

โ€ข Exercises 26.10

โ€ข Solutions 26.11

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

26.2 Overview

Markov chains are one of the most useful classes of stochastic processes, being

โ€ข simple, flexible and supported by many elegant theoretical results


โ€ข valuable for building intuition about random dynamic models
โ€ข central to quantitative modeling in their own right

You will find them in many of the workhorse models of economics and finance
In this lecture, we review some of the theory of Markov chains


We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py
Prerequisite knowledge is basic probability and linear algebra

26.3 Definitions

The following concepts are fundamental

26.3.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an ๐‘› ร— ๐‘› square matrix ๐‘ƒ such that

1. each element of ๐‘ƒ is nonnegative, and


2. each row of ๐‘ƒ sums to one

Each row of ๐‘ƒ can be regarded as a probability mass function over ๐‘› possible outcomes
It is not too difficult to check [1] that if ๐‘ƒ is a stochastic matrix, then so is the ๐‘˜-th power ๐‘ƒ ๐‘˜
for all ๐‘˜ โˆˆ N

26.3.2 Markov Chains

There is a close connection between stochastic matrices and Markov chains


To begin, let ๐‘† be a finite set with ๐‘› elements {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› }
The set ๐‘† is called the state space and ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› are the state values
A Markov chain {๐‘‹๐‘ก } on ๐‘† is a sequence of random variables on ๐‘† that have the Markov
property
This means that, for any date ๐‘ก and any state ๐‘ฆ โˆˆ ๐‘†,

P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก } = P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก , ๐‘‹๐‘กโˆ’1 , โ€ฆ} (1)

In other words, knowing the current state is enough to know probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

๐‘ƒ (๐‘ฅ, ๐‘ฆ) โˆถ= P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก = ๐‘ฅ} (๐‘ฅ, ๐‘ฆ โˆˆ ๐‘†) (2)

By construction,

โ€ข ๐‘ƒ (๐‘ฅ, ๐‘ฆ) is the probability of going from ๐‘ฅ to ๐‘ฆ in one unit of time (one step)
โ€ข ๐‘ƒ (๐‘ฅ, โ‹…) is the conditional distribution of ๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘ฅ

We can view ๐‘ƒ as a stochastic matrix where

๐‘ƒ๐‘–๐‘— = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) 1 โ‰ค ๐‘–, ๐‘— โ‰ค ๐‘›

Going the other way, if we take a stochastic matrix ๐‘ƒ , we can generate a Markov chain {๐‘‹๐‘ก }
as follows:

โ€ข draw ๐‘‹0 from some specified distribution


โ€ข for each ๐‘ก = 0, 1, โ€ฆ, draw ๐‘‹๐‘ก+1 from ๐‘ƒ (๐‘‹๐‘ก , โ‹…)

By construction, the resulting process satisfies Eq. (2)

26.3.3 Example 1

Consider a worker who, at any given time ๐‘ก, is either unemployed (state 0) or employed (state
1)
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability ๐›ผ โˆˆ (0, 1)


2. An employed worker loses her job and becomes unemployed with probability ๐›ฝ โˆˆ (0, 1)

In terms of a Markov model, we have

โ€ข ๐‘† = {0, 1}
โ€ข ๐‘ƒ (0, 1) = ๐›ผ and ๐‘ƒ (1, 0) = ๐›ฝ

We can write out the transition probabilities in matrix form as

1โˆ’๐›ผ ๐›ผ
๐‘ƒ =( )
๐›ฝ 1โˆ’๐›ฝ

Once we have the values ๐›ผ and ๐›ฝ, we can address a range of questions, such as

โ€ข What is the average duration of unemployment?


โ€ข Over the long-run, what fraction of time does a worker find herself unemployed?
โ€ข Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?

Weโ€™ll cover such applications below

26.3.4 Example 2

Using US unemployment data, Hamilton [51] estimated the stochastic matrix

    โŽ› 0.971 0.029 0     โŽž
๐‘ƒ = โŽœ 0.145 0.778 0.077 โŽŸ
    โŽ 0     0.508 0.492 โŽ 

where

โ€ข the frequency is monthly



โ€ข the first state represents โ€œnormal growthโ€


โ€ข the second state represents โ€œmild recessionโ€
โ€ข the third state represents โ€œsevere recessionโ€

For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process {๐‘‹๐‘ก }
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities

Here โ€œngโ€ is normal growth, โ€œmrโ€ is mild recession, etc.

26.4 Simulation

One natural way to answer questions about Markov chains is to simulate them
(To approximate the probability of event ๐ธ, we can simulate many times and count the frac-
tion of times that ๐ธ occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.py

โ€ข Efficient, bundled with lots of other useful routines for handling Markov chains

However, itโ€™s also a good exercise to roll our own routines โ€” letโ€™s do that first and then come
back to the methods in QuantEcon.py
In these exercises, weโ€™ll take the state space to be ๐‘† = 0, โ€ฆ , ๐‘› โˆ’ 1

26.4.1 Rolling Our Own

To simulate a Markov chain, we need its stochastic matrix ๐‘ƒ and either an initial state or a
probability distribution ๐œ“ for initial state to be drawn from
The Markov chain is then constructed as discussed above. To repeat:

1. At time ๐‘ก = 0, ๐‘‹0 is set to some fixed state or chosen from ๐œ“


2. At each subsequent time ๐‘ก, the new state ๐‘‹๐‘ก+1 is drawn from ๐‘ƒ (๐‘‹๐‘ก , โ‹…)

In order to implement this simulation procedure, we need a method for generating draws from
a discrete distribution
For this task, weโ€™ll use DiscreteRV from QuantEcon

In [2]: import quantecon as qe


import numpy as np

ฯˆ = (0.1, 0.9) # Probabilities over sample space {0, 1}


cdf = np.cumsum(ฯˆ)
qe.random.draw(cdf, 5) # Generate 5 independent draws from ฯˆ

Out[2]: array([1, 1, 1, 1, 1])

Weโ€™ll write our code as a function that takes the following three arguments

โ€ข A stochastic matrix P
โ€ข An initial state init
โ€ข A positive integer sample_size representing the length of the time series the function
should return

In [3]: def mc_sample_path(P, init=0, sample_size=1000):


# === make sure P is a NumPy array === #
P = np.asarray(P)
# === allocate memory === #
X = np.empty(sample_size, dtype=int)
X[0] = init
# === convert each row of P into a distribution === #
# In particular, P_dist[i] = the distribution corresponding to P[i, :]
n = len(P)
P_dist = [np.cumsum(P[i, :]) for i in range(n)]

# === generate the sample path === #


for t in range(sample_size - 1):
X[t+1] = qe.random.draw(P_dist[X[t]])

return X

Letโ€™s see how it works using the small matrix

     โŽ› 0.4 0.6 โŽž
๐‘ƒ โˆถ= โŽ 0.2 0.8 โŽ         (3)

As weโ€™ll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25
If you run the following code you should get roughly that answer

In [4]: P = [[0.4, 0.6], [0.2, 0.8]]


X = mc_sample_path(P, sample_size=100000)
np.mean(X == 0)

Out[4]: 0.25109

26.4.2 Using QuantEconโ€™s Routines

As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion
Hereโ€™s an illustration using the same P as the preceding example

In [5]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1000000)
np.mean(X == 0)

Out[5]: 0.249741

In fact the QuantEcon.py routine is JIT compiled and much faster


(Because itโ€™s JIT compiled the first run takes a bit longer โ€” the function has to be compiled
and stored in memory)

In [6]: %timeit mc_sample_path(P, sample_size=1000000) # our version

678 ms ยฑ 9.12 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)

In [7]: %timeit mc.simulate(ts_length=1000000) # qe version

30.2 ms ยฑ 396 ยตs per loop (mean ยฑ std. dev. of 7 runs, 10 loops each)

Adding State Values and Initial Conditions


If we wish to, we can provide a specification of state values to MarkovChain
These state values can be integers, floats, or even strings
The following code illustrates

In [8]: mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))


mc.simulate(ts_length=4, init='employed')

Out[8]: array(['employed', 'employed', 'employed', 'employed'], dtype='<U10')

In [9]: mc.simulate(ts_length=4, init='unemployed')

Out[9]: array(['unemployed', 'employed', 'employed', 'employed'], dtype='<U10')

In [10]: mc.simulate(ts_length=4) # Start at randomly chosen initial state

Out[10]: array(['unemployed', 'employed', 'unemployed', 'employed'], dtype='<U10')

If we want to simulate with output as indices rather than state values we can use

In [11]: mc.simulate_indices(ts_length=4)

Out[11]: array([1, 1, 1, 1])

26.5 Marginal Distributions

Suppose that

1. {๐‘‹๐‘ก } is a Markov chain with stochastic matrix ๐‘ƒ


2. the distribution of ๐‘‹๐‘ก is known to be ๐œ“๐‘ก

What then is the distribution of ๐‘‹๐‘ก+1 , or, more generally, of ๐‘‹๐‘ก+๐‘š ?



26.5.1 Solution

Let ๐œ“๐‘ก be the distribution of ๐‘‹๐‘ก for ๐‘ก = 0, 1, 2, โ€ฆ


Our first aim is to find ๐œ“๐‘ก+1 given ๐œ“๐‘ก and ๐‘ƒ
To begin, pick any ๐‘ฆ โˆˆ ๐‘†
Using the law of total probability, we can decompose the probability that ๐‘‹๐‘ก+1 = ๐‘ฆ as follows:

P{๐‘‹๐‘ก+1 = ๐‘ฆ} = โˆ‘ P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก = ๐‘ฅ} โ‹… P{๐‘‹๐‘ก = ๐‘ฅ}


๐‘ฅโˆˆ๐‘†

In words, to get the probability of being at ๐‘ฆ tomorrow, we account for all ways this can hap-
pen and sum their probabilities
Rewriting this statement in terms of marginal and conditional probabilities gives
๐œ“๐‘ก+1 (๐‘ฆ) = โˆ‘_{๐‘ฅโˆˆ๐‘†} ๐‘ƒ (๐‘ฅ, ๐‘ฆ)๐œ“๐‘ก (๐‘ฅ)

There are ๐‘› such equations, one for each ๐‘ฆ โˆˆ ๐‘†


If we think of ๐œ“๐‘ก+1 and ๐œ“๐‘ก as row vectors (as is traditional in this literature), these ๐‘› equa-
tions are summarized by the matrix expression

๐œ“๐‘ก+1 = ๐œ“๐‘ก ๐‘ƒ (4)

In other words, to move the distribution forward one unit of time, we postmultiply by ๐‘ƒ
By repeating this ๐‘š times we move forward ๐‘š steps into the future
Hence, iterating on Eq. (4), the expression ๐œ“๐‘ก+๐‘š = ๐œ“๐‘ก ๐‘ƒ ๐‘š is also valid โ€” here ๐‘ƒ ๐‘š is the ๐‘š-th
power of ๐‘ƒ
As a special case, we see that if ๐œ“0 is the initial distribution from which ๐‘‹0 is drawn, then
๐œ“0 ๐‘ƒ ๐‘š is the distribution of ๐‘‹๐‘š
This is very important, so letโ€™s repeat it

๐‘‹0 โˆผ ๐œ“ 0 โŸน ๐‘‹๐‘š โˆผ ๐œ“0 ๐‘ƒ ๐‘š (5)

and, more generally,

๐‘‹๐‘ก โˆผ ๐œ“๐‘ก โŸน ๐‘‹๐‘ก+๐‘š โˆผ ๐œ“๐‘ก ๐‘ƒ ๐‘š (6)

26.5.2 Multiple Step Transition Probabilities

We know that the probability of transitioning from ๐‘ฅ to ๐‘ฆ in one step is ๐‘ƒ (๐‘ฅ, ๐‘ฆ)


It turns out that the probability of transitioning from ๐‘ฅ to ๐‘ฆ in ๐‘š steps is ๐‘ƒ ๐‘š (๐‘ฅ, ๐‘ฆ), the
(๐‘ฅ, ๐‘ฆ)-th element of the ๐‘š-th power of ๐‘ƒ
To see why, consider again Eq. (6), but now with ๐œ“๐‘ก putting all probability on state ๐‘ฅ

โ€ข 1 in the ๐‘ฅ-th position and zero elsewhere

Inserting this into Eq. (6), we see that, conditional on ๐‘‹๐‘ก = ๐‘ฅ, the distribution of ๐‘‹๐‘ก+๐‘š is the
๐‘ฅ-th row of ๐‘ƒ ๐‘š
In particular

P{๐‘‹๐‘ก+๐‘š = ๐‘ฆ} = ๐‘ƒ ๐‘š (๐‘ฅ, ๐‘ฆ) = (๐‘ฅ, ๐‘ฆ)-th element of ๐‘ƒ ๐‘š

26.5.3 Example: Probability of Recession

Recall the stochastic matrix ๐‘ƒ for recession and growth considered above
Suppose that the current state is unknown โ€” perhaps statistics are available only at the end
of the current month
We estimate the probability that the economy is in state ๐‘ฅ to be ๐œ“(๐‘ฅ)
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product

        โŽ› 0 โŽž
๐œ“๐‘ƒ 6 โ‹… โŽœ 1 โŽŸ
        โŽ 1 โŽ 

26.5.4 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime expe-
rience is described by the specified dynamics, independent of one another
Let ๐œ“ be the current cross-sectional distribution over {0, 1}

โ€ข For example, ๐œ“(0) is the unemployment rate

The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment
The same distribution also describes the fractions of a particular workerโ€™s career spent being
employed and unemployed, respectively

26.6 Irreducibility and Aperiodicity

Irreducibility and aperiodicity are central concepts of modern Markov chain theory
Letโ€™s see what theyโ€™re about

26.6.1 Irreducibility

Let ๐‘ƒ be a fixed stochastic matrix


Two states ๐‘ฅ and ๐‘ฆ are said to communicate with each other if there exist positive integers
๐‘— and ๐‘˜ such that

๐‘ƒ ๐‘— (๐‘ฅ, ๐‘ฆ) > 0 and ๐‘ƒ ๐‘˜ (๐‘ฆ, ๐‘ฅ) > 0

In view of our discussion above, this means precisely that

โ€ข state ๐‘ฅ can be reached eventually from state ๐‘ฆ, and


โ€ข state ๐‘ฆ can be reached eventually from state ๐‘ฅ

The stochastic matrix ๐‘ƒ is called irreducible if all states communicate; that is, if ๐‘ฅ and ๐‘ฆ
communicate for all (๐‘ฅ, ๐‘ฆ) in ๐‘† ร— ๐‘†
For example, consider the following transition probabilities for wealth of a fictitious set of
households

We can translate this into a stochastic matrix, putting zeros where thereโ€™s no edge between
nodes

     โŽ› 0.9 0.1 0   โŽž
๐‘ƒ โˆถ= โŽœ 0.4 0.4 0.2 โŽŸ
     โŽ 0.1 0.1 0.8 โŽ 

Itโ€™s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually
We can also test this using QuantEcon.pyโ€™s MarkovChain class

In [12]: P = [[0.9, 0.1, 0.0],


[0.4, 0.4, 0.2],
[0.1, 0.1, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[12]: True

Hereโ€™s a more pessimistic scenario, where the poor are poor forever

This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Letโ€™s confirm this

In [13]: P = [[1.0, 0.0, 0.0],


[0.1, 0.8, 0.1],
[0.0, 0.2, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[13]: False

We can also determine the โ€œcommunication classesโ€

In [14]: mc.communication_classes

Out[14]: [array(['poor'], dtype='<U6'), array(['middle', 'rich'], dtype='<U6')]

It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes
For example, poverty is a life sentence in the second graph but not the first
Weโ€™ll come back to this a bit later

26.6.2 Aperiodicity

Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperi-
odic otherwise
Hereโ€™s a trivial example with three states

The chain cycles with period 3:



In [15]: P = [[0, 1, 0],


[0, 0, 1],
[1, 0, 0]]

mc = qe.MarkovChain(P)
mc.period

Out[15]: 3

More formally, the period of a state ๐‘ฅ is the greatest common divisor of the set of integers

๐ท(๐‘ฅ) โˆถ= {๐‘— โ‰ฅ 1 โˆถ ๐‘ƒ ๐‘— (๐‘ฅ, ๐‘ฅ) > 0}

In the last example, ๐ท(๐‘ฅ) = {3, 6, 9, โ€ฆ} for every state ๐‘ฅ, so the period is 3
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state ๐‘Ž has period 2

We can confirm that the stochastic matrix is periodic as follows

In [16]: P = [[0.0, 1.0, 0.0, 0.0],


[0.5, 0.0, 0.5, 0.0],
[0.0, 0.5, 0.0, 0.5],
[0.0, 0.0, 1.0, 0.0]]

mc = qe.MarkovChain(P)
mc.period

Out[16]: 2

In [17]: mc.is_aperiodic

Out[17]: False

26.7 Stationary Distributions

As seen in Eq. (4), we can shift probabilities forward one unit of time via postmultiplication
by ๐‘ƒ
Some distributions are invariant under this updating process โ€” for example,

In [18]: P = np.array([[.4, .6], [.2, .8]])


ฯˆ = (0.25, 0.75)
ฯˆ @ P

Out[18]: array([0.25, 0.75])

Such distributions are called stationary, or invariant


Formally, a distribution ๐œ“โˆ— on ๐‘† is called stationary for ๐‘ƒ if ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ

From this equality, we immediately get ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ ๐‘ก for all ๐‘ก


This tells us an important fact: If the distribution of ๐‘‹0 is a stationary distribution, then ๐‘‹๐‘ก
will have this same distribution for all ๐‘ก
Hence stationary distributions have a natural interpretation as stochastic steady states โ€”
weโ€™ll discuss this more in just a moment
Mathematically, a stationary distribution is a fixed point of ๐‘ƒ when ๐‘ƒ is thought of as the
map ๐œ“ โ†ฆ ๐œ“๐‘ƒ from (row) vectors to (row) vectors
Theorem. Every stochastic matrix ๐‘ƒ has at least one stationary distribution
(We are assuming here that the state space ๐‘† is finite; if not, more assumptions are required)
For proof of this result, you can apply Brouwerโ€™s fixed point theorem, or see EDTC, theorem
4.3.5
There may in fact be many stationary distributions corresponding to a given stochastic ma-
trix ๐‘ƒ

โ€ข For example, if ๐‘ƒ is the identity matrix, then all distributions are stationary

Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility
This gives some intuition for the following fundamental theorem
Theorem. If ๐‘ƒ is both aperiodic and irreducible, then

1. ๐‘ƒ has exactly one stationary distribution ๐œ“โˆ—


2. For any initial distribution ๐œ“0 , we have โ€–๐œ“0 ๐‘ƒ ๐‘ก โˆ’ ๐œ“โˆ— โ€– โ†’ 0 as ๐‘ก โ†’ โˆž

For a proof, see, for example, theorem 5.2 of [47]


(Note that part 1 of the theorem requires only irreducibility, whereas part 2 requires both
irreducibility and aperiodicity)
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly
ergodic
One easy sufficient condition for aperiodicity and irreducibility is that every element of ๐‘ƒ is
strictly positive

โ€ข Try to convince yourself of this

26.7.1 Example

Recall our model of employment/unemployment dynamics for a given worker discussed above
Assuming ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), the uniform ergodicity condition is satisfied
Let ๐œ“โˆ— = (๐‘, 1 โˆ’ ๐‘) be the stationary distribution, so that ๐‘ corresponds to unemployment
(state 0)

Using ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ and a bit of algebra yields

๐‘ = ๐›ฝ/(๐›ผ + ๐›ฝ)

This is, in some sense, a steady state probability of unemployment โ€” more on interpretation
below
Not surprisingly it tends to zero as ๐›ฝ โ†’ 0, and to one as ๐›ผ โ†’ 0
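We can confirm this formula directly โ€” a quick sketch with illustrative hazard rates ๐›ผ and ๐›ฝ:

```python
import numpy as np

ฮฑ, ฮฒ = 0.1, 0.05                 # illustrative job-finding and separation rates
P = np.array([[1 - ฮฑ, ฮฑ],
              [ฮฒ, 1 - ฮฒ]])

# Candidate stationary distribution (p, 1 - p) with p = ฮฒ / (ฮฑ + ฮฒ)
ฯˆ_star = np.array([ฮฒ / (ฮฑ + ฮฒ), ฮฑ / (ฮฑ + ฮฒ)])
assert np.allclose(ฯˆ_star, ฯˆ_star @ P)   # ฯˆ* = ฯˆ* P
```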

26.7.2 Calculating Stationary Distributions

As discussed above, a given Markov matrix ๐‘ƒ can have many stationary distributions
That is, there can be many row vectors ๐œ“ such that ๐œ“ = ๐œ“๐‘ƒ
In fact if ๐‘ƒ has two distinct stationary distributions ๐œ“1 , ๐œ“2 then it has infinitely many, since
in this case, as you can verify,

๐œ“3 โˆถ= ๐œ†๐œ“1 + (1 โˆ’ ๐œ†)๐œ“2

is a stationary distribution for ๐‘ƒ for any ๐œ† โˆˆ [0, 1]


If we restrict attention to the case where only one stationary distribution exists, one option
for finding it is to try to solve the linear system ๐œ“(๐ผ๐‘› โˆ’ ๐‘ƒ ) = 0 for ๐œ“, where ๐ผ๐‘› is the ๐‘› ร— ๐‘›
identity
But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution
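One standard way to impose that restriction, sketched below, is to transpose the system and replace one equation with the adding-up constraint โˆ‘๐‘ฅ ๐œ“(๐‘ฅ) = 1:

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
n = P.shape[0]

# ฯˆ (I - P) = 0 together with ฯˆ 1 = 1.  Transpose to a standard Ax = b
# system and replace the last row with the adding-up constraint.
A = (np.eye(n) - P).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

ฯˆ_star = np.linalg.solve(A, b)
assert np.allclose(ฯˆ_star, ฯˆ_star @ P)   # stationary for P
```

This works only when the stationary distribution is unique, which is the case assumed here.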
A suitable algorithm is implemented in QuantEcon.py โ€” the next code block illustrates

In [19]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions

Out[19]: array([[0.25, 0.75]])

The stationary distribution is unique

26.7.3 Convergence to Stationarity

Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
๐‘‹๐‘ก converges to the stationary distribution regardless of where we start off
This adds considerable weight to our interpretation of ๐œ“โˆ— as a stochastic steady state
The convergence in the theorem is illustrated in the next figure

In [20]: from mpl_toolkits.mplot3d import Axes3D


import matplotlib.pyplot as plt
%matplotlib inline

P = ((0.971, 0.029, 0.000),
     (0.145, 0.778, 0.077),
     (0.000, 0.508, 0.492))

P = np.array(P)

ฯˆ = (0.0, 0.2, 0.8) # Initial condition

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')

ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
       xticks=(0.25, 0.5, 0.75),
       yticks=(0.25, 0.5, 0.75),
       zticks=(0.25, 0.5, 0.75))

x_vals, y_vals, z_vals = [], [], []


for t in range(20):
    x_vals.append(ฯˆ[0])
    y_vals.append(ฯˆ[1])
    z_vals.append(ฯˆ[2])
    ฯˆ = ฯˆ @ P

ax.scatter(x_vals, y_vals, z_vals, c='r', s=60)


ax.view_init(30, 210)

mc = qe.MarkovChain(P)
ฯˆ_star = mc.stationary_distributions[0]
ax.scatter(ฯˆ_star[0], ฯˆ_star[1], ฯˆ_star[2], c='k', s=60)

plt.show()

Here

โ€ข ๐‘ƒ is the stochastic matrix for recession and growth considered above


โ€ข The highest red dot is an arbitrarily chosen initial probability distribution ๐œ“, represented as a vector in R3

โ€ข The other red dots are the distributions ๐œ“๐‘ƒ ๐‘ก for ๐‘ก = 1, 2, โ€ฆ


โ€ข The black dot is ๐œ“โˆ—

The code for the figure can be found here โ€” you might like to try experimenting with different initial conditions

26.8 Ergodicity

Under irreducibility, yet another important result obtains: For all ๐‘ฅ โˆˆ ๐‘†,

(1/๐‘š) โˆ‘_{๐‘ก=1}^{๐‘š} 1{๐‘‹๐‘ก = ๐‘ฅ} โ†’ ๐œ“โˆ— (๐‘ฅ) as ๐‘š โ†’ โˆž (7)

Here

โ€ข 1{๐‘‹๐‘ก = ๐‘ฅ} = 1 if ๐‘‹๐‘ก = ๐‘ฅ and zero otherwise


โ€ข convergence is with probability one
โ€ข the result does not depend on the distribution (or value) of ๐‘‹0

The result tells us that the fraction of time the chain spends at state ๐‘ฅ converges to ๐œ“โˆ— (๐‘ฅ) as
time goes to infinity
This gives us another way to interpret the stationary distribution โ€” provided that the convergence result in Eq. (7) is valid
The convergence in Eq. (7) is a special case of a law of large numbers result for Markov
chains โ€” see EDTC, section 4.3.4 for some additional information
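The following sketch (with made-up parameters) simulates a long path and compares the time average in Eq. (7) with the stationary probability of state 0:

```python
import numpy as np

# Simulate a long path of the worker's chain and compare the fraction of
# time spent unemployed (state 0) with psi*(0).  Parameters are illustrative.
alpha, beta = 0.1, 0.05
p = beta / (alpha + beta)        # psi*(0)

m = 100_000
rng = np.random.default_rng(1234)
u = rng.random(m)

x = 0
visits = 0
for t in range(m):
    visits += (x == 0)
    if x == 0:
        x = 1 if u[t] < alpha else 0   # Leave unemployment w.p. alpha
    else:
        x = 0 if u[t] < beta else 1    # Lose the job w.p. beta

print(visits / m, p)   # The two numbers should be close
```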

26.8.1 Example

Recall our cross-sectional interpretation of the employment/unemployment model discussed above
Assume that ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), so that irreducibility and aperiodicity both hold
We saw that the stationary distribution is (๐‘, 1 โˆ’ ๐‘), where

๐‘ = ๐›ฝ/(๐›ผ + ๐›ฝ)

In the cross-sectional interpretation, this is the fraction of people unemployed


In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed
Thus, in the long-run, cross-sectional averages for a population and time-series averages for a
given person coincide
This is one interpretation of the notion of ergodicity

26.9 Computing Expectations

We are interested in computing expectations of the form

E[โ„Ž(๐‘‹๐‘ก )] (8)

and conditional expectations such as

E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] (9)

where

โ€ข {๐‘‹๐‘ก } is a Markov chain generated by ๐‘› ร— ๐‘› stochastic matrix ๐‘ƒ


โ€ข โ„Ž is a given function, which, in expressions involving matrix algebra, weโ€™ll think of as
the column vector

โ„Ž(๐‘ฅ1 )
โ„Ž=โŽ›
โŽœ โ‹ฎ โŽž
โŽŸ
โŽ โ„Ž(๐‘ฅ๐‘› ) โŽ 

The unconditional expectation Eq. (8) is easy: We just sum over the distribution of ๐‘‹๐‘ก to get

E[โ„Ž(๐‘‹๐‘ก )] = โˆ‘(๐œ“๐‘ƒ ๐‘ก )(๐‘ฅ)โ„Ž(๐‘ฅ)


๐‘ฅโˆˆ๐‘†

Here ๐œ“ is the distribution of ๐‘‹0


Since ๐œ“ and hence ๐œ“๐‘ƒ ๐‘ก are row vectors, we can also write this as

E[โ„Ž(๐‘‹๐‘ก )] = ๐œ“๐‘ƒ ๐‘ก โ„Ž

For the conditional expectation Eq. (9), we need to sum over the conditional distribution of
๐‘‹๐‘ก+๐‘˜ given ๐‘‹๐‘ก = ๐‘ฅ
We already know that this is ๐‘ƒ ๐‘˜ (๐‘ฅ, โ‹…), so

E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] = (๐‘ƒ ๐‘˜ โ„Ž)(๐‘ฅ) (10)

The vector ๐‘ƒ ๐‘˜ โ„Ž stores the conditional expectation E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] over all ๐‘ฅ

26.9.1 Expectations of Geometric Sums

Sometimes we also want to compute expectations of a geometric sum, such as โˆ‘๐‘ก ๐›ฝ ๐‘ก โ„Ž(๐‘‹๐‘ก )


In view of the preceding discussion, this is

E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— โ„Ž(๐‘‹๐‘ก+๐‘— ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] = [(๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 โ„Ž](๐‘ฅ)

where

(๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 = ๐ผ + ๐›ฝ๐‘ƒ + ๐›ฝ 2 ๐‘ƒ 2 + โ‹ฏ

Premultiplication by (๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 amounts to โ€œapplying the resolvent operatorโ€
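As an illustrative check (the values of ๐‘ƒ , โ„Ž and ๐›ฝ are made up), we can compare the resolvent formula with a truncated version of the geometric sum:

```python
import numpy as np

# Illustrative values (not from the text)
P = np.array([[0.9, 0.1],
              [0.05, 0.95]])
h = np.array([1.0, 0.0])
beta = 0.96

# Solve (I - beta P) v = h rather than forming the inverse explicitly
v = np.linalg.solve(np.eye(2) - beta * P, h)

# Truncated geometric sum: sum over j of beta^j P^j h
T = 2000
total = np.zeros(2)
Pj_h = h.copy()
for j in range(T):
    total += beta**j * Pj_h
    Pj_h = P @ Pj_h   # Advance P^j h to P^{j+1} h

print(np.allclose(v, total))   # True: the two agree for large T
```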

26.10 Exercises

26.10.1 Exercise 1

According to the discussion above, if a workerโ€™s employment dynamics obey the stochastic
matrix

1โˆ’๐›ผ ๐›ผ
๐‘ƒ =( )
๐›ฝ 1โˆ’๐›ฝ

with ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be

๐‘ โˆถ= ๐›ฝ/(๐›ผ + ๐›ฝ)

In other words, if {๐‘‹๐‘ก } represents the Markov chain for employment, then ๐‘‹ฬ„ ๐‘š โ†’ ๐‘ as ๐‘š โ†’
โˆž, where

๐‘‹ฬ„ ๐‘š โˆถ= (1/๐‘š) โˆ‘_{๐‘ก=1}^{๐‘š} 1{๐‘‹๐‘ก = 0}

Your exercise is to illustrate this convergence


First,

โ€ข generate one simulated time series {๐‘‹๐‘ก } of length 10,000, starting at ๐‘‹0 = 0


โ€ข plot ๐‘‹ฬ„ ๐‘š โˆ’ ๐‘ against ๐‘š, where ๐‘ is as defined above

Second, repeat the first step, but this time taking ๐‘‹0 = 1


In both cases, set ๐›ผ = ๐›ฝ = 0.1
The result should look something like the following โ€” modulo randomness, of course

(You donโ€™t need to add the fancy touches to the graphโ€”see the solution if youโ€™re interested)

26.10.2 Exercise 2

A topic of interest for economics and many other disciplines is ranking


Letโ€™s now consider one of the most practical and important ranking problems โ€” the rank assigned to web pages by search engines
(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria โ€” see [37])
To understand the issue, consider the set of results returned by a query to a web search engine
For the user, it is desirable to

1. receive a large set of accurate matches


2. have the matches returned in order, where the order corresponds to some measure of
โ€œimportanceโ€

Ranking according to a measure of importance is the problem we now consider


The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank
To illustrate the idea, consider the following diagram

Imagine that this is a miniature version of the WWW, with

โ€ข each node representing a web page


โ€ข each arrow representing the existence of a link from one page to another

Now letโ€™s think about which pages are likely to be important, in the sense of being valuable
to a search engine user
One possible criterion for the importance of a page is the number of inbound links โ€” an indication of popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting ๐‘— be (the integer index of) a typical page and ๐‘Ÿ๐‘— be its ranking, we set

๐‘Ÿ๐‘–
๐‘Ÿ๐‘— = โˆ‘
๐‘–โˆˆ๐ฟ๐‘—
โ„“๐‘–

where

โ€ข โ„“๐‘– is the total number of outbound links from ๐‘–


โ€ข ๐ฟ๐‘— is the set of all pages ๐‘– such that ๐‘– has a link to ๐‘—

This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/โ„“๐‘– )
There is, however, another interpretation, and it brings us back to Markov chains
Let ๐‘ƒ be the matrix given by ๐‘ƒ (๐‘–, ๐‘—) = 1{๐‘– โ†’ ๐‘—}/โ„“๐‘– where 1{๐‘– โ†’ ๐‘—} = 1 if ๐‘– has a link to ๐‘—
and zero otherwise
The matrix ๐‘ƒ is a stochastic matrix provided that each page has at least one link

With this definition of ๐‘ƒ we have

๐‘Ÿ๐‘– ๐‘Ÿ
๐‘Ÿ๐‘— = โˆ‘ = โˆ‘ 1{๐‘– โ†’ ๐‘—} ๐‘– = โˆ‘ ๐‘ƒ (๐‘–, ๐‘—)๐‘Ÿ๐‘–
๐‘–โˆˆ๐ฟ๐‘—
โ„“๐‘– all ๐‘–
โ„“๐‘– all ๐‘–

Writing ๐‘Ÿ for the row vector of rankings, this becomes ๐‘Ÿ = ๐‘Ÿ๐‘ƒ


Hence ๐‘Ÿ is the stationary distribution of the stochastic matrix ๐‘ƒ
Letโ€™s think of ๐‘ƒ (๐‘–, ๐‘—) as the probability of โ€œmovingโ€ from page ๐‘– to page ๐‘—
The value ๐‘ƒ (๐‘–, ๐‘—) has the interpretation

โ€ข ๐‘ƒ (๐‘–, ๐‘—) = 1/๐‘˜ if ๐‘– has ๐‘˜ outbound links and ๐‘— is one of them


โ€ข ๐‘ƒ (๐‘–, ๐‘—) = 0 if ๐‘– has no direct link to ๐‘—

Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page
Here โ€œrandomโ€ means that each link is selected with equal probability
Since ๐‘Ÿ is the stationary distribution of ๐‘ƒ , assuming that the uniform ergodicity condition is
valid, we can interpret ๐‘Ÿ๐‘— as the fraction of time that a (very persistent) random surfer spends
at page ๐‘—
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank
The data for this graph is in the web_graph_data.txt file โ€” you can also view it here
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n
A typical line from the file has the form

d -> h;

This should be interpreted as meaning that there exists a link from d to h


To parse this file and extract the relevant information, you can use regular expressions
The following code snippet provides a hint as to how you can go about this

In [21]: import re

re.findall(r'\w', 'x +++ y ****** z') # \w matches alphanumerics

Out[21]: ['x', 'y', 'z']

In [22]: re.findall(r'\w', 'a ^^ b &&& $$ c')

Out[22]: ['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a

26.10.3 Exercise 3

In numerical work, it is sometimes convenient to replace a continuous model with a discrete one
In particular, Markov chains are routinely generated as discrete approximations to AR(1)
processes of the form

๐‘ฆ๐‘ก+1 = ๐œŒ๐‘ฆ๐‘ก + ๐‘ข๐‘ก+1

Here ๐‘ข๐‘ก is assumed to be IID and ๐‘ (0, ๐œŽ๐‘ข2 )


The variance of the stationary probability distribution of {๐‘ฆ๐‘ก } is

๐œŽ๐‘ข2
๐œŽ๐‘ฆ2 โˆถ=
1 โˆ’ ๐œŒ2

Tauchenโ€™s method [128] is the most common method for approximating this continuous state
process with a finite state Markov chain
A routine for this already exists in QuantEcon.py but letโ€™s write our own version as an exercise
As a first step, we choose

โ€ข ๐‘›, the number of states for the discrete approximation


โ€ข ๐‘š, an integer that parameterizes the width of the state space

Next, we create a state space {๐‘ฅ0 , โ€ฆ , ๐‘ฅ๐‘›โˆ’1 } โŠ‚ R and a stochastic ๐‘› ร— ๐‘› matrix ๐‘ƒ such that

โ€ข ๐‘ฅ0 = โˆ’๐‘š ๐œŽ๐‘ฆ
โ€ข ๐‘ฅ๐‘›โˆ’1 = ๐‘š ๐œŽ๐‘ฆ
โ€ข ๐‘ฅ๐‘–+1 = ๐‘ฅ๐‘– + ๐‘  where ๐‘  = (๐‘ฅ๐‘›โˆ’1 โˆ’ ๐‘ฅ0 )/(๐‘› โˆ’ 1)

Let ๐น be the cumulative distribution function of the normal distribution ๐‘ (0, ๐œŽ๐‘ข2 )
The values ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) are computed to approximate the AR(1) process โ€” omitting the deriva-
tion, the rules are as follows:

1. If ๐‘— = 0, then set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ0 ) = ๐น (๐‘ฅ0 โˆ’ ๐œŒ๐‘ฅ๐‘– + ๐‘ /2)

1. If ๐‘— = ๐‘› โˆ’ 1, then set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘›โˆ’1 ) = 1 โˆ’ ๐น (๐‘ฅ๐‘›โˆ’1 โˆ’ ๐œŒ๐‘ฅ๐‘– โˆ’ ๐‘ /2)

3. Otherwise, set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐น (๐‘ฅ๐‘— โˆ’ ๐œŒ๐‘ฅ๐‘– + ๐‘ /2) โˆ’ ๐น (๐‘ฅ๐‘— โˆ’ ๐œŒ๐‘ฅ๐‘– โˆ’ ๐‘ /2)

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns {๐‘ฅ0 , โ€ฆ , ๐‘ฅ๐‘›โˆ’1 } โŠ‚ R and an ๐‘› ร— ๐‘› matrix ๐‘ƒ as described above

โ€ข Even better, write a function that returns an instance of QuantEcon.pyโ€™s MarkovChain class

26.11 Solutions

In [23]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import MarkovChain

26.11.1 Exercise 1

Compute the fraction of time that the worker spends unemployed, and compare it to the stationary probability

In [24]: ฮฑ = ฮฒ = 0.1
N = 10000
p = ฮฒ / (ฮฑ + ฮฒ)

P = ((1 - ฮฑ, ฮฑ),     # Careful: P and p are distinct
     ( ฮฒ, 1 - ฮฒ))
P = np.array(P)
mc = MarkovChain(P)

fig, ax = plt.subplots(figsize=(9, 6))


ax.set_ylim(-0.25, 0.25)
ax.grid()
ax.hlines(0, 0, N, lw=2, alpha=0.6)  # Horizontal line at zero

for x0, col in ((0, 'blue'), (1, 'green')):
    # == Generate time series for worker that starts at x0 == #
    X = mc.simulate(N, init=x0)
    # == Compute fraction of time spent unemployed, for each n == #
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    # == Plot == #
    ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
    ax.plot(X_bar - p, color=col, label=rf'$X_0 = \, {x0} $')
    ax.plot(X_bar - p, 'k-', alpha=0.6)  # Overlay in black to make lines clearer

ax.legend(loc='upper right')
plt.show()

26.11.2 Exercise 2

First, save the data into a file called web_graph_data.txt by executing the next cell

In [25]: %%file web_graph_data.txt


a -> d;
a -> f;
b -> j;
b -> k;
b -> m;
c -> c;
c -> g;
c -> j;
c -> m;
d -> f;
d -> h;
d -> k;
e -> d;
e -> h;
e -> l;
f -> a;
f -> b;
f -> j;
f -> l;
g -> b;
g -> j;
h -> d;
h -> g;
h -> l;
h -> m;
i -> g;
i -> h;
i -> n;
j -> e;
j -> i;
j -> k;
k -> n;
l -> m;
m -> g;
n -> c;
n -> j;
n -> m;

Writing web_graph_data.txt

In [26]: """
Return list of pages, ordered by rank
"""
import re
import numpy as np
from operator import itemgetter

infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

n = 14 # Total number of web pages (nodes)

# == Create a matrix Q indicating existence of links == #


# * Q[i, j] = 1 if there is a link from i to j
# * Q[i, j] = 0 otherwise
Q = np.zeros((n, n), dtype=int)
f = open(infile, 'r')
edges = f.readlines()
f.close()
for edge in edges:
    from_node, to_node = re.findall(r'\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1

# == Create the corresponding Markov matrix P == #
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()
mc = MarkovChain(P)
# == Compute the stationary distribution r == #
r = mc.stationary_distributions[0]
ranked_pages = {alphabet[i] : r[i] for i in range(n)}
# == Print solution, sorted from highest to lowest rank == #
print('Rankings\n ***')
for name, rank in sorted(ranked_pages.items(), key=itemgetter(1), reverse=1):
    print(f'{name}: {rank:.4}')

Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911

26.11.3 Exercise 3

A solution from the QuantEcon.py library can be found here
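For reference, here is a minimal sketch of the approx_markov routine following the rules stated in the exercise (it uses SciPy's normal CDF; the QuantEcon.py implementation may differ in details):

```python
import numpy as np
from scipy.stats import norm

def approx_markov(rho, sigma_u, m=3, n=7):
    """Sketch of Tauchen's method, following the rules stated above."""
    sigma_y = sigma_u / np.sqrt(1 - rho**2)   # Stationary std of {y_t}
    x = np.linspace(-m * sigma_y, m * sigma_y, n)
    s = x[1] - x[0]                           # Step size
    F = norm(scale=sigma_u).cdf               # CDF of N(0, sigma_u^2)

    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - s / 2)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = F(z + s / 2) - F(z - s / 2)
    return x, P

x, P = approx_markov(0.9, 1.0)
print(P.sum(axis=1))   # Each row sums to one, so P is a stochastic matrix
```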


Footnotes

[1] Hint: First show that if ๐‘ƒ and ๐‘„ are stochastic matrices then so is their product โ€” to
check the row sums, try post multiplying by a column vector of ones. Finally, argue that ๐‘ƒ ๐‘›
is a stochastic matrix using induction.
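The footnote's claim is easy to illustrate numerically (the random matrices below are used only for illustration):

```python
import numpy as np

# If P and Q are stochastic, each row of PQ sums to one, since
# (PQ)1 = P(Q1) = P1 = 1; that P^n is stochastic follows by induction
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)   # Normalize rows: P is stochastic
Q = rng.random((4, 4))
Q /= Q.sum(axis=1, keepdims=True)   # Likewise for Q

ones = np.ones(4)
print(np.allclose((P @ Q) @ ones, ones))                          # True
print(np.allclose(np.linalg.matrix_power(P, 5).sum(axis=1), 1.0)) # True
```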
27

Continuous State Markov Chains

27.1 Contents

โ€ข Overview 27.2

โ€ข The Density Case 27.3

โ€ข Beyond Densities 27.4

โ€ข Stability 27.5

โ€ข Exercises 27.6

โ€ข Solutions 27.7

โ€ข Appendix 27.8

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

27.2 Overview

In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that

โ€ข evolve in discrete-time
โ€ข are often nonlinear

The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as weโ€™ll see later on


The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,

โ€ข What happens to the distribution of the state variables?


โ€ข Is there anything we can say about the โ€œaverage behaviorโ€ of these variables?
โ€ข Is there a notion of โ€œsteady stateโ€ or โ€œlong-run equilibriumโ€ thatโ€™s applicable to the
model?

โ€“ If so, how can we compute it?

Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.

Note
For some people, the term โ€œMarkov chainโ€ always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[95]) in using the term to refer to any discrete time Markov process

27.3 The Density Case

You are probably aware that some distributions can be represented by densities and some
cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition
Once weโ€™ve built some intuition weโ€™ll cover the general case

27.3.1 Definitions and Basic Properties

In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space ๐‘†
In this setting, the dynamics of the model are described by a stochastic matrix โ€” a nonnegative square matrix ๐‘ƒ = ๐‘ƒ [๐‘–, ๐‘—] such that each row ๐‘ƒ [๐‘–, โ‹…] sums to one
The interpretation of ๐‘ƒ is that ๐‘ƒ [๐‘–, ๐‘—] represents the probability of transitioning from state ๐‘–
to state ๐‘— in one unit of time
In symbols,

P{๐‘‹๐‘ก+1 = ๐‘— | ๐‘‹๐‘ก = ๐‘–} = ๐‘ƒ [๐‘–, ๐‘—]

Equivalently,

โ€ข ๐‘ƒ can be thought of as a family of distributions ๐‘ƒ [๐‘–, โ‹…], one for each ๐‘– โˆˆ ๐‘†


โ€ข ๐‘ƒ [๐‘–, โ‹…] is the distribution of ๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘–

(As you probably recall, when using NumPy arrays, ๐‘ƒ [๐‘–, โ‹…] is expressed as P[i, :])
In this section, weโ€™ll allow ๐‘† to be a subset of R, such as

โ€ข R itself
โ€ข the positive reals (0, โˆž)
โ€ข a bounded interval (๐‘Ž, ๐‘)

The family of discrete distributions ๐‘ƒ [๐‘–, โ‹…] will be replaced by a family of densities ๐‘(๐‘ฅ, โ‹…), one
for each ๐‘ฅ โˆˆ ๐‘†
Analogous to the finite state case, ๐‘(๐‘ฅ, โ‹…) is to be understood as the distribution (density) of
๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘ฅ
More formally, a stochastic kernel on ๐‘† is a function ๐‘ โˆถ ๐‘† ร— ๐‘† โ†’ R with the property that

1. ๐‘(๐‘ฅ, ๐‘ฆ) โ‰ฅ 0 for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘†


2. โˆซ ๐‘(๐‘ฅ, ๐‘ฆ)๐‘‘๐‘ฆ = 1 for all ๐‘ฅ โˆˆ ๐‘†

(Integrals are over the whole space unless otherwise specified)


For example, let ๐‘† = R and consider the particular stochastic kernel ๐‘๐‘ค defined by

1 (๐‘ฆ โˆ’ ๐‘ฅ)2
๐‘๐‘ค (๐‘ฅ, ๐‘ฆ) โˆถ= โˆš exp {โˆ’ } (1)
2๐œ‹ 2
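As a quick sanity check, ๐‘๐‘ค satisfies property 2 of a stochastic kernel; numerical integration over ๐‘ฆ (at an arbitrarily chosen ๐‘ฅ) gives one:

```python
import numpy as np
from scipy.integrate import quad

def p_w(x, y):
    # Density of N(x, 1) evaluated at y, matching Eq. (1)
    return np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)

x = 0.5   # An arbitrary fixed state
total, abserr = quad(lambda y: p_w(x, y), -np.inf, np.inf)
print(total)   # Approximately 1.0
```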

What kind of model