
Lectures in Quantitative Economics

with Python

Thomas J. Sargent and John Stachurski

July 1, 2019

https://lectures.quantecon.org/py/
Contents

I Introduction to Python

1 About Python

2 Setting up Your Python Environment

3 An Introductory Example

4 Python Essentials

5 OOP I: Introduction to Object Oriented Programming

II The Scientific Libraries

6 NumPy

7 Matplotlib

8 SciPy

9 Numba

10 Other Scientific Libraries

III Advanced Python Programming

11 Writing Good Code

12 OOP II: Building Classes

13 OOP III: Samuelson Multiplier Accelerator

14 More Language Features

15 Debugging

IV Data and Empirics

16 Pandas

17 Pandas for Panel Data

18 Linear Regression in Python

19 Maximum Likelihood Estimation

V Tools and Techniques

20 Geometric Series for Elementary Economics

21 Linear Algebra

22 Complex Numbers and Trigonometry

23 Orthogonal Projections and Their Applications

24 LLN and CLT

25 Linear State Space Models

26 Finite Markov Chains

27 Continuous State Markov Chains

28 Cass-Koopmans Optimal Growth Model

29 A First Look at the Kalman Filter

30 Reverse Engineering a la Muth

VI Dynamic Programming

31 Shortest Paths

32 Job Search I: The McCall Search Model

33 Job Search II: Search and Separation

34 A Problem that Stumped Milton Friedman

35 Job Search III: Search with Learning

36 Job Search IV: Modeling Career Choice

37 Job Search V: On-the-Job Search

38 Optimal Growth I: The Stochastic Optimal Growth Model

39 Optimal Growth II: Time Iteration

40 Optimal Growth III: The Endogenous Grid Method

41 LQ Dynamic Programming Problems

42 Optimal Savings I: The Permanent Income Model

43 Optimal Savings II: LQ Techniques

44 Consumption and Tax Smoothing with Complete and Incomplete Markets

45 Optimal Savings III: Occasionally Binding Constraints

46 Robustness

47 Discrete State Dynamic Programming

VII Multiple Agent Models

48 Schelling's Segregation Model

49 A Lake Model of Employment and Unemployment

50 Rational Expectations Equilibrium

51 Markov Perfect Equilibrium

52 Robust Markov Perfect Equilibrium

53 Uncertainty Traps

54 The Aiyagari Model

55 Default Risk and Income Fluctuations

56 Globalization and Cycles

57 Coase's Theory of the Firm

VIII Recursive Models of Dynamic Linear Economies

58 Recursive Models of Dynamic Linear Economies

59 Growth in Dynamic Linear Economies

60 Lucas Asset Pricing Using DLE

61 IRFs in Hall Models

62 Permanent Income Model using the DLE Class

63 Rosen Schooling Model

64 Cattle Cycles

65 Shock Non Invertibility

IX Classic Linear Models

66 Von Neumann Growth Model (and a Generalization)

X Time Series Models

67 Covariance Stationary Processes

68 Estimation of Spectra

69 Additive and Multiplicative Functionals

70 Classical Control with Linear Algebra

71 Classical Prediction and Filtering With Linear Algebra

XI Asset Pricing and Finance

72 Asset Pricing I: Finite State Models

73 Asset Pricing II: The Lucas Asset Pricing Model

74 Asset Pricing III: Incomplete Markets

75 Two Modifications of Mean-variance Portfolio Theory

XII Dynamic Programming Squared

76 Stackelberg Plans

77 Ramsey Plans, Time Inconsistency, Sustainable Plans

78 Optimal Taxation in an LQ Economy

79 Optimal Taxation with State-Contingent Debt

80 Optimal Taxation without State-Contingent Debt

81 Fluctuating Interest Rates Deliver Fiscal Insurance

82 Fiscal Risk and Government Debt

83 Competitive Equilibria of Chang Model

84 Credible Government Policies in Chang Model

Part I

Introduction to Python

1 About Python

1.1 Contents

• Overview 1.2

• What's Python? 1.3

• Scientific Programming 1.4

• Learn More 1.5

1.2 Overview

In this lecture we will

• Outline what Python is
• Showcase some of its abilities
• Compare it to some other languages

At this stage, it's not our intention that you try to replicate all you see
We will work through what follows at a slow pace later in the lecture series
Our only objective for this lecture is to give you some feel of what Python is, and what it can
do

1.3 What's Python?

Python is a general-purpose programming language conceived in 1989 by Dutch programmer Guido van Rossum
Python is free and open source, with development coordinated through the Python Software
Foundation
Python has experienced rapid adoption in the last decade and is now one of the most popular
programming languages


1.3.1 Common Uses

Python is a general-purpose language used in almost all application domains

• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.

Used extensively by Internet service and high tech companies such as

• Google
• Dropbox
• Reddit
• YouTube
• Walt Disney Animation, etc., etc.

Often used to teach computer science and programming


For reasons we will discuss, Python is particularly popular within the scientific community

• academia, NASA, CERN, Wall St., etc., etc.

1.3.2 Relative Popularity

The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python

The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science

For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here
(The corresponding time path for MATLAB is shown for comparison)

Note that pandas takes off in 2012, which is the same year that we see Python's popularity begin to spike in the first figure
Overall, it's clear that

• Python is one of the most popular programming languages worldwide
• Python is a major tool for scientific computing, accounting for a rapidly rising share of scientific work around the globe

1.3.3 Features

Python is a high-level language suitable for rapid development


It has a relatively small core language supported by many libraries
Other features:

• A multiparadigm language, in that multiple programming styles are supported (procedural, object-oriented, functional, etc.)
• Interpreted rather than compiled

1.3.4 Syntax and Design

One nice feature of Python is its elegant syntax — we'll see many examples later on
Elegant code might sound superfluous but in fact it's highly beneficial because it makes the
syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don't need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is an elegant design

Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
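As a small taste of that expressiveness, here is a sketch of ours (not from the lecture) comparing a list comprehension with its lazy generator counterpart:

```python
# A list comprehension builds the whole list in one readable line
squares = [n ** 2 for n in range(5)]
print(squares)  # [0, 1, 4, 9, 16]

# The generator-expression version computes values lazily, one at a time,
# so no intermediate list is stored when we only need the sum
total = sum(n ** 2 for n in range(5))
print(total)  # 30
```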

1.4 Scientific Programming

Python has become one of the core languages of scientific computing


It's either the dominant player or a major player in

• Machine learning and data science
• Astronomy
• Artificial intelligence
• Chemistry
• Computational biology
• Meteorology
• etc., etc.

Its popularity in economics is also beginning to rise


This section briefly showcases some examples of Python for scientific programming

• All of these topics will be covered in detail later on

1.4.1 Numerical Programming

Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library
NumPy provides the basic array data type plus some simple processing operations
For example, let's build some arrays

In [1]: import numpy as np # Load the library

a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -ฯ€ to ฯ€


b = np.cos(a) # Apply cosine to each element of a
c = np.sin(a) # Apply sin to each element of a

Now let's take the inner product:

In [2]: b @ c

Out[2]: 1.5265566588595902e-16

The number you see here might vary slightly but it's essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
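If you want to check the equivalence yourself, here is a small sketch of ours (not the lecture's code) comparing the two calls and confirming the inner product is numerically zero:

```python
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)   # Even grid from -π to π
b, c = np.cos(a), np.sin(a)

# The @ operator and np.dot compute the same inner product
assert np.isclose(b @ c, np.dot(b, c))

# cos and sin are orthogonal over a symmetric grid, up to rounding error
assert abs(b @ c) < 1e-12
```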
The SciPy library is built on top of NumPy and provides additional functionality
For example, let's calculate $\int_{-2}^{2} \phi(z) \, dz$ where $\phi$ is the standard normal density

In [3]: from scipy.stats import norm


from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2) # Integrate using Gaussian quadrature
value

Out[3]: 0.9544997361036417

SciPy includes many of the standard routines used in

• linear algebra
• integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.
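As one more illustration (our own sketch, not part of the lecture), here is SciPy's brentq routine finding the root of a simple function:

```python
import numpy as np
from scipy.optimize import brentq

# Find the root of f(x) = x**2 - 2 on the bracketing interval [0, 2]
f = lambda x: x ** 2 - 2
root = brentq(f, 0, 2)
print(root)  # approximately 1.41421, i.e. sqrt(2)
```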

1.4.2 Graphics

The most popular and comprehensive Python library for creating figures and graphs is Matplotlib

• Plots, histograms, contour images, 3D, bar charts, etc., etc.
• Output in many formats (PDF, PNG, EPS, etc.)
• LaTeX integration

Example 2D plot with embedded LaTeX annotations

Example contour plot



Example 3D plot

More examples can be found in the Matplotlib thumbnail gallery


Other graphics libraries include

• Plotly
• Bokeh
• VPython — 3D graphics and animations

1.4.3 Symbolic Algebra

It's useful to be able to manipulate symbolic expressions, as in Mathematica or Maple


The SymPy library provides this functionality from within the Python shell

In [4]: from sympy import Symbol

x, y = Symbol('x'), Symbol('y') # Treat 'x' and 'y' as algebraic symbols


x + x + x + y

Out[4]: 3*x + y

We can manipulate expressions

In [5]: expression = (x + y)**2


expression.expand()

Out[5]: x**2 + 2*x*y + y**2

solve polynomials

In [6]: from sympy import solve

solve(x**2 + x + 2)

Out[6]: [-1/2 - sqrt(7)*I/2, -1/2 + sqrt(7)*I/2]

and calculate limits, derivatives and integrals

In [7]: from sympy import limit, sin, diff

limit(1 / x, x, 0)

Out[7]: oo

In [8]: limit(sin(x) / x, x, 0)

Out[8]: 1

In [9]: diff(sin(x), x)

Out[9]: cos(x)

The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
Can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
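For instance, the following sketch (ours, with an arbitrary expression) integrates symbolically and then generates LaTeX output with sympy.latex:

```python
from sympy import Symbol, sin, integrate, latex

x = Symbol('x')

# Symbolic integration reverses the differentiation shown above
antiderivative = integrate(sin(x), x)
print(antiderivative)       # -cos(x)

# latex() renders any expression as a LaTeX string for papers or figures
print(latex((x + 1) ** 2))
```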

1.4.4 Statistics

Python's data manipulation and statistics libraries have improved rapidly over the last few
years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here's a simple example, using some fake data

In [10]: import pandas as pd


np.random.seed(1234)

data = np.random.randn(5, 2) # 5x2 matrix of N(0, 1) random draws


dates = pd.date_range('28/12/2010', periods=5)

df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)


print(df)

price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685

In [11]: df.mean()

Out[11]: price 0.411768


weight -0.699135
dtype: float64

Other Useful Statistics Libraries


- statsmodels — various statistical routines
- scikit-learn — machine learning in Python (sponsored by Google, among others)
- pyMC — for Bayesian data analysis
- pystan — Bayesian analysis based on stan

1.4.5 Networks and Graphs

Python has many libraries for studying graphs


One well-known example is NetworkX

• Standard graph algorithms for analyzing network structure, etc.
• Plotting routines
• etc., etc.

Here's some example code that generates and plots a random graph, with node color determined by shortest path length from a central node

In [12]: import networkx as nx


import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(1234)

# Generate a random graph


p = dict((i,(np.random.uniform(0, 1),np.random.uniform(0, 1))) for i in range(200))
G = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(G, 'pos')

# find node nearest the center point (0.5, 0.5)


dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
ncenter = np.argmin(dists)

# Plot graph, coloring by path length from central node


p = nx.single_source_shortest_path_length(G, ncenter)
plt.figure()
nx.draw_networkx_edges(G, pos, alpha=0.4)
nx.draw_networkx_nodes(G,
pos,
nodelist=list(p.keys()),
node_size=120, alpha=0.5,
node_color=list(p.values()),
cmap=plt.cm.jet_r)
plt.show()


1.4.6 Cloud Computing

Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise

See also
- Amazon Elastic Compute Cloud
- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud

1.4.7 Parallel Processing

Apart from the cloud computing options listed above, you might like to consider
- Parallel computing through IPython clusters
- The Starcluster interface to Amazon's EC2
- GPU programming through PyCuda, PyOpenCL, Theano or similar

1.4.8 Other Developments

There are many other interesting developments with scientific programming in Python
Some representative examples include
- Jupyter — Python in your browser with code cells, embedded images, etc.
- Numba — Make Python run at the same speed as native machine code!
- Blaze — a generalization of NumPy
- PyTables — manage large data sets
- CVXPY — convex optimization in Python

1.5 Learn More

• Browse some Python projects on GitHub


• Have a look at some of the Jupyter notebooks people have shared on various scientific topics

- Visit the Python Package Index


- View some of the questions people are asking about Python on Stackoverflow
- Keep up to date on what's happening in the Python community with the Python subreddit
2 Setting up Your Python Environment

2.1 Contents

• Overview 2.2

• Anaconda 2.3

• Jupyter Notebooks 2.4

• Installing Libraries 2.5

• Working with Files 2.6

• Editors and IDEs 2.7

• Exercises 2.8

2.2 Overview

In this lecture, you will learn how to

1. get a Python environment up and running with all the necessary tools
2. execute simple Python commands
3. run a sample program
4. install the code libraries that underpin these lectures

2.3 Anaconda

The core Python package is easy to install but not what you should choose for these lectures
These lectures require the entire scientific programming ecosystem, which

• the core installation doesn't provide
• is painful to install one piece at a time


Hence the best approach for our purposes is to install a free Python distribution that contains

1. the core Python language and


2. the most popular scientific libraries

The best such distribution is Anaconda


Anaconda is

• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name

Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!

2.3.1 Installing Anaconda

Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:

• Install the latest version
• If you are asked during the installation process whether you'd like to make Anaconda your default Python installation, say yes
• Otherwise, you can accept all of the defaults

2.3.2 Updating Anaconda

Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Anaconda distribution
As a practice run, please execute the following

1. Open up a terminal
2. Type conda update anaconda

For more information on conda, type conda help in a terminal

2.4 Jupyter Notebooks

Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries
They use a browser-based interface to Python with

• The ability to write and execute Python commands
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions

Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem
Here's an image showing execution of some code (borrowed from here) in a Jupyter notebook

You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here
While Jupyter isn't the only way to code in Python, it's great for when you wish to

• start coding in Python
• test new ideas or interact with small pieces of code
• share or collaborate on scientific ideas with students or colleagues

These lectures are designed for executing in Jupyter notebooks



2.4.1 Starting the Jupyter Notebook

Once you have installed Anaconda, you can start the Jupyter notebook
Either

• search for Jupyter in your applications menu, or

• open up a terminal and type jupyter notebook

  - Windows users should substitute "Anaconda command prompt" for "terminal" in the previous line

If you use the second option, you will see something like this (click to enlarge)

The output tells us the notebook is running at http://localhost:8888/

• localhost is the name of the local machine
• 8888 refers to port number 8888 on your computer

Thus, the Jupyter kernel is listening for Python commands on port 8888 of our local machine
Hopefully, your default browser has also opened up with a web page that looks something like
this (click to enlarge)

What you see here is called the Jupyter dashboard


If you look at the URL at the top, it should be localhost:8888 or similar, matching the
message above
Assuming all this has worked OK, you can now click on New at the top right and select
Python 3 or similar
Here's what shows up on our machine:

The notebook displays an active cell, into which you can type Python commands

2.4.2 Notebook Basics

Let's start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you're ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter

(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are

1. Edit mode

• Indicated by a green border around one cell
• Whatever you type appears as is in that cell

2. Command mode

• The green border is replaced by a grey border
• Key strokes are interpreted as commands — for example, typing b adds a new cell below the current one

To switch to

• command mode from edit mode, hit the Esc key or Ctrl-M

• edit mode from command mode, hit Enter or click in a cell

The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it
User Interface Tour
At this stage, we recommend you take your time to

• look at the various options in the menus and see what they do
• take the "user interface tour", which can be accessed through the help menu

Inserting Unicode (e.g., Greek Letters)


Python 3 introduced support for unicode characters, allowing the use of characters such as α and β in your code
Unicode characters can be typed quickly in Jupyter using the tab key
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard
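Once entered, such characters are ordinary identifiers, so code like the following sketch (the parameter values are made up) runs as-is:

```python
# Greek letters are valid variable names in Python 3
# (typed in Jupyter as \alpha then Tab, \beta then Tab)
α, β = 0.5, 2.0
print(α * β)  # 1.0
```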
A Test Program
Let's run a test program
Here's an arbitrary program we can use: http://matplotlib.org/1.4.1/examples/pie_and_polar_charts/polar_bar_demo.html
On that page, you'll see the following code

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

N = 20
ฮธ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)

ax = plt.subplot(111, polar=True)
bars = ax.bar(ฮธ, radii, width=width, bottom=0.0)

# Use custom colors and opacity


for r, bar in zip(radii, bars):
bar.set_facecolor(plt.cm.jet(r / 10.))
bar.set_alpha(0.5)

plt.show()

Don't worry about the details for now — let's just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
(In older versions of Jupyter you might need to add the command %matplotlib inline before you generate the figure)

2.4.3 Working with the Notebook

Here are a few more tips on working with Jupyter notebooks


Tab Completion
In the previous program, we executed the line import numpy as np

• NumPy is a numerical library we'll work with in depth

After this import command, functions in NumPy can be accessed with np.<function_name> type syntax

• For example, try np.random.randn(3)

We can explore these attributes of np using the Tab key


For example, here we type np.ran and hit Tab (click to enlarge)

Jupyter offers up the two possible completions, random and rank


In this way, the Tab key helps remind you of what's available and also saves you typing
On-Line Help
To get help on np.rank, say, we can execute np.rank?
Documentation appears in a split window of the browser, like so

Clicking on the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code

Next we Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we Shift+Enter to produce this

2.4.4 Sharing Notebooks

Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files — or by using web services such as nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above

2.4.5 QuantEcon Notes

QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon Notes
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to comments and votes by the community

2.5 Installing Libraries

Most of the libraries we need come in Anaconda


Other libraries can be installed with pip
One library we'll be using is QuantEcon.py
You can install QuantEcon.py by starting Jupyter and typing

!pip install quantecon

into a cell
Alternatively, you can type the following into a terminal

pip install quantecon

More instructions can be found on the library page


To upgrade to the latest version, which you should do regularly, use

pip install --upgrade quantecon

Another library we will be using is interpolation.py


This can be installed by typing in Jupyter

!pip install interpolation

2.6 Working with Files

How does one run a locally saved Python file?


There are a number of ways to do this but let's focus on methods using Jupyter notebooks

2.6.1 Option 1: Copy and Paste

The steps are:

1. Navigate to your file with your mouse/trackpad using a file browser


2. Click on your file to open it with a text editor
3. Copy and paste into a cell and Shift-Enter

2.6.2 Option 2: Run

Using the run command is often easier than copy and paste

โ€ข For example, %run test.py will run the file test.py



(You might find that the % is unnecessary — use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn't in that directory, you will get an error
Let's look at a successful example, where we run a file test.py with contents:

In [2]: for i in range(5):


print('foobar')

foobar
foobar
foobar
foobar
foobar

Here's the notebook (click to enlarge)

Here

• pwd asks Jupyter to show the PWD (or %pwd — see the comment about automagic above)

  - This is where Jupyter is going to look for files to run
  - Your output will look a bit different depending on your OS

• ls asks Jupyter to list files in the PWD (or %ls)



  - Note that test.py is there (on our computer, because we saved it there earlier)

• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on Windows)

• run test.py runs the file and prints any output

2.6.3 But File X isn't in my PWD!

If you're trying to run a file not in the present working directory, you'll get an error
To fix this error you need to either

1. Shift the file into the PWD, or


2. Change the PWD to where the file lives

One way to achieve the first option is to use the Upload button

• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture

The second option can be achieved using the cd command

• On Windows it might look like this cd C:/Python27/Scripts/dir
• On Linux / OSX it might look like this cd /home/user/scripts/dir

Note: You can type the first letter or two of each directory name and then use the tab key to
expand

2.6.4 Loading Files

It's often convenient to be able to see your code before you run it

In the following example, we execute load white_noise_plot.py where white_noise_plot.py is in the PWD
(Use %load if automagic is off)
Now the code from the file appears in a cell ready to execute
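As a concrete example, a file like white_noise_plot.py might contain something along the following lines (this sketch is ours; the lecture's actual file may differ):

```python
import numpy as np
import matplotlib.pyplot as plt

# Plot 100 draws from a standard normal distribution
x = np.random.randn(100)
plt.plot(x)
plt.title('White noise')
plt.show()
```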

2.6.5 Saving Files

To save the contents of a cell as file foo.py

• put %%file foo.py as the first line of the cell
• Shift+Enter

Here %%file is an example of a cell magic

2.7 Editors and IDEs

The preceding discussion covers most of what you need to know to interact with this website
However, as you start to write longer programs, you might want to experiment with your
workflow
There are many different options and we mention them only in passing

2.7.1 JupyterLab

JupyterLab is an integrated development environment centered around Jupyter notebooks


It is available through Anaconda and will soon be made the default environment for Jupyter
notebooks
Reading the docs or searching for a recent YouTube video will give you more information

2.7.2 Text Editors

A text editor is an application that is specifically designed to work with text files — such as Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide

โ€ข efficient text editing commands (e.g., copy, paste, search and replace)
โ€ข syntax highlighting, etc.

Among the most popular are Sublime Text and Atom


For a top quality open source text editor with a steeper learning curve, try Emacs
If you want an outstanding free text editor and don't mind a seemingly vertical learning curve plus long days of pain and suffering while all your neural pathways are rewired, try Vim

2.7.3 Text Editors Plus IPython Shell

A text editor is for writing programs


To run them you can continue to use Jupyter as described above
Another option is to use the excellent IPython shell
To use an IPython shell, open up a terminal and type ipython
You should see something like this

The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow keys
The up arrow key brings previously typed commands to the prompt
This saves a lot of typing...
Here's one set up, on a Linux box, with
Hereโ€™s one set up, on a Linux box, with

• a file being edited in Vim
• An IPython shell next to it, to run the file

2.7.4 IDEs

IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code — please tell us about your experiences on the forum

2.8 Exercises

2.8.1 Exercise 1

If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the kernel without launching the browser
Note also the startup message: It should give you a URL such as
http://localhost:8888 where the notebook is running
Now

1. Start your browser — or open a new tab if it's already running


2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the
top

You should now be able to run a standard Jupyter notebook session


This is an alternative way to start the notebook that can also be handy

2.8.2 Exercise 2

This exercise will familiarize you with git and GitHub


Git is a version control system — a piece of software used to manage digital projects such as code libraries
In many cases, the associated collections of files — called repositories — are stored on GitHub
GitHub is a wonderland of collaborative coding projects
For example, it hosts many of the scientific libraries we'll be using later on, such as this one
Git is the underlying software used to manage these projects
Git is an extremely powerful tool for distributed collaboration โ€” for example, we use it to
share and synchronize all the source files for these lectures
There are two main flavors of Git

1. the plain vanilla command line Git version


2. the various point-and-click GUI versions

โ€ข See, for example, the GitHub version

As an exercise, try

1. Installing Git
2. Getting a copy of QuantEcon.py using Git

For example, if you've installed the command line version, open up a terminal and enter

git clone https://github.com/QuantEcon/QuantEcon.py

(This is just git clone in front of the URL for the repository)
Even better,

1. Sign up to GitHub
2. Look into 'forking' GitHub repositories (forking means making your own copy of a GitHub repository, stored on GitHub)
3. Fork QuantEcon.py
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo
5. If you made a valuable improvement, send us a pull request!

For reading on these and other topics, try

โ€ข The official Git documentation


โ€ข Reading through the docs on GitHub
โ€ข Pro Git Book by Scott Chacon and Ben Straub
โ€ข One of the thousands of Git tutorials on the Net
3

An Introductory Example

3.1 Contents

• Overview 3.2
• The Task: Plotting a White Noise Process 3.3
• Version 1 3.4
• Alternative Versions 3.5
• Exercises 3.6
• Solutions 3.7

We're now ready to start learning the Python language itself


The level of this and the next few lectures will suit those with some basic knowledge of programming
But don't give up if you have none — you are not excluded
You just need to cover a few of the fundamentals of programming before returning here
Good references for first time programmers include:

• The first 5 or 6 chapters of How to Think Like a Computer Scientist
• Automate the Boring Stuff with Python
• The start of Dive into Python 3

Note: These references offer help on installing Python but you should probably stick with the
method on our set up page
You'll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course

3.2 Overview

In this lecture, we will write and then pick apart small Python programs


The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures

3.2.1 Prerequisites

The lecture on getting started with Python

3.3 The Task: Plotting a White Noise Process

Suppose we want to simulate and plot the white noise process ϵ_0, ϵ_1, …, ϵ_T, where each draw ϵ_t is independent standard normal
In other words, we want to generate figures that look something like this:

Weโ€™ll do this in several different ways

3.4 Version 1

Here are a few lines of code that perform the task we set

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

x = np.random.randn(100)
plt.plot(x)
plt.show()

Let's break this program down and see how it works

3.4.1 Import Statements

The first two lines of the program import functionality


The first line imports NumPy, a favorite Python package for tasks like

โ€ข working with arrays (vectors and matrices)


โ€ข common mathematical functions like cos and sqrt
โ€ข generating random numbers
โ€ข linear algebra, etc.

After import numpy as np we have access to these attributes via the syntax np.
Here's another example

In [2]: import numpy as np

np.sqrt(4)

Out[2]: 2.0

We could also just write

In [3]: import numpy

numpy.sqrt(4)

Out[3]: 2.0

But the former method is convenient and more standard


Why all the Imports?
Remember that Python is a general-purpose language
The core language is quite small so it's easy to learn and maintain
When you want to do something interesting with Python, you almost always need to import
additional functionality
Scientific work in Python is no exception
Most of our programs start off with lines similar to the import statements seen above
Packages
As stated above, NumPy is a Python package
Packages are used by developers to organize a code library
In fact, a package is just a directory containing

1. files with Python code — called modules in Python speak


2. possibly some compiled code that can be accessed by Python (e.g., functions compiled
from C or FORTRAN code)
3. a file called __init__.py that specifies what will be executed when we type import
package_name

In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around
On this machine, it's located in

anaconda3/lib/python3.6/site-packages/numpy

Subpackages
Consider the line x = np.random.randn(100)
Here np refers to the package NumPy, while random is a subpackage of NumPy
You can see the contents here
Subpackages are just packages that are subdirectories of another package
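To see this concretely, a subpackage can also be imported on its own (a small illustrative sketch of our own, not part of the lecture's code):

```python
# Import the random subpackage of NumPy directly
from numpy import random

x = random.randn()      # Same function as np.random.randn()
print(type(x))
```

Either spelling reaches the same function; which one you use is a matter of readability.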

3.4.2 Importing Names Directly

Recall this code that we saw above

In [4]: import numpy as np

np.sqrt(4)

Out[4]: 2.0

Here's another way to access NumPy's square root function



In [5]: from numpy import sqrt

sqrt(4)

Out[5]: 2.0

This is also fine


The advantage is less typing if we use sqrt often in our code
The disadvantage is that, in a long program, these two lines might be separated by many
other lines
Then it's harder for readers to know where sqrt came from, should they wish to

3.5 Alternative Versions

Let's try writing some alternative versions of our first program


Our aim in doing this is to illustrate some more Python syntax and semantics
The programs below are less efficient but

โ€ข help us understand basic constructs like loops


โ€ข illustrate common data types like lists

3.5.1 A Version with a For Loop

Here's a version that illustrates loops and Python lists

In [6]: ts_length = 100
ϵ_values = []   # Empty list

for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

plt.plot(ϵ_values)
plt.show()

In brief,

• The first pair of lines import functionality as before
• The next line sets the desired length of the time series
• The next line creates an empty list called ϵ_values that will store the ϵ_t values as we generate them
• The next three lines are the for loop, which repeatedly draws a new random number ϵ_t and appends it to the end of the list ϵ_values
• The last two lines generate the plot and display it to the user

Let's study some parts of this program in more detail

3.5.2 Lists

Consider the statement ϵ_values = [], which creates an empty list


Lists are a native Python data structure used to group a collection of objects
For example, try

In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)

Out[7]: list

The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)

In [8]: x

Out[8]: [10, 'foo', False]

In [9]: x.append(2.5)
x

Out[9]: [10, 'foo', False, 2.5]

Here append() is what's called a method, which is a function "attached to" an object — in this case, the list x
We'll learn all about methods later on, but just to give you some idea,

โ€ข Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object
โ€ข String objects have string methods, list objects have list methods, etc.

Another useful list method is pop()

In [10]: x

Out[10]: [10, 'foo', False, 2.5]

In [11]: x.pop()

Out[11]: 2.5

In [12]: x

Out[12]: [10, 'foo', False]

The full set of list methods can be found here


Following C, C++, Java, etc., lists in Python are zero-based

In [13]: x

Out[13]: [10, 'foo', False]

In [14]: x[0]

Out[14]: 10

In [15]: x[1]

Out[15]: 'foo'

3.5.3 The For Loop

Now let's consider the for loop from the program above, which was

In [16]: for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)

Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the "block" of code that we are looping over
Unlike most other languages, Python knows the extent of the code block only from indentation
In our program, indentation decreases after the line ϵ_values.append(e), telling Python that this line marks the lower limit of the code block
More on indentation below — for now, let's look at another example of a for loop

In [17]: animals = ['dog', 'cat', 'bird']


for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")

The plural of dog is dogs


The plural of cat is cats
The plural of bird is birds

This example helps to clarify how the for loop works: When we execute a loop of the form

for variable_name in sequence:
    <code block>

The Python interpreter performs the following:

• For each element of the sequence, it "binds" the name variable_name to that element and then executes the code block

The sequence object can in fact be a very general object, as we'll see soon enough
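As a small illustration of our own, strings and tuples are sequences too, so they can appear to the right of in:

```python
# Strings are sequences: the loop steps through each character
for letter in "abc":
    print(letter)

# Tuples are sequences too
total = 0
for value in (10, 20, 30):
    total += value
print(total)
```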

3.5.4 Code Blocks and Indentation

In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a good thing: It

โ€ข forces clean, consistent indentation, improving readability


โ€ข removes clutter, such as the brackets or end statements used in other languages

On the other hand, it takes a bit of care to get right, so please remember:

โ€ข The line before the start of a code block always ends in a colon

– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.

โ€ข All lines in a code block must have the same amount of indentation

• The Python standard is 4 spaces, and that's what you should use

Tabs vs Spaces
One small "gotcha" here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs — try searching online

3.5.5 While Loops

The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let's modify the program above to use a while loop instead

In [18]: ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1
plt.plot(ϵ_values)
plt.show()

Note that

• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1

3.5.6 User-Defined Functions

Now let's go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:

1. A user-defined function that generates a list of random variables

2. The main part of the program that

   1. calls this function to get data
   2. plots the data

This is accomplished in the next program

In [19]: def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100)
plt.plot(data)
plt.show()

Let's go over this carefully, in case you're not familiar with functions and how they work
We have defined a function called generate_data() as follows

• def is a Python keyword used to start function definitions
• def generate_data(n): indicates that the function is called generate_data and that it has a single argument n
• The indented code is a code block called the function body — in this case, it creates an IID list of random draws using the same logic as before
• The return keyword indicates that ϵ_values is the object that should be returned to the calling code

This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100
The net result is that the name data is bound to the list ϵ_values returned by the function

3.5.7 Conditions

Our function generate_data() is rather limited


Let's make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on (0, 1) as required
This is achieved in the next piece of code

In [20]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)
        else:
            e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()

Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks
Notes

โ€ข We are passing the argument U as a string, which is why we write it as 'U'

โ€ข Notice that equality is tested with the == syntax, not =

– For example, the statement a = 10 assigns the name a to the value 10
– The expression a == 10 evaluates to either True or False, depending on the value of a

Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function
To understand this, consider the following version

In [21]: def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)


plt.plot(data)
plt.show()

Now, when we call the function generate_data(), we pass np.random.uniform as the second argument
This object is a function
When the function call generate_data(100, np.random.uniform) is executed,
Python runs the function code block with n equal to 100 and the name generator_type
โ€œboundโ€ to the function np.random.uniform

• While these lines are executed, the names generator_type and np.random.uniform are "synonyms", and can be used in identical ways

This principle works more generally — for example, consider the following piece of code

In [22]: max(7, 2, 4) # max() is a built-in Python function

Out[22]: 7

In [23]: m = max
m(7, 2, 4)

Out[23]: 7

Here we created another name for the built-in function max(), which could then be used in
identical ways
In the context of our program, the ability to bind new names to functions means that there is no problem passing a function as an argument to another function — as we did above

3.5.8 List Comprehensions

We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line

In [24]: animals = ['dog', 'cat', 'bird']


plurals = [animal + 's' for animal in animals]
plurals

Out[24]: ['dogs', 'cats', 'birds']

Here's another example

In [25]: range(8)

Out[25]: range(0, 8)

In [26]: doubles = [2 * x for x in range(8)]


doubles

Out[26]: [0, 2, 4, 6, 8, 10, 12, 14]

With the list comprehension syntax, we can simplify the lines

ϵ_values = []
for i in range(n):
    e = generator_type()
    ϵ_values.append(e)

into

ϵ_values = [generator_type() for i in range(n)]
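As an aside not covered above, list comprehensions can also filter elements with an if clause; a small sketch of our own:

```python
# Keep only the even numbers from 0 to 9
evens = [x for x in range(10) if x % 2 == 0]
print(evens)
```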

3.6 Exercises

3.6.1 Exercise 1

Recall that ๐‘›! is read as โ€œ๐‘› factorialโ€ and defined as ๐‘›! = ๐‘› ร— (๐‘› โˆ’ 1) ร— โ‹ฏ ร— 2 ร— 1


There are functions to compute this in various modules, but let's write our own version as an exercise
In particular, write a function factorial such that factorial(n) returns n! for any positive integer n

3.6.2 Exercise 2

The binomial random variable Y ~ Bin(n, p) represents the number of successes in n binary trials, where each trial succeeds with probability p
Without any import besides from numpy.random import uniform, write a function binomial_rv such that binomial_rv(n, p) generates one draw of Y
Hint: If U is uniform on (0, 1) and p ∈ (0, 1), then the expression U < p evaluates to True with probability p

3.6.3 Exercise 3

Compute an approximation to π using Monte Carlo. Use no imports besides

In [27]: import numpy as np

Your hints are as follows:

โ€ข If ๐‘ˆ is a bivariate uniform random variable on the unit square (0, 1)2 , then the proba-
bility that ๐‘ˆ lies in a subset ๐ต of (0, 1)2 is equal to the area of ๐ต
โ€ข If ๐‘ˆ1 , โ€ฆ , ๐‘ˆ๐‘› are IID copies of ๐‘ˆ , then, as ๐‘› gets large, the fraction that falls in ๐ต, con-
verges to the probability of landing in ๐ต
โ€ข For a circle, area = pi * radius^2

3.6.4 Exercise 4

Write a program that prints one realization of the following random device:

• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing

Use no import besides from numpy.random import uniform

3.6.5 Exercise 5

Your next task is to simulate and plot the correlated time series

๐‘ฅ๐‘ก+1 = ๐›ผ ๐‘ฅ๐‘ก + ๐œ–๐‘ก+1 where ๐‘ฅ0 = 0 and ๐‘ก = 0, โ€ฆ , ๐‘‡

The sequence of shocks {๐œ–๐‘ก } is assumed to be IID and standard normal


In your solution, restrict your import statements to

In [28]: import numpy as np


import matplotlib.pyplot as plt

Set ๐‘‡ = 200 and ๐›ผ = 0.9



3.6.6 Exercise 6

To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea

In [29]: import numpy as np


import matplotlib.pyplot as plt

x = [np.random.randn() for i in range(100)]


plt.plot(x, label="white noise")
plt.legend()
plt.show()

Now, starting with your solution to exercise 5, plot three simulated time series, one for each of the cases α = 0, α = 0.8 and α = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows

(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the α values
Important hints:

• If you call the plot() function multiple times before calling show(), all of the lines you produce will end up on the same figure

  – And if you omit the argument 'b-' to the plot function, Matplotlib will automatically select different colors for each line

โ€ข The expression 'foo' + str(42) evaluates to 'foo42'

3.7 Solutions

3.7.1 Exercise 1
In [30]: def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k

factorial(4)

Out[30]: 24

3.7.2 Exercise 2
In [31]: from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1  # Or count += 1
    return count

binomial_rv(10, 0.5)

Out[31]: 5

3.7.3 Exercise 3

Consider the circle of diameter 1 embedded in the unit square

Let A be its area and let r = 1/2 be its radius
If we know π then we can compute A via A = πr²
But here the point is to compute π, which we can do by π = A/r²
Summary: If we can estimate the area of the unit circle, then dividing by r² = (1/2)² = 1/4 gives an estimate of π
We estimate the area by sampling bivariate uniforms and looking at the fraction that falls
into the unit circle

In [32]: n = 100000

count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n

print(area_estimate * 4)  # dividing by radius**2

3.13976

3.7.4 Exercise 4
In [33]: from numpy.random import uniform

payoff = 0
count = 0

for i in range(10):
    U = uniform()
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)

1

3.7.5 Exercise 5

The next line embeds all subsequent figures in the browser itself

In [34]: α = 0.9
ts_length = 200
current_x = 0

x_values = []
for i in range(ts_length + 1):
    x_values.append(current_x)
    current_x = α * current_x + np.random.randn()
plt.plot(x_values)
plt.show()

3.7.6 Exercise 6

In [35]: αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
4

Python Essentials

4.1 Contents

• Data Types 4.2
• Input and Output 4.3
• Iterating 4.4
• Comparisons and Logical Operators 4.5
• More Functions 4.6
• Coding Style and PEP8 4.7
• Exercises 4.8
• Solutions 4.9

In this lecture, we'll cover features of the language that are essential to reading and writing
Python code

4.2 Data Types

We've already met several built-in Python data types, such as strings, integers, floats and lists
Let's learn a bit more about them

4.2.1 Primitive Data Types

One simple data type is Boolean values, which can be either True or False

In [1]: x = True
x

Out[1]: True


In the next line of code, the interpreter evaluates the expression on the right of = and binds y
to this value

In [2]: y = 100 < 10


y

Out[2]: False

In [3]: type(y)

Out[3]: bool

In arithmetic expressions, True is converted to 1 and False is converted to 0


This is called Boolean arithmetic and is often useful in programming
Here are some examples

In [4]: x + y

Out[4]: 1

In [5]: x * y

Out[5]: 0

In [6]: True + True

Out[6]: 2

In [7]: bools = [True, True, False, True] # List of Boolean values

sum(bools)

Out[7]: 3

The two most common data types used to represent numbers are integers and floats

In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)

Out[8]: int

In [9]: type(c)

Out[9]: float

Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate
As long as you're using Python 3.x, division of integers yields floats

In [10]: 1 / 2

Out[10]: 0.5

But be careful! If you're still using Python 2.x, division of two integers returns only the integer part
For integer division in Python 3.x use this syntax:

In [11]: 1 // 2

Out[11]: 0

Complex numbers are another primitive data type in Python

In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y

Out[12]: 5j

4.2.2 Containers

Python has several basic types for storing collections of (possibly heterogeneous) data
Weโ€™ve already discussed lists
A related data type is tuples, which are "immutable" lists

In [13]: x = ('a', 'b') # Parentheses instead of the square brackets


x = 'a', 'b' # Or no brackets --- the meaning is identical
x

Out[13]: ('a', 'b')

In [14]: type(x)

Out[14]: tuple

In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable

In [15]: x = [1, 2]
x[0] = 10
x

Out[15]: [10, 2]

But tuples are not

In [16]: x = (1, 2)
x[0] = 10

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-16-d1b2647f6c81> in <module>
1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment

We'll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be "unpacked" as follows

In [17]: integers = (10, 20, 30)


x, y, z = integers
x

Out[17]: 10

In [18]: y

Out[18]: 20

You've actually seen an example of this already


Tuple unpacking is convenient and we'll use it often
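One handy consequence, sketched here with our own example: tuple unpacking lets you swap two variables without a temporary:

```python
# Swap two variables in one line via tuple unpacking
x, y = 1, 2
x, y = y, x      # The right-hand side is packed into a tuple, then unpacked
print(x, y)
```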
Slice Notation
To access multiple elements of a list or tuple, you can use Python's slice notation
For example,

In [19]: a = [2, 4, 6, 8]
a[1:]

Out[19]: [4, 6, 8]

In [20]: a[1:3]

Out[20]: [4, 6]

The general rule is that a[m:n] returns n - m elements, starting at a[m]


Negative numbers are also permissible

In [21]: a[-2:] # Last two elements of the list

Out[21]: [6, 8]

The same slice notation works on tuples and strings

In [22]: s = 'foobar'
s[-3:] # Select the last three elements

Out[22]: 'bar'

Sets and Dictionaries


Two other container types we should mention before moving on are sets and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered

In [23]: d = {'name': 'Frodo', 'age': 33}


type(d)

Out[23]: dict

In [24]: d['age']

Out[24]: 33

The names 'name' and 'age' are called the keys


The objects that the keys are mapped to ('Frodo' and 33) are called the values
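As a brief illustration of our own, dictionaries also come with methods for inspecting keys and values, and new entries are added with the same bracket syntax:

```python
d = {'name': 'Frodo', 'age': 33}

# keys() and values() return iterable views of the dictionary
print(list(d.keys()))
print(list(d.values()))

# Adding a new entry
d['height'] = 1.06
print(d['height'])
```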
Sets are unordered collections without duplicates, and set methods provide the usual set-
theoretic operations

In [25]: s1 = {'a', 'b'}


type(s1)

Out[25]: set

In [26]: s2 = {'b', 'c'}


s1.issubset(s2)

Out[26]: False

In [27]: s1.intersection(s2)

Out[27]: {'b'}

The set() function creates sets from sequences

In [28]: s3 = set(('foo', 'bar', 'foo'))


s3

Out[28]: {'bar', 'foo'}

4.3 Input and Output

Let's briefly review reading and writing to text files, starting with writing

In [29]: f = open('newfile.txt', 'w') # Open 'newfile.txt' for writing


f.write('Testing\n') # Here '\n' means new line
f.write('Testing again')
f.close()

Here

โ€ข The built-in function open() creates a file object for writing to


โ€ข Both write() and close() are methods of file objects

Where is this file that we've created?

Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or IPython via

In [30]: %pwd

Out[30]: '/home/anju/Desktop/lecture-source-py/_build/jupyter/executed'

If a path is not specified, then this is where Python writes to


We can also use Python to read the contents of newfile.txt as follows

In [31]: f = open('newfile.txt', 'r')


out = f.read()
out

Out[31]: 'Testing\nTesting again'

In [32]: print(out)

Testing
Testing again

4.3.1 Paths

Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case, you can shift the file to the pwd or specify the full path to the file

f = open('insert_full_path_to_file/newfile.txt', 'r')

4.4 Iterating

One of the most important tasks in computing is stepping through a sequence of data and
performing a given action
One of Pythonโ€™s strengths is its simple, flexible interface to this kind of iteration via the for
loop

4.4.1 Looping over Different Objects

Many Python objects are "iterable", in the sense that they can be looped over
To give an example, let's write the file us_cities.txt, which lists US cities and their population, to the present working directory

In [33]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Overwriting us_cities.txt

Suppose that we want to make the information more readable, by capitalizing names and adding commas to mark thousands
The program us_cities.py reads the data in and makes the conversion:

In [34]: data_file = open('us_cities.txt', 'r')


for line in data_file:
city, population = line.split(':') # Tuple unpacking
city = city.title() # Capitalize city names
population = f'{int(population):,}' # Add commas to numbers
print(city.ljust(15) + population)
data_file.close()

New York 8,244,910


Los Angeles 3,819,702
Chicago 2,707,120
Houston 2,145,146
Philadelphia 1,536,471
Phoenix 1,469,471
San Antonio 1,359,758
San Diego 1,326,179
Dallas 1,223,229

Here the f-string f'{int(population):,}' is used for inserting a formatted value into a string
The reformatting of each line is the result of three different string methods, the details of which can be left till later
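To see these string operations in isolation, here is a small sketch of our own applying them to a single line of the file:

```python
line = 'new york: 8244910'

city, population = line.split(':')    # Split the line at the colon
city = city.title()                   # Capitalize each word
population = f'{int(population):,}'   # Add thousands separators
print(city.ljust(15) + population)    # Left-justify the name in 15 characters
```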
The interesting part of this program for us is line 2, which shows that

1. The file object data_file is iterable, in the sense that it can be placed to the right of in within a for loop
2. Iteration steps through each line in the file

This leads to the clean, convenient syntax shown in our program


Many other kinds of objects are iterable, and we'll discuss some of them later on

4.4.2 Looping without Indices

One thing you might have noticed is that Python tends to favor looping without explicit indexing
For example,

In [35]: x_values = [1, 2, 3] # Some iterable x


for x in x_values:
print(x * x)

1
4
9

is preferred to

In [36]: for i in range(len(x_values)):


print(x_values[i] * x_values[i])

1
4
9

When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code

In [37]: countries = ('Japan', 'Korea', 'China')


cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
print(f'The capital of {country} is {city}')

The capital of Japan is Tokyo


The capital of Korea is Seoul
The capital of China is Beijing

The zip() function is also useful for creating dictionaries — for example

In [38]: names = ['Tom', 'John']


marks = ['E', 'F']
dict(zip(names, marks))

Out[38]: {'Tom': 'E', 'John': 'F'}

If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example

In [39]: letter_list = ['a', 'b', 'c']


for index, letter in enumerate(letter_list):
print(f"letter_list[{index}] = '{letter}'")

letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'

The output of the loop is

In [40]: letter_list[0] = 'a'


letter_list[1] = 'b'
letter_list[2] = 'c'

4.5 Comparisons and Logical Operators

4.5.1 Comparisons

Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False)
A common type is comparisons, such as

In [41]: x, y = 1, 2
x < y

Out[41]: True

In [42]: x > y

Out[42]: False

One of the nice features of Python is that we can chain inequalities

In [43]: 1 < 2 < 3

Out[43]: True

In [44]: 1 <= 2 <= 3

Out[44]: True

As we saw earlier, when testing for equality we use ==

In [45]: x = 1 # Assignment
x == 2 # Comparison

Out[45]: False

For โ€œnot equalโ€ use !=

In [46]: 1 != 2

Out[46]: True

Note that when testing conditions, we can use any valid Python expression

In [47]: x = 'yes' if 42 else 'no'


x

Out[47]: 'yes'

In [48]: x = 'yes' if [] else 'no'


x

Out[48]: 'no'

What's going on here?


The rule is:

• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.) and None are all equivalent to False

  – for example, [] and () are equivalent to False in an if clause

• All other values are equivalent to True

  – for example, 42 is equivalent to True in an if clause
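A quick way to check these rules is to pass objects to the built-in bool(), as in this short sketch:

```python
# bool() reveals the truth value Python assigns to an object
print(bool(0))        # False: zero is falsy
print(bool(42))       # True: nonzero numbers are truthy
print(bool([]))       # False: empty containers are falsy
print(bool('hello'))  # True: nonempty strings are truthy
print(bool(None))     # False: None is falsy
```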

4.5.2 Combining Expressions

We can combine expressions using and, or and not


These are the standard logical connectives (conjunction, disjunction and denial)

In [49]: 1 < 2 and 'f' in 'foo'

Out[49]: True

In [50]: 1 < 2 and 'g' in 'foo'

Out[50]: False

In [51]: 1 < 2 or 'g' in 'foo'

Out[51]: True

In [52]: not True

Out[52]: False

In [53]: not not True

Out[53]: True

Remember

โ€ข P and Q is True if both are True, else False


โ€ข P or Q is False if both are False, else True
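One detail worth knowing is that and and or use short-circuit evaluation: the second operand is only evaluated when the first does not already determine the result. A small sketch:

```python
def noisy(label, value):
    print('evaluating', label)
    return value

# `and` stops at the first falsy operand, so B is never evaluated
result = noisy('A', False) and noisy('B', True)
print(result)   # False

# `or` stops at the first truthy operand, so D is never evaluated
result = noisy('C', True) or noisy('D', False)
print(result)   # True
```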

4.6 More Functions

Let's talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some

In [54]: max(19, 20)

Out[54]: 20

In [55]: range(4) # in python3 this returns a range iterator object

Out[55]: range(0, 4)

In [56]: list(range(4)) # will evaluate the range iterator and create a list

Out[56]: [0, 1, 2, 3]

In [57]: str(22)

Out[57]: '22'

In [58]: type(22)

Out[58]: int

Two more useful built-in functions are any() and all()

In [59]: bools = False, True, True


all(bools) # True if all are True and False otherwise

Out[59]: False

In [60]: any(bools) # False if all are False and True otherwise

Out[60]: True

The full list of Python built-ins is here


Now let's talk some more about user-defined functions constructed using the keyword def

4.6.1 Why Write Functions?

User-defined functions are important for improving the clarity of your code by

โ€ข separating different strands of logic


โ€ข facilitating code reuse

(Writing the same thing twice is almost always a bad idea)


The basics of user-defined functions were discussed here

4.6.2 The Flexibility of Python Functions

As we discussed in the previous lecture, Python functions are very flexible


In particular

โ€ข Any number of functions can be defined in a given file


โ€ข Functions can be (and often are) defined inside other functions
โ€ข Any object can be passed to a function as an argument, including other functions
โ€ข A function can return any kind of object, including functions

We already gave an example of how straightforward it is to pass a function to a function


Note that a function can have arbitrarily many return statements (including zero)
Execution of the function terminates when the first return is hit, allowing code like the following example

In [61]: def f(x):


if x < 0:
return 'negative'
return 'nonnegative'

Functions without a return statement automatically return the special Python object None
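For example, a function whose body only prints returns None, which is easy to verify:

```python
def greet(name):
    print(f"Hello, {name}")   # No return statement

result = greet("world")
print(result is None)   # True: greet returned None
```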

4.6.3 Docstrings

Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this

In [62]: def f(x):


"""
This function squares its argument
"""
return x**2

After running this code, the docstring is available

In [63]: f?

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument

In [64]: f??

Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
"""
This function squares its argument
"""
return x**2

With one question mark we bring up the docstring, and with two we get the source code as
well
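Outside IPython, the same docstring is available programmatically through the function's __doc__ attribute, and help() displays it too:

```python
def f(x):
    """
    This function squares its argument
    """
    return x**2

print(f.__doc__.strip())   # This function squares its argument
help(f)                    # help() also displays the docstring
```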

4.6.4 One-Line Functions: lambda

The lambda keyword is used to create simple functions on one line


For example, the definitions

In [65]: def f(x):


return x**3

and

In [66]: f = lambda x: x**3

are entirely equivalent


To see why lambda is useful, suppose that we want to calculate ∫_0^2 x^3 dx (and have forgotten
our high-school calculus)
The SciPy library has a function called quad that will do this calculation for us
The syntax of the quad function is quad(f, a, b) where f is a function and a and b are
numbers
To create the function f(x) = x^3 we can use lambda as follows

In [67]: from scipy.integrate import quad

quad(lambda x: x**3, 0, 2)

Out[67]: (4.0, 4.440892098500626e-14)

Here the function created by lambda is said to be anonymous because it was never given a name
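Another common use of lambda is supplying a short function as an argument, for example as a sort key:

```python
words = ['banana', 'fig', 'apple']
words.sort(key=lambda w: len(w))   # Sort by string length
print(words)   # ['fig', 'apple', 'banana']
```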

4.6.5 Keyword Arguments

If you did the exercises in the previous lecture, you would have come across the statement

plt.plot(x, 'b-', label="white noise")



In this call to Matplotlib's plot function, notice that the last argument is passed in
name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order

• plot(x, 'b-', label="white noise") is different from plot('b-', x, label="white noise")

Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it's hard to remember the right order
You can adopt keyword arguments in user-defined functions with no difficulty
The next example illustrates the syntax

In [68]: def f(x, a=1, b=1):


return a + b * x

The keyword argument values we supplied in the definition of f become the default values

In [69]: f(2)

Out[69]: 3

They can be modified as follows

In [70]: f(2, a=4, b=5)

Out[70]: 14
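You can also override just one of the defaults while keeping the others in place:

```python
def f(x, a=1, b=1):
    return a + b * x

print(f(2, b=5))   # a keeps its default 1, so 1 + 5*2 = 11
print(f(2, a=0))   # b keeps its default 1, so 0 + 1*2 = 2
```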

4.7 Coding Style and PEP8

To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We've all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true

• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page

In Python, the standard style is set out in PEP8


(Occasionally we'll deviate from PEP8 in these lectures to better match mathematical notation)

4.8 Exercises

Solve the following exercises


(For some, the built-in function sum() comes in handy)
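As a refresher, sum() adds up the elements of any iterable, including generator expressions:

```python
print(sum([1, 2, 3]))                 # 6
print(sum(x**2 for x in range(4)))    # 0 + 1 + 4 + 9 = 14
```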

4.8.1 Exercise 1

Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip()
Part 2: In one line, count the number of even numbers in 0,…,99

โ€ข Hint: x % 2 returns 0 if x is even, 1 otherwise

Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even

4.8.2 Exercise 2

Consider the polynomial

๐‘›
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘› ๐‘ฅ๐‘› = โˆ‘ ๐‘Ž๐‘– ๐‘ฅ๐‘– (1)
๐‘–=0

Write a function p(x, coeff) that computes the value in Eq. (1) given a point
x and a list of coefficients coeff
Try to use enumerate() in your loop

4.8.3 Exercise 3

Write a function that takes a string as an argument and returns the number of capital letters
in the string
Hint: 'foo'.upper() returns 'FOO'

4.8.4 Exercise 4

Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False

• By "sequence" we mean a list, a tuple or a string


โ€ข Do the exercise without using sets and set methods

4.8.5 Exercise 5

When we cover the numerical libraries, we will see they include many alternatives for interpolation and function approximation

Nevertheless, let's write our own function approximation routine as an exercise


In particular, without using any imports, write a function linapprox that takes as arguments

• A function f mapping some interval [a, b] into ℝ
• Two scalars a and b providing the limits of this interval
• An integer n determining the number of grid points
• A number x satisfying a <= x <= b

and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency

4.9 Solutions

4.9.1 Exercise 1

Part 1 Solution:
Here's one possible solution

In [71]: x_vals = [1, 2, 3]


y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])

Out[71]: 6

This also works

In [72]: sum(x * y for x, y in zip(x_vals, y_vals))

Out[72]: 6

Part 2 Solution:
One solution is

In [73]: sum([x % 2 == 0 for x in range(100)])

Out[73]: 50

This also works:

In [74]: sum(x % 2 == 0 for x in range(100))

Out[74]: 50

Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are

In [75]: len([x for x in range(100) if x % 2 == 0])

Out[75]: 50

and

In [76]: sum([1 for x in range(100) if x % 2 == 0])

Out[76]: 50

Part 3 Solution:
Here's one possibility

In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])

Out[77]: 2

4.9.2 Exercise 2
In [78]: def p(x, coeff):
return sum(a * x**i for i, a in enumerate(coeff))

In [79]: p(1, (2, 4))

Out[79]: 6

4.9.3 Exercise 3

Here's one solution:

In [80]: def f(string):


count = 0
for letter in string:
if letter == letter.upper() and letter.isalpha():
count += 1
return count
f('The Rain in Spain')

Out[80]: 3

4.9.4 Exercise 4

Here's a solution:

In [81]: def f(seq_a, seq_b):


is_subset = True
for a in seq_a:
if a not in seq_b:
is_subset = False
return is_subset

# == test == #

print(f([1, 2], [1, 2, 3]))


print(f([1, 2, 3], [1, 2]))

True
False

Of course, if we use the set data type then the solution is easier

In [82]: def f(seq_a, seq_b):


return set(seq_a).issubset(set(seq_b))
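As a quick check of the set-based version, note that the set comparison operator <= tests the same subset relation as issubset():

```python
def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))

print(f([1, 2], [1, 2, 3]))      # True
print(f([1, 2, 3], [1, 2]))      # False
print(set('ab') <= set('abc'))   # The <= operator tests the same relation: True
```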

4.9.5 Exercise 5
In [83]: def linapprox(f, a, b, n, x):
"""
Evaluates the piecewise linear interpolant of f at x on the interval
[a, b], with n evenly spaced grid points.

Parameters
===========
f : function
The function to approximate

x, a, b : scalars (floats or integers)


Evaluation point and endpoints, with a <= x <= b

n : integer
Number of grid points

Returns
=========
A float. The interpolant evaluated at x

"""
length_of_interval = b - a
num_subintervals = n - 1
step = length_of_interval / num_subintervals

# === find first grid point larger than x === #


point = a
while point <= x:
point += step

# === x must lie between the gridpoints (point - step) and point === #
u, v = point - step, point

return f(u) + (x - u) * (f(v) - f(u)) / (v - u)


5

OOP I: Introduction to Object Oriented Programming

5.1 Contents

โ€ข Overview 5.2

โ€ข Objects 5.3

โ€ข Summary 5.4

5.2 Overview

OOP is one of the major paradigms in programming


The traditional programming paradigm (think Fortran, C, MATLAB, etc.) is called procedural
It works as follows

โ€ข The program has a state corresponding to the values of its variables


โ€ข Functions are called to act on these data
โ€ข Data are passed back and forth via function calls

In contrast, in the OOP paradigm

โ€ข data and functions are โ€œbundled togetherโ€ into โ€œobjectsโ€

(Functions in this context are referred to as methods)
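To make the contrast concrete, here is a toy sketch of the two styles (the Account class and all names here are invented for illustration):

```python
# Procedural style: data and functions live separately,
# and data is passed back and forth via function calls
balance = 100
def deposit(balance, amount):
    return balance + amount
balance = deposit(balance, 50)

# OOP style: data and behavior bundled together in an object
class Account:
    def __init__(self, balance):
        self.balance = balance
    def deposit(self, amount):   # A method acting on the object's own data
        self.balance += amount

acct = Account(100)
acct.deposit(50)
print(balance, acct.balance)   # 150 150
```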

5.2.1 Python and OOP

Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach
However, at a foundational level, Python is object-oriented


In particular, in Python, everything is an object


In this lecture, we explain what that statement means and why it matters

5.3 Objects

In Python, an object is a collection of data and instructions held in computer memory that
consists of

1. a type
2. a unique identity
3. data (i.e., content)
4. methods

These concepts are defined and discussed sequentially below

5.3.1 Type

Python provides for different types of objects, to accommodate different categories of data
For example

In [1]: s = 'This is a string'


type(s)

Out[1]: str

In [2]: x = 42 # Now let's create an integer


type(x)

Out[2]: int

The type of an object matters for many expressions


For example, the addition operator between two strings means concatenation

In [3]: '300' + 'cc'

Out[3]: '300cc'

On the other hand, between two numbers it means ordinary addition

In [4]: 300 + 400

Out[4]: 700

Consider the following expression

In [5]: '300' + 400



---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400

TypeError: can only concatenate str (not "int") to str

Here we are mixing types, and it's unclear to Python whether the user wants to

โ€ข convert '300' to an integer and then add it to 400, or


โ€ข convert 400 to string and then concatenate it with '300'

Some languages might try to guess but Python is strongly typed

โ€ข Type is important, and implicit type conversion is rare


โ€ข Python will respond instead by raising a TypeError

To avoid the error, you need to clarify by changing the relevant type
For example,

In [6]: int('300') + 400 # To add as numbers, change the string to an integer

Out[6]: 700
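Conversely, to treat both operands as strings, convert the integer with str():

```python
print('300' + str(400))   # Concatenates as strings: '300400'
```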

5.3.2 Identity

In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object
The identity of an object can be obtained via the id() function

In [7]: y = 2.5
z = 2.5
id(y)

Out[7]: 140535456630128

In [8]: id(z)

Out[8]: 140535456630080

In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object
The identity of an object is in fact just the address of the object in memory
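Note that equal values do not imply equal identity: the is operator compares identities, while == compares values. A small sketch (the floats are built at run-time with float() so that, in CPython at least, they are guaranteed to be distinct objects):

```python
y = float('2.5')
z = float('2.5')

print(y == z)           # True: same value
print(y is z)           # False: two distinct objects in memory
print(id(y) == id(z))   # False: `is` compares exactly these ids
```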

5.3.3 Object Content: Data and Attributes

If we set x = 42 then we create an object of type int that contains the data 42
In fact, it contains more, as the following example shows

In [9]: x = 42
x

Out[9]: 42

In [10]: x.imag

Out[10]: 0

In [11]: x.__class__

Out[11]: int

When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type
Any name following a dot is called an attribute of the object to the left of the dot

• e.g., imag and __class__ are attributes of x

We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so letโ€™s discuss them in-depth

5.3.4 Methods

Methods are functions that are bundled with objects


Formally, methods are attributes of objects that are callable (i.e., can be called as functions)

In [12]: x = ['foo', 'bar']


callable(x.append)

Out[12]: True

In [13]: callable(x.__doc__)

Out[13]: False

Methods typically act on the data contained in the object they belong to, or combine that
data with other data

In [14]: x = ['a', 'b']


x.append('c')
s = 'This is a string'
s.upper()

Out[14]: 'THIS IS A STRING'

In [15]: s.lower()

Out[15]: 'this is a string'

In [16]: s.replace('This', 'That')

Out[16]: 'That is a string'

A great deal of Python functionality is organized around method calls


For example, consider the following piece of code

In [17]: x = ['a', 'b']


x[0] = 'aa' # Item assignment using square bracket notation
x

Out[17]: ['aa', 'b']

It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows

In [18]: x = ['a', 'b']


x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
x

Out[18]: ['aa', 'b']

(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does something totally different)
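The same correspondence holds for reading elements: square bracket access calls the __getitem__ method under the hood:

```python
x = ['a', 'b']
print(x[0])               # 'a'
print(x.__getitem__(0))   # Equivalent method call: 'a'
```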

5.4 Summary

In Python, everything in memory is treated as an object


This includes not just lists, strings, etc., but also less obvious things, such as

โ€ข functions (once they have been read into memory)


โ€ข modules (ditto)
โ€ข files opened for reading or writing
โ€ข integers, etc.

Consider, for example, functions


When Python reads a function definition, it creates a function object and stores it in memory
The following code illustrates

In [19]: def f(x): return x**2


f

Out[19]: <function __main__.f(x)>

In [20]: type(f)

Out[20]: function

In [21]: id(f)

Out[21]: 140535456543336

In [22]: f.__name__

Out[22]: 'f'

We can see that f has type, identity, attributes and so on — just like any other object
It also has methods
One example is the __call__ method, which just evaluates the function

In [23]: f.__call__(3)

Out[23]: 9

Another is the __dir__ method, which returns a list of attributes
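For instance, the built-in dir() (which uses __dir__ internally) lists the attributes we have been discussing:

```python
def f(x):
    return x**2

attrs = dir(f)
print('__call__' in attrs, '__doc__' in attrs, '__name__' in attrs)   # True True True
```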


Modules loaded into memory are also treated as objects

In [24]: import math

id(math)

Out[24]: 140535632790936

This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent
Part II

The Scientific Libraries

6

NumPy

6.1 Contents

โ€ข Overview 6.2

โ€ข Introduction to NumPy 6.3

โ€ข NumPy Arrays 6.4

โ€ข Operations on Arrays 6.5

โ€ข Additional Functionality 6.6

โ€ข Exercises 6.7

โ€ข Solutions 6.8

"Let's be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results." – Michael Crichton

6.2 Overview

NumPy is a first-rate library for numerical programming

โ€ข Widely used in academia, finance and industry


โ€ข Mature, fast, stable and under continuous development

In this lecture, we introduce NumPy arrays and the fundamental array processing operations
provided by NumPy

6.2.1 References

โ€ข The official NumPy documentation


6.3 Introduction to NumPy

The essential problem that NumPy solves is fast array processing


For example, suppose we want to create an array of 1 million random draws from a uniform
distribution and compute the mean
If we did this in pure Python it would be orders of magnitude slower than C or Fortran
This is because

โ€ข Loops in Python over Python data types like lists carry significant overhead
โ€ข C and Fortran code contains a lot of type information that can be used for optimization
โ€ข Various optimizations can be carried out during compilation when the compiler sees the
instructions as a whole

However, for a task like the one described above, there's no need to switch back to C or Fortran
Instead, we can use NumPy, where the instructions look like this:

In [1]: import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x.mean()

Out[1]: 0.5004892850074708

The operations of creating the array and computing its mean are both passed out to carefully
optimized machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to Matlab, which provides an interface to fast Fortran routines

6.3.1 A Comment on Vectorization

NumPy is great for operations that are naturally vectorized


Vectorized operations are precompiled routines that can be sent in batches, like

โ€ข matrix multiplication and other linear algebra routines


โ€ข generating a vector of random numbers
โ€ข applying a fixed transformation (e.g., sine or cosine) to an entire array

In a later lecture, weโ€™ll discuss code that isnโ€™t easy to vectorize and how such routines can
also be optimized
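To make the comparison concrete, here is a sketch contrasting a pure Python loop with the equivalent vectorized call. Both compute the same mean; the NumPy version is typically orders of magnitude faster (actual timings vary by machine):

```python
import numpy as np

x = np.random.uniform(0, 1, size=1_000_000)

# Pure Python loop: each iteration pays interpreter overhead
total = 0.0
for value in x:
    total += value
loop_mean = total / len(x)

# Vectorized: a single call into optimized compiled code
vec_mean = x.mean()

print(abs(loop_mean - vec_mean) < 1e-8)   # True: same result
```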

6.4 NumPy Arrays

The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray

NumPy arrays power a large proportion of the scientific Python ecosystem


To create a NumPy array containing only zeros we use np.zeros

In [2]: a = np.zeros(3)
a

Out[2]: array([0., 0., 0.])

In [3]: type(a)

Out[3]: numpy.ndarray

NumPy arrays are somewhat like native Python lists, except that

โ€ข Data must be homogeneous (all elements of the same type)


โ€ข These types must be one of the data types (dtypes) provided by NumPy

The most important of these dtypes are:

โ€ข float64: 64 bit floating-point number


โ€ข int64: 64 bit integer
โ€ข bool: 8 bit True or False

There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64

In [4]: a = np.zeros(3)
type(a[0])

Out[4]: numpy.float64

If we want to use integers we can specify as follows:

In [5]: a = np.zeros(3, dtype=int)


type(a[0])

Out[5]: numpy.int64

6.4.1 Shape and Dimension

Consider the following assignment

In [6]: z = np.zeros(10)

Here z is a flat array with no dimension — neither row nor column vector
The dimension is recorded in the shape attribute, which is a tuple

In [7]: z.shape

Out[7]: (10,)

Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma)
To give it dimension, we can change the shape attribute

In [8]: z.shape = (10, 1)


z

Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])

In [9]: z = np.zeros(4)
z.shape = (2, 2)
z

Out[9]: array([[0., 0.],


[0., 0.]])

In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z = np.zeros((2, 2))

6.4.2 Creating Arrays

As we've seen, the np.zeros function creates an array of zeros


You can probably guess what np.ones creates
Related is np.empty, which creates arrays in memory that can later be populated with data

In [10]: z = np.empty(3)
z

Out[10]: array([0., 0., 0.])

The numbers you see here are garbage values


(Python allocates 3 contiguous 64 bit pieces of memory, and the existing contents of those
memory slots are interpreted as float64 values)
To set up a grid of evenly spaced numbers use np.linspace

In [11]: z = np.linspace(2, 4, 5) # From 2 to 4, with 5 elements

To create an identity matrix use either np.identity or np.eye

In [12]: z = np.identity(2)
z

Out[12]: array([[1., 0.],


[0., 1.]])

In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array

In [13]: z = np.array([10, 20]) # ndarray from Python list


z

Out[13]: array([10, 20])

In [14]: type(z)

Out[14]: numpy.ndarray

In [15]: z = np.array((10, 20), dtype=float) # Here 'float' is equivalent to 'np.float64'


z

Out[15]: array([10., 20.])

In [16]: z = np.array([[1, 2], [3, 4]]) # 2D array from a list of lists


z

Out[16]: array([[1, 2],


[3, 4]])

See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array

In [17]: na = np.linspace(10, 20, 2)


na is np.asarray(na) # Does not copy NumPy arrays

Out[17]: True

In [18]: na is np.array(na) # Does make a new copy --- perhaps unnecessarily

Out[18]: False

To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt — see the documentation for details
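A quick self-contained sketch, using an in-memory text buffer in place of a file on disk:

```python
import io
import numpy as np

# np.loadtxt accepts any file-like object, so StringIO stands in for a file
data = np.loadtxt(io.StringIO("1 2\n3 4"))
print(data.shape)   # (2, 2)
```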

6.4.3 Array Indexing

For a flat array, indexing is the same as Python sequences:

In [19]: z = np.linspace(1, 2, 5)
z

Out[19]: array([1. , 1.25, 1.5 , 1.75, 2. ])

In [20]: z[0]

Out[20]: 1.0

In [21]: z[0:2] # Two elements, starting at element 0

Out[21]: array([1. , 1.25])

In [22]: z[-1]

Out[22]: 2.0

For 2D arrays the index syntax is as follows:

In [23]: z = np.array([[1, 2], [3, 4]])


z

Out[23]: array([[1, 2],


[3, 4]])

In [24]: z[0, 0]

Out[24]: 1

In [25]: z[0, 1]

Out[25]: 2

And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows

In [26]: z[0, :]

Out[26]: array([1, 2])

In [27]: z[:, 1]

Out[27]: array([2, 4])

NumPy arrays of integers can also be used to extract elements

In [28]: z = np.linspace(2, 4, 5)
z

Out[28]: array([2. , 2.5, 3. , 3.5, 4. ])

In [29]: indices = np.array((0, 2, 3))


z[indices]

Out[29]: array([2. , 3. , 3.5])

Finally, an array of dtype bool can be used to extract elements

In [30]: z

Out[30]: array([2. , 2.5, 3. , 3.5, 4. ])

In [31]: d = np.array([0, 1, 1, 0, 0], dtype=bool)


d

Out[31]: array([False, True, True, False, False])

In [32]: z[d]

Out[32]: array([2.5, 3. ])

We'll see why this is useful below


An aside: all elements of an array can be set equal to one number using slice notation

In [33]: z = np.empty(3)
z

Out[33]: array([2. , 3. , 3.5])

In [34]: z[:] = 42
z

Out[34]: array([42., 42., 42.])

6.4.4 Array Methods

Arrays have useful methods, all of which are carefully optimized

In [35]: a = np.array((4, 3, 2, 1))


a

Out[35]: array([4, 3, 2, 1])

In [36]: a.sort() # Sorts a in place


a

Out[36]: array([1, 2, 3, 4])

In [37]: a.sum() # Sum

Out[37]: 10

In [38]: a.mean() # Mean

Out[38]: 2.5

In [39]: a.max() # Max

Out[39]: 4

In [40]: a.argmax() # Returns the index of the maximal element



Out[40]: 3

In [41]: a.cumsum() # Cumulative sum of the elements of a

Out[41]: array([ 1, 3, 6, 10])

In [42]: a.cumprod() # Cumulative product of the elements of a

Out[42]: array([ 1, 2, 6, 24])

In [43]: a.var() # Variance

Out[43]: 1.25

In [44]: a.std() # Standard deviation

Out[44]: 1.118033988749895

In [45]: a.shape = (2, 2)


a.T # Equivalent to a.transpose()

Out[45]: array([[1, 3],


[2, 4]])

Another method worth knowing is searchsorted()


If z is a nondecreasing array, then z.searchsorted(a) returns the index of the first element of z that is >= a

In [46]: z = np.linspace(2, 4, 5)
z

Out[46]: array([2. , 2.5, 3. , 3.5, 4. ])

In [47]: z.searchsorted(2.2)

Out[47]: 1

Many of the methods discussed above have equivalent functions in the NumPy namespace

In [48]: a = np.array((4, 3, 2, 1))

In [49]: np.sum(a)

Out[49]: 10

In [50]: np.mean(a)

Out[50]: 2.5

6.5 Operations on Arrays

6.5.1 Arithmetic Operations

The operators +, -, *, / and ** all act elementwise on arrays

In [51]: a = np.array([1, 2, 3, 4])


b = np.array([5, 6, 7, 8])
a + b

Out[51]: array([ 6, 8, 10, 12])

In [52]: a * b

Out[52]: array([ 5, 12, 21, 32])

We can add a scalar to each element as follows

In [53]: a + 10

Out[53]: array([11, 12, 13, 14])

Scalar multiplication is similar

In [54]: a * 10

Out[54]: array([10, 20, 30, 40])

The two-dimensional arrays follow the same general rules

In [55]: A = np.ones((2, 2))


B = np.ones((2, 2))
A + B

Out[55]: array([[2., 2.],


[2., 2.]])

In [56]: A + 10

Out[56]: array([[11., 11.],


[11., 11.]])

In [57]: A * B

Out[57]: array([[1., 1.],


[1., 1.]])

In particular, A * B is not the matrix product, it is an element-wise product



6.5.2 Matrix Multiplication

With Anaconda's scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:

In [58]: A = np.ones((2, 2))


B = np.ones((2, 2))
A @ B

Out[58]: array([[2., 2.],


[2., 2.]])

(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays

In [59]: A = np.array((1, 2))


B = np.array((10, 20))
A @ B

Out[59]: 50

In fact, we can use @ when one element is a Python list or tuple

In [60]: A = np.array(((1, 2), (3, 4)))


A

Out[60]: array([[1, 2],


[3, 4]])

In [61]: A @ (0, 1)

Out[61]: array([2, 4])

Since we are post-multiplying, the tuple is treated as a column vector
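Conversely, when pre-multiplying, a list or tuple is treated as a row vector:

```python
import numpy as np

A = np.array(((1, 2), (3, 4)))
print((0, 1) @ A)   # Row vector times matrix: [3 4]
```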

6.5.3 Mutability and Copying Arrays

NumPy arrays are mutable data types, like Python lists


In other words, their contents can be altered (mutated) in memory after initialization
We already saw examples above
Here's another example:

In [62]: a = np.array([42, 44])


a

Out[62]: array([42, 44])

In [63]: a[-1] = 0 # Change last element to 0


a

Out[63]: array([42, 0])



Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)

In [64]: a = np.random.randn(3)
a

Out[64]: array([ 1.05287718, -0.90366748, -1.51731058])

In [65]: b = a
b[0] = 0.0
a

Out[65]: array([ 0. , -0.90366748, -1.51731058])

What's happened is that we have changed a by changing b


The name b is bound to a and becomes just another reference to the array (the Python assignment model is described in more detail later in the course)
Hence, it has equal rights to make changes to that array
This is in fact the most sensible default behavior!
It means that we pass around only pointers to data, rather than making copies
Making copies is expensive in terms of both speed and memory
Making Copies
It is of course possible to make b an independent copy of a when required
This can be done using np.copy

In [66]: a = np.random.randn(3)
a

Out[66]: array([-0.19842005, 0.08435544, -0.34056112])

In [67]: b = np.copy(a)
b

Out[67]: array([-0.19842005, 0.08435544, -0.34056112])

Now b is an independent copy (called a deep copy)

In [68]: b[:] = 1
b

Out[68]: array([1., 1., 1.])

In [69]: a

Out[69]: array([-0.19842005, 0.08435544, -0.34056112])

Note that the change to b has not affected a



6.6 Additional Functionality

Let's look at some other useful things we can do with NumPy

6.6.1 Vectorized Functions

NumPy provides versions of the standard functions log, exp, sin, etc. that act element-wise on arrays

In [70]: z = np.array([1, 2, 3])


np.sin(z)

Out[70]: array([0.84147098, 0.90929743, 0.14112001])

This eliminates the need for explicit element-by-element loops such as

In [71]: n = len(z)
y = np.empty(n)
for i in range(n):
y[i] = np.sin(z[i])

Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for "universal functions"
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions

In [72]: z

Out[72]: array([1, 2, 3])

In [73]: (1 / np.sqrt(2 * np.pi)) * np.exp(- 0.5 * z**2)

Out[73]: array([0.24197072, 0.05399097, 0.00443185])

Not all user-defined functions will act element-wise


For example, passing the function f defined below a NumPy array causes a ValueError

In [74]: def f(x):


return 1 if x > 0 else 0

The NumPy function np.where provides a vectorized alternative:

In [75]: x = np.random.randn(4)
x

Out[75]: array([ 1.61695912, -0.70388772, 0.17046687, 0.89294672])

In [76]: np.where(x > 0, 1, 0) # Insert 1 if x > 0 true, otherwise 0

Out[76]: array([1, 0, 1, 1])



You can also use np.vectorize to vectorize a given function

In [77]: def f(x): return 1 if x > 0 else 0

f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example

Out[77]: array([1, 0, 1, 1])

However, this approach doesn't always obtain the same speed as a more carefully crafted vectorized function

6.6.2 Comparisons

As a rule, comparisons on arrays are done element-wise

In [78]: z = np.array([2, 3])


y = np.array([2, 3])
z == y

Out[78]: array([ True, True])

In [79]: y[0] = 5
z == y

Out[79]: array([False, True])

In [80]: z != y

Out[80]: array([ True, False])

The situation is similar for >, <, >= and <=


We can also do comparisons against scalars

In [81]: z = np.linspace(0, 10, 5)


z

Out[81]: array([ 0. , 2.5, 5. , 7.5, 10. ])

In [82]: z > 3

Out[82]: array([False, False, True, True, True])

This is particularly useful for conditional extraction

In [83]: b = z > 3
b

Out[83]: array([False, False, True, True, True])

In [84]: z[b]

Out[84]: array([ 5. , 7.5, 10. ])

Of course we can — and frequently do — perform this in one step

In [85]: z[z > 3]

Out[85]: array([ 5. , 7.5, 10. ])
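Boolean indexing also works on the left-hand side of an assignment, which gives a concise way to modify elements conditionally:

```python
import numpy as np

z = np.linspace(0, 10, 5)
z[z > 3] = 0   # Zero out every element greater than 3
print(z)       # Array contents: 0, 2.5, 0, 0, 0
```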



6.6.3 Sub-packages

NumPy provides some additional functionality related to scientific programming through its
sub-packages
We've already seen how we can generate random variables using np.random

In [86]: z = np.random.randn(10000) # Generate standard normals


y = np.random.binomial(10, 0.5, size=1000) # 1,000 draws from Bin(10, 0.5)
y.mean()

Out[86]: 5.034

Another commonly used subpackage is np.linalg

In [87]: A = np.array([[1, 2], [3, 4]])

np.linalg.det(A) # Compute the determinant

Out[87]: -2.0000000000000004

In [88]: np.linalg.inv(A) # Compute the inverse

Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])

Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy
Weโ€™ll cover the SciPy versions in more detail soon
For a comprehensive list of whatโ€™s available in NumPy see this documentation

6.7 Exercises

6.7.1 Exercise 1

Consider the polynomial expression

๐‘
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘ ๐‘ฅ๐‘ = โˆ‘ ๐‘Ž๐‘› ๐‘ฅ๐‘› (1)
๐‘›=0

Earlier, you wrote a simple function p(x, coeff) to evaluate Eq. (1) without considering efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don't use this class)

โ€ข Hint: Use np.cumprod()



6.7.2 Exercise 2

Let q be a NumPy array of length n with q.sum() == 1


Suppose that q represents a probability mass function
We wish to generate a discrete random variable x such that P{x = i} = q_i
In other words, x takes values in range(len(q)) and x = i with probability q[i]
The standard (inverse transform) algorithm is as follows:

• Divide the unit interval [0, 1] into n subintervals I_0, I_1, …, I_{n−1} such that the length of I_i is q_i
• Draw a uniform random variable U on [0, 1] and return the i such that U ∈ I_i

The probability of drawing i is the length of I_i, which is equal to q_i


We can implement the algorithm as follows

In [89]: from random import uniform

def sample(q):
a = 0.0
U = uniform(0, 1)
for i in range(len(q)):
if a < U <= a + q[i]:
return i
a = a + q[i]

If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]. It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops

โ€ข Hint: Use np.searchsorted and np.cumsum

If you can, implement the functionality as a class called DiscreteRV, where

โ€ข the data for an instance of the class is the vector of probabilities q


• the class has a draw() method, which returns one draw according to the algorithm described above

If you can, write the method so that draw(k) returns k draws from q

6.7.3 Exercise 3

Recall our earlier discussion of the empirical cumulative distribution function


Your task is to

1. Make the __call__ method more efficient using NumPy


2. Add a method that plots the ECDF over [a, b], where a and b are method parameters

6.8 Solutions
In [90]: import matplotlib.pyplot as plt
%matplotlib inline

6.8.1 Exercise 1

This code does the job

In [91]: def p(x, coef):


X = np.empty(len(coef))
X[0] = 1
X[1:] = x
y = np.cumprod(X) # y = [1, x, x**2,...]
return coef @ y

Letโ€™s test it

In [92]: coef = np.ones(3)


print(coef)
print(p(1, coef))
# For comparison
q = np.poly1d(coef)
print(q(1))

[1. 1. 1.]
3.0
3.0

6.8.2 Exercise 2

Hereโ€™s our first pass at a solution:

In [93]: from numpy import cumsum


from numpy.random import uniform

class DiscreteRV:
"""
Generates an array of draws from a discrete random variable with vector of
probabilities given by q.
"""

def __init__(self, q):


"""
The argument q is a NumPy array, or array like, nonnegative and sums
to 1
"""
self.q = q
self.Q = cumsum(q)

def draw(self, k=1):


"""
Returns k draws from q. For each such draw, the value i is returned
with probability q[i].
"""
return self.Q.searchsorted(uniform(0, 1, size=k))

The logic is not obvious, but if you take your time and read it slowly, you will understand
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by

In [94]: q = (0.1, 0.9)


d = DiscreteRV(q)
d.q = (0.5, 0.5)

The problem is that Q does not change accordingly, and Q is the data used in the draw
method
To deal with this, one option is to compute Q every time the draw method is called
But this is inefficient relative to computing Q once-off
A better option is to use descriptors
A solution from the quantecon library using descriptors that behaves as we desire can be
found here
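In the meantime, here is one way to sketch the idea using Python's built-in property (which is itself implemented via the descriptor protocol): recompute Q inside a setter, so that reassigning q keeps the two in sync. This is an illustrative variant of the DiscreteRV class above, not the quantecon implementation

```python
import numpy as np

class DiscreteRV:
    """As before, but the cumulative sum Q is kept in sync with q."""

    def __init__(self, q):
        self.q = q                      # routed through the property setter below

    @property
    def q(self):
        return self._q

    @q.setter
    def q(self, values):
        # Recompute the cumulative sum whenever q is (re)assigned
        self._q = np.asarray(values)
        self._Q = np.cumsum(self._q)

    def draw(self, k=1):
        return self._Q.searchsorted(np.random.uniform(0, 1, size=k))

d = DiscreteRV((0.1, 0.9))
d.q = (0.5, 0.5)                        # the cumulative sum is now recomputed automatically
```
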

6.8.3 Exercise 3

An example solution is given below


In essence, weโ€™ve just taken this code from QuantEcon and added in a plot method

In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method

"""

class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.

Parameters
----------
observations : array_like
An array of observations

Attributes
----------
observations : array_like
An array of observations

"""

def __init__(self, observations):


self.observations = np.asarray(observations)

def __call__(self, x):


"""
Evaluates the ecdf at x

Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated

Returns
-------
scalar(float)
Fraction of the sample less than x

"""
return np.mean(self.observations <= x)

def plot(self, a=None, b=None):


"""
Plot the ecdf on the interval [a, b].

Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval

"""

# === choose reasonable interval if [a, b] not specified === #


if a is None:
a = self.observations.min() - self.observations.std()
if b is None:
b = self.observations.max() + self.observations.std()

# === generate plot === #


x_vals = np.linspace(a, b, num=100)
f = np.vectorize(self.__call__)
plt.plot(x_vals, f(x_vals))
plt.show()

Hereโ€™s an example of usage

In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
7 Matplotlib

7.1 Contents

โ€ข Overview 7.2

โ€ข The APIs 7.3

โ€ข More Features 7.4

โ€ข Further Reading 7.5

โ€ข Exercises 7.6

โ€ข Solutions 7.7

7.2 Overview

Weโ€™ve already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with

โ€ข high-quality 2D and 3D plots


โ€ข output in all the usual formats (PDF, PNG, etc.)
โ€ข LaTeX integration
โ€ข fine-grained control over all aspects of presentation
โ€ข animation, etc.

7.2.1 Matplotlibโ€™s Split Personality

Matplotlib is unusual in that it offers two different interfaces to plotting


One is a simple MATLAB-style API (Application Programming Interface) that was written to
help MATLAB refugees find a ready home
The other is a more โ€œPythonicโ€ object-oriented API
For reasons described below, we recommend that you use the second API
But first, letโ€™s discuss the difference


7.3 The APIs

7.3.1 The MATLAB-style API

Hereโ€™s the kind of easy example you might find in introductory treatments

In [1]: import matplotlib.pyplot as plt


%matplotlib inline
import numpy as np

x = np.linspace(0, 10, 200)


y = np.sin(x)

plt.plot(x, y, 'b-', linewidth=2)


plt.show()

This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line)
This leads us to the alternative, object-oriented Matplotlib API

7.3.2 The Object-Oriented API

Hereโ€™s the code corresponding to the preceding figure using the object-oriented API

In [2]: fig, ax = plt.subplots()


ax.plot(x, y, 'b-', linewidth=2)
plt.show()

Here the call fig, ax = plt.subplots() returns a pair, where

โ€ข fig is a Figure instanceโ€”like a blank canvas


โ€ข ax is an AxesSubplot instanceโ€”think of a frame for plotting in

The plot() function is actually a method of ax


While thereโ€™s a bit more typing, the more explicit use of objects gives us better control
This will become more clear as we go along

7.3.3 Tweaks

Here weโ€™ve changed the line to red and added a legend

In [3]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()

Weโ€™ve also used alpha to make the line slightly transparentโ€”which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center')

In [4]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

If everything is properly configured, then adding LaTeX is trivial



In [5]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()

Controlling the ticks, adding titles and so on is also straightforward

In [6]: fig, ax = plt.subplots()


ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()

7.4 More Features

Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them
We mention just a few

7.4.1 Multiple Plots on One Axis

Itโ€™s straightforward to generate multiple plots on the same axes


Hereโ€™s an example that randomly generates three normal densities and adds a label with their
mean

In [7]: from scipy.stats import norm


from random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()

7.4.2 Multiple Subplots

Sometimes we want multiple subplots in one figure



Hereโ€™s an example that generates 6 histograms

In [8]: num_rows, num_cols = 3, 2


fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
for i in range(num_rows):
for j in range(num_cols):
m, s = uniform(-1, 1), uniform(1, 2)
x = norm.rvs(loc=m, scale=s, size=100)
axes[i, j].hist(x, alpha=0.6, bins=20)
t = f'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
plt.show()

7.4.3 3D Plots

Matplotlib does a nice job of 3D plots โ€” here is one example

In [9]: from mpl_toolkits.mplot3d.axes3d import Axes3D


from matplotlib import cm

def f(x, y):


return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)


ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
y,
f(x, y),
rstride=2, cstride=2,
cmap=cm.jet,
alpha=0.7,
linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()

7.4.4 A Customizing Function

Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid

Hereโ€™s a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow whatโ€™s going on

In [10]: def subplots():


"Custom subplots with axes through the origin"
fig, ax = plt.subplots()

# Set the axes through the origin


for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.grid()
return fig, ax

fig, ax = subplots() # Call the local version, not plt.subplots()


x = np.linspace(-2, 10, 200)
y = np.sin(x)
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='lower right')
plt.show()

The custom subplots function

1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code

7.5 Further Reading

โ€ข The Matplotlib gallery provides many examples


โ€ข A nice Matplotlib tutorial by Nicolas Rougier, Mike Muller and Gael Varoquaux

โ€ข mpltools allows easy switching between plot styles


โ€ข Seaborn facilitates common statistics plots in Matplotlib

7.6 Exercises

7.6.1 Exercise 1

Plot the function

f(x) = cos(πθx) exp(−x)

over the interval [0, 5] for each θ in np.linspace(0, 2, 10)


Place all the curves in the same figure
The output should look like this

7.7 Solutions

7.7.1 Exercise 1

Hereโ€™s one solution

In [11]: θ_vals = np.linspace(0, 2, 10)

x = np.linspace(0, 5, 200)
fig, ax = plt.subplots()

for θ in θ_vals:
    ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))

plt.show()
8 SciPy

8.1 Contents

โ€ข SciPy versus NumPy 8.2

โ€ข Statistics 8.3

โ€ข Roots and Fixed Points 8.4

โ€ข Optimization 8.5

โ€ข Integration 8.6

โ€ข Linear Algebra 8.7

โ€ข Exercises 8.8

โ€ข Solutions 8.9

SciPy builds on top of NumPy to provide common tools for scientific programming such as

โ€ข linear algebra
โ€ข numerical integration
โ€ข interpolation
โ€ข optimization
โ€ข distributions and random number generation
โ€ข signal processing
โ€ข etc., etc

Like NumPy, SciPy is stable, mature and widely used


Many SciPy routines are thin wrappers around industry-standard Fortran libraries such as
LAPACK, BLAS, etc.
Itโ€™s not really necessary to โ€œlearnโ€ SciPy as a whole
A more common approach is to get some idea of what's in the library and then look up documentation as required
In this lecture, we aim only to highlight some useful parts of the package


8.2 SciPy versus NumPy

SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file

In [1]: # Import numpy symbols to scipy namespace


import numpy as _num
linalg = None
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']

del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')

However, itโ€™s more common and better practice to use NumPy functionality explicitly

In [2]: import numpy as np

a = np.identity(3)

What is useful in SciPy is the functionality in its sub-packages

โ€ข scipy.optimize, scipy.integrate, scipy.stats, etc.

These sub-packages and their attributes need to be imported separately

In [3]: from scipy.integrate import quad


from scipy.optimize import brentq
# etc

Letโ€™s explore some of the major sub-packages

8.3 Statistics

The scipy.stats subpackage supplies

• numerous random variable objects (densities, cumulative distributions, random sampling, etc.)
โ€ข some estimation procedures
โ€ข some statistical tests

8.3.1 Random Variables and Distributions

Recall that numpy.random provides functions for generating random variables

In [4]: np.random.beta(5, 5, size=3)

Out[4]: array([0.46025917, 0.2775525 , 0.25400856])

This generates a draw from the distribution below when a, b = 5, 5

๐‘ฅ(๐‘Žโˆ’1) (1 โˆ’ ๐‘ฅ)(๐‘โˆ’1)
๐‘“(๐‘ฅ; ๐‘Ž, ๐‘) = 1
(0 โ‰ค ๐‘ฅ โ‰ค 1) (1)
โˆซ0 ๐‘ข(๐‘Žโˆ’1) (1 โˆ’ ๐‘ข)(๐‘โˆ’1) ๐‘‘๐‘ข

Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface
Hereโ€™s an example of usage

In [5]: from scipy.stats import beta


import matplotlib.pyplot as plt
%matplotlib inline

q = beta(5, 5) # Beta(a, b), with a = b = 5


obs = q.rvs(2000) # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots(figsize=(10, 6))


ax.hist(obs, bins=40, density=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()

In this code, we created a so-called rv_frozen object, via the call q = beta(5, 5)

The โ€œfrozenโ€ part of the notation implies that q represents a particular distribution with a
particular set of parameters
Once weโ€™ve done so, we can then generate random numbers, evaluate the density, etc., all
from this fixed distribution

In [6]: q.cdf(0.4) # Cumulative distribution function

Out[6]: 0.26656768000000003

In [7]: q.pdf(0.4) # Density function

Out[7]: 2.0901888000000013

In [8]: q.ppf(0.8) # Quantile (inverse cdf) function

Out[8]: 0.6339134834642708

In [9]: q.mean()

Out[9]: 0.5

The general syntax for creating these objects is

identifier = scipy.stats.distribution_name(shape_parameters)

where distribution_name is one of the distribution names in scipy.stats


There are also two keyword arguments, loc and scale, which following our example above,
are called as

identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)

These transform the original random variable X into Y = c + dX


The methods rvs, pdf, cdf, etc. are transformed accordingly
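As a quick check of the loc and scale logic, here is a small sketch using the Beta distribution from above: with loc=c and scale=d, the mean and cdf transform exactly as Y = c + dX predicts

```python
from scipy.stats import beta

c, d = 2.0, 3.0
q = beta(5, 5, loc=c, scale=d)      # Y = c + d*X with X ~ Beta(5, 5)

# The mean shifts and scales accordingly: E[Y] = c + d * E[X] = 2 + 3 * 0.5
print(q.mean())                                    # 3.5

# The cdf transforms the same way: P{Y <= y} = P{X <= (y - c) / d}
print(q.cdf(3.5), beta(5, 5).cdf((3.5 - c) / d))   # 0.5 0.5
```
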
Before finishing this section, we note that there is an alternative way of calling the methods
described above
For example, the previous code can be replaced by

In [10]: obs = beta.rvs(5, 5, size=2000)


grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()

8.3.2 Other Goodies in scipy.stats

There are a variety of statistical functions in scipy.stats


For example, scipy.stats.linregress implements simple linear regression

In [11]: from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept

Out[11]: (2.0015196606243273, 0.009718239356687364)

To see the full list, consult the documentation

8.4 Roots and Fixed Points

A root of a real function f on [a, b] is an x ∈ [a, b] such that f(x) = 0


For example, if we plot the function

f(x) = sin(4(x − 1/4)) + x + x^{20} − 1        (2)

with x ∈ [0, 1] we get

In [12]: f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1


x = np.linspace(0, 1, 100)

plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()

The unique root is approximately 0.408


Letโ€™s consider some numerical techniques for finding roots

8.4.1 Bisection

One of the most common algorithms for numerical root-finding is bisection


To understand the idea, recall the well-known game where

โ€ข Player A thinks of a secret number between 1 and 100

โ€ข Player B asks if itโ€™s less than 50

โ€“ If yes, B asks if itโ€™s less than 25


โ€“ If no, B asks if itโ€™s less than 75

And so on
This is bisection
Hereโ€™s a fairly simplistic implementation of the algorithm in Python
It works for all sufficiently well behaved increasing continuous functions with f(a) < 0 < f(b)

In [13]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b

while upper - lower > tol:


middle = 0.5 * (upper + lower)
# === if root is between lower and middle === #
if f(middle) > 0:
lower, upper = lower, middle
# === if root is between middle and upper === #
else:
lower, upper = middle, upper

return 0.5 * (upper + lower)

In fact, SciPy provides its own bisection function, which we now test using the function f defined in Eq. (2)

In [14]: from scipy.optimize import bisect

bisect(f, 0, 1)

Out[14]: 0.4082935042806639

8.4.2 The Newton-Raphson Method

Another very common root-finding algorithm is the Newton-Raphson method


In SciPy this algorithm is implemented by scipy.optimize.newton
Unlike bisection, the Newton-Raphson method uses local slope information
This is a double-edged sword:

• When the function is well-behaved, the Newton-Raphson method is faster than bisection
โ€ข When the function is less well-behaved, the Newton-Raphson might fail

Let's investigate this using the same function f, first looking at potential instability

In [15]: from scipy.optimize import newton

newton(f, 0.2) # Start the search at initial condition x = 0.2

Out[15]: 0.40829350427935673

In [16]: newton(f, 0.7) # Start the search at x = 0.7 instead

Out[16]: 0.7001700000000279

The second initial condition leads to failure of convergence


On the other hand, using IPythonโ€™s timeit magic, we see that newton can be much faster

In [17]: %timeit bisect(f, 0, 1)



62.4 µs ± 4.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [18]: %timeit newton(f, 0.2)

149 µs ± 5.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

8.4.3 Hybrid Methods

So far we have seen that the Newton-Raphson method is fast but not robust
The bisection algorithm is robust but relatively slow
This illustrates a general principle

โ€ข If you have specific knowledge about your function, you might be able to exploit it to
generate efficiency
โ€ข If not, then the algorithm choice involves a trade-off between the speed of convergence
and robustness

In practice, most default algorithms for root-finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following manner:

1. Attempt to use a fast method


2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm

In scipy.optimize, the function brentq is such a hybrid method and a good default

In [19]: brentq(f, 0, 1)

Out[19]: 0.40829350427936706

In [20]: %timeit brentq(f, 0, 1)

15.6 µs ± 840 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Here the correct solution is found and the speed is almost the same as newton

8.4.4 Multivariate Root-Finding

Use scipy.optimize.fsolve, a wrapper for a hybrid method in MINPACK


See the documentation for details

8.4.5 Fixed Points

SciPy has a function for finding (scalar) fixed points too

In [21]: from scipy.optimize import fixed_point

fixed_point(lambda x: x**2, 10.0) # 10.0 is an initial guess

Out[21]: array(1.)

If you don't get good results, you can always switch back to the brentq root finder, since
the fixed point of a function f is the root of g(x) := x − f(x)
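For instance, here is a small sketch using the illustrative map f(x) = cos(x), which has a unique fixed point on [0, 1]

```python
import numpy as np
from scipy.optimize import brentq, fixed_point

f = np.cos                               # the fixed point satisfies x = cos(x)

# The fixed point of f is the root of g(x) = x - f(x)
x_star = brentq(lambda x: x - f(x), 0, 1)
print(x_star)                            # approximately 0.739085

# This agrees with SciPy's dedicated routine
x_fp = fixed_point(f, 0.5)
print(x_fp)
```
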

8.5 Optimization

Most numerical packages provide only functions for minimization


Maximization can be performed by recalling that the maximizer of a function f on domain D
is the minimizer of −f on D
Minimization is closely related to root-finding: for smooth functions, interior optima correspond to roots of the first derivative
The speed/robustness trade-off described above is present with numerical optimization too
Unless you have some prior information you can exploit, it's usually best to use hybrid methods
For constrained, univariate (i.e., scalar) minimization, a good hybrid option is fminbound

In [22]: from scipy.optimize import fminbound

fminbound(lambda x: x**2, -1, 2) # Search in [-1, 2]

Out[22]: 0.0
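To maximize instead, we can apply the negation trick described above. Here is a sketch with an illustrative concave function (not taken from the lecture)

```python
from scipy.optimize import fminbound

f = lambda x: -(x - 1.5)**2 + 2          # concave, with maximum at x = 1.5

# Maximize f on [-1, 2] by minimizing -f on the same interval
x_max = fminbound(lambda x: -f(x), -1, 2)
print(x_max)                             # approximately 1.5
```
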

8.5.1 Multivariate Optimization

Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg, fmin_bfgs, and fmin_ncg
Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc, and fmin_cobyla
See the documentation for details

8.6 Integration

Most numerical integration methods work by computing the integral of an approximating polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn depends on how “regular” the integrand is

In SciPy, the relevant module for numerical integration is scipy.integrate


A good default for univariate integration is quad

In [23]: from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)


integral

Out[23]: 0.33333333333333337

In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK
For finite-interval problems it uses adaptive Gauss-Kronrod quadrature, subdividing the interval and extrapolating to estimate the error
There are other options for univariate integrationโ€”a useful one is fixed_quad, which is fast
and hence works well inside for loops
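As a sketch, fixed_quad applies a Gaussian rule of fixed order n and returns a (value, None) pair, with no adaptive error estimate

```python
from scipy.integrate import fixed_quad

# Order-5 Gaussian quadrature is exact for this low-degree polynomial
val, _ = fixed_quad(lambda x: x**2, 0, 1, n=5)
print(val)                               # approximately 1/3
```
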
There are also functions for multivariate integration
See the documentation for more details

8.7 Linear Algebra

We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines

8.8 Exercises

8.8.1 Exercise 1

Previously we discussed the concept of recursive function calls


Write a recursive implementation of the bisection function described above, which we repeat
here for convenience

In [24]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b

while upper - lower > tol:


middle = 0.5 * (upper + lower)
# === if root is between lower and middle === #
if f(middle) > 0:
lower, upper = lower, middle
# === if root is between middle and upper === #
else:
lower, upper = middle, upper

return 0.5 * (upper + lower)



Test it on the function f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1 discussed above

8.9 Solutions

8.9.1 Exercise 1

Hereโ€™s a reasonable solution:

In [25]: def bisect(f, a, b, tol=10e-5):


"""
Implements the bisection root-finding algorithm, assuming that f is a
real-valued function on [a, b] satisfying f(a) < 0 < f(b).
"""
lower, upper = a, b
if upper - lower < tol:
return 0.5 * (upper + lower)
else:
middle = 0.5 * (upper + lower)
print(f'Current mid point = {middle}')
if f(middle) > 0: # Implies root is between lower and middle
return bisect(f, lower, middle)
else: # Implies root is between middle and upper
return bisect(f, middle, upper)

We can test it as follows

In [26]: f = lambda x: np.sin(4 * (x - 0.25)) + x + x**20 - 1


bisect(f, 0, 1)

Current mid point = 0.5


Current mid point = 0.25
Current mid point = 0.375
Current mid point = 0.4375
Current mid point = 0.40625
Current mid point = 0.421875
Current mid point = 0.4140625
Current mid point = 0.41015625
Current mid point = 0.408203125
Current mid point = 0.4091796875
Current mid point = 0.40869140625
Current mid point = 0.408447265625
Current mid point = 0.4083251953125
Current mid point = 0.40826416015625

Out[26]: 0.408294677734375
9 Numba

9.1 Contents

โ€ข Overview 9.2

โ€ข Where are the Bottlenecks? 9.3

โ€ข Vectorization 9.4

โ€ข Numba 9.5

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

9.2 Overview

In our lecture on NumPy, we learned one method to improve speed and efficiency in numerical work
That method, called vectorization, involved sending array processing operations in batch to
efficient low-level code
This clever idea dates back to MATLAB, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory-intensive
Another problem is that only some algorithms can be vectorized
In the last few years, a new Python library called Numba has appeared that solves many of
these problems
It does so through something called just in time (JIT) compilation
JIT compilation is effective in many numerical settings and can generate extremely fast, efficient code
It can also do other tricks such as facilitate multithreading (a form of parallelization well
suited to numerical work)


9.2.1 The Need for Speed

To understand what Numba does and why, we need some background knowledge
Letโ€™s start by thinking about higher-level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment

โ€ข specifying variable types


โ€ข memory allocation/deallocation, etc.

The upside is that, compared to low-level languages, Python is typically faster to write, less
error-prone and easier to debug
The downside is that Python is harder to optimize โ€” that is, turn into fast machine code โ€”
than languages like C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high-speed languages for the great majority of scientific computing tasks
This is because

1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a combination of NumPy and Numba

This lecture provides a guide

9.3 Where are the Bottlenecks?

Let's start by trying to understand why high-level languages like Python are slower than compiled code

9.3.1 Dynamic Typing

Consider this Python operation

In [2]: a, b = 10, 10
a + b

Out[2]: 20

Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation

In [3]: a, b = 'foo', 'bar'


a + b

Out[3]: 'foobar'

If a and b are lists, then a + b requires list concatenation

In [4]: a, b = ['foo'], ['bar']


a + b

Out[4]: ['foo', 'bar']

(We say that the operator + is overloaded โ€” its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10

#include <stdio.h>

int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}

The variables i and sum are explicitly declared to be integers


Hence, the meaning of addition here is completely unambiguous

9.3.2 Data Access

Another drag on speed for high-level languages is data access


To illustrate, letโ€™s consider the problem of summing some data โ€” say, a collection of integers
Summing with Compiled Code
In C or Fortran, these integers would typically be stored in an array, which is a simple data
structure for storing homogeneous data
Such an array is stored in a single contiguous block of memory

โ€ข In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)

• For example, a 64 bit integer is stored in 8 bytes of memory

• An array of n such integers occupies 8n consecutive memory slots

Moreover, the compiler is made aware of the data type by the programmer

โ€ข In this case 64 bit integers

Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount

โ€ข In this case 8 bytes
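We can inspect these facts directly from NumPy, whose arrays follow the same contiguous layout

```python
import numpy as np

a = np.arange(5, dtype=np.int64)

print(a.itemsize)    # 8 -- each 64 bit integer occupies 8 bytes
print(a.nbytes)      # 40 -- n = 5 integers occupy one block of 8n bytes
print(a.strides)     # (8,) -- step 8 bytes forward to reach the next element
```
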

Summing in Pure Python


Python tries to replicate these ideas to some degree
For example, in the standard Python implementation (CPython), list elements are placed in
memory locations that are in a sense contiguous
However, these list elements are more like pointers to data rather than actual data
Hence, there is still overhead involved in accessing the data values themselves
This is a considerable drag on speed
In fact, it's generally true that memory traffic is a major culprit when it comes to slow execution
Letโ€™s look at some ways around these problems
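A rough timing sketch (machine-dependent numbers) makes this overhead visible: summing the same integers from a Python list and from a NumPy array

```python
import timeit

import numpy as np

n = 1_000_000
python_list = list(range(n))     # elements are boxed Python ints (pointers to data)
numpy_array = np.arange(n)       # elements are raw 64 bit integers, stored contiguously

t_list = timeit.timeit(lambda: sum(python_list), number=10)
t_array = timeit.timeit(lambda: numpy_array.sum(), number=10)

# The array sum avoids the pointer-chasing and is typically far faster
print(f"list: {t_list:.4f}s   array: {t_array:.4f}s")
```
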

9.4 Vectorization

Vectorization is about sending batches of related operations to native machine code

โ€ข The machine code itself is typically compiled from carefully optimized C or Fortran

This can greatly accelerate many (but not all) numerical computations

9.4.1 Operations on Arrays

First, let's run some imports

In [5]: import random


import numpy as np
import quantecon as qe

Now let's try this non-vectorized code

In [6]: qe.util.tic() # Start timing

n = 100_000
sum = 0
for i in range(n):
    x = random.uniform(0, 1)
    sum += x**2
qe.util.toc() # End timing

TOC: Elapsed: 0:00:0.04

Out[6]: 0.04178762435913086

Now compare this vectorized code

In [7]: qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[7]: 0.0038301944732666016

The second code block — which achieves the same thing as the first — runs much faster
The reason is that in the second implementation we have broken the loop down into three
basic operations

1. draw n uniforms
2. square them
3. sum them

These are sent as batch operations to optimized machine code


Apart from minor overheads associated with sending data back and forth, the result is C or
Fortran-like speed
When we run batch operations on arrays like this, we say that the code is vectorized
Vectorized code is typically fast and efficient
It is also surprisingly flexible, in the sense that many operations can be vectorized
The next section illustrates this point

9.4.2 Universal Functions

Many functions provided by NumPy are so-called universal functions — also called ufuncs
This means that they

• map scalars into scalars, as expected


• map arrays into arrays, acting element-wise

For example, np.cos is a ufunc:

In [8]: np.cos(1.0)

Out[8]: 0.5403023058681398

In [9]: np.cos(np.linspace(0, 1, 3))

Out[9]: array([1. , 0.87758256, 0.54030231])

By exploiting ufuncs, many operations can be vectorized


For example, consider the problem of maximizing a function f of two variables (x, y) over the
square [-a, a] × [-a, a]
For f and a let's choose

    f(x, y) = cos(x² + y²) / (1 + x² + y²)   and   a = 3

Here's a plot of f

In [10]: import matplotlib.pyplot as plt


%matplotlib inline
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)


ygrid = xgrid
x, y = np.meshgrid(xgrid, ygrid)

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x,
y,
f(x, y),
rstride=2, cstride=2,
cmap=cm.jet,
alpha=0.7,
linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()

To maximize it, we're going to use a naive grid search:

1. Evaluate f for all (x, y) in a grid on the square


2. Return the maximum of observed values

Here's a non-vectorized version that uses Python loops

In [11]: def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)
m = -np.inf

qe.tic()
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
qe.toc()

TOC: Elapsed: 0:00:2.74

Out[11]: 2.7486989498138428

And here's a vectorized version

In [12]: def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)


x, y = np.meshgrid(grid, grid)

qe.tic()
np.max(f(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.02

Out[12]: 0.02516627311706543

In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(We'll make it even faster again below when we discuss Numba)

9.4.3 Pros and Cons of Vectorization

At its best, vectorization yields fast, simple code


However, it's not without disadvantages
One issue is that it can be highly memory-intensive
For example, the vectorized maximization routine above is far more memory intensive than
the non-vectorized version that preceded it
Another issue is that not all algorithms can be vectorized
In these kinds of settings, we need to go back to loops
Fortunately, there are nice ways to speed up Python loops

9.5 Numba

One exciting development in this direction is Numba


Numba aims to automatically compile functions to native machine code instructions on the
fly
The process isn't flawless, since Numba needs to infer type information on all variables to
generate pure machine instructions
Such inference isn't possible in every setting
But for simple routines, Numba infers types very well
Moreover, the "hot loops" at the heart of our code that we need to speed up are often such
simple routines

9.5.1 Prerequisites

If you followed our set up instructions, then Numba should be installed


Make sure you have the latest version of Anaconda by running conda update anaconda
from a terminal (Mac, Linux) / Anaconda command prompt (Windows)

9.5.2 An Example

Let's consider some problems that are difficult to vectorize


One is generating the trajectory of a difference equation given an initial condition
Let's take the difference equation to be the quadratic map

    x_{t+1} = 4 x_t (1 - x_t)

Here's the plot of a typical trajectory, starting from x_0 = 0.1, with t on the x-axis

In [13]: def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

x = qm(0.1, 250)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()

Speeding this up with Numba's jit function is trivial

In [14]: from numba import jit

qm_numba = jit(qm) # qm_numba is now a 'compiled' version of qm

Let's time and compare identical function calls across these two versions:

In [15]: qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()

TOC: Elapsed: 0:00:0.06

In [16]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.11

The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:

In [17]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()

TOC: Elapsed: 0:00:0.00

In [18]: time1 / time2 # Calculate speed gain

Out[18]: 174.51294400963275

That's a speed increase of two orders of magnitude!


Your mileage will of course vary depending on hardware and so on
Nonetheless, two orders of magnitude is huge relative to how simple and clear the implementation is
Decorator Notation
If you don't need a separate name for the "numbafied" version of qm, you can just put @jit
before the function

In [19]: @jit
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

This is equivalent to qm = jit(qm)

9.5.3 How and When it Works

Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions

In an ideal setting, Numba can infer all necessary type information


This allows it to generate native machine code, without having to call the Python runtime
environment
In such a setting, Numba will be on par with machine code from low-level languages
When Numba cannot infer all type information, some Python objects are given generic object
status, and some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains — or none at all
Hence, it's prudent when using Numba to focus on speeding up small, time-critical snippets of
code
This will give you much better performance than blanketing your Python programs with
@jit statements
A Gotcha: Global Variables
Consider the following example

In [20]: a = 1

@jit
def add_x(x):
    return a + x

print(add_x(10))

11

In [21]: a = 2

print(add_x(10))

11

Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability

9.5.4 Numba for Vectorization

Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximization
problem discussed above

In [22]: from numba import vectorize

@vectorize
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)


x, y = np.meshgrid(grid, grid)

np.max(f_vec(x, y)) # Run once to compile

qe.tic()
np.max(f_vec(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.03

Out[22]: 0.030055522918701172

This is faster than our vectorized version using NumPyโ€™s ufuncs


Why should that be? After all, anything vectorized with NumPy will be running in fast C or
Fortran code
The reason is that it's much less memory-intensive
For example, when NumPy computes np.cos(x**2 + y**2) it first creates the intermediate
arrays x**2 and y**2, then it creates the array np.cos(x**2 + y**2)
In our @vectorize version using Numba, the entire operation is reduced to a single vectorized
process and none of these intermediate arrays are created
We can gain further speed improvements using Numba's automatic parallelization feature by
specifying target='parallel'
In this case, we need to specify the types of our inputs and outputs

In [23]: @vectorize('float64(float64, float64)', target='parallel')
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

np.max(f_vec(x, y)) # Run once to compile

qe.tic()
np.max(f_vec(x, y))
qe.toc()

TOC: Elapsed: 0:00:0.02

Out[23]: 0.023700714111328125

This is a striking speed up with very little effort


10

Other Scientific Libraries

10.1 Contents

• Overview 10.2

• Cython 10.3

• Joblib 10.4

• Other Options 10.5

• Exercises 10.6

• Solutions 10.7

In addition to what's in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

10.2 Overview

In this lecture, we review some other scientific libraries that are useful for economic research
and analysis
We have, however, already picked most of the low-hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass

10.3 Cython

Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As you'll recall, Numba solves this problem (where possible) by inferring type
Cython's approach is different — programmers add type definitions directly to their "Python"
code


As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code
Cython also takes care of building language extensions โ€” the wrapper code that interfaces
between the resulting compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook

• See the book Cython by Kurt Smith or the online documentation

10.3.1 A First Example

Let's start with a rather artificial example


๐‘›
Suppose that we want to compute the sum โˆ‘๐‘–=0 ๐›ผ๐‘– for given ๐›ผ, ๐‘›
Suppose further that weโ€™ve forgotten the basic formula

๐‘›
1 โˆ’ ๐›ผ๐‘›+1
โˆ‘ ๐›ผ๐‘– =
๐‘–=0
1โˆ’๐›ผ

for a geometric progression and hence have resolved to rely on a loop


Python vs C
Here's a pure Python function that does the job

In [2]: def geo_prog(alpha, n):
    current = 1.0
    sum = current
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum
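As a quick sanity check, the loop agrees with the closed-form formula for the geometric sum (a small aside, restating geo_prog here so the snippet is self-contained):

```python
def geo_prog(alpha, n):
    # Sum alpha**i for i = 0, ..., n with an explicit loop
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

alpha, n = 0.5, 10
closed_form = (1 - alpha**(n + 1)) / (1 - alpha)
print(abs(geo_prog(alpha, n) - closed_form) < 1e-12)  # True
```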

This works fine but for large n it is slow


Here's a C function that will do the same thing

double geo_prog(double alpha, int n) {
    double current = 1.0;
    double sum = current;
    int i;
    for (i = 1; i <= n; i++) {
        current = current * alpha;
        sum = sum + current;
    }
    return sum;
}

If you're not familiar with C, the main thing you should take notice of is the type definitions

• int means integer


• double means double precision floating-point number
• the double in double geo_prog(... indicates that the function will return a double

Not surprisingly, the C code is faster than the Python code


A Cython Implementation
Cython implementations look like a convex combination of Python and C
We're going to run our Cython code in the Jupyter notebook, so we'll start by loading the
Cython extension in a notebook cell

In [3]: %load_ext Cython

In the next cell, we execute the following

In [4]: %%cython
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum

Here cdef is a Cython keyword indicating a variable declaration and is followed by a type
The %%cython line at the top is not actually Cython code — it's a Jupyter cell magic indicating
the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within
Python
What you are in fact calling is compiled C code with a Python call interface

In [5]: import quantecon as qe


qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.08

Out[5]: 0.0884397029876709

In [6]: qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[6]: 0.03421354293823242

10.3.2 Example 2: Cython with NumPy Arrays

Let's go back to the first problem that we worked with: generating the iterates of the
quadratic map

    x_{t+1} = 4 x_t (1 - x_t)

The problem of computing iterates and returning a time series requires us to work with
arrays
The natural array type to work with is NumPy arrays
Here's a Cython implementation that initializes, populates and returns a NumPy array

In [7]: %%cython
import numpy as np

def qm_cython_first_pass(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

If you run this code and time it, you will see that its performance is disappointing — nothing
like the speed gain we got from Numba

In [8]: qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.03

Out[8]: 0.03150629997253418

This example was also computed in the Numba lecture, and you can see Numba is around 90
times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cython's typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Here's an example:

In [9]: %%cython
import numpy as np
from numpy cimport float_t

def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)
    cdef float_t [:] x = x_np_array
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)

Here

• cimport pulls in some compile-time information from NumPy


• cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array
x_np_array
• the return statement uses np.asarray(x) to convert the memoryview back to a
NumPy array

Let's time it:

In [10]: qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[10]: 0.0006136894226074219

This is fast, although still slightly slower than qm_numba

10.3.3 Summary

Cython requires more expertise than Numba, and is a little more fiddly in terms of getting
good performance
In fact, it's surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,

• Cython is a very mature, stable and widely used tool


• Cython can be more useful than Numba when working with larger, more sophisticated
applications

10.4 Joblib

Joblib is a popular Python library for caching and parallelization


To install it, start Jupyter and type

In [11]: !pip install joblib

Requirement already satisfied: joblib in /home/anju/anaconda3/lib/python3.7/site-packages (0.13.2)

from within a notebook


Here we review just the basics

10.4.1 Caching

Perhaps, like us, you sometimes run a long computation that simulates a model at a given set
of parameters — to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all
again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to
you when you repeat the calculation

10.4.2 An Example

Let's look at a toy example, related to the quadratic map model discussed above
Let's say we want to generate a long trajectory from a certain initial condition x_0 and see
what fraction of the sample is below 0.1
(We'll omit JIT compilation or other speedups for simplicity)
Here's our code

In [12]: from joblib import Memory

location = './joblib_cache'
memory = Memory(location=location)

@memory.cache
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return np.mean(x < 0.1)

We are using joblib to cache the result of calling qm at a given set of parameters
With the argument location='./joblib_cache', any call to this function results in both the
input values and output values being stored in a subdirectory joblib_cache of the present working
directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output
that notes information being cached

In [13]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

________________________________________________________________________________
[Memory] Calling __main__--home-anju-Desktop-lecture-source-py-_build-jupyter-executed-__ipython-input__.qm…
qm(0.2, 10000000)
_______________________________________________________________qm - 8.9s, 0.1min
TOC: Elapsed: 0:00:8.85

Out[13]: 8.85545039176941

The next time we call the function with the same set of parameters, the result is returned
almost instantaneously

In [14]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[14]: 0.0007827281951904297

10.5 Other Options

There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed — you might want to give it a
try

10.6 Exercises

10.6.1 Exercise 1

Later we'll learn all about finite-state Markov chains


For now, let's just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes — high or low
The transition probabilities across states are as follows

For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be

• high with probability 0.8


• low with probability 0.2
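Since the transition graph itself is not reproduced here, the same information can be written as a matrix (rows are the current state, columns the next state, ordering low then high; these numbers match the p, q used in the solution below):

```python
import numpy as np

# Transition matrix for the volatility chain
# rows: current state, columns: next state; ordering: [low, high]
P = np.array([[0.9, 0.1],    # from low:  stay with prob 0.9, leave with 0.1
              [0.2, 0.8]])   # from high: leave with prob 0.2, stay with 0.8

print(P.sum(axis=1))  # each row is a probability distribution
```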

Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare
speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3

10.7 Solutions

10.7.1 Exercise 1

We let

• 0 represent "low"
• 1 represent "high"

In [15]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively

Here's a pure Python version of the function

In [16]: def compute_series(n):
    x = np.empty(n, dtype=int)
    x[0] = 1 # Start in state 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x

Let's run this code and check that the fraction of time spent in the low state is about 0.666

In [17]: n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0

0.6629
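The printed fraction is close to the theoretical value: for a two-state chain with probability p of leaving the low state and q of leaving the high state, the stationary probability of the low state is q/(p + q) (a standard result, stated here as a check):

```python
p, q = 0.1, 0.2  # probabilities of leaving the low and high states

# Stationary probability of the low state for this two-state chain
pi_low = q / (p + q)
print(pi_low)  # about 2/3
```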

Now let's time it

In [18]: qe.util.tic()
compute_series(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.07

Out[18]: 0.0751335620880127

Next let's implement a Numba version, which is easy

In [19]: from numba import jit

compute_series_numba = jit(compute_series)

Let's check we still get the right numbers

In [20]: x = compute_series_numba(n)
print(np.mean(x == 0))

0.66566

Let's see the time

In [21]: qe.util.tic()
compute_series_numba(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[21]: 0.0015265941619873047

This is a nice speed improvement for one line of code


Now let's implement a Cython version

In [22]: %load_ext Cython

The Cython extension is already loaded. To reload it, use:


%reload_ext Cython

In [23]: %%cython
import numpy as np
from numpy cimport int_t, float_t

def compute_series_cy(int n):
    # == Create NumPy arrays first == #
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    # == Now create memoryviews of the arrays == #
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    # == Other variable declarations == #
    cdef float p = 0.1
    cdef float q = 0.2
    cdef int t
    # == Main loop == #
    x[0] = 1
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x)

In [24]: compute_series_cy(10)

Out[24]: array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])

In [25]: x = compute_series_cy(n)
print(np.mean(x == 0))

0.66746

In [26]: qe.util.tic()
compute_series_cy(n)
qe.util.toc()

TOC: Elapsed: 0:00:0.00

Out[26]: 0.0033597946166992188

The Cython implementation is fast but not as fast as Numba


Part III

Advanced Python Programming

11

Writing Good Code

11.1 Contents

• Overview 11.2

• An Example of Bad Code 11.3

• Good Coding Practice 11.4

• Revisiting the Example 11.5

• Summary 11.6

11.2 Overview

When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing — such as just in time compilation
— and how they affect good program design

11.3 An Example of Bad Code

Let's have a look at some poorly written code


The job of the code is to generate and plot time series of the simplified Solow model

๐‘˜๐‘ก+1 = ๐‘ ๐‘˜๐‘ก๐›ผ + (1 โˆ’ ๐›ฟ)๐‘˜๐‘ก , ๐‘ก = 0, 1, 2, โ€ฆ (1)

Here


โ€ข ๐‘˜๐‘ก is capital at time ๐‘ก and


โ€ข ๐‘ , ๐›ผ, ๐›ฟ are parameters (savings, a productivity parameter and depreciation)

For each parameterization, the code

1. sets k_0 = 1
2. iterates using Eq. (1) to produce a sequence k_0, k_1, k_2, …, k_T
3. plots the sequence

The plots will be grouped into three subfigures


In each subfigure, two parameters are held fixed while another varies

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

# Allocate memory for time series


k = np.empty(50)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

# Trajectories with different α

δ = 0.1
s = 0.4
α = (0.25, 0.33, 0.45)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
    axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")

axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True, fontsize=14)

# Trajectories with different s

δ = 0.1
α = 0.33
s = (0.3, 0.4, 0.5)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
    axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")

axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True, fontsize=14)

# Trajectories with different δ

δ = (0.05, 0.1, 0.15)
α = 0.33
s = 0.4

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
    axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")

axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True, fontsize=14)

plt.show()

True, the code more or less follows PEP8


At the same time, it's very poorly structured
Let's talk about why that's the case, and what we can do about it

11.4 Good Coding Practice

There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesn't matter too much
But if you are ambitious and want to produce useful things, you'll write medium to large programs
too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts

11.4.1 Don't Use Magic Numbers

If you look at the code above, you'll see numbers like 50 and 49 and 3 scattered through the
code
These kinds of numeric literals in the body of your code are sometimes called "magic numbers"
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should certainly
be replaced by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:

• the meaning is much clearer throughout


• to alter the time series length, you only need to change one value
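As a sketch of the idea (the constant name is hypothetical, not from the original program):

```python
import numpy as np

# A named constant replaces the magic numbers 50 and 49
TIME_SERIES_LENGTH = 50
s, α, δ = 0.4, 0.33, 0.1

k = np.empty(TIME_SERIES_LENGTH)
k[0] = 1
for t in range(TIME_SERIES_LENGTH - 1):
    k[t+1] = s * k[t]**α + (1 - δ) * k[t]
```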

11.4.2 Don't Repeat Yourself

The other mortal sin in the code snippet above is repetition


Blocks of logic (such as the loop to generate time series) are repeated with only minor
changes
This violates a fundamental tenet of programming: Don't repeat yourself (DRY)

• Also called DIE (duplication is evil)

Yes, we realize that you can just cut and paste and change a few symbols
But as a programmer, your aim should be to automate repetition, not do it yourself
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong
If you want to know more, read the excellent summary found on this page
We'll talk about how to avoid repetition below

11.4.3 Minimize Global Variables

Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient
Rookie programmers typically use global variables with abandon — as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since

• they can affect what happens in any part of your program


• they can be changed by any function

This makes it much harder to be certain about what some small part of a given piece of code
actually commands
Here's a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them
(We'll discuss how just below)
JIT Compilation
In fact, there's now another good reason to avoid global variables
In scientific computing, we're witnessing the rapid growth of just in time (JIT) compilation
JIT compilation can generate excellent performance for scripting languages like Python and
Julia
But the task of the compiler used for JIT compilation becomes much harder when many
global variables are present
(This is because data type instability hinders the generation of efficient machine code — we'll
learn more about such topics later on)

11.4.4 Use Functions or Classes

Fortunately, we can easily avoid the evils of global variables and WET code

• WET stands for "we love typing" and is the opposite of DRY

We can do this by making frequent use of functions or classes


In fact, functions and classes are designed specifically to help us avoid shaming ourselves by
repeating code or excessive use of global variables
Which One, Functions or Classes?
Both can be useful, and in fact they work well with each other
We'll learn more about these topics over time
(Personal preference is part of the story too)
What's really important is that you use one or the other or both

11.5 Revisiting the Example

Here's some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that

• global variables are quarantined by collecting them together at the end, not the start of the
program
• magic numbers are avoided
• the loop at the end where the actual work is done is short and relatively simple

In [2]: from itertools import product

def plot_path(ax, αs, s_vals, δs, series_length=50):
    """
    Add a time series plot to the axes ax for all given parameters.
    """
    k = np.empty(series_length)

    for (α, s, δ) in product(αs, s_vals, δs):
        k[0] = 1
        for t in range(series_length-1):
            k[t+1] = s * k[t]**α + (1 - δ) * k[t]
        ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")

    ax.grid(lw=0.2)
    ax.set_xlabel('time')
    ax.set_ylabel('capital')
    ax.set_ylim(0, 18)
    ax.legend(loc='upper left', frameon=True, fontsize=14)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

# Parameters (αs, s_vals, δs)
set_one = ([0.25, 0.33, 0.45], [0.4], [0.1])
set_two = ([0.33], [0.3, 0.4, 0.5], [0.1])
set_three = ([0.33], [0.4], [0.05, 0.1, 0.15])

for (ax, params) in zip(axes, (set_one, set_two, set_three)):
    αs, s_vals, δs = params
    plot_path(ax, αs, s_vals, δs)

plt.show()

11.6 Summary

Writing decent code isn't hard


It's also fun and intellectually satisfying
We recommend that you cultivate good habits and style even when you write relatively short
programs
12

OOP II: Building Classes

12.1 Contents

• Overview 12.2

• OOP Review 12.3

• Defining Your Own Classes 12.4

• Special Methods 12.5

• Exercises 12.6

• Solutions 12.7

12.2 Overview

In an earlier lecture, we learned some foundations of object-oriented programming


The objectives of this lecture are

• cover OOP in more depth


• learn how to build our own objects, specialized to our needs

For example, you already know how to

• create lists, strings and other Python objects


• use their methods to modify their contents

So imagine now you want to write a program with consumers, who can

• hold and spend cash


• consume goods
• work and earn cash

A natural solution in Python would be to create consumers as objects with


• data, such as cash on hand


• methods, such as buy or work that affect this data

Python makes it easy to do this, by providing you with class definitions


Classes are blueprints that help you build objects according to your own specifications
It takes a little while to get used to the syntax so we'll provide plenty of examples

12.3 OOP Review

OOP is supported in many languages:

โ€ข JAVA and Ruby are relatively pure OOP


โ€ข Python supports both procedural and object-oriented programming
โ€ข Fortran and MATLAB are mainly procedural, some OOP recently tacked on
โ€ข C is a procedural language, while C++ is C with OOP added on top

Letโ€™s cover general OOP concepts before we specialize to Python

12.3.1 Key Concepts

As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into “objects”
An example is a Python list, which not only stores data but also knows how to sort itself, etc.

In [1]: x = [1, 5, 4]
x.sort()
x

Out[1]: [1, 4, 5]

As we now know, sort is a function that is โ€œpart ofโ€ the list object โ€” and hence called a
method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers)
It describes

โ€ข What kind of data the class stores


โ€ข What methods it has for acting on these data

An object or instance is a realization of the class, created from the blueprint

โ€ข Each instance has its own unique data


โ€ข Methods set out in the class definition act on this (and other) data

In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via โ€œdotted attribute notationโ€

โ€ข object_name.data
โ€ข object_name.method_name()

In the example

In [2]: x = [1, 5, 4]
x.sort()
x.__class__

Out[2]: list

โ€ข x is an object or instance, created from the definition for Python lists, but with its own
particular data
โ€ข x.sort() and x.__class__ are two attributes of x
โ€ข dir(x) can be used to view all the attributes of x

12.3.2 Why is OOP Useful?

OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure
For example,

โ€ข a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
โ€ข a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
โ€ข a game consists of a list of players, lists of actions available to each player, player pay-
offs as functions of all playersโ€™ actions, and a timing protocol

These are all abstractions that collect together โ€œobjectsโ€ of the same โ€œtypeโ€
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method thatโ€™s useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object

12.4 Defining Your Own Classes

Letโ€™s build some simple classes to start off



12.4.1 Example: A Consumer Class

First, weโ€™ll build a Consumer class with

โ€ข a wealth attribute that stores the consumerโ€™s wealth (data)


โ€ข an earn method, where earn(y) increments the consumerโ€™s wealth by y
โ€ข a spend method, where spend(x) either decreases wealth by x or returns an error if
insufficient funds exist

Admittedly a little contrived, this example of a class helps us internalize some new syntax
Hereโ€™s one implementation

In [3]: class Consumer:

            def __init__(self, w):
                "Initialize consumer with w dollars of wealth"
                self.wealth = w

            def earn(self, y):
                "The consumer earns y dollars"
                self.wealth += y

            def spend(self, x):
                "The consumer spends x dollars if feasible"
                new_wealth = self.wealth - x
                if new_wealth < 0:
                    print("Insufficient funds")
                else:
                    self.wealth = new_wealth

Thereโ€™s some special syntax here so letโ€™s step through carefully

โ€ข The class keyword indicates that we are building a class

This class defines instance data wealth and three methods: __init__, earn and spend

โ€ข wealth is instance data because each consumer we create (each instance of the Con-
sumer class) will have its own separate wealth data

The ideas behind the earn and spend methods were discussed above
Both of these act on the instance data wealth
The __init__ method is a constructor method
Whenever we create an instance of the class, this method will be called automatically
Calling __init__ sets up a โ€œnamespaceโ€ to hold the instance data โ€” more on this soon
Weโ€™ll also discuss the role of self just below
Usage
Hereโ€™s an example of usage

In [4]: c1 = Consumer(10) # Create instance with initial wealth 10


c1.spend(5)
c1.wealth

Out[4]: 5

In [5]: c1.earn(15)
c1.spend(100)

Insufficient funds

We can of course create multiple instances each with its own data

In [6]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth

Out[6]: 8

In [7]: c1.wealth

Out[7]: 10

In fact, each instance stores its data in a separate namespace dictionary

In [8]: c1.__dict__

Out[8]: {'wealth': 10}

In [9]: c2.__dict__

Out[9]: {'wealth': 8}

When we access or set attributes weโ€™re actually just modifying the dictionary maintained by
the instance
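To see this concretely, here is a small standalone sketch (it restates a minimal version of the Consumer class so the snippet runs on its own) showing that attribute access and the instance dictionary are two views of the same data:

```python
class Consumer:
    # Minimal restatement of the Consumer class, just enough for this demo
    def __init__(self, w):
        self.wealth = w

c1 = Consumer(10)

# Reading the attribute and reading the namespace dictionary agree
assert c1.__dict__ == {'wealth': 10}

# Writing to the dictionary is equivalent to setting the attribute
c1.__dict__['wealth'] = 100
print(c1.wealth)   # 100
```

In practice you would set `c1.wealth` directly; the point is only that both routes modify the same underlying dictionary.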
Self
If you look at the Consumer class definition again youโ€™ll see the word self throughout the
code
The rules with self are that

โ€ข Any instance data should be prepended with self

โ€“ e.g., the earn method references self.wealth rather than just wealth

โ€ข Any method defined within the class should have self as its first argument

โ€“ e.g., def earn(self, y) rather than just def earn(y)

โ€ข Any method referenced within the class should be called as self.method_name

There are no examples of the last rule in the preceding code but we will see some shortly
Details
In this section, we look at some more formal details related to classes and self

โ€ข You might wish to skip to the next section on first pass of this lecture
โ€ข You can return to these details after youโ€™ve familiarized yourself with more examples

Methods actually live inside a class object formed when the interpreter reads the class defini-
tion

In [10]: print(Consumer.__dict__) # Show __dict__ attribute of class object

{'__module__': '__main__', '__init__': <function Consumer.__init__ at 0x7f89127b42f0>, 'earn': <function Consu

Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code

In [11]: c1 = Consumer(10)
c1.earn(10)
c1.wealth

Out[11]: 20

When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argu-
ment 10 to Consumer.earn
In fact, the following are equivalent

โ€ข c1.earn(10)
โ€ข Consumer.earn(c1, 10)

In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter

In [12]: def earn(self, y):


"The consumer earns y dollars"
self.wealth += y

The end result is that self is bound to the instance c1 inside the function call
Thatโ€™s why the statement self.wealth += y inside earn ends up modifying c1.wealth
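The equivalence is easy to check directly; the following standalone sketch restates a minimal Consumer class and calls earn both ways:

```python
class Consumer:
    # Minimal restatement of the Consumer class for this demo
    def __init__(self, w):
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

c1 = Consumer(10)
c2 = Consumer(10)

c1.earn(10)            # bound method call: self is filled in automatically
Consumer.earn(c2, 10)  # plain function call: we pass the instance ourselves

print(c1.wealth, c2.wealth)   # 20 20
```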

12.4.2 Example: The Solow Growth Model

For our next example, letโ€™s write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita ๐‘˜๐‘ก evolves according to the rule

๐‘ ๐‘ง๐‘˜๐‘ก๐›ผ + (1 โˆ’ ๐›ฟ)๐‘˜๐‘ก
๐‘˜๐‘ก+1 = (1)
1+๐‘›

Here

โ€ข ๐‘  is an exogenously given savings rate


โ€ข ๐‘ง is a productivity parameter
โ€ข ๐›ผ is capitalโ€™s share of income
โ€ข ๐‘› is the population growth rate
โ€ข ๐›ฟ is the depreciation rate

The steady state of the model is the ๐‘˜ that solves Eq. (1) when ๐‘˜๐‘ก+1 = ๐‘˜๐‘ก = ๐‘˜
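Setting $k_{t+1} = k_t = k$ in Eq. (1) and solving gives the closed form that the steady_state method below computes:

```latex
k = \frac{s z k^{\alpha} + (1 - \delta) k}{1 + n}
\quad\Longrightarrow\quad (n + \delta) k = s z k^{\alpha}
\quad\Longrightarrow\quad k = \left( \frac{s z}{n + \delta} \right)^{1/(1-\alpha)}
```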
Hereโ€™s a class that implements this model
Some points of interest in the code are

โ€ข An instance maintains a record of its current capital stock in the variable self.k

โ€ข The h method implements the right-hand side of Eq. (1)

โ€ข The update method uses h to update capital as per Eq. (1)

โ€“ Notice how inside update the reference to the local method h is self.h

The methods steady_state and generate_sequence are fairly self-explanatory

In [13]: class Solow:
             r"""
             Implements the Solow growth model with the update rule

                 k_{t+1} = [(s z k^α_t) + (1 - δ)k_t] / (1 + n)

             """
             def __init__(self, n=0.05,  # population growth rate
                                s=0.25,  # savings rate
                                δ=0.1,   # depreciation rate
                                α=0.3,   # capital share of income
                                z=2.0,   # productivity
                                k=1.0):  # current capital stock

                 self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
                 self.k = k

             def h(self):
                 "Evaluate the h function"
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Apply the update rule
                 return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)

             def update(self):
                 "Update the current state (i.e., the capital stock)."
                 self.k = self.h()

             def steady_state(self):
                 "Compute the steady state value of capital."
                 # Unpack parameters (get rid of self to simplify notation)
                 n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
                 # Compute and return steady state
                 return ((s * z) / (n + δ))**(1 / (1 - α))

             def generate_sequence(self, t):
                 "Generate and return a time series of length t"
                 path = []
                 for i in range(t):
                     path.append(self.k)
                     self.update()
                 return path

Hereโ€™s a little program that uses the class to compute time series from two different initial
conditions
The common steady state is also plotted for comparison

In [14]: import matplotlib.pyplot as plt
         %matplotlib inline

         s1 = Solow()
         s2 = Solow(k=8.0)

         T = 60
         fig, ax = plt.subplots(figsize=(9, 6))

         # Plot the common steady state value of capital
         ax.plot([s1.steady_state()]*T, 'k-', label='steady state')

         # Plot time series for each economy
         for s in s1, s2:
             lb = f'capital series from initial state {s.k}'
             ax.plot(s.generate_sequence(T), 'o-', lw=2, alpha=0.6, label=lb)

         ax.legend()
         plt.show()
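As a quick numerical cross-check (a standalone sketch that re-implements the update rule as a plain function rather than reusing the class), capital paths from both initial conditions should end up at the common steady state:

```python
# Default Solow parameters from the class above
n, s, δ, α, z = 0.05, 0.25, 0.1, 0.3, 2.0

def h(k):
    # Right-hand side of the Solow update rule
    return (s * z * k**α + (1 - δ) * k) / (1 + n)

k_star = ((s * z) / (n + δ))**(1 / (1 - α))

gaps = []
for k in 1.0, 8.0:           # the two initial conditions plotted above
    for t in range(200):
        k = h(k)
    gaps.append(abs(k - k_star))

print(gaps)                  # both gaps should be negligible
```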

12.4.3 Example: A Market

Next, letโ€™s write a class for a simple one good market where agents are price takers
The market consists of the following objects:

โ€ข A linear demand curve ๐‘„ = ๐‘Ž๐‘‘ โˆ’ ๐‘๐‘‘ ๐‘


โ€ข A linear supply curve ๐‘„ = ๐‘Ž๐‘ง + ๐‘๐‘ง (๐‘ โˆ’ ๐‘ก)

Here

โ€ข ๐‘ is price paid by the consumer, ๐‘„ is quantity and ๐‘ก is a per-unit tax


โ€ข Other symbols are demand and supply parameters

The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus
Hereโ€™s our implementation

In [15]: from scipy.integrate import quad

         class Market:

             def __init__(self, ad, bd, az, bz, tax):
                 """
                 Set up market parameters. All parameters are scalars. See
                 https://lectures.quantecon.org/py/python_oop.html for interpretation.

                 """
                 self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
                 if ad < az:
                     raise ValueError('Insufficient demand.')

             def price(self):
                 "Return equilibrium price"
                 return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)

             def quantity(self):
                 "Compute equilibrium quantity"
                 return self.ad - self.bd * self.price()

             def consumer_surp(self):
                 "Compute consumer surplus"
                 # == Compute area under inverse demand function == #
                 integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return area - self.price() * self.quantity()

             def producer_surp(self):
                 "Compute producer surplus"
                 # == Compute area above inverse supply curve, excluding tax == #
                 integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
                 area, error = quad(integrand, 0, self.quantity())
                 return (self.price() - self.tax) * self.quantity() - area

             def taxrev(self):
                 "Compute tax revenue"
                 return self.tax * self.quantity()

             def inverse_demand(self, x):
                 "Compute inverse demand"
                 return self.ad / self.bd - (1 / self.bd) * x

             def inverse_supply(self, x):
                 "Compute inverse supply curve"
                 return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

             def inverse_supply_no_tax(self, x):
                 "Compute inverse supply curve without tax"
                 return -(self.az / self.bz) + (1 / self.bz) * x

Hereโ€™s a sample of usage

In [16]: baseline_params = 15, .5, -2, .5, 3


m = Market(*baseline_params)
print("equilibrium price = ", m.price())

equilibrium price = 18.5

In [17]: print("consumer surplus = ", m.consumer_surp())

consumer surplus = 33.0625
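As a sanity check on the price method (a standalone sketch that uses only the baseline parameter values, not the class itself), the equilibrium price should equate quantity demanded with quantity supplied:

```python
ad, bd, az, bz, tax = 15, .5, -2, .5, 3   # baseline parameters from above

# Equilibrium price, as in Market.price()
p = (ad - az + bz * tax) / (bd + bz)

demand = ad - bd * p           # quantity demanded at p
supply = az + bz * (p - tax)   # quantity supplied at p, net of the tax

print(p, demand, supply)       # 18.5 5.75 5.75
```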

Hereโ€™s a short program that uses this class to plot an inverse demand curve together with in-
verse supply curves with and without taxes

In [18]: import numpy as np

# Baseline ad, bd, az, bz, tax


baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)

q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)

fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()

The next program provides a function that

โ€ข takes an instance of Market as a parameter



โ€ข computes dead weight loss from the imposition of the tax

In [19]: def deadw(m):
             "Computes deadweight loss for market m."
             # == Create analogous market with no tax == #
             m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
             # == Compare surplus, return difference == #
             surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
             surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
             return surp1 - surp2

Hereโ€™s an example of usage

In [20]: baseline_params = 15, .5, -2, .5, 3


m = Market(*baseline_params)
deadw(m) # Show deadweight loss

Out[20]: 1.125
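Because demand and supply are linear, the deadweight loss is the familiar triangle, half the tax times the fall in traded quantity, which gives an independent check on the number above (a standalone sketch using the baseline parameters):

```python
ad, bd, az, bz, tax = 15, .5, -2, .5, 3

def quantity(t):
    # Equilibrium quantity under a per-unit tax t
    p = (ad - az + bz * t) / (bd + bz)
    return ad - bd * p

# Triangle formula: half the tax times the drop in traded quantity
dwl = 0.5 * tax * (quantity(0) - quantity(tax))
print(dwl)   # 1.125
```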

12.4.4 Example: Chaos

Letโ€™s look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map

๐‘ฅ๐‘ก+1 = ๐‘Ÿ๐‘ฅ๐‘ก (1 โˆ’ ๐‘ฅ๐‘ก ), ๐‘ฅ0 โˆˆ [0, 1], ๐‘Ÿ โˆˆ [0, 4] (2)

Letโ€™s write a class for generating time series from this model
Hereโ€™s one implementation

In [21]: class Chaos:
             """
             Models the dynamical system with :math:`x_{t+1} = r x_t (1 - x_t)`
             """
             def __init__(self, x0, r):
                 """
                 Initialize with state x0 and parameter r
                 """
                 self.x, self.r = x0, r

             def update(self):
                 "Apply the map to update state."
                 self.x = self.r * self.x * (1 - self.x)

             def generate_sequence(self, n):
                 "Generate and return a sequence of length n."
                 path = []
                 for i in range(n):
                     path.append(self.x)
                     self.update()
                 return path

Hereโ€™s an example of usage

In [22]: ch = Chaos(0.1, 4.0)     # x0 = 0.1 and r = 4.0
         ch.generate_sequence(5)  # First 5 iterates

Out[22]: [0.1, 0.36000000000000004, 0.9216, 0.28901376000000006, 0.8219392261226498]



This piece of code plots a longer trajectory

In [23]: ch = Chaos(0.1, 4.0)


ts_length = 250

fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()

The next piece of code provides a bifurcation diagram

In [24]: fig, ax = plt.subplots()
         ch = Chaos(0.1, 4)
         r = 2.5
         while r < 4:
             ch.r = r
             t = ch.generate_sequence(1000)[950:]
             ax.plot([r] * len(t), t, 'b.', ms=0.6)
             r = r + 0.005

         ax.set_xlabel('$r$', fontsize=16)
         plt.show()

On the horizontal axis is the parameter ๐‘Ÿ in Eq. (2)


The vertical axis is the state space [0, 1]
For each ๐‘Ÿ we compute a long time series and then plot the tail (the last 50 points)
The tail of the sequence shows us where the trajectory concentrates after settling down to
some kind of steady state, if a steady state exists
Whether it settles down, and the character of the steady state to which it does settle down,
depend on the value of ๐‘Ÿ
For ๐‘Ÿ between about 2.5 and 3, the time series settles into a single fixed point plotted on the
vertical axis
For ๐‘Ÿ between about 3 and 3.45, the time series settles down to oscillating between the two
values plotted on the vertical axis
For ๐‘Ÿ a little bit higher than 3.45, the time series settles down to oscillating among the four
values plotted on the vertical axis
Notice that there is no value of ๐‘Ÿ that leads to a steady state oscillating among three values
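We can confirm the two-cycle region numerically; this standalone sketch iterates the map directly at r = 3.2 (a value we picked from the interval described above) and checks that the tail alternates between two values:

```python
r, x = 3.2, 0.1

for t in range(1000):            # burn in: let transients die out
    x = r * x * (1 - x)

cycle = []
for t in range(4):               # record four successive states
    cycle.append(round(x, 6))
    x = r * x * (1 - x)

print(cycle)                     # a repeating pair of values
```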

12.5 Special Methods

Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function

In [25]: x = (10, 20)


len(x)

Out[25]: 2

If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method

In [26]: class Foo:

             def __len__(self):
                 return 42

Now we get

In [27]: f = Foo()
len(f)

Out[27]: 42

A special method we will use regularly is the __call__ method


This method can be used to make your instances callable, just like functions

In [28]: class Foo:

             def __call__(self, x):
                 return x + 42

After running we get

In [29]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)

Out[29]: 50

Exercise 1 provides a more useful example

12.6 Exercises

12.6.1 Exercise 1

The empirical cumulative distribution function (ecdf) corresponding to a sample \{X_i\}_{i=1}^{n} is defined as

F_n(x) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\} \qquad (x \in \mathbb{R}) \tag{3}

Here 1{๐‘‹๐‘– โ‰ค ๐‘ฅ} is an indicator function (one if ๐‘‹๐‘– โ‰ค ๐‘ฅ and zero otherwise) and hence ๐น๐‘› (๐‘ฅ)
is the fraction of the sample that falls below ๐‘ฅ
The Glivenkoโ€“Cantelli Theorem states that, provided that the sample is IID, the ecdf ๐น๐‘› con-
verges to the true distribution function ๐น
Implement ๐น๐‘› as a class called ECDF, where

โ€ข A given sample {๐‘‹๐‘– }๐‘›๐‘–=1 are the instance data, stored as self.observations
โ€ข The class implements a __call__ method that returns ๐น๐‘› (๐‘ฅ) for any ๐‘ฅ

Your code should work as follows (modulo randomness)

from random import uniform

samples = [uniform(0, 1) for i in range(10)]


F = ECDF(samples)
F(0.5) # Evaluate ecdf at x = 0.5

F.observations = [uniform(0, 1) for i in range(1000)]


F(0.5)

Aim for clarity, not efficiency

12.6.2 Exercise 2

In an earlier exercise, you wrote a function for evaluating polynomials


This exercise is an extension, where the task is to build a simple class called Polynomial for
representing and manipulating polynomial functions such as

๐‘
๐‘(๐‘ฅ) = ๐‘Ž0 + ๐‘Ž1 ๐‘ฅ + ๐‘Ž2 ๐‘ฅ2 + โ‹ฏ ๐‘Ž๐‘ ๐‘ฅ๐‘ = โˆ‘ ๐‘Ž๐‘› ๐‘ฅ๐‘› (๐‘ฅ โˆˆ R) (4)
๐‘›=0

The instance data for the class Polynomial will be the coefficients (in the case of Eq. (4),
the numbers ๐‘Ž0 , โ€ฆ , ๐‘Ž๐‘ )
Provide methods that

1. Evaluate the polynomial Eq. (4), returning ๐‘(๐‘ฅ) for any ๐‘ฅ


2. Differentiate the polynomial, replacing the original coefficients with those of its deriva-
tive ๐‘โ€ฒ

Avoid using any import statements

12.7 Solutions

12.7.1 Exercise 1
In [30]: class ECDF:

             def __init__(self, observations):
                 self.observations = observations

             def __call__(self, x):
                 counter = 0.0
                 for obs in self.observations:
                     if obs <= x:
                         counter += 1
                 return counter / len(self.observations)

In [31]: # == test == #
         from random import uniform

         samples = [uniform(0, 1) for i in range(10)]
         F = ECDF(samples)
         print(F(0.5))  # Evaluate ecdf at x = 0.5

         F.observations = [uniform(0, 1) for i in range(1000)]
         print(F(0.5))

0.4
0.484

12.7.2 Exercise 2
In [32]: class Polynomial:

             def __init__(self, coefficients):
                 """
                 Creates an instance of the Polynomial class representing

                     p(x) = a_0 x^0 + ... + a_N x^N,

                 where a_i = coefficients[i].
                 """
                 self.coefficients = coefficients

             def __call__(self, x):
                 "Evaluate the polynomial at x."
                 y = 0
                 for i, a in enumerate(self.coefficients):
                     y += a * x**i
                 return y

             def differentiate(self):
                 "Reset self.coefficients to those of p' instead of p."
                 new_coefficients = []
                 for i, a in enumerate(self.coefficients):
                     new_coefficients.append(i * a)
                 # Remove the first element, which is zero
                 del new_coefficients[0]
                 # And reset coefficients data to new values
                 self.coefficients = new_coefficients
                 return new_coefficients
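A quick usage check of the solution (the class is restated compactly here so the snippet runs on its own; its behavior matches the version above):

```python
class Polynomial:
    # Compact restatement of the solution above
    def __init__(self, coefficients):
        self.coefficients = coefficients

    def __call__(self, x):
        return sum(a * x**i for i, a in enumerate(self.coefficients))

    def differentiate(self):
        self.coefficients = [i * a for i, a in enumerate(self.coefficients)][1:]
        return self.coefficients

p = Polynomial([2, 1, 3])   # p(x) = 2 + x + 3x^2
print(p(2))                 # 2 + 2 + 12 = 16
p.differentiate()           # now p(x) = 1 + 6x
print(p(2))                 # 1 + 12 = 13
```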
13 OOP III: Samuelson Multiplier Accelerator

13.1 Contents

โ€ข Overview 13.2
โ€ข Details 13.3
โ€ข Implementation 13.4
โ€ข Stochastic Shocks 13.5
โ€ข Government Spending 13.6
โ€ข Wrapping Everything Into a Class 13.7
โ€ข Using the LinearStateSpace Class 13.8
โ€ข Pure Multiplier Model 13.9
โ€ข Summary 13.10

Co-author: Natasha Watkins


In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

13.2 Overview

This lecture creates non-stochastic and stochastic versions of Paul Samuelsonโ€™s celebrated
multiplier accelerator model [115]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to

โ€ข provide a more detailed example of OOP and classes


โ€ข review a famous model
โ€ข review linear difference equations, both deterministic and stochastic


13.2.1 Samuelsonโ€™s Model

Samuelson used a second-order linear difference equation to represent a model of national out-
put based on three components:

โ€ข a national output identity asserting that national output is the sum of consumption
plus investment plus government purchases
โ€ข a Keynesian consumption function asserting that consumption at time ๐‘ก is equal to a
constant times national output at time ๐‘ก โˆ’ 1
โ€ข an investment accelerator asserting that investment at time ๐‘ก equals a constant called
the accelerator coefficient times the difference in output between period ๐‘ก โˆ’ 1 and ๐‘ก โˆ’ 2
โ€ข the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply

(To read about linear difference equations see here or chapter IX of [118])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output
Possible dynamic properties include

โ€ข smooth convergence to a constant level of output


โ€ข damped business cycles that eventually converge to a constant level of output
โ€ข persistent business cycles that neither dampen nor explode

Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles
(To read about stochastic linear difference equations see chapter XI of [118])

13.3 Details

Letโ€™s assume that

โ€ข {๐บ๐‘ก } is a sequence of levels of government expenditures โ€“ weโ€™ll start by setting ๐บ๐‘ก = ๐บ


for all ๐‘ก

โ€ข {๐ถ๐‘ก } is a sequence of levels of aggregate consumption expenditures, a key endogenous


variable in the model

โ€ข {๐ผ๐‘ก } is a sequence of rates of investment, another key endogenous variable

โ€ข {๐‘Œ๐‘ก } is a sequence of levels of national income, yet another endogenous variable

โ€ข ๐‘Ž is the marginal propensity to consume in the Keynesian consumption function ๐ถ๐‘ก =


๐‘Ž๐‘Œ๐‘กโˆ’1 + ๐›พ

โ€ข ๐‘ is the โ€œaccelerator coefficientโ€ in the โ€œinvestment acceleratorโ€ ๐ผ_๐‘ก = ๐‘(๐‘Œ _๐‘ก โˆ’ 1 โˆ’


๐‘Œ _๐‘ก โˆ’ 2)

โ€ข {๐œ–๐‘ก } is an IID sequence standard normal random variables

โ€ข ๐œŽ โ‰ฅ 0 is a โ€œvolatilityโ€ parameter โ€” setting ๐œŽ = 0 recovers the non-stochastic case that


weโ€™ll start with

The model combines the consumption function

๐ถ๐‘ก = ๐‘Ž๐‘Œ๐‘กโˆ’1 + ๐›พ (1)

with the investment accelerator

๐ผ๐‘ก = ๐‘(๐‘Œ๐‘กโˆ’1 โˆ’ ๐‘Œ๐‘กโˆ’2 ) (2)

and the national income identity

๐‘Œ๐‘ก = ๐ถ๐‘ก + ๐ผ๐‘ก + ๐บ๐‘ก (3)

โ€ข The parameter ๐‘Ž is peoplesโ€™ marginal propensity to consume out of income - equation


Eq. (1) asserts that people consume a fraction of math:a in (0,1) of each additional dol-
lar of income
โ€ข The parameter ๐‘ > 0 is the investment accelerator coefficient - equation Eq. (2) asserts
that people invest in physical capital when income is increasing and disinvest when it is
decreasing

Equations Eq. (1), Eq. (2), and Eq. (3) imply the following second-order linear difference
equation for national income:

๐‘Œ๐‘ก = (๐‘Ž + ๐‘)๐‘Œ๐‘กโˆ’1 โˆ’ ๐‘๐‘Œ๐‘กโˆ’2 + (๐›พ + ๐บ๐‘ก )

or

๐‘Œ๐‘ก = ๐œŒ1 ๐‘Œ๐‘กโˆ’1 + ๐œŒ2 ๐‘Œ๐‘กโˆ’2 + (๐›พ + ๐บ๐‘ก ) (4)

where ๐œŒ1 = (๐‘Ž + ๐‘) and ๐œŒ2 = โˆ’๐‘
To complete the model, we require two initial conditions
If the model is to generate time series for ๐‘ก = 0, โ€ฆ , ๐‘‡ , we require initial values

Y_{-1} = \bar{Y}_{-1}, \qquad Y_{-2} = \bar{Y}_{-2}

We’ll ordinarily set the parameters (a, b) so that starting from an arbitrary pair of initial conditions (\bar{Y}_{-1}, \bar{Y}_{-2}), national income Y_t converges to a constant value as t becomes large

We are interested in studying

โ€ข the transient fluctuations in ๐‘Œ๐‘ก as it converges to its steady state level



โ€ข the rate at which it converges to a steady state level

The deterministic version of the model described so far โ€” meaning that no random shocks
hit aggregate demand โ€” has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand

13.3.1 Stochastic Version of the Model

We create a random or stochastic version of the model by adding a random process of shocks or disturbances {σε_t} to the right side of equation Eq. (4), leading to the second-order scalar linear stochastic difference equation:

Y_t = \rho_1 Y_{t-1} + \rho_2 Y_{t-2} + \gamma + G_t + \sigma \epsilon_t \tag{5}

13.3.2 Mathematical Analysis of the Model

To get started, letโ€™s set ๐บ๐‘ก โ‰ก 0, ๐œŽ = 0, and ๐›พ = 0


Then we can write equation Eq. (5) as

๐‘Œ๐‘ก = ๐œŒ1 ๐‘Œ๐‘กโˆ’1 + ๐œŒ2 ๐‘Œ๐‘กโˆ’2

or

๐‘Œ๐‘ก+2 โˆ’ ๐œŒ1 ๐‘Œ๐‘ก+1 โˆ’ ๐œŒ2 ๐‘Œ๐‘ก = 0 (6)

To discover the properties of the solution of Eq. (6), it is useful first to form the characteris-
tic polynomial for Eq. (6):

z^2 - \rho_1 z - \rho_2 \tag{7}

where ๐‘ง is possibly a complex number


We want to find the two zeros (a.k.a. roots) โ€“ namely ๐œ†1 , ๐œ†2 โ€“ of the characteristic polyno-
mial
These are two special values of ๐‘ง, say ๐‘ง = ๐œ†1 and ๐‘ง = ๐œ†2 , such that if we set ๐‘ง equal to one of
these values in expression Eq. (7), the characteristic polynomial Eq. (7) equals zero:

๐‘ง2 โˆ’ ๐œŒ1 ๐‘ง โˆ’ ๐œŒ2 = (๐‘ง โˆ’ ๐œ†1 )(๐‘ง โˆ’ ๐œ†2 ) = 0 (8)

Equation Eq. (8) is said to factor the characteristic polynomial


When the roots are complex, they will occur as a complex conjugate pair
When the roots are complex, it is convenient to represent them in the polar form

๐œ†1 = ๐‘Ÿ๐‘’๐‘–๐œ” , ๐œ†2 = ๐‘Ÿ๐‘’โˆ’๐‘–๐œ”

where ๐‘Ÿ is the amplitude of the complex number and ๐œ” is its angle or phase
These can also be represented as

๐œ†1 = ๐‘Ÿ(๐‘๐‘œ๐‘ (๐œ”) + ๐‘– sin(๐œ”))

๐œ†2 = ๐‘Ÿ(๐‘๐‘œ๐‘ (๐œ”) โˆ’ ๐‘– sin(๐œ”))

(To read about the polar form, see here)


Given initial conditions ๐‘Œโˆ’1 , ๐‘Œโˆ’2 , we want to generate a solution of the difference equation
Eq. (6)
It can be represented as

๐‘Œ๐‘ก = ๐œ†๐‘ก1 ๐‘1 + ๐œ†๐‘ก2 ๐‘2

where ๐‘1 and ๐‘2 are constants that depend on the two initial conditions and on ๐œŒ1 , ๐œŒ2
When the roots are complex, it is useful to pursue the following calculations
Notice that

๐‘Œ๐‘ก = ๐‘1 (๐‘Ÿ๐‘’๐‘–๐œ” )๐‘ก + ๐‘2 (๐‘Ÿ๐‘’โˆ’๐‘–๐œ” )๐‘ก
= ๐‘1 ๐‘Ÿ๐‘ก ๐‘’๐‘–๐œ”๐‘ก + ๐‘2 ๐‘Ÿ๐‘ก ๐‘’โˆ’๐‘–๐œ”๐‘ก
= ๐‘1 ๐‘Ÿ๐‘ก [cos(๐œ”๐‘ก) + ๐‘– sin(๐œ”๐‘ก)] + ๐‘2 ๐‘Ÿ๐‘ก [cos(๐œ”๐‘ก) โˆ’ ๐‘– sin(๐œ”๐‘ก)]
= (๐‘1 + ๐‘2 )๐‘Ÿ๐‘ก cos(๐œ”๐‘ก) + ๐‘–(๐‘1 โˆ’ ๐‘2 )๐‘Ÿ๐‘ก sin(๐œ”๐‘ก)

The only way that ๐‘Œ๐‘ก can be a real number for each ๐‘ก is if ๐‘1 + ๐‘2 is a real number and ๐‘1 โˆ’ ๐‘2
is an imaginary number
This happens only when ๐‘1 and ๐‘2 are complex conjugates, in which case they can be written
in the polar forms

๐‘1 = ๐‘ฃ๐‘’๐‘–๐œƒ , ๐‘2 = ๐‘ฃ๐‘’โˆ’๐‘–๐œƒ

So we can write

๐‘Œ๐‘ก = ๐‘ฃ๐‘’๐‘–๐œƒ ๐‘Ÿ๐‘ก ๐‘’๐‘–๐œ”๐‘ก + ๐‘ฃ๐‘’โˆ’๐‘–๐œƒ ๐‘Ÿ๐‘ก ๐‘’โˆ’๐‘–๐œ”๐‘ก


= ๐‘ฃ๐‘Ÿ๐‘ก [๐‘’๐‘–(๐œ”๐‘ก+๐œƒ) + ๐‘’โˆ’๐‘–(๐œ”๐‘ก+๐œƒ) ]
= 2๐‘ฃ๐‘Ÿ๐‘ก cos(๐œ”๐‘ก + ๐œƒ)

where ๐‘ฃ and ๐œƒ are constants that must be chosen to satisfy initial conditions for ๐‘Œโˆ’1 , ๐‘Œโˆ’2
This formula shows that when the roots are complex, Y_t displays oscillations with period \check{p} = 2\pi / \omega and damping factor r

We say that \check{p} is the period because in that amount of time the cosine wave \cos(\omega t + \theta) goes through exactly one complete cycle
(Draw a cosine function to convince yourself of this please)

Remark: Following [115], we want to choose the parameters ๐‘Ž, ๐‘ of the model so that the ab-
solute values (of the possibly complex) roots ๐œ†1 , ๐œ†2 of the characteristic polynomial are both
strictly less than one:

|๐œ†๐‘— | < 1 for ๐‘— = 1, 2

Remark: When both roots ๐œ†1 , ๐œ†2 of the characteristic polynomial have absolute values
strictly less than one, the absolute value of the larger one governs the rate of convergence to
the steady state of the non stochastic version of the model
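As an illustration (the parameter values here are made up for the example, not taken from the lecture), we can compute the roots with NumPy and check the stability condition:

```python
import numpy as np

a, b = 0.9, 0.8        # example MPC and accelerator coefficient (made up)
ρ1, ρ2 = a + b, -b     # coefficients from the derivation above

# Roots of z² - ρ1 z - ρ2; np.roots takes the polynomial's coefficients
roots = np.roots([1, -ρ1, -ρ2])

# The product of the roots is -ρ2 = b, so a complex pair has |λ| = sqrt(b)
print(roots, np.abs(roots))
```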

13.3.3 Things This Lecture Does

We write a function to generate simulations of a {๐‘Œ๐‘ก } sequence as a function of time


The function requires that we put in initial conditions for ๐‘Œโˆ’1 , ๐‘Œโˆ’2
The function checks that ๐‘Ž, ๐‘ are set so that ๐œ†1 , ๐œ†2 are less than
unity in absolute value (also called โ€œmodulusโ€)
The function also tells us whether the roots are complex, and, if they are complex, returns
both their real and complex parts
If the roots are both real, the function returns their values
We use our function written to simulate paths that are stochastic (when ๐œŽ > 0)
We have written the function in a way that allows us to input {๐บ๐‘ก } paths of a few simple
forms, e.g.,

โ€ข one time jumps in ๐บ at some time


โ€ข a permanent jump in ๐บ that occurs at some time

We proceed to use the Samuelson multiplier-accelerator model as a laboratory to make a sim-


ple OOP example
The โ€œstateโ€ that determines next periodโ€™s ๐‘Œ๐‘ก+1 is now not just the current value ๐‘Œ๐‘ก but also
the once lagged value ๐‘Œ๐‘กโˆ’1
This involves a little more bookkeeping than is required in the Solow model class definition
We use the Samuelson multiplier-accelerator model as a vehicle for teaching how we can grad-
ually add more features to the class
We want to have a method in the class that automatically generates a simulation, either non-
stochastic (๐œŽ = 0) or stochastic (๐œŽ > 0)
We also show how to map the Samuelson model into a simple instance of the LinearStateSpace class described here
We can use a LinearStateSpace instance to do various things that we did above with our
homemade function and class
Among other things, we show by example that the eigenvalues of the matrix ๐ด that we use to
form the instance of the LinearStateSpace class for the Samuelson model equal the roots
of the characteristic polynomial Eq. (7) for the Samuelson multiplier accelerator model

Here is the formula for the matrix A in the linear state space system in the case that government expenditures are a constant G:

        ⎡   1     0    0  ⎤
    A = ⎢ γ + G   ρ1   ρ2 ⎥
        ⎣   0     1    0  ⎦

13.4 Implementation

Weโ€™ll start by drawing an informative graph from page 189 of [118]

In [2]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

def param_plot():

"""This function creates the graph on page 189 of Sargent's Macroeconomic Theory, second edition"""

fig, ax = plt.subplots(figsize=(10, 6))


ax.set_aspect('equal')

# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])

# Set axis labels


ax.set(xticks=[], yticks=[])
ax.set_xlabel(r'$\rho_2$', fontsize=16)
ax.xaxis.set_label_position('top')
ax.set_ylabel(r'$\rho_1$', rotation=0, fontsize=16)
ax.yaxis.set_label_position('right')

# Draw (t1, t2) points


ฯ1 = np.linspace(-2, 2, 100)
ax.plot(ฯ1, -abs(ฯ1) + 1, c='black')
ax.plot(ฯ1, np.ones_like(ฯ1) * -1, c='black')
ax.plot(ฯ1, -(ฯ1**2 / 4), c='black')

# Turn normal axes off


for spine in ['left', 'bottom', 'top', 'right']:
ax.spines[spine].set_visible(False)

# Add arrows to represent axes


axes_arrows = {'arrowstyle': '<|-|>', 'lw': 1.3}
ax.annotate('', xy=(xmin, 0), xytext=(xmax, 0), arrowprops=axes_arrows)
ax.annotate('', xy=(0, ymin), xytext=(0, ymax), arrowprops=axes_arrows)

# Annotate the plot with equations


plot_arrowsl = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=-0.2"}
plot_arrowsr = {'arrowstyle': '-|>', 'connectionstyle': "arc3, rad=0.2"}
ax.annotate(r'$\rho_1 + \rho_2 < 1$', xy=(0.5, 0.3), xytext=(0.8, 0.6),
arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'$\rho_1 + \rho_2 = 1$', xy=(0.38, 0.6), xytext=(0.6, 0.8),
arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'$\rho_2 < 1 + \rho_1$', xy=(-0.5, 0.3), xytext=(-1.3, 0.6),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'$\rho_2 = 1 + \rho_1$', xy=(-0.38, 0.6), xytext=(-1, 0.8),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'$\rho_2 = -1$', xy=(1.5, -1), xytext=(1.8, -1.3),
arrowprops=plot_arrowsl, fontsize='12')
ax.annotate(r'${\rho_1}^2 + 4\rho_2 = 0$', xy=(1.15, -0.35),
xytext=(1.5, -0.3), arrowprops=plot_arrowsr, fontsize='12')
ax.annotate(r'${\rho_1}^2 + 4\rho_2 < 0$', xy=(1.4, -0.7),
xytext=(1.8, -0.6), arrowprops=plot_arrowsr, fontsize='12')

# Label categories of solutions


ax.text(1.5, 1, 'Explosive\n growth', ha='center', fontsize=16)
ax.text(-1.5, 1, 'Explosive\n oscillations', ha='center', fontsize=16)
ax.text(0.05, -1.5, 'Explosive oscillations', ha='center', fontsize=16)
ax.text(0.09, -0.5, 'Damped oscillations', ha='center', fontsize=16)

# Add small marker to y-axis


ax.axhline(y=1.005, xmin=0.495, xmax=0.505, c='black')
ax.text(-0.12, -1.12, '-1', fontsize=10)
ax.text(-0.12, 0.98, '1', fontsize=10)

return fig

param_plot()
plt.show()

The graph portrays regions in which the (λ1, λ2) root pairs implied by the (ρ1 = (a + b), ρ2 =
-b) difference equation parameter pairs in the Samuelson model are such that:

• (λ1, λ2) are complex with modulus less than 1 - in this case, the {Y_t} sequence displays
damped oscillations
• (λ1, λ2) are both real, but one is strictly greater than 1 - this leads to explosive growth
• (λ1, λ2) are both real, but one is strictly less than -1 - this leads to explosive oscilla-
tions
• (λ1, λ2) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles

Later we'll present the graph with a red mark showing the particular point implied by the
setting of (a, b)
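These cases can also be checked numerically: for any (ρ1, ρ2) pair, the roots of z² - ρ1 z - ρ2 = 0 tell us which region of the graph we are in. Here is a small sketch of that check (the helper name and parameter values are illustrative, not from the lecture):

```python
import numpy as np

def classify(ρ1, ρ2):
    """Classify the (λ1, λ2) pair implied by (ρ1, ρ2)."""
    roots = np.roots([1, -ρ1, -ρ2])
    if np.iscomplex(roots).any():
        # complex conjugate pair: the modulus decides damped vs explosive
        return 'damped oscillations' if (np.abs(roots) < 1).all() else 'explosive oscillations'
    if (np.abs(roots) < 1).all():
        return 'smooth convergence'
    return 'explosive growth'

print(classify(1.42, -0.5))     # real roots inside the unit circle
print(classify(1.53, -0.9025))  # complex roots with modulus 0.95
```

The first call corresponds to the (a, b) = (.92, .5) example used below; the second to a reverse-engineered damped cycle.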

13.4.1 Function to Describe Implications of Characteristic Polynomial


In [3]: def categorize_solution(ρ1, ρ2):
"""This function takes values of ρ1 and ρ2 and uses them to classify the type of solution"""

discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 > 1 + ρ1 or ρ2 < -1:
print('Explosive oscillations')
elif ρ1 + ρ2 > 1:
print('Explosive growth')
elif discriminant < 0:
print('Roots are complex with modulus less than one; therefore damped oscillations')
else:
print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')

In [4]: ### Test the categorize_solution function

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state

13.4.2 Function for Plotting Paths

A useful function for our work below is

In [5]: def plot_y(function=None):


"""function plots path of Y_t"""
plt.subplots(figsize=(10, 6))
plt.plot(function)
plt.xlabel('Time $t$')
plt.ylabel('$Y_t$', rotation=0)
plt.grid()
plt.show()

13.4.3 Manual or โ€œby handโ€ Root Calculations

The following function calculates roots of the characteristic polynomial using high school algebra
(Weโ€™ll calculate the roots in other ways later)
The function also plots a ๐‘Œ๐‘ก starting from initial conditions that we set

In [6]: from cmath import sqrt

##=== This is a 'manual' method ===#

def y_nonstochastic(y_0=100, y_1=80, α=.92, β=.5, γ=10, n=80):

"""Takes values of parameters and computes the roots of characteristic polynomial.

It tells whether they are real or complex and whether they are less than unity in absolute value.
It also computes a simulation of length n starting from the two given initial conditions for
national output"""

roots = []

ρ1 = α + β
ρ2 = -β

print(f'ρ_1 is {ρ1}')
print(f'ρ_2 is {ρ2}')

discriminant = ρ1 ** 2 + 4 * ρ2

if discriminant == 0:
roots.append(ρ1 / 2)
print('Single real root: ')
print(''.join(str(roots)))
elif discriminant > 0:
roots.append((ρ1 + sqrt(discriminant).real) / 2)
roots.append((ρ1 - sqrt(discriminant).real) / 2)
print('Two real roots: ')
print(''.join(str(roots)))
else:
roots.append((ρ1 + sqrt(discriminant)) / 2)
roots.append((ρ1 - sqrt(discriminant)) / 2)
print('Two complex roots: ')
print(''.join(str(roots)))

if all(abs(root) < 1 for root in roots):
print('Absolute values of roots are less than one')
else:
print('Absolute values of roots are not less than one')

def transition(x, t): return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

y_t = [y_0, y_1]

for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[0.7740312423743284, 0.6459687576256715]
Absolute values of roots are less than one

13.4.4 Reverse-Engineering Parameters to Generate Damped Cycles

The next cell writes code that takes as inputs the modulus r and phase φ of a conjugate pair
of complex numbers in polar form

λ1 = r exp(iφ), λ2 = r exp(-iφ)

• The code assumes that these two complex numbers are the roots of the characteristic
polynomial
• It then reverse-engineers (a, b) and (ρ1, ρ2) pairs that would generate those roots
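The algebra behind this: expanding (z - λ1)(z - λ2) with λ1 = r exp(iφ) and λ2 = r exp(-iφ) gives z² - 2r cos(φ)z + r², so matching the characteristic polynomial z² - ρ1 z - ρ2 requires ρ1 = 2r cos φ and ρ2 = -r². A quick numerical check of these identities, using only the standard library:

```python
import cmath
import math

r, φ = 0.95, 2 * math.pi / 10          # modulus and phase of the conjugate pair
λ1, λ2 = cmath.rect(r, φ), cmath.rect(r, -φ)

ρ1 = (λ1 + λ2).real                     # sum of the roots
ρ2 = (-λ1 * λ2).real                    # minus the product of the roots

assert abs(ρ1 - 2 * r * math.cos(φ)) < 1e-12
assert abs(ρ2 - (-r ** 2)) < 1e-12
print(ρ1, ρ2)                           # ≈ 1.5371, -0.9025
```

These are exactly the values the function f below produces for r = .95 and a cycle of period 10.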

In [7]: ### code to reverse-engineer a cycle
### y_t = r^t (c_1 cos(φ t) + c_2 sin(φ t))
###

import cmath
import math

def f(r, φ):

"""
Takes modulus r and angle φ of complex number r exp(j φ)
and creates ρ1 and ρ2 of characteristic polynomial for which
r exp(j φ) and r exp(- j φ) are complex roots.

Returns the multiplier coefficient a and the accelerator coefficient b
that verifies those roots.
"""
g1 = cmath.rect(r, φ) # Generate two complex roots
g2 = cmath.rect(r, -φ)
ρ1 = g1 + g2 # Implied ρ1, ρ2
ρ2 = -g1 * g2
b = -ρ2 # Reverse-engineer a and b that validate these
a = ρ1 - b
return ρ1, ρ2, a, b

## Now let's use the function in an example
## Here are the example parameters

r = .95
period = 10 # Length of cycle in units of time
φ = 2 * math.pi/period

## Apply the function

ρ1, ρ2, a, b = f(r, φ)

print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)

In [8]: ## Print the real components of ρ1 and ρ2

ρ1 = ρ1.real
ρ2 = ρ2.real

ρ1, ρ2

Out[8]: (1.5371322893124, -0.9024999999999999)



13.4.5 Root Finding Using Numpy

Here weโ€™ll use numpy to compute the roots of the characteristic polynomial

In [9]: r1, r2 = np.roots([1, -ρ1, -ρ2])

p1 = cmath.polar(r1)
p2 = cmath.polar(r2)

print(f"r, φ = {r}, {φ}")
print(f"p1, p2 = {p1}, {p2}")
# print(f"g1, g2 = {g1}, {g2}")

print(f"a, b = {a}, {b}")
print(f"ρ1, ρ2 = {ρ1}, {ρ2}")

r, φ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999

In [10]: ##=== This method uses numpy to calculate roots ===#

def y_nonstochastic(y_0=100, y_1=80, α=.9, β=.8, γ=10, n=80):

""" Rather than computing the roots of the characteristic polynomial by hand as we did earlier,
this function enlists numpy to do the work for us """

# Useful constants
ρ1 = α + β
ρ2 = -β

categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(f'Roots are {roots}')

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Define transition equation
def transition(x, t): return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ

# Set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_nonstochastic())

Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex

Roots are less than one

13.4.6 Reverse-Engineered Complex Roots: Example

The next cell studies the implications of reverse-engineered complex roots


Weโ€™ll generate an undamped cycle of period 10

In [11]: r = 1 # generates undamped, nonexplosive cycles

period = 10 # length of cycle in units of time
φ = 2 * math.pi/period

## Apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, φ)

a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real

print(f"a, b = {a}, {b}")

ytemp = y_nonstochastic(α=a, β=b, y_0=20, y_1=30)
plot_y(ytemp)

a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one

13.4.7 Digression: Using Sympy to Find Roots

We can also use sympy to compute analytic formulas for the roots

In [12]: import sympy
from sympy import Symbol, init_printing
init_printing()

r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")

sympy.solve(z**2 - r1*z - r2, z)

Out[12]:

[ρ_1/2 - √(ρ_1² + 4ρ_2)/2,  ρ_1/2 + √(ρ_1² + 4ρ_2)/2]

In [13]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b

sympy.solve(z**2 - r1*z - r2, z)

Out[13]:

๐›ผ ๐›ฝ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ ๐›ผ ๐›ฝ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ


[ + โˆ’ , + + ]
2 2 2 2 2 2

๐›ผ ๐›ฝ 1 ๐›ผ ๐›ฝ 1
[ + โˆ’ โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ, + + โˆš๐›ผ2 + 2๐›ผ๐›ฝ + ๐›ฝ 2 โˆ’ 4๐›ฝ]
2 2 2 2 2 2

13.5 Stochastic Shocks

Now weโ€™ll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand

In [14]: def y_stochastic(y_0=0, y_1=0, α=0.8, β=0.2, γ=10, n=100, σ=5):

"""This function takes parameters of a stochastic version of the model and proceeds to analyze
the roots of the characteristic polynomial and also generate a simulation"""

# Useful constants
ρ1 = α + β
ρ2 = -β

# Categorize solution
categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(roots)

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Generate shocks
ϵ = np.random.normal(0, 1, n)

# Define transition equation
def transition(x, t): return ρ1 * \
x[t - 1] + ρ2 * x[t - 2] + γ + σ * ϵ[t]

# Set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):
y_t.append(transition(y_t, t))

return y_t

plot_y(y_stochastic())

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

Letโ€™s do a simulation in which there are shocks and the characteristic polynomial has complex
roots

In [15]: r = .97

period = 10 # length of cycle in units of time
φ = 2 * math.pi/period

### apply the reverse-engineering function f

ρ1, ρ2, a, b = f(r, φ)

a = a.real # drop the imaginary part so that it is a valid input into y_stochastic
b = b.real

print(f"a, b = {a}, {b}")
plot_y(y_stochastic(y_0=40, y_1=42, α=a, β=b, σ=2, n=100))

a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one

13.6 Government Spending

This function computes a response to either a permanent or one-off increase in government
expenditures

In [16]: def y_stochastic_g(y_0=20,
y_1=20,
α=0.8,
β=0.2,
γ=10,
n=100,
σ=2,
g=0,
g_t=0,
duration='permanent'):

"""This program computes a response to a permanent or one-off increase in government
expenditures occurring at time g_t"""

# Useful constants
ρ1 = α + β
ρ2 = -β

# Categorize solution
categorize_solution(ρ1, ρ2)

# Find roots of polynomial
roots = np.roots([1, -ρ1, -ρ2])
print(roots)

# Check if real or complex
if all(isinstance(root, complex) for root in roots):
print('Roots are complex')
else:
print('Roots are real')

# Check if roots are less than one
if all(abs(root) < 1 for root in roots):
print('Roots are less than one')
else:
print('Roots are not less than one')

# Generate shocks
ϵ = np.random.normal(0, 1, n)

def transition(x, t, g=0):

# Non-stochastic - separated to avoid generating random series when not needed
if σ == 0:
return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g

# Stochastic
else:
ϵ = np.random.normal(0, 1, n)
return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]

# Create list and set initial conditions
y_t = [y_0, y_1]

# Generate y_t series
for t in range(2, n):

# No government spending
if g == 0:
y_t.append(transition(y_t, t))

# Government spending (no shock)
elif g != 0 and duration is None:
y_t.append(transition(y_t, t, g=g))

# Permanent government spending shock
elif duration == 'permanent':
if t < g_t:
y_t.append(transition(y_t, t, g=0))
else:
y_t.append(transition(y_t, t, g=g))

# One-off government spending shock
elif duration == 'one-off':
if t == g_t:
y_t.append(transition(y_t, t, g=g))
else:
y_t.append(transition(y_t, t, g=0))
return y_t

A permanent government spending shock can be simulated as follows

In [17]: plot_y(y_stochastic_g(g=10, g_t=20, duration='permanent'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

We can also see the response to a one time jump in government expenditures

In [18]: plot_y(y_stochastic_g(g=500, g_t=50, duration='one-off'))

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one

13.7 Wrapping Everything Into a Class

Up to now, we have written functions to do the work

Now we'll roll up our sleeves and write a Python class called Samuelson for the Samuelson
model

In [19]: class Samuelson():

r"""This class represents the Samuelson model, otherwise known as the
multiplier-accelerator model. The model combines the Keynesian multiplier
with the accelerator theory of investment.

The path of output is governed by a linear second-order difference equation

.. math::

Y_t = \gamma + g + (\alpha + \beta) Y_{t-1} - \beta Y_{t-2}

Parameters
----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If none, government
spending equal to g for all t.

"""

def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):

self.y_0, self.y_1, self.α, self.β = y_0, y_1, α, β
self.n, self.g, self.g_t, self.duration = n, g, g_t, duration
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
self.roots = np.roots([1, -self.ρ1, -self.ρ2])

def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'

def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True

def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'

def _transition(self, x, t, g=0):

# Non-stochastic - separated to avoid generating random series when not needed
if self.σ == 0:
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g

# Stochastic
else:
ϵ = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ϵ[t]

def generate_series(self):

# Create list and set initial conditions


y_t = [self.y_0, self.y_1]

# Generate y_t series


for t in range(2, self.n):

# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t, g=0))

# Government spending (no shock)
elif self.g != 0 and self.duration is None:
y_t.append(self._transition(y_t, t, g=self.g))

# Permanent government spending shock


elif self.duration == 'permanent':
if t < self.g_t:
y_t.append(self._transition(y_t, t, g=0))
else:
y_t.append(self._transition(y_t, t, g=self.g))

# One-off government spending shock


elif self.duration == 'one-off':
if t == self.g_t:
y_t.append(self._transition(y_t, t, g=self.g))
else:
y_t.append(self._transition(y_t, t, g=0))
return y_t

def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')

if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')

if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:

print('Non-stochastic series')

if self.g != 0:
print('Government spending equal to ' + str(self.g))

if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))

def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()

# Add parameter values to plot


paramstr = f'$\\alpha={self.α:.2f}$ \n $\\beta={self.β:.2f}$ \n $\\gamma={self.γ:.2f}$ \n \
$\\sigma={self.σ:.2f}$ \n $\\rho_1={self.ρ1:.2f}$ \n $\\rho_2={self.ρ2:.2f}$'
props = dict(fc='white', pad=10, alpha=0.5)
ax.text(0.87, 0.05, paramstr, transform=ax.transAxes,
fontsize=12, bbox=props, va='bottom')

return fig

def param_plot(self):

# Uses the param_plot() function defined earlier (it is then able


# to be used standalone or as part of the model)

fig = param_plot()
ax = fig.gca()

# Add λ values to legend
for i, root in enumerate(self.roots):
if isinstance(root, complex):
operator = ['+', ''] # Need to fill operator for positive as string is split apart
label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f} {operator[i]} {self.roots[i].imag:.2f}i$'
else:
label = rf'$\lambda_{i+1} = {self.roots[i].real:.2f}$'
ax.scatter(0, 0, 0, label=label) # dummy to add to legend

# Add ฯ pair to plot


ax.scatter(self.ฯ1, self.ฯ2, 100, 'red', '+', label=r'$(\ \rho_1, \ \rho_2 \ )$', zorder=5)

plt.legend(fontsize=12, loc=3)

return fig

13.7.1 Illustration of Samuelson Class

Now weโ€™ll put our Samuelson class to work on an example

In [20]: sam = Samuelson(α=0.8, β=0.5, σ=2, g=10, g_t=20, duration='permanent')
sam.summary()

Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20

In [21]: sam.plot()
plt.show()

13.7.2 Using the Graph

Weโ€™ll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed
The red + sign shows the location of the roots

In [22]: sam.param_plot()
plt.show()

13.8 Using the LinearStateSpace Class

It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class

In [23]: from quantecon import LinearStateSpace

""" This script maps the Samuelson model into the ``LinearStateSpace`` class"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100

A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]

G = [[γ + g, ρ1, ρ2], # this is Y_{t+1}
[γ, α, 0], # this is C_{t+1}
[0, β, -β]] # this is I_{t+1}

μ_0 = [1, 100, 100]

C = np.zeros((3,1))
C[1] = σ # stochastic

sam_t = LinearStateSpace(A, C, G, mu_0=μ_0)

x, y = sam_t.simulate(ts_length=n)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
ax.plot(series, color=color)
ax.set(title=title, xlim=(0, n))
ax.grid()

axes[-1].set_xlabel('Iteration')

plt.show()

13.8.1 Other Methods in the LinearStateSpace Class

Letโ€™s plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class

In [24]: imres = sam_t.impulse_response()


imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape

Out[24]: (2, 6, 1)

Now let's compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of A

In [25]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)

[0.85+0.42130749j 0.85-0.42130749j 1. +0.j ]
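As a check on the claim above, the non-unit eigenvalues of A coincide with the roots of z² - ρ1 z - ρ2, while the remaining unit eigenvalue comes from the constant in the state vector. A sketch of the comparison, using the same α = 0.8, β = 0.9, γ = g = 10 as in the cell that built A:

```python
import numpy as np

α, β, γ, g = 0.8, 0.9, 10, 10
ρ1, ρ2 = α + β, -β

A = np.array([[1, 0, 0],
              [γ + g, ρ1, ρ2],
              [0, 1, 0]])

eigs = np.linalg.eigvals(A)
poly_roots = np.roots([1, -ρ1, -ρ2])

# each root of the characteristic polynomial is an eigenvalue of A
for root in poly_roots:
    assert np.min(np.abs(eigs - root)) < 1e-8

# and the remaining eigenvalue is 1
assert np.min(np.abs(eigs - 1)) < 1e-8
```

This follows from det(A - zI) = (1 - z)(z² - ρ1 z - ρ2), which you can verify by expanding along the first row.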



13.8.2 Inheriting Methods from LinearStateSpace

We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more functions to use

In [26]: class SamuelsonLSS(LinearStateSpace):

"""
this subclass creates a Samuelson multiplier-accelerator model
as a linear state space system
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):

self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ

# Define initial conditions
self.μ_0 = [1, y_0, y_1]

self.ρ1 = α + β
self.ρ2 = -β

# Define transition matrix
self.A = [[1, 0, 0],
[γ + g, self.ρ1, self.ρ2],
[0, 1, 0]]

# Define output matrix
self.G = [[γ + g, self.ρ1, self.ρ2], # this is Y_{t+1}
[γ, α, 0], # this is C_{t+1}
[0, β, -β]] # this is I_{t+1}

self.C = np.zeros((3, 1))
self.C[1] = σ # stochastic

# Initialize LSS with parameters from Samuelson model
LinearStateSpace.__init__(self, self.A, self.C, self.G, mu_0=self.μ_0)

def plot_simulation(self, ts_length=100, stationary=True):

# Temporarily store original parameters
temp_mu = self.mu_0
temp_Sigma = self.Sigma_0

# Set distribution parameters equal to their stationary values for simulation
if stationary == True:
try:
μ_x, μ_y, σ_x, σ_y = self.stationary_distributions()
self.mu_0 = μ_y
self.Sigma_0 = σ_y
# Exception where no convergence achieved when calculating stationary distributions
except ValueError:
print('Stationary distribution does not exist')

x, y = self.simulate(ts_length)

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))
titles = ['Output ($Y_t$)', 'Consumption ($C_t$)', 'Investment ($I_t$)']
colors = ['darkblue', 'red', 'purple']
for ax, series, title, color in zip(axes, y, titles, colors):
ax.plot(series, color=color)
ax.set(title=title, xlim=(0, ts_length))
ax.grid()

axes[-1].set_xlabel('Iteration')

# Reset distribution parameters to their initial values
self.mu_0 = temp_mu
self.Sigma_0 = temp_Sigma

return fig

def plot_irf(self, j=5):

x, y = self.impulse_response(j)

# Reshape into 3 x j matrix for plotting purposes


yimf = np.array(y).flatten().reshape(j+1, 3).T

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(12, 8))


labels = ['$Y_t$', '$C_t$', '$I_t$']
colors = ['darkblue', 'red', 'purple']
for ax, series, label, color in zip(axes, yimf, labels, colors):
ax.plot(series, color=color)
ax.set(xlim=(0, j))
ax.set_ylabel(label, rotation=0, fontsize=14, labelpad=10)
ax.grid()

axes[0].set_title('Impulse Response Functions')


axes[-1].set_xlabel('Iteration')

return fig

def multipliers(self, j=5):


x, y = self.impulse_response(j)
return np.sum(np.array(y).flatten().reshape(j+1, 3), axis=0)

13.8.3 Illustrations

Letโ€™s show how we can use the SamuelsonLSS

In [27]: samlss = SamuelsonLSS()

In [28]: samlss.plot_simulation(100, stationary=False)


plt.show()

In [29]: samlss.plot_simulation(100, stationary=True)


plt.show()

In [30]: samlss.plot_irf(100)
plt.show()

In [31]: samlss.multipliers()

Out[31]: array([7.414389, 6.835896, 0.578493])

13.9 Pure Multiplier Model

Let's shut down the accelerator by setting b = 0 to get a pure multiplier model

• the absence of cycles gives an idea about why Samuelson included the accelerator

In [32]: pure_multiplier = SamuelsonLSS(α=0.95, β=0)

In [33]: pure_multiplier.plot_simulation()

Stationary distribution does not exist

Out[33]:

In [34]: pure_multiplier = SamuelsonLSS(α=0.8, β=0)

In [35]: pure_multiplier.plot_simulation()

Out[35]:

In [36]: pure_multiplier.plot_irf(100)

Out[36]:

13.10 Summary

In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the Samuelson (1939) multiplier-accelerator model, described in [115]
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
14 More Language Features

14.1 Contents

โ€ข Overview 14.2
โ€ข Iterables and Iterators 14.3
โ€ข Names and Name Resolution 14.4
โ€ข Handling Errors 14.5
โ€ข Decorators and Descriptors 14.6
โ€ข Generators 14.7
โ€ข Recursive Function Calls 14.8
โ€ข Exercises 14.9
โ€ข Solutions 14.10

14.2 Overview

With this last lecture, our advice is to skip it on first pass, unless you have a burning desire to read it
Itโ€™s here

1. as a reference, so we can link back to it when required, and


2. for those who have worked through a number of applications, and now want to learn
more about the Python language

A variety of topics are treated in the lecture, including generators, exceptions and descriptors

14.3 Iterables and Iterators

Weโ€™ve already said something about iterating in Python


Now let's look more closely at how it all works, focusing on Python's implementation of the
for loop


14.3.1 Iterators

Iterators are a uniform interface to stepping through elements in a collection


Here weโ€™ll talk about using iteratorsโ€”later weโ€™ll learn how to build our own
Formally, an iterator is an object with a __next__ method
For example, file objects are iterators
To see this, letโ€™s have another look at the US cities data, which is written to the present
working directory in the following cell

In [1]: %%file us_cities.txt


new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229

Writing us_cities.txt

In [2]: f = open('us_cities.txt')
f.__next__()

Out[2]: 'new york: 8244910\n'

In [3]: f.__next__()

Out[3]: 'los angeles: 3819702\n'

We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file
The next method can also be accessed via the builtin function next(), which directly calls
this method

In [4]: next(f)

Out[4]: 'chicago: 2707120\n'

The objects returned by enumerate() are also iterators

In [5]: e = enumerate(['foo', 'bar'])


next(e)

Out[5]: (0, 'foo')

In [6]: next(e)

Out[6]: (1, 'bar')



as are the reader objects from the csv module


Letโ€™s create a small csv file that contains data from the NIKKEI index

In [7]: %%file test_table.csv


Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83

Writing test_table.csv

In [8]: from csv import reader

f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)

Out[8]: ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

In [9]: next(nikkei_data)

Out[9]: ['2009-05-21', '9280.35', '9286.35', '9189.92', '9264.15', '133200', '9264.15']

14.3.2 Iterators in For Loops

All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write

for x in iterator:
<code block>

then the interpreter

• calls iterator.__next__() and binds x to the result
• executes the code block
• repeats until a StopIteration error occurs

So now you know how this magical looking syntax works

f = open('somefile.txt', 'r')
for line in f:
    # do something

The interpreter just keeps

1. calling f.__next__() and binding line to the result


2. executing the body of the loop

This continues until a StopIteration error occurs
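To see that there is no magic here, the loop can be written out by hand using only the iterator protocol
This is an illustrative sketch; the helper name manual_for is our own, not a standard function

```python
# A hand-rolled version of `for x in iterator: action(x)`,
# using only iter/next and the StopIteration signal
def manual_for(iterator, action):
    while True:
        try:
            x = next(iterator)      # same as iterator.__next__()
        except StopIteration:
            break                   # the loop ends when StopIteration is raised
        action(x)

results = []
manual_for(iter([1, 2, 3]), results.append)
print(results)   # [1, 2, 3]
```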



14.3.3 Iterables

You already know that we can put a Python list to the right of in in a for loop

In [10]: for i in ['spam', 'eggs']:
    print(i)

spam
eggs

So does that mean that a list is an iterator?


The answer is no

In [11]: x = ['foo', 'bar']
type(x)

Out[11]: list

In [12]: next(x)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)

TypeError: 'list' object is not an iterator

So why can we iterate over a list in a for loop?


The reason is that a list is iterable (as opposed to an iterator)
Formally, an object is iterable if it can be converted to an iterator using the built-in function
iter()
Lists are one such object

In [13]: x = ['foo', 'bar']
type(x)

Out[13]: list

In [14]: y = iter(x)
type(y)

Out[14]: list_iterator

In [15]: next(y)

Out[15]: 'foo'

In [16]: next(y)

Out[16]: 'bar'

In [17]: next(y)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)

StopIteration:

Many other objects are iterable, such as dictionaries and tuples


Of course, not all objects are iterable

In [18]: iter(42)

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)

TypeError: 'int' object is not iterable

To conclude our discussion of for loops

• for loops work on either iterators or iterables
• In the second case, the iterable is converted into an iterator before the loop starts
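For instance, any class that defines an __iter__ method is iterable
Here's a small sketch; the class Squares is our own illustration, not part of the lecture

```python
class Squares:
    "An illustrative iterable yielding the first n squares"

    def __init__(self, n):
        self.n = n

    def __iter__(self):     # called by iter(), and hence by for loops
        return iter(i * i for i in range(self.n))

squares = list(Squares(4))  # the loop machinery calls iter() for us
print(squares)   # [0, 1, 4, 9]
```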

14.3.4 Iterators and built-ins

Some built-in functions that act on sequences also work with iterables

• max(), min(), sum(), all(), any()

For example

In [19]: x = [10, -10]
max(x)

Out[19]: 10

In [20]: y = iter(x)
type(y)

Out[20]: list_iterator

In [21]: max(y)

Out[21]: 10

One thing to remember about iterators is that they are depleted by use

In [22]: x = [10, -10]
y = iter(x)
max(y)

Out[22]: 10

In [23]: max(y)

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)

ValueError: max() arg is an empty sequence
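To make the depletion point vivid, here's a small sketch: once an iterator has been consumed, it stays empty

```python
x = [10, -10]
y = iter(x)
first_pass = list(y)     # consumes every element of the iterator
second_pass = list(y)    # the iterator is now depleted
print(first_pass)    # [10, -10]
print(second_pass)   # []
```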

14.4 Names and Name Resolution

14.4.1 Variable Names in Python

Consider the Python statement

In [24]: x = 42

We now know that when this statement is executed, Python creates an object of type int in
your computerโ€™s memory, containing

• the value 42
• some associated attributes

But what is x itself?


In Python, x is called a name, and the statement x = 42 binds the name x to the integer
object we have just discussed
Under the hood, this process of binding names to objects is implemented as a dictionary — more about this in a moment
There is no problem binding two or more names to the one object, regardless of what that
object is

In [25]: def f(string):    # Create a function called f
    print(string)          # that prints any string it's passed

g = f
id(g) == id(f)

Out[25]: True

In [26]: g('test')

test

In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here's an example of this situation, where the name x is first bound to one object and then rebound to another

In [27]: x = 'foo'
id(x)

Out[27]: 139979150881488

In [28]: x = 'bar' # No names bound to the first object

What happens here is that the first object is garbage collected


In other words, the memory slot that stores that object is deallocated, and returned to the
operating system
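In CPython, this garbage collection is driven by reference counting, which we can peek at with sys.getrefcount
A sketch; the exact counts are an implementation detail, but binding an extra name raises the count by one

```python
import sys

x = ['some', 'list']
before = sys.getrefcount(x)   # the call itself holds one temporary reference
y = x                         # bind a second name to the same object
after = sys.getrefcount(x)
print(after - before)   # 1
```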

14.4.2 Namespaces

Recall from the preceding discussion that the statement

In [29]: x = 42

binds the name x to the integer object on the right-hand side


We also mentioned that this process of binding x to the correct object is implemented as a
dictionary
This dictionary is called a namespace
Definition: A namespace is a symbol table that maps names to objects in memory
Python uses multiple namespaces, creating them on the fly as necessary
For example, every time we import a module, Python creates a namespace for that module
To see this in action, suppose we write a script math2.py with a single line

In [30]: %%file math2.py
pi = 'foobar'

Writing math2.py

Now we start the Python interpreter and import it



In [31]: import math2

Next let's import the math module from the standard library

In [32]: import math

Both of these modules have an attribute called pi

In [33]: math.pi

Out[33]: 3.141592653589793

In [34]: math2.pi

Out[34]: 'foobar'

These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary
We can look at the dictionary directly, using module_name.__dict__

In [35]: import math

math.__dict__.items()

Out[35]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

In [36]: import math2

math2.__dict__.items()

Out[36]: dict_items([('__name__', 'math2'), ('__doc__', None), ('__package__', ''), ('__loader__', <_frozen_im


All Rights Reserved.

Copyright (c) 2000 BeOpen.com.


All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives.


All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.


All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of
for supporting Python development. See www.python.org for more information., 'license': Type lic

As you know, we access elements of the namespace using the dotted attribute notation

In [37]: math.pi

Out[37]: 3.141592653589793

In fact this is entirely equivalent to math.__dict__['pi']

In [38]: math.__dict__['pi'] == math.pi

Out[38]: True

14.4.3 Viewing Namespaces

As we saw above, the math namespace can be printed by typing math.__dict__


Another way to see its contents is to type vars(math)

In [39]: vars(math).items()

Out[39]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t

If you just want to see the names, you can type

In [40]: dir(math)[0:10]

Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']

Notice the special names __doc__ and __name__


These are initialized in the namespace when any module is imported

โ€ข __doc__ is the doc string of the module


โ€ข __name__ is the name of the module

In [41]: print(math.__doc__)

This module is always available. It provides access to the
mathematical functions defined by the C standard.

In [42]: math.__name__

Out[42]: 'math'

14.4.4 Interactive Sessions

In Python, all code executed by the interpreter runs in some module


What about commands typed at the prompt?
These are also regarded as being executed within a module — in this case, a module called __main__
To check this, we can look at the current module name via the value of __name__ given at
the prompt

In [43]: print(__name__)

__main__

When we run a script using IPython's run command, the contents of the file are executed as part of __main__ too
To see this, let's create a file mod.py that prints its own __name__ attribute

In [44]: %%file mod.py
print(__name__)

Writing mod.py

Now let's look at two different ways of running it in IPython

In [45]: import mod # Standard import

mod

In [46]: %run mod.py # Run interactively

__main__

In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos

In [47]: x = 2
y = 3

import numpy as np

%whos

Variable Type Data/Info
-----------------------------------------------------
e enumerate <enumerate object at 0x7f4f6c16f708>
f function <function f at 0x7f4f6c1c7048>
g function <function f at 0x7f4f6c1c7048>
i str eggs
math module <module 'math' from '/hom<…>37m-x86_64-linux-gnu.so'>
math2 module <module 'math2' from '/ho<…>pyter/executed/math2.py'>
mod module <module 'mod' from '/home<…>jupyter/executed/mod.py'>
nikkei_data reader <_csv.reader object at 0x7f4f6c178588>
np module <module 'numpy' from '/ho<…>kages/numpy/__init__.py'>
reader builtin_function_or_method <built-in function reader>
x int 2
y int 3

14.4.5 The Global Namespace

Python documentation often makes reference to the "global namespace"


The global namespace is the namespace of the module currently being executed
For example, suppose that we start the interpreter and begin making assignments
We are now working in the module __main__, and hence the namespace for __main__ is
the global namespace
Next, we import a module called amodule

import amodule

At this point, the interpreter creates a namespace for the module amodule and starts executing commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the import statement was made
In this case it's __main__, so the namespace of __main__ again becomes the global namespace

14.4.6 Local Namespaces

Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with locals()
For example, consider

In [48]: def f(x):
    a = 2
    print(locals())
    return a * x

Now let's call the function

In [49]: f(1)

{'x': 1, 'a': 2}

Out[49]: 2

You can see the local namespace of f before it is destroyed



14.4.7 The __builtins__ Namespace

We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?

• These definitions are stored in the builtins module
• They have their own namespace, called __builtins__

In [50]: dir()[0:10]

Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']

In [51]: dir(__builtins__)[0:10]

Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']

We can access elements of the namespace as follows

In [52]: __builtins__.max

Out[52]: <function max>

But __builtins__ is special, because we can always access them directly as well

In [53]: max

Out[53]: <function max>

In [54]: __builtins__.max == max

Out[54]: True
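This fallback also means a builtin can be shadowed by a global name and recovered again
A small sketch using the builtins module

```python
import builtins

max = 'no longer a function'   # shadow the builtin name in the global namespace
print(builtins.max(1, 2))      # 2 -- the original function is still reachable
del max                        # remove the shadow
print(max(1, 2))               # 2 -- name resolution falls back to the builtin
```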

The next section explains how this works …

14.4.8 Name Resolution

Namespaces are great because they help us organize variable names


(Type import this at the prompt and look at the last item that's printed)
However, we do need to understand how the Python interpreter works with multiple namespaces

At any point of execution, there are in fact at least two namespaces that can be accessed directly
("Accessed directly" means without using a dot, as in pi rather than math.pi)
These namespaces are

• The global namespace (of the module being executed)
• The builtin namespace

If the interpreter is executing a function, then the directly accessible namespaces are

• The local namespace of the function
• The global namespace (of the module being executed)
• The builtin namespace

Sometimes functions are defined within other functions, like so

In [55]: def f():
    a = 2
    def g():
        b = 4
        print(a * b)
    g()

Here f is the enclosing function for g, and each function gets its own namespaces
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is

1. the local namespace (if it exists)


2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace

If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here's an example that helps to illustrate
Consider a script test.py that looks as follows

In [56]: %%file test.py
def g(x):
    a = 1
    x = x + a
    return x

a = 0
y = g(10)
print("a = ", a, "y = ", y)

Writing test.py

What happens when we run this script?



In [57]: %run test.py

a = 0 y = 11

In [58]: x

Out[58]: 2

First,

• The global namespace {} is created
• The function object is created, and g is bound to it within the global namespace
• The name a is bound to 0, again in the global namespace

Next g is called via y = g(10), leading to the following sequence of actions

• The local namespace for the function is created
• Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}
• Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to the result
• This value is returned, and y is bound to it in the global namespace
• Local x and a are discarded (and the local namespace is deallocated)

Note that the global a was not affected by the local a
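The enclosing layer of the rule can be seen directly too
In this sketch the inner function finds a in the enclosing namespace before ever reaching the global one

```python
a = 'global'

def outer():
    a = 'enclosing'
    def inner():
        return a       # found in the enclosing namespace of outer
    return inner()

print(outer())   # enclosing
print(a)         # global -- untouched
```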

14.4.9 Mutable Versus Immutable Parameters

This is a good time to say a little more about mutable vs immutable objects
Consider the code segment

In [59]: def f(x):
    x = x + 1
    return x

x = 1
print(f(x), x)

2 1

We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it's a different story when we use a mutable data type such as a list

In [60]: def f(x):
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)

[2] [2]

This prints [2] as the value of f(x) and the same for x

Here's what happens

• f is registered as a function in the global namespace
• x is bound to [1] in the global namespace
• The call f(x)

  – Creates a local namespace
  – Adds x to the local namespace, bound to [1]
  – The list [1] is modified to [2]
  – Returns the list [2]
  – The local namespace is deallocated, and local x is lost

• Global x has been modified
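If we want to shield the caller from such side effects, one common pattern is for the function to work on a copy
A sketch

```python
def f(x):
    x = x[:]            # shallow copy, so the caller's list is left alone
    x[0] = x[0] + 1
    return x

x = [1]
print(f(x), x)   # [2] [1]
```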

14.5 Handling Errors

Sometimes it's possible to anticipate errors as we're writing code

For example, the unbiased sample variance of a sample $y_1, \ldots, y_n$ is defined as

$$
s^2 := \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad \bar{y} = \text{sample mean}
$$

This can be calculated in NumPy using np.var


But if you were writing a function to handle such a calculation, you might anticipate a divide-by-zero error when the sample size is one
One possible action is to do nothing — the program will just crash, and spit out an error message
But sometimes it's worth writing your code in a way that anticipates and deals with runtime errors that you think might arise
Why?

• Because the debugging information provided by the interpreter is often less useful than the information on possible errors you have in your head when writing code
• Because errors causing execution to stop are frustrating if you're in the middle of a large computation
• Because it reduces confidence in your code on the part of your users (if you are writing for others)

14.5.1 Assertions

A relatively easy way to handle checks is with the assert keyword


For example, pretend for a moment that the np.var function doesn't exist and we need to write our own

In [61]: def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)

If we run this with an array of length one, the program will terminate and print our error
message

In [62]: var([1])

---------------------------------------------------------------------------

AssertionError Traceback (most recent call last)

<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])

<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)

AssertionError: Sample size must be greater than one.

The advantage is that we can

• fail early, as soon as we know there will be a problem
• supply specific information on why a program is failing
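One caveat: assert statements are skipped entirely when Python runs with the -O flag, so for user-facing checks it can be safer to raise an exception explicitly
A pure-Python sketch of the same check

```python
def var(y):
    n = len(y)
    if n <= 1:
        raise ValueError('Sample size must be greater than one.')
    ybar = sum(y) / n
    return sum((yi - ybar)**2 for yi in y) / (n - 1)

print(var([2.0, 4.0, 6.0]))   # 4.0
```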

14.5.2 Handling Errors During Runtime

The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Let's look at how this is done
Exceptions
Here's an example of a common error type

In [63]: def f:

File "<ipython-input-63-262a7e387ba5>", line 1


def f:
^
SyntaxError: invalid syntax

Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Here's a different kind of error, unrelated to syntax

In [64]: 1 / 0

---------------------------------------------------------------------------

ZeroDivisionError Traceback (most recent call last)

<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0

ZeroDivisionError: division by zero

Here's another

In [65]: x1 = y1

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1

NameError: name 'y1' is not defined

And another

In [66]: 'foo' + 6

---------------------------------------------------------------------------

TypeError Traceback (most recent call last)

<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6

TypeError: can only concatenate str (not "int") to str

And another

In [67]: X = []
x = X[0]

---------------------------------------------------------------------------

IndexError Traceback (most recent call last)

<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]

IndexError: list index out of range



On each occasion, the interpreter informs us of the error type

• NameError, TypeError, IndexError, ZeroDivisionError, etc.

In Python, these errors are called exceptions


Catching Exceptions
We can catch and deal with exceptions using try – except blocks
Here's a simple example

In [68]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero. Returned None')
        return None

When we call f we get the following output

In [69]: f(2)

Out[69]: 0.5

In [70]: f(0)

Error: division by zero. Returned None

In [71]: f(0.0)

Error: division by zero. Returned None

The error is caught and execution of the program is not terminated


Note that other error types are not caught
If we are worried the user might pass in a string, we can catch that error too

In [72]: def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: Division by zero. Returned None')
    except TypeError:
        print('Error: Unsupported operation. Returned None')
    return None

Here's what happens

In [73]: f(2)

Out[73]: 0.5

In [74]: f(0)

Error: Division by zero. Returned None

In [75]: f('foo')

Error: Unsupported operation. Returned None

If we feel lazy we can catch these errors together

In [76]: def f(x):
    try:
        return 1.0 / x
    except (TypeError, ZeroDivisionError):
        print('Error: Unsupported operation. Returned None')
        return None

Here's what happens

In [77]: f(2)

Out[77]: 0.5

In [78]: f(0)

Error: Unsupported operation. Returned None

In [79]: f('foo')

Error: Unsupported operation. Returned None

If we feel extra lazy we can catch all error types as follows

In [80]: def f(x):
    try:
        return 1.0 / x
    except:
        print('Error. Returned None')
        return None

In general it's better to be specific
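If needed, the exception object itself can be captured with the as keyword, which keeps a handler generic while still reporting what went wrong
A sketch

```python
def f(x):
    try:
        return 1.0 / x
    except Exception as e:      # bind the exception object to the name e
        print(f'Error ({type(e).__name__}). Returned None')
        return None

print(f(2))       # 0.5
print(f(0))       # prints the error message, returns None
print(f('foo'))
```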

14.6 Decorators and Descriptors

Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other people's code
Hence you need to understand them at some stage of your Python education

14.6.1 Decorators

Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popular
It's very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
Suppose we are working on a program that looks something like this

In [81]: import numpy as np

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Now suppose there's a problem: occasionally negative numbers get fed to f and g in the calculations that follow
If you try it, you'll see that when these functions are called with negative numbers they return a NumPy object called nan
This stands for "not a number" (and indicates that you are trying to evaluate a mathematical function at a point where it is not defined)
Perhaps this isn't what we want, because it causes other problems that are hard to pick up later on
Suppose that instead we want the program to terminate whenever this happens, with a sensible error message
This change is easy enough to implement

In [82]: import numpy as np

def f(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.log(np.log(x))

def g(x):
    assert x >= 0, "Argument must be nonnegative"
    return np.sqrt(42 * x)

# Program continues with various calculations using f and g

Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try hard to avoid
Here it's not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times

The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater

In [83]: import numpy as np

def check_nonneg(func):
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g

This looks complicated so let's work through it slowly

To unravel the logic, consider what happens when we say f = check_nonneg(f)
This calls the function check_nonneg with parameter func set equal to f
Now check_nonneg creates a new function called safe_function that verifies x as nonnegative and then calls func on it (which is the same as f)
Finally, the global name f is then set equal to safe_function
Now the behavior of f is as we desire, and the same is true of g
At the same time, the test logic is written only once
Enter Decorators
The last version of our code is still not ideal
For example, if someone is reading our code and wants to know how f works, they will be
looking for the function definition, which is

In [84]: def f(x):
    return np.log(np.log(x))

They may well miss the line f = check_nonneg(f)


For this and other reasons, decorators were introduced to Python
With decorators, we can replace the lines

In [85]: def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with

In [86]: @check_nonneg
def f(x):
    return np.log(np.log(x))

@check_nonneg
def g(x):
    return np.sqrt(42 * x)

These two pieces of code do exactly the same thing


If they do the same thing, do we really need decorator syntax?
Well, notice that the decorators sit right on top of the function definitions
Hence anyone looking at the definition of the function will see them and be aware that the
function is modified
In the opinion of many people, this makes the decorator syntax a significant improvement to
the language
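One wrinkle: the wrapper returned by check_nonneg replaces the original function's name and docstring
The standard library's functools.wraps fixes this, as in the following sketch

```python
import functools

def check_nonneg(func):
    @functools.wraps(func)      # copy __name__, __doc__ etc. onto the wrapper
    def safe_function(x):
        assert x >= 0, "Argument must be nonnegative"
        return func(x)
    return safe_function

@check_nonneg
def f(x):
    "Return the square root of x"
    return x ** 0.5

print(f.__name__)   # 'f', not 'safe_function'
print(f.__doc__)    # the original docstring survives
```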

14.6.2 Descriptors

Descriptors solve a common problem regarding management of variables


To understand the issue, consider a Car class, that simulates a car
Suppose that this class defines the variables miles and kms, which give the distance traveled
in miles and kilometers respectively
A highly simplified version of the class might look as follows

In [87]: class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted

One potential problem we might have here is that a user alters one of these variables but not
the other

In [88]: car = Car()
car.miles

Out[88]: 1000

In [89]: car.kms

Out[89]: 1610.0

In [90]: car.miles = 6000
car.kms

Out[90]: 1610.0

In the last two lines we see that miles and kms are out of sync

What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated
A Solution
In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class

In [91]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)

First let's check that we get the desired behavior

In [92]: car = Car()
car.miles

Out[92]: 1000

In [93]: car.miles = 6000
car.kms

Out[93]: 9660.0

Yep, that's what we want — car.kms is automatically updated


How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the
variables
The objects miles and kms are properties, a common kind of descriptor
The methods get_miles, set_miles, get_kms and set_kms define what happens when
you get (i.e. access) or set (bind) these variables

• So-called "getter" and "setter" methods



The builtin Python function property takes getter and setter methods and creates a property
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is triggered — in this case set_miles
Decorators and Properties
These days it's very common to see the property function used via a decorator
Hereโ€™s another version of our Car class that works as before but now uses decorators to set
up the properties

In [94]: class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61

We won't go through all the details here


For further information you can refer to the descriptor documentation
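Properties are one ready-made descriptor; the underlying protocol is just the methods __get__ and __set__ (plus __set_name__ on Python 3.6+)
Here's a sketch of a custom descriptor written from scratch; the class Positive is our own illustration, not part of the lecture

```python
class Positive:
    "An illustrative descriptor that rejects non-positive values"

    def __set_name__(self, owner, name):    # called when the owning class is built
        self.storage = '_' + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.storage)

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f'{self.storage[1:]} must be positive')
        setattr(obj, self.storage, value)

class Car:
    miles = Positive()                      # miles is now managed by the descriptor

    def __init__(self, miles=1000):
        self.miles = miles                  # triggers Positive.__set__

car = Car()
print(car.miles)   # 1000
```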

14.7 Generators

A generator is a kind of iterator (i.e., it works with a next function)


We will study two ways to build generators: generator expressions and generator functions

14.7.1 Generator Expressions

The easiest way to build generators is using generator expressions


Just like a list comprehension, but with round brackets
Here is the list comprehension:

In [95]: singular = ('dog', 'cat', 'bird')
type(singular)

Out[95]: tuple

In [96]: plural = [string + 's' for string in singular]
plural

Out[96]: ['dogs', 'cats', 'birds']

In [97]: type(plural)

Out[97]: list

And here is the generator expression

In [98]: singular = ('dog', 'cat', 'bird')
plural = (string + 's' for string in singular)
type(plural)

Out[98]: generator

In [99]: next(plural)

Out[99]: 'dogs'

In [100]: next(plural)

Out[100]: 'cats'

In [101]: next(plural)

Out[101]: 'birds'

Since sum() can be called on iterators, we can do this

In [102]: sum((x * x for x in range(10)))

Out[102]: 285

The function sum() calls next() to get the items and adds successive terms
In fact, we can omit the outer brackets in this case

In [103]: sum(x * x for x in range(10))

Out[103]: 285

14.7.2 Generator Functions

The most flexible way to create generator objects is to use generator functions
Let's look at some examples
Example 1
Here's a very simple example of a generator function

In [104]: def f():
    yield 'start'
    yield 'middle'
    yield 'end'

It looks like a function, but uses a keyword yield that we haven't met before
Let's see how it works after running this code

In [105]: type(f)

Out[105]: function

In [106]: gen = f()
gen

Out[106]: <generator object f at 0x7f4f6c1bb1b0>

In [107]: next(gen)

Out[107]: 'start'

In [108]: next(gen)

Out[108]: 'middle'

In [109]: next(gen)

Out[109]: 'end'

In [110]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen)

• Executes code in the body of f() until it meets a yield statement
• Returns that value to the caller of next(gen)

The second call to next(gen) starts executing from the next line

In [111]: def f():
    yield 'start'
    yield 'middle'  # This line!
    yield 'end'

and continues until the next yield statement


At that point it returns the value following yield to the caller of next(gen), and so on
When the code block ends, the generator throws a StopIteration error
Example 2
Our next example receives an argument x from the caller

In [112]: def g(x):
    while x < 100:
        yield x
        x = x * x

Let's see how it works

In [113]: g

Out[113]: <function __main__.g(x)>

In [114]: gen = g(2)
type(gen)

Out[114]: generator

In [115]: next(gen)

Out[115]: 2

In [116]: next(gen)

Out[116]: 4

In [117]: next(gen)

Out[117]: 16

In [118]: next(gen)

---------------------------------------------------------------------------

StopIteration Traceback (most recent call last)

<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)

StopIteration:

The call gen = g(2) binds gen to a generator


Inside the generator, the name x is bound to 2
When we call next(gen)

โ€ข The body of g() executes until the line yield x, and the value of x is returned

Note that the value of x is retained inside the generator


When we call next(gen) again, execution continues from where it left off

In [119]: def g(x):
              while x < 100:
                  yield x
                  x = x * x # execution continues from here

When x < 100 fails, the generator throws a StopIteration error


Incidentally, the loop inside the generator can be infinite

In [120]: def g(x):
              while 1:
                  yield x
                  x = x * x
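One standard way to consume finitely many values from such an infinite generator is itertools.islice, which lazily takes a prefix of the stream. A minimal sketch:

```python
from itertools import islice

def g(x):
    # An infinite generator: yields x, x**2, x**4, ...
    while 1:
        yield x
        x = x * x

# islice pulls only the first four values, so the loop never runs away
print(list(islice(g(2), 4)))  # [2, 4, 16, 256]
```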

14.7.3 Advantages of Iterators

Whatโ€™s the advantage of using an iterator here?


Suppose we want to sample from a binomial(n, 0.5) distribution, i.e., count the number of heads in n flips of a fair coin
One way to do it is as follows

In [121]: import random
          n = 10000000
          draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
          sum(draws)

Out[121]: 5001162

But we are creating a huge list here: draws holds all n Boolean values at once
(In Python 3, range(n) is itself lazy, so it is the list comprehension that eats the memory)
This uses lots of memory and is very slow
If we make n even bigger then this happens

In [122]: n = 100000000
          draws = [random.uniform(0, 1) < 0.5 for i in range(n)]

We can avoid these problems using iterators


Here is the generator function

In [123]: def f(n):
              i = 1
              while i <= n:
                  yield random.uniform(0, 1) < 0.5
                  i += 1

Now letโ€™s do the sum

In [124]: n = 10000000
          draws = f(n)
          draws

Out[124]: <generator object f at 0x7f4f4fdfbb88>

In [125]: sum(draws)

Out[125]: 5000216

In summary, iterables

โ€ข avoid the need to create big lists/tuples, and


โ€ข provide a uniform interface to iteration that can be used transparently in for loops
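Generator expressions give the same laziness with even less code. Here is a sketch of the binomial draw above written that way, with a smaller n and a fixed seed so repeated runs give the same count (both choices are ours, not the lecture's):

```python
import random

random.seed(1234)   # fixed seed, purely so the result is reproducible
n = 1_000_000

# A generator expression: no list of draws is ever materialized;
# sum() consumes the Booleans one at a time
count = sum(random.uniform(0, 1) < 0.5 for i in range(n))
print(count)
```

Memory use stays constant in n, just as with the handwritten generator function.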

14.8 Recursive Function Calls

This is not something that you will use every day, but it is still useful โ€” you should learn it
at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing $x_t$ for some $t$ when

$$x_{t+1} = 2 x_t, \qquad x_0 = 1 \tag{1}$$

Obviously the answer is $2^t$


We can compute this easily enough with a loop

In [126]: def x_loop(t):
              x = 1
              for i in range(t):
                  x = 2 * x
              return x

We can also use a recursive solution, as follows

In [127]: def x(t):
              if t == 0:
                  return 1
              else:
                  return 2 * x(t-1)

What happens here is that each successive call uses its own frame in the stack

โ€ข a frame is where the local variables of a given function call are held
โ€ข the stack is the memory used to process function calls
โ€“ a Last In First Out (LIFO) structure

This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution
Weโ€™ll meet less contrived applications of recursion later on

14.9 Exercises

14.9.1 Exercise 1

The Fibonacci numbers are defined by

$$x_{t+1} = x_t + x_{t-1}, \qquad x_0 = 0, \; x_1 = 1 \tag{2}$$

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the $t$-th Fibonacci number for any $t$

14.9.2 Exercise 2

Complete the following code, and test it using this csv file, which we assume that youโ€™ve put
in your current working directory

def column_iterator(target_file, column_number):
    """A generator function for CSV files.

    When called with a file name target_file (string) and column number
    column_number (integer), the generator function returns a generator
    that steps through the elements of column column_number in file
    target_file.
    """
    # put your code here

dates = column_iterator('test_table.csv', 1)

for date in dates:
    print(date)

14.9.3 Exercise 3

Suppose we have a text file numbers.txt containing the following lines

prices
3
8

7
21

Using try โ€“ except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers

14.10 Solutions

14.10.1 Exercise 1

Hereโ€™s the standard solution

In [128]: def x(t):
              if t == 0:
                  return 0
              if t == 1:
                  return 1
              else:
                  return x(t-1) + x(t-2)

Letโ€™s test it

In [129]: print([x(i) for i in range(10)])

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
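The doubly-recursive solution above recomputes the same subproblems exponentially many times. One standard remedy (a sketch, not part of the lecture's solution) is to memoize with functools.lru_cache:

```python
from functools import lru_cache

@lru_cache(maxsize=None)    # cache every value ever computed
def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    return x(t-1) + x(t-2)

print([x(i) for i in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print(x(100))                      # fast, despite the double recursion
```

Each x(t) is now computed at most once, so the running time is linear in t.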

14.10.2 Exercise 2

One solution is as follows

In [130]: def column_iterator(target_file, column_number):
              """A generator function for CSV files.

              When called with a file name target_file (string) and column number
              column_number (integer), the generator function returns a generator
              which steps through the elements of column column_number in file
              target_file.
              """
              f = open(target_file, 'r')
              for line in f:
                  yield line.split(',')[column_number - 1]
              f.close()

          dates = column_iterator('test_table.csv', 1)

          i = 1
          for date in dates:
              print(date)
              if i == 10:
                  break
              i += 1

Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
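One caveat about this solution: because the loop above breaks out early, the f.close() line after the yield is never reached. Wrapping the file in a with block is a safer variant (a sketch, with the docstring shortened):

```python
def column_iterator(target_file, column_number):
    """Yield the entries of column column_number in target_file."""
    with open(target_file, 'r') as f:    # the file is closed even if the
        for line in f:                   # caller abandons iteration early
            yield line.split(',')[column_number - 1]
```

When the caller calls gen.close() on the generator, or the generator is garbage-collected, GeneratorExit is raised inside the with block and the file handle is released.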

14.10.3 Exercise 3

Letโ€™s save the data first



In [131]: %%file numbers.txt
          prices
          3
          8

          7
          21

Writing numbers.txt

In [132]: f = open('numbers.txt')
          total = 0.0
          for line in f:
              try:
                  total += float(line)
              except ValueError:
                  pass
          f.close()
          print(total)

39.0
15 Debugging

15.1 Contents

โ€ข Overview 15.2

โ€ข Debugging 15.3

โ€ข Other Useful Magics 15.4

โ€œDebugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.โ€ โ€“ Brian Kernighan

15.2 Overview

Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that
(OK, sometimes we still do thatโ€ฆ)
But once you start writing larger programs youโ€™ll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here weโ€™ll focus on Jupyter and leave you to explore other settings
Weโ€™ll need the following imports

In [1]: import numpy as np
        import matplotlib.pyplot as plt
        %matplotlib inline

15.3 Debugging

15.3.1 The debug Magic

Letโ€™s consider a simple (and rather contrived) example


In [2]: def plot_log():
            fig, ax = plt.subplots(2, 1)
            x = np.linspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-2-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-2-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'

This code is intended to plot the log function over the interval [1, 2]
But thereโ€™s an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method

But letโ€™s pretend that we donโ€™t understand this for the moment
We might suspect thereโ€™s something wrong with ax but when we try to investigate this object, we get the following exception:

In [3]: ax

---------------------------------------------------------------------------

NameError Traceback (most recent call last)

<ipython-input-3-b00e77935981> in <module>
----> 1 ax

NameError: name 'ax' is not defined

The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates
Letโ€™s try doing it a different way
We run the first cell block again, generating the same error

In [4]: def plot_log():
            fig, ax = plt.subplots(2, 1)
            x = np.linspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log() # Call the function, generate plot

---------------------------------------------------------------------------

AttributeError Traceback (most recent call last)

<ipython-input-4-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot

<ipython-input-4-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6

AttributeError: 'numpy.ndarray' object has no attribute 'plot'



But this time we type in the following cell block

%debug

You should be dropped into a new prompt that looks something like this

ipdb>

(You might see pdb> instead)


Now we can investigate the value of our variables at this point in the program, step forward
through the code, etc.
For example, here we simply type the name ax to see whatโ€™s happening with this object:

ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)

Itโ€™s now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help

ipdb> h

Documented commands (type help <topic>):


========================================
EOF bt cont enable jump pdef r tbreak w
a c continue exit l pdoc restart u whatis
alias cl d h list pinfo return unalias where

args clear debug help n pp run unt


b commands disable ignore next q s until
break condition down j p quit step up

Miscellaneous help topics:


==========================
exec pdb

Undocumented commands:
======================
retval rv

ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.

15.3.2 Setting a Break Point

The preceding approach is handy but sometimes insufficient


Consider the following modified version of our function above

In [5]: def plot_log():
            fig, ax = plt.subplots()
            x = np.logspace(1, 2, 10)
            ax.plot(x, np.log(x))
            plt.show()

        plot_log()

Here the original problem is fixed, but weโ€™ve accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)

Now there wonโ€™t be any exception, but the plot wonโ€™t look right
To investigate, it would be helpful if we could inspect variables like x during execution of the
function
To this end, we add a โ€œbreak pointโ€ by inserting breakpoint() inside the function code
block

def plot_log():
    breakpoint()
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()

Now letโ€™s run the script, and investigate via the debugger

> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])

We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q

15.4 Other Useful Magics

In this lecture, we used the %debug IPython magic


There are many other useful magics:

โ€ข %precision 4 sets printed precision for floats to 4 decimal places


โ€ข %whos gives a list of variables and their values
โ€ข %quickref gives a list of magics

The full list of magics is here


Part IV

Data and Empirics

16 Pandas

16.1 Contents

โ€ข Overview 16.2

โ€ข Series 16.3

โ€ข DataFrames 16.4

โ€ข On-Line Data Sources 16.5

โ€ข Exercises 16.6

โ€ข Solutions 16.7

16.2 Overview

Pandas is a package of fast, efficient data analysis tools for Python


Its popularity has surged in recent years, coincident with the rise of fields such as data science
and machine learning
Hereโ€™s a popularity comparison over time against STATA and SAS, courtesy of Stack Overflow Trends


Just as NumPy provides the basic array data type plus core array operations, pandas

1. defines fundamental structures for working with data and


2. endows them with methods that facilitate operations such as

โ€ข reading in data
โ€ข adjusting indices
โ€ข working with dates and time series
โ€ข sorting, grouping, re-ordering and general data munging [1]
โ€ข dealing with missing values, etc., etc.

More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas
This lecture will provide a basic introduction to pandas
Throughout the lecture, we will assume that the following imports have taken place

In [1]: import pandas as pd
        import numpy as np

16.3 Series

Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a โ€œcolumnโ€ of data, such as a collection of observations on a
single variable
A DataFrame is an object for storing related columns of data
Letโ€™s start with Series

In [2]: s = pd.Series(np.random.randn(4), name='daily returns')
        s

Out[2]: 0 0.246617
1 1.616297

2 1.371344
3 -0.854713
Name: daily returns, dtype: float64

Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares
Pandas Series are built on top of NumPy arrays and support many similar operations

In [3]: s * 100

Out[3]: 0 24.661661
1 161.629724
2 137.134394
3 -85.471300
Name: daily returns, dtype: float64

In [4]: np.abs(s)

Out[4]: 0 0.246617
1 1.616297
2 1.371344
3 0.854713
Name: daily returns, dtype: float64

But Series provide more than NumPy arrays


Not only do they have some additional (statistically oriented) methods

In [5]: s.describe()

Out[5]: count 4.000000


mean 0.594886
std 1.135605
min -0.854713
25% -0.028716
50% 0.808980
75% 1.432582
max 1.616297
Name: daily returns, dtype: float64

But their indices are more flexible

In [6]: s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
        s

Out[6]: AMZN 0.246617


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same typeโ€”in this case, floats)
In fact, you can use much of the same syntax as Python dictionaries

In [7]: s['AMZN']

Out[7]: 0.24661661104520952

In [8]: s['AMZN'] = 0
s

Out[8]: AMZN 0.000000


AAPL 1.616297
MSFT 1.371344
GOOG -0.854713
Name: daily returns, dtype: float64

In [9]: 'AAPL' in s

Out[9]: True
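One more difference from NumPy arrays worth knowing: arithmetic between two Series aligns on the index labels rather than on position. A toy sketch (the tickers and numbers are invented):

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0], index=['AMZN', 'AAPL'])
s2 = pd.Series([10.0, 20.0], index=['AAPL', 'GOOG'])

# Addition matches labels: 'AAPL' appears in both Series,
# the other labels produce NaN in the result
combined = s1 + s2
print(combined)
```

This label alignment is what makes combining data from different sources safe: values are never added positionally by accident.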

16.4 DataFrames

While a Series is a single column of data, a DataFrame is several columns, one for each
variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns
Letโ€™s look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here
Hereโ€™s the content of test_pwt.csv

"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","

Supposing you have this data saved as test_pwt.csv in the present working directory (type
%pwd in Jupyter to see what this is), it can be read in as follows:

In [10]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pwt.csv')
         type(df)

Out[10]: pandas.core.frame.DataFrame

In [11]: df

Out[11]: country country isocode year POP XRAT tcgdp \


0 Argentina ARG 2000 37335.653 0.999500 2.950722e+05
1 Australia AUS 2000 19053.186 1.724830 5.418047e+05

2 India IND 2000 1006300.297 44.941600 1.728144e+06


3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03
5 South Africa ZAF 2000 45064.098 6.939830 2.272424e+05
6 United States USA 2000 282171.957 1.000000 9.898700e+06
7 Uruguay URY 2000 3219.793 12.099592 2.525596e+04

cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068

We can select particular rows using standard Python array slicing notation

In [12]: df[2:5]

Out[12]: country country isocode year POP XRAT tcgdp \


2 India IND 2000 1006300.297 44.941600 1.728144e+06
3 Israel ISR 2000 6114.570 4.077330 1.292539e+05
4 Malawi MWI 2000 11801.505 59.543808 5.026222e+03

cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954

To select columns, we can pass a list containing the names of the desired columns represented
as strings

In [13]: df[['country', 'tcgdp']]

Out[13]: country tcgdp


0 Argentina 2.950722e+05
1 Australia 5.418047e+05
2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
5 South Africa 2.272424e+05
6 United States 9.898700e+06
7 Uruguay 2.525596e+04

To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]

In [14]: df.iloc[2:5, 0:4]

Out[14]: country country isocode year POP


2 India IND 2000 1006300.297
3 Israel ISR 2000 6114.570
4 Malawi MWI 2000 11801.505

To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way

In [15]: df.loc[df.index[2:5], ['country', 'tcgdp']]



Out[15]: country tcgdp


2 India 1.728144e+06
3 Israel 1.292539e+05
4 Malawi 5.026222e+03
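Rows can also be selected with a boolean condition: df[mask] keeps the rows where mask is True. A sketch on a toy frame (the labels, values and threshold below are invented):

```python
import pandas as pd

toy = pd.DataFrame({'country': ['A', 'B', 'C'],
                    'POP': [100.0, 5000.0, 250.0]})

# The comparison produces a Boolean Series; indexing with it
# keeps only the rows where the condition holds
large = toy[toy['POP'] > 200]
print(large)
```

This pattern combines naturally with loc, e.g. toy.loc[toy['POP'] > 200, 'country'].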

Letโ€™s imagine that weโ€™re only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above

In [16]: df = df[['country', 'POP', 'tcgdp']]
         df

Out[16]: country POP tcgdp


0 Argentina 37335.653 2.950722e+05
1 Australia 19053.186 5.418047e+05
2 India 1006300.297 1.728144e+06
3 Israel 6114.570 1.292539e+05
4 Malawi 11801.505 5.026222e+03
5 South Africa 45064.098 2.272424e+05
6 United States 282171.957 9.898700e+06
7 Uruguay 3219.793 2.525596e+04

Here the index 0, 1,..., 7 is redundant because we can use the country names as an index
To do this, we set the index to be the country variable in the dataframe

In [17]: df = df.set_index('country')
         df

Out[17]: POP tcgdp


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Letโ€™s give the columns slightly better names

In [18]: df.columns = 'population', 'total GDP'
         df

Out[18]: population total GDP


country
Argentina 37335.653 2.950722e+05
Australia 19053.186 5.418047e+05
India 1006300.297 1.728144e+06
Israel 6114.570 1.292539e+05
Malawi 11801.505 5.026222e+03
South Africa 45064.098 2.272424e+05
United States 282171.957 9.898700e+06
Uruguay 3219.793 2.525596e+04

Population is in thousands; letโ€™s revert to single units

In [19]: df['population'] = df['population'] * 1e3
         df

Out[19]: population total GDP


country
Argentina 3.733565e+07 2.950722e+05
Australia 1.905319e+07 5.418047e+05
India 1.006300e+09 1.728144e+06
Israel 6.114570e+06 1.292539e+05
Malawi 1.180150e+07 5.026222e+03
South Africa 4.506410e+07 2.272424e+05
United States 2.821720e+08 9.898700e+06
Uruguay 3.219793e+06 2.525596e+04

Next, weโ€™re going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions

In [20]: df['GDP percap'] = df['total GDP'] * 1e6 / df['population']
         df

Out[20]: population total GDP GDP percap


country
Argentina 3.733565e+07 2.950722e+05 7903.229085
Australia 1.905319e+07 5.418047e+05 28436.433261
India 1.006300e+09 1.728144e+06 1717.324719
Israel 6.114570e+06 1.292539e+05 21138.672749
Malawi 1.180150e+07 5.026222e+03 425.896679
South Africa 4.506410e+07 2.272424e+05 5042.647686
United States 2.821720e+08 9.898700e+06 35080.381854
Uruguay 3.219793e+06 2.525596e+04 7843.970620

One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita

In [21]: import matplotlib.pyplot as plt
         %matplotlib inline

         df['GDP percap'].plot(kind='bar')
         plt.show()

At the moment the data frame is ordered alphabetically on the countriesโ€”letโ€™s change it to
GDP per capita

In [22]: df = df.sort_values(by='GDP percap', ascending=False)
         df

Out[22]: population total GDP GDP percap


country
United States 2.821720e+08 9.898700e+06 35080.381854
Australia 1.905319e+07 5.418047e+05 28436.433261
Israel 6.114570e+06 1.292539e+05 21138.672749
Argentina 3.733565e+07 2.950722e+05 7903.229085
Uruguay 3.219793e+06 2.525596e+04 7843.970620
South Africa 4.506410e+07 2.272424e+05 5042.647686
India 1.006300e+09 1.728144e+06 1717.324719
Malawi 1.180150e+07 5.026222e+03 425.896679

Plotting as before now yields

In [23]: df['GDP percap'].plot(kind='bar')
         plt.show()

16.5 On-Line Data Sources

Python makes it straightforward to query online databases programmatically


An important database for economists is FRED โ€” a vast collection of time series data maintained by the St. Louis Fed
For example, suppose that we are interested in the unemployment rate
Via FRED, the entire series for the US civilian unemployment rate can be downloaded directly by entering this URL into your browser (note that this requires an internet connection)

https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv

(Equivalently, click here: https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv)
This request returns a CSV file, which will be handled by your default application for this
class of files
Alternatively, we can access the CSV file from within a Python program
This can be done with a variety of methods
We start with a relatively low-level method and then return to pandas

16.5.1 Accessing Data with requests

One option is to use requests, a popular third-party Python library for requesting data over the Internet
To begin, try the following code on your computer

In [24]: import requests

         r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

If thereโ€™s no error message, then the call has succeeded


If you do get an error, then there are two likely causes

1. You are not connected to the Internet โ€” hopefully, this isnโ€™t the case
2. Your machine is accessing the Internet through a proxy server, and Python isnโ€™t aware
of this

In the second case, you can either

โ€ข switch to another machine


โ€ข solve your proxy problem by reading the documentation

Assuming that all is working, you can now proceed to use the source object returned by the call requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')

In [25]: url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
         source = requests.get(url).content.decode().split("\n")
         source[0]

Out[25]: 'DATE,VALUE\r'

In [26]: source[1]

Out[26]: '1948-01-01,3.4\r'

In [27]: source[2]

Out[27]: '1948-02-01,3.8\r'

We could now write some additional code to parse this text and store it as an array
But this is unnecessary โ€” pandasโ€™ read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering

In [28]: data = pd.read_csv(url, index_col=0, parse_dates=True)

The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
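Incidentally, read_csv doesn't need a URL or a file on disk: it can parse text already sitting in memory via io.StringIO. A small sketch with made-up rows in the same DATE,VALUE layout as the FRED file:

```python
import io
import pandas as pd

# Two invented rows in the DATE,VALUE layout
text = "DATE,VALUE\n1948-01-01,3.4\n1948-02-01,3.8\n"

# io.StringIO wraps the string in a file-like object that read_csv accepts
frame = pd.read_csv(io.StringIO(text), index_col=0, parse_dates=True)
print(frame)
```

This is handy when, as above, the raw text has already been downloaded with requests.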

In [29]: type(data)

Out[29]: pandas.core.frame.DataFrame

In [30]: data.head() # A useful method to get a quick look at a data frame

Out[30]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5

In [31]: pd.set_option('precision', 1)
         data.describe() # Your output might differ slightly

Out[31]: VALUE
count 857.0
mean 5.8
std 1.6
min 2.5
25% 4.6
50% 5.6
75% 6.8
max 10.8

We can also plot the unemployment rate from 2006 to 2012 as follows

In [32]: data['2006':'2012'].plot()
         plt.show()

16.5.2 Accessing World Bank Data

Letโ€™s look at one more example of downloading and manipulating data โ€” this time from the
World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, hereโ€™s some data on government debt as a ratio to GDP
If you click on โ€œDOWNLOAD DATAโ€ you will be given the option to download the data as
an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots
time series for the US and Australia

In [33]: import matplotlib.pyplot as plt
         import requests
         import pandas as pd

         # == Get data and read into file gd.xls == #
         wb_data_query = "http://api.worldbank.org/v2/en/indicator/gc.dod.totl.gd.zs?downloadformat=excel"
         r = requests.get(wb_data_query)
         with open('gd.xls', 'wb') as output:
             output.write(r.content)

         # == Parse data into a DataFrame == #
         govt_debt = pd.read_excel('gd.xls', sheet_name='Data', skiprows=3, index_col=1)

         # == Take desired values and plot == #
         govt_debt = govt_debt.transpose()
         govt_debt = govt_debt[['AUS', 'USA']]
         govt_debt = govt_debt[38:]
         govt_debt.plot(lw=2)
         plt.show()

(The file is pandas/wb_download.py, and can be downloaded here)



16.6 Exercises

16.6.1 Exercise 1

Write a program to calculate the percentage price change over 2013 for the following shares

In [34]: ticker_list = {'INTC': 'Intel',
                        'MSFT': 'Microsoft',
                        'IBM': 'IBM',
                        'BHP': 'BHP',
                        'TM': 'Toyota',
                        'AAPL': 'Apple',
                        'AMZN': 'Amazon',
                        'BA': 'Boeing',
                        'QCOM': 'Qualcomm',
                        'KO': 'Coca-Cola',
                        'GOOG': 'Google',
                        'SNE': 'Sony',
                        'PTR': 'PetroChina'}

A dataset of daily closing prices for the above firms can be found in pandas/data/ticker_data.csv and can be downloaded here
Plot the result as a bar graph like the one that follows

16.7 Solutions

16.7.1 Exercise 1
In [35]: ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/ticker_data.csv')
         ticker.set_index('Date', inplace=True)

         ticker_list = {'INTC': 'Intel',
                        'MSFT': 'Microsoft',
                        'IBM': 'IBM',
                        'BHP': 'BHP',
                        'TM': 'Toyota',
                        'AAPL': 'Apple',
                        'AMZN': 'Amazon',
                        'BA': 'Boeing',
                        'QCOM': 'Qualcomm',
                        'KO': 'Coca-Cola',
                        'GOOG': 'Google',
                        'SNE': 'Sony',
                        'PTR': 'PetroChina'}

         price_change = pd.Series()

         for tick in ticker_list:
             change = 100 * (ticker.loc[ticker.index[-1], tick] - ticker.loc[ticker.index[0], tick]) / ticker.loc[ticker.index[0], tick]
             name = ticker_list[tick]
             price_change[name] = change

         price_change.sort_values(inplace=True)
         fig, ax = plt.subplots(figsize=(10,8))
         price_change.plot(kind='bar', ax=ax)
         plt.show()
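The per-ticker loop can also be written in one vectorized line, since iloc[0] and iloc[-1] select whole rows and pandas then operates on every column at once. A sketch on a toy two-row price table (the tickers and numbers are invented):

```python
import pandas as pd

# Two rows standing in for the first and last trading day of the year
prices = pd.DataFrame({'INTC': [20.0, 25.0],
                       'MSFT': [30.0, 33.0]})

# Percentage change from the first row to the last, computed per column
price_change = 100 * (prices.iloc[-1] - prices.iloc[0]) / prices.iloc[0]
print(price_change)
```

On the real ticker DataFrame the same expression would replace the loop in the solution above.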

Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
17 Pandas for Panel Data

17.1 Contents

โ€ข Overview 17.2

โ€ข Slicing and Reshaping Data 17.3

โ€ข Merging Dataframes and Filling NaNs 17.4

โ€ข Grouping and Summarizing Data 17.5

โ€ข Final Remarks 17.6

โ€ข Exercises 17.7

โ€ข Solutions 17.8

17.2 Overview

In an earlier lecture on pandas, we looked at working with simple data sets


Econometricians often need to work with more complex data sets, such as panels
Common tasks include

โ€ข Importing data, cleaning it and reshaping it across several axes


โ€ข Selecting a time series or cross-section from a panel
โ€ข Grouping and summarizing data

pandas (derived from โ€˜panelโ€™ and โ€˜dataโ€™) contains powerful and easy-to-use tools for solving exactly these kinds of problems
In what follows, we will use a panel data set of real minimum wages from the OECD to create:

โ€ข summary statistics over multiple dimensions of our data


โ€ข a time series of the average minimum wage of countries in the dataset
โ€ข kernel density estimates of wages by continent


We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex
Additional detail will be added to our DataFrame using pandasโ€™ merge function, and data
will be summarized with the groupby function
Most of this lecture was created by Natasha Watkins

17.3 Slicing and Reshaping Data

We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory

In [1]: import pandas as pd

        # Display 6 columns for viewing purposes
        pd.set_option('display.max_columns', 6)

        # Reduce decimal points to 2
        pd.options.display.float_format = '{:,.2f}'.format

        realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/realwage.csv')

Letโ€™s have a look at what weโ€™ve got to work with

In [2]: realwage.head() # Show first 5 rows

Out[2]: Unnamed: 0 Time Country Series \


0 0 2006-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
1 1 2007-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
2 2 2008-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
3 3 2009-01-01 Ireland In 2015 constant prices at 2015 USD PPPs
4 4 2010-01-01 Ireland In 2015 constant prices at 2015 USD PPPs

Pay period value


0 Annual 17,132.44
1 Annual 18,100.92
2 Annual 17,747.41
3 Annual 18,580.14
4 Annual 18,755.83

The data is currently in long format, which is difficult to analyze when there are several dimensions to the data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis

In [3]: realwage = realwage.pivot_table(values='value',
                                        index='Time',
                                        columns=['Country', 'Series', 'Pay period'])
        realwage.head()

Out[3]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Time
2006-01-01 20,410.65 10.33
2007-01-01 21,087.57 10.67
2008-01-01 20,718.24 10.48
2009-01-01 20,984.77 10.62
2010-01-01 20,879.33 10.57

Country โ€ฆ \
Series In 2015 constant prices at 2015 USD exchange rates โ€ฆ
Pay period Annual โ€ฆ
Time โ€ฆ
2006-01-01 23,826.64 โ€ฆ
2007-01-01 24,616.84 โ€ฆ
2008-01-01 24,185.70 โ€ฆ
2009-01-01 24,496.84 โ€ฆ
2010-01-01 24,373.76 โ€ฆ

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Hourly
Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

[5 rows x 128 columns]

To more easily filter our time series data later on, we will convert the index into a DatetimeIndex

In [4]: realwage.index = pd.to_datetime(realwage.index)


type(realwage.index)

Out[4]: pandas.core.indexes.datetimes.DatetimeIndex

The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas

In [5]: type(realwage.columns)

Out[5]: pandas.core.indexes.multi.MultiIndex

In [6]: realwage.columns.names

Out[6]: FrozenList(['Country', 'Series', 'Pay period'])
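For intuition, a hierarchical column index like this can be built by hand; here is a toy sketch with invented labels (not the wage data):

```python
import pandas as pd

# Three hierarchically ordered levels, as in Country > Series > Pay period
cols = pd.MultiIndex.from_tuples(
    [('A', 'PPP', 'Annual'), ('A', 'PPP', 'Hourly'), ('B', 'PPP', 'Annual')],
    names=['Country', 'Series', 'Pay period'])
toy = pd.DataFrame([[1.0, 2.0, 3.0]], index=['2006'], columns=cols)

# Selecting the top level returns all columns under that country
print(toy['A'])
```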

Like before, we can select the country (the top level of our MultiIndex)

In [7]: realwage['United States'].head()

Out[7]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88

Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to reshape our dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works in the opposite direction - try it out)
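The rotation can be sketched on a toy two-level frame (labels made up) to see that .stack() and .unstack() are inverses:

```python
import pandas as pd

# Toy frame with a two-level column MultiIndex
cols = pd.MultiIndex.from_product([['A', 'B'], ['Annual', 'Hourly']],
                                  names=['Country', 'Pay'])
df = pd.DataFrame([[1.0, 2.0, 3.0, 4.0]], index=['2006'], columns=cols)

# .stack() moves the lowest column level ('Pay') into the row index
stacked = df.stack()

# .unstack() moves it back, recovering the original shape
roundtrip = stacked.unstack()
print(stacked)
```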

In [8]: realwage.stack().head()

Out[8]: Country Australia \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 20,410.65
Hourly 10.33
2007-01-01 Annual 21,087.57
Hourly 10.67
2008-01-01 Annual 20,718.24

Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70

Country Belgium โ€ฆ \
Series In 2015 constant prices at 2015 USD PPPs โ€ฆ
Time Pay period โ€ฆ
2006-01-01 Annual 21,042.28 โ€ฆ
Hourly 10.09 โ€ฆ
2007-01-01 Annual 21,310.05 โ€ฆ
Hourly 10.22 โ€ฆ
2008-01-01 Annual 21,416.96 โ€ฆ

Country United Kingdom \


Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 20,376.32
Hourly 9.81
2007-01-01 Annual 20,954.13
Hourly 10.07
2008-01-01 Annual 20,902.87

Country United States \


Series In 2015 constant prices at 2015 USD PPPs
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05

2007-01-01 Annual 12,974.40


Hourly 6.24
2008-01-01 Annual 14,097.56

Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56

[5 rows x 64 columns]

We can also pass in an argument to select the level we would like to stack

In [9]: realwage.stack(level='Country').head()

Out[9]: Series In 2015 constant prices at 2015 USD PPPs \


Pay period Annual Hourly
Time Country
2006-01-01 Australia 20,410.65 10.33
Belgium 21,042.28 10.09
Brazil 3,310.51 1.41
Canada 13,649.69 6.56
Chile 5,201.65 2.22

Series In 2015 constant prices at 2015 USD exchange rates


Pay period Annual Hourly
Time Country
2006-01-01 Australia 23,826.64 12.06
Belgium 20,228.74 9.70
Brazil 2,032.87 0.87
Canada 14,335.12 6.89
Chile 3,333.76 1.42

Using a DatetimeIndex makes it easy to select a particular time period


Selecting one year and stacking the two lower levels of the MultiIndex creates a cross-section of our panel data

In [10]: realwage['2015'].stack(level=(1, 2)).transpose().head()

Out[10]: Time 2015-01-01 \


Series In 2015 constant prices at 2015 USD PPPs
Pay period Annual Hourly
Country
Australia 21,715.53 10.99
Belgium 21,588.12 10.35
Brazil 4,628.63 2.00
Canada 16,536.83 7.95
Chile 6,633.56 2.80

Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81

For the rest of the lecture, we will work with a dataframe of the hourly real minimum wages across countries and time, measured in 2015 US dollars

To create our filtered dataframe (realwage_f), we can use the xs method to select values at lower levels in the MultiIndex, while keeping the higher levels (countries in this case)

In [11]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
level=('Pay period', 'Series'), axis=1)
realwage_f.head()

Out[11]: Country Australia Belgium Brazil โ€ฆ Turkey United Kingdom \


Time โ€ฆ
2006-01-01 12.06 9.70 0.87 โ€ฆ 2.27 9.81
2007-01-01 12.46 9.82 0.92 โ€ฆ 2.26 10.07
2008-01-01 12.24 9.87 0.96 โ€ฆ 2.22 10.04
2009-01-01 12.40 10.21 1.03 โ€ฆ 2.28 10.15
2010-01-01 12.34 10.05 1.08 โ€ฆ 2.30 9.96

Country United States


Time
2006-01-01 6.05
2007-01-01 6.24
2008-01-01 6.78
2009-01-01 7.58
2010-01-01 7.88

[5 rows x 32 columns]

17.4 Merging Dataframes and Filling NaNs

Similar to relational databases like SQL, pandas has built-in methods to merge datasets together
Using country information from WorldData.info, weโ€™ll add the continent of each country to
realwage_f with the merge function
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here

In [12]: worlddata = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel


worlddata.head()

Out[12]: Country (en) Country (de) Country (local) … Deathrate \


0 Afghanistan Afghanistan Afganistan/Afqanestan … 13.70
1 Egypt Ägypten Misr … 4.70
2 Åland Islands Ålandinseln Åland … 0.00
3 Albania Albanien Shqipëria … 6.70
4 Algeria Algerien Al-Jaza'ir/Algérie … 4.30

Life expectancy Url


0 51.30 https://www.laenderdaten.info/Asien/Afghanista…
1 72.70 https://www.laenderdaten.info/Afrika/Aegypten/…
2 0.00 https://www.laenderdaten.info/Europa/Aland/ind…
3 78.30 https://www.laenderdaten.info/Europa/Albanien/…
4 76.80 https://www.laenderdaten.info/Afrika/Algerien/…

[5 rows x 17 columns]

First, we'll select just the country and continent variables from worlddata and rename the column to 'Country'

In [13]: worlddata = worlddata[['Country (en)', 'Continent']]


worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()

Out[13]: Country Continent


0 Afghanistan Asia
1 Egypt Africa
2 ร…land Islands Europe
3 Albania Europe
4 Algeria Africa

We want to merge our new dataframe, worlddata, with realwage_f


The pandas merge function allows dataframes to be joined together by rows
Our dataframes will be merged using country names, requiring us to use the transpose of realwage_f so that rows correspond to country names in both dataframes

In [14]: realwage_f.transpose().head()

Out[14]: Time 2006-01-01 2007-01-01 2008-01-01 โ€ฆ 2014-01-01 2015-01-01 \


Country โ€ฆ
Australia 12.06 12.46 12.24 โ€ฆ 12.67 12.83
Belgium 9.70 9.82 9.87 โ€ฆ 10.01 9.95
Brazil 0.87 0.92 0.96 โ€ฆ 1.21 1.21
Canada 6.89 6.96 7.24 โ€ฆ 8.22 8.35
Chile 1.42 1.45 1.44 โ€ฆ 1.76 1.81

Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91

[5 rows x 11 columns]

We can use either left, right, inner, or outer join to merge our datasets:

• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or right datasets
• inner join includes only countries common to both the left and right datasets

By default, merge will use an inner join
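The four join types can be compared on two tiny invented frames (the country names here are placeholders, not the lecture's data):

```python
import pandas as pd

left = pd.DataFrame({'Country': ['A', 'B'], 'wage': [10.0, 12.0]})
right = pd.DataFrame({'Country': ['B', 'C'], 'Continent': ['Europe', 'Asia']})

# inner: only countries present in both frames
inner = pd.merge(left, right, how='inner', on='Country')

# left: all countries from `left`; unmatched rows get NaN in `Continent`
left_join = pd.merge(left, right, how='left', on='Country')

# outer: countries from either frame
outer = pd.merge(left, right, how='outer', on='Country')
print(left_join)
```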


Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do not have a corresponding data entry in realwage_f
This is illustrated by the red shading in the following diagram

We will also need to specify where the country name is located in each dataframe, which will be the key that is used to merge the dataframes 'on'
Our 'left' dataframe (realwage_f.transpose()) contains countries in the index, so we set left_index=True
Our 'right' dataframe (worlddata) contains countries in the 'Country' column, so we set right_on='Country'

In [15]: merged = pd.merge(realwage_f.transpose(), worlddata,


how='left', left_index=True, right_on='Country')
merged.head()

Out[15]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


17 12.06 12.46 12.24 โ€ฆ
23 9.70 9.82 9.87 โ€ฆ
32 0.87 0.92 0.96 โ€ฆ
100 6.89 6.96 7.24 โ€ฆ
38 1.42 1.45 1.44 โ€ฆ

2016-01-01 00:00:00 Country Continent


17 12.98 Australia Australia
23 9.76 Belgium Europe
32 1.24 Brazil South America
100 8.48 Canada North America
38 1.91 Chile South America

[5 rows x 13 columns]

Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent column
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe

In [16]: merged[merged['Continent'].isnull()]

Out[16]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


247 3.42 3.74 3.87 โ€ฆ
247 0.23 0.45 0.39 โ€ฆ
247 1.50 1.64 1.71 โ€ฆ

2016-01-01 00:00:00 Country Continent


247 5.28 Korea NaN
247 0.55 Russian Federation NaN
247 2.08 Slovak Republic NaN

[3 rows x 13 columns]

We have three missing values!


One option to deal with NaN values is to create a dictionary containing these countries and
their respective continents
.map() will match countries in merged['Country'] with their continent from the dictionary
Notice how countries not in our dictionary are mapped to NaN

In [17]: missing_continents = {'Korea': 'Asia',


'Russian Federation': 'Europe',
'Slovak Republic': 'Europe'}

merged['Country'].map(missing_continents)

Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object

We don't want to overwrite the entire series with this mapping


.fillna() only fills in NaN values in merged['Continent'] with the mapping, while
leaving other values in the column unchanged

In [18]: merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))

# Check for whether continents were correctly mapped

merged[merged['Country'] == 'Korea']

Out[18]: 2006-01-01 00:00:00 2007-01-01 00:00:00 2008-01-01 00:00:00 โ€ฆ \


247 3.42 3.74 3.87 โ€ฆ

2016-01-01 00:00:00 Country Continent


247 5.28 Korea Asia

[1 rows x 13 columns]

We will also combine the Americas into a single continent - this will make our visualization
nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want
to replace

In [19]: replace = ['Central America', 'North America', 'South America']

for country in replace:


merged['Continent'].replace(to_replace=country,
value='America',
inplace=True)

Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex
We should also sort the index using .sort_index() so that we can efficiently filter our dataframe later on
By default, levels will be sorted top-down

In [20]: merged = merged.set_index(['Continent', 'Country']).sort_index()


merged.head()

Out[20]: 2006-01-01 2007-01-01 2008-01-01 โ€ฆ 2014-01-01 \


Continent Country โ€ฆ
America Brazil 0.87 0.92 0.96 โ€ฆ 1.21
Canada 6.89 6.96 7.24 โ€ฆ 8.22
Chile 1.42 1.45 1.44 โ€ฆ 1.76
Colombia 1.01 1.02 1.01 โ€ฆ 1.13
Costa Rica nan nan nan โ€ฆ 2.41

2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63

[5 rows x 11 columns]

While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime format

In [21]: merged.columns

Out[21]: Index([2006-01-01 00:00:00, 2007-01-01 00:00:00, 2008-01-01 00:00:00,


2009-01-01 00:00:00, 2010-01-01 00:00:00, 2011-01-01 00:00:00,
2012-01-01 00:00:00, 2013-01-01 00:00:00, 2014-01-01 00:00:00,
2015-01-01 00:00:00, 2016-01-01 00:00:00],
dtype='object')

Now that we have set the merged columns as the index, we can recreate a DatetimeIndex
using .to_datetime()

In [22]: merged.columns = pd.to_datetime(merged.columns)


merged.columns = merged.columns.rename('Time')
merged.columns

Out[22]: DatetimeIndex(['2006-01-01', '2007-01-01', '2008-01-01', '2009-01-01',


'2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01',
'2014-01-01', '2015-01-01', '2016-01-01'],
dtype='datetime64[ns]', name='Time', freq=None)

The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged

In [23]: merged = merged.transpose()


merged.head()

Out[23]: Continent America โ€ฆ Europe


Country Brazil Canada Chile โ€ฆ Slovenia Spain United Kingdom
Time โ€ฆ
2006-01-01 0.87 6.89 1.42 โ€ฆ 3.92 3.99 9.81
2007-01-01 0.92 6.96 1.45 โ€ฆ 3.88 4.10 10.07
2008-01-01 0.96 7.24 1.44 โ€ฆ 3.96 4.14 10.04
2009-01-01 1.03 7.67 1.52 โ€ฆ 4.08 4.32 10.15
2010-01-01 1.08 7.94 1.56 โ€ฆ 4.81 4.30 9.96

[5 rows x 32 columns]

17.5 Grouping and Summarizing Data

Grouping and summarizing data can be particularly useful for understanding large panel
datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max()
For example, we can calculate the average real minimum wage for each country over the period 2006 to 2016 (the default is to aggregate over rows)

In [24]: merged.mean().head(10)

Out[24]: Continent Country


America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64

Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set

In [25]: import matplotlib.pyplot as plt


%matplotlib inline
import matplotlib
matplotlib.style.use('seaborn')

merged.mean().sort_values(ascending=False).plot(kind='bar', title="Average real minimum wage 2006 - 2016")

# Set country labels

country_labels = merged.mean().sort_values(ascending=False).index.get_level_values('Country').tolist()
plt.xticks(range(0, len(country_labels)), country_labels)
plt.xlabel('Country')

plt.show()

Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)

In [26]: merged.mean(axis=1).head()

Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64

We can plot this time series as a line graph

In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')

plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We can also specify a level of the MultiIndex (in the column axis) to aggregate over

In [28]: merged.mean(level='Continent', axis=1).head()

Out[28]: Continent America Asia Australia Europe


Time
2006-01-01 2.80 4.29 10.25 4.80
2007-01-01 2.85 4.44 10.73 4.94
2008-01-01 2.99 4.45 10.76 4.99
2009-01-01 3.23 4.53 10.97 5.16
2010-01-01 3.34 4.53 10.95 5.17

We can plot the average minimum wages in each continent as a time series

In [29]: merged.mean(level='Continent', axis=1).plot()


plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

We will drop Australia as a continent for plotting purposes

In [30]: merged = merged.drop('Australia', level='Continent', axis=1)


merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()

.describe() is useful for quickly retrieving a number of common summary statistics

In [31]: merged.stack().describe()

Out[31]: Continent America Asia Europe


count 69.00 44.00 200.00
mean 3.19 4.70 5.15
std 3.02 1.56 3.82
min 0.52 2.22 0.23
25% 1.03 3.37 2.02
50% 1.44 5.48 3.54
75% 6.96 5.95 9.70
max 8.48 6.65 12.39

This is a simplified way to use groupby


Using groupby generally follows a 'split-apply-combine' process:

• split: data is grouped based on one or more keys
• apply: a function is called on each group independently
• combine: the results of the function calls are combined into a new data structure
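The three steps above can be sketched on a toy frame with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'Continent': ['America', 'America', 'Europe'],
                   'wage': [7.0, 1.0, 9.0]})

# split: group rows by continent
grouped = df.groupby('Continent')

# apply + combine: the mean is computed per group and the
# results are gathered into a new Series indexed by group key
means = grouped['wage'].mean()
print(means)
```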

The groupby method achieves the first step of this process, creating a new
DataFrameGroupBy object with data split into groups
Let's split merged by continent again, this time using the groupby function, and name the resulting object grouped

In [32]: grouped = merged.groupby(level='Continent', axis=1)


grouped

Out[32]: <pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f59c27f9da0>

Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using
.size()
In this case, our new data structure is a Series

In [33]: grouped.size()

Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64

Calling .get_group() to return just the countries in a single group, we can create a kernel density estimate of the distribution of real minimum wages in 2015 for each continent
grouped.groups.keys() will return the keys from the groupby object

In [34]: import seaborn as sns

continents = grouped.groups.keys()

for continent in continents:


sns.kdeplot(grouped.get_group(continent)['2015'].unstack(), label=continent, shade=True)

plt.title('Real minimum wages in 2015')


plt.xlabel('US dollars')
plt.show()

17.6 Final Remarks

This lecture has provided an introduction to some of pandas' more advanced features, including multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a python package that
extends pandas to N-dimensional data structures

17.7 Exercises

17.7.1 Exercise 1

In these exercises, you'll work with a dataset of employment rates in Europe by age and sex
from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns

Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex

17.7.2 Exercise 2

Filter the above dataframe to only include employment as a percentage of 'active population'
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries

17.8 Solutions

17.8.1 Exercise 1
In [35]: employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/em
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()

Out[35]: UNIT Percentage of total population โ€ฆ \


AGE From 15 to 24 years โ€ฆ
SEX Females โ€ฆ
INDIC_EM Active population โ€ฆ
GEO Austria Belgium Bulgaria โ€ฆ
DATE โ€ฆ
2007-01-01 56.00 31.60 26.00 โ€ฆ
2008-01-01 56.20 30.80 26.10 โ€ฆ
2009-01-01 56.20 29.90 24.80 โ€ฆ
2010-01-01 54.00 29.80 26.60 โ€ฆ
2011-01-01 54.80 29.80 24.80 โ€ฆ

UNIT Thousand persons \


AGE From 55 to 64 years
SEX Total
INDIC_EM Total employment (resident population concept - LFS)
GEO Switzerland Turkey
DATE
2007-01-01 nan 1,282.00
2008-01-01 nan 1,354.00
2009-01-01 nan 1,449.00
2010-01-01 640.00 1,583.00
2011-01-01 661.00 1,760.00

UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00

[5 rows x 1440 columns]

This is a large dataset so it is useful to explore the levels and variables available

In [36]: employ.columns.names

Out[36]: FrozenList(['UNIT', 'AGE', 'SEX', 'INDIC_EM', 'GEO'])

Variables within levels can be quickly retrieved with a loop

In [37]: for name in employ.columns.names:


print(name, employ.columns.get_level_values(name).unique())

UNIT Index(['Percentage of total population', 'Thousand persons'], dtype='object', name='UNIT')


AGE Index(['From 15 to 24 years', 'From 25 to 54 years', 'From 55 to 64 years'], dtype='object', name='AGE')
SEX Index(['Females', 'Males', 'Total'], dtype='object', name='SEX')
INDIC_EM Index(['Active population', 'Total employment (resident population concept - LFS)'], dtype='object',
GEO Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
'Denmark', 'Estonia', 'Euro area (17 countries)',
'Euro area (18 countries)', 'Euro area (19 countries)',
'European Union (15 countries)', 'European Union (27 countries)',
'European Union (28 countries)', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

17.8.2 Exercise 2

To easily filter by country, swap GEO to the top level and sort the MultiIndex

In [38]: employ.columns = employ.columns.swaplevel(0,-1)


employ = employ.sort_index(axis=1)

We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with 'Euro'

In [39]: geo_list = employ.columns.get_level_values('GEO').unique().tolist()


countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()

Out[39]: Index(['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',


'Denmark', 'Estonia', 'Finland',
'Former Yugoslav Republic of Macedonia, the', 'France',
'France (metropolitan)',
'Germany (until 1990 former territory of the FRG)', 'Greece', 'Hungary',
'Iceland', 'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg',
'Malta', 'Netherlands', 'Norway', 'Poland', 'Portugal', 'Romania',
'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland', 'Turkey',
'United Kingdom'],
dtype='object', name='GEO')

Select only percentage employed in the active population from the dataframe

In [40]: employ_f = employ.xs(('Percentage of total population', 'Active population'),


level=('UNIT', 'INDIC_EM'),
axis=1)
employ_f.head()

Out[40]: GEO Austria โ€ฆ United Kingdom \


AGE From 15 to 24 years โ€ฆ From 55 to 64 years
SEX Females Males Total โ€ฆ Females Males
DATE โ€ฆ
2007-01-01 56.00 62.90 59.40 โ€ฆ 49.90 68.90
2008-01-01 56.20 62.90 59.50 โ€ฆ 50.20 69.80
2009-01-01 56.20 62.90 59.50 โ€ฆ 50.60 70.30
2010-01-01 54.00 62.60 58.30 โ€ฆ 51.10 69.20
2011-01-01 54.80 63.60 59.20 โ€ฆ 51.30 68.40

GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70

[5 rows x 306 columns]

Drop the 'Total' value before creating the grouped boxplot

In [41]: employ_f = employ_f.drop('Total', level='SEX', axis=1)

In [42]: box = employ_f['2015'].unstack().reset_index()


sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
18

Linear Regression in Python

18.1 Contents

• Overview 18.2

• Simple Linear Regression 18.3

• Extending the Linear Regression Model 18.4

• Endogeneity 18.5

• Summary 18.6

• Exercises 18.7

• Solutions 18.8

In addition to what's in Anaconda, this lecture will need the following libraries

In [1]: !pip install linearmodels

18.2 Overview

Linear regression is a standard tool for analyzing the relationship between two or more variables
In this lecture, we'll use the Python package statsmodels to estimate, interpret, and visualize linear regression models
Along the way, we'll discuss a variety of topics, including

• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares

As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal paper [3]


• You can download a copy here

In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences
Such variation is needed to determine whether it is institutions that give rise to greater economic growth, rather than the other way around

18.2.1 Prerequisites

This lecture assumes you are familiar with basic econometrics


For an introductory text covering these topics, see, for example, [135]

18.2.2 Comments

This lecture is coauthored with Natasha Watkins

18.3 Simple Linear Regression

[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes
How do we measure institutional differences and economic outcomes?
In this paper,

• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange rates
• institutional differences are proxied by an index of protection against expropriation on average over 1985-95, constructed by the Political Risk Services Group

These variables and other data used in the paper are available for download on Daron Acemoglu's webpage
We will use pandas' .read_stata() function to read in data contained in the .dta files to dataframes

In [2]: import pandas as pd

df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dt
df1.head()

Out[2]: shortnam euro1900 excolony avexpr logpgp95 cons1 cons90 democ00a \


0 AFG 0.000000 1.0 NaN NaN 1.0 2.0 1.0
1 AGO 8.000000 1.0 5.363636 7.770645 3.0 3.0 0.0
2 ARE 0.000000 1.0 7.181818 9.804219 NaN NaN NaN
3 ARG 60.000004 1.0 6.386364 9.133459 1.0 6.0 3.0
4 ARM 0.000000 0.0 NaN 7.682482 NaN NaN NaN

cons00a extmort4 logem4 loghjypl baseco


0 1.0 93.699997 4.540098 NaN NaN
1 1.0 280.000000 5.634789 -3.411248 1.0

2 NaN NaN NaN NaN NaN


3 3.0 68.900002 4.232656 -0.872274 1.0
4 NaN NaN NaN NaN NaN

Let's use a scatterplot to see whether any obvious relationship exists between GDP per capita and the protection against expropriation index

In [3]: import matplotlib.pyplot as plt


%matplotlib inline
plt.style.use('seaborn')

df1.plot(x='avexpr', y='logpgp95', kind='scatter')


plt.show()

The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption
We can write our model as

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– + ๐‘ข๐‘–

where:

โ€ข ๐›ฝ0 is the intercept of the linear trend line on the y-axis


282 18. LINEAR REGRESSION IN PYTHON

โ€ข ๐›ฝ1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
โ€ข ๐‘ข๐‘– is a random error term (deviations of observations from the linear trend due to fac-
tors not included in the model)

Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])

In [4]: import numpy as np

# Dropping NA's is required to use numpy's polyfit


df1_subset = df1.dropna(subset=['logpgp95', 'avexpr'])

# Use only 'base sample' for plotting purposes


df1_subset = df1_subset[df1_subset['baseco'] == 1]

X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']

# Replace markers with country labels


plt.scatter(X, y, marker='')

for i, label in enumerate(labels):


plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()

The most common technique to estimate the parameters (the $\beta$'s) of the linear model is Ordinary Least Squares (OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the sum of squared residuals, i.e.

$$
\min_{\hat{\beta}} \sum_{i=1}^{N} \hat{u}_i^2
$$

where $\hat{u}_i$ is the difference between the observation and the predicted value of the dependent variable

To estimate the constant term $\beta_0$, we need to add a column of 1's to our dataset (consider the equation if $\beta_0$ was replaced with $\beta_0 x_i$ and $x_i = 1$)
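As a sketch of what this minimization does under the hood, it has the closed-form solution $\hat{\beta} = (X'X)^{-1}X'y$; here it is applied to simulated data (illustrative only, not the AJR sample):

```python
import numpy as np

# Simulated data from a known line plus small noise (made-up parameters)
rng = np.random.default_rng(0)
x = rng.uniform(3, 10, size=100)
y = 4.6 + 0.5 * x + rng.normal(scale=0.1, size=100)

# Add a column of ones so the first coefficient is the intercept
X = np.column_stack([np.ones_like(x), x])

# OLS closed form: solve (X'X) beta = X'y rather than inverting explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```

With the noise this small, the recovered coefficients land close to the intercept and slope used to generate the data.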

In [5]: df1['const'] = 1

Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels, however standard arrays can also be
used as arguments

In [6]: import statsmodels.api as sm

reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')


type(reg1)

Out[6]: statsmodels.regression.linear_model.OLS

So far we have simply constructed our model


We need to use .fit() to obtain parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$

In [7]: results = reg1.fit()


type(results)

Out[7]: statsmodels.regression.linear_model.RegressionResultsWrapper

We now have the fitted regression model stored in results


To view the OLS regression results, we can call the .summary() method
Note that an observation was mistakenly dropped from the results in the original paper (see the note located in maketable2.do from Acemoglu's webpage), and thus the coefficients differ slightly

In [8]: print(results.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.611
Model: OLS Adj. R-squared: 0.608
Method: Least Squares F-statistic: 171.4
Date: Fri, 21 Jun 2019 Prob (F-statistic): 4.16e-24
Time: 15:39:14 Log-Likelihood: -119.71

No. Observations: 111 AIC: 243.4


Df Residuals: 109 BIC: 248.8
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 4.6261 0.301 15.391 0.000 4.030 5.222
avexpr 0.5319 0.041 13.093 0.000 0.451 0.612
==============================================================================
Omnibus: 9.251 Durbin-Watson: 1.689
Prob(Omnibus): 0.010 Jarque-Bera (JB): 9.170
Skew: -0.680 Prob(JB): 0.0102
Kurtosis: 3.362 Cond. No. 33.2
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

From our results, we see that

โ€ข The intercept ๐›ฝ0ฬ‚ = 4.63


โ€ข The slope ๐›ฝ1ฬ‚ = 0.53
• The positive β̂₁ parameter estimate implies that institutional quality has a positive effect on economic outcomes, as we saw in the figure
โ€ข The p-value of 0.000 for ๐›ฝ1ฬ‚ implies that the effect of institutions on GDP is statistically
significant (using p < 0.05 as a rejection rule)
โ€ข The R-squared value of 0.611 indicates that around 61% of variation in log GDP per
capita is explained by protection against expropriation

Using our parameter estimates, we can now write our estimated relationship as

$$
\widehat{logpgp95}_i = 4.63 + 0.53 \, avexpr_i
$$

This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38

In [9]: mean_expr = np.mean(df1_subset['avexpr'])


mean_expr

Out[9]: 6.515625

In [10]: predicted_logpdp95 = 4.63 + 0.53 * 7.07


predicted_logpdp95

Out[10]: 8.3771

An easier (and more accurate) way to obtain this result is to use .predict() and set
๐‘๐‘œ๐‘›๐‘ ๐‘ก๐‘Ž๐‘›๐‘ก = 1 and ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐‘š๐‘’๐‘Ž๐‘›_๐‘’๐‘ฅ๐‘๐‘Ÿ

In [11]: results.predict(exog=[1, mean_expr])



Out[11]: array([8.09156367])

We can obtain an array of predicted ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– for every value of ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– in our dataset by
calling .predict() on our results
Plotting the predicted values against avexpr shows that the predicted values lie along the
line that we fitted above
The observed values of ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– are also plotted for comparison purposes

In [12]: # Drop missing observations from whole sample

df1_plot = df1.dropna(subset=['logpgp95', 'avexpr'])

# Plot predicted values

plt.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5, label='predicted')

# Plot observed values

plt.scatter(df1_plot['avexpr'], df1_plot['logpgp95'], alpha=0.5, label='observed')

plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()

18.4 Extending the Linear Regression Model

So far we have only accounted for institutions affecting economic performance; almost certainly there are numerous other factors affecting GDP that are not included in our model

Leaving out variables that affect ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– will result in omitted variable bias, yielding
biased and inconsistent parameter estimates
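A minimal simulation (with made-up data and coefficients, not the lecture's dataset) can make the omitted variable bias concrete: when a confounder that drives both the regressor and the outcome is left out, the short regression's slope absorbs part of its effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# w is a confounder: it affects both the regressor x and the outcome y
w = rng.normal(size=n)
x = 0.5 * w + rng.normal(size=n)
y = 1.0 * x + 2.0 * w + rng.normal(size=n)

X_short = np.column_stack([np.ones(n), x])      # omits w
X_long = np.column_stack([np.ones(n), x, w])    # includes w

b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# The short regression's slope is biased upward:
# plim = 1 + 2 * cov(x, w) / var(x) = 1 + 2 * 0.5 / 1.25 = 1.8
print(b_short[1], b_long[1])
```

With the confounder included, the slope estimate returns to (approximately) the true value of 1.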
We can extend our bivariate regression model to a multivariate regression model by
adding in other factors that may affect ๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘–
[3] consider other factors such as:

โ€ข the effect of climate on economic outcomes; latitude is used to proxy this


• differences that affect both economic performance and institutions, e.g. cultural, historical, etc.; controlled for with the use of continent dummies

Letโ€™s estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta

In [13]: df2 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable2.d

# Add constant term to dataset


df2['const'] = 1

# Create lists of variables to be used in each regression


X1 = ['const', 'avexpr']
X2 = ['const', 'avexpr', 'lat_abst']
X3 = ['const', 'avexpr', 'lat_abst', 'asia', 'africa', 'other']

# Estimate an OLS regression for each set of variables


reg1 = sm.OLS(df2['logpgp95'], df2[X1], missing='drop').fit()
reg2 = sm.OLS(df2['logpgp95'], df2[X2], missing='drop').fit()
reg3 = sm.OLS(df2['logpgp95'], df2[X3], missing='drop').fit()

Now that we have fitted our models, we will use summary_col to display the results in a
single table (model numbers correspond to those in the paper)

In [14]: from statsmodels.iolib.summary2 import summary_col

info_dict={'R-squared' : lambda x: f"{x.rsquared:.2f}",


'No. observations' : lambda x: f"{int(x.nobs):d}"}

results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])

results_table.add_title('Table 2 - OLS Regressions')

print(results_table)

Table 2 - OLS Regressions


=========================================
Model 1 Model 3 Model 4
-----------------------------------------
const 4.63*** 4.87*** 5.85***
(0.30) (0.33) (0.34)
avexpr 0.53*** 0.46*** 0.39***
(0.04) (0.06) (0.05)
lat_abst 0.87* 0.33
(0.49) (0.45)
asia -0.15
(0.15)
africa -0.92***
(0.17)
other 0.30
(0.37)
R-squared 0.61 0.62 0.72
No. observations 111 111 111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

18.5 Endogeneity

As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:

โ€ข richer countries may be able to afford or prefer better institutions


โ€ข variables that affect income may also be correlated with institutional differences
โ€ข the construction of the index may be biased; analysts may be biased towards seeing
countries with higher income having better institutions

To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression
This method requires replacing the endogenous variable ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– with a variable that is:

1. correlated with ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘–


2. not correlated with the error term (i.e. it should not directly affect the dependent variable, otherwise it would be correlated with uᵢ due to omitted variable bias)

The new set of regressors is called an instrument, which aims to remove endogeneity in our
proxy of institutional differences
The main contribution of [3] is the use of settler mortality rates to instrument for institutional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of institutions that were more extractive in nature (less protection against expropriation), and these institutions still persist today
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authorsโ€™ hypothesis and satisfying
the first condition of a valid instrument

In [15]: # Dropping NA's is required to use numpy's polyfit


df1_subset2 = df1.dropna(subset=['logem4', 'avexpr'])

X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']

# Replace markers with country labels



plt.scatter(X, y, marker='')

for i, label in enumerate(labels):


plt.annotate(label, (X.iloc[i], y.iloc[i]))

# Fit a linear trend line


plt.plot(np.unique(X),
np.poly1d(np.polyfit(X, y, 1))(np.unique(X)),
color='black')

plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()

The second condition may not be satisfied if settler mortality rates in the 17th to 19th centuries have a direct effect on current GDP (in addition to their indirect effect through institutions)
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance
[3] argue this is unlikely because:

โ€ข The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people
โ€ข The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization

As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates
First stage

The first stage involves regressing the endogenous variable (๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– ) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable
we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates
๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘–
Therefore, we will estimate the first-stage regression as

๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐›ฟ0 + ๐›ฟ1 ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘– + ๐‘ฃ๐‘–

The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)

In [16]: # Import and select the data


df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d
df4 = df4[df4['baseco'] == 1]

# Add a constant variable


df4['const'] = 1

# Fit the first stage regression and print summary


results_fs = sm.OLS(df4['avexpr'],
df4[['const', 'logem4']],
missing='drop').fit()
print(results_fs.summary())

OLS Regression Results


==============================================================================
Dep. Variable: avexpr R-squared: 0.270
Model: OLS Adj. R-squared: 0.258
Method: Least Squares F-statistic: 22.95
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.08e-05
Time: 15:39:17 Log-Likelihood: -104.83
No. Observations: 64 AIC: 213.7
Df Residuals: 62 BIC: 218.0
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 9.3414 0.611 15.296 0.000 8.121 10.562
logem4 -0.6068 0.127 -4.790 0.000 -0.860 -0.354
==============================================================================
Omnibus: 0.035 Durbin-Watson: 2.003
Prob(Omnibus): 0.983 Jarque-Bera (JB): 0.172
Skew: 0.045 Prob(JB): 0.918
Kurtosis: 2.763 Cond. No. 19.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Second stage
We need to retrieve the predicted values of ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– using .predict()
We then replace the endogenous variable ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– with the predicted values ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ
ฬ‚ ๐‘– in the
original linear model
Our second stage regression is thus

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ
ฬ‚ ๐‘– + ๐‘ข๐‘–

In [17]: df4['predicted_avexpr'] = results_fs.predict()

results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.477
Model: OLS Adj. R-squared: 0.469
Method: Least Squares F-statistic: 56.60
Date: Fri, 21 Jun 2019 Prob (F-statistic): 2.66e-10
Time: 15:39:17 Log-Likelihood: -72.268
No. Observations: 64 AIC: 148.5
Df Residuals: 62 BIC: 152.9
Df Model: 1
Covariance Type: nonrobust
====================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------
const 1.9097 0.823 2.320 0.024 0.264 3.555
predicted_avexpr 0.9443 0.126 7.523 0.000 0.693 1.195
==============================================================================
Omnibus: 10.547 Durbin-Watson: 2.137
Prob(Omnibus): 0.005 Jarque-Bera (JB): 11.010
Skew: -0.790 Prob(JB): 0.00407
Kurtosis: 4.277 Cond. No. 58.1
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not, and for this
reason computing 2SLS 'manually' (in stages with OLS) is not recommended
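To see why, here is a minimal sketch on simulated data (all variables and coefficients below are made up): the two-stage point estimate is fine, but the naive second-stage residuals are computed from the first-stage fitted values, whereas correct 2SLS standard errors use residuals computed with the original regressor.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate an endogenous regressor x with a valid instrument z
z = rng.normal(size=n)
u = rng.normal(size=n)                        # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # x is correlated with u
y = 1.0 + 2.0 * x + u

Z = np.column_stack([np.ones(n), z])          # instruments (const + z)
X = np.column_stack([np.ones(n), x])          # regressors (const + x)

# First stage: fitted values of X from a regression on Z
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)

# Second stage: OLS of y on the fitted values gives the 2SLS point estimate
beta = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

k = X.shape[1]
XtX_inv = np.linalg.inv(X_hat.T @ X_hat)

# Correct 2SLS residuals use the original X ...
sigma2 = np.sum((y - X @ beta)**2) / (n - k)
se_correct = np.sqrt(sigma2 * np.diag(XtX_inv))

# ... whereas naive second-stage OLS residuals use X_hat
sigma2_naive = np.sum((y - X_hat @ beta)**2) / (n - k)
se_naive = np.sqrt(sigma2_naive * np.diag(XtX_inv))

print(beta[1], se_correct[1], se_naive[1])  # same point estimate, different SEs
```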
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels

In [18]: from linearmodels.iv import IV2SLS

Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)

In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')

print(iv.summary)

IV-2SLS Estimation Summary


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.1870
Estimator: IV-2SLS Adj. R-squared: 0.1739
No. Observations: 64 F-statistic: 37.568
Date: Fri, Jun 21 2019 P-value (F-stat) 0.0000
Time: 15:39:17 Distribution: chi2(1)
Cov. Estimator: unadjusted

Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================

Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False

Given that we now have consistent and unbiased estimates, we can infer from the model we
have estimated that institutional differences (stemming from institutions set up during colonization) can help to explain differences in income levels across countries today
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile
and Nigeria (i.e., institutional quality) implies up to a 7-fold difference in income, emphasizing
the significance of institutions in economic development

18.6 Summary

We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels
If you are familiar with R, you may want to use the formula interface to statsmodels, or
consider using rpy2 to call R from within Python
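As a quick hedged sketch of the formula interface (on made-up data with the same variable names, not the lecture's dataset), note that an intercept is added automatically, so no constant column is needed:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({'avexpr': rng.uniform(3, 10, size=200)})
df['logpgp95'] = 4.6 + 0.53 * df['avexpr'] + rng.normal(scale=0.5, size=200)

# R-style formula: 'Intercept' is included by default
res = smf.ols('logpgp95 ~ avexpr', data=df).fit()
print(res.params)
```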

18.7 Exercises

18.7.1 Exercise 1

In the lecture, we think the original model suffers from endogeneity bias due to the likely effect income has on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , and the errors, ๐‘ข๐‘–

๐ป0 โˆถ ๐ถ๐‘œ๐‘ฃ(๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , ๐‘ข๐‘– ) = 0 (๐‘›๐‘œ ๐‘’๐‘›๐‘‘๐‘œ๐‘”๐‘’๐‘›๐‘’๐‘–๐‘ก๐‘ฆ)


๐ป1 โˆถ ๐ถ๐‘œ๐‘ฃ(๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– , ๐‘ข๐‘– ) โ‰  0 (๐‘’๐‘›๐‘‘๐‘œ๐‘”๐‘’๐‘›๐‘’๐‘–๐‘ก๐‘ฆ)

This test is run in two stages


First, we regress ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– on the instrument, ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘–

๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– = ๐œ‹0 + ๐œ‹1 ๐‘™๐‘œ๐‘”๐‘’๐‘š4๐‘– + ๐œ๐‘–

Second, we retrieve the residuals ๐œ๐‘–ฬ‚ and include them in the original equation

๐‘™๐‘œ๐‘”๐‘๐‘”๐‘95๐‘– = ๐›ฝ0 + ๐›ฝ1 ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– + ๐›ผ๐œ๐‘–ฬ‚ + ๐‘ข๐‘–



If ๐›ผ is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and
conclude that ๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– is endogenous
Using the above information, estimate a Hausman test and interpret your results

18.7.2 Exercise 2

The OLS parameter ๐›ฝ can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)

๐‘ฆ = ๐‘‹๐›ฝ + ๐‘ข

To solve for the unknown parameter ๐›ฝ, we want to minimize the sum of squared residuals

min๐‘ขฬ‚โ€ฒ ๐‘ขฬ‚
๐›ฝฬ‚

Rearranging the first equation and substituting into the second equation, we can write

min (๐‘Œ โˆ’ ๐‘‹ ๐›ฝ)ฬ‚ โ€ฒ (๐‘Œ โˆ’ ๐‘‹ ๐›ฝ)ฬ‚


๐›ฝฬ‚

Solving this optimization problem gives the solution for the ๐›ฝ ฬ‚ coefficients

๐›ฝ ฬ‚ = (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Using the above information, compute ๐›ฝ ฬ‚ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture

18.8 Solutions

18.8.1 Exercise 1
In [20]: # Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d

# Add a constant term


df4['const'] = 1

# Estimate the first stage regression


reg1 = sm.OLS(endog=df4['avexpr'],
exog=df4[['const', 'logem4']],
missing='drop').fit()

# Retrieve the residuals


df4['resid'] = reg1.resid

# Estimate the second stage residuals


reg2 = sm.OLS(endog=df4['logpgp95'],
exog=df4[['const', 'avexpr', 'resid']],
missing='drop').fit()

print(reg2.summary())

OLS Regression Results


==============================================================================
Dep. Variable: logpgp95 R-squared: 0.689
Model: OLS Adj. R-squared: 0.679
Method: Least Squares F-statistic: 74.05
Date: Fri, 21 Jun 2019 Prob (F-statistic): 1.07e-17
Time: 15:39:17 Log-Likelihood: -62.031
No. Observations: 70 AIC: 130.1
Df Residuals: 67 BIC: 136.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 2.4782 0.547 4.530 0.000 1.386 3.570
avexpr 0.8564 0.082 10.406 0.000 0.692 1.021
resid -0.4951 0.099 -5.017 0.000 -0.692 -0.298
==============================================================================
Omnibus: 17.597 Durbin-Watson: 2.086
Prob(Omnibus): 0.000 Jarque-Bera (JB): 23.194
Skew: -1.054 Prob(JB): 9.19e-06
Kurtosis: 4.873 Cond. No. 53.8
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The output shows that the coefficient on the residuals is statistically significant, indicating
๐‘Ž๐‘ฃ๐‘’๐‘ฅ๐‘๐‘Ÿ๐‘– is endogenous

18.8.2 Exercise 2
In [21]: # Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.d
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])

# Add a constant term


df1['const'] = 1

# Define the X and y variables


y = np.asarray(df1['logpgp95'])
X = np.asarray(df1[['const', 'avexpr']])

# Compute ฮฒ_hat
ฮฒ_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Print out the results from the 2 x 1 vector ฮฒ_hat


print(f'ฮฒ_0 = {ฮฒ_hat[0]:.2}')
print(f'ฮฒ_1 = {ฮฒ_hat[1]:.2}')

ฮฒ_0 = 4.6
ฮฒ_1 = 0.53

It is also possible to use np.linalg.inv(X.T @ X) @ X.T @ y to solve for β; however,
.solve() is preferred as it involves fewer computations
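A quick sketch (with hypothetical data) confirming that the two approaches agree numerically:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical regression data: a constant and one regressor
X = np.column_stack([np.ones(50), rng.normal(size=50)])
y = X @ np.array([4.6, 0.53]) + rng.normal(scale=0.5, size=50)

A, b = X.T @ X, X.T @ y

β_solve = np.linalg.solve(A, b)   # solves Aβ = b without forming A⁻¹
β_inv = np.linalg.inv(A) @ b      # forms A⁻¹ explicitly

print(np.allclose(β_solve, β_inv))
```

For a well-conditioned problem like this, both give the same answer up to floating-point error; `solve` is still preferred on speed and numerical-stability grounds.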
19 Maximum Likelihood Estimation

19.1 Contents

โ€ข Overview 19.2

โ€ข Set Up and Assumptions 19.3

โ€ข Conditional Distributions 19.4

โ€ข Maximum Likelihood Estimation 19.5

โ€ข MLE with Numerical Methods 19.6

โ€ข Maximum Likelihood Estimation 19.7

โ€ข Summary 19.8

โ€ข Exercises 19.9

โ€ข Solutions 19.10

19.2 Overview

In a previous lecture, we estimated the relationship between dependent and explanatory variables using linear regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables
Here we illustrate maximum likelihood by replicating Daniel Treisman's (2016) paper, Russia's Billionaires, which connects the number of billionaires in a country to its economic characteristics
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict

295

19.2.1 Prerequisites

We assume familiarity with basic probability and multivariate calculus

19.2.2 Comments

This lecture is co-authored with Natasha Watkins

19.3 Set Up and Assumptions

Letโ€™s consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study

19.3.1 Flow of Ideas

The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data

โ€ข e.g., the class of all normal distributions, or the class of all gamma distributions

Each such class is a family of distributions indexed by a finite number of parameters

โ€ข e.g., the class of normal distributions is a family of distributions indexed by its mean
๐œ‡ โˆˆ (โˆ’โˆž, โˆž) and standard deviation ๐œŽ โˆˆ (0, โˆž)

Weโ€™ll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates

19.3.2 Counting Billionaires

Treisman [129] is interested in estimating the number of billionaires in different countries


The number of billionaires is integer-valued
Hence we consider distributions that take values only in the nonnegative integers
(This is one reason least squares regression is not the best tool for the present problem, since
the dependent variable in linear regression is not restricted to integer values)
One integer distribution is the Poisson distribution, the probability mass function (pmf) of
which is

๐œ‡๐‘ฆ โˆ’๐œ‡
๐‘“(๐‘ฆ) = ๐‘’ , ๐‘ฆ = 0, 1, 2, โ€ฆ , โˆž
๐‘ฆ!

We can plot the Poisson distribution over ๐‘ฆ for different values of ๐œ‡ as follows

In [1]: from numpy import exp


from scipy.special import factorial
import matplotlib.pyplot as plt
%matplotlib inline

poisson_pmf = lambda y, ฮผ: ฮผ**y / factorial(y) * exp(-ฮผ)


y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))

for ฮผ in [1, 5, 10]:


distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, ฮผ))
ax.plot(y_values,
distribution,
label=f'$\mu$={ฮผ}',
alpha=0.5,
marker='o',
markersize=8)

ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)

plt.show()

Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
๐‘ฆ increases
Letโ€™s have a look at the distribution of the data weโ€™ll be working with in this lecture
Treismanโ€™s main source of data is Forbesโ€™ annual rankings of billionaires and their estimated
net worth
The dataset mle/fp.dta can be downloaded here or from its AER page

In [2]: import pandas as pd


pd.options.display.max_columns = 10

# Load in data and view


df = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/mle/fp.dta')
df.head()

Out[2]: country ccode year cyear numbil โ€ฆ topint08 rintr \


0 United States 2.0 1990.0 21990.0 NaN โ€ฆ 39.799999 4.988405
1 United States 2.0 1991.0 21991.0 NaN โ€ฆ 39.799999 4.988405
2 United States 2.0 1992.0 21992.0 NaN โ€ฆ 39.799999 4.988405
3 United States 2.0 1993.0 21993.0 NaN โ€ฆ 39.799999 4.988405
4 United States 2.0 1994.0 21994.0 NaN โ€ฆ 39.799999 4.988405

noyrs roflaw nrrents


0 20.0 1.61 NaN
1 20.0 1.61 NaN
2 20.0 1.61 NaN
3 20.0 1.61 NaN
4 20.0 1.61 NaN

[5 rows x 36 columns]

Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)

In [3]: numbil0_2008 = df[(df['year'] == 2008) & (


df['country'] != 'United States')].loc[:, 'numbil0']

plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(xmin=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()

/home/anju/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py:3215: MatplotlibDeprecationWarning:
The `xmin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `left` instead.
alternative='`left`', obj_type='argument')

From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low ๐œ‡ and some outliers)

19.4 Conditional Distributions

In Treisman's paper, the dependent variable — the number of billionaires yᵢ in country i —
is modeled as a function of GDP per capita, population size, and years membership in GATT
and WTO
Hence, the distribution of ๐‘ฆ๐‘– needs to be conditioned on the vector of explanatory variables x๐‘–
The standard formulation — the so-called Poisson regression model — is as follows:

$$
f(y_i \mid \mathbf{x}_i) = \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}; \qquad y_i = 0, 1, 2, \ldots, \infty \tag{1}
$$

where ๐œ‡๐‘– = exp(xโ€ฒ๐‘– ๐›ฝ) = exp(๐›ฝ0 + ๐›ฝ1 ๐‘ฅ๐‘–1 + โ€ฆ + ๐›ฝ๐‘˜ ๐‘ฅ๐‘–๐‘˜ )

To illustrate the idea that the distribution of ๐‘ฆ๐‘– depends on x๐‘– letโ€™s run a simple simulation
We use our poisson_pmf function from above and arbitrary values for ๐›ฝ and x๐‘–

In [4]: import numpy as np

y_values = range(0, 20)

# Define a parameter vector with estimates


ฮฒ = np.array([0.26, 0.18, 0.25, -0.1, -0.22])

# Create some observations X


datasets = [np.array([0, 1, 1, 1, 2]),
np.array([2, 3, 2, 4, 0]),
np.array([3, 4, 5, 3, 2]),
np.array([6, 5, 4, 4, 7])]

fig, ax = plt.subplots(figsize=(12, 8))

for X in datasets:
ฮผ = exp(X @ ฮฒ)
distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, ฮผ))
ax.plot(y_values,
distribution,
label=f'$\mu_i$={ฮผ:.1}',
marker='o',
markersize=8,
alpha=0.5)

ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()

We can see that the distribution of ๐‘ฆ๐‘– is conditional on x๐‘– (๐œ‡๐‘– is no longer constant)

19.5 Maximum Likelihood Estimation

In our model for number of billionaires, the conditional distribution contains 4 (k = 4) parameters that we need to estimate
We will label our entire parameter vector as ๐›ฝ where

$$
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}
$$

To estimate the model using MLE, we want to maximize the likelihood that our estimate ๐›ฝฬ‚ is
the true parameter ๐›ฝ
Intuitively, we want to find the ๐›ฝฬ‚ that best fits our data
First, we need to construct the likelihood function ℒ(β), which is similar to a joint probability density function
Assume we have some data ๐‘ฆ๐‘– = {๐‘ฆ1 , ๐‘ฆ2 } and ๐‘ฆ๐‘– โˆผ ๐‘“(๐‘ฆ๐‘– )
If ๐‘ฆ1 and ๐‘ฆ2 are independent, the joint pmf of these data is ๐‘“(๐‘ฆ1 , ๐‘ฆ2 ) = ๐‘“(๐‘ฆ1 ) โ‹… ๐‘“(๐‘ฆ2 )
If ๐‘ฆ๐‘– follows a Poisson distribution with ๐œ† = 7, we can visualize the joint pmf like so

In [5]: from mpl_toolkits.mplot3d import Axes3D

def plot_joint_poisson(ฮผ=7, y_n=20):



yi_values = np.arange(0, y_n, 1)

# Create coordinate points of X and Y


X, Y = np.meshgrid(yi_values, yi_values)

# Multiply distributions together


Z = poisson_pmf(X, ฮผ) * poisson_pmf(Y, ฮผ)

fig = plt.figure(figsize=(12, 8))


ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z.T, cmap='terrain', alpha=0.6)
ax.scatter(X, Y, Z.T, color='black', alpha=0.5, linewidths=1)
ax.set(xlabel='$y_1$', ylabel='$y_2$')
ax.set_zlabel('$f(y_1, y_2)$', labelpad=10)
plt.show()

plot_joint_poisson(ฮผ=7, y_n=20)

Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribution) can be written as

$$
f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta) = \prod_{i=1}^n \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i}
$$

๐‘ฆ๐‘– is conditional on both the values of x๐‘– and the parameters ๐›ฝ


The likelihood function is the same as the joint pmf, but treats the parameter ๐›ฝ as a random
variable and takes the observations (๐‘ฆ๐‘– , x๐‘– ) as given

$$
\begin{aligned}
\mathcal{L}(\beta \mid y_1, y_2, \ldots, y_n; \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)
&= \prod_{i=1}^n \frac{\mu_i^{y_i}}{y_i!} e^{-\mu_i} \\
&= f(y_1, y_2, \ldots, y_n \mid \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n; \beta)
\end{aligned}
$$

Now that we have our likelihood function, we want to find the ๐›ฝฬ‚ that yields the maximum
likelihood value

maxโ„’(๐›ฝ)
๐›ฝ

In doing so it is generally easier to maximize the log-likelihood (consider differentiating
f(x) = x exp(x) vs. f(x) = log(x) + x)
Given that taking a logarithm is a monotone increasing transformation, a maximizer of the
likelihood function will also be a maximizer of the log-likelihood function
In our case the log-likelihood is

log โ„’(๐›ฝ) = log (๐‘“(๐‘ฆ1 ; ๐›ฝ) โ‹… ๐‘“(๐‘ฆ2 ; ๐›ฝ) โ‹… โ€ฆ โ‹… ๐‘“(๐‘ฆ๐‘› ; ๐›ฝ))


๐‘›
= โˆ‘ log ๐‘“(๐‘ฆ๐‘– ; ๐›ฝ)
๐‘–=1
๐‘› ๐‘ฆ
๐œ‡๐‘– ๐‘– โˆ’๐œ‡๐‘–
= โˆ‘ log ( ๐‘’ )
๐‘–=1
๐‘ฆ๐‘– !
๐‘› ๐‘› ๐‘›
= โˆ‘ ๐‘ฆ๐‘– log ๐œ‡๐‘– โˆ’ โˆ‘ ๐œ‡๐‘– โˆ’ โˆ‘ log ๐‘ฆ!
๐‘–=1 ๐‘–=1 ๐‘–=1

The MLE β̂ of the Poisson regression model can be obtained by solving

๐‘› ๐‘› ๐‘›
max( โˆ‘ ๐‘ฆ๐‘– log ๐œ‡๐‘– โˆ’ โˆ‘ ๐œ‡๐‘– โˆ’ โˆ‘ log ๐‘ฆ!)
๐›ฝ
๐‘–=1 ๐‘–=1 ๐‘–=1

However, no analytical solution exists to the above problem โ€“ to find the MLE we need to use
numerical methods

19.6 MLE with Numerical Methods

Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates
One such numerical method is the Newton-Raphson algorithm
Our goal is to find the maximum likelihood estimate ๐›ฝฬ‚
At ๐›ฝ,ฬ‚ the first derivative of the log-likelihood function will be equal to 0
Letโ€™s illustrate this by supposing

log โ„’(๐›ฝ) = โˆ’(๐›ฝ โˆ’ 10)2 โˆ’ 10

In [6]: ฮฒ = np.linspace(1, 20)


logL = -(ฮฒ - 10) ** 2 - 10
dlogL = -2 * ฮฒ + 20

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))



ax1.plot(ฮฒ, logL, lw=2)


ax2.plot(ฮฒ, dlogL, lw=2)

ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()

The plot shows that the maximum likelihood value (the top plot) occurs when
d log ℒ(β) / dβ = 0 (the bottom plot)
Therefore, the likelihood is maximized when β = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, β₀ (the OLS parameter
estimates might be a reasonable guess), then

1. Use the updating rule to iterate the algorithm

๐›ฝ (๐‘˜+1) = ๐›ฝ (๐‘˜) โˆ’ ๐ป โˆ’1 (๐›ฝ (๐‘˜) )๐บ(๐›ฝ (๐‘˜) )

where:

G(β^(k)) = d log ℒ(β^(k)) / dβ^(k)

H(β^(k)) = d² log ℒ(β^(k)) / dβ^(k) dβ^(k)′
2. Check whether β^(k+1) − β^(k) < tol

   • If true, then stop iterating and set β̂ = β^(k+1)
   • If false, then update β^(k+1) and return to step 1

As can be seen from the updating equation, β^(k+1) = β^(k) only when G(β^(k)) = 0, i.e. where
the first derivative is equal to 0
(In practice, we stop iterating when the difference is below a small tolerance threshold)
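In one dimension the steps above reduce to a few lines; here is a minimal sketch of ours on the quadratic log-likelihood used earlier (for this example G(β) = −2(β − 10) and H(β) = −2, so a single update lands exactly on the maximum)

```python
def G(β):                        # first derivative of log L(β) = -(β - 10)**2 - 10
    return -2 * (β - 10)

def H(β):                        # second derivative (constant for a quadratic)
    return -2.0

β, tol = 1.0, 1e-8               # initial guess and tolerance
for k in range(100):
    β_new = β - G(β) / H(β)      # scalar version of the updating rule
    converged = abs(β_new - β) < tol
    β = β_new
    if converged:
        break

print(β)                         # β = 10.0, the maximizer found above
```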
Letโ€™s have a go at implementing the Newton-Raphson algorithm
First, weโ€™ll create a class called PoissonRegression so we can easily recompute the values
of the log likelihood, gradient and Hessian for every iteration

In [7]: class PoissonRegression:

            def __init__(self, y, X, β):
                self.X = X
                self.n, self.k = X.shape
                self.y = y.reshape(self.n, 1)    # Reshape y as an n_by_1 column vector
                self.β = β.reshape(self.k, 1)    # Reshape β as a k_by_1 column vector

            def μ(self):
                return np.exp(self.X @ self.β)

            def logL(self):
                y = self.y
                μ = self.μ()
                return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

            def G(self):
                X = self.X
                y = self.y
                μ = self.μ()
                return X.T @ (y - μ)

            def H(self):
                X = self.X
                μ = self.μ()
                return -(X.T @ (μ * X))

Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector ๐›ฝ 0
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates
Iteration will end when either:

โ€ข The difference between the parameter and the updated parameter is below a tolerance
level
โ€ข The maximum number of iterations has been achieved (meaning convergence is not
achieved)

So we can get an idea of whatโ€™s going on while the algorithm is running, an option dis-
play=True is added to print out values at each iteration

In [8]: def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):

            i = 0
            error = 100    # Initial error value

            # Print header of output
            if display:
                header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
                print(header)
                print("-" * len(header))

            # While loop runs while any value in error is greater
            # than the tolerance until max iterations are reached
            while np.any(error > tol) and i < max_iter:
                H, G = model.H(), model.G()
                β_new = model.β - (np.linalg.inv(H) @ G)
                error = β_new - model.β
                model.β = β_new

                # Print iterations
                if display:
                    β_list = [f'{t:.3}' for t in list(model.β.flatten())]
                    update = f'{i:<13}{model.logL():<16.8}{β_list}'
                    print(update)

                i += 1

            print(f'Number of iterations: {i}')
            print(f'β_hat = {model.β.flatten()}')

            # Return a flat array for β (instead of a k_by_1 column vector)
            return model.β.flatten()

Letโ€™s try out our algorithm with a small dataset of 5 observations and 3 variables in X

In [9]: X = np.array([[1, 2, 5],
                      [1, 1, 3],
                      [1, 4, 2],
                      [1, 5, 2],
                      [1, 3, 1]])

        y = np.array([1, 0, 1, 1, 0])

        # Take a guess at initial βs
        init_β = np.array([0.1, 0.1, 0.1])

        # Create an object with Poisson model values
        poi = PoissonRegression(y, X, β=init_β)

        # Use newton_raphson to find the MLE
        β_hat = newton_raphson(poi, display=True)

Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
3 -3.3788646 ['-5.92', '0.909', '0.82']
4 -3.3783559 ['-6.07', '0.933', '0.843']
5 -3.3783555 ['-6.08', '0.933', '0.843']
Number of iterations: 6
ฮฒ_hat = [-6.07848205 0.93340226 0.84329625]

As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations

You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve
Also, note that the increase in log ℒ(β^(k)) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the
numerator in our updating equation is becoming smaller
The gradient vector should be close to 0 at β̂

In [10]: poi.G()

Out[10]: array([[-3.95169228e-07],
[-1.00114805e-06],
[-7.73114562e-07]])

The iterative process can be visualized in the following diagram, where the maximum is found
at β = 10

In [11]: logL = lambda x: -(x - 10) ** 2 - 10

def find_tangent(ฮฒ, a=0.01):


y1 = logL(ฮฒ)
y2 = logL(ฮฒ+a)
x = np.array([[ฮฒ, 1], [ฮฒ+a, 1]])
m, c = np.linalg.lstsq(x, np.array([y1, y2]), rcond=None)[0]
return m, c

ฮฒ = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(ฮฒ, logL(ฮฒ), lw=2, c='black')

for ฮฒ in [7, 8.5, 9.5, 10]:


ฮฒ_line = np.linspace(ฮฒ-2, ฮฒ+2)
m, c = find_tangent(ฮฒ)
y = m * ฮฒ_line + c
ax.plot(ฮฒ_line, y, '-', c='purple', alpha=0.8)
ax.text(ฮฒ+2.05, y[-1], f'$G({ฮฒ}) = {abs(m):.0f}$', fontsize=12)
ax.vlines(ฮฒ, -24, logL(ฮฒ), linestyles='--', alpha=0.5)
ax.hlines(logL(ฮฒ), 6, ฮฒ, linestyles='--', alpha=0.5)

ax.set(ylim=(-24, -4), xlim=(6, 13))


ax.set_xlabel(r'$\beta$', fontsize=15)
ax.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=25,
fontsize=15)
ax.grid(alpha=0.3)
plt.show()

Note that our implementation of the Newton-Raphson algorithm is rather basic โ€” for more
robust implementations see, for example, scipy.optimize

19.7 Maximum Likelihood Estimation with statsmodels

Now that we know whatโ€™s going on under the hood, we can apply MLE to an interesting ap-
plication
Weโ€™ll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, letโ€™s re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value

In [12]: from statsmodels.api import Poisson


from scipy import stats

X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])

y = np.array([1, 0, 1, 1, 0])

stats_poisson = Poisson(y, X).fit()


print(stats_poisson.summary())

Optimization terminated successfully.


Current function value: 0.675671
Iterations 7
Poisson Regression Results
==============================================================================
Dep. Variable: y No. Observations: 5


Model: Poisson Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2546
Time: 15:37:09 Log-Likelihood: -3.3784
converged: True LL-Null: -4.5325
LLR p-value: 0.3153
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -6.0785 5.279 -1.151 0.250 -16.425 4.268
x1 0.9334 0.829 1.126 0.260 -0.691 2.558
x2 0.8433 0.798 1.057 0.291 -0.720 2.407
==============================================================================

Now letโ€™s replicate results from Daniel Treismanโ€™s paper, Russiaโ€™s Billionaires, mentioned ear-
lier in the lecture
Treisman starts by estimating equation Eq. (1), where:

โ€ข ๐‘ฆ๐‘– is ๐‘›๐‘ข๐‘š๐‘๐‘’๐‘Ÿ ๐‘œ๐‘“ ๐‘๐‘–๐‘™๐‘™๐‘–๐‘œ๐‘›๐‘Ž๐‘–๐‘Ÿ๐‘’๐‘ ๐‘–
โ€ข ๐‘ฅ๐‘–1 is log ๐บ๐ท๐‘ƒ ๐‘๐‘’๐‘Ÿ ๐‘๐‘Ž๐‘๐‘–๐‘ก๐‘Ž๐‘–
โ€ข ๐‘ฅ๐‘–2 is log ๐‘๐‘œ๐‘๐‘ข๐‘™๐‘Ž๐‘ก๐‘–๐‘œ๐‘›๐‘–
โ€ข ๐‘ฅ๐‘–3 is ๐‘ฆ๐‘’๐‘Ž๐‘Ÿ๐‘  ๐‘–๐‘› ๐บ๐ด๐‘‡ ๐‘‡ ๐‘– โ€“ years membership in GATT and WTO (to proxy access to in-
ternational markets)

The paper only considers the year 2008 for estimation


We will set up our variables for estimation like so (you should have the data assigned to df
from earlier in the lecture)

In [13]: # Keep only year 2008


df = df[df['year'] == 2008]

# Add a constant
df['const'] = 1

# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']

Then we can use the Poisson function from statsmodels to fit the model
Weโ€™ll use robust standard errors as in the authorโ€™s paper

In [14]: import statsmodels.api as sm

# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())

Optimization terminated successfully.


Current function value: 2.226090
Iterations 9
Poisson Regression Results
==============================================================================
Dep. Variable: numbil0 No. Observations: 197
Model: Poisson Df Residuals: 193
Method: MLE Df Model: 3


Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.8574
Time: 15:37:10 Log-Likelihood: -438.54
converged: True LL-Null: -3074.7
LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -29.0495 2.578 -11.268 0.000 -34.103 -23.997
lngdppc 1.0839 0.138 7.834 0.000 0.813 1.355
lnpop 1.1714 0.097 12.024 0.000 0.980 1.362
gattwto08 0.0060 0.007 0.868 0.386 -0.008 0.019
==============================================================================

Success! The algorithm was able to achieve convergence in 9 iterations


Our output indicates that GDP per capita, population, and years of membership in the Gen-
eral Agreement on Tariffs and Trade (GATT) are positively related to the number of billion-
aires a country has, as expected
Letโ€™s also estimate the authorโ€™s more full-featured models and display them in a single table

In [15]: from statsmodels.iolib.summary2 import summary_col

regs = [reg1, reg2, reg3]


reg_names = ['Model 1', 'Model 2', 'Model 3']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
'No. observations': lambda x: f"{int(x.nobs):d}"}
regressor_order = ['const',
'lngdppc',
'lnpop',
'gattwto08',
'lnmcap08',
'rintr',
'topint08',
'nrrents',
'roflaw']
results = []

for reg in regs:


result = sm.Poisson(df[['numbil0']], df[reg],
missing='drop').fit(cov_type='HC0', maxiter=100, disp=0)
results.append(result)

results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)

Table 1 - Explaining the Number of Billionaires in 2008


=================================================
Model 1 Model 2 Model 3
-------------------------------------------------
const -29.050*** -19.444*** -20.858***
(2.578) (4.820) (4.255)
lngdppc 1.084*** 0.717*** 0.737***
(0.138) (0.244) (0.233)
lnpop 1.171*** 0.806*** 0.929***
(0.097) (0.213) (0.195)
gattwto08 0.006 0.007 0.004
(0.007) (0.006) (0.006)
lnmcap08 0.399** 0.286*
(0.172) (0.167)
rintr -0.010 -0.009
(0.010) (0.010)
topint08 -0.051***-0.058***
(0.011) (0.012)
nrrents -0.005
(0.010)
roflaw 0.203
(0.372)
Pseudo R-squared 0.86 0.90 0.90
No. observations 197 131 131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01

The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate
To analyze our results by country, we can plot the difference between the predicted and actual
values, then sort from highest to lowest and plot the first 15

In [16]: data = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08', 'rintr',


'topint08', 'nrrents', 'roflaw', 'numbil0', 'country']
results_df = df[data].dropna()

# Use last model (model 3)


results_df['prediction'] = results[-1].predict()

# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']

# Sort in descending order


results_df.sort_values('difference', ascending=False, inplace=True)

# Plot the first 15 data points


results_df[:15].plot('country', 'difference', kind='bar', figsize=(12,8), legend=False)
plt.ylabel('Number of billionaires above predicted level')
plt.xlabel('Country')
plt.show()

As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russiaโ€™s excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR

19.8 Summary

In this lecture, we used Maximum Likelihood Estimation to estimate the parameters of a


Poisson model
statsmodels contains other built-in likelihood models such as Probit and Logit
For further flexibility, statsmodels provides a way to specify the distribution manually us-
ing the GenericLikelihoodModel class - an example notebook can be found here

19.9 Exercises

19.9.1 Exercise 1

Suppose we wanted to estimate the probability of an event yᵢ occurring, given some
observations
We could use a probit regression model, where the pmf of ๐‘ฆ๐‘– is

f(yᵢ; β) = μᵢ^yᵢ (1 − μᵢ)^(1−yᵢ),    yᵢ = 0, 1

where μᵢ = Φ(x′ᵢ β)

ฮฆ represents the cumulative normal distribution and constrains the predicted ๐‘ฆ๐‘– to be be-
tween 0 and 1 (as required for a probability)
๐›ฝ is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf
of the normal distribution

19.9.2 Exercise 2

Use the following dataset and initial values of ๐›ฝ to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture

        ⎡1  2  4⎤        ⎡1⎤
        ⎢1  1  1⎥        ⎢0⎥           ⎡0.1⎤
    X = ⎢1  4  3⎥    y = ⎢1⎥    β⁽⁰⁾ = ⎢0.1⎥
        ⎢1  5  6⎥        ⎢1⎥           ⎣0.1⎦
        ⎣1  3  5⎦        ⎣0⎦

Verify your results with statsmodels - you can import the Probit function with the follow-
ing import statement

In [17]: from statsmodels.discrete.discrete_model import Probit

Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values

19.10 Solutions

19.10.1 Exercise 1

The log-likelihood can be written as

๐‘›
log โ„’ = โˆ‘ [๐‘ฆ๐‘– log ฮฆ(xโ€ฒ๐‘– ๐›ฝ) + (1 โˆ’ ๐‘ฆ๐‘– ) log(1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ))]
๐‘–=1

Using the fundamental theorem of calculus, the derivative of a cumulative probability
distribution is its marginal distribution

∂Φ(s)/∂s = ϕ(s)

where ๐œ™ is the marginal normal distribution


The gradient vector of the Probit model is

๐‘›
๐œ• log โ„’ ๐œ™(xโ€ฒ๐‘– ๐›ฝ) ๐œ™(xโ€ฒ๐‘– ๐›ฝ)
= โˆ‘ [๐‘ฆ๐‘– โˆ’ (1 โˆ’ ๐‘ฆ ๐‘– ) ]x
๐œ•๐›ฝ ๐‘–=1
ฮฆ(xโ€ฒ๐‘– ๐›ฝ) 1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ) ๐‘–

The Hessian of the Probit model is

๐‘›
๐œ• 2 log โ„’ โ€ฒ ๐œ™(xโ€ฒ๐‘– ๐›ฝ) + xโ€ฒ๐‘– ๐›ฝฮฆ(xโ€ฒ๐‘– ๐›ฝ) ๐œ™๐‘– (xโ€ฒ๐‘– ๐›ฝ) โˆ’ xโ€ฒ๐‘– ๐›ฝ(1 โˆ’ ฮฆ(xโ€ฒ๐‘– ๐›ฝ))
โ€ฒ = โˆ’ โˆ‘ ๐œ™(x ๐‘– ๐›ฝ)[๐‘ฆ ๐‘– โ€ฒ 2
+ (1 โˆ’ ๐‘ฆ ๐‘– ) โ€ฒ 2
]x๐‘– xโ€ฒ๐‘–
๐œ•๐›ฝ๐œ•๐›ฝ ๐‘–=1
[ฮฆ(x ๐‘– ๐›ฝ)] [1 โˆ’ ฮฆ(x ๐‘– ๐›ฝ)]

Using these results, we can write a class for the Probit model as follows

In [18]: from scipy.stats import norm

         class ProbitRegression:

             def __init__(self, y, X, β):
                 self.X, self.y, self.β = X, y, β
                 self.n, self.k = X.shape

             def μ(self):
                 return norm.cdf(self.X @ self.β.T)

             def ϕ(self):
                 return norm.pdf(self.X @ self.β.T)

             def logL(self):
                 y = self.y
                 μ = self.μ()
                 return np.sum(y * np.log(μ) + (1 - y) * np.log(1 - μ))

             def G(self):
                 X, y = self.X, self.y
                 μ = self.μ()
                 ϕ = self.ϕ()
                 return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)),
                               axis=1)

             def H(self):
                 X, y = self.X, self.y
                 β = self.β
                 μ = self.μ()
                 ϕ = self.ϕ()
                 a = (ϕ + (X @ β.T) * μ) / μ**2
                 b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
                 return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X

19.10.2 Exercise 2
In [19]: X = np.array([[1, 2, 4],
[1, 1, 1],
[1, 4, 3],
[1, 5, 6],
[1, 3, 5]])

y = np.array([1, 0, 1, 1, 0])

# Take a guess at initial ฮฒs


ฮฒ = np.array([0.1, 0.1, 0.1])

# Create instance of Probit regression class


prob = ProbitRegression(y, X, ฮฒ)

# Run Newton-Raphson algorithm


newton_raphson(prob)

Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
ฮฒ_hat = [-1.54625858 0.77778952 -0.09709757]

Out[19]: array([-1.54625858, 0.77778952, -0.09709757])

In [20]: # Use statsmodels to verify results

print(Probit(y, X).fit().summary())

Optimization terminated successfully.


Current function value: 0.473746
Iterations 6
Probit Regression Results
==============================================================================
Dep. Variable: y No. Observations: 5
Model: Probit Df Residuals: 2
Method: MLE Df Model: 2
Date: Fri, 21 Jun 2019 Pseudo R-squ.: 0.2961
Time: 15:37:10 Log-Likelihood: -2.3687
converged: True LL-Null: -3.3651
LLR p-value: 0.3692
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -1.5463 1.866 -0.829 0.407 -5.204 2.111
x1 0.7778 0.788 0.986 0.324 -0.768 2.323
x2 -0.0971 0.590 -0.165 0.869 -1.254 1.060
==============================================================================
Part V

Tools and Techniques

20

Geometric Series for Elementary Economics

20.1 Contents

โ€ข Overview 20.2
โ€ข Key Formulas 20.3
โ€ข Example: The Money Multiplier in Fractional Reserve Banking 20.4
โ€ข Example: The Keynesian Multiplier 20.5
โ€ข Example: Interest Rates and Present Values 20.6
โ€ข Back to the Keynesian Multiplier 20.7

20.2 Overview

The lecture describes important ideas in economics that use the mathematics of geometric
series
Among these are

โ€ข the Keynesian multiplier


โ€ข the money multiplier that prevails in fractional reserve banking systems
โ€ข interest rates and present values of streams of payouts from assets

(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wisecrack that

    "in economics, a little knowledge of geometric series goes a long way"

Below weโ€™ll use the following imports

In [1]: import matplotlib.pyplot as plt


import numpy as np


20.3 Key Formulas

To start, let c be a real number that lies strictly between −1 and 1

• We often write this as c ∈ (−1, 1)
• Here (−1, 1) denotes the collection of all real numbers that are strictly less than 1 and
  strictly greater than −1
• The symbol ∈ means in or belongs to the set after the symbol

We want to evaluate geometric series of two types โ€“ infinite and finite

20.3.1 Infinite Geometric Series

The first type of geometric series that interests us is the infinite series

1 + c + c² + c³ + ⋯

Where ⋯ means that the series continues without limit


The key formula is

1 + c + c² + c³ + ⋯ = 1/(1 − c)    (1)
To prove key formula Eq. (1), multiply both sides by (1 − c) and verify that if c ∈ (−1, 1),
then the outcome is the equation 1 = 1

20.3.2 Finite Geometric Series

The second series that interests us is the finite geometric series

1 + ๐‘ + ๐‘2 + ๐‘3 + โ‹ฏ + ๐‘๐‘‡

where ๐‘‡ is a positive integer


The key formula here is

1 โˆ’ ๐‘๐‘‡ +1
1 + ๐‘ + ๐‘2 + ๐‘3 + โ‹ฏ + ๐‘๐‘‡ =
1โˆ’๐‘
Remark: The above formula works for any value of the scalar c. We don't have to restrict c
to be in the set (−1, 1)
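Both key formulas are easy to verify numerically; a quick sketch (the particular values c = 0.9 and T = 50 are arbitrary choices of ours)

```python
c, T = 0.9, 50

# Finite sum: 1 + c + ... + c**T  versus  (1 - c**(T + 1)) / (1 - c)
finite_sum = sum(c**k for k in range(T + 1))
finite_formula = (1 - c**(T + 1)) / (1 - c)
print(finite_sum, finite_formula)     # the two agree

# Infinite sum: partial sums approach 1 / (1 - c) when |c| < 1
partial = sum(c**k for k in range(10_000))
print(partial, 1 / (1 - c))           # both ≈ 10
```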
We now move on to describe some famous economic applications of geometric series

20.4 Example: The Money Multiplier in Fractional Reserve


Banking

In a fractional reserve banking system, banks hold only a fraction ๐‘Ÿ โˆˆ (0, 1) of cash behind
each deposit receipt that they issue

โ€ข In recent times

โ€“ cash consists of pieces of paper issued by the government and called dollars or
pounds or โ€ฆ
โ€“ a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash

โ€ข When the UK and France and the US were on either a gold or silver standard (before
1914, for example)

โ€“ cash was a gold or silver coin


โ€“ a deposit receipt was a bank note that the bank promised to convert into gold or
silver on demand; (sometimes it was also a checking or savings account balance)

Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits
In a fractional reserve banking system (one in which the reserve ratio r satisfies
0 < r < 1), banks create money by issuing deposits backed by fractional reserves plus loans
that they make to their customers
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system
The geometric series formula Eq. (1) is at the heart of the classic model of the money cre-
ation process โ€“ one that leads us to the celebrated money multiplier

20.4.1 A Simple Model

There is a set of banks named ๐‘– = 0, 1, 2, โ€ฆ


Bank ๐‘–โ€™s loans ๐ฟ๐‘– , deposits ๐ท๐‘– , and reserves ๐‘…๐‘– must satisfy the balance sheet equation (be-
cause balance sheets balance):

๐ฟ๐‘– + ๐‘…๐‘– = ๐ท๐‘–

The left side of the above equation is the sum of the bankโ€™s assets, namely, the loans ๐ฟ๐‘– it
has outstanding plus its reserves of cash ๐‘…๐‘–
The right side records bank ๐‘–โ€™s liabilities, namely, the deposits ๐ท๐‘– held by its depositors; these
are IOUโ€™s from the bank to its depositors in the form of either checking accounts or savings
accounts (or before 1914, bank notes issued by a bank stating promises to redeem note for
gold or silver on demand)
Each bank i sets its reserves to satisfy the equation

Rᵢ = r Dᵢ    (2)

where ๐‘Ÿ โˆˆ (0, 1) is its reserve-deposit ratio or reserve ratio for short

โ€ข the reserve ratio is either set by a government or chosen by banks for precautionary rea-
sons

Next we add a theory stating that bank ๐‘– + 1โ€™s deposits depend entirely on loans made by
bank ๐‘–, namely

๐ท๐‘–+1 = ๐ฟ๐‘– (3)

Thus, we can think of the banks as being arranged along a line with loans from bank ๐‘– being
immediately deposited in ๐‘– + 1

โ€ข in this way, the debtors to bank ๐‘– become creditors of bank ๐‘– + 1

Finally, we add an initial condition about an exogenous level of bank 0โ€™s deposits

๐ท0 is given exogenously

We can think of ๐ท0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number ๐‘– = 0
Now we do a little algebra
Combining equations Eq. (2) and Eq. (3) tells us that

๐ฟ๐‘– = (1 โˆ’ ๐‘Ÿ)๐ท๐‘– (4)

This states that bank ๐‘– loans a fraction (1 โˆ’ ๐‘Ÿ) of its deposits and keeps a fraction ๐‘Ÿ as cash
reserves
Combining equation Eq. (4) with equation Eq. (3) tells us that

๐ท๐‘–+1 = (1 โˆ’ ๐‘Ÿ)๐ท๐‘– for ๐‘– โ‰ฅ 0

which implies that

๐ท๐‘– = (1 โˆ’ ๐‘Ÿ)๐‘– ๐ท0 for ๐‘– โ‰ฅ 0 (5)

Equation Eq. (5) expresses Dᵢ as the i-th term in the product of D₀ and the geometric series

1, (1 − r), (1 − r)², ⋯

Therefore, the sum of all deposits in our banking system i = 0, 1, 2, … is

∑ᵢ₌₀^∞ (1 − r)ⁱ D₀ = D₀ / (1 − (1 − r)) = D₀ / r    (6)

20.4.2 Money Multiplier

The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system
Equation Eq. (6) asserts that the money multiplier is 1/r

• an initial deposit of cash of D₀ in bank 0 leads the banking system to create total
  deposits of D₀/r
• The initial deposit D₀ is held as reserves, distributed throughout the banking system
  according to D₀ = ∑ᵢ₌₀^∞ Rᵢ
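The deposit pattern in Eq. (5) and the multiplier in Eq. (6) can be simulated directly; a minimal sketch, with an assumed reserve ratio r = 0.1 and initial deposit D₀ = 100

```python
r, D0 = 0.1, 100.0

# Deposits along the chain of banks: D_i = (1 - r)**i * D0
deposits = [(1 - r)**i * D0 for i in range(500)]

total_deposits = sum(deposits)
print(total_deposits, D0 / r)             # both ≈ 1000: the multiplier is 1/r = 10

# Reserves R_i = r * D_i sum back to the initial cash injection
total_reserves = sum(r * d for d in deposits)
print(total_reserves)                     # ≈ 100
```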

20.5 Example: The Keynesian Multiplier

The famous economist John Maynard Keynes and his followers created a simple model in-
tended to determine national income ๐‘ฆ in circumstances in which

โ€ข there are substantial unemployed resources, in particular excess supply of labor and
capital
โ€ข prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
โ€ข national income is entirely determined by aggregate demand

20.5.1 Static Version

An elementary Keynesian model of national income determination consists of three equations
that describe aggregate demand for y and its components
The first equation is a national income identity asserting that consumption ๐‘ plus investment
๐‘– equals national income ๐‘ฆ:

๐‘+๐‘– = ๐‘ฆ

The second equation is a Keynesian consumption function asserting that people consume a
fraction ๐‘ โˆˆ (0, 1) of their income:

c = b y

The fraction ๐‘ โˆˆ (0, 1) is called the marginal propensity to consume


The fraction 1 โˆ’ ๐‘ โˆˆ (0, 1) is called the marginal propensity to save
The third equation simply states that investment is exogenous at level ๐‘–

โ€ข exogenous means determined outside this model

Substituting the second equation into the first gives (1 − b) y = i

Solving this equation for y gives

y = (1 / (1 − b)) i

The quantity 1/(1 − b) is called the investment multiplier or simply the multiplier
Applying the formula for the sum of an infinite geometric series, we can write the above equa-
tion as

y = i ∑ₜ₌₀^∞ bᵗ

where ๐‘ก is a nonnegative integer


So we arrive at the following equivalent expressions for the multiplier:

โˆž
1
= โˆ‘ ๐‘๐‘ก
1โˆ’๐‘ ๐‘ก=0

The expression ∑ₜ₌₀^∞ bᵗ motivates an interpretation of the multiplier as the outcome of a
dynamic process that we describe next

20.5.2 Dynamic Version

We arrive at a dynamic version by interpreting the nonnegative integer ๐‘ก as indexing time and
changing our specification of the consumption function to take time into account

โ€ข we add a one-period lag in how income affects consumption

We let ๐‘๐‘ก be consumption at time ๐‘ก and ๐‘–๐‘ก be investment at time ๐‘ก


We modify our consumption function to assume the form

๐‘๐‘ก = ๐‘๐‘ฆ๐‘กโˆ’1

so that ๐‘ is the marginal propensity to consume (now) out of last periodโ€™s income
We begin wtih an initial condition stating that

๐‘ฆโˆ’1 = 0

We also assume that

iₜ = i    for all t ≥ 0

so that investment is constant over time


It follows that

y₀ = i + c₀ = i + b y₋₁ = i

and

y₁ = c₁ + i = b y₀ + i = (1 + b) i

and

y₂ = c₂ + i = b y₁ + i = (1 + b + b²) i

and more generally

yₜ = b yₜ₋₁ + i = (1 + b + b² + ⋯ + bᵗ) i

or

yₜ = ((1 − bᵗ⁺¹) / (1 − b)) i

Evidently, as t → +∞,

yₜ → (1 / (1 − b)) i

Remark 1: The above formula is often applied to assert that an exogenous increase in
investment of Δi at time 0 ignites a dynamic process of increases in national income by
amounts

Δi, (1 + b)Δi, (1 + b + b²)Δi, ⋯

at times 0, 1, 2, …
Remark 2: Let gₜ be an exogenous sequence of government expenditures
If we generalize the model so that the national income identity becomes

cₜ + iₜ + gₜ = yₜ

then a version of the preceding argument shows that the government expenditures multiplier
is also 1/(1 − b), so that a permanent increase in government expenditures ultimately leads
to an increase in national income equal to the multiplier times the increase in government
expenditures
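The dynamic process is a one-line recursion, and simulating it shows yₜ climbing toward the multiplier formula; a minimal sketch, with assumed values b = 0.6 and i = 1

```python
b, i = 0.6, 1.0          # marginal propensity to consume, constant investment

y = 0.0                  # initial condition y_{-1} = 0
path = []
for t in range(60):
    y = b * y + i        # y_t = b * y_{t-1} + i
    path.append(y)

print(path[0], path[1])           # i and (1 + b) i
print(path[-1], i / (1 - b))      # the path converges to i / (1 - b) = 2.5
```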

20.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of
streams of dollar payments that extend over time
We work in discrete time and assume that ๐‘ก = 0, 1, 2, โ€ฆ indexes time
We let ๐‘Ÿ โˆˆ (0, 1) be a one-period net nominal interest rate

โ€ข if the nominal interest rate is 5 percent, then ๐‘Ÿ = .05

A one-period gross nominal interest rate R is defined as

R = 1 + r ∈ (1, 2)

• if r = .05, then R = 1.05



Remark: The gross nominal interest rate R is an exchange rate or relative price of dollars
between times t and t + 1. The units of R are dollars at time t + 1 per dollar at time t
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now
The price at which these exchanges occur is the gross nominal interest rate

• If I sell x dollars to you today, you pay me Rx dollars tomorrow
• This means that you borrowed x dollars from me at a gross interest rate R and a net
  interest rate r

We assume that the net nominal interest rate ๐‘Ÿ is fixed over time, so that ๐‘… is the gross nom-
inal interest rate at times ๐‘ก = 0, 1, 2, โ€ฆ
Two important geometric sequences are

1, R, R², ⋯    (7)

and

1, R⁻¹, R⁻², ⋯    (8)

Sequence Eq. (7) tells us how dollar values of an investment accumulate through time
Sequence Eq. (8) tells us how to discount future dollars to get their values in terms of to-
dayโ€™s dollars

20.6.1 Accumulation

Geometric sequence Eq. (7) tells us how one dollar invested and re-invested in a project with
gross one period nominal rate of return accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest r dollars after one period, so we have
  r + 1 = R dollars at time 1
• at time 1 we reinvest 1 + r = R dollars and receive interest of rR dollars at time 2 plus
  the principal R dollars, so we receive rR + R = (1 + r)R = R² dollars at the end of
  period 2
• and so on

Evidently, if we invest x dollars at time 0 and reinvest the proceeds, then the sequence

x, xR, xR², ⋯

tells how our account accumulates at dates ๐‘ก = 0, 1, 2, โ€ฆ



20.6.2 Discounting

Geometric sequence Eq. (8) tells us how much future dollars are worth in terms of todayโ€™s
dollars
Remember that the units of ๐‘… are dollars at ๐‘ก + 1 per dollar at ๐‘ก
It follows that

โ€ข the units of ๐‘…โˆ’1 are dollars at ๐‘ก per dollar at ๐‘ก + 1


โ€ข the units of ๐‘…โˆ’2 are dollars at ๐‘ก per dollar at ๐‘ก + 2
โ€ข and so on; the units of ๐‘…โˆ’๐‘— are dollars at ๐‘ก per dollar at ๐‘ก + ๐‘—

So if someone has a claim on ๐‘ฅ dollars at time ๐‘ก + ๐‘—, it is worth ๐‘ฅ๐‘…โˆ’๐‘— dollars at time ๐‘ก (e.g.,
today)
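Here is a minimal numerical illustration of this discounting rule (the interest rate and helper function are our own choices for the example):

```python
# Discounting: a claim on x dollars j periods ahead is worth x * R**(-j) today
r = 0.05
R = 1 + r

def present_value(x, j):
    """Value today of a claim on x dollars j periods ahead."""
    return x * R**(-j)

pv = present_value(100.0, 2)   # claim on 100 dollars two periods ahead
print(pv)
```

The claim is worth less than its face value because a dollar tomorrow buys less than a dollar today.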

20.6.3 Application to Asset Pricing

A lease requires a stream of payments of ๐‘ฅ๐‘ก dollars at times ๐‘ก = 0, 1, 2, โ€ฆ where

๐‘ฅ๐‘ก = ๐บ๐‘ก ๐‘ฅ0

where ๐บ = (1 + ๐‘”) and ๐‘” โˆˆ (0, 1)


Thus, lease payments increase at rate ๐‘” per period
For a reason soon to be revealed, we assume that ๐บ < ๐‘…
The present value of the lease is

๐‘0 = ๐‘ฅ0 + ๐‘ฅ1 /๐‘… + ๐‘ฅ2 /(๐‘…2 )+ โ‹ฑ
= ๐‘ฅ0 (1 + ๐บ๐‘…โˆ’1 + ๐บ2 ๐‘…โˆ’2 + โ‹ฏ)
1
= ๐‘ฅ0
1 โˆ’ ๐บ๐‘…โˆ’1

where the last line uses the formula for an infinite geometric series
Recall that ๐‘… = 1 + ๐‘Ÿ and ๐บ = 1 + ๐‘” and that ๐‘… > ๐บ and ๐‘Ÿ > ๐‘” and that ๐‘Ÿ and๐‘” are typically
small numbers, e.g., .05 or .03
Use the Taylor series of 1/(1 + ๐‘Ÿ) about ๐‘Ÿ = 0, namely,

1/(1 + ๐‘Ÿ) = 1 โˆ’ ๐‘Ÿ + ๐‘Ÿ2 โˆ’ ๐‘Ÿ3 + โ‹ฏ

and the fact that ๐‘Ÿ is small to approximate 1/(1 + ๐‘Ÿ) โ‰ˆ 1 โˆ’ ๐‘Ÿ
Use this approximation to write ๐‘0 as

๐‘0 = ๐‘ฅ0 /(1 โˆ’ ๐บ๐‘…โˆ’1 )
   = ๐‘ฅ0 /(1 โˆ’ (1 + ๐‘”)(1 โˆ’ ๐‘Ÿ))
   = ๐‘ฅ0 /(1 โˆ’ (1 + ๐‘” โˆ’ ๐‘Ÿ โˆ’ ๐‘Ÿ๐‘”))
   โ‰ˆ ๐‘ฅ0 /(๐‘Ÿ โˆ’ ๐‘”)

where the last step uses the approximation ๐‘Ÿ๐‘” โ‰ˆ 0


The approximation

๐‘0 = ๐‘ฅ0 /(๐‘Ÿ โˆ’ ๐‘”)
is known as the Gordon formula for the present value or current price of an infinite pay-
ment stream ๐‘ฅ0 ๐บ๐‘ก when the nominal one-period interest rate is ๐‘Ÿ and when ๐‘Ÿ > ๐‘”
We can also extend the asset pricing formula so that it applies to finite leases
Let the payment stream on the lease now be ๐‘ฅ๐‘ก for ๐‘ก = 0, 1, โ€ฆ , ๐‘‡ , where again

๐‘ฅ๐‘ก = ๐บ๐‘ก ๐‘ฅ0

The present value of this lease is:

๐‘0 = ๐‘ฅ0 + ๐‘ฅ1 /๐‘… + โ‹ฏ + ๐‘ฅ๐‘‡ /๐‘…๐‘‡
= ๐‘ฅ0 (1 + ๐บ๐‘…โˆ’1 + โ‹ฏ + ๐บ๐‘‡ ๐‘…โˆ’๐‘‡ )
๐‘ฅ0 (1 โˆ’ ๐บ๐‘‡ +1 ๐‘…โˆ’(๐‘‡ +1) )
=
1 โˆ’ ๐บ๐‘…โˆ’1

Applying the Taylor series to ๐‘…โˆ’(๐‘‡ +1) about ๐‘Ÿ = 0 we get:

1/(1 + ๐‘Ÿ)๐‘‡+1 = 1 โˆ’ ๐‘Ÿ(๐‘‡ + 1) + (1/2) ๐‘Ÿ2 (๐‘‡ + 1)(๐‘‡ + 2) + โ‹ฏ โ‰ˆ 1 โˆ’ ๐‘Ÿ(๐‘‡ + 1)

Similarly, applying the Taylor series to ๐บ๐‘‡ +1 about ๐‘” = 0:

(1 + ๐‘”)๐‘‡ +1 = 1 + (๐‘‡ + 1)๐‘”(1 + ๐‘”)๐‘‡ + (๐‘‡ + 1)๐‘‡ ๐‘”2 (1 + ๐‘”)๐‘‡ โˆ’1 + โ‹ฏ โ‰ˆ 1 + (๐‘‡ + 1)๐‘”

Thus, we get the following approximation:

๐‘0 = ๐‘ฅ0 (1 โˆ’ (1 + (๐‘‡ + 1)๐‘”)(1 โˆ’ ๐‘Ÿ(๐‘‡ + 1))) / (1 โˆ’ (1 โˆ’ ๐‘Ÿ)(1 + ๐‘”))

Expanding:

๐‘0 = ๐‘ฅ0 (1 โˆ’ 1 + (๐‘‡ + 1)2 ๐‘Ÿ๐‘” + ๐‘Ÿ(๐‘‡ + 1) โˆ’ ๐‘”(๐‘‡ + 1)) / (1 โˆ’ 1 + ๐‘Ÿ โˆ’ ๐‘” + ๐‘Ÿ๐‘”)
   = ๐‘ฅ0 (๐‘‡ + 1)((๐‘‡ + 1)๐‘Ÿ๐‘” + ๐‘Ÿ โˆ’ ๐‘”) / (๐‘Ÿ โˆ’ ๐‘” + ๐‘Ÿ๐‘”)
   โ‰ˆ ๐‘ฅ0 (๐‘‡ + 1)(๐‘Ÿ โˆ’ ๐‘”)/(๐‘Ÿ โˆ’ ๐‘”) + ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”)
   = ๐‘ฅ0 (๐‘‡ + 1) + ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”)

We could have also approximated by removing the second term ๐‘ฅ0 ๐‘Ÿ๐‘”(๐‘‡ + 1)/(๐‘Ÿ โˆ’ ๐‘”) when ๐‘‡ is relatively small compared to 1/(๐‘Ÿ๐‘”), which leaves ๐‘ฅ0 (๐‘‡ + 1) as the second finite-stream approximation
We will plot the true finite-stream present value and the two approximations, under different values of ๐‘‡ , ๐‘” and ๐‘Ÿ, in Python
First we plot the true finite-stream present value after computing it below

In [2]: # True present value of a finite lease

def finite_lease_pv(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))

# First approximation for our finite lease
def finite_lease_pv_approx_f(T, g, r, x_0):
    p = x_0 * (T + 1) + x_0 * r * g * (T + 1) / (r - g)
    return p

# Second approximation for our finite lease
def finite_lease_pv_approx_s(T, g, r, x_0):
    return x_0 * (T + 1)

# Infinite lease
def infinite_lease(g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return x_0 / (1 - G * R**(-1))

Now that we have defined our functions, we can plot some outcomes
First we study the quality of our approximations

In [3]: g = 0.02
r = 0.03
x_0 = 1
T_max = 50
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = finite_lease_pv_approx_f(T, g, r, x_0)
y_3 = finite_lease_pv_approx_s(T, g, r, x_0)
ax.plot(T, y_1, label='True T-period Lease PV')
ax.plot(T, y_2, label='T-period Lease First-order Approx.')
ax.plot(T, y_3, label='T-period Lease First-order Approx. adj.')
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()

Evidently our approximations perform well for small values of ๐‘‡
However, holding ๐‘” and ๐‘Ÿ fixed, our approximations deteriorate as ๐‘‡ increases
Next we compare the infinite and finite duration lease present values over different lease
lengths ๐‘‡

In [4]: # Convergence of infinite and finite


T_max = 1000
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Infinite and Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = np.ones(T_max+1)*infinite_lease(g, r, x_0)
ax.plot(T, y_1, label='T-period lease PV')
ax.plot(T, y_2, '--', label='Infinite lease PV')
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
ax.legend()
plt.show()

The above graph shows how, as duration ๐‘‡ โ†’ +โˆž, the value of a lease of duration ๐‘‡ approaches the value of a perpetual lease
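We can confirm this convergence numerically; the following self-contained sketch (which re-implements the two formulas above so it can run on its own, with parameter values of our own) measures the gap between the finite and perpetual lease values at a large ๐‘‡:

```python
# Convergence check: the finite-lease PV approaches the perpetual-lease PV
# as T grows, provided g < r
g, r, x_0 = 0.02, 0.03, 1.0
G, R = 1 + g, 1 + r

def finite_pv(T):
    # x_0 (1 - G^{T+1} R^{-(T+1)}) / (1 - G R^{-1})
    return x_0 * (1 - G**(T + 1) * R**(-T - 1)) / (1 - G / R)

infinite_pv = x_0 / (1 - G / R)
gap = infinite_pv - finite_pv(1000)   # remaining gap at T = 1000
print(gap)
```

The gap is positive (a finite lease is worth less than a perpetual one) and shrinks geometrically in ๐‘‡.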
Now we consider two different views of what happens as ๐‘Ÿ and ๐‘” covary

In [5]: # First view


# Changing r and g
fig, ax = plt.subplots()
ax.set_title('Value of lease of length $T$')
ax.set_ylabel('Present Value, $p_0$')
ax.set_xlabel('$T$ periods ahead')
T_max = 10
T=np.arange(0, T_max+1)
# r >> g, much bigger than g
r = 0.9
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r\gg g$')
# r > g
r = 0.5
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r>g$', color='green')

# r ~ g, not defined when r = g, but approximately goes to straight line with slope 1
r = 0.4001
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r \approx g$', color='orange')

# r < g
r = 0.4
g = 0.5
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r<g$', color='red')
ax.legend()
plt.show()

The above graph gives a big hint for why the condition ๐‘Ÿ > ๐‘” is necessary if a lease of length ๐‘‡ = +โˆž is to have finite value
For fans of 3-d graphs the same point comes through in the following graph
If you arenโ€™t enamored of 3-d graphs, feel free to skip the next visualization!

In [6]: # Second view


from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
T = 3
ax = fig.gca(projection='3d')
r = np.arange(0.01, 0.99, 0.005)
g = np.arange(0.01, 0.99, 0.005)

rr, gg = np.meshgrid(r, g)
z = finite_lease_pv(T, gg, rr, x_0)

# Removes points where undefined


same = (rr == gg)
z[same] = np.nan
surf = ax.plot_surface(rr, gg, z, cmap=cm.coolwarm, antialiased=True, clim=(0, 15))
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.set_xlabel('$r$')
ax.set_ylabel('$g$')
ax.set_zlabel('Present Value, $p_0$')
ax.view_init(20, 10)
ax.set_title('Three Period Lease PV with Varying $g$ and $r$')
plt.show()


We can use a little calculus to study how the present value ๐‘0 of a lease varies with ๐‘Ÿ and ๐‘”
We will use a library called SymPy
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease
After that, weโ€™ll use SymPy to compute derivatives

In [7]: import sympy as sym


from sympy import init_printing

# Creates algebraic symbols that can be used in an algebraic expression


g, r, x0 = sym.symbols('g, r, x0')
G = (1 + g)
R = (1 + r)
p0 = x0 / (1 - G * R**(-1))
init_printing()
print('Our formula is:')
p0

Our formula is:

Out[7]:

๐‘ฅ0 / (โˆ’(๐‘” + 1)/(๐‘Ÿ + 1) + 1)

In [8]: print('dp0 / dg is:')


dp_dg = sym.diff(p0, g)
dp_dg

dp0 / dg is:

Out[8]:

๐‘ฅ0 / ((๐‘Ÿ + 1) (โˆ’(๐‘” + 1)/(๐‘Ÿ + 1) + 1)2 )

In [9]: print('dp0 / dr is:')


dp_dr = sym.diff(p0, r)
dp_dr

dp0 / dr is:

Out[9]:

๐‘ฅ0 (โˆ’๐‘” โˆ’ 1)
2 2
(๐‘Ÿ + 1) (โˆ’ ๐‘”+1
๐‘Ÿ+1 + 1)

We can see that ๐œ•๐‘0 /๐œ•๐‘Ÿ < 0 as long as ๐‘Ÿ > ๐‘”, ๐‘Ÿ > 0 and ๐‘” > 0 and ๐‘ฅ0 is positive, so this derivative will always be negative
Similarly, ๐œ•๐‘0 /๐œ•๐‘” > 0 as long as ๐‘Ÿ > ๐‘”, ๐‘Ÿ > 0 and ๐‘” > 0 and ๐‘ฅ0 is positive, so this derivative will always be positive
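These signs can also be confirmed numerically, without SymPy, by applying central finite differences to the closed-form present value (a sketch with parameter values of our own choosing):

```python
# Finite-difference check of the signs of dp0/dr and dp0/dg
def p0(g, r, x0=1.0):
    # Closed-form PV of the infinite lease: x0 / (1 - G R^{-1})
    return x0 / (1 - (1 + g) / (1 + r))

g, r, eps = 0.02, 0.05, 1e-6

dp_dr = (p0(g, r + eps) - p0(g, r - eps)) / (2 * eps)   # should be negative
dp_dg = (p0(g + eps, r) - p0(g - eps, r)) / (2 * eps)   # should be positive
print(dp_dr, dp_dg)
```

Raising the interest rate lowers the present value; raising the growth rate of payments raises it.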

20.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of ๐‘ฆ๐‘ก ,
given that consumption is a constant fraction of national income, and investment is fixed

In [10]: # Function that calculates a path of y


def calculate_y(i, b, g, T, y_init):
y = np.zeros(T+1)
y[0] = i + b * y_init + g
for t in range(1, T+1):
y[t] = b * y[t-1] + i + g
return y

# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100

fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()

In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed

In [11]: # Changing fraction of consumption


b_0 = 1/3
b_1 = 2/3
b_2 = 5/6
b_3 = 0.9

fig,ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in (b_0, b_1, b_2, b_3):
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()

Increasing the marginal propensity to consume ๐‘ increases the path of output over time

In [12]: x = np.arange(0, T+1)


y_0 = calculate_y(i_0, b, g_0, T, y_init)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(6, 10))
fig.subplots_adjust(hspace=0.3)

# Changing initial investment:


i_1 = 0.4
y_1 = calculate_y(i_1, b, g_0, T, y_init)
ax1.set_title('An Increase in Investment on Output')
ax1.plot(x, y_0, label=r'$i=0.3$', linestyle='--')
ax1.plot(x, y_1, label=r'$i=0.4$')
ax1.legend()
ax1.set_ylabel('$y_t$')
ax1.set_xlabel('$t$')

# Changing government spending


g_1 = 0.4
y_1 = calculate_y(i_0, b, g_1, T, y_init)
ax2.set_title('An Increase in Government Spending on Output')
ax2.plot(x, y_0, label=r'$g=0.3$', linestyle='--')
ax2.plot(x, y_1, label=r'$g=0.4$')
ax2.legend()
ax2.set_ylabel('$y_t$')
ax2.set_xlabel('$t$')
plt.show()

Notice that whether government spending increases from 0.3 to 0.4 or investment increases from 0.3 to 0.4, the shifts in the graphs are identical
21

Linear Algebra

21.1 Contents

โ€ข Overview 21.2

โ€ข Vectors 21.3

โ€ข Matrices 21.4

โ€ข Solving Systems of Equations 21.5

โ€ข Eigenvalues and Eigenvectors 21.6

โ€ข Further Topics 21.7

โ€ข Exercises 21.8

โ€ข Solutions 21.9

21.2 Overview

Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in
For example, many applied problems in economics and finance require the solution of a linear
system of equations, such as

๐‘ฆ1 = ๐‘Ž๐‘ฅ1 + ๐‘๐‘ฅ2
๐‘ฆ2 = ๐‘๐‘ฅ1 + ๐‘‘๐‘ฅ2

or, more generally,

๐‘ฆ1 = ๐‘Ž11 ๐‘ฅ1 + ๐‘Ž12 ๐‘ฅ2 + โ‹ฏ + ๐‘Ž1๐‘˜ ๐‘ฅ๐‘˜


โ‹ฎ (1)
๐‘ฆ๐‘› = ๐‘Ž๐‘›1 ๐‘ฅ1 + ๐‘Ž๐‘›2 ๐‘ฅ2 + โ‹ฏ + ๐‘Ž๐‘›๐‘˜ ๐‘ฅ๐‘˜

The objective here is to solve for the โ€œunknownsโ€ ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ given ๐‘Ž11 , โ€ฆ , ๐‘Ž๐‘›๐‘˜ and ๐‘ฆ1 , โ€ฆ , ๐‘ฆ๐‘›


When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions

โ€ข Does a solution actually exist?


โ€ข Are there in fact many solutions, and if so how should we interpret them?
โ€ข If no solution exists, is there a best โ€œapproximateโ€ solution?
โ€ข If a solution exists, how should we compute it?

These are the kinds of topics addressed by linear algebra


In this lecture we will cover the basics of linear and matrix algebra, treating both theory and
computation
We admit some overlap with this lecture, where operations on NumPy arrays were first ex-
plained
Note that this lecture is more theoretical than most, and contains background material that
will be used in applications as we go along

21.3 Vectors

A vector of length ๐‘› is just a sequence (or array, or tuple) of ๐‘› numbers, which we write as
๐‘ฅ = (๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› ) or ๐‘ฅ = [๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distin-
guish between the two)
The set of all ๐‘›-vectors is denoted by R๐‘›
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner

In [1]: import matplotlib.pyplot as plt


%matplotlib inline

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


ax.grid()
vecs = ((2, 4), (-3, 3), (-4, -3.5))
for v in vecs:
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=0.7,
width=0.5))
ax.text(1.1 * v[0], 1.1 * v[1], str(v))
plt.show()

21.3.1 Vector Operations

The two most common operators for vectors are addition and scalar multiplication, which we
now describe
As a matter of definition, when we add two vectors, we add them element-by-element

๐‘ฅ1 ๐‘ฆ1 ๐‘ฅ1 + ๐‘ฆ1
โŽก๐‘ฅ โŽค โŽก๐‘ฆ โŽค โŽก๐‘ฅ + ๐‘ฆ โŽค
๐‘ฅ + ๐‘ฆ = โŽข 2 โŽฅ + โŽข 2 โŽฅ โˆถ= โŽข 2 2โŽฅ
โŽข โ‹ฎ โŽฅ โŽข โ‹ฎ โŽฅ โŽข โ‹ฎ โŽฅ
๐‘ฅ
โŽฃ ๐‘›โŽฆ โŽฃ ๐‘›โŽฆ๐‘ฆ ๐‘ฅ
โŽฃ ๐‘› + ๐‘ฆ ๐‘›โŽฆ

Scalar multiplication is an operation that takes a number ๐›พ and a vector ๐‘ฅ and produces

๐›พ๐‘ฅ1
โŽก ๐›พ๐‘ฅ โŽค
๐›พ๐‘ฅ โˆถ= โŽข 2 โŽฅ
โŽข โ‹ฎ โŽฅ
โŽฃ๐›พ๐‘ฅ๐‘› โŽฆ

Scalar multiplication is illustrated in the next figure

In [2]: import numpy as np

fig, ax = plt.subplots(figsize=(10, 8))


# Set the axes through the origin

for spine in ['left', 'bottom']:


ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(xlim=(-5, 5), ylim=(-5, 5))


x = (2, 2)
ax.annotate('', xy=x, xytext=(0, 0),
arrowprops=dict(facecolor='blue',
shrink=0,
alpha=1,
width=0.5))
ax.text(x[0] + 0.4, x[1] - 0.2, '$x$', fontsize='16')

scalars = (-2, 2)
x = np.array(x)

for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()

In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is
more commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural
syntax

In [3]: x = np.ones(3) # Vector of three ones


y = np.array((2, 4, 6)) # Converts tuple (2, 4, 6) into array
x + y

Out[3]: array([3., 5., 7.])

In [4]: 4 * x

Out[4]: array([4., 4., 4.])

21.3.2 Inner Product and Norm

The inner product of vectors ๐‘ฅ, ๐‘ฆ โˆˆ R๐‘› is defined as

๐‘›
๐‘ฅโ€ฒ ๐‘ฆ โˆถ= โˆ‘ ๐‘ฅ๐‘– ๐‘ฆ๐‘–
๐‘–=1

Two vectors are called orthogonal if their inner product is zero


The norm of a vector ๐‘ฅ represents its โ€œlengthโ€ (i.e., its distance from the zero vector) and is
defined as

                  ๐‘›        1/2
โ€–๐‘ฅโ€– โˆถ= โˆš๐‘ฅโ€ฒ ๐‘ฅ โˆถ= ( โˆ‘ ๐‘ฅ2๐‘– )
                 ๐‘–=1

The expression โ€–๐‘ฅ โˆ’ ๐‘ฆโ€– is thought of as the distance between ๐‘ฅ and ๐‘ฆ


Continuing on from the previous example, the inner product and norm can be computed as
follows

In [5]: np.sum(x * y) # Inner product of x and y

Out[5]: 12.0

In [6]: np.sqrt(np.sum(x**2)) # Norm of x, take one

Out[6]: 1.7320508075688772

In [7]: np.linalg.norm(x) # Norm of x, take two

Out[7]: 1.7320508075688772

21.3.3 Span

Given a set of vectors ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } in R๐‘› , itโ€™s natural to think about the new vectors we
can create by performing linear operations
New vectors created in this manner are called linear combinations of ๐ด
In particular, ๐‘ฆ โˆˆ R๐‘› is a linear combination of ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } if

๐‘ฆ = ๐›ฝ1 ๐‘Ž1 + โ‹ฏ + ๐›ฝ๐‘˜ ๐‘Ž๐‘˜ for some scalars ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜



In this context, the values ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜ are called the coefficients of the linear combination
The set of linear combinations of ๐ด is called the span of ๐ด
The next figure shows the span of ๐ด = {๐‘Ž1 , ๐‘Ž2 } in R3
The span is a two-dimensional plane passing through these two points and the origin

In [8]: from matplotlib import cm


from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import interp2d

fig = plt.figure(figsize=(10, 8))


ax = fig.gca(projection='3d')

x_min, x_max = -5, 5


y_min, y_max = -5, 5

ฮฑ, ฮฒ = 0.2, 0.1

ax.set(xlim=(x_min, x_max), ylim=(x_min, x_max), zlim=(x_min, x_max),


xticks=(0,), yticks=(0,), zticks=(0,))

gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)

# Fixed linear function, to generate a plane


def f(x, y):
return ฮฑ * x + ฮฒ * y

# Vector locations, by coordinate


x_coords = np.array((3, 3))
y_coords = np.array((4, -4))
z = f(x_coords, y_coords)
for i in (0, 1):
ax.text(x_coords[i], y_coords[i], z[i], f'$a_{i+1}$', fontsize=14)

# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)

# Draw the plane


grid_size = 20
xr2 = np.linspace(x_min, x_max, grid_size)
yr2 = np.linspace(y_min, y_max, grid_size)
x2, y2 = np.meshgrid(xr2, yr2)
z2 = f(x2, y2)
ax.plot_surface(x2, y2, z2, rstride=1, cstride=1, cmap=cm.jet,
linewidth=0, antialiased=True, alpha=0.2)
plt.show()

Examples
If ๐ด contains only one vector ๐‘Ž1 โˆˆ R2 , then its span is just the scalar multiples of ๐‘Ž1 , which is
the unique line passing through both ๐‘Ž1 and the origin
If ๐ด = {๐‘’1 , ๐‘’2 , ๐‘’3 } consists of the canonical basis vectors of R3 , that is

       โŽก1โŽค         โŽก0โŽค         โŽก0โŽค
๐‘’1 โˆถ= โŽข0โŽฅ , ๐‘’2 โˆถ= โŽข1โŽฅ , ๐‘’3 โˆถ= โŽข0โŽฅ
       โŽฃ0โŽฆ         โŽฃ0โŽฆ         โŽฃ1โŽฆ

then the span of ๐ด is all of R3 , because, for any ๐‘ฅ = (๐‘ฅ1 , ๐‘ฅ2 , ๐‘ฅ3 ) โˆˆ R3 , we can write

๐‘ฅ = ๐‘ฅ 1 ๐‘’1 + ๐‘ฅ 2 ๐‘’2 + ๐‘ฅ 3 ๐‘’3

Now consider ๐ด0 = {๐‘’1 , ๐‘’2 , ๐‘’1 + ๐‘’2 }


If ๐‘ฆ = (๐‘ฆ1 , ๐‘ฆ2 , ๐‘ฆ3 ) is any linear combination of these vectors, then ๐‘ฆ3 = 0 (check it)
Hence ๐ด0 fails to span all of R3

21.3.4 Linear Independence

As weโ€™ll see, itโ€™s often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors

The condition we need for a set of vectors to have a large span is whatโ€™s called linear inde-
pendence
In particular, a collection of vectors ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } in R๐‘› is said to be

โ€ข linearly dependent if some strict subset of ๐ด has the same span as ๐ด


โ€ข linearly independent if it is not linearly dependent

Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {๐‘Ž1 , ๐‘Ž2 } in R3 as a
plane through the origin
If we take a third vector ๐‘Ž3 and form the set {๐‘Ž1 , ๐‘Ž2 , ๐‘Ž3 }, this set will be

โ€ข linearly dependent if ๐‘Ž3 lies in the plane


โ€ข linearly independent otherwise

As another illustration of the concept, since R๐‘› can be spanned by ๐‘› vectors (see the discus-
sion of canonical basis vectors above), any collection of ๐‘š > ๐‘› vectors in R๐‘› must be linearly
dependent
The following statements are equivalent to linear independence of ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } โŠ‚ R๐‘›

1. No vector in ๐ด can be formed as a linear combination of the other elements


2. If ๐›ฝ1 ๐‘Ž1 + โ‹ฏ ๐›ฝ๐‘˜ ๐‘Ž๐‘˜ = 0 for scalars ๐›ฝ1 , โ€ฆ , ๐›ฝ๐‘˜ , then ๐›ฝ1 = โ‹ฏ = ๐›ฝ๐‘˜ = 0

(The zero in the first expression is the origin of R๐‘› )
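One practical way to test for linear independence (this NumPy-based check is our own addition, not the lecture's code) is to stack the vectors as columns of a matrix and compute its rank: the columns are linearly independent exactly when the rank equals the number of columns.

```python
import numpy as np

a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([0.0, 1.0, 0.0])
a3 = a1 + a2                      # lies in the span of {a1, a2}

A_indep = np.column_stack([a1, a2])        # 3 x 2, independent columns
A_dep = np.column_stack([a1, a2, a3])      # 3 x 3, dependent columns

print(np.linalg.matrix_rank(A_indep))
print(np.linalg.matrix_rank(A_dep))
```

The second matrix has rank 2 with 3 columns, confirming that {๐‘Ž1 , ๐‘Ž2 , ๐‘Ž3 } is linearly dependent.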

21.3.5 Unique Representations

Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors
In other words, if ๐ด โˆถ= {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } โŠ‚ R๐‘› is linearly independent and

๐‘ฆ = ๐›ฝ 1 ๐‘Ž1 + โ‹ฏ ๐›ฝ ๐‘˜ ๐‘Ž๐‘˜

then no other coefficient sequence ๐›พ1 , โ€ฆ , ๐›พ๐‘˜ will produce the same vector ๐‘ฆ


Indeed, if we also have ๐‘ฆ = ๐›พ1 ๐‘Ž1 + โ‹ฏ ๐›พ๐‘˜ ๐‘Ž๐‘˜ , then

(๐›ฝ1 โˆ’ ๐›พ1 )๐‘Ž1 + โ‹ฏ + (๐›ฝ๐‘˜ โˆ’ ๐›พ๐‘˜ )๐‘Ž๐‘˜ = 0

Linear independence now implies ๐›พ๐‘– = ๐›ฝ๐‘– for all ๐‘–

21.4 Matrices

Matrices are a neat way of organizing data for use in linear operations

An ๐‘› ร— ๐‘˜ matrix is a rectangular array ๐ด of numbers with ๐‘› rows and ๐‘˜ columns:

๐‘Ž11 ๐‘Ž12 โ‹ฏ ๐‘Ž1๐‘˜


โŽก๐‘Ž ๐‘Ž22 โ‹ฏ ๐‘Ž2๐‘˜ โŽค
๐ด = โŽข 21 โŽฅ
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽฅ
โŽฃ๐‘Ž๐‘›1 ๐‘Ž๐‘›2 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ
Often, the numbers in the matrix represent coefficients in a system of linear equations, as dis-
cussed at the start of this lecture
For obvious reasons, the matrix ๐ด is also called a vector if either ๐‘› = 1 or ๐‘˜ = 1
In the former case, ๐ด is called a row vector, while in the latter it is called a column vector
If ๐‘› = ๐‘˜, then ๐ด is called square
The matrix formed by replacing ๐‘Ž๐‘–๐‘— by ๐‘Ž๐‘—๐‘– for every ๐‘– and ๐‘— is called the transpose of ๐ด and
denoted ๐ดโ€ฒ or ๐ดโŠค
If ๐ด = ๐ดโ€ฒ , then ๐ด is called symmetric
For a square matrix ๐ด, the ๐‘› elements of the form ๐‘Ž๐‘–๐‘– for ๐‘– = 1, โ€ฆ , ๐‘› are called the principal diagonal
๐ด is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then
๐ด is called the identity matrix and denoted by ๐ผ

21.4.1 Matrix Operations

Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐›พ๐‘Ž11 โ‹ฏ ๐›พ๐‘Ž1๐‘˜


๐›พ๐ด = ๐›พ โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค โˆถ= โŽก โ‹ฎ
โŽฅ โŽข โ‹ฎ โ‹ฎ โŽคโŽฅ
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐›พ๐‘Ž๐‘›1 โ‹ฏ ๐›พ๐‘Ž๐‘›๐‘˜ โŽฆ
and

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐‘11 โ‹ฏ ๐‘1๐‘˜ ๐‘Ž11 + ๐‘11 โ‹ฏ ๐‘Ž1๐‘˜ + ๐‘1๐‘˜


๐ด+๐ต = โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค+โŽก โ‹ฎ
โŽฅ โŽข โ‹ฎ โ‹ฎ โŽค โˆถ= โŽก
โŽฅ โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค
โŽฅ
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐‘๐‘›1 โ‹ฏ ๐‘๐‘›๐‘˜ โŽฆ โŽฃ๐‘Ž๐‘›1 + ๐‘๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ + ๐‘๐‘›๐‘˜ โŽฆ
In the latter case, the matrices must have the same shape in order for the definition to make
sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above and
is designed to make multiplication play well with basic linear operations
If ๐ด and ๐ต are two matrices, then their product ๐ด๐ต is formed by taking as its ๐‘–, ๐‘—-th element
the inner product of the ๐‘–-th row of ๐ด and the ๐‘—-th column of ๐ต
There are many tutorials to help you visualize this operation, such as this one, or the discus-
sion on the Wikipedia page

If ๐ด is ๐‘› ร— ๐‘˜ and ๐ต is ๐‘— ร— ๐‘š, then to multiply ๐ด and ๐ต we require ๐‘˜ = ๐‘—, and the resulting


matrix ๐ด๐ต is ๐‘› ร— ๐‘š
As perhaps the most important special case, consider multiplying ๐‘› ร— ๐‘˜ matrix ๐ด and ๐‘˜ ร— 1
column vector ๐‘ฅ
According to the preceding rule, this gives us an ๐‘› ร— 1 column vector

๐‘Ž11 โ‹ฏ ๐‘Ž1๐‘˜ ๐‘ฅ1 ๐‘Ž11 ๐‘ฅ1 + โ‹ฏ + ๐‘Ž1๐‘˜ ๐‘ฅ๐‘˜


๐ด๐‘ฅ = โŽก
โŽข โ‹ฎ โ‹ฎ โ‹ฎ โŽค โŽก โ‹ฎ โŽค โˆถ= โŽก
โŽฅโŽข โŽฅ โŽข โ‹ฎ โŽค
โŽฅ (2)
โŽฃ๐‘Ž๐‘›1 โ‹ฏ ๐‘Ž๐‘›๐‘˜ โŽฆ โŽฃ๐‘ฅ๐‘˜ โŽฆ โŽฃ๐‘Ž๐‘›1 ๐‘ฅ1 + โ‹ฏ + ๐‘Ž๐‘›๐‘˜ ๐‘ฅ๐‘˜ โŽฆ

Note
๐ด๐ต and ๐ต๐ด are not generally the same thing

Another important special case is the identity matrix


You should check that if ๐ด is ๐‘› ร— ๐‘˜ and ๐ผ is the ๐‘˜ ร— ๐‘˜ identity matrix, then ๐ด๐ผ = ๐ด
If ๐ผ is the ๐‘› ร— ๐‘› identity matrix, then ๐ผ๐ด = ๐ด

21.4.2 Matrices in NumPy

NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations [1]
You can create them manually from tuples of tuples (or lists of lists) as follows

In [9]: A = ((1, 2),


(3, 4))

type(A)

Out[9]: tuple

In [10]: A = np.array(A)

type(A)

Out[10]: numpy.ndarray

In [11]: A.shape

Out[11]: (2, 2)

The shape attribute is a tuple giving the number of rows and columns โ€” see here for more
discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) โ€” see here
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax

In [12]: A = np.identity(3)
B = np.ones((3, 3))
2 * A

Out[12]: array([[2., 0., 0.],


[0., 2., 0.],
[0., 0., 2.]])

In [13]: A + B

Out[13]: array([[2., 1., 1.],


[1., 2., 1.],
[1., 1., 2.]])

To multiply matrices we use the @ symbol


In particular, A @ B is matrix multiplication, whereas A * B is element-by-element multipli-
cation
See here for more discussion
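A small example (our own, with arbitrary numbers) makes the difference concrete:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

print(A @ B)   # matrix product: rows of A times columns of B
print(A * B)   # element-by-element product
```

Note that `A @ B` here swaps the columns of `A`, while `A * B` just zeroes out the entries where `B` is zero.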

21.4.3 Matrices as Maps

Each ๐‘› ร— ๐‘˜ matrix ๐ด can be identified with a function ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ that maps ๐‘ฅ โˆˆ R๐‘˜ into
๐‘ฆ = ๐ด๐‘ฅ โˆˆ R๐‘›
These kinds of functions have a special property: they are linear
A function ๐‘“ โˆถ R๐‘˜ โ†’ R๐‘› is called linear if, for all ๐‘ฅ, ๐‘ฆ โˆˆ R๐‘˜ and all scalars ๐›ผ, ๐›ฝ, we have

๐‘“(๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ) = ๐›ผ๐‘“(๐‘ฅ) + ๐›ฝ๐‘“(๐‘ฆ)

You can check that this holds for the function ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ + ๐‘ when ๐‘ is the zero vector and
fails when ๐‘ is nonzero
In fact, itโ€™s known that ๐‘“ is linear if and only if there exists a matrix ๐ด such that ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ
for all ๐‘ฅ

21.5 Solving Systems of Equations

Recall again the system of equations Eq. (1)


If we compare Eq. (1) and Eq. (2), we see that Eq. (1) can now be written more conveniently
as

๐‘ฆ = ๐ด๐‘ฅ (3)

The problem we face is to determine a vector ๐‘ฅ โˆˆ R๐‘˜ that solves Eq. (3), taking ๐‘ฆ and ๐ด as
given
This is a special case of a more general problem: Find an ๐‘ฅ such that ๐‘ฆ = ๐‘“(๐‘ฅ)
Given an arbitrary function ๐‘“ and a ๐‘ฆ, is there always an ๐‘ฅ such that ๐‘ฆ = ๐‘“(๐‘ฅ)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows

In [14]: def f(x):


return 0.6 * np.cos(4 * x) + 1.4

xmin, xmax = -1, 1


x = np.linspace(xmin, xmax, 160)
y = f(x)
ya, yb = np.min(y), np.max(y)

fig, axes = plt.subplots(2, 1, figsize=(10, 10))

for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')

ax.set(ylim=(-0.6, 3.2), xlim=(xmin, xmax),


yticks=(), xticks=())

ax.plot(x, y, 'k-', lw=2, label='$f$')


ax.fill_between(x, ya, yb, facecolor='blue', alpha=0.05)
ax.vlines([0], ya, yb, lw=3, color='blue', label='range of $f$')
ax.text(0.04, -0.3, '$0$', fontsize=16)

ax = axes[0]

ax.legend(loc='upper right', frameon=False)


ybar = 1.5
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.05, 0.8 * ybar, '$y$', fontsize=16)
for i, z in enumerate((-0.35, 0.35)):
ax.vlines(z, 0, f(z), linestyle='--', alpha=0.5)
ax.text(z, -0.2, f'$x_{i}$', fontsize=16)

ax = axes[1]

ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)

plt.show()

In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since ๐‘ฆ lies outside the range of ๐‘“
Can we impose conditions on ๐ด in Eq. (3) that rule out these problems?
In this context, the most important thing to recognize about the expression ๐ด๐‘ฅ is that it cor-
responds to a linear combination of the columns of ๐ด
In particular, if ๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ are the columns of ๐ด, then

๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + โ‹ฏ + ๐‘ฅ๐‘˜ ๐‘Ž๐‘˜

Hence the range of ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ is exactly the span of the columns of ๐ด


We want the range to be large so that it contains arbitrary ๐‘ฆ
As you might recall, the condition that we want for the span to be large is linear indepen-
dence
A happy fact is that linear independence of the columns of ๐ด also gives us uniqueness
Indeed, it follows from our earlier discussion that if {๐‘Ž1 , โ€ฆ , ๐‘Ž๐‘˜ } are linearly independent and
๐‘ฆ = ๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + โ‹ฏ + ๐‘ฅ๐‘˜ ๐‘Ž๐‘˜ , then no ๐‘ง โ‰  ๐‘ฅ satisfies ๐‘ฆ = ๐ด๐‘ง

21.5.1 The Square Matrix Case

Letโ€™s discuss some more details, starting with the case where ๐ด is ๐‘› ร— ๐‘›
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary ๐‘ฆ โˆˆ R๐‘› , we hope to find a unique ๐‘ฅ โˆˆ R๐‘› such that ๐‘ฆ = ๐ด๐‘ฅ
In view of the observations immediately above, if the columns of ๐ด are linearly independent,
then their span, and hence the range of ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ, is all of R๐‘›
Hence there always exists an ๐‘ฅ such that ๐‘ฆ = ๐ด๐‘ฅ
Moreover, the solution is unique
In particular, the following are equivalent

1. The columns of ๐ด are linearly independent


2. For any ๐‘ฆ โˆˆ R๐‘› , the equation ๐‘ฆ = ๐ด๐‘ฅ has a unique solution

The property of having linearly independent columns is sometimes expressed as having full
column rank
Inverse Matrices
Can we give some sort of expression for the solution?
If ๐‘ฆ and ๐ด are scalar with ๐ด โ‰  0, then the solution is ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ
A similar expression is available in the matrix case
In particular, if square matrix ๐ด has full column rank, then it possesses a multiplicative in-
verse matrix ๐ดโˆ’1 , with the property that ๐ด๐ดโˆ’1 = ๐ดโˆ’1 ๐ด = ๐ผ
As a consequence, if we pre-multiply both sides of ๐‘ฆ = ๐ด๐‘ฅ by ๐ดโˆ’1 , we get ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ
This is the solution that weโ€™re looking for
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix โ€” you can find the expression for it here
If the determinant of ๐ด is not zero, then we say that ๐ด is nonsingular
Perhaps the most important fact about determinants is that ๐ด is nonsingular if and only if ๐ด
is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted

21.5.2 More Rows than Columns

This is the ๐‘› ร— ๐‘˜ case with ๐‘› > ๐‘˜


This case is very important in many settings, not least in the setting of linear regression
(where ๐‘› is the number of observations, and ๐‘˜ is the number of explanatory variables)
Given arbitrary ๐‘ฆ โˆˆ R๐‘› , we seek an ๐‘ฅ โˆˆ R๐‘˜ such that ๐‘ฆ = ๐ด๐‘ฅ
In this setting, the existence of a solution is highly unlikely

Without much loss of generality, letโ€™s go over the intuition focusing on the case where the
columns of ๐ด are linearly independent
It follows that the span of the columns of ๐ด is a ๐‘˜-dimensional subspace of R๐‘›
This span is very โ€œunlikelyโ€ to contain arbitrary ๐‘ฆ โˆˆ R๐‘›
To see why, recall the figure above, where ๐‘˜ = 2 and ๐‘› = 3
Imagine an arbitrarily chosen ๐‘ฆ โˆˆ R3 , located somewhere in that three-dimensional space
Whatโ€™s the likelihood that ๐‘ฆ lies in the span of {๐‘Ž1 , ๐‘Ž2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero โ€œthicknessโ€
As a result, in the ๐‘› > ๐‘˜ case we usually give up on existence
However, we can still seek the best approximation, for example, an ๐‘ฅ that makes the distance
โ€–๐‘ฆ โˆ’ ๐ด๐‘ฅโ€– as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be ๐‘ฅฬ‚ = (๐ดโ€ฒ ๐ด)โˆ’1 ๐ดโ€ฒ ๐‘ฆ โ€” see for example chapter 3 of these notes

21.5.3 More Columns than Rows

This is the ๐‘› ร— ๐‘˜ case with ๐‘› < ๐‘˜, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many โ€” in other words, uniqueness
never holds
For example, consider the case where ๐‘˜ = 3 and ๐‘› = 2
Thus, the columns of ๐ด consist of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span
R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, letโ€™s say that ๐‘Ž1 = ๐›ผ๐‘Ž2 + ๐›ฝ๐‘Ž3
Then if ๐‘ฆ = ๐ด๐‘ฅ = ๐‘ฅ1 ๐‘Ž1 + ๐‘ฅ2 ๐‘Ž2 + ๐‘ฅ3 ๐‘Ž3 , we can also write

๐‘ฆ = ๐‘ฅ1 (๐›ผ๐‘Ž2 + ๐›ฝ๐‘Ž3 ) + ๐‘ฅ2 ๐‘Ž2 + ๐‘ฅ3 ๐‘Ž3 = (๐‘ฅ1 ๐›ผ + ๐‘ฅ2 )๐‘Ž2 + (๐‘ฅ1 ๐›ฝ + ๐‘ฅ3 )๐‘Ž3

In other words, uniqueness fails
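As a quick numerical illustration (with a matrix and null-space vector chosen purely for the example), the sketch below exhibits two distinct solutions to the same underdetermined system:

```python
import numpy as np

# An underdetermined system: 2 equations, 3 unknowns
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y = np.array([6.0, 15.0])

# One particular solution: the minimum-norm solution returned by lstsq
x1, *_ = np.linalg.lstsq(A, y, rcond=None)

# Adding any element of the null space of A gives another solution
null_vec = np.array([1.0, -2.0, 1.0])  # A @ null_vec = 0 for this A
x2 = x1 + null_vec

print(np.allclose(A @ x1, y), np.allclose(A @ x2, y))  # True True
```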

21.5.4 Linear Equations with SciPy

Hereโ€™s an illustration of how to solve linear equations with SciPyโ€™s linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code

In [15]: from scipy.linalg import inv, solve, det

A = ((1, 2), (3, 4))
A = np.array(A)
y = np.ones((2, 1)) # Column vector
det(A) # Check that A is nonsingular, and hence invertible

Out[15]: -2.0

In [16]: A_inv = inv(A) # Compute the inverse


A_inv

Out[16]: array([[-2. , 1. ],
[ 1.5, -0.5]])

In [17]: x = A_inv @ y # Solution


A @ x # Should equal y

Out[17]: array([[1.],
[1.]])

In [18]: solve(A, y) # Produces the same solution

Out[18]: array([[-1.],
[ 1.]])

Observe how we can solve for ๐‘ฅ = ๐ดโˆ’1 ๐‘ฆ either via inv(A) @ y or via solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred
To obtain the least-squares solution ๐‘ฅฬ‚ = (๐ดโ€ฒ ๐ด)โˆ’1 ๐ดโ€ฒ ๐‘ฆ, use scipy.linalg.lstsq(A, y)
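As a sketch of this (with illustrative random data), the normal-equations formula and scipy.linalg.lstsq should agree in a well-conditioned overdetermined problem:

```python
import numpy as np
from scipy.linalg import inv, lstsq

np.random.seed(0)
n, k = 50, 3                      # more rows than columns
A = np.random.randn(n, k)
y = np.random.randn(n)

# Normal-equations formula for the least-squares solution
x_hat = inv(A.T @ A) @ A.T @ y

# SciPy's dedicated routine, which is numerically preferable
x_ls, *_ = lstsq(A, y)

print(np.allclose(x_hat, x_ls))  # True
```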

21.6 Eigenvalues and Eigenvectors

Let ๐ด be an ๐‘› ร— ๐‘› square matrix


If ๐œ† is scalar and ๐‘ฃ is a non-zero vector in R๐‘› such that

๐ด๐‘ฃ = ๐œ†๐‘ฃ

then we say that ๐œ† is an eigenvalue of ๐ด, and ๐‘ฃ is an eigenvector


Thus, an eigenvector of ๐ด is a vector such that when the map ๐‘“(๐‘ฅ) = ๐ด๐‘ฅ is applied, ๐‘ฃ is
merely scaled
The next figure shows two eigenvectors (blue arrows) and their images under ๐ด (red arrows)
As expected, the image ๐ด๐‘ฃ of each ๐‘ฃ is just a scaled version of the original

In [19]: from scipy.linalg import eig

A = ((1, 2),
     (2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]

fig, ax = plt.subplots(figsize=(10, 8))

# Set the axes through the origin
for spine in ['left', 'bottom']:
    ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
    ax.spines[spine].set_color('none')
ax.grid(alpha=0.4)

xmin, xmax = -3, 3
ymin, ymax = -3, 3
ax.set(xlim=(xmin, xmax), ylim=(ymin, ymax))

# Plot each eigenvector
for v in evecs:
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='blue',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the image of each eigenvector
for v in evecs:
    v = A @ v
    ax.annotate('', xy=v, xytext=(0, 0),
                arrowprops=dict(facecolor='red',
                                shrink=0,
                                alpha=0.6,
                                width=0.5))

# Plot the lines they run through
x = np.linspace(xmin, xmax, 3)
for v in evecs:
    a = v[1] / v[0]
    ax.plot(x, a * x, 'b-', lw=0.4)

plt.show()

The eigenvalue equation is equivalent to (๐ด โˆ’ ๐œ†๐ผ)๐‘ฃ = 0, and this has a nonzero solution ๐‘ฃ only
when the columns of ๐ด โˆ’ ๐œ†๐ผ are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for ๐œ† such that the determinant of ๐ด โˆ’ ๐œ†๐ผ is zero
This problem can be expressed as one of solving for the roots of a polynomial in ๐œ† of degree ๐‘›
This in turn implies the existence of ๐‘› solutions in the complex plane, although some might
be repeated
Some nice facts about the eigenvalues of a square matrix ๐ด are as follows

1. The determinant of ๐ด equals the product of the eigenvalues


2. The trace of ๐ด (the sum of the elements on the principal diagonal) equals the sum of
the eigenvalues
3. If ๐ด is symmetric, then all of its eigenvalues are real
4. If ๐ด is invertible and ๐œ†1 , โ€ฆ , ๐œ†๐‘› are its eigenvalues, then the eigenvalues of ๐ดโˆ’1 are
1/๐œ†1 , โ€ฆ , 1/๐œ†๐‘›

A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows

In [20]: A = ((1, 2),


(2, 1))

A = np.array(A)
evals, evecs = eig(A)
evals

Out[20]: array([ 3.+0.j, -1.+0.j])

In [21]: evecs

Out[21]: array([[ 0.70710678, -0.70710678],


[ 0.70710678, 0.70710678]])

Note that the columns of evecs are the eigenvectors


Since any scalar multiple of an eigenvector is an eigenvector with the same eigenvalue (check
it), the eig routine normalizes the length of each eigenvector to one
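A quick numerical check of facts 1 and 2 above, and of the unit-length normalization, using the same illustrative matrix:

```python
import numpy as np
from scipy.linalg import eig, det

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
evals, evecs = eig(A)

# Fact 1: the determinant equals the product of the eigenvalues
print(np.isclose(det(A), np.prod(evals).real))        # True

# Fact 2: the trace equals the sum of the eigenvalues
print(np.isclose(np.trace(A), np.sum(evals).real))    # True

# Each eigenvector returned by eig has unit length
print(np.allclose(np.linalg.norm(evecs, axis=0), 1))  # True
```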

21.6.1 Generalized Eigenvalues

It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces ๐ด and ๐ต, seeks generalized eigenvalues ๐œ† and eigenvectors ๐‘ฃ such that

๐ด๐‘ฃ = ๐œ†๐ต๐‘ฃ

This can be solved in SciPy via scipy.linalg.eig(A, B)


Of course, if ๐ต is square and invertible, then we can treat the generalized eigenvalue problem
as an ordinary eigenvalue problem ๐ตโˆ’1 ๐ด๐‘ฃ = ๐œ†๐‘ฃ, but this is not always the case
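As a sketch of this equivalence (with an illustrative invertible ๐ต of our choosing), we can compare scipy.linalg.eig(A, B) with the ordinary eigenvalues of ๐ตโˆ’1 ๐ด:

```python
import numpy as np
from scipy.linalg import eig, inv

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[2.0, 0.0],
              [0.0, 1.0]])  # square and invertible

# Generalized eigenvalues solving A v = ฮป B v
gen_vals = eig(A, B, right=False)

# Ordinary eigenvalues of B^{-1} A -- these should coincide
ord_vals = eig(inv(B) @ A, right=False)

print(np.allclose(np.sort(gen_vals), np.sort(ord_vals)))  # True
```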

21.7 Further Topics

We round out our discussion by briefly mentioning several other important topics

21.7.1 Series Expansions

Recall the usual summation formula for a geometric progression, which states that if |๐‘Ž| < 1,
then โˆ‘โˆž๐‘˜=0 ๐‘Ž๐‘˜ = (1 โˆ’ ๐‘Ž)โˆ’1
A generalization of this idea exists in the matrix setting
Matrix Norms
Let ๐ด be a square matrix, and let

โ€–๐ดโ€– โˆถ= max โ€–๐ด๐‘ฅโ€–


โ€–๐‘ฅโ€–=1

The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm โ€” in this case, the so-called spectral norm
For example, for a square matrix ๐‘†, the condition โ€–๐‘†โ€– < 1 means that ๐‘† is contractive, in the
sense that it pulls all vectors towards the origin [2]
Neumannโ€™s Theorem
Let ๐ด be a square matrix and let ๐ด๐‘˜ โˆถ= ๐ด๐ด๐‘˜โˆ’1 with ๐ด1 โˆถ= ๐ด
In other words, ๐ด๐‘˜ is the ๐‘˜-th power of ๐ด
Neumannโ€™s theorem states the following: If โ€–๐ด๐‘˜ โ€– < 1 for some ๐‘˜ โˆˆ N, then ๐ผ โˆ’ ๐ด is invertible,
and

(๐ผ โˆ’ ๐ด)โˆ’1 = โˆ‘โˆž๐‘˜=0 ๐ด๐‘˜ (4)

Spectral Radius
A result known as Gelfandโ€™s formula tells us that, for any square matrix ๐ด,

๐œŒ(๐ด) = lim โ€–๐ด๐‘˜ โ€–1/๐‘˜


๐‘˜โ†’โˆž

Here ๐œŒ(๐ด) is the spectral radius, defined as max๐‘– |๐œ†๐‘– |, where {๐œ†๐‘– }๐‘– is the set of eigenvalues of ๐ด
As a consequence of Gelfandโ€™s formula, if all eigenvalues are strictly less than one in modulus,
there exists a ๐‘˜ with โ€–๐ด๐‘˜ โ€– < 1
In which case Eq. (4) is valid
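Here is a small numerical sketch of the Neumann series, with an illustrative matrix whose spectral radius is below one; the partial sums converge to (๐ผ โˆ’ ๐ด)โˆ’1:

```python
import numpy as np

A = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# All eigenvalues are inside the unit circle here
rho = max(abs(np.linalg.eigvals(A)))
print(rho < 1)  # True

# Partial sums of the Neumann series I + A + A^2 + ...
S = np.zeros((2, 2))
term = np.eye(2)
for _ in range(100):
    S = S + term
    term = term @ A

print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))  # True
```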

21.7.2 Positive Definite Matrices

Let ๐ด be a symmetric ๐‘› ร— ๐‘› matrix


We say that ๐ด is

1. positive definite if ๐‘ฅโ€ฒ ๐ด๐‘ฅ > 0 for every nonzero ๐‘ฅ โˆˆ R๐‘›


2. positive semi-definite or nonnegative definite if ๐‘ฅโ€ฒ ๐ด๐‘ฅ โ‰ฅ 0 for every ๐‘ฅ โˆˆ R๐‘›

Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if ๐ด is positive definite, then all of its eigenvalues are strictly positive, and
hence ๐ด is invertible (with positive definite inverse)

21.7.3 Differentiating Linear and Quadratic Forms

The following formulas are useful in many economic contexts. Let

โ€ข ๐‘ง, ๐‘ฅ and ๐‘Ž all be ๐‘› ร— 1 vectors


โ€ข ๐ด be an ๐‘› ร— ๐‘› matrix
โ€ข ๐ต be an ๐‘š ร— ๐‘› matrix and ๐‘ฆ be an ๐‘š ร— 1 vector

Then

๐œ•๐‘Žโ€ฒ ๐‘ฅ
1. ๐œ•๐‘ฅ = ๐‘Ž
๐œ•๐ด๐‘ฅ โ€ฒ
2. ๐œ•๐‘ฅ = ๐ด
โ€ฒ
๐œ•๐‘ฅ ๐ด๐‘ฅ
3. ๐œ•๐‘ฅ = (๐ด + ๐ดโ€ฒ )๐‘ฅ
๐œ•๐‘ฆโ€ฒ ๐ต๐‘ง
4. ๐œ•๐‘ฆ = ๐ต๐‘ง
๐œ•๐‘ฆโ€ฒ ๐ต๐‘ง โ€ฒ
5. ๐œ•๐ต = ๐‘ฆ๐‘ง

Exercise 1 below asks you to apply these formulas
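Formula 3 above can also be spot-checked numerically against finite differences; the sketch below uses illustrative random data:

```python
import numpy as np

np.random.seed(1)
n = 4
A = np.random.randn(n, n)
x = np.random.randn(n)

f = lambda v: v @ A @ v          # the quadratic form x'Ax
grad_formula = (A + A.T) @ x     # formula 3 above

# Central finite differences as an independent check
h = 1e-6
grad_fd = np.empty(n)
for i in range(n):
    e = np.zeros(n)
    e[i] = h
    grad_fd[i] = (f(x + e) - f(x - e)) / (2 * h)

print(np.allclose(grad_formula, grad_fd, atol=1e-4))  # True
```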

21.7.4 Further Reading

The documentation of the scipy.linalg submodule can be found here


Chapters 2 and 3 of the Econometric Theory contain a discussion of linear algebra along the
same lines as above, with solved exercises
If you donโ€™t mind a slightly abstract approach, a nice intermediate-level text on linear algebra
is [69]

21.8 Exercises

21.8.1 Exercise 1

Let ๐‘ฅ be a given ๐‘› ร— 1 vector and consider the problem

๐‘ฃ(๐‘ฅ) = max {โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ฆ,๐‘ข

subject to the linear constraint

๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข

Here

โ€ข ๐‘ƒ is an ๐‘› ร— ๐‘› matrix and ๐‘„ is an ๐‘š ร— ๐‘š matrix


โ€ข ๐ด is an ๐‘› ร— ๐‘› matrix and ๐ต is an ๐‘› ร— ๐‘š matrix
โ€ข both ๐‘ƒ and ๐‘„ are symmetric and positive semidefinite

(What must the dimensions of ๐‘ฆ and ๐‘ข be to make this a well-posed problem?)


One way to solve the problem is to form the Lagrangian

โ„’ = โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข + ๐œ†โ€ฒ [๐ด๐‘ฅ + ๐ต๐‘ข โˆ’ ๐‘ฆ]

where ๐œ† is an ๐‘› ร— 1 vector of Lagrange multipliers


Try applying the formulas given above for differentiating quadratic and linear forms to ob-
tain the first-order conditions for maximizing โ„’ with respect to ๐‘ฆ, ๐‘ข and minimizing it with
respect to ๐œ†
Show that these conditions imply that

1. ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ
2. The optimizing choice of ๐‘ข satisfies ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
3. The function ๐‘ฃ satisfies ๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ where ๐‘ƒ ฬƒ = ๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด

As we will see, in economic contexts Lagrange multipliers often are shadow prices

Note
If we donโ€™t care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’
๐‘ขโ€ฒ ๐‘„๐‘ข with respect to ๐‘ข. You can verify that this leads to the same maximizer.

21.9 Solutions

21.9.1 Solution to Exercise 1

We have an optimization problem:

๐‘ฃ(๐‘ฅ) = max{โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ฆ,๐‘ข

s.t.

๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข

with primitives

โ€ข ๐‘ƒ be a symmetric and positive semidefinite ๐‘› ร— ๐‘› matrix


โ€ข ๐‘„ be a symmetric and positive semidefinite ๐‘š ร— ๐‘š matrix
โ€ข ๐ด an ๐‘› ร— ๐‘› matrix
โ€ข ๐ต an ๐‘› ร— ๐‘š matrix

The associated Lagrangian is :

๐ฟ = โˆ’๐‘ฆโ€ฒ ๐‘ƒ ๐‘ฆ โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข + ๐œ†โ€ฒ [๐ด๐‘ฅ + ๐ต๐‘ข โˆ’ ๐‘ฆ]

1.
Differentiating Lagrangian equation w.r.t y and setting its derivative equal to zero yields

๐œ•๐ฟ
= โˆ’(๐‘ƒ + ๐‘ƒ โ€ฒ )๐‘ฆ โˆ’ ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ โˆ’ ๐œ† = 0 ,
๐œ•๐‘ฆ

since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies

๐œ† = โˆ’2๐‘ƒ ๐‘ฆ

2.
Differentiating Lagrangian equation w.r.t. u and setting its derivative equal to zero yields

๐œ•๐ฟ
= โˆ’(๐‘„ + ๐‘„โ€ฒ )๐‘ข โˆ’ ๐ตโ€ฒ ๐œ† = โˆ’2๐‘„๐‘ข + ๐ตโ€ฒ ๐œ† = 0
๐œ•๐‘ข
Substituting ๐œ† = โˆ’2๐‘ƒ ๐‘ฆ gives

๐‘„๐‘ข + ๐ตโ€ฒ ๐‘ƒ ๐‘ฆ = 0

Substituting the linear constraint ๐‘ฆ = ๐ด๐‘ฅ + ๐ต๐‘ข into above equation gives

๐‘„๐‘ข + ๐ตโ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) = 0

(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข + ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ = 0

which is the first-order condition for maximizing L w.r.t. u


Thus, the optimal choice of u must satisfy

๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ ,

which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get

๐‘ฃ(๐‘ฅ) = max{โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข}


๐‘ข

Since we know the optimal choice of u satisfies ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ, then

๐‘ฃ(๐‘ฅ) = โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข ๐‘ค๐‘–๐‘กโ„Ž ๐‘ข = โˆ’(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ



To evaluate the function

๐‘ฃ(๐‘ฅ) = โˆ’(๐ด๐‘ฅ + ๐ต๐‘ข)โ€ฒ ๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข


= โˆ’(๐‘ฅโ€ฒ ๐ดโ€ฒ + ๐‘ขโ€ฒ ๐ตโ€ฒ )๐‘ƒ (๐ด๐‘ฅ + ๐ต๐‘ข) โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข
= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต๐‘ข โˆ’ ๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ต๐‘ข โˆ’ ๐‘ขโ€ฒ ๐‘„๐‘ข
= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ 2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข

For simplicity, denote ๐‘† โˆถ= (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด, so that ๐‘ข = โˆ’๐‘†๐‘ฅ


Regarding the second term โˆ’2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ,

โˆ’2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ = โˆ’2๐‘ฅโ€ฒ ๐‘† โ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
= 2๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ

Notice that the term (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 is symmetric as both P and Q are symmetric
Regarding the third term โˆ’๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข,

โˆ’๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข = โˆ’๐‘ฅโ€ฒ ๐‘† โ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘†๐‘ฅ


= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ

Hence, the summation of second and third terms is ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ


This implies that

๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ 2๐‘ขโ€ฒ ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ โˆ’ ๐‘ขโ€ฒ (๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)๐‘ข


= โˆ’๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ด๐‘ฅ + ๐‘ฅโ€ฒ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด๐‘ฅ
= โˆ’๐‘ฅโ€ฒ [๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด]๐‘ฅ

Therefore, the solution to the optimization problem ๐‘ฃ(๐‘ฅ) = โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ follows the above result by
denoting ๐‘ƒ ฬƒ โˆถ= ๐ดโ€ฒ ๐‘ƒ ๐ด โˆ’ ๐ดโ€ฒ ๐‘ƒ ๐ต(๐‘„ + ๐ตโ€ฒ ๐‘ƒ ๐ต)โˆ’1 ๐ตโ€ฒ ๐‘ƒ ๐ด
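As a numerical sanity check of this solution (with illustrative random primitives of our choosing), we can verify that the closed-form ๐‘ข attains the value โˆ’๐‘ฅโ€ฒ ๐‘ƒ ฬƒ ๐‘ฅ and satisfies the first-order condition:

```python
import numpy as np

np.random.seed(2)
n, m = 3, 2
# Random symmetric positive semidefinite P and Q
Z = np.random.randn(n, n); P = Z.T @ Z
W = np.random.randn(m, m); Q = W.T @ W
A = np.random.randn(n, n)
B = np.random.randn(n, m)
x = np.random.randn(n)

def objective(u):
    y = A @ x + B @ u
    return -y @ P @ y - u @ Q @ u

# Closed-form maximizer and value function from the solution above
u_star = -np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A @ x)
P_tilde = A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(Q + B.T @ P @ B, B.T @ P @ A)

print(np.isclose(objective(u_star), -x @ P_tilde @ x))  # True

# First-order condition: the gradient vanishes at u_star
grad = -2 * (Q + B.T @ P @ B) @ u_star - 2 * B.T @ P @ A @ x
print(np.allclose(grad, 0))  # True
```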
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, itโ€™s more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that โ€–๐‘†โ€– < 1. Take any nonzero vector ๐‘ฅ, and let ๐‘Ÿ โˆถ= โ€–๐‘ฅโ€–. We have โ€–๐‘†๐‘ฅโ€– =
๐‘Ÿโ€–๐‘†(๐‘ฅ/๐‘Ÿ)โ€– โ‰ค ๐‘Ÿโ€–๐‘†โ€– < ๐‘Ÿ = โ€–๐‘ฅโ€–. Hence every point is pulled towards the origin.
22

Complex Numbers and Trigonometry

22.1 Contents

โ€ข Overview 22.2

โ€ข De Moivreโ€™s Theorem 22.3

โ€ข Applications of de Moivreโ€™s Theorem 22.4

22.2 Overview

This lecture introduces some elementary mathematics and trigonometry


Useful and interesting in their own right, these concepts reap substantial rewards when studying
dynamics generated by linear difference equations or linear differential equations
For example, these tools are keys to understanding outcomes attained by Paul Samuelson
(1939) [115] in his classic paper on interactions between the investment accelerator and the
Keynesian consumption function, our topic in the lecture Samuelson Multiplier Accelerator
In addition to providing foundations for Samuelsonโ€™s work and extensions of it, this lec-
ture can be read as a stand-alone quick reminder of key results from elementary high school
trigonometry
So letโ€™s dive in

22.2.1 Complex Numbers

A complex number has a real part ๐‘ฅ and a purely imaginary part ๐‘ฆ


The Euclidean, polar, and trigonometric forms of a complex number ๐‘ง are:

๐‘ง = ๐‘ฅ + ๐‘–๐‘ฆ = ๐‘Ÿ๐‘’๐‘–๐œƒ = ๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ)

The second equality above is known as Eulerโ€™s formula

โ€ข Euler contributed many other formulas too!


The complex conjugate ๐‘ง ฬ„ of ๐‘ง is defined as

๐‘ง ฬ„ = ๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ = ๐‘Ÿ(cos ๐œƒ โˆ’ ๐‘– sin ๐œƒ)

The value ๐‘ฅ is the real part of ๐‘ง and ๐‘ฆ is the imaginary part of ๐‘ง


The symbol |๐‘ง| = โˆš๐‘ง๐‘งฬ„ = ๐‘Ÿ represents the modulus of ๐‘ง
The value ๐‘Ÿ is the Euclidean distance of vector (๐‘ฅ, ๐‘ฆ) from the origin:

๐‘Ÿ = |๐‘ง| = โˆš๐‘ฅ2 + ๐‘ฆ2

The value ๐œƒ is the angle of (๐‘ฅ, ๐‘ฆ) with respect to the real axis
Evidently, the tangent of ๐œƒ is (๐‘ฆ/๐‘ฅ)
Therefore,

๐œƒ = tanโˆ’1 (๐‘ฆ/๐‘ฅ)

Three elementary trigonometric functions are

๐‘ฅ ๐‘’๐‘–๐œƒ + ๐‘’โˆ’๐‘–๐œƒ ๐‘ฆ ๐‘’๐‘–๐œƒ โˆ’ ๐‘’โˆ’๐‘–๐œƒ ๐‘ฅ


cos ๐œƒ = = , sin ๐œƒ = = , tan ๐œƒ =
๐‘Ÿ 2 ๐‘Ÿ 2๐‘– ๐‘ฆ

Weโ€™ll need the following imports

In [1]: import numpy as np


import matplotlib.pyplot as plt
%matplotlib inline

22.2.2 An Example
Consider the complex number ๐‘ง = 1 + โˆš3 ๐‘–
For ๐‘ง = 1 + โˆš3 ๐‘–, we have ๐‘ฅ = 1, ๐‘ฆ = โˆš3
It follows that ๐‘Ÿ = 2 and ๐œƒ = tanโˆ’1 (โˆš3) = ๐œ‹/3 = 60ยฐ
Letโ€™s use Python to plot the trigonometric form of the complex number ๐‘ง = 1 + โˆš3 ๐‘–

In [2]: # Abbreviate useful values and functions


ฯ€ = np.pi
zeros = np.zeros
ones = np.ones

# Set parameters
r = 2
ฮธ = ฯ€/3
x = r * np.cos(ฮธ)
x_range = np.linspace(0, x, 1000)
ฮธ_range = np.linspace(0, ฮธ, 1000)

# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')

ax.plot((0, ฮธ), (0, r), marker='o', color='b') # plot r
ax.plot(zeros(x_range.shape), x_range, color='b') # plot x
ax.plot(ฮธ_range, x / np.cos(ฮธ_range), color='b') # plot y
ax.plot(ฮธ_range, ones(ฮธ_range.shape) * 0.1, color='r') # plot ฮธ

ax.margins(0) # Make the plot start at the origin

ax.set_title("Trigonometry of complex numbers", va='bottom', fontsize='x-large')

ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # less radial ticks
ax.set_rlabel_position(-88.5) # get radial labels away from plotted line

ax.text(ฮธ, r+0.01 , r'$z = x + iy = 1 + \sqrt{3}\, i$') # label z


ax.text(ฮธ+0.2, 1 , '$r = 2$') # label r
ax.text(0-0.2, 0.5, '$x = 1$') # label x
ax.text(0.5, 1.2, r'$y = \sqrt{3}$') # label y
ax.text(0.25, 0.15, r'$\theta = 60^o$') # label ฮธ

ax.grid(True)
plt.show()
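We can double-check the modulus and angle computed above with Pythonโ€™s standard cmath module:

```python
import cmath
import numpy as np

z = 1 + np.sqrt(3) * 1j

r, ฮธ = cmath.polar(z)            # modulus and argument of z
print(np.isclose(r, 2))          # True
print(np.isclose(ฮธ, np.pi / 3))  # True: ฮธ = tan^{-1}(โˆš3)
```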

22.3 De Moivreโ€™s Theorem

de Moivreโ€™s theorem states that:

(๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ))๐‘› = ๐‘Ÿ๐‘› ๐‘’๐‘–๐‘›๐œƒ = ๐‘Ÿ๐‘› (cos ๐‘›๐œƒ + ๐‘– sin ๐‘›๐œƒ)

To prove de Moivreโ€™s theorem, note that

๐‘›
(๐‘Ÿ(cos ๐œƒ + ๐‘– sin ๐œƒ))๐‘› = (๐‘Ÿ๐‘’๐‘–๐œƒ )

and compute
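The theorem is easy to spot-check numerically for particular (illustrative) values of ๐‘Ÿ, ๐œƒ and ๐‘›:

```python
import numpy as np

r, ฮธ, n = 1.3, 0.7, 5

lhs = (r * (np.cos(ฮธ) + 1j * np.sin(ฮธ)))**n
rhs = r**n * (np.cos(n * ฮธ) + 1j * np.sin(n * ฮธ))

print(np.isclose(lhs, rhs))  # True
```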

22.4 Applications of de Moivreโ€™s Theorem

22.4.1 Example 1

We can use de Moivreโ€™s theorem to show that ๐‘Ÿ = โˆš(๐‘ฅ2 + ๐‘ฆ2 )


We have

1 = ๐‘’๐‘–๐œƒ ๐‘’โˆ’๐‘–๐œƒ
 = (cos ๐œƒ + ๐‘– sin ๐œƒ)(cos (โˆ’๐œƒ) + ๐‘– sin (โˆ’๐œƒ))
 = (cos ๐œƒ + ๐‘– sin ๐œƒ)(cos ๐œƒ โˆ’ ๐‘– sin ๐œƒ)
 = cos2 ๐œƒ + sin2 ๐œƒ
 = ๐‘ฅ2 /๐‘Ÿ2 + ๐‘ฆ2 /๐‘Ÿ2

and thus

๐‘ฅ2 + ๐‘ฆ2 = ๐‘Ÿ2

We recognize this as a theorem of Pythagoras

22.4.2 Example 2

Let ๐‘ง = ๐‘Ÿ๐‘’๐‘–๐œƒ and ๐‘ง ฬ„ = ๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ so that ๐‘ง ฬ„ is the complex conjugate of ๐‘ง


(๐‘ง, ๐‘ง)ฬ„ form a complex conjugate pair of complex numbers
Let ๐‘Ž = ๐‘๐‘’๐‘–๐œ” and ๐‘Žฬ„ = ๐‘๐‘’โˆ’๐‘–๐œ” be another complex conjugate pair
For each element of a sequence of integers ๐‘› = 0, 1, 2, โ€ฆ , we want to compute ๐‘ฅ๐‘› = ๐‘Ž๐‘ง๐‘› + ๐‘Žฬ„๐‘งฬ„๐‘›
To do so, we can apply de Moivreโ€™s formula
Thus,

๐‘ฅ๐‘› = ๐‘Ž๐‘ง ๐‘› + ๐‘Ž๐‘งฬ„ ๐‘›ฬ„
= ๐‘๐‘’๐‘–๐œ” (๐‘Ÿ๐‘’๐‘–๐œƒ )๐‘› + ๐‘๐‘’โˆ’๐‘–๐œ” (๐‘Ÿ๐‘’โˆ’๐‘–๐œƒ )๐‘›
= ๐‘๐‘Ÿ๐‘› ๐‘’๐‘–(๐œ”+๐‘›๐œƒ) + ๐‘๐‘Ÿ๐‘› ๐‘’โˆ’๐‘–(๐œ”+๐‘›๐œƒ)
= ๐‘๐‘Ÿ๐‘› [cos (๐œ” + ๐‘›๐œƒ) + ๐‘– sin (๐œ” + ๐‘›๐œƒ) + cos (๐œ” + ๐‘›๐œƒ) โˆ’ ๐‘– sin (๐œ” + ๐‘›๐œƒ)]
= 2๐‘๐‘Ÿ๐‘› cos (๐œ” + ๐‘›๐œƒ)

22.4.3 Example 3

This example provides machinery that is at the heart of Samuelsonโ€™s analysis of his
multiplier-accelerator model [115]
Thus, consider a second-order linear difference equation

๐‘ฅ๐‘›+2 = ๐‘1 ๐‘ฅ๐‘›+1 + ๐‘2 ๐‘ฅ๐‘›

whose characteristic polynomial is

๐‘ง2 โˆ’ ๐‘1 ๐‘ง โˆ’ ๐‘2 = 0

or

(๐‘ง2 โˆ’ ๐‘1 ๐‘ง โˆ’ ๐‘2 ) = (๐‘ง โˆ’ ๐‘ง1 )(๐‘ง โˆ’ ๐‘ง2 ) = 0

has roots ๐‘ง1 , ๐‘ง2
A solution is a sequence {๐‘ฅ๐‘› }โˆž๐‘›=0 that satisfies the difference equation

Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation

โ€ข the roots ๐‘ง1 , ๐‘ง2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
โ€ข the values ๐‘ฅ0 , ๐‘ฅ1 are given initial conditions

To solve the difference equation, recall from example 2 that

๐‘ฅ๐‘› = 2๐‘๐‘Ÿ๐‘› cos (๐œ” + ๐‘›๐œƒ)

where ๐œ”, ๐‘ are coefficients to be determined from information encoded in the initial conditions
๐‘ฅ1 , ๐‘ฅ0
Since ๐‘ฅ0 = 2๐‘ cos ๐œ” and ๐‘ฅ1 = 2๐‘๐‘Ÿ cos (๐œ” + ๐œƒ) the ratio of ๐‘ฅ1 to ๐‘ฅ0 is

๐‘ฅ1 ๐‘Ÿ cos (๐œ” + ๐œƒ)
=
๐‘ฅ0 cos ๐œ”

We can solve this equation for ๐œ”, then solve for ๐‘ using ๐‘ฅ0 = 2๐‘ cos ๐œ”
With the sympy package in Python, we are able to solve and plot the dynamics of ๐‘ฅ๐‘› given
different values of ๐‘›
In this example, we set the initial values:

โ€ข ๐‘Ÿ = 0.9
โ€ข ๐œƒ = ๐œ‹/4
โ€ข ๐‘ฅ0 = 4
โ€ข ๐‘ฅ1 = ๐‘Ÿ โ‹… 2โˆš2 = 1.8โˆš2
We first numerically solve for ๐œ” and ๐‘ using nsolve in the sympy package based on the
above initial condition:

In [3]: from sympy import *

# Set parameters
r = 0.9
ฮธ = ฯ€/4
x0 = 4
x1 = 2 * r * sqrt(2)

# Define symbols to be calculated


ฯ‰, p = symbols('ฯ‰ p', real=True)

# Solve for ฯ‰
## Note: we choose the solution near 0
eq1 = Eq(x1/x0, r * cos(ฯ‰+ฮธ) / cos(ฯ‰))
ฯ‰ = nsolve(eq1, ฯ‰, 0)
ฯ‰ = float(ฯ‰)
print(f'ฯ‰ = {ฯ‰:1.3f}')

# Solve for p
eq2 = Eq(x0, 2 * p * cos(ฯ‰))
p = nsolve(eq2, p, 0)
p = float(p)
print(f'p = {p:1.3f}')

ฯ‰ = 0.000
p = 2.000

Using the code above, we compute that ๐œ” = 0 and ๐‘ = 2


Then we plug in the values we solve for ๐œ” and ๐‘ and plot the dynamic

In [4]: # Define range of n


max_n = 30
n = np.arange(0, max_n+1, 0.01)

# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ฯ‰ + n * ฮธ)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))

ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')

ax.spines['bottom'].set_position('center') # Set x-axis in the middle of the plot


ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

ticklab = ax.xaxis.get_ticklabels()[0] # Set x-label position


trans = ticklab.get_transform()
ax.xaxis.set_label_coords(31, 0, transform=trans)

ticklab = ax.yaxis.get_ticklabels()[0] # Set y-label position


trans = ticklab.get_transform()
ax.yaxis.set_label_coords(0, 5, transform=trans)

ax.grid()
plt.show()

22.4.4 Trigonometric Identities

We can obtain a complete suite of trigonometric identities by appropriately manipulating
polar forms of complex numbers
Weโ€™ll get many of them by deducing implications of the equality

๐‘’๐‘–(๐œ”+๐œƒ) = ๐‘’๐‘–๐œ” ๐‘’๐‘–๐œƒ

For example, weโ€™ll calculate identities for cos (๐œ” + ๐œƒ) and sin (๐œ” + ๐œƒ)
Using the sine and cosine formulas presented at the beginning of this lecture, we have:

๐‘’๐‘–(๐œ”+๐œƒ) + ๐‘’โˆ’๐‘–(๐œ”+๐œƒ)
cos (๐œ” + ๐œƒ) =
2
๐‘’๐‘–(๐œ”+๐œƒ) โˆ’ ๐‘’โˆ’๐‘–(๐œ”+๐œƒ)
sin (๐œ” + ๐œƒ) =
2๐‘–

We can also obtain the trigonometric identities as follows:

cos (๐œ” + ๐œƒ) + ๐‘– sin (๐œ” + ๐œƒ) = ๐‘’๐‘–(๐œ”+๐œƒ)


= ๐‘’๐‘–๐œ” ๐‘’๐‘–๐œƒ
= (cos ๐œ” + ๐‘– sin ๐œ”)(cos ๐œƒ + ๐‘– sin ๐œƒ)
= (cos ๐œ” cos ๐œƒ โˆ’ sin ๐œ” sin ๐œƒ) + ๐‘–(cos ๐œ” sin ๐œƒ + sin ๐œ” cos ๐œƒ)

Since both real and imaginary parts of the above formula should be equal, we get:

cos (๐œ” + ๐œƒ) = cos ๐œ” cos ๐œƒ โˆ’ sin ๐œ” sin ๐œƒ


sin (๐œ” + ๐œƒ) = cos ๐œ” sin ๐œƒ + sin ๐œ” cos ๐œƒ

The equations above are also known as the angle sum identities. We can verify the equa-
tions using the simplify function in the sympy package:

In [5]: # Define symbols


ฯ‰, ฮธ = symbols('ฯ‰ ฮธ', real=True)

# Verify
print("cos(ฯ‰)cos(ฮธ) - sin(ฯ‰)sin(ฮธ) =", simplify(cos(ฯ‰)*cos(ฮธ) - sin(ฯ‰) * sin(ฮธ)))
print("cos(ฯ‰)sin(ฮธ) + sin(ฯ‰)cos(ฮธ) =", simplify(cos(ฯ‰)*sin(ฮธ) + sin(ฯ‰) * cos(ฮธ)))

cos(ฯ‰)cos(ฮธ) - sin(ฯ‰)sin(ฮธ) = cos(ฮธ + ฯ‰)


cos(ฯ‰)sin(ฮธ) + sin(ฯ‰)cos(ฮธ) = sin(ฮธ + ฯ‰)

22.4.5 Trigonometric Integrals

We can also compute the trigonometric integrals using polar forms of complex numbers
For example, we want to solve the following integral:

โˆซ_{โˆ’๐œ‹}^{๐œ‹} cos(๐œ”) sin(๐œ”) ๐‘‘๐œ”

Using Eulerโ€™s formula, we have:

โˆซ cos(๐œ”) sin(๐œ”) ๐‘‘๐œ” = โˆซ (๐‘’๐‘–๐œ” + ๐‘’โˆ’๐‘–๐œ” )/2 โ‹… (๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )/(2๐‘–) ๐‘‘๐œ”
 = (1/4๐‘–) โˆซ ๐‘’2๐‘–๐œ” โˆ’ ๐‘’โˆ’2๐‘–๐œ” ๐‘‘๐œ”
 = (1/4๐‘–) ((โˆ’๐‘–/2) ๐‘’2๐‘–๐œ” โˆ’ (๐‘–/2) ๐‘’โˆ’2๐‘–๐œ” + ๐ถ1 )
 = โˆ’(1/8) [(๐‘’๐‘–๐œ” )2 + (๐‘’โˆ’๐‘–๐œ” )2 โˆ’ 2] + ๐ถ2
 = โˆ’(1/8) (๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )2 + ๐ถ2
 = (1/2) ((๐‘’๐‘–๐œ” โˆ’ ๐‘’โˆ’๐‘–๐œ” )/(2๐‘–))2 + ๐ถ2
 = (1/2) sin2 (๐œ”) + ๐ถ2

and thus:

โˆซ_{โˆ’๐œ‹}^{๐œ‹} cos(๐œ”) sin(๐œ”) ๐‘‘๐œ” = (1/2) sin2 (๐œ‹) โˆ’ (1/2) sin2 (โˆ’๐œ‹) = 0

We can verify the analytical as well as numerical results using integrate in the sympy
package:

In [6]: # Set initial printing


init_printing()

ฯ‰ = Symbol('ฯ‰')
print('The analytical solution for integral of cos(ฯ‰)sin(ฯ‰) is:')
integrate(cos(ฯ‰) * sin(ฯ‰), ฯ‰)

The analytical solution for integral of cos(ฯ‰)sin(ฯ‰) is:

Out[6]:

sin2 (๐œ”)
2

In [7]: print('The numerical solution for the integral of cos(ฯ‰)sin(ฯ‰) from -ฯ€ to ฯ€ is:')
integrate(cos(ฯ‰) * sin(ฯ‰), (ฯ‰, -ฯ€, ฯ€))

The numerical solution for the integral of cos(ฯ‰)sin(ฯ‰) from -ฯ€ to ฯ€ is:

Out[7]:

0
23

Orthogonal Projections and Their Applications

23.1 Contents

โ€ข Overview 23.2

โ€ข Key Definitions 23.3

โ€ข The Orthogonal Projection Theorem 23.4

โ€ข Orthonormal Basis 23.5

โ€ข Projection Using Matrix Algebra 23.6

โ€ข Least Squares Regression 23.7

โ€ข Orthogonalization and Decomposition 23.8

โ€ข Exercises 23.9

โ€ข Solutions 23.10

23.2 Overview

Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions
These include, but are not limited to,

โ€ข Least squares projection, also known as linear regression


โ€ข Conditional expectations for multivariate normal (Gaussian) distributions
โ€ข Gramโ€“Schmidt orthogonalization
โ€ข QR decomposition
โ€ข Orthogonal polynomials
โ€ข etc

In this lecture, we focus on


โ€ข key ideas
โ€ข least squares regression

23.2.1 Further Reading

For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [109]
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter

23.3 Key Definitions

Assume ๐‘ฅ, ๐‘ง โˆˆ R๐‘›
Define โŸจ๐‘ฅ, ๐‘งโŸฉ = โˆ‘๐‘– ๐‘ฅ๐‘– ๐‘ง๐‘–
Recall โ€–๐‘ฅโ€–2 = โŸจ๐‘ฅ, ๐‘ฅโŸฉ
The law of cosines states that โŸจ๐‘ฅ, ๐‘งโŸฉ = โ€–๐‘ฅโ€–โ€–๐‘งโ€– cos(๐œƒ) where ๐œƒ is the angle between the vectors
๐‘ฅ and ๐‘ง
When โŸจ๐‘ฅ, ๐‘งโŸฉ = 0, then cos(๐œƒ) = 0 and ๐‘ฅ and ๐‘ง are said to be orthogonal and we write ๐‘ฅ โŸ‚ ๐‘ง

For a linear subspace ๐‘† โŠ‚ R๐‘› , we call ๐‘ฅ โˆˆ R๐‘› orthogonal to ๐‘† if ๐‘ฅ โŸ‚ ๐‘ง for all ๐‘ง โˆˆ ๐‘†, and


write ๐‘ฅ โŸ‚ ๐‘†

The orthogonal complement of linear subspace ๐‘† โŠ‚ R๐‘› is the set ๐‘† โŸ‚ โˆถ= {๐‘ฅ โˆˆ R๐‘› โˆถ ๐‘ฅ โŸ‚ ๐‘†}

๐‘† โŸ‚ is a linear subspace of R๐‘›

โ€ข To see this, fix ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘† โŸ‚ and ๐›ผ, ๐›ฝ โˆˆ R


โ€ข Observe that if ๐‘ง โˆˆ ๐‘†, then

โŸจ๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ, ๐‘งโŸฉ = ๐›ผโŸจ๐‘ฅ, ๐‘งโŸฉ + ๐›ฝโŸจ๐‘ฆ, ๐‘งโŸฉ = ๐›ผ ร— 0 + ๐›ฝ ร— 0 = 0

โ€ข Hence ๐›ผ๐‘ฅ + ๐›ฝ๐‘ฆ โˆˆ ๐‘† โŸ‚ , as was to be shown



A set of vectors {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } โŠ‚ R๐‘› is called an orthogonal set if ๐‘ฅ๐‘– โŸ‚ ๐‘ฅ๐‘— whenever ๐‘– โ‰  ๐‘—


If {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } is an orthogonal set, then the Pythagorean Law states that

โ€–๐‘ฅ1 + โ‹ฏ + ๐‘ฅ๐‘˜ โ€–2 = โ€–๐‘ฅ1 โ€–2 + โ‹ฏ + โ€–๐‘ฅ๐‘˜ โ€–2

For example, when ๐‘˜ = 2, ๐‘ฅ1 โŸ‚ ๐‘ฅ2 implies

โ€–๐‘ฅ1 + ๐‘ฅ2 โ€–2 = โŸจ๐‘ฅ1 + ๐‘ฅ2 , ๐‘ฅ1 + ๐‘ฅ2 โŸฉ = โŸจ๐‘ฅ1 , ๐‘ฅ1 โŸฉ + 2โŸจ๐‘ฅ2 , ๐‘ฅ1 โŸฉ + โŸจ๐‘ฅ2 , ๐‘ฅ2 โŸฉ = โ€–๐‘ฅ1 โ€–2 + โ€–๐‘ฅ2 โ€–2

23.3.1 Linear Independence vs Orthogonality

If ๐‘‹ โŠ‚ R๐‘› is an orthogonal set and 0 โˆ‰ ๐‘‹, then ๐‘‹ is linearly independent


Proving this is a nice exercise
While the converse is not true, a kind of partial converse holds, as weโ€™ll see below

23.4 The Orthogonal Projection Theorem

What vector within a linear subspace of R๐‘› best approximates a given vector in R๐‘› ?


The next theorem provides an answer to this question
Theorem (OPT) Given ๐‘ฆ โˆˆ R๐‘› and linear subspace ๐‘† โŠ‚ R๐‘› , there exists a unique solution
to the minimization problem

๐‘ฆ ฬ‚ โˆถ= arg min๐‘งโˆˆ๐‘† โ€–๐‘ฆ โˆ’ ๐‘งโ€–

The minimizer ๐‘ฆ ฬ‚ is the unique vector in R๐‘› that satisfies

โ€ข ๐‘ฆฬ‚ โˆˆ ๐‘†
โ€ข ๐‘ฆ โˆ’ ๐‘ฆฬ‚ โŸ‚ ๐‘†

The vector ๐‘ฆ ฬ‚ is called the orthogonal projection of ๐‘ฆ onto ๐‘†


The next figure provides some intuition

23.4.1 Proof of Sufficiency

Weโ€™ll omit the full proof.


But we will prove sufficiency of the asserted conditions
To this end, let ๐‘ฆ โˆˆ R๐‘› and let ๐‘† be a linear subspace of R๐‘›
Let ๐‘ฆ ฬ‚ be a vector in R๐‘› such that ๐‘ฆ ฬ‚ โˆˆ ๐‘† and ๐‘ฆ โˆ’ ๐‘ฆ ฬ‚ โŸ‚ ๐‘†
Let ๐‘ง be any other point in ๐‘† and use the fact that ๐‘† is a linear subspace to deduce

โ€–๐‘ฆ โˆ’ ๐‘งโ€–2 = โ€–(๐‘ฆ โˆ’ ๐‘ฆ)ฬ‚ + (๐‘ฆ ฬ‚ โˆ’ ๐‘ง)โ€–2 = โ€–๐‘ฆ โˆ’ ๐‘ฆโ€–ฬ‚ 2 + โ€–๐‘ฆ ฬ‚ โˆ’ ๐‘งโ€–2

Hence โ€–๐‘ฆ โˆ’ ๐‘งโ€– โ‰ฅ โ€–๐‘ฆ โˆ’ ๐‘ฆโ€–,
ฬ‚ which completes the proof

23.4.2 Orthogonal Projection as a Mapping

For a linear space ๐‘Œ and a fixed linear subspace ๐‘†, we have a functional relationship

๐‘ฆ โˆˆ ๐‘Œ โ†ฆ its orthogonal projection ๐‘ฆ ฬ‚ โˆˆ ๐‘†

By the OPT, this is a well-defined mapping or operator from R๐‘› to R๐‘›


In what follows we denote this operator by a matrix ๐‘ƒ

โ€ข ๐‘ƒ ๐‘ฆ represents the projection ๐‘ฆ ฬ‚


โ€ข This is sometimes expressed as ๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ, where ๐ธฬ‚ denotes a wide-sense expectations operator and the subscript ๐‘† indicates that we are projecting ๐‘ฆ onto the linear subspace ๐‘†

The operator ๐‘ƒ is called the orthogonal projection mapping onto ๐‘†



It is immediate from the OPT that for any ๐‘ฆ โˆˆ R๐‘›

1. ๐‘ƒ ๐‘ฆ โˆˆ ๐‘† and
2. ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘†

From this, we can deduce additional useful properties, such as

1. โ€–๐‘ฆโ€–2 = โ€–๐‘ƒ ๐‘ฆโ€–2 + โ€–๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆโ€–2 and


2. โ€–๐‘ƒ ๐‘ฆโ€– โ‰ค โ€–๐‘ฆโ€–

For example, to prove 1, observe that ๐‘ฆ = ๐‘ƒ ๐‘ฆ + ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ and apply the Pythagorean law
Orthogonal Complement
Let ๐‘† โŠ‚ R๐‘› .
The orthogonal complement of ๐‘† is the linear subspace ๐‘† โŸ‚ that satisfies ๐‘ฅ1 โŸ‚ ๐‘ฅ2 for every
๐‘ฅ1 โˆˆ ๐‘† and ๐‘ฅ2 โˆˆ ๐‘† โŸ‚
Let ๐‘Œ be a linear space with linear subspace ๐‘† and its orthogonal complement ๐‘† โŸ‚
We write

๐‘Œ = ๐‘† โŠ• ๐‘†โŸ‚

to indicate that for every ๐‘ฆ โˆˆ ๐‘Œ there is a unique ๐‘ฅ1 โˆˆ ๐‘† and a unique ๐‘ฅ2 โˆˆ ๐‘† โŸ‚ such that ๐‘ฆ = ๐‘ฅ1 + ๐‘ฅ2
Moreover, ๐‘ฅ1 = ๐ธ๐‘†ฬ‚ ๐‘ฆ and ๐‘ฅ2 = ๐‘ฆ โˆ’ ๐ธ๐‘†ฬ‚ ๐‘ฆ
This amounts to another version of the OPT:
Theorem. If ๐‘† is a linear subspace of R๐‘› , ๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ and ๐ธ๐‘†ฬ‚ โŸ‚ ๐‘ฆ = ๐‘€ ๐‘ฆ, then

๐‘ƒ ๐‘ฆ โŸ‚ ๐‘€๐‘ฆ and ๐‘ฆ = ๐‘ƒ ๐‘ฆ + ๐‘€ ๐‘ฆ for all ๐‘ฆ โˆˆ R๐‘›



The next figure illustrates

23.5 Orthonormal Basis

An orthogonal set of vectors ๐‘‚ โŠ‚ R๐‘› is called an orthonormal set if โ€–๐‘ขโ€– = 1 for all ๐‘ข โˆˆ ๐‘‚


Let ๐‘† be a linear subspace of R๐‘› and let ๐‘‚ โŠ‚ ๐‘†
If ๐‘‚ is orthonormal and span ๐‘‚ = ๐‘†, then ๐‘‚ is called an orthonormal basis of ๐‘†
๐‘‚ is necessarily a basis of ๐‘† (being independent by orthogonality and the fact that no element is the zero vector)
One example of an orthonormal set is the canonical basis {๐‘’1 , โ€ฆ , ๐‘’๐‘› }, which forms an orthonormal basis of R๐‘› , where ๐‘’๐‘– is the ๐‘– th unit vector
If {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } is an orthonormal basis of linear subspace ๐‘†, then

๐‘˜
๐‘ฅ = โˆ‘โŸจ๐‘ฅ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– for all ๐‘ฅโˆˆ๐‘†
๐‘–=1

To see this, observe that since ๐‘ฅ โˆˆ span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ }, we can find scalars ๐›ผ1 , โ€ฆ , ๐›ผ๐‘˜ that verify

๐‘˜
๐‘ฅ = โˆ‘ ๐›ผ๐‘— ๐‘ข๐‘— (1)
๐‘—=1

Taking the inner product with respect to ๐‘ข๐‘– gives

๐‘˜
โŸจ๐‘ฅ, ๐‘ข๐‘– โŸฉ = โˆ‘ ๐›ผ๐‘— โŸจ๐‘ข๐‘— , ๐‘ข๐‘– โŸฉ = ๐›ผ๐‘–
๐‘—=1

Combining this result with Eq. (1) verifies the claim
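The expansion can be confirmed numerically. In the sketch below, the orthonormal basis is generated via a QR factorization of an arbitrary matrix — a convenience only, since any orthonormal set would serve:

```python
import numpy as np

np.random.seed(42)
A = np.random.randn(4, 2)
U, _ = np.linalg.qr(A)        # columns of U: an orthonormal basis of span(A)

x = 2.0 * U[:, 0] - 3.0 * U[:, 1]   # a point in the subspace

# Rebuild x from the expansion sum_i <x, u_i> u_i
x_rebuilt = sum((x @ U[:, i]) * U[:, i] for i in range(U.shape[1]))
print(np.allclose(x, x_rebuilt))  # True
```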



23.5.1 Projection onto an Orthonormal Basis

When we have an orthonormal basis for the subspace onto which we are projecting, computing the projection simplifies:
Theorem If {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } is an orthonormal basis for ๐‘†, then

๐‘˜
๐‘ƒ ๐‘ฆ = โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– , โˆ€ ๐‘ฆ โˆˆ R๐‘› (2)
๐‘–=1

Proof: Fix ๐‘ฆ โˆˆ R๐‘› and let ๐‘ƒ ๐‘ฆ be defined as in Eq. (2)


Clearly, ๐‘ƒ ๐‘ฆ โˆˆ ๐‘†
We claim that ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘† also holds
It suffices to show that ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ any basis vector ๐‘ข๐‘– (why?)
This is true because

๐‘˜ ๐‘˜
โŸจ๐‘ฆ โˆ’ โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉ๐‘ข๐‘– , ๐‘ข๐‘— โŸฉ = โŸจ๐‘ฆ, ๐‘ข๐‘— โŸฉ โˆ’ โˆ‘โŸจ๐‘ฆ, ๐‘ข๐‘– โŸฉโŸจ๐‘ข๐‘– , ๐‘ข๐‘— โŸฉ = 0
๐‘–=1 ๐‘–=1

23.6 Projection Using Matrix Algebra

Let ๐‘† be a linear subspace of R๐‘› and let ๐‘ฆ โˆˆ R๐‘›


We want to compute the matrix ๐‘ƒ that verifies

๐ธ๐‘†ฬ‚ ๐‘ฆ = ๐‘ƒ ๐‘ฆ

Evidently ๐‘ƒ ๐‘ฆ is a linear function from ๐‘ฆ โˆˆ R๐‘› to ๐‘ƒ ๐‘ฆ โˆˆ R๐‘›


This reference is useful https://en.wikipedia.org/wiki/Linear_map#Matrices
Theorem. Let the columns of ๐‘› ร— ๐‘˜ matrix ๐‘‹ form a basis of ๐‘†. Then

๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ

Proof: Given arbitrary ๐‘ฆ โˆˆ R๐‘› and ๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ , our claim is that

1. ๐‘ƒ ๐‘ฆ โˆˆ ๐‘†, and
2. ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ โŸ‚ ๐‘†

Claim 1 is true because

๐‘ƒ ๐‘ฆ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ = ๐‘‹๐‘Ž when ๐‘Ž โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

An expression of the form ๐‘‹๐‘Ž is precisely a linear combination of the columns of ๐‘‹, and


hence an element of ๐‘†
Claim 2 is equivalent to the statement

๐‘ฆ โˆ’ ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ โŸ‚ ๐‘‹๐‘ for all ๐‘ โˆˆ R๐พ

This is true: If ๐‘ โˆˆ R๐พ , then

(๐‘‹๐‘)โ€ฒ [๐‘ฆ โˆ’ ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ] = ๐‘โ€ฒ [๐‘‹ โ€ฒ ๐‘ฆ โˆ’ ๐‘‹ โ€ฒ ๐‘ฆ] = 0

The proof is now complete
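Both claims can be confirmed numerically; the matrix ๐‘‹ and vector ๐‘ฆ below are arbitrary random draws used purely for illustration:

```python
import numpy as np

np.random.seed(0)
n, k = 5, 2
X = np.random.randn(n, k)        # columns form a basis of S (full column rank a.s.)
y = np.random.randn(n)

P = X @ np.linalg.inv(X.T @ X) @ X.T

# Claim 1: Py is a linear combination of the columns of X, i.e. Py is in S
a = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(P @ y, X @ a))            # True

# Claim 2: the residual y - Py is orthogonal to each column of X
print(np.allclose(X.T @ (y - P @ y), 0))    # True
```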

23.6.1 Starting with the Basis

It is common in applications to start with ๐‘› ร— ๐‘˜ matrix ๐‘‹ with linearly independent columns


and let

๐‘† โˆถ= span ๐‘‹ โˆถ= span{col1 ๐‘‹, โ€ฆ , col๐‘˜ ๐‘‹}

Then the columns of ๐‘‹ form a basis of ๐‘†


From the preceding theorem, ๐‘ƒ ๐‘ฆ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ is the projection of ๐‘ฆ onto ๐‘†
In this context, ๐‘ƒ is often called the projection matrix

โ€ข The matrix ๐‘€ = ๐ผ โˆ’ ๐‘ƒ satisfies ๐‘€ ๐‘ฆ = ๐ธ๐‘†ฬ‚ โŸ‚ ๐‘ฆ and is sometimes called the annihilator matrix

23.6.2 The Orthonormal Case

Suppose that ๐‘ˆ is ๐‘› ร— ๐‘˜ with orthonormal columns


Let ๐‘ข๐‘– โˆถ= col ๐‘ˆ๐‘– for each ๐‘–, let ๐‘† โˆถ= span ๐‘ˆ and let ๐‘ฆ โˆˆ R๐‘›
We know that the projection of ๐‘ฆ onto ๐‘† is

๐‘ƒ ๐‘ฆ = ๐‘ˆ (๐‘ˆ โ€ฒ ๐‘ˆ )โˆ’1 ๐‘ˆ โ€ฒ ๐‘ฆ

Since ๐‘ˆ has orthonormal columns, we have ๐‘ˆ โ€ฒ ๐‘ˆ = ๐ผ


Hence

๐‘˜
๐‘ƒ ๐‘ฆ = ๐‘ˆ ๐‘ˆ โ€ฒ ๐‘ฆ = โˆ‘โŸจ๐‘ข๐‘– , ๐‘ฆโŸฉ๐‘ข๐‘–
๐‘–=1

We have recovered our earlier result about projecting onto the span of an orthonormal basis
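As a quick numerical confirmation that the general formula collapses to ๐‘ƒ ๐‘ฆ = ๐‘ˆ ๐‘ˆ โ€ฒ ๐‘ฆ in the orthonormal case (here ๐‘ˆ is built via QR purely for convenience):

```python
import numpy as np

np.random.seed(1)
U, _ = np.linalg.qr(np.random.randn(6, 3))   # 6 x 3, orthonormal columns
y = np.random.randn(6)

print(np.allclose(U.T @ U, np.eye(3)))       # U'U = I
Py_general = U @ np.linalg.inv(U.T @ U) @ U.T @ y
Py_simple = U @ U.T @ y
print(np.allclose(Py_general, Py_simple))    # True
```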

23.6.3 Application: Overdetermined Systems of Equations

Let ๐‘ฆ โˆˆ R๐‘› and let ๐‘‹ is ๐‘› ร— ๐‘˜ with linearly independent columns


Given ๐‘‹ and ๐‘ฆ, we seek ๐‘ โˆˆ R๐‘˜ satisfying the system of linear equations ๐‘‹๐‘ = ๐‘ฆ
If ๐‘› > ๐‘˜ (more equations than unknowns), then ๐‘ is said to be overdetermined

Intuitively, we may not be able to find a ๐‘ that satisfies all ๐‘› equations


The best approach here is to

โ€ข Accept that an exact solution may not exist


โ€ข Look instead for an approximate solution

By approximate solution, we mean a ๐‘ โˆˆ R๐‘˜ such that ๐‘‹๐‘ is as close to ๐‘ฆ as possible


The next theorem shows that the solution is well defined and unique
The proof uses the OPT
Theorem The unique minimizer of โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€– over ๐‘ โˆˆ R๐พ is

๐›ฝ ฬ‚ โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Proof: Note that

๐‘‹ ๐›ฝ ฬ‚ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ = ๐‘ƒ ๐‘ฆ

Since ๐‘ƒ ๐‘ฆ is the orthogonal projection onto span(๐‘‹) we have

โ€–๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆโ€– โ‰ค โ€–๐‘ฆ โˆ’ ๐‘งโ€– for any ๐‘ง โˆˆ span(๐‘‹)

Because ๐‘‹๐‘ โˆˆ span(๐‘‹)

โ€–๐‘ฆ โˆ’ ๐‘‹ ๐›ฝโ€–ฬ‚ โ‰ค โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€– for any ๐‘ โˆˆ R๐พ

This is what we aimed to show
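The closed-form solution can be checked against np.linalg.lstsq, which solves the same least squares problem; the data below are arbitrary random draws:

```python
import numpy as np

np.random.seed(2)
N, K = 8, 3
X = np.random.randn(N, K)        # N > K: an overdetermined system
y = np.random.randn(N)

beta_formula = np.linalg.inv(X.T @ X) @ X.T @ y
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_formula, beta_lstsq))  # True
```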

23.7 Least Squares Regression

Letโ€™s apply the theory of orthogonal projection to least squares regression


This approach provides insights about many geometric properties of linear regression
We treat only some examples

23.7.1 Squared Risk Measures

Given pairs (๐‘ฅ, ๐‘ฆ) โˆˆ R๐พ ร— R, consider choosing ๐‘“ โˆถ R๐พ โ†’ R to minimize the risk

๐‘…(๐‘“) โˆถ= E [(๐‘ฆ โˆ’ ๐‘“(๐‘ฅ))2 ]

If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:

1 ๐‘
min โˆ‘(๐‘ฆ โˆ’ ๐‘“(๐‘ฅ๐‘› ))2
๐‘“โˆˆโ„ฑ ๐‘ ๐‘›=1 ๐‘›

Minimizing this expression is called empirical risk minimization


The set โ„ฑ is sometimes called the hypothesis space
The theory of statistical learning tells us that to prevent overfitting we should take the set โ„ฑ
to be relatively simple
If we let โ„ฑ be the class of linear functions and drop the constant 1/๐‘ (which does not affect the minimizer), the problem becomes

๐‘
min โˆ‘(๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2
๐‘โˆˆR๐พ
๐‘›=1

This is the sample linear least squares problem

23.7.2 Solution

Define the matrices

๐‘ฆ โˆถ= (๐‘ฆ1 , ๐‘ฆ2 , โ€ฆ , ๐‘ฆ๐‘ )โ€ฒ and ๐‘ฅ๐‘› โˆถ= (๐‘ฅ๐‘›1 , ๐‘ฅ๐‘›2 , โ€ฆ , ๐‘ฅ๐‘›๐พ )โ€ฒ = the ๐‘›-th observation on all regressors

and the ๐‘ ร— ๐พ matrix

๐‘‹ โˆถ= (๐‘ฅโ€ฒ1 , ๐‘ฅโ€ฒ2 , โ€ฆ , ๐‘ฅโ€ฒ๐‘ )โ€ฒ , whose ๐‘›-th row is ๐‘ฅโ€ฒ๐‘› , so that the (๐‘›, ๐‘˜)-th element of ๐‘‹ is ๐‘ฅ๐‘›๐‘˜

We assume throughout that ๐‘ > ๐พ and ๐‘‹ is full column rank


๐‘
If you work through the algebra, you will be able to verify that โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€–2 = โˆ‘๐‘›=1 (๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2
Since monotone transforms donโ€™t affect minimizers, we have

๐‘
min โˆ‘(๐‘ฆ๐‘› โˆ’ ๐‘โ€ฒ ๐‘ฅ๐‘› )2 = min โ€–๐‘ฆ โˆ’ ๐‘‹๐‘โ€–
๐‘โˆˆR๐พ ๐‘โˆˆR๐พ
๐‘›=1

By our results about overdetermined linear systems of equations, the solution is

๐›ฝ ฬ‚ โˆถ= (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ

Let ๐‘ƒ and ๐‘€ be the projection and annihilator associated with ๐‘‹:

๐‘ƒ โˆถ= ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and ๐‘€ โˆถ= ๐ผ โˆ’ ๐‘ƒ

The vector of fitted values is

๐‘ฆ ฬ‚ โˆถ= ๐‘‹ ๐›ฝ ฬ‚ = ๐‘ƒ ๐‘ฆ

The vector of residuals is

๐‘ขฬ‚ โˆถ= ๐‘ฆ โˆ’ ๐‘ฆ ฬ‚ = ๐‘ฆ โˆ’ ๐‘ƒ ๐‘ฆ = ๐‘€ ๐‘ฆ

Here are some more standard definitions:

โ€ข The total sum of squares is TSS โˆถ= โ€–๐‘ฆโ€–2
โ€ข The sum of squared residuals is SSR โˆถ= โ€–๐‘ขโ€–ฬ‚ 2
โ€ข The explained sum of squares is ESS โˆถ= โ€–๐‘ฆโ€–ฬ‚ 2

These quantities satisfy

TSS = ESS + SSR

We can prove this easily using the OPT


From the OPT we have ๐‘ฆ = ๐‘ฆ ฬ‚ + ๐‘ขฬ‚ and ๐‘ขฬ‚ โŸ‚ ๐‘ฆ ฬ‚
Applying the Pythagorean law completes the proof
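Here is a numerical confirmation of TSS = ESS + SSR on arbitrary random data:

```python
import numpy as np

np.random.seed(3)
N, K = 10, 3
X = np.random.randn(N, K)
y = np.random.randn(N)

P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y          # fitted values
u_hat = y - y_hat      # residuals

TSS, ESS, SSR = y @ y, y_hat @ y_hat, u_hat @ u_hat
print(np.allclose(TSS, ESS + SSR))   # True
```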

23.8 Orthogonalization and Decomposition

Letโ€™s return to the connection between linear independence and orthogonality touched on
above
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent sets
The next section gives details

23.8.1 Gram-Schmidt Orthogonalization

Theorem For each linearly independent set {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } โŠ‚ R๐‘› , there exists an orthonormal
set {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } with

span{๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘– } = span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘– } for ๐‘– = 1, โ€ฆ , ๐‘˜

The Gram-Schmidt orthogonalization procedure constructs an orthonormal set {๐‘ข1 , ๐‘ข2 , โ€ฆ , ๐‘ข๐‘˜ }
One description of this procedure is as follows:

โ€ข For ๐‘– = 1, โ€ฆ , ๐‘˜, form ๐‘†๐‘– โˆถ= span{๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘– } and ๐‘†๐‘–โŸ‚


โ€ข Set ๐‘ฃ1 = ๐‘ฅ1
โ€ข For ๐‘– โ‰ฅ 2 set ๐‘ฃ๐‘– โˆถ= ๐ธ๐‘†ฬ‚ ๐‘–โˆ’1
โŸ‚ ๐‘ฅ๐‘– and ๐‘ข๐‘– โˆถ= ๐‘ฃ๐‘– /โ€–๐‘ฃ๐‘– โ€–

The sequence ๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ has the stated properties


A Gram-Schmidt orthogonalization construction is a key idea behind the Kalman filter described in A First Look at the Kalman Filter
In some exercises below, you are asked to implement this algorithm and test it using projection

23.8.2 QR Decomposition

The following result uses the preceding algorithm to produce a useful decomposition
Theorem If ๐‘‹ is ๐‘› ร— ๐‘˜ with linearly independent columns, then there exists a factorization
๐‘‹ = ๐‘„๐‘… where

โ€ข ๐‘… is ๐‘˜ ร— ๐‘˜, upper triangular, and nonsingular


โ€ข ๐‘„ is ๐‘› ร— ๐‘˜ with orthonormal columns

Proof sketch: Let

โ€ข ๐‘ฅ๐‘— โˆถ=๐‘— (๐‘‹)
โ€ข {๐‘ข1 , โ€ฆ , ๐‘ข๐‘˜ } be orthonormal with the same span as {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘˜ } (to be constructed using
Gramโ€“Schmidt)
โ€ข ๐‘„ be formed from cols ๐‘ข๐‘–

Since ๐‘ฅ๐‘— โˆˆ span{๐‘ข1 , โ€ฆ , ๐‘ข๐‘— }, we have

๐‘—
๐‘ฅ๐‘— = โˆ‘โŸจ๐‘ข๐‘– , ๐‘ฅ๐‘— โŸฉ๐‘ข๐‘– for ๐‘— = 1, โ€ฆ , ๐‘˜
๐‘–=1

Some rearranging gives ๐‘‹ = ๐‘„๐‘…
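NumPy computes this factorization directly via np.linalg.qr, which returns the reduced (๐‘› ร— ๐‘˜) factorization by default. A quick sketch checking the stated properties on arbitrary random data:

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(5, 3)     # linearly independent columns (a.s.)
Q, R = np.linalg.qr(X)        # reduced factorization: Q is 5 x 3, R is 3 x 3

print(np.allclose(Q.T @ Q, np.eye(3)))  # Q has orthonormal columns
print(np.allclose(X, Q @ R))            # X = QR
print(np.allclose(R, np.triu(R)))       # R is upper triangular
```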

23.8.3 Linear Regression via QR Decomposition

For matrices ๐‘‹ and ๐‘ฆ that overdetermine ๐›ฝ in the linear equation system ๐‘ฆ = ๐‘‹๐›ฝ, we found the least squares approximator ๐›ฝ ฬ‚ = (๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ ๐‘ฆ
Using the QR decomposition ๐‘‹ = ๐‘„๐‘… gives

๐›ฝ ฬ‚ = (๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘„๐‘…)โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ
= (๐‘…โ€ฒ ๐‘…)โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ
= ๐‘…โˆ’1 (๐‘…โ€ฒ )โˆ’1 ๐‘…โ€ฒ ๐‘„โ€ฒ ๐‘ฆ = ๐‘…โˆ’1 ๐‘„โ€ฒ ๐‘ฆ

Numerical routines would in this case use the alternative form ๐‘…๐›ฝ ฬ‚ = ๐‘„โ€ฒ ๐‘ฆ and back substitution
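Here is a sketch of this alternative route, using scipy.linalg.solve_triangular for the back substitution step and arbitrary random data:

```python
import numpy as np
from scipy.linalg import solve_triangular

np.random.seed(5)
N, K = 9, 3
X = np.random.randn(N, K)
y = np.random.randn(N)

beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y   # via the normal equations

Q, R = np.linalg.qr(X)
beta_qr = solve_triangular(R, Q.T @ y)           # back substitution on R beta = Q'y

print(np.allclose(beta_normal, beta_qr))         # True
```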

23.9 Exercises

23.9.1 Exercise 1

Show that, for any linear subspace ๐‘† โŠ‚ R๐‘› , ๐‘† โˆฉ ๐‘† โŸ‚ = {0}

23.9.2 Exercise 2

Let ๐‘ƒ = ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and let ๐‘€ = ๐ผ โˆ’ ๐‘ƒ . Show that ๐‘ƒ and ๐‘€ are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?

23.9.3 Exercise 3

Using Gram-Schmidt orthogonalization, produce a linear projection of ๐‘ฆ onto the column space of ๐‘‹ and verify this using the projection matrix ๐‘ƒ โˆถ= ๐‘‹(๐‘‹ โ€ฒ ๐‘‹)โˆ’1 ๐‘‹ โ€ฒ and also using QR decomposition, where:

๐‘ฆ โˆถ= (1, 3, โˆ’3)โ€ฒ ,

and

๐‘‹ โˆถ= the 3 ร— 2 matrix with rows (1, 0), (0, โˆ’6) and (2, 2)

23.10 Solutions

23.10.1 Exercise 1

If ๐‘ฅ โˆˆ ๐‘† and ๐‘ฅ โˆˆ ๐‘† โŸ‚ , then we have in particular that โŸจ๐‘ฅ, ๐‘ฅโŸฉ = 0, ut then ๐‘ฅ = 0

23.10.2 Exercise 2

Symmetry and idempotence of ๐‘€ and ๐‘ƒ can be established using standard rules for matrix
algebra. The intuition behind idempotence of ๐‘€ and ๐‘ƒ is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself.)

23.10.3 Exercise 3

Hereโ€™s a function that computes the orthonormal vectors using the GS algorithm given in the
lecture

In [1]: import numpy as np

def gram_schmidt(X):
"""
Implements Gram-Schmidt orthogonalization.

Parameters
----------
X : an n x k array with linearly independent columns

Returns
-------
U : an n x k array with orthonormal columns

"""

# Set up
n, k = X.shape
U = np.empty((n, k))

I = np.eye(n)

# The first col of U is just the normalized first col of X


v1 = X[:,0]
U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

for i in range(1, k):


# Set up
b = X[:, i] # The vector we're going to project
Z = X[:, 0:i] # First i columns of X

# Project onto the orthogonal complement of the col span of Z


M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
u = M @ b

# Normalize
U[:, i] = u / np.sqrt(np.sum(u * u))

return U

Here are the arrays weโ€™ll work with

In [2]: y = [1, 3, -3]

X = [[1, 0],
[0, -6],
[2, 2]]

X, y = [np.asarray(z) for z in (X, y)]

First, letโ€™s try projection of ๐‘ฆ onto the column space of ๐‘‹ using the ordinary matrix expression:

In [3]: Py1 = X @ np.linalg.inv(X.T @ X) @ X.T @ y


Py1

Out[3]: array([-0.56521739, 3.26086957, -2.2173913 ])

Now letโ€™s do the same using an orthonormal basis created from our gram_schmidt function

In [4]: U = gram_schmidt(X)
U

Out[4]: array([[ 0.4472136 , -0.13187609],


[ 0. , -0.98907071],
[ 0.89442719, 0.06593805]])

In [5]: Py2 = U @ U.T @ y


Py2

Out[5]: array([-0.56521739, 3.26086957, -2.2173913 ])

This is the same answer. So far so good. Finally, letโ€™s try the same thing but with the basis
obtained via QR decomposition:

In [6]: from scipy.linalg import qr

Q, R = qr(X, mode='economic')
Q

Out[6]: array([[-0.4472136 , -0.13187609],


[-0. , -0.98907071],
[-0.89442719, 0.06593805]])

In [7]: Py3 = Q @ Q.T @ y


Py3

Out[7]: array([-0.56521739, 3.26086957, -2.2173913 ])

Again, we obtain the same answer


24 LLN and CLT

24.1 Contents

โ€ข Overview 24.2

โ€ข Relationships 24.3

โ€ข LLN 24.4

โ€ข CLT 24.5

โ€ข Exercises 24.6

โ€ข Solutions 24.7

24.2 Overview

This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold
In addition, we examine several useful extensions of the classical theorems, such as

โ€ข The delta method, for smooth functions of random variables


โ€ข The multivariate case

Some of these extensions are presented as exercises

24.3 Relationships

The CLT refines the LLN


The LLN gives conditions under which sample moments converge to population moments as
sample size increases
The CLT provides information about the rate at which sample moments converge to population moments as sample size increases

24.4 LLN

We begin with the law of large numbers, which tells us when sample averages will converge to
their population means

24.4.1 The Classical LLN

The classical law of large numbers concerns independent and identically distributed (IID)
random variables
Here is the strongest version of the classical LLN, known as Kolmogorovโ€™s strong law
Let ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› be independent and identically distributed scalar random variables, with com-
mon distribution ๐น
When it exists, let ๐œ‡ denote the common mean of this sample:

๐œ‡ โˆถ= E๐‘‹ = โˆซ ๐‘ฅ๐น (๐‘‘๐‘ฅ)

In addition, let

1 ๐‘›
๐‘‹ฬ„ ๐‘› โˆถ= โˆ‘ ๐‘‹๐‘–
๐‘› ๐‘–=1

Kolmogorovโ€™s strong law states that, if E|๐‘‹| is finite, then

P {๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ as ๐‘› โ†’ โˆž} = 1 (1)

What does this last expression mean?


Letโ€™s think about it from a simulation perspective, imagining for a moment that our computer can generate perfect random samples (which of course it canโ€™t)
Letโ€™s also imagine that we can generate infinite sequences so that the statement ๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ can
be evaluated
In this setting, Eq. (1) should be interpreted as meaning that the probability of the computer
producing a sequence where ๐‘‹ฬ„ ๐‘› โ†’ ๐œ‡ fails to occur is zero

24.4.2 Proof

The proof of Kolmogorovโ€™s strong law is nontrivial โ€“ see, for example, theorem 8.3.5 of [38]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition

The version we prove is as follows: If ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is IID with E๐‘‹๐‘–2 < โˆž, then, for any ๐œ– > 0,
we have

P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ†’ 0 as ๐‘›โ†’โˆž (2)

(This version is weaker because we claim only convergence in probability rather than almost
sure convergence, and assume a finite second moment)
To see that this is so, fix ๐œ– > 0, and let ๐œŽ2 be the variance of each ๐‘‹๐‘–
Recall the Chebyshev inequality, which tells us that

E[(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)2 ]
P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ‰ค (3)
๐œ–2

Now observe that

E[(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)2 ] = E{[(1/๐‘›) โˆ‘๐‘–=1,โ€ฆ,๐‘› (๐‘‹๐‘– โˆ’ ๐œ‡)]2 }
             = (1/๐‘›2 ) โˆ‘๐‘–=1,โ€ฆ,๐‘› โˆ‘๐‘—=1,โ€ฆ,๐‘› E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡)
             = (1/๐‘›2 ) โˆ‘๐‘–=1,โ€ฆ,๐‘› E(๐‘‹๐‘– โˆ’ ๐œ‡)2
             = ๐œŽ2 /๐‘›

Here the crucial step is at the third equality, which follows from independence
Independence means that if ๐‘– โ‰  ๐‘—, then the covariance term E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) drops out
As a result, ๐‘›2 โˆ’ ๐‘› terms vanish, leading us to a final expression that goes to zero in ๐‘›
Combining our last result with Eq. (3), we come to the estimate

P {|๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡| โ‰ฅ ๐œ–} โ‰ค ๐œŽ2 /(๐‘›๐œ–2 ) (4)

The claim in Eq. (2) is now clear
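The key intermediate result in the derivation — that the variance of ๐‘‹ฬ„ ๐‘› equals ๐œŽ2 /๐‘› under independence — is easy to confirm by simulation. The sketch below uses standard normal draws, so ๐œŽ2 = 1:

```python
import numpy as np

np.random.seed(6)
n, reps = 50, 200_000
draws = np.random.randn(reps, n)      # reps independent samples of size n
sample_means = draws.mean(axis=1)

print(sample_means.var())             # close to sigma^2 / n = 1 / 50 = 0.02
```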


Of course, if the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is correlated, then the cross-product terms E(๐‘‹๐‘– โˆ’
๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) are not necessarily zero
While this doesnโ€™t mean that the same line of argument is impossible, it does mean that if we
want a similar result then the covariances should be โ€œalmost zeroโ€ for โ€œmostโ€ of these terms
In a long sequence, this would be true if, for example, E(๐‘‹๐‘– โˆ’ ๐œ‡)(๐‘‹๐‘— โˆ’ ๐œ‡) approached zero
when the difference between ๐‘– and ๐‘— became large
In other words, the LLN can still work if the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› has a kind of โ€œasymptotic
independenceโ€, in the sense that correlation falls to zero as variables become further apart in
the sequence
This idea is very important in time series analysis, and weโ€™ll come across it again soon enough

24.4.3 Illustration

Letโ€™s now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolution of ๐‘‹ฬ„ ๐‘› as ๐‘› increases
Below is a figure that does just this
It shows IID observations from three different distributions and plots ๐‘‹ฬ„ ๐‘› against ๐‘› in each
case
The dots represent the underlying observations ๐‘‹๐‘– for ๐‘– = 1, โ€ฆ , 100
In each of the three cases, convergence of ๐‘‹ฬ„ ๐‘› to ๐œ‡ occurs as predicted

In [1]: import random


import numpy as np
from scipy.stats import t, beta, lognorm, expon, gamma, poisson
import matplotlib.pyplot as plt
%matplotlib inline

n = 100

# == Arbitrary collection of distributions == #


distributions = {"student's t with 10 degrees of freedom": t(10),
"ฮฒ(2, 2)": beta(2, 2),
"lognormal LN(0, 1/2)": lognorm(0.5),
"ฮณ(5, 1/2)": gamma(5, scale=2),
"poisson(4)": poisson(4),
"exponential with ฮป = 1": expon()}

# == Create a figure and some axes == #


num_plots = 3
fig, axes = plt.subplots(num_plots, 1, figsize=(20, 20))

# == Set some plotting parameters to improve layout == #


bbox = (0., 1.02, 1., .102)
legend_args = {'ncol': 2,
'bbox_to_anchor': bbox,
'loc': 3,
'mode': 'expand'}
plt.subplots_adjust(hspace=0.5)

for ax in axes:
# == Choose a randomly selected distribution == #
name = random.choice(list(distributions.keys()))
distribution = distributions.pop(name)

# == Generate n draws from the distribution == #


data = distribution.rvs(n)

# == Compute sample mean at each n == #


sample_mean = np.empty(n)
for i in range(n):
sample_mean[i] = np.mean(data[:i+1])

# == Plot == #
ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
axlabel = '$\\bar X_n$ for $X_i \sim$' + name
ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
m = distribution.mean()
ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')
ax.vlines(list(range(n)), m, data, lw=0.2)
ax.legend(**legend_args)

plt.show()

The three distributions are chosen at random from a selection stored in the dictionary dis-
tributions

24.4.4 Infinite Mean

What happens if the condition E|๐‘‹| < โˆž in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy-tailed โ€” the best-known example is the Cauchy distribution, which has density

๐‘“(๐‘ฅ) = 1/(๐œ‹(1 + ๐‘ฅ2 )) (๐‘ฅ โˆˆ R)

The next figure shows 100 independent draws from this distribution

In [2]: from scipy.stats import cauchy

n = 100
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))


data = distribution.rvs(n)

ax.plot(list(range(n)), data, linestyle='', marker='o', alpha=0.5)


ax.vlines(list(range(n)), 0, data, lw=0.2)
ax.set_title(f"{n} observations from the Cauchy distribution")

plt.show()

Notice how extreme observations are far more prevalent here than the previous figure
Letโ€™s now have a look at the behavior of the sample mean

In [3]: n = 1000
distribution = cauchy()

fig, ax = plt.subplots(figsize=(10, 6))


data = distribution.rvs(n)

# == Compute sample mean at each n == #


sample_mean = np.empty(n)

for i in range(1, n):


sample_mean[i] = np.mean(data[:i])

# == Plot == #
ax.plot(list(range(n)), sample_mean, 'r-', lw=3, alpha=0.6,
label='$\\bar X_n$')
ax.plot(list(range(n)), [0] * n, 'k--', lw=0.5)
ax.legend()

plt.show()

Here weโ€™ve increased ๐‘› to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take ๐‘› even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

๐œ™(๐‘ก) = E๐‘’๐‘–๐‘ก๐‘‹ = โˆซ ๐‘’๐‘–๐‘ก๐‘ฅ ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ = ๐‘’โˆ’|๐‘ก| (5)

Using independence, the characteristic function of the sample mean becomes

ฬ„ ๐‘ก ๐‘›
E๐‘’๐‘–๐‘ก๐‘‹๐‘› = E exp {๐‘– โˆ‘ ๐‘‹๐‘— }
๐‘› ๐‘—=1
๐‘›
๐‘ก
= E โˆ exp {๐‘– ๐‘‹๐‘— }
๐‘—=1
๐‘›
๐‘›
๐‘ก
= โˆ E exp {๐‘– ๐‘‹๐‘— } = [๐œ™(๐‘ก/๐‘›)]๐‘›
๐‘—=1
๐‘›

In view of Eq. (5), this is just ๐‘’โˆ’|๐‘ก|


Thus, in the case of the Cauchy distribution, the sample mean itself has the very same
Cauchy distribution, regardless of ๐‘›
In particular, the sequence ๐‘‹ฬ„ ๐‘› does not converge to a point
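This claim can be supported by simulation: a Kolmogorov-Smirnov test (an addition here, not part of the argument above) should find that draws of ๐‘‹ฬ„ ๐‘› are consistent with a standard Cauchy distribution:

```python
import numpy as np
from scipy.stats import cauchy, kstest

np.random.seed(7)
n, reps = 100, 5000
data = cauchy.rvs(size=(reps, n))
sample_means = data.mean(axis=1)

# Under the claim, each sample mean is itself standard Cauchy
stat, pvalue = kstest(sample_means, 'cauchy')
print(stat)    # a small KS statistic: consistent with the Cauchy claim
```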

24.5 CLT

Next, we turn to the central limit theorem, which tells us about the distribution of the deviation between sample averages and population means

24.5.1 Statement of the Theorem

The central limit theorem is one of the most remarkable results in all of mathematics
In the classical IID setting, it tells us the following:
If the sequence ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› is IID, with common mean ๐œ‡ and common variance ๐œŽ2 โˆˆ (0, โˆž),
then

โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡) โ†’๐‘‘ ๐‘ (0, ๐œŽ2 ) as ๐‘› โ†’ โˆž (6)

๐‘‘
Here โ†’ ๐‘ (0, ๐œŽ2 ) indicates convergence in distribution to a centered (i.e, zero mean) normal
with standard deviation ๐œŽ

24.5.2 Intuition

The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with characteristic functions (see, e.g., theorem 9.5.6 of [38])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact, all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables
In particular, let ๐‘‹๐‘– be binary, with P{๐‘‹๐‘– = 0} = P{๐‘‹๐‘– = 1} = 0.5, and let ๐‘‹1 , โ€ฆ , ๐‘‹๐‘› be
independent
๐‘›
Think of ๐‘‹๐‘– = 1 as a โ€œsuccessโ€, so that ๐‘Œ๐‘› = โˆ‘๐‘–=1 ๐‘‹๐‘– is the number of successes in ๐‘› trials
The next figure plots the probability mass function of ๐‘Œ๐‘› for ๐‘› = 1, 2, 4, 8

In [4]: from scipy.stats import binom

fig, axes = plt.subplots(2, 2, figsize=(10, 6))


plt.subplots_adjust(hspace=0.4)
axes = axes.flatten()
ns = [1, 2, 4, 8]
dom = list(range(9))

for ax, n in zip(axes, ns):


b = binom(n, 0.5)
ax.bar(dom, b.pmf(dom), alpha=0.6, align='center')
ax.set(xlim=(-0.5, 8.5), ylim=(0, 0.55),
xticks=list(range(9)), yticks=(0, 0.2, 0.4),
title=f'$n = {n}$')

plt.show()

When ๐‘› = 1, the distribution is flat โ€” one success or no successes have the same probability
When ๐‘› = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point ๐‘˜ = 1
The reason is that there are more ways to get 1 success (โ€œfail then succeedโ€ or โ€œsucceed then
failโ€) than to get zero or two successes
Moreover, the two trials are independent, so the outcomes โ€œfail then succeedโ€ and โ€œsucceed
then failโ€ are just as likely as the outcomes โ€œfail then failโ€ and โ€œsucceed then succeedโ€
(If there were positive correlation, say, then โ€œsucceed then failโ€ would be less likely than โ€œsucceed then succeedโ€)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For ๐‘› = 4 and ๐‘› = 8 we again get a peak at the โ€œmiddleโ€ value (halfway between the mini-
mum and the maximum possible value)
The intuition is the same โ€” there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes even more pronounced
We are witnessing the binomial approximation of the normal distribution

24.5.3 Simulation 1

Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition
To this end, we now perform the following simulation

1. Choose an arbitrary distribution ๐น for the underlying observations ๐‘‹๐‘–


2. Generate independent draws of ๐‘Œ๐‘› โˆถ= โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡)
3. Use these draws to compute some measure of their distribution โ€” such as a histogram
4. Compare the latter to ๐‘ (0, ๐œŽ2 )

Hereโ€™s some code that does exactly this for the exponential distribution ๐น (๐‘ฅ) = 1 โˆ’ ๐‘’โˆ’๐œ†๐‘ฅ
(Please experiment with other choices of ๐น , but remember that, to conform with the conditions of the CLT, the distribution must have a finite second moment)

In [5]: from scipy.stats import norm

# == Set parameters == #
n = 250 # Choice of n
k = 100000 # Number of draws of Y_n
distribution = expon(scale=2) # Exponential distribution, ฮป = 1/2
ฮผ, s = distribution.mean(), distribution.std()

# == Draw underlying RVs. Each row contains a draw of X_1,..,X_n == #


data = distribution.rvs((k, n))
# == Compute mean of each row, producing k draws of \bar X_n == #
sample_means = data.mean(axis=1)
# == Generate observations of Y_n == #
Y = np.sqrt(n) * (sample_means - ฮผ)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \sigma^2)$')
ax.legend()

plt.show()

Notice the absence of for loops โ€” every operation is vectorized, meaning that the major calculations are all shifted to highly optimized C code

The fit to the normal density is already tight and can be further improved by increasing n
You can also experiment with other specifications of ๐น

24.5.4 Simulation 2

Our next simulation is somewhat like the first, except that we aim to track the distribution of ๐‘Œ๐‘› โˆถ= โˆš๐‘›(๐‘‹ฬ„ ๐‘› โˆ’ ๐œ‡) as ๐‘› increases
In the simulation, weโ€™ll be working with random variables having ๐œ‡ = 0
Thus, when ๐‘› = 1, we have ๐‘Œ1 = ๐‘‹1 , so the first distribution is just the distribution of the
underlying random variable
For ๐‘› = 2, the distribution of ๐‘Œ2 is that of (๐‘‹1 + ๐‘‹2 )/โˆš2, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of ๐‘Œ๐‘› will smooth out into a bell-shaped curve
The next figure shows this process for ๐‘‹๐‘– โˆผ ๐‘“, where ๐‘“ was specified as the convex combination of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for ๐‘“)
In the figure, the closest density is that of ๐‘Œ1 , while the furthest is that of ๐‘Œ5

In [6]: from scipy.stats import beta, gaussian_kde

from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection

beta_dist = beta(2, 2)

def gen_x_draws(k):
"""
Returns a flat array containing k independent draws from the
distribution of X, the underlying random variable. This distribution is
itself a convex combination of three beta distributions.
"""
bdraws = beta_dist.rvs((3, k))
# == Transform rows, so each represents a different distribution == #
bdraws[0, :] -= 0.5
bdraws[1, :] += 0.6
bdraws[2, :] -= 1.1
# == Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2} == #
js = np.random.randint(0, 3, size=k)
X = bdraws[js, np.arange(k)]
# == Rescale, so that the random variable is zero mean == #
m, sigma = X.mean(), X.std()
return (X - m) / sigma

nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))

# == Form a matrix Z such that each column is reps independent draws of X == #


Z = np.empty((reps, nmax))
for i in range(nmax):
Z[:, i] = gen_x_draws(reps)
# == Take cumulative sum across columns
S = Z.cumsum(axis=1)
# == Multiply j-th column by sqrt j == #
Y = (1 / np.sqrt(ns)) * S

# == Plot == #

fig = plt.figure(figsize = (10, 6))



ax = fig.gca(projection='3d')

a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)

# == Build verts == #
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
density = gaussian_kde(Y[:, n-1])
ys = density(xs)
verts.append(list(zip(xs, ys)))

poly = PolyCollection(verts, facecolors=[str(g) for g in greys])


poly.set_alpha(0.85)
ax.add_collection3d(poly, zs=ns, zdir='x')

ax.set(xlim3d=(1, nmax), xticks=(ns), ylabel='$Y_n$', zlabel='$p(y_n)$',
xlabel=("n"), yticks=((-3, 0, 3)), ylim3d=(a, b),
zlim3d=(0, 0.4), zticks=((0.2, 0.4)))
ax.invert_xaxis()
ax.view_init(30, 45) # Rotates the plot 30 deg on z axis and 45 deg on x axis
plt.show()

As expected, the distribution smooths out into a bell curve as ๐‘› increases


We leave you to investigate the simulation code above if you wish to know more
If you run the file from the ordinary IPython shell, the figure should pop up in a window that
you can rotate with your mouse, giving different views on the density sequence

24.5.5 The Multivariate Case

The law of large numbers and central limit theorem work just as nicely in multidimensional
settings
To state the results, letโ€™s recall some elementary facts about random vectors
A random vector X is just a sequence of ๐‘˜ random variables (๐‘‹1 , โ€ฆ , ๐‘‹๐‘˜ )

Each realization of X is an element of R๐‘˜


A collection of random vectors X1 , โ€ฆ , X๐‘› is called independent if, given any ๐‘› vectors
x1 , โ€ฆ , x๐‘› in R๐‘˜ , we have

P{X1 โ‰ค x1 , โ€ฆ , X๐‘› โ‰ค x๐‘› } = P{X1 โ‰ค x1 } ร— โ‹ฏ ร— P{X๐‘› โ‰ค x๐‘› }

(The vector inequality X โ‰ค x means that ๐‘‹๐‘— โ‰ค ๐‘ฅ๐‘— for ๐‘— = 1, โ€ฆ , ๐‘˜)


Let ๐œ‡๐‘— โˆถ= E[๐‘‹๐‘— ] for all ๐‘— = 1, โ€ฆ , ๐‘˜
The expectation E[X] of X is defined to be the vector of expectations:

E[๐‘‹1 ] ๐œ‡1
โŽ›
โŽœ E[๐‘‹2 ] โŽž
โŽŸ โŽ›
โŽœ ๐œ‡2 โŽž
โŽŸ
E[X] โˆถ= โŽœ
โŽœ โŽŸ
โŽŸ =โŽœ โŽŸ =โˆถ ๐œ‡
โŽœ โ‹ฎ โŽŸ โŽœโŽœ โ‹ฎ โŽŸโŽŸ
โŽ E[๐‘‹ ๐‘˜] ๐œ‡
โŽ  โŽ ๐‘˜ โŽ 

The variance-covariance matrix of random vector X is defined as

Var[X] โˆถ= E[(X โˆ’ ๐œ‡)(X โˆ’ ๐œ‡)โ€ฒ ]

Expanding this out, we get

E[(๐‘‹1 โˆ’ ๐œ‡1 )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹1 โˆ’ ๐œ‡1 )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )]


โŽ› E[(๐‘‹ โŽž
โŽœ 2 โˆ’ ๐œ‡2 )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹2 โˆ’ ๐œ‡2 )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )] โŽŸ
Var[X] = โŽœ
โŽœ โŽŸ
โŽŸ
โŽœ โ‹ฎ โ‹ฎ โ‹ฎ โŽŸ
โŽ E[(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )(๐‘‹1 โˆ’ ๐œ‡1 )] โ‹ฏ E[(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )(๐‘‹๐‘˜ โˆ’ ๐œ‡๐‘˜ )] โŽ 

The ๐‘—, ๐‘˜-th term is the scalar covariance between ๐‘‹๐‘— and ๐‘‹๐‘˜


With this notation, we can proceed to the multivariate LLN and CLT
Let X1 , โ€ฆ , X๐‘› be a sequence of independent and identically distributed random vectors, each
one taking values in R๐‘˜
Let ๐œ‡ be the vector E[X๐‘– ], and let ฮฃ be the variance-covariance matrix of X๐‘–
Interpreting vector addition and scalar multiplication in the usual way (i.e., pointwise), let

1 ๐‘›
Xฬ„ ๐‘› โˆถ= โˆ‘ X๐‘–
๐‘› ๐‘–=1

In this setting, the LLN tells us that

P {Xฬ„ ๐‘› โ†’ ๐œ‡ as ๐‘› โ†’ โˆž} = 1 (7)

Here Xฬ„ ๐‘› โ†’ ๐œ‡ means that โ€–Xฬ„ ๐‘› โˆ’ ๐œ‡โ€– โ†’ 0, where โ€– โ‹… โ€– is the standard Euclidean norm


The CLT tells us that, provided ฮฃ is finite,

โˆš ๐‘‘
๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, ฮฃ) as ๐‘›โ†’โˆž (8)
400 24. LLN AND CLT

24.6 Exercises

24.6.1 Exercise 1

One very useful consequence of the central limit theorem is as follows


Assume the conditions of the CLT as stated above
If ๐‘” โˆถ R โ†’ R is differentiable at ๐œ‡ and ๐‘”โ€ฒ (๐œ‡) โ‰  0, then

โˆš ๐‘‘
๐‘›{๐‘”(๐‘‹ฬ„ ๐‘› ) โˆ’ ๐‘”(๐œ‡)} โ†’ ๐‘ (0, ๐‘”โ€ฒ (๐œ‡)2 ๐œŽ2 ) as ๐‘›โ†’โˆž (9)

This theorem is used frequently in statistics to obtain the asymptotic distribution of estima-
tors โ€” many of which can be expressed as functions of sample means
(These kinds of results are often said to use the โ€œdelta methodโ€)
The proof is based on a Taylor expansion of ๐‘” around the point ๐œ‡
Taking the result as given, let the distribution ๐น of each ๐‘‹๐‘– be uniform on [0, ๐œ‹/2] and let
๐‘”(๐‘ฅ) = sin(๐‘ฅ)
โˆš
Derive the asymptotic distribution of ๐‘›{๐‘”(๐‘‹ฬ„ ๐‘› ) โˆ’ ๐‘”(๐œ‡)} and illustrate convergence in the
same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, ๐œ‹/2] with [0, ๐œ‹]?
What is the source of the problem?

24.6.2 Exercise 2

Hereโ€™s a result thatโ€™s often used in developing statistical tests, and is connected to the multi-
variate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that

1. X1 , โ€ฆ , X๐‘› is a sequence of IID random vectors, each taking values in R๐‘˜


2. ๐œ‡ โˆถ= E[X๐‘– ], and ฮฃ is the variance-covariance matrix of X๐‘–
3. The convergence

โˆš ๐‘‘
๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, ฮฃ) (10)

is valid
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in R๐‘˜ and A is constant and ๐‘˜ ร— ๐‘˜, then

Var[AX] = A Var[X]Aโ€ฒ

๐‘‘
Second, by the continuous mapping theorem, if Z๐‘› โ†’ Z in R๐‘˜ and A is constant and ๐‘˜ ร— ๐‘˜,
then

๐‘‘
AZ๐‘› โ†’ AZ

Third, if S is a ๐‘˜ ร— ๐‘˜ symmetric positive definite matrix, then there exists a symmetric posi-
tive definite matrix Q, called the inverse square root of S, such that

QSQโ€ฒ = I

Here I is the ๐‘˜ ร— ๐‘˜ identity matrix
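Anticipating the hint at the end of this exercise, here is a small sketch (the 2 × 2 matrix S is our own choice) of how such a Q can be computed with scipy.linalg.sqrtm and verified

```python
import numpy as np
from scipy.linalg import sqrtm, inv

# Sketch: for a symmetric positive definite S (our own 2 x 2 example),
# Q = (S^{1/2})^{-1} satisfies Q S Q' = I
S = np.array([[2.0, 0.5],
              [0.5, 1.0]])

Q = inv(sqrtm(S))          # inverse square root of S
I_check = Q @ S @ Q.T      # numerically the 2 x 2 identity
```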


Putting these things together, your first exercise is to show that if $\mathbf Q$ is the inverse square root of $\Sigma$, then

โˆš ๐‘‘
Z๐‘› โˆถ= ๐‘›Q(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ Z โˆผ ๐‘ (0, I)

Applying the continuous mapping theorem one more time tells us that

๐‘‘
โ€–Z๐‘› โ€–2 โ†’ โ€–Zโ€–2

Given the distribution of Z, we conclude that

๐‘‘
๐‘›โ€–Q(Xฬ„ ๐‘› โˆ’ ๐œ‡)โ€–2 โ†’ ๐œ’2 (๐‘˜) (11)

where ๐œ’2 (๐‘˜) is the chi-squared distribution with ๐‘˜ degrees of freedom


(Recall that ๐‘˜ is the dimension of X๐‘– , the underlying random vectors)
Your second exercise is to illustrate the convergence in Eq. (11) with a simulation
In doing so, let

๐‘Š๐‘–
X๐‘– โˆถ= ( )
๐‘ˆ๐‘– + ๐‘Š ๐‘–

where

โ€ข each ๐‘Š๐‘– is an IID draw from the uniform distribution on [โˆ’1, 1]


โ€ข each ๐‘ˆ๐‘– is an IID draw from the uniform distribution on [โˆ’2, 2]
โ€ข ๐‘ˆ๐‘– and ๐‘Š๐‘– are independent of each other

Hints:

1. scipy.linalg.sqrtm(A) computes the square root of A. You still need to invert it


2. You should be able to work out ฮฃ from the preceding information

24.7 Solutions

24.7.1 Exercise 1

Here is one solution

In [7]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""

from scipy.stats import uniform

# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
ฮผ, s = distribution.mean(), distribution.std()

g = np.sin
g_prime = np.cos

# == Generate obs of sqrt{n} (g(X_n) - g(ฮผ)) == #


data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1) # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(ฮผ))

# == Plot == #
asymptotic_sd = g_prime(ฮผ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(ฮผ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()

What happens when you replace [0, ๐œ‹/2] with [0, ๐œ‹]?
In this case, the mean ๐œ‡ of this distribution is ๐œ‹/2, and since ๐‘”โ€ฒ = cos, we have ๐‘”โ€ฒ (๐œ‡) = 0
Hence the conditions of the delta theorem are not satisfied
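To see the failure numerically, here is a minimal sketch (the values of n and the replication count are our own choices): since $g'(\mu) = 0$, a second-order Taylor expansion gives $\sqrt{n}\{g(\bar X_n) - g(\mu)\} \approx -\sqrt{n}(\bar X_n - \mu)^2/2 = O(1/\sqrt{n})$, so the scaled errors collapse toward zero instead of settling on a nondegenerate normal limit

```python
import numpy as np
from scipy.stats import uniform

# Sketch: with F uniform on [0, π], μ = π/2 and g'(μ) = cos(π/2) = 0, so the
# √n scaling of the delta method is too weak.
# (n and the replication count below are our own choices.)
np.random.seed(0)
n = 10_000
replications = 1_000
dist = uniform(loc=0, scale=np.pi)       # uniform on [0, π]
μ = dist.mean()                          # π / 2

sample_means = dist.rvs((replications, n)).mean(axis=1)
scaled_errors = np.sqrt(n) * (np.sin(sample_means) - np.sin(μ))
```

Since $\sin(x) \leq 1$ everywhere, every scaled error is nonpositive, and all of them concentrate near zero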

24.7.2 Exercise 2

First we want to verify the claim that

โˆš ๐‘‘
๐‘›Q(Xฬ„ ๐‘› โˆ’ ๐œ‡) โ†’ ๐‘ (0, I)

This is straightforward given the facts presented in the exercise


Let

โˆš
Y๐‘› โˆถ= ๐‘›(Xฬ„ ๐‘› โˆ’ ๐œ‡) and Y โˆผ ๐‘ (0, ฮฃ)

By the multivariate CLT and the continuous mapping theorem, we have

๐‘‘
QY๐‘› โ†’ QY

Since linear combinations of normal random variables are normal, the vector QY is also nor-
mal
Its mean is clearly 0, and its variance-covariance matrix is

Var[QY] = QVar[Y]Qโ€ฒ = QฮฃQโ€ฒ = I

๐‘‘
In conclusion, QY๐‘› โ†’ QY โˆผ ๐‘ (0, I), which is what we aimed to show
Now we turn to the simulation exercise
Our solution is as follows

In [8]: from scipy.stats import chi2


from scipy.linalg import inv, sqrtm

# == Set parameters == #
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2) # Uniform(-1, 1)
du = uniform(loc=-2, scale=4) # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
ฮฃ = ((vw, vw), (vw, vw + vu))
ฮฃ = np.array(ฮฃ)

# == Compute ฮฃ^{-1/2} == #
Q = inv(sqrtm(ฮฃ))

# == Generate observations of the normalized sample mean == #


error_obs = np.empty((2, replications))
for i in range(replications):
# == Generate one sequence of bivariate shocks == #
X = np.empty((2, n))
W = dw.rvs(n)
U = du.rvs(n)

# == Construct the n observations of the random vector == #


X[0, :] = W
X[1, :] = W + U
# == Construct the i-th observation of Y_n == #
error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)

# == Premultiply by Q and then take the squared norm == #


temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)

# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
25 Linear State Space Models

25.1 Contents

โ€ข Overview 25.2

โ€ข The Linear State Space Model 25.3

โ€ข Distributions and Moments 25.4

โ€ข Stationarity and Ergodicity 25.5

โ€ข Noisy Observations 25.6

โ€ข Prediction 25.7

โ€ข Code 25.8

โ€ข Exercises 25.9

โ€ข Solutions 25.10

โ€œWe may regard the present state of the universe as the effect of its past and the
cause of its futureโ€ โ€“ Marquis de Laplace

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

25.2 Overview

This lecture introduces the linear state space dynamic system


This model is a workhorse that carries a powerful theory of prediction
Its many applications include:

โ€ข representing dynamics of higher-order linear systems

โ€ข predicting the position of a system ๐‘— steps into the future


โ€ข predicting a geometric sum of future values of a variable like

โ€“ non-financial income
โ€“ dividends on a stock
โ€“ the money supply
โ€“ a government deficit or surplus, etc.

โ€ข key ingredient of useful models

โ€“ Friedmanโ€™s permanent income model of consumption smoothing


โ€“ Barroโ€™s model of smoothing total tax collections
โ€“ Rational expectations version of Caganโ€™s model of hyperinflation
โ€“ Sargent and Wallaceโ€™s โ€œunpleasant monetarist arithmetic,โ€ etc.

25.3 The Linear State Space Model

The objects in play are:

โ€ข An ๐‘› ร— 1 vector ๐‘ฅ๐‘ก denoting the state at time ๐‘ก = 0, 1, 2, โ€ฆ


โ€ข An IID sequence of ๐‘š ร— 1 random vectors ๐‘ค๐‘ก โˆผ ๐‘ (0, ๐ผ)
โ€ข A ๐‘˜ ร— 1 vector ๐‘ฆ๐‘ก of observations at time ๐‘ก = 0, 1, 2, โ€ฆ
โ€ข An ๐‘› ร— ๐‘› matrix ๐ด called the transition matrix
โ€ข An ๐‘› ร— ๐‘š matrix ๐ถ called the volatility matrix
โ€ข A ๐‘˜ ร— ๐‘› matrix ๐บ sometimes called the output matrix

Here is the linear state-space system

๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1


๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก (1)
๐‘ฅ0 โˆผ ๐‘ (๐œ‡0 , ฮฃ0 )

25.3.1 Primitives

The primitives of the model are

1. the matrices ๐ด, ๐ถ, ๐บ
2. shock distribution, which we have specialized to ๐‘ (0, ๐ผ)
3. the distribution of the initial condition ๐‘ฅ0 , which we have set to ๐‘ (๐œ‡0 , ฮฃ0 )

Given ๐ด, ๐ถ, ๐บ and draws of ๐‘ฅ0 and ๐‘ค1 , ๐‘ค2 , โ€ฆ, the model Eq. (1) pins down the values of the
sequences {๐‘ฅ๐‘ก } and {๐‘ฆ๐‘ก }
Even without these draws, the primitives 1โ€“3 pin down the probability distributions of {๐‘ฅ๐‘ก }
and {๐‘ฆ๐‘ก }
Later weโ€™ll see how to compute these distributions and their moments
Martingale Difference Shocks
Weโ€™ve made the common assumption that the shocks are independent standardized normal
vectors

But some of what we say will be valid under the assumption that {๐‘ค๐‘ก+1 } is a martingale
difference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past
information
In the present case, since {๐‘ฅ๐‘ก } is our state sequence, this means that it satisfies

E[๐‘ค๐‘ก+1 |๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ] = 0

This is a weaker condition than that {๐‘ค๐‘ก } is IID with ๐‘ค๐‘ก+1 โˆผ ๐‘ (0, ๐ผ)

25.3.2 Examples

By appropriate choice of the primitives, a variety of dynamics can be represented in terms of


the linear state space model
The following examples help to highlight this point
They also illustrate the wise dictum finding the state is an art
Second-order Difference Equation
Let {๐‘ฆ๐‘ก } be a deterministic sequence that satisfies

๐‘ฆ๐‘ก+1 = ๐œ™0 + ๐œ™1 ๐‘ฆ๐‘ก + ๐œ™2 ๐‘ฆ๐‘กโˆ’1 s.t. ๐‘ฆ0 , ๐‘ฆโˆ’1 given (2)

To map Eq. (2) into our state space system Eq. (1), we set

$$
x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}
$$

You can confirm that under these definitions, Eq. (1) and Eq. (2) agree
The next figure shows the dynamics of this process when ๐œ™0 = 1.1, ๐œ™1 = 0.8, ๐œ™2 = โˆ’0.8, ๐‘ฆ0 =
๐‘ฆโˆ’1 = 1

Later youโ€™ll be asked to recreate this figure
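As a quick sketch in the meantime (the check itself is ours; the parameter values are those just stated), we can confirm that the state-space mapping reproduces the scalar recursion Eq. (2)

```python
import numpy as np

# Sketch: simulate y_{t+1} = φ0 + φ1 y_t + φ2 y_{t-1} both through the
# state-space system and by direct iteration, and confirm they agree
φ0, φ1, φ2 = 1.1, 0.8, -0.8
A = np.array([[1.0, 0.0, 0.0],
              [φ0,  φ1,  φ2],
              [0.0, 1.0, 0.0]])
G = np.array([0.0, 1.0, 0.0])

T = 50
x = np.array([1.0, 1.0, 1.0])       # x_0 = (1, y_0, y_{-1})' with y_0 = y_{-1} = 1
y_state_space = []
for t in range(T):
    y_state_space.append(G @ x)     # y_t = G x_t
    x = A @ x                       # x_{t+1} = A x_t  (deterministic: C w = 0)

# Direct iteration of the scalar difference equation
y_direct = [1.0]
y_prev = 1.0                        # y_{-1}
for t in range(T - 1):
    y_next = φ0 + φ1 * y_direct[-1] + φ2 * y_prev
    y_prev = y_direct[-1]
    y_direct.append(y_next)
```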


Univariate Autoregressive Processes
We can use Eq. (1) to represent the model

๐‘ฆ๐‘ก+1 = ๐œ™1 ๐‘ฆ๐‘ก + ๐œ™2 ๐‘ฆ๐‘กโˆ’1 + ๐œ™3 ๐‘ฆ๐‘กโˆ’2 + ๐œ™4 ๐‘ฆ๐‘กโˆ’3 + ๐œŽ๐‘ค๐‘ก+1 (3)

where {๐‘ค๐‘ก } is IID and standard normal


โ€ฒ
To put this in the linear state space format we take ๐‘ฅ๐‘ก = [๐‘ฆ๐‘ก ๐‘ฆ๐‘กโˆ’1 ๐‘ฆ๐‘กโˆ’2 ๐‘ฆ๐‘กโˆ’3 ] and

๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ๐œŽ
โŽก1 0 0 0โŽค โŽก0โŽค
๐ด=โŽข โŽฅ ๐ถ=โŽข โŽฅ ๐บ = [1 0 0 0]
โŽข0 1 0 0โŽฅ โŽข0โŽฅ
โŽฃ0 0 1 0โŽฆ โŽฃ0โŽฆ

The matrix ๐ด has the form of the companion matrix to the vector [๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ]
The next figure shows the dynamics of this process when

๐œ™1 = 0.5, ๐œ™2 = โˆ’0.2, ๐œ™3 = 0, ๐œ™4 = 0.5, ๐œŽ = 0.2, ๐‘ฆ0 = ๐‘ฆโˆ’1 = ๐‘ฆโˆ’2 = ๐‘ฆโˆ’3 = 1

Vector Autoregressions
Now suppose that

โ€ข ๐‘ฆ๐‘ก is a ๐‘˜ ร— 1 vector
โ€ข ๐œ™๐‘— is a ๐‘˜ ร— ๐‘˜ matrix and
โ€ข ๐‘ค๐‘ก is ๐‘˜ ร— 1

Then Eq. (3) is termed a vector autoregression


To map this into Eq. (1), we set

๐‘ฆ๐‘ก ๐œ™1 ๐œ™2 ๐œ™3 ๐œ™4 ๐œŽ
โŽก๐‘ฆ โŽค โŽก๐ผ 0 0 0โŽค โŽก0โŽค
๐‘ฅ๐‘ก = โŽข ๐‘กโˆ’1 โŽฅ ๐ด=โŽข โŽฅ ๐ถ=โŽข โŽฅ ๐บ = [๐ผ 0 0 0]
โŽข๐‘ฆ๐‘กโˆ’2 โŽฅ โŽข0 ๐ผ 0 0โŽฅ โŽข0โŽฅ
โŽฃ๐‘ฆ๐‘กโˆ’3 โŽฆ โŽฃ0 0 ๐ผ 0โŽฆ โŽฃ0โŽฆ

where ๐ผ is the ๐‘˜ ร— ๐‘˜ identity matrix and ๐œŽ is a ๐‘˜ ร— ๐‘˜ matrix


Seasonals
We can use Eq. (1) to represent

1. the deterministic seasonal ๐‘ฆ๐‘ก = ๐‘ฆ๐‘กโˆ’4


2. the indeterministic seasonal ๐‘ฆ๐‘ก = ๐œ™4 ๐‘ฆ๐‘กโˆ’4 + ๐‘ค๐‘ก

In fact, both are special cases of Eq. (3)


With the deterministic seasonal, the transition matrix becomes

$$
A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$$

It is easy to check that ๐ด4 = ๐ผ, which implies that ๐‘ฅ๐‘ก is strictly periodic with period 4:[1]

๐‘ฅ๐‘ก+4 = ๐‘ฅ๐‘ก

Such an ๐‘ฅ๐‘ก process can be used to model deterministic seasonals in quarterly time series
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations
Time Trends
The model ๐‘ฆ๐‘ก = ๐‘Ž๐‘ก + ๐‘ is known as a linear time trend
We can represent this model in the linear state space form by taking

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} a & c \end{bmatrix} \tag{4}
$$

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$
In fact, itโ€™s possible to use the state-space system to represent polynomial trends of any order
For instance, let

$$
x_0 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

It follows that

$$
A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}
$$

Then ๐‘ฅโ€ฒ๐‘ก = [๐‘ก(๐‘ก โˆ’ 1)/2 ๐‘ก 1], so that ๐‘ฅ๐‘ก contains linear and quadratic time trends

25.3.3 Moving Average Representations

A nonrecursive expression for ๐‘ฅ๐‘ก as a function of ๐‘ฅ0 , ๐‘ค1 , ๐‘ค2 , โ€ฆ , ๐‘ค๐‘ก can be found by using


Eq. (1) repeatedly to obtain

๐‘ฅ๐‘ก = ๐ด๐‘ฅ๐‘กโˆ’1 + ๐ถ๐‘ค๐‘ก
= ๐ด2 ๐‘ฅ๐‘กโˆ’2 + ๐ด๐ถ๐‘ค๐‘กโˆ’1 + ๐ถ๐‘ค๐‘ก
โ‹ฎ (5)
๐‘กโˆ’1
= โˆ‘ ๐ด๐‘— ๐ถ๐‘ค๐‘กโˆ’๐‘— + ๐ด๐‘ก ๐‘ฅ0
๐‘—=0

Representation Eq. (5) is a moving average representation


It expresses {๐‘ฅ๐‘ก } as a linear function of

1. current and past values of the process {๐‘ค๐‘ก } and


2. the initial condition ๐‘ฅ0

As an example of a moving average representation, let the model be

$$
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
$$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$
Substituting into the moving average representation Eq. (5), we obtain

๐‘กโˆ’1
๐‘ฅ1๐‘ก = โˆ‘ ๐‘ค๐‘กโˆ’๐‘— + [1 ๐‘ก] ๐‘ฅ0
๐‘—=0

where ๐‘ฅ1๐‘ก is the first entry of ๐‘ฅ๐‘ก


The first term on the right is a cumulated sum of martingale differences and is therefore a
martingale
The second term is a translated linear function of time
For this reason, ๐‘ฅ1๐‘ก is called a martingale with drift
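As a sketch (the initial condition, horizon and seed are our own choices), we can verify the moving average representation Eq. (5) against direct iteration of Eq. (1) for this example

```python
import numpy as np
from numpy.linalg import matrix_power

# Sketch: verify the moving average representation (5) against direct
# recursion for A = [[1, 1], [0, 1]], C = (1, 0)'
np.random.seed(1)
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C = np.array([1.0, 0.0])
x0 = np.array([0.5, 1.0])           # an arbitrary initial condition (ours)
T = 20
w = np.random.randn(T + 1)          # w_1, ..., w_T live in w[1:]

# Direct recursion: x_t = A x_{t-1} + C w_t
x = x0.copy()
for t in range(1, T + 1):
    x = A @ x + C * w[t]

# Moving average form: x_T = Σ_{j=0}^{T-1} A^j C w_{T-j} + A^T x_0
x_ma = matrix_power(A, T) @ x0
for j in range(T):
    x_ma = x_ma + matrix_power(A, j) @ C * w[T - j]
```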

25.4 Distributions and Moments

25.4.1 Unconditional Moments

Using Eq. (1), itโ€™s easy to obtain expressions for the (unconditional) means of ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก
Weโ€™ll explain what unconditional and conditional mean soon

Letting ๐œ‡๐‘ก โˆถ= E[๐‘ฅ๐‘ก ] and using linearity of expectations, we find that

๐œ‡๐‘ก+1 = ๐ด๐œ‡๐‘ก with ๐œ‡0 given (6)

Here ๐œ‡0 is a primitive given in Eq. (1)


The variance-covariance matrix of ๐‘ฅ๐‘ก is ฮฃ๐‘ก โˆถ= E[(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )โ€ฒ ]
Using ๐‘ฅ๐‘ก+1 โˆ’ ๐œ‡๐‘ก+1 = ๐ด(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก ) + ๐ถ๐‘ค๐‘ก+1 , we can determine this matrix recursively via

ฮฃ๐‘ก+1 = ๐ดฮฃ๐‘ก ๐ดโ€ฒ + ๐ถ๐ถ โ€ฒ with ฮฃ0 given (7)

As with ๐œ‡0 , the matrix ฮฃ0 is a primitive given in Eq. (1)


As a matter of terminology, we will sometimes call

โ€ข ๐œ‡๐‘ก the unconditional mean of ๐‘ฅ๐‘ก


โ€ข ฮฃ๐‘ก the unconditional variance-covariance matrix of ๐‘ฅ๐‘ก

This is to distinguish ๐œ‡๐‘ก and ฮฃ๐‘ก from related objects that use conditioning information, to be
defined below
However, you should be aware that these โ€œunconditionalโ€ moments do depend on the initial
distribution ๐‘ (๐œ‡0 , ฮฃ0 )
Moments of the Observations
Using linearity of expectations again we have

E[๐‘ฆ๐‘ก ] = E[๐บ๐‘ฅ๐‘ก ] = ๐บ๐œ‡๐‘ก (8)

The variance-covariance matrix of ๐‘ฆ๐‘ก is easily shown to be

Var[๐‘ฆ๐‘ก ] = Var[๐บ๐‘ฅ๐‘ก ] = ๐บฮฃ๐‘ก ๐บโ€ฒ (9)

25.4.2 Distributions

In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given

1. our Gaussian assumptions on the primitives


2. the fact that normality is preserved under linear operations

In fact, itโ€™s well-known that

๐‘ข โˆผ ๐‘ (๐‘ข,ฬ„ ๐‘†) and ๐‘ฃ = ๐‘Ž + ๐ต๐‘ข โŸน ๐‘ฃ โˆผ ๐‘ (๐‘Ž + ๐ต๐‘ข,ฬ„ ๐ต๐‘†๐ตโ€ฒ ) (10)

In particular, given our Gaussian assumptions on the primitives and the linearity of Eq. (1)
we can see immediately that both ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก are Gaussian for all ๐‘ก โ‰ฅ 0 [2]
Since ๐‘ฅ๐‘ก is Gaussian, to find the distribution, all we need to do is find its mean and variance-
covariance matrix
But in fact weโ€™ve already done this, in Eq. (6) and Eq. (7)
Letting ๐œ‡๐‘ก and ฮฃ๐‘ก be as defined by these equations, we have

๐‘ฅ๐‘ก โˆผ ๐‘ (๐œ‡๐‘ก , ฮฃ๐‘ก ) (11)

By similar reasoning combined with Eq. (8) and Eq. (9),

๐‘ฆ๐‘ก โˆผ ๐‘ (๐บ๐œ‡๐‘ก , ๐บฮฃ๐‘ก ๐บโ€ฒ ) (12)

25.4.3 Ensemble Interpretations

How should we interpret the distributions defined by Eq. (11)โ€“Eq. (12)?


Intuitively, the probabilities in a distribution correspond to relative frequencies in a large
population drawn from that distribution
Letโ€™s apply this idea to our setting, focusing on the distribution of ๐‘ฆ๐‘‡ for fixed ๐‘‡
We can generate independent draws of ๐‘ฆ๐‘‡ by repeatedly simulating the evolution of the sys-
tem up to time ๐‘‡ , using an independent set of shocks each time
The next figure shows 20 simulations, producing 20 time series for {๐‘ฆ๐‘ก }, and hence 20 draws
of ๐‘ฆ๐‘‡
The system in question is the univariate autoregressive model Eq. (3)
The values of ๐‘ฆ๐‘‡ are represented by black dots in the left-hand figure

In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 ๐‘ฆ๐‘‡ โ€™s
(The parameters and source code for the figures can be found in file linear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations

Letโ€™s now try with 500,000 observations, showing only the histogram (without rotation)

The black line is the population density of ๐‘ฆ๐‘‡ calculated from Eq. (12)
The histogram and population distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distributionโ€™s sufficient statistics
Ensemble Means
In the preceding figure, we approximated the population distribution of ๐‘ฆ๐‘‡ by

1. generating ๐ผ sample paths (i.e., time series) where ๐ผ is a large number


2. recording each observation ๐‘ฆ๐‘‡๐‘–
3. histogramming this sample

Just as the histogram approximates the population distribution, the ensemble or cross-
sectional average

1 ๐ผ ๐‘–
๐‘ฆ๐‘‡ฬ„ โˆถ= โˆ‘๐‘ฆ
๐ผ ๐‘–=1 ๐‘‡

approximates the expectation E[๐‘ฆ๐‘‡ ] = ๐บ๐œ‡๐‘‡ (as implied by the law of large numbers)
Hereโ€™s a simulation comparing the ensemble averages and population means at time points
๐‘ก = 0, โ€ฆ , 50

The parameters are the same as for the preceding figures, and the sample size is relatively
small (๐ผ = 20)

The ensemble mean for ๐‘ฅ๐‘ก is

1 ๐ผ ๐‘–
๐‘ฅ๐‘‡ฬ„ โˆถ= โˆ‘ ๐‘ฅ โ†’ ๐œ‡๐‘‡ (๐ผ โ†’ โˆž)
๐ผ ๐‘–=1 ๐‘‡

The limit ๐œ‡๐‘‡ is a โ€œlong-run averageโ€


(By long-run average we mean the average for an infinite (๐ผ = โˆž) number of sample ๐‘ฅ๐‘‡ โ€™s)
Another application of the law of large numbers assures us that

1 ๐ผ
โˆ‘(๐‘ฅ๐‘– โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘–๐‘‡ โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ฮฃ๐‘‡ (๐ผ โ†’ โˆž)
๐ผ ๐‘–=1 ๐‘‡

25.4.4 Joint Distributions

In the preceding discussion, we looked at the distributions of ๐‘ฅ๐‘ก and ๐‘ฆ๐‘ก in isolation


This gives us useful information but doesnโ€™t allow us to answer questions like

โ€ข whatโ€™s the probability that ๐‘ฅ๐‘ก โ‰ฅ 0 for all ๐‘ก?


โ€ข whatโ€™s the probability that the process {๐‘ฆ๐‘ก } exceeds some value ๐‘Ž before falling below
๐‘?
โ€ข etc., etc.

Such questions concern the joint distributions of these sequences


To compute the joint distribution of ๐‘ฅ0 , ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘‡ , recall that joint and conditional densities
are linked by the rule

๐‘(๐‘ฅ, ๐‘ฆ) = ๐‘(๐‘ฆ | ๐‘ฅ)๐‘(๐‘ฅ) (joint = conditional ร— marginal)



From this rule we get ๐‘(๐‘ฅ0 , ๐‘ฅ1 ) = ๐‘(๐‘ฅ1 | ๐‘ฅ0 )๐‘(๐‘ฅ0 )


The Markov property ๐‘(๐‘ฅ๐‘ก | ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0 ) = ๐‘(๐‘ฅ๐‘ก | ๐‘ฅ๐‘กโˆ’1 ) and repeated applications of the preced-
ing rule lead us to

๐‘‡ โˆ’1
๐‘(๐‘ฅ0 , ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘‡ ) = ๐‘(๐‘ฅ0 ) โˆ ๐‘(๐‘ฅ๐‘ก+1 | ๐‘ฅ๐‘ก )
๐‘ก=0

The marginal ๐‘(๐‘ฅ0 ) is just the primitive ๐‘ (๐œ‡0 , ฮฃ0 )


In view of Eq. (1), the conditional densities are

๐‘(๐‘ฅ๐‘ก+1 | ๐‘ฅ๐‘ก ) = ๐‘ (๐ด๐‘ฅ๐‘ก , ๐ถ๐ถ โ€ฒ )

Autocovariance Functions
An important object related to the joint distribution is the autocovariance function

ฮฃ๐‘ก+๐‘—,๐‘ก โˆถ= E[(๐‘ฅ๐‘ก+๐‘— โˆ’ ๐œ‡๐‘ก+๐‘— )(๐‘ฅ๐‘ก โˆ’ ๐œ‡๐‘ก )โ€ฒ ] (13)

Elementary calculations show that

ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃ๐‘ก (14)

Notice that ฮฃ๐‘ก+๐‘—,๐‘ก in general depends on both ๐‘—, the gap between the two dates, and ๐‘ก, the
earlier date

25.5 Stationarity and Ergodicity

Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models
Letโ€™s start with the intuition

25.5.1 Visualizing Stability

Letโ€™s look at some more time series from the same model that we analyzed above
This picture shows cross-sectional distributions for ๐‘ฆ at times ๐‘‡ , ๐‘‡ โ€ฒ , ๐‘‡ โ€ณ

Note how the time series โ€œsettle downโ€ in the sense that the distributions at ๐‘‡ โ€ฒ and ๐‘‡ โ€ณ are
relatively similar to each other โ€” but unlike the distribution at ๐‘‡
Apparently, the distributions of ๐‘ฆ๐‘ก converge to a fixed long-run distribution as ๐‘ก โ†’ โˆž
When such a distribution exists it is called a stationary distribution

25.5.2 Stationary Distributions

In our setting, a distribution ๐œ“โˆž is said to be stationary for ๐‘ฅ๐‘ก if

๐‘ฅ๐‘ก โˆผ ๐œ“โˆž and ๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1 โŸน ๐‘ฅ๐‘ก+1 โˆผ ๐œ“โˆž

Since

1. in the present case, all distributions are Gaussian


2. a Gaussian distribution is pinned down by its mean and variance-covariance matrix

we can restate the definition as follows: ๐œ“โˆž is stationary for ๐‘ฅ๐‘ก if

๐œ“โˆž = ๐‘ (๐œ‡โˆž , ฮฃโˆž )

where ๐œ‡โˆž and ฮฃโˆž are fixed points of Eq. (6) and Eq. (7) respectively

25.5.3 Covariance Stationary Processes

Letโ€™s see what happens to the preceding figure if we start ๐‘ฅ0 at the stationary distribution

Now the differences in the observed distributions at ๐‘‡ , ๐‘‡ โ€ฒ and ๐‘‡ โ€ณ come entirely from random
fluctuations due to the finite sample size
By

โ€ข our choosing ๐‘ฅ0 โˆผ ๐‘ (๐œ‡โˆž , ฮฃโˆž )


โ€ข the definitions of ๐œ‡โˆž and ฮฃโˆž as fixed points of Eq. (6) and Eq. (7) respectively

weโ€™ve ensured that

๐œ‡๐‘ก = ๐œ‡โˆž and ฮฃ๐‘ก = ฮฃโˆž for all ๐‘ก

Moreover, in view of Eq. (14), the autocovariance function takes the form ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃโˆž ,
which depends on ๐‘— but not on ๐‘ก
This motivates the following definition
A process {๐‘ฅ๐‘ก } is said to be covariance stationary if

โ€ข both ๐œ‡๐‘ก and ฮฃ๐‘ก are constant in ๐‘ก


โ€ข ฮฃ๐‘ก+๐‘—,๐‘ก depends on the time gap ๐‘— but not on time ๐‘ก

In our setting, {๐‘ฅ๐‘ก } will be covariance stationary if ๐œ‡0 , ฮฃ0 , ๐ด, ๐ถ assume values that imply that
none of ๐œ‡๐‘ก , ฮฃ๐‘ก , ฮฃ๐‘ก+๐‘—,๐‘ก depends on ๐‘ก

25.5.4 Conditions for Stationarity

The Globally Stable Case


The difference equation ๐œ‡๐‘ก+1 = ๐ด๐œ‡๐‘ก is known to have unique fixed point ๐œ‡โˆž = 0 if all eigen-
values of ๐ด have moduli strictly less than unity
That is, if (np.absolute(np.linalg.eigvals(A)) < 1).all() == True

The difference equation Eq. (7) also has a unique fixed point in this case, and, moreover

๐œ‡๐‘ก โ†’ ๐œ‡โˆž = 0 and ฮฃ๐‘ก โ†’ ฮฃโˆž as ๐‘กโ†’โˆž

regardless of the initial conditions ๐œ‡0 and ฮฃ0


This is the globally stable case — see these notes for a more theoretical treatment
However, global stability is more than we need for stationary solutions, and often more than
we want
To illustrate, consider our second order difference equation example
โ€ฒ
Here the state is ๐‘ฅ๐‘ก = [1 ๐‘ฆ๐‘ก ๐‘ฆ๐‘กโˆ’1 ]
Because of the constant first component in the state vector, we will never have ๐œ‡๐‘ก โ†’ 0
How can we find stationary solutions that respect a constant state component?
Processes with a Constant State Component
To investigate such a process, suppose that ๐ด and ๐ถ take the form

๐ด1 ๐‘Ž ๐ถ1
๐ด=[ ] ๐ถ=[ ]
0 1 0

where

โ€ข ๐ด1 is an (๐‘› โˆ’ 1) ร— (๐‘› โˆ’ 1) matrix
โ€ข ๐‘Ž is an (๐‘› โˆ’ 1) ร— 1 column vector

โ€ฒ
Let ๐‘ฅ๐‘ก = [๐‘ฅโ€ฒ1๐‘ก 1] where ๐‘ฅ1๐‘ก is (๐‘› โˆ’ 1) ร— 1
It follows that

๐‘ฅ1,๐‘ก+1 = ๐ด1 ๐‘ฅ1๐‘ก + ๐‘Ž + ๐ถ1 ๐‘ค๐‘ก+1

Let ๐œ‡1๐‘ก = E[๐‘ฅ1๐‘ก ] and take expectations on both sides of this expression to get

๐œ‡1,๐‘ก+1 = ๐ด1 ๐œ‡1,๐‘ก + ๐‘Ž (15)

Assume now that the moduli of the eigenvalues of ๐ด1 are all strictly less than one
Then Eq. (15) has a unique stationary solution, namely,

๐œ‡1โˆž = (๐ผ โˆ’ ๐ด1 )โˆ’1 ๐‘Ž

โ€ฒ
The stationary value of ๐œ‡๐‘ก itself is then ๐œ‡โˆž โˆถ= [๐œ‡โ€ฒ1โˆž 1]
The stationary values of ฮฃ๐‘ก and ฮฃ๐‘ก+๐‘—,๐‘ก satisfy

ฮฃโˆž = ๐ดฮฃโˆž ๐ดโ€ฒ + ๐ถ๐ถ โ€ฒ
(16)
ฮฃ๐‘ก+๐‘—,๐‘ก = ๐ด๐‘— ฮฃโˆž

Notice that here ฮฃ๐‘ก+๐‘—,๐‘ก depends on the time gap ๐‘— but not on calendar time ๐‘ก
In conclusion, if

โ€ข ๐‘ฅ0 โˆผ ๐‘ (๐œ‡โˆž , ฮฃโˆž ) and
โ€ข the moduli of the eigenvalues of ๐ด1 are all strictly less than unity

then the {๐‘ฅ๐‘ก } process is covariance stationary, with constant state component

Note
If the eigenvalues of 𝐴1 are less than unity in modulus, then (a) starting from any initial value, the mean and variance-covariance matrix both converge to their stationary values; and (b) iterations on Eq. (7) converge to the fixed point of the discrete Lyapunov equation in the first line of Eq. (16)
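In practice Σ∞ can be computed directly; here is a sketch (the matrices A and C are our own stable example) using scipy.linalg.solve_discrete_lyapunov, cross-checked against iteration on Eq. (7)

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Sketch: Σ_∞ is the fixed point of Σ = A Σ A' + C C', which SciPy's
# discrete Lyapunov solver computes directly
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])          # eigenvalues inside the unit circle (our example)
C = np.array([[0.2],
              [0.1]])

Σ_inf = solve_discrete_lyapunov(A, C @ C.T)

Σ = 10 * np.eye(2)                  # start iteration far from the fixed point
for t in range(500):
    Σ = A @ Σ @ A.T + C @ C.T
```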

25.5.5 Ergodicity

Letโ€™s suppose that weโ€™re working with a covariance stationary process


In this case, we know that the ensemble mean will converge to ๐œ‡โˆž as the sample size ๐ผ ap-
proaches infinity
Averages over Time
Ensemble averages across simulations are interesting theoretically, but in real life, we usually
observe only a single realization {๐‘ฅ๐‘ก , ๐‘ฆ๐‘ก }๐‘‡๐‘ก=0
So now letโ€™s take a single realization and form the time-series averages

1 ๐‘‡ 1 ๐‘‡
๐‘ฅฬ„ โˆถ= โˆ‘๐‘ฅ and ๐‘ฆ ฬ„ โˆถ= โˆ‘๐‘ฆ
๐‘‡ ๐‘ก=1 ๐‘ก ๐‘‡ ๐‘ก=1 ๐‘ก

Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expecta-
tion under the stationary distribution
In particular,

1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 ๐‘ฅ๐‘ก โ†’ ๐œ‡โˆž
1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 (๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ฮฃโˆž
1 ๐‘‡
โ€ข ๐‘‡ โˆ‘๐‘ก=1 (๐‘ฅ๐‘ก+๐‘— โˆ’ ๐‘ฅ๐‘‡ฬ„ )(๐‘ฅ๐‘ก โˆ’ ๐‘ฅ๐‘‡ฬ„ )โ€ฒ โ†’ ๐ด๐‘— ฮฃโˆž

In our linear Gaussian setting, any covariance stationary process is also ergodic
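As an illustration, we can check the first two of these limits for a scalar Gaussian AR(1) โ€” a covariance stationary process when |ฯ| < 1 (the parameter values here are illustrative):

```python
import numpy as np

ฯ, ฯƒ = 0.5, 1.0
ฮผ_inf = 0.0                     # stationary mean of x_{t+1} = ฯ x_t + ฯƒ w_{t+1}
ฮฃ_inf = ฯƒ**2 / (1 - ฯ**2)       # stationary variance

np.random.seed(42)
T = 200_000
x = np.empty(T)
x[0] = 0.0
w = np.random.randn(T)
for t in range(T - 1):
    x[t+1] = ฯ * x[t] + ฯƒ * w[t+1]

# Time-series averages from a single realization
x_bar = x.mean()
v_bar = ((x - x_bar)**2).mean()

assert abs(x_bar - ฮผ_inf) < 0.05   # time average close to the stationary mean
assert abs(v_bar - ฮฃ_inf) < 0.05   # time variance close to the stationary variance
```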

25.6 Noisy Observations

In some settings, the observation equation ๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce

โ€ข An IID sequence of โ„“ ร— 1 random vectors ๐‘ฃ๐‘ก โˆผ ๐‘ (0, ๐ผ)


โ€ข A ๐‘˜ ร— โ„“ matrix ๐ป

and extend the linear state-space system to

๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1


๐‘ฆ๐‘ก = ๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก (17)
๐‘ฅ0 โˆผ ๐‘ (๐œ‡0 , ฮฃ0 )

The sequence {๐‘ฃ๐‘ก } is assumed to be independent of {๐‘ค๐‘ก }


The process {๐‘ฅ๐‘ก } is not modified by noise in the observation equation and its moments, distri-
butions and stability properties remain the same
The unconditional moments of ๐‘ฆ๐‘ก from Eq. (8) and Eq. (9) now become

E[๐‘ฆ๐‘ก ] = E[๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก ] = ๐บ๐œ‡๐‘ก (18)

The variance-covariance matrix of ๐‘ฆ๐‘ก is easily shown to be

Var[๐‘ฆ๐‘ก ] = Var[๐บ๐‘ฅ๐‘ก + ๐ป๐‘ฃ๐‘ก ] = ๐บฮฃ๐‘ก ๐บโ€ฒ + ๐ป๐ป โ€ฒ (19)

The distribution of ๐‘ฆ๐‘ก is therefore

๐‘ฆ๐‘ก โˆผ ๐‘ (๐บ๐œ‡๐‘ก , ๐บฮฃ๐‘ก ๐บโ€ฒ + ๐ป๐ป โ€ฒ )

25.7 Prediction

The theory of prediction for linear state space systems is elegant and simple

25.7.1 Forecasting Formulas โ€“ Conditional Means

The natural way to predict variables is to use conditional distributions


For example, the optimal forecast of ๐‘ฅ๐‘ก+1 given information known at time ๐‘ก is

E๐‘ก [๐‘ฅ๐‘ก+1 ] โˆถ= E[๐‘ฅ๐‘ก+1 โˆฃ ๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0 ] = ๐ด๐‘ฅ๐‘ก

The right-hand side follows from ๐‘ฅ๐‘ก+1 = ๐ด๐‘ฅ๐‘ก + ๐ถ๐‘ค๐‘ก+1 and the fact that ๐‘ค๐‘ก+1 is zero mean and
independent of ๐‘ฅ๐‘ก , ๐‘ฅ๐‘กโˆ’1 , โ€ฆ , ๐‘ฅ0
That E๐‘ก [๐‘ฅ๐‘ก+1 ] = E[๐‘ฅ๐‘ก+1 โˆฃ ๐‘ฅ๐‘ก ] is an implication of {๐‘ฅ๐‘ก } having the Markov property

The one-step-ahead forecast error is

๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ] = ๐ถ๐‘ค๐‘ก+1

The covariance matrix of the forecast error is

E[(๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ])(๐‘ฅ๐‘ก+1 โˆ’ E๐‘ก [๐‘ฅ๐‘ก+1 ])โ€ฒ ] = ๐ถ๐ถ โ€ฒ

More generally, weโ€™d like to compute the ๐‘—-step ahead forecasts E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] and E๐‘ก [๐‘ฆ๐‘ก+๐‘— ]
With a bit of algebra, we obtain

๐‘ฅ๐‘ก+๐‘— = ๐ด๐‘— ๐‘ฅ๐‘ก + ๐ด๐‘—โˆ’1 ๐ถ๐‘ค๐‘ก+1 + ๐ด๐‘—โˆ’2 ๐ถ๐‘ค๐‘ก+2 + โ‹ฏ + ๐ด0 ๐ถ๐‘ค๐‘ก+๐‘—

In view of the IID property, current and past state values provide no information about fu-
ture values of the shock
Hence E๐‘ก [๐‘ค๐‘ก+๐‘˜ ] = E[๐‘ค๐‘ก+๐‘˜ ] = 0
It now follows from linearity of expectations that the ๐‘—-step ahead forecast of ๐‘ฅ is

E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] = ๐ด๐‘— ๐‘ฅ๐‘ก

The ๐‘—-step ahead forecast of ๐‘ฆ is therefore

E๐‘ก [๐‘ฆ๐‘ก+๐‘— ] = E๐‘ก [๐บ๐‘ฅ๐‘ก+๐‘— + ๐ป๐‘ฃ๐‘ก+๐‘— ] = ๐บ๐ด๐‘— ๐‘ฅ๐‘ก

25.7.2 Covariance of Prediction Errors

It is useful to obtain the covariance matrix of the vector of ๐‘—-step-ahead prediction errors

๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ] = โˆ‘_{๐‘ =0}^{๐‘—โˆ’1} ๐ด๐‘  ๐ถ๐‘ค๐‘กโˆ’๐‘ +๐‘—        (20)

Evidently,

๐‘‰๐‘— โˆถ= E๐‘ก [(๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ])(๐‘ฅ๐‘ก+๐‘— โˆ’ E๐‘ก [๐‘ฅ๐‘ก+๐‘— ])โ€ฒ ] = โˆ‘_{๐‘˜=0}^{๐‘—โˆ’1} ๐ด๐‘˜ ๐ถ๐ถ โ€ฒ (๐ด๐‘˜ )โ€ฒ        (21)

๐‘‰๐‘— defined in Eq. (21) can be calculated recursively via ๐‘‰1 = ๐ถ๐ถ โ€ฒ and

๐‘‰๐‘— = ๐ถ๐ถ โ€ฒ + ๐ด๐‘‰๐‘—โˆ’1 ๐ดโ€ฒ , ๐‘—โ‰ฅ2 (22)

๐‘‰๐‘— is the conditional covariance matrix of the errors in forecasting ๐‘ฅ๐‘ก+๐‘— , conditioned on time ๐‘ก
information ๐‘ฅ๐‘ก
Under particular conditions, ๐‘‰๐‘— converges to

๐‘‰โˆž = ๐ถ๐ถ โ€ฒ + ๐ด๐‘‰โˆž ๐ดโ€ฒ (23)

Equation Eq. (23) is an example of a discrete Lyapunov equation in the covariance matrix ๐‘‰โˆž
A sufficient condition for ๐‘‰๐‘— to converge is that the eigenvalues of ๐ด be strictly less than one
in modulus
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of ๐ถ that equal 0
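We can confirm that the recursion in Eq. (22) reproduces the sum in Eq. (21) with a short sketch (the matrices ๐ด and ๐ถ are illustrative):

```python
import numpy as np
from numpy.linalg import matrix_power

A = np.array([[0.9, 0.1],
              [0.0, 0.5]])
C = np.array([[0.3],
              [0.2]])

def V_recursive(j):
    # V_1 = C C',  V_j = C C' + A V_{j-1} A'
    V = C @ C.T
    for _ in range(j - 1):
        V = C @ C.T + A @ V @ A.T
    return V

def V_sum(j):
    # V_j = sum_{k=0}^{j-1} A^k C C' (A^k)'
    return sum(matrix_power(A, k) @ C @ C.T @ matrix_power(A, k).T
               for k in range(j))

for j in (1, 2, 5, 10):
    assert np.allclose(V_recursive(j), V_sum(j))
```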

25.7.3 Forecasts of Geometric Sums

In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system Eq. (1)
We want the following objects

โ€ข Forecast of a geometric sum of future ๐‘ฅโ€™s, or E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฅ๐‘ก+๐‘— ]
โ€ข Forecast of a geometric sum of future ๐‘ฆโ€™s, or E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— ]

These objects are important components of some famous and interesting dynamic models
For example,

โ€ข if {๐‘ฆ๐‘ก } is a stream of dividends, then E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— |๐‘ฅ๐‘ก ] is a model of a stock price
โ€ข if {๐‘ฆ๐‘ก } is the money supply, then E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— |๐‘ฅ๐‘ก ] is a model of the price level

Formulas
Fortunately, it is easy to use a little matrix algebra to compute these objects
Suppose that every eigenvalue of ๐ด has modulus strictly less than 1/๐›ฝ
It then follows that ๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ = [๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1
This leads to our formulas:

โ€ข Forecast of a geometric sum of future ๐‘ฅโ€™s

E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฅ๐‘ก+๐‘— ] = [๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ ]๐‘ฅ๐‘ก = [๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1 ๐‘ฅ๐‘ก

โ€ข Forecast of a geometric sum of future ๐‘ฆโ€™s

E๐‘ก [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— ๐‘ฆ๐‘ก+๐‘— ] = ๐บ[๐ผ + ๐›ฝ๐ด + ๐›ฝ 2 ๐ด2 + โ‹ฏ ]๐‘ฅ๐‘ก = ๐บ[๐ผ โˆ’ ๐›ฝ๐ด]โˆ’1 ๐‘ฅ๐‘ก

25.8 Code

Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package

The code implements a class for handling linear state space models (simulations, calculating
moments, etc.)
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence()
Go back and read the relevant documentation if youโ€™ve forgotten how generator functions
work
Examples of usage are given in the solutions to the exercises

25.9 Exercises

25.9.1 Exercise 1

Replicate this figure using the LinearStateSpace class from lss.py

25.9.2 Exercise 2

Replicate this figure modulo randomness using the same class

25.9.3 Exercise 3

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise

25.9.4 Exercise 4

Replicate this figure modulo randomness using the same class


The state space model and parameters are the same as for the preceding exercise, except that
the initial condition is the stationary distribution
Hint: You can use the stationary_distributions method to get the initial conditions
The number of sample paths is 80, and the time horizon in the figure is 100
Producing the vertical bars and dots is optional, but if you wish to try, the bars are at dates
10, 50 and 75

25.10 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace

25.10.1 Exercise 1
In [3]: ฯ•_0, ฯ•_1, ฯ•_2 = 1.1, 0.8, -0.8

A = [[1, 0, 0 ],
[ฯ•_0, ฯ•_1, ฯ•_2],
[0, 1, 0 ]]
C = np.zeros((3, 1))
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)

fig, ax = plt.subplots(figsize=(10, 6))


y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.2 Exercise 2
In [4]: ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.2

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)

fig, ax = plt.subplots(figsize=(10, 6))


y = y.flatten()
ax.plot(y, 'b-', lw=2, alpha=0.7)
ax.grid()
ax.set_xlabel('time')
ax.set_ylabel('$y_t$', fontsize=16)
plt.show()

25.10.3 Exercise 3
In [5]: from scipy.stats import norm
import random

ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.1

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

I = 20
T = 50
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
ymin, ymax = -0.5, 1.15

fig, ax = plt.subplots(figsize=(8, 5))

ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)

ensemble_mean = np.zeros(T)
for i in range(I):
x, y = ar.simulate(ts_length=T)
y = y.flatten()
ax.plot(y, 'c-', lw=0.8, alpha=0.5)
ensemble_mean = ensemble_mean + y

ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')

m = ar.moment_sequence()
population_means = []

for t in range(T):
ฮผ_x, ฮผ_y, ฮฃ_x, ฮฃ_y = next(m)
population_means.append(float(ฮผ_y))
ax.plot(population_means, color='g', lw=2, alpha=0.8, label='$G\\mu_t$')
ax.legend(ncol=2)
plt.show()

25.10.4 Exercise 4
In [6]: ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4 = 0.5, -0.2, 0, 0.5
ฯƒ = 0.1

A = [[ฯ•_1, ฯ•_2, ฯ•_3, ฯ•_4],
[1, 0, 0, 0 ],
[0, 1, 0, 0 ],
[0, 0, 1, 0 ]]
C = [[ฯƒ],
[0],
[0],
[0]]
G = [1, 0, 0, 0]

T0 = 10
T1 = 50
T2 = 75
T4 = 100

ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))


ymin, ymax = -0.6, 0.6

fig, ax = plt.subplots(figsize=(8, 5))

ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)

ax.set_xticks((T0, T1, T2))


ax.set_xticklabels(("$T$", "$T'$", "$T''$"), fontsize=14)

ฮผ_x, ฮผ_y, ฮฃ_x, ฮฃ_y = ar.stationary_distributions()


ar.mu_0 = ฮผ_x
ar.Sigma_0 = ฮฃ_x

for i in range(80):
rcolor = random.choice(('c', 'g', 'b'))
x, y = ar.simulate(ts_length=T4)
y = y.flatten()
ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()

Footnotes
[1] The eigenvalues of ๐ด are (1, โˆ’1, ๐‘–, โˆ’๐‘–).
[2] The correct way to argue this is by induction. Suppose that ๐‘ฅ๐‘ก is Gaussian. Then Eq. (1)
and Eq. (10) imply that ๐‘ฅ๐‘ก+1 is Gaussian. Since ๐‘ฅ0 is assumed to be Gaussian, it follows that
every ๐‘ฅ๐‘ก is Gaussian. Evidently, this implies that each ๐‘ฆ๐‘ก is Gaussian.
26

Finite Markov Chains

26.1 Contents

โ€ข Overview 26.2

โ€ข Definitions 26.3

โ€ข Simulation 26.4

โ€ข Marginal Distributions 26.5

โ€ข Irreducibility and Aperiodicity 26.6

โ€ข Stationary Distributions 26.7

โ€ข Ergodicity 26.8

โ€ข Computing Expectations 26.9

โ€ข Exercises 26.10

โ€ข Solutions 26.11

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

26.2 Overview

Markov chains are one of the most useful classes of stochastic processes, being

โ€ข simple, flexible and supported by many elegant theoretical results


โ€ข valuable for building intuition about random dynamic models
โ€ข central to quantitative modeling in their own right

You will find them in many of the workhorse models of economics and finance
In this lecture, we review some of the theory of Markov chains


We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py
Prerequisite knowledge is basic probability and linear algebra

26.3 Definitions

The following concepts are fundamental

26.3.1 Stochastic Matrices

A stochastic matrix (or Markov matrix) is an ๐‘› ร— ๐‘› square matrix ๐‘ƒ such that

1. each element of ๐‘ƒ is nonnegative, and


2. each row of ๐‘ƒ sums to one

Each row of ๐‘ƒ can be regarded as a probability mass function over ๐‘› possible outcomes
It is not too difficult to check [1] that if ๐‘ƒ is a stochastic matrix, then so is the ๐‘˜-th power ๐‘ƒ ๐‘˜
for all ๐‘˜ โˆˆ N

26.3.2 Markov Chains

There is a close connection between stochastic matrices and Markov chains


To begin, let ๐‘† be a finite set with ๐‘› elements {๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› }
The set ๐‘† is called the state space and ๐‘ฅ1 , โ€ฆ , ๐‘ฅ๐‘› are the state values
A Markov chain {๐‘‹๐‘ก } on ๐‘† is a sequence of random variables on ๐‘† that have the Markov
property
This means that, for any date ๐‘ก and any state ๐‘ฆ โˆˆ ๐‘†,

P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก } = P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก , ๐‘‹๐‘กโˆ’1 , โ€ฆ} (1)

In other words, knowing the current state is enough to know probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

๐‘ƒ (๐‘ฅ, ๐‘ฆ) โˆถ= P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก = ๐‘ฅ} (๐‘ฅ, ๐‘ฆ โˆˆ ๐‘†) (2)

By construction,

โ€ข ๐‘ƒ (๐‘ฅ, ๐‘ฆ) is the probability of going from ๐‘ฅ to ๐‘ฆ in one unit of time (one step)
โ€ข ๐‘ƒ (๐‘ฅ, โ‹…) is the conditional distribution of ๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘ฅ

We can view ๐‘ƒ as a stochastic matrix where

๐‘ƒ๐‘–๐‘— = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) 1 โ‰ค ๐‘–, ๐‘— โ‰ค ๐‘›

Going the other way, if we take a stochastic matrix ๐‘ƒ , we can generate a Markov chain {๐‘‹๐‘ก }
as follows:

โ€ข draw ๐‘‹0 from some specified distribution


โ€ข for each ๐‘ก = 0, 1, โ€ฆ, draw ๐‘‹๐‘ก+1 from ๐‘ƒ (๐‘‹๐‘ก , โ‹…)

By construction, the resulting process satisfies Eq. (2)

26.3.3 Example 1

Consider a worker who, at any given time ๐‘ก, is either unemployed (state 0) or employed (state
1)
Suppose that, over a one month period,

1. An unemployed worker finds a job with probability ๐›ผ โˆˆ (0, 1)


2. An employed worker loses her job and becomes unemployed with probability ๐›ฝ โˆˆ (0, 1)

In terms of a Markov model, we have

โ€ข ๐‘† = {0, 1}
โ€ข ๐‘ƒ (0, 1) = ๐›ผ and ๐‘ƒ (1, 0) = ๐›ฝ

We can write out the transition probabilities in matrix form as

1โˆ’๐›ผ ๐›ผ
๐‘ƒ =( )
๐›ฝ 1โˆ’๐›ฝ

Once we have the values ๐›ผ and ๐›ฝ, we can address a range of questions, such as

โ€ข What is the average duration of unemployment?


โ€ข Over the long-run, what fraction of time does a worker find herself unemployed?
โ€ข Conditional on employment, what is the probability of becoming unemployed at least
once over the next 12 months?

Weโ€™ll cover such applications below

26.3.4 Example 2

Using US unemployment data, Hamilton [51] estimated the stochastic matrix

    โŽ› 0.971 0.029 0     โŽž
๐‘ƒ = โŽœ 0.145 0.778 0.077 โŽŸ
    โŽ 0     0.508 0.492 โŽ 

where

โ€ข the frequency is monthly



โ€ข the first state represents โ€œnormal growthโ€


โ€ข the second state represents โ€œmild recessionโ€
โ€ข the third state represents โ€œsevere recessionโ€

For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process {๐‘‹๐‘ก }
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities

Here โ€œngโ€ is normal growth, โ€œmrโ€ is mild recession, etc.

26.4 Simulation

One natural way to answer questions about Markov chains is to simulate them
(To approximate the probability of event ๐ธ, we can simulate many times and count the frac-
tion of times that ๐ธ occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.py

โ€ข Efficient, bundled with lots of other useful routines for handling Markov chains

However, itโ€™s also a good exercise to roll our own routines โ€” letโ€™s do that first and then come
back to the methods in QuantEcon.py
In these exercises, weโ€™ll take the state space to be ๐‘† = 0, โ€ฆ , ๐‘› โˆ’ 1

26.4.1 Rolling Our Own

To simulate a Markov chain, we need its stochastic matrix ๐‘ƒ and either an initial state or a
probability distribution ๐œ“ for initial state to be drawn from
The Markov chain is then constructed as discussed above. To repeat:

1. At time ๐‘ก = 0, ๐‘‹0 is set to some fixed state or chosen from ๐œ“


2. At each subsequent time ๐‘ก, the new state ๐‘‹๐‘ก+1 is drawn from ๐‘ƒ (๐‘‹๐‘ก , โ‹…)

In order to implement this simulation procedure, we need a method for generating draws from
a discrete distribution
For this task, weโ€™ll use DiscreteRV from QuantEcon

In [2]: import quantecon as qe


import numpy as np

ฯˆ = (0.1, 0.9) # Probabilities over sample space {0, 1}


cdf = np.cumsum(ฯˆ)
qe.random.draw(cdf, 5) # Generate 5 independent draws from ฯˆ

Out[2]: array([1, 1, 1, 1, 1])

Weโ€™ll write our code as a function that takes the following three arguments

โ€ข A stochastic matrix P
โ€ข An initial state init
โ€ข A positive integer sample_size representing the length of the time series the function
should return

In [3]: def mc_sample_path(P, init=0, sample_size=1000):


# === make sure P is a NumPy array === #
P = np.asarray(P)
# === allocate memory === #
X = np.empty(sample_size, dtype=int)
X[0] = init
# === convert each row of P into a distribution === #
# In particular, P_dist[i] = the distribution corresponding to P[i, :]
n = len(P)
P_dist = [np.cumsum(P[i, :]) for i in range(n)]

# === generate the sample path === #


for t in range(sample_size - 1):
X[t+1] = qe.random.draw(P_dist[X[t]])

return X

Letโ€™s see how it works using the small matrix

     โŽ› 0.4 0.6 โŽž
๐‘ƒ โˆถ= โŽ 0.2 0.8 โŽ         (3)

As weโ€™ll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25
If you run the following code you should get roughly that answer

In [4]: P = [[0.4, 0.6], [0.2, 0.8]]


X = mc_sample_path(P, sample_size=100000)
np.mean(X == 0)

Out[4]: 0.25109

26.4.2 Using QuantEconโ€™s Routines

As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion
Hereโ€™s an illustration using the same P as the preceding example

In [5]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
X = mc.simulate(ts_length=1000000)
np.mean(X == 0)

Out[5]: 0.249741

In fact the QuantEcon.py routine is JIT compiled and much faster


(Because itโ€™s JIT compiled the first run takes a bit longer โ€” the function has to be compiled
and stored in memory)

In [6]: %timeit mc_sample_path(P, sample_size=1000000) # our version

678 ms ยฑ 9.12 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)

In [7]: %timeit mc.simulate(ts_length=1000000) # qe version

30.2 ms ยฑ 396 ยตs per loop (mean ยฑ std. dev. of 7 runs, 10 loops each)

Adding State Values and Initial Conditions


If we wish to, we can provide a specification of state values to MarkovChain
These state values can be integers, floats, or even strings
The following code illustrates

In [8]: mc = qe.MarkovChain(P, state_values=('unemployed', 'employed'))


mc.simulate(ts_length=4, init='employed')

Out[8]: array(['employed', 'employed', 'employed', 'employed'], dtype='<U10')

In [9]: mc.simulate(ts_length=4, init='unemployed')

Out[9]: array(['unemployed', 'employed', 'employed', 'employed'], dtype='<U10')

In [10]: mc.simulate(ts_length=4) # Start at randomly chosen initial state

Out[10]: array(['unemployed', 'employed', 'unemployed', 'employed'], dtype='<U10')

If we want to simulate with output as indices rather than state values we can use

In [11]: mc.simulate_indices(ts_length=4)

Out[11]: array([1, 1, 1, 1])

26.5 Marginal Distributions

Suppose that

1. {๐‘‹๐‘ก } is a Markov chain with stochastic matrix ๐‘ƒ


2. the distribution of ๐‘‹๐‘ก is known to be ๐œ“๐‘ก

What then is the distribution of ๐‘‹๐‘ก+1 , or, more generally, of ๐‘‹๐‘ก+๐‘š ?



26.5.1 Solution

Let ๐œ“๐‘ก be the distribution of ๐‘‹๐‘ก for ๐‘ก = 0, 1, 2, โ€ฆ


Our first aim is to find ๐œ“๐‘ก+1 given ๐œ“๐‘ก and ๐‘ƒ
To begin, pick any ๐‘ฆ โˆˆ ๐‘†
Using the law of total probability, we can decompose the probability that ๐‘‹๐‘ก+1 = ๐‘ฆ as follows:

P{๐‘‹๐‘ก+1 = ๐‘ฆ} = โˆ‘ P{๐‘‹๐‘ก+1 = ๐‘ฆ | ๐‘‹๐‘ก = ๐‘ฅ} โ‹… P{๐‘‹๐‘ก = ๐‘ฅ}


๐‘ฅโˆˆ๐‘†

In words, to get the probability of being at ๐‘ฆ tomorrow, we account for all ways this can hap-
pen and sum their probabilities
Rewriting this statement in terms of marginal and conditional probabilities gives
๐œ“๐‘ก+1 (๐‘ฆ) = โˆ‘_{๐‘ฅโˆˆ๐‘†} ๐‘ƒ (๐‘ฅ, ๐‘ฆ)๐œ“๐‘ก (๐‘ฅ)

There are ๐‘› such equations, one for each ๐‘ฆ โˆˆ ๐‘†


If we think of ๐œ“๐‘ก+1 and ๐œ“๐‘ก as row vectors (as is traditional in this literature), these ๐‘› equa-
tions are summarized by the matrix expression

๐œ“๐‘ก+1 = ๐œ“๐‘ก ๐‘ƒ (4)

In other words, to move the distribution forward one unit of time, we postmultiply by ๐‘ƒ
By repeating this ๐‘š times we move forward ๐‘š steps into the future
Hence, iterating on Eq. (4), the expression ๐œ“๐‘ก+๐‘š = ๐œ“๐‘ก ๐‘ƒ ๐‘š is also valid โ€” here ๐‘ƒ ๐‘š is the ๐‘š-th
power of ๐‘ƒ
As a special case, we see that if ๐œ“0 is the initial distribution from which ๐‘‹0 is drawn, then
๐œ“0 ๐‘ƒ ๐‘š is the distribution of ๐‘‹๐‘š
This is very important, so letโ€™s repeat it

๐‘‹0 โˆผ ๐œ“ 0 โŸน ๐‘‹๐‘š โˆผ ๐œ“0 ๐‘ƒ ๐‘š (5)

and, more generally,

๐‘‹๐‘ก โˆผ ๐œ“๐‘ก โŸน ๐‘‹๐‘ก+๐‘š โˆผ ๐œ“๐‘ก ๐‘ƒ ๐‘š (6)

26.5.2 Multiple Step Transition Probabilities

We know that the probability of transitioning from ๐‘ฅ to ๐‘ฆ in one step is ๐‘ƒ (๐‘ฅ, ๐‘ฆ)


It turns out that the probability of transitioning from ๐‘ฅ to ๐‘ฆ in ๐‘š steps is ๐‘ƒ ๐‘š (๐‘ฅ, ๐‘ฆ), the
(๐‘ฅ, ๐‘ฆ)-th element of the ๐‘š-th power of ๐‘ƒ
To see why, consider again Eq. (6), but now with ๐œ“๐‘ก putting all probability on state ๐‘ฅ

โ€ข 1 in the ๐‘ฅ-th position and zero elsewhere

Inserting this into Eq. (6), we see that, conditional on ๐‘‹๐‘ก = ๐‘ฅ, the distribution of ๐‘‹๐‘ก+๐‘š is the
๐‘ฅ-th row of ๐‘ƒ ๐‘š
In particular

P{๐‘‹๐‘ก+๐‘š = ๐‘ฆ} = ๐‘ƒ ๐‘š (๐‘ฅ, ๐‘ฆ) = (๐‘ฅ, ๐‘ฆ)-th element of ๐‘ƒ ๐‘š

26.5.3 Example: Probability of Recession

Recall the stochastic matrix ๐‘ƒ for recession and growth considered above
Suppose that the current state is unknown โ€” perhaps statistics are available only at the end
of the current month
We estimate the probability that the economy is in state ๐‘ฅ to be ๐œ“(๐‘ฅ)
The probability of being in recession (either mild or severe) in 6 months time is given by the
inner product

        โŽ› 0 โŽž
๐œ“๐‘ƒ 6 โ‹… โŽœ 1 โŽŸ
        โŽ 1 โŽ 

26.5.4 Example 2: Cross-Sectional Distributions

The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime expe-
rience is described by the specified dynamics, independent of one another
Let ๐œ“ be the current cross-sectional distribution over {0, 1}

โ€ข For example, ๐œ“(0) is the unemployment rate

The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment
The same distribution also describes the fractions of a particular workerโ€™s career spent being
employed and unemployed, respectively

26.6 Irreducibility and Aperiodicity

Irreducibility and aperiodicity are central concepts of modern Markov chain theory
Letโ€™s see what theyโ€™re about

26.6.1 Irreducibility

Let ๐‘ƒ be a fixed stochastic matrix


Two states ๐‘ฅ and ๐‘ฆ are said to communicate with each other if there exist positive integers
๐‘— and ๐‘˜ such that

๐‘ƒ ๐‘— (๐‘ฅ, ๐‘ฆ) > 0 and ๐‘ƒ ๐‘˜ (๐‘ฆ, ๐‘ฅ) > 0

In view of our discussion above, this means precisely that

โ€ข state ๐‘ฅ can be reached eventually from state ๐‘ฆ, and


โ€ข state ๐‘ฆ can be reached eventually from state ๐‘ฅ

The stochastic matrix ๐‘ƒ is called irreducible if all states communicate; that is, if ๐‘ฅ and ๐‘ฆ
communicate for all (๐‘ฅ, ๐‘ฆ) in ๐‘† ร— ๐‘†
For example, consider the following transition probabilities for wealth of a fictitious set of
households

We can translate this into a stochastic matrix, putting zeros where thereโ€™s no edge between
nodes

     โŽ› 0.9 0.1 0   โŽž
๐‘ƒ โˆถ= โŽœ 0.4 0.4 0.2 โŽŸ
     โŽ 0.1 0.1 0.8 โŽ 

Itโ€™s clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually
We can also test this using QuantEcon.pyโ€™s MarkovChain class

In [12]: P = [[0.9, 0.1, 0.0],


[0.4, 0.4, 0.2],
[0.1, 0.1, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[12]: True

Hereโ€™s a more pessimistic scenario, where the poor are poor forever

This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Letโ€™s confirm this

In [13]: P = [[1.0, 0.0, 0.0],


[0.1, 0.8, 0.1],
[0.0, 0.2, 0.8]]

mc = qe.MarkovChain(P, ('poor', 'middle', 'rich'))


mc.is_irreducible

Out[13]: False

We can also determine the โ€œcommunication classesโ€

In [14]: mc.communication_classes

Out[14]: [array(['poor'], dtype='<U6'), array(['middle', 'rich'], dtype='<U6')]

It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes
For example, poverty is a life sentence in the second graph but not the first
Weโ€™ll come back to this a bit later

26.6.2 Aperiodicity

Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperi-
odic otherwise
Hereโ€™s a trivial example with three states

The chain cycles with period 3:



In [15]: P = [[0, 1, 0],


[0, 0, 1],
[1, 0, 0]]

mc = qe.MarkovChain(P)
mc.period

Out[15]: 3

More formally, the period of a state ๐‘ฅ is the greatest common divisor of the set of integers

๐ท(๐‘ฅ) โˆถ= {๐‘— โ‰ฅ 1 โˆถ ๐‘ƒ ๐‘— (๐‘ฅ, ๐‘ฅ) > 0}

In the last example, ๐ท(๐‘ฅ) = {3, 6, 9, โ€ฆ} for every state ๐‘ฅ, so the period is 3
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state ๐‘Ž has period 2

We can confirm that the stochastic matrix is periodic as follows

In [16]: P = [[0.0, 1.0, 0.0, 0.0],


[0.5, 0.0, 0.5, 0.0],
[0.0, 0.5, 0.0, 0.5],
[0.0, 0.0, 1.0, 0.0]]

mc = qe.MarkovChain(P)
mc.period

Out[16]: 2

In [17]: mc.is_aperiodic

Out[17]: False

26.7 Stationary Distributions

As seen in Eq. (4), we can shift probabilities forward one unit of time via postmultiplication
by ๐‘ƒ
Some distributions are invariant under this updating process โ€” for example,

In [18]: P = np.array([[.4, .6], [.2, .8]])


ฯˆ = (0.25, 0.75)
ฯˆ @ P

Out[18]: array([0.25, 0.75])

Such distributions are called stationary, or invariant


Formally, a distribution ๐œ“โˆ— on ๐‘† is called stationary for ๐‘ƒ if ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ

From this equality, we immediately get ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ ๐‘ก for all ๐‘ก


This tells us an important fact: If the distribution of ๐‘‹0 is a stationary distribution, then ๐‘‹๐‘ก
will have this same distribution for all ๐‘ก
Hence stationary distributions have a natural interpretation as stochastic steady states โ€”
weโ€™ll discuss this more in just a moment
Mathematically, a stationary distribution is a fixed point of ๐‘ƒ when ๐‘ƒ is thought of as the
map ๐œ“ โ†ฆ ๐œ“๐‘ƒ from (row) vectors to (row) vectors
Theorem. Every stochastic matrix ๐‘ƒ has at least one stationary distribution
(We are assuming here that the state space ๐‘† is finite; if not, more assumptions are required)
For proof of this result, you can apply Brouwerโ€™s fixed point theorem, or see EDTC, theorem
4.3.5
There may in fact be many stationary distributions corresponding to a given stochastic ma-
trix ๐‘ƒ

โ€ข For example, if ๐‘ƒ is the identity matrix, then all distributions are stationary

Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility
This gives some intuition for the following fundamental theorem
Theorem. If ๐‘ƒ is both aperiodic and irreducible, then

1. ๐‘ƒ has exactly one stationary distribution ๐œ“โˆ—


2. For any initial distribution ๐œ“0 , we have โ€–๐œ“0 ๐‘ƒ ๐‘ก โˆ’ ๐œ“โˆ— โ€– โ†’ 0 as ๐‘ก โ†’ โˆž

For a proof, see, for example, theorem 5.2 of [47]


(Note that part 1 of the theorem requires only irreducibility, whereas part 2 requires both
irreducibility and aperiodicity)
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly
ergodic
One easy sufficient condition for aperiodicity and irreducibility is that every element of ๐‘ƒ is
strictly positive

โ€ข Try to convince yourself of this

26.7.1 Example

Recall our model of employment/unemployment dynamics for a given worker discussed above
Assuming ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), the uniform ergodicity condition is satisfied
Let ๐œ“โˆ— = (๐‘, 1 โˆ’ ๐‘) be the stationary distribution, so that ๐‘ corresponds to unemployment
(state 0)

Using ๐œ“โˆ— = ๐œ“โˆ— ๐‘ƒ and a bit of algebra yields

๐‘ = ๐›ฝ/(๐›ผ + ๐›ฝ)

This is, in some sense, a steady state probability of unemployment โ€” more on interpretation
below
Not surprisingly it tends to zero as ๐›ฝ โ†’ 0, and to one as ๐›ผ โ†’ 0
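We can confirm this formula directly โ€” a quick sketch with illustrative hazard rates ๐›ผ and ๐›ฝ:

```python
import numpy as np

ฮฑ, ฮฒ = 0.1, 0.05                 # illustrative job-finding and separation rates
P = np.array([[1 - ฮฑ, ฮฑ],
              [ฮฒ, 1 - ฮฒ]])

# Candidate stationary distribution (p, 1 - p) with p = ฮฒ / (ฮฑ + ฮฒ)
ฯˆ_star = np.array([ฮฒ / (ฮฑ + ฮฒ), ฮฑ / (ฮฑ + ฮฒ)])
assert np.allclose(ฯˆ_star, ฯˆ_star @ P)   # ฯˆ* = ฯˆ* P
```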

26.7.2 Calculating Stationary Distributions

As discussed above, a given Markov matrix ๐‘ƒ can have many stationary distributions
That is, there can be many row vectors ๐œ“ such that ๐œ“ = ๐œ“๐‘ƒ
In fact if ๐‘ƒ has two distinct stationary distributions ๐œ“1 , ๐œ“2 then it has infinitely many, since
in this case, as you can verify,

๐œ“3 โˆถ= ๐œ†๐œ“1 + (1 โˆ’ ๐œ†)๐œ“2

is a stationary distribution for ๐‘ƒ for any ๐œ† โˆˆ [0, 1]


If we restrict attention to the case where only one stationary distribution exists, one option
for finding it is to try to solve the linear system ๐œ“(๐ผ๐‘› โˆ’ ๐‘ƒ ) = 0 for ๐œ“, where ๐ผ๐‘› is the ๐‘› ร— ๐‘›
identity
But the zero vector solves this equation
Hence we need to impose the restriction that the solution must be a probability distribution
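One standard way to impose that restriction, sketched below, is to transpose the system and replace one equation with the adding-up constraint โˆ‘๐‘ฅ ๐œ“(๐‘ฅ) = 1:

```python
import numpy as np

P = np.array([[0.4, 0.6],
              [0.2, 0.8]])
n = P.shape[0]

# ฯˆ (I - P) = 0 together with ฯˆ 1 = 1.  Transpose to a standard Ax = b
# system and replace the last row with the adding-up constraint.
A = (np.eye(n) - P).T
A[-1, :] = 1.0
b = np.zeros(n)
b[-1] = 1.0

ฯˆ_star = np.linalg.solve(A, b)
assert np.allclose(ฯˆ_star, ฯˆ_star @ P)   # stationary for P
```

This works only when the stationary distribution is unique, which is the case assumed here.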
A suitable algorithm is implemented in QuantEcon.py โ€” the next code block illustrates

In [19]: P = [[0.4, 0.6], [0.2, 0.8]]


mc = qe.MarkovChain(P)
mc.stationary_distributions # Show all stationary distributions

Out[19]: array([[0.25, 0.75]])

The stationary distribution is unique

26.7.3 Convergence to Stationarity

Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
๐‘‹๐‘ก converges to the stationary distribution regardless of where we start off
This adds considerable weight to our interpretation of ๐œ“โˆ— as a stochastic steady state
The convergence in the theorem is illustrated in the next figure

In [20]: from mpl_toolkits.mplot3d import Axes3D


import matplotlib.pyplot as plt
%matplotlib inline

P = ((0.971, 0.029, 0.000),
     (0.145, 0.778, 0.077),
     (0.000, 0.508, 0.492))

P = np.array(P)

ฯˆ = (0.0, 0.2, 0.8) # Initial condition

fig = plt.figure(figsize=(8, 6))


ax = fig.add_subplot(111, projection='3d')

ax.set(xlim=(0, 1), ylim=(0, 1), zlim=(0, 1),
       xticks=(0.25, 0.5, 0.75),
       yticks=(0.25, 0.5, 0.75),
       zticks=(0.25, 0.5, 0.75))

x_vals, y_vals, z_vals = [], [], []


for t in range(20):
    x_vals.append(ฯˆ[0])
    y_vals.append(ฯˆ[1])
    z_vals.append(ฯˆ[2])
    ฯˆ = ฯˆ @ P

ax.scatter(x_vals, y_vals, z_vals, c='r', s=60)


ax.view_init(30, 210)

mc = qe.MarkovChain(P)
ฯˆ_star = mc.stationary_distributions[0]
ax.scatter(ฯˆ_star[0], ฯˆ_star[1], ฯˆ_star[2], c='k', s=60)

plt.show()

Here

โ€ข ๐‘ƒ is the stochastic matrix for recession and growth considered above


โ€ข The highest red dot is an arbitrarily chosen initial probability distribution ๐œ“, represented as a vector in R3

โ€ข The other red dots are the distributions ๐œ“๐‘ƒ ๐‘ก for ๐‘ก = 1, 2, โ€ฆ


โ€ข The black dot is ๐œ“โˆ—

The code for the figure can be found here โ€” you might like to try experimenting with different initial conditions

26.8 Ergodicity

Under irreducibility, yet another important result obtains: For all ๐‘ฅ โˆˆ ๐‘†,

(1/๐‘š) โˆ‘_{๐‘ก=1}^{๐‘š} 1{๐‘‹๐‘ก = ๐‘ฅ} โ†’ ๐œ“โˆ— (๐‘ฅ) as ๐‘š โ†’ โˆž (7)

Here

โ€ข 1{๐‘‹๐‘ก = ๐‘ฅ} = 1 if ๐‘‹๐‘ก = ๐‘ฅ and zero otherwise


โ€ข convergence is with probability one
โ€ข the result does not depend on the distribution (or value) of ๐‘‹0

The result tells us that the fraction of time the chain spends at state ๐‘ฅ converges to ๐œ“โˆ— (๐‘ฅ) as
time goes to infinity
This gives us another way to interpret the stationary distribution โ€” provided that the convergence result in Eq. (7) is valid
The convergence in Eq. (7) is a special case of a law of large numbers result for Markov
chains โ€” see EDTC, section 4.3.4 for some additional information
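The following sketch (with made-up parameters) simulates a long path and compares the time average in Eq. (7) with the stationary probability of state 0:

```python
import numpy as np

# Simulate a long path of the worker's chain and compare the fraction of
# time spent unemployed (state 0) with psi*(0).  Parameters are illustrative.
alpha, beta = 0.1, 0.05
p = beta / (alpha + beta)        # psi*(0)

m = 100_000
rng = np.random.default_rng(1234)
u = rng.random(m)

x = 0
visits = 0
for t in range(m):
    visits += (x == 0)
    if x == 0:
        x = 1 if u[t] < alpha else 0   # Leave unemployment w.p. alpha
    else:
        x = 0 if u[t] < beta else 1    # Lose the job w.p. beta

print(visits / m, p)   # The two numbers should be close
```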

26.8.1 Example

Recall our cross-sectional interpretation of the employment/unemployment model discussed above
Assume that ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), so that irreducibility and aperiodicity both hold
We saw that the stationary distribution is (๐‘, 1 โˆ’ ๐‘), where

๐‘ = ๐›ฝ/(๐›ผ + ๐›ฝ)

In the cross-sectional interpretation, this is the fraction of people unemployed


In view of our latest (ergodicity) result, it is also the fraction of time that a worker can expect to spend unemployed
Thus, in the long-run, cross-sectional averages for a population and time-series averages for a
given person coincide
This is one interpretation of the notion of ergodicity

26.9 Computing Expectations

We are interested in computing expectations of the form

E[โ„Ž(๐‘‹๐‘ก )] (8)

and conditional expectations such as

E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] (9)

where

โ€ข {๐‘‹๐‘ก } is a Markov chain generated by ๐‘› ร— ๐‘› stochastic matrix ๐‘ƒ


โ€ข โ„Ž is a given function, which, in expressions involving matrix algebra, weโ€™ll think of as
the column vector

โ„Ž(๐‘ฅ1 )
โ„Ž=โŽ›
โŽœ โ‹ฎ โŽž
โŽŸ
โŽ โ„Ž(๐‘ฅ๐‘› ) โŽ 

The unconditional expectation Eq. (8) is easy: We just sum over the distribution of ๐‘‹๐‘ก to get

E[โ„Ž(๐‘‹๐‘ก )] = โˆ‘(๐œ“๐‘ƒ ๐‘ก )(๐‘ฅ)โ„Ž(๐‘ฅ)


๐‘ฅโˆˆ๐‘†

Here ๐œ“ is the distribution of ๐‘‹0


Since ๐œ“ and hence ๐œ“๐‘ƒ ๐‘ก are row vectors, we can also write this as

E[โ„Ž(๐‘‹๐‘ก )] = ๐œ“๐‘ƒ ๐‘ก โ„Ž

For the conditional expectation Eq. (9), we need to sum over the conditional distribution of
๐‘‹๐‘ก+๐‘˜ given ๐‘‹๐‘ก = ๐‘ฅ
We already know that this is ๐‘ƒ ๐‘˜ (๐‘ฅ, โ‹…), so

E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] = (๐‘ƒ ๐‘˜ โ„Ž)(๐‘ฅ) (10)

The vector ๐‘ƒ ๐‘˜ โ„Ž stores the conditional expectation E[โ„Ž(๐‘‹๐‘ก+๐‘˜ ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] over all ๐‘ฅ

26.9.1 Expectations of Geometric Sums

Sometimes we also want to compute expectations of a geometric sum, such as โˆ‘๐‘ก ๐›ฝ ๐‘ก โ„Ž(๐‘‹๐‘ก )


In view of the preceding discussion, this is

E [โˆ‘_{๐‘—=0}^{โˆž} ๐›ฝ ๐‘— โ„Ž(๐‘‹๐‘ก+๐‘— ) โˆฃ ๐‘‹๐‘ก = ๐‘ฅ] = [(๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 โ„Ž](๐‘ฅ)

where

(๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 = ๐ผ + ๐›ฝ๐‘ƒ + ๐›ฝ 2 ๐‘ƒ 2 + โ‹ฏ

Premultiplication by (๐ผ โˆ’ ๐›ฝ๐‘ƒ )โˆ’1 amounts to โ€œapplying the resolvent operatorโ€
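As an illustrative check (the values of ๐‘ƒ , โ„Ž and ๐›ฝ are made up), we can compare the resolvent formula with a truncated version of the geometric sum:

```python
import numpy as np

# Illustrative values (not from the text)
P = np.array([[0.9, 0.1],
              [0.05, 0.95]])
h = np.array([1.0, 0.0])
beta = 0.96

# Solve (I - beta P) v = h rather than forming the inverse explicitly
v = np.linalg.solve(np.eye(2) - beta * P, h)

# Truncated geometric sum: sum over j of beta^j P^j h
T = 2000
total = np.zeros(2)
Pj_h = h.copy()
for j in range(T):
    total += beta**j * Pj_h
    Pj_h = P @ Pj_h   # Advance P^j h to P^{j+1} h

print(np.allclose(v, total))   # True: the two agree for large T
```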

26.10 Exercises

26.10.1 Exercise 1

According to the discussion above, if a workerโ€™s employment dynamics obey the stochastic
matrix

1โˆ’๐›ผ ๐›ผ
๐‘ƒ =( )
๐›ฝ 1โˆ’๐›ฝ

with ๐›ผ โˆˆ (0, 1) and ๐›ฝ โˆˆ (0, 1), then, in the long-run, the fraction of time spent unemployed
will be

๐‘ โˆถ= ๐›ฝ/(๐›ผ + ๐›ฝ)

In other words, if {๐‘‹๐‘ก } represents the Markov chain for employment, then ๐‘‹ฬ„ ๐‘š โ†’ ๐‘ as ๐‘š โ†’
โˆž, where

๐‘‹ฬ„ ๐‘š โˆถ= (1/๐‘š) โˆ‘_{๐‘ก=1}^{๐‘š} 1{๐‘‹๐‘ก = 0}

Your exercise is to illustrate this convergence


First,

โ€ข generate one simulated time series {๐‘‹๐‘ก } of length 10,000, starting at ๐‘‹0 = 0


โ€ข plot ๐‘‹ฬ„ ๐‘š โˆ’ ๐‘ against ๐‘š, where ๐‘ is as defined above

Second, repeat the first step, but this time taking ๐‘‹0 = 1


In both cases, set ๐›ผ = ๐›ฝ = 0.1
The result should look something like the following โ€” modulo randomness, of course

(You donโ€™t need to add the fancy touches to the graphโ€”see the solution if youโ€™re interested)

26.10.2 Exercise 2

A topic of interest for economics and many other disciplines is ranking


Letโ€™s now consider one of the most practical and important ranking problems โ€” the rank assigned to web pages by search engines
(Although the problem is motivated from outside of economics, there is in fact a deep connection between search ranking systems and prices in certain competitive equilibria โ€” see [37])
To understand the issue, consider the set of results returned by a query to a web search engine
For the user, it is desirable to

1. receive a large set of accurate matches


2. have the matches returned in order, where the order corresponds to some measure of
โ€œimportanceโ€

Ranking according to a measure of importance is the problem we now consider


The methodology developed to solve this problem by Google founders Larry Page and Sergey
Brin is known as PageRank
To illustrate the idea, consider the following diagram

Imagine that this is a miniature version of the WWW, with

โ€ข each node representing a web page


โ€ข each arrow representing the existence of a link from one page to another

Now letโ€™s think about which pages are likely to be important, in the sense of being valuable
to a search engine user
One possible criterion for the importance of a page is the number of inbound links โ€” an indication of popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting ๐‘— be (the integer index of) a typical page and ๐‘Ÿ๐‘— be its ranking, we set

๐‘Ÿ๐‘–
๐‘Ÿ๐‘— = โˆ‘
๐‘–โˆˆ๐ฟ๐‘—
โ„“๐‘–

where

โ€ข โ„“๐‘– is the total number of outbound links from ๐‘–


โ€ข ๐ฟ๐‘— is the set of all pages ๐‘– such that ๐‘– has a link to ๐‘—

This is a measure of the number of inbound links, weighted by their own ranking (and normalized by 1/โ„“๐‘– )
There is, however, another interpretation, and it brings us back to Markov chains
Let ๐‘ƒ be the matrix given by ๐‘ƒ (๐‘–, ๐‘—) = 1{๐‘– โ†’ ๐‘—}/โ„“๐‘– where 1{๐‘– โ†’ ๐‘—} = 1 if ๐‘– has a link to ๐‘—
and zero otherwise
The matrix ๐‘ƒ is a stochastic matrix provided that each page has at least one link

With this definition of ๐‘ƒ we have

๐‘Ÿ๐‘– ๐‘Ÿ
๐‘Ÿ๐‘— = โˆ‘ = โˆ‘ 1{๐‘– โ†’ ๐‘—} ๐‘– = โˆ‘ ๐‘ƒ (๐‘–, ๐‘—)๐‘Ÿ๐‘–
๐‘–โˆˆ๐ฟ๐‘—
โ„“๐‘– all ๐‘–
โ„“๐‘– all ๐‘–

Writing ๐‘Ÿ for the row vector of rankings, this becomes ๐‘Ÿ = ๐‘Ÿ๐‘ƒ


Hence ๐‘Ÿ is the stationary distribution of the stochastic matrix ๐‘ƒ
Letโ€™s think of ๐‘ƒ (๐‘–, ๐‘—) as the probability of โ€œmovingโ€ from page ๐‘– to page ๐‘—
The value ๐‘ƒ (๐‘–, ๐‘—) has the interpretation

โ€ข ๐‘ƒ (๐‘–, ๐‘—) = 1/๐‘˜ if ๐‘– has ๐‘˜ outbound links and ๐‘— is one of them


โ€ข ๐‘ƒ (๐‘–, ๐‘—) = 0 if ๐‘– has no direct link to ๐‘—

Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page
Here โ€œrandomโ€ means that each link is selected with equal probability
Since ๐‘Ÿ is the stationary distribution of ๐‘ƒ , assuming that the uniform ergodicity condition is
valid, we can interpret ๐‘Ÿ๐‘— as the fraction of time that a (very persistent) random surfer spends
at page ๐‘—
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank
The data for this graph is in the web_graph_data.txt file โ€” you can also view it here
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n
A typical line from the file has the form

d -> h;

This should be interpreted as meaning that there exists a link from d to h


To parse this file and extract the relevant information, you can use regular expressions
The following code snippet provides a hint as to how you can go about this

In [21]: import re

re.findall(r'\w', 'x +++ y ****** z') # \w matches alphanumerics

Out[21]: ['x', 'y', 'z']

In [22]: re.findall(r'\w', 'a ^^ b &&& $$ c')

Out[22]: ['a', 'b', 'c']

When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a

26.10.3 Exercise 3

In numerical work, it is sometimes convenient to replace a continuous model with a discrete one
In particular, Markov chains are routinely generated as discrete approximations to AR(1)
processes of the form

๐‘ฆ๐‘ก+1 = ๐œŒ๐‘ฆ๐‘ก + ๐‘ข๐‘ก+1

Here ๐‘ข๐‘ก is assumed to be IID and ๐‘ (0, ๐œŽ๐‘ข2 )


The variance of the stationary probability distribution of {๐‘ฆ๐‘ก } is

๐œŽ๐‘ข2
๐œŽ๐‘ฆ2 โˆถ=
1 โˆ’ ๐œŒ2

Tauchenโ€™s method [128] is the most common method for approximating this continuous state
process with a finite state Markov chain
A routine for this already exists in QuantEcon.py but letโ€™s write our own version as an exercise
As a first step, we choose

โ€ข ๐‘›, the number of states for the discrete approximation


โ€ข ๐‘š, an integer that parameterizes the width of the state space

Next, we create a state space {๐‘ฅ0 , โ€ฆ , ๐‘ฅ๐‘›โˆ’1 } โŠ‚ R and a stochastic ๐‘› ร— ๐‘› matrix ๐‘ƒ such that

โ€ข ๐‘ฅ0 = โˆ’๐‘š ๐œŽ๐‘ฆ
โ€ข ๐‘ฅ๐‘›โˆ’1 = ๐‘š ๐œŽ๐‘ฆ
โ€ข ๐‘ฅ๐‘–+1 = ๐‘ฅ๐‘– + ๐‘  where ๐‘  = (๐‘ฅ๐‘›โˆ’1 โˆ’ ๐‘ฅ0 )/(๐‘› โˆ’ 1)

Let ๐น be the cumulative distribution function of the normal distribution ๐‘ (0, ๐œŽ๐‘ข2 )
The values ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) are computed to approximate the AR(1) process โ€” omitting the deriva-
tion, the rules are as follows:

1. If ๐‘— = 0, then set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ0 ) = ๐น (๐‘ฅ0 โˆ’ ๐œŒ๐‘ฅ๐‘– + ๐‘ /2)

1. If ๐‘— = ๐‘› โˆ’ 1, then set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘›โˆ’1 ) = 1 โˆ’ ๐น (๐‘ฅ๐‘›โˆ’1 โˆ’ ๐œŒ๐‘ฅ๐‘– โˆ’ ๐‘ /2)

3. Otherwise, set

๐‘ƒ (๐‘ฅ๐‘– , ๐‘ฅ๐‘— ) = ๐น (๐‘ฅ๐‘— โˆ’ ๐œŒ๐‘ฅ๐‘– + ๐‘ /2) โˆ’ ๐น (๐‘ฅ๐‘— โˆ’ ๐œŒ๐‘ฅ๐‘– โˆ’ ๐‘ /2)

The exercise is to write a function approx_markov(rho, sigma_u, m=3, n=7) that returns {๐‘ฅ0 , โ€ฆ , ๐‘ฅ๐‘›โˆ’1 } โŠ‚ R and an ๐‘› ร— ๐‘› matrix ๐‘ƒ as described above

โ€ข Even better, write a function that returns an instance of QuantEcon.pyโ€™s MarkovChain class

26.11 Solutions

In [23]: import numpy as np


import matplotlib.pyplot as plt
from quantecon import MarkovChain

26.11.1 Exercise 1

Compute the fraction of time that the worker spends unemployed, and compare it to the stationary probability

In [24]: ฮฑ = ฮฒ = 0.1
N = 10000
p = ฮฒ / (ฮฑ + ฮฒ)

P = ((1 - ฮฑ, ฮฑ),     # Careful: P and p are distinct
     ( ฮฒ, 1 - ฮฒ))
P = np.array(P)
mc = MarkovChain(P)

fig, ax = plt.subplots(figsize=(9, 6))


ax.set_ylim(-0.25, 0.25)
ax.grid()
ax.hlines(0, 0, N, lw=2, alpha=0.6)  # Horizontal line at zero

for x0, col in ((0, 'blue'), (1, 'green')):
    # == Generate time series for worker that starts at x0 == #
    X = mc.simulate(N, init=x0)
    # == Compute fraction of time spent unemployed, for each n == #
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    # == Plot == #
    ax.fill_between(range(N), np.zeros(N), X_bar - p, color=col, alpha=0.1)
    ax.plot(X_bar - p, color=col, label=rf'$X_0 = \, {x0} $')
    ax.plot(X_bar - p, 'k-', alpha=0.6)  # Overlay in black to make lines clearer

ax.legend(loc='upper right')
plt.show()

26.11.2 Exercise 2

First, save the data into a file called web_graph_data.txt by executing the next cell

In [25]: %%file web_graph_data.txt


a -> d;
a -> f;
b -> j;
b -> k;
b -> m;
c -> c;
c -> g;
c -> j;
c -> m;
d -> f;
d -> h;
d -> k;
e -> d;
e -> h;
e -> l;
f -> a;
f -> b;
f -> j;
f -> l;
g -> b;
g -> j;
h -> d;
h -> g;
h -> l;
h -> m;
i -> g;
i -> h;
i -> n;
j -> e;
j -> i;
j -> k;
k -> n;
l -> m;
m -> g;
n -> c;
n -> j;
n -> m;

Writing web_graph_data.txt

In [26]: """
Return list of pages, ordered by rank
"""
import re
import numpy as np
from operator import itemgetter

infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'

n = 14 # Total number of web pages (nodes)

# == Create a matrix Q indicating existence of links == #


# * Q[i, j] = 1 if there is a link from i to j
# * Q[i, j] = 0 otherwise
Q = np.zeros((n, n), dtype=int)
f = open(infile, 'r')
edges = f.readlines()
f.close()
for edge in edges:
    from_node, to_node = re.findall(r'\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1

# == Create the corresponding Markov matrix P == #
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()
mc = MarkovChain(P)
# == Compute the stationary distribution r == #
r = mc.stationary_distributions[0]
ranked_pages = {alphabet[i] : r[i] for i in range(n)}
# == Print solution, sorted from highest to lowest rank == #
print('Rankings\n ***')
for name, rank in sorted(ranked_pages.items(), key=itemgetter(1), reverse=1):
    print(f'{name}: {rank:.4}')

Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911

26.11.3 Exercise 3

A solution from the QuantEcon.py library can be found here
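For reference, here is a minimal sketch of the approx_markov routine following the rules stated in the exercise (it uses SciPy's normal CDF; the QuantEcon.py implementation may differ in details):

```python
import numpy as np
from scipy.stats import norm

def approx_markov(rho, sigma_u, m=3, n=7):
    """Sketch of Tauchen's method, following the rules stated above."""
    sigma_y = sigma_u / np.sqrt(1 - rho**2)   # Stationary std of {y_t}
    x = np.linspace(-m * sigma_y, m * sigma_y, n)
    s = x[1] - x[0]                           # Step size
    F = norm(scale=sigma_u).cdf               # CDF of N(0, sigma_u^2)

    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - rho * x[i] + s / 2)
        P[i, n - 1] = 1 - F(x[n - 1] - rho * x[i] - s / 2)
        for j in range(1, n - 1):
            z = x[j] - rho * x[i]
            P[i, j] = F(z + s / 2) - F(z - s / 2)
    return x, P

x, P = approx_markov(0.9, 1.0)
print(P.sum(axis=1))   # Each row sums to one, so P is a stochastic matrix
```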


Footnotes

[1] Hint: First show that if ๐‘ƒ and ๐‘„ are stochastic matrices then so is their product โ€” to
check the row sums, try post multiplying by a column vector of ones. Finally, argue that ๐‘ƒ ๐‘›
is a stochastic matrix using induction.
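The footnote's claim is easy to illustrate numerically (the random matrices below are used only for illustration):

```python
import numpy as np

# If P and Q are stochastic, each row of PQ sums to one, since
# (PQ)1 = P(Q1) = P1 = 1; that P^n is stochastic follows by induction
rng = np.random.default_rng(0)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)   # Normalize rows: P is stochastic
Q = rng.random((4, 4))
Q /= Q.sum(axis=1, keepdims=True)   # Likewise for Q

ones = np.ones(4)
print(np.allclose((P @ Q) @ ones, ones))                          # True
print(np.allclose(np.linalg.matrix_power(P, 5).sum(axis=1), 1.0)) # True
```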
27

Continuous State Markov Chains

27.1 Contents

โ€ข Overview 27.2

โ€ข The Density Case 27.3

โ€ข Beyond Densities 27.4

โ€ข Stability 27.5

โ€ข Exercises 27.6

โ€ข Solutions 27.7

โ€ข Appendix 27.8

In addition to whatโ€™s in Anaconda, this lecture will need the following libraries

In [1]: !pip install quantecon

27.2 Overview

In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that

โ€ข evolve in discrete-time
โ€ข are often nonlinear

The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as weโ€™ll see later on


The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,

โ€ข What happens to the distribution of the state variables?


โ€ข Is there anything we can say about the โ€œaverage behaviorโ€ of these variables?
โ€ข Is there a notion of โ€œsteady stateโ€ or โ€œlong-run equilibriumโ€ thatโ€™s applicable to the
model?

โ€“ If so, how can we compute it?

Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.

Note
For some people, the term โ€œMarkov chainโ€ always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[95]) in using the term to refer to any discrete time Markov process

27.3 The Density Case

You are probably aware that some distributions can be represented by densities and some
cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition
Once weโ€™ve built some intuition weโ€™ll cover the general case

27.3.1 Definitions and Basic Properties

In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space ๐‘†
In this setting, the dynamics of the model are described by a stochastic matrix โ€” a nonnegative square matrix ๐‘ƒ = ๐‘ƒ [๐‘–, ๐‘—] such that each row ๐‘ƒ [๐‘–, โ‹…] sums to one
The interpretation of ๐‘ƒ is that ๐‘ƒ [๐‘–, ๐‘—] represents the probability of transitioning from state ๐‘–
to state ๐‘— in one unit of time
In symbols,

P{๐‘‹๐‘ก+1 = ๐‘— | ๐‘‹๐‘ก = ๐‘–} = ๐‘ƒ [๐‘–, ๐‘—]

Equivalently,

โ€ข ๐‘ƒ can be thought of as a family of distributions ๐‘ƒ [๐‘–, โ‹…], one for each ๐‘– โˆˆ ๐‘†


โ€ข ๐‘ƒ [๐‘–, โ‹…] is the distribution of ๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘–

(As you probably recall, when using NumPy arrays, ๐‘ƒ [๐‘–, โ‹…] is expressed as P[i, :])
In this section, weโ€™ll allow ๐‘† to be a subset of R, such as

โ€ข R itself
โ€ข the positive reals (0, โˆž)
โ€ข a bounded interval (๐‘Ž, ๐‘)

The family of discrete distributions ๐‘ƒ [๐‘–, โ‹…] will be replaced by a family of densities ๐‘(๐‘ฅ, โ‹…), one
for each ๐‘ฅ โˆˆ ๐‘†
Analogous to the finite state case, ๐‘(๐‘ฅ, โ‹…) is to be understood as the distribution (density) of
๐‘‹๐‘ก+1 given ๐‘‹๐‘ก = ๐‘ฅ
More formally, a stochastic kernel on ๐‘† is a function ๐‘ โˆถ ๐‘† ร— ๐‘† โ†’ R with the property that

1. ๐‘(๐‘ฅ, ๐‘ฆ) โ‰ฅ 0 for all ๐‘ฅ, ๐‘ฆ โˆˆ ๐‘†


2. โˆซ ๐‘(๐‘ฅ, ๐‘ฆ)๐‘‘๐‘ฆ = 1 for all ๐‘ฅ โˆˆ ๐‘†

(Integrals are over the whole space unless otherwise specified)


For example, let ๐‘† = R and consider the particular stochastic kernel ๐‘๐‘ค defined by

1 (๐‘ฆ โˆ’ ๐‘ฅ)2
๐‘๐‘ค (๐‘ฅ, ๐‘ฆ) โˆถ= โˆš exp {โˆ’ } (1)
2๐œ‹ 2
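As a quick sanity check, ๐‘๐‘ค satisfies property 2 of a stochastic kernel; numerical integration over ๐‘ฆ (at an arbitrarily chosen ๐‘ฅ) gives one:

```python
import numpy as np
from scipy.integrate import quad

def p_w(x, y):
    # Density of N(x, 1) evaluated at y, matching Eq. (1)
    return np.exp(-(y - x)**2 / 2) / np.sqrt(2 * np.pi)

x = 0.5   # An arbitrary fixed state
total, abserr = quad(lambda y: p_w(x, y), -np.inf, np.inf)
print(total)   # Approximately 1.0
```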

What kind of model