Quantitative Economics with Python

July 1, 2019

https://lectures.quantecon.org/py/
Contents

I Introduction to Python

1 About Python
3 An Introductory Example
4 Python Essentials
6 NumPy
7 Matplotlib
8 SciPy
9 Numba
15 Debugging
16 Pandas
44 Consumption and Tax Smoothing with Complete and Incomplete Markets
46 Robustness
Part I
Introduction to Python

1 About Python
1.1 Contents
• Overview 1.2
1.2 Overview
At this stage, it's not our intention that you try to replicate all you see
We will work through what follows at a slow pace later in the lecture series
Our only objective for this lecture is to give you some feel of what Python is, and what it can do
• communications
• web development
• CGI and graphical user interfaces
• games
• multimedia, data processing, security, etc., etc., etc.
• Google
• Dropbox
• Reddit
• YouTube
• Walt Disney Animation, etc., etc.
The following chart, produced using Stack Overflow Trends, shows one measure of the relative
popularity of Python
The figure indicates not only that Python is widely used but also that adoption of Python
has accelerated significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in
rapidly growing fields like data science
For example, the popularity of pandas, a library for data analysis with Python, has exploded,
as seen here
(The corresponding time path for MATLAB is shown for comparison)
Note that pandas takes off in 2012, which is the same year that we see Python's popularity
begin to spike in the first figure
Overall, it's clear that
1.3.3 Features
One nice feature of Python is its elegant syntax – we'll see many examples later on
Elegant code might sound superfluous but in fact it's highly beneficial because it makes the
syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means
that you don't need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is an elegant design
Features like iterators, generators, decorators, list comprehensions, etc. make Python highly
expressive, allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
Fundamental matrix and array processing capabilities are provided by the excellent NumPy
library
NumPy provides the basic array data type plus some simple processing operations
For example, let's build some arrays
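(The array-building cell is missing from this extraction; a plausible reconstruction, in which b and c are the cosine and sine of a common grid so that their inner product is essentially zero, is the following – the exact arrays are an assumption)

import numpy as np

a = np.linspace(-np.pi, np.pi, 100)   # Create an evenly spaced grid from -π to π
b = np.cos(a)                         # Apply cos to each element of a
c = np.sin(a)                         # Apply sin to each element of a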
In [2]: b @ c
Out[2]: 1.5265566588595902e-16
The number you see here might vary slightly but it's essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
The SciPy library is built on top of NumPy and provides additional functionality
For example, let's calculate $\int_{-2}^{2} \phi(z) \, dz$ where $\phi$ is the standard normal density
from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
Out[3]: 0.9544997361036417
• linear algebra
• integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.
1.4.2 Graphics
The most popular and comprehensive Python library for creating figures and graphs is Mat-
plotlib
Example 3D plot
• Plotly
• Bokeh
• VPython – 3D graphics and animations
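(The SymPy cells themselves are largely missing here; a sketch of the setup consistent with the outputs that follow – the import list is an assumption – is)

from sympy import Symbol, solve, limit, sin, diff

x, y = Symbol('x'), Symbol('y')   # Treat 'x' and 'y' as algebraic symbols
x + x + x + y                     # Produces the symbolic expression shown below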
Out[4]: 3*x + y
solve polynomials
solve(x**2 + x + 2)
limit(1 / x, x, 0)
Out[7]: oo
In [8]: limit(sin(x) / x, x, 0)
Out[8]: 1
In [9]: diff(sin(x), x)
Out[9]: cos(x)
The beauty of importing this functionality into Python is that we are working within a fully
fledged programming language
Can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
1.4.4 Statistics
Pythonโs data manipulation and statistics libraries have improved rapidly over the last few
years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here's a simple example, using some fake data
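(The cell that builds the DataFrame is missing; a reconstruction consistent with the output below – the random seed and the date range are assumptions – is)

import numpy as np
import pandas as pd

np.random.seed(1234)
data = np.random.randn(5, 2)                    # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('28/12/2010', periods=5)
df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)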
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
In [11]: df.mean()
Hereโs some example code that generates and plots a random graph, with node color deter-
mined by shortest path length from a central node
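(The code itself is missing from this extraction; a sketch along the lines described – the graph size, connection radius and seed are assumptions – is)

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx

np.random.seed(1234)

# Generate a random geometric graph with nodes placed uniformly on the unit square
p = {i: (np.random.uniform(0, 1), np.random.uniform(0, 1)) for i in range(200)}
g = nx.random_geometric_graph(200, 0.12, pos=p)
pos = nx.get_node_attributes(g, 'pos')

# Find the node nearest the center point (0.5, 0.5)
dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in pos.values()]
ncenter = np.argmin(dists)

# Color each node by shortest path length from the central node
path_lengths = nx.single_source_shortest_path_length(g, ncenter)
plt.figure()
nx.draw_networkx_edges(g, pos, alpha=0.4)
nx.draw_networkx_nodes(g, pos,
                       nodelist=list(path_lengths.keys()),
                       node_size=120, alpha=0.5,
                       node_color=list(path_lengths.values()),
                       cmap=plt.cm.jet_r)
plt.show()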
Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise
See also
- Amazon Elastic Compute Cloud
- The Google App Engine (Python, Java, PHP or Go)
- Pythonanywhere
- Sagemath Cloud
Apart from the cloud computing options listed above, you might like to consider
- Parallel computing through IPython clusters
- The Starcluster interface to Amazon's EC2
- GPU programming through PyCuda, PyOpenCL, Theano or similar
There are many other interesting developments with scientific programming in Python
Some representative examples include
- Jupyter – Python in your browser with code cells, embedded images, etc.
- Numba – Make Python run at the same speed as native machine code!
- Blaze – a generalization of NumPy
- PyTables – manage large data sets
- CVXPY – convex optimization in Python
2 Setting up Your Python Environment

2.1 Contents
• Overview 2.2
• Anaconda 2.3
• Exercises 2.8
2.2 Overview
1. get a Python environment up and running with all the necessary tools
2. execute simple Python commands
3. run a sample program
4. install the code libraries that underpin these lectures
2.3 Anaconda
The core Python package is easy to install but not what you should choose for these lectures
These lectures require the entire scientific programming ecosystem, which
Hence the best approach for our purposes is to install a free Python distribution that contains
• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name
Anaconda also comes with a great package management system to organize your code li-
braries
All of what follows assumes that you adopt this recommendation!
Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Ana-
conda distribution
As a practice run, please execute the following
1. Open up a terminal
2. Type conda update anaconda
Jupyter notebooks are one of the many possible ways to interact with Python and the scien-
tific libraries
They use a browser-based interface to Python with
Because of these possibilities, Jupyter is fast turning into a major player in the scientific com-
puting ecosystem
Hereโs an image showing execution of some code (borrowed from here) in a Jupyter notebook
You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as
include maths and text) here
While Jupyter isn't the only way to code in Python, it's great for when you wish to
Once you have installed Anaconda, you can start the Jupyter notebook
Either
If you use the second option, you will see something like this (click to enlarge)
Thus, the Jupyter kernel is listening for Python commands on port 8888 of our local machine
Hopefully, your default browser has also opened up with a web page that looks something like
this (click to enlarge)
The notebook displays an active cell, into which you can type Python commands
Letโs start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When youโre ready to execute the code in a cell, hit Shift-Enter instead of the usual En-
ter
(Note: There are also menu and button options for running code in a cell that you can find
by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing sys-
tem
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are
1. Edit mode
2. Command mode
To switch to
• command mode from edit mode, hit the Esc key or Ctrl-M
• edit mode from command mode, hit Enter or click in a cell
The modal behavior of the Jupyter notebook is a little tricky at first but very efficient when
you get used to it
User Interface Tour
At this stage, we recommend you take your time to
• look at the various options in the menus and see what they do
• take the "user interface tour", which can be accessed through the help menu
import numpy as np
import matplotlib.pyplot as plt

N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(θ, radii, width=width, bottom=0.0)
plt.show()
Don't worry about the details for now – let's just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
(In older versions of Jupyter you might need to add the command %matplotlib inline
before you generate the figure)
Clicking on the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, fig-
ures and even videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we Esc to enter command mode and then type m to indicate that we are writing Mark-
down, a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below
the list of menu items)
Now we Shift+Enter to produce this
Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files – or by using web services such as
nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed
above
QuantEcon has its own site for sharing Jupyter notebooks related to economics – QuantEcon
Notes
Notebooks submitted to QuantEcon Notes can be shared with a link, and are open to com-
ments and votes by the community
For example, to run an existing Python file such as test.py from a notebook, type %run test.py into a cell
Alternatively, you can type the following into a terminal
Using the run command is often easier than copy and paste
(You might find that the % is unnecessary – use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn't in that directory, you will get an error
Let's look at a successful example, where we run a file test.py with contents:
foobar
foobar
foobar
foobar
foobar
Here
• pwd asks Jupyter to show the PWD (or %pwd – see the comment about automagic above)
• ls asks Jupyter to list files in the PWD (or %ls)
    – Note that test.py is there (on our computer, because we saved it there earlier)
• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on Windows)
If you're trying to run a file not in the present working directory, you'll get an error
To fix this error you need to either
One way to achieve the first option is to use the Upload button
โข The button is on the top level dashboard, where Jupyter first opened to
โข Look where the pointer is in this picture
Note: You can type the first letter or two of each directory name and then use the tab key to
expand
It's often convenient to be able to see your code before you run it
The preceding discussion covers most of what you need to know to interact with this website
However, as you start to write longer programs, you might want to experiment with your
workflow
There are many different options and we mention them only in passing
2.7.1 JupyterLab
A text editor is an application that is specifically designed to work with text files – such as
Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide
โข efficient text editing commands (e.g., copy, paste, search and replace)
โข syntax highlighting, etc.
The IPython shell has many of the features of the notebook: tab completion, color syntax,
etc.
It also has command history through the arrow keys
The up arrow key brings previously typed commands to the prompt
This saves a lot of typing…
Here's one setup, on a Linux box, with
2.7.4 IDEs
IDEs are Integrated Development Environments, which allow you to edit, execute and inter-
act with code from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code – please tell us about your experiences on the forum
2.8 Exercises
2.8.1 Exercise 1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the kernel without launching the browser
Note also the startup message: It should give you a URL such as
http://localhost:8888 where the notebook is running
Now
2.8.2 Exercise 2
As an exercise, try
1. Installing Git
2. Getting a copy of QuantEcon.py using Git
For example, if you've installed the command line version, open up a terminal and enter
(This is just git clone in front of the URL for the repository)
Even better,
1. Sign up to GitHub
2. Look into "forking" GitHub repositories (forking means making your own copy of a
GitHub repository, stored on GitHub)
3. Fork QuantEcon.py
4. Clone your fork to some local directory, make edits, commit them, and push them back
up to your forked GitHub repo
5. If you made a valuable improvement, send us a pull request!
3 An Introductory Example
3.1 Contents
• Overview 3.2
• Version 1 3.4
• Exercises 3.6
• Solutions 3.7
Note: These references offer help on installing Python but you should probably stick with the
method on our set up page
You'll then have an outstanding scientific computing environment (Anaconda) and be ready
to move on to the rest of our course
3.2 Overview
In this lecture, we will write and then pick apart small Python programs
The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures
3.2.1 Prerequisites
Suppose we want to simulate and plot the white noise process $\epsilon_0, \epsilon_1, \ldots, \epsilon_T$, where each draw
$\epsilon_t$ is independent standard normal
In other words, we want to generate figures that look something like this:
3.4 Version 1
Here are a few lines of code that perform the task we set
import numpy as np
import matplotlib.pyplot as plt

x = np.random.randn(100)
plt.plot(x)
plt.show()
After import numpy as np we have access to these attributes via the syntax np.
Here's another example
np.sqrt(4)
Out[2]: 2.0
numpy.sqrt(4)
Out[3]: 2.0
In fact, you can find and explore the directory for NumPy on your computer easily enough if
you look around
On this machine, it's located in
anaconda3/lib/python3.6/site-packages/numpy
Subpackages
Consider the line x = np.random.randn(100)
Here np refers to the package NumPy, while random is a subpackage of NumPy
You can see the contents here
Subpackages are just packages that are subdirectories of another package
np.sqrt(4)
Out[4]: 2.0
sqrt(4)
Out[5]: 2.0
ts_length = 100
ε_values = []   # An empty list

for i in range(ts_length):
    e = np.random.randn()
    ε_values.append(e)

plt.plot(ε_values)
plt.show()
In brief,
3.5.2 Lists
In [7]: x = [10, 'foo', False] # We can include heterogeneous data inside a list
type(x)
Out[7]: list
The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)
In [8]: x
In [9]: x.append(2.5)
x
Here append() is what's called a method, which is a function "attached to" an object – in
this case, the list x
We'll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate
the data contained in the object
• String objects have string methods, list objects have list methods, etc.
In [10]: x
In [11]: x.pop()
Out[11]: 2.5
In [12]: x
In [13]: x
In [14]: x[0]
Out[14]: 10
In [15]: x[1]
Out[15]: 'foo'
Now letโs consider the for loop from the program above, which was
Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the โblockโ of code that we
are looping over
Unlike most other languages, Python knows the extent of the code block only from indenta-
tion
In our program, indentation decreases after line ε_values.append(e), telling Python that
this line marks the lower limit of the code block
More on indentation below – for now, let's look at another example of a for loop
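(The example itself did not survive extraction; a simple loop in the same spirit, iterating over a list of words, might look as follows)

animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")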
This example helps to clarify how the for loop works: When we execute a loop of the form
• For each element of the sequence, it "binds" the name variable_name to that element
and then executes the code block
The sequence object can in fact be a very general object, as we'll see soon enough
In discussing the for loop, we explained that the code blocks being looped over are delimited
by indentation
In fact, in Python, all code blocks (i.e., those occurring inside loops, if clauses, function defi-
nitions, etc.) are delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the pro-
gram
Once you get used to it, this is a good thing: It
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
    – for i in range(10):
    – if x > y:
    – while x < 100:
    – etc., etc.
• All lines in a code block must have the same amount of indentation
• The Python standard is 4 spaces, and that's what you should use
Tabs vs Spaces
One small "gotcha" here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs
– try searching online
The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let's modify the program above to use a while loop instead
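(The while-loop version of the program is missing here; a version consistent with the notes below is)

ts_length = 100
ε_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ε_values.append(e)
    i = i + 1
plt.plot(ε_values)
plt.show()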
Note that
• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
Now let's go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:
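(The function definition itself is missing from this extraction; a version consistent with the discussion below, which returns the list ε_values, is)

def generate_data(n):
    ε_values = []
    for i in range(n):
        e = np.random.randn()
        ε_values.append(e)
    return ε_values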
data = generate_data(100)
plt.plot(data)
plt.show()
Let's go over this carefully, in case you're not familiar with functions and how they work
We have defined a function called generate_data() as follows
This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function
body with n set equal to 100
The net result is that the name data is bound to the list ε_values returned by the function
3.5.7 Conditions
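(The upper half of the modified function is missing here; a reconstruction in which the argument generator_type selects between uniform and standard normal draws – the string flag 'U' is an assumption – follows, with the remaining lines shown in the fragment below)

def generate_data(n, generator_type):
    ε_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)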
        else:
            e = np.random.randn()
        ε_values.append(e)
    return ε_values
Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimit-
ing the extent of the code blocks
Notes
Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals all together by just passing the desired gener-
ator type as a function
To understand this, consider the following version
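(The code for this version is missing; a sketch in which the generator itself is passed as an argument, consistent with the loop shown further below, is)

def generate_data(n, generator_type):
    ε_values = []
    for i in range(n):
        e = generator_type()       # Call whatever function was passed in
        ε_values.append(e)
    return ε_values

data = generate_data(100, np.random.uniform)
plt.plot(data)
plt.show()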
This principle works more generally – for example, consider the following piece of code
In [22]: max(7, 2, 4)

Out[22]: 7
In [23]: m = max
m(7, 2, 4)
Out[23]: 7
Here we created another name for the built-in function max(), which could then be used in
identical ways
In the context of our program, the ability to bind new names to functions means that there is
no problem passing a function as an argument to another function – as we did above
We can also simplify the code for generating the list of random draws considerably by using
something called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the
second line
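(The example is missing from the extraction; something like the following, with the list comprehension on the right-hand side of the second line, conveys the idea)

animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]   # Build a new list from animals
plurals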
In [25]: range(8)
Out[25]: range(0, 8)
ε_values = []
for i in range(n):
    e = generator_type()
    ε_values.append(e)

into

ε_values = [generator_type() for i in range(n)]
3.6 Exercises
3.6.1 Exercise 1

Recalling that $n!$ is defined as $n \times (n-1) \times \cdots \times 2 \times 1$, write a function factorial such that
factorial(n) returns $n!$ for any positive integer $n$
3.6.2 Exercise 2
The binomial random variable $Y \sim Bin(n, p)$ represents the number of successes in $n$ binary
trials, where each trial succeeds with probability $p$
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of $Y$
Hint: If $U$ is uniform on $(0, 1)$ and $p \in (0, 1)$, then the expression U < p evaluates to True
with probability $p$
3.6.3 Exercise 3

Compute an approximation to $\pi$ using Monte Carlo. Use no imports besides NumPy. Your hints are as follows:
• If $U$ is a bivariate uniform random variable on the unit square $(0, 1)^2$, then the probability
that $U$ lies in a subset $B$ of $(0, 1)^2$ is equal to the area of $B$
• If $U_1, \ldots, U_n$ are IID copies of $U$, then, as $n$ gets large, the fraction that falls in $B$
converges to the probability of landing in $B$
• For a circle, area = π · radius²
3.6.4 Exercise 4
Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing
3.6.5 Exercise 5
Your next task is to simulate and plot the correlated time series

$$x_{t+1} = \alpha \, x_t + \epsilon_{t+1}, \qquad x_0 = 0, \quad t = 0, \ldots, T$$

Here $\{\epsilon_t\}$ is IID and standard normal; in your solution, set $T = 200$ and $\alpha = 0.9$
3.6.6 Exercise 6
To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea
Now, starting with your solution to exercise 5, plot three simulated time series, one for each
of the cases α = 0, α = 0.8 and α = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows
(The figure nicely illustrates how time series with the same one-step-ahead conditional volatil-
ities, as these three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the α values
Important hints:
• If you call the plot() function multiple times before calling show(), all of the lines
you produce will end up on the same figure
    – And if you omit the argument 'b-' to the plot function, Matplotlib will automatically
select different colors for each line
3.7 Solutions
3.7.1 Exercise 1
In [30]: def factorial(n):
             k = 1
             for i in range(n):
                 k = k * (i + 1)
             return k

         factorial(4)
Out[30]: 24
3.7.2 Exercise 2
In [31]: from numpy.random import uniform
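(The function definition is missing from this extraction; a version consistent with the hint in the exercise is)

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:
            count = count + 1    # Or count += 1
    return count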
binomial_rv(10, 0.5)
Out[31]: 5
3.7.3 Exercise 3
In [32]: n = 100000
         count = 0
         for i in range(n):
             u, v = np.random.uniform(), np.random.uniform()
             d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
             if d < 0.5:
                 count += 1

         area_estimate = count / n

         print(area_estimate * 4)  # The circle has radius 0.5, so π = 4 × its area

3.13976
3.7.4 Exercise 4
In [33]: from numpy.random import uniform
         payoff = 0
         count = 0

         for i in range(10):
             U = uniform()
             count = count + 1 if U < 0.5 else 0
             if count == 3:
                 payoff = 1

         print(payoff)
1
3.7.5 Exercise 5
The next line embeds all subsequent figures in the browser itself
In [34]: %matplotlib inline

         α = 0.9
         ts_length = 200
         current_x = 0

         x_values = []
         for i in range(ts_length + 1):
             x_values.append(current_x)
             current_x = α * current_x + np.random.randn()

         plt.plot(x_values)
         plt.show()
3.7.6 Exercise 6
αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
4 Python Essentials
4.1 Contents
• Iterating 4.4
• Exercises 4.8
• Solutions 4.9
In this lecture, we'll cover features of the language that are essential to reading and writing
Python code
We've already met several built-in Python data types, such as strings, integers, floats and
lists
Let's learn a bit more about them
One simple data type is Boolean values, which can be either True or False
In [1]: x = True
x
Out[1]: True
In the next line of code, the interpreter evaluates the expression on the right of = and binds y
to this value
Out[2]: False
In [3]: type(y)
Out[3]: bool
In [4]: x + y
Out[4]: 1
In [5]: x * y
Out[5]: 0
Out[6]: 2
In [7]: bools = [True, True, False, True]   # A list of Boolean values

        sum(bools)
Out[7]: 3
The two most common data types used to represent numbers are integers and floats
In [8]: a, b = 1, 2
c, d = 2.5, 10.0
type(a)
Out[8]: int
In [9]: type(c)
Out[9]: float
Computers distinguish between the two because, while floats are more informative, arithmetic
operations on integers are faster and more accurate
As long as you're using Python 3.x, division of integers yields floats
In [10]: 1 / 2
Out[10]: 0.5
But be careful! If you're still using Python 2.x, division of two integers returns only the
integer part
For integer division in Python 3.x use this syntax:
In [11]: 1 // 2
Out[11]: 0
In [12]: x = complex(1, 2)
y = complex(2, 1)
x * y
Out[12]: 5j
4.2.2 Containers
Python has several basic types for storing collections of (possibly heterogeneous) data
We've already discussed lists
A related data type is tuples, which are "immutable" lists
In [14]: type(x)
Out[14]: tuple
In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable
In [15]: x = [1, 2]
x[0] = 10
x
Out[15]: [10, 2]
In [16]: x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
<ipython-input-16-d1b2647f6c81> in <module>
1 x = (1, 2)
----> 2 x[0] = 10

TypeError: 'tuple' object does not support item assignment
We'll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be "unpacked" as follows
Out[17]: 10
In [18]: y
Out[18]: 20
In [19]: a = [2, 4, 6, 8]
a[1:]
Out[19]: [4, 6, 8]
In [20]: a[1:3]
Out[20]: [4, 6]
Out[21]: [6, 8]
In [22]: s = 'foobar'
s[-3:] # Select the last three elements
Out[22]: 'bar'
In [23]: d = {'name': 'Frodo', 'age': 33}
         type(d)

Out[23]: dict

In [24]: d['age']
Out[24]: 33
Out[25]: set
Out[26]: False
In [27]: s1 = {'a', 'b'}
         s2 = {'b', 'c'}
         s1.intersection(s2)
Out[27]: {'b'}
Let's briefly review reading and writing to text files, starting with writing
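(The code cells are missing from this extraction; a minimal writing example consistent with the output shown further below – the file name newfile.txt reappears later in this section – is)

f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()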
Here, the built-in function open() creates a file object for writing to, and write() and close() are methods of that file object
In [30]: %pwd
Out[30]: '/home/anju/Desktop/lecture-source-py/_build/jupyter/executed'
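(The reading step is also missing; assuming the file written above, it would look something like)

f = open('newfile.txt', 'r')   # Open the file for reading
out = f.read()                 # Read the entire contents into a string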
In [32]: print(out)
Testing
Testing again
4.3.1 Paths
Note that if newfile.txt is not in the present working directory then this call to open()
fails
In this case, you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
4.4 Iterating
One of the most important tasks in computing is stepping through a sequence of data and
performing a given action
One of Pythonโs strengths is its simple, flexible interface to this kind of iteration via the for
loop
Many Python objects are "iterable", in the sense that they can be looped over
To give an example, let's write the file us_cities.txt, which lists US cities and their population,
to the present working directory
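(The cell that writes the file is missing; a sketch using the %%file cell magic – the city names and population figures here are purely illustrative – is)

%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471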
Overwriting us_cities.txt
Suppose that we want to make the information more readable, by capitalizing names and
adding commas to mark thousands
The us_cities.py program reads the data in and makes the conversion:
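(The program itself is missing here; a version consistent with the description that follows, using the string methods mentioned below, is)

f = open('us_cities.txt', 'r')
for line in f:
    city, population = line.split(':')            # Tuple unpacking
    city = city.title()                           # Capitalize city names
    population = '{0:,}'.format(int(population))  # Add commas to numbers
    print(city.ljust(15) + population)
f.close()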
Here format() is a string method used for inserting variables into strings
The reformatting of each line is the result of three different string methods, the details of
which can be left till later
The interesting part of this program for us is line 2, which shows that
1. The file object f is iterable, in the sense that it can be placed to the right of in within
a for loop
2. Iteration steps through each line in the file
One thing you might have noticed is that Python tends to favor looping without explicit in-
dexing
For example,
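(The two code versions being compared are missing; the first, looping directly over the elements – the list x_values is an assumption – would be something like)

x_values = [1, 2, 3]   # Some iterable x
for x in x_values:
    print(x * x)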
1
4
9
is preferred to
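(And the second, explicitly indexed version would be)

for i in range(len(x_values)):
    print(x_values[i] * x_values[i])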
1
4
9
When you compare these two alternatives, you can see why the first one is preferred
Python provides some facilities to simplify looping without indices
One is zip(), which is used for stepping through pairs from two sequences
For example, try running the following code
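(The code is missing; a typical example of zip() stepping through pairs – the data here is assumed – is)

countries = ('Japan', 'Korea', 'China')
cities = ('Tokyo', 'Seoul', 'Beijing')
for country, city in zip(countries, cities):
    print(f'The capital of {country} is {city}')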
The zip() function is also useful for creating dictionaries – for example
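(Again the cell is missing; a small illustrative example is)

names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))   # Builds {'Tom': 'E', 'John': 'F'}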
If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example
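(The example producing the output below would have been along these lines)

letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")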
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
4.5.1 Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or
False)
A common type is comparisons, such as
In [41]: x, y = 1, 2
x < y
Out[41]: True
In [42]: x > y
Out[42]: False
Out[43]: True
Out[44]: True
In [45]: x = 1 # Assignment
x == 2 # Comparison
Out[45]: False
In [46]: 1 != 2
Out[46]: True
Note that when testing conditions, we can use any valid Python expression
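(The cells are missing; examples consistent with the outputs below are)

x = 'yes' if 42 else 'no'   # 42 is a nonzero number, hence "truthy"
x

x = 'yes' if [] else 'no'   # The empty list is "falsy"
x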
Out[47]: 'yes'
Out[48]: 'no'
• Expressions that evaluate to zero, empty sequences or containers (strings, lists, etc.)
and None are all equivalent to False
Out[49]: True
Out[50]: False
Out[51]: True
Out[52]: False
Out[53]: True
Remember
Let's talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some
Out[54]: 20
Out[55]: range(0, 4)
In [56]: list(range(4)) # will evaluate the range iterator and create a list
Out[56]: [0, 1, 2, 3]
In [57]: str(22)
Out[57]: '22'
In [58]: type(22)
Out[58]: int
Out[59]: False
Out[60]: True
User-defined functions are important for improving the clarity of your code by
Functions without a return statement automatically return the special Python object None
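(A small example of this behavior, assuming a function f that only prints, is)

def f(x):
    print(x)          # No return statement

result = f('hello')   # Prints 'hello'
print(result)         # Prints None, the value f returned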
4.6.3 Docstrings
Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this
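(The cell being referred to is missing, but the source listing shown below implies it was)

def f(x):
    """
    This function squares its argument
    """
    return x**2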
In [63]: f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
In [64]: f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
"""
This function squares its argument
"""
return x**2
With one question mark we bring up the docstring, and with two we get the source code as
well
For example, suppose that we want to evaluate $\int_0^2 x^3 \, dx$ using SciPy's quad function, which requires a function as its first argument; we can pass the integrand with a lambda, as in

quad(lambda x: x**3, 0, 2)
Here the function created by lambda is said to be anonymous because it was never given a
name
If you did the exercises in the previous lecture, you would have come across a statement such as

plt.plot(x_values, label=f'α = {α}')

In this call to Matplotlib's plot function, notice that the last argument is passed in
name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by
order
Keyword arguments are particularly useful when a function has a lot of arguments, in which
case it's hard to remember the right order
You can adopt keyword arguments in user-defined functions with no difficulty
The next example illustrates the syntax
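(The example code is missing; a definition consistent with the outputs below – where f(2) returns 3 using the defaults, and the later call producing 14 presumably overrides both defaults, e.g. f(2, a=4, b=5) – is)

def f(x, a=1, b=1):
    return a + b * x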
The keyword argument values we supplied in the definition of f become the default values
In [69]: f(2)
Out[69]: 3
Out[70]: 14
To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We've all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to
read, even if the author told you so on the first page
4.8 Exercises
4.8.1 Exercise 1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their
inner product using zip()
Part 2: In one line, count the number of even numbers in 0, …, 99
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of
pairs (a, b) such that both a and b are even
4.8.2 Exercise 2
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \tag{1}$$
Write a function p such that p(x, coeff) computes the value in Eq. (1) given a point
x and a list of coefficients coeff
Try to use enumerate() in your loop
4.8.3 Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters
in the string
Hint: 'foo'.upper() returns 'FOO'
4.8.4 Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True
if every element in seq_a is also an element of seq_b, else False
4.8.5 Exercise 5
When we cover the numerical libraries, we will see they include many alternatives for interpo-
lation and function approximation
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points
a = point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency
4.9 Solutions
4.9.1 Exercise 1
Part 1 Solution:
Here's one possible solution
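(The solution cell is missing; one version consistent with the output below – the test data is an assumption – is)

x_vals = [1, 2, 3]
y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])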
Out[71]: 6
Out[72]: 6
Part 2 Solution:
One solution is
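(Again the cell is missing; one possibility consistent with the output is)

len([x for x in range(100) if x % 2 == 0])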
Out[73]: 50
Out[74]: 50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list compre-
hensions are
Out[75]: 50
and
Out[76]: 50
Part 3 Solution
Here's one possibility
In [77]: pairs = ((2, 5), (4, 2), (9, 8), (12, 10))
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
Out[77]: 2
4.9.2 Exercise 2
In [78]: def p(x, coeff):
             return sum(a * x**i for i, a in enumerate(coeff))
Out[79]: 6
4.9.3 Exercise 3
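(The solution code is missing; a version consistent with the hint about upper() and the output below – the test string is an assumption – is)

def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count

f('The Rain in Spain')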
Out[80]: 3
4.9.4 Exercise 4
Here's a solution:
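(The function definition is missing; one loop-based version is)

def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset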
# == test == #

print(f([1, 2], [1, 2, 3]))
print(f([1, 2, 3], [1, 2]))
True
False
Of course, if we use the sets data type then the solution is easier
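(That version, using set's issubset method, would be)

def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))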
4.9.5 Exercise 5
In [83]: def linapprox(f, a, b, n, x):
             """
             Evaluates the piecewise linear interpolant of f at x on the interval
             [a, b], with n evenly spaced grid points.

             Parameters
             ===========
             f : function
                 The function to approximate

             x, a, b : scalar(float)
                 Evaluation point and endpoints, with a <= x <= b

             n : integer
                 Number of grid points

             Returns
             =========
             A float. The interpolant evaluated at x
             """
             length_of_interval = b - a
             num_subintervals = n - 1
             step = length_of_interval / num_subintervals

             # === find first grid point larger than x === #
             point = a
             while point <= x:
                 point += step

             # === x must lie between the gridpoints (point - step) and point === #
             u, v = point - step, point

             return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
5 OOP I: Introduction to Object Oriented Programming

5.1 Contents
• Overview 5.2
• Objects 5.3
• Summary 5.4
5.2 Overview
Python is a pragmatic language that blends object-oriented and procedural styles, rather than
taking a purist approach
However, at a foundational level, Python is object-oriented
5.3 Objects
In Python, an object is a collection of data and instructions held in computer memory that
consists of
1. a type
2. a unique identity
3. data (i.e., content)
4. methods
5.3.1 Type
Python provides for different types of objects, to accommodate different categories of data
For example
Out[1]: str
Out[2]: int
In [3]: '300' + 'cc'   # String concatenation

Out[3]: '300cc'

In [4]: 300 + 400      # Integer addition

Out[4]: 700
---------------------------------------------------------------------------
<ipython-input-5-263a89d2d982> in <module>
----> 1 '300' + 400
Here we are mixing types, and itโs unclear to Python whether the user wants to
To avoid the error, you need to clarify by changing the relevant type
For example,

In [6]: int('300') + 400   # To add as numbers, change the string '300' to an integer

Out[6]: 700
5.3.2 Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the
object
The identity of an object can be obtained via the id() function
In [7]: y = 2.5
z = 2.5
id(y)
Out[7]: 140535456630128
In [8]: id(z)
Out[8]: 140535456630080
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the
same object
The identity of an object is in fact just the address of the object in memory
If we set x = 42 then we create an object of type int that contains the data 42
In fact, it contains more, as the following example shows
In [9]: x = 42
x
Out[9]: 42
In [10]: x.imag
Out[10]: 0
In [11]: x.__class__
Out[11]: int
When Python creates this integer object, it stores with it various auxiliary information, such
as the imaginary part, and the type
Any name following a dot is called an attribute of the object to the left of the dot
We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so letโs discuss them in-depth
5.3.4 Methods
Out[12]: True
In [13]: callable(x.__doc__)
Out[13]: False
Methods typically act on the data contained in the object they belong to, or combine that
data with other data
In [15]: s.lower()
It doesn't look like there are any methods used here, but in fact the square bracket assignment
notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows
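(A small illustration of the equivalence, assuming a list x, is)

x = ['a', 'b']
x[0] = 'aa'              # Square bracket assignment ...
x.__setitem__(0, 'aa')   # ... is equivalent to this method call
x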
(If you wanted to you could modify the __setitem__ method, so that square bracket as-
signment does something totally different)
5.4 Summary
In [20]: type(f)
Out[20]: function
In [21]: id(f)
Out[21]: 140535456543336
In [22]: f.__name__
Out[22]: 'f'
We can see that f has type, identity, attributes and so onโjust like any other object
It also has methods
One example is the __call__ method, which just evaluates the function
In [23]: f.__call__(3)
Out[23]: 9
In [24]: import math

         id(math)
Out[24]: 140535632790936
This uniform treatment of data in Python (everything is an object) helps keep the language
simple and consistent
Part II
6 NumPy
6.1 Contents
• Overview 6.2
• Exercises 6.7
• Solutions 6.8
"Let's be clear: the work of science has nothing whatever to do with consensus.
Consensus is the business of politics. Science, on the contrary, requires only one
investigator who happens to be right, which means that he or she has results that
are verifiable by reference to the real world. In science consensus is irrelevant.
What is relevant is reproducible results." – Michael Crichton
6.2 Overview
In this lecture, we introduce NumPy arrays and the fundamental array processing operations
provided by NumPy
6.2.1 References
• Loops in Python over Python data types like lists carry significant overhead
• C and Fortran code contains a lot of type information that can be used for optimization
• Various optimizations can be carried out during compilation when the compiler sees the
instructions as a whole
However, for a task like the one described above, there's no need to switch back to C or
Fortran
Instead, we can use NumPy, where the instructions look like this:
import numpy as np

x = np.random.uniform(0, 1, size=1000000)
x.mean()
Out[1]: 0.5004892850074708
The operations of creating the array and computing its mean are both passed out to carefully
optimized machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to Matlab, which provides an interface to fast Fortran routines
In a later lecture, we'll discuss code that isn't easy to vectorize and how such routines can
also be optimized
The most important thing that NumPy defines is an array data type formally called a
numpy.ndarray
In [2]: a = np.zeros(3)
a
In [3]: type(a)
Out[3]: numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64
In [4]: a = np.zeros(3)
type(a[0])
Out[4]: numpy.float64
Out[5]: numpy.int64
In [6]: z = np.zeros(10)
Here z is a flat array with no dimension – neither row nor column vector
The dimension is recorded in the shape attribute, which is a tuple
In [7]: z.shape
Out[7]: (10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one
element end with a comma)
To give it dimension, we can change the shape attribute
Out[8]: array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
In [9]: z = np.zeros(4)
z.shape = (2, 2)
z
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() func-
tion, as in z = np.zeros((2, 2))
In [10]: z = np.empty(3)
z
In [12]: z = np.identity(2)
z
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
In [14]: type(z)
Out[14]: numpy.ndarray
See also np.asarray, which performs a similar function, but does not make a distinct copy
of data already in a NumPy array
Out[17]: True
Out[18]: False
To read in the array data from a text file containing numeric data use np.loadtxt or
np.genfromtxt – see the documentation for details
In [19]: z = np.linspace(1, 2, 5)
z
In [20]: z[0]
Out[20]: 1.0
In [22]: z[-1]
Out[22]: 2.0
In [24]: z[0, 0]
Out[24]: 1
In [25]: z[0, 1]
Out[25]: 2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
In [26]: z[0, :]
In [27]: z[:, 1]
In [28]: z = np.linspace(2, 4, 5)
z
In [30]: z
In [32]: z[d]
Out[32]: array([2.5, 3. ])
In [33]: z = np.empty(3)
z
In [34]: z[:] = 42
z
Out[37]: 10
Out[38]: 2.5
Out[39]: 4
Out[40]: 3
Out[43]: 1.25
Out[44]: 1.118033988749895
In [46]: z = np.linspace(2, 4, 5)
z
In [47]: z.searchsorted(2.2)
Out[47]: 1
Many of the methods discussed above have equivalent functions in the NumPy namespace
In [49]: np.sum(a)
Out[49]: 10
In [50]: np.mean(a)
Out[50]: 2.5
In [52]: a * b
In [53]: a + 10
In [54]: a * 10
In [56]: A + 10
In [57]: A * B
With Anaconda's scientific Python package based around Python 3.5 and above, one can use
the @ symbol for matrix multiplication, as follows:
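(The cell is missing; assuming A and B are the 2 × 2 arrays of ones used in the preceding cells, it would be)

A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B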
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
Out[59]: 50
In [61]: A @ (0, 1)
Mutability leads to the following behavior (which can be shocking to MATLAB programmers…)
In [64]: a = np.random.randn(3)
a
In [65]: b = a
b[0] = 0.0
a
In [66]: a = np.random.randn(3)
a
In [67]: b = np.copy(a)
b
In [68]: b[:] = 1
b
In [69]: a
NumPy provides versions of the standard functions log, exp, sin, etc. that act element-
wise on arrays
In [71]: n = len(z)
         y = np.empty(n)        # Array to store the results
         for i in range(n):
             y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for โuniversal functionsโ
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and
combining these with the ufuncs gives a very large set of fast element-wise functions
In [72]: z
In [75]: x = np.random.randn(4)
x
f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example
However, this approach doesn't always obtain the same speed as a more carefully crafted
vectorized function
6.6.2 Comparisons
In [79]: y[0] = 5
z == y
In [80]: z != y
In [82]: z > 3
In [83]: b = z > 3
b
In [84]: z[b]
6.6.3 Sub-packages
NumPy provides some additional functionality related to scientific programming through its
sub-packages
We've already seen how we can generate random variables using np.random
Out[86]: 5.034
Out[87]: -2.0000000000000004
Out[88]: array([[-2. , 1. ],
[ 1.5, -0.5]])
Much of this functionality is also available in SciPy, a collection of modules that are built on
top of NumPy
We'll cover the SciPy versions in more detail soon
For a comprehensive list of what's available in NumPy see this documentation
6.7 Exercises
6.7.1 Exercise 1
$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \tag{1}$$
Earlier, you wrote a simple function p(x, coeff) to evaluate Eq. (1) without considering
efficiency
Now write a new function that does the same job, but uses NumPy arrays and array opera-
tions for its computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise
don't use this class)
6.7.2 Exercise 2
• Divide the unit interval $[0, 1]$ into $n$ subintervals $I_0, I_1, \ldots, I_{n-1}$ such that the length of
$I_i$ is $q_i$
• Draw a uniform random variable $U$ on $[0, 1]$ and return the $i$ such that $U \in I_i$
from random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]
If you can't see how this works, try thinking through the flow for a simple example, such as
q = [0.25, 0.75]; it helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
If you can, write the method so that draw(k) returns k draws from q
6.7.3 Exercise 3

Your task is to modify the ECDF class from QuantEcon's ecdf.py (see the solution below) by adding a plot method that plots the empirical distribution function over a specified interval
6.8 Solutions
In [90]: import matplotlib.pyplot as plt
%matplotlib inline
6.8.1 Exercise 1
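(The solution code is missing from this extraction; one fully vectorized version – the use of np.cumprod is an assumption – is)

def p(x, coef):
    X = np.ones_like(coef)
    X[1:] = x
    y = np.cumprod(X)     # y = [1, x, x**2, ...]
    return coef @ y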
Let's test it
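(The test cell would have been along these lines, matching the output below)

coef = np.ones(3)
print(coef)
print(p(1, coef))
# For comparison
q = np.poly1d(coef)
print(q(1))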
[1. 1. 1.]
3.0
3.0
6.8.2 Exercise 2
class DiscreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """
The logic is not obvious, but if you take your time and read it slowly, you will understand
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by
The problem is that Q does not change accordingly, and Q is the data used in the draw
method
To deal with this, one option is to compute Q every time the draw method is called
But this is inefficient relative to computing Q once-off
A better option is to use descriptors
A solution from the quantecon library using descriptors that behaves as we desire can be
found here
6.8.3 Exercise 3
In [95]: """
Modifies ecdf.py from QuantEcon to add in a plot method
"""
class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.
Parameters
----------
observations : array_like
An array of observations
Attributes
----------
observations : array_like
An array of observations
"""
Parameters
----------
x : scalar(float)
The x at which the ecdf is evaluated
Returns
-------
scalar(float)
Fraction of the sample less than x
"""
return np.mean(self.observations <= x)
Parameters
----------
a : scalar(float), optional(default=None)
Lower endpoint of the plot interval
b : scalar(float), optional(default=None)
Upper endpoint of the plot interval
"""
In [96]: X = np.random.randn(1000)
F = ECDF(X)
F.plot()
7 Matplotlib
7.1 Contents
• Overview 7.2
• Exercises 7.6
• Solutions 7.7
7.2 Overview
We've already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with
Here's the kind of easy example you might find in introductory treatments
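(The example code is missing; a typical "introductory treatment" version, using the MATLAB-style pyplot interface, is)

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.sin(x)
plt.plot(x, y, 'b-', linewidth=2)
plt.show()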
This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without
making themselves known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this
in a code block and look at the second line)
This leads us to the alternative, object-oriented Matplotlib API
Here's the code corresponding to the preceding figure using the object-oriented API
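(That code is also missing; assuming the x and y from the previous figure, the object-oriented version would be)

fig, ax = plt.subplots()
ax.plot(x, y, 'b-', linewidth=2)
plt.show()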
7.3.3 Tweaks
We've also used alpha to make the line slightly transparent – which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with
ax.legend(loc='upper center')
Matplotlib has a huge array of functions and features, which you can discover over time as
you have need for them
We mention just a few
from scipy.stats import norm
from numpy.random import uniform

fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
    m, s = uniform(-1, 1), uniform(1, 2)
    y = norm.pdf(x, loc=m, scale=s)
    current_label = f'$\mu = {m:.2}$'
    ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
7.4.3 3D Plots
Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid
Here's a nice example from Matthew Doty of how the object-oriented API can be used to
build a custom subplots function that implements these changes
Read carefully through the code and see if you can follow what's going on
def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()

    # Set the axes through the origin
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')

    ax.grid()
    return fig, ax
1. calls the standard plt.subplots function internally to generate the fig, ax pair,
2. makes the desired customizations to ax, and
3. passes the fig, ax pair back to the calling code
7.6 Exercises
7.6.1 Exercise 1

Plot the function

$$f(x) = \cos(\pi \theta x) \, e^{-x}$$

over the interval $[0, 5]$ for each $\theta$ in np.linspace(0, 2, 10), placing all curves in the same figure
7.7 Solutions
7.7.1 Exercise 1
θ_vals = np.linspace(0, 2, 10)
x = np.linspace(0, 5, 200)
fig, ax = plt.subplots()

for θ in θ_vals:
    ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))
plt.show()
8 SciPy
8.1 Contents
• Statistics 8.3
• Optimization 8.5
• Integration 8.6
• Exercises 8.8
• Solutions 8.9
SciPy builds on top of NumPy to provide common tools for scientific programming such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc.
SciPy is a package that contains various tools that are built on top of NumPy, using its array
data type and related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initializa-
tion file
__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']
del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')
However, it's more common and better practice to use NumPy functionality explicitly
a = np.identity(3)
8.3 Statistics
$$f(x; a, b) = \frac{x^{(a-1)} (1 - x)^{(b-1)}}{\int_0^1 u^{(a-1)} (1 - u)^{(b-1)} \, du} \qquad (0 \le x \le 1) \tag{1}$$
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this, we can use scipy.stats, which provides all of this functionality as well as random
number generation in a single consistent interface
Here's an example of usage
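(The cell itself is missing; a version consistent with the discussion of q below – the sample size is an assumption – is)

from scipy.stats import beta
import numpy as np

q = beta(5, 5)       # Beta(a, b), with a = b = 5
obs = q.rvs(2000)    # 2000 observations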
In this code, we created a so-called rv_frozen object, via the call q = beta(5, 5)
The "frozen" part of the notation implies that q represents a particular distribution with a
particular set of parameters
Once we've done so, we can then generate random numbers, evaluate the density, etc., all
from this fixed distribution
In [6]: q.cdf(0.4)      # Cumulative distribution function

Out[6]: 0.26656768000000003

In [7]: q.pdf(0.4)      # Density function

Out[7]: 2.0901888000000013

In [8]: q.ppf(0.8)      # Quantile (inverse cdf) function

Out[8]: 0.6339134834642708
In [9]: q.mean()
Out[9]: 0.5
identifier = scipy.stats.distribution_name(shape_parameters)
identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)
fig, ax = plt.subplots()
grid = np.linspace(0.01, 0.99, 100)   # Grid over which to plot the density
ax.hist(obs, bins=40, density=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
from scipy.stats import linregress

x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept
f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1   # f as defined in Eq. (2)
x = np.linspace(0, 1, 100)

plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()
8.4.1 Bisection
And so on
This is bisection
Here's a fairly simplistic implementation of the algorithm in Python
It works for all sufficiently well behaved increasing continuous functions with $f(a) < 0 < f(b)$
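(The implementation is missing from this extraction; a simple version consistent with that description is)

def bisect(f, a, b, tol=10e-5):
    """
    Implements the bisection root-finding algorithm, assuming that f is a
    real-valued function on [a, b] satisfying f(a) < 0 < f(b).
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0:    # Root lies between lower and middle
            lower, upper = lower, middle
        else:                # Root lies between middle and upper
            lower, upper = middle, upper
    return 0.5 * (upper + lower)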
In fact, SciPy provides its own bisection function, which we now test using the function $f$
defined in Eq. (2)
from scipy.optimize import bisect

bisect(f, 0, 1)
Out[14]: 0.4082935042806639
• When the function is well-behaved, the Newton-Raphson method is faster than bisection
• When the function is less well-behaved, the Newton-Raphson might fail
Let's investigate this using the same function $f$, first looking at potential instability
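(The calls are missing; they were presumably along these lines, with the first starting point giving the correct root shown below and the second getting stuck near its starting point)

from scipy.optimize import newton

newton(f, 0.2)   # Start the search at initial condition x = 0.2
newton(f, 0.7)   # Start the search at x = 0.7 instead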
Out[15]: 0.40829350427935673
Out[16]: 0.7001700000000279
62.4 ยตs ยฑ 4.15 ยตs per loop (mean ยฑ std. dev. of 7 runs, 10000 loops each)
149 ยตs ยฑ 5.77 ยตs per loop (mean ยฑ std. dev. of 7 runs, 10000 loops each)
So far we have seen that the Newton-Raphson method is fast but not robust
This bisection algorithm is robust but relatively slow
This illustrates a general principle
• If you have specific knowledge about your function, you might be able to exploit it to
generate efficiency
• If not, then the algorithm choice involves a trade-off between the speed of convergence
and robustness
In practice, most default algorithms for root-finding, optimization and fixed points use hybrid
methods
These methods typically combine a fast method with a robust method in the following man-
ner:
In scipy.optimize, the function brentq is such a hybrid method and a good default
In [19]: brentq(f, 0, 1)
Out[19]: 0.40829350427936706
15.6 ยตs ยฑ 840 ns per loop (mean ยฑ std. dev. of 7 runs, 100000 loops each)
Here the correct solution is found and the speed is almost the same as newton
Out[21]: array(1.)
If you don't get good results, you can always switch back to the brentq root finder, since
the fixed point of a function $f$ is the root of $g(x) := x - f(x)$
8.5 Optimization
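(The section's example is missing; SciPy's fminbound routine, minimizing x² over [-1, 2] to produce the 0.0 below, is one plausible version)

from scipy.optimize import fminbound

fminbound(lambda x: x**2, -1, 2)   # Search for the minimizer of x² on [-1, 2]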
Out[22]: 0.0
8.6 Integration
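(The integration example is missing; one consistent with the output below is)

from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)
integral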
Out[23]: 0.33333333333333337
In fact, quad is an interface to a very standard numerical integration routine in the Fortran
library QUADPACK
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebychev polynomials
There are other options for univariate integrationโa useful one is fixed_quad, which is fast
and hence works well inside for loops
There are also functions for multivariate integration
See the documentation for more details
We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
8.8 Exercises
8.8.1 Exercise 1
8.9 Solutions
8.9.1 Exercise 1
Out[26]: 0.408294677734375
9 Numba
9.1 Contents
• Overview 9.2
• Vectorization 9.4
• Numba 9.5
In addition to whatโs in Anaconda, this lecture will need the following libraries
9.2 Overview
In our lecture on NumPy, we learned one method to improve speed and efficiency in numeri-
cal work
That method, called vectorization, involved sending array processing operations in batch to
efficient low-level code
This clever idea dates back to Matlab, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory-intensive
Another problem is that only some algorithms can be vectorized
In the last few years, a new Python library called Numba has appeared that solves many of
these problems
It does so through something called just in time (JIT) compilation
JIT compilation is effective in many numerical settings and can generate extremely fast, effi-
cient code
It can also do other tricks such as facilitate multithreading (a form of parallelization well
suited to numerical work)
To understand what Numba does and why, we need some background knowledge
Let's start by thinking about higher-level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment
The upside is that, compared to low-level languages, Python is typically faster to write, less
error-prone and easier to debug
The downside is that Python is harder to optimize – that is, turn into fast machine code –
than languages like C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of
compiled languages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high-speed languages for the great major-
ity of scientific computing tasks
This is because
1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a combi-
nation of NumPy and Numba
Let's start by trying to understand why high-level languages like Python are slower than
compiled code
In [2]: a, b = 10, 10
a + b
Out[2]: 20
Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation
Out[3]: 'foobar'
(We say that the operator + is overloaded โ its action depends on the type of the objects on
which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
Compiled languages avoid these overheads with explicit, static types
For example, consider the following C code, which sums the integers from 1 to 10
#include <stdio.h>
int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}
โข In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)
Moreover, the compiler is made aware of the data type by the programmer
Hence, each successive data point can be accessed by shifting forward in memory space by a
known and fixed amount
9.4 Vectorization
โข The machine code itself is typically compiled from carefully optimized C or Fortran
This can greatly accelerate many (but not all) numerical computations
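The first of the two timed code blocks was lost in the extracted text. Judging from the discussion below, it was a pure Python loop along the following lines (a sketch, not the original code), drawing, squaring and summing the uniforms one at a time:

In [6]: n = 100_000

qe.util.tic()
y = 0.0
for i in range(n):
    x = np.random.uniform(0, 1)
    y += x**2
qe.util.toc()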
Out[6]: 0.04178762435913086
In [7]: qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()
Out[7]: 0.0038301944732666016
The second code block โ which achieves the same thing as the first โ runs much faster
The reason is that in the second implementation we have broken the loop down into three
basic operations
1. draw n uniforms
2. square them
3. sum them
Many functions provided by NumPy are so-called universal functions โ also called ufuncs
This means that they
In [8]: np.cos(1.0)
Out[8]: 0.5403023058681398
f(x, y) = cos(x² + y²) / (1 + x² + y²)   and   m = 3
Hereโs a plot of ๐
In [11]: grid = np.linspace(-3, 3, 1000)    # assumed setup; the original definitions were lost
def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
m = -np.inf

qe.tic()
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
qe.toc()
Out[11]: 2.7486989498138428
In [12]: x, y = np.meshgrid(grid, grid)    # 2-D coordinate arrays (assumed; lost in extraction)
qe.tic()
np.max(f(x, y))
qe.toc()
Out[12]: 0.02516627311706543
In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(Weโll make it even faster again below when we discuss Numba)
9.5 Numba
9.5.1 Prerequisites
9.5.2 An Example
๐ฅ๐ก+1 = 4๐ฅ๐ก (1 โ ๐ฅ๐ก )
Hereโs the plot of a typical trajectory, starting from ๐ฅ0 = 0.1, with ๐ก on the x-axis
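The function qm used below implements this map; its definition was lost in the extracted text, but judging from the jitted version later in this section it is simply:

import numpy as np

def qm(x0, n):
    "Generate n iterates of the quadratic map starting from x0."
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x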
x = qm(0.1, 250)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()
Letโs time and compare identical function calls across these two versions:
In [15]: qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()
In [16]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:
In [17]: qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
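The next output is the ratio of the two recorded times, i.e. the overall speedup factor; the (lost) cell presumably computed it as:

In [18]: time1 / time2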
Out[18]: 174.51294400963275
In [19]: @jit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return x
Numba attempts to generate fast machine code using the infrastructure provided by the
LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as
floats, integers, etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions
In [20]: a = 1
@jit
def add_x(x):
return a + x
print(add_x(10))
11
In [21]: a = 2
print(add_x(10))
11
Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to
ensure type stability
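If you do want a jitted function to respond to a value that changes, a standard workaround (a sketch, not taken from the lecture) is to pass the value in as an argument instead of reading a global:

from numba import jit

@jit
def add_x(x, a):
    # a is an argument now, so every call sees the current value
    return a + x

print(add_x(10, 1))   # 11
print(add_x(10, 2))   # 12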
Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximiza-
tion problem discussed above
@vectorize
def f_vec(x, y):
return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
qe.tic()
np.max(f_vec(x, y))
qe.toc()
Out[22]: 0.030055522918701172
qe.tic()
np.max(f_vec(x, y))
qe.toc()
Out[23]: 0.023700714111328125
10
Other Scientific Libraries
10.1 Contents
โข Overview 10.2
โข Cython 10.3
โข Joblib 10.4
โข Exercises 10.6
โข Solutions 10.7
In addition to whatโs in Anaconda, this lecture will need the following libraries
10.2 Overview
In this lecture, we review some other scientific libraries that are useful for economic research
and analysis
We have, however, already picked most of the low hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass
10.3 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used
from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As youโll recall, Numba solves this problem (where possible) by inferring type
Cythonโs approach is different โ programmers add type definitions directly to their โPythonโ
code
As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming
Cython code into optimized C and C++ code
Cython also takes care of building language extensions โ the wrapper code that interfaces
between the resulting compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook

As a first example, consider generating the geometric sum

∑_{i=0}^{n} α^i = (1 − α^{n+1}) / (1 − α)
If youโre not familiar with C, the main thing you should take notice of is the type definitions
In [4]: %%cython
def geo_prog_cython(double alpha, int n):
cdef double current = 1.0
cdef double sum = current
cdef int i
for i in range(n):
current = current * alpha
sum = sum + current
return sum
Here cdef is a Cython keyword indicating a variable declaration and is followed by a type
The %%cython line at the top is not actually Cython code โ itโs a Jupyter cell magic indi-
cating the start of Cython code
After executing the cell, you can now call the function geo_prog_cython from within
Python
What you are in fact calling is compiled C code with a Python call interface
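Out[5] below is presumably the timing of a pure Python version of the same computation, run for comparison; the benchmark cell was lost, but it would have looked something like this:

def geo_prog(alpha, n):
    "Pure Python sum of the geometric series, for comparison."
    current = 1.0
    total = current
    for i in range(n):
        current = current * alpha
        total = total + current
    return total

qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()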
Out[5]: 0.0884397029876709
In [6]: qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()
Out[6]: 0.03421354293823242
Letโs go back to the first problem that we worked with: generating the iterates of the
quadratic map
๐ฅ๐ก+1 = 4๐ฅ๐ก (1 โ ๐ฅ๐ก )
The problem of computing iterates and returning a time series requires us to work with ar-
rays
The natural array type to work with is NumPy arrays
Hereโs a Cython implementation that initializes, populates and returns a NumPy array
In [7]: %%cython
import numpy as np
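# (The body of this Cython cell was lost in extraction; the following is a
#  plausible sketch: a direct translation of qm that fills a NumPy array.)
def qm_cython_first_pass(double x0, int n):
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)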
If you run this code and time it, you will see that its performance is disappointing โ nothing
like the speed gain we got from Numba
In [8]: qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()
Out[8]: 0.03150629997253418
This example was also computed in the Numba lecture, and you can see Numba is around 90
times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cythonโs typed memoryviews, which provide more direct access to
arrays in memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Hereโs an example:
In [9]: %%cython
import numpy as np
from numpy cimport float_t
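# (The body of this Cython cell was also lost in extraction; below is a sketch
#  using a typed memoryview bound to a NumPy array.)
def qm_cython(double x0, int n):
    x_np_array = np.zeros(n+1, dtype=float)   # create the NumPy array first
    cdef float_t [:] x = x_np_array           # bind a memoryview to it
    cdef int t
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x_np_array)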
Here, the cdef declarations give the variables static types, and the typed memoryview is bound to a NumPy array so that element access bypasses the usual Python object overhead
In [10]: qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()
Out[10]: 0.0006136894226074219
10.3.3 Summary
Cython requires more expertise than Numba, and is a little more fiddly in terms of getting
good performance
In fact, itโs surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,
10.4 Joblib
10.4.1 Caching
Perhaps, like us, you sometimes run a long computation that simulates a model at a given set
of parameters โ to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all
again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to
you when you repeat the calculation
10.4.2 An Example
Letโs look at a toy example, related to the quadratic map model discussed above
Letโs say we want to generate a long trajectory from a certain initial condition ๐ฅ0 and see
what fraction of the sample is below 0.1
(Weโll omit JIT compilation or other speedups for simplicity)
Hereโs our code
from joblib import Memory
memory = Memory(location='./joblib_cache')

@memory.cache
def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return np.mean(x < 0.1)
We are using joblib to cache the result of calling qm at a given set of parameters
With the argument location='./joblib_cache', any call to this function results in both the input values and output values being stored in a subdirectory joblib_cache of the present working directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output
that notes information being cached
In [13]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
________________________________________________________________________________
[Memory] Calling __main__--home-anju-Desktop-lecture-source-py-_build-jupyter-executed-__ipython-input__.qmโฆ
qm(0.2, 10000000)
_______________________________________________________________qm - 8.9s, 0.1min
TOC: Elapsed: 0:00:8.85
Out[13]: 8.85545039176941
The next time we call the function with the same set of parameters, the result is returned
almost instantaneously
In [14]: qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
Out[14]: 0.0007827281951904297
10.5 Other Options

There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules
from Fortran code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed โ you might want to give it a
try
10.6 Exercises
10.6.1 Exercise 1
For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be high with probability 0.8 and low with probability 0.2
Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare
speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3
10.7 Solutions
10.7.1 Exercise 1
We let
โข 0 represent โlowโ
โข 1 represent โhighโ
In [15]: p, q = 0.1, 0.2 # Prob of leaving low and high state respectively
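The pure Python implementation that the text refers to next was lost in extraction; a sketch consistent with the exercise (assuming numpy is imported as np) is:

def compute_series(n):
    "Simulate the volatility chain; 0 = low, 1 = high."
    x = np.empty(n, dtype=np.int_)
    x[0] = 1                              # start in the high state
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p               # leave the low state with probability p
        else:
            x[t] = U[t] > q               # stay in the high state with probability 1 - q
    return x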
Letโs run this code and check that the fraction of time spent in the low state is about 0.666
In [17]: n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.6629
In [18]: qe.util.tic()
compute_series(n)
qe.util.toc()
Out[18]: 0.0751335620880127
compute_series_numba = jit(compute_series)
In [20]: x = compute_series_numba(n)
print(np.mean(x == 0))
0.66566
In [21]: qe.util.tic()
compute_series_numba(n)
qe.util.toc()
Out[21]: 0.0015265941619873047
In [23]: %%cython
import numpy as np
from numpy cimport int_t, float_t
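# (The rest of this Cython cell was lost in extraction; a plausible sketch
#  using typed memoryviews follows.)
def compute_series_cy(int n):
    # Create NumPy arrays first
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    # Bind memoryviews to the arrays
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    # Other variable declarations
    cdef float p = 0.1
    cdef float q = 0.2
    cdef int t
    # Main loop
    x[0] = 1
    for t in range(1, n):
        if x[t-1] == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x_np)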
In [24]: compute_series_cy(10)
In [25]: x = compute_series_cy(n)
print(np.mean(x == 0))
0.66746
In [26]: qe.util.tic()
compute_series_cy(n)
qe.util.toc()
Out[26]: 0.0033597946166992188
11
Writing Good Code
11.1 Contents
โข Overview 11.2
โข Summary 11.6
11.2 Overview
When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take
on more challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing โ such as just in time compi-
lation โ and how they affect good program design
11.3 An Example of Bad Code

Here, k_t is capital at time t and s, α, δ are parameters; capital evolves according to

k_{t+1} = s k_t^α + (1 − δ) k_t    (1)

The code below
1. sets ๐0 = 1
2. iterates using Eq. (1) to produce a sequence ๐0 , ๐1 , ๐2 โฆ , ๐๐
3. plots the sequence
for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
    axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")

axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True, fontsize=14)

# (the re-assignments of α, s, δ between the three panels were lost in extraction)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
    axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")

axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True, fontsize=14)

for j in range(3):
    k[0] = 1
    for t in range(49):
        k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
    axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")

axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True, fontsize=14)

plt.show()
There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesnโt matter too much
But if you are ambitious and want to produce useful things, youโll write medium to large pro-
grams too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts
If you look at the code above, youโll see numbers like 50 and 49 and 3 scattered through the
code
These kinds of numeric literals in the body of your code are sometimes called โmagic num-
bersโ
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should cer-
tainly be replaced by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:
Yes, we realize that you can just cut and paste and change a few symbols
But as a programmer, your aim should be to automate repetition, not do it yourself
More importantly, repeating the same logic in different places means that eventually one of
them will likely be wrong
If you want to know more, read the excellent summary found on this page
Weโll talk about how to avoid repetition below
Sure, global variables (i.e., names assigned to values outside of any function or class) are con-
venient
Rookie programmers typically use global variables with abandon โ as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since
This makes it much harder to be certain about what some small part of a given piece of code
actually commands
Hereโs a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to
avoid them
(Weโll discuss how just below)
JIT Compilation
In fact, thereโs now another good reason to avoid global variables
In scientific computing, weโre witnessing the rapid growth of just in time (JIT) compilation
JIT compilation can generate excellent performance for scripting languages like Python and
Julia
But the task of the compiler used for JIT compilation becomes much harder when many
global variables are present
(This is because data type instability hinders the generation of efficient machine code โ weโll
learn more about such topics later on)
Fortunately, we can easily avoid the evils of global variables and WET code
โข WET stands for โwe love typingโ and is the opposite of DRY
Hereโs some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that
โข global variables are quarantined by collecting together at the end, not the start of the
program
โข magic numbers are avoided
โข the loop at the end where the actual work is done is short and relatively simple
ax.grid(lw=0.2)
ax.set_xlabel('time')
ax.set_ylabel('capital')
ax.set_ylim(0, 18)
ax.legend(loc='upper left', frameon=True, fontsize=14)
plt.show()
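The extracted listing above shows only the tail of the improved program. Here's a sketch of what the full version might look like; the helper function name and the parameter values are illustrative assumptions, but the structure (one function, no magic numbers, a short final loop) matches the description above:

import numpy as np
import matplotlib.pyplot as plt

def plot_path(ax, αs, s_vals, δs, series_length=50):
    "Add a capital time series to ax for every (α, s, δ) combination."
    k = np.empty(series_length)
    for α in αs:
        for s in s_vals:
            for δ in δs:
                k[0] = 1
                for t in range(series_length - 1):
                    k[t+1] = s * k[t]**α + (1 - δ) * k[t]
                ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")
    ax.grid(lw=0.2)
    ax.set_xlabel('time')
    ax.set_ylabel('capital')
    ax.set_ylim(0, 18)
    ax.legend(loc='upper left', frameon=True, fontsize=14)

fig, axes = plt.subplots(3, 1, figsize=(12, 15))

# Vary one parameter per panel (values are illustrative)
plot_path(axes[0], αs=(0.25, 0.33, 0.45), s_vals=(0.4,), δs=(0.1,))
plot_path(axes[1], αs=(0.33,), s_vals=(0.3, 0.4, 0.5), δs=(0.1,))
plot_path(axes[2], αs=(0.33,), s_vals=(0.4,), δs=(0.05, 0.1, 0.15))

plt.show()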
11.6 Summary
12
OOP II: Building Classes
12.1 Contents
โข Overview 12.2
โข Exercises 12.6
โข Solutions 12.7
12.2 Overview
So imagine now you want to write a program with consumers, who can
As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into "objects"
An example is a Python list, which not only stores data but also knows how to sort itself, etc.
In [1]: x = [1, 5, 4]
x.sort()
x
Out[1]: [1, 4, 5]
As we now know, sort is a function that is โpart ofโ the list object โ and hence called a
method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex
numbers)
It describes
In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via โdotted attribute notationโ
โข object_name.data
โข object_name.method_name()
In the example
In [2]: x = [1, 5, 4]
x.sort()
x.__class__
Out[2]: list
โข x is an object or instance, created from the definition for Python lists, but with its own
particular data
โข x.sort() and x.__class__ are two attributes of x
โข dir(x) can be used to view all the attributes of x
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting
the common structure
For example,
โข a Markov chain consists of a set of states and a collection of transition probabilities for
moving across states
โข a general equilibrium theory consists of a commodity space, preferences, technologies,
and an equilibrium definition
โข a game consists of a list of players, lists of actions available to each player, player pay-
offs as functions of all playersโ actions, and a timing protocol
These are all abstractions that collect together โobjectsโ of the same โtypeโ
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method thatโs useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov
chain object
Admittedly a little contrived, this example of a class helps us internalize some new syntax
Hereโs one implementation
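The class definition itself was lost in the extracted text; a sketch consistent with the description and the usage examples below is:

class Consumer:

    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficient funds")
        else:
            self.wealth = new_wealth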
This class defines instance data wealth and three methods: __init__, earn and spend
โข wealth is instance data because each consumer we create (each instance of the Con-
sumer class) will have its own separate wealth data
The ideas behind the earn and spend methods were discussed above
Both of these act on the instance data wealth
The __init__ method is a constructor method
Whenever we create an instance of the class, this method will be called automatically
Calling __init__ sets up a โnamespaceโ to hold the instance data โ more on this soon
Weโll also discuss the role of self just below
Usage
Hereโs an example of usage
Out[4]: 5
In [5]: c1.earn(15)
c1.spend(100)
Insufficient funds
We can of course create multiple instances each with its own data
In [6]: c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth
Out[6]: 8
In [7]: c1.wealth
Out[7]: 10
In [8]: c1.__dict__
In [9]: c2.__dict__
Out[9]: {'wealth': 8}
When we access or set attributes weโre actually just modifying the dictionary maintained by
the instance
Self
If you look at the Consumer class definition again youโll see the word self throughout the
code
The rules with self are that
โ e.g., the earn method references self.wealth rather than just wealth
โข Any method defined within the class should have self as its first argument
There are no examples of the last rule in the preceding code but we will see some shortly
Details
In this section, we look at some more formal details related to classes and self
โข You might wish to skip to the next section on first pass of this lecture
โข You can return to these details after youโve familiarized yourself with more examples
Methods actually live inside a class object formed when the interpreter reads the class defini-
tion
Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code
In [11]: c1 = Consumer(10)
c1.earn(10)
c1.wealth
Out[11]: 20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argu-
ment 10 to Consumer.earn
In fact, the following are equivalent
โข c1.earn(10)
โข Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call
Thatโs why the statement self.wealth += y inside earn ends up modifying c1.wealth
For our next example, letโs write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock
per capita k_t evolves according to the rule

k_{t+1} = [s z k_t^α + (1 − δ) k_t] / (1 + n)    (1)
Here
The steady state of the model is the ๐ that solves Eq. (1) when ๐๐ก+1 = ๐๐ก = ๐
Hereโs a class that implements this model
Some points of interest in the code are
โข An instance maintains a record of its current capital stock in the variable self.k
โ Notice how inside update the reference to the local method h is self.h
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
ฮด=0.1, # depreciation rate
ฮฑ=0.3, # share of labor
z=2.0, # productivity
k=1.0): # current capital stock
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, ฮด, ฮฑ, z = self.n, self.s, self.ฮด, self.ฮฑ, self.z
# Apply the update rule
return (s * z * self.k**ฮฑ + (1 - ฮด) * self.k) / (1 + n)
def update(self):
"Update the current state (i.e., the capital stock)."
self.k = self.h()
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, ฮด, ฮฑ, z = self.n, self.s, self.ฮด, self.ฮฑ, self.z
# Compute and return steady state
return ((s * z) / (n + ฮด))**(1 / (1 - ฮฑ))
Hereโs a little program that uses the class to compute time series from two different initial
conditions
The common steady state is also plotted for comparison
s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots(figsize=(9, 6))
ax.legend()
plt.show()
Next, letโs write a class for a simple one good market where agents are price takers
The market consists of the following objects:
Here
The class provides methods to compute various values of interest, including competitive equi-
librium price and quantity, tax revenue raised, consumer surplus and producer surplus
Hereโs our implementation
class Market:
    """
    Models a market for a single good with linear demand and supply curves
    and a per-unit tax.
    """
    def __init__(self, ad, bd, az, bz, tax):
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand.')
def price(self):
"Return equilibrium price"
return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)
def quantity(self):
"Compute equilibrium quantity"
return self.ad - self.bd * self.price()
def consumer_surp(self):
"Compute consumer surplus"
# == Compute area under inverse demand function == #
integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
area, error = quad(integrand, 0, self.quantity())
return area - self.price() * self.quantity()
def producer_surp(self):
"Compute producer surplus"
# == Compute area above inverse supply curve, excluding tax == #
integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
area, error = quad(integrand, 0, self.quantity())
return (self.price() - self.tax) * self.quantity() - area
def taxrev(self):
"Compute tax revenue"
return self.tax * self.quantity()
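The plotting code below also calls inverse_demand, inverse_supply and inverse_supply_no_tax, whose definitions were lost in extraction. Judging from the integrands used in consumer_surp and producer_surp above, they look like this (a sketch):

    def inverse_demand(self, x):
        "Compute inverse demand"
        return self.ad / self.bd - (1 / self.bd) * x

    def inverse_supply(self, x):
        "Compute inverse supply curve (including the tax)"
        return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

    def inverse_supply_no_tax(self, x):
        "Compute inverse supply curve without the tax"
        return -(self.az / self.bz) + (1 / self.bz) * x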
Hereโs a short program that uses this class to plot an inverse demand curve together with in-
verse supply curves with and without taxes
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
Out[20]: 1.125
Letโs look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map
Letโs write a class for generating time series from this model
Hereโs one implementation
def update(self):
"Apply the map to update state."
self.x = self.r * self.x *(1 - self.x)
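The update method above is only one piece of the class; the rest of the definition was lost in extraction. A sketch of the full class, consistent with the generate_sequence call in the plotting code below, is:

class Chaos:
    """
    Models the dynamical system x_{t+1} = r x_t (1 - x_t)
    """
    def __init__(self, x0, r):
        self.x, self.r = x0, r

    def update(self):
        "Apply the map to update state."
        self.x = self.r * self.x * (1 - self.x)

    def generate_sequence(self, n):
        "Generate and return a sequence of length n."
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path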
fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
ax.set_xlabel('$r$', fontsize=16)
plt.show()
12.5 Special Methods
Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length and that this length can be
queried via the len function
Out[25]: 2
If you want to provide a return value for the len function when applied to your user-defined
object, use the __len__ special method
class Foo:

    def __len__(self):
        return 42
Now we get
In [27]: f = Foo()
len(f)
Out[27]: 42
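A class instance can also be made callable by defining a __call__ special method. The definition used in the next example was lost in extraction, but judging from the output it was along these lines:

class Foo:

    def __call__(self, x):
        return x + 42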
In [29]: f = Foo()
f(8) # Exactly equivalent to f.__call__(8)
Out[29]: 50
12.6 Exercises
12.6.1 Exercise 1
The empirical cumulative distribution function (ecdf) corresponding to a sample {X_i}_{i=1}^{n} is defined as

F_n(x) := (1/n) ∑_{i=1}^{n} 1{X_i ≤ x}    (x ∈ ℝ)    (3)
Here 1{๐๐ โค ๐ฅ} is an indicator function (one if ๐๐ โค ๐ฅ and zero otherwise) and hence ๐น๐ (๐ฅ)
is the fraction of the sample that falls below ๐ฅ
The GlivenkoโCantelli Theorem states that, provided that the sample is IID, the ecdf ๐น๐ con-
verges to the true distribution function ๐น
Implement ๐น๐ as a class called ECDF, where
โข A given sample {๐๐ }๐๐=1 are the instance data, stored as self.observations
โข The class implements a __call__ method that returns ๐น๐ (๐ฅ) for any ๐ฅ
12.6.2 Exercise 2

p(x) = a_0 + a_1 x + a_2 x² + ⋯ + a_N x^N = ∑_{n=0}^{N} a_n x^n    (x ∈ ℝ)    (4)
The instance data for the class Polynomial will be the coefficients (in the case of Eq. (4),
the numbers ๐0 , โฆ , ๐๐ )
Provide methods that
12.7 Solutions
12.7.1 Exercise 1
In [30]: class ECDF:
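    # (The class body was lost in extraction; a minimal sketch consistent with
    #  the exercise statement follows.)
    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)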
In [31]: # == test == #
print(F(0.5))
0.4
0.484
12.7.2 Exercise 2
In [32]: class Polynomial:
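    # (The constructor and evaluation method were lost in extraction; the
    #  sketch below is consistent with Eq. (4) and with differentiate.)
    def __init__(self, coefficients):
        "Store the coefficients a_0, ..., a_N as a list."
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at the point x."
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y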
def differentiate(self):
"Reset self.coefficients to those of p' instead of p."
new_coefficients = []
for i, a in enumerate(self.coefficients):
new_coefficients.append(i * a)
# Remove the first element, which is zero
del new_coefficients[0]
# And reset coefficients data to new values
self.coefficients = new_coefficients
return new_coefficients
13
OOP III: Samuelson Multiplier Accelerator
13.1 Contents
โข Overview 13.2
โข Details 13.3
โข Implementation 13.4
โข Stochastic Shocks 13.5
โข Government Spending 13.6
โข Wrapping Everything Into a Class 13.7
โข Using the LinearStateSpace Class 13.8
โข Pure Multiplier Model 13.9
โข Summary 13.10
13.2 Overview
This lecture creates non-stochastic and stochastic versions of Paul Samuelsonโs celebrated
multiplier accelerator model [115]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to
Samuelson used a second-order linear difference equation to represent a model of national out-
put based on three components:
โข a national output identity asserting that national outcome is the sum of consumption
plus investment plus government purchases
โข a Keynesian consumption function asserting that consumption at time ๐ก is equal to a
constant times national output at time ๐ก โ 1
โข an investment accelerator asserting that investment at time ๐ก equals a constant called
the accelerator coefficient times the difference in output between period ๐ก โ 1 and ๐ก โ 2
โข the idea that consumption plus investment plus government purchases constitute aggre-
gate demand, which automatically calls forth an equal amount of aggregate supply
(To read about linear difference equations see here or chapter IX of [118])
Samuelson used the model to analyze how particular values of the marginal propensity to
consume and the accelerator coefficient might give rise to transient business cycles in national
output
Possible dynamic properties include
Later we present an extension that adds a random shock to the right side of the national in-
come identity representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear
difference equation that, with appropriate parameter values, gives rise to recurrent irregular
business cycles
(To read about stochastic linear difference equations see chapter XI of [118])
13.3 Details

C_t = a Y_{t−1} + γ    (1)

I_t = b (Y_{t−1} − Y_{t−2})    (2)

Y_t = C_t + I_t + G_t    (3)
Equations Eq. (1), Eq. (2), and Eq. (3) imply the following second-order linear difference
equation for national income:
๐๐ก = (๐ + ๐)๐๐กโ1 โ ๐๐๐กโ2 + (๐พ + ๐บ๐ก )
or
where ๐1 = (๐ + ๐) and ๐2 = โ๐
To complete the model, we require two initial conditions
If the model is to generate time series for t = 0, …, T, we require two initial values

Y_{−1} = Ŷ_{−1},    Y_{−2} = Ŷ_{−2}

We'll ordinarily set the parameters (a, b) so that, starting from an arbitrary pair of initial conditions (Ŷ_{−1}, Ŷ_{−2}), national income Y_t converges to a constant value as t becomes large
The deterministic version of the model described so far โ meaning that no random shocks
hit aggregate demand โ has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a ran-
dom shock to aggregate demand
๐๐ก = ๐1 ๐๐กโ1 + ๐2 ๐๐กโ2
or
To discover the properties of the solution of Eq. (6), it is useful first to form the characteristic polynomial for Eq. (6):

z² − ρ_1 z − ρ_2    (7)

Factoring the polynomial in terms of its roots λ_1, λ_2 gives

z² − ρ_1 z − ρ_2 = (z − λ_1)(z − λ_2) = 0    (8)

When the roots are complex, they can be written in polar form as

λ_1 = r e^{iω},    λ_2 = r e^{−iω}
where ๐ is the amplitude of the complex number and ๐ is its angle or phase
These can also be represented as

λ_1 = r (cos(ω) + i sin(ω)),    λ_2 = r (cos(ω) − i sin(ω))

The general solution of Eq. (6) can then be written

Y_t = λ_1^t c_1 + λ_2^t c_2
where ๐1 and ๐2 are constants that depend on the two initial conditions and on ๐1 , ๐2
When the roots are complex, it is useful to pursue the following calculations
Notice that

Y_t = c_1 (r e^{iω})^t + c_2 (r e^{−iω})^t
    = c_1 r^t e^{iωt} + c_2 r^t e^{−iωt}
    = c_1 r^t [cos(ωt) + i sin(ωt)] + c_2 r^t [cos(ωt) − i sin(ωt)]
    = (c_1 + c_2) r^t cos(ωt) + i (c_1 − c_2) r^t sin(ωt)
The only way that ๐๐ก can be a real number for each ๐ก is if ๐1 + ๐2 is a real number and ๐1 โ ๐2
is an imaginary number
This happens only when c_1 and c_2 are complex conjugates, in which case they can be written in the polar forms

c_1 = v e^{iθ},    c_2 = v e^{−iθ}
So we can write

Y_t = 2 v r^t cos(ωt + θ)

where v and θ are constants that must be chosen to satisfy initial conditions for Y_{−1}, Y_{−2}

This formula shows that when the roots are complex, Y_t displays oscillations with period p̃ = 2π/ω and damping factor r

We say that p̃ is the period because in that amount of time the cosine wave cos(ωt + θ) goes through exactly one complete cycle
(Draw a cosine function to convince yourself of this please)
Remark: Following [115], we want to choose the parameters ๐, ๐ of the model so that the ab-
solute values (of the possibly complex) roots ๐1 , ๐2 of the characteristic polynomial are both
strictly less than one:
Remark: When both roots ๐1 , ๐2 of the characteristic polynomial have absolute values
strictly less than one, the absolute value of the larger one governs the rate of convergence to
the steady state of the non stochastic version of the model
Here is the formula for the matrix ๐ด in the linear state space system in the case that govern-
ment expenditures are a constant ๐บ:
      ⎡   1      0     0  ⎤
A  =  ⎢ γ + G   ρ_1   ρ_2 ⎥
      ⎣   0      1     0  ⎦
13.4 Implementation
def param_plot():
"""this function creates the graph on page 189 of Sargent Macroeconomic Theory, second edition, 19
# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])
return fig
param_plot()
plt.show()
The graph portrays regions in which the (๐1 , ๐2 ) root pairs implied by the (๐1 = (๐ + ๐), ๐2 =
โ๐) difference equation parameter pairs in the Samuelson model are such that:
โข (๐1 , ๐2 ) are complex with modulus less than 1 - in this case, the {๐๐ก } sequence displays
damped oscillations
โข (๐1 , ๐2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth
โข (๐1 , ๐2 ) are both real, but one is strictly less than โ1 - this leads to explosive oscilla-
tions
โข (๐1 , ๐2 ) are both real and both are less than 1 in absolute value - in this case, there is
smooth convergence to the steady state without damped cycles
Later weโll present the graph with a red mark showing the particular point implied by the
setting of (๐, ๐)
def categorize_solution(ρ1, ρ2):
    "Categorize the solution type, given the difference equation parameters."
    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')

categorize_solution(1.3, -.4)

Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
The following function calculates roots of the characteristic polynomial using high school al-
gebra
(Weโll calculate the roots in other ways later)
The function also plots a ๐๐ก starting from initial conditions that we set
def y_nonstochastic(α=.92, β=.5, y_0=100, y_1=80, n=80):
    # (The function signature above is a reconstruction: the α and β defaults
    #  are implied by the printed values ρ_1 = 1.42 and ρ_2 = -0.5 below; the
    #  remaining defaults are placeholders.)
    roots = []

    ρ1 = α + β
    ρ2 = -β

    print(f'ρ_1 is {ρ1}')
    print(f'ρ_2 is {ρ2}')

    discriminant = ρ1 ** 2 + 4 * ρ2

    if discriminant == 0:
        roots.append(-ρ1 / 2)
        print('Single real root: ')
        print(''.join(str(roots)))
    elif discriminant > 0:
        roots.append((-ρ1 + sqrt(discriminant).real) / 2)
        roots.append((-ρ1 - sqrt(discriminant).real) / 2)
        print('Two real roots: ')
        print(''.join(str(roots)))
    else:
        roots.append((-ρ1 + sqrt(discriminant)) / 2)
        roots.append((-ρ1 - sqrt(discriminant)) / 2)
        print('Two complex roots: ')
        print(''.join(str(roots)))

    # ... (the code that builds and returns the series was lost in extraction) ...

    return y_t

plot_y(y_nonstochastic())

ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[-0.6459687576256715, -0.7740312423743284]
Absolute values of roots are less than one
The next cell writes code that takes as inputs the modulus ๐ and phase ๐ of a conjugate pair
of complex numbers in polar form
๐1 = ๐ exp(๐๐), ๐2 = ๐ exp(โ๐๐)
โข The code assumes that these two complex numbers are the roots of the characteristic
polynomial
โข It then reverse-engineers (๐, ๐) and (๐1 , ๐2 ), pairs that would generate those roots
import cmath
import math
r = .95
period = 10                      # Length of cycle in units of time
ϕ = 2 * math.pi/period

a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j), (-0.9024999999999999+0j)
ρ1 = ρ1.real
ρ2 = ρ2.real
ρ1, ρ2
Here weโll use numpy to compute the roots of the characteristic polynomial
p1 = cmath.polar(r1)
p2 = cmath.polar(r2)
r, ϕ = 0.95, 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586), (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j), (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124, -0.9024999999999999
""" Rather than computing the roots of the characteristic polynomial by hand as we did earlier, t
enlists numpy to do the work for us """
# Useful constants
ฯ1 = ฮฑ + ฮฒ
ฯ2 = -ฮฒ
categorize_solution(ฯ1, ฯ2)
return y_t
plot_y(y_nonstochastic())
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real
a, b = 0.6180339887498949, 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one
We can also use sympy to compute analytic formulas for the roots
r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")
Out[12]:

[ρ_1/2 − √(ρ_1² + 4ρ_2)/2,    ρ_1/2 + √(ρ_1² + 4ρ_2)/2]
In [13]: a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b
Out[13]:

[α/2 + β/2 − √(α² + 2αβ + β² − 4β)/2,    α/2 + β/2 + √(α² + 2αβ + β² − 4β)/2]
13.5 Stochastic Shocks

Now we'll construct some code to simulate the stochastic version of the model that emerges
when we add a random shock process to aggregate demand
"""This function takes parameters of a stochastic version of the model and proceeds to analyze
the roots of the characteristic polynomial and also generate a simulation"""
# Useful constants
ฯ1 = ฮฑ + ฮฒ
ฯ2 = -ฮฒ
# Categorize solution
categorize_solution(ฯ1, ฯ2)
# Generate shocks
๏ฟฝ = np.random.normal(0, 1, n)
return y_t
plot_y(y_stochastic())
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
Letโs do a simulation in which there are shocks and the characteristic polynomial has complex
roots
In [15]: r = .97
a = a.real # drop the imaginary part so that it is a valid input into y_nonstochastic
b = b.real
a, b = 0.6285929690873979, 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
13.6 Government Spending
"""This program computes a response to a permanent increase in government expenditures that occur
at time 20"""
# Useful constants
ฯ1 = ฮฑ + ฮฒ
ฯ2 = -ฮฒ
# Categorize solution
categorize_solution(ฯ1, ฯ2)
# Generate shocks
๏ฟฝ = np.random.normal(0, 1, n)
# Stochastic
else:
๏ฟฝ = np.random.normal(0, 1, n)
return ฯ1 * x[t - 1] + ฯ2 * x[t - 2] + ฮณ + g + ฯ * ๏ฟฝ[t]
# No government spending
if g == 0:
y_t.append(transition(y_t, t))
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
We can also see the response to a one time jump in government expenditures
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[0.7236068 0.2763932]
Roots are real
Roots are less than one
13.7 Wrapping Everything Into a Class
.. math::
Parameters
----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. It must be greater than or equal to 0. Set
equal to 0 for a non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If none, government
spending equal to g for all t.
"""
def __init__(self,
             y_0=100,
             y_1=50,
             α=1.3,
             β=0.2,
             γ=10,
             n=100,
             σ=0,
             g=0,
             g_t=0,
             duration=None):
def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'
def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True
def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'
# Stochastic
else:
ε = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ε[t]
def generate_series(self):
# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t))
def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')
if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')
if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:
print('Non-stochastic series')
if self.g != 0:
print('Government spending equal to ' + str(self.g))
if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))
def plot(self):
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()
return fig
def param_plot(self):
fig = param_plot()
ax = fig.gca()
plt.legend(fontsize=12, loc=3)
return fig
Summary
--------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20
In [21]: sam.plot()
plt.show()
Weโll use our graph to show where the roots lie and how their location is consistent with the
behavior of the path just graphed
The red + sign shows the location of the roots
In [22]: sam.param_plot()
plt.show()
13.8 Using the LinearStateSpace Class
It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the
work that we have done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class
""" This script maps the Samuelson model in the the ``LinearStateSpace`` class"""
ฮฑ = 0.8
ฮฒ = 0.9
ฯ1 = ฮฑ + ฮฒ
ฯ2 = -ฮฒ
ฮณ = 10
ฯ = 1
g = 10
n = 100
A = [[1, 0, 0],
[ฮณ + g, ฯ1, ฯ2],
[0, 1, 0]]
x, y = sam_t.simulate(ts_length=n)
axes[-1].set_xlabel('Iteration')
plt.show()
Letโs plot impulse response functions for the instance of the Samuelson model using a
method in the LinearStateSpace class
Out[24]:
(2, 6, 1)
(2, 6, 1)
Now letโs compute the zeros of the characteristic polynomial by simply calculating the eigen-
values of ๐ด
In [25]: A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)
We could also create a subclass of LinearStateSpace (inheriting all its methods and at-
tributes) to add more functions to use
"""
this subclass creates a Samuelson multiplier-accelerator model
as a linear state space system
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):
self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
x, y = self.simulate(ts_length)
axes[-1].set_xlabel('Iteration')
return fig
x, y = self.impulse_response(j)
return fig
13.8.3 Illustrations
In [30]: samlss.plot_irf(100)
plt.show()
In [31]: samlss.multipliers()
13.9 Pure Multiplier Model

Let's shut down the accelerator by setting b = 0 to get a pure multiplier model
โข the absence of cycles gives an idea about why Samuelson included the accelerator
In [33]: pure_multiplier.plot_simulation()
Out[33]:
In [35]: pure_multiplier.plot_simulation()
Out[35]:
In [36]: pure_multiplier.plot_irf(100)
Out[36]:
13.10 Summary
In this lecture, we wrote functions and classes to represent non-stochastic and stochastic ver-
sions of the Samuelson (1939) multiplier-accelerator model, described in [115]
We saw that different parameter values led to different output paths, which could either be
stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
14
More Language Features
14.1 Contents
โข Overview 14.2
โข Iterables and Iterators 14.3
โข Names and Name Resolution 14.4
โข Handling Errors 14.5
โข Decorators and Descriptors 14.6
โข Generators 14.7
โข Recursive Function Calls 14.8
โข Exercises 14.9
โข Solutions 14.10
14.2 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning de-
sire to read it
Itโs here
A variety of topics are treated in the lecture, including generators, exceptions and descriptors
14.3.1 Iterators
Writing us_cities.txt
In [2]: f = open('us_cities.txt')
f.__next__()
In [3]: f.__next__()
We see that file objects do indeed have a __next__ method, and that calling this method
returns the next line in the file
The next method can also be accessed via the builtin function next(), which directly calls
this method
In [4]: next(f)
In [6]: next(e)
Writing test_table.csv
f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
In [9]: next(nikkei_data)
All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write
for x in iterator:
<code block>
f = open('somefile.txt', 'r')
for line in f:
# do something
14.3.3 Iterables
You already know that we can put a Python list to the right of in in a for loop
spam
eggs
Out[11]: list
In [12]: next(x)
---------------------------------------------------------------------------
<ipython-input-12-92de4e9f6b1e> in <module>
----> 1 next(x)
Out[13]: list
In [14]: y = iter(x)
type(y)
Out[14]: list_iterator
In [15]: next(y)
Out[15]: 'foo'
In [16]: next(y)
Out[16]: 'bar'
In [17]: next(y)
---------------------------------------------------------------------------
<ipython-input-17-81b9d2f0f16a> in <module>
----> 1 next(y)
StopIteration:
In [18]: iter(42)
---------------------------------------------------------------------------
<ipython-input-18-ef50b48e4398> in <module>
----> 1 iter(42)
Some built-in functions that act on sequences also work with iterables
For example
Out[19]: 10
In [20]: y = iter(x)
type(y)
Out[20]: list_iterator
In [21]: max(y)
Out[21]: 10
One thing to remember about iterators is that they are depleted by use
Out[22]: 10
In [23]: max(y)
---------------------------------------------------------------------------
<ipython-input-23-062424e6ec08> in <module>
----> 1 max(y)
In [24]: x = 42
We now know that when this statement is executed, Python creates an object of type int in
your computerโs memory, containing
โข the value 42
โข some associated attributes
g = f
id(g) == id(f)
Out[25]: True
In [26]: g('test')
test
In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Hereโs an example of this situation, where the name x is first bound to one object and then
rebound to another
In [27]: x = 'foo'
id(x)
Out[27]: 139979150881488
14.4.2 Namespaces
In [29]: x = 42
Writing math2.py
Next letโs import the math module from the standard library
In [33]: math.pi
Out[33]: 3.141592653589793
In [34]: math2.pi
Out[34]: 'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a
dictionary
We can look at the dictionary directly, using module_name.__dict__
math.__dict__.items()
Out[35]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t
math2.__dict__.items()
As you know, we access elements of the namespace using the dotted attribute notation
In [37]: math.pi
Out[37]: 3.141592653589793
Out[38]: True
In [39]: vars(math).items()
Out[39]: dict_items([('__name__', 'math'), ('__doc__', 'This module is always available. It provides access t
In [40]: dir(math)[0:10]
Out[40]: ['__doc__',
'__file__',
'__loader__',
'__name__',
'__package__',
'__spec__',
'acos',
'acosh',
'asin',
'asinh']
In [41]: print(math.__doc__)
In [42]: math.__name__
Out[42]: 'math'
In [43]: print(__name__)
__main__
When we run a script using IPythonโs run command, the contents of the file are executed as
part of __main__ too
To see this, letโs create a file mod.py that prints its own __name__ attribute
Writing mod.py
mod
__main__
In the second case, the code is executed as part of __main__, so __name__ is equal to
__main__
To see the contents of the namespace of __main__ we use vars() rather than
vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has
initialized when you started up your session
If you prefer to see only the variables you have initialized, use whos
In [47]: x = 2
y = 3
import numpy as np
%whos
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts exe-
cuting commands in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the
import statement was made
In this case itโs __main__, so the namespace of __main__ again becomes the global names-
pace
Important fact: When we call a function, the interpreter creates a local namespace for that
function, and registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with lo-
cals()
For example, consider
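The function used in the call below was lost in the extracted text; judging from the printed namespace and the return value, it was something like:

def f(x):
    a = 2
    print(locals())
    return a * x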
In [49]: f(1)
{'x': 1, 'a': 2}
Out[49]: 2
We have been using various built-in functions, such as max(), dir(), str(), list(),
len(), range(), type(), etc.
How does access to these names work?
In [50]: dir()[0:10]
Out[50]: ['In', 'Out', '_', '_11', '_13', '_14', '_15', '_16', '_19', '_2']
In [51]: dir(__builtins__)[0:10]
Out[51]: ['ArithmeticError',
'AssertionError',
'AttributeError',
'BaseException',
'BlockingIOError',
'BrokenPipeError',
'BufferError',
'BytesWarning',
'ChildProcessError',
'ConnectionAbortedError']
In [52]: __builtins__.max
But __builtins__ is special, because we can always access them directly as well
In [53]: max
Out[54]: True
At any point of execution, there are in fact at least two namespaces that can be accessed di-
rectly
(โAccessed directlyโ means without using a dot, as in pi rather than math.pi)
These namespaces are
If the interpreter is executing a function, then the directly accessible namespaces are
Here f is the enclosing function for g, and each function gets its own namespaces
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Hereโs an example that helps to illustrate
Consider a script test.py that looks as follows
a = 0
y = g(10)
print("a = ", a, "y = ", y)
Writing test.py
a = 0 y = 11
In [58]: x
Out[58]: 2
First,
This is a good time to say a little more about mutable vs immutable objects
Consider the code segment
x = 1
print(f(x), x)
2 1
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as
the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, itโs a different story when we use a mutable data type such as a list
x = [1]
print(f(x), x)
[2] [2]

14.5 Handling Errors

A leading example in this section is a function for the unbiased sample variance

s² := (1/(n−1)) ∑_{i=1}^{n} (y_i − ȳ)²,    where ȳ is the sample mean

Why might we want to handle possible errors in such code explicitly?
โข Because the debugging information provided by the interpreter is often less useful than
the information on possible errors you have in your head when writing code
โข Because errors causing execution to stop are frustrating if youโre in the middle of a
large computation
โข Because itโs reduces confidence in your code on the part of your users (if you are writing
for others)
14.5.1 Assertions
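The example function for this subsection was lost in extraction, but it can be read off the traceback shown below; it checks the sample size with an assert statement:

import numpy as np

def var(y):
    n = len(y)
    assert n > 1, 'Sample size must be greater than one.'
    return np.sum((y - y.mean())**2) / float(n-1)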
If we run this with an array of length one, the program will terminate and print our error
message
In [62]: var([1])
---------------------------------------------------------------------------
<ipython-input-62-8419b6ab38ec> in <module>
----> 1 var([1])
<ipython-input-61-e6ffb16a7098> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)
The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Letโs look at how this is done
Exceptions
Hereโs an example of a common error type
In [63]: def f:
Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Hereโs a different kind of error, unrelated to syntax
In [64]: 1 / 0
---------------------------------------------------------------------------
<ipython-input-64-bc757c3fda29> in <module>
----> 1 1 / 0
Hereโs another
In [65]: x1 = y1
---------------------------------------------------------------------------
<ipython-input-65-a7b8d65e9e45> in <module>
----> 1 x1 = y1
And another
In [66]: 'foo' + 6
---------------------------------------------------------------------------
<ipython-input-66-216809d6e6fe> in <module>
----> 1 'foo' + 6
And another
In [67]: X = []
x = X[0]
---------------------------------------------------------------------------
<ipython-input-67-082a18d7a0aa> in <module>
1 X = []
----> 2 x = X[0]
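The next few calls exercise a function that handles such errors with a try–except clause. Its definition was lost in the extracted text, but judging from the outputs it looked something like this (later cells presumably extend it to also catch the TypeError raised by f('foo')):

def f(x):
    try:
        return 1.0 / x
    except ZeroDivisionError:
        print('Error: division by zero.  Returned None')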
In [69]: f(2)
Out[69]: 0.5
In [70]: f(0)
In [71]: f(0.0)
In [73]: f(2)
Out[73]: 0.5
In [74]: f(0)
In [75]: f('foo')
In [77]: f(2)
Out[77]: 0.5
In [78]: f(0)
In [79]: f('foo')
14.6 Decorators and Descriptors

Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other peo-
pleโs code
Hence you need to understand them at some stage of your Python education
14.6.1 Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popu-
lar
Itโs very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
Suppose we are working on a program that looks something like this
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
Now suppose thereโs a problem: occasionally negative numbers get fed to f and g in the cal-
culations that follow
If you try it, youโll see that when these functions are called with negative numbers they re-
turn a NumPy object called nan
This stands for โnot a numberโ (and indicates that you are trying to evaluate a mathematical
function at a point where it is not defined)
Perhaps this isnโt what we want, because it causes other problems that are hard to pick up
later on
Suppose that instead we want the program to terminate whenever this happens, with a sensi-
ble error message
This change is easy enough to implement
def f(x):
assert x >= 0, "Argument must be nonnegative"
return np.log(np.log(x))
def g(x):
assert x >= 0, "Argument must be nonnegative"
return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try
hard to avoid
Here itโs not a big deal, but imagine now that instead of just f and g, we have 20 such func-
tions that we need to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20
times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater
def check_nonneg(func):
def safe_function(x):
assert x >= 0, "Argument must be nonnegative"
return func(x)
return safe_function
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
This approach is preferable because the test logic is written only once, inside check_nonneg

The decorator syntax gives us a neater way to express the same thing: it lets us replace the lines

def f(x):
    return np.log(np.log(x))

def g(x):
    return np.sqrt(42 * x)

f = check_nonneg(f)
g = check_nonneg(g)

with
In [86]: @check_nonneg
def f(x):
return np.log(np.log(x))
@check_nonneg
def g(x):
return np.sqrt(42 * x)
14.6.2 Descriptors

Descriptors solve a common problem: keeping mutually dependent attributes consistent

Suppose, for example, that we have a Car class recording both miles travelled and kilometres travelled

One potential problem is that a user alters one of these variables but not the other
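The class behind the outputs below was lost; a sketch, in which miles and kms are ordinary unlinked attributes, is the following (the final output presumably follows a lost cell that sets car.miles = 6000 and then inspects car.kms):

class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted

car = Car()
car.miles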
Out[88]: 1000
In [89]: car.kms
Out[89]: 1610.0
Out[90]: 1610.0
In the last two lines we see that miles and kms are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables,
the other is automatically updated
A Solution
In Python, this issue is solved using descriptors
A descriptor is just a Python object that implements certain methods
These methods are triggered when the object is accessed through dotted attribute notation
The best way to understand this is to see it in action
Consider this alternative version of the Car class
def get_miles(self):
return self._miles
def get_kms(self):
return self._kms
Out[92]: 1000
Out[93]: 9660.0
The builtin Python function property takes getter and setter methods and creates a prop-
erty
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is trig-
gered โ in this case set_miles
Decorators and Properties
These days its very common to see the property function used via a decorator
Hereโs another version of our Car class that works as before but now uses decorators to set
up the properties
class Car:

    # (The class header and __init__ are reconstructed; they did not survive extraction)
    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    @property
    def miles(self):
        return self._miles

    @property
    def kms(self):
        return self._kms

    @miles.setter
    def miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    @kms.setter
    def kms(self, value):
        self._kms = value
        self._miles = value / 1.61
14.7 Generators
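The input cells for this section did not survive extraction; the outputs that follow (tuple, list, generator, and the sums equal to 285) are consistent with a sequence along these lines, which is a reconstruction rather than the original code:

singular = ('dog', 'cat', 'bird')                 # a tuple
plural = [string + 's' for string in singular]    # a list comprehension
plural = (string + 's' for string in singular)    # a generator expression

sum((x * x for x in range(10)))                   # sum() can consume a generator directly
sum(x * x for x in range(10))                     # the outer brackets can be omitted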
Out[95]: tuple
In [97]: type(plural)
Out[97]: list
Out[98]: generator
In [99]: next(plural)
Out[99]: 'dogs'
In [100]: next(plural)
Out[100]: 'cats'
In [101]: next(plural)
Out[101]: 'birds'
Out[102]: 285
The function sum() calls next() to get the items, adds successive terms
In fact, we can omit the outer brackets in this case
Out[103]: 285
The most flexible way to create generator objects is to use generator functions
Letโs look at some examples
Example 1
Hereโs a very simple example of a generator function
It looks like a function, but uses a keyword yield that we havenโt met before
Letโs see how it works after running this code
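The code itself was lost in extraction; a sketch consistent with the outputs below is:

def f():
    yield 'start'
    yield 'middle'
    yield 'end'

gen = f()   # Create a generator object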
In [105]: type(f)
Out[105]: function
In [107]: next(gen)
Out[107]: 'start'
In [108]: next(gen)
Out[108]: 'middle'
In [109]: next(gen)
Out[109]: 'end'
In [110]: next(gen)
---------------------------------------------------------------------------
<ipython-input-110-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen) executes the body of f() until it reaches the first yield statement and returns that value to the caller
The second call to next(gen) starts executing from the next line
In [113]: g
Out[114]: generator
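The generator function behind these outputs was lost; a sketch consistent with the values 2, 4, 16 and the subsequent StopIteration is:

def g(x):
    while x < 100:
        yield x
        x = x * x

gen = g(2)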
In [115]: next(gen)
Out[115]: 2
In [116]: next(gen)
Out[116]: 4
In [117]: next(gen)
Out[117]: 16
In [118]: next(gen)
---------------------------------------------------------------------------
<ipython-input-118-6e72e47198db> in <module>
----> 1 next(gen)
StopIteration:
โข The body of g() executes until the line yield x, and the value of x is returned
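The example that the next outputs refer to was lost; a sketch, assuming the standard random module, is:

import random

n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)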
Out[121]: 5001162
But we are creating two huge lists here, range(n) and draws
This uses lots of memory and is very slow
If we make n even bigger, then constructing the list can exhaust the available memory
In [122]: n = 100000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
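# (Reconstruction: the generator-function alternative used in In [124] below was
#  lost; a sketch that yields the draws one at a time, keeping memory use low, is)
def f(n):
    i = 1
    while i <= n:
        yield random.uniform(0, 1) < 0.5
        i += 1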
In [124]: n = 10000000
draws = f(n)
draws
In [125]: sum(draws)
Out[125]: 5000216
In summary, generators and other iterables avoid the need to build large temporary lists, keeping memory requirements low

14.8 Recursive Function Calls

This is not something that you will use every day, but it is still useful, and you should learn it
at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing x_t for some t when x_{t+1} = 2 x_t and x_0 = 1
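The code for this example was lost; sketches of the two solutions discussed below are:

# Iterative solution
def x_loop(t):
    x = 1
    for i in range(t):
        x = 2 * x
    return x

# Recursive solution
def x(t):
    if t == 0:
        return 1
    return 2 * x(t-1)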
What happens here is that each successive call uses its own frame in the stack
โข a frame is where the local variables of a given function call are held
โข stack is memory used to process function calls
โ a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be pre-
ferred to the recursive solution
Weโll meet less contrived applications of recursion later on
14.9 Exercises
14.9.1 Exercise 1

The Fibonacci numbers are defined recursively by x_{t+1} = x_t + x_{t-1}, with x_0 = 0 and x_1 = 1

The first few numbers in the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the ๐ก-th Fibonacci number for any ๐ก
14.9.2 Exercise 2
Complete the following code, and test it using this csv file, which we assume that youโve put
in your current working directory
dates = column_iterator('test_table.csv', 1)
14.9.3 Exercise 3
prices
3
8
7
21
Using try-except, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers
14.10 Solutions
14.10.1 Exercise 1
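The recursive implementation was lost in extraction; a minimal sketch is:

def x(t):
    if t == 0:
        return 0
    if t == 1:
        return 1
    return x(t-1) + x(t-2)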
Letโs test it
14.10.2 Exercise 2
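The generator function itself was lost; a sketch that yields the dates printed below is:

def column_iterator(target_file, column_number):
    """A generator function that steps through the elements of column
    column_number in the CSV file target_file."""
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()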
dates = column_iterator('test_table.csv', 1)
i = 1
for date in dates:
print(date)
if i == 10:
break
i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
14.10.3 Exercise 3
(A cell here writes the file numbers.txt shown in the exercise to the present working directory, producing the output Writing numbers.txt)
In [132]: f = open('numbers.txt')
total = 0.0
for line in f:
try:
total += float(line)
except ValueError:
pass
f.close()
print(total)
39.0
15
Debugging
15.1 Contents
โข Overview 15.2
โข Debugging 15.3
โDebugging is twice as hard as writing the code in the first place. Therefore, if
you write the code as cleverly as possible, you are, by definition, not smart enough
to debug it.โ โ Brian Kernighan
15.2 Overview
Are you one of those programmers who fills their code with print statements when trying to
debug their programs?
Hey, we all used to do that
(OK, sometimes we still do thatโฆ)
But once you start writing larger programs youโll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here weโll focus on Jupyter and leave you to explore other settings
Weโll need the following imports
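The import cell is missing; presumably:

import numpy as np
import matplotlib.pyplot as plt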
15.3 Debugging
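The code cell that generates the traceback below was lost, but it can be read off the traceback itself; assuming numpy and matplotlib are imported as np and plt, it was along these lines:

def plot_log():
    fig, ax = plt.subplots(2, 1)
    x = np.linspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()  # Call the function, generate plot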
---------------------------------------------------------------------------
<ipython-input-2-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot
<ipython-input-2-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6
This code is intended to plot the log function over the interval [1, 2]
But thereโs an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suit-
able for having two subplots on the same figure)
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array
has no plot method
But letโs pretend that we donโt understand this for the moment
We might suspect thereโs something wrong with ax but when we try to investigate this ob-
ject, we get the following exception:
In [3]: ax
---------------------------------------------------------------------------
<ipython-input-3-b00e77935981> in <module>
----> 1 ax
The problem is that ax was defined inside plot_log(), and the name is lost once that func-
tion terminates
Letโs try doing it a different way
We run the first cell block again, generating the same error
---------------------------------------------------------------------------
<ipython-input-4-c32a2280f47b> in <module>
5 plt.show()
6
----> 7 plot_log() # Call the function, generate plot
<ipython-input-4-c32a2280f47b> in plot_log()
2 fig, ax = plt.subplots(2, 1)
3 x = np.linspace(1, 2, 10)
----> 4 ax.plot(x, np.log(x))
5 plt.show()
6
%debug
You should be dropped into a new prompt that looks something like this
ipdb>
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
Itโs now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
(a list of documented and undocumented commands is displayed; we can then get help on a specific command)
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
Now consider a modified version of the function, where the original problem is fixed but another bug is introduced:

def plot_log():
    fig, ax = plt.subplots()
    x = np.logspace(1, 2, 10)
    ax.plot(x, np.log(x))
    plt.show()

plot_log()
Here the original problem is fixed, but weโve accidentally written np.logspace(1, 2,
10) instead of np.linspace(1, 2, 10)
Now there wonโt be any exception, but the plot wonโt look right
To investigate, it would be helpful if we could inspect variables like x during execution of the
function
To this end, we add a โbreak pointโ by inserting breakpoint() inside the function code
block
def plot_log():
breakpoint()
fig, ax = plt.subplots()
x = np.logspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
plot_log()
Now letโs run the script, and investigate via the debugger
> <ipython-input-6-a188074383b7>(6)plot_log()
-> fig, ax = plt.subplots()
(Pdb) n
> <ipython-input-6-a188074383b7>(7)plot_log()
-> x = np.logspace(1, 2, 10)
(Pdb) n
> <ipython-input-6-a188074383b7>(8)plot_log()
-> ax.plot(x, np.log(x))
(Pdb) x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])
We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q
16
Pandas
16.1 Contents
โข Overview 16.2
โข Series 16.3
โข DataFrames 16.4
โข Exercises 16.6
โข Solutions 16.7
16.2 Overview
Just as NumPy provides the basic array data type plus core array operations, pandas defines fundamental structures for working with data and endows them with methods that facilitate operations such as
โข reading in data
โข adjusting indices
โข working with dates and time series
โข sorting, grouping, re-ordering and general data munging [1]
โข dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and
scikit-learn, which are built on top of pandas
This lecture will provide a basic introduction to pandas
Throughout the lecture, we will assume that the following imports have taken place
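The import cell itself is missing; the code in this lecture presumably assumes at least:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import requests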
16.3 Series
Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a โcolumnโ of data, such as a collection of observations on a
single variable
A DataFrame is an object for storing related columns of data
Letโs start with Series
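The cell creating the series was lost; a sketch consistent with the output below is:

s = pd.Series(np.random.randn(4), name='daily returns')
s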
Out[2]: 0 0.246617
1 1.616297
2 1.371344
3 -0.854713
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the
values being daily returns on their shares
Pandas Series are built on top of NumPy arrays and support many similar operations
In [3]: s * 100
Out[3]: 0 24.661661
1 161.629724
2 137.134394
3 -85.471300
Name: daily returns, dtype: float64
In [4]: np.abs(s)
Out[4]: 0 0.246617
1 1.616297
2 1.371344
3 0.854713
Name: daily returns, dtype: float64
In [5]: s.describe()
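The next cell, lost in extraction, presumably assigns ticker names to the index, for example (the last two tickers are assumptions):

s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
s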
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction
that the items in the dictionary all have the same typeโin this case, floats)
In fact, you can use much of the same syntax as Python dictionaries
In [7]: s['AMZN']
Out[7]: 0.24661661104520952
In [8]: s['AMZN'] = 0
s
In [9]: 'AAPL' in s
Out[9]: True
16.4 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each
variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized
into rows and columns, often with descriptive indexes for individual rows and individual
columns
Letโs look at an example that reads data from the CSV file pandas/data/test_pwt.csv
that can be downloaded here
Hereโs the content of test_pwt.csv
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379","5.5
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993","6.7
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328","14.0
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.2666
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181","
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427",
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.032453
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282","
Supposing you have this data saved as test_pwt.csv in the present working directory (type
%pwd in Jupyter to see what this is), it can be read in as follows:
In [10]: df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pw
type(df)
Out[10]: pandas.core.frame.DataFrame
In [11]: df
cc cg
0 75.716805 5.578804
1 67.759026 6.720098
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
5 72.718710 5.726546
6 72.347054 6.032454
7 78.978740 5.108068
We can select particular rows using standard Python array slicing notation
In [12]: df[2:5]
cc cg
2 64.575551 14.072206
3 64.436451 10.266688
4 74.707624 11.658954
To select columns, we can pass a list containing the names of the desired columns represented
as strings
To select both rows and columns using integers, the iloc attribute should be used with the
format .iloc[rows, columns]
To select rows and columns using a mixture of integers and labels, the loc attribute can be
used in a similar way
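The corresponding example cells were lost; minimal sketches using this dataset's columns are:

df[['country', 'tcgdp']]                        # columns by name
df.iloc[2:5, 0:4]                               # rows and columns by integer position
df.loc[df.index[2:5], ['country', 'tcgdp']]     # integer rows, label columns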
Letโs imagine that weโre only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the
dataframe using the selection method described above
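For example (a sketch of the lost cell):

df = df[['country', 'POP', 'tcgdp']]
df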
Here the index 0, 1,..., 7 is redundant because we can use the country names as an in-
dex
To do this, we set the index to be the country variable in the dataframe
In [17]: df = df.set_index('country')
df
Next, weโre going to add a column showing real GDP per capita, multiplying by 1,000,000 as
we go because total GDP is in millions
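The cell adding the new column was lost; a sketch is:

df['GDP percap'] = df['tcgdp'] * 1e6 / df['POP']
df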
One of the nice things about pandas DataFrame and Series objects is that they have
methods for plotting and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita
df['GDP percap'].plot(kind='bar')
plt.show()
At the moment the data frame is ordered alphabetically on the countriesโletโs change it to
GDP per capita
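The sorting cell was lost; a sketch is:

df = df.sort_values(by='GDP percap', ascending=False)
df['GDP percap'].plot(kind='bar')
plt.show()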
16.5 On-Line Data Sources

Suppose that we are interested in the US unemployment rate, which can be downloaded from FRED in CSV format at

https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv

One option is to use requests, a standard Python library for requesting data over the Internet
To begin, try the following code on your computer
r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')
If there is no error message, then the call has succeeded

If you do get an error, there are two likely causes

1. You are not connected to the Internet (hopefully this isn't the case)
2. Your machine is accessing the Internet through a proxy server, and Python isnโt aware
of this
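Assuming the call succeeds, the raw text can be split into lines; the cells producing the outputs below were lost, but presumably ran along these lines:

url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
source = requests.get(url).content.decode().split("\n")
source[0]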
Out[25]: 'DATE,VALUE\r'
In [26]: source[1]
Out[26]: '1948-01-01,3.4\r'
In [27]: source[2]
Out[27]: '1948-02-01,3.8\r'
We could now write some additional code to parse this text and store it as an array
But this is unnecessary โ pandasโ read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple
date filtering
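The read-in cell was lost; presumably something like the following, reusing the url defined above:

data = pd.read_csv(url, index_col=0, parse_dates=True)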
The data has been read into a pandas DataFrame called data that we can now manipulate in
the usual way
In [29]: type(data)
Out[29]: pandas.core.frame.DataFrame
Out[30]: VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
In [31]: pd.set_option('precision', 1)
data.describe() # Your output might differ slightly
Out[31]: VALUE
count 857.0
mean 5.8
std 1.6
min 2.5
25% 4.6
50% 5.6
75% 6.8
max 10.8
We can also plot the unemployment rate from 2006 to 2012 as follows
In [32]: data['2006':'2012'].plot()
plt.show()
Letโs look at one more example of downloading and manipulating data โ this time from the
World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, hereโs some data on government debt as a ratio to GDP
If you click on โDOWNLOAD DATAโ you will be given the option to download the data as
an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots
time series for the US and Australia
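The program itself did not survive extraction; a rough sketch, with a hypothetical file name and country column labels, is:

govt_debt = pd.read_excel('govt_debt.xls', index_col=0)
govt_debt[['AUS', 'USA']].plot(lw=2)
plt.title('Government debt to GDP (%)')
plt.show()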
16.6 Exercises
16.6.1 Exercise 1
Write a program to calculate the percentage price change over 2013 for the following shares
A dataset of daily closing prices for the above firms can be found in pan-
das/data/ticker_data.csv and can be downloaded here
Plot the result as a bar graph like follows
16.7 Solutions
16.7.1 Exercise 1
In [35]: ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/tic
ticker.set_index('Date', inplace=True)
'AMZN': 'Amazon',
'BA': 'Boeing',
'QCOM': 'Qualcomm',
'KO': 'Coca-Cola',
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
price_change = pd.Series()
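# (Reconstruction: the loop computing each firm's percentage price change was
#  lost; ticker_list refers to the ticker: name dictionary defined above,
#  whose opening line was also lost.)
for tick in ticker_list:
    start = ticker[tick].iloc[0]
    end = ticker[tick].iloc[-1]
    price_change[ticker_list[tick]] = 100 * (end - start) / start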
price_change.sort_values(inplace=True)
fig, ax = plt.subplots(figsize=(10,8))
price_change.plot(kind='bar', ax=ax)
plt.show()
Footnotes
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged
one.
17
Pandas for Panel Data
17.1 Contents
โข Overview 17.2
โข Exercises 17.7
โข Solutions 17.8
17.2 Overview
pandas (derived from 'panel' and 'data') contains powerful and easy-to-use tools for working with multidimensional, panel-structured data sets
In what follows, we will use a panel data set of real minimum wages from the OECD to cre-
ate:
We will begin by reading in our long format panel data from a CSV file and reshaping the
resulting DataFrame with pivot_table to build a MultiIndex
Additional detail will be added to our DataFrame using pandasโ merge function, and data
will be summarized with the groupby function
Most of this lecture was created by Natasha Watkins
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign
it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory
realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/r
The data is currently in long format, which is difficult to analyze when there are several di-
mensions to the data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle
higher dimensional data
pivot_table arguments should specify the data (values), the index, and the columns we
want in our resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis
Country โฆ \
Series In 2015 constant prices at 2015 USD exchange rates โฆ
Pay period Annual โฆ
Time โฆ
2006-01-01 23,826.64 โฆ
2007-01-01 24,616.84 โฆ
2008-01-01 24,185.70 โฆ
2009-01-01 24,496.84 โฆ
2010-01-01 24,373.76 โฆ
Country
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Time
2006-01-01 12,594.40 6.05
2007-01-01 12,974.40 6.24
2008-01-01 14,097.56 6.78
2009-01-01 15,756.42 7.58
2010-01-01 16,391.31 7.88
To more easily filter our time series data, later on, we will convert the index into a Date-
TimeIndex
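The conversion cell was lost; presumably:

realwage.index = pd.to_datetime(realwage.index)
type(realwage.index)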
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being
ordered hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
In [5]: type(realwage.columns)
Out[5]: pandas.core.indexes.multi.MultiIndex
In [6]: realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to
reshape our dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.un-
stack() works in the opposite direction - try it out)
In [8]: realwage.stack().head()
Country \
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 23,826.64
Hourly 12.06
2007-01-01 Annual 24,616.84
Hourly 12.46
2008-01-01 Annual 24,185.70
Country Belgium โฆ \
Series In 2015 constant prices at 2015 USD PPPs โฆ
Time Pay period โฆ
2006-01-01 Annual 21,042.28 โฆ
Hourly 10.09 โฆ
2007-01-01 Annual 21,310.05 โฆ
Hourly 10.22 โฆ
2008-01-01 Annual 21,416.96 โฆ
Country
Series In 2015 constant prices at 2015 USD exchange rates
Time Pay period
2006-01-01 Annual 12,594.40
Hourly 6.05
2007-01-01 Annual 12,974.40
Hourly 6.24
2008-01-01 Annual 14,097.56
[5 rows x 64 columns]
We can also pass in an argument to select the level we would like to stack
In [9]: realwage.stack(level='Country').head()
Time
Series In 2015 constant prices at 2015 USD exchange rates
Pay period Annual Hourly
Country
Australia 25,349.90 12.83
Belgium 20,753.48 9.95
Brazil 2,842.28 1.21
Canada 17,367.24 8.35
Chile 4,251.49 1.81
For the rest of lecture, we will work with a dataframe of the hourly real minimum wages
across countries and time, measured in 2015 US dollars
To create our filtered dataframe (realwage_f), we can use the xs method to select values
at lower levels in the multiindex, while keeping the higher levels (countries in this case)
In [11]: realwage_f = realwage.xs(('Hourly', 'In 2015 constant prices at 2015 USD exchange rates'),
level=('Pay period', 'Series'), axis=1)
realwage_f.head()
[5 rows x 32 columns]
Similar to relational databases like SQL, pandas has built in methods to merge datasets to-
gether
Using country information from WorldData.info, weโll add the continent of each country to
realwage_f with the merge function
The CSV file can be found in pandas_panel/countries.csv and can be downloaded
here
[5 rows x 17 columns]
First, weโll select just the country and continent variables from worlddata and rename the
column to โCountryโ
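The selection cell was lost; a sketch (the original column label 'Country (en)' is an assumption about the WorldData.info file) is:

worlddata = worlddata[['Country (en)', 'Continent']]
worlddata = worlddata.rename(columns={'Country (en)': 'Country'})
worlddata.head()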
In [14]: realwage_f.transpose().head()
Time 2016-01-01
Country
Australia 12.98
Belgium 9.76
Brazil 1.24
Canada 8.48
Chile 1.91
[5 rows x 11 columns]
We can use either left, right, inner, or outer join to merge our datasets:
We will also need to specify where the country name is located in each dataframe, which will
be the key that is used to merge the dataframes โonโ
Our โleftโ dataframe (realwage_f.transpose()) contains countries in the index, so we
set left_index=True
Our โrightโ dataframe (worlddata) contains countries in the โCountryโ column, so we set
right_on='Country'
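The merge cell itself was lost; a sketch following the description above is:

merged = pd.merge(realwage_f.transpose(), worlddata,
                  how='left', left_index=True, right_on='Country')
merged.head()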
[5 rows x 13 columns]
Countries that appeared in realwage_f but not in worlddata will have NaN in the Conti-
nent column
To check whether this has occurred, we can use .isnull() on the continent column and
filter the merged dataframe
In [16]: merged[merged['Continent'].isnull()]
[3 rows x 13 columns]
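The dictionary of manual fixes was lost; a sketch consistent with the mapped output below (Korea appears in the data, the other entries are assumptions) is:

missing_continents = {'Korea': 'Asia',
                      'Russian Federation': 'Europe',
                      'Slovak Republic': 'Europe'}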
merged['Country'].map(missing_continents)
Out[17]: 17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object
merged[merged['Country'] == 'Korea']
[1 rows x 13 columns]
We will also combine the Americas into a single continent - this will make our visualization
nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want
to replace
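The replacement loop was lost; a sketch (the list of American sub-continent labels is an assumption) is:

replace = ['Central America', 'North America', 'South America']
for country in replace:
    merged['Continent'].replace(to_replace=country, value='America', inplace=True)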
Now that we have all the data we want in a single DataFrame, we will reshape it back into
panel form with a MultiIndex
We should also ensure to sort the index using .sort_index() so that we can efficiently fil-
ter our dataframe later on
By default, levels will be sorted top-down
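A sketch of the lost reshaping cell is:

merged = merged.set_index(['Continent', 'Country']).sort_index()
merged.head()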
2015-01-01 2016-01-01
Continent Country
America Brazil 1.21 1.24
Canada 8.35 8.48
Chile 1.81 1.91
Colombia 1.13 1.12
Costa Rica 2.56 2.63
[5 rows x 11 columns]
While merging, we lost our DatetimeIndex, as we merged columns that were not in date-
time format
In [21]: merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex
using .to_datetime()
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and
transpose merged
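The corresponding cells were lost; a sketch is:

merged.columns = pd.to_datetime(merged.columns)
merged = merged.transpose()
merged.head()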
[5 rows x 32 columns]
17.5 Grouping and Summarizing Data

Grouping and summarizing data can be particularly useful for understanding large panel
datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as
.mean() or .max()
For example, we can calculate the average real minimum wage for each country over the pe-
riod 2006 to 2016 (the default is to aggregate over rows)
In [24]: merged.mean().head(10)
Using this series, we can plot the average real minimum wage over the past decade for each
country in our data set
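The plotting cell itself was lost; a minimal sketch consistent with the caption is the following, with the plt.show() below completing it:

merged.mean().sort_values(ascending=False).plot(
    kind='bar', title="Average real minimum wage 2006 - 2016")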
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum
wage for all countries over time)
In [26]: merged.mean(axis=1).head()
Out[26]: Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64
In [27]: merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over
We can plot the average minimum wages in each continent as a time series
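The corresponding cell was lost; a sketch, using the level keyword supported by the pandas versions of this era, is:

merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()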
In [31]: merged.stack().describe()
The groupby method achieves the first step of this process, creating a new
DataFrameGroupBy object with data split into groups
Letโs split merged by continent again, this time using the groupby function, and name the
resulting object grouped
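The grouping cell was lost; a sketch (assuming the Continent/Country MultiIndex sits in the column axis, as above) is:

grouped = merged.groupby(level='Continent', axis=1)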
Calling an aggregation method on the object applies the function to each group, the results of
which are combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using
.size()
In this case, our new data structure is a Series
In [33]: grouped.size()
Out[33]: Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel
density estimate of the distribution of real minimum wages in 2016 for each continent
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
This lecture has provided an introduction to some of pandasโ more advanced features, includ-
ing multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a python package that
extends pandas to N-dimensional data structures
17.7 Exercises
17.7.1 Exercise 1
In these exercises, youโll work with a dataset of employment rates in Europe by age and sex
from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the CSV file returns a panel dataset in long format. Use .pivot_table() to
construct a wide format dataframe with a MultiIndex in the columns
Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex
17.7.2 Exercise 2
Filter the above dataframe to only include employment as a percentage of โactive populationโ
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries
17.8 Solutions
17.8.1 Exercise 1
In [35]: employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/em
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are datetime format
employ.head()
UNIT
AGE
SEX
INDIC_EM
GEO United Kingdom
DATE
2007-01-01 4,131.00
2008-01-01 4,204.00
2009-01-01 4,193.00
2010-01-01 4,186.00
2011-01-01 4,164.00
This is a large dataset so it is useful to explore the levels and variables available
In [36]: employ.columns.names
17.8.2 Exercise 2
To easily filter by country, swap GEO to the top level and sort the MultiIndex
We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in
GEO that begin with โEuroโ
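The filtering code was lost; a sketch consistent with the description is:

# Swap GEO to the top level of the column MultiIndex and sort
employ.columns = employ.columns.swaplevel(0, -1)
employ = employ.sort_index(axis=1)

# Drop the EU-wide aggregates, whose GEO labels begin with 'Euro'
geo_list = employ.columns.get_level_values('GEO').unique().tolist()
countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]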
Select only percentage employed in the active population from the dataframe
GEO
AGE
SEX Total
DATE
2007-01-01 59.30
2008-01-01 59.80
2009-01-01 60.30
2010-01-01 60.00
2011-01-01 59.70
18
Linear Regression in Python

18.1 Contents
โข Overview 18.2
โข Endogeneity 18.5
โข Summary 18.6
โข Exercises 18.7
โข Solutions 18.8
In addition to whatโs in Anaconda, this lecture will need the following libraries
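The install cell itself is missing; presumably:

!pip install linearmodels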
18.2 Overview
Linear regression is a standard tool for analyzing the relationship between two or more vari-
ables
In this lecture, weโll use the Python package statsmodels to estimate, interpret, and visu-
alize linear regression models
Along the way, we'll discuss a variety of topics, including simple and multivariate linear regression, visualization, endogeneity and omitted variable bias, and two-stage least squares
As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal paper [3]
In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation
in institutional differences
Such variation is needed to determine whether it is institutions that give rise to greater eco-
nomic growth, rather than the other way around
18.2.1 Prerequisites
18.2.2 Comments
18.3 Simple Linear Regression

[3] wish to determine whether or not differences in institutions can help to explain observed
economic outcomes
How do we measure institutional differences and economic outcomes?
In this paper,
โข economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange
rates
โข institutional differences are proxied by an index of protection against expropriation on
average over 1985-95, constructed by the Political Risk Services Group
These variables and other data used in the paper are available for download on Daron Ace-
mogluโs webpage
We will use pandasโ .read_stata() function to read in data contained in the .dta files to
dataframes
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dt
df1.head()
Letโs use a scatterplot to see whether any obvious relationship exists between GDP per capita
and the protection against expropriation index
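The scatterplot cell did not survive extraction; a minimal sketch is:

df1.plot(x='avexpr', y='logpgp95', kind='scatter')
plt.show()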
The plot shows a fairly strong positive relationship between protection against expropriation
and log GDP per capita
Specifically, if higher protection against expropriation is a measure of institutional quality,
then better institutions appear to be positively correlated with better economic outcomes
(higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable
assumption
We can write our model as
logpgp95_i = β_0 + β_1 avexpr_i + u_i

where:
• β_0 is the intercept of the linear trend line on the y-axis
• β_1 is the slope of the linear trend line, representing the marginal effect of protection
against risk on log GDP per capita
• u_i is a random error term (deviations of observations from the linear trend due to factors not included in the model)
Visually, this linear model involves choosing a straight line that best fits the data, as in the
following plot (Figure 2 in [3])
X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']
plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()
The most common technique to estimate the parameters (๐ฝโs) of the linear model is Ordinary
Least Squares (OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the
sum of squared residuals, ie.
min_β̂  Σ_{i=1}^{N} û_i²
where ๐ขฬ๐ is the difference between the observation and the predicted value of the dependent
variable
To estimate the constant term ๐ฝ0 , we need to add a column of 1โs to our dataset (consider
the equation if ๐ฝ0 was replaced with ๐ฝ0 ๐ฅ๐ and ๐ฅ๐ = 1)
In [5]: df1['const'] = 1
Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels, however standard arrays can also be
used as arguments
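The model-construction cells were lost in extraction; a sketch consistent with the outputs below is:

import statsmodels.api as sm

reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']],
              missing='drop')
type(reg1)

results = reg1.fit()
type(results)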
Out[6]: statsmodels.regression.linear_model.OLS
Out[7]: statsmodels.regression.linear_model.RegressionResultsWrapper
In [8]: print(results.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Using our parameter estimates, we can now write our estimated relationship as
logpgp95̂_i = 4.63 + 0.53 avexpr_i
This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of
expropriation protection
For example, for a country with an index value of 7.07 (the average for the dataset), we find
that their predicted level of log GDP per capita in 1995 is 8.38
Out[9]: 6.515625
Out[10]: 8.3771
An easier (and more accurate) way to obtain this result is to use .predict() and set constant = 1 and avexpr_i = mean_expr
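The corresponding cell was lost; a sketch (where mean_expr is presumably the sample mean of avexpr computed in the cells above) is:

results.predict(exog=[1, mean_expr])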
Out[11]: array([8.09156367])
We can obtain an array of predicted logpgp95_i for every value of avexpr_i in our dataset by calling .predict() on our results
Plotting the predicted values against avexpr_i shows that the predicted values lie along the linear line that we fitted above
The observed values of logpgp95_i are also plotted for comparison purposes
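The plotting cell was lost; a sketch consistent with the legend and labels below is (df1_plot drops rows with missing observations so that its length matches results.predict()):

df1_plot = df1.dropna(subset=['logpgp95', 'avexpr'])

fig, ax = plt.subplots()
ax.scatter(df1_plot['avexpr'], results.predict(), alpha=0.5, label='predicted')
ax.scatter(df1_plot['avexpr'], df1_plot['logpgp95'], alpha=0.5, label='observed')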
plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()
So far we have only accounted for institutions affecting economic performance - almost cer-
tainly there are numerous other factors affecting GDP that are not included in our model
Leaving out variables that affect logpgp95_i will result in omitted variable bias, yielding biased and inconsistent parameter estimates
We can extend our bivariate regression model to a multivariate regression model by adding in other factors that may affect logpgp95_i
[3] consider other factors such as:
Letโs estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta
Now that we have fitted our model, we will use summary_col to display the results in a sin-
gle table (model numbers correspond to those in the paper)
results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])
print(results_table)
(0.49) (0.45)
asia -0.15
(0.15)
africa -0.92***
(0.17)
other 0.30
(0.37)
R-squared 0.61 0.62 0.72
No. observations 111 111 111
=========================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
18.5 Endogeneity
As [3] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and
inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which
is an extension of OLS regression
This method requires replacing the endogenous variable avexpr_i with a variable that is:

1. correlated with avexpr_i
2. not correlated with the error term (ie. it should not directly affect the dependent variable, otherwise it would be correlated with u_i owing to omitted variable bias)

The new set of regressors is called an instrument, which aims to remove endogeneity in our proxy of institutional differences
The main contribution of [3] is the use of settler mortality rates to instrument for institu-
tional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of insti-
tutions that were more extractive in nature (less protection against expropriation), and these
institutions still persist today
Using a scatterplot (Figure 3 in [3]), we can see protection against expropriation is negatively
correlated with settler mortality rates, coinciding with the authorsโ hypothesis and satisfying
the first condition of a valid instrument
X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']
plt.scatter(X, y, marker='')
plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()
The second condition may not be satisfied if settler mortality rates in the 17th to 19th cen-
turies have a direct effect on current GDP (in addition to their indirect effect through institu-
tions)
For example, settler mortality rates may be related to the current disease environment in a
country, which could affect current economic performance
[3] argue this is unlikely because:
โข The majority of settler deaths were due to malaria and yellow fever and had a limited
effect on local people
โข The disease burden on local people in Africa or India, for example, did not appear to
be higher than average, supported by relatively high population densities in these areas
before colonization
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and
unbiased parameter estimates
First stage
The first stage involves regressing the endogenous variable (avexpr_i) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates logem4_i
Therefore, we will estimate the first-stage regression as

avexpr_i = δ_0 + δ_1 logem4_i + v_i
The data we need to estimate this equation is located in maketable4.dta (only complete
data, indicated by baseco = 1, is used for estimation)
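The first-stage estimation cell was lost; a sketch (with a hypothetical local path for maketable4.dta, and assuming statsmodels is imported as sm) is:

df4 = pd.read_stata('maketable4.dta')
df4 = df4[df4['baseco'] == 1]      # keep only complete data
df4['const'] = 1

results_fs = sm.OLS(df4['avexpr'],
                    df4[['const', 'logem4']],
                    missing='drop').fit()
print(results_fs.summary())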
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Second stage
We need to retrieve the predicted values of avexpr_i using .predict()
We then replace the endogenous variable avexpr_i with the predicted values avexpr̂_i in the original linear model
Our second stage regression is thus

logpgp95_i = β_0 + β_1 avexpr̂_i + u_i
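The cell storing the first-stage fitted values was lost; presumably, using results_fs from the sketch above:

df4['predicted_avexpr'] = results_fs.predict()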
results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The second-stage regression results give us an unbiased and consistent estimate of the effect
of institutions on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not and for this
reason, computing 2SLS โmanuallyโ (in stages with OLS) is not recommended
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an
extension of statsmodels
Note that when using IV2SLS, the exogenous and instrument variables are split up in the
function arguments (whereas before the instrument included exogenous variables)
In [19]: iv = IV2SLS(dependent=df4['logpgp95'],
exog=df4['const'],
endog=df4['avexpr'],
instruments=df4['logem4']).fit(cov_type='unadjusted')
print(iv.summary)
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================
Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False
Given that we now have consistent and unbiased estimates, we can infer from the model we
have estimated that institutional differences (stemming from institutions set up during colo-
nization) can help to explain differences in income levels across countries today
[3] use a marginal effect of 0.94 to calculate that the difference in the index between Chile
and Nigeria (ie. institutional quality) implies up to a 7-fold difference in income, emphasizing
the significance of institutions in economic development
18.6 Summary
We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmod-
els
If you are familiar with R, you may want to use the formula interface to statsmodels, or
consider using r2py to call R from within Python
18.7 Exercises
18.7.1 Exercise 1
In the lecture, we think the original model suffers from endogeneity bias due to the likely ef-
fect income has on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can
formally test for endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, avexpr_i, and the errors, u_i

In the first stage, we regress avexpr_i on the instrument, logem4_i

avexpr_i = π_0 + π_1 logem4_i + υ_i

Second, we retrieve the residuals υ̂_i and include them in the original equation

logpgp95_i = β_0 + β_1 avexpr_i + α υ̂_i + u_i

If α is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and conclude that avexpr_i is endogenous
Using the above information, estimate a Hausman test and interpret your results
18.7.2 Exercise 2
The OLS parameter ๐ฝ can also be estimated using matrix algebra and numpy (you may need
to review the numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)
y = Xβ + u

To solve for the unknown parameter β, we want to minimize the sum of squared residuals

min_β̂  û'û

Rearranging the first equation and substituting into the second equation, we can write

min_β̂  (y − Xβ̂)'(y − Xβ̂)

Solving this optimization problem gives the solution for the β̂ coefficients

β̂ = (X'X)⁻¹ X'y
Using the above information, compute ๐ฝ ฬ from model 1 using numpy - your results should be
the same as those in the statsmodels output from earlier in the lecture
18.8 Solutions
18.8.1 Exercise 1
In [20]: # Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.d
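# (Reconstruction of the lost Hausman-test steps; reg2 below is the second
#  regression whose summary is printed next.)
df4 = df4[df4['baseco'] == 1]      # restrict to the base sample
df4['const'] = 1

# First stage: regress avexpr on the instrument logem4
reg1 = sm.OLS(df4['avexpr'], df4[['const', 'logem4']], missing='drop').fit()

# Retrieve the residuals and include them in the original equation
df4['resid'] = reg1.resid
reg2 = sm.OLS(df4['logpgp95'],
              df4[['const', 'avexpr', 'resid']],
              missing='drop').fit()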
print(reg2.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The output shows that the coefficient on the residuals is statistically significant, indicating avexpr_i is endogenous
18.8.2 Exercise 2
In [21]: # Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.d
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])
# Define the X and y variables (const was added to df1 earlier)
y = df1['logpgp95']
X = df1[['const', 'avexpr']]

# Compute β_hat
β_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(f'β_0 = {β_hat[0]:.2}')
print(f'β_1 = {β_hat[1]:.2}')

β_0 = 4.6
β_1 = 0.53
19
Maximum Likelihood Estimation

19.1 Contents
โข Overview 19.2
โข Summary 19.8
โข Exercises 19.9
โข Solutions 19.10
19.2 Overview
In a previous lecture, we estimated the relationship between dependent and explanatory vari-
ables using linear regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a
class of distributions, indexed by unknown parameters, and then using the data to pin down
these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic
relationships between variables
Here we illustrate maximum likelihood by replicating Daniel Treismanโs (2016) paper, Rus-
siaโs Billionaires, which connects the number of billionaires in a country to its economic char-
acteristics
The paper concludes that Russia has a higher number of billionaires than economic factors
such as market size and tax rate predict
19.2.1 Prerequisites
19.2.2 Comments
Letโs consider the steps we need to go through in maximum likelihood estimation and how
they pertain to this study
The first step with maximum likelihood estimation is to choose the probability distribution
believed to be generating the data
More precisely, we need to make an assumption as to which parametric class of distributions
is generating the data
โข e.g., the class of all normal distributions, or the class of all gamma distributions
โข e.g., the class of normal distributions is a family of distributions indexed by its mean
๐ โ (โโ, โ) and standard deviation ๐ โ (0, โ)
Weโll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates
In Treisman's paper, the dependent variable, the number of billionaires y in a country, is treated as following a Poisson distribution with pmf

f(y) = (μ^y / y!) e^{−μ},    y = 0, 1, 2, ..., ∞
We can plot the Poisson distribution over y for different values of μ as follows
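The cell defining poisson_pmf and producing the plot was lost in extraction; a sketch consistent with the surviving tail below (and with the later reuse of poisson_pmf), assuming numpy and matplotlib are imported as np and plt, is:

from numpy import exp
from scipy.special import factorial

poisson_pmf = lambda y, μ: μ**y / factorial(y) * exp(-μ)
y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))

# Plot the pmf for a few illustrative values of μ
for μ in [1, 5, 10]:
    distribution = []
    for y_i in y_values:
        distribution.append(poisson_pmf(y_i, μ))
    ax.plot(y_values, distribution, label=f'$\mu$={μ}',
            marker='o', markersize=8, alpha=0.5)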
ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)
plt.show()
Notice that the Poisson distribution begins to resemble a normal distribution as the mean of
๐ฆ increases
Letโs have a look at the distribution of the data weโll be working with in this lecture
Treismanโs main source of data is Forbesโ annual rankings of billionaires and their estimated
net worth
The dataset mle/fp.dta can be downloaded here or from its AER page
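The read-in cell is missing; presumably something like the following, with a hypothetical local path:

df = pd.read_stata('fp.dta')
df.head()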
[5 rows x 36 columns]
Using a histogram, we can view the distribution of the number of billionaires per country,
numbil0, in 2008 (the United States is dropped for plotting purposes)
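The cell constructing numbil0_2008 did not survive extraction; a sketch (the column names year and country are assumptions based on the text) is:

numbil0_2008 = df[(df['year'] == 2008) & (
    df['country'] != 'United States')].loc[:, 'numbil0']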
plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(xmin=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()
/home/anju/anaconda3/lib/python3.7/site-packages/matplotlib/axes/_base.py:3215: MatplotlibDeprecationWarning:
The `xmin` argument was deprecated in Matplotlib 3.0 and will be removed in 3.2. Use `left` instead.
alternative='`left`', obj_type='argument')
From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with
a very low ๐ and some outliers)
19.4 Conditional Distributions

In Treisman's paper, the number of billionaires y_i in country i is modeled as conditionally Poisson given country characteristics x_i, with conditional pmf

f(y_i | x_i) = (μ_i^{y_i} / y_i!) e^{−μ_i};    y_i = 0, 1, 2, ..., ∞.    (1)

where the conditional mean is μ_i = exp(x_i'β)
To illustrate the idea that the distribution of ๐ฆ๐ depends on x๐ letโs run a simple simulation
We use our poisson_pmf function from above and arbitrary values for ๐ฝ and x๐
for X in datasets:
ฮผ = exp(X @ ฮฒ)
distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, ฮผ))
ax.plot(y_values,
distribution,
label=f'$\mu_i$={ฮผ:.1}',
marker='o',
markersize=8,
alpha=0.5)
ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()
In our model for number of billionaires, the conditional distribution contains 4 (k = 4) parameters that we need to estimate
We will label our entire parameter vector as β where

β = [β_0, β_1, β_2, β_3]'

To estimate the model using MLE, we want to maximize the likelihood that our estimate β̂ is the true parameter β
Intuitively, we want to find the β̂ that best fits our data
First, we need to construct the likelihood function L(β), which is similar to a joint probability density function
Assume we have some data y_i = {y_1, y_2} and y_i ~ f(y_i)
If y_1 and y_2 are independent, the joint pmf of these data is f(y_1, y_2) = f(y_1) ⋅ f(y_2)
If y_i follows a Poisson distribution with μ = 7, we can visualize the joint pmf like so

plot_joint_poisson(μ=7, y_n=20)
Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribu-
tion) can be written as
f(y_1, y_2, ..., y_n | x_1, x_2, ..., x_n; β) = ∏_{i=1}^{n} (μ_i^{y_i} / y_i!) e^{−μ_i}

The likelihood function is this same joint pmf, viewed instead as a function of the unknown parameter β given the data

L(β | y_1, y_2, ..., y_n; x_1, x_2, ..., x_n) = ∏_{i=1}^{n} (μ_i^{y_i} / y_i!) e^{−μ_i}
                                             = f(y_1, y_2, ..., y_n | x_1, x_2, ..., x_n; β)
Now that we have our likelihood function, we want to find the β̂ that yields the maximum likelihood value

max_β L(β)

It is more convenient to work with the log of the likelihood, which turns the product into a sum (and does not change the maximizer). The MLE of the Poisson regression model for β̂ can thus be obtained by solving

max_β ( Σ_{i=1}^{n} y_i log μ_i − Σ_{i=1}^{n} μ_i − Σ_{i=1}^{n} log y_i! )
19.6 MLE with Numerical Methods

However, no analytical solution exists to the above problem, so to find the MLE we need to use numerical methods
Many distributions do not have nice, analytical solutions and therefore require numerical
methods to solve for parameter estimates
One such numerical method is the Newton-Raphson algorithm
Our goal is to find the maximum likelihood estimate β̂
At β̂, the first derivative of the log-likelihood function will be equal to 0
Let's illustrate this by supposing

log L(β) = −(β − 10)² − 10
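The plotting cell was lost; a sketch consistent with the axis labels that follow is:

logL = lambda β: -(β - 10) ** 2 - 10
dlogL = lambda β: -2 * (β - 10)

β = np.linspace(1, 20)

fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))

ax1.plot(β, logL(β), lw=2)
ax2.plot(β, dlogL(β), lw=2)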
ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()
The plot shows that the maximum likelihood value (the top plot) occurs where d log L(β)/dβ = 0 (the bottom plot)
Therefore, the likelihood is maximized when β = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking
that the second derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, β^(0) (the OLS parameter estimates might be a reasonable guess), then iterate according to the updating rule

β^(k+1) = β^(k) − H⁻¹(β^(k)) G(β^(k))

where:

• G(β^(k)) is the gradient (vector of first derivatives) of the log-likelihood, evaluated at β^(k)
• H(β^(k)) is the Hessian (matrix of second derivatives) of the log-likelihood, evaluated at β^(k)
As can be seen from the updating equation, β^(k+1) = β^(k) only when G(β^(k)) = 0, ie. where the first derivative is equal to 0
(In practice, we stop iterating when the difference is below a small tolerance threshold)
Letโs have a go at implementing the Newton-Raphson algorithm
First, weโll create a class called PoissonRegression so we can easily recompute the values
of the log likelihood, gradient and Hessian for every iteration
class PoissonRegression:

    # (The class header and __init__ are reconstructed; y and β are stored as
    #  column vectors, consistent with the methods below. factorial is from
    #  scipy.special, imported earlier.)
    def __init__(self, y, X, β):
        self.X = X
        self.n, self.k = X.shape
        self.y = y.reshape(self.n, 1)
        self.β = β.reshape(self.k, 1)

    def μ(self):
        return np.exp(self.X @ self.β)

    def logL(self):
        y = self.y
        μ = self.μ()
        return np.sum(y * np.log(μ) - μ - np.log(factorial(y)))

    def G(self):
        y = self.y
        μ = self.μ()
        return self.X.T @ (y - μ)

    def H(self):
        X = self.X
        μ = self.μ()
        return -(X.T @ (μ * X))
Our function newton_raphson will take a PoissonRegression object that has an initial
guess of the parameter vector ๐ฝ 0
The algorithm will update the parameter vector according to the updating rule, and recalcu-
late the gradient and Hessian matrices at the new parameter estimates
Iteration will end when either:
โข The difference between the parameter and the updated parameter is below a tolerance
level
โข The maximum number of iterations has been achieved (meaning convergence is not
achieved)
So we can get an idea of whatโs going on while the algorithm is running, an option dis-
play=True is added to print out values at each iteration
def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):
    # (The function signature, header printing and while loop are reconstructed;
    #  the iteration printing and return statement are from the original.)

    i = 0
    error = 100  # Initial error value

    # Print header of output
    if display:
        header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
        print(header)
        print("-" * len(header))

    # Iterate until the change in β is below the tolerance, or the maximum
    # number of iterations is reached
    while np.any(error > tol) and i < max_iter:
        H, G = model.H(), model.G()
        β_new = model.β - (np.linalg.inv(H) @ G)
        error = β_new - model.β
        model.β = β_new

        # Print iterations
        if display:
            β_list = [f'{t:.3}' for t in list(model.β.flatten())]
            update = f'{i:<13}{model.logL():<16.8}{β_list}'
            print(update)

        i += 1

    print(f'Number of iterations: {i}')
    print(f'β_hat = {model.β.flatten()}')

    return model.β.flatten()  # Return a flat array for β (instead of a k_by_1 column vector)
Letโs try out our algorithm with a small dataset of 5 observations and 3 variables in X
y = np.array([1, 0, 1, 1, 0])
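# (Reconstruction: the X matrix matches the data reused with statsmodels later
#  in the lecture; the initial guess and the call itself are a sketch.)
X = np.array([[1, 2, 5],
              [1, 1, 3],
              [1, 4, 2],
              [1, 5, 2],
              [1, 3, 1]])

# Take a guess at initial βs
init_β = np.array([0.1, 0.1, 0.1])

# Create an object with Poisson model values
poi = PoissonRegression(y, X, β=init_β)

# Use newton_raphson to find the MLE
β_hat = newton_raphson(poi, display=True)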
Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -4.3447622 ['-1.49', '0.265', '0.244']
1 -3.5742413 ['-3.38', '0.528', '0.474']
2 -3.3999526 ['-5.06', '0.782', '0.702']
3 -3.3788646 ['-5.92', '0.909', '0.82']
4 -3.3783559 ['-6.07', '0.933', '0.843']
5 -3.3783555 ['-6.08', '0.933', '0.843']
Number of iterations: 6
ฮฒ_hat = [-6.07848205 0.93340226 0.84329625]
As this was a simple model with few observations, the algorithm achieved convergence in only
6 iterations
You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm
has worked to achieve
Also, note that the increase in log L(β^(k)) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the numerator in our updating equation is becoming smaller
The gradient vector should be close to 0 at β̂
In [10]: poi.G()
Out[10]: array([[-3.95169228e-07],
[-1.00114805e-06],
[-7.73114562e-07]])
The iterative process can be visualized in the following diagram, where the maximum is found at β = 10

β = np.linspace(2, 18)
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')
Note that our implementation of the Newton-Raphson algorithm is rather basic โ for more
robust implementations see, for example, scipy.optimize
Now that we know whatโs going on under the hood, we can apply MLE to an interesting ap-
plication
Weโll use the Poisson regression model in statsmodels to obtain a richer output with stan-
dard errors, test values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, letโs re-estimate our simple model with statsmodels to confirm we obtain
the same coefficients and log-likelihood value
X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])
y = np.array([1, 0, 1, 1, 0])
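The re-estimation cell itself did not survive extraction; a sketch, assuming statsmodels is imported as sm, is:

import statsmodels.api as sm

stats_poisson = sm.Poisson(y, X).fit()
print(stats_poisson.summary())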
Now letโs replicate results from Daniel Treismanโs paper, Russiaโs Billionaires, mentioned ear-
lier in the lecture
Treisman starts by estimating equation Eq. (1), where:
• y_i is number of billionaires_i
• x_i1 is log GDP per capita_i
• x_i2 is log population_i
• x_i3 is years in GATT_i, years of membership in GATT and WTO (to proxy access to international markets)
# Add a constant
df['const'] = 1
# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
reg2 = ['const', 'lngdppc', 'lnpop',
'gattwto08', 'lnmcap08', 'rintr', 'topint08']
reg3 = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
'rintr', 'topint08', 'nrrents', 'roflaw']
Then we can use the Poisson function from statsmodels to fit the model
Weโll use robust standard errors as in the authorโs paper
# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())
results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)
(0.010) (0.010)
topint08 -0.051***-0.058***
(0.011) (0.012)
nrrents -0.005
(0.010)
roflaw 0.203
(0.372)
Pseudo R-squared 0.86 0.90 0.90
No. observations 197 131 131
=================================================
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
The output suggests that the frequency of billionaires is positively correlated with GDP
per capita, population size, stock market capitalization, and negatively correlated with top
marginal income tax rate
To analyze our results by country, we can plot the difference between the predicted and actual values, then sort from highest to lowest and plot the first 15
# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']
As we can see, Russia has by far the highest number of billionaires in excess of what is pre-
dicted by the model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russiaโs excess of billion-
aires, including the origination of wealth in Russia, the political climate, and the history of
privatization in the years after the USSR
19.8 Summary
19.9 Exercises
19.9.1 Exercise 1
Suppose we wanted to estimate the probability of an event $y_i$ occurring, given some observations

Following the probit model, assume the probability mass function of $y_i$ is

$$f(y_i; \boldsymbol{\beta}) = \mu_i^{y_i} (1 - \mu_i)^{1 - y_i}, \quad y_i = 0, 1$$

where $\mu_i = \Phi(\mathbf{x}_i' \boldsymbol{\beta})$

$\Phi$ represents the cumulative normal distribution and constrains the predicted $y_i$ to be between 0 and 1 (as required for a probability)

$\boldsymbol{\beta}$ is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal distribution
19.9.2 Exercise 2
Use the following dataset and initial values of ๐ฝ to estimate the MLE with the Newton-
Raphson algorithm developed earlier in the lecture
$$X = \begin{bmatrix} 1 & 2 & 4 \\ 1 & 1 & 1 \\ 1 & 4 & 3 \\ 1 & 5 & 6 \\ 1 & 3 & 5 \end{bmatrix} \qquad y = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 0 \end{bmatrix} \qquad \beta^{(0)} = \begin{bmatrix} 0.1 \\ 0.1 \\ 0.1 \end{bmatrix}$$
Verify your results with statsmodels - you can import the Probit function with the follow-
ing import statement
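The import statement, from statsmodels' discrete-model module, is

from statsmodels.discrete.discrete_model import Probit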
Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to
initial values, and therefore you may fail to achieve convergence with different starting values
19.10 Solutions
19.10.1 Exercise 1

The log-likelihood for the Probit model is

$$\log \mathcal{L} = \sum_{i=1}^{n} \left[ y_i \log \Phi(\mathbf{x}_i' \boldsymbol{\beta}) + (1 - y_i) \log (1 - \Phi(\mathbf{x}_i' \boldsymbol{\beta})) \right]$$

Using the fact that the derivative of the normal cdf is the normal pdf

$$\frac{d}{ds} \Phi(s) = \phi(s)$$
the gradient and Hessian follow as

$$\frac{\partial \log \mathcal{L}}{\partial \boldsymbol{\beta}} = \sum_{i=1}^{n} \left[ y_i \frac{\phi(\mathbf{x}_i' \boldsymbol{\beta})}{\Phi(\mathbf{x}_i' \boldsymbol{\beta})} - (1 - y_i) \frac{\phi(\mathbf{x}_i' \boldsymbol{\beta})}{1 - \Phi(\mathbf{x}_i' \boldsymbol{\beta})} \right] \mathbf{x}_i$$

$$\frac{\partial^2 \log \mathcal{L}}{\partial \boldsymbol{\beta} \partial \boldsymbol{\beta}'} = - \sum_{i=1}^{n} \phi(\mathbf{x}_i' \boldsymbol{\beta}) \left[ y_i \frac{\phi(\mathbf{x}_i' \boldsymbol{\beta}) + \mathbf{x}_i' \boldsymbol{\beta}\, \Phi(\mathbf{x}_i' \boldsymbol{\beta})}{[\Phi(\mathbf{x}_i' \boldsymbol{\beta})]^2} + (1 - y_i) \frac{\phi(\mathbf{x}_i' \boldsymbol{\beta}) - \mathbf{x}_i' \boldsymbol{\beta}\, (1 - \Phi(\mathbf{x}_i' \boldsymbol{\beta}))}{[1 - \Phi(\mathbf{x}_i' \boldsymbol{\beta})]^2} \right] \mathbf{x}_i \mathbf{x}_i'$$
Using these results, we can write a class for the Probit model as follows
class ProbitRegression:

    def __init__(self, y, X, β):
        self.y, self.X, self.β = y, X, β
        self.n, self.k = X.shape

    def μ(self):
        # Predicted probabilities Φ(x'β)
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        # Normal density φ(x'β)
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        # Log-likelihood
        μ = self.μ()
        return np.sum(self.y * np.log(μ) + (1 - self.y) * np.log(1 - μ))

    def G(self):
        # Gradient vector
        y, X = self.y, self.X
        μ = self.μ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / μ - X.T * (1 - y) * ϕ / (1 - μ)), axis=1)

    def H(self):
        # Hessian matrix
        y, X, β = self.y, self.X, self.β
        μ = self.μ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * μ) / μ**2
        b = (ϕ - (X @ β.T) * (1 - μ)) / (1 - μ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
19.10.2 Exercise 2
In [19]: X = np.array([[1, 2, 4],
[1, 1, 1],
[1, 4, 3],
[1, 5, 6],
[1, 3, 5]])
y = np.array([1, 0, 1, 1, 0])
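The iteration log below comes from applying the Newton-Raphson routine developed earlier in the lecture to this dataset; a minimal sketch of the call (assuming that routine and the ProbitRegression class above are in scope) is

β = np.array([0.1, 0.1, 0.1])
prob = ProbitRegression(y, X, β)
β_hat = newton_raphson(prob)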
Iteration_k Log-likelihood ฮธ
-----------------------------------------------------------------------------------------
0 -2.3796884 ['-1.34', '0.775', '-0.157']
1 -2.3687526 ['-1.53', '0.775', '-0.0981']
2 -2.3687294 ['-1.55', '0.778', '-0.0971']
3 -2.3687294 ['-1.55', '0.778', '-0.0971']
Number of iterations: 4
ฮฒ_hat = [-1.54625858 0.77778952 -0.09709757]
print(Probit(y, X).fit().summary())
20
Geometric Series for Elementary Economics
20.1 Contents
โข Overview 20.2
โข Key Formulas 20.3
โข Example: The Money Multiplier in Fractional Reserve Banking 20.4
โข Example: The Keynesian Multiplier 20.5
โข Example: Interest Rates and Present Values 20.6
โข Back to the Keynesian Multiplier 20.7
20.2 Overview
The lecture describes important ideas in economics that use the mathematics of geometric
series
Among these are

• the Keynesian multiplier
• the money multiplier associated with fractional reserve banking
• interest rates and present values of streams of payouts from assets
(As we shall see below, the term multiplier comes down to meaning sum of a convergent
geometric series)
These and other applications prove the truth of the wise crack that
We begin with the infinite geometric series

$$1 + c + c^2 + c^3 + \cdots$$

For $c \in (-1, 1)$, its sum is given by the key formula

$$1 + c + c^2 + c^3 + \cdots = \frac{1}{1 - c} \tag{1}$$
To prove key formula Eq. (1), multiply both sides by (1 โ ๐) and verify that if ๐ โ (โ1, 1),
then the outcome is the equation 1 = 1
The sum of the finite geometric series

$$1 + c + c^2 + c^3 + \cdots + c^T$$

is

$$1 + c + c^2 + c^3 + \cdots + c^T = \frac{1 - c^{T+1}}{1 - c}$$
Remark: The above formula works for any value of the scalar ๐. We donโt have to restrict ๐
to be in the set (โ1, 1)
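As a quick sanity check (not part of the original lecture text), we can verify the finite-sum formula numerically for arbitrary values of $c$ and $T$:

import numpy as np

c, T = 0.9, 10
lhs = sum(c**k for k in range(T + 1))   # direct sum of the series
rhs = (1 - c**(T + 1)) / (1 - c)        # closed-form expression
print(np.isclose(lhs, rhs))             # True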
We now move on to describe some famous economic applications of geometric series
20.4 Example: The Money Multiplier in Fractional Reserve Banking

In a fractional reserve banking system, banks hold only a fraction $r \in (0, 1)$ of cash behind each deposit receipt that they issue
โข In recent times
โ cash consists of pieces of paper issued by the government and called dollars or
pounds or โฆ
โ a deposit is a balance in a checking or savings account that entitles the owner to
ask the bank for immediate payment in cash
โข When the UK and France and the US were on either a gold or silver standard (before
1914, for example)
Economists and financiers often define the supply of money as an economy-wide sum of
cash plus deposits
In a fractional reserve banking system (one in which the reserve ratio $r$ satisfies $0 < r < 1$), banks create money by issuing deposits backed by fractional reserves plus loans that they make to their customers
A geometric series is a key tool for understanding how banks create money (i.e., deposits) in
a fractional reserve system
The geometric series formula Eq. (1) is at the heart of the classic model of the money creation process, one that leads us to the celebrated money multiplier

Bank $i$'s balance sheet satisfies

$$L_i + R_i = D_i$$

The left side of the above equation is the sum of the bank's assets, namely, the loans $L_i$ it has outstanding plus its reserves of cash $R_i$

The right side records bank $i$'s liabilities, namely, the deposits $D_i$ held by its depositors; these are IOU's from the bank to its depositors in the form of either checking accounts or savings accounts (or before 1914, bank notes issued by a bank stating promises to redeem notes for gold or silver on demand)
Each bank $i$ sets its reserves to satisfy the equation
๐ ๐ = ๐๐ท๐ (2)
โข the reserve ratio is either set by a government or chosen by banks for precautionary rea-
sons
Next we add a theory stating that bank ๐ + 1โs deposits depend entirely on loans made by
bank ๐, namely
๐ท๐+1 = ๐ฟ๐ (3)
Thus, we can think of the banks as being arranged along a line with loans from bank ๐ being
immediately deposited in ๐ + 1
Finally, we add an initial condition about an exogenous level of bank 0โs deposits
๐ท0 is given exogenously
We can think of ๐ท0 as being the amount of cash that a first depositor put into the first bank
in the system, bank number ๐ = 0
Now we do a little algebra
Combining equations Eq. (2) and Eq. (3) tells us that
๐ฟ๐ = (1 โ ๐)๐ท๐ (4)
This states that bank ๐ loans a fraction (1 โ ๐) of its deposits and keeps a fraction ๐ as cash
reserves
Combining equation Eq. (4) with equation Eq. (3) tells us that

$$D_{i+1} = (1 - r) D_i \quad \text{for } i \geq 0, \quad \text{which implies} \quad D_i = (1 - r)^i D_0 \tag{5}$$
Equation Eq. (5) expresses $D_i$ as the $i$-th term in the product of $D_0$ and the geometric series

$$1, \; (1 - r), \; (1 - r)^2, \; \cdots$$

Therefore, the sum of all deposits in the banking system equals

$$\sum_{i=0}^{\infty} (1 - r)^i D_0 = \frac{D_0}{1 - (1 - r)} = \frac{D_0}{r} \tag{6}$$
The money multiplier is a number that tells the multiplicative factor by which an exoge-
nous injection of cash into bank 0 leads to an increase in the total deposits in the banking
system
Equation Eq. (6) asserts that the money multiplier is $\frac{1}{r}$
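A small simulation (a sketch, with an assumed reserve ratio and initial deposit) confirms that total deposits created across banks approach $\frac{D_0}{r}$:

import numpy as np

r, D0 = 0.1, 100
deposits = D0 * (1 - r) ** np.arange(200)   # D_i = (1 - r)^i D_0 across banks
print(deposits.sum())                        # ≈ 1000
print(D0 / r)                                # money multiplier prediction: 1000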
• an initial deposit of cash of $D_0$ in bank $0$ leads the banking system to create total deposits of $\frac{D_0}{r}$
• the initial deposit $D_0$ is held as reserves, distributed throughout the banking system according to $D_0 = \sum_{i=0}^{\infty} R_i$

20.5 Example: The Keynesian Multiplier
The famous economist John Maynard Keynes and his followers created a simple model in-
tended to determine national income ๐ฆ in circumstances in which
โข there are substantial unemployed resources, in particular excess supply of labor and
capital
โข prices and interest rates fail to adjust to make aggregate supply equal demand (e.g.,
prices and interest rates are frozen)
โข national income is entirely determined by aggregate demand
๐+๐ = ๐ฆ
The second equation is a Keynesian consumption function asserting that people consume a
fraction ๐ โ (0, 1) of their income:
๐ = ๐๐ฆ
Combining the two equations gives

$$y = \frac{1}{1 - b} i$$

The quantity $\frac{1}{1 - b}$ is called the investment multiplier or simply the multiplier
Applying the formula for the sum of an infinite geometric series, we can write the above equation as

$$y = i \sum_{t=0}^{\infty} b^t \quad \text{where} \quad \frac{1}{1 - b} = \sum_{t=0}^{\infty} b^t$$

The expression $\sum_{t=0}^{\infty} b^t$ motivates an interpretation of the multiplier as the outcome of a dynamic process that we describe next
We arrive at a dynamic version by interpreting the nonnegative integer ๐ก as indexing time and
changing our specification of the consumption function to take time into account
๐๐ก = ๐๐ฆ๐กโ1
so that ๐ is the marginal propensity to consume (now) out of last periodโs income
We begin with an initial condition stating that
๐ฆโ1 = 0
๐๐ก = ๐ for all ๐ก โฅ 0
๐ฆ0 = ๐ + ๐0 = ๐ + ๐๐ฆโ1 = ๐
and
๐ฆ1 = ๐1 + ๐ = ๐๐ฆ0 + ๐ = (1 + ๐)๐
and

$$y_2 = c_2 + i = b y_1 + i = (1 + b + b^2) i$$

More generally,

$$y_t = b y_{t-1} + i = (1 + b + b^2 + \cdots + b^t) i$$

or

$$y_t = \frac{1 - b^{t+1}}{1 - b} i$$

Evidently, as $t \to +\infty$,

$$y_t \to \frac{1}{1 - b} i$$
Remark 1: The above formula is often applied to assert that an exogenous increase in investment of $\Delta i$ at time 0 ignites a dynamic process of increases in national income by amounts

$$\Delta i, \; (1 + b)\Delta i, \; (1 + b + b^2)\Delta i, \; \cdots$$

at times 0, 1, 2, …
Remark 2: Let $g_t$ be an exogenous sequence of government expenditures

If we generalize the model so that the national income identity becomes

$$c_t + i_t + g_t = y_t$$

then a version of the preceding argument shows that the government expenditures multiplier is also $\frac{1}{1 - b}$, so that a permanent increase in government expenditures ultimately leads to an increase in national income equal to the multiplier times the increase in government expenditures
20.6 Example: Interest Rates and Present Values

We can apply our formula for geometric series to study how interest rates affect values of streams of dollar payments that extend over time
We work in discrete time and assume that ๐ก = 0, 1, 2, โฆ indexes time
We let $r \in (0, 1)$ be a one-period net nominal interest rate, so that the gross nominal interest rate is

$$R = 1 + r \in (1, 2)$$

Remark: The gross nominal interest rate $R$ is an exchange rate or relative price of dollars between times $t$ and $t + 1$. The units of $R$ are dollars at time $t + 1$ per dollar at time $t$
When people borrow and lend, they trade dollars now for dollars later or dollars later for dol-
lars now
The price at which these exchanges occur is the gross nominal interest rate
We assume that the net nominal interest rate $r$ is fixed over time, so that $R$ is the gross nominal interest rate at times $t = 0, 1, 2, \ldots$
Two important geometric sequences are

$$1, \; R, \; R^2, \; \cdots \tag{7}$$

and

$$1, \; R^{-1}, \; R^{-2}, \; \cdots \tag{8}$$

Sequence Eq. (7) tells us how dollar values of an investment accumulate through time

Sequence Eq. (8) tells us how to discount future dollars to get their values in terms of today's dollars
20.6.1 Accumulation
Geometric sequence Eq. (7) tells us how one dollar invested and re-invested in a project with gross one-period nominal rate of return $R$ accumulates

• here we assume that net interest payments are reinvested in the project
• thus, 1 dollar invested at time 0 pays interest $r$ dollars after one period, so we have $r + 1 = R$ dollars at time 1
• at time 1 we reinvest $1 + r = R$ dollars and receive interest of $rR$ dollars at time 2 plus the principal $R$ dollars, so we receive $rR + R = (1 + r)R = R^2$ dollars at the end of period 2
• and so on
Evidently, if we invest $x$ dollars at time 0 and reinvest the proceeds, then the sequence

$$x, \; xR, \; xR^2, \; \cdots$$

tells us how our investment accumulates through time
20.6.2 Discounting
Geometric sequence Eq. (8) tells us how much future dollars are worth in terms of today's dollars

Remember that the units of $R$ are dollars at $t + 1$ per dollar at $t$

It follows that the units of $R^{-1}$ are dollars at $t$ per dollar at $t + 1$, and the units of $R^{-j}$ are dollars at $t$ per dollar at $t + j$

So if someone has a claim on $x$ dollars at time $t + j$, it is worth $x R^{-j}$ dollars at time $t$ (e.g., today)

Consider a lease whose payments grow at gross rate $G = 1 + g$, with $g \in (0, 1)$, so that

$$x_t = G^t x_0$$

The present value of the lease is

$$
\begin{aligned}
p_0 &= x_0 + x_1 / R + x_2 / R^2 + \cdots \\
    &= x_0 (1 + G R^{-1} + G^2 R^{-2} + \cdots) \\
    &= x_0 \frac{1}{1 - G R^{-1}}
\end{aligned}
$$
where the last line uses the formula for an infinite geometric series
Recall that $R = 1 + r$ and $G = 1 + g$, that $R > G$ and $r > g$, and that $r$ and $g$ are typically small numbers, e.g., .05 or .03
Use the Taylor series of $\frac{1}{1 + r}$ about $r = 0$, namely,

$$\frac{1}{1 + r} = 1 - r + r^2 - r^3 + \cdots$$

and the fact that $r$ is small to approximate $\frac{1}{1 + r} \approx 1 - r$
Use this approximation to write $p_0$ as

$$
\begin{aligned}
p_0 &= x_0 \frac{1}{1 - G R^{-1}} \\
    &= x_0 \frac{1}{1 - (1 + g)(1 - r)} \\
    &= x_0 \frac{1}{1 - (1 + g - r - rg)} \\
    &\approx x_0 \frac{1}{r - g}
\end{aligned}
$$

The approximation

$$p_0 = \frac{x_0}{r - g}$$

is known as the Gordon formula for the present value or current price of an infinite payment stream $x_0 G^t$ when the nominal one-period interest rate is $r$ and when $r > g$
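A quick numerical comparison (with values of $x_0$, $r$ and $g$ assumed for illustration) shows how the Gordon approximation relates to the exact present value:

x_0, r, g = 1.0, 0.05, 0.03
exact = x_0 / (1 - (1 + g) / (1 + r))   # p_0 = x_0 / (1 - G R^{-1})
approx = x_0 / (r - g)
print(exact, approx)    # the two values are of similar size when r and g are small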
We can also extend the asset pricing formula so that it applies to finite leases
Let the payment stream on the lease now be $x_t$ for $t = 1, 2, \ldots, T$, where again

$$x_t = G^t x_0$$

The present value of this finite lease is

$$
\begin{aligned}
p_0 &= x_0 + x_1 / R + \cdots + x_T / R^T \\
    &= x_0 (1 + G R^{-1} + \cdots + G^T R^{-T}) \\
    &= \frac{x_0 (1 - G^{T+1} R^{-(T+1)})}{1 - G R^{-1}}
\end{aligned}
$$

Applying the Taylor series of $\frac{1}{(1+r)^{T+1}}$ about $r = 0$ gives

$$\frac{1}{(1 + r)^{T+1}} = 1 - r(T + 1) + \frac{r^2}{2}(T + 1)(T + 2) + \cdots \approx 1 - r(T + 1)$$
Expanding:
We could have also approximated by removing the second term ๐๐๐ฅ0 (๐ + 1) when ๐ is rela-
tively small compared to 1/(๐๐) to get ๐ฅ0 (๐ + 1) as in the finite stream approximation
We will plot the true finite stream present-value and the two approximations, under different
values of ๐ , and ๐ and ๐ in python
First we plot the true finite stream present-value after computing it below
# Infinite lease
def infinite_lease(g, r, x_0):
G = (1 + g)
R = (1 + r)
return x_0 / (1 - G * R**(-1))
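The finite-lease counterpart used in the plots below was not shown above; a sketch consistent with the closed-form sum derived earlier is

# Finite lease (a sketch based on the formula derived above)
def finite_lease_pv(T, g, r, x_0):
    G = (1 + g)
    R = (1 + r)
    return (x_0 * (1 - G**(T + 1) * R**(-T - 1))) / (1 - G * R**(-1))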
Now that we have test run our functions, we can plot some outcomes
First we study the quality of our approximations
In [3]: g = 0.02
r = 0.03
x_0 = 1
T_max = 50
T = np.arange(0, T_max+1)
fig, ax = plt.subplots()
ax.set_title('Finite Lease Present Value $T$ Periods Ahead')
y_1 = finite_lease_pv(T, g, r, x_0)
y_2 = finite_lease_pv_approx_f(T, g, r, x_0)
y_3 = finite_lease_pv_approx_s(T, g, r, x_0)
ax.plot(T, y_1, label='True T-period Lease PV')
ax.plot(T, y_2, label='T-period Lease First-order Approx.')
ax.plot(T, y_3, label='T-period Lease First-order Approx. adj.')
ax.legend()
ax.set_xlabel('$T$ Periods Ahead')
ax.set_ylabel('Present Value, $p_0$')
plt.show()
The above graph shows how, as duration $T \to +\infty$, the value of a lease of duration $T$ approaches the value of a perpetual lease
Now we consider two different views of what happens as ๐ and ๐ covary
fig, ax = plt.subplots()
# r ~ g, not defined when r = g, but approximately goes to straight line with slope 1
r = 0.4001
g = 0.4
ax.plot(finite_lease_pv(T, g, r, x_0), label=r'$r \approx g$', color='orange')
# r < g
r = 0.4
g = 0.5
ax.plot(finite_lease_pv(T, g, r, x_0), label='$r<g$', color='red')
ax.legend()
plt.show()
The above graph gives a big hint for why the condition $r > g$ is necessary if a lease of length $T = +\infty$ is to have finite value
For fans of 3-d graphs the same point comes through in the following graph
If you arenโt enamored of 3-d graphs, feel free to skip the next visualization!
rr, gg = np.meshgrid(r, g)
z = finite_lease_pv(T, gg, rr, x_0)
We can use a little calculus to study how the present value ๐0 of a lease varies with ๐ and ๐
We will use a library called SymPy
SymPy enables us to do symbolic math calculations including computing derivatives of alge-
braic equations.
We will illustrate how it works by creating a symbolic expression that represents our present
value formula for an infinite lease
After that, weโll use SymPy to compute derivatives
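A minimal sketch of that symbolic computation (using SymPy names we assume are available) looks as follows; the outputs shown afterwards correspond to the expression for $p_0$ and its derivatives:

import sympy as sym

g, r, x0 = sym.symbols('g r x_0', positive=True)
G, R = 1 + g, 1 + r
p0 = x0 / (1 - G * R**(-1))      # present value of an infinite lease
dp0_dg = sym.diff(p0, g)         # derivative with respect to g
dp0_dr = sym.diff(p0, r)         # derivative with respect to r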
Out[7]:

$$\frac{x_0}{- \frac{g + 1}{r + 1} + 1}$$
dp0 / dg is:
Out[8]:

$$\frac{x_0}{(r + 1)\left(- \frac{g + 1}{r + 1} + 1\right)^2}$$
dp0 / dr is:
Out[9]:

$$\frac{x_0 (-g - 1)}{(r + 1)^2 \left(- \frac{g + 1}{r + 1} + 1\right)^2}$$

Evidently, as long as $r > g$, the derivative with respect to $g$ will always be positive, while the derivative with respect to $r$ will always be negative
20.7 Back to the Keynesian Multiplier

We will now go back to the case of the Keynesian multiplier and plot the time path of $y_t$, given that consumption is a constant fraction of national income, and investment is fixed
# Initial values
i_0 = 0.3
g_0 = 0.3
# 2/3 of income goes towards consumption
b = 2/3
y_init = 0
T = 100
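# The function below generates the output path used in the plot; it was not
# shown in the text, so this is a sketch that iterates y_t = b*y_{t-1} + i + g
def calculate_y(i, b, g, T, y_init):
    y = np.zeros(T + 1)
    y[0] = i + b * y_init + g
    for t in range(1, T + 1):
        y[t] = b * y[t-1] + i + g
    return y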
fig, ax = plt.subplots()
ax.set_title('Path of Aggregate Output Over Time')
ax.set_xlabel('$t$')
ax.set_ylabel('$y_t$')
ax.plot(np.arange(0, T+1), calculate_y(i_0, b, g_0, T, y_init))
# Output predicted by geometric series
ax.hlines(i_0 / (1 - b) + g_0 / (1 - b), xmin=-1, xmax=101, linestyles='--')
plt.show()
In this model, income grows over time, until it gradually converges to the infinite geometric
series sum of income
We now examine what will happen if we vary the so-called marginal propensity to con-
sume, i.e., the fraction of income that is consumed
fig,ax = plt.subplots()
ax.set_title('Changing Consumption as a Fraction of Income')
ax.set_ylabel('$y_t$')
ax.set_xlabel('$t$')
x = np.arange(0, T+1)
for b in (b_0, b_1, b_2, b_3):
y = calculate_y(i_0, b, g_0, T, y_init)
ax.plot(x, y, label=r'$b=$'+f"{b:.2f}")
ax.legend()
plt.show()
Increasing the marginal propensity to consume $b$ increases the path of output over time
Notice here, whether government spending increases from 0.3 to 0.4 or investment increases
from 0.3 to 0.4, the shifts in the graphs are identical
21
Linear Algebra
21.1 Contents
โข Overview 21.2
โข Vectors 21.3
โข Matrices 21.4
โข Exercises 21.8
โข Solutions 21.9
21.2 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to
invest in
For example, many applied problems in economics and finance require the solution of a linear system of equations, such as

$$
\begin{aligned}
y_1 &= a x_1 + b x_2 \\
y_2 &= c x_1 + d x_2
\end{aligned}
$$

or, more generally,

$$
\begin{aligned}
y_1 &= a_{11} x_1 + a_{12} x_2 + \cdots + a_{1k} x_k \\
&\;\;\vdots \\
y_n &= a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nk} x_k
\end{aligned}
$$

The objective here is to solve for the "unknowns" $x_1, \ldots, x_k$ given $a_{11}, \ldots, a_{nk}$ and $y_1, \ldots, y_n$
When considering such problems, it is essential that we first consider at least some of the fol-
lowing questions
21.3 Vectors
A vector of length ๐ is just a sequence (or array, or tuple) of ๐ numbers, which we write as
๐ฅ = (๐ฅ1 , โฆ , ๐ฅ๐ ) or ๐ฅ = [๐ฅ1 , โฆ , ๐ฅ๐ ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distin-
guish between the two)
The set of all ๐-vectors is denoted by R๐
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner
The two most common operators for vectors are addition and scalar multiplication, which we
now describe
As a matter of definition, when we add two vectors, we add them element-by-element

$$x + y = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} := \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$
Scalar multiplication is an operation that takes a number $\gamma$ and a vector $x$ and produces

$$\gamma x := \begin{bmatrix} \gamma x_1 \\ \gamma x_2 \\ \vdots \\ \gamma x_n \end{bmatrix}$$
scalars = (-2, 2)
x = np.array(x)
for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is
more commonly represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural
syntax
In [4]: 4 * x

The inner product of vectors $x, y \in \mathbb{R}^n$ is defined as

$$x' y := \sum_{i=1}^{n} x_i y_i$$

and the norm of a vector $x$ represents its length, that is, its distance from the zero vector

$$\| x \| := \sqrt{x' x} := \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$$
Out[5]: 12.0
Out[6]: 1.7320508075688772
Out[7]: 1.7320508075688772
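The outputs above come from computations along the following lines (the particular vectors are assumed here for illustration):

x = np.ones(3)
y = np.array([2, 4, 6])

print(np.sum(x * y))           # inner product of x and y, also x @ y
print(np.sqrt(np.sum(x**2)))   # norm of x, method one
print(np.linalg.norm(x))       # norm of x, method two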
21.3.3 Span
Given a set of vectors ๐ด โถ= {๐1 , โฆ , ๐๐ } in R๐ , itโs natural to think about the new vectors we
can create by performing linear operations
New vectors created in this manner are called linear combinations of ๐ด
In particular, $y \in \mathbb{R}^n$ is a linear combination of $A := \{a_1, \ldots, a_k\}$ if

$$y = \beta_1 a_1 + \cdots + \beta_k a_k \quad \text{for some scalars } \beta_1, \ldots, \beta_k$$
In this context, the values ๐ฝ1 , โฆ , ๐ฝ๐ are called the coefficients of the linear combination
The set of linear combinations of ๐ด is called the span of ๐ด
The next figure shows the span of ๐ด = {๐1 , ๐2 } in R3
The span is a two-dimensional plane passing through these two points and the origin
ฮฑ, ฮฒ = 0.2, 0.1
gs = 3
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)
Examples
If ๐ด contains only one vector ๐1 โ R2 , then its span is just the scalar multiples of ๐1 , which is
the unique line passing through both ๐1 and the origin
If ๐ด = {๐1 , ๐2 , ๐3 } consists of the canonical basis vectors of R3 , that is
$$e_1 := \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad e_2 := \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad e_3 := \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$
then the span of ๐ด is all of R3 , because, for any ๐ฅ = (๐ฅ1 , ๐ฅ2 , ๐ฅ3 ) โ R3 , we can write
๐ฅ = ๐ฅ 1 ๐1 + ๐ฅ 2 ๐2 + ๐ฅ 3 ๐3
As weโll see, itโs often desirable to find families of vectors with relatively large span, so that
many vectors can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is whatโs called linear inde-
pendence
In particular, a collection of vectors $A := \{a_1, \ldots, a_k\}$ in $\mathbb{R}^n$ is said to be

1. linearly dependent if some strict subset of $A$ has the same span as $A$
2. linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span
and linearly dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {๐1 , ๐2 } in R3 as a
plane through the origin
If we take a third vector $a_3$ and form the set $\{a_1, a_2, a_3\}$, this set will be

• linearly dependent if $a_3$ lies in the plane
• linearly independent otherwise
As another illustration of the concept, since R๐ can be spanned by ๐ vectors (see the discus-
sion of canonical basis vectors above), any collection of ๐ > ๐ vectors in R๐ must be linearly
dependent
The following statements are equivalent to linear independence of $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$

1. No vector in $A$ can be formed as a linear combination of the other elements
2. If $\beta_1 a_1 + \cdots + \beta_k a_k = 0$ for scalars $\beta_1, \ldots, \beta_k$, then $\beta_1 = \cdots = \beta_k = 0$
Another nice thing about sets of linearly independent vectors is that each element in the span
has a unique representation as a linear combination of these vectors
In other words, if $A := \{a_1, \ldots, a_k\} \subset \mathbb{R}^n$ is linearly independent and

$$y = \beta_1 a_1 + \cdots + \beta_k a_k$$

then no other coefficient sequence $\gamma_1, \ldots, \gamma_k$ will produce the same vector $y$
21.4 Matrices
Matrices are a neat way of organizing data for use in linear operations
Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:
Note
๐ด๐ต and ๐ต๐ด are not generally the same thing
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all
the standard matrix operations [1]
You can create them manually from tuples of tuples (or lists of lists) as follows
type(A)
Out[9]: tuple
In [10]: A = np.array(A)
type(A)
Out[10]: numpy.ndarray
In [11]: A.shape
Out[11]: (2, 2)
The shape attribute is a tuple giving the number of rows and columns โ see here for more
discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones,
etc.) โ see here
Since operations are performed elementwise by default, scalar multiplication and addition
have very natural syntax
In [12]: A = np.identity(3)
B = np.ones((3, 3))
2 * A
In [13]: A + B
Each ๐ ร ๐ matrix ๐ด can be identified with a function ๐(๐ฅ) = ๐ด๐ฅ that maps ๐ฅ โ R๐ into
๐ฆ = ๐ด๐ฅ โ R๐
These kinds of functions have a special property: they are linear
A function $f : \mathbb{R}^k \to \mathbb{R}^n$ is called linear if, for all $x, y \in \mathbb{R}^k$ and all scalars $\alpha, \beta$, we have

$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$$
You can check that this holds for the function ๐(๐ฅ) = ๐ด๐ฅ + ๐ when ๐ is the zero vector and
fails when ๐ is nonzero
In fact, itโs known that ๐ is linear if and only if there exists a matrix ๐ด such that ๐(๐ฅ) = ๐ด๐ฅ
for all ๐ฅ
๐ฆ = ๐ด๐ฅ (3)
The problem we face is to determine a vector ๐ฅ โ R๐ that solves Eq. (3), taking ๐ฆ and ๐ด as
given
This is a special case of a more general problem: Find an ๐ฅ such that ๐ฆ = ๐(๐ฅ)
Given an arbitrary function ๐ and a ๐ฆ, is there always an ๐ฅ such that ๐ฆ = ๐(๐ฅ)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax = axes[0]
ax = axes[1]
ybar = 2.6
ax.plot(x, x * 0 + ybar, 'k--', alpha=0.5)
ax.text(0.04, 0.91 * ybar, '$y$', fontsize=16)
plt.show()
In the first plot, there are multiple solutions, as the function is not one-to-one, while in the
second there are no solutions, since ๐ฆ lies outside the range of ๐
Can we impose conditions on ๐ด in Eq. (3) that rule out these problems?
In this context, the most important thing to recognize about the expression ๐ด๐ฅ is that it cor-
responds to a linear combination of the columns of ๐ด
In particular, if ๐1 , โฆ , ๐๐ are the columns of ๐ด, then
๐ด๐ฅ = ๐ฅ1 ๐1 + โฏ + ๐ฅ๐ ๐๐
Letโs discuss some more details, starting with the case where ๐ด is ๐ ร ๐
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary ๐ฆ โ R๐ , we hope to find a unique ๐ฅ โ R๐ such that ๐ฆ = ๐ด๐ฅ
In view of the observations immediately above, if the columns of ๐ด are linearly independent,
then their span, and hence the range of ๐(๐ฅ) = ๐ด๐ฅ, is all of R๐
Hence there always exists an ๐ฅ such that ๐ฆ = ๐ด๐ฅ
Moreover, the solution is unique
In particular, the following are equivalent

1. The columns of $A$ are linearly independent
2. For any $y \in \mathbb{R}^n$, the equation $y = Ax$ has a unique solution
The property of having linearly independent columns is sometimes expressed as having full
column rank
Inverse Matrices
Can we give some sort of expression for the solution?
If ๐ฆ and ๐ด are scalar with ๐ด โ 0, then the solution is ๐ฅ = ๐ดโ1 ๐ฆ
A similar expression is available in the matrix case
In particular, if square matrix ๐ด has full column rank, then it possesses a multiplicative in-
verse matrix ๐ดโ1 , with the property that ๐ด๐ดโ1 = ๐ดโ1 ๐ด = ๐ผ
As a consequence, if we pre-multiply both sides of ๐ฆ = ๐ด๐ฅ by ๐ดโ1 , we get ๐ฅ = ๐ดโ1 ๐ฆ
This is the solution that weโre looking for
Determinants
Another quick comment about square matrices is that to every such matrix we assign a
unique number called the determinant of the matrix โ you can find the expression for it here
If the determinant of ๐ด is not zero, then we say that ๐ด is nonsingular
Perhaps the most important fact about determinants is that ๐ด is nonsingular if and only if ๐ด
is of full column rank
This gives us a useful one-number summary of whether or not a square matrix can be in-
verted
Without much loss of generality, letโs go over the intuition focusing on the case where the
columns of ๐ด are linearly independent
It follows that the span of the columns of ๐ด is a ๐-dimensional subspace of R๐
This span is very โunlikelyโ to contain arbitrary ๐ฆ โ R๐
To see why, recall the figure above, where ๐ = 2 and ๐ = 3
Imagine an arbitrarily chosen ๐ฆ โ R3 , located somewhere in that three-dimensional space
Whatโs the likelihood that ๐ฆ lies in the span of {๐1 , ๐2 } (i.e., the two dimensional plane
through these points)?
In a sense, it must be very small, since this plane has zero โthicknessโ
As a result, in the ๐ > ๐ case we usually give up on existence
However, we can still seek the best approximation, for example, an ๐ฅ that makes the distance
โ๐ฆ โ ๐ด๐ฅโ as small as possible
To solve this problem, one can use either calculus or the theory of orthogonal projections
The solution is known to be ๐ฅฬ = (๐ดโฒ ๐ด)โ1 ๐ดโฒ ๐ฆ โ see for example chapter 3 of these notes
This is the ๐ ร ๐ case with ๐ < ๐, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many โ in other words, uniqueness
never holds
For example, consider the case where ๐ = 3 and ๐ = 2
Thus, the columns of ๐ด consists of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span
R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, letโs say that ๐1 = ๐ผ๐2 + ๐ฝ๐3
Then if $y = Ax = x_1 a_1 + x_2 a_2 + x_3 a_3$, we can also write

$$y = x_1 (\alpha a_2 + \beta a_3) + x_2 a_2 + x_3 a_3 = (x_1 \alpha + x_2) a_2 + (x_1 \beta + x_3) a_3$$
Hereโs an illustration of how to solve linear equations with SciPyโs linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN
code
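The outputs below are consistent with a small worked example along these lines (the matrix and vector are assumed for illustration):

import numpy as np
from scipy.linalg import inv, solve, det

A = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.ones((2, 1))

print(det(A))         # check that A is nonsingular
A_inv = inv(A)        # compute the inverse
print(A_inv @ y)      # solve y = Ax via the inverse
print(solve(A, y))    # preferred: solve directly via LU decomposition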
Out[15]: -2.0
Out[16]: array([[-2. , 1. ],
[ 1.5, -0.5]])
Out[17]: array([[1.],
[1.]])
Out[18]: array([[-1.],
[ 1.]])
Observe how we can solve for ๐ฅ = ๐ดโ1 ๐ฆ by either via inv(A) @ y, or using solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more
stable, and hence should almost always be preferred
To obtain the least-squares solution $\hat{x} = (A'A)^{-1}A'y$, use scipy.linalg.lstsq(A, y)

Let $A$ be an $n \times n$ square matrix. If $\lambda$ is a scalar and $v$ is a non-zero vector in $\mathbb{R}^n$ such that

$$A v = \lambda v$$

then we say that $\lambda$ is an eigenvalue of $A$, and $v$ is an eigenvector
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]
plt.show()
The eigenvalue equation is equivalent to (๐ด โ ๐๐ผ)๐ฃ = 0, and this has a nonzero solution ๐ฃ only
when the columns of ๐ด โ ๐๐ผ are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for ๐ such that the determinant of ๐ด โ ๐๐ผ is zero
This problem can be expressed as one of solving for the roots of a polynomial in ๐ of degree ๐
This in turn implies the existence of ๐ solutions in the complex plane, although some might
be repeated
Some nice facts about the eigenvalues of a square matrix ๐ด are as follows
A corollary of the first statement is that a matrix is invertible if and only if all its eigenvalues
are nonzero
Using SciPy, we can solve for the eigenvalues and eigenvectors of a matrix as follows
A = np.array(A)
evals, evecs = eig(A)
evals
In [21]: evecs
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matri-
ces ๐ด and ๐ต, seeks generalized eigenvalues ๐ and eigenvectors ๐ฃ such that
๐ด๐ฃ = ๐๐ต๐ฃ
We round out our discussion by briefly mentioning several other important topics
Recall the usual summation formula for a geometric progression, which states that if $|a| < 1$, then $\sum_{k=0}^{\infty} a^k = (1 - a)^{-1}$
A generalization of this idea exists in the matrix setting
Matrix Norms
Let $A$ be a square matrix, and let

$$\| A \| := \max_{\| x \| = 1} \| A x \|$$
The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand
side is a matrix norm โ in this case, the so-called spectral norm
For example, for a square matrix ๐, the condition โ๐โ < 1 means that ๐ is contractive, in the
sense that it pulls all vectors towards the origin [2]
Neumannโs Theorem
Let ๐ด be a square matrix and let ๐ด๐ โถ= ๐ด๐ด๐โ1 with ๐ด1 โถ= ๐ด
In other words, ๐ด๐ is the ๐-th power of ๐ด
Neumann's theorem states the following: If $\| A^k \| < 1$ for some $k \in \mathbb{N}$, then $I - A$ is invertible, and

$$(I - A)^{-1} = \sum_{k=0}^{\infty} A^k \tag{4}$$
Spectral Radius
A result known as Gelfand's formula tells us that, for any square matrix $A$,

$$\rho(A) = \lim_{k \to \infty} \| A^k \|^{1/k}$$
Here ๐(๐ด) is the spectral radius, defined as max๐ |๐๐ |, where {๐๐ }๐ is the set of eigenvalues of ๐ด
As a consequence of Gelfandโs formula, if all eigenvalues are strictly less than one in modulus,
there exists a ๐ with โ๐ด๐ โ < 1
In which case Eq. (4) is valid
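A quick numerical check of Eq. (4) (with an assumed matrix whose eigenvalues are less than one in modulus):

import numpy as np

A = np.array([[0.1, 0.3],
              [0.2, 0.4]])
S = sum(np.linalg.matrix_power(A, k) for k in range(50))   # partial sum of Σ A^k
print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))        # True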
Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if ๐ด is positive definite, then all of its eigenvalues are strictly positive, and
hence ๐ด is invertible (with positive definite inverse)
Let $z$, $x$ and $a$ all be $n \times 1$ vectors, $A$ be an $n \times n$ matrix, $B$ be an $m \times n$ matrix, and $y$ be an $m \times 1$ vector. Then

1. $\frac{\partial a' x}{\partial x} = a$
2. $\frac{\partial A x}{\partial x} = A'$
3. $\frac{\partial x' A x}{\partial x} = (A + A') x$
4. $\frac{\partial y' B z}{\partial y} = B z$
5. $\frac{\partial y' B z}{\partial B} = y z'$
21.8 Exercises
21.8.1 Exercise 1

Let $x$ be a given $n \times 1$ vector and consider the problem of choosing $y$ and $u$ to maximize

$$- y' P y - u' Q u$$

subject to the constraint

$$y = A x + B u$$

Here $P$ and $Q$ are symmetric positive semidefinite matrices ($n \times n$ and $m \times m$ respectively), $A$ is $n \times n$, $B$ is $n \times m$, and $u$ is an $m \times 1$ vector of choice (control) variables

Using the Lagrangian

$$L = - y' P y - u' Q u + \lambda' [A x + B u - y]$$

show that

1. $\lambda = -2 P y$
2. The optimizing choice of $u$ satisfies $u = -(Q + B' P B)^{-1} B' P A x$
3. The maximized value function $v$ satisfies $v(x) = -x' \tilde{P} x$ where $\tilde{P} = A' P A - A' P B (Q + B' P B)^{-1} B' P A$
As we will see, in economic contexts Lagrange multipliers often are shadow prices
Note
If we donโt care about the Lagrange multipliers, we can substitute the constraint
into the objective function, and then just maximize โ(๐ด๐ฅ + ๐ต๐ข)โฒ ๐ (๐ด๐ฅ + ๐ต๐ข) โ
๐ขโฒ ๐๐ข with respect to ๐ข. You can verify that this leads to the same maximizer.
21.9 Solutions
The optimization problem is to maximize $-y'Py - u'Qu$ subject to

$$y = A x + B u$$
with primitives
๐ฟ = โ๐ฆโฒ ๐ ๐ฆ โ ๐ขโฒ ๐๐ข + ๐โฒ [๐ด๐ฅ + ๐ต๐ข โ ๐ฆ]
1.
Differentiating the Lagrangian w.r.t. y and setting its derivative equal to zero yields

$$\frac{\partial L}{\partial y} = -(P + P') y - \lambda = -2 P y - \lambda = 0,$$
since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies
๐ = โ2๐ ๐ฆ
2.
Differentiating the Lagrangian w.r.t. u and setting its derivative equal to zero yields

$$\frac{\partial L}{\partial u} = -(Q + Q') u + B' \lambda = -2 Q u + B' \lambda = 0$$
Substituting ๐ = โ2๐ ๐ฆ gives
๐๐ข + ๐ตโฒ ๐ ๐ฆ = 0
๐๐ข + ๐ตโฒ ๐ (๐ด๐ฅ + ๐ต๐ข) = 0
(๐ + ๐ตโฒ ๐ ๐ต)๐ข + ๐ตโฒ ๐ ๐ด๐ฅ = 0
๐ข = โ(๐ + ๐ตโฒ ๐ ๐ต)โ1 ๐ตโฒ ๐ ๐ด๐ฅ ,
which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get

$$v(x) = \max_{u} \{ -(Ax + Bu)' P (Ax + Bu) - u' Q u \}$$
Since we know the optimal choice of $u$ satisfies $u = -(Q + B'PB)^{-1} B'PAx$, then

$$-2 u' B' P A x = 2 x' A' P B (Q + B' P B)^{-1} B' P A x$$
Notice that the term (๐ + ๐ตโฒ ๐ ๐ต)โ1 is symmetric as both P and Q are symmetric
Regarding the third term โ๐ขโฒ (๐ + ๐ตโฒ ๐ ๐ต)๐ข,
Therefore, the solution to the optimization problem ๐ฃ(๐ฅ) = โ๐ฅโฒ ๐ ฬ ๐ฅ follows the above result by
denoting ๐ ฬ โถ= ๐ดโฒ ๐ ๐ด โ ๐ดโฒ ๐ ๐ต(๐ + ๐ตโฒ ๐ ๐ต)โ1 ๐ตโฒ ๐ ๐ด
Footnotes
[1] Although there is a specialized matrix data type defined in NumPy, itโs more standard to
work with ordinary NumPy arrays. See this discussion.
[2] Suppose that โ๐โ < 1. Take any nonzero vector ๐ฅ, and let ๐ โถ= โ๐ฅโ. We have โ๐๐ฅโ =
๐โ๐(๐ฅ/๐)โ โค ๐โ๐โ < ๐ = โ๐ฅโ. Hence every point is pulled towards the origin.
22
Complex Numbers and Trigonometry
22.1 Contents
โข Overview 22.2
22.2 Overview

A complex number $z = x + iy$ has modulus

$$r = |z| = \sqrt{x^2 + y^2}$$

The value $\theta$ is the angle of $(x, y)$ with respect to the real axis

Evidently, the tangent of $\theta$ is $\frac{y}{x}$

Therefore,

$$\theta = \tan^{-1}\left(\frac{y}{x}\right)$$
22.2.2 An Example

Consider the complex number $z = 1 + \sqrt{3}\, i$

For $z = 1 + \sqrt{3}\, i$, $x = 1$, $y = \sqrt{3}$

It follows that $r = 2$ and $\theta = \tan^{-1}(\sqrt{3}) = \frac{\pi}{3} = 60^{\circ}$

Let's use Python to plot the trigonometric form of the complex number $z = 1 + \sqrt{3}\, i$
# Set parameters
r = 2
ฮธ = ฯ/3
x = r * np.cos(ฮธ)
x_range = np.linspace(0, x, 1000)
ฮธ_range = np.linspace(0, ฮธ, 1000)
# Plot
fig = plt.figure(figsize=(8, 8))
ax = plt.subplot(111, projection='polar')
ax.set_rmax(2)
ax.set_rticks((0.5, 1, 1.5, 2)) # less radial ticks
ax.set_rlabel_position(-88.5) # get radial labels away from plotted line
ax.grid(True)
plt.show()
$$\left( r (\cos\theta + i \sin\theta) \right)^n = \left( r e^{i\theta} \right)^n$$

and compute
22.4.1 Example 1
$$
\begin{aligned}
1 &= e^{i\theta} e^{-i\theta} \\
  &= (\cos\theta + i\sin\theta)(\cos(-\theta) + i\sin(-\theta)) \\
  &= (\cos\theta + i\sin\theta)(\cos\theta - i\sin\theta) \\
  &= \cos^2\theta + \sin^2\theta \\
  &= \frac{x^2}{r^2} + \frac{y^2}{r^2}
\end{aligned}
$$

and thus

$$x^2 + y^2 = r^2$$
22.4.2 Example 2

$$
\begin{aligned}
x_n &= p z^n + \bar{p} \bar{z}^n \\
    &= p e^{i\omega} (r e^{i\theta})^n + p e^{-i\omega} (r e^{-i\theta})^n \\
    &= p r^n e^{i(\omega + n\theta)} + p r^n e^{-i(\omega + n\theta)} \\
    &= p r^n \left[ \cos(\omega + n\theta) + i\sin(\omega + n\theta) + \cos(\omega + n\theta) - i\sin(\omega + n\theta) \right] \\
    &= 2 p r^n \cos(\omega + n\theta)
\end{aligned}
$$
22.4.3 Example 3
This example provides machinery that is at the heart of Samuelson's analysis of his multiplier-accelerator model [115]

Thus, consider a second-order linear difference equation

$$x_{n+2} = c_1 x_{n+1} + c_2 x_n$$

whose characteristic polynomial

$$z^2 - c_1 z - c_2 = 0$$

or

$$(z^2 - c_1 z - c_2) = (z - z_1)(z - z_2) = 0$$

has roots $z_1, z_2$
A solution is a sequence {๐ฅ๐ }โ
๐=0 that satisfies the difference equation
Under the following circumstances, we can apply our example 2 formula to solve the differ-
ence equation
โข the roots ๐ง1 , ๐ง2 of the characteristic polynomial of the difference equation form a com-
plex conjugate pair
โข the values ๐ฅ0 , ๐ฅ1 are given initial conditions
In that case, a solution takes the form of the Example 2 formula

$$x_n = 2 p r^n \cos(\omega + n\theta)$$

where $p, \omega$ are coefficients to be determined from information encoded in the initial conditions $x_1, x_0$
Since $x_0 = 2 p \cos\omega$ and $x_1 = 2 p r \cos(\omega + \theta)$, the ratio of $x_1$ to $x_0$ is

$$\frac{x_1}{x_0} = \frac{r \cos(\omega + \theta)}{\cos\omega}$$

We can solve this equation for $\omega$, then solve for $p$ using $x_0 = 2 p \cos\omega$
With the sympy package in Python, we are able to solve and plot the dynamics of ๐ฅ๐ given
different values of ๐
In this example, we set the initial values: $r = 0.9$, $\theta = \frac{1}{4}\pi$, $x_0 = 4$, $x_1 = r \cdot 2\sqrt{2} = 1.8\sqrt{2}$
We first numerically solve for ๐ and ๐ using nsolve in the sympy package based on the
above initial condition:
# Set parameters (imports made explicit here for completeness)
import numpy as np
from sympy import symbols, Eq, nsolve, cos

r = 0.9
θ = np.pi/4
x0 = 4
x1 = 2 * r * np.sqrt(2)

# Define the unknown coefficients as SymPy symbols
ω, p = symbols('ω p', real=True)

# Solve for ω
## Note: we choose the solution near 0
eq1 = Eq(x1/x0 - r * cos(ω+θ) / cos(ω), 0)
ω = float(nsolve(eq1, ω, 0))
print(f'ω = {ω:1.3f}')

# Solve for p
eq2 = Eq(x0 - 2 * p * cos(ω), 0)
p = float(nsolve(eq2, p, 0))
print(f'p = {p:1.3f}')
ฯ = 0.000
p = 2.000
# Define x_n
x = lambda n: 2 * p * r**n * np.cos(ω + n * θ)

# Range of n to plot (horizon chosen for illustration)
max_n = 30
n = np.arange(0, max_n + 1)

# Plot
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(n, x(n))
ax.set(xlim=(0, max_n), ylim=(-5, 5), xlabel='$n$', ylabel='$x_n$')
ax.grid()
plt.show()
$$\cos(\omega + \theta) = \frac{e^{i(\omega + \theta)} + e^{-i(\omega + \theta)}}{2}$$

$$\sin(\omega + \theta) = \frac{e^{i(\omega + \theta)} - e^{-i(\omega + \theta)}}{2i}$$
Since both real and imaginary parts of the above formula should be equal, we get:

$$\cos(\omega + \theta) = \cos\omega\cos\theta - \sin\omega\sin\theta$$

$$\sin(\omega + \theta) = \cos\omega\sin\theta + \sin\omega\cos\theta$$

The equations above are also known as the angle sum identities. We can verify the equations using the simplify function in the sympy package:
# Verify (redefining ω and θ as SymPy symbols; sin and simplify assumed imported from sympy)
ω, θ = symbols('ω θ', real=True)
print("cos(ω)cos(θ) - sin(ω)sin(θ) =", simplify(cos(ω)*cos(θ) - sin(ω) * sin(θ)))
print("cos(ω)sin(θ) + sin(ω)cos(θ) =", simplify(cos(ω)*sin(θ) + sin(ω) * cos(θ)))
We can also compute the trigonometric integrals using polar forms of complex numbers
For example, we want to solve the following integral:

$$\int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega$$
and thus:

$$\int_{-\pi}^{\pi} \cos(\omega) \sin(\omega) \, d\omega = \frac{1}{2}\sin^2(\pi) - \frac{1}{2}\sin^2(-\pi) = 0$$
We can verify the analytical as well as numerical results using integrate in the sympy
package:
ω = Symbol('ω')
print('The analytical solution for the integral of cos(ω)sin(ω) is:')
integrate(cos(ω) * sin(ω), ω)
Out[6]:

$$\frac{\sin^2(\omega)}{2}$$
In [7]: print('The numerical solution for the integral of cos(ω)sin(ω) from -π to π is:')
        integrate(cos(ω) * sin(ω), (ω, -π, π))
Out[7]:
0
23
Orthogonal Projections and Their Applications
23.1 Contents
โข Overview 23.2
โข Exercises 23.9
โข Solutions 23.10
23.2 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applica-
tions
These include, but are not limited to,
โข key ideas
โข least squares regression
For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [109]
For an advanced treatment of projection in the context of least squares prediction, see this
book chapter
Assume ๐ฅ, ๐ง โ R๐
Define โจ๐ฅ, ๐งโฉ = โ๐ ๐ฅ๐ ๐ง๐
Recall โ๐ฅโ2 = โจ๐ฅ, ๐ฅโฉ
The law of cosines states that โจ๐ฅ, ๐งโฉ = โ๐ฅโโ๐งโ cos(๐) where ๐ is the angle between the vectors
๐ฅ and ๐ง
When $\langle x, z \rangle = 0$, then $\cos(\theta) = 0$ and $x$ and $z$ are said to be orthogonal, and we write $x \perp z$

$S^{\perp}$ is a linear subspace of $\mathbb{R}^n$

$$\hat{y} := \mathop{\mathrm{argmin}}_{z \in S} \| y - z \|$$

• $\hat{y} \in S$
• $y - \hat{y} \perp S$

Hence $\| y - z \| \geq \| y - \hat{y} \|$, which completes the proof
For a linear space ๐ and a fixed linear subspace ๐, we have a functional relationship
1. ๐ ๐ฆ โ ๐ and
2. ๐ฆ โ ๐ ๐ฆ โ ๐
For example, to prove 1, observe that ๐ฆ = ๐ ๐ฆ + ๐ฆ โ ๐ ๐ฆ and apply the Pythagorean law
Orthogonal Complement
Let ๐ โ R๐ .
The orthogonal complement of ๐ is the linear subspace ๐ โ that satisfies ๐ฅ1 โ ๐ฅ2 for every
๐ฅ1 โ ๐ and ๐ฅ2 โ ๐ โ
Let ๐ be a linear space with linear subspace ๐ and its orthogonal complement ๐ โ
We write
๐ = ๐ โ ๐โ
to indicate that for every ๐ฆ โ ๐ there is unique ๐ฅ1 โ ๐ and a unique ๐ฅ2 โ ๐ โ such that
๐ฆ = ๐ฅ1 + ๐ฅ2
Moreover, ๐ฅ1 = ๐ธ๐ฬ ๐ฆ and ๐ฅ2 = ๐ฆ โ ๐ธ๐ฬ ๐ฆ
This amounts to another version of the OPT:
Theorem. If $S$ is a linear subspace of $\mathbb{R}^n$, $\hat{E}_S\, y = P y$ and $\hat{E}_{S^{\perp}}\, y = M y$, then

$$P y \perp M y \quad \text{and} \quad y = P y + M y \quad \text{for all } y \in \mathbb{R}^n$$

When $\{u_1, \ldots, u_k\}$ is an orthonormal basis of $S$, we have

$$x = \sum_{i=1}^{k} \langle x, u_i \rangle u_i \quad \text{for all } x \in S$$
To see this, observe that since $x \in \text{span}\{u_1, \ldots, u_k\}$, we can find scalars $\alpha_1, \ldots, \alpha_k$ that verify

$$x = \sum_{j=1}^{k} \alpha_j u_j \tag{1}$$

Taking the inner product with respect to $u_i$ gives

$$\langle x, u_i \rangle = \sum_{j=1}^{k} \alpha_j \langle u_j, u_i \rangle = \alpha_i$$
When the subspace onto which we are projecting has an orthonormal basis, computing the projection simplifies:

Theorem If $\{u_1, \ldots, u_k\}$ is an orthonormal basis for $S$, then

$$P y = \sum_{i=1}^{k} \langle y, u_i \rangle u_i, \quad \forall \, y \in \mathbb{R}^n \tag{2}$$

To verify this, observe that the residual is orthogonal to each basis vector:

$$\left\langle y - \sum_{i=1}^{k} \langle y, u_i \rangle u_i, \; u_j \right\rangle = \langle y, u_j \rangle - \sum_{i=1}^{k} \langle y, u_i \rangle \langle u_i, u_j \rangle = 0$$
๐ธ๐ฬ ๐ฆ = ๐ ๐ฆ
๐ = ๐(๐ โฒ ๐)โ1 ๐ โฒ
1. ๐ ๐ฆ โ ๐, and
2. ๐ฆ โ ๐ ๐ฆ โ ๐
๐ โถ= span ๐ โถ= span{1 ๐, โฆ ,๐ ๐}
๐ ๐ฆ = ๐ (๐ โฒ ๐ )โ1 ๐ โฒ ๐ฆ
๐
๐ ๐ฆ = ๐ ๐ โฒ ๐ฆ = โโจ๐ข๐ , ๐ฆโฉ๐ข๐
๐=1
We have recovered our earlier result about projecting onto the span of an orthonormal basis
๐ฝ ฬ โถ= (๐ โฒ ๐)โ1 ๐ โฒ ๐ฆ
๐ ๐ฝ ฬ = ๐(๐ โฒ ๐)โ1 ๐ โฒ ๐ฆ = ๐ ๐ฆ
Because ๐๐ โ span(๐)
If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:

$$\min_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} (y_n - f(x_n))^2$$

Minimizing this expression over the class of linear functions $f(x) = b'x$ is the same as solving

$$\min_{b \in \mathbb{R}^K} \sum_{n=1}^{N} (y_n - b' x_n)^2$$
23.7.2 Solution

Define the $N \times 1$ vector $y$ and the $K \times 1$ vector $x_n$ by

$$y := \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad x_n := \begin{bmatrix} x_{n1} \\ x_{n2} \\ \vdots \\ x_{nK} \end{bmatrix} = \text{$n$-th observation on all regressors}$$
and let $X$ be the $N \times K$ matrix whose $n$-th row is $x_n'$. Then

$$\min_{b \in \mathbb{R}^K} \sum_{n=1}^{N} (y_n - b' x_n)^2 = \min_{b \in \mathbb{R}^K} \| y - X b \|^2$$
๐ฝ ฬ โถ= (๐ โฒ ๐)โ1 ๐ โฒ ๐ฆ
๐ฆ ฬ โถ= ๐ ๐ฝ ฬ = ๐ ๐ฆ
๐ขฬ โถ= ๐ฆ โ ๐ฆ ฬ = ๐ฆ โ ๐ ๐ฆ = ๐ ๐ฆ
Letโs return to the connection between linear independence and orthogonality touched on
above
A result of much interest is a famous algorithm for constructing orthonormal sets from lin-
early independent sets
The next section gives details
Theorem For each linearly independent set $\{x_1, \ldots, x_k\} \subset \mathbb{R}^n$, there exists an orthonormal set $\{u_1, \ldots, u_k\}$ with

$$\text{span}\{x_1, \ldots, x_i\} = \text{span}\{u_1, \ldots, u_i\} \quad \text{for } i = 1, \ldots, k$$
23.8.2 QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition
Theorem If ๐ is ๐ ร ๐ with linearly independent columns, then there exists a factorization
๐ = ๐๐
where

• $Q$ is $n \times k$ with orthonormal columns
• $R$ is $k \times k$, upper triangular and nonsingular

To see why this works, let

• $x_i := \text{col}_i(X)$
• $\{u_1, \ldots, u_k\}$ be orthonormal with the same span as $\{x_1, \ldots, x_k\}$ (to be constructed using Gram–Schmidt)
• $Q$ be formed from cols $u_i$

Since $x_i \in \text{span}\{u_1, \ldots, u_i\}$, we have

$$x_i = \sum_{j=1}^{i} \langle u_j, x_i \rangle u_j \quad \text{for } i = 1, \ldots, k$$
For matrices $X$ and $y$ that overdetermine $\beta$ in the linear equation system $y = X\beta$, we found the least squares approximator $\hat{\beta} = (X'X)^{-1}X'y$

Using the QR decomposition $X = QR$ gives

$$
\begin{aligned}
\hat{\beta} &= (R'Q'QR)^{-1} R'Q'y \\
            &= (R'R)^{-1} R'Q'y \\
            &= R^{-1}(R')^{-1} R'Q'y = R^{-1} Q' y
\end{aligned}
$$

Numerical routines would in this case use the alternative form $R \hat{\beta} = Q' y$ and back substitution
23.9 Exercises
23.9.1 Exercise 1
23.9.2 Exercise 2
Let ๐ = ๐(๐ โฒ ๐)โ1 ๐ โฒ and let ๐ = ๐ผ โ ๐ . Show that ๐ and ๐ are both idempotent and
symmetric. Can you give any intuition as to why they should be idempotent?
23.9.3 Exercise 3

Using Gram-Schmidt orthogonalization, produce a linear projection of $y$ onto the column space spanned by $X$, where

$$y := \begin{bmatrix} 1 \\ 3 \\ -3 \end{bmatrix}$$

and

$$X := \begin{bmatrix} 1 & 0 \\ 0 & -6 \\ 2 & 2 \end{bmatrix}$$
23.10 Solutions
23.10.1 Exercise 1
23.10.2 Exercise 2
Symmetry and idempotence of ๐ and ๐ can be established using standard rules for matrix
algebra. The intuition behind idempotence of ๐ and ๐ is that both are orthogonal projec-
tions. After a point is projected into a given subspace, applying the projection again makes
no difference. (A point inside the subspace is not shifted by orthogonal projection onto that
space because it is already the closest point in the subspace to itself.)
23.10.3 Exercise 3
Hereโs a function that computes the orthonormal vectors using the GS algorithm given in the
lecture
def gram_schmidt(X):
    """
    Implements Gram-Schmidt orthogonalization.

    Parameters
    ----------
    X : an n x k array with linearly independent columns

    Returns
    -------
    U : an n x k array with orthonormal columns
    """

    # Set up
    n, k = X.shape
    U = np.empty((n, k))
    I = np.eye(n)

    # The first column of U is the normalized first column of X
    v1 = X[:, 0]
    U[:, 0] = v1 / np.sqrt(np.sum(v1 * v1))

    for i in range(1, k):
        # The vector we are going to project and the columns already handled
        b = X[:, i]
        Z = X[:, 0:i]

        # Project b onto the orthogonal complement of the column span of Z
        M = I - Z @ np.linalg.inv(Z.T @ Z) @ Z.T
        u = M @ b

        # Normalize
        U[:, i] = u / np.sqrt(np.sum(u * u))

    return U
y = [1, 3, -3]

X = [[1, 0],
     [0, -6],
     [2, 2]]

X, y = [np.asarray(z) for z in (X, y)]
First, letโs try projection of ๐ฆ onto the column space of ๐ using the ordinary matrix expres-
sion:
Now letโs do the same using an orthonormal basis created from our gram_schmidt function
In [4]: U = gram_schmidt(X)
U
This is the same answer. So far so good. Finally, letโs try the same thing but with the basis
obtained via QR decomposition:
Q, R = qr(X, mode='economic')
Q
24
LLN and CLT

24.1 Contents
โข Overview 24.2
โข Relationships 24.3
โข LLN 24.4
โข CLT 24.5
โข Exercises 24.6
โข Solutions 24.7
24.2 Overview
This lecture illustrates two of the most important theorems of probability and statistics: The
law of large numbers (LLN) and the central limit theorem (CLT)
These beautiful theorems lie behind many of the most fundamental results in econometrics
and quantitative economic modeling
The lecture is based around simulations that show the LLN and CLT in action
We also demonstrate how the LLN and CLT break down when the assumptions they are
based on do not hold
In addition, we examine several useful extensions of the classical theorems, such as

• the delta method, for smooth functions of random variables
• the multivariate case
24.3 Relationships
The LLN gives conditions under which sample moments converge to population moments as
sample size increases
The CLT provides information about the rate at which sample moments converge to popula-
tion moments as sample size increases
24.4 LLN
We begin with the law of large numbers, which tells us when sample averages will converge to
their population means
The classical law of large numbers concerns independent and identically distributed (IID)
random variables
Here is the strongest version of the classical LLN, known as Kolmogorovโs strong law
Let ๐1 , โฆ , ๐๐ be independent and identically distributed scalar random variables, with com-
mon distribution ๐น
When it exists, let ๐ denote the common mean of this sample:
๐ โถ= E๐ = โซ ๐ฅ๐น (๐๐ฅ)
In addition, let

$$\bar{X}_n := \frac{1}{n} \sum_{i=1}^{n} X_i$$
Kolmogorov's strong law states that, if $\mathbb{E}|X|$ is finite, then

$$\mathbb{P}\left\{ \bar{X}_n \to \mu \text{ as } n \to \infty \right\} = 1 \tag{1}$$
24.4.2 Proof
The proof of Kolmogorovโs strong law is nontrivial โ see, for example, theorem 8.3.5 of [38]
On the other hand, we can prove a weaker version of the LLN very easily and still get most of
the intuition
24.4. LLN 389
The version we prove is as follows: If $X_1, \ldots, X_n$ is IID with $\mathbb{E} X_i^2 < \infty$, then, for any $\epsilon > 0$, we have

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \to 0 \quad \text{as} \quad n \to \infty \tag{2}$$
(This version is weaker because we claim only convergence in probability rather than almost
sure convergence, and assume a finite second moment)
To see that this is so, fix ๐ > 0, and let ๐2 be the variance of each ๐๐
Recall the Chebyshev inequality, which tells us that

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \leq \frac{\mathbb{E}[(\bar{X}_n - \mu)^2]}{\epsilon^2} \tag{3}$$
Now observe that

$$
\begin{aligned}
\mathbb{E}[(\bar{X}_n - \mu)^2] &= \mathbb{E}\left\{ \left[ \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu) \right]^2 \right\} \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \mathbb{E}(X_i - \mu)(X_j - \mu) \\
&= \frac{1}{n^2} \sum_{i=1}^{n} \mathbb{E}(X_i - \mu)^2 \\
&= \frac{\sigma^2}{n}
\end{aligned}
$$
Here the crucial step is at the third equality, which follows from independence
Independence means that if ๐ โ ๐, then the covariance term E(๐๐ โ ๐)(๐๐ โ ๐) drops out
As a result, ๐2 โ ๐ terms vanish, leading us to a final expression that goes to zero in ๐
Combining our last result with Eq. (3), we come to the estimate

$$\mathbb{P}\left\{ |\bar{X}_n - \mu| \geq \epsilon \right\} \leq \frac{\sigma^2}{n \epsilon^2} \tag{4}$$
24.4.3 Illustration
Letโs now illustrate the classical IID law of large numbers using simulation
In particular, we aim to generate some sequences of IID random variables and plot the evolu-
tion of ๐ฬ ๐ as ๐ increases
Below is a figure that does just this (as usual, you can click on it to expand it)
It shows IID observations from three different distributions and plots ๐ฬ ๐ against ๐ in each
case
The dots represent the underlying observations ๐๐ for ๐ = 1, โฆ , 100
In each of the three cases, convergence of ๐ฬ ๐ to ๐ occurs as predicted
n = 100
for ax in axes:
# == Choose a randomly selected distribution == #
name = random.choice(list(distributions.keys()))
distribution = distributions.pop(name)
# == Plot == #
ax.plot(list(range(n)), data, 'o', color='grey', alpha=0.5)
axlabel = '$\\bar X_n$ for $X_i \sim$' + name
ax.plot(list(range(n)), sample_mean, 'g-', lw=3, alpha=0.6, label=axlabel)
m = distribution.mean()
ax.plot(list(range(n)), [m] * n, 'k--', lw=1.5, label='$\mu$')
ax.vlines(list(range(n)), m, data, lw=0.2)
ax.legend(**legend_args)
plt.show()
The three distributions are chosen at random from a selection stored in the dictionary dis-
tributions
What happens if the condition E|๐| < โ in the statement of the LLN is not satisfied?
This might be the case if the underlying distribution is heavy-tailed; the best-known example is the Cauchy distribution, which has density

$$f(x) = \frac{1}{\pi (1 + x^2)} \qquad (x \in \mathbb{R})$$
The next figure shows 100 independent draws from this distribution
n = 100
distribution = cauchy()
plt.show()
Notice how extreme observations are far more prevalent here than the previous figure
Letโs now have a look at the behavior of the sample mean
In [3]: n = 1000
distribution = cauchy()
# == Plot == #
ax.plot(list(range(n)), sample_mean, 'r-', lw=3, alpha=0.6,
label='$\\bar X_n$')
ax.plot(list(range(n)), [0] * n, 'k--', lw=0.5)
ax.legend()
plt.show()
Here weโve increased ๐ to 1000, but the sequence still shows no sign of converging
Will convergence become visible if we take ๐ even larger?
The answer is no
To see this, recall that the characteristic function of the Cauchy distribution is

$$\phi(t) = \mathbb{E} e^{itX} = e^{-|t|} \tag{5}$$

Using independence, the characteristic function of the sample mean becomes

$$
\begin{aligned}
\mathbb{E} e^{i t \bar{X}_n} &= \mathbb{E} \exp\left\{ i \frac{t}{n} \sum_{j=1}^{n} X_j \right\} \\
&= \mathbb{E} \prod_{j=1}^{n} \exp\left\{ i \frac{t}{n} X_j \right\} \\
&= \prod_{j=1}^{n} \mathbb{E} \exp\left\{ i \frac{t}{n} X_j \right\} = [\phi(t/n)]^n
\end{aligned}
$$

In view of Eq. (5), this is just $e^{-|t|}$, so the distribution of $\bar{X}_n$ is the same as that of a single Cauchy draw, no matter how large $n$ is
24.5 CLT
Next, we turn to the central limit theorem, which tells us about the distribution of the devia-
tion between sample averages and population means
The central limit theorem is one of the most remarkable results in all of mathematics
In the classical IID setting, it tells us the following:
If the sequence $X_1, \ldots, X_n$ is IID, with common mean $\mu$ and common variance $\sigma^2 \in (0, \infty)$, then

$$\sqrt{n}(\bar{X}_n - \mu) \stackrel{d}{\to} N(0, \sigma^2) \quad \text{as} \quad n \to \infty \tag{6}$$

Here $\stackrel{d}{\to} N(0, \sigma^2)$ indicates convergence in distribution to a centered (i.e., zero mean) normal with standard deviation $\sigma$
24.5.2 Intuition
The striking implication of the CLT is that for any distribution with finite second moment,
the simple operation of adding independent copies always leads to a Gaussian curve
A relatively simple proof of the central limit theorem can be obtained by working with char-
acteristic functions (see, e.g., theorem 9.5.6 of [38])
The proof is elegant but almost anticlimactic, and it provides surprisingly little intuition
In fact, all of the proofs of the CLT that we know are similar in this respect
Why does adding independent copies produce a bell-shaped distribution?
Part of the answer can be obtained by investigating the addition of independent Bernoulli
random variables
In particular, let ๐๐ be binary, with P{๐๐ = 0} = P{๐๐ = 1} = 0.5, and let ๐1 , โฆ , ๐๐ be
independent
๐
Think of ๐๐ = 1 as a โsuccessโ, so that ๐๐ = โ๐=1 ๐๐ is the number of successes in ๐ trials
The next figure plots the probability mass function of ๐๐ for ๐ = 1, 2, 4, 8
plt.show()
When ๐ = 1, the distribution is flat โ one success or no successes have the same probability
When ๐ = 2 we can either have 0, 1 or 2 successes
Notice the peak in probability mass at the mid-point ๐ = 1
The reason is that there are more ways to get 1 success (โfail then succeedโ or โsucceed then
failโ) than to get zero or two successes
Moreover, the two trials are independent, so the outcomes โfail then succeedโ and โsucceed
then failโ are just as likely as the outcomes โfail then failโ and โsucceed then succeedโ
(If there was positive correlation, say, then โsucceed then failโ would be less likely than โsuc-
ceed then succeedโ)
Here, already we have the essence of the CLT: addition under independence leads probability
mass to pile up in the middle and thin out at the tails
For ๐ = 4 and ๐ = 8 we again get a peak at the โmiddleโ value (halfway between the mini-
mum and the maximum possible value)
The intuition is the same โ there are simply more ways to get these middle outcomes
If we continue, the bell-shaped curve becomes even more pronounced
We are witnessing the binomial approximation of the normal distribution
24.5.3 Simulation 1
Since the CLT seems almost magical, running simulations that verify its implications is one
good way to build intuition
To this end, we now perform the following simulation

1. Choose an arbitrary distribution $F$ for the underlying observations $X_i$
2. Generate independent draws of $Y_n := \sqrt{n}(\bar{X}_n - \mu)$
3. Use these draws to compute some measure of their distribution, such as a histogram
4. Compare the latter to $N(0, \sigma^2)$
Hereโs some code that does exactly this for the exponential distribution ๐น (๐ฅ) = 1 โ ๐โ๐๐ฅ
(Please experiment with other choices of ๐น , but remember that, to conform with the condi-
tions of the CLT, the distribution must have a finite second moment)
# == Set parameters == #
n = 250                  # Choice of n
k = 100000               # Number of draws of Y_n
distribution = expon(2)  # Exponential distribution, λ = 1/2
μ, s = distribution.mean(), distribution.std()

# == Generate draws of Y_n (this step was not shown, so it is reconstructed here) == #
data = distribution.rvs((k, n))          # each row is one sample of size n
sample_means = data.mean(axis=1)         # sample mean of each row
Y = np.sqrt(n) * (sample_means - μ)      # scaled deviations
# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmin, xmax = -3 * s, 3 * s
ax.set_xlim(xmin, xmax)
ax.hist(Y, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
ax.plot(xgrid, norm.pdf(xgrid, scale=s), 'k-', lw=2, label='$N(0, \sigma^2)$')
ax.legend()
plt.show()
Notice the absence of for loops โ every operation is vectorized, meaning that the major cal-
culations are all shifted to highly optimized C code
The fit to the normal density is already tight and can be further improved by increasing n
You can also experiment with other specifications of ๐น
24.5.4 Simulation 2
Our next simulation is somewhat like the first, except that we aim to track the distribution of $Y_n := \sqrt{n}(\bar{X}_n - \mu)$ as $n$ increases
In the simulation, weโll be working with random variables having ๐ = 0
Thus, when $n = 1$, we have $Y_1 = X_1$, so the first distribution is just the distribution of the underlying random variable

For $n = 2$, the distribution of $Y_2$ is that of $(X_1 + X_2)/\sqrt{2}$, and so on
What we expect is that, regardless of the distribution of the underlying random variable, the
distribution of ๐๐ will smooth out into a bell-shaped curve
The next figure shows this process for ๐๐ โผ ๐, where ๐ was specified as the convex combina-
tion of three different beta densities
(Taking a convex combination is an easy way to produce an irregular shape for ๐)
In the figure, the closest density is that of ๐1 , while the furthest is that of ๐5
beta_dist = beta(2, 2)
def gen_x_draws(k):
"""
Returns a flat array containing k independent draws from the
distribution of X, the underlying random variable. This distribution is
itself a convex combination of three beta distributions.
"""
bdraws = beta_dist.rvs((3, k))
# == Transform rows, so each represents a different distribution == #
bdraws[0, :] -= 0.5
bdraws[1, :] += 0.6
bdraws[2, :] -= 1.1
# == Set X[i] = bdraws[j, i], where j is a random draw from {0, 1, 2} == #
js = np.random.randint(0, 3, size=k)
X = bdraws[js, np.arange(k)]
# == Rescale, so that the random variable is zero mean == #
m, sigma = X.mean(), X.std()
return (X - m) / sigma
nmax = 5
reps = 100000
ns = list(range(1, nmax + 1))
# == Plot == #
ax = fig.gca(projection='3d')
a, b = -3, 3
gs = 100
xs = np.linspace(a, b, gs)
# == Build verts == #
greys = np.linspace(0.3, 0.7, nmax)
verts = []
for n in ns:
density = gaussian_kde(Y[:, n-1])
ys = density(xs)
verts.append(list(zip(xs, ys)))
The law of large numbers and central limit theorem work just as nicely in multidimensional
settings
To state the results, letโs recall some elementary facts about random vectors
A random vector X is just a sequence of ๐ random variables (๐1 , โฆ , ๐๐ )
$$\mathbb{E}[\mathbf{X}] := \begin{bmatrix} \mathbb{E}[X_1] \\ \mathbb{E}[X_2] \\ \vdots \\ \mathbb{E}[X_k] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_k \end{bmatrix} =: \mu$$
The sample mean vector is

$$\bar{\mathbf{X}}_n := \frac{1}{n} \sum_{i=1}^{n} \mathbf{X}_i$$
The multivariate LLN states that

$$\mathbb{P}\left\{ \bar{\mathbf{X}}_n \to \mu \text{ as } n \to \infty \right\} = 1 \tag{7}$$

and the multivariate CLT states that

$$\sqrt{n}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma) \quad \text{as} \quad n \to \infty \tag{8}$$
24.6 Exercises
24.6.1 Exercise 1

One very useful consequence of the central limit theorem is the delta method: if $g$ is differentiable at $\mu$ with $g'(\mu) \neq 0$, then

$$\sqrt{n}\left\{ g(\bar{X}_n) - g(\mu) \right\} \stackrel{d}{\to} N(0, g'(\mu)^2 \sigma^2) \quad \text{as} \quad n \to \infty \tag{9}$$
This theorem is used frequently in statistics to obtain the asymptotic distribution of estima-
tors โ many of which can be expressed as functions of sample means
(These kinds of results are often said to use the โdelta methodโ)
The proof is based on a Taylor expansion of ๐ around the point ๐
Taking the result as given, let the distribution $F$ of each $X_i$ be uniform on $[0, \pi/2]$ and let $g(x) = \sin(x)$

Derive the asymptotic distribution of $\sqrt{n}\{g(\bar{X}_n) - g(\mu)\}$ and illustrate convergence in the same spirit as the program illustrate_clt.py discussed above
What happens when you replace [0, ๐/2] with [0, ๐]?
What is the source of the problem?
24.6.2 Exercise 2
Hereโs a result thatโs often used in developing statistical tests, and is connected to the multi-
variate central limit theorem
If you study econometric theory, you will see this result used again and again
Assume the setting of the multivariate CLT discussed above, so that
$\sqrt{n}(\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, \Sigma)$ (10)

is valid
In a statistical setting, one often wants the right-hand side to be standard normal so that
confidence intervals are easily computed
This normalization can be achieved on the basis of three observations
First, if X is a random vector in $\mathbb{R}^k$ and A is constant and $k \times k$, then

$\mathrm{Var}[A\mathbf{X}] = A \, \mathrm{Var}[\mathbf{X}] \, A'$
Second, by the continuous mapping theorem, if $\mathbf{Z}_n \stackrel{d}{\to} \mathbf{Z}$ in $\mathbb{R}^k$ and A is constant and $k \times k$, then

$A\mathbf{Z}_n \stackrel{d}{\to} A\mathbf{Z}$
Third, if S is a $k \times k$ symmetric positive definite matrix, then there exists a symmetric positive definite matrix Q, called the inverse square root of S, such that

$Q S Q' = I$
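As a quick numerical check of this last fact, here is a minimal sketch using an arbitrarily chosen positive definite matrix S; the routines scipy.linalg.sqrtm and scipy.linalg.inv are the same ones used in the solution below

import numpy as np
from scipy.linalg import sqrtm, inv

S = np.array([[2.0, 0.5],
              [0.5, 1.0]])   # an arbitrary symmetric positive definite matrix
Q = inv(sqrtm(S))            # the inverse square root of S
print(Q @ S @ Q.T)           # approximately the identity matrix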
Putting these facts together, if Q is the inverse square root of Σ, then

$\mathbf{Z}_n := \sqrt{n} Q (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} \mathbf{Z} \sim N(0, I)$
Applying the continuous mapping theorem one more time tells us that

$\| \mathbf{Z}_n \|^2 \stackrel{d}{\to} \| \mathbf{Z} \|^2$

Given the distribution of Z, we conclude that

$n \| Q (\bar{\mathbf{X}}_n - \mu) \|^2 \stackrel{d}{\to} \chi^2(k)$ (11)

where $\chi^2(k)$ is the chi-squared distribution with k degrees of freedom
Your second task is to illustrate the convergence in Eq. (11) with a simulation, taking

$\mathbf{X}_t := \begin{pmatrix} W_t \\ W_t + U_t \end{pmatrix}$

where each $W_t$ is an IID draw from the uniform distribution on $[-1, 1]$, each $U_t$ is an IID draw from the uniform distribution on $[-2, 2]$, and $W_t$ and $U_t$ are independent of each other
Hints: scipy.linalg.sqrtm computes the square root of a matrix (you will still need to invert it), and you should be able to work out Σ from the description above
24.7 Solutions
24.7.1 Exercise 1
In [7]: """
Illustrates the delta method, a consequence of the central limit theorem.
"""
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import uniform, norm
# == Set parameters == #
n = 250
replications = 100000
distribution = uniform(loc=0, scale=(np.pi / 2))
μ, s = distribution.mean(), distribution.std()
g = np.sin
g_prime = np.cos
# == Generate observations of sqrt(n) * (g(sample mean) - g(μ)) == #
data = distribution.rvs((replications, n))
sample_means = data.mean(axis=1)  # Compute mean of each row
error_obs = np.sqrt(n) * (g(sample_means) - g(μ))
# == Plot == #
asymptotic_sd = g_prime(μ) * s
fig, ax = plt.subplots(figsize=(10, 6))
xmin = -3 * g_prime(μ) * s
xmax = -xmin
ax.set_xlim(xmin, xmax)
ax.hist(error_obs, bins=60, alpha=0.5, density=True)
xgrid = np.linspace(xmin, xmax, 200)
lb = "$N(0, g'(\mu)^2 \sigma^2)$"
ax.plot(xgrid, norm.pdf(xgrid, scale=asymptotic_sd), 'k-', lw=2, label=lb)
ax.legend()
plt.show()
What happens when you replace $[0, \pi/2]$ with $[0, \pi]$?
In this case, the mean μ of this distribution is $\pi/2$, and since $g' = \cos$, we have $g'(\mu) = 0$
Hence the conditions of the delta theorem are not satisfied
24.7.2 Exercise 2
First we want to verify the claim that

$\sqrt{n} Q (\bar{\mathbf{X}}_n - \mu) \stackrel{d}{\to} N(0, I)$

This is straightforward given the facts presented in the exercise
Let

$\mathbf{Y}_n := \sqrt{n}(\bar{\mathbf{X}}_n - \mu) \quad \text{and} \quad \mathbf{Y} \sim N(0, \Sigma)$

By the multivariate CLT and the continuous mapping theorem, we have

$Q \mathbf{Y}_n \stackrel{d}{\to} Q \mathbf{Y}$

Since linear combinations of normal random variables are normal, the vector QY is also normal
Its mean is clearly 0, and its variance-covariance matrix is

$\mathrm{Var}[Q\mathbf{Y}] = Q \, \mathrm{Var}[\mathbf{Y}] \, Q' = Q \Sigma Q' = I$

In conclusion, $Q\mathbf{Y}_n \stackrel{d}{\to} Q\mathbf{Y} \sim N(0, I)$, which is what we aimed to show
Now we turn to the simulation exercise
Our solution is as follows
from scipy.stats import uniform, chi2
from scipy.linalg import inv, sqrtm
# == Set parameters == #
n = 250
replications = 50000
dw = uniform(loc=-1, scale=2)  # Uniform(-1, 1)
du = uniform(loc=-2, scale=4)  # Uniform(-2, 2)
sw, su = dw.std(), du.std()
vw, vu = sw**2, su**2
Σ = ((vw, vw), (vw, vw + vu))
Σ = np.array(Σ)
# == Compute Σ^{-1/2} == #
Q = inv(sqrtm(Σ))
# == Generate observations of the normalized sample mean == #
error_obs = np.empty((2, replications))
for i in range(replications):
    # == Generate one sequence of bivariate shocks == #
    X = np.empty((2, n))
    W = dw.rvs(n)
    U = du.rvs(n)
    # == Construct the n observations of the random vector == #
    X[0, :] = W
    X[1, :] = W + U
    # == Construct the i-th observation of sqrt(n) * (sample mean - μ) == #
    error_obs[:, i] = np.sqrt(n) * X.mean(axis=1)
# == Premultiply by Q and then take the squared norm == #
temp = Q @ error_obs
chisq_obs = np.sum(temp**2, axis=0)
# == Plot == #
fig, ax = plt.subplots(figsize=(10, 6))
xmax = 8
ax.set_xlim(0, xmax)
xgrid = np.linspace(0, xmax, 200)
lb = "Chi-squared with 2 degrees of freedom"
ax.plot(xgrid, chi2.pdf(xgrid, 2), 'k-', lw=2, label=lb)
ax.legend()
ax.hist(chisq_obs, bins=50, density=True)
plt.show()
25 Linear State Space Models
25.1 Contents
• Overview 25.2
• Prediction 25.7
• Code 25.8
• Exercises 25.9
• Solutions 25.10
"We may regard the present state of the universe as the effect of its past and the cause of its future" – Marquis de Laplace
In addition to whatโs in Anaconda, this lecture will need the following libraries
25.2 Overview
– non-financial income
– dividends on a stock
– the money supply
– a government deficit or surplus, etc.
25.3.1 Primitives
1. the matrices ๐ด, ๐ถ, ๐บ
2. shock distribution, which we have specialized to ๐ (0, ๐ผ)
3. the distribution of the initial condition ๐ฅ0 , which we have set to ๐ (๐0 , ฮฃ0 )
Given ๐ด, ๐ถ, ๐บ and draws of ๐ฅ0 and ๐ค1 , ๐ค2 , โฆ, the model Eq. (1) pins down the values of the
sequences {๐ฅ๐ก } and {๐ฆ๐ก }
Even without these draws, the primitives 1โ3 pin down the probability distributions of {๐ฅ๐ก }
and {๐ฆ๐ก }
Later weโll see how to compute these distributions and their moments
Martingale Difference Shocks
Weโve made the common assumption that the shocks are independent standardized normal
vectors
But some of what we say will be valid under the assumption that $\{w_{t+1}\}$ is a martingale difference sequence
A martingale difference sequence is a sequence that is zero mean when conditioned on past information
In the present case, since $\{x_t\}$ is our state sequence, this means that it satisfies

$\mathbb{E}[w_{t+1} \mid x_t, x_{t-1}, \ldots] = 0$

This is a weaker condition than that $\{w_t\}$ is IID with $w_{t+1} \sim N(0, I)$
25.3.2 Examples
To map Eq. (2), the second-order autoregression $y_{t+1} = \phi_0 + \phi_1 y_t + \phi_2 y_{t-1} + \sigma w_{t+1}$, into our state space system Eq. (1), we set

$x_t = \begin{bmatrix} 1 \\ y_t \\ y_{t-1} \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 0 & 0 \\ \phi_0 & \phi_1 & \phi_2 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ \sigma \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}$
You can confirm that under these definitions, Eq. (1) and Eq. (2) agree
The next figure shows the dynamics of this process when $\phi_0 = 1.1, \phi_1 = 0.8, \phi_2 = -0.8, y_0 = y_{-1} = 1$
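If you want to generate such a path yourself, a minimal sketch using the LinearStateSpace class from QuantEcon.py (introduced in the Code section below) might look as follows; the autoregressive parameters are those just given, while σ = 0.5 is an arbitrary choice for the shock scale

import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace

ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8
σ = 0.5                               # shock scale: an arbitrary choice here

A = [[1,   0,   0  ],
     [ϕ_0, ϕ_1, ϕ_2],
     [0,   1,   0  ]]
C = [[0], [σ], [0]]
G = [0, 1, 0]

ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))   # starts from y_0 = y_{-1} = 1
x, y = ar.simulate(ts_length=150)

plt.plot(y.flatten())
plt.show()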
More generally, a pth-order autoregression (here with p = 4) can be mapped into Eq. (1) with $x_t = \begin{bmatrix} y_t & y_{t-1} & y_{t-2} & y_{t-3} \end{bmatrix}'$ and

$A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} 1 & 0 & 0 & 0 \end{bmatrix}$

The matrix A has the form of the companion matrix to the vector $\begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \end{bmatrix}$
The next figure shows the dynamics of this process when $\phi_1 = 0.5, \phi_2 = -0.2, \phi_3 = 0, \phi_4 = 0.5, \sigma = 0.2, y_0 = y_{-1} = y_{-2} = y_{-3} = 1$
Vector Autoregressions
Now suppose that
• $y_t$ is a $k \times 1$ vector
• $\phi_j$ is a $k \times k$ matrix and
• $w_t$ is $k \times 1$

Then the same companion-matrix construction works, with identity and zero blocks in place of scalars:

$x_t = \begin{bmatrix} y_t \\ y_{t-1} \\ y_{t-2} \\ y_{t-3} \end{bmatrix}
\qquad
A = \begin{bmatrix} \phi_1 & \phi_2 & \phi_3 & \phi_4 \\ I & 0 & 0 & 0 \\ 0 & I & 0 & 0 \\ 0 & 0 & I & 0 \end{bmatrix}
\qquad
C = \begin{bmatrix} \sigma \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} I & 0 & 0 & 0 \end{bmatrix}$
Seasonals
The deterministic seasonal process $y_t = y_{t-4}$ can be represented with state $x_t = \begin{bmatrix} y_t & y_{t-1} & y_{t-2} & y_{t-3} \end{bmatrix}'$, no shock term, and

$A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
It is easy to check that $A^4 = I$, which implies that $x_t$ is strictly periodic with period 4:[1]

$x_{t+4} = x_t$
Such an ๐ฅ๐ก process can be used to model deterministic seasonals in quarterly time series
The indeterministic seasonal produces recurrent, but aperiodic, seasonal fluctuations
Time Trends
The model $y_t = a t + b$ is known as a linear time trend
We can represent this model in the linear state space form by taking

$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\qquad
G = \begin{bmatrix} a & b \end{bmatrix}$ (4)

and starting at initial condition $x_0 = \begin{bmatrix} 0 & 1 \end{bmatrix}'$
In fact, itโs possible to use the state-space system to represent polynomial trends of any order
For instance, let

$x_0 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
\qquad
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$
It follows that
$A^t = \begin{bmatrix} 1 & t & t(t-1)/2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{bmatrix}$
Then $x_t' = \begin{bmatrix} t(t-1)/2 & t & 1 \end{bmatrix}$, so that $x_t$ contains linear and quadratic time trends
Repeated application of Eq. (1) gives the moving average representation

$x_t = A x_{t-1} + C w_t = A^2 x_{t-2} + A C w_{t-1} + C w_t = \cdots = \sum_{j=0}^{t-1} A^j C w_{t-j} + A^t x_0$ (5)
To illustrate, consider the case

$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$

You will be able to show that $A^t = \begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ and $A^j C = \begin{bmatrix} 1 & 0 \end{bmatrix}'$

Substituting into the moving average representation Eq. (5), we obtain

$x_{1t} = \sum_{j=0}^{t-1} w_{t-j} + \begin{bmatrix} 1 & t \end{bmatrix} x_0$
Using Eq. (1), itโs easy to obtain expressions for the (unconditional) means of ๐ฅ๐ก and ๐ฆ๐ก
Weโll explain what unconditional and conditional mean soon
This is to distinguish ๐๐ก and ฮฃ๐ก from related objects that use conditioning information, to be
defined below
However, you should be aware that these โunconditionalโ moments do depend on the initial
distribution ๐ (๐0 , ฮฃ0 )
Moments of the Observations
Using linearity of expectations again we have
25.4.2 Distributions
In general, knowing the mean and variance-covariance matrix of a random vector is not quite
as good as knowing the full distribution
However, there are some situations where these moments alone tell us all we need to know
These are situations in which the mean vector and covariance matrix are sufficient statis-
tics for the population distribution
(Sufficient statistics form a list of objects that characterize a population distribution)
One such situation is when the vector in question is Gaussian (i.e., normally distributed)
This is the case here, given our Gaussian assumptions on the primitives and the linearity of Eq. (1)
In particular, we can see immediately that both $x_t$ and $y_t$ are Gaussian for all $t \geq 0$ [2]
Since $x_t$ is Gaussian, to find the distribution, all we need to do is find its mean and variance-covariance matrix
But in fact we've already done this, in Eq. (6) and Eq. (7)
Letting $\mu_t$ and $\Sigma_t$ be as defined by these equations, we have

$x_t \sim N(\mu_t, \Sigma_t)$ (11)
In the right-hand figure, these values are converted into a rotated histogram that shows rela-
tive frequencies from our sample of 20 ๐ฆ๐ โs
(The parameters and source code for the figures can be found in file lin-
ear_models/paths_and_hist.py)
Here is another figure, this time with 100 observations
Letโs now try with 500,000 observations, showing only the histogram (without rotation)
The black line is the population density of ๐ฆ๐ calculated from Eq. (12)
The histogram and population distribution are close, as expected
By looking at the figures and experimenting with parameters, you will gain a feel for how the
population distribution depends on the model primitives listed above, as intermediated by the
distributionโs sufficient statistics
Ensemble Means
In the preceding figure, we approximated the population distribution of $y_T$ by generating I sample paths and histogramming the resulting I observations of $y_T$
Just as the histogram approximates the population distribution, the ensemble or cross-sectional average

$\bar{y}_T := \frac{1}{I} \sum_{i=1}^I y_T^i$

approximates the expectation $\mathbb{E}[y_T] = G \mu_T$ (as implied by the law of large numbers)
Hereโs a simulation comparing the ensemble averages and population means at time points
๐ก = 0, โฆ , 50
The parameters are the same as for the preceding figures, and the sample size is relatively
small (๐ผ = 20)
For the state variables, similar laws of large numbers give

$\bar{x}_T := \frac{1}{I} \sum_{i=1}^I x_T^i \to \mu_T \qquad (I \to \infty)$

and

$\frac{1}{I} \sum_{i=1}^I (x_T^i - \bar{x}_T)(x_T^i - \bar{x}_T)' \to \Sigma_T \qquad (I \to \infty)$

Turning to joint distributions, the Markov structure of Eq. (1) implies that the joint density of a sample path factors as

$p(x_0, x_1, \ldots, x_T) = p(x_0) \prod_{t=0}^{T-1} p(x_{t+1} \mid x_t)$

where, by Gaussianity of the shocks,

$p(x_{t+1} \mid x_t) = N(A x_t, C C')$
Autocovariance Functions
An important object related to the joint distribution is the autocovariance function $\Sigma_{t+j,t} := \mathbb{E}[(x_{t+j} - \mu_{t+j})(x_t - \mu_t)']$
Elementary calculations show that

$\Sigma_{t+j,t} = A^j \Sigma_t$ (14)

Notice that $\Sigma_{t+j,t}$ in general depends on both j, the gap between the two dates, and t, the earlier date
Stationarity and ergodicity are two properties that, when they hold, greatly aid analysis of
linear state space models
Letโs start with the intuition
Letโs look at some more time series from the same model that we analyzed above
This picture shows cross-sectional distributions for ๐ฆ at times ๐ , ๐ โฒ , ๐ โณ
Note how the time series โsettle downโ in the sense that the distributions at ๐ โฒ and ๐ โณ are
relatively similar to each other โ but unlike the distribution at ๐
Apparently, the distributions of ๐ฆ๐ก converge to a fixed long-run distribution as ๐ก โ โ
When such a distribution exists it is called a stationary distribution
Since the present model is Gaussian, this stationary distribution takes the form

$\psi_\infty = N(\mu_\infty, \Sigma_\infty)$

where $\mu_\infty$ and $\Sigma_\infty$ are fixed points of Eq. (6) and Eq. (7) respectively
Let's see what happens to the preceding figure if we start $x_0$ at the stationary distribution
Now the differences in the observed distributions at T, T′ and T″ come entirely from random fluctuations due to the finite sample size
By choosing $x_0 \sim N(\mu_\infty, \Sigma_\infty)$ and using the definitions of $\mu_\infty$ and $\Sigma_\infty$ as fixed points of Eq. (6) and Eq. (7), we have ensured that $\mu_t = \mu_\infty$ and $\Sigma_t = \Sigma_\infty$ for all t
Moreover, in view of Eq. (14), the autocovariance function takes the form $\Sigma_{t+j,t} = A^j \Sigma_\infty$, which depends on j but not on t
This motivates the following definition
A process $\{x_t\}$ is said to be covariance stationary if
• both $\mu_t$ and $\Sigma_t$ are constant in t
• $\Sigma_{t+j,t}$ depends on the time gap j but not on time t
In our setting, $\{x_t\}$ will be covariance stationary if $\mu_0, \Sigma_0, A, C$ assume values that imply that none of $\mu_t, \Sigma_t, \Sigma_{t+j,t}$ depends on t
The difference equation Eq. (7) also has a unique fixed point in this case, and, moreover

$\mu_t \to \mu_\infty = 0 \quad \text{and} \quad \Sigma_t \to \Sigma_\infty \quad \text{as} \quad t \to \infty$
Processes with a constant state component arise when A and C take the form

$A = \begin{bmatrix} A_1 & a \\ 0 & 1 \end{bmatrix}
\qquad
C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix}$

where
• $A_1$ is an $(n-1) \times (n-1)$ matrix
• a is an $(n-1) \times 1$ column vector
Let $x_t = \begin{bmatrix} x_{1t}' & 1 \end{bmatrix}'$ where $x_{1t}$ is $(n-1) \times 1$
It follows that

$x_{1,t+1} = A_1 x_{1t} + a + C_1 w_{t+1}$

Let $\mu_{1t} = \mathbb{E}[x_{1t}]$ and take expectations on both sides of this expression to get

$\mu_{1,t+1} = A_1 \mu_{1t} + a$ (15)

Assume now that the moduli of the eigenvalues of $A_1$ are all strictly less than one
Then Eq. (15) has a unique stationary solution, namely,

$\mu_{1\infty} = (I - A_1)^{-1} a$

The stationary value of $\mu_t$ itself is then $\mu_\infty := \begin{bmatrix} \mu_{1\infty}' & 1 \end{bmatrix}'$
The stationary values of $\Sigma_t$ and $\Sigma_{t+j,t}$ satisfy

$\Sigma_\infty = A \Sigma_\infty A' + C C'$ (16)

$\Sigma_{t+j,t} = A^j \Sigma_\infty$
Notice that here $\Sigma_{t+j,t}$ depends on the time gap j but not on calendar time t
In conclusion, if
• $x_0 \sim N(\mu_\infty, \Sigma_\infty)$ and
• the moduli of the eigenvalues of $A_1$ are all strictly less than unity
then the $\{x_t\}$ process is covariance stationary, with constant state component
Note
If the eigenvalues of ๐ด1 are less than unity in modulus, then (a) starting from any
initial value, the mean and variance-covariance matrix both converge to their sta-
tionary values; and (b) iterations on Eq. (7) converge to the fixed point of the dis-
crete Lyapunov equation in the first line of Eq. (16)
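As an illustration of this note, here is a minimal sketch (with an arbitrarily chosen A and C satisfying the eigenvalue condition) showing that iterating on Eq. (7) and solving the discrete Lyapunov equation in Eq. (16) deliver the same $\Sigma_\infty$; it uses solve_discrete_lyapunov from QuantEcon.py

import numpy as np
import quantecon as qe

A = np.array([[0.8, -0.2],
              [1.0,  0.0]])        # example A: eigenvalue moduli well below 1
C = np.array([[0.5],
              [0.0]])

# Fixed point of Σ = A Σ A' + C C' (the first line of Eq. (16))
Σ_star = qe.solve_discrete_lyapunov(A, C @ C.T)

# Iterating on Eq. (7) from Σ_0 = 0 converges to the same matrix
Σ = np.zeros((2, 2))
for _ in range(500):
    Σ = A @ Σ @ A.T + C @ C.T

print(np.max(np.abs(Σ - Σ_star)))  # should be very close to zero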
25.5.5 Ergodicity
Consider the time series averages

$\bar{x}_T := \frac{1}{T} \sum_{t=1}^T x_t \qquad \text{and} \qquad \bar{y}_T := \frac{1}{T} \sum_{t=1}^T y_t$
Do these time series averages converge to something interpretable in terms of our basic state-
space representation?
The answer depends on something called ergodicity
Ergodicity is the property that time series and ensemble averages coincide
More formally, ergodicity implies that time series sample averages converge to their expectation under the stationary distribution
In particular,
• $\frac{1}{T} \sum_{t=1}^T x_t \to \mu_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_t - \bar{x}_T)(x_t - \bar{x}_T)' \to \Sigma_\infty$
• $\frac{1}{T} \sum_{t=1}^T (x_{t+j} - \bar{x}_T)(x_t - \bar{x}_T)' \to A^j \Sigma_\infty$
In our linear Gaussian setting, any covariance stationary process is also ergodic
25.6 Noisy Observations
In some settings, the observation equation $y_t = G x_t$ is modified to include an error term
Often this error term represents the idea that the true state can only be observed imperfectly
To include an error term in the observation we introduce

$y_t = G x_t + H v_t \quad \text{where } \{v_t\} \text{ is IID with } v_t \sim N(0, I)$

and $\{v_t\}$ is independent of $\{w_t\}$, so that

$y_t \sim N(G \mu_t, G \Sigma_t G' + H H')$
25.7 Prediction
The theory of prediction for linear state space systems is elegant and simple
The natural way to predict variables is to use conditional distributions; for example, the optimal forecast of $x_{t+1}$ given time-t information is

$\mathbb{E}_t[x_{t+1}] := \mathbb{E}[x_{t+1} \mid x_t, x_{t-1}, \ldots, x_0] = A x_t$

The right-hand side follows from $x_{t+1} = A x_t + C w_{t+1}$ and the fact that $w_{t+1}$ is zero mean and independent of $x_t, x_{t-1}, \ldots, x_0$
That $\mathbb{E}_t[x_{t+1}] = \mathbb{E}[x_{t+1} \mid x_t]$ is an implication of $\{x_t\}$ having the Markov property
More generally, we'd like to compute the j-step ahead forecasts $\mathbb{E}_t[x_{t+j}]$ and $\mathbb{E}_t[y_{t+j}]$
With a bit of algebra, we obtain

$x_{t+j} = A^j x_t + A^{j-1} C w_{t+1} + A^{j-2} C w_{t+2} + \cdots + C w_{t+j}$

In view of the IID property, current and past state values provide no information about future values of the shock
Hence $\mathbb{E}_t[w_{t+k}] = \mathbb{E}[w_{t+k}] = 0$
It now follows from linearity of expectations that the j-step ahead forecast of x is

$\mathbb{E}_t[x_{t+j}] = A^j x_t$
It is useful to obtain the covariance matrix of the vector of j-step-ahead prediction errors

$x_{t+j} - \mathbb{E}_t[x_{t+j}] = \sum_{s=0}^{j-1} A^s C w_{t-s+j}$ (20)

Evidently,

$V_j := \mathbb{E}_t\left[ (x_{t+j} - \mathbb{E}_t[x_{t+j}]) (x_{t+j} - \mathbb{E}_t[x_{t+j}])' \right] = \sum_{k=0}^{j-1} A^k C C' (A^k)'$ (21)
$V_j$ is the conditional covariance matrix of the errors in forecasting $x_{t+j}$, conditioned on time-t information $x_t$
Under particular conditions, $V_j$ converges to

$V_\infty = C C' + A V_\infty A'$ (23)
Equation Eq. (23) is an example of a discrete Lyapunov equation in the covariance matrix ๐โ
A sufficient condition for ๐๐ to converge is that the eigenvalues of ๐ด be strictly less than one
in modulus
Weaker sufficient conditions for convergence associate eigenvalues equaling or exceeding one
in modulus with elements of ๐ถ that equal 0
In several contexts, we want to compute forecasts of geometric sums of future random vari-
ables governed by the linear state-space system Eq. (1)
We want the following objects
• Forecast of a geometric sum of future x's, or $\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j x_{t+j}\right]$
• Forecast of a geometric sum of future y's, or $\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j y_{t+j}\right]$
These objects are important components of some famous and interesting dynamic models
For example,
• if $\{y_t\}$ is a stream of dividends, then $\mathbb{E}\left[\sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t\right]$ is a model of a stock price
• if $\{y_t\}$ is the money supply, then $\mathbb{E}\left[\sum_{j=0}^\infty \beta^j y_{t+j} \mid x_t\right]$ is a model of the price level
Formulas
Fortunately, it is easy to use a little matrix algebra to compute these objects
Suppose that every eigenvalue of A has modulus strictly less than $\frac{1}{\beta}$
It then follows that $I + \beta A + \beta^2 A^2 + \cdots = [I - \beta A]^{-1}$
This leads to our formulas:

$\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j x_{t+j}\right] = [I + \beta A + \beta^2 A^2 + \cdots] x_t = [I - \beta A]^{-1} x_t$

$\mathbb{E}_t\left[\sum_{j=0}^\infty \beta^j y_{t+j}\right] = G[I + \beta A + \beta^2 A^2 + \cdots] x_t = G[I - \beta A]^{-1} x_t$
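In code, these forecasts amount to a single linear solve; the following sketch uses arbitrary example values for β, A, G and the current state $x_t$

import numpy as np

β = 0.96
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])            # example A with spectral radius < 1/β
G = np.array([[1.0, 0.0]])
x_t = np.array([1.0, 0.5])            # current state (example values)

# E_t[sum_j β^j x_{t+j}] = (I - βA)^{-1} x_t
expected_x_sum = np.linalg.solve(np.eye(2) - β * A, x_t)
expected_y_sum = G @ expected_x_sum   # E_t[sum_j β^j y_{t+j}]
print(expected_x_sum, expected_y_sum)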
25.8 Code
Our preceding simulations and calculations are based on code in the file lss.py from the
QuantEcon.py package
The code implements a class for handling linear state space models (simulations, calculating
moments, etc.)
One Python construct you might not be familiar with is the use of a generator function in the
method moment_sequence()
Go back and read the relevant documentation if youโve forgotten how generator functions
work
Examples of usage are given in the solutions to the exercises
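As a quick illustration of the generator (with an arbitrary scalar model; the exercise solutions below use it on the AR(4) system), one might write

import numpy as np
from quantecon import LinearStateSpace

ss = LinearStateSpace([[0.8]], [[1.0]], [[1.0]])   # simple scalar example
m = ss.moment_sequence()                           # a generator object

for t in range(3):
    μ_x, μ_y, Σ_x, Σ_y = next(m)                   # unconditional moments at date t
    print(float(μ_y), float(Σ_y))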
25.9 Exercises
25.9.1 Exercise 1
25.9.2 Exercise 2
25.9.3 Exercise 3
25.9.4 Exercise 4
25.10 Solutions
In [2]: import numpy as np
import matplotlib.pyplot as plt
from quantecon import LinearStateSpace
25.10.1 Exercise 1
In [3]: ϕ_0, ϕ_1, ϕ_2 = 1.1, 0.8, -0.8
σ = 1  # shock scale (value not shown in the extracted text)
A = [[1, 0, 0], [ϕ_0, ϕ_1, ϕ_2], [0, 1, 0]]
C = [[0], [σ], [0]]
G = [0, 1, 0]
ar = LinearStateSpace(A, C, G, mu_0=np.ones(3))
x, y = ar.simulate(ts_length=50)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(y.flatten())
plt.show()
25.10.2 Exercise 2
In [4]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.2
A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
x, y = ar.simulate(ts_length=200)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(y.flatten())
plt.show()
25.10.3 Exercise 3
In [5]: from scipy.stats import norm
import random
I = 20
T = 50
# A, C, G and σ are as defined in Exercise 2 above
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
fig, ax = plt.subplots(figsize=(10, 6))
ymin, ymax = -0.5, 1.15
ax.set_ylim(ymin, ymax)
ax.set_xlabel('time', fontsize=16)
ax.set_ylabel('$y_t$', fontsize=16)
ensemble_mean = np.zeros(T)
for i in range(I):
x, y = ar.simulate(ts_length=T)
y = y.flatten()
ax.plot(y, 'c-', lw=0.8, alpha=0.5)
ensemble_mean = ensemble_mean + y
ensemble_mean = ensemble_mean / I
ax.plot(ensemble_mean, color='b', lw=2, alpha=0.8, label='$\\bar y_t$')
m = ar.moment_sequence()
population_means = []
for t in range(T):
    μ_x, μ_y, Σ_x, Σ_y = next(m)
    population_means.append(float(μ_y))
ax.plot(population_means, color='g', lw=2, alpha=0.8, label='$G\mu_t$')
ax.legend(ncol=2)
plt.show()
25.10.4 Exercise 4
In [6]: ϕ_1, ϕ_2, ϕ_3, ϕ_4 = 0.5, -0.2, 0, 0.5
σ = 0.1
A = [[ϕ_1, ϕ_2, ϕ_3, ϕ_4], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
C = [[σ], [0], [0], [0]]
G = [1, 0, 0, 0]
ar = LinearStateSpace(A, C, G, mu_0=np.ones(4))
T0 = 10
T1 = 50
T2 = 75
T4 = 100
fig, ax = plt.subplots(figsize=(10, 6))
ymin, ymax = -0.8, 1.25
ax.grid(alpha=0.4)
ax.set_ylim(ymin, ymax)
ax.set_ylabel('$y_t$', fontsize=16)
ax.vlines((T0, T1, T2), -1.5, 1.5)
for i in range(80):
rcolor = random.choice(('c', 'g', 'b'))
x, y = ar.simulate(ts_length=T4)
y = y.flatten()
ax.plot(y, color=rcolor, lw=0.8, alpha=0.5)
ax.plot((T0, T1, T2), (y[T0], y[T1], y[T2],), 'ko', alpha=0.5)
plt.show()
Footnotes
[1] The eigenvalues of ๐ด are (1, โ1, ๐, โ๐).
[2] The correct way to argue this is by induction. Suppose that ๐ฅ๐ก is Gaussian. Then Eq. (1)
and Eq. (10) imply that ๐ฅ๐ก+1 is Gaussian. Since ๐ฅ0 is assumed to be Gaussian, it follows that
every ๐ฅ๐ก is Gaussian. Evidently, this implies that each ๐ฆ๐ก is Gaussian.
26 Finite Markov Chains
26.1 Contents
• Overview 26.2
• Definitions 26.3
• Simulation 26.4
• Ergodicity 26.8
• Exercises 26.10
• Solutions 26.11
In addition to whatโs in Anaconda, this lecture will need the following libraries
26.2 Overview
Markov chains are one of the most useful classes of stochastic processes, being simple, flexible and supported by many elegant theoretical results
You will find them in many of the workhorse models of economics and finance
In this lecture, we review some of the theory of Markov chains
We will also introduce some of the high-quality routines for working with Markov chains
available in QuantEcon.py
Prerequisite knowledge is basic probability and linear algebra
26.3 Definitions
A stochastic matrix (or Markov matrix) is an $n \times n$ square matrix P whose elements are nonnegative and whose rows each sum to one
Each row of P can be regarded as a probability mass function over n possible outcomes
It is not difficult to check [1] that if P is a stochastic matrix, then so is the k-th power $P^k$ for all $k \in \mathbb{N}$
In other words, knowing the current state is enough to know probabilities for future states
In particular, the dynamics of a Markov chain are fully determined by the set of values

$P(x, y) := \mathbb{P}\{X_{t+1} = y \mid X_t = x\} \qquad (x, y \in S)$

By construction,
• $P(x, y)$ is the probability of going from x to y in one unit of time (one step)
• $P(x, \cdot)$ is the conditional distribution of $X_{t+1}$ given $X_t = x$
We can also view P as a stochastic matrix whose (i, j)-th element is

$P_{ij} = P(x_i, x_j) \qquad 1 \leq i, j \leq n$
Going the other way, if we take a stochastic matrix P, we can generate a Markov chain $\{X_t\}$ as follows:
26.3.3 Example 1
Consider a worker who, at any given time ๐ก, is either unemployed (state 0) or employed (state
1)
Suppose that, over a one month period,
1. an unemployed worker finds a job with probability $\alpha \in (0, 1)$
2. an employed worker loses her job with probability $\beta \in (0, 1)$
In terms of a Markov model, we have
• $S = \{0, 1\}$
• $P(0, 1) = \alpha$ and $P(1, 0) = \beta$
Writing out the transition probabilities in matrix form gives

$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$
Once we have the values α and β, we can address a range of questions, such as
26.3.4 Example 2
$P = \begin{pmatrix} 0.971 & 0.029 & 0 \\ 0.145 & 0.778 & 0.077 \\ 0 & 0.508 & 0.492 \end{pmatrix}$
where
• the frequency is monthly
• the first state represents "normal growth"
• the second state represents "mild recession"
• the third state represents "severe recession"
For example, the matrix tells us that when the state is normal growth, the state will again be
normal growth next month with probability 0.97
In general, large values on the main diagonal indicate persistence in the process {๐๐ก }
This Markov process can also be represented as a directed graph, with edges labeled by tran-
sition probabilities
26.4 Simulation
One natural way to answer questions about Markov chains is to simulate them
(To approximate the probability of event ๐ธ, we can simulate many times and count the frac-
tion of times that ๐ธ occurs)
Nice functionality for simulating Markov chains exists in QuantEcon.py
โข Efficient, bundled with lots of other useful routines for handling Markov chains
However, itโs also a good exercise to roll our own routines โ letโs do that first and then come
back to the methods in QuantEcon.py
In these exercises, we'll take the state space to be $S = 0, \ldots, n-1$
To simulate a Markov chain, we need its stochastic matrix P and either an initial state or a probability distribution ψ for the initial state to be drawn from
The Markov chain is then constructed as discussed above. To repeat:
In order to implement this simulation procedure, we need a method for generating draws from
a discrete distribution
For this task, weโll use DiscreteRV from QuantEcon
Weโll write our code as a function that takes the following three arguments
โข A stochastic matrix P
โข An initial state init
โข A positive integer sample_size representing the length of the time series the function
should return
def mc_sample_path(P, init=0, sample_size=1000):
    # (assumes numpy as np and quantecon as qe are imported, as earlier in the lecture)
    # === make sure P is a NumPy array === #
    P = np.asarray(P)
    # === allocate memory === #
    X = np.empty(sample_size, dtype=int)
    X[0] = init
    # === convert each row of P into a distribution === #
    # P_dist[i] is the distribution corresponding to P[i, :]
    n = len(P)
    P_dist = [qe.DiscreteRV(P[i, :]) for i in range(n)]
    # === generate the sample path, drawing X[t+1] from row P[X[t], :] === #
    for t in range(sample_size - 1):
        X[t+1] = P_dist[X[t]].draw()
    return X
As a small test case, consider

$P := \begin{pmatrix} 0.4 & 0.6 \\ 0.2 & 0.8 \end{pmatrix}$ (3)
As weโll see later, for a long series drawn from P, the fraction of the sample that takes value 0
will be about 0.25
If you run the following code you should get roughly that answer

In [4]: P = [[0.4, 0.6], [0.2, 0.8]]
        X = mc_sample_path(P, sample_size=100000)
        np.mean(X == 0)

Out[4]: 0.25109
As discussed above, QuantEcon.py has routines for handling Markov chains, including simula-
tion
Here's an illustration using the same P as the preceding example

In [5]: mc = qe.MarkovChain(P)
        X = mc.simulate(ts_length=1000000)
        np.mean(X == 0)

Out[5]: 0.249741
678 ms ยฑ 9.12 ms per loop (mean ยฑ std. dev. of 7 runs, 1 loop each)
30.2 ms ยฑ 396 ยตs per loop (mean ยฑ std. dev. of 7 runs, 10 loops each)
If we want to simulate with output as indices rather than state values we can use
In [11]: mc.simulate_indices(ts_length=4)
Suppose that $\{X_t\}$ is a Markov chain with stochastic matrix P and that the distribution of $X_t$ is known to be $\psi_t$
What, then, is the distribution of $X_{t+1}$, or, more generally, of $X_{t+m}$?
26.5.1 Solution
In words, to get the probability of being at ๐ฆ tomorrow, we account for all ways this can hap-
pen and sum their probabilities
Rewriting this statement in terms of marginal and conditional probabilities gives
$\psi_{t+1}(y) = \sum_{x \in S} P(x, y) \psi_t(x)$

There are n such equations, one for each $y \in S$
If we think of $\psi_{t+1}$ and $\psi_t$ as row vectors, these equations are summarized by

$\psi_{t+1} = \psi_t P$ (4)

In other words, to move the distribution forward one unit of time, we postmultiply by P
By repeating this m times we move forward m steps into the future
Hence, iterating on Eq. (4), the expression $\psi_{t+m} = \psi_t P^m$ is also valid; here $P^m$ is the m-th power of P
As a special case, we see that if $\psi_0$ is the initial distribution from which $X_0$ is drawn, then $\psi_0 P^m$ is the distribution of $X_m$
This is very important, so let's repeat it

$X_0 \sim \psi_0 \implies X_m \sim \psi_0 P^m$ (5)

and, more generally,

$X_t \sim \psi_t \implies X_{t+m} \sim \psi_t P^m$ (6)

(If the current state is known to be x, then $\psi_t$ is the row vector with all of its mass on x)
Inserting this into Eq. (6), we see that, conditional on $X_t = x$, the distribution of $X_{t+m}$ is the x-th row of $P^m$
In particular,

$\mathbb{P}\{X_{t+m} = y \mid X_t = x\} = P^m(x, y)$
Recall the stochastic matrix P for recession and growth considered above
Suppose that the current state is unknown; perhaps statistics are available only at the end of the current month
We estimate the probability that the economy is in state x to be $\psi(x)$
The probability of being in recession (either mild or severe) in 6 months time is given by the inner product
$\psi P^6 \cdot \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}$
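For concreteness, here is this calculation in NumPy, using the recession/growth matrix from Example 2 and an arbitrary current estimate ψ (the value of ψ is purely illustrative)

import numpy as np

P = np.array([[0.971, 0.029, 0.0],
              [0.145, 0.778, 0.077],
              [0.0,   0.508, 0.492]])
ψ = np.array([0.2, 0.4, 0.4])            # hypothetical estimate of the current distribution

ψ_6 = ψ @ np.linalg.matrix_power(P, 6)   # distribution 6 months ahead
prob_recession = ψ_6 @ np.array([0, 1, 1])
print(prob_recession)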
The marginal distributions we have been studying can be viewed either as probabilities or as
cross-sectional frequencies in large samples
To illustrate, recall our model of employment/unemployment dynamics for a given worker
discussed above
Consider a large (i.e., tending to infinite) population of workers, each of whose lifetime expe-
rience is described by the specified dynamics, independent of one another
Let ψ be the current cross-sectional distribution over $\{0, 1\}$
The cross-sectional distribution records the fractions of workers employed and unemployed at
a given moment
The same distribution also describes the fractions of a particular workerโs career spent being
employed and unemployed, respectively
Irreducibility and aperiodicity are central concepts of modern Markov chain theory
Letโs see what theyโre about
26.6.1 Irreducibility
The stochastic matrix P is called irreducible if all states communicate; that is, if x and y communicate for all (x, y) in $S \times S$
(Here two states x and y are said to communicate if each can eventually be reached from the other with positive probability)
For example, consider the following transition probabilities for wealth of a fictitious set of
households
We can translate this into a stochastic matrix, putting zeros where thereโs no edge between
nodes
$P := \begin{pmatrix} 0.9 & 0.1 & 0 \\ 0.4 & 0.4 & 0.2 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$
Itโs clear from the graph that this stochastic matrix is irreducible: we can reach any state
from any other state eventually
We can also test this using QuantEcon.pyโs MarkovChain class
Out[12]: True
Hereโs a more pessimistic scenario, where the poor are poor forever
This stochastic matrix is not irreducible, since, for example, rich is not accessible from poor
Letโs confirm this
Out[13]: False
In [14]: mc.communication_classes
It might be clear to you already that irreducibility is going to be important in terms of long
run outcomes
For example, poverty is a life sentence in the second graph but not the first
Weโll come back to this a bit later
26.6.2 Aperiodicity
Loosely speaking, a Markov chain is called periodic if it cycles in a predictable way, and aperiodic otherwise
Hereโs a trivial example with three states
mc = qe.MarkovChain(P)
mc.period
Out[15]: 3
More formally, the period of a state x is the greatest common divisor of the set of integers

$D(x) := \{ j \geq 1 : P^j(x, x) > 0 \}$

In the last example, $D(x) = \{3, 6, 9, \ldots\}$ for every state x, so the period is 3
A stochastic matrix is called aperiodic if the period of every state is 1, and periodic other-
wise
For example, the stochastic matrix associated with the transition probabilities below is peri-
odic because, for example, state ๐ has period 2
mc = qe.MarkovChain(P)
mc.period
Out[16]: 2
In [17]: mc.is_aperiodic
Out[17]: False
As seen in Eq. (4), we can shift probabilities forward one unit of time via postmultiplication
by ๐
Some distributions are invariant under this updating process; such distributions are called stationary
Formally, a distribution ψ* on S is called stationary for P if $\psi^* = \psi^* P$
For example, if P is the identity matrix, then all distributions are stationary
Since stationary distributions are long run equilibria, to get uniqueness we require that initial
conditions are not infinitely persistent
Infinite persistence of initial conditions occurs if certain regions of the state space cannot be
accessed from other regions, which is the opposite of irreducibility
This gives some intuition for the following fundamental theorem
Theorem. If P is both aperiodic and irreducible, then
1. P has exactly one stationary distribution ψ*
2. for any initial distribution $\psi_0$, we have $\| \psi_0 P^t - \psi^* \| \to 0$ as $t \to \infty$
A stochastic matrix satisfying the conditions of the theorem is sometimes called uniformly ergodic
26.7.1 Example
Recall our model of employment/unemployment dynamics for a given worker discussed above
Assuming $\alpha \in (0, 1)$ and $\beta \in (0, 1)$, the uniform ergodicity condition is satisfied
Let $\psi^* = (p, 1-p)$ be the stationary distribution, so that p corresponds to unemployment (state 0)
Using $\psi^* = \psi^* P$ and a bit of algebra yields

$p = \frac{\beta}{\alpha + \beta}$
This is, in some sense, a steady state probability of unemployment; more on interpretation below
Not surprisingly it tends to zero as $\beta \to 0$, and to one as $\alpha \to 0$
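To check this numerically, one can compute the stationary distribution with QuantEcon.py's MarkovChain class; the values of α and β below are arbitrary

import numpy as np
import quantecon as qe

α, β = 0.1, 0.05
P = np.array([[1 - α, α],
              [β, 1 - β]])

mc = qe.MarkovChain(P)
print(mc.stationary_distributions[0])   # should equal (β/(α+β), α/(α+β))
print(β / (α + β))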
As discussed above, a given Markov matrix P can have many stationary distributions
That is, there can be many row vectors ψ such that $\psi = \psi P$
In fact if P has two distinct stationary distributions $\psi_1, \psi_2$ then it has infinitely many, since in this case, as you can verify,

$\psi_3 := \lambda \psi_1 + (1 - \lambda) \psi_2$

is a stationary distribution for P for any $\lambda \in [0, 1]$
Part 2 of the Markov chain convergence theorem stated above tells us that the distribution of
๐๐ก converges to the stationary distribution regardless of where we start off
This adds considerable weight to our interpretation of ψ* as a stochastic steady state
The convergence in the theorem is illustrated in the next figure
mc = qe.MarkovChain(P)
ψ_star = mc.stationary_distributions[0]
ax.scatter(ψ_star[0], ψ_star[1], ψ_star[2], c='k', s=60)
plt.show()
Here
The code for the figure can be found here โ you might like to try experimenting with differ-
ent initial conditions
26.8 Ergodicity
$\frac{1}{m} \sum_{t=1}^m \mathbf{1}\{X_t = x\} \to \psi^*(x) \qquad \text{as } m \to \infty$ (7)
Here $\mathbf{1}\{X_t = x\} = 1$ if $X_t = x$ and zero otherwise
The result tells us that the fraction of time the chain spends at state ๐ฅ converges to ๐โ (๐ฅ) as
time goes to infinity
This gives us another way to interpret the stationary distribution โ provided that the con-
vergence result in Eq. (7) is valid
The convergence in Eq. (7) is a special case of a law of large numbers result for Markov
chains โ see EDTC, section 4.3.4 for some additional information
26.8.1 Example
Recall that, for the employment/unemployment model, the stationary probability of unemployment was

$p = \frac{\beta}{\alpha + \beta}$
Sometimes we want to compute unconditional expectations of the form

$\mathbb{E}[h(X_t)]$ (8)

and conditional expectations such as

$\mathbb{E}[h(X_{t+k}) \mid X_t = x]$ (9)

where h is a given function on S, which we regard as the column vector

$h = \begin{pmatrix} h(x_1) \\ \vdots \\ h(x_n) \end{pmatrix}$
The unconditional expectation Eq. (8) is easy: we just sum over the distribution of $X_t$ to get

$\mathbb{E}[h(X_t)] = (\psi P^t) h$

where ψ is the distribution of $X_0$
For the conditional expectation Eq. (9), we need to sum over the conditional distribution of $X_{t+k}$ given $X_t = x$
We already know that this is $P^k(x, \cdot)$, so

$\mathbb{E}[h(X_{t+k}) \mid X_t = x] = (P^k h)(x)$

Similarly, for discounted geometric sums of payoffs we have

$\mathbb{E}\left[ \sum_{j=0}^\infty \beta^j h(X_{t+j}) \mid X_t = x \right] = [(I - \beta P)^{-1} h](x)$
where

$(I - \beta P)^{-1} = I + \beta P + \beta^2 P^2 + \cdots$
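In practice this expectation is just a linear solve; here is a small sketch with an arbitrary P, β and payoff vector h

import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
β = 0.95
h = np.array([1.0, 0.0])     # example payoff: 1 in state 0, 0 in state 1

# v(x) = E[ sum_j β^j h(X_{t+j}) | X_t = x ] for each state x
v = np.linalg.solve(np.eye(2) - β * P, h)
print(v)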
26.10 Exercises
26.10.1 Exercise 1
According to the discussion above, if a worker's employment dynamics obey the stochastic matrix

$P = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}$

with $\alpha \in (0, 1)$ and $\beta \in (0, 1)$, then, in the long run, the fraction of time spent unemployed will be

$p := \frac{\beta}{\alpha + \beta}$

In other words, if $\{X_t\}$ represents the Markov chain for employment, then $\bar{X}_m \to p$ as $m \to \infty$, where

$\bar{X}_m := \frac{1}{m} \sum_{t=1}^m \mathbf{1}\{X_t = 0\}$

Your exercise is to illustrate this convergence by simulating time series for $\{X_t\}$ and plotting $\bar{X}_m - p$ against m
(You don't need to add the fancy touches to the graph; see the solution if you're interested)
26.10.2 Exercise 2
Now letโs think about which pages are likely to be important, in the sense of being valuable
to a search engine user
One possible criterion for the importance of a page is the number of inbound links โ an indi-
cation of popularity
By this measure, m and j are the most important pages, with 5 inbound links each
However, what if the pages linking to m, say, are not themselves important?
Thinking this way, it seems appropriate to weight the inbound nodes by relative importance
The PageRank algorithm does precisely this
A slightly simplified presentation that captures the basic idea is as follows
Letting j be (the integer index of) a typical page and $r_j$ be its ranking, we set

$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i}$

where
• $\ell_i$ is the total number of outbound links from page i
• $L_j$ is the set of all pages i that link to page j
This is a measure of the number of inbound links, weighted by their own ranking (and normalized by $1/\ell_i$)
There is, however, another interpretation, and it brings us back to Markov chains
Let P be the matrix given by $P(i, j) = \mathbf{1}\{i \to j\}/\ell_i$ where $\mathbf{1}\{i \to j\} = 1$ if i has a link to j and zero otherwise
The matrix P is a stochastic matrix provided that each page has at least one link
With this notation, the defining equation for the rankings can be rewritten as

$r_j = \sum_{i \in L_j} \frac{r_i}{\ell_i} = \sum_{\text{all } i} \mathbf{1}\{i \to j\} \frac{r_i}{\ell_i} = \sum_{\text{all } i} P(i, j) r_i$

Writing r for the row vector of rankings, this becomes $r = r P$, so r is a stationary distribution of P
Thus, motion from page to page is that of a web surfer who moves from one page to another
by randomly clicking on one of the links on that page
Here โrandomโ means that each link is selected with equal probability
Since r is the stationary distribution of P, assuming that the uniform ergodicity condition is valid, we can interpret $r_j$ as the fraction of time that a (very persistent) random surfer spends at page j
Your exercise is to apply this ranking algorithm to the graph pictured above and return the
list of pages ordered by rank
The data for this graph is in the web_graph_data.txt file โ you can also view it here
There is a total of 14 nodes (i.e., web pages), the first named a and the last named n
A typical line from the file has the form
d -> h;
In [21]: import re
When you solve for the ranking, you will find that the highest ranked node is in fact g, while
the lowest is a
26.10.3 Exercise 3
Consider the AR(1) process $y_{t+1} = \rho y_t + u_{t+1}$, where $\{u_t\}$ is IID with $u_t \sim N(0, \sigma_u^2)$ and $|\rho| < 1$; its stationary variance is

$\sigma_y^2 := \frac{\sigma_u^2}{1 - \rho^2}$
Tauchenโs method [128] is the most common method for approximating this continuous state
process with a finite state Markov chain
A routine for this already exists in QuantEcon.py but letโs write our own version as an exer-
cise
As a first step, we choose
• n, the number of states for the discrete approximation
• m, an integer that parameterizes the width of the state space
Next, we create a state space $\{x_0, \ldots, x_{n-1}\} \subset \mathbb{R}$ and a stochastic $n \times n$ matrix P such that
• $x_0 = -m \, \sigma_y$
• $x_{n-1} = m \, \sigma_y$
• $x_{i+1} = x_i + s$ where $s = (x_{n-1} - x_0)/(n - 1)$
Let F be the cumulative distribution function of the normal distribution $N(0, \sigma_u^2)$
The values $P(x_i, x_j)$ are computed to approximate the AR(1) process; omitting the derivation, the rules are as follows:
1. If $j = 0$, then set

$P(x_i, x_j) = P(x_i, x_0) = F(x_0 - \rho x_i + s/2)$

2. If $j = n - 1$, then set

$P(x_i, x_j) = P(x_i, x_{n-1}) = 1 - F(x_{n-1} - \rho x_i - s/2)$

3. Otherwise, set

$P(x_i, x_j) = F(x_j - \rho x_i + s/2) - F(x_j - \rho x_i - s/2)$
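A self-contained sketch of such a routine, written directly from the rules above (a rough version, not the QuantEcon.py implementation), might look like this

import numpy as np
from scipy.stats import norm

def approx_markov(ρ, σ_u, m=3, n=7):
    """
    Tauchen-style approximation: returns a state vector x and an n x n
    stochastic matrix P approximating the AR(1) process y' = ρ y + u.
    """
    F = norm(scale=σ_u).cdf
    σ_y = np.sqrt(σ_u**2 / (1 - ρ**2))    # stationary std of y_t
    x = np.linspace(-m * σ_y, m * σ_y, n)
    s = x[1] - x[0]                       # step size between grid points
    P = np.empty((n, n))
    for i in range(n):
        P[i, 0] = F(x[0] - ρ * x[i] + s / 2)
        P[i, n-1] = 1 - F(x[n-1] - ρ * x[i] - s / 2)
        for j in range(1, n-1):
            P[i, j] = F(x[j] - ρ * x[i] + s / 2) - F(x[j] - ρ * x[i] - s / 2)
    return x, P

x_vals, P = approx_markov(0.9, 1.0)       # example usage
print(P.sum(axis=1))                      # each row sums to one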
26.11 Solutions
26.11.1 Exercise 1
Compute the fraction of time that the worker spends unemployed, and compare it to the sta-
tionary probability
In [24]: α = β = 0.1
N = 10000
p = β / (α + β)
P = np.array([[1 - α, α],
              [β, 1 - β]])
mc = qe.MarkovChain(P)
fig, ax = plt.subplots(figsize=(9, 6))
ax.hlines(0, 0, N, lw=2, alpha=0.6)   # horizontal line at zero
for x0, col in ((0, 'blue'), (1, 'green')):
    # Generate a time series starting from x0 and compute the running
    # fraction of time spent unemployed
    X = mc.simulate(N, init=x0)
    X_bar = (X == 0).cumsum() / (1 + np.arange(N, dtype=float))
    ax.plot(X_bar - p, color=col, label=f'$X_0 = {x0}$')
ax.legend(loc='upper right')
plt.show()
26.11.2 Exercise 2
First, save the data into a file called web_graph_data.txt by executing the next cell
m -> g;
n -> c;
n -> j;
n -> m;
Writing web_graph_data.txt
In [26]: """
Return list of pages, ordered by rank
"""
import numpy as np
from operator import itemgetter
infile = 'web_graph_data.txt'
alphabet = 'abcdefghijklmnopqrstuvwxyz'
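The remainder of this solution cell can be sketched as follows (assuming numpy as np and quantecon as qe are available, as elsewhere in this lecture); the print statements produce the ranking output shown below

import numpy as np
import quantecon as qe

n = 14  # total number of web pages (nodes)

# Create an adjacency matrix Q: Q[i, j] = 1 if there is a link from i to j
Q = np.zeros((n, n), dtype=int)
with open(infile) as f:
    edges = f.readlines()
for edge in edges:
    from_node, to_node = re.findall('\w', edge)
    i, j = alphabet.index(from_node), alphabet.index(to_node)
    Q[i, j] = 1

# Convert Q into a stochastic matrix P by normalizing each row
P = np.empty((n, n))
for i in range(n):
    P[i, :] = Q[i, :] / Q[i, :].sum()

# The rankings are the stationary distribution of P
mc = qe.MarkovChain(P)
r = mc.stationary_distributions[0]

# Print pages ordered by rank, highest first
print('Rankings\n***')
for i in r.argsort()[::-1]:
    print(f'{alphabet[i]}: {r[i]:.4}')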
Rankings
***
g: 0.1607
j: 0.1594
m: 0.1195
n: 0.1088
k: 0.09106
b: 0.08326
e: 0.05312
i: 0.05312
c: 0.04834
h: 0.0456
l: 0.03202
d: 0.03056
f: 0.01164
a: 0.002911
26.11.3 Exercise 3
[1] Hint: First show that if P and Q are stochastic matrices then so is their product; to check the row sums, try postmultiplying by a column vector of ones. Finally, argue that $P^n$ is a stochastic matrix using induction.
27 Continuous State Markov Chains
27.1 Contents
• Overview 27.2
• Stability 27.5
• Exercises 27.6
• Solutions 27.7
• Appendix 27.8
In addition to whatโs in Anaconda, this lecture will need the following libraries
27.2 Overview
In a previous lecture, we learned about finite Markov chains, a relatively elementary class of
stochastic dynamic models
The present lecture extends this analysis to continuous (i.e., uncountable) state Markov
chains
Most stochastic dynamic models studied by economists either fit directly into this class or can
be represented as continuous state Markov chains after minor modifications
In this lecture, our focus will be on continuous Markov models that
โข evolve in discrete-time
โข are often nonlinear
The fact that we accommodate nonlinear models here is significant, because linear stochastic
models have their own highly developed toolset, as weโll see later on
The question that interests us most is: Given a particular stochastic dynamic model, how will
the state of the system evolve over time?
In particular,
Answering these questions will lead us to revisit many of the topics that occupied us in the
finite state case, such as simulation, distribution dynamics, stability, ergodicity, etc.
Note
For some people, the term โMarkov chainโ always refers to a process with a finite
or discrete state space. We follow the mainstream mathematical literature (e.g.,
[95]) in using the term to refer to any discrete time Markov process
You are probably aware that some distributions can be represented by densities and some
cannot
(For example, distributions on the real numbers R that put positive probability on individual
points have no density representation)
We are going to start our analysis by looking at Markov chains where the one-step transition
probabilities have density representations
The benefit is that the density case offers a very direct parallel to the finite case in terms of
notation and intuition
Once weโve built some intuition weโll cover the general case
In our lecture on finite Markov chains, we studied discrete-time Markov chains that evolve on
a finite state space ๐
In this setting, the dynamics of the model are described by a stochastic matrix: a nonnegative square matrix $P = P[i, j]$ such that each row $P[i, \cdot]$ sums to one
The interpretation of P is that $P[i, j]$ represents the probability of transitioning from state i to state j in one unit of time
In symbols,

$\mathbb{P}\{X_{t+1} = j \mid X_t = i\} = P[i, j]$

Equivalently, P can be thought of as a family of distributions $P[i, \cdot]$, one for each state i, where $P[i, \cdot]$ is the distribution of $X_{t+1}$ given $X_t = i$
(As you probably recall, when using NumPy arrays, $P[i, \cdot]$ is expressed as P[i, :])
In this section, we'll allow S to be a subset of $\mathbb{R}$, such as
• $\mathbb{R}$ itself
• the positive reals $(0, \infty)$
• a bounded interval $(a, b)$
The family of discrete distributions $P[i, \cdot]$ will be replaced by a family of densities $p(x, \cdot)$, one for each $x \in S$
Analogous to the finite state case, $p(x, \cdot)$ is to be understood as the distribution (density) of $X_{t+1}$ given $X_t = x$
More formally, a stochastic kernel on S is a function $p \colon S \times S \to \mathbb{R}$ with the property that
• $p(x, y) \geq 0$ for all $(x, y) \in S \times S$, and
• $\int p(x, y) \, dy = 1$ for all $x \in S$
(Integration is over the second argument: $p(x, \cdot)$ is a density for each x)
For example, the (Gaussian) random walk kernel is

$p_w(x, y) := \frac{1}{\sqrt{2\pi}} \exp\left\{ -\frac{(y - x)^2}{2} \right\}$ (1)
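For each fixed x, the kernel in Eq. (1) is just the N(x, 1) density, i.e., the density of $X_{t+1} = X_t + \xi_{t+1}$ with $\xi_{t+1}$ standard normal; a quick sketch of evaluating and plotting it (the value of x below is arbitrary)

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def p_w(x, y):
    "Gaussian random walk kernel from Eq. (1)"
    return norm.pdf(y, loc=x, scale=1.0)

x = 0.5                                # an arbitrary current state
ygrid = np.linspace(x - 4, x + 4, 200)
plt.plot(ygrid, p_w(x, ygrid))         # the density of X_{t+1} given X_t = x
plt.show()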