QuantEcon.lectures-python3 PDF
Release 2018-Sep-29
This pdf presents a series of lectures on quantitative economic modeling, designed and written
by Thomas J. Sargent and John Stachurski. The primary programming languages are Python and
Julia. You can send feedback to the authors via [email protected].
Note: You are currently viewing an automatically generated pdf version of our online
lectures, which are located at
https://lectures.quantecon.org
Please visit the website for more information on the aims and scope of the lectures and the two
language options (Julia or Python).
Due to automatic generation of this pdf, presentation quality is likely to be lower than that
of the website.
CONTENTS
1 Introduction to Python
1.1 About Python
1.2 Setting up Your Python Environment
1.3 An Introductory Example
1.4 Python Essentials
1.5 OOP I: Introduction to Object Oriented Programming
6 Dynamic Programming
6.1 Shortest Paths
6.2 Job Search I: The McCall Search Model
6.3 Job Search II: Search and Separation
6.4 A Problem that Stumped Milton Friedman
6.5 Job Search III: Search with Learning
6.6 Job Search IV: Modeling Career Choice
6.7 Job Search V: On-the-Job Search
6.8 Optimal Growth I: The Stochastic Optimal Growth Model
6.9 Optimal Growth II: Time Iteration
6.10 Optimal Growth III: The Endogenous Grid Method
6.11 LQ Dynamic Programming Problems
6.12 Optimal Savings I: The Permanent Income Model
6.13 Optimal Savings II: LQ Techniques
6.14 Consumption and Tax Smoothing with Complete and Incomplete Markets
6.15 Optimal Savings III: Occasionally Binding Constraints
6.16 Robustness
6.17 Discrete State Dynamic Programming
9.7 Fiscal Risk and Government Debt
9.8 Competitive Equilibria of Chang Model
9.9 Credible Government Policies in Chang Model
Bibliography
CHAPTER
ONE
INTRODUCTION TO PYTHON
This first part of the course provides a relatively fast-paced introduction to the Python programming language
Contents
• About Python
– Overview
– What's Python?
– Scientific Programming
– Learn More
1.1.1 Overview
Python is a general purpose programming language conceived in 1989 by Dutch programmer Guido van
Rossum
Python is free and open source, with development coordinated through the Python Software Foundation
Python has experienced rapid adoption in the last decade, and is now one of the most popular programming
languages
Common Uses
Relative Popularity
The following chart, produced using Stack Overflow Trends, shows one measure of the relative popularity
of Python
The figure indicates not only that Python is widely used but also that adoption of Python has accelerated
significantly since 2012
We suspect this is driven at least in part by uptake in the scientific domain, particularly in rapidly growing
fields like data science
For example, the popularity of pandas, a library for data analysis with Python, has exploded, as seen here
(The corresponding time path for MATLAB is shown for comparison)
Note that pandas takes off in 2012, which is the same year that we see Python's popularity begin to spike in the first figure
Overall, it's clear that
• Python is one of the most popular programming languages worldwide
• Python is a major tool for scientific computing, accounting for a rapidly rising share of scientific work
around the globe
Features
One nice feature of Python is its elegant syntax; we'll see many examples later on
Elegant code might sound superfluous, but in fact it's highly beneficial because it makes the syntax easy to read and easy to remember
Remembering how to read from files, sort dictionaries and other such routine tasks means that you don't need to break your flow in order to hunt down correct syntax
Closely related to elegant syntax is elegant design
Features like iterators, generators, decorators, list comprehensions, etc. make Python highly expressive,
allowing you to get more done with less code
Namespaces improve productivity by cutting down on bugs and syntax errors
Numerical programming
Fundamental matrix and array processing capabilities are provided by the excellent NumPy library
NumPy provides the basic array data type plus some simple processing operations
For example, let's build some arrays
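As a rough sketch of the kind of arrays involved (the exact construction is an assumption), suppose b holds the values of cos and c the values of sin on an evenly spaced grid over [-π, π]; the inner product below is then essentially zero because the two functions are orthogonal on this interval
import numpy as np

a = np.linspace(-np.pi, np.pi, 100)   # Evenly spaced grid over [-π, π]
b = np.cos(a)                         # Apply cosine to each element of a
c = np.sin(a)                         # Apply sine to each element of a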
b @ c
1.5265566588595902e-16
The number you see here might vary slightly but it's essentially zero
(For older versions of Python and NumPy you need to use the np.dot function)
The SciPy library is built on top of NumPy and provides additional functionality
For example, let's calculate $\int_{-2}^{2} \phi(z) \, dz$ where $\phi$ is the standard normal density
from scipy.stats import norm
from scipy.integrate import quad

ϕ = norm()
value, error = quad(ϕ.pdf, -2, 2)  # Integrate using Gaussian quadrature
value
0.9544997361036417
Graphics
The most popular and comprehensive Python library for creating figures and graphs is Matplotlib
• Plots, histograms, contour images, 3D, bar charts, etc., etc.
• Output in many formats (PDF, PNG, EPS, etc.)
• LaTeX integration
Example 2D plot with embedded LaTeX annotations
Example 3D plot
Symbolic Algebra
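The symbolic algebra examples below rely on the SymPy library; a minimal setup, declaring the symbols used in the expressions that follow (the exact import list is an assumption), is
from sympy import Symbol, solve, limit, sin, diff

x, y = Symbol('x'), Symbol('y')   # Treat 'x' and 'y' as algebraic symbols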
3*x + y
expression = (x + y)**2
expression.expand()
We can also solve polynomials
solve(x**2 + x + 2)
limit(1 / x, x, 0)
oo
limit(sin(x) / x, x, 0)
diff(sin(x), x)
cos(x)
The beauty of importing this functionality into Python is that we are working within a fully fledged programming language
We can easily create tables of derivatives, generate LaTeX output, add it to figures, etc., etc.
Statistics
Python's data manipulation and statistics libraries have improved rapidly over the last few years
Pandas
One of the most popular libraries for working with data is pandas
Pandas is fast, efficient, flexible and well designed
Here's a simple example, using some fake data
import numpy as np
import pandas as pd

np.random.seed(1234)
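The lines that build and print the DataFrame displayed below would look something like the following sketch (the date_range call and column layout are reconstructions consistent with the output)
data = np.random.randn(5, 2)                    # 5x2 matrix of N(0, 1) random draws
dates = pd.date_range('28/12/2010', periods=5)  # Five consecutive dates
df = pd.DataFrame(data, columns=('price', 'weight'), index=dates)
print(df)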
price weight
2010-12-28 0.471435 -1.190976
2010-12-29 1.432707 -0.312652
2010-12-30 -0.720589 0.887163
2010-12-31 0.859588 -0.636524
2011-01-01 0.015696 -2.242685
df.mean()
price 0.411768
weight -0.699135
dtype: float64
Networks and Graphs
Python has several libraries for working with networks and graphs; one well-known example is networkx
import networkx as nx
import matplotlib.pyplot as plt
np.random.seed(1234)
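A minimal sketch of the kind of example this section runs (the particular graph generator, layout and styling are assumptions) is
g = nx.random_geometric_graph(200, 0.12)    # Generate a random geometric graph
pos = nx.get_node_attributes(g, 'pos')      # Node positions assigned by the generator
nx.draw_networkx_edges(g, pos, alpha=0.4)   # Draw the edges only
plt.show()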
Cloud Computing
Running your Python code on massive servers in the cloud is becoming easier and easier
A nice example is Anaconda Enterprise
See also
• Amazon Elastic Compute Cloud
• The Google App Engine (Python, Java, PHP or Go)
• Pythonanywhere
• Sagemath Cloud
Parallel Processing
Apart from the cloud computing options listed above, you might like to consider
• Parallel computing through IPython clusters
• The Starcluster interface to Amazon's EC2
• GPU programming through PyCuda, PyOpenCL, Theano or similar
Other Developments
There are many other interesting developments with scientific programming in Python
Some representative examples include
• Jupyter: Python in your browser with code cells, embedded images, etc.
• Numba: make Python run at the same speed as native machine code!
• Blaze: a generalization of NumPy
• PyTables: manage large data sets
• CVXPY: convex optimization in Python
Contents
• Setting up Your Python Environment
1.2.1 Overview
1.2.2 Anaconda
The core Python package is easy to install but not what you should choose for these lectures
These lectures require the entire scientific programming ecosystem, which
• the core installation doesn't provide
• is painful to install one piece at a time
Hence the best approach for our purposes is to install a free Python distribution that contains
1. the core Python language and
2. the most popular scientific libraries
The best such distribution is Anaconda
Anaconda is
• very popular
• cross platform
• comprehensive
• completely unrelated to the Nicki Minaj song of the same name
Anaconda also comes with a great package management system to organize your code libraries
All of what follows assumes that you adopt this recommendation!
Installing Anaconda
Installing Anaconda is straightforward: download the binary and follow the instructions
Important points:
• Install the latest version
• If you are asked during the installation process whether you'd like to make Anaconda your default Python installation, say yes
• Otherwise you can accept all of the defaults
We'll be using your browser to interact with Python, so now might be a good time to
1. update your browser, or
2. install a free modern browser such as Chrome or Firefox
1.2.3 Jupyter Notebooks
Jupyter notebooks are one of the many possible ways to interact with Python and the scientific libraries
They use a browser-based interface to Python with
• The ability to write and execute Python commands
• Formatted output in the browser, including tables, figures, animation, etc.
• The option to mix in formatted text and mathematical expressions
Because of these possibilities, Jupyter is fast turning into a major player in the scientific computing ecosystem
Here's an image showing the execution of some code (borrowed from here) in a Jupyter notebook
You can find a nice example of the kinds of things you can do in a Jupyter notebook (such as including maths and text) here
Further examples can be found at QuantEcon's notebook archive or the NB viewer site
While Jupyter isn't the only way to code in Python, it's great for when you wish to
Once you have installed Anaconda, you can start the Jupyter notebook
Either
• search for Jupyter in your applications menu, or
• open up a terminal and type jupyter notebook
– Windows users should substitute Anaconda command prompt for terminal in the previous line
If you use the second option, you will see something like this (click to enlarge)
Hopefully your default browser has also opened up with a web page that looks something like this (click to
enlarge)
The notebook displays an active cell, into which you can type Python commands
Notebook Basics
Lets start with how to edit code and run simple programs
Running Cells
Notice that in the previous figure the cell is surrounded by a green border
This means that the cell is in edit mode
As a result, you can type in Python code and it will appear in the cell
When you're ready to execute the code in a cell, hit Shift-Enter instead of the usual Enter
(Note: There are also menu and button options for running code in a cell that you can find by exploring)
Modal Editing
The next thing to understand about the Jupyter notebook is that it uses a modal editing system
This means that the effect of typing at the keyboard depends on which mode you are in
The two modes are
1. Edit mode, where you type Python code into the active cell
2. Command mode, which you enter by pressing Esc; keystrokes are then interpreted as commands to the notebook rather than as text
Python 3 introduced support for unicode characters, allowing the use of characters such as α and β in your
code
Unicode characters can be typed quickly in Jupyter using the tab key
Try creating a new code cell and typing \alpha, then hitting the tab key on your keyboard
A Test Program
import numpy as np
import matplotlib.pyplot as plt

N = 20
θ = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
radii = 10 * np.random.rand(N)
width = np.pi / 4 * np.random.rand(N)
ax = plt.subplot(111, polar=True)
bars = ax.bar(θ, radii, width=width, bottom=0.0)
plt.show()
Don't worry about the details for now; let's just run it and see what happens
The easiest way to run this code is to copy and paste into a cell in the notebook
You should see something like this
(In older versions of Jupyter you might need to add the command %matplotlib inline before you
generate the figure)
Tab Completion
On-Line Help
Clicking in the top right of the lower split closes the on-line help
Other Content
In addition to executing code, the Jupyter notebook allows you to embed text, equations, figures and even
videos in the page
For example, here we enter a mixture of plain text and LaTeX instead of code
Next we press Esc to enter command mode and then type m to indicate that we are writing Markdown, a mark-up language similar to (but simpler than) LaTeX
(You can also use your mouse to select Markdown from the Code drop-down box just below the list of
menu items)
Now we press Shift+Enter to produce this
Sharing Notebooks
Notebook files are just text files structured in JSON and typically ending with .ipynb
You can share them in the usual way that you share files or by using web services such as nbviewer
The notebooks you see on that site are static html representations
To run one, download it as an ipynb file by clicking on the download icon at the top right
Save it somewhere, navigate to it from the Jupyter dashboard and then run as discussed above
1.2.4 QuantEcon.py
In these lectures we'll make extensive use of code from the QuantEcon organization
On the Python side we'll be using the QuantEcon.py version
This code has been organized into a Python package
• A Python package is a software library that has been bundled for distribution
• Hosted Python packages can be found through channels like Anaconda and PyPi
You can install QuantEcon.py by starting Jupyter and typing
!pip install quantecon
into a cell
Alternatively, you can type the following into a terminal
pip install quantecon
More instructions can be found on the library page
Note: The QuantEcon.py package can also be installed using conda; see the library page for the exact command
For these lectures to run without error you need to keep your software up to date
Updating Anaconda
Anaconda supplies a tool called conda to manage and upgrade your Anaconda packages
One conda command you should execute regularly is the one that updates the whole Anaconda distribution
As a practice run, please execute the following
1. Open up a terminal
2. Type conda update anaconda
For more information on conda, type conda help in a terminal
Updating QuantEcon.py
To upgrade QuantEcon.py to the latest version, open up a terminal and type pip install --upgrade quantecon
Or open up Jupyter and type the same thing in a notebook cell with ! in front of it
Method 2: Run
Using the run command is often easier than copy and paste
• For example, %run test.py will run the file test.py
(You might find that the % is unnecessary; use %automagic to toggle the need for %)
Note that Jupyter only looks for test.py in the present working directory (PWD)
If test.py isn't in that directory, you will get an error
Let's look at a successful example, where we run a file test.py with contents:
for i in range(5):
    print('foobar')
Here
• pwd asks Jupyter to show the PWD (or %pwd see the comment about automagic above)
– This is where Jupyter is going to look for files to run
– Your output will look a bit different depending on your OS
• ls asks Jupyter to list files in the PWD (or %ls)
– Note that test.py is there (on our computer, because we saved it there earlier)
• cat test.py asks Jupyter to print the contents of test.py (or !type test.py on Windows)
• run test.py runs the file and prints any output
If you're trying to run a file not in the present working directory, you'll get an error
To fix this error you need to either
1. Shift the file into the PWD, or
2. Change the PWD to where the file lives
One way to achieve the first option is to use the Upload button
• The button is on the top level dashboard, where Jupyter first opened to
• Look where the pointer is in this picture
Loading Files
It's often convenient to be able to see your code before you run it
Saving Files
The preceding discussion covers most of what you need to know to interact with this website
However, as you start to write longer programs, you might want to experiment with your workflow
There are many different options and we mention them only in passing
JupyterLab
Text Editors
A text editor is an application that is specifically designed to work with text files such as Python programs
Nothing beats the power and efficiency of a good text editor for working with program text
A good text editor will provide
• efficient text editing commands (e.g., copy, paste, search and replace)
• syntax highlighting, etc.
Among the most popular are Sublime Text and Atom
For a top quality open source text editor with a steeper learning curve, try Emacs
If you want an outstanding free text editor and don't mind a seemingly vertical learning curve plus long days of pain and suffering while all your neural pathways are rewired, try Vim
The IPython shell has many of the features of the notebook: tab completion, color syntax, etc.
It also has command history through the arrow keys
The up arrow key brings previously typed commands to the prompt
This saves a lot of typing
Here's one setup, on a Linux box, with
• a file being edited in Vim
• An IPython shell next to it, to run the file
IDEs
IDEs are Integrated Development Environments, which allow you to edit, execute and interact with code
from an integrated environment
One of the most popular in recent times is VS Code, which is now available via Anaconda
We hear good things about VS Code; please tell us about your experiences on the forum
1.2.8 Exercises
Exercise 1
If Jupyter is still running, quit by using Ctrl-C at the terminal where you started it
Now launch again, but this time using jupyter notebook --no-browser
This should start the kernel without launching the browser
Note also the startup message: It should give you a URL such as http://localhost:8888 where the
notebook is running
Now
1. Start your browser or open a new tab if it's already running
2. Enter the URL from above (e.g. http://localhost:8888) in the address bar at the top
You should now be able to run a standard Jupyter notebook session
This is an alternative way to start the notebook that can also be handy
Exercise 2
Contents
• An Introductory Example
– Overview
– The Task: Plotting a White Noise Process
– Version 1
– Alternative Versions
– Exercises
– Solutions
1.3.1 Overview
In this lecture we will write and then pick apart small Python programs
The objective is to introduce you to basic Python syntax and data structures
Deeper concepts will be covered in later lectures
Prerequisites
1.3.2 The Task: Plotting a White Noise Process
Suppose we want to simulate and plot the white noise process ϵ_0, ϵ_1, ..., ϵ_T, where each draw ϵ_t is independent standard normal
In other words, we want to generate figures that look something like this:
1.3.3 Version 1
import numpy as np
import matplotlib.pyplot as plt
x = np.random.randn(100)
plt.plot(x)
plt.show()
Import Statements
import numpy as np
np.sqrt(4)
2.0
import numpy
numpy.sqrt(4)
2.0
Packages
In fact you can find and explore the directory for NumPy on your computer easily enough if you look around
On this machine it's located in
anaconda3/lib/python3.6/site-packages/numpy
Subpackages
import numpy as np
np.sqrt(4)
2.0
from numpy import sqrt
sqrt(4)
2.0
ts_length = 100
ϵ_values = []   # Empty list
for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)
plt.plot(ϵ_values)
plt.show()
In brief,
• The first pair of lines import functionality as before
• The next line sets the desired length of the time series
• The next line creates an empty list called ϵ_values that will store the ϵ_t values as we generate them
• The next three lines are the for loop, which repeatedly draws a new random number ϵ_t and appends it to the end of the list ϵ_values
• The last two lines generate the plot and display it to the user
Let's study some parts of this program in more detail
Lists
Lists are a native Python data structure used to group a collection of objects; for example
x = [10, 'foo', False]   # We can include heterogeneous data inside a list
type(x)
list
The first element of x is an integer, the next is a string and the third is a Boolean value
When adding a value to a list, we can use the syntax list_name.append(some_value)
x.append(2.5)
x
Here append() is what's called a method, which is a function attached to an object; in this case, the list x
We'll learn all about methods later on, but just to give you some idea,
• Python objects such as lists, strings, etc. all have methods that are used to manipulate the data con-
tained in the object
• String objects have string methods, list objects have list methods, etc.
Another useful list method is pop()
x.pop()
2.5
x[0]
10
x[1]
'foo'
Now let's consider the for loop from the program above, which was
for i in range(ts_length):
    e = np.random.randn()
    ϵ_values.append(e)
Python executes the two indented lines ts_length times before moving on
These two lines are called a code block, since they comprise the block of code that we are looping over
Unlike most other languages, Python knows the extent of the code block only from indentation
In our program, indentation decreases after line ϵ_values.append(e), telling Python that this line marks the lower limit of the code block
More on indentation below; for now let's look at another example of a for loop
animals = ['dog', 'cat', 'bird']
for animal in animals:
    print("The plural of " + animal + " is " + animal + "s")
If you put this in a text file or Jupyter cell and run it you will see
The plural of dog is dogs
The plural of cat is cats
The plural of bird is birds
This example helps to clarify how the for loop works: when we execute a loop of the form for variable_name in sequence:, the Python interpreter binds variable_name to each element of sequence in turn and executes the code block once for each such binding
In discussing the for loop, we explained that the code blocks being looped over are delimited by indentation
In fact, in Python all code blocks (i.e., those occurring inside loops, if clauses, function definitions, etc.) are
delimited by indentation
Thus, unlike most other languages, whitespace in Python code affects the output of the program
Once you get used to it, this is a good thing: It
• forces clean, consistent indentation, improving readability
• removes clutter, such as the brackets or end statements used in other languages
On the other hand, it takes a bit of care to get right, so please remember:
• The line before the start of a code block always ends in a colon
– for i in range(10):
– if x > y:
– while x < 100:
– etc., etc.
• All lines in a code block must have the same amount of indentation
• The Python standard is 4 spaces, and that's what you should use
Tabs vs Spaces
One small gotcha here is the mixing of tabs and spaces, which often leads to errors
(Important: Within text files, the internal representation of tabs and spaces is not the same)
You can use your Tab key to insert 4 spaces, but you need to make sure it's configured to do so
If you are using a Jupyter notebook you will have no problems here
Also, good text editors will allow you to configure the Tab key to insert spaces instead of tabs; try searching online
While Loops
The for loop is the most common technique for iteration in Python
But, for the purpose of illustration, let's modify the program above to use a while loop instead
ts_length = 100
ϵ_values = []
i = 0
while i < ts_length:
    e = np.random.randn()
    ϵ_values.append(e)
    i = i + 1
plt.plot(ϵ_values)
plt.show()
Note that
• the code block for the while loop is again delimited only by indentation
• the statement i = i + 1 can be replaced by i += 1
User-Defined Functions
Now let's go back to the for loop, but restructure our program to make the logic clearer
To this end, we will break our program into two parts:
1. A user-defined function that generates a list of random variables
2. The main part of the program that
(a) calls this function to get data
(b) plots the data
This is accomplished in the next program
def generate_data(n):
    ϵ_values = []
    for i in range(n):
        e = np.random.randn()
        ϵ_values.append(e)
    return ϵ_values
data = generate_data(100)
plt.plot(data)
plt.show()
Let's go over this carefully, in case you're not familiar with functions and how they work
We have defined a function called generate_data() as follows
• def is a Python keyword used to start function definitions
• def generate_data(n): indicates that the function is called generate_data, and that it
has a single argument n
• The indented code is a code block called the function body; in this case it creates an iid list of random draws using the same logic as before
• The return keyword indicates that ϵ_values is the object that should be returned to the calling code
This whole function definition is read by the Python interpreter and stored in memory
When the interpreter gets to the expression generate_data(100), it executes the function body with n
set equal to 100
The net result is that the name data is bound to the list ϵ_values returned by the function
Conditions
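The if/else version of generate_data() discussed here would look roughly as follows (a sketch; the flag value 'U' comes from the notes below, the rest is an assumption)
def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        if generator_type == 'U':
            e = np.random.uniform(0, 1)    # Draw from the uniform distribution
        else:
            e = np.random.randn()          # Draw from the standard normal
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, 'U')
plt.plot(data)
plt.show()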
Hopefully the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of
the code blocks
Notes
• We are passing the argument U as a string, which is why we write it as 'U'
• Notice that equality is tested with the == syntax, not =
– For example, the statement a = 10 assigns the name a to the value 10
– The expression a == 10 evaluates to either True or False, depending on the value of a
Now, there are several ways that we can simplify the code above
For example, we can get rid of the conditionals altogether by just passing the desired generator type as a function
To understand this, consider the following version
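A sketch of that version, in which the second argument is itself a function that produces one draw each time it is called
def generate_data(n, generator_type):
    ϵ_values = []
    for i in range(n):
        e = generator_type()        # Call whatever function was passed in
        ϵ_values.append(e)
    return ϵ_values

data = generate_data(100, np.random.uniform)
plt.plot(data)
plt.show()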
Now, when we call the function generate_data(), we pass np.random.uniform as the second
argument
This object is a function
When the function call generate_data(100, np.random.uniform) is executed, Python runs the function code block with n equal to 100 and the name generator_type bound to the function np.random.uniform
• While these lines are executed, the names generator_type and np.random.uniform are
synonyms, and can be used in identical ways
This principle works more generally; for example, consider the following piece of code
m = max
m(7, 2, 4)
Here we created another name for the built-in function max(), which could then be used in identical ways
In the context of our program, the ability to bind new names to functions means that there is no problem passing a function as an argument to another function, as we did above
List Comprehensions
We can also simplify the code for generating the list of random draws considerably by using something
called a list comprehension
List comprehensions are an elegant Python tool for creating lists
Consider the following example, where the list comprehension is on the right-hand side of the second line
animals = ['dog', 'cat', 'bird']
plurals = [animal + 's' for animal in animals]
plurals
['dogs', 'cats', 'birds']
(As another example, the comprehension [i for i in range(8)] evaluates to [0, 1, 2, 3, 4, 5, 6, 7])
Using this syntax, we can replace the loop that builds the list of draws inside generate_data() with the single line
ϵ_values = [generator_type() for i in range(n)]
1.3.5 Exercises
Exercise 1
There are functions to compute factorials in various modules, but let's write our own version as an exercise
In particular, write a function factorial such that factorial(n) returns n! for any positive integer n
Exercise 2
The binomial random variable Y ∼ Bin(n, p) represents the number of successes in n binary trials, where
each trial succeeds with probability p
Without any import besides from numpy.random import uniform, write a function
binomial_rv such that binomial_rv(n, p) generates one draw of Y
Hint: If U is uniform on (0, 1) and p ∈ (0, 1), then the expression U < p evaluates to True with probability
p
Exercise 3
Compute an approximation to π using Monte Carlo. Use no imports besides numpy
Exercise 4
Write a program that prints one realization of the following random device:
• Flip an unbiased coin 10 times
• If 3 consecutive heads occur one or more times within this sequence, pay one dollar
• If not, pay nothing
Use no import besides from numpy.random import uniform
Exercise 5
Your next task is to simulate and plot the correlated time series
x_{t+1} = α x_t + ϵ_{t+1}   where x_0 = 0 and t = 0, ..., T
The sequence of shocks {ϵ_t} is assumed to be iid and standard normal
In your solution, restrict your import statements to
import numpy as np
import matplotlib.pyplot as plt
Exercise 6
To do the next exercise, you will need to know how to produce a plot legend
The following example should be sufficient to convey the idea
import numpy as np
import matplotlib.pyplot as plt
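A minimal sketch of such a legend example (the data here is illustrative; the label text matches the keyword-argument discussion later in this lecture)
x = np.random.randn(100)
plt.plot(x, label="white noise")   # Attach a label to this line
plt.legend()                       # Build the legend from the labels
plt.show()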
Now, starting with your solution to exercise 5, plot three simulated time series, one for each of the cases
α = 0, α = 0.8 and α = 0.98
In particular, you should produce (modulo randomness) a figure that looks as follows
(The figure nicely illustrates how time series with the same one-step-ahead conditional volatilities, as these
three processes have, can have very different unconditional volatilities.)
Use a for loop to step through the α values
Important hints:
• If you call the plot() function multiple times before calling show(), all of the lines you produce
will end up on the same figure
– And if you omit the argument 'b-' to the plot function, Matplotlib will automatically select
different colors for each line
• The expression 'foo' + str(42) evaluates to 'foo42'
1.3.6 Solutions
Exercise 1
def factorial(n):
    k = 1
    for i in range(n):
        k = k * (i + 1)
    return k
factorial(4)
24
Exercise 2
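A minimal sketch of the solution, using only uniform from numpy.random as the exercise requires
from numpy.random import uniform

def binomial_rv(n, p):
    count = 0
    for i in range(n):
        U = uniform()
        if U < p:            # A success occurs with probability p
            count = count + 1
    return count             # Number of successes in n trials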
binomial_rv(10, 0.5)
Exercise 3
import numpy as np

n = 100000
count = 0
for i in range(n):
    u, v = np.random.uniform(), np.random.uniform()
    d = np.sqrt((u - 0.5)**2 + (v - 0.5)**2)
    if d < 0.5:
        count += 1

area_estimate = count / n
print(area_estimate * 4)   # The circle has radius 0.5, so π = area / 0.25 = area * 4
3.1496
Exercise 4
from numpy.random import uniform

payoff = 0
count = 0

for i in range(10):
    U = uniform()
    count = count + 1 if U < 0.5 else 0
    if count == 3:
        payoff = 1

print(payoff)
Exercise 5
The next line embeds all subsequent figures in the browser itself
%matplotlib inline
α = 0.9
current_x = 0
x_values = []
for i in range(ts_length + 1):
    x_values.append(current_x)
    current_x = α * current_x + np.random.randn()
plt.plot(x_values)
plt.show()
Exercise 6
αs = [0.0, 0.8, 0.98]
ts_length = 200

for α in αs:
    x_values = []
    current_x = 0
    for i in range(ts_length):
        x_values.append(current_x)
        current_x = α * current_x + np.random.randn()
    plt.plot(x_values, label=f'α = {α}')
plt.legend()
plt.show()
Contents
• Python Essentials
– Data Types
– Input and Output
– Iterating
– Comparisons and Logical Operators
– More Functions
– Coding Style and PEP8
– Exercises
– Solutions
In this lecture we'll cover features of the language that are essential to reading and writing Python code
We've already met several built-in Python data types, such as strings, integers, floats and lists
Let's learn a bit more about them
One simple data type is Boolean values, which can be either True or False
x = True
x
True
In the next line of code, the interpreter evaluates the expression on the right of = and binds y to this value
y = 100 < 10
y
False
type(y)
bool
x + y   # In arithmetic expressions, True is treated as 1 and False as 0
1
x * y
0
True + True
2
bools = [True, True, False, True]   # A list of Boolean values
sum(bools)   # Adds up the values, treated as 1s and 0s
3
The two most common data types used to represent numbers are integers and floats
a, b = 1, 2
c, d = 2.5, 10.0
type(a)
int
type(c)
float
Computers distinguish between the two because, while floats are more informative, arithmetic operations on
integers are faster and more accurate
As long as you're using Python 3.x, division of integers yields floats
1 / 2
0.5
But be careful! If you're still using Python 2.x, division of two integers returns only the integer part
For integer division in Python 3.x use this syntax:
1 // 2
0
x = complex(1, 2)
y = complex(2, 1)
x * y
5j
Containers
Python has several basic types for storing collections of (possibly heterogeneous) data
We've already discussed lists
A related data type is tuples, which are like lists but immutable
x = ('a', 'b')   # Round brackets instead of square brackets
x
('a', 'b')
type(x)
tuple
In Python, an object is called immutable if, once created, the object cannot be changed
Conversely, an object is mutable if it can still be altered after creation
Python lists are mutable
x = [1, 2]
x[0] = 10
x
[10, 2]
x = (1, 2)
x[0] = 10
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-21-6cb4d74ca096> in <module>()
----> 1 x[0] = 10
TypeError: 'tuple' object does not support item assignment
We'll say more about the role of mutable and immutable data a bit later
Tuples (and lists) can be unpacked as follows
x = (10, 20)
a, b = x
a
10
b
20
Slice Notation
To access multiple elements of a list or tuple, you can use Python's slice notation
For example,
a = [2, 4, 6, 8]
a[1:]
[4, 6, 8]
a[1:3]
[4, 6]
a[-2:]   # Last two elements of the list
[6, 8]
s = 'foobar'
s[-3:] # Select the last three elements
'bar'
Two other container types we should mention before moving on are sets and dictionaries
Dictionaries are much like lists, except that the items are named instead of numbered
d = {'name': 'Frodo', 'age': 33}
type(d)
dict
d['age']
33
s1 = {'a', 'b'}
type(s1)
set
s2 = {'b', 'c'}
s1.issubset(s2)
False
s1.intersection(s2)
{'b'}
Let's briefly review reading and writing to text files, starting with writing
f = open('newfile.txt', 'w')   # Open 'newfile.txt' for writing
f.write('Testing\n')           # Here '\n' means new line
f.write('Testing again')
f.close()
Here
• The built-in function open() creates a file object for writing to
• Both write() and close() are methods of file objects
Where is this file that we've created?
Recall that Python maintains a concept of the present working directory (pwd) that can be located from within Jupyter or IPython via
%pwd
f = open('newfile.txt', 'r')
out = f.read()
out
'Testing\nTesting again'
print(out)
Testing
Testing again
Paths
Note that if newfile.txt is not in the present working directory then this call to open() fails
In this case you can shift the file to the pwd or specify the full path to the file
f = open('insert_full_path_to_file/newfile.txt', 'r')
1.4.3 Iterating
One of the most important tasks in computing is stepping through a sequence of data and performing a given
action
One of Python's strengths is its simple, flexible interface to this kind of iteration via the for loop
Many Python objects are iterable, in the sense that they can be looped over
To give an example, let's write the file us_cities.txt, which lists US cities and their population, to the present working directory
%%file us_cities.txt
new york: 8244910
los angeles: 3819702
chicago: 2707120
houston: 2145146
philadelphia: 1536471
phoenix: 1469471
san antonio: 1359758
san diego: 1326179
dallas: 1223229
Suppose that we want to make the information more readable, by capitalizing names and adding commas to
mark thousands
The program us_cities.py reads the data in and makes the conversion:
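A sketch of what such a program might look like (the exact formatting calls are assumptions; the key features are that line 2 iterates directly over the file object and that format() inserts the reformatted population)
data_file = open('us_cities.txt', 'r')
for line in data_file:
    city, population = line.split(':')             # Tuple unpacking
    city = city.title()                            # Capitalize city names
    population = '{0:,}'.format(int(population))   # Add commas to mark thousands
    print(city.ljust(15) + population)
data_file.close()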
Here format() is a string method used for inserting variables into strings
The output is as follows
The reformatting of each line is the result of three different string methods, the details of which can be left
till later
The interesting part of this program for us is line 2, which shows that
1. The file object f is iterable, in the sense that it can be placed to the right of in within a for loop
2. Iteration steps through each line in the file
This leads to the clean, convenient syntax shown in our program
Many other kinds of objects are iterable, and well discuss some of them later on
One thing you might have noticed is that Python tends to favor looping without explicit indexing
For example,
for x in x_values:
    print(x * x)
is preferred to
for i in range(len(x_values)):
    print(x_values[i] * x_values[i])
When you compare these two alternatives, you can see why the first one is preferred
The zip() function is also useful for creating dictionaries; for example,
names = ['Tom', 'John']
marks = ['E', 'F']
dict(zip(names, marks))
{'Tom': 'E', 'John': 'F'}
If we actually need the index from a list, one option is to use enumerate()
To understand what enumerate() does, consider the following example
letter_list = ['a', 'b', 'c']
for index, letter in enumerate(letter_list):
    print(f"letter_list[{index}] = '{letter}'")
letter_list[0] = 'a'
letter_list[1] = 'b'
letter_list[2] = 'c'
Comparisons
Many different kinds of expressions evaluate to one of the Boolean values (i.e., True or False)
A common type is comparisons, such as
x, y = 1, 2
x < y
True
x > y
False
1 < 2 < 3
True
1 <= 2 <= 3
True
x = 1 # Assignment
x == 2 # Comparison
False
1 != 2
True
Note that when testing conditions, we can use any valid Python expression
x = 'yes' if 42 else 'no'   # Any nonzero or non-empty value is treated as true
x
'yes'
x = 'yes' if [] else 'no'   # Zero and empty containers are treated as false
x
'no'
Combining Expressions
These are the standard logical connectives (conjunction, disjunction and denial)
1 < 2 and 'f' in 'foo'
True
1 < 2 and 'g' in 'foo'
False
1 < 2 or 'g' in 'foo'
True
not True
False
not not True
True
Remember
• P and Q is True if both are True, else False
• P or Q is False if both are False, else True
Let's talk a bit more about functions, which are all-important for good programming style
Python has a number of built-in functions that are available without import
We have already met some
max(19, 20)
20
range(4)   # In Python 3 this returns a lazy range object
range(0, 4)
list(range(4))
[0, 1, 2, 3]
str(22)
'22'
type(22)
int
False
True
User defined functions are important for improving the clarity of your code by
• separating different strands of logic
• facilitating code reuse
(Writing the same thing twice is almost always a bad idea)
The basics of user defined functions were discussed here
def f(x):
    if x < 0:
        return 'negative'
    return 'nonnegative'
Functions without a return statement automatically return the special Python object None
Docstrings
Python has a system for adding comments to functions, modules, etc. called docstrings
The nice thing about docstrings is that they are available at run-time
Try running this
def f(x):
    """
    This function squares its argument
    """
    return x**2
f?
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Docstring: This function squares its argument
f??
Type: function
String Form:<function f at 0x2223320>
File: /home/john/temp/temp.py
Definition: f(x)
Source:
def f(x):
    """
    This function squares its argument
    """
    return x**2
With one question mark we bring up the docstring, and with two we get the source code as well
def f(x):
    return x**3
and
f = lambda x: x**3
quad(lambda x: x**3, 0, 2)
(4.0, 4.440892098500626e-14)
Here the function created by lambda is said to be anonymous, because it was never given a name
Keyword Arguments
If you did the exercises in the previous lecture, you would have come across the statement
plt.plot(x, label="white noise")
In this call to Matplotlib's plot function, notice that the last argument is passed in name=argument syntax
This is called a keyword argument, with label being the keyword
Non-keyword arguments are called positional arguments, since their meaning is determined by order
• plot(x, 'b-', label="white noise") is different from plot('b-', x,
label="white noise")
Keyword arguments are particularly useful when a function has a lot of arguments, in which case it's hard to remember the right order
You can adopt keyword arguments in user defined functions with no difficulty
The next example illustrates the syntax
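A minimal sketch of a function definition with keyword arguments, consistent with the call below (the particular defaults a=4 and b=5 are assumptions chosen so that f(2) returns 14)
def f(x, a=4, b=5):
    # a and b are keyword arguments with default values 4 and 5
    return a + b * x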
The keyword argument values we supplied in the definition of f become the default values
f(2)
14
To learn more about the Python programming philosophy type import this at the prompt
Among other things, Python strongly favors consistency in programming style
We've all heard the saying about consistency and little minds
In programming, as in mathematics, the opposite is true
• A mathematical paper where the symbols ∪ and ∩ were reversed would be very hard to read, even if
the author told you so on the first page
In Python, the standard style is set out in PEP8
(Occasionally we'll deviate from PEP8 in these lectures to better match mathematical notation)
1.4.7 Exercises
Exercise 1
Part 1: Given two numeric lists or tuples x_vals and y_vals of equal length, compute their inner product
using zip()
Part 2: In one line, count the number of even numbers in 0, ..., 99
• Hint: x % 2 returns 0 if x is even, 1 otherwise
Part 3: Given pairs = ((2, 5), (4, 2), (9, 8), (12, 10)), count the number of pairs
(a, b) such that both a and b are even
Exercise 2
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \sum_{i=0}^{n} a_i x^i \qquad (1.1)
Write a function p such that p(x, coeff) computes the value in (1.1) given a point x and a list of coefficients coeff
Try to use enumerate() in your loop
Exercise 3
Write a function that takes a string as an argument and returns the number of capital letters in the string
Hint: 'foo'.upper() returns 'FOO'
Exercise 4
Write a function that takes two sequences seq_a and seq_b as arguments and returns True if every
element in seq_a is also an element of seq_b, else False
• By sequence we mean a list, a tuple or a string
• Do the exercise without using sets and set methods
Exercise 5
When we cover the numerical libraries, we will see they include many alternatives for interpolation and
function approximation
Nevertheless, lets write our own function approximation routine as an exercise
In particular, without using any imports, write a function linapprox that takes as arguments
• A function f mapping some interval [a, b] into R
• two scalars a and b providing the limits of this interval
• An integer n determining the number of grid points
• A number x satisfying a <= x <= b
and returns the piecewise linear interpolation of f at x, based on n evenly spaced grid points a =
point[0] < point[1] < ... < point[n-1] = b
Aim for clarity, not efficiency
1.4.8 Solutions
Exercise 1
Part 1 solution:
x_vals = [1, 2, 3]
y_vals = [1, 1, 1]
sum([x * y for x, y in zip(x_vals, y_vals)])
6
Part 2 solution:
One solution is
sum([x % 2 == 0 for x in range(100)])
50
This also works
sum(x % 2 == 0 for x in range(100))
50
Some less natural alternatives that nonetheless help to illustrate the flexibility of list comprehensions are
len([x for x in range(100) if x % 2 == 0])
50
and
sum([1 for x in range(100) if x % 2 == 0])
50
Part 3 solution:
sum([x % 2 == 0 and y % 2 == 0 for x, y in pairs])
2
Exercise 2
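A compact solution along the lines suggested by the exercise, using enumerate() to pair each coefficient with its power
def p(x, coeff):
    return sum(a * x**i for i, a in enumerate(coeff))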
Exercise 3
def f(string):
    count = 0
    for letter in string:
        if letter == letter.upper() and letter.isalpha():
            count += 1
    return count

f('The Rain in Spain')
3
Exercise 4
Here's a solution:
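A sketch of such a function, checking membership element by element without using sets
def f(seq_a, seq_b):
    is_subset = True
    for a in seq_a:
        if a not in seq_b:
            is_subset = False
    return is_subset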
# == test == #
print(f([1, 2], [1, 2, 3]))
True
print(f([1, 2, 3], [1, 2]))
False
Of course if we use the sets data type then the solution is easier
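For example, a set-based version might be written as
def f(seq_a, seq_b):
    return set(seq_a).issubset(set(seq_b))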
Exercise 5
def linapprox(f, a, b, n, x):
    """
    Evaluates the piecewise linear interpolant of f at x on the interval
    [a, b], with n evenly spaced grid points.

    Parameters
    ===========
    f : function
        The function to approximate
    x, a, b : scalar (float or integer)
        Evaluation point and endpoints, with a <= x <= b
    n : integer
        Number of grid points

    Returns
    =========
    A float. The interpolant evaluated at x
    """
    length_of_interval = b - a
    num_subintervals = n - 1
    step = length_of_interval / num_subintervals

    # === find the first grid point larger than x === #
    point = a
    while point <= x:
        point += step

    # === x must lie between the gridpoints (point - step) and point === #
    u, v = point - step, point

    return f(u) + (x - u) * (f(v) - f(u)) / (v - u)
Contents
• OOP I: Introduction to Object Oriented Programming
1.5.1 Overview
Python is a pragmatic language that blends object oriented and procedural styles, rather than taking a purist approach
However, at a foundational level, Python is object oriented
In particular, in Python, everything is an object
In this lecture we explain what that statement means and why it matters
1.5.2 Objects
In Python, an object is a collection of data and instructions held in computer memory that consists of
1. a type
2. a unique identity
3. data (i.e., content)
4. methods
These concepts are defined and discussed sequentially below
Type
Python provides for different types of objects, to accommodate different categories of data
For example
s = 'This is a string'
type(s)
str
int
'300' + 'cc'
'300cc'
300 + 400
700
'300' + 400
Here we are mixing types, and it's unclear to Python whether the user wants to
• convert '300' to an integer and then add it to 400, or
• convert 400 to string and then concatenate it with '300'
Some languages might try to guess but Python is strongly typed
• Type is important, and implicit type conversion is rare
• Python will respond instead by raising a TypeError
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-9b7dffd27f2d> in <module>()
----> 1 '300' + 400
To avoid the error, you need to clarify by changing the relevant type
For example,
int('300') + 400   # To get the addition to work, convert the string to an integer
700
Identity
In Python, each object has a unique identifier, which helps Python (and us) keep track of the object
The identity of an object can be obtained via the id() function
y = 2.5
z = 2.5
id(y)
166719660
id(z)
166719740
In this example, y and z happen to have the same value (i.e., 2.5), but they are not the same object
The identity of an object is in fact just the address of the object in memory
If we set x = 42 then we create an object of type int that contains the data 42
In fact it contains more, as the following example shows
x = 42
x
42
x.imag
0
x.__class__
int
When Python creates this integer object, it stores with it various auxiliary information, such as the imaginary
part, and the type
Any name following a dot is called an attribute of the object to the left of the dot
• e.g., imag and __class__ are attributes of x
We see from this example that objects have attributes that contain auxiliary information
They also have attributes that act like functions, called methods
These attributes are important, so let's discuss them in depth
Methods
x = ['foo', 'bar']
callable(x.append)
True
callable(x.__doc__)
False
Methods typically act on the data contained in the object they belong to, or combine that data with other
data
x = ['a', 'b']
x.append('c')
s = 'This is a string'
s.upper()
'THIS IS A STRING'
s.lower()
'this is a string'
s.replace('This', 'That')
'That is a string'
x = ['a', 'b']
x[0] = 'aa' # Item assignment using square bracket notation
x
['aa', 'b']
It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call
What actually happens is that Python calls the __setitem__ method, as follows
x = ['a', 'b']
x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
x
['aa', 'b']
(If you wanted to you could modify the __setitem__ method, so that square bracket assignment does
something totally different)
1.5.3 Summary
In Python, everything in memory is treated as an object; this includes functions
For example, after a definition such as def f(x): return x**2, the name f is bound to a function object, which we can inspect like any other object
f
<function __main__.f>
type(f)
function
id(f)
3074342220L
f.__name__
'f'
We can see that f has type, identity, attributes and so on, just like any other object
It also has methods
One example is the __call__ method, which just evaluates the function
f.__call__(3)
9
import math
id(math)
3074329380L
This uniform treatment of data in Python (everything is an object) helps keep the language simple and
consistent
CHAPTER
TWO
Next we cover the third party libraries most useful for scientific work in Python
2.1 NumPy
Contents
• NumPy
– Overview
– Introduction to NumPy
– NumPy Arrays
– Operations on Arrays
– Additional Functionality
– Exercises
– Solutions
Lets be clear: the work of science has nothing whatever to do with consensus. Consensus is the
business of politics. Science, on the contrary, requires only one investigator who happens to be
right, which means that he or she has results that are verifiable by reference to the real world.
In science consensus is irrelevant. What is relevant is reproducible results. – Michael Crichton
2.1.1 Overview
References
import numpy as np
x = np.random.uniform(0, 1, size=1000000)
x.mean()
0.49990566939719772
The operations of creating the array and computing its mean are both passed out to carefully optimized
machine code compiled from C
More generally, NumPy sends operations in batches to optimized C and Fortran code
This is similar in spirit to Matlab, which provides an interface to fast Fortran routines
A Comment on Vectorization
The most important thing that NumPy defines is an array data type formally called a numpy.ndarray
NumPy arrays power a large proportion of the scientific Python ecosystem
To create a NumPy array containing only zeros we use np.zeros
a = np.zeros(3)
a
array([0., 0., 0.])
type(a)
numpy.ndarray
NumPy arrays are somewhat like native Python lists, except that
• Data must be homogeneous (all elements of the same type)
• These types must be one of the data types (dtypes) provided by NumPy
The most important of these dtypes are:
• float64: 64 bit floating point number
• int64: 64 bit integer
• bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc
On modern machines, the default dtype for arrays is float64
a = np.zeros(3)
type(a[0])
numpy.float64
a = np.zeros(3, dtype=int)
type(a[0])
numpy.int64
z = np.zeros(10)
Here z is a flat array with no dimension neither row nor column vector
The dimension is recorded in the shape attribute, which is a tuple
z.shape
(10,)
Here the shape tuple has only one element, which is the length of the array (tuples with one element end
with a comma)
To give it dimension, we can change the shape attribute
z.shape = (10, 1)
z
array([[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.],
[0.]])
z = np.zeros(4)
z.shape = (2, 2)
z
array([[0., 0.],
[0., 0.]])
In the last case, to make the 2 by 2 array, we could also pass a tuple to the zeros() function, as in z =
np.zeros((2, 2))
Creating Arrays
z = np.empty(3)
z
z = np.identity(2)
z
array([[1., 0.],
[0., 1.]])
In addition, NumPy arrays can be created from Python lists, tuples, etc. using np.array
z = np.array([10, 20])   # ndarray from a Python list
z
array([10, 20])
type(z)
numpy.ndarray
z = np.array([[1, 2], [3, 4]])   # 2D array from a list of lists
z
array([[1, 2],
       [3, 4]])
See also np.asarray, which performs a similar function, but does not make a distinct copy of data already
in a NumPy array
na = np.linspace(10, 20, 2)
na is np.asarray(na) # Does not copy NumPy arrays
True
na is np.array(na)   # np.array makes a new copy, so this is a different object
False
To read in the array data from a text file containing numeric data use np.loadtxt or np.genfromtxt; see the documentation for details
Array Indexing
z = np.linspace(1, 2, 5)
z
array([1.  , 1.25, 1.5 , 1.75, 2.  ])
z[0]
1.0
z[0:2]   # Two elements, starting from element 0
array([1.  , 1.25])
z[-1]
2.0
z = np.array([[1, 2], [3, 4]])
z
array([[1, 2],
       [3, 4]])
z[0, 0]
1
z[0, 1]
2
And so on
Note that indices are still zero-based, to maintain compatibility with Python sequences
Columns and rows can be extracted as follows
z[0, :]
array([1, 2])
z[:, 1]
array([2, 4])
z = np.linspace(2, 4, 5)
z
array([2. , 2.5, 3. , 3.5, 4. ])
z[np.array((0, 2, 3))]   # Extract elements using an array of integer indices
array([2. , 3. , 3.5])
d = np.array([0, 1, 1, 0, 0], dtype=bool)   # A Boolean array can also be used as an index
z[d]
array([2.5, 3. ])
z = np.empty(3)
z
array([2. , 3. , 3.5])
z[:] = 42
z
array([42., 42., 42.])
Array Methods
a = np.array((4, 3, 2, 1))
a
array([4, 3, 2, 1])
a.sort()              # Sorts a in place
a
array([1, 2, 3, 4])
a.sum()               # Sum
10
a.mean()              # Mean
2.5
a.max()               # Max
4
a.cumsum()            # Cumulative sum of the elements of a
array([ 1,  3,  6, 10])
a.cumprod()           # Cumulative product of the elements of a
array([ 1,  2,  6, 24])
a.var()               # Variance
1.25
a.std()               # Standard deviation
1.1180339887498949
a.shape = (2, 2)
a.T # Equivalent to a.transpose()
array([[1, 3],
[2, 4]])
z = np.linspace(2, 4, 5)
z
array([2. , 2.5, 3. , 3.5, 4. ])
z.searchsorted(2.2)   # Index of the first element of z that is >= 2.2
1
Many of the methods discussed above have equivalent functions in the NumPy namespace
a = np.array((4, 3, 2, 1))
np.sum(a)
10
np.mean(a)
2.5
Arithmetic Operations
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b
array([ 6,  8, 10, 12])
a * b
array([ 5, 12, 21, 32])
a + 10
array([11, 12, 13, 14])
a * 10
array([10, 20, 30, 40])
A = np.ones((2, 2))
B = np.ones((2, 2))
A + B
array([[2., 2.],
[2., 2.]])
A + 10
array([[11., 11.],
[11., 11.]])
A * B
array([[1., 1.],
[1., 1.]])
Matrix Multiplication
With Anaconda's scientific Python package based around Python 3.5 and above, one can use the @ symbol
for matrix multiplication, as follows:
A = np.ones((2, 2))
B = np.ones((2, 2))
A @ B
array([[2., 2.],
[2., 2.]])
(For older versions of Python and NumPy you need to use the np.dot function)
We can also use @ to take the inner product of two flat arrays
A = np.array((1, 2))
B = np.array((10, 20))
A @ B
50
We can also use @ when one operand is a two-dimensional array and the other is a Python list or tuple; for example, with
A = np.array(((1, 2), (3, 4)))
A
array([[1, 2],
       [3, 4]])
A @ (0, 1)
array([2, 4])
a = np.array([42, 44])
a
array([42, 44])
a[-1] = 0   # Change the last element of a to 0
a
array([42,  0])
Mutability leads to the following behavior (which can be shocking to MATLAB programmers)
a = np.random.randn(3)
a
b = a
b[0] = 0.0
a
Making Copies
a = np.random.randn(3)
a
b = np.copy(a)
b
b[:] = 1
b
Vectorized Functions
NumPy provides versions of the standard functions log, exp, sin, etc. that act element-wise on arrays
z = np.array([1, 2, 3])
np.sin(z)
array([0.84147098, 0.90929743, 0.14112001])
n = len(z)
y = np.empty(n)
for i in range(n):
    y[i] = np.sin(z[i])
Because they act element-wise on arrays, these functions are called vectorized functions
In NumPy-speak, they are also called ufuncs, which stands for universal functions
As we saw above, the usual arithmetic operations (+, *, etc.) also work element-wise, and combining these
with the ufuncs gives a very large set of fast element-wise functions
array([1, 2, 3])
def f(x):
    return 1 if x > 0 else 0
x = np.random.randn(4)
x
array([0, 1, 0, 0])
f = np.vectorize(f)
f(x) # Passing the same vector x as in the previous example
array([0, 1, 0, 0])
However, this approach doesn't always obtain the same speed as a more carefully crafted vectorized function
Comparisons
z = np.array([2, 3])
y = np.array([2, 3])
z == y
array([ True,  True])
y[0] = 5
z == y
array([False,  True])
z != y
array([ True, False])
z = np.linspace(0, 10, 5)
z
array([ 0. ,  2.5,  5. ,  7.5, 10. ])
z > 3
array([False, False,  True,  True,  True])
b = z > 3
b
array([False, False,  True,  True,  True])
z[b]        # Conditional extraction using a Boolean array
array([ 5. ,  7.5, 10. ])
z[z > 3]    # The same thing in one step
array([ 5. ,  7.5, 10. ])
Subpackages
NumPy provides some additional functionality related to scientific programming through its subpackages
We've already seen how we can generate random variables using np.random
y = np.random.binomial(10, 0.5, size=1000)   # 1,000 draws from Bin(10, 0.5)
y.mean()
5.096
The np.linalg subpackage provides standard linear algebra routines; for example, with A = np.array([[1, 2], [3, 4]])
np.linalg.det(A)   # Compute the determinant
-2.0000000000000004
np.linalg.inv(A)   # Compute the inverse
array([[-2. ,  1. ],
       [ 1.5, -0.5]])
Much of this functionality is also available in SciPy, a collection of modules that are built on top of NumPy
Well cover the SciPy versions in more detail soon
For a comprehensive list of whats available in NumPy see this documentation
2.1.6 Exercises
Exercise 1
p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_N x^N = \sum_{n=0}^{N} a_n x^n \qquad (2.1)
Earlier, you wrote a simple function p(x, coeff) to evaluate (2.1) without considering efficiency
Now write a new function that does the same job, but uses NumPy arrays and array operations for its
computations, rather than any form of Python loop
(Such functionality is already implemented as np.poly1d, but for the sake of the exercise don't use this class)
class)
• Hint: Use np.cumprod()
Exercise 2
from numpy.random import uniform

def sample(q):
    a = 0.0
    U = uniform(0, 1)
    for i in range(len(q)):
        if a < U <= a + q[i]:
            return i
        a = a + q[i]
If you can't see how this works, try thinking through the flow for a simple example, such as q = [0.25, 0.75]. It helps to sketch the intervals on paper
Your exercise is to speed it up using NumPy, avoiding explicit loops
• Hint: Use np.searchsorted and np.cumsum
If you can, implement the functionality as a class called discreteRV, where
• the data for an instance of the class is the vector of probabilities q
• the class has a draw() method, which returns one draw according to the algorithm described above
If you can, write the method so that draw(k) returns k draws from q
Exercise 3
Recall the empirical cumulative distribution function; your task in this exercise is to add a plot method to the ECDF class, so that an instance can plot itself over an interval [a, b]
2.1.7 Solutions
Exercise 1
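A sketch of a vectorized solution along the lines of the hint: build the powers of x with np.cumprod and take an inner product with the coefficients
import numpy as np

def p(x, coef):
    X = np.empty(len(coef))
    X[0] = 1
    X[1:] = x
    y = np.cumprod(X)     # y = [1, x, x**2, ..., x**N]
    return coef @ y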
Let's test it
coef = np.ones(3)
print(coef)
print(p(1, coef))
# For comparison
q = np.poly1d(coef)
print(q(1))
[1. 1. 1.]
3.0
3.0
Exercise 2
class DiscreteRV:
"""
Generates an array of draws from a discrete random variable with vector of
probabilities given by q.
"""
The logic is not obvious, but if you take your time and read it slowly, you will understand
There is a problem here, however
Suppose that q is altered after an instance of DiscreteRV is created, for example by
q = (0.1, 0.9)
d = DiscreteRV(q)
d.q = (0.5, 0.5)
The problem is that Q does not change accordingly, and Q is the data used in the draw method
To deal with this, one option is to compute Q every time the draw method is called
Exercise 3
"""
Modifies ecdf.py from QuantEcon to add in a plot method
"""
import numpy as np
import matplotlib.pyplot as plt

class ECDF:
    """
    One-dimensional empirical distribution function given a vector of
    observations.

    Parameters
    ----------
    observations : array_like
        An array of observations

    Attributes
    ----------
    observations : array_like
        An array of observations
    """

    def __init__(self, observations):
        self.observations = observations
    def __call__(self, x):
        """
        Evaluates the ecdf at x

        Parameters
        ----------
        x : scalar(float)
            The x at which the ecdf is evaluated

        Returns
        -------
        scalar(float)
            Fraction of the sample less than x
        """
        return np.mean(self.observations <= x)
    def plot(self, a=None, b=None):
        """
        Plot the ecdf on the interval [a, b]

        Parameters
        ----------
        a : scalar(float), optional(default=None)
            Lower end point of the plot interval
        b : scalar(float), optional(default=None)
            Upper end point of the plot interval
        """
X = np.random.randn(1000)
F = ECDF(X)
F.plot()
2.2 Matplotlib
Contents
• Matplotlib
– Overview
– The APIs
– More Features
– Further Reading
– Exercises
– Solutions
2.2.1 Overview
We've already generated quite a few figures in these lectures using Matplotlib
Matplotlib is an outstanding graphics library, designed for scientific computing, with
• high quality 2D and 3D plots
• output in all the usual formats (PDF, PNG, etc.)
• LaTeX integration
• fine grained control over all aspects of presentation
• animation, etc.
Here's the kind of easy example you might find in introductory treatments
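A sketch of such an example (the sine data is an assumption, chosen to match the object-oriented version shown below)
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 200)
y = np.sin(x)

plt.plot(x, y, 'b-', linewidth=2)   # MATLAB-style calls: all state lives inside plt
plt.show()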
This is simple and convenient, but also somewhat limited and un-Pythonic
For example, in the function calls, a lot of objects get created and passed around without making themselves
known to the programmer
Python programmers tend to prefer a more explicit style of programming (run import this in a code
block and look at the second line)
This leads us to the alternative, object oriented Matplotlib API
Here's the code corresponding to the preceding figure using the object-oriented API
fig, ax = plt.subplots()
ax.plot(x, y, 'b-', linewidth=2)
plt.show()
Tweaks
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend()
plt.show()
We've also used alpha to make the line slightly transparent, which makes it look smoother
The location of the legend can be changed by replacing ax.legend() with ax.legend(loc='upper
center')
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
ax.legend(loc='upper center')
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
plt.show()
fig, ax = plt.subplots()
ax.plot(x, y, 'r-', linewidth=2, label='$y=\sin(x)$', alpha=0.6)
ax.legend(loc='upper center')
ax.set_yticks([-1, 0, 1])
ax.set_title('Test plot')
plt.show()
Matplotlib has a huge array of functions and features, which you can discover over time as you have need
for them
We mention just a few
fig, ax = plt.subplots()
x = np.linspace(-4, 4, 150)
for i in range(3):
m, s = uniform(-1, 1), uniform(1, 2)
y = norm.pdf(x, loc=m, scale=s)
current_label = f'$\mu = {m:.2}$'
ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
ax.legend()
plt.show()
Multiple Subplots
num_rows, num_cols = 3, 2
fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
for i in range(num_rows):
for j in range(num_cols):
m, s = uniform(-1, 1), uniform(1, 2)
x = norm.rvs(loc=m, scale=s, size=100)
axes[i, j].hist(x, alpha=0.6, bins=20)
t = f'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
plt.show()
3D Plots
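The example for this section was dropped in the PDF conversion; here is a sketch of a surface plot using the object-oriented API (the function and grid are illustrative choices):

from mpl_toolkits.mplot3d import Axes3D   # registers the 3D projection
from matplotlib import cm
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

xgrid = np.linspace(-3, 3, 50)
x, y = np.meshgrid(xgrid, xgrid)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y, f(x, y), rstride=2, cstride=2, cmap=cm.jet, alpha=0.7, linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()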
A Customizing Function
Perhaps you will find a set of customizations that you regularly use
Suppose we usually prefer our axes to go through the origin, and to have a grid
Heres a nice example from Matthew Doty of how the object-oriented API can be used to build a custom
subplots function that implements these changes
Read carefully through the code and see if you can follow whats going on
def subplots():
    "Custom subplots with axes through the origin"
    fig, ax = plt.subplots()
    # Set the axes through the origin (this part was missing from the extracted code)
    for spine in ['left', 'bottom']:
        ax.spines[spine].set_position('zero')
    for spine in ['right', 'top']:
        ax.spines[spine].set_color('none')
    ax.grid()
    return fig, ax
Heres the figure it produces (note axes through the origin and the grid)
2.2.5 Exercises
Exercise 1
2.2.6 Solutions
Exercise 1
θ_vals = np.linspace(0, 2, 10)    # grid of θ values (assumed; not shown in the extracted text)
x = np.linspace(0, 5, 150)
fig, ax = plt.subplots()
for θ in θ_vals:
    ax.plot(x, np.cos(np.pi * θ * x) * np.exp(- x))
plt.show()
2.3 SciPy
Contents
• SciPy
– SciPy versus NumPy
– Statistics
– Roots and Fixed Points
– Optimization
– Integration
– Linear Algebra
– Exercises
– Solutions
SciPy builds on top of NumPy to provide common tools for scientific programming, such as
• linear algebra
• numerical integration
• interpolation
• optimization
• distributions and random number generation
• signal processing
• etc., etc
SciPy is a package that contains various tools that are built on top of NumPy, using its array data type and
related functionality
In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file
__all__ = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']
del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')
However, its more common and better practice to use NumPy functionality explicitly
import numpy as np
a = np.identity(3)
2.3.2 Statistics
np.random.beta(5, 5, size=3)
f(x; a, b) = x^{a−1} (1 − x)^{b−1} / ∫_0^1 u^{a−1} (1 − u)^{b−1} du    (0 ≤ x ≤ 1)    (2.2)
Sometimes we need access to the density itself, or the cdf, the quantiles, etc.
For this we can use scipy.stats, which provides all of this functionality as well as random number
generation in a single consistent interface
Here's an example of usage
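The example code was dropped in the PDF conversion; a sketch consistent with the discussion and the figure code shown further below:

import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

q = beta(5, 5)                        # Beta(a, b) with a = b = 5
obs = q.rvs(2000)                     # 2000 observations
grid = np.linspace(0.01, 0.99, 100)

fig, ax = plt.subplots()
ax.hist(obs, bins=40, normed=True)
ax.plot(grid, q.pdf(grid), 'k-', linewidth=2)
plt.show()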
In this code we created a so-called rv_frozen object, via the call q = beta(5, 5)
The frozen part of the notation implies that q represents a particular distribution with a particular set of
parameters
Once weve done so, we can then generate random numbers, evaluate the density, etc., all from this fixed
distribution
q.cdf(0.4)   # Cumulative distribution function (the call lines were dropped in the conversion; these are the presumed calls)
0.2665676800000002
q.pdf(0.4)   # Density function
2.0901888000000004
q.ppf(0.8)   # Quantile (inverse cdf) function
0.63391348346427079
q.mean()
0.5
identifier = scipy.stats.distribution_name(shape_parameters)
where distribution_name is one of the distribution names in scipy.stats
There are also two keyword arguments, loc and scale, which following our example above, are called as
identifier = scipy.stats.distribution_name(shape_parameters,
loc=c, scale=d)
These transform the original random variable X into Y = c + dX
The methods rvs, pdf, cdf, etc. are transformed accordingly
Before finishing this section, we note that there is an alternative way of calling the methods described above
For example, the previous code can be replaced by
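The replacement code did not survive the conversion; presumably it passed the distribution parameters directly to the module-level functions, along the lines of

obs = beta.rvs(5, 5, size=2000)
grid = np.linspace(0.01, 0.99, 100)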
fig, ax = plt.subplots()
ax.hist(obs, bins=40, normed=True)
ax.plot(grid, beta.pdf(grid, 5, 5), 'k-', linewidth=2)
plt.show()
x = np.random.randn(200)
y = 2 * x + 0.1 * np.random.randn(200)
gradient, intercept, r_value, p_value, std_err = linregress(x, y)
gradient, intercept
(1.9962554379482236, 0.008172822032671799)
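2.3.3 Roots and Fixed Points

The definition of the test function f, given as equation (2.3) in the original, did not survive the PDF conversion. A sketch consistent with the root value of roughly 0.408 reported below:

f = lambda x: np.sin(4 * (x - 1/4)) + x + x**20 - 1
x = np.linspace(0, 1, 100)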
plt.figure(figsize=(10, 8))
plt.plot(x, f(x))
plt.axhline(ls='--', c='k')
plt.show()
Bisection
One of the most common algorithms for numerical root finding is bisection
To understand the idea, recall the well known game where
• Player A thinks of a secret number between 1 and 100
• Player B asks if it's less than 50; if yes, B asks if it's less than 25, and so on, halving the remaining interval at each step
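The lecture's own implementation did not survive the conversion; a minimal sketch of the algorithm, assuming f(a) < 0 < f(b):

def bisect(f, a, b, tol=10e-5):
    """
    Bisection root finding: repeatedly halve [a, b], keeping the half on
    which f changes sign.
    """
    lower, upper = a, b
    while upper - lower > tol:
        middle = 0.5 * (upper + lower)
        if f(middle) > 0:       # root lies in the lower half
            lower, upper = lower, middle
        else:                   # root lies in the upper half
            lower, upper = middle, upper
    return 0.5 * (upper + lower)

SciPy's own version, tested just below, can be imported with from scipy.optimize import bisect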
In fact SciPy provides its own bisection function, which we now test using the function f defined in (2.3)
bisect(f, 0, 1)
0.40829350427936706
Let's investigate this using the same function f, first looking at potential instability
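The calls themselves were dropped in the conversion; the two numbers below are consistent with applying the Newton-Raphson routine from two different starting points, for example

from scipy.optimize import newton

newton(f, 0.2)   # Start the search at initial condition x = 0.2
newton(f, 0.7)   # From this starting point the routine fails to find the true root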
0.40829350427935679
0.70017000000002816
%timeit bisect(f, 0, 1)
Hybrid Methods
So far we have seen that the Newton-Raphson method is fast but not robust
This bisection algorithm is robust but relatively slow
This illustrates a general principle
• If you have specific knowledge about your function, you might be able to exploit it to generate effi-
ciency
• If not, then the algorithm choice involves a trade-off between speed of convergence and robustness
In practice, most default algorithms for root finding, optimization and fixed points use hybrid methods
These methods typically combine a fast method with a robust method in the following manner:
1. Attempt to use a fast method
2. Check diagnostics
3. If diagnostics are bad, then switch to a more robust algorithm
In scipy.optimize, the function brentq is such a hybrid method, and a good default
brentq(f, 0, 1)
0.40829350427936706
%timeit brentq(f, 0, 1)
Here the correct solution is found and the speed is almost the same as newton
Fixed Points
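The example was dropped in the conversion; the array(1.0) below is consistent with something like

from scipy.optimize import fixed_point

fixed_point(lambda x: x**2, 10.0)   # 10.0 is an initial guess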
array(1.0)
If you dont get good results, you can always switch back to the brentq root finder, since the fixed point of
a function f is the root of g(x) := x − f (x)
2.3.4 Optimization
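The univariate example was dropped in the conversion; the 0.0 shown below is consistent with minimizing x² over an interval containing zero, for example

from scipy.optimize import fminbound

fminbound(lambda x: x**2, -1, 2)   # Search in [-1, 2]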
0.0
Multivariate Optimization
Multivariate local optimizers include minimize, fmin, fmin_powell, fmin_cg, fmin_bfgs, and
fmin_ncg
Constrained multivariate local optimizers include fmin_l_bfgs_b, fmin_tnc, fmin_cobyla
See the documentation for details
2.3.5 Integration
Most numerical integration methods work by computing the integral of an approximating polynomial
The resulting error depends on how well the polynomial fits the integrand, which in turn depends on how
regular the integrand is
In SciPy, the relevant module for numerical integration is scipy.integrate
A good default for univariate integration is quad
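The example itself was dropped in the conversion; the value shown below is consistent with integrating x² over [0, 1]:

from scipy.integrate import quad

integral, error = quad(lambda x: x**2, 0, 1)
integral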
0.33333333333333337
In fact quad is an interface to a very standard numerical integration routine in the Fortran library QUAD-
PACK
It uses Clenshaw-Curtis quadrature, based on expansion in terms of Chebychev polynomials
There are other options for univariate integration. A useful one is fixed_quad, which is fast and hence
works well inside for loops
There are also functions for multivariate integration
See the documentation for more details
2.3.6 Linear Algebra
We saw that NumPy provides a module for linear algebra called linalg
SciPy also provides a module for linear algebra with the same name
The latter is not an exact superset of the former, but overall it has more functionality
We leave you to investigate the set of available routines
2.3.7 Exercises
Exercise 1
2.3.8 Solutions
Exercise 1
2.4 Numba
Contents
• Numba
– Overview
– Where are the Bottlenecks?
– Vectorization
– Numba
2.4.1 Overview
In our lecture on NumPy we learned one method to improve speed and efficiency in numerical work
That method, called vectorization, involved sending array processing operations in batch to efficient low
level code
This clever idea dates back to Matlab, which uses it extensively
Unfortunately, vectorization is limited and has several weaknesses
One weakness is that it is highly memory intensive
To understand what Numba does and why, we need some background knowledge
Lets start by thinking about higher level languages, such as Python
These languages are optimized for humans
This means that the programmer can leave many details to the runtime environment
• specifying variable types
• memory allocation/deallocation, etc.
The upside is that, compared to low-level languages, Python is typically faster to write, less error prone and
easier to debug
The downside is that Python is harder to optimize (that is, to turn into fast machine code) than languages like
C or Fortran
Indeed, the standard implementation of Python (called CPython) cannot match the speed of compiled lan-
guages such as C or Fortran
Does that mean that we should just switch to C or Fortran for everything?
The answer is no, no and one hundred times no
High productivity languages should be chosen over high speed languages for the great majority of scientific
computing tasks
This is because
1. Of any given program, relatively few lines are ever going to be time-critical
2. For those lines of code that are time-critical, we can achieve C-like speed using a combination of
NumPy and Numba
This lecture provides a guide
Lets start by trying to understand why high level languages like Python are slower than compiled code
Dynamic Typing
a, b = 10, 10
a + b
20
Even for this simple operation, the Python interpreter has a fair bit of work to do
For example, in the statement a + b, the interpreter has to know which operation to invoke
If a and b are strings, then a + b requires string concatenation
a, b = 'foo', 'bar'
a + b
'foobar'
a, b = ['foo'], ['bar']
a + b
['foo', 'bar']
(We say that the operator + is overloaded: its action depends on the type of the objects on which it acts)
As a result, Python must check the type of the objects and then call the correct operation
This involves substantial overheads
Static Types
#include <stdio.h>
int main(void) {
int i;
int sum = 0;
for (i = 1; i <= 10; i++) {
sum = sum + i;
}
printf("sum = %d\n", sum);
return 0;
}
Data Access
In C or Fortran, these integers would typically be stored in an array, which is a simple data structure for
storing homogeneous data
Such an array is stored in a single contiguous block of memory
• In modern computers, memory addresses are allocated to each byte (one byte = 8 bits)
• For example, a 64 bit integer is stored in 8 bytes of memory
• An array of n such integers occupies 8n consecutive memory slots
Moreover, the compiler is made aware of the data type by the programmer
• In this case 64 bit integers
Hence, each successive data point can be accessed by shifting forward in memory space by a known and
fixed amount
• In this case 8 bytes
2.4.3 Vectorization
Operations on Arrays
import random
import numpy as np
import quantecon as qe
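# The first code block (a pure Python loop over draws) appears to have been
# lost in the PDF conversion; a sketch of what it computes:
qe.util.tic()
n = 100_000
y = 0
for i in range(n):
    x = random.uniform(0, 1)
    y += x**2
qe.util.toc()

# The second, vectorized code block: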
qe.util.tic()
n = 100_000
x = np.random.uniform(0, 1, n)
np.sum(x**2)
qe.util.toc()
The second code block, which achieves the same thing as the first, runs much faster
The reason is that in the second implementation we have broken the loop down into three basic operations
1. draw n uniforms
2. square them
3. sum them
These are sent as batch operators to optimized machine code
Apart from minor overheads associated with sending data back and forth, the result is C or Fortran-like
speed
When we run batch operations on arrays like this, we say that the code is vectorized
Universal Functions
Many functions provided by NumPy are so-called universal functions, also called ufuncs
This means that they
• map scalars into scalars, as expected
• map arrays into arrays, acting element-wise
For example, np.cos is a ufunc:
np.cos(1.0)
0.54030230586813977
np.cos(np.linspace(0, 1, 3))
f(x, y) = cos(x^2 + y^2) / (1 + x^2 + y^2)    and    a = 3
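The definitions of f and the evaluation grid appear to have been dropped in the conversion; a sketch consistent with the timed comparisons below (the grid size is an assumption):

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 1000)   # evaluation points on [-a, a] with a = 3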
Here's a plot of f
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
import matplotlib.pyplot as plt
xgrid = np.linspace(-3, 3, 50)   # coarser grid just for the picture
x, y = np.meshgrid(xgrid, xgrid)
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(x, y,
                f(x, y),
                rstride=2, cstride=2,
                cmap=cm.jet,
                alpha=0.7,
                linewidth=0.25)
ax.set_zlim(-0.5, 1.0)
plt.show()
qe.tic()
m = -np.inf                # running maximum (this initialization was missing from the extracted code)
for x in grid:
    for y in grid:
        z = f(x, y)
        if z > m:
            m = z
qe.toc()
qe.tic()
x, y = np.meshgrid(grid, grid)   # evaluate f on the whole grid at once
np.max(f(x, y))
qe.toc()
In the vectorized version, all the looping takes place in compiled code
As you can see, the second version is much faster
(Well make it even faster again below, when we discuss Numba)
2.4.4 Numba
Prerequisites
An Example
x_{t+1} = 4 x_t (1 − x_t)
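The definitions of qm and its compiled counterpart did not survive the PDF conversion; a sketch consistent with the decorator version shown later in this section:

import numpy as np
from numba import jit

def qm(x0, n):
    x = np.empty(n+1)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4 * x[t] * (1 - x[t])
    return x

qm_numba = jit(qm)   # compiled version, used in the timings below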
Here's the plot of a typical trajectory, starting from x_0 = 0.1, with t on the x-axis
x = qm(0.1, 250)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, 'b-', lw=2, alpha=0.8)
ax.set_xlabel('time', fontsize=16)
plt.show()
Lets time and compare identical function calls across these two versions:
qe.util.tic()
qm(0.1, int(10**5))
time1 = qe.util.toc()
qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
The first execution is relatively slow because of JIT compilation (see below)
Next time and all subsequent times it runs much faster:
qe.util.tic()
qm_numba(0.1, int(10**5))
time2 = qe.util.toc()
182.8322188449848
Decorator Notation
If you dont need a separate name for the numbafied version of qm, you can just put @jit before the function
@jit
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return x
Numba attempts to generate fast machine code using the infrastructure provided by the LLVM Project
It does this by inferring type information on the fly
As you can imagine, this is easier for simple Python objects (simple scalar data types, such as floats, integers,
etc.)
Numba also plays well with NumPy arrays, which it treats as typed memory regions
In an ideal setting, Numba can infer all necessary type information
This allows it to generate native machine code, without having to call the Python runtime environment
In such a setting, Numba will be on par with machine code from low level languages
When Numba cannot infer all type information, some Python objects are given generic object status, and
some code is generated using the Python runtime
In this second setting, Numba typically provides only minor speed gains or none at all
Hence, its prudent when using Numba to focus on speeding up small, time-critical snippets of code
This will give you much better performance than blanketing your Python programs with @jit statements
a = 1
@jit
def add_x(x):
return a + x
print(add_x(10))
11
a = 2
print(add_x(10))
11
Notice that changing the global had no effect on the value returned by the function
When Numba compiles machine code for functions, it treats global variables as constants to ensure type
stability
Numba can also be used to create custom ufuncs with the @vectorize decorator
To illustrate the advantage of using Numba to vectorize a function, we return to a maximization problem
discussed above
from numba import vectorize

@vectorize
def f_vec(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
qe.tic()
np.max(f_vec(x, y))
qe.toc()
qe.tic()
np.max(f_vec(x, y))
qe.toc()
Contents
2.5.1 Overview
In this lecture we review some other scientific libraries that are useful for economic research and analysis
We have, however, already picked most of the low hanging fruit in terms of economic research
Hence you should feel free to skip this lecture on first pass
2.5.2 Cython
Like Numba, Cython provides an approach to generating fast compiled code that can be used from Python
As was the case with Numba, a key problem is the fact that Python is dynamically typed
As youll recall, Numba solves this problem (where possible) by inferring type
Cython's approach is different: programmers add type definitions directly to their Python code
As such, the Cython language can be thought of as Python with type definitions
In addition to a language specification, Cython is also a language translator, transforming Cython code into
optimized C and C++ code
Cython also takes care of building language extensions, the wrapper code that interfaces between the resulting
compiled code and Python
Important Note:
In what follows code is executed in a Jupyter notebook
This is to take advantage of a Cython cell magic that makes Cython particularly easy to use
Some modifications are required to run the code outside a notebook
• See the book Cython by Kurt Smith or the online documentation
A First Example
∑_{i=0}^{n} α^i = (1 − α^{n+1}) / (1 − α)
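The pure Python implementation of this sum did not survive the PDF conversion; here is a sketch (the C version discussed under Python vs C below follows the same logic, with explicit type declarations):

def geo_prog(alpha, n):
    current = 1.0
    sum = current          # variable names mirror the Cython version below
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum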
Python vs C
If youre not familiar with C, the main thing you should take notice of is the type definitions
• int means integer
• double means double precision floating point number
• the double in double geo_prog(... indicates that the function will return a double
Not surprisingly, the C code is faster than the Python code
A Cython Implementation
%load_ext Cython
%%cython
def geo_prog_cython(double alpha, int n):
cdef double current = 1.0
cdef double sum = current
cdef int i
for i in range(n):
current = current * alpha
sum = sum + current
return sum
Here cdef is a Cython keyword indicating a variable declaration, and is followed by a type
The %%cython line at the top is not actually Cython code: it's a Jupyter cell magic indicating the start of
Cython code
After executing the cell, you can now call the function geo_prog_cython from within Python
What you are in fact calling is compiled C code with a Python call interface
import quantecon as qe
qe.util.tic()
geo_prog(0.99, int(10**6))
qe.util.toc()
qe.util.tic()
geo_prog_cython(0.99, int(10**6))
qe.util.toc()
Lets go back to the first problem that we worked with: generating the iterates of the quadratic map
xt+1 = 4xt (1 − xt )
The problem of computing iterates and returning a time series requires us to work with arrays
The natural array type to work with is NumPy arrays
Heres a Cython implementation that initializes, populates and returns a NumPy array
%%cython
import numpy as np
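# The function body did not survive the PDF conversion; a sketch of a
# "first pass" Cython version that works directly with a NumPy array:
def qm_cython_first_pass(double x0, int n):
    cdef int t
    x = np.zeros(n+1, float)
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)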
If you run this code and time it, you will see that its performance is disappointing: nothing like the speed
gain we got from Numba
qe.util.tic()
qm_cython_first_pass(0.1, int(10**5))
qe.util.toc()
This example was also computed in the Numba lecture, and you can see Numba is around 90 times faster
The reason is that working with NumPy arrays incurs substantial Python overheads
We can do better by using Cython's typed memoryviews, which provide more direct access to arrays in
memory
When using them, the first step is to create a NumPy array
Next, we declare a memoryview and bind it to the NumPy array
Heres an example:
%%cython
import numpy as np
from numpy cimport float_t
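# The function itself appears to have been cut in the PDF conversion; a
# sketch consistent with the bullet points below:
def qm_cython(double x0, int n):
    cdef int t
    x_np_array = np.zeros(n+1, dtype=float)
    cdef float_t [:] x = x_np_array
    x[0] = x0
    for t in range(n):
        x[t+1] = 4.0 * x[t] * (1 - x[t])
    return np.asarray(x)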
Here
• cimport pulls in some compile-time information from NumPy
• cdef float_t [:] x = x_np_array creates a memoryview on the NumPy array
x_np_array
• the return statement uses np.asarray(x) to convert the memoryview back to a NumPy array
Lets time it:
qe.util.tic()
qm_cython(0.1, int(10**5))
qe.util.toc()
Summary
Cython requires more expertise than Numba, and is a little more fiddly in terms of getting good performance
In fact, its surprising how difficult it is to beat the speed improvements provided by Numba
Nonetheless,
2.5.3 Joblib
Caching
Perhaps, like us, you sometimes run a long computation that simulates a model at a given set of parameters
to generate a figure, say, or a table
20 minutes later you realize that you want to tweak the figure and now you have to do it all again
What caching will do is automatically store results at each parameterization
With Joblib, results are compressed and stored on file, and automatically served back up to you when you
repeat the calculation
An Example
Lets look at a toy example, related to the quadratic map model discussed above
Lets say we want to generate a long trajectory from a certain initial condition x0 and see what fraction of
the sample is below 0.1
(Well omit JIT compilation or other speed ups for simplicity)
Here's our code
from joblib import Memory
import numpy as np
memory = Memory(cachedir='./joblib_cache')
@memory.cache
def qm(x0, n):
x = np.empty(n+1)
x[0] = x0
for t in range(n):
x[t+1] = 4 * x[t] * (1 - x[t])
return np.mean(x < 0.1)
We are using joblib to cache the result of calling qm at a given set of parameters
With the argument cachedir='./joblib_cache', any call to this function results in both the input values and
output values being stored in a subdirectory joblib_cache of the present working directory
(In UNIX shells, . refers to the present working directory)
The first time we call the function with a given set of parameters we see some extra output that notes
information being cached
qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
The next time we call the function with the same set of parameters, the result is returned almost instanta-
neously
qe.util.tic()
n = int(1e7)
qm(0.2, n)
qe.util.toc()
0.204758079524
TOC: Elapsed: 0.0009872913360595703 seconds.
There are in fact many other approaches to speeding up your Python code
One is interfacing with Fortran
If you are comfortable writing Fortran you will find it very easy to create extension modules from Fortran
code using F2Py
F2Py is a Fortran-to-Python interface generator that is particularly simple to use
Robert Johansson provides a very nice introduction to F2Py, among other things
Recently, a Jupyter cell magic for Fortran has been developed you might want to give it a try
2.5.5 Exercises
Exercise 1
For now, let's just concentrate on simulating a very simple example of such a chain
Suppose that the volatility of returns on an asset can be in one of two regimes: high or low
The transition probabilities across states are as follows
For example, let the period length be one month, and suppose the current state is high
We see from the graph that the state next month will be
• high with probability 0.8
• low with probability 0.2
Your task is to simulate a sequence of monthly volatility states according to this rule
Set the length of the sequence to n = 100000 and start in the high state
Implement a pure Python version, a Numba version and a Cython version, and compare speeds
To test your code, evaluate the fraction of time that the chain spends in the low state
If your code is correct, it should be about 2/3
2.5.6 Solutions
Exercise 1
We let
• 0 represent low
• 1 represent high
p, q = 0.1, 0.2  # Prob of leaving the low and high state respectively (values assumed; consistent with the 2/3 answer below)

def compute_series(n):
    x = np.empty(n, dtype=int)
    x[0] = 1  # Start in state 1
    U = np.random.uniform(0, 1, size=n)
    for t in range(1, n):
        current_x = x[t-1]
        if current_x == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return x
Lets run this code and check that the fraction of time spent in the low state is about 0.666
n = 100000
x = compute_series(n)
print(np.mean(x == 0)) # Fraction of time x is in state 0
0.66951
qe.util.tic()
compute_series(n)
qe.util.toc()
compute_series_numba = jit(compute_series)
x = compute_series_numba(n)
print(np.mean(x == 0))
0.66764
qe.util.tic()
compute_series_numba(n)
qe.util.toc()
%load_ext Cython
%%cython
import numpy as np
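# The Cython implementation did not survive the PDF conversion; a sketch that
# mirrors compute_series, with typed memoryviews for speed.  The transition
# probabilities p (leave low) and q (leave high) are assumptions consistent
# with the 2/3 answer above:
from numpy cimport int_t, float_t

def compute_series_cy(int n):
    # == Create NumPy arrays first == #
    x_np = np.empty(n, dtype=int)
    U_np = np.random.uniform(0, 1, size=n)
    # == Now create memoryviews of the arrays == #
    cdef int_t [:] x = x_np
    cdef float_t [:] U = U_np
    # == Other variable declarations == #
    cdef float p = 0.1
    cdef float q = 0.2
    cdef int t
    # == Main loop == #
    x[0] = 1
    for t in range(1, n):
        if x[t-1] == 0:
            x[t] = U[t] < p
        else:
            x[t] = U[t] > q
    return np.asarray(x)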
compute_series_cy(10)
array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
x = compute_series_cy(n)
print(np.mean(x == 0))
0.66927
qe.util.tic()
compute_series_cy(n)
qe.util.toc()
THREE
Contents
3.1.1 Overview
When computer programs are small, poorly written code is not overly costly
But more data, more sophisticated models, and more computer power are enabling us to take on more
challenging problems that involve writing longer programs
For such programs, investment in good coding practices will pay high returns
The main payoffs are higher productivity and faster code
In this lecture, we review some elements of good coding practice
We also touch on modern developments in scientific computing such as just in time compilation and how
they affect good program design
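The law of motion being iterated, equation (3.1), did not survive the PDF conversion; judging from the code below it is

k_{t+1} = s k_t^α + (1 − δ) k_t    (3.1)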
Here
• kt is capital at time t and
• s, α, δ are parameters (savings, a productivity parameter and depreciation)
For each parameterization, the code
1. sets k0 = 1
2. iterates using (3.1) to produce a sequence k0 , k1 , k2 . . . , kT
3. plots the sequence
The plots will be grouped into three subfigures
In each subfigure, two parameters are held fixed while another varies
import numpy as np
import matplotlib.pyplot as plt
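# The set-up lines of this program were lost in the PDF conversion; the
# following reconstruction (the parameter values are assumptions) makes the
# loops below runnable
k = np.empty(50)                                  # container for each time series
fig, axes = plt.subplots(3, 1, figsize=(12, 15))
# First subfigure: vary α, holding s and δ fixed
α = (0.25, 0.33, 0.45)
s = 0.4
δ = 0.1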
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")
axes[0].grid(lw=0.2)
axes[0].set_ylim(0, 18)
axes[0].set_xlabel('time')
axes[0].set_ylabel('capital')
axes[0].legend(loc='upper left', frameon=True, fontsize=14)
α = 0.33
s = (0.3, 0.4, 0.5)
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ}$")
axes[1].grid(lw=0.2)
axes[1].set_xlabel('time')
axes[1].set_ylabel('capital')
axes[1].set_ylim(0, 18)
axes[1].legend(loc='upper left', frameon=True, fontsize=14)
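# Third subfigure: vary δ, holding α and s fixed (the values are assumptions)
α = 0.33
s = 0.4
δ = (0.05, 0.1, 0.15)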
for j in range(3):
k[0] = 1
for t in range(49):
k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")
axes[2].set_ylim(0, 18)
axes[2].set_xlabel('time')
axes[2].set_ylabel('capital')
axes[2].grid(lw=0.2)
axes[2].legend(loc='upper left', frameon=True, fontsize=14)
plt.show()
There are usually many different ways to write a program that accomplishes a given task
For small programs, like the one above, the way you write code doesn't matter too much
But if you are ambitious and want to produce useful things, you'll write medium to large programs too
In those settings, coding style matters a great deal
Fortunately, lots of smart people have thought about the best way to write code
Here are some basic precepts
If you look at the code above, youll see numbers like 50 and 49 and 3 scattered through the code
These kinds of numeric literals in the body of your code are sometimes called magic numbers
This is not a compliment
While numeric literals are not all evil, the numbers shown in the program above should certainly be replaced
by named constants
For example, the code above could declare the variable time_series_length = 50
Then in the loops, 49 should be replaced by time_series_length - 1
The advantages are:
• the meaning is much clearer throughout
• to alter the time series length, you only need to change one value
Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient
Rookie programmers typically use global variables with abandon as we once did ourselves
But global variables are dangerous, especially in medium to large size programs, since
• they can affect what happens in any part of your program
• they can be changed by any function
This makes it much harder to be certain about what some small part of a given piece of code actually
commands
Heres a useful discussion on the topic
While the odd global in small scripts is no big deal, we recommend that you teach yourself to avoid them
(Well discuss how just below)
JIT Compilation
Fortunately, we can easily avoid the evils of global variables and WET code
• WET stands for we love typing and is the opposite of DRY
We can do this by making frequent use of functions or classes
In fact, functions and classes are designed specifically to help us avoid shaming ourselves by repeating code
or excessive use of global variables
Both can be useful, and in fact they work well with each other
Well learn more about these topics over time
(Personal preference is part of the story too)
Whats really important is that you use one or the other or both
Heres some code that reproduces the plot above with better coding style
It uses a function to avoid repetition
Note also that
• global variables are quarantined by collecting together at the end, not the start of the program
• magic numbers are avoided
• the loop at the end where the actual work is done is short and relatively simple
ax.grid(lw=0.2)
ax.set_xlabel('time')
ax.set_ylabel('capital')
ax.set_ylim(0, 18)
ax.legend(loc='upper left', frameon=True, fontsize=14)
plt.show()
3.1.5 Summary
We recommend that you cultivate good habits and style even when you write relatively short programs
Contents
3.2.1 Overview
Key Concepts
As discussed in an earlier lecture, in the OOP paradigm, data and functions are bundled together into objects
An example is a Python list, which not only stores data, but also knows how to sort itself, etc.
x = [1, 5, 4]
x.sort()
x
[1, 4, 5]
As we now know, sort is a function that is part of the list object and hence called a method
If we want to make our own types of objects we need to use class definitions
A class definition is a blueprint for a particular class of objects (e.g., lists, strings or complex numbers)
It describes
• What kind of data the class stores
• What methods it has for acting on these data
An object or instance is a realization of the class, created from the blueprint
• Each instance has its own unique data
• Methods set out in the class definition act on this (and other) data
In Python, the data and methods of an object are collectively referred to as attributes
Attributes are accessed via dotted attribute notation
• object_name.data
• object_name.method_name()
In the example
x = [1, 5, 4]
x.sort()
x.__class__
list
• x is an object or instance, created from the definition for Python lists, but with its own particular data
• x.sort() and x.__class__ are two attributes of x
• dir(x) can be used to view all the attributes of x
OOP is useful for the same reason that abstraction is useful: for recognizing and exploiting common struc-
ture
For example,
• a Markov chain consists of a set of states and a collection of transition probabilities for moving across
states
• a general equilibrium theory consists of a commodity space, preferences, technologies, and an equi-
librium definition
• a game consists of a list of players, lists of actions available to each player, player payoffs as functions
of all players actions, and a timing protocol
These are all abstractions that collect together objects of the same type
Recognizing common structure allows us to employ common tools
In economic theory, this might be a proposition that applies to all games of a certain type
In Python, this might be a method thats useful for all Markov chains (e.g., simulate)
When we use OOP, the simulate method is conveniently bundled together with the Markov chain object
class Consumer:
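    # The method definitions did not survive the PDF conversion; a sketch
    # consistent with the usage and output shown below:
    def __init__(self, w):
        "Initialize consumer with w dollars of wealth"
        self.wealth = w

    def earn(self, y):
        "The consumer earns y dollars"
        self.wealth += y

    def spend(self, x):
        "The consumer spends x dollars if feasible"
        new_wealth = self.wealth - x
        if new_wealth < 0:
            print("Insufficent funds")   # (spelling matches the output shown below)
        else:
            self.wealth = new_wealth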
Usage
c1 = Consumer(10)   # create an instance with initial wealth 10 (this line was missing from the extracted text)
c1.earn(15)
c1.spend(100)
Insufficent funds
We can of course create multiple instances each with its own data
c1 = Consumer(10)
c2 = Consumer(12)
c2.spend(4)
c2.wealth
c1.wealth
10
c1.__dict__
{'wealth': 10}
c2.__dict__
{'wealth': 8}
When we access or set attributes we're actually just modifying the dictionary maintained by the instance
Self
If you look at the Consumer class definition again youll see the word self throughout the code
The rules with self are that
• Any instance data should be prepended with self
– e.g., the earn method references self.wealth rather than just wealth
• Any method defined within the class should have self as its first argument
– e.g., def earn(self, y) rather than just def earn(y)
• Any method referenced within the class should be called as self.method_name
There are no examples of the last rule in the preceding code but we will see some shortly
Details
In this section we look at some more formal details related to classes and self
• You might wish to skip to the next section on first pass of this lecture
• You can return to these details after youve familiarized yourself with more examples
Methods actually live inside a class object formed when the interpreter reads the class definition
Note how the three methods __init__, earn and spend are stored in the class object
Consider the following code
c1 = Consumer(10)
c1.earn(10)
c1.wealth
20
When you call earn via c1.earn(10) the interpreter passes the instance c1 and the argument 10 to Con-
sumer.earn
In fact the following are equivalent
• c1.earn(10)
• Consumer.earn(c1, 10)
In the function call Consumer.earn(c1, 10) note that c1 is the first argument
Recall that in the definition of the earn method, self is the first parameter
The end result is that self is bound to the instance c1 inside the function call
Thats why the statement self.wealth += y inside earn ends up modifying c1.wealth
For our next example, lets write a simple class to implement the Solow growth model
The Solow growth model is a neoclassical growth model where the amount of capital stock per capita kt
evolves according to the rule
k_{t+1} = [s z k_t^α + (1 − δ) k_t] / (1 + n)    (3.2)
Here
• s is an exogenously given savings rate
• z is a productivity parameter
• α is capitals share of income
• n is the population growth rate
• δ is the depreciation rate
The steady state of the model is the k that solves (3.2) when kt+1 = kt = k
Heres a class that implements this model
Some points of interest in the code are
• An instance maintains a record of its current capital stock in the variable self.k
• The h method implements the right hand side of (3.2)
• The update method uses h to update capital as per (3.2)
– Notice how inside update the reference to the local method h is self.h
The methods steady_state and generate_sequence are fairly self explanatory
class Solow:
r"""
Implements the Solow growth model with update rule
"""
def __init__(self, n=0.05, # population growth rate
s=0.25, # savings rate
δ=0.1, # depreciation rate
α=0.3, # capital's share of income
z=2.0, # productivity
k=1.0): # current capital stock
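        # The body of __init__ did not survive the PDF conversion; it
        # presumably just stores the parameters and the current state:
        self.n, self.s, self.δ, self.α, self.z = n, s, δ, α, z
        self.k = k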
def h(self):
"Evaluate the h function"
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Apply the update rule
return (s * z * self.k**α + (1 - δ) * self.k) / (1 + n)
def update(self):
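        # The body of update did not survive the PDF conversion; it applies
        # h to move the capital stock forward one period:
        self.k = self.h()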
def steady_state(self):
"Compute the steady state value of capital."
# Unpack parameters (get rid of self to simplify notation)
n, s, δ, α, z = self.n, self.s, self.δ, self.α, self.z
# Compute and return steady state
return ((s * z) / (n + δ))**(1 / (1 - α))
Heres a little program that uses the class to compute time series from two different initial conditions
The common steady state is also plotted for comparison
s1 = Solow()
s2 = Solow(k=8.0)
T = 60
fig, ax = plt.subplots(figsize=(9, 6))
ax.legend()
plt.show()
Example: A Market
Next lets write a class for a simple one good market where agents are price takers
The market consists of the following objects:
• A linear demand curve Q = ad − bd p
• A linear supply curve Q = az + bz (p − t)
Here
• p is price paid by the consumer, Q is quantity, and t is a per unit tax
• Other symbols are demand and supply parameters
The class provides methods to compute various values of interest, including competitive equilibrium price
and quantity, tax revenue raised, consumer surplus and producer surplus
Heres our implementation
from scipy.integrate import quad
class Market:
    """
    A simple one-good market (the docstring and the __init__ signature were truncated in the PDF conversion)
    """
    def __init__(self, ad, bd, az, bz, tax):
        self.ad, self.bd, self.az, self.bz, self.tax = ad, bd, az, bz, tax
        if ad < az:
            raise ValueError('Insufficient demand')
def price(self):
"Return equilibrium price"
return (self.ad - self.az + self.bz * self.tax) / (self.bd + self.bz)
def quantity(self):
"Compute equilibrium quantity"
return self.ad - self.bd * self.price()
def consumer_surp(self):
"Compute consumer surplus"
# == Compute area under inverse demand function == #
integrand = lambda x: (self.ad / self.bd) - (1 / self.bd) * x
area, error = quad(integrand, 0, self.quantity())
return area - self.price() * self.quantity()
def producer_surp(self):
"Compute producer surplus"
# == Compute area above inverse supply curve, excluding tax == #
integrand = lambda x: -(self.az / self.bz) + (1 / self.bz) * x
area, error = quad(integrand, 0, self.quantity())
return (self.price() - self.tax) * self.quantity() - area
def taxrev(self):
"Compute tax revenue"
return self.tax * self.quantity()
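    # The inverse demand/supply methods used by the plotting code below did
    # not survive the PDF conversion; minimal versions consistent with the
    # demand and supply curves given above:
    def inverse_demand(self, x):
        "Compute inverse demand"
        return self.ad / self.bd - (1 / self.bd) * x

    def inverse_supply(self, x):
        "Compute inverse supply curve"
        return -(self.az / self.bz) + (1 / self.bz) * x + self.tax

    def inverse_supply_no_tax(self, x):
        "Compute inverse supply curve without tax"
        return -(self.az / self.bz) + (1 / self.bz) * x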
Heres a short program that uses this class to plot an inverse demand curve together with inverse supply
curves with and without taxes
import numpy as np
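# A Market instance m is needed for the plot; its parameters did not survive
# the PDF conversion.  The values below are an assumption (they are consistent
# with the deadweight loss of 1.125 reported later):
baseline_params = 15, .5, -2, .5, 3
m = Market(*baseline_params)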
q_max = m.quantity() * 2
q_grid = np.linspace(0.0, q_max, 100)
pd = m.inverse_demand(q_grid)
ps = m.inverse_supply(q_grid)
psno = m.inverse_supply_no_tax(q_grid)
fig, ax = plt.subplots()
ax.plot(q_grid, pd, lw=2, alpha=0.6, label='demand')
ax.plot(q_grid, ps, lw=2, alpha=0.6, label='supply')
ax.plot(q_grid, psno, '--k', lw=2, alpha=0.6, label='supply without tax')
ax.set_xlabel('quantity', fontsize=14)
ax.set_xlim(0, q_max)
ax.set_ylabel('price', fontsize=14)
ax.legend(loc='lower right', frameon=False, fontsize=14)
plt.show()
def deadw(m):
"Computes deadweight loss for market m."
# == Create analogous market with no tax == #
m_no_tax = Market(m.ad, m.bd, m.az, m.bz, 0)
# == Compare surplus, return difference == #
surp1 = m_no_tax.consumer_surp() + m_no_tax.producer_surp()
surp2 = m.consumer_surp() + m.producer_surp() + m.taxrev()
return surp1 - surp2
1.125
Example: Chaos
Lets look at one more example, related to chaotic dynamics in nonlinear systems
One simple transition rule that can generate complex dynamics is the logistic map
Lets write a class for generating time series from this model
Heres one implementation
class Chaos:
"""
Models the dynamical system with :math:`x_{t+1} = r x_t (1 - x_t)`
"""
def __init__(self, x0, r):
"""
Initialize with state x0 and parameter r
"""
self.x, self.r = x0, r
def update(self):
"Apply the map to update state."
self.x = self.r * self.x *(1 - self.x)
    def generate_sequence(self, n):
        "Generate and return a sequence of length n (this method was truncated in the extracted code)"
        path = []
        for i in range(n):
            path.append(self.x)
            self.update()
        return path
ch = Chaos(0.1, 4.0)
ts_length = 250
fig, ax = plt.subplots()
ax.set_xlabel('$t$', fontsize=14)
ax.set_ylabel('$x_t$', fontsize=14)
x = ch.generate_sequence(ts_length)
ax.plot(range(ts_length), x, 'bo-', alpha=0.5, lw=2, label='$x_t$')
plt.show()
fig, ax = plt.subplots()
ch = Chaos(0.1, 4)
r = 2.5
while r < 4:
ch.r = r
t = ch.generate_sequence(1000)[950:]
ax.plot([r] * len(t), t, 'b.', ms=0.6)
r = r + 0.005
ax.set_xlabel('$r$', fontsize=16)
plt.show()
For r a little bit higher than 3.45, the time series settles down to oscillating among the four values plotted
on the vertical axis
Notice that there is no value of r that leads to a steady state oscillating among three values
Python provides special methods with which some neat tricks can be performed
For example, recall that lists and tuples have a notion of length, and that this length can be queried via the
len function
x = (10, 20)
len(x)
If you want to provide a return value for the len function when applied to your user-defined object, use the
__len__ special method
class Foo:
def __len__(self):
return 42
Now we get
f = Foo()
len(f)
42
class Foo:
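    # The __call__ method did not survive the PDF conversion; given the
    # output below, it presumably looked something like
    def __call__(self, x):
        return x + 42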
f = Foo()
f(8) # Exactly equivalent to f.__call__(8)
50
3.2.5 Exercises
Exercise 1
The empirical cumulative distribution function (ecdf) corresponding to a sample {X_i}_{i=1}^n is defined as

F_n(x) := (1/n) ∑_{i=1}^{n} 1{X_i ≤ x}    (x ∈ R)    (3.4)
Here 1{Xi ≤ x} is an indicator function (one if Xi ≤ x and zero otherwise) and hence Fn (x) is the fraction
of the sample that falls below x
The Glivenko–Cantelli Theorem states that, provided that the sample is iid, the ecdf Fn converges to the
true distribution function F
Implement Fn as a class called ECDF, where
• A given sample {Xi }ni=1 are the instance data, stored as self.observations
• The class implements a __call__ method that returns Fn (x) for any x
Your code should work as follows (modulo randomness)
0.29
0.479
Exercise 2
p(x) = a_0 + a_1 x + a_2 x^2 + ··· + a_N x^N = ∑_{n=0}^{N} a_n x^n    (x ∈ R)    (3.5)
The instance data for the class Polynomial will be the coefficients (in the case of (3.5), the numbers
a0 , . . . , aN )
Provide methods that
1. Evaluate the polynomial (3.5), returning p(x) for any x
2. Differentiate the polynomial, replacing the original coefficients with those of its derivative p′
Avoid using any import statements
3.2.6 Solutions
Exercise 1
class ECDF:
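    # The method definitions did not survive the PDF conversion; a minimal
    # version consistent with the exercise:
    def __init__(self, observations):
        self.observations = observations

    def __call__(self, x):
        counter = 0.0
        for obs in self.observations:
            if obs <= x:
                counter += 1
        return counter / len(self.observations)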
# == test == #
F = ECDF(np.random.uniform(0, 1, size=10))
print(F(0.5))
0.5
F = ECDF(np.random.uniform(0, 1, size=1000))
print(F(0.5))
0.486
Exercise 2
class Polynomial:
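    # The __init__ and __call__ methods did not survive the PDF conversion;
    # minimal versions consistent with the exercise statement:
    def __init__(self, coefficients):
        "Create an instance with coefficients a_0, ..., a_N (a list or tuple)."
        self.coefficients = coefficients

    def __call__(self, x):
        "Evaluate the polynomial at x, returning p(x)."
        y = 0
        for i, a in enumerate(self.coefficients):
            y += a * x**i
        return y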
def differentiate(self):
"Reset self.coefficients to those of p' instead of p."
new_coefficients = []
for i, a in enumerate(self.coefficients):
new_coefficients.append(i * a)
# Remove the first element, which is zero
del new_coefficients[0]
# And reset coefficients data to new values
self.coefficients = new_coefficients
return new_coefficients
Contents
3.3.1 Overview
This lecture creates nonstochastic and stochastic versions of Paul Samuelson's celebrated multiplier accelerator model [Sam39]
In doing so, we extend the example of the Solow model class in our second OOP lecture
Our objectives are to
• provide a more detailed example of OOP and classes
• review a famous model
• review linear difference equations, both deterministic and stochastic
Samuelsons Model
Samuelson used a second-order linear difference equation to represent a model of national output based on
three components:
• a national output identity asserting that national output is the sum of consumption plus investment
plus government purchases
• a Keynesian consumption function asserting that consumption at time t is equal to a constant times
national output at time t − 1
• an investment accelerator asserting that investment at time t equals a constant called the accelerator
coefficient times the difference in output between period t − 1 and t − 2
• the idea that consumption plus investment plus government purchases constitute aggregate demand,
which automatically calls forth an equal amount of aggregate supply
(To read about linear difference equations see here or chapter IX of [Sar87])
Samuelson used the model to analyze how particular values of the marginal propensity to consume and the
accelerator coefficient might give rise to transient business cycles in national output
Possible dynamic properties include
• smooth convergence to a constant level of output
• damped business cycles that eventually converge to a constant level of output
• persistent business cycles that neither dampen nor explode
Later we present an extension that adds a random shock to the right side of the national income identity
representing random fluctuations in aggregate demand
This modification makes national output become governed by a second-order stochastic linear difference
equation that, with appropriate parameter values, gives rise to recurrent irregular business cycles.
(To read about stochastic linear difference equations see chapter XI of [Sar87])
3.3.2 Details
C_t = a Y_{t−1} + γ    (3.6)
I_t = b (Y_{t−1} − Y_{t−2})    (3.7)    (this equation was dropped in the conversion; reconstructed from the description below)
Y_t = C_t + I_t + G_t    (3.8)
• The parameter a is people's marginal propensity to consume out of income - equation (3.6) asserts that
people consume a fraction a ∈ (0, 1) of each additional dollar of income
• The parameter b > 0 is the investment accelerator coefficient - equation (3.7) asserts that people
invest in physical capital when income is increasing and disinvest when it is decreasing
Equations (3.6), (3.7), and (3.8) imply the following second-order linear difference equation for national
income:
Y_t = (a + b) Y_{t−1} − b Y_{t−2} + (γ + G_t)
or
Y_t = ρ_1 Y_{t−1} + ρ_2 Y_{t−2} + (γ + G_t)    (3.9)
where ρ_1 = (a + b) and ρ_2 = −b
We'll ordinarily set the parameters (a, b) so that starting from an arbitrary pair of initial conditions
(Ȳ_{−1}, Ȳ_{−2}), national income Y_t converges to a constant value as t becomes large
We are interested in studying
• the transient fluctuations in Yt as it converges to its steady state level
• the rate at which it converges to a steady state level
The deterministic version of the model described so far (meaning that no random shocks hit aggregate
demand) has only transient fluctuations
We can convert the model to one that has persistent irregular fluctuations by adding a random shock to
aggregate demand
We create a random or stochastic version of the model by adding a random process of shocks or disturbances
{σ ϵ_t} to the right side of equation (3.9), leading to the second-order scalar linear stochastic
difference equation:
Yt = ρ1 Yt−1 + ρ2 Yt−2
or
To discover the properties of the solution of (3.11), it is useful first to form the characteristic polynomial
for (3.11):
z^2 − ρ_1 z − ρ_2    (3.12)
We want to find the two zeros (a.k.a. roots) – namely λ1 , λ2 – of the characteristic polynomial
These are two special values of z, say z = λ1 and z = λ2 , such that if we set z equal to one of these values
in expression (3.12), the characteristic polynomial (3.12) equals zero:
z^2 − ρ_1 z − ρ_2 = (z − λ_1)(z − λ_2) = 0    (3.13)
λ_1 = r e^{iω},   λ_2 = r e^{−iω}
where r is the amplitude of the complex number and ω is its angle or phase
These can also be represented as
λ1 = r(cos(ω) + i sin(ω))
λ2 = r(cos(ω) − i sin(ω))
(To read about the polar form, see here)
Given initial conditions Y−1 , Y−2 , we want to generate a solution of the difference equation (3.11)
It can be represented as
Y_t = λ_1^t c_1 + λ_2^t c_2
where c1 and c2 are constants that depend on the two initial conditions and on ρ1 , ρ2
When the roots are complex, it is useful to pursue the following calculations
Notice that
The only way that Y_t can be a real number for each t is if c_1 + c_2 is a real number and c_1 − c_2 is an imaginary
number. This happens only when c_1 and c_2 are complex conjugates, in which case they can be written in the
polar forms
c_1 = v e^{iθ},   c_2 = v e^{−iθ}
So we can write
Y_t = v e^{iθ} (r e^{iω})^t + v e^{−iθ} (r e^{−iω})^t = 2 v r^t cos(ωt + θ)
where v and θ are constants that must be chosen to satisfy initial conditions for Y_{−1}, Y_{−2}
This can equivalently be written as Y_t = r^t [c̃_1 cos(ωt) + c̃_2 sin(ωt)],
where c̃_1, c̃_2 is a pair of constants chosen to satisfy the given initial conditions for Y_{−1}, Y_{−2}
This formula shows that when the roots are complex, Y_t displays oscillations with period p̌ = 2π/ω and
damping factor r. We say that p̌ is the period because in that amount of time the cosine wave cos(ωt + θ)
goes through exactly one complete cycle (Draw a cosine function to convince yourself of this please)
Remark: Following [Sam39], we want to choose the parameters a, b of the model so that the absolute values
(of the possibly complex) roots λ1 , λ2 of the characteristic polynomial are both strictly less than one:
Remark: When both roots λ1 , λ2 of the characteristic polynomial have absolute values strictly less than
one, the absolute value of the larger one governs the rate of convergence to the steady state of the non
stochastic version of the model
We can use a LinearStateSpace instance to do various things that we did above with our homemade function
and class
Among other things, we show by example that the eigenvalues of the matrix A that we use to form the
instance of the LinearStateSpace class for the Samuelson model equal the roots of the characteristic poly-
nomial (3.12) for the Samuelson multiplier accelerator model
Here is the formula for the matrix A in the linear state space system in the case that government expenditures
are a constant G:
A = [ [1,     0,   0 ],
      [γ + G, ρ1,  ρ2],
      [0,     1,   0 ] ]
3.3.3 Implementation
def param_plot():
# Set axis
xmin, ymin = -3, -2
xmax, ymax = -xmin, -ymin
plt.axis([xmin, xmax, ymin, ymax])
return fig
param_plot()
plt.show()
The graph portrays regions in which the (λ1 , λ2 ) root pairs implied by the (ρ1 = (a+b), ρ2 = −b) difference
equation parameter pairs in the Samuelson model are such that:
• (λ1 , λ2 ) are complex with modulus less than 1 - in this case, the {Yt } sequence displays damped
oscillations
• (λ1 , λ2 ) are both real, but one is strictly greater than 1 - this leads to explosive growth
• (λ1 , λ2 ) are both real, but one is strictly less than −1 - this leads to explosive oscillations
• (λ1 , λ2 ) are both real and both are less than 1 in absolute value - in this case, there is smooth conver-
gence to the steady state without damped cycles
Later well present the graph with a red mark showing the particular point implied by the setting of (a, b)
def categorize_solution(ρ1, ρ2):
    """Takes values of ρ1 and ρ2 and classifies the type of solution (the def line was missing from the extracted code)"""
    discriminant = ρ1 ** 2 + 4 * ρ2
    if ρ2 > 1 + ρ1 or ρ2 < -1:
        print('Explosive oscillations')
    elif ρ1 + ρ2 > 1:
        print('Explosive growth')
    elif discriminant < 0:
        print('Roots are complex with modulus less than one; therefore damped oscillations')
    else:
        print('Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state')
categorize_solution(1.3, -.4)
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
def plot_y(function=None):
"""function plots path of Y_t"""
plt.subplots(figsize=(12, 8))
plt.plot(function)
plt.xlabel('Time $t$')
plt.ylabel('$Y_t$', rotation=0)
plt.grid()
plt.show()
The following function calculates roots of the characteristic polynomial using high school algebra
(Well calculate the roots in other ways later)
The function also plots a Yt starting from initial conditions that we set
roots = []
ρ1 = α + β
ρ2 = -β
print(f'ρ_1 is {ρ1}')
print(f'ρ_2 is {ρ2}')
discriminant = ρ1 ** 2 + 4 * ρ2
if discriminant == 0:
roots.append(-ρ1 / 2)
print('Single real root: ')
print(''.join(str(roots)))
elif discriminant > 0:
roots.append((-ρ1 + sqrt(discriminant).real) / 2)
roots.append((-ρ1 - sqrt(discriminant).real) / 2)
print('Two real roots: ')
print(''.join(str(roots)))
else:
roots.append((-ρ1 + sqrt(discriminant)) / 2)
roots.append((-ρ1 - sqrt(discriminant)) / 2)
print('Two complex roots: ')
print(''.join(str(roots)))
return y_t
plot_y(y_nonstochastic())
ρ_1 is 1.42
ρ_2 is -0.5
Two real roots:
[-0.6459687576256715, -0.7740312423743284]
Absolute values of roots are less than one
The next cell writes code that takes as inputs the modulus r and phase ϕ of a conjugate pair of complex
numbers in polar form
λ1 = r exp(iϕ), λ2 = r exp(−iϕ)
• The code assumes that these two complex numbers are the roots of the characteristic polynomial
• It then reverse engineers (a, b) and (ρ1 , ρ2 ), pairs that would generate those roots
import cmath
import math
def f(r, ϕ):
    """
    Takes modulus r and angle ϕ of complex number r exp(j ϕ)
    and creates ρ1 and ρ2 of characteristic polynomial for which
    r exp(j ϕ) and r exp(- j ϕ) are complex roots.
    """
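    # The function body did not survive the PDF conversion; a sketch that
    # reproduces the numbers shown below (ρ1 = λ1 + λ2, ρ2 = -λ1 λ2):
    g1 = cmath.rect(r, ϕ)           # λ1 = r exp(i ϕ)
    g2 = cmath.rect(r, -ϕ)          # λ2 = r exp(-i ϕ)
    ρ1 = g1 + g2
    ρ2 = -g1 * g2
    b = -ρ2                         # reverse engineer the accelerator coefficient
    a = ρ1 - b                      # and the marginal propensity to consume
    return ρ1, ρ2, a, b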
r = .95
period = 10 # Length of cycle in units of time
ϕ = 2 * math.pi/period
a, b = (0.6346322893124001+0j) (0.9024999999999999-0j)
ρ1, ρ2 = (1.5371322893124+0j) (-0.9024999999999999+0j)
ρ1 = ρ1.real
ρ2 = ρ2.real
ρ1, ρ2
(1.5371322893124, -0.9024999999999999)
Here well use numpy to compute the roots of the characteristic polynomial
p1 = cmath.polar(r1)
p2 = cmath.polar(r2)
r, ϕ = 0.95 0.6283185307179586
p1, p2 = (0.95, 0.6283185307179586) (0.95, -0.6283185307179586)
a, b = (0.6346322893124001+0j) (0.9024999999999999-0j)
ρ1, ρ2 = 1.5371322893124 -0.9024999999999999
# Useful constants
ρ1 = α + β
ρ2 = -β
categorize_solution(ρ1, ρ2)
return y_t
plot_y(y_nonstochastic())
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [ 0.85+0.27838822j 0.85-0.27838822j]
Roots are complex
Roots are less than one
The next cell studies the implications of reverse engineered complex roots
Well generate an undamped cycle of period 10
b = b.real
a, b = 0.6180339887498949 1.0
Roots are complex with modulus less than one; therefore damped oscillations
Roots are [ 0.80901699+0.58778525j 0.80901699-0.58778525j]
Roots are complex
Roots are less than one
We can also use sympy to compute analytic formulas for the roots
import sympy
from sympy import Symbol, init_printing
init_printing()
r1 = Symbol("ρ_1")
r2 = Symbol("ρ_2")
z = Symbol("z")
[ρ_1/2 − (1/2)√(ρ_1² + 4ρ_2),   ρ_1/2 + (1/2)√(ρ_1² + 4ρ_2)]
a = Symbol("α")
b = Symbol("β")
r1 = a + b
r2 = -b
Now well construct some code to simulate the stochastic version of the model that emerges when we add a
random shock process to aggregate demand
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
# Generate shocks
ϵ = np.random.normal(0, 1, n)
return y_t
plot_y(y_stochastic())
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[ 0.7236068 0.2763932]
Roots are real
Roots are less than one
Lets do a simulation in which there are shocks and the characteristic polynomial has complex roots
r = .97
b = b.real
a, b = 0.6285929690873979 0.9409000000000001
Roots are complex with modulus less than one; therefore damped oscillations
[ 0.78474648+0.57015169j 0.78474648-0.57015169j]
Roots are complex
Roots are less than one
This function computes a response to either a permanent or one-off increase in government expenditures
def y_stochastic_g(y_0=20,
y_1=20,
α=0.8,
β=0.2,
γ=10,
n=100,
σ=2,
g=0,
g_t=0,
duration='permanent'):
# Useful constants
ρ1 = α + β
ρ2 = -β
# Categorize solution
categorize_solution(ρ1, ρ2)
# Generate shocks
ϵ = np.random.normal(0, 1, n)
# Stochastic
else:
    ϵ = np.random.normal(0, 1, n)
return ρ1 * x[t - 1] + ρ2 * x[t - 2] + γ + g + σ * ϵ[t]
# No government spending
if g == 0:
y_t.append(transition(y_t, t))
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[ 0.7236068 0.2763932]
Roots are real
Roots are less than one
We can also see the response to a one time jump in government expenditures
Roots are real and absolute values are less than one; therefore get smooth convergence to a steady state
[ 0.7236068 0.2763932]
Roots are real
Roots are less than one
class Samuelson():
.. math::
Parameters
----------
y_0 : scalar
Initial condition for Y_0
y_1 : scalar
Initial condition for Y_1
α : scalar
Marginal propensity to consume
β : scalar
Accelerator coefficient
n : int
Number of iterations
σ : scalar
Volatility parameter. Must be greater than or equal to 0. Set
equal to 0 for non-stochastic model.
g : scalar
Government spending shock
g_t : int
Time at which government spending shock occurs. Must be specified
when duration != None.
duration : {None, 'permanent', 'one-off'}
Specifies type of government spending shock. If none, government
spending equal to g for all t.
"""
def __init__(self,
y_0=100,
y_1=50,
α=1.3,
β=0.2,
γ=10,
n=100,
σ=0,
g=0,
g_t=0,
duration=None):
def root_type(self):
if all(isinstance(root, complex) for root in self.roots):
return 'Complex conjugate'
elif len(self.roots) > 1:
return 'Double real'
else:
return 'Single real'
def root_less_than_one(self):
if all(abs(root) < 1 for root in self.roots):
return True
def solution_type(self):
ρ1, ρ2 = self.ρ1, self.ρ2
discriminant = ρ1 ** 2 + 4 * ρ2
if ρ2 >= 1 + ρ1 or ρ2 <= -1:
return 'Explosive oscillations'
elif ρ1 + ρ2 >= 1:
return 'Explosive growth'
elif discriminant < 0:
return 'Damped oscillations'
else:
return 'Steady state'
# Stochastic
else:
    ϵ = np.random.normal(0, 1, self.n)
return self.ρ1 * x[t - 1] + self.ρ2 * x[t - 2] + self.γ + g + self.σ * ϵ[t]
def generate_series(self):
# No government spending
if self.g == 0:
y_t.append(self._transition(y_t, t))
def summary(self):
print('Summary\n' + '-' * 50)
print(f'Root type: {self.root_type()}')
print(f'Solution type: {self.solution_type()}')
print(f'Roots: {str(self.roots)}')
if self.root_less_than_one() == True:
print('Absolute value of roots is less than one')
else:
print('Absolute value of roots is not less than one')
if self.σ > 0:
print('Stochastic series with σ = ' + str(self.σ))
else:
print('Non-stochastic series')
if self.g != 0:
print('Government spending equal to ' + str(self.g))
if self.duration != None:
print(self.duration.capitalize() +
' government spending shock at t = ' + str(self.g_t))
def plot(self):
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(self.generate_series())
ax.set(xlabel='Iteration', xlim=(0, self.n))
ax.set_ylabel('$Y_t$', rotation=0)
ax.grid()
return fig
def param_plot(self):
fig = param_plot()
ax = fig.gca()
plt.legend(fontsize=12, loc=3)
return fig
Summary
---------------------------------------------------------
Root type: Complex conjugate
Solution type: Damped oscillations
Roots: [ 0.65+0.27838822j 0.65-0.27838822j]
Absolute value of roots is less than one
Stochastic series with σ = 2
Government spending equal to 10
Permanent government spending shock at t = 20
sam.plot()
plt.show()
Well use our graph to show where the roots lie and how their location is consistent with the behavior of the
path just graphed
The red $+$ sign shows the location of the roots
sam.param_plot()
plt.show()
It turns out that we can use the QuantEcon.py LinearStateSpace class to do much of the work that we have
done from scratch above
Here is how we map the Samuelson model into an instance of a LinearStateSpace class
from quantecon import LinearStateSpace
""" This script maps the Samuelson model in the the LinearStateSpace class"""
α = 0.8
β = 0.9
ρ1 = α + β
ρ2 = -β
γ = 10
σ = 1
g = 10
n = 100
A = [[1, 0, 0],
[γ + g, ρ1, ρ2],
[0, 1, 0]]
x, y = sam_t.simulate(ts_length=n)
axes[-1].set_xlabel('Iteration')
plt.show()
Lets plot impulse response functions for the instance of the Samuelson model using a method in the Lin-
earStateSpace class
imres = sam_t.impulse_response()
imres = np.asarray(imres)
y1 = imres[:, :, 0]
y2 = imres[:, :, 1]
y1.shape
(2, 6, 1)
Now lets compute the zeros of the characteristic polynomial by simply calculating the eigenvalues of A
A = np.asarray(A)
w, v = np.linalg.eig(A)
print(w)
We could also create a subclass of LinearStateSpace (inheriting all its methods and attributes) to add more
functions to use
class SamuelsonLSS(LinearStateSpace):
"""
this subclass creates a Samuelson multiplier-accelerator model
as a linear state space system
"""
def __init__(self,
y_0=100,
y_1=100,
α=0.8,
β=0.9,
γ=10,
σ=1,
g=10):
self.α, self.β = α, β
self.y_0, self.y_1, self.g = y_0, y_1, g
self.γ, self.σ = γ, σ
self.ρ1 = α + β
self.ρ2 = -β
self.µ_0 = self.µ_y
self.Σ_0 = self.σ_y
# Exception where no convergence achieved when calculating stationary distributions
except ValueError:
print('Stationary distribution does not exist')
x, y = self.simulate(ts_length)
axes[-1].set_xlabel('Iteration')
return fig
x, y = self.impulse_response(j)
return fig
Illustrations
samlss = SamuelsonLSS()
samlss.plot_simulation(100, stationary=False)
plt.show()
samlss.plot_simulation(100, stationary=True)
plt.show()
samlss.plot_irf(100)
plt.show()
samlss.multipliers()
Let's shut down the accelerator by setting b = 0 to get a pure multiplier model
• the absence of cycles gives an idea about why Samuelson included the accelerator
pure_multiplier.plot_simulation()
pure_multiplier.plot_simulation()
pure_multiplier.plot_irf(100)
3.3.9 Summary
In this lecture, we wrote functions and classes to represent non-stochastic and stochastic versions of the
Samuelson (1939) multiplier-accelerator model, described in [Sam39]
We saw that different parameter values led to different output paths, which could either be stationary, explosive, or oscillating
We also were able to represent the model using the QuantEcon.py LinearStateSpace class
Contents
– Generators
– Recursive Function Calls
– Exercises
– Solutions
3.4.1 Overview
With this last lecture, our advice is to skip it on first pass, unless you have a burning desire to read it
It's here
1. as a reference, so we can link back to it when required, and
2. for those who have worked through a number of applications, and now want to learn more about the
Python language
A variety of topics are treated in the lecture, including generators, exceptions and descriptors
Iterators
f = open('us_cities.txt')
f.__next__()
f.__next__()
We see that file objects do indeed have a __next__ method, and that calling this method returns the next
line in the file
The next method can also be accessed via the builtin function next(), which directly calls this method
next(f)
e = enumerate(['foo', 'bar'])
next(e)
(0, 'foo')
next(e)
(1, 'bar')
f = open('test_table.csv', 'r')
nikkei_data = reader(f)
next(nikkei_data)
next(nikkei_data)
All iterators can be placed to the right of the in keyword in for loop statements
In fact this is how the for loop works: If we write
for x in iterator:
<code block>
f = open('somefile.txt', 'r')
for line in f:
# do something
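To see more concretely what the for loop is doing, here is a rough hand-written equivalent (a sketch only, reusing the us_cities.txt file opened earlier)
iterator = open('us_cities.txt')
while True:
    try:
        x = next(iterator)      # ask the iterator for its next element
    except StopIteration:
        break                   # the iterator is exhausted, so stop looping
    # <code block>              # work with x exactly as the for loop body would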
Iterables
You already know that we can put a Python list to the right of in in a for loop
for i in ['spam', 'eggs']:
    print(i)
spam
eggs
x = ['foo', 'bar']
type(x)
list
next(x)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-5e4e57af3a97> in <module>()
----> 1 next(x)
x = ['foo', 'bar']
type(x)
list
y = iter(x)
type(y)
list_iterator
next(y)
'foo'
next(y)
'bar'
next(y)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-62-75a92ee8313a> in <module>()
----> 1 y.next()
StopIteration:
iter(42)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-63-826bbd6e91fc> in <module>()
----> 1 iter(42)
Some built-in functions that act on sequences also work with iterables
• max(), min(), sum(), all(), any()
For example
x = [10, -10]
max(x)
10
y = iter(x)
type(y)
listiterator
max(y)
10
One thing to remember about iterators is that they are depleted by use
x = [10, -10]
y = iter(x)
max(y)
10
max(y)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-1d3b6314f310> in <module>()
----> 1 max(y)
x = 42
We now know that when this statement is executed, Python creates an object of type int in your computer's
memory, containing
• the value 42
• some associated attributes
def f(x): print(x)
g = f
id(g) == id(f)
True
g('test')
test
In the first step, a function object is created, and the name f is bound to it
After binding the name g to the same object, we can use it anywhere we would use f
What happens when the number of names bound to an object goes to zero?
Here's an example of this situation, where the name x is first bound to one object and then rebound to another
x = 'foo'
id(x)
164994764
x = 'bar'    # rebind the name x to another object
What happens here is that the first object, with identity 164994764, is garbage collected
In other words, the memory slot that stores that object is deallocated, and returned to the operating system
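As an aside, you can watch this reference counting at work using the sys module; the object and names below are purely illustrative and not part of the original lecture code
import sys
x = ['foo']                # a list, to avoid the interning that small strings get
sys.getrefcount(x)         # typically 2: the name x plus the temporary reference made by the call itself
y = x                      # bind a second name to the same object
sys.getrefcount(x)         # the count rises by one
del y                      # unbind y again
sys.getrefcount(x)         # and the count falls back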
Namespaces
x = 42
# Filename: math2.py
pi = 'foobar'
import math2
Next let's import the math module from the standard library
import math
math.pi
3.1415926535897931
math2.pi
'foobar'
These two different bindings of pi exist in different namespaces, each one implemented as a dictionary
We can look at the dictionary directly, using module_name.__dict__
import math
math.__dict__
import math2
math2.__dict__
As you know, we access elements of the namespace using the dotted attribute notation
math.pi
3.1415926535897931
math.__dict__['pi'] == math.pi
True
Viewing Namespaces
vars(math)
dir(math)
print(math.__doc__)
math.__name__
'math'
Interactive Sessions
To check this, we can look at the current module name via the value of __name__ given at the prompt
print(__name__)
__main__
When we run a script using IPython's run command, the contents of the file are executed as part of
__main__ too
To see this, let's create a file mod.py that prints its own __name__ attribute
# Filename: mod.py
print(__name__)
mod
__main__
In the second case, the code is executed as part of __main__, so __name__ is equal to __main__
To see the contents of the namespace of __main__ we use vars() rather than vars(__main__)
If you do this in IPython, you will see a whole lot of variables that IPython needs, and has initialized when
you started up your session
If you prefer to see only the variables you have initialized, use whos
x = 2
y = 3
import numpy as np
%whos
For example, suppose that we start the interpreter and begin making assignments
We are now working in the module __main__, and hence the namespace for __main__ is the global
namespace
Next, we import a module called amodule
import amodule
At this point, the interpreter creates a namespace for the module amodule and starts executing commands
in the module
While this occurs, the namespace amodule.__dict__ is the global namespace
Once execution of the module finishes, the interpreter returns to the module from where the import statement
was made
In this case its __main__, so the namespace of __main__ again becomes the global namespace
Local Namespaces
Important fact: When we call a function, the interpreter creates a local namespace for that function, and
registers the variables in that namespace
The reason for this will be explained in just a moment
Variables in the local namespace are called local variables
After the function returns, the namespace is deallocated and lost
While the function is executing, we can view the contents of the local namespace with locals()
For example, consider
def f(x):
a = 2
print(locals())
return a * x
f(1)
{'a': 2, 'x': 1}
We have been using various built-in functions, such as max(), dir(), str(), list(), len(),
range(), type(), etc.
How does access to these names work?
dir()
dir(__builtins__)
__builtins__.max
But __builtins__ is special, because we can always access them directly as well
max
__builtins__.max == max
True
Name Resolution
Here f is the enclosing function for g, and each function gets its own namespace
Now we can give the rule for how namespace resolution works:
The order in which the interpreter searches for names is
1. the local namespace (if it exists)
2. the hierarchy of enclosing namespaces (if they exist)
3. the global namespace
4. the builtin namespace
If the name is not in any of these namespaces, the interpreter raises a NameError
This is called the LEGB rule (local, enclosing, global, builtin)
Here's an example that helps to illustrate
Consider a script test.py that looks as follows
def g(x):
a = 1
x = x + a
return x
a = 0
y = g(10)
print("a = ", a, "y = ", y)
a = 0 y = 11
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-401b30e3b8b5> in <module>()
----> 1 x
First,
• The global namespace {} is created
• The function object is created, and g is bound to it within the global namespace
• The name a is bound to 0, again in the global namespace
Next g is called via y = g(10), leading to the following sequence of actions
• The local namespace for the function is created
• Local names x and a are bound, so that the local namespace becomes {'x': 10, 'a': 1}
• Statement x = x + a uses the local a and local x to compute x + a, and binds local name x to
the result
• This value is returned, and y is bound to it in the global namespace
• Local x and a are discarded (and the local namespace is deallocated)
Note that the global a was not affected by the local a
This is a good time to say a little more about mutable vs immutable objects
Consider the code segment
def f(x):
x = x + 1
return x
x = 1
print(f(x), x)
We now understand what will happen here: The code prints 2 as the value of f(x) and 1 as the value of x
First f and x are registered in the global namespace
The call f(x) creates a local namespace and adds x to it, bound to 1
Next, this local x is rebound to the new integer object 2, and this value is returned
None of this affects the global x
However, it's a different story when we use a mutable data type such as a list
def f(x):
x[0] = x[0] + 1
return x
x = [1]
print(f(x), x)
This prints [2] as the value of f(x) and, perhaps surprisingly, [2] as the value of x: because lists are mutable, the statement x[0] = x[0] + 1 modifies the same list object that the global name x is bound to
Assertions
Suppose, for example, that we want to write our own function for computing the sample variance
s^2 := (1 / (n − 1)) ∑_{i=1}^{n} (y_i − ȳ)^2,   where ȳ is the sample mean
def var(y):
n = len(y)
assert n > 1, 'Sample size must be greater than one.'
return np.sum((y - y.mean())**2) / float(n-1)
If we run this with an array of length one, the program will terminate and print our error message
var([1])
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-20-0032ff8a150f> in <module>()
----> 1 var([1])
<ipython-input-19-cefafaec3555> in var(y)
1 def var(y):
2 n = len(y)
----> 3 assert n > 1, 'Sample size must be greater than one.'
4 return np.sum((y - y.mean())**2) / float(n-1)
The approach used above is a bit limited, because it always leads to termination
Sometimes we can handle errors more gracefully, by treating special cases
Lets look at how this is done
Exceptions
def f:
Since illegal syntax cannot be executed, a syntax error terminates execution of the program
Heres a different kind of error, unrelated to syntax
1 / 0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-17-05c9758a9c21> in <module>()
----> 1 1/0
Heres another
x1 = y1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-23-142e0509fbd6> in <module>()
----> 1 x1 = y1
And another
'foo' + 6
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-20-44bbe7e963e7> in <module>()
----> 1 'foo' + 6
And another
X = []
x = X[0]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-22-018da6d9fc14> in <module>()
----> 1 x = X[0]
Catching Exceptions
We can catch and deal with exceptions using try – except blocks
Heres a simple example
def f(x):
try:
return 1.0 / x
except ZeroDivisionError:
print('Error: division by zero. Returned None')
return None
f(2)
0.5
f(0)
f(0.0)
def f(x):
try:
return 1.0 / x
except ZeroDivisionError:
print('Error: Division by zero. Returned None')
except TypeError:
print('Error: Unsupported operation. Returned None')
return None
f(2)
0.5
f(0)
f('foo')
def f(x):
try:
return 1.0 / x
except (TypeError, ZeroDivisionError):
print('Error: Unsupported operation. Returned None')
return None
f(2)
0.5
f(0)
f('foo')
def f(x):
try:
return 1.0 / x
except:
print('Error. Returned None')
return None
Let's look at some special syntax elements that are routinely used by Python developers
You might not need the following concepts immediately, but you will see them in other peoples code
Hence you need to understand them at some stage of your Python education
Decorators
Decorators are a bit of syntactic sugar that, while easily avoided, have turned out to be popular
It's very easy to say what decorators do
On the other hand it takes a bit of effort to explain why you might use them
An Example
import numpy as np
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
Now suppose there's a problem: occasionally negative numbers get fed to f and g in the calculations that follow
If you try it, you'll see that when these functions are called with negative numbers they return a NumPy object called nan
This stands for not a number (and indicates that you are trying to evaluate a mathematical function at a point where it is not defined)
Perhaps this isn't what we want, because it causes other problems that are hard to pick up later on
Suppose that instead we want the program to terminate whenever this happens, with a sensible error message
This change is easy enough to implement
import numpy as np
def f(x):
assert x >= 0, "Argument must be nonnegative"
return np.log(np.log(x))
def g(x):
assert x >= 0, "Argument must be nonnegative"
return np.sqrt(42 * x)
Notice however that there is some repetition here, in the form of two identical lines of code
Repetition makes our code longer and harder to maintain, and hence is something we try hard to avoid
Here it's not a big deal, but imagine now that instead of just f and g, we have 20 such functions that we need
to modify in exactly the same way
This means we need to repeat the test logic (i.e., the assert line testing nonnegativity) 20 times
The situation is still worse if the test logic is longer and more complicated
In this kind of scenario the following approach would be neater
import numpy as np
def check_nonneg(func):
def safe_function(x):
assert x >= 0, "Argument must be nonnegative"
return func(x)
return safe_function
def f(x):
return np.log(np.log(x))
def g(x):
return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
# Program continues with various calculations using f and g
Enter Decorators
Using decorator notation, we can replace
def f(x):
    return np.log(np.log(x))
def g(x):
    return np.sqrt(42 * x)
f = check_nonneg(f)
g = check_nonneg(g)
with
@check_nonneg
def f(x):
return np.log(np.log(x))
@check_nonneg
def g(x):
return np.sqrt(42 * x)
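As a quick check (assuming the definitions above have been run), the decorated versions behave exactly like the wrapped functions built earlier
g(16.0)       # fine: returns np.sqrt(42 * 16.0)
g(-1.0)       # raises AssertionError: Argument must be nonnegative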
Descriptors
class Car:

    def __init__(self, miles=1000):
        self.miles = miles
        self.kms = miles * 1.61

    # Some other functionality, details omitted

One potential problem we might have here is that a user alters one of these variables but not the other
car = Car()
car.miles
1000
car.kms
1610.0
car.miles = 6000
car.kms
1610.0
In the last two lines we see that miles and kms are out of sync
What we really want is some mechanism whereby each time a user sets one of these variables, the other is
automatically updated
A Solution
def get_miles(self):
return self._miles
def get_kms(self):
return self._kms
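Only the two getter methods survive in this extract; a minimal sketch of what the full solution looks like, consistent with the discussion that follows, is given below (the default of 1000 miles matches the output shown earlier, and the rest uses the same 1.61 km per mile conversion)
class Car:

    def __init__(self, miles=1000):
        self._miles = miles
        self._kms = miles * 1.61

    def set_miles(self, value):
        self._miles = value
        self._kms = value * 1.61

    def set_kms(self, value):
        self._kms = value
        self._miles = value / 1.61

    def get_miles(self):
        return self._miles

    def get_kms(self):
        return self._kms

    miles = property(get_miles, set_miles)
    kms = property(get_kms, set_kms)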
car = Car()
car.miles
1000
car.miles = 6000
car.kms
9660.0
How it Works
The names _miles and _kms are arbitrary names we are using to store the values of the variables
The objects miles and kms are properties, a common kind of descriptor
The methods get_miles, set_miles, get_kms and set_kms define what happens when you get
(i.e. access) or set (bind) these variables
• So-called getter and setter methods
The builtin Python function property takes getter and setter methods and creates a property
For example, after car is created as an instance of Car, the object car.miles is a property
Being a property, when we set its value via car.miles = 6000 its setter method is triggered in this
case set_miles
These days it's very common to see the property function used via a decorator
Here's another version of our Car class that works as before but now uses decorators to set up the properties
class Car:
@property
def miles(self):
return self._miles
@property
def kms(self):
return self._kms
@miles.setter
def miles(self, value):
self._miles = value
self._kms = value * 1.61
@kms.setter
def kms(self, value):
self._kms = value
self._miles = value / 1.61
3.4.6 Generators
Generator Expressions
The easiest way to build generators is to use generator expressions, which look just like list comprehensions but with round brackets
Here is an ordinary list comprehension
singular = ('dog', 'cat', 'bird')
type(singular)
tuple
plural = [string + 's' for string in singular]
type(plural)
list
And here is a generator expression built from the same tuple
plural = (string + 's' for string in singular)
type(plural)
generator
next(plural)
'dogs'
next(plural)
'cats'
next(plural)
'birds'
Since sum() accepts any iterator, we can also do this
sum((x * x for x in range(10)))
285
The function sum() calls next() to get the items, adds successive terms
In fact, we can omit the outer brackets in this case
sum(x * x for x in range(10))
285
Generator Functions
The most flexible way to create generator objects is to use generator functions
Lets look at some examples
Example 1
def f():
yield 'start'
yield 'middle'
yield 'end'
It looks like a function, but uses a keyword yield that we haven't met before
Let's see how it works after running this code
type(f)
function
gen = f()
gen
next(gen)
'start'
next(gen)
'middle'
next(gen)
'end'
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-21-b2c61ce5e131> in <module>()
----> 1 gen.next()
StopIteration:
The generator function f() is used to create generator objects (in this case gen)
Generators are iterators, because they support a next method
The first call to next(gen)
• Executes code in the body of f() until it meets a yield statement
• Returns that value to the caller of next(gen)
The second call to next(gen) starts executing from the next line
def f():
yield 'start'
yield 'middle' # This line!
yield 'end'
Example 2
def g(x):
while x < 100:
yield x
x = x * x
<function __main__.g>
gen = g(2)
type(gen)
generator
next(gen)
next(gen)
next(gen)
16
next(gen)
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-32-b2c61ce5e131> in <module>()
----> 1 gen.next()
StopIteration:
def g(x):
while x < 100:
yield x
x = x * x # execution continues from here
def g(x):
while 1:
yield x
x = x * x
Advantages of Iterators
import random
n = 10000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
sum(draws)
But we are creating two huge lists here, range(n) and draws
This uses lots of memory and is very slow
If we make n even bigger then this happens
n = 1000000000
draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-9-20d1ec1dae24> in <module>()
----> 1 draws = [random.uniform(0, 1) < 0.5 for i in range(n)]
def f(n):
i = 1
while i <= n:
yield random.uniform(0, 1) < 0.5
i += 1
n = 10000000
draws = f(n)
draws
sum(draws)
4999141
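An equivalent and even more compact approach, for the same purpose, is to pass a generator expression straight to sum(), which likewise never builds a list
n = 10000000
sum(random.uniform(0, 1) < 0.5 for i in range(n))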
In summary, iterables
• avoid the need to create big lists/tuples, and
• provide a uniform interface to iteration that can be used transparently in for loops
3.4.7 Recursive Function Calls
This is not something that you will use every day, but it is still useful, and you should learn it at some stage
Basically, a recursive function is a function that calls itself
For example, consider the problem of computing x_t for some t when x_{t+1} = 2 x_t and x_0 = 1
def x_loop(t):
x = 1
for i in range(t):
x = 2 * x
return x
def x(t):
if t == 0:
return 1
else:
return 2 * x(t-1)
What happens here is that each successive call uses its own frame in the stack
• a frame is where the local variables of a given function call are held
• stack is memory used to process function calls
– a First In Last Out (FILO) queue
This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the
recursive solution
Well meet less contrived applications of recursion later on
3.4.8 Exercises
Exercise 1
The first few numbers in the sequence are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55
Write a function to recursively compute the t-th Fibonacci number for any t
Exercise 2
Complete the following code, and test it using this csv file, which we assume that you've put in your current working directory.
dates = column_iterator('test_table.csv', 1)
Exercise 3
prices
3
8
7
21
Using try – except, write a program to read in the contents of the file and sum the numbers, ignoring
lines without numbers
3.4.9 Solutions
Exercise 1
def x(t):
if t == 0:
return 0
if t == 1:
return 1
else:
return x(t-1) + x(t-2)
Let's test it
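The test cell has not survived in this extract; a minimal check along these lines reproduces the sequence listed in the exercise
print([x(i) for i in range(10)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]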
Exercise 2
A small sample from test_table.csv is included (and saved) in the code below for convenience
%%file test_table.csv
Date,Open,High,Low,Close,Volume,Adj Close
2009-05-21,9280.35,9286.35,9189.92,9264.15,133200,9264.15
2009-05-20,9372.72,9399.40,9311.61,9344.64,143200,9344.64
2009-05-19,9172.56,9326.75,9166.97,9290.29,167000,9290.29
2009-05-18,9167.05,9167.82,8997.74,9038.69,147800,9038.69
2009-05-15,9150.21,9272.08,9140.90,9265.02,172000,9265.02
2009-05-14,9212.30,9223.77,9052.41,9093.73,169400,9093.73
2009-05-13,9305.79,9379.47,9278.89,9340.49,176000,9340.49
2009-05-12,9358.25,9389.61,9298.61,9298.61,188400,9298.61
2009-05-11,9460.72,9503.91,9342.75,9451.98,230800,9451.98
2009-05-08,9351.40,9464.43,9349.57,9432.83,220200,9432.83
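The definition of column_iterator is missing from this extract; one possible implementation, written as a generator function as the exercise asks, is
def column_iterator(target_file, column_number):
    """A generator that steps through the elements of column
    column_number in the csv file target_file."""
    f = open(target_file, 'r')
    for line in f:
        yield line.split(',')[column_number - 1]
    f.close()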
dates = column_iterator('test_table.csv', 1)
i = 1
for date in dates:
print(date)
if i == 10:
break
i += 1
Date
2009-05-21
2009-05-20
2009-05-19
2009-05-18
2009-05-15
2009-05-14
2009-05-13
2009-05-12
2009-05-11
Exercise 3
%%file numbers.txt
prices
3
8
7
21
Writing numbers.txt
f = open('numbers.txt')
total = 0.0
for line in f:
try:
total += float(line)
except ValueError:
pass
f.close()
print(total)
39.0
3.5 Debugging
Contents
• Debugging
– Overview
– Debugging
– Other Useful Magics
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code
as cleverly as possible, you are, by definition, not smart enough to debug it. – Brian Kernighan
3.5.1 Overview
Are you one of those programmers who fills their code with print statements when trying to debug their
programs?
Hey, we all used to do that
(OK, sometimes we still do that)
But once you start writing larger programs you'll need a better system
Debugging tools for Python vary across platforms, IDEs and editors
Here we'll focus on Jupyter and leave you to explore other settings
3.5.2 Debugging
import numpy as np
import matplotlib.pyplot as plt
def plot_log():
fig, ax = plt.subplots(2, 1)
x = np.linspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
This code is intended to plot the log function over the interval [1, 2]
But there's an error here: plt.subplots(2, 1) should be just plt.subplots()
(The call plt.subplots(2, 1) returns a NumPy array containing two axes objects, suitable for having
two subplots on the same figure)
Heres what happens when we run the code:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-ef5c75a58138> in <module>()
8 plt.show()
9
---> 10 plot_log() # Call the function, generate plot
<ipython-input-1-ef5c75a58138> in plot_log()
5 fig, ax = plt.subplots(2, 1)
6 x = np.linspace(1, 2, 10)
----> 7 ax.plot(x, np.log(x))
8 plt.show()
9
The traceback shows that the error occurs at the method call ax.plot(x, np.log(x))
The error occurs because we have mistakenly made ax a NumPy array, and a NumPy array has no plot
method
But let's pretend that we don't understand this for the moment
We might suspect there's something wrong with ax, but when we try to investigate this object, we get the following exception:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-645aedc8a285> in <module>()
----> 1 ax
The problem is that ax was defined inside plot_log(), and the name is lost once that function terminates
Let's try doing it a different way
We run the first cell block again, generating the same error
import numpy as np
import matplotlib.pyplot as plt
def plot_log():
fig, ax = plt.subplots(2, 1)
x = np.linspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-1-ef5c75a58138> in <module>()
8 plt.show()
9
---> 10 plot_log() # Call the function, generate plot
<ipython-input-1-ef5c75a58138> in plot_log()
5 fig, ax = plt.subplots(2, 1)
6 x = np.linspace(1, 2, 10)
----> 7 ax.plot(x, np.log(x))
8 plt.show()
9
But this time, in the next cell, we type the following magic
%debug
You should be dropped into a new prompt that looks something like this
ipdb>
ipdb> ax
array([<matplotlib.axes.AxesSubplot object at 0x290f5d0>,
<matplotlib.axes.AxesSubplot object at 0x2930810>], dtype=object)
It's now very clear that ax is an array, which clarifies the source of the problem
To find out what else you can do from inside ipdb (or pdb), use the online help
ipdb> h
Undocumented commands:
======================
retval rv
ipdb> h c
c(ont(inue))
Continue execution, only stop when a breakpoint is encountered.
import numpy as np
import matplotlib.pyplot as plt
def plot_log():
fig, ax = plt.subplots()
x = np.logspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
plot_log()
Here the original problem is fixed, but we've accidentally written np.logspace(1, 2, 10) instead of np.linspace(1, 2, 10)
Now there won't be any exception, but the plot won't look right
To investigate, it would be helpful if we could inspect variables like x during execution of the function
To this end, we add a break point by importing Pdb from IPython.core.debugger and inserting Pdb().set_trace() inside the function code block
import numpy as np
import matplotlib.pyplot as plt
from IPython.core.debugger import Pdb
def plot_log():
Pdb().set_trace()
fig, ax = plt.subplots()
x = np.logspace(1, 2, 10)
ax.plot(x, np.log(x))
plt.show()
plot_log()
Now let's run the script, and investigate via the debugger
> <ipython-input-5-c5864f6d184b>(6)plot_log()
4 def plot_log():
5 from IPython.core.debugger import Tracer; Tracer()()
----> 6 fig, ax = plt.subplots()
7 x = np.logspace(1, 2, 10)
8 ax.plot(x, np.log(x))
ipdb> n
> <ipython-input-5-c5864f6d184b>(7)plot_log()
5 from IPython.core.debugger import Tracer; Tracer()()
6 fig, ax = plt.subplots()
----> 7 x = np.logspace(1, 2, 10)
8 ax.plot(x, np.log(x))
9 plt.show()
ipdb> n
> <ipython-input-5-c5864f6d184b>(8)plot_log()
6 fig, ax = plt.subplots()
7 x = np.logspace(1, 2, 10)
----> 8 ax.plot(x, np.log(x))
9 plt.show()
10
ipdb> x
array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
27.82559402, 35.93813664, 46.41588834, 59.94842503,
77.42636827, 100. ])
We used n twice to step forward through the code (one line at a time)
Then we printed the value of x to see what was happening with that variable
To exit from the debugger, use q
CHAPTER
FOUR
DATA AND EMPIRICS
This part of the course provides a set of lectures focused on Data and Empirics using Python
4.1 Pandas
Contents
• Pandas
– Overview
– Series
– DataFrames
– On-Line Data Sources
– Exercises
– Solutions
4.1.1 Overview
Just as NumPy provides the basic array data type plus core array operations, pandas
1. defines fundamental structures for working with data and
2. endows them with methods that facilitate operations such as
• reading in data
• adjusting indices
• working with dates and time series
• sorting, grouping, re-ordering and general data munging [1]
• dealing with missing values, etc., etc.
More sophisticated statistical functionality is left to other packages, such as statsmodels and scikit-learn,
which are built on top of pandas
This lecture will provide a basic introduction to pandas
Throughout the lecture we will assume that the following imports have taken place
import pandas as pd
import numpy as np
[1] Wikipedia defines munging as cleaning data from one raw form into a structured, purged one.
4.1.2 Series
Two important data types defined by pandas are Series and DataFrame
You can think of a Series as a column of data, such as a collection of observations on a single variable
A DataFrame is an object for storing related columns of data
Let's start with Series
s = pd.Series(np.random.randn(4), name='daily returns')
s
0 0.430271
1 0.617328
2 -0.265421
3 -0.836113
Name: daily returns, dtype: float64
Here you can imagine the indices 0, 1, 2, 3 as indexing four listed companies, and the values being
daily returns on their shares
Pandas Series are built on top of NumPy arrays, and support many similar operations
s * 100
0 43.027108
1 61.732829
2 -26.542104
3 -83.611339
Name: daily returns, dtype: float64
np.abs(s)
0 0.430271
1 0.617328
2 0.265421
3 0.836113
Name: daily returns, dtype: float64
s.describe()
count 4.000000
mean -0.013484
std 0.667092
min -0.836113
25% -0.408094
50% 0.082425
75% 0.477035
max 0.617328
Name: daily returns, dtype: float64
Series can also be given an index of our choosing
s.index = ['AMZN', 'AAPL', 'MSFT', 'GOOG']
s
AMZN 0.430271
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
Name: daily returns, dtype: float64
Viewed in this way, Series are like fast, efficient Python dictionaries (with the restriction that the items in the dictionary all have the same type, in this case floats)
In fact, you can use much of the same syntax as Python dictionaries
s['AMZN']
0.43027108469945924
s['AMZN'] = 0
s
AMZN 0.000000
AAPL 0.617328
MSFT -0.265421
GOOG -0.836113
Name: daily returns, dtype: float64
'AAPL' in s
True
4.1.3 DataFrames
While a Series is a single column of data, a DataFrame is several columns, one for each variable
In essence, a DataFrame in pandas is analogous to a (highly optimized) Excel spreadsheet
Thus, it is a powerful tool for representing and analyzing data that are naturally organized into rows and
columns, often with descriptive indexes for individual rows and individual columns
Let's look at an example that reads data from the CSV file pandas/data/test_pwt.csv, which can be downloaded here
Here's the contents of test_pwt.csv
"country","country isocode","year","POP","XRAT","tcgdp","cc","cg"
"Argentina","ARG","2000","37335.653","0.9995","295072.21869","75.716805379",
,→"5.5788042896"
"Australia","AUS","2000","19053.186","1.72483","541804.6521","67.759025993",
,→"6.7200975332"
"India","IND","2000","1006300.297","44.9416","1728144.3748","64.575551328",
,→"14.072205773"
"Israel","ISR","2000","6114.57","4.07733","129253.89423","64.436450847","10.
,→266688415"
"Malawi","MWI","2000","11801.505","59.543808333","5026.2217836","74.707624181
,→","11.658954494"
"South Africa","ZAF","2000","45064.098","6.93983","227242.36949","72.718710427
,→","5.7265463933"
"United States","USA","2000","282171.957","1","9898700","72.347054303","6.
,→0324539789"
"Uruguay","URY","2000","3219.793","12.099591667","25255.961693","78.978740282
,→","5.108067988"
Supposing you have this data saved as test_pwt.csv in the present working directory (type %pwd in Jupyter
to see what this is), it can be read in as follows:
df = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/test_pwt.csv')
type(df)
pandas.core.frame.DataFrame
df
We can select particular rows using standard Python array slicing notation
df[2:5]
To select columns, we can pass a list containing the names of the desired columns represented as strings
df[['country', 'tcgdp']]
country tcgdp
0 Argentina 295072.218690
1 Australia 541804.652100
2 India 1728144.374800
3 Israel 129253.894230
4 Malawi 5026.221784
5 South Africa 227242.369490
6 United States 9898700.000000
7 Uruguay 25255.961693
To select both rows and columns using integers, the iloc attribute should be used with the format .
iloc[rows,columns]
df.iloc[2:5,0:4]
To select rows and columns using a mixture of integers and labels, the loc attribute can be used in a similar
way
country tcgdp
2 India 1728144.374800
3 Israel 129253.894230
4 Malawi 5026.221784
Let's imagine that we're only interested in population and total GDP (tcgdp)
One way to strip the data frame df down to only these variables is to overwrite the dataframe using the
selection method described above
df = df[['country','POP','tcgdp']]
df
Here the index 0, 1,..., 7 is redundant, because we can use the country names as an index
To do this, we set the index to be the country variable in the dataframe
df = df.set_index('country')
df
POP tcgdp
country
Argentina 37335.653 295072.218690
Australia 19053.186 541804.652100
India 1006300.297 1728144.374800
Israel 6114.570 129253.894230
Malawi 11801.505 5026.221784
South Africa 45064.098 227242.369490
United States 282171.957 9898700.000000
Uruguay 3219.793 25255.961693
Next we're going to add a column showing real GDP per capita, multiplying by 1,000,000 as we go because total GDP is in millions
df['GDP percap'] = df['tcgdp'] * 1e6 / df['POP']
One of the nice things about pandas DataFrame and Series objects is that they have methods for plotting
and visualization that work through Matplotlib
For example, we can easily generate a bar plot of GDP per capita
df['GDP percap'].plot(kind='bar')
plt.show()
At the moment the data frame is ordered alphabetically on the countries; let's change it to be ordered by GDP per capita
df['GDP percap'].plot(kind='bar')
plt.show()
4.1.4 On-Line Data Sources
Python makes it straightforward to query online databases programmatically
One important database for economists is FRED, maintained by the Federal Reserve Bank of St. Louis; for example, the US unemployment rate is available in csv format at
https://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv
One option is to use requests, a standard Python library for requesting data over the Internet
To begin, try the following code on your computer
import requests
r = requests.get('http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv')
url = 'http://research.stlouisfed.org/fred2/series/UNRATE/downloaddata/UNRATE.csv'
source = requests.get(url).content.decode().split("\n")
source[0]
'DATE,VALUE\r\n'
source[1]
'1948-01-01,3.4\r\n'
source[2]
'1948-02-01,3.8\r\n'
We could now write some additional code to parse this text and store it as an array
But this is unnecessary: pandas' read_csv function can handle the task for us
We use parse_dates=True so that pandas recognizes our dates column, allowing for simple date filtering
data = pd.read_csv(url, index_col=0, parse_dates=True)
The data has been read into a pandas DataFrame called data that we can now manipulate in the usual way
type(data)
pandas.core.frame.DataFrame
VALUE
DATE
1948-01-01 3.4
1948-02-01 3.8
1948-03-01 4.0
1948-04-01 3.9
1948-05-01 3.5
pd.set_option('precision', 1)
data.describe() # Your output might differ slightly
VALUE
count 830.0
mean 5.8
std 1.6
min 2.5
25% 4.7
50% 5.6
75% 6.9
max 10.8
We can also plot the unemployment rate from 2006 to 2012 as follows
data['2006':'2012'].plot()
plt.show()
Let's look at one more example of downloading and manipulating data, this time from the World Bank
The World Bank collects and organizes data on a huge range of indicators
For example, here's some data on government debt as a ratio to GDP
If you click on DOWNLOAD DATA you will be given the option to download the data as an Excel file
The next program does this for you, reads an Excel file into a pandas DataFrame, and plots time series for
the US and Australia
r = requests.get(wb_data_query)
with open('gd.xls', 'wb') as output:
output.write(r.content)
4.1.5 Exercises
Exercise 1
Write a program to calculate the percentage price change over 2013 for the following shares
'KO': 'Coca-Cola',
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
A dataset of daily closing prices for the above firms can be found in pandas/data/ticker_data.
csv, and can be downloaded here
Plot the result as a bar graph like follows
4.1.6 Solutions
Exercise 1
ticker = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas/data/ticker_data.csv')
ticker.set_index('Date', inplace=True)
'AAPL': 'Apple',
'AMZN': 'Amazon',
'BA': 'Boeing',
'QCOM': 'Qualcomm',
'KO': 'Coca-Cola',
'GOOG': 'Google',
'SNE': 'Sony',
'PTR': 'PetroChina'}
price_change = pd.Series()
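The loop that fills price_change is missing here; a sketch of one way to compute the percentage change from the first to the last closing price, assuming ticker_list is the dictionary of ticker symbols shown above, is
for tick in ticker_list:
    change = 100 * (ticker[tick].iloc[-1] - ticker[tick].iloc[0]) / ticker[tick].iloc[0]
    name = ticker_list[tick]
    price_change[name] = change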
price_change.sort_values(inplace=True)
fig, ax = plt.subplots(figsize=(10,8))
price_change.plot(kind='bar', ax=ax)
plt.show()
4.2 Pandas for Panel Data
Contents
– Exercises
– Solutions
4.2.1 Overview
We will read in a dataset from the OECD of real minimum wages in 32 countries and assign it to realwage
The dataset pandas_panel/realwage.csv can be downloaded here
Make sure the file is in your current working directory
import pandas as pd
realwage = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/realwage.csv')
The data is currently in long format, which is difficult to analyse when there are several dimensions to the
data
We will use pivot_table to create a wide format panel, with a MultiIndex to handle higher dimen-
sional data
pivot_table arguments should specify the data (values), the index, and the columns we want in our
resulting dataframe
By passing a list in columns, we can create a MultiIndex in our column axis
realwage = realwage.pivot_table(values='value',
index='Time',
columns=['Country', 'Series', 'Pay period'])
realwage.head()
To more easily filter our time series data later on, we will convert the index into a DateTimeIndex
realwage.index = pd.to_datetime(realwage.index)
type(realwage.index)
pandas.core.indexes.datetimes.DatetimeIndex
The columns contain multiple levels of indexing, known as a MultiIndex, with levels being ordered
hierarchically (Country > Series > Pay period)
A MultiIndex is the simplest and most flexible way to manage panel data in pandas
type(realwage.columns)
pandas.core.indexes.multi.MultiIndex
realwage.columns.names
Like before, we can select the country (the top level of our MultiIndex)
realwage['United States'].head()
Stacking and unstacking levels of the MultiIndex will be used throughout this lecture to reshape our
dataframe into a format we need
.stack() rotates the lowest level of the column MultiIndex to the row index (.unstack() works
in the opposite direction - try it out)
realwage.stack().head()
We can also pass in an argument to select the level we would like to stack
realwage.stack(level='Country').head()
realwage['2015'].stack(level=(1, 2)).transpose().head()
For the rest of lecture, we will work with a dataframe of the hourly real minimum wages across countries
and time, measured in 2015 US dollars
To create our filtered dataframe (realwage_f), we can use the xs method to select values at lower levels
in the multiindex, while keeping the higher levels (countries in this case)
Similar to relational databases like SQL, pandas has built in methods to merge datasets together
Using country information from WorldData.info, well add the continent of each country to realwage_f
with the merge function
The csv file can be found in pandas_panel/countries.csv, and can be downloaded here
worlddata = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/countries.csv', sep=';')
worlddata.head()
First we'll select just the country and continent variables from worlddata and rename the column to Country
realwage_f.transpose().head()
We can use either left, right, inner, or outer join to merge our datasets:
• left join includes only countries from the left dataset
• right join includes only countries from the right dataset
• outer join includes countries that are in either the left or the right dataset
• inner join includes only countries common to both the left and right datasets
By default, merge will use an inner join
Here we will pass how='left' to keep all countries in realwage_f, but discard countries in worlddata that do not have a corresponding data entry in realwage_f
This is illustrated by the red shading in the following diagram
We will also need to specify where the country name is located in each dataframe, which will be the key
that is used to merge the dataframes on
Our left dataframe (realwage_f.transpose()) contains countries in the index, so we set
left_index=True
Our right dataframe (worlddata) contains countries in the Country column, so we set
right_on='Country'
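Putting the pieces just described together, the merge call looks something like the following (a sketch; the arguments simply restate the choices explained above)
merged = pd.merge(realwage_f.transpose(), worlddata,
                  how='left', left_index=True, right_on='Country')
merged.head()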
Countries that appeared in realwage_f but not in worlddata will have NaN in the Continent column
To check whether this has occurred, we can use .isnull() on the continent column and filter the merged
dataframe
merged[merged['Continent'].isnull()]
merged['Country'].map(missing_continents)
17 NaN
23 NaN
32 NaN
100 NaN
38 NaN
108 NaN
41 NaN
225 NaN
53 NaN
58 NaN
45 NaN
68 NaN
233 NaN
86 NaN
88 NaN
91 NaN
247 Asia
117 NaN
122 NaN
123 NaN
138 NaN
153 NaN
151 NaN
174 NaN
175 NaN
247 Europe
247 Europe
198 NaN
200 NaN
227 NaN
241 NaN
240 NaN
Name: Country, dtype: object
merged['Continent'] = merged['Continent'].fillna(merged['Country'].map(missing_continents))
merged[merged['Country'] == 'Korea']
We will also combine the Americas into a single continent - this will make our visualization nicer later on
To do this, we will use .replace() and loop through a list of the continent values we want to replace
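A sketch of that loop, assuming the raw continent labels are the usual 'North America', 'South America' and 'Central America', is
replace = ['Central America', 'North America', 'South America']
for cont in replace:
    merged['Continent'].replace(to_replace=cont, value='America', inplace=True)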
Now that we have all the data we want in a single DataFrame, we will reshape it back into panel form
with a MultiIndex
We should also ensure to sort the index using .sort_index() so that we can efficiently filter our
dataframe later on
By default, levels will be sorted top-down
While merging, we lost our DatetimeIndex, as we merged columns that were not in datetime format
merged.columns
Now that we have set the merged columns as the index, we can recreate a DatetimeIndex using .
to_datetime()
merged.columns = pd.to_datetime(merged.columns)
merged.columns = merged.columns.rename('Time')
merged.columns
The DatetimeIndex tends to work more smoothly in the row axis, so we will go ahead and transpose
merged
merged = merged.transpose()
merged.head()
Grouping and summarizing data can be particularly useful for understanding large panel datasets
A simple way to summarize data is to call an aggregation method on the dataframe, such as .mean() or
.max()
For example, we can calculate the average real minimum wage for each country over the period 2006 to
2016 (the default is to aggregate over rows)
merged.mean().head(10)
Continent Country
America Brazil 1.09
Canada 7.82
Chile 1.62
Colombia 1.07
Costa Rica 2.53
Mexico 0.53
United States 7.15
Asia Israel 5.95
Japan 6.18
Korea 4.22
dtype: float64
Using this series, we can plot the average real minimum wage over the past decade for each country in our
data set
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('seaborn')
%matplotlib inline
merged.mean().sort_values(ascending=False).plot(kind='bar', title="Average real minimum wage 2006 - 2016")
plt.show()
Passing in axis=1 to .mean() will aggregate over columns (giving the average minimum wage for all
countries over time)
merged.mean(axis=1).head()
Time
2006-01-01 4.69
2007-01-01 4.84
2008-01-01 4.90
2009-01-01 5.08
2010-01-01 5.11
dtype: float64
merged.mean(axis=1).plot()
plt.title('Average real minimum wage 2006 - 2016')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
We can also specify a level of the MultiIndex (in the column axis) to aggregate over
merged.mean(level='Continent', axis=1).head()
We can plot the average minimum wages in each continent as a time series
merged.mean(level='Continent', axis=1).plot()
plt.title('Average real minimum wage')
plt.ylabel('2015 USD')
plt.xlabel('Year')
plt.show()
merged.stack().describe()
Calling an aggregation method on the object applies the function to each group, the results of which are
combined in a new data structure
For example, we can return the number of countries in our dataset for each continent using .size()
In this case, our new data structure is a Series
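The grouped object used below does not appear in this extract; a sketch consistent with the structure of merged (countries in the columns, continents as the top column level) would group the columns by continent
grouped = merged.groupby(level='Continent', axis=1)   # an assumption about how grouped was built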
grouped.size()
Continent
America 7
Asia 4
Europe 19
dtype: int64
Calling .get_group() to return just the countries in a single group, we can create a kernel density
estimate of the distribution of real minimum wages in 2016 for each continent
grouped.groups.keys() will return the keys from the groupby object
continents = grouped.groups.keys()
This lecture has provided an introduction to some of pandas' more advanced features, including multiindices, merging, grouping and plotting
Other tools that may be useful in panel data analysis include xarray, a python package that extends pandas
to N-dimensional data structures
4.2.6 Exercises
Exercise 1
In these exercises you'll work with a dataset of employment rates in Europe by age and sex from Eurostat
The dataset pandas_panel/employ.csv can be downloaded here
Reading in the csv file returns a panel dataset in long format. Use .pivot_table() to construct a wide
format dataframe with a MultiIndex in the columns
Start off by exploring the dataframe and the variables available in the MultiIndex levels
Write a program that quickly returns all values in the MultiIndex
Exercise 2
Filter the above dataframe to only include employment as a percentage of active population
Create a grouped boxplot using seaborn of employment rates in 2015 by age group and sex
Hint: GEO includes both areas and countries
4.2.7 Solutions
Exercise 1
employ = pd.read_csv('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/pandas_panel/employ.csv')
employ = employ.pivot_table(values='Value',
index=['DATE'],
columns=['UNIT','AGE', 'SEX', 'INDIC_EM', 'GEO'])
employ.index = pd.to_datetime(employ.index) # ensure that dates are in datetime format
employ.head()
This is a large dataset so it is useful to explore the levels and variables available
employ.columns.names
Exercise 2
To easily filter by country, swap GEO to the top level and sort the MultiIndex
employ.columns = employ.columns.swaplevel(0,-1)
employ = employ.sort_index(axis=1)
We need to get rid of a few items in GEO which are not countries
A fast way to get rid of the EU areas is to use a list comprehension to find the level values in GEO that begin
with Euro
geo_list = employ.columns.get_level_values('GEO').unique().tolist()
countries = [x for x in geo_list if not x.startswith('Euro')]
employ = employ[countries]
employ.columns.get_level_values('GEO').unique()
Select only percentage employed in the active population from the dataframe
box = employ_f['2015'].unstack().reset_index()
import seaborn as sns
sns.boxplot(x="AGE", y=0, hue="SEX", data=box, palette=("husl"), showfliers=False)
plt.xlabel('')
plt.xticks(rotation=35)
plt.ylabel('Percentage of population (%)')
plt.title('Employment in Europe (2015)')
plt.legend(bbox_to_anchor=(1,0.5))
plt.show()
4.3 Linear Regression in Python
Contents
4.3.1 Overview
Linear regression is a standard tool for analyzing the relationship between two or more variables
In this lecture well use the Python package statsmodels to estimate, interpret, and visualize linear
regression models
Along the way we'll discuss a variety of topics, including
• simple and multivariate linear regression
• visualization
• endogeneity and omitted variable bias
• two-stage least squares
As an example, we will replicate results from Acemoglu, Johnson and Robinson's seminal paper [AJR01]
• You can download a copy here
In the paper, the authors emphasize the importance of institutions in economic development
The main contribution is the use of settler mortality rates as a source of exogenous variation in institutional
differences
Such variation is needed to determine whether it is institutions that give rise to greater economic growth,
rather than the other way around
Prerequisites
Comments
[AJR01] wish to determine whether or not differences in institutions can help to explain observed economic
outcomes
How do we measure institutional differences and economic outcomes?
In this paper,
• economic outcomes are proxied by log GDP per capita in 1995, adjusted for exchange rates
• institutional differences are proxied by an index of protection against expropriation on average over
1985-95, constructed by the Political Risk Services Group
These variables and other data used in the paper are available for download on Daron Acemoglu's webpage
We will use pandas .read_stata() function to read in data contained in the .dta files to dataframes
import pandas as pd
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dta')
df1.head()
Let's use a scatterplot to see whether any obvious relationship exists between GDP per capita and the protection against expropriation index
import matplotlib.pyplot as plt
plt.style.use('seaborn')
The plot shows a fairly strong positive relationship between protection against expropriation and log GDP
per capita
Specifically, if higher protection against expropriation is a measure of institutional quality, then better institutions appear to be positively correlated with better economic outcomes (higher GDP per capita)
Given the plot, choosing a linear model to describe this relationship seems like a reasonable assumption
We can write our model as
logpgp95i = β0 + β1 avexpri + ui
where:
• β0 is the intercept of the linear trend line on the y-axis
• β1 is the slope of the linear trend line, representing the marginal effect of protection against risk on
log GDP per capita
• ui is a random error term (deviations of observations from the linear trend due to factors not included
in the model)
Visually, this linear model involves choosing a straight line that best fits the data, as in the following plot
(Figure 2 in [AJR01])
import numpy as np
X = df1_subset['avexpr']
y = df1_subset['logpgp95']
labels = df1_subset['shortnam']
plt.xlim([3.3,10.5])
plt.ylim([4,10.5])
plt.xlabel('Average Expropriation Risk 1985-95')
plt.ylabel('Log GDP per capita, PPP, 1995')
plt.title('Figure 2: OLS relationship between expropriation risk and income')
plt.show()
The most common technique to estimate the parameters (βs) of the linear model is Ordinary Least Squares
(OLS)
As the name implies, an OLS model is solved by finding the parameters that minimize the sum of squared
residuals, ie.
min_{β̂} ∑_{i=1}^{N} û_i^2
where ûi is the difference between the observation and the predicted value of the dependent variable
To estimate the constant term β0 , we need to add a column of 1s to our dataset (consider the equation if β0
was replaced with β0 xi and xi = 1)
df1['const'] = 1
Now we can construct our model in statsmodels using the OLS function
We will use pandas dataframes with statsmodels; however, standard arrays can also be used as arguments
import statsmodels.api as sm
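The line that actually builds the model object is missing in this extract; a sketch consistent with the type shown below, regressing logpgp95 on a constant and avexpr while dropping missing observations, is
reg1 = sm.OLS(endog=df1['logpgp95'], exog=df1[['const', 'avexpr']], missing='drop')
type(reg1)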
statsmodels.regression.linear_model.OLS
results = reg1.fit()
type(results)
statsmodels.regression.linear_model.RegressionResultsWrapper
print(results.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
This equation describes the line that best fits our data, as shown in Figure 2
We can use this equation to predict the level of log GDP per capita for a value of the index of expropriation
protection
For example, for a country with an index value of 7.07 (the average for the dataset), we find that their
predicted level of log GDP per capita in 1995 is 8.38
mean_expr = np.mean(df1_subset['avexpr'])
mean_expr
6.515625
8.3771
An easier (and more accurate) way to obtain this result is to use .predict() and set constant = 1 and
avexpri = mean_expr
results.predict(exog=[1, mean_expr])
array([ 8.09156367])
We can obtain an array of predicted logpgp95i for every value of avexpri in our dataset by calling .predict() on our results
Plotting the predicted values against avexpri shows that the predicted values lie along the linear line that
we fitted above
The observed values of logpgp95i are also plotted for comparison purposes
plt.legend()
plt.title('OLS predicted values')
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()
So far we have only accounted for institutions affecting economic performance - almost certainly there are
numerous other factors affecting GDP that are not included in our model
Leaving out variables that affect logpgp95i will result in omitted variable bias, yielding biased and incon-
sistent parameter estimates
We can extend our bivariate regression model to a multivariate regression model by adding in other factors
that may affect logpgp95i
[AJR01] consider other factors such as:
• the effect of climate on economic outcomes; latitude is used to proxy this
• differences that affect both economic performance and institutions, e.g. cultural, historical, etc.; controlled for with the use of continent dummies
Lets estimate some of the extended models considered in the paper (Table 2) using data from
maketable2.dta
df2 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable2.dta')
Now that we have fitted our model, we will use summary_col to display the results in a single table
(model numbers correspond to those in the paper)
results_table = summary_col(results=[reg1,reg2,reg3],
float_format='%0.2f',
stars = True,
model_names=['Model 1',
'Model 3',
'Model 4'],
info_dict=info_dict,
regressor_order=['const',
'avexpr',
'lat_abst',
'asia',
'africa'])
print(results_table)
4.3.4 Endogeneity
As [AJR01] discuss, the OLS models likely suffer from endogeneity issues, resulting in biased and inconsistent model estimates
Namely, there is likely a two-way relationship between institutions and economic outcomes:
• richer countries may be able to afford or prefer better institutions
• variables that affect income may also be correlated with institutional differences
• the construction of the index may be biased; analysts may be biased towards seeing countries with
higher income having better institutions
To deal with endogeneity, we can use two-stage least squares (2SLS) regression, which is an extension of
OLS regression
This method requires replacing the endogenous variable avexpri with a variable that is:
1. correlated with avexpri
2. not correlated with the error term (ie. it should not directly affect the dependent variable, otherwise it
would be correlated with ui due to omitted variable bias)
The new set of regressors is called an instrument, which aims to remove endogeneity in our proxy of
institutional differences
The main contribution of [AJR01] is the use of settler mortality rates to instrument for institutional differences
They hypothesize that higher mortality rates of colonizers led to the establishment of institutions that were
more extractive in nature (less protection against expropriation), and these institutions still persist today
Using a scatterplot (Figure 3 in [AJR01]), we can see that protection against expropriation is negatively correlated with settler mortality rates, coinciding with the authors' hypothesis and satisfying the first condition of a valid instrument
X = df1_subset2['logem4']
y = df1_subset2['avexpr']
labels = df1_subset2['shortnam']
plt.xlim([1.8,8.4])
plt.ylim([3.3,10.4])
plt.xlabel('Log of Settler Mortality')
plt.ylabel('Average Expropriation Risk 1985-95')
plt.title('Figure 3: First-stage relationship between settler mortality and expropriation risk')
plt.show()
The second condition may not be satisfied if settler mortality rates in the 17th to 19th centuries have a direct
effect on current GDP (in addition to their indirect effect through institutions)
For example, settler mortality rates may be related to the current disease environment in a country, which
could affect current economic performance
[AJR01] argue this is unlikely because:
• The majority of settler deaths were due to malaria and yellow fever, and had limited effect on local
people
• The disease burden on local people in Africa or India, for example, did not appear to be higher than
average, supported by relatively high population densities in these areas before colonization
As we appear to have a valid instrument, we can use 2SLS regression to obtain consistent and unbiased
parameter estimates
First stage
The first stage involves regressing the endogenous variable (avexpri ) on the instrument
The instrument is the set of all exogenous variables in our model (and not just the variable we have replaced)
Using model 1 as an example, our instrument is simply a constant and settler mortality rates logem4i
avexpri = δ0 + δ1 logem4i + vi
The data we need to estimate this equation is located in maketable4.dta (only complete data, indicated
by baseco = 1, is used for estimation)
df4 = df4[df4['baseco'] == 1]
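The code that estimates the first stage does not survive here; a sketch that matches the equation above is
df4['const'] = 1     # add a constant term (assumed not to be present already)
results_fs = sm.OLS(df4['avexpr'],
                    df4[['const', 'logem4']],
                    missing='drop').fit()
print(results_fs.summary())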
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Second stage
We need to retrieve the predicted values of avexpri using .predict()
We then replace the endogenous variable avexpr_i with the predicted values avexpr̂_i (the fitted values from the first stage) in the original linear model
logpgp95_i = β0 + β1 avexpr̂_i + u_i
df4['predicted_avexpr'] = results_fs.predict()
results_ss = sm.OLS(df4['logpgp95'],
df4[['const', 'predicted_avexpr']]).fit()
print(results_ss.summary())
------------------------------------------------------------------------------------
==============================================================================
Omnibus: 10.547 Durbin-Watson: 2.137
Prob(Omnibus): 0.005 Jarque-Bera (JB): 11.010
Skew: -0.790 Prob(JB): 0.00407
Kurtosis: 4.277 Cond. No. 58.1
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The second-stage regression results give us an unbiased and consistent estimate of the effect of institutions
on economic outcomes
The result suggests a stronger positive relationship than what the OLS results indicated
Note that while our parameter estimates are correct, our standard errors are not and for this reason, comput-
ing 2SLS manually (in stages with OLS) is not recommended
We can correctly estimate a 2SLS regression in one step using the linearmodels package, an extension of
statsmodels
To install this package, you will need to run pip install linearmodels in your command line
Note that when using IV2SLS, the exogenous and instrument variables are split up in the function arguments
(whereas before the instrument included exogenous variables)
from linearmodels.iv import IV2SLS

iv = IV2SLS(dependent=df4['logpgp95'],
            exog=df4['const'],
            endog=df4['avexpr'],
            instruments=df4['logem4']).fit(cov_type='unadjusted')
print(iv.summary)
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
const 1.9097 1.0106 1.8897 0.0588 -0.0710 3.8903
avexpr 0.9443 0.1541 6.1293 0.0000 0.6423 1.2462
==============================================================================
Endogenous: avexpr
Instruments: logem4
Unadjusted Covariance (Homoskedastic)
Debiased: False
Given that we now have consistent and unbiased estimates, we can infer from the model we have estimated
that institutional differences (stemming from institutions set up during colonization) can help to explain
differences in income levels across countries today
[AJR01] use a marginal effect of 0.94 to calculate that the difference in the index between Chile and Nigeria
(i.e. institutional quality) implies up to a 7-fold difference in income, emphasizing the significance of
institutions in economic development
4.3.5 Summary
We have demonstrated basic OLS and 2SLS regression in statsmodels and linearmodels
If you are familiar with R, you may want to use the formula interface to statsmodels, or consider using
rpy2 to call R from within Python
4.3.6 Exercises
Exercise 1
In the lecture, we think the original model suffers from endogeneity bias due to the likely effect income has
on institutional development
Although endogeneity is often best identified by thinking about the data and model, we can formally test for
endogeneity using the Hausman test
We want to test for correlation between the endogenous variable, avexpri , and the errors, ui
H0 : Cov(avexpri , ui ) = 0 (no endogeneity)
H1 : Cov(avexpri , ui ) ̸= 0 (endogeneity)
This test is run in two stages
First, we regress avexpri on the instrument, logem4i
avexpri = π0 + π1 logem4i + υi
Second, we retrieve the residuals υ̂i and include them in the original equation (that is, we regress logpgp95i
on a constant, avexpri and υ̂i , and denote the coefficient on υ̂i by α)
If α is statistically significant (with a p-value < 0.05), then we reject the null hypothesis and conclude that
avexpri is endogenous
Using the above information, estimate a Hausman test and interpret your results
Exercise 2
The OLS parameter β can also be estimated using matrix algebra and numpy (you may need to review the
numpy lecture to complete this exercise)
The linear equation we want to estimate is (written in matrix form)
y = Xβ + u
To solve for the unknown parameter β, we want to minimise the sum of squared residuals
minû′ û
β̂
Rearranging the first equation and substituting into the second equation, we can write
Solving this optimization problem gives the solution for the β̂ coefficients
β̂ = (X ′ X)−1 X ′ y
Using the above information, compute β̂ from model 1 using numpy - your results should be the same as
those in the statsmodels output from earlier in the lecture
4.3.7 Solutions
Exercise 1
# Load in data
df4 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable4.dta')
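# The intermediate steps are a sketch: restrict the sample, run the first stage,
# keep the residuals, and add them to the original equation
# (reg1 and reg2 are illustrative names)
import statsmodels.api as sm

df4 = df4[df4['baseco'] == 1]
df4['const'] = 1

reg1 = sm.OLS(df4['avexpr'], df4[['const', 'logem4']], missing='drop').fit()
df4['resid'] = reg1.resid

reg2 = sm.OLS(df4['logpgp95'], df4[['const', 'avexpr', 'resid']],
              missing='drop').fit()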
print(reg2.summary())
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The output shows that the coefficient on the residuals is statistically significant, indicating avexpri is en-
dogenous
Exercise 2
# Load in data
df1 = pd.read_stata('https://github.com/QuantEcon/QuantEcon.lectures.code/raw/master/ols/maketable1.dta')
df1 = df1.dropna(subset=['logpgp95', 'avexpr'])
df1['const'] = 1
# Define the y and X arrays and compute β_hat = (X'X)⁻¹X'y
y = df1['logpgp95'].values
X = df1[['const', 'avexpr']].values
β_hat = np.linalg.solve(X.T @ X, X.T @ y)
β_0 = 4.6
β_1 = 0.53
Contents
• Maximum Likelihood Estimation
– Summary
– Exercises
– Solutions
4.4.1 Overview
In a previous lecture we estimated the relationship between dependent and explanatory variables using linear
regression
But what if a linear relationship is not an appropriate assumption for our model?
One widely used alternative is maximum likelihood estimation, which involves specifying a class of distri-
butions, indexed by unknown parameters, and then using the data to pin down these parameter values
The benefit relative to linear regression is that it allows more flexibility in the probabilistic relationships
between variables
Here we illustrate maximum likelihood by replicating Daniel Treisman's (2016) paper, Russia's Billionaires,
which connects the number of billionaires in a country to its economic characteristics
The paper concludes that Russia has a higher number of billionaires than economic factors such as market
size and tax rate predict
Prerequisites
Comments
Let's consider the steps we need to go through in maximum likelihood estimation and how they pertain to
this study
Flow of Ideas
The first step with maximum likelihood estimation is to choose the probability distribution believed to be
generating the data
More precisely, we need to make an assumption as to which parametric class of distributions is generating
the data
• e.g., the class of all normal distributions, or the class of all gamma distributions
Each such class is a family of distributions indexed by a finite number of parameters
• e.g., the class of normal distributions is a family of distributions indexed by its mean µ ∈ (−∞, ∞)
and standard deviation σ ∈ (0, ∞)
We'll let the data pick out a particular element of the class by pinning down the parameters
The parameter estimates so produced will be called maximum likelihood estimates
Counting Billionaires
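The code below relies on a Poisson pmf function and a set of axes defined beforehand; here is a minimal
sketch of that set-up (the particular µ values plotted are an assumption):

import numpy as np
from scipy.special import factorial
import matplotlib.pyplot as plt

# Poisson pmf: f(y | µ) = µ^y e^(-µ) / y!
def poisson_pmf(y, µ):
    return µ**y / factorial(y) * np.exp(-µ)

y_values = range(0, 25)

fig, ax = plt.subplots(figsize=(12, 8))
for µ in [1, 5, 10]:
    distribution = [poisson_pmf(y, µ) for y in y_values]
    ax.plot(y_values, distribution, marker='o', label=f'$\mu$={µ}', alpha=0.5)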
ax.grid()
ax.set_xlabel('$y$', fontsize=14)
ax.set_ylabel('$f(y \mid \mu)$', fontsize=14)
ax.axis(xmin=0, ymin=0)
ax.legend(fontsize=14)
plt.show()
Notice that the Poisson distribution begins to resemble a normal distribution as the mean of y increases
Let's have a look at the distribution of the data we'll be working with in this lecture
Treisman's main source of data is Forbes' annual rankings of billionaires and their estimated net worth
The dataset mle/fp.dta can be downloaded from here or its AER page
import pandas as pd
pd.options.display.max_columns = 10
Using a histogram, we can view the distribution of the number of billionaires per country, numbil0, in
2008 (the United States is dropped for plotting purposes)
plt.subplots(figsize=(12, 8))
plt.hist(numbil0_2008, bins=30)
plt.xlim(xmin=0)
plt.grid()
plt.xlabel('Number of billionaires in 2008')
plt.ylabel('Count')
plt.show()
From the histogram, it appears that the Poisson assumption is not unreasonable (albeit with a very low µ
and some outliers)
In Treisman's paper, the dependent variable, the number of billionaires yi in country i, is modeled as a
function of GDP per capita, population size, and years of membership in GATT and WTO
Hence, the distribution of yi needs to be conditioned on the vector of explanatory variables xi
The standard formulation, the so-called Poisson regression model, is as follows:
f (yi | xi ) = (µi^yi / yi !) e^(−µi) ;    yi = 0, 1, 2, . . . , ∞    (4.1)
where the conditional mean is µi = exp(x′i β)
import numpy as np
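# Sketch of the omitted set-up: an illustrative parameter vector β and a few
# regressor vectors x_i (the numerical values here are assumptions)
from numpy import exp

β = np.array([0.26, 0.18, 0.25, -0.1, -0.22])
datasets = [np.array([0, 1, 1, 1, 2]),
            np.array([2, 3, 2, 4, 0]),
            np.array([3, 4, 5, 3, 2])]

y_values = range(0, 20)

fig, ax = plt.subplots(figsize=(12, 8))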
for X in datasets:
µ = exp(X @ β)
distribution = []
for y_i in y_values:
distribution.append(poisson_pmf(y_i, µ))
ax.plot(y_values,
distribution,
label=f'$\mu_i$={µ:.1}',
marker='o',
markersize=8,
alpha=0.5)
ax.grid()
ax.legend()
ax.set_xlabel('$y \mid x_i$')
ax.set_ylabel(r'$f(y \mid x_i; \beta )$')
ax.axis(xmin=0, ymin=0)
plt.show()
In our model for number of billionaires, the conditional distribution contains 4 (k = 4) parameters that we
need to estimate
We will label our entire parameter vector as β, where
β = (β0 , β1 , β2 , β3 )′
To estimate the model using MLE, we want to maximize the likelihood that our estimate β̂ is the true
parameter β
Intuitively, we want to find the β̂ that best fits our data
First we need to construct the likelihood function L(β), which is similar to a joint probability density
function
Assume we have some data yi = {y1 , y2 } and yi ∼ f (yi )
If y1 and y2 are independent, the joint pmf of these data is f (y1 , y2 ) = f (y1 ) · f (y2 )
If yi follows a Poisson distribution with λ = 7, we can visualize the joint pmf like so
plot_joint_poisson(µ=7, y_n=20)
Similarly, the joint pmf of our data (which is distributed as a conditional Poisson distribution) can be written
as
f (y1 , y2 , . . . , yn | x1 , x2 , . . . , xn ; β) = ∏_{i=1}^n (µi^yi / yi !) e^(−µi)
Now that we have our likelihood function, we want to find the β̂ that yields the maximum likelihood value
max_β L(β)
In doing so it is generally easier to maximize the log-likelihood (consider differentiating f (x) = x exp(x)
vs. f (x) = log(x) + x)
Given that taking a logarithm is a monotone increasing transformation, a maximizer of the likelihood func-
tion will also be a maximizer of the log-likelihood function
In our case the log-likelihood is
log L(β) = log ( f (y1 ; β) · f (y2 ; β) · . . . · f (yn ; β) )
         = Σ_{i=1}^n log f (yi ; β)
         = Σ_{i=1}^n log ( (µi^yi / yi !) e^(−µi) )
         = Σ_{i=1}^n yi log µi − Σ_{i=1}^n µi − Σ_{i=1}^n log yi !
The MLE of the Poisson model, β̂, can be obtained by solving
max_β ( Σ_{i=1}^n yi log µi − Σ_{i=1}^n µi − Σ_{i=1}^n log yi ! )
However, no analytical solution exists to the above problem – to find the MLE we need to use numerical
methods
Many distributions do not have nice, analytical solutions and therefore require numerical methods to solve
for parameter estimates
One such numerical method is the Newton-Raphson algorithm
β = np.linspace(1, 20)
logL = -(β - 10) ** 2 - 10
dlogL = -2 * β + 20
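# The figure set-up is omitted above; a sketch with two stacked panels,
# log-likelihood on top and its first derivative below
fig, (ax1, ax2) = plt.subplots(2, sharex=True, figsize=(12, 8))
ax1.plot(β, logL, lw=2)
ax2.plot(β, dlogL, lw=2)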
ax1.set_ylabel(r'$log \mathcal{L(\beta)}$',
rotation=0,
labelpad=35,
fontsize=15)
ax2.set_ylabel(r'$\frac{dlog \mathcal{L(\beta)}}{d \beta}$ ',
rotation=0,
labelpad=35,
fontsize=19)
ax2.set_xlabel(r'$\beta$', fontsize=15)
ax1.grid(), ax2.grid()
plt.axhline(c='black')
plt.show()
The plot shows that the maximum likelihood value (the top plot) occurs when d log L(β)/dβ = 0 (the bottom
plot)
Therefore, the likelihood is maximized when β = 10
We can also ensure that this value is a maximum (as opposed to a minimum) by checking that the second
derivative (slope of the bottom plot) is negative
The Newton-Raphson algorithm finds a point where the first derivative is 0
To use the algorithm, we take an initial guess at the maximum value, β0 (the OLS parameter estimates might
be a reasonable guess), then
1. Use the updating rule to iterate the algorithm
β^(k+1) = β^(k) − G(β^(k)) / H(β^(k))
where
G(β^(k)) = d log L(β^(k)) / dβ^(k)
H(β^(k)) = d² log L(β^(k)) / d(β^(k))²
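The gradient, Hessian and log-likelihood are wrapped in a PoissonRegression class whose class statement
and constructor are not shown at this point; a minimal sketch of that wrapper, assuming y, X and an initial
parameter vector β are passed in as NumPy arrays, is

class PoissonRegression:

    def __init__(self, y, X, β):
        self.y, self.X, self.β = y, X, β
        self.n, self.k = X.shape

The following methods then compute the conditional mean µ, the log-likelihood, the gradient G and the
Hessian H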
def µ(self):
return np.exp(self.X @ self.β.T)
def logL(self):
y = self.y
µ = self.µ()
return np.sum(y * np.log(µ) - µ - np.log(factorial(y)))
def G(self):
µ = self.µ()
return ((self.y - µ) @ self.X).reshape(self.k, 1)
def H(self):
X = self.X
µ = self.µ()
return -(µ * X.T @ X)
Our function newton_raphson will take a PoissonRegression object that has an initial guess of
the parameter vector β 0
The algorithm will update the parameter vector according to the updating rule, and recalculate the gradient
and Hessian matrices at the new parameter estimates
Iteration will end when either:
• The difference between the parameter and the updated parameter is below a tolerance level
• The maximum number of iterations has been achieved (meaning convergence is not achieved)
So we can get an idea of what's going on while the algorithm is running, an option display=True is
added to print out values at each iteration
def newton_raphson(model, tol=1e-3, max_iter=1000, display=True):
i = 0
error = 100 # Initial error value
if display:
header = f'{"Iteration_k":<13}{"Log-likelihood":<16}{"θ":<60}'
print(header)
print("-" * len(header))
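    # The iteration itself is omitted above; a sketch consistent with the stopping
    # rules described earlier (update β, measure the change, and repeat)
    while np.any(error > tol) and i < max_iter:
        β_new = model.β - (np.linalg.inv(model.H()) @ model.G()).flatten()
        error = np.abs(β_new - model.β)
        model.β = β_new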
# Print iterations
if display:
β_list = [f'{t:.3}' for t in list(model.β)]
update = f'{i:<13}{model.logL():<16.8}{β_list}'
print(update)
i += 1
return model.β
Lets try out our algorithm with a small dataset of 5 observations and 3 variables in X
X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])
y = np.array([1, 0, 1, 1, 0])
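The estimation call producing the output below is not shown; a sketch (the initial guess of 0.1 for each
parameter is an assumption) is

init_β = np.array([0.1, 0.1, 0.1])
poi = PoissonRegression(y, X, β=init_β)   # Poisson model object used below
newton_raphson(poi, display=True)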
Iteration_k Log-likelihood Θ
-----------------------------------------------------------
0 -4.34476224 ['-1.4890', '0.2650', '0.2440']
1 -3.5742413 ['-3.3840', '0.5280', '0.4740']
2 -3.39995256 ['-5.0640', '0.7820', '0.7020']
3 -3.37886465 ['-5.9150', '0.9090', '0.8200']
4 -3.3783559 ['-6.0740', '0.9330', '0.8430']
5 -3.37835551 ['-6.0780', '0.9330', '0.8430']
Number of iterations: 6
β_hat = [-6.07848205 0.93340226 0.84329625]
As this was a simple model with few observations, the algorithm achieved convergence in only 6 iterations
You can see that with each iteration, the log-likelihood value increased
Remember, our objective was to maximize the log-likelihood function, which the algorithm has worked to
achieve
Also note that the increase in log L(β (k) ) becomes smaller with each iteration
This is because the gradient is approaching 0 as we reach the maximum, and therefore the numerator in our
updating equation is becoming smaller
The gradient vector should be close to 0 at β̂
poi.G()
array([[ -3.95169226e-07],
[ -1.00114804e-06],
[ -7.73114556e-07]])
The iterative process can be visualized in the following diagram, where the maximum is found at β = 10
β = np.linspace(2, 18)
logL = lambda β: -(β - 10)**2 - 10   # Same log-likelihood as in the earlier figure
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(β, logL(β), lw=2, c='black')
ax.grid(alpha=0.3)
plt.show()
Note that our implementation of the Newton-Raphson algorithm is rather basic; for more robust implementations
see, for example, scipy.optimize
Now that we know what's going on under the hood, we can apply MLE to an interesting application
We'll use the Poisson regression model in statsmodels to obtain richer output with standard errors, test
values, and more
statsmodels uses the same algorithm as above to find the maximum likelihood estimates
Before we begin, let's re-estimate our simple model with statsmodels to confirm we obtain the same
coefficients and log-likelihood value
X = np.array([[1, 2, 5],
[1, 1, 3],
[1, 4, 2],
[1, 5, 2],
[1, 3, 1]])
y = np.array([1, 0, 1, 1, 0])
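The statsmodels call itself is omitted; a sketch (the result name is illustrative) is

import statsmodels.api as sm

stats_poisson = sm.Poisson(y, X).fit()
print(stats_poisson.summary())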
Now let's replicate results from Daniel Treisman's paper, Russia's Billionaires, mentioned earlier in the lecture
Treisman starts by estimating equation (4.1), where:
• yi is the number of billionaires in country i
• xi1 is log GDP per capita in country i
• xi2 is log population in country i
• xi3 is years in GATT for country i, i.e. years of membership in GATT and WTO (to proxy access to
international markets)
The paper only considers the year 2008 for estimation
We will set up our variables for estimation like so (you should have the data assigned to df from earlier in
the lecture)
# Add a constant
df['const'] = 1
# Variable sets
reg1 = ['const', 'lngdppc', 'lnpop', 'gattwto08']
Then we can use the Poisson function from statsmodels to fit the model
We'll use robust standard errors as in the author's paper
import statsmodels.api as sm
# Specify model
poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
missing='drop').fit(cov_type='HC0')
print(poisson_reg.summary())
Here we received a warning message saying Maximum number of iterations has been exceeded.
Let's try increasing the maximum number of iterations that the algorithm is allowed (the .fit() docstring
tells us the default number of iterations is 35)
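A sketch of the refitted model (the ceiling of 100 iterations is an assumption; the other keyword values
mirror the earlier call):

poisson_reg = sm.Poisson(df[['numbil0']], df[reg1],
                         missing='drop').fit(cov_type='HC0', maxiter=100)
print(poisson_reg.summary())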
Model: Poisson
Df Residuals: 193
Method: MLE
Df Model: 3
Date: Wed, 26 Jul 2017
Pseudo R-squ.: 0.8574
Time: 15:41:38
Log-Likelihood: -438.54
converged: True
LL-Null: -3074.7
LLR p-value: 0.000
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const -29.0495 2.578 -11.268 0.000 -34.103 -23.997
lngdppc 1.0839 0.138 7.834 0.000 0.813 1.355
lnpop 1.1714 0.097 12.024 0.000 0.980 1.362
gattwto08 0.0060 0.007 0.868 0.386 -0.008 0.019
==============================================================================
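Table 1 collects several specifications. The set-up below is a sketch of the omitted code: the additional
variable sets, the summary statistics passed to summary_col, and the loop that fits each specification
(the names reg2, reg3, reg_names and the contents of info_dict are assumptions based on the surrounding code).

from statsmodels.iolib.summary2 import summary_col

reg2 = reg1 + ['lnmcap08', 'rintr', 'topint08']
reg3 = reg2 + ['nrrents', 'roflaw']
reg_names = ['Model 1', 'Model 2', 'Model 3']
regressor_order = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08',
                   'rintr', 'topint08', 'nrrents', 'roflaw']
info_dict = {'Pseudo R-squared': lambda x: f"{x.prsquared:.2f}",
             'No. observations': lambda x: f"{int(x.nobs):d}"}

results = []
for reg in [reg1, reg2, reg3]:
    result = sm.Poisson(df[['numbil0']], df[reg],
                        missing='drop').fit(cov_type='HC0',
                                            maxiter=100, disp=0)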
results.append(result)
results_table = summary_col(results=results,
float_format='%0.3f',
stars=True,
model_names=reg_names,
info_dict=info_dict,
regressor_order=regressor_order)
results_table.add_title('Table 1 - Explaining the Number of Billionaires in 2008')
print(results_table)
The output suggests that the frequency of billionaires is positively correlated with GDP per capita, population
size, and stock market capitalization, and negatively correlated with top marginal income tax rate
To analyze our results by country, we can plot the difference between the predicted and actual values, then
sort from highest to lowest and plot the first 15
data = ['const', 'lngdppc', 'lnpop', 'gattwto08', 'lnmcap08', 'rintr',
'topint08', 'nrrents', 'roflaw', 'numbil0', 'country']
results_df = df[data].dropna()
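# Predict the number of billionaires with the fitted model (a sketch:
# results[-1] is taken to be the full specification estimated above)
results_df['const'] = 1
results_df['prediction'] = results[-1].predict(results_df[reg3])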
# Calculate difference
results_df['difference'] = results_df['numbil0'] - results_df['prediction']
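# Sort from highest to lowest difference and keep the first 15 countries (a sketch)
top15 = results_df.sort_values('difference', ascending=False).iloc[:15]
plt.figure(figsize=(12, 8))
plt.bar(np.arange(15), top15['difference'])
plt.xticks(np.arange(15), top15['country'], rotation='vertical')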
plt.xlabel('Country')
plt.show()
As we can see, Russia has by far the highest number of billionaires in excess of what is predicted by the
model (around 50 more than expected)
Treisman uses this empirical result to discuss possible reasons for Russia's excess of billionaires, including
the origination of wealth in Russia, the political climate, and the history of privatization in the years after
the USSR
4.4.7 Summary
In this lecture we used Maximum Likelihood Estimation to estimate the parameters of a Poisson model
statsmodels contains other built-in likelihood models such as Probit and Logit
For further flexibility, statsmodels provides a way to specify the distribution manually using the
GenericLikelihoodModel class - an example notebook can be found here
4.4.8 Exercises
Exercise 1
Suppose we wanted to estimate the probability of an event yi occurring, given some observations
We could use a probit regression model, where the pmf of yi is
f (yi ; β) = Φ(x′i β)^yi [1 − Φ(x′i β)]^(1−yi) ,    yi = 0, 1
Φ represents the cumulative normal distribution and constrains the predicted yi to be between 0 and 1 (as
required for a probability)
β is a vector of coefficients
Following the example in the lecture, write a class to represent the Probit model
To begin, find the log-likelihood function and derive the gradient and Hessian
The scipy module stats.norm contains the functions needed to compute the cdf and pdf of the normal
distribution
Exercise 2
Use the following dataset and initial values of β to estimate the MLE with the Newton-Raphson algorithm
developed earlier in the lecture
X =
1 2 4
1 1 1
1 4 3
1 5 6
1 3 5
y = (1, 0, 1, 1, 0)′ ,    β^(0) = (0.1, 0.1, 0.1)′
Verify your results with statsmodels - you can import the Probit function with the following import
statement
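from statsmodels.discrete.discrete_model import Probit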
Note that the simple Newton-Raphson algorithm developed in this lecture is very sensitive to initial values,
and therefore you may fail to achieve convergence with different starting values
4.4.9 Solutions
Exercise 1
log L = Σ_{i=1}^n [ yi log Φ(x′i β) + (1 − yi ) log(1 − Φ(x′i β)) ]
Using the fundamental theorem of calculus, the derivative of a cumulative probability distribution is its
marginal distribution
∂Φ(s)/∂s = ϕ(s)
where ϕ is the marginal normal distribution
The gradient vector of the Probit model is
∂ log L/∂β = Σ_{i=1}^n [ yi ϕ(x′i β)/Φ(x′i β) − (1 − yi ) ϕ(x′i β)/(1 − Φ(x′i β)) ] xi
and a similar (if more involved) calculation gives the Hessian, which appears in the H method of the class below
Using these results, we can write a class for the Probit model as follows
class ProbitRegression:
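    # Constructor (a sketch mirroring the PoissonRegression set-up above;
    # requires numpy and `from scipy.stats import norm` to be imported)
    def __init__(self, y, X, β):
        self.y, self.X, self.β = y, X, β
        self.n, self.k = X.shape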
    def µ(self):
        return norm.cdf(self.X @ self.β.T)

    def ϕ(self):
        return norm.pdf(self.X @ self.β.T)

    def logL(self):
        µ = self.µ()
        return np.sum(y * np.log(µ) + (1 - y) * np.log(1 - µ))

    def G(self):
        µ = self.µ()
        ϕ = self.ϕ()
        return np.sum((X.T * y * ϕ / µ - X.T * (1 - y) * ϕ / (1 - µ)), axis=1)

    def H(self):
        X = self.X
        β = self.β
        µ = self.µ()
        ϕ = self.ϕ()
        a = (ϕ + (X @ β.T) * µ) / µ**2
        b = (ϕ - (X @ β.T) * (1 - µ)) / (1 - µ)**2
        return -(ϕ * (y * a + (1 - y) * b) * X.T) @ X
Exercise 2
X = np.array([[1, 2, 4],
[1, 1, 1],
[1, 4, 3],
[1, 5, 6],
[1, 3, 5]])
y = np.array([1, 0, 1, 1, 0])
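The estimation step itself is omitted; a sketch, using β^(0) = (0.1, 0.1, 0.1) from the exercise statement, is

β = np.array([0.1, 0.1, 0.1])
prob = ProbitRegression(y, X, β)
newton_raphson(prob, display=True)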
Iteration_k Log-likelihood θ
-----------------------------------------------------------
0 -2.37968841 ['-1.3400', '0.7750', '-0.1570']
1 -2.36875259 ['-1.5350', '0.7750', '-0.0980']
2 -2.36872942 ['-1.5460', '0.7780', '-0.0970']
3 -2.36872942 ['-1.5460', '0.7780', '-0.0970']
Number of iterations: 4
β_hat = [-1.54625858 0.77778952 -0.09709757]
print(Probit(y, X).fit().summary())
CHAPTER
FIVE
TOOLS AND TECHNIQUES
This section of the course contains foundational mathematical and statistical tools and techniques
Contents
• Linear Algebra
– Overview
– Vectors
– Matrices
– Solving Systems of Equations
– Eigenvalues and Eigenvectors
– Further Topics
– Exercises
– Solutions
5.1.1 Overview
Linear algebra is one of the most useful branches of applied mathematics for economists to invest in
For example, many applied problems in economics and finance require the solution of a linear system of
equations, such as
y1 = ax1 + bx2
y2 = cx1 + dx2
or, more generally,
y1 = a11 x1 + a12 x2 + · · · + a1k xk
⋮
yn = an1 x1 + an2 x2 + · · · + ank xk    (5.1)
The objective here is to solve for the unknowns x1 , . . . , xk given a11 , . . . , ank and y1 , . . . , yn
When considering such problems, it is essential that we first consider at least some of the following questions
• Does a solution actually exist?
• Are there in fact many solutions, and if so how should we interpret them?
• If no solution exists, is there a best approximate solution?
• If a solution exists, how should we compute it?
These are the kinds of topics addressed by linear algebra
In this lecture we will cover the basics of linear and matrix algebra, treating both theory and computation
We admit some overlap with this lecture, where operations on NumPy arrays were first explained
Note that this lecture is more theoretical than most, and contains background material that will be used in
applications as we go along
5.1.2 Vectors
A vector of length n is just a sequence (or array, or tuple) of n numbers, which we write as x = (x1 , . . . , xn )
or x = [x1 , . . . , xn ]
We will write these sequences either horizontally or vertically as we please
(Later, when we wish to perform certain matrix operations, it will become necessary to distinguish between
the two)
The set of all n-vectors is denoted by Rn
For example, R2 is the plane, and a vector in R2 is just a point in the plane
Traditionally, vectors are represented visually as arrows from the origin to the point
The following figure represents three vectors in this manner
Vector Operations
The two most common operators for vectors are addition and scalar multiplication, which we now describe
As a matter of definition, when we add two vectors, we add them element by element
x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn )′
Scalar multiplication is an operation that takes a number γ and a vector x and produces
γx := (γx1 , γx2 , . . . , γxn )′
import numpy as np
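import matplotlib.pyplot as plt

# An illustrative vector and a set of axes (this set-up is a sketch of omitted lines)
fig, ax = plt.subplots(figsize=(10, 8))
x = (2, 2)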
scalars = (-2, 2)
x = np.array(x)
for s in scalars:
v = s * x
ax.annotate('', xy=v, xytext=(0, 0),
arrowprops=dict(facecolor='red',
shrink=0,
alpha=0.5,
width=0.5))
ax.text(v[0] + 0.4, v[1] - 0.2, f'${s} x$', fontsize='16')
plt.show()
In Python, a vector can be represented as a list or tuple, such as x = (2, 4, 6), but is more commonly
represented as a NumPy array
One advantage of NumPy arrays is that scalar multiplication and addition have very natural syntax
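For example (the particular vectors below are chosen to be consistent with the values computed afterwards):

x = np.ones(3)            # Vector of three ones
y = np.array((2, 4, 6))
x + y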
4 * x
The inner product of vectors x, y ∈ Rn is defined as
x′y := Σ_{i=1}^n xi yi
and the norm of x is ∥x∥ := √(x′x)
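In NumPy these can be computed as follows (a sketch; the outputs shown below correspond to
x = np.ones(3) and y = np.array((2, 4, 6)) from above):

np.sum(x * y)           # Inner product of x and y
np.sqrt(np.sum(x**2))   # Norm of x, take one
np.linalg.norm(x)       # Norm of x, take two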
12.0
1.7320508075688772
1.7320508075688772
Span
Given a set of vectors A := {a1 , . . . , ak } in Rn , it's natural to think about the new vectors we can create by
performing linear operations
New vectors created in this manner are called linear combinations of A
In particular, y ∈ Rn is a linear combination of A := {a1 , . . . , ak } if
y = β1 a1 + β2 a2 + · · · + βk ak for some scalars β1 , . . . , βk
In this context, the values β1 , . . . , βk are called the coefficients of the linear combination
The set of linear combinations of A is called the span of A
The next figure shows the span of A = {a1 , a2 } in R3
The span is a 2 dimensional plane passing through these two points and the origin
α, β = 0.2, 0.1
gs = 3
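# A sketch of the omitted set-up for the 3D figure: the plane f(x, y) = αx + βy,
# grid limits, the two vectors being spanned, and a 3D set of axes
# (the particular coordinate values are assumptions)
from mpl_toolkits.mplot3d import Axes3D

def f(x, y):
    return α * x + β * y

x_min, x_max = -5, 5
x_coords, y_coords = (3, 3), (4, -4)

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')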
z = np.linspace(x_min, x_max, gs)
x = np.zeros(gs)
y = np.zeros(gs)
ax.plot(x, y, z, 'k-', lw=2, alpha=0.5)
ax.plot(z, x, y, 'k-', lw=2, alpha=0.5)
ax.plot(y, z, x, 'k-', lw=2, alpha=0.5)
# Lines to vectors
for i in (0, 1):
x = (0, x_coords[i])
y = (0, y_coords[i])
z = (0, f(x_coords[i], y_coords[i]))
ax.plot(x, y, z, 'b-', lw=1.5, alpha=0.6)
Examples
If A contains only one vector a1 ∈ R2 , then its span is just the scalar multiples of a1 , which is the unique
line passing through both a1 and the origin
If A = {e1 , e2 , e3 } consists of the canonical basis vectors of R3 , that is
e1 := (1, 0, 0)′ ,  e2 := (0, 1, 0)′ ,  e3 := (0, 0, 1)′
then the span of A is all of R3 , because, for any x = (x1 , x2 , x3 ) ∈ R3 , we can write
x = x1 e1 + x2 e2 + x3 e3
Linear Independence
As we'll see, it's often desirable to find families of vectors with relatively large span, so that many vectors
can be described by linear operators on a few vectors
The condition we need for a set of vectors to have a large span is whats called linear independence
In particular, a collection of vectors A := {a1 , . . . , ak } in Rn is said to be
• linearly dependent if some strict subset of A has the same span as A
• linearly independent if it is not linearly dependent
Put differently, a set of vectors is linearly independent if no vector is redundant to the span, and linearly
dependent otherwise
To illustrate the idea, recall the figure that showed the span of vectors {a1 , a2 } in R3 as a plane through the
origin
If we take a third vector a3 and form the set {a1 , a2 , a3 }, this set will be
• linearly dependent if a3 lies in the plane
• linearly independent otherwise
As another illustration of the concept, since Rn can be spanned by n vectors (see the discussion of canonical
basis vectors above), any collection of m > n vectors in Rn must be linearly dependent
The following statements are equivalent to linear independence of A := {a1 , . . . , ak } ⊂ Rn
1. No vector in A can be formed as a linear combination of the other elements
2. If β1 a1 + · · · + βk ak = 0 for scalars β1 , . . . , βk , then β1 = · · · = βk = 0
(The zero in the first expression is the origin of Rn )
Unique Representations
Another nice thing about sets of linearly independent vectors is that each element in the span has a unique
representation as a linear combination of these vectors
In other words, if A := {a1 , . . . , ak } ⊂ Rn is linearly independent and
y = β1 a1 + · · · + βk ak
then no other coefficient sequence will produce the same vector y
5.1.3 Matrices
Matrices are a neat way of organizing data for use in linear operations
An n × k matrix is a rectangular array A of numbers with n rows and k columns:
      a11  a12  · · ·  a1k
      a21  a22  · · ·  a2k
A =    ⋮    ⋮           ⋮
      an1  an2  · · ·  ank
Often, the numbers in the matrix represent coefficients in a system of linear equations, as discussed at the
start of this lecture
For obvious reasons, the matrix A is also called a vector if either n = 1 or k = 1
In the former case, A is called a row vector, while in the latter it is called a column vector
If n = k, then A is called square
The matrix formed by replacing aij by aji for every i and j is called the transpose of A, and denoted A′ or
A⊤
If A = A′ , then A is called symmetric
For a square matrix A, the n elements of the form aii for i = 1, . . . , n are called the principal diagonal
A is called diagonal if the only nonzero entries are on the principal diagonal
If, in addition to being diagonal, each element along the principal diagonal is equal to 1, then A is called the
identity matrix, and denoted by I
Matrix Operations
Just as was the case for vectors, a number of algebraic operations are defined for matrices
Scalar multiplication and addition are immediate generalizations of the vector case:
γA := (γ aij ), the matrix formed by multiplying every element of A by γ
and
A + B := (aij + bij ), the matrix formed by adding A and B element by element
In the latter case, the matrices must have the same shape in order for the definition to make sense
We also have a convention for multiplying two matrices
The rule for matrix multiplication generalizes the idea of inner products discussed above, and is designed to
make multiplication play well with basic linear operations
If A and B are two matrices, then their product AB is formed by taking as its i, j-th element the inner
product of the i-th row of A and the j-th column of B
There are many tutorials to help you visualize this operation, such as this one, or the discussion on the
Wikipedia page
If A is n × k and B is j × m, then to multiply A and B we require k = j, and the resulting matrix AB is
n×m
As perhaps the most important special case, consider multiplying n × k matrix A and k × 1 column vector
x
According to the preceding rule, this gives us an n × 1 column vector
Ax := ( a11 x1 + · · · + a1k xk , . . . , an1 x1 + · · · + ank xk )′    (5.2)
Matrices in NumPy
NumPy arrays are also used as matrices, and have fast, efficient functions and methods for all the standard
matrix operations1
You can create them manually from tuples of tuples (or lists of lists) as follows
A = ((1, 2),
(3, 4))
type(A)
tuple
A = np.array(A)
type(A)
numpy.ndarray
1
Although there is a specialized matrix data type defined in NumPy, it's more standard to work with ordinary NumPy arrays.
See this discussion.
A.shape
(2, 2)
The shape attribute is a tuple giving the number of rows and columns; see here for more discussion
To get the transpose of A, use A.transpose() or, more simply, A.T
There are many convenient functions for creating common matrices (matrices of zeros, ones, etc.) see here
Since operations are performed elementwise by default, scalar multiplication and addition have very natural
syntax
A = np.identity(3)
B = np.ones((3, 3))
2 * A
A + B
Matrices as Maps
Each n × k matrix A can be identified with a function f (x) = Ax that maps x ∈ Rk into y = Ax ∈ Rn
These kinds of functions have a special property: they are linear
A function f : Rk → Rn is called linear if, for all x, y ∈ Rk and all scalars α, β, we have
f (αx + βy) = αf (x) + βf (y)
You can check that this holds for the function f (x) = Ax + b when b is the zero vector, and fails when b is
nonzero
In fact, its known that f is linear if and only if there exists a matrix A such that f (x) = Ax for all x
If we compare (5.1) and (5.2), we see that (5.1) can now be written more conveniently as
y = Ax (5.3)
The problem we face is to determine a vector x ∈ Rk that solves (5.3), taking y and A as given
This is a special case of a more general problem: Find an x such that y = f (x)
Given an arbitrary function f and a y, is there always an x such that y = f (x)?
If so, is it always unique?
The answer to both these questions is negative, as the next figure shows
def f(x):
return 0.6 * np.cos(4 * x) + 1.4
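# A sketch of the omitted set-up: a grid of x values and two panels
# (the grid range is an assumption)
x = np.linspace(-1, 1, 100)
fig, axes = plt.subplots(2, 1, figsize=(10, 6))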
for ax in axes:
# Set the axes through the origin
for spine in ['left', 'bottom']:
ax.spines[spine].set_position('zero')
for spine in ['right', 'top']:
ax.spines[spine].set_color('none')
ax = axes[0]
ax = axes[1]
ybar = 2.6
plt.show()
In the first plot there are multiple solutions, as the function is not one-to-one, while in the second there are
no solutions, since y lies outside the range of f
Can we impose conditions on A in (5.3) that rule out these problems?
In this context, the most important thing to recognize about the expression Ax is that it corresponds to a
linear combination of the columns of A
Ax = x1 a1 + · · · + xk ak
The n × n Case
Let's discuss some more details, starting with the case where A is n × n
This is the familiar case where the number of unknowns equals the number of equations
For arbitrary y ∈ Rn , we hope to find a unique x ∈ Rn such that y = Ax
In view of the observations immediately above, if the columns of A are linearly independent, then their span,
and hence the range of f (x) = Ax, is all of Rn
Hence there always exists an x such that y = Ax
Moreover, the solution is unique
In particular, the following are equivalent
1. The columns of A are linearly independent
2. For any y ∈ Rn , the equation y = Ax has a unique solution
The property of having linearly independent columns is sometimes expressed as having full column rank
Inverse Matrices
When the columns of a square matrix A are linearly independent, its inverse A−1 exists: the matrix satisfying
A−1 A = AA−1 = I, so that the unique solution to y = Ax can be written as x = A−1 y
Determinants
Another quick comment about square matrices is that to every such matrix we assign a unique number called
the determinant of the matrix; you can find the expression for it here
If the determinant of A is not zero, then we say that A is nonsingular
Perhaps the most important fact about determinants is that A is nonsingular if and only if A is of full column
rank
This gives us a useful one-number summary of whether or not a square matrix can be inverted
This is the n × k case with n < k, so there are fewer equations than unknowns
In this case there are either no solutions or infinitely many in other words, uniqueness never holds
For example, consider the case where k = 3 and n = 2
Thus, the columns of A consist of 3 vectors in R2
This set can never be linearly independent, since it is possible to find two vectors that span R2
(For example, use the canonical basis vectors)
It follows that one column is a linear combination of the other two
For example, let's say that a1 = αa2 + βa3
Then if y = Ax = x1 a1 + x2 a2 + x3 a3 , we can also write
y = x1 (αa2 + βa3 ) + x2 a2 + x3 a3 = (x1 α + x2 )a2 + (x1 β + x3 )a3
Here's an illustration of how to solve linear equations with SciPy's linalg submodule
All of these routines are Python front ends to time-tested and highly optimized FORTRAN code
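A sketch of the code behind the output below (the matrix A = [[1, 2], [3, 4]] and the vector y = (1, 1)′ are
inferred from the printed values):

import numpy as np
from scipy.linalg import inv, solve, det

A = np.array([[1, 2], [3, 4]])
y = np.ones((2, 1))    # Column vector (1, 1)'
det(A)                 # Determinant: the -2.0 shown below
A_inv = inv(A)         # Inverse matrix: displayed below
A_inv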
-2.0
array([[-2. , 1. ],
[ 1.5, -0.5]])
x = A_inv @ y # Solution
A @ x # Should equal y
array([[ 1.],
[ 1.]])
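solve(A, y)   # Same solution via LU decomposition (a sketch of the omitted call)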
array([[-1.],
[ 1.]])
Observe how we can solve for x = A−1 y either via inv(A) @ y or via solve(A, y)
The latter method uses a different algorithm (LU decomposition) that is numerically more stable, and hence
should almost always be preferred
To obtain the least squares solution x̂ = (A′ A)−1 A′ y, use scipy.linalg.lstsq(A, y)
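For instance, with the A and y above (a minimal sketch):

from scipy.linalg import lstsq

x_hat, resid, rank, s = lstsq(A, y)   # Least squares solution of A x ≈ y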
Let A be an n × n matrix. If λ is a scalar and v is a nonzero n-vector such that
Av = λv
then λ is called an eigenvalue of A, and v an eigenvector
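The next code block plots the lines spanned by the eigenvectors of a 2 × 2 matrix; the import and figure
set-up are not shown, so the following sketch supplies them:

from scipy.linalg import eig
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 8))
x = np.linspace(-5, 5, 100)   # Grid used to draw each spanned line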
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evecs = evecs[:, 0], evecs[:, 1]
for v in evecs:
a = v[1] / v[0]
ax.plot(x, a * x, 'b-', lw=0.4)
plt.show()
The eigenvalue equation is equivalent to (A − λI)v = 0, and this has a nonzero solution v only when the
columns of A − λI are linearly dependent
This in turn is equivalent to stating that the determinant is zero
Hence to find all eigenvalues, we can look for λ such that the determinant of A − λI is zero
This problem can be expressed as one of solving for the roots of a polynomial in λ of degree n
This in turn implies the existence of n solutions in the complex plane, although some might be repeated
Some nice facts about the eigenvalues of a square matrix A are as follows
1. The determinant of A equals the product of the eigenvalues
2. The trace of A (the sum of the elements on the principal diagonal) equals the sum of the eigenvalues
A = ((1, 2),
(2, 1))
A = np.array(A)
evals, evecs = eig(A)
evals
evecs
Generalized Eigenvalues
It is sometimes useful to consider the generalized eigenvalue problem, which, for given matrices A and B,
seeks generalized eigenvalues λ and eigenvectors v such that
Av = λBv
We round out our discussion by briefly mentioning several other important topics
Series Expansions
Recall the usual summation formula for a geometric progression, which states that if |a| < 1, then
Σ_{k=0}^∞ a^k = (1 − a)^(−1)
Matrix Norms
The matrix norm used here is defined by ∥A∥ := max { ∥Ax∥ : ∥x∥ = 1 }
The norms on the right-hand side are ordinary vector norms, while the norm on the left-hand side is a matrix
norm, in this case the so-called spectral norm
For example, for a square matrix S, the condition ∥S∥ < 1 means that S is contractive, in the sense that it
pulls all vectors towards the origin2
Neumann's Theorem
Neumann's theorem states that if ∥A^k ∥ < 1 for some k ∈ N, then I − A is invertible and
(I − A)^(−1) = Σ_{k=0}^∞ A^k    (5.4)
Spectral Radius
A result known as Gelfand's formula tells us that, for any square matrix A,
ρ(A) = lim_{k→∞} ∥A^k ∥^(1/k)
Here ρ(A) is the spectral radius, defined as maxi |λi |, where {λi }i is the set of eigenvalues of A
As a consequence of Gelfands formula, if all eigenvalues are strictly less than one in modulus, there exists a
k with ∥Ak ∥ < 1
In which case (5.4) is valid
Positive Definite Matrices
A symmetric n × n matrix A is called positive definite if x′Ax > 0 for every nonzero x ∈ Rn , and positive
semi-definite if x′Ax ≥ 0 for all x
Analogous definitions exist for negative definite and negative semi-definite matrices
It is notable that if A is positive definite, then all of its eigenvalues are strictly positive, and hence A is
invertible (with positive definite inverse)
Further Reading
5.1.7 Exercises
Exercise 1
Let x be a given n × 1 vector and consider the problem of choosing an m × 1 vector u and an n × 1 vector y to maximize
−y′P y − u′Qu
subject to the constraint
y = Ax + Bu
Here
• P is an n × n matrix and Q is an m × m matrix, both symmetric and positive semidefinite
• A is an n × n matrix and B is an n × m matrix
One way to solve the problem is to form the Lagrangian
L = −y′P y − u′Qu + λ′[Ax + Bu − y]
where λ is an n × 1 vector of Lagrange multipliers, and work through the first-order conditions for a maximum
Note: If we dont care about the Lagrange multipliers, we can substitute the constraint into the objective
function, and then just maximize −(Ax + Bu)′ P (Ax + Bu) − u′ Qu with respect to u. You can verify that
this leads to the same maximizer.
5.1.8 Solutions
Solution to Exercise 1
We have the optimization problem
v(x) = max_{y,u} { −y′P y − u′Qu }
s.t.
y = Ax + Bu
with primitives
• P be a symmetric and positive semidefinite n × n matrix
• Q be a symmetric and positive semidefinite m × m matrix
• A an n × n matrix
• B an n × m matrix
The associated Lagrangian is :
L = −y ′ P y − u′ Qu + λ′ [Ax + Bu − y]
1.
Differentiating the Lagrangian equation w.r.t. y and setting its derivative equal to zero yields
∂L/∂y = −(P + P ′ )y − λ = −2P y − λ = 0 ,
since P is symmetric
Accordingly, the first-order condition for maximizing L w.r.t. y implies
λ = −2P y
2.
Differentiating the Lagrangian equation w.r.t. u and setting its derivative equal to zero yields
∂L/∂u = −(Q + Q′ )u + B ′ λ = −2Qu + B ′ λ = 0
Substituting λ = −2P y gives
Qu + B ′ P y = 0
Substituting the constraint y = Ax + Bu into this equation gives
Qu + B ′ P (Ax + Bu) = 0
or
(Q + B ′ P B)u + B ′ P Ax = 0
which is the first-order condition for maximizing L w.r.t. u
Thus, the optimal choice of u must satisfy
u = −(Q + B ′ P B)−1 B ′ P Ax ,
which follows from the definition of the first-order conditions for Lagrangian equation
3.
Rewriting our problem by substituting the constraint into the objective function, we get
v(x) = max_u { −(Ax + Bu)′P (Ax + Bu) − u′Qu }
Expanding the first term (using the symmetry of P ) and collecting terms in u gives
v(x) = max_u { −x′A′P Ax − 2u′B ′P Ax − u′(Q + B ′P B)u }
At the maximizing choice u = Sx, where S := −(Q + B ′P B)−1 B ′P A, the second term becomes
−2u′B ′P Ax = −2x′S ′B ′P Ax = 2x′A′P B(Q + B ′P B)−1 B ′P Ax
Notice that the term (Q + B ′P B)−1 is symmetric as both P and Q are symmetric
Regarding the third term −u′(Q + B ′P B)u, evaluated at u = Sx,
−u′(Q + B ′P B)u = −x′S ′(Q + B ′P B)Sx = −x′A′P B(Q + B ′P B)−1 B ′P Ax
Adding the three terms, we obtain
v(x) = −x′(A′P A − A′P B(Q + B ′P B)−1 B ′P A)x
Therefore, the solution to the optimization problem v(x) = −x′ P̃ x follows the above result by denoting
P̃ := A′ P A − A′ P B(Q + B ′ P B)−1 B ′ P A
Contents
5.2.1 Overview
Orthogonal projection is a cornerstone of vector space methods, with many diverse applications
These include, but are not limited to,
• Least squares projection, also known as linear regression
• Conditional expectations for multivariate normal (Gaussian) distributions
• Gram–Schmidt orthogonalization
• QR decomposition
• Orthogonal polynomials
• etc
In this lecture we focus on
• key ideas
• least squares regression
Further Reading
For background and foundational concepts, see our lecture on linear algebra
For more proofs and greater theoretical detail, see A Primer in Econometric Theory
For a complete set of proofs in a general setting, see, for example, [Rom05]
For an advanced treatment of projection in the context of least squares prediction, see this book chapter
Assume x, z ∈ Rn
Define ⟨x, z⟩ = Σ_i xi zi
Recall ∥x∥2 = ⟨x, x⟩
The law of cosines states that ⟨x, z⟩ = ∥x∥∥z∥ cos(θ) where θ is the angle between the vectors x and z
When ⟨x, z⟩ = 0, then cos(θ) = 0 and x and z are said to be orthogonal and we write x ⊥ z
The set S ⊥ of all vectors orthogonal to every element of S is a linear subspace of Rn
• To see this, fix x, y ∈ S ⊥ and α, β ∈ R
• Observe that if z ∈ S, then
⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ = α × 0 + β × 0 = 0
The orthogonal projection of y onto S is
ŷ := argmin_{z∈S} ∥y − z∥
The orthogonal projection theorem (OPT) states that this minimization problem has a unique solution ŷ,
characterized by ŷ ∈ S and y − ŷ ⊥ S
Proof of sufficiency
For a linear space Y and a fixed linear subspace S, we have a functional relationship
y ∈ Y 7→ its orthogonal projection ŷ ∈ S
By the OPT, this is a well-defined mapping or operator from Rn to Rn
In what follows we denote this operator by a matrix P
• P y represents the projection ŷ
• This is sometimes expressed as ÊS y = P y, where Ê denotes a wide-sense expectations operator
and the subscript S indicates that we are projecting y onto the linear subspace S
The operator P is called the orthogonal projection mapping onto S
It is immediate from the OPT that, for any y ∈ Rn ,
1. P y ∈ S and
2. y − P y ⊥ S
From this we can deduce additional useful properties, such as
1. ∥y∥2 = ∥P y∥2 + ∥y − P y∥2 and
2. ∥P y∥ ≤ ∥y∥
For example, to prove 1, observe that y = P y + y − P y and apply the Pythagorean law
Orthogonal Complement
Let S ⊂ Rn .
The orthogonal complement of S is the linear subspace S ⊥ that satisfies x1 ⊥ x2 for every x1 ∈ S and
x2 ∈ S ⊥
Let Y be a linear space with linear subspace S and its orthogonal complement S ⊥
We write
Y = S ⊕ S⊥
to indicate that for every y ∈ Y there is unique x1 ∈ S and a unique x2 ∈ S ⊥ such that y = x1 + x2 .
Moreover, x1 = ÊS y and x2 = y − ÊS y
This amounts to another version of the OPT:
Theorem. If S is a linear subspace of Rn , ÊS y = P y and ÊS ⊥ y = M y, then P y ⊥ M y and
y = P y + M y for all y ∈ Rn
If {u1 , . . . , uk } is an orthonormal basis of a linear subspace S (the ui are mutually orthogonal, each with
unit norm, and their span is S), then
x = Σ_{i=1}^k ⟨x, ui ⟩ ui    for all x ∈ S
To see this, observe that since x ∈ span{u1 , . . . , uk }, we can find scalars α1 , . . . , αk that verify
x = Σ_{j=1}^k αj uj    (5.5)
Taking the inner product of both sides with ui gives
⟨x, ui ⟩ = Σ_{j=1}^k αj ⟨uj , ui ⟩ = αi
When the subspace onto which we are projecting has an orthonormal basis, computing the projection simplifies:
Theorem If {u1 , . . . , uk } is an orthonormal basis for S, then
P y = Σ_{i=1}^k ⟨y, ui ⟩ ui ,    ∀ y ∈ Rn    (5.6)
When S is the column space of an n × k matrix X with linearly independent columns, the projection
ÊS y = P y can be computed via matrix algebra, with
P = X(X ′ X)−1 X ′
An expression of the form Xa is precisely a linear combination of the columns of X, and hence an element
of S
Claim 2 is equivalent to the statement that y − X(X ′ X)−1 X ′ y ⊥ Xa for all a ∈ Rk , which is easily
checked by expanding the inner product
Starting with X
It is common in applications to start with an n × k matrix X with linearly independent columns and let
S := span X, the span of the columns of X
The projection onto S is then P y = X(X ′ X)−1 X ′ y, as above
If instead we have a matrix U whose columns u1 , . . . , uk form an orthonormal basis of S, the expression
simplifies, since U ′ U = I:
P y = U (U ′ U )−1 U ′ y = U U ′ y = Σ_{i=1}^k ⟨ui , y⟩ ui
We have recovered our earlier result about projecting onto the span of an orthonormal basis
β̂ := (X ′ X)−1 X ′ y
X β̂ = X(X ′ X)−1 X ′ y = P y
Because Xb ∈ span(X)
If probabilities and hence E are unknown, we cannot solve this problem directly
However, if a sample is available, we can estimate the risk with the empirical risk:
min_{f ∈F} (1/N ) Σ_{n=1}^N (yn − f (xn ))²
If we restrict attention to linear functions of the form f (x) = b′x, the problem becomes
min_{b∈RK} Σ_{n=1}^N (yn − b′xn )²
Solution
Stacking the observations yn into an N × 1 vector y and the row vectors x′n into an N × K matrix X, the
solution is
β̂ := (X ′ X)−1 X ′ y
ŷ := X β̂ = P y
û := y − ŷ = y − P y = M y
Lets return to the connection between linear independence and orthogonality touched on above
A result of much interest is a famous algorithm for constructing orthonormal sets from linearly independent
sets
The next section gives details
Gram-Schmidt Orthogonalization
Theorem For each linearly independent set {x1 , . . . , xk } ⊂ Rn , there exists an orthonormal set
{u1 , . . . , uk } with
span{x1 , . . . , xi } = span{u1 , . . . , ui } for i = 1, . . . , k
QR Decomposition
The following result uses the preceding algorithm to produce a useful decomposition
Theorem If X is n × k with linearly independent columns, then there exists a factorization X = QR where
• R is k × k, upper triangular, and nonsingular
• Q is n × k with orthonormal columns
Proof sketch: Let
• xj := colj (X)
• {u1 , . . . , uk } be orthonormal with same span as {x1 , . . . , xk } (to be constructed using
Gram–Schmidt)
• Q be formed from cols ui
Since xj ∈ span{u1 , . . . , uj }, we have
xj = Σ_{i=1}^j ⟨ui , xj ⟩ ui    for j = 1, . . . , k
Some rearranging gives X = QR, where R is the k × k upper triangular matrix with entries Rij = ⟨ui , xj ⟩
for i ≤ j
For matrices X and y that overdetermine β in the linear equation system y = Xβ, we found the least
squares approximator β̂ = (X ′ X)−1 X ′ y
Using the QR decomposition X = QR gives
β̂ = (R′ Q′ QR)−1 R′ Q′ y
= (R′ R)−1 R′ Q′ y
= R−1 (R′ )−1 R′ Q′ y = R−1 Q′ y
Numerical routines would in this case use the alternative form Rβ̂ = Q′ y and back substitution
5.2.8 Exercises
Exercise 1
Exercise 2
Let P = X(X ′ X)−1 X ′ and let M = I − P . Show that P and M are both idempotent and symmetric. Can
you give any intuition as to why they should be idempotent?
Exercise 3
Using Gram-Schmidt orthogonalization, produce a linear projection of y onto the column space of X and
verify this using the projection matrix P := X(X ′ X)−1 X ′ and also using QR decomposition, where:
y := (1, 3, −3)′
and
      1   0
X :=  0  −6
      2   2
5.2.9 Solutions
Exercise 1
Exercise 2
Symmetry and idempotence of M and P can be established using standard rules for matrix algebra. The
intuition behind idempotence of M and P is that both are orthogonal projections. After a point is projected
into a given subspace, applying the projection again makes no difference. (A point inside the subspace is
not shifted by orthogonal projection onto that space because it is already the closest point in the subspace to
itself.)
Exercise 3
Here's a function that computes the orthonormal vectors using the GS algorithm given in the lecture.
import numpy as np
def gram_s