Python Introduction 2016

Introduction to Python for

Econometrics, Statistics and Data Analysis


3rd Edition

Kevin Sheppard
University of Oxford

Monday 21st August, 2017



© 2017 Kevin Sheppard


Changes since the Third Edition

Small typo fixes, thanks to Marton Huebler.

Fixed direct download of FRED data due to API changes, thanks to Jesper Termansen.
Notes to the 3rd Edition

This edition includes the following changes from the second edition (August 2014):

Rewritten installation section focused exclusively on using Continuum's Anaconda.

Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by
the Python packages required to analyze data and perform statistical analysis, and brings some new
useful features, such as a new operator for matrix multiplication (@).

Removed distinction between integers and longs in built-in data types chapter. This distinction is
only relevant for Python 2.7.

dot has been removed from most examples and replaced with @ to produce more readable code.

Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.

Verified all code working on current versions of core libraries using Python 3.5.

pandas

Updated syntax of pandas functions such as resample.


Added pandas Categorical.
Expanded coverage of pandas groupby.
Expanded coverage of date and time data types and functions.

New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels
includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using
ARIMA models.
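
As a brief illustration of the switch from dot to @ noted in the list above, the following minimal sketch compares the two forms on NumPy arrays. The array values and variable names are hypothetical and chosen only for illustration; the sketch assumes NumPy is installed and Python 3.5 or newer is in use.

```python
import numpy as np

# Two small conformable arrays (illustrative values only)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[0.0, 1.0], [1.0, 0.0]])

# Function-call style used in earlier editions
z_dot = np.dot(x, y)

# Operator style available from Python 3.5 onward
z_at = x @ y

# Both forms compute the same matrix product
same = np.array_equal(z_dot, z_at)
print(same)
```

The operator form tends to read closer to the mathematical expression, especially when several products are chained.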
Changes since the Second Edition

Fixed typos reported by a reader, thanks to Ilya Sorvachev.

Code verified against Anaconda 2.0.1.

Added diagnostic tools and a simple method to use external code in the Cython section.

Updated the Numba section to reflect recent changes.

Fixed some typos in the chapter on Performance and Optimization.

Added examples of joblib and IPython's cluster to the chapter on running code in parallel.

New chapter introducing object-oriented programming as a method to provide structure and
organization to related code.

Added seaborn to the recommended package list, and included it by default in the graphics
chapter.

Based on experience teaching Python to economics students, the recommended installation has
been simplified by removing the suggestion to use virtual environments. The discussion of virtual
environments has been moved to the appendix.

Rewrote parts of the pandas chapter.

Changed the Anaconda install to use both create and install, which shows how to install additional
packages.

Fixed some missing packages in the direct install.

Changed the configuration of IPython to reflect best practices.

Added subsection covering IPython profiles.

Added a small section about Spyder as a good starting IDE.


Notes to the 2nd Edition

This edition includes the following changes from the first edition (March 2012):

The preferred installation method is now Continuum Analytics' Anaconda. Anaconda is a complete
scientific stack and is available for all major platforms.

New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform
preliminary analysis. It also greatly simplifies importing and exporting data.

New chapter on advanced selection of elements from an array.

Numba provides just-in-time compilation for numeric Python code, which often produces large
performance gains when pure NumPy solutions are not available (e.g. looping code).

Dictionary, set and tuple comprehensions

Fixed numerous typos.

All code has been verified working against Anaconda 1.7.0.
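
For readers unfamiliar with the comprehensions mentioned in the list above, here is a minimal sketch in plain Python. The data and variable names are hypothetical. Note that Python has no dedicated tuple comprehension syntax, so the tuple case is written here as a generator expression passed to tuple().

```python
# Hypothetical data, used only for illustration
values = [1, 2, 2, 3]

# Dictionary comprehension: map each value to its square
squares_by_value = {v: v ** 2 for v in values}

# Set comprehension: duplicates collapse automatically
unique_squares = {v ** 2 for v in values}

# Tuple built from a generator expression
squares_tuple = tuple(v ** 2 for v in values)

print(squares_by_value, unique_squares, squares_tuple)
```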


Contents

1 Introduction
1.1 Background
1.2 Conventions
1.3 Important Components of the Python Scientific Stack
1.4 Setup
1.5 Using Python
1.6 Exercises
1.A Additional Installation Issues

2 Python 2.7 vs. 3 (and the rest)
2.1 Python 2.7 vs. 3.x
2.2 Intel Math Kernel Library and AMD's GPUOpen Libraries
2.3 Other Variants
2.A Relevant Differences between Python 2.7 and 3

3 Built-in Data Types
3.1 Variable Names
3.2 Core Native Data Types
3.3 Additional Container Data Types in the Standard Library
3.4 Python and Memory Management
3.5 Exercises

4 Arrays and Matrices
4.1 Array
4.2 Matrix
4.3 1-dimensional Arrays
4.4 2-dimensional Arrays
4.5 Multidimensional Arrays
4.6 Concatenation
4.7 Accessing Elements of an Array
4.8 Slicing and Memory Management
4.9 import and Modules
4.10 Calling Functions
4.11 Exercises

5 Basic Math
5.1 Operators
5.2 Broadcasting
5.3 Addition (+) and Subtraction (-)
5.4 Multiplication (*)
5.5 Matrix Multiplication (@)
5.6 Array and Matrix Division (/)
5.7 Exponentiation (**)
5.8 Parentheses
5.9 Transpose
5.10 Operator Precedence
5.11 Exercises

6 Basic Functions and Numerical Indexing
6.1 Generating Arrays and Matrices
6.2 Rounding
6.3 Mathematics
6.4 Complex Values
6.5 Set Functions
6.6 Sorting and Extreme Values
6.7 Nan Functions
6.8 Functions and Methods/Properties
6.9 Exercises

7 Special Arrays
7.1 Exercises

8 Array and Matrix Functions
8.1 Views
8.2 Shape Information and Transformation
8.3 Linear Algebra Functions
8.4 Exercises

9 Importing and Exporting Data
9.1 Importing Data using pandas
9.2 Importing Data without pandas
9.3 Saving or Exporting Data using pandas
9.4 Saving or Exporting Data without pandas
9.5 Exercises

10 Inf, NaN and Numeric Limits
10.1 inf and NaN
10.2 Floating point precision
10.3 Exercises

11 Logical Operators and Find
11.1 >, >=, <, <=, ==, !=
11.2 and, or, not and xor
11.3 Multiple tests
11.4 is*
11.5 Exercises

12 Advanced Selection and Assignment
12.1 Numerical Indexing
12.2 Logical Indexing
12.3 Performance Considerations and Memory Management
12.4 Assignment with Broadcasting
12.5 Exercises

13 Flow Control, Loops and Exception Handling
13.1 Whitespace and Flow Control
13.2 if ... elif ... else
13.3 for
13.4 while
13.5 try ... except
13.6 List Comprehensions
13.7 Tuple, Dictionary and Set Comprehensions
13.8 Exercises

14 Dates and Times
14.1 Creating Dates and Times
14.2 Dates Mathematics
14.3 Numpy

15 Graphics
15.1 seaborn
15.2 2D Plotting
15.3 Advanced 2D Plotting
15.4 3D Plotting
15.5 General Plotting Functions
15.6 Exporting Plots
15.7 Exercises

16 pandas
16.1 Data Structures
16.2 Statistical Functions
16.3 Time-series Data
16.4 Importing and Exporting Data
16.5 Graphics
16.6 Examples

17 Structured Arrays
17.1 Mixed Arrays with Column Names
17.2 Record Arrays

18 Custom Function and Modules
18.1 Functions
18.2 Variable Scope
18.3 Example: Least Squares with Newey-West Covariance
18.4 Anonymous Functions
18.5 Modules
18.6 Packages
18.7 PYTHONPATH
18.8 Python Coding Conventions
18.9 Exercises
18.A Listing of econometrics.py

19 Probability and Statistics Functions
19.1 Simulating Random Variables
19.2 Simulation and Random Number Generation
19.3 Statistics Functions
19.4 Continuous Random Variables
19.5 Select Statistics Functions
19.6 Select Statistical Tests
19.7 Exercises

20 Statistical Analysis with statsmodels
20.1 Regression

21 Non-linear Function Optimization
21.1 Unconstrained Optimization
21.2 Derivative-free Optimization
21.3 Constrained Optimization
21.4 Scalar Function Minimization
21.5 Nonlinear Least Squares
21.6 Exercises

22 String Manipulation
22.1 String Building
22.2 String Functions
22.3 Formatting Numbers
22.4 Regular Expressions
22.5 Safe Conversion of Strings

23 File System Operations
23.1 Changing the Working Directory
23.2 Creating and Deleting Directories
23.3 Listing the Contents of a Directory
23.4 Copying, Moving and Deleting Files
23.5 Executing Other Programs
23.6 Creating and Opening Archives
23.7 Reading and Writing Files
23.8 Exercises

24 Performance and Code Optimization
24.1 Getting Started
24.2 Timing Code
24.3 Vectorize to Avoid Unnecessary Loops
24.4 Alter the loop dimensions
24.5 Utilize Broadcasting
24.6 Use In-place Assignment
24.7 Avoid Allocating Memory
24.8 Inline Frequent Function Calls
24.9 Consider Data Locality in Arrays
24.10 Profile Long Running Functions
24.11 Exercises

25 Improving Performance using Numba
25.1 Quick Start
25.2 Supported Python Features
25.3 Supported NumPy Features
25.4 Diagnosing Performance Issues
25.5 Replacing Python function with C functions
25.6 Other Features of Numba
25.7 Exercises

26 Improving Performance using Cython
26.1 Diagnosing Performance Issues
26.2 Interfacing with External Code
26.3 Exercises

27 Executing Code in Parallel
27.1 map and related functions
27.2 multiprocessing
27.3 joblib
27.4 IPython's Parallel Cluster
27.5 Converting a Serial Program to Parallel
27.6 Other Concerns when executing in Parallel

28 Object-Oriented Programming (OOP)
28.1 Introduction
28.2 Class basics
28.3 Building a class for Autoregressions
28.4 Exercises

29 Other Interesting Python Packages
29.1 scikit-learn
29.2 mlpy
29.3 NLTK
29.4 pymc
29.5 pystan
29.6 pytz and babel
29.7 rpy2
29.8 PyTables and h5py
29.9 Theano

30 Examples
30.1 Estimating the Parameters of a GARCH Model
30.2 Estimating the Risk Premia using Fama-MacBeth Regressions
30.3 Estimating the Risk Premia using GMM
30.4 Outputting LaTeX

31 Quick Reference
31.1 Built-ins
31.2 NumPy (numpy)
31.3 SciPy
31.4 Matplotlib
31.5 pandas
31.6 IPython


Chapter 1

Introduction

1.1 Background

These notes are designed for someone new to statistical computing wishing to develop a set of skills
necessary to perform original research using Python. They should also be useful for students, researchers or
practitioners who require a versatile platform for econometrics, statistics or general numerical analysis
(e.g. numeric solutions to economic models or model simulation).
Python is a popular general-purpose programming language that is well suited to a wide range of
problems.¹ Recent developments have extended Python's range of applicability to econometrics, statistics, and
general numerical analysis. Python with the right set of add-ons is comparable to domain-specific
languages such as R, MATLAB or Julia. If you are wondering whether you should bother with Python (or
another language), an incomplete list of considerations includes:
You might want to consider R if:

You want to apply statistical methods. The statistics library of R is second to none, and R is clearly
at the forefront of new statistical algorithm development, meaning you are most likely to find that
new(ish) procedure in R.

Performance is of secondary importance.

Free is important.

You might want to consider MATLAB if:

Commercial support and a clear channel to report issues are important.

Documentation and organization of modules are more important than the breadth of algorithms
available.

Performance is an important concern. MATLAB has optimizations, such as Just-in-Time (JIT)
compilation of loops, which are not automatically available in most other packages.

You might want to consider Julia if:


1 According to the ranking on http://www.tiobe.com/tiobe-index/, Python is the 5th most popular language.
http://langpop.corger.nl/ ranks Python as 4th or 5th.

Performance in an interactive language is your most important concern.

You don't mind learning enough Python to interface with Python packages. The Julia ecosystem is
in its infancy and a bridge to Python is used to provide important missing features.

You like living on the bleeding edge and aren't worried about code breaking across new versions of
Julia.

You like to do most things yourself.

Having read the reasons to choose another package, you may wonder why you should consider Python.

You need a language which can act as an end-to-end solution that allows access to web-based services,
database servers, data management and processing and statistical computation. Python can
even be used to write server-side apps such as a dynamic website (see e.g. http://stackoverflow.com),
apps for desktop-class operating systems with graphical user interfaces, or apps for tablets and
phones (iOS and Android).

Data handling and manipulation, especially cleaning and reformatting, is an important concern.
Python is substantially more capable at data set construction than either R or MATLAB.

Performance is a concern, but not at the top of the list.2

Free is an important consideration: Python can be freely deployed, even to 100s of servers in a
cloud-based cluster (e.g. Amazon Web Services, Google Compute or Azure).

Knowledge of Python, as a general purpose language, is complementary to R/MATLAB/Julia/Ox/GAUSS/Stata.

1.2 Conventions

These notes will follow two conventions.

1. Code blocks will be used throughout.


"""A docstring
"""

# Comments appear in a different color

# Reserved keywords are highlighted


and as assert break class continue def del elif else
except exec finally for from global if import in is
lambda not or pass print raise return try while with yield

# Common functions and classes are highlighted in a
# different color. Note that these are not reserved,

2
Python performance can be made arbitrarily close to C using a variety of methods, including Numba (pure python), Cython
(C/Python creole language) or directly calling C code. Moreover, recent advances have substantially closed the gap with respect
to other Just-in-Time compiled languages such as MATLAB.
# and can be used although best practice would be
# to avoid them if possible
array matrix range list True False None

# Long lines are indented
some_text = 'This is a very, very, very, very, very, very, very, very, very, very, very, very
    long line.'

2. When a code block contains >>>, this indicates that the command is running in an interactive IPython
session. Output will often appear after the console command, and will not be preceded by a command
indicator.
>>> x = 1.0
>>> x + 2
3.0

If the code block does not contain the console session indicator, the code contained in the block is
intended to be executed in a standalone Python file.
import numpy as np

x = np.array([1,2,3,4])
y = np.sum(x)
print(x)
print(y)

1.3 Important Components of the Python Scientific Stack

1.3.1 Python

Python 3.5 (or later) is required. This provides the core Python interpreter. Most of the examples should
work with the latest version of Python 2.7 as well.

1.3.2 NumPy

NumPy provides a set of array and matrix data types which are essential for statistics, econometrics and
data analysis.
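
As a minimal sketch of what these data types look like in practice, the block below builds a small array and a vector and multiplies them using the @ matrix multiplication operator available in Python 3.5 and later. The variable names and values are illustrative only.

```python
import numpy as np

# A 2 by 2 array built from a nested list (values chosen for illustration)
x = np.array([[1.0, 2.0], [3.0, 4.0]])
# A 2-element vector
b = np.array([1.0, 1.0])

# Matrix-vector product using the @ operator (Python 3.5+)
y = x @ b
print(y)
```

Each element of y is the sum of the corresponding row of x, since b is a vector of ones.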

1.3.3 SciPy

SciPy contains a large number of routines needed for analysis of data. The most important include a wide
range of random number generators, linear algebra routines, and optimizers. SciPy depends on NumPy.

1.3.4 Jupyter and IPython

IPython provides an interactive Python environment which enhances productivity when developing code
or performing interactive data analysis. Jupyter provides a generic set of infrastructure that enables IPython
to be run in a variety of settings including an improved console (QtConsole) or in an interactive web-
browser based notebook.
1.3.5 matplotlib and seaborn

matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting. seaborn is
a Python package that improves the default appearance of matplotlib plots without any additional code.

1.3.6 pandas

pandas provides high-performance data structures.

1.3.7 statsmodels

statsmodels is pandas-aware and provides models used in the statistical analysis of data including linear
regression, Generalized Linear Models (GLMs), and time-series models (e.g., ARIMA).

1.3.8 Performance Modules

A number of modules are available to help with performance. These include Cython and Numba. Cython
is a Python module which facilitates using a simple Python-derived creole to write functions that can be
compiled to native (C code) Python extensions. Numba uses a method of just-in-time compilation to
translate a subset of Python to native code using Low-Level Virtual Machine (LLVM).

1.4 Setup

The recommended method to install the Python scientific stack is to use Continuum Analytics' Anaconda.
Appendix 1.A.3 describes a more complex installation procedure with instructions for directly installing
Python and the required modules when it is not possible to install Anaconda.

Continuum Analytics' Anaconda

Anaconda, a free product of Continuum Analytics (www.continuum.io), is a virtually complete scientific
stack for Python. It includes both the core Python interpreter and standard libraries as well as most
modules required for data analysis. Anaconda is free to use, and modules for accelerating the performance of
linear algebra on Intel processors using the Math Kernel Library (MKL) are provided. Continuum Analytics
also provides other high-performance modules for reading large data files or using the GPU to further
accelerate performance for an additional, modest charge. Most importantly, installation is extraordinarily
easy on Windows, Linux, and OS X. Anaconda is also simple to update to the latest version using

conda update conda
conda update anaconda

Windows

Installation on Windows requires downloading the installer and running it. Anaconda comes in both Python
2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. These instructions use ANACONDA
to indicate the Anaconda installation directory (e.g. the default is C:\Anaconda). Once the setup has
completed, open a command prompt (cmd.exe) and run
cd ANACONDA\Scripts
conda update conda
conda update anaconda
conda install html5lib seaborn

which will first ensure that Anaconda is up-to-date. conda install can be used later to install other pack-
ages that may be of interest. Note that if Anaconda is installed into a directory other than the default, the
full path should not contain Unicode characters or spaces.

Notes

The recommended settings for installing Anaconda on Windows are:

Install for all users, which requires admin privileges. If these are not available, then choose the Just
for me option, but be wary of installing to a path that contains non-ASCII characters, which can
cause issues.

Add Anaconda to the System PATH - This is important to ensure that Anaconda commands can be
run from the command prompt.

Register Anaconda as the system Python unless you have a specific reason not to (unlikely).

If Anaconda is not added to the system path, it is necessary to add the ANACONDA and ANACONDA\Scripts
directories to the PATH using
set PATH=ANACONDA;ANACONDA\Scripts;%PATH%

before running Python programs.
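
Whether a directory is already on the PATH can also be checked from Python itself. The following is a minimal sketch using only the standard library. The helper on_path and the directory anaconda_dir are hypothetical names introduced for illustration, and the location used is an assumption that should be replaced with the actual installation directory.

```python
import os
import shutil


def on_path(directory, path=None):
    """Return True if directory appears as an entry of a PATH-style string."""
    if path is None:
        path = os.environ.get("PATH", "")

    def normalize(p):
        # Normalize separators and case so that equivalent paths compare equal
        return os.path.normcase(os.path.normpath(p))

    entries = [normalize(p) for p in path.split(os.pathsep) if p]
    return normalize(directory) in entries


# Hypothetical installation directory; replace with the actual location
anaconda_dir = os.path.join(os.path.expanduser("~"), "anaconda")
print("Anaconda directory on PATH:", on_path(anaconda_dir))
# shutil.which returns None when the command cannot be found on the PATH
print("conda found at:", shutil.which("conda"))
```

os.pathsep is ; on Windows and : on Linux and OS X, so the same check works on all three platforms.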

Linux and OS X

Installation on Linux requires executing


bash Anaconda3-x.y.z-Linux-ISA.sh

where x.y.z will depend on the version being installed and ISA will be either x86 or, more likely, x86_64.
Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is the preferred choice. The
OS X installer is available either as a GUI installer (pkg format) or as a bash installer which is installed
in an identical manner to the Linux installation. It is strongly recommended that anaconda/bin is
prepended to the path. This can be performed on a session-by-session basis by entering
export PATH=ANACONDA/bin:$PATH

On Linux this change can be made permanent by entering this line in .bashrc which is a hidden file located
in ~/. On OS X, this line can be added to .bash_profile which is located in the home directory (~/).3
After installation completes, execute
conda update conda
conda update anaconda
conda install html5lib seaborn

3
Use the appropriate settings file if using a different shell (e.g. .zshrc for zsh).
which will first ensure that Anaconda is up-to-date and then install the Intel Math Kernel Library-linked
modules, which provide substantial performance improvements. This package requires a license, which
is free to academic users and low cost to others. If acquiring a license is not possible, omit this line.
conda install can be used later to install other packages that may be of interest.

Notes

All instructions for OS X and Linux assume that ANACONDA/bin has been added to the path. If this is not
the case, it is necessary to run

cd ANACONDA
cd bin

and then all commands must be prepended by ./ as in

./conda update conda

1.5 Using Python

Python can be programmed using an interactive session using IPython or by directly executing Python
scripts (text files that end with the extension .py) using the Python interpreter.

1.5.1 Python and IPython

Most of this introduction focuses on interactive programming, which has some distinct advantages when
learning a language. The standard Python interactive console is very basic and does not support useful
features such as tab completion. IPython, and especially the QtConsole version of IPython, transforms
the console into a highly productive environment which supports a number of useful features:

Tab completion - After entering 1 or more characters, pressing the tab button will bring up a list of
functions, packages, and variables which match the typed text. If the list of matches is large, pressing
tab again allows the arrow keys to be used to browse and select a completion.

Magic functions, which make tasks such as navigating the local file system (using %cd ~/directory/
or just cd ~/directory/ assuming that %automagic is on) or running other Python programs (using run
program.py) simple. Entering %magic inside an IPython session will produce a detailed description
of the available functions. Alternatively, %lsmagic produces a succinct list of available magic commands.
The most useful magic functions are

cd - change directory

edit filename - launch an editor to edit filename


ls or ls pattern - list the contents of a directory

run filename - run the Python file filename

timeit - time the execution of a piece of code or function


Integrated help - When using the QtConsole, calling a function provides a view of the top of the help
function. For example, entering mean( will produce a view of the top 20 lines of its help text.

Inline figures - Both the QtConsole and the notebook can also display figures inline, which produces a
tidy, self-contained environment. This can be enabled by entering %matplotlib inline in an IPython
session.

The special variable _ contains the last result in the console, and so the most recent result can be
saved to a new variable using the syntax x = _.

Support for profiles, which provide further customization of sessions.
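
Outside of IPython, timings similar to those produced by the timeit magic can be obtained using the timeit module in the standard library. The block below is a minimal sketch. The function sum_of_squares is a made-up example and the number of repetitions is arbitrary.

```python
import timeit


def sum_of_squares(n):
    """Sum the squares of the integers 0, 1, ..., n-1."""
    return sum(i * i for i in range(n))


# Run the call 1000 times and record the total elapsed time in seconds
repetitions = 1000
elapsed = timeit.timeit(lambda: sum_of_squares(1000), number=repetitions)
print("Average time per call: {:.2e} seconds".format(elapsed / repetitions))
```

The reported time will vary from run to run and across machines, so only relative comparisons between alternative implementations are meaningful.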

1.5.2 Launching IPython

OS X and Linux

IPython can be started by running


ipython

in the terminal. IPython using the QtConsole can be started using


jupyter qtconsole

A single line launcher on OS X or Linux can be constructed using


bash -c "jupyter qtconsole"

This single line launcher can be saved as filename.command, where filename is a meaningful name (e.g.
IPython-Terminal), to create a launcher on OS X, which must then be made executable by entering the command
chmod 755 /FULL/PATH/TO/filename.command

The same command can be used to create a Desktop launcher on Ubuntu by running


sudo apt-get install --no-install-recommends gnome-panel
gnome-desktop-item-edit ~/Desktop/ --create-new

and then using the command as the Command in the dialog that appears.

Windows (Anaconda)

To run IPython, open cmd and enter ipython, or select IPython in the start menu. Starting IPython using the
QtConsole is similar and is simply called QtConsole in the start menu. Launching IPython from the start menu
should create a window similar to that in figure 1.1.
Next, run

jupyter qtconsole --generate-config

in the terminal or command prompt to generate a file named jupyter_qtconsole_config.py. This file contains
settings that are useful for customizing the QtConsole window. A few recommended modifications are
Figure 1.1: IPython running in the standard Windows console (cmd.exe).


c.IPythonWidget.font_size = 11
c.IPythonWidget.font_family = "Bitstream Vera Sans Mono"
c.JupyterWidget.syntax_style = "monokai"

These commands assume that the Bitstream Vera fonts have been locally installed, which are available
from http://ftp.gnome.org/pub/GNOME/sources/ttf-bitstream-vera/1.10/. Opening QtConsole should
create a window similar to that in figure 1.2 (although the appearance might differ if you did not use the
recommended configuration).

1.5.3 Getting Help

Help is available in IPython sessions using help(function). Some functions (and modules) have very long
help files. When using IPython, these can be paged using the command ?function or function? so that the
text can be scrolled using page up and down and q to quit. ??function or function?? can be used to display
the entire function including both the docstring and the code.
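
The same information can be retrieved in a standalone program using the inspect module from the standard library. This is a minimal sketch rather than a description of how IPython implements ? and ??. The function statistics.mean is used only because it ships with Python and is written in Python, so its source is normally available.

```python
import inspect
import statistics

# The docstring is the main text displayed by help(function)
doc = inspect.getdoc(statistics.mean)
print(doc.splitlines()[0])

# The source code, similar in spirit to what ??function displays
# (only available for functions written in Python, not compiled code)
source = inspect.getsource(statistics.mean)
print(source.splitlines()[0])
```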

1.5.4 Running Python programs

While interactive programming is useful for learning a language or quickly developing some simple code,
complex projects require the use of complete programs. Programs can be run either using the IPython
magic word %run program.py or by directly launching the Python program using the standard interpreter
using python program.py. The advantage of using the IPython environment is that the variables used in
the program can be inspected after the program run has completed. Directly calling Python will run the
program and then terminate, and so it is necessary to output any important results to a file so that they
can be viewed later.4
4
Programs can also be run in the standard Python interpreter using the command:
exec(compile(open('filename.py').read(),'filename.py','exec'))
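
Variables created by a program can also be inspected from another Python program using the runpy module in the standard library, which is similar in spirit to inspecting variables after using %run. The following is a minimal sketch. The file name program.py and its contents are placeholders written to a temporary directory purely for illustration.

```python
import os
import runpy
import tempfile

# Write a small, throwaway program to a temporary file
program = "x = 1.0\ny = x + 2\n"
with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "program.py")
    with open(filename, "w") as f:
        f.write(program)
    # run_path executes the file and returns its global variables in a dictionary
    variables = runpy.run_path(filename)

print(variables["y"])
```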
Figure 1.2: IPython running in a QtConsole session.


To test that you can successfully execute a Python program, input the code in the block below into a
text file and save it as firstprogram.py.
# First Python program
import time

print('Welcome to your first Python program.')
input('Press enter to exit the program.')
print('Bye!')
time.sleep(2)

Once you have saved this file, open the console, navigate to the directory where you saved the file and enter
python firstprogram.py. Finally, run the program in IPython by first launching IPython, then using
%cd to change to the location of the program, and finally executing the program using %run firstprogram.py.

1.5.5 %pylab and %matplotlib

When writing Python code, only a small set of core functions and variable types are available in the in-
terpreter. The standard method to access additional variable types or functions is to use imports, which
explicitly allow access to specific packages or functions. While it is best practice to only import required
functions or packages, there are many functions in multiple packages that are commonly encountered
in these notes. Pylab is a collection of common NumPy, SciPy and Matplotlib functions that can be eas-
ily imported using a single command in an IPython session, %pylab. This is nearly equivalent to calling
from pylab import *, since it also sets the backend that is used to draw plots. The backend can be manu-
ally set using %pylab backend where backend is one of the available backends (e.g., qt5 or inline). Similarly
%matplotlib backend can be used to set just the backend without importing all of the modules and
functions that come with %pylab.
Most chapters assume that %pylab has been called so that functions provided by NumPy can be called
without explicitly importing them.

Figure 1.3: A successful test that matplotlib, IPython, NumPy and SciPy were all correctly installed.
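
When %pylab has not been called, for example in a standalone Python file, the same functions are available through explicit imports. A minimal sketch of the explicit-import form, assuming the common convention of importing NumPy as np:

```python
import numpy as np

# With %pylab, randn and mean are available without a prefix.
# In a standalone file they are accessed through the numpy module.
x = np.random.randn(100, 100)
y = np.mean(x, 0)  # mean of each column
print(y.shape)
```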

1.5.6 Testing the Environment

To make sure that you have successfully installed the required components, run IPython using the shortcut
or by running ipython or jupyter qtconsole in a terminal window. Enter the following commands,
one at a time (the meaning of the commands will be covered later in these notes).
>>> %pylab qt5
>>> x = randn(100,100)
>>> y = mean(x,0)
>>> import seaborn
>>> plot(y)
>>> import scipy as sp

If everything was successfully installed, you should see something similar to figure 1.3.
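
A complementary check that does not require a graphical backend is to ask Python whether each module can be located. The block below is a minimal sketch using the standard library. The helper find_missing is a hypothetical name, and the list of modules is simply the set of components described in this chapter.

```python
import importlib.util

# Modules discussed in this chapter
modules = ["numpy", "scipy", "matplotlib", "pandas",
           "statsmodels", "seaborn", "IPython"]


def find_missing(names):
    """Return the names that cannot be located by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = find_missing(modules)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All modules were found.")
```

Any module reported as missing can be installed using conda install.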

1.5.7 jupyter notebook

A jupyter notebook is a simple and useful method to share code with others. Notebooks allow for a fluid
synthesis of formatted text, typeset mathematics (using LaTeX via MathJax) and Python. The primary method
for using notebooks is through a web interface, which allows creation, deletion, export and interactive
editing of notebooks.
Figure 1.4: The default IPython Notebook screen showing two notebooks.

To launch the jupyter notebook server, open a command prompt or terminal and enter

jupyter notebook

This command will start the server and open the default browser which should be a modern version of
Chrome (preferable), Chromium or Firefox. If the default browser is Safari, Internet Explorer or Edge, the
URL can be copied and pasted into Chrome. The first screen that appears will look similar to figure 1.4,
except that the list of notebooks will be empty. Clicking on New Notebook will create a new notebook,
which, after a bit of typing, can be transformed to resemble figure 1.5. Notebooks can be imported by
dragging and dropping and exported from the menu inside a notebook.

1.5.8 Integrated Development Environments

As you progress in Python and begin writing more sophisticated programs, you will find that using an In-
tegrated Development Environment (IDE) will increase your productivity. Most contain productivity en-
hancements such as built-in consoles, code completion (or IntelliSense, for completing function names)
and integrated debugging. Discussion of IDEs is beyond the scope of these notes, although Spyder is a
reasonable choice (free, cross-platform). Aptana Studio is another free alternative. My preferred IDE is
PyCharm, which has a community edition that is free for use (the professional edition is low cost for aca-
demics).
Figure 1.5: An IPython notebook showing formatted markdown, LaTeX math and cells containing code.

Spyder

Spyder is an IDE specialized for use in scientific applications of Python rather than for general purpose
application development. This is both an advantage and a disadvantage when compared to a full featured
IDE such as PyCharm, Python Tools for Visual Studio (PTVS), PyDev or Aptana Studio. The main advantage
is that many powerful but complex features are not integrated into Spyder, and so the learning curve is
much shallower. The disadvantage is similar - in more complex projects, or if developing something that is
not straight scientific Python, Spyder is less capable. However, netting these two, Spyder is almost certainly
the IDE to use when starting Python, and it is always relatively simple to migrate to a sophisticated IDE if
needed.
Spyder is started by entering spyder in the terminal or command prompt. A window similar to that in
figure 1.6 should appear. The main components are the editor (1), the object inspector (2), which dynam-
ically will show help for functions that are used in the editor, and the console (3). By default, Spyder opens
a standard Python console, although it also supports using the more powerful IPython console. The object
inspector window, by default, is grouped with a variable explorer, which shows the variables that are in
memory and the file explorer, which can be used to navigate the file system. The console is grouped with
an IPython console window (needs to be activated first using the Interpreters menu along the top edge),
and the history log which contains a list of commands executed. The buttons along the top edge facilitate
saving code, running code and debugging.

1.6 Exercises

1. Install Python.

2. Test the installation using the code in section 1.5.6.

3. Configure IPython using the start-up script in section ??.


Figure 1.6: The default Spyder IDE on Windows.


4. Customize IPython QtConsole using a font or color scheme. More customizations can be found by
running ipython -h.

5. Explore tab completion in IPython by entering a<TAB> to see the list of functions which start with
a and are loaded by pylab. Next try i<TAB>, which will produce a list longer than the screen; press
ESC to exit the pager.

6. Launch IPython Notebook and run code in the testing section.

7. Open Spyder and explore its features.

1.A Additional Installation Issues

1.A.1 Frequently Encountered Problems

All

Whitespace sensitivity

Python is whitespace sensitive and so indentation, either spaces or tabs, affects how Python interprets
files. The configuration files, e.g. ipython_config.py, are plain Python files and so are sensitive to whitespace.
Introducing white space before the start of a configuration option will produce an error, so ensure there
is no whitespace before active lines of a configuration.
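
The effect of stray leading whitespace can be reproduced without editing a real configuration file. The following is a minimal sketch in which two made-up configuration lines are compiled as Python source. The helper is_valid and the option name are illustrative assumptions, not part of IPython.

```python
# Two versions of the same configuration line (contents are illustrative)
good = "font_size = 11\n"
bad = "    font_size = 11\n"  # leading spaces before an active line


def is_valid(source):
    """Return True if the text compiles as Python source code."""
    try:
        compile(source, "<config>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(is_valid(good))
print(is_valid(bad))
```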
Windows

Spaces in path

Python may not work correctly when directories in the path have spaces.

Unicode in path

Python does not always work well when a path contains Unicode characters, which might occur in a user
name. While this isn't an issue for installing Python or Anaconda, it is an issue for IPython which looks in
c:\user\username\.ipython for configuration files. The solution is to set the HOME variable, before
launching IPython, to a path that has only ASCII characters.

mkdir c:\anaconda\ipython_config
set HOME=c:\anaconda\ipython_config
c:\Anaconda\Scripts\activate econometrics
ipython profile create econometrics
ipython --profile=econometrics

The path used in set HOME=c:\anaconda\ipython_config can be any path with directories containing only ASCII
characters, and the command can also be added to any batch file to achieve the same effect.
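
Whether a path contains only ASCII characters can be checked directly in Python. This is a minimal sketch. The helper is_ascii_path is a hypothetical name, and the home directory is used only as an example of a path worth checking.

```python
import os


def is_ascii_path(path):
    """Return True if every character in path is an ASCII character."""
    try:
        path.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


home = os.path.expanduser("~")
print(home, "contains only ASCII characters:", is_ascii_path(home))
```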

OS X

Installing Anaconda to the root of the partition

If the user account used is running as root, then Anaconda may install to /anaconda and not ~/anaconda by
default. Best practice is not to run as root, although in principle this is not a problem, and /anaconda can
be used in place of ~/anaconda in any of the instructions.

1.A.2 register_python.py

A complete listing of register_python.py is included in this appendix.


# -*- encoding: utf-8 -*-
#
# Script to register Python 2.0 or later for use with win32all
# and other extensions that require Python registry settings
#
# Adapted by Ned Batchelder from a script
# written by Joakim Law for Secret Labs AB/PythonWare
#
# source:
# http://www.pythonware.com/products/works/articles/regpy20.htm

import sys
from winreg import *  # this module is named _winreg in Python 2

# tweak as necessary
# major.minor version, e.g. 3.5 (also correct for two-digit minor versions)
version = "%d.%d" % sys.version_info[:2]
installpath = sys.prefix

regpath = "SOFTWARE\\Python\\Pythoncore\\%s\\" % (version)
installkey = "InstallPath"
pythonkey = "PythonPath"
pythonpath = "%s;%s\\Lib\\;%s\\DLLs\\" % (
    installpath, installpath, installpath
)

def RegisterPy():
    # Open the registry key if it exists, otherwise try to create it
    try:
        reg = OpenKey(HKEY_LOCAL_MACHINE, regpath)
    except EnvironmentError:
        try:
            reg = CreateKey(HKEY_LOCAL_MACHINE, regpath)
        except Exception as e:
            print("*** Unable to register: %s" % e)
            return

    SetValue(reg, installkey, REG_SZ, installpath)
    SetValue(reg, pythonkey, REG_SZ, pythonpath)
    CloseKey(reg)
    print("--- Python %s at %s is now registered!" % (version, installpath))

if __name__ == "__main__":
    RegisterPy()

1.A.3 Setup using Virtual Environments

The simplest method to install the Python scientific stack is to use Continuum Analytics' Anaconda
directly. These instructions describe alternative installation options using virtual environments, which
allow alternative configurations to simultaneously co-exist on a single system. The primary advantage of a
virtual environment is that it allows package versions to be frozen so that upgrading a module
or all of Anaconda does not upgrade the packages in a particular virtual environment.

Windows

Installation on Windows requires downloading the installer and running it. These instructions use
ANACONDA to indicate the Anaconda installation directory (e.g. the default is C:\Anaconda). Once the setup
has completed, open a command prompt (cmd.exe) and run

cd ANACONDA
conda update conda
conda update anaconda
conda create -n econometrics qtconsole notebook matplotlib numpy pandas scipy spyder statsmodels
conda install -n econometrics cython lxml nose numba numexpr pytables sphinx xlrd xlwt html5lib
    seaborn

which will first ensure that Anaconda is up-to-date and then create a virtual environment named econometrics.
Using a virtual environment is a best practice and is important since component updates can
lead to errors in otherwise working programs due to backward incompatible changes in a module. The
long list of modules in the conda create command includes the core modules. conda install contains the
remaining packages and is shown as an example of how to add packages to an existing virtual environment
after it has been created. It is also possible to install all available Anaconda packages using the command
conda create -n econometrics anaconda.
The econometrics environment must be activated before use. This is accomplished by running
ANACONDA\Scripts\activate.bat econometrics

from the command prompt, which prepends [econometrics] to the prompt as an indication that the virtual
environment is active. Activate the econometrics environment and then run
cd c:\
ipython

which will open an IPython session using the newly created virtual environment.
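
To confirm which installation a session is actually using, the interpreter location can be printed from within Python. This is a minimal sketch using the standard library. That the final directory of sys.prefix matches the environment name is an assumption that typically holds for conda environments but is not guaranteed.

```python
import os
import sys

# sys.executable is the full path of the running Python interpreter
print("Interpreter:", sys.executable)
# sys.prefix is the root directory of the running Python installation
print("Prefix:", sys.prefix)

# For a conda environment this is typically the environment name
env_name = os.path.basename(os.path.normpath(sys.prefix))
print("Environment directory name:", env_name)
```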
Virtual environments can also be created using specific versions of packages using pinning. For example,
to create a virtual environment named python2 using Python 2.7 and NumPy 1.10,

conda create -n python2 python=2.7 numpy=1.10 scipy pandas

which will install the requested versions of Python and NumPy as well as the latest version of SciPy and
pandas that are compatible with the pinned versions.

Linux and OS X

Installation on Linux requires executing


bash Anaconda3-x.y.z-Linux-ISA.sh

where x.y.z will depend on the version being installed and ISA will be either x86 or more likely x86_64.
The OS X installer is available either as a GUI installer (pkg format) or as a bash installer which is installed
in an identical manner to the Linux installation. After installation completes, change to the folder where
Anaconda installed (written here as ANACONDA, default ~/anaconda) and execute
cd ANACONDA
cd bin
./conda update conda
./conda update anaconda
./conda create -n econometrics qtconsole notebook matplotlib numpy pandas scipy spyder statsmodels
./conda install -n econometrics cython lxml nose numba numex