Introduction to Python for

Econometrics, Statistics and Data Analysis


4th Edition

Kevin Sheppard
University of Oxford

Thursday 31st December, 2020



©2020 Kevin Sheppard


Solutions and Other Material

Solutions
Solutions for exercises and some extended examples are available on GitHub.
https://github.com/bashtage/python-for-econometrics-statistics-data-analysis

Introductory Course
A self-paced introductory course is available on GitHub in the course/introduction folder. Solutions are
available in the solutions/introduction folder.
https://github.com/bashtage/python-introduction/

Video Demonstrations

The introductory course is accompanied by video demonstrations of each lesson on YouTube.


https://www.youtube.com/playlist?list=PLVR_rJLcetzkqoeuhpIXmG9uQCtSoGBz1

Using Python for Financial Econometrics


A self-paced course that shows how Python can be used in econometric analysis, with an emphasis on financial
econometrics, is also available on GitHub in the course/autumn and course/winter folders.
https://github.com/bashtage/python-introduction/
Changes

Changes since the Fourth Edition


• Added a discussion of context managers using the with statement.

• Switched examples to prefer the context manager syntax to reflect best practices.
Notes to the Fourth Edition

Changes in the Fourth Edition


• Python 3.8 is the recommended version. The notes require Python 3.6 or later, and all references to
Python 2.7 have been removed.

• Removed references to NumPy’s matrix class and clarified that it should not be used.

• Verified that all code and examples work correctly against 2020 versions of modules. The notable
packages and their versions are:

– Python 3.8 (Preferred version), 3.6 (Minimum version)


– NumPy: 1.19.1
– SciPy: 1.5.3
– pandas: 1.1
– matplotlib: 3.3

• Expanded description of model classes and statistical tests in statsmodels that are most relevant for
econometrics. TODO

• Expanded the list of packages of interest to researchers working in statistics, econometrics and machine
learning. TODO

• Introduced f-Strings in Section 21.3.3 as the preferred way to format strings using modern Python.

• Added minimize as the preferred interface for non-linear function optimization in Chapter 20. TODO

Changes since the Third Edition


• Verified that all code and examples work correctly against 2019 versions of modules. The notable
packages and their versions are:

– Python 3.7 (Preferred version)


– NumPy: 1.16
– SciPy: 1.3
– pandas: 0.25
– matplotlib: 3.1

• Python 2.7 support has been officially dropped, although most examples continue to work with 2.7. Do
not use Python 2.7 in 2019 for numerical code.

• Small typo fixes, thanks to Marton Huebler.

• Fixed direct download of FRED data due to API changes, thanks to Jesper Termansen.

• Thanks to Bill Tubbs for a detailed read and multiple typo reports.

• Updated to changes in line profiler (see Ch. 23)

• Updated deprecations in pandas.

• Removed hold from plotting chapter since this is no longer required.

• Thanks to Gen Li for multiple typo reports.

• Tested all code on Python 3.6. Code has been tested against the current set of modules installed by conda
as of February 2018. The notable packages and their versions are:

– NumPy: 1.13
– pandas: 0.22
Notes to the Third Edition

This edition includes the following changes from the second edition (August 2014).

Changes in the Third Edition


• Rewritten installation section focused exclusively on using Continuum’s Anaconda.

• Python 3.5 is the default version of Python instead of 2.7. Python 3.5 (or newer) is well supported by
the Python packages required to analyze data and perform statistical analysis, and brings some useful new
features, such as a new operator for matrix multiplication (@).

• Removed distinction between integers and longs in built-in data types chapter. This distinction is only
relevant for Python 2.7.

• dot has been removed from most examples and replaced with @ to produce more readable code.

• Split Cython and Numba into separate chapters to highlight the improved capabilities of Numba.

• Verified all code working on current versions of core libraries using Python 3.5.

• pandas

– Updated syntax of pandas functions such as resample.


– Added pandas Categorical.
– Expanded coverage of pandas groupby.
– Expanded coverage of date and time data types and functions.

• New chapter introducing statsmodels, a package that facilitates statistical analysis of data. statsmodels
includes regression analysis, Generalized Linear Models (GLM) and time-series analysis using ARIMA
models.

Changes since the Second Edition


• Fixed typos reported by a reader – thanks to Ilya Sorvachev

• Code verified against Anaconda 2.0.1.

• Added diagnostic tools and a simple method to use external code in the Cython section.

• Updated the Numba section to reflect recent changes.

• Fixed some typos in the chapter on Performance and Optimization.



• Added examples of joblib and IPython’s cluster to the chapter on running code in parallel.

• New chapter introducing object-oriented programming as a method to provide structure and organization
to related code.

• Added seaborn to the recommended package list, and included it by default in the graphics chapter.

• Based on experience teaching Python to economics students, the recommended installation has been
simplified by removing the suggestion to use virtual environments. The discussion of virtual environments
has been moved to the appendix.

• Rewrote parts of the pandas chapter.

• Changed the Anaconda install to use both create and install, which shows how to install additional
packages.

• Fixed some missing packages in the direct install.

• Changed the configuration of IPython to reflect best practices.

• Added subsection covering IPython profiles.

• Small section about Spyder as a good starting IDE.


Notes to the Second Edition

This edition includes the following changes from the first edition (March 2012).

Changes in the Second Edition


• The preferred installation method is now Continuum Analytics’ Anaconda. Anaconda is a complete
scientific stack and is available for all major platforms.

• New chapter on pandas. pandas provides a simple but powerful tool to manage data and perform
preliminary analysis. It also greatly simplifies importing and exporting data.

• New chapter on advanced selection of elements from an array.

• Numba provides just-in-time compilation for numeric Python code which often produces large
performance gains when pure NumPy solutions are not available (e.g. looping code).

• Dictionary, set and tuple comprehensions

• Fixed numerous typos.

• All code has been verified working against Anaconda 1.7.0.


Contents

1 Introduction
1.1 Background
1.2 Conventions
1.3 Important Components of the Python Scientific Stack
1.4 Setup
1.5 Using Python
1.6 Exercises
1.A Additional Installation Issues

2 Built-in Data Types
2.1 Variable Names
2.2 Core Native Data Types
2.3 Additional Container Data Types in the Standard Library
2.4 Python and Memory Management
2.5 Exercises

3 Arrays
3.1 Array
3.2 1-dimensional Arrays
3.3 2-dimensional Arrays
3.4 Multidimensional Arrays
3.5 Concatenation
3.6 Accessing Elements of an Array
3.7 Slicing and Memory Management
3.8 import and Modules
3.9 Calling Functions
3.10 Exercises

4 Basic Math
4.1 Operators
4.2 Broadcasting
4.3 Addition (+) and Subtraction (-)
4.4 Multiplication (*)
4.5 Matrix Multiplication (@)
4.6 Array and Matrix Division (/)
4.7 Exponentiation (**)
4.8 Parentheses
4.9 Transpose
4.10 Operator Precedence
4.11 Exercises

5 Basic Functions and Numerical Indexing
5.1 Generating Arrays
5.2 Rounding
5.3 Mathematics
5.4 Complex Values
5.5 Set Functions
5.6 Sorting and Extreme Values
5.7 Nan Functions
5.8 Functions and Methods/Properties
5.9 Exercises

6 Special Arrays
6.1 Exercises

7 Array Functions
7.1 Shape Information and Transformation
7.2 Linear Algebra Functions
7.3 Views
7.4 Exercises

8 Importing and Exporting Data
8.1 Importing Data using pandas
8.2 Importing Data without pandas
8.3 Saving or Exporting Data using pandas
8.4 Saving or Exporting Data without pandas
8.5 Exercises

9 Inf, NaN and Numeric Limits
9.1 inf and NaN
9.2 Floating point precision
9.3 Exercises

10 Logical Operators and Find
10.1 >, >=, <, <=, ==, !=
10.2 and, or, not and xor
10.3 Multiple tests
10.4 is*
10.5 Exercises

11 Advanced Selection and Assignment
11.1 Numerical Indexing
11.2 Logical Indexing
11.3 Performance Considerations and Memory Management
11.4 Assignment with Broadcasting
11.5 Exercises

12 Flow Control, Loops and Exception Handling
12.1 Whitespace and Flow Control
12.2 if ... elif ... else
12.3 for
12.4 while
12.5 try ... except
12.6 List Comprehensions
12.7 Tuple, Dictionary and Set Comprehensions
12.8 Exercises

13 Dates and Times
13.1 Creating Dates and Times
13.2 Dates Mathematics
13.3 Numpy

14 Graphics
14.1 seaborn
14.2 2D Plotting
14.3 Advanced 2D Plotting
14.4 3D Plotting
14.5 General Plotting Functions
14.6 Exporting Plots
14.7 Exercises

15 pandas
15.1 Data Structures
15.2 Statistical Functions
15.3 Time-series Data
15.4 Importing and Exporting Data
15.5 Graphics
15.6 Examples

16 Structured Arrays
16.1 Mixed Arrays with Column Names
16.2 Record Arrays

17 Custom Function and Modules
17.1 Functions
17.2 Variable Scope
17.3 Example: Least Squares with Newey-West Covariance
17.4 Anonymous Functions
17.5 Modules
17.6 Packages
17.7 PYTHONPATH
17.8 Python Coding Conventions
17.9 Exercises
17.A Listing of econometrics.py

18 Probability and Statistics Functions
18.1 Simulating Random Variables
18.2 Simulation and Random Number Generation
18.3 Statistics Functions
18.4 Continuous Random Variables
18.5 Select Statistics Functions
18.6 Select Statistical Tests
18.7 Exercises

19 Statistical Analysis with statsmodels
19.1 Regression
19.2 Generalized Linear Models
19.3 Other Notable Models
19.4 Time-series Analysis
19.5 Generalized Linear Models

20 Non-linear Function Optimization
20.1 Unconstrained Optimization
20.2 Derivative-free Optimization
20.3 Constrained Optimization
20.4 Scalar Function Minimization
20.5 Nonlinear Least Squares
20.6 Exercises

21 String Manipulation
21.1 String Building
21.2 String Functions
21.3 Formatting Numbers
21.4 Regular Expressions
21.5 Safe Conversion of Strings

22 File System Operations
22.1 Changing the Working Directory
22.2 Creating and Deleting Directories
22.3 Listing the Contents of a Directory
22.4 Copying, Moving and Deleting Files
22.5 Executing Other Programs
22.6 Creating and Opening Archives
22.7 Reading and Writing Files
22.8 Exercises

23 Performance and Code Optimization
23.1 Getting Started
23.2 Timing Code
23.3 Vectorize to Avoid Unnecessary Loops
23.4 Alter the loop dimensions
23.5 Utilize Broadcasting
23.6 Use In-place Assignment
23.7 Avoid Allocating Memory
23.8 Inline Frequent Function Calls
23.9 Consider Data Locality in Arrays
23.10 Profile Long Running Functions
23.11 Exercises

24 Improving Performance using Numba
24.1 Quick Start
24.2 Supported Python Features
24.3 Supported NumPy Features
24.4 Diagnosing Performance Issues
24.5 Replacing Python function with C functions
24.6 Other Features of Numba
24.7 Exercises

25 Improving Performance using Cython
25.1 Diagnosing Performance Issues
25.2 Interfacing with External Code
25.3 Exercises

26 Executing Code in Parallel
26.1 map and related functions
26.2 multiprocessing
26.3 joblib
26.4 IPython’s Parallel Cluster
26.5 Converting a Serial Program to Parallel
26.6 Other Concerns when executing in Parallel

27 Object-Oriented Programming (OOP)
27.1 Introduction
27.2 Class basics
27.3 Building a class for Autoregressions
27.4 Exercises

28 Other Interesting Python Packages
28.1 Statistics and Statistical Modeling
28.2 Machine Learning
28.3 Deep Learning
28.4 Other Packages

29 Examples
29.1 Estimating the Parameters of a GARCH Model
29.2 Estimating the Risk Premia using Fama-MacBeth Regressions
29.3 Estimating the Risk Premia using GMM
29.4 Outputting LaTeX

30 Quick Reference
30.1 Built-ins
30.2 NumPy (numpy)
30.3 SciPy
30.4 Matplotlib
30.5 pandas
30.6 IPython
Chapter 1

Introduction

Solutions
Solutions for exercises and some extended examples are available on GitHub at
https://github.com/bashtage/python-for-econometrics-statistics-data-analysis.

1.1 Background
These notes are designed for someone new to statistical computing wishing to develop a set of skills necessary
to perform original research using Python. They should also be useful for students, researchers or
practitioners who require a versatile platform for econometrics, statistics or general numerical analysis (e.g.
numeric solutions to economic models or model simulation).
Python is a popular general-purpose programming language that is well suited to a wide range of problems.1
Recent developments have extended Python’s range of applicability to econometrics, statistics, and general
numerical analysis. Python – with the right set of add-ons – is comparable to domain-specific languages such
as R, MATLAB or Julia. If you are wondering whether you should bother with Python (or another language),
an incomplete list of considerations includes:
You might want to consider R if:

• You want to apply statistical methods. The statistics library of R is second to none, and R is clearly at the
forefront of new statistical algorithm development – meaning you are most likely to find that new(ish)
procedure in R.

• Performance is of secondary importance.

• Free is important.

You might want to consider MATLAB if:

• Commercial support and a clear channel to report issues are important.

• Documentation and organization of modules are more important than the breadth of algorithms available.

• Performance is an important concern. MATLAB has optimizations, such as Just-in-Time (JIT) compilation
of loops, which are not automatically available in most other packages.
1 According to the ranking on http://www.tiobe.com/tiobe-index/, Python is the 5th most popular language.
http://langpop.corger.nl/ ranks Python as 4th or 5th.

You might want to consider Julia if:


• Performance in an interactive language is your most important concern.
• You don’t mind learning enough Python to interface with Python packages. The Julia ecosystem is less
complete than Python’s, and a bridge to Python is used to provide missing features.
• You like to do most things yourself, or you are on the bleeding edge and libraries with the features you
require do not yet exist.
Having read the reasons to choose another package, you may wonder why you should consider Python.
• You need a language which can act as an end-to-end solution that allows access to web-based services,
database servers, data management and processing and statistical computation. Python can even be used
to write server-side apps such as a dynamic website (see e.g. http://stackoverflow.com), apps
for desktop-class operating systems with graphical user interfaces, or apps for tablets and phones
(iOS and Android).
• Data handling and manipulation – especially cleaning and reformatting – is an important concern. Python
is substantially more capable at data set construction than either R or MATLAB.
• Performance is a concern, but not at the top of the list.2
• Free is an important consideration – Python can be freely deployed, even to hundreds of servers in a
cloud-based cluster (e.g. Amazon Web Services, Google Compute or Azure).
• Knowledge of Python, as a general purpose language, is complementary to
R/MATLAB/Julia/Ox/GAUSS/Stata.

1.2 Conventions
These notes will follow two conventions.
1. Code blocks will be used throughout.
"""A docstring
"""

# Comments appear in a different color

# Reserved keywords are highlighted
and as assert break class continue def del elif else
except finally for from global if import in is
lambda nonlocal not or pass raise return try while with yield
True False None

# Common functions and classes are highlighted in a
# different color. Note that these are not reserved,
# and can be used although best practice would be
# to avoid them if possible
array range list print exec

# Long lines are indented
some_text = 'This is a very, very, very, very, very, very, very, very, very, very,
    very, very long line.'
2 Python performance can be made arbitrarily close to C using a variety of methods, including Numba (pure Python), Cython
(C/Python creole language) or directly calling C code. Moreover, recent advances have substantially closed the gap with respect to
other Just-in-Time compiled languages such as MATLAB.

2. When a code block contains >>>, this indicates that the command is running in an interactive IPython
session. Output will often appear after the console command, and will not be preceded by a command
indicator.
>>> x = 1.0
>>> x + 2
3.0

If the code block does not contain the console session indicator, the code contained in the block is
intended to be executed in a standalone Python file.
import numpy as np

x = np.array([1,2,3,4])
y = np.sum(x)
print(x)
print(y)

1.3 Important Components of the Python Scientific Stack


1.3.1 Python
Python 3.6 (or later) is required, and Python 3.8 (the latest release) is recommended. This provides the core
Python interpreter.

1.3.2 NumPy
NumPy provides a set of array data types which are essential for statistics, econometrics and data analysis.

1.3.3 SciPy
SciPy contains a large number of routines needed for analysis of data. The most important include a wide range
of random number generators, linear algebra routines, and optimizers. SciPy depends on NumPy.

1.3.4 Jupyter and IPython


IPython provides an interactive Python environment which enhances productivity when developing code or
performing interactive data analysis. Jupyter provides a generic set of infrastructure that enables IPython to be
run in a variety of settings including an improved console (QtConsole) or in an interactive web-browser based
notebook.

1.3.5 matplotlib and seaborn


matplotlib provides a plotting environment for 2D plots, with limited support for 3D plotting. seaborn is a
Python package that improves the default appearance of matplotlib plots without any additional code.

1.3.6 pandas
pandas provides high-performance data structures and is essential when working with data.

1.3.7 statsmodels

statsmodels is pandas-aware and provides models used in the statistical analysis of data including linear regres-
sion, Generalized Linear Models (GLMs), and time-series models (e.g., ARIMA).

1.3.8 Performance Modules

A number of modules are available to help with performance. These include Cython and Numba. Cython is a
Python module which facilitates using a Python-like language to write functions that can be compiled to native
(C code) Python extensions. Numba uses a method of just-in-time compilation to translate a subset of Python
to native code using Low-Level Virtual Machine (LLVM).
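
The idea behind just-in-time compilation can be sketched with a short example. This is an illustration only: the function name sum_of_squares is hypothetical, and the fallback decorator is an assumption added so that the code still runs as plain Python when Numba is not installed.

```python
try:
    from numba import njit  # just-in-time compiler decorator
except ImportError:
    # Fallback (assumption for this sketch): run as plain Python without Numba
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i**2 for i = 0, ..., n-1 using an explicit loop."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(1000))
```

When Numba is available, the first call triggers compilation and later calls reuse the compiled native code, so any speed gain appears only in repeated or long-running calls.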

1.4 Setup
The recommended method to install the Python scientific stack is to use Continuum Analytics’ Anaconda.
Appendix ?? describes a more complex installation procedure with instructions for directly installing Python
and the required modules when it is not possible to install Anaconda.

Continuum Analytics’ Anaconda

Anaconda, a free product of Continuum Analytics (www.continuum.io), is a virtually complete scientific
stack for Python. It includes both the core Python interpreter and standard libraries as well as most modules
required for data analysis. Anaconda is free to use and modules for accelerating the performance of linear algebra
on Intel processors using the Math Kernel Library (MKL) are provided. Continuum Analytics also provides
other high-performance modules for reading large data files or using the GPU to further accelerate performance
for an additional, modest charge. Most importantly, installation is extraordinarily easy on Windows, Linux, and
OS X. Anaconda is also simple to update to the latest version using

conda update conda
conda update anaconda

Windows

Installation on Windows requires downloading the installer and running it. Anaconda comes in both Python
2.7 and 3.x flavors, and the latest Python 3.x is required. These instructions use ANACONDA to indicate
the Anaconda installation directory (e.g., the default is C:\Anaconda). Once the setup has completed, open a
PowerShell command prompt and run
cd ANACONDA\Scripts
conda init powershell
conda update conda
conda update anaconda
conda install html5lib seaborn jupyterlab

which will first ensure that Anaconda is up-to-date and then install some optional modules. conda install
can be used later to install other packages that may be of interest. Note that if Anaconda is installed into a
directory other than the default, the full path should not contain Unicode characters or spaces.
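
The restriction on the installation path can be checked before running the installer. The following is a minimal sketch and is not part of Anaconda; the helper name is_safe_install_path and the candidate directories are hypothetical.

```python
def is_safe_install_path(path):
    """Return True if path has only ASCII characters and no spaces."""
    try:
        path.encode("ascii")
    except UnicodeEncodeError:
        return False
    return " " not in path


# Hypothetical candidate installation directories
candidates = [
    r"C:\Anaconda",
    r"C:\Program Files\Anaconda",
    r"C:\Users\José\Anaconda",
]
for candidate in candidates:
    print(candidate, is_safe_install_path(candidate))
```

Only the first candidate passes both checks; the second contains a space and the third contains a non-ASCII character.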

Notes

The recommended settings for installing Anaconda on Windows are:

• Install for all users, which requires admin privileges. If these are not available, then choose the “Just
for me” option, but avoid installing to a path that contains non-ASCII characters, which can cause
issues.

• Run conda init powershell to ensure that Anaconda commands can be run from the PowerShell
prompt.

• Register Anaconda as the system Python unless you have a specific reason not to (unlikely).

Linux and OS X

Installation on Linux requires executing


bash Anaconda3-x.y.z-Linux-ISA.sh

where x.y.z will depend on the version being installed and ISA will be either x86 or more likely x86_64.
Anaconda comes in both Python 2.7 and 3.x flavors, and the latest Python 3.x is required. The OS X installer is
available either in a GUI installer (pkg format) or as a bash installer which is installed in an identical manner to
the Linux installation. It is strongly recommended that the anaconda/bin is prepended to the path. This can be
done by entering conda init bash and then restarting your terminal. Note
that other shells such as zsh are also supported, and can be initialized by replacing bash with the name of your
preferred shell.
After installation completes, execute
conda update conda
conda update anaconda
conda install html5lib seaborn jupyterlab

which will first ensure that Anaconda is up-to-date and then install some optional modules. conda install
can be used later to install other packages that may be of interest.

Notes

All instructions for OS X and Linux assume that conda init bash has been run. If this is not the case, it is
necessary to run

cd ANACONDA
cd bin

and then all commands must be prefixed with ./ as in

./conda update conda

1.5 Using Python


Python can be programmed using an interactive session using IPython or by directly executing Python scripts
– text files that end with the extension .py – using the Python interpreter.

1.5.1 Python and IPython


Most of this introduction focuses on interactive programming, which has some distinct advantages when learn-
ing a language. The standard Python interactive console is very basic and does not support useful features such
as tab completion. IPython, and especially the QtConsole version of IPython, transforms the console into a
highly productive environment which supports a number of useful features:

• Tab completion - After entering 1 or more characters, pressing the tab button will bring up a list of
functions, packages, and variables which match the typed text. If the list of matches is large, pressing tab
again allows the arrow keys to be used to browse and select a completion.

• “Magic” functions, which make tasks such as navigating the local file system (using %cd ~/directory/
or just cd ~/directory/ assuming that %automagic is on) or running other Python programs (using
run program.py) simple. Entering %magic inside an IPython session will produce a detailed
description of the available functions. Alternatively, %lsmagic produces a succinct list of available
magic commands. The most useful magic functions are

– cd - change directory
– edit filename - launch an editor to edit filename
– ls or ls pattern - list the contents of a directory
– run filename - run the Python file filename
– timeit - time the execution of a piece of code or function
– history - view commands recently run. When used with the -l switch, the history of previous sessions
can be viewed (e.g., history -l 100 will show the most recent 100 commands irrespective
of whether they were entered in the current IPython session or a previous one).

• Integrated help - When using the QtConsole, calling a function provides a view of the top of the help
function. For example, entering mean( will produce a view of the top 20 lines of its help text.

• Inline figures - Both the QtConsole and the notebook can also display figures inline, which produces a
tidy, self-contained environment. This can be enabled by entering %matplotlib inline in an IPython
session.

• The special variable _ contains the last result in the console, and so the most recent result can be saved
to a new variable using the syntax x = _.

• Support for profiles, which provide further customization of sessions.
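
Of the magic functions listed above, timeit has a close counterpart in the standard library, which is useful in standalone Python files where magics are not available. The sketch below is illustrative; the statement being timed is arbitrary.

```python
import timeit

# Time 1,000 executions of a small expression; the result is total seconds
elapsed = timeit.timeit("sum(range(100))", number=1000)
print(f"Total time for 1000 runs: {elapsed:.6f} seconds")
```

Unlike the magic, timeit.timeit runs the statement exactly the number of times requested and returns the total time, so dividing by number gives the average time per run.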

1.5.2 Launching IPython


OS X and Linux

IPython can be started by running


ipython

in the terminal. IPython using the QtConsole can be started using


jupyter qtconsole

A single line launcher on OS X or Linux can be constructed using


bash -c "jupyter qtconsole"

Figure 1.1: IPython running in the Windows Terminal app.

This single line launcher can be saved as filename.command, where filename is a meaningful name (e.g. IPython-Terminal),
to create a launcher on OS X. The file must then be made executable by entering the command
chmod 755 /FULL/PATH/TO/filename.command

The same command can be used to create a Desktop launcher on Ubuntu by running


sudo apt-get install --no-install-recommends gnome-panel
gnome-desktop-item-edit ~/Desktop/ --create-new

and then using the command as the Command in the dialog that appears.

Windows (Anaconda)

To run IPython, open PowerShell and enter ipython, or launch IPython from the start menu. Starting IPython
using the QtConsole is similar and is simply called QtConsole in the start menu. Launching IPython from the
start menu should create a window similar to that in figure 1.1.
Next, run

jupyter qtconsole --generate-config

in the terminal or command prompt to generate a file named jupyter_qtconsole_config.py. This file contains
settings that are useful for customizing the QtConsole window. A few recommended modifications are

c.ConsoleWidget.font_size = 12
c.ConsoleWidget.font_family = "Bitstream Vera Sans Mono"
c.JupyterWidget.syntax_style = "monokai"

These settings assume that the Bitstream Vera fonts, which are available from
http://ftp.gnome.org/pub/GNOME/sources/ttf-bitstream-vera/1.10/, have been installed locally. Opening
QtConsole should create a window similar to that in figure 1.2 (although the appearance might differ) if you
did not use the recommended configuration.

Figure 1.2: IPython running in a QtConsole session.

1.5.3 Getting Help


Help is available in IPython sessions using help(function). Some functions (and modules) have very long help
files. When using IPython, these can be paged using the command ?function or function? so that the text can be
scrolled using page up and down and q to quit. ??function or function?? can be used to display the entire function
including both the docstring and the code.
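
The ? and ?? commands only work in IPython. In a standalone Python file, similar information can be retrieved with the standard library inspect module. This is a minimal sketch; json.dumps is used only because it is a pure Python function that is always available.

```python
import inspect
import json

# First line of the docstring, the same text that help(json.dumps) displays
doc = inspect.getdoc(json.dumps)
print(doc.splitlines()[0])

# Full source code, similar to what json.dumps?? displays in IPython
source = inspect.getsource(json.dumps)
print(source.splitlines()[0])
```

Note that inspect.getsource only works for functions written in Python; it raises an error for functions implemented in C.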

1.5.4 Running Python programs


While interactive programming is useful for learning a language or quickly developing some simple code,
complex projects require the use of complete programs. Programs can be run either using the IPython magic
%run program.py or by directly launching the Python program with the standard interpreter using
python program.py. The advantage of using the IPython environment is that the variables used in the
program can be inspected after the program run has completed. Directly calling Python will run the program
and then terminate, and so it is necessary to output any important results to a file so that they can be viewed
later.3
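
Saving results to a file can be sketched as follows. This is an illustration under stated assumptions: the results dictionary and the file name results.json are hypothetical, and a temporary directory is used only so that the example leaves no files behind.

```python
import json
import os
import tempfile

results = {"mean": 0.5, "observations": 100}  # hypothetical results

# A temporary directory keeps this sketch self-contained;
# in practice, choose a permanent location.
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "results.json")
    # Write the results when the program finishes
    with open(filename, "w") as f:
        json.dump(results, f)
    # Read them back later, e.g. from another session
    with open(filename) as f:
        loaded = json.load(f)

print(loaded)
```

The with statement closes each file automatically, even if an error occurs while writing.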
To test that you can successfully execute a Python program, input the code in the block below into a text
file and save it as firstprogram.py.
# First Python program
import time

print("Welcome to your first Python program.")


input("Press enter to exit the program.")
print("Bye!")
time.sleep(2)

Once you have saved this file, open the console, navigate to the directory where you saved the file and enter python
firstprogram.py. Finally, run the program in IPython by first launching IPython, then using %cd to
change to the location of the program, and finally executing the program using %run firstprogram.py.
3
Programs can also be run in the standard Python interpreter using the command:
exec(compile(open('filename.py').read(),'filename.py','exec'))

1.5.5 %pylab and %matplotlib


When writing Python code, only a small set of core functions and variable types are available in the interpreter.
The standard method to access additional variable types or functions is to use imports, which explicitly allow
access to specific packages or functions. While it is best practice to only import required functions or
packages, there are many functions in multiple packages that are commonly encountered in these notes. Pylab
is a collection of common NumPy, SciPy and Matplotlib functions that can be easily imported using a single
command in an IPython session, %pylab. This is nearly equivalent to calling from pylab import *; the difference
is that %pylab also sets the backend that is used to draw plots. The backend can be manually set using
%pylab backend where backend is one of the available backends (e.g., qt5 or inline). Similarly %matplotlib
backend can be used to set just the backend without importing all of the modules and functions that come
with %pylab.
Most chapters assume that %pylab has been called so that functions provided by NumPy can be called
without explicitly importing them.
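
In a standalone Python file %pylab is not available, so the same functions must be imported explicitly. A minimal sketch, assuming NumPy is installed:

```python
import numpy as np

x = np.random.randn(100, 100)  # randn(100, 100) after %pylab
y = np.mean(x, 0)              # mean(x, 0) after %pylab
print(y.shape)
```

The np. prefix makes it clear which package provides each function, which is why explicit imports are preferred in programs that will be shared or reused.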

1.5.6 Testing the Environment


To make sure that you have successfully installed the required components, run IPython using a shortcut or by
running ipython or jupyter qtconsole in a terminal window. Enter the following commands,
one at a time (the meaning of the commands will be covered later in these notes).
>>> %pylab qt5
>>> x = randn(100,100)
>>> y = mean(x,0)
>>> import seaborn
>>> plot(y)
>>> import scipy as sp

If everything was successfully installed, you should see something similar to figure 1.3.
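
If the test fails, it can help to see which packages are missing. The following sketch uses only the standard library and does not import the packages themselves; the list of package names is an assumption based on the components described in section 1.3.

```python
import importlib.util

# Import names of the packages used in these notes
packages = ["numpy", "scipy", "matplotlib", "pandas", "statsmodels", "seaborn"]

status = {}
for name in packages:
    # find_spec returns None when a top-level package cannot be located
    status[name] = importlib.util.find_spec(name) is not None

for name, found in status.items():
    print(f"{name:12s} {'installed' if found else 'MISSING'}")
```

Any package reported as missing can be installed using conda install followed by the package name.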

1.5.7 jupyterlab notebooks


A jupyter notebook is a simple and useful method to share code with others. Notebooks allow for a fluid
synthesis of formatted text, typeset mathematics (using LaTeX via MathJax) and Python. The primary method
for using notebooks is through a web interface, which allows creation, deletion, export and interactive editing
of notebooks.
To launch the jupyterlab server, open a command prompt or terminal and enter

jupyter lab

This command will start the server and open the default browser which should be a modern version of Chrome
(preferable), Chromium, Firefox or Edge. If the default browser is Safari or Internet Explorer, the URL can
be copied and pasted into Chrome. The first screen that appears will look similar to figure 1.4, except that the
list of notebooks will be empty. Clicking on New Notebook will create a new notebook, which, after a bit of
typing, can be transformed to resemble figure 1.5. Notebooks can be imported by dragging and dropping and
exported from the menu inside a notebook.

Figure 1.3: A successful test that matplotlib, IPython, NumPy and SciPy were all correctly installed.

Figure 1.4: The default IPython Notebook screen showing two notebooks.

Figure 1.5: A jupyterlab notebook showing formatted markdown, LaTeX math and cells containing code.

1.5.8 Integrated Development Environments


As you progress in Python and begin writing more sophisticated programs, you will find that using an Integrated
Development Environment (IDE) will increase your productivity. Most contain productivity enhancements
such as built-in consoles, code completion (or IntelliSense, for completing function names) and integrated
debugging. Discussion of IDEs is beyond the scope of these notes, although Spyder is a reasonable choice
(free, cross-platform). Visual Studio Code is an excellent alternative. My preferred IDE is PyCharm, which has
a community edition that is free for use (the professional edition is low cost for academics).

spyder

Spyder is an IDE specialized for use in scientific applications of Python rather than for general purpose application
development. This is both an advantage and a disadvantage when compared to a full featured IDE such as
PyCharm or VS Code. The main advantage is that many powerful but complex features are not integrated into
Spyder, and so the learning curve is much shallower. The disadvantage is similar - in more complex projects,
or if developing something that is not straight scientific Python, Spyder is less capable. However, netting these
two, Spyder is almost certainly the IDE to use when starting Python, and it is always relatively simple to migrate
to a sophisticated IDE if needed.
Spyder is started by entering spyder in the terminal or command prompt. A window similar to that in
figure 1.6 should appear. The main components are the editor (1), the object inspector (2), which dynamically
will show help for functions that are used in the editor, and the console (3). By default, Spyder opens a standard
Python console, although it also supports using the more powerful IPython console. The object inspector
window, by default, is grouped with a variable explorer, which shows the variables that are in memory and the
file explorer, which can be used to navigate the file system. The console is grouped with an IPython console
window (needs to be activated first using the Interpreters menu along the top edge), and the history log which
contains a list of commands executed. The buttons along the top edge facilitate saving code, running code and
debugging.

Figure 1.6: The default Spyder IDE on Windows.

1.6 Exercises
1. Install Python.

2. Test the installation using the code in section 1.5.6.

3. Customize IPython QtConsole using a font or color scheme. More customization options can be found
by running ipython -h.

4. Explore tab completion in IPython by entering a<TAB> to see the list of functions which start with a and
are loaded by pylab. Next try i<TAB>, which will produce a list longer than the screen – press ESC to
exit the pager.

5. Launch IPython Notebook and run code in the testing section.

6. Open Spyder and explore its features.

1.A Additional Installation Issues


1.A.1 Frequently Encountered Problems
All
Whitespace sensitivity

Python is whitespace sensitive and so indentation, either spaces or tabs, affects how Python interprets files. The
configuration files, e.g. ipython_config.py, are plain Python files and so are sensitive to whitespace. Introducing
whitespace before the start of a configuration option will produce an error, so ensure there is no whitespace
before active lines of a configuration.
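
The effect of stray leading whitespace can be demonstrated without touching a real configuration file. In this sketch the configuration line is compiled but never executed, and the file name config_example.py is a hypothetical label used only in error messages.

```python
good = "c.ConsoleWidget.font_size = 12\n"
bad = "    c.ConsoleWidget.font_size = 12\n"  # leading spaces added

compile(good, "config_example.py", "exec")  # compiles without error

message = None
try:
    compile(bad, "config_example.py", "exec")
except IndentationError as err:
    # Python rejects the line before any code runs
    message = f"IndentationError: {err.msg}"
    print(message)
```

The same error appears when IPython or Jupyter loads a configuration file that has whitespace before an active line.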

Windows
Spaces in path

Python may work when directories have spaces, but this is not guaranteed, and so it is safest to avoid spaces in
the installation path.

Unicode in path

Python does not always work well when a path contains Unicode characters, which might occur in a user
name. While this isn’t an issue for installing Python or Anaconda, it is an issue for IPython which looks
in c:\user\username\.ipython for configuration files. The solution is to set the HOME variable to a path
that has only ASCII characters before launching IPython.

mkdir c:\anaconda\ipython_config
set HOME=c:\anaconda\ipython_config
c:\Anaconda\Scripts\activate econometrics
ipython profile create econometrics
ipython --profile=econometrics

The path in set HOME=c:\anaconda\ipython_config can be any path whose directories contain only ASCII
characters, and the command can also be added to any batch file to achieve the same effect.

OS X
Installing Anaconda to the root of the partition

If the user account used is running as root, then Anaconda may install to /anaconda and not ~/anaconda by
default. Best practice is not to run as root, although in principle this is not a problem, and /anaconda can be
used in place of ~/anaconda in any of the instructions.

1.A.2 Setup using Virtual Environments


The simplest method to install the Python scientific stack is to use Continuum Analytics’ Anaconda directly.
These instructions describe alternative installation options using virtual environments, which allow alternative
configurations to simultaneously co-exist on a single sys