Example of Python primer for data analysis
Saeed Amen / Founder of Cuemacro
https://www.cuemacro.com / [email protected] / @saeedamenfx / All material is copyright Cuemacro /
2020
In this class example, we show some basic concepts in Python, such as variables and lists, along with parts of
the Python standard library.
Print hello world!
In [1]:
hello world!
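The code for this cell is not included in this export. A minimal sketch that reproduces the output above (the variable name `greeting` is our own assumption):

```python
# Store the message in a variable (hypothetical name) and print it
greeting = "hello world!"
print(greeting)
```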
Defining variables and checking types
We define several variables and check their types. Python is dynamically typed, so each variable takes the type of the value assigned to it.
In [2]: # Define a few variables and check their types
<class 'int'>
<class 'str'>
<class 'float'>
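The cell's code is missing from this export. A sketch that gives the same three types, assuming illustrative values and variable names of our own choosing:

```python
# Hypothetical values, chosen only so that the types match the output above
count = 1        # an integer
word = "hello"   # a string
price = 1.5      # a float

# type() reports the class of the object a variable currently refers to
print(type(count))
print(type(word))
print(type(price))
```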
Arithmetic and logical operators
We try out various operators on Python variables.
In [3]: # Use some arithmetic operators, start with addition
Out[3]: 2
In [4]: # Now multiplication
Out[4]: 1.4
In [5]: # Use some comparative operators (greater than)
Out[5]: False
In [6]: # Smaller than
Out[6]: True
In [7]: # Equals
Out[7]: True
In [8]: # Not equals
Out[8]: True
In [9]: # Concatenating strings
Out[9]: 'helloworld'
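The operator cells above show only their outputs. One set of operands that would produce them is sketched here; the values `a`, `b` and `c` are assumptions, since the original inputs are not shown:

```python
# Hypothetical operands, chosen to match the outputs above
a = 1
b = 2
c = 0.7

total = a + a              # addition
product = c * b            # multiplication
is_greater = a > b         # greater than
is_smaller = a < b         # smaller than
is_equal = a == a          # equals
is_different = a != b      # not equals
joined = "hello" + "world" # + concatenates strings

print(total, product, is_greater, is_smaller, is_equal, is_different, joined)
```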
Lists
A few examples of combining variables into a Python list and running operations on lists.
In [10]: # Create a list of integers
In [11]: # Append to the list
In [12]: # What does the list look like?
[1, 3, 2, 1]
In [13]: # Sort the list
In [14]: # Print the list
[1, 1, 2, 3]
In [15]: # Only print the last element
In [16]: # Print the first element
In [17]: # Print middle elements
[1, 2]
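The list cells can be sketched as follows. The starting list is inferred from the printed output, and the variable names are our own:

```python
# Create a list of integers, then append one more element
numbers = [1, 3, 2]
numbers.append(1)
print(numbers)

# sort() reorders the list in place
numbers.sort()
print(numbers)

# Negative indices count from the end of the list
last = numbers[-1]
first = numbers[0]

# A slice [start:stop] includes start but excludes stop
middle = numbers[1:3]
print(last, first, middle)
```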
Dictionaries
Dictionaries map keys to values, which makes them useful for storing many types of data, such as phone directories.
In [18]: # Create a dictionary of restaurants as keys and values as burgers
In [19]: # Print out a value for one of the keys
Whopper
In [20]: # Create a list of the keys and print the list
dict_keys(['Burger King', 'McDonalds'])
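A sketch of the dictionary cells. Only the 'Whopper' value appears in the output, so the value stored for 'McDonalds' below is an assumption:

```python
# Restaurants are the keys, burgers are the values
# ('Big Mac' is a hypothetical value; the original is not shown)
burgers = {"Burger King": "Whopper", "McDonalds": "Big Mac"}

# Look up a value by its key
print(burgers["Burger King"])

# keys() returns a view object; wrap it in list() to get a real list
keys = burgers.keys()
print(keys)
key_list = list(keys)
print(key_list)
```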
If statement
In [21]: # Demonstrate an if statement
Yes 2 is greater than one
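A minimal sketch of an if statement that prints the message above. The `else` branch and the variable names are our own additions for illustration:

```python
x = 2

# The indented block runs only when the condition is True
if x > 1:
    message = "Yes 2 is greater than one"
else:
    message = "No 2 is not greater than one"

print(message)
```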
For loops
A for loop lets us run the same operation repeatedly for different elements.
In [22]: # Create a simple for loop to print the numbers from 0 to 5 (inclusive)
# We use range to generate the sequence (in Python 3, range is lazy and does not build a list)
0
1
2
3
4
5
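The loop itself is not shown in this export. A sketch that prints the same numbers:

```python
# range(0, 6) yields 0, 1, 2, 3, 4, 5 (the stop value 6 is excluded)
for i in range(0, 6):
    print(i)
```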
List comprehension
A list comprehension builds a list in a single expression. It is similar to a for loop, but more succinct.
In [23]: # Do the same thing using a list comprehension
0
1
2
3
4
5
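One way to get the same output with a list comprehension is sketched below. The original cell may well have been written differently; here we build the list first and print it afterwards, so the comprehension has no side effects:

```python
# Build a list of strings "0" to "5" in one expression
lines = [str(i) for i in range(6)]

# Print one element per line
print("\n".join(lines))
```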
Creating a function
A function lets us collect together code which will be used repeatedly.
In [24]: # Create a function to double numbers
In [25]: # Try running the function
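The function and its output are missing from this export. A minimal sketch, where the name `double` and the test value are assumptions:

```python
def double(x):
    """Return twice the input value."""
    return 2 * x


# Try running the function on a hypothetical input
result = double(4)
print(result)
```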
Iteration vs recursion
Using an example based on concatenation of characters.
In [26]: # Let's concatenate numbers 0 to 5 by iteration
012345
In [27]: # Let's concatenate numbers 0 to 5 by recursion
# Convert numbers to strings (so they can be concatenated)
012345
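Both approaches can be sketched as small functions. The function names are hypothetical, and the original cells may have used plain loops rather than functions:

```python
def concat_iterative(n):
    """Concatenate the numbers 0..n using a loop."""
    result = ""
    for i in range(n + 1):
        result += str(i)  # convert to a string before concatenating
    return result


def concat_recursive(n):
    """Concatenate the numbers 0..n by calling the function on n - 1."""
    if n == 0:
        return "0"  # base case stops the recursion
    return concat_recursive(n - 1) + str(n)


print(concat_iterative(5))
print(concat_recursive(5))
```

The iterative version keeps a running result, while the recursive version reduces the problem to a smaller one until it reaches the base case.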
Python standard library
We'll show a few common modules from the Python standard library, such as datetime and math.
In [28]: # Print the current time and date using datetime
2020-06-27 13:55:16.519604
In [29]: # Add a day to the current time
2020-06-28 13:55:16.534604
In [30]: # Round a float
In [31]: # Use the floor and ceil function from math
4
5
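A sketch of the standard library cells. The numeric inputs are assumptions (the originals are not shown), chosen so that floor and ceil give 4 and 5:

```python
import datetime
import math

# Current date and time
now = datetime.datetime.now()
print(now)

# timedelta represents a duration, which can be added to a datetime
tomorrow = now + datetime.timedelta(days=1)
print(tomorrow)

# Round a float to 2 decimal places (hypothetical input)
rounded = round(4.567, 2)
print(rounded)

# floor rounds down and ceil rounds up to the nearest integer
value = 4.5  # hypothetical input
print(math.floor(value))
print(math.ceil(value))
```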
NumPy
We show a few basic examples of creating NumPy arrays and some operations we can perform on them.
In [32]: # Import NumPy library to begin
# Let's create two lists of numbers and convert into NumPy arrays
# The results will be one-dimensional ndarrays of length 4
In [33]: # Do basic arithmetic operations on them (don't use for loops here!)
[0 0 0 0]
[ 1 4 9 16]
[ 5 7 9 11]
In [34]: # Query the various properties of the NumPy arrays
# such as their size and their dimensions
4
1
In [35]: # Create a 2 x 2 ndarray, i.e. a matrix
[[1 2]
[3 4]]
In [36]: # Resize our original length-4 ndarray into a 2 x 2 matrix
[[1 2]
[3 4]]
In [37]: # Sum the columns of our matrix and then the rows
[4 6]
[3 7]
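The NumPy cells above can be sketched as follows. The first array is inferred from the outputs. The second array and the exact choice of operations are assumptions that happen to reproduce the printed results:

```python
import numpy as np

# Convert two lists into NumPy arrays
a = np.array([1, 2, 3, 4])
b = np.array([4, 5, 6, 7])  # hypothetical values

# Arithmetic is applied element by element, with no explicit loop
print(a - a)
print(a * a)
print(a + b)

# size is the number of elements, ndim is the number of dimensions
print(a.size)
print(a.ndim)

# Create a 2 x 2 matrix directly from nested lists
m = np.array([[1, 2], [3, 4]])
print(m)

# Reshape the original 4 elements into a 2 x 2 matrix
reshaped = a.reshape(2, 2)
print(reshaped)

# axis=0 sums down each column, axis=1 sums across each row
col_sums = reshaped.sum(axis=0)
row_sums = reshaped.sum(axis=1)
print(col_sums)
print(row_sums)
```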
Loading time series with pandas and filtering them
In [38]: # We'll fetch a CSV file on Cuemacro's GitHub site that has FX spot data
# which we've got from FRED earlier
# CSV file at 'https://raw.githubusercontent.com/cuemacro/teaching/master/pythoncourse/dat
In [39]: # Let's load up the CSV file and make sure to convert the index column into
# pandas dates
# Hint: use read_csv function from pandas
In [40]: # Print the first 5 rows to see what it looks like...
EURUSD.close USDJPY.close GBPUSD.close AUDUSD.close \
Date
1989-01-04 NaN 125.05 1.8070 0.8698
1989-01-05 NaN 125.59 1.7970 0.8679
1989-01-06 NaN 126.62 1.7800 0.8625
1989-01-09 NaN 126.45 1.7636 0.8637
1989-01-10 NaN 126.30 1.7637 0.8664
USDCAD.close NZDUSD.close USDCHF.close USDNOK.close \
Date
1989-01-04 1.1921 0.6365 1.5188 6.581
1989-01-05 1.1895 0.6380 1.5310 6.593
1989-01-06 1.1952 0.6365 1.5490 6.645
1989-01-09 1.1970 0.6350 1.5575 6.678
1989-01-10 1.2009 0.6310 1.5645 6.689
USDSEK.close
Date
1989-01-04 6.160
1989-01-05 6.180
1989-01-06 6.230
1989-01-09 6.264
1989-01-10 6.275
In [41]: # There is no EURUSD data before the introduction of the EUR in 1999
# Let's remove any entries before '04 Jan 1999'
# Fill forward NaN values to simplify analysis later (hint: use fillna)
In [42]: # Print the first 5 points again to check
EURUSD.close USDJPY.close GBPUSD.close AUDUSD.close \
Date
1999-01-04 1.1812 112.15 1.6581 0.6182
1999-01-05 1.1760 111.15 1.6566 0.6217
1999-01-06 1.1636 112.78 1.6547 0.6285
1999-01-07 1.1672 111.69 1.6495 0.6340
1999-01-08 1.1554 111.52 1.6405 0.6326
USDCAD.close NZDUSD.close USDCHF.close USDNOK.close \
Date
1999-01-04 1.5268 0.5340 1.3666 7.4960
1999-01-05 1.5213 0.5370 1.3694 7.4340
1999-01-06 1.5110 0.5385 1.3852 7.4355
1999-01-07 1.5117 0.5405 1.3863 7.4050
1999-01-08 1.5145 0.5400 1.3970 7.3970
USDSEK.close
Date
1999-01-04 8.0200
1999-01-05 7.9720
1999-01-06 7.9360
1999-01-07 7.9150
1999-01-08 7.9285
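The loading and filtering steps can be sketched as below. To keep the sketch self-contained, it reads a few rows copied from the output above out of an in-memory string instead of the CSV URL, and it keeps only two of the columns. We use `ffill()` here, which in recent pandas versions is the preferred way to fill forward rather than passing a method to `fillna`:

```python
import io

import pandas as pd

# A few rows copied from the output above, standing in for the real CSV file
csv_text = """Date,EURUSD.close,USDJPY.close
1989-01-04,,125.05
1989-01-05,,125.59
1999-01-04,1.1812,112.15
1999-01-05,1.1760,111.15
"""

# index_col=0 uses the Date column as the index; parse_dates=True converts it to dates
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(df.head(5))

# Keep only rows on or after 04 Jan 1999
df = df[df.index >= "1999-01-04"]

# Fill forward any remaining NaN values
df = df.ffill()
print(df.head(5))
```

With the real file, the first argument to `read_csv` would be the CSV URL or a local path instead of the `StringIO` object.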
Basic arithmetic operations with Pandas
We'll show how we can do arithmetic on pandas DataFrames. Here we make USD the base currency for
each time series.
In [43]: # The next step is to make sure everything is quoted with USD as the base currency
# Go through each USD cross (hint: use a for loop)
# Invert column if USD is not base
# Otherwise append it as it is
In [44]: # Note how EUR, GBP, AUD and NZD have now been inverted
# Everything is now quoted USDabc
USDEUR.close USDJPY.close USDGBP.close USDAUD.close \
Date
1999-01-04 0.846597 112.15 0.603100 1.617599
1999-01-05 0.850340 111.15 0.603646 1.608493
1999-01-06 0.859402 112.78 0.604339 1.591090
1999-01-07 0.856751 111.69 0.606244 1.577287
1999-01-08 0.865501 111.52 0.609570 1.580778
USDCAD.close USDNZD.close USDCHF.close USDNOK.close \
Date
1999-01-04 1.5268 1.872659 1.3666 7.4960
1999-01-05 1.5213 1.862197 1.3694 7.4340
1999-01-06 1.5110 1.857010 1.3852 7.4355
1999-01-07 1.5117 1.850139 1.3863 7.4050
1999-01-08 1.5145 1.851852 1.3970 7.3970
USDSEK.close
Date
1999-01-04 8.0200
1999-01-05 7.9720
1999-01-06 7.9360
1999-01-07 7.9150
1999-01-08 7.9285
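A sketch of the inversion loop, run on two columns and two dates taken from the output above. The variable names and the way the new column name is built are our own assumptions about how the cell was written:

```python
import pandas as pd

# Two columns copied from the filtered data above
df = pd.DataFrame(
    {"EURUSD.close": [1.1812, 1.1760], "USDJPY.close": [112.15, 111.15]},
    index=pd.to_datetime(["1999-01-04", "1999-01-05"]),
)
df.index.name = "Date"

usd_base = {}

for col in df.columns:
    if col.startswith("USD"):
        # USD is already the base currency, so keep the column as it is
        usd_base[col] = df[col]
    else:
        # e.g. "EURUSD.close" becomes "USDEUR.close", with the values inverted
        pair = col.split(".")[0]
        new_name = "USD" + pair[0:3] + ".close"
        usd_base[new_name] = 1.0 / df[col]

df_usd = pd.DataFrame(usd_base)
print(df_usd.head(5))
```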
Creating rebased indices
We rebase our FX return indices so that they all start at 100, which makes the assets easier to compare with each other.
In [45]: # Calculate simple returns for every FX pair
In [46]: # Let's rebase each currency pair so it starts at 100
# Hint: use cumprod function
# Set the first values as being 100
In [47]: # Everything now starts at 100, so it's possible to compare them
# First value is undefined because the first day's return cannot be calculated (so we set it to 100)
USDEUR.close USDJPY.close USDGBP.close USDAUD.close \
Date
1999-01-04 100.000000 100.000000 100.000000 100.000000
1999-01-05 100.442177 99.108337 100.090547 99.437028
1999-01-06 101.512547 100.561748 100.205475 98.361177
1999-01-07 101.199452 99.589835 100.521370 97.507886
1999-01-08 102.232993 99.438252 101.072844 97.723680
USDCAD.close USDNZD.close USDCHF.close USDNOK.close \
Date
1999-01-04 100.000000 100.000000 100.000000 100.000000
1999-01-05 99.639769 99.441341 100.204888 99.172892
1999-01-06 98.965156 99.164345 101.361042 99.192903
1999-01-07 99.011003 98.797410 101.441534 98.786019
1999-01-08 99.194394 98.888889 102.224499 98.679296
USDSEK.close
Date
1999-01-04 100.000000
1999-01-05 99.401496
1999-01-06 98.952618
1999-01-07 98.690773
1999-01-08 98.859102
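The rebasing steps can be sketched as below, using the first three USDJPY values from the data above. The variable names are assumptions. Simple returns are computed as today's price divided by yesterday's, minus one, and the index is the cumulative product of one plus the returns:

```python
import pandas as pd

# First three USDJPY values copied from the data above
df_usd = pd.DataFrame(
    {"USDJPY.close": [112.15, 111.15, 112.78]},
    index=pd.to_datetime(["1999-01-04", "1999-01-05", "1999-01-06"]),
)

# Simple returns; the first row is NaN because there is no previous price
returns = df_usd / df_usd.shift(1) - 1.0

# Compound the returns and scale so that the index is based at 100
rebased = 100.0 * (1.0 + returns).cumprod()

# The first row is still NaN, so set it to 100 explicitly
rebased.iloc[0] = 100.0
print(rebased.head(5))
```

The second and third values agree with the USDJPY column in the output above (99.108337 and 100.561748).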
Plotting line charts
We now demonstrate how to plot time series held in pandas DataFrames.
In [48]: # Make sure to inline charts, so they appear in notebook
# Set matplotlib style
Out[48]: <matplotlib.axes._subplots.AxesSubplot at 0x20a0660df28>
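A sketch of the plotting cell, using a few rebased values from the table above. The style name, title and axis label are assumptions. In a notebook you would use the `%matplotlib inline` magic instead of selecting the `Agg` backend, which is used here only so the sketch also runs as a plain script:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; in a notebook use %matplotlib inline

import matplotlib.pyplot as plt
import pandas as pd

# Set a matplotlib style (hypothetical choice)
plt.style.use("ggplot")

# A few rebased values copied from the output above
rebased = pd.DataFrame(
    {
        "USDEUR.close": [100.0, 100.442177, 101.512547],
        "USDJPY.close": [100.0, 99.108337, 100.561748],
    },
    index=pd.to_datetime(["1999-01-04", "1999-01-05", "1999-01-06"]),
)

# DataFrame.plot draws one line per column and returns the matplotlib Axes
ax = rebased.plot(title="FX indices rebased to 100")
ax.set_ylabel("Index")
```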
In [49]: # We can also plot with chartpy (with matplotlib first...)
# Create Style object to set title etc.
# Now plot with chartpy via matplotlib
In [50]: # Now use chartpy but changing size (or can try plotly)