Introduction and lists
DATA TYPES FOR DATA SCIENCE IN PYTHON
Jason Myers
Instructor
Data types
Data type system sets the stage for the capabilities of the language
Understanding data types empowers you as a data scientist
Container sequences
Hold other types of data
Used for aggregation, sorting, and more
Can be mutable (list, set) or immutable (tuple)
Iterable
Lists
Hold data in order it was added
Mutable
Index
Accessing single items in list
cookies = ['chocolate chip', 'peanut butter', 'sugar']
cookies.append('Tirggel')
print(cookies)
['chocolate chip', 'peanut butter', 'sugar', 'Tirggel']
print(cookies[2])
sugar
Combining Lists
Using operators, you can combine two lists into a new one
cakes = ['strawberry', 'vanilla']
desserts = cookies + cakes
print(desserts)
['chocolate chip', 'peanut butter', 'sugar',
'Tirggel', 'strawberry', 'vanilla']
.extend() method merges a list into another list at the end
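A minimal sketch of the two ways to combine lists mentioned above, reusing the cookie and cake names from the slides. The + operator builds a new list, while .extend() changes the list it is called on.

```python
cookies = ['chocolate chip', 'peanut butter', 'sugar', 'Tirggel']
cakes = ['strawberry', 'vanilla']

# + returns a new list and leaves both operands unchanged
desserts = cookies + cakes

# .extend() appends every item of cakes to cookies in place
cookies.extend(cakes)

print(desserts == cookies)  # both now hold the same six items
print(cakes)                # cakes itself is untouched
```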
Finding Elements in a List
.index() method locates the position of a data element in a list
position = cookies.index('sugar')
print(position)
2
cookies[2]
'sugar'
Removing Elements in a List
.pop() method removes an item from a list and allows you
to save it
name = cookies.pop(position)
print(name)
sugar
print(cookies)
['chocolate chip', 'peanut butter', 'Tirggel']
Iterating over lists
for loops are the most common way of iterating over a list
for cookie in cookies:
    print(cookie)
chocolate chip
peanut butter
Tirggel
Sorting lists
sorted() function sorts data in numerical or alphabetical
order and returns a new list
print(cookies)
['chocolate chip', 'peanut butter', 'Tirggel']
sorted_cookies = sorted(cookies)
print(sorted_cookies)
['Tirggel', 'chocolate chip', 'peanut butter']
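'Tirggel' sorts first above because uppercase letters compare lower than lowercase ones. A short sketch, assuming a case-insensitive order is wanted, passes key=str.lower to sorted(); the name sorted_ignoring_case is illustrative and not from the slides.

```python
cookies = ['chocolate chip', 'peanut butter', 'Tirggel']

# Default ordering compares code points, so 'T' comes before 'c'
sorted_cookies = sorted(cookies)

# key=str.lower compares lowercased copies; the original strings are returned
sorted_ignoring_case = sorted(cookies, key=str.lower)

print(sorted_cookies)
print(sorted_ignoring_case)
```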
Let's practice!
Meet the Tuples
Tuple, Tuple
Hold data in order
Index
Immutable
Pairing
Unpackable
Zipping tuples
Tuples are commonly created by zipping lists together with zip()
Two lists: us_cookies, in_cookies
top_pairs = list(zip(us_cookies, in_cookies))
print(top_pairs)
[('Chocolate Chip', 'Punjabi'), ('Brownies', 'Fruit Cake Rusk'),
('Peanut Butter', 'Marble Cookies'), ('Oreos', 'Kaju Pista Cookies'),
('Oatmeal Raisin', 'Almond Cookies')]
Unpacking tuples
Unpacking tuples is a very expressive way of working with data
us_num_1, in_num_1 = top_pairs[0]
print(us_num_1)
Chocolate Chip
print(in_num_1)
Punjabi
More unpacking in Loops
Unpacking is especially powerful in loops
for us_cookie, in_cookie in top_pairs:
    print(in_cookie)
    print(us_cookie)
Punjabi
Chocolate Chip
Fruit Cake Rusk
Brownies
# ..etc..
Enumerating positions
Another useful tuple creation method is the enumerate() function
Enumeration is used in loops to return the position and the
data in that position while looping
for idx, item in enumerate(top_pairs):
    us_cookie, in_cookie = item
    print(idx, us_cookie, in_cookie)
0 Chocolate Chip Punjabi
1 Brownies Fruit Cake Rusk
# ..etc..
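The zipping, unpacking, and enumerating steps above can be combined in one loop. This sketch rebuilds the two input lists from the first two pairs printed earlier (shortened for brevity), unpacks each pair directly in the for statement, and uses the optional start argument of enumerate() to count from 1. The names rank and ranked are illustrative.

```python
us_cookies = ['Chocolate Chip', 'Brownies']
in_cookies = ['Punjabi', 'Fruit Cake Rusk']

top_pairs = list(zip(us_cookies, in_cookies))

ranked = []
# The parentheses unpack each (us, in) tuple while enumerate supplies the rank
for rank, (us_cookie, in_cookie) in enumerate(top_pairs, start=1):
    ranked.append((rank, us_cookie, in_cookie))
    print(rank, us_cookie, in_cookie)
```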
Be careful when making tuples
Use zip(), enumerate(), or () to make tuples
item = ('vanilla', 'chocolate')
print(item)
('vanilla', 'chocolate')
Beware of trailing commas!
item2 = 'butter',
print(item2)
('butter',)
Let's practice!
Sets for unordered and unique data
Set
Unique
Unordered
Mutable
Python's implementation of Set Theory from Mathematics
Creating Sets
Sets are created from a list
cookies_eaten_today = ['chocolate chip', 'peanut butter',
                       'chocolate chip', 'oatmeal cream', 'chocolate chip']
types_of_cookies_eaten = set(cookies_eaten_today)
print(types_of_cookies_eaten)
{'chocolate chip', 'oatmeal cream', 'peanut butter'}
Modifying Sets
.add() adds single elements
.update() merges in another set or list
types_of_cookies_eaten.add('biscotti')
types_of_cookies_eaten.add('chocolate chip')
print(types_of_cookies_eaten)
{'chocolate chip', 'oatmeal cream', 'peanut butter', 'biscotti'}
Updating Sets
cookies_hugo_ate = ['chocolate chip', 'anzac']
types_of_cookies_eaten.update(cookies_hugo_ate)
print(types_of_cookies_eaten)
{'chocolate chip', 'anzac', 'oatmeal cream', 'peanut butter', 'biscotti'}
Removing data from sets
.discard() safely removes an element from the set by value
.pop() removes and returns an arbitrary element from the
set (KeyError when empty)
types_of_cookies_eaten.discard('biscotti')
print(types_of_cookies_eaten)
{'chocolate chip', 'anzac', 'oatmeal cream', 'peanut butter'}
types_of_cookies_eaten.pop()
'chocolate chip'
types_of_cookies_eaten.pop()
'anzac'
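A short sketch of why .discard() is described as safe: the related .remove() method raises a KeyError for a missing element, while .discard() does nothing. The set contents are copied from the slides; the names remove_failed and popped are illustrative.

```python
types_of_cookies_eaten = {'chocolate chip', 'anzac', 'oatmeal cream'}

# discard() on a missing element is a silent no-op
types_of_cookies_eaten.discard('biscotti')

# remove() on a missing element raises KeyError
try:
    types_of_cookies_eaten.remove('biscotti')
    remove_failed = False
except KeyError:
    remove_failed = True

# pop() returns some element; which one is not defined for a set
popped = types_of_cookies_eaten.pop()
print(remove_failed, len(types_of_cookies_eaten))
```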
Set Operations - Similarities
.union() set method returns a set of all the names (or)
.intersection() method identifies overlapping data (and)
cookies_jason_ate = set(['chocolate chip', 'oatmeal cream', 'peanut butter'])
cookies_hugo_ate = set(['chocolate chip', 'anzac'])
cookies_jason_ate.union(cookies_hugo_ate)
{'chocolate chip', 'anzac', 'oatmeal cream', 'peanut butter'}
cookies_jason_ate.intersection(cookies_hugo_ate)
{'chocolate chip'}
Set Operations - Differences
.difference() method identifies data present in the set on
which the method was used that is not in the arguments (-)
Target is important!
cookies_jason_ate.difference(cookies_hugo_ate)
{'oatmeal cream', 'peanut butter'}
cookies_hugo_ate.difference(cookies_jason_ate)
{'anzac'}
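The (or), (and), and (-) hints above correspond to set operators. A minimal sketch using the same two sets; unlike the methods, the operators require both operands to be sets. The result names are illustrative.

```python
cookies_jason_ate = {'chocolate chip', 'oatmeal cream', 'peanut butter'}
cookies_hugo_ate = {'chocolate chip', 'anzac'}

either = cookies_jason_ate | cookies_hugo_ate      # same as .union()
both = cookies_jason_ate & cookies_hugo_ate        # same as .intersection()
only_jason = cookies_jason_ate - cookies_hugo_ate  # same as .difference()
only_hugo = cookies_hugo_ate - cookies_jason_ate   # order of operands matters

print(sorted(either))
print(sorted(both), sorted(only_jason), sorted(only_hugo))
```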
Let's practice!
Using dictionaries
Creating and looping through dictionaries
Hold data in key/value pairs
Nestable (use a dictionary as the value of a key within a
dictionary)
Iterable
Created by dict() or {}
art_galleries = {}
for name, zip_code in galleries:
    art_galleries[name] = zip_code
Printing in the loop
for name in art_galleries:
    print(name)
Zwirner David Gallery
Zwirner & Wirth
Zito Studio Gallery
Zetterquist Galleries
Zarre Andre Gallery
Safely finding by key
art_galleries['Louvre']
KeyError Traceback (most recent call last)
<ipython-input-1-4f51c265f287> in <module>()
--> 1 art_galleries['Louvre']
KeyError: 'Louvre'
Getting a value from a dictionary is done using the key as an index
If you ask for a key that does not exist, your program stops with a KeyError
Safely finding by key (cont.)
.get() method allows you to safely access a key without
error or exception handling
If a key is not in the dictionary, .get() returns None by
default or you can supply a value to return
art_galleries.get('Louvre', 'Not Found')
'Not Found'
art_galleries.get('Zarre Andre Gallery')
'10011'
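For comparison, a sketch of the same lookup written with explicit exception handling and with .get(). The one-entry dictionary is a stand-in for the art_galleries data above, and the result names are illustrative.

```python
art_galleries = {'Zarre Andre Gallery': '10011'}

# Explicit exception handling around an index lookup
try:
    louvre_zip = art_galleries['Louvre']
except KeyError:
    louvre_zip = 'Not Found'

# .get() gives the same result in one call
louvre_zip_get = art_galleries.get('Louvre', 'Not Found')

# Without a second argument the fallback is None
missing = art_galleries.get('Louvre')

print(louvre_zip, louvre_zip_get, missing)
```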
Working with nested dictionaries
art_galleries.keys()
dict_keys(['10021', '10013', '10001', '10009', '10011',
'10022', '10027', '10019', '11106', '10128'])
print(art_galleries['10027'])
{"Paige's Art Gallery": '(212) 531-1577',
'Triple Candie': '(212) 865-0783',
'Africart Motherland Inc': '(212) 368-6802',
'Inner City Art Gallery Inc': '(212) 368-4941'}
The .keys() method shows the keys for a given dictionary
Accessing nested data
art_galleries['10027']['Inner City Art Gallery Inc']
'(212) 368-4941'
Common way to deal with repeating data structures
Can be accessed using multiple indices or the .get() method
Let's practice!
Altering dictionaries
Adding and extending dictionaries
Assignment to add a new key/value to a dictionary
.update() method to update a dictionary from another
dictionary, tuples or keywords
print(galleries_10007)
{'Nyabinghi Africian Gift Shop': '(212) 566-3336'}
art_galleries['10007'] = galleries_10007
Updating a dictionary
galleries_11234 = [
    ('A J ARTS LTD', '(718) 763-5473'),
    ('Doug Meyer Fine Art', '(718) 375-8006'),
    ('Portrait Gallery', '(718) 377-8762')]
art_galleries['11234'].update(galleries_11234)
print(art_galleries['11234'])
{'Portrait Gallery': '(718) 377-8762',
'A J ARTS LTD': '(718) 763-5473',
'Doug Meyer Fine Art': '(718) 375-8006'}
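A sketch of the three input forms .update() accepts, as listed earlier: an iterable of key/value tuples, another dictionary, and keyword arguments. It assumes the '11234' entry may not exist yet and creates it with .setdefault() first; the key Hypothetical_Gallery and its phone number are made-up placeholders.

```python
art_galleries = {}

# Make sure the nested dictionary exists before updating it
galleries_11234 = art_galleries.setdefault('11234', {})

# From a list of (key, value) tuples
galleries_11234.update([('A J ARTS LTD', '(718) 763-5473'),
                        ('Doug Meyer Fine Art', '(718) 375-8006')])

# From another dictionary
galleries_11234.update({'Portrait Gallery': '(718) 377-8762'})

# From keyword arguments (keys must be valid Python identifiers)
galleries_11234.update(Hypothetical_Gallery='(000) 000-0000')

print(len(art_galleries['11234']))
```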
Popping and deleting from dictionaries
del instruction deletes a key/value
.pop() method safely removes a key/value from a dictionary and returns
the value (pass a default to avoid a KeyError when the key is missing)
del art_galleries['11234']
galleries_10310 = art_galleries.pop('10310')
print(galleries_10310)
{'New Dorp Village Antiques Ltd': '(718) 815-2526'}
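A small sketch of how del and .pop() behave when a key is missing. The dictionary is a reduced stand-in for art_galleries, and the names galleries_again and del_failed are illustrative.

```python
art_galleries = {
    '10310': {'New Dorp Village Antiques Ltd': '(718) 815-2526'},
}

# .pop() removes the key and hands back its value
galleries_10310 = art_galleries.pop('10310')

# With a default, popping a missing key returns the default instead of raising
galleries_again = art_galleries.pop('10310', {})

# del on a missing key raises KeyError
try:
    del art_galleries['10310']
    del_failed = False
except KeyError:
    del_failed = True

print(galleries_10310, galleries_again, del_failed)
```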
Let's practice!
Pythonically using dictionaries
Working with dictionaries more pythonically
.items() method returns an object we can iterate over
for gallery, phone_num in art_galleries.items():
    print(gallery)
    print(phone_num)
Miakey Art Gallery
(718) 686-0788
Morning Star Gallery Ltd
(212) 334-9330
New York Art Expo Inc
(212) 363-8280
Checking dictionaries for data
.get() does a lot of work to check for a key
in operator is much more efficient and clearer
'11234' in art_galleries
False
if '10010' in art_galleries:
    print('I found: %s' % art_galleries['10010'])
else:
    print('No galleries found.')
I found: {'Nyabinghi Africian Gift Shop': '(212) 566-3336'}
Let's practice!
Working with CSV files
CSV Files
NAME,TEL,ADDRESS1,ADDRESS2,CITY,ZIP
O'reilly William & Co Ltd,(212) 396-1822,52 E 76th St,,New York,10021
Reading from a file using CSV reader
Python csv module
open() function provides a variable that represents a file,
takes a path and a mode
csv.reader() reads a file object and returns the lines from
the file as lists of strings
.close() method closes file objects
import csv
csvfile = open('ART_GALLERY.csv', 'r')
for row in csv.reader(csvfile):
    print(row)
Reading from a CSV - Results
['NAME', 'the_geom', 'TEL', 'URL',
'ADDRESS1', 'ADDRESS2', 'CITY', 'ZIP']
["O'reilly William & Co Ltd",
'POINT (-73.96273074561996 40.773800871637576)',
'(212) 396-1822', '52 E 76th St', '', 'New York',
'10021']
csvfile.close()
Creating a dictionary from a file
Often we want to go from CSV file to dictionary
DictReader does just that
If data doesn't have a header row, you can pass in the
column names
for row in csv.DictReader(csvfile):
    print(row)
OrderedDict([('NAME', 'Odyssia Gallery'),
('the_geom', 'POINT (-73.96269813635554 40.7618747512849)'),
('TEL', '(212) 486-7338'),
('URL', 'http://www.livevillage.com/newyork/art/odyssia-gallery.html'),
('ADDRESS1', '305 E 61st St'), ...
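A self-contained sketch of both readers. It uses io.StringIO in place of ART_GALLERY.csv so it runs without the file, with the header and sample row from the CSV Files slide; with a real file, a `with open(path, newline='') as csvfile:` block would also close the file automatically. The fieldnames argument shown last is one way to pass column names when the data has no header row. The variable names are illustrative.

```python
import csv
import io

sample = (
    "NAME,TEL,ADDRESS1,ADDRESS2,CITY,ZIP\n"
    "O'reilly William & Co Ltd,(212) 396-1822,52 E 76th St,,New York,10021\n"
)

# csv.reader yields each row as a list of strings
rows = list(csv.reader(io.StringIO(sample)))

# csv.DictReader uses the first row as keys
dict_rows = list(csv.DictReader(io.StringIO(sample)))

# For data without a header row, supply the column names yourself
headerless = sample.split("\n", 1)[1]
columns = ['NAME', 'TEL', 'ADDRESS1', 'ADDRESS2', 'CITY', 'ZIP']
named_rows = list(csv.DictReader(io.StringIO(headerless), fieldnames=columns))

print(rows[1][0], dict_rows[0]['ZIP'], named_rows[0]['CITY'])
```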
Let's practice!
Counting made easy
Collections Module
Part of Standard Library
Advanced data containers
Counter
Special dictionary used for counting data, measuring frequency
from collections import Counter
nyc_eatery_count_by_types = Counter(nyc_eatery_types)
print(nyc_eatery_count_by_types)
Counter({'Mobile Food Truck': 114, 'Food Cart': 74, 'Snack Bar': 24,
'Specialty Cart': 18, 'Restaurant': 15, 'Fruit & Vegetable Cart': 4})
print(nyc_eatery_count_by_types['Restaurant'])
15
Counter to find the most common
.most_common() method returns the counter values in
descending order
print(nyc_eatery_count_by_types.most_common(3))
[('Mobile Food Truck', 114), ('Food Cart', 74), ('Snack Bar', 24)]
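A runnable sketch of the same pattern on a tiny made-up list, since nyc_eatery_types is not defined in these slides. It also shows that a Counter returns 0 for a key it has never seen instead of raising a KeyError.

```python
from collections import Counter

# Hypothetical sample; the real list would have one entry per eatery
nyc_eatery_types = ['Food Cart', 'Mobile Food Truck', 'Food Cart',
                    'Snack Bar', 'Mobile Food Truck', 'Mobile Food Truck']

counts = Counter(nyc_eatery_types)

top_two = counts.most_common(2)   # list of (value, count) tuples
unseen = counts['Restaurant']     # missing keys count as zero

print(top_two, unseen)
```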
Let's practice!
Dictionaries of unknown structure - defaultdict
Dictionary Handling
eateries_by_park = {}
for park_id, name in nyc_eateries_parks:
    if park_id not in eateries_by_park:
        eateries_by_park[park_id] = []
    eateries_by_park[park_id].append(name)
print(eateries_by_park['M010'])
['MOHAMMAD MATIN', 'PRODUCTS CORP.', 'Loeb Boathouse Restaurant',
'Nandita Inc.', 'SALIM AHAMED', 'THE NY PICNIC COMPANY',
'THE NEW YORK PICNIC COMPANY, INC.', 'NANDITA, INC.',
'JANANI FOOD SERVICE, INC.']
Using defaultdict
Pass it a default type that every key will have even if it
doesn't currently exist
Works exactly like a dictionary
from collections import defaultdict
eateries_by_park = defaultdict(list)
for park_id, name in nyc_eateries_parks:
    eateries_by_park[park_id].append(name)
print(eateries_by_park['M010'])
['MOHAMMAD MATIN', 'PRODUCTS CORP.', 'Loeb Boathouse Restaurant',
'Nandita Inc.', 'SALIM AHAMED', 'THE NY PICNIC COMPANY',
'THE NEW YORK PICNIC COMPANY, INC.', 'NANDITA, INC.', ...]
defaultdict (cont.)
from collections import defaultdict
eatery_contact_types = defaultdict(int)
for eatery in nyc_eateries:
    if eatery.get('phone'):
        eatery_contact_types['phones'] += 1
    if eatery.get('website'):
        eatery_contact_types['websites'] += 1
print(eatery_contact_types)
defaultdict(<class 'int'>, {'phones': 28, 'websites': 31})
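A self-contained sketch of both defaultdict uses above on a few made-up (park_id, name) pairs. One difference from a plain dictionary is worth knowing: merely reading a missing key with [] creates it. The names eateries_per_park, before, and after, and the key 'B000', are illustrative.

```python
from collections import defaultdict

# Hypothetical sample pairs standing in for nyc_eateries_parks
nyc_eateries_parks = [('M010', 'Loeb Boathouse Restaurant'),
                      ('M010', 'Nandita Inc.'),
                      ('X289', 'Mapes Avenue Ballfields Mobile Food Truck')]

eateries_by_park = defaultdict(list)   # missing keys start as []
eateries_per_park = defaultdict(int)   # missing keys start as 0

for park_id, name in nyc_eateries_parks:
    eateries_by_park[park_id].append(name)
    eateries_per_park[park_id] += 1

# Reading an unknown key with [] inserts it; `in` and .get() do not
before = 'B000' in eateries_by_park
_ = eateries_by_park['B000']
after = 'B000' in eateries_by_park

print(dict(eateries_per_park), before, after)
```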
Let's practice!
Maintaining Dictionary Order with OrderedDict
Order in Python dictionaries
Python version < 3.6: NOT ordered
Python version >= 3.6: insertion ordered (a language guarantee from 3.7 on)
Getting started with OrderedDict
from collections import OrderedDict
nyc_eatery_permits = OrderedDict()
for eatery in nyc_eateries:
    nyc_eatery_permits[eatery['end_date']] = eatery
print(list(nyc_eatery_permits.items())[:3])
('2029-04-28', {'name': 'Union Square Seasonal Cafe',
'location': 'Union Square Park', 'park_id': 'M089',
'start_date': '2014-04-29', 'end_date': '2029-04-28',
'description': None, 'permit_number': 'M89-SB-R', ...})
OrderedDict power feature
.popitem() method returns items in reverse insertion order
print(nyc_eatery_permits.popitem())
('2029-04-28', {'name': 'Union Square Seasonal Cafe',
'location': 'Union Square Park', 'park_id': 'M089',
'start_date': '2014-04-29', 'end_date': '2029-04-28',
'description': None, 'permit_number': 'M89-SB-R', ...})
print(nyc_eatery_permits.popitem())
('2027-03-31', {'name': 'Dyckman Marina Restaurant',
'location': 'Dyckman Marina Restaurant', 'park_id': 'M028',
'start_date': '2012-04-01', 'end_date': '2027-03-31', ...})
OrderedDict power feature (2)
You can use the last=False keyword argument to return the
items in insertion order
print(nyc_eatery_permits.popitem(last=False))
('2012-12-07', {'name': 'Mapes Avenue Ballfields Mobile Food Truck',
'location': 'Prospect Avenue, E. 181st Street', 'park_id': 'X289',
'start_date': '2009-07-01', 'end_date': '2012-12-07',
'description': None, 'permit_number': 'X289-MT', 'phone': None,
'website': None, 'type_name': 'Mobile Food Truck'})
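A compact sketch of .popitem() from both ends, using three of the end dates shown above as keys and placeholder strings as values. The names newest and oldest are illustrative.

```python
from collections import OrderedDict

nyc_eatery_permits = OrderedDict()
# Insert in ascending end_date order
for end_date, label in [('2012-12-07', 'first inserted'),
                        ('2027-03-31', 'second inserted'),
                        ('2029-04-28', 'last inserted')]:
    nyc_eatery_permits[end_date] = label

newest = nyc_eatery_permits.popitem()            # last item inserted
oldest = nyc_eatery_permits.popitem(last=False)  # first item inserted

print(newest, oldest, list(nyc_eatery_permits))
```

A plain dict also has .popitem(), but it takes no last argument and always removes the most recently inserted item.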
Let's practice!
namedtuple
What is a namedtuple?
A tuple where each position (column) has a name
Ensure each one has the same properties
Alternative to a pandas DataFrame row
Creating a namedtuple
Pass a name and a list of fields
from collections import namedtuple
Eatery = namedtuple('Eatery', ['name', 'location', 'park_id',
                               'type_name'])
eateries = []
for eatery in nyc_eateries:
    details = Eatery(eatery['name'],
                     eatery['location'],
                     eatery['park_id'],
                     eatery['type_name'])
    eateries.append(details)
Print the first element
print(eateries[0])
Eatery(name='Mapes Avenue Ballfields Mobile Food Truck',
location='Prospect Avenue, E. 181st Street',
park_id='X289', type_name='Mobile Food Truck')
Leveraging namedtuples
Each field is available as an attribute of the namedtuple
for eatery in eateries[:3]:
    print(eatery.name)
    print(eatery.park_id)
    print(eatery.location)
Mapes Avenue Ballfields Mobile Food Truck
X289
Prospect Avenue, E. 181st Street
Claremont Park Mobile Food Truck
X008
East 172 Street between Teller & Morris avenues ...
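A short sketch of a few more namedtuple conveniences on the first eatery printed above: positional unpacking, conversion to a dictionary with ._asdict(), and ._replace(), which returns a modified copy because the tuple itself is immutable. The park_id 'X008' is taken from the second eatery in the output; the names as_dict and moved are illustrative.

```python
from collections import namedtuple

Eatery = namedtuple('Eatery', ['name', 'location', 'park_id', 'type_name'])

eatery = Eatery(name='Mapes Avenue Ballfields Mobile Food Truck',
                location='Prospect Avenue, E. 181st Street',
                park_id='X289',
                type_name='Mobile Food Truck')

# Still a tuple: indexing and unpacking work
name, location, park_id, type_name = eatery

# Convert to a dictionary keyed by field name
as_dict = eatery._asdict()

# Fields cannot be assigned; _replace builds a new namedtuple instead
moved = eatery._replace(park_id='X008')

print(eatery[2], as_dict['type_name'], moved.park_id)
```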
Let's practice!
There and Back Again a DateTime Journey
From string to datetime
The datetime module is part of the Python standard library
Use the datetime type from inside the datetime module
.strptime() method converts from a string to a datetime
object
from datetime import datetime
print(parking_violations_date)
06/11/2016
Parsing strings into datetimes
date_dt = datetime.strptime(parking_violations_date, '%m/%d/%Y')
print(date_dt)
2016-06-11 00:00:00
Time Format Strings
Directive   Meaning                                             Example
%d          Day of the month as a zero-padded decimal number.   01, 02, ..., 31
%m          Month as a zero-padded decimal number.              01, 02, ..., 12
%Y          Year with century as a decimal number.              0001, 0002, ..., 2013, 2014, ..., 9998, 9999
Full list available in the Python documentation
Datetime to String
.strftime() method uses a format string to convert a
datetime object to a string
date_dt.strftime('%m/%d/%Y')
'06/11/2016'
.isoformat() method outputs a datetime as an ISO standard string
date_dt.isoformat()
'2016-06-11T00:00:00'
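A round-trip sketch tying the two methods together with the date string from the slides. The year-first output layout and the mismatch check are illustrative additions, not from the course.

```python
from datetime import datetime

parking_violations_date = '06/11/2016'

# String -> datetime: the format must describe the input exactly
date_dt = datetime.strptime(parking_violations_date, '%m/%d/%Y')

# datetime -> string, in the original layout and in a year-first layout
same_layout = date_dt.strftime('%m/%d/%Y')
year_first = date_dt.strftime('%Y-%m-%d')

# A mismatched format raises ValueError rather than guessing
try:
    datetime.strptime(parking_violations_date, '%Y-%m-%d')
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(same_layout, year_first, date_dt.isoformat(), mismatch_raised)
```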
Let's practice!
Working with Datetime Components and current time
Datetime Components
day, month, year, hour, minute, second, and more
are available from a datetime instance
Great for grouping data
daily_violations = defaultdict(int)
for violation in parking_violations:
    violation_date = datetime.strptime(violation[4], '%m/%d/%Y')
    daily_violations[violation_date.day] += 1
Datetime Components - Results
print(sorted(daily_violations.items()))
[(1, 80986), (2, 79831), (3, 74610), (4, 69555),
(5, 68729), (6, 76232),(7, 82477), (8, 72472),
(9, 80415), (10, 75387), (11, 73287), (12, 74614),
(13, 75278), (14, 81803), (15, 79122), (16, 80692),
(17, 73677), (18, 75927), (19, 80813), (20, 80992),
(21, 78138), (22, 81872), (23, 78104), (24, 63490),
(25, 78898), (26, 78830), (27, 80164), (28, 81954),
(29, 80585), (30, 65864), (31, 44125)]
What is the deal with now
.now() method returns the current local datetime
.utcnow() method returns the current UTC datetime
from datetime import datetime
local_dt = datetime.now()
print(local_dt)
2017-05-05 12:30:00.740415
What is the deal with utcnow
utc_dt = datetime.utcnow()
print(utc_dt)
2017-05-05 17:30:05.467221
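Both calls above return naive datetimes, and recent Python releases (3.12 and later) mark datetime.utcnow() as deprecated. A hedged sketch of the standard-library alternative, which returns an aware UTC datetime:

```python
from datetime import datetime, timezone

# Aware "now" in UTC: carries tzinfo, unlike datetime.utcnow()
utc_dt = datetime.now(timezone.utc)

# Naive local "now" for comparison
local_dt = datetime.now()

print(utc_dt.tzinfo, local_dt.tzinfo)
```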
Timezones
Naive datetime objects have no timezone data
Aware datetime objects have a timezone
Timezone data is available from the pytz module through the
timezone object
Aware objects have .astimezone() so you can get the time
in another timezone
Timezones in action
from pytz import timezone
record_dt = datetime.strptime('07/12/2016 04:39PM',
                              '%m/%d/%Y %H:%M%p')
ny_tz = timezone('US/Eastern')
la_tz = timezone('US/Pacific')
# pytz zones are attached with localize(); replace(tzinfo=...) picks up the zone's LMT offset
ny_dt = ny_tz.localize(record_dt)
la_dt = ny_dt.astimezone(la_tz)
Timezones in action - results
print(ny_dt)
2016-07-12 04:39:00-04:00
print(la_dt)
2016-07-12 01:39:00-07:00
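The record string ends in PM, yet the times above read 04:39 and 01:39. That follows from the format string: when parsing, %p only affects the hour if the hour is read with %I (12-hour clock) rather than %H. A small sketch comparing the two, in case the afternoon time is what is wanted; the names raw, with_H, and with_I are illustrative.

```python
from datetime import datetime

raw = '07/12/2016 04:39PM'

# %H reads "04" as a 24-hour value, so the PM marker is ignored
with_H = datetime.strptime(raw, '%m/%d/%Y %H:%M%p')

# %I reads a 12-hour value and %p then shifts it to the afternoon
with_I = datetime.strptime(raw, '%m/%d/%Y %I:%M%p')

print(with_H.hour, with_I.hour)
```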
Let's practice!
Time Travel (Adding and Subtracting Time)
Incrementing through time
timedelta is used to represent an amount of change in time
Used to add or subtract a set amount of time from a datetime
object
from datetime import timedelta
flashback = timedelta(days=90)
print(record_dt)
2016-07-12 04:39:00
Adding and subtracting timedeltas
print(record_dt - flashback)
2016-04-13 04:39:00
print(record_dt + flashback)
2016-10-10 04:39:00
Datetime differences
Use the - operator to calculate the difference
Returns a timedelta with the difference
time_diff = record_dt - record2_dt
type(time_diff)
datetime.timedelta
print(time_diff)
0:00:04
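A self-contained sketch of the arithmetic above. record2_dt is not defined in the slides, so the value here is an assumption chosen to reproduce the four-second difference; the names earlier and later are illustrative.

```python
from datetime import datetime, timedelta

record_dt = datetime(2016, 7, 12, 4, 39, 0)
record2_dt = datetime(2016, 7, 12, 4, 38, 56)  # assumed value

flashback = timedelta(days=90)
earlier = record_dt - flashback
later = record_dt + flashback

# Subtracting two datetimes yields a timedelta
time_diff = record_dt - record2_dt

print(earlier, later, time_diff, time_diff.total_seconds())
```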
Let's practice!
HELP! Libraries to make it easier
Parsing time with pendulum
.parse() will attempt to convert a string to a pendulum
datetime object without the need of the format string
import pendulum
occurred = violation[4] + ' ' + violation[5] + 'M'
occurred_dt = pendulum.parse(occurred, tz='US/Eastern')
print(occurred_dt)
'2016-06-11T14:38:00-04:00'
Timezone hopping with pendulum
.in_timezone() method converts a pendulum time object to
a desired timezone.
.now() method accepts a timezone you want to get the
current time in
print(violation_dts)
[<Pendulum [2016-06-11T14:38:00-04:00]>,
<Pendulum [2016-04-25T14:09:00-04:00]>,
<Pendulum [2016-04-23T07:49:00-04:00]>,
<Pendulum [2016-04-26T07:09:00-04:00]>,
<Pendulum [2016-01-04T09:52:00-05:00]>]
More timezone hopping
for violation_dt in violation_dts:
    print(violation_dt.in_timezone('Asia/Tokyo'))
2016-06-12T03:38:00+09:00
2016-04-26T03:09:00+09:00
2016-04-23T20:49:00+09:00
2016-04-26T20:09:00+09:00
2016-01-04T23:52:00+09:00
print(pendulum.now('Asia/Tokyo'))
<Pendulum [2017-05-06T08:20:40.104160+09:00]>
Humanizing differences
.in_XXX() methods provide the difference in a chosen metric
.in_words() provides the difference in a nice expressive form
diff = violation_dts[3] - violation_dts[2]
diff
<Period [2016-04-26T07:09:00-04:00 -> 2016-04-23T07:49:00-04:00]>
print(diff.in_words())
'2 days 23 hours 20 minutes'
More human than human
print(diff.in_days())
2
print(diff.in_hours())
71
Let's practice!
Case Study - Counting Crimes
Data Set Overview
Date,Block,Primary Type,Description,Location Description,Arrest,Domestic,District
05/23/2016 05:35:00 PM,024XX W DIVISION ST,ASSAULT,SIMPLE,STREET,false,true,14
03/26/2016 08:20:00 PM,019XX W HOWARD ST,BURGLARY,FORCIBLE ENTRY,SMALL RETAIL STORE,false,false,24
Chicago Open Data Portal https://data.cityofchicago.org/
Part 1 - Step 1
Read data from CSV
import csv
csvfile = open('ART_GALLERY.csv', 'r')
for row in csv.reader(csvfile):
    print(row)
Part 1 - Step 2
Create and use a Counter with a slight twist
from collections import Counter
nyc_eatery_count_by_types = Counter(nyc_eatery_types)
Use date parts for grouping like in Chapter 4
daily_violations = defaultdict(int)
for violation in parking_violations:
    violation_date = datetime.strptime(violation[4], '%m/%d/%Y')
    daily_violations[violation_date.day] += 1
Part 1 - Step 3
Group data by Month
The date components we learned about earlier.
from collections import defaultdict
eateries_by_park = defaultdict(list)
for park_id, name in nyc_eateries_parks:
    eateries_by_park[park_id].append(name)
Part 1 - Final
Find 5 most common locations for crime each month.
print(nyc_eatery_count_by_types.most_common(3))
[('Mobile Food Truck', 114), ('Food Cart', 74), ('Snack Bar', 24)]
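The Part 1 slides reuse earlier snippets rather than showing crime-specific code. The following is one possible sketch of the steps on the two sample rows from the Data Set Overview; the column positions (0 for Date, 4 for Location Description) and the %I-based format string are read off that sample and are assumptions about the full file. The names locations_by_month and top_by_month are illustrative.

```python
import csv
import io
from collections import Counter, defaultdict
from datetime import datetime

sample = (
    "Date,Block,Primary Type,Description,"
    "Location Description,Arrest,Domestic,District\n"
    "05/23/2016 05:35:00 PM,024XX W DIVISION ST,ASSAULT,SIMPLE,"
    "STREET,false,true,14\n"
    "03/26/2016 08:20:00 PM,019XX W HOWARD ST,BURGLARY,FORCIBLE ENTRY,"
    "SMALL RETAIL STORE,false,false,24\n"
)

# Step 1: read the rows, skipping the header
reader = csv.reader(io.StringIO(sample))
next(reader)

# Steps 2 and 3: group location descriptions by month
locations_by_month = defaultdict(list)
for row in reader:
    date = datetime.strptime(row[0], '%m/%d/%Y %I:%M:%S %p')
    locations_by_month[date.month].append(row[4])

# Final: most common locations per month
top_by_month = {month: Counter(locations).most_common(5)
                for month, locations in sorted(locations_by_month.items())}
print(top_by_month)
```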
Let's practice!
Case Study - Crimes by District and Differences by Block
Part 2 - Step 1
Read in the CSV data as a dictionary
import csv
csvfile = open('ART_GALLERY.csv', 'r')
for row in csv.DictReader(csvfile):
    print(row)
Pop out the key and store the remaining dict
galleries_10310 = art_galleries.pop('10310')
Part 2 - Step 2
Pythonically iterate over the Dictionary
for zip_code, galleries in art_galleries.items():
    print(zip_code)
    print(galleries)
Wrapping Up
Use sets for uniqueness
cookies_eaten_today = ['chocolate chip', 'peanut butter',
'chocolate chip', 'oatmeal cream', 'chocolate chip']
types_of_cookies_eaten = set(cookies_eaten_today)
print(types_of_cookies_eaten)
{'chocolate chip', 'oatmeal cream', 'peanut butter'}
.difference() set method as at the end of Chapter 1
cookies_jason_ate.difference(cookies_hugo_ate)
{'oatmeal cream', 'peanut butter'}
Let's practice!
Final thoughts
Congratulations