Pivot Tables: working with long and wide data Working with dates, times and their indexes
These features work with and often create Dates and time – points and spans
hierarchical or multi-level Indexes; With its focus on time-series data, pandas has a suite of
(the pandas MultiIndex is powerful and complex). tools for managing dates and time: either as a point in
time (a Timestamp) or as a span of time (a Period).
Pivot, unstack, stack and melt t = [Link]('2013-01-01')
Pivot tables move from long format to wide format data t = [Link]('2013-01-01 [Link]')
# Let's start with data in long format t = [Link]('2013-01-01 [Link].7')
from StringIO import StringIO # python2.7 p = [Link]('2013-01-01', freq='M')
#from io import StringIO # python 3 Note: Timestamps should be in range 1678 and 2261
data = """Date,Pollster,State,Party,Est years. (Check [Link] and [Link]).
13/03/2014, Newspoll, NSW, red, 25
13/03/2014, Newspoll, NSW, blue, 28 A Series of Timestamps or Periods
13/03/2014, Newspoll, Vic, red, 24 ts = ['2015-04-01', '2014-04-02']
13/03/2014, Newspoll, Vic, blue, 23
13/03/2014, Galaxy, NSW, red, 23 # Series of Timestamps (good)
13/03/2014, Galaxy, NSW, blue, 24 s = pd.to_datetime([Link](ts))
13/03/2014, Galaxy, Vic, red, 26
13/03/2014, Galaxy, Vic, blue, 25 # Series of Periods (hard to make)
13/03/2014, Galaxy, Qld, red, 21 s = [Link](
13/03/2014, Galaxy, Qld, blue, 27""" [[Link](x, freq='M') for x in ts] )
df = pd.read_csv(StringIO(data), s = [Link]([Link](ts,freq='D'))
header=0, skipinitialspace=True) Note: While Periods make a very useful index; they may
be less useful in a Series.
# pivot to wide format on 'Party' column
# 1st: set up a MultiIndex for other cols From non-standard strings to Timestamps
df1 = df.set_index(['Date', 'Pollster', t = ['[Link].7654-JAN092002',
'State']) '[Link].6589-FEB082016']
# 2nd: do the pivot s = [Link](pd.to_datetime(t,
wide1 = [Link](columns='Party') format="%H:%M:%S.%f-%b%d%Y"))
Also: %B = full month name; %m = numeric month;
# unstack to wide format on State / Party %y = year without century; and more …
# 1st: MultiIndex all but the Values col
df2 = df.set_index(['Date', 'Pollster', Dates and time – stamps and spans as indexes
'State', 'Party']) An index of Timestamps is a DatetimeIndex.
# 2nd: unstack a column to go wide on it An index of Periods is a PeriodIndex.
wide2 = [Link]('State') date_strs = ['2014-01-01', '2014-04-01',
wide3 = [Link]() # pop last index '2014-07-01', '2014-10-01']
# Use stack() to get back to long format dti = [Link](date_strs)
long1 = [Link]()
# Then use reset_index() to remove the pid = [Link](date_strs, freq='D')
# MultiIndex. pim = [Link](date_strs, freq='M')
long2 = long1.reset_index() piq = [Link](date_strs, freq='Q')
# Or melt() back to long format print (pid[1] - pid[0]) # 90 days
# 1st: flatten the column index print (pim[1] - pim[0]) # 3 months
[Link] = ['_'.join(col).strip() print (piq[1] - piq[0]) # 1 quarter
for col in [Link]]
# 2nd: remove the MultiIndex time_strs = ['2015-01-01 [Link].12345',
wdf = wide1.reset_index() '2015-01-01 [Link].67890']
# 3rd: melt away pis = [Link](time_strs, freq='U')
long3 = [Link](wdf, value_vars=
['Est_blue', 'Est_red'], [Link] = pd.period_range('2015-01',
var_name='Party', id_vars=['Date', periods=len(df), freq='M')
'Pollster', 'State'])
Note: See documentation, there are many arguments to dti = pd.to_datetime(['04-01-2012'],
these methods. dayfirst=True) # Australian date format
pi = pd.period_range('1960-01-01',
'2015-12-31', freq='M')
Hint: unless you are working in less than seconds,
prefer PeriodIndex over DateTimeImdex.
Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
7