0% found this document useful (0 votes)
13 views1 page

Pandas DataFrame Notes - 12pages-Pages-7

The document discusses working with pivot tables in pandas, focusing on transforming long format data into wide format and vice versa using methods like pivot, unstack, and melt. It also covers handling dates and times, including creating Timestamps and Periods, and managing hierarchical indexes with MultiIndex. Additionally, it provides examples of converting strings to Timestamps and Periods, emphasizing the utility of PeriodIndex for time-series data.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views1 page

Pandas DataFrame Notes - 12pages-Pages-7

The document discusses working with pivot tables in pandas, focusing on transforming long format data into wide format and vice versa using methods like pivot, unstack, and melt. It also covers handling dates and times, including creating Timestamps and Periods, and managing hierarchical indexes with MultiIndex. Additionally, it provides examples of converting strings to Timestamps and Periods, emphasizing the utility of PeriodIndex for time-series data.

Uploaded by

Sàazón Kasula
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 1

Pivot Tables: working with long and wide data Working with dates, times and their indexes

These features work with and often create Dates and time – points and spans
hierarchical or multi-level Indexes; With its focus on time-series data, pandas has a suite of
(the pandas MultiIndex is powerful and complex). tools for managing dates and time: either as a point in
time (a Timestamp) or as a span of time (a Period).
Pivot, unstack, stack and melt t = [Link]('2013-01-01')
Pivot tables move from long format to wide format data t = [Link]('2013-01-01 [Link]')
# Let's start with data in long format t = [Link]('2013-01-01 [Link].7')
from StringIO import StringIO # python2.7 p = [Link]('2013-01-01', freq='M')
#from io import StringIO # python 3 Note: Timestamps should be in range 1678 and 2261
data = """Date,Pollster,State,Party,Est years. (Check [Link] and [Link]).
13/03/2014, Newspoll, NSW, red, 25
13/03/2014, Newspoll, NSW, blue, 28 A Series of Timestamps or Periods
13/03/2014, Newspoll, Vic, red, 24 ts = ['2015-04-01', '2014-04-02']
13/03/2014, Newspoll, Vic, blue, 23
13/03/2014, Galaxy, NSW, red, 23 # Series of Timestamps (good)
13/03/2014, Galaxy, NSW, blue, 24 s = pd.to_datetime([Link](ts))
13/03/2014, Galaxy, Vic, red, 26
13/03/2014, Galaxy, Vic, blue, 25 # Series of Periods (hard to make)
13/03/2014, Galaxy, Qld, red, 21 s = [Link](
13/03/2014, Galaxy, Qld, blue, 27""" [[Link](x, freq='M') for x in ts] )
df = pd.read_csv(StringIO(data), s = [Link]([Link](ts,freq='D'))
header=0, skipinitialspace=True) Note: While Periods make a very useful index; they may
be less useful in a Series.
# pivot to wide format on 'Party' column
# 1st: set up a MultiIndex for other cols From non-standard strings to Timestamps
df1 = df.set_index(['Date', 'Pollster', t = ['[Link].7654-JAN092002',
'State']) '[Link].6589-FEB082016']
# 2nd: do the pivot s = [Link](pd.to_datetime(t,
wide1 = [Link](columns='Party') format="%H:%M:%S.%f-%b%d%Y"))
Also: %B = full month name; %m = numeric month;
# unstack to wide format on State / Party %y = year without century; and more …
# 1st: MultiIndex all but the Values col
df2 = df.set_index(['Date', 'Pollster', Dates and time – stamps and spans as indexes
'State', 'Party']) An index of Timestamps is a DatetimeIndex.
# 2nd: unstack a column to go wide on it An index of Periods is a PeriodIndex.
wide2 = [Link]('State') date_strs = ['2014-01-01', '2014-04-01',
wide3 = [Link]() # pop last index '2014-07-01', '2014-10-01']

# Use stack() to get back to long format dti = [Link](date_strs)


long1 = [Link]()
# Then use reset_index() to remove the pid = [Link](date_strs, freq='D')
# MultiIndex. pim = [Link](date_strs, freq='M')
long2 = long1.reset_index() piq = [Link](date_strs, freq='Q')
# Or melt() back to long format print (pid[1] - pid[0]) # 90 days
# 1st: flatten the column index print (pim[1] - pim[0]) # 3 months
[Link] = ['_'.join(col).strip() print (piq[1] - piq[0]) # 1 quarter
for col in [Link]]
# 2nd: remove the MultiIndex time_strs = ['2015-01-01 [Link].12345',
wdf = wide1.reset_index() '2015-01-01 [Link].67890']
# 3rd: melt away pis = [Link](time_strs, freq='U')
long3 = [Link](wdf, value_vars=
['Est_blue', 'Est_red'], [Link] = pd.period_range('2015-01',
var_name='Party', id_vars=['Date', periods=len(df), freq='M')
'Pollster', 'State'])
Note: See documentation, there are many arguments to dti = pd.to_datetime(['04-01-2012'],
these methods. dayfirst=True) # Australian date format
pi = pd.period_range('1960-01-01',
'2015-12-31', freq='M')
Hint: unless you are working in less than seconds,
prefer PeriodIndex over DateTimeImdex.

Version 30 April 2017 - [Draft – Mark Graph – mark dot the dot graph at gmail dot com – @Mark_Graph on twitter]
7

You might also like