0% found this document useful (0 votes)

27 views10 pages

Pivot Tables - Python Data Science Handbook

good python book

Uploaded by

nicholasdevera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

27 views10 pages

Pivot Tables - Python Data Science Handbook

good python book

Uploaded by

nicholasdevera

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

This is an excerpt from the Python Data Science Handbook (http://shop.oreilly.com/product/0636920034919.do) by Jake
VanderPlas; Jupyter notebooks are available on GitHub (https://github.com/jakevdp/PythonDataScienceHandbook).

The text is released under the CC-BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code
is released under the MIT license (https://opensource.org/licenses/MIT). If you find this content useful, please consider
supporting the work by buying the book (http://shop.oreilly.com/product/0636920034919.do)!

Pivot Tables

< Aggregation and Grouping (03.08-aggregation-and-grouping.html) | Contents

(index.html) | Vectorized String Operations (03.10-working-with-strings.html) >

Open in Colab

(https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/note
Pivot-Tables.ipynb)

We have seen how the GroupBy abstraction lets us explore relationships within
a dataset. A pivot table is a similar operation that is commonly seen in
spreadsheets and other programs that operate on tabular data. The pivot table
takes simple column-wise data as input, and groups the entries into a two-
dimensional table that provides a multidimensional summarization of the data.
The difference between pivot tables and GroupBy can sometimes cause
confusion; it helps me to think of pivot tables as essentially a multidimensional
version of GroupBy aggregation. That is, you split-apply-combine, but both the
split and the combine happen across not a one-dimensional index, but across a
two-dimensional grid.

# Motivating Pivot Tables

For the examples in this section, we'll use the database of passengers on the
Titanic, available through the Seaborn library (see Visualization With Seaborn
(04.14-visualization-with-seaborn.html)):

In [1]: import numpy as np

import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 1/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

In [2]: titanic.head()

Out[2]: survivedpclasssex age sibspparchfare embarkedclass who adult_maled

00 3 male 22.01 0 7.2500 S Thirdman True N
11 1 female38.01 0 71.2833C First womanFalse C
21 3 female26.00 0 7.9250 S ThirdwomanFalse N
31 1 female35.01 0 53.1000S First womanFalse C
40 3 male 35.00 0 8.0500 S Thirdman True N

This contains a wealth of information on each passenger of that ill-fated voyage,

including gender, age, class, fare paid, and much more.

# Pivot Tables by Hand

To start learning more about this data, we might begin by grouping according to
gender, survival status, or some combination thereof. If you have read the
previous section, you might be tempted to apply a GroupBy operation–for
example, let's look at survival rate by gender:

In [3]: titanic.groupby('sex')[['survived']].mean()

Out[3]: survived
sex
female0.742038
male 0.188908

This immediately gives us some insight: overall, three of every four females on
board survived, while only one in five males survived!

This is useful, but we might like to go one step deeper and look at survival by
both sex and, say, class. Using the vocabulary of GroupBy , we might proceed
using something like this: we group by class and gender, select survival, apply a
mean aggregate, combine the resulting groups, and then unstack the
hierarchical index to reveal the hidden multidimensionality. In code:

In [4]: titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unst

Out[4]: class First Second Third

sex
female0.9680850.9210530.500000
male 0.3688520.1574070.135447

This gives us a better idea of how both gender and class affected survival, but
the code is starting to look a bit garbled. While each step of this pipeline makes
sense in light of the tools we've previously discussed, the long string of code is

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 2/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

not particularly easy to read or use. This two-dimensional GroupBy is common

enough that Pandas includes a convenience routine, pivot_table , which
succinctly handles this type of multi-dimensional aggregation.

# Pivot Table Syntax

Here is the equivalent to the preceding operation using the pivot_table
method of DataFrame s:

In [5]: titanic.pivot_table('survived', index='sex', columns='class')

Out[5]: class First Second Third

sex
female0.9680850.9210530.500000
male 0.3688520.1574070.135447

This is eminently more readable than the groupby approach, and produces the
same result. As you might expect of an early 20th-century transatlantic cruise,
the survival gradient favors both women and higher classes. First-class women
survived with near certainty (hi, Rose!), while only one in ten third-class men
survived (sorry, Jack!).

## Multi-level pivot tables

Just as in the GroupBy , the grouping in pivot tables can be specified with
multiple levels, and via a number of options. For example, we might be
interested in looking at age as a third dimension. We'll bin the age using the
pd.cut function:

In [6]: age = pd.cut(titanic['age'], [0, 18, 80])

titanic.pivot_table('survived', ['sex', age], 'class')

Out[6]: class First Second Third

sex age
female(0, 18] 0.9090911.0000000.511628
(18, 80]0.9729730.9000000.423729
male (0, 18] 0.8000000.6000000.215686
(18, 80]0.3750000.0714290.133663

We can apply the same strategy when working with the columns as well; let's
add info on the fare paid using pd.qcut to automatically compute quantiles:

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 3/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

In [7]: fare = pd.qcut(titanic['fare'], 2)

titanic.pivot_table('survived', ['sex', age], [fare, 'class'])

Out[7]: fare [0, 14.454] (14.454, 512.329]

class FirstSecond Third First Second Third
sex age
female(0, 18] NaN 1.0000000.7142860.9090911.0000000.318182
(18, 80]NaN 0.8800000.4444440.9729730.9142860.391304
male (0, 18] NaN 0.0000000.2608700.8000000.8181820.178571
(18, 80]0.0 0.0980390.1250000.3913040.0303030.192308

The result is a four-dimensional aggregation with hierarchical indices (see

Hierarchical Indexing (03.05-hierarchical-indexing.html)), shown in a grid
demonstrating the relationship between the values.

## Additional pivot table options

The full call signature of the pivot_table method of DataFrame s is as follows:

# call signature as of Pandas 0.18

DataFrame.pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=Fals
dropna=True, margins_name='All')

We've already seen examples of the first three arguments; here we'll take a
quick look at the remaining ones. Two of the options, fill_value and
dropna , have to do with missing data and are fairly straightforward; we will not
show examples of them here.

The aggfunc keyword controls what type of aggregation is applied, which is a

mean by default. As in the GroupBy, the aggregation specification can be a
string representing one of several common choices (e.g., 'sum' , 'mean' ,
'count' , 'min' , 'max' , etc.) or a function that implements an aggregation
(e.g., np.sum() , min() , sum() , etc.). Additionally, it can be specified as a
dictionary mapping a column to any of the above desired options:

In [8]: titanic.pivot_table(index='sex', columns='class',

aggfunc={'survived':sum, 'fare':'mean'})

Out[8]: fare survived

class First Second Third FirstSecondThird
sex
female106.12579821.97012116.11881091.0 70.0 72.0
male 67.226127 19.74178212.66163345.0 17.0 47.0

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 4/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

Notice also here that we've omitted the values keyword; when specifying a
mapping for aggfunc , this is determined automatically.

At times it's useful to compute totals along each grouping. This can be done via
the margins keyword:

In [9]: titanic.pivot_table('survived', index='sex', columns='class', margin

Out[9]: class First Second Third All

sex
female0.9680850.9210530.5000000.742038
male 0.3688520.1574070.1354470.188908
All 0.6296300.4728260.2423630.383838

Here this automatically gives us information about the class-agnostic survival

rate by gender, the gender-agnostic survival rate by class, and the overall
survival rate of 38%. The margin label can be specified with the margins_name
keyword, which defaults to "All" .

# Example: Birthrate Data

As a more interesting example, let's take a look at the freely available data on
births in the United States, provided by the Centers for Disease Control (CDC).
This data can be found at https://raw.githubusercontent.com/jakevdp/data-
CDCbirths/master/births.csv
(https://raw.githubusercontent.com/jakevdp/data-
CDCbirths/master/births.csv) (this dataset has been analyzed rather extensively
by Andrew Gelman and his group; see, for example, this blog post
(http://andrewgelman.com/2012/06/14/cool-ass-signal-processing-using-
gaussian-processes/)):

In [10]: # shell command to download the data:

# !curl -O https://raw.githubusercontent.com/jakevdp/data-CDCbirths/

In [11]: births = pd.read_csv('data/births.csv')

Taking a look at the data, we see that it's relatively simple–it contains the
number of births grouped by date and gender:

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 5/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

In [12]: births.head()

Out[12]: year monthdaygenderbirths

019691 1 F 4046
119691 1 M 4440
219691 2 F 4454
319691 2 M 4548
419691 3 F 4548

We can start to understand this data a bit more by using a pivot table. Let's add
a decade column, and take a look at male and female births as a function of
decade:

In [13]: births['decade'] = 10 * (births['year'] // 10)

births.pivot_table('births', index='decade', columns='gender', aggfu

Out[13]: gender F M
decade
1960 1753634 1846572
1970 1626307517121550
1980 1831035119243452
1990 1947945420420553
2000 1822930919106428

We immediately see that male births outnumber female births in every decade.
To see this trend a bit more clearly, we can use the built-in plotting tools in
Pandas to visualize the total number of births by year (see Introduction to
Matplotlib (04.00-introduction-to-matplotlib.html) for a discussion of plotting
with Matplotlib):

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 6/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

In [14]: %matplotlib inline

import matplotlib.pyplot as plt
sns.set() # use Seaborn styles
births.pivot_table('births', index='year', columns='gender', aggfunc
plt.ylabel('total births per year');

With a simple pivot table and plot() method, we can immediately see the
annual trend in births by gender. By eye, it appears that over the past 50 years
male births have outnumbered female births by around 5%.

## Further data exploration

Though this doesn't necessarily relate to the pivot table, there are a few more
interesting features we can pull out of this dataset using the Pandas tools
covered up to this point. We must start by cleaning the data a bit, removing
outliers caused by mistyped dates (e.g., June 31st) or missing values (e.g., June
99th). One easy way to remove these all at once is to cut outliers; we'll do this
via a robust sigma-clipping operation:

In [15]: quartiles = np.percentile(births['births'], [25, 50, 75])

mu = quartiles[1]
sig = 0.74 * (quartiles[2] - quartiles[0])

This final line is a robust estimate of the sample mean, where the 0.74 comes
from the interquartile range of a Gaussian distribution (You can learn more
about sigma-clipping operations in a book I coauthored with Željko Ivezić,
Andrew J. Connolly, and Alexander Gray: "Statistics, Data Mining, and Machine
Learning in Astronomy" (http://press.princeton.edu/titles/10159.html)
(Princeton University Press, 2014)).

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 7/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

With this we can use the query() method (discussed further in High-
Performance Pandas: eval() and query() (03.12-performance-eval-and-
query.html)) to filter-out rows with births outside these values:

In [16]: births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5

Next we set the day column to integers; previously it had been a string because
some columns in the dataset contained the value 'null' :

In [17]: # set 'day' column to integer; it originally was a string due to nul
births['day'] = births['day'].astype(int)

Finally, we can combine the day, month, and year to create a Date index (see
Working with Time Series (03.11-working-with-time-series.html)). This allows
us to quickly compute the weekday corresponding to each row:

In [18]: # create a datetime index from the year, month, day

births.index = pd.to_datetime(10000 * births.year +
100 * births.month +
births.day, format='%Y%m%d')

births['dayofweek'] = births.index.dayofweek

Using this we can plot births by weekday for several decades:

In [19]: import matplotlib.pyplot as plt

import matplotlib as mpl

births.pivot_table('births', index='dayofweek',
columns='decade', aggfunc='mean').plot()
plt.gca().set_xticklabels(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sa
plt.ylabel('mean births by day');

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 8/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

Apparently births are slightly less common on weekends than on weekdays!

Note that the 1990s and 2000s are missing because the CDC data contains only
the month of birth starting in 1989.

Another intersting view is to plot the mean number of births by the day of the
year. Let's first group the data by month and day separately:

In [20]: births_by_date = births.pivot_table('births',

[births.index.month, births.inde
births_by_date.head()

Out[20]: 1 1 4009.225
2 4247.400
3 4500.900
4 4571.350
5 4603.625
Name: births, dtype: float64

The result is a multi-index over months and days. To make this easily plottable,
let's turn these months and days into a date by associating them with a dummy
year variable (making sure to choose a leap year so February 29th is correctly
handled!)

In [21]: births_by_date.index = [pd.datetime(2012, month, day)

for (month, day) in births_by_date.index]
births_by_date.head()

Out[21]: 2012-01-01 4009.225

2012-01-02 4247.400
2012-01-03 4500.900
2012-01-04 4571.350
2012-01-05 4603.625
Name: births, dtype: float64

Focusing on the month and day only, we now have a time series reflecting the
average number of births by date of the year. From this, we can use the plot
method to plot the data. It reveals some interesting trends:

In [22]: # Plot the results

fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 9/10
2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

In particular, the striking feature of this graph is the dip in birthrate on US

holidays (e.g., Independence Day, Labor Day, Thanksgiving, Christmas, New
Year's Day) although this likely reflects trends in scheduled/induced births
rather than some deep psychosomatic effect on natural births. For more
discussion on this trend, see the analysis and links in Andrew Gelman's blog
post (http://andrewgelman.com/2012/06/14/cool-ass-signal-processing-using-
gaussian-processes/) on the subject. We'll return to this figure in Example:-
Effect-of-Holidays-on-US-Births (04.09-text-and-annotation.html#Example:-
Effect-of-Holidays-on-US-Births), where we will use Matplotlib's tools to
annotate this plot.

Looking at this short example, you can see that many of the Python and Pandas
tools we've seen to this point can be combined and used to gain insight from a
variety of datasets. We will see some more sophisticated applications of these
data manipulations in future sections!

< Aggregation and Grouping (03.08-aggregation-and-grouping.html) | Contents

(index.html) | Vectorized String Operations (03.10-working-with-strings.html) >

Open in Colab

(https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/note
Pivot-Tables.ipynb)

https://jakevdp.github.io/PythonDataScienceHandbook/03.09-pivot-tables.html 10/10

Pivot Tables
No ratings yet
Pivot Tables
9 pages
Understanding Pivot Tables in Data Analysis
No ratings yet
Understanding Pivot Tables in Data Analysis
11 pages
Pandas - Data Manipulation and Analysis Library - Educative
No ratings yet
Pandas - Data Manipulation and Analysis Library - Educative
7 pages
Class 12 DataFrame Operations
No ratings yet
Class 12 DataFrame Operations
21 pages
???? ???????????? ???? ??????
No ratings yet
???? ???????????? ???? ??????
63 pages
Pandas Dataframe Assignment No 3 - Answerkey
No ratings yet
Pandas Dataframe Assignment No 3 - Answerkey
10 pages
Titanic Survival Prediction Model
100% (1)
Titanic Survival Prediction Model
7 pages
Pandas Plots
No ratings yet
Pandas Plots
14 pages
Python Pandas Data Summarization Techniques
No ratings yet
Python Pandas Data Summarization Techniques
28 pages
EDA Module 3-1
No ratings yet
EDA Module 3-1
40 pages
Pivot Tables
No ratings yet
Pivot Tables
14 pages
IP Assgn 3
No ratings yet
IP Assgn 3
12 pages
Ai Tools and Applications-Lab
No ratings yet
Ai Tools and Applications-Lab
33 pages
Lecture 14
No ratings yet
Lecture 14
33 pages
4 PythonPandas
No ratings yet
4 PythonPandas
8 pages
Python Data Science Cheat Sheet
0% (1)
Python Data Science Cheat Sheet
3 pages
Informatics Practices: Class XII (As Per CBSE Board)
No ratings yet
Informatics Practices: Class XII (As Per CBSE Board)
20 pages
Python Data Science Cheat Sheet
100% (2)
Python Data Science Cheat Sheet
6 pages
Learn Pandas
No ratings yet
Learn Pandas
37 pages
Fundamental - Python
No ratings yet
Fundamental - Python
3 pages
DataFrame Operations and Visualizations
100% (1)
DataFrame Operations and Visualizations
20 pages
Effective Pandas
50% (2)
Effective Pandas
20 pages
Week 3 Laboratory Activity
No ratings yet
Week 3 Laboratory Activity
7 pages
Pandas DataFrame Cheat Sheet
No ratings yet
Pandas DataFrame Cheat Sheet
6 pages
Pandas 12 Pivot Table and Drop - A
No ratings yet
Pandas 12 Pivot Table and Drop - A
18 pages
Unit 5 2
No ratings yet
Unit 5 2
6 pages
Pandas Cheat Sheet CN
No ratings yet
Pandas Cheat Sheet CN
4 pages
Pandas Cheat Sheet
100% (4)
Pandas Cheat Sheet
2 pages
Pandas Cheat Sheet
85% (13)
Pandas Cheat Sheet
2 pages
Pandas Cheat Sheet PDF
67% (3)
Pandas Cheat Sheet PDF
1 page
Python Cheat Sheet Code Academy
100% (1)
Python Cheat Sheet Code Academy
1 page
Intro Pandas
No ratings yet
Intro Pandas
18 pages
MLT Lab Prep Guide
No ratings yet
MLT Lab Prep Guide
37 pages
Titanic
No ratings yet
Titanic
22 pages
Python Cheat Sheet For Excel Users
100% (2)
Python Cheat Sheet For Excel Users
5 pages
Pandas Cheat Sheet
No ratings yet
Pandas Cheat Sheet
2 pages
Pandas CheatSheet 1
No ratings yet
Pandas CheatSheet 1
15 pages
File Ip
No ratings yet
File Ip
22 pages
DSBDAL
No ratings yet
DSBDAL
87 pages
Python Data Analysis Cheat Sheet
100% (3)
Python Data Analysis Cheat Sheet
9 pages
Data Aggregation and Group Operations
No ratings yet
Data Aggregation and Group Operations
34 pages
Titanic Eda
No ratings yet
Titanic Eda
17 pages
Python Data Analysis with NumPy & Pandas
No ratings yet
Python Data Analysis with NumPy & Pandas
17 pages
Data Mining Lab Manaul
No ratings yet
Data Mining Lab Manaul
32 pages
Python Pandas Practical Examples
No ratings yet
Python Pandas Practical Examples
15 pages
Data Wrangling
No ratings yet
Data Wrangling
5 pages
Data Science Cheat Sheet: KEY Imports
100% (1)
Data Science Cheat Sheet: KEY Imports
1 page
Machine Learning and Web Scraping Lesson02
No ratings yet
Machine Learning and Web Scraping Lesson02
29 pages
Display Devices
No ratings yet
Display Devices
25 pages
Hadoop v1 Vs v2
No ratings yet
Hadoop v1 Vs v2
36 pages
Geenius LIS Specifications - FINAL
No ratings yet
Geenius LIS Specifications - FINAL
22 pages
Five Views of Software Quality Explained
No ratings yet
Five Views of Software Quality Explained
4 pages
Midsem 2 Important Questions
No ratings yet
Midsem 2 Important Questions
2 pages
IT Systems in Organizations
No ratings yet
IT Systems in Organizations
2 pages
ECS - ECS Upgrade Procedures-Upgrade ECS Fabric and Object Services - 2.1.0.0 To 2.2.0.x
No ratings yet
ECS - ECS Upgrade Procedures-Upgrade ECS Fabric and Object Services - 2.1.0.0 To 2.2.0.x
47 pages
ServiceNow Training for UT Service Desk
No ratings yet
ServiceNow Training for UT Service Desk
122 pages
Rex, Holding Register Description
No ratings yet
Rex, Holding Register Description
71 pages
Examplar 571727 Computer Systems
No ratings yet
Examplar 571727 Computer Systems
40 pages
GST Cancellation Mail
No ratings yet
GST Cancellation Mail
2 pages
01 Introduction To STELLA
No ratings yet
01 Introduction To STELLA
6 pages
Backup Exec Compatibility Guide
No ratings yet
Backup Exec Compatibility Guide
217 pages
Using ERwin Data Modeler
100% (1)
Using ERwin Data Modeler
46 pages
Gaurav I & C
No ratings yet
Gaurav I & C
4 pages
Project Report
No ratings yet
Project Report
23 pages
Course Objectives
No ratings yet
Course Objectives
3 pages
Binary Conversion Guide
No ratings yet
Binary Conversion Guide
29 pages
RCJPOGSDTY
No ratings yet
RCJPOGSDTY
1 page
RV 35,50,70F
No ratings yet
RV 35,50,70F
4 pages
Microsoft AI Cloud Partner Program Expanded
No ratings yet
Microsoft AI Cloud Partner Program Expanded
4 pages
Overleaf LaTeX Guide: Sections & More
No ratings yet
Overleaf LaTeX Guide: Sections & More
3 pages
A Quick Guide To Project Cost Estimating
No ratings yet
A Quick Guide To Project Cost Estimating
16 pages
Association and Correlation in Data Mining
No ratings yet
Association and Correlation in Data Mining
22 pages
Gen 2 Exadata Cloud at Customer Security Controls
No ratings yet
Gen 2 Exadata Cloud at Customer Security Controls
36 pages
Cheat
No ratings yet
Cheat
147 pages
Microsoft Azure DP-300 Exam Dumps
No ratings yet
Microsoft Azure DP-300 Exam Dumps
11 pages
Thesis Title For Mobile Applications
100% (2)
Thesis Title For Mobile Applications
8 pages
Case Studies of Common Csharp Problems
100% (2)
Case Studies of Common Csharp Problems
26 pages

Pivot Tables - Python Data Science Handbook

Uploaded by

Pivot Tables - Python Data Science Handbook

Uploaded by

2/18/25, 6:45 PM Pivot Tables | Python Data Science Handbook

< Aggregation and Grouping (03.08-aggregation-and-grouping.html) | Contents

# Motivating Pivot Tables

In [1]: import numpy as np

Out[2]: survivedpclasssex age sibspparchfare embarkedclass who adult_maled

This contains a wealth of information on each passenger of that ill-fated voyage,

# Pivot Tables by Hand

In [4]: titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unst

Out[4]: class First Second Third

not particularly easy to read or use. This two-dimensional GroupBy is common

# Pivot Table Syntax

In [5]: titanic.pivot_table('survived', index='sex', columns='class')

Out[5]: class First Second Third

## Multi-level pivot tables

In [6]: age = pd.cut(titanic['age'], [0, 18, 80])

Out[6]: class First Second Third

In [7]: fare = pd.qcut(titanic['fare'], 2)

Out[7]: fare [0, 14.454] (14.454, 512.329]

The result is a four-dimensional aggregation with hierarchical indices (see

## Additional pivot table options

# call signature as of Pandas 0.18

The aggfunc keyword controls what type of aggregation is applied, which is a

In [8]: titanic.pivot_table(index='sex', columns='class',

Out[8]: fare survived

In [9]: titanic.pivot_table('survived', index='sex', columns='class', margin

Out[9]: class First Second Third All

Here this automatically gives us information about the class-agnostic survival

# Example: Birthrate Data

In [10]: # shell command to download the data:

In [11]: births = pd.read_csv('data/births.csv')

Out[12]: year monthdaygenderbirths

In [13]: births['decade'] = 10 * (births['year'] // 10)

In [14]: %matplotlib inline

## Further data exploration

In [15]: quartiles = np.percentile(births['births'], [25, 50, 75])

In [18]: # create a datetime index from the year, month, day

Using this we can plot births by weekday for several decades:

In [19]: import matplotlib.pyplot as plt

Apparently births are slightly less common on weekends than on weekdays!

In [20]: births_by_date = births.pivot_table('births',

In [21]: births_by_date.index = [pd.datetime(2012, month, day)

Out[21]: 2012-01-01 4009.225

In [22]: # Plot the results

In particular, the striking feature of this graph is the dip in birthrate on US

< Aggregation and Grouping (03.08-aggregation-and-grouping.html) | Contents

You might also like